Principal Developer Advocate at Cloudera
About
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, Delta Lake, Apache Spark, big data, IoT, cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton and NYC on big data, cloud, IoT, deep learning, streaming, NiFi, blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, and Pulsar Summit. He holds a BS and an MS in computer science.
Articles
Refcards
Messaging and Data Infrastructure for IoT
Apache Spark
Introduction to TensorFlow
Trend Reports
Data Pipelines
Enter the modern data stack: a technology stack designed and equipped with cutting-edge tools and services to ingest, store, and process data. We no longer use data only to drive business decisions; we are entering a new era where cloud-based systems and tools are at the heart of data processing and analytics. Data-centric tools and techniques, like warehouses and lakes, ETL/ELT, observability, and real-time analytics, are democratizing the data we collect. The growing emphasis on data democratization has multiplied and refined the ways data platforms can be used and, by extension, empowers users to make data-driven decisions with confidence.
In our 2023 Data Pipelines Trend Report, we further explore these shifts and improved capabilities, featuring findings from DZone-original research and expert articles written by practitioners from the DZone Community. Our contributors cover hand-picked topics like data-driven design and architecture, data observability, and data integration models and techniques.
Development at Scale
As organizations’ needs and requirements evolve, it’s critical for development to meet these demands at scale. The various realms in which mobile, web, and low-code applications are built continue to fluctuate. This Trend Report will further explore these development trends and how they relate to scalability within organizations, highlighting application challenges, code, and more.
Enterprise AI
In recent years, artificial intelligence has become less of a buzzword and more of an adopted process across the enterprise. With that comes a growing need to increase operational efficiency as customer demands rise. AI platforms have become increasingly sophisticated, creating a need to establish guidelines and ownership. In DZone’s 2022 Enterprise AI Trend Report, we explore MLOps, explainability, and how to select the best AI platform for your business. We also share a tutorial on how to create a machine learning service using Spring Boot, and how to deploy AI with an event-driven platform. The goal of this Trend Report is to better inform the developer audience on practical tools and design paradigms, new technologies, and the overall operational impact of AI within the business. This is a technology space that's constantly shifting and evolving. As part of our December 2022 re-launch, we've added new articles on knowledge graphs, a solutions directory for popular AI tools, and more.
Machine Learning
Industry leaders discuss the latest trends in machine learning. We dive into using machine learning with microservices, deploying machine learning models in real-life applications, and where the field is going over the next 12 months.
Comments
Jun 26, 2023 · Jordan Baker
Any updates to this since KRaft?
Dec 12, 2022 · Tim Spann
https://github.com/tspannhw/pulsar-thermal-pinot/blob/main/weather.md
Sep 27, 2020 · Tim Spann
I have put out three updated NAR releases for the new DJL 0.8.0 framework.
Jun 21, 2019 · Tim Spann
Put them on two Docker nodes. Are you using the https://hub.docker.com/r/apache/nifi-registry image? Is the configuration right? It has to store data. https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
Jun 19, 2019 · Tim Spann
Upgrade to 1.9. Is it kerberized, or does it have a login?
Jan 09, 2019 · Tim Spann
Spark 1.6 is gone from NiFi 1.8. Livy is installed as part of HDP 2.6/3.0/3.1.
Nov 27, 2018 · Tim Spann
See https://community.hortonworks.com/articles/227560/real-time-stock-processing-with-apache-nifi-and-ap.html
Source: https://community.hortonworks.com/storage/attachments/93299-stock-to-kafka.xml https://community.hortonworks.com/storage/attachments/93298-stocks-copy.json https://github.com/tspannhw/stocks-nifi-kafka
Install Java 8
Install NiFi https://nifi.apache.org/download.html
Install Kafka https://kafka.apache.org/downloads
Or get a Linux box or a big VM.
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/index.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/index.html
Aug 29, 2018 · Tim Spann
The code is on GitHub: https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html https://github.com/tspannhw/UsingGluonCV/tree/master https://github.com/tspannhw/OpenSourceComputerVision/
Apr 11, 2018 · Tim Spann
https://github.com/minimaxir/person-blocker is the imaging tool.
The flow is Apache NiFi. It's not a visualization; it's programming.
Apr 09, 2018 · Tim Spann
We don't have one that lists all topics. You could call kafka-topics.sh to get the list, or make an API call.
Apr 08, 2018 · Tim Spann
https://community.hortonworks.com/articles/57262/integrating-apache-nifi-and-apache-kafka.html
https://community.hortonworks.com/articles/155527/ingesting-golden-gate-records-from-apache-kafka-an.html
ConsumeKafkaRecord_1_0 (with a comma-separated list of all topics) to ConvertRecord to PutHDFS. You may add ConvertAvroToORC or PutParquet.
Apr 08, 2018 · Tim Spann
It has worked for me. Post here: https://community.hortonworks.com/gallery/index.html https://community.hortonworks.com/questions/1629/nifi-connection-to-mssql-server-db.html https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html
Mar 23, 2018 · Tim Spann
https://github.com/bazaarvoice/jolt/issues/130
I use default values. https://community.hortonworks.com/articles/149910/handling-hl7-records-part-1-hl7-ingest.html
Mar 14, 2018 · Tim Spann
Good point. This was a follow-up to the other article and should have had a review. Sorry.
Mar 12, 2018 · Tim Spann
Josh Long is great. When I worked at Pivotal, I got a few articles into the Spring Weekly list.
Mar 12, 2018 · Tim Spann
Spring for Hadoop hasn't been updated in a long time. It is stuck on HDP 2.2, and we are on HDP 2.6. Spring Data JDBC and Spring Data Repositories make a lot of sense. I should do that; I'll do an update when I get the chance, maybe adding Java 9 and some other goodies. Thanks for the suggestions. If you want to fork the GitHub repo, please do!
Mar 06, 2018 · Tim Spann
Use GetFile or any of the other file-reading processors; NiFi doesn't care about the file type. Then, to convert to Avro, just use ConvertRecord. It can handle many types, including JSON.
Feb 08, 2018 · Tim Spann
http://opennlp.sourceforge.net/models-1.5/
Feb 08, 2018 · Tim Spann
You need to install the OpenNLP models and reference them in the processor properties. Also, OpenNLP misses a lot of names and locations; accuracy is kind of hit or miss. https://github.com/tspannhw/nifi-nlp-processor https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
Jan 19, 2018 · Tim Spann
https://community.hortonworks.com/articles/166802/versioned-dataflows-with-apache-nifi-15-and-apache.html
Aug 13, 2017 · Jean-Paul Azar
Jean-Paul,
There's another open source registry that integrates with Kafka and other systems extremely well and has a great REST API and UI:
https://github.com/hortonworks/registry
It has versioning, is adding Protocol Buffers support, and is going into Apache.
Have you tried that one?
Jul 10, 2017 · Tim Spann
The ConvertAvroToORC processor generates a DDL attribute with the CREATE EXTERNAL TABLE DDL you need. You can send that directly to a PutHiveQL processor to build the table, or do it yourself.
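To show the kind of statement involved when you "do it yourself," here is a minimal sketch that builds an external ORC table DDL by hand. The class `HiveDdlSketch`, the method `buildOrcDdl`, and the table, column, and location values are hypothetical; the processor itself derives the DDL from the Avro schema (as far as I know it writes it to a `hive.ddl` flowfile attribute), and its exact text may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a Hive DDL statement for an external ORC table. */
public class HiveDdlSketch {

    /**
     * @param table    Hive table name (assumed to already be a valid identifier)
     * @param columns  column name -> Hive type, in declaration order
     * @param location HDFS directory that holds the ORC files
     * @return a CREATE EXTERNAL TABLE statement
     */
    public static String buildOrcDdl(String table, Map<String, String> columns, String location) {
        // StringJoiner adds the surrounding parentheses and the ", " separators
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table + " " + cols
                + " STORED AS ORC LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the column order stable in the generated DDL
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("symbol", "STRING");
        columns.put("price", "DOUBLE");
        columns.put("ts", "BIGINT");
        System.out.println(buildOrcDdl("stocks", columns, "/data/stocks"));
    }
}
```

Because PutHiveQL executes the flowfile content as HiveQL, flows commonly copy the attribute value into the content first (for example with ReplaceText) before handing it to PutHiveQL.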
Jul 06, 2017 · Tim Spann
NiFi can run on a cluster of servers to distribute the load. NiFi generally supports about 50 megabytes a second per node.
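Taking the 50 MB/s-per-node figure as a rough rule of thumb (real throughput depends on hardware, flow design, and data), a first-cut cluster size is just a ceiling division. The class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope NiFi cluster sizing helper. */
public class NifiSizing {

    /** Rule-of-thumb throughput per node, in MB/s. */
    public static final int MB_PER_SEC_PER_NODE = 50;

    /**
     * Returns the minimum number of nodes needed to sustain the target rate,
     * rounding up so that partial capacity still counts as a whole node.
     */
    public static int nodesNeeded(int targetMbPerSec) {
        if (targetMbPerSec <= 0) {
            throw new IllegalArgumentException("target rate must be positive");
        }
        // integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive ints
        return (targetMbPerSec + MB_PER_SEC_PER_NODE - 1) / MB_PER_SEC_PER_NODE;
    }

    public static void main(String[] args) {
        // 120 MB/s / 50 MB/s per node = 2.4, rounded up to 3 nodes
        System.out.println(nodesNeeded(120)); // prints 3
    }
}
```

Treat the result as a lower bound: leaving headroom for node failures and bursts usually means adding at least one more node.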
Jun 18, 2017 · Tim Spann
http://www.techrepublic.com/google-amp/article/the-battle-for-apache-cassandra-highlights-major-problem-with-open-source-projects/
Jun 18, 2017 · Tim Spann
Google it: Apache issues, and only DataStax can commit.
Jun 18, 2017 · Tim Spann
You'd have to hand-write that and start an odd project. There are some engines that use HBase for that part; you could fork one and do it yourself. But why? Unless you work for DataStax, Cassandra is, for practical purposes, closed source.
Jun 18, 2017 · Tim Spann
No. Spark is useful for processing. There's no need for Cassandra, ever.
May 16, 2017 · Tim Spann
Not Hadoop's, but mine. It's time to update; I will run all my tools and report the results.
May 13, 2017 · Tim Spann
Awesome.
Apr 28, 2017 · Sarah Davis
Have you seen the open-source Superset from Airbnb and Hortonworks?
Mar 16, 2017 · Tim Spann
I paused on that one; I will try it this weekend.
Feb 19, 2017 · Tim Spann
https://dzone.com/articles/deep-learning-and-machine-learning-guide-part-ii
Feb 09, 2017 · Tim Spann
Part II is in progress. Is there anything you want to see?
Feb 08, 2017 · Tim Spann
Start with some basic NLTK if you know Python, or start with the Google TensorFlow examples.
Jan 05, 2017 · Tim Spann
Great article.
Jan 05, 2017 · Tim Spann
There's definitely no free lunch. Some things fit poorly: large proprietary interfaces, any programs that cannot handle the extra time required for network calls or the extra startup costs, and any existing code that works and doesn't require major changes. Not everything is greenfield. There are a lot of negatives, mostly around networking costs, duplication, and transaction concerns.
Jan 02, 2017 · Tim Spann
Very cool.
Oct 11, 2016 · Tim Spann
TensorFlow and Phoenix in Python, and they scale out fine.
Oct 11, 2016 · Tim Spann
The JVM is great for concurrency. You can use Java, Scala, Jython, Clojure, and a ton of other JVM languages.
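As a small illustration of the JVM concurrency point, this sketch uses only `java.util.concurrent` to split a sum across a fixed thread pool. The class name and the choice of summing a range are arbitrary, picked just to show the submit-then-collect pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: sums 1..n by splitting the range across worker threads. */
public class ParallelSum {

    public static long sum(long n, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + threads - 1) / threads; // round up so every value is covered
            for (int t = 0; t < threads; t++) {
                final long start = t * chunk + 1;
                final long end = Math.min(n, (t + 1) * chunk);
                // each task sums its own slice; a slice with start > end returns 0
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = start; i <= end; i++) {
                        s += i;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // blocks until that slice is finished
            }
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sum(1_000, 4)); // prints 500500
    }
}
```

The same code runs unchanged from Scala, Clojure, or Jython, since they all share the JVM's thread and executor model.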
Oct 11, 2016 · Tim Spann
There is Spark in Scala.
Oct 11, 2016 · Tim Spann
Python is used a lot in visualization and machine learning. At this point, with all the nodes you can run in a big Hadoop/Spark cluster, it doesn't matter. YARN runs things well.
Oct 11, 2016 · Tim Spann
Scala and Python are nice for data scientists and engineers, but most of the big data infrastructure is in Java. Java is way too verbose for most stuff, but Google chose it for Apache Beam. I like Apache NiFi because I don't have to write much code when I don't need to. Same for Zeppelin: I like doing bits of Scala and Python code when necessary.
Sep 15, 2016 · Tim Spann
Yes, Java and Scala are important. The first half of the article is about machine learning; the second half lists four Python libraries. It probably could have been two different articles, but I was looking at both. H2O is Java and Scala, and I like both languages.
Aug 18, 2016 · Tim Spann
I have used Titan; it's a really cool tool.
Aug 16, 2016 · Tim Spann
Post a blog about your query performance tests. I would like to see all the major open-source SQL-on-Hadoop engines tested, from Hive2 to HAWQ to Presto to Spark SQL to Drill, etc.
Aug 11, 2016 · Tim Spann
Cassandra is definitely an option for some use cases, but it overlaps with Hadoop and HBase. Cassandra requires some specialized knowledge and is its own thing. If you are interested in Cassandra, evaluate the SMACK stack. It's not as integrated into the big data tools and stack. There's a big learning curve, and those skills don't transfer over. If you learn Hadoop and Spark, you can work with tons of tools. Start with the industry standards, and then look at exotic databases when you have some special needs.
Aug 10, 2016 · Tim Spann
Reach me at tim at sparkdeveloper, @paasdev on Twitter, or through the meetup: http://www.meetup.com/futureofdata-princeton/
Aug 10, 2016 · Tim Spann
Interesting! Contact me; I would like to do an article on your product.
Aug 10, 2016 · Tim Spann
Trafodion looks very cool. When I was at HP I remember hearing about it. Tweet at me @paasdev if you want to give me details for an article, thanks.
Jun 02, 2016 · Tim Spann
Probably fixing n
May 12, 2016 · Tim Spann
Message me on Twitter at PaasDev, and then I'll email you.
May 12, 2016 · Tim Spann
Fixed, thanks for the catch! If you have any more information on DL4J, I would love to do an article on it.
Apr 19, 2016 · Tim Spann
Some ways to do Java 8. https://dzone.com/articles/zlwell-written-java
Apr 15, 2016 · Tim Spann
I have seen a lot of messy code. You bring it into IntelliJ or Eclipse, format the code, hide the bad comments, and run some static code analysis tools. Java is nice for that.
Jan 30, 2016 · Tim Spann
Nice, that is very cool. Thanks!
Jan 28, 2016 · Tim Spann
This is a horrible comment:
// don't touch these lines
int crazyVariableDontRemove = -999;
// don't touch above lines
Jan 17, 2016 · Tim Spann
Nice slide deck! Here it is as a link.
Jan 03, 2016 · Tim Spann
Docker will have a tough year ahead.
Jan 03, 2016 · Tim Spann
Looks great.
Jan 03, 2016 · Tim Spann
Typo fixed thanks!
Jan 03, 2016 · Tim Spann
Most likely, that would be possible with Bluemix.
Dec 31, 2015 · Tim Spann
That's a great list!
Apr 15, 2015 · Mr B Loid
This article was from 2013. Those are good options too. This should probably be updated.
Nov 03, 2014 · Geertjan Wielenga
Have you seen Spring Data REST?
Oct 02, 2014 · Mr B Loid
What about Gemfire XD?
http://www.pivotal.io/big-data/pivotal-gemfire-xd
What about Gemfire?
http://www.pivotal.io/big-data/pivotal-gemfire
Jun 05, 2013 · Eric Gregory
Good catch. I like your second idea better though.
Jun 03, 2013 · Tim Spann
Feb 02, 2013 · Ken Lee
Ours was failing only on Hotmail and Gmail. It turned out to be a mail server issue with one particular email address used for sending: a blacklist security issue.
// Apache Commons Email classes used below
import org.apache.commons.mail.Email;
import org.apache.commons.mail.EmailException;
import org.apache.commons.mail.SimpleEmail;

/**
 * Sends a plain-text email and returns a log of what happened.
 *
 * @param subject      the email subject line
 * @param message      the plain-text message body
 * @param emailAddress the recipient address
 * @return a human-readable log of the attempt, including any errors
 */
public static String sendEmail(String subject, String message, String emailAddress) {
    final String newline = System.lineSeparator();
    boolean emailSent = false;
    // log buffer for this send attempt
    StringBuilder emailLoggingMessage =
        new StringBuilder(UserProvisioningUtility.gkBUFFER_SIZE);
    emailLoggingMessage
        .append(Messages.getString("UserProvisioningProcess.SMTP_MESSAGE_LOG"))
        .append(emailAddress).append(newline);

    Email email = new SimpleEmail();
    email.setHostName(Messages.getString("UserProvisioningProcess.EMAIL_SERVER"));

    // default to plain SMTP port 25 unless a valid port is configured
    int smtpPort = 25;
    String configuredPort = Messages.getString("UserProvisioningProcess.EMAIL_PORT");
    if (configuredPort != null) {
        try {
            smtpPort = Integer.parseInt(configuredPort.trim());
        } catch (NumberFormatException e) {
            emailLoggingMessage.append(e.getLocalizedMessage()).append(newline);
        }
    }
    email.setSmtpPort(smtpPort);
    // For an authenticated SSL server such as smtp.googlemail.com on port 465:
    // email.setAuthenticator(new DefaultAuthenticator("username", "password"));
    // email.setSSLOnConnect(true);

    try {
        email.setFrom(Messages.getString("UserProvisioningProcess.FROM_EMAIL"));
        email.setSubject(subject);
        email.setMsg(message);
        email.addTo(emailAddress);
        email.send();
        emailSent = true;
    } catch (EmailException e) {
        // a bad sender, bad recipient, or SMTP failure stops the send and is logged
        emailLoggingMessage.append(e.getLocalizedMessage()).append(newline);
    }

    if (emailSent) {
        emailLoggingMessage
            .append(Messages.getString("UserProvisioningProcess.SMTP_MESSAGE_SENT"))
            .append(newline);
    }
    return emailLoggingMessage.toString();
}
Jan 29, 2013 · Allen Coin
Eclipse and other IDEs have automation in the sense of wizards and configuration tools that do a lot of the basic work for you. There are also plugins for some tasks. You can also have scripts in Maven, Roo, Gradle, and Ant that can be run from the IDE to automate things.
And once you have a project set up, with some shell code, unit tests, and version control configured, Jenkins can automate your build/deploy/tests/... That first project setup is usually a wizard, a template, or a GitHub base project. Spring IDE and JBoss have some extra templates that really help you get started. That's still a lot of complexity.
Jul 31, 2012 · Eric Genesky