Principal Developer Advocate and Field Engineer at Data In Motion
Company website: https://datainmotion.dev
Tim Spann is a Principal Developer Advocate. He works with Python, Generative AI, LLM, Vectors, Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more. He holds a BS and MS in computer science.
Messaging and Data Infrastructure for IoT
Apache Spark
Introduction to TensorFlow
Data Pipelines
Enter the modern data stack: a technology stack designed and equipped with cutting-edge tools and services to ingest, store, and process data. No longer are we using data only to drive business decisions; we are entering a new era where cloud-based systems and tools are at the heart of data processing and analytics. Data-centric tools and techniques, such as warehouses and lakes, ETL/ELT, observability, and real-time analytics, are democratizing the data we collect. The growing emphasis on data democratization leads to more numerous and more nuanced ways of using data platforms. By extension, it also empowers users to make data-driven decisions with confidence. In our 2023 Data Pipelines Trend Report, we further explore these shifts and improved capabilities, featuring findings from DZone-original research and expert articles written by practitioners from the DZone Community. Our contributors cover hand-picked topics like data-driven design and architecture, data observability, and data integration models and techniques.
Development at Scale
As organizations’ needs and requirements evolve, it’s critical for development to meet these demands at scale. The various realms in which mobile, web, and low-code applications are built continue to fluctuate. This Trend Report will further explore these development trends and how they relate to scalability within organizations, highlighting application challenges, code, and more.
Enterprise AI
In recent years, artificial intelligence has become less of a buzzword and more of an adopted process across the enterprise. With that, there is a growing need to increase operational efficiency as customer demands arise. AI platforms have become increasingly sophisticated, and there is now a need to establish guidelines and ownership. In DZone's 2022 Enterprise AI Trend Report, we explore MLOps, explainability, and how to select the best AI platform for your business. We also share a tutorial on how to create a machine learning service using Spring Boot, and how to deploy AI with an event-driven platform. The goal of this Trend Report is to better inform the developer audience on practical tools and design paradigms, new technologies, and the overall operational impact of AI within the business. This is a technology space that's constantly shifting and evolving. As part of our December 2022 re-launch, we've added new articles pertaining to knowledge graphs, a solutions directory for popular AI tools, and more.
Machine Learning
Industry leaders discuss the latest trends in machine learning. We dive into using machine learning with microservices, deploying machine learning models in real-life applications, and where the field is going over the next 12 months.
Comments
Sep 02, 2024 · Tim Spann
This is the code, cleaned up for RAG in Jupyter.
Aug 30, 2024 · Tim Spann
Source Code
https://github.com/tspannhw/AIM-NYCStreetCams/tree/main/MultipleVectorsAdvanced%20SearchDataModelDesign
Aug 30, 2024 · Tim Spann
Demo video of an example run on YouTube:
https://www.youtube.com/watch?v=HaRc0rsaMo0
Jun 26, 2023 · Jordan Baker
Any updates to this since KRaft?
Dec 12, 2022 · Tim Spann
https://github.com/tspannhw/pulsar-thermal-pinot/blob/main/weather.md
Jun 21, 2019 · Tim Spann
Put them into two Docker nodes. Are you using the image at https://hub.docker.com/r/apache/nifi-registry and is the configuration right? It has to store data. See https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
Jun 19, 2019 · Tim Spann
Upgrade to 1.9. Is it Kerberized, or does it have a login?
Nov 27, 2018 · Tim Spann
See https://community.hortonworks.com/articles/227560/real-time-stock-processing-with-apache-nifi-and-ap.html
Source: https://community.hortonworks.com/storage/attachments/93299-stock-to-kafka.xml https://community.hortonworks.com/storage/attachments/93298-stocks-copy.json https://github.com/tspannhw/stocks-nifi-kafka
Install Java 8
Install NiFi https://nifi.apache.org/download.html
Install Kafka https://kafka.apache.org/downloads
Or get a Linux box or a big VM
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/index.html
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/index.html
Apr 09, 2018 · Tim Spann
We don't have one to list all topics. You could call kafka-topics.sh to get the list, or make an API call.
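A minimal Python sketch of the kafka-topics.sh route follows. It assumes the script is on the PATH and that the broker accepts the `--bootstrap-server` flag; older Kafka releases took `--zookeeper host:2181` instead. The helper names and the `localhost:9092` address are hypothetical, chosen only for illustration.

```python
import subprocess


def build_list_command(bootstrap_server, script="kafka-topics.sh"):
    """Build the kafka-topics.sh invocation that lists all topics.

    Assumption: a Kafka release whose kafka-topics.sh accepts
    --bootstrap-server. Older releases used --zookeeper host:2181.
    """
    return [script, "--list", "--bootstrap-server", bootstrap_server]


def parse_topic_list(output):
    """Turn the one-topic-per-line output into a list of topic names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_topics(bootstrap_server, script="kafka-topics.sh"):
    """Run the script and return the topic names (needs a reachable broker)."""
    result = subprocess.run(
        build_list_command(bootstrap_server, script),
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return parse_topic_list(result.stdout)
```

Calling `list_topics("localhost:9092")` against a running broker would return the names as a Python list, which could then be handed to whatever step needs them.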
Apr 08, 2018 · Tim Spann
https://community.hortonworks.com/articles/57262/integrating-apache-nifi-and-apache-kafka.html
https://community.hortonworks.com/articles/155527/ingesting-golden-gate-records-from-apache-kafka-an.html
ConsumeKafkaRecord_1_0 (with a comma-separated list of all topics) to ConvertRecord to PutHDFS. You may also add ConvertAvroToOrc or PutParquet.
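The comma-separated topic list for the consumer can be assembled with a few lines of Python. This is only a sketch: the function name is hypothetical, and skipping names that start with `__` (such as Kafka's internal `__consumer_offsets` topic) is an assumption about what you would want to exclude, not something the flow requires.

```python
def topic_name_list(topics):
    """Join topic names into one comma-separated string.

    Assumption: names starting with "__" are Kafka-internal topics
    (for example __consumer_offsets) and should not be consumed.
    Duplicates and blank entries are dropped; output is sorted.
    """
    cleaned = {name.strip() for name in topics}
    wanted = sorted(
        name for name in cleaned if name and not name.startswith("__")
    )
    return ",".join(wanted)
```

For example, `topic_name_list(["stocks", "__consumer_offsets", " weather "])` gives a string that can be pasted into the processor's topic setting.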
Apr 08, 2018 · Tim Spann
It has worked for me. Post here: https://community.hortonworks.com/gallery/index.html https://community.hortonworks.com/questions/1629/nifi-connection-to-mssql-server-db.html https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html
Mar 23, 2018 · Tim Spann
https://github.com/bazaarvoice/jolt/issues/130
I use default values. https://community.hortonworks.com/articles/149910/handling-hl7-records-part-1-hl7-ingest.html
Mar 14, 2018 · Tim Spann
Good point. This was a follow-up to the other article and should have had a review. Sorry.
Mar 12, 2018 · Tim Spann
Josh Long is great. When I worked at Pivotal, I got a few articles into the Spring Weekly list.
Mar 12, 2018 · Tim Spann
Spring for Hadoop hasn't been updated in forever. It is stuck on HDP 2.2 and we are on HDP 2.6. Spring Data JDBC and Spring Data Repositories make a lot of sense. I should do that; I'll do an update when I get the chance. Maybe add Java 9 and some other goodies. Thanks for the suggestions. If you want to fork the GitHub repo, please do!
Feb 08, 2018 · Tim Spann
http://opennlp.sourceforge.net/models-1.5/
Feb 08, 2018 · Tim Spann
You need to install the OpenNLP models and reference them in the processor properties. Also, OpenNLP misses a lot of names and locations, so accuracy is hit or miss. https://github.com/tspannhw/nifi-nlp-processor https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
Aug 13, 2017 · Jean-Paul Azar
Jean-Paul,
There's another open source registry that integrates with Kafka and other systems extremely well and has a great REST API and UI:
https://github.com/hortonworks/registry
It has versioning and is moving to adding protocol buffers and going into Apache.
Have you tried that one?
Jul 06, 2017 · Tim Spann
NiFi can run on a cluster of servers to distribute the load. NiFi generally supports about 50 megabytes a second per node.
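As a rough back-of-the-envelope sketch in Python, the per-node figure above can be turned into a cluster size estimate. It assumes throughput scales linearly with node count, which real flows may not achieve; the function name and the sample target rate are hypothetical.

```python
import math

# Rough per-node figure quoted in the comment above; real throughput
# depends on the flow, the hardware, and the data.
MB_PER_SECOND_PER_NODE = 50


def nodes_needed(target_mb_per_second, per_node=MB_PER_SECOND_PER_NODE):
    """Estimate how many nodes a target ingest rate would need.

    Assumption: load spreads evenly and scales linearly across nodes.
    """
    if target_mb_per_second <= 0:
        return 0
    return math.ceil(target_mb_per_second / per_node)
```

Under these assumptions, a target of 120 megabytes a second works out to 120 / 50 = 2.4, rounded up to 3 nodes.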
Apr 28, 2017 · Sarah Davis
Have you seen the open source Superset from Airbnb and Hortonworks?
Mar 16, 2017 · Tim Spann
Paused on that one; will try this weekend.
Apr 19, 2016 · Tim Spann
Some ways to do Java 8. https://dzone.com/articles/zlwell-written-java
Apr 15, 2016 · Tim Spann
I have seen a lot of messy code. You bring it into IntelliJ / Eclipse, format the code, hide the bad comments, and run some static code analysis tools. Java is nice for that.
Jun 05, 2013 · Eric Gregory
Good catch. I like your second idea better though.