DZone

Tim Spann

DZone Core

Senior Sales Engineer at Snowflake

Company website: https://datainmotion.dev

Hightstown, US

Joined Jun 2008

About

Tim Spann is a Senior Sales Engineer. He works with Python, SQL, Snowflake, Cortex AI, Apache Iceberg, ML, Notebooks, Jupyter Notebooks, Generative AI, LLM, Vectors, Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Delta Lake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more. He holds a BS and MS in computer science.

Education

Montclair State University · Computer Science

MS

Jan 1995 - Jan 2000

Stats

Reputation: 16769
Pageviews: 3.5M
Articles: 67
Comments: 25

Expertise

AI/ML

IoT

Articles

Multimodal RAG Is Not Scary, Ghosts Are Scary
Run ghastly multimodal analytics and Retrieval Augmented Generation with our "ghosts" collections in the open-source Milvus vector database.
October 30, 2024
· 4,106 Views · 3 Likes
How to Improve RAG Quality by Storing Knowledge Graphs in Vector Databases
This tutorial provides an in-depth look at how to improve standard LLM and vector database RAG quality with knowledge graphs.
September 30, 2024
· 4,855 Views · 8 Likes
Using Flink, Kafka, and NiFi for Real-Time Airport Arrivals and Departures
Learn to build a streaming application using the best of NiFi, Kafka, and Flink for event-driven apps. The OpenSky Network's REST feeds provide all the data.
September 26, 2024
· 4,040 Views · 5 Likes
Utilizing Multiple Vectors and Advanced Search Data Model Design for City Data
Learn how to build, architect, and design complex unstructured data applications for vector databases utilizing Milvus, GenAI, LangChain, YOLO, and more.
August 29, 2024
· 5,476 Views · 4 Likes
Mixtral: Generative Sparse Mixture of Experts in DataFlows
Explore the use of a new type of GenAI LLM with streaming pipelines in this tutorial about how to build a real-time LLM flow with Mistral AI's new open model.
March 13, 2024
· 3,047 Views · 2 Likes
Building a Generative AI Processor in Python
Why not create a Python Processor for Apache NiFi 2.0.0? In this tutorial, discover whether the challenge to do so is easy or difficult.
January 23, 2024
· 4,245 Views · 5 Likes
Building a Real-Time Slackbot With Generative AI
Learn how to build a cool Slackbot with Apache NiFi, LLM, Foundation Models, and streaming. We will cover model choices and integration.
November 29, 2023
· 3,132 Views · 5 Likes
Real-Time Analytics: All Data, Any Data, Any Scale, at Any Time
All data, any data, any scale, at any time: Learn why data pipelines need to embrace real-time data streams to harness the value of data as it is created.
October 3, 2023
· 6,988 Views · 6 Likes
What Is a Modern Developer? In Today’s World, It’s a Citizen Engineer
The modern developer is an overarching term covering a large variety of different roles, responsibilities, and skills. In today’s world, it’s a citizen engineer.
July 20, 2023
· 12,395 Views · 7 Likes
Streaming Change Data Capture Data Two Ways
Walk through how to use Debezium with Flink, Kafka, and NiFi for Change Data Capture using two different mechanisms: Kafka Connect and Flink SQL.
July 3, 2023
· 6,044 Views · 2 Likes
Harnessing the Power of NiFi: Building a Seamless Flow To Ingest PM2.5 Data From a MiNiFi Java Agent With a Particle Sensor
In this tutorial, discover how to use MiNiFi, NiFi, Kafka, and Flink for sensor ingest, processing, analytics, and visualization.
June 13, 2023
· 5,756 Views · 4 Likes
Real-Time Stream Processing With Hazelcast and StreamNative
In this article, readers will learn about real-time stream processing with Hazelcast and StreamNative in a shorter time, along with demonstrations and code.
Updated January 31, 2023
· 12,637 Views · 7 Likes
How to Create a Real-Time Scalable Streaming App Using Apache NiFi, Apache Pulsar, and Apache Flink SQL
In this article, we'll cover how and when to use Pulsar with NiFi and Flink as you build your streaming application.
January 22, 2023
· 7,916 Views · 7 Likes
Building Real-Time Weather Dashboards With Apache Pinot
Let's build a real-time weather dashboard application with Apache Pinot and Apache Pulsar.
December 11, 2022
· 7,427 Views · 4 Likes
Real-Time Pulsar and Python Apps on a Pi
Build a Python application on a Raspberry Pi that streams sensor data and more from the edge to any and all data stores while processing data in event time.
April 5, 2022
· 9,234 Views · 7 Likes
Deploying AI With an Event-Driven Platform
Explore the advantages of an event-driven platform for model deployment and make your model classification results more accessible.
March 14, 2022
· 8,209 Views · 3 Likes
Generating Simulated Streaming Data
In this article, learn more about using the Python library, Faker, to build synthetic data for tests and utilize Pulsar to send messages to topics at scale.
March 6, 2022
· 8,284 Views · 5 Likes
Pulsar in Python on Pi for Sensors
Utilizing Apache Pulsar's Python Client on Raspberry Pi - FLiP-Py Stack
February 27, 2022
· 14,697 Views · 4 Likes
Real-Time Edge Application With Apache Pulsar
In this article, you will learn how to build edge applications using Pulsar, what makes developing edge applications challenging, and why Apache Pulsar is the solution.
December 18, 2021
· 26,864 Views · 4 Likes
Introducing Cloudera SQL Stream Builder (SSB)
SSB is an improved release of Eventador's SQL Stream Builder with integration into Cloudera Manager, Cloudera Flink, and other streaming tools.
Updated June 6, 2021
· 14,466 Views · 5 Likes
Modern Apache NiFi Load Balancing
In this article, we discuss the newest ways to perform load balancing in Apache NiFi (version 1.8.0 and later) that now make Remote Process Groups obsolete.
December 19, 2019
· 24,918 Views · 3 Likes
Exploring Apache NiFi 1.10: Parameters and Stateless Engine
In this article, we discuss the new version of Apache NiFi and how to use two of the biggest new features: parameters and stateless.
November 26, 2019
· 29,010 Views · 4 Likes
Migrating Apache Flume Flows to Apache NiFi: Kafka Source to Multiple Sinks
How to move off legacy Flume and onto modern Apache NiFi for data pipelines.
October 15, 2019
· 18,666 Views · 9 Likes
Real-Time Transit Feed Data Processing With Apache NiFi
Ingesting and processing real-time transit feeds at scale with Apache NiFi.
August 26, 2019
· 13,764 Views · 4 Likes
Creating Apache Kafka Topics Dynamically as Part of a DataFlow
Creating Kafka topics programmatically as part of streaming.
August 13, 2019
· 26,596 Views · 5 Likes
Edge Data Processing With Jetson Nano
Learn more about edge data processing with Jetson Nano.
July 24, 2019
· 13,723 Views · 5 Likes
Advanced XML Processing With Apache NiFi 1.9.1
In this post, we'll be using Apache NiFi to simply process very complex XML and RSS data files.
April 2, 2019
· 19,835 Views · 7 Likes
HDP 3.1 Released! All The Kafka!
A major upgrade to the Hadoop distribution has been released. Read on to learn how to upgrade to it.
December 18, 2018
· 17,227 Views · 7 Likes
Real-Time Stock Processing With Apache NiFi and Apache Kafka, Part 1
A big data expert starts his series on using Kafka and NiFi for real-time data flow programming.
November 20, 2018
· 43,221 Views · 15 Likes
Simple Apache NiFi Operations Dashboard (Part 2): Spring Boot
In this post, we continue building a dashboard with the open-source big data platform Apache NiFi, using Spring Boot 2.0.6.
October 24, 2018
· 14,622 Views · 9 Likes

Refcards

Refcard #263

Messaging and Data Infrastructure for IoT

Refcard #204

Apache Spark

Refcard #251

Introduction to TensorFlow

Trend Reports

Trend Report

Data Pipelines

Enter the modern data stack: a technology stack designed and equipped with cutting-edge tools and services to ingest, store, and process data. No longer are we using data only to drive business decisions; we are entering a new era where cloud-based systems and tools are at the heart of data processing and analytics. Data-centric tools and techniques — like warehouses and lakes, ETL/ELT, observability, and real-time analytics — are democratizing the data we collect. The proliferation of and growing emphasis on data democratization results in increased and nuanced ways in which data platforms can be used. And of course, by extension, they also empower users to make data-driven decisions with confidence. In our 2023 Data Pipelines Trend Report, we further explore these shifts and improved capabilities, featuring findings from DZone-original research and expert articles written by practitioners from the DZone Community. Our contributors cover hand-picked topics like data-driven design and architecture, data observability, and data integration models and techniques.

Trend Report

Development at Scale

As organizations’ needs and requirements evolve, it’s critical for development to meet these demands at scale. The various realms in which mobile, web, and low-code applications are built continue to fluctuate. This Trend Report will further explore these development trends and how they relate to scalability within organizations, highlighting application challenges, code, and more.

Trend Report

Enterprise AI

In recent years, artificial intelligence has become less of a buzzword and more of an adopted process across the enterprise. With that, there is a growing need to increase operational efficiency as customer demands arise. AI platforms have become increasingly sophisticated, and there is now a need to establish guidelines and ownership. In DZone's 2022 Enterprise AI Trend Report, we explore MLOps, explainability, and how to select the best AI platform for your business. We also share a tutorial on how to create a machine learning service using Spring Boot, and how to deploy AI with an event-driven platform. The goal of this Trend Report is to better inform the developer audience on practical tools and design paradigms, new technologies, and the overall operational impact of AI within the business. This is a technology space that's constantly shifting and evolving. As part of our December 2022 re-launch, we've added new articles pertaining to knowledge graphs, a solutions directory for popular AI tools, and more.

Trend Report

Machine Learning

Industry leaders discuss the latest trends in machine learning. We dive into using machine learning with microservices, deploying machine learning models in real-life applications, and where the field is going over the next 12 months.

Events

Data Pipelines: Investigating the Modern Data Stack

Presenter: Decodable & Informatica

Comments

Utilizing Multiple Vectors and Advanced Search Data Model Design for City Data

Sep 02, 2024 · Tim Spann

This is the code, cleaned up for RAG in Jupyter.



Utilizing Multiple Vectors and Advanced Search Data Model Design for City Data

Aug 30, 2024 · Tim Spann

Source Code


https://github.com/tspannhw/AIM-NYCStreetCams/tree/main/MultipleVectorsAdvanced%20SearchDataModelDesign

Utilizing Multiple Vectors and Advanced Search Data Model Design for City Data

Aug 30, 2024 · Tim Spann

Example demo run on YouTube.



https://www.youtube.com/watch?v=HaRc0rsaMo0

Kafka Connectors Without Kafka

Jun 26, 2023 · Jordan Baker

Any updates to this since KRaft?

Building Real-Time Weather Dashboards With Apache Pinot

Dec 12, 2022 · Tim Spann

https://github.com/tspannhw/pulsar-thermal-pinot/blob/main/weather.md

DevOps for Apache NiFi 1.7 and More

Jun 21, 2019 · Tim Spann

Put them into two Docker nodes. Are you using the https://hub.docker.com/r/apache/nifi-registry image? Is the configuration right? It has to store data. https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html


DevOps for Apache NiFi 1.7 and More

Jun 19, 2019 · Tim Spann

Upgrade to 1.9. Is it Kerberized, or does it have a login?


Real-Time Stock Processing With Apache NiFi and Apache Kafka, Part 1

Nov 27, 2018 · Tim Spann

See https://community.hortonworks.com/articles/227560/real-time-stock-processing-with-apache-nifi-and-ap.html

Source: https://community.hortonworks.com/storage/attachments/93299-stock-to-kafka.xml https://community.hortonworks.com/storage/attachments/93298-stocks-copy.json https://github.com/tspannhw/stocks-nifi-kafka

Install Java 8

Install NiFi https://nifi.apache.org/download.html

Install Kafka https://kafka.apache.org/downloads

Or get a Linux box or a big VM

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.3.0/index.html

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/index.html

How to Automatically Migrate All Tables From a Database to Hadoop With No Coding

Apr 09, 2018 · Tim Spann

We don't have one to list all topics. You could call kafka-topics.sh to get the list or make an API call.
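
The kafka-topics.sh route can be scripted. Below is a minimal Python sketch, assuming a Kafka release whose kafka-topics.sh accepts --bootstrap-server (older releases took --zookeeper instead), that the script is on the PATH, and that a broker is reachable at localhost:9092. The function names are hypothetical and are not part of Kafka or NiFi.

```python
import subprocess


def parse_topic_list(output: str) -> list[str]:
    """Turn the one-topic-per-line output of `kafka-topics.sh --list` into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_topics(bootstrap_server: str = "localhost:9092",
                script: str = "kafka-topics.sh") -> list[str]:
    """Run the Kafka CLI and return the topic names it prints.

    Assumption: `script` is on the PATH and supports --bootstrap-server.
    """
    result = subprocess.run(
        [script, "--list", "--bootstrap-server", bootstrap_server],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return parse_topic_list(result.stdout)


def as_comma_list(topics: list[str]) -> str:
    """Join topic names into one comma-separated string."""
    return ",".join(topics)
```

Calling `as_comma_list(list_topics())` would give a single comma-separated string of topic names, which is the shape a consumer configured with a list of topics usually expects.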

How to Automatically Migrate All Tables From a Database to Hadoop With No Coding

Apr 08, 2018 · Tim Spann

https://community.hortonworks.com/articles/57262/integrating-apache-nifi-and-apache-kafka.html


https://community.hortonworks.com/articles/155527/ingesting-golden-gate-records-from-apache-kafka-an.html


ConsumeKafkaRecord_1_0 (with a comma-separated list of all topics) to ConvertRecord to PutHDFS. You may add ConvertAvroToOrc or PutParquet.


How to Automatically Migrate All Tables From a Database to Hadoop With No Coding

Apr 08, 2018 · Tim Spann

It has worked for me. Post here: https://community.hortonworks.com/gallery/index.html https://community.hortonworks.com/questions/1629/nifi-connection-to-mssql-server-db.html https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html

Using Jolt in Big Data Streams to Remove Nulls

Mar 23, 2018 · Tim Spann

https://github.com/bazaarvoice/jolt/issues/130


I use default values. https://community.hortonworks.com/articles/149910/handling-hl7-records-part-1-hl7-ingest.html

Spring Boot 2.0 on ACID! Big Data + Spring Boot

Mar 14, 2018 · Tim Spann

Good point. This was a follow-up to the other article; it should have had a review. Sorry.

Spring Boot 2.0 on ACID! Big Data + Spring Boot

Mar 12, 2018 · Tim Spann

Josh Long is great. When I worked at Pivotal, I got a few articles in the Spring Weekly list.

Spring Boot 2.0 on ACID! Big Data + Spring Boot

Mar 12, 2018 · Tim Spann

Spring for Hadoop hasn't been updated in forever. It is stuck on HDP 2.2 and we are on HDP 2.6. Spring Data JDBC and Spring Data Repositories make a lot of sense. I should do that; I'll do an update when I get the chance. Maybe add Java 9 and some other goodies. Thanks for the suggestions. If you want to fork the GitHub repo, please do!

Apache Tika and Apache OpenNLP for Easy PDF Parsing and Munching

Feb 08, 2018 · Tim Spann

http://opennlp.sourceforge.net/models-1.5/

Apache Tika and Apache OpenNLP for Easy PDF Parsing and Munching

Feb 08, 2018 · Tim Spann

You need to install the OpenNLP models and reference that in the processor properties. Also OpenNLP misses a lot of names and locations. Accuracy is kind of hit or miss. https://github.com/tspannhw/nifi-nlp-processor https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html

What Is Kafka? Everything You Need to Know

Aug 13, 2017 · Jean-Paul Azar

Jean-Paul,

There's another open source registry that integrates with Kafka and other systems extremely well and has a great REST API and UI:

https://github.com/hortonworks/registry

It has versioning and is moving to adding protocol buffers and going into Apache.

Have you tried that one?

How to Automatically Migrate All Tables From a Database to Hadoop With No Coding

Jul 06, 2017 · Tim Spann

NiFi can run on a cluster of servers to distribute the load. NiFi generally supports 50 megabytes a second per node
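
As a rough sizing illustration of that figure, here is a minimal Python sketch. It assumes throughput scales linearly with node count, which is a simplification; real numbers depend on the flow, hardware, and disks. The function name and the example workloads are hypothetical.

```python
import math

# Rule-of-thumb figure quoted in the comment above (megabytes per second per node).
MB_PER_SECOND_PER_NODE = 50


def nodes_needed(target_mb_per_second: float,
                 per_node_mb_per_second: float = MB_PER_SECOND_PER_NODE) -> int:
    """Estimate how many nodes a target ingest rate would need.

    Assumption: load spreads evenly and throughput scales linearly with nodes.
    """
    if target_mb_per_second <= 0 or per_node_mb_per_second <= 0:
        raise ValueError("rates must be positive")
    # Round up, because a fraction of a node still means one more server.
    return math.ceil(target_mb_per_second / per_node_mb_per_second)


if __name__ == "__main__":
    for rate in (10, 50, 120):
        print(f"{rate} MB/s -> {nodes_needed(rate)} node(s)")
```

Treat the result as a starting point for a load test, not as a guarantee.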

16 Free and Open-Source Business Intelligence Tools

Apr 28, 2017 · Sarah Davis

Have you seen the open-source Superset from Airbnb and Hortonworks?

TensorFlow on the Edge, Part 1 of 5

Mar 16, 2017 · Tim Spann

Paused on that one; will try this weekend.



Yes, Java Has Flaws. But...

Apr 19, 2016 · Tim Spann

Some ways to do Java 8. https://dzone.com/articles/zlwell-written-java

Yes, Java Has Flaws. But...

Apr 15, 2016 · Tim Spann

I have seen a lot of messy code. You bring it into IntelliJ / Eclipse and format the code, hide the bad comments, and run some static code analysis tools. Java is nice for that.

Log Scraping

Jun 05, 2013 · Eric Gregory

Good catch. I like your second idea better though.
