2018: The Year in Big Data
An analysis of the last year in big data, AI, IoT, streaming and related tech, including updates to famous platforms, and some of the great conferences held.
This has been a busy year. So much has happened: so many releases, new products, and new beginnings. In 2018, Hortonworks and Cloudera announced they would merge, creating a new giant in Big Data. A number of projects, like Apache Apex, faded. Some, like Apache Kafka, became bigger than ever.
If you are not involved in IoT, AI, Streaming, K8s, or Hybrid Cloud, you will be. They are touching every company and every part of the Internet. The cost of edge computing is making it impossible to ignore. Sensor data is the new social media data. You have to have it. Now, how do you get it, store it, analyze it, query it, secure it, and report on it? Hadoop in the Hybrid Cloud, fed by Apache NiFi.
We started the year with extreme worries about GDPR, and those have spread to California and beyond. You need to know the where, why, what, how, and when of your data in order to remove it. Auditing is extremely important, which is why I'm thankful for Apache Ranger, Apache NiFi, Apache Hive ACID, and Apache Atlas.
Exciting Technology From 2018 for 2019 Developers
- Apache NiFi 1.8
- Apache Kafka 2.0
- Apache Spark 2.3
- Apache MXNet 1.3
- TensorFlow 1.12
- Apache Hadoop 3.1
- Hortonworks HDP 3.1
- Apache Ambari 2.7.3
- Apache Livy
- Apache HBase 2
- Apache Hive 3.1
- Superset 0.23.0
- Schema Registry 0.52.0
- SAM 0.6.0
- NiFi Registry 0.3.0
- Druid 0.12.1
- Apache Ranger 1.1
- Spring Boot
There was a lot of new technology and updates this year. Apache NiFi went through major upgrades that have made it a world-class tool for many streaming use cases.
Hadoop hit version 3.1 and included some amazing technology for running deep learning and dockerized workloads. The entire Hadoop ecosystem has moved forward to be cloud native and hybrid cloud first.
A quick list of my articles from 2018:
- IoT Edge Use Cases With Apache Kafka
- Real-Time Stock Processing With Apache NiFi and Apache Kafka, Part 1
- The Many Features of Apache MXNet GluonCV
- Simple Apache NiFi Operations Dashboard (Part 2): Spring Boot
- Building a Custom Apache NiFi Operations Dashboard (Part 1)
- Posting Images With Apache NiFi 1.7 and a Custom Processor
- Properties File Lookup Augmentation of Data Flow in Apache NiFi 1.7.x
- Using GluonCV 0.3 With Apache MXNet 1.3 and Apache NiFi 1.7
- IoT Edge Processing With Apache NiFi, MiniFi, and Multiple Deep Learning
- Monitoring and Managing Apache Kafka Clusters
- DevOps for Apache NiFi 1.7 and More
- The Summer of Big Data Innovation 2018
- Ingesting Data From Multiple IoT Devices With Apache NiFi 1.7 and MiniFi
- DataWorks Summit 2018 Apache NiFi Wrapup
- Ingest BTC.com and Blockchain.com Data via Apache NiFi
- Real-Time OCR and Hadoop Ingest
- DevOps Tips: Using the Apache NiFi Toolkit with Apache NiFi 1.6.0
- Accessing Feeds From EtherDelta on Trades, Funds, Buys, and Sells (Apache NiFi)
- Converting CSV Files to Apache Hive Tables With Apache ORC Files
- Streaming ETL Lookups With Apache NiFi and Apache HBase
- Enterprise and IIoT Edge Processing With Apache NiFi, MiNiFi, and Deep Learning
- Ingesting JMS Messages From TIBCO With Apache NiFi
- Big Data DevOps With Python
- Spring Boot 2.0 on ACID! Big Data + Spring Boot
- Scala vs. Python for Apache Spark
- Parsing Apache NiFi Records to HBase
- OpenCV + Apache MiniFi for IoT
- Ingesting Golden Gate Records From Apache Kafka and Automagically
Presentations I Gave in 2018:
It was an awesome global road trip spreading the word of Open Source Big Data, Streaming, IoT, and AI at scale.
I started April in Philly speaking on Deep Learning at the Edge for IoT at IoTFusion. A few days later in the Czech Republic I was speaking on Apache NiFi and Deep Learning in @FoD. A quick train trip later to Berlin and I was speaking at the pre-DWS meetup. Due to a speaker cancellation, I got to do two talks at the conference (as well as eat too many pretzels): one on IoT and one on Apache Deep Learning.
In June, I spoke at DataWorks Summit in San Jose on Computer Vision. That was a great event and I got to help out on the Deep Learning Crash Course with a TensorFlow notebook.
In August, after my summer road trip, I spoke in a webinar on IoT that you can check out. Later that month I joined the awesome Data Scientist, John Kuchmek, speaking at the NJ Shore.
As summer waned, I spoke at the very large Strata Data Conference in NYC on IoT and AI. Later that month I headed up to beautiful Montreal for a talk on Apache Deep Learning at ApacheCon.
In October, before Halloween, I spoke in Orlando at UAW.
Finally, in December, I did a Webex with my friends at AppOrchid on AI and Big Data.
The year is almost up, and I have a few articles coming on some new devices, new releases, and big updates. So stay tuned and expect a lot of new stuff for 2019. I have three AI cameras coming, the Pimoroni Breakout Garden, Particle Mesh, and many others.
Happy holidays and a great 2019!