
This Week in Hadoop and More: Spark, TensorFlow, and JSoup

A recap of news from all over the world of big data including Hive, Spark, Flink, and NiFi.


Back from vacation and ready to rock.


A few interesting presentations from around the world of big data.

Hortonworks has a nice recorded webinar on IoT intelligence using geographically distributed sensors. There are also talks on using Apache NiFi with its smaller cousin Apache MiNiFi, on enabling Kerberos security with Apache HBase, and on using Apache Zeppelin and Spark for enterprise data science. For deep learning fans, there's a tutorial on using TensorFlow.

Big Data Spain 2016 has released a lot of excellent presentation content:

Cool Tools

  • Record Query (GitHub) allows you to read JSON, Avro, and other semi-structured formats. Written in Rust, it's a very interesting and useful command-line tool.

  • Trapezium (GitHub) from Verizon is a Spark/Scala/Akka framework for building batch, streaming, and API services to deploy machine learning models.

  • Here is a cool article on how to use Apache NiFi to Convert Rows to Columns in Text Files.

  • Here's an article I wrote on streaming data from a relational database into Hadoop as HBase/Phoenix and Hive/ORC tables and files.

What's Going on Today

I am building a robotic miniature car with a Raspberry Pi 3 B+, sensors, a camera, and WiFi. It will use Python to send MQTT messages to the cloud, which I will pull off with NiFi and land in Hadoop. That data will be graphed to track the car as it chases my robotic vacuum. I think this is what robots will watch for entertainment in the future. It's like NASCAR for robots.

This Week's Bit of Java 8

Extracting links using JSoup:

public List<PrintableLink> extract(String url, String type) {
   List<PrintableLink> linksReturned = new ArrayList<>();

   try {
      Document doc = Jsoup.connect(url).get();
      Elements links = doc.select("a[href]");

      for (Element link : links) {
         String href = link.attr("abs:href");

         // When a type (e.g. ".pdf") is given, keep only links ending with it.
         if (type != null && !href.endsWith(type)) {
            continue;
         }

         PrintableLink pLink = new PrintableLink();
         pLink.setLink(href);
         pLink.setDescr(trim(link.text(), 100));
         linksReturned.add(pLink);
      }
   } catch (Exception x) {
      // Connection or parse failure: return whatever was collected so far.
   }

   return linksReturned;
}
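The heart of the method is the suffix test on each absolute href. As a sanity check, the same filtering can be sketched with plain Java streams and no JSoup at all (a hypothetical standalone helper, not part of the extractor above):

```java
import java.util.List;
import java.util.stream.Collectors;

public class SuffixFilter {

    // Keep only URLs that end with the given extension; a null type keeps everything.
    static List<String> filterBySuffix(List<String> urls, String type) {
        return urls.stream()
                   .filter(u -> type == null || u.endsWith(type))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "https://example.com/report.pdf",
            "https://example.com/index.html",
            "https://example.com/data.pdf");

        // Filtering for ".pdf" keeps only the two PDF links.
        System.out.println(filterBySuffix(urls, ".pdf"));
        // A null type means "no filter": all three come back.
        System.out.println(filterBySuffix(urls, null).size());
    }
}
```

The null check mirrors the JSoup version: passing no type returns every link on the page, while passing an extension restricts the results to matching documents.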

Too bad Spark SQL is no longer called Shark.



Opinions expressed by DZone contributors are their own.
