
This Week in Hadoop and More: Spark, TensorFlow, and JSoup

A recap of news from all over the world of big data including Hive, Spark, Flink, and NiFi.


Back from vacation and ready to rock.


A few interesting presentations from around the world of big data.

Hortonworks has a nice recorded webinar on IoT intelligence using geographically distributed sensors. There are also talks on using Apache NiFi with its smaller cousin Apache MiniFi, on enabling Kerberos security with Apache HBase, and on using Apache Zeppelin and Spark for enterprise data science. For deep learning fans, there's a tutorial on using TensorFlow.

Big Data Spain 2016 has released a lot of excellent presentation content.

Cool Tools

  • Record Query (GitHub) lets you read JSON, Avro, and other semi-structured formats. Written in Rust, it is a very interesting and useful command-line tool.

  • Trapezium (GitHub) from Verizon is a Spark/Scala/Akka framework for building batch, streaming, and API services to deploy machine learning models.

  • Here is a cool article on how to use Apache NiFi to Convert Rows to Columns in Text Files.

  • Here is an article I wrote on streaming data from a relational database to Hadoop as HBase/Phoenix and Hive/ORC tables and files.

What's Going on Today

I am building a robotic miniature car with a Raspberry Pi 3 B+, sensors, a camera, and WiFi. It will use Python to send MQTT messages to the cloud, which I will pull off with NiFi and land in Hadoop. That data will be graphed to track the car as it chases my robotic vacuum. I think this is what robots will watch for entertainment in the future. It's like NASCAR for robots.
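The Python side of that pipeline could be as simple as packaging each sensor reading as JSON and publishing it over MQTT. Here's a minimal sketch — the device ID, field names, topic, and broker host are my own placeholders, not details from the project:

```python
import json
import time

def build_payload(device_id, readings):
    """Wrap raw sensor readings in a JSON envelope with an ID and timestamp."""
    return json.dumps({
        "device": device_id,
        "ts": int(time.time()),
        "readings": readings,
    })

# Hypothetical readings from the car's sensors.
payload = build_payload("picar-1", {"speed_mps": 0.4, "heading_deg": 87})
print(payload)

# Publishing with the paho-mqtt client (pip install paho-mqtt) would then
# look roughly like this; NiFi's ConsumeMQTT processor can subscribe to
# the same topic on the other end:
#
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.connect("mqtt.example.com", 1883)
#   client.publish("picar/telemetry", payload)
```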

This Week's Bit of Java 8

Extracting links using JSoup, with an optional filter on file extension:

public List<PrintableLink> extract(String url, String type) {
   List<PrintableLink> linksReturned = new ArrayList<>();

   try {
      Document doc = Jsoup.connect(url).get();
      Elements links = doc.select("a[href]");

      for (Element link : links) {
         // attr() returns an empty string (never null) when the attribute is missing.
         String href = link.attr("abs:href");
         if (null != type) {
            // Only keep links whose URL ends with the requested type (e.g. ".pdf").
            if (!href.isEmpty() && href.endsWith(type)) {
               PrintableLink pLink = new PrintableLink();
               pLink.setLink(href);
               pLink.setDescr(trim(link.text(), 100));
               linksReturned.add(pLink);
            }
         } else {
            // No type filter: keep every link.
            PrintableLink pLink = new PrintableLink();
            pLink.setLink(href);
            pLink.setDescr(trim(link.text(), 100));
            linksReturned.add(pLink);
         }
      }
   } catch (Exception x) {
      x.printStackTrace();
   }
   return linksReturned;
}

Too bad Spark SQL wasn't still called Shark. 



Topics: big data, hadoop, spark

