
This Week in Hadoop: NiFi, Sparkling Water, Ambari, and Spark

This week's round-up of interesting big data technologies, from Spark to NiFi, with some microservices thrown in for modern data application development.

By Tim Spann
Aug. 19, 16 · Big Data Zone · News

H2O has released a new version of Sparkling Water, 2.0. I found a few very cool articles on their blog: Spam Detection with ML Pipelines, and H2O TensorFlow on AWS GPU.

A cool article on clickbait clustering with Spark (GitHub, GitHub).

Incremental Fetch in Apache NiFi with QueryDatabaseTable

An awesome article on Real Architectural Patterns for Microservices by Camille Fournier. Camille is one of the most brilliant people I have had the pleasure of speaking with. This is a must-read.

Combining Agile and Spark: there's the interesting BDD-Spark library (GitHub).

Hortonworks has a number of interesting demos, labs, and training materials from their Introduction to Hadoop workshop:

  • Risk Analysis with Spark
  • Streaming Data into HDFS
  • Risk Analysis with Pig
  • Data Manipulation with Hive
  • Loading Data into HDFS

Cool Charting

Check out this article on data visualization with D3, DC, Leaflet, and Python.
(For more information, check out DC.js.)

Spring Boot Applications in Ambari

How to Bundle a Spring Boot Application as an Ambari Service (GitHub)

Scala / SBT Tip

My SBT wasn't building until I upped the memory.  Now this is in a shell script for all my builds:

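# JVM options for sbt (-XX:MaxPermSize only matters on Java 7 and earlier; Java 8+ ignores it with a warning)
export SBT_OPTS="-Xmx2G -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=2G -Xss2M  -Duser.timezone=GMT"
# -J passes a flag straight through to the JVM: build the assembly jar with a 4 GB heap
sbt -J-Xmx4G -J-Xms4G assembly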

Here is an example SBT build file (build.sbt) for Spark SQL with Stanford CoreNLP:

name := "Sentiment"

version := "1.0"

scalaVersion := "2.10.6"

assemblyJarName in assembly := "sentiment.jar"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.0" % "provided"
libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.5.1"
libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.5.1" classifier "models"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

You will also need project/assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
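
The file layout is easy to get wrong, so here is a minimal sketch of it in shell. The helper name add_assembly_plugin is hypothetical and only for illustration; the plugin line it writes is the one shown above.

```shell
# Hypothetical helper: write project/assembly.sbt into an sbt project directory.
add_assembly_plugin() {
  local project_dir="${1:-.}"
  # sbt picks up plugin declarations from .sbt files under project/
  mkdir -p "$project_dir/project"
  cat > "$project_dir/project/assembly.sbt" <<'EOF'
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
EOF
}
```

Run add_assembly_plugin . from the directory that holds build.sbt, then run the sbt assembly command from the tip above. With the build file shown here, the jar should land at target/scala-2.10/sentiment.jar, assuming the default sbt-assembly output directory.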

General Hadoop CLI Tips

  1. Keep an eye on logs! Check them, and make sure they rotate and that old ones are archived off to cold storage or deleted. Find the biggest files on your box (du -hsx * | sort -rh | head -10).
  2. If you want to see things you have run before, check out:
     /<user>/.beeline/history, /<user>/.hivehistory, /<user>/.sqlline/history, /<user>/.pig_history, /<user>/.spark_history.

     You can also run history to check on general commands you have run (remember that this returns the commands of whichever user you are currently logged in as, which may be root).
  3. What Java am I using, and are there others available? Run alternatives --display java.
  4. Sometimes your PATH may not be fully set, so you can miss out on great Java CLI tools like jps.
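
Tips 1, 2, and 4 above can be sketched as small shell helpers. This is only an illustration: the function names are hypothetical, the history file names are the ones listed in tip 2 looked up relative to a home directory, and the PATH fix assumes JAVA_HOME points at a JDK (jps ships in the JDK's bin directory).

```shell
# Tip 1: show the N largest entries directly under a directory
# (defaults: current directory, top 10).
biggest() {
  local dir="${1:-.}" count="${2:-10}"
  # -s summarizes each entry, -x stays on one filesystem,
  # -k reports kilobytes so a plain numeric sort is enough
  du -skx "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Tip 2: print whichever of the tool history files exist under a home directory.
list_histories() {
  local home="${1:-$HOME}" f
  for f in .beeline/history .hivehistory .sqlline/history .pig_history .spark_history; do
    if [ -f "$home/$f" ]; then
      printf '%s\n' "$home/$f"
    fi
  done
}

# Tip 4: if jps is not on the PATH but JAVA_HOME has it, prepend the JDK bin directory.
ensure_jdk_on_path() {
  if ! command -v jps >/dev/null 2>&1 \
     && [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/jps" ]; then
    PATH="$JAVA_HOME/bin:$PATH"
    export PATH
  fi
}
```

For example, biggest /var/log 5 lists the five largest entries under /var/log in kilobytes, and list_histories with no argument checks your own home directory.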