It has been another exciting week on Hortonworks Community Connection (HCC). We continue to see great activity and recommend the following assets from last week.
Top Articles from HCC
- Hive on Tez vs PySpark for weblogs parsing by: bmathew
Synopsis: Both Pig and Spark (PySpark) excel at iterative data processing against weblogs data in text delimited format. Is one faster than the other?
- Stream data into HIVE like a Boss using NiFi HiveStreaming – Olympics 1896-2008 by: nshawa
An easy tutorial showing how you can stream data in CSV format directly into Hive tables and start working with it.
- Performance of Apache Spark on HDP/HDFS vs Spark on EMR/S3 by: bmathew
Which is faster when analyzing data using Spark 1.6.1: HDP with HDFS for storage, or EMR using S3 for storage?
- More Hadoop nodes = faster IO and processing time? by: smanjee
This article compares Hadoop performance across various IaaS providers in the hope of finding additional performance insights.
- How to install and run Spark 2.0 on HDP 2.5 Sandbox by: phargis
Top Questions from HCC
- Running a simple Hive query: Invalid distance too far back.
- Add new host into the cluster.
- Hadoop files list sorted by time.
- Is there any workaround to map CSV columns to Hive columns?
- Data transfer between two clusters.
Come join us on HCC and get your questions answered.