10 Questions on Hortonworks Data Cloud for AWS
Hortonworks Data Cloud can give you enhanced control over your data in AWS. Here are some questions that came up during two webinars covering various use cases.
We recently concluded our well-attended "How to Get Started with Hortonworks Data Cloud for AWS" webinars. Thank you to Jeff Sposetti and Sean Roberts for hosting the sessions. The webinars provided an informative overview of the offering and included a detailed demonstration of how the product works.
Some great questions came up during the webinars and, as promised, here is a brief capture of that Q&A. If you didn't get a chance to watch the webinar, you can check out the replay here.
For more information and to get started with Hortonworks Data Cloud for AWS, you may also want to check out the following:
- Hortonworks Data Cloud for AWS Product Page.
- Hortonworks Data Cloud for AWS Documentation.
- Hortonworks Data Cloud for AWS on AWS Marketplace (with a 5-day free trial).
- Get $100 of AWS promotional credit for the AWS infrastructure.
Q&A FROM WEBINAR
1. Is the Cloud Controller Service self-sufficient, or do you need to purchase HDP Services to spin up an HDCloud for AWS Spark-Zeppelin cluster?
To launch Hortonworks Data Cloud for AWS, you must subscribe to the two AWS Marketplace products: Hortonworks Data Cloud – Controller Service (allows you to launch the cloud controller) and Hortonworks Data Cloud – HDP Services (allows the cloud controller to create HDP clusters).
2. Why do you need two AWS Marketplace products?
The Controller and the HDP Services products serve two different purposes, and their technology and included software are different. This means they each require a different image (i.e. "AMI") and different launch logic. Plus, pricing for the Controller differs from pricing for the HDP Services. Therefore, there are two different products in AWS Marketplace. We are working with the AWS Marketplace team to simplify the process of subscribing to two Marketplace products.
3. Does the Cloud Controller Service need to run permanently?
No. You only need to spin up one instance of the Controller, which allows you to spin up many clusters. Think of the Controller as the entry point to the product and central management console for your clusters.
We recommend that you spin up the Controller once and keep it running. This way, the clusters you launch from it can use different instance types, with flexible pricing and life cycles. In addition, you can run more than one cluster type at a time, and you can reuse configurations and templates between clusters, saving you time.
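To make the "one Controller, many clusters" relationship concrete, here is a toy model in Python. It is a minimal sketch, not the product's API: `Controller`, `ClusterTemplate`, and their fields and methods are hypothetical names chosen for illustration. It shows a template being saved once on a long-running controller and reused to launch and tear down several ephemeral clusters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterTemplate:
    # Hypothetical template fields, for illustration only.
    cluster_type: str
    instance_type: str
    worker_count: int


@dataclass
class Controller:
    """Toy model of one long-running controller managing many clusters."""

    templates: dict = field(default_factory=dict)  # template name -> ClusterTemplate
    clusters: dict = field(default_factory=dict)   # cluster name -> ClusterTemplate

    def save_template(self, name: str, template: ClusterTemplate) -> None:
        # Saved once on the controller, then reused for every cluster launched from it.
        self.templates[name] = template

    def launch(self, cluster_name: str, template_name: str) -> ClusterTemplate:
        if cluster_name in self.clusters:
            raise ValueError(f"cluster {cluster_name!r} already exists")
        template = self.templates[template_name]  # KeyError if the template is unknown
        self.clusters[cluster_name] = template
        return template

    def terminate(self, cluster_name: str) -> None:
        # Clusters come and go; the controller and its saved templates remain.
        del self.clusters[cluster_name]


if __name__ == "__main__":
    controller = Controller()
    controller.save_template("ds-small", ClusterTemplate("Data Science", "m4.xlarge", 3))
    controller.launch("exploration-1", "ds-small")
    controller.launch("exploration-2", "ds-small")
    controller.terminate("exploration-1")
    print(sorted(controller.clusters), sorted(controller.templates))
```

Keeping the templates on the controller rather than on each cluster is what lets a terminated cluster be recreated later without re-entering its configuration.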
4. Is there a cost associated with subscribing to each of the services, or do you pay only after you spin the services up?
The cost is associated with the usage of the software. If you just subscribe to the products but do not create any instance of the software, you are not charged. You only pay (software and AWS charges) while you run the software.
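The billing rule above can be expressed as simple arithmetic: charges accrue only for instance-hours actually run, covering both the software and the AWS infrastructure. The sketch below assumes hourly, per-instance billing; `InstanceGroup`, `estimate_charges`, and the example rates are hypothetical, and real prices depend on the Marketplace listing, instance type, and region.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    count: int    # number of running instances (controller or cluster nodes)
    hours: float  # how long those instances ran


def estimate_charges(groups, software_rate, infra_rate):
    """Rough estimate: instance-hours x (software rate + AWS infrastructure rate).

    Both rates are hypothetical per-instance-hour prices used for illustration.
    """
    instance_hours = sum(g.count * g.hours for g in groups)
    return instance_hours * (software_rate + infra_rate)


# Subscribed but nothing launched: zero instance-hours, so zero charges.
print(estimate_charges([], software_rate=0.10, infra_rate=0.50))

# One 5-node cluster running for 3 hours accrues 15 instance-hours.
print(estimate_charges([InstanceGroup(count=5, hours=3)], software_rate=0.10, infra_rate=0.50))
```

With no running instances the sum is empty, which is the "subscribe for free, pay as you run" behavior described above.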
5. Are there any limitations with these types of HDP clusters?
The product includes a set of prescriptive HDP cluster configurations for running the most common Apache Hadoop, Apache Spark, and Apache Hive workloads right out of the box, specifically for Data Science and Exploration, ETL and Data Preparation, and Analytics and Reporting use cases. In the future, we will look at adding other cluster types to the offering.
6. How is data transferred from S3 to HDCloud and vice versa? Is this a manual process?
Check out the product documentation for more information on using the product with S3.
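As one illustration (an assumption, not necessarily the documented HDCloud workflow), bulk copies between S3 and HDFS on a Hadoop cluster are commonly done with the standard `hadoop distcp` tool using `s3a://` URIs. The Python sketch below only builds the command line; `distcp_command` and the bucket and path names are hypothetical, and it assumes the cluster's S3A connector can already reach the bucket (for example, through IAM).

```python
import shlex


def distcp_command(src, dst, overwrite=False):
    """Build a `hadoop distcp` invocation that copies src to dst.

    Works in either direction, e.g. s3a:// -> hdfs:// or hdfs:// -> s3a://.
    """
    cmd = ["hadoop", "distcp"]
    if overwrite:
        cmd.append("-overwrite")  # replace files that already exist at the target
    cmd += [src, dst]
    return cmd


# S3 -> HDFS, then HDFS -> S3 (hypothetical bucket and paths).
print(shlex.join(distcp_command("s3a://my-bucket/raw/", "hdfs:///data/raw/")))
print(shlex.join(distcp_command("hdfs:///data/out/", "s3a://my-bucket/out/", overwrite=True)))
```

On a cluster node, the returned list could be passed to `subprocess.run(..., check=True)`; it is kept as a plain builder here so it can be inspected without Hadoop installed.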
7. How do you import your data into your cluster (into HDFS) from outside of AWS?
HDCloud provides a prescriptive deployment of HDP. Thus, any tools that you would use to import data into HDP can be used with HDCloud, such as standard Apache Hadoop tools, Apache NiFi, and other tools that have Hadoop clients built in. In addition, any tools that you have used to import data into S3 can be used.
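For the "standard Apache Hadoop tools" route, a typical pattern is to copy files onto a cluster node first and then load them with the `hdfs dfs` shell. The sketch below assumes that staging step has already happened; `hdfs_upload_commands` and the example paths are hypothetical, while `-mkdir -p`, `-put`, and `-put -f` are standard `hdfs dfs` options.

```python
import shlex


def hdfs_upload_commands(local_paths, hdfs_dir, overwrite=False):
    """Return `hdfs dfs` commands that create hdfs_dir and upload local files into it."""
    mkdir = ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]  # create parent directories as needed
    put = ["hdfs", "dfs", "-put"]
    if overwrite:
        put.append("-f")  # overwrite files that already exist in HDFS
    put += list(local_paths) + [hdfs_dir]
    return [mkdir, put]


# Hypothetical files already copied to a cluster node.
for cmd in hdfs_upload_commands(["sales.csv", "customers.csv"], "/data/landing"):
    print(shlex.join(cmd))
```

Building the commands as argument lists (rather than one shell string) avoids quoting problems with unusual file names if they are later run with `subprocess`.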
8. Could you configure multi-master clusters, in order to enable HA on some Hadoop services?
At this time, the offering is focused on ephemeral workloads, which do not require HA cluster configurations. We will look to add support for long-running workloads in the future.
9. Are the nodes provisioned in AWS on-demand nodes right now? Are there any plans to support Spot nodes?
The clusters that are currently spun up use on-demand instances. We plan to support Spot pricing in the future, so stay tuned.
10. Overall, why should we choose Hortonworks Data Cloud for AWS versus other cloud offerings?
There are several reasons, including:
- Powered by the Hortonworks Data Platform. HDP is 100% open source technology, and Hortonworks has the most committers to the underlying Apache projects.
- Optimized for running Apache Hive and Apache Spark workloads.
- Designed and optimized for AWS by integrating with services such as EC2, S3, RDS, and IAM.
Published at DZone with permission of Roni Fontaine, DZone MVB. See the original article here.