Protect Your Cloud Big Data Assets


To protect your cloud-hosted Big Data, you need to start at the perimeter and follow cloud security best practices.


I have put together a set of minimum steps to get you started on protecting your Big Data assets in the cloud, focusing on Hadoop running on AWS IaaS.

Step 1: Do not put anything into the cloud unless you have a CISO, a Chief Security Architect, certified cloud administrators, a full understanding of your PII and private data, a lawyer to defend you against the coming lawsuits, a full understanding of Hadoop, Hadoop-certified administrators, a Hadoop premier support contract, a security plan, and a full understanding of your Hadoop architecture and layout.

Step 2: Study all running services in Ambari.

Step 3: Confirm and check all of your TCP/IP ports. Hadoop has a lot of them!
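A real audit should use a proper scanner, but as a quick sketch you can probe ports from Python's standard library (the host and the sample port list below are assumptions; substitute your own nodes, and only scan machines you are authorized to test):

```python
import socket

# A few common Hadoop ports to probe (see the full port list later in this article).
HADOOP_PORTS = [8020, 50070, 50075, 8080, 2181, 9090]

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    host = "127.0.0.1"  # assumption: probing your own node, with permission
    for port in HADOOP_PORTS:
        state = "OPEN" if is_port_open(host, port) else "closed"
        print(f"{host}:{port} {state}")
```

Anything reported open that you cannot explain is your next investigation.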

Step 4: If you are not using a service, do not run it.

Step 5: By default, deny all access to everything, always. Only open a port or grant access when someone or something critical genuinely needs it.

Step 6: Use SSL, SSH, VPNs, and encryption everywhere.
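When your own client code talks to these services over TLS, make certificate verification non-negotiable. A minimal sketch using Python's standard library (the hostname in the comment is a placeholder):

```python
import ssl

# A properly hardened client context: it verifies the server's certificate
# chain and checks the hostname. Never disable either of these.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True

# Wrap an ordinary socket before speaking to the service, e.g.:
#   with socket.create_connection(("namenode.example.com", 50470)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="namenode.example.com") as tls:
#           ...
```

If a service's certificate fails this check, fix the certificate; do not weaken the client.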

Step 7: Run Knox! Set it up correctly.
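A Knox gateway is driven by a topology file. The sketch below is illustrative only; the hostnames, LDAP settings, cluster name, and realm class are assumptions, so consult the Apache Knox documentation for your version before deploying:

```xml
<!-- conf/topologies/sandbox.xml (illustrative sketch, not production-ready) -->
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <param>
        <name>main.ldapRealm</name>
        <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
      </param>
      <param>
        <name>main.ldapRealm.userDnTemplate</name>
        <value>uid={0},ou=people,dc=example,dc=com</value>
      </param>
      <param>
        <name>main.ldapRealm.contextFactory.url</name>
        <value>ldap://ldap.example.com:389</value>
      </param>
    </provider>
    <provider>
      <role>authorization</role>
      <name>AclsAuthz</name>
      <enabled>true</enabled>
    </provider>
  </gateway>
  <!-- Expose only the services you actually need through the gateway: -->
  <service>
    <role>WEBHDFS</role>
    <url>http://namenode.example.com:50070/webhdfs</url>
  </service>
</topology>
```

The point of Knox is that clients hit the gateway over HTTPS with real authentication, instead of hitting raw cluster ports directly.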

Step 8: Run Kali and audit all your IPs and ports.

Step 9: Use Kali hacking tools to attempt to access all your web ports, shells and other access points.

Step 10: Run in a VPC.

Step 11: Set up security groups. Never open all ports, and never open anything to all IPs!
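As a sanity check, you can lint the output of `aws ec2 describe-security-groups` for world-open rules. This sketch assumes the standard JSON shape of that command's output; the group ID and rules below are made up for illustration:

```python
def world_open_rules(security_group):
    """Return (port_range, cidr) pairs that are open to the entire internet."""
    bad = []
    for perm in security_group.get("IpPermissions", []):
        ports = (perm.get("FromPort", "all"), perm.get("ToPort", "all"))
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                bad.append((ports, ip_range["CidrIp"]))
    return bad

# Example: a group that exposes the NameNode web UI to the world.
sg = {
    "GroupId": "sg-0123456789abcdef0",  # hypothetical
    "IpPermissions": [
        {"FromPort": 50070, "ToPort": 50070, "IpProtocol": "tcp",
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},      # flagged: open to the world
        {"FromPort": 22, "ToPort": 22, "IpProtocol": "tcp",
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},    # fine: internal only
    ],
}
print(world_open_rules(sg))
```

Run something like this across every security group in the account, not just the ones you think the cluster uses.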

Step 12: If this seems too hard, don't run in the cloud.

Step 13 is unlucky, skip that one.

Step 14: Never run as root; use long, difficult passwords that are changed frequently; and choose your own usernames that are hard to guess. Change things often!
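For the "long, difficult passwords" part, generate them rather than inventing them. A minimal sketch with Python's standard library (the length is an arbitrary choice; adjust to your policy):

```python
import secrets
import string

def generate_password(length=24):
    """Generate a random password from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())
```

`secrets` uses a cryptographically secure random source, unlike the `random` module, which should never be used for credentials.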

Step 15: Read all the recommended security documentation and use it.  AWS, Apache and Hortonworks have published a great deal of excellent documentation.

Step 16: Kerberize everything.
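Kerberizing everything starts with flipping Hadoop out of its default "simple" (no authentication) mode in core-site.xml. The fragment below shows only that switch; the rest of a Kerberos rollout (KDC, principals, keytabs, per-service settings) is substantial, so follow your distribution's Kerberos guide:

```xml
<!-- core-site.xml -->
<property>
  <name>hadoop.security.authentication</name>
  <!-- The default is "simple", which means no authentication at all. -->
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

Tools like Ambari can automate much of the rest of the Kerberization for you.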

Step 17: Run Metron to detect what is being attacked.

Step 18: People! Train them, and when someone leaves, remove their access instantly: delete their accounts and privileges, and change all shared passwords and usernames.

My recommendation is to get a professional services contract with an experienced Hadoop organization, or to use a managed offering such as Microsoft HDInsight or HDC.

These are guidelines; you NEED to work with an experienced security team NOW.

TCP/IP Ports Often Open in Hadoop Installations

50070: NameNode Web UI
50470: NameNode HTTPS Web UI
8020, 8022, 9000: NameNode via HDFS
50075: DataNode(s) Web UI
50475: DataNode(s) HTTPS Web UI
50090: Secondary NameNode
60000: HBase Master
8080: HBase REST
9090: Thrift Server
50111: WebHCat
8005: Sqoop2
2181: ZooKeeper
9010: ZooKeeper JMX
10000, 60010, 60020, 60030, 2888, 3888, 8660, 8661, 8662, 8663, 8651, 3306, 80, 8085, 1004, 1006, 8485, 8480, 2049, 4242, 14000, 14001, 8021, 9290, 50060, 8032, 8030, 8031, 8033, 8088, 8040, 8042, 8041, 10020, 13562, 19888, 9095, 9083, 16000, 12000, 12001, 3181, 4181, 8019, 8888, 11000, 11001, 7077, 7078, 18080, 18081, 50100

There are more of these if you are also running your own visualization tools, other data websites, other tools, Oracle, SQL Server, mail, NiFi, Druid, etc.

If it's hosted on AWS, it will be attacked; it is being attacked right now (check your logs), and you could be compromised. Take security extremely seriously; your enemies do.


  • My Talk on Hadoop Perimeter Security
  • Setting up Apache Knox
  • AWS Security Article
  • Using Network Security on AWS
  • Best Practices for Hardening EC2 Instances
  • Tactical EC2 Security
  • Hortonworks Security and Governance
  • Apache Metron