Protect Your Cloud Big Data Assets
I put together a minimum set of steps to get you started on protecting your Big Data assets in the cloud, focusing on Hadoop running on AWS IaaS.
Step 1: Do not put anything into the cloud unless you have a CISO, a Chief Security Architect, a Certified Cloud Administrator, Hadoop Certified Administrators, a Hadoop premier support contract, a security plan, a lawyer to defend you against the coming lawsuits, and a full understanding of your PII and private data, of Hadoop, and of your Hadoop architecture and layout.
Step 2: Study all running services in Ambari.
Step 3: Confirm and check all of your TCP/IP ports. Hadoop has a lot of them!
Step 4: If you are not using a service, do not run it.
Step 5: By default, disable all access to everything, always. Only open a port or grant access when something or someone critical cannot do its job without it.
Step 6: SSL, SSH, VPN and Encryption Everywhere.
Step 7: Run Knox! Set it up correctly.
Step 8: Run Kali and audit all your IPs and ports.
Step 9: Use Kali hacking tools to attempt to access all your web ports, shells and other access points.
Step 10: Run in a VPC.
Step 11: Set up security groups. Never open them to 0.0.0.0/0, to all ports, or to all IPs!
Step 12: If this seems too hard, don't run in the cloud.
Step 13 is unlucky, skip that one.
Step 14: Never run as root. Use long, difficult passwords and change them frequently. Use your own user names that are hard to guess. Change things often!
Step 15: Read all the recommended security documentation and use it. AWS, Apache and Hortonworks have published a great deal of excellent documentation.
Step 16: Kerberize everything.
Step 17: Run Metron to detect what is being attacked.
Step 18: People! Train them. If someone leaves, remove their access instantly: delete their accounts and privileges, and change all passwords and usernames.
My recommendation is to get a professional services contract with an experienced Hadoop organization, or to use a managed offering such as Microsoft HDInsight or HDC.
These are guidelines. You NEED to work with an experienced security team NOW.
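To make Steps 5 and 11 a bit more concrete, here is a minimal Python sketch that flags security group rules open to the whole internet. It assumes the input has the same shape as the "SecurityGroups" list returned by boto3's EC2 describe_security_groups call. The function name find_open_rules and the sample group below are made up for illustration. Treat it as a starting point for your own audit, not a complete one.

```python
import ipaddress


def find_open_rules(security_groups):
    """Return (group_id, protocol, from_port, to_port, cidr) for every
    inbound rule whose source range covers all addresses (prefix length 0)."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                # A /0 network (0.0.0.0/0 or ::/0) matches every address.
                if ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
                    findings.append((
                        group["GroupId"],
                        perm.get("IpProtocol"),
                        perm.get("FromPort"),  # absent when protocol is "-1" (all)
                        perm.get("ToPort"),
                        cidr,
                    ))
    return findings


# Hypothetical sample data; in practice this would come from your AWS account.
sample_groups = [{
    "GroupId": "sg-example1",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 50070, "ToPort": 50070,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []},
    ],
}]

for finding in find_open_rules(sample_groups):
    print("OPEN TO THE WORLD:", finding)
```

Here the Name Node Web UI rule is reported and the SSH rule restricted to 10.0.0.0/16 is not. An empty result does not mean you are safe. It only means this one mistake was not found.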
TCP/IP Ports Often Open in Hadoop Installations
50070 : Name Node Web UI
50470 : Name Node HTTPS Web UI
8020, 8022, 9000 : Name Node via HDFS
50075 : Data Node(s) Web UI
50475 : Data Node(s) HTTPS Web UI
50090 : Secondary Name Node
60000 : HBase Master
8080 : HBase REST
9090 : Thrift Server
50111 : WebHCat
8005 : Sqoop2
2181 : Zookeeper
9010 : Zookeeper JMX
Also often open: 50020, 50010, 50030, 8021, 50060, 51111, 9083, 10000, 60010, 60020, 60030, 2888, 3888, 8660, 8661, 8662, 8663, 8651, 3306, 80, 8085, 1004, 1006, 8485, 8480, 2049, 4242, 14000, 14001, 9290, 8032, 8030, 8031, 8033, 8088, 8040, 8042, 8041, 10020, 13562, 19888, 9095, 16000, 12000, 12001, 3181, 4181, 8019, 8888, 11000, 11001, 7077, 7078, 18080, 18081, 50100
There are more of these if you are also running your own visualization tools, other data websites, other tools, Oracle, SQL Server, mail, NiFi, Druid, etc.
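As a rough first pass at checking ports like the ones listed above (before you bring out Kali), here is a small Python sketch that tries a TCP connection to each port and reports the ones that answer but are not on your allow list. The names probe_ports, unexpected_ports and ALLOWED_PORTS, and the port selection, are my own for illustration. Only point it at hosts you own or are authorized to test.

```python
import socket

# A few of the ports from the list above; extend this to match your cluster.
HADOOP_PORTS = {
    50070: "Name Node Web UI",
    50470: "Name Node HTTPS Web UI",
    8020: "Name Node via HDFS",
    50075: "Data Node(s) Web UI",
    60000: "HBase Master",
    2181: "Zookeeper",
}

# Hypothetical allow list: the only ports you expect to be reachable.
ALLOWED_PORTS = {50470}


def probe_ports(host, ports, timeout=0.5):
    """Return the sorted list of ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return sorted(open_ports)


def unexpected_ports(open_ports, allowed):
    """Return the open ports that are not on the allow list, sorted."""
    return sorted(set(open_ports) - set(allowed))


if __name__ == "__main__":
    host = "127.0.0.1"  # replace with the address of a node you own
    found = probe_ports(host, HADOOP_PORTS)
    for port in unexpected_ports(found, ALLOWED_PORTS):
        print(f"{host}:{port} is reachable ({HADOOP_PORTS[port]}) but not allowed")
```

Run it from outside your VPC as well as from inside it. What matters most is what an attacker on the internet can reach, and a simple connect check like this will miss plenty that a real scanner finds.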
If it's hosted on AWS, it will be attacked. It is being attacked now (check your logs), and you could be compromised. Take security extremely seriously; your enemies do.