Over the holidays I've been reading a number of cool articles. There's a fun one on Akka, and the list goes on. Pick a real-time streaming data processing framework or tool. I dare you. It's hard to complain about being blessed with so many awesome open source projects, but how do you pick one to focus on? Your underlying vendors can help, as some have their favorites and support some tools more than others. If you use Hortonworks, then Apache Storm seems like a good bet.
One thing I want to do is connect my Raspberry Pi to a Theremin, which I may get for Christmas. I'm hoping to program it to react to various stimuli such as temperature, stock price changes, Twitter sentiment and others. I would run this off either Spring XD or a small Spark cluster.
Maybe I can finally decide what file format I should use for HDFS. Is it Parquet? Avro? Or one of the other standard formats like RCFile, CSV, ORCFile, or something else? When and where do I use Snappy? Compressing everything is an interesting thought, but isn't the up-front compression time a big deal, especially if you want to do some kind of in-memory or fast processing? You really have to know your use case: if you only access data once, it may not be worth it.
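To make the "only access it once" point a bit more concrete, here's a back-of-the-envelope break-even sketch in Python. It assumes a deliberately simple model: compression costs CPU time once at write time, and every later read trades disk I/O on fewer bytes for decompression time. The `CodecProfile` class, the function names and all the throughput numbers are hypothetical placeholders I made up for illustration, not Snappy benchmarks, and the model ignores the smaller write, storage savings and parallelism.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodecProfile:
    """Hypothetical codec throughput figures (MB/s); illustrative only."""
    compress_mb_s: float
    decompress_mb_s: float
    ratio: float  # raw size / compressed size


def seconds_per_read(raw_mb: float, disk_mb_s: float,
                     codec: Optional[CodecProfile]) -> float:
    """Time to read the data once: disk I/O, plus decompression if compressed."""
    if codec is None:
        return raw_mb / disk_mb_s
    compressed_mb = raw_mb / codec.ratio
    return compressed_mb / disk_mb_s + raw_mb / codec.decompress_mb_s


def break_even_reads(raw_mb: float, disk_mb_s: float,
                     codec: CodecProfile) -> float:
    """Reads needed before the one-time compression cost pays for itself.

    Assumption: only the extra CPU time at write is counted (the smaller
    write itself is ignored). Returns math.inf if a compressed read is
    never faster than a raw read.
    """
    compress_cost = raw_mb / codec.compress_mb_s
    saving_per_read = (seconds_per_read(raw_mb, disk_mb_s, None)
                       - seconds_per_read(raw_mb, disk_mb_s, codec))
    if saving_per_read <= 0:
        return math.inf
    return math.ceil(compress_cost / saving_per_read)


if __name__ == "__main__":
    codec = CodecProfile(compress_mb_s=250, decompress_mb_s=500, ratio=2.0)
    # 1 GB of data on a slowish disk path (hypothetical 100 MB/s)
    print(break_even_reads(1024, 100, codec))   # 2
    # Same data on a fast, memory-like path (hypothetical 2000 MB/s)
    print(break_even_reads(1024, 2000, codec))  # inf
```

With the slow path the codec pays for itself by the second read; with the fast path, decompression costs more than the I/O it saves, so it never pays off, which is exactly the in-memory worry. Plug in your own measured numbers before drawing any real conclusions.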
I have a few new books to read, thanks to great deals from various booksellers, plus a number of free ones from mailing lists. Thank you, tech media!