
Data Warehousing, NoSQL, and the Cloud



With the advent of NoSQL, cloud computing, and slick new databases, we seem to have forgotten from whence we came. I recently went to a conference on the open source search product Solr/Lucene. One of the keynote speakers, the Chief Data Scientist of Hortonworks, discussed what turned him to NoSQL databases: in his case, a failed project to track every click on walmart.com in Oracle.

For all its idiosyncrasies and irritations, Oracle (the database) is an incredibly powerful and versatile product, with a power most projects do not fully use. Hortonworks appears to be trying to follow the same path as Oracle (the company): from consulting company, to product, to vast riches. Even though many projects do not tap the full feature set or power of Oracle, it is still preferred in some companies for the supposed safety of an expensive support contract. This is probably true of SQL Server as well, but I’m less experienced in that area. In the same way, I doubt that many who use NoSQL solutions fully realize the power of the tools available, or the right place to use them.

I think it’s worth looking at how we got here. Oracle has traditionally sold single-purpose, high-powered, and very expensive machines, a poor fit for a scrappy web startup. The variety of configuration and installation options is overwhelming, to the point that they sell pre-built HP boxes with Oracle and the OS configured for you.

Through a maze of acquisitions, Oracle likely owns a company that can meet any need, if only you can figure out where to look on their website. When I last talked to their sales reps about a non-database product, their preferred pricing model was revenue sharing, which to me sounds like a terrible proposal unless you own a company that exists to lose money.

If you pay enough, Oracle will assign someone to fix your problems. When I last worked on a data warehouse, I typically found a database defect every other week, some with patches available and some without. We were running a “small” data warehousing system, recording a couple hundred million records. Other companies in the city were well known to have larger databases, for various purposes. Had I wished for a different rendition of this project, I could well have moved to a payroll company, a grocery chain, or a computer manufacturer.

The challenge of building such a system is not unique to Oracle. Tuning queries on Postgres, in my experience, typically yields a performance improvement of two orders of magnitude, versus one in Oracle. This appears to be because Postgres, while generally a solid, cleanly designed product, lacks numerous micro-optimizations.



Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
