
How to Start a Big Data Practice



This article covers key aspects of starting a Big Data practice in your organization. I have recently started working in this area, and this post is the result of my research. I hope you find it useful. Please feel free to comment or suggest additions if I have missed any important points.

Big Data Center of Excellence (COE)

It may be a good idea to set up a Big Data Center of Excellence (COE) whose main objective is to take a holistic approach to the following two key aspects of Big Data, covering activities such as setting up a team, evaluating tools and frameworks, and doing proofs of concept (POCs):

  • Data processing
  • Data analytics

Senior Management Commitment: One of the most important aspects of running a successful Big Data COE is senior management commitment (in the form of sponsorship) for a minimum of 1-1.5 years, so that results have time to start showing. It is also important to hire a dedicated team of at least 2-3 staff with expertise in both data processing and data analytics.

Big Data vis-a-vis Business Domains: Another important point is the business domain in which you want to do POCs. The idea is to pick one or two of the following and identify data use cases around which to do one or more POCs. Following are some of the business domains (verticals) for your consideration:

  • Online + Advertising
  • Telecommunications
  • Financial
  • Insurance
  • Healthcare & Biotechnology
  • Automotive
  • Retail & eCommerce
  • Industrial controls

Key aspects of Big Data COE: As part of setting up the COE, the following are the key areas to focus on:

  • Big Data Team
  • Big Data Lab
  • Proofs of concept (POCs)

Each of these three aspects of the Big Data COE is discussed below.

Big Data Team

When setting up a Big Data team, make sure it has a balance of staff with skill sets in the following areas:

  • Data processing: For data processing, the staff need to know (or be trained in) the following two areas:
    • Open-source frameworks such as Hadoop, HBase, Hive, Pig, Solr, etc.
    • Big Data platforms provided by different vendors
  • Data analytics: The team needs to include staff with expertise in different aspects of data science. This is quite a tricky one. A great shortage of data scientists is predicted for the near future. At the same time, not everyone can simply decide to take up data science and become an expert at it, because several aspects of data science require a person to be highly analytical and good at algorithms.

Of these two, data scientists are becoming the harder hire: companies find it difficult to recruit them, although they are able to build a team with expertise in the Hadoop stack (data processing).

Big Data Lab

Once the team is taken care of, it is equally important to set up a Big Data lab, which would consist primarily of the following:

  • Hardware/boxes with more RAM than usual for Big Data processing. When we started with Big Data, we soon hit a roadblock because our usual laptops had only 8GB of RAM.
  • Software consisting of the open-source technology stack and commercial Big Data platforms.

Big Data Proof-Of-Concepts (Case Studies)

Once you are set up with a Big Data team and lab, it is of utmost importance to identify a couple of proof-of-concept (POC) projects that you can showcase to potential customers. This is crucial for demonstrating to a potential customer that you have enough capability in Big Data processing and analytics to take on large projects.

Published at DZone with permission of Ajitesh Kumar, DZone MVB. See the original article here.
