How to Start a Big Data Practice
Big Data Center of Excellence (COE)
Consider setting up a Big Data Center of Excellence (COE) whose main objective is to take a holistic approach to the following two key aspects of Big Data, covering activities such as setting up a team, evaluating tools and frameworks, and running proofs of concept (POCs):
- Data processing
- Data analytics
Senior Management Commitment: One of the most important aspects of running a successful Big Data COE is senior management commitment (in the form of sponsorship) for at least 1 to 1.5 years, the time it takes for results to start showing. It is also important to hire a dedicated Big Data team of at least 2-3 staff with expertise in both data processing and data analytics.
Big Data vis-a-vis Business Domains: Another important point is the business domain in which you want to do POCs. The idea is to pick one or two of the domains below and identify data use cases around which to build one or more POCs. Some business domains (verticals) to consider:
- Online + Advertising
- Healthcare & Biotechnology
- Retail & eCommerce
- Industrial controls
Key aspects of Big Data COE: When setting up the COE, focus on the following key areas:
- Big Data Team
- Big Data Lab
- Proof-of-concepts (POCs)
These three aspects of the Big Data COE are discussed below.
Big Data Team
When setting up a Big Data team, make sure it balances staff with skill sets in the following areas:
- Data processing: Staff need to know (or be trained in) the following two areas:
- Open-source frameworks such as Hadoop, HBase, Hive, Pig, Solr, etc.
- Big Data platforms provided by different vendors
- Data analytics: The team needs staff with expertise in different aspects of data science. This is the tricky part. A great shortage of data scientists is predicted for the near future. At the same time, not everyone can simply decide to take up data science and become an expert at it, because the field requires a person to be strongly analytical and good at algorithms.
Of the two, data scientists are the harder to find: companies are generally able to build a team with expertise in the Hadoop stack (data processing), but are finding it increasingly difficult to hire data scientists.
Big Data Lab
Once the team is in place, it is equally important to set up a Big Data lab, consisting primarily of the following:
- Hardware/boxes with considerably more RAM than usual for Big Data processing. When we started with Big Data, we quickly hit a roadblock because our usual laptops had only 8 GB of RAM.
- Software consisting of the open-source technology stack and commercial Big Data platforms.
Big Data Proof-Of-Concepts (Case Studies)
Once you are set up with a Big Data team and lab, it is of utmost importance to identify a couple of proof-of-concept (POC) projects that you can showcase to potential customers. It is crucial to demonstrate to them that you have enough capability in Big Data processing and analytics to take on large projects.
Published at DZone with permission of Ajitesh Kumar, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.