The Skills That Data Analysts Need to Master
If you're here to learn more about the data profession, read on for an overview of the tech you must know to have success in the field.
1. The first is Excel. This seems very simple, but, in fact, it's not. Excel can handle not only simple two-dimensional tables and complex nested tables, but also line charts, column charts, bar charts, area charts, pie charts, radar charts, combo charts, and scatter charts.
2. Master SQL statements on SQL Server or Oracle. Even if you are a business analyst who can rely on the IT department and IT tools (such as a multidimensional BI analysis model), sometimes you can't get the data you want. Learning to write nested SQL statements that use JOIN, GROUP BY, ORDER BY, DISTINCT, SUM, COUNT, AVG, and other statistical functions can be very helpful.
3. Master visualization and BI tools, such as Cognos, Tableau, and FineBI. Which one to learn depends on what your enterprise uses; I used to use FineBI. Visualization with these tools is very convenient, especially when the analysis report can include the resulting charts. These skills tend to attract the attention of senior leaders, because good charts let them grasp the essence of the business at a glance. In addition, as a professional analyst, you can use a multidimensional analysis model (a cube) to customize reports easily and efficiently.
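To make the SQL point concrete, here is a minimal sketch of a nested query that combines a join, grouping, ordering, and aggregate functions. It uses Python's built-in sqlite3 module instead of SQL Server or Oracle so that it runs anywhere; the tables, columns, and values are made up for illustration.

```python
import sqlite3

# In-memory database with two small, made-up tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES
        (1, 1, 100.0), (2, 1, 50.0), (3, 2, 200.0), (4, 3, 25.0);
""")

# Per-region statistics, restricted to orders above the overall average amount.
query = """
    SELECT c.region,
           COUNT(DISTINCT o.customer_id) AS customers,
           SUM(o.amount)                 AS total,
           AVG(o.amount)                 AS average
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount > (SELECT AVG(amount) FROM orders)  -- nested subquery
    GROUP BY c.region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, customers, total, average in rows:
    print(region, customers, total, average)
```

The same SELECT statement should carry over to SQL Server or Oracle with little or no change, although each database has its own dialect details.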
Summary: At this point, if you've mastered 80% of the above skills, you can be considered a qualified analyst. Data analysts at this stage need to know how to use tools to process data, understand business scenarios, and analyze and solve basic problems. It is still worth emphasizing that the most important thing for a data analyst to be familiar with is the business. When you know the business, your analytical logic is clear and you can rule out most useless analyses. For the businesses I have understood for a long time, I can spot the problem as soon as I compare the data.
After that, if you want to drill deeper into the technology, you can continue developing your career in the direction of data science.
The Advanced Stuff
1. Learn Statistics Systematically
Pure machine learning emphasizes the predictive power and implementation of algorithms, but statistics has always emphasized "interpretability." For example, I can check whether two stocks are correlated. If two stocks have historically been inversely correlated and one of them goes down, then, according to the data, we can expect that the other is likely to go up.
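The stock example can be sketched with the Pearson correlation coefficient, which ranges from -1 (perfectly inverse) to 1 (perfectly aligned). The two return series below are made-up numbers for two hypothetical stocks, not real market data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up daily returns (in percent) for two hypothetical stocks.
stock_a = [1.0, -2.0, 0.5, 3.0, -1.5]
stock_b = [-0.8, 1.9, -0.4, -2.7, 1.6]

r = pearson(stock_a, stock_b)
print(f"correlation: {r:.3f}")  # a value close to -1 means strongly inverse
```

A coefficient near -1 only describes how the two series moved together in the sample; it is evidence for an expectation, not a guarantee about future prices.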
Statistical methods related to data mining (multivariate logistic regression analysis, nonlinear regression analysis, discriminant analysis, etc.).
Quantitative methods (time series analysis, probability models, optimization).
Decision analysis (multi-objective decision analysis, decision trees, influence diagrams, sensitivity analysis).
Establish an analysis of competitive advantage (learning basic analytical concepts through projects and success stories).
Database entry (data model, database design).
Predictive analysis (time series analysis, principal component analysis, nonparametric regression, statistical process control).
Data Management (ETL (Extract, Transform, Load), Data Governance, Management Responsibility, Metadata).
Optimization and heuristics (integer programming, nonlinear programming, local search, metaheuristics such as simulated annealing and genetic algorithms).
Big data analysis (learning of unstructured data concepts, MapReduce technology, big data analysis methods).
Data mining (clustering (k-means method, segmentation method), association rules, factor analysis, survival analysis).
Computer Simulation of Risk Analysis and Operational Analysis.
Software-level analytics (analytical topics at the organizational level, IT and business users, change management, data topics, presentation, and communication).
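As one concrete instance from the list above, the k-means method can be sketched in a few lines of plain Python. This is a minimal illustration of the standard assign-then-update loop (Lloyd's algorithm), assuming 2-D points and caller-supplied starting centroids; the data is made up, and in practice you would more likely use a library implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2-D points around the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visually separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
```

The starting centroids are passed in explicitly to keep the sketch deterministic; real implementations usually choose them randomly or with a seeding heuristic.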
2. Master Machine Learning Algorithms and Build Models With Tools Such as Python and R
Traditional BI analysis can answer what happened in the past and what is happening now, but what will happen in the future? For that, we must rely on algorithms. While self-service BI tools like Tableau and FineBI have some analysis models built in, analysts who want a more comprehensive and deeper exploration need data mining tools like Python and R. In addition, the hidden relationships within big data sets cannot be uncovered by manual analysis or with traditional tools. This is where algorithms come in, and they can surface findings you would not otherwise see.
Among these tools, R, an open source programming language and environment for statistical analysis, has attracted much attention. The strength of R is not only its rich statistical analysis library but also its high-quality chart generation, which visualizes results and can be run with simple commands. In addition, extension packages from CRAN (the Comprehensive R Archive Network) add functions and data sets that are not available in the standard installation. Although the R language is powerful, the learning curve is steep. Personally, I'd recommend you start with Python, which has a wealth of statistical libraries, such as NumPy, SciPy, pandas (the Python Data Analysis Library), and Matplotlib.
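As a small illustration of using an algorithm to look forward, the sketch below fits a straight line to past values by ordinary least squares and extrapolates one step ahead. The monthly sales figures are invented, and a linear trend is only an assumption made for the example; real forecasting usually needs a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up monthly sales for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [102.0, 108.0, 115.0, 119.0, 127.0, 131.0]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept  # extrapolate the fitted trend
print(f"forecast for month 7: {forecast_month_7:.1f}")
```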
The final stage of a data career revolves around data strategy. For example, data strategists can use IT knowledge and experience to make business decisions. Data scientists can use IT to develop complex models and algorithms. Analytical consultants can combine practical business knowledge and analytical experience to identify their industry's next growth area.
Therefore, you need communication, organization, and management skills, as well as business thinking. This is not limited to a particular position. You need to think from a higher vantage point and look for ways to benefit the company. At the same time, you should think about how to make data analysis count within the company and how to use data to drive business operations.
Course and Book Recommendation
Learn basic probability first. A great resource is Introduction to Probability and Statistics.
Quickly get a sense of the terms used in statistical learning and what each method is for. For this, read: Wasserman, Larry. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
To learn the basic ideas of statistical learning, An Introduction to Statistical Learning is a popular textbook.
Learn basic algorithms and algorithm analysis, and know how to analyze an algorithm's complexity, both average-case and worst-case. Each time you write a program, you should be able to anticipate its running time from that analysis. I recommend Princeton's algorithms course on Coursera.org.
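To see why complexity analysis lets you anticipate running time, the following sketch counts how many elements a linear scan and a binary search examine on the same sorted list. The function names and data are made up for this illustration; the point is the gap between O(n) and O(log n) growth.

```python
def linear_search_steps(items, target):
    """Return the number of elements examined by a linear scan."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(items, target):
    """Return the number of probes made by binary search on a sorted list."""
    low, high, steps = 0, len(items) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return steps

data = list(range(1_000_000))  # already sorted
target = data[-1]              # last element: the worst case for a linear scan

linear_steps = linear_search_steps(data, target)
binary_steps = binary_search_steps(data, target)
print(linear_steps, binary_steps)
```

On a million elements, the linear scan examines every element in its worst case, while binary search needs at most about log2(n), roughly 20, probes.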
Often, great data scientists have blogs for everyone to visit. I recommend these blogs, which I often read:
Opinions expressed by DZone contributors are their own.