Building a Cross-Functional Team When ML-as-a-Service Won't Work
Machine-Learning-as-a-Service doesn't always work out. Sometimes, the best solution is to hire an in-house data scientist.
In our previous article on data science industry perspectives in the cloud, we discussed why evolution is key if you plan to grow your business: you can start with ready-made solutions and then, in time, switch to in-house ones built with the help of a team of data scientists.
This time around, we'll talk about cases where ML-as-a-Service doesn't work and what to do instead. When this happens, your company shouldn't start by hiring a data scientist. Instead, the best option is to invest in a custom-made solution that addresses your urgent business needs. Only once you have a workable solution should you dive deep into data science and build a proper team that can deliver an in-house data science solution.
Know when you don’t need to hire data scientists and when you need to start investing in data science resources.
Start building a comprehensive data science product not from research but from an end-to-end solution that solves business problems.
People envisage a data scientist who has a balanced knowledge of related subject areas, but in real life, you can rarely find such an ideal candidate.
Make your data scientists successful and productive. Any team that deals with data science has to be cross-functional with adjacent roles contributing to the end solution.
Deliver end-to-end solutions that solve business problems rather than delving into research papers.
Custom-Made End-to-End Solution as a Start
Here, we're talking about classical data science: you have data and goals, and you need to build models to solve a pressing issue. The best way to do this is to jumpstart the process by stitching bits and pieces of ready-made services into a single workable product and showing your customer a clear-cut result quickly. This can be done without any complex or global research; you can comfortably formulate specifications that take into account all the feedback from your customer and build a more sophisticated data science product in the long run.
One of the biggest issues for any data science project is formulating its specifications. The usual request is something like, "Create something for my company using data science. Analyze data for me." This type of job involves a lot of trial and error, and having a basic custom end-to-end solution from the start lets you plug in the extra services you need on the go. That makes it much easier to feed insights and predictions into a specific business workflow.
I believe you should build your system from the ground up: not from data science research, but from the point of view of an end-to-end solution. Then, bit by bit, you can swap the required services in and out as the process matures.
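As an illustration of this end-to-end-first approach, here is a minimal sketch, assuming a tabular classification problem and using off-the-shelf scikit-learn components as the "ready-made services." The synthetic dataset and the `predict` entry point are hypothetical stand-ins for a real customer data extract and product integration.

```python
# A minimal end-to-end baseline: ready-made components stitched into
# one workable product (data, model, and a single predict entry point).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the customer's data; in practice this is a real extract.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One pipeline object is one deployable artifact, not a pile of scripts.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

def predict(features):
    """The single entry point the rest of the product calls."""
    return int(model.predict([features])[0])
```

Even this skeleton gives the customer something demonstrable; every later research effort then has a concrete slot to plug into, instead of living in a notebook.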
How to Do Data Science Research
Start by employing a data scientist who matches your company's needs. Going by the standard data scientist skills chart above, this employee should have a firm footing in the subject domain, math, and programming. An understanding of the topical area is key here because we are solving business issues; a data scientist who solves more academic problems will be more focused on winning a Kaggle competition than on addressing actual business needs. It is also important that this person understands the product development cycle, so that they build models and analyze data in a way that can be deployed to production.
For example, if a person uses the R language, note that R is aimed more at research and is not production-ready; the results of such research cannot be deployed in an end-to-end solution directly. Such a researcher should take note of this and pair up with a programmer. Although the diagram above suggests that a data scientist should have hacking skills, in reality this is not strictly necessary: the vast majority of data scientists are unable to write production-quality code. If you need not just research but an actual solution, then, process-wise, you need a data scientist working together with a data engineer.
The CAP Theorem as an Analogy
The CAP theorem says a distributed database cannot guarantee all three of its desirable properties at once: consistency, availability, and partition tolerance. This is a basic rule that every developer should know, and something similar is true for data scientists. People envisage a data scientist as an overlap of all the relevant disciplines, but in real life, you'll be hard-pressed to find such an ideal candidate. Usually, people lean one way or another in their work, and keeping a balance is not always a priority.
In principle, one of the solutions we use ourselves is that a data scientist should have business insight, understand the math behind the data, and work in tandem with a skilled developer. Of course, a proper data scientist should be able to write at least some code. But a data scientist who works with a data engineer will write and ship code together with them, so both will be responsible for the quality and sustainability of the solution.
The classic tragedy of a company that initiates data science research is a data scientist saying, "I've got 40,000 lines of Python code on my PC. Can you make it work in production?" Of course, this is virtually impossible to do, and all that research is wasted.
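One common mitigation for this handoff problem, sketched here under the assumption of a scikit-learn-style model, is to agree from day one that research hands production a serialized artifact behind a small, stable interface, never the raw scripts. The file name and `predict` function below are illustrative, not a prescribed contract.

```python
# Hypothetical handoff contract between data scientist and data engineer:
# the research side exports a frozen model artifact; the production side
# loads and serves it without ever touching the training code.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Data scientist's side: train, then freeze the artifact.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Data engineer's side: load the artifact and expose a stable interface.
with open("model.pkl", "rb") as f:
    served = pickle.load(f)

def predict(features):
    """The only surface the production service depends on."""
    return int(served.predict([features])[0])
```

The artifact boundary is what makes shared responsibility possible: the data scientist owns what is inside `model.pkl`, the data engineer owns everything around it.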
Any team that deals with data science has to be cross-functional: it has to cover the whole stack of the solution it builds. In a normal setup, there should be a DevOps engineer, a data scientist, a data engineer, and a product developer writing the app. This is a single team responsible for one result; they work together and solve related tasks.
All of this means the whole team is responsible for the business result. This also applies to the intermediate research done by a data scientist, which is impossible to use in production on its own.
Old-School vs. Vertical Teams
To dig deeper, consider a classic old-school layered organization structure with separate departments of data scientists, operations, UI developers, big data analysts, QA engineers, and so on. In this setup, every project cuts across most of these teams. The classic problem is that tickets and tasks get thrown from one team to another, and the real business goals are watered down along the way and never solved. Instead of this horizontal division, we divided our teams vertically. This allowed us to create teams that see a clear-cut goal they need to achieve, while also improving their cross-skills and raising their sense of responsibility.
As a result, such teams began to deliver, and Scrum and Agile began to work properly. It's not directly related to data science, but many data scientists work as if they were still at a university, writing academic papers. That's a topic for a whole new article, but for now, you need to distinguish between a data scientist and a production data scientist. You should aim to employ the latter within your teams, and not let a data scientist work remotely from the rest of the team.
Opinions expressed by DZone contributors are their own.