How I Built the Perfect Data Science Team


How do data science teams compare to development teams? Read on to get the view of one big data expert.

When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering in Big Data and data science. Now is a good time to reflect on this story that started twelve years ago.

At first, I really wanted to title this article “How I built the perfect data science team (without knowing it).” However, I did not want to give the impression I did not know what I was doing (I think I did). Nevertheless, here is my story…

In 2007, I founded GreenIvory. The idea was to build a toolbox for web marketers. Whether a marketer wanted to automate content distribution, automate content generation, or measure brand awareness through sentiment analysis, we had a solution (and more!). A little later, the team started working on NLP (natural language processing), and in early 2011 we released our first product capable of sentiment analysis at scale. We solved many technical challenges, but let’s focus on the human and organizational aspects.

The “green team” was made up of talented software engineers. Each engineer had strengths in key elements of the system: UI, data, crawling, system, ops, and more. We had rolled out several projects and products before. It was a working model. However, we did not have the science. We needed someone who could help us infuse scientific knowledge into the engineering team. That’s when we teamed up with the University of Strasbourg and hired a data scientist (that was not his title back then).

Timeline and Business Value

Extracting the business value from the science was not an easy task for a start-up

Our main issue was the timeline. Or, more precisely, the lack of alignment between the pace of data science and the pace of engineering. At that time, we were already following agile methodologies. Like most companies in those days, we used a home-grown version, but it was team-driven and we had a great agile champion. It was working smoothly.

Each Sprint delivered business value, and we frequently updated our artifacts in production. The challenge was to incorporate the work of the scientist into an engineering organization.

Blending the team was one of the elements of success

It came down to integrating him directly into the development team. I wanted him to act as a lighthouse. It was not easy. There was a bit of a culture clash. The engineers did not get why it took so much time to get something and why it was so rough in the making. On his side, the scientist could not understand why his experiments, although successful on his Mac, would not scale when we threw millions of sentences at his algorithms.

Finally, after numerous sessions of pair programming, discussions, and building stronger team spirit, we were able to leverage the science in our product.

Counter Example

The model where data scientists are parked in silos did not work for me

More recently, I have experienced a different organization, where data scientists were parked in a silo. The idea was to deliver the science almost as a consumer-ready product to business analysts and users.

Don’t get me wrong, they were able to deliver, but the silo remained the silo. The knowledge and intelligence built by the team were not getting to the rest of the organization.

A side effect was that the team kept growing, eventually merging with another team… and you know what happened: they needed more pizzas. And when you need more pizzas, productivity goes down. If it’s not in the original Agile Manifesto, it must definitely be in its first amendment.

A Data Science Team Twelve Years in the Making

More recently, I attended an inspiring talk by Stacey Ronaghan at Think 2019. Ronaghan is a data scientist at IBM. She was summarizing her experience as a data scientist and being part of a team. This is when I realized that, twelve years ago, we were not that far off.

Stacey Ronaghan at Think 2019

She defined the team as a key driver for success. The teams she has worked with include various roles around data science: an executive sponsor, a database administrator (that darn data!), a business analyst, a project manager (in 2019, we call them Scrum Masters), a subject matter expert (SME), a solution architect, a software engineer, a designer, and a design thinking practitioner. So yes, it is a very eclectic and cross-functional team. Like a software engineering team.

The delivery is based on the value it brings to the organization. The team is not living in isolation or in a remote comfy cocoon where they just study for the sake of studying. They deliver. They solve problems.

And solving problems helps them bring business value. Like an Agile team. Her team works in an agile way. Achieving two-week Sprints is also possible.

Like in a software product organization, her team goes through building an MVP (minimum viable product). That’s where her customers can take over.


Data science is an integral part of a software engineering team, not an accessory

Each stakeholder has a role. The scientists can define a vision, craft an idea, and find the right algorithms. The engineers can then “take it home” and transform the idea into production code in their toolbox or platform. Finally, the application developers can combine the science, now industrialized in the platform, to build a great product. This is what I call the industrialization of data science.

After these experiences, and after comparing some of my ideas and parts of those experiences with others, here are my conclusions (so far):

  • A data science team is not very different from a software engineering team.
  • Expectations are different, as the experimental part of the work is more important.
  • Standard software methodologies (Agile, SAFe…) can apply, but they are more challenging to follow on the research part.
  • While TDD (test-driven development) is becoming a standard, test-driven data science is not there yet.
  • There are new challenges like bias, but couldn’t that be part of the QA?
  • Governance of models is also a challenge that did not exist before.
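To make the points on test-driven data science and bias as part of QA more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the toy keyword classifier `predict_sentiment`, its word lists, the 0.75 accuracy threshold, and the test names are all hypothetical and do not come from any product mentioned in this article.

```python
# Hypothetical sketch: a toy model plus the kind of tests a team could write first.

POSITIVE_WORDS = {"great", "love", "excellent"}   # assumed word list
NEGATIVE_WORDS = {"bad", "hate", "terrible"}      # assumed word list


def predict_sentiment(text: str) -> str:
    """Toy keyword-based classifier standing in for a real model."""
    words = set(text.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    return "positive" if score >= 0 else "negative"


def accuracy(examples) -> float:
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(predict_sentiment(text) == label for text, label in examples)
    return correct / len(examples)


def test_accuracy_meets_threshold():
    # Tiny hand-labeled set; a real suite would use a held-out dataset.
    labeled = [
        ("I love this product", "positive"),
        ("excellent service", "positive"),
        ("terrible support", "negative"),
        ("I hate the new design", "negative"),
    ]
    assert accuracy(labeled) >= 0.75  # assumed quality bar


def test_prediction_ignores_group_term():
    # Bias check as QA: swapping a group-related word must not change the label.
    template = "the {} engineer did a great job"
    groups = ("young", "old", "female", "male")
    predictions = {predict_sentiment(template.format(g)) for g in groups}
    assert len(predictions) == 1


if __name__ == "__main__":
    test_accuracy_meets_threshold()
    test_prediction_ignores_group_term()
    print("all checks passed")
```

Tests like these can be called directly, as above, or collected by a test runner such as pytest. The idea is that a quality bar and a bias check are written down as assertions before the model is tuned, the same way a software team writes unit tests before the code.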

Originally published on my blog on Feb 26th, 2019 at http://jgp.net/2019/02/26/how-i-built-the-perfect-data-science-team/.

Published at DZone with permission of Jean Georges Perrin, DZone MVB.

