
Data Science and IT: Finding Common Ground


There's a disconnect between the development and the production of models, and much of this disconnect occurs between data science and IT.


Increasingly, the ability to infuse machine learning into your business is separating successful organizations from those falling behind their competition. But putting data science into production can be a difficult task when you're struggling to align stakeholders, keep up with the latest open-source tools, and build models at a rapid pace. Today's data science teams often measure their success by the number of models they put into production, but the reality is that most companies still have very few models deployed.

So why are so few companies successfully implementing machine learning at scale? One major reason is the disconnect between the development and the production of models, and much of this disconnect occurs between data science and IT. Why can't these two teams find common ground?

The Data Science Leader's Perspective

We've all heard that data scientist is the sexiest job of the 21st century, but managing a data science team is another story entirely. Data science leaders, who are often practitioners themselves, manage teams that may never have operated in a production environment. Data scientists should spend more time aligning with IT, but the bulk of their time goes to accessing, wrangling, and cleansing data. On top of that, IT needs data science leaders to address governance, legal, and compliance risks, yet those leaders often have little experience in these areas.

Another point of contention between these two teams is the use of open source. Open source is a must-have for today's data scientists, but waiting months for IT approval of widely used open-source packages has pushed many data scientists to download unapproved software onto their desktops.

The IT Leader's Perspective

For many IT leaders, working with data science teams can be a nightmare because they don't have the level of control they had with traditional software development. Older software development tools are ready for production environments, but today's data science teams use open-source tools like Python and R, which are difficult to put into production. The sheer number of Python and R packages has made package management and version control difficult for IT.

To make matters worse, many data science teams barely document their data, packages, or results, so reproducing models becomes a huge issue. IT may be responsible for monitoring deployed models, but it often doesn't know whether the models are still accurate or are being used in the right way. As a result, poorly performing models can produce inaccurate results that lead to bad business decisions.
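
To make the documentation gap concrete, here is a minimal sketch of what recording data, packages, and results alongside a model might look like. It is only an illustration under stated assumptions, not a method from the session or from any particular product: the function names, the record layout, the model name, and the sample metric are all hypothetical, and it relies only on the Python standard library.

```python
import hashlib
import json
import platform
from importlib import metadata


def snapshot_environment(package_names):
    """Return the Python version and the installed version of each named package."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed; recorded explicitly
    return {"python": platform.python_version(), "packages": versions}


def fingerprint_data(raw_bytes):
    """Return a SHA-256 digest so a later run can confirm it used the same data."""
    return hashlib.sha256(raw_bytes).hexdigest()


def build_model_record(model_name, package_names, training_bytes, metrics):
    """Bundle environment, data fingerprint, and results into one JSON-ready dict."""
    return {
        "model": model_name,
        "environment": snapshot_environment(package_names),
        "data_sha256": fingerprint_data(training_bytes),
        "metrics": metrics,
    }


if __name__ == "__main__":
    record = build_model_record(
        model_name="churn-model",  # hypothetical model name
        package_names=["pip", "numpy"],  # whichever packages the model depends on
        training_bytes=b"customer_id,churned\n1,0\n2,1\n",  # stand-in for a real file's bytes
        metrics={"accuracy": 0.9},  # placeholder value, not a real result
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

A record like this, stored next to each deployed model, would give IT something to check a rebuilt model against: the same package versions, the same data fingerprint, and the results the data science team originally reported.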

Find Common Ground

So how do data science and IT find common ground? Check out our THINK 2018 session where we bring a real data science leader and IT leader on stage for a "couples therapy" session to talk about their perspectives, goals, and how they can work together more effectively.

If you're a data science or IT leader struggling to get your data science practice into production, click here to watch the video now, or learn more about IBM's Data Science Experience here.


Topics:
data science, IT, big data, open source, machine learning

Published at DZone with permission of

