How Automation Can Make Data Analysis Better

Adi Gaskell examines how an automation tool, developed by a team at MIT, improves the data analysis aspect of Big Data.

By Adi Gaskell
Dec. 06, 16 · Big Data Zone · Opinion

Big data is all the rage, but there is much to suggest that few organizations are truly using it in their decision making. The reasons are many, but they include the grueling and thankless task of cleaning up the data we have. Often it's in a pretty messy state, and certainly not one from which we can derive quick insights. There are also difficulties in knowing just which bits of the data we hold are useful in making predictions.

It’s on this latter task that a team of MIT researchers has developed an automated tool. The researchers recently published a couple of papers on the process, covering the preparation of data and even the creation of problem specifications.

“The goal of all this is to present the interesting stuff to the data scientists so that they can more quickly address all these new data sets that are coming in,” the authors say. “[Data scientists want to know], ‘Why don’t you show me the top 10 things that I can do the best, and then I’ll dig down into those?’ So [these methods are] shrinking the time between getting a data set and actually producing value out of it.”

Real-World Problems

The researchers attempted to keep their work as grounded in real-world challenges as possible, and indeed the genesis of their study was the frequent complaints brought to them by industry researchers. For instance, it was common for data scientists to take months to define a prediction problem, even when they had data ready and available.

The researchers, who are bringing their tool to market via their company, Feature Labs, developed a new programming language, called Trane, to reduce the time data scientists spend on defining prediction problems to days rather than months. The team is confident that similar improvements can be made for label-segment-featurize (LSF) processes.

The system was tested on real-world questions posed by data scientists working with around 60 datasets. Even with a relatively small sample, the system was capable of devising not only all of the questions posed by the data scientists, but also many that they hadn’t considered.
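To make the idea of machine-generated prediction problems a little more concrete, here is a minimal Python sketch. It is not Trane's syntax or API, which this article does not describe; the operation lists, the column names, and the `enumerate_problems` helper are all hypothetical, chosen only to show how combining a few building blocks can produce a list of candidate questions.

```python
from itertools import product

# Hypothetical building blocks; none of these names come from Trane itself.
AGGREGATIONS = ["count", "sum", "mean", "max"]
WINDOWS_DAYS = [7, 30]


def enumerate_problems(entity, numeric_columns):
    """Return one plain-English prediction problem per combination."""
    problems = []
    for agg, column, days in product(AGGREGATIONS, numeric_columns, WINDOWS_DAYS):
        if agg == "count":
            target = f"the number of {column} records"
        else:
            target = f"the {agg} of {column}"
        problems.append(
            f"For each {entity}, predict {target} over the next {days} days"
        )
    return problems


# Assumed example schema: a customer table with two numeric columns.
problems = enumerate_problems("customer", ["purchase_amount", "session_length"])
for problem in problems[:3]:
    print(problem)
print(len(problems), "candidate problems")
```

Even this toy grid yields 16 candidates from two columns, which hints at how a generated list could include questions that analysts had not thought to ask.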

The work represents a big step forward in allowing data scientists to represent prediction problems more efficiently, so that they can be more easily shared between data analysts and domain experts, which is an area of real difficulty at the moment.

It’s likely to be one of many tools that emerge to make us more effective at working with the big data that our organizations are generating.


Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
