What CDOs and CAOs Struggle With Most
In this article, we discuss some of the biggest struggles for CDOs and CAOs, including performing analytics on unstructured data.
Our team recently attended the Chief Data & Analytics Officers (CDAO) conference in Boston and used the opportunity to conduct an informal poll. The conference was packed with C-suite executives trying to wrangle big data at companies like Tesla, Lionsgate, AMD, Capital One, and Ford. We asked everyone about their analytics challenges, and two issues came up again and again.
1. Their data scientists get bogged down with data access challenges
A recent study showed that data preparation and data engineering tasks represent over 80% of the time consumed in most AI and Machine Learning projects.
But those may be the best cases. Those may be the projects where the overall data architecture is mapped out and understood completely at the start. A lot of data science work isn’t that straightforward, especially when you need to do exploratory analyses.
This is a pain point we heard a lot during our conversations at CDAO. Data scientists spend the majority of their time moving data to the right data warehouse and wrangling data sets before they can analyze anything. The work gets harder because some data may be stored in unstructured data sources like MongoDB or a data lake indexed with Elasticsearch, while other data may be stored in traditional structured SQL databases.
This becomes especially problematic with business intelligence tools that either do not support unstructured data or use clunky connectors and drivers to do so, which leads to the next pain point.
2. Unstructured data is the future, but doing analytics on it is stuck in the past
We had numerous discussions where attendees talked about their challenges with doing analytics on unstructured data. A lot of companies are using a traditional BI solution like Tableau or Qlik, or one of the second-generation solutions like Looker. But all of these suffer from the same problem: they only do analytics on structured data. If you want to do analytics on unstructured or NoSQL data, you need to build a complex and costly ETL data pipeline to move everything to a data warehouse.
This was particularly frustrating for Dave, the Lead Data Scientist at a mid-sized startup with an app that collects a lot of usage data. When we spoke to him, he was pretty open about the challenges his company faced in this area:
“We signed up for Looker to manage our data because that’s what our board members knew and trusted. And don’t get me wrong, Looker is amazing for most things. But when we tried to implement Elasticsearch for log files, we couldn’t merge that data with our other customer data.”
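To make the ETL step concrete, here is a minimal sketch of what flattening nested log documents and joining them to a customer table can involve. Everything in it is a hypothetical illustration: the field names, the sample documents, and the `customers` table are assumptions made for this example, not Dave's actual schema or any vendor's API. It uses only Python's standard library, with an in-memory SQLite database standing in for a data warehouse.

```python
import json
import sqlite3

# Hypothetical log documents, shaped like what a document store might return.
RAW_LOGS = [
    '{"customer_id": 1, "event": {"type": "login", "device": {"os": "ios"}}}',
    '{"customer_id": 2, "event": {"type": "purchase", "device": {"os": "android"}}}',
    '{"customer_id": 1, "event": {"type": "purchase"}}',
]


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining nested keys with underscores."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def load_and_join(raw_logs):
    """Load flattened logs into SQLite and count events per customer plan."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    # Hypothetical structured customer data.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "free"), (2, "pro")]
    )
    # The target schema has to be fixed up front, even though the
    # source documents do not all carry the same fields.
    conn.execute(
        "CREATE TABLE events "
        "(customer_id INTEGER, event_type TEXT, event_device_os TEXT)"
    )
    for line in raw_logs:
        row = flatten(json.loads(line))
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (row["customer_id"], row["event_type"], row.get("event_device_os")),
        )
    rows = conn.execute(
        "SELECT c.plan, e.event_type, COUNT(*) "
        "FROM events e JOIN customers c ON c.id = e.customer_id "
        "GROUP BY c.plan, e.event_type "
        "ORDER BY c.plan, e.event_type"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for plan, event_type, count in load_and_join(RAW_LOGS):
        print(plan, event_type, count)
```

Even at this toy scale, the pipeline has to commit to a fixed column list and decide what to do when a document is missing a field (here it becomes NULL). A production pipeline multiplies those decisions across every source and every schema change, which is where much of the cost comes from.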
The need for unstructured data analytics makes a lot of sense when you consider the growth of unstructured databases and data solutions. Take a look at this plot of database popularity. This plot shows Oracle (black), MySQL (blue), and Microsoft SQL Server (green) at the top with relatively little change. Then it shows the unstructured solutions that are rapidly growing in popularity: MongoDB (purple), Cassandra (teal), and Elasticsearch (yellow).
All this growth has prompted conversations like this one: Is MongoDB good for analytics?
The popularity ranking score is drawn from a number of metrics but can be thought of as an approximation of relative market share. The scale is logarithmic, which makes it a bit understated, but you can clearly see that the unstructured data solutions MongoDB, Elasticsearch, and Cassandra are gaining on the old guard SQL databases.
If you would like to learn how we’re solving the unstructured data analytics issue, take a look at our Elasticsearch analytics page.
Featured image credit: Photo by DJ Johnson on Unsplash
Opinions expressed by DZone contributors are their own.