Why Your Database Needs a Machine Learning Brain

Learn how running ML inside the database makes it easier to forecast what your data will look like in the future.

By Jorge Torres · Jun. 14, 22 · Opinion

The past 10-15 years have seen organizations put vast resources into creating databases that let them understand their business better, spot trends earlier, and manage tasks more effectively.

Indeed, a whole industry has grown up around them, made up not just of database companies and projects like ClickHouse, DataStax, MariaDB, MongoDB, MySQL, PostgreSQL, SingleStore, or Snowflake, but also of a swathe of companies developing business intelligence (BI) tools, like Tableau, that extract insight from the data housed in them.

These databases have traditionally been great at using historical data to spot trends, but forecasting (or rather, accurate forecasting) has been more elusive. Artificial intelligence is changing this: as machine learning (ML) capabilities improve, it is becoming possible to make far more accurate predictions, in some cases hour-by-hour business predictions.

Consequently, AI adoption is accelerating, especially in the wake of the COVID-19 pandemic. According to PwC, most of the companies that have fully embraced AI already report major benefits.

What Predictions Are Possible

Databases now collect and hold information from virtually every function in a business, and organizations are turning to ML to use that data more effectively. Recent ML announcements have come from organizations as disparate as Vancouver’s bus company TransLink, which used it to improve arrival time predictions and warn of potentially crowded buses, and the Munich Leukemia Laboratory, where researchers are using it to predict whether gene variants might be benign or pathogenic.

From a business intelligence point of view, ML can be used in retail, for example, to optimize promotional displays, just-in-time stock control, and staffing levels. In energy production it can predict demand and outages, and in finance it can support better credit scoring and risk analysis.

A good example of how organizations can use ML’s predictive capabilities on their existing data is an analysis we recently presented using a New York City taxi dataset from the payment system app of Creative Mobile Technologies (CMT).

This is a highly complex system: the distribution of fares varies not only throughout the day for a single taxi vendor but also between vendors, and because there are multiple vendors, each one has its own time series.
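To make "each vendor has its own time series" concrete, here is a minimal Python sketch, assuming each trip is a hypothetical `(vendor_id, pickup_time, fare)` tuple; the field names and vendor labels are illustrative assumptions, not the actual CMT schema. It buckets fares into hourly totals per vendor, producing one series per vendor, which is the grouped shape a multivariate forecaster works with.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip record: (vendor_id, pickup_time, fare). These fields are
# illustrative assumptions, not the actual schema of the CMT dataset.
Trip = tuple[str, datetime, float]


def hourly_series_by_vendor(trips: list[Trip]) -> dict[str, list[tuple[datetime, float]]]:
    """Group trips into one hourly series of total fares per vendor."""
    buckets: dict[str, dict[datetime, float]] = defaultdict(lambda: defaultdict(float))
    for vendor, pickup, fare in trips:
        # Truncate each pickup timestamp to the start of its hour.
        hour = pickup.replace(minute=0, second=0, microsecond=0)
        buckets[vendor][hour] += fare
    # Sort each vendor's buckets chronologically so every list is a time series.
    return {vendor: sorted(hours.items()) for vendor, hours in buckets.items()}


trips = [
    ("A", datetime(2022, 1, 1, 8, 15), 12.5),
    ("A", datetime(2022, 1, 1, 8, 40), 7.0),
    ("B", datetime(2022, 1, 1, 8, 5), 20.0),
    ("A", datetime(2022, 1, 1, 9, 10), 15.0),
]
for vendor, series in hourly_series_by_vendor(trips).items():
    print(vendor, [(ts.hour, total) for ts, total in series])
# A [(8, 19.5), (9, 15.0)]
# B [(8, 20.0)]
```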

Fig 1: How temporal dynamics vary for each group of data — using NYC Taxi data

Once the data was cleaned, however, it was possible to take the historical data in the database and, with a SQL query and MindsDB, train a multivariate time series predictor that accurately predicted demand seven hours ahead using just three variables: vendor, pickup time, and taxi fare.
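The source does not describe the model MindsDB trains, so the following is only a hedged illustration of what a per-vendor, seven-step-ahead forecast looks like mechanically. It is a deliberately naive recursive moving-average baseline: each vendor's next hour is predicted as the mean of its last `window` values, and that prediction is fed back in to forecast the following hour. The `window` and `horizon` parameters and their defaults are assumptions chosen to mirror the seven-hour horizon in the text.

```python
from statistics import fmean


def forecast_by_vendor(
    series: dict[str, list[float]], window: int = 3, horizon: int = 7
) -> dict[str, list[float]]:
    """Recursive moving-average forecast, run independently for each vendor.

    A naive illustrative baseline only; it is not the model MindsDB trains.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    forecasts: dict[str, list[float]] = {}
    for vendor, values in series.items():
        if len(values) < window:
            raise ValueError(f"{vendor}: need at least {window} observations")
        history = list(values)
        predictions: list[float] = []
        for _ in range(horizon):
            # Predict the next hour as the mean of the latest `window` values,
            # then feed that prediction back in to forecast the hour after it.
            next_value = fmean(history[-window:])
            predictions.append(next_value)
            history.append(next_value)
        forecasts[vendor] = predictions
    return forecasts


hourly_fares = {"A": [3.0, 6.0, 9.0], "B": [10.0, 10.0, 10.0]}
print(forecast_by_vendor(hourly_fares))
```

A learned multivariate model would capture daily patterns and cross-vendor effects that this baseline ignores; its value here is as a floor that any real predictor should beat.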

Fig 2: NYC Taxi Company fare predictions, showing the MindsDB forecast (blue) vs. reality (yellow)

As the chart shows, it takes about 10 predictions before the forecast mirrors reality, with very little deviation after the first 15 predictions. This allows taxis and drivers to be allocated more effectively at specific times and to specific sectors of the city.
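One way to quantify a statement like "about 10 predictions before the forecast mirrors reality" is to compute each step's relative error and find the first step after which the error stays within a tolerance. The sketch below assumes the forecast and actual values are aligned lists; the 10% tolerance is an arbitrary illustrative choice, not the criterion used for Fig 2.

```python
from typing import Optional


def relative_error(forecast: float, actual: float) -> float:
    """Absolute error as a fraction of the actual value."""
    if actual == 0:
        # A zero actual has no meaningful relative scale.
        return 0.0 if forecast == 0 else float("inf")
    return abs(forecast - actual) / abs(actual)


def first_stable_step(
    forecast: list[float], actual: list[float], tol: float = 0.10
) -> Optional[int]:
    """1-based step from which every relative error is <= tol, else None."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    errors = [relative_error(f, a) for f, a in zip(forecast, actual)]
    # Walk backwards over the final run of within-tolerance errors.
    start = len(errors)
    while start > 0 and errors[start - 1] <= tol:
        start -= 1
    return start + 1 if start < len(errors) else None


print(first_stable_step([15.0, 12.0, 10.5, 10.1], [10.0, 10.0, 10.0, 10.0]))  # 3
```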

So, Databases Need a Brain – Where Is the Best Place to Put It?

With the addition of ML, the information in a database can be used to make very accurate predictions for a huge array of business applications, from predicting customer behavior to improving employee retention and industrial processes.

That leaves two options: export the data to the brain, or import the brain to the data.

Currently, most ML systems export the data housed in a database, using a series of steps similar to these:

  1. Extract data
  2. Prep it (for example, turning it into a flat file)
  3. Load it into the BI tool
  4. Export the data from the BI tool to the ML extension
  5. Create a model
  6. Train the model
  7. Run predictions via the AutoML extension
  8. Load those predictions back into the BI tool
  9. Prepare visualization in the BI tool
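For contrast with the in-database approach, the core hops in that list can be sketched with Python's standard library alone. This is a simplified illustration under stated assumptions: an in-memory SQLite table with a hypothetical `trips` schema stands in for the source database, a per-vendor mean fare is a placeholder for a real ML model, and the BI-tool steps are left out. The point is how many times the data crosses a system boundary before a prediction is usable.

```python
import csv
import io
import sqlite3
from collections import defaultdict
from statistics import fmean

# An in-memory SQLite database stands in for the production database;
# the `trips` table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vendor TEXT, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [("A", 10.0), ("A", 14.0), ("B", 30.0)])

# Extract the rows and prep them as a flat file (an in-memory CSV here).
flat_file = io.StringIO()
writer = csv.writer(flat_file)
writer.writerow(["vendor", "fare"])
writer.writerows(conn.execute("SELECT vendor, fare FROM trips"))

# Re-parse the flat file in a separate system and "train" there.
# A per-vendor mean fare is a placeholder for a real ML model.
flat_file.seek(0)
fares: dict[str, list[float]] = defaultdict(list)
for row in csv.DictReader(flat_file):
    fares[row["vendor"]].append(float(row["fare"]))
model = {vendor: fmean(values) for vendor, values in fares.items()}

# Load the predictions back so downstream reporting can query them.
conn.execute("CREATE TABLE predictions (vendor TEXT, predicted_fare REAL)")
conn.executemany("INSERT INTO predictions VALUES (?, ?)", model.items())
print(conn.execute("SELECT * FROM predictions ORDER BY vendor").fetchall())
# [('A', 12.0), ('B', 30.0)]
```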

This method is not ideal. Not only does it take time, but it also requires a considerable amount of extraction, transformation, and loading of data from one system to another. That can be challenging, particularly when dealing with highly sensitive data, such as in financial services, retail, manufacturing, or healthcare.

Indeed, one small-scale survey by CrowdFlower found that 80% of data scientists’ time was taken up by data prep, and three-fourths of data scientists consider this prep the least enjoyable part of the job.

Keeping ML at the database level eliminates several of the most time-consuming steps and, in doing so, ensures that sensitive data can be analyzed within the database’s governance model. It also shortens project timelines and removes potential points of failure.

Furthermore, placing ML at the data layer means it can be used for experimentation and simple hypothesis testing without each test becoming a mini-project that needs time and resources signed off. You can try things on the fly, increasing not only the amount of insight but also the agility of your business planning.

When ML models are integrated as virtual database tables, alongside common BI tools, even large datasets can be queried with simple SQL statements. This approach adds a predictive layer to the database, allowing anyone trained in SQL to solve even complex time series, regression, or classification problems. In essence, it "democratizes" access to predictive, data-driven experiences.
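A minimal sketch of the "model as a table" idea, using Python's built-in `sqlite3` module rather than any specific ML-in-database product: a Python function registered with `Connection.create_function` plays the role of the model, and a view exposes its output alongside the raw data, so predictions come back from a plain `SELECT`. The `trips` table, the `fare_predictions` view, and the `predict_fare` function are all hypothetical, and the "model" is a hard-coded rule, not a trained one.

```python
import sqlite3

# Hypothetical stand-in for a trained model: per-vendor base fares with a
# peak-hour uplift. A real deployment would call an actual trained model here.
BASE_FARE = {"A": 12.0, "B": 14.0}
PEAK_HOURS = range(16, 20)  # 4 p.m. to 7:59 p.m., an illustrative assumption


def predict_fare(vendor: str, pickup_hour: int) -> float:
    base = BASE_FARE.get(vendor, 13.0)
    return base * 1.25 if pickup_hour in PEAK_HOURS else base


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in function.
conn.create_function("predict_fare", 2, predict_fare)

conn.execute("CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, vendor TEXT, pickup_hour INTEGER)")
conn.executemany("INSERT INTO trips (vendor, pickup_hour) VALUES (?, ?)", [("A", 9), ("B", 17)])

# The view exposes predictions as a queryable, table-like object.
conn.execute(
    """
    CREATE VIEW fare_predictions AS
    SELECT trip_id, vendor, pickup_hour, predict_fare(vendor, pickup_hour) AS predicted_fare
    FROM trips
    """
)
print(conn.execute("SELECT vendor, predicted_fare FROM fare_predictions ORDER BY trip_id").fetchall())
# [('A', 12.0), ('B', 17.5)]
```

One design caveat: in SQLite the function only exists on connections that registered it, so querying the view from another connection fails until `predict_fare` is registered there too. Dedicated in-database ML systems manage that wiring for you.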

Adding Trust Alongside the Predictions

Even with the smartest database, there is more to the application of ML technology than just the machine’s prediction. Nuance is needed: the people using these predictions must interpret them in order to drive reliable business outcomes.

The best results tend to come when models assist the human decision-making process. Even then, however, models can still show significant biases, and research has found that a model’s output can also introduce cognitive bias in the humans using it.

It is therefore critical to be able to understand the model and to trust its accuracy and value.

To help business analysts understand why the ML model made certain predictions, it’s best to deploy an ML tool that generates predictions with visualizations and explainable AI (XAI) features. This not only builds the necessary trust but also gives the analysts charged with interpreting the results a quick way to spot data cleanliness issues or human bias that might skew the model output.
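As one simple illustration of an explainability feature (not how any particular tool implements XAI): for a linear model, each feature's contribution to a prediction can be reported as its weight times the input's deviation from a baseline input, and those contributions sum exactly to the gap between the prediction and the baseline prediction. The feature names, weights, and baseline values below are hypothetical.

```python
def explain_linear(
    weights: dict[str, float], bias: float, x: dict[str, float], baseline: dict[str, float]
) -> tuple[float, float, dict[str, float]]:
    """Return (prediction, baseline prediction, per-feature contributions)."""

    def predict(features: dict[str, float]) -> float:
        return bias + sum(w * features[name] for name, w in weights.items())

    # For a linear model, a feature's contribution is its weight times how far
    # the input deviates from the baseline on that feature.
    contributions = {name: w * (x[name] - baseline[name]) for name, w in weights.items()}
    return predict(x), predict(baseline), contributions


weights = {"trip_miles": 2.5, "is_peak_hour": 3.0}
prediction, reference, parts = explain_linear(
    weights,
    bias=3.0,
    x={"trip_miles": 4.0, "is_peak_hour": 1.0},
    baseline={"trip_miles": 2.0, "is_peak_hour": 0.0},
)
print(prediction, reference, parts)
# 16.0 8.0 {'trip_miles': 5.0, 'is_peak_hour': 3.0}
```

Because the contributions add up to the prediction minus the baseline (5.0 + 3.0 = 16.0 - 8.0), an analyst can see which inputs pushed the prediction up or down. For nonlinear models, methods such as SHAP generalize this kind of additive attribution.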

So, Does Your Database Need a Brain?

Absolutely. While ML has traditionally been kept separate from the data layer, this is changing. Your database holds a rich history of virtually every vital part of your business, and by using ML in the database, it is becoming simpler to forecast what that data will look like in the future, running queries with little more than standard database commands.

Published at DZone with permission of Jorge Torres. See the original article here.

Opinions expressed by DZone contributors are their own.