
Why Speed Matters When It Comes to Querying Your Data



When evaluating new tools, speed is sometimes at odds with how many bells and whistles a tool comes with or how much information it can handle. You can get a CRM platform with a ton of features, but it'll take a while for your team to learn how to use them. You can get the most highly tailored automated email generator, but it may dramatically increase the time it takes to generate an email.

When it comes to analytics and querying your data, speed matters.

From everyday search engine queries to enterprise-level data queries, getting results as quickly as possible is just as important as, if not more important than, the volume of data that can be stored or the complexity a query can express.

As we'll discuss in this post, this conclusion follows from how people actually search for things: as a multi-step, imprecise process that moves at the speed of thought, not as a single detailed question.

What We Do When We Ask Questions

To start thinking about how people ask questions of their data, it helps to look at a place where 3.5 billion queries are submitted each day: Google.

Even five years ago, when you did a Google search, you were on average searching through an index of 52,000,000 pages.

Yet according to a 2015 Chitika study, the first result alone received an average of 32.5% of total traffic. Hardly anyone looks at the second or third page of results: only 8% of traffic makes it there.


People are really only clicking on the first few results, which suggests they are finding the links they want within those results. But out of 52,000,000 pages, are people really using picture-perfect search terms that surface exactly what they need within the first four results?

Probably not. Let's take a look at another statistic: the number of words used in Google queries.


94% of searches use five words or fewer. And over a third of searches are just a single word!

If people were finding the results they wanted straightaway, you'd expect them to use more words to get there. Short search queries tend to return broader results, with what you want buried further down the list or a few pages in. But we know that people aren't clicking on those later results. This suggests that people aren't finding what they want in a single search. Instead, they search iteratively, formulating many short queries until what they want shows up in the first few results.

For example, let's say you want to learn about different types of cloud computing environments. You'll probably search for something like "cloud types." Whoops. All the results on the first page are about real clouds. But you're not going to waste time paging through results until you find something about cloud computing (which doesn't appear until page 4, by the way). Instead, you'll change your search to something like "cloud computing types." From the titles of those results, you'll see that they discuss public, private, and hybrid clouds, so to dig deeper, you'll search "hybrid clouds." But you're more interested in advantages and disadvantages than definitions, so you search "hybrid cloud benefits," which finally gets you a link you'll actually click on.

People don't carefully think out detailed search phrases or look through page after page of results. Instead, they rapidly iterate on imprecise queries until the result they want is listed near the top.

Why Speed Is Key for Queries

This behavior, treating a query as a sequence of searches rather than a single one, is what makes speed so important.

If people could instantly come up with queries describing exactly what they wanted to look at, or if they had the time and motivation to read through huge reports, then the amount and accuracy of the data on hand would matter more than speed. But this isn't how people work.

People can't predict which sectors of the customer population will be most interesting before they look at the data. They won't know all the variables that need to be included and excluded. And your data team can't solve this by producing long-winded reports with a thousand variations; people won't look past page 1.

The only solution is to cut out the middleman and let every employee interact with the data quickly and easily. If it takes a while to receive a query result, iterating on imperfect queries becomes incredibly time-consuming, so people won't do it. They'll either settle for the imprecise results of their first query or not look into the data at all.
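To make the cost of iteration concrete, here is a back-of-the-envelope sketch in Python. It assumes each refinement has to wait for the previous result before it can be written, and it reuses the four queries from the cloud computing example. The latency values are hypothetical round numbers chosen for illustration, not measurements from any study.

```python
# Hypothetical illustration: total waiting time for an iterative query session,
# assuming each refinement must wait for the previous result before it is written.

# The four refinements from the "hybrid cloud benefits" example.
QUERIES = [
    "cloud types",
    "cloud computing types",
    "hybrid clouds",
    "hybrid cloud benefits",
]

# Assumed per-query latencies in seconds (illustrative, not measured).
LATENCIES = {
    "sub-second": 0.5,
    "one minute": 60.0,
    "one day": 24 * 60 * 60.0,
}


def session_wait_seconds(num_queries: int, latency_s: float) -> float:
    """Total time spent waiting for results when queries run one after another."""
    return num_queries * latency_s


if __name__ == "__main__":
    for label, latency in LATENCIES.items():
        total = session_wait_seconds(len(QUERIES), latency)
        print(f"{label:>10}: {total:,.1f} s ({total / 3600:.2f} h) "
              f"of waiting across {len(QUERIES)} queries")
```

Because the queries run sequentially, total waiting time grows linearly with both the number of refinements and the per-query latency: four refinements cost about two seconds at sub-second latency, but four full days at one day per query.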

In Interana's 2017 State of Data Insights Report, almost a third of the 200 respondents, drawn from a range of companies and roles, listed slow query speeds as a top pain point with their existing analytics solution. And almost two-thirds couldn't get answers to their queries in less than a full day.


If it takes more than a few seconds (let alone a few days or weeks) to get a question answered, then your company loses much of the benefit of collecting data in the first place.

When Interana co-founder Lior Abraham introduced the Scuba analytics platform at Facebook back in 2011, it "caused a phase change" in how people analyzed data at the company.

Scuba runs interactive, ad hoc queries in less than a second. It was built because Facebook's old query systems, Hive and Peregrine, were too slow to catch performance bugs before they affected a large portion of users. With Scuba, these bugs could be identified and fixed in minutes to hours, not days. And even though Scuba was just meant for performance analysis, it soon became the system of choice for many teams. People noticed a huge leap in how quickly insights were gleaned from data, and former Facebook engineers even say that Scuba is the thing they miss the most about working there.

If you want people across your organization to start using your company's data analytics tool every day, it needs to be fast enough.

Speed Up Your Queries

If you're having trouble getting people at your company to use your data analytics stack, speed may be the problem. High latency is anathema to firing off queries on a daily basis.

Ideas will be lost if people have to carefully formulate each question, go to their data team, and wait forever for a response. When you use a high-speed analytics stack designed for ad hoc queries, you let people query the way they do in the rest of their lives: quickly, imperfectly, successively.


Topics:
big data, querying, speed, search processing

Published at DZone with permission of Archana Madhavan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
