DZone Research: Database Issues
The most common issues companies are having with databases are 1) data management; 2) database sprawl; and 3) choosing the right database to solve the business problem.
To gather insights on the current and future state of the database ecosystem, we talked to IT executives from 22 companies about how their clients are using databases today and how they see use and solutions changing in the future.
We asked them, "What are the most common issues you see companies having with databases?" Here's what they told us:
- 1) At a technical level, forcing a database engine to do something it was not designed to do. 2) The challenge around multiple database platforms: where the data is coming from, what’s being done with it, and where it is going. GDPR is putting new requirements on metadata. How do I get the metadata to store, manage, track, and automate the ongoing evolution of the data warehouse?
- 1) Loading their data into the system at 2 to 3 TB per second; 2) preparation of the data; 3) the time it takes to get results and how much data you can actually analyze. Metadata tagging. GPUs are good for repetitive mathematical operations. 3 TB per hour with a single GPU. 200 separate iterations of optimization.
- Once data is moved into the cloud, the issue is how to handle continuous updates and reassemble the data into a structure that is transactionally consistent and queryable. It is easy to throw data into the data lake. It is hard to reassemble it and make datasets that can be analyzed. We’re automating the process: deriving analytic value from the data lake and making it production and analytics ready. We are solving the problem for transactional data. There are still issues for semi-structured and unstructured data.
- The top three issues I see are 1) tapped-out databases, 2) design mistakes, and 3) data quality issues. We often end up replacing databases that have insufficient performance for hybrid transaction/analytic style workloads. Design mistakes and data quality issues are more complex, although the wrong database can make mistakes more likely and data quality harder to maintain.
- Database vendors make solutions too easy to deploy on VMs, and you get database sprawl. Databases don’t talk to each other and cannot be integrated. With a standalone database, you can lose all your data. We educate on the need to stack instances and create a highly available cluster. Data is always available. No more database sprawl. One platform to deal with.
- “Database sprawl” has continued to be one of the biggest issues facing companies today. As applications continue to evolve, their increasing requirements have led to a growing number of point solutions at the data layer. The organization is then forced to stitch together a broad array of niche solutions and manage the complexity of changing APIs and versions. Without a platform to contain this sprawl, companies are moving data between systems, changing the data model or format to suit each individual technology while working to learn the internal skills necessary to manage all of them. That’s why so many companies are choosing a platform to consolidate these technologies, enabling them to bring their solutions to market faster.
Choosing the Right Database
- 1) Using the wrong database for the job. 2) Incorrectly classifying the buckets of different kinds of technologies. 3) Assuming that just because it’s open source, you should be able to figure it out by yourself. You’re going to need help putting it into production.
- Because of the number of options available, companies try one technology, learn it's not the right fit, and try another. With big data and Hadoop, companies try to use Hadoop in ways it should not be used. Hadoop cannot solve every problem. It is a good fit for a data lake but not for advanced analytics. Vendors should do a better job of educating customers on where they are a good fit and where they are not.
- 1) Using legacy databases (with architectures from the 1970s) for today’s data, which is coming in at unprecedented velocity, volume, and variety. With legacy technology, it is just not possible to derive value from fast/big data. 2) Not using the right database for the right workload/job: every database has a fit and purpose/use case. It is important to understand the individual use case and then use the appropriate database technology. 3) Not operationalizing machine learning: machine learning models are built with considerable time and expense, but by not putting them into production, organizations miss out on deriving maximum value from the models. 4) Trying to cobble together a “solution” with multiple disparate open source tools: open source technology is tremendous, and for a handful of use cases it can be used as an out-of-the-box solution. However, most enterprise use cases and mission-critical real-time apps require throughput, scale, speed, low latency, and built-in logic, and off-the-shelf open source solutions don’t work. Assembling and maintaining a custom solution from multiple open source tools requires expensive engineering resources that only large tech companies possess, and even such a solution is complex, hard to manage and maintain, and brings increased latency with each open source layer added on.
- When companies use a generic solution such as an RDBMS or Hadoop/NoSQL stores for time-series data, they run into all sorts of issues, from performance problems to types of queries they simply cannot execute.
- Depends on the industry. In some edge IoT situations, protocol translation is needed to get the data to the database. We connect to main systems. Most of our business comes from industries that are not leading edge but are going through digital transformation. They’ve been collecting digital data for the last 40 years. They want to be able to take better advantage of the data they have. Open source is seeing broad adoption. A little over a year old. One million downloads. Active Slack channel. Self-sustaining community.
- Availability and the ability to scale out versus up. Manage things with automation and less manual work. We live in a hybrid world. There is a need for simplicity on-prem, plus a need to reduce time and error.
- 1) The operational side. 2) The data modeling side. They can impact each other. When customers get managed services, we spend a lot of time helping them on the data modeling side. We make sure customers get the most bang for the buck.
- The issues are scalability and whether companies can get an answer to the business question they are asking. SQL is used and known by a lot of people but is not the best for semantic queries with a lot of relationships.
- Varies by customer, industry, and geography. If you're starting a new company tomorrow, you will not do things the way you did 10 to 20 years ago. Legacy applications are able to provide all of the options – we can handle old SQL Server databases.
- The most common issue is just getting their heads around how to use the database properly. An analogy is my daughter getting ready to drive. We'll start with an older car, not a Mustang. With graph and high performance, you need to think about the proper data model and how to deploy across multiple data centers and multiple clouds.
- The ability to coordinate disparate technologies in the diverse and often intricate conditions IT departments face. There is a lack of a single point of control. A single workflow often includes the sharing of information with a variety of database platforms including SQL Server, Informatica, SAP, and/or Oracle, as well as other Microsoft applications such as Dynamics AX. Integration between platforms requires custom scripting, which is time-consuming and difficult to maintain. It also requires senior-level resources, preventing them from focusing on more mission-critical activities, and limits adaptability and scalability. Regarding SQL Server Agent, there are several noteworthy limitations. If a job running under SQL Server Agent encounters a failover event, it does not resume. Jobs are logged as started, but no further completion/failure entries are sent. SQL Server Agent cannot offer failover protection via priority escalation or restarts on another machine. Important scheduling capabilities are also lacking. There is no support for other date/time parameters that are part of the business, e.g., fiscal calendars or unique business hours. Condition-based scheduling is limited to situations such as when CPU utilization on a computer reaches an idle state. It can, however, issue calls to SQL Server Integration Services as part of a workflow. SQL Server Integration Services can execute workflows across multiple SQL Server machines, a concept known as job chaining. Yet it cannot pass information from SQL Server to other servers, operating systems, or applications. Furthermore, there is no way to balance workloads across machines and systems to ensure completion.
- The most pervasive challenge with databases is data friction: the tension between the incredible demand for data access, the difficulty of moving large amounts of data, and the need to mitigate risk. Within that overarching challenge are a few common issues. One worth calling out is security. Many companies struggle to efficiently identify and secure sensitive data, slowing data access and introducing additional risk. Another issue is databases anchoring businesses to on-prem environments. More and more enterprises are making the migration to cloud or hybrid environments, but onboarding databases in the cloud is often a complex and lengthy process that is prohibitive to the kind of speed needed to compete — and win — in today’s Data Economy.
- There are three big issues companies are facing. 1) The first is the need to secure access to the sensitive data in their databases in order to comply with increased data protection legislation and meet demands from customers to prevent data breaches. That’s placed a much bigger emphasis on masking sensitive data in copies of the production database that are used in development and testing. 2) The second is the desire to stop the database being a bottleneck in the development process. The increasing adoption of DevOps means companies are moving away from the big bang releases of the past, where changes were released every three or six months. Now they want to release features faster and that often means the database has to be updated or changed as well. So they want to streamline and speed up the release of database changes, often in step with changes to applications, while still keeping data safe. 3) And the third is the growth in data, which means bigger databases, and an increasing number of databases, all of which need to be monitored constantly so that any glitches can be resolved before they become problems. This is connected to the desire to release changes faster because if a breaking change does reach production, it needs to be spotted immediately, preferably with a monitoring solution which provides a focused set of performance metrics to help pinpoint the cause of the problem in minutes, not hours.
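As a rough illustration of the masking point in the last response, here is a minimal Python sketch of masking a sensitive column in a copy of a database before it is handed to development or testing. It is not any respondent's product or method. The `customers` table, its `email` column, the salt value, and the helper names `mask_email`, `mask_customers`, and `build_demo_copy` are all hypothetical, and an in-memory SQLite database stands in for the production copy.

```python
import hashlib
import sqlite3


def mask_email(email: str, salt: str = "dev-copy") -> str:
    """Replace an email with a deterministic placeholder (hypothetical scheme)."""
    digest = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:12]
    # The .invalid top-level domain is reserved, so masked addresses never resolve.
    return f"user_{digest}@example.invalid"


def mask_customers(conn: sqlite3.Connection) -> int:
    """Mask the email column of the hypothetical customers table in place."""
    rows = conn.execute("SELECT id, email FROM customers").fetchall()
    conn.executemany(
        "UPDATE customers SET email = ? WHERE id = ?",
        [(mask_email(email), row_id) for row_id, email in rows],
    )
    conn.commit()
    return len(rows)


def build_demo_copy() -> sqlite3.Connection:
    """Create an in-memory stand-in for a copy of the production database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (email) VALUES (?)",
        [("alice@corp.test",), ("bob@corp.test",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    copy = build_demo_copy()
    print(mask_customers(copy), "rows masked")
    for (email,) in copy.execute("SELECT email FROM customers ORDER BY id"):
        print(email)
```

Because the placeholder is deterministic, the same input always maps to the same masked value, so joins and uniqueness checks in test data keep working. A salted hash of this kind is only an illustration and should not be assumed to meet any particular data protection requirement.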