DZone Research: Keys to Database Success
Know the business needs you need to fulfill and then worry about availability, scalability, performance, and security.
To gather insights on the current and future state of the database ecosystem, we talked to IT executives from 22 companies about how their clients are using databases today and how they see use and solutions changing in the future.
We asked them, "What are the keys to a successful database strategy?" Here's what they told us:
- Determine needs. Are you sufficiently advanced to unlock the value in the database? Building on top of the relational database allows you to take data and scale across the organization. SQL can speak with all existing tools while business people can use data from Tableau. In a form factor, non-technical users can find value in the data. The pendulum is swinging back from NoSQL. Legacy manufacturing and industrial organizations never got on the NoSQL train so it’s easier for them to go from Oracle or PostgreSQL to another database or platform. If they are using SQL tools like Tableau, it’s an easy transition.
- We’re seeing the unshackling of an application based on the underlying application. CX, risk mitigation, and operational efficiency are the problems to solve, and databases are a piece but not exclusive. Flexibility across processing and in terms of the database itself. There are more than 150 listings of NoSQL databases. What are the precise aspects of the workload I’m developing for, so I can choose the right database: write-intensive versus read-intensive? We extend out the capabilities so you can do read/write on the same database at the same time. Constant intelligent multi-model database with queries across the database while transactional uses are happening. We focus on a document database style, which is important for web and IoT applications.
- Determine on a case-by-case basis. Who the customers are, where they are based, what they are trying to achieve, what they are using for infrastructure. Teradata, Netezza interested in thinking for the future and growing from 0.5 TB to 10 TB or 10 to 100. M&A in the works with Oracle and integrate with the mainframe. Based around the business and what the business is trying to achieve. A bank needs to decide between historical data or giving the trading floor fast response time. Look back to see if we have the solutions we need: data, analysis solutions, size of the load.
- The key with graph databases is all around the use case that ties back to the business. Pharmaceutical drug company wants to know how to roll out to prescribers and who are the right prescribers in the right geography. Able to correlate the data. Deducing fraud between payer, payee, and medical facility. Connect relationships in the data. What is the business outcome and how do you look to get there with what you have today? Financial services compliance has related requirements where every wire transfer needs a phone number attached to it. The account holder may have accounts in three different divisions of the financial institution. Graph pulls data from a variety of sources and correlates it. What is the business problem they’re trying to solve? Plan for the future. Needs will change.
- Understanding functional and non-functional requirements from a business perspective: latency from query launch to response, the freshness of data. Availability is important. Developing an informed opinion on the different ways to classify technology across your portfolio. Understand problems and systems that require storing and retrieving data versus aggregation and filtering (Hadoop) and systems that deal with patterns, context, and causality. Real-time versus transactional and online analytics. Understand how to classify a portfolio of technologies: off-line versus real-time, functional and non-functional. Once you have this, you can start mapping back to technologies.
- 1) How much of the existing stuff do you have? 2) How much of the application forces your choices? WordPress blog = MySQL. When building new apps and microservices, look at the shape of the data, the ultimate size of the data, and the query patterns; this helps direct you to the right path. Analytics with vast amounts of data needs a columnar format, which pushes you to Athena or Redshift. Operational/transactional workloads get pushed towards DynamoDB or Aurora. For key-value lookups, DynamoDB fits well. For accounting, databases like Postgres fit well. Pay attention to data models; do not pick the easiest thing to start with, because it will not scale. A TB of data is very different than a PB of data.
- Businesses are looking at what databases to use. The market is very crowded and it can get very confusing. What does each product do well and when should it be used? Associate the use case with the business requirements. What is the outcome of the exercise? It's important to look at it from a business perspective and then determine the technical requirements. Telecom companies are looking to optimize network performance; location analytics are important. Viewership of high-speed data ingestion with querying important. Associate requirements back to technology.
- A database strategy should be part of an organization’s larger information management and analytics strategy and tied to the business strategy. Speed, scale, security, flexibility, and interoperability are important in every context, but the relative importance and the specific needs for each depend on this broader strategy. There is no one-size-fits-all strategy and no one-size-fits-all solution. Today, I most often see organizations trying to derive more value from data in the context of a digital transformation strategy. Speed is the key here, though speed has several facets. Raw speed in the form of low-latency triggers or real-time transactions is one area. Fast throughput per unit hardware (or dollar) is a second, and it’s about both speed and scale. Rapid time to value is a third; for this, the flexibility and ease of development with a database is what matters.
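The workload-driven selection one respondent describes above (columnar stores for large-scale analytics, DynamoDB or Aurora for operational workloads, DynamoDB for key-value lookups, Postgres for accounting) can be sketched as a simple lookup. This is only an illustration: the `Workload` class, its fields, and `suggest_store` are hypothetical names, and the rules restate that respondent's rules of thumb rather than any vendor guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    kind: str                              # "analytics", "transactional", "key_value", "accounting"
    existing_engine: Optional[str] = None  # engine the application already forces, if any


# Rules of thumb restated from the interview notes; not vendor guidance.
SUGGESTIONS = {
    "analytics": "columnar store (e.g. Athena or Redshift)",
    "transactional": "operational store (e.g. DynamoDB or Aurora)",
    "key_value": "key-value store (e.g. DynamoDB)",
    "accounting": "relational database (e.g. Postgres)",
}


def suggest_store(workload: Workload) -> str:
    # An application that dictates its engine (WordPress = MySQL) overrides other rules.
    if workload.existing_engine:
        return workload.existing_engine
    # Unknown workloads get no guess: the data shape, size, and query patterns come first.
    return SUGGESTIONS.get(
        workload.kind,
        "undetermined: clarify data shape, size, and query patterns first",
    )
```

For example, `suggest_store(Workload("blog", existing_engine="MySQL"))` returns `"MySQL"`, while an unrecognized workload kind deliberately returns no recommendation.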
Availability, Scalability, Performance, and Security
- Database availability is a problem people have a hard time solving simply. Also scalability. Performance is needed by others. Cloud makes databases available and easier to use. Storage, replication, that’s fine if customers are driving to putting everything in the cloud. Organizations still need help with on-prem and can move to hybrid solutions.
- Availability and scalability are key to 80% of our customer base. We are focused on users who have already identified an architecture. People will come to us because we offer one technology they are interested in and as we talk we explore other requirements and drive requirements conversations around highly available technologies.
- Security, scalability, and management of the data. Train staff to learn how to import, export, migrate, and transform data to get it in and out, with security being very important. You cannot grant too many rights to the data, or you become insecure.
- Our primary concerns are data durability and security. Losing data, or having your data compromised, causes irrevocable damage. Something so serious has to be your number one concern. Other important concerns are availability and performance. Ideally, all of these components work in concert to provide a safe, available, and high-performance database solution.
- Customers are in the market for better scaling, high performance, and uptime. Those three things are the core of what Cassandra was built on. Now everyone has the problem because they’re competing with Amazon. Companies realize they need to up their game now.
- Organizations need to take an “architectural” approach to facilitate database automation and scheduling to simplify management of complex end-to-end data workflows, reducing the need for custom scripting while improving productivity and reducing IT operational costs. The architectural approach helps database managers overcome six automation obstacles:
  - Fragmented Scheduling Across Servers: Platform-specific scheduling tools create silos of automation that increase the risk of failure for things like executing scripts or SQL stored procedures that automate key business processes like SSIS packages, data integration, and ETL-type workloads. A workload automation solution unifies these platform-specific tools within a single framework for better control, eliminating the need to license, deploy, and manage multiple tools, and the cumbersome and error-prone practices of scheduling and executing jobs at the database level.
  - Repetitive and Time-Consuming Database Operations: Essential to effective database operations are maintenance functions such as database backups, file system movements, FTP operations, and more. A workload automation solution can enable management via a single solution, improving batch success rates while reducing runtimes, improving resource availability by dynamically balancing workload execution across multiple databases and platforms, and providing centralized monitoring and alerts for faster resolution of problems.
  - Complex Database Growth and Change: Implementing new technologies as the enterprise grows is a key element for competitive and efficient operations. Creating, modifying, and testing scripts is time-consuming and resource-intensive, which can inhibit the ability to update existing workflows or incorporate new ones. The architectural approach to automation leverages production-ready templates and workflows to reduce reliance on custom scripting, providing an object-based architecture that emphasizes reusability, simplifies the modification of existing workloads, and accommodates data growth and complexity by simplifying data integration, ETL, data warehousing, and database processes.
  - Security Vulnerability: Organizations can’t afford to risk unauthorized changes to production processes and workloads, potentially resulting in the release of sensitive information. In regulated industries, for example, any unauthorized change to a production workflow involving patient data or financial account information can be devastating. A workload automation solution establishes user-based roles, providing a platform to prevent unauthorized changes and giving database managers the ability to monitor and audit changes for security compliance.
  - Virtual and Cloud Complications: Virtual and/or cloud-based systems encounter challenges when numerous machines on the same network need to be managed. Further complications occur when the environment is composed of a heterogeneous combination of on-premise, virtual, and/or cloud computing. These platforms come equipped with their own native job schedulers, but just like physical databases, they represent silos of automation that prevent dynamic management and provisioning of virtual and/or cloud instances. An architectural solution provides a single point of control for virtual and physical computing platforms, reduces IT operational costs by automatically provisioning and de-provisioning virtual/cloud resources based on workload execution, and automates desktop virtualization and administrative processes that involve virtualized assets.
  - Interdepartmental Orchestration Difficulties: As automation is frequently moving beyond the data center, and even beyond the IT department, transitions among departments can create unnecessary additional work. For example, when one department finishes part of a high-level business workflow, the information and data are handed off to another department, which often requires manual dependencies and intervention. The architectural approach offers tools for coordination, often eliminating many interdepartmental dependencies involved in handing off workflow checkpoints.
  A simple, single point of control for database automation can assist managers with repetitive tasks and free staff for more advanced projects that provide greater business value to the enterprise.
- The key to a successful database strategy is to use the right tool (database) for the right job. There is a multitude of databases out there: OLAP/OLTP, SQL/NoSQL, relational/non-relational, in-memory/on-disk, graph, KV stores, etc. Some database vendors claim they “can do it all,” which is simply not feasible. For example, using an OLAP columnar database for a CRUD-heavy transactional workload just does not make sense. For any modern application, speed, scale, low latency, security, and built-in operational logic are must-haves:
  - In-memory: Over recent years memory has gotten more affordable than ever before, and undoubtedly memory is considerably faster than disk.
  - SQL & Key/Value functionality: SQL is the most comprehensive and widely used database language, but for newer apps having the Key/Value (K/V) functionality helps, so ideally a database that can offer both enables a developer/DBA to start with K/V functionality and move to relational SQL as the app becomes more complex over time.
  - Speed & Scale: Similarly, while initially speed (especially for high throughput/fast data) and scale may not be an immediate requirement for new apps/start-ups, as your app becomes popular and traffic grows, fast data will inevitably become a reality. It is crucial to consider the speed at scale and future-proof your database investment from the get-go.
  - Low Latency: With the rise of the digital economy comes intense competition, and consumers of apps are demanding instant gratification more than ever before. In most mission-critical enterprise and consumer apps, the latency of even a few milliseconds is just not acceptable and leads to app abandonment and loss of revenue/market share. Choosing a low-latency database is more important now than ever before.
  - Operationalize Machine Learning: Legacy database technology was focused on analyzing historical data to gain a rear-view understanding of business performance. While it is important to analyze where the business is coming from, in order to gain the competitive advantage and differentiate your application it is critical to utilize deep learning and take action in-event, with the end goal of driving desirable business outcomes. Building a robust predictive analytics model is only half the battle. Utilizing the model in production for real-time decision making is the key element of machine learning. To operationalize machine learning models, you need a database that can deploy a PMML model built in Apache Spark (or other popular data science tools/languages) as a Java-based stored procedure.
- All of those things. However, I’d focus on data management, speed, and security. Firstly, data management is critical, particularly when it comes to hybrid environments and test data management. From there, speed is more important than ever. To stay competitive, businesses need to release applications at a faster and faster clip. Data is often the No. 1 bottleneck preventing them from releasing as frequently as they’d like to. At large enterprises, it often takes days or even weeks to deliver data to application teams. The bottom line is, developers and testers simply can’t work in the absence of data. Companies need to bring speed and automation to the data layer in order to stay competitive. And finally, data access can’t compromise your database’s security. Technologies like masking allow you to automatically identify and replace sensitive data with a realistic, non-sensitive equivalent. This means you can access and use data to power innovation without increasing risk.
- We think the key is to keep data safe while being able to release updates to the database faster to take advantage of new features. This may appear to be a paradox, but introducing practices like version control, continuous integration, and automated releases to database development helps to protect data while enabling changes to be released quicker. There is one source of truth in development, for example; changes are tested at the point they are made, so errors are caught earlier; and human error is removed from deployments.
- Time-series databases are required to have high ingestion and query rates and ability to support complex mathematical functions. Our purpose-built modern time-series platform can handle: 1) New workload requirements which require a high volume of real-time writes and require solutions that can handle both irregular (events) and regular (metrics). 2) Time-based data and time-based queries which can handle aggregation, summation, range scans and other functions such as ordering, ranking and limiting. 3) High-level scalability and availability with distributed design to provide fast and consistent response time with an always available architecture.
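The time-based query functions named in the last item above (range scans, aggregation, summation, ordering, ranking, and limiting over irregular events) can be illustrated in a few lines. This is a minimal in-memory sketch, not how any particular time-series platform implements them; `bucket_sum`, `top_buckets`, and the sample `events` are hypothetical.

```python
from collections import defaultdict


def bucket_sum(points, start, end, width):
    """Range-scan points in [start, end) and sum values per fixed-width time bucket.

    points: iterable of (timestamp_seconds, value); timestamps may be irregular.
    Returns a list of (bucket_start, total) ordered by time.
    """
    totals = defaultdict(float)
    for ts, value in points:
        if start <= ts < end:                                  # range scan
            bucket = start + ((ts - start) // width) * width   # align to bucket start
            totals[bucket] += value                            # aggregation (summation)
    return sorted(totals.items())                              # ordering by time


def top_buckets(buckets, limit):
    """Rank buckets by total, highest first, and keep only `limit` of them."""
    return sorted(buckets, key=lambda item: item[1], reverse=True)[:limit]


# Irregular sample events (hypothetical data): (timestamp, value)
events = [(0, 1.0), (7, 2.0), (12, 4.0), (31, 8.0), (95, 16.0)]
buckets = bucket_sum(events, start=0, end=60, width=10)
print(buckets)                  # [(0, 3.0), (10, 4.0), (30, 8.0)]
print(top_buckets(buckets, 2))  # [(30, 8.0), (10, 4.0)]
```

A purpose-built engine would do this over a distributed, indexed store; the sketch only shows what the query semantics look like.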
How to coach clients through developing and deploying on a chosen platform. DevOps aspects. Write code optimized for a particular platform. Customers are using multiple database platforms. They need to think about how to take advantage of the elasticity of cloud platforms.
- The biggest “aha” moment is when organizations realize they left the database behind while focusing on software delivery. You have to think about the full stack. Treat the database as a first-class citizen when automating software delivery. A lot of teams have a manual process for database releases. You need to automate the process. It’s a bigger shift to have cross-functional teams include a database professional who brings perspective to the team. Bringing DevOps to the database.
- Provide coaching on how to assemble, stage, and move from transactional systems to the cloud into a data lake in the cloud. First landing is to the object store, a low-cost storage layer and then provision out based on needs. Assemble into production-ready data for analytics use cases. Moving to Amazon S3, create a cloud data mart in Snowflake. Provide guidance as the organization moves to the cloud, think about how to handle updating datasets and provision subsets out as needed.
- Over the years, enterprises have invested significant financial and human resources in their databases. Consequently, any move to new technologies is undertaken with much caution and due diligence. In this context, the best database adoption strategy is to adopt new and modern databases for new workloads as well as introduce them in places where the traditional databases no longer meet the needs. The microservice architecture that is prevalent now easily allows for this to happen as enterprises modernize their infrastructure by moving away from monolithic applications running on traditional databases to microservices running on NoSQL databases, wherever necessary. These microservices can then be developed and modified with agility and deployed at any scale on-prem or in the cloud.
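Automating database releases, as respondents recommend when they talk about bringing DevOps to the database, often starts with versioned schema migrations kept in source control. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine; the `MIGRATIONS` list, the `schema_version` table, and the function names are assumptions for illustration, not any specific tool's format.

```python
import sqlite3

# Hypothetical ordered migrations; in practice these would live in version control.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the highest applied migration number, or 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the recorded version; return the versions applied."""
    applied = []
    version = current_version(conn)
    for number, statement in sorted(migrations):
        if number > version:
            # The context manager commits the version record on success.
            # Transactional guarantees for DDL statements vary by engine.
            with conn:
                conn.execute(statement)
                conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
            applied.append(number)
    return applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # [1, 2] on a fresh database
print(migrate(conn))  # [] because re-running is a no-op
```

Because each run applies only the migrations that are missing, the same script can run unattended in a delivery pipeline against development, test, and production databases.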