DZone Data Persistence Trend Report: Leaders in Tech
Members of the Oracle Executive Team offer critical advice on data persistence and management.
The world of data persistence and management is constantly shifting and evolving. For developers, staying informed of technology trends and best practices for harnessing simple, accessible data is critical.
We sat down with members of the executive team at Oracle to discuss our research findings and gather advice for the developer community. Jenny Tsai-Smith, Vice President of Overall Database Product Management, and Tirthankar Lahiri, Senior Vice President of Data and In-Memory Technologies, explain the importance of simplicity, consistency, and more so that developers can make better use of their organization’s data.
To read the full interview and DZone-exclusive research, check out our latest Trend Report, Data Persistence.
Jenny & Tirthankar’s Advice to Developers
- Follow the right example. Examples are great. But instead of always following the architecture and scale of web giants like Facebook, look at companies with similar needs and requirements.
- Don’t underestimate the power of SQL. All members of your organization who are handling data and/or databases regularly should be fluent in SQL and understand the basics.
- Continue to focus on consistency and availability. Of the three properties in the CAP theorem, consistency and availability matter most for delivering efficient, reliable applications.
- Be aware of long-term consequences. Developers often are very tactical in solving problems, without a focus on long-term consequences. This can lead to inherent inefficiencies throughout the application.
Two-thirds of respondents reported a frequent mismatch between the DBMS’ ability to model the domain and its ability to read/write performantly. What’s your advice for tackling that mismatch and closing the gap in performance degradation?
Jenny: There isn’t really any business process or any business data that modern databases can't model. The problem lies in the fact that the users of the systems may not have the knowledge or experience to use database systems correctly, and in particular, we're talking about configuration of the interactions with the database management systems. So the performance problems that they're experiencing are not so much that the data model can't be accommodated by the database management system, but it's that they haven't configured the system correctly.
Tirthankar: In a lot of applications today, developers take short-term approaches as they try to solve their immediate problem(s). But what they do is create infrastructure that doesn't extend easily, and then people come in and try to enhance the same application; they end up with a lot of long-term problems. One classic example is an application that interacts with the database for every single data access operation instead of pushing work down in much larger chunks to allow the database to process data most efficiently. For instance, some applications join data between different tables within the application itself, instead of letting the database perform the join much more efficiently.
I've helped many customers who think they have database problems, but actually, they have application design issues, such as the improper management of database interactions. The thing is, modern enterprise-grade relational databases are extremely sophisticated and have a lot of very advanced capabilities. What people often get tripped up by is the fact that while you can certainly get going with basic design principles and not be very sophisticated, it is still important to design upfront for scale and performance. So, study the capabilities that exist within your database and model your data properly — and know how to engineer the database accesses so that you interact optimally with the database.
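The application-side join described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than any Oracle product, and the `customers` and `orders` tables, their columns, and the sample rows are all hypothetical. The first function issues one query per row and stitches the result together in the application; the second pushes the whole join down to the database in a single statement.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total REAL NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders (id, customer_id, total) VALUES
            (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
        """
    )
    return conn


def join_in_application(conn: sqlite3.Connection) -> list:
    """Anti-pattern: one extra query per order row, joined in Python."""
    orders = conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id"
    ).fetchall()
    rows = []
    for order_id, customer_id, total in orders:
        # A separate database round trip for every single order.
        name = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        rows.append((order_id, name, total))
    return rows


def join_in_database(conn: sqlite3.Connection) -> list:
    """Preferred: one statement; the database performs the join."""
    return conn.execute(
        """
        SELECT o.id, c.name, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(join_in_application(conn) == join_in_database(conn))
```

Both functions return the same rows. The difference is the number of database interactions: the first grows with the number of orders, while the second stays at one regardless of table size.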
Senior software professionals (>5 years of experience) are somewhat more likely to write SQL manually more frequently than junior professionals. What are your thoughts on manual SQL writing? Who in the organization should be tackling SQL and why?
Jenny: SQL is a really simple yet powerful language. Everybody who's studying computer science should learn it and gain some familiarity with it, because a single SQL statement can do work that takes many lines of code in most programming languages. The other thing is the reliance on tools that generate SQL statements; this really is not a good thing, because most of these tools generate sub-optimal SQL. The generated statements have to be generic so that they work with any number of database systems. So it's really better for people to learn SQL and understand the basics of it. Developers should definitely learn SQL and be fluent in it, and data analysts and data scientists should probably gain some understanding of it as well.
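As a rough illustration of the point about one SQL statement replacing many lines of code, the sketch below computes the same report twice: once with a hand-written loop and once with a single `GROUP BY ... HAVING` query. It uses Python's built-in sqlite3 module; the `sales` table, the `SALES` sample data, and both function names are assumptions made up for this example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale amount).
SALES = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("east", 50.0),
    ("south", 45.0),
]


def totals_in_python(rows, minimum):
    """Procedural version: accumulate, filter, and sort by hand."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    kept = [(region, total) for region, total in totals.items() if total >= minimum]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def totals_in_sql(rows, minimum):
    """Declarative version: one statement groups, filters, and sorts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    return conn.execute(
        """
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
        """,
        (minimum,),
    ).fetchall()


if __name__ == "__main__":
    print(totals_in_python(SALES, 100.0) == totals_in_sql(SALES, 100.0))
```

The table setup here exists only to make the sketch self-contained; in a real application the data would already live in the database, and only the single `SELECT` would remain.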
3NF, including 3.5NF, is the most popular approach to relational data normalization. Are you surprised by this finding? Any concern around such high usage here? And what’s your advice for those considering alternatives?
Jenny: Real-world applications do require some discipline and data modeling, so this finding is pretty much expected. But when I used to teach data modeling, one of the things I told people was not to get too carried away; it's not a religion, so you have to look at what you're trying to do and what you're trying to accomplish. In some cases, maybe you don't have to go to 3.5 normal form. Maybe the third normal form is fine. Just think about what it is that you're trying to accomplish. But in general, we do see that applications built with a data modeling discipline up to third normal form are better suited for long-term use.
Tirthankar: It is, unfortunately, common for applications to implement their own metadata. For instance, they may embed a picklist of values for a column within the tool, such as the list of names for a “universities” column. The problem is that other tools or applications cannot easily share that same value metadata, and different applications may even spell the same university in different ways, leading to data divergence (e.g., “UCLA” vs. “University of California, Los Angeles”). A third normal form or 3.5NF design centralizes the list of university names in a shared database table so that applications use references to values in that table and thus view the data consistently. This is a good design practice. It lets the application evolve more easily. Otherwise, we can end up with fragmented metadata and downstream anomalies across all the different applications that interact with the database.
And for people considering alternatives, consider carefully. It’s okay to deviate from the normal form in certain cases, but do that with deliberation and be aware of potential downsides. I think the main thing is just to be aware of the consequences when you develop the application and the data models.
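The shared university table described above might look something like the following minimal sketch, written against Python's built-in sqlite3 module. The `universities` and `applicants` tables, their columns, and the helper functions are hypothetical names chosen for illustration. Each applicant row stores a reference to the single canonical university row instead of its own free-text spelling, and the foreign key rejects references to values that are not in the shared table.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create the shared lookup table and a table that references it."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE universities (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE applicants (
            id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL,
            university_id INTEGER NOT NULL REFERENCES universities(id)
        );
        INSERT INTO universities (id, name)
        VALUES (1, 'University of California, Los Angeles');
        """
    )
    return conn


def add_applicant(conn: sqlite3.Connection, full_name: str, university_id: int) -> None:
    """Store a reference to the shared university row, not a copy of its name."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO applicants (full_name, university_id) VALUES (?, ?)",
            (full_name, university_id),
        )


def applicant_universities(conn: sqlite3.Connection) -> list:
    """Resolve each applicant's university name through the shared table."""
    return conn.execute(
        """
        SELECT a.full_name, u.name
        FROM applicants AS a
        JOIN universities AS u ON u.id = a.university_id
        ORDER BY a.full_name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_schema()
    add_applicant(conn, "Sam", 1)
    add_applicant(conn, "Lee", 1)
    print(applicant_universities(conn))
```

Because every application reads the name through the same table, renaming or correcting a university is a one-row update, and an insert that points at a nonexistent university raises `sqlite3.IntegrityError` rather than silently creating a divergent spelling.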
Respondents reported that they use ORMs more frequently now than in the past. Any insight as to why that is? Technology changes, market changes, or something else?
Application developers do look at code examples extensively on GitHub or Stack Overflow. They look at how other people map databases to the application’s needs. Then, they adopt those examples. And, in a sense, that's why this snowball keeps growing. There’s now a bigger body of code that uses our DBMS via tools and frameworks, without developers having to interact directly with it.
From the developer's point of view, it's great because they just deal with objects, and the objects automatically get persisted to and retrieved from the database. However, at some point, you do run out of performance and extensibility. So, again, you can get going quickly with the ORM, especially for transaction processing applications, which is the big draw of ORMs. But there does come a point where you need to do more and write more efficient native SQL to really get things done in a performant fashion, especially for complex applications and analytic reporting.
When looking at Brewer’s theorem regarding CAP tradeoff, we found most respondents reported the prioritization of consistency over availability and (least of all) partition-tolerance. Is this attitude/mindset a problem? What (if any) should be most important?
Jenny: Personally, I'm really glad that consistency is becoming more top of mind. Generally, partition-tolerance is much less of a concern. I asked my son, who's a computer science major, what he thought, and he said, tongue in cheek, “Well, you know you can solve the partition-tolerance requirement by just having one node.”
So here’s an example of why I think consistency is most important, with availability as the second: My daughter decided to redecorate her room. We found this really nice cabinet for sale at Target, and with Target, their website allows you to choose where and when to pick up your merchandise. So we did that, and it said: “Yes. You got it. There’s one left here and you can go pick it up in two hours.” So we did, and we showed up in two hours and it said, “Oh, it's not there anymore.”
What I think happened is that we were foiled by the eventual consistency problem: they took our money but then sold the cabinet to somebody else because the inventory quantities weren't being updated right away. It's not consistent. I personally think it's good that people are looking at consistency and ranking it as the number one concern to solve for. And then the other one is availability. Obviously, if their website wasn't available, then we couldn’t even do business.
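The last-item scenario above can be sketched on a single, strongly consistent database. This is only an illustration using Python's built-in sqlite3 module: it does not model a distributed or eventually consistent store, and the `inventory` table, the `cabinet` SKU, and the `reserve` function are hypothetical. The idea is that the stock check and the decrement happen in one atomic statement, so the same unit cannot be promised twice.

```python
import sqlite3


def build_inventory(quantity: int) -> sqlite3.Connection:
    """Create an in-memory inventory with a single hypothetical item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE inventory (
            sku TEXT PRIMARY KEY,
            quantity INTEGER NOT NULL CHECK (quantity >= 0)
        )
        """
    )
    conn.execute(
        "INSERT INTO inventory (sku, quantity) VALUES ('cabinet', ?)", (quantity,)
    )
    conn.commit()
    return conn


def reserve(conn: sqlite3.Connection, sku: str) -> bool:
    """Reserve one unit; return False when nothing is left to reserve."""
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            # The check (quantity > 0) and the decrement are one statement,
            # so there is no window between reading stock and updating it.
            "UPDATE inventory SET quantity = quantity - 1 "
            "WHERE sku = ? AND quantity > 0",
            (sku,),
        )
        return cursor.rowcount == 1


if __name__ == "__main__":
    conn = build_inventory(1)
    print(reserve(conn, "cabinet"))  # first buyer
    print(reserve(conn, "cabinet"))  # second buyer
```

With one unit in stock, the first call succeeds and the second is refused, which is the behavior a shopper would expect from the pickup order in the story.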
Tirthankar: Jenny said it right: Consistency and availability are what people really care about. There are very, very few applications today that really need partition-tolerance. Globally spanning geo-distributed applications may care about partition-tolerance, but one thing I think most developers are realizing is that not everybody needs that type of scale. The problems of planet scale are typically not the same problems most developers encounter. Most developers need to design for efficiency, availability, and correctness. So the fact that there is a focus on C and A makes a lot of sense.
However, from what I have heard about Google, they too walked away from eventual consistency (required for the P, partition-tolerance, in the CAP theorem), saying that it's great for performance but horrible for developers. If you can't reliably read what you just wrote into the database, it makes application development very, very challenging.
When looking at the future for data persistence and management, are there any important developments we’ve missed? What do you think is the most important thing to keep in mind over the next 6-12 months in terms of data persistence? And what’s your advice to the developer community?
Jenny: My advice for developers is simplicity. So one of the things we talked about earlier is the impact of speed of change and new business opportunities. This includes changes that we can't predict like the pandemic. Simplicity is something that you can achieve if you look for products that can be more than just that one piece of the puzzle. Oracle’s strategy is to provide customers with what we call a converged database approach. By this, we mean the ability to support any data type, data model, or any workload type — at the same time with the same database engine.
The good thing there is that you eliminate the cost of integrating a bunch of different data stores and/or data engines. You also remove complexity, and complexity often introduces security and availability problems. Additionally, there are no training costs. I know developers. They like to learn new things. But if you're an IT manager or CTO, you want to be able to use the same resources for many different things. And so, by limiting the number of database engines that you're bringing in-house to get the work done, you are minimizing the investment cost and learning curve for introducing new and different sets of skills. So that's the main thing that I want to promote to developers: simplicity.
Tirthankar: Simplicity is the key to everything. Simplicity wins in the long term. And a lot of applications that I see today are tactical — they'll adopt some technology to solve a specific problem. And over time, that ecosystem develops this proliferation of different technologies, one for each new problem they solve. Security becomes especially problematic when they do that because each of these engines has its own independent set of security vulnerabilities. So while adopting single-purpose technologies often does solve the immediate problem, it also creates long-term headaches.
I look at some of these application designs and I often think, “Look, you guys are trying to follow the process of building a Formula One racecar when you are really trying to build a street-legal passenger-carrying sedan.”
Formula One racecars are great. I love Formula One racing. However, I know that much of its technology doesn't really belong in my car. I do love fast cars, but the car I drive has to also fulfill other requirements — such as safety, reliability, efficiency.
The thing is to be realistic about the examples you follow. Examples are good, and it's always fun to look at extreme-scale architectures — there’s nothing wrong with them. But make sure the examples you follow are based on needs and requirements similar to what you need to meet. And keep it simple.
Vice President, Overall Database Product Management
Since joining Oracle in 1993, Jenny Tsai-Smith has held leadership roles spanning technical support, content development, education delivery, plus Oracle Cloud acceleration of startups and scientific research. As the leader for database product management, Jenny works with release and development management to take products and services from design through development to production. Her team runs the customer advisory board, drives technology adoption partners, performs field enablement, assists with migrations to Oracle Database, and works directly with a wide range of customers. She holds a BS in Biology from Stanford University.
Senior Vice President of Data and In-Memory Technologies, Oracle Database
Tirthankar has 25 years of experience in the database industry and has worked extensively in a variety of areas, including manageability, performance, scalability, high availability, caching, distributed concurrency control, in-memory data management, and NoSQL architectures. He has 45 issued patents, in addition to several pending. He holds a B.Tech in Computer Science from the Indian Institute of Technology (Kharagpur) and an MS in Electrical Engineering from Stanford University.