Writing VoltDB Apps in Java Q&A
Join the DZone community and get the full member experience.
Join For FreeCurator's Note: The content of this article refers to VoltDB 2.8.2.
NewSQL and NoSQL databases present developers with a new and interesting programming model where there are fewer connections to the database, query parallelism, partitions and an easy model for supporting new clusters. Last month I presented a webinar introducing the basics of application development using VoltDB. We had a lot of great questions and I thought it would be handy to write them up and share the answers with everyone.
Q: Can you address how you would handle joining
tables that have different partition keys? Would you not partition the
tables in that case?
A: For multi-partition queries you join replicated
tables and partitioned tables, and you can join two or more partitioned
tables but they have to be joined on all table’s partition key.
If the query has a where clause that limits the query to a single partition, it can be executed as a single partition query. For single partition queries, you can join any combination of replicated and partitioned tables.
In all cases, consider joining on columns supported by indices (so that they are fast).
There are a number of factors to consider when joining tables in general sense. Should the table be a replicated table, thus local to every partition, or is it too large? For example, I may have a “users” table that contains thousands of users. I may need to map those users to a role and there may be only seven or eight roles within this “roles” table. In this case, it is safe to partition the “users” table and to replicate the “roles” table.
Consider though if your “users” table was joining with an “orders” table. This “orders” table may have thousands or millions of rows, making it a poor candidate for replication. Further, because of business rules it may be necessary to partition the orders on something other than the user but instead on the order ID. A join in this circumstance would result in a multi-partition query.
Q: When writing VoltDB SQL select statement that
makes use of a non-unique index, does all the selected columns need to
be included in the ORDER BY clause, in order to guarantee a
deterministic behavior?
A: This is generally true since the non-unique index
does not guarantee the relative ordering of records with duplicate
indexed columns and different non-indexed columns. However, if ALL of
the selected columns are indexed columns in the selected index, the
columns do not need to be included in the ORDER BY clause.
If there are other (non-indexed) columns, the ORDER BY clause should include all of the selected columns.
For a unique index, the values of any other columns would be completely determined by the indexed column’s ordering.
Q: PostgreSQL has an ARRAY [] cell type, which we extensively use. Does VoltDB has something like that?
A: There is really no comparable type without using
your own custom serialization mechanisms. We are releasing a new JSON
column type in a future revision and JSON supports array fields. You can
pass arrays to stored procedures as well so using the upcoming JSON
type or a custom serializer may become options for you.
Q: Are Java Stored Procedures compatible with Oracle Stored Procedures? If not, do you know how to manage a migration?
A: They are not compatible. We use a number of VoltDB
specific Java classes and consequently cannot run existing Oracle Java
stored procedures.
Managing a migration, regardless of the source DB, is a bit tricky and
is dependent upon your schedule. Under the best circumstances I would
recommend reviewing the procedures first to see which ones make sense to
move to VoltDB. You may find that one or more stored procedures can be
merged into a single stored procedure. Next, I would build unit tests
against the Oracle Java procedures using a known dataset and then create
the VoltDB stored procedures and test them using the unit tests.
Q: Is it possible to use CDC (change data
capture) to replicate VolDB data into an Analytics Database? If not,
what are the alternatives?
A: There are a number of strategies that VoltDB users
use to push data to analytics databases. The first is to use the VoltDB
Export table functionality. Export tables act like persistent queues –
you can add data to them via SQL and an external client polls on that
data through a well-published protocol. This external “export” client
pulls on the data and delivers it to the target external system, ack’ing
the receipt of the data (and thus removing it from the VoltDB Export
table). VoltDB supports (and ships) a flat file export client, a
PostgreSQL client as well as a Hadoop export client out of the box.
Developing new export clients are a fairly simple task. For more
information on the VoltDB Export feature please see: http://blog.voltdb.com/voltdb-export-connecting-voltdb-to-other-systems/
Other more traditional CDC strategies such as time stamping the rows or versioning the rows are easy to accomplish in VoltDB at the application layer by adding the appropriate columns to your tables. There is no internal timestamp per row, for example, in VoltDB.
Q: Is the database size is limited by the size of RAM available? How to manage virtual memory?
A: The database size is limited by the amount of memory
within the cluster. We suggest users disable swapping to avoid
unintentional performance degradations.
Note that virtualized machines often do not allow you to disable swapping or swapping is hidden from the guest OS. In either case, it is a good idea to do throughput testing as other factors can come into play that can negatively impact throughput and latency, such as the quality of the network and the drives assuming you are using either snapshots or command logging.
Q: What is largest implementation in terms of number of tables?
A: There are hundreds of VoltDB applications deployed
throughout the world. While we don’t have exact table count numbers for
those applications, we know of implementations in the 100 to 200 table
range. Additionally, we have customers deploying applications on
upwards of 30 nodes both on bare metal, as well as on Amazon’s cloud.
Q: Hi – we like to do all our builds with Maven. Do you support that?
A: Yes, we do work with maven but our libraries are not
in the central repository. You usually need to manually add two
libraries. The first is the client library if you are working on the
client side and the stored procedure library if you are working on the
server side. The following maven import statements are examples of what I
used in VoltDB 2.8.2:
mvn install:install-file -Dfile=./voltdb-2.8.2.jar -DgroupId=org.voltdb -DartifactId=VoltDB -Dversion=2.8.2 -Dpackaging=jar
mvn install:install-file -Dfile=./voltdbclient-2.8.2.jar -DgroupId=org.voltdb -DartifactId=VoltDBclient -Dversion=2.8.2 -Dpackaging=jar
Q: Is there any recommended way to keep the db schema evolution in order (I.E. as a set of “patches”)? Thank you
A: VoltDB supports catalog updates where you can add
and remove tables. You can also add, remove and update stored
procedures. Over the coming weeks we’ll be adding additional support for
online schema changes such as adding/dropping indexes and columns.
Q: How well VoltDB gels with hibernate/mybatis or other java based persistence framework?
A: We do not currently support Hibernate or MyBatis directly.
Q: You mentioned about the limitation of 50MB
for the result set of a query, does it mean the developer should
predetermine the result set size of the query and not use such?
A: Yes, you need to be sure that your result set is less than 50MB or the query or stored procedure will fail.
Q: VoltDB is described as an in-memory database, but what makes it different from loading a complete Oracle database in memory?
A: We’re dramatically faster than Oracle in memory.
Research at MIT, Brown and Yale have identified over 90% of processing
in traditional databases can be classified as “overhead” leaving a scant
7% processing to the actual “work”. This overhead includes such things
as buffer management, locking, latching and logging. VoltDB was
architected from the ground up to eliminate this overhead by being
in-memory, thus eliminating buffering, and to run each transaction in
parallel and in isolation, thus eliminating the need to latch or lock.
When traditional databases are running in memory, they still carry the
traditional overhead, or baggage.
For more information on the innovative architecture and processing power of VoltDB, please reference the following research papers:
- The end of an Architectural Era: (It’s Time for a Complete Rewrite): http://hstore.cs.brown.edu/papers/hstore-endofera.pdf
- OLTP through the looking glass, and what we found there http://hstore.cs.brown.edu/papers/hstore-lookingglass.pdf
Thanks to everyone who tuned in and especially to those with who sent along some great questions. Don’t forget to check out the forums section on our community site for more information as well. Please join us next week when I will be presenting our next webinar “Writing VoltDB Apps in NodeJS” on October 11th.
Published at DZone with permission of Mike Stonebraker, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
An Overview of Kubernetes Security Projects at KubeCon Europe 2023
-
Tomorrow’s Cloud Today: Unpacking the Future of Cloud Computing
-
Building and Deploying Microservices With Spring Boot and Docker
-
Redefining DevOps: The Transformative Power of Containerization
Comments