Writing VoltDB Apps in Java Q&A
Join the DZone community and get the full member experience.
Join For FreeEditor's Note: This post was originally authored by Andrew Wilson from the VoltDB blog.
NewSQL and NoSQL databases present developers with a new and
interesting programming model where there are fewer connections to the
database, query parallelism, partitions and an easy model for supporting
new clusters. Last month I presented a webinar introducing the basics
of application development using VoltDB. We had a lot of great questions
and I thought it would be handy to write them up and share the answers
with everyone.
Q: Can you address how you would handle joining
tables that have different partition keys? Would you not partition the
tables in that case?
A: For multi-partition queries you join replicated
tables and partitioned tables, and you can join two or more partitioned
tables but they have to be joined on all table’s partition key.
If the query has a where clause that limits the query to a single
partition, it can be executed as a single partition query. For single
partition queries, you can join any combination of replicated and
partitioned tables.
In all cases, consider joining on columns supported by indices (so that they are fast).
There are a number of factors to consider when joining tables in
general sense. Should the table be a replicated table, thus local to
every partition, or is it too large? For example, I may have a “users”
table that contains thousands of users. I may need to map those users
to a role and there may be only seven or eight roles within this “roles”
table. In this case, it is safe to partition the “users” table and to
replicate the “roles” table.
Consider though if your “users” table was joining with an “orders”
table. This “orders” table may have thousands or millions of rows,
making it a poor candidate for replication. Further, because of business
rules it may be necessary to partition the orders on something other
than the user but instead on the order ID. A join in this circumstance
would result in a multi-partition query.
Q: When writing VoltDB SQL select statement that
makes use of a non-unique index, does all the selected columns need to
be included in the ORDER BY clause, in order to guarantee a
deterministic behavior?
A: This is generally true since the non-unique index
does not guarantee the relative ordering of records with duplicate
indexed columns and different non-indexed columns. However, if ALL of
the selected columns are indexed columns in the selected index, the
columns do not need to be included in the ORDER BY clause.
If there are other (non-indexed) columns, the ORDER BY clause should include all of the selected columns.
For a unique index, the values of any other columns would be completely determined by the indexed column's ordering.
Q: PostgreSQL has an ARRAY [] cell type, which we extensively use. Does VoltDB has something like that?
A: There is really no comparable type without using
your own custom serialization mechanisms. We are releasing a new JSON
column type in a future revision and JSON supports array fields. You can
pass arrays to stored procedures as well so using the upcoming JSON
type or a custom serializer may become options for you.
Q: Are Java Stored Procedures compatible with Oracle Stored Procedures? If not, do you know how to manage a migration?
A: They are not compatible. We use a number of VoltDB
specific Java classes and consequently cannot run existing Oracle Java
stored procedures.
Managing a migration, regardless of the source DB, is a bit tricky and
is dependent upon your schedule. Under the best circumstances I would
recommend reviewing the procedures first to see which ones make sense to
move to VoltDB. You may find that one or more stored procedures can be
merged into a single stored procedure. Next, I would build unit tests
against the Oracle Java procedures using a known dataset and then create
the VoltDB stored procedures and test them using the unit tests.
Q: Is it possible to use CDC (change data
capture) to replicate VolDB data into an Analytics Database? If not,
what are the alternatives?
A: There are a number of strategies that VoltDB users
use to push data to analytics databases. The first is to use the VoltDB
Export table functionality. Export tables act like persistent queues -
you can add data to them via SQL and an external client polls on that
data through a well-published protocol. This external “export” client
pulls on the data and delivers it to the target external system, ack’ing
the receipt of the data (and thus removing it from the VoltDB Export
table). VoltDB supports (and ships) a flat file export client, a
PostgreSQL client as well as a Hadoop export client out of the box.
Developing new export clients are a fairly simple task. For more
information on the VoltDB Export feature please see: http://voltdb.com/company/blog/voltdb-export-connecting-voltdb-other-systems
Other more traditional CDC strategies such as time stamping the rows or
versioning the rows are easy to accomplish in VoltDB at the application
layer by adding the appropriate columns to your tables. There is no
internal timestamp per row, for example, in VoltDB.
Q: Is the database size is limited by the size of RAM available? How to manage virtual memory?
A: The database size is limited by the amount of
memory within the cluster. We suggest users disable swapping to avoid
unintentional performance degradations.
Note that virtualized machines often do not allow you to disable
swapping or swapping is hidden from the guest OS. In either case, it is a
good idea to do throughput testing as other factors can come into play
that can negatively impact throughput and latency, such as the quality
of the network and the drives assuming you are using either snapshots or
command logging.
Q: What is largest implementation in terms of number of tables?
A: There are hundreds of VoltDB applications deployed
throughout the world. While we don’t have exact table count numbers for
those applications, we know of implementations in the 100 to 200 table
range. Additionally, we have customers deploying applications on
upwards of 30 nodes both on bare metal, as well as on Amazon’s cloud.
Q: Hi - we like to do all our builds with Maven. Do you support that?
A: Yes, we do work with maven but our libraries are
not in the central repository. You usually need to manually add two
libraries. The first is the client library if you are working on the
client side and the stored procedure library if you are working on the
server side. The following maven import statements are examples of what I
used in VoltDB 2.8.2:
mvn install:install-file -Dfile=./voltdb-2.8.2.jar -DgroupId=org.voltdb -DartifactId=VoltDB -Dversion=2.8.2 -Dpackaging=jar
mvn install:install-file -Dfile=./voltdbclient-2.8.2.jar -DgroupId=org.voltdb -DartifactId=VoltDBclient -Dversion=2.8.2 -Dpackaging=jar
Q: Is there any recommended way to keep the db schema evolution in order (I.E. as a set of "patches")? Thank you
A: VoltDB supports catalog updates where you can add
and remove tables. You can also add, remove and update stored
procedures. Over the coming weeks we’ll be adding additional support for
online schema changes such as adding/dropping indexes and columns.
Q: How well VoltDB gels with hibernate/mybatis or other java based persistence framework?
A: We do not currently support Hibernate or MyBatis directly.
Q: You mentioned about the limitation of 50MB
for the result set of a query, does it mean the developer should
predetermine the result set size of the query and not use such?
A: Yes, you need to be sure that your result set is less than 50MB or the query or stored procedure will fail.
Q: VoltDB is described as an in-memory database, but what makes it different from loading a complete Oracle database in memory?
A: We’re dramatically faster than Oracle in memory.
Research at MIT, Brown and Yale have identified over 90% of processing
in traditional databases can be classified as “overhead” leaving a scant
7% processing to the actual “work”. This overhead includes such things
as buffer management, locking, latching and logging. VoltDB was
architected from the ground up to eliminate this overhead by being
in-memory, thus eliminating buffering, and to run each transaction in
parallel and in isolation, thus eliminating the need to latch or lock.
When traditional databases are running in memory, they still carry the
traditional overhead, or baggage.
For more information on the innovative architecture and processing
power of VoltDB, please reference the following research papers:
- The end of an Architectural Era: (It’s Time for a Complete Rewrite): http://hstore.cs.brown.edu/papers/hstore-endofera.pdf
- OLTP through the looking glass, and what we found there http://hstore.cs.brown.edu/papers/hstore-lookingglass.pdf
Thanks to everyone who tuned in and especially to those with who sent along some great questions. Don’t forget to check out the forums section on our community site for more information as well. Please join us next week when I will be presenting our next webinar “Writing VoltDB Apps in NodeJS” on October 11th.
Published at DZone with permission of Mike Stonebraker, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments