Uses for MariaDB and the Spider Storage Engine
Uses for MariaDB and the Spider Storage Engine
Take a look at some different use cases for MariaDB's Spider storage engine using a single table link and across multiple servers.
Join the DZone community and get the full member experience.Join For Free
Spider is a storage engine for the MariaDB Platform that allows you to build distributed databases from a standard MariaDB setup. The technology is not complicated, although the implementation is. This blog will explain how the Spider storage engine works, what it does and will also show some of the use cases.
MariaDB Storage Engines
Before we look at the Spider storage engine, let's have a quick look at the storage engine concept. A storage engine is the implementation of code that manages the low level of data access in MariaDB. The storage engine handles things such as reading and writing data, row-level locking, if supported, multi-versioning and transaction management, among other things.
Storage engines are defined on a table-by-table basis, and once a table is created and defined to use a particular storage engine (using the
ENGINE Table attribute) it will generally be treated just like any other tables. Tables using different engines, once created, are handled as equal and can be joined, have data selected from one into another, etc. Tables may also have the storage engine changed after the table has been created.
You may also enjoy: The State of the Storage Engine
Spider From a High Level
What Spider provides, first and foremost, is a way to access a table on one MariaDB Server from another MariaDB Server. The MariaDB Server that holds the actual table data does not have any specific Spider code on it at all; it is a normal MariaDB Server. The MariaDB Server that is configured to access that data then uses the Spider storage engine to access the data on the other server using the usual MariaDB protocol.
As can be seen, Spider is only active on the referencing node; the target node does not need to have Spider installed on it. Creating a "spider table" means that we define a table that contains the same, or a subset of, the columns in the target table and referencing the target server.
Also, note that there is no data for these tables on the "spider node" and there is no duplication of data, all data resides on the target node.
Installing the Spider Storage Engine
Spider is included with MariaDB Server and there is also an installation script that installs the Spider engine and also some utilities are supporting objects and it is recommended to install spider using this script. If you have installed MariaDB as an RPM this spider installation script is in /usr/share/mysql and is called install_spider.sql. To run it use the MariaDB MySQL command-line tool and use the source command, like this.
After running the script check if Spider is installed using the
SHOW ENGINES command:
We are now ready to start using the Spider engine.
Single Table Link with Spider
Let's look at an example, a very basic one. On the "target" server, we create a table. Note that on this server, Spider isn't necessary; this is only required on the server that accesses remote data, and we will get to that next. So now we create a table on the "target" server, let's call that "Server 2:"
Following this, let's insert some data into this table:
I admit that this wasn't the most exciting experiment since Marie Curie invented radium, but we are not done yet. What we are going to do now is to access this table from another instance of MariaDB Server. As Spider connects to the remote server as a normal user, we need to create a user and grant it some access to the table we created above, on the same MariaDB Server instance as above:
The next step is to create a
SERVER . If you haven't used Spider you have probably not used the command before, but what it does is that it defines the parameters that are used to connect to another MariaDB Server instance, so the server is defined on the MariaDB Server instance that is to access the table we create above (let's call that Server1).
With this in place, let's create a link using Spider from Server1 to Server2. Note that we do not have to use all the fields in the target table.
OK, no errors, then try it by
SELECTing some data:
Yes, this seems to work. Still Marie Curie beats us in the excitement department, but we are getting there.
Uses for a Single Table Link
There are quite a few uses for even a single table link such as the one described above. In some cases it can be used to replace replication where it is just a single table that is to be replicated. Maybe you have a single table that is maintained in one database instance such as a customer table or something and then you want a reference to a customer ID from a completely different application running in some other MariaDB Server instance.
One needs to understand that there are drawbacks to this kind of setup; for example, performance can be an issue, because for every statement that uses a spider table a connection is made to the connected server. A join between a local table and a Spider table can also be slow, but that depends on a few things. In general, Spider is pretty performant and is not a bottleneck per se and there is quite a bit of tuning possible.
A third option is when we have a single table, for example a log table, that I don't want to mix with any other tables. Then a Spider table might be a way to achieve that.
Using Spider for Status Across Multiple Servers
If you have a MariaDB Cluster running, with a primary and several secondaries for example, then there is probably a need to see the status of all the servers in the cluster as one unit. Spider comes in handy. Here we are to look at how that can be done. We have two servers, moe and homer, that we want to monitor. Let us assume that moe is the "main" server and that we from that box want to look at the global status on both servers.
On homer we create a view for
GLOBAL_STATUS which is based on the GLOBAL STATUS table in
INFORMATION_SCHEMA , but adds a column with the name of the server and put that in the MySQL database.
And then we do a similar thing on moe.
The next step is to create a link from what we consider the master (moe) to the other server (homer). So on moe, we execute this command:
With this, we are ready to create a view on moe for viewing the status on homer.
On homer, we now have one table and one view, one for each server and with similar schema, that represent the status across the cluster. Let's combine these two into a single VIEW.
And then a
VIEW that shows the summary status across the cluster.
Let's give this a shot now and see how it works:
I think this is useful, although this is a simple example. With more servers than these two it is a lot more useful.
Combining Multiple Server Tables in the General Case
In the above example, we look at how tables with the same structure with similar content can be combined into one single view, using Spider. We look at a specific example there, but in general, there are uses for this. Let's say that you have an application that runs in multiple instances, say some ERP application that is used by multiple departments. If you then want to do reporting across the departments from a reporting server, then you can use this model to access all the instances of this application.
The alternative to using Spider is to use multi-source replication, but that requires a lot more data to be stored in the reporting server, so Spider does have some advantages.
Sharding with Spider
Sharding is the most common use case for Spider. The examples I described above are less common, but still useful. In the cases above, I mapped a table on one server to a table on another machine. With Spider used for a partitioned table on one server, each partition exists on a separate server. Except for this, there aren't many differences in practice, although the partitioning use case allows Spider to do some interesting things and Spider has some extra performance-enhancing tricks up its sleeve, for this particular use case.
To show how sharding with Spider works, let's show a very simple example. We are going to show how to set up sharding with two shards only, just to show the principle. Let us use the customer table where we have a total of three servers, two "data" servers with the data for two shards and one server than is the "Spider" server that hold no actual data for the table we are working with, but instead points to data that resides on the other two servers.
Let's start at the bottom, which is to create the tables we are going to use in MariaDB Servers Server2 and Server3 (this looks similar to the customer table above, but not exactly so). This is executed of both of these two Servers as user root
Now we have two shards created, so let's then create links so we can reach them from MariaDB Server1, so on Server1 execute this SQL statement, replacing my port, host, username, and password with something relevant in your case.
Let us then tie it together with a partitioned table, and note that you can use any reasonable partitioning scheme here, I just picked a simple one to make a point.
We can now on Server1 insert some data and see it appear as appropriate on Server2 and Server3.
Based on our partitioning set up, the three first row goes onto Server2 and the last three to Server3. I would like to point out before I finish this section that the individual shards, as they are common MariaDB instances, without the Spider engine, so they can be accessed shard by shard by connecting to, in this case, Server2 or Server3.
Uses for Sharding with Spider
The most obvious advantage of sharding is to increase performance when working with large datasets. But with Spider there are other advantages, most coming from the facts that the individual shards are plain MariaDB Servers that can be accessed individually, so the dataset can be looked at shard by shard, without any performance bottleneck, or you can look at it as a whole, using Spider, at the same time.
This blog has presented some uses for the Spider Storage Engine, all useful in their own right. Most documentation around Spider has focused on the sharding use case, so the first two uses might be surprising to some.
Published at DZone with permission of Anders Karlsson , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.