DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Harnessing the Power of AWS Aurora for Scalable and Reliable Databases
  • Enhancing Performance: Optimizing Complex MySQL Queries for Large Datasets
  • SQL Commands: A Brief Guide
  • Non-blocking Database Migrations

Trending

  • Comprehensive Guide to Property-Based Testing in Go: Principles and Implementation
  • Breaking Bottlenecks: Applying the Theory of Constraints to Software Development
  • Stateless vs Stateful Stream Processing With Kafka Streams and Apache Flink
  • Doris: Unifying SQL Dialects for a Seamless Data Query Ecosystem
  1. DZone
  2. Data Engineering
  3. Databases
  4. Uses for MariaDB and the Spider Storage Engine

Uses for MariaDB and the Spider Storage Engine

Take a look at some different use cases for MariaDB's Spider storage engine using a single table link and across multiple servers.

By 
Anders Karlsson user avatar
Anders Karlsson
·
Mar. 12, 20 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
14.5K Views

Join the DZone community and get the full member experience.

Join For Free

Spider is a storage engine for the MariaDB Platform that allows you to build distributed databases from a standard MariaDB setup. The technology is not complicated, although the implementation is. This blog will explain how the Spider storage engine works, what it does and will also show some of the use cases.

MariaDB Storage Engines

Before we look at the Spider storage engine, let's have a quick look at the storage engine concept. A storage engine is the implementation of code that manages the low level of data access in MariaDB. The storage engine handles things such as reading and writing data, row-level locking, if supported, multi-versioning and transaction management, among other things.

Storage engines are defined on a table-by-table basis, and once a table is created and defined to use a particular storage engine (using the  ENGINE  Table attribute) it will generally be treated just like any other tables. Tables using different engines, once created, are handled as equal and can be joined, have data selected from one into another, etc. Tables may also have the storage engine changed after the table has been created.

You may also enjoy: The State of the Storage Engine

Spider From a High Level

What Spider provides, first and foremost, is a way to access a table on one MariaDB Server from another MariaDB Server. The MariaDB Server that holds the actual table data does not have any specific Spider code on it at all; it is a normal MariaDB Server. The MariaDB Server that is configured to access that data then uses the Spider storage engine to access the data on the other server using the usual MariaDB protocol.

Spider and MariaDB setup

As can be seen, Spider is only active on the referencing node; the target node does not need to have Spider installed on it. Creating a "spider table" means that we define a table that contains the same, or a subset of, the columns in the target table and referencing the target server.

Also, note that there is no data for these tables on the "spider node" and there is no duplication of data, all data resides on the target node.

Installing the Spider Storage Engine

Spider is included with MariaDB Server and there is also an installation script that installs the Spider engine and also some utilities are supporting objects and it is recommended to install spider using this script. If you have installed MariaDB as an RPM this spider installation script is in /usr/share/mysql and is called install_spider.sql. To run it use the MariaDB MySQL command-line tool and use the source command, like this.

Shell
 




x
10


 
1
$ mysql -u root
2
Welcome to the MariaDB monitor. Commands end with ; or \g.
3
Your MariaDB connection id is 2835
4
Server version: 10.4.6-MariaDB-log MariaDB Server
5
 
          
6
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
7
 
          
8
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
9
 
          
10
MariaDB> source /usr/share/mysql/install_spider.sql


After running the script check if Spider is installed using the  SHOW ENGINES  command:

Shell
 




xxxxxxxxxx
1
15


 
1
MariaDB> SHOW ENGINES;
2
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
3
| Engine | Support | Comment | Transactions | XA | Savepoints |
4
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
5
| SPIDER | YES | Spider storage engine | YES | YES | NO |
6
| MRG_MyISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
7
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
8
| Aria | YES | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO | NO | NO |
9
| MyISAM | YES | Non-transactional engine with good performance and small data footprint | NO | NO | NO |
10
| SEQUENCE | YES | Generated tables filled with sequential values | YES | NO | YES |
11
| InnoDB | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES | YES | YES |
12
| PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO |
13
| CSV | YES | Stores tables as CSV files | NO | NO | NO |
14
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
15
9 rows in set (0.001 sec)


We are now ready to start using the Spider engine.

Single Table Link with Spider

Let's look at an example, a very basic one. On the "target" server, we create a table. Note that on this server, Spider isn't necessary; this is only required on the server that accesses remote data, and we will get to that next. So now we create a table on the "target" server, let's call that "Server 2:"

Shell
 




xxxxxxxxxx
1
19


 
1
$ mysql -u root -S /tmp/mariadb2.sock -u root
2
Welcome to the MariaDB monitor. Commands end with ; or \g.
3
Your MariaDB connection id is 11
4
Server version: 10.4.8-MariaDB MariaDB Server
5
 
          
6
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
7
 
          
8
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
9
 
          
10
MariaDB [(none)]> CREATE DATABASE spidertest;
11
Query OK, 1 row affected (0.001 sec)
12
 
          
13
MariaDB [(none)]> use spidertest;
14
Database changed
15
MariaDB [spidertest]> CREATE TABLE customer(
16
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
17
name VARCHAR(200) NOT NULL,
18
address VARCHAR(255) NOT NULL);
19
Query OK, 0 rows affected (0.539 sec)


Following this, let's insert some data into this table:

MySQL
 




xxxxxxxxxx
1


 
1
MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'John Doe', '1 Main Street');
2
Query OK, 1 row affected (0.309 sec)
3
 
          
4
MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'Bob Smith', '45 Elm Street');
5
Query OK, 1 row affected (0.092 sec)
6
 
          
7
MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'Jane Jones',
8
'18 Second Street');
9
Query OK, 1 row affected (0.094 sec)


I admit that this wasn't the most exciting experiment since Marie Curie invented radium, but we are not done yet. What we are going to do now is to access this table from another instance of MariaDB Server. As Spider connects to the remote server as a normal user, we need to create a user and grant it some access to the table we created above, on the same MariaDB Server instance as above:

MySQL
 




xxxxxxxxxx
1


 
1
MariaDB [spidertest]> CREATE USER 'spider'@'192.168.0.11' IDENTIFIED BY 'spider';
2
Query OK, 0 rows affected (0.236 sec)
3
 
          
4
MariaDB [spidertest]> GRANT ALL ON spidertest.* TO 'spider'@'192.168.0.11';
5
Query OK, 0 rows affected (0.238 sec)
6
 
          
7
MariaDB [spidertest]> GRANT ALL ON mysql.* TO 'spider'@'192.168.0.11';
8
Query OK, 0 rows affected (0.238 sec)


The next step is to create a  SERVER . If you haven't used Spider you have probably not used the command before, but what it does is that it defines the parameters that are used to connect to another MariaDB Server instance, so the server is defined on the MariaDB Server instance that is to access the table we create above (let's call that Server1).

Shell
 




xxxxxxxxxx
1
13


 
1
r$ mysql -u root -S /tmp/mariadb1.sock -u root
2
Welcome to the MariaDB monitor. Commands end with ; or \g.
3
Your MariaDB connection id is 12
4
Server version: 10.4.8-MariaDB MariaDB Server
5
 
          
6
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
7
 
          
8
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
9
 
          
10
MariaDB [none]> CREATE SERVER Server2 FOREIGN DATA WRAPPER mysql
11
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10482,
12
USER 'spider', PASSWORD 'spider');
13
Query OK, 0 rows affected (0.233 sec)


With this in place, let's create a link using Spider from Server1 to Server2. Note that we do not have to use all the fields in the target table.

Shell
 




xxxxxxxxxx
1
22


 
1
$ mysql -u root -S /tmp/mariadb1.sock -u root
2
Welcome to the MariaDB monitor. Commands end with ; or \g.
3
Your MariaDB connection id is 33
4
Server version: 10.4.8-MariaDB MariaDB Server
5
 
          
6
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
7
 
          
8
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
9
 
          
10
MariaDB [(none)]> DROP DATABASE IF EXISTS spidertest;
11
Query OK, 0 rows affected, 1 warning (0.000 sec)
12
 
          
13
MariaDB [(none)]> CREATE DATABASE spidertest;
14
Query OK, 1 row affected (0.001 sec)
15
 
          
16
MariaDB [(none)]> use spidertest;
17
Database changed
18
MariaDB [spidertest]> CREATE TABLE customer(
19
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
20
name VARCHAR(200) NOT NULL) ENGINE=Spider
21
COMMENT = 'wrapper "mysql", srv "Server2"';
22
Query OK, 0 rows affected (0.132 sec)


OK, no errors, then try it by  SELECTing some data:

MySQL
 




xxxxxxxxxx
1


 
1
MariaDB [spidertest]> SELECT * FROM customer;
2
+----+------------+
3
| id | name |
4
+----+------------+
5
| 1 | John Doe |
6
| 2 | Bob Smith |
7
| 3 | Jane Jones |
8
+----+------------+
9
3 rows in set (0.006 sec)


Yes, this seems to work. Still Marie Curie beats us in the excitement department, but we are getting there.

Uses for a Single Table Link

There are quite a few uses for even a single table link such as the one described above. In some cases it can be used to replace replication where it is just a single table that is to be replicated. Maybe you have a single table that is maintained in one database instance such as a customer table or something and then you want a reference to a customer ID from a completely different application running in some other MariaDB Server instance.

One needs to understand that there are drawbacks to this kind of setup; for example, performance can be an issue, because for every statement that uses a spider table a connection is made to the connected server. A join between a local table and a Spider table can also be slow, but that depends on a few things. In general, Spider is pretty performant and is not a bottleneck per se and there is quite a bit of tuning possible.

A third option is when we have a single table, for example a log table, that I don't want to mix with any other tables. Then a Spider table might be a way to achieve that.

Using Spider for Status Across Multiple Servers

If you have a MariaDB Cluster running, with a primary and several secondaries for example, then there is probably a need to see the status of all the servers in the cluster as one unit.  Spider comes in handy. Here we are to look at how that can be done. We have two servers, moe and homer, that we want to monitor. Let us assume that moe is the "main" server and that we from that box want to look at the global status on both servers.

On homer we create a view for  GLOBAL_STATUS  which is based on the GLOBAL STATUS table in  INFORMATION_SCHEMA , but adds a column with the name of the server and put that in the MySQL database.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE VIEW global_status_homer
2
AS SElECT 'homer' host, gs.variable_name, gs.variable_value
3
FROM information_schema.global_status gs;


And then we do a similar thing on moe.

MySQL
 




xxxxxxxxxx
1


 
1
 
          
2
CREATE OR REPLACE VIEW global_status_moe
3
AS SElECT 'moe' host, gs.variable_name, gs.variable_value
4
FROM information_schema.global_status gs;


The next step is to create a link from what we consider the master (moe) to the other server (homer). So on moe, we execute this command:

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE SERVER homer FOREIGN DATA WRAPPER mysql
2
OPTIONS(HOST '192.168.0.11', DATABASE 'mysql', PORT 10482, USER 'spider',
3
PASSWORD 'spider');


With this, we are ready to create a view on moe for viewing the status on homer.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE TABLE global_status_homer(host varchar(2048), variable_name VARCHAR(64), variable_value VARCHAR(64)) ENGINE=Spider
2
COMMENT='wrapper "mysql", srv "homer"';


On homer, we now have one table and one view, one for each server and with similar schema, that represent the status across the cluster. Let's combine these two into a single VIEW.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE VIEW global_status_all AS
2
SELECT host, variable_name, variable_value
3
FROM global_status_homer
4
UNION
5
SELECT host, variable_name, variable_value
6
FROM global_status_moe;


And then a  VIEW  that shows the summary status across the cluster.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE VIEW global_status_total AS
2
SELECT variable_name, SUM(variable_value) sum, MAX(variable_value) max,
3
MIN(variable_value) min
4
FROM global_status_all
5
GROUP BY variable_name;


Let's give this a shot now and see how it works:

MySQL
 




xxxxxxxxxx
1
15


 
1
MariaDB [mysql]> SELECT * FROM global_status_total WHERE variable_name LIKE 'open%';
2
+--------------------------+------+------+------+
3
| variable_name | sum | max | min |
4
+--------------------------+------+------+------+
5
| OPENED_FILES | 629 | 477 | 152 |
6
| OPENED_PLUGIN_LIBRARIES | 1 | 1 | 0 |
7
| OPENED_TABLES | 112 | 75 | 37 |
8
| OPENED_TABLE_DEFINITIONS | 125 | 95 | 30 |
9
| OPENED_VIEWS | 85 | 43 | 42 |
10
| OPEN_FILES | 132 | 76 | 56 |
11
| OPEN_STREAMS | 0 | 0 | 0 |
12
| OPEN_TABLES | 77 | 46 | 31 |
13
| OPEN_TABLE_DEFINITIONS | 83 | 49 | 34 |
14
+--------------------------+------+------+------+
15
9 rows in set (0.029 sec)


I think this is useful, although this is a simple example. With more servers than these two it is a lot more useful.

Combining Multiple Server Tables in the General Case

In the above example, we look at how tables with the same structure with similar content can be combined into one single view, using Spider. We look at a specific example there, but in general, there are uses for this. Let's say that you have an application that runs in multiple instances, say some ERP application that is used by multiple departments. If you then want to do reporting across the departments from a reporting server, then you can use this model to access all the instances of this application.

The alternative to using Spider is to use multi-source replication, but that requires a lot more data to be stored in the reporting server, so Spider does have some advantages.

Sharding with Spider

Sharding is the most common use case for Spider. The examples I described above are less common, but still useful. In the cases above, I mapped a table on one server to a table on another machine. With Spider used for a partitioned table on one server, each partition exists on a separate server. Except for this, there aren't many differences in practice, although the partitioning use case allows Spider to do some interesting things and Spider has some extra performance-enhancing tricks up its sleeve, for this particular use case.

To show how sharding with Spider works, let's show a very simple example. We are going to show how to set up sharding with two shards only, just to show the principle. Let us use the customer table where we have a total of three servers, two "data" servers with the data for two shards and one server than is the "Spider" server that hold no actual data for the table we are working with, but instead points to data that resides on the other two servers.

Sharding with Spider

Let's start at the bottom, which is to create the tables we are going to use in MariaDB Servers Server2 and Server3 (this looks similar to the customer table above, but not exactly so). This is executed of both of these two Servers as user root

MySQL
 




xxxxxxxxxx
1


 
1
CREATE DATABASE IF NOT EXISTS spidertest;
2
CREATE TABLE spidertest.customer(
3
id INT NOT NULL PRIMARY KEY,
4
name VARCHAR(200) NOT NULL,
5
address VARCHAR(255) NOT NULL);
6
 
          


Now we have two shards created, so let's then create links so we can reach them from MariaDB Server1, so on Server1 execute this SQL statement, replacing my port, host, username, and password with something relevant in your case.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE OR REPLACE SERVER Server2 FOREIGN DATA WRAPPER mysql
2
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10482,
3
USER 'spider', PASSWORD 'spider');
4
 
          
5
CREATE OR REPLACE SERVER Server3 FOREIGN DATA WRAPPER mysql
6
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10483,
7
USER 'spider', PASSWORD 'spider');


Let us then tie it together with a partitioned table, and note that you can use any reasonable partitioning scheme here, I just picked a simple one to make a point.

MySQL
 




xxxxxxxxxx
1


 
1
CREATE TABLE spidertest.customer(id INT NOT NULL PRIMARY KEY,
2
name VARCHAR(200) NOT NULL,
3
address VARCHAR(255) NOT NULL) ENGINE=Spider
4
COMMENT 'wrapper "mysql", table "customer"'
5
PARTITION BY RANGE(id) (
6
PARTITION p0 VALUES LESS THAN (1000) COMMENT = 'srv "Server2"',
7
PARTITION p1 VALUES LESS THAN (2000) COMMENT = 'srv "Server3"');


We can now on Server1 insert some data and see it appear as appropriate on Server2 and Server3.

MySQL
 




xxxxxxxxxx
1


 
1
INSERT INTO customer VALUES(1, 'Larry', 'Main Street 1');
2
INSERT INTO customer VALUES(2, 'Ed', 'Main Street 1');
3
INSERT INTO customer VALUES(3, 'Bob', 'Main Street 1');
4
INSERT INTO customer VALUES(1001, 'Monty', 'Main Street 1');
5
INSERT INTO customer VALUES(1002, 'David', 'Main Street 1');
6
INSERT INTO customer VALUES(1003, 'Allan', 'Main Street 1');


Based on our partitioning set up, the three first row goes onto Server2 and the last three to Server3. I would like to point out before I finish this section that the individual shards, as they are common MariaDB instances, without the Spider engine, so they can be accessed shard by shard by connecting to, in this case, Server2 or Server3.

Uses for Sharding with Spider

The most obvious advantage of sharding is to increase performance when working with large datasets. But with Spider there are other advantages, most coming from the facts that the individual shards are plain MariaDB Servers that can be accessed individually, so the dataset can be looked at shard by shard, without any performance bottleneck, or you can look at it as a whole, using Spider, at the same time.

Conclusion

This blog has presented some uses for the Spider Storage Engine, all useful in their own right. Most documentation around Spider has focused on the sharding use case, so the first two uses might be surprising to some.

Happy SQL'ing
/Karlsson

Further Reading

Data Streaming With MariaDB

Database MariaDB Engine MySQL Data (computing)

Published at DZone with permission of Anders Karlsson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Harnessing the Power of AWS Aurora for Scalable and Reliable Databases
  • Enhancing Performance: Optimizing Complex MySQL Queries for Large Datasets
  • SQL Commands: A Brief Guide
  • Non-blocking Database Migrations

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!