Migrating From Sakila-MySQL to Couchbase - ETL
This is the first of our four-part series where we delve deep into migrating from a sample database in MySQL to Couchbase.
Join the DZone community and get the full member experience.
Join For FreePart 1 covers setting up the MySQL Sakila database, extracting the data into a JSON or CSV format, and loading that data into Couchbase. Let's look at the steps.
1 - Install MySQL Database, The Shell, MySQL Workbench, and Couchbase 7.0
https://dev.mysql.com/downloads/
export PATH=$PATH:/usr/local/mysql/bin
2 - Setup Sakila Models
Download the Sakila database from - https://dev.mysql.com/doc/index-other.html.
Use the schema and data in SQL files and load them using the MySQL shell.
Mysql > SOURCE <filepath>/sakila-schema.sql ;
Mysql > SOURCE <filepath>/sakila-data.sql ;
Mysql > USE sakila;
3 - Extract Data From MySQL in JSON Format
We can use the MySQL shell and figure out how to directly extract data to JSON. First, let's look at the command to extract the data.
Using MySQL Sheel to Extract Either CSV or JSON Data
echo "select * from actor;" | mysqlsh --sql --result-format=json/array --uri=root@localhost:3306/sakila > actor.json
We can also extract to CSV using
select * from actor into outfile '<filepath>/actor.csv' FIELDS ENCLOSED BY '"' TERMINATED BY
';' ESCAPED BY '"' LINES TERMINATED BY '\r\n';
3306 is the default port on which MySQL is running. We use a mix of bash and MySQL Shell to extract each table into a list of JSON objects. To extract into CSV we can use the above SQL query in the MySQL shell. This will make it in a list format that is then easy for us to import into Couchbase using cbimport.
We need to run the above commands to import data from all the tables present in the Sakila database. Some important things to be aware of when importing data:
- Exporting geographic data / geospatial data - geospatial data in Sakila was represented in a WKB (well-known binary) format. This is essentially a binary representation of spatial geometry. For example, 0x00000000010100000017540A70700160401E1C47077F7D4740 - Which in textual representation becomes - POINT(128.0449753 46.9804391)
Prior to exporting GEO data, we need to convert it into GeoJSON format using the ST_GeomFromWKB() and ST_GeomFromGeoJson() functions. - When trying to extract data, you might encounter some issues with setting secure-file-priv, i.e., “MySQL server is running with the –secure-file-priv” error. Try manually modifying the startup script in /Library/LaunchDaemons com.oracle.oss.mysql.mysqld.plist. - To add the following to the program arguments. <string>--secure-file-priv=/mysql_exp</string>
4 - Loading Data into Couchbase
Once we have data in either JSON or CSV formats it's easy to load this data using CB-import.
Mapping Tables/Schema from Relational to Couchbase
With the addition of Collections in Couchbase 7.0, mapping a relational database to Couchbase is now super easy. The database now maps to Bucket, and tables map to Collections under a single scope. One can also group tables under a specific scope, but it's easier to just map tables to Collections.
Once you manually create the collections from the UI or CLI, wait a bit before importing the data.
./cbimport json -d file:///Users/isha/Desktop/sakila-db/store.json -f list -c localhost -u Administrator -p password -b sakila -g %store_id% --scope-collection-exp _default.store
Or using CSV
./cbimport csv -d file:///Users/isha/Desktop/sakila-db/tmp.csv -f list -c localhost -u Administrator -p password -b sakila -g %actor_id% --scope-collection-exp _default.actor
Running either of the above commands after manually creating the bucket, scopem and collections will populate the collections with the data extracted from the tables. Here I created a Sakila bucket with _default scope containing 16 collections each mapping to a given base table.
There is no hard limit on the number of collections. But when importing data, we need to be aware of some possible issues including:
- Importing Unicode or binary strings encoded incorrectly will throw an error in cbimport.
- Date, time, and timestamps from MySQL to Couchbase - Couchbase sets up the dates as strings and uses scalar functions to manipulate them. But we only support ISO 8601 Extended format at this time. Assuming you have a date in a different format imported as a string, we will need to use the DATE_FORMAT_STR function to change it into a usable format. This can be rectified after importing the data.
- String encodings using character sets - MySQL allows storing data using a variety of character sets (which are basically a set of symbols and encodings) and performs comparisons according to a variety of collations (which comprises a set of rules to compare characters in a set - this also includes support for multiple language scripts). MySQL supports 40+ character sets and collations and multiple languages. Couchbase on the other hand only supports UTF-8 encoding and English collation.
- Numeric data type - If you need to store floating-point values in JSON with exact precision, it might be safer to use strings instead of numbers. This is a great place to start when trying to understand how Couchbase stores numbers.
- Importing GeoJSON data - As previously mentioned, we need to make sure this is an invalid GeoJSON format and not in a WKB (well-known binary) format.
Setting Session Variables
Setting session variables in MySQL using SET can be done in Couchbase using the \SET command in the CQB shell. This can also be set in the UI under Preferences in the Query Workbench in the form of named and positional parameters. Query parameters themselves can be set from settings, under Query settings.
Mapping Data Types From Relational to Couchbase and Other Parameters
Numeric
- TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT (SIGNED UNSIGNED).
- Fixed point types such as DECIMAL/NUMERIC (with precision and scale)
These can all be mapped to NUMBER in Couchbase/JSON.
- FLOAT, REAL, and DOUBLE PRECISION
- If you need to store floating-point values in JSON with exact precision, it might be safer to use STRINGs instead of numbers. See here.
- Couchbase only supports signed numbers (signed floats).
String
- CHAR, VARCHAR, TIMESTAMP, DATE, TIME, DATETIME, YEAR
All of the above data types are mapped to STRINGs in Couchbase/ JSON. For DATE-related types, Couchbase provides data manipulation functions.
Booleans
- Map to bool in JSON.
Spatial Geometry
- This is mapped into arrays representing lat-long pairs. We can also represent polygons as multiple lat-long point pairs.
Blob
- We can store binary or XML data in Couchbase using the steps outlined here.
Partition
- Couchbase supports index partitions.
Constrain, Foreign key, Unique key
Not applicable for Couchbase. Depends more on the user inserting the data to make sure it’s unique. We don't allow setting constraints as well. Most relational databases that have JSON support allow adding Unique constraints to their JSON data as well but some may argue that this defeats the purpose of using JSON since the intent is to be schemaless.
With this, we have now completed our extraction of data from a relational database (MySQL), transformed it, and loaded it into Couchbase!
In Part 2 we will cover views, functions as UDFs (User-defined functions), in Part 3 we will cover stored procedures, and finally, in Part 4 we will cover mapping triggers to eventing functions.
Opinions expressed by DZone contributors are their own.
Comments