How Doris Connects to Multiple Databases for Federated Analytics and Data Migration
Learn how to use Apache Doris’s JDBC Catalog to connect multiple databases, enable federated analytics, and migrate data efficiently.
Join the DZone community and get the full member experience.
Join For FreeApache Doris is a high-performance real-time analytical database that supports multiple methods for connecting to external data sources. Among these, JDBC Catalog is a key feature that enables Doris to integrate with various mainstream databases, facilitating federated analysis and data migration.
This article provides a detailed guide on how Doris connects to multiple databases via JDBC Catalog, lists the supported database types, and outlines the usage instructions to help users get started quickly.
Introduction
Doris is a modern MPP-based analytical database widely used in data warehousing, real-time reporting, and lakehouse scenarios. Its multi-catalog feature allows users to access and manage multiple data sources through a unified SQL interface.
JDBC Catalog is one of the multi-source data catalogs provided by Doris, enabling connections to external databases that support the JDBC protocol. With JDBC Catalog, users can directly query external database tables within Doris, perform cross-source federated analytics, or migrate external data into Doris for performance optimization.
This article covers the following topics:
- List of databases supported by Doris via JDBC Catalog
- Configuration and usage of each database
- How to implement federated analytics and data migration
Supported Databases
Doris's JDBC Catalog supports integration with the following mainstream databases:
- MySQL
- PostgreSQL
- Oracle
- SQL Server
- IBM DB2
- ClickHouse
- SAP HANA
- OceanBase
These databases can be connected to Doris using standard JDBC drivers for federated access and data migration.
Basic Usage of JDBC Catalog
Before connecting an external database using JDBC Catalog, a catalog must be created in Doris. The syntax for creating a JDBC Catalog is as follows:
CREATE CATALOG <catalog_name> PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "<jdbc_url>",
"jdbc.user" = "<username>",
"jdbc.password" = "<password>",
"jdbc.driver.class" = "<driver_class>",
"jdbc.driver.url" = "<driver_url>"
);
Parameter Explanation
<catalog_name>
: Custom name for the catalog."type" = "jdbc"
: Specifies the catalog type as JDBC."jdbc.url"
: JDBC connection URL of the external database."jdbc.user"
and"jdbc.password"
: Username and password of the database."jdbc.driver.class"
: JDBC driver class name."jdbc.driver.url"
: JDBC driver download path, which can be a local directory or an object storage link.
Querying External Tables
After creating the catalog, external database tables can be accessed using the following syntax:
SELECT * FROM <catalog_name>.<database_name>.<table_name>;
For example, querying the users
table from MySQL:
SELECT * FROM mysql_catalog.my_database.users;
Database-Specific Configuration
Below are the JDBC connection configurations for different databases, including JDBC URL format, driver class name, and example setups.
MySQL
- JDBC URL format:
jdbc:mysql://<host>:<port>/<database>
- Driver class:
com.mysql.jdbc.Driver
- Example:
CREATE CATALOG mysql_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:mysql://localhost:3306/my_database",
"jdbc.user" = "root",
"jdbc.password" = "password",
"jdbc.driver.class" = "com.mysql.jdbc.Driver",
"jdbc.driver.url" = "file:///path/mysql-connector-java-8.0.23.jar"
);
PostgreSQL
- JDBC URL format:
jdbc:postgresql://<host>:<port>/<database>
- Driver class:
org.postgresql.Driver
- Example:
CREATE CATALOG pg_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:postgresql://localhost:5432/my_database",
"jdbc.user" = "postgres",
"jdbc.password" = "password",
"jdbc.driver.class" = "org.postgresql.Driver",
"jdbc.driver.url" = "file:///path/postgresql-42.2.18.jar"
);
Oracle
- JDBC URL format:
jdbc:oracle:thin:@<host>:<port>:<sid>
- Driver class:
oracle.jdbc.driver.OracleDriver
- Example:
CREATE CATALOG oracle_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:oracle:thin:@localhost:1521:orcl",
"jdbc.user" = "system",
"jdbc.password" = "password",
"jdbc.driver.class" = "oracle.jdbc.driver.OracleDriver",
"jdbc.driver.url" = "file:///path/ojdbc8-19.8.0.0.jar"
);
SQL Server
- JDBC URL format:
jdbc:sqlserver://<host>:<port>;databaseName=<database>
- Driver class:
com.microsoft.sqlserver.jdbc.SQLServerDriver
- Example:
CREATE CATALOG sqlserver_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:sqlserver://localhost:1433;databaseName=my_database",
"jdbc.user" = "sa",
"jdbc.password" = "password",
"jdbc.driver.class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"jdbc.driver.url" = "file:///path/mssql-jdbc-9.2.1.jre8.jar"
);
IBM DB2
- JDBC URL format:
jdbc:db2://<host>:<port>/<database>
- Driver class:
com.ibm.db2.jcc.DB2Driver
- Example:
CREATE CATALOG db2_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:db2://localhost:50000/my_database",
"jdbc.user" = "db2inst1",
"jdbc.password" = "password",
"jdbc.driver.class" = "com.ibm.db2.jcc.DB2Driver",
"jdbc.driver.url" = "file:///path/db2jcc-11.5.0.0.jar"
);
ClickHouse
- JDBC URL format:
jdbc:clickhouse://<host>:<port>/<database>
- Driver class:
ru.yandex.clickhouse.ClickHouseDriver
- Example:
CREATE CATALOG clickhouse_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:clickhouse://localhost:8123/my_database",
"jdbc.user" = "default",
"jdbc.password" = "",
"jdbc.driver.class" = "ru.yandex.clickhouse.ClickHouseDriver",
"jdbc.driver.url" = "file:///path/clickhouse-jdbc-0.3.1.jar"
);
SAP HANA
- JDBC URL format:
jdbc:sap://<host>:<port>
- Driver class:
com.sap.db.jdbc.Driver
- Example:
CREATE CATALOG hana_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:sap://localhost:30015",
"jdbc.user" = "system",
"jdbc.password" = "password",
"jdbc.driver.class" = "com.sap.db.jdbc.Driver",
"jdbc.driver.url" = "file:///path/to/ngdbc.jar" -- 需手动上传驱动
);
OceanBase
- JDBC URL format:
jdbc:oceanbase://<host>:<port>/<database>
- Driver class:
com.oceanbase.jdbc.Driver
- Example:
CREATE CATALOG oceanbase_catalog PROPERTIES (
"type" = "jdbc",
"jdbc.url" = "jdbc:oceanbase://localhost:2881/my_database",
"jdbc.user" = "root",
"jdbc.password" = "password",
"jdbc.driver.class" = "com.oceanbase.jdbc.Driver",
"jdbc.driver.url" = "file:///path/oceanbase-client-2.2.0.jar"
);
Implementing Federated Analytics and Data Migration
Federated Analytics
With JDBC Catalog, Doris can directly query external database tables and perform operations like JOIN
and aggregation with other data sources (such as Doris local tables or Hive tables).
Example: Joining the users
table from MySQL with the orders
table in Doris:
SELECT u.name, o.order_id FROM mysql_catalog.my_database.users u JOIN doris_database.orders o ON u.id = o.user_id;
Doris optimizes the query plan by pushing computations down to the external database whenever possible to minimize data transfer and improve performance.
Data Migration
Doris supports data migration using INSERT INTO SELECT
, allowing users to move external data into Doris local tables.
Example: Migrating the users
table from MySQL to Doris:
INSERT INTO doris_database.doris_users;
SELECT * FROM mysql_catalog.my_database.users;
Doris also supports data export (EXPORT
) to external databases, though additional configurations may be required.
Best Practices
1. Creating Table Structures and Inserting Data in MySQL
The following SQL statements create the authors
and books
tables in MySQL:
CREATE TABLE authors (
author_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
country VARCHAR(50),
birth_year INT
);
CREATE TABLE books (
book_id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(200) NOT NULL,
author_id INT,
price DECIMAL(10,2),
publish_date DATE,
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);
2. Inserting Data into MySQL Tables
The following SQL statements insert sample data into the authors
and books
tables:
INSERT INTO authors (name, country, birth_year) VALUES
('J.K. Rowling', 'UK', 1965),
('George R.R. Martin', 'USA', 1948),
('Haruki Murakami', 'Japan', 1949);
INSERT INTO books (title, author_id, price, publish_date) VALUES
('Harry Potter', 1, 29.99, '1997-06-26'),
('Game of Thrones', 2, 35.99, '1996-08-01'),
('Norwegian Wood', 3, 24.99, '1987-08-04');
3. Verifying the MySQL JDBC Driver
file:///path/mysql-connector-java-8.0.23.jar
4. Creating Tables and Inserting Data in Doris
CREATE TABLE orders (
order_id INT,
book_id INT,
quantity INT NOT NULL,
order_date DATETIME,
customer_name VARCHAR(100)
)DUPLICATE KEY (`order_id`)
DISTRIBUTED BY HASH (`order_id`) BUCKETS 1
PROPERTIES("replication_num" = "1");
INSERT INTO orders (book_id, quantity, order_date, customer_name) VALUES
(1, 2, '2025-03-01 10:30:00', 'John Doe'),
(2, 1, '2025-03-02 15:45:00', 'Jane Smith'),
(3, 3, '2025-03-03 09:15:00', 'Bob Johnson');
5. Creating a JDBC Catalog and Querying MySQL Data in Doris
CREATE CATALOG mysql_catalog PROPERTIES (
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:23316/test",
"driver_url" = "file:///mnt/disk2/liyuanyuan/jdbc_drivers/mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver"
);
6. Performing Federated Queries
SELECT
o.order_id,
o.order_date,
o.customer_name,
b.title AS book_title,
a.name AS author_name,
o.quantity,
(o.quantity * b.price) AS total_price
FROM
orders o
INNER JOIN
mysql_catalog.test.books b ON o.book_id = b.book_id
INNER JOIN
mysql_catalog.test.authors a ON b.author_id = a.author_id;
SELECT
a.name AS author_name,
COUNT(o.order_id) AS total_orders,
SUM(o.quantity) AS total_books_sold,
SUM(o.quantity * b.price) AS total_revenue
FROM
mysql_catalog.test.authors a
LEFT JOIN
mysql_catalog.test.books b ON a.author_id = b.author_id
LEFT JOIN
test2.orders o ON b.book_id = o.book_id
GROUP BY
a.author_id, a.name
ORDER BY
total_revenue DESC;
Considerations and Optimization Suggestions
- Performance optimization: JDBC Catalog query performance is affected by network latency and external database performance. Consider migrating hot data to Doris local tables or enabling metadata caching.
- Driver compatibility: Ensure the JDBC driver version matches the external database and is correctly configured in
jdbc.driver.url
. - Security considerations: Use encrypted connections (e.g., SSL) and secure credential management in production environments.
- Metadata consistency: Doris caches external database metadata; periodic metadata refreshes may be needed.
Conclusion
Doris's JDBC Catalog provides a flexible multi-database connection capability, supporting MySQL, PostgreSQL, Oracle, SQL Server, IBM DB2, ClickHouse, SAP HANA, and OceanBase. Users can easily configure JDBC Catalog to enable federated analytics and data migration, leveraging Doris's strengths in real-time analytics and lakehouse integration.
This guide offers a comprehensive overview of Doris JDBC Catalog usage and configuration, helping users apply it effectively in real-world projects. For more details, refer to the official Doris documentation:
As we look forward to 2025, I am excited about the milestones ahead. The remarkable progress we’ve achieved is made possible by the incredible support of the Doris community, including its dedicated users and developers.
Opinions expressed by DZone contributors are their own.
Comments