DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Improving Java Application Reliability with Dynatrace AI Engine
  • Runtime FinOps: Making Cloud Cost Observable
  • Building a State-Driven Workflow Engine for AI Applications
  • Building a Deterministic Event Correlation Engine in Go for High-Volume Alert Systems

Trending

  • The Missing `bandit` for AI Agents: How I Built a Static Analyzer for Prompt Injection
  • Building a Spring AI Assistant With MCP Servers: A Step-by-Step Tutorial
  • Building AI-Powered Java Applications With Jakarta EE and LangChain4j
  • Building Threat Intelligence Pipelines Using Python, APIs, and Elasticsearch
  1. DZone
  2. Testing, Deployment, and Maintenance
  3. Deployment
  4. From Git to Fuse Engine

From Git to Fuse Engine

Getting started with Fuse Engine.

By 
bingxi Wu user avatar
bingxi Wu
·
Jul. 09, 22 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
6.1K Views

Join the DZone community and get the full member experience.

Join For Free

Databend, developed with Rust, is a new, open-source data warehouse with a cloud-oriented architecture. It is committed to providing fast elastic expansion capabilities and a pay-as-you-go user experience. GitHub:https://github.com/datafuselabs/databend

This article introduces the Databend base: Fuse Engine, a powerful columnar storage engine. The engine was designed by the Databend community with the following principles: Powerful performance, simple architecture, and high reliability.

Before we start, check out a challenging task that Databend completed: With the Fuse Engine deployed on AWS S3, a transaction wrote 22.89 TB of raw data in around one and a half hours.

MySQL
 
mysql> INSERT INTO ontime_new SELECT * FROM ontime_new; 
Query OK, 0 rows affected (1 hour 34 min 36.82 sec) 
Read 31619274180 rows, 22.89 TB in 5675.207 sec., 5.57 million rows/sec., 4.03 GB/sec.


Meanwhile, the following conditions were met as well:

  • Distributed transactions: Multiple computing nodes can read and write the same data simultaneously (This is the first problem that an architecture that separates storage from compute must solve).

  • Snapshot isolation: Different versions of data do not affect each other so you can do Zero-Copy Cloning for tables.

  • Retrospective ability: You are allowed to switch to any version of the data, so you can recover with the Time Travel feature.

  • Data merging: A new version of data can be generated after merging.

  • Simple and robust: Data relationships are described using files, and you can recover the entire data system based on these files.

From above, you will find that Fuse Engine is "Git-inspired". Before introducing the design of Fuse Engine, let's take a look at how the bottom layer of Git works.

How Git Works

Git implements data version control (including branch, commit, checkout, and merge) in a distributed environment. Based on Git semantics, it is possible to create a distributed storage engine. There are also some products built on Git-like on the market, such as Nessie - Transactional Catalog for Data Lakes and lakeFS. To better explore the underlying working mechanism of Git, we use Git semantics to complete a series of "data" operations from the perspective of the database.

Prepare a file named cloud.txt with the content:

Plain Text
 
2022/05/06, Databend, Cloud 


Commit the file cloud.txt to Git.

Shell
 
git commit -m "Add cloud.txt" 


Git generates a snapshot (Commit ID: 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71)

Shell
 
git log 
commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 (HEAD) 
Author: BohuTANG <[email protected]> 
Date:   Fri May 6 16:44:21 2022 +0800

     Add cloud.txt


Prepare another file named warehouse.txt

Plain Text
 
2022/05/07, Databend, Warehouse


Commit the file warehouse.txt to Git

Shell
 
git commit -m "Add warehouse.txt"


Git generates another snapshot (Commit ID: 15af34e4d16082034e1faeaddd0332b3836f1424)

Shell
 
commit 15af34e4d16082034e1faeaddd0332b3836f1424 (HEAD) 
Author: BohuTANG <[email protected]> 
Date:   Fri May 6 17:41:43 2022 +0800

     Add warehouse.txt 
commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 

Author: BohuTANG <[email protected]> 
Date:   Fri May 6 16:44:21 2022 +0800

     Add cloud.txt


Git now keeps two versions of our data:

Plain Text
 
ID 15af34e4d16082034e1faeaddd0332b3836f1424,Version2 
ID 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71,Version1


We can switch between versions by the Commit ID, which implements the functions of Time Travel and Table Zero-Copy. How does Git make it possible in the bottom layer? It's not rocket science. Git introduces these types of object files to describe the relationship:

  • Commit: Describes tree object information

  • Tree: Describes blob object information

  • Blob: Describes file information

HEAD File

First, we need to know the HEAD pointer:

Shell
 
cat .git/HEAD 15af34e4d16082034e1faeaddd0332b3836f1424


Commit File

The Commit file records metadata related to the commit, such as the current tree and parent, as well as the committer, etc.

File path:

Plain Text
 
.git/objects/15/af34e4d16082034e1faeaddd0332b3836f1424 


File content:

Shell
 
git cat-file -p 15af34e4d16082034e1faeaddd0332b3836f1424
 
tree 576c63e580846fa6df2337c1f074c8d840e0b70a 
parent 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 
author BohuTANG <[email protected]> 1651830103 +0800 
committer BohuTANG <[email protected]> 1651830103 +0800 

Add warehouse.txt


Tree File

The Tree file records all the files of the current version

File path:

Plain Text
 
.git/objects/57/6c63e580846fa6df2337c1f074c8d840e0b70a


File content:

Shell
 
git cat-file -p 576c63e580846fa6df2337c1f074c8d840e0b70a
 
100644 blob 688de5069f9e873c7e7bd15aa67c6c33e0594dde    cloud.txt 
100644 blob bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb    warehouse.txt


Blob File

The Blob files are raw data files. You can view the file content using git cat-file (if you use Git to manage code, blobs are our code files)

Shell
 
git cat-file -p 688de5069f9e873c7e7bd15aa67c6c33e0594dde 
2022/05/06, Databend, Cloud 

git cat-file -p bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb 
2022/05/07, Databend, Warehouse 


Fuse Engine

Databend's Fuse Engine is very similar to Git from the design point of view. It introduces three description files:

  • Snapshot: Describes segment object information.

  • Segment: Describes block object information.

  • Block: Describes parquet file information.

Let's repeat the operations we just did with Git in Fuse Engine.

Create a table.

SQL
 
CREATE TABLE git(file VARCHAR, content VARCHAR); 


Write cloud.txt to Fuse Engine

SQL
 
INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');


Fuse Engine generates a snapshot (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

SQL
 
CALL system$fuse_snapshot('default', 'git'); 
*************************** 1. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1
previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1
  bytes_uncompressed: 68
    bytes_compressed: 351 


Write warehouse.txt to Fuse Engine.

SQL
 
INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');


Fuse Engine generates another snapshot (Snapshot ID efe2687fd1fc48f8b414b5df2cec1e19) that is linked to the previous one (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

SQL
 
CALL system$fuse_snapshot('default', 'git');
*************************** 1. row ***************************
         snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
   snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
      format_version: 1
previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
       segment_count: 2
         block_count: 2
           row_count: 2
*************************** 2. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1 previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1


Fuse Engine now keeps two versions of our data:

Plain Text
 
ID efe2687fd1fc48f8b414b5df2cec1e19,Version2
ID 6450690b09c449939a83268c49c12bb2,Version1


That's very similar to Git. Right?

HEAD

Git needs a HEAD as an entry. So does Fuse Engine. Check the HEAD of Fuse Engine:

SQL
 
SHOW CREATE TABLE git\G;
*************************** 1. row ***************************
       Table: git
Create Table: CREATE TABLE `git` (
  `file` VARCHAR,
  `content` VARCHAR
) ENGINE=FUSE SNAPSHOT_LOCATION='53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json'


SNAPSHOT_LOCATION is HEAD, which by default points to the latest snapshot efe2687fd1fc48f8b414b5df2cec1e19, then how do we switch to the snapshot data whose ID is 6450690b09c449939a83268c49c12bb2? First, check the snapshot information of the current table:

SQL
 
CALL system$fuse_snapshot('default', 'git')\G; 
*************************** 1. row ***************************
         snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
   snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
      format_version: 1 previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
       segment_count: 2
         block_count: 2
           row_count: 2 
*************************** 2. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1 previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1


Then create a new table (git_v1) and point SNAPSHOT_LOCATION to the snapshot file you need:

SQL
 
CREATE TABLE git_v1(`file` VARCHAR, `content` VARCHAR) SNAPSHOT_LOCATION='53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json'; 

SELECT * FROM git_v1;
+-----------+-----------------------------+ 
| file      | content                     | 
+-----------+-----------------------------+ 
| cloud.txt | 2022/05/06, Databend, Cloud | 
+-----------+-----------------------------+


Snapshot File

Stores the segment information.

File path:

Plain Text
 
53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json


File content:

JSON
 
{
   "format_version":1,
   "snapshot_id":"efe2687f-d1fc-48f8-b414-b5df2cec1e19",
   "prev_snapshot_id":[
      "6450690b-09c4-4993-9a83-268c49c12bb2",
      1
   ],

    "segments":[
      [
         "53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json"
      ],
      [
         "53/133/_sg/d0bff902b98846469480b23c2a8f93d7_v1.json"
      ]
   ]
   ... ...
}


Segment File

Stores block information.

File path:

Plain Text
 
53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json


File content:

JSON
 
{
   "format_version":1,
   "blocks":[
      {
         "row_count":1,
         "block_size":76,
         "file_size":360,
         "location":[
            "53/133/_b/ba4a60013e27479e856f739aefeadfaf_v0.parquet",
            0
         ],
         "compression":"Lz4Raw"
      }
   ]
   ... ...
}


Block File

The underlying data of Fuse Engine uses the Parquet format, and each file is composed of multiple blocks.

Summary

In the early design period (October 2021) of Databend's Fuse Engine, the requirements were very clear, but the solution selection didn't go smoothly. At that time, the Databend community investigated a large number of Table Format solutions (such as Iceberg) on the market. The challenge was to choose between using an existing solution and building a new one. Finally, we decided to develop a simple and suitable Storage Engine that uses the Parquet standard as the storage format. Fuse Engine stores the Parquet Footer separately to reduce unnecessary Seek operations, and introduces a more flexible indexing mechanism, for example, operations such as Aggregation and Join can have their own indexes for acceleration.

Feel free to deploy Fuse Engine with your object storage to have a different experience on the big data analysis: https://databend.rs/doc/deploy

Engine Git

Opinions expressed by DZone contributors are their own.

Related

  • Improving Java Application Reliability with Dynatrace AI Engine
  • Runtime FinOps: Making Cloud Cost Observable
  • Building a State-Driven Workflow Engine for AI Applications
  • Building a Deterministic Event Correlation Engine in Go for High-Volume Alert Systems

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook