DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Data
  4. Reviewing Sled: Part III

Reviewing Sled: Part III

Look at part three of a review of Sled, an embedded database engine written in Rust.

Oren Eini user avatar by
Oren Eini
·
Apr. 29, 19 · Review
Like (1)
Save
Tweet
Share
3.89K Views

Join the DZone community and get the full member experience.

Join For Free

Unusually for me, I had a bit of a pause in reviewing Sled. As a reminder, Sled is an embedded database engine written in Rust. I last stopped looking at the buffer management, but I still don’t really have a good grasp of what is going on. You can find Part 1 here and Part 2 here.

The next file is the iterator. It looks like it translates between segments and messages in these segments. Here is the struct:

image

As you can see, the log iterator holds an iterator of segments, and iterating over it looks like it will go over all the messages in the segments in order. Yep, here is the actual work being done:

image

The next() method is fairly straightforward, I found. But I have to point out this:

image

First, the will need call is really interesting. Mostly because you have a pretty obvious way to do conditional compiling that doesn’t really suck. #if is usually much more jarring in the code.

Second, I think that the style of putting really important functions inside an if result in a pretty dense code. Especially since the if is entered only on error. I would have preferred to have it as a stand-alone variable and then check if it failed.

What I don’t understand is the read_segment call. Inside that method, we have:

image

There are also similar calls on the segment trailer. It looks like we have a single file for the data, but stuff that is too large is held externally in the blob files.

We then get to this guy, which I find to be a really elegant way to handle all the different states.

image

That is about it for interesting bits in the iterator. The next fun bit is the Log. I do have to admit that I don’t like the term log. It is too easy to get it confused with a debug log.

image

The fact that you need to figure out where to get the offset of the data you are about to write is really interesting. This is the method that does the bulk of the work:

image

Note that it just reserved and completed the operation. This also does not flush the data to disk. That is handled by the flusher or by an explicit call. The reserve() method calls to reserve_internal(), and there we find this gem:

image

I know what it does (conditional compilation), but I find it really hard to follow. Especially because it looks like a mistake with buf being defined twice. This is actually a case where an #if statement would be better in my opinion.

Most of the code in there is to manage calls to the iobuf, which I already reviewed. So I’m going to skip ahead and look at something that is going to be more interesting, the page cache. Sled has an interesting behavior, in that it can shred a page into multiple location, requiring some logic to bring it all back together. That is going to be really interesting to look at I hope.

The file starts with this:

image

And this…takes a while to unpack. Remember that epoch is a manual GC pattern for concurrent data structure without GC.

The cached_ptr value is a shared pointer to a Node (inside a lock-free stack) that holds a CacheEntry with static lifetime and thread safe to a generic argument that must have static lifetime and be thread-safe. And there is an unsigned long there as well.

No idea yet what is going on. But here is the first method on this struct:

image

That is a lot. The cache entry is a discriminated union with the following options:

image

There are some awesome documentation comments here, including a full-blown sample code that really helps me understand what is going on in the code.

There seems to be a few key methods that are involved here:

  • allocate(val) — create a new page and write an initial value, gets back a page id.
  • link(id, val) — make a write to a page id. Which simply write a value out.
  • get(id) — read all the values for a page id, and uses a materializer to merge them all to a single value.
  • replace(id, val) — set the page id to the new value, removing all the other values it had.

The idea here, as I gather. Is to allow sequential writes to the data, but allow fast reads, mostly by utilizing SSD’s random read feature.

I’m trying to follow the code, but it is a bit complicated. In particular, we have:

image

This tries to allocate either a free page or allocate a new one. One of the things that really messes with me here is the use of the term Page. I’m using B+Tree, where a page is literally some section of memory. Here, it refers to something more nebulous. The key point here is that I don’t see where the size is specified. But given that you can link items to page, that sort of make sense. I just need to get used to Pages != storage.

The fact that all of this is generic also makes it hard to follow what is actually going on. I’m getting lost in here, so I think that I’ll stop for now. Stay tuned for the next part!

Database engine Data structure

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Using JSON Web Encryption (JWE)
  • Why Every Fintech Company Needs DevOps
  • Java Development Trends 2023
  • Beginners’ Guide to Run a Linux Server Securely

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: