
An Optimization Opportunity with RavenDB and FreeDB

By Oren Eini · Apr. 20, 2012


Update: the numbers in this post are no longer relevant. I include them here solely to give you a frame of reference. We have done a lot of optimization work, and the numbers are orders of magnitude faster now. See the next post for details.

The purpose of this post is to set up a scenario, see how RavenDB handles it, and then optimize the parts that we don't like. This post is scheduled to go out about two months after it was written, so anything you see here is likely already fixed. In future posts, I'll talk about the optimizations, what we did, and what the results were.

System note: I ran these tests on a year-old desktop, with all the database activity happening on a single 7,200 RPM 300 GB disk, and with 8 GB of RAM. Please don't get too hung up on the actual numbers; I include them for reference, but real hardware on a production system should push them drastically higher. Another thing to remember is that this was an active system: while all of these operations were running, I was actively working and developing on the machine. The main point is to give us some sort of metric for where we are, and to decide whether we like it or not.

We keep looking for additional things that we can do with RavenDB, and having a large amount of information to test things with is awesome. Having non-fake data is even awesomer, because fake data is predictable data, while real data tends to be much more… interesting.

That is why I decided to load the entire FreeDB database into RavenDB and see what happens.

What is FreeDB?

FreeDB is a database for looking up CD information over the internet. This is done by a client (a FreeDB-aware application) which calculates a (nearly) unique disc ID for the CD in your CD-ROM drive and then queries the database. As a result, the client displays the artist, CD title, track list, and some additional info.
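For context, that (nearly) unique disc ID is just a small hash over the track offsets on the CD. This is not part of the post, but a minimal sketch of the classic CDDB algorithm, as I understand the published spec, looks like this:

    // A sketch of the classic FreeDB/CDDB disc ID calculation (my summary of
    // the published algorithm, not code from this post). Offsets are in CD
    // frames (75 frames per second); the final entry is the lead-out offset.
    static uint CddbDiscId(int[] frameOffsets)
    {
        int tracks = frameOffsets.Length - 1; // last offset is the lead-out
        int checksum = 0;
        for (int i = 0; i < tracks; i++)
        {
            // Sum the decimal digits of each track's start time in seconds.
            int seconds = frameOffsets[i] / 75;
            while (seconds > 0)
            {
                checksum += seconds % 10;
                seconds /= 10;
            }
        }
        int totalSeconds = frameOffsets[tracks] / 75 - frameOffsets[0] / 75;
        return (uint)(((checksum % 255) << 24) | (totalSeconds << 8) | tracks);
    }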

The nice thing about FreeDB is that you can download their data* and make use of it yourself.

* The not-so-nice thing is that the data is in a free-form text format. I wrote a parser for it if you really want to use it, which you can find here: https://github.com/ayende/xmcdparser

So I decided to push all of this data into RavenDB. The import process took a couple of hours (I didn't actually measure, so I am not sure exactly how long), and we ended up with a RavenDB database containing 3,133,903 documents. Memory usage during the import process was ~100–150 MB (no indexes were present).
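The post doesn't show the import code, but a minimal sketch of how such a bulk load might look with the RavenDB .NET client follows. The Disc class, the XmcdParser.ParseAll entry point, and the batch size of 1,024 are my assumptions, not the actual import code used here:

    using System.Collections.Generic;
    using Raven.Client.Document;

    public class Disc
    {
        public string Artist { get; set; }
        public string Title { get; set; }
        public List<string> Tracks { get; set; }
    }

    // Batch documents into short-lived sessions so each SaveChanges() is one
    // round trip; a single session holding millions of documents would
    // exhaust memory.
    using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
    {
        var batch = new List<Disc>();
        foreach (var disc in XmcdParser.ParseAll(@"C:\freedb")) // hypothetical parser call
        {
            batch.Add(disc);
            if (batch.Count < 1024)
                continue;
            using (var session = store.OpenSession())
            {
                foreach (var d in batch)
                    session.Store(d);
                session.SaveChanges();
            }
            batch.Clear();
        }
        // (flushing the final partial batch omitted for brevity)
    }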

The actual size in RavenDB is 3.59 GB, with 3.69 GB reserved on the file system.

Starting the database from a cold boot takes about 4 seconds.

This is what a document looks like:

[Image: a sample disc document in RavenDB]

A full backup of the database took about 3 minutes, with all of the time dedicated to pure I/O.

Doing an export using Smuggler (on the local machine, 128-document batches) took about 18 minutes and resulted in an 803 MB file (not surprising; Smuggler's output is a compressed file).

Note that we created this in a completely empty database, so the next step was to actually create an index and see how the database behaves. We created the default Raven/DocumentsByEntityName index, and it took 5,870 seconds to build, just over an hour and a half. For what it's worth, this resulted in an on-disk index with a size of 125 MB.
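Creating that index by hand from the client looks roughly like this; the map shown is an approximation of the shipped definition, not copied from the RavenDB source:

    using Raven.Abstractions.Indexing;

    // Reusing the document store from the import sketch above. The built-in
    // index tags every document by its Raven-Entity-Name metadata, which is
    // what lets the Studio group documents by collection.
    store.DatabaseCommands.PutIndex("Raven/DocumentsByEntityName",
        new IndexDefinition
        {
            Map = @"from doc in docs
                    where doc[""@metadata""][""Raven-Entity-Name""] != null
                    select new { Tag = doc[""@metadata""][""Raven-Entity-Name""] }"
        });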

I then tried a much more complex index:

[Image: the full text search index definition]

Just to give you some idea, this index gives you full text search support over just about every music CD that was ever made. To be frank, this index scares me, because it means that we have to have an index entry for every single track in the world.
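The definition itself is shown as an image above, but a plausible reconstruction of a full text index that fans out to one entry per track might look like this. The field names and analyzer choices are my assumptions:

    // One index entry per track: the nested 'from' is the fan-out that turns
    // 3.1 million documents into ~52 million index entries.
    store.DatabaseCommands.PutIndex("Discs/Search",
        new IndexDefinition
        {
            Map = @"from disc in docs.Discs
                    from track in disc.Tracks
                    select new { disc.Artist, disc.Title, Track = track }",
            Indexes =
            {
                { "Artist", FieldIndexing.Analyzed },
                { "Title", FieldIndexing.Analyzed },
                { "Track", FieldIndexing.Analyzed } // analyzed fields get full text search
            }
        });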

After indexing completed, we ended up with a 700 MB on-disk index. Indexing took about 7 hours to complete. That is a lot, but remember what we are dealing with: we indexed 3.1 million documents, but we actually indexed 52,561,894 values (remember, we index each and every track). The interesting bit is that while it took a lot of CPU (full text indexing usually does), memory usage was relatively low; it peaked at about 300 MB and usually hovered around 180 MB.

Searching over this index is not as fast as I would like, taking about a second to complete. Then again, the results are quite impressive:

[Image: full text search results over the track index]

Well, given that this is the equivalent of 52 million records (in this case, literally records :-)), and we are performing full text search, that is quite nice.
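Querying that index from the client would look something like this; LuceneQuery was the low-level query API at the time, and the index and field names follow the sketch above, so treat them as assumptions as well:

    using System.Linq;

    using (var session = store.OpenSession())
    {
        // Raw Lucene syntax against the analyzed Track field.
        var results = session.Advanced
            .LuceneQuery<Disc>("Discs/Search")
            .Where("Track:yesterday")
            .Take(10)
            .ToList();
    }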

Let us see what happens when we do something a little simpler, shall we?

[Image: the simpler, non-full-text index definition]

In this case, we are only indexing 3.1 million documents, and we don't do full text searches. This index took 2.3 hours to run.

Queries on that index run at a much more satisfactory rate, starting out at 75 ms and dropping to 5 ms very quickly.
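Again, the definition is shown as an image above, but a non-full-text equivalent over the disc documents alone, with a plain equality query against it, might look like this (names are assumptions, as before):

    // No fan-out and no analyzed fields: one small index entry per document.
    store.DatabaseCommands.PutIndex("Discs/ByArtist",
        new IndexDefinition
        {
            Map = @"from disc in docs.Discs
                    select new { disc.Artist }"
        });

    using (var session = store.OpenSession())
    {
        var discs = session.Advanced
            .LuceneQuery<Disc>("Discs/ByArtist")
            .Where("Artist:Queen") // exact-term match, no full text analysis
            .Take(10)
            .ToList();
    }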


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
