
FLOSS Moling with RavenDB

by Oren Eini · Jan. 25, 13


There is the FLOSSmole data set, which provides a lot of interesting information about open source projects. As I am always interested in testing RavenDB with different data sets, I decided that this would be a great opportunity to do that, and to learn a bit more about how things work along the way.

The data is provided in a number of formats, but most of them aren't really easy to access: SQL statements, and raw text files that I assume are tab separated, though I couldn't figure that out quickly.

I decided that this would be a great example of actually migrating content from a SQL system to a RavenDB system. The first thing to do was to install MySQL, as that seems to be the easiest way to get the data out. (As a note, MySQL Workbench is really not what I would call nice.)

The data looks like this (these are the Google Code projects), and you can also see that a lot of the data is driven by the notion of a project.

(image: sample of the Google Code projects data)

I explored the data a bit, and I came to the conclusion that this is pretty simple stuff, overall. There are a few many-to-one associations, but all of them were capped (the max was 20 or so), which means each project's related data is small enough to embed directly in a single document.

That meant, in turn, that the import process would be really simple. I started by creating the actual model that we will use to save to RavenDB:

(image: the document model classes)
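In code form, the model is roughly the following. This is a sketch reconstructed from the property names the import code below relies on, so the exact shape of the original classes is an assumption.

    using System.Collections.Generic;

    // Sketch of the document model, inferred from the import code below;
    // the original classes may differ in details.
    public class Project
    {
        public string Name { get; set; }
        public string CodeLicense { get; set; }
        public string CodeUrl { get; set; }
        public string ContentLicense { get; set; }
        public string ContentUrl { get; set; }
        public string Description { get; set; }
        public string Summary { get; set; }

        // The capped many-to-one associations, embedded in the document
        public List<string> Labels { get; set; }
        public List<Blog> Blogs { get; set; }
        public List<Group> Groups { get; set; }
        public List<Link> Links { get; set; }
        public List<Person> People { get; set; }
    }

    public class Blog
    {
        public string Link { get; set; }
        public string Title { get; set; }
    }

    public class Group
    {
        public string Name { get; set; }
        public string Url { get; set; }
    }

    public class Link
    {
        public string Url { get; set; }
        public string Title { get; set; }
    }

    public class Person
    {
        public string Name { get; set; }
        public string Role { get; set; }
        public string UserId { get; set; }
    }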

The rest was just a matter of reading from MySQL and writing to RavenDB. I chose to use PetaPoco for the SQL access, because it is the easiest. The following code sucks. It is written with the assumption that I know what the data sizes are, that the cost of making so many queries (roughly 1,500,000 of them) is acceptable, etc.

    // One bulk insert session against RavenDB; for each project row in
    // MySQL, the capped child collections are queried separately and
    // embedded in the document.
    using (var docStore = new DocumentStore
        {
            ConnectionStringName = "ravendb"
        }.Initialize())
    using (var db = new PetaPoco.Database("mysql"))
    using (var bulk = docStore.BulkInsert())
    {
        foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
        {
            string name = prj.proj_name;
            bulk.Store(new Project
                {
                    Name = name,
                    CodeLicense = prj.code_license,
                    CodeUrl = prj.code_url,
                    ContentLicense = prj.content_license,
                    ContentUrl = prj.content_url,
                    Description = prj.project_description,
                    Summary = prj.project_summary,
                    // Five more queries per project, one per child table
                    Labels = db.Query<string>("select label from gc_project_labels where proj_name = @0", name)
                                    .ToList(),
                    Blogs = db.Query<dynamic>("select * from gc_project_blogs where proj_name = @0", name)
                                .Select(x => new Blog { Link = x.blog_link, Title = x.blog_title })
                                .ToList(),
                    Groups = db.Query<dynamic>("select * from gc_project_groups where proj_name = @0", name)
                                .Select(x => new Group { Name = x.group_name, Url = x.group_url })
                                .ToList(),
                    Links = db.Query<dynamic>("select * from gc_project_links where proj_name = @0", name)
                                .Select(x => new Link { Url = x.link, Title = x.link_title })
                                .ToList(),
                    People = db.Query<dynamic>("select * from gc_project_people where proj_name = @0", name)
                        .Select(x => new Person
                            {
                                Name = x.person_name,
                                Role = x.role,
                                UserId = x.user_id
                            })
                        .ToList(),
                });
        }
    }

But it does the work, and it was simple to write. Using this code, I was able to insert 299,949 projects in just under 13 minutes. Most of the time went to making those 1.5 million queries to the database, by the way.
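If those per-project queries were ever a real problem, the obvious alternative would be to read each child table once and group the rows by project name in memory before building the documents. A rough sketch of that approach, assuming the child tables fit in memory (this grouping code is illustrative, not from the original post):

    // Illustrative alternative: load each child table once and group the
    // rows by project name, turning ~1.5 million queries into six scans.
    // Assumes the child tables fit comfortably in memory.
    var labels = db.Query<dynamic>("select * from gc_project_labels").ToList()
        .ToLookup(x => (string)x.proj_name, x => (string)x.label);
    var blogs = db.Query<dynamic>("select * from gc_project_blogs").ToList()
        .ToLookup(x => (string)x.proj_name,
                  x => new Blog { Link = x.blog_link, Title = x.blog_title });
    // ... same pattern for groups, links, and people ...

    foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
    {
        string name = prj.proj_name;
        bulk.Store(new Project
            {
                Name = name,
                Labels = labels[name].ToList(),
                Blogs = blogs[name].ToList(),
                // ... remaining properties as in the code above ...
            });
    }

That trades memory for round trips; with the whole import already done in under 13 minutes, it wasn't worth doing here.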

Everything is cool, and it is quite nice. In the next post, I'll talk about why I wanted a new data set. Don't worry, it is going to be cool.




Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
