Exploring CockroachDB With Jupyter Notebook and Python

Today, we're going to explore CockroachDB from the Data Science perspective, using a popular exploratory web tool called Jupyter Notebook and Python language.

Artem Ervits

CORE ·

Jan. 11, 22 · Tutorial

Likes (3)

Comment

Save

4.0K Views

This is the next installment in the series of tutorials on CockroachDB and Docker Compose.

I was inspired to write this post based on this article. The article goes over using Jupyter with Oracle, MySql, and Postgresql, we're going to do the same with Cockroach! One caveat here is the heavy reliance on ipython-sql library. We're going to use Pandas library as the ipython-sql magic functions are not compatible with Cockroach today. Hopefully, you will find it useful.

You can find the older posts here: Part 1, Part 2, Part 3, Part 4.

Information on CockroachDB can be found here.

Information on Jupyter Notebook can be found here.

We're going to continue using our trusty docker-compose. Jupyter project publishes its images on Docker Hub. We're going to use the minimal image for this tutorial.

Here's my compose file:

version: '3.9'

services:

 crdb:
   image: cockroachdb/cockroach:v21.2.3
   container_name: crdb-1
   ports:
     - "26257:26257"
     - "8080:8080"
   command: start-single-node --insecure
   volumes:
     - ${PWD}/cockroach-data/crdb:/cockroach/cockroach-data:rw

 jupyter:
   image: jupyter/minimal-notebook
   container_name: jupyter
   environment:
     - GRANT_SUDO=yes
   ports:
     - "8888:8888"
   volumes:
     - $PWD:/home/jovyan/work

If you've been following my posts, you may notice that I switched to CockroacihDB version 19.2.3. For the Jupyter environment, I'm passing GRANT_SUDO=yes as an environment variable in case, we need to install some additional software inside the container. I'm also exposing the notebook at port 8888.

Open Jupyter notebook

When docker-compose up is executed, Jupyter will output an access token to the Jupyter notebook.

    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
    Or copy and paste one of these URLs:
        http://1b8009ed3252:8888/?token=37a818efee156c6e3b2d3bd21db07827ce250560a336987b
     or http://127.0.0.1:8888/?token=37a818efee156c6e3b2d3bd21db07827ce250560a336987b

You can also access this information with the `docker logs ' command. Copy one of the provided URLs and paste it into a browser.

Once you're in the notebook, you will notice a work directory. Clicking into it will expose all of the files in your host's current directory.

This behavior can be explained by the following volume mapping.

volumes:
     - $PWD:/home/jovyan/work

Let's create a new notebook by clicking New and selecting Python 3. You can click on the title of the notebook and name it to your liking.

Install prerequisite software

The beauty of Jupyter's environment is its ability to adapt to different scenarios. In the next step, we're going to install Python libraries using pip from within the notebook!

!pip install sqlalchemy cockroachdb pandas psycopg2-binary matplotlib

NOTE: Since the release of this post, cockroachdb adapter for sqlalchemy has been renamed to sqlalchemy-cockroachdb.

Then click on the Run icon to execute.

Before we go to the next step, let's switch to Cockroach and load some data.

Initialize a workload.

docker exec -it crdb-1 ./cockroach workload init movr

This will initialize a workload for our application. We've chosen a sample movr application.

10:18 $ docker exec -it crdb-1 ./cockroach workload init movr
I200211 15:18:30.420866 1 workload/workloadsql/dataload.go:135  imported users (0s, 50 rows)
I200211 15:18:30.431182 1 workload/workloadsql/dataload.go:135  imported vehicles (0s, 15 rows)
I200211 15:18:30.468309 1 workload/workloadsql/dataload.go:135  imported rides (0s, 500 rows)
I200211 15:18:30.498384 1 workload/workloadsql/dataload.go:135  imported vehicle_location_histories (0s, 1000 rows)
I200211 15:18:30.528526 1 workload/workloadsql/dataload.go:135  imported promo_codes (0s, 1000 rows)

Now we're ready to populate the database:

docker exec -it crdb-1 ./cockroach workload run movr --duration=1m

The workload will run for 1m, specified by the --duration parameter.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   59.9s        0            439            7.3      1.9      1.9      2.4      2.5      4.5  addUser

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   59.9s        0            124            2.1      3.6      3.7      4.5      4.5      5.0  addVehicle
...

There are different workload generators available with Cockroach, feel free to explore them here. When load completes, you can log in to the database and poke around in the database.

docker exec -it crdb-1 ./cockroach sql --insecure

root@:26257/defaultdb> show databases;
  database_name
+---------------+
  bank
  defaultdb
  movr
  postgres
  system
(5 rows)

Time: 2.847ms

root@:26257/defaultdb> use movr;
SET

Time: 856.8µs

root@:26257/movr> show tables;
          table_name
+----------------------------+
  promo_codes
  rides
  user_promo_codes
  users
  vehicle_location_histories
  vehicles
(6 rows)

Time: 2.459ms

root@:26257/movr>

While we're there, let's also add a user with access to our database.

Create a user and grant access

CREATE USER IF NOT EXISTS maxroach;
GRANT ALL ON DATABASE movr TO maxroach;
GRANT ALL ON TABLE movr.* TO maxroach;
\q

We can test access for users maxroach with:

docker exec -it crdb-1 ./cockroach sql --insecure --user=maxroach --database=movr

Access the database in Jupyter.

import pandas as pd
from sqlalchemy.engine import create_engine
engine = create_engine('cockroachdb://maxroach@crdb-1:26257/movr')
df = pd.read_sql('select count(*) from vehicles', engine)

If you're familiar with Python, sqlalchemy is a popular Python framework for working with databases. Feel free to explore our docs for more in-depth examples.

Once a database connection is established, we use pandas data frame to read the results of the select statement. Click on Run to execute. Nothing happened!

Because we stored the result in a variable called df, we need to use the methods provided by the library.

Type the following and hit Run.

df.head()

The good news, the result matches the total count of vehicles in the database!

Let's try another query:

df = pd.read_sql('select city, count(city) from vehicles group by city', engine)
print(df)

Use matplotlib to graph a plot

Jupyter is also capable of graphing your results, here's a simple graph chart of the data frame.

df.plot(x ='city', y='count', kind = 'bar')

Graph a pie chart.

df.plot.pie(y='count',figsize=(5, 5),autopct='%1.1f%%', startangle=90)

This is just a taste of the powerful Jupyter Notebook and CockroachDB Python capabilities. Feel free to visit the docs for more.

Please share your feedback in the comments.

jupyter notebook Database connection Python (language) CockroachDB

Published at DZone with permission of Artem Ervits. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

Exploring CockroachDB With Jupyter Notebook and Python

Today, we're going to explore CockroachDB from the Data Science perspective, using a popular exploratory web tool called Jupyter Notebook and Python language.

Related

Partner Resources