DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Norm of a One-Dimensional Tensor in Python Libraries
  • Improving Sentiment Score Accuracy With FinBERT and Embracing SOLID Principles
  • CockroachDB TIL: Volume 11
  • What Does Synchronization With Asyncio Look Like

Trending

  • Zero-Downtime Deployments for Java Apps on Kubernetes
  • Rethinking Java CRUDs With Event Sourcing and CQRS Patterns
  • Run Gemma 4 on Your Laptop: A Hands-On Guide to Google's Latest Open Multimodal LLM
  • How to Format Articles for DZone
  1. DZone
  2. Data Engineering
  3. Databases
  4. Exploring CockroachDB With Jupyter Notebook and Python

Exploring CockroachDB With Jupyter Notebook and Python

Today, we're going to explore CockroachDB from the Data Science perspective, using a popular exploratory web tool called Jupyter Notebook and Python language.

By 
Artem Ervits user avatar
Artem Ervits
DZone Core CORE ·
Jan. 11, 22 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
4.1K Views

Join the DZone community and get the full member experience.

Join For Free

This is the next installment in the series of tutorials on CockroachDB and Docker Compose.

I was inspired to write this post based on this article. The article goes over using Jupyter with Oracle, MySql, and Postgresql, we're going to do the same with Cockroach! One caveat here is the heavy reliance on ipython-sql library. We're going to use Pandas library as the ipython-sql magic functions are not compatible with Cockroach today. Hopefully, you will find it useful.

You can find the older posts here: Part 1, Part 2, Part 3, Part 4.

  • Information on CockroachDB can be found here.
  • Information on Jupyter Notebook can be found here.

We're going to continue using our trusty docker-compose. Jupyter project publishes its images on Docker Hub. We're going to use the minimal image for this tutorial.

Here's my compose file:

version: '3.9'

services:

 crdb:
   image: cockroachdb/cockroach:v21.2.3
   container_name: crdb-1
   ports:
     - "26257:26257"
     - "8080:8080"
   command: start-single-node --insecure
   volumes:
     - ${PWD}/cockroach-data/crdb:/cockroach/cockroach-data:rw

 jupyter:
   image: jupyter/minimal-notebook
   container_name: jupyter
   environment:
     - GRANT_SUDO=yes
   ports:
     - "8888:8888"
   volumes:
     - $PWD:/home/jovyan/work


If you've been following my posts, you may notice that I switched to CockroacihDB version 19.2.3. For the Jupyter environment, I'm passing GRANT_SUDO=yes as an environment variable in case, we need to install some additional software inside the container. I'm also exposing the notebook at port 8888.

  1. Open Jupyter notebook

When docker-compose up is executed, Jupyter will output an access token to the Jupyter notebook.

    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
    Or copy and paste one of these URLs:
        http://1b8009ed3252:8888/?token=37a818efee156c6e3b2d3bd21db07827ce250560a336987b
     or http://127.0.0.1:8888/?token=37a818efee156c6e3b2d3bd21db07827ce250560a336987b


You can also access this information with the `docker logs ' command. Copy one of the provided URLs and paste it into a browser.

  1. Once you're in the notebook, you will notice a work directory. Clicking into it will expose all of the files in your host's current directory.

This behavior can be explained by the following volume mapping.

volumes:
     - $PWD:/home/jovyan/work


Let's create a new notebook by clicking New and selecting Python 3. You can click on the title of the notebook and name it to your liking.

  1. Install prerequisite software

The beauty of Jupyter's environment is its ability to adapt to different scenarios. In the next step, we're going to install Python libraries using pip from within the notebook!

!pip install sqlalchemy cockroachdb pandas psycopg2-binary matplotlib


NOTE: Since the release of this post, cockroachdb adapter for sqlalchemy has been renamed to sqlalchemy-cockroachdb.

Then click on the Run icon to execute.

Before we go to the next step, let's switch to Cockroach and load some data.

  1. Initialize a workload.
docker exec -it crdb-1 ./cockroach workload init movr


This will initialize a workload for our application. We've chosen a sample movr application.

10:18 $ docker exec -it crdb-1 ./cockroach workload init movr
I200211 15:18:30.420866 1 workload/workloadsql/dataload.go:135  imported users (0s, 50 rows)
I200211 15:18:30.431182 1 workload/workloadsql/dataload.go:135  imported vehicles (0s, 15 rows)
I200211 15:18:30.468309 1 workload/workloadsql/dataload.go:135  imported rides (0s, 500 rows)
I200211 15:18:30.498384 1 workload/workloadsql/dataload.go:135  imported vehicle_location_histories (0s, 1000 rows)
I200211 15:18:30.528526 1 workload/workloadsql/dataload.go:135  imported promo_codes (0s, 1000 rows)


Now we're ready to populate the database:

docker exec -it crdb-1 ./cockroach workload run movr --duration=1m


The workload will run for 1m, specified by the --duration parameter.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   59.9s        0            439            7.3      1.9      1.9      2.4      2.5      4.5  addUser

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   59.9s        0            124            2.1      3.6      3.7      4.5      4.5      5.0  addVehicle
...


There are different workload generators available with Cockroach, feel free to explore them here. When load completes, you can log in to the database and poke around in the database.

docker exec -it crdb-1 ./cockroach sql --insecure
root@:26257/defaultdb> show databases;
  database_name
+---------------+
  bank
  defaultdb
  movr
  postgres
  system
(5 rows)

Time: 2.847ms

root@:26257/defaultdb> use movr;
SET

Time: 856.8µs

root@:26257/movr> show tables;
          table_name
+----------------------------+
  promo_codes
  rides
  user_promo_codes
  users
  vehicle_location_histories
  vehicles
(6 rows)

Time: 2.459ms

root@:26257/movr>


While we're there, let's also add a user with access to our database.

  1. Create a user and grant access
CREATE USER IF NOT EXISTS maxroach;
GRANT ALL ON DATABASE movr TO maxroach;
GRANT ALL ON TABLE movr.* TO maxroach;
\q


We can test access for users maxroach with:

docker exec -it crdb-1 ./cockroach sql --insecure --user=maxroach --database=movr


  1. Access the database in Jupyter.
import pandas as pd
from sqlalchemy.engine import create_engine
engine = create_engine('cockroachdb://maxroach@crdb-1:26257/movr')
df = pd.read_sql('select count(*) from vehicles', engine)


If you're familiar with Python, sqlalchemy is a popular Python framework for working with databases. Feel free to explore our docs for more in-depth examples.

Once a database connection is established, we use pandas data frame to read the results of the select statement. Click on Run to execute. Nothing happened!

Because we stored the result in a variable called df, we need to use the methods provided by the library.

Type the following and hit Run.

df.head()


The good news, the result matches the total count of vehicles in the database!

Let's try another query:

df = pd.read_sql('select city, count(city) from vehicles group by city', engine)
print(df)


  1. Use matplotlib to graph a plot

Jupyter is also capable of graphing your results, here's a simple graph chart of the data frame.

df.plot(x ='city', y='count', kind = 'bar')


  1. Graph a pie chart.
df.plot.pie(y='count',figsize=(5, 5),autopct='%1.1f%%', startangle=90)


This is just a taste of the powerful Jupyter Notebook and CockroachDB Python capabilities. Feel free to visit the docs for more.

Please share your feedback in the comments.

jupyter notebook Database connection Python (language) CockroachDB

Published at DZone with permission of Artem Ervits. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Norm of a One-Dimensional Tensor in Python Libraries
  • Improving Sentiment Score Accuracy With FinBERT and Embracing SOLID Principles
  • CockroachDB TIL: Volume 11
  • What Does Synchronization With Asyncio Look Like

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook