The 4 Best Jupyter Notebook Environments for Deep Learning
As deep learning and artificial intelligence continue to grow in popularity, so too do Jupyter notebooks. Take a look at some of the most popular ones.
Notebooks are becoming the standard for prototyping and analysis for data scientists. Many cloud providers offer machine learning and deep learning services in the form of Jupyter notebooks. Other players have now begun to offer cloud-hosted Jupyter environments, with similar storage, compute and pricing structures. One of the main differences can be multi-language support and version control options that allow data scientists to share their work in one place.
The Increasing Popularity of Jupyter Notebook Environments
Jupyter notebook environments are now becoming the first destination in the journey to productizing your data science project. The notebook environment allows us to keep track of errors and maintain clean code. One of the best features, although simple, is that execution stops at the cell where an error occurs, so you immediately know which block of code failed. A regular script, by contrast, runs from the top, and depending on the amount of code, it can be a waste of time to go back and manually locate where the error occurred.
Many cloud providers, and other third-party services, see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks accessible to millions of people. Many data scientists do not have the necessary hardware for conducting large-scale deep learning, but with cloud-hosted environments, the hardware and backend configurations are mostly taken care of, leaving the user to configure only their desired parameters.
1.) MatrixDS
MatrixDS is a cloud platform that provides a social-network experience combined with GitHub-style project sharing, tailored for sharing your data science projects with peers. It supports some of the most-used technologies, such as R, Python, Shiny, MongoDB, NGINX, Julia, MySQL, and PostgreSQL.
They offer both free and paid tiers. The paid tier is similar to what is offered on the major cloud platforms, whereby you can pay by usage or time. The platform provides GPU support as needed so that memory-heavy and compute-heavy tasks can be accomplished when a local machine is not sufficient.
To get started with a Jupyter notebook environment in MatrixDS:
- Sign-up for the service to create an account. It should be a free account by default.
- You will then be prompted to a Projects page. Here, click on the green button on the top right corner to start a new project. Give it a name and description and click CREATE.
- Then you will be asked to set some configurations, such as the amount of RAM and cores. Because it is a free account, you will be limited to 4GB RAM and a 1 Core CPU.
- Once completed, you will be taken to a page where your tool of choice (a Jupyter Notebook instance) will be configured and made ready.
- Once the set-up process has completed, click START. Once it is in operation, click OPEN, and you will be taken to a new tab with your Jupyter Notebook instance.
2.) Google Colaboratory
- Google Colab is a free Jupyter notebook environment provided by Google, especially suited for deep learning tasks. It runs completely in the cloud, enables you to share your work, lets you save directly to your Google Drive, and offers compute resources.
- One of the major advantages of Colab is it offers free GPU support (with limits placed of course – check their FAQ). See this great article by Anne Bommer on getting started with Google Colab.
- It not only comes with GPU support; Colab also gives access to TPUs.
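A quick way to confirm an accelerator is attached to your Colab runtime (not from the original article, just a small sketch) is to ask TensorFlow which devices it can see; on a CPU-only runtime both lists come back empty:

```python
# List the hardware accelerators visible to TensorFlow in this runtime.
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
tpus = tf.config.list_physical_devices('TPU')
print('GPUs:', gpus)
print('TPUs:', tpus)
```

If both print as empty lists, switch the runtime type to GPU or TPU under Runtime > Change runtime type before rerunning.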
A simple example of what Google Colab offers beyond a regular Jupyter Notebook is its handling of the cv2.imshow() and cv.imshow() functions from the opencv-python package. The two functions are incompatible with the stand-alone Jupyter Notebook. Google Colab offers a custom fix for this issue:
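The fix is the cv2_imshow patch that Colab ships in its own google.colab package. A minimal sketch (the image path sample.png is a hypothetical placeholder; the import only succeeds inside Colab, so the snippet detects that case explicitly):

```python
# Colab ships a drop-in replacement for cv2.imshow(), which would otherwise
# crash the notebook kernel. Outside Colab this import fails.
try:
    from google.colab.patches import cv2_imshow
    in_colab = True
except ImportError:
    in_colab = False

if in_colab:
    import cv2
    img = cv2.imread('sample.png')  # hypothetical image path
    cv2_imshow(img)  # renders the image inline instead of crashing the kernel
```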
Run the above code in a code cell to verify that it is indeed working and begin your image and video processing tasks.
3.) AI Platform Notebooks by Google
Google Cloud offers an integrated JupyterLab managed instance that comes pre-installed with the latest machine learning and deep learning libraries such as TensorFlow, PyTorch, scikit-learn, pandas, NumPy, SciPy, and Matplotlib. The notebook instance is integrated with BigQuery, Cloud Dataproc and Cloud Dataflow to offer a seamless experience from ingestion and preprocessing, to exploration, training and deployment.
The integrated services make it hassle-free for users to scale up on demand by adding compute and storage capacity with a few clicks.
To begin your JupyterLab instance on GCP follow the steps in:
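Alongside the console steps, one way to provision an instance is with the gcloud CLI. This is a hedged sketch, not from the original article: the instance name, machine type, and zone below are placeholders you would adapt, and it assumes the Notebooks API is enabled on your project.

```shell
# Create a managed notebook VM from a Deep Learning VM image (TensorFlow family).
gcloud notebooks instances create example-instance \
    --vm-image-project=deeplearning-platform-release \
    --vm-image-family=tf-latest-gpu \
    --machine-type=n1-standard-4 \
    --location=us-central1-b
```

Once the instance is running, the console's OPEN JUPYTERLAB link takes you straight to the hosted environment.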
Run the following code with Keras to see how well a cloud environment and GPU support can speed up your analysis.
Here is the link to the dataset: Dataset CSV File (pima-indians-diabetes.csv). The dataset should be in the same working directory as your Python file to keep things simple.
Save it with the filename: pima-diabetes.csv
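The original code listing is not reproduced here, so the following is a minimal sketch of the kind of Keras model typically trained on this dataset. It assumes TensorFlow/Keras is installed and that the CSV has eight feature columns followed by a 0/1 outcome column, as in the Pima dataset:

```python
# Train a small feed-forward classifier on the diabetes CSV.
import numpy as np
from tensorflow import keras

def train_diabetes_model(csv_path, epochs=150):
    """Fit a tiny dense network on a CSV of 8 features plus a binary label."""
    data = np.loadtxt(csv_path, delimiter=',')
    X, y = data[:, :8], data[:, 8]
    model = keras.Sequential([
        keras.layers.Input(shape=(8,)),
        keras.layers.Dense(12, activation='relu'),
        keras.layers.Dense(8, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid'),  # probability of diabetes
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X, y, epochs=epochs, batch_size=10, verbose=0)
    return model

# Usage with the dataset saved as described above:
# model = train_diabetes_model('pima-diabetes.csv')
```

With GPU support enabled on the cloud instance, the fit call above should complete noticeably faster than on a typical laptop CPU.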
4.) Saturn Cloud
Saturn Cloud is a new cloud service that offers one-click Jupyter notebooks hosted on the cloud that can scale up to your compute and storage requirements using AWS in the backend. Here is a tutorial to get started.
Saturn Cloud is designed to handle the DevOps side of data science and make your analysis more reproducible by offering version control and collaboration features.
Saturn Cloud offers parallel computing infrastructure with Dask (written in Python) instead of other big data tools such as Spark.
To get started with Saturn Cloud:
- Go to their login page and create an account: Saturn Cloud Login. The basic plan is free, so you can get used to the environment.
- To create your notebook instance, specify:
- A name for the notebook
- The amount of storage
- The GPU or CPU to be used
- (Optional) The Python environment (e.g., Pip or Conda)
- (Optional) Auto-shutdown
- A requirements.txt file to install the libraries for your project
- After the above parameters have been specified you can click CREATE to start the server and your notebook instance.
- Saturn Cloud can also host your notebook, making it shareable. This is an example of Saturn Cloud taking care of the DevOps for a data science project so that the user doesn't have to.
Run the below code to verify your instance is running as intended.
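Since the original verification code is not reproduced here, the following is a lightweight sketch (not Saturn-specific) that builds a lazy Dask array and computes it, confirming the scheduler can execute a task graph on the instance:

```python
# Build a lazy 4000x4000 random array split into 16 chunks, then compute
# its mean; Dask executes the chunked graph in parallel.
import dask.array as da

x = da.random.random((4000, 4000), chunks=(1000, 1000))
result = float(x.mean().compute())
print(result)  # should land very close to 0.5 for uniform random data
```

If this completes and prints a value near 0.5, the Dask scheduler on your instance is working.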
What is the Best Jupyter Notebook Environment?
We ranked the Jupyter notebook environments based on a number of different factors, such as analysis and visualization capabilities, data storage, and database functionality. Each platform is different, with its own best and worst use cases and its own unique selling point.
All of the above services are made to cater to your deep learning requirements and provide a reproducible environment in which to share your work and conduct your analysis with as little backend work as possible. As deep learning hits new milestones, its algorithms still require vast amounts of data, and most data scientists do not have the capacity on their local machines to make it happen. This is where the above alternatives let us conduct our analysis with a seamless experience. The following is our best attempt at an objective point of view on which platform is best and which is worst:
MatrixDS is unique among the others in that it gives users the option of different tools for different tasks. For analysis, it provides Python, R, Julia, Tensorboard, and more; for visualization, it supports Superset, Shiny, Flask, and Bokeh; and for data storage, it works with PostgreSQL.
Saturn Cloud provides parallel computing support and makes the sign-up process and creation of a Jupyter notebook simpler compared to the other providers on this list. For users who just want to get started with minimal frills and only need a server that can handle big data, this is probably the best choice.
AI Platform Notebooks by Google:
This notebook environment provides support for both Python and R. Data science users may have a preferred language and the support for both on a major cloud provider is an attractive offer. It gives access to GCP’s other services such as BigQuery as well, making querying data more efficient and powerful.
Google Colab:
While quite powerful and the only one to offer TPU support, it is not as feature-rich for a comprehensive data science workflow as the others. It only has Python support and functions similarly to a standard Jupyter Notebook with a different user interface. It offers to share your notebook on your Google Drive and can access your Google Drive data as well.
Published at DZone with permission of Kevin Vu. See the original article here.