If you are here, I assume you already know what Docker is and need no further explanation. If you have no clue about it, I'd highly recommend reading up on it first and then coming back to this post for a hands-on introduction. In this post, I'll try to give you an individual programmer's perspective on Docker.
Below is the Docker workflow:
There are basically three things you need on your machine to get started with Docker Containers:
Docker Machine: This is simply a Linux Virtual Machine on which Docker Containers are going to run.
Docker Images: These are similar to the ISO images that you run on a VM, but highly stripped down. Packages and libraries that the Docker Machine already provides have been removed.
Docker Containers: These are runnable instances of a Docker Image that you can start, stop, modify, or publish as another image.
After you install Docker from its official website, you should have a Docker Machine called default installed and ready to run on your PC. In case it's missing, you can always create one using the command
docker-machine create default .
Note that you can set the disk size and memory size of your Docker Machine in the above command by passing driver-specific switches. I highly recommend doing so if you plan to do any data science work. You may also create multiple Docker Machines with different configurations and purposes. You can check the list of machines using
docker-machine ls .
Here is the sample output:
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
default   *        virtualbox   Running   tcp://192.168.99.100:2376           v17.04.0-ce
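As a rough sketch of the disk and memory switches mentioned above: the sample output shows the virtualbox driver, and for that driver docker-machine accepts --virtualbox-memory and --virtualbox-disk-size, both in MB. The helper name machine_create_cmd, the machine name datasci, and the sizes are hypothetical, chosen only for illustration. The function prints the command instead of running it, so you can review it first.

```shell
# Print, rather than run, a docker-machine create command for the
# VirtualBox driver. Memory and disk sizes are given in MB.
machine_create_cmd() {
  name="$1"
  memory_mb="$2"
  disk_mb="$3"
  printf 'docker-machine create --driver virtualbox --virtualbox-memory %s --virtualbox-disk-size %s %s\n' \
    "$memory_mb" "$disk_mb" "$name"
}

# Hypothetical machine "datasci" with 4096 MB of RAM and a 40000 MB disk.
machine_create_cmd datasci 4096 40000
```

If the printed command looks right, paste it into your terminal to create the machine.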
You may start the machine using the
docker-machine start command. Note that I have omitted the machine name: you can skip it when the machine is called default, but in all other cases you have to specify it.
You may check the status of your Docker machine by using the
docker-machine status command. This will simply say "running" or "stopped."
Another thing to be careful about is making sure your host shell has all the env variables of the Docker Machine set. The
docker-machine env command is a convenient way to get a script that sets up all of these env variables.
SET DOCKER_TLS_VERIFY=1
SET DOCKER_HOST=tcp://192.168.99.100:2376
SET DOCKER_CERT_PATH=C:\Users\kushukla\.docker\machine\machines\default
SET DOCKER_MACHINE_NAME=default
SET COMPOSE_CONVERT_WINDOWS_PATHS=true
REM Run this command to configure your shell:
REM @FOR /f "tokens=*" %i IN ('docker-machine env') DO @%i
Just copy and paste the output into your terminal or command prompt, or run the command given in the last REM line, to set the env variables.
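The output above is for the Windows command prompt. On a POSIX shell (for example Git Bash or a Linux terminal), the usual equivalent is to eval the output of docker-machine env. The sketch below assumes the machine is called default. The helper docker_env_ready is a hypothetical name introduced here, and it only checks that the two variables the client needs to find the machine are set.

```shell
# Load the machine's settings into the current shell, but only when
# docker-machine is actually installed.
if command -v docker-machine >/dev/null 2>&1; then
  eval "$(docker-machine env default)"
fi

# Hypothetical helper: succeeds when the variables that tell the docker
# client where the machine is and which certificates to use are both set.
docker_env_ready() {
  [ -n "${DOCKER_HOST:-}" ] && [ -n "${DOCKER_CERT_PATH:-}" ]
}

if docker_env_ready; then
  echo "docker client points at $DOCKER_HOST"
else
  echo "docker environment is not configured"
fi
```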
OK, that's a lot about Docker Machine. Now let's pull some images and run them on the machine. If you already know the repository you want, you can issue the
docker pull rocker/rstudio command to download the image. If you don't, you can browse Docker Hub (similar to GitHub), a registry of Docker images, and pick one to start with.
Let's install another one, this time using the run command. Run goes a step further than pull: it checks whether the image is already available locally, pulls it from its registry (Docker Hub by default) if it is not, and then starts a new container from this image:
docker run -p 8888:8888 --name tensorflow -it gcr.io/tensorflow/udacity-assignments:1.0.0
The parameters above customize the container. They say which port to map (-p), what the name of the container will be (--name), and that it should have an interactive tty (-it). So don't be surprised if this command blocks your terminal: with an interactive tty, your terminal's stdin, stdout, and stderr are now linked to the running container.
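The port switch is the one that most often causes confusion, so here is a small sketch of how to read it. In the two-part form, the value of -p is HOST_PORT:CONTAINER_PORT. The helper describe_port_mapping and the 8080:8888 example are hypothetical and not part of Docker; the function only splits the string and handles the two-part form only.

```shell
# Explain a two-part -p value of the form HOST_PORT:CONTAINER_PORT.
describe_port_mapping() {
  host_port="${1%%:*}"        # text before the first colon
  container_port="${1##*:}"   # text after the last colon
  printf 'host port %s -> container port %s\n' "$host_port" "$container_port"
}

# Jupyter listens on 8888 inside the container; here it would be
# reachable on port 8080 of the Docker Machine instead.
describe_port_mapping 8080:8888
```

With -p 8888:8888, as in the command above, both sides are the same, so the notebook is reachable on port 8888 of the Docker Machine.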
You may check the different images downloaded on your PC using the
docker images command.
REPOSITORY                              TAG      IMAGE ID       CREATED        SIZE
rocker/rstudio                          latest   7a807646f0be   11 days ago    993MB
gcr.io/tensorflow/udacity-assignments   1.0.0    4e01459e7150   2 months ago   1.03GB
Docker images can be identified using Image ID or Repository in the above table. So, to delete an Image, you can run this command:
docker rmi 7a807646f0be
While Docker Images are static in nature, Containers are what you start and stop, and they do the actual job. Every container is created from a Docker Image. You may run the command below to list all the containers on your PC.
docker ps -a
CONTAINER ID   IMAGE                                         COMMAND             CREATED          STATUS          PORTS   NAMES
21d0cdc5051d   gcr.io/tensorflow/udacity-assignments:1.0.0   "/run_jupyter.sh"   15 minutes ago   Up 18 minutes           tensorflow
If you remove the -a switch, you should see only the containers that are currently running. In our case, that's only the rstudio container.
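If you only want to know whether one particular container is running, the table is more than you need. A minimal sketch, assuming a container named rstudio: docker ps accepts a --format template, and {{.Names}} prints one container name per line. The helper name_in_list is a hypothetical name and simply looks for an exact line match on its input.

```shell
# Succeeds if the name given as $1 appears as a whole line on stdin.
name_in_list() {
  grep -Fqx -- "$1"
}

# Only talk to Docker when the client is installed.
if command -v docker >/dev/null 2>&1; then
  if docker ps --format '{{.Names}}' | name_in_list rstudio; then
    echo "rstudio is running"
  else
    echo "rstudio is not running"
  fi
fi
```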
You may stop a container using
docker stop rstudio
To start a container, you can guess:
docker start rstudio
I'd rather start it as below:
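docker run -d -p 8787:8787 -v /c/Users/kushukla:/home/rstudio/kushukla --name rstudio rocker/rstudio
This is because I want to publish port 8787 and also mount my local directory into the container's file system so that my R code is accessible in that container. Note that docker run always creates a new container, so an existing container named rstudio has to be removed first (docker rm rstudio). What happens next? I can go to a browser and open RStudio: the IP is the one in the DOCKER_HOST env variable shown earlier, and the port is the one passed to docker run, which here gives http://192.168.99.100:8787.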
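To avoid reading the address off the env output by hand, the IP can be cut out of DOCKER_HOST in the shell. This is only a sketch: container_url is a hypothetical helper, and the fallback value is the DOCKER_HOST from the sample output earlier in this post, so adjust it if your machine has a different address.

```shell
# Build a browser URL from a DOCKER_HOST value (tcp://IP:PORT) and the
# host port that was published with -p.
container_url() {
  host="${1#tcp://}"                        # drop the tcp:// scheme
  printf 'http://%s:%s\n' "${host%%:*}" "$2"  # keep the IP, swap in the port
}

# Falls back to the sample address when DOCKER_HOST is not set.
container_url "${DOCKER_HOST:-tcp://192.168.99.100:2376}" 8787
```

With the sample address, this prints http://192.168.99.100:8787. The command docker-machine ip default prints the same IP directly if you prefer not to parse the variable.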
That is all. I hope you found it interesting.