How to test your A.I. code with JupyterLab
There are many online development tools, such as Replit, that can be used to test and share your Artificial Intelligence (A.I.) code. However, if you want to go further and have full control of your code, this article shows you how to install your own A.I. testing tool based on the JupyterLab development environment.
JupyterLab is a web-based interactive development environment. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing and machine learning. JupyterLab is open-source software and is compatible with many languages, such as Python and R, which are very popular for data analysis.
Step 1: Install VirtualBox
VirtualBox is a free, open-source virtualization tool which will allow you to create your own dedicated environment to test your A.I. code. It runs on a large number of operating systems (Windows, Linux, macOS, …) and can be downloaded here :
First install the VirtualBox platform package, then the VirtualBox Extension Pack.
Step 2: Create a new Virtual Machine
The creation of a new virtual machine is similar to the creation of a new server. You just need to start VirtualBox and use the menu “Machine » New…” and define the following parameters :
- The name of the virtual machine
- The folder where all the content of this new machine will be stored
- The guest Operating System : Linux
- The version of the guest Operating System : Debian (64-bit)
- The memory size (e.g. 8 GB)
- The virtual hard disk size (e.g. 100 GB)
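The same parameters can also be set from the command line with VirtualBox's VBoxManage tool. The sketch below only builds the commands and prints them, it does not run anything. The VM name `ai-lab`, the base folder `/vms` and the disk file name are hypothetical, and the sizes mirror the examples above (VBoxManage expects megabytes).

```python
from pathlib import PurePosixPath


def vm_commands(name, folder, memory_gb=8, disk_gb=100):
    """Build (but do not run) VBoxManage commands for the parameters listed above."""
    # Hypothetical location for the virtual disk, inside the VM folder.
    disk = PurePosixPath(folder) / name / f"{name}.vdi"
    return [
        # Create and register a 64-bit Debian guest.
        ["VBoxManage", "createvm", "--name", name, "--ostype", "Debian_64",
         "--basefolder", str(folder), "--register"],
        # Memory size is given in MB.
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_gb * 1024)],
        # Disk size is given in MB as well.
        ["VBoxManage", "createmedium", "disk", "--filename", str(disk),
         "--size", str(disk_gb * 1024), "--format", "VDI"],
    ]


if __name__ == "__main__":
    for cmd in vm_commands("ai-lab", "/vms"):
        print(" ".join(cmd))
```

Attaching the disk to a storage controller is a further step that is not shown here; the graphical wizard does it for you.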
Step 3: Install a Linux OS
You need to install an Operating System on top of your new virtual machine. In my case, I chose Debian, but you can use another Linux distribution (Ubuntu, CentOS, etc.).
For this installation, you will need to download the image of the Linux OS you want to install. For Debian, you can download the image (e.g. debian-11.2.0-amd64-DVD-1.iso) here :
To use this image for your OS installation, click on the “Storage” section of your virtual machine and set up the IDE optical Drive with your downloaded image file.
To install Debian in your new virtual machine, just select your virtual machine in VirtualBox and click on the Start button. You will get the following Debian installation screen to start your OS installation :
An overview of the Debian linux installation steps can be found here :
This document will guide you in the following installation steps :
- Setting up Debian Installer (language, location, keyboard, hostname, domain name, network, …)
- Setting Up Users And Passwords
- Configuring the Clock and Time Zone
- Partitioning and Mount Point Selection
- Installing the Base System
- Installing Additional Software
- Finishing the Installation (post install steps)
After installing your linux OS, don’t forget to remove the installation media from your virtual machine just after the reboot, so that you boot into your new linux system rather than restarting the installation from scratch…
Step 4: Install Docker and Docker Compose on your virtual machine
The only software you will need to install and run on your virtual machine is Docker and Docker Compose.
Docker is an open platform for developing, shipping, and running applications very quickly inside Linux containers.
Docker Compose is a tool for defining and running containerized applications from a single YAML file that describes their services, networks and volumes.
Please refer to the following official documentation for the installation of Docker :
- Ubuntu : https://docs.docker.com/engine/install/ubuntu/
- Debian : https://docs.docker.com/engine/install/debian/
- Centos : https://docs.docker.com/engine/install/centos/
- Fedora : https://docs.docker.com/engine/install/fedora/
The following example lists the Linux commands used to install Docker and Docker Compose on Debian :
# Setup Docker Repository
$ sudo apt update
$ sudo apt install -y apt-transport-https ca-certificates curl gnupg2 software-properties-common
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list
$ sudo apt update

# Install Docker Engine
$ sudo apt update
$ sudo apt install -y docker-ce docker-ce-cli containerd.io
$ sudo systemctl start docker
$ sudo systemctl enable docker
$ sudo systemctl status docker
$ docker -v
==> Docker version 20.10.10, build b485636

# Install docker-compose
$ sudo curl -L "https://github.com/docker/compose/releases/download/v2.0.1/docker-compose-linux-x86_64" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
$ docker-compose version
==> Docker Compose version v2.0.1

# Allow non-root users to run Docker commands
$ sudo groupadd docker
$ sudo usermod -aG docker <non-root user>
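Once the installation is finished, a quick way to confirm that both tools are reachable is to look them up on the PATH and ask for their version. The following Python sketch does this with the standard library only; the helper name `tool_version` is made up for this example.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return the first line of `<tool> --version`, or None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version on stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    for tool in ("docker", "docker-compose"):
        version = tool_version(tool)
        print(f"{tool}: {version if version is not None else 'not found'}")
```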
Step 5: Create the Jupyter container
Jupyter is open-source software which supports many different languages. You can get all the details of this tool here:
To create your own jupyter docker image, you can download all the needed files from the following GitHub repository:
After downloading all the needed files on your server and creating all the required directories, you should get the following tree :
# command to run from your virtual machine after downloading all the files
# (the root directory is /lab/projet01 but you can change it)
$ cd /lab/projet01
$ tree -L 5
+-- docker-code
¦   +-- jupyter
¦   ¦   +-- 3.2
¦   ¦       +-- jupyter-code
¦   ¦       ¦   +-- notebook01
¦   ¦       +-- bash-code
¦   ¦       +-- python-code
¦   ¦       +-- config
¦   ¦       +-- Dockerfile
¦   ¦       +-- README.md
¦   ¦       +-- scripts
¦   ¦       ¦   +-- docker-create-user.sh
¦   ¦       ¦   +-- docker-entrypoint.sh
¦   ¦       ¦   +-- docker-healthcheck.sh
¦   ¦       ¦   +-- docker-install-software.sh
¦   ¦       ¦   +-- password-convert-sha256.py
¦   ¦       +-- software
¦   +-- jupyter-docker-compose.yml
+-- docker-data
    +-- data-shared
    +-- data-jupyter
        +-- datasets
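If you prefer to create the directory skeleton with a script rather than by hand, a minimal Python sketch could look like this. It assumes that `config` and `software` are directories (the tree does not say so explicitly) and that the files themselves come from the repository; the function name `create_layout` is hypothetical.

```python
from pathlib import Path

# Directories taken from the tree above, relative to the project root.
DIRECTORIES = [
    "docker-code/jupyter/3.2/jupyter-code/notebook01",
    "docker-code/jupyter/3.2/bash-code",
    "docker-code/jupyter/3.2/python-code",
    "docker-code/jupyter/3.2/config",
    "docker-code/jupyter/3.2/scripts",
    "docker-code/jupyter/3.2/software",
    "docker-data/data-shared",
    "docker-data/data-jupyter/datasets",
]


def create_layout(root):
    """Create the directory skeleton under `root`; existing directories are left untouched."""
    root = Path(root)
    for relative in DIRECTORIES:
        (root / relative).mkdir(parents=True, exist_ok=True)
    # Return every directory now present under root, as POSIX-style relative paths.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())
```

For the layout used in this article, the call would be `create_layout("/lab/projet01")`.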
Files / Directories description :
- jupyter-docker-compose.yml: compose file to start jupyter container
- Dockerfile: docker file to define the content of jupyter container
- docker-create-user.sh: bash script to create a non-root user
- docker-entrypoint.sh: bash script to start the container
- docker-healthcheck.sh: bash script to check the health of the container
- docker-install-software.sh: bash script to install the JupyterLab tool and the Data Science libraries
- password-convert-sha256.py: script to convert the password into a SHA-256 hash
- data-shared: directory to share data between your jupyter container and the virtual machine
- data-jupyter: directory to store datasets for testing purposes
- notebook01: directory to store the jupyter code
- bash-code: directory to store bash shell scripts
- python-code: directory to store python scripts
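The content of password-convert-sha256.py is not reproduced in this article. As a rough illustration of what salted SHA-256 password hashing can look like, the sketch below produces a string of the form `sha256:<salt>:<digest>`, similar to the format accepted by older Jupyter password settings. Whether the script in the repository uses exactly this format is an assumption, and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None):
    """Return 'sha256:<salt>:<hex digest>' for the given password."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 random hexadecimal characters
    digest = hashlib.sha256(password.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return f"sha256:{salt}:{digest}"


def check_password(stored, password):
    """Recompute the hash with the stored salt and compare in constant time."""
    algorithm, salt, _ = stored.split(":", 2)
    if algorithm != "sha256":
        return False
    return hmac.compare_digest(stored, hash_password(password, salt))


if __name__ == "__main__":
    stored = hash_password("passwordtochange")
    print(stored)
    print(check_password(stored, "passwordtochange"))
```

The salt makes two hashes of the same password differ, which is why the check has to reuse the salt stored in the string.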
Here is the content of the jupyter-docker-compose.yml file :
version: '3.8'
services:
  # ----------------------------------------------
  # jupyter docker creation
  # ----------------------------------------------
  jupyter:
    build:
      context: jupyter/3.2
      args:
        USER_UID: 1001
        GROUP_GID: 1002
        UNAME: 'admin'
    image: lab/jupyter:3.2
    container_name: 'jupyter'
    hostname: jupyter
    tty: true
    networks:
      - lab
    # 'debian' is not defined in this excerpt; remove these two lines if you have no such service
    links:
      - debian
    environment:
      JUPYTER_SETUP: 'notebook01'
      JUPYTER_PASSWORD: 'passwordtochange'
      DOCKER_DEBUG_FLAG: 'no'
    healthcheck:
      test: ['CMD', '/opt/jupyter/scripts/docker-healthcheck.sh']
      timeout: 5s
      interval: 30s
      start_period: 10s
      retries: 3
    restart: always
    ports:
      - '8888:8888'
    volumes:
      - /lab/projet01/docker-code/jupyter/3.2/bash-code:/opt/jupyter/bash-code
      - /lab/projet01/docker-code/jupyter/3.2/python-code:/opt/jupyter/python-code
      - /lab/projet01/docker-code/jupyter/3.2/jupyter-code:/opt/jupyter/jupyter-code
      - /lab/projet01/docker-data/data-jupyter/datasets:/opt/jupyter/datasets
      - /lab/projet01/docker-data/data-shared:/opt/projet01/data-shared
      - volume_temp:/opt/projet01/temp
# ----------------------------------------------
# networks
# ----------------------------------------------
networks:
  lab:
    driver: bridge
# ----------------------------------------------
# volumes: volumes creation for persistent data
# ----------------------------------------------
volumes:
  volume_temp:
    driver: local
Step 6: Start JupyterLab
The security of JupyterLab is based on a password. So before starting your Jupyter container, don't forget to update the jupyter-docker-compose.yml file and replace the value of the variable “JUPYTER_PASSWORD” with your own password.
The container starts the next-generation user interface, « JupyterLab », instead of the older « Jupyter Notebook » interface. It offers more features and an enhanced interface, which can be extended through extensions you can get on GitHub :
You can manage your own list of Data Science libraries to install in JupyterLab by updating the installation script « docker-install-software.sh » accordingly. Currently, the following popular Data Science libraries are installed by default :
- Pandas : open source data analysis and manipulation tool, built on top of the Python programming language.
- StatsModels : provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
- Matplotlib : library for creating static, animated, and interactive visualizations in Python.
- Seaborn : visualization library based on Matplotlib
- Bokeh : provides another wide array of visualization tools in Python
- Scikit-learn (sklearn) : machine learning library. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and more.
- NumPy : core library for scientific computing in Python, adding support for large, multi-dimensional arrays and matrices.
- Tensorflow : free and open-source software library for machine learning and artificial intelligence.
- Keras : library dedicated to Deep Learning with python
- Spacy : library for Natural Language Processing in python
- NLTK : another library for Natural Language Processing in python
To start JupyterLab, you will need to run the following docker command :
# commands to run from the directory you created on your own server :
# start jupyter container listening to port 8888
$ cd /lab/projet01/docker-code
$ docker-compose -f jupyter-docker-compose.yml up -d --build

# check that your jupyter container is working (status=healthy)
$ docker ps
CONTAINER ID   IMAGE             STATUS    PORTS                       NAMES
-------------------------------------------------------------------------------------
e321637b41c6   lab/jupyter:3.2   healthy   0.0.0.0:8888->8888/tcp,...  jupyter

# another check to confirm that you can run commands inside your jupyter container
$ docker exec -it jupyter bash
$ sudo netstat -a|grep 8888
Proto Recv-Q Send-Q Local Address   Foreign Address   State
--------------------------------------------------------------------------
tcp   0      0      jupyter:8888    0.0.0.0:*         LISTEN
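A third check, this time from outside the container, is to test whether something accepts TCP connections on the published port. The sketch below uses only Python's standard library; the host `localhost` is an assumption that holds when the script runs on the virtual machine itself, since the compose file maps port 8888 of the container to port 8888 of the host.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("JupyterLab reachable:", port_is_open("localhost", 8888))
```

An open port only shows that a process is listening; the Docker health status remains the better indicator that JupyterLab itself is ready.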
If the above checks are successful, open a browser in your virtual machine and use the following URL to enter your JupyterLab environment :
You will get the following JupyterLab login screens :
Step 7: Test it!
You are now ready to test your own machine learning algorithms!
Test N°1: Data Exploration with Seaborn
The following example is based on the dataset stored in seaborn library regarding three species of penguins (Adelie, Gentoo and Chinstrap). The following code visualizes the relationships between measurements of these three species :
# import library seaborn
import seaborn as sns

# set theme
sns.set_theme(style="ticks")

# load dataset penguins
penguins = sns.load_dataset("penguins")

# display the first rows of the dataset
penguins.head()

# plot pairwise relationships in the dataset
sns.pairplot(penguins, hue="species")
To test this code, you just need to :
- create a new Python 3 notebook in the JupyterLab interface by clicking on the blue « + » button
- paste the code into the first empty cell of your notebook
- execute the code by clicking on the « Run » button
The result of your code in each cell is displayed just after the cell.
In this first test, you will get the result of the « pairplot » command which will show you that the different measurements included in the dataset provide an excellent way to distinguish the three species of penguins, especially the length and depth of the bill :
Test N°2: Data Transformation with statsmodels
The following example is based on the dataset stored in the statsmodels library regarding CO2 air samples at Mauna Loa Observatory, Hawaii, U.S.A., detailed here :
The following code shows how to decompose this time series into three different components (trend, seasonal and residual) :
# import python libraries
import pandas as pd
import statsmodels.api as sm
import matplotlib as mpl
import matplotlib.pyplot as plt

# load dataset "co2"
data = sm.datasets.co2.load_pandas()
co2 = data.data

# resample data to monthly means and fill the missing values
y = co2['co2'].resample('MS').mean()
y = y.fillna(y.bfill())

# display the first rows of the raw data
co2.head(2)

# decomposition of the time series
mpl.rcParams['figure.figsize'] = 11, 9
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()
The result shows the raw data (in green) decomposed in three different components (in blue) with statsmodels library :
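To make the idea of an additive decomposition more concrete, here is a naive pure-Python sketch of the classical method: the trend is a centered moving average, the seasonal part is the average detrended value for each position in the cycle, and the residual is whatever remains, so that y = trend + seasonal + residual. This is an illustration of the principle on a synthetic series, not the statsmodels implementation, and the function name is hypothetical.

```python
import math


def additive_decompose(y, period=12):
    """Naive additive decomposition: y[t] = trend[t] + seasonal[t] + residual[t].

    The trend is a centered moving average (2 x period for even periods).
    It is None at both edges, where the window does not fit.
    Expects at least two full periods of data.
    """
    n, half = len(y), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # make the seasonal component sum to zero
    seasonal = [means[t % period] - center for t in range(n)]

    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


if __name__ == "__main__":
    # Synthetic monthly series: linear trend plus a yearly cycle.
    series = [300 + 0.1 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
    trend, seasonal, residual = additive_decompose(series)
    print(round(trend[20], 3), round(seasonal[3], 3), round(residual[20], 3))
```

On real data such as the CO2 series, the residual is not zero: it contains the noise that neither the trend nor the seasonal pattern explains.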
Test N°3: Image classification with Neural Networks
This test will show you how to classify and order images from the MNIST database (Modified National Institute of Standards and Technology database), which contains 60,000 training images and 10,000 test images of handwritten digits. The first part of the code loads the images and plots the first ones :
# import Python libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib as mpl
import warnings
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D, Convolution2D

# ignore warnings
warnings.filterwarnings("ignore")

# import mnist dataset
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# plot the first 80 training images
plt.figure(figsize=(10, 5))
for i in range(80):
    plt.subplot(10, 20, i+1)
    plt.imshow(X_train[i,:].reshape([28,28]), cmap='gray')
    plt.axis('off')
plt.draw()
Here are the results of this code inside the JupyterLab Notebook “test-tensorflow.ipynb” :
The second part of the code creates a neural network and trains it on the MNIST training set (X_train, y_train). The model is then used to classify the test set (X_test, y_test) and to plot the test images grouped by predicted digit :
# reshape the images, scale the pixels to [0, 1] and convert the labels to binary class matrices
X_train2 = (X_train.reshape((60000, 28, 28, 1))).astype('float32') / 255
X_test2 = (X_test.reshape((10000, 28, 28, 1))).astype('float32') / 255
y_train2 = to_categorical(y_train)
y_test2 = to_categorical(y_test)

# creation of a neural network (convolutional model) from scratch and train it on the dataset
model = Sequential()
# 1. convolutional layer : 32 filters
model.add(Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 2. convolutional layer : 64 filters (its input shape is inferred from the previous layer)
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 3. flatten layer
model.add(Flatten())
# 4. fully connected layer of 100 neurons
model.add(Dense(units=100, activation='relu'))
# 5. fully connected layer of 10 neurons
model.add(Dense(units=10, activation='softmax'))

# print and train the model
model.summary(line_length=80)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train2, y_train2, batch_size=64, validation_split=0.1, epochs=5, verbose=1)
np.set_printoptions(precision=2)
prediction = model.predict(X_test2)

# print the score on the test set (scores = [loss, accuracy])
scores = model.evaluate(X_test2, y_test2, verbose=0)
print("model accuracy: %.2f%%" % (scores[1]*100), "\n\n")

# plot the first 100 test images, grouped by predicted digit
plt.figure(figsize=(10, 5))
n = 0
for digit in range(10):
    for i in range(100):
        b = np.argmax(prediction[i])
        if b == digit:
            n = n+1
            plt.subplot(10, 20, n)
            plt.imshow(X_test[i,:].reshape([28,28]), cmap='gray')
            plt.axis('off')
plt.draw()
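To follow what happens to an image as it goes through the network, the arithmetic behind the layer shapes and parameter counts can be worked out by hand. The sketch below does so in plain Python, assuming the Keras defaults used above (no padding and stride 1 for the convolutions, 2x2 pooling with stride 2). The helper names are made up for this example; the numbers should match what `model.summary()` prints.

```python
def conv_output(size, kernel):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for each unit."""
    return (inputs + 1) * units


def describe_network():
    size = conv_output(28, 5)       # 28x28 image -> 24x24 after the first 5x5 convolution
    size //= 2                      # 12x12 after 2x2 max pooling
    size = conv_output(size, 5)     # 8x8 after the second 5x5 convolution
    size //= 2                      # 4x4 after 2x2 max pooling
    flat = size * size * 64         # 4 * 4 * 64 = 1024 values after flattening
    params = {
        "conv1": conv_params(5, 1, 32),     # 832
        "conv2": conv_params(5, 32, 64),    # 51264
        "dense1": dense_params(flat, 100),  # 102500
        "dense2": dense_params(100, 10),    # 1010
    }
    return flat, params, sum(params.values())


if __name__ == "__main__":
    flat, params, total = describe_network()
    print("flattened size:", flat)
    print("parameters per layer:", params)
    print("total parameters:", total)
```

Most of the parameters sit in the first fully connected layer, which is typical for small convolutional networks of this kind.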
You should get the following results :
In this test, the model's accuracy is 99.15%. This means that out of every 100 test digits, the model classifies about 99 correctly.
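Accuracy is simply the share of test images whose predicted digit equals the true digit. The short sketch below shows the calculation on a handful of made-up labels and translates the 99.15% figure into a number of misclassified images for the 10,000 MNIST test images; the function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def expected_errors(acc, n_samples):
    """Number of misclassified samples implied by an accuracy on n_samples."""
    return round((1 - acc) * n_samples)


if __name__ == "__main__":
    # Made-up labels: four of the five predictions are right.
    print(accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9]))
    # Misclassified images implied by 99.15% on the 10,000 test images.
    print(expected_errors(0.9915, 10000))
```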
That’s all folks !
If you need more details on JupyterLab, check the following online documentation :
For more explanations on the creation of this neural network, you can check the following documentation, which explains the role of each layer :
For other questions, please feel free to comment on this article 😉