How to test your A.I. code with JupyterLab

There are many online development tools, like Replit, that can be used to test and share your Artificial Intelligence (A.I.) code. However, if you want to go further and have full control over your code, this article will show you how to install your own A.I. testing tool based on the JupyterLab development tool.

JupyterLab is a web-based interactive development environment. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing and machine learning. JupyterLab is open-source software and supports many languages, like Python and R, which are very popular for data analysis.

Figure 1: A.I. testing tool installation steps

Step 1: Install VirtualBox

VirtualBox is a free, open-source virtualization tool which will allow you to create your own dedicated environment to test your A.I. code. It runs on a large number of operating systems (Windows, Linux, macOS, …) and can be downloaded here: https://www.virtualbox.org/wiki/Downloads

Note:
First install the VirtualBox platform package, then the VirtualBox Extension Pack.

Figure 2: VirtualBox download page

Step 2: Create a new Virtual Machine

The creation of a new virtual machine is similar to the creation of a new server. Just start VirtualBox, use the menu “Machine » New…” and define the following parameters:

  • The name of the virtual machine
  • The folder where all the content of this new machine will be stored
  • The guest operating system: Linux
  • The version of the guest operating system: Debian (64-bit)
  • The memory size (e.g. 8 GB)
  • The virtual hard disk size (e.g. 100 GB)

Figure 3: Creation of a new virtual machine

Step 3: Install a Linux OS

You need to install an operating system on top of your new virtual machine. In my case, I chose the Debian Linux OS, but you can use another Linux distribution (Ubuntu, CentOS, etc.).

For this installation, you will need to download the image of the Linux OS you want to install. For Debian, you can download the image (e.g. debian-11.2.0-amd64-DVD-1.iso) here: https://www.debian.org/distrib/

To use this image for your OS installation, click on the “Storage” section of your virtual machine and set up the IDE optical drive with your downloaded image file.

Figure 4: IDE Optical Drive Setup

To install Debian in your new virtual machine, just select it in VirtualBox and click on the Start button. You will get the following Debian installation screen:

Figure 5: Debian Installation screen

An overview of the Debian Linux installation steps can be found here: https://www.debian.org/releases/stable/installmanual

This document will guide you through the following installation steps:

  • Setting up the Debian installer (language, location, keyboard, hostname, domain name, network, …)
  • Setting Up Users And Passwords
  • Configuring the Clock and Time Zone
  • Partitioning and Mount Point Selection
  • Installing the Base System
  • Installing Additional Software
  • Finishing the Installation (post install steps)

Note:
After installing your Linux OS, don’t forget to remove the installation media from your virtual machine just after the reboot, so that you boot into your new Linux system rather than restarting the installation from scratch.

Step 4: Install Docker and docker-compose on your virtual machine

The only software you will need to install and run on your virtual machine is Docker and docker-compose.
Docker is an open platform for developing, shipping and running applications quickly inside Linux containers.
docker-compose is a tool for defining and running multi-container Docker applications from a single YAML file.
Please refer to the following official documentation for the installation of Docker: https://docs.docker.com/engine/install/debian/

The following example provides the Linux commands to run to install the docker and docker-compose tools on your Debian OS:

# Setup Docker Repository
$ sudo apt update
$ sudo apt install -y apt-transport-https ca-certificates curl gnupg2 lsb-release software-properties-common
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list
$ sudo apt update


# Install Docker Engine
$ sudo apt update
$ sudo apt install -y docker-ce docker-ce-cli containerd.io
$ sudo systemctl start docker
$ sudo systemctl enable docker
$ sudo systemctl status docker
$ docker -v
==> Docker version 20.10.10, build b485636

# Install docker-compose
$ sudo curl -L "https://github.com/docker/compose/releases/download/v2.0.1/docker-compose-linux-x86_64" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
$ docker-compose version
==> Docker Compose version v2.0.1

# Allow non-root users to run Docker commands
$ sudo groupadd docker
$ sudo usermod -aG docker <non-root user>
# log out and log back in (or run "newgrp docker") for the group change to take effect

Step 5: Create the Jupyter container

Jupyter is open-source software which supports many different languages. You can get all the details of this tool here: https://jupyter.org


To create your own Jupyter Docker image, you can download all the needed files from the following GitHub repository:

After downloading all the needed files on your server and creating all the required directories, you should get the following tree:

# command to run from your virtual machine after downloading all the files
# (the root directory is /lab/projet01 but you can change it)

$ cd /lab/projet01
$ tree -L 5
├── docker-code
│   ├── jupyter
│   │   └── 3.2
│   │       ├── jupyter-code
│   │       │   └── notebook01
│   │       ├── bash-code
│   │       ├── python-code
│   │       ├── config
│   │       ├── Dockerfile
│   │       ├── README.md
│   │       ├── scripts
│   │       │   ├── docker-create-user.sh
│   │       │   ├── docker-entrypoint.sh
│   │       │   ├── docker-healthcheck.sh
│   │       │   ├── docker-install-software.sh
│   │       │   └── password-convert-sha256.py
│   │       └── software
│   └── jupyter-docker-compose.yml
└── docker-data
    ├── data-shared
    └── data-jupyter
        └── datasets

Files / directories description:

  • jupyter-docker-compose.yml: compose file to start the jupyter container
  • Dockerfile: Docker file defining the content of the jupyter container
  • docker-create-user.sh: bash script to create a non-root user
  • docker-entrypoint.sh: bash script to start the container
  • docker-healthcheck.sh: bash script to check the health of the container
  • docker-install-software.sh: bash script to install the JupyterLab tool and Data Science libraries
  • password-convert-sha256.py: script to convert the password to a sha256 hash (a minimal sketch of such a script is shown after this list)
  • data-shared: directory to share data between your jupyter container and the virtual machine
  • data-jupyter: directory to store datasets for testing purposes
  • notebook01: directory to store the jupyter code
  • bash-code: directory to store bash shell scripts
  • python-code: directory to store python scripts
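
For illustration, here is a minimal sketch of what a script like password-convert-sha256.py could look like. This is a hypothetical reimplementation, not the exact script from the repository; it simply turns a clear-text passphrase into a salted sha256 hash of the form algorithm:salt:hash:

# minimal sketch of a password-to-sha256 converter (hypothetical
# reimplementation; the real script in the repository may differ)
import hashlib
import secrets

def passwd_sha256(passphrase):
    # random 16-character hexadecimal salt
    salt = secrets.token_hex(8)
    # hash the passphrase concatenated with the salt
    digest = hashlib.sha256((passphrase + salt).encode('utf-8')).hexdigest()
    return 'sha256:%s:%s' % (salt, digest)

print(passwd_sha256('passwordtochange'))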

Here is the content of the jupyter-docker-compose.yml file:

version: '3.8'

services:
  # ----------------------------------------------
  # jupyter container creation
  # ----------------------------------------------
  jupyter:
    build:
      context: jupyter/3.2
      args:
        USER_UID: 1001
        GROUP_GID: 1002
        UNAME: 'admin'
    image: lab/jupyter:3.2
    container_name: 'jupyter'
    hostname: jupyter
    tty: true
    networks:
      - lab
    environment:
      JUPYTER_SETUP: 'notebook01'
      JUPYTER_PASSWORD: 'passwordtochange'
      DOCKER_DEBUG_FLAG: 'no'
    healthcheck:
      test: ['CMD', '/opt/jupyter/scripts/docker-healthcheck.sh']
      timeout: 5s
      interval: 30s
      start_period: 10s
      retries: 3
    restart: always
    ports:
      - '8888:8888'
    volumes:
      - /lab/projet01/docker-code/jupyter/3.2/bash-code:/opt/jupyter/bash-code
      - /lab/projet01/docker-code/jupyter/3.2/python-code:/opt/jupyter/python-code
      - /lab/projet01/docker-code/jupyter/3.2/jupyter-code:/opt/jupyter/jupyter-code
      - /lab/projet01/docker-data/data-jupyter/datasets:/opt/jupyter/datasets
      - /lab/projet01/docker-data/data-shared:/opt/projet01/data-shared
      - volume_temp:/opt/projet01/temp

# ----------------------------------------------
# networks
# ----------------------------------------------
networks:
  lab:
    driver: bridge

# ----------------------------------------------
# volumes: volume creation for persistent data
# ----------------------------------------------
volumes:
  volume_temp:
    driver: local

Step 6: Start JupyterLab

Access to JupyterLab is protected by a password. So before starting your jupyter container, don’t forget to update the jupyter-docker-compose.yml file and change the variable “JUPYTER_PASSWORD” to your own password.

The container starts the next-generation user interface named “JupyterLab” instead of the classic “Jupyter Notebook” interface, which is now deprecated. It offers more features and an enhanced interface, which can be extended through extensions available on GitHub.

You can manage your own list of Data Science libraries to install in your JupyterLab tool by updating the installation script “docker-install-software.sh” accordingly. Currently, the following popular libraries for Data Science are installed by default (a quick way to check them from a notebook is shown after this list):

  • Pandas: open-source data analysis and manipulation tool, built on top of the Python programming language.
  • StatsModels: provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.
  • Matplotlib: library for creating static, animated and interactive visualizations in Python.
  • Seaborn: visualization library based on Matplotlib.
  • Bokeh: provides another wide array of visualization tools in Python.
  • Scikit-learn (sklearn): machine learning library. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and more.
  • NumPy: core library for scientific computing in Python, adding support for large, multi-dimensional arrays and matrices.
  • TensorFlow: free and open-source software library for machine learning and artificial intelligence.
  • Keras: library dedicated to Deep Learning with Python.
  • spaCy: library for Natural Language Processing in Python.
  • NLTK: another library for Natural Language Processing in Python.
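
As a quick sanity check, you can run the following sketch in a notebook cell to confirm that these libraries are installed and print their versions (the module names below assume the default install list):

# print the version of each pre-installed Data Science library
import importlib

for name in ['pandas', 'statsmodels', 'matplotlib', 'seaborn', 'bokeh',
             'sklearn', 'numpy', 'tensorflow', 'keras', 'spacy', 'nltk']:
    try:
        module = importlib.import_module(name)
        print('%-12s %s' % (name, getattr(module, '__version__', 'unknown')))
    except ImportError:
        print('%-12s NOT INSTALLED' % name)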

To start JupyterLab, you will need to run the following Docker commands:

# commands to run from the directory you created on your own server:
# start jupyter container listening to port 8888

$ cd /lab/projet01/docker-code
$ docker-compose -f jupyter-docker-compose.yml up -d --build

# check that your jupyter container is working (status=healthy)
$ docker ps

CONTAINER ID   IMAGE            STATUS      PORTS                        NAMES
-------------------------------------------------------------------------------------
e321637b41c6   lab/jupyter:3.2  healthy     0.0.0.0:8888->8888/tcp,...   jupyter

# another check to confirm that you can run commands inside your jupyter container 
$ docker exec -it jupyter bash
$ sudo netstat -a|grep 8888

Proto Recv-Q Send-Q Local Address           Foreign Address         State      
--------------------------------------------------------------------------
tcp        0      0 jupyter:8888            0.0.0.0:*               LISTEN

If the above checks are successful, then open a browser in your virtual machine and use the following URL to enter your JupyterLab environment:

  • http://localhost:8888
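
If you prefer to check the service from the command line first, here is a small optional sketch using only the Python standard library (the URL matches the port mapping defined in the compose file):

# optional check: verify that the JupyterLab server answers on port 8888
import urllib.request

try:
    # the login page should answer with HTTP status 200
    with urllib.request.urlopen("http://localhost:8888", timeout=5) as response:
        print("JupyterLab is up, HTTP status:", response.status)
except OSError as exc:
    print("JupyterLab is not reachable:", exc)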


You will get the following JupyterLab login screens:

Step 7: Test it!

You are now ready to test your own machine learning algorithms!

Test N°1: Data Exploration with Seaborn
The following example is based on the penguins dataset bundled with the seaborn library, covering three species of penguins (Adelie, Gentoo and Chinstrap). The following code visualizes the relationships between the measurements of these three species:

# import library seaborn
import seaborn as sns
# set theme
sns.set_theme(style="ticks")

# load dataset penguins
penguins = sns.load_dataset("penguins")
# display the first rows of the dataset
penguins.head()
# plot pairwise relationships in the dataset  
sns.pairplot(penguins, hue="species")

To test this code, you just need to:

  • create a new Python 3 notebook in the JupyterLab interface by clicking on the blue “+” button
  • copy/paste the code into the first empty cell of your notebook
  • execute the code by clicking on the “Run” button

The result of the code in each cell is displayed just after the cell.
In this first test, you will get the result of the “pairplot” command, which shows that the measurements included in the dataset provide an excellent way to distinguish the three species of penguins, especially the length and depth of the bill:

Figure 9: Three different species of penguins
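
To zoom in on those two discriminating measurements, here is a short optional sketch; the column names bill_length_mm and bill_depth_mm are the ones defined in the seaborn penguins dataset:

# a closer look at the two most discriminating measurements
import seaborn as sns

penguins = sns.load_dataset("penguins")
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")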

Test N°2: Data Transformation with statsmodels

The following example is based on the CO2 dataset bundled with the statsmodels library, containing atmospheric CO2 measurements from air samples at Mauna Loa Observatory, Hawaii, U.S.A., described in the statsmodels documentation.

The following code shows how to decompose this time series into three different components (trend, seasonal and residual):

# import python libraries
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# load dataset "co2"
data = sm.datasets.co2.load_pandas()
co2 = data.data

# resample data to monthly averages and back-fill missing values
y = co2['co2'].resample('MS').mean()
y = y.fillna(y.bfill())

# data visualization: display the first rows of the dataset
print(co2.head(2))

# decomposition of the time series
plt.rcParams['figure.figsize'] = 11, 9
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()

The result shows the raw data (in green) decomposed into three different components (in blue) by the statsmodels library:
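
If you want to go further, the DecomposeResult object returned by seasonal_decompose exposes each component as a pandas Series, so you can inspect or reuse them individually; a minimal sketch based on the code above:

# access the individual components of the decomposition
print(decomposition.trend.tail(3))      # long-term trend
print(decomposition.seasonal.tail(3))   # repeating yearly pattern
print(decomposition.resid.tail(3))      # residual (what is left over)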

Test N°3: Image classification with Neural Networks

This test will show you how to classify and order images from the MNIST database (Modified National Institute of Standards and Technology database), which contains 70,000 handwritten digit images (60,000 for training and 10,000 for testing). The first part of the code loads the images and plots the first ones:

# import Python libraries
import warnings
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.datasets import mnist

# ignore warnings
warnings.filterwarnings("ignore")

# load the mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# plot the first 80 training images
plt.figure(figsize=(10, 5))
for i in range(80):
    plt.subplot(10, 20, i + 1)
    plt.imshow(X_train[i, :].reshape([28, 28]), cmap='gray')
    plt.axis('off')
plt.draw()

Here are the results of this code inside the JupyterLab notebook “test-tensorflow.ipynb”:

The second part of the code creates a neural network and trains it on the MNIST training set (X_train, y_train). The model is then used to classify the test set (X_test, y_test) and to plot the test images grouped by predicted digit:

# reshape the images and convert the labels to binary class matrices
X_train2 = (X_train.reshape((60000, 28, 28, 1))).astype('float32') / 255
X_test2 = (X_test.reshape((10000, 28, 28, 1))).astype('float32') / 255
y_train2 = to_categorical(y_train)
y_test2 = to_categorical(y_test)

# creation of a convolutional neural network from scratch
model = Sequential()
# 1. convolutional layer: 32 filters
model.add(Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 2. convolutional layer: 64 filters
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 3. flatten layer
model.add(Flatten())
# 4. fully connected layer of 100 neurons
model.add(Dense(units=100, activation='relu'))
# 5. output layer of 10 neurons (one per digit class)
model.add(Dense(units=10, activation='softmax'))

# print and train the model
model.summary(line_length=80)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train2, y_train2, batch_size=64, validation_split=0.1, epochs=5, verbose=1)
np.set_printoptions(precision=2)
prediction = model.predict(X_test2)

# print the test score
scores = model.evaluate(X_test2, y_test2, verbose=0)
print("model %s: %.2f%%" % (model.metrics_names[1], scores[1] * 100), "\n")

# plot the test images grouped by predicted digit (first 100 test images)
plt.figure(figsize=(10, 5))
n = 0
for digit in range(10):
    for i in range(100):
        if np.argmax(prediction[i]) == digit:
            n = n + 1
            plt.subplot(10, 20, n)
            plt.imshow(X_test[i, :].reshape([28, 28]), cmap='gray')
            plt.axis('off')
    plt.draw()

You should get the following results:

Note:
In this test, the model’s accuracy is 99.15%. This means that roughly 99 out of every 100 test digits are classified correctly.
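
As a cross-check, the reported accuracy can also be recomputed directly from the predictions produced above; a minimal sketch reusing the variables of Test N°3:

# recompute the accuracy by hand from the model predictions
predicted_digits = np.argmax(prediction, axis=1)  # most likely class per image
accuracy = np.mean(predicted_digits == y_test)    # fraction of correct predictions
print("manual accuracy: %.2f%%" % (accuracy * 100))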

Conclusion

That’s all folks!

If you need more details on JupyterLab, check the online documentation: https://jupyterlab.readthedocs.io

For more explanations on the creation of this neural network, you can check the documentation explaining the role of each layer.

For other questions, please feel free to comment on this article 😉
