Running a Container for Data Science in a Kubernetes Cluster

Using this guide, you can run a container with pre-installed machine learning tools in your Kubernetes cluster and run the Jupyter Notebook service in it.

Below is the list of packages in a container:

Containers can be used for training and model inference when developing apps and working with data.

Creating a Cluster and Getting Started

Follow these steps to create a cluster:

  1. Go to Cloud platform — Kubernetes in the Control panel.
  2. Create a cluster by following the instructions.
  3. We recommend choosing a configuration of at least 4 vCPUs, 8 GB RAM, and 20 GB SSD.

    Please note that system requirements may vary depending on the executable scripts.

  4. Wait for the cluster status to become Active.

  5. Select the created cluster and go to the Settings tab.

  6. Click Download kubeconfig and save the YAML configuration file.

  7. Give the file a name, for example, my-kube.yaml.

Managing Clusters through the Console Client

Install kubectl, the Kubernetes console client.

In the console, export the path to the my-kube.yaml file downloaded in step 6 to an environment variable:

export KUBECONFIG=my-kube.yaml
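A relative path such as my-kube.yaml only resolves while you stay in the directory that holds the file. The following is a minimal sketch of a helper that checks the file is readable and exports its absolute path instead. The function name use_kubeconfig is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: export KUBECONFIG as an absolute path.
# Returns 1 and leaves KUBECONFIG unchanged if the file is not readable.
use_kubeconfig() {
  file=$1
  if [ ! -r "$file" ]; then
    echo "kubeconfig not readable: $file" >&2
    return 1
  fi
  # Resolve the containing directory to an absolute path.
  dir=$(CDPATH= cd -- "$(dirname -- "$file")" && pwd) || return 1
  KUBECONFIG="$dir/$(basename -- "$file")"
  export KUBECONFIG
}
```

For example, running use_kubeconfig my-kube.yaml in the download directory keeps kubectl working after you change to another directory in the same console session.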

Verify that the configuration was successful:

kubectl get nodes

If the configuration was successful, the output will be similar to the following:

NAME        STATUS    ROLES     AGE    VERSION
your-node   Ready     <none>    2m01s  v1.17.6
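If you want to check the node status in a script rather than by eye, the table above can be parsed. This is a rough sketch that assumes the default column layout shown here (STATUS in the second column). The function name all_nodes_ready is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin.
# Succeeds only if at least one node is listed and the STATUS column
# of every node is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 { total++; if ($2 == "Ready") ready++ }
       END { exit !(total > 0 && ready == total) }'
}
```

A possible use is: kubectl get nodes | all_nodes_ready && echo "cluster is ready". Note that a combined status such as Ready,SchedulingDisabled is treated as not ready by this sketch.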

Running a Container

Follow these steps to run a container:

  1. Download the YAML deployment configuration file.
  2. Run the following:

    kubectl apply -f selectel-ml.yaml
  3. If the command is accepted, the following is displayed:

    deployment.apps/selectel-ml created
  4. Check the container status by running the following:

    kubectl get pod -w
  5. The ContainerCreating status will be displayed first:

    NAME         READY     STATUS              RESTARTS    AGE
    selectel-ml  0/1       ContainerCreating   0           10s

    Press Ctrl+C to exit watch mode.

  6. Wait for the Running status to be displayed. It means that the container is created and running (this may take several minutes):

    selectel-ml  1/1       Running
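As an alternative to watching the pod list by hand, kubectl rollout status blocks until the deployment has finished rolling out or the timeout expires. The sketch below wraps it in a small function. The name wait_for_deployment and the 5-minute default timeout are illustrative assumptions, not values from this guide.

```shell
# Hypothetical wrapper around `kubectl rollout status`.
# $1: deployment name; $2: optional timeout (assumed default: 5m).
# Returns the exit status of kubectl, which is non-zero on timeout.
wait_for_deployment() {
  name=$1
  timeout=${2:-5m}
  kubectl rollout status "deployment/$name" --timeout="$timeout"
}
```

For the deployment created above, this would be called as wait_for_deployment selectel-ml, or with a longer limit as wait_for_deployment selectel-ml 10m.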

Running the Jupyter Notebook

Follow these steps to run the Jupyter Notebook:

  1. Open the port to access the service:

    kubectl expose deployment selectel-ml --type=LoadBalancer --name=my-service
  2. If the command is accepted, the following is displayed:

    service/my-service exposed
  3. To get the IP address and port for connecting to the Jupyter server, run the following:

    kubectl get services
  4. The output should be similar to the following:

    NAME       TYPE          CLUSTER-IP      EXTERNAL-IP     PORT(S)         AGE
    my-service LoadBalancer  10.100.90.86    203.0.113.1     8888:31779/TCP  30s
  5. Enter the IP address from the EXTERNAL-IP column and the first port number from the PORT(S) column in the browser address bar, for example: 203.0.113.1:8888. If the EXTERNAL-IP field shows <pending>, run kubectl get services again after a few minutes.

  6. In the Jupyter Notebook web interface that opens, enter the default password: 9lG0eXCevt

For details on changing your password, see the Jupyter Notebook documentation.
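The address for the browser can also be assembled from the kubectl get services table in a script. This is a minimal sketch that assumes the default column layout shown in step 4 (EXTERNAL-IP in the fourth column, PORT(S) in the fifth, in the form port:nodePort/protocol). The function name service_address is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get services` output on stdin and
# prints EXTERNAL-IP:PORT for the service named in $1.
# Fails if the service is not listed or has no external address yet.
service_address() {
  awk -v name="$1" '
    $1 == name {
      if ($4 == "<pending>" || $4 == "<none>") exit 1
      split($5, ports, ":")     # "8888:31779/TCP" -> ports[1] is "8888"
      printf "%s:%s\n", $4, ports[1]
      found = 1
      exit 0
    }
    END { if (!found) exit 1 }
  '
}
```

With the service from this guide, kubectl get services | service_address my-service would print the address once the load balancer has been assigned, and return a non-zero status while EXTERNAL-IP is still <pending>.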

Managing Containers through the Console

To manage containers through the console, run the following:

kubectl exec -it [pod name] -- /bin/bash
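Pods created by a deployment usually get a generated suffix after the deployment name, so the exact pod name has to be looked up first. The following sketch picks the first pod whose name starts with a given prefix from the kubectl get pod table. The function name first_pod_named is hypothetical, and the sketch assumes the default column layout with the pod name in the first column.

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and prints
# the name of the first pod whose name starts with the prefix in $1.
# Fails if no pod matches.
first_pod_named() {
  awk -v prefix="$1" '
    NR > 1 && index($1, prefix) == 1 { print $1; found = 1; exit }
    END { exit !found }
  '
}
```

One way to use it: pod=$(kubectl get pod | first_pod_named selectel-ml) && kubectl exec -it "$pod" -- /bin/bash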