Deploy Scikit-learn Models to Amazon SageMaker with the SageMaker Python SDK using Script mode

aws
ml
sagemaker
The aim of this notebook is to demonstrate how to train and deploy a scikit-learn model in Amazon SageMaker using script mode.
Published

July 7, 2022

Introduction

You may have trained a model with your favorite ML framework, and now you are asked to move your code to Amazon SageMaker. The good news is that SageMaker’s fully managed training works well with many popular ML frameworks, including scikit-learn. In addition, SageMaker provides a prebuilt container for the scikit-learn framework, enabling us to seamlessly port our scripts to SageMaker and benefit from its training and deployment capabilities. The SageMaker Scikit-learn Container is an open source library for running the scikit-learn framework on the Amazon SageMaker platform. You can read more about its features on its GitHub page, SageMaker Scikit-learn Container.

Amazon SageMaker also provides an open source Python SDK to train and deploy models on SageMaker. The SageMaker SDK provides several high-level abstractions (classes), including:

  * Session: provides a collection of methods for working with SageMaker resources
  * Estimators: encapsulate training on SageMaker
  * Predictors: provide real-time inference and transformation using Python data types against a SageMaker endpoint

You can read more on SageMaker Python SDK from its official site Amazon SageMaker Python SDK

This approach of using a custom training script with SageMaker’s prebuilt container is commonly called Script Mode. Training a scikit-learn model with the SageMaker Python SDK involves three steps:

  1. Prepare a training script. The training script is similar to any other scikit-learn training script that you might use outside of SageMaker.
  2. Create an Estimator object from the class sagemaker.sklearn.SKLearn. The scikit-learn estimator class handles end-to-end training and deployment of custom scikit-learn code. We pass our training script to the SKLearn estimator, and it executes the script within a SageMaker training job. The training job runs in an Amazon-built Docker container that executes the functions defined in the provided Python script.
  3. Call the Estimator’s fit method on the training data. Training starts when fit() is called on the Estimator. After training completes, calling deploy() creates a hosted SageMaker endpoint and returns a SKLearnPredictor instance that can be used to perform inference against the hosted model. We will discuss the SKLearn estimator in more detail later in this post.

To read more about using scikit-learn with the SageMaker Python SDK, you may refer to the official documentation using Scikit-learn with the SageMaker Python SDK. The official documentation is valuable, and I would highly recommend checking it and keeping it as a reference.

In this post we will build a scikit-learn RandomForestClassifier on the public Iris dataset. There is a similar example in the SageMaker documentation, Train a SKLearn Model using Script Mode, but it does not discuss many important aspects of the scikit-learn container and its environment. In this post, we will learn about them and cover all the details of training a scikit-learn model with script mode. I also noted that the example in the documentation uses RandomForestRegressor on a classification problem, which I believe is a mistake.

We have much to cover and learn, so let’s start.

Environment

This notebook was prepared on an AWS SageMaker notebook instance of type ml.t3.medium with the “conda_python3” kernel.

!aws --version
aws-cli/1.22.97 Python/3.8.12 Linux/5.10.102-99.473.amzn2.x86_64 botocore/1.24.19
!cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
!python3 --version
Python 3.8.12
#collapse-output
!conda env list
# conda environments:
#
base                     /home/ec2-user/anaconda3
JupyterSystemEnv         /home/ec2-user/anaconda3/envs/JupyterSystemEnv
R                        /home/ec2-user/anaconda3/envs/R
amazonei_mxnet_p36       /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36
amazonei_pytorch_latest_p37     /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37
amazonei_tensorflow2_p36     /home/ec2-user/anaconda3/envs/amazonei_tensorflow2_p36
mxnet_p37                /home/ec2-user/anaconda3/envs/mxnet_p37
python3               *  /home/ec2-user/anaconda3/envs/python3
pytorch_p38              /home/ec2-user/anaconda3/envs/pytorch_p38
tensorflow2_p38          /home/ec2-user/anaconda3/envs/tensorflow2_p38

Prepare training and test data

We will use the Iris flower dataset. It includes three iris species (Iris setosa, Iris virginica, and Iris versicolor) with 50 samples each. Four features were measured for each sample: the length and the width of the sepals and petals, in centimeters. We can train a model to distinguish the species from each other based on the combination of these four features. You can read more about this dataset at Iris flower data set. The dataset has five columns:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class: Iris Setosa, Iris Versicolour, Iris Virginica
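As a quick sanity check, the same dataset also ships with scikit-learn (with the class column already encoded as integers), so you can inspect it without any download. A minimal sketch:

```python
from sklearn.datasets import load_iris

# Load the bundled copy of the Iris dataset: 150 samples, 4 features,
# with the species already encoded as integer labels 0, 1, 2.
iris = load_iris()
print(iris.data.shape)
print(list(iris.target_names))
```

Below we instead download the raw CSV from a public S3 bucket, which keeps the preprocessing steps explicit.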

Download and preprocess data

##
# download dataset
import boto3
import pandas as pd
import numpy as np

s3 = boto3.client("s3")
s3.download_file(
    "sagemaker-sample-files", "datasets/tabular/iris/iris.data", "iris.data"
)

df = pd.read_csv(
    "iris.data",
    header=None,
    names=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)
df.head()
sepal_len sepal_wid petal_len petal_wid class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
##
# Convert the three classes from strings to integers in {0,1,2}
df["class_cat"] = df["class"].astype("category").cat.codes
categories_map = dict(enumerate(df["class"].astype("category").cat.categories))
print(categories_map)
df.head()
{0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
sepal_len sepal_wid petal_len petal_wid class class_cat
0 5.1 3.5 1.4 0.2 Iris-setosa 0
1 4.9 3.0 1.4 0.2 Iris-setosa 0
2 4.7 3.2 1.3 0.2 Iris-setosa 0
3 4.6 3.1 1.5 0.2 Iris-setosa 0
4 5.0 3.6 1.4 0.2 Iris-setosa 0

Prepare and store train and test sets as CSV files

##
# split the data into train and test set
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2, random_state=42)

print(f"train.shape: {train.shape}")
print(f"test.shape: {test.shape}")
train.shape: (120, 6)
test.shape: (30, 6)

We have our dataset ready. Let’s define a local directory local_path to keep all the files and artifacts related to this post. I will refer to this directory as ‘workspace’.

##
# `local_path` will be the root directory for this post.
local_path = "./datasets/2022-07-07-sagemaker-script-mode"

We have train and test sets ready. Let’s create two more directories in our workspace and store our data in them.

from pathlib import Path

# local paths
local_train_path = local_path + "/train"
local_test_path = local_path + "/test"

# create local directories
Path(local_train_path).mkdir(parents=True, exist_ok=True)
Path(local_test_path).mkdir(parents=True, exist_ok=True)

print("local_train_path: ", local_train_path)
print("local_test_path: ", local_test_path)

# local file names
local_train_file = local_train_path + "/train.csv"
local_test_file = local_test_path + "/test.csv"

# write train and test CSV files
train.to_csv(local_train_file, index=False)
test.to_csv(local_test_file, index=False)

print("local_train_file: ", local_train_file)
print("local_test_file: ", local_test_file)
local_train_path:  ./datasets/2022-07-07-sagemaker-script-mode/train
local_test_path:  ./datasets/2022-07-07-sagemaker-script-mode/test
local_train_file:  ./datasets/2022-07-07-sagemaker-script-mode/train/train.csv
local_test_file:  ./datasets/2022-07-07-sagemaker-script-mode/test/test.csv

Create SageMaker session

import sagemaker

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()
region = session.boto_region_name

print("sagemaker.__version__: ", sagemaker.__version__)
print("Session: ", session)
print("Role: ", role)
print("Bucket: ", bucket)
print("Region: ", region)
sagemaker.__version__:  2.86.2
Session:  <sagemaker.session.Session object at 0x7f80ad720460>
Role:  arn:aws:iam::801598032724:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole
Bucket:  sagemaker-us-east-1-801598032724
Region:  us-east-1

What we have done here is:

  * imported the SageMaker Python SDK into our runtime
  * created a session to work with the SageMaker API and other AWS services
  * fetched the execution role associated with the user profile. It is the same profile available to the user in the console UI and has the AmazonSageMakerFullAccess policy attached to it
  * created or fetched the default bucket and returned its name. The default bucket name has the format sagemaker-{region}-{account_id}; if it doesn’t exist, our session will create it automatically. You may also use any other bucket in its place, provided you have sufficient read and write permissions
  * read the region name attached to our session

Next, we will use this session to upload data to our default bucket.

Upload data to Amazon S3 bucket

##
# You may choose any other prefix for your bucket.
# All the data related to this post will be under this prefix.
bucket_prefix = "2022-07-07-sagemaker-script-mode"

Now upload the data. In the output, we will get the complete path (S3 URI) for our uploaded data.

s3_train_uri = session.upload_data(local_train_file, key_prefix=bucket_prefix + "/data")
s3_test_uri = session.upload_data(local_test_file, key_prefix=bucket_prefix + "/data")

print("s3_train_uri: ", s3_train_uri)
print("s3_test_uri: ", s3_test_uri)
s3_train_uri:  s3://sagemaker-us-east-1-801598032724/2022-07-07-sagemaker-script-mode/data/train.csv
s3_test_uri:  s3://sagemaker-us-east-1-801598032724/2022-07-07-sagemaker-script-mode/data/test.csv

At this point, our data preparation step is complete. Train and test CSV files are available on the local system and in our default Amazon S3 bucket.

Prepare SageMaker local environment

The Amazon SageMaker training environment is managed, but the SageMaker Python SDK also supports local mode, allowing you to train and deploy models in your local environment. This is a great way to test training scripts before running them in SageMaker’s managed training or hosting environment.

How does the SageMaker managed environment work?

When you send a request to the SageMaker API (a fit or deploy call), it:

  * spins up new instances with the provided specification
  * loads the algorithm container
  * pulls the data from S3
  * runs the training code
  * stores the results and trained model artifacts to S3
  * terminates the instances

All this happens behind the scenes with a single line of code and is a huge advantage. Spinning up new hardware every time can be good for repeatability and security, but it can add some friction while testing and debugging our code. We can test our code on a small dataset in our local environment with SageMaker local mode and then switch seamlessly to SageMaker managed environment by changing a single line of code.
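The “single line” is the estimator’s instance_type argument. As an illustration (the helper name and the ml.m5.large instance type are my own choices here, not prescribed by SageMaker), the switch between the two modes can be captured like this:

```python
# Hypothetical helper: choose where SageMaker runs the training container.
def training_instance_type(use_local_mode: bool) -> str:
    # "local" runs the container on this machine via Docker;
    # a managed instance type (e.g. "ml.m5.large") runs it on
    # SageMaker-provisioned hardware.
    return "local" if use_local_mode else "ml.m5.large"

print(training_instance_type(True))
```

Passing the returned string as instance_type to the estimator is the only change needed to move between local testing and managed training.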

Steps to prepare Amazon SageMaker local environment

Install the following prerequisites if you want to set up Amazon SageMaker local mode on your own machine:

  1. Install the required Python packages: pip install boto3 sagemaker pandas scikit-learn and pip install 'sagemaker[local]'
  2. Have Docker Desktop installed and running on your computer (check with docker ps)
  3. Have AWS credentials configured on your local machine so that you can pull the Docker image from ECR

Instructions for SageMaker notebook instances

You can also set up SageMaker’s local environment on SageMaker notebook instances. The required Python packages and the Docker service are already there. You only need to upgrade the sagemaker[local] Python package.

#collapse-output
# this is required for SageMaker notebook instances
!pip install 'sagemaker[local]' --upgrade
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: sagemaker[local] in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (2.86.2)
Collecting sagemaker[local]
  Downloading sagemaker-2.99.0.tar.gz (542 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 542.7/542.7 KB 10.6 MB/s eta 0:00:0000:01
  Preparing metadata (setup.py) ... done
Requirement already satisfied: attrs<22,>=20.3.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (20.3.0)
Requirement already satisfied: boto3<2.0,>=1.20.21 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.21.42)
Requirement already satisfied: google-pasta in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.2.0)
Requirement already satisfied: numpy<2.0,>=1.9.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.20.3)
Requirement already satisfied: protobuf<4.0,>=3.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (3.19.1)
Requirement already satisfied: protobuf3-to-dict<1.0,>=0.1.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.1.5)
Requirement already satisfied: smdebug_rulesconfig==1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.0.1)
Requirement already satisfied: importlib-metadata<5.0,>=1.4.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (4.8.2)
Requirement already satisfied: packaging>=20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (21.3)
Requirement already satisfied: pandas in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.3.4)
Requirement already satisfied: pathos in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.2.8)
Requirement already satisfied: urllib3==1.26.8 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.26.8)
Requirement already satisfied: docker-compose==1.29.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.29.2)
Requirement already satisfied: docker~=5.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (5.0.3)
Requirement already satisfied: PyYAML==5.4.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (5.4.1)
Requirement already satisfied: texttable<2,>=0.9.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (1.6.4)
Requirement already satisfied: websocket-client<1,>=0.32.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.59.0)
Requirement already satisfied: docopt<1,>=0.6.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.6.2)
Requirement already satisfied: jsonschema<4,>=2.5.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (3.2.0)
Requirement already satisfied: dockerpty<1,>=0.4.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.4.1)
Requirement already satisfied: distro<2,>=1.5.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (1.7.0)
Requirement already satisfied: python-dotenv<1,>=0.13.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.20.0)
Requirement already satisfied: requests<3,>=2.20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (2.26.0)
Collecting botocore<1.25.0,>=1.24.42
  Downloading botocore-1.24.46-py3-none-any.whl (8.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 34.3 MB/s eta 0:00:00:00:0100:01
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from boto3<2.0,>=1.20.21->sagemaker[local]) (0.5.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from boto3<2.0,>=1.20.21->sagemaker[local]) (0.10.0)
Requirement already satisfied: zipp>=0.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from importlib-metadata<5.0,>=1.4.0->sagemaker[local]) (3.6.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from packaging>=20.0->sagemaker[local]) (3.0.6)
Requirement already satisfied: six in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from protobuf3-to-dict<1.0,>=0.1.5->sagemaker[local]) (1.16.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas->sagemaker[local]) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas->sagemaker[local]) (2021.3)
Requirement already satisfied: multiprocess>=0.70.12 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.70.12.2)
Requirement already satisfied: pox>=0.3.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.3.0)
Requirement already satisfied: dill>=0.3.4 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.3.4)
Requirement already satisfied: ppft>=1.6.6.4 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (1.6.6.4)
Requirement already satisfied: paramiko>=2.4.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker~=5.0.0->sagemaker[local]) (2.10.3)
Requirement already satisfied: pyrsistent>=0.14.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from jsonschema<4,>=2.5.1->docker-compose==1.29.2->sagemaker[local]) (0.18.0)
Requirement already satisfied: setuptools in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from jsonschema<4,>=2.5.1->docker-compose==1.29.2->sagemaker[local]) (59.4.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (2.0.8)
Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (3.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (2021.10.8)
Requirement already satisfied: pynacl>=1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (1.5.0)
Requirement already satisfied: cryptography>=2.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (36.0.0)
Requirement already satisfied: bcrypt>=3.1.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (3.2.0)
Requirement already satisfied: cffi>=1.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from bcrypt>=3.1.3->paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (1.15.0)
Requirement already satisfied: pycparser in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from cffi>=1.1->bcrypt>=3.1.3->paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (2.21)
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-2.99.0-py2.py3-none-any.whl size=756462 sha256=309b5159cfb7f5c739c6159b8bf309bfa7ce28d2ca402296e824f3e84bc837c1
  Stored in directory: /home/ec2-user/.cache/pip/wheels/fc/df/14/14b7871f4cf108cfe8891338510d97e28cfe2da00f37114fcf
Successfully built sagemaker
Installing collected packages: botocore, sagemaker
  Attempting uninstall: botocore
    Found existing installation: botocore 1.24.19
    Uninstalling botocore-1.24.19:
      Successfully uninstalled botocore-1.24.19
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.86.2
    Uninstalling sagemaker-2.86.2:
      Successfully uninstalled sagemaker-2.86.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.22.97 requires botocore==1.24.42, but you have botocore 1.24.46 which is incompatible.
aiobotocore 2.0.1 requires botocore<1.22.9,>=1.22.8, but you have botocore 1.24.46 which is incompatible.
Successfully installed botocore-1.24.46 sagemaker-2.99.0
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.

Instructions for SageMaker Studio environment

Note that SageMaker local mode will not work in the SageMaker Studio environment, as the provided instances do not have the Docker service installed.

Create SageMaker local session

A SageMaker local session is required for working in the local environment. Let’s create it.

from sagemaker.local import LocalSession

session_local = LocalSession()
session_local
<sagemaker.local.local_session.LocalSession at 0x7f80ac223910>
##
# configure local session
session_local.config = {"local": {"local_code": True}}

Prepare SageMaker training script

We will call our training script train_and_serve.py and place it in our workspace under the /src folder. We will start with a simple Hello World message, and then update and complete the training script as we learn more about the SageMaker scikit-learn container environment.

script_file_name = "train_and_serve.py"
script_path = local_path + "/src"
script_file = script_path + "/" + script_file_name

print("script_file_name: ", script_file_name)
print("script_path: ", script_path)
print("script_file: ", script_file)
script_file_name:  train_and_serve.py
script_path:  ./datasets/2022-07-07-sagemaker-script-mode/src
script_file:  ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
##
# make sure that the directory exists
Path(script_path).mkdir(parents=True, exist_ok=True)

Now the training script.

%%writefile $script_file

if __name__ == "__main__":
    print("*** Hello from the SageMaker script mode***")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py

Prepare SageMaker SKLearn estimator

To create a SKLearn estimator object we need to pass it the following items:

  * entry_point (str): path (absolute or relative) to the Python source file that should be executed as the entry point to training
  * framework_version (str): the scikit-learn version you want to use for executing your model training code
  * role (str): an AWS IAM role (either name or full ARN)
  * instance_type (str): type of instance to use for training. For local mode, use the string local
  * instance_count (int): number of instances to use for training. Since we will train in the local environment on a single instance, we will use 1 here

You can read more about the SKLearn estimator class in the official documentation, Scikit Learn Estimator.

Let’s find the installed scikit-learn version.

import sklearn

print(sklearn.__version__)
1.0.1

Note that version number 1.0.1 has to be provided to the SKLearn estimator class as 1.0-1. Otherwise, you will get the following error message.

ValueError: Unsupported sklearn version: 1.0.1. You may need to upgrade your SDK version (pip install -U sagemaker) for newer sklearn versions. Supported sklearn version(s): 0.20.0, 0.23-1, 1.0-1.
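Assuming the pattern holds for recent releases (it does for 0.23-1 and 1.0-1, though notably not for 0.20.0), a hypothetical helper to rewrite the installed version into the estimator’s expected form could look like:

```python
# Hypothetical helper: map a scikit-learn version such as "1.0.1" to the
# "<major>.<minor>-<patch>" form that recent SKLearn framework_version
# values use. Note: older versions like "0.20.0" do not follow this pattern.
def to_framework_version(sklearn_version: str) -> str:
    major, minor, patch = sklearn_version.split(".")
    return f"{major}.{minor}-{patch}"

print(to_framework_version("1.0.1"))
```

In practice, cross-check the result against the supported versions listed in the error message above.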

Now let us create the SageMaker SKLearn estimator object and pass our training script to it.

#collapse-output
from sagemaker.sklearn import SKLearn

sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type="local",
    framework_version="1.0-1"
)

sk_estimator.fit()
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Creating fvm7gkf0bq-algo-1-ju7k8 ... 
Creating fvm7gkf0bq-algo-1-ju7k8 ... done
Attaching to fvm7gkf0bq-algo-1-ju7k8
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,041 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,045 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,054 sagemaker_sklearn_container.training INFO     Invoking user training script.
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,272 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,284 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,297 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,306 sagemaker-training-toolkit INFO     Invoking user script
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | Training Env:
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | {
fvm7gkf0bq-algo-1-ju7k8 |     "additional_framework_parameters": {},
fvm7gkf0bq-algo-1-ju7k8 |     "channel_input_dirs": {},
fvm7gkf0bq-algo-1-ju7k8 |     "current_host": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 |     "framework_module": "sagemaker_sklearn_container.training:main",
fvm7gkf0bq-algo-1-ju7k8 |     "hosts": [
fvm7gkf0bq-algo-1-ju7k8 |         "algo-1-ju7k8"
fvm7gkf0bq-algo-1-ju7k8 |     ],
fvm7gkf0bq-algo-1-ju7k8 |     "hyperparameters": {},
fvm7gkf0bq-algo-1-ju7k8 |     "input_config_dir": "/opt/ml/input/config",
fvm7gkf0bq-algo-1-ju7k8 |     "input_data_config": {},
fvm7gkf0bq-algo-1-ju7k8 |     "input_dir": "/opt/ml/input",
fvm7gkf0bq-algo-1-ju7k8 |     "is_master": true,
fvm7gkf0bq-algo-1-ju7k8 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-22-17-814",
fvm7gkf0bq-algo-1-ju7k8 |     "log_level": 20,
fvm7gkf0bq-algo-1-ju7k8 |     "master_hostname": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 |     "model_dir": "/opt/ml/model",
fvm7gkf0bq-algo-1-ju7k8 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz",
fvm7gkf0bq-algo-1-ju7k8 |     "module_name": "train_and_serve",
fvm7gkf0bq-algo-1-ju7k8 |     "network_interface_name": "eth0",
fvm7gkf0bq-algo-1-ju7k8 |     "num_cpus": 2,
fvm7gkf0bq-algo-1-ju7k8 |     "num_gpus": 0,
fvm7gkf0bq-algo-1-ju7k8 |     "output_data_dir": "/opt/ml/output/data",
fvm7gkf0bq-algo-1-ju7k8 |     "output_dir": "/opt/ml/output",
fvm7gkf0bq-algo-1-ju7k8 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
fvm7gkf0bq-algo-1-ju7k8 |     "resource_config": {
fvm7gkf0bq-algo-1-ju7k8 |         "current_host": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 |         "hosts": [
fvm7gkf0bq-algo-1-ju7k8 |             "algo-1-ju7k8"
fvm7gkf0bq-algo-1-ju7k8 |         ]
fvm7gkf0bq-algo-1-ju7k8 |     },
fvm7gkf0bq-algo-1-ju7k8 |     "user_entry_point": "train_and_serve.py"
fvm7gkf0bq-algo-1-ju7k8 | }
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | Environment variables:
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | SM_HOSTS=["algo-1-ju7k8"]
fvm7gkf0bq-algo-1-ju7k8 | SM_NETWORK_INTERFACE_NAME=eth0
fvm7gkf0bq-algo-1-ju7k8 | SM_HPS={}
fvm7gkf0bq-algo-1-ju7k8 | SM_USER_ENTRY_POINT=train_and_serve.py
fvm7gkf0bq-algo-1-ju7k8 | SM_FRAMEWORK_PARAMS={}
fvm7gkf0bq-algo-1-ju7k8 | SM_RESOURCE_CONFIG={"current_host":"algo-1-ju7k8","hosts":["algo-1-ju7k8"]}
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_DATA_CONFIG={}
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
fvm7gkf0bq-algo-1-ju7k8 | SM_CHANNELS=[]
fvm7gkf0bq-algo-1-ju7k8 | SM_CURRENT_HOST=algo-1-ju7k8
fvm7gkf0bq-algo-1-ju7k8 | SM_MODULE_NAME=train_and_serve
fvm7gkf0bq-algo-1-ju7k8 | SM_LOG_LEVEL=20
fvm7gkf0bq-algo-1-ju7k8 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_DIR=/opt/ml/input
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_DIR=/opt/ml/output
fvm7gkf0bq-algo-1-ju7k8 | SM_NUM_CPUS=2
fvm7gkf0bq-algo-1-ju7k8 | SM_NUM_GPUS=0
fvm7gkf0bq-algo-1-ju7k8 | SM_MODEL_DIR=/opt/ml/model
fvm7gkf0bq-algo-1-ju7k8 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz
fvm7gkf0bq-algo-1-ju7k8 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{},"current_host":"algo-1-ju7k8","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-ju7k8"],"hyperparameters":{},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-22-17-814","log_level":20,"master_hostname":"algo-1-ju7k8","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-ju7k8","hosts":["algo-1-ju7k8"]},"user_entry_point":"train_and_serve.py"}
fvm7gkf0bq-algo-1-ju7k8 | SM_USER_ARGS=[]
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
fvm7gkf0bq-algo-1-ju7k8 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | Invoking script with the following command:
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | /miniconda3/bin/python train_and_serve.py
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | 
fvm7gkf0bq-algo-1-ju7k8 | *** Hello from the SageMaker script mode***
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,332 sagemaker-containers INFO     Reporting training SUCCESS
fvm7gkf0bq-algo-1-ju7k8 exited with code 0
Aborting on container exit...
===== Job Complete =====
##
# The estimator will pick a local session when we use instance_type='local'
sk_estimator.sagemaker_session
<sagemaker.local.local_session.LocalSession at 0x7f80ac53da90>

The first time you run the SKLearn estimator, execution may take some time because it has to download the scikit-learn container into the local Docker environment. When the container completes execution, you get the container logs in the output. The logs show that the container successfully ran the training script, and the hello message is printed. But there is a lot more information in the logs, which we will discuss in the coming section.

sklearn-output-1

Understanding SKLearn container output and environment variables

From the SKLearn estimator output, we can see that our train_and_serve.py script is executed by the container with the following command.

/miniconda3/bin/python train_and_serve.py

Inspecting SageMaker SKLearn docker image

Since the container was executed in the local environment, we can also inspect the SageMaker SKLearn local image.

!docker images
REPOSITORY                                                            TAG             IMAGE ID       CREATED       SIZE
683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn   1.0-1-cpu-py3   8a6ea8272ad0   10 days ago   3.7GB

Let’s also inspect the Docker image itself. Notice the multiple container environment variables and their default values in the output.

#collapse-output
!docker inspect 8a6ea8272ad0
[
    {
        "Id": "sha256:8a6ea8272ad003ec816569b0f879b16c770116584301161565f065aadb99436c",
        "RepoTags": [
            "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.0-1-cpu-py3"
        ],
        "RepoDigests": [
            "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn@sha256:fc8c3a617ff0e436c25f3b64d03e1f485f1d159478c26757f3d1d267fc849445"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2022-07-06T18:55:02.854297671Z",
        "Container": "11b9a5fec2d61294aee63e549100ed18ceb7aa0de6a4ff198da2f556dfe3ec2f",
        "ContainerConfig": {
            "Hostname": "11b9a5fec2d6",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "8080/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "PYTHONDONTWRITEBYTECODE=1",
                "PYTHONUNBUFFERED=1",
                "PYTHONIOENCODING=UTF-8",
                "LANG=C.UTF-8",
                "LC_ALL=C.UTF-8",
                "SAGEMAKER_SKLEARN_VERSION=1.0-1",
                "SAGEMAKER_TRAINING_MODULE=sagemaker_sklearn_container.training:main",
                "SAGEMAKER_SERVING_MODULE=sagemaker_sklearn_container.serving:main",
                "SKLEARN_MMS_CONFIG=/home/model-server/config.properties",
                "SM_INPUT=/opt/ml/input",
                "SM_INPUT_TRAINING_CONFIG_FILE=/opt/ml/input/config/hyperparameters.json",
                "SM_INPUT_DATA_CONFIG_FILE=/opt/ml/input/config/inputdataconfig.json",
                "SM_CHECKPOINT_CONFIG_FILE=/opt/ml/input/config/checkpointconfig.json",
                "SM_MODEL_DIR=/opt/ml/model",
                "TEMP=/home/model-server/tmp"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ",
                "LABEL transform_id=9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
            ],
            "Image": "sha256:58b15b990d550868caed6f885423deee97a6c7f525c228a043096bf28e775d18",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "TRANSFORM_TYPE": "Aggregate-1.0",
                "VERSION_SET_NAME": "SMFrameworksSKLearn/release-cdk",
                "VERSION_SET_REVISION": "6086988568",
                "com.amazonaws.sagemaker.capabilities.accept-bind-to-port": "true",
                "com.amazonaws.sagemaker.capabilities.multi-models": "true",
                "transform_id": "9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
            }
        },
        "DockerVersion": "20.10.15",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "8080/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "PYTHONDONTWRITEBYTECODE=1",
                "PYTHONUNBUFFERED=1",
                "PYTHONIOENCODING=UTF-8",
                "LANG=C.UTF-8",
                "LC_ALL=C.UTF-8",
                "SAGEMAKER_SKLEARN_VERSION=1.0-1",
                "SAGEMAKER_TRAINING_MODULE=sagemaker_sklearn_container.training:main",
                "SAGEMAKER_SERVING_MODULE=sagemaker_sklearn_container.serving:main",
                "SKLEARN_MMS_CONFIG=/home/model-server/config.properties",
                "SM_INPUT=/opt/ml/input",
                "SM_INPUT_TRAINING_CONFIG_FILE=/opt/ml/input/config/hyperparameters.json",
                "SM_INPUT_DATA_CONFIG_FILE=/opt/ml/input/config/inputdataconfig.json",
                "SM_CHECKPOINT_CONFIG_FILE=/opt/ml/input/config/checkpointconfig.json",
                "SM_MODEL_DIR=/opt/ml/model",
                "TEMP=/home/model-server/tmp"
            ],
            "Cmd": [
                "bash"
            ],
            "Image": "sha256:58b15b990d550868caed6f885423deee97a6c7f525c228a043096bf28e775d18",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "TRANSFORM_TYPE": "Aggregate-1.0",
                "VERSION_SET_NAME": "SMFrameworksSKLearn/release-cdk",
                "VERSION_SET_REVISION": "6086988568",
                "com.amazonaws.sagemaker.capabilities.accept-bind-to-port": "true",
                "com.amazonaws.sagemaker.capabilities.multi-models": "true",
                "transform_id": "9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
            }
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 3699696670,
        "VirtualSize": 3699696670,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/01a97258168fa360e9f6aa63ac0c6b2417c0ea0ebe888123edad87eb4a646765/diff:/var/lib/docker/overlay2/3b85b71e8fe52c7a27ae71ed492ff72c7e430cccdeea17046e2a361e8d7fd960/diff:/var/lib/docker/overlay2/7de8e16dd696c868ffd028a3ba1f1a80ef04237b9323229e578bc5e3aa6a29d7/diff:/var/lib/docker/overlay2/5eeb27014ab7ac7a894efdbb166d8a87fb9d4b8b739eccd82546ad6a2b53aa70/diff:/var/lib/docker/overlay2/bbd9a81a7aa5bf4c79e81ecf47670a3f8c098eee9c6682f36f88ec52db8e1946/diff:/var/lib/docker/overlay2/eb0e7f3a5bd45c1d611e4c37ba641d1e978043954312da5908fd4003c41c7e7d/diff:/var/lib/docker/overlay2/3daaedc78711e353befc51544a944ad35954327325d056094f445502bf65ce53/diff:/var/lib/docker/overlay2/9dd41e3edfb9d8f852732a968a7b179ca811e0f9d55614a0b193de753fc6aca0/diff:/var/lib/docker/overlay2/ede189a574c79eebc565041a44ebf8b586247a36a99fe3ff9588b8c940783498/diff:/var/lib/docker/overlay2/6b1d78a9c074a42d78650406b90b7b4f51eb31660a7b1e2dcc6d73cc43d29b6b/diff:/var/lib/docker/overlay2/3e0420f6740f876c9355d526cbdedd9ebde5be94ddf0d93d7dadd4f34cae351b/diff:/var/lib/docker/overlay2/de1a2da7ee1b5d9a1b4e5c3dd1adff213185dde7e1212db96c0435e512f50701/diff:/var/lib/docker/overlay2/bebca69aef394f0553634413c7875eb58228c7e6359a305a7501705e75c2b58b/diff:/var/lib/docker/overlay2/8a410db2a038a175ee6ddfb005383f8776c80b1b1901f5d2feedfc8d837ffa40/diff:/var/lib/docker/overlay2/6f6686a8cb3ccf47b214854717cbe33ba777e0985200e3d7b7f761f99231b274/diff:/var/lib/docker/overlay2/ad8b24fa9173d28a83284e4f31d830f1b3d9fe30a3fcc8cbb37895ec2fded7bf/diff:/var/lib/docker/overlay2/e8b0842f0da5b0dbb5076e350bfe1a70ef291546bbbf207fe1f90ae7ccd64517/diff",
                "MergedDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/merged",
                "UpperDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/diff",
                "WorkDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:1dc52a6b4de8561423dd3ec5a1f7f77f5309fd8cb340f80b8bc3d87fa112003e",
                "sha256:b13a10ce059365d68a2113e9dbcac05b17b51f181615fca6d717a0dcf9ba8ffb",
                "sha256:790d00cf365a312488151b354f0b0ae826be031edffb8a4de6a1fab048774dc7",
                "sha256:323e43c53a1cd5abbd55437588f19da04f716452bc6d05486759b35f3e485390",
                "sha256:c99c9d462af0bac5511ed046178ab0de79b8cdad33cd85246e9f661e098426cd",
                "sha256:4a3a4d9fb4d250b1b64629b23bc0a477a45ee2659a8410d59a31a181dad70002",
                "sha256:27b35f432a27e5e275038e559ebbe1aa7e91447bf417f5da01e3326739ba9366",
                "sha256:ee12325fe0b7e7930b76d9a3dc81fcc37fa51a3267b311d2ed7c38703f193d75",
                "sha256:7ceb40593535cdc07299efa2ce3a2c2267c2fa683161515fd6ab97f733492bf0",
                "sha256:f18dbe0eec054f0aedf54a94aa29dab0d2c0f3d920fb482c99819622b0094f47",
                "sha256:df2a7845ea611463f9f3282ccb45156ba883f40b15013ee49bd0a569301738d8",
                "sha256:bcbd5416b87e3e37e05c22e46cbff2e3503d9caa0ec283a44931dc63e51c8cb7",
                "sha256:5bcbb3ccae766c8a72d98ce494500bfd44c32e5780a1cb153139a4c5c143a8d5",
                "sha256:4ecc8a8ffa902f3ea9bebb8d610e02a32ce1ca94c1a3160a31da98b73c1f55a0",
                "sha256:a7a7b8b26735eb2d137fd0f91b83c73ad48cf2c4b83e9d0cadece410d6e598ba",
                "sha256:ae939a0c9d32674ad6674947853ecfda4ff0530a8137960064448ae5e45fa1c5",
                "sha256:6948f39c8f3cf6ec104734ccd1112fcb4af85a7c26c9c3d43495494b9b799f25",
                "sha256:affd18c8e88f35e75bd02158e0418f3aeb4eec4269a208ede24cc829fa88c850"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

Pass hyperparameters to SKLearn estimator

Let’s pass some dummy hyperparameters to the estimator and see how it affects the output.

#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)

sk_estimator.fit()
Creating kc4ahx6e84-algo-1-8m8ve ... 
Creating kc4ahx6e84-algo-1-8m8ve ... done
Attaching to kc4ahx6e84-algo-1-8m8ve
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,385 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,389 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,398 sagemaker_sklearn_container.training INFO     Invoking user training script.
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,595 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,608 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,621 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,630 sagemaker-training-toolkit INFO     Invoking user script
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | Training Env:
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | {
kc4ahx6e84-algo-1-8m8ve |     "additional_framework_parameters": {},
kc4ahx6e84-algo-1-8m8ve |     "channel_input_dirs": {},
kc4ahx6e84-algo-1-8m8ve |     "current_host": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve |     "framework_module": "sagemaker_sklearn_container.training:main",
kc4ahx6e84-algo-1-8m8ve |     "hosts": [
kc4ahx6e84-algo-1-8m8ve |         "algo-1-8m8ve"
kc4ahx6e84-algo-1-8m8ve |     ],
kc4ahx6e84-algo-1-8m8ve |     "hyperparameters": {
kc4ahx6e84-algo-1-8m8ve |         "dummy_param_1": "val1",
kc4ahx6e84-algo-1-8m8ve |         "dummy_param_2": "val2"
kc4ahx6e84-algo-1-8m8ve |     },
kc4ahx6e84-algo-1-8m8ve |     "input_config_dir": "/opt/ml/input/config",
kc4ahx6e84-algo-1-8m8ve |     "input_data_config": {},
kc4ahx6e84-algo-1-8m8ve |     "input_dir": "/opt/ml/input",
kc4ahx6e84-algo-1-8m8ve |     "is_master": true,
kc4ahx6e84-algo-1-8m8ve |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-44-284",
kc4ahx6e84-algo-1-8m8ve |     "log_level": 20,
kc4ahx6e84-algo-1-8m8ve |     "master_hostname": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve |     "model_dir": "/opt/ml/model",
kc4ahx6e84-algo-1-8m8ve |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz",
kc4ahx6e84-algo-1-8m8ve |     "module_name": "train_and_serve",
kc4ahx6e84-algo-1-8m8ve |     "network_interface_name": "eth0",
kc4ahx6e84-algo-1-8m8ve |     "num_cpus": 2,
kc4ahx6e84-algo-1-8m8ve |     "num_gpus": 0,
kc4ahx6e84-algo-1-8m8ve |     "output_data_dir": "/opt/ml/output/data",
kc4ahx6e84-algo-1-8m8ve |     "output_dir": "/opt/ml/output",
kc4ahx6e84-algo-1-8m8ve |     "output_intermediate_dir": "/opt/ml/output/intermediate",
kc4ahx6e84-algo-1-8m8ve |     "resource_config": {
kc4ahx6e84-algo-1-8m8ve |         "current_host": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve |         "hosts": [
kc4ahx6e84-algo-1-8m8ve |             "algo-1-8m8ve"
kc4ahx6e84-algo-1-8m8ve |         ]
kc4ahx6e84-algo-1-8m8ve |     },
kc4ahx6e84-algo-1-8m8ve |     "user_entry_point": "train_and_serve.py"
kc4ahx6e84-algo-1-8m8ve | }
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | Environment variables:
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | SM_HOSTS=["algo-1-8m8ve"]
kc4ahx6e84-algo-1-8m8ve | SM_NETWORK_INTERFACE_NAME=eth0
kc4ahx6e84-algo-1-8m8ve | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
kc4ahx6e84-algo-1-8m8ve | SM_USER_ENTRY_POINT=train_and_serve.py
kc4ahx6e84-algo-1-8m8ve | SM_FRAMEWORK_PARAMS={}
kc4ahx6e84-algo-1-8m8ve | SM_RESOURCE_CONFIG={"current_host":"algo-1-8m8ve","hosts":["algo-1-8m8ve"]}
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_DATA_CONFIG={}
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
kc4ahx6e84-algo-1-8m8ve | SM_CHANNELS=[]
kc4ahx6e84-algo-1-8m8ve | SM_CURRENT_HOST=algo-1-8m8ve
kc4ahx6e84-algo-1-8m8ve | SM_MODULE_NAME=train_and_serve
kc4ahx6e84-algo-1-8m8ve | SM_LOG_LEVEL=20
kc4ahx6e84-algo-1-8m8ve | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_DIR=/opt/ml/input
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_DIR=/opt/ml/output
kc4ahx6e84-algo-1-8m8ve | SM_NUM_CPUS=2
kc4ahx6e84-algo-1-8m8ve | SM_NUM_GPUS=0
kc4ahx6e84-algo-1-8m8ve | SM_MODEL_DIR=/opt/ml/model
kc4ahx6e84-algo-1-8m8ve | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz
kc4ahx6e84-algo-1-8m8ve | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{},"current_host":"algo-1-8m8ve","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-8m8ve"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-44-284","log_level":20,"master_hostname":"algo-1-8m8ve","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-8m8ve","hosts":["algo-1-8m8ve"]},"user_entry_point":"train_and_serve.py"}
kc4ahx6e84-algo-1-8m8ve | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
kc4ahx6e84-algo-1-8m8ve | SM_HP_DUMMY_PARAM_1=val1
kc4ahx6e84-algo-1-8m8ve | SM_HP_DUMMY_PARAM_2=val2
kc4ahx6e84-algo-1-8m8ve | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | Invoking script with the following command:
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | 
kc4ahx6e84-algo-1-8m8ve | *** Hello from the SageMaker script mode***
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,657 sagemaker-containers INFO     Reporting training SUCCESS
kc4ahx6e84-algo-1-8m8ve exited with code 0
Aborting on container exit...
===== Job Complete =====

sklearn-output-hyperparams

From the output, we can see that our hyperparameters are passed to our training script as command-line arguments. This is an important point, and we will update our script to use this information.
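Since the hyperparameters arrive as command-line flags, a training script can read them with argparse. Below is a minimal sketch, assuming the same dummy parameter names we passed to the estimator; parse_known_args is used so any extra flags the container appends are ignored rather than causing an error.

```python
# Sketch: reading SageMaker hyperparameters, which the container passes to
# the entry point as command-line flags (e.g. --dummy_param_1 val1).
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Each hyperparameter given to the estimator becomes a --flag value pair.
    parser.add_argument("--dummy_param_1", type=str, default=None)
    parser.add_argument("--dummy_param_2", type=str, default=None)
    # parse_known_args tolerates additional flags we did not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print("dummy_param_1:", args.dummy_param_1)
    print("dummy_param_2:", args.dummy_param_2)
```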

SageMaker SKLearn container environment variables

Let’s now discuss some important environment variables we see in the output.

SM_MODULE_DIR

SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-13-13-05-48-675/source/sourcedir.tar.gz

SM_MODULE_DIR points to a location in the S3 bucket where SageMaker automatically backs up our source code for that particular run. SageMaker creates a separate folder in the default bucket for each new run. The default value is s3://sagemaker-{aws-region}-{aws-id}/{training-job-name}/source/sourcedir.tar.gz
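The default pattern above can be reproduced with a simple helper; a sketch, with the region, account id, and job name as assumed inputs:

```python
# Sketch: the default SM_MODULE_DIR value, assembled from the pattern
# s3://sagemaker-{aws-region}-{aws-id}/{training-job-name}/source/sourcedir.tar.gz
def default_module_dir(region: str, account_id: str, job_name: str) -> str:
    return f"s3://sagemaker-{region}-{account_id}/{job_name}/source/sourcedir.tar.gz"


if __name__ == "__main__":
    print(default_module_dir("us-east-1", "801598032724",
                             "sagemaker-scikit-learn-2022-07-17-15-23-44-284"))
```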

Note: We have used local_code for the SKLearn estimator, so why is the source code backed up to the S3 bucket? Should it not be kept on the local system, bypassing S3 altogether in local mode? That should have been the default behavior, but the SageMaker SDK does otherwise: even in local mode it uses the S3 bucket to keep the source code. You can read more about this behavior in the issue ticket Model repack always uploads data to S3 bucket regardless of local mode settings

SM_MODEL_DIR

SM_MODEL_DIR=/opt/ml/model

SM_MODEL_DIR points to a directory located inside the container. When the training job finishes, the container and its file system will be deleted, except for the /opt/ml/model and /opt/ml/output directories. Use /opt/ml/model to save the trained model artifacts. These artifacts are uploaded to S3 for model hosting.
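A training script therefore writes its model into SM_MODEL_DIR before exiting. A minimal sketch follows; in a real scikit-learn script you would typically use joblib.dump, but pickle is used here to keep the sketch dependency-free, and the file name model.pkl plus the temp-directory fallback are assumptions for illustration.

```python
# Sketch: persist the trained model into SM_MODEL_DIR so SageMaker uploads
# it to S3 when the training job finishes.
import os
import pickle
import tempfile


def save_model(model, model_dir=None):
    # Inside the container SM_MODEL_DIR is /opt/ml/model; fall back to a
    # temporary directory so this sketch also runs outside SageMaker.
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR") or tempfile.mkdtemp()
    path = os.path.join(model_dir, "model.pkl")  # assumed artifact name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


if __name__ == "__main__":
    print("model saved to:", save_model({"coef": [1.0, 2.0]}))
```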

SM_OUTPUT_DATA_DIR

SM_OUTPUT_DATA_DIR=/opt/ml/output/data

SM_OUTPUT_DATA_DIR points to a directory in the container for writing output artifacts. Output artifacts may include checkpoints, graphs, and other files to save, excluding model artifacts. These artifacts are compressed and uploaded to S3 under the same S3 prefix as the model artifacts.
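For example, a script could drop its evaluation metrics into this directory; a sketch, where the file name metrics.json and the temp-directory fallback are assumptions:

```python
# Sketch: write non-model artifacts (metrics, plots, logs) into
# SM_OUTPUT_DATA_DIR; SageMaker compresses and uploads this directory to S3
# alongside the model artifacts.
import json
import os
import tempfile


def save_metrics(metrics, output_dir=None):
    # Inside the container SM_OUTPUT_DATA_DIR is /opt/ml/output/data; fall
    # back to a temporary directory so the sketch runs outside SageMaker.
    output_dir = output_dir or os.environ.get("SM_OUTPUT_DATA_DIR") or tempfile.mkdtemp()
    path = os.path.join(output_dir, "metrics.json")  # assumed file name
    with open(path, "w") as f:
        json.dump(metrics, f)
    return path


if __name__ == "__main__":
    print("metrics saved to:", save_metrics({"accuracy": 0.9}))
```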

SM_CHANNELS

SM_CHANNELS='["testing","training"]'

A channel is a named input source that training algorithms can consume. You can partition your training data into different logical “channels” when you run training. Depending on your problem, some common channel ideas are: “training”, “testing”, “evaluation” or “images” and “labels”. You can read more about the channels from SageMaker API reference Channel

SM_CHANNEL_{channel_name}

SM_CHANNEL_TRAIN='/opt/ml/input/data/train'
SM_CHANNEL_TEST='/opt/ml/input/data/test'

Suppose that you have passed two input channels, ‘train’ and ‘test’, to the Scikit-learn estimator’s fit() method. Then the following environment variables will be set, following the format SM_CHANNEL_{channel_name}: * SM_CHANNEL_TRAIN: points to the directory in the container where the train channel data has been downloaded * SM_CHANNEL_TEST: same as above, but for the test channel

Note that the channel names train and test are only conventions; you can use any names here, and the environment variables will be created accordingly. It is important to know that the SageMaker container automatically downloads the data from the provided input channels and makes it available in the respective local directories once it starts executing. The training script can then load the data from those local container directories.
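This mapping from channel names to directories can be captured in a small helper; a sketch, where the environment variable is set manually only to simulate what the container does for us:

```python
# Sketch: resolve a channel's local data directory from the SM_CHANNEL_*
# environment variables that the container sets for every channel given to
# fit().
import os


def channel_dir(channel_name: str) -> str:
    # The channel name is upper-cased in the variable name, e.g.
    # "train" -> SM_CHANNEL_TRAIN.
    return os.environ[f"SM_CHANNEL_{channel_name.upper()}"]


# Simulate the container environment for illustration; inside SageMaker this
# variable is already set.
os.environ.setdefault("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
print("train data directory:", channel_dir("train"))
```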

There are more environment variables available, and you can read about them from Environment variables

Pass input channel to SKLearn estimator

Now that we understand the SKLearn container environment better, let’s pass the training data channel to the estimator and see if the data becomes available inside the container directory.

Let’s update our script to list all the files in the SM_CHANNEL_TRAIN directory.

%%writefile $script_file
import argparse, os, sys

if __name__ == "__main__":
    print(" *** Hello from SageMaker script container *** ")

    training_dir = os.environ.get("SM_CHANNEL_TRAIN")
    dir_list = os.listdir(training_dir)

    print("training_dir files list: ", dir_list)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)

sk_estimator.fit({"train": f"file://{local_train_path}"})
Creating wp2g5fxyg1-algo-1-o05g1 ... 
Creating wp2g5fxyg1-algo-1-o05g1 ... done
Attaching to wp2g5fxyg1-algo-1-o05g1
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,444 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,447 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,456 sagemaker_sklearn_container.training INFO     Invoking user training script.
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,638 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,653 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,667 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,676 sagemaker-training-toolkit INFO     Invoking user script
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | Training Env:
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | {
wp2g5fxyg1-algo-1-o05g1 |     "additional_framework_parameters": {},
wp2g5fxyg1-algo-1-o05g1 |     "channel_input_dirs": {
wp2g5fxyg1-algo-1-o05g1 |         "train": "/opt/ml/input/data/train"
wp2g5fxyg1-algo-1-o05g1 |     },
wp2g5fxyg1-algo-1-o05g1 |     "current_host": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 |     "framework_module": "sagemaker_sklearn_container.training:main",
wp2g5fxyg1-algo-1-o05g1 |     "hosts": [
wp2g5fxyg1-algo-1-o05g1 |         "algo-1-o05g1"
wp2g5fxyg1-algo-1-o05g1 |     ],
wp2g5fxyg1-algo-1-o05g1 |     "hyperparameters": {
wp2g5fxyg1-algo-1-o05g1 |         "dummy_param_1": "val1",
wp2g5fxyg1-algo-1-o05g1 |         "dummy_param_2": "val2"
wp2g5fxyg1-algo-1-o05g1 |     },
wp2g5fxyg1-algo-1-o05g1 |     "input_config_dir": "/opt/ml/input/config",
wp2g5fxyg1-algo-1-o05g1 |     "input_data_config": {
wp2g5fxyg1-algo-1-o05g1 |         "train": {
wp2g5fxyg1-algo-1-o05g1 |             "TrainingInputMode": "File"
wp2g5fxyg1-algo-1-o05g1 |         }
wp2g5fxyg1-algo-1-o05g1 |     },
wp2g5fxyg1-algo-1-o05g1 |     "input_dir": "/opt/ml/input",
wp2g5fxyg1-algo-1-o05g1 |     "is_master": true,
wp2g5fxyg1-algo-1-o05g1 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-47-051",
wp2g5fxyg1-algo-1-o05g1 |     "log_level": 20,
wp2g5fxyg1-algo-1-o05g1 |     "master_hostname": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 |     "model_dir": "/opt/ml/model",
wp2g5fxyg1-algo-1-o05g1 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz",
wp2g5fxyg1-algo-1-o05g1 |     "module_name": "train_and_serve",
wp2g5fxyg1-algo-1-o05g1 |     "network_interface_name": "eth0",
wp2g5fxyg1-algo-1-o05g1 |     "num_cpus": 2,
wp2g5fxyg1-algo-1-o05g1 |     "num_gpus": 0,
wp2g5fxyg1-algo-1-o05g1 |     "output_data_dir": "/opt/ml/output/data",
wp2g5fxyg1-algo-1-o05g1 |     "output_dir": "/opt/ml/output",
wp2g5fxyg1-algo-1-o05g1 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
wp2g5fxyg1-algo-1-o05g1 |     "resource_config": {
wp2g5fxyg1-algo-1-o05g1 |         "current_host": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 |         "hosts": [
wp2g5fxyg1-algo-1-o05g1 |             "algo-1-o05g1"
wp2g5fxyg1-algo-1-o05g1 |         ]
wp2g5fxyg1-algo-1-o05g1 |     },
wp2g5fxyg1-algo-1-o05g1 |     "user_entry_point": "train_and_serve.py"
wp2g5fxyg1-algo-1-o05g1 | }
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | Environment variables:
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | SM_HOSTS=["algo-1-o05g1"]
wp2g5fxyg1-algo-1-o05g1 | SM_NETWORK_INTERFACE_NAME=eth0
wp2g5fxyg1-algo-1-o05g1 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
wp2g5fxyg1-algo-1-o05g1 | SM_USER_ENTRY_POINT=train_and_serve.py
wp2g5fxyg1-algo-1-o05g1 | SM_FRAMEWORK_PARAMS={}
wp2g5fxyg1-algo-1-o05g1 | SM_RESOURCE_CONFIG={"current_host":"algo-1-o05g1","hosts":["algo-1-o05g1"]}
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
wp2g5fxyg1-algo-1-o05g1 | SM_CHANNELS=["train"]
wp2g5fxyg1-algo-1-o05g1 | SM_CURRENT_HOST=algo-1-o05g1
wp2g5fxyg1-algo-1-o05g1 | SM_MODULE_NAME=train_and_serve
wp2g5fxyg1-algo-1-o05g1 | SM_LOG_LEVEL=20
wp2g5fxyg1-algo-1-o05g1 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_DIR=/opt/ml/input
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_DIR=/opt/ml/output
wp2g5fxyg1-algo-1-o05g1 | SM_NUM_CPUS=2
wp2g5fxyg1-algo-1-o05g1 | SM_NUM_GPUS=0
wp2g5fxyg1-algo-1-o05g1 | SM_MODEL_DIR=/opt/ml/model
wp2g5fxyg1-algo-1-o05g1 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz
wp2g5fxyg1-algo-1-o05g1 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-o05g1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-o05g1"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-47-051","log_level":20,"master_hostname":"algo-1-o05g1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-o05g1","hosts":["algo-1-o05g1"]},"user_entry_point":"train_and_serve.py"}
wp2g5fxyg1-algo-1-o05g1 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
wp2g5fxyg1-algo-1-o05g1 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
wp2g5fxyg1-algo-1-o05g1 | SM_HP_DUMMY_PARAM_1=val1
wp2g5fxyg1-algo-1-o05g1 | SM_HP_DUMMY_PARAM_2=val2
wp2g5fxyg1-algo-1-o05g1 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | Invoking script with the following command:
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 | 
wp2g5fxyg1-algo-1-o05g1 |  *** Hello from SageMaker script container *** 
wp2g5fxyg1-algo-1-o05g1 | training_dir files list:  ['train.csv']
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,715 sagemaker-containers INFO     Reporting training SUCCESS
wp2g5fxyg1-algo-1-o05g1 exited with code 0
Aborting on container exit...
===== Job Complete =====

sklearn-output-traincsv

From the output, we can see that train.csv, which was in our local environment, is now available inside the container at the path SM_CHANNEL_TRAIN=/opt/ml/input/data/train.

Let’s also test the same with our training data in the S3 bucket.

#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)

sk_estimator.fit({"train": s3_train_uri})
Creating 7ao431iiu5-algo-1-9jid1 ... 
Creating 7ao431iiu5-algo-1-9jid1 ... done
Attaching to 7ao431iiu5-algo-1-9jid1
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,073 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,079 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,094 sagemaker_sklearn_container.training INFO     Invoking user training script.
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,335 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,348 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,360 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,369 sagemaker-training-toolkit INFO     Invoking user script
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | Training Env:
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | {
7ao431iiu5-algo-1-9jid1 |     "additional_framework_parameters": {},
7ao431iiu5-algo-1-9jid1 |     "channel_input_dirs": {
7ao431iiu5-algo-1-9jid1 |         "train": "/opt/ml/input/data/train"
7ao431iiu5-algo-1-9jid1 |     },
7ao431iiu5-algo-1-9jid1 |     "current_host": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 |     "framework_module": "sagemaker_sklearn_container.training:main",
7ao431iiu5-algo-1-9jid1 |     "hosts": [
7ao431iiu5-algo-1-9jid1 |         "algo-1-9jid1"
7ao431iiu5-algo-1-9jid1 |     ],
7ao431iiu5-algo-1-9jid1 |     "hyperparameters": {
7ao431iiu5-algo-1-9jid1 |         "dummy_param_1": "val1",
7ao431iiu5-algo-1-9jid1 |         "dummy_param_2": "val2"
7ao431iiu5-algo-1-9jid1 |     },
7ao431iiu5-algo-1-9jid1 |     "input_config_dir": "/opt/ml/input/config",
7ao431iiu5-algo-1-9jid1 |     "input_data_config": {
7ao431iiu5-algo-1-9jid1 |         "train": {
7ao431iiu5-algo-1-9jid1 |             "TrainingInputMode": "File"
7ao431iiu5-algo-1-9jid1 |         }
7ao431iiu5-algo-1-9jid1 |     },
7ao431iiu5-algo-1-9jid1 |     "input_dir": "/opt/ml/input",
7ao431iiu5-algo-1-9jid1 |     "is_master": true,
7ao431iiu5-algo-1-9jid1 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-50-077",
7ao431iiu5-algo-1-9jid1 |     "log_level": 20,
7ao431iiu5-algo-1-9jid1 |     "master_hostname": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 |     "model_dir": "/opt/ml/model",
7ao431iiu5-algo-1-9jid1 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz",
7ao431iiu5-algo-1-9jid1 |     "module_name": "train_and_serve",
7ao431iiu5-algo-1-9jid1 |     "network_interface_name": "eth0",
7ao431iiu5-algo-1-9jid1 |     "num_cpus": 2,
7ao431iiu5-algo-1-9jid1 |     "num_gpus": 0,
7ao431iiu5-algo-1-9jid1 |     "output_data_dir": "/opt/ml/output/data",
7ao431iiu5-algo-1-9jid1 |     "output_dir": "/opt/ml/output",
7ao431iiu5-algo-1-9jid1 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
7ao431iiu5-algo-1-9jid1 |     "resource_config": {
7ao431iiu5-algo-1-9jid1 |         "current_host": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 |         "hosts": [
7ao431iiu5-algo-1-9jid1 |             "algo-1-9jid1"
7ao431iiu5-algo-1-9jid1 |         ]
7ao431iiu5-algo-1-9jid1 |     },
7ao431iiu5-algo-1-9jid1 |     "user_entry_point": "train_and_serve.py"
7ao431iiu5-algo-1-9jid1 | }
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | Environment variables:
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | SM_HOSTS=["algo-1-9jid1"]
7ao431iiu5-algo-1-9jid1 | SM_NETWORK_INTERFACE_NAME=eth0
7ao431iiu5-algo-1-9jid1 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
7ao431iiu5-algo-1-9jid1 | SM_USER_ENTRY_POINT=train_and_serve.py
7ao431iiu5-algo-1-9jid1 | SM_FRAMEWORK_PARAMS={}
7ao431iiu5-algo-1-9jid1 | SM_RESOURCE_CONFIG={"current_host":"algo-1-9jid1","hosts":["algo-1-9jid1"]}
7ao431iiu5-algo-1-9jid1 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
7ao431iiu5-algo-1-9jid1 | SM_CHANNELS=["train"]
7ao431iiu5-algo-1-9jid1 | SM_CURRENT_HOST=algo-1-9jid1
7ao431iiu5-algo-1-9jid1 | SM_MODULE_NAME=train_and_serve
7ao431iiu5-algo-1-9jid1 | SM_LOG_LEVEL=20
7ao431iiu5-algo-1-9jid1 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
7ao431iiu5-algo-1-9jid1 | SM_INPUT_DIR=/opt/ml/input
7ao431iiu5-algo-1-9jid1 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_DIR=/opt/ml/output
7ao431iiu5-algo-1-9jid1 | SM_NUM_CPUS=2
7ao431iiu5-algo-1-9jid1 | SM_NUM_GPUS=0
7ao431iiu5-algo-1-9jid1 | SM_MODEL_DIR=/opt/ml/model
7ao431iiu5-algo-1-9jid1 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz
7ao431iiu5-algo-1-9jid1 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-9jid1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-9jid1"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-50-077","log_level":20,"master_hostname":"algo-1-9jid1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-9jid1","hosts":["algo-1-9jid1"]},"user_entry_point":"train_and_serve.py"}
7ao431iiu5-algo-1-9jid1 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
7ao431iiu5-algo-1-9jid1 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
7ao431iiu5-algo-1-9jid1 | SM_HP_DUMMY_PARAM_1=val1
7ao431iiu5-algo-1-9jid1 | SM_HP_DUMMY_PARAM_2=val2
7ao431iiu5-algo-1-9jid1 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | Invoking script with the following command:
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 | 
7ao431iiu5-algo-1-9jid1 |  *** Hello from SageMaker script container *** 
7ao431iiu5-algo-1-9jid1 | training_dir files list:  ['train.csv']
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,409 sagemaker-containers INFO     Reporting training SUCCESS
7ao431iiu5-algo-1-9jid1 exited with code 0
Aborting on container exit...
===== Job Complete =====

Again the results are the same: SageMaker downloads the data from the S3 bucket and makes it available in the container. From the environment variables section we also learned that two directories are special: /opt/ml/model and /opt/ml/output/data. The container environment variables SM_MODEL_DIR and SM_OUTPUT_DATA_DIR point to them, respectively. Whatever artifacts we put in them are uploaded to the S3 bucket when the training job finishes: SM_MODEL_DIR is for trained models, and SM_OUTPUT_DATA_DIR is for other artifacts like logs, graphs, plots, and results. Let’s update our training script to put some dummy data in these directories. Once the job is complete, we will verify the stored artifacts in the S3 bucket.

%%writefile $script_file
import argparse, os, sys

if __name__ == "__main__":
    print(" *** Hello from SageMaker script container *** ")

    # list files in SM_CHANNEL_TRAIN
    training_dir = os.environ.get("SM_CHANNEL_TRAIN")
    dir_list = os.listdir(training_dir)
    print("training_dir files list: ", dir_list)

    # write dummy model file to SM_MODEL_DIR
    sm_model_dir = os.environ.get("SM_MODEL_DIR")
    with open(f"{sm_model_dir}/dummy-model.txt", "w") as f:
        f.write("this is a dummy model")

    # write dummy artifact file to SM_OUTPUT_DATA_DIR
    sm_output_data_dir = os.environ.get("SM_OUTPUT_DATA_DIR")
    with open(f"{sm_output_data_dir}/dummy-output-data.txt", "w") as f:
        f.write("this is a dummy output data")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)

sk_estimator.fit({"train": s3_train_uri})
Creating c30093mavu-algo-1-p87y9 ... 
Creating c30093mavu-algo-1-p87y9 ... done
Attaching to c30093mavu-algo-1-p87y9
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,051 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,055 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,065 sagemaker_sklearn_container.training INFO     Invoking user training script.
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,251 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,267 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,281 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,291 sagemaker-training-toolkit INFO     Invoking user script
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | Training Env:
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | {
c30093mavu-algo-1-p87y9 |     "additional_framework_parameters": {},
c30093mavu-algo-1-p87y9 |     "channel_input_dirs": {
c30093mavu-algo-1-p87y9 |         "train": "/opt/ml/input/data/train"
c30093mavu-algo-1-p87y9 |     },
c30093mavu-algo-1-p87y9 |     "current_host": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 |     "framework_module": "sagemaker_sklearn_container.training:main",
c30093mavu-algo-1-p87y9 |     "hosts": [
c30093mavu-algo-1-p87y9 |         "algo-1-p87y9"
c30093mavu-algo-1-p87y9 |     ],
c30093mavu-algo-1-p87y9 |     "hyperparameters": {
c30093mavu-algo-1-p87y9 |         "dummy_param_1": "val1",
c30093mavu-algo-1-p87y9 |         "dummy_param_2": "val2"
c30093mavu-algo-1-p87y9 |     },
c30093mavu-algo-1-p87y9 |     "input_config_dir": "/opt/ml/input/config",
c30093mavu-algo-1-p87y9 |     "input_data_config": {
c30093mavu-algo-1-p87y9 |         "train": {
c30093mavu-algo-1-p87y9 |             "TrainingInputMode": "File"
c30093mavu-algo-1-p87y9 |         }
c30093mavu-algo-1-p87y9 |     },
c30093mavu-algo-1-p87y9 |     "input_dir": "/opt/ml/input",
c30093mavu-algo-1-p87y9 |     "is_master": true,
c30093mavu-algo-1-p87y9 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-53-775",
c30093mavu-algo-1-p87y9 |     "log_level": 20,
c30093mavu-algo-1-p87y9 |     "master_hostname": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 |     "model_dir": "/opt/ml/model",
c30093mavu-algo-1-p87y9 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz",
c30093mavu-algo-1-p87y9 |     "module_name": "train_and_serve",
c30093mavu-algo-1-p87y9 |     "network_interface_name": "eth0",
c30093mavu-algo-1-p87y9 |     "num_cpus": 2,
c30093mavu-algo-1-p87y9 |     "num_gpus": 0,
c30093mavu-algo-1-p87y9 |     "output_data_dir": "/opt/ml/output/data",
c30093mavu-algo-1-p87y9 |     "output_dir": "/opt/ml/output",
c30093mavu-algo-1-p87y9 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
c30093mavu-algo-1-p87y9 |     "resource_config": {
c30093mavu-algo-1-p87y9 |         "current_host": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 |         "hosts": [
c30093mavu-algo-1-p87y9 |             "algo-1-p87y9"
c30093mavu-algo-1-p87y9 |         ]
c30093mavu-algo-1-p87y9 |     },
c30093mavu-algo-1-p87y9 |     "user_entry_point": "train_and_serve.py"
c30093mavu-algo-1-p87y9 | }
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | Environment variables:
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | SM_HOSTS=["algo-1-p87y9"]
c30093mavu-algo-1-p87y9 | SM_NETWORK_INTERFACE_NAME=eth0
c30093mavu-algo-1-p87y9 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
c30093mavu-algo-1-p87y9 | SM_USER_ENTRY_POINT=train_and_serve.py
c30093mavu-algo-1-p87y9 | SM_FRAMEWORK_PARAMS={}
c30093mavu-algo-1-p87y9 | SM_RESOURCE_CONFIG={"current_host":"algo-1-p87y9","hosts":["algo-1-p87y9"]}
c30093mavu-algo-1-p87y9 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
c30093mavu-algo-1-p87y9 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
c30093mavu-algo-1-p87y9 | SM_CHANNELS=["train"]
c30093mavu-algo-1-p87y9 | SM_CURRENT_HOST=algo-1-p87y9
c30093mavu-algo-1-p87y9 | SM_MODULE_NAME=train_and_serve
c30093mavu-algo-1-p87y9 | SM_LOG_LEVEL=20
c30093mavu-algo-1-p87y9 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
c30093mavu-algo-1-p87y9 | SM_INPUT_DIR=/opt/ml/input
c30093mavu-algo-1-p87y9 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
c30093mavu-algo-1-p87y9 | SM_OUTPUT_DIR=/opt/ml/output
c30093mavu-algo-1-p87y9 | SM_NUM_CPUS=2
c30093mavu-algo-1-p87y9 | SM_NUM_GPUS=0
c30093mavu-algo-1-p87y9 | SM_MODEL_DIR=/opt/ml/model
c30093mavu-algo-1-p87y9 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz
c30093mavu-algo-1-p87y9 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-p87y9","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-p87y9"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-53-775","log_level":20,"master_hostname":"algo-1-p87y9","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-p87y9","hosts":["algo-1-p87y9"]},"user_entry_point":"train_and_serve.py"}
c30093mavu-algo-1-p87y9 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
c30093mavu-algo-1-p87y9 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
c30093mavu-algo-1-p87y9 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
c30093mavu-algo-1-p87y9 | SM_HP_DUMMY_PARAM_1=val1
c30093mavu-algo-1-p87y9 | SM_HP_DUMMY_PARAM_2=val2
c30093mavu-algo-1-p87y9 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | Invoking script with the following command:
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 | 
c30093mavu-algo-1-p87y9 |  *** Hello from SageMaker script container *** 
c30093mavu-algo-1-p87y9 | training_dir files list:  ['train.csv']
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,328 sagemaker-containers INFO     Reporting training SUCCESS
c30093mavu-algo-1-p87y9 exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmpuwvrle8_/algo-1-p87y9 Please remove it manually.
===== Job Complete =====

Our training job is now complete. Let us check the S3 bucket to see if our dummy model and other artifacts are present.

First, we need the S3 URIs for these artifacts. For our dummy model (from SM_MODEL_DIR), we can use the estimator object to get the URI.

model_data = sk_estimator.model_data
model_data
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/model.tar.gz'

Let’s download model_data from S3 to a local directory for verification. For this, create a local ‘tmp’ directory to store the downloaded files.

local_tmp_path = local_path + "/tmp"
print(local_tmp_path)

# create the local '/tmp' directory
Path(local_tmp_path).mkdir(parents=True, exist_ok=True)
./datasets/2022-07-07-sagemaker-script-mode/tmp

We will use the SageMaker S3Downloader class to download the model file.

from sagemaker.s3 import S3Downloader

S3Downloader.download(
    s3_uri=model_data, local_path=local_tmp_path, sagemaker_session=session
)

The file is downloaded. Let’s uncompress it to verify the model file.

!tar -xzvf $local_tmp_path/model.tar.gz -C $local_tmp_path
dummy-model.txt
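As an alternative to shelling out to tar, we can inspect the archive from Python with the standard-library tarfile module. This is a small sketch of our own (the helper name is not part of any SageMaker API):

```python
import tarfile


def list_tar_members(archive_path):
    """Return the member names inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Applied to the downloaded archive, e.g. `list_tar_members(local_tmp_path + "/model.tar.gz")`, it should list the same ‘dummy-model.txt’ member.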

Yes, the “dummy-model.txt” file is present. This confirms that SageMaker automatically uploads the files from the model directory (SM_MODEL_DIR) to the S3 bucket. Let’s do the same for the output data directory (SM_OUTPUT_DATA_DIR). There is no direct way to get its S3 URI from the estimator object, but we can construct it ourselves. Let’s do that next.

print("estimator.output_path: ", sk_estimator.output_path)
print("estimator.latest_training_job.name: ", sk_estimator.latest_training_job.name)
estimator.output_path:  s3://sagemaker-us-east-1-801598032724/
estimator.latest_training_job.name:  sagemaker-scikit-learn-2022-07-17-15-23-53-775
def get_s3_output_uri(estimator):
    return estimator.output_path + estimator.latest_training_job.name
    
get_s3_output_uri(sk_estimator)
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775'
##
# S3 URI for output data artifacts
s3_output_uri = get_s3_output_uri(sk_estimator) + '/output.tar.gz'
s3_output_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/output.tar.gz'
## 
# S3 URI for model artifact. We have already verified it.
s3_model_uri = get_s3_output_uri(sk_estimator) + '/model.tar.gz'
s3_model_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/model.tar.gz'
##
# S3 URI for source code
s3_source_uri = get_s3_output_uri(sk_estimator) + '/source/sourcedir.tar.gz'
s3_source_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz'

Let’s download these artifacts to our local ‘/tmp’ directory for verification.

!aws s3 cp $s3_output_uri $local_tmp_path
!aws s3 cp $s3_source_uri $local_tmp_path
download: s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/output.tar.gz to datasets/2022-07-07-sagemaker-script-mode/tmp/output.tar.gz
download: s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz to datasets/2022-07-07-sagemaker-script-mode/tmp/sourcedir.tar.gz
##
# extract the output data files from 'output.tar.gz'
!tar -xzvf $local_tmp_path/output.tar.gz -C $local_tmp_path
data/
data/dummy-output-data.txt
success
##
# extract the source code files from 'sourcedir.tar.gz'
!tar -xzvf $local_tmp_path/sourcedir.tar.gz -C $local_tmp_path
train_and_serve.py

Summary so far

Let’s summarize what we have learned so far:

* We can use the SKLearn estimator in local mode to test our code in a local environment
* The SKLearn container executes our script with the command /miniconda3/bin/python train_and_serve.py
* Hyperparameters passed to the container are forwarded to our script as command-line arguments
* Data from the input channels is downloaded by the container and made available for our script to load and process
* The ‘/opt/ml/model’ and ‘/opt/ml/output/data’ directories are special: anything stored in them is automatically backed up to the S3 bucket when the job finishes. They are exposed through the container environment variables ‘SM_MODEL_DIR’ and ‘SM_OUTPUT_DATA_DIR’, respectively. SM_MODEL_DIR should be used for model artifacts, and SM_OUTPUT_DATA_DIR for any other supporting artifacts
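To make the hyperparameter hand-off concrete: a dict passed to the estimator ends up on the script’s command line, mirroring the SM_USER_ARGS value in the logs above. The helper below is our own illustrative sketch, not a SageMaker API:

```python
def to_user_args(hyperparameters):
    """Sketch: map a hyperparameters dict to the command-line
    arguments the container passes to the entry point."""
    args = []
    for name, value in hyperparameters.items():
        args += [f"--{name}", str(value)]
    return args


print(to_user_args({"dummy_param_1": "val1", "dummy_param_2": "val2"}))
# ['--dummy_param_1', 'val1', '--dummy_param_2', 'val2']
```

This matches both SM_USER_ARGS and the invocation command we saw in the container logs.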

Let’s use this knowledge to update our script to train a RandomForestClassifier on the Iris flower dataset.

##
# cleanup /tmp directory before moving to next section
!rm -r $local_tmp_path/*

Prepare training script for RandomForestClassifier

Let’s update our training script to train a scikit-learn random forest classifier on the Iris dataset. The script reads the training and testing data from the input data channel directories and trains a classifier on them. It then saves the model to the model directory and the validation results (‘y_pred.csv’) to the output data directory. Notice that we also parse container environment variables as command-line arguments. For hyperparameters (‘--estimators’) this is natural, because we know they are passed to the script as command-line parameters. For the other environment variables (e.g. ‘SM_MODEL_DIR’), we first check whether they are given as command-line arguments; if they are, we parse them to get the values, and otherwise we read them from the environment. This lets us test the script locally from the command line without setting the environment variables.

%%writefile $script_file

import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
import joblib

if __name__ == "__main__":

    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm_model_dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))

    args, _ = parser.parse_known_args()

    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir

    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]

    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    classifier = RandomForestClassifier(n_estimators=estimators)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Save the model
    joblib.dump(classifier, sm_model_dir + "/model.joblib")

    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py

Now give proper execution rights to the script.

!chmod +x $script_file

Let’s test this script locally before passing it to the SKLearn estimator. We will invoke the script from the command line and pass the required parameters, similar to how the estimator container executes it. For this test we need to pass four directory paths:

* sm-model-dir points to a directory where the script will store the trained model. For testing we can point it to the local ‘tmp’ directory
* sm-channel-train points to the directory containing the training data. We already have it as ‘local_train_path’
* sm-channel-test points to the directory containing the test data. We also have it as ‘local_test_path’
* sm-output-data-dir points to a directory where the script will store other artifacts. For testing we can also point it to the local ‘tmp’ directory

Once the script runs successfully, we will find the trained model file ‘model.joblib’ and ‘y_pred.csv’ in the ‘tmp’ directory.

#collapse-output
!python3 $script_file \
    --sm-model-dir $local_tmp_path \
    --sm-channel-train $local_train_path \
    --sm-channel-test $local_test_path \
    --sm-output-data-dir $local_tmp_path \
    --estimators 10
command line arguments:  Namespace(estimators=10, sm_channel_test='./datasets/2022-07-07-sagemaker-script-mode/test', sm_channel_train='./datasets/2022-07-07-sagemaker-script-mode/train', sm_model_dir='./datasets/2022-07-07-sagemaker-script-mode/tmp', sm_output_data_dir='./datasets/2022-07-07-sagemaker-script-mode/tmp')
training_dir: ./datasets/2022-07-07-sagemaker-script-mode/train
training_dir files list: ['train.csv']
testing_dir: ./datasets/2022-07-07-sagemaker-script-mode/test
testing_dir files list: ['test.csv']
sm_model_dir: ./datasets/2022-07-07-sagemaker-script-mode/tmp
output_data_dir: ./datasets/2022-07-07-sagemaker-script-mode/tmp
X_train.shape: (120, 4)
y_train.shape: (120,)
X_test.shape: (30, 4)
y_test.shape: (30,)

Let’s check the local ‘tmp’ directory for artifacts.

!ls $local_tmp_path
model.joblib  y_pred.csv

Now that we have tested our script and it works as expected, let’s pass it to the SKLearn container.

#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file,
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"estimators":10},
)

sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating aer2alr1w1-algo-1-10beq ... 
Creating aer2alr1w1-algo-1-10beq ... done
Attaching to aer2alr1w1-algo-1-10beq
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,011 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,015 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,024 sagemaker_sklearn_container.training INFO     Invoking user training script.
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,226 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,239 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,251 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,260 sagemaker-training-toolkit INFO     Invoking user script
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | Training Env:
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | {
aer2alr1w1-algo-1-10beq |     "additional_framework_parameters": {},
aer2alr1w1-algo-1-10beq |     "channel_input_dirs": {
aer2alr1w1-algo-1-10beq |         "train": "/opt/ml/input/data/train",
aer2alr1w1-algo-1-10beq |         "test": "/opt/ml/input/data/test"
aer2alr1w1-algo-1-10beq |     },
aer2alr1w1-algo-1-10beq |     "current_host": "algo-1-10beq",
aer2alr1w1-algo-1-10beq |     "framework_module": "sagemaker_sklearn_container.training:main",
aer2alr1w1-algo-1-10beq |     "hosts": [
aer2alr1w1-algo-1-10beq |         "algo-1-10beq"
aer2alr1w1-algo-1-10beq |     ],
aer2alr1w1-algo-1-10beq |     "hyperparameters": {
aer2alr1w1-algo-1-10beq |         "estimators": 10
aer2alr1w1-algo-1-10beq |     },
aer2alr1w1-algo-1-10beq |     "input_config_dir": "/opt/ml/input/config",
aer2alr1w1-algo-1-10beq |     "input_data_config": {
aer2alr1w1-algo-1-10beq |         "train": {
aer2alr1w1-algo-1-10beq |             "TrainingInputMode": "File"
aer2alr1w1-algo-1-10beq |         },
aer2alr1w1-algo-1-10beq |         "test": {
aer2alr1w1-algo-1-10beq |             "TrainingInputMode": "File"
aer2alr1w1-algo-1-10beq |         }
aer2alr1w1-algo-1-10beq |     },
aer2alr1w1-algo-1-10beq |     "input_dir": "/opt/ml/input",
aer2alr1w1-algo-1-10beq |     "is_master": true,
aer2alr1w1-algo-1-10beq |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-03-447",
aer2alr1w1-algo-1-10beq |     "log_level": 20,
aer2alr1w1-algo-1-10beq |     "master_hostname": "algo-1-10beq",
aer2alr1w1-algo-1-10beq |     "model_dir": "/opt/ml/model",
aer2alr1w1-algo-1-10beq |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz",
aer2alr1w1-algo-1-10beq |     "module_name": "train_and_serve",
aer2alr1w1-algo-1-10beq |     "network_interface_name": "eth0",
aer2alr1w1-algo-1-10beq |     "num_cpus": 2,
aer2alr1w1-algo-1-10beq |     "num_gpus": 0,
aer2alr1w1-algo-1-10beq |     "output_data_dir": "/opt/ml/output/data",
aer2alr1w1-algo-1-10beq |     "output_dir": "/opt/ml/output",
aer2alr1w1-algo-1-10beq |     "output_intermediate_dir": "/opt/ml/output/intermediate",
aer2alr1w1-algo-1-10beq |     "resource_config": {
aer2alr1w1-algo-1-10beq |         "current_host": "algo-1-10beq",
aer2alr1w1-algo-1-10beq |         "hosts": [
aer2alr1w1-algo-1-10beq |             "algo-1-10beq"
aer2alr1w1-algo-1-10beq |         ]
aer2alr1w1-algo-1-10beq |     },
aer2alr1w1-algo-1-10beq |     "user_entry_point": "train_and_serve.py"
aer2alr1w1-algo-1-10beq | }
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | Environment variables:
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | SM_HOSTS=["algo-1-10beq"]
aer2alr1w1-algo-1-10beq | SM_NETWORK_INTERFACE_NAME=eth0
aer2alr1w1-algo-1-10beq | SM_HPS={"estimators":10}
aer2alr1w1-algo-1-10beq | SM_USER_ENTRY_POINT=train_and_serve.py
aer2alr1w1-algo-1-10beq | SM_FRAMEWORK_PARAMS={}
aer2alr1w1-algo-1-10beq | SM_RESOURCE_CONFIG={"current_host":"algo-1-10beq","hosts":["algo-1-10beq"]}
aer2alr1w1-algo-1-10beq | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
aer2alr1w1-algo-1-10beq | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
aer2alr1w1-algo-1-10beq | SM_CHANNELS=["test","train"]
aer2alr1w1-algo-1-10beq | SM_CURRENT_HOST=algo-1-10beq
aer2alr1w1-algo-1-10beq | SM_MODULE_NAME=train_and_serve
aer2alr1w1-algo-1-10beq | SM_LOG_LEVEL=20
aer2alr1w1-algo-1-10beq | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
aer2alr1w1-algo-1-10beq | SM_INPUT_DIR=/opt/ml/input
aer2alr1w1-algo-1-10beq | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
aer2alr1w1-algo-1-10beq | SM_OUTPUT_DIR=/opt/ml/output
aer2alr1w1-algo-1-10beq | SM_NUM_CPUS=2
aer2alr1w1-algo-1-10beq | SM_NUM_GPUS=0
aer2alr1w1-algo-1-10beq | SM_MODEL_DIR=/opt/ml/model
aer2alr1w1-algo-1-10beq | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz
aer2alr1w1-algo-1-10beq | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-10beq","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-10beq"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-03-447","log_level":20,"master_hostname":"algo-1-10beq","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-10beq","hosts":["algo-1-10beq"]},"user_entry_point":"train_and_serve.py"}
aer2alr1w1-algo-1-10beq | SM_USER_ARGS=["--estimators","10"]
aer2alr1w1-algo-1-10beq | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
aer2alr1w1-algo-1-10beq | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
aer2alr1w1-algo-1-10beq | SM_CHANNEL_TEST=/opt/ml/input/data/test
aer2alr1w1-algo-1-10beq | SM_HP_ESTIMATORS=10
aer2alr1w1-algo-1-10beq | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | Invoking script with the following command:
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | /miniconda3/bin/python train_and_serve.py --estimators 10
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | 
aer2alr1w1-algo-1-10beq | command line arguments:  Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
aer2alr1w1-algo-1-10beq | training_dir: /opt/ml/input/data/train
aer2alr1w1-algo-1-10beq | training_dir files list: ['train.csv']
aer2alr1w1-algo-1-10beq | testing_dir: /opt/ml/input/data/test
aer2alr1w1-algo-1-10beq | testing_dir files list: ['test.csv']
aer2alr1w1-algo-1-10beq | sm_model_dir: /opt/ml/model
aer2alr1w1-algo-1-10beq | output_data_dir: /opt/ml/output/data
aer2alr1w1-algo-1-10beq | X_train.shape: (120, 4)
aer2alr1w1-algo-1-10beq | y_train.shape: (120,)
aer2alr1w1-algo-1-10beq | X_test.shape: (30, 4)
aer2alr1w1-algo-1-10beq | y_test.shape: (30,)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:07,286 sagemaker-containers INFO     Reporting training SUCCESS
aer2alr1w1-algo-1-10beq exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmp0yb8k7nj/algo-1-10beq Please remove it manually.
===== Job Complete =====
##
# cleanup /tmp directory before moving to next section
!rm -r $local_tmp_path/*

Passing custom libraries and dependencies to the SKLearn container

We have successfully trained our classifier, but suppose we have an additional task. A colleague has created a library that takes a confusion-matrix array and plots it with the seaborn visualization library. We have been asked to use this custom library in the training script and save the confusion-matrix plot to the output data directory.

Let’s prepare the code for this custom library: it takes an array and saves a seaborn confusion-matrix plot.

# create a path to store the custom library code
custom_library_path = local_path + "/my_custom_library"
custom_library_file = custom_library_path + "/seaborn_confusion_matrix.py"

print(f"custom_library_path: {custom_library_path}")
print(f"custom_library_file: {custom_library_file}")

# make sure the path exists
Path(custom_library_path).mkdir(parents=True, exist_ok=True)
custom_library_path: ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library
custom_library_file: ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/seaborn_confusion_matrix.py

Now the code to plot the confusion matrix.

%%writefile $custom_library_file

import seaborn as sns
import numpy as np
import argparse, os


def save_confusion_matrix(cf_matrix, path="./"):
    sns_plot = sns.heatmap(cf_matrix, annot=True)
    sns_plot.figure.savefig(os.path.join(path, "output_cm.png"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", type=str, default="./")
    args, _ = parser.parse_known_args()
    path = args.path

    dummy_cm = np.array([[23, 5], [3, 30]])
    save_confusion_matrix(dummy_cm, path)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/seaborn_confusion_matrix.py

Convert the directory containing the seaborn code into a Python package by adding an __init__.py file.

%%writefile $custom_library_path/__init__.py

from .seaborn_confusion_matrix import *
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/__init__.py

Our custom library has a dependency on the seaborn Python package. So let’s create a ‘requirements.txt’ file and list all our dependencies in it. Later it will be passed to the SKLearn container, which installs them during initialization.

%%writefile $script_path/requirements.txt

seaborn==0.11.2
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt

Let’s first test this library in our local environment. It should save a dummy confusion-matrix plot to the local /tmp directory.

#collapse-output
# install the dependencies first
!pip install -r $script_path/requirements.txt
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: seaborn==0.11.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from -r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (0.11.2)
Requirement already satisfied: numpy>=1.15 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.20.3)
Requirement already satisfied: matplotlib>=2.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (3.5.0)
Requirement already satisfied: scipy>=1.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.5.3)
Requirement already satisfied: pandas>=0.23 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.3.4)
Requirement already satisfied: fonttools>=4.22.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (4.28.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.3.2)
Requirement already satisfied: pillow>=6.2.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (9.0.1)
Requirement already satisfied: packaging>=20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (21.3)
Requirement already satisfied: pyparsing>=2.2.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (3.0.6)
Requirement already satisfied: cycler>=0.10 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (0.11.0)
Requirement already satisfied: pytz>=2017.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (2021.3)
Requirement already satisfied: six>=1.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.16.0)
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.
##
# test the custom library
!python3 $custom_library_file --path $local_tmp_path
Matplotlib is building the font cache; this may take a moment.
##
# verify the custom library output from the /tmp directory
!ls $local_tmp_path
output_cm.png
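As a quick aside, the array our plotting helper consumes is the standard output of sklearn’s confusion_matrix: an (n_classes, n_classes) matrix of counts, with rows indexed by true label and columns by predicted label. A minimal illustration:

```python
# sklearn's confusion_matrix returns an (n_classes, n_classes) array of
# counts: rows are true labels, columns are predicted labels.
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
```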

So our custom library code works. Let’s update our script to use it.

%%writefile $script_file

import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib

from my_custom_library import save_confusion_matrix

if __name__ == "__main__":

    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm_model_dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))

    args, _ = parser.parse_known_args()

    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir

    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")  
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]

    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    classifier = RandomForestClassifier(n_estimators=estimators)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Save the model
    joblib.dump(classifier, sm_model_dir + "/model.joblib")

    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")

    # save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)

    # print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")

    # print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py

Finally, all the ingredients are ready. Let’s run our script in the SKLearn container.

In the next cell, you can see that we have passed two extra parameters to the estimator:

* source_dir: the path to the directory containing the entry_point script train_and_serve.py and requirements.txt. If a requirements.txt file is present in this directory, the estimator picks it up and installs the listed packages in the container during initialization.
* dependencies: a list of paths to custom libraries that we want available in the container.

Our local directory structure is shown below.

local_path/
├── my_custom_library/
│   ├── seaborn_confusion_matrix.py
│   └── __init__.py
└── src/
    ├── train_and_serve.py
    └── requirements.txt
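To see why `dependencies=[custom_library_path]` lets the training script do a plain `from my_custom_library import ...`, note that SageMaker copies each listed folder next to the entry point inside the container, i.e. onto the module search path. A local sketch of the same mechanism (the package name my_toy_library is made up for illustration):

```python
# Build a tiny package in a temp dir, put its parent on sys.path, and
# import it -- mirroring how a folder passed via `dependencies` becomes
# importable next to the entry point in /opt/ml/code.
# `my_toy_library` is a hypothetical stand-in, not from the post.
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "my_toy_library")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("def greet():\n    return 'hello from my_toy_library'\n")

# make the parent directory importable, as the container does for its code dir
sys.path.insert(0, tmp)
import my_toy_library

print(my_toy_library.greet())  # hello from my_toy_library
```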
#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file_name,
    source_dir=script_path,
    dependencies=[custom_library_path],
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"estimators":10},
)

sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating xm0kutxos7-algo-1-8yrs9 ... 
Creating xm0kutxos7-algo-1-8yrs9 ... done
Attaching to xm0kutxos7-algo-1-8yrs9
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,458 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,462 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,472 sagemaker_sklearn_container.training INFO     Invoking user training script.
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,661 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
xm0kutxos7-algo-1-8yrs9 | /miniconda3/bin/python -m pip install -r requirements.txt
xm0kutxos7-algo-1-8yrs9 | Collecting seaborn==0.11.2
xm0kutxos7-algo-1-8yrs9 |   Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 5.1 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
xm0kutxos7-algo-1-8yrs9 | Collecting matplotlib>=2.2
xm0kutxos7-algo-1-8yrs9 |   Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 31.0 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
xm0kutxos7-algo-1-8yrs9 | Collecting kiwisolver>=1.0.1
xm0kutxos7-algo-1-8yrs9 |   Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 18.6 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Collecting cycler>=0.10
xm0kutxos7-algo-1-8yrs9 |   Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
xm0kutxos7-algo-1-8yrs9 | Collecting fonttools>=4.22.0
xm0kutxos7-algo-1-8yrs9 |   Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 102.1 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Collecting packaging>=20.0
xm0kutxos7-algo-1-8yrs9 |   Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 11.9 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
xm0kutxos7-algo-1-8yrs9 | Collecting pyparsing>=2.2.1
xm0kutxos7-algo-1-8yrs9 |   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 24.8 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
xm0kutxos7-algo-1-8yrs9 | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
xm0kutxos7-algo-1-8yrs9 | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
xm0kutxos7-algo-1-8yrs9 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,839 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,859 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,879 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,894 sagemaker-training-toolkit INFO     Invoking user script
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | Training Env:
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | {
xm0kutxos7-algo-1-8yrs9 |     "additional_framework_parameters": {},
xm0kutxos7-algo-1-8yrs9 |     "channel_input_dirs": {
xm0kutxos7-algo-1-8yrs9 |         "train": "/opt/ml/input/data/train",
xm0kutxos7-algo-1-8yrs9 |         "test": "/opt/ml/input/data/test"
xm0kutxos7-algo-1-8yrs9 |     },
xm0kutxos7-algo-1-8yrs9 |     "current_host": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 |     "framework_module": "sagemaker_sklearn_container.training:main",
xm0kutxos7-algo-1-8yrs9 |     "hosts": [
xm0kutxos7-algo-1-8yrs9 |         "algo-1-8yrs9"
xm0kutxos7-algo-1-8yrs9 |     ],
xm0kutxos7-algo-1-8yrs9 |     "hyperparameters": {
xm0kutxos7-algo-1-8yrs9 |         "estimators": 10
xm0kutxos7-algo-1-8yrs9 |     },
xm0kutxos7-algo-1-8yrs9 |     "input_config_dir": "/opt/ml/input/config",
xm0kutxos7-algo-1-8yrs9 |     "input_data_config": {
xm0kutxos7-algo-1-8yrs9 |         "train": {
xm0kutxos7-algo-1-8yrs9 |             "TrainingInputMode": "File"
xm0kutxos7-algo-1-8yrs9 |         },
xm0kutxos7-algo-1-8yrs9 |         "test": {
xm0kutxos7-algo-1-8yrs9 |             "TrainingInputMode": "File"
xm0kutxos7-algo-1-8yrs9 |         }
xm0kutxos7-algo-1-8yrs9 |     },
xm0kutxos7-algo-1-8yrs9 |     "input_dir": "/opt/ml/input",
xm0kutxos7-algo-1-8yrs9 |     "is_master": true,
xm0kutxos7-algo-1-8yrs9 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-22-270",
xm0kutxos7-algo-1-8yrs9 |     "log_level": 20,
xm0kutxos7-algo-1-8yrs9 |     "master_hostname": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 |     "model_dir": "/opt/ml/model",
xm0kutxos7-algo-1-8yrs9 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz",
xm0kutxos7-algo-1-8yrs9 |     "module_name": "train_and_serve",
xm0kutxos7-algo-1-8yrs9 |     "network_interface_name": "eth0",
xm0kutxos7-algo-1-8yrs9 |     "num_cpus": 2,
xm0kutxos7-algo-1-8yrs9 |     "num_gpus": 0,
xm0kutxos7-algo-1-8yrs9 |     "output_data_dir": "/opt/ml/output/data",
xm0kutxos7-algo-1-8yrs9 |     "output_dir": "/opt/ml/output",
xm0kutxos7-algo-1-8yrs9 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
xm0kutxos7-algo-1-8yrs9 |     "resource_config": {
xm0kutxos7-algo-1-8yrs9 |         "current_host": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 |         "hosts": [
xm0kutxos7-algo-1-8yrs9 |             "algo-1-8yrs9"
xm0kutxos7-algo-1-8yrs9 |         ]
xm0kutxos7-algo-1-8yrs9 |     },
xm0kutxos7-algo-1-8yrs9 |     "user_entry_point": "train_and_serve.py"
xm0kutxos7-algo-1-8yrs9 | }
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | Environment variables:
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | SM_HOSTS=["algo-1-8yrs9"]
xm0kutxos7-algo-1-8yrs9 | SM_NETWORK_INTERFACE_NAME=eth0
xm0kutxos7-algo-1-8yrs9 | SM_HPS={"estimators":10}
xm0kutxos7-algo-1-8yrs9 | SM_USER_ENTRY_POINT=train_and_serve.py
xm0kutxos7-algo-1-8yrs9 | SM_FRAMEWORK_PARAMS={}
xm0kutxos7-algo-1-8yrs9 | SM_RESOURCE_CONFIG={"current_host":"algo-1-8yrs9","hosts":["algo-1-8yrs9"]}
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | SM_CHANNELS=["test","train"]
xm0kutxos7-algo-1-8yrs9 | SM_CURRENT_HOST=algo-1-8yrs9
xm0kutxos7-algo-1-8yrs9 | SM_MODULE_NAME=train_and_serve
xm0kutxos7-algo-1-8yrs9 | SM_LOG_LEVEL=20
xm0kutxos7-algo-1-8yrs9 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_DIR=/opt/ml/input
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_DIR=/opt/ml/output
xm0kutxos7-algo-1-8yrs9 | SM_NUM_CPUS=2
xm0kutxos7-algo-1-8yrs9 | SM_NUM_GPUS=0
xm0kutxos7-algo-1-8yrs9 | SM_MODEL_DIR=/opt/ml/model
xm0kutxos7-algo-1-8yrs9 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz
xm0kutxos7-algo-1-8yrs9 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-8yrs9","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-8yrs9"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-22-270","log_level":20,"master_hostname":"algo-1-8yrs9","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-8yrs9","hosts":["algo-1-8yrs9"]},"user_entry_point":"train_and_serve.py"}
xm0kutxos7-algo-1-8yrs9 | SM_USER_ARGS=["--estimators","10"]
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
xm0kutxos7-algo-1-8yrs9 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
xm0kutxos7-algo-1-8yrs9 | SM_CHANNEL_TEST=/opt/ml/input/data/test
xm0kutxos7-algo-1-8yrs9 | SM_HP_ESTIMATORS=10
xm0kutxos7-algo-1-8yrs9 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | Invoking script with the following command:
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | /miniconda3/bin/python train_and_serve.py --estimators 10
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | 
xm0kutxos7-algo-1-8yrs9 | command line arguments:  Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
xm0kutxos7-algo-1-8yrs9 | training_dir: /opt/ml/input/data/train
xm0kutxos7-algo-1-8yrs9 | training_dir files list: ['train.csv']
xm0kutxos7-algo-1-8yrs9 | testing_dir: /opt/ml/input/data/test
xm0kutxos7-algo-1-8yrs9 | testing_dir files list: ['test.csv']
xm0kutxos7-algo-1-8yrs9 | sm_model_dir: /opt/ml/model
xm0kutxos7-algo-1-8yrs9 | output_data_dir: /opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | X_train.shape: (120, 4)
xm0kutxos7-algo-1-8yrs9 | y_train.shape: (120,)
xm0kutxos7-algo-1-8yrs9 | X_test.shape: (30, 4)
xm0kutxos7-algo-1-8yrs9 | y_test.shape: (30,)
xm0kutxos7-algo-1-8yrs9 | sm_model_dir: /opt/ml/model
xm0kutxos7-algo-1-8yrs9 | sm_model_dir files list: ['model.joblib']
xm0kutxos7-algo-1-8yrs9 | output_data_dir: /opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:33,003 sagemaker-containers INFO     Reporting training SUCCESS
Failed to delete: /tmp/tmpee3z9n_9/algo-1-8yrs9 Please remove it manually.
xm0kutxos7-algo-1-8yrs9 exited with code 0
Aborting on container exit...
===== Job Complete =====

sklearn-output-train-complete

The SKLearn container output shows that our classifier trained successfully and that the model and output artifacts were placed in their respective folders. We know from the first section of this post that these artifacts are automatically uploaded to the S3 bucket. This concludes the model-training part of our implementation. Let’s now proceed to the model-serving part of our solution.
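As a refresher on what happens to these artifacts: SageMaker archives everything the script writes to /opt/ml/model into a model.tar.gz in S3, and the serving container extracts it again before loading the model. A local sketch of that pack/unpack round trip, with a dummy file standing in for model.joblib:

```python
# Pack and unpack a model directory the way SageMaker handles
# /opt/ml/model -> model.tar.gz -> serving container.
# The file content is a dummy placeholder, not a real joblib artifact.
import os
import tarfile
import tempfile

tmp = tempfile.mkdtemp()
model_dir = os.path.join(tmp, "model")
os.makedirs(model_dir)
with open(os.path.join(model_dir, "model.joblib"), "wb") as f:
    f.write(b"dummy model bytes")

# pack, as SageMaker does after training
archive = os.path.join(tmp, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(model_dir, "model.joblib"), arcname="model.joblib")

# unpack, as the serving container does before the model is loaded
extract_dir = os.path.join(tmp, "extracted")
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(extract_dir)

print(os.listdir(extract_dir))  # ['model.joblib']
```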

Serve the SKLearn model in local mode

At this point, we have our trained model ready. Can we deploy it already?

The answer is no. If we try to deploy this model with the following command

sk_predictor = sk_estimator.deploy(
    initial_instance_count=1,
    instance_type='local'
)

It will raise an exception telling us that the estimator does not know how to load the model. We tell it how by implementing a model_fn function in our script.

[2022-07-09 06:15:45 +0000] [31] [ERROR] Error handling request /ping
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_sklearn_container/serving.py", line 43, in default_model_fn
    return transformer.default_model_fn(model_dir)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_transformer.py", line 35, in default_model_fn
    raise NotImplementedError(
NotImplementedError: 
Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk

The model_fn has the following signature:

def model_fn(model_dir)

Besides loading the model, we also need to tell the model server how to get predictions from the loaded model. For this, we implement a second function, predict_fn, with the following signature.

def predict_fn(input_data, model)

After we have called fit on our SKLearn estimator, we can deploy it by calling deploy to create an inference endpoint. Calling deploy on the estimator creates two objects:

* SageMaker scikit-learn Endpoint: this endpoint encapsulates a model server running under it. The model server loads the model saved during training and performs inference with it. It needs two helper functions for this: model_fn to load the model and predict_fn to make predictions.
* Predictor object: returned by the deploy call; it can be used to run inference against the endpoint hosting our SKLearn model.
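Before running another training job, we can sanity-check these two hooks locally: train a throwaway classifier, dump it the way our training script does, then call model_fn and predict_fn in the same order the model server would (model_fn once at startup, predict_fn per request). The data here is random and only serves to exercise the plumbing:

```python
# Exercise model_fn/predict_fn locally, in the order the SageMaker
# model server calls them.
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def model_fn(model_dir):
    # deserialize the fitted model, as the server does at startup
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def predict_fn(input_data, model):
    # run inference on already-deserialized input
    return model.predict(input_data)


rng = np.random.default_rng(0)
X = rng.random((20, 4))
y = rng.integers(0, 3, size=20)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

model_dir = tempfile.mkdtemp()
joblib.dump(clf, os.path.join(model_dir, "model.joblib"))

model = model_fn(model_dir)        # startup
preds = predict_fn(X[:5], model)   # per-request
print(preds.shape)  # (5,)
```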

Let’s update our script and add these two functions.

%%writefile $script_file

import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib

from my_custom_library import save_confusion_matrix

if __name__ == "__main__":

    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm_model_dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))

    args, _ = parser.parse_known_args()

    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir

    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")  
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]

    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    classifier = RandomForestClassifier(n_estimators=estimators)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Save the model
    joblib.dump(classifier, sm_model_dir + "/model.joblib")

    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")

    # save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)

    # print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")

    # print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")


# Model serving
def model_fn(model_dir):
    """Deserialize the fitted model."""
    print(f"model_fn model_dir: {model_dir}")
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model


def predict_fn(input_data, model):
    """Run inference.

    input_data: deserialized request data (the container's default input_fn output)
    model: sklearn model returned by model_fn above
    """
    return model.predict(input_data)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
    entry_point=script_file_name,
    source_dir=script_path,
    dependencies=[custom_library_path],
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"estimators":10},
)

sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating wxtcttdsw0-algo-1-jym48 ... 
Creating wxtcttdsw0-algo-1-jym48 ... done
Attaching to wxtcttdsw0-algo-1-jym48
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,721 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,726 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,735 sagemaker_sklearn_container.training INFO     Invoking user training script.
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,923 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
wxtcttdsw0-algo-1-jym48 | /miniconda3/bin/python -m pip install -r requirements.txt
wxtcttdsw0-algo-1-jym48 | Collecting seaborn==0.11.2
wxtcttdsw0-algo-1-jym48 |   Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 3.3 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
wxtcttdsw0-algo-1-jym48 | Collecting matplotlib>=2.2
wxtcttdsw0-algo-1-jym48 |   Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 101.6 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
wxtcttdsw0-algo-1-jym48 | Collecting packaging>=20.0
wxtcttdsw0-algo-1-jym48 |   Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 12.9 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
wxtcttdsw0-algo-1-jym48 | Collecting cycler>=0.10
wxtcttdsw0-algo-1-jym48 |   Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
wxtcttdsw0-algo-1-jym48 | Collecting kiwisolver>=1.0.1
wxtcttdsw0-algo-1-jym48 |   Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 86.1 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
wxtcttdsw0-algo-1-jym48 | Collecting fonttools>=4.22.0
wxtcttdsw0-algo-1-jym48 |   Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 8.7 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Collecting pyparsing>=2.2.1
wxtcttdsw0-algo-1-jym48 |   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.8 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
wxtcttdsw0-algo-1-jym48 | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
wxtcttdsw0-algo-1-jym48 | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
wxtcttdsw0-algo-1-jym48 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,696 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,711 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,723 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,732 sagemaker-training-toolkit INFO     Invoking user script
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | Training Env:
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | {
wxtcttdsw0-algo-1-jym48 |     "additional_framework_parameters": {},
wxtcttdsw0-algo-1-jym48 |     "channel_input_dirs": {
wxtcttdsw0-algo-1-jym48 |         "train": "/opt/ml/input/data/train",
wxtcttdsw0-algo-1-jym48 |         "test": "/opt/ml/input/data/test"
wxtcttdsw0-algo-1-jym48 |     },
wxtcttdsw0-algo-1-jym48 |     "current_host": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 |     "framework_module": "sagemaker_sklearn_container.training:main",
wxtcttdsw0-algo-1-jym48 |     "hosts": [
wxtcttdsw0-algo-1-jym48 |         "algo-1-jym48"
wxtcttdsw0-algo-1-jym48 |     ],
wxtcttdsw0-algo-1-jym48 |     "hyperparameters": {
wxtcttdsw0-algo-1-jym48 |         "estimators": 10
wxtcttdsw0-algo-1-jym48 |     },
wxtcttdsw0-algo-1-jym48 |     "input_config_dir": "/opt/ml/input/config",
wxtcttdsw0-algo-1-jym48 |     "input_data_config": {
wxtcttdsw0-algo-1-jym48 |         "train": {
wxtcttdsw0-algo-1-jym48 |             "TrainingInputMode": "File"
wxtcttdsw0-algo-1-jym48 |         },
wxtcttdsw0-algo-1-jym48 |         "test": {
wxtcttdsw0-algo-1-jym48 |             "TrainingInputMode": "File"
wxtcttdsw0-algo-1-jym48 |         }
wxtcttdsw0-algo-1-jym48 |     },
wxtcttdsw0-algo-1-jym48 |     "input_dir": "/opt/ml/input",
wxtcttdsw0-algo-1-jym48 |     "is_master": true,
wxtcttdsw0-algo-1-jym48 |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-34-114",
wxtcttdsw0-algo-1-jym48 |     "log_level": 20,
wxtcttdsw0-algo-1-jym48 |     "master_hostname": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 |     "model_dir": "/opt/ml/model",
wxtcttdsw0-algo-1-jym48 |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz",
wxtcttdsw0-algo-1-jym48 |     "module_name": "train_and_serve",
wxtcttdsw0-algo-1-jym48 |     "network_interface_name": "eth0",
wxtcttdsw0-algo-1-jym48 |     "num_cpus": 2,
wxtcttdsw0-algo-1-jym48 |     "num_gpus": 0,
wxtcttdsw0-algo-1-jym48 |     "output_data_dir": "/opt/ml/output/data",
wxtcttdsw0-algo-1-jym48 |     "output_dir": "/opt/ml/output",
wxtcttdsw0-algo-1-jym48 |     "output_intermediate_dir": "/opt/ml/output/intermediate",
wxtcttdsw0-algo-1-jym48 |     "resource_config": {
wxtcttdsw0-algo-1-jym48 |         "current_host": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 |         "hosts": [
wxtcttdsw0-algo-1-jym48 |             "algo-1-jym48"
wxtcttdsw0-algo-1-jym48 |         ]
wxtcttdsw0-algo-1-jym48 |     },
wxtcttdsw0-algo-1-jym48 |     "user_entry_point": "train_and_serve.py"
wxtcttdsw0-algo-1-jym48 | }
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | Environment variables:
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | SM_HOSTS=["algo-1-jym48"]
wxtcttdsw0-algo-1-jym48 | SM_NETWORK_INTERFACE_NAME=eth0
wxtcttdsw0-algo-1-jym48 | SM_HPS={"estimators":10}
wxtcttdsw0-algo-1-jym48 | SM_USER_ENTRY_POINT=train_and_serve.py
wxtcttdsw0-algo-1-jym48 | SM_FRAMEWORK_PARAMS={}
wxtcttdsw0-algo-1-jym48 | SM_RESOURCE_CONFIG={"current_host":"algo-1-jym48","hosts":["algo-1-jym48"]}
wxtcttdsw0-algo-1-jym48 | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | SM_CHANNELS=["test","train"]
wxtcttdsw0-algo-1-jym48 | SM_CURRENT_HOST=algo-1-jym48
wxtcttdsw0-algo-1-jym48 | SM_MODULE_NAME=train_and_serve
wxtcttdsw0-algo-1-jym48 | SM_LOG_LEVEL=20
wxtcttdsw0-algo-1-jym48 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
wxtcttdsw0-algo-1-jym48 | SM_INPUT_DIR=/opt/ml/input
wxtcttdsw0-algo-1-jym48 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_DIR=/opt/ml/output
wxtcttdsw0-algo-1-jym48 | SM_NUM_CPUS=2
wxtcttdsw0-algo-1-jym48 | SM_NUM_GPUS=0
wxtcttdsw0-algo-1-jym48 | SM_MODEL_DIR=/opt/ml/model
wxtcttdsw0-algo-1-jym48 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz
wxtcttdsw0-algo-1-jym48 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-jym48","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-jym48"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-34-114","log_level":20,"master_hostname":"algo-1-jym48","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-jym48","hosts":["algo-1-jym48"]},"user_entry_point":"train_and_serve.py"}
wxtcttdsw0-algo-1-jym48 | SM_USER_ARGS=["--estimators","10"]
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
wxtcttdsw0-algo-1-jym48 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
wxtcttdsw0-algo-1-jym48 | SM_CHANNEL_TEST=/opt/ml/input/data/test
wxtcttdsw0-algo-1-jym48 | SM_HP_ESTIMATORS=10
wxtcttdsw0-algo-1-jym48 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | Invoking script with the following command:
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | /miniconda3/bin/python train_and_serve.py --estimators 10
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | 
wxtcttdsw0-algo-1-jym48 | command line arguments:  Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
wxtcttdsw0-algo-1-jym48 | training_dir: /opt/ml/input/data/train
wxtcttdsw0-algo-1-jym48 | training_dir files list: ['train.csv']
wxtcttdsw0-algo-1-jym48 | testing_dir: /opt/ml/input/data/test
wxtcttdsw0-algo-1-jym48 | testing_dir files list: ['test.csv']
wxtcttdsw0-algo-1-jym48 | sm_model_dir: /opt/ml/model
wxtcttdsw0-algo-1-jym48 | output_data_dir: /opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | X_train.shape: (120, 4)
wxtcttdsw0-algo-1-jym48 | y_train.shape: (120,)
wxtcttdsw0-algo-1-jym48 | X_train.shape: (30, 4)
wxtcttdsw0-algo-1-jym48 | y_train.shape: (30,)
wxtcttdsw0-algo-1-jym48 | sm_model_dir: /opt/ml/model
wxtcttdsw0-algo-1-jym48 | sm_model_dir files list: ['model.joblib']
wxtcttdsw0-algo-1-jym48 | output_data_dir: /opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:43,775 sagemaker-containers INFO     Reporting training SUCCESS
wxtcttdsw0-algo-1-jym48 exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmpa3uld7ha/algo-1-jym48 Please remove it manually.
===== Job Complete =====

Our model is trained. Let's also deploy it in local mode. For model loading (model_fn), SageMaker downloads the model artifacts from S3 and mounts them at /opt/ml/model. This way, our script can load the model from within the container.

#collapse-output
sk_predictor = sk_estimator.deploy(
    initial_instance_count=1,
    instance_type='local'
)
Attaching to 3fvyanwal0-algo-1-tz4ow
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,644 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,648 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,649 INFO - sagemaker-containers - nginx config: 
3fvyanwal0-algo-1-tz4ow | worker_processes auto;
3fvyanwal0-algo-1-tz4ow | daemon off;
3fvyanwal0-algo-1-tz4ow | pid /tmp/nginx.pid;
3fvyanwal0-algo-1-tz4ow | error_log  /dev/stderr;
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow | worker_rlimit_nofile 4096;
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow | events {
3fvyanwal0-algo-1-tz4ow |   worker_connections 2048;
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow | http {
3fvyanwal0-algo-1-tz4ow |   include /etc/nginx/mime.types;
3fvyanwal0-algo-1-tz4ow |   default_type application/octet-stream;
3fvyanwal0-algo-1-tz4ow |   access_log /dev/stdout combined;
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |   upstream gunicorn {
3fvyanwal0-algo-1-tz4ow |     server unix:/tmp/gunicorn.sock;
3fvyanwal0-algo-1-tz4ow |   }
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |   server {
3fvyanwal0-algo-1-tz4ow |     listen 8080 deferred;
3fvyanwal0-algo-1-tz4ow |     client_max_body_size 0;
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |     keepalive_timeout 3;
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |     location ~ ^/(ping|invocations|execution-parameters) {
3fvyanwal0-algo-1-tz4ow |       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
3fvyanwal0-algo-1-tz4ow |       proxy_set_header Host $http_host;
3fvyanwal0-algo-1-tz4ow |       proxy_redirect off;
3fvyanwal0-algo-1-tz4ow |       proxy_read_timeout 60s;
3fvyanwal0-algo-1-tz4ow |       proxy_pass http://gunicorn;
3fvyanwal0-algo-1-tz4ow |     }
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |     location / {
3fvyanwal0-algo-1-tz4ow |       return 404 "{}";
3fvyanwal0-algo-1-tz4ow |     }
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow |   }
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow | 
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Module train_and_serve does not provide a setup.py. 
3fvyanwal0-algo-1-tz4ow | Generating setup.py
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Generating setup.cfg
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Generating MANIFEST.in
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,867 INFO - sagemaker-containers - Installing module with the following command:
3fvyanwal0-algo-1-tz4ow | /miniconda3/bin/python3 -m pip install . -r requirements.txt
3fvyanwal0-algo-1-tz4ow | Processing /opt/ml/code
3fvyanwal0-algo-1-tz4ow |   Preparing metadata (setup.py) ... done
3fvyanwal0-algo-1-tz4ow | Collecting seaborn==0.11.2
3fvyanwal0-algo-1-tz4ow |   Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 4.8 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting matplotlib>=2.2
3fvyanwal0-algo-1-tz4ow |   Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 93.0 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
3fvyanwal0-algo-1-tz4ow | Collecting packaging>=20.0
3fvyanwal0-algo-1-tz4ow |   Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 9.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting fonttools>=4.22.0
3fvyanwal0-algo-1-tz4ow |   Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 8.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
3fvyanwal0-algo-1-tz4ow | Collecting cycler>=0.10
3fvyanwal0-algo-1-tz4ow |   Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
3fvyanwal0-algo-1-tz4ow | Collecting pyparsing>=2.2.1
3fvyanwal0-algo-1-tz4ow |   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 21.4 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting kiwisolver>=1.0.1
3fvyanwal0-algo-1-tz4ow |   Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 17.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
3fvyanwal0-algo-1-tz4ow | Building wheels for collected packages: train-and-serve
3fvyanwal0-algo-1-tz4ow |   Building wheel for train-and-serve (setup.py) ... 2022/07/17 15:24:49 [crit] 15#15: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:24:49 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
3fvyanwal0-algo-1-tz4ow | done
3fvyanwal0-algo-1-tz4ow |   Created wheel for train-and-serve: filename=train_and_serve-1.0.0-py2.py3-none-any.whl size=6122 sha256=914e6ad8ea2651da0216fefbc30c28bc25124ff514c30452de608e5b9807197c
3fvyanwal0-algo-1-tz4ow |   Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-2u_4hcln/wheels/f3/75/57/158162e9eab7af12b5c338c279b3a81f103b89d74eeb911c00
3fvyanwal0-algo-1-tz4ow | Successfully built train-and-serve
3fvyanwal0-algo-1-tz4ow | Installing collected packages: train-and-serve, pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
3fvyanwal0-algo-1-tz4ow | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2 train-and-serve-1.0.0
3fvyanwal0-algo-1-tz4ow | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
3fvyanwal0-algo-1-tz4ow | 2022/07/17 15:24:54 [crit] 15#15: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:24:54 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:55,286 INFO - matplotlib.font_manager - generated new fontManager
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Starting gunicorn 20.0.4
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Listening at: unix:/tmp/gunicorn.sock (37)
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Using worker: gevent
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [39] [INFO] Booting worker with pid: 39
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:56 +0000] [40] [INFO] Booting worker with pid: 40
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:59,750 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | model_fn model_dir: /opt/ml/model
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:25:01 +0000] "GET /ping HTTP/1.1" 200 0 "-" "python-urllib3/1.26.8"

Let’s create a sample request and get a prediction from our local inference endpoint.

request = [[9.0, 3571, 1976, 0.525]]

response  = sk_predictor.predict(request)
response = int(response[0])
response
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:25:01,760 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | model_fn model_dir: /opt/ml/model
2
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:25:03 +0000] "POST /invocations HTTP/1.1" 200 136 "-" "python-urllib3/1.26.8"
##
# map response to correct category type
print("Predicted class category {} ({})".format(response, categories_map[response]))
Predicted class category 2 (Iris-virginica)
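For reference, categories_map was built earlier in the notebook from the iris class labels. Assuming the usual iris label encoding (where class_cat is the pandas category code, ordered alphabetically), it is equivalent to:

```python
# Assumed reconstruction of categories_map from the iris "class" labels;
# pandas category codes order the labels alphabetically.
categories = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
categories_map = dict(enumerate(categories))
print(categories_map[2])  # Iris-virginica
```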

Since the endpoint is running in the local environment, we can observe a web server running inside a Docker container.

!docker ps
CONTAINER ID   IMAGE                                                                               COMMAND   CREATED          STATUS          PORTS                                       NAMES
5a22263f860b   683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.0-1-cpu-py3   "serve"   18 seconds ago   Up 17 seconds   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   3fvyanwal0-algo-1-tz4ow
##
# delete the local endpoint
sk_predictor.delete_endpoint()
Gracefully stopping... (press Ctrl+C again to force)

Note that in local mode we can only serve a single model at a time. Therefore, if the deploy call fails in local mode, check that no other local endpoint is still running.

SKLearn model server input and output processing

The SageMaker model server handles an incoming request in three steps: 1. input processing, 2. prediction, and 3. output processing.
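These three steps can be sketched as a simple pipeline. The following is an illustrative sketch of the dispatch flow, not the container's actual code; the toy handlers and the sum-based "model" are assumptions for demonstration only:

```python
import json

# Illustrative stand-ins for the user-provided handler functions.
def input_fn(request_body, content_type):
    # 1. input processing: deserialize the request body
    assert content_type == "application/json"
    return json.loads(request_body)["Input"]

def predict_fn(input_data, model):
    # 2. prediction: apply the model to each input row
    return [model(row) for row in input_data]

def output_fn(prediction, content_type):
    # 3. output processing: serialize the result
    return json.dumps({"Output": prediction})

def handle_request(body, content_type, model):
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, content_type)

# Toy "model": sum of the features in a row.
print(handle_request('{"Input": [[1, 2, 3]]}', "application/json", sum))
# {"Output": [6]}
```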

In the last section, we saw that the predict_fn function in the source file defines model prediction. Similarly, SageMaker provides two additional functions to control input and output processing, named input_fn and output_fn, respectively. Both of these functions have default implementations, but we can override them by providing our own implementations in the source script. If no definition is provided, the SageMaker scikit-learn model server uses the default implementation.

  • input_fn: Takes request data and deserializes the data into an object for prediction.
  • output_fn: Takes the prediction result and serializes this according to the response content type.

Let’s update our script to preprocess input request and output response as JSON objects.

%%writefile $script_file

import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib
import json

from my_custom_library import save_confusion_matrix

if __name__ == "__main__":

    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm_model_dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))

    args, _ = parser.parse_known_args()

    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir

    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")  
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]

    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    classifier = RandomForestClassifier(n_estimators=estimators)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Save the model
    joblib.dump(classifier, sm_model_dir + "/model.joblib")

    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")

    # save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)

    # print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")

    # print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")
    
# Model serving
def model_fn(model_dir):
    """Deserialize the fitted model."""
    print(f"model_fn model_dir: {model_dir}")
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model


def predict_fn(input_data, model):
    """Run prediction.

    input_data: deserialized data returned from input_fn
    model: sklearn model returned from model_fn
    """
    return model.predict(input_data)


def input_fn(request_body, request_content_type):
    """Deserialize the request body into an object for prediction.

    request_body: the body of the request sent to the model
    request_content_type: (string) the format of the request
    """
    if request_content_type == "application/json":
        request_body = json.loads(request_body)
        inpVar = request_body["Input"]
        return inpVar
    else:
        raise ValueError("This model only supports application/json input")


def output_fn(prediction, content_type):
    """Serialize the prediction result for the response.

    prediction: the value returned from predict_fn
    content_type: the content type the endpoint expects to return (e.g. JSON)
    """
    res = int(prediction[0])
    respJSON = {"Output": res}
    return respJSON
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
# train and deploy model with input and output as JSON objects
sk_estimator = SKLearn(
    entry_point=script_file_name,
    source_dir=script_path,
    dependencies=[custom_library_path],
    role=role,
    instance_count=1,
    instance_type='local',
    framework_version="1.0-1",
    hyperparameters={"estimators":10},
)

sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})

sk_predictor = sk_estimator.deploy(
    initial_instance_count=1,
    instance_type='local'
)
Creating ubyi50juw8-algo-1-9w0jk ... 
Creating ubyi50juw8-algo-1-9w0jk ... done
Attaching to ubyi50juw8-algo-1-9w0jk
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,036 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,041 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,050 sagemaker_sklearn_container.training INFO     Invoking user training script.
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,233 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
ubyi50juw8-algo-1-9w0jk | /miniconda3/bin/python -m pip install -r requirements.txt
ubyi50juw8-algo-1-9w0jk | Collecting seaborn==0.11.2
ubyi50juw8-algo-1-9w0jk |   Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 38.8 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
ubyi50juw8-algo-1-9w0jk | Collecting matplotlib>=2.2
ubyi50juw8-algo-1-9w0jk |   Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 94.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
ubyi50juw8-algo-1-9w0jk | Collecting pyparsing>=2.2.1
ubyi50juw8-algo-1-9w0jk |   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Collecting cycler>=0.10
ubyi50juw8-algo-1-9w0jk |   Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
ubyi50juw8-algo-1-9w0jk | Collecting fonttools>=4.22.0
ubyi50juw8-algo-1-9w0jk |   Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 26.2 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Collecting packaging>=20.0
ubyi50juw8-algo-1-9w0jk |   Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 3.6 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
ubyi50juw8-algo-1-9w0jk | Collecting kiwisolver>=1.0.1
ubyi50juw8-algo-1-9w0jk |   Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 15.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
ubyi50juw8-algo-1-9w0jk | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
ubyi50juw8-algo-1-9w0jk | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
ubyi50juw8-algo-1-9w0jk | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,747 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,760 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,773 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,781 sagemaker-training-toolkit INFO     Invoking user script
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | Training Env:
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | {
ubyi50juw8-algo-1-9w0jk |     "additional_framework_parameters": {},
ubyi50juw8-algo-1-9w0jk |     "channel_input_dirs": {
ubyi50juw8-algo-1-9w0jk |         "train": "/opt/ml/input/data/train",
ubyi50juw8-algo-1-9w0jk |         "test": "/opt/ml/input/data/test"
ubyi50juw8-algo-1-9w0jk |     },
ubyi50juw8-algo-1-9w0jk |     "current_host": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk |     "framework_module": "sagemaker_sklearn_container.training:main",
ubyi50juw8-algo-1-9w0jk |     "hosts": [
ubyi50juw8-algo-1-9w0jk |         "algo-1-9w0jk"
ubyi50juw8-algo-1-9w0jk |     ],
ubyi50juw8-algo-1-9w0jk |     "hyperparameters": {
ubyi50juw8-algo-1-9w0jk |         "estimators": 10
ubyi50juw8-algo-1-9w0jk |     },
ubyi50juw8-algo-1-9w0jk |     "input_config_dir": "/opt/ml/input/config",
ubyi50juw8-algo-1-9w0jk |     "input_data_config": {
ubyi50juw8-algo-1-9w0jk |         "train": {
ubyi50juw8-algo-1-9w0jk |             "TrainingInputMode": "File"
ubyi50juw8-algo-1-9w0jk |         },
ubyi50juw8-algo-1-9w0jk |         "test": {
ubyi50juw8-algo-1-9w0jk |             "TrainingInputMode": "File"
ubyi50juw8-algo-1-9w0jk |         }
ubyi50juw8-algo-1-9w0jk |     },
ubyi50juw8-algo-1-9w0jk |     "input_dir": "/opt/ml/input",
ubyi50juw8-algo-1-9w0jk |     "is_master": true,
ubyi50juw8-algo-1-9w0jk |     "job_name": "sagemaker-scikit-learn-2022-07-17-15-25-04-516",
ubyi50juw8-algo-1-9w0jk |     "log_level": 20,
ubyi50juw8-algo-1-9w0jk |     "master_hostname": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk |     "model_dir": "/opt/ml/model",
ubyi50juw8-algo-1-9w0jk |     "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz",
ubyi50juw8-algo-1-9w0jk |     "module_name": "train_and_serve",
ubyi50juw8-algo-1-9w0jk |     "network_interface_name": "eth0",
ubyi50juw8-algo-1-9w0jk |     "num_cpus": 2,
ubyi50juw8-algo-1-9w0jk |     "num_gpus": 0,
ubyi50juw8-algo-1-9w0jk |     "output_data_dir": "/opt/ml/output/data",
ubyi50juw8-algo-1-9w0jk |     "output_dir": "/opt/ml/output",
ubyi50juw8-algo-1-9w0jk |     "output_intermediate_dir": "/opt/ml/output/intermediate",
ubyi50juw8-algo-1-9w0jk |     "resource_config": {
ubyi50juw8-algo-1-9w0jk |         "current_host": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk |         "hosts": [
ubyi50juw8-algo-1-9w0jk |             "algo-1-9w0jk"
ubyi50juw8-algo-1-9w0jk |         ]
ubyi50juw8-algo-1-9w0jk |     },
ubyi50juw8-algo-1-9w0jk |     "user_entry_point": "train_and_serve.py"
ubyi50juw8-algo-1-9w0jk | }
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | Environment variables:
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | SM_HOSTS=["algo-1-9w0jk"]
ubyi50juw8-algo-1-9w0jk | SM_NETWORK_INTERFACE_NAME=eth0
ubyi50juw8-algo-1-9w0jk | SM_HPS={"estimators":10}
ubyi50juw8-algo-1-9w0jk | SM_USER_ENTRY_POINT=train_and_serve.py
ubyi50juw8-algo-1-9w0jk | SM_FRAMEWORK_PARAMS={}
ubyi50juw8-algo-1-9w0jk | SM_RESOURCE_CONFIG={"current_host":"algo-1-9w0jk","hosts":["algo-1-9w0jk"]}
ubyi50juw8-algo-1-9w0jk | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | SM_CHANNELS=["test","train"]
ubyi50juw8-algo-1-9w0jk | SM_CURRENT_HOST=algo-1-9w0jk
ubyi50juw8-algo-1-9w0jk | SM_MODULE_NAME=train_and_serve
ubyi50juw8-algo-1-9w0jk | SM_LOG_LEVEL=20
ubyi50juw8-algo-1-9w0jk | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
ubyi50juw8-algo-1-9w0jk | SM_INPUT_DIR=/opt/ml/input
ubyi50juw8-algo-1-9w0jk | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_DIR=/opt/ml/output
ubyi50juw8-algo-1-9w0jk | SM_NUM_CPUS=2
ubyi50juw8-algo-1-9w0jk | SM_NUM_GPUS=0
ubyi50juw8-algo-1-9w0jk | SM_MODEL_DIR=/opt/ml/model
ubyi50juw8-algo-1-9w0jk | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz
ubyi50juw8-algo-1-9w0jk | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-9w0jk","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-9w0jk"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-25-04-516","log_level":20,"master_hostname":"algo-1-9w0jk","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-9w0jk","hosts":["algo-1-9w0jk"]},"user_entry_point":"train_and_serve.py"}
ubyi50juw8-algo-1-9w0jk | SM_USER_ARGS=["--estimators","10"]
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
ubyi50juw8-algo-1-9w0jk | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
ubyi50juw8-algo-1-9w0jk | SM_CHANNEL_TEST=/opt/ml/input/data/test
ubyi50juw8-algo-1-9w0jk | SM_HP_ESTIMATORS=10
ubyi50juw8-algo-1-9w0jk | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | Invoking script with the following command:
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | /miniconda3/bin/python train_and_serve.py --estimators 10
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | 
ubyi50juw8-algo-1-9w0jk | command line arguments:  Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
ubyi50juw8-algo-1-9w0jk | training_dir: /opt/ml/input/data/train
ubyi50juw8-algo-1-9w0jk | training_dir files list: ['train.csv']
ubyi50juw8-algo-1-9w0jk | testing_dir: /opt/ml/input/data/test
ubyi50juw8-algo-1-9w0jk | testing_dir files list: ['test.csv']
ubyi50juw8-algo-1-9w0jk | sm_model_dir: /opt/ml/model
ubyi50juw8-algo-1-9w0jk | output_data_dir: /opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | X_train.shape: (120, 4)
ubyi50juw8-algo-1-9w0jk | y_train.shape: (120,)
ubyi50juw8-algo-1-9w0jk | X_train.shape: (30, 4)
ubyi50juw8-algo-1-9w0jk | y_train.shape: (30,)
ubyi50juw8-algo-1-9w0jk | sm_model_dir: /opt/ml/model
ubyi50juw8-algo-1-9w0jk | sm_model_dir files list: ['model.joblib']
ubyi50juw8-algo-1-9w0jk | output_data_dir: /opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:13,824 sagemaker-containers INFO     Reporting training SUCCESS
ubyi50juw8-algo-1-9w0jk exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmp9kiuooe_/algo-1-9w0jk Please remove it manually.
===== Job Complete =====
Attaching to e6h08rxuj4-algo-1-bitrj
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,610 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,614 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,615 INFO - sagemaker-containers - nginx config: 
e6h08rxuj4-algo-1-bitrj | worker_processes auto;
e6h08rxuj4-algo-1-bitrj | daemon off;
e6h08rxuj4-algo-1-bitrj | pid /tmp/nginx.pid;
e6h08rxuj4-algo-1-bitrj | error_log  /dev/stderr;
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj | worker_rlimit_nofile 4096;
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj | events {
e6h08rxuj4-algo-1-bitrj |   worker_connections 2048;
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj | http {
e6h08rxuj4-algo-1-bitrj |   include /etc/nginx/mime.types;
e6h08rxuj4-algo-1-bitrj |   default_type application/octet-stream;
e6h08rxuj4-algo-1-bitrj |   access_log /dev/stdout combined;
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |   upstream gunicorn {
e6h08rxuj4-algo-1-bitrj |     server unix:/tmp/gunicorn.sock;
e6h08rxuj4-algo-1-bitrj |   }
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |   server {
e6h08rxuj4-algo-1-bitrj |     listen 8080 deferred;
e6h08rxuj4-algo-1-bitrj |     client_max_body_size 0;
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |     keepalive_timeout 3;
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |     location ~ ^/(ping|invocations|execution-parameters) {
e6h08rxuj4-algo-1-bitrj |       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
e6h08rxuj4-algo-1-bitrj |       proxy_set_header Host $http_host;
e6h08rxuj4-algo-1-bitrj |       proxy_redirect off;
e6h08rxuj4-algo-1-bitrj |       proxy_read_timeout 60s;
e6h08rxuj4-algo-1-bitrj |       proxy_pass http://gunicorn;
e6h08rxuj4-algo-1-bitrj |     }
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |     location / {
e6h08rxuj4-algo-1-bitrj |       return 404 "{}";
e6h08rxuj4-algo-1-bitrj |     }
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj |   }
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj | 
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Module train_and_serve does not provide a setup.py. 
e6h08rxuj4-algo-1-bitrj | Generating setup.py
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Generating setup.cfg
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Generating MANIFEST.in
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Installing module with the following command:
e6h08rxuj4-algo-1-bitrj | /miniconda3/bin/python3 -m pip install . -r requirements.txt
e6h08rxuj4-algo-1-bitrj | Processing /opt/ml/code
e6h08rxuj4-algo-1-bitrj |   Preparing metadata (setup.py) ... done
e6h08rxuj4-algo-1-bitrj | Collecting seaborn==0.11.2
e6h08rxuj4-algo-1-bitrj |   Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 36.6 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
e6h08rxuj4-algo-1-bitrj | Collecting matplotlib>=2.2
e6h08rxuj4-algo-1-bitrj |   Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 101.9 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
e6h08rxuj4-algo-1-bitrj | Collecting kiwisolver>=1.0.1
e6h08rxuj4-algo-1-bitrj |   Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 94.2 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Collecting cycler>=0.10
e6h08rxuj4-algo-1-bitrj |   Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
e6h08rxuj4-algo-1-bitrj | Collecting packaging>=20.0
e6h08rxuj4-algo-1-bitrj |   Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 11.9 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Collecting fonttools>=4.22.0
e6h08rxuj4-algo-1-bitrj |   Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 12.1 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
e6h08rxuj4-algo-1-bitrj | Collecting pyparsing>=2.2.1
e6h08rxuj4-algo-1-bitrj |   Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.3 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
e6h08rxuj4-algo-1-bitrj | Building wheels for collected packages: train-and-serve
e6h08rxuj4-algo-1-bitrj |   Building wheel for train-and-serve (setup.py) ... done
e6h08rxuj4-algo-1-bitrj | 2022/07/17 15:25:19 [crit] 14#14: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:19 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
e6h08rxuj4-algo-1-bitrj |   Created wheel for train-and-serve: filename=train_and_serve-1.0.0-py2.py3-none-any.whl size=6682 sha256=f4b6952b904adaa9a17270142b81e0746714104ac88739bf2ba644d15a4fe837
e6h08rxuj4-algo-1-bitrj |   Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-6muul1xe/wheels/f3/75/57/158162e9eab7af12b5c338c279b3a81f103b89d74eeb911c00
e6h08rxuj4-algo-1-bitrj | Successfully built train-and-serve
e6h08rxuj4-algo-1-bitrj | Installing collected packages: train-and-serve, pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
e6h08rxuj4-algo-1-bitrj | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2 train-and-serve-1.0.0
e6h08rxuj4-algo-1-bitrj | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
e6h08rxuj4-algo-1-bitrj | 2022/07/17 15:25:24 [crit] 14#14: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:24 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:24,850 INFO - matplotlib.font_manager - generated new fontManager
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Starting gunicorn 20.0.4
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Listening at: unix:/tmp/gunicorn.sock (35)
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Using worker: gevent
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [37] [INFO] Booting worker with pid: 37
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [38] [INFO] Booting worker with pid: 38
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:29,802 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | model_fn model_dir: /opt/ml/model
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:31 +0000] "GET /ping HTTP/1.1" 200 0 "-" "python-urllib3/1.26.8"
sk_endpoint_name = sk_predictor.endpoint_name
sk_endpoint_name
'sagemaker-scikit-learn-2022-07-17-15-25-14-401'
##
# send JSON request to endpoint
import json

client = session_local.sagemaker_runtime_client

request_body = {"Input": [[9.0, 3571, 1976, 0.525]]}
payload = json.dumps(request_body)

response = client.invoke_endpoint(
    EndpointName=sk_endpoint_name, ContentType="application/json", Body=payload
)

result = json.loads(response["Body"].read().decode())["Output"]
result
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:31,439 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | model_fn model_dir: /opt/ml/model
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:32 +0000] "POST /invocations HTTP/1.1" 200 13 "-" "python-urllib3/1.26.8"
2
##
# get JSON response from endpoint
print("Predicted class category {} ({})".format(result, categories_map[result]))
Predicted class category 2 (Iris-virginica)

Make sure to delete the endpoint once we have finished testing our deployed model.

sk_predictor.delete_endpoint()
Gracefully stopping... (press Ctrl+C again to force)

SKLearn model training and serving in a SageMaker managed environment

Now that our script is complete and we have tested it in a local environment, let’s train and deploy it in the Amazon SageMaker managed environment. Moving from local mode to the managed environment is very simple: we only need to change the instance type.
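To make that point concrete, here is an illustrative helper (not part of the notebook’s training script): the only argument that changes between local-mode and managed training is `instance_type`.

```python
def training_instance_type(local: bool) -> str:
    # "local" runs the prebuilt scikit-learn container in Docker on the
    # current machine; an ml.* type runs the same container on a
    # SageMaker-managed instance. ml.m5.large is the type used below.
    return "local" if local else "ml.m5.large"
```

Everything else in the `SKLearn` estimator definition stays the same.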

#collapse-output
# train and deploy model with input and output as JSON objects
sk_estimator = SKLearn(
    entry_point=script_file_name,
    source_dir=script_path,
    dependencies=[custom_library_path],
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    framework_version="1.0-1",
    hyperparameters={"estimators":10},
)

sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
2022-07-17 15:25:33 Starting - Starting the training job...
2022-07-17 15:25:57 Starting - Preparing the instances for training
ProfilerReport-1658071533: InProgress
.........
2022-07-17 15:27:17 Downloading - Downloading input data...
2022-07-17 15:27:57 Training - Downloading the training image...
2022-07-17 15:28:35 Training - Training image download completed. Training in progress...
2022-07-17 15:28:37,458 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2022-07-17 15:28:37,462 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:37,478 sagemaker_sklearn_container.training INFO     Invoking user training script.
2022-07-17 15:28:37,948 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/miniconda3/bin/python -m pip install -r requirements.txt
Collecting seaborn==0.11.2
  Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 11.5 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
Collecting matplotlib>=2.2
  Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 69.0 MB/s eta 0:00:00
Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 37.9 MB/s eta 0:00:00
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 7.4 MB/s eta 0:00:00
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting pyparsing>=2.2.1
  Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 18.2 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
  Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 52.9 MB/s eta 0:00:00
Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
2022-07-17 15:28:44,790 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,810 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,831 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,847 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "test": "/opt/ml/input/data/test",
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_sklearn_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "estimators": 10
    },
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {
        "test": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        },
        "train": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        }
    },
    "input_dir": "/opt/ml/input",
    "is_master": true,
    "job_name": "sagemaker-scikit-learn-2022-07-17-15-25-33-210",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz",
    "module_name": "train_and_serve",
    "network_interface_name": "eth0",
    "num_cpus": 2,
    "num_gpus": 0,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-1",
        "current_instance_type": "ml.m5.large",
        "current_group_name": "homogeneousCluster",
        "hosts": [
            "algo-1"
        ],
        "instance_groups": [
            {
                "instance_group_name": "homogeneousCluster",
                "instance_type": "ml.m5.large",
                "hosts": [
                    "algo-1"
                ]
            }
        ],
        "network_interface_name": "eth0"
    },
    "user_entry_point": "train_and_serve.py"
}
Environment variables:
SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"estimators":10}
SM_USER_ENTRY_POINT=train_and_serve.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.m5.large","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.m5.large"}],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["test","train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=train_and_serve
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=2
SM_NUM_GPUS=0
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-25-33-210","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.m5.large","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.m5.large"}],"network_interface_name":"eth0"},"user_entry_point":"train_and_serve.py"}
SM_USER_ARGS=["--estimators","10"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_ESTIMATORS=10
PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
Invoking script with the following command:
/miniconda3/bin/python train_and_serve.py --estimators 10
command line arguments:  Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
training_dir: /opt/ml/input/data/train
training_dir files list: ['train.csv']
testing_dir: /opt/ml/input/data/test
testing_dir files list: ['test.csv']
sm_model_dir: /opt/ml/model
output_data_dir: /opt/ml/output/data
X_train.shape: (120, 4)
y_train.shape: (120,)
X_train.shape: (30, 4)
y_train.shape: (30,)
sm_model_dir: /opt/ml/model
sm_model_dir files list: ['model.joblib']
output_data_dir: /opt/ml/output/data
output_data_dir files list: ['y_pred.csv', 'output_cm.png']
2022-07-17 15:28:48,476 sagemaker-containers INFO     Reporting training SUCCESS

2022-07-17 15:28:58 Uploading - Uploading generated training model
2022-07-17 15:29:18 Completed - Training job completed
Training seconds: 116
Billable seconds: 116

sklearn-managed-done

We have used an AWS-managed ml.m5.large instance for training. Once the training job is complete, the model artifacts are uploaded to the S3 bucket. At the end, the log also reports the billable seconds.
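After training, the artifact location is available on the estimator as `sk_estimator.model_data`, an S3 URI ending in `model.tar.gz`. A minimal sketch of splitting such a URI into bucket and key, using a made-up URI for illustration (the real path comes from your own training job):

```python
def split_s3_uri(uri: str):
    # Drop the "s3://" scheme, then split into bucket name and object key.
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

# Illustrative URI of the same shape as sk_estimator.model_data:
bucket, key = split_s3_uri("s3://my-bucket/my-training-job/output/model.tar.gz")
```

This is handy if you later want to download `model.tar.gz` with boto3 and inspect the saved `model.joblib` locally.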

Let’s confirm that the session we are using is not local.

sk_estimator.sagemaker_session
<sagemaker.session.Session at 0x7f80a7d15700>

Now deploy it on a SageMaker managed ml.t2.medium instance.

sk_predictor = sk_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium'
)
-------------!

Test the deployed model with a sample request.

##
# send JSON request to endpoint
import json

client = session.sagemaker_runtime_client

request_body = {"Input": [[9.0, 3571, 1976, 0.525]]}
payload = json.dumps(request_body)

response = client.invoke_endpoint(
    EndpointName=sk_predictor.endpoint_name,
    ContentType="application/json",
    Body=payload,
)

result = json.loads(response["Body"].read().decode())["Output"]
result
2
##
# get JSON response from endpoint
print("Predicted class category {} ({})".format(result, categories_map[result]))
Predicted class category 2 (Iris-virginica)

Again, don’t forget to delete the endpoint once you are done with testing.

sk_predictor.delete_endpoint()