Deploy Scikit-learn Models to Amazon SageMaker with the SageMaker Python SDK using Script mode
Introduction
You may have trained a model with your favorite ML framework, and now you are asked to move your code to Amazon SageMaker. The good news is that SageMaker's fully managed training works well with many popular ML frameworks, including scikit-learn. In addition, SageMaker provides a prebuilt container for the scikit-learn framework, enabling us to seamlessly port our scripts to SageMaker and benefit from its training and deployment capabilities. The SageMaker Scikit-learn Container is an open source library for running the scikit-learn framework on the Amazon SageMaker platform. You can read more about its features on its GitHub page, SageMaker Scikit-learn Container.
Amazon SageMaker also provides an open source Python SDK to train and deploy models on SageMaker. The SageMaker SDK provides several high-level abstractions (classes), including:

- `Session`: provides a collection of methods for working with SageMaker resources
- `Estimators`: encapsulate training on SageMaker
- `Predictors`: provide real-time inference and transformation using Python data types against a SageMaker endpoint
You can read more about the SageMaker Python SDK on its official site, Amazon SageMaker Python SDK.
This approach of using a custom training script with SageMaker's prebuilt container is commonly called Script Mode. Training a scikit-learn model with the SageMaker Python SDK involves three steps:
- Prepare a training script. The training script is similar to any other scikit-learn training script that you might use outside of SageMaker.
- Create an Estimator object from the `sagemaker.sklearn.SKLearn` class. The scikit-learn estimator class handles end-to-end training and deployment of custom scikit-learn code. We pass our training script to the `SKLearn` estimator, and it executes the script within a SageMaker training job. The training job runs an Amazon-built Docker container that executes the functions defined in the provided Python script.
- Call the estimator's `fit` method on the training data. Training is started by calling `fit()` on this estimator. After training is complete, calling `deploy()` creates a hosted SageMaker endpoint and returns an `SKLearnPredictor` instance that can be used to perform inference against the hosted model. We will discuss the `SKLearn` estimator in more detail later in this post.
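The three steps can be sketched end to end. This is an illustrative outline only: the role, script path, and data URI are placeholders, the import is kept inside the function so the sketch can be defined without running anything, and the real estimator for this post is created later.

```python
def train_and_deploy(entry_point, role, train_uri):
    """Illustrative outline of the script-mode workflow (placeholder arguments)."""
    from sagemaker.sklearn import SKLearn  # SageMaker Python SDK

    # Step 2: wrap the training script in an SKLearn estimator
    estimator = SKLearn(
        entry_point=entry_point,      # e.g. "train_and_serve.py"
        role=role,                    # IAM role name or full ARN
        instance_count=1,
        instance_type="ml.m5.large",  # or "local" for local mode
        framework_version="1.0-1",
    )

    # Step 3: train, then host the trained model behind an endpoint
    estimator.fit({"train": train_uri})
    predictor = estimator.deploy(
        initial_instance_count=1, instance_type="ml.m5.large"
    )
    return predictor
```
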
To read more about using scikit-learn with the SageMaker Python SDK, refer to the official documentation, Using Scikit-learn with the SageMaker Python SDK. The documentation is valuable, and I highly recommend checking it and keeping it as a reference.
In this post we will build a scikit-learn `RandomForestClassifier` on the public Iris dataset. There is a similar example in the SageMaker documentation, Train a SKLearn Model using Script Mode, but it does not discuss many important aspects of the scikit-learn container and its environment. In this post, we will learn about them and cover all the details of training a scikit-learn model with script mode. I also noted that the example in the documentation uses `RandomForestRegressor` on a classification problem, which I believe is a mistake.
We have much to cover and learn, so let’s start.
Environment
This notebook was prepared on an AWS SageMaker notebook instance (`ml.t3.medium`) with the "conda_python3" kernel.
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
# conda environments:
#
base /home/ec2-user/anaconda3
JupyterSystemEnv /home/ec2-user/anaconda3/envs/JupyterSystemEnv
R /home/ec2-user/anaconda3/envs/R
amazonei_mxnet_p36 /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36
amazonei_pytorch_latest_p37 /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37
amazonei_tensorflow2_p36 /home/ec2-user/anaconda3/envs/amazonei_tensorflow2_p36
mxnet_p37 /home/ec2-user/anaconda3/envs/mxnet_p37
python3 * /home/ec2-user/anaconda3/envs/python3
pytorch_p38 /home/ec2-user/anaconda3/envs/pytorch_p38
tensorflow2_p38 /home/ec2-user/anaconda3/envs/tensorflow2_p38
Prepare training and test data
We will use the Iris flower dataset. It includes three iris species (Iris setosa, Iris virginica, and Iris versicolor) with 50 samples each. Four features were measured for each sample: the length and the width of the sepals and petals, in centimeters. We can train a model to distinguish the species from each other based on the combination of these four features. You can read more about this dataset at Iris flower data set. The dataset has five columns:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class: Iris Setosa, Iris Versicolour, Iris Virginica
Download and preprocess data
##
# download dataset
import boto3
import pandas as pd
import numpy as np
s3 = boto3.client("s3")
s3.download_file(
    "sagemaker-sample-files", "datasets/tabular/iris/iris.data", "iris.data"
)
df = pd.read_csv(
"iris.data",
header=None,
names=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)
df.head()
| | sepal_len | sepal_wid | petal_len | petal_wid | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
##
# Convert the three classes from strings to integers in {0,1,2}
df["class_cat"] = df["class"].astype("category").cat.codes
categories_map = dict(enumerate(df["class"].astype("category").cat.categories))
print(categories_map)
df.head()
{0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
| | sepal_len | sepal_wid | petal_len | petal_wid | class | class_cat |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | 0 |
Prepare and store train and test sets as CSV files
##
# split the data into train and test set
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=42)
print(f"train.shape: {train.shape}")
print(f"test.shape: {test.shape}")
train.shape: (120, 6)
test.shape: (30, 6)
We have our dataset ready. Let's define a local directory, `local_path`, to keep all the files and artifacts related to this post. I will refer to this directory as the "workspace".
We have train and test sets ready. Let’s create two more directories in our workspace and store our data in them.
from pathlib import Path
# workspace directory for this post (the value matches the paths printed below)
local_path = "./datasets/2022-07-07-sagemaker-script-mode"
# local paths
local_train_path = local_path + "/train"
local_test_path = local_path + "/test"
# create local directories
Path(local_train_path).mkdir(parents=True, exist_ok=True)
Path(local_test_path).mkdir(parents=True, exist_ok=True)
print("local_train_path: ", local_train_path)
print("local_test_path: ", local_test_path)
# local file names
local_train_file = local_train_path + "/train.csv"
local_test_file = local_test_path + "/test.csv"
# write train and test CSV files
train.to_csv(local_train_file, index=False)
test.to_csv(local_test_file, index=False)
print("local_train_file: ", local_train_file)
print("local_test_file: ", local_test_file)
local_train_path: ./datasets/2022-07-07-sagemaker-script-mode/train
local_test_path: ./datasets/2022-07-07-sagemaker-script-mode/test
local_train_file: ./datasets/2022-07-07-sagemaker-script-mode/train/train.csv
local_test_file: ./datasets/2022-07-07-sagemaker-script-mode/test/test.csv
Create SageMaker session
import sagemaker
session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()
region = session.boto_region_name
print("sagemaker.__version__: ", sagemaker.__version__)
print("Session: ", session)
print("Role: ", role)
print("Bucket: ", bucket)
print("Region: ", region)
sagemaker.__version__: 2.86.2
Session: <sagemaker.session.Session object at 0x7f80ad720460>
Role: arn:aws:iam::801598032724:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole
Bucket: sagemaker-us-east-1-801598032724
Region: us-east-1
What we have done here is:

- imported the SageMaker Python SDK into our runtime
- got a session to work with the SageMaker API and other AWS services
- got the execution role associated with the user profile. It is the same profile that is available to the user in the console UI and has the `AmazonSageMakerFullAccess` policy attached to it
- created or fetched the default bucket and returned its name. The default bucket name has the format `sagemaker-{region}-{account_id}`. If it doesn't exist, our session will create it automatically. You may also use any other bucket in its place, given that you have enough permissions for reading and writing
- got the region name attached to our session
Next, we will use this session to upload data to our default bucket.
Upload data to Amazon S3 bucket
Now upload the data. In the output, we will get the complete path (S3 URI) for our uploaded data.
bucket_prefix = "2022-07-07-sagemaker-script-mode"  # S3 key prefix (matches the URIs printed below)
s3_train_uri = session.upload_data(local_train_file, key_prefix=bucket_prefix + "/data")
s3_test_uri = session.upload_data(local_test_file, key_prefix=bucket_prefix + "/data")
print("s3_train_uri: ", s3_train_uri)
print("s3_test_uri: ", s3_test_uri)
s3_train_uri: s3://sagemaker-us-east-1-801598032724/2022-07-07-sagemaker-script-mode/data/train.csv
s3_test_uri: s3://sagemaker-us-east-1-801598032724/2022-07-07-sagemaker-script-mode/data/test.csv
At this point, our data preparation step is complete. Train and test CSV files are available on the local system and in our default Amazon S3 bucket.
Prepare SageMaker local environment
The Amazon SageMaker training environment is managed, but SageMaker Python SDK also supports local mode, allowing you to train and deploy models to your local environment. This is a great way to test training scripts before running them in SageMaker’s managed training or hosting environment.
How does the SageMaker managed environment work?
When you send a request to the SageMaker API (a `fit` or `deploy` call), it:

- spins up new instances with the provided specification
- loads the algorithm container
- pulls the data from S3
- runs the training code
- stores the results and trained model artifacts to S3
- terminates the instances
All this happens behind the scenes with a single line of code and is a huge advantage. Spinning up new hardware every time can be good for repeatability and security, but it can add some friction while testing and debugging our code. We can test our code on a small dataset in our local environment with SageMaker local mode and then switch seamlessly to SageMaker managed environment by changing a single line of code.
Steps to prepare Amazon SageMaker local environment
Install the following prerequisites if you want to set up Amazon SageMaker local mode on your local system:

1. Install the required Python packages: `pip install boto3 sagemaker pandas scikit-learn` and `pip install 'sagemaker[local]'`
2. Have Docker Desktop installed and running on your computer (verify with `docker ps`)
3. Configure AWS credentials on your local machine so you can pull the Docker image from ECR
Instructions for SageMaker notebook instances
You can also set up SageMaker's local environment on SageMaker notebook instances. The required Python packages and the Docker service are already there. You only need to upgrade the `sagemaker[local]` Python package.
#collapse_output
# this is required for SageMaker notebook instances
!pip install 'sagemaker[local]' --upgrade
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: sagemaker[local] in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (2.86.2)
Collecting sagemaker[local]
Downloading sagemaker-2.99.0.tar.gz (542 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 542.7/542.7 KB 10.6 MB/s eta 0:00:0000:01
Preparing metadata (setup.py) ... done
Requirement already satisfied: attrs<22,>=20.3.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (20.3.0)
Requirement already satisfied: boto3<2.0,>=1.20.21 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.21.42)
Requirement already satisfied: google-pasta in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.2.0)
Requirement already satisfied: numpy<2.0,>=1.9.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.20.3)
Requirement already satisfied: protobuf<4.0,>=3.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (3.19.1)
Requirement already satisfied: protobuf3-to-dict<1.0,>=0.1.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.1.5)
Requirement already satisfied: smdebug_rulesconfig==1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.0.1)
Requirement already satisfied: importlib-metadata<5.0,>=1.4.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (4.8.2)
Requirement already satisfied: packaging>=20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (21.3)
Requirement already satisfied: pandas in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.3.4)
Requirement already satisfied: pathos in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (0.2.8)
Requirement already satisfied: urllib3==1.26.8 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.26.8)
Requirement already satisfied: docker-compose==1.29.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (1.29.2)
Requirement already satisfied: docker~=5.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (5.0.3)
Requirement already satisfied: PyYAML==5.4.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from sagemaker[local]) (5.4.1)
Requirement already satisfied: texttable<2,>=0.9.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (1.6.4)
Requirement already satisfied: websocket-client<1,>=0.32.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.59.0)
Requirement already satisfied: docopt<1,>=0.6.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.6.2)
Requirement already satisfied: jsonschema<4,>=2.5.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (3.2.0)
Requirement already satisfied: dockerpty<1,>=0.4.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.4.1)
Requirement already satisfied: distro<2,>=1.5.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (1.7.0)
Requirement already satisfied: python-dotenv<1,>=0.13.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (0.20.0)
Requirement already satisfied: requests<3,>=2.20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker-compose==1.29.2->sagemaker[local]) (2.26.0)
Collecting botocore<1.25.0,>=1.24.42
Downloading botocore-1.24.46-py3-none-any.whl (8.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 34.3 MB/s eta 0:00:00:00:0100:01
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from boto3<2.0,>=1.20.21->sagemaker[local]) (0.5.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from boto3<2.0,>=1.20.21->sagemaker[local]) (0.10.0)
Requirement already satisfied: zipp>=0.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from importlib-metadata<5.0,>=1.4.0->sagemaker[local]) (3.6.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from packaging>=20.0->sagemaker[local]) (3.0.6)
Requirement already satisfied: six in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from protobuf3-to-dict<1.0,>=0.1.5->sagemaker[local]) (1.16.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas->sagemaker[local]) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas->sagemaker[local]) (2021.3)
Requirement already satisfied: multiprocess>=0.70.12 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.70.12.2)
Requirement already satisfied: pox>=0.3.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.3.0)
Requirement already satisfied: dill>=0.3.4 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (0.3.4)
Requirement already satisfied: ppft>=1.6.6.4 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pathos->sagemaker[local]) (1.6.6.4)
Requirement already satisfied: paramiko>=2.4.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from docker~=5.0.0->sagemaker[local]) (2.10.3)
Requirement already satisfied: pyrsistent>=0.14.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from jsonschema<4,>=2.5.1->docker-compose==1.29.2->sagemaker[local]) (0.18.0)
Requirement already satisfied: setuptools in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from jsonschema<4,>=2.5.1->docker-compose==1.29.2->sagemaker[local]) (59.4.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (2.0.8)
Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (3.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from requests<3,>=2.20.0->docker-compose==1.29.2->sagemaker[local]) (2021.10.8)
Requirement already satisfied: pynacl>=1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (1.5.0)
Requirement already satisfied: cryptography>=2.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (36.0.0)
Requirement already satisfied: bcrypt>=3.1.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (3.2.0)
Requirement already satisfied: cffi>=1.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from bcrypt>=3.1.3->paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (1.15.0)
Requirement already satisfied: pycparser in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from cffi>=1.1->bcrypt>=3.1.3->paramiko>=2.4.2->docker~=5.0.0->sagemaker[local]) (2.21)
Building wheels for collected packages: sagemaker
Building wheel for sagemaker (setup.py) ... done
Created wheel for sagemaker: filename=sagemaker-2.99.0-py2.py3-none-any.whl size=756462 sha256=309b5159cfb7f5c739c6159b8bf309bfa7ce28d2ca402296e824f3e84bc837c1
Stored in directory: /home/ec2-user/.cache/pip/wheels/fc/df/14/14b7871f4cf108cfe8891338510d97e28cfe2da00f37114fcf
Successfully built sagemaker
Installing collected packages: botocore, sagemaker
Attempting uninstall: botocore
Found existing installation: botocore 1.24.19
Uninstalling botocore-1.24.19:
Successfully uninstalled botocore-1.24.19
Attempting uninstall: sagemaker
Found existing installation: sagemaker 2.86.2
Uninstalling sagemaker-2.86.2:
Successfully uninstalled sagemaker-2.86.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.22.97 requires botocore==1.24.42, but you have botocore 1.24.46 which is incompatible.
aiobotocore 2.0.1 requires botocore<1.22.9,>=1.22.8, but you have botocore 1.24.46 which is incompatible.
Successfully installed botocore-1.24.46 sagemaker-2.99.0
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.
Instructions for SageMaker Studio environment
Note that SageMaker local mode will not work in the SageMaker Studio environment, as Studio does not have the Docker service installed on the provided instances.
Create SageMaker local session
A SageMaker local session is required for working in a local environment. Let's create it.
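The cell that produced the output below is not shown above; a minimal sketch of it, assuming `sagemaker[local]` is installed, looks like this. The import and session creation are guarded so the sketch also degrades gracefully outside an AWS environment; the `local_code` setting, which keeps source files on the local filesystem instead of uploading them to S3, is optional.

```python
# Guarded so the sketch also runs where sagemaker is not installed/configured.
try:
    from sagemaker.local import LocalSession

    local_session = LocalSession()
    # optional: keep source code on the local filesystem instead of S3
    local_session.config = {"local": {"local_code": True}}
    print(local_session)
except Exception:
    local_session = None  # sagemaker not installed or AWS not configured here
```
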
<sagemaker.local.local_session.LocalSession at 0x7f80ac223910>
Prepare SageMaker training script
We will call our training script `train_and_serve.py` and place it in our workspace under the `/src` folder. We will start with a simple Hello World message. After that, we will update and complete the training script as we learn more about the SageMaker scikit-learn container environment.
script_file_name = "train_and_serve.py"
script_path = local_path + "/src"
script_file = script_path + "/" + script_file_name
print("script_file_name: ", script_file_name)
print("script_path: ", script_path)
print("script_file: ", script_file)
script_file_name: train_and_serve.py
script_path: ./datasets/2022-07-07-sagemaker-script-mode/src
script_file: ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
Now the training script.
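As a first pass, the script only prints a greeting (matching the message that appears in the training logs later in this post), so we can verify that the container picks it up and runs it. A minimal sketch that writes it to the workspace path computed above:

```python
from pathlib import Path

# path computed in the previous cell
script_file = "./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py"

# first version of the training script: just print a greeting
script_body = 'print("*** Hello from the SageMaker script mode***")\n'

Path(script_file).parent.mkdir(parents=True, exist_ok=True)
Path(script_file).write_text(script_body)
print("wrote", script_file)
```
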
Prepare SageMaker SKLearn estimator
To create the SKLearn estimator object, we need to pass it the following items:

- `entry_point (str)`: path (absolute or relative) to the Python source file that should be executed as the entry point to training
- `framework_version (str)`: scikit-learn version to use for executing your model training code
- `role (str)`: an AWS IAM role (either name or full ARN)
- `instance_type (str)`: type of instance to use for training; for local mode, use the string `local`
- `instance_count (int)`: number of instances to use for training; since we will train in the local environment on a single instance, we will use `1` here

You can read more about the SKLearn estimator class in the official documentation, Scikit Learn Estimator.
Let’s find the SKLearn framework version.
Note that version number `1.0.1` has to be provided to the SKLearn estimator class as `1.0-1`. Otherwise, you will get the following error message.
ValueError: Unsupported sklearn version: 1.0.1. You may need to upgrade your SDK version (pip install -U sagemaker) for newer sklearn versions. Supported sklearn version(s): 0.20.0, 0.23-1, 1.0-1.
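The conversion for the `1.0.x` line can be sketched as replacing the dot before the patch number with a dash. This is an illustration of the specific `1.0.1` case, not a general rule (as the supported-versions list above shows, `0.20.0` keeps its dots):

```python
version = "1.0.1"  # pip-installed scikit-learn version used in this post

# for the 1.0.x line, SageMaker expects a dash before the patch number
framework_version = "-".join(version.rsplit(".", 1))
print(framework_version)  # -> 1.0-1
```
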
Now let us create the SageMaker SKLearn estimator object and pass our training script to it.
#collapse-output
from sagemaker.sklearn import SKLearn
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type="local",
framework_version="1.0-1"
)
sk_estimator.fit()
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Creating fvm7gkf0bq-algo-1-ju7k8 ...
Creating fvm7gkf0bq-algo-1-ju7k8 ... done
Attaching to fvm7gkf0bq-algo-1-ju7k8
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,041 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,045 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,054 sagemaker_sklearn_container.training INFO Invoking user training script.
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,272 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,284 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,297 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,306 sagemaker-training-toolkit INFO Invoking user script
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | Training Env:
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | {
fvm7gkf0bq-algo-1-ju7k8 | "additional_framework_parameters": {},
fvm7gkf0bq-algo-1-ju7k8 | "channel_input_dirs": {},
fvm7gkf0bq-algo-1-ju7k8 | "current_host": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 | "framework_module": "sagemaker_sklearn_container.training:main",
fvm7gkf0bq-algo-1-ju7k8 | "hosts": [
fvm7gkf0bq-algo-1-ju7k8 | "algo-1-ju7k8"
fvm7gkf0bq-algo-1-ju7k8 | ],
fvm7gkf0bq-algo-1-ju7k8 | "hyperparameters": {},
fvm7gkf0bq-algo-1-ju7k8 | "input_config_dir": "/opt/ml/input/config",
fvm7gkf0bq-algo-1-ju7k8 | "input_data_config": {},
fvm7gkf0bq-algo-1-ju7k8 | "input_dir": "/opt/ml/input",
fvm7gkf0bq-algo-1-ju7k8 | "is_master": true,
fvm7gkf0bq-algo-1-ju7k8 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-22-17-814",
fvm7gkf0bq-algo-1-ju7k8 | "log_level": 20,
fvm7gkf0bq-algo-1-ju7k8 | "master_hostname": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 | "model_dir": "/opt/ml/model",
fvm7gkf0bq-algo-1-ju7k8 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz",
fvm7gkf0bq-algo-1-ju7k8 | "module_name": "train_and_serve",
fvm7gkf0bq-algo-1-ju7k8 | "network_interface_name": "eth0",
fvm7gkf0bq-algo-1-ju7k8 | "num_cpus": 2,
fvm7gkf0bq-algo-1-ju7k8 | "num_gpus": 0,
fvm7gkf0bq-algo-1-ju7k8 | "output_data_dir": "/opt/ml/output/data",
fvm7gkf0bq-algo-1-ju7k8 | "output_dir": "/opt/ml/output",
fvm7gkf0bq-algo-1-ju7k8 | "output_intermediate_dir": "/opt/ml/output/intermediate",
fvm7gkf0bq-algo-1-ju7k8 | "resource_config": {
fvm7gkf0bq-algo-1-ju7k8 | "current_host": "algo-1-ju7k8",
fvm7gkf0bq-algo-1-ju7k8 | "hosts": [
fvm7gkf0bq-algo-1-ju7k8 | "algo-1-ju7k8"
fvm7gkf0bq-algo-1-ju7k8 | ]
fvm7gkf0bq-algo-1-ju7k8 | },
fvm7gkf0bq-algo-1-ju7k8 | "user_entry_point": "train_and_serve.py"
fvm7gkf0bq-algo-1-ju7k8 | }
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | Environment variables:
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | SM_HOSTS=["algo-1-ju7k8"]
fvm7gkf0bq-algo-1-ju7k8 | SM_NETWORK_INTERFACE_NAME=eth0
fvm7gkf0bq-algo-1-ju7k8 | SM_HPS={}
fvm7gkf0bq-algo-1-ju7k8 | SM_USER_ENTRY_POINT=train_and_serve.py
fvm7gkf0bq-algo-1-ju7k8 | SM_FRAMEWORK_PARAMS={}
fvm7gkf0bq-algo-1-ju7k8 | SM_RESOURCE_CONFIG={"current_host":"algo-1-ju7k8","hosts":["algo-1-ju7k8"]}
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_DATA_CONFIG={}
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
fvm7gkf0bq-algo-1-ju7k8 | SM_CHANNELS=[]
fvm7gkf0bq-algo-1-ju7k8 | SM_CURRENT_HOST=algo-1-ju7k8
fvm7gkf0bq-algo-1-ju7k8 | SM_MODULE_NAME=train_and_serve
fvm7gkf0bq-algo-1-ju7k8 | SM_LOG_LEVEL=20
fvm7gkf0bq-algo-1-ju7k8 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_DIR=/opt/ml/input
fvm7gkf0bq-algo-1-ju7k8 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_DIR=/opt/ml/output
fvm7gkf0bq-algo-1-ju7k8 | SM_NUM_CPUS=2
fvm7gkf0bq-algo-1-ju7k8 | SM_NUM_GPUS=0
fvm7gkf0bq-algo-1-ju7k8 | SM_MODEL_DIR=/opt/ml/model
fvm7gkf0bq-algo-1-ju7k8 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz
fvm7gkf0bq-algo-1-ju7k8 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{},"current_host":"algo-1-ju7k8","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-ju7k8"],"hyperparameters":{},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-22-17-814","log_level":20,"master_hostname":"algo-1-ju7k8","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-22-17-814/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-ju7k8","hosts":["algo-1-ju7k8"]},"user_entry_point":"train_and_serve.py"}
fvm7gkf0bq-algo-1-ju7k8 | SM_USER_ARGS=[]
fvm7gkf0bq-algo-1-ju7k8 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
fvm7gkf0bq-algo-1-ju7k8 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | Invoking script with the following command:
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | /miniconda3/bin/python train_and_serve.py
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 |
fvm7gkf0bq-algo-1-ju7k8 | *** Hello from the SageMaker script mode***
fvm7gkf0bq-algo-1-ju7k8 | 2022-07-17 15:23:43,332 sagemaker-containers INFO Reporting training SUCCESS
fvm7gkf0bq-algo-1-ju7k8 exited with code 0
Aborting on container exit...
===== Job Complete =====
##
# The estimator will pick a local session when we use instance_type='local'
sk_estimator.sagemaker_session
<sagemaker.local.local_session.LocalSession at 0x7f80ac53da90>
When you first run the SKLearn estimator, execution may take some time because it has to download the scikit-learn container image to the local Docker environment. You will get the container logs in the output when the container completes execution. The logs show that the container successfully ran the training script, and the hello message was printed. But there is a lot more information available in the logs, which we will discuss in the coming section.
Understanding SKLearn container output and environment variables
From the SKLearn estimator output, we can see that our train_and_serve.py
script is executed by the container with the following command.
/miniconda3/bin/python train_and_serve.py
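Inside the container, the settings shown in the log are also exposed to the training script as `SM_*` environment variables. A sketch of how a script might read a few of them; the fallback defaults here are an assumption for running the script outside SageMaker (e.g. during local debugging), not something the container needs:

```python
import os

# values injected by the SageMaker container; the defaults only matter
# when the script runs outside SageMaker
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
output_data_dir = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
num_cpus = int(os.environ.get("SM_NUM_CPUS", "1"))

print(model_dir, output_data_dir, num_cpus)
```
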
Inspecting SageMaker SKLearn docker image
Since the container was executed in the local environment, we can also inspect the SageMaker SKLearn local image.
REPOSITORY TAG IMAGE ID CREATED SIZE
683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn 1.0-1-cpu-py3 8a6ea8272ad0 10 days ago 3.7GB
Let’s also inspect the docker image. Notice multiple container environment variables and their default values in the output.
[
{
"Id": "sha256:8a6ea8272ad003ec816569b0f879b16c770116584301161565f065aadb99436c",
"RepoTags": [
"683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.0-1-cpu-py3"
],
"RepoDigests": [
"683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn@sha256:fc8c3a617ff0e436c25f3b64d03e1f485f1d159478c26757f3d1d267fc849445"
],
"Parent": "",
"Comment": "",
"Created": "2022-07-06T18:55:02.854297671Z",
"Container": "11b9a5fec2d61294aee63e549100ed18ceb7aa0de6a4ff198da2f556dfe3ec2f",
"ContainerConfig": {
"Hostname": "11b9a5fec2d6",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"PYTHONDONTWRITEBYTECODE=1",
"PYTHONUNBUFFERED=1",
"PYTHONIOENCODING=UTF-8",
"LANG=C.UTF-8",
"LC_ALL=C.UTF-8",
"SAGEMAKER_SKLEARN_VERSION=1.0-1",
"SAGEMAKER_TRAINING_MODULE=sagemaker_sklearn_container.training:main",
"SAGEMAKER_SERVING_MODULE=sagemaker_sklearn_container.serving:main",
"SKLEARN_MMS_CONFIG=/home/model-server/config.properties",
"SM_INPUT=/opt/ml/input",
"SM_INPUT_TRAINING_CONFIG_FILE=/opt/ml/input/config/hyperparameters.json",
"SM_INPUT_DATA_CONFIG_FILE=/opt/ml/input/config/inputdataconfig.json",
"SM_CHECKPOINT_CONFIG_FILE=/opt/ml/input/config/checkpointconfig.json",
"SM_MODEL_DIR=/opt/ml/model",
"TEMP=/home/model-server/tmp"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"LABEL transform_id=9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
],
"Image": "sha256:58b15b990d550868caed6f885423deee97a6c7f525c228a043096bf28e775d18",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": {
"TRANSFORM_TYPE": "Aggregate-1.0",
"VERSION_SET_NAME": "SMFrameworksSKLearn/release-cdk",
"VERSION_SET_REVISION": "6086988568",
"com.amazonaws.sagemaker.capabilities.accept-bind-to-port": "true",
"com.amazonaws.sagemaker.capabilities.multi-models": "true",
"transform_id": "9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
}
},
"DockerVersion": "20.10.15",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"PYTHONDONTWRITEBYTECODE=1",
"PYTHONUNBUFFERED=1",
"PYTHONIOENCODING=UTF-8",
"LANG=C.UTF-8",
"LC_ALL=C.UTF-8",
"SAGEMAKER_SKLEARN_VERSION=1.0-1",
"SAGEMAKER_TRAINING_MODULE=sagemaker_sklearn_container.training:main",
"SAGEMAKER_SERVING_MODULE=sagemaker_sklearn_container.serving:main",
"SKLEARN_MMS_CONFIG=/home/model-server/config.properties",
"SM_INPUT=/opt/ml/input",
"SM_INPUT_TRAINING_CONFIG_FILE=/opt/ml/input/config/hyperparameters.json",
"SM_INPUT_DATA_CONFIG_FILE=/opt/ml/input/config/inputdataconfig.json",
"SM_CHECKPOINT_CONFIG_FILE=/opt/ml/input/config/checkpointconfig.json",
"SM_MODEL_DIR=/opt/ml/model",
"TEMP=/home/model-server/tmp"
],
"Cmd": [
"bash"
],
"Image": "sha256:58b15b990d550868caed6f885423deee97a6c7f525c228a043096bf28e775d18",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": {
"TRANSFORM_TYPE": "Aggregate-1.0",
"VERSION_SET_NAME": "SMFrameworksSKLearn/release-cdk",
"VERSION_SET_REVISION": "6086988568",
"com.amazonaws.sagemaker.capabilities.accept-bind-to-port": "true",
"com.amazonaws.sagemaker.capabilities.multi-models": "true",
"transform_id": "9be8b540-703b-4ecd-a127-c37333a0dcec_sagemaker-scikit-learn-1_0"
}
},
"Architecture": "amd64",
"Os": "linux",
"Size": 3699696670,
"VirtualSize": 3699696670,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/01a97258168fa360e9f6aa63ac0c6b2417c0ea0ebe888123edad87eb4a646765/diff:/var/lib/docker/overlay2/3b85b71e8fe52c7a27ae71ed492ff72c7e430cccdeea17046e2a361e8d7fd960/diff:/var/lib/docker/overlay2/7de8e16dd696c868ffd028a3ba1f1a80ef04237b9323229e578bc5e3aa6a29d7/diff:/var/lib/docker/overlay2/5eeb27014ab7ac7a894efdbb166d8a87fb9d4b8b739eccd82546ad6a2b53aa70/diff:/var/lib/docker/overlay2/bbd9a81a7aa5bf4c79e81ecf47670a3f8c098eee9c6682f36f88ec52db8e1946/diff:/var/lib/docker/overlay2/eb0e7f3a5bd45c1d611e4c37ba641d1e978043954312da5908fd4003c41c7e7d/diff:/var/lib/docker/overlay2/3daaedc78711e353befc51544a944ad35954327325d056094f445502bf65ce53/diff:/var/lib/docker/overlay2/9dd41e3edfb9d8f852732a968a7b179ca811e0f9d55614a0b193de753fc6aca0/diff:/var/lib/docker/overlay2/ede189a574c79eebc565041a44ebf8b586247a36a99fe3ff9588b8c940783498/diff:/var/lib/docker/overlay2/6b1d78a9c074a42d78650406b90b7b4f51eb31660a7b1e2dcc6d73cc43d29b6b/diff:/var/lib/docker/overlay2/3e0420f6740f876c9355d526cbdedd9ebde5be94ddf0d93d7dadd4f34cae351b/diff:/var/lib/docker/overlay2/de1a2da7ee1b5d9a1b4e5c3dd1adff213185dde7e1212db96c0435e512f50701/diff:/var/lib/docker/overlay2/bebca69aef394f0553634413c7875eb58228c7e6359a305a7501705e75c2b58b/diff:/var/lib/docker/overlay2/8a410db2a038a175ee6ddfb005383f8776c80b1b1901f5d2feedfc8d837ffa40/diff:/var/lib/docker/overlay2/6f6686a8cb3ccf47b214854717cbe33ba777e0985200e3d7b7f761f99231b274/diff:/var/lib/docker/overlay2/ad8b24fa9173d28a83284e4f31d830f1b3d9fe30a3fcc8cbb37895ec2fded7bf/diff:/var/lib/docker/overlay2/e8b0842f0da5b0dbb5076e350bfe1a70ef291546bbbf207fe1f90ae7ccd64517/diff",
"MergedDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/merged",
"UpperDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/diff",
"WorkDir": "/var/lib/docker/overlay2/632d2d4d01646bd8be2ec147edc70eb44f59fb262aa12b217fd560c464edd4cb/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:1dc52a6b4de8561423dd3ec5a1f7f77f5309fd8cb340f80b8bc3d87fa112003e",
"sha256:b13a10ce059365d68a2113e9dbcac05b17b51f181615fca6d717a0dcf9ba8ffb",
"sha256:790d00cf365a312488151b354f0b0ae826be031edffb8a4de6a1fab048774dc7",
"sha256:323e43c53a1cd5abbd55437588f19da04f716452bc6d05486759b35f3e485390",
"sha256:c99c9d462af0bac5511ed046178ab0de79b8cdad33cd85246e9f661e098426cd",
"sha256:4a3a4d9fb4d250b1b64629b23bc0a477a45ee2659a8410d59a31a181dad70002",
"sha256:27b35f432a27e5e275038e559ebbe1aa7e91447bf417f5da01e3326739ba9366",
"sha256:ee12325fe0b7e7930b76d9a3dc81fcc37fa51a3267b311d2ed7c38703f193d75",
"sha256:7ceb40593535cdc07299efa2ce3a2c2267c2fa683161515fd6ab97f733492bf0",
"sha256:f18dbe0eec054f0aedf54a94aa29dab0d2c0f3d920fb482c99819622b0094f47",
"sha256:df2a7845ea611463f9f3282ccb45156ba883f40b15013ee49bd0a569301738d8",
"sha256:bcbd5416b87e3e37e05c22e46cbff2e3503d9caa0ec283a44931dc63e51c8cb7",
"sha256:5bcbb3ccae766c8a72d98ce494500bfd44c32e5780a1cb153139a4c5c143a8d5",
"sha256:4ecc8a8ffa902f3ea9bebb8d610e02a32ce1ca94c1a3160a31da98b73c1f55a0",
"sha256:a7a7b8b26735eb2d137fd0f91b83c73ad48cf2c4b83e9d0cadece410d6e598ba",
"sha256:ae939a0c9d32674ad6674947853ecfda4ff0530a8137960064448ae5e45fa1c5",
"sha256:6948f39c8f3cf6ec104734ccd1112fcb4af85a7c26c9c3d43495494b9b799f25",
"sha256:affd18c8e88f35e75bd02158e0418f3aeb4eec4269a208ede24cc829fa88c850"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
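The Env list buried in this JSON can also be pulled out programmatically from docker inspect output. A minimal sketch, where image_env is a hypothetical helper and the sample string is a trimmed-down stand-in for the real output:

```python
import json

def image_env(inspect_json: str):
    # `docker inspect <image>` emits a JSON array with one object per image;
    # the environment variables baked into the image live under Config.Env
    data = json.loads(inspect_json)
    return data[0]["Config"]["Env"]

# Trimmed-down stand-in for real `docker inspect` output
sample = '[{"Config": {"Env": ["SM_MODEL_DIR=/opt/ml/model", "SM_INPUT=/opt/ml/input"]}}]'
print(image_env(sample))  # ['SM_MODEL_DIR=/opt/ml/model', 'SM_INPUT=/opt/ml/input']
```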
Pass hyperparameters to SKLearn estimator
Let’s pass some dummy hyperparameters to the estimator and see how they affect the output.
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)
sk_estimator.fit()
Creating kc4ahx6e84-algo-1-8m8ve ...
Creating kc4ahx6e84-algo-1-8m8ve ... done
Attaching to kc4ahx6e84-algo-1-8m8ve
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,385 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,389 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,398 sagemaker_sklearn_container.training INFO Invoking user training script.
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,595 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,608 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,621 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,630 sagemaker-training-toolkit INFO Invoking user script
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | Training Env:
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | {
kc4ahx6e84-algo-1-8m8ve | "additional_framework_parameters": {},
kc4ahx6e84-algo-1-8m8ve | "channel_input_dirs": {},
kc4ahx6e84-algo-1-8m8ve | "current_host": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve | "framework_module": "sagemaker_sklearn_container.training:main",
kc4ahx6e84-algo-1-8m8ve | "hosts": [
kc4ahx6e84-algo-1-8m8ve | "algo-1-8m8ve"
kc4ahx6e84-algo-1-8m8ve | ],
kc4ahx6e84-algo-1-8m8ve | "hyperparameters": {
kc4ahx6e84-algo-1-8m8ve | "dummy_param_1": "val1",
kc4ahx6e84-algo-1-8m8ve | "dummy_param_2": "val2"
kc4ahx6e84-algo-1-8m8ve | },
kc4ahx6e84-algo-1-8m8ve | "input_config_dir": "/opt/ml/input/config",
kc4ahx6e84-algo-1-8m8ve | "input_data_config": {},
kc4ahx6e84-algo-1-8m8ve | "input_dir": "/opt/ml/input",
kc4ahx6e84-algo-1-8m8ve | "is_master": true,
kc4ahx6e84-algo-1-8m8ve | "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-44-284",
kc4ahx6e84-algo-1-8m8ve | "log_level": 20,
kc4ahx6e84-algo-1-8m8ve | "master_hostname": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve | "model_dir": "/opt/ml/model",
kc4ahx6e84-algo-1-8m8ve | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz",
kc4ahx6e84-algo-1-8m8ve | "module_name": "train_and_serve",
kc4ahx6e84-algo-1-8m8ve | "network_interface_name": "eth0",
kc4ahx6e84-algo-1-8m8ve | "num_cpus": 2,
kc4ahx6e84-algo-1-8m8ve | "num_gpus": 0,
kc4ahx6e84-algo-1-8m8ve | "output_data_dir": "/opt/ml/output/data",
kc4ahx6e84-algo-1-8m8ve | "output_dir": "/opt/ml/output",
kc4ahx6e84-algo-1-8m8ve | "output_intermediate_dir": "/opt/ml/output/intermediate",
kc4ahx6e84-algo-1-8m8ve | "resource_config": {
kc4ahx6e84-algo-1-8m8ve | "current_host": "algo-1-8m8ve",
kc4ahx6e84-algo-1-8m8ve | "hosts": [
kc4ahx6e84-algo-1-8m8ve | "algo-1-8m8ve"
kc4ahx6e84-algo-1-8m8ve | ]
kc4ahx6e84-algo-1-8m8ve | },
kc4ahx6e84-algo-1-8m8ve | "user_entry_point": "train_and_serve.py"
kc4ahx6e84-algo-1-8m8ve | }
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | Environment variables:
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | SM_HOSTS=["algo-1-8m8ve"]
kc4ahx6e84-algo-1-8m8ve | SM_NETWORK_INTERFACE_NAME=eth0
kc4ahx6e84-algo-1-8m8ve | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
kc4ahx6e84-algo-1-8m8ve | SM_USER_ENTRY_POINT=train_and_serve.py
kc4ahx6e84-algo-1-8m8ve | SM_FRAMEWORK_PARAMS={}
kc4ahx6e84-algo-1-8m8ve | SM_RESOURCE_CONFIG={"current_host":"algo-1-8m8ve","hosts":["algo-1-8m8ve"]}
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_DATA_CONFIG={}
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
kc4ahx6e84-algo-1-8m8ve | SM_CHANNELS=[]
kc4ahx6e84-algo-1-8m8ve | SM_CURRENT_HOST=algo-1-8m8ve
kc4ahx6e84-algo-1-8m8ve | SM_MODULE_NAME=train_and_serve
kc4ahx6e84-algo-1-8m8ve | SM_LOG_LEVEL=20
kc4ahx6e84-algo-1-8m8ve | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_DIR=/opt/ml/input
kc4ahx6e84-algo-1-8m8ve | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_DIR=/opt/ml/output
kc4ahx6e84-algo-1-8m8ve | SM_NUM_CPUS=2
kc4ahx6e84-algo-1-8m8ve | SM_NUM_GPUS=0
kc4ahx6e84-algo-1-8m8ve | SM_MODEL_DIR=/opt/ml/model
kc4ahx6e84-algo-1-8m8ve | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz
kc4ahx6e84-algo-1-8m8ve | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{},"current_host":"algo-1-8m8ve","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-8m8ve"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-44-284","log_level":20,"master_hostname":"algo-1-8m8ve","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-44-284/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-8m8ve","hosts":["algo-1-8m8ve"]},"user_entry_point":"train_and_serve.py"}
kc4ahx6e84-algo-1-8m8ve | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
kc4ahx6e84-algo-1-8m8ve | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
kc4ahx6e84-algo-1-8m8ve | SM_HP_DUMMY_PARAM_1=val1
kc4ahx6e84-algo-1-8m8ve | SM_HP_DUMMY_PARAM_2=val2
kc4ahx6e84-algo-1-8m8ve | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | Invoking script with the following command:
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve |
kc4ahx6e84-algo-1-8m8ve | *** Hello from the SageMaker script mode***
kc4ahx6e84-algo-1-8m8ve | 2022-07-17 15:23:46,657 sagemaker-containers INFO Reporting training SUCCESS
kc4ahx6e84-algo-1-8m8ve exited with code 0
Aborting on container exit...
===== Job Complete =====
From the output, we can see that our hyperparameters are passed to the training script as command-line arguments. This is an important point, and we will update our script to take advantage of it.
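Since each hyperparameter arrives as a --name value pair, the training script can read them with argparse. A minimal sketch using the dummy parameter names from above (parse_hyperparameters is a hypothetical helper):

```python
import argparse

def parse_hyperparameters(argv=None):
    # SageMaker invokes the script as: python train_and_serve.py --dummy_param_1 val1 ...
    parser = argparse.ArgumentParser()
    parser.add_argument("--dummy_param_1", type=str, default=None)
    parser.add_argument("--dummy_param_2", type=str, default=None)
    # parse_known_args tolerates any extra arguments we did not declare
    args, _ = parser.parse_known_args(argv)
    return args

args = parse_hyperparameters(["--dummy_param_1", "val1", "--dummy_param_2", "val2"])
print(args.dummy_param_1)  # val1
```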
SageMaker SKLearn container environment variables
Let’s now discuss some important environment variables we see in the output.
SM_MODULE_DIR
SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-13-13-05-48-675/source/sourcedir.tar.gz
SM_MODULE_DIR
points to the location in the S3 bucket where SageMaker automatically backs up our source code for that particular run. SageMaker creates a separate folder in the default bucket for each new run. The default value is s3://sagemaker-{aws-region}-{aws-id}/{training-job-name}/source/sourcedir.tar.gz
Note: We have used local mode (instance_type='local') for the SKLearn estimator, so why is the source code backed up to an S3 bucket? Shouldn’t it stay on the local system, bypassing S3 altogether? That would arguably be the expected behavior, but the SageMaker SDK does otherwise: even in local mode, it uses an S3 bucket to keep the source code. You can read more about this behavior in the issue ticket Model repack always uploads data to S3 bucket regardless of local mode settings
SM_MODEL_DIR
SM_MODEL_DIR=/opt/ml/model
SM_MODEL_DIR
points to a directory located inside the container. When the training job finishes, the container and its file system will be deleted, except for the /opt/ml/model
and /opt/ml/output
directories. Use /opt/ml/model
to save the trained model artifacts. These artifacts are uploaded to S3 for model hosting.
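A sketch of how a training script might persist a fitted model under SM_MODEL_DIR. In real scikit-learn scripts the model is typically saved with joblib; here the standard-library pickle keeps the example self-contained, and save_model and the model.pkl filename are arbitrary choices:

```python
import os
import pickle

def save_model(model, model_dir=None):
    # Everything written under SM_MODEL_DIR is uploaded to S3 when training ends
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path
```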
SM_OUTPUT_DATA_DIR
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_OUTPUT_DATA_DIR
points to a directory in the container for writing output artifacts. Output artifacts may include checkpoints, graphs, and other files to save, but not model artifacts. These artifacts are compressed and uploaded to S3 under the same prefix as the model artifacts.
SM_CHANNELS
SM_CHANNELS='["testing","training"]'
A channel is a named input source that training algorithms can consume. You can partition your training data into different logical “channels” when you run training. Depending on your problem, some common channel ideas are: “training”, “testing”, “evaluation” or “images” and “labels”. You can read more about the channels from SageMaker API reference Channel
SM_CHANNEL_{channel_name}
SM_CHANNEL_TRAIN='/opt/ml/input/data/train'
SM_CHANNEL_TEST='/opt/ml/input/data/test'
Suppose you have passed two input channels, ‘train’ and ‘test’, to the Scikit-learn estimator’s fit()
method. Then the following variables are set, following the format SM_CHANNEL_{channel_name}
: * SM_CHANNEL_TRAIN
: points to the directory in the container where the train channel data has been downloaded * SM_CHANNEL_TEST
: same as above, but for the test channel
Note that the channel names train
and test
are just conventions; you can use any names, and the environment variables will be created accordingly. It is important to know that once the SageMaker container starts executing, it automatically downloads the data from the provided input channels and makes it available in the corresponding local directories. The training script can then load the data from those container directories.
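The mapping from channel name to environment variable can be sketched as a one-liner (channel_env_var is a hypothetical helper, not part of the SDK):

```python
def channel_env_var(channel_name: str) -> str:
    # Each channel passed to fit() becomes SM_CHANNEL_<NAME>, upper-cased
    return f"SM_CHANNEL_{channel_name.upper()}"

print(channel_env_var("train"))   # SM_CHANNEL_TRAIN
print(channel_env_var("images"))  # SM_CHANNEL_IMAGES
```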
There are more environment variables available, and you can read about them from Environment variables
Pass input channel to SKLearn estimator
Now that we better understand the SKLearn container environment, let’s pass the training data channel to the estimator and see whether the data becomes available inside the container directory.
Update our script to list all the files in the SM_CHANNEL_TRAIN
directory.
%%writefile $script_file
import argparse, os, sys
if __name__ == "__main__":
print(" *** Hello from SageMaker script container *** ")
training_dir = os.environ.get("SM_CHANNEL_TRAIN")
dir_list = os.listdir(training_dir)
print("training_dir files list: ", dir_list)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)
sk_estimator.fit({"train": f"file://{local_train_path}"})
Creating wp2g5fxyg1-algo-1-o05g1 ...
Creating wp2g5fxyg1-algo-1-o05g1 ... done
Attaching to wp2g5fxyg1-algo-1-o05g1
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,444 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,447 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,456 sagemaker_sklearn_container.training INFO Invoking user training script.
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,638 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,653 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,667 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,676 sagemaker-training-toolkit INFO Invoking user script
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | Training Env:
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | {
wp2g5fxyg1-algo-1-o05g1 | "additional_framework_parameters": {},
wp2g5fxyg1-algo-1-o05g1 | "channel_input_dirs": {
wp2g5fxyg1-algo-1-o05g1 | "train": "/opt/ml/input/data/train"
wp2g5fxyg1-algo-1-o05g1 | },
wp2g5fxyg1-algo-1-o05g1 | "current_host": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 | "framework_module": "sagemaker_sklearn_container.training:main",
wp2g5fxyg1-algo-1-o05g1 | "hosts": [
wp2g5fxyg1-algo-1-o05g1 | "algo-1-o05g1"
wp2g5fxyg1-algo-1-o05g1 | ],
wp2g5fxyg1-algo-1-o05g1 | "hyperparameters": {
wp2g5fxyg1-algo-1-o05g1 | "dummy_param_1": "val1",
wp2g5fxyg1-algo-1-o05g1 | "dummy_param_2": "val2"
wp2g5fxyg1-algo-1-o05g1 | },
wp2g5fxyg1-algo-1-o05g1 | "input_config_dir": "/opt/ml/input/config",
wp2g5fxyg1-algo-1-o05g1 | "input_data_config": {
wp2g5fxyg1-algo-1-o05g1 | "train": {
wp2g5fxyg1-algo-1-o05g1 | "TrainingInputMode": "File"
wp2g5fxyg1-algo-1-o05g1 | }
wp2g5fxyg1-algo-1-o05g1 | },
wp2g5fxyg1-algo-1-o05g1 | "input_dir": "/opt/ml/input",
wp2g5fxyg1-algo-1-o05g1 | "is_master": true,
wp2g5fxyg1-algo-1-o05g1 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-47-051",
wp2g5fxyg1-algo-1-o05g1 | "log_level": 20,
wp2g5fxyg1-algo-1-o05g1 | "master_hostname": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 | "model_dir": "/opt/ml/model",
wp2g5fxyg1-algo-1-o05g1 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz",
wp2g5fxyg1-algo-1-o05g1 | "module_name": "train_and_serve",
wp2g5fxyg1-algo-1-o05g1 | "network_interface_name": "eth0",
wp2g5fxyg1-algo-1-o05g1 | "num_cpus": 2,
wp2g5fxyg1-algo-1-o05g1 | "num_gpus": 0,
wp2g5fxyg1-algo-1-o05g1 | "output_data_dir": "/opt/ml/output/data",
wp2g5fxyg1-algo-1-o05g1 | "output_dir": "/opt/ml/output",
wp2g5fxyg1-algo-1-o05g1 | "output_intermediate_dir": "/opt/ml/output/intermediate",
wp2g5fxyg1-algo-1-o05g1 | "resource_config": {
wp2g5fxyg1-algo-1-o05g1 | "current_host": "algo-1-o05g1",
wp2g5fxyg1-algo-1-o05g1 | "hosts": [
wp2g5fxyg1-algo-1-o05g1 | "algo-1-o05g1"
wp2g5fxyg1-algo-1-o05g1 | ]
wp2g5fxyg1-algo-1-o05g1 | },
wp2g5fxyg1-algo-1-o05g1 | "user_entry_point": "train_and_serve.py"
wp2g5fxyg1-algo-1-o05g1 | }
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | Environment variables:
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | SM_HOSTS=["algo-1-o05g1"]
wp2g5fxyg1-algo-1-o05g1 | SM_NETWORK_INTERFACE_NAME=eth0
wp2g5fxyg1-algo-1-o05g1 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
wp2g5fxyg1-algo-1-o05g1 | SM_USER_ENTRY_POINT=train_and_serve.py
wp2g5fxyg1-algo-1-o05g1 | SM_FRAMEWORK_PARAMS={}
wp2g5fxyg1-algo-1-o05g1 | SM_RESOURCE_CONFIG={"current_host":"algo-1-o05g1","hosts":["algo-1-o05g1"]}
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
wp2g5fxyg1-algo-1-o05g1 | SM_CHANNELS=["train"]
wp2g5fxyg1-algo-1-o05g1 | SM_CURRENT_HOST=algo-1-o05g1
wp2g5fxyg1-algo-1-o05g1 | SM_MODULE_NAME=train_and_serve
wp2g5fxyg1-algo-1-o05g1 | SM_LOG_LEVEL=20
wp2g5fxyg1-algo-1-o05g1 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_DIR=/opt/ml/input
wp2g5fxyg1-algo-1-o05g1 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_DIR=/opt/ml/output
wp2g5fxyg1-algo-1-o05g1 | SM_NUM_CPUS=2
wp2g5fxyg1-algo-1-o05g1 | SM_NUM_GPUS=0
wp2g5fxyg1-algo-1-o05g1 | SM_MODEL_DIR=/opt/ml/model
wp2g5fxyg1-algo-1-o05g1 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz
wp2g5fxyg1-algo-1-o05g1 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-o05g1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-o05g1"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-47-051","log_level":20,"master_hostname":"algo-1-o05g1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-47-051/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-o05g1","hosts":["algo-1-o05g1"]},"user_entry_point":"train_and_serve.py"}
wp2g5fxyg1-algo-1-o05g1 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
wp2g5fxyg1-algo-1-o05g1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
wp2g5fxyg1-algo-1-o05g1 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
wp2g5fxyg1-algo-1-o05g1 | SM_HP_DUMMY_PARAM_1=val1
wp2g5fxyg1-algo-1-o05g1 | SM_HP_DUMMY_PARAM_2=val2
wp2g5fxyg1-algo-1-o05g1 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | Invoking script with the following command:
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 |
wp2g5fxyg1-algo-1-o05g1 | *** Hello from SageMaker script container ***
wp2g5fxyg1-algo-1-o05g1 | training_dir files list: ['train.csv']
wp2g5fxyg1-algo-1-o05g1 | 2022-07-17 15:23:49,715 sagemaker-containers INFO Reporting training SUCCESS
wp2g5fxyg1-algo-1-o05g1 exited with code 0
Aborting on container exit...
===== Job Complete =====
From the output, we can see that train.csv
, which was in our local environment, is now available inside the container at the path SM_CHANNEL_TRAIN=/opt/ml/input/data/train
.
Let’s also test the same with our training data on the S3 bucket.
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)
sk_estimator.fit({"train": s3_train_uri})
Creating 7ao431iiu5-algo-1-9jid1 ...
Creating 7ao431iiu5-algo-1-9jid1 ... done
Attaching to 7ao431iiu5-algo-1-9jid1
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,073 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,079 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,094 sagemaker_sklearn_container.training INFO Invoking user training script.
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,335 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,348 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,360 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,369 sagemaker-training-toolkit INFO Invoking user script
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | Training Env:
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | {
7ao431iiu5-algo-1-9jid1 | "additional_framework_parameters": {},
7ao431iiu5-algo-1-9jid1 | "channel_input_dirs": {
7ao431iiu5-algo-1-9jid1 | "train": "/opt/ml/input/data/train"
7ao431iiu5-algo-1-9jid1 | },
7ao431iiu5-algo-1-9jid1 | "current_host": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 | "framework_module": "sagemaker_sklearn_container.training:main",
7ao431iiu5-algo-1-9jid1 | "hosts": [
7ao431iiu5-algo-1-9jid1 | "algo-1-9jid1"
7ao431iiu5-algo-1-9jid1 | ],
7ao431iiu5-algo-1-9jid1 | "hyperparameters": {
7ao431iiu5-algo-1-9jid1 | "dummy_param_1": "val1",
7ao431iiu5-algo-1-9jid1 | "dummy_param_2": "val2"
7ao431iiu5-algo-1-9jid1 | },
7ao431iiu5-algo-1-9jid1 | "input_config_dir": "/opt/ml/input/config",
7ao431iiu5-algo-1-9jid1 | "input_data_config": {
7ao431iiu5-algo-1-9jid1 | "train": {
7ao431iiu5-algo-1-9jid1 | "TrainingInputMode": "File"
7ao431iiu5-algo-1-9jid1 | }
7ao431iiu5-algo-1-9jid1 | },
7ao431iiu5-algo-1-9jid1 | "input_dir": "/opt/ml/input",
7ao431iiu5-algo-1-9jid1 | "is_master": true,
7ao431iiu5-algo-1-9jid1 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-50-077",
7ao431iiu5-algo-1-9jid1 | "log_level": 20,
7ao431iiu5-algo-1-9jid1 | "master_hostname": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 | "model_dir": "/opt/ml/model",
7ao431iiu5-algo-1-9jid1 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz",
7ao431iiu5-algo-1-9jid1 | "module_name": "train_and_serve",
7ao431iiu5-algo-1-9jid1 | "network_interface_name": "eth0",
7ao431iiu5-algo-1-9jid1 | "num_cpus": 2,
7ao431iiu5-algo-1-9jid1 | "num_gpus": 0,
7ao431iiu5-algo-1-9jid1 | "output_data_dir": "/opt/ml/output/data",
7ao431iiu5-algo-1-9jid1 | "output_dir": "/opt/ml/output",
7ao431iiu5-algo-1-9jid1 | "output_intermediate_dir": "/opt/ml/output/intermediate",
7ao431iiu5-algo-1-9jid1 | "resource_config": {
7ao431iiu5-algo-1-9jid1 | "current_host": "algo-1-9jid1",
7ao431iiu5-algo-1-9jid1 | "hosts": [
7ao431iiu5-algo-1-9jid1 | "algo-1-9jid1"
7ao431iiu5-algo-1-9jid1 | ]
7ao431iiu5-algo-1-9jid1 | },
7ao431iiu5-algo-1-9jid1 | "user_entry_point": "train_and_serve.py"
7ao431iiu5-algo-1-9jid1 | }
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | Environment variables:
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | SM_HOSTS=["algo-1-9jid1"]
7ao431iiu5-algo-1-9jid1 | SM_NETWORK_INTERFACE_NAME=eth0
7ao431iiu5-algo-1-9jid1 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
7ao431iiu5-algo-1-9jid1 | SM_USER_ENTRY_POINT=train_and_serve.py
7ao431iiu5-algo-1-9jid1 | SM_FRAMEWORK_PARAMS={}
7ao431iiu5-algo-1-9jid1 | SM_RESOURCE_CONFIG={"current_host":"algo-1-9jid1","hosts":["algo-1-9jid1"]}
7ao431iiu5-algo-1-9jid1 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
7ao431iiu5-algo-1-9jid1 | SM_CHANNELS=["train"]
7ao431iiu5-algo-1-9jid1 | SM_CURRENT_HOST=algo-1-9jid1
7ao431iiu5-algo-1-9jid1 | SM_MODULE_NAME=train_and_serve
7ao431iiu5-algo-1-9jid1 | SM_LOG_LEVEL=20
7ao431iiu5-algo-1-9jid1 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
7ao431iiu5-algo-1-9jid1 | SM_INPUT_DIR=/opt/ml/input
7ao431iiu5-algo-1-9jid1 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_DIR=/opt/ml/output
7ao431iiu5-algo-1-9jid1 | SM_NUM_CPUS=2
7ao431iiu5-algo-1-9jid1 | SM_NUM_GPUS=0
7ao431iiu5-algo-1-9jid1 | SM_MODEL_DIR=/opt/ml/model
7ao431iiu5-algo-1-9jid1 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz
7ao431iiu5-algo-1-9jid1 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-9jid1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-9jid1"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-50-077","log_level":20,"master_hostname":"algo-1-9jid1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-50-077/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-9jid1","hosts":["algo-1-9jid1"]},"user_entry_point":"train_and_serve.py"}
7ao431iiu5-algo-1-9jid1 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
7ao431iiu5-algo-1-9jid1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
7ao431iiu5-algo-1-9jid1 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
7ao431iiu5-algo-1-9jid1 | SM_HP_DUMMY_PARAM_1=val1
7ao431iiu5-algo-1-9jid1 | SM_HP_DUMMY_PARAM_2=val2
7ao431iiu5-algo-1-9jid1 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | Invoking script with the following command:
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 |
7ao431iiu5-algo-1-9jid1 | *** Hello from SageMaker script container ***
7ao431iiu5-algo-1-9jid1 | training_dir files list: ['train.csv']
7ao431iiu5-algo-1-9jid1 | 2022-07-17 15:23:53,409 sagemaker-containers INFO Reporting training SUCCESS
7ao431iiu5-algo-1-9jid1 exited with code 0
Aborting on container exit...
===== Job Complete =====
Again the results are the same. SageMaker downloads the data from the S3 bucket and makes it available inside the container. From the environment variables section we also learned that two directories are special: /opt/ml/model and /opt/ml/output/data. The container environment variables SM_MODEL_DIR and SM_OUTPUT_DATA_DIR point to them, respectively. Whatever artifacts we put in them are uploaded to the S3 bucket when the training job finishes. SM_MODEL_DIR is for trained models, and SM_OUTPUT_DATA_DIR is for other artifacts like logs, graphs, plots, and results. Let's update our training script to put some dummy data in these directories. Once the job is complete, we will verify the stored artifacts on the S3 bucket.
%%writefile $script_file
import argparse, os, sys

if __name__ == "__main__":
    print(" *** Hello from SageMaker script container *** ")

    # list files in SM_CHANNEL_TRAIN
    training_dir = os.environ.get("SM_CHANNEL_TRAIN")
    dir_list = os.listdir(training_dir)
    print("training_dir files list: ", dir_list)

    # write dummy model file to SM_MODEL_DIR
    sm_model_dir = os.environ.get("SM_MODEL_DIR")
    with open(f"{sm_model_dir}/dummy-model.txt", "w") as f:
        f.write("this is a dummy model")

    # write dummy artifact file to SM_OUTPUT_DATA_DIR
    sm_output_data_dir = os.environ.get("SM_OUTPUT_DATA_DIR")
    with open(f"{sm_output_data_dir}/dummy-output-data.txt", "w") as f:
        f.write("this is a dummy output data")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"dummy_param_1":"val1","dummy_param_2":"val2"},
)
sk_estimator.fit({"train": s3_train_uri})
Creating c30093mavu-algo-1-p87y9 ...
Creating c30093mavu-algo-1-p87y9 ... done
Attaching to c30093mavu-algo-1-p87y9
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,051 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,055 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,065 sagemaker_sklearn_container.training INFO Invoking user training script.
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,251 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,267 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,281 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,291 sagemaker-training-toolkit INFO Invoking user script
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | Training Env:
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | {
c30093mavu-algo-1-p87y9 | "additional_framework_parameters": {},
c30093mavu-algo-1-p87y9 | "channel_input_dirs": {
c30093mavu-algo-1-p87y9 | "train": "/opt/ml/input/data/train"
c30093mavu-algo-1-p87y9 | },
c30093mavu-algo-1-p87y9 | "current_host": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 | "framework_module": "sagemaker_sklearn_container.training:main",
c30093mavu-algo-1-p87y9 | "hosts": [
c30093mavu-algo-1-p87y9 | "algo-1-p87y9"
c30093mavu-algo-1-p87y9 | ],
c30093mavu-algo-1-p87y9 | "hyperparameters": {
c30093mavu-algo-1-p87y9 | "dummy_param_1": "val1",
c30093mavu-algo-1-p87y9 | "dummy_param_2": "val2"
c30093mavu-algo-1-p87y9 | },
c30093mavu-algo-1-p87y9 | "input_config_dir": "/opt/ml/input/config",
c30093mavu-algo-1-p87y9 | "input_data_config": {
c30093mavu-algo-1-p87y9 | "train": {
c30093mavu-algo-1-p87y9 | "TrainingInputMode": "File"
c30093mavu-algo-1-p87y9 | }
c30093mavu-algo-1-p87y9 | },
c30093mavu-algo-1-p87y9 | "input_dir": "/opt/ml/input",
c30093mavu-algo-1-p87y9 | "is_master": true,
c30093mavu-algo-1-p87y9 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-23-53-775",
c30093mavu-algo-1-p87y9 | "log_level": 20,
c30093mavu-algo-1-p87y9 | "master_hostname": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 | "model_dir": "/opt/ml/model",
c30093mavu-algo-1-p87y9 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz",
c30093mavu-algo-1-p87y9 | "module_name": "train_and_serve",
c30093mavu-algo-1-p87y9 | "network_interface_name": "eth0",
c30093mavu-algo-1-p87y9 | "num_cpus": 2,
c30093mavu-algo-1-p87y9 | "num_gpus": 0,
c30093mavu-algo-1-p87y9 | "output_data_dir": "/opt/ml/output/data",
c30093mavu-algo-1-p87y9 | "output_dir": "/opt/ml/output",
c30093mavu-algo-1-p87y9 | "output_intermediate_dir": "/opt/ml/output/intermediate",
c30093mavu-algo-1-p87y9 | "resource_config": {
c30093mavu-algo-1-p87y9 | "current_host": "algo-1-p87y9",
c30093mavu-algo-1-p87y9 | "hosts": [
c30093mavu-algo-1-p87y9 | "algo-1-p87y9"
c30093mavu-algo-1-p87y9 | ]
c30093mavu-algo-1-p87y9 | },
c30093mavu-algo-1-p87y9 | "user_entry_point": "train_and_serve.py"
c30093mavu-algo-1-p87y9 | }
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | Environment variables:
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | SM_HOSTS=["algo-1-p87y9"]
c30093mavu-algo-1-p87y9 | SM_NETWORK_INTERFACE_NAME=eth0
c30093mavu-algo-1-p87y9 | SM_HPS={"dummy_param_1":"val1","dummy_param_2":"val2"}
c30093mavu-algo-1-p87y9 | SM_USER_ENTRY_POINT=train_and_serve.py
c30093mavu-algo-1-p87y9 | SM_FRAMEWORK_PARAMS={}
c30093mavu-algo-1-p87y9 | SM_RESOURCE_CONFIG={"current_host":"algo-1-p87y9","hosts":["algo-1-p87y9"]}
c30093mavu-algo-1-p87y9 | SM_INPUT_DATA_CONFIG={"train":{"TrainingInputMode":"File"}}
c30093mavu-algo-1-p87y9 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
c30093mavu-algo-1-p87y9 | SM_CHANNELS=["train"]
c30093mavu-algo-1-p87y9 | SM_CURRENT_HOST=algo-1-p87y9
c30093mavu-algo-1-p87y9 | SM_MODULE_NAME=train_and_serve
c30093mavu-algo-1-p87y9 | SM_LOG_LEVEL=20
c30093mavu-algo-1-p87y9 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
c30093mavu-algo-1-p87y9 | SM_INPUT_DIR=/opt/ml/input
c30093mavu-algo-1-p87y9 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
c30093mavu-algo-1-p87y9 | SM_OUTPUT_DIR=/opt/ml/output
c30093mavu-algo-1-p87y9 | SM_NUM_CPUS=2
c30093mavu-algo-1-p87y9 | SM_NUM_GPUS=0
c30093mavu-algo-1-p87y9 | SM_MODEL_DIR=/opt/ml/model
c30093mavu-algo-1-p87y9 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz
c30093mavu-algo-1-p87y9 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1-p87y9","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-p87y9"],"hyperparameters":{"dummy_param_1":"val1","dummy_param_2":"val2"},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-23-53-775","log_level":20,"master_hostname":"algo-1-p87y9","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-p87y9","hosts":["algo-1-p87y9"]},"user_entry_point":"train_and_serve.py"}
c30093mavu-algo-1-p87y9 | SM_USER_ARGS=["--dummy_param_1","val1","--dummy_param_2","val2"]
c30093mavu-algo-1-p87y9 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
c30093mavu-algo-1-p87y9 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
c30093mavu-algo-1-p87y9 | SM_HP_DUMMY_PARAM_1=val1
c30093mavu-algo-1-p87y9 | SM_HP_DUMMY_PARAM_2=val2
c30093mavu-algo-1-p87y9 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | Invoking script with the following command:
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | /miniconda3/bin/python train_and_serve.py --dummy_param_1 val1 --dummy_param_2 val2
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 |
c30093mavu-algo-1-p87y9 | *** Hello from SageMaker script container ***
c30093mavu-algo-1-p87y9 | training_dir files list: ['train.csv']
c30093mavu-algo-1-p87y9 | 2022-07-17 15:23:56,328 sagemaker-containers INFO Reporting training SUCCESS
c30093mavu-algo-1-p87y9 exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmpuwvrle8_/algo-1-p87y9 Please remove it manually.
===== Job Complete =====
Our training job is now complete. Let us check the S3 bucket to see if our dummy model and other artifacts are present.
First, we need the S3 URIs for these artifacts. For our dummy model (from SM_MODEL_DIR), we can read the URI from the estimator's model_data attribute.
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/model.tar.gz'
Let's download model_data from S3 to a local directory for verification. For this, we create a local tmp directory to store the downloaded files.
local_tmp_path = local_path + "/tmp"
print(local_tmp_path)
# create the local '/tmp' directory
Path(local_tmp_path).mkdir(parents=True, exist_ok=True)
./datasets/2022-07-07-sagemaker-script-mode/tmp
We will use the SageMaker S3Downloader class to download the model file.
The file is downloaded. Let's uncompress it to verify the model file.
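The uncompress step can be sketched with Python's standard tarfile module. The archive created below is a throwaway stand-in for the downloaded model.tar.gz, so the snippet runs standalone:

```python
import tarfile, tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())  # stand-in for local_tmp_path

# Create a stand-in model.tar.gz like the one downloaded from S3
(tmp / "dummy-model.txt").write_text("this is a dummy model")
with tarfile.open(tmp / "model.tar.gz", "w:gz") as tar:
    tar.add(tmp / "dummy-model.txt", arcname="dummy-model.txt")

# Uncompress to verify the model file, as done in the notebook
with tarfile.open(tmp / "model.tar.gz", "r:gz") as tar:
    print(tar.getnames())  # ['dummy-model.txt']
    tar.extractall(tmp / "extracted")
print((tmp / "extracted" / "dummy-model.txt").read_text())
```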
Yes, the “dummy-model.txt” file is present. This tells us that SageMaker will automatically upload the files from the model directory (SM_MODEL_DIR) to the S3 bucket. Let’s do the same for the output data directory (SM_OUTPUT_DATA_DIR). There is no direct way to get the S3 URI from the estimator object for the output data directory. But we can prepare it ourselves. So let’s do that next.
print("estimator.output_path: ", sk_estimator.output_path)
print("estimator.latest_training_job.name: ", sk_estimator.latest_training_job.name)
estimator.output_path: s3://sagemaker-us-east-1-801598032724/
estimator.latest_training_job.name: sagemaker-scikit-learn-2022-07-17-15-23-53-775
def get_s3_output_uri(estimator):
return estimator.output_path + estimator.latest_training_job.name
get_s3_output_uri(sk_estimator)
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775'
##
# S3 URI for output data artifacts
s3_output_uri = get_s3_output_uri(sk_estimator) + '/output.tar.gz'
s3_output_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/output.tar.gz'
##
# S3 URI for model artifact. We have already verified it.
s3_model_uri = get_s3_output_uri(sk_estimator) + '/model.tar.gz'
s3_model_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/model.tar.gz'
##
# S3 URI for source code
s3_source_uri = get_s3_output_uri(sk_estimator) + '/source/sourcedir.tar.gz'
s3_source_uri
's3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz'
Let’s download these artifacts to our local ‘/tmp’ directory for verification.
download: s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/output.tar.gz to datasets/2022-07-07-sagemaker-script-mode/tmp/output.tar.gz
download: s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-23-53-775/source/sourcedir.tar.gz to datasets/2022-07-07-sagemaker-script-mode/tmp/sourcedir.tar.gz
Summary so far
Let's summarize what we have learned so far:
* We can use SageMaker SKLearn local mode to test our code in a local environment
* The SKLearn container executes our script with the command /miniconda3/bin/python train_and_serve.py
* Hyperparameters passed to the container are forwarded to our script as command-line arguments
* Data from input channels is downloaded by the container and made available for our script to load and process
* The /opt/ml/model and /opt/ml/output/data directories are special. Anything stored in them is automatically uploaded to the S3 bucket when the job finishes. These directories are exposed through the container environment variables SM_MODEL_DIR and SM_OUTPUT_DATA_DIR, respectively. SM_MODEL_DIR should be used for trained models, and SM_OUTPUT_DATA_DIR for any other supporting artifacts.
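The hyperparameter plumbing visible in the logs (SM_HPS, SM_USER_ARGS, and the SM_HP_* variables) can be sketched in plain Python. The function names below are illustrative, not part of the SageMaker SDK:

```python
# Illustrative sketch (not SageMaker SDK code): how a hyperparameter dict
# becomes the CLI arguments and SM_HP_* environment variables seen in the logs.
def to_cli_args(hyperparameters):
    args = []
    for name, value in hyperparameters.items():
        args += [f"--{name}", str(value)]
    return args

def to_env_vars(hyperparameters):
    return {f"SM_HP_{name.upper()}": str(value)
            for name, value in hyperparameters.items()}

hps = {"dummy_param_1": "val1", "dummy_param_2": "val2"}
print(to_cli_args(hps))   # ['--dummy_param_1', 'val1', '--dummy_param_2', 'val2']
print(to_env_vars(hps))   # {'SM_HP_DUMMY_PARAM_1': 'val1', 'SM_HP_DUMMY_PARAM_2': 'val2'}
```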
Let's use this knowledge to update our script to train a RandomForestClassifier on the Iris flower dataset.
Prepare training script for RandomForestClassifier
Let's update our training script to train a scikit-learn random forest classifier model on the Iris dataset. The script reads training and testing data from the input data channel directories and trains a classifier on them. It then saves the model to the model directory and the validation results ('y_pred.csv') to the output data directory. Notice that we also parse container environment variables as command-line arguments. This makes sense for hyperparameters ('--estimators') because we know they will be passed to the script as command-line parameters. For the other environment variables (e.g. 'SM_MODEL_DIR'), we first check whether they are given as command-line arguments. If they are, we parse them to get the values; otherwise, we read their values from the environment. This lets us test the script locally from the command line without setting the environment variables.
%%writefile $script_file
import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
import joblib

if __name__ == "__main__":
    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm-model-dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))
    args, _ = parser.parse_known_args()
    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir

    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]
    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    regressor = RandomForestClassifier(n_estimators=estimators)
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)

    # Save the model
    joblib.dump(regressor, sm_model_dir + "/model.joblib")

    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
Now give proper execution rights to the script.
Let's test this script locally before passing it to the SKLearn estimator. We will invoke it from the command line and pass the required parameters, similar to how the estimator container will execute it. For testing, we need to pass four directory paths:
* sm-model-dir: a directory where our script will store the trained model. We can point it to the local tmp directory for test purposes
* sm-channel-train: the directory containing training data. We already have it as local_train_path
* sm-channel-test: the directory containing test data. We also have it as local_test_path
* sm-output-data-dir: a directory where our script will store other artifacts. We can also point it to the local tmp directory for test purposes
Once the script has run successfully, we will find the trained model file ('model.joblib') and the predictions file ('y_pred.csv') in the tmp directory.
#collapse-output
!python3 $script_file \
--sm-model-dir $local_tmp_path \
--sm-channel-train $local_train_path \
--sm-channel-test $local_test_path \
--sm-output-data-dir $local_tmp_path \
--estimators 10
command line arguments: Namespace(estimators=10, sm_channel_test='./datasets/2022-07-07-sagemaker-script-mode/test', sm_channel_train='./datasets/2022-07-07-sagemaker-script-mode/train', sm_model_dir='./datasets/2022-07-07-sagemaker-script-mode/tmp', sm_output_data_dir='./datasets/2022-07-07-sagemaker-script-mode/tmp')
training_dir: ./datasets/2022-07-07-sagemaker-script-mode/train
training_dir files list: ['train.csv']
testing_dir: ./datasets/2022-07-07-sagemaker-script-mode/test
testing_dir files list: ['test.csv']
sm_model_dir: ./datasets/2022-07-07-sagemaker-script-mode/tmp
output_data_dir: ./datasets/2022-07-07-sagemaker-script-mode/tmp
X_train.shape: (120, 4)
y_train.shape: (120,)
X_test.shape: (30, 4)
y_test.shape: (30,)
Let's check the local tmp directory for artifacts.
Now that we have tested our script and it works as expected, let's pass it to the SKLearn container.
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file,
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"estimators":10},
)
sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating aer2alr1w1-algo-1-10beq ...
Creating aer2alr1w1-algo-1-10beq ... done
Attaching to aer2alr1w1-algo-1-10beq
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,011 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,015 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,024 sagemaker_sklearn_container.training INFO Invoking user training script.
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,226 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,239 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,251 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:06,260 sagemaker-training-toolkit INFO Invoking user script
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | Training Env:
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | {
aer2alr1w1-algo-1-10beq | "additional_framework_parameters": {},
aer2alr1w1-algo-1-10beq | "channel_input_dirs": {
aer2alr1w1-algo-1-10beq | "train": "/opt/ml/input/data/train",
aer2alr1w1-algo-1-10beq | "test": "/opt/ml/input/data/test"
aer2alr1w1-algo-1-10beq | },
aer2alr1w1-algo-1-10beq | "current_host": "algo-1-10beq",
aer2alr1w1-algo-1-10beq | "framework_module": "sagemaker_sklearn_container.training:main",
aer2alr1w1-algo-1-10beq | "hosts": [
aer2alr1w1-algo-1-10beq | "algo-1-10beq"
aer2alr1w1-algo-1-10beq | ],
aer2alr1w1-algo-1-10beq | "hyperparameters": {
aer2alr1w1-algo-1-10beq | "estimators": 10
aer2alr1w1-algo-1-10beq | },
aer2alr1w1-algo-1-10beq | "input_config_dir": "/opt/ml/input/config",
aer2alr1w1-algo-1-10beq | "input_data_config": {
aer2alr1w1-algo-1-10beq | "train": {
aer2alr1w1-algo-1-10beq | "TrainingInputMode": "File"
aer2alr1w1-algo-1-10beq | },
aer2alr1w1-algo-1-10beq | "test": {
aer2alr1w1-algo-1-10beq | "TrainingInputMode": "File"
aer2alr1w1-algo-1-10beq | }
aer2alr1w1-algo-1-10beq | },
aer2alr1w1-algo-1-10beq | "input_dir": "/opt/ml/input",
aer2alr1w1-algo-1-10beq | "is_master": true,
aer2alr1w1-algo-1-10beq | "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-03-447",
aer2alr1w1-algo-1-10beq | "log_level": 20,
aer2alr1w1-algo-1-10beq | "master_hostname": "algo-1-10beq",
aer2alr1w1-algo-1-10beq | "model_dir": "/opt/ml/model",
aer2alr1w1-algo-1-10beq | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz",
aer2alr1w1-algo-1-10beq | "module_name": "train_and_serve",
aer2alr1w1-algo-1-10beq | "network_interface_name": "eth0",
aer2alr1w1-algo-1-10beq | "num_cpus": 2,
aer2alr1w1-algo-1-10beq | "num_gpus": 0,
aer2alr1w1-algo-1-10beq | "output_data_dir": "/opt/ml/output/data",
aer2alr1w1-algo-1-10beq | "output_dir": "/opt/ml/output",
aer2alr1w1-algo-1-10beq | "output_intermediate_dir": "/opt/ml/output/intermediate",
aer2alr1w1-algo-1-10beq | "resource_config": {
aer2alr1w1-algo-1-10beq | "current_host": "algo-1-10beq",
aer2alr1w1-algo-1-10beq | "hosts": [
aer2alr1w1-algo-1-10beq | "algo-1-10beq"
aer2alr1w1-algo-1-10beq | ]
aer2alr1w1-algo-1-10beq | },
aer2alr1w1-algo-1-10beq | "user_entry_point": "train_and_serve.py"
aer2alr1w1-algo-1-10beq | }
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | Environment variables:
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | SM_HOSTS=["algo-1-10beq"]
aer2alr1w1-algo-1-10beq | SM_NETWORK_INTERFACE_NAME=eth0
aer2alr1w1-algo-1-10beq | SM_HPS={"estimators":10}
aer2alr1w1-algo-1-10beq | SM_USER_ENTRY_POINT=train_and_serve.py
aer2alr1w1-algo-1-10beq | SM_FRAMEWORK_PARAMS={}
aer2alr1w1-algo-1-10beq | SM_RESOURCE_CONFIG={"current_host":"algo-1-10beq","hosts":["algo-1-10beq"]}
aer2alr1w1-algo-1-10beq | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
aer2alr1w1-algo-1-10beq | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
aer2alr1w1-algo-1-10beq | SM_CHANNELS=["test","train"]
aer2alr1w1-algo-1-10beq | SM_CURRENT_HOST=algo-1-10beq
aer2alr1w1-algo-1-10beq | SM_MODULE_NAME=train_and_serve
aer2alr1w1-algo-1-10beq | SM_LOG_LEVEL=20
aer2alr1w1-algo-1-10beq | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
aer2alr1w1-algo-1-10beq | SM_INPUT_DIR=/opt/ml/input
aer2alr1w1-algo-1-10beq | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
aer2alr1w1-algo-1-10beq | SM_OUTPUT_DIR=/opt/ml/output
aer2alr1w1-algo-1-10beq | SM_NUM_CPUS=2
aer2alr1w1-algo-1-10beq | SM_NUM_GPUS=0
aer2alr1w1-algo-1-10beq | SM_MODEL_DIR=/opt/ml/model
aer2alr1w1-algo-1-10beq | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz
aer2alr1w1-algo-1-10beq | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-10beq","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-10beq"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-03-447","log_level":20,"master_hostname":"algo-1-10beq","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-03-447/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-10beq","hosts":["algo-1-10beq"]},"user_entry_point":"train_and_serve.py"}
aer2alr1w1-algo-1-10beq | SM_USER_ARGS=["--estimators","10"]
aer2alr1w1-algo-1-10beq | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
aer2alr1w1-algo-1-10beq | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
aer2alr1w1-algo-1-10beq | SM_CHANNEL_TEST=/opt/ml/input/data/test
aer2alr1w1-algo-1-10beq | SM_HP_ESTIMATORS=10
aer2alr1w1-algo-1-10beq | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | Invoking script with the following command:
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | /miniconda3/bin/python train_and_serve.py --estimators 10
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq |
aer2alr1w1-algo-1-10beq | command line arguments: Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
aer2alr1w1-algo-1-10beq | training_dir: /opt/ml/input/data/train
aer2alr1w1-algo-1-10beq | training_dir files list: ['train.csv']
aer2alr1w1-algo-1-10beq | testing_dir: /opt/ml/input/data/test
aer2alr1w1-algo-1-10beq | testing_dir files list: ['test.csv']
aer2alr1w1-algo-1-10beq | sm_model_dir: /opt/ml/model
aer2alr1w1-algo-1-10beq | output_data_dir: /opt/ml/output/data
aer2alr1w1-algo-1-10beq | X_train.shape: (120, 4)
aer2alr1w1-algo-1-10beq | y_train.shape: (120,)
aer2alr1w1-algo-1-10beq | X_test.shape: (30, 4)
aer2alr1w1-algo-1-10beq | y_test.shape: (30,)
aer2alr1w1-algo-1-10beq | 2022-07-17 15:24:07,286 sagemaker-containers INFO Reporting training SUCCESS
aer2alr1w1-algo-1-10beq exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmp0yb8k7nj/algo-1-10beq Please remove it manually.
===== Job Complete =====
Passing custom libraries and dependencies to the SKLearn container
We have successfully trained our classifier, but assume we have an additional task. A colleague has created a library that takes a confusion matrix array and plots it with the seaborn visualization library. We have been asked to use this custom library in the training script and save the confusion matrix plot to the output data directory.
Let's prepare the code for this custom library: it takes an array and saves a seaborn confusion matrix plot.
# create a path to store the custom library code
custom_library_path = local_path + "/my_custom_library"
custom_library_file = custom_library_path + "/seaborn_confusion_matrix.py"
print(f"custom_library_path: {custom_library_path}")
print(f"custom_library_file: {custom_library_file}")
# make sure the path exists
Path(custom_library_path).mkdir(parents=True, exist_ok=True)
custom_library_path: ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library
custom_library_file: ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/seaborn_confusion_matrix.py
Now the code to plot the confusion matrix.
%%writefile $custom_library_file
import seaborn as sns
import numpy as np
import argparse, os

def save_confusion_matrix(cf_matrix, path="./"):
    sns_plot = sns.heatmap(cf_matrix, annot=True)
    sns_plot.figure.savefig(path + "/output_cm.png")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", type=str, default="./")
    args, _ = parser.parse_known_args()
    path = args.path

    dummy_cm = np.array([[23, 5], [3, 30]])
    save_confusion_matrix(dummy_cm, path)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/seaborn_confusion_matrix.py
Convert the directory containing seaborn code into a Python package directory.
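Given that the training script later imports `from my_custom_library import save_confusion_matrix`, the package's `__init__.py` presumably re-exports the function. A standalone sketch of that layout, using throwaway paths and a stub module in place of the real seaborn code:

```python
# Sketch of the package layout: __init__.py re-exports save_confusion_matrix
# so callers can do `from my_custom_library import save_confusion_matrix`.
import sys, tempfile
from pathlib import Path

pkg_root = Path(tempfile.mkdtemp())            # stand-in for local_path
pkg_dir = pkg_root / "my_custom_library"
pkg_dir.mkdir()

# Stub module; the real one builds a seaborn heatmap and saves it to disk
(pkg_dir / "seaborn_confusion_matrix.py").write_text(
    "def save_confusion_matrix(cf_matrix, path='./'):\n"
    "    return path\n"
)
# The presumed __init__.py content: a single re-export
(pkg_dir / "__init__.py").write_text(
    "from .seaborn_confusion_matrix import save_confusion_matrix\n"
)

sys.path.insert(0, str(pkg_root))
from my_custom_library import save_confusion_matrix
print(save_confusion_matrix([[23, 5], [3, 30]], "/opt/ml/output/data"))
```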
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/my_custom_library/__init__.py
Our custom library depends on the seaborn Python package, so let's create a 'requirements.txt' listing all our dependencies. Later it will be passed to the SKLearn container, which installs them during initialization.
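From the pip log below, which resolves line 2 of requirements.txt, the file pins at least this dependency (the first line may be a comment):

```text
seaborn==0.11.2
```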
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt
Let's first test this library in our local environment. It should save a dummy confusion matrix plot in the local tmp directory.
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: seaborn==0.11.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from -r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (0.11.2)
Requirement already satisfied: numpy>=1.15 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.20.3)
Requirement already satisfied: matplotlib>=2.2 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (3.5.0)
Requirement already satisfied: scipy>=1.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.5.3)
Requirement already satisfied: pandas>=0.23 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.3.4)
Requirement already satisfied: fonttools>=4.22.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (4.28.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.3.2)
Requirement already satisfied: pillow>=6.2.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (9.0.1)
Requirement already satisfied: packaging>=20.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (21.3)
Requirement already satisfied: pyparsing>=2.2.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (3.0.6)
Requirement already satisfied: cycler>=0.10 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (0.11.0)
Requirement already satisfied: pytz>=2017.3 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (2021.3)
Requirement already satisfied: six>=1.5 in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r ./datasets/2022-07-07-sagemaker-script-mode/src/requirements.txt (line 2)) (1.16.0)
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.
Matplotlib is building the font cache; this may take a moment.
So our custom library code works. Let’s update our script to use it.
%%writefile $script_file
import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib
from my_custom_library import save_confusion_matrix
if __name__ == "__main__":
    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()
    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)
    # sm-model-dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))
    args, _ = parser.parse_known_args()
    print("command line arguments: ", args)
    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir
    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")
    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")
    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]
    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    # Build model
    regressor = RandomForestClassifier(n_estimators=estimators)
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    # Save the model
    joblib.dump(regressor, sm_model_dir + "/model.joblib")
    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")
    # Save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)
    # Print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")
    # Print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
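The script reads its configuration in two layers: hyperparameters arrive as command-line arguments, while SageMaker's contract paths arrive as `SM_*` environment variables that back the argparse defaults, and `parse_known_args` tolerates any extra arguments the platform passes along. A minimal sketch of that behavior (the environment variable is set by hand here purely for illustration):

```python
import argparse
import os

# In the container, SageMaker exports SM_MODEL_DIR and friends before
# invoking the script; we simulate that here.
os.environ["SM_MODEL_DIR"] = "/opt/ml/model"

parser = argparse.ArgumentParser()
parser.add_argument("--estimators", type=int, default=15)
parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))

# parse_known_args ignores arguments the script did not declare.
args, _ = parser.parse_known_args(["--estimators", "10", "--unknown-flag", "x"])

assert args.estimators == 10
assert args.sm_model_dir == "/opt/ml/model"
```

This is why the same script runs unchanged inside and outside SageMaker: outside the container the defaults simply resolve to `None` unless you export the variables yourself.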
Finally, all the ingredients are ready. Let’s run our script in the SKLearn container.
In the next cell, you can see that we pass two extra parameters to the estimator:
- `source_dir`: the path to the directory containing the `entry_point` script `train_and_serve.py` and `requirements.txt`. If a `requirements.txt` file is present in this directory, the estimator picks it up and installs the listed packages in the container during initialization.
- `dependencies`: a list of paths to dependencies (custom libraries) that we want available in the container.
Our local directory structure is shown below.
local_path/
├── my_custom_library/
│ ├── seaborn_confusion_matrix.py
│ └── __init__.py
└── src/
├── train_and_serve.py
└── requirements.txt
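The effect of `source_dir` plus `dependencies` is that everything ends up side by side in one code directory inside the container, so the script's plain `from my_custom_library import ...` resolves. The sketch below mimics that layout locally; it is an illustration, not the SDK's actual packaging code, and the stub `save_confusion_matrix` body is invented for the demo:

```python
import os
import sys
import tempfile

# Recreate the container's flat code layout: entry point and the
# copied dependency package live in the same directory.
code_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(code_dir, "my_custom_library")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("def save_confusion_matrix(cm, out_dir):\n    return out_dir\n")

sys.path.insert(0, code_dir)  # the container puts its code dir on PYTHONPATH
from my_custom_library import save_confusion_matrix

assert save_confusion_matrix(None, "/tmp") == "/tmp"
```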
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file_name,
source_dir=script_path,
dependencies=[custom_library_path],
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"estimators":10},
)
sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating xm0kutxos7-algo-1-8yrs9 ...
Creating xm0kutxos7-algo-1-8yrs9 ... done
Attaching to xm0kutxos7-algo-1-8yrs9
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,458 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,462 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,472 sagemaker_sklearn_container.training INFO Invoking user training script.
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:24,661 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:
xm0kutxos7-algo-1-8yrs9 | /miniconda3/bin/python -m pip install -r requirements.txt
xm0kutxos7-algo-1-8yrs9 | Collecting seaborn==0.11.2
xm0kutxos7-algo-1-8yrs9 | Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 5.1 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
xm0kutxos7-algo-1-8yrs9 | Collecting matplotlib>=2.2
xm0kutxos7-algo-1-8yrs9 | Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 31.0 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
xm0kutxos7-algo-1-8yrs9 | Collecting kiwisolver>=1.0.1
xm0kutxos7-algo-1-8yrs9 | Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 18.6 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Collecting cycler>=0.10
xm0kutxos7-algo-1-8yrs9 | Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
xm0kutxos7-algo-1-8yrs9 | Collecting fonttools>=4.22.0
xm0kutxos7-algo-1-8yrs9 | Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 102.1 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Collecting packaging>=20.0
xm0kutxos7-algo-1-8yrs9 | Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 11.9 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
xm0kutxos7-algo-1-8yrs9 | Collecting pyparsing>=2.2.1
xm0kutxos7-algo-1-8yrs9 | Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 24.8 MB/s eta 0:00:00
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
xm0kutxos7-algo-1-8yrs9 | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
xm0kutxos7-algo-1-8yrs9 | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
xm0kutxos7-algo-1-8yrs9 | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
xm0kutxos7-algo-1-8yrs9 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,839 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,859 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,879 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:30,894 sagemaker-training-toolkit INFO Invoking user script
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | Training Env:
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | {
xm0kutxos7-algo-1-8yrs9 | "additional_framework_parameters": {},
xm0kutxos7-algo-1-8yrs9 | "channel_input_dirs": {
xm0kutxos7-algo-1-8yrs9 | "train": "/opt/ml/input/data/train",
xm0kutxos7-algo-1-8yrs9 | "test": "/opt/ml/input/data/test"
xm0kutxos7-algo-1-8yrs9 | },
xm0kutxos7-algo-1-8yrs9 | "current_host": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 | "framework_module": "sagemaker_sklearn_container.training:main",
xm0kutxos7-algo-1-8yrs9 | "hosts": [
xm0kutxos7-algo-1-8yrs9 | "algo-1-8yrs9"
xm0kutxos7-algo-1-8yrs9 | ],
xm0kutxos7-algo-1-8yrs9 | "hyperparameters": {
xm0kutxos7-algo-1-8yrs9 | "estimators": 10
xm0kutxos7-algo-1-8yrs9 | },
xm0kutxos7-algo-1-8yrs9 | "input_config_dir": "/opt/ml/input/config",
xm0kutxos7-algo-1-8yrs9 | "input_data_config": {
xm0kutxos7-algo-1-8yrs9 | "train": {
xm0kutxos7-algo-1-8yrs9 | "TrainingInputMode": "File"
xm0kutxos7-algo-1-8yrs9 | },
xm0kutxos7-algo-1-8yrs9 | "test": {
xm0kutxos7-algo-1-8yrs9 | "TrainingInputMode": "File"
xm0kutxos7-algo-1-8yrs9 | }
xm0kutxos7-algo-1-8yrs9 | },
xm0kutxos7-algo-1-8yrs9 | "input_dir": "/opt/ml/input",
xm0kutxos7-algo-1-8yrs9 | "is_master": true,
xm0kutxos7-algo-1-8yrs9 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-22-270",
xm0kutxos7-algo-1-8yrs9 | "log_level": 20,
xm0kutxos7-algo-1-8yrs9 | "master_hostname": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 | "model_dir": "/opt/ml/model",
xm0kutxos7-algo-1-8yrs9 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz",
xm0kutxos7-algo-1-8yrs9 | "module_name": "train_and_serve",
xm0kutxos7-algo-1-8yrs9 | "network_interface_name": "eth0",
xm0kutxos7-algo-1-8yrs9 | "num_cpus": 2,
xm0kutxos7-algo-1-8yrs9 | "num_gpus": 0,
xm0kutxos7-algo-1-8yrs9 | "output_data_dir": "/opt/ml/output/data",
xm0kutxos7-algo-1-8yrs9 | "output_dir": "/opt/ml/output",
xm0kutxos7-algo-1-8yrs9 | "output_intermediate_dir": "/opt/ml/output/intermediate",
xm0kutxos7-algo-1-8yrs9 | "resource_config": {
xm0kutxos7-algo-1-8yrs9 | "current_host": "algo-1-8yrs9",
xm0kutxos7-algo-1-8yrs9 | "hosts": [
xm0kutxos7-algo-1-8yrs9 | "algo-1-8yrs9"
xm0kutxos7-algo-1-8yrs9 | ]
xm0kutxos7-algo-1-8yrs9 | },
xm0kutxos7-algo-1-8yrs9 | "user_entry_point": "train_and_serve.py"
xm0kutxos7-algo-1-8yrs9 | }
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | Environment variables:
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | SM_HOSTS=["algo-1-8yrs9"]
xm0kutxos7-algo-1-8yrs9 | SM_NETWORK_INTERFACE_NAME=eth0
xm0kutxos7-algo-1-8yrs9 | SM_HPS={"estimators":10}
xm0kutxos7-algo-1-8yrs9 | SM_USER_ENTRY_POINT=train_and_serve.py
xm0kutxos7-algo-1-8yrs9 | SM_FRAMEWORK_PARAMS={}
xm0kutxos7-algo-1-8yrs9 | SM_RESOURCE_CONFIG={"current_host":"algo-1-8yrs9","hosts":["algo-1-8yrs9"]}
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | SM_CHANNELS=["test","train"]
xm0kutxos7-algo-1-8yrs9 | SM_CURRENT_HOST=algo-1-8yrs9
xm0kutxos7-algo-1-8yrs9 | SM_MODULE_NAME=train_and_serve
xm0kutxos7-algo-1-8yrs9 | SM_LOG_LEVEL=20
xm0kutxos7-algo-1-8yrs9 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_DIR=/opt/ml/input
xm0kutxos7-algo-1-8yrs9 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_DIR=/opt/ml/output
xm0kutxos7-algo-1-8yrs9 | SM_NUM_CPUS=2
xm0kutxos7-algo-1-8yrs9 | SM_NUM_GPUS=0
xm0kutxos7-algo-1-8yrs9 | SM_MODEL_DIR=/opt/ml/model
xm0kutxos7-algo-1-8yrs9 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz
xm0kutxos7-algo-1-8yrs9 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-8yrs9","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-8yrs9"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-22-270","log_level":20,"master_hostname":"algo-1-8yrs9","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-22-270/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-8yrs9","hosts":["algo-1-8yrs9"]},"user_entry_point":"train_and_serve.py"}
xm0kutxos7-algo-1-8yrs9 | SM_USER_ARGS=["--estimators","10"]
xm0kutxos7-algo-1-8yrs9 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
xm0kutxos7-algo-1-8yrs9 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
xm0kutxos7-algo-1-8yrs9 | SM_CHANNEL_TEST=/opt/ml/input/data/test
xm0kutxos7-algo-1-8yrs9 | SM_HP_ESTIMATORS=10
xm0kutxos7-algo-1-8yrs9 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | Invoking script with the following command:
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | /miniconda3/bin/python train_and_serve.py --estimators 10
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 |
xm0kutxos7-algo-1-8yrs9 | command line arguments: Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
xm0kutxos7-algo-1-8yrs9 | training_dir: /opt/ml/input/data/train
xm0kutxos7-algo-1-8yrs9 | training_dir files list: ['train.csv']
xm0kutxos7-algo-1-8yrs9 | testing_dir: /opt/ml/input/data/test
xm0kutxos7-algo-1-8yrs9 | testing_dir files list: ['test.csv']
xm0kutxos7-algo-1-8yrs9 | sm_model_dir: /opt/ml/model
xm0kutxos7-algo-1-8yrs9 | output_data_dir: /opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | X_train.shape: (120, 4)
xm0kutxos7-algo-1-8yrs9 | y_train.shape: (120,)
xm0kutxos7-algo-1-8yrs9 | X_train.shape: (30, 4)
xm0kutxos7-algo-1-8yrs9 | y_train.shape: (30,)
xm0kutxos7-algo-1-8yrs9 | sm_model_dir: /opt/ml/model
xm0kutxos7-algo-1-8yrs9 | sm_model_dir files list: ['model.joblib']
xm0kutxos7-algo-1-8yrs9 | output_data_dir: /opt/ml/output/data
xm0kutxos7-algo-1-8yrs9 | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
xm0kutxos7-algo-1-8yrs9 | 2022-07-17 15:24:33,003 sagemaker-containers INFO Reporting training SUCCESS
Failed to delete: /tmp/tmpee3z9n_9/algo-1-8yrs9 Please remove it manually.
xm0kutxos7-algo-1-8yrs9 exited with code 0
Aborting on container exit...
===== Job Complete =====
The SKLearn container output shows that our classifier trained successfully, and that the model and output artifacts were placed in their respective folders. We know from the first section of this post that these artifacts are automatically uploaded to the S3 bucket. This concludes the model training part of our implementation. Let’s now proceed to the model serving part of our solution.
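As a side note, the channel dictionary we passed to `fit` determines where the container finds the data: each key becomes a directory under `/opt/ml/input/data` and a matching `SM_CHANNEL_<NAME>` environment variable, which is exactly what the training log above reports. A small sketch of that mapping (the S3 URIs are illustrative placeholders):

```python
# Keys of the fit() channel dict become SM_CHANNEL_* variables pointing
# at per-channel directories inside the container.
channels = {"train": "s3://bucket/train.csv", "test": "s3://bucket/test.csv"}
env = {f"SM_CHANNEL_{name.upper()}": f"/opt/ml/input/data/{name}" for name in channels}

assert env["SM_CHANNEL_TRAIN"] == "/opt/ml/input/data/train"
assert env["SM_CHANNEL_TEST"] == "/opt/ml/input/data/test"
```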
Serve SKLearn model in local mode
At this point, we have our trained model ready. Can we deploy it already?
The answer is no. If we try to deploy this model with the command
sk_predictor = sk_estimator.deploy(
initial_instance_count=1,
instance_type='local'
)
it raises an exception telling us that the model server does not know how to load the model. We need to supply that logic by implementing a model_fn
function in our script.
[2022-07-09 06:15:45 +0000] [31] [ERROR] Error handling request /ping
Traceback (most recent call last):
File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
return fn(*args, **kwargs)
File "/miniconda3/lib/python3.8/site-packages/sagemaker_sklearn_container/serving.py", line 43, in default_model_fn
return transformer.default_model_fn(model_dir)
File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_transformer.py", line 35, in default_model_fn
raise NotImplementedError(
NotImplementedError:
Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk
The model_fn has the following signature:
def model_fn(model_dir)
Besides loading the model, we also need to tell the model server how to get predictions from the loaded model. For this, we need to implement the second function predict_fn, which has the following signature.
def predict_fn(input_data, model)
After we have called the fit
function on our SKLearn estimator, we can deploy it by calling the deploy
function to create an inference endpoint. Calling deploy
on the estimator creates two objects:
- SageMaker scikit-learn Endpoint: this endpoint encapsulates a model server. The server loads the model saved during training and performs inference with it, which requires two helper functions: model_fn to load the model and predict_fn to make predictions.
- Predictor object: returned by the deploy call; it is used to run inference against the endpoint hosting our SKLearn model.
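Stripped of the server machinery, the two hooks behave as in the sketch below: model_fn deserializes the artifact once at startup, and predict_fn answers each request with the loaded model. The tiny random dataset is purely illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def model_fn(model_dir):
    # Called once by the model server to deserialize the fitted model.
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def predict_fn(input_data, model):
    # Called per request with the deserialized input and the loaded model.
    return model.predict(input_data)

# Simulate training's joblib.dump, then the server's load-and-predict.
X = np.random.rand(30, 4)
y = np.random.randint(0, 3, 30)
model_dir = tempfile.mkdtemp()
joblib.dump(RandomForestClassifier(n_estimators=10).fit(X, y),
            os.path.join(model_dir, "model.joblib"))

preds = predict_fn(X[:2], model_fn(model_dir))
assert preds.shape == (2,)
```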
Let’s update our script and add these two functions.
%%writefile $script_file
import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib
from my_custom_library import save_confusion_matrix
if __name__ == "__main__":
    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()
    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)
    # sm-model-dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))
    args, _ = parser.parse_known_args()
    print("command line arguments: ", args)
    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir
    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")
    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")
    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]
    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    # Build model
    regressor = RandomForestClassifier(n_estimators=estimators)
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    # Save the model
    joblib.dump(regressor, sm_model_dir + "/model.joblib")
    # Save the results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")
    # Save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)
    # Print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")
    # Print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")

# Model serving
"""
Deserialize fitted model
"""
def model_fn(model_dir):
    print(f"model_fn model_dir: {model_dir}")
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model

"""
predict_fn
    input_data: deserialized request data (from the default input_fn)
    model: the sklearn model returned by model_fn above
"""
def predict_fn(input_data, model):
    return model.predict(input_data)
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
#collapse-output
sk_estimator = SKLearn(
entry_point=script_file_name,
source_dir=script_path,
dependencies=[custom_library_path],
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"estimators":10},
)
sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
Creating wxtcttdsw0-algo-1-jym48 ...
Creating wxtcttdsw0-algo-1-jym48 ... done
Attaching to wxtcttdsw0-algo-1-jym48
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,721 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,726 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,735 sagemaker_sklearn_container.training INFO Invoking user training script.
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:36,923 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:
wxtcttdsw0-algo-1-jym48 | /miniconda3/bin/python -m pip install -r requirements.txt
wxtcttdsw0-algo-1-jym48 | Collecting seaborn==0.11.2
wxtcttdsw0-algo-1-jym48 | Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 3.3 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
wxtcttdsw0-algo-1-jym48 | Collecting matplotlib>=2.2
wxtcttdsw0-algo-1-jym48 | Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 101.6 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
wxtcttdsw0-algo-1-jym48 | Collecting packaging>=20.0
wxtcttdsw0-algo-1-jym48 | Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 12.9 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
wxtcttdsw0-algo-1-jym48 | Collecting cycler>=0.10
wxtcttdsw0-algo-1-jym48 | Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
wxtcttdsw0-algo-1-jym48 | Collecting kiwisolver>=1.0.1
wxtcttdsw0-algo-1-jym48 | Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 86.1 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
wxtcttdsw0-algo-1-jym48 | Collecting fonttools>=4.22.0
wxtcttdsw0-algo-1-jym48 | Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 8.7 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Collecting pyparsing>=2.2.1
wxtcttdsw0-algo-1-jym48 | Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.8 MB/s eta 0:00:00
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
wxtcttdsw0-algo-1-jym48 | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
wxtcttdsw0-algo-1-jym48 | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
wxtcttdsw0-algo-1-jym48 | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
wxtcttdsw0-algo-1-jym48 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,696 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,711 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,723 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:41,732 sagemaker-training-toolkit INFO Invoking user script
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | Training Env:
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | {
wxtcttdsw0-algo-1-jym48 | "additional_framework_parameters": {},
wxtcttdsw0-algo-1-jym48 | "channel_input_dirs": {
wxtcttdsw0-algo-1-jym48 | "train": "/opt/ml/input/data/train",
wxtcttdsw0-algo-1-jym48 | "test": "/opt/ml/input/data/test"
wxtcttdsw0-algo-1-jym48 | },
wxtcttdsw0-algo-1-jym48 | "current_host": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 | "framework_module": "sagemaker_sklearn_container.training:main",
wxtcttdsw0-algo-1-jym48 | "hosts": [
wxtcttdsw0-algo-1-jym48 | "algo-1-jym48"
wxtcttdsw0-algo-1-jym48 | ],
wxtcttdsw0-algo-1-jym48 | "hyperparameters": {
wxtcttdsw0-algo-1-jym48 | "estimators": 10
wxtcttdsw0-algo-1-jym48 | },
wxtcttdsw0-algo-1-jym48 | "input_config_dir": "/opt/ml/input/config",
wxtcttdsw0-algo-1-jym48 | "input_data_config": {
wxtcttdsw0-algo-1-jym48 | "train": {
wxtcttdsw0-algo-1-jym48 | "TrainingInputMode": "File"
wxtcttdsw0-algo-1-jym48 | },
wxtcttdsw0-algo-1-jym48 | "test": {
wxtcttdsw0-algo-1-jym48 | "TrainingInputMode": "File"
wxtcttdsw0-algo-1-jym48 | }
wxtcttdsw0-algo-1-jym48 | },
wxtcttdsw0-algo-1-jym48 | "input_dir": "/opt/ml/input",
wxtcttdsw0-algo-1-jym48 | "is_master": true,
wxtcttdsw0-algo-1-jym48 | "job_name": "sagemaker-scikit-learn-2022-07-17-15-24-34-114",
wxtcttdsw0-algo-1-jym48 | "log_level": 20,
wxtcttdsw0-algo-1-jym48 | "master_hostname": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 | "model_dir": "/opt/ml/model",
wxtcttdsw0-algo-1-jym48 | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz",
wxtcttdsw0-algo-1-jym48 | "module_name": "train_and_serve",
wxtcttdsw0-algo-1-jym48 | "network_interface_name": "eth0",
wxtcttdsw0-algo-1-jym48 | "num_cpus": 2,
wxtcttdsw0-algo-1-jym48 | "num_gpus": 0,
wxtcttdsw0-algo-1-jym48 | "output_data_dir": "/opt/ml/output/data",
wxtcttdsw0-algo-1-jym48 | "output_dir": "/opt/ml/output",
wxtcttdsw0-algo-1-jym48 | "output_intermediate_dir": "/opt/ml/output/intermediate",
wxtcttdsw0-algo-1-jym48 | "resource_config": {
wxtcttdsw0-algo-1-jym48 | "current_host": "algo-1-jym48",
wxtcttdsw0-algo-1-jym48 | "hosts": [
wxtcttdsw0-algo-1-jym48 | "algo-1-jym48"
wxtcttdsw0-algo-1-jym48 | ]
wxtcttdsw0-algo-1-jym48 | },
wxtcttdsw0-algo-1-jym48 | "user_entry_point": "train_and_serve.py"
wxtcttdsw0-algo-1-jym48 | }
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | Environment variables:
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | SM_HOSTS=["algo-1-jym48"]
wxtcttdsw0-algo-1-jym48 | SM_NETWORK_INTERFACE_NAME=eth0
wxtcttdsw0-algo-1-jym48 | SM_HPS={"estimators":10}
wxtcttdsw0-algo-1-jym48 | SM_USER_ENTRY_POINT=train_and_serve.py
wxtcttdsw0-algo-1-jym48 | SM_FRAMEWORK_PARAMS={}
wxtcttdsw0-algo-1-jym48 | SM_RESOURCE_CONFIG={"current_host":"algo-1-jym48","hosts":["algo-1-jym48"]}
wxtcttdsw0-algo-1-jym48 | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | SM_CHANNELS=["test","train"]
wxtcttdsw0-algo-1-jym48 | SM_CURRENT_HOST=algo-1-jym48
wxtcttdsw0-algo-1-jym48 | SM_MODULE_NAME=train_and_serve
wxtcttdsw0-algo-1-jym48 | SM_LOG_LEVEL=20
wxtcttdsw0-algo-1-jym48 | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
wxtcttdsw0-algo-1-jym48 | SM_INPUT_DIR=/opt/ml/input
wxtcttdsw0-algo-1-jym48 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_DIR=/opt/ml/output
wxtcttdsw0-algo-1-jym48 | SM_NUM_CPUS=2
wxtcttdsw0-algo-1-jym48 | SM_NUM_GPUS=0
wxtcttdsw0-algo-1-jym48 | SM_MODEL_DIR=/opt/ml/model
wxtcttdsw0-algo-1-jym48 | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz
wxtcttdsw0-algo-1-jym48 | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-jym48","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-jym48"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-24-34-114","log_level":20,"master_hostname":"algo-1-jym48","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-24-34-114/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-jym48","hosts":["algo-1-jym48"]},"user_entry_point":"train_and_serve.py"}
wxtcttdsw0-algo-1-jym48 | SM_USER_ARGS=["--estimators","10"]
wxtcttdsw0-algo-1-jym48 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
wxtcttdsw0-algo-1-jym48 | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
wxtcttdsw0-algo-1-jym48 | SM_CHANNEL_TEST=/opt/ml/input/data/test
wxtcttdsw0-algo-1-jym48 | SM_HP_ESTIMATORS=10
wxtcttdsw0-algo-1-jym48 | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | Invoking script with the following command:
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | /miniconda3/bin/python train_and_serve.py --estimators 10
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 |
wxtcttdsw0-algo-1-jym48 | command line arguments: Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
wxtcttdsw0-algo-1-jym48 | training_dir: /opt/ml/input/data/train
wxtcttdsw0-algo-1-jym48 | training_dir files list: ['train.csv']
wxtcttdsw0-algo-1-jym48 | testing_dir: /opt/ml/input/data/test
wxtcttdsw0-algo-1-jym48 | testing_dir files list: ['test.csv']
wxtcttdsw0-algo-1-jym48 | sm_model_dir: /opt/ml/model
wxtcttdsw0-algo-1-jym48 | output_data_dir: /opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | X_train.shape: (120, 4)
wxtcttdsw0-algo-1-jym48 | y_train.shape: (120,)
wxtcttdsw0-algo-1-jym48 | X_train.shape: (30, 4)
wxtcttdsw0-algo-1-jym48 | y_train.shape: (30,)
wxtcttdsw0-algo-1-jym48 | sm_model_dir: /opt/ml/model
wxtcttdsw0-algo-1-jym48 | sm_model_dir files list: ['model.joblib']
wxtcttdsw0-algo-1-jym48 | output_data_dir: /opt/ml/output/data
wxtcttdsw0-algo-1-jym48 | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
wxtcttdsw0-algo-1-jym48 | 2022-07-17 15:24:43,775 sagemaker-containers INFO Reporting training SUCCESS
wxtcttdsw0-algo-1-jym48 exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmpa3uld7ha/algo-1-jym48 Please remove it manually.
===== Job Complete =====
Our model is trained. Let's also deploy it in local mode. For model loading via model_fn
, SageMaker will download the model artifacts from S3 and mount them at /opt/ml/model
. This way, our script can load the model from within the container.
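The loading contract behind model_fn is simple: SageMaker hands it the directory containing the extracted artifacts, and it must return the deserialized model. Below is a dependency-free sketch of that contract; the pickle file, stub artifact, and temporary directory are stand-ins for joblib, the trained classifier, and /opt/ml/model.

```python
import os
import pickle
import tempfile

def model_fn(model_dir):
    # SageMaker calls this with the directory holding the model artifacts
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)

# Simulate /opt/ml/model with a temporary directory
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump({"n_estimators": 10}, f)  # stub artifact, not a real model
    model = model_fn(model_dir)
    print(model)  # -> {'n_estimators': 10}
```

The only part SageMaker cares about is the return value: whatever model_fn returns is what predict_fn later receives.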
#collapse-output
sk_predictor = sk_estimator.deploy(
initial_instance_count=1,
instance_type='local'
)
Attaching to 3fvyanwal0-algo-1-tz4ow
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,644 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,648 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,649 INFO - sagemaker-containers - nginx config:
3fvyanwal0-algo-1-tz4ow | worker_processes auto;
3fvyanwal0-algo-1-tz4ow | daemon off;
3fvyanwal0-algo-1-tz4ow | pid /tmp/nginx.pid;
3fvyanwal0-algo-1-tz4ow | error_log /dev/stderr;
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | worker_rlimit_nofile 4096;
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | events {
3fvyanwal0-algo-1-tz4ow | worker_connections 2048;
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | http {
3fvyanwal0-algo-1-tz4ow | include /etc/nginx/mime.types;
3fvyanwal0-algo-1-tz4ow | default_type application/octet-stream;
3fvyanwal0-algo-1-tz4ow | access_log /dev/stdout combined;
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | upstream gunicorn {
3fvyanwal0-algo-1-tz4ow | server unix:/tmp/gunicorn.sock;
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | server {
3fvyanwal0-algo-1-tz4ow | listen 8080 deferred;
3fvyanwal0-algo-1-tz4ow | client_max_body_size 0;
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | keepalive_timeout 3;
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | location ~ ^/(ping|invocations|execution-parameters) {
3fvyanwal0-algo-1-tz4ow | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
3fvyanwal0-algo-1-tz4ow | proxy_set_header Host $http_host;
3fvyanwal0-algo-1-tz4ow | proxy_redirect off;
3fvyanwal0-algo-1-tz4ow | proxy_read_timeout 60s;
3fvyanwal0-algo-1-tz4ow | proxy_pass http://gunicorn;
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | location / {
3fvyanwal0-algo-1-tz4ow | return 404 "{}";
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow | }
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow |
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Module train_and_serve does not provide a setup.py.
3fvyanwal0-algo-1-tz4ow | Generating setup.py
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Generating setup.cfg
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,866 INFO - sagemaker-containers - Generating MANIFEST.in
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:46,867 INFO - sagemaker-containers - Installing module with the following command:
3fvyanwal0-algo-1-tz4ow | /miniconda3/bin/python3 -m pip install . -r requirements.txt
3fvyanwal0-algo-1-tz4ow | Processing /opt/ml/code
3fvyanwal0-algo-1-tz4ow | Preparing metadata (setup.py) ... done
3fvyanwal0-algo-1-tz4ow | Collecting seaborn==0.11.2
3fvyanwal0-algo-1-tz4ow | Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 4.8 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting matplotlib>=2.2
3fvyanwal0-algo-1-tz4ow | Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 93.0 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
3fvyanwal0-algo-1-tz4ow | Collecting packaging>=20.0
3fvyanwal0-algo-1-tz4ow | Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 9.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting fonttools>=4.22.0
3fvyanwal0-algo-1-tz4ow | Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 8.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
3fvyanwal0-algo-1-tz4ow | Collecting cycler>=0.10
3fvyanwal0-algo-1-tz4ow | Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
3fvyanwal0-algo-1-tz4ow | Collecting pyparsing>=2.2.1
3fvyanwal0-algo-1-tz4ow | Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 21.4 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Collecting kiwisolver>=1.0.1
3fvyanwal0-algo-1-tz4ow | Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 17.1 MB/s eta 0:00:00
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
3fvyanwal0-algo-1-tz4ow | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
3fvyanwal0-algo-1-tz4ow | Building wheels for collected packages: train-and-serve
3fvyanwal0-algo-1-tz4ow | Building wheel for train-and-serve (setup.py) ... 2022/07/17 15:24:49 [crit] 15#15: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:24:49 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
3fvyanwal0-algo-1-tz4ow | done
3fvyanwal0-algo-1-tz4ow | Created wheel for train-and-serve: filename=train_and_serve-1.0.0-py2.py3-none-any.whl size=6122 sha256=914e6ad8ea2651da0216fefbc30c28bc25124ff514c30452de608e5b9807197c
3fvyanwal0-algo-1-tz4ow | Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-2u_4hcln/wheels/f3/75/57/158162e9eab7af12b5c338c279b3a81f103b89d74eeb911c00
3fvyanwal0-algo-1-tz4ow | Successfully built train-and-serve
3fvyanwal0-algo-1-tz4ow | Installing collected packages: train-and-serve, pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
3fvyanwal0-algo-1-tz4ow | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2 train-and-serve-1.0.0
3fvyanwal0-algo-1-tz4ow | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
3fvyanwal0-algo-1-tz4ow | 2022/07/17 15:24:54 [crit] 15#15: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:24:54 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:55,286 INFO - matplotlib.font_manager - generated new fontManager
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Starting gunicorn 20.0.4
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Listening at: unix:/tmp/gunicorn.sock (37)
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [37] [INFO] Using worker: gevent
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:55 +0000] [39] [INFO] Booting worker with pid: 39
3fvyanwal0-algo-1-tz4ow | [2022-07-17 15:24:56 +0000] [40] [INFO] Booting worker with pid: 40
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:24:59,750 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | model_fn model_dir: /opt/ml/model
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:25:01 +0000] "GET /ping HTTP/1.1" 200 0 "-" "python-urllib3/1.26.8"
Let’s create a sample request and get a prediction from our local inference endpoint.
request = [[9.0, 3571, 1976, 0.525]]
response = sk_predictor.predict(request)
response = int(response[0])
response
3fvyanwal0-algo-1-tz4ow | 2022-07-17 15:25:01,760 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
3fvyanwal0-algo-1-tz4ow | model_fn model_dir: /opt/ml/model
2
3fvyanwal0-algo-1-tz4ow | 172.18.0.1 - - [17/Jul/2022:15:25:03 +0000] "POST /invocations HTTP/1.1" 200 136 "-" "python-urllib3/1.26.8"
# map response to correct category type
print("Predicted class category {} ({})".format(response, categories_map[response]))
Predicted class category 2 (Iris-virginica)
Since the endpoint is running in the local environment, we can observe a web server running in a Docker container.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5a22263f860b 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.0-1-cpu-py3 "serve" 18 seconds ago Up 17 seconds 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp 3fvyanwal0-algo-1-tz4ow
Gracefully stopping... (press Ctrl+C again to force)
Note that in local mode, we can only serve a single model at a time. Therefore, if you are getting an error on the deploy
call in local mode, check that no other endpoint is running.
SKLearn model server input and output processing
The SageMaker model server breaks the handling of an incoming request into three steps: 1. input processing, 2. prediction, and 3. output processing.
In the last section, we saw that the predict_fn
function in the source code file defines model prediction. Similarly, SageMaker provides two additional functions to control input and output processing, named input_fn
and output_fn
, respectively. Both of these functions have default implementations, but we can override them by providing our own implementations in the source script. If no definition is provided in the source script, the SageMaker scikit-learn model server falls back to the defaults.
- input_fn
: Takes the request data and deserializes it into an object for prediction.
- output_fn
: Takes the prediction result and serializes it according to the response content type.
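The three handlers compose into a single request/response pipeline. The sketch below simulates that flow in plain Python: the handler bodies mirror the script that follows, while the stub model and the handle_request driver are illustrative additions, not part of the SageMaker toolkit. (Here output_fn returns a JSON string so the sketch is self-contained; the script's version returns a plain dict.)

```python
import json

def input_fn(request_body, request_content_type):
    # Deserialize the JSON request body into feature rows
    if request_content_type == "application/json":
        return json.loads(request_body)["Input"]
    raise ValueError("This model only supports application/json input")

def predict_fn(input_data, model):
    # Run inference with the loaded model
    return model.predict(input_data)

def output_fn(prediction, content_type):
    # Serialize the first prediction back as a JSON object
    return json.dumps({"Output": int(prediction[0])})

class StubModel:
    """Stand-in for the joblib-loaded classifier: predicts class 2 for any row."""
    def predict(self, X):
        return [2 for _ in X]

def handle_request(body, model):
    # What the model server does for each POST /invocations
    data = input_fn(body, "application/json")
    pred = predict_fn(data, model)
    return output_fn(pred, "application/json")

print(handle_request('{"Input": [[9.0, 3571, 1976, 0.525]]}', StubModel()))
# prints {"Output": 2}
```

Each function only needs to agree with its neighbors on the shape of the data it passes along; SageMaker wires them together.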
Let’s update our script to preprocess input request and output response as JSON objects.
%%writefile $script_file
import argparse, os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import joblib
import json

from my_custom_library import save_confusion_matrix


if __name__ == "__main__":

    # Pass in environment variables and hyperparameters
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument("--estimators", type=int, default=15)

    # sm-model-dir: model artifacts stored here after training
    # sm-channel-train: input training data location
    # sm-channel-test: input test data location
    # sm-output-data-dir: output artifacts location
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--sm-channel-train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--sm-channel-test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--sm-output-data-dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR"))
    args, _ = parser.parse_known_args()
    print("command line arguments: ", args)

    estimators = args.estimators
    sm_model_dir = args.sm_model_dir
    training_dir = args.sm_channel_train
    testing_dir = args.sm_channel_test
    output_data_dir = args.sm_output_data_dir
    print(f"training_dir: {training_dir}")
    print(f"training_dir files list: {os.listdir(training_dir)}")
    print(f"testing_dir: {testing_dir}")
    print(f"testing_dir files list: {os.listdir(testing_dir)}")
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"output_data_dir: {output_data_dir}")

    # Read in data
    df_train = pd.read_csv(training_dir + "/train.csv", sep=",")
    df_test = pd.read_csv(testing_dir + "/test.csv", sep=",")

    # Preprocess data
    X_train = df_train.drop(["class", "class_cat"], axis=1)
    y_train = df_train["class_cat"]
    X_test = df_test.drop(["class", "class_cat"], axis=1)
    y_test = df_test["class_cat"]
    print(f"X_train.shape: {X_train.shape}")
    print(f"y_train.shape: {y_train.shape}")
    print(f"X_test.shape: {X_test.shape}")
    print(f"y_test.shape: {y_test.shape}")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Build model
    classifier = RandomForestClassifier(n_estimators=estimators)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Save the model
    joblib.dump(classifier, sm_model_dir + "/model.joblib")

    # Save the prediction results
    pd.DataFrame(y_pred).to_csv(output_data_dir + "/y_pred.csv")

    # Save the confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    save_confusion_matrix(cf_matrix, output_data_dir)

    # Print sm_model_dir info
    print(f"sm_model_dir: {sm_model_dir}")
    print(f"sm_model_dir files list: {os.listdir(sm_model_dir)}")

    # Print output_data_dir info
    print(f"output_data_dir: {output_data_dir}")
    print(f"output_data_dir files list: {os.listdir(output_data_dir)}")


# Model serving

def model_fn(model_dir):
    """Deserialize the fitted model."""
    print(f"model_fn model_dir: {model_dir}")
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model


def predict_fn(input_data, model):
    """
    input_data: the array returned from input_fn below
    model: the sklearn model returned from model_fn above
    """
    return model.predict(input_data)


def input_fn(request_body, request_content_type):
    """
    request_body: the body of the request sent to the model
    request_content_type: (string) the format/content type of the request
    """
    if request_content_type == "application/json":
        request_body = json.loads(request_body)
        inpVar = request_body["Input"]
        return inpVar
    else:
        raise ValueError("This model only supports application/json input")


def output_fn(prediction, content_type):
    """
    prediction: the value returned from predict_fn above
    content_type: the content type the endpoint expects to be returned, e.g. JSON
    """
    res = int(prediction[0])
    respJSON = {"Output": res}
    return respJSON
Overwriting ./datasets/2022-07-07-sagemaker-script-mode/src/train_and_serve.py
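With these handlers in place, requests must wrap the feature rows in an {"Input": ...} JSON object and responses come back as {"Output": ...}. A quick sketch of building and parsing those payload shapes (plain JSON handling only; the actual endpoint call goes through the predictor):

```python
import json

# Request body in the shape input_fn expects: {"Input": <feature rows>}
request_body = json.dumps({"Input": [[9.0, 3571, 1976, 0.525]]})

# Response body in the shape output_fn produces: {"Output": <predicted class>}
response = json.loads('{"Output": 2}')

print(request_body)
print(response["Output"])  # -> 2
```
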
#collapse-output
# train and deploy model with input and output as JSON objects
sk_estimator = SKLearn(
entry_point=script_file_name,
source_dir=script_path,
dependencies=[custom_library_path],
role=role,
instance_count=1,
instance_type='local',
framework_version="1.0-1",
hyperparameters={"estimators":10},
)
sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
sk_predictor = sk_estimator.deploy(
initial_instance_count=1,
instance_type='local'
)
Creating ubyi50juw8-algo-1-9w0jk ...
Creating ubyi50juw8-algo-1-9w0jk ... done
Attaching to ubyi50juw8-algo-1-9w0jk
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,036 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,041 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,050 sagemaker_sklearn_container.training INFO Invoking user training script.
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:07,233 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:
ubyi50juw8-algo-1-9w0jk | /miniconda3/bin/python -m pip install -r requirements.txt
ubyi50juw8-algo-1-9w0jk | Collecting seaborn==0.11.2
ubyi50juw8-algo-1-9w0jk | Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 38.8 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
ubyi50juw8-algo-1-9w0jk | Collecting matplotlib>=2.2
ubyi50juw8-algo-1-9w0jk | Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 94.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
ubyi50juw8-algo-1-9w0jk | Collecting pyparsing>=2.2.1
ubyi50juw8-algo-1-9w0jk | Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Collecting cycler>=0.10
ubyi50juw8-algo-1-9w0jk | Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
ubyi50juw8-algo-1-9w0jk | Collecting fonttools>=4.22.0
ubyi50juw8-algo-1-9w0jk | Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 26.2 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Collecting packaging>=20.0
ubyi50juw8-algo-1-9w0jk | Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 3.6 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
ubyi50juw8-algo-1-9w0jk | Collecting kiwisolver>=1.0.1
ubyi50juw8-algo-1-9w0jk | Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 15.9 MB/s eta 0:00:00
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
ubyi50juw8-algo-1-9w0jk | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
ubyi50juw8-algo-1-9w0jk | Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
ubyi50juw8-algo-1-9w0jk | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
ubyi50juw8-algo-1-9w0jk | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,747 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,760 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,773 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:11,781 sagemaker-training-toolkit INFO Invoking user script
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | Training Env:
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | {
ubyi50juw8-algo-1-9w0jk | "additional_framework_parameters": {},
ubyi50juw8-algo-1-9w0jk | "channel_input_dirs": {
ubyi50juw8-algo-1-9w0jk | "train": "/opt/ml/input/data/train",
ubyi50juw8-algo-1-9w0jk | "test": "/opt/ml/input/data/test"
ubyi50juw8-algo-1-9w0jk | },
ubyi50juw8-algo-1-9w0jk | "current_host": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk | "framework_module": "sagemaker_sklearn_container.training:main",
ubyi50juw8-algo-1-9w0jk | "hosts": [
ubyi50juw8-algo-1-9w0jk | "algo-1-9w0jk"
ubyi50juw8-algo-1-9w0jk | ],
ubyi50juw8-algo-1-9w0jk | "hyperparameters": {
ubyi50juw8-algo-1-9w0jk | "estimators": 10
ubyi50juw8-algo-1-9w0jk | },
ubyi50juw8-algo-1-9w0jk | "input_config_dir": "/opt/ml/input/config",
ubyi50juw8-algo-1-9w0jk | "input_data_config": {
ubyi50juw8-algo-1-9w0jk | "train": {
ubyi50juw8-algo-1-9w0jk | "TrainingInputMode": "File"
ubyi50juw8-algo-1-9w0jk | },
ubyi50juw8-algo-1-9w0jk | "test": {
ubyi50juw8-algo-1-9w0jk | "TrainingInputMode": "File"
ubyi50juw8-algo-1-9w0jk | }
ubyi50juw8-algo-1-9w0jk | },
ubyi50juw8-algo-1-9w0jk | "input_dir": "/opt/ml/input",
ubyi50juw8-algo-1-9w0jk | "is_master": true,
ubyi50juw8-algo-1-9w0jk | "job_name": "sagemaker-scikit-learn-2022-07-17-15-25-04-516",
ubyi50juw8-algo-1-9w0jk | "log_level": 20,
ubyi50juw8-algo-1-9w0jk | "master_hostname": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk | "model_dir": "/opt/ml/model",
ubyi50juw8-algo-1-9w0jk | "module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz",
ubyi50juw8-algo-1-9w0jk | "module_name": "train_and_serve",
ubyi50juw8-algo-1-9w0jk | "network_interface_name": "eth0",
ubyi50juw8-algo-1-9w0jk | "num_cpus": 2,
ubyi50juw8-algo-1-9w0jk | "num_gpus": 0,
ubyi50juw8-algo-1-9w0jk | "output_data_dir": "/opt/ml/output/data",
ubyi50juw8-algo-1-9w0jk | "output_dir": "/opt/ml/output",
ubyi50juw8-algo-1-9w0jk | "output_intermediate_dir": "/opt/ml/output/intermediate",
ubyi50juw8-algo-1-9w0jk | "resource_config": {
ubyi50juw8-algo-1-9w0jk | "current_host": "algo-1-9w0jk",
ubyi50juw8-algo-1-9w0jk | "hosts": [
ubyi50juw8-algo-1-9w0jk | "algo-1-9w0jk"
ubyi50juw8-algo-1-9w0jk | ]
ubyi50juw8-algo-1-9w0jk | },
ubyi50juw8-algo-1-9w0jk | "user_entry_point": "train_and_serve.py"
ubyi50juw8-algo-1-9w0jk | }
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | Environment variables:
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | SM_HOSTS=["algo-1-9w0jk"]
ubyi50juw8-algo-1-9w0jk | SM_NETWORK_INTERFACE_NAME=eth0
ubyi50juw8-algo-1-9w0jk | SM_HPS={"estimators":10}
ubyi50juw8-algo-1-9w0jk | SM_USER_ENTRY_POINT=train_and_serve.py
ubyi50juw8-algo-1-9w0jk | SM_FRAMEWORK_PARAMS={}
ubyi50juw8-algo-1-9w0jk | SM_RESOURCE_CONFIG={"current_host":"algo-1-9w0jk","hosts":["algo-1-9w0jk"]}
ubyi50juw8-algo-1-9w0jk | SM_INPUT_DATA_CONFIG={"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}}
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_DATA_DIR=/opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | SM_CHANNELS=["test","train"]
ubyi50juw8-algo-1-9w0jk | SM_CURRENT_HOST=algo-1-9w0jk
ubyi50juw8-algo-1-9w0jk | SM_MODULE_NAME=train_and_serve
ubyi50juw8-algo-1-9w0jk | SM_LOG_LEVEL=20
ubyi50juw8-algo-1-9w0jk | SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
ubyi50juw8-algo-1-9w0jk | SM_INPUT_DIR=/opt/ml/input
ubyi50juw8-algo-1-9w0jk | SM_INPUT_CONFIG_DIR=/opt/ml/input/config
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_DIR=/opt/ml/output
ubyi50juw8-algo-1-9w0jk | SM_NUM_CPUS=2
ubyi50juw8-algo-1-9w0jk | SM_NUM_GPUS=0
ubyi50juw8-algo-1-9w0jk | SM_MODEL_DIR=/opt/ml/model
ubyi50juw8-algo-1-9w0jk | SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz
ubyi50juw8-algo-1-9w0jk | SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1-9w0jk","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1-9w0jk"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"TrainingInputMode":"File"},"train":{"TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-25-04-516","log_level":20,"master_hostname":"algo-1-9w0jk","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-04-516/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-9w0jk","hosts":["algo-1-9w0jk"]},"user_entry_point":"train_and_serve.py"}
ubyi50juw8-algo-1-9w0jk | SM_USER_ARGS=["--estimators","10"]
ubyi50juw8-algo-1-9w0jk | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
ubyi50juw8-algo-1-9w0jk | SM_CHANNEL_TRAIN=/opt/ml/input/data/train
ubyi50juw8-algo-1-9w0jk | SM_CHANNEL_TEST=/opt/ml/input/data/test
ubyi50juw8-algo-1-9w0jk | SM_HP_ESTIMATORS=10
ubyi50juw8-algo-1-9w0jk | PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | Invoking script with the following command:
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | /miniconda3/bin/python train_and_serve.py --estimators 10
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk |
ubyi50juw8-algo-1-9w0jk | command line arguments: Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
ubyi50juw8-algo-1-9w0jk | training_dir: /opt/ml/input/data/train
ubyi50juw8-algo-1-9w0jk | training_dir files list: ['train.csv']
ubyi50juw8-algo-1-9w0jk | testing_dir: /opt/ml/input/data/test
ubyi50juw8-algo-1-9w0jk | testing_dir files list: ['test.csv']
ubyi50juw8-algo-1-9w0jk | sm_model_dir: /opt/ml/model
ubyi50juw8-algo-1-9w0jk | output_data_dir: /opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | X_train.shape: (120, 4)
ubyi50juw8-algo-1-9w0jk | y_train.shape: (120,)
ubyi50juw8-algo-1-9w0jk | X_train.shape: (30, 4)
ubyi50juw8-algo-1-9w0jk | y_train.shape: (30,)
ubyi50juw8-algo-1-9w0jk | sm_model_dir: /opt/ml/model
ubyi50juw8-algo-1-9w0jk | sm_model_dir files list: ['model.joblib']
ubyi50juw8-algo-1-9w0jk | output_data_dir: /opt/ml/output/data
ubyi50juw8-algo-1-9w0jk | output_data_dir files list: ['y_pred.csv', 'output_cm.png']
ubyi50juw8-algo-1-9w0jk | 2022-07-17 15:25:13,824 sagemaker-containers INFO Reporting training SUCCESS
ubyi50juw8-algo-1-9w0jk exited with code 0
Aborting on container exit...
Failed to delete: /tmp/tmp9kiuooe_/algo-1-9w0jk Please remove it manually.
===== Job Complete =====
Attaching to e6h08rxuj4-algo-1-bitrj
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,610 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,614 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,615 INFO - sagemaker-containers - nginx config:
e6h08rxuj4-algo-1-bitrj | worker_processes auto;
e6h08rxuj4-algo-1-bitrj | daemon off;
e6h08rxuj4-algo-1-bitrj | pid /tmp/nginx.pid;
e6h08rxuj4-algo-1-bitrj | error_log /dev/stderr;
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | worker_rlimit_nofile 4096;
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | events {
e6h08rxuj4-algo-1-bitrj | worker_connections 2048;
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | http {
e6h08rxuj4-algo-1-bitrj | include /etc/nginx/mime.types;
e6h08rxuj4-algo-1-bitrj | default_type application/octet-stream;
e6h08rxuj4-algo-1-bitrj | access_log /dev/stdout combined;
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | upstream gunicorn {
e6h08rxuj4-algo-1-bitrj | server unix:/tmp/gunicorn.sock;
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | server {
e6h08rxuj4-algo-1-bitrj | listen 8080 deferred;
e6h08rxuj4-algo-1-bitrj | client_max_body_size 0;
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | keepalive_timeout 3;
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | location ~ ^/(ping|invocations|execution-parameters) {
e6h08rxuj4-algo-1-bitrj | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
e6h08rxuj4-algo-1-bitrj | proxy_set_header Host $http_host;
e6h08rxuj4-algo-1-bitrj | proxy_redirect off;
e6h08rxuj4-algo-1-bitrj | proxy_read_timeout 60s;
e6h08rxuj4-algo-1-bitrj | proxy_pass http://gunicorn;
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | location / {
e6h08rxuj4-algo-1-bitrj | return 404 "{}";
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj | }
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj |
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Module train_and_serve does not provide a setup.py.
e6h08rxuj4-algo-1-bitrj | Generating setup.py
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Generating setup.cfg
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Generating MANIFEST.in
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:16,826 INFO - sagemaker-containers - Installing module with the following command:
e6h08rxuj4-algo-1-bitrj | /miniconda3/bin/python3 -m pip install . -r requirements.txt
e6h08rxuj4-algo-1-bitrj | Processing /opt/ml/code
e6h08rxuj4-algo-1-bitrj | Preparing metadata (setup.py) ... done
e6h08rxuj4-algo-1-bitrj | Collecting seaborn==0.11.2
e6h08rxuj4-algo-1-bitrj | Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 36.6 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
e6h08rxuj4-algo-1-bitrj | Collecting matplotlib>=2.2
e6h08rxuj4-algo-1-bitrj | Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 101.9 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
e6h08rxuj4-algo-1-bitrj | Collecting kiwisolver>=1.0.1
e6h08rxuj4-algo-1-bitrj | Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 94.2 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Collecting cycler>=0.10
e6h08rxuj4-algo-1-bitrj | Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
e6h08rxuj4-algo-1-bitrj | Collecting packaging>=20.0
e6h08rxuj4-algo-1-bitrj | Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 11.9 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Collecting fonttools>=4.22.0
e6h08rxuj4-algo-1-bitrj | Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 12.1 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
e6h08rxuj4-algo-1-bitrj | Collecting pyparsing>=2.2.1
e6h08rxuj4-algo-1-bitrj | Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 27.3 MB/s eta 0:00:00
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
e6h08rxuj4-algo-1-bitrj | Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
e6h08rxuj4-algo-1-bitrj | Building wheels for collected packages: train-and-serve
e6h08rxuj4-algo-1-bitrj | Building wheel for train-and-serve (setup.py) ... -2022/07/17 15:25:19 [crit] 14#14: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:19 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
e6h08rxuj4-algo-1-bitrj | done
e6h08rxuj4-algo-1-bitrj | Created wheel for train-and-serve: filename=train_and_serve-1.0.0-py2.py3-none-any.whl size=6682 sha256=f4b6952b904adaa9a17270142b81e0746714104ac88739bf2ba644d15a4fe837
e6h08rxuj4-algo-1-bitrj | Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-6muul1xe/wheels/f3/75/57/158162e9eab7af12b5c338c279b3a81f103b89d74eeb911c00
e6h08rxuj4-algo-1-bitrj | Successfully built train-and-serve
e6h08rxuj4-algo-1-bitrj | Installing collected packages: train-and-serve, pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
e6h08rxuj4-algo-1-bitrj | Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2 train-and-serve-1.0.0
e6h08rxuj4-algo-1-bitrj | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
e6h08rxuj4-algo-1-bitrj | 2022/07/17 15:25:24 [crit] 14#14: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:24 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.8"
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:24,850 INFO - matplotlib.font_manager - generated new fontManager
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Starting gunicorn 20.0.4
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Listening at: unix:/tmp/gunicorn.sock (35)
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [35] [INFO] Using worker: gevent
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [37] [INFO] Booting worker with pid: 37
e6h08rxuj4-algo-1-bitrj | [2022-07-17 15:25:25 +0000] [38] [INFO] Booting worker with pid: 38
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:29,802 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | model_fn model_dir: /opt/ml/model
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:31 +0000] "GET /ping HTTP/1.1" 200 0 "-" "python-urllib3/1.26.8"
'sagemaker-scikit-learn-2022-07-17-15-25-14-401'
##
# send JSON request to endpoint
import json
client = session_local.sagemaker_runtime_client
request_body = {"Input": [[9.0, 3571, 1976, 0.525]]}
payload = json.dumps(request_body)
response = client.invoke_endpoint(
EndpointName=sk_endpoint_name, ContentType="application/json", Body=payload
)
result = json.loads(response["Body"].read().decode())["Output"]
result
e6h08rxuj4-algo-1-bitrj | 2022-07-17 15:25:31,439 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
e6h08rxuj4-algo-1-bitrj | model_fn model_dir: /opt/ml/model
e6h08rxuj4-algo-1-bitrj | 172.18.0.1 - - [17/Jul/2022:15:25:32 +0000] "POST /invocations HTTP/1.1" 200 13 "-" "python-urllib3/1.26.8"
2
##
# get JSON response from endpoint
print("Predicted class category {} ({})".format(result, categories_map[result]))
Predicted class category 2 (Iris-virginica)
Make sure to delete the endpoint once we have tested our deployed model.
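As a minimal sketch of that cleanup step, assuming `predictor` is the Predictor object returned by the deploy call, tearing down the endpoint is a single method call:

```python
# Hedged sketch: tear down a SageMaker endpoint after testing.
# `predictor` is assumed to be the Predictor returned by the estimator's deploy() call.
def cleanup_endpoint(predictor):
    """Delete the endpoint so it stops incurring charges."""
    predictor.delete_endpoint()
```

This works the same way for local-mode and managed endpoints, since both expose the Predictor interface.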
SKLearn model training and serving in a SageMaker managed environment
Now that our script is complete and we have tested it in the local environment, let’s train and deploy it in the Amazon SageMaker managed environment. Moving from a local to a managed environment is simple: we only need to change the instance type.
#collapse-output
# train and deploy model with input and output as JSON objects
sk_estimator = SKLearn(
entry_point=script_file_name,
source_dir=script_path,
dependencies=[custom_library_path],
role=role,
instance_count=1,
instance_type='ml.m5.large',
framework_version="1.0-1",
hyperparameters={"estimators":10},
)
sk_estimator.fit({"train": s3_train_uri, "test": s3_test_uri})
2022-07-17 15:25:33 Starting - Starting the training job...
2022-07-17 15:25:57 Starting - Preparing the instances for trainingProfilerReport-1658071533: InProgress
.........
2022-07-17 15:27:17 Downloading - Downloading input data...
2022-07-17 15:27:57 Training - Downloading the training image...
2022-07-17 15:28:35 Training - Training image download completed. Training in progress...2022-07-17 15:28:37,458 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
2022-07-17 15:28:37,462 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:37,478 sagemaker_sklearn_container.training INFO Invoking user training script.
2022-07-17 15:28:37,948 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:
/miniconda3/bin/python -m pip install -r requirements.txt
Collecting seaborn==0.11.2
Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 kB 11.5 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.15 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.21.0)
Collecting matplotlib>=2.2
Downloading matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 69.0 MB/s eta 0:00:00
Requirement already satisfied: scipy>=1.0 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.5.3)
Requirement already satisfied: pandas>=0.23 in /miniconda3/lib/python3.8/site-packages (from seaborn==0.11.2->-r requirements.txt (line 2)) (1.1.3)
Requirement already satisfied: pillow>=6.2.0 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (9.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /miniconda3/lib/python3.8/site-packages (from matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (2.8.1)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 37.9 MB/s eta 0:00:00
Collecting packaging>=20.0
Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 kB 7.4 MB/s eta 0:00:00
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting pyparsing>=2.2.1
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 kB 18.2 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
Downloading fonttools-4.34.4-py3-none-any.whl (944 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 944.1/944.1 kB 52.9 MB/s eta 0:00:00
Requirement already satisfied: pytz>=2017.2 in /miniconda3/lib/python3.8/site-packages (from pandas>=0.23->seaborn==0.11.2->-r requirements.txt (line 2)) (2022.1)
Requirement already satisfied: six>=1.5 in /miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn==0.11.2->-r requirements.txt (line 2)) (1.15.0)
Installing collected packages: pyparsing, kiwisolver, fonttools, cycler, packaging, matplotlib, seaborn
Successfully installed cycler-0.11.0 fonttools-4.34.4 kiwisolver-1.4.4 matplotlib-3.5.2 packaging-21.3 pyparsing-3.0.9 seaborn-0.11.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
2022-07-17 15:28:44,790 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,810 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,831 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)
2022-07-17 15:28:44,847 sagemaker-training-toolkit INFO Invoking user script
Training Env:
{
"additional_framework_parameters": {},
"channel_input_dirs": {
"test": "/opt/ml/input/data/test",
"train": "/opt/ml/input/data/train"
},
"current_host": "algo-1",
"framework_module": "sagemaker_sklearn_container.training:main",
"hosts": [
"algo-1"
],
"hyperparameters": {
"estimators": 10
},
"input_config_dir": "/opt/ml/input/config",
"input_data_config": {
"test": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
},
"train": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
}
},
"input_dir": "/opt/ml/input",
"is_master": true,
"job_name": "sagemaker-scikit-learn-2022-07-17-15-25-33-210",
"log_level": 20,
"master_hostname": "algo-1",
"model_dir": "/opt/ml/model",
"module_dir": "s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz",
"module_name": "train_and_serve",
"network_interface_name": "eth0",
"num_cpus": 2,
"num_gpus": 0,
"output_data_dir": "/opt/ml/output/data",
"output_dir": "/opt/ml/output",
"output_intermediate_dir": "/opt/ml/output/intermediate",
"resource_config": {
"current_host": "algo-1",
"current_instance_type": "ml.m5.large",
"current_group_name": "homogeneousCluster",
"hosts": [
"algo-1"
],
"instance_groups": [
{
"instance_group_name": "homogeneousCluster",
"instance_type": "ml.m5.large",
"hosts": [
"algo-1"
]
}
],
"network_interface_name": "eth0"
},
"user_entry_point": "train_and_serve.py"
}
Environment variables:
SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"estimators":10}
SM_USER_ENTRY_POINT=train_and_serve.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.m5.large","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.m5.large"}],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["test","train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=train_and_serve
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=2
SM_NUM_GPUS=0
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1"],"hyperparameters":{"estimators":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2022-07-17-15-25-33-210","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-east-1-801598032724/sagemaker-scikit-learn-2022-07-17-15-25-33-210/source/sourcedir.tar.gz","module_name":"train_and_serve","network_interface_name":"eth0","num_cpus":2,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.m5.large","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.m5.large"}],"network_interface_name":"eth0"},"user_entry_point":"train_and_serve.py"}
SM_USER_ARGS=["--estimators","10"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_ESTIMATORS=10
PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python38.zip:/miniconda3/lib/python3.8:/miniconda3/lib/python3.8/lib-dynload:/miniconda3/lib/python3.8/site-packages
Invoking script with the following command:
/miniconda3/bin/python train_and_serve.py --estimators 10
command line arguments: Namespace(estimators=10, sm_channel_test='/opt/ml/input/data/test', sm_channel_train='/opt/ml/input/data/train', sm_model_dir='/opt/ml/model', sm_output_data_dir='/opt/ml/output/data')
training_dir: /opt/ml/input/data/train
training_dir files list: ['train.csv']
testing_dir: /opt/ml/input/data/test
testing_dir files list: ['test.csv']
sm_model_dir: /opt/ml/model
output_data_dir: /opt/ml/output/data
X_train.shape: (120, 4)
y_train.shape: (120,)
X_train.shape: (30, 4)
y_train.shape: (30,)
sm_model_dir: /opt/ml/model
sm_model_dir files list: ['model.joblib']
output_data_dir: /opt/ml/output/data
output_data_dir files list: ['y_pred.csv', 'output_cm.png']
2022-07-17 15:28:48,476 sagemaker-containers INFO Reporting training SUCCESS
2022-07-17 15:28:58 Uploading - Uploading generated training model
2022-07-17 15:29:18 Completed - Training job completed
Training seconds: 116
Billable seconds: 116
We have used an AWS-managed ml.m5.large
instance for training. Once the training job is complete, the model artifacts are uploaded to the S3 bucket, and the log ends with the billable seconds.
Let’s confirm that the session we are using is not local.
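One hedged way to check this (assuming `session` holds our SageMaker session object): a `LocalSession` instance carries a `local_mode` flag set to `True`, which a regular `Session` does not.

```python
# Hedged sketch: distinguish a LocalSession from a regular SageMaker Session.
# LocalSession instances expose local_mode = True; anything else defaults to False.
def is_local_session(session):
    return bool(getattr(session, "local_mode", False))
```

For the managed session used here, this should return `False`.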
Now deploy it on a SageMaker-managed ml.t2.medium
instance.
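The deploy step itself can be sketched as follows, assuming `sk_estimator` is the fitted SKLearn estimator from the `fit()` call above; wrapping it in a small helper keeps the instance type in one place:

```python
# Hedged sketch of the deploy step.
# Assumes `estimator` is a fitted sagemaker.sklearn.SKLearn estimator;
# deploy() creates a real-time endpoint and returns a Predictor.
def deploy_realtime(estimator, instance_type="ml.t2.medium"):
    """Create a real-time endpoint and return the Predictor for invoking it."""
    return estimator.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
    )
```

The dashes printed below are the SDK’s progress indicator while the endpoint is being created.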
-------------!
Test deployed model with a sample request.
##
# send JSON request to endpoint
import json
client = session.sagemaker_runtime_client
request_body = {"Input": [[9.0, 3571, 1976, 0.525]]}
payload = json.dumps(request_body)
response = client.invoke_endpoint(
EndpointName=sk_predictor.endpoint_name,
ContentType="application/json",
Body=payload,
)
result = json.loads(response["Body"].read().decode())["Output"]
result
2
##
# get JSON response from endpoint
print("Predicted class category {} ({})".format(result, categories_map[result]))
Predicted class category 2 (Iris-virginica)
Again, don’t forget to delete the endpoint once you are done with testing.