
Hosting a Scikit-Learn Model in AWS SageMaker and Creating a REST API

Updated: Feb 12


In this blog post, we walk step by step through hosting a machine learning model on AWS SageMaker and exposing it through a robust REST API, giving you scalable deployment and straightforward integration.


Creating a REST API using SageMaker


Amazon SageMaker is a fully managed AWS service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale. The service provides an integrated Jupyter notebook with access to our data sources for analysis and exploration, and it ships with common machine learning algorithms. It also lets us expose a deployed model through AWS API Gateway, which is what this tutorial covers. Let's start by signing in to the AWS console and searching for Amazon SageMaker, then follow the steps below.


Sagemaker Notebook Setup

Create a new notebook instance with the below configurations (an equivalent boto3 call is sketched after the list):

  • Add a notebook name and select an instance type.

  • Get the IAM role from DevOps. The same role will be used for the Lambda function as well.

  • Add your VPC configurations in the Network section.
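
These steps can all be done in the console; for reference, here is a minimal boto3 sketch of the same setup. The instance name, role ARN, subnet, and security group are placeholders, not values from the original post.

import boto3

sm = boto3.client("sagemaker", region_name="ap-south-1")

# hypothetical names; use the IAM role and VPC details from the steps above
sm.create_notebook_instance(
    NotebookInstanceName="sentiment-notebook",
    InstanceType="ml.t3.medium",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
)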


notebook setup

Once the instance is created, we will have access to the Jupyter notebook, where we will train the model and deploy it.


Prepare the Machine Learning Model


We will use the IMDb movie-review dataset, a text dataset for sentiment analysis.

We save the data and the bag-of-words vectorizer in an S3 bucket, which will be used later while training the model.

import sagemaker

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()

# upload the fitted vectorizer and the labeled training data
sagemaker_session.upload_data(path="bow_vectorizer.pickle", bucket=bucket,
                              key_prefix="SentimentDataset/data")
sagemaker_session.upload_data(path="labeledTrainData.tsv", bucket=bucket,
                              key_prefix="SentimentDataset/data")

Since we are using a custom scikit-learn model, we will use the SKLearn estimator. It takes a Python file as an entry point, which contains the whole process of training and saving the model.

import boto3
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

sagemaker_session = sagemaker.Session(boto3.session.Session(region_name="ap-south-1"))

# create the SKLearn estimator
sklearn_estimator = SKLearn(entry_point="train_model.py",
                            role=role,
                            instance_type="ml.m5.large",
                            instance_count=1,
                            sagemaker_session=sagemaker_session,
                            framework_version="0.20.0",
                            py_version="py3",
                            source_dir="./",
                            dependencies=["requirements.txt"],
                            base_job_name="TestSemantic",
                            code_location="s3://S3location/SentimentDataset/",
                            output_path="s3://S3location/SentimentDataset/")

Here we also specify a requirements.txt file listing the packages needed to train the model on a fresh machine.

code snippet for train_model.py
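
Since the original snippet is shown as an image, here is a minimal sketch of what train_model.py might look like, assuming a logistic-regression classifier over the pickled bag-of-words vectorizer; the column names and the choice of classifier are assumptions, not the post's exact code.

import argparse
import os
import pickle

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression


def model_fn(model_dir):
    # SageMaker calls this hook at inference time to load the model artifact
    return joblib.load(os.path.join(model_dir, "model.joblib"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker exposes these locations through environment variables
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    args = parser.parse_args()

    # load the labeled data and the pre-fitted vectorizer from the train channel
    df = pd.read_csv(os.path.join(args.train, "labeledTrainData.tsv"), sep="\t")
    with open(os.path.join(args.train, "bow_vectorizer.pickle"), "rb") as f:
        vectorizer = pickle.load(f)

    X = vectorizer.transform(df["review"])
    y = df["sentiment"]

    # the classifier is an assumption; swap in your own model here
    clf = LogisticRegression()
    clf.fit(X, y)

    # save the trained model where SageMaker expects to find the artifact
    joblib.dump(clf, os.path.join(args.model_dir, "model.joblib"))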


We will call the fit method to train the model and save the model artifacts.

# map all data files from the S3 directory to args.train
data_channels = {"train": "s3://S3location/SentimentDataset/data"}

# fit the model
sklearn_estimator.fit(inputs=data_channels, logs=True, wait=True)

Once the model is trained, we will deploy it using the deploy() method and get an endpoint.

deploy on sagemaker
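
The deployment step is also shown as an image in the original post; a minimal sketch of it, assuming the estimator from above. The endpoint name and the sample review are illustrative.

import pickle

# deploy the trained model behind a real-time endpoint (name is illustrative)
predictor = sklearn_estimator.deploy(initial_instance_count=1,
                                     instance_type="ml.m5.large",
                                     endpoint_name="sentiment-endpoint")

# vectorize a hypothetical review with the same bag-of-words vectorizer
with open("bow_vectorizer.pickle", "rb") as f:
    vectorizer = pickle.load(f)

features = vectorizer.transform(["What a great movie!"]).toarray()
print(predictor.predict(features))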

We will use the returned SKLearnPredictor to get predictions, as sketched above. Once we have this output, we can move on to the next step. We can check the logs of the whole process in Amazon CloudWatch (https://aws.amazon.com/cloudwatch/), filtering by the base_job_name we set when creating sklearn_estimator.


Lambda Function

AWS Lambda is a serverless compute service that lets us run code without managing servers. A Lambda function is triggered by incoming user requests or events; here we will use AWS API Gateway to trigger it. Create a new Lambda function and use the below configurations:

  • Add a function name and select the same Execution role as the notebook instance.

  • Enable the VPC configurations in Advanced Settings.


Creating lambda function

lambda_function.py

Here we will pass the endpoint name to the SageMaker runtime, invoke it with the request payload, and get the predictions back. We can return the output in any format we need.
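
Since lambda_function.py is shown as an image, here is a minimal sketch of what it might contain. The "data" key in the payload is an assumption and must match what your client sends.

import json
import os

import boto3

# the endpoint name comes from the ENDPOINT_NAME environment variable set below
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # the "data" key is an assumption; it must match what the client sends
    payload = json.loads(event["body"])["data"]
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType="application/json",
                                       Body=json.dumps(payload))
    result = json.loads(response["Body"].read().decode())
    return {"statusCode": 200,
            "body": json.dumps({"prediction": result})}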


Let’s save an environment variable called ENDPOINT_NAME, which will contain the endpoint name.

Adding Environment variable
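
The variable can also be set programmatically instead of through the console; a minimal sketch, assuming hypothetical function and endpoint names.

import boto3

lambda_client = boto3.client("lambda", region_name="ap-south-1")

# hypothetical function and endpoint names
lambda_client.update_function_configuration(
    FunctionName="sentiment-lambda",
    Environment={"Variables": {"ENDPOINT_NAME": "sentiment-endpoint"}},
)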

Let’s also add a test event to check whether the lambda code is working as expected.

Creating test event on lambda
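
For reference, the handler can also be exercised locally with an event shaped like an API Gateway request; a minimal sketch, where the feature vector is a stand-in for real vectorized input.

import json

from lambda_function import lambda_handler

# hypothetical API Gateway-style test event; the feature vector is a stand-in
event = {"body": json.dumps({"data": [[0.0, 1.0, 0.0]]})}
print(lambda_handler(event, None))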

Deploy and test the changes

Deploying and testing changes on lambda

Click on Test to get the predictions.

Get json predictions


AWS API Gateway

Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. Click on Build REST API; we can build a private REST API if it needs to be accessed only from within a VPC.

Creating API

Creating API steps

Below are the steps to get the URL.

Create API → Create Resource → Create Method → Deploy the API.

Creating resource

Now we can use the Invoke URL to get the predictions. We can test this URL using Postman or a terminal.
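
For example, from Python; a minimal sketch, assuming the payload shape used by the Lambda sketch above. Both the invoke URL and the feature vector are placeholders.

import requests

# hypothetical invoke URL; copy yours from the deployed stage in API Gateway
url = "https://abc123.execute-api.ap-south-1.amazonaws.com/prod/predict"

response = requests.post(url, json={"data": [[0.0, 1.0, 0.0]]})
print(response.json())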

Invoking URL

We can monitor the API in the dashboard section of the API Gateway.

Monitor API using API Dashboard

Follow-up post: we will deploy on SageMaker using a local registry with Docker.
