AWS Machine Learning Certification Notes (MLS-C01)


My notes for the AWS ML Specialty exam, which I passed on May 14, 2022.

Published

May 14, 2022

About

This post is a compilation of important notes and references for the AWS Certified Machine Learning - Specialty exam (MLS-C01).

Notes

Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.

References

https://aws.amazon.com/comprehend/

Amazon Rekognition

Amazon Rekognition offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.

References

https://aws.amazon.com/rekognition/

Amazon Polly

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.

References

https://aws.amazon.com/polly/

Amazon Lex

Amazon Lex is a fully managed artificial intelligence (AI) service with advanced natural language models to design, build, test, and deploy conversational interfaces (chatbots) in applications.

References

https://aws.amazon.com/lex/

Amazon Transcribe

Amazon Transcribe is an automatic speech recognition service that makes it easy to add speech-to-text capabilities to any application. Transcribe’s features enable you to ingest audio input, produce easy-to-read and easy-to-review transcripts, improve accuracy with customization, and filter content to ensure customer privacy.

References

https://aws.amazon.com/transcribe/

Latent Dirichlet Allocation (LDA) Algorithm

It is a topic modeling technique that generates abstract topics from a set of documents based on word frequency.
It is similar to an unsupervised classification of documents.
It is useful for automatically organizing, summarizing, understanding, and searching large electronic archives. It can help in:
- discovering hidden themes in the collection
- classifying documents into the discovered themes
- organizing, summarizing, and searching the documents
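To make "topics from word frequency" concrete, here is a minimal sketch that fits LDA with collapsed Gibbs sampling, one common way to train the model. It is an illustration only and not necessarily how the SageMaker built-in LDA algorithm trains; the function name, the alpha/beta priors, and the toy documents are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][t] is the share of topic t
    in document d, topic_word[t][w] is the probability of word w in topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                                 # total tokens per topic

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...and resample its topic from p(topic | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab} for t in topics]
    return doc_topic, topic_word


# Hypothetical toy corpus: two "pets" documents and two "finance" documents.
docs = [
    ["cat", "dog", "pet", "dog"],
    ["stock", "market", "trade", "stock"],
    ["pet", "cat", "vet"],
    ["market", "price", "trade"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(topic_word):
    print(t, sorted(dist, key=dist.get, reverse=True)[:3])
```

Each document ends up as a mixture over topics rather than a single hard label, which is what distinguishes LDA from a plain clustering of documents.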

References

https://towardsdatascience.com/latent-dirichlet-allocation-lda-9d1cd064ffa2

Multinomial Logistic Regression Algorithm

Multinomial Logistic Regression is an extension of logistic regression (supervised) that allows more than two discrete outcomes (multiclass).
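A minimal sketch of the idea, assuming plain batch gradient descent on the cross-entropy loss: each class gets its own linear score, and the softmax function turns the scores into class probabilities. The function names, learning rate, epoch count, and toy data are illustrative assumptions, not a SageMaker API.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """One linear score per class: W[k] . x + b[k]."""
    return [sum(wj * xj for wj, xj in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]


def train_softmax_regression(X, y, num_classes, lr=0.1, epochs=3000):
    """Batch gradient descent on the average cross-entropy loss.

    X is a list of feature vectors, y a list of integer labels in [0, num_classes).
    """
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for k in range(num_classes):
                # d(loss)/d(score_k) = p_k - 1 for the true class, p_k otherwise
                err = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        for k in range(num_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict(W, b, x):
    """Return (most likely class, class probabilities)."""
    probs = softmax(class_scores(W, b, x))
    return max(range(len(probs)), key=probs.__getitem__), probs


# Hypothetical toy data: three well-separated groups of 2-D points.
X = [[0.0, 0.0], [0.5, 0.2], [3.0, 0.0], [3.2, 0.4], [0.0, 3.0], [0.3, 3.1]]
y = [0, 0, 1, 1, 2, 2]
W, b = train_softmax_regression(X, y, num_classes=3)
print(predict(W, b, [2.8, 0.1]))
```

With two classes the softmax reduces to the ordinary logistic (sigmoid) model, which is why this is described as an extension of logistic regression.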

References

https://en.wikipedia.org/wiki/Multinomial_logistic_regression

Factorization Machines Algorithm

The Factorization Machines algorithm is a general-purpose supervised learning algorithm that you can use for both classification and regression tasks. It is an extension of a linear model that is designed to capture interactions between features within high dimensional sparse datasets economically. For example, in a click prediction system, the Factorization Machines model can capture click rate patterns observed when ads from a certain ad-category are placed on pages from a certain page-category. Factorization machines are a good choice for tasks dealing with high dimensional sparse datasets, such as click prediction and item recommendation.
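The model behind factorization machines is usually written as y_hat = w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j, where every feature i has a weight w_i and a small factor vector v_i, and the dot product <v_i, v_j> models the interaction between features i and j. The sketch below, with hypothetical function names and random parameters, computes this prediction two ways: a naive double loop over feature pairs, and the well-known reformulation that costs O(k * nonzeros), which is what makes the model economical on sparse data.

```python
import random


def fm_predict_naive(w0, w, V, x):
    """x is a sparse vector: a dict {feature_index: value} of non-zero features."""
    idx = sorted(x)
    linear = sum(w[i] * x[i] for i in idx)
    pairwise = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))  # <v_i, v_j>
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(w0, w, V, x):
    """Same value via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(V[0])
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


rng = random.Random(0)
n_features, k = 10, 3
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]
# Hypothetical one-hot encoding: feature 2 = ad category, feature 7 = page category.
x = {2: 1.0, 7: 1.0}
print(fm_predict_naive(w0, w, V, x), fm_predict_fast(w0, w, V, x))
```

Because every interaction weight is a dot product of learned factors, the model can estimate an ad-category/page-category interaction even when that exact pair is rare in the training data.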

References

https://docs.aws.amazon.com/sagemaker/latest/dg/fact-machines.html

Sequence to Sequence (seq2seq) Algorithm

Amazon SageMaker Sequence to Sequence is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens. Example applications include:
- machine translation (input a sentence from one language and predict what that sentence would be in another language)
- text summarization (input a longer string of words and predict a shorter string of words that is a summary)
- speech-to-text (audio clips converted into output sentences in tokens)

Problems in this domain have been successfully modeled with deep neural networks that show a significant performance boost over previous methodologies. Amazon SageMaker seq2seq uses Recurrent Neural Networks (RNNs) and Convolutional Neural Network (CNN) models with attention as encoder-decoder architectures.
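The "attention" part of an encoder-decoder can be sketched in a few lines. This minimal example assumes the simple dot-product scoring variant (other scoring functions exist) and toy vectors; it shows how, at each decoding step, the decoder state is compared with every encoder state and a weighted average (the context vector) is formed. It is not the SageMaker implementation.

```python
import math


def dot_product_attention(decoder_state, encoder_states):
    """Weight each encoder state by its dot-product similarity to the decoder state.

    Returns (weights, context): the weights sum to 1, and context is the weighted
    average of the encoder states that the decoder uses for its next token.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    m = max(scores)  # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


# Hypothetical encoder states for a 3-token input sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_product_attention([10.0, 0.0], encoder_states)
print(weights, context)
```

Input tokens whose encoder states align with the current decoder state receive most of the weight, which lets the decoder focus on different parts of the input at each output step.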

References

https://docs.aws.amazon.com/sagemaker/latest/dg/seq-2-seq.html

Term frequency-inverse document frequency Algorithm

TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.

This is done by multiplying two metrics: how many times a word appears in a document (frequency), and the inverse document frequency of the word across a set of documents.

- Term frequency (TF): how often the word appears in a document. There are several ways of calculating it, the simplest being a raw count of the word's occurrences in the document.
- Inverse document frequency (IDF): how common or rare the word is across the entire document set. It is calculated by dividing the total number of documents by the number of documents that contain the word and taking the logarithm. If the word is very common and appears in almost every document, IDF approaches 0; the rarer the word, the larger IDF becomes (up to log N for a word that appears in only one of N documents).
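The two metrics above translate directly into code. This sketch assumes the simplest variants described in the notes (TF = raw count, IDF = log(N / document frequency)); libraries such as scikit-learn use smoothed variants, so their numbers differ. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {word: tf-idf score} dict per document."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word at most once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        scores.append({w: count * math.log(n_docs / df[w]) for w, count in tf.items()})
    return scores


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

"the" appears in every document, so its IDF is log(3/3) = 0 and it scores 0 everywhere, while "dog" appears in only one document and gets the highest weight, log 3.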

It has many uses, most importantly in automated text analysis, and is very useful for scoring words in machine learning algorithms for Natural Language Processing (NLP).

References

https://monkeylearn.com/blog/what-is-tf-idf

BlazingText Algorithm

The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.

The Word2vec algorithm maps words to high-quality distributed vectors. The resulting vector representation of a word is called a word embedding. Words that are semantically similar correspond to vectors that are close together. That way, word embeddings capture the semantic relationships between words.
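A minimal sketch of two pieces behind word embeddings, assuming the skip-gram variant of Word2vec: training data is a set of (center word, context word) pairs taken from a sliding window, and "close together" is usually measured with cosine similarity. The helper names and the window size are illustrative assumptions, not BlazingText parameters.

```python
import math


def skipgram_pairs(tokens, window=2):
    """(center, context) training pairs: the model learns to predict context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(a, b):
    """Close to 1 for vectors pointing the same way, 0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
```

Words that keep appearing in the same contexts receive similar training signals, so their learned vectors end up with high cosine similarity.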

References

https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html

Amazon SageMaker Batch Transform

Use batch transform when you need to do the following:

- Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
- Get inferences from large datasets.
- Run inference when you don’t need a persistent endpoint.
- Associate input records with inferences to assist the interpretation of results.
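As a sketch of how a batch transform job is started with boto3's `create_transform_job`, the helper below only builds the request, so it can be inspected without AWS access. The job name, model name, S3 URIs, and instance type are hypothetical placeholders; `JoinSource="Input"` is the setting that attaches each prediction to its input record.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Keyword arguments for the SageMaker CreateTransformJob API (placeholder values)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # an existing SageMaker model
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}},
            "ContentType": "text/csv",
            "SplitType": "Line",  # split input files into records on newlines
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line",
                            "Accept": "text/csv"},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
        # Join each inference with the input record it came from.
        "DataProcessing": {"InputFilter": "$", "JoinSource": "Input", "OutputFilter": "$"},
    }


def start_transform_job(request):
    """Submit the job (needs boto3, AWS credentials, and the model to exist)."""
    import boto3

    return boto3.client("sagemaker").create_transform_job(**request)


request = build_transform_job_request(
    "my-batch-job", "my-model", "s3://my-bucket/input/", "s3://my-bucket/output/"
)
print(request["DataProcessing"])
```

The compute instances exist only while the job runs, which is why batch transform fits the "no persistent endpoint" case.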

References

https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html

Amazon SageMaker Real-time inference / Hosting Services

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.
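Once a model is deployed, clients call the endpoint synchronously. A minimal sketch, assuming a boto3 `sagemaker-runtime` client and a model that accepts one CSV record; the endpoint name and payload are hypothetical. The client is passed in so the function can be exercised without AWS.

```python
def predict_realtime(runtime_client, endpoint_name, payload_csv):
    """Send one CSV record to a SageMaker endpoint and return the decoded response.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload_csv,
    )
    # The response body is a stream; read it fully and decode it as text.
    return response["Body"].read().decode("utf-8")


# Example usage (needs AWS credentials and a deployed endpoint; "my-endpoint" is hypothetical):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(predict_realtime(runtime, "my-endpoint", "5.1,3.5,1.4,0.2"))
```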

References

https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html

Amazon SageMaker Inference Pipeline

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to fifteen containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully managed.

Within an inference pipeline model, SageMaker handles invocations as a sequence of HTTP requests. The first container in the pipeline handles the initial request, then the intermediate response is sent as a request to the second container, and so on, for each container in the pipeline. SageMaker returns the final response to the client.
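The request flow described above can be simulated locally. This is a toy model, not SageMaker code: each hypothetical "container" is a plain Python function from request payload to response payload, and each response becomes the next container's request.

```python
from typing import Callable, List


def run_pipeline(containers: List[Callable[[str], str]], request: str) -> str:
    """Pass a request through the containers in order, like a linear inference pipeline."""
    payload = request
    for container in containers:
        # Each container's response becomes the request for the next container.
        payload = container(payload)
    return payload  # the last container's response goes back to the client


def preprocess(csv_line: str) -> str:
    """Hypothetical feature-processing step: scale the values so they sum to 1."""
    values = [float(v) for v in csv_line.split(",")]
    total = sum(values)
    return ",".join(str(v / total) for v in values)


def classify(csv_line: str) -> str:
    """Hypothetical model step: class 1 if the first scaled feature exceeds 0.5."""
    return "1" if float(csv_line.split(",")[0]) > 0.5 else "0"


def postprocess(label: str) -> str:
    """Hypothetical post-processing step: map the class ID to a readable label."""
    return {"0": "negative", "1": "positive"}[label]


print(run_pipeline([preprocess, classify, postprocess], "3,1"))  # prints "positive"
```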

When you deploy the pipeline model, SageMaker installs and runs all of the containers on each Amazon Elastic Compute Cloud (Amazon EC2) instance in the endpoint or transform job. Feature processing and inferences run with low latency because the containers are co-located on the same EC2 instances.

References

https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html

Amazon SageMaker Neo

Amazon SageMaker Neo automatically optimizes machine learning models for inference on cloud instances and edge devices to run faster with no loss in accuracy. You start with a machine learning model already built with DarkNet, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, ONNX, or XGBoost and trained in Amazon SageMaker or anywhere else. Then you choose your target hardware platform, which can be a SageMaker hosting instance or an edge device based on processors from Ambarella, Apple, ARM, Intel, MediaTek, Nvidia, NXP, Qualcomm, RockChip, Texas Instruments, or Xilinx. With a single click, SageMaker Neo optimizes the trained model and compiles it into an executable. The compiler uses a machine learning model to apply the performance optimizations that extract the best available performance for your model on the cloud instance or edge device. You then deploy the model as a SageMaker endpoint or on supported edge devices and start making predictions.

References

https://aws.amazon.com/sagemaker/neo/

LSTM / Long Short-Term Memory

LSTM is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data (such as speech or video).

LSTM networks are well-suited to classifying, processing, and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTM is applicable to tasks such as anomaly detection in network traffic and intrusion detection systems (IDSs).
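A minimal sketch of one LSTM cell step in plain Python, assuming the standard formulation with input, forget, and output gates and a candidate cell value. The parameter layout, helper names, and random weights are illustrative assumptions. The hidden state h and cell state c returned by one step are fed back into the next step, which is the "feedback connection" mentioned above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lstm_step(x, h_prev, c_prev, params):
    """One time step. params[gate] = (W, U, b) for gates 'i', 'f', 'o', 'g'."""
    def pre(gate):
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in pre("i")]    # input gate: how much new information to write
    f = [sigmoid(z) for z in pre("f")]    # forget gate: how much old memory to keep
    o = [sigmoid(z) for z in pre("o")]    # output gate: how much memory to expose
    g = [math.tanh(z) for z in pre("g")]  # candidate values to write
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size) for gate in "ifog"}


def run_lstm(sequence, hidden_size, params):
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)  # state carries over to the next time step
    return h, c


params = init_params(input_size=2, hidden_size=3)
h, c = run_lstm([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], 3, params)
print(h)
```

Because the cell state is updated additively (f * c + i * g), information can persist across many steps, which is what lets LSTMs handle long, unknown lags between events.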

References

https://en.wikipedia.org/wiki/Long_short-term_memory

Semantic Segmentation Algorithm

The SageMaker semantic segmentation algorithm provides a fine-grained, pixel-level approach to developing computer vision applications. It tags every pixel in an image with a class label from a predefined set of classes. Tagging is fundamental for understanding scenes, which is critical to an increasing number of computer vision applications, such as self-driving vehicles, medical imaging diagnostics, and robot sensing.

For comparison, the SageMaker Image Classification Algorithm is a supervised learning algorithm that analyzes only whole images, classifying them into one of multiple output categories. The Object Detection Algorithm is a supervised learning algorithm that detects and classifies all instances of an object in an image. It indicates the location and scale of each object in the image with a rectangular bounding box.

Because the semantic segmentation algorithm classifies every pixel in an image, it also provides information about the shapes of the objects contained in the image. The segmentation output is represented as a grayscale image with the same shape as the input image, called a segmentation mask.
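A toy illustration of why a mask carries shape information that a bounding box does not. The mask is assumed, for illustration only, to be a 2-D list of class IDs with the same height and width as the image; the helper names are hypothetical.

```python
from collections import Counter


def class_areas(mask):
    """Number of pixels assigned to each class ID in a segmentation mask."""
    return Counter(label for row in mask for label in row)


def bounding_box(mask, label):
    """Smallest (top, left, bottom, right) box around all pixels of `label`, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == label]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical 4x5 mask: 0 = background, 1 = an L-shaped object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(class_areas(mask)[1], bounding_box(mask, 1))
```

The object covers 4 pixels, but its bounding box (rows 1 to 2, columns 1 to 3) spans 6; the mask keeps the L shape that an object detector's rectangle would lose.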

The SageMaker semantic segmentation algorithm is built using the MXNet Gluon framework and the Gluon CV toolkit.

References

https://docs.aws.amazon.com/sagemaker/latest/dg/semantic-segmentation.html

Accuracy

Accuracy measures the fraction of correct predictions. The range is 0 to 1.

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision

Precision measures the fraction of actual positives among the examples that are predicted as positive. The range is 0 to 1. The formula shows that the larger the number of FP (false positives), the lower the precision.

Precision = TP / (TP + FP)

For maximum precision there should be no FP. FPs are also called Type I errors.

Recall

Recall measures the fraction of actual positives that are predicted as positive. The range is 0 to 1. The formula shows that the larger the number of FN (false negatives), the lower the recall.

Recall = TP / (TP + FN)

For maximum recall there should be no FN. FNs are also called Type II errors.

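Note: Precision and recall usually trade off against each other rather than being strictly inversely proportional: raising the classification threshold tends to increase precision and decrease recall, and lowering it does the opposite.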
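The three formulas, and the precision/recall trade-off, can be checked with a small script. The labels and model scores below are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def apply_threshold(scores, threshold):
    """Predict the positive class (1) when the model score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]  # hypothetical model scores
for threshold in (0.5, 0.75):
    tp, tn, fp, fn = confusion_counts(y_true, apply_threshold(scores, threshold))
    print(threshold, accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

At a threshold of 0.5, precision and recall are both 0.75. Raising the threshold to 0.75 removes the false positive (precision 1.0) but misses two positives (recall 0.5), while accuracy stays at 0.75 in both cases.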

References

https://towardsdatascience.com/model-evaluation-i-precision-and-recall-166ddb257c7b

L1 regularization

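L1 regularization, also known as Lasso (in regression problems), combats overfitting by adding the L1 norm of the weights (the sum of their absolute values) to the loss. This shrinks the parameters towards 0 and can set some of them exactly to 0, making the corresponding features obsolete.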

It’s a form of feature selection: when a feature is assigned a weight of 0, its values are multiplied by 0, which removes that feature's contribution to the prediction. The resulting weight vector is sparse: some input features have weights equal to zero, and the rest are non-zero.

L2 regularization

L2 regularization, also known as Ridge (in regression problems), combats overfitting by adding the squared L2 norm of the weights (the sum of their squares) to the loss. This forces the weights to be small but does not make them exactly 0, so it returns a non-sparse solution (although some weights may be close to 0). A major snag to consider when using L2 regularization is that it’s not robust to outliers.
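The sparse versus non-sparse difference is easiest to see in a simplified one-dimensional case. Assuming each weight's unregularized loss is 0.5*(w - z)**2 (for example, least squares with orthonormal features, where z is the unregularized weight), the L1 penalty gives the soft-thresholding solution and the L2 penalty gives a proportional shrink. The weights and penalty strength below are hypothetical.

```python
def l1_shrink(z, lam):
    """argmin over w of 0.5*(w - z)**2 + lam*|w|, i.e. soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(z, lam):
    """argmin over w of 0.5*(w - z)**2 + 0.5*lam*w**2: scale toward zero."""
    return z / (1.0 + lam)


unregularized = [3.0, 0.4, -0.2, -2.5]  # hypothetical unregularized weights
lam = 0.5
print("L1:", [l1_shrink(z, lam) for z in unregularized])
print("L2:", [round(l2_shrink(z, lam), 3) for z in unregularized])
```

L1 returns [2.5, 0.0, 0.0, -2.0], dropping the two small weights entirely, while L2 returns roughly [2.0, 0.267, -0.133, -1.667], where every weight is smaller but none is exactly zero.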

References

https://neptune.ai/blog/fighting-overfitting-with-l1-or-l2-regularization

K-means

K-means is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.
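A minimal sketch of the classic K-means procedure (Lloyd's algorithm): pick K starting centroids, assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The random initialization from the data points and the toy points are assumptions for the example; production implementations usually use smarter seeding.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of coordinates into k groups. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(points, k=2))
```

No labels are used anywhere: the grouping comes only from distances between points, which is what makes K-means unsupervised.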

K-nearest neighbors

K-nearest neighbors is a classification (or regression) algorithm that determines the class of a point by combining the labels of its K nearest points (for example, by majority vote). It is supervised because you classify a point based on the known labels of other points.
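For contrast with K-means, a minimal K-nearest neighbors classifier, assuming Euclidean distance and a simple majority vote (the function name and toy data are hypothetical). Note that it needs labeled training points, which is what makes it supervised.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled training point
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_points, train_labels, (0.5, 0.5)))
print(knn_classify(train_points, train_labels, (5.5, 5.5)))
```

There is no training step: all the work happens at prediction time, when distances to the stored points are computed.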

Courses

Exam Readiness: AWS Certified Machine Learning - Specialty

https://explore.skillbuilder.aws/learn/course/27/play/54/exam-readiness-aws-certified-machine-learning-specialty

This is overall a very good short course that can help you identify your strengths and weaknesses in each exam domain so you know where to focus when studying for the exam.

ACloudGuru AWS Certified Machine Learning - Specialty 2020

https://acloudguru.com/course/aws-certified-machine-learning-specialty

This is a detailed course on the topics covered in the exam. However, it is light on the “Modeling” domain and on hands-on labs. Besides taking this course, you should have good knowledge of and working experience in data science and machine learning. I already have an AI/ML background, so this was not an issue for me. If you don’t have an ML background, some people recommend Frank Kane’s Machine Learning, Data Science and Deep Learning with Python course on Udemy, but I am not sure how worthwhile it is.

Practice Projects

Besides preparing for the exam, you should do some projects to build solid hands-on knowledge. For this you can use the How-To Guides from the AWS Getting Started Resource Center. Some of my favorite projects are:
- Build, train, deploy, and monitor a machine learning model with Amazon SageMaker Studio
- Optimizing and Scaling Machine Learning Training with Managed Spot Training for Amazon SageMaker

Practice Dumps

For practice tests, I used Jon Bonso’s Udemy course AWS Certified Machine Learning Specialty Practice Exams.

Other Tips

About a week before your exam date, start checking Reddit communities. From time to time, people post about their achievements and experiences taking the exam, and they often mention the services or topics they were asked about. Keep a close eye on such posts and look for any topic you have not covered yet.