<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michael Stainsbury </title>
    <description>The latest articles on Forem by Michael Stainsbury  (@mlexam).</description>
    <link>https://forem.com/mlexam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1125740%2F931c84a7-57cd-4a3b-bb47-14765863e8bf.jpg</url>
      <title>Forem: Michael Stainsbury </title>
      <link>https://forem.com/mlexam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mlexam"/>
    <language>en</language>
    <item>
      <title>AWS Machine Learning exam guide</title>
      <dc:creator>Michael Stainsbury </dc:creator>
      <pubDate>Wed, 02 Aug 2023 17:03:11 +0000</pubDate>
      <link>https://forem.com/mlexam/aws-machine-learning-exam-guide-3nl1</link>
      <guid>https://forem.com/mlexam/aws-machine-learning-exam-guide-3nl1</guid>
      <description>&lt;p&gt;&lt;strong&gt;A guide to the guide&lt;/strong&gt;&lt;br&gt;
Syllabus, specification and blueprint are all terms for the knowledge domain of an exam or course. However, AWS calls the description of the content of its Machine Learning exam the Exam Guide. Perhaps this is a telling choice, since the information provided is far from comprehensive: it is just a guide.&lt;/p&gt;

&lt;p&gt;If you come from an AWS Machine Learning background the Exam Guide PDF will be sufficient for you. However, if you are already a Data Scientist wishing to move into AWS, or you use AWS and want to learn SageMaker and Machine Learning, then large chunks of the Exam Guide will be unintelligible. This guide to the guide fills the gaps and explains the high-level concepts.&lt;/p&gt;

&lt;p&gt;The Exam Guide is where the exam subjects are listed, split into four domains and fifteen sub-domains. This article describes each sub-domain in enough detail for the complete newbie to get a good idea of what it is about. If you intend to study for the AWS Machine Learning certificate this will give you an overview of what you are getting yourself into.&lt;/p&gt;

&lt;p&gt;AWS pdf: &lt;a href="https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf"&gt;https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain 1: Data Engineering
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.mlexam.com/home/domain-1-data-engineering/"&gt;Domain 1 Data Engineering&lt;/a&gt; is concerned with obtaining the data, transforming it and putting it in a repository. It comprises 20% of the exam marks. There are three sub-domains that can be summarised as:&lt;/p&gt;

&lt;p&gt;1.1 &lt;a href="https://www.mlexam.com/data-repositories/"&gt;Data repositories&lt;/a&gt;&lt;br&gt;
1.2 &lt;a href="https://www.mlexam.com/data-cleansing/"&gt;Data ingestion&lt;/a&gt;&lt;br&gt;
1.3 &lt;a href="https://www.mlexam.com/data-transformation/"&gt;Data transformation&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 Data repositories
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Create data repositories for machine learning&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The data repository is where raw and processed data is stored. S3 is the repository of choice for Machine Learning in AWS and all built-in algorithms and services can consume data from S3. Other data stores are also mentioned in the exam guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database (Relational Database Service)&lt;/li&gt;
&lt;li&gt;Data Lake (Lake Formation)&lt;/li&gt;
&lt;li&gt;EFS&lt;/li&gt;
&lt;li&gt;EBS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Often data is generated by the business itself, but sometimes data from other sources is needed to train the model, for example libraries of image data to train the Object Detection algorithm. Many data sources are publicly available.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 Data ingestion
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Identify and implement a data ingestion solution&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The data ingestion sub-domain is concerned with gathering the raw data into the repository. This can be via batch processing or streaming data. With &lt;a href="https://www.mlexam.com/batch-processing/"&gt;batch processing&lt;/a&gt;, data is collected and grouped at a point in time and passed to the data store. &lt;a href="https://www.mlexam.com/streaming-data-for-machine-learning/"&gt;Streaming data&lt;/a&gt; is constantly being collected and fed into the data store. The AWS streaming services are:&lt;/p&gt;

&lt;p&gt;Kinesis family of streaming data services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis Data Streams&lt;/li&gt;
&lt;li&gt;Kinesis Firehose&lt;/li&gt;
&lt;li&gt;Kinesis Analytics&lt;/li&gt;
&lt;li&gt;Kinesis Video Streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zbPakx5S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dynjieq533pcdmx22sxx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zbPakx5S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dynjieq533pcdmx22sxx.jpg" alt="An infographic describing the Kinesis family of streaming data services: Kinesis Data Streams, Kinesis Firehose, Kinesis Analytics and Kinesis Video Streams" width="450" height="625"&gt;&lt;/a&gt;&lt;/p&gt;
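&lt;p&gt;As a sketch of streaming ingestion, the snippet below builds the parameters for a Kinesis Data Streams &lt;code&gt;put_record&lt;/code&gt; call. The stream name and payload are invented for illustration, and the boto3 call itself is only shown in a comment so the example runs without AWS credentials.&lt;/p&gt;

```python
import json

def build_kinesis_record(stream_name, payload, partition_key):
    # Build the parameters for a Kinesis Data Streams put_record call.
    # Data must be bytes; JSON is a common encoding for event payloads.
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }

record = build_kinesis_record("clickstream", {"user": "u1", "page": "/home"}, "u1")

# With boto3 installed and credentials configured, the record would be sent with:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(**record)
```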

&lt;p&gt;Batch processing requires a way to schedule or trigger the processing, also called job scheduling. Examples are Glue Workflow and Step Functions. AWS batch services include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EMR (Hadoop)&lt;/li&gt;
&lt;li&gt;Glue (Spark)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.3 Data transformation
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Identify and implement a data transformation solution&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The third Data Engineering sub-domain focuses on how raw data is transformed into data that can be used for ML processing. The transformation process changes the data structure. The data may also need to be cleaned up, de-duplicated and have its attributes standardised, with incomplete data managed. The AWS services are similar to those used for data ingestion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Glue (Spark)&lt;/li&gt;
&lt;li&gt;EMR (Hadoop, Spark, Hive)&lt;/li&gt;
&lt;li&gt;AWS Batch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once these data engineering processes are complete, the data is ready for further pre-processing before being fed into a Machine Learning algorithm. This pre-processing is covered by the second knowledge domain, Exploratory Data Analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain 2: Exploratory Data Analysis
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://www.mlexam.com/home/domain-2-exploratory-data-analysis/"&gt;Exploratory Data Analysis domain&lt;/a&gt; the data is analysed so it can be understood and cleaned up. It comprises 24% of the exam marks. There are three sub-domains:&lt;/p&gt;

&lt;p&gt;2.1 &lt;a href="https://www.mlexam.com/data-cleansing/"&gt;Prep and sanitise data&lt;/a&gt;&lt;br&gt;
2.2 &lt;a href="https://www.mlexam.com/feature-engineering/"&gt;Feature engineering&lt;/a&gt;&lt;br&gt;
2.3 &lt;a href="https://www.mlexam.com/data-visualization/"&gt;Analyse and visualize data&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Prep and sanitise data
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Sanitize and prepare data for modeling&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://www.mlexam.com/data-cleansing/"&gt;Sanitize and prepare data for modeling&lt;/a&gt;, the data can be cleaned up using techniques to remove distortions and fill in gaps.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing data&lt;/li&gt;
&lt;li&gt;corrupt data&lt;/li&gt;
&lt;li&gt;stop words&lt;/li&gt;
&lt;li&gt;formatting&lt;/li&gt;
&lt;li&gt;normalizing&lt;/li&gt;
&lt;li&gt;augmenting&lt;/li&gt;
&lt;li&gt;scaling data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GIf4aer2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/86z2vo4trcr87z9v79ez.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GIf4aer2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/86z2vo4trcr87z9v79ez.jpg" alt="An infographic showing techniques used to clean data." width="450" height="625"&gt;&lt;/a&gt;&lt;/p&gt;
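&lt;p&gt;A minimal sketch of two of these clean-up steps, mean imputation for missing data and min-max scaling, in plain Python (the column values are invented for illustration):&lt;/p&gt;

```python
def impute_mean(values):
    # Replace missing entries (None) with the mean of the observed values.
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    # Scale values into the range 0..1, a common normalisation step.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [10.0, None, 30.0]
clean = impute_mean(raw)       # [10.0, 20.0, 30.0]
scaled = min_max_scale(clean)  # [0.0, 0.5, 1.0]
```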

&lt;blockquote&gt;
&lt;p&gt;Data labeling is the process of identifying raw data and adding one or more meaningful and informative labels to provide context. (&lt;a href="https://aws.amazon.com/sagemaker/data-labeling/what-is-data-labeling/"&gt;AWS&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Data labeling can be costly and time-consuming because it involves applying the labels manually. AWS provides the Mechanical Turk service to reduce the cost and speed up the labelling process.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Feature engineering
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Perform feature engineering&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mlexam.com/feature-engineering/"&gt;Feature Engineering&lt;/a&gt; is about creating new features from existing ones to make the Machine Learning algorithms more powerful. Feature Engineering techniques are used to reduce the number of features and categorise the data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;binning&lt;/li&gt;
&lt;li&gt;tokenization&lt;/li&gt;
&lt;li&gt;outliers&lt;/li&gt;
&lt;li&gt;synthetic features&lt;/li&gt;
&lt;li&gt;one-hot encoding&lt;/li&gt;
&lt;li&gt;reducing dimensionality of data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EksCgaFv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zit1oce33wu9n4ouvo5e.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EksCgaFv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zit1oce33wu9n4ouvo5e.jpg" alt="An infographic showing aspects of feature engineering." width="800" height="2000"&gt;&lt;/a&gt;&lt;/p&gt;
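&lt;p&gt;Two of these techniques, one-hot encoding and binning, can be sketched in a few lines of plain Python (the categories and bin edges are invented for illustration):&lt;/p&gt;

```python
def one_hot(value, categories):
    # One-hot encoding: one binary indicator per known category.
    return [1 if value == c else 0 for c in categories]

def bin_index(value, edges):
    # Binning: the bucket index is the number of edges the value exceeds.
    return sum(1 for e in edges if value > e)

colours = ["red", "green", "blue"]
encoded = one_hot("green", colours)   # [0, 1, 0]
bucket = bin_index(42, [18, 35, 65])  # ages 36-65 fall in bucket 2
```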

&lt;h3&gt;
  
  
  2.3 Analyse and visualize data
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Analyze and visualize data for machine learning&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mlexam.com/data-visualization/"&gt;Analyzing and visualizing the data&lt;/a&gt; overlaps with the other two sub-domains which use these techniques. The techniques include graphs, charts and matrices.&lt;/p&gt;

&lt;p&gt;Before data can be sanitized and prepared it has to be understood. This is done using statistics that focus on specific aspects of the data, and graphs and charts that allow relationships and distributions to be seen.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scatter plot&lt;/li&gt;
&lt;li&gt;histogram&lt;/li&gt;
&lt;li&gt;box plot&lt;/li&gt;
&lt;li&gt;correlation&lt;/li&gt;
&lt;li&gt;summary statistics&lt;/li&gt;
&lt;li&gt;p value&lt;/li&gt;
&lt;li&gt;elbow plot&lt;/li&gt;
&lt;li&gt;cluster size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You now understand your data and have cleaned it up, ready for the next stage: modeling.&lt;/p&gt;
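&lt;p&gt;As an illustration of the statistics mentioned above, here is a small plain-Python sketch of summary statistics and Pearson correlation (the sample values are invented):&lt;/p&gt;

```python
def summary(values):
    # Basic summary statistics used during exploratory data analysis.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"n": n, "mean": mean, "min": min(values), "max": max(values), "var": var}

def correlation(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

stats = summary([1.0, 2.0, 3.0, 4.0])
r = correlation([1, 2, 3], [2, 4, 6])  # perfectly linear relationship
```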

&lt;h2&gt;
  
  
  Domain 3: Modeling
&lt;/h2&gt;

&lt;p&gt;When people talk about Machine Learning they are mostly thinking about Modeling. Modeling is selecting and testing the algorithms to process data to find the information of value. It comprises 36% of the exam marks. This domain has five sub-domains:&lt;/p&gt;

&lt;p&gt;3.1 &lt;a href="https://www.mlexam.com/problem-framing-machine-learning/"&gt;Frame the business problem&lt;/a&gt;&lt;br&gt;
3.2 &lt;a href="https://www.mlexam.com/how-to-select-a-model-for-a-given-machine-learning-problem/"&gt;Select the appropriate model&lt;/a&gt;&lt;br&gt;
3.3 &lt;a href="https://www.mlexam.com/training-machine-learning-models/"&gt;Train the model&lt;/a&gt;&lt;br&gt;
3.4 &lt;a href="https://www.mlexam.com/model-tuning/"&gt;Tune the model&lt;/a&gt;&lt;br&gt;
3.5 &lt;a href="https://www.mlexam.com/how-to-evaluate-machine-learning-models/"&gt;Evaluate the model&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Frame the business problem
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Frame business problems as machine learning problems&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;First we decide whether Machine Learning is appropriate for the problem. Machine Learning is good for data-driven problems involving large amounts of data where the rules cannot easily be coded. The business problem can probably be framed in many ways, and this determines what kind of Machine Learning problem is being solved. For example, the business problem could be framed to require a yes/no answer, as in fraud detection, or a value, as in a share price. This sub-domain also identifies the type of data available, and so whether the algorithm will use a supervised or unsupervised learning paradigm. From the type of problem to be solved, the required capabilities of the algorithm can be identified, for example classification, regression, forecasting, clustering or recommendation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4SJ2vDvb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tqpjt3bdikfh23bf5l7o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4SJ2vDvb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tqpjt3bdikfh23bf5l7o.jpg" alt="An infographic describing best practice when framing Machine Learning problems." width="450" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Select the appropriate model
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Select the appropriate model(s) for a given machine learning problem&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Many models are available through AWS Machine Learning services, with SageMaker alone providing over seventeen built-in algorithms. Each model has its own use cases and requirements. Once a model has been chosen, an iterative process of training, tuning and evaluation is undertaken.&lt;/p&gt;

&lt;p&gt;The exam guide explicitly lists only two SageMaker built-in algorithms, XGBoost and K-means; since there are many built-in algorithms, perhaps these are just the most important. Modeling concepts are also listed:&lt;/p&gt;

&lt;p&gt;linear regression — Linear Learner, K-Nearest Neighbors, Factorization Machines&lt;br&gt;
logistic regression — XGBoost&lt;br&gt;
decision trees — XGBoost&lt;br&gt;
random forests — Random Cut Forest&lt;br&gt;
RNN — DeepAR forecasting, Sequence to Sequence&lt;br&gt;
CNN — Sequence to Sequence&lt;br&gt;
ensemble learning — XGBoost&lt;br&gt;
transfer learning — Image classification&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Train the model
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Train machine learning models&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Model training is the process of providing a model with data to learn from. During model training the data is split into three parts: most is used as training data, with the remainder used for validation and testing. Cross-validation is a technique used when training data is limited. By understanding the internal workings of algorithms, model training can be optimised. Concepts used by models in training include gradient descent, loss functions, local minima, convergence, batches, optimizers and probability.&lt;/p&gt;
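&lt;p&gt;The three-way data split described above can be sketched in plain Python. The 80/10/10 fractions are a common convention, not a requirement:&lt;/p&gt;

```python
import random

def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    # Shuffle, then split into training, validation and test sets.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
# 80 rows for training, 10 for validation, 10 for testing
```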

&lt;p&gt;The speed and cost of training depend on choices about the compute resources used. The instance type, and so its processing power, can be specified. Graphics Processing Units (GPUs) can provide more compute power, but not all algorithms can utilise them; those that cannot may instead use cheaper CPU instances. For heavy training loads, distributed processing options may be available to speed up training. Spark and non-Spark data processing can be used to pre-process training data.&lt;/p&gt;

&lt;p&gt;Model training is also concerned with how and when models are updated and retrained.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.4 Tune the model
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Perform hyperparameter optimization&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Model tuning is also known as hyperparameter optimisation. Machine Learning algorithms can be thought of as black boxes, with hyperparameters being the exposed controls that can be changed and optimised. Hyperparameter settings do not change during training. They can be tuned manually before training commences, using search methods, or automatically by using SageMaker guided search. Model tuning can be improved by using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regularization&lt;/li&gt;
&lt;li&gt;Drop out&lt;/li&gt;
&lt;li&gt;L1/L2&lt;/li&gt;
&lt;li&gt;Model initialization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Models that utilise a neural network architecture use other hyperparameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;layers / nodes&lt;/li&gt;
&lt;li&gt;learning rate&lt;/li&gt;
&lt;li&gt;activation functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tree-based models have hyperparameters that influence the number of trees and number of levels. The learning rate is used to optimise linear models.&lt;/p&gt;
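&lt;p&gt;SageMaker automatic model tuning uses guided (Bayesian) search, but the idea of searching a hyperparameter space can be illustrated with a simple exhaustive grid search over a toy objective. The hyperparameter names and values below are invented for illustration:&lt;/p&gt;

```python
import itertools

def grid_search(objective, space):
    # Evaluate every combination in the grid and keep the lowest loss.
    names = list(space)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        loss = objective(params)
        if best_loss > loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

def toy_loss(p):
    # Toy objective: pretend the ideal settings are lr=0.1 and depth=6.
    return abs(p["learning_rate"] - 0.1) + abs(p["max_depth"] - 6)

space = {"learning_rate": [0.001, 0.01, 0.1, 0.3], "max_depth": [2, 4, 6, 8]}
best, loss = grid_search(toy_loss, space)
# best == {"learning_rate": 0.1, "max_depth": 6}, loss == 0.0
```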

&lt;h3&gt;
  
  
  3.5 Evaluate the model
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Evaluate machine learning models&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Model evaluation is used to find out how well a Model will perform in predicting the desired outcome. This is done using metrics to measure the performance of the Model. Metrics measure accuracy, precision and other features of the Model by comparing the results from the Model with the known contents of the training data.&lt;/p&gt;

&lt;p&gt;Metrics commonly used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AUC-ROC&lt;/li&gt;
&lt;li&gt;accuracy&lt;/li&gt;
&lt;li&gt;precision&lt;/li&gt;
&lt;li&gt;recall&lt;/li&gt;
&lt;li&gt;RMSE&lt;/li&gt;
&lt;li&gt;F1 score&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A confusion matrix is used to compare the labels a model predicts with the actual labels.&lt;/p&gt;
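&lt;p&gt;Several of these metrics can be derived directly from confusion-matrix counts. A small plain-Python sketch, with invented counts:&lt;/p&gt;

```python
def classification_metrics(tp, fp, fn, tn):
    # Derive common evaluation metrics from confusion-matrix counts:
    # true/false positives (tp, fp) and false/true negatives (fn, tn).
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = classification_metrics(tp=8, fp=2, fn=2, tn=88)
# precision 0.8, recall 0.8, f1 0.8, accuracy 0.96
```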

&lt;p&gt;Evaluation methods can be performed offline or online. A/B testing can also be used to compare the performance of model variants. Metrics allow the detection of a poorly fitting model, caused by bias (underfitting) or variance (overfitting), where the model performs poorly on real-world data.&lt;/p&gt;

&lt;p&gt;Other metrics allow models and model variants to be compared using metrics that are not directly related to data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;time to train a model&lt;/li&gt;
&lt;li&gt;quality of model&lt;/li&gt;
&lt;li&gt;engineering costs&lt;/li&gt;
&lt;li&gt;Cross validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your model is now ready to be used with real data. But before it can be let loose on your corporate data it has to be deployed into the production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain 4: Machine Learning Implementation and Operations
&lt;/h2&gt;

&lt;p&gt;This domain is about Systems Architecture and DevOps skills to make everything work in production. It comprises 20% of the exam marks. There are four sub-domains:&lt;/p&gt;

&lt;p&gt;4.1 &lt;a href="https://www.mlexam.com/machine-learning-production-environment/"&gt;Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.&lt;/a&gt;&lt;br&gt;
4.2 &lt;a href="https://www.mlexam.com/aws-ml-services/"&gt;Recommend and implement the appropriate machine learning services and features for a given problem.&lt;/a&gt;&lt;br&gt;
4.3 &lt;a href="https://www.mlexam.com/aws-security/"&gt;Apply basic AWS security practices to machine learning solutions.&lt;/a&gt;&lt;br&gt;
4.4 &lt;a href="https://www.mlexam.com/deploy-ml-model/"&gt;Deploy and operationalize machine learning solutions.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 The production environment
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Designing AWS production environments for performance, availability, scalability, resiliency, and fault tolerance is part of AWS best practice. Resilience and availability are provided by deploying models across multiple AWS Regions and multiple Availability Zones. Auto Scaling groups and load balancing provide scalability for compute resources. Performance is optimised by rightsizing EC2 instances and volumes, and by provisioned IOPS. There are a variety of deployment options, including EC2, SageMaker-managed EC2 via endpoints, and Docker containers. CloudTrail and CloudWatch are used for AWS environment logging and monitoring, which assists in building fault-tolerant systems and error monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 ML services and features
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Recommend and implement the appropriate machine learning services and features for a given problem&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS provides a range of services and features to choose from for a given Machine Learning problem. AWS provides AI services, which are highly optimised algorithms deployed on AWS-managed infrastructure. Some of the services contain pre-trained models ready for production inferencing. Some examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polly, text to speech&lt;/li&gt;
&lt;li&gt;Lex, chatbot&lt;/li&gt;
&lt;li&gt;Transcribe, speech to text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NS7peyMJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/367kyek1hbfaeb6gb756.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NS7peyMJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/367kyek1hbfaeb6gb756.jpg" alt="An Infographic listing the Amazon AI services provided by AWS." width="450" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When using AI services, AWS does all the heavy lifting of managing infrastructure, models and training. There are other options if you need more control of these aspects: SageMaker built-in algorithms can be used, or you can bring your own model. This allows cost considerations to influence the choice of compute services. Even more sophisticated cost control can be achieved by using spot instances to train deep learning models using AWS Batch.&lt;/p&gt;

&lt;p&gt;AWS service limits cap the amount of resources that can be used, for example the number of instances of a service in an account. Service limits can be increased by AWS on request, but sometimes there is a hard limit, the maximum for that service in a single AWS account or Region.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.3 Security
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Apply basic AWS security practices to machine learning solutions&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Security in AWS starts with the ubiquitous IAM, Identity and Access Management, which controls the activities of all AWS services. Since S3 is the most common storage for Machine Learning services S3 bucket policies are also included. It may seem that access to VPCs, &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html"&gt;Amazon Virtual Private Cloud&lt;/a&gt;, and VPC &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html"&gt;Security Groups&lt;/a&gt; may not be needed if you are implementing serverless applications. However, under the hood, SageMaker uses these services and the security has to be configured. As well as configuring security for the services, data security also has to be considered. This includes encryption of data both at rest and in transit. Anonymisation can be used to protect PII data, &lt;a href="https://docs.aws.amazon.com/comprehend/latest/dg/pii.html"&gt;Personally identifiable information&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.4 Deploy and operationalize
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Deploy and operationalize machine learning solutions&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are many ways to deploy Machine Learning models in production; one method is to use SageMaker endpoints. Despite the name, a SageMaker endpoint is more than an isolated interface: it sits on top of serious processing power, provided by SageMaker-managed EC2 instances that are set up by the endpoint configuration. SageMaker endpoints can host multiple variants of the same model, enabling those variants to be compared using testing strategies such as A/B testing.&lt;/p&gt;

&lt;p&gt;Once in production the model is monitored because the performance of a model may degrade over time as real world data changes. This drop in performance can be detected and used to trigger the retraining of the model via a retrain pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The AWS Certified Machine Learning — Specialty exam guide is good for outlining the breadth of the syllabus and how it is divided into four domains and fifteen sub-domains. Whilst it lists and mentions many subjects, only a few are described in any detail, and even those are a little light. This article provides additional description of the subjects, to allow someone considering studying for the exam to understand what has to be learnt to achieve exam success.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Photo by &lt;a href="https://unsplash.com/@overlyawesome?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Daniel Gonzalez&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/map?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Originally published at &lt;a href="https://www.mlexam.com/aws-machine-learning-exam-guide/"&gt;www.mlexam.com&lt;/a&gt; on October 8, 2020.&lt;/li&gt;
&lt;li&gt;All infographics by &lt;a href="https://www.linkedin.com/in/michael-stainsbury-b695392b/"&gt;Michael Stainsbury&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Copyright &lt;a href="https://www.linkedin.com/in/michael-stainsbury-b695392b/"&gt;Michael Stainsbury&lt;/a&gt; 2020, 2023&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>certification</category>
    </item>
    <item>
      <title>AWS SageMaker BlazingText Algorithm</title>
      <dc:creator>Michael Stainsbury </dc:creator>
      <pubDate>Wed, 26 Jul 2023 05:00:00 +0000</pubDate>
      <link>https://forem.com/mlexam/aws-sagemaker-blazingtext-algorithm-1lhc</link>
      <guid>https://forem.com/mlexam/aws-sagemaker-blazingtext-algorithm-1lhc</guid>
      <description>&lt;p&gt;BlazingText is the name AWS has given it’s SageMaker built-in algorithm that can identify relationships between words in text documents. These relationships, which are also called embeddings, are expressed as vectors. The semantic relationship between words is preserved by the vectors which cluster words with similar semantics together. This conversion of words to meaningful numeric vectors is very useful for Natural Language Processing which requires input data in vector format. This is why BlazingText is used as a precursor for Natural Language Processing.&lt;/p&gt;

&lt;p&gt;Word2Vec is used to pre-process documents containing text to be used by other systems, for example sentiment analysis or machine translation from one language to another. Word2Vec generates a numerical representation of words called embeddings. This captures the relationships between words, so king, queen and president would be closely related. These relationships are used by Natural Language Processing systems. BlazingText is an implementation of the Word2Vec algorithm. Word2Vec was published by Google in 2013 and is compatible with Facebook’s FastText.&lt;/p&gt;

&lt;p&gt;Text Classification is used to classify documents, search engines and for document ranking. Text Classification uses embeddings generated by Word2Vec.&lt;/p&gt;

&lt;p&gt;This article contains revision notes for the &lt;a href="https://www.mlexam.com/aws-machine-learning-exam-guide/"&gt;AWS certified exam MLS-C01, Machine Learning — Specialty&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does the BlazingText algorithm do
&lt;/h2&gt;

&lt;p&gt;BlazingText is used for textual analysis and text classification problems. BlazingText is the only SageMaker built-in algorithm to have both unsupervised and supervised learning modes: Word2Vec is unsupervised and Text Classification is supervised learning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Word2Vec — &lt;a href="https://www.mlexam.com/unsupervised-learning/"&gt;Unsupervised learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Text Classifier — &lt;a href="https://www.mlexam.com/supervised-learning-for-machine-learning/"&gt;Supervised learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usually for Text Classification you would pre-process the data by passing it through a Word2Vec algorithm and then a Text Classifier. The BlazingText algorithm implements the Word2Vec and Text Classifier steps as a single process.&lt;/p&gt;

&lt;h2&gt;
  
  
  How is BlazingText implemented
&lt;/h2&gt;

&lt;p&gt;BlazingText is a SageMaker built-in algorithm and so can be trained via SageMaker Jupyter Notebooks and deployed on SageMaker endpoints. BlazingText processes text data. The input data is presented in a single file with one sentence per line.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the training data formats for BlazingText
&lt;/h3&gt;

&lt;p&gt;There are two input file formats:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;File Mode&lt;/li&gt;
&lt;li&gt;Augmented Manifest Text (AMT) format&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The data in File Mode is text with space-separated words and one sentence per line. Each line begins with a label like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;__label__1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data in Augmented Manifest Text format is in JSON (json lines) format. Each line can contain a single sentence or be split up into phrases by commas as a JSON array. Here are some examples:&lt;/p&gt;

&lt;p&gt;A single line in File Mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;__label__1 Our aim is to increase the year-round consumption of berries in the UK, working closely with British growers during the spring and summer months, and collaborating with UK importers and overseas exporters during winter and early spring.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single JSON line in Augmented Manifest Text format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"source":"Our aim is to increase the year-round consumption of berries in the UK, working closely with British growers during the spring and summer months, and collaborating with UK importers and overseas exporters during winter and early spring","label":1}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single JSON line with multiple labels in Augmented Manifest Text format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"source":"Our aim is to increase the year-round consumption of berries in the UK, working closely with British growers during the spring and summer months, and collaborating with UK importers and overseas exporters during winter and early spring","label":[1,3]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
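&lt;p&gt;To make the relationship between the two formats concrete, here is a minimal sketch that converts a File Mode line into an Augmented Manifest Text record using only the standard library. The helper name is my own, not part of SageMaker:&lt;/p&gt;

```python
import json

def file_mode_to_amt(line: str) -> str:
    """Convert one File Mode line ("__label__1 some text") into an
    Augmented Manifest Text (JSON Lines) record."""
    # The label token is the first space-separated word on the line.
    label_token, text = line.split(" ", 1)
    label = int(label_token.replace("__label__", ""))
    return json.dumps({"source": text, "label": label})

line = "__label__1 Our aim is to increase the year-round consumption of berries in the UK"
print(file_mode_to_amt(line))
# -> {"source": "Our aim is to increase the year-round consumption of berries in the UK", "label": 1}
```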



&lt;h3&gt;
  
  
  Model artifacts and inference
&lt;/h3&gt;

&lt;p&gt;BlazingText uses different artifacts depending on its processing mode. The lists below summarise the artifacts and file names used by BlazingText.&lt;/p&gt;


&lt;h4&gt;
  
  
  Word2Vec
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model binaries: vectors.bin&lt;/li&gt;
&lt;li&gt;Supporting artifacts: vectors.txt, eval.json (optional)&lt;/li&gt;
&lt;li&gt;Request format: JSON&lt;/li&gt;
&lt;li&gt;Result: List of vectors. If word not found: zeros&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Text Classification
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model binaries: model.bin&lt;/li&gt;
&lt;li&gt;Supporting artifacts: none&lt;/li&gt;
&lt;li&gt;Request format: JSON&lt;/li&gt;
&lt;li&gt;Result: One prediction&lt;/li&gt;
&lt;/ul&gt;
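&lt;p&gt;Both modes accept a JSON request. As a sketch based on the AWS BlazingText documentation, the payloads take a list of &lt;code&gt;instances&lt;/code&gt;, and for text classification an optional &lt;code&gt;configuration&lt;/code&gt; with &lt;code&gt;k&lt;/code&gt; requests the top-k predictions; the example sentences are taken from the training data above:&lt;/p&gt;

```python
import json

# Word2Vec inference: request vectors for a list of words.
w2v_request = json.dumps({"instances": ["berries", "growers"]})

# Text classification inference: classify whole sentences. The optional
# "configuration" with "k" asks for the top-k predictions.
clf_request = json.dumps({
    "instances": ["Our aim is to increase the year-round consumption of berries"],
    "configuration": {"k": 2},
})

print(w2v_request)
print(clf_request)
```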

&lt;h2&gt;
  
  
  Processing environment
&lt;/h2&gt;

&lt;p&gt;BlazingText can be run on a single CPU or GPU instance, or multiple CPU instances. The choice depends on the type of processing being performed. Word2Vec has three processing methods:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Skip-gram&lt;/li&gt;
&lt;li&gt;Continuous Bag Of Words (CBOW)&lt;/li&gt;
&lt;li&gt;Batch Skip-gram&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Skip-gram and CBOW are opposites of each other. In skip-gram mode you supply a word and the model predicts the context of that word. With CBOW you provide the context and a predicted word is returned.&lt;/p&gt;
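&lt;p&gt;The intuition behind the two directions can be illustrated without any ML libraries. This sketch (my own illustration, not the BlazingText internals) turns one sentence into skip-gram and CBOW training examples using a context window of 1:&lt;/p&gt;

```python
# Frame one sentence as training examples, with a context window of 1.
sentence = "berries grow in summer".split()
window = 1

skipgram_pairs = []  # (centre word, context word to predict)
cbow_pairs = []      # (context words, centre word to predict)
for i, centre in enumerate(sentence):
    # Neighbouring words within the window, excluding the centre itself.
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    for c in context:
        skipgram_pairs.append((centre, c))
    cbow_pairs.append((tuple(context), centre))

print(skipgram_pairs[:2])  # [('berries', 'grow'), ('grow', 'berries')]
print(cbow_pairs[0])       # (('grow',), 'berries')
```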

&lt;p&gt;Below is a summary showing the instance types and processing modes used by BlazingText.&lt;/p&gt;

&lt;h3&gt;
  
  
  Word2Vec
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single CPU instance: Skip-gram, CBOW, Batch skip-gram&lt;/li&gt;
&lt;li&gt;Single GPU instance (with 1 or more GPUs): Skip-gram, CBOW&lt;/li&gt;
&lt;li&gt;Multiple CPU instances: Batch skip-gram&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Text Classification
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single CPU instance: Yes&lt;/li&gt;
&lt;li&gt;Single GPU instance (with 1 or more GPUs): Yes&lt;/li&gt;
&lt;li&gt;Multiple CPU instances: No&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From this summary you can see that all processing methods can be performed on a single CPU instance. Only Word2Vec using the batch skip-gram method can run on multiple CPU instances, and this method cannot utilise GPUs.&lt;/p&gt;
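&lt;p&gt;When training, the method is selected with the &lt;code&gt;mode&lt;/code&gt; hyperparameter (&lt;code&gt;skipgram&lt;/code&gt;, &lt;code&gt;cbow&lt;/code&gt;, &lt;code&gt;batch_skipgram&lt;/code&gt;, or &lt;code&gt;supervised&lt;/code&gt; for text classification, per the AWS docs). The compatibility summary above can be sketched as a small lookup for sanity-checking an instance choice; the function itself is my own illustration, not part of the SageMaker SDK:&lt;/p&gt;

```python
# Which processing environments support each BlazingText mode
# (single CPU instance, single GPU instance, multiple CPU instances).
SUPPORTED = {
    "skipgram":       {"single_cpu", "single_gpu"},
    "cbow":           {"single_cpu", "single_gpu"},
    "batch_skipgram": {"single_cpu", "multi_cpu"},
    "supervised":     {"single_cpu", "single_gpu"},  # text classification
}

def is_supported(mode: str, environment: str) -> bool:
    """Return True if the given mode can run in the given environment."""
    return environment in SUPPORTED.get(mode, set())

print(is_supported("batch_skipgram", "multi_cpu"))  # True
print(is_supported("cbow", "multi_cpu"))            # False
```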

&lt;h2&gt;
  
  
  What are BlazingText’s strengths and weaknesses
&lt;/h2&gt;

&lt;p&gt;The strength of BlazingText is high performance: it is more than 20x faster than other popular alternatives such as Facebook’s FastText. This enables inferences to be made in real time for online transactions rather than in batch. The main weakness of BlazingText is handling words that were not present in the training data. These are called Out Of Vocabulary (OOV) words, and typically such words will be marked as Unknown. There are other ways to perform Word2Vec processing, but they do not have the high performance of BlazingText.&lt;/p&gt;
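&lt;p&gt;The OOV behaviour for Word2Vec inference (a vector of zeros when a word is not found, as noted in the artifact summary above) can be mimicked with a toy lookup. The vocabulary and vector values here are invented purely for illustration:&lt;/p&gt;

```python
# Toy Word2Vec lookup: known words map to learned vectors, while
# out-of-vocabulary (OOV) words come back as a vector of zeros.
DIM = 4
vectors = {
    "berries": [0.1, 0.3, -0.2, 0.8],
    "growers": [0.5, -0.1, 0.0, 0.2],
}

def lookup(word):
    """Return the word's vector, or a zero vector if it is OOV."""
    return vectors.get(word, [0.0] * DIM)

print(lookup("berries"))  # [0.1, 0.3, -0.2, 0.8]
print(lookup("xyzzy"))    # OOV -> [0.0, 0.0, 0.0, 0.0]
```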

&lt;h2&gt;
  
  
  What is the Use Case for BlazingText
&lt;/h2&gt;

&lt;p&gt;BlazingText can only ingest words, so the input data must be text. Word2Vec converts that text into the numeric vectors required for Natural Language Processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Word2Vec:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sentiment analysis&lt;/li&gt;
&lt;li&gt;Named entity recognition&lt;/li&gt;
&lt;li&gt;Machine translation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Text classification:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Web searches&lt;/li&gt;
&lt;li&gt;Information retrieval&lt;/li&gt;
&lt;li&gt;Ranking&lt;/li&gt;
&lt;li&gt;Document classification&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Video: AWS re:Invent 2019: Natural language modeling with Amazon SageMaker BlazingText algorithm (AIM375-P)
&lt;/h4&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/E04JXdnPN0w"&gt;
&lt;/iframe&gt;
&lt;br&gt;
This is a 50.36 minute video from AWS by &lt;a href="https://www.linkedin.com/in/denis-v-batalov-59a3111/"&gt;Denis Batalov&lt;/a&gt;. The presentation can be split into four parts, as shown in the timestamps below. I suggest you skip the first two parts and start with the overview of SageMaker BlazingText at 17.13. This is the link to the Jupyter Notebook used in the demo (part 4):&lt;/p&gt;

&lt;p&gt;SageMaker notebook on Github: &lt;a href="https://github.com/dbatalov/wikipedia-embedding"&gt;https://github.com/dbatalov/wikipedia-embedding&lt;/a&gt;&lt;br&gt;
0 — Introduction&lt;br&gt;
2.17 — Word embedding&lt;br&gt;
2.56 — Word representations&lt;br&gt;
3.43 — One hot encoding&lt;br&gt;
4.37 — Intuition, given a sentence, try to maximise the probability of predicting the context of words.&lt;br&gt;
6.20 — Word2Vec algorithm&lt;br&gt;
8.20 — t-SNE diagram&lt;br&gt;
9.23 — Overview of Amazon SageMaker&lt;br&gt;
12.20 — Build, train and deploy ML Models&lt;br&gt;
13.16 — Built-in algorithms&lt;br&gt;
14.10 — Deep learning frameworks&lt;br&gt;
15.17 — Automatic Model Tuning&lt;br&gt;
16.27 — Amazon SageMaker Neo&lt;br&gt;
17.13 — Overview of SageMaker BlazingText&lt;br&gt;
18.28 — BlazingText highlights&lt;br&gt;
18.45 — Optimization on CPU negative samples sharing&lt;br&gt;
19.40 — Throughput characteristics&lt;br&gt;
20.35 — BlazingText benchmarking&lt;br&gt;
23.00 — Demo — Georgian Wikipedia&lt;/p&gt;

&lt;h4&gt;
  
  
  Selected articles with examples of BlazingText being used
&lt;/h4&gt;

&lt;p&gt;This article, by Evan Harris, describes the usefulness of having a website search feature tuned to the specific vocabulary used on that website. The example Evan uses is a search for a specific grape variety, which returns a list of wines that use that variety.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/building-ibotta/heating-up-word2vec-blazingtext-for-real-time-search-c2121bd1396"&gt;https://medium.com/building-ibotta/heating-up-word2vec-blazingtext-for-real-time-search-c2121bd1396&lt;/a&gt;&lt;br&gt;
This article has a good worked example of BlazingText being used:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://t-redactyl.io/blog/2020/09/training-and-evaluating-a-word2vec-model-using-blazingtext-in-sagemaker.html"&gt;https://t-redactyl.io/blog/2020/09/training-and-evaluating-a-word2vec-model-using-blazingtext-in-sagemaker.html&lt;/a&gt;&lt;br&gt;
This article is a worked example of using BlazingText in Word2Vec mode: Training Word Embeddings On AWS SageMaker Using BlazingText by Roald Schuring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://towardsdatascience.com/training-word-embeddings-on-aws-sagemaker-using-blazingtext-93d0a0838212"&gt;https://towardsdatascience.com/training-word-embeddings-on-aws-sagemaker-using-blazingtext-93d0a0838212&lt;/a&gt;&lt;br&gt;
This example, from AWS, uses a method to enable BlazingText to generate vectors for out-of-vocabulary (OOV) words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_word2vec_subwords_text8/blazingtext_word2vec_subwords_text8.html"&gt;https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_word2vec_subwords_text8/blazingtext_word2vec_subwords_text8.html&lt;/a&gt;&lt;br&gt;
This is an example SageMaker Notebook on GitHub which uses a dataset derived from Wikipedia.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.ipynb"&gt;https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.ipynb&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Blazing Text
&lt;/h4&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/G2tX0YpNHfc"&gt;
&lt;/iframe&gt;
&lt;br&gt;
This is a 1.14.36 video from AWS by &lt;a href="https://www.linkedin.com/in/pratapramamurthy/"&gt;Pratap Ramamurthy&lt;/a&gt;. This is a very long video, so use the timestamps below to select the parts you wish to see.&lt;/p&gt;

&lt;p&gt;0 — Introduction&lt;br&gt;
2.19 — What are Amazon algorithms&lt;br&gt;
3.08 — BlazingText algorithms&lt;br&gt;
3.17 — BlazingText use case&lt;br&gt;
4.16 — Typical deep learning task on Text&lt;br&gt;
5.36 — Integer encoding&lt;br&gt;
9.20 — One hot encoding&lt;br&gt;
14.00 — Requirements for word vectors&lt;br&gt;
16.32 — Word2Vec mechanism&lt;br&gt;
16.42 — Word2Vec setup&lt;br&gt;
18.07 — Skip-gram preprocessing&lt;br&gt;
20.30 — Neural network setup&lt;br&gt;
25.38 — BlazingText word embedding&lt;br&gt;
27.35 — Word vectors used for further ML training&lt;br&gt;
28.20 — Intuition&lt;br&gt;
28.25 — Random or is there a pattern? (t-SNE plot)&lt;br&gt;
31.14 — Distance between related words&lt;br&gt;
32.26 — How did the magic work?&lt;br&gt;
35.08 — OOV handling using BlazingText&lt;br&gt;
39.38 — Subword detection&lt;br&gt;
41.43 — Text classification with BlazingText&lt;br&gt;
42.18 — Typical NLP pipeline&lt;br&gt;
44.25 — Parameters&lt;br&gt;
47.43 — Demo&lt;br&gt;
1.00.11 — Questions&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;BlazingText is a high performance algorithm for analyzing text. Its two processing modes produce either numeric vectors for Natural Language Processing via the Word2Vec algorithm, which can infer words from context or context from words, or text classifications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;These revision notes support subdomain 3.2 &lt;em&gt;Select the appropriate model(s) for a given machine learning problem&lt;/em&gt; of the AWS certification exam: AWS Certified Machine Learning — Specialty (MLS-C01).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;3.2 Select the appropriate model(s) for a given machine learning problem.&lt;br&gt;
Xgboost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, Ensemble, Transfer learning. Express intuition behind models&lt;br&gt;
AWS Certified Machine Learning — Specialty, (MLS-C01) Exam Guide&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.mlexam.com/aws-machine-learning-exam-guide/"&gt;AWS Certified Machine Learning exam guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mlexam.com/home/domain-3-modeling/"&gt;Domain 3 Modeling articles index&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mlexam.com/sagemaker-text-processing-algorithms/"&gt;3.2 Text processing algorithms&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mlexam.com/35-q-a-for-sagemaker-built-in-algorithms/"&gt;Questions for SageMaker built-in algorithms and their uses&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mlexam.com/aws-machine-learning-practice-exam/"&gt;Free Practice exam with 65 questions&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS docs: &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html&lt;/a&gt;&lt;br&gt;
Wikipedia Word2vec: &lt;a href="https://en.wikipedia.org/wiki/Word2vec"&gt;https://en.wikipedia.org/wiki/Word2vec&lt;/a&gt;&lt;br&gt;
Google original papers from 2013: &lt;a href="https://arxiv.org/abs/1301.3781"&gt;https://arxiv.org/abs/1301.3781&lt;/a&gt;&lt;br&gt;
Google original papers from 2013: &lt;a href="https://arxiv.org/abs/1310.4546"&gt;https://arxiv.org/abs/1310.4546&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Training data format resources&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Augmented Manifest Text (AMT) format: &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html&lt;/a&gt;&lt;br&gt;
Json lines format: &lt;a href="http://jsonlines.org/"&gt;http://jsonlines.org/&lt;/a&gt;&lt;br&gt;
Text examples from &lt;a href="https://www.britishsummerfruits.co.uk/about"&gt;https://www.britishsummerfruits.co.uk/about&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Processing environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and-word-vectors-using-amazon-sagemaker-blazingtext"&gt;https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and-word-vectors-using-amazon-sagemaker-blazingtext&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;Burning book photo by &lt;a href="https://unsplash.com/@gasparuhas?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Gaspar Uhas&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/writing-on-fire?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Originally published at &lt;a href="//mlexam.com/blazingtext-algorithm/"&gt;www.mlexam.com&lt;/a&gt; on March 2, 2021.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>sagemaker</category>
      <category>machinelearning</category>
      <category>certification</category>
    </item>
  </channel>
</rss>
