<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tomas</title>
    <description>The latest articles on Forem by Tomas (@tkeyo).</description>
    <link>https://forem.com/tkeyo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F641054%2Fe4a6a810-1508-4049-9cef-0098f8e4c91b.PNG</url>
      <title>Forem: Tomas</title>
      <link>https://forem.com/tkeyo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tkeyo"/>
    <language>en</language>
    <item>
      <title>Debugging AWS Lambda + Serverless Framework Locally</title>
      <dc:creator>Tomas</dc:creator>
      <pubDate>Thu, 10 Mar 2022 22:31:08 +0000</pubDate>
      <link>https://forem.com/tkeyo/debugging-aws-lambda-locally-5bk</link>
      <guid>https://forem.com/tkeyo/debugging-aws-lambda-locally-5bk</guid>
      <description>&lt;h3&gt;
  
  
  Working with Lambdas
&lt;/h3&gt;

&lt;p&gt;I've been working with AWS Lambdas + &lt;a href="https://www.serverless.com"&gt;&lt;strong&gt;Serverless Framework&lt;/strong&gt;&lt;/a&gt; on my projects lately. When I started to work with AWS Lambda I was a bit lost - I was not sure about the best way to develop🛠, debug🐛 and test🧪 AWS Lambdas locally. &lt;strong&gt;One thing I knew for sure - AWS web IDE is not the way to go.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach uses Serverless Framework-specific CLI commands, but the idea can be generalized to other frameworks. I should also mention that this post will not walk you through the project setup - you can check the sample repository &lt;a href="https://github.com/tkeyo/aws_lambda_debug_sample"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Project &amp;amp; Workflow
&lt;/h3&gt;

&lt;p&gt;While working with Lambdas I converged on my current workflow - the topic of this post.&lt;/p&gt;

&lt;h4&gt;
  
  
  Project Setup
&lt;/h4&gt;

&lt;p&gt;Consider the following setup for a Lambda project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;serverless.yaml&lt;/code&gt; contains the definition of the function(s) - handler path, deployment settings, etc. It's specific to Serverless Framework. More &lt;a href="https://www.serverless.com/framework/docs/providers/aws/guide/serverless.yml/"&gt;here&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lambda_1&lt;/code&gt; directory contains an AWS Lambda function. It's possible to have multiple Lambdas per project. You could have something like &lt;code&gt;lambda_2&lt;/code&gt; in your project. You just need to add additional definitions in your &lt;code&gt;serverless.yaml&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;I prefer to use &lt;code&gt;src&lt;/code&gt; directories that hold the source code for a function. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;handler.py&lt;/code&gt; contains the entry point ("lambda handler") for invocation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event.json&lt;/code&gt; is a sample event for local invocation. More on that soon.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# directory structure

aws_lambda_project
|-lambda_1
|  |-src
|    |-__init__.py
|    |-util.py
|    |-db.py
|  |-event.json
|  |-handler.py
|-serverless.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
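&lt;p&gt;For completeness, &lt;code&gt;event.json&lt;/code&gt; can hold any payload the real trigger would send. A minimal illustrative one, matching the handler shown later (the field names are made up for the sample):&lt;/p&gt;

```json
{
  "Records": [
    { "name": "John", "age": "30" },
    { "name": "Jane", "age": "25" }
  ]
}
```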



&lt;h4&gt;
  
  
  Local Debugging with Serverless Framework
&lt;/h4&gt;

&lt;p&gt;Now, if you want to run/debug your Lambda you can use the Serverless Framework CLI command. Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sls invoke &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="nt"&gt;--function&lt;/span&gt; my_funnction 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you have environment variables and events you'd do something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sls invoke &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="nt"&gt;--function&lt;/span&gt; my_function &lt;span class="nt"&gt;--path&lt;/span&gt; event.json &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;VALUE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using these commands is a perfectly legitimate way to run your AWS Lambdas locally. However, in my experience this invocation is &lt;strong&gt;slow(ish)&lt;/strong&gt; and gives you &lt;strong&gt;no debug breakpoints&lt;/strong&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗️ Please, let me know if there's a way to execute with Serverless Framework CLI in debug mode with breakpoints. &lt;br&gt;
❗️ Apparently, if you use AWS SAM it &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-using-debugging.html"&gt;works OOB&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Making local debugging more effective &amp;amp; efficient
&lt;/h4&gt;

&lt;p&gt;Here's the workaround and workflow I converged on: an additional script, &lt;code&gt;local_handler.py&lt;/code&gt;, which wraps &lt;code&gt;handler.py&lt;/code&gt;. This allows you to set up your variables, "events", environment variables, and everything else you might need. And best of all - you can use &lt;strong&gt;breakpoints in your code.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws_lambda_project
|-lambda_1
|  |-src
|    |-__init__.py
|    |-util.py
|    |-db.py
|  |-event.json
|  |-handler.py
|  |-local_handler.py &amp;lt;- THIS
|-serverless.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's consider the following &lt;code&gt;handler.py&lt;/code&gt; and &lt;code&gt;local_handler.py&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# handler.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;lambda_1.src.util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_db&lt;/span&gt;  

&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
    &lt;span class="s"&gt;"Sample lambda function handler."&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Records"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"statusCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Hello from Lambda!"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Sidenote: This is the handler you can also invoke with &lt;code&gt;sls invoke local&lt;/code&gt;, which simulates an AWS trigger locally. If it works locally, it will likely work on AWS when deployed. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your local handler should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# local_handler.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;handler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;  

&lt;span class="n"&gt;sample_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"Records"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"30"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Jane"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"25"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;  
&lt;span class="p"&gt;}&lt;/span&gt;  

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;  
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"__main__"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now in &lt;code&gt;local_handler.py&lt;/code&gt;, I wrap the &lt;code&gt;handler&lt;/code&gt; function with a &lt;code&gt;main&lt;/code&gt; function which is called when you run &lt;code&gt;local_handler.py&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;This has a couple of &lt;strong&gt;advantages&lt;/strong&gt; IMO:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can run your code with Python 🐍  - i.e. breakpoints, IDE capabilities&lt;/li&gt;
&lt;li&gt;you can keep your lambda handler intact and deploy it directly to AWS&lt;/li&gt;
&lt;li&gt;faster startup - no need to initialize Serverless Framework &lt;/li&gt;
&lt;/ul&gt;
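&lt;p&gt;One more note on environment variables: if your handler reads them at import time (like the module-level &lt;code&gt;get_db()&lt;/code&gt; call above), set them &lt;em&gt;before&lt;/em&gt; the import. A minimal sketch - the variable name and the stand-in handler are illustrative, not part of the sample repo:&lt;/p&gt;

```python
# local_handler.py variant that also sets environment variables
import os

# set env vars BEFORE importing the handler, since module-level code
# may read them at import time
os.environ["STAGE"] = "local"  # illustrative variable

# in the real project this line would be: from handler import handler
def handler(event, context):
    """Stand-in for handler.py so this sketch runs on its own."""
    return {"statusCode": 200, "stage": os.environ["STAGE"]}

sample_event = {"Records": [{"name": "John", "age": "30"}]}

def main():
    print(handler(sample_event, None))

if __name__ == "__main__":
    main()
```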

&lt;h3&gt;
  
  
  Un-cluttering deployments
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;serverless.yml&lt;/code&gt; it's possible to define files that you do not want to deploy to AWS. An exclamation mark excludes a pattern - those files won't be packaged and pushed, keeping the deployment tidy. See the example below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;functions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sample_lambda&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="c1"&gt;# include  &lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;src/**'&lt;/span&gt;  
        &lt;span class="c1"&gt;# exclude &lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;!local_handler.py'&lt;/span&gt;  
        &lt;span class="s"&gt;...&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;!venv/**'&lt;/span&gt;  
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda_1.handler.handler&lt;/span&gt;  
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${file(env/.env.sls.json):stage}&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🏁  Fin. This approach has proven to be the most flexible in my experience with AWS Lambdas.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Feel free to leave a comment or reach out.  📥 💫.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/tkeyo/aws_lambda_debug_sample"&gt;AWS Sample Repo&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Catch me on:&lt;/strong&gt; &lt;a href="https://github.com/tkeyo"&gt;github&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Catch me on:&lt;/strong&gt; &lt;a href="https://twitter.com/tkeyo_"&gt;Twitter&lt;/a&gt;&lt;/p&gt;




</description>
      <category>aws</category>
      <category>lambda</category>
      <category>cloud</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Data Engineering Pipeline with AWS Step Functions, CodeBuild and Dagster</title>
      <dc:creator>Tomas</dc:creator>
      <pubDate>Thu, 30 Dec 2021 20:13:52 +0000</pubDate>
      <link>https://forem.com/tkeyo/data-engineering-pipeline-with-aws-step-functions-codebuild-and-dagster-5290</link>
      <guid>https://forem.com/tkeyo/data-engineering-pipeline-with-aws-step-functions-codebuild-and-dagster-5290</guid>
      <description>&lt;h2&gt;
  
  
  What are we building?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;An end-to-end project to collect, process, and visualize housing data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The goal of this project is to collect Slovak real estate market data, process it, and aggregate it. Aggregated data is consumed by a web application to display a price map of 2 Slovak cities - Bratislava and Kosice.&lt;/p&gt;

&lt;p&gt;Data is collected once per month. My intention is to create a snapshot of the housing market in a given month and check on changing price trends, market statistics, ROIs, and others. You could call it a &lt;strong&gt;business intelligence application&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Collect -&amp;gt; Process -&amp;gt; Visualize  //  🏠📄 -&amp;gt; 🛠 -&amp;gt; 💻📈
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Currently, the web application frontend shows the median rent and sell prices by borough. Still a WIP 💻🛠. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesq8kidz080nb8kq621j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesq8kidz080nb8kq621j.png" alt="Front End"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have a backlog of features I want to implement in the upcoming months. Also, feature ideas are welcome 💡. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why am I building this?&lt;/strong&gt;&lt;br&gt;
I am interested in price trends and whatnot. Plus, I wanted to build a project on AWS using new exciting technologies like Dagster. &lt;/p&gt;
&lt;h2&gt;
  
  
  What's in it for you?
&lt;/h2&gt;

&lt;p&gt;It's not a tutorial by any means. More of a walkthrough and reasoning behind the design and gotchas along the way. I will talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Step Functions&lt;/strong&gt; and how I implemented my pipeline using this service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CodeBuild&lt;/strong&gt; and why I think it is the optimal service for my use-case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dagster&lt;/strong&gt; and how it fits in the picture.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Going technical
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The Workflow &amp;amp; Architecture
&lt;/h3&gt;

&lt;p&gt;From a technical perspective the project is &lt;strong&gt;implemented as 3 separate microservices&lt;/strong&gt;. This allows flexibility in deployments, managing Step Functions, and developing the project part-by-part. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmsq6uj925ufflerm9d3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmsq6uj925ufflerm9d3.png" alt="Workflow and Microservices"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a side project, so I have to keep costs as low as possible while still having a &lt;em&gt;"fully running product"&lt;/em&gt;. I built the project around serverless services, which introduced a couple of constraints to keep the price low - mainly using GCP alongside AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fho1tt7n8yxy5e4qkmsi8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fho1tt7n8yxy5e4qkmsi8.png" alt="Project Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why AWS and GCP?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Cost savings 💸.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wanted to build this project solely on AWS... but AWS App Runner (the GCP Cloud Run analog), which would run the web application, does not support scaling down to 0 instances. That means there's a fixed base cost for 1 running instance, which I wanted to avoid. &lt;/p&gt;

&lt;p&gt;GCP Cloud Run supports scaling down to 0 instances, which is ideal. I only pay for resources when the web application is accessed, and I do not have to keep a constantly running instance. &lt;/p&gt;
&lt;h3&gt;
  
  
  Services &amp;amp; Tools
&lt;/h3&gt;

&lt;p&gt;I will write about the lesser-known AWS services and the reasons I selected them for the project - everyone knows about S3. Plus, Dagster, which is an awesome pipeline orchestrator.&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Step Functions
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;AWS Step Functions is a low-code, visual workflow service that developers use to build distributed applications, automate IT and business processes, and build data and machine learning pipelines using AWS services.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a great blog post about AWS Step Function use cases. It goes in-depth on patterns, use-cases, and pros/cons of each. Check &lt;a href="https://blog.bassemdy.com/2020/06/08/aws/architecture/microservices/patterns/aws-step-functions-think-again.html" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For my use-case it was the ideal orchestration tool, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipelines run infrequently&lt;/strong&gt; - with AWS Step Functions + CodeBuild + Dagster I avoided the overhead of deploying to EC2, Fargate, ECS. Everything is executed on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low complexity&lt;/strong&gt; - Ideal for AWS Step Functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cheap (free in my case)&lt;/strong&gt; - Low number of state transitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native integration&lt;/strong&gt; with CodeBuild, CloudWatch, and other Step Functions. No need to fiddle with Lambda triggers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  CodeBuild
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I use CodeBuild as it's the &lt;strong&gt;easiest way to get long-running on-demand compute&lt;/strong&gt;. It has native support in Step Functions and comes with 100 free build minutes. It would be possible to use EC2 instances to execute workloads in the same manner, but CodeBuild is quicker to spin up and requires less maintenance - not to mention it's easy to scale and run in parallel. &lt;/p&gt;

&lt;p&gt;The drawback is that build jobs are ephemeral, so data is lost if it isn't saved. This required a bit of engineering: handling errors gracefully in the containers and uploading data artifacts right after they are produced.&lt;/p&gt;
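&lt;p&gt;The "upload right after it's produced" idea can be sketched roughly like this - the bucket name and layout are illustrative, and the uploader is injected so the sketch stays self-contained (in the real pipeline it would be a boto3 S3 call):&lt;/p&gt;

```python
import json
import pathlib

def save_and_upload(records, stage, upload):
    """Persist a stage's output locally, then push it to S3 immediately,
    so the artifact survives even if a later step fails and the
    ephemeral CodeBuild container is torn down."""
    path = pathlib.Path("data") / stage / "output.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records))
    # in CodeBuild this would be an S3 upload, e.g. boto3's upload_file
    upload(str(path), f"my-bucket/{stage}/output.json")
    return path
```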
&lt;h3&gt;
  
  
  Dagster
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Dagster is &lt;strong&gt;a data orchestrator for machine learning, analytics, and ETL&lt;/strong&gt;. It lets you define pipelines in terms of the data flow between reusable, logical components, then test locally and run anywhere."&lt;/em&gt; Great intro &lt;a href="https://hackernoon.com/a-quick-introduction-to-machine-learning-with-dagster-gh53336m" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I tried two other tools before settling on Dagster: &lt;a href="https://www.prefect.io" rel="noopener noreferrer"&gt;Prefect&lt;/a&gt; and &lt;a href="https://kedro.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;Kedro&lt;/a&gt;. While both are great, they were not ideal for this project: Prefect needs a running Docker instance, and I felt Kedro had too steep a learning curve; it is also intended for ML project management. I will dig deeper into Kedro in future projects, as I liked how it's organized - I even used its Data Engineering convention in this project, which I will talk about later.&lt;/p&gt;

&lt;p&gt;Back to Dagster - I ultimately chose it because it doesn't need a running Docker instance. It's a &lt;code&gt;pip install dagster&lt;/code&gt; away, lightweight, extensible, and can run anywhere: locally, on Airflow, on Kubernetes - you choose.&lt;/p&gt;

&lt;p&gt;Dagster comes in two parts: Dagster (orchestration) and Dagit (web UI). They are installed separately, which proved to be a benefit in my development workflow. &lt;/p&gt;

&lt;p&gt;As already mentioned I use CodeBuild as an accessible compute resource where I run my Dagster pipeline. I don't think Dagster was intended to be used this way (inside a Docker build) but everything worked seamlessly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Making It All Work
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step Functions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Main Step Function&lt;/strong&gt;&lt;br&gt;
Everything is orchestrated by the &lt;em&gt;Main&lt;/em&gt; state machine, which triggers the &lt;em&gt;Data Collect&lt;/em&gt; and &lt;em&gt;Data Process&lt;/em&gt; state machines containing the CodeBuild blocks where the "real work" is done. &lt;/p&gt;

&lt;p&gt;My main state machine contains two choice blocks. This allows collect and process to run independently by defining an input at execution time.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Main Step Function inputs
{
    "run_data_collect": true or false,
    "run_data_process": true or false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
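&lt;p&gt;A choice block keyed on those inputs looks roughly like this in Amazon States Language - the state names and targets are illustrative:&lt;/p&gt;

```json
{
  "ShouldCollect": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.run_data_collect",
        "BooleanEquals": true,
        "Next": "TriggerDataCollect"
      }
    ],
    "Default": "ShouldProcess"
  }
}
```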


&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsdvy3sxjhld0k516py2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsdvy3sxjhld0k516py2.png" alt="Step Function - Main"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why triggering a Step Function from a Step Function?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Easier debugging. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By decoupling collect and process into two child Step Functions, debugging became easier - I was able to run the workflows separately, which made the whole development process friendlier. On top of that, making changes in the underlying Step Functions doesn't affect the overall flow, and I can easily swap the Step Function that is called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on triggering Step Functions &amp;amp; CodeBuilds&lt;/strong&gt;&lt;br&gt;
My use-case requires sequential execution of steps. By default, AWS Step Functions triggers another Step Function in an async, &lt;em&gt;"fire and forget"&lt;/em&gt; manner - if the child Step Function trigger succeeds, it proceeds to the next step.&lt;/p&gt;

&lt;p&gt;To wait for the child Step Function execution to finish and return a Success (or Failure) state, you should use &lt;code&gt;startExecution.sync&lt;/code&gt;. This ensures that the parent Step Function waits until the child finishes its work.&lt;/p&gt;
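&lt;p&gt;A synchronous child-execution task looks roughly like this in ASL - the ARN and state names are illustrative:&lt;/p&gt;

```json
{
  "TriggerDataCollect": {
    "Type": "Task",
    "Resource": "arn:aws:states:::states:startExecution.sync",
    "Parameters": {
      "StateMachineArn": "arn:aws:states:eu-west-1:123456789012:stateMachine:DataCollect"
    },
    "Next": "TriggerDataProcess"
  }
}
```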


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Similarly for CodeBuild triggers. To wait for the build task to finish use &lt;code&gt;startBuild.sync&lt;/code&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Note on environment variable overrides in AWS Step Functions&lt;/strong&gt;&lt;br&gt;
The same code is used for all data collection and processing CodeBuild jobs. To make that possible, I pass environment variables extensively to define parameters - I define them in Step Functions and use them as Docker &lt;code&gt;--build-arg&lt;/code&gt;s in CodeBuild.&lt;/p&gt;

&lt;p&gt;To make this work, I had to override the env vars in the Step Function's CodeBuild trigger. This gave me a headache, as the AWS documentation (&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-codebuild.html" rel="noopener noreferrer"&gt;Call AWS CodeBuild with Step Functions&lt;/a&gt;) and the API reference (&lt;a href="https://docs.aws.amazon.com/codebuild/latest/APIReference/API_StartBuild.html#API_StartBuild_RequestParameters" rel="noopener noreferrer"&gt;StartBuild&lt;/a&gt;) say to use camelCase parameter names (e.g. &lt;code&gt;environmentVariablesOverride&lt;/code&gt;).&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;That's incorrect - the Step Functions integration actually expects &lt;em&gt;PascalCase&lt;/em&gt; (&lt;code&gt;EnvironmentVariablesOverride&lt;/code&gt;) instead of &lt;em&gt;camelCase&lt;/em&gt;. &lt;/p&gt;
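&lt;p&gt;A working CodeBuild task with overrides therefore looks roughly like this - the project name and variable are illustrative; note the PascalCase &lt;code&gt;EnvironmentVariablesOverride&lt;/code&gt;, &lt;code&gt;Name&lt;/code&gt;, &lt;code&gt;Type&lt;/code&gt;, and &lt;code&gt;Value&lt;/code&gt;:&lt;/p&gt;

```json
{
  "RunDataCollect": {
    "Type": "Task",
    "Resource": "arn:aws:states:::codebuild:startBuild.sync",
    "Parameters": {
      "ProjectName": "data-collect-build",
      "EnvironmentVariablesOverride": [
        {
          "Name": "CITY",
          "Type": "PLAINTEXT",
          "Value": "bratislava"
        }
      ]
    },
    "End": true
  }
}
```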


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Collect Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rm8qlqq636ri8ooxc2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rm8qlqq636ri8ooxc2f.png" alt="Step Function - Data Collect"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I use BeautifulSoup to collect data. There are great articles and tutorials on it out there, so I will only mention that I run data collection sequentially to be a good internet citizen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process &amp;amp; Aggregate Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbjp79irsy6709vnh1m1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbjp79irsy6709vnh1m1.png" alt="Step Function - Data Process &amp;amp; Aggregate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The magic happens inside the CodeBuild block, where a Dagster pipeline is executed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deeper into Dagster
&lt;/h3&gt;

&lt;p&gt;Dagster offers a number of ways to deploy and execute pipelines - see &lt;a href="https://docs.dagster.io/deployment#deployment" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;But that's not what I do - I run Dagster inside a Docker build on CodeBuild. I am still questioning whether it's the right approach. Nonetheless, taking the pipeline from local development to AWS was painless.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw9ghlu7g2fdsunih597.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw9ghlu7g2fdsunih597.png" alt="Image description"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;I mentioned that &lt;strong&gt;Dagster comes with a UI component - Dagit - with a full suite of features&lt;/strong&gt; to make development enjoyable. While working locally, I used both components. Dagit has a great UI for launching pipelines and re-executing from a selected step; it also saves intermediary results and keeps a DB of runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dagit is not necessary&lt;/strong&gt; to execute Dagster runs and I did not install it at all for Docker builds. Thanks to Poetry it was easy to separate dev installs and &lt;strong&gt;save time while building.&lt;/strong&gt;&lt;/p&gt;
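&lt;p&gt;With Poetry, that separation is just a matter of where the dependency lives - an illustrative &lt;code&gt;pyproject.toml&lt;/code&gt; fragment (the versions are made up):&lt;/p&gt;

```toml
[tool.poetry.dependencies]
python = "^3.8"
dagster = "^0.13"

[tool.poetry.dev-dependencies]
# Dagit stays out of the Docker image: install locally with the default
# `poetry install`, skip in the build with `poetry install --no-dev`
dagit = "^0.13"
```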

&lt;h4&gt;
  
  
  Dev Workflow - from Local to Step Functions
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a6ed4s2cx9d656q4qds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a6ed4s2cx9d656q4qds.png" alt="Dagster Dev Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Local Dev Runs - At this step I used my computer to execute the runs. &lt;/li&gt;
&lt;li&gt;Local Docker Runs - I executed the pipeline in a local Docker build.&lt;/li&gt;
&lt;li&gt;AWS CodeBuild Runs - Same as the previous step but on AWS.&lt;/li&gt;
&lt;li&gt;AWS Step Function Runs - End-to-end testing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I must say that using Dagster might have been overkill, but this project was a great opportunity to learn it. It also provides future-proofing in case I want to restructure the project (add data collection to the Dagster pipeline, etc.), add machine learning pipelines to Dagster's repository, or execute on Spark.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Process Steps
&lt;/h4&gt;

&lt;p&gt;When doing my research, I ran into Kedro as one of the alternatives. While I didn't use it on this project, I repurposed &lt;a href="https://kedro.readthedocs.io/en/0.15.3/06_resources/01_faq.html#what-is-data-engineering-convention" rel="noopener noreferrer"&gt;Kedro's Data Engineering convention&lt;/a&gt;, which works with "layers" for each stage of the data engineering pipeline. I am only using the first 3 layers - &lt;em&gt;Raw&lt;/em&gt;, &lt;em&gt;Intermediate&lt;/em&gt;, and &lt;em&gt;Primary&lt;/em&gt; - as I am not &lt;em&gt;(yet)&lt;/em&gt; running any machine learning jobs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw&lt;/td&gt;
&lt;td&gt;"Raw" data that is gathered in the "Data Gathering" step of the State machine is downloaded to this folder&lt;/td&gt;
&lt;td&gt;txt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate&lt;/td&gt;
&lt;td&gt;Cleaned "raw" data. At this stage redundant columns are removed. Data is cleaned, validated, and mapped.&lt;/td&gt;
&lt;td&gt;csv&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary&lt;/td&gt;
&lt;td&gt;Aggregated data that will be consumed by the front-end.&lt;/td&gt;
&lt;td&gt;csv&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The above stages and associated directories contain data after each group of tasks was executed. Output files from &lt;em&gt;Raw&lt;/em&gt;, &lt;em&gt;Intermediate&lt;/em&gt;, and &lt;em&gt;Primary&lt;/em&gt; are uploaded to S3. Locally, I used them for debugging and sanity checks.&lt;/p&gt;
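&lt;p&gt;As an illustration of this layout, a small helper can build the per-layer paths (the directory and file names here are hypothetical, not from the project):&lt;/p&gt;

```python
from pathlib import Path

# Sketch of the Kedro-style layer convention used above;
# names are made up for illustration.
LAYERS = ("raw", "intermediate", "primary")

def stage_path(layer: str, filename: str, base: str = "data") -> Path:
    """Return the local path of an artifact in the given pipeline layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return Path(base) / layer / filename

print(stage_path("raw", "listings.txt").as_posix())        # data/raw/listings.txt
print(stage_path("primary", "aggregates.csv").as_posix())  # data/primary/aggregates.csv
```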

&lt;h4&gt;
  
  
  Dagster pipeline
&lt;/h4&gt;

&lt;p&gt;Dagster separates business logic from the execution. You can write the business logic inside the components and Dagster takes care of the orchestration. The underlying execution engine is abstracted away. It's possible to use Dagster's executor, Dask, Celery, etc.&lt;/p&gt;

&lt;p&gt;Three main Dagster concepts are: &lt;code&gt;@op&lt;/code&gt;, &lt;code&gt;@job&lt;/code&gt; and &lt;code&gt;@graph&lt;/code&gt;. You can read about them &lt;a href="https://docs.dagster.io/concepts/ops-jobs-graphs/ops" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Briefly, an &lt;code&gt;@op&lt;/code&gt; is a unit of compute work - it should be simple and written in a functional style. Several &lt;code&gt;@op&lt;/code&gt;s can be connected into a &lt;code&gt;@graph&lt;/code&gt; for convenience. I connected mapping, cleaning, and validation steps into graphs - a logical grouping of ops based on job type. A &lt;code&gt;@job&lt;/code&gt; is a fully connected graph of &lt;code&gt;@op&lt;/code&gt; and &lt;code&gt;@graph&lt;/code&gt; units that can be triggered to process data.&lt;/p&gt;

&lt;p&gt;As I am processing both rent and sell data, I run the same &lt;code&gt;@op&lt;/code&gt;s in the same job in parallel, reusing them by aliasing. See the gist below:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
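&lt;p&gt;The aliasing idea can be sketched without Dagster - one function definition serves both branches under two names (the data and names below are made up; in Dagster itself this is done with &lt;code&gt;clean.alias("clean_rent")&lt;/code&gt; inside a graph or job):&lt;/p&gt;

```python
# Framework-free sketch of Dagster op aliasing: one shared "op"
# serves both the rent and sell branches in parallel.
def clean(records):
    """Shared cleaning step: drop records without a price."""
    return [r for r in records if r.get("price") is not None]

clean_rent = clean  # alias for the rent branch
clean_sell = clean  # alias for the sell branch

rent_raw = [{"price": 700}, {"price": None}]
sell_raw = [{"price": 250000}]

print(clean_rent(rent_raw))  # [{'price': 700}]
print(clean_sell(sell_raw))  # [{'price': 250000}]
```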


&lt;p&gt;The full Dagster &lt;code&gt;@job&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcno0ubela24bs7tbrzp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcno0ubela24bs7tbrzp.png" alt="Dagster @job"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;@graph&lt;/code&gt; implementation in Dagster.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;When expanded in Dagit it looks like the image below. &lt;code&gt;@graph&lt;/code&gt; helps to group operations together and unclutter the UI compared to an &lt;code&gt;@op&lt;/code&gt;-only implementation. Furthermore, you can test a full block of operations instead of testing operation by operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ed51cl6jdnrgzoa5msr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ed51cl6jdnrgzoa5msr.png" alt="Dagster @graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Scope &amp;amp; Room for Improvement
&lt;/h2&gt;

&lt;p&gt;I finished the first version of my project, and I already see how parts of the code could be improved - mostly in the data processing part where I use Dagster. It's my first time working with this tool and I missed some important features that would have made development, testing, and data handling easier.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;S3 File Handling&lt;/strong&gt; I wrote my own S3 manager to upload and download data from S3 buckets. I only recently found out that a &lt;code&gt;dagster-aws&lt;/code&gt; module exists. Looking at the module, it does exactly what I need, minus the code I had to write.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artifact/Data Handling&lt;/strong&gt; I use the &lt;em&gt;Raw, Intermediate, Primary&lt;/em&gt; stages for data artifacts created during processing. To save them to the respective folders I implemented a simple write &lt;code&gt;@op&lt;/code&gt;. It's a legit approach, but &lt;code&gt;AssetMaterialization&lt;/code&gt; seems like a better, more Dagster-y, way to do it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings &amp;amp; Config&lt;/strong&gt; I created a global &lt;code&gt;Settings&lt;/code&gt; class which contained all settings and configs. In hindsight I should have added the &lt;code&gt;Settings&lt;/code&gt; class to Dagster's &lt;code&gt;context&lt;/code&gt; or just used Dagster's config. (I think I carried over the mindset from the previous pure-Python implementation of the data processing pipeline.)&lt;/li&gt;
&lt;/ol&gt;
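&lt;p&gt;For reference, the global &lt;code&gt;Settings&lt;/code&gt; approach from point 3 can be sketched as a frozen dataclass (the fields and values below are hypothetical, not the project's real config); Dagster's run config would replace this:&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical fields - a stand-in for the project's real Settings class.
@dataclass(frozen=True)
class Settings:
    bucket: str = "my-data-bucket"
    raw_prefix: str = "raw"
    primary_prefix: str = "primary"

settings = Settings()
print(settings.bucket)  # my-data-bucket
```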




&lt;p&gt;Feel free to leave a comment 📥 💫.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catch me on&lt;/strong&gt; &lt;a href="https://github.com/tkeyo" rel="noopener noreferrer"&gt;github&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Catch me on&lt;/strong&gt; &lt;a href="https://twitter.com/tkeyo_" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Links:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://blog.bassemdy.com/2020/06/08/aws/architecture/microservices/patterns/aws-step-functions-think-again.html" rel="noopener noreferrer"&gt;Planning on using AWS Step Functions? Think again&lt;/a&gt;&lt;br&gt;
&lt;a href="https://hackernoon.com/a-quick-introduction-to-machine-learning-with-dagster-gh53336m" rel="noopener noreferrer"&gt;A Quick Introduction to Machine Learning with Dagster&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-codebuild.html" rel="noopener noreferrer"&gt;Call AWS CodeBuild with Step Functions&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/codebuild/latest/APIReference/API_StartBuild.html#API_StartBuild_RequestParameters" rel="noopener noreferrer"&gt;StartBuild&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.dagster.io/deployment#deployment" rel="noopener noreferrer"&gt;Dagster - Deployment&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dagster</category>
      <category>dataengineering</category>
      <category>python</category>
    </item>
    <item>
      <title>Export FastAI ResNet models to ONNX</title>
      <dc:creator>Tomas</dc:creator>
      <pubDate>Wed, 07 Jul 2021 23:25:55 +0000</pubDate>
      <link>https://forem.com/tkeyo/export-fastai-resnet-models-to-onnx-2gj7</link>
      <guid>https://forem.com/tkeyo/export-fastai-resnet-models-to-onnx-2gj7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A short guide on FastAI vision model conversion to ONNX. Code included. 👀&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What is FastAI?
&lt;/h2&gt;

&lt;p&gt;FastAI is &lt;em&gt;"making neural nets uncool again"&lt;/em&gt;. It offers a high-level API to PyTorch - it could be considered the Keras of PyTorch. FastAI and the accompanying course taught by Jeremy Howard and Rachel Thomas take a practical approach to deep learning and encourage students to train DL models from the first minute. BUT I am sure you already have hands-on experience with this framework if you are looking to convert your FastAI models to ONNX.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ONNX?
&lt;/h2&gt;

&lt;p&gt;Open Neural Network Exchange or ONNX is a unified format for deep learning and traditional machine learning models. The idea behind ONNX is to create a common interface for all ML frameworks and increase the interoperability between frameworks and devices. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ONNX is an open specification that consists of a definition of an extensible computation graph model, definition of standard data types, and definition of built-in operators. Extensible computation graph and definition of standard data types make up the Intermediate Representation (IR). &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Source [&lt;a href="https://github.com/onnx/onnx/blob/master/docs/IR.md" rel="noopener noreferrer"&gt;link&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625693023787%2FyhXdGzeXJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625693023787%2FyhXdGzeXJ.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source [&lt;a href="https://microsoft.github.io/ai-at-edge/docs/onnx/" rel="noopener noreferrer"&gt;link&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;ONNX, and its implementation - ONNX Runtime - make it easier to put your models into production. You can train your models using the framework of your choice and deploy to a target that uses ONNX Runtime. This way, bloated environments with a large number of dependencies can be minimized to (pretty much) only ONNX Runtime. There's growing support for ONNX, and exports are natively supported by frameworks like PyTorch, MXNet, etc. Find all of them here [&lt;a href="https://onnx.ai/supported-tools" rel="noopener noreferrer"&gt;link&lt;/a&gt;]. Although in some cases exporting/importing might be tricky due to opset compatibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use ONNX and ONNX Runtime?
&lt;/h3&gt;

&lt;p&gt;A couple of reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster inference [&lt;a href="https://medium.com/microsoftazure/faster-and-smaller-quantized-nlp-with-hugging-face-and-onnx-runtime-ec5525473bb7" rel="noopener noreferrer"&gt;link&lt;/a&gt;], [&lt;a href="https://cloudblogs.microsoft.com/opensource/2020/12/17/accelerate-simplify-scikit-learn-model-inference-onnx-runtime/" rel="noopener noreferrer"&gt;link&lt;/a&gt;]&lt;/li&gt;
&lt;li&gt;Lower number of dependencies*&lt;/li&gt;
&lt;li&gt;Smaller environment size*&lt;/li&gt;
&lt;li&gt;One, universal target framework for deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;*See conda environment and dependency comparison at the end of the article.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Process
&lt;/h2&gt;

&lt;p&gt;FastAI currently doesn't natively support ONNX exports from FastAI learners. But by design FastAI is a high-level API of PyTorch. This allows us to extract the wrapped PyTorch model. And luckily, PyTorch models can be natively exported to ONNX. It's a 2-step process with a couple of gotchas. This guide intends to make it a smooth experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625693254745%2FcVtex5wkQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625693254745%2FcVtex5wkQ.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the entire process in my repository [&lt;a href="https://github.com/tkeyo/fastai-onnx" rel="noopener noreferrer"&gt;link&lt;/a&gt;]. It also includes an optional ResNet model training. You can skip it and proceed with model export to ONNX. I included a link to a pre-trained model in the notebooks. &lt;em&gt;&lt;strong&gt;Or BYOFM - Bring Your Own FastAI Model&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Export (Extract) the PyTorch model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625698022279%2FqH5_G0jep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625698022279%2FqH5_G0jep.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's break down what's happening.&lt;/p&gt;

&lt;p&gt;If you check the associated notebooks you will find that I exported the FastAI ResNet learner in the previous steps and named it &lt;em&gt;&lt;code&gt;hot_dog_model_resnet18_256_256.pkl&lt;/code&gt;&lt;/em&gt;. With &lt;code&gt;load_learner()&lt;/code&gt; I am loading the previously exported FastAI model on &lt;strong&gt;line 7&lt;/strong&gt;. If you trained your own model you can skip the load step - your model is already stored in &lt;code&gt;learn&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To get the PyTorch model from the FastAI wrapper we use the &lt;code&gt;model&lt;/code&gt; attribute on &lt;code&gt;learn&lt;/code&gt; - see &lt;strong&gt;line 12&lt;/strong&gt;. I don't want to train the model in subsequent steps, thus I am also setting it to evaluation mode with &lt;code&gt;eval()&lt;/code&gt;. For more details on &lt;code&gt;eval()&lt;/code&gt; and &lt;code&gt;torch.no_grad()&lt;/code&gt; see the discussion [&lt;a href="https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615" rel="noopener noreferrer"&gt;link&lt;/a&gt;].&lt;/p&gt;

&lt;p&gt;FastAI wraps the PyTorch model with additional layers for convenience - &lt;strong&gt;Softmax&lt;/strong&gt;, &lt;strong&gt;Normalization&lt;/strong&gt;, and other transformations (defined in the FastAI DataBlock API). When using the bare-bones PyTorch model I have to make up for this, otherwise I'll be getting &lt;em&gt;weird&lt;/em&gt; results.&lt;/p&gt;

&lt;p&gt;First, I define the softmax layer. This turns the inference results into a more human-readable format - from something like &lt;code&gt;('not_hot_dog', array([[-3.0275817, 1.2424631]], dtype=float32))&lt;/code&gt; into &lt;code&gt;('not_hot_dog', array([[0.01378838, 0.98621166]], dtype=float32))&lt;/code&gt;. Notice the range of the inference results - with the added &lt;strong&gt;softmax&lt;/strong&gt; layer the results are scaled between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;
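&lt;p&gt;The numbers above can be reproduced with a few lines of plain Python - softmax is just exponentiation and normalization:&lt;/p&gt;

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw model outputs."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The raw scores from the example above:
probs = softmax([-3.0275817, 1.2424631])
print([round(p, 6) for p in probs])  # [0.013788, 0.986212]
```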

&lt;p&gt;On &lt;strong&gt;line 18&lt;/strong&gt;, the normalization layer is defined. I am reusing the suggested ImageNet mean and standard deviation values as described here [&lt;a href="https://pytorch.org/vision/stable/models.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;]. If you are interested in an in-depth conversation on the topic of normalization, see this [&lt;a href="https://discuss.pytorch.org/t/understanding-transform-normalize/21730/8" rel="noopener noreferrer"&gt;link&lt;/a&gt;].&lt;/p&gt;
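&lt;p&gt;The normalization itself is simple per-channel arithmetic with the ImageNet statistics (a plain-Python sketch; the real layer operates on whole tensors at once):&lt;/p&gt;

```python
# ImageNet channel statistics, as suggested in the torchvision docs.
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # R, G, B
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to 0-1."""
    return tuple(
        (value - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A mid-gray pixel ends up slightly positive on every channel:
print(normalize_pixel((0.5, 0.5, 0.5)))
```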


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;On &lt;strong&gt;lines 21-25&lt;/strong&gt;, I am pulling it all together into the final model, which will be used for the ONNX conversion. The FastAI learner also handles resizing, but for PyTorch and ONNX this will be handled outside the model by an extra function.&lt;/p&gt;

&lt;h2&gt;
  
  
  Export PyTorch to ONNX
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625697979846%2Ftgc5n0j7C.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625697979846%2Ftgc5n0j7C.png" alt="image.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PyTorch natively supports ONNX exports; I only need to define the export parameters. As you can see, we are (re)using &lt;code&gt;final_model&lt;/code&gt; for the export. On line 5 I am creating a dummy tensor that is used to define the input dimensions of my ONNX model. These dimensions follow the &lt;code&gt;batch x channels x height x width - BCHW&lt;/code&gt; format. My FastAI model was trained on images with 256 x 256 dimensions, which were defined in the FastAI DataBlock API. The same dimensions must be used for the ONNX export - &lt;code&gt;torch.randn(1, 3, 256, 256)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I got this wrong a couple of times - the dummy tensor had different dimensions than the images the model was trained on. &lt;em&gt;Example: Dummy tensor &lt;code&gt;torch.randn(1, 3, 320, 320)&lt;/code&gt; while training image dimensions were &lt;code&gt;3 x 224 x 224&lt;/code&gt;. It took me a while to figure out why I got poor results from my ONNX models.&lt;/em&gt;&lt;/p&gt;
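&lt;p&gt;A cheap guard against this mistake is deriving the dummy tensor's BCHW shape from the training size in one place (a sketch; &lt;code&gt;torch.randn(*shape)&lt;/code&gt; would then consume it):&lt;/p&gt;

```python
# The image size the model was trained on, as defined in the
# FastAI DataBlock - keep it in a single place.
TRAIN_HEIGHT, TRAIN_WIDTH = 256, 256

def dummy_input_shape(batch=1, channels=3,
                      height=TRAIN_HEIGHT, width=TRAIN_WIDTH):
    """Shape of the export dummy tensor in BCHW order."""
    return (batch, channels, height, width)

shape = dummy_input_shape()
# Fails loudly if the dummy tensor drifts from the training dimensions:
assert shape == (1, 3, 256, 256), "dummy tensor must match training size"
print(shape)  # (1, 3, 256, 256)
```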


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;export_param&lt;/code&gt; argument, if set to &lt;code&gt;True&lt;/code&gt;, includes the parameters of the trained model in the export. It's important to use &lt;code&gt;True&lt;/code&gt; in this case. We want our model with parameters. As you might have guessed, &lt;code&gt;export_params=False&lt;/code&gt; exports a model without parameters. Full &lt;code&gt;torch.onnx&lt;/code&gt; documentation [&lt;a href="https://pytorch.org/docs/master/onnx.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;].&lt;/p&gt;

&lt;h2&gt;
  
  
  Inference with ONNX Runtime
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625698218983%2F_ZorSiRJJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1625698218983%2F_ZorSiRJJ.png" alt="onnx_runtime_logo.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On &lt;strong&gt;line 10&lt;/strong&gt;, I am creating an ONNX Runtime inference session and loading the exported model. For debugging purposes, or if you get your hands on an ONNX model with unknown input dimensions, you can run &lt;code&gt;get_inputs()[0].shape&lt;/code&gt; on the inference session instance to get the expected inputs. If you prefer a GUI, Netron [&lt;a href="https://netron.app" rel="noopener noreferrer"&gt;link&lt;/a&gt;] can help you visualize the architecture of the neural network.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The inference itself is done by using the &lt;code&gt;run()&lt;/code&gt; method which returns a numpy array with softmaxed probabilities. See &lt;strong&gt;line 21&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage, Dependencies &amp;amp; Inference Speed
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Storage
&lt;/h4&gt;

&lt;p&gt;The advantage of using ONNX Runtime is the small storage footprint compared to PyTorch and FastAI. A conda environment with ONNX Runtime (+ Pillow for convenience) is &lt;strong&gt;~ 25% of the PyTorch&lt;/strong&gt; environment and only &lt;strong&gt;~ 15% of the FastAI&lt;/strong&gt; environment. Important for serverless deployments.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h4&gt;
  
  
  Dependencies
&lt;/h4&gt;

&lt;p&gt;See for yourself.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h4&gt;
  
  
  Inference speed
&lt;/h4&gt;

&lt;p&gt;I mentioned inference speed as an advantage of ONNX. I tested the inference speed of all three versions of the same model; the differences were negligible. Other experiments had more favourable results for ONNX - see the references in &lt;strong&gt;Why use ONNX and ONNX Runtime?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;FastAI is a great tool to get you up and running with model training in a (VERY) short time. It has everything you need to get top-notch results with minimal effort in a practical manner. But when it comes to deployment, tools like ONNX &amp;amp; ONNX Runtime can save resources with their smaller footprint and efficient implementation. I hope this guide was helpful and that you managed to successfully convert your model to ONNX.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repository/Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tkeyo/fastai-onnx" rel="noopener noreferrer"&gt;FastAI-ONNX GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Feel free to reach out.👏&lt;/strong&gt;&lt;/p&gt;


</description>
      <category>machinelearning</category>
      <category>fastai</category>
      <category>pytorch</category>
      <category>onnx</category>
    </item>
    <item>
      <title>TinyML: Machine Learning on ESP32 with MicroPython</title>
      <dc:creator>Tomas</dc:creator>
      <pubDate>Sat, 26 Jun 2021 11:03:15 +0000</pubDate>
      <link>https://forem.com/tkeyo/tinyml-machine-learning-on-esp32-with-micropython-38a6</link>
      <guid>https://forem.com/tkeyo/tinyml-machine-learning-on-esp32-with-micropython-38a6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Detecting gestures from time-series data with ESP32, accelerometer, and MicroPython in near real-time.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why this project?
&lt;/h2&gt;

&lt;p&gt;I wanted to build a &lt;strong&gt;TinyML&lt;/strong&gt; application that uses &lt;strong&gt;time-series data&lt;/strong&gt; and could be deployed to edge devices - an ESP32 microcontroller in this case. I looked into machine learning projects that use &lt;strong&gt;MicroPython on ESP32&lt;/strong&gt; but could not find any (let me know if I am missing something 🙃). There is, however, a growing number of C/C++ TinyML projects using TensorFlow Lite Micro in combination with neural networks. For the first iteration of this project I skipped neural networks and explored what's possible with &lt;strong&gt;standard machine learning algorithms&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Before jumping into code, let's clear the basics...&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing TinyML
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's TinyML?
&lt;/h3&gt;

&lt;p&gt;TinyML is the overlap between machine learning and embedded (IoT) devices. It gives embedded devices more "intelligence" to power advanced applications using machine learning. The idea is simple - for complex use-cases where rule-based logic is insufficient, apply ML algorithms and run them on low-power devices at the edge. Sounds simple; execution gets tougher.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmr70ffxar9pdp4c6p3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmr70ffxar9pdp4c6p3v.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TinyML is a fairly new concept, with first mentions dating back to around 2018. There's still ambiguity about what is considered TinyML. For the purpose of this article, TinyML applications are applications running on anything from microcontrollers with MHz clock speeds up to more powerful devices like the Nvidia Jetson family - Raspberry Pi included. Other names for TinyML are AIoT, Edge Analytics, Edge AI, and far-edge computing. Choose the one you like the most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why TinyML?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pdpcqur9ycgmsrwaeri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pdpcqur9ycgmsrwaeri.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth&lt;/strong&gt; - As an example, a device with a 100Hz sampling rate produces 360,000 data points each hour. Now imagine the amount of data produced by a fleet of these devices. It gets even trickier with images and video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; - "&lt;em&gt;time between when a system takes in a sensory input and responds to it&lt;/em&gt;". In conventional ML deployments data must first be sent to an ML application. This increases the time in which an edge device can take action, as it waits for the response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Economics&lt;/strong&gt; - Cloud is cheap, but not that cheap. It still costs money to ingest large amounts of data, especially if it must happen in real-time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt; - Revisiting the bandwidth example, in case of high-frequency sampling, it might be hard to ensure that data arrives to a target in the same order as it was produced by an edge device.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; - TinyML processes data on-device, so data is not sent over the network. This reduces the surface for data abuse.&lt;/li&gt;
&lt;/ol&gt;
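&lt;p&gt;The bandwidth figure from point 1 is easy to reproduce (the fleet size below is a made-up example):&lt;/p&gt;

```python
# One signal sampled at 100Hz for one hour:
SAMPLING_HZ = 100
SECONDS_PER_HOUR = 60 * 60

samples_per_hour = SAMPLING_HZ * SECONDS_PER_HOUR
print(samples_per_hour)  # 360000

# A 6-DoF IMU produces six such signals per device,
# and a hypothetical fleet of 1,000 devices multiplies that again:
values_per_hour = 6 * samples_per_hour
fleet_values_per_hour = 1000 * values_per_hour
print(values_per_hour, fleet_values_per_hour)  # 2160000 2160000000
```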

&lt;h3&gt;
  
  
  TinyML use cases
&lt;/h3&gt;

&lt;p&gt;TinyML use cases range from predictive maintenance all the way to virtual assistants. I might write an article on the current landscape, use cases, and the business case behind TinyML. &lt;/p&gt;




&lt;h2&gt;
  
  
  What's this TinyML Project about?
&lt;/h2&gt;

&lt;p&gt;I set out to build a TinyML system that detects 3 types of gestures from a time-series (I will be using gestures/movements interchangeably throughout this article), stores the results, and visualizes them on a webpage. &lt;/p&gt;

&lt;p&gt;The system has a static webpage hosted on S3 buckets, DynamoDB, a Golang microservice, and obviously the edge device with a TinyML application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project architecture&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq385htj4ab55xldxzadn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq385htj4ab55xldxzadn.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While there are more components to this project, this article will be about machine learning and some parts of the implementation on ESP32. If you are interested in the full code you can find the links to the repositories at the end of this article.&lt;/p&gt;

&lt;p&gt;Let's go to the edge and see the hardware.&lt;/p&gt;
&lt;h3&gt;
  
  
  On the edge
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugyewwjoe0j4m72nn55o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugyewwjoe0j4m72nn55o.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core of the system is an ESP32 - a microcontroller produced by Espressif with a 240MHz clock speed, built-in WiFi+BLE, and the ability to run MicroPython🐍 (I used MicroPython 1.14). The IMU used for this project was an MPU6500 with 6 degrees of freedom (DoF) - 3 accelerations (X,Y,Z) and 3 angular velocities (X,Y,Z). Plus a breadboard and jumper wires to connect it all together.&lt;/p&gt;
&lt;h3&gt;
  
  
  MicroPython
&lt;/h3&gt;

&lt;p&gt;If you haven't yet heard about MicroPython - it's Python for microcontrollers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimized to run on microcontrollers and in constrained environments." [Link]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It might not be as performant as C or C++, but it provides plenty to make prototyping enjoyable - especially for IoT applications which are not latency-sensitive. (It worked fine with a 100Hz sampling rate.)&lt;/p&gt;
&lt;h2&gt;
  
  
  Data &amp;amp; Machine Learning
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Gestures definition
&lt;/h3&gt;

&lt;p&gt;You can find the 3 gestures for which I collected data below. I call them 'circle', 'X' and 'Y' - the gifs follow the same order. 'Circle' is self-explanatory; 'X' and 'Y' because the gesture was along the X and Y axis of the sensor, respectively. Ideally, I would have wanted to detect anomalies on real machine data, but that type of data is hard to come by and also hard to replicate. My defined gestures, on the other hand, were easy to generate and more than enough to test the possibilities of MicroPython and ESP32.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2Fcircle_x_y_optimized.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2Fcircle_x_y_optimized.gif" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Experiments
&lt;/h3&gt;

&lt;p&gt;Experimentation with machine learning was divided into two parts. The first explored the effects of time-series labelling on model performance, using all available signals from the sensor - X,Y,Z accelerations and X,Y,Z angular velocities. Additionally, I tested the viability of ML on ESP32 from the inference-time perspective - whether it would be possible to achieve low enough inference times. &lt;/p&gt;

&lt;p&gt;The focus of the second set of experiments was model optimization, reducing feature space, selecting the right sampling frequency, and reducing incorrect inference results.&lt;/p&gt;
&lt;h3&gt;
  
  
  Collecting data
&lt;/h3&gt;

&lt;p&gt;I simplified data collection by using the Terminal Capture VS Code extension. It let me save sensor data from VSC's terminal to a txt file which I later wrangled into csv format. For printing out sensor data I wrote the script below. It runs on the ESP32 at startup with a 10ms sampling period (100Hz sampling rate). 10ms was the lowest I could get with consistent results - I tried &lt;code&gt;period=5&lt;/code&gt; but the readings were inconsistent, coming in between 5-7ms. Hitting the first limitation of the stack. Nonetheless, 10ms (100Hz) was more than enough.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;And this is how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2F100q_80cols_25rows.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2F100q_80cols_25rows.gif" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Labelling and label distribution
&lt;/h3&gt;

&lt;p&gt;There are great labelling tools out there; I used &lt;a href="http://labelstud.io" rel="noopener noreferrer"&gt;labelstud.io&lt;/a&gt; to label my time-series data. Among the 3 defined gestures, 'circle' is the longest at around 800-1000ms, while 'X' and 'Y' take between 400-600ms. To have a buffer, I used a 1000ms label span for all three labels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf4peucuo57e4ml6b1w2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf4peucuo57e4ml6b1w2.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring the dataset - EDA
&lt;/h3&gt;

&lt;p&gt;I used 3D plots to see if there's a relationship between the signals. All data points of the 1000ms time span are plotted (101 data points).&lt;/p&gt;

&lt;h4&gt;
  
  
  Acceleration 3D plot
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr49ivtr5t0y2959n3b3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr49ivtr5t0y2959n3b3.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Angular velocity 3D plot
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3buiu9jpk5r99ihv8k8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3buiu9jpk5r99ihv8k8m.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's clear that there's a pattern in gesture accelerations and angular velocities.&lt;/p&gt;

&lt;p&gt;Let's double check by plotting all signals against their mean. You can find plots of all gestures and correlation matrices in jupyter notebooks in the associated repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4656afh0ddufalry0lvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4656afh0ddufalry0lvv.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The cutoff at the top and bottom of the signals is due to the sensor range, which was set to 2G (~19.6 m/s^2).&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine Learning to detect gestures
&lt;/h3&gt;

&lt;p&gt;Since the ESP32, and microcontrollers in general, are resource constrained, there are a couple of requirements for my TinyML application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inference time &amp;lt;&amp;lt; sampling period&lt;/li&gt;
&lt;li&gt;ML model &amp;lt; 20kB - it's hard to load files larger than 20kB onto the ESP32 (at least with MPY)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MicroPython is still a young project. It is supported by an active community and many libraries have already been developed. &lt;strong&gt;Unfortunately, there's no scikit-learn or a dedicated time-series machine learning library for MicroPython&lt;/strong&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How to overcome this?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The answer is pure-python machine learning models.&lt;/strong&gt; Luckily, I found a great library (&lt;a href="https://github.com/BayesWitnesses/m2cgen/tree/master/generated_code_examples" rel="noopener noreferrer"&gt;m2cgen&lt;/a&gt;) that lets you export scikit-learn models to Python, Go, Java, and many other programming languages. It doesn't support exporting time-series-specific ML models, so I'll be using standard scikit-learn algorithms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm29itv3ijno3j577ywky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm29itv3ijno3j577ywky.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In practice it looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Train models with scikit-learn on tabular data&lt;/li&gt;
&lt;li&gt;Convert scikit-learn models to pure-python code&lt;/li&gt;
&lt;li&gt;Use pure-python models for inference&lt;/li&gt;
&lt;/ol&gt;
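&lt;p&gt;&lt;em&gt;To illustrate step 2, here is a hand-written example of the kind of dependency-free code m2cgen emits for a small decision tree - nothing but nested if/else on the flattened input, which is why it runs under MicroPython as-is. The thresholds, feature indices and class order below are made up for the example, not actual exported output:&lt;/em&gt;&lt;/p&gt;

```python
# Illustration only: thresholds and feature indices are invented.
# Class order assumed: 0 = circle, 1 = X, 2 = Y.
def score(inp):
    """inp: flattened feature vector (one value per time step per signal)."""
    if inp[12] <= -4.5:
        return [1.0, 0.0, 0.0]      # circle
    else:
        if inp[40] <= 2.1:
            return [0.0, 1.0, 0.0]  # X
        else:
            return [0.0, 0.0, 1.0]  # Y

def predict(inp):
    """Return the index of the most probable class."""
    probs = score(inp)
    return probs.index(max(probs))
```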

&lt;h3&gt;
  
  
  Caveats of using scikit-learn for time-series data
&lt;/h3&gt;

&lt;p&gt;Using scikit-learn for time-series comes with a price - data must be in a tabular format to train the models. There are two ways to go about this [&lt;a href="https://www.sktime.org/en/latest/examples/02_classification_univariate.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;]:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Tabularizing (reducing) the data&lt;/p&gt;

&lt;p&gt;In this case each time point is considered a feature and we lose the ordering of the data in time. There's no dependency of one point on the previous or next in the series.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Feature extraction&lt;/p&gt;

&lt;p&gt;With feature extraction, the time-series data is used to calculate the mean, max, min, variance and other time-series-specific variables, which are then used as features for model training. We move away from the time-series domain and operate in the domain of features.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;I chose data tabularization&lt;/strong&gt;. While it's simple to call advanced libraries in Python, MicroPython has a limited mathematical toolset - I might not be able to extract all the features in MPY. Secondly, I had to consider the speed at which these features could be calculated - given the limited resources it might take longer than the sampling period. Maybe in the next iteration of this TinyML project.&lt;/p&gt;
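&lt;p&gt;&lt;em&gt;A minimal sketch of the tabularization step - each (time step, signal) pair becomes one column of the training row, so temporal order survives only implicitly via column position:&lt;/em&gt;&lt;/p&gt;

```python
def tabularize(window):
    """Flatten a labelled window into one tabular row.

    window: list of samples, each a list of 6 sensor readings.
    A 1000 ms window at 50 Hz (51 samples) yields a 306-value row.
    """
    row = []
    for sample in window:
        row.extend(sample)
    return row
```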

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgrrapvu7xbb9m0zlby9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgrrapvu7xbb9m0zlby9.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataset variations for ML model training
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Event position in label
&lt;/h4&gt;

&lt;p&gt;This affected the 'X' and 'Y' gestures: since their execution takes between 400-600ms, it was possible to change their position within the 1000ms label window. 'Circle' takes 800-1000ms, so I left this gesture as labelled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog49mk6hbwrkurwvggp6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog49mk6hbwrkurwvggp6.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset description&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Baseline dataset&lt;/p&gt;

&lt;p&gt;The dataset used for training and validation contained the movements as collected and labelled.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Centered X and Y move signals&lt;/p&gt;

&lt;p&gt;The 'circle' movement takes up the whole 1000ms span and cannot be manipulated by moving it along the time axis. However, 'X' and 'Y' have a shorter execution at around 400-600ms and allow for flexibility. I centered these movements within the 1000ms window to see if the model would perform better with this setup.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Centered X and Y move signals + Augmentation&lt;/p&gt;

&lt;p&gt;As in the previous case, 'X' and 'Y' movements were placed in the center of the 1000ms window. Additionally, a simple form of augmentation was introduced. Since labelling the movements is not 'exact', some signals might have a misaligned start. To make up for this, and possibly achieve better generalization, I added shifted copies - using a range of small shifts.&lt;/p&gt;

&lt;p&gt;For 'X' and 'Y' movements the center is at -20 steps, and for augmentation a range between -20 and -15 was used, where one step is 10ms.&lt;/p&gt;

&lt;p&gt;For 'circle' a range between -2 and 2 was used.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example: If the original label starts at 0ms and the augmented copy is shifted by -1 step, the augmented copy will start at -10ms.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Centered X and Y move signals + SMOTE&lt;/p&gt;

&lt;p&gt;Similarly to the previous two cases, 'X' and 'Y' are centered, but additional synthetic oversampling (SMOTE) is used and an equal amount of labels is created for the training dataset.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;X and Y signal at the end of the window&lt;/p&gt;

&lt;p&gt;In this case the 'X' and 'Y' movements are put at the end of the 1000ms sampling window. &lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
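&lt;p&gt;&lt;em&gt;The shift-based augmentation from variant 3 can be sketched as follows - the function name and the exact cutting logic are my illustration, not the original notebook code:&lt;/em&gt;&lt;/p&gt;

```python
def shifted_windows(series, start, length, shifts):
    """Cut one labelled window per shift (in steps) from a longer recording.

    series: full recording (list of samples), start: index where the
    label begins, length: window length in steps, shifts: iterable of
    step offsets, e.g. range(-20, -14) for centered X/Y plus jitter
    or range(-2, 3) for 'circle'.
    """
    windows = []
    for s in shifts:
        begin = start + s
        if begin < 0 or begin + length > len(series):
            continue  # shifted window would fall outside the recording
        windows.append(series[begin:begin + length])
    return windows
```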

&lt;p&gt;&lt;strong&gt;Data sampling rate&lt;/strong&gt;&lt;br&gt;
Data was collected at 100Hz which allowed me to downsample. You can find the frequencies used for model training below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qt0e7x6u1mf2jbkoboh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qt0e7x6u1mf2jbkoboh.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
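&lt;p&gt;&lt;em&gt;Since the data was collected at 100Hz, the lower rates can be produced by integer decimation - a sketch, assuming the target frequency divides the source frequency evenly:&lt;/em&gt;&lt;/p&gt;

```python
def downsample(samples, src_hz, dst_hz):
    """Keep every n-th sample, e.g. 100 Hz -> 50 Hz keeps every 2nd."""
    assert src_hz % dst_hz == 0, "use an integer decimation factor"
    step = src_hz // dst_hz
    return samples[::step]
```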
&lt;h3&gt;
  
  
  Model evaluation
&lt;/h3&gt;

&lt;p&gt;After the initial model training, deployment and inference on live data, I noticed that inference on the ESP32 was too sensitive - multiple detections for the same movement occurrence. I collected a validation dataset to see what happens with live data. Each validation dataset - 'circle', 'X', 'Y' - contained 5-6 gesture events.&lt;/p&gt;

&lt;p&gt;I emulated a live data feed through a sliding inference window, which makes an inference at each step while sliding through the time-series. Each green (circle), blue (X), and red (Y) line represents one inference. These lines are placed at the center of the sliding window (i.e. T + 500ms).&lt;/p&gt;
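&lt;p&gt;&lt;em&gt;The sliding-window emulation can be sketched like this, with &lt;code&gt;predict&lt;/code&gt; standing in for any trained model:&lt;/em&gt;&lt;/p&gt;

```python
def sliding_inference(series, predict, window, step=1):
    """Run predict on every window position over the series.

    Returns (center_index, label) pairs - one inference per step,
    mirroring one inference per incoming sample on the device; the
    center index is where the plot marker sits (mid-window).
    """
    results = []
    for start in range(0, len(series) - window + 1, step):
        label = predict(series[start:start + window])
        center = start + window // 2
        results.append((center, label))
    return results
```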

&lt;p&gt;See the example below - all of these models had 0.95+ accuracy yet still produced incorrect inference results when emulating live data on the validation datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqphl8ccn49m4um9ocvbc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqphl8ccn49m4um9ocvbc.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation equation&lt;/strong&gt;&lt;br&gt;
I used the equations below to evaluate the models. For each dataset I calculated the ratio of incorrect labels that shouldn't be there. I acknowledge that I should have labelled my evaluation datasets, but I needed a quick way to quantitatively evaluate models.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Label&lt;/th&gt;
&lt;th&gt;Equation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Circle&lt;/td&gt;
&lt;td&gt;circle_error = (X+Y) / (X+Y+Circle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;X&lt;/td&gt;
&lt;td&gt;x_error = (Circle+Y) / (X+Y+Circle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Y&lt;/td&gt;
&lt;td&gt;y_error = (Circle+X) / (X+Y+Circle)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
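&lt;p&gt;&lt;em&gt;The three equations generalize to "share of inferences that are not the expected gesture"; a sketch:&lt;/em&gt;&lt;/p&gt;

```python
from collections import Counter

def gesture_error(inferences, true_label):
    """Share of inferences on a single-gesture validation set that are
    not the expected gesture, e.g. circle_error = (X+Y) / (X+Y+Circle)."""
    counts = Counter(inferences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[true_label]) / total
```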
&lt;h3&gt;
  
  
  Baseline model training results
&lt;/h3&gt;

&lt;p&gt;Initially I used 5 models for baseline training - Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, and Naive Bayes - but reduced the set to Decision Trees and Random Forests. m2cgen doesn't support Naive Bayes, so I was unable to convert NB models to pure-python, and Logistic Regression and SVMs had inference-time issues when converted to pure-python.&lt;/p&gt;

&lt;p&gt;Both Random Forest and Decision Tree settings were left at their defaults.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17n71leztfz1bd6ztmiv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17n71leztfz1bd6ztmiv.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, there's nothing conclusive with regard to sampling frequency and event position in the label. On top of that, I noticed large variations in results just by changing the random_seed of the model. I assume this could be solved by collecting more data.&lt;/p&gt;

&lt;p&gt;The means across all 3 movement errors can be found below. Again there's no clear winner, so going forward I will be using the baseline dataset to train optimized models - sticking to the basics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg10azmtkuzb2k9qdynvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg10azmtkuzb2k9qdynvn.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Testing inference time
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz6lyn3duqwoc763uhpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz6lyn3duqwoc763uhpe.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested inference times of Random Forests with different numbers of estimators to find the highest number that is still usable. Inference time with 10 estimators is approximately 4ms, which is viable even at a 10ms sampling period. Additionally, the ESP32 was set to a 160MHz clock speed; for the actual script I will be using 240MHz (a 50% increase), which will further decrease inference times.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Random Forests are just ensembles of Decision Trees - if a Random Forest passes, Decision Trees will pass as well.&lt;/em&gt;&lt;/p&gt;
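&lt;p&gt;&lt;em&gt;A measurement loop like the following can be used for the timing test - this is a desktop sketch using &lt;code&gt;time.perf_counter&lt;/code&gt;, not the original script; on the device, MicroPython's &lt;code&gt;time.ticks_us&lt;/code&gt; would be the closest equivalent:&lt;/em&gt;&lt;/p&gt;

```python
import time

def time_inference(predict, inp, runs=100):
    """Average wall-clock inference time in ms over `runs` calls."""
    t0 = time.perf_counter()
    for _ in range(runs):
        predict(inp)
    return (time.perf_counter() - t0) * 1000.0 / runs
```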
&lt;h3&gt;
  
  
  Optimizing models
&lt;/h3&gt;


&lt;p&gt;&lt;strong&gt;Considered optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Optimizing the number of estimators&lt;/p&gt;

&lt;p&gt;The number of estimators must be kept low - ideally between 3 and 5 - because of inference-time constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optimizing the number of collected inputs&lt;/p&gt;

&lt;p&gt;The X, Y, Z acceleration signals must be collected for a different part of the application. I considered creating a combination of the acceleration signals and 1 or 2 angular velocity signals.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optimizing sampling rate&lt;/p&gt;

&lt;p&gt;A sampling rate of 100Hz might be overkill for the application, and based on the evaluation results it doesn't offer any benefit over a 50Hz or 20Hz sampling rate. On the other hand, 10Hz might be too slow. Therefore, for the experiments I will be using 20, 25 and 50Hz sampling rates. &lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To train optimized models I used a grid search over the parameters below.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftexnf88rr24ikla1dfz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftexnf88rr24ikla1dfz3.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Comparing the results
&lt;/h4&gt;

&lt;p&gt;In the charts you can see the results for all 3 gestures. Blue dots and lines represent the baseline (non-optimized) models, and red dots and lines the optimized models (results of the grid search). Horizontal lines in each of the charts are the means across all 3 gestures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76zqik8qh1mht4fwi37z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76zqik8qh1mht4fwi37z.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best model is ID #2. &lt;/p&gt;
&lt;h4&gt;
  
  
  Comparing model #2 to baseline model
&lt;/h4&gt;

&lt;p&gt;The baseline model was trained with all 6 signals at 50Hz with default settings. It's counterintuitive, but by reducing the number of signals it was possible to reduce the number of incorrect inferences. The same evaluation methods were used as described previously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssp5m69zbkkbbzl2h2i6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssp5m69zbkkbbzl2h2i6.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although inference improved by reducing the number of signals and tuning the hyperparameters, it's far from perfect. There are two phenomena in the inference results - first, there are groups of identical CORRECT inferences for a single event, and secondly there's a trailing inference (mostly for circle gestures). The trailing inferences are due to residual movement at the end of the circle motion. While these might be correctly classified, they are unwanted and must be filtered out. Ideally, only one correct inference per event is sent to the REST API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco1qrllzyis62lipg7l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco1qrllzyis62lipg7l7.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Debouncing inference results
&lt;/h3&gt;

&lt;p&gt;I am assuming models have a window around the 'true' center of a movement - meaning models will make inferences a few ms before and after the 'true' movement point. Additionally, there are incorrect 'trailing' inferences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjo2ql2wj0ldp7k2rcxtu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjo2ql2wj0ldp7k2rcxtu.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My debounce implementation is based on two conditions. One of them compares the &lt;strong&gt;time difference between the first and last inference&lt;/strong&gt; in an inference buffer ('Circle', 'X', and 'Y' inference results are added to the inference buffer). The other &lt;strong&gt;evaluates the number of inferences in the inference buffer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For the window around the 'true' movement I am assuming 200ms, which practically allows 9 inferences at a 50Hz sampling rate. Therefore, the inference buffer must contain at least 9 values.&lt;/p&gt;

&lt;p&gt;The time difference threshold is set to 450ms; after experimentation this worked best at a 50Hz sampling rate. It filtered out trailing inferences of the 'Circle' gesture while still detecting 'X' and 'Y' gestures. Values above 450ms were unable to detect them, while values below 400ms classified 'trailing' inferences as separate gestures (often of an incorrect type).&lt;/p&gt;

&lt;p&gt;If the above conditions are met, the &lt;strong&gt;most frequent value of the first 9 elements in the inference buffer is returned as the final inference result&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: This is still work in progress and I am thinking about smarter re-implementations.&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Debouncing implementation
&lt;/h4&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
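&lt;p&gt;&lt;em&gt;A minimal sketch of the debounce logic described above - the 450ms span and 9-inference minimum follow the text, but the exact buffering and flush behaviour is my interpretation, not the original implementation:&lt;/em&gt;&lt;/p&gt;

```python
from collections import Counter

def debounce(events, span_ms=450, min_count=9):
    """Debounce raw (timestamp_ms, label) inferences into final results.

    Buffers incoming inferences; once the buffer spans at least span_ms
    and holds at least min_count entries, emits the most frequent label
    among the first min_count entries and starts a fresh buffer.
    """
    buffer = []   # list of (timestamp_ms, label)
    results = []
    for t, label in events:
        buffer.append((t, label))
        if len(buffer) >= min_count and buffer[-1][0] - buffer[0][0] >= span_ms:
            first = [lab for _, lab in buffer[:min_count]]
            results.append(Counter(first).most_common(1)[0][0])
            buffer = []  # start over for the next event
    return results
```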


&lt;h4&gt;
  
  
  Comparing raw and debounced inference results
&lt;/h4&gt;

&lt;p&gt;The results are cleaner, but there's still room for improvement - there should be only one inference per event in the 'Circle' eval data, and all events should be picked up in 'X' and 'Y'.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Circle&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0crjmgaxx0hbrhwrpo1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0crjmgaxx0hbrhwrpo1.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;X&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfgzvf69a307t2bf2kbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfgzvf69a307t2bf2kbq.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Y&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeyjsma33ymojm7zu0t0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeyjsma33ymojm7zu0t0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In general the results look better than those produced by baseline models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inference on ESP32
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2Frender1622581364829.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Ftinyml-article-gifs%2Frender1622581364829.gif" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Future tweaks
&lt;/h2&gt;

&lt;p&gt;I am already thinking about how to tweak this project to achieve faster classification and more accurate results. Here's a couple of ideas:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Improve model evaluation by labelling validation data or by designing better evaluation methods.&lt;/li&gt;
&lt;li&gt;Implement feature extraction in addition to time-series data to (possibly) achieve better inference results.&lt;/li&gt;
&lt;li&gt;Implement async writes to DB on backend. Shorter response time -&amp;gt; shorter blocking. *MPY requests module implementation does not yet support async.&lt;/li&gt;
&lt;li&gt;Replace HTTP requests (does not support async) with MQTT (supports async)&lt;/li&gt;
&lt;li&gt;Implement digital signal processing methods to smooth out signals.&lt;/li&gt;
&lt;li&gt;Improve data handling - memory allocation errors with 900 data points.&lt;/li&gt;
&lt;li&gt;Compare evaluation results to dedicated time-series models  and neural networks.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;To conclude - it is clearly possible to classify gestures on an ESP32 microcontroller using standard machine learning algorithms and MicroPython, but some corners need to be cut. Among others, time-series data must be tabularized and the highest possible sampling rate is 100Hz (with the current setup).&lt;/p&gt;

&lt;h2&gt;
  
  
  Future scope
&lt;/h2&gt;

&lt;p&gt;While working on this project I found many interesting sources and projects implementing TinyML with TensorFlow Lite Micro, deepC and similar. Next, I'd like to explore implementing gesture classification using neural networks to compare the results between standard ML and DL. &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Reach out if you have any questions or suggestions. 👏&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Repos
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tkeyo/tinyml-esp-data" rel="noopener noreferrer"&gt;Data &amp;amp; ML notebooks&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/tkeyo/tinyml-esp" rel="noopener noreferrer"&gt;ESP32 TinyML&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/tkeyo/tinyml-be" rel="noopener noreferrer"&gt;Golang REST API&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Code is work-in-progress.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>iot</category>
      <category>micropython</category>
      <category>esp32</category>
    </item>
  </channel>
</rss>
