<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Goran Vuksic</title>
    <description>The latest articles on Forem by Goran Vuksic (@gvuksic).</description>
    <link>https://forem.com/gvuksic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F363471%2Fe01a1749-387d-4177-b5ea-18619a542419.jpg</url>
      <title>Forem: Goran Vuksic</title>
      <link>https://forem.com/gvuksic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gvuksic"/>
    <language>en</language>
    <item>
      <title>Rendering myself with NVIDIA's Instant NeRF</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Thu, 09 Jun 2022 18:27:38 +0000</pubDate>
      <link>https://forem.com/gvuksic/rendering-myself-with-nvidias-instant-nerf-5ccj</link>
      <guid>https://forem.com/gvuksic/rendering-myself-with-nvidias-instant-nerf-5ccj</guid>
      <description>&lt;p&gt;Back in January 1999 a few friends and I went to the cinema to watch the movie "Enemy of the State". It was a really popular movie and most likely you have seen it, but if you have not seen it in short it is a high tech action thriller where the main character is chased by some secret agents. In the movie there is one scene where Jack Black as the main hacker in the movie is using images from a video camera and reconstructing 3D scene trying to figure out what Will Smith is carrying in the bag. I remember we were happy with the movie, but we commented how that scene was way too far stretched. Today, 23 years later, I have to say technology went really a long way from where it was back then, and this is not so far stretched anymore.&lt;/p&gt;

&lt;p&gt;NeRF (Neural Radiance Field) is a fully connected neural network capable of generating a 3D scene from a collection of 2D images. NVIDIA's Instant NeRF is the fastest neural rendering model developed so far, achieving speedups of up to 1000x, and it is capable of rendering 3D scenes in seconds. Instant NeRF was showcased at &lt;a href="https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41441/" rel="noopener noreferrer"&gt;NVIDIA GTC&lt;/a&gt; at the end of March this year, and the examples shown were just amazing; you can check out the official announcement &lt;a href="https://blogs.nvidia.com/blog/2022/03/25/instant-nerf-research-3d-ai/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I had to try it out, and I decided to render myself standing in a forest. If you think about it, the scene itself is pretty complex, with trees all around, but I wanted to see how well Instant NeRF could figure out such a scene. My wife Helena took 150 pictures while walking around me in the forest, making almost a full circle. The following preview shows some of the pictures taken:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6goniuk1y0wicc5ixeg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6goniuk1y0wicc5ixeg.png" alt="Pictures used for generation of 3D scene"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA's Instant NeRF is available on &lt;a href="https://github.com/NVlabs/instant-ngp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;; if you would like to try it out, you will of course need an NVIDIA GPU. The examples shown in the repository were rendered with an RTX 3090. Unfortunately, I have a GTX 1660 Ti with only 1536 CUDA cores (the RTX 3090 has 10496 CUDA cores), but I still managed to get really nice results.&lt;/p&gt;

&lt;p&gt;In this blog post I will give you a high-level overview of the setup, along with some tips I figured out along the way. Besides the info in the official repository, you can find a great tutorial &lt;a href="https://github.com/bycloudai/instant-ngp-Windows" rel="noopener noreferrer"&gt;here&lt;/a&gt; if you would like to see how it is done in detail.&lt;/p&gt;

&lt;p&gt;After you clone the repository locally, you should install all the dependencies and build the project with &lt;code&gt;cmake&lt;/code&gt;. Copy your pictures into the &lt;code&gt;data&lt;/code&gt; folder, and run &lt;code&gt;colmap2nerf.py&lt;/code&gt; to generate the &lt;code&gt;transforms.json&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap --aabb_scale 8 --images data/&amp;lt;insert data folder name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please note that the &lt;code&gt;aabb_scale&lt;/code&gt; parameter specifies the extent of the scene; the default value is 1, and it can be set to larger powers of 2, up to 128. This parameter defines a bounding box around your scene. Instant NeRF assumes the pictures are taken so that the object of interest is always in the center, so setting a higher value for the &lt;code&gt;aabb_scale&lt;/code&gt; parameter extends the bounding box around that central object. For the scene I was rendering, after a few tries I found the value 8 to be optimal.&lt;/p&gt;

&lt;p&gt;The generated &lt;code&gt;transforms.json&lt;/code&gt; file contains a lot of information about the pictures used (such as path, sharpness, transform matrix, etc.), and you need this file in order to run the 3D scene rendering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;insert path to instant-ngp&amp;gt;\build\testbed.exe --scene data/&amp;lt;insert data folder name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Neural Graphics Primitives window will pop up, and initially you will see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy1dnowmumx6zvq94kof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy1dnowmumx6zvq94kof.png" alt="NGP start"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In just a few seconds the picture will become much clearer, and you can easily monitor the progress:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p2v3v44x1z5ehozu6fw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p2v3v44x1z5ehozu6fw.png" alt="NGP progress"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If, like me, you are using a less powerful graphics card than an RTX 3090, you can pause training in order to zoom and rotate the scene; otherwise you can do it live. After a minute or two you will not see any further improvement, and if you are satisfied with the result you can stop the training. Use the camera option to set waypoints around the scene, save the current snapshot and the waypoints, then go back to the command prompt to render a video with the &lt;code&gt;render.py&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python scripts/render.py --scene data/&amp;lt;insert data folder name&amp;gt; --n_seconds &amp;lt;insert number of seconds&amp;gt; --fps &amp;lt;insert number of fps&amp;gt; --render_name &amp;lt;insert your video name&amp;gt; --width &amp;lt;insert video width&amp;gt; --height &amp;lt;insert video height&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final result of my render can be seen in the following video:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/NtFxy49aNmc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I personally find photogrammetry amazing; I have recently seen several great things in this field, and NVIDIA's Instant NeRF is really impressive. Being able to generate 3D scenes and objects from 2D images opens up a lot of possibilities: generating digital twins, quickly training autonomous robots to navigate environments, and much more.&lt;/p&gt;

&lt;p&gt;My wife Helena (kudos for taking pictures of me in the forest) said this video render looks like it is from some parallel dimension. No matter how great the video looks, I have not yet managed to explain to her why I need to invest a few thousand dollars in a new graphics card, but I will continue to work on that. ;)&lt;/p&gt;

&lt;p&gt;Thanks for reading, if you found it interesting feel free to like and share!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nvidia</category>
      <category>nerf</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The Maker Show: Remote controlled Robot Arms with Raspberry Pi, .NET, Azure, Blazor and SignalR</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Wed, 17 Mar 2021 19:04:59 +0000</pubDate>
      <link>https://forem.com/fordevs-community/the-maker-show-remote-controlled-robot-arms-with-raspberry-pi-net-azure-blazor-and-signalr-1n66</link>
      <guid>https://forem.com/fordevs-community/the-maker-show-remote-controlled-robot-arms-with-raspberry-pi-net-azure-blazor-and-signalr-1n66</guid>
      <description>&lt;p&gt;The Maker Show is series of monthly one hour show for developers hosted by &lt;a href="https://twitter.com/sherrrylst"&gt;Sherry List&lt;/a&gt; and &lt;a href="https://twitter.com/gvuksic"&gt;Goran Vuksic&lt;/a&gt;. On each episode we highlight tools and projects from the community that can inspire new creations and inventions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/pjgcreations/"&gt;Pete Gallagher&lt;/a&gt; is a Freelance IT Consultant, Microsoft Certified Trainer and Azure MVP, Pluralsight Author and owner of PJG Creations Ltd. He has been creating software for decades and is happy programming in just about any language. Pete has been involved in a wide span of tech in his many years of industry experience, including IoT Projects for; Royal Mail Stamp Vending before there was such a thing as modern IoT, Building Monitoring Systems, Internet Connected Self Service Kiosks and much more. He has presented all over the UK on a variety of IoT Topics, including Azure IoT Hubs, Amazon Alexa, Particle Photon, Arduino etc etc. Pete also organises Notts IoT, co-organises Dot Net Notts, Notts Dev Workshop and sits on the Board of LATi, a Loughborough based Advanced Technology networking group. He is also an active STEM Ambassador and is passionate about making STEM subjects accessible to all ages. Pete particularly likes gadgets of all kinds!&lt;br&gt;
In this super interesting episode of Maker Show, Pete took us through everything needed in order to build a robot arm with a Raspberry Pi, .NET 5 a Blazor App and SignalR.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XaO2Oxm---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n6szlspfs8nz2okzwrzy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XaO2Oxm---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n6szlspfs8nz2okzwrzy.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pete started by talking about .NET 5 and how easy it is to install. Then we looked through the various circuits, and he spun up a console application to explore how we can control the GPIO on the Pi. We saw how the servos are wired up and how to get the code in place to start moving our Raspberry Pi-based robot arm. Finally, we saw a simple Blazor and SignalR app that controls the robot remotely! This talk appeals to all knowledge levels and anyone interested in getting into STEM, electronics and robotics.&lt;/p&gt;

&lt;p&gt;Feel free to check out the recording of the show:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/UiO95BYIVR4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The presentation from the session can be found on &lt;a href="https://www.slideshare.net/PGallagher69/building-a-raspberry-pi-robot-arm-with-net-5-blazor-and-signalr"&gt;SlideShare&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code repository is on &lt;a href="https://github.com/pjgpetecodes/rpirobot"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check this post to discover how to &lt;a href="https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-raspberry-pi-web-simulator-get-started/?ocid=AID3027289"&gt;Connect Raspberry Pi online simulator to Azure IoT Hub (Node.js)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on our &lt;a href="https://twitter.com/fordevs_"&gt;ForDevs Twitter&lt;/a&gt; account if you would like to be notified when our next show is announced. You can also join our &lt;a href="https://www.meetup.com/ForDevs"&gt;Meetup group&lt;/a&gt; and subscribe to our &lt;a href="https://www.youtube.com/channel/UCedB8U57bDW8WqH0_eQ4jag"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>blazor</category>
      <category>signalr</category>
      <category>dotnet</category>
    </item>
    <item>
      <title>The Maker Show: TinyML for wildlife conservation 🐘</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Thu, 25 Feb 2021 07:16:56 +0000</pubDate>
      <link>https://forem.com/fordevs-community/the-maker-show-tinyml-for-wildlife-conservation-idg</link>
      <guid>https://forem.com/fordevs-community/the-maker-show-tinyml-for-wildlife-conservation-idg</guid>
      <description>&lt;p&gt;The Maker Show is series of monthly one hour show for developers hosted by &lt;a href="https://twitter.com/sherrrylst"&gt;Sherry List&lt;/a&gt; and &lt;a href="https://twitter.com/gvuksic"&gt;Goran Vuksic&lt;/a&gt;. On each episode we highlight tools and projects from the community that can inspire new creations and inventions.&lt;/p&gt;

&lt;p&gt;Biodiversity is declining at a rapid pace, and several wild species are at risk of extinction. With technical support such as IoT solutions and machine learning, conservation efforts can be facilitated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/sara-olsson/"&gt;Sara Olsson&lt;/a&gt; is a Maker who recently took a big step into the world of things. She has a background in software development and image processing and is now exploring the combined field of IoT and Machine Learning. She is an ambassador at Edge Impulse and uses their tools to train ML for wildlife conservation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QXwTH9hb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/25y3lhjn06pfyhqlkpt5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QXwTH9hb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/25y3lhjn06pfyhqlkpt5.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sara gave a talk on machine learning for camera traps and running inference on the edge. She demonstrated image classification, from model training to deployment, using tools such as Edge Impulse, Azure IoT Edge, and the OpenMV camera board.&lt;/p&gt;

&lt;p&gt;Feel free to check out the recording of the show:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/KQvOaYCCAwc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Demo resources:&lt;br&gt;
 • &lt;a href="https://www.hackster.io/saraolsson4s/tinyml-and-iot-for-conservation-efforts-dd34db"&gt;Hackster submission&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://edgeimpulse.com/"&gt;Edge Impulse&lt;/a&gt;&lt;br&gt;
 • Kaggle Dataset: &lt;a href="https://www.kaggle.com/biancaferreira/african-wildlife"&gt;Africa Wildlife&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://github.com/Azure-Samples/IoTMQTTSample/tree/master/src/MicroPython"&gt;IoTMQTTSample&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related project mentioned:&lt;br&gt;
 • Project Ngulia: &lt;a href="http://www.projectngulia.org/"&gt;projectngulia.org&lt;/a&gt;&lt;br&gt;
 • Master thesis: &lt;a href="http://www.diva-portal.org/smash/get/diva2:1443352/FULLTEXT01.pdf"&gt;Edge Machine Learning for Animal Detection, Classification, and Tracking&lt;/a&gt;&lt;br&gt;
 • with co-author &lt;a href="https://www.linkedin.com/in/amanda-tyden123/"&gt;Amanda Tydén&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inspiration and related projects:&lt;br&gt;
 • &lt;a href="https://www.microsoft.com/en-us/ai/ai-for-earth-zamba-cloud"&gt;Microsoft AI for Earth&lt;/a&gt; and their &lt;a href="https://github.com/Microsoft/cameratraps"&gt;CameraTrap repo&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://azure.microsoft.com/en-gb/resources/videos/project-15/"&gt;Project 15&lt;/a&gt; and their &lt;a href="https://microsoft.github.io/project15/"&gt;github repo&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://www.hackthepoacher.com/smart-camera-trap"&gt;Hack The Poacher&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://www.smartparks.org/projects/"&gt;Smart Parks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related learn modules on MS Learn:&lt;br&gt;
 • &lt;a href="https://docs.microsoft.com/en-us/learn/modules/cv-classify-bird-species/"&gt;Classify endangered bird species with Custom Vision&lt;/a&gt;&lt;br&gt;
 • &lt;a href="https://docs.microsoft.com/en-us/learn/modules/train-package-module-iot-edge/"&gt;Train and package an Azure machine learning module for deployment to IoT Edge device&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on our &lt;a href="https://twitter.com/fordevs_"&gt;ForDevs Twitter&lt;/a&gt; account if you would like to be notified when our next show is announced. You can also join our &lt;a href="https://www.meetup.com/ForDevs"&gt;Meetup group&lt;/a&gt; and subscribe to our &lt;a href="https://www.youtube.com/channel/UCedB8U57bDW8WqH0_eQ4jag"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>iot</category>
      <category>azure</category>
      <category>wildlife</category>
    </item>
    <item>
      <title>The Maker Show: Making The Skull 💀</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Thu, 21 Jan 2021 06:52:10 +0000</pubDate>
      <link>https://forem.com/fordevs-community/the-maker-show-making-the-skull-101o</link>
      <guid>https://forem.com/fordevs-community/the-maker-show-making-the-skull-101o</guid>
      <description>&lt;p&gt;The Maker Show is series of monthly one hour show for developers hosted by &lt;a href="https://twitter.com/sherrrylst"&gt;Sherry List&lt;/a&gt; and &lt;a href="https://twitter.com/gvuksic"&gt;Goran Vuksic&lt;/a&gt;. On each episode we highlight tools and projects from the community that can inspire new creations and inventions.&lt;/p&gt;

&lt;p&gt;In the first episode of 2021 we had &lt;a href="https://twitter.com/UriShaked"&gt;Uri Shaked&lt;/a&gt; as a guest, speaking about "Making The Skull", a mind-bending Arduino-based hardware puzzle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mSsvlV5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/54oc7vmp4w8thvaxvryf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mSsvlV5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/54oc7vmp4w8thvaxvryf.png" alt="The Maker Show"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uri is a Maker who loves voiding warranties. He is currently building AVR8js, an open-source AVR simulator in JavaScript, and working on "The Skull", an ATtiny85 reverse engineering puzzle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b86uMyIl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cl01kg3j4ouc873ubych.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b86uMyIl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/cl01kg3j4ouc873ubych.jpg" alt="The Skull"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uri's blog post with the details of manufacturing in China can be found &lt;a href="https://blog.wokwi.com/pcb-assembly-service-review-allpcb-china/"&gt;here&lt;/a&gt;, and here is the story of making his first PCB: &lt;a href="https://blog.wokwi.com/wokwi-electronics-with-a-personal-touch/"&gt;"Electronics with a Personal Touch, the Hard Way and the Easy Way"&lt;/a&gt;. You can also read more about his first CTF, which he made and sold on Tindie a year before The Skull, in the blog post &lt;a href="https://blog.wokwi.com/capture-the-flag-shitty-add-on/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feel free to check out the recording of the show:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/4phsVKCABYs"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Follow us on our &lt;a href="https://twitter.com/fordevs_"&gt;ForDevs Twitter&lt;/a&gt; account if you would like to be notified when our next show is announced. You can also join our &lt;a href="https://www.meetup.com/ForDevs"&gt;Meetup group&lt;/a&gt; and subscribe to our &lt;a href="https://www.youtube.com/channel/UCedB8U57bDW8WqH0_eQ4jag"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>themakershow</category>
      <category>iot</category>
      <category>arduino</category>
    </item>
    <item>
      <title>The Maker Show: IoT powered holiday lights 🎄</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Sat, 26 Dec 2020 17:13:35 +0000</pubDate>
      <link>https://forem.com/fordevs-community/the-maker-show-iot-powered-holiday-lights-3e80</link>
      <guid>https://forem.com/fordevs-community/the-maker-show-iot-powered-holiday-lights-3e80</guid>
      <description>&lt;p&gt;We have just kicked off the very first episode of The Maker Show! This show is series of monthly one hour show for developers hosted by &lt;a href="https://twitter.com/sherrrylst"&gt;Sherry List&lt;/a&gt; and &lt;a href="https://twitter.com/gvuksic"&gt;Goran Vuksic&lt;/a&gt;. On each episode we will highlight tools and projects from the community that can inspire new creations and inventions.&lt;/p&gt;

&lt;p&gt;In this episode we had &lt;a href="https://twitter.com/jimbobbennett"&gt;Jim Bennett&lt;/a&gt; as a guest, speaking about "IoT powered holiday lights 🎄".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X-sFxKK5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f6ksiiljdx4o20bhfael.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X-sFxKK5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f6ksiiljdx4o20bhfael.jpg" alt="The Maker Show"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jim does things with IoT and Azure in the Developer Relations team at Microsoft, mainly creating content for students and faculty to help them be successful with Microsoft technologies.&lt;/p&gt;

&lt;p&gt;As an IoT enthusiast, Jim has wired up his lights to a Raspberry Pi, and he controls them using cloud IoT services with no-code apps. This allows him to turn the lights on and off, or to change their color via a mobile app.&lt;/p&gt;

&lt;p&gt;Feel free to check out the recording here:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/NUhGyqfv-dQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Resources from the show are available at the following link: &lt;a href="http://aka.ms/Jim/MakerShow"&gt;http://aka.ms/Jim/MakerShow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on our &lt;a href="https://twitter.com/fordevs_"&gt;ForDevs Twitter&lt;/a&gt; account if you would like to be notified when our next show is announced. You can also join our &lt;a href="https://www.meetup.com/ForDevs"&gt;Meetup group&lt;/a&gt; and subscribe to our &lt;a href="https://www.youtube.com/channel/UCedB8U57bDW8WqH0_eQ4jag"&gt;YouTube channel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>themakershow</category>
      <category>iot</category>
      <category>azure</category>
      <category>powerapps</category>
    </item>
    <item>
      <title>Puffins Detection with Azure Custom Vision and Python</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Thu, 06 Aug 2020 10:08:00 +0000</pubDate>
      <link>https://forem.com/stratiteq/puffins-detection-with-azure-custom-vision-and-python-2ca5</link>
      <guid>https://forem.com/stratiteq/puffins-detection-with-azure-custom-vision-and-python-2ca5</guid>
      <description>&lt;p&gt;In our recent project &lt;a href="https://blog.stratiteq.com/dragonfly-mini-2000km-mission"&gt;"Dragonfly Mini"&lt;/a&gt; here at &lt;a href="https://dev.to/stratiteq"&gt;Stratiteq&lt;/a&gt; we used different technologies and programming languages in order to build an autonomous vehicle powered by NVIDIA Jetson Nano, equipped with a drone, that collects and processes data and uploads it to the Microsoft's Azure cloud. Following this project we will publish series of blogposts here at Dev.to explaining how to achieve specific things and in this first blog post I will show how easy it is to train a model for custom detection with Azure Custom Vision and how you can use this model via Python.&lt;/p&gt;

&lt;p&gt;The puffin is, as you probably know, a small bird with a brightly coloured beak, and Iceland is home to the Atlantic puffin. One of the use cases we demonstrated in the "Dragonfly Mini" project was inspired by the protection of endangered species with AI, so we trained a custom model capable of detecting puffins. If you want to build a model that can recognise different birds, you can use: Wah C., Branson S., Welinder P., Perona P., Belongie S. &lt;a href="http://www.vision.caltech.edu/visipedia/CUB-200-2011.html"&gt;"The Caltech-UCSD Birds-200-2011 Dataset"&lt;/a&gt;, Computation &amp;amp; Neural Systems Technical Report, &lt;a href="http://www.vision.caltech.edu/visipedia/papers/CUB_200_2011.pdf"&gt;CNS-TR-2011-001&lt;/a&gt;. You can also find many other suitable datasets on &lt;a href="https://www.kaggle.com/datasets?search=birds"&gt;Kaggle&lt;/a&gt;. Data preparation is manual work and takes some time, but the more effort you invest in preparing and tagging your images, the better the results will be.&lt;/p&gt;

&lt;p&gt;It is really easy to create an object detection project with Azure Custom Vision, and I explained this with a step-by-step walkthrough in one of my previous blog posts, &lt;a href="https://dev.to/stratiteq/where-s-chewie-object-detection-with-azure-custom-vision-lne"&gt;"Where's Chewie? Object detection with Azure Custom Vision"&lt;/a&gt;. Log into &lt;a href="https://www.customvision.ai/"&gt;Custom Vision&lt;/a&gt;, create a new project for object detection, add and tag images, train your project and publish the trained iteration. Before you publish the iteration, make sure you run a few tests with different images. Using only the Caltech dataset will not be sufficient for good recognition, so try to add more images from other datasets or other Internet sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--87WGzCQt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/aa6uczfg1bsi7g039p2e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--87WGzCQt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/aa6uczfg1bsi7g039p2e.png" alt="Model training" width="880" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can use this model from Python in less than 20 lines of code. First, let's install OpenCV and the Custom Vision SDK for Python by running the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install opencv-python
pip install azure-cognitiveservices-vision-customvision
pip install msrest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenCV will be used to get an image from the camera; we will save both this image and the resulting image with it, and we will also use it to draw bounding boxes around the object detection results.&lt;/p&gt;

&lt;p&gt;Create a new Python script and import the packages you just installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;azure.cognitiveservices.vision.customvision.prediction&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CustomVisionPredictionClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;msrest.authentication&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApiKeyCredentials&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define the camera and set properties for width and height; keep in mind that you should preserve the aspect ratio. In my case it will be 640 by 480 pixels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;camera&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FRAME_WIDTH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FRAME_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenCV's VideoCapture can take input from a camera, but it can also use a video file as input, which is a really useful feature.&lt;/p&gt;

&lt;p&gt;The next thing we need to do is define the credentials we will use for our predictor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ApiKeyCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Prediction-key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;PREDICTION_KEY&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CustomVisionPredictionClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;ENDPOINT_URL&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prediction key can be found in the Custom Vision interface by clicking the "Prediction URL" button. In the window that pops up you will see it below the URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qAIDVmdN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tsreysiqv9547ys5gfd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qAIDVmdN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tsreysiqv9547ys5gfd9.png" alt="Prediction key" width="600" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The endpoint URL can be found in the project settings: click the settings icon in the top bar and you will find the endpoint URL (use it without the Resource ID). In the settings you will also find the project ID, which has the following format: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.&lt;/p&gt;
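&lt;p&gt;The project ID is a standard UUID, so a quick sanity check before calling the API can catch copy-paste mistakes. The helper below is a hypothetical convenience for this tutorial, not part of the Custom Vision SDK.&lt;/p&gt;

```python
import uuid

def is_valid_project_id(project_id: str) -> bool:
    """Return True if the string parses as a UUID (aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)."""
    try:
        uuid.UUID(project_id)
        return True
    except ValueError:
        return False

print(is_valid_project_id("12345678-1234-1234-1234-1234567890ab"))  # True
print(is_valid_project_id("not-a-project-id"))  # False
```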

&lt;p&gt;We will take an image from the camera and save it as "capture.png" with the following two lines of code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'capture.png'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use the saved image for detection; here you will have to use the project ID and the name of the published iteration. If you are interested in what the results look like in more detail, you can print them out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"capture.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"rb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;captured_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;detect_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;PROJECT_ID&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;ITERATION_NAME&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;captured_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
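&lt;p&gt;The results object exposes a list of predictions, where each prediction carries a tag name, a probability and a normalized bounding box. The sketch below uses simple stand-in classes to illustrate the shape of that data; the real objects come from the Custom Vision SDK.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Normalized coordinates, each between 0 and 1
    left: float
    top: float
    width: float
    height: float

@dataclass
class Prediction:
    tag_name: str
    probability: float
    bounding_box: BoundingBox

# Two illustrative detections standing in for results.predictions
predictions = [
    Prediction("puffin", 0.92, BoundingBox(0.1, 0.2, 0.3, 0.4)),
    Prediction("puffin", 0.31, BoundingBox(0.5, 0.5, 0.2, 0.2)),
]

# Keep only confident detections, mirroring the 50% threshold used below
confident = [p for p in predictions if p.probability > 0.5]
for p in confident:
    print(f"{p.tag_name}: {p.probability:.0%}")  # puffin: 92%
```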



&lt;p&gt;We can now loop through the results and take the predictions with a probability over 50%. For these predictions we draw bounding boxes on the image and save the result as a new image, "result.png". The bounding box corners are calculated with a simple scaling based on the image size, and we set the bounding box color and border thickness.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probability&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bounding_box&lt;/span&gt;
        &lt;span class="n"&gt;result_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'result.png'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
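&lt;p&gt;The rectangle corners above come from scaling the normalized bounding box by the frame width and height. Pulling that arithmetic into a small helper (a hypothetical name, used here only for illustration) makes it easy to reuse with other resolutions.&lt;/p&gt;

```python
def to_pixels(left, top, width, height, frame_w=640, frame_h=480):
    """Convert a normalized bounding box into integer pixel corners (top-left, bottom-right)."""
    top_left = (int(left * frame_w), int(top * frame_h))
    bottom_right = (int((left + width) * frame_w), int((top + height) * frame_h))
    return top_left, bottom_right

# A box covering the centre quarter of a 640x480 frame
print(to_pixels(0.25, 0.25, 0.5, 0.5))  # ((160, 120), (480, 360))
```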



&lt;p&gt;At the end we release the camera we used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you execute the code, the camera will turn on and take an image, Azure Custom Vision will be invoked via the prediction endpoint to get the results, and the results will be saved as a new image. Here is a quick test I made using a fridge magnet with puffins.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YMfo0N6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iudd09yyiqgbb13065wu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YMfo0N6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/iudd09yyiqgbb13065wu.png" alt="Alt Text" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you do not get really good results at first, you can retrain your model by adding and tagging additional images.&lt;/p&gt;

&lt;p&gt;The full code we used in this tutorial is here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;azure.cognitiveservices.vision.customvision.prediction&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CustomVisionPredictionClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;msrest.authentication&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApiKeyCredentials&lt;/span&gt;

&lt;span class="n"&gt;camera&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FRAME_WIDTH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FRAME_HEIGHT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ApiKeyCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Prediction-key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;PREDICTION_KEY&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CustomVisionPredictionClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;ENDPOINT_URL&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'capture.png'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"capture.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"rb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;captured_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;detect_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;PROJECT_ID&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;ITERATION_NAME&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;captured_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;probability&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bounding_box&lt;/span&gt;
        &lt;span class="n"&gt;result_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'result.png'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I hope you learned how simple it is to utilise Azure Custom Vision from Python; as you can see, it took less than 20 lines of code.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>customvision</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>What does the Computer Vision see? Analyse a local image with JavaScript</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Mon, 25 May 2020 07:00:18 +0000</pubDate>
      <link>https://forem.com/stratiteq/what-does-the-computer-vision-see-analyse-a-local-image-with-javascript-407e</link>
      <guid>https://forem.com/stratiteq/what-does-the-computer-vision-see-analyse-a-local-image-with-javascript-407e</guid>
      <description>&lt;p&gt;Every week here at &lt;a href="https://www.stratiteq.com/"&gt;Stratiteq&lt;/a&gt; we have tech talks called "Brown bag". Idea behind it is to grab your lunch (brown) bag and join a session where we watch presentation about different tech topics, and discuss it afterwards. Last week our session was about Azure Computer Vision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/"&gt;Computer Vision&lt;/a&gt; is an AI service that analyses content in images. In documentation you can find several examples how to use it from different programming languages, in this post you'll also see one example that is not in official documentation and that is: how to analyse a local image with Javascript.&lt;/p&gt;

&lt;p&gt;In order to set up Computer Vision you should log in to the &lt;a href="https://portal.azure.com/"&gt;Azure Portal&lt;/a&gt;, click "Create a resource", select "AI + Machine learning" and "Computer Vision".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OSjAkzan--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/i4nlxngrd8ax910vc3td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OSjAkzan--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/i4nlxngrd8ax910vc3td.png" alt="Computer Vision" width="880" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Define the resource name, select the subscription, location, pricing tier and resource group, and create the resource. In the resource overview click "Keys and Endpoint" to see the keys and endpoint needed to access the Cognitive Services API. You'll need these values later in the code we'll write.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--26wj4Fzd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9uag3ocv9k34fl3ncvz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--26wj4Fzd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9uag3ocv9k34fl3ncvz3.png" alt="Key and Endpoint" width="880" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A sketch of the HTML page we will create is shown in the image below. We'll use the camera and show its feed on the page, take a screenshot of the feed every 5 seconds, analyse that screenshot with Computer Vision and display the description under it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m2HH6Xg1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fd59k4em0jd0u2634rgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m2HH6Xg1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fd59k4em0jd0u2634rgv.png" alt="Page sketch" width="880" height="691"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the setup of our page we'll use the following HTML code; please note jQuery is included in the page head.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Brown Bag - Computer Vision&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;h2&amp;gt;&lt;/span&gt;What does AI see?&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;table&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"mainTable"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;tr&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;video&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"video"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"640"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"480"&lt;/span&gt; &lt;span class="na"&gt;autoplay&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/video&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;canvas&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"canvas"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"640"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"480"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/canvas&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;br&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;h3&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"AIresponse"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/tr&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/table&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll use a simple CSS style to align content to the top of our table cells and set the colour of the result heading.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nt"&gt;table&lt;/span&gt; &lt;span class="nt"&gt;td&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;table&lt;/span&gt; &lt;span class="nt"&gt;td&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;vertical-align&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;top&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;h3&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#990000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the &lt;em&gt;document.ready&lt;/em&gt; function we'll define our elements, check for camera availability and start the camera feed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;video&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;canvas&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2d&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getUserMedia&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getUserMedia&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;srcObject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;play&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can check browser compatibility of &lt;em&gt;mediaDevices&lt;/em&gt; at the following link: &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Navigator/mediaDevices"&gt;https://developer.mozilla.org/en-US/docs/Web/API/Navigator/mediaDevices&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every 5 seconds we'll take a screenshot of the camera feed and send a blob of it to the Computer Vision API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;drawImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toDataURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image/png&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;processImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
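&lt;p&gt;The &lt;em&gt;toDataURL&lt;/em&gt;/&lt;em&gt;fetch&lt;/em&gt;/&lt;em&gt;blob&lt;/em&gt; chain above simply turns the canvas into raw PNG bytes before they are posted. The same decoding step is sketched below in Python for clarity; it is an illustration only, and the page itself stays in JavaScript.&lt;/p&gt;

```python
import base64

def data_url_to_bytes(data_url: str) -> bytes:
    """Decode a 'data:image/png;base64,...' URL into raw bytes, as fetch(...).blob() does."""
    header, encoded = data_url.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(encoded)

# Round-trip a tiny payload standing in for real PNG data
payload = b"\x89PNG fake bytes"
url = "data:image/png;base64," + base64.b64encode(payload).decode()
print(data_url_to_bytes(url) == payload)  # True
```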



&lt;p&gt;Result processing is done in the &lt;em&gt;processImage&lt;/em&gt; function, where you need to enter your subscription key and endpoint to make it work. Those values are available in the Azure Computer Vision overview, as mentioned earlier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;processImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blobImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;subscriptionKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMPUTER_VISION_SUBSCRIPTION_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMPUTER_VISION_ENDPOINT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;uriBase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vision/v3.0/analyze&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;visualFeatures&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Categories,Description,Color&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;details&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;language&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ajax&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;uriBase&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;beforeSend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xhrObj&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
            &lt;span class="nx"&gt;xhrObj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setRequestHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/octet-stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nx"&gt;xhrObj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setRequestHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Ocp-Apim-Subscription-Key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subscriptionKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;processData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;blobImage&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result we receive from the Computer Vision API is JSON; we'll take the description from it and add it to the header 3 element named "AIresponse".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AIresponse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;captions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We did a few tests with it, and Computer Vision describes images really well. If you mess around with it, you can also get a few funny results, as we did:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vsXvGPJI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w3f7ifhznjq54iypkph6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vsXvGPJI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/w3f7ifhznjq54iypkph6.jpg" alt="Arlon running in front of glass door" width="880" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading, you can find full code on the GitHub: &lt;a href="https://github.com/gvuksic/BrownBagComputerVision"&gt;https://github.com/gvuksic/BrownBagComputerVision&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>javascript</category>
      <category>ai</category>
      <category>computervision</category>
    </item>
    <item>
      <title>Where's Chewie? Object detection with Azure Custom Vision</title>
      <dc:creator>Goran Vuksic</dc:creator>
      <pubDate>Sun, 03 May 2020 21:33:16 +0000</pubDate>
      <link>https://forem.com/stratiteq/where-s-chewie-object-detection-with-azure-custom-vision-lne</link>
      <guid>https://forem.com/stratiteq/where-s-chewie-object-detection-with-azure-custom-vision-lne</guid>
      <description>&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Where%27s_Wally%3F"&gt;"Where's Waldo?"&lt;/a&gt; game is very well known by everyone, in this post I'll explain how to make such game with Azure Custom Vision and LEGO minifigures. This was my weekend project, feel free to recreate it and while learning how Azure Custom Vision works you can also entertain your child to help you out with it.&lt;/p&gt;

&lt;p&gt;I'll be using the &lt;a href="https://en.wikipedia.org/wiki/Chewbacca"&gt;Chewbacca&lt;/a&gt; LEGO minifigure (nicknamed &lt;strong&gt;Chewie&lt;/strong&gt;) that the AI should detect in an image among other minifigures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fsErITR_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u0ui1i42ka23nd45ux7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fsErITR_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u0ui1i42ka23nd45ux7k.png" alt="Chewbacca LEGO minifigure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The board where Chewbacca will be hiding can be seen in the following image; it consists of 49 different LEGO minifigures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MUvOB_fF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m9aje3pxzmia1qntm8hi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MUvOB_fF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m9aje3pxzmia1qntm8hi.jpg" alt="LEGO board with minifigures"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to follow the next steps you should have an Azure account; if you don't have one you can create one &lt;a href="https://azure.microsoft.com/en-us/free/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When you sign in with your Azure account into Custom Vision (&lt;a href="https://www.customvision.ai/"&gt;www.customvision.ai&lt;/a&gt;) you'll see a page that shows your projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EsojW54A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ay7mut6rb8yziw1zej8s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EsojW54A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ay7mut6rb8yziw1zej8s.jpg" alt="Custom Vision projects"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clicking on "New project" will ask you to enter the project's name; you can add a description, and you need to select a &lt;a href="https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/govern/resource-consistency/resource-access-management#what-is-an-azure-resource"&gt;resource&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ijSBZWfH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mltk678jms29hsnq5oeh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ijSBZWfH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mltk678jms29hsnq5oeh.jpg" alt="Create new project"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are creating a new resource, you should define the resource name, subscription, resource group, resource kind (CognitiveServices), location and pricing tier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5U1EEr9E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rq5qmtl28su7yd89zdvy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5U1EEr9E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rq5qmtl28su7yd89zdvy.jpg" alt="New resource"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a new resource group you just define a name and location.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4DvqRWpI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/k9qpd43nq5s7fp2bg08o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4DvqRWpI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/k9qpd43nq5s7fp2bg08o.jpg" alt="New resource group"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you fill this in you can select the type and domain for your new project.&lt;/p&gt;

&lt;p&gt;Image classification is used when you want to classify the whole image, i.e. determine which object is represented the most in it. Object detection is used to find the location of content within the image, and that is what we need for this project.&lt;/p&gt;

&lt;p&gt;For the domain we'll use the General domain, which Microsoft describes as "Optimised for a broad range of object detection tasks. If none of the other domains are appropriate, or you are unsure of which domain to choose, select the Generic domain". Feel free to read more about the domains &lt;a href="https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/select-domain"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--29iRzKRr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s0cgg70rpyb8rqzba63i.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--29iRzKRr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/s0cgg70rpyb8rqzba63i.jpg" alt="Create new project"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the project is created you'll see the project page, which is currently empty and gives you the option to upload images to train your model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FJvn3C2Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3x2lykkcj8yxq2fwsysd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FJvn3C2Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3x2lykkcj8yxq2fwsysd.jpg" alt="Project page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I took several images of the LEGO Chewbacca minifigure from different angles. I also used a little trick: as you can see in the images below, I placed the minifigure on top of a LEGO catalogue and switched pages for each image. That way the model will be better able to distinguish the minifigure from the background in the image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iicM-5J3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6j4v1jcrmfdllzl9kucg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iicM-5J3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6j4v1jcrmfdllzl9kucg.jpg" alt="Image upload"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the images are uploaded you can manually tag them. The UX is really nice here, and you'll notice that the tagging tool suggests a frame around objects it detects. For the first tag you'll need to enter a name; afterwards you can select it for other images. Adjusting the frame to fit the minifigure is super easy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pEXafkDs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5wlfa8r8yow2l4bg2hj9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pEXafkDs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5wlfa8r8yow2l4bg2hj9.jpg" alt="Tagging images"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the images prepared, we can now start the training, in this case quick training. Advanced training should be used for advanced and challenging datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JeMPRBqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hwxuxiu0aq3ucalebx4x.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JeMPRBqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/hwxuxiu0aq3ucalebx4x.jpg" alt="Training"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When training is finished you'll see information about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precision: if a tag is predicted by your model, how likely is that prediction to be right&lt;/li&gt;
&lt;li&gt;Recall: out of the tags that should have been predicted, what percentage did your model correctly find&lt;/li&gt;
&lt;li&gt;mAP: mean average precision, the overall object detector performance across all the tags&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OZN8o6SA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4s1v39upyehtl75u14ip.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OZN8o6SA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4s1v39upyehtl75u14ip.jpg" alt="Quick test"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A quick test I made with an image of the whole board showed that some objects were identified as the target minifigure, but with really low confidence (20%). My dataset contained a really small number of images, so I was not surprised; I expected to see something like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EWp-3fuN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/uq5gbjb9d7bvcz69mnlw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EWp-3fuN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/uq5gbjb9d7bvcz69mnlw.jpg" alt="Quick test"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Predictions tab lets you see the images that were used for prediction; here you can open the images and tag them correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m5PSnssD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vk6zkrttv590j8bdwpt0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m5PSnssD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vk6zkrttv590j8bdwpt0.jpg" alt="Predictions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I fixed predictions several times and added many more images to my dataset. Once everything is in place this is quick and simple to do. After a few iterations of my model I tried with a new image (one the Custom Vision model hadn't seen yet) and it successfully found Chewie with 40.8% confidence!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VoEBv0Rw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tznn6fvh4dxsfermr4hs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VoEBv0Rw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tznn6fvh4dxsfermr4hs.jpg" alt="Quick test"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The accuracy of the model can be improved further by adding more images to the dataset, by using advanced training, etc. In my case I was happy with the result, since it achieved what I aimed for: a "Where's Waldo"-like game.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>customvision</category>
      <category>computervision</category>
    </item>
  </channel>
</rss>
