<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: SiaSearch</title>
    <description>The latest articles on Forem by SiaSearch (@siasearch).</description>
    <link>https://forem.com/siasearch</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F608743%2F09f29483-e4ac-4836-be17-ba54e7dca8d6.jpeg</url>
      <title>Forem: SiaSearch</title>
      <link>https://forem.com/siasearch</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/siasearch"/>
    <language>en</language>
    <item>
      <title>Computer Vision Startups Enhancing Security &amp; Surveillance</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 13 Sep 2021 07:55:47 +0000</pubDate>
      <link>https://forem.com/siasearch/computer-vision-startups-enhancing-security-surveillance-m7h</link>
      <guid>https://forem.com/siasearch/computer-vision-startups-enhancing-security-surveillance-m7h</guid>
      <description>&lt;p&gt;From common CCTV cameras to autonomous security drones, visual monitoring devices are everywhere. These security systems continuously produce high volumes of footage, much of which sits unused once it's been captured. It's nearly impossible for humans to monitor multiple live security feeds and take proactive action.&lt;/p&gt;

&lt;p&gt;This is where AI comes in - computer vision technology leverages the abundance of visual data to identify what data is useful, what can be ignored, and what demands immediate attention.&lt;/p&gt;

&lt;p&gt;In this article, we at &lt;a href="https://www.siasearch.io/" rel="noopener noreferrer"&gt;SiaSearch&lt;/a&gt; have put together the most promising AI applications for security, as well as the most innovative computer vision startups within each.&lt;/p&gt;

&lt;h2&gt;Video Surveillance&lt;/h2&gt;

&lt;p&gt;Unlike human personnel, computer vision-based security systems can watch security footage tirelessly, monitoring everyone in view and identifying patterns and suspicious activity. In recent years, a large number of startups have stepped up to provide AI-powered video surveillance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://umbocv.ai/" rel="noopener noreferrer"&gt;Umbo&lt;/a&gt; is a Taiwan and San Francisco-based startup that provides cloud-based video security systems for businesses. Umbo's smart security cameras, in combination with proprietary computer vision-based software, autonomously detect and identify suspicious events, such as intrusion, tailgating, and wall-scaling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.deepsentinel.com/" rel="noopener noreferrer"&gt;Deep Sentinel&lt;/a&gt; has built a similar solution for home security. Whenever a threat is detected, Deep Sentinel streams live video to real human security personnel to remotely intervene via microphone.&lt;/p&gt;

&lt;p&gt;In response to the uptick in mass shooting events, &lt;a href="https://actuate.ai/" rel="noopener noreferrer"&gt;Actuate&lt;/a&gt; (formerly known as Aegis AI) integrates with existing camera feeds to automatically identify anyone who's brandishing a firearm. Once the model identifies a weapon, it alerts security teams and law enforcement.&lt;/p&gt;

&lt;p&gt;"We can detect a weapon before a trigger is pulled," said Ben Ziomek, Actuate co-founder and CTO. "In some instances we can enable a security response before any bullets are fired."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34y1cvyy6978ug3vesfv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34y1cvyy6978ug3vesfv.png" alt="Actuate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Access Control&lt;/h2&gt;

&lt;p&gt;A number of computer vision startups offer innovative solutions to restrict or allow access to certain areas or facilities.&lt;/p&gt;

&lt;p&gt;San Francisco-based &lt;a href="https://www.swiftlane.com/" rel="noopener noreferrer"&gt;Swiftlane&lt;/a&gt;, for instance, uses facial recognition to allow or deny access to offices, apartment complexes, and other physical spaces. The solution employs deep learning and computer vision techniques to provide single sign-on using a mobile phone or video intercom. After signing up, users only need to look at the face-reading terminal to unlock the doors to areas they are authorized to enter.&lt;/p&gt;

&lt;p&gt;Similarly, &lt;a href="https://www.paravision.ai/" rel="noopener noreferrer"&gt;Paravision's&lt;/a&gt; platform is designed to be used by global security device manufacturers, solution providers, systems integrators, and financial services firms in situations where an error could have profound negative consequences.&lt;/p&gt;

&lt;h2&gt;Checkpoint Security&lt;/h2&gt;

&lt;p&gt;Computer vision technology has also made great strides in detecting and identifying threats at security screening checkpoints.&lt;/p&gt;

&lt;p&gt;For example, Silicon Valley's &lt;a href="https://www.synapsetechnology.com/" rel="noopener noreferrer"&gt;Synapse Technology&lt;/a&gt; automates security screening, enabling checkpoints worldwide to catch more threats while reducing operating costs and increasing throughput. Their platform, Syntech ONE®, integrates with new and existing checkpoint machines at airports, courthouses, federal buildings, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.evolvtechnology.com/" rel="noopener noreferrer"&gt;Evolv Technology's&lt;/a&gt; Edge system uses a combination of camera, facial recognition and millimetre-wave technologies to scan people walking through portable security gates at airports. Machine learning models automatically check for threats, including explosives and firearms, while ignoring non-dangerous items.&lt;br&gt;
Inspecting vehicles can be a challenge for checkpoint security teams. &lt;/p&gt;

&lt;p&gt;Difficult to access, the undercarriage is an ideal spot for adversaries to hide illicit materials such as explosives, weapons, and drugs. &lt;a href="https://www.uveye.com/" rel="noopener noreferrer"&gt;UVeye&lt;/a&gt; is an Israeli startup that provides an automated under vehicle inspection scanner that captures high-resolution images as it scans passing vehicles. Using advanced deep learning algorithms, the scanner is able to detect and flag anomalies in seconds.&lt;/p&gt;

&lt;h2&gt;Theft Detection&lt;/h2&gt;

&lt;p&gt;Computer vision can also help retailers react to theft and threats as they happen. With roots in MIT's artificial intelligence labs, &lt;a href="https://www.stoplift.com/" rel="noopener noreferrer"&gt;StopLift&lt;/a&gt; analyzes security video and POS data to distinguish between legitimate and fraudulent behavior at checkout. By applying advanced computer vision algorithms to existing camera feeds, StopLift's ScanItAll system tracks items that pass through the checkout lane, associates them with POS data, and flags any suspicious activity.&lt;/p&gt;

&lt;p&gt;Developed by Japanese telecommunications company NTT East and startup Earth Eyes, AI Guardman is an automated security camera designed to catch shoplifters in the act. Based on open-source technology developed at Carnegie Mellon University, AI Guardman scans live video streams from cameras in convenience stores and supermarkets, tracking every customer inside. When a threat is detected, the system sends an alert to shop staff in real time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rsp0hhvu20g2sqnu2tn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rsp0hhvu20g2sqnu2tn.png" alt="AI Guardman"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Public Health &amp;amp; Safety&lt;/h2&gt;

&lt;p&gt;Advances in computer vision technology can also be used to address common public safety concerns, from cutting crime rates to slowing down the spread of infectious diseases.&lt;/p&gt;

&lt;p&gt;Atlanta-based startup &lt;a href="https://www.flocksafety.com/" rel="noopener noreferrer"&gt;Flock Safety&lt;/a&gt; aims to reduce crime in America by 25% by using computer vision to improve public safety in neighborhoods. Their automated license plate reader (ALPR) software, FlockOS, combines character recognition with computer vision and machine learning to provide real-time insights to crime prevention authorities in over 1200 US cities. It uses patented Vehicle Fingerprint™ technology that identifies a vehicle even if it's modified or if its license plate is missing or covered.&lt;/p&gt;

&lt;p&gt;Furthermore, in the wake of the COVID-19 pandemic, many public facilities have equipped existing security cameras with AI-based software that can track compliance with health guidelines.&lt;/p&gt;

&lt;p&gt;Reducing the risk of infection in stores is a major priority for brick-and-mortar retailers in particular. Retail analytics platform &lt;a href="https://auravision.ai/covid-solutions/" rel="noopener noreferrer"&gt;Aura Vision&lt;/a&gt; provides a suite of COVID-focused solutions to promote in-store safety, from features that monitor face mask compliance to heat maps that visualize high-traffic areas since the last cleaning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftobb2i8qkaatfhwiajpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftobb2i8qkaatfhwiajpf.png" alt="Aura Vision"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Homeland Security&lt;/h2&gt;

&lt;p&gt;Not only has computer vision proven its practical value for physical security on private property and in public spaces, it has also demonstrated value on a national scale, with applications ranging from environmental monitoring to military systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shield.ai/" rel="noopener noreferrer"&gt;Shield AI&lt;/a&gt; is a company that works with federal, state, and local departments and agencies to deliver next generation surveillance systems. Their first product, Nova, is a Hivemind-powered drone that searches buildings while simultaneously streaming video and generating maps.&lt;/p&gt;

&lt;p&gt;Finally, &lt;a href="https://orbitalinsight.com/" rel="noopener noreferrer"&gt;Orbital Insight&lt;/a&gt; specializes in applying computer vision to geo-analytics. The company uses satellites, drones, balloons, UAV footage, and geolocation data from mobile phones to analyze human activity, providing businesses and governments with key behavioral insights and helping them address security concerns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw65ywm6sq3eplatd7v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw65ywm6sq3eplatd7v6.png" alt="Infographic: Computer Vision Startups in Security"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Visual Data Management for the Security Industry&lt;/h2&gt;

&lt;p&gt;In the security sector, the challenge is not data acquisition, but effective data management. The majority of security camera recordings aren't useful for the relevant business or computer vision functions, causing extremely high redundancy and low information density.&lt;/p&gt;

&lt;p&gt;SiaSearch makes it easy to select subsets of high-quality training data, helping you build better ML models at lower cost.&lt;/p&gt;

&lt;p&gt;With SiaSearch's lightweight API, users can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically index, structure, and evaluate raw data captured by security cameras and sensors&lt;/li&gt;
&lt;li&gt;Visualize data and analyze model performance&lt;/li&gt;
&lt;li&gt;Search and access all security footage across all events and attributes&lt;/li&gt;
&lt;li&gt;Identify rare edge cases and curate training datasets&lt;/li&gt;
&lt;/ul&gt;
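
&lt;p&gt;To make the list above concrete, here is a minimal, self-contained Python sketch of the metadata-first pattern such an API enables: index frames by extracted attributes, then query for the rare events worth sending to annotation. The &lt;code&gt;Frame&lt;/code&gt; record and &lt;code&gt;query&lt;/code&gt; helper are invented for illustration and are not SiaSearch's actual interface.&lt;/p&gt;

```python
# Toy illustration of metadata-driven footage curation (not SiaSearch's real API):
# index frames by extracted attributes, then query for rare edge cases.
from dataclasses import dataclass, field

@dataclass
class Frame:
    camera_id: str
    timestamp: float
    attributes: dict = field(default_factory=dict)  # e.g. {"event": "tailgating"}

def query(frames, **conditions):
    """Return frames whose attributes match every given condition."""
    return [f for f in frames
            if all(f.attributes.get(k) == v for k, v in conditions.items())]

index = [
    Frame("cam-01", 0.0, {"event": "normal", "person_count": 0}),
    Frame("cam-01", 1.5, {"event": "tailgating", "person_count": 2}),
    Frame("cam-02", 3.0, {"event": "intrusion", "person_count": 1}),
]

# Pull out rare edge cases worth routing to annotation.
edge_cases = query(index, event="tailgating")
print(len(edge_cases))
```

&lt;p&gt;In practice the same pattern scales by pushing the attribute filters down into a database or metadata index rather than scanning a Python list.&lt;/p&gt;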

&lt;p&gt;Interested in learning more? &lt;a href="https://www.siasearch.io/request-a-demo" rel="noopener noreferrer"&gt;Reach out to SiaSearch&lt;/a&gt; for a free proof of concept.&lt;/p&gt;

</description>
      <category>security</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>10 Computer Vision Startups Disrupting the Retail Industry</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 09 Aug 2021 01:02:01 +0000</pubDate>
      <link>https://forem.com/siasearch/10-computer-vision-startups-disrupting-the-retail-industry-2lck</link>
      <guid>https://forem.com/siasearch/10-computer-vision-startups-disrupting-the-retail-industry-2lck</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published at: &lt;a href="https://www.siasearch.io/blog/computer-vision-startups-retail-industry"&gt;https://www.siasearch.io/blog/computer-vision-startups-retail-industry&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Online retail has been growing steadily for years with no sign of stopping. Especially amid movement restrictions induced by the COVID-19 pandemic, &lt;a href="https://news.un.org/en/story/2021/05/1091182"&gt;research shows that global online sales jumped to $26.7 trillion&lt;/a&gt; in 2020. With the rise of ecommerce, one thing is abundantly clear: brick-and-mortar retailers need to innovate if they want to stay competitive. &lt;/p&gt;

&lt;p&gt;The use of AI technologies like computer vision is rapidly increasing in the retail industry. &lt;a href="https://www.siasearch.io/retail"&gt;AI-enhanced retail&lt;/a&gt; holds the promise to eliminate operational inefficiencies and provide shoppers with frictionless in-store experiences. In this article, we’ve put together a list of the most innovative computer vision startups in the retail space.&lt;/p&gt;

&lt;h2&gt;The Future of AI in Retail&lt;/h2&gt;

&lt;p&gt;In recent years, an increasing number of retail companies have started to quietly transform physical stores. Walmart, for instance, began &lt;a href="https://corporate.walmart.com/newsroom/2019/04/25/walmarts-new-intelligent-retail-lab-shows-a-glimpse-into-the-future-of-retail-irl"&gt;installing an array of sensors, cameras and processors&lt;/a&gt; to monitor inventory levels, perform automated product quality checks, and more.&lt;/p&gt;

&lt;p&gt;Many industry giants have followed suit, employing similar approaches to drive in-store efficiency, improve logistics, prevent theft, and more. Research shows that the retail AI market is growing fast—according to a recent RIS News report, only 3% of retailers were utilizing computer vision technology at the end of 2020, yet an additional 40% had plans to deploy new solutions within the next year.&lt;/p&gt;

&lt;p&gt;Computer vision solutions present retailers with ample opportunity to boost operations as well as enhance the shopping experience for customers. The most promising applications of computer vision include inventory management, loss prevention, automated checkout, and behavioral analytics. From employee-free shops to in-store surveillance, here are several computer vision startups disrupting the retail industry.&lt;/p&gt;

&lt;h2&gt;Computer vision startups in retail&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Inventory management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://goradar.com/"&gt;RADAR&lt;/a&gt;: RADAR is a fully integrated hardware and software solution to automate inventory management using RFID and computer vision techniques. Their mission is to streamline inventory management via automated inventory counts, improved in-store replenishment and instantaneous customer stock checks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://traxretail.com/resources/trax-computer-vision-platform/"&gt;Trax&lt;/a&gt;: Singapore-based startup Trax provides an in-store solution that uses a combination of computer vision models and hardware to keep track of their inventory in real time. This solution ensures out-of-stock items are repurchased efficiently, while expired items are pulled off from the shelves. The company holds 23 patents on its technology and can analyze images from phones, in-store cameras, and grocery store robots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Automated checkout&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://standard.ai/"&gt;Standard.ai&lt;/a&gt;: Previously known as Standard Cognition, Standard.ai’s automated checkout solution is made to fit with retailers’ existing stores and technology. They boast an easy to install camera-first solution that doesn’t employ the use of turnstiles or gates. Standard doesn't use any facial recognition or biometrics, and all deployments are on-premise to ensure maximum performance and security for retailers and shoppers alike.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://trigo.tech/"&gt;Trigo&lt;/a&gt;: Using proprietary algorithms and affordable off-the-shelf sensor kits, Tel Aviv-based Trigo allows retailers to analyze anonymized shoppers’ movements and product choices in real time. The system automatically compiles selected items into a virtual shopping list, enabling shoppers to leave without going through a traditional checkout line.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.accelrobotics.com/"&gt;Accel Robotics&lt;/a&gt;: Accel Robotics provides checkout-free shopping experiences across existing and new store formats with its patented camera-based AI system. They recently launched Valet Market, a completely automated convenience storefront without cashiers or checkout kiosks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Loss prevention&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.stoplift.com/"&gt;StopLift&lt;/a&gt;: With roots in MIT’s artificial intelligence labs, StopLift analyzes security video and POS data to distinguish between legitimate and fraudulent behavior at checkout. By applying advanced computer vision algorithms to existing camera feeds, StopLift’s ScanItAll system is capable of tracking items that pass through the checkout lane, associate them with POS, and flag suspicious activity as it happens.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://vaak.co/vaakeye-store/"&gt;Vaak&lt;/a&gt;: Japanese startup Vaak provides a cloud-based computer vision system that monitors retail security camera footage for suspicious behavior. Already deployed in over 50 stores within Japan, VaakEye analyzes movement at more than 100 points across the body, automatically weighing behavior for suspiciousness. Once a customer reaches a certain threshold, the system sends an alert, along relevant video clips, to the appropriate staff member.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Behavioral analytics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="http://deepnorth.com/"&gt;Deep North&lt;/a&gt;: Deep North provides an analytics platform that builds real-time video intelligence for retailers based on video data from CCTV and other cameras that those retailers already use. Deep North’s proprietary technology captures parameters as daily entries and exits, customer occupancy, queue times, conversions and more.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://advertima.com/"&gt;Advertima&lt;/a&gt;: Based on information captured by visual sensors, Advertima’s platform provides retailers with a real-time view of what’s going on in physical stores as shoppers move through the space. The platform claims to only process minimal anonymized data, without storing any recordings or personal information for future use.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cosmose.co/"&gt;Cosmose&lt;/a&gt;: Cosmose provides a data analytics platform that analyzes foot traffic in physical stores to help predict customer behavior. They offer 3 main products: Cosmose Analytics, which tracks customers’ movements inside physical stores; Cosmose AI, a data analytics and prediction platform to help retailers create marketing campaigns and drive sales; and Cosmose Media, for targeting online ads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real-time visibility is essential to operating brick-and-mortar retail. That’s why more and more retailers are employing computer vision in an effort to increase operational efficiency, better the customer experience, and gain an edge over competitors. &lt;/p&gt;

&lt;p&gt;Early adopters are already seeing great results—according to estimates from RBC Capital Markets analysts, cashierless Amazon Go stores bring in about 50% more revenue on average than typical convenience stores.&lt;/p&gt;

&lt;h2&gt;Visual data management for retail&lt;/h2&gt;

&lt;p&gt;Companies that build computer vision solutions for retail are constantly building and growing their ML training data sets. Today, most companies have to rely on internal tools or manual solutions like spreadsheets to do this. &lt;/p&gt;

&lt;p&gt;SiaSearch helps retail companies to simplify and speed up this process with a lightweight API that simplifies data exploration, visualization and selection. As a result, companies can reduce annotation costs and increase model performance.&lt;/p&gt;

&lt;p&gt;Interested in learning more? &lt;a href="https://www.siasearch.io/request-a-demo"&gt;Reach out to the SiaSearch team&lt;/a&gt; for a free proof of concept.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>startup</category>
      <category>computerscience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>15 Best Open-Source Autonomous Driving Datasets</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Wed, 09 Jun 2021 07:05:45 +0000</pubDate>
      <link>https://forem.com/siasearch/15-best-open-source-autonomous-driving-datasets-2e81</link>
      <guid>https://forem.com/siasearch/15-best-open-source-autonomous-driving-datasets-2e81</guid>
      <description>&lt;p&gt;In recent years, more and more companies and research institutions have made their &lt;a href="https://www.siasearch.io/open-data"&gt;autonomous driving datasets&lt;/a&gt; open to the public. However, the best datasets are not always easy to find, and scouring the internet for them takes time.&lt;/p&gt;

&lt;p&gt;To help, we at SiaSearch have put together a list of the top 15 open datasets for autonomous driving. The resources below collectively contain millions of data samples, many of which are already annotated. We hope this list provides you with a solid starting point for learning more about the field, or for starting your own autonomous driving project.&lt;/p&gt;

&lt;h2&gt;Top Open Datasets for Autonomous Driving Projects&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A2D2 Dataset&lt;/strong&gt;&lt;br&gt;
The Audi Autonomous Driving Dataset (A2D2) features over 41,000 frames labeled with 38 features. Around 2.3 TB in total, A2D2 is split by annotation type (i.e. semantic segmentation, 3D bounding box).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ApolloScape Dataset&lt;/strong&gt;&lt;br&gt;
ApolloScape is an evolving research project that aims to foster innovation across all aspects of autonomous driving, from perception to navigation and control. Via their website, users can explore a variety of simulation tools, over 100K street-view frames, 80K lidar point clouds, and 1,000 km of trajectories for urban traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Argoverse Dataset&lt;/strong&gt;&lt;br&gt;
The Argoverse dataset includes 3D tracking annotations for 113 scenes and over 324,000 unique vehicle trajectories for motion forecasting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Berkeley DeepDrive Dataset&lt;/strong&gt;&lt;br&gt;
Also known as BDD 100K, the DeepDrive dataset gives users access to 100,000 annotated videos and 10 tasks to evaluate image recognition algorithms for autonomous driving. The dataset represents more than 1000 hours of driving experience with more than 100 million frames, as well as information on geographic, environmental, and weather diversity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CityScapes Dataset&lt;/strong&gt;&lt;br&gt;
CityScapes is a large-scale dataset focused on the semantic understanding of urban street scenes in 50 German cities. It features semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories. The entire dataset includes 5,000 images with fine annotations and an additional 20,000 images with coarse annotations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Comma2k19 Dataset&lt;/strong&gt;&lt;br&gt;
This dataset includes 33 hours of commute time recorded on Highway 280 in California. Each 1-minute scene was captured on a 20 km section of highway between San Jose and San Francisco. The data was collected using comma EONs, which feature a road-facing camera, phone GPS, thermometers, and a 9-axis IMU.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Google-Landmarks Dataset&lt;/strong&gt;&lt;br&gt;
Published by Google in 2018, the Landmarks dataset is divided into two sets of images to evaluate recognition and retrieval of human-made and natural landmarks. The original dataset contains over 2 million images depicting 30 thousand unique landmarks from across the world. In 2019, Google published Landmarks-v2, an even larger dataset with 5 million images and 200k landmarks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;KITTI Vision Benchmark Suite&lt;/strong&gt;&lt;br&gt;
First released in 2012 by Geiger et al., the &lt;a href="https://www.siasearch.io/kitti-dataset/"&gt;KITTI dataset&lt;/a&gt; was created to advance autonomous driving research with a novel set of real-world computer vision benchmarks. One of the first autonomous driving datasets, KITTI boasts over 4,000 academic citations and counting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 5 Open Data&lt;/strong&gt;&lt;br&gt;
Published by popular rideshare app Lyft, the Level5 dataset is another great source of autonomous driving data. It includes over 55,000 human-labeled 3D annotated frames captured by 7 cameras and up to 3 LiDAR sensors, along with a surface map and an underlying HD spatial semantic map that can be used to contextualize the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;nuScenes Dataset&lt;/strong&gt;&lt;br&gt;
Developed by Motional, the &lt;a href="https://www.siasearch.io/nuscenes-dataset/"&gt;nuScenes dataset&lt;/a&gt; is one of the largest open-source datasets for autonomous driving. Recorded in Boston and Singapore using a full sensor suite (32-beam LiDAR, six cameras with 360° coverage, and radars), the dataset contains over 1.44 million camera images capturing a diverse range of traffic situations, driving maneuvers, and unexpected behaviors.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
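
&lt;p&gt;Many of these datasets are straightforward to work with programmatically. KITTI's object labels, for instance, are plain space-separated text files with 15 fields per object. The sketch below parses one such line; the sample values are invented, and the field meanings follow the benchmark's published label format.&lt;/p&gt;

```python
# Minimal parser for KITTI object-label lines (the 15-field format used by the
# KITTI object detection benchmark). The sample line below is made up.
def parse_kitti_label(line: str) -> dict:
    fields = line.split()
    return {
        "type": fields[0],                               # e.g. Car, Pedestrian, Cyclist
        "truncated": float(fields[1]),                   # 0.0 (fully visible) to 1.0
        "occluded": int(fields[2]),                      # 0-3 occlusion state
        "alpha": float(fields[3]),                       # observation angle (rad)
        "bbox": [float(v) for v in fields[4:8]],         # 2D box: left, top, right, bottom (px)
        "dimensions": [float(v) for v in fields[8:11]],  # 3D box: height, width, length (m)
        "location": [float(v) for v in fields[11:14]],   # 3D position in camera coords (m)
        "rotation_y": float(fields[14]),                 # yaw around camera Y axis (rad)
    }

sample = "Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57"
label = parse_kitti_label(sample)
print(label["type"], label["bbox"])
```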

&lt;p&gt;Looking for more datasets? Read the entire blogpost at &lt;a href="https://www.siasearch.io/blog/best-open-source-autonomous-driving-datasets"&gt;https://www.siasearch.io/blog/best-open-source-autonomous-driving-datasets&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>database</category>
      <category>autonomousdriving</category>
    </item>
    <item>
      <title>SiaSearch partners with Virtual Mechanics Corporation (VMC) to accelerate ADAS development in Japan</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 31 May 2021 02:17:47 +0000</pubDate>
      <link>https://forem.com/siasearch/siasearch-partners-with-virtual-mechanics-corporation-vmc-to-accelerate-adas-development-in-japan-2lbp</link>
      <guid>https://forem.com/siasearch/siasearch-partners-with-virtual-mechanics-corporation-vmc-to-accelerate-adas-development-in-japan-2lbp</guid>
      <description>&lt;p&gt;&lt;a href="https://www.siasearch.io/"&gt;SiaSearch&lt;/a&gt;, a Berlin-based AI startup, has announced their partnership with &lt;a href="https://vmc.jp/"&gt;Virtual Mechanics Corporation (VMC)&lt;/a&gt; to accelerate ADAS development in Japan. Virtual Mechanics Corporation has over 20 years of experience and established relationships within the Japanese automotive industry. As the sole distributor of the SiaSearch product in Japan, Virtual Mechanics Corporation will promote the adoption of SiaSearch in the Japanese market.&lt;/p&gt;

&lt;p&gt;Via a web-based GUI or programmatic API, SiaSearch makes it 10x easier and faster for developers to explore, understand, and share large amounts of visual data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structure - Automatically index, structure, and evaluate raw sensor data at petabyte scale, based on semantic attributes and keywords from a catalog of over 50 events and attributes&lt;/li&gt;
&lt;li&gt;Analyze - Quickly visualize your data, making full use of the extracted attributes and run more targeted analyses to improve your models.&lt;/li&gt;
&lt;li&gt;Search - Use custom attributes to efficiently find and select the data from your entire data lake.&lt;/li&gt;
&lt;li&gt;Collaborate - Save, edit, version, comment and share frames, sequences or objects with colleagues or 3rd parties.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“At Virtual Mechanics Corporation, we are laser focused on driving innovation within the mobility industry,” said Eiji Takita, CEO of Virtual Mechanics Corporation. “In working with SiaSearch, we have secured a partner with deep technical capabilities to help accelerate autonomous driving innovation in Japan.”&lt;/p&gt;

&lt;p&gt;“Our distribution partnership with Virtual Mechanics Corporation opens up new and exciting opportunities to accelerate autonomous driving innovation in the Japanese market,” said Clemens Viernickel, CEO of SiaSearch. “We are beyond excited to be working with some of the globally leading automotive companies in Japan through this arrangement. The partnership represents an important milestone for us, and we will continue to pave the path of adoption of SiaSearch abroad.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About SiaSearch&lt;/strong&gt;&lt;br&gt;
SiaSearch is a Berlin-based AI company on a mission to power automated mobility solutions of tomorrow with a truly scalable data infrastructure. The category-leading sensor management platform for automated driving, SiaSearch is currently in use by leading automotive OEMs, Tier 1 suppliers, and global technology companies.&lt;br&gt;
For more information, visit &lt;a href="https://www.siasearch.io/"&gt;https://www.siasearch.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About Virtual Mechanics Corporation (VMC)&lt;/strong&gt;&lt;br&gt;
Virtual Mechanics Corporation develops vehicle dynamics simulation software for simulating driving in virtual environments, along with its application technology. Working closely with customers, developers, and partner companies, Virtual Mechanics Corporation plays a pivotal role in innovation, allowing customers to expedite their research and development.&lt;br&gt;
For more details, please visit &lt;a href="https://vmc.jp/siasearch/"&gt;https://vmc.jp/siasearch/&lt;/a&gt; (information in Japanese)&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>KITTI on SiaSearch - Our first public product unveil for researchers</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Fri, 14 May 2021 02:14:17 +0000</pubDate>
      <link>https://forem.com/siasearch/kitti-on-siasearch-our-first-public-product-unveil-for-researchers-2f3i</link>
      <guid>https://forem.com/siasearch/kitti-on-siasearch-our-first-public-product-unveil-for-researchers-2f3i</guid>
      <description>&lt;p&gt;Today, we are glad to announce the release of a public version of SiaSearch based on the popular &lt;a href="https://www.siasearch.io/kitti-dataset/"&gt;KITTI dataset&lt;/a&gt;. We would like to let researchers use SiaSearch’s power in order to immensely simplify data searchability. We hope that the use of KITTI would allow them to seamlessly extract interesting insights from one of the most popular datasets in AV/ADAS. Deployed on KITTI, we want to make a subset of the features of SiaSearch accessible to researchers all around the world.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D0WXoMbB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f7m9vawoqh2pgptd8c9p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D0WXoMbB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f7m9vawoqh2pgptd8c9p.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SiaSearch allows users to efficiently search through recorded driving data. It is now made available to researchers on the KITTI dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  About SiaSearch
&lt;/h2&gt;

&lt;p&gt;Before we dive into the demo walkthrough, let’s quickly recap the need for and value of &lt;a href="https://www.siasearch.io/"&gt;SiaSearch&lt;/a&gt;. As we explained in detail in our previous post, data volumes in the AV/ADAS domain are exploding at an increasing pace, with a single vehicle producing up to 10 TB per hour today. The ability to search, analyze and prioritize that data efficiently therefore becomes fundamental to its utilization. SiaSearch allows users to process large quantities of multimodal automotive data and extract queryable metadata. Using this metadata, developers can easily find complex situations encountered by the vehicle, ranging from lane changes to overtaking to unsafe braking. With fast search, we reduce the time wasted on repetitive data tasks by instantly connecting engineers with relevant data. Solving data access also enables smarter data retention decisions, which significantly optimizes infrastructure usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  SiaSearch Features
&lt;/h2&gt;

&lt;p&gt;To let you experience SiaSearch’s abilities for yourself, let’s quickly walk through the most important features and functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Querying&lt;/strong&gt; — SiaSearch offers two methods for querying the data you want: the visual (default) and the code interface. The code query works like any API call statement, whereas the visual query offers a visually rich interface that makes selecting extractors and searching extremely intuitive. Users can pick one or more extractors from a variety of categories and then apply the search to obtain results. The code and visual query interfaces are directly linked and updated accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity Search&lt;/strong&gt; — When a user comes across an interesting data segment, they might want to find similar clips. Since the data is high-dimensional and multimodal, doing this manually would be extremely hard. We therefore integrated an automated similarity search, based on unsupervised clustering of the previously extracted metadata, to fetch the most relevant and similar data segments. It is important to note that similarity search is NOT a query: to retrieve a reproducible set of clips, send a query instead. Similarity search is an explorative tool that takes the metadata clusters into account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay/Adjust Clip&lt;/strong&gt; — A user might also want to view the data right before or after a certain situation (e.g. a left turn) in order to contextualize the vehicle behavior. This is why we allow the user to change the length of the clip according to their needs. Clip re-adjustment can be achieved by clicking on the adjust clip button in the replay viewer. This opens up the slider so that the user can specify the new start and end points and apply the changes to get an updated clip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export&lt;/strong&gt; — To empower the user with all our metadata insights, we allow exporting selected data as parquet files. With this metadata, the user can run further analysis or use it directly in any application. After collecting several segments of choice, the user can go to the export page, select the parts of the metadata to be exported, and obtain the parquet file.&lt;/li&gt;
&lt;/ul&gt;
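&lt;p&gt;The code-interface queries mentioned above are, in essence, attribute filters joined by AND. As a minimal sketch (the &lt;code&gt;build_query&lt;/code&gt; helper and the attribute names are illustrative, not part of the SiaSearch API), composing such a query string could look like this:&lt;/p&gt;

```python
# Minimal sketch of composing a query string for the code interface.
# The DSL syntax (attribute = 'value', joined by AND) mirrors the query
# examples in this post; the helper itself is our own illustration.

def build_query(**conditions):
    """Join attribute filters into a single AND-separated query string.

    Strings and booleans become equality filters; numeric values are
    turned into lower bounds here purely for illustration.
    """
    clauses = []
    for attr, value in conditions.items():
        if isinstance(value, (bool, str)):
            clauses.append(f"{attr} = '{value}'")
        else:
            clauses.append(f"{attr} >= {value}")
    return " AND ".join(clauses)

query = build_query(is_lane_change=True, num_cars=3)
print(query)  # is_lane_change = 'True' AND num_cars >= 3
```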

&lt;h2&gt;
  
  
  SiaSearch Workflows
&lt;/h2&gt;

&lt;p&gt;SiaSearch can unlock many different workflows in both academic and commercial settings ranging from scenario extraction for AV stack simulation testing to data sharing/collaborative analysis. However, one of the most important ones is filtering data for annotation. This is shown in the video below to enable contextual understanding of the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it today!
&lt;/h2&gt;

&lt;p&gt;If you are interested in using &lt;a href="https://www.siasearch.io/kitti-dataset/"&gt;KITTI on SiaSearch&lt;/a&gt; for your research, &lt;a href="http://public.sia-search.com/"&gt;click here to get started&lt;/a&gt;. Our team will set up a dedicated account for you so that you can see SiaSearch’s abilities for yourself.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>computerscience</category>
      <category>datascience</category>
    </item>
    <item>
      <title>The Best Data Curation Tools for Computer Vision in 2021</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 03 May 2021 01:12:31 +0000</pubDate>
      <link>https://forem.com/siasearch/the-best-data-curation-tools-for-computer-vision-in-2021-21bl</link>
      <guid>https://forem.com/siasearch/the-best-data-curation-tools-for-computer-vision-in-2021-21bl</guid>
      <description>&lt;p&gt;The term ‘curation’ is commonly associated with museums or libraries, not data science. However, much like the work that’s done on rare paintings or books, data curation tools make the most important data easily accessible to engineers as they build complex machine learning models.&lt;/p&gt;

&lt;p&gt;Without curation, data is difficult to find, analyze, and interpret. Data curation tools provide meaningful insights and enduring access to all your data in one place. In this article, we’ll dive into the importance of data curation for computer vision specifically, as well as review the top data curation tools on the market today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is data curation?
&lt;/h2&gt;

&lt;p&gt;Data curation is the act of organizing, enhancing, and preserving data for future use. In machine learning, data curation describes the management of data throughout its lifecycle: from its collection and initial storage to the time it is archived for future re-use.&lt;/p&gt;

&lt;p&gt;This process is all the more important for computer vision engineers, who deal with massive amounts of visual data on a daily basis. Instead of using manual methods such as writing ETL jobs to extract insights, data curation tools provide a streamlined way to access the right data whenever you need to.&lt;/p&gt;

&lt;h2&gt;
  
  
  The importance of data curation for machine learning
&lt;/h2&gt;

&lt;p&gt;Under the hood, data curation tools directly influence computer vision model performance. Using data curation tools, engineers can get a better understanding of the data they’ve collected, identify the most important subsets and edge cases, and curate custom training datasets to feed back into their models.&lt;/p&gt;

&lt;p&gt;The best data curation tools enable you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visualize large scale data&lt;/strong&gt;: Make it easy to obtain insights on key metrics, as well as the general distribution and diversity of your datasets regardless of sensor type and format. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable data discovery and retrieval&lt;/strong&gt;: Quickly search, filter, and sort through the entire data lake by making all features queryable and easily accessible. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Curate diverse scenarios&lt;/strong&gt;: Identify the most interesting segments within your dataset, and manipulate them within the tool to create completely customized training sets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Seamlessly integrate&lt;/strong&gt;: The tool should fit well within your existing workflows and toolset.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are the best data curation tools for computer vision?
&lt;/h2&gt;

&lt;p&gt;With an overwhelming number of AI products and platforms popping up year after year, how do you know which will provide the most value? Based on our experience, we are sharing our honest reviews of the top tools, in the hope that they will be useful for engineers searching for a data curation solution.&lt;/p&gt;

&lt;p&gt;Read on below to find out which data curation tool is the best fit for your computer vision project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Aquarium Learning
&lt;/h3&gt;

&lt;p&gt;Aquarium is a data management platform that aims to make it easy to identify labeling errors and model failures. With Aquarium, users can version and combine model predictions with their ground truth. &lt;/p&gt;

&lt;p&gt;Aquarium is especially focused on curating and maintaining training datasets, catering less to raw data management use cases. This is because data exploration in Aquarium is predominantly tied to model predictions and ground truth labels. &lt;/p&gt;

&lt;p&gt;Users can access Aquarium via their cloud platform or API. However, they currently do not offer on-premise or VPC deployments, and there are no external integrations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wide range of use cases&lt;/strong&gt; - Aquarium supports image, 3D, audio, and text data. They also support multiple annotation types, such as classification, detection, and segmentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive model evaluation&lt;/strong&gt; - Users can manipulate evaluation thresholds and obtain interactive visualizations to obtain required samples quickly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collaborative features&lt;/strong&gt; - Users can collaborate with each other on the Aquarium platform to build data subsets, associate them with issues, and identify new data for annotation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scale Nucleus
&lt;/h3&gt;

&lt;p&gt;Launched in late 2020 by Scale, Nucleus is one of the newest data curation tools to hit the market. The Nucleus platform allows users to collaboratively search through image data for model failures. As of now, Nucleus only supports image data, with no support for 3D sensor fusion, video, or text data.&lt;/p&gt;

&lt;p&gt;Users can access Nucleus via their cloud platform, API or Python SDK. Currently, Nucleus does not support on-premise deployability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual similarity&lt;/strong&gt; - Users can search for visually similar images based on one or multiple base samples and associate custom tags with them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metadata schemas&lt;/strong&gt; - Using the Nucleus SDK, users can create flexible metadata schemas. Nucleus provides smart methods to detect and create schemas using the annotation format provided. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model versioning&lt;/strong&gt; - Users can create model entities and associate corresponding runs with them. Hence, models can be versioned based on runs (dataset &amp;amp; predictions).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SiaSearch
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.siasearch.io/"&gt;SiaSearch&lt;/a&gt; is a data management platform for computer vision data. Consisting of a scalable metadata catalog and query engine, SiaSearch enables developers to easily search through visual data, add metadata to frames and sequences, as well as assemble custom subsets of data for training or testing. &lt;/p&gt;

&lt;p&gt;With deep roots in autonomous driving, the SiaSearch platform is used by many OEMs, Tier 1s and tech companies. Aside from autonomous driving, SiaSearch also has solutions for robotics, retail, &lt;a href="https://www.siasearch.io/usecases"&gt;and more&lt;/a&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Specialized in sensor data&lt;/strong&gt; - One of the only tools that can support 3D sensor fusion data, SiaSearch can analyze large volumes of unstructured sensor data, providing insights at the frame and sequence level. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto-tagging capabilities&lt;/strong&gt; - SiaSearch employs a large catalog of pre-trained extractors to automatically add frame-level, contextual metadata to raw data. Additionally, SiaSearch provides a toolbox for quick extractor development, allowing developers to integrate their own extractors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fast performance&lt;/strong&gt; - The SiaSearch platform features a unique, proprietary architecture that combines numeric and sequence-based queries to enable noticeably faster performance. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible workflows &amp;amp; integrations&lt;/strong&gt; - Users can access SiaSearch via their web-based GUI or programmatic API. SiaSearch also supports cloud or on-premise deployment for enterprise users.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Interested in data curation?
&lt;/h2&gt;

&lt;p&gt;The right data curation tool can dramatically reduce the time spent on manual processes, allowing engineers to focus on what really matters - building great models.&lt;/p&gt;

&lt;p&gt;If you’d like to hear more about what we’re doing at SiaSearch, reach out to us at &lt;a href="mailto:hi@siasearch.io"&gt;hi@siasearch.io&lt;/a&gt;, or visit &lt;a href="https://www.siasearch.io/"&gt;the SiaSearch website&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read the full list on our blog: &lt;a href="https://www.siasearch.io/blog/best-data-curation-tools-for-computer-vision"&gt;https://www.siasearch.io/blog/best-data-curation-tools-for-computer-vision&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>computerscience</category>
      <category>database</category>
    </item>
    <item>
      <title>Effortlessly explore the nuScenes dataset with SiaSearch</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 12 Apr 2021 01:50:22 +0000</pubDate>
      <link>https://forem.com/siasearch/effortlessly-explore-the-nuscenes-dataset-with-siasearch-275</link>
      <guid>https://forem.com/siasearch/effortlessly-explore-the-nuscenes-dataset-with-siasearch-275</guid>
      <description>&lt;h2&gt;
  
  
  A guide to better access, explore, and understand unstructured sensor data for autonomous driving development
&lt;/h2&gt;

&lt;p&gt;To accelerate the speed of autonomous vehicle adoption, an increasing number of organizations and individuals are making their projects available to the public. Open data is fueling commercial and technological advancement in autonomous driving—one of the most well-known resources being the &lt;a href="https://www.nuscenes.org/"&gt;nuScenes dataset&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Developed by the team at Motional (formerly nuTonomy), nuScenes is one of the most popular open-source datasets for autonomous driving. The nuScenes dataset enables researchers to study a wide range of urban driving situations using data captured by the full sensor suite of a self-driving car. The first dataset of its kind, nuScenes was a key player in cultivating a culture of data sharing and collaboration within the mobility industry.&lt;/p&gt;

&lt;p&gt;To further advance this mission, Motional recently partnered with Berlin-based startup &lt;a href="https://www.siasearch.io/"&gt;SiaSearch&lt;/a&gt; to introduce a completely new way to interact with nuScenes—by using a data curation platform to delve deeper into the data than ever before.&lt;/p&gt;

&lt;p&gt;This guide will walk you through how to better access, explore and understand the nuScenes dataset with the SiaSearch platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the nuScenes Dataset?
&lt;/h2&gt;

&lt;p&gt;The nuScenes dataset is one of the largest public datasets for autonomous driving. The dataset contains a rich library of meticulously hand-annotated scenes collected by real self-driving cars. Recorded in Boston and Singapore, nuScenes features a diverse range of traffic situations, driving maneuvers, and unexpected behaviors.&lt;/p&gt;

&lt;p&gt;The dataset includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full sensor suite: a 32-beam LiDAR, 6 cameras, and 5 radars with complete 360° coverage&lt;/li&gt;
&lt;li&gt;1000 urban street scenes, 20 seconds each&lt;/li&gt;
&lt;li&gt;1,440,000 camera images&lt;/li&gt;
&lt;li&gt;23 classes and 8 attributes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Sjwy6zii--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qfchggtpoc764kqfru7m.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Sjwy6zii--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qfchggtpoc764kqfru7m.jpeg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing nuScenes data in SiaSearch
&lt;/h2&gt;

&lt;p&gt;To access the data yourself, you’ll need to &lt;a href="https://public.sia-search.com/"&gt;sign up for a free account on SiaSearch&lt;/a&gt;. After creating an account, you can immediately load and visualize nuScenes data within the web-based GUI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now that we’re all set up, let’s start exploring!&lt;/strong&gt; Upon loading nuScenes in SiaSearch, you’ll see the main dashboard, which captures key features of the dataset. This view lets you quickly understand the overall dataset composition, as well as identify any gaps in data distribution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M74C5oHr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7107q5y1rgbevtpn23pc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M74C5oHr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7107q5y1rgbevtpn23pc.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying the nuScenes Dataset
&lt;/h2&gt;

&lt;p&gt;Having a holistic view of the dataset, while useful, is not enough. The ability to drill into specific subsets can uncover insights and imbalances in the data—a critical step in model building and validation. &lt;/p&gt;

&lt;p&gt;SiaSearch makes every piece of nuScenes data searchable against all available and auto-extracted dimensions using its intelligent search interface. The platform features two ways to search for the exact sequences you want, using either a visual or code interface. The visual query lets you select extractors from a list of semantic attributes and driving situations, while the code query functions like any API call statement would.&lt;/p&gt;

&lt;p&gt;For example, if you are searching for rainy weather scenes with three or more cars in view, you could submit the following query using the code interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vehicle_following = 'True' AND precip_type = 'RAIN' AND 
road_joint = 'True' AND forward_velocity &amp;gt;= 5 AND num_cars &amp;gt;= 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within seconds, the API returns clips that match the query attributes. This makes it easy to search the entire data lake to return situations that fit the case in question. From the search interface, specific clips can be selected by clicking into them to reveal more details, such as charts for forward velocity and acceleration profiles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Curating custom training datasets from nuScenes
&lt;/h2&gt;

&lt;p&gt;Another key benefit of using the platform is the ability to transform data to produce custom training datasets. The flexibility of the SiaSearch platform allows you to finetune data for training and testing to see a measurable improvement in your models.&lt;/p&gt;

&lt;p&gt;For example, you may want to isolate the data right before or after an unprotected left turn in order to contextualize vehicle behavior. The SiaSearch interface gives you the ability to adjust the length of snippets to match your requirements in just a few clicks.&lt;/p&gt;

&lt;p&gt;Here are just a few ways you can use SiaSearch to transform data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Re-adjust clip length by clicking on the adjust clip button in the replay viewer. Easily trim or expand snippets according to your needs.&lt;/li&gt;
&lt;li&gt;Add custom tags to selected segments to produce new subsets of data.&lt;/li&gt;
&lt;li&gt;Leave comments for collaborators on a frame or sequence level to notify them of any questions or feedback.&lt;/li&gt;
&lt;/ul&gt;
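&lt;p&gt;Conceptually, clip re-adjustment just moves a snippet’s start and end markers within the recording. A minimal sketch of that operation, assuming a hypothetical &lt;code&gt;Clip&lt;/code&gt; type that is not part of the SiaSearch API:&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical stand-in for a SiaSearch snippet: a time window within a
# recorded drive that the replay viewer lets you trim or expand.

@dataclass
class Clip:
    drive_id: str
    start: float  # seconds from the start of the drive
    end: float

    def adjust(self, new_start, new_end):
        """Return a new clip with re-adjusted start and end points."""
        if new_end <= new_start:
            raise ValueError("clip end must come after clip start")
        return Clip(self.drive_id, new_start, new_end)

    @property
    def duration(self):
        return self.end - self.start

clip = Clip("drive_042", start=12.0, end=32.0)
widened = clip.adjust(8.0, 40.0)  # add context before and after a turn
print(widened.duration)  # 32.0
```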

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rx8G5dpe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5nmjm5oq0v9e9i9qoh7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rx8G5dpe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5nmjm5oq0v9e9i9qoh7o.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we’ve selected our training samples, we need to find a way to feed the data back into the computer vision model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting data from SiaSearch
&lt;/h2&gt;

&lt;p&gt;The final step is to export your data into a format you can import directly into your model. To access the export page, add one or more snippets to your export list from either the search or playback pages. &lt;/p&gt;

&lt;p&gt;On the export page, you can review a breakdown of the selected snippets based on their associated queries, as well as basic statistics to get an idea of the quantity and contents of the export. When you’re happy with your results, you can export the raw data and/or specific metadata attributes as a parquet file to feed back into your model or validation pipeline.&lt;/p&gt;
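&lt;p&gt;Once exported, the metadata is plain tabular data that can be filtered in ordinary code. A minimal sketch of consuming such an export (in practice the rows would come from the parquet file; here plain dicts with illustrative attribute names stand in for them):&lt;/p&gt;

```python
# Sketch of filtering exported segment metadata to assemble a training
# subset. The attribute names are illustrative, not a fixed schema.

rows = [
    {"segment_id": "a", "query": "left_turn", "num_cars": 4},
    {"segment_id": "b", "query": "left_turn", "num_cars": 1},
    {"segment_id": "c", "query": "rain", "num_cars": 6},
]

def select_for_training(rows, min_cars):
    """Keep only segments dense enough in traffic for the training set."""
    return [r["segment_id"] for r in rows if r["num_cars"] >= min_cars]

print(select_for_training(rows, min_cars=3))  # ['a', 'c']
```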




&lt;p&gt;Carefully curated data is critical in the development of any great computer vision model. At SiaSearch, we are determined to accelerate ADAS development by empowering engineers with full access and visibility into their data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this article interesting, please leave a comment and follow us for similar content. For more information about the SiaSearch platform, please check out our website at &lt;a href="https://www.siasearch.io/"&gt;https://www.siasearch.io/&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nuScenes is available for commercial use under our commercial license agreement. Please reach out to &lt;a href="mailto:nuScenes@motional.com"&gt;nuScenes@motional.com&lt;/a&gt; to learn more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>aws</category>
    </item>
    <item>
      <title>Data Curation, without the effort</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Tue, 06 Apr 2021 01:27:12 +0000</pubDate>
      <link>https://forem.com/siasearch/data-curation-without-the-effort-24a7</link>
      <guid>https://forem.com/siasearch/data-curation-without-the-effort-24a7</guid>
      <description>&lt;p&gt;When building machine learning (ML) models for highly complex automation tasks it is not only essential to identify the right models for the right job but also to select the suitable datasets for training, testing and validation. While dataset selection is typically easier for ML problems based on structured data, there are massive challenges to use data-driven technologies with unstructured data. Typical examples of such applications are robotics, automated driving and computer vision. While the content of structured data can be easily accessed and queried, dealing with unstructured data typically requires immense amounts of manual work for data selection, annotation and dataset balancing. Even tasks that seem simple, like better understanding the content of whole datasets, become extremely challenging as the typical information depth is very low as little to no additional knowledge about the content and context of the underlying unstructured data is present.&lt;/p&gt;

&lt;p&gt;Data selection is a crucial factor for all data-driven applications. It is of great importance to ensure that no biases are introduced by the data, to avoid correlation between the training and test data, and to cover the whole spectrum of possible input data, as models are typically bad at extrapolating. This requires deep insights into the data, which are typically hard to get, especially at large scale (several thousand hours of recordings). Handling the data becomes tricky, and manual labor cannot provide the necessary insights at this scale. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mDoy0ic7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wwqd9dlv8ufbv4ai8jxm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mDoy0ic7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wwqd9dlv8ufbv4ai8jxm.png" alt="SiaSearch API"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;&lt;em&gt;The SiaSearch API allows the user to easily access and analyze the data through efficient queries. Once the user has identified the desired subset of data, it can easily be exported and used for other workflows, like data annotation or model training.&lt;/em&gt;&lt;/center&gt;
&lt;br&gt;

&lt;p&gt;With &lt;a href="https://www.siasearch.io/product/"&gt;SiaSearch&lt;/a&gt; we provide the tools that enable users to identify the desired data within seconds. As a large amount of content information and metadata is automatically extracted during data ingestion, the data is easily searchable for the end user -- even on a petabyte scale of data.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using the SiaSearch API to balance datasets
&lt;/h2&gt;

&lt;p&gt;To illustrate how to work with the SiaSearch API and SDK, we will walk you through the simple process of identifying data for model training and testing. Let’s assume that AV engineers have identified problems in the object detection module while conducting lane changes. They therefore want to improve their models to address the problem, and to investigate whether new data can be added to the existing training and test sets. In this case, SiaSearch can be used to crawl through previously unlabeled data and identify additional lane changes.&lt;/p&gt;

&lt;p&gt;We will illustrate the following functionalities in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Querying the large underlying dataset for specific situations&lt;/li&gt;
&lt;li&gt;Conducting a visual sanity check of the data&lt;/li&gt;
&lt;li&gt;Generating subsets of data and comparing the underlying data content&lt;/li&gt;
&lt;li&gt;Exporting data to a common format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the following, we’ll walk through a few steps showing how to work with our API and the Python SDK. Please note that only an excerpt of the functionality can be shown in this post. If you are interested in obtaining full access to SiaSearch and its API, please reach out to us at &lt;a href="mailto:hi@siasearch.io"&gt;hi@siasearch.io&lt;/a&gt; or &lt;a href="https://www.siasearch.io/request-a-demo"&gt;request a demo here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Authenticate with the API
&lt;/h2&gt;

&lt;p&gt;The first step is to use the login credentials to log into SiaSearch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from siasearch import auth
&amp;gt;&amp;gt;&amp;gt; sia = auth("https://some-endpoint.com", username='foo', password='bar')
&amp;gt;&amp;gt;&amp;gt; sia
Siasearch object with user `foo` connected to `https://some-endpoint.com`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Querying for desired data
&lt;/h2&gt;

&lt;p&gt;Queries can be constructed with an SQL-like DSL (domain specific language). For our simple example where we want to find lane changes in our own “mxa” dataset, the query looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; results = sia.query("dataset_name = 'mxa' AND is_lane_change = 'True'")
&amp;gt;&amp;gt;&amp;gt; len(results.segments)
56
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API returns a results object consisting of the segments that match the query attributes. Each segment is defined by its dataset, the underlying drive ID, and the start and end time of the matching interval, and exposes various methods for interacting with it. &lt;/p&gt;
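&lt;p&gt;Since each segment carries its dataset, drive ID and start/end time, simple aggregations over a result set are straightforward. A minimal sketch (the tuple layout mirrors the description above; it is not the SDK’s actual segment class):&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical result segments: (dataset, drive_id, start_s, end_s),
# following the description of query results above.
segments = [
    ("mxa", "drive_01", 130.0, 138.5),
    ("mxa", "drive_01", 402.0, 409.0),
    ("mxa", "drive_07", 55.0, 61.0),
]

def total_duration_per_drive(segments):
    """Aggregate matched seconds per drive, e.g. to spot which
    recordings contain the most lane changes."""
    totals = defaultdict(float)
    for dataset, drive_id, start, end in segments:
        totals[drive_id] += end - start
    return dict(totals)

print(total_duration_per_drive(segments))
# {'drive_01': 15.5, 'drive_07': 6.0}
```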

&lt;p&gt;The results can also be obtained as a Pandas DataFrame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; results.df_segments.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rmGfT09---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j763eajjf7kvpu17qttk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rmGfT09---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j763eajjf7kvpu17qttk.png" alt="Pandas DataFrame"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation of results
&lt;/h2&gt;

&lt;p&gt;It's critical to validate the query results. The query we previously constructed may have been too narrow or too broad. In either case, we can iterate and tweak the query until we are satisfied with the resulting set of events.&lt;/p&gt;

&lt;p&gt;The easiest way to visually inspect the query results is to open them in the SiaSearch web interface, which gives you an overview of all segments and lets you inspect a single segment in more detail. &lt;/p&gt;

&lt;p&gt;This can be achieved with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; results.url()
https://demo.sia-search.com/search?query=dataset_name+=+%27mxa%27+AND+is_lane_change+=+%27True%27&amp;amp;submit=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This then provides you with an overview of the segments and a statistical aggregation of the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YMo-2cNw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bbr8l6byqbc8qgkmmvu0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YMo-2cNw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bbr8l6byqbc8qgkmmvu0.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--K9lANlly--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1lnk3el1emmvkj50xbiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--K9lANlly--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1lnk3el1emmvkj50xbiw.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GUI also allows the user to dig deeper into each segment and use the embedded data viewer: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aZB4wyiQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bjaffuydnq1vu7b1laz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aZB4wyiQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bjaffuydnq1vu7b1laz7.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Generating data subsets by tagging
&lt;/h2&gt;

&lt;p&gt;In our simple example, the goal was to identify previously unseen lane changes that are potentially useful for model training and testing. We therefore also want to be able to tag segments and produce subsets of the data. Here we will show you how to tag the first two-thirds of the segments as training data and the rest as test data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; switching_point = int(len(results.segments) *2/3)
&amp;gt;&amp;gt;&amp;gt; for seg in results.segments[:switching_point]:
        seg.add_tag("lane_change_train")
&amp;gt;&amp;gt;&amp;gt; for seg in results.segments[switching_point:]:
        seg.add_tag("lane_change_test")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can list the tags already present for the current user by calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; sia.get_all_tags()
['lane_change_test', 'lane_change_train']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or simply return all segments which belong to a certain tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; tag_results = sia.get_results_from_tag('lane_change_test')
&amp;gt;&amp;gt;&amp;gt; tag_results.segments[0]
&amp;lt;Segment(drive_id='mxa_drive_0052', start_timestamp='2020-02-06 15:04:47.400000+00:00', dataset_name='mxa', tag='lane_change_test')&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Accessing additional metadata
&lt;/h2&gt;

&lt;p&gt;If we want to build analytics and visualizations without depending on the SiaSearch web interface, we can access SiaSearch’s automatically extracted metadata catalogue through the API. In our example we will extract the vehicle velocity, the road type, and the precipitation conditions under which each lane change was conducted. &lt;/p&gt;

&lt;p&gt;The additional metadata for the desired dimensions can be easily accessed, with the results returned as a DataFrame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; meta_results = results.segments[0].get_raw_values(["forward_velocity", "precip_type", "tag_road_type"])
&amp;gt;&amp;gt;&amp;gt; meta_results.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Zh015T1f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2fag1b6298ue275nl234.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zh015T1f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2fag1b6298ue275nl234.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dataframe allows us to quickly generate new plots in order to analyze the underlying data. In our example this allows us to compare the train and test datasets in a straightforward way by visualizing the additional metadata: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dPwRvP9X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zopq2ny5uc2vl3dzjf4j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dPwRvP9X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zopq2ny5uc2vl3dzjf4j.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The underlying GPS data can also be obtained in order to generate geospatial visualizations, here using ipyleaflet in a Jupyter notebook (compute_gps_center and add_segments_to_map are small custom helpers around the ipyleaflet API).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; m = Map(center=compute_gps_center(results), zoom=7)
&amp;gt;&amp;gt;&amp;gt; add_segments_to_map(
        m, 
        sia.get_results_from_tag('lane_change_train').segments,
        color="blue"
    )
&amp;gt;&amp;gt;&amp;gt; add_segments_to_map(
        m, 
        sia.get_results_from_tag('lane_change_test').segments, 
        color="red"
    )
&amp;gt;&amp;gt;&amp;gt; m.save("my_map.html", title="My Map")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g4RjOE1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5cued54glhwxn9qh5f5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g4RjOE1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5cued54glhwxn9qh5f5w.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The visualizations obtained through the API clearly show that the data in the train and test sets is not balanced, which could introduce biases during training or produce misleading test results, as most test data was recorded on highways. This is not surprising, since the process we used to split the data is very simple: the first two-thirds of the segments go into the training set and the rest into the test set. Further iterations, or a different split of the data, would be required.&lt;/p&gt;
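&lt;p&gt;One simple improvement is to shuffle the segments before splitting, so that the train and test sets draw from the same mix of drives and road types. A minimal sketch on a plain list standing in for results.segments (the identifiers are placeholders):&lt;/p&gt;

```python
import random

# Placeholder for results.segments; the query above returned 56 segments.
segments = [f"segment_{i:02d}" for i in range(56)]

random.seed(42)  # fixed seed so the split is reproducible
shuffled = random.sample(segments, k=len(segments))

switching_point = int(len(shuffled) * 2 / 3)
train_segments = shuffled[:switching_point]
test_segments = shuffled[switching_point:]

print(len(train_segments), len(test_segments))  # 37 19
```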

&lt;h2&gt;
  
  
  Exporting
&lt;/h2&gt;

&lt;p&gt;The final step is to export the results into a format you can use for importing into your model or validation pipeline.&lt;/p&gt;

&lt;p&gt;The easiest way to achieve this is to access the underlying pandas DataFrame, which provides a multitude of export formats to choose from. See the official pandas documentation for all available export formats.&lt;/p&gt;

&lt;p&gt;As an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; df_results = results.df_segments
&amp;gt;&amp;gt;&amp;gt; df_results.to_csv("out.csv")
&amp;gt;&amp;gt;&amp;gt; df_results.to_parquet("out.parquet")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In future versions we will provide more custom formats not directly available from the pandas interface, such as GeoJSON, KML, and MAT files.&lt;/p&gt;

&lt;p&gt;As shown in the previous steps, the SiaSearch API and Python SDK allow the user to easily interact with large-scale unstructured data. Cumbersome data engineering tasks are abstracted away, allowing the user to focus on the important work of analyzing the data and identifying the datasets required for training, testing, and validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started with SiaSearch
&lt;/h2&gt;

&lt;p&gt;If you share some of the problems we talked about, or are excited to give SiaSearch a try, &lt;a href="https://www.siasearch.io/request-a-demo"&gt;request a demo&lt;/a&gt; on our website or reach out to us at &lt;a href="mailto:hi@siasearch.io"&gt;hi@siasearch.io&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published by Mark Pfeiffer on: &lt;a href="https://www.siasearch.io/blog/data-curation-without-the-effort"&gt;https://www.siasearch.io/blog/data-curation-without-the-effort&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Automotive Companies only Access 5% of their Vehicle Data</title>
      <dc:creator>SiaSearch</dc:creator>
      <pubDate>Mon, 05 Apr 2021 06:02:39 +0000</pubDate>
      <link>https://forem.com/siasearch/automotive-companies-only-access-5-of-their-vehicle-data-n71</link>
      <guid>https://forem.com/siasearch/automotive-companies-only-access-5-of-their-vehicle-data-n71</guid>
      <description>&lt;p&gt;With vehicle sensors collecting massive amounts of data, only 5% of it is currently being used for product development. Better infrastructure and data processing hold the keys to progress.&lt;/p&gt;

&lt;p&gt;The promise of fully autonomous vehicles continues to excite and inspire millions of people around the world. The amazing things that safe, reliable, self-driving vehicles can do for humanity, from providing newfound mobility to senior citizens to reducing traffic accidents, are closer to our reach than ever. But we still have a long way to go. &lt;/p&gt;

&lt;p&gt;Along with the fuel (gasoline, diesel or electricity) that powers automobiles, autonomous vehicles require a fuel of their own to “drive” safely and effectively: data. Although that data, already collected by millions of sensors on thousands of vehicles around the world, is readily available, it’s not being utilized to its full potential.&lt;/p&gt;

&lt;p&gt;These datasets power the algorithms that make all levels of autonomous driving possible. Today, automotive companies access only 5% of their vehicle data, while the remaining 95% becomes costly to store and optimize without the context necessary to make use of it. As automotive companies aggressively pursue data-driven product development, data handling, including the exploration, querying, curation and evaluation of data, is a common bottleneck on the road to progress. These essential but hard-to-manage datasets bring a unique set of challenges for those hoping to make use of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unstructured and diverse formats &lt;/li&gt;
&lt;li&gt;Need for rich semantics in order to access them &lt;/li&gt;
&lt;li&gt;Huge sizes that require high-performance computing
&lt;/li&gt;
&lt;li&gt;Strong need for data versioning &lt;/li&gt;
&lt;li&gt;Access and security issues &lt;/li&gt;
&lt;li&gt;Need for continuous failure-case driven data exploration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With &lt;a href="https://www.siasearch.io/product"&gt;proper data processing and analytics software&lt;/a&gt;, engineering users can overcome all of these challenges and vehicle data can fulfill its potential. By upgrading legacy technology for data access and analysis, OEMs and mobility tech companies can bump up data utilization rates by up to 40% and generate additional ROI, as data exploration, search, analysis, anomaly detection and evaluation require less manual engineering work and yield better results. &lt;/p&gt;

&lt;h2&gt;
  
  
  Data infrastructure and management are being neglected
&lt;/h2&gt;

&lt;p&gt;Currently, vehicle technology developers are focused on machine learning models and ground-truth labeling. These same developers are neglecting infrastructure upgrades, leading many to use legacy technology for data management. Terabytes of unstructured, unprocessed vehicle data easily overwhelm these systems, causing them to malfunction. Raw data arrives as billions of frames with no metadata, and technology developers are left to organize it on their own with tools ill-suited to the task of data management. &lt;/p&gt;

&lt;p&gt;Highly paid engineers search datasets manually on sluggish database systems, spending up to 75% of their time on raw data handling issues instead of building, training, and validating models. Furthermore, the lack of insights garnered from these vehicle datasets means machine learning and data science teams are unable to effectively build AI functions that rely on sensor data as an input.&lt;/p&gt;

&lt;h2&gt;
  
  
  Purpose-built data infrastructure is the solution
&lt;/h2&gt;

&lt;p&gt;Infrastructure that is designed and built specifically to house raw sensor data and extract insights is the answer. This infrastructure should be as simple and easy to navigate as spreadsheets and SQL databases that bring order and usefulness to data in other industries. For vehicle sensor data, infrastructure that uses a flexible and minimalist data model and a scalable method to produce semantics, along with fast queries and integrated endpoints to use and share data, is most effective.&lt;/p&gt;

&lt;p&gt;These and other infrastructure elements improve data re-use and result in faster insights. Enabled by semantic automation, data can be ranked according to its importance based on context, content, criticality, and usability. This reduces redundancy and maximizes information density. Low importance data is archived or deleted, while high importance data is easily accessible for everyone. Furthermore, data insights can be provided almost immediately with overviews of the content and the redundancy. &lt;/p&gt;

&lt;p&gt;Users can augment retention recommendations with tailored rule-based constraints, e.g. unprotected left turns should always be kept. &lt;/p&gt;
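&lt;p&gt;As a rough sketch of how such rule-based retention could be combined with an importance score (the threshold, event names, and scoring here are invented for illustration and are not part of any real product):&lt;/p&gt;

```python
def should_keep(segment, importance, keep_rules):
    """Keep a segment if its importance score passes a threshold,
    or if any user-defined rule matches it (hypothetical logic)."""
    if importance >= 0.5:  # assumed threshold, for illustration only
        return True
    return any(rule(segment) for rule in keep_rules)

# User-defined constraint: unprotected left turns are always kept.
keep_rules = [lambda seg: "unprotected_left_turn" in seg["events"]]

low_value = {"id": "drive_01", "events": ["straight_driving"]}
left_turn = {"id": "drive_02", "events": ["unprotected_left_turn"]}

print(should_keep(low_value, 0.1, keep_rules))  # False: archive or delete
print(should_keep(left_turn, 0.1, keep_rules))  # True: rule overrides score
```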

&lt;h2&gt;
  
  
  Better autonomous vehicles, sooner
&lt;/h2&gt;

&lt;p&gt;Advances in automated data infrastructure and processing liberate automotive technology developers from the constraints of legacy technology systems. Automated, semantic data management systems make data handling easier than ever, enabling automotive companies to save valuable engineering time, boosting productivity and increasing overall utilization.   &lt;/p&gt;

&lt;p&gt;Put simply, more robust, effective infrastructure for unstructured sensor data will lead to higher ROI on research and development by freeing up engineers to do what they do best: building algorithms that will power the autonomous vehicles of the future. And with those engineers working more efficiently, the dreams of autonomous vehicles improving our lives are that much closer to reality. &lt;/p&gt;

&lt;h2&gt;
  
  
  Dealing with lots of raw sensor data?
&lt;/h2&gt;

&lt;p&gt;Learn more about how the SiaSearch data management platform can help on our website: &lt;a href="https://www.siasearch.io/"&gt;https://www.siasearch.io/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published by Clemens Viernickel on: &lt;a href="https://www.siasearch.io/blog/raw-sensor-data-needs-infrastructure-to-be-useful/"&gt;https://www.siasearch.io/blog/raw-sensor-data-needs-infrastructure-to-be-useful/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
