Jesse Williams for Jozu

Posted on Jun 5 • Originally published at jozu.com

How to Generate an AI SBOM, and What Tools to Use


AI systems often depend on a complex web of third-party components, including open-source libraries, pre-trained models, external APIs, and datasets. Without proper tracking, these dependencies introduce security risks that leave AI projects vulnerable to supply chain attacks and compliance failures.

In a previous article, we explored how model attestation and SBOMs secure AI projects by providing detailed inventories of every component. While SBOMs improve transparency, security, and governance, their adoption remains limited. The lack of standardization, integration difficulties, and the constantly evolving nature of AI workflows make implementation challenging.

Looking at the current adoption landscape, AI teams need better tools and strategies to simplify SBOM generation. Before diving into solutions, let's look at why adoption has been slow (specifically for AI projects), the security value of SBOMs, and the main challenges organizations face when adopting or creating them.

The Current State of SBOM Usage in AI Projects

Currently, SBOM adoption for AI projects is limited, mainly because of a lack of awareness, the difficulty of adapting SBOM methodologies to AI workflows, and the rapidly evolving nature of the AI industry.

SBOMs are widely used in traditional software development, but the AI field has been much slower to adopt them. That gap creates industry-wide risks, including supply chain vulnerabilities, compliance violations, and reduced trust in AI outputs. Addressing these risks is critical to making AI development secure and transparent.

Key obstacles include:

Complexity of AI systems: AI development involves multiple stages, including data preprocessing, model training, validation, and deployment. Each stage relies on different tools, frameworks, and dependencies, which makes inventorying an AI system harder than traditional software composition analysis.

Consider a typical AI project that uses PyTorch or TensorFlow for model training, scikit-learn for data preprocessing, and FastAPI for deployment. Each library has its own dependencies, creating a complex web that traditional SBOM tools struggle to capture fully.

Lack of standardization: Unlike in traditional software, there are no standard frameworks or guidelines for generating AI-tailored SBOMs.

Integration challenges: Many AI teams struggle to integrate SBOMs into existing development tools and workflows. Automating SBOM creation and making it part of continuous monitoring remains a significant challenge.

Dynamic components: AI systems often rely on constantly changing elements like pre-trained models, external APIs, and third-party datasets, making it challenging to maintain accurate and consistent tracking.
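To make the "complex web" from the PyTorch/scikit-learn/FastAPI example concrete, here is a minimal Python sketch that computes the transitive closure of a dependency graph, which is the full set an SBOM ultimately has to enumerate. The package names and edges are hypothetical placeholders, not the real dependency trees of those libraries; in practice the graph would come from package metadata or a lock file.

```python
from collections import deque


def transitive_deps(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from `root`, excluding `root` itself.

    Breadth-first search with a `seen` set, so shared dependencies are
    counted once and dependency cycles cannot cause an infinite loop.
    """
    seen = {root}
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    seen.discard(root)
    return seen


# Hypothetical graph for illustration only: three direct dependencies
# (training, preprocessing, serving) that share and add indirect ones.
DEPENDENCIES = {
    "ai-project": ["train_lib", "preprocess_lib", "serve_lib"],
    "train_lib": ["numeric_core", "gpu_runtime"],
    "preprocess_lib": ["numeric_core", "stats_utils"],
    "serve_lib": ["http_server", "validation_lib"],
    "http_server": ["async_io"],
}

if __name__ == "__main__":
    deps = transitive_deps(DEPENDENCIES, "ai-project")
    direct = DEPENDENCIES["ai-project"]
    print(f"{len(direct)} direct dependencies, {len(deps)} in total")
    print(sorted(deps))
```

Even in this toy graph, three direct dependencies expand to nine packages once indirect ones are included, and a shared package such as `numeric_core` is listed only once.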

The consequences of slow SBOM adoption expose organizations to several risks:

Security vulnerabilities: Undocumented assets can introduce LLM security risks that malicious actors may exploit.

Compliance challenges: Regulatory requirements, such as those mandated by the EU AI Act, are difficult to meet without clear component inventories.

Reduced accountability: Without transparency into model development and data usage, tracing the root cause of errors or biases becomes problematic.

Supply chain risks: Without SBOMs, it is much harder to detect when malicious actors insert vulnerabilities into model supply chain components that can later compromise the system. SBOMs enable organizations to track existing workflows and identify untrusted or compromised dependencies before they affect AI systems.
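Once components are inventoried, checks like these become mechanical. A minimal Python sketch, assuming each SBOM entry records a `name` and an OCI-style `ref` (a hypothetical shape, not any specific SBOM schema): it flags artifacts pulled from outside a trusted allowlist and artifacts referenced by a mutable tag instead of a content digest. The registry hostnames are placeholders.

```python
import re

# Hypothetical allowlist of registries this organization trusts.
TRUSTED_SOURCES = ("registry.internal.example/",)

# A reference pinned to content ends in "@sha256:" plus 64 hex characters;
# a mutable tag such as ":latest" can silently point at different content later.
PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def audit(components: list[dict]) -> list[str]:
    """Flag components that come from untrusted sources or are not digest-pinned."""
    findings = []
    for comp in components:
        name, ref = comp["name"], comp["ref"]
        if not ref.startswith(TRUSTED_SOURCES):
            findings.append(f"{name}: untrusted source ({ref})")
        if not PINNED.search(ref):
            findings.append(f"{name}: not pinned to a content digest")
    return findings


# Hypothetical SBOM entries for two model artifacts.
COMPONENTS = [
    {"name": "sentiment-model",
     "ref": "registry.internal.example/models/sentiment@sha256:" + "a" * 64},
    {"name": "base-weights",
     "ref": "mirror.example/weights/base:latest"},
]

if __name__ == "__main__":
    for finding in audit(COMPONENTS):
        print(finding)
```

Here only `base-weights` is flagged, twice: once for its unknown mirror and once for its mutable `:latest` tag.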

Given these risks, and as AI systems increasingly integrate third-party components, a comprehensive inventory of libraries and dependencies, which is exactly what an SBOM provides, is becoming essential.

Why You Need SBOMs in AI Projects

SBOMs offer several key advantages:

Enhanced security and vulnerability management: SBOMs allow developers to track specific versions of all dependencies and promptly update components that contain security vulnerabilities.

Traceability and transparency: SBOMs provide clear records of all software components, including licenses, dependencies, and versions within AI systems. This helps regulators understand systems and enables development teams to diagnose issues more efficiently during system failures.

Improved collaboration and maintenance: SBOMs act as shared reference points for development teams, including data scientists, software engineers, and domain experts. This helps avoid conflicts between different library versions when updating or scaling existing workflows.

Auditability: SBOMs serve as historical records for AI projects, making it easier to conduct audits of older system versions and fulfill regulatory reporting requirements.
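Version tracking and auditability both come down to comparing inventories over time. A minimal Python sketch, assuming each SBOM has been flattened to a `{component: version}` map (a simplification; real SBOMs also carry licenses, hashes, and more); the component names and versions are hypothetical.

```python
def diff_sboms(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {component: version} snapshots taken at different releases."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots from two releases of the same AI service.
RELEASE_1 = {"train_lib": "2.1.0", "serve_lib": "0.9.0", "sentiment-model": "1.0"}
RELEASE_2 = {"train_lib": "2.2.0", "serve_lib": "0.9.0", "sentiment-model": "1.1",
             "tokenizer_lib": "3.0.1"}

if __name__ == "__main__":
    report = diff_sboms(RELEASE_1, RELEASE_2)
    print("added:", report["added"])
    print("removed:", report["removed"])
    for name, before, after in report["changed"]:
        print(f"changed: {name} {before} -> {after}")
```

Keeping each release's SBOM and diffing them answers the typical audit question of what changed between two deployed versions.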

Tools for Creating AI SBOMs

Unlike traditional software SBOMs, which primarily track application dependencies, AI SBOMs must account for dynamic components like model weights, training data, and external APIs. As a result, existing methods such as container-based SBOM tools can capture some dependencies but often lack visibility into the full AI development lifecycle.
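One way to picture the difference is a single document that lists code, model weights, and data side by side. The CycloneDX format (from version 1.5) defines `machine-learning-model` and `data` component types next to `library`; the Python sketch below builds a minimal CycloneDX-style document using them. It is an illustration, not a schema-complete SBOM (validate against the official CycloneDX schema before relying on it), and all component names, versions, and byte contents are hypothetical placeholders.

```python
import hashlib
import json
from typing import Optional


def component(ctype: str, name: str, version: str,
              content: Optional[bytes] = None) -> dict:
    """Build one component entry; optionally record a SHA-256 of its bytes."""
    comp = {"type": ctype, "name": name, "version": version}
    if content is not None:
        digest = hashlib.sha256(content).hexdigest()
        comp["hashes"] = [{"alg": "SHA-256", "content": digest}]
    return comp


def build_ai_sbom(components: list[dict]) -> dict:
    """Wrap components in a minimal CycloneDX-style envelope (not schema-validated)."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


# Hypothetical project: a library dependency, a model, and its training data.
# The byte strings stand in for real weight and dataset files.
SBOM = build_ai_sbom([
    component("library", "train_lib", "2.2.0"),
    component("machine-learning-model", "sentiment-model", "1.1",
              content=b"placeholder model weights"),
    component("data", "reviews-dataset", "2024-01",
              content=b"placeholder training data"),
])

if __name__ == "__main__":
    print(json.dumps(SBOM, indent=2))
```

Recording a hash for the model and dataset entries is what lets later tooling notice if either artifact changes.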

To address these gaps, new tools have emerged that extend SBOM capabilities to meet the needs of AI projects. Some focus on packaging AI artifacts as container images, while others provide structured frameworks for documenting model provenance and dependencies. There are currently three main types of tools being used:

Container-Based SBOM Tools

Traditional SBOM tools like Syft extract dependency data from container images, providing snapshots of libraries and frameworks used in AI projects. While useful, these tools typically don't capture metadata related to model training, data sources, or transformation pipelines.

Here's a quick look at how to generate SBOMs using Syft:

Video: https://www.youtube.com/watch?v=ZUpUiG3Q6J8
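To check what a container-based SBOM actually captured, one option is to ask Syft for CycloneDX JSON (for example `syft <image> -o cyclonedx-json > sbom.json`, where `<image>` is your container image) and count the kinds of components it found. The Python sketch below does that under this assumption; the sample document is hand-written for illustration and is not real Syft output.

```python
import json
from collections import Counter

# CycloneDX component types that describe AI artifacts rather than software.
AI_TYPES = {"machine-learning-model", "data"}


def summarize(sbom_text: str) -> tuple[Counter, bool]:
    """Count components by type and report whether any AI artifacts are listed."""
    sbom = json.loads(sbom_text)
    counts = Counter(c.get("type", "unknown") for c in sbom.get("components", []))
    return counts, any(t in AI_TYPES for t in counts)


# Tiny hand-written stand-in for a container SBOM; a real one would be
# loaded from the file Syft writes, e.g. open("sbom.json").read().
SAMPLE = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "train_lib", "version": "2.2.0"},
        {"type": "library", "name": "serve_lib", "version": "0.9.0"},
        {"type": "operating-system", "name": "base-os", "version": "12"},
    ],
})

if __name__ == "__main__":
    counts, has_ai = summarize(SAMPLE)
    print(dict(counts))
    if not has_ai:
        print("No model or dataset components found: AI metadata is missing.")
```

A result with only libraries and OS packages is the typical sign that training data and model provenance still need to be documented elsewhere.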

Model-Oriented SBOM Frameworks

These are AI-focused tools that go beyond static dependency tracking by incorporating model lineage, dataset tracking, and provenance information. They use standards like OCI (Open Container Initiative) artifacts to structure AI SBOMs.

For example, KitOps packages AI projects as ModelKits, a format that encapsulates models, datasets, configurations, and dependency relationships. This approach allows teams to maintain tamper-proof records of model evolution and track compliance requirements more effectively.
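The idea behind tamper-evident records is content addressing: OCI registries identify each blob by a digest such as `sha256:<hex>`, so any change to a file changes its identifier. The Python sketch below illustrates that idea generically by hashing a project's artifact files into a manifest. It is not the ModelKit or Kitfile format and does not call KitOps; the file names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (model weights can be large) and return 'sha256:<hex>'."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return f"sha256:{h.hexdigest()}"


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under `root` (relative POSIX path) to its content digest."""
    return {
        p.relative_to(root).as_posix(): sha256_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical project layout with placeholder contents.
        (root / "model").mkdir()
        (root / "model" / "weights.bin").write_bytes(b"placeholder weights")
        (root / "train.csv").write_text("text,label\ngood,1\n")
        print(json.dumps(build_manifest(root), indent=2))
```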

Registry-Based SBOM Management

Once SBOMs are generated, storing and managing them at scale is the next challenge. Platforms like Jozu Hub focus on secure storage and versioning of AI SBOMs, enabling organizations to maintain verifiable records of all AI assets. These registries also support model attestation, helping teams validate model integrity and detect unauthorized modifications.
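Full attestation involves more than this (signatures and provenance statements), but its core integrity check reduces to comparing digests. A minimal Python sketch, assuming digests were recorded when the artifacts were registered; it does not use any Jozu Hub API, and the file name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_digest(path: Path) -> str:
    """Return 'sha256:<hex>' for a file's current contents.

    Reads the whole file at once, which is fine for a sketch; stream in
    chunks for large model weights.
    """
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


def verify(root: Path, recorded: dict[str, str]) -> list[str]:
    """Compare files under `root` with digests recorded earlier.

    Returns a list of problems; an empty list means every recorded file
    is present and unchanged.
    """
    problems = []
    for rel_path, expected in sorted(recorded.items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_digest(path) != expected:
            problems.append(f"modified: {rel_path}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        weights = root / "weights.bin"  # hypothetical artifact
        weights.write_bytes(b"original weights")
        recorded = {"weights.bin": sha256_digest(weights)}
        print(verify(root, recorded))  # [] because nothing changed
        weights.write_bytes(b"tampered weights")
        print(verify(root, recorded))  # ['modified: weights.bin']
```

Note that this only checks files that were recorded; detecting newly added, unlisted files would need an extra comparison against the directory contents.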

The effectiveness of any SBOM approach depends on how well it integrates into existing AI development workflows. As AI security and compliance requirements continue evolving, SBOM generation will likely become an essential part of AI governance.

So What Should You Do?

Traditional SBOMs don't perfectly fit AI project needs, but when extended with AI-specific capabilities like data lineage, model metadata, and compliance tracking, they can serve as robust AI SBOMs. Your ideal tool or combination depends on your specific needs:

Basic requirements: If you primarily need to track software dependencies for containerized AI projects, a simpler option like Syft might suffice.

Comprehensive AI lifecycle management: For teams requiring deep model development tracking, data lineage, and compliance management, a model-focused framework like KitOps is a better fit.

Enterprise-scale management: Organizations with numerous AI models that prioritize security and compliance will find registry-based solutions like Jozu Hub most useful.

AI SBOMs are becoming critical components for maintaining transparency, security, and compliance in modern AI projects. KitOps and Jozu Hub are both free to explore and use, so you can start adopting best practices that safeguard your models against security threats and protect your AI projects' integrity.

I hope this helps,
/Jesse
