Software Bill of Materials | Part 1 - What Is SBOM and Why Is It Useful?

Issue #002

Welcome to Issue #002 of the /dev/stdout newsletter. In this issue, I will shed light on a very timely topic: the Software Bill of Materials, or SBOM for short. This is part 1 of my SBOM series.


What is a Software Bill of Materials?

An SBOM is a nested inventory, a list of ingredients that make up software components.1

In other words, an SBOM is like the ingredients list on any food product you buy from a grocery store, but for software products.

Take, for example, an application distributed as a Docker image, a typical case in today's software engineering. An SBOM of such an image would contain everything from the operating system dependencies (such as those installed via apt, yum, or rpm) all the way to the application dependencies (such as those from npm, Maven, or PyPI), providing detailed insight into what the application is composed of.

At the other end of the spectrum, the simplest SBOM contains information about a single package or dependency.

Why an SBOM?

The Software Bill of Materials helps organizations learn which dependencies their software stack uses. This makes it possible to assess vulnerabilities, manage license compliance, and manage the combined risk of the entire software stack at an organization-wide level. For many companies, all of this is necessary for compliance with national or other laws. SBOM is one of the core concepts in securing the Software Supply Chain.

Having SBOM data collected for the whole tech stack has many benefits.

Finding vulnerable and out-of-date dependencies

When a company gathers SBOM information throughout its software stack, it can track down a single dependency and see immediately in which part of the software stack it’s being used.

Recall the infamous Log4j vulnerability (Log4Shell) from 2021. With SBOM data collected, a company's security team can search it to identify which parts of the stack are vulnerable and to determine whether the vulnerable library version has been updated across the organization. This shows where updates are still required, so the security team can coordinate with the teams that own those parts.
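To make the search concrete, here is a minimal Python sketch. It assumes each project's SBOM has already been loaded into a CycloneDX-shaped dictionary; the project names are hypothetical, and the "fixed in" version 2.17.1 is used purely as an illustrative threshold. The version comparison is deliberately simplified (it only compares leading numeric parts) and is not a full Maven version ordering.

```python
import re
from typing import Iterator


def iter_components(components: list[dict]) -> Iterator[dict]:
    """Yield every component, including nested sub-components."""
    for component in components:
        yield component
        yield from iter_components(component.get("components", []))


def version_key(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' or '2.0-beta9' into a comparable tuple of leading numbers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def find_outdated(sboms: dict[str, dict], group: str, name: str,
                  fixed_in: str) -> list[tuple[str, str]]:
    """Return (project, version) pairs where group:name is older than fixed_in.

    Components without a parseable version are reported too, to err on the
    side of caution.
    """
    threshold = version_key(fixed_in)
    hits = []
    for project, sbom in sboms.items():
        for component in iter_components(sbom.get("components", [])):
            if component.get("group") == group and component.get("name") == name:
                version = component.get("version", "")
                if version_key(version) < threshold:
                    hits.append((project, version or "unknown"))
    return hits


# Hypothetical, already-parsed SBOMs for two projects.
sboms = {
    "billing-api": {"components": [
        {"group": "org.apache.logging.log4j", "name": "log4j-core", "version": "2.14.1"}]},
    "web-shop": {"components": [
        {"group": "org.apache.logging.log4j", "name": "log4j-core", "version": "2.17.1"}]},
}
print(find_outdated(sboms, "org.apache.logging.log4j", "log4j-core", "2.17.1"))
# → [('billing-api', '2.14.1')]
```

Walking nested components matters because CycloneDX allows components to contain sub-components, so a vulnerable library can sit several levels deep.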

Without an SBOM database, finding dependencies organization-wide is a labor-intensive task that requires a lot of communication and coordination. I think many of us have experienced this firsthand.

Hunting down vulnerabilities manually is very costly compared to a setup where SBOM data is accessible (and searchable) both to the teams themselves and to stakeholders who need an organization-wide view.

Helping with software license compliance

In addition to the dependencies themselves, SBOM data contains license information for the software and its dependencies. This data can be used to verify that the software licenses in use comply with company policy.

For example, when a new version of a library is released under a different license than the previous one, having SBOM data gathered makes the change visible.
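Both uses can be sketched in a few lines of Python. This is a minimal illustration, assuming CycloneDX-shaped dictionaries; the allowlist and the `com.example/widget` component are hypothetical, and SPDX license expressions (such as "Apache-2.0 OR MIT") are compared as plain strings rather than parsed.

```python
def component_key(component: dict) -> str:
    """Identify a component independently of its version, e.g. 'com.example/widget'."""
    return f"{component.get('group', '')}/{component.get('name', '')}"


def license_ids(component: dict) -> set[str]:
    """Collect license ids, names, or expressions from a CycloneDX component."""
    ids = set()
    for entry in component.get("licenses", []):
        if "expression" in entry:
            # Expressions such as "Apache-2.0 OR MIT" are kept verbatim, not parsed.
            ids.add(entry["expression"])
        license_obj = entry.get("license", {})
        value = license_obj.get("id") or license_obj.get("name")
        if value:
            ids.add(value)
    return ids


def policy_violations(sbom: dict, allowed: set[str]) -> dict[str, set[str]]:
    """Map each component to the licenses it uses that are not on the allowlist."""
    violations = {}
    for component in sbom.get("components", []):
        disallowed = license_ids(component) - allowed
        if disallowed:
            violations[component_key(component)] = disallowed
    return violations


def license_changes(old_sbom: dict, new_sbom: dict) -> dict[str, tuple[set[str], set[str]]]:
    """Report components present in both SBOMs whose licenses differ."""
    old = {component_key(c): license_ids(c) for c in old_sbom.get("components", [])}
    new = {component_key(c): license_ids(c) for c in new_sbom.get("components", [])}
    return {key: (old[key], new[key]) for key in old.keys() & new.keys() if old[key] != new[key]}


ALLOWED = {"Apache-2.0", "MIT", "BSD-3-Clause"}  # hypothetical company policy
old = {"components": [{"group": "com.example", "name": "widget", "version": "1.0",
                       "licenses": [{"license": {"id": "Apache-2.0"}}]}]}
new = {"components": [{"group": "com.example", "name": "widget", "version": "2.0",
                       "licenses": [{"license": {"id": "BUSL-1.1"}}]}]}
print(license_changes(old, new))     # → {'com.example/widget': ({'Apache-2.0'}, {'BUSL-1.1'})}
print(policy_violations(new, ALLOWED))  # → {'com.example/widget': {'BUSL-1.1'}}
```

Keying components by group and name, not version, is what lets the diff notice that the same library changed its license between releases. Note that components with no license data at all are not flagged here; a stricter policy might treat them as violations too.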

Giving insight into the technology actually used

Many companies think their software stack is based on a certain technology or programming language. For example, one company says it is a Java house, while another says it runs on Rust. While this might be true for the code and applications produced in-house, it is not the full picture, and, especially from a security perspective, such a company runs partly blindfolded.

Knowing which other libraries the stack consists of, beyond the in-house code and the frameworks it uses, gives the company a deeper understanding of its tech stack. With SBOM data at hand, a company can, for example, pay more attention to security issues and license compliance risks that stem from outside its main technology choices or programming languages.
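One simple way to see the "actual" technology mix is to count components per package ecosystem. The sketch below assumes each component carries a Package URL (purl) such as `pkg:maven/...` or `pkg:npm/...`, and reads the ecosystem from the purl's type segment. It is a simplified parser that ignores the finer points of the purl specification, and it only counts top-level components.

```python
from collections import Counter


def purl_type(purl: str) -> str:
    """Return the package type of a purl, e.g. 'maven' for 'pkg:maven/...'."""
    # A purl looks like "pkg:<type>/<namespace>/<name>@<version>"; this simplified
    # parser just takes the segment between "pkg:" and the first "/".
    if not purl.startswith("pkg:") or "/" not in purl:
        return "unknown"
    return purl[len("pkg:"):].split("/", 1)[0].lower()


def ecosystem_breakdown(sboms: list[dict]) -> Counter:
    """Count top-level components per package ecosystem across many SBOMs."""
    counts = Counter()
    for sbom in sboms:
        for component in sbom.get("components", []):
            counts[purl_type(component.get("purl", ""))] += 1
    return counts
```

For a self-declared "Java house", a breakdown where `npm` or `deb` entries rival `maven` ones is exactly the kind of blind spot described above.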

Conclusion

Collecting SBOM data helps companies gain insight into their tech stack in various ways.

Based on my own observations and other sources, many companies that collect SBOM information struggle to make the best use of it. Collecting SBOM data is an important step, but using it in a manner suited to the given company is where the real value lies.

In this series, I will cover not only the nature of SBOMs but also ways to put the data to valuable use.

One reason SBOMs matter today is recent regulation that explicitly mentions them.

One example is the Cyber Resilience Act, or CRA for short, a recently approved EU law that the European Parliament passed in 2024.

NOTE: At the time of writing, the CRA is not yet in force, as it must still be formally adopted by the Council of the EU.5,6

The EU law comes only a few years after a similar bill, the DHS Software Supply Chain Risk Management Act, passed the US House of Representatives in 2021. Before that, US President Biden issued Executive Order 14028 on Improving the Nation's Cybersecurity, also in 2021.

As I'm no lawyer, I won't go into the details of any of them; I mention them to give a broader perspective on the Software Bill of Materials and its criticality in modern software engineering.

In addition to regulation, many national security centers around the world have issued recommendations for using and collecting SBOM data.2

This ISACA blog post explains in more detail how regulations and recommendations from different government bodies shape not only the use of SBOMs but the software industry in general. I think it's also worth noting that the growing criticality of software shapes regulation in turn, as these things do not happen in isolation.


SBOM formats

SBOM data can be presented or delivered in many machine-readable formats. The two best-known vendor-agnostic formats are SPDX by the Linux Foundation and CycloneDX by OWASP. Both come in several variants and versions, such as CycloneDX JSON and CycloneDX XML.

In addition to those two, some tools such as Anchore Syft or GitHub's security tooling have their own formats, each with its own rationale.
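When SBOMs arrive in mixed formats, a first practical step is telling them apart. A minimal sketch for JSON documents only: CycloneDX JSON declares `"bomFormat": "CycloneDX"` at the top level, and SPDX 2.x JSON documents carry an `spdxVersion` field (for example `"SPDX-2.3"`). Anything else, including tool-specific formats and the XML variants, is reported as unknown here.

```python
import json


def detect_sbom_format(doc: dict) -> str:
    """Guess which SBOM format a parsed JSON document uses from its top-level fields."""
    if doc.get("bomFormat") == "CycloneDX":
        # CycloneDX JSON declares its format and schema version explicitly.
        return f"CycloneDX {doc.get('specVersion', '?')}"
    if "spdxVersion" in doc:
        # SPDX 2.x JSON documents carry a version string such as "SPDX-2.3".
        return doc["spdxVersion"]
    return "unknown"  # e.g. a tool-specific format


print(detect_sbom_format(json.loads('{"bomFormat": "CycloneDX", "specVersion": "1.2"}')))
# → CycloneDX 1.2
```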

| | SPDX | CycloneDX |
|---|---|---|
| Origin | Originally made for license compliance | Made specifically for SBOM |
| Data format | JSON/XML | JSON/XML |
| License | CC-BY-3.0 | Apache 2.0 |
| Backed by | Linux Foundation | OWASP |
| Characteristics | Comprehensive; contains detailed license information; supports comments and snippets. | Lightweight, making it easy to maintain, update, and consume; broad tooling support; supports digital signatures. |

Format comparison 3,4

I'm going to focus on CycloneDX JSON, as that's the format I'm most familiar with, but the core concepts and, broadly speaking, the data are the same regardless of which format you pick.

Example File

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.2",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "components": [
    {
      "type": "library",
      "publisher": "Apache",
      "group": "org.apache.tomcat",
      "name": "tomcat-catalina",
      "version": "9.0.14",
      "hashes": [
        {
          "alg": "MD5",
          "content": "3942447fac867ae5cdb3229b658f4d48"
        },
        {
          "alg": "SHA-1",
          "content": "e6b1000b94e835ffd37f4c6dcbdad43f4b48a02a"
        },
        {
          "alg": "SHA-256",
          "content": "f498a8ff2dd007e29c2074f5e4b01a9a01775c3ff3aeaf6906ea503bc5791b7b"
        },
        {
          "alg": "SHA-512",
          "content": "e8f33e424f3f4ed6db76a482fde1a5298970e442c531729119e37991884bdffab4f9426b7ee11fccd074eeda0634d71697d6f88a460dce0ac8d627a29f7d1282"
        }
      ],
      "licenses": [
        {
          "license": {
            "id": "Apache-2.0"
          }
        }
      ],
      "purl": "pkg:maven/org.apache.tomcat/tomcat-catalina@9.0.14"
    }
  ]
}

Above is a JSON structure containing SBOM information for a single dependency, tomcat-catalina@9.0.14. It shows the general structure and the main data: the file format in question, the library details, the license(s), and some metadata unique to this particular SBOM, such as its serial number.
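Reading that structure programmatically is straightforward. The sketch below embeds a trimmed copy of the example above (publisher and hashes omitted for brevity) and turns it into a one-line-per-component summary. The `summarize` helper is hypothetical and only looks at top-level components.

```python
import json

EXAMPLE = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.2",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "components": [
    {
      "type": "library",
      "group": "org.apache.tomcat",
      "name": "tomcat-catalina",
      "version": "9.0.14",
      "licenses": [{"license": {"id": "Apache-2.0"}}],
      "purl": "pkg:maven/org.apache.tomcat/tomcat-catalina@9.0.14"
    }
  ]
}
"""


def summarize(bom: dict) -> list[str]:
    """Return one header line plus one line per top-level component."""
    lines = [f"{bom['bomFormat']} {bom['specVersion']}, serial {bom.get('serialNumber', '?')}"]
    for component in bom.get("components", []):
        licenses = ", ".join(
            entry["license"].get("id") or entry["license"].get("name", "?")
            for entry in component.get("licenses", [])
            if "license" in entry
        )
        coordinates = f"{component.get('group', '')}:{component['name']}@{component.get('version', '?')}"
        lines.append(f"{component['type']} {coordinates} [{licenses}]")
    return lines


for line in summarize(json.loads(EXAMPLE)):
    print(line)
# → CycloneDX 1.2, serial urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79
# → library org.apache.tomcat:tomcat-catalina@9.0.14 [Apache-2.0]
```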

Although the file formats are standardized, that does not guarantee the data is consistent. License data, for example, varies: the same license can be written in many ways, such as Apache 2.0, apache-2.0, and Apache-2.0.
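A common first step in harmonizing such data is mapping free-form license strings onto SPDX license identifiers. The sketch below is a minimal illustration: it normalizes case and whitespace, then looks the result up among a handful of real SPDX identifiers plus a small hypothetical alias table. A production setup would load the full SPDX license list and a much larger alias set; unknown strings are returned unchanged so they can be reviewed by hand.

```python
import re

# A few real SPDX license identifiers; a real setup would load the full SPDX license list.
KNOWN_SPDX_IDS = ["Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-only"]
_BY_KEY = {spdx_id.lower(): spdx_id for spdx_id in KNOWN_SPDX_IDS}

# Hypothetical aliases seen in the wild, keyed by their normalized form.
ALIASES = {
    "apache-license-2.0": "Apache-2.0",
    "the-mit-license": "MIT",
}


def normalize_license(raw: str) -> str:
    """Map variants like 'Apache 2.0' or 'apache-2.0' to a canonical SPDX id."""
    # Lowercase and collapse whitespace/underscores into hyphens before lookup.
    key = re.sub(r"[\s_]+", "-", raw.strip().lower())
    return _BY_KEY.get(key) or ALIASES.get(key) or raw.strip()


print([normalize_license(x) for x in ["Apache 2.0", "apache-2.0", "Apache-2.0"]])
# → ['Apache-2.0', 'Apache-2.0', 'Apache-2.0']
```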

When aggregating data at the company level, accounting for all these nuances can easily push the project toward data engineering. Luckily, some products and tools help integrate SBOMs into company processes; more about those in the next issue of my SBOM series.

What’s next?

In the next part of my SBOM series, I will go through tooling and processes for gathering and utilizing SBOM data.

In the meantime, check out the awesomeSBOM/awesome-sbom GitHub repository for sources, materials, tools, and more. The details of CycloneDX can be found in the CycloneDX schema definition in its GitHub repository.

Do you need someone to help you gather and utilize SBOM data within your organization?


Do not forget to check my previous post where I briefly introduced myself and the newsletter.

Thanks for reading!

Best,
Pyry

P.S. I got my writing groove going for this issue by listening to:
