
Analysis

Whether you are researching new components or tracking open-source components in production, you can scan your applications throughout the software development lifecycle (SDLC). This article reviews the Sonatype evaluation process to demonstrate the best methods for analyzing your environment.

Sonatype Lifecycle Evaluation Process

  1. Evaluation using one of the Lifecycle integrations

    1. Scans all the file targets

    2. Searches for dependencies found in the lock and manifest files

    3. Includes software bills of materials (SBOMs) and modules.xml files

  2. Scans are sent through Lifecycle to the Sonatype Data Services for processing

  3. Component metadata and matches are returned to Lifecycle

  4. Lifecycle applies policies and any waivers to the raw results

  5. Violations and enforcement details are returned to the integration

  6. Notifications go out for new violations and webhooks are triggered

  7. The Lifecycle Reports UI is updated with the latest results
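Step 4 above, applying policies and waivers to the raw results, can be pictured with a minimal sketch. The `Finding` and `Policy` classes, the severity threshold, and the waiver representation are all hypothetical names chosen for illustration; they are not Lifecycle's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str   # hypothetical component identifier
    severity: float  # hypothetical 0-10 severity score


@dataclass(frozen=True)
class Policy:
    name: str
    min_severity: float  # a finding at or above this score violates the policy
    action: str          # "warn" or "fail"


def evaluate(findings, policies, waivers):
    """Return (policy name, component, action) for every unwaived violation.

    `waivers` is a set of (policy name, component) pairs to suppress.
    """
    violations = []
    for finding in findings:
        for policy in policies:
            violated = finding.severity >= policy.min_severity
            waived = (policy.name, finding.component) in waivers
            if violated and not waived:
                violations.append((policy.name, finding.component, policy.action))
    return violations


if __name__ == "__main__":
    findings = [Finding("pkg:maven/a/b@1.0", 9.1), Finding("pkg:maven/c/d@2.0", 5.0)]
    policies = [Policy("Security-Critical", 9.0, "fail"),
                Policy("Security-Medium", 4.0, "warn")]
    waivers = {("Security-Medium", "pkg:maven/c/d@2.0")}
    for violation in evaluate(findings, policies, waivers):
        print(violation)
```

Keeping waivers as a separate input mirrors the order in the list: raw results come back first, and waivers only suppress violations at policy-evaluation time.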

Advanced Binary Fingerprinting (ABF)

We examine the binary fingerprints of files (similar to a truncated SHA-1 hash), not just file names and manifests. ABF examines everything included in the application after the build, including any embedded dependencies. Sonatype data is tied to the component fingerprints of the files where a vulnerability is discovered, so when a vulnerability is reported, it is because that component fingerprint is in your application. For this reason, an ABF scan will not return false positives in its report. Depending on the environment, there are a few points where this might be confusing:

  1. The build includes development dependencies and clutter, such as testing frameworks left in the source control repository. This can be corrected by rescoping the scan to only the artifacts that are deployed to production.

  2. Embedded dependencies are renamed within the application. These cannot be detected by name matching or manifest scanning but will be caught by ABF.

  3. Vulnerable files are reused in other open-source components with completely different names. This secondary expansion of discovered vulnerabilities is unique to Sonatype evaluations.

  4. ABF can also be used with InnerSource Insight to track InnerSource components and apply policy to them.

  5. For Java applications, the ABF scanner uses an additional "partial matching" technology that can identify when a component is "similar" but not "identical" to the cataloged version. Development teams sometimes modify open-source code to "fix" vulnerabilities within the project. When those changes are not documented, they introduce technical debt and potential risk to the application. Modifying open-source components can also trigger license risk for components with weak copyleft licenses.
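To make the ideas of name-independent fingerprints and "similar but not identical" matching concrete, here is a small sketch. It is only an analogy: it uses a full SHA-1 digest and a Jaccard similarity over the files inside an archive as stand-ins, and the helper names (`fingerprint`, `entry_fingerprints`, `similarity`, `make_jar`) are hypothetical. Sonatype's actual fingerprinting and partial-matching algorithms are not described in this article.

```python
from __future__ import annotations

import hashlib
import io
import zipfile


def fingerprint(data: bytes) -> str:
    """Hash of the file content; the file name plays no part."""
    return hashlib.sha1(data).hexdigest()


def entry_fingerprints(archive_bytes: bytes) -> set[str]:
    """Fingerprint every file stored inside a zip/jar archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {fingerprint(zf.read(name))
                for name in zf.namelist() if not name.endswith("/")}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared fingerprints over all fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_jar(files: dict[str, bytes]) -> bytes:
    """Build an in-memory archive for the demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    cataloged = make_jar({"a/A.class": b"A", "a/B.class": b"B",
                          "a/C.class": b"C", "a/D.class": b"D"})
    patched = make_jar({"a/A.class": b"A", "a/B.class": b"B",
                        "a/C.class": b"C", "a/D.class": b"D-patched"})

    # Exact match: the same bytes are found whatever the file is called.
    catalog = {fingerprint(cataloged): "example-lib 1.0"}
    renamed_copy = {"vendor/totally-different-name.jar": cataloged}
    for path, data in renamed_copy.items():
        print(path, "->", catalog.get(fingerprint(data)))

    # Partial match: a locally patched build is similar, not identical.
    print(fingerprint(patched) in catalog)
    print(similarity(entry_fingerprints(cataloged), entry_fingerprints(patched)))
```

In this toy setup, the renamed copy still resolves to the cataloged component, while the patched archive misses the exact lookup and is only recognizable through the overlap of its inner files.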

Manifest Evaluation

Scans performed before the application has been built have no open-source binaries to fingerprint and must therefore rely on lock files and manifest files. This happens when directly analyzing source control repositories or software bills of materials earlier in the development lifecycle. The Lifecycle scanners use the lock or manifest files to determine what should be in the final application. The results include transitive dependencies only if they are listed in the lock file; otherwise, the scan reports only what is requested in the manifest.
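The difference between the two file types can be seen with npm as an example. The sketch below reads a `package.json` manifest and a `package-lock.json` in the `packages` layout (lockfile versions 2 and 3); the function names and the sample data are made up for illustration, and this is not how the Lifecycle scanners are implemented.

```python
import json


def components_from_manifest(manifest_text):
    """Direct dependencies only, with whatever version spec was requested."""
    manifest = json.loads(manifest_text)
    return dict(manifest.get("dependencies", {}))


def components_from_lock(lock_text):
    """Every resolved package, direct or transitive, with its exact version."""
    lock = json.loads(lock_text)
    resolved = set()
    for path, meta in lock.get("packages", {}).items():
        if path == "":  # the empty key describes the root project itself
            continue
        # "node_modules/a/node_modules/b" -> "b"; scoped names keep "@scope/".
        name = path.split("node_modules/")[-1]
        resolved.add((name, meta["version"]))
    return resolved


MANIFEST = '{"name": "demo", "dependencies": {"express": "^4.18.0"}}'
LOCK = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo", "dependencies": {"express": "^4.18.0"}},
        "node_modules/express": {"version": "4.18.2"},
        "node_modules/body-parser": {"version": "1.20.1"},
    },
})

if __name__ == "__main__":
    print(components_from_manifest(MANIFEST))
    print(sorted(components_from_lock(LOCK)))
```

The manifest yields one requested range, while the lock file yields two exact versions, including the transitive `body-parser` entry.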

  • Lifecycle integrations default to ABF scans because they reflect what is actually in your application.

  • They then look for lock files, followed by manifest files.

  • Earlier in the development and design process, before anything has been built, manifest scanning is the only option for providing feedback during component selection.
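The order of preference in the bullets above (ABF first, then lock files, then manifests) can be sketched as a simple selection function. The file-name sets below are a small illustrative sample and `choose_evaluation` is a hypothetical helper; neither reflects the list of files that Lifecycle integrations actually recognize.

```python
from pathlib import PurePath

# Illustrative samples only, not Lifecycle's supported-file list.
BINARY_SUFFIXES = {".jar", ".war", ".ear", ".dll", ".whl"}
LOCK_FILES = {"package-lock.json", "yarn.lock", "Gemfile.lock", "poetry.lock"}
MANIFEST_FILES = {"package.json", "pom.xml", "Gemfile", "pyproject.toml"}


def choose_evaluation(paths):
    """Pick the most accurate evaluation the available files allow."""
    files = [PurePath(p) for p in paths]
    if any(f.suffix.lower() in BINARY_SUFFIXES for f in files):
        return "abf"       # built artifacts exist: fingerprint them
    if any(f.name in LOCK_FILES for f in files):
        return "lock"      # exact resolved versions, including transitives
    if any(f.name in MANIFEST_FILES for f in files):
        return "manifest"  # only what was requested
    return "none"


if __name__ == "__main__":
    print(choose_evaluation(["target/app.war", "pom.xml"]))
    print(choose_evaluation(["package.json", "package-lock.json"]))
    print(choose_evaluation(["package.json"]))
```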

Note

As a best practice, we recommend that development teams include lock files with fixed versions alongside their project manifests. To guarantee a repeatable build, version ranges are ignored.
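A team can check its own manifests for range specifications before relying on a manifest scan. The sketch below separates fixed versions from ranges using an npm-style semantic version pattern; the regular expression and the `split_pinned_and_ranges` helper are assumptions made for illustration and say nothing about how Lifecycle parses version specifications.

```python
import re

# MAJOR.MINOR.PATCH with optional pre-release and build metadata.
PINNED = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def split_pinned_and_ranges(dependencies):
    """Separate exact versions from range specs such as ^, ~, >= or *."""
    pinned, ranges = {}, {}
    for name, spec in dependencies.items():
        target = pinned if PINNED.match(spec.strip()) else ranges
        target[name] = spec
    return pinned, ranges


if __name__ == "__main__":
    deps = {"left-pad": "1.3.0", "express": "^4.18.0",
            "lodash": ">=4 <5", "debug": "4.3.4-beta.1"}
    pinned, ranges = split_pinned_and_ranges(deps)
    print("pinned:", pinned)
    print("ranges:", ranges)
```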

Sonatype Lifecycle Data Analysis

Sonatype Lifecycle uses data derived from our automated vulnerability detection system. The system funnels many sources (NVD, GitHub commits, OSS Index, Sonatype research, etc.) through automated techniques such as data filtering, aggregation, and machine learning algorithms.

  • Most ecosystems have security, license, and identity data, while a few only have security and identity data.

  • License data includes open-source licenses identified in the package manifest and, in the case of Java, any licenses observed within the package itself.

  • Identity refers to component details such as recommendations, version graphs, or cataloged data pulled from the package manager repository.

  • We categorize these ecosystems as having either Premium or Standard data capabilities.

Premium Capabilities

For most ecosystems, Sonatype researchers triage incoming data and determine whether there is a vulnerability, creating a research ticket for further investigation when necessary. Tickets are prioritized and then entered into our human-curated research process. When research is complete, the results go into our data mart, which feeds Sonatype Data Services. Data from Sonatype Data Services is what you then see in the Lifecycle Dashboard and Application Composition report after an application scan.

Standard Capabilities

For ecosystems with only Security/Identity data, we report any known security vulnerabilities but may not include in-depth research or license data.

Ecosystem Support

| Language | Package Manager | Data | ABF/Manifest | Data Capabilities |
|---|---|---|---|---|
| Java | Maven, Gradle, Ivy | Security, Identity, License | Both | Premium for Java (Maven) |
| JavaScript | npm, yarn | Security, Identity, License | Both | Premium for JavaScript (npm) |
| .NET | NuGet | Security, Identity, License | Both | Premium |
| Ruby | RubyGems, Bundler | Security, Identity, License | Both | Premium for Ruby (RubyGems) |
| Go | Go Modules | Security, Identity, License | Manifest | Premium |
| Python | PyPI, Poetry, pipenv | Security, Identity, License | Both | Premium |
| RPM (Yum) | Yum, Fedora EPEL repo | Security, Identity, License | Both | Standard |
| C++ | Conan | Security, Identity, License | Manifest | Standard |
| PHP | Composer, Drupal | Security, Identity, License | Both | Standard |
| Objective-C | CocoaPods | Security, Identity, License | Manifest | Standard |
| Conda | Conda | Security, Identity | Manifest | Standard |
| R (CRAN) | CRAN | Security, Identity | Both | Standard |
| Rust | Cargo | Security, Identity | Both | Standard |
| Swift | Swift | Security, Identity | Manifest | Standard |

More on Analysis Examples Covered in this Section

The examples in this section use the IQ Server CLI to scan components in Maven format.

For more examples of scanning components from other popular package managers and package formats, refer to Referencing Package URL (purl) and Component Identifiers.
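As a brief illustration of what a Package URL looks like for a Maven component, the sketch below builds one from Maven coordinates following the general purl layout `pkg:maven/<groupId>/<artifactId>@<version>?type=<packaging>`. The `maven_purl` helper is hypothetical, and the referenced article remains the authority on which identifier formats Lifecycle accepts.

```python
from urllib.parse import quote


def maven_purl(group_id, artifact_id, version, packaging="jar"):
    """Build a Package URL string from Maven coordinates."""
    # Percent-encode each segment so unusual characters stay unambiguous.
    namespace = quote(group_id, safe="")
    name = quote(artifact_id, safe="")
    return f"pkg:maven/{namespace}/{name}@{quote(version, safe='')}?type={packaging}"


if __name__ == "__main__":
    print(maven_purl("org.apache.commons", "commons-lang3", "3.12.0"))
```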