Sonatype Vulnerability Data

Sonatype creates its data using a proprietary, automated vulnerability detection system that monitors, aggregates, and correlates publicly available information and applies machine learning to it.

We gather data from various sources, including the National Vulnerability Database (NVD), website security advisories, email lists, GitHub events, blogs, OWASP, OSS Index, X (formerly known as Twitter), and customer reports.

We have evaluated many paid-for services and found their data to be of limited quality and precision, which drove our decision to build an intelligent, automated vulnerability detection system.

The Sonatype Data Research team is not in the business of aggregating public security feeds; we create the precise data we use.

Unfortunately, not all security data is created equal, and some data from the above sources, specifically the NVD and public feeds, is incomplete. Often the incomplete data is missing vulnerabilities altogether, and automation alone is not sufficient to identify what is missing.

As a result, this data is highly curated by Sonatype’s research teams to fill in the gaps and improve accuracy.

How Sonatype provides high-quality data

There are two considerations for data quality:

the content of the security advisory
the precision of associating the content with the correct artifact

Automated decisions require precise identification of components and correct association of security information with them. Without accurate identification and association, the rate of false positives is high.

False positives incur unnecessary research and upgrade costs. False negatives leave you at risk because nothing indicates that you may be exposed. Sonatype uses a combination of automated identification and human research that eliminates false positives and negatives. This saves the research time spent disproving false positives and the rework time spent on upgrades that were not required.

When vulnerability data is available

Sonatype Data Services is continuously updated, so the most recent data is visible the instant a Lifecycle analysis occurs. This is true for both newly published components and newly discovered security issues. We have two processing queues for security vulnerabilities to ensure they are immediately available to our customers.

Fast-Track

Our automated vulnerability detection systems process various data sources each day. Upon discovery of an issue, a researcher ensures that an appropriate component was identified, a one-line summary exists, and that the vulnerable version range matches any available advisories. The Fast-Track process generally makes newly discovered vulnerabilities available in less than 24 hours, depending on the severity of the issue.

Deep Dive

After the Fast-Track process is complete, issues are selected to undergo the Deep Dive process based on our priority queue. During the Deep Dive process, issues undergo source code analysis to ensure there is an accurate vulnerable version range as well as detailed explanations, detections, and recommendations. The Deep Dive process may cause a change to the implicated components, CVSS score, and versions as we validate and correct the data provided from the initial Fast-Track process. Deep Dive generally takes 24 hours but may take up to 3 days for outliers.

There is no "refresh time" or delay between completing research and making the results of that research available to you as a customer. As soon as the research is completed, the results of that research will be available in new Lifecycle scans.

When a vulnerability is no longer found in violation details

This happens when the vulnerability that previously triggered a policy violation has been deleted by our research team.

At Sonatype, we strive for continuous improvement in our vulnerability detection system by maintaining high data quality. As part of the continuous data refresh, we not only add new vulnerabilities but also re-evaluate older vulnerability data to determine its security implications in an evolving threat landscape, and we remove data that is no longer relevant. This reduces false positives and eliminates unnecessary blockers in the development process.

As a result, you may notice that policy reports have fewer security violations, or some links do not return the same vulnerability data as before. Here are the most common causes:

The vulnerability from the NVD was determined to be in error after security review
The vulnerability was a duplicate and was consolidated into a single issue
The vulnerable version range was updated during the Deep Dive to correct the range set during Fast-Track
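
To illustrate the last cause, here is a minimal sketch of how a narrowed version range can make a violation disappear. The `VersionRange` class, the simple dotted-number version parsing, and the example ranges and version are all hypothetical and chosen for illustration; they do not represent Sonatype's actual data model or version-matching logic.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn a dotted numeric version such as '1.5.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class VersionRange:
    """Hypothetical vulnerable range: low is inclusive, high is exclusive."""
    low: str
    high: str

    def contains(self, version):
        return parse_version(self.low) <= parse_version(version) < parse_version(self.high)


# Hypothetical range taken from the initial advisory during Fast-Track.
fast_track_range = VersionRange("1.0.0", "2.4.0")
# Hypothetical corrected range after source code analysis during Deep Dive.
deep_dive_range = VersionRange("1.8.0", "2.4.0")

version_in_use = "1.5.2"
print(fast_track_range.contains(version_in_use))  # True: flagged after Fast-Track
print(deep_dive_range.contains(version_in_use))   # False: no longer flagged after Deep Dive
```

In this sketch, a component at version 1.5.2 is reported after Fast-Track but drops out of later reports once the range is corrected.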

Where component data is sourced

Component binaries come from popular public repositories like Central, NuGet.org, npmjs.org, Fedora EPEL, and PyPI. We also ingest components directly from GitHub and other project download sites when nominated by customers.

Binary repositories provide the ability to extract information like declared licenses, popularity, and release history. Additional component metadata comes from a variety of sources including direct research.

How the vulnerability scores are calculated

When new vulnerabilities are reported from sources other than the NVD, Sonatype uses the Common Vulnerability Scoring System (CVSS) version 3 to score vulnerabilities and assign a vulnerability identifier with the SONATYPE- prefix.

Sonatype researchers often come up with CVSS scores well before the NVD does because of the NVD's months-long backlog. These vulnerabilities may not have valid NVD CVSS scores assigned for some time, and when the NVD scores are published they may differ from the Sonatype score.

Both use the same CVSS metrics, but the Sonatype researcher came to a different conclusion about the threat than the NVD did. We make our determination after finding and analyzing the fix code and all available additional resources. It is common and expected that those scores may sometimes be different.
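
As an illustration of how two analysts using the same metrics can arrive at different scores, the sketch below computes a CVSS base score from a vector string. It assumes the CVSS v3.1 base-score equations and metric weights as published in the public specification; the two example vectors are hypothetical and differ only in Attack Complexity. It is not Sonatype's scoring tooling.

```python
import math

# Metric weights from the public CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    parts = dict(item.split(":") for item in vector.split("/"))
    scope_changed = parts["S"] == "C"
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[parts["PR"]]
    conf, integ, avail = (WEIGHTS[m][parts[m]] for m in ("C", "I", "A"))

    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
                      * pr * WEIGHTS["UI"][parts["UI"]])

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


# Two hypothetical assessments of the same issue, differing only in Attack Complexity.
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 8.1
```

A single differing judgment, here whether exploitation requires conditions outside the attacker's control, moves the score from 9.8 to 8.1.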

Data that Sonatype provides

The source of the advisory: Sonatype Security Research or the National Vulnerability Database
The severity of the issue: the CVSS score, the scoring system version, and the source of the score
The Common Weakness Enumeration (CWE)
The exact description from the advisory
A detailed explanation of the advisory risk and the attack vector (because the advisory description is often very poor)
How to determine if you are vulnerable
A recommendation on how to fix or work around the issue
The root cause of the issue: the exact class and vulnerable version range that were found in your code
Publicly known attack vectors or exploits; additional resources that describe the exact issue
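
For readers who consume this data programmatically, the fields listed above could be modeled roughly as follows. This is only a sketch: the class name, field names, and placeholder values are hypothetical and do not reflect the schema of any Sonatype API or report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VulnerabilityRecord:
    """Hypothetical container mirroring the fields listed above."""
    source: str                    # Sonatype Security Research or the NVD
    cvss_score: float              # severity score
    cvss_version: str              # scoring system version
    score_source: str              # who created the score
    cwe: str                       # Common Weakness Enumeration identifier
    advisory_description: str      # exact description from the advisory
    explanation: str               # detailed risk and attack vector explanation
    detection: str                 # how to determine if you are vulnerable
    recommendation: str            # how to fix or work around the issue
    root_cause: str                # exact class found in your code
    vulnerable_version_range: str  # version range found to be vulnerable
    references: List[str] = field(default_factory=list)  # known exploits, extra resources


# Placeholder values only; none of this describes a real vulnerability.
record = VulnerabilityRecord(
    source="Sonatype Security Research",
    cvss_score=8.1,
    cvss_version="3.1",
    score_source="Sonatype",
    cwe="CWE-79",
    advisory_description="<advisory text>",
    explanation="<detailed explanation>",
    detection="<how to check>",
    recommendation="<upgrade or workaround>",
    root_cause="<affected class>",
    vulnerable_version_range="[1.8.0, 2.4.0)",
)
print(record.source, record.cvss_score)
```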