A Primer for Storage Management in Nexus Repository 3

Repository Manager | Reading time: 10 minutes

Overview

Among its many abilities, Nexus Repository 3 optimizes the download and storage of repository data managed in your software development life cycle. Over time, more frequent and rigorous use of the application can become a burden on storage, so you’ll need to plan a storage management strategy.

Fortunately, the repository manager includes in-app support for disk optimization, cleanup, and connection to storage devices. This article gives you some background on data management with Nexus Repository. Plus you’ll get an inside look at the directories most affected by storage requirements. You’ll also learn about administrative tasks and configuration scenarios that’ll help leave less of a footprint on your storage devices.

Audience

Administrators or other operations personnel who manage Nexus Repository 3

Desired Outcomes

After reading this article we expect you to be able to:

  • Understand the importance of managing storage and what factors affect system performance over time.
  • Identify the directories that impact storage and how they function in external file systems.
  • Choose an appropriate file storage configuration when allocating component data to a larger storage framework.
  • Perform administrative tasks and terminal commands that clean up unnecessary content and optimize disk space.

Why Storage Management is Important

Nexus Repository communicates with clients such as build tools and IDEs. This in turn creates a systematic workload of client-to-server requests and file storage. If a solid plan is in place for managing your data, Nexus Repository will maintain steady performance. Plus it’ll reduce other bottlenecks that hinder production of good software.

On installation, the repository manager consumes approximately 500 MB of disk storage. As the application is used, the working data directory takes up additional space to store contents such as:

  • proxied components, cached from a remote server
  • components deployed to hosted repositories
  • indexes generated by search and browse functionality

Without a doubt, using Nexus Repository to manage all your applications will place greater demands on disk space. These demands could be the result of the following:

  • teams churning out daily project deployments via continuous deployment pipelines
  • retention of one or more outdated repositories
  • the size of repository contents such as Maven projects or Docker containers

It won’t take long to discover obsolete snapshots or other expendable data taking a toll on local storage. So, along with expanding disk space and improving response time you’ll need to properly configure removal policies and/or administrative tasks to rid the system of unwanted content.

Directories Used for Storage Management

Inside the application directory where you extract Nexus Repository, there are two directories: an installation directory and a data directory. The data directory, found at ../sonatype-work/nexus3, stores the repositories, components, and system configuration that power the application. This directory also holds the file system path used for your blob stores, as well as a database folder in which OrientDB keeps component metadata.

Note: Nexus Repository comes with a default blob store at ../sonatype-work/nexus3/blobs/default, which contains binaries associated with a repository.

To attach storage to a single-node installation, the external blob store can be configured at ../sonatype-work/nexus3/blobs/blobstore-name. In this example, the file-based directory is blobstore-name, and it contains the data associated with the repositories that use this blob store. Blob stores contain assets downloaded through caching proxies or published to hosted repositories. Their data is stored in a computer-readable, binary format.
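To see how much space each blob store directory currently occupies, a small script can walk the blobs folder and total the file sizes. The following is a minimal sketch, not a Sonatype tool: the function names are hypothetical, and the path in the example run assumes the default layout described above.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = Path(dirpath) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                # Files can disappear while a live instance is running; skip them.
                continue
    return total


def blob_store_sizes(blobs_dir: Path) -> dict[str, int]:
    """Map each directory directly under blobs_dir to its total size in bytes."""
    return {
        entry.name: directory_size_bytes(entry)
        for entry in sorted(blobs_dir.iterdir())
        if entry.is_dir()
    }


if __name__ == "__main__":
    # Assumed location; adjust to where your data directory actually lives.
    blobs = Path("../sonatype-work/nexus3/blobs")
    if blobs.is_dir():
        for name, size in blob_store_sizes(blobs).items():
            print(f"{name}: {size / 1024**2:.1f} MiB")
```

Running it periodically and comparing the numbers gives a rough growth trend per blob store, which can help when deciding which store to move first.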

For multi-server storage attachments, such as HA-C nodes, the blob store must be mounted to a shared storage location. Although the default blob store folder is created in the data directory, you can create additional blob stores in other directories or additional devices. First, be sure to determine the location of your blob store before moving it to a new location.

The information inside your data directory’s component metadata folder (../sonatype-work/nexus3/db/component) references all respective binaries found in the blob store folder. The repository manager’s databases rely on OrientDB, which is the primary store for repository metadata. OrientDB provides a command-line console that you can use to make data store adjustments.

As your repository data expands with more personnel, more applications, and more servers to manage, we advise that you separate the blob store from the data directory. This reduces the risk of data corruption when the OrientDB database runs out of space. In practice, blob stores can be created up front on large-capacity devices, in which case no database update is required. However, if you need to move a blob store later due to limited storage, you must configure the database to reference the newly migrated blob store.
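One quick way to check whether the blob store and the database already sit on separate storage is to compare the device IDs that the operating system reports for each directory. This is a minimal sketch under the assumption of a default directory layout; the function name is hypothetical.

```python
import os
from pathlib import Path


def on_same_filesystem(path_a: Path, path_b: Path) -> bool:
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Assumed default locations; adjust to your installation.
    blobs = Path("../sonatype-work/nexus3/blobs")
    database = Path("../sonatype-work/nexus3/db")
    if blobs.exists() and database.exists():
        if on_same_filesystem(blobs, database):
            print("Blob store and database share one file system.")
        else:
            print("Blob store and database are on separate file systems.")
```

If the two directories share a file system, a blob store that fills the disk also leaves the database without room to write.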

File System Configuration

Sonatype has approved a set of file systems to help you manage large directories of assets and metadata. Even so, we caution you to test your storage plan before deploying a production instance of Nexus Repository.

For optimal storage performance we recommend using a storage area network (SAN) to maintain repository data. Many Nexus Repository administrators use on-premises servers with the SAN model for proven speed and reliability. On the other hand, using a network file system (NFS) heightens the risk of performance degradation within your repository manager installation.

However, there are exceptions for NFSv4 and later. If you choose NFSv4, we recommend that you avoid mounting the entire data directory on this storage type; use it only for your blob store directory.

As of 3.15 we introduced dynamic storage for Nexus Repository. It’s only available for users with professional licenses. So if you’re working with this version or later, you can combine blob stores into a group then migrate the binaries to external media. See the guide on scaling storage for more information.

Recommended

With the growing popularity of cloud services we also support a variety of network file systems that you can partially mount on Amazon Cloud. Cloud services allow for a scalable location where you, your operations, or development teams can store and retrieve repository data in your SDLC.

Here are a few recommended file systems that you can use with your storage media:

Amazon Elastic File System

Amazon Elastic File System (EFS) is a service within the AWS ecosystem that can be used for blob storage with Nexus Repository 3. You can mount EFS on an Elastic Compute Cloud (EC2) instance with standard Linux commands, using the NFSv4 protocol.

NFSv4 provides traditional file access and handling operations for maintaining your blob data. You can follow the steps in this AWS tutorial.

Block Storage

If you choose the SAN model for storage, create an Amazon Elastic Block Store (EBS) volume and attach it to an EC2 instance. EBS is used in storage area network (SAN) environments where component data is stored in blocks. From the service UI and console you can connect EBS volumes to logical EC2 drives inside your repository manager installation.

Cloud Object Storage

It’s recommended to use Amazon Simple Storage Service (S3) for blob stores only if your installation is running on EC2 instances within AWS. If you integrate S3 into your architecture and workflow you’ll use it for backing up and restoring repository data. To date this service is only available for your backup solution, not for migrating blob store data.

Not Recommended

Some storage services and methods have either not been tested or have proven incompatible with Nexus Repository installations. For example, we don’t advise using file systems such as Linux FUSE and GlusterFS, since issues like search index corruption have been reported with them.

Additionally, heavy input/output (I/O) loads are known to be burdensome to versions of NFS prior to NFSv4. So, we recommend that you avoid storing repository data on NFS versions earlier than NFSv4, such as NFSv2 or NFSv3.
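On Linux, the NFS guidance above can be checked against the mount table. The sketch below parses text in the /proc/mounts format and lists mount points that appear to use an NFS version earlier than 4. It assumes that NFSv4 mounts show up with the file system type nfs4 or with a vers= (or nfsvers=) option of 4 or higher; mounts of type nfs with no version option are flagged for manual review. The function names are hypothetical.

```python
def is_pre_v4_nfs(fstype: str, options: str) -> bool:
    """Return True for an NFS mount whose version is below 4 or not stated."""
    if fstype != "nfs":
        return False  # "nfs4" and non-NFS file systems are not flagged
    for option in options.split(","):
        key, _, value = option.partition("=")
        if key in ("vers", "nfsvers") and value:
            major = value.split(".")[0]
            if major.isdigit():
                return int(major) < 4
    return True  # version not stated: flag it so someone checks by hand


def pre_v4_nfs_mount_points(mounts_text: str) -> list[str]:
    """Return mount points from /proc/mounts-style text that look like old NFS."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, file system type, options, dump, pass.
        if len(fields) >= 4 and is_pre_v4_nfs(fields[2], fields[3]):
            flagged.append(fields[1])
    return flagged


if __name__ == "__main__":
    try:
        with open("/proc/mounts", encoding="utf-8") as handle:
            for mount_point in pre_v4_nfs_mount_points(handle.read()):
                print(f"Review NFS version for: {mount_point}")
    except FileNotFoundError:
        print("/proc/mounts is not available on this platform.")
```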

Tasks and Policies to Reclaim Space

NOTE: We recommend that you determine your cleanup and related policies before disk space issues arise.

When the contents of your repository manager start to fill the disk, you can either run administrative tasks or create and run customized policies to reclaim space. With both features you can determine what contents to preserve, what to let go, and when to remove unnecessary contents.

Scheduled Tasks

NOTE: Tasks are available in all versions of Repository Manager 3.

To schedule these tasks in the UI, locate the Tasks submenu in the Administration panel and run them as needed. You can learn more details about them in our help guide.

  • Repository - Delete unused components
  • Maven - Delete SNAPSHOT
  • Maven - Delete unused SNAPSHOT
  • Docker - Delete incomplete uploads
  • Docker - Delete unused manifests and images

Cleanup Policies

Cleanup is a solution for removing components identified by format type. It allows you to select components of a single format, or you can select all to remove at once. To use the feature:

  1. Click Create Cleanup Policy from the Repository submenu.
  2. Name the policy and choose the format.
  3. Then specify criteria to remove contents published more than X days ago and last downloaded more than Y days ago.

You can preview the policies created from the UI, too.
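To make the two age criteria concrete, the following sketch models how a policy with "published before X days" and "last downloaded before Y days" might select components. It is an illustration only, not Nexus Repository's implementation: the Component class, the function name, the choice to require both criteria, and the treatment of never-downloaded components as stale are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Component:
    name: str
    published: datetime
    last_downloaded: Optional[datetime]  # None if it was never downloaded


def matches_policy(
    component: Component,
    now: datetime,
    published_before_days: int,
    last_downloaded_before_days: int,
) -> bool:
    """Return True when the component is older than both cutoffs."""
    published_cutoff = now - timedelta(days=published_before_days)
    downloaded_cutoff = now - timedelta(days=last_downloaded_before_days)
    old_enough = component.published < published_cutoff
    # Assumption for this sketch: a never-downloaded component counts as stale.
    stale = (
        component.last_downloaded is None
        or component.last_downloaded < downloaded_cutoff
    )
    return old_enough and stale


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    candidates = [
        Component("lib-a", datetime(2023, 1, 1), datetime(2023, 12, 1)),
        Component("lib-b", datetime(2023, 1, 1), datetime(2024, 1, 20)),
    ]
    for item in candidates:
        print(item.name, matches_policy(item, now, 30, 30))
```

In this model a component that is old but still being downloaded is kept, which is usually the point of combining the two criteria.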

Cleanup has a built-in scheduled task that executes the policies you create. You’ll need to run this task, which deletes the binaries associated with the policy. This feature was released in Repository Manager 3.14 and is not available in earlier versions.

When executed, all the tasks and policies will mark content for deletion from a blob store. In order to physically delete the content from your blob store, you need to run Admin - Compact blob store. This task hard deletes repository data, returning space to your disk.
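Before running the compact task, it can be useful to estimate how much space is waiting to be reclaimed. The sketch below rests on an assumption about the on-disk layout of a file blob store that you should verify against your own installation: each blob is stored as a pair of files named <id>.properties and <id>.bytes, and a blob marked for deletion carries a deleted=true line in its properties file. The function names are hypothetical, and the script only reads files.

```python
from pathlib import Path


def is_soft_deleted(properties_file: Path) -> bool:
    """Return True if the properties file contains a 'deleted=true' line."""
    text = properties_file.read_text(encoding="utf-8", errors="replace")
    return any(line.strip() == "deleted=true" for line in text.splitlines())


def reclaimable_bytes(blob_store_dir: Path) -> int:
    """Total the size of .bytes files whose sibling .properties is soft-deleted."""
    total = 0
    for properties_file in blob_store_dir.rglob("*.properties"):
        if is_soft_deleted(properties_file):
            bytes_file = properties_file.with_suffix(".bytes")
            if bytes_file.exists():
                total += bytes_file.stat().st_size
    return total


if __name__ == "__main__":
    # Assumed default blob store location; adjust to your installation.
    store = Path("../sonatype-work/nexus3/blobs/default")
    if store.is_dir():
        print(f"Approximately {reclaimable_bytes(store) / 1024**2:.1f} MiB reclaimable")
```

Treat the result as a rough estimate; the compact task itself remains the only supported way to remove the content.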

You can read more about Cleanup Policies in our help guide.

Additional Tips for Storage Management

In extreme events where I/O bottlenecks slow down productivity due to maxed-out space, examine your data directory from the command line. Then refer to these recommendations below.
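As a starting point for that examination, a short script can report how much free space remains on the file system that holds the data directory. This is a minimal sketch: the function name and the 10 percent warning threshold are arbitrary choices for illustration, and the path assumes the default layout.

```python
import shutil


def free_space_report(path: str, warn_below_fraction: float = 0.10) -> tuple[float, bool]:
    """Return (free fraction of the file system, whether it is below the threshold)."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < warn_below_fraction


if __name__ == "__main__":
    import os

    # Assumed default data directory; adjust to your installation.
    data_dir = "../sonatype-work/nexus3"
    if os.path.isdir(data_dir):
        fraction, low = free_space_report(data_dir)
        print(f"{fraction:.1%} free" + (" (low)" if low else ""))
```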

Blob Store

When the server’s nexus.log begins reporting errors that resemble this output:

npm ERR! javax.servlet.ServletException: org.sonatype.nexus.blobstore.api.BlobStoreException: BlobId: tmp$eb796cb8-7fe6-4306-83ac-3143ced2eb38, java.io.IOException: No space left on device

We recommend that you run the Admin - Compact blob store task and follow the steps in What to Do When the Blobstore is Out of Disk Space. Resolving this issue also requires that you access the OrientDB database when upgrading your hardware or moving to a larger storage service.

OrientDB

In this example, your metadata database (OrientDB) runs out of storage. The database is more likely to run out of disk space when it shares the same location as the blob store. See below for a sample case and solution.

When your system experiences an error that resembles this output:

com.orientechnologies.orient.core.exception.OLowDiskSpaceException: Error occurred while executing a write operation to database 'component' due to limited free space on the disk (0 MB). The database is now working in read-only mode. Please close the database (or stop OrientDB), make room on your hard drive and then reopen the database.

This means OrientDB is completely full. It automatically switches to read-only mode. Write access will be deactivated until you upgrade to more space. Review the recommended steps in What to Do When the Database is Out of Disk Space.
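Both failure modes leave recognizable text in the logs, so a simple scan can tell you which one you are dealing with. The sketch below searches log lines for the two messages quoted in this section. The function name, the labels, and the log file location in the example run are assumptions for illustration.

```python
from typing import Iterable

# Substrings taken from the two error messages quoted in this section.
SIGNATURES = {
    "blob store out of space": "No space left on device",
    "OrientDB low disk space": "OLowDiskSpaceException",
}


def find_disk_space_errors(log_lines: Iterable[str]) -> dict[str, list[int]]:
    """Map each signature label to the 1-based line numbers where it appears."""
    hits: dict[str, list[int]] = {label: [] for label in SIGNATURES}
    for number, line in enumerate(log_lines, start=1):
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits[label].append(number)
    return hits


if __name__ == "__main__":
    import os

    # Assumed log location under the data directory; adjust as needed.
    log_path = "../sonatype-work/nexus3/log/nexus.log"
    if os.path.isfile(log_path):
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            for label, lines in find_disk_space_errors(handle).items():
                print(f"{label}: {len(lines)} occurrence(s)")
```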

Optimal Mount Options

Note: Adjust mount options at the discretion of your Operations team.

noatime: If your installation runs on Linux, you can reduce disk I/O with a mount option called noatime. This attribute tells your file system not to record the last access time for files in the blob store, even if the operation is a read. Consult the manuals for /etc/fstab and the mount command for more information.
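To confirm whether the file system that holds a blob store is already mounted with noatime, you can look up the mount entry that covers its path. The sketch below parses text in the /proc/mounts format and picks the longest mount point that contains the path. The function names and the example path are hypothetical, and mount points containing spaces (which /proc/mounts escapes) are not handled.

```python
def mount_entry_for(path: str, mounts_text: str):
    """Return (mount_point, options) for the longest mount point containing path."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3].split(",")
        # Compare with trailing slashes so /mnt/blobs does not match /mnt/blobs2.
        prefix = mount_point.rstrip("/") + "/"
        if (path.rstrip("/") + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, options)
    return best


def uses_noatime(path: str, mounts_text: str) -> bool:
    """Return True when the mount that covers path lists the noatime option."""
    entry = mount_entry_for(path, mounts_text)
    return entry is not None and "noatime" in entry[1]


if __name__ == "__main__":
    import os

    # Hypothetical blob store path; pass an absolute path from your own system.
    blob_path = os.path.abspath("../sonatype-work/nexus3/blobs")
    try:
        with open("/proc/mounts", encoding="utf-8") as handle:
            print("noatime set:", uses_noatime(blob_path, handle.read()))
    except FileNotFoundError:
        print("/proc/mounts is not available on this platform.")
```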

Conclusion

At its core, Nexus Repository is an I/O-intensive application that can read and write large volumes of data. To avoid performance bottlenecks, develop a storage strategy that suits your organization’s needs. Consider factors such as team headcount, workflow, system architecture, and extent of repository usage, since they affect how you will provision and oversee data storage resources.

In the near future, Sonatype’s Repository team plans to improve functionality so that you can reduce unnecessary repository data with more flexible tools and features. So keep an eye out for supplemental guides and learning materials on storage, blob store management, and more.