A Primer for Storage Management in Nexus Repository 3

Repository Manager | Reading time: 11 minutes

Overview

Among its many abilities, Nexus Repository 3 is a tool designed to optimize download and storage of repository data managed in your software development life cycle. Over time, more frequent and rigorous use of the application places a growing burden on storage, so you’ll need to plan a storage management strategy.

Fortunately, the repository manager includes in-app support for disk optimization, cleanup, and connection to storage devices. This article gives you some background on data management with Nexus Repository. Plus you’ll get an inside look at the directories most affected by storage requirements. You’ll also learn about administrative tasks and configuration scenarios that’ll help leave less of a footprint on your storage devices.

Audience

Administrators or other Operations Personnel that manage Nexus Repository 3

Desired Outcomes

After reading this article we expect you to be able to:

  • Understand the importance of managing storage and what factors affect system performance over time.
  • Identify the directories that impact storage and how they function in external file systems.
  • Choose an appropriate file storage configuration when allocating component data to a larger storage framework.
  • Perform administrative tasks and terminal commands that clean up unnecessary content and optimize disk space.
  • Select the best sample scenario to scale storage requirements across team size.

Why Storage Management is Important

Nexus Repository communicates with clients such as build tools and IDEs. This in turn creates a systematic workload of client-to-server requests and file storage. If a solid plan is in place for managing your data, Nexus Repository will maintain steady performance. Plus it’ll reduce other bottlenecks that hinder production of good software.

On installation, the repository manager consumes approximately 500 MB of disk storage. As the application is used, the working data directory takes up additional space to store contents such as:

  • proxied components, cached from a remote server
  • components deployed to hosted repositories
  • indexes generated by search and browse functionality

Without a doubt, using Nexus Repository to manage all your applications will call for greater demands on disk space. These demands could be the result of the following:

  • teams churning out daily project deployments via continuous deployment pipelines
  • retention of one or more outdated repositories
  • the size of repository contents such as Maven projects or Docker containers

It won’t take long to discover obsolete snapshots or other expendable data taking a toll on local storage. So, along with expanding disk space and improving response time you’ll need to properly configure administrative tasks to rid the system of unwanted content.

Directories Used for Storage Management

Inside the application directory where you extract Nexus Repository, there are two directories: an installation directory and a data directory. The data directory, found at ../sonatype-work/nexus3, stores repositories, components, system configuration, and other data that power the application. This directory contains both a default blob store folder and a database folder where OrientDB stores component metadata.

In a single-node installation, the blob store (../sonatype-work/nexus3/blobs) is found in the parent data directory. This folder contains assets downloaded from a caching proxy or published to hosted repositories. Its data is stored in a computer-readable, binary format.

For a multi-server installation, such as HA-C nodes, the blob store must be mounted to a shared storage location. Although the default blob store folder is created in the data directory, you can create additional blob stores in other directories or on additional devices. Before moving a blob store, be sure to determine its current location.

The information inside your data directory’s component metadata folder (../sonatype-work/nexus3/db/component) references all respective binaries found in the blob store folder. The repository manager’s databases rely on OrientDB which is the primary store for repository metadata. OrientDB provides a command-line tool so you can access its console to make data store adjustments.

As your repository data expands with more personnel, more applications, and more servers to manage, we advise that you separate the blob store from the data directory. This reduces the risk of data corruption if the OrientDB database runs out of space. In practice, blob stores can be created on large-capacity devices from the start, in which case no database update is required. However, if you need to move a blob store later due to limited storage, you must configure the database to reference the newly migrated blob store.
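To see how much space each folder uses and whether the two share a device, a small script can help. The following is a minimal sketch, not a Sonatype tool: the folder names blobs and db come from the paths above, while the function names are hypothetical and chosen only for illustration.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked content is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def shares_device(path_a, path_b):
    """Return True when both paths live on the same file system device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def storage_report(data_dir):
    """Summarize the blob store and database folders of a data directory."""
    blobs = os.path.join(data_dir, "blobs")
    db = os.path.join(data_dir, "db")
    return {
        "blobs_bytes": directory_size(blobs),
        "db_bytes": directory_size(db),
        "same_device": shares_device(blobs, db),
    }
```

Calling storage_report on your data directory returns the two sizes in bytes. If same_device is True, the blob store and the database compete for the same free space, which is the situation this section advises you to avoid as data grows.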

File System Configuration

Sonatype has approved file systems to help you manage large directories of assets and metadata. Even so, we caution you to test your storage plan before deploying a production instance of Nexus Repository.

For optimal storage performance we recommend using a storage area network (SAN) to maintain repository data. Many Nexus Repository administrators use on-premises servers with the SAN model for proven speed and reliability. On the other hand, using a network file system (NFS) heightens the risk of performance degradation within your repository manager installation.

However, there are exceptions reserved for NFSv4 and later. If you choose NFSv4 we recommend that you avoid mounting the entire data directory to this storage type; use it only for your blob store directory.

Recommended

With the growing popularity of cloud services we also support a variety of network file systems that you can partially mount on Amazon Web Services (AWS). Cloud services provide a scalable location where you and your operations or development teams can store and retrieve repository data in your SDLC.

Here are a few recommended file systems that you can use with your storage media:

Amazon Elastic File System

Amazon Elastic File System (EFS) is a service within the AWS ecosystem. It can be used for blob storage with Nexus Repository 3. You can mount EFS to an Elastic Compute Cloud (EC2) instance with standard Linux commands, using the NFSv4 protocol.

NFSv4 provides the standard file access and handling operations needed to maintain your blob data. You can follow the steps in this AWS tutorial.

Block Storage

If you choose the SAN model for storage, create an Amazon Elastic Block Store (EBS) volume and attach it to an EC2 instance. EBS is used in storage area network (SAN) environments where component data is stored in blocks. From the service UI and console you can connect EBS volumes to logical EC2 drives inside your repository manager installation.

Cloud Object Storage

It’s recommended to use Amazon Simple Storage Service (S3) for blob stores only if your installation is running on EC2 instances within AWS. If you integrate S3 into your architecture and workflow you’ll use it for backing up and restoring repository data. To date this service is only available for your backup solution, not for migrating blob store data.

Not Recommended

Some storage configuration services and methods have either not been tested or have proven incompatible with Nexus Repository installations. For example, we don’t advise using file systems such as Linux FUSE and GlusterFS, since issues like search index corruption have been reported with them.

Additionally, heavy input/output (I/O) workloads are known to strain versions of NFS prior to NFSv4. So, we recommend you avoid storage mounted over earlier NFS versions such as NFSv2 or NFSv3.
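The recommendations above can be checked against a Linux mount table. The following is a minimal sketch that assumes the /proc/mounts line layout (device, mount point, file system type, options) and assumes that NFSv4 mounts appear as type nfs4 or carry a vers= option, which you should confirm on your own hosts. The function name and the sample lines are hypothetical.

```python
def classify_mount(line):
    """Classify one mount-table line against the guidance in this article."""
    _device, mount_point, fs_type, options = line.split()[:4]
    if fs_type.startswith("fuse"):
        # Covers types such as fuse, fuseblk, and fuse.glusterfs.
        return mount_point, "not recommended: FUSE-based"
    if fs_type == "nfs4":
        return mount_point, "ok for blob stores: NFSv4"
    if fs_type == "nfs":
        for option in options.split(","):
            if option.startswith(("vers=", "nfsvers=")):
                # "vers=4.1" -> major version 4
                major = int(option.split("=", 1)[1].split(".")[0])
                if major >= 4:
                    return mount_point, "ok for blob stores: NFSv4"
                return mount_point, "not recommended: NFS before v4"
        return mount_point, "unknown NFS version"
    return mount_point, "no NFS or FUSE concern"


# Hypothetical sample lines in /proc/mounts format.
SAMPLE_MOUNTS = [
    "fileserver:/export /mnt/blobs nfs4 rw,relatime,vers=4.1 0 0",
    "filer:/export /mnt/old nfs rw,vers=3,proto=tcp 0 0",
    "gluster1:/vol /mnt/gluster fuse.glusterfs rw,relatime 0 0",
    "/dev/xvdf /data ext4 rw,noatime 0 0",
]
```

On a real host you would read the lines from /proc/mounts and look up the entry that holds your blob store directory.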

Administrative Tasks to Reclaim Space

When your repository manager approaches a full disk you can run administrative tasks to regain space. Nexus Repository 3 allows you to schedule in-app tasks to manage your disk. You can determine what contents you need to preserve, what to let go, and when to dump unnecessary contents.

To schedule these tasks in the UI, locate the Tasks submenu in the Administration panel and run them as needed. You can learn more details about them in our help guide.

  • Repository - Delete unused components
  • Maven - Delete SNAPSHOT
  • Maven - Delete unused SNAPSHOT
  • Docker - Delete incomplete uploads
  • Docker - Delete unused manifests and images

When executed, the tasks above only mark content for deletion from a blob store. To physically delete the content from your blob store you need to run the Admin - Compact blob store task. This task hard deletes repository data, returning space to your disk.
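If you prefer to trigger the compact task from a script instead of the UI, the sketch below shows one possible approach. It rests on assumptions you should verify against your version's documentation: that your instance exposes a tasks REST API under /service/rest/v1/tasks, that the task list is returned in an items field, and that the compact task's type identifier is blobstore.compact. The function names, host, and credentials are hypothetical.

```python
import base64
import urllib.request

# Assumed type identifier for "Admin - Compact blob store"; verify on your instance.
COMPACT_TYPE = "blobstore.compact"


def find_tasks_by_type(tasks_payload, type_id):
    """Return the tasks in a decoded task-list response that match type_id."""
    return [t for t in tasks_payload.get("items", []) if t.get("type") == type_id]


def build_run_request(base_url, task_id, user, password):
    """Build (but do not send) a POST request that asks a task to run."""
    # Assumed endpoint layout: <base>/service/rest/v1/tasks/<id>/run
    url = f"{base_url.rstrip('/')}/service/rest/v1/tasks/{task_id}/run"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


def run_task(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the request separately from sending it lets you inspect the URL before anything runs. The task must already exist in the Tasks list for this to work; the sketch does not create one.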

Additional Tips for Storage Management

In extreme events where I/O bottlenecks slow down productivity due to maxed-out space, examine your data directory from the command line. Then refer to these recommendations below.

Blob Store

When nexus.log begins reporting errors that resemble this output:

npm ERR! javax.servlet.ServletException: org.sonatype.nexus.blobstore.api.BlobStoreException: BlobId: tmp$eb796cb8-7fe6-4306-83ac-3143ced2eb38, java.io.IOException: No space left on device

We recommend you run the Admin - Compact blob store task. Follow the steps that start in What to Do When the Blobstore is Out of Disk Space. Resolving this issue also requires that you access the OrientDB database when upgrading your hardware or moving to a larger storage service.

OrientDB

In this example, your metadata database (OrientDB) runs out of storage. The database is more likely to run out of disk space when it shares the same location as the blob store. See below for a sample case and solution.

When your system experiences an error that resembles this output:

com.orientechnologies.orient.core.exception.OLowDiskSpaceException: Error occurred while executing a write operation to database 'component' due to limited free space on the disk (0 MB). The database is now working in read-only mode. Please close the database (or stop OrientDB), make room on your hard drive and then reopen the database.

This means the disk that hosts OrientDB is completely full, and the database has automatically switched to read-only mode. Write access stays disabled until you free up or add space. Review the recommended steps in What to Do When the Database is Out of Disk Space.
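The two error signatures above can be told apart automatically. The following is a minimal sketch that scans log lines for the exception names quoted in this section and reports which store appears to be out of space. The function names are hypothetical, and the matching strings are taken only from the sample output shown here, so real logs may need broader patterns.

```python
import shutil

# Markers copied from the sample errors in this article.
DATABASE_FULL = "OLowDiskSpaceException"
BLOB_STORE_ERROR = "BlobStoreException"
NO_SPACE = "No space left on device"


def diagnose(log_lines):
    """Return a sorted list of the stores that reported a full disk."""
    findings = set()
    for line in log_lines:
        if DATABASE_FULL in line:
            findings.add("database")
        elif BLOB_STORE_ERROR in line and NO_SPACE in line:
            findings.add("blob store")
    return sorted(findings)


def free_megabytes(path):
    """Return the free space, in whole megabytes, on the device holding path."""
    return shutil.disk_usage(path).free // (1024 * 1024)
```

You could feed diagnose the lines of nexus.log and then call free_megabytes on the blob store and database folders to confirm which device needs attention.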

noatime

Note: Use this option at the discretion of your Operations team.

If your installation runs on Linux, you can reduce disk I/O with a mount option called noatime. This option tells your file system not to update a file’s last access time when the file is read. Without it, reads can trigger metadata writes to disk, which add up in a rapid development lifecycle and can slow down your system over time. Enabling noatime removes those extra writes.
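To confirm whether a given directory already sits on a file system mounted with noatime, you can inspect the mount table. The following is a minimal sketch that assumes the /proc/mounts text layout and picks the mount point that is the longest prefix of the path. The function names and the sample mount table are hypothetical.

```python
def parse_mounts(text):
    """Return (mount_point, options) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[1], fields[3].split(",")))
    return entries


def mount_for(path, entries):
    """Pick the entry whose mount point is the longest prefix of path."""
    target = path.rstrip("/") + "/"
    best = None
    for mount_point, options in entries:
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, options)
    return best


def uses_noatime(path, mounts_text):
    """Return True when the mount that holds path lists the noatime option."""
    entry = mount_for(path, parse_mounts(mounts_text))
    return entry is not None and "noatime" in entry[1]


# Hypothetical mount table: only /data is mounted with noatime.
SAMPLE = (
    "/dev/xvda1 / ext4 rw,relatime 0 0\n"
    "/dev/xvdf /data ext4 rw,noatime 0 0\n"
)
```

On a real host you would pass the contents of /proc/mounts as mounts_text and the absolute path of your blob store or data directory as path.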

Baseline Recommendations for Memory by Team

Part of your storage plan should also include strategies for scaling across teams. Of course, the use cases will vary. A team of a dozen managing ten repositories and a team storing 6 terabytes of data will follow different guidelines. Bottom line, you’ll need a benchmark to determine the minimum amount of physical RAM necessary to write information to the data directory and read from the Nexus Repository server.

Since storage consumption and team size are tied to RAM requirements, we offer instance profiles that you can measure against your personnel, repository, and storage needs. The examples below cover small, medium, and large teams and give the most logical memory baseline for each repository manager installation.

Small Team

In this example, you’re part of a small team that produces applications in a single format type. Your team manages fewer than 20 repositories with a blob store size that doesn’t exceed 20 gigabytes of repository data. For this use case we recommend you use 4 gigabytes of RAM. This is a minimum requirement for a first-time installation of Nexus Repository 3.

Medium-sized Team

In this example your organization has grown and you’re managing more staff and more repositories. The total number of repositories now reaches nearly 50, with a total blob store size just shy of 200 gigabytes.

Also, you’re no longer part of a one-format organization. Your team now builds applications with a couple of additional formats. With this heavier demand on I/O operations we recommend you double your RAM to 8 gigabytes.

Large, Enterprise Team

In this final example, your team has grown significantly due to rapid development and automation. The number of repositories has also grown to well over 50. Additionally, the total size of your blob stores has exceeded 200 gigabytes of repository data. So, we suggest you upgrade to 16 gigabytes of RAM or more.
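The three profiles above can be expressed as a simple lookup. The following is a minimal sketch, not an official sizing tool. The exact cut-offs between profiles are an assumption, since the article gives only approximate figures, and the sketch also assumes that whichever measure is larger decides the profile. The function name is hypothetical.

```python
def recommended_ram_gb(repository_count, blob_store_gb):
    """Return the baseline RAM in gigabytes for the matching team profile."""
    # Small team: fewer than 20 repositories and at most 20 GB of blobs.
    if repository_count < 20 and blob_store_gb <= 20:
        return 4
    # Medium team: up to about 50 repositories and up to about 200 GB.
    if repository_count <= 50 and blob_store_gb <= 200:
        return 8
    # Large team: anything beyond the medium profile (16 GB or more).
    return 16
```

For example, 45 repositories with 180 gigabytes of blob data fall into the medium profile, while 10 repositories with 300 gigabytes are treated as large because the blob store size exceeds the medium range.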

Conclusion

At its core, Nexus Repository is an I/O-intensive application that can read and write large volumes of data. To avoid performance bottlenecks, develop a storage strategy that suits your organization’s needs. That strategy should account for team headcount, workflow, system architecture, and extent of repository usage, since these factors affect how you provision and oversee data storage resources.

In the near future, Sonatype’s Repository team plans to improve functionality, allowing you to reduce unnecessary repository data with more flexible tools and features. So keep an eye out for supplemental guides and learning materials on storage, blob store management, and more.