Starting with Staging and Cleanup

Repository Manager | Reading time: 11 minutes

Overview

Staging is a simple but powerful feature in Nexus Repository that lets you move artifacts from repository to repository with your CI/CD tools. You can use this to build workflows with quality checks so artifacts are never used before they’re ready. Staging also connects with other features like Cleanup, helping you keep your build pipelines lean and light on storage space. Adopting staging after repositories are already integrated with build pipelines may require a fair amount of rework. This guide reviews the concept of staging, how to set up a basic staging workflow, and the steps to migrate legacy environments.

Here are some quick benefits to staging:

  • Repository endpoints at each stage in the pipeline
  • Artifacts are not copied or duplicated
  • Everything as a release candidate
  • Promotion workflow is driven by CI
  • Cleanup is automated based on stage policy

Why Staging?

The concepts behind staging come from requirements for modern CI/CD pipelines which are not managed effectively with just a single repository. A few common requirements are:

  • Create a release candidate to test or discard before being promoted
  • Keep build metadata for release artifacts
  • Avoid dependencies on non-production artifacts
  • Clean up fast based on artifact lifecycle

In Nexus Repository Manager, the repository an artifact belongs to is independent of where the artifact is stored on disk. When a file is uploaded, it is stored in a blobstore, and its repository path is recorded as metadata in a database. At build time, artifacts may also be tagged with additional metadata needed by the CI pipeline, such as build and environment details. These tags are used to group the artifacts needed for a release and promote them through the release process. This workflow gives build administrators complete control over the artifact’s lifecycle from an automated pipeline.

Artifact Lifecycle

Another way to think about staging is the lifecycle of the artifacts your organization produces. In this context, an artifact is one of the tangible by-products produced during software development. Artifacts commonly go through a set of lifecycle phases, or ‘stages’, where they are promoted from one stage to the next as they pass automated quality checks. A core DevSecOps principle is to fail fast. If a release candidate fails its checks, it is quickly dropped so that it does not take up time and space in the workflow. Anything left around too long will be removed by regular cleanup policies.

Production environments need to be protected from artifacts that have not passed the required quality checks. The risk to organizations is too great. Stages make it easier to control access to artifacts through the pipeline. Here are the stages commonly found in use.

  1. Development > build first stores artifact in a hosted repository
  2. Testing > release candidates are promoted for quality testing
  3. Production > artifacts made available to production environments
  4. End of Life > artifacts are deleted or archived to match retention policies

Staging Environments

The NXRM best practice in staging is to funnel access through group repositories aligned to lifecycle stages. This gives administrators control over the mix of repositories, and in turn artifacts, available in a given stage. This mix can later be edited without having to change the access endpoints used by the pipeline. Even with one repository in a group, this step will make the management of environments easier in the long term.

Deploying to a repository, tagging release candidates, and promoting artifacts should all happen as part of an automated CI/CD pipeline. Any dependence on a manual workflow may lead to errors and does not scale well. This is why the staging and tagging functions are done through the platform plugins and API calls. The workflow can be simulated through NXRM’s built-in Swagger interface to test out the calls mentioned below.

Planning for Cleanup

Planning for cleanup is a Nexus administrator’s top priority. The number of artifacts moving into the repository will need to be matched by the artifacts being cleaned. This is critical to keeping server costs manageable and avoiding running out of space. Staging environments are an effective means to clean up the repository.

Cleanup policies align most naturally with an artifact’s lifecycle when they select components by age and by last-downloaded date. Review the Cleanup Guide for suggestions on configuring this effectively for different stages and formats.

Format-based vs Team-based Repositories

When designing the initial layout of repositories in NXRM3, two models are commonly used.

Model: format-based
Description: a repository is made for each language format
Examples: Maven, npm, NuGet, PyPI
Traits:
  • enforce clear naming convention for team name-space
  • content selectors for access enforcement
  • simple staging and cleanup workflow
  • scales for large organizations
  • supports inner-source components

Model: team-based
Description: a repository is made for each team or line of business
Examples: team-a, team-b
Traits:
  • easy to set up and manage access
  • clear separation between business units
  • staging and cleanup per team
  • can lead to repository sprawl

Which model you use can have meaningful implications for staging.

Format-based:
  • easier to set up and deploy staging because only a single set of staging environments are needed per format
  • one size fits all model may not work with very different teams with different release cadence and requirements
  • creating meaningful unique tags may pose a challenge to avoid overlapping with other teams
  • cleaning up tags may affect other teams
  • cleanup policies are time-based
  • teams with different release cadence may have different cleanup requirements

Team-based:
  • trying to use a single staging environment can be very difficult to manage many team repositories
  • team staging environments will result in a large number of repositories/groups to manage
  • teams can have their pipeline customized to best fit their needs
  • operations are repository-specific resulting in fewer issues with overlapping tags
  • teams can have their cleanup match their release cadence

Generally, the best practice is to standardize on the format-based model; however, there are very good reasons to leverage the team-based model. This is especially true when the number of teams will not grow over time and their pipelines will not overlap. Some organizations may wish to leverage both models, where most development teams share the same pipeline and a limited few keep their own. With either model, deploying the staging environments is a similar process. Plan the repository naming carefully to make the environments easier to manage in the long run.

Starting with Staging

Staging in NXRM3 requires a few basic steps to implement.

  1. Set up the staging environment using group repositories and separate hosted repositories for each stage.
    • Map out the user access requirements for each stage.
    • It may be easier to add additional stages once the basic workflow is implemented.
  2. Use the REST API endpoints to move artifacts from one hosted repository to another.
    • Optionally, tagging can be used to label artifacts with the release candidate identifier.
    • These tags will be used as the target in the move REST call.
  3. Apply cleanup policies to each stage of the workflow.

Repository Configuration

Staging is intended to fit into an organization’s existing workflow and should not require a complex setup. To start, add group repositories with a corresponding hosted repository for each stage. Here is a simple example.

Environment          Development                           Testing                Production
group (read)         maven-dev                             maven-uat              maven-prod
stage (write/move)   maven-dev-hosted                      maven-uat-hosted       maven-prod-hosted
proxy                maven-central-proxy                   maven-central-proxy    maven-central-proxy
read-only            maven-uat-hosted, maven-prod-hosted   maven-prod-hosted      (none)

In this example, there are three stages; however, this could be done with just two: dev and prod.

  • Each stage is made of one group repository and one hosted repository for that stage.
  • A proxy to Maven Central can be shared by all stages where needed.
  • The hosted repositories of later stages may be added to the previous groups.
    • In development, this is for build tools to determine the ‘latest’ versions of an artifact available by having access to all the available artifacts under the single group repository.
    • It may also be necessary to have the production artifacts available during the testing stage if there are cross dependencies.
  • Everything required for a release candidate is promoted to testing and production together.
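
As an illustration of the layout above, the following sketch prints the JSON bodies that could be used to create the three group repositories. It is a minimal sketch under stated assumptions: the Repositories REST endpoint named in the comments, the body shape, and the blobstore name `default` are not taken from this guide and should be checked against the Swagger interface of your NXRM version. The hosted and proxy repositories are assumed to exist already.

```shell
#!/usr/bin/env bash
# Print a JSON body for a Maven group repository.
# ASSUMPTION: body shape for POST /service/rest/v1/repositories/maven/group;
# verify it in your instance's Swagger UI before use.
# Usage: group_payload <group-name> <member> [<member> ...]
group_payload() {
  local name="$1"; shift
  local members="" m
  for m in "$@"; do
    # Append each member as a quoted string, comma-separated.
    members="${members:+$members,}\"$m\""
  done
  printf '{"name":"%s","online":true,"storage":{"blobStoreName":"default","strictContentTypeValidation":true},"group":{"memberNames":[%s]}}\n' \
    "$name" "$members"
}

# One group per stage. The stage's own hosted repository is listed first,
# then the shared proxy, then the read-only hosted repositories of later stages.
group_payload maven-dev  maven-dev-hosted  maven-central-proxy maven-uat-hosted maven-prod-hosted
group_payload maven-uat  maven-uat-hosted  maven-central-proxy maven-prod-hosted
group_payload maven-prod maven-prod-hosted maven-central-proxy
```

Each printed body could then be passed to curl with `-H 'Content-Type: application/json' -d "$(group_payload ...)"`. Because pipelines only ever reference the group names, the member list can be edited later without touching any build configuration.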

Additional Tips

  • Establish a normalized, meaningful naming convention for your groups and stages. This makes managing the process easier in the long run. This is especially true if using team-based, or mixed, repositories. This will also reduce the chance of errors configuring the pipeline.
  • Keep the stages on the same blobstore if possible. As long as the repositories are on the same blobstore, the move action is only an update to the metadata in the database. If different blobstores are used, then a copy / soft delete action is needed, which causes additional server load and increases the disk space used.
  • The npm and Docker formats allow for deploying to group repositories. This makes configuring staging much easier for these formats.

Simple Workflow:

  1. Build systems use the [maven-dev] group for pulling components for the build.
  2. Artifacts are deployed to [maven-dev-hosted] and optionally tagged with the metadata for the release candidate.
  3. Build artifacts are promoted from [maven-dev-hosted] to [maven-uat-hosted] using the REST API move command.
  4. Testing uses the artifacts from the [maven-uat] group. Depending on the job they may also need artifacts from releases to test.
  5. Artifacts are promoted from [maven-uat-hosted] to [maven-prod-hosted] using the API.
  6. Production artifacts are pulled from the [maven-prod] group.

Using the API

The staging API uses a simple POST to move artifacts from a source repository to a destination repository.

POST service/rest/v1/staging/move/{repository}

This POST request uses the same parameters to search for components as the Search API. Below is the endpoint with a simple search for an artifact in the [maven-dev-hosted] repository which we will move to the [maven-uat-hosted] repository.

POST service/rest/v1/staging/move/maven-uat-hosted?repository=maven-dev-hosted&group=org.osgi&name=org.osgi.core&version=4.3.1

The move command can be made against any number of artifacts that match the search criteria. It can be further simplified using a tag that is associated with the artifacts of the release candidate.

POST service/rest/v1/staging/move/maven-uat-hosted?repository=maven-dev-hosted&tag=maven-build-100
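
The two move requests above can be wrapped in a small helper for use in a pipeline script. This is a minimal sketch, not an official client: the base URL is the one from the curl example in this guide, the variable names `NEXUS_URL`, `NEXUS_USER`, and `NEXUS_PASS` are hypothetical, and parameter values are not URL-encoded, so they must not contain reserved characters.

```shell
#!/usr/bin/env bash
# Base URL of the server (assumption: same host as the curl example in this guide).
NEXUS_URL="${NEXUS_URL:-http://localhost:8081}"

# Build the staging move URL from a destination and search parameters.
# Usage: move_url <destination-repo> <key=value> [<key=value> ...]
move_url() {
  local dest="$1"; shift
  local query="" pair
  for pair in "$@"; do
    # Join the key=value pairs with '&'.
    query="${query:+$query&}$pair"
  done
  printf '%s/service/rest/v1/staging/move/%s?%s\n' "$NEXUS_URL" "$dest" "$query"
}

# Promote every component carrying one tag from a source to a destination.
# NEXUS_USER and NEXUS_PASS are hypothetical variable names for credentials.
promote_by_tag() {
  local src="$1" dest="$2" tag="$3"
  curl --fail --silent --show-error -u "$NEXUS_USER:$NEXUS_PASS" -X POST \
    "$(move_url "$dest" "repository=$src" "tag=$tag")"
}

# Print the URL only; nothing is sent until promote_by_tag is called.
move_url maven-uat-hosted repository=maven-dev-hosted tag=maven-build-100
```

In a pipeline, `promote_by_tag maven-dev-hosted maven-uat-hosted maven-build-100` would run after the quality checks pass. The `--fail` flag makes curl return a non-zero exit code on an HTTP error, so the CI job stops instead of continuing with artifacts that were never promoted.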

Tagging Simplified

Staging promotes the artifacts associated with a release candidate from one environment to the next. The Tagging feature is grouped with staging because it makes that process easier, as the example above shows. Tags are not required for staging, but we recommend adopting them: tagging is done primarily through the REST API, so tags can be added automatically during the build process. Tags can be searched through the UI; however, they can only be managed through API calls.

Tagging Workflow

The only pieces required for tagging are creating your tag “name” as a unique identifier and assigning artifacts to the tag. We recommend using a name that is meaningful and unique but, for performance reasons, not too long. Aligning identifiers with the IDs used in the release process is ideal. In this example, the “attributes” are optional and can contain any JSON data that would be meaningful to associate with the release candidate.

curl -u admin:admin123 -X POST -H 'Content-Type: application/json' 'http://localhost:8081/service/rest/v1/tags' -d '{"name": "org.osgi.core_4.3.1","attributes": {"built-by": "jenkins"}}'

Associating artifacts with a tag works like the Search API: any artifacts matched by the search will be tagged. It is usually better to associate the tag when uploading the artifacts to the repository. This can be done using the API or with any of the CI tooling used to upload. See the tagging documentation for details.

POST /service/rest/v1/tags/associate/{tagName}
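
The tagging calls can be prepared in a build script along the following lines. This is a minimal sketch: the `<prefix>-build-<number>` naming scheme and the `NEXUS_URL` variable are illustrative assumptions, the search parameters mirror the move example earlier in this guide, and values are not URL-encoded.

```shell
#!/usr/bin/env bash
# Base URL of the server (assumption: same host as the curl example above).
NEXUS_URL="${NEXUS_URL:-http://localhost:8081}"

# Derive a short, unique tag name from CI identifiers.
# ASSUMPTION: the "<prefix>-build-<number>" scheme is only an illustration.
tag_name() {
  printf '%s-build-%s\n' "$1" "$2"
}

# JSON body for POST /service/rest/v1/tags; "built-by" mirrors the example above.
tag_payload() {
  printf '{"name":"%s","attributes":{"built-by":"%s"}}\n' "$1" "$2"
}

# URL for POST /service/rest/v1/tags/associate/{tagName} with search parameters.
associate_url() {
  local tag="$1" repo="$2" group="$3" name="$4" version="$5"
  printf '%s/service/rest/v1/tags/associate/%s?repository=%s&group=%s&name=%s&version=%s\n' \
    "$NEXUS_URL" "$tag" "$repo" "$group" "$name" "$version"
}

# Print the request body and URL for build 100; nothing is sent to the server.
tag="$(tag_name maven 100)"
tag_payload "$tag" jenkins
associate_url "$tag" maven-dev-hosted org.osgi org.osgi.core 4.3.1
```

The printed body can be sent with the same curl command shown for creating a tag, and the printed URL can be used as the target of a `curl -X POST` call. Keeping the tag name in one variable helps ensure that the create, associate, and later move calls all refer to the same release candidate.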

Legacy Conversions

The primary challenge for existing users is that they may have to rearchitect actively used repositories to mature to a staging model. Often a single repository will be used for both dev and production or the pipeline is using a hosted repository rather than a group repository. In these cases, the simplicity of both the tagging and staging features allows for adoption to be done in stages without too much disruption.

This example assumes a single repository (legacy) is actively used for dev and production artifacts.

  1. Set up group repositories (dev, prod) and add the existing hosted repositories.
    • The most time-consuming part here is switching to use these group repositories.
  2. Create hosted repositories for dev and prod and add them to groups.
    • When planning to use the prod group, add the legacy repo to the prod group so the artifacts are available from this group.
  3. Coordinate with pipeline teams to switch to using the new group repositories.
    • Allowing anonymous access to pipeline repositories is a security risk and is not recommended. Set up user access controls if not already in place.
    • Implement tagging even if it is not being used yet.
    • Adding the legacy repo to each of the new groups will ensure that any needed artifacts are still available.
    • This would be a good opportunity to test the staging workflow.
  4. Move production artifacts from legacy to the prod hosted repository.
    • Moving artifacts will not change the availability of the artifacts at this point.
    • Search and move artifacts using the staging APIs. General searches can move many artifacts at once.
    • Generic tags, such as ‘prod’, can also be used effectively.
    • Note that these endpoints are limited to 10K artifacts at a time. The operation may need to be carried out multiple times until all artifacts have been moved.
  5. When all build teams are using the staging workflow, the repository groups can be cleaned up to match the staging model.
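
Because each move call is capped, the migration in step 4 is usually scripted as a loop. The sketch below shows only the control flow. `count_remaining` and `move_batch` are hypothetical hooks that you would implement yourself, for example with a Search API query against the legacy repository and a staging move call, and the default of 50 rounds is an arbitrary safety limit.

```shell
#!/usr/bin/env bash
# Repeat a capped move until the source repository reports nothing left.
# HYPOTHETICAL hooks, to be defined by the caller before use:
#   count_remaining  prints how many matching components are still in the source
#   move_batch       issues one staging move call (capped at about 10K components)
# Usage: move_in_batches [max-rounds]
move_in_batches() {
  local max_rounds="${1:-50}"   # safety cap so a failing move cannot loop forever
  local round=0 remaining
  while :; do
    remaining="$(count_remaining)" || return 1
    if [ "$remaining" -eq 0 ]; then
      break
    fi
    if [ "$round" -ge "$max_rounds" ]; then
      echo "gave up after $round rounds, $remaining left" >&2
      return 1
    fi
    move_batch || return 1
    round=$((round + 1))
  done
  echo "done after $round rounds"
}
```

Checking the remaining count before every call, rather than trusting the result of the previous move, keeps the loop correct even if a batch moves fewer components than expected.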

Resources

Staging does not need to be overly complicated or take a long time to implement, and there is little reason not to build a staging environment from the start. Organizations that delay usually do so because of the planning and decision-making needed to do it effectively. The Sonatype Customer Success team can help with this process and set up the discussions needed to be effective.