In this guide:
- Reads vs Writes
- Configuring Repositories
- Networking Infrastructure
- Internet Connectivity
- Define Your Deployment Model
- Component Management vs. Performance
- High Availability
- Maven Configuration for High Availability
- Scale Across Data Centers
- Disaster Recovery
- Virtual or Physical Machine
- Deployment Examples
- Additional Resources
Many organizations have successfully deployed Nexus Repository Manager. While the system design and architecture might seem difficult at first, it’s quite straightforward so long as you follow a few basic guidelines.
This guide offers tips and common practices we’ve learned after years of helping customers deploy Nexus Repository Manager. The sections below illustrate some of these strategies and describe deployment models that you can apply to any version of Repository Manager.
Follow these guidelines to meet your organization’s deployment goals.
Reads vs Writes
The underlying principle for configuring a highly available binary repository manager architecture is to distinguish read-only from read/write requests.
When the read-only attribute is turned on for a server, its contents can be opened and viewed, but they can’t be written to. In other words, contents can be retrieved but not modified. For example, a proxy server marked as read-only acts as a secure cache of components.
Read/write access, in this case, refers to a server that can both accept and serve data. For example, in a star pattern deployment, read/write access is assigned to one hosted server. In the federated model, all hosted repositories in your deployment environment are assigned read/write capabilities.
Configuring Repositories
It’s critical to determine where you want your repositories to be located. This decision is heavily influenced by your deployment model, star or federated, both explained later. Configuration and placement are also determined by the network infrastructure and internet connectivity, as well as the degree of control you need to exercise over component usage.
In your deployment environment, set up your proxy repositories so they can’t be modified by producers. In other words, set them to a read-only state for your release manager. Instead, add a CI system and make it accessible to your producers. This allows them to automate and publish changes to hosted instances with write access.
Networking Infrastructure
The underlying data networking infrastructure will impact the proxy configuration. You need to be sure that it can handle the expected data traffic between repositories, and between repositories and users.
If you have a star networking topology that requires all data traffic to go through a central location, then you probably will want to create a proxy repository in the central location. Remote offices will not proxy directly with each other, but rather through the intermediate proxy.
On the other hand, if you have fast network connections between all your offices, then you can set up direct proxies between locations as needed without worrying about the underlying infrastructure.
Internet Connectivity
For maximum performance, configure the local repository manager to proxy Central Repository directly if the location has a direct internet connection.
On the other hand, if the location connects to the internet over an enterprise WAN, then you will probably want to proxy Central Repository indirectly through another repository that has direct access to the internet.
Define Your Deployment Model
The geographic dispersion of developers is important because it dictates whether you should use a star or federated pattern for your deployments. Depending on your chosen model, there are conventions for how to assign read-only and read/write access.
In both models you’ll establish one master hosted repository where artifacts are staged and deployed. However, the relationship between team roles, permissions, and network architecture differs. So, consider your ideal model based on your organization’s needs.
We recommend that you adopt the star pattern when all of the producers in your organization are at the same location. The star pattern is similar to “hub-and-spoke” network architecture where the “master” server, or “hub”, is connected to at least two more servers. In essence, the hub centralizes all repositories that host artifacts in the network.
In this model, you’ll deploy a sole, co-located master hosted repository with read/write access for your entire enterprise of producers. All additional repositories at other locations are configured as read-only. Artifact consumers will make requests to the read-only proxy instances. If the proxies don’t have the binaries, the requests go back to the “hub”, which caches the result.
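The request flow in the star pattern can be sketched in a few lines of Python. This is only an illustration, not how Repository Manager is implemented: the class names Hub and ReadOnlyProxy, the coordinate string format, and the in-memory dictionaries are all hypothetical, and the sketch assumes each proxy keeps a local copy of whatever it fetches from the hub.

```python
class Hub:
    """Hypothetical stand-in for the master hosted repository (read/write)."""

    def __init__(self):
        self.components = {}

    def publish(self, coordinate, data):
        # Only the hub accepts writes in the star pattern.
        self.components[coordinate] = data

    def fetch(self, coordinate):
        return self.components[coordinate]


class ReadOnlyProxy:
    """Hypothetical stand-in for a read-only proxy at a remote location."""

    def __init__(self, hub):
        self.hub = hub
        self.cache = {}
        self.hub_requests = 0  # counts how often we had to go back to the hub

    def get(self, coordinate):
        # Serve from the local cache; go back to the hub only on a miss.
        if coordinate not in self.cache:
            self.hub_requests += 1
            self.cache[coordinate] = self.hub.fetch(coordinate)
        return self.cache[coordinate]

    def publish(self, coordinate, data):
        # Consumers cannot write through a read-only proxy.
        raise PermissionError("proxy is read-only; publish to the hub instead")
```

In this sketch, a second request for the same component is answered from the proxy’s cache, so the hub only sees one request per component per proxy.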
In the federated model, you have producers at multiple locations. You’ll deploy a master hosted repository at each location with producers. Additionally, you’ll set up proxy repositories at locations that have only artifact consumers. In this pattern, all servers have read/write access.
Note, however, that neither version of Repository Manager supports atomic commits across geographies, because atomic commits require coordinating a deployment across multiple servers. We recommend that you assign a single location for a given development team to upload their artifacts. This practice keeps your product in a consistent state.
For example, the IQ team publishes com.sonatype.iqserver to “Location A” and the RM team publishes com.sonatype.repository to “Location B”. Any attempts to deploy into the wrong location would fail.
Also note that when artifacts are shared between the teams (for example, Firewall libraries), “Location A” would have a proxy repository to “Location B” and vice versa.
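The “one upload location per team” rule can be pictured as a simple ownership check. The sketch below is an assumption-laden illustration: the OWNERS table and the check_deploy function are hypothetical names, and in practice you would enforce this rule with repository permissions rather than with application code.

```python
# Hypothetical ownership table: each group ID has exactly one upload location.
OWNERS = {
    "com.sonatype.iqserver": "Location A",
    "com.sonatype.repository": "Location B",
}


def check_deploy(group_id, location, owners=OWNERS):
    """Return True if `location` is the single upload location for `group_id`.

    Raises KeyError for an unknown group ID and PermissionError for a
    deployment aimed at the wrong location.
    """
    owner = owners[group_id]
    if owner != location:
        raise PermissionError(
            f"{group_id} must be deployed to {owner}, not {location}"
        )
    return True
```

With this table, check_deploy("com.sonatype.iqserver", "Location A") succeeds, while the same group ID aimed at “Location B” is rejected.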
As a word of caution, the federated pattern could lead to a complicated network of intertangled proxies. Keep a close watch on network performance in this model, especially if network throughput relies heavily on writes. Also, since atomic commits aren’t supported, be sure that producers in separate locations aren’t creating the same components.
Component Management vs. Performance
The relative importance of component management versus performance will impact your proxy configuration.
You’ll get the best performance by locating hosted repositories closest to producers, and proxying all other repositories, including the Central Repository. For maximum performance, each location with an adequate internet connection should proxy Central directly because it is globally load balanced. Artifacts are fetched from the fastest Central Repository server based on the proxy repository location.
However, if you need to control which open source components are available to developers then you’ll want to create a centrally managed proxy repository for Central.
You can use Nexus Firewall to ensure this repository only has approved components that are free of license or security issues. Each location should have a local proxy of this controlled repository which developers will use to acquire approved open source components.
High Availability
When your development teams are working on mission-critical projects that cannot suffer a delay, design your deployment architecture to ensure High Availability (HA).
The two most commonly used HA configurations are active-active and active-passive. Consider the differences:
Active/Passive - in this model only one node is operational, while the other nodes remain on standby in case the active one fails. You can configure 2-node active-passive instances for Repository Manager 2 OSS and PRO as well as version 3 OSS.
Active/Active - in this environment your configuration ensures that components are always available for consumption. When the release manager or CI server continuously publishes components directly to the master repository, all contents are replicated across the entire cluster. In this model, you can configure a 3-node cluster, but only with Repository Manager 3 PRO.
NOTE: If you’re a version 3 PRO user you’ll be able to set up an active-active cluster. Also, your PRO license includes unlimited servers at no additional cost.
This matrix identifies tools, features, and capabilities you can apply to your version of Repository Manager:
|Feature|3 PRO/HA-C|3 PRO|3 OSS|2 PRO|2 OSS|
|---|---|---|---|---|---|
|Backup for binary storage|X|X|X|X|X|
|Tasks to archive database & system configuration|X|X|X|X|X|
Maven Configuration for High Availability
Deploying a highly available configuration requires the use of separate URLs for reading and writing artifacts. Whether or not you plan to deploy HA-C immediately, we recommend configuring Maven to use different host names for the deployment URL (set in the POM), and the repository read URL (set in the Maven settings file).
If you are not yet using this type of configuration, you can create an alias for your repository with a CNAME record entry in your DNS. By doing this, you’ll be able to move to an HA configuration later without requiring changes to your developers’ Maven environments.
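To make the read/write split concrete, here is a small Python sketch that generates the two Maven fragments involved: a mirror entry for the settings file (reads) and a distributionManagement entry for the POM (writes). The host names, the repository IDs, and the /repository/... paths are assumptions chosen for illustration; substitute the aliases and repository URLs used in your own environment.

```python
import xml.etree.ElementTree as ET

# Hypothetical DNS aliases (CNAME records). Both may point at the same server
# today and be repointed later when you move to an HA configuration.
READ_HOST = "repo-read.example.com"
WRITE_HOST = "repo-write.example.com"


def settings_mirror(read_host):
    """Build a settings.xml fragment that sends all reads to `read_host`."""
    settings = ET.Element("settings")
    mirror = ET.SubElement(ET.SubElement(settings, "mirrors"), "mirror")
    ET.SubElement(mirror, "id").text = "nexus-read"  # assumed ID
    ET.SubElement(mirror, "mirrorOf").text = "*"
    # Assumed repository path; adjust to your group repository URL.
    ET.SubElement(mirror, "url").text = (
        f"https://{read_host}/repository/maven-public/"
    )
    return ET.tostring(settings, encoding="unicode")


def pom_distribution(write_host):
    """Build a POM fragment that sends deployments to `write_host`."""
    dist = ET.Element("distributionManagement")
    repo = ET.SubElement(dist, "repository")
    ET.SubElement(repo, "id").text = "nexus-write"  # assumed ID
    # Assumed repository path; adjust to your hosted repository URL.
    ET.SubElement(repo, "url").text = (
        f"https://{write_host}/repository/maven-releases/"
    )
    return ET.tostring(dist, encoding="unicode")


if __name__ == "__main__":
    print(settings_mirror(READ_HOST))
    print(pom_distribution(WRITE_HOST))
```

Because developers and CI jobs only ever see the two aliases, changing which servers answer reads and writes becomes a DNS change rather than a change to every Maven environment.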
Scale Across Data Centers
We also recommend you configure your deployment environment for additional, horizontal scaling across your data centers.
As your organization grows, so will the need to automate communication between systems and to offload that communication to external devices. Large-scale, multi-site deployments of proxy repositories can overload master repositories when they rely on the traditional proxy mechanism.
So, consider utilities like webhooks and Smart Proxy to help streamline these inter-network requests.
Webhooks are available in Repository Manager 3 (PRO and OSS). They allow you to configure HTTP callbacks from the UI. You can configure and test these callbacks with external services such as RequestBin or Hookbin to inspect HTTP requests.
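When you move from inspecting callbacks to consuming them in your own service, you will usually want to check that a request really came from your repository manager. The sketch below assumes the webhook is configured with a secret key and that the request carries an HMAC-SHA1 hex digest of the body in a signature header (we believe this header is named X-Nexus-Webhook-Signature, but treat both the header name and the algorithm as assumptions to confirm against the webhook documentation for your version). The function name is hypothetical.

```python
import hashlib
import hmac


def signature_matches(body: bytes, secret: str, received_hex: str) -> bool:
    """Check a webhook body against the signature sent with the request.

    Assumption: the signature is the hex-encoded HMAC-SHA1 of the raw
    request body, keyed with the secret configured for the webhook.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(expected, received_hex.lower())
```

A receiving service would call this function with the raw request body before parsing the JSON payload, and reject the request if it returns False.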
Smart Proxy is only available in Nexus Repository 2 PRO. It consists of an enhanced proxy that scales to support large deployments by pushing component update notifications from the master repository.
We recommend enabling this feature when utilizing snapshot repositories. You should also ensure the timeouts are set at the default value of 1440 minutes (24 hours). This will reduce the load on your master repository while still ensuring that the most recent snapshots are available immediately around your organization.
Disaster Recovery
In a standard backup scenario, someone on your team might make a copy of all data to be used later if your system fails. However, backups do not necessarily include the infrastructure of failed systems, whether on-premises or in the cloud. As a result, it can be very time consuming to access and use these backups when needed.
So, for your highly available deployment, a more robust backup solution for the repository manager must capture state from two separate yet associated parts: the repository data and configuration for each node, and its internal storage directories for the binary parts of components and their assets (blob stores).
Each node in an HA cluster has its own separate data directory, which contains all the information needed to synchronize the nodes. This is done via scheduled tasks that you can run from the user interface.
Backup tasks in the UI export settings and metadata from each server, as snapshots, to an offsite location. This location should be covered by a file system backup tool so that the backup files are stored separately. Along with the tasks, you can run a utility such as rsync to replicate content to your remote disaster recovery system. This approach works on any version of Repository Manager. One function unique to Repository Manager 3 PRO is that all nodes in the cluster automatically become read-only while tasks run on any of the nodes.
NOTE: If rsync isn’t available, similar tools exist to help with file transfer and synchronization.
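As a rough sketch of the rsync step, the Python helper below only assembles the command line so you can review it before running it from a scheduler. The source directory, the remote target, and the function name are hypothetical; the rsync options used (--archive, --compress, --delete) are standard, but check whether --delete suits your retention policy before using it.

```python
import shlex


def rsync_command(source_dir, remote_target, delete=True):
    """Assemble an rsync command that mirrors `source_dir` to `remote_target`.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    cmd = ["rsync", "--archive", "--compress"]
    if delete:
        # Remove files on the target that no longer exist at the source.
        cmd.append("--delete")
    cmd += [source_dir.rstrip("/") + "/", remote_target]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: a local backup export and a remote DR host.
    cmd = rsync_command("/data/nexus-backup", "dr-host:/srv/nexus-backup/")
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run once the backup task has finished writing its export.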
Unlike Repository Manager 3, version 2 doesn’t use blob stores. Regardless, the same binary files are still stored on disk, by default in a “storage” directory nested in the data directory (../sonatype-work). In some cases, the recovery process is separated from the rsync of the rest of sonatype-work, in the same fashion as the Repository Manager 3’s blob store.
Virtual or Physical Machine
Nexus Repository Manager supports physical and virtual machines equally well as it doesn’t require a lot of CPU or RAM to work effectively. At Sonatype, we’ve moved all of our managed forges over to virtual machines with the specifications found in System Requirements for version 2 and version 3.
These systems serve on the order of 1,400-2,500 requests per minute. Above that, the system typically needs to scale up in terms of network and I/O optimization. As mentioned in the requirements guidelines above, increasing the number of CPUs and the amount of RAM can typically help as well.
Deployment Examples
In general, both Repository Manager 2 and 3 can be configured to either the star or federated model. You could even combine both active-active and active-passive approaches to create a system that provides high availability for both “reads” and “writes”.
In many cases, it is more efficient to create a single proxy repository and point it at a group repository on the master server. If needed, you can instead create a proxy repository for each individual proxy or hosted repository on the master. However, this increases configuration overhead as well as network overhead, which decreases read performance from the proxy instances.
In an active-passive configuration you increase the availability of repository “writes” by using redundant, standby servers. These servers are backed by a highly available file system.
Consider these guidelines:
- Configure two (or more) Nexus servers identically, sharing the same file system. Only one Nexus repository manager can be active at a time. The backup will utilize the same configuration files and point to the same repositories as the master.
- The file system should be highly available using an off-the-shelf solution.
- When the first Nexus server fails, the corresponding backup must be activated through a configuration change.
- An IP switch (or similar device) enables clients to continue using the same Name/IP address for the Nexus server.
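The failover decision in these guidelines can be sketched as “use the first healthy server in priority order”. The function below is a hypothetical illustration: the node names and the is_healthy callback are placeholders for whatever health check and switching mechanism you actually use, and activating the standby still requires the configuration change described above.

```python
def choose_active(nodes, is_healthy):
    """Return the first healthy node, checking nodes in priority order.

    `nodes` lists the primary first, followed by its standbys.
    `is_healthy` is any callable that reports whether a node can serve.
    """
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy repository manager is available")
```

Clients keep using the same name or IP address; only the node returned by this check changes when the primary fails.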
Star Pattern Deployment with Smart Proxy
The Smart Proxy functionality in Nexus Pro 2.x scales for the largest proxy architectures by pushing update notifications from the master.
As illustrated above, configure your repository manager servers to the star model, with Smart Proxy. This includes:
- the master as a read/write server to accept CI builds
- Smart Proxy, enabled in the Repository Manager UI
- a load balancer, behind the master, to redirect any requests to other servers in the network
- additional “carbon copy” instances as proxies of the master; they contain redundant content from which developers can read contents
- a reverse proxy server to balance the load of requests among your servers that accept writes
So, when a development team in New York commits a build, the CI server deploys a new component snapshot version to the Nexus master instance. With Smart Proxy enabled, this deployment is immediately followed by notifications sent to the secure Smart Proxy subscribers in the HA network, which are the additional proxy servers.
These are respectively collocated with the developers in London, Bangalore, and San Jose and can be configured to immediately fetch the new components available. At a minimum the teams will be aware of new component versions without the need to manually poll the master repeatedly.
Additionally, Smart Proxy automates these notifications, which in turn eases the load on the system. When a user of any of the subscribers builds a component that depends on a snapshot version of a component from the master, Smart Proxy ensures that the latest version published to the master is used.
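The push-notification idea can be illustrated with a small publish/subscribe sketch. This is not Smart Proxy’s actual protocol or API; the Master and SubscriberProxy classes, the prefetch flag, and the in-memory storage are hypothetical and exist only to show the difference between being notified of a new version and fetching it immediately.

```python
class Master:
    """Hypothetical master that notifies subscribers on every deployment."""

    def __init__(self):
        self.components = {}
        self.subscribers = []

    def subscribe(self, proxy):
        self.subscribers.append(proxy)

    def deploy(self, coordinate, data):
        self.components[coordinate] = data
        # Push a notification instead of waiting for proxies to poll.
        for proxy in self.subscribers:
            proxy.on_update(self, coordinate)


class SubscriberProxy:
    """Hypothetical proxy that learns about updates from the master."""

    def __init__(self, name, prefetch=True):
        self.name = name
        self.prefetch = prefetch
        self.known = set()  # coordinates this proxy has been told about
        self.cache = {}     # coordinates this proxy has actually fetched

    def on_update(self, master, coordinate):
        self.known.add(coordinate)
        if self.prefetch:
            # Fetch right away so local developers get the new version.
            self.cache[coordinate] = master.components[coordinate]
```

A proxy created with prefetch=False still learns that a new version exists, which matches the minimum behavior described above, but it fetches nothing until asked.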
If you’re using Repository Manager 3 PRO you can configure your deployment environment for active-active clustering, detailed in High Availability. However, this environment is limited to a single data center location.
An active-active configuration suits organizations that require the repository manager to be online all the time to fulfill its critical role in deployment. In this environment, your servers share knowledge about new components and corresponding metadata. This approach includes the following:
- three repository managers installed side by side on separate hardware (or cloud) servers, but co-located in the same data center
- a CI server cluster that detects increased load over a configured threshold
- a load balancer that redirects requests to the additional servers, using round-robin scheduling to distribute the load across them
- artifacts deployed or proxied on one repository manager that are immediately available to all others
- a system that alerts members of the repository administration team that new changes have taken place (via webhooks)
- a new instance that starts automatically in the data center in the event of a node failover
- the new repository manager instance added to the load balancer’s round-robin schedule, so the load on the other servers is kept below the performance threshold
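The round-robin scheduling and node replacement in the list above can be sketched as follows. The RoundRobinBalancer class and the node names are hypothetical; a real load balancer would also run health checks and handle concurrent requests, which this sketch leaves out.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin scheduler over a changing set of nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # index of the node that serves the next request

    def add(self, node):
        # A replacement instance joins the end of the rotation.
        self.nodes.append(node)

    def remove(self, node):
        # Drop a failed node and keep the index within bounds.
        self.nodes.remove(node)
        self._next = self._next % len(self.nodes) if self.nodes else 0

    def pick(self):
        if not self.nodes:
            raise RuntimeError("no nodes available")
        node = self.nodes[self._next % len(self.nodes)]
        self._next = (self._next + 1) % len(self.nodes)
        return node
```

When a node fails, you would call remove for it and add for the newly started instance, after which requests continue to rotate across the remaining and new nodes.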
As depicted in this diagram, HA-C stores component and asset binaries in a shared location, which can be either a shared file system or a cloud object store.
When you configure HA-C for deployment, use the star pattern, where the Repository Manager cluster acts as the “hub”, to sustain multi-data center performance. All additional, synced proxy servers must be configured to have their own file system.
Additional Resources
Additional assistance is available in the Sonatype Community at https://community.sonatype.com/.
Please visit our community for easy access to support from us or your peers, product updates, insights from Nexus experts, free training and help, forums and much more.