Merlin is a series of high-performance computing (HPC) clusters provided as a central service at PSI.
The Merlin resources are available to all PSI staff and external collaborators. The Merlin clusters currently available include:
- Merlin 7 (next-generation, currently in pre-production, expected to be generally available by January 2025)
- Merlin 6 (production system, will be decommissioned in 2025)
- Merlin 5 (legacy system with best-effort support, will be decommissioned by 2025)
Merlin 7 is the newest generation of the HPC clusters available to PSI staff and collaborators. Hosted at CSCS in Lugano on their Alps infrastructure, the cluster offers extensive advantages for distributed workloads and GPU jobs. The system is made up of several different types of Cray Shasta nodes in a heterogeneous configuration. It is currently being built up and is in pre-production.
Our production cluster is Merlin 6, which was designed to be extensible with regard to the addition of compute nodes and storage. In addition to the main cluster's CPU-based resources, the system also contains a smaller partition of GPU resources for biology research (Cryo-EM analysis) and machine learning applications.
We also maintain a legacy cluster, called Merlin 5, which is provided on a best-effort basis for workloads that do not have large resource needs or that need to run for long periods.
Since 2019, the service has been maintained by the High-Performance Computing & Emerging Technologies group (HPCE).
All PSI staff and external collaborators can request access to Merlin; to do so, please follow the instructions in the documentation.
Various resources and support articles are provided in the documentation, which is only available through the PSI intranet. It also includes details on how to get help from the administrators.
Resource-specific documentation:
The clusters rely on several services that help users get the most out of the resources. These include:
AFS
The Andrew File System (AFS) is available at PSI under the 'psi.ch' domain. It is mounted on the Merlin clusters using the AuriStor client. AFS holds personal user data as well as the software stack used on the Merlin clusters, and it is mounted over the standard Ethernet network.
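As a quick orientation, the following shell sketch shows how a user would typically obtain an AFS token on a login node and browse the PSI cell. The /afs/psi.ch path simply follows the usual /afs/&lt;cell&gt; convention, and the exact authentication steps on Merlin may differ.

```bash
# Sketch only: typical AFS access from a login node (steps may differ on Merlin).
kinit            # obtain a Kerberos ticket for your PSI account
aklog            # convert the Kerberos ticket into an AFS token for the local cell
tokens           # verify that a valid AFS token is present
ls /afs/psi.ch   # browse the PSI AFS cell (conventional /afs/<cell> mount point)
```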
Network-mounted home directories
Home directories are served by the PSI Central NFS service, providing up to 10GB of capacity per user, with daily snapshots kept for one week. They are mounted over the standard Ethernet network.
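A minimal sketch of how quota usage and snapshots can be inspected from a login node follows; the snapshot directory and snapshot names are assumptions and may differ on the PSI NFS service.

```bash
# Check current usage against the home directory quota
# (exact output format depends on the NFS server configuration).
quota -s

# Daily snapshots are often exposed through a hidden snapshot directory;
# the ".snapshot" path and "daily.0" name below are assumptions, not confirmed Merlin paths.
ls ~/.snapshot
cp ~/.snapshot/daily.0/important_file ~/important_file.restored   # example self-service restore
```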
HPC storage
The main Merlin storage is based on IBM Spectrum Scale (GPFS, the General Parallel File System), a parallel file system well suited to HPC environments. It is mounted over the InfiniBand network for high performance and low latency.
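For day-to-day use, Spectrum Scale behaves like any POSIX file system; the sketch below shows a couple of client-side commands for checking usage. The mount point, file system, and fileset names are placeholders, not the real Merlin values.

```bash
# Sketch: inspecting Spectrum Scale (GPFS) usage from a client node.
# "/data", "my-project" and "merlin-fs" are placeholder names for illustration only.
df -h /data                                          # usage of an (assumed) GPFS mount point
/usr/lpp/mmfs/bin/mmlsquota -j my-project merlin-fs  # fileset quota report, if quotas are enabled
```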
For Merlin 7, we operate an HPE ClusterStor array that uses the Lustre file system, which is designed for high-throughput, low-latency data operations. The storage is connected to the cluster nodes over the Cray Slingshot network fabric.
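Lustre provides a client utility, lfs, for querying and tuning the file system. The sketch below shows common commands; the /scratch mount point and paths are assumptions used only for illustration.

```bash
# Sketch: common Lustre client commands (the /scratch path is an assumption).
lfs df -h                                   # per-OST/MDT usage of the mounted Lustre file systems
lfs quota -u $USER /scratch                 # user quota on the Lustre file system
lfs getstripe /scratch/$USER/results.h5     # show how an existing file is striped across OSTs
lfs setstripe -c 4 /scratch/$USER/big_run   # stripe new files in this directory over 4 OSTs
```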
Linux O.S.
All Merlin 5 and Merlin 6 nodes and servers run Red Hat Enterprise Linux. Merlin 7 uses the Cray OS (based on SUSE Linux Enterprise).
Remote Desktop
For remote desktop access, the newest login nodes are running NoMachine Terminal Server.
Batch system
All the Merlin clusters use the Slurm Workload Manager. The Merlin Slurm configuration supports everything from single-core jobs up to MPI jobs that scale across multiple nodes. Mixed-resource workloads, such as jobs that combine GPUs with CPUs, are also supported.
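For orientation, here is a minimal sketch of a Slurm batch script for an MPI job. The partition and module names are assumptions and should be replaced with the values documented for the specific Merlin cluster.

```bash
#!/bin/bash
# Minimal sketch of a Slurm batch script for an MPI job on Merlin.
# Partition and module names below are assumptions, not the real Merlin settings.
#SBATCH --job-name=mpi_example
#SBATCH --partition=general          # hypothetical partition name
#SBATCH --ntasks=128                 # total number of MPI ranks
#SBATCH --ntasks-per-node=64         # spread the ranks over two nodes
#SBATCH --time=01:00:00
#SBATCH --output=%x-%j.out

module purge
module load gcc openmpi              # illustrative module names

srun ./my_mpi_application            # srun starts one task per allocated rank
```

Such a script would be submitted with `sbatch job.sh` and monitored with `squeue -u $USER`; GPU jobs additionally request devices, for example via a `--gres=gpu:...` option, using whatever GRES names the cluster defines.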
Merlin 7 is built on the Cray Shasta platform and will include the following hardware (exact details are still being determined and specifics might change until the system is in production); a sketch of how such resources can be requested through Slurm follows the list:
- Multi-core CPU node (x86), 2x AMD EPYC 7742 (x86_64 Rome, 64 cores, 3.2GHz) with 512GB DDR4 3200MHz RAM
- Multi-core CPU + GPU node (x86), 2x AMD EPYC 7713 (x86_64 Milan, 64 cores, 3.2GHz) with 512GB DDR4 3200MHz RAM, and 4x NVIDIA A100 (Ampere, 80GB)
- Grace Hopper node (arm64), 4x NVIDIA Grace CPU (SBSA arm64, 72 cores, 3.44GHz) with 128GB DDR5 6400MHz RAM, and 4x NVIDIA H200 (Hopper, 96GB)
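To illustrate how these heterogeneous node types can be addressed, the sketch below requests GPU resources through Slurm; the partition name and GRES specification are assumptions that depend on the final Merlin 7 configuration.

```bash
# Sketch: requesting GPU nodes via Slurm (partition and GRES names are assumptions).
sbatch --partition=gpu --gres=gpu:4 --ntasks=4 --cpus-per-task=16 gpu_job.sh

# Interactive allocation of a single GPU for testing:
salloc --partition=gpu --gres=gpu:1 --time=00:30:00
```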
Networking
Merlin 6 uses InfiniBand based on EDR (100Gbps) technology for MPI communication as well as for storage access. The InfiniBand bandwidth between chassis provides up to 1200Gbps.
Hardware
| Service | Solution | Blades | Description |
|---|---|---|---|
| Computing nodes | 4 x HPE Apollo k6000 chassis | 24 blades per chassis | 72 x two Intel Xeon Gold 6152 Scalable Processors @ 2.10GHz (2 x 22 cores per node, HT-enabled, 384GB RAM, NVMe /scratch); 24 x two Intel Xeon Gold 6240R CPUs @ 2.40GHz (2 x 24 cores per node, 18 x 768GB + 6 x 384GB RAM, NVMe /scratch). HPC network based on dual-port InfiniBand ConnectX-5 EDR (1 x 100Gbps); standard network 1 x 10Gbps. |
| Login nodes | Single blade | 2 x HPE ProLiant DL380 Gen10 | Two Intel Xeon Gold 6152 Scalable Processors @ 2.10GHz (2 x 22 cores, HT-enabled, 384GB RAM, NVMe /scratch). HPC network based on dual-port InfiniBand ConnectX-5 EDR (2 x 100Gbps); standard network 2 x 10Gbps. |
Networking
The Merlin 5 network is InfiniBand, based on QDR (40Gbps) and FDR (56Gbps) technologies for MPI communication as well as for storage access. Merlin 5 is connected to Merlin 6 through FDR (56Gbps) for MPI.
Hardware
| Service | Solution | Blades | Description |
|---|---|---|---|
| Computing nodes | Chassis | 16 blades per chassis | Two Intel Xeon E5-2670 processors @ 2.60GHz (2 x 8 cores, no HT, 64GB RAM, SAS /scratch). HPC network based on single-port InfiniBand ConnectX-3 QDR (1 x 40Gbps); standard network 1Gbps. |
| Login nodes | Single blade | 1 x HPE ProLiant DL380 Gen9 | Two Intel Xeon E5-2697A v4 CPUs @ 2.60GHz (2 x 16 cores, HT-enabled, 512GB RAM, SAS /scratch). HPC network based on dual-port InfiniBand Connect-IB FDR (1 x 56Gbps); standard network 1 x 1Gbps. |
| Service | Solution | Nodes | Description |
|---|---|---|---|
| Storage nodes | Lenovo Distributed Storage Solution for IBM Spectrum Scale, 1 x Lenovo DSS G240 building block | 1 x ThinkSystem SR630 (management node); 2 x ThinkSystem SR650 (I/O nodes) | ThinkSystem SR630: two Intel Xeon Gold 5118 Scalable Processors @ 2.30GHz (2 x 12 cores, HT-enabled, 96GB RAM) |

Building block 1: