Emmy Cluster

Photograph of the RRZE Emmy Cluster

RRZE's Emmy cluster (system vendor: NEC) is a high-performance compute resource with a high-speed interconnect. It is intended for distributed-memory (MPI) or hybrid parallel programs with medium to high communication requirements.

  • 560 compute nodes, each with two Xeon E5-2660 v2 "Ivy Bridge" chips (10 cores per chip + SMT) running at 2.2 GHz, with 25 MB shared cache per chip and 64 GB of RAM
  • 2 frontend nodes with the same CPUs as the compute nodes
  • 16 Intel Xeon Phi coprocessors and 16 Nvidia K20 GPGPUs spread over 16 compute nodes
  • parallel file system (LXFS) with a capacity of 400 TB and an aggregate parallel I/O bandwidth of more than 7000 MB/s
  • InfiniBand interconnect fabric with 40 Gbit/s bandwidth per link and direction
  • overall peak performance of ca. 234 TFlop/s (191 TFlop/s LINPACK, using only the CPUs)
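The CPU share of the peak figure can be reproduced with simple arithmetic. The sketch below assumes that each Ivy Bridge core retires 8 double-precision flops per cycle (one 4-wide AVX add plus one 4-wide AVX multiply) at the 2.2 GHz base clock, and it ignores turbo mode. Under these assumptions the CPUs deliver about 197 TFlop/s, so the LINPACK run reaches roughly 97% of CPU peak. The remaining ~37 TFlop/s of the quoted 234 TFlop/s would presumably come from the accelerators.

```python
# Theoretical double-precision peak of Emmy's CPU partition (illustrative).
# ASSUMPTION: 8 DP flops per core per cycle (4-wide AVX add + 4-wide AVX mul),
# evaluated at the 2.2 GHz base clock; turbo mode is ignored.
NODES = 560
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 10
BASE_CLOCK_GHZ = 2.2
DP_FLOPS_PER_CYCLE = 8


def cpu_peak_tflops(nodes: int = NODES) -> float:
    """Peak in TFlop/s: cores * GHz * flops/cycle gives GFlop/s, divided by 1000."""
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    gflops = cores * BASE_CLOCK_GHZ * DP_FLOPS_PER_CYCLE
    return gflops / 1000.0


if __name__ == "__main__":
    peak = cpu_peak_tflops()
    print(f"CPU peak: {peak:.1f} TFlop/s")            # 197.1 TFlop/s
    print(f"LINPACK efficiency: {191 / peak:.0%}")    # 191 TFlop/s from the list above
```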

The Emmy cluster is named after the famous mathematician Emmy Noether, who was born here in Erlangen.


Emmy is designed for running parallel programs that use significantly more than one node. Jobs using less than one full node are not supported by RRZE and may be killed without notice.

This page covers access, the user environment and file systems, batch processing, MPI, and further information on the hardware.

Access, User Environment, and File Systems

Access to the machine

Users can connect to

emmy.rrze.fau.de

and will be randomly routed to one of the two frontends. All systems in the cluster, including the frontends, have private IP addresses in the 10.28.8.0/22 range, so they can only be accessed directly from within the FAU networks. If you need access from outside FAU, you first have to connect to a dialog server such as cshpc.rrze.fau.de and then ssh to emmy from there. While it is possible to ssh directly to a compute node, users are only allowed to do so while they have a batch job running there. When all of a user's batch jobs on a node have ended, all of their processes on that node, including any open shells, are killed automatically.
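The /22 prefix covers 1024 addresses, from 10.28.8.0 to 10.28.11.255, all inside the private 10.0.0.0/8 block, which is not routed on the public internet. That is why a hop through a dialog server is needed from outside FAU. The following sketch uses Python's standard ipaddress module to check whether an address belongs to Emmy's range. The helper name is_emmy_address is hypothetical and not an RRZE tool.

```python
import ipaddress

# Address range of all Emmy systems, as stated above.
EMMY_NET = ipaddress.ip_network("10.28.8.0/22")


def is_emmy_address(addr: str) -> bool:
    """Return True if addr lies inside Emmy's private 10.28.8.0/22 range (hypothetical helper)."""
    return ipaddress.ip_address(addr) in EMMY_NET


if __name__ == "__main__":
    print(EMMY_NET.num_addresses)                                  # size of the /22 block
    print(EMMY_NET.network_address, EMMY_NET.broadcast_address)    # first and last address
    print(EMMY_NET.is_private)                                     # private, not publicly routed
```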

The login and compute nodes run CentOS (essentially Red Hat Enterprise Linux without commercial support). As on most other RRZE HPC systems, a modules environment is provided to facilitate access to software packages. Type "module avail" to get a list of available packages.

The shell for all users on Emmy is always bash. This differs from our other clusters and the rest of RRZE, where the default shell used to be tcsh unless you had requested a change.

File Systems

The following table summarizes the available file systems and their features. It is only an excerpt from the main file system table in the HPC environment description.

File system overview for the Emmy cluster
Mount point | Access via | Purpose | Technology, size | Backup | Data lifetime | Quota
/home/hpc | $HOME | Storage of source, input and important results | NFS on central servers, small | YES + snapshots | Account lifetime | YES (restrictive)
/home/vault | - | Mid- to long-term storage | Central servers, HSM | YES + snapshots | Account lifetime | YES
/home/woody | $WOODYHOME | Short- to mid-term storage or small files | Central NFS server | NO | Account lifetime | YES
/elxfs | $FASTTMP | High-performance parallel I/O; short-term storage | LXFS (Lustre) parallel file system via InfiniBand, 400 TB | NO | High watermark deletion | NO

Please note the following differences from our older clusters:

  • There is no cluster-local NFS server as on previous clusters (e.g. /home/woody).
  • The nodes do not have local hard disk drives as on previous clusters. Exception: the GPU nodes.
  • /tmp resides in RAM, so it is absolutely NOT possible to store more than a few MB of data there.

NFS file system $HOME

When connecting to one of the frontend nodes, you'll find yourself in your regular RRZE $HOME directory (/home/hpc/...). Quotas there are relatively tight, so it will most probably be too small for the inputs and outputs of your jobs. However, it offers many useful features, such as fine-grained snapshots, so use it for "important" data, e.g. your job scripts or the source code of the program you're working on. See the HPC storage page for a more detailed description of these features.

Parallel file system $FASTTMP

The cluster's parallel file system is mounted on all nodes under /elxfs/$GROUP/$USER/ and is available via the $FASTTMP environment variable. It supports parallel I/O using the MPI-I/O functions and can be accessed with an aggregate bandwidth of more than 7000 MB/s (considerably more if caching effects can be exploited).

The parallel file system is strictly intended as high-performance short-term storage, so a high watermark deletion algorithm is employed: when the fill level of the file system exceeds a certain limit (e.g. 80%), files are deleted, starting with the oldest and largest, until the fill level drops below 60%. Be aware that the normal tar -x command preserves the modification time of the original file instead of setting it to the time the archive is unpacked, so unpacked files may become some of the first candidates for deletion. Use tar -mx, or touch in combination with find, to work around this. Also be aware that the exact time of deletion is unpredictable.
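The deletion policy can be modelled in a few lines. This is a simplified sketch, not RRZE's actual implementation. It assumes that "oldest and largest" means sorting by modification time first and using size (largest first) only to break ties, and it uses the example thresholds of 80% and 60%. The names FileEntry and watermark_victims are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def watermark_victims(files, capacity, high=0.80, low=0.60):
    """Return the files a high-watermark purge would delete (illustrative model).

    Nothing happens while usage stays at or below high * capacity. Once that is
    exceeded, files are removed oldest first (larger files first among equally
    old ones) until usage drops below low * capacity.
    """
    used = sum(f.size for f in files)
    if used <= high * capacity:
        return []
    victims = []
    for f in sorted(files, key=lambda f: (f.mtime, -f.size)):
        if used < low * capacity:
            break
        victims.append(f)
        used -= f.size
    return victims
```

The model also shows why tar -x is risky here: an unpacked file keeps its original, possibly years-old mtime and therefore sorts to the front of the deletion order.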
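If you unpack archives from a Python workflow rather than with tar -mx, the same effect can be achieved by resetting the modification times after extraction. Python's tarfile module, like tar -x, restores the mtimes stored in the archive. The sketch below walks the extracted tree afterwards and touches every entry, which is roughly what find combined with touch does. The function name is hypothetical. The "data" extraction filter is used only on Python versions that provide it.

```python
import os
import tarfile
import time


def extract_with_fresh_mtimes(archive: str, dest: str) -> None:
    """Extract archive into dest, then set every mtime to 'now' (like tar -mx)."""
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")   # safer filter on newer Pythons
        else:
            tar.extractall(dest)
    now = time.time()
    for root, dirs, files in os.walk(dest):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.path.islink(path):          # leave symlink targets alone
                os.utime(path, (now, now))
```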

Note that parallel file systems are generally not suited to handling large numbers of small files. This is inherent in their design: parallel file systems achieve their high speed by writing to multiple servers at the same time. However, they distribute data in blocks, in our case 1 MB. This means that a file smaller than 1 MB will only ever be stored on one server, so the parallel file system can never be faster than a traditional NFS server; on the contrary, due to its larger overhead, it will generally be slower. Parallel file systems only show their strengths with files that are at least a few megabytes in size, and they excel when very large files are written by many nodes simultaneously (e.g. checkpointing). For that reason, we have set a limit on the number of files you can store there.
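The block argument can be made concrete with a small calculation. This sketch assumes simple round-robin striping of 1 MB blocks over all servers. The server count of 8 in the examples is a hypothetical value, and in Lustre the number of servers a file is striped over is configurable per file. The sketch counts how many servers hold data of a file of a given size.

```python
STRIPE_SIZE = 1 << 20  # 1 MB blocks, as stated above


def servers_touched(file_size: int, num_servers: int, stripe_size: int = STRIPE_SIZE) -> int:
    """Number of servers holding data of a file under round-robin striping.

    ASSUMPTION: blocks are spread round-robin over num_servers servers.
    """
    blocks = (file_size + stripe_size - 1) // stripe_size   # ceiling division
    return min(blocks, num_servers)


if __name__ == "__main__":
    for size in (4096, 3 * STRIPE_SIZE, 100 * STRIPE_SIZE):
        print(size, servers_touched(size, num_servers=8))
```

A 4 KB file lands on a single server no matter how many servers exist. Only files spanning many blocks can draw on the aggregate bandwidth.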

Batch processing

As with all production clusters at RRZE, resources are controlled through a batch system. The frontends can be used for compiling and very short serial test runs, but everything else has to go through the batch system to the cluster.

Please see the batch system description in our HPC environment description.

The following queues are available on this cluster:

Queues on the Emmy cluster
Queue | Walltime (min - max) | Nodes (min - max) | Availability | Comments
route | N/A | N/A | all users | Default router queue; sorts jobs into execution queues
devel | 0 - 01:00:00 | 1 - 8 | all users | Some nodes reserved for this queue during working hours
work | 01:00:01 - 24:00:00 | 1 - 64 | all users | "Workhorse"
big | 01:00:01 - 24:00:00 | 1 - 560 | special users | Not active all the time, as it causes considerable waste. Users can get access for benchmarking or after demonstrating that their codes can really make use of more than 64 nodes.
special | 0 - infinity | 1 - all | special users | Direct job submission with -q special
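To see how the limits in the table relate to each other, here is a purely illustrative sketch that checks a job's walltime and node count against the two all-user execution queues. It is not the router queue's real logic, which may apply further rules. The restricted queues big and special are left out, and the names pick_queue and hms are hypothetical.

```python
from typing import Optional

# Limits copied from the queue table above; walltimes in seconds.
QUEUES = [
    # name,   min wall, max wall,  min nodes, max nodes
    ("devel", 0,        1 * 3600,  1,         8),
    ("work",  3601,     24 * 3600, 1,         64),
]


def hms(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = (int(x) for x in text.split(":"))
    return h * 3600 + m * 60 + s


def pick_queue(walltime: str, nodes: int) -> Optional[str]:
    """Return the first all-user queue whose limits admit the job, else None."""
    secs = hms(walltime)
    for name, wmin, wmax, nmin, nmax in QUEUES:
        if wmin <= secs <= wmax and nmin <= nodes <= nmax:
            return name
    return None
```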

As full nodes have to be requested, you always need to specify -l nodes=<nnn>:ppn=40 on qsub (40 = 2 chips × 10 cores × 2 SMT threads per node).

All nodes have properties that you can use to request nodes of a certain type. This is mostly needed to request one of the GPU nodes. You request nodes with a certain property by appending :property to your request, e.g. -l nodes=<nnn>:ppn=40:ddr1600. The following properties are available:

Node properties on Emmy
Property | Description
:ddr1600 | Nodes with DDR3-1600 memory modules. 544 nodes qualify.
:ddr1866 | Nodes with DDR3-1866 memory modules. 16 nodes qualify.
:k20m | Nodes with one or two Nvidia Kepler cards. 10 nodes qualify.
:k20m1x | Nodes with one Nvidia Kepler card. 4 nodes qualify.
:k20m2x | Nodes with two Nvidia Kepler cards. 6 nodes qualify.
:phi | Nodes with one or two Xeon Phi "MIC" coprocessors. 10 nodes qualify.
:phi1x | Nodes with one Xeon Phi. 4 nodes qualify.
:phi2x | Nodes with two Xeon Phi. 6 nodes qualify.
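A minimal sketch for building the resource string for qsub -l from the rules above: full nodes with ppn=40, plus optional properties from the table. The helper nodes_request is hypothetical. It only produces and validates the string and does not submit anything.

```python
# Node properties from the table above.
NODE_PROPERTIES = {
    "ddr1600", "ddr1866",
    "k20m", "k20m1x", "k20m2x",
    "phi", "phi1x", "phi2x",
}
PPN = 40  # full node: 2 chips x 10 cores x 2 SMT threads


def nodes_request(nnn: int, *properties: str) -> str:
    """Build the value for `qsub -l`, e.g. 'nodes=4:ppn=40:k20m' (hypothetical helper)."""
    if nnn < 1:
        raise ValueError("at least one node is required")
    unknown = set(properties) - NODE_PROPERTIES
    if unknown:
        raise ValueError(f"unknown node properties: {sorted(unknown)}")
    return ":".join([f"nodes={nnn}", f"ppn={PPN}", *properties])
```

For example, nodes_request(2, "k20m2x") yields nodes=2:ppn=40:k20m2x, which would then be passed as qsub -l nodes=2:ppn=40:k20m2x together with your job script.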

Properties can also be used to request a certain CPU clock frequency. This is not something you will usually want to do, but it can be useful for certain kinds of benchmarking. Note that you cannot make the CPUs go any faster, only slower: turbo mode is already the default, and it makes the CPU clock as fast as it can (up to 2.6 GHz) without exceeding its thermal or power budget. So please do not use any of the following options unless you know what you're doing. The available options are :noturbo to disable turbo mode, :f2.2 to request 2.2 GHz (equivalent to :noturbo), :f2.1 to request 2.1 GHz, and so on in 0.1 GHz steps down to :f1.2 to request 1.2 GHz.
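The frequency properties follow a regular pattern, so they can be generated instead of typed by hand. The sketch below works in integer tenths of a GHz to avoid floating-point artefacts in the property names. The function names are hypothetical, and the 1.2 to 2.2 GHz range is taken from the paragraph above.

```python
def frequency_properties():
    """All clock properties from f2.2 down to f1.2 in 0.1 GHz steps."""
    # Integer tenths of a GHz avoid names like 'f1.2000000000000002'.
    return [f"f{t // 10}.{t % 10}" for t in range(22, 11, -1)]


def frequency_property(ghz: float) -> str:
    """Map a clock in GHz to its node property, e.g. 1.8 -> 'f1.8'."""
    tenths = round(ghz * 10)
    if abs(ghz * 10 - tenths) > 1e-6:
        raise ValueError("clock must be a multiple of 0.1 GHz")
    if not 12 <= tenths <= 22:
        raise ValueError("supported range is 1.2 to 2.2 GHz")
    return f"f{tenths // 10}.{tenths % 10}"
```

Appended to a node request, frequency_property(1.8) gives a request such as -l nodes=1:ppn=40:f1.8.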

MPI

  • Do NOT use Intel's mpiexec.hydra; it is severely broken, see http://blogs.fau.de/zeiser/2013/11/28/perhost-option-in-intels-mpiexec-hydra-is-broken/. Most users should use RRZE's mpirun_rrze instead.
  • If an intelmpi module is loaded, mpirun_rrze is available in the search path. It supports a pinning option for pure-MPI binaries: -pinexpr EXPR, where EXPR can be any expression that likwid-pin understands, e.g. S0:0-9@S1:0-9 to select ten cores on each of socket 0 and socket 1.
  • Intel MPI is recommended, but Open MPI is available, too.
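The example expression follows a simple pattern: one term per socket domain (S0, S1, ...) listing cores 0 to 9 within that domain, with the terms joined by "@". The sketch below generates such expressions for a given socket and core count. The helper pin_expr is hypothetical and only covers this one pattern, not the full likwid-pin syntax.

```python
def pin_expr(sockets: int = 2, cores_per_socket: int = 10) -> str:
    """Build a likwid-pin expression selecting cores 0..n-1 on each socket.

    Each term 'S<s>:0-<n-1>' names cores within socket domain S<s>;
    terms are joined with '@', as in 'S0:0-9@S1:0-9'.
    """
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("need at least one socket and one core")
    last = cores_per_socket - 1
    return "@".join(f"S{s}:0-{last}" for s in range(sockets))
```

With the defaults the result is S0:0-9@S1:0-9, the value shown above for mpirun_rrze -pinexpr. A run using only five cores per socket would use pin_expr(2, 5), i.e. S0:0-4@S1:0-4.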

Further Information

Intel Xeon E5-2660 v2 "Ivy Bridge" Processor

Intel's ARK database lists technical details of the Xeon E5-2660 v2 processor.

InfiniBand Interconnect Fabric

The InfiniBand network on Emmy is a quad data rate (QDR) network, i.e. the links run at 40 Gbit/s in each direction. This is identical to the network on LiMa. The network is fully non-blocking, i.e. the backbone can handle the maximum amount of traffic coming in through the client ports without any congestion. However, because InfiniBand uses static routing (once a route between two nodes is established, it does not change, even if the load on the backbone links changes), it is possible to generate traffic patterns that cause congestion on individual links. This is unlikely to happen with normal user jobs, though.
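Why static routing can congest a non-blocking fabric is easiest to see in a toy model. The sketch below assumes a hypothetical "destination mod k" rule, where every destination node is bound to one fixed uplink of a leaf switch with k uplinks. This is a common textbook scheme for fat trees, not a description of Emmy's actual routing tables. It counts how many flows end up on each uplink.

```python
from collections import Counter


def uplink_loads(flows, num_uplinks):
    """Count flows per uplink under a static destination-mod-k routing rule.

    flows: (source, destination) node pairs that all leave the same leaf switch.
    ASSUMPTION: each destination is bound to uplink dst % num_uplinks,
    regardless of how busy that uplink currently is.
    """
    return Counter(dst % num_uplinks for _src, dst in flows)


if __name__ == "__main__":
    balanced = [(0, 4), (1, 5), (2, 6), (3, 7)]
    unlucky = [(0, 4), (1, 8), (2, 12), (3, 16)]
    print(uplink_loads(balanced, 4))  # one flow per uplink
    print(uplink_loads(unlucky, 4))   # all four flows share uplink 0
```

In the unlucky pattern, four flows compete for one uplink while the other three remain idle, even though the switch as a whole has enough uplink capacity.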

Last modified: 10 December 2014



RRZE - Regionales RechenZentrum Erlangen, Martensstraße 1, D-91058 Erlangen | Phone: +49 9131 8527031 | Fax: +49 9131 302941
