TinyGPU Cluster

The RRZE's TinyGPU cluster is an experimental cluster for developing and benchmarking applications using GPUs as accelerators.

  • 8 compute nodes, each with two Xeon 5550 "Nehalem" chips (8 cores + SMT) running at 2.66 GHz with 8 MB shared cache per chip, 24 GB of RAM (DDR3-1333), and 200 GB of local scratch disk; two NVIDIA Tesla M1060 GPU boards in every node

  • 1 compute node with two Xeon 5650 "Westmere" chips (12 cores + SMT) running at 2.66 GHz with 12 MB shared cache per chip, 48 GB of RAM (DDR3-1333), and 500 GB of local scratch disk; two NVIDIA Tesla C2070 GPU boards (plus two other GPUs that vary)

  • InfiniBand interconnect fabric with 20 GBit/s bandwidth per link and direction

Jobs that request less than a full node are currently not supported by RRZE and may be killed without notice. Therefore, always use ppn=16 in the node specification for qsub.

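This page covers the following topics: access, user environment, and file systems; compiling and running CUDA codes; batch processing; and further information on the hardware.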

Access, User Environment, and File Systems

Access to the machine

Access to TinyGPU is through the Woody Frontends. So, connect to

woody.rrze.uni-erlangen.de

and you will be randomly routed to one of the frontends for Woody, as there are no separate frontends for TinyGPU. See the documentation for the Woodcrest cluster for information about these frontends. Although the TinyGPU compute nodes actually run Ubuntu LTS, the environment is compatible: programs compiled for Woody will run on TinyGPU as well. In most cases, you can even compile CUDA programs on the Woody frontends (after loading the cuda module), although no GPU hardware is available there. In case of problems, try to compile your GPU programs on one of the TinyGPU compute nodes (e.g. within an interactive job).

To submit jobs, you have to use the command qsub.tinygpu instead of the normal qsub.

In general, the documentation for Woody applies. This page only lists the differences from Woody.

File Systems

Parallel file system $FASTTMP

The parallel filesystem $FASTTMP in /wsfs is currently not available on TinyGPU.

Node-local storage $TMPDIR

Each node has at least 200 GB of local hard drive capacity for temporary files (instead of the 130 GB Woody has), available under /tmp/ (also accessible via /scratch/).
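A common way to use node-local storage is to copy input data to the local disk at the start of a job, work there, and copy the results back at the end. The following is only a sketch of that pattern, not an RRZE-provided script. It assumes that $TMPDIR points to the node-local disk inside a job (it falls back to /tmp otherwise); the function names are made up for illustration.

```shell
# Sketch: staging data through node-local storage.
# Assumption: $TMPDIR is set to the node-local disk inside a job;
# outside a job this falls back to /tmp.

scratch_dir() {
    echo "${TMPDIR:-/tmp}"
}

# Copy one input file into a fresh private work directory on the local
# disk and print the path of that directory.
stage_in() (
    work="$(mktemp -d "$(scratch_dir)/job.XXXXXX")" || exit 1
    cp "$1" "$work/" || exit 1
    echo "$work"
)

# Copy one result file ($2) from the work directory ($1) to a permanent
# directory ($3), then remove the work directory.
stage_out() {
    cp "$1/$2" "$3/" && rm -rf "$1"
}
```

Inside a job script this would be used as work="$(stage_in input.dat)", followed by the actual computation in "$work", and finally stage_out "$work" output.dat "$HOME/results" (all file and directory names hypothetical).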

Compiling and running CUDA codes

Unfortunately, due to the experimental nature of this cluster, the proper way of doing this is still in flux. Please contact hpc-support if you need assistance. However, in many cases you will find most of the required information by looking at the (default) cuda module (e.g. module show cuda).
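As a minimal sketch of a first sanity check, the following snippet reports whether the CUDA compiler is reachable in the current shell. It assumes that the cuda module mentioned above puts nvcc on the PATH once it is loaded; the function name is made up for illustration.

```shell
# Sketch: check whether nvcc is available in this shell.
# Assumption: loading the "cuda" module adds nvcc to the PATH.

cuda_toolchain_status() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc found at $(command -v nvcc)"
    else
        echo "nvcc not found; try: module load cuda"
    fi
}

cuda_toolchain_status
```

If nvcc is reported as missing, running module load cuda and repeating the check is the obvious next step; a compile call such as nvcc -o saxpy saxpy.cu (hypothetical file name) should then work on a frontend in most cases.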

Batch Processing

The batch system works just like on Woody; the few notable differences are:

  • The command for job submission is qsub.tinygpu instead of just qsub.
  • The compute nodes do not have 4 cores like Woody, but 8 physical cores plus 8 SMT cores. This means that the operating system will see 16 cores. At the moment, you always have to request ppn=16 (or ppn=24 for the fermi queue), even if you need fewer cores and regardless of the number of GPUs used per node. A different mechanism may be established in the future, so check this documentation regularly for updates.
  • If you want to get the node with the C2070 GPUs (tg010), you have to submit your job to the queue "fermi", i.e. use qsub -q fermi ....
  • With Nehalem, Intel has reintroduced Hyper-Threading, although it is now called simultaneous multithreading (SMT), and this time it actually is useful for some applications. You should test whether your application runs better or worse with SMT. To run a job without SMT, you still have to request all 16 cores of a node (see the previous item) and then restrict your program to the 8 "real" ones. The "real" cores on TinyGPU are numbered 0-7: cores 0-3 are on the first physical socket and cores 4-7 on the second; 8-15 are the corresponding virtual cores created by SMT. If you use mpirun, the parameters -npernode 8 -pin "0 1 2 3 4 5 6 7" restrict your program to the right cores.
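The rules in the list above can be summarised in a small dry-run helper that only prints the commands instead of submitting anything. This is a sketch under assumptions: the Torque-style -l nodes=N:ppn=P resource syntax, the use of -q fermi together with qsub.tinygpu, the queue label "default", and the names job.sh and ./a.out are illustrative and not taken from RRZE documentation.

```shell
# Dry-run sketch: print submission and launch commands, do not run them.
# Assumptions: Torque-style "-l nodes=N:ppn=P"; "default" is a
# hypothetical label for the standard queue.

# ppn value that has to be requested for a given queue.
tinygpu_ppn() {
    case "$1" in
        fermi) echo 24 ;;   # tg010: 12 physical cores + SMT
        *)     echo 16 ;;   # Nehalem nodes: 8 physical cores + SMT
    esac
}

# Arguments: queue, number of nodes, job script.
tinygpu_qsub_cmd() (
    queue="$1"; nodes="$2"; script="$3"
    ppn="$(tinygpu_ppn "$queue")"
    if [ "$queue" = "fermi" ]; then
        echo "qsub.tinygpu -q fermi -l nodes=${nodes}:ppn=${ppn} ${script}"
    else
        echo "qsub.tinygpu -l nodes=${nodes}:ppn=${ppn} ${script}"
    fi
)

# Space-separated list of cores 0..n-1, as expected by the -pin option.
physical_cores() (
    n="$1"; i=0; list=""
    while [ "$i" -lt "$n" ]; do
        list="${list}${list:+ }${i}"
        i=$((i + 1))
    done
    echo "$list"
)

tinygpu_qsub_cmd default 2 job.sh
tinygpu_qsub_cmd fermi 1 job.sh
echo "mpirun -npernode 8 -pin \"$(physical_cores 8)\" ./a.out"
```

Printing the commands first makes it easy to check the ppn value before a job is actually submitted.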

Further Information

Intel Xeon 5550 "Nehalem" Processor

The Xeon 5550 processor implements Intel's Nehalem microarchitecture and is a quad-core chip running at 2.66 GHz. The most significant improvements compared to the Core 2 based chips (as used, e.g., in our Woodcrest cluster) have been made to the memory interface. In addition, the chips can dynamically overclock themselves as long as they stay within their thermal envelope.

The memory controllers are no longer in the chipset, but integrated into the CPU, a concept that is familiar from the Opteron CPUs of Intel's competitor AMD. Intel has, however, decided to go the whole hog: each CPU has no fewer than three independent memory channels, which leads to a vastly improved memory bandwidth compared to Core 2 based CPUs like the Woodcrest. Please note that this improvement really only applies to the memory interface. Applications that run mostly from the cache do not run better on Nehalem than on Woodcrest.

The physical CPU sockets are coupled through the QuickPath Interconnect (QPI). As the memory is now attached directly to the CPUs, accesses to the memory of the other socket have to go through QPI and the other processor, so they are more expensive and slower. In other words, the Nehalem nodes are ccNUMA machines.
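Because memory is local to a socket, it matters which socket a given core number belongs to. The following sketch applies the core numbering described in the batch processing section to the 16-core Nehalem nodes. It assumes that virtual core n+8 is the SMT sibling of physical core n, which is how "corresponding" is read here; the function names are made up for illustration.

```shell
# Sketch: map a logical core number (0-15) on a Nehalem node to its
# socket and to the physical core it belongs to.
# Assumption: virtual core n+8 is the SMT sibling of physical core n.

core_socket() {
    # cores 0-3 and 8-11 -> socket 0; cores 4-7 and 12-15 -> socket 1
    echo $(( ($1 % 8) / 4 ))
}

core_physical() {
    echo $(( $1 % 8 ))
}

for c in 0 5 8 13; do
    echo "core $c: socket $(core_socket "$c"), physical core $(core_physical "$c")"
done
```

Under this assumption, a process pinned to core 13 runs on the second socket and shares a physical core with whatever runs on core 5.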

InfiniBand Interconnect Fabric

The InfiniBand network on TinyGPU is a double data rate (DDR) network, i.e. the links run at 20 GBit/s in each direction. All 8 nodes are connected to a small DDR switch and can thus communicate with each other in a fully non-blocking fashion.
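To put the link rate into perspective, the following short calculation converts it to bytes per second. It assumes that the 20 GBit/s figure is the signalling rate of a 4x DDR link, of which the 8b/10b line encoding leaves 8/10 for payload; actual application-level throughput will be lower still.

```shell
# Sketch: convert the stated link rate to a payload rate in MByte/s.
# Assumption: 20 GBit/s is the signalling rate; 8b/10b encoding leaves
# 8 out of every 10 bits for data.

link_gbit=20
payload_gbit=$(( link_gbit * 8 / 10 ))        # 16 GBit/s of payload
payload_mbyte=$(( payload_gbit * 1000 / 8 ))  # 2000 MByte/s per direction

echo "payload rate: ${payload_gbit} GBit/s = ${payload_mbyte} MByte/s per direction"
```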

Last modified: 13 March 2012


RRZE - Regionales RechenZentrum Erlangen, Martensstraße 1, D-91058 Erlangen | Tel.: +49 9131 8527031 | Fax: +49 9131 302941
