HPC Storage

[Photograph: the RRZE HPC storage servers and disk arrays]


Short description

Since the disk space requirements of the HPC users exceed those of all other users by a significant factor, the RRZE provides a centralized HPC storage system for its HPC customers. The system is based on IBM hardware and software (GPFS, TSM) and went into operation in August 2009.

The system serves two functions: it houses the normal home directories of all HPC users, and it provides tape-backed mid- to long-term storage for user data.

Technical data

This section contains technical data. You can safely skip it if you're not interested.

Hardware

  • 6 file servers, IBM X3650 7979-B3G, 16 GB RAM, 10 Gbit Ethernet
  • 1 console node, IBM X3650 7979-B3G, 16 GB RAM, 10 Gbit Ethernet
  • 1 TSM server, IBM X3650 7979-B3G, 8 GB RAM, 10 Gbit Ethernet
  • IBM TS3500 tape library with currently
    • 6 LTO4 tape drives and 2 expansion frames
    • 2913 LTO4 tape slots
    • >1500 LTO4 tapes
  • 3 IBM DS3500
    • plus 12 IBM EXP3000 expansion units (4 per DS3500)
    • redundant controllers
    • 180 SAS drives (600 GB, 15,000 rpm) for data
    • Usable data capacity: 66 TB for vault, 5 TB for homes
  • 1 IBM DS3400 and 1 IBM DS3500 for TSM

Software and Features

  • NFS and CIFS export
  • Automatic failover: If one node fails, other nodes automatically take over the IP addresses. Clients will not notice anything except a short delay.
  • Hierarchical Storage Management (HSM)
  • Snapshots

Home directories

The storage system houses the home directories of the HPC users. These directories are available under the path /home/hpc/GROUPNAME/USERNAME on all RRZE Unix/Linux systems. The home directory is the directory in which you are placed right after login, and where most programs try to save settings and similar things. When this directory is unavailable, most programs will stop working or show very strange behaviour, which is why we have tried to make the system highly redundant.

The home directory is protected by snapshots and, additionally, by regular backups. It is intended for really important data, e.g. important self-written source code or unrecoverable input files.

Each user gets a standard quota of 10 gigabytes for the home directory. Quota extensions are possible, but are handled very restrictively.
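
If you want a rough check of how much of that quota you are already using, you can simply sum up your home directory. This is a minimal sketch; the RRZE systems may also offer dedicated quota-reporting tools:

# Show the total space used by your home directory (compare against the 10 GB quota)
du -sh "$HOME"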

Vault

[Photograph: the RRZE HPC storage tape library]

The storage system also houses a large, HSM-backed archive section called "vault". Each HPC user has a directory there, available under the path /home/vault/GROUPNAME/USERNAME on all RRZE Unix/Linux systems.

HSM stands for hierarchical storage management and means that data is transparently moved between different types of storage media without the need for user intervention. In our case, these are the online storage on fast SAS hard disk arrays and the offline pool on tapes in our tape robot. When you put a file in the archive section, it will naturally go to the disks first. If you do not use the file for some time, it will at some point be moved to a tape in the tape robot - or actually to two tapes, for redundancy. This, however, is fully transparent to you: even when the file has been moved, you will still see it in your directory. When you access the file, the system will automatically tell the tape robot to fetch the tape and copy the file back to the hard disks. This might take a few minutes, but other than waiting, no user interaction is required.
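
One common way to guess whether a file has already been migrated to tape is to compare its apparent size with the disk blocks actually allocated to it. Note that this is generic HSM behaviour rather than an RRZE-specific tool, and the file name below is just an example:

# Apparent size: stays at the full file size even after migration
ls -l /home/vault/GROUPNAME/USERNAME/bigfile.dat

# Space actually allocated on disk: a value far below the apparent size
# suggests that the file body has been moved to tape
du -k /home/vault/GROUPNAME/USERNAME/bigfile.dat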

There is a limit (quota) on the space each user may occupy on the online pool, i.e. the rotating hard disks.

It is planned to provide commands for users that allow them to manually trigger migration of their files (either to tape, or back from tape). These commands however are not available yet.

Snapshots

Snapshots work mostly like the name suggests. At certain intervals, the filesystem takes a "snapshot", which is an exact read-only copy of the contents of the whole filesystem at one moment in time. In a way, a snapshot is similar to a backup, but with one big restriction: as the "backup" is stored on the exact same filesystem, it is no protection against disasters - if for some reason the filesystem fails, all snapshots will be gone as well. Snapshots do, however, provide great protection against user errors, which have always been the number one cause of data loss on the RRZE HPC systems. Users can restore important files that have been deleted or overwritten from an earlier snapshot.

Snapshots are stored in a hidden directory named .snapshots. Please note that this directory is more hidden than usual: it will not even show up with ls -a; it only appears when it is requested explicitly.
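
You can see this behaviour for yourself (a short illustration, using the example home directory from the next paragraph):

ls -a /home/hpc/exam/example1            # .snapshots does not appear in the listing
ls /home/hpc/exam/example1/.snapshots/   # but naming it explicitly works fine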

This is best explained by an example: let's assume you have a file important.txt in your home directory /home/hpc/exam/example1 that you have been working on for months. You accidentally delete that file. Thanks to snapshots, you should be able to recover most of the file, and "only" lose the last few hours of work. If you do an ls -l /home/hpc/exam/example1/.snapshots/, you should see something like this:

ls -l /home/hpc/exam/example1/.snapshots/
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.16-03.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.17-03.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.18-03.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.19-03.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.19-21.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.19-23.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.20-01.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.20-03.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-16 14:07 @GMT-2009.03.20-04.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-04.30.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-05.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-05.30.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-06.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-06.30.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 04:07 @GMT-2009.03.20-07.00.00
drwxr-xr-x 2 example1 exam 8192 2009-03-20 07:07 @GMT-2009.03.20-07.30.00

Each of these directories contains an exact read-only copy of your home directory at the time given in the directory name. To restore the file in the state it was in at 7:00 UTC on the 20th of March, you can simply copy it from there back into your current work directory: cp '/home/hpc/exam/example1/.snapshots/@GMT-2009.03.20-07.00.00/important.txt' '/home/hpc/exam/example1/important.txt'
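
If you are not sure which snapshots still contain the file (or which one holds the last good version), a small shell loop can check all of them. This is a sketch using the hypothetical paths from the example above:

# List the file's state in every snapshot that still contains it
for snap in /home/hpc/exam/example1/.snapshots/@GMT-*; do
    [ -e "$snap/important.txt" ] && ls -l "$snap/important.txt"
done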

Snapshots are enabled on both the home directories and the vault section, but they are made much more often on the home directories than on vault. Please note that the exact snapshot intervals and the number of snapshots retained may change at any time - you should not rely on the existence of a specific snapshot. Also note that all times given are in GMT/UTC. That means that, depending on whether daylight saving time is active or not, 03:00 UTC works out to either 05:00 or 04:00 German time. At the time of this writing, snapshots were configured as follows:

Snapshot settings on home section (/home/hpc):

  Interval                                                       × Copies retained   = Covered timespan
  30 minutes (every half and full hour)                          6                   3 hours
  2 hours (every odd-numbered hour - 01:00, 03:00, 05:00, ...)   12                  1 day
  1 day (at 03:00)                                               7                   1 week
  1 week (Sundays at 03:00)                                      4                   4 weeks

Snapshot settings on vault section (/home/vault):

  Interval                    × Copies retained   = Covered timespan
  1 day (at 03:00)            7                   1 week
  1 week (Sundays at 03:00)   4                   4 weeks

Advanced Topics

Limitations on number of files

Please note that having a large number of small files is pretty bad for filesystem performance. This is true for almost any filesystem, and certainly for all RRZE file servers, but it is especially problematic on this storage system due to the underlying parallel filesystem, the snapshots and the hierarchical storage management. We have therefore set a limit on the number of files a user is allowed to have. That limit is set rather high for the home section, because small files are part of the intended usage there, so you are unlikely to hit it unless you try to. It is, however, set rather tight on the vault section, especially compared to the huge amount of space available there. Note that for every file, a small (1 MB) stub is kept on the disks even if the rest of the file is migrated to tape, meaning that even migrated files take up some disk space. It also means that files smaller than the stub size are never written to tape, because that would not make sense.

If you have a large number of small files in the vault section that you do not intend to use for a long time, please put them into an archive (tar, zip, etc.).
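
For example, a directory tree full of small result files can be packed into a single compressed archive before it goes into the vault. The paths below are just placeholders:

# Pack the directory "results" into one compressed archive in the vault section
tar czf /home/vault/GROUPNAME/USERNAME/results.tar.gz results/

# Later, unpack it again into the current directory
tar xzf /home/vault/GROUPNAME/USERNAME/results.tar.gz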

Access Control Lists (ACLs)

Besides the normal Unix permissions that you set with chmod (where you can set permissions for the owning user, the owning group, and everyone else), the system also supports more advanced ACLs.

However, they are not implemented in the traditional (and non-standardized) way with setfacl/getfacl that users of Linux or Solaris might be familiar with, but in the new standardized way that NFS version 4 uses. This has both advantages and disadvantages. One advantage is that these ACLs work practically the same way as Windows ACLs, meaning that you can set them from a Windows client through the usual Explorer interface. The major disadvantages are that they are unnecessarily complex and that their support on the Linux side is far from perfect - not least because NFS version 4 itself is not in a usable state yet. That leads to a few restrictions that we will cover in the next section.

The ACLs can only be edited and viewed from clients that access the filesystem through a native GPFS client or through CIFS, not from NFS clients. As no native GPFS clients are available to normal users, the only way to edit them currently is CIFS. Please note that this restriction only applies to setting and displaying ACLs, not to their effectiveness: as access permissions are checked by the filesystem servers and not by the clients, any ACL that is set will affect every client, even if the client machine has no idea that the ACL exists.
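
From a Linux client, one way to at least inspect such an ACL over CIFS is Samba's smbcacls tool. This is a hedged sketch: the server and share names are placeholders, so substitute the actual CIFS export of the storage system:

# Show the security descriptor (ACL) of a file through CIFS
# //SERVER/SHARE and the file path are placeholders for the real export
smbcacls //SERVER/SHARE GROUPNAME/USERNAME/somefile -U USERNAME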

Last modified: 1 February 2013
