OS Upgrade of HPC Cluster Emmy from CentOS 6.x to 7.x (finished)

As already announced at the HPC Campustreffen on June 30th, the operating system of Emmy (and somewhat later also of LiMa) will be upgraded from CentOS 6.x to CentOS 7.x. This OS upgrade is a major change and will provide more modern versions of the software included in the Linux distribution.

The OS upgrade will require considerable downtime, as it basically means reinstalling the complete system from scratch. With the OS upgrade we will also remove or upgrade the (default) versions of some of the software provided via modules. The SSH-RSA key of the login nodes will not change, but new/additional SSH-ECDSA and SSH-ED25519 keys may become available.

This article is no longer updated (last update: Wednesday, July 27th). It provides the latest information on both the time schedule and status and on major software changes.

The schedule for Emmy is as follows:

  • Tuesday July 12th starting at 08:30 in the morning

    • a system reservation will ensure that all running jobs have completed, the cluster is empty, and no new jobs start
    • we will try to keep at least one login node up and running most of the time during the OS upgrade to enable access to files on the parallel file system
      • software provided by modules may not be available
      • reboots of the login nodes have to be expected at any time
      • the available login node may either be CentOS 6.x (n/a) or CentOS 7.2 (emmy1; emmy2)
      • new jobs can be submitted, although there are (as of 2016.07.13, 19:00) still some configuration glitches. Do not expect jobids to keep the „.rrze.uni-erlangen.de“ suffix.
  • Thursday July 14th evening: return to operation of the CPU-only nodes in beta mode
    • some jobs from the devel queue were started manually on Tue/Wed evening; unfortunately, many of them did not finish correctly for various reasons. Sorry for that.
    • general batch processing was resumed on the CPU-only nodes on Thursday July 14th at 22:00. See the MOTD on the login nodes for known issues.
  • until end of July: improvement of the configuration and mitigation of glitches
  • until end of July: return to operation of the accelerator nodes (GPGPU and/or Xeon Phi) (both are available again since Tuesday, July 26th)
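
Scripts that parse job IDs should not rely on the „.rrze.uni-erlangen.de“ suffix mentioned in the schedule above. The following is a minimal Python sketch, assuming that job IDs have the form number, dot, server name. The helper name job_number and the sample IDs (including the server name "batchserver") are hypothetical and not taken from the cluster.

```python
def job_number(jobid: str) -> int:
    """Return the numeric part of a job ID, ignoring any server suffix."""
    # Assumption: everything before the first dot is the sequence number.
    number, _, _suffix = jobid.strip().partition(".")
    if not number.isdigit():
        raise ValueError(f"unexpected job ID format: {jobid!r}")
    return int(number)


# Hypothetical sample IDs: with the full domain, with a short suffix, and bare.
for sample in (
    "123456.batchserver.rrze.uni-erlangen.de",
    "123456.batchserver",
    "123456",
):
    print(sample, "->", job_number(sample))
```

Comparing only the numeric part keeps such scripts working whether or not the domain is appended. Array-style job IDs are not handled by this sketch and raise a ValueError.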

Software changes on Emmy

  • most deprecated and testing modules will be removed TODO
  • ansys/17.0 will be removed (marked as deprecated); use ansys/17.1 instead
    ansys/15.0.7 and ansys/16.2 might be removed as they are not supported on CentOS/RHEL 7.x (not tested yet)
  • gromacs/4.6.4-mkl-IVB will be removed (marked as deprecated)
  • star-ccm+/8.06.005, star-ccm+/8.06.005-r8, star-ccm+/9.06.011-r8, star-ccm+/9.06.011, star-ccm+/10.02.010, star-ccm+/10.04.009 will be removed (marked as deprecated);
    STAR-CCM+ v11.04 will be made available (done)
  • tecplot/360-2013 and tecplot/360-2014 will be removed (marked as deprecated)
  • cuda/5.0 will be removed (marked as deprecated);
    software depending on cuda/5.0 will be removed (e.g. amber-gpu/12p19-at13p16-intel13.1-intelmpi4.1-cuda5.0, gromacs/5.0.4-mkl-IVB-CUDA50) TODO
  • the new system gcc will be 4.8.5; thus, at least gcc/4.8.1 will be removed (marked as deprecated);
    other gcc modules might be removed or replaced by their latest updates; new gcc modules will no longer support 32-bit compilation (done)
  • several versions of intel64 will be removed;
    the new defaults of the Intel tools will be: intel64/16.0up03, intelmpi/5.1up03-gnu, intelmpi/5.1up03-intel, mkl/11.3up03 and itac/9.1up02 (already installed, and defaults updated);
    advanced Intel tools are provided as-is without support by RRZE as intel-advisor/2016up03, intel-inspector/2016up03 and intel-vtune-amplifier/2016up03 in the testing section of the modules. Some components of these tools will probably not work (on either login or compute nodes).
  • the PGI compiler will be removed (marked as deprecated)
  • likwid/3.1.1 will be removed; a LIKWID 4.1.* module will be provided instead – the recommended module is „likwid/system“ (i.e. the one we also use for system monitoring) unless you really depend on a specific version (done)
  • mpirun_rrze will default to Intel MPI 5.1up03 (= 5.1.3.210) instead of 4.1.3.048 (done)
  • cmake updated from 2.8.11.2 to 3.6.0 (done)
  • boost updated from 1.54.0 to 1.61.0; modules are available for intel16 and system-gcc (done)
  • openmpi has been updated to the current versions 1.10.3 and 1.8.8 (the module name for the latter is 1.08.8 to keep proper sorting in the module avail list); modules are provided for intel16 and system gcc (done)
  • python/2.7-anaconda has been updated/extended; python/3.5-anaconda is provided (done); don’t use the system python if you e.g. need numpy. The module variants will use MKL and thus should be much faster; in addition, these module variants include more Python add-on packages.
  • OFED is no longer available on the Xeon Phi!
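
The zero-padded openmpi module name 1.08.8 mentioned in the list above can be illustrated with a short Python sketch. It assumes that the module list is ordered by plain string comparison, which is an assumption about the module avail output and not verified here. The module names are shortened to openmpi/ plus version, without the compiler suffix.

```python
# Plain string comparison works character by character, so "1.8.8"
# sorts after "1.10.3" although it is the older version.
unpadded = sorted(["openmpi/1.8.8", "openmpi/1.10.3"])
print(unpadded)  # ['openmpi/1.10.3', 'openmpi/1.8.8']

# Zero-padding the minor version restores the expected order.
padded = sorted(["openmpi/1.08.8", "openmpi/1.10.3"])
print(padded)  # ['openmpi/1.08.8', 'openmpi/1.10.3']
```

With the padded name, the older 1.8 series is listed before the 1.10 series, so the newest version ends up last in the list.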