Set up a distributed filesystem
The cluster has 3 nodes configured specifically for storage, namely MDS, OSS-0 and OSS-1. Their names come from the Lustre roles they were set up for: Metadata Server (MDS) and Object Storage Server (OSS).
According to the documentation, each OSS node has 4 disks of 2 TB, giving a total of 16 TB across both nodes, and all of it is currently unused. The current setup instead relies on a single 1 TB disk in the login node, exported to the compute nodes via NFS, and that disk is almost full. The storage is also served over the 1 Gbit/s Ethernet link, so moving it to the Omni-Path network would likely give much higher throughput.
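A quick back-of-the-envelope check of the numbers above, as a sketch. The disk counts come from the text; the 100 Gbit/s Omni-Path rate is an assumption (the nominal speed of the Omni-Path 100 series) and should be checked against the actual hardware. Transfer times are ideal, ignoring protocol and disk overhead.

```python
"""Back-of-the-envelope numbers for the storage upgrade (estimates, not measurements)."""

OSS_NODES = 2       # OSS-0 and OSS-1
DISKS_PER_OSS = 4   # per the cluster documentation
DISK_TB = 2         # 2 TB per disk

# Link speeds in Gbit/s: Ethernet is from the notes; Omni-Path is an ASSUMED
# nominal 100 Gbit/s and should be checked against the installed fabric.
ETHERNET_GBPS = 1
OMNIPATH_GBPS = 100


def raw_capacity_tb(nodes: int, disks: int, disk_tb: float) -> float:
    """Raw capacity across all OSS nodes, before any replication or RAID."""
    return nodes * disks * disk_tb


def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Ideal time (hours) to move size_tb decimal terabytes over a link, ignoring overhead."""
    bits = size_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


if __name__ == "__main__":
    total = raw_capacity_tb(OSS_NODES, DISKS_PER_OSS, DISK_TB)
    print(f"raw capacity: {total} TB")
    # 1 TB corresponds to the size of the current NFS disk on the login node.
    for name, gbps in [("Ethernet", ETHERNET_GBPS), ("Omni-Path", OMNIPATH_GBPS)]:
        print(f"1 TB over {name}: {transfer_hours(1, gbps):.2f} h")
```

With these assumptions, reading the full 1 TB NFS disk takes about 2.2 hours at 1 Gbit/s versus roughly 80 seconds at 100 Gbit/s; in practice the disks themselves will likely become the bottleneck long before the Omni-Path link does.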
Lustre and Ceph both seem to be appropriate candidates. However, Lustre appears to be incompatible with the latest kernel version, so the plan below focuses on Ceph.
- Contact Ramón Nou to erase the disks in the MDS, OSS1 and OSS2 nodes (currently used by their Lustre installation).
- Take control of mds01
- Install NixOS on one of the disks
- Test Ceph
- Mount CephFS on the other nodes
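For the last step, a minimal sketch that builds the `/etc/fstab` entry each node would need to mount CephFS with the kernel client, using the classic `mon:port:/ mountpoint ceph options` form. Everything concrete here is hypothetical and not from the notes: that a Ceph monitor runs on mds01 on the default port 6789, that the client user is `admin` with its key in `/etc/ceph/admin.secret`, and that the mount point is `/mnt/cephfs`.

```python
"""Build an /etc/fstab entry for mounting CephFS with the kernel client (sketch).

HYPOTHETICAL defaults: monitor on mds01:6789, client user "admin",
key in /etc/ceph/admin.secret, mount point /mnt/cephfs.
"""

from typing import Sequence


def cephfs_fstab_line(
    monitors: Sequence[str],
    mountpoint: str = "/mnt/cephfs",
    user: str = "admin",
    secretfile: str = "/etc/ceph/admin.secret",
    port: int = 6789,
) -> str:
    """Return one fstab line: device, mount point, fs type, options, dump, pass."""
    if not monitors:
        raise ValueError("at least one monitor address is required")
    # Several monitors are comma-separated before the ":/" (root of the filesystem).
    device = ",".join(f"{m}:{port}" for m in monitors) + ":/"
    # _netdev waits for the network before mounting; noatime avoids access-time writes.
    options = f"name={user},secretfile={secretfile},noatime,_netdev"
    # dump=0 and pass=0: network filesystems are not fsck'd at boot.
    return f"{device}\t{mountpoint}\tceph\t{options}\t0\t0"


if __name__ == "__main__":
    # The same line would be added on every compute node.
    print(cephfs_fstab_line(["mds01"]))
```

On nodes running NixOS, the same device string and options would go into the declarative `fileSystems` configuration rather than a hand-edited `/etc/fstab`.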