File systems and storage

Overview

                            AFS (~/)                       AFS (~/Public)                        Project storage                     PFS                  /scratch
Recommended for batch jobs  No                             -                                     Yes                                 Yes                  -
Backed up                   Yes                            Yes                                   No                                  No                   No
Accessible by batch system  No                             Yes, readable with right permissions  Yes                                 Yes                  Yes (node only)
High performance            No                             No                                    Yes                                 Yes                  Medium
Default readability         Owner                          World readable                        Group only                          Everyone on cluster  Owner
Permissions                 chmod, chgrp, ACL              Cannot be changed                     chmod, chgrp, ACL                   chmod, chgrp, ACL    chmod, chgrp, ACL
Notes                       Your home directory is on AFS  Your home directory is on AFS         Allocated through storage projects  -                    Per node

[Figure: filesystem_v5.png]

AFS - Andrew File System (home directory)

Your home directory (i.e. the directory pointed to by the $HOME variable) is placed on an AFS file system. This file system is backed up regularly.

Note that since ticket forwarding to batch jobs does not work, the only AFS access possible from batch jobs is to read files from your Public directory, which is world readable (yes, readable by the entire world). Use the parallel file system 'pfs' for data management in conjunction with batch jobs.

This generally means you should set up and run your jobs from directories somewhere in your project storage or on /pfs/nobackup/, not your home directory!
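
For example, a minimal sketch of preparing a job directory (the directory name 'your_project_dir' and the file name 'input.dat' are placeholders):

$ cd /proj/nobackup/your_project_dir   # or a directory under /pfs/nobackup/
$ mkdir myjob && cd myjob              # a per-job directory; the name is just an example
$ cp ~/input.dat .                     # copy any input data out of AFS before submitting

Submit your batch job from this directory, so that everything the job reads and writes lives on storage the batch system can access.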

To find the path to your home directory, either run pwd just after logging in, or

$ cd
$ pwd
/home/u/username
$ 

It is generally not possible to get more space in your AFS home directory; use project storage instead. If you need more of that, the PI of your project should apply for it.
However, if you really need more space in your AFS home directory, have your PI contact support@hpc2n.umu.se and include a very good explanation of what you need the extra space for.

See AFS at HPC2N for further explanation of AFS.

PFS ('parallel') File System

There is a parallel file system (PFS) available on all clusters.

Apart from your usual home directory you also have file space in the parallel file system. This file system is set up in "parallel" to the usual home tree, but starting from /pfs/nobackup instead. To more easily access this directory, you could create a soft link from your home directory to your corresponding home on the parallel file system (as suggested in the picture above).
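
To create such a link, a minimal sketch, assuming your pfs directory mirrors your home directory path under /pfs/nobackup as described above (the link name 'pfs' is just an example):

$ ln -s /pfs/nobackup$HOME ~/pfs   # e.g. points to /pfs/nobackup/home/u/username
$ cd ~/pfs                         # now takes you to your directory on the parallel file system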

Your home directory on the parallel file system is useful, since batch jobs can create and access files there without any Kerberos ticket or manipulation of permissions. Moreover, the parallel file system offers high performance when accessed from the nodes, making it suitable for data that is to be accessed from parallel jobs.

Note that HPC2N has moved to project storage, which means that a user's pfs quota will in general be very small (25 GB by default).

Note that the parallel file system is not intended for permanent storage and there is NO BACKUP of /pfs/nobackup. In case the file system gets full, files that have been unused for some time might get deleted without warning.

Quota on pfs

To avoid runaway programs filling the file system, quota limits are in place. Use the quota command to view your current quotas.
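
A minimal sketch of checking your usage (the path is the example used earlier; the exact output of quota may look different):

$ quota                                 # shows your current usage against the soft and hard limits
$ du -sh /pfs/nobackup/home/u/username  # rough check of how much space a directory tree uses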

There are actually four quota limits on pfs: a soft and a hard limit for disk usage, and a soft and a hard limit for the number of files. The hard limits are absolute; you can never go above them. You can be above a soft limit for a grace period, but after the grace period the soft limit behaves as a hard limit until you have gone below it again.

The default limit in your pfs is 25 GB. If your limit is too small, your PI needs to apply for project storage.

Project storage

Project storage is where a project's members have the majority of their storage. It is applied for through SUPR, as a storage project. While storage projects need to be applied for separately, they are usually linked to a compute project.

Since batch jobs can create files in the project storage space without any Kerberos ticket or manipulation of permissions, this is where you should keep your data and run your batch jobs from. Moreover, it offers high performance when accessed from the nodes, making it suitable for data that is to be accessed from parallel jobs.

Note that the project storage is not intended for permanent storage and there is NO BACKUP of /proj/nobackup.

Project storage is located below /proj/nobackup/, in a directory with the name selected during the creation of the proposal. In the picture above, the project id snicXXXX-YY-ZZ has been used as an example.
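
To check that you can reach your project directory, something like the following can be used (with the example project id from the picture; substitute your own directory name):

$ ls -ld /proj/nobackup/snicXXXX-YY-ZZ   # shows the directory, its group and its permissions
$ id                                     # lists your groups; you should be in the project's group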

Quota

The size of the storage depends on the allocation. There are small, medium, and large storage projects, each with their own requirements. You can read about this on SUPR. The quota limits apply to the project as a whole; there are no user-level quotas on that space.

There are actually four quota limits for the project storage space: a soft and a hard limit for disk usage, and a soft and a hard limit for the number of files. The hard limits are absolute; you can never go above them. You can be above a soft limit for a grace period, but after the grace period the soft limit behaves as a hard limit until you have gone below it again.
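
Since there is a limit on the number of files as well as on disk usage, it can be useful to keep an eye on both. A minimal sketch (the directory name 'your_project_dir' is a placeholder):

$ du -sh /proj/nobackup/your_project_dir        # total disk usage of the project directory
$ find /proj/nobackup/your_project_dir | wc -l  # number of files and directories in it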

Misc

It is recommended to use the project's storage directory for the project's data. The layout within that project directory is the responsibility of the project itself.
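
One possible convention, just as a suggestion (the directory name 'your_project_dir' is a placeholder), is to give each member a subdirectory of their own:

$ mkdir /proj/nobackup/your_project_dir/$USER   # one subdirectory per project member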

NOTE: As PI, make sure to add every user who should be granted access to the storage space to the storage project in SUPR.

NOTE: The storage project PI can link one or several compute projects to the storage project, thereby allowing users in the compute project access to the storage project without the PI having to explicitly handle access to the storage project.

NOTE: For those who have previously had their storage under their pfs directory, there are a few things to note:

  • After a grace period of two months, we will reduce the quota on /pfs/nobackup/home to 25 GB (the default for new users) for all users who have access to project storage.
  • Since the new storage directory is not on the same physical file system as the normal /pfs/nobackup/home data, transferring the data will take some time.

/scratch

On some of the computers at HPC2N there is a directory called /scratch. It is a local disk area, usually quite fast and large, intended for (temporary) files you create or need during your computations. Please do not keep files in /scratch that you do not need when you are not running jobs on the machine, and please make sure your job removes any temporary files it creates.
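
A minimal sketch of how a job script could use /scratch (all file and directory names are placeholders):

# create a unique temporary directory on the node-local /scratch
tmpdir=$(mktemp -d /scratch/${USER}_job.XXXXXX)
cp input.dat "$tmpdir"/                          # stage input data onto the fast local disk
cd "$tmpdir"
# ... run the computation here ...
cp results.dat /proj/nobackup/your_project_dir/  # copy results back to project storage
cd && rm -rf "$tmpdir"                           # clean up the temporary files when done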

If anybody needs more space than is available on /scratch, we will remove the oldest/largest files without notice.

There is NO backup of /scratch.

The size of /scratch depends on the type of nodes:

Abisko (all nodes): 352 GB
Kebnekaise, standard compute nodes: 171 GB
Kebnekaise, GPU nodes: 171 GB
Kebnekaise, Largemem nodes: 352 GB (a few have 391 GB)

SweStore - Nationally Accessible Storage

For data archiving and long-term storage we recommend that our users use SweStore, the nationally accessible storage. This is a robust, flexible and expandable long-term storage system aimed at storing large amounts of data produced by various Swedish research projects.

For more information, see the SNIC documentation for SweStore available at http://docs.snic.se/wiki/SweStore

Updated: 2020-09-14, 15:03