Storage Guide
#############

.. _storage-on-compute-nodes:

Understanding Storage on Compute Nodes
======================================

Storage types based on node accessibility
-------------------------------------------

When working on any compute node within the system, there are two types of storage accessible to a node: local storage and shared storage. The former is accessible only by a particular node, while the latter is accessible by all nodes.

On every node, the path ``/mnt/local/scratch/$USER`` points to local storage. That means that if you have two nodes *N1* and *N2*, anything stored at ``/mnt/local/scratch/$USER`` on *N1* won't be accessible by *N2*, and vice versa. In contrast, the paths ``/work/$USER`` and ``/home/$USER`` point to locations in shared storage and are thus accessible by both *N1* and *N2*.

The described storage architecture is depicted in the diagram below:

.. image:: images/logical_storage_architecture.png
   :alt: Logical Storage Architecture

Filesystem locations users should understand
----------------------------------------------

Scratch
^^^^^^^

- **Location on compute nodes:** ``/mnt/local/scratch/$USER``
- **Shortcut variable:** ``$SCRATCH``
- **Local to each compute node:** This storage is temporary and local to a compute node. It is available to a user only while a node is assigned to one of their running jobs.
- **High performance:** Offers significantly faster read/write speeds than home and work storage.
- **Limited capacity:** The capacity varies by node type and can be anywhere between 480 GB and 30 TB. Please refer to the table in the `Zurada hardware specs `_ for details.
- **Data retention policy:** ALL DATA IS REMOVED after a job finishes.

Work
^^^^^

- **Location in filesystem:** ``/work/$USER``
- **Shortcut variable:** ``$WORK``
- **Shared across all nodes:** Accessible from any compute node in the system.
- **Large capacity:** Designed to store big files, datasets and, in general, the input/output files used/produced by a job.
- **Slower access:** Due to its shared nature, read/write operations are generally slower than on scratch storage.
- **Parallel I/O:** Multiple parts of large files can be accessed simultaneously (e.g. via MPI-IO).
- **Data retention policy:** Files not accessed within 30 days are candidates for deletion.

Home
^^^^^

- **Location in filesystem:** ``/home/$USER``
- **Shortcut variable:** ``$HOME`` or ``~``
- **Shared across all nodes:** Accessible from any compute node in the system.
- **Limited capacity:** Hard quota limit of 25 GB per user. If you try to write more than 25 GB, an error will be displayed and any subsequent write operations will be denied.
- **Slower access:** Due to its shared nature, read/write operations are generally slower than local scratch storage.
- **Data retention policy:** Data is kept and backed up for 7 days.

.. note::
   Shortcut variables in bash are set upon entry to a Zurada node, so you may utilize them while inside the system (scripts, navigation, etc.). If you are using a command (e.g. ``scp`` or ``rsync``) from your laptop/desktop, the shortcut variables will not work there; spell out the full paths instead, as sketched below.
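For illustration, here is a minimal sketch of the difference. The login hostname ``zurada.example.edu`` and the username ``jdoe`` below are placeholders, not actual values for your account; substitute your own.

.. code-block:: bash

   # On a Zurada node: the shortcut variables are defined and can be used directly.
   cd $WORK                       # equivalent to: cd /work/$USER
   cp $WORK/input.dat $SCRATCH/   # stage a file onto the node-local scratch

   # From your laptop/desktop: $WORK and $SCRATCH are not defined there,
   # so write out the full remote path (hostname and username are placeholders).
   scp input.dat jdoe@zurada.example.edu:/work/jdoe/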
Recommended Workflow
====================

A common and efficient workflow for running jobs on the system is:

1. **Prepare Input Data**: Copy the necessary input files from your home or work storage to the node's local scratch storage at the start of your job.
2. **Run the Application**: Configure your application to read from and write to the local scratch storage during execution. This takes advantage of its high-speed performance.
3. **Save Results**: Once the job completes, copy the output files back to your work directory for longer-term retention, or to your home directory if you want them backed up.

See Section :ref:`Copying data between home and scratch <storage_copy_data>` for more information on how to implement this workflow.

.. note::
   Always ensure that your input and output data will fit within the available space on ``scratch`` storage. If your files exceed this capacity, you may need to adjust your workflow accordingly.

.. _storage_copy_data:

Copying Data Between Home and Scratch
-------------------------------------

To efficiently transfer data between your shared ``home`` storage and node-local ``scratch`` storage, you can use ``pdsh``, a parallel remote shell client that executes commands across multiple nodes simultaneously. The general template for the ``pdsh`` command includes three key components:

- ``-R ssh``: Specifies SSH as the remote shell method (always use ``ssh``).
- ``-w $SLURM_JOB_NODELIST``: Targets all nodes allocated to your job.
- **Remote command**: Executes on each node, leveraging the fact that ``home`` is shared while ``scratch`` is local.

.. note::
   While this section talks about copying from ``home``, you may copy from your ``work`` directory to node-local ``scratch`` as well. Using your ``work`` directory might even be preferred, given the size limitations of ``home``.

Below are common usage patterns:

Copy input data from home to scratch on all nodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   pdsh -R ssh -w $SLURM_JOB_NODELIST "cp -r /home/$USER/input /mnt/local/scratch/$USER/"

For example, assume you submitted a batch job requesting 3 nodes and Slurm allocated ``cpusm[01-03]``, with ``cpusm01`` chosen as the node from which the batch script is executed. Then, the ``pdsh`` command above would:

#. Create 3 (parallel) ssh sessions from ``cpusm01`` to itself, ``cpusm02`` and ``cpusm03``.
#. Within each session, instruct the node to copy the folder ``/home/$USER/input`` to ``/mnt/local/scratch/$USER/``.

.. image:: images/pdsh_home_to_scratch.png
   :width: 900
   :alt: Copying input data from home to node-local scratch with pdsh

Copy results from scratch to home
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   When copying results back to ``home``, ensure unique filenames or directories to prevent nodes from overwriting each other's output. The commands below use the node's hostname as a suffix to avoid conflicts.

.. code-block:: bash

   # Copy results from scratch to home, appending hostname to avoid overwrites
   pdsh -R ssh -w $SLURM_JOB_NODELIST "cp -r /mnt/local/scratch/$USER/results /home/$USER/results_\`hostname\`"

   # Alternatively, move results from scratch to home
   pdsh -R ssh -w $SLURM_JOB_NODELIST "mv /mnt/local/scratch/$USER/results /home/$USER/results_\`hostname\`"

For example, assume you submitted a batch job requesting 3 nodes and Slurm allocated ``cpusm[01-03]``, with ``cpusm01`` chosen as the node from which the batch script is executed. Then, the ``pdsh`` commands above would:

#. Create 3 (parallel) ssh sessions from ``cpusm01`` to itself, ``cpusm02`` and ``cpusm03``.
#. Within each session, instruct the node to copy (or move, if using ``mv``) the folder ``/mnt/local/scratch/$USER/results`` to ``/home/$USER/``, appending ``_`` followed by the node's hostname to the copy. That is:

.. code-block:: bash

   # cpusm01 executes: cp -r /mnt/local/scratch/$USER/results /home/$USER/results_cpusm01
   # cpusm02 executes: cp -r /mnt/local/scratch/$USER/results /home/$USER/results_cpusm02
   # cpusm03 executes: cp -r /mnt/local/scratch/$USER/results /home/$USER/results_cpusm03

.. image:: images/pdsh_scratch_to_home.png
   :width: 900
   :alt: Copying results from node-local scratch back to home with pdsh

Simplified Copy for Aggregated Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your application aggregates results on the submission node (e.g., via MPI reduction), and per-node outputs are not needed, you can use a standard copy command instead of ``pdsh``, as sketched below.
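Continuing the example above, and assuming (as elsewhere in this section) that the aggregated output lives in a ``results`` folder on the submission node's scratch, a single copy from that node is enough:

.. code-block:: bash

   # Runs only on the submission node (cpusm01 in the example above);
   # no pdsh is needed because only that node holds the aggregated results.
   cp -r /mnt/local/scratch/$USER/results /home/$USER/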
Batch Script Example
^^^^^^^^^^^^^^^^^^^^

Here's how this workflow fits into a typical Slurm batch script:

.. code-block:: bash

   #!/bin/bash
   #SBATCH ...

   # Copy input to scratch
   pdsh -R ssh -w $SLURM_JOB_NODELIST "cp -r /home/$USER/input /mnt/local/scratch/$USER/"

   # Run your application
   # ...

   # Copy results back to home
   pdsh -R ssh -w $SLURM_JOB_NODELIST "cp -r /mnt/local/scratch/$USER/results /home/$USER/results_\`hostname\`"

For aggregated results (e.g., via MPI reduction):

.. code-block:: bash

   #!/bin/bash
   #SBATCH ...

   # Copy input to scratch
   pdsh -R ssh -w $SLURM_JOB_NODELIST "cp -r /home/$USER/input /mnt/local/scratch/$USER/"

   # Run your application
   # ...

   # Copy final results from scratch to home
   cp -r /mnt/local/scratch/$USER/results /home/$USER/
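Finally, since the note earlier asks you to make sure your data fits on ``scratch``, you can check the free space on every allocated node before staging data in. This is a minimal sketch using only the ``pdsh`` pattern already shown and the standard ``df`` command; the figures reported will depend on the node type.

.. code-block:: bash

   # Report free space on the node-local scratch filesystem of every allocated node
   pdsh -R ssh -w $SLURM_JOB_NODELIST "df -h /mnt/local/scratch"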