TensorFlow
To use TensorFlow on the cluster, start by reviewing the Conda installer and how to manage Conda environments.
There are two main ways to use TensorFlow:
Using the Global Conda Environment
The cluster provides a pre-configured global Conda environment with TensorFlow. Note that only administrators can modify this environment, so you’re limited to the installed packages.
module load miniforge3/25.3.1-gcc-11.4.1
conda env list
conda activate tensorflow
Creating a Custom Conda Environment
Note
TensorFlow is typically installed via pip in Conda environments. Ensure your environment includes a version of TensorFlow built with GPU support (e.g., tensorflow==2.15.0 or later).
You can create your own environment in two ways:
1. Clone the global tensorflow environment. Refer to Cloning an Environment. After exporting the environment, you can edit the environment.yml file before creating your custom environment, or use it as-is and install additional packages as needed.
2. Create a new environment from scratch. This gives you full control over the packages and versions you include.
module load miniforge3/25.3.1-gcc-11.4.1
conda create --name my_tensorflow_env tensorflow-gpu
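If you prefer to build from an environment.yml file (as mentioned for the cloning approach above), a minimal file might look like the following. The environment name, channel, and pinned versions here are illustrative assumptions, not cluster defaults:

```yaml
name: my_tensorflow_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      # Hypothetical pin; the [and-cuda] extra pulls in GPU support via pip
      - tensorflow[and-cuda]==2.15.0
```

Create the environment with `conda env create -f environment.yml`.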
Verifying GPU Availability
After activating your environment, verify that TensorFlow detects the GPUs:
import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("GPU Devices:", tf.config.list_physical_devices('GPU'))
If GPUs are available, you should see output listing one or more GPU devices.
Single Node, Multi-GPU Training
TensorFlow automatically uses all visible GPUs. You can control GPU memory growth and device placement as follows:
import tensorflow as tf
import numpy as np

# Enable memory growth so TensorFlow allocates GPU memory as needed
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# Dummy data
x = np.random.randn(1000, 10)
y = np.random.randn(1000, 1)

model.fit(x, y, epochs=5, batch_size=64)
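One detail worth noting when scaling the example above: with MirroredStrategy, the batch_size passed to model.fit is the global batch size, which TensorFlow splits evenly across replicas (one replica per GPU). A small sketch of the arithmetic; the GPU count of 2 is an assumption matching the Slurm script below:

```python
# With MirroredStrategy, model.fit's batch_size is the GLOBAL batch size,
# split evenly across replicas (one replica per GPU).
per_replica_batch = 64
num_gpus = 2  # assumption: matches --gpus-per-node=2 in the Slurm script

# To keep 64 samples per GPU, pass this global batch size to model.fit:
global_batch = per_replica_batch * num_gpus
print(global_batch)  # 128
```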
TensorFlow guesses the optimal number of threads (CPU cores) to use, but you can control this manually in two ways:
Set environment variables such as TF_NUM_INTEROP_THREADS and TF_NUM_INTRAOP_THREADS in your batch submission script or within your Python script. For example,
import os

# Set these before importing TensorFlow, or they will have no effect
os.environ["TF_NUM_INTEROP_THREADS"] = "2"
os.environ["TF_NUM_INTRAOP_THREADS"] = "4"

import tensorflow as tf
or by modifying TensorFlow’s runtime configuration with:
# Number of threads for parallelism between independent operations
tf.config.threading.set_inter_op_parallelism_threads(num)
# Number of threads for parallelism within an individual operation
tf.config.threading.set_intra_op_parallelism_threads(num)
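To keep the thread settings in sync with your Slurm allocation, you can derive them from the environment Slurm sets inside a job. A sketch, assuming the script runs under Slurm; the split between intra- and inter-op threads is an assumption to tune for your workload:

```python
import os

# SLURM_CPUS_PER_TASK is set by Slurm inside a job; fall back to 1 elsewhere.
cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

# Set these before importing TensorFlow so the values take effect.
os.environ["TF_NUM_INTRAOP_THREADS"] = str(cpus)
os.environ["TF_NUM_INTEROP_THREADS"] = str(max(1, cpus // 2))
```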
Slurm script for single-node, multi-GPU training:
#!/bin/bash
#SBATCH --job-name=tf_single_node
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --time=01:00:00
#SBATCH --partition=gpu2h100
module load miniforge3/25.3.1-gcc-11.4.1
conda activate my_tensorflow_env
python train_tf.py