TensorFlow
To use TensorFlow on the cluster, start by reviewing the Conda installer and how to manage Conda environments.
There are two main ways to use TensorFlow:
Using the Global Conda Environment
The cluster provides a pre-configured global Conda environment with TensorFlow. Note that only administrators can modify this environment, so you’re limited to the installed packages.
module load miniforge3/25.3.1-gcc-11.4.1
conda env list
conda activate tensorflow
Creating a Custom Conda Environment
Note
TensorFlow is typically installed via pip in Conda environments. Ensure your environment includes a version of TensorFlow built with GPU support (e.g., tensorflow==2.15.0 or later).
You can create your own environment in two ways:
1. Clone the global tensorflow environment. Refer to Cloning an Environment. After exporting the environment, you can edit the environment.yml file before creating your custom environment, or use it as-is and install additional packages as needed.
2. Create a new environment from scratch. This gives you full control over the packages and versions you include.
module load miniforge3/25.3.1-gcc-11.4.1
conda create --name my_tensorflow_env tensorflow-gpu
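If you prefer to build from an environment.yml file (as mentioned for the cloning approach above), a minimal file might look like the following. The environment name, channel, and pinned versions here are illustrative assumptions, not cluster defaults:

```yaml
name: my_tensorflow_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      # Hypothetical pin; the [and-cuda] extra pulls in GPU support via pip
      - tensorflow[and-cuda]==2.15.0
```

Create the environment with `conda env create -f environment.yml`.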
Verifying GPU Availability
After activating your environment, verify that TensorFlow detects the GPUs:
import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("GPU Devices:", tf.config.list_physical_devices('GPU'))
If GPUs are available, you should see output listing one or more GPU devices.
Single Node, Multi-GPU Training
TensorFlow automatically uses all visible GPUs. You can control GPU memory growth and device placement as follows:
import tensorflow as tf
import numpy as np

# Enable memory growth so TensorFlow allocates GPU memory as needed
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# Dummy data
x = np.random.randn(1000, 10)
y = np.random.randn(1000, 1)

model.fit(x, y, epochs=5, batch_size=64)
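One detail worth noting when scaling the example above: with MirroredStrategy, the batch_size passed to model.fit is the global batch size, which TensorFlow splits evenly across replicas (one replica per GPU). A small sketch of the arithmetic; the GPU count of 2 is an assumption matching the Slurm script below:

```python
# With MirroredStrategy, model.fit's batch_size is the GLOBAL batch size,
# split evenly across replicas (one replica per GPU).
per_replica_batch = 64
num_gpus = 2  # assumption: matches --gpus-per-node=2 in the Slurm script

# To keep 64 samples per GPU, pass this global batch size to model.fit:
global_batch = per_replica_batch * num_gpus
print(global_batch)  # 128
```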
TensorFlow guesses the optimal number of threads (CPU cores) to use, but you can control this manually in two ways:
Set environment variables such as TF_NUM_INTEROP_THREADS and TF_NUM_INTRAOP_THREADS in your batch submission script or within your Python script. For example,
import os

# Set these before importing TensorFlow, or they will have no effect
os.environ["TF_NUM_INTEROP_THREADS"] = "2"
os.environ["TF_NUM_INTRAOP_THREADS"] = "4"

import tensorflow as tf
or by modifying TensorFlow’s runtime configuration with:
# Number of threads for parallelism between independent operations
tf.config.threading.set_inter_op_parallelism_threads(num)
# Number of threads for parallelism within an individual operation
tf.config.threading.set_intra_op_parallelism_threads(num)
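To keep the thread settings in sync with your Slurm allocation, you can derive them from the environment Slurm sets inside a job. A sketch, assuming the script runs under Slurm; the split between intra- and inter-op threads is an assumption to tune for your workload:

```python
import os

# SLURM_CPUS_PER_TASK is set by Slurm inside a job; fall back to 1 elsewhere.
cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

# Set these before importing TensorFlow so the values take effect.
os.environ["TF_NUM_INTRAOP_THREADS"] = str(cpus)
os.environ["TF_NUM_INTEROP_THREADS"] = str(max(1, cpus // 2))
```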
Slurm script for single-node, multi-GPU training:
#!/bin/bash
#SBATCH --job-name=tf_single_node
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --time=01:00:00
#SBATCH --partition=gpu2h100
module load miniforge3/25.3.1-gcc-11.4.1
conda activate my_tensorflow_env
python train_tf.py