The National Institute for Computational Sciences

Darter

Darter: I have a Python script that uses Python's multiprocessing module for parallelization. How should this be run on Darter?


Python's multiprocessing module runs its worker processes on a single node, much like a threaded program, so use the following in your Darter batch script to launch the Python script on one node:

module load python
aprun -d 16 python script.py
This makes all 16 cores on the node available to the Python script. Note that whether the cores are actually used depends on how the script is written.
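As an illustration of a script that would actually use the cores made available by `aprun -d 16`, here is a minimal sketch. The function names and the `NUM_WORKERS` environment variable are hypothetical and not part of Darter's setup; the default of 16 workers simply mirrors the depth used above.

```python
import multiprocessing as mp
import os


def partial_sum(bounds):
    """Sum the squares of the integers in [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        # The first `extra` chunks get one additional element.
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


if __name__ == "__main__":
    # NUM_WORKERS is a hypothetical variable; 16 matches "aprun -d 16".
    workers = int(os.environ.get("NUM_WORKERS", "16"))
    with mp.Pool(processes=workers) as pool:
        total = sum(pool.map(partial_sum, split_range(1_000_000, workers)))
    print(total)
```

If the pool were created with fewer workers than cores, the remaining cores would sit idle, which is the situation the note above warns about.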

Darter: How can I run my Python script in a batch script?


If the Python script is parallelized using MPI (e.g. with mpi4py, which is available on Darter), run it just like any other MPI program, using the following syntax in your batch script, where numproc is the number of MPI processes:

module load python
aprun -n numproc python parallel_script.py

If there is no MPI in the Python script, use the following syntax in your batch script:

module load python
aprun python serial.py
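
For the MPI case, a script such as parallel_script.py might look roughly like the sketch below. It assumes mpi4py is importable after `module load python`; the function names and the work being done are purely illustrative. The fallback to a single rank when mpi4py is missing is a convenience of this sketch, so the same file can also be run serially.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption of this sketch: without mpi4py, behave as a single rank.
    comm, rank, size = None, 0, 1


def my_items(n, rank, size):
    """Indices in range(n) assigned to this rank (round-robin)."""
    return range(rank, n, size)


def local_sum(n, rank, size):
    """Sum of squares over the indices owned by this rank."""
    return sum(i * i for i in my_items(n, rank, size))


if __name__ == "__main__":
    n = 1000
    part = local_sum(n, rank, size)
    # Combine the per-rank partial sums on rank 0.
    if comm is not None:
        total = comm.reduce(part, op=MPI.SUM, root=0)
    else:
        total = part
    if rank == 0:
        print("sum of squares below", n, "=", total)
```

Launched with `aprun -n numproc`, each of the numproc processes would handle its own share of the indices and only rank 0 would print the result.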

Darter: How do I use the code bisection method to find a bug?

Debugging tools are generally preferable to print statements, but sometimes print statements are the only way to find a bug. In that case, the most effective way to isolate the error in your code is bisection, an iterative process for tracing the program manually.

Step 1: In the main routine of your code, comment out the second half of the code (or approximately the second half).

Step 2: Compile and run the code. Did it crash as before?
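
The halving idea behind these steps can also be illustrated in code. The sketch below is not the manual procedure itself: it assumes the program can be expressed as an ordered list of stage functions that are safe to re-run, and that a "crash" shows up as a Python exception (a segfault in compiled code would not). All names are hypothetical.

```python
def first_failing_stage(stages):
    """Return the index of the first stage that crashes, or None.

    `stages` is an ordered list of zero-argument callables.
    """
    def crashes(k):
        # Run only the first k stages, as if the rest were commented out.
        try:
            for stage in stages[:k]:
                stage()
        except Exception:
            return True
        return False

    if not crashes(len(stages)):
        return None  # the full program runs cleanly

    # Invariant: the first `lo` stages run cleanly, the first `hi` crash.
    lo, hi = 0, len(stages)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1


if __name__ == "__main__":
    def ok():
        pass

    def bad():
        raise RuntimeError("simulated crash")

    print(first_failing_stage([ok, ok, ok, bad, ok, ok]))
```

Each pass halves the region that can contain the bug, so the number of runs grows with the logarithm of the number of stages rather than with the number of stages.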

Darter: How do I use Cray ATP to determine where and why a code died abnormally?

Sometimes a code works fine in most cases, but contains a bug that appears only for a particular combination of input case and job size. The code then dies in a strange spot, and it is not obvious where or why. In cases like this, Cray's ATP (Abnormal Termination Processing) can likely help!

Simply do

Darter: How to determine memory usage on the compute node

To determine the memory usage of a given process on a compute node, one would normally issue the command "top" and look at the memory usage of the process in question. However, this cannot be done on Darter's compute nodes, since they are not accessible to the user. Also, OOM (Out of Memory) errors often occur even when a problem has been discretized finely enough to fit in memory: in the worst case, memory leaks in the code cause the program to crash.
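
For a Python script, one possible workaround is to have the process report its own memory use from inside the job. The sketch below is only an illustration and is not necessarily the method this FAQ entry goes on to describe. It uses the standard-library `resource` module and assumes a Linux node, where `ru_maxrss` is reported in kilobytes; the function names are hypothetical.

```python
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def report_memory(label, stream=sys.stderr):
    """Write a one-line memory report, e.g. into the job's stderr file."""
    stream.write(f"[mem] {label}: peak RSS {peak_rss_kib() / 1024:.1f} MiB\n")
    stream.flush()


if __name__ == "__main__":
    report_memory("start")
    data = [0.0] * 5_000_000  # allocate something noticeable
    report_memory("after allocating list")
```

Calling `report_memory` at a few points in a long run shows whether the peak keeps climbing, which is one sign of a leak.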

Darter: Why shouldn’t I use "make -j 12" when compiling my code?

Unlike Darter's compute nodes, its login nodes have modest hardware specs: a single quad-core processor with 8 gigabytes of memory. However, each Darter login node may have up to 30 user login sessions active at any given time, so a single user who runs a very processor- or memory-intensive task on a login node can affect the work of several dozen other users. For this reason, NICS recommends that concurrent makes ("make -j N") on Darter be done with an N of 2 or less.
