Other links:
A tutorial lecture on using the Cluster via Enigma and Sun Grid Engine (SGE)
Troubleshooting
The cluster is the computational workhorse for the department, and all users are encouraged to run jobs on it. As stated above, the machine enigma is the access host for the cluster. You will not log directly into the cluster's compute nodes; rather, you will log on to enigma and then submit jobs to the cluster nodes.
There are currently 37 cluster compute nodes, each with two CPUs (some dual-core). Currently, the cluster nodes are arranged in the following configuration:
| Number of nodes | Memory | Swap Space | Processor Cores | Cluster job slots |
|---|---|---|---|---|
| 15 | 2 GB | 2 GB | 2 | 3 |
| 7 | 4 GB | 4 GB | 2 | 3 |
| 8 | 8 GB | 8 GB | 2 | 3 |
| 4 | 16 GB | 16 GB | 2 | 3 |
| 4 | 32 GB | 32 GB | 4 | 5 |

(32 GB nodes are currently reserved for special projects.)
The above configuration may change depending on maintenance needs, and not all nodes are available for all types of jobs. For example, because of licensing restrictions, one 8 GB node is reserved for SAS.
Everything related to job submission, scheduling, and execution on the cluster is controlled by Sun's Grid Engine software (SGE). The Grid Engine project, sponsored by Sun Microsystems, is an open-source community effort to facilitate the adoption of distributed computing solutions. Among other things, we use SGE to limit the total number of jobs each user may run simultaneously on the cluster (currently 16, but subject to change). You may still submit more jobs than the limit; the extra jobs will be queued and will start as your other jobs finish. When all cluster nodes are at maximum capacity, waiting jobs are prioritized by a functional-share policy that we have configured in SGE.
Do NOT run compute-intensive/long-running jobs on enigma! This machine is not for doing any sort of computation. Rather, it is ONLY for prototyping and submitting jobs to the cluster. Any long-running jobs found running on enigma (R BATCH jobs, for example) may be KILLED WITHOUT WARNING. You will lose any data and/or computations associated with the running job.
To start an interactive session on a cluster node, type

qrsh

You will be logged into a "random" cluster node and get an interactive shell prompt, just as if you had logged into enigma. Now you can run whatever program you want; for example, you can run R. However, you must remember to log out ('exit' or CTRL-D) when you are finished. Otherwise, you will be taking up a slot in the queue that will not be available to others.
While you are logged into a cluster node via qrsh, if you run

qstat -u YOUR_USER_ID

you'll see something like the following:

    job-ID  prior   name       user   state submit/start at     queue                          slots
    -------------------------------------------------------------------------------------------------------
      15194 1.53962 Pf_3D7     rpeng  r     08/15/2007 16:04:16 standard.q@compute-0-20.local  1
      15299 2.00790 BootA10600 rpeng  r     08/17/2007 15:26:06 standard.q@compute-0-10.local  1
      15290 2.35449 QRLOGIN    rpeng  r     08/17/2007 15:20:00 standard.q@compute-0-11.local  1

The job labeled QRLOGIN is the interactive session (for more info, see Checking the status of your job).
NOTE: If a program you are running interactively on a cluster node crashes, its session may still be in the cluster's queue. If you don't quit your program normally, check the cluster queue (via qstat; see below) to see whether your interactive job is still there. If it is, note its job-ID and kill the job using qdel.
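A minimal sketch of that cleanup, assuming the qstat -u column layout shown above (job-ID in the first column, job name in the third). The helper name qrlogin_ids is hypothetical; it only prints IDs, so you can inspect them before handing them to qdel.

```shell
# Hypothetical helper: print the job-IDs of interactive (QRLOGIN) sessions.
# Reads `qstat -u YOUR_USER_ID` (or `qu`) output on stdin; the header line
# has "name" in column 3 and the dashed line has no column 3, so neither matches.
qrlogin_ids() {
    awk '$3 == "QRLOGIN" { print $1 }'
}

# Usage on enigma (look at the IDs first, then delete):
#   qstat -u "$USER" | qrlogin_ids
#   qstat -u "$USER" | qrlogin_ids | xargs -r qdel
```

Be careful when running this from inside a qrsh session: your current session is also a QRLOGIN job. The -r flag is a GNU xargs option that skips running qdel when no IDs are found.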
You may also specify memory requirements or special queues on your qrsh command, just as you do on the qsub command (see below). For interactive work, we strongly encourage users to work on the cluster via qrsh rather than on enigma.
On an ordinary machine, you might run an R batch job in the background with

nice -n 19 R CMD BATCH mycommands.R &

where mycommands.R is a file of R commands that you want to run. Remember, you SHOULD NOT do this on enigma (see the warning above).
To run an R BATCH job on the cluster using the mycommands.R file, your batch.sh file would look something like this:
    #!/bin/bash
    R CMD BATCH mycommands.R

The technical name for this file is "shell script". Knowing this might help you communicate with the system administrator.
Submit the script with

qsub -cwd batch.sh

The -cwd switch tells the cluster to execute the batch.sh script in the current working directory (otherwise, it will run in your home directory, which is probably not what you want).
When submitting your job(s), if you do not specify any memory requirements, SGE will choose the cluster node(s) with the lowest CPU load WITHOUT REGARD TO MEMORY AVAILABILITY (subject to other scheduling parameters which we have defined).
In an effort to prevent "misbehaving" cluster jobs from exceeding the available memory on a given node, we have implemented job memory limits appropriate to each type of node. When a cluster job exceeds the following established memory limits, it will be AUTOMATICALLY ABORTED.

| Node type | Memory limit |
|---|---|
| 2 GB nodes | 2G |
| 4 GB nodes | 4G |
| 8 GB nodes | 8G |
| 16 GB nodes | (to be determined) |
| 32 GB nodes | (to be determined) |

It is, therefore, IMPORTANT to specify your expected memory requirements when submitting cluster jobs that may require more than approx. 1.5G. After estimating how much memory your job will need, add a memory resource requirement to your qsub (or qrsh) command:
qsub -cwd -l mem_free=[[memory needed]]
For example, if your job will require 4GB of memory, you should type
qsub -cwd -l mem_free=4000M ...

or

qsub -cwd -l mem_free=4G
The actual amount of RAM (memory) available on each node is usually less than the nominal amount of RAM installed:
| Node (nominal) | Available RAM (approx.) | Available swap (approx.) |
|---|---|---|
| 2 GB | 2.0 GB | 2.0 GB |
| 4 GB | 3.9 GB | 3.9 GB |
| 8 GB | 7.7 GB | 7.8 GB |
| 16 GB | 15.6 GB | 15.6 GB |
| 32 GB | 31.4 GB | 31.2 GB |

(32 GB nodes are currently reserved for special projects.)
As the configuration table above shows, the combined pool of 2 GB and 4 GB nodes provides many more nodes (and therefore more slots) than the 8 GB and 16 GB nodes. Overestimating your job's memory requirements may push it onto higher-memory nodes and force it to compete for fewer slots. For example, a job submitted with -l mem_free=4G will always be sent to an 8 GB (or larger) node, because a 4 GB node has only about 3.9 GB available. So be aware of the total available memory per node when specifying memory requirements.
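To make the arithmetic concrete, the sketch below compares a mem_free request (in GB) with the approximate "Available RAM" figures from the table above and lists the node sizes that could hold it. The helper name nodes_for_request is hypothetical and the figures are hard-coded copies of the table. SGE checks the request against the memory it currently sees as free on each node, so actual placement may be narrower than this list; the sketch also ignores the fact that the 32 GB nodes are reserved.

```shell
# Hypothetical helper: which node classes could satisfy `-l mem_free=<N>G`?
# Figures are the approximate available RAM from the table above and may
# change with maintenance; current load on each node is not considered.
nodes_for_request() {
    local request_gb="$1"
    awk -v req="$request_gb" '
        BEGIN {
            # nominal node size and approximate available RAM, both in GB
            n = split("2 4 8 16 32", nominal, " ")
            split("2.0 3.9 7.7 15.6 31.4", avail, " ")
            for (i = 1; i <= n; i++)
                if (avail[i] + 0 >= req + 0)
                    printf "%s GB node\n", nominal[i]
        }'
}

# Example: a 4G request cannot fit on a 4 GB node (about 3.9 GB available)
#   nodes_for_request 4
```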
To see a summary of available nodes, their memory capacity, and their current load, use the command qhostw.
After submitting your job with qsub, use the qu command to see which queue (node) your job actually went to (see Checking the status of your job). In the output of qu, the next-to-last column lists the queue name.
To check the memory usage of a running job, run

qstat -j NNNNN | grep vmem

where NNNNN is your specific cluster job number, and look at the "vmem" and "maxvmem" entries.
To make it easier to monitor memory usage for your currently running jobs, we have created the command qmem . If you have no jobs running on the cluster qmem will print nothing, but if you do, the results will look something like:
    [enigma]$ qmem
    10506 rpeng node=33 vmem=289.1M, maxvmem=294.3M howMany10.sh
    14257 rpeng node=8 vmem=231.5M, maxvmem=238.0M s.all.sh
    16695 rpeng node=25 vmem= 1.8G, maxvmem= 1.8G mergedoc1.3.sh
    17464 rpeng node=15 vmem=272.9M, maxvmem=284.0M simulateVariance.sh
    17555 rpeng node=12 vmem= 0.0A, maxvmem= 0.0A QRLOGIN
    17584 rpeng node=6 vmem=315.1M, maxvmem=334.3M calculateVaried-emp.genSampScheme.sh
To see your job's memory usage upon job completion, use email notification, which works for aborted jobs as well. See the discussion of job status via email below for instructions on how to use email notification.
Note: qrsh sessions will not report memory usage using the above method. You will simply see "N/A" in the entries for vmem and maxvmem.
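Because jobs on the 2 GB nodes are aborted at 2G, it can help to spot jobs whose peak memory is approaching that limit. A minimal sketch, assuming each qmem line contains a maxvmem=VALUE field with a K, M, or G suffix, as in the sample above: the hypothetical helper qmem_over prints the first column (apparently the job-ID) and the peak value for jobs above a threshold given in megabytes. Entries without a numeric value, such as the QRLOGIN line, are skipped.

```shell
# Hypothetical helper: report jobs whose maxvmem exceeds a limit in MB.
# Reads qmem output on stdin; assumes fields like "maxvmem=294.3M" or
# "maxvmem= 1.8G" (K, M, or G suffix). Lines without such a value are ignored.
qmem_over() {
    awk -v limit="$1" '
        match($0, /maxvmem= *[0-9.]+[KMG]/) {
            v = substr($0, RSTART, RLENGTH)
            sub(/maxvmem= */, "", v)               # keep e.g. "1.8G"
            unit = substr(v, length(v), 1)
            num = substr(v, 1, length(v) - 1) + 0
            if (unit == "K")      mb = num / 1024
            else if (unit == "G") mb = num * 1024
            else                  mb = num          # already in MB
            if (mb > limit + 0) print $1, v
        }'
}

# Example: flag jobs whose peak usage is above about 1.5G
#   qmem | qmem_over 1536
```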
By default, under our version of SGE, qstat with no arguments shows cluster jobs for all users. To restrict the output to show only your jobs, use the -u USERID argument. For example:
qstat -u rpeng
would only display active/pending jobs for user rpeng.
However, we have created the command qu to easily accomplish the same thing (view only your jobs). If you have no jobs running on the cluster qu will print nothing, but if you do, the results will look something like:
    [enigma]$ qu
    job-ID  prior   name       user   state submit/start at     queue                          slots
    -------------------------------------------------------------------------------------------------------
      15194 1.53962 Pf_3D7     rpeng  r     08/15/2007 16:04:16 standard.q@compute-0-20.local  1
      15299 2.00790 BootA10600 rpeng  r     08/17/2007 15:26:06 standard.q@compute-0-10.local  1
      15290 2.35449 QRLOGIN    rpeng  r     08/17/2007 15:20:00 standard.q@compute-0-11.local  1

Under the state column you can see the status of your job; for example, r means the job is running and qw means it is queued and waiting to run.
Another important thing to note is the job-ID for your job. You need to know this if you ever want to make changes to your job. For example, to delete your job from the cluster, you can run
qdel 40

where 40 is the job-ID reported by qstat (or qu).
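The state and job-ID columns can be combined with qdel when you need to act on several jobs at once. A minimal sketch, assuming the qstat/qu column layout shown above (job-ID in column 1, state in column 5): the hypothetical helper jobs_in_state prints the IDs of jobs in a given state, which you can review before passing them to qdel.

```shell
# Hypothetical helper: print the job-IDs whose state column equals $1.
# Reads `qstat -u USER` or `qu` output on stdin. The header's fifth field is
# "state" and the dashed line has none, so real state codes never match them.
jobs_in_state() {
    awk -v want="$1" '$5 == want { print $1 }'
}

# Examples (check the list before deleting anything):
#   qu | jobs_in_state qw                  # your pending jobs
#   qu | jobs_in_state qw | xargs -r qdel  # delete all of your pending jobs
```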
To be notified by email when a job ends, submit it with

qsub -m e -M your_email@jhsph.edu your_job.sh

The -m e option tells SGE to send email to the given address(es) when the job ends.
If you want such options (or others) added to your jobs automatically, simply put them in a file named .sge_request in your home directory. You can also have working-directory-specific .sge_request files (see the sge_request man page: man sge_request).
Lines like these in your .sge_request file:

    -M your_email@jhsph.edu
    -m e

will cause an email to be sent, when your job ends, for every cluster job that you start (including, for what it's worth, a qrsh 'job').
You could use -m n on individual qsub command lines to suppress email notification for certain jobs.
Better yet, you might put only -M your_email@jhsph.edu in the .sge_request file and use the -m e option on the jobs for which you want email notification.
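If you prefer to manage .sge_request from the shell, a small helper can append a default option only when that exact line is not already present, so running it twice does no harm. This is a sketch; the function name add_sge_default is hypothetical, and it simply treats each line of the file as one entry.

```shell
# Hypothetical helper: add an option line to an .sge_request file once.
# Usage: add_sge_default "LINE" [FILE]   (FILE defaults to ~/.sge_request)
add_sge_default() {
    local line="$1"
    local file="${2:-$HOME/.sge_request}"
    touch "$file"                       # create the file if it does not exist
    # -x: whole-line match, -F: fixed string, --: LINE may start with "-"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (replace the address with your own):
#   add_sge_default "-M your_email@jhsph.edu"
```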
Note: You may also invoke the options shown above (and others) by including special lines at the top of your job shell scripts. Lines beginning with #$ are interpreted as qsub options for that job. For example, if the first few lines of your script look like the following:
    #!/bin/bash
    #$ -M joe_x@gmail.com
    #$ -m e

The lines beginning with #$ would cause SGE to send email to 'joe_x@gmail.com' when the job ends.
Similarly, the line

#$ -m be

would cause an email to be sent when the job begins ('b') and ends ('e'). See the manual page for qsub (type man qsub at a shell prompt) to get more information.
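Putting these pieces together, a job script can carry its own qsub options. The sketch below writes a hypothetical batch.sh that embeds -cwd, email notification, and a memory request as #$ lines, and takes the R file name as a script argument (qsub passes any arguments given after the script name on to the script). The address and the 3G figure are placeholders to adjust or remove for your job.

```shell
# Write a reusable job script whose "#$" lines act as qsub options.
# The e-mail address and mem_free value are placeholders, not site defaults.
cat > batch.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -M your_email@jhsph.edu
#$ -m e
#$ -l mem_free=3G
# Run the R file named on the qsub command line (default: mycommands.R).
rfile="${1:-mycommands.R}"
R CMD BATCH "$rfile"
EOF
chmod +x batch.sh
```

With this file in place, qsub batch.sh analysis1.R would submit analysis1.R without repeating the options on the command line; the qsub man page describes how embedded options combine with command-line and .sge_request options.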
A special queue has been created (currently consisting of 2 slots on one node) for "express" jobs. Use this express queue to avoid "traffic jams" on the rest of the cluster when you need to run a relatively quick job, whether with qrsh or qsub. The express queue can be selected by using the -l express option on your qrsh or qsub command. Each job (or interactive session) run in the express queue is limited to 30 minutes of CPU time and 3 hours of wall-clock time.
The express queue node(s) are reserved for express jobs unless no 2 GB, 4 GB, or 8 GB slots remain available in the standard queues, in which case the express queue nodes may be used to satisfy standard queue requests.
Remember, to see a summary of available hosts and their current memory capacity and load, use the command qhostw.
Please send any questions or comments about this document to
Roger Peng (rpeng at jhsph.edu) or
Marvin Newhouse (marv at jhu.edu).
Questions about enigma or the cluster should be sent to BITSUPPORT ( bitsupport at jhsph.edu ).
This document was last modified on 2008-Mar-30