LSF - Monitoring and Controlling Jobs
With LSF, several commands let you monitor the status and progress of your serial or parallel jobs after they are submitted, and control them. The most commonly used commands are described below.
bjobs
The “bjobs” command displays the current status of one or more jobs. Used without any options, it lists all of your pending, running, and suspended jobs. To check the status of a specific job, use “bjobs JobID”. For example:
% bjobs 125532
JOBID   USER    STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
125532  joeuser DONE  int    bc01-n13   bc03-n01   uname -a  Mar 28 13:37
This was a trivial job consisting of a single command, so it ran very quickly. Its status (STAT) is DONE, which means it finished with a zero (successful) exit code. If a job exits with a non-zero exit code, or is killed, its status will be EXIT. To check the jobs of all users on a particular host, for example bc03-n01, use:
% bjobs -u all -m bc03-n01
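Because bjobs prints its output in fixed columns, it is easy to post-process. The following is a minimal POSIX sh sketch (source it from sh or bash; it will not work directly in csh/tcsh) that pulls out the STAT column. The function names job_states and count_state are hypothetical helpers, not LSF commands, and the sketch assumes the default column order shown above (JOBID, USER, STAT, ...).

```shell
#!/bin/sh
# job_states: print the JOBID and STAT columns of bjobs output read from
# stdin, skipping the header and any line whose first field is not a plain
# job number (e.g. continuation lines for multi-host jobs).
# Assumes the default column order JOBID USER STAT ...
job_states() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1, $3 }'
}

# count_state STATE: count the listed jobs whose STAT equals STATE
# (for example RUN, PEND, DONE or EXIT).
count_state() {
  awk -v want="$1" 'NR > 1 && $1 ~ /^[0-9]+$/ && $3 == want { n++ } END { print n + 0 }'
}

# Demonstration on the sample bjobs output shown above.
sample='JOBID   USER    STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
125532  joeuser DONE  int    bc01-n13   bc03-n01   uname -a  Mar 28 13:37'
printf '%s\n' "$sample" | job_states        # prints: 125532 DONE
printf '%s\n' "$sample" | count_state DONE  # prints: 1
```

On the cluster you would pipe real output into the helpers instead, for example “bjobs -u all -m bc03-n01 | count_state RUN”.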
To see as much information as possible about a job, use the long listing (the “-l” option):
emerald% bjobs -l 783947
Job <783947>, User <mason>, Project <noproj>, Status <RUN>, Queue <week>, Job Priority <50>, Command <sleep 60>
Tue Jul 21 10:42:50: Submitted from host <bc09-n13>, CWD </nas/uncch/home/m/a/mason/conifers>;
Tue Jul 21 10:42:54: Started on <bc01-n04>, Execution Home </afs/isis.unc.edu/home/m/a/mason>, Execution CWD </nas/uncch/home/m/a/mason/conifers>;
Tue Jul 21 10:43:26: Resource usage collected.
                     MEM: 5 Mbytes;  SWAP: 202 Mbytes;  NTHREAD: 5
                     PGID: 19169;  PIDs: 19169 19170 19172 19175

SCHEDULING PARAMETERS:
            r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
loadSched      -   8.2     -     -     -     -     -     -     -     -     -
loadStop       -   9.4     -     -     -     -     -     -     -     -     -

            adapter_windows  css0  csss  gm_ports  nrt_windows  ntbl_windows
loadSched                 -     -     -         -            -             -
loadStop                  -     -     -         -            -             -

             poe
loadSched      -
loadStop       -
Note: the “Resource usage collected” section shows the memory used (MEM: 5 Mbytes), the swap space used (SWAP: 202 Mbytes), and the number of threads (NTHREAD: 5).
The PGID and PIDs fields list the job's process group ID and the IDs of its processes.
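These values can also be pulled out of the long listing with a short awk filter. This is a POSIX sh sketch, not an LSF feature: resource_usage is a hypothetical helper name, and it assumes the “MEM: <n> Mbytes;  SWAP: <n> Mbytes;  NTHREAD: <n>” line format shown above, which may differ between LSF versions.

```shell
#!/bin/sh
# resource_usage: read "bjobs -l" output on stdin and print the MEM, SWAP
# and NTHREAD values from the line that starts with "MEM:".
# Assumes the "MEM: <n> Mbytes;  SWAP: <n> Mbytes;  NTHREAD: <n>" format;
# units and wording may differ between LSF versions.
resource_usage() {
  awk '$1 == "MEM:" {
    for (i = 1; i < NF; i++) {          # i < NF so that $(i + 1) exists
      if ($i == "MEM:")     mem = $(i + 1)
      if ($i == "SWAP:")    swap = $(i + 1)
      if ($i == "NTHREAD:") nthread = $(i + 1)
    }
  }
  END { printf "mem=%s swap=%s nthread=%s\n", mem, swap, nthread }'
}
```

For example, “bjobs -l 783947 | resource_usage” would print “mem=5 swap=202 nthread=5” for the job above. A job that has not started yet has no resource usage line, so the fields are printed empty.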
bhist
The “bhist” command shows the amount of time a job was pending, running and suspended. If you specify the “-l” option, “bhist” also shows a chronological summary of each change in the status of the job.
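As a sketch of reading the default (non “-l”) bhist summary, the hypothetical POSIX sh helper bhist_times prints the pending, running, and total time for each job. It assumes the summary reports times in seconds and that its columns end with PEND, PSUSP, RUN, USUSP, SSUSP, UNKWN, and TOTAL; check the header your LSF version prints before relying on it.

```shell
#!/bin/sh
# bhist_times: read default bhist output on stdin and print, for each job,
# the seconds spent pending, running and in total. Fields are counted from
# the right because JOB_NAME may contain spaces. Assumes the summary
# columns end with PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL.
bhist_times() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
    printf "%s pend=%ss run=%ss total=%ss\n", $1, $(NF - 6), $(NF - 4), $NF
  }'
}
```

For example, “bhist 783947 | bhist_times”. Counting fields from the right keeps the parsing correct when a job name such as “sleep 60” contains spaces.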
bhosts
The “bhosts” command displays the status of the hosts in the LSF cluster. If a host's status is “ok”, it has one or more job slots available for accepting submitted jobs.
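To see which hosts are open and roughly how many slots each has left, the bhosts table can be filtered. In this POSIX sh sketch, open_hosts is a hypothetical helper; it assumes the default bhosts columns HOST_NAME, STATUS, JL/U, MAX, NJOBS, RUN, SSUSP, USUSP, and RSV, and treats MAX minus NJOBS as the number of free slots.

```shell
#!/bin/sh
# open_hosts: read default bhosts output on stdin and print each host whose
# STATUS is "ok", followed by MAX minus NJOBS (its free job slots).
# Assumes the column order HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV;
# a MAX of "-" (no limit) is printed as "-".
open_hosts() {
  awk 'NR > 1 && $2 == "ok" {
    free = ($4 == "-") ? "-" : $4 - $5
    print $1, free
  }'
}
```

For example, “bhosts | open_hosts” prints one line per open host, such as a host name followed by its number of free slots.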
bpeek
The “bpeek” command displays the stdout and stderr output that a running job has produced so far.
If you use the “-f” option, “bpeek” displays the output with the “tail -f” command, so it starts with the most recent 10 lines and then keeps showing new lines as they are produced. You can stop the display at any time by pressing <Ctrl-C>; this stops only the display, not the job.
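Because bpeek writes to stdout, its output can be piped through ordinary tools. This POSIX sh sketch (peek_errors is a hypothetical helper name) searches the output a job has produced so far for lines mentioning “error”.

```shell
#!/bin/sh
# peek_errors JOBID: show, with line numbers, the lines of a job's output so
# far that contain "error" (case-insensitive). Line numbers count every line
# bpeek prints, including any header lines it adds. Prints a note if none match.
peek_errors() {
  bpeek "$1" | grep -i -n 'error' ||
    echo "no lines matching 'error' in the output of job $1"
}
```

For example, “peek_errors 783947”. To watch a running job continuously instead, “bpeek -f 783947 | grep --line-buffered -i error” should work with GNU grep (“--line-buffered” is a GNU extension); stop it with <Ctrl-C>.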
bkill
The “bkill” command is used to kill a running, pending or suspended job. You can kill only your own jobs.
More precisely, “bkill” causes LSF to send the SIGINT and SIGTERM signals to a job to give it a chance to clean up, and then LSF sends the SIGKILL signal to kill the job.
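Since bkill takes a job ID, it combines naturally with bjobs. In this POSIX sh sketch, kill_pending is a hypothetical helper (not an LSF command) that kills every one of your jobs that bjobs lists as PEND. It assumes the default bjobs column order and does not treat job arrays specially, so review what it prints; killing a job cannot be undone.

```shell
#!/bin/sh
# kill_pending: kill every one of your jobs that bjobs lists as PEND.
# Assumes the default bjobs columns (JOBID USER STAT ...); lines whose first
# field is not a plain job number (header, continuation lines) are skipped.
# Job arrays are not treated specially: the whole array ID is passed on.
kill_pending() {
  bjobs 2>/dev/null |
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ && $3 == "PEND" { print $1 }' |
    while read -r id; do
      echo "bkill $id"   # show which job is about to be killed
      bkill "$id"
    done
}
```

Running “kill_pending” leaves your running and suspended jobs untouched.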
bstop
The “bstop” command suspends a job by sending it the SIGSTOP signal. After you use the “bstop” command on a running job, the status will be USUSP.
If you use the “bstop” command on a pending job, its status changes to PSUSP. Most users will rarely need the “bstop” command. As with “bkill”, you can stop only your own jobs.
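A typical use of bstop is to pause a job temporarily and let it continue later with bresume, which sends the SIGCONT signal. The hypothetical POSIX sh helper pause_job sketches this; its default pause of 60 seconds is an arbitrary choice.

```shell
#!/bin/sh
# pause_job JOBID [SECONDS]: suspend a job with bstop, wait, then let it
# continue with bresume. SECONDS defaults to 60 (an arbitrary choice).
pause_job() {
  jobid=$1
  seconds=${2:-60}
  bstop "$jobid" || return 1   # give up if the job could not be suspended
  sleep "$seconds"
  bresume "$jobid"
}
```

For example, “pause_job 783947 600” suspends job 783947 for ten minutes; while it waits, “bjobs 783947” should report USUSP.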
bresume
The “bresume” command resumes a job that was suspended by the “bstop” command. It does this by sending the SIGCONT signal to the job.
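To resume everything you have suspended, the job IDs can be taken from bjobs. In this POSIX sh sketch, resume_all is a hypothetical helper; it assumes the default bjobs column order and resumes only jobs in the USUSP or PSUSP states, the states that bstop produces.

```shell
#!/bin/sh
# resume_all: resume every one of your jobs that bjobs lists as USUSP or
# PSUSP (suspended with bstop). Assumes the default bjobs column order.
resume_all() {
  bjobs 2>/dev/null |
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ && ($3 == "USUSP" || $3 == "PSUSP") { print $1 }' |
    while read -r id; do
      echo "bresume $id"   # show which job is being resumed
      bresume "$id"
    done
}
```

Jobs that LSF itself suspended (SSUSP) are skipped.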
Additional help


