Well, I have had my requisite good idea for the year. If you use a high-performance computing facility, odds are you use some kind of batch job system like PBS to submit and monitor jobs. We use a modified form of qstat to monitor jobs – how much walltime has elapsed, project codes, RAM and scratch usage and the like.
The problem:
Unfortunately the program is terminal size-agnostic and as such fits all salient information into 80 characters. Thus the summary truncates job names to the first 8 characters. Granted, 8 characters offers many, many unique job names; however, I prefer my jobs to be fully summarisable by their titles. I want information on what the molecule is, what the job type is, the charge, spin multiplicity and d electron configuration, cleanly delimited for readability. This means that my job names are often something like dc6_BSLT_108b: a Broken Symmetry Linear Transit on molecule dc6 with a charge of 1, 8 unpaired electrons and a d electron configuration of ‘B’.
Why not just reverse the order?
The name is still truncated. Whilst the fiddly details of each job are now revealed, it’s difficult to see the context of the details – i.e. what the job type is and even what molecule you’re looking at!
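To make the truncation concrete, here is a tiny sketch. The 8-character limit is the one described above; the reversed name is only an assumption about what "reversing the order" might look like, and `displayed_name` is a made-up helper, not anything qstat provides.

```python
# Width of the job-name column in the qstat summary described above.
NAME_WIDTH = 8


def displayed_name(job_name: str, width: int = NAME_WIDTH) -> str:
    """Return the part of a job name that survives truncation."""
    return job_name[:width]


original = "dc6_BSLT_108b"
# Hypothetical reversed ordering: fiddly details first, molecule last.
reversed_name = "_".join(reversed(original.split("_")))

print(displayed_name(original))       # dc6_BSLT
print(displayed_name(reversed_name))  # 108b_BSL
```

Either way, one end of the name falls off the edge of the column.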
What about the lab book?
Lab books document results and process; they shouldn’t be hashtables of job codes. Moreover, I believe that computational jobs should be effectively self-documenting – they should contain sufficient information to make sense on their own, both in the file name and in the internal comments structure.
This said, ‘electron configuration of B’ is not amazingly lucid – as such, I need to draw each configuration out (initially in the lab book, and then presentably in Inkscape, Illustrator or whatever), and this in some sense does constitute a lookup table. Drawing these out is necessary anyhow for when it comes time to publish or write up.
Ultimately, the real point of seeing these descriptors in the PBS queue is that I can keep track of what has been submitted without having to use something so barbaric and crass as a pencil and Post-it note. This stops me from submitting the same job twice (which is in clear violation of the no-cloning theorem) and causing some combination of the following:
- Abnormal job termination.
- Cosmic horror.
- Weakly acausal behaviour in the spinny thing.
- Corpses half-embedded in walls, floors, etc.
- Excessive IO to /scratch/.
- Spontaneous formation of malevolent hyper-intelligences in the cluster.
- Wasted CPU-time quota.
- Stuff like in that movie ‘Jacob’s Ladder’.
You’re rambling.
Okay, so the dumb PBS trick is to encode information about the job in the least significant digits of the walltime limit. Why set a job time limit at 4:00:00 when you can set it for 4:18:01 and be able to glean the relevant details of the job from that? Anyhow, this is my new tactic to keep track of stuff. And, looking at the queue as a whole, it seems that at least a handful of other people are doing the same thing… I feel validated. Note that they were doing this before I thought of it, so I’ve not started a trend.
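For illustration only, here is one way such an encoding could work. The mapping is an assumption, not a documented scheme: the minutes field holds the charge (tens digit) and the number of unpaired electrons (units digit), and the seconds field holds the position of the configuration letter (A=0, B=1, and so on). The function names are hypothetical. Under those assumptions a nominal 4-hour job with charge 1, 8 unpaired electrons and configuration ‘B’ comes out as 4:18:01.

```python
import string


def encode_walltime(hours: int, charge: int, unpaired: int, config: str) -> str:
    """Pack job details into the minutes and seconds of a walltime string.

    Assumed layout (hypothetical): minutes = charge * 10 + unpaired,
    seconds = index of the configuration letter (A=0, B=1, ...).
    """
    if not 0 <= charge <= 5:
        raise ValueError("charge must be 0-5 so that minutes stay below 60")
    if not 0 <= unpaired <= 9:
        raise ValueError("unpaired must be a single digit")
    letter = config.upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError("config must be a single letter A-Z")
    minutes = charge * 10 + unpaired
    seconds = string.ascii_uppercase.index(letter)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def decode_walltime(walltime: str) -> dict:
    """Recover the job details from a walltime built by encode_walltime."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    charge, unpaired = divmod(minutes, 10)
    return {
        "hours": hours,
        "charge": charge,
        "unpaired": unpaired,
        "config": string.ascii_uppercase[seconds],
    }


print(encode_walltime(4, 1, 8, "B"))  # 4:18:01
print(decode_walltime("4:18:01"))
```

The resulting string would go wherever the walltime limit is normally requested, for example `-l walltime=4:18:01`. The cost is at most an extra 59 minutes and 25 seconds on the limit, and this particular layout only copes with charges from 0 to 5.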
