[Moabusers] Job reported idle

Douglas Wightman wightman at clusterresources.com
Fri Oct 24 07:30:56 MDT 2008


These messages mean that Moab told the RM (Torque?) to start the job and the RM responded with "success", but on the next scheduling iteration the RM reported the job as idle again.  This almost always indicates that the RM failed to start the job.  You'll probably need to check the RM logs to find out what is going on.
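
To see how quickly the job fell back to idle, and which nodes to look for in the RM logs, the comment text can be pulled apart with a small script. This is only a rough sketch: the pattern is inferred from the single message quoted in this thread rather than from Moab documentation, the function name parse_idle_comment is made up for illustration, and the year is an assumption because the message itself carries none.

```python
import re
from datetime import datetime

# Pattern inferred from the one comment quoted in this thread; the real
# format may vary between Moab versions (assumption, not documented here).
PATTERN = re.compile(
    r"job rejected by RM '(?P<rm>[^']+)' - job started on hostlist "
    r"(?P<hosts>\S+) at time (?P<start>\d\d:\d\d:\d\d_\d\d/\d\d),\s*"
    r"job reported idle at time (?P<idle>\d\d:\d\d:\d\d_\d\d/\d\d)"
)


def parse_idle_comment(comment, year=2008):
    """Hypothetical helper: return the RM name, the listed hosts, and the
    number of seconds between the start and the idle report.

    The timestamps have no year, so `year` is supplied by the caller.
    Returns None if the text does not match the assumed format.
    """
    text = " ".join(comment.split())  # undo mail-client line wrapping
    match = PATTERN.search(text)
    if match is None:
        return None

    def to_datetime(stamp):
        return datetime.strptime(f"{stamp}/{year}", "%H:%M:%S_%m/%d/%Y")

    start = to_datetime(match.group("start"))
    idle = to_datetime(match.group("idle"))
    # The hostlist is truncated with "..." in the message; keep only the
    # hosts that are actually listed.
    hosts = match.group("hosts").rstrip(".").split(",")
    return {
        "rm": match.group("rm"),
        "hosts": hosts,
        "seconds_until_idle": (idle - start).total_seconds(),
    }
```

The first listed host is usually a reasonable place to start looking in the RM logs, and a gap of only a few seconds between the two timestamps fits a job that never really started.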

- Douglas

----- "Glen Beane" <Glen.Beane at jax.org> wrote:

> Can someone tell me what could cause Moab to set a job's comment field
> to this:
> 
> job rejected by RM 'base' - job started on hostlist
> COMPUTE-36-18,COMPUTE-36-14,COMPUTE-35-11,COMPUTE-35-10... at time
> 07:09:26_10/24, job reported idle at time 07:09:29_10/24 (see RM logs
> for details)
> 
> 
> I am trying to launch a 256-proc job on a large cluster (>1000 nodes)
> at an external site (so I have no control over the cluster, but I get
> the feeling they don't really know what is going on).
> 
> I had been able to launch the job just fine. Now every time I try to
> start the job, it gets that message set in its comment field and then
> stays queued (it appears to get a 1-hour deferment).
> 
> 
> I also tried submitting a short, small interactive job and got this
> emailed to me:
> 
> MOAB_INFO:  interactive job can never run - job rejected by RM 'base'
> - job started on hostlist
> COMPUTE-36-18,COMPUTE-36-14,COMPUTE-35-11,COMPUTE-35-10... at time
> 07:20:07_10/24, job reported idle at time 07:20:09_10/24 (see RM logs
> for details)
> 
> --
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> Phone (207) 288-6153
> 
> 
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers

