Maui Admin - Scheduling Iterations and Job Flow
3.3 Scheduling Iterations and Job Flow
3.3.1 Scheduling Iterations
In any given scheduling iteration, many activities
take place. These are broken into the following major categories:
Update State Information
Refresh Reservations
Schedule Reserved Jobs
Schedule Priority Jobs
Backfill Jobs
Update Statistics
Handle User Requests
3.3.1.1 Update State Information
Each iteration, the scheduler contacts the resource
manager(s) and requests up-to-date information on compute resources, workload,
and policy configuration. On most systems, these calls are to a centralized
resource manager daemon that holds all of this information.
3.3.1.2 Refresh Reservations
3.3.1.3 Schedule Reserved Jobs
3.3.1.4 Schedule Priority Jobs
In scheduling jobs, multiple steps occur.
3.3.1.5 Backfill Jobs
3.3.1.6 Update Statistics
3.3.1.7 Handle User Requests
User requests include any call requesting state information,
configuration changes, or job or resource manipulation commands.
These requests may come in the form of user client calls, peer daemon calls,
or process signals.
3.3.1.8 Perform Next Scheduling Cycle
Maui operates on a polling/event-driven basis.
When all scheduling activities are complete, Maui will process user requests
until a new resource manager event is received or an internal event is
generated. Resource manager events include activities such as a new
job submission or completion of an active job, addition of new node resources,
or changes in resource manager policies. Internal events include
admin 'schedule' requests, reservation
activation/deactivation, or the expiration of the RMPOLLINTERVAL
timer.
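The wait between scheduling cycles can be sketched as a small event loop. This is only an illustration of the behavior described above, not Maui's actual implementation: the function name, the event tuples, and the use of a Python queue are all assumptions made for the example.

```python
import queue
import time

def wait_for_next_cycle(events, poll_interval, handle_request):
    """Serve user requests until a scheduling trigger arrives.

    events         -- queue of (kind, payload) tuples (hypothetical format)
    poll_interval  -- seconds to wait, standing in for the RMPOLLINTERVAL timer
    handle_request -- callback invoked for each user request
    Returns the kind of event that ended the wait.
    """
    deadline = time.monotonic() + poll_interval
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "poll_timer_expired"
        try:
            kind, payload = events.get(timeout=remaining)
        except queue.Empty:
            return "poll_timer_expired"
        if kind == "user_request":
            # User requests are served without starting a new cycle.
            handle_request(payload)
        else:
            # Resource manager events and internal events start a new cycle.
            return kind
```

In this sketch, a resource manager event or internal event ends the wait immediately, while user requests are handled in place and the timer keeps running.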
3.3.2 Detailed Job Flow
3.3.2.1 Determine Basic Job Feasibility
The first step in scheduling is determining which
jobs are feasible. This step eliminates jobs which have job holds
in place, invalid job states (e.g., Completed, Not Queued, Deferred, etc.),
or unsatisfied preconditions. Preconditions may include stage-in
files or completion of preliminary job steps.
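As a minimal sketch of this filtering step, the following assumes a simplified job record. The Job class, its field names, and the exact set of invalid states are hypothetical and are not taken from Maui's data structures.

```python
from dataclasses import dataclass, field

# Illustrative subset of states that rule a job out (assumed spellings).
INVALID_STATES = {"Completed", "NotQueued", "Deferred"}

@dataclass
class Job:
    name: str
    state: str = "Idle"
    holds: set = field(default_factory=set)
    # Preconditions not yet met, e.g. stage-in files or earlier job steps.
    unmet_preconditions: set = field(default_factory=set)

def is_feasible(job):
    """A job is feasible if it has no holds, a valid state, and no unmet preconditions."""
    return (
        not job.holds
        and job.state not in INVALID_STATES
        and not job.unmet_preconditions
    )

def feasible_jobs(jobs):
    """Return the feasible jobs, preserving input order."""
    return [job for job in jobs if is_feasible(job)]
```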
3.3.2.2 Prioritize Jobs
With a list of feasible jobs created, the next step
involves determining the relative priority
of all jobs within that list. A priority for each job is calculated
based on job attributes such as job owner, job size, length of time the
job has been queued, and so forth.
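One simple way to picture this step is a weighted sum over job attributes. The sketch below is an assumption for illustration only: the weight names, the dictionary layout, and the linear formula are hypothetical and do not reproduce Maui's actual priority calculation.

```python
def job_priority(job, weights, owner_priority):
    """Combine job owner, job size, and queue time into one number.

    job            -- dict with 'owner', 'procs', 'queued_minutes' (assumed keys)
    weights        -- dict with 'owner', 'size', 'queue_time' factors (assumed)
    owner_priority -- per-owner priority value; unknown owners count as 0
    """
    return (
        weights["owner"] * owner_priority.get(job["owner"], 0)
        + weights["size"] * job["procs"]
        + weights["queue_time"] * job["queued_minutes"]
    )

def prioritize(jobs, weights, owner_priority):
    """Return jobs ordered from highest to lowest priority."""
    return sorted(
        jobs,
        key=lambda job: job_priority(job, weights, owner_priority),
        reverse=True,
    )
```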
3.3.2.3 Enforce Configured Throttling Policies
Any configured throttling
policies are then applied, constraining how many jobs, nodes, processors,
etc. are allowed on a per-credential basis. Jobs which violate these
policies are not considered for scheduling.
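A throttling check of this kind might look like the following sketch, which uses the user as the credential. The limit names (max_jobs, max_procs) and the dictionary layout are hypothetical, and the sketch compares each job only against currently active usage, which is a simplifying assumption.

```python
def within_limits(job, active_usage, limits):
    """Return True if starting the job keeps its user within configured limits.

    job          -- dict with 'user' and 'procs' (assumed keys)
    active_usage -- per-user dict of {'jobs': n, 'procs': n} currently in use
    limits       -- per-user dict of {'max_jobs': n, 'max_procs': n};
                    users without an entry are unrestricted
    """
    limit = limits.get(job["user"])
    if limit is None:
        return True
    used = active_usage.get(job["user"], {"jobs": 0, "procs": 0})
    return (
        used["jobs"] + 1 <= limit["max_jobs"]
        and used["procs"] + job["procs"] <= limit["max_procs"]
    )

def apply_throttling(jobs, active_usage, limits):
    """Drop jobs that would violate a limit; keep the rest in order."""
    return [job for job in jobs if within_limits(job, active_usage, limits)]
```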
3.3.2.4 Determine Resource Availability
For each job, Maui attempts to locate the required
compute resources needed by the job. In order for a match to be made,
the node must possess all node attributes specified by the job and possess
adequate available resources to meet the TasksPerNode job constraint
(the default TasksPerNode is 1). Normally, Maui determines a node to have
adequate resources if the resources are neither utilized by nor dedicated
to another job, using the calculation
R.Available = R.Configured - MAX(R.Dedicated,R.Utilized).
The RESOURCEAVAILABILITYPOLICY
parameter can be modified to adjust this behavior.
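The default availability calculation and the node match test can be sketched as follows. The Node and JobRequest classes and their field names are hypothetical, and the sketch assumes each task needs one processor, which the text above does not state.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    features: set = field(default_factory=set)
    configured: int = 0   # processors configured on the node
    dedicated: int = 0    # processors dedicated to running jobs
    utilized: int = 0     # processors actually in use

@dataclass
class JobRequest:
    required_features: set = field(default_factory=set)
    tasks_per_node: int = 1  # default TasksPerNode is 1

def available(node):
    """R.Available = R.Configured - MAX(R.Dedicated, R.Utilized)."""
    return node.configured - max(node.dedicated, node.utilized)

def node_matches(node, job):
    """Node must have every requested attribute and enough free resources."""
    has_features = job.required_features <= node.features
    # Assumption: one processor per task.
    return has_features and available(node) >= job.tasks_per_node
```

For example, a node with 8 configured processors, 4 dedicated, and 6 utilized has 8 - MAX(4, 6) = 2 processors available under this calculation.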
3.3.2.5 Allocate Resources to Job
If adequate resources can be found for a job, the
node allocation policy is then
applied to select the best set of resources. These allocation policies
allow selection criteria such as speed of node, type of reservations, or
excess node resources to be figured into the allocation decision to improve
the performance of the job and/or maximize the freedom of the scheduler
in making future scheduling decisions.
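To illustrate how an allocation policy narrows the feasible nodes down to a final set, the sketch below ranks candidates by one criterion and takes the best ones. The policy names "fastest" and "min_excess" and the node dictionary keys are invented for this example and are not Maui configuration values.

```python
def allocate(candidates, nodes_needed, policy="fastest"):
    """Pick nodes_needed nodes from candidates according to a policy.

    candidates -- list of dicts with 'name', 'speed', 'free_procs' (assumed keys)
    policy     -- 'fastest': prefer higher node speed (job performance)
                  'min_excess': prefer nodes with the fewest free processors,
                  leaving larger nodes open for future scheduling decisions
    Returns a list of node names, or None if there are too few candidates.
    """
    if len(candidates) < nodes_needed:
        return None
    if policy == "fastest":
        ranked = sorted(candidates, key=lambda n: n["speed"], reverse=True)
    elif policy == "min_excess":
        ranked = sorted(candidates, key=lambda n: n["free_procs"])
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [n["name"] for n in ranked[:nodes_needed]]
```

The two policies show the trade-off mentioned above: one favors the performance of the job being placed, the other favors the scheduler's freedom in later decisions.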
3.3.2.6 Distribute Job Tasks Across Allocated Resources
With the resources selected, Maui then maps job tasks
to the actual resources. This distribution of tasks is typically
based on simple task distribution algorithms such as round-robin or max
blocking, but can also incorporate parallel language library (e.g., MPI,
PVM, etc.) specific patterns used to minimize interprocess communication
overhead.
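The two simple distribution styles can be sketched as below. The function names and the slot dictionary are hypothetical, and reading "max blocking" as "fill each node before moving to the next" is an interpretation rather than something the text defines.

```python
def round_robin(task_count, slots):
    """Assign tasks to nodes one at a time, cycling through the nodes.

    slots -- dict mapping node name to free task slots (insertion order is used)
    Returns a list where entry i is the node assigned to task i.
    """
    if task_count > sum(slots.values()):
        raise ValueError("not enough slots for all tasks")
    remaining = dict(slots)
    placement = []
    while len(placement) < task_count:
        for node in remaining:
            if len(placement) == task_count:
                break
            if remaining[node] > 0:
                placement.append(node)
                remaining[node] -= 1
    return placement

def blocked(task_count, slots):
    """Fill each node completely before moving on (assumed 'max blocking')."""
    if task_count > sum(slots.values()):
        raise ValueError("not enough slots for all tasks")
    placement = []
    for node, free in slots.items():
        take = min(free, task_count - len(placement))
        placement.extend([node] * take)
    return placement
```

With five tasks and slots of 2, 2, and 1 on nodes n1, n2, and n3, round-robin spreads consecutive tasks across nodes, while the blocked variant keeps consecutive tasks together on the same node.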
3.3.2.7 Launch Job
With the resources selected and task distribution
mapped, the scheduler then contacts the resource manager and informs it
where and how to launch the job. The resource manager then initiates
the actual job executable.