hdeeprm.environment module

The environment is the representation of the agent’s observable context.

class hdeeprm.environment.Environment(workload_manager, env_options: dict)

Bases: gym.core.Env

Environment for workload management in HDeepRM.

It is composed of an action space and an observation space. At every decision step, the agent selects an action, which is applied to the environment by mapping pending jobs to available cores. Changes in the environment’s state are manifested as observations. For each action taken, the environment returns a reward to the agent as feedback, based on the agent’s objective. The environment implementation follows the OpenAI Gym interface.

Any observation is formed by the following data fields:
- Fraction of available memory in each node
- Fraction of available memory bandwidth in each processor
- Fraction of current GFLOPs and Watts with respect to the maximum values for each core
- Fraction remaining to complete the job currently served by each core
- Requested time/cores/mem/mem_bw of pending jobs, as fractions of the maximum requested values; five percentiles are given (min, Q1, median, Q3, max) so that the agent can infer the distribution of pending jobs
- Variability ratio of the job queue size with respect to the last observation
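The five-percentile summary of pending-job requests can be sketched as follows. This is a minimal illustration, not HDeepRM's code: the tuple layout `(time, cores, mem, mem_bw)`, the helper names and the all-zero output for an empty queue are assumptions. Percentiles use linear interpolation between sorted values.

```python
from typing import List, Sequence, Tuple

PERCENTILES = (0, 25, 50, 75, 100)  # min, Q1, median, Q3, max


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def requested_resource_summary(
    requests: Sequence[Tuple[float, float, float, float]],
    maxima: Tuple[float, float, float, float],
) -> List[float]:
    """Hypothetical helper: 4 resources x 5 percentiles of requested fractions."""
    if not requests:
        # Assumption: an empty job queue is observed as all zeros.
        return [0.0] * (len(maxima) * len(PERCENTILES))
    summary: List[float] = []
    for index, maximum in enumerate(maxima):
        # Normalize each request by the maximum value of that resource.
        fractions = sorted(job[index] / maximum for job in requests)
        summary.extend(percentile(fractions, q) for q in PERCENTILES)
    return summary
```

For three jobs requesting 10, 20 and 30 time units with a maximum of 40, the time block is `[0.25, 0.375, 0.5, 0.625, 0.75]`.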
The action space consists of 37 possible actions, including a void action:
Job selection \ Core selection    RANDM  HICOM  HICOR  HIMEM  HIMBW  LPOWR
RANDM                                 0      1      2      3      4      5
FIARR                                 6      7      8      9     10     11
SHORT                                12     13     14     15     16     17
SMALL                                18     19     20     21     22     23
LRMEM                                24     25     26     27     28     29
LRMBW                                30     31     32     33     34     35
Void action: 36
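The table enumerates job policies by row and core policies by column, so action ID = 6 × row + column, with 36 reserved for the void action. The sketch below reproduces that mapping; representing policies as plain name strings is an assumption made for illustration (the real `action_keys` attribute holds sorting keys).

```python
from itertools import product
from typing import List, Optional, Tuple

# Policy names taken from the lists in this page; order matches the table.
JOB_POLICIES = ["random", "first", "shortest", "smallest", "low_mem", "low_mem_bw"]
CORE_POLICIES = ["random", "high_gflops", "high_cores", "high_mem", "high_mem_bw", "low_power"]

# product() varies the core policy fastest, giving ID = 6 * job_index + core_index.
ACTION_KEYS: List[Tuple[str, str]] = list(product(JOB_POLICIES, CORE_POLICIES))
VOID_ACTION = len(ACTION_KEYS)  # 36


def decode_action(action: int) -> Optional[Tuple[str, str]]:
    """Return the (job policy, core policy) pair, or None for the void action."""
    if action == VOID_ACTION:
        return None
    if not 0 <= action < VOID_ACTION:
        raise ValueError(f"action must be in [0, {VOID_ACTION}], got {action}")
    return ACTION_KEYS[action]
```

For example, `decode_action(8)` yields `("first", "high_cores")`, matching the FIARR/HICOR cell.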
Job selection policies:
- RANDM (random): random job in the job queue.
- FIARR (first): oldest job in the job queue.
- SHORT (shortest): job with the least requested running time.
- SMALL (smallest): job with the fewest requested cores.
- LRMEM (low_mem): job with the least requested memory capacity.
- LRMBW (low_mem_bw): job with the least requested memory bandwidth.
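Each job selection policy above amounts to taking the minimum over the queue under a sorting key, except RANDM, which samples uniformly. A minimal sketch, assuming a hypothetical `PendingJob` record whose field names are illustrative rather than HDeepRM's:

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PendingJob:  # hypothetical job record, for illustration only
    job_id: str
    submit_time: float
    req_time: float
    req_cores: int
    req_mem: float
    req_mem_bw: float


# Every non-random policy picks the job that minimizes its key.
JOB_KEYS: Dict[str, Callable[[PendingJob], float]] = {
    "first": lambda job: job.submit_time,   # oldest job
    "shortest": lambda job: job.req_time,   # least requested running time
    "smallest": lambda job: job.req_cores,  # fewest requested cores
    "low_mem": lambda job: job.req_mem,
    "low_mem_bw": lambda job: job.req_mem_bw,
}


def select_job(queue: List[PendingJob], policy: str, rng: random.Random) -> PendingJob:
    if policy == "random":
        return rng.choice(queue)
    # On ties, min() keeps the earliest job in queue order.
    return min(queue, key=JOB_KEYS[policy])
```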
Core selection policies:
- RANDM (random): random core in the core pool.
- HICOM (high_gflops): core with the highest peak compute capability.
- HICOR (high_cores): core in the processor with the most available cores.
- HIMEM (high_mem): core in the node with the most current memory capacity available.
- HIMBW (high_mem_bw): core in the processor with the most current memory bandwidth available.
- LPOWR (low_power): core with the lowest power consumption.
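Core selection works the same way, but most policies maximize an attribute of the core or of its enclosing processor or node. The sketch below assumes a hypothetical flattened `Core` record that carries those per-processor and per-node values directly; HDeepRM's actual resource model is not shown in this page.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Core:  # hypothetical core record, for illustration only
    core_id: int
    peak_gflops: float            # peak compute capability of the core
    proc_available_cores: int     # available cores in this core's processor
    node_available_mem: float     # current memory capacity in this core's node
    proc_available_mem_bw: float  # current memory bandwidth in this core's processor
    power: float                  # current power consumption (W)


HIGH_ATTRIBUTES = {
    "high_gflops": "peak_gflops",
    "high_cores": "proc_available_cores",
    "high_mem": "node_available_mem",
    "high_mem_bw": "proc_available_mem_bw",
}


def select_core(pool: List[Core], policy: str, rng: random.Random) -> Core:
    if policy == "random":
        return rng.choice(pool)
    if policy == "low_power":
        return min(pool, key=lambda core: core.power)
    attribute = HIGH_ATTRIBUTES[policy]
    return max(pool, key=lambda core: getattr(core, attribute))
```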
Possible objectives for the agent:
- Average job slowdown: on average, how much jobs are slowed down by waiting in the job queue, relative to their execution time.
- Average job completion time: on average, how long jobs spend in the platform until they complete.
- Utilization: average number of active cores over the simulation time.
- Makespan: time span from the arrival of the absolute first job until the completion of the absolute last job.
- Energy consumption: total amount of energy consumed during the simulation.
- Energy Delay Product (EDP): product of the energy consumption and the makespan.
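The time-based objectives can be computed from per-job submission, start and finish times. This sketch assumes a hypothetical `FinishedJob` record and the common definitions slowdown = (finish − submit) / (finish − start) and completion time = finish − submit; HDeepRM's exact accounting may differ. EDP follows the definition above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FinishedJob:  # hypothetical record, times in seconds
    submit: float
    start: float
    finish: float


def avg_slowdown(jobs: List[FinishedJob]) -> float:
    # Assumption: slowdown = time in system / execution time.
    return sum((j.finish - j.submit) / (j.finish - j.start) for j in jobs) / len(jobs)


def avg_completion_time(jobs: List[FinishedJob]) -> float:
    return sum(j.finish - j.submit for j in jobs) / len(jobs)


def makespan(jobs: List[FinishedJob]) -> float:
    # From the arrival of the first job to the completion of the last one.
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def edp(energy_joules: float, jobs: List[FinishedJob]) -> float:
    return energy_joules * makespan(jobs)
```

With two jobs submitted at t = 0, one running over [0, 10] and the other over [10, 20], the average slowdown is 1.5, the average completion time 15 s and the makespan 20 s.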
workload_manager

Reference to the HDeepRM workload manager, required to schedule jobs at each decision step.

Type: HDeepRMWorkloadManager
action_space

The action space described above. See Spaces.

Type: gym.spaces.Discrete
action_keys

List of sorting key pairs indexed by action ID. In each pair, one key drives the job scheduler’s selection and the other drives the resource manager’s selection.

Type: list
observation_space

The observation space described above. See Spaces.

Type: gym.spaces.Box
reward

Reward function selected according to the agent’s objective.

Type: function
queue_sensitivity

Sensitivity of the observation to variations in the job queue size. If sensitivity is high, large variations are noticed, but small ones have no significant impact. If sensitivity is low, small variations are noticed, while large ones are clipped and thus have no additional impact.

Type: float
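The exact formula is not given on this page. One way to obtain the described behaviour, sketched here as an assumption, is to divide the relative change in queue length by the sensitivity and clip the result to [-1, 1]: a high sensitivity shrinks small changes toward zero while leaving room for large ones, and a low sensitivity amplifies small changes so that large ones saturate. The function name and the empty-queue handling are hypothetical.

```python
def queue_variation(current_length: int, last_length: int, sensitivity: float) -> float:
    """Hypothetical variability ratio of the job queue size, clipped to [-1, 1]."""
    if last_length == 0:
        # Assumption: growth from an empty queue counts as maximal growth.
        return 1.0 if current_length > 0 else 0.0
    relative_change = (current_length - last_length) / last_length
    scaled = relative_change / sensitivity  # higher sensitivity -> smaller values
    return max(-1.0, min(1.0, scaled))
```

With a queue growing from 10 to 11 jobs, a sensitivity of 1.0 gives 0.1, whereas 0.05 saturates at 1.0; a jump from 10 to 30 jobs with a sensitivity of 5.0 gives 0.4 instead of being clipped.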
last_job_queue_length

Job queue length at the last observation. Used to compute the variability ratio.

Type: int
action_size

Action space size.

Utilized for output layer sizing in agent’s inner models.

Returns: The size of the action space.
avg_job_completion_reward() → float

Reward when the objective is to minimize average job completion time.

It is the negative of the number of unfinished jobs in the system. As more jobs are completed, the reward increases.

Returns: Negative number of unfinished jobs in the system.
avg_job_slowdown_reward() → float

Reward when the objective is to minimize average job slowdown.

It is the negative sum of the inverses of the requested times of all jobs in the system. If the agent prioritizes short jobs, slowdowns also decrease, because the set of jobs in the system shrinks as well.

Returns: Negative sum of the inverse requested times of all jobs active in the system.
avg_utilization_reward() → float
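A minimal sketch of this reward, taking the requested times as a plain iterable (the parameter is an assumption; the method itself reads them from the workload manager). Short jobs weigh more, so leaving them waiting is penalized most. If such a reward were collected once per unit of simulated time, each job would accumulate roughly minus its time in the system divided by its requested time, which approximates its slowdown.

```python
from typing import Iterable


def avg_job_slowdown_reward(requested_times: Iterable[float]) -> float:
    """Negative sum of 1 / requested time over all jobs active in the system."""
    return -sum(1.0 / t for t in requested_times)
```

For two active jobs requesting 10 s and 100 s, the reward is -(0.1 + 0.01) = -0.11.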

Reward when the objective is to maximize average utilization.

Average utilization is the average number of active resources during the simulation. The reward is therefore the current number of active resources.

Returns: Number of active resources.
edp_reward() → float

Reward when the objective is to minimize the Energy-Delay Product (EDP).

TODO

energy_consumption_reward() → float

Reward when the objective is to minimize total energy consumption.

It is the negative of the current power usage in the data centre. Keeping power usage low decreases the total energy consumed.

Returns: Negative of the current power usage in the data centre service.
makespan_reward() → float

Reward when the objective is to minimize makespan.

Makespan is the total time from the arrival of the first job to the completion of the last one. The reward is the total current GFLOPs provided by the data centre. Higher throughput leads to a lower makespan.

Returns: Current total GFLOPs provided by the data centre service.
observation_size

Observation space size.

Utilized for input layer sizing in agent’s inner models.

Returns: The size of the observation space.
render(mode='human')
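If each observation field listed at the top of this page contributes one value per node, processor or core, the observation size follows directly from the platform dimensions. This is a sketch under that assumption; the function and its parameters are hypothetical, and the actual property may count values differently.

```python
def observation_size(nodes: int, processors: int, cores: int) -> int:
    """Hypothetical count of observation values for a given platform."""
    per_node = nodes            # available memory fraction per node
    per_processor = processors  # available memory bandwidth fraction per processor
    per_core = 3 * cores        # GFLOPs fraction, Watts fraction, fraction left of served job
    queue_summary = 4 * 5       # time/cores/mem/mem_bw x (min, Q1, median, Q3, max)
    queue_variation = 1         # job queue variability ratio
    return per_node + per_processor + per_core + queue_summary + queue_variation
```

For 2 nodes, 4 processors and 16 cores this gives 2 + 4 + 48 + 20 + 1 = 75 inputs.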

Not used.

reset()

Not used.

step(action: int) → None

Step representing the environment alteration.

Jobs are mapped to available resources and then communicated to Batsim. If the void action is selected, no scheduling occurs.

Parameters: action (int) – Action ID to be applied.