hdeeprm.environment module

The environment is the representation of the agent’s observable context.

class hdeeprm.environment.Environment(workload_manager, env_options: dict)

Bases: gym.core.Env

Environment for workload management in HDeepRM.

It is composed of an action space and an observation space. At every decision step, the agent selects an action, which is applied to the environment by mapping pending jobs to available cores. Changes in the environment’s state are exposed as observations. For each action taken, the environment provides a reward as feedback to the agent, based on its objective. The environment implementation follows the OpenAI Gym format.

Any observation is formed by the following data fields:

- Fraction of available memory in each node
- Fraction of available memory bandwidth in each processor
- Fraction of current GFLOPs and Watts with respect to the maximum values for each core
- Fraction of the job being served that remains to be completed, for each core
- Fraction of requested resources with respect to the maximum values of requested
time/cores/mem/mem_bw for pending jobs; five percentiles are given (min, Q1, med, Q3, max) so
that the agent can infer the job distribution
- Variation ratio of the job queue size with respect to the last observation
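
The five-percentile fields for pending jobs can be illustrated with a minimal sketch. The function name `five_number_summary`, the sample values, the all-zero result for an empty queue and the "inclusive" quartile method are assumptions made for illustration; HDeepRM may compute these fields differently.

```python
from statistics import quantiles

def five_number_summary(values, maximum):
    """Return (min, Q1, med, Q3, max) of values, each as a fraction of maximum."""
    if not values:
        # Assumption: an empty job queue is reported as all zeros.
        return [0.0] * 5
    if len(values) == 1:
        # A single job is its own quartiles.
        q1 = med = q3 = values[0]
    else:
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return [v / maximum for v in (min(values), q1, med, q3, max(values))]

# Hypothetical requested core counts of five pending jobs, maximum request of 16.
requested_cores = [1, 2, 4, 8, 16]
summary = five_number_summary(requested_cores, maximum=16)
```

Repeating the same computation for requested time, memory and memory bandwidth would give the 4 x 5 job-related values of an observation.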

The action space consists of 37 possible actions, including a void action:

Job selection    Core selection
                 RANDM  HICOM  HICOR  HIMEM  HIMBW  LPOWR
RANDM                0      1      2      3      4      5
FIARR                6      7      8      9     10     11
SHORT               12     13     14     15     16     17
SMALL               18     19     20     21     22     23
LRMEM               24     25     26     27     28     29
LRMBW               30     31     32     33     34     35
Void action         36
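
The layout of the table implies that an action ID is the job policy row index times six plus the core policy column index. A minimal sketch of decoding an ID back into its policy pair follows; `decode_action` and the constant names are hypothetical helpers, not part of the documented API.

```python
# Policy labels in the row and column order of the table above.
JOB_POLICIES = ["RANDM", "FIARR", "SHORT", "SMALL", "LRMEM", "LRMBW"]
CORE_POLICIES = ["RANDM", "HICOM", "HICOR", "HIMEM", "HIMBW", "LPOWR"]
VOID_ACTION = len(JOB_POLICIES) * len(CORE_POLICIES)  # 36

def decode_action(action):
    """Return (job_policy, core_policy), or None for the void action."""
    if action == VOID_ACTION:
        return None
    if not 0 <= action < VOID_ACTION:
        raise ValueError(f"Action ID out of range: {action}")
    # Row index selects the job policy, column index the core policy.
    job_idx, core_idx = divmod(action, len(CORE_POLICIES))
    return JOB_POLICIES[job_idx], CORE_POLICIES[core_idx]

pair = decode_action(15)
```

Here `decode_action(15)` yields the SHORT job policy with the HIMEM core policy, matching row SHORT and column HIMEM in the table.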

Job selection policies:

- RANDM (random): random job in the job queue.
- FIARR (first): oldest job in the job queue.
- SHORT (shortest): job with the least requested running time.
- SMALL (smallest): job with the least requested cores.
- LRMEM (low_mem): job with the least requested memory capacity.
- LRMBW (low_mem_bw): job with the least requested memory bandwidth.

Core selection policies:

- RANDM (random): random core in the core pool.
- HICOM (high_gflops): core with the highest peak compute capability.
- HICOR (high_cores): core in the processor with the largest number of available cores.
- HIMEM (high_mem): core in the node with the largest current memory capacity.
- HIMBW (high_mem_bw): core in the processor with the largest current memory bandwidth.
- LPOWR (low_power): core with the lowest power consumption.

Possible objectives for the agent:

- Average job slowdown: on average, how much of the service time is due to jobs stalling in
the job queue.
- Average job completion time: the average service time of jobs in the platform.
- Utilization: number of active cores over the simulation time.
- Makespan: time span from the arrival of the absolute first job until the completion of the
absolute last job.
- Energy consumption: total amount of energy consumed during the simulation.
- Energy Delay Product (EDP): product of the energy consumption by the makespan.
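
Several of these objectives can be computed from per-job timestamps, as in the sketch below. The `JobRecord` type, its field names and the sample values are hypothetical, and slowdown is taken here as turnaround time divided by running time, which is a common definition but an assumption with respect to HDeepRM. Utilization is left out because it needs per-core activity data.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    submit: float  # arrival time in the job queue
    start: float   # time the job starts running
    finish: float  # completion time

def summarize(jobs, energy_joules):
    """Compute objective metrics from finished jobs and total energy."""
    completion = [j.finish - j.submit for j in jobs]
    # Assumption: slowdown = turnaround time / running time.
    slowdown = [(j.finish - j.submit) / (j.finish - j.start) for j in jobs]
    makespan = max(j.finish for j in jobs) - min(j.submit for j in jobs)
    return {
        "avg_completion_time": sum(completion) / len(jobs),
        "avg_slowdown": sum(slowdown) / len(jobs),
        "makespan": makespan,
        "edp": energy_joules * makespan,
    }

# The second job waits 8 time units in the queue before running for 4.
jobs = [JobRecord(0.0, 0.0, 10.0), JobRecord(2.0, 10.0, 14.0)]
metrics = summarize(jobs, energy_joules=500.0)
```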

workload_manager

Reference to the HDeepRM workload manager, required to schedule jobs at each decision step.

Type:	`HDeepRMWorkloadManager`

action_space

The action space described above. See Spaces.

Type:	gym.spaces.Discrete

action_keys

List of sorting key pairs indexed by action IDs. Keys are applied to the job scheduler and the resource manager selections.

Type:	list

observation_space

The observation space described above. See Spaces.

Type:	gym.spaces.Box

reward

Mapped to a reward function depending on the agent’s objective.

Type:	function
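
One way such a mapping could be arranged is a dictionary from objective names to bound methods, sketched below. The class `ToyEnvironment`, the objective keys and the constructor arguments are hypothetical; only the two reward method names are taken from this page.

```python
class ToyEnvironment:
    """Stand-in class showing how reward can be bound to one reward method."""

    def __init__(self, objective, unfinished_jobs=0, active_resources=0):
        self.unfinished_jobs = unfinished_jobs
        self.active_resources = active_resources
        # Hypothetical objective keys; the real option names may differ.
        reward_functions = {
            "avg_job_completion": self.avg_job_completion_reward,
            "avg_utilization": self.avg_utilization_reward,
        }
        if objective not in reward_functions:
            raise ValueError(f"Unknown objective: {objective}")
        self.reward = reward_functions[objective]

    def avg_job_completion_reward(self):
        # Negative of the number of unfinished jobs.
        return -float(self.unfinished_jobs)

    def avg_utilization_reward(self):
        # Number of active resources.
        return float(self.active_resources)

env = ToyEnvironment("avg_job_completion", unfinished_jobs=3, active_resources=5)
current_reward = env.reward()
```

With this arrangement the caller invokes `env.reward()` without needing to know which objective was configured.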

queue_sensitivity

Sensitivity of the observation to variations in job queue size. If sensitivity is high, larger variations will be noticed, but smaller ones will not have a significant impact. If sensitivity is low, smaller variations will be noticed, while larger ones will be clipped and thus have no further impact.

Type:	float
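
The behaviour described for queue_sensitivity is consistent with dividing the change in queue length by the sensitivity and clipping the result. The sketch below is an assumption about how this could work, not the documented implementation; the function name and the rescaling to the range [0, 1] are hypothetical.

```python
def queue_variation_ratio(current_length, last_length, queue_sensitivity):
    """Map the change in job queue length to a value in [0, 1]."""
    # Scale the change by the sensitivity, then clip it to [-1, 1].
    variation = (current_length - last_length) / queue_sensitivity
    clipped = max(-1.0, min(1.0, variation))
    # Assumption: rescale to [0, 1] so that 0.5 means "no change".
    return (clipped + 1.0) / 2.0

small_growth = queue_variation_ratio(12, 10, queue_sensitivity=4.0)
large_growth = queue_variation_ratio(60, 10, queue_sensitivity=4.0)
```

With a sensitivity of 4, a change of +2 jobs is still distinguishable, while a change of +50 jobs saturates at the upper bound.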

last_job_queue_length

Last value of the job queue length. Used for calculating the variation ratio.

Type:	int

action_size

Action space size.

Utilized for output layer sizing in agent’s inner models.

Returns:	The size of the action space.

avg_job_completion_reward() → float

Reward when the objective is to minimize average job completion time.

It is the negative of the number of unfinished jobs in the system. As more jobs are completed, the reward increases.

Returns:	Negative of the number of unfinished jobs in the system.

avg_job_slowdown_reward() → float

Reward when the objective is to minimize average job slowdown.

It is the negative inverse summation of requested times. If the agent prioritizes short jobs, slowdowns will also decrease, because the working set of jobs shrinks as well.

Returns:	Negative inverse summation of requested times of all jobs active in the system.
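
A minimal sketch of this reward follows. It reads "negative inverse summation" as the sum of the negated reciprocals of the requested times, which is an assumption; the standalone function signature and the sample values are also hypothetical, since the documented method takes no arguments.

```python
def avg_job_slowdown_reward(requested_times):
    """Sum of -1/T over the requested times T of all active jobs."""
    return -sum(1.0 / t for t in requested_times)

# Two active jobs requesting 2 and 4 time units.
before = avg_job_slowdown_reward([2.0, 4.0])
# Once the short job has completed, only the long one remains active.
after = avg_job_slowdown_reward([4.0])
```

Under this reading, short jobs contribute the largest penalty while they remain in the system, so completing them first raises the reward fastest.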

avg_utilization_reward() → float

Reward when the objective is to maximize average utilization.

Average utilization is the average number of active resources during the simulation. Reward is then the number of active resources.

Returns:	Number of active resources.

edp_reward() → float

Reward when the objective is to minimize the Energy-Delay Product (EDP).

TODO

energy_consumption_reward() → float

Reward when the objective is to minimize total energy consumption.

It is the negative of the current power usage in the data centre. Keeping the power low will decrease the total energy consumed.

Returns:	Negative of the power usage in the data centre service.

makespan_reward() → float

Reward when the objective is to minimize makespan.

Makespan is the total time from the arrival of the first job to the completion of the last one. Reward is the total number of current GFLOPs in the data centre. Higher throughputs will lead to lower makespans.

Returns:	Current total GFLOPs provided by the data centre service.

observation_size

Observation space size.

Utilized for input layer sizing in agent’s inner models.

Returns:	The size of the observation space.
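
Going by the observation fields listed for the class, the size can be estimated from the platform dimensions. The sketch below is derived from that list and is an assumption about the actual computation; the standalone function and its parameter names are hypothetical.

```python
def observation_size(nb_nodes, nb_processors, nb_cores):
    """Estimate the observation vector length from platform dimensions."""
    node_fields = nb_nodes            # available memory per node
    processor_fields = nb_processors  # available memory bandwidth per processor
    core_fields = 3 * nb_cores        # GFLOPs, Watts and fraction left per core
    job_fields = 4 * 5                # time/cores/mem/mem_bw x five percentiles
    queue_fields = 1                  # job queue variation ratio
    return node_fields + processor_fields + core_fields + job_fields + queue_fields

# Hypothetical platform: 2 nodes, 4 processors, 16 cores.
size = observation_size(nb_nodes=2, nb_processors=4, nb_cores=16)
```

For this hypothetical platform the input layer of the agent's model would need 75 units.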

render(mode='human'): Not used.

reset(): Not used.

step(action: int) → None

Applies the selected action, altering the environment.

Jobs are mapped to available resources and the mapping is then communicated to Batsim. If the void action is selected, no scheduling occurs.

Parameters:	action (int) – Action ID to be applied.
