Collector Overview¶

Profile of Speed¶

We conduct speed tests of our collector in comparison with Tianshou, under environments with different scales and environment managers with different types.

As the results show, our collector generally outperforms Tianshou’s, especially under environments with large scales or long reset times.

Settings

We design three environments with different scales, whose parameters are as follows:

Observation Dim

Reset Time

Step Time

Infer Time

Small Env

64

0.1

0.005

0.004

Middle Env

300

0.5

0.01

0.008

Big Env

3000

2

0.1

0.02

where Observation Dim means the dimension of the observation, Reset Time, Step Time and Infer Time are the times the collector needs to reset the environment, execute a step of the environment and use the policy to infer a step. In our tests, the three types of Time uniformly fluctuate in [0.4, 1.6] times of their original values in the above table.

We conducts tests under sync and async modes, where the collector runs 8 environments in the environment manager. Under sync mode, we set wait_num as 8 and the collector waits for all 8 environments, while under async mode, it is 7. We retain the default values of timeout for both our and Tianshou’s env managers due to the different implementations of the async mode.

Wait Num

Timeout

Sync

8

None

Async

7

0.01(nx) / None(ts)

Basically, for each environment, we conduct 4 tests with the env managers under sync and async modes in ours and Tianshou, namely ours-sync, ours-async, ts-sync, ts-async. Additionally, we include our base env manager as baseline, namely ours-base.

We simulate 300 collections during one test with each collection producing at least 80 samples, and repeat each test 3 times to get the averages and standard variances. To approximate the real-world scenes, we run a pure cpu task to raise the utilization of cpu to ~60% before each test.

Results

The results of speed tests are as follows(unit: second):

ts-async

ts-sync

ours-base

ours-async

ours-sync

Small Env

49.54+0.35

44.63+0.09

157.70+0.30

47.60+0.62

47.19+1.13

Middle Env

93.02+0.09

88.70+0.14

311.88+0.22

90.84+0.67

82.73+1.51

Big Env

545.07+1.55

520.52+0.30

2592.77+0.25

519.05+1.05

467.50+2.18

We conclude the findings as:

The speeds of our subprocess collector is 3~5 times of the base collector’s.

Our collector is faster than Tianshou’s in all of the six settings except the Small Env under sync mode.

To approximate some environments with large reset times, e.g. Carla and StarCraft2, we further conduct a group of tests with all reset times multiplied by 5 (unit: second):

ts-async

ts-sync

ours-base

ours-async

ours-sync

Small Env

55.11+0.10

45.10+0.08

176.55+0.05

50.71+0.80

50.55+1.39

Middle Env

130.49+0.09

112.27+0.03

407.65+0.14

98.49+1.29

94.18+1.74

Big Env

703.49+0.61

577.92+0.30

2976.80+0.39

555.15+1.90

520.31+1.05

As shown above, Tianshou’s collector bears drastic speed decreases in such environments compared with our collector, which uses reset threads to avoid busy waiting and remain high performance under large reset times.

	Observation Dim	Reset Time	Step Time	Infer Time
Small Env	64	0.1	0.005	0.004
Middle Env	300	0.5	0.01	0.008
Big Env	3000	2	0.1	0.02

	Wait Num	Timeout
Sync	8	None
Async	7	0.01(nx) / None(ts)