Optimal Policies.txt
Format (one record per environment, in this order):
environment
initial state
convergence time
optimal policy (action sequence)
value function sequence along the policy
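
For readers who want to load these records programmatically, here is a minimal parsing sketch. It assumes only what is visible in this file: each field is a label line followed by a value line, and the value sequence is the unlabelled list right after the policy. The function name parse_records and the dictionary keys are hypothetical, chosen for illustration.

```python
import ast

# Label lines as they appear in this file, mapped to hypothetical dict keys.
LABELS = {
    "Environment:": "environment",
    "Initial State:": "initial_state",
    "All Values Converged at time:": "convergence_time",
    "Optimal Policy:": "policy",
}

def parse_records(text):
    """Return one dict per environment record found in `text`."""
    records, current, pending = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in LABELS:
            pending = LABELS[line]
            if pending == "environment":  # a new record starts here
                current = {}
                records.append(current)
            continue
        if current is None:
            continue  # header text before the first record
        if pending == "environment":
            current["environment"] = line  # keep the path as a plain string
        elif pending is not None:
            current[pending] = ast.literal_eval(line)
        elif line.startswith("["):
            # Unlabelled list after the policy: the value sequence.
            current["values"] = ast.literal_eval(line)
        pending = None
    return records

sample = """Environment:
./envs/doorkey-5x5-normal.env
Initial State:
(1, 2, 2, 0, 0)
All Values Converged at time:
187
Optimal Policy:
['TL', 'PK', 'TR', 'UD', 'MF', 'MF', 'TR', 'MF']
[20.0, 17.0, 16.0, 13.0, 12.0, 9.0, 6.0, 3.0, 0.0]
"""
records = parse_records(sample)
```

Separator lines and the PART headings are ignored by this sketch, so the whole file can be passed in at once.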
____________________________
PART A:
Environment:
./envs/doorkey-5x5-normal.env
Initial State:
(1, 2, 2, 0, 0)
All Values Converged at time:
187
Optimal Policy:
['TL', 'PK', 'TR', 'UD', 'MF', 'MF', 'TR', 'MF']
[20.0, 17.0, 16.0, 13.0, 12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-6x6-normal.env
Initial State:
(2, 4, 0, 0, 0)
All Values Converged at time:
182
Optimal Policy:
['MF', 'TR', 'PK', 'MF', 'MF', 'MF', 'TR', 'MF', 'UD', 'MF', 'MF', 'TR', 'MF', 'MF', 'MF']
[41.0, 38.0, 35.0, 34.0, 31.0, 28.0, 25.0, 22.0, 19.0, 18.0, 15.0, 12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-8x8-normal.env
Initial State:
(2, 2, 3, 0, 0)
All Values Converged at time:
172
Optimal Policy:
['TL', 'MF', 'TR', 'MF', 'MF', 'MF', 'TR', 'PK', 'TR', 'MF', 'MF', 'MF', 'MF', 'TR', 'UD', 'MF', 'MF', 'MF', 'TR', 'MF', 'MF', 'MF', 'MF', 'MF']
[68.0, 65.0, 62.0, 59.0, 56.0, 53.0, 50.0, 47.0, 46.0, 43.0, 40.0, 37.0, 34.0, 31.0, 28.0, 27.0, 24.0, 21.0, 18.0, 15.0, 12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-6x6-direct.env
Initial State:
(2, 1, 0, 0, 0)
All Values Converged at time:
188
Optimal Policy:
['TL', 'TL', 'MF', 'MF']
[12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-8x8-direct.env
Initial State:
(2, 1, 3, 0, 0)
All Values Converged at time:
185
Optimal Policy:
['TL', 'MF', 'MF', 'MF']
[12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-6x6-shortcut.env
Initial State:
(2, 1, 0, 0, 0)
All Values Converged at time:
188
Optimal Policy:
['PK', 'TL', 'TL', 'UD', 'MF', 'MF']
[14.0, 13.0, 10.0, 7.0, 6.0, 3.0, 0.0]
Environment:
./envs/doorkey-8x8-shortcut.env
Initial State:
(2, 1, 3, 0, 0)
All Values Converged at time:
183
Optimal Policy:
['MF', 'TR', 'PK', 'TR', 'MF', 'TR', 'MF', 'UD', 'MF', 'MF']
[26.0, 23.0, 20.0, 19.0, 16.0, 13.0, 10.0, 7.0, 6.0, 3.0, 0.0]
_______________________________________________________________
PART B (arbitrary test cases):
Environment:
./envs/random_envs\DoorKey-8x8_31.pickle
Initial State:
(3, 5, 1, 0, 1, 2, 1, 0)
All Values Converged at time:
178
Optimal Policy:
['TR', 'MF', 'MF', 'TR', 'MF']
[15.0, 12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/random_envs\DoorKey-8x8_20.pickle
Initial State:
(3, 5, 1, 0, 0, 1, 1, 0)
All Values Converged at time:
178
Optimal Policy:
['MF', 'MF', 'TL', 'PK', 'TR', 'MF', 'TR', 'UD', 'MF', 'MF', 'MF', 'TR', 'MF']
[35.0, 32.0, 29.0, 26.0, 25.0, 22.0, 19.0, 16.0, 15.0, 12.0, 9.0, 6.0, 3.0, 0.0]
Environment:
./envs/random_envs\DoorKey-8x8_28.pickle
Initial State:
(3, 5, 1, 0, 0, 2, 0, 0)
All Values Converged at time:
178
Optimal Policy:
['MF', 'MF', 'MF', 'MF', 'TL', 'MF', 'PK', 'TL', 'MF', 'MF', 'MF', 'MF', 'TL', 'MF', 'UD', 'MF', 'MF', 'TR', 'MF']
[53.0, 50.0, 47.0, 44.0, 41.0, 38.0, 35.0, 34.0, 31.0, 28.0, 25.0, 22.0, 19.0, 16.0, 13.0, 12.0, 9.0, 6.0, 3.0, 0.0]
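
Every value sequence above has one more entry than its policy and ends at 0.0, and consecutive values drop by 3 after 'MF', 'TL', or 'TR' and by 1 after 'PK' or 'UD'. The sketch below checks that relationship. The step costs are inferred from the listed numbers only and are an assumption, not taken from the solver; the name is_consistent is hypothetical.

```python
# Step costs inferred from the differences between consecutive listed values.
# Assumption: these are not confirmed by the solver's source code.
STEP_COST = {"MF": 3.0, "TL": 3.0, "TR": 3.0, "PK": 1.0, "UD": 1.0}

def is_consistent(policy, values, step_cost=STEP_COST, tol=1e-9):
    """Check that values[i] - values[i + 1] equals the cost of policy[i]."""
    if len(values) != len(policy) + 1:
        return False
    if abs(values[-1]) > tol:  # the sequence should end at zero cost-to-go
        return False
    return all(
        abs((values[i] - values[i + 1]) - step_cost[action]) <= tol
        for i, action in enumerate(policy)
    )

# Two records copied from this file.
policy_5x5 = ['TL', 'PK', 'TR', 'UD', 'MF', 'MF', 'TR', 'MF']
values_5x5 = [20.0, 17.0, 16.0, 13.0, 12.0, 9.0, 6.0, 3.0, 0.0]

policy_6x6_shortcut = ['PK', 'TL', 'TL', 'UD', 'MF', 'MF']
values_6x6_shortcut = [14.0, 13.0, 10.0, 7.0, 6.0, 3.0, 0.0]

ok_5x5 = is_consistent(policy_5x5, values_5x5)
ok_shortcut = is_consistent(policy_6x6_shortcut, values_6x6_shortcut)
```

Under these inferred costs, the first value of each sequence is the total cost of the listed policy from the initial state.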