1
00:00:00,060 --> 00:00:02,829
let's now speak briefly about stochastic
现在让我们简要地谈论一下随机

2
00:00:03,029 --> 00:00:08,080
games this is a topic that lends itself
游戏这是一个适合自己的话题

3
00:00:08,279 --> 00:00:11,859
to a very long discussion and quite a
经过很长时间的讨论， 

4
00:00:12,058 --> 00:00:16,240
complicated one but we'll touch on the
复杂的一个，但我们将涉及

5
00:00:16,440 --> 00:00:19,499
main points and position it to the
要点并定位到

6
00:00:19,699 --> 00:00:22,600
landscape of topics we're discussing and
我们正在讨论的主题以及

7
00:00:22,800 --> 00:00:24,970
so the strategy the striking point are
所以策略的重点是

8
00:00:25,170 --> 00:00:27,730
repeated games as we know a repeated
我们知道重复的游戏

9
00:00:27,929 --> 00:00:31,390
game is simply a game in normal form for
游戏只是一种普通形式的游戏

10
00:00:31,589 --> 00:00:33,640
example that will repeat over and over
这个例子会一遍又一遍地重复

11
00:00:33,840 --> 00:00:37,599
again for example we play prisoner's
再例如，我们扮演囚犯

12
00:00:37,799 --> 00:00:40,839
dilemma once twice three times maybe
两次困境两次，也许

13
00:00:41,039 --> 00:00:42,518
finite time maybe an infinite number of
有限的时间也许是无限的

14
00:00:42,719 --> 00:00:44,678
time then we accumulate all the rewards
时间，然后我们积累所有的奖励

15
00:00:44,878 --> 00:00:47,369
all the time to sum over all rewards
一直在总结所有奖励

16
00:00:47,570 --> 00:00:49,899
stochastic gamely is a generalization of
随机博弈是对

17
00:00:50,100 --> 00:00:53,619
it where we play games repeatedly but
它是我们反复玩游戏的地方，但是

18
00:00:53,820 --> 00:00:56,439
not necessarily the same game so we play
不一定是同一游戏，所以我们玩

19
00:00:56,640 --> 00:00:59,768
a game depending on how we played that
取决于我们如何玩的游戏

20
00:00:59,969 --> 00:01:01,809
game let's say prisoner's dilemma we
游戏让我们说囚徒的困境

21
00:01:02,009 --> 00:01:04,629
each got some payoff but depending on
每个人都有一些回报，但取决于

22
00:01:04,829 --> 00:01:06,869
how we play the game we also probable
我们如何玩游戏，我们也可能

23
00:01:07,069 --> 00:01:08,769
probabilistically transition to some
概率性地过渡到一些

24
00:01:08,969 --> 00:01:13,179
other game play that in turn and the
依次进行的其他游戏和

25
00:01:13,379 --> 00:01:17,109
process continues a graphical way to
过程继续以图形方式

26
00:01:17,310 --> 00:01:20,230
look at it is here if if this here is a
看这是这里，如果这是一个

27
00:01:20,430 --> 00:01:23,259
repeated game where you play the same
重复游戏，您玩同样的游戏

28
00:01:23,459 --> 00:01:26,079
game over and over again here you play
在这里玩一遍又一遍的游戏

29
00:01:26,280 --> 00:01:30,730
the game and then if you happen to play
游戏，然后如果您碰巧玩

30
00:01:30,930 --> 00:01:33,789
this your transition to this game if you
这是您向该游戏的过渡，如果您

31
00:01:33,989 --> 00:01:35,349
happen to just play this you maybe
碰巧只是玩这个，你也许

32
00:01:35,549 --> 00:01:37,628
transition to the same game if you play
如果您玩的话，过渡到同一个游戏

33
00:01:37,828 --> 00:01:39,429
this you change your transition here if
如果你改变这里的过渡

34
00:01:39,629 --> 00:01:40,689
you play this maybe you'll play this
你玩这个也许你会玩这个

35
00:01:40,890 --> 00:01:44,198
game again and so on from each game you
再玩一次，依此类推

36
00:01:44,399 --> 00:01:47,890
transition probabilistically to toe it
概率过渡到脚趾

37
00:01:48,090 --> 00:01:51,959
to other games this is a stochastic game
对其他游戏来说这是一个随机的游戏

38
00:01:52,159 --> 00:01:55,750
formally speaking it's it's the
正式来说是

39
00:01:55,950 --> 00:01:57,759
following tuple it's a lot of notation
以下元组有很多符号

40
00:01:57,959 --> 00:02:01,689
but the concept is exactly so we have is
但是这个概念正是我们所拥有的

41
00:02:01,890 --> 00:02:06,488
a finite set of states Q we have a set
有限的一组状态Q我们有一组

42
00:02:06,688 --> 00:02:11,340
of players we have a set of actions
我们有一系列行动

43
00:02:11,539 --> 00:02:13,660
where
哪里

44
00:02:13,860 --> 00:02:17,680
actions are available to two specific
有两个特定的操作可用

45
00:02:17,879 --> 00:02:20,620
players of a sub I is the action
子我是玩家的动作

46
00:02:20,819 --> 00:02:25,390
available player I and then we have two
可用的球员我，然后我们有两个

47
00:02:25,590 --> 00:02:29,650
two functions we have the transition
我们有两个功能过渡

48
00:02:29,849 --> 00:02:31,660
probability function so depending on the
概率函数，因此取决于

49
00:02:31,860 --> 00:02:35,130
state were in and on the actions we took
陈述我们采取的行动

50
00:02:35,330 --> 00:02:38,530
we move to each of any of the other
我们移到其他任何一个

51
00:02:38,729 --> 00:02:41,110
states or the very same state with a
状态或与一个状态完全相同的状态

52
00:02:41,310 --> 00:02:43,660
certain probability as governed by this
受此约束的特定概率

53
00:02:43,860 --> 00:02:50,320
probability distribution and and
概率分布和

54
00:02:50,520 --> 00:02:53,770
similarly a reward is the reward
同样，奖励就是奖励

55
00:02:53,969 --> 00:02:57,580
function which tells us if in a certain
告诉我们如果

56
00:02:57,780 --> 00:03:00,010
State a certain action profile was taken
说明采取了某种行动

57
00:03:00,210 --> 00:03:03,000
by the agents then this is a reward to
由代理商，那么这是对

58
00:03:03,199 --> 00:03:06,430
to that particular agent to eat each of
给那个特工吃

59
00:03:06,629 --> 00:03:08,920
the agents so our sub I is a reward to
代理商，所以我们的子我是对

60
00:03:09,120 --> 00:03:12,960
two agent I that's the formal definition
我是正式的两个代理人

61
00:03:13,159 --> 00:03:15,580
notice that it's sort of assumes
请注意，这是一种假设

62
00:03:15,780 --> 00:03:18,250
implicitly that you have the same action
暗含你有相同的动作

63
00:03:18,449 --> 00:03:24,520
spaces here but you could define it
这里有空格，但您可以定义它

64
00:03:24,719 --> 00:03:26,380
otherwise it simply would involve more
否则，它将涉及更多

65
00:03:26,580 --> 00:03:27,990
notations or nothing inherently
表示法或本来就不存在

66
00:03:28,189 --> 00:03:30,789
important about the action spaces being
关于动作空间很重要

67
00:03:30,989 --> 00:03:32,500
the same in the different games within
在不同的游戏中相同

68
00:03:32,699 --> 00:03:36,550
this stochastic game so just a few final
这个随机的游戏，所以最后几场

69
00:03:36,750 --> 00:03:39,520
comments on it first of all as we as we
首先像我们一样评论它

70
00:03:39,719 --> 00:03:42,100
saw this obviously general generalizes
看到这显然可以概括

71
00:03:42,300 --> 00:03:44,710
the notion of a repeated game but it
重复游戏的概念，但它

72
00:03:44,909 --> 00:03:47,469
also generalizes the notion of an MDP or
也概括了MDP或

73
00:03:47,669 --> 00:03:51,939
a Markov decision process if a if a
马尔可夫决策过程，如果

74
00:03:52,139 --> 00:03:54,009
sarcastic game if a repeated game is a
如果重复的游戏是一个讽刺游戏

75
00:03:54,209 --> 00:03:57,090
stochastic game with only one game a
一个只有一个游戏的随机游戏

76
00:03:57,289 --> 00:04:00,910
Markov decision process on MVP is a game
 MVP的马尔可夫决策过程是一个游戏

77
00:04:01,110 --> 00:04:04,390
with only one player and so you have
只有一个玩家，所以你有

78
00:04:04,590 --> 00:04:08,770
States there where the agents take agent
代理商所在的国家

79
00:04:08,969 --> 00:04:11,620
takes an action receives a remediate
采取行动得到补救

80
00:04:11,819 --> 00:04:13,920
reward and probably move to some other
奖励并可能转移到其他人

81
00:04:14,120 --> 00:04:16,900
state and the only difference is that he
状态，唯一的区别是他

82
00:04:17,100 --> 00:04:20,230
is the only actor in the setting I
是背景中我唯一的演员

83
00:04:20,430 --> 00:04:22,240
mentioned this because well empty peas
提到这个是因为豌豆很好

84
00:04:22,439 --> 00:04:24,680
have been studied
已经研究过

85
00:04:24,879 --> 00:04:27,860
naturally in a variety of disciplines
自然地在各种学科中

86
00:04:28,060 --> 00:04:30,139
from optimization to computer science to
从优化到计算机科学，再到

87
00:04:30,339 --> 00:04:34,400
pure math and beyond but also these two
纯粹的数学，以及超越这两个

88
00:04:34,600 --> 00:04:37,370
perspectives of generalization repeated
泛化观点重复

89
00:04:37,569 --> 00:04:41,050
games and of MVPs give you a sense for
游戏和MVP可以带给您

90
00:04:41,250 --> 00:04:44,030
the theory and investigations into
理论与研究

91
00:04:44,230 --> 00:04:47,240
stochastic games so from repeated games
重复游戏中的随机游戏

92
00:04:47,439 --> 00:04:52,129
we inherit the definitions of different
我们继承了不同的定义

93
00:04:52,329 --> 00:04:53,810
ways of the aggregating rewards over
汇总奖励的方式

94
00:04:54,009 --> 00:04:57,740
time you can have limited average
有限的平均时间

95
00:04:57,939 --> 00:05:01,389
rewards future discounted rewards
奖励未来的折扣奖励

96
00:05:01,589 --> 00:05:05,829
whereas from the literature on
而根据文献

97
00:05:06,029 --> 00:05:11,180
optimization and on MVPs we get notions
优化，在MVP上我们有了概念

98
00:05:11,379 --> 00:05:17,180
such as stationarity and Markovian
例如平稳性和马尔可夫式

99
00:05:17,379 --> 00:05:20,329
strategies these have to do with we also
这些与我们有关的策略

100
00:05:20,529 --> 00:05:22,370
have notions of reach ability about the
有关于

101
00:05:22,569 --> 00:05:24,079
structure of the underlying transition
基本过渡的结构

102
00:05:24,279 --> 00:05:26,569
probability and so again these are
概率，这些又是

103
00:05:26,769 --> 00:05:28,970
issues that are involved that we won't
我们不会涉及的问题

104
00:05:29,170 --> 00:05:33,620
get to into more in this lecture but at
在本讲座中深入了解，但在

105
00:05:33,819 --> 00:05:38,819
least we flagged their existence
至少我们标记了它们的存在