t for t in s if random.uniform(0, 1) > idx_to_pdiscard[t]]
for s in coded_dataset]
```
### Extracting Center Words and Context Words
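Let us first recall the skip-gram model. In the skip-gram model, we use one word (the center word) to predict the words that surround it in the text sequence, that is, the context words that fall within the same context window as the center word. Let the maximum context window size be `max_window_size`. We first sample a positive integer $h$ uniformly at random from the integers between 1 and `max_window_size` (inclusive) and use it as the context window size. Within the same sentence, every word whose distance from the center word is at most $h$ is a context word of that center word. For example, consider the sentence "the man loves his son a lot". If the center word is "loves" and $h$ is 2, its context words are "the", "man", "his", and "son". If the center word is "lot" and $h$ is 3, its context words are "his", "son", and "a"; the window is truncated at the end of the sentence, so there are no context words to the right.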
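The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `context_of` and `get_centers_and_contexts` are hypothetical, and the sketch assumes that `dataset` is a list of sentences, each a list of tokens (words or word indices), and that sentences with fewer than two tokens are skipped because they cannot form a pair of center word and context word.

```python
import random


def context_of(sentence, center_i, h):
    """Return the tokens within distance h of position center_i.

    Hypothetical helper: the window is clipped at the sentence
    boundaries, and the center token itself is excluded.
    """
    lo = max(0, center_i - h)
    hi = min(len(sentence), center_i + 1 + h)
    return [sentence[i] for i in range(lo, hi) if i != center_i]


def get_centers_and_contexts(dataset, max_window_size):
    """Return parallel lists of center tokens and their context tokens."""
    centers, contexts = [], []
    for sentence in dataset:
        # Assumption: a sentence needs at least two tokens to yield
        # a (center word, context word) pair.
        if len(sentence) < 2:
            continue
        for center_i, center in enumerate(sentence):
            # Sample h uniformly from {1, ..., max_window_size};
            # random.randint includes both endpoints.
            h = random.randint(1, max_window_size)
            centers.append(center)
            contexts.append(context_of(sentence, center_i, h))
    return centers, contexts


sentence = "the man loves his son a lot".split()
print(context_of(sentence, 2, 2))  # ['the', 'man', 'his', 'son']
print(context_of(sentence, 6, 3))  # ['his', 'son', 'a']
```

The two `print` calls reproduce the examples from the text with a fixed $h$. Because `get_centers_and_contexts` samples $h$ anew for every center word, its output generally differs between runs unless the random seed is fixed or `max_window_size` is 1.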
[1] Penn Tree Bank. https://catalog.ldc.upenn.edu/ldc99t42
[2] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).