Commit eb48eee8 authored by 绝不原创的飞龙

2024-02-05 13:15:51

Parent 609f7f27
This diff is collapsed.
......@@ -72,8 +72,7 @@
prefs: []
type: TYPE_NORMAL
- en: '**Don’t use linear layers that are too large.** A linear layer `nn.Linear(m,
n)` uses $O(nm)$ memory: that is to say, the memory requirements of the weights
scale quadratically
with the number of features. It is very easy to [blow through your memory](https://github.com/pytorch/pytorch/issues/958)
this way (and remember that you will need at least twice the size of the weights,
......
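- en: 'A minimal sketch of the scaling above (sizes are illustrative): counting the
parameters of a large `nn.Linear` shows how quickly the weights alone consume memory.'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch.nn as nn

  # nn.Linear(m, n) stores an n-by-m weight matrix, so memory grows as O(nm).
  m, n = 10_000, 10_000
  layer = nn.Linear(m, n)
  n_params = sum(p.numel() for p in layer.parameters())
  print(f"{n_params:,} parameters = {n_params * 4 / 2**30:.2f} GiB in float32")
  # Training needs at least twice this: gradients mirror the weights.
  ```
prefs: []
type: TYPE_PRE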
This diff is collapsed.
......@@ -186,17 +186,15 @@
prefs: []
type: TYPE_TB
- en: '| [`arange`](generated/torch.arange.html#torch.arange "torch.arange") | Returns
a 1-D tensor of size $\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil$
with values from the interval `[start, end)` taken with common difference `step`
beginning from start. |'
prefs: []
type: TYPE_TB
- en: '| [`range`](generated/torch.range.html#torch.range "torch.range") | Returns
a 1-D tensor of size $\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1$
with values from `start` to `end` with step `step`. |'
prefs: []
type: TYPE_TB
......@@ -552,16 +550,12 @@
prefs: []
type: TYPE_TB
- en: '| [`rand`](generated/torch.rand.html#torch.rand "torch.rand") | Returns a tensor
filled with random numbers from a uniform distribution on the interval $[0, 1)$ |'
prefs: []
type: TYPE_TB
- en: '| [`rand_like`](generated/torch.rand_like.html#torch.rand_like "torch.rand_like")
| Returns a tensor with the same size as `input` that is filled with random numbers
from a uniform distribution on the interval $[0, 1)$. |'
prefs: []
type: TYPE_TB
- en: '| [`randint`](generated/torch.randint.html#torch.randint "torch.randint") |
......@@ -817,9 +811,7 @@
prefs: []
type: TYPE_TB
- en: '| [`atan2`](generated/torch.atan2.html#torch.atan2 "torch.atan2") | Element-wise
arctangent of $\text{input}_{i} / \text{other}_{i}$
with consideration of the quadrant. |'
prefs: []
type: TYPE_TB
......@@ -974,9 +966,7 @@
prefs: []
type: TYPE_TB
- en: '| [`gradient`](generated/torch.gradient.html#torch.gradient "torch.gradient")
| Estimates the gradient of a function $g : \mathbb{R}^n \rightarrow \mathbb{R}$
in one or more dimensions using the [second-order accurate central differences
method](https://www.ams.org/journals/mcom/1988-51-184/S0025-5718-1988-0935077-0/S0025-5718-1988-0935077-0.pdf)
and either first or second order estimates at the boundaries. |'
......@@ -1465,9 +1455,8 @@
element-wise minimum of `input` and `other`. |'
prefs: []
type: TYPE_TB
- en: '| [`ne`](generated/torch.ne.html#torch.ne "torch.ne") | Computes $\text{input} \neq \text{other}$ element-wise. |'
prefs: []
type: TYPE_TB
- en: '| [`not_equal`](generated/torch.not_equal.html#torch.not_equal "torch.not_equal")
......@@ -1947,9 +1936,7 @@
type: TYPE_TB
- en: '| [`svd_lowrank`](generated/torch.svd_lowrank.html#torch.svd_lowrank "torch.svd_lowrank")
| Return the singular value decomposition `(U, S, V)` of a matrix, batches of
matrices, or a sparse matrix $A$ such that $A \approx U diag(S) V^T$.
|'
prefs: []
type: TYPE_TB
......
......@@ -498,11 +498,8 @@
prefs: []
type: TYPE_TB
- en: '| [`nn.Softplus`](generated/torch.nn.Softplus.html#torch.nn.Softplus "torch.nn.Softplus")
| Applies the Softplus function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$
element-wise. |'
prefs: []
type: TYPE_TB
......@@ -527,10 +524,7 @@
prefs: []
type: TYPE_TB
- en: '| [`nn.GLU`](generated/torch.nn.GLU.html#torch.nn.GLU "torch.nn.GLU") | Applies
the gated linear unit function ${GLU}(a, b)= a \otimes \sigma(b)$
where $a$
is the first half of the input matrices and $b$ is the second
half. |'
......@@ -558,9 +552,7 @@
prefs: []
type: TYPE_TB
- en: '| [`nn.LogSoftmax`](generated/torch.nn.LogSoftmax.html#torch.nn.LogSoftmax
"torch.nn.LogSoftmax") | Applies the <math><semantics><mrow><mi>log</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mtext>Softmax</mtext><mo stretchy="false">(</mo><mi>x</mi><mo
stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\log(\text{Softmax}(x))</annotation></semantics></math>log(Softmax(x))
"torch.nn.LogSoftmax") | Applies the $\log(\text{Softmax}(x))$log(Softmax(x))
function to an n-dimensional input Tensor. |'
prefs: []
type: TYPE_TB
......@@ -893,9 +885,7 @@
type: TYPE_TB
- en: '| [`nn.MultiLabelSoftMarginLoss`](generated/torch.nn.MultiLabelSoftMarginLoss.html#torch.nn.MultiLabelSoftMarginLoss
"torch.nn.MultiLabelSoftMarginLoss") | Creates a criterion that optimizes a multi-label
one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$. |'
prefs: []
type: TYPE_TB
- en: '| [`nn.CosineEmbeddingLoss`](generated/torch.nn.CosineEmbeddingLoss.html#torch.nn.CosineEmbeddingLoss
......@@ -909,9 +899,7 @@
"torch.nn.MultiMarginLoss") | Creates a criterion that optimizes a multi-class
classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch
Tensor) and output $y$
(which is a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$):
|'
prefs: []
type: TYPE_TB
......
......@@ -204,12 +204,8 @@
prefs: []
type: TYPE_TB
- en: '| [`relu6`](generated/torch.nn.functional.relu6.html#torch.nn.functional.relu6
"torch.nn.functional.relu6") | Applies the element-wise function <math><semantics><mrow><mtext>ReLU6</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>min</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mi>max</mi><mo>⁡</mo><mo stretchy="false">(</mo><mn>0</mn><mo
separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mn>6</mn><mo
stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\text{ReLU6}(x)
= \min(\max(0,x), 6)</annotation></semantics></math>ReLU6(x)=min(max(0,x),6).
"torch.nn.functional.relu6") | Applies the element-wise function $\text{ReLU6}(x)
= \min(\max(0,x), 6)$ReLU6(x)=min(max(0,x),6).
|'
prefs: []
type: TYPE_TB
......@@ -223,39 +219,22 @@
prefs: []
type: TYPE_TB
- en: '| [`selu`](generated/torch.nn.functional.selu.html#torch.nn.functional.selu
"torch.nn.functional.selu") | Applies element-wise, <math><semantics><mrow><mtext>SELU</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>c</mi><mi>a</mi><mi>l</mi><mi>e</mi><mo>∗</mo><mo
stretchy="false">(</mo><mi>max</mi><mo>⁡</mo><mo stretchy="false">(</mo><mn>0</mn><mo
separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mi>min</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>α</mi><mo>∗</mo><mo
stretchy="false">(</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>x</mi><mo
stretchy="false">)</mo><mo>−</mo><mn>1</mn><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo
stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\text{SELU}(x)
= scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))</annotation></semantics></math>SELU(x)=scale∗(max(0,x)+min(0,α∗(exp(x)−1))),
"torch.nn.functional.selu") | Applies element-wise, $\text{SELU}(x)
= scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$SELU(x)=scale∗(max(0,x)+min(0,α∗(exp(x)−1))),
with $\alpha=1.6732632423543772848170429916717$α=1.6732632423543772848170429916717
and $scale=1.0507009873554804934193349852946$scale=1.0507009873554804934193349852946.
|'
prefs: []
type: TYPE_TB
- en: '| [`celu`](generated/torch.nn.functional.celu.html#torch.nn.functional.celu
"torch.nn.functional.celu") | Applies element-wise, <math><semantics><mrow><mtext>CELU</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mi>min</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>α</mi><mo>∗</mo><mo
stretchy="false">(</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>x</mi><mi
mathvariant="normal">/</mi><mi>α</mi><mo stretchy="false">)</mo><mo>−</mo><mn>1</mn><mo
stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\text{CELU}(x)
= \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))</annotation></semantics></math>CELU(x)=max(0,x)+min(0,α∗(exp(x/α)−1)).
"torch.nn.functional.celu") | Applies element-wise, $\text{CELU}(x)
= \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$CELU(x)=max(0,x)+min(0,α∗(exp(x/α)−1)).
|'
prefs: []
type: TYPE_TB
- en: '| [`leaky_relu`](generated/torch.nn.functional.leaky_relu.html#torch.nn.functional.leaky_relu
"torch.nn.functional.leaky_relu") | Applies element-wise, <math><semantics><mrow><mtext>LeakyReLU</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mtext>negative_slope</mtext><mo>∗</mo><mi>min</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope}
* \min(0, x)</annotation></semantics></math>LeakyReLU(x)=max(0,x)+negative_slope∗min(0,x)
"torch.nn.functional.leaky_relu") | Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope}
* \min(0, x)$LeakyReLU(x)=max(0,x)+negative_slope∗min(0,x)
|'
prefs: []
type: TYPE_TB
......@@ -265,11 +244,7 @@
prefs: []
type: TYPE_TB
- en: '| [`prelu`](generated/torch.nn.functional.prelu.html#torch.nn.functional.prelu
"torch.nn.functional.prelu") | Applies element-wise the function <math><semantics><mrow><mtext>PReLU</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mtext>weight</mtext><mo>∗</mo><mi>min</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)</annotation></semantics></math>PReLU(x)=max(0,x)+weight∗min(0,x)
"torch.nn.functional.prelu") | Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$PReLU(x)=max(0,x)+weight∗min(0,x)
where weight is a learnable parameter. |'
prefs: []
type: TYPE_TB
......@@ -288,20 +263,13 @@
type: TYPE_TB
- en: '| [`gelu`](generated/torch.nn.functional.gelu.html#torch.nn.functional.gelu
"torch.nn.functional.gelu") | When the approximate argument is ''none'', it applies
element-wise the function $\text{GELU}(x) = x * \Phi(x)$
|'
prefs: []
type: TYPE_TB
- en: '| [`logsigmoid`](generated/torch.nn.functional.logsigmoid.html#torch.nn.functional.logsigmoid
"torch.nn.functional.logsigmoid") | Applies element-wise <math><semantics><mrow><mtext>LogSigmoid</mtext><mo
stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy="false">)</mo><mo>=</mo><mi>log</mi><mo>⁡</mo><mrow><mo
fence="true">(</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mo>−</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy="false">)</mo></mrow></mfrac><mo
fence="true">)</mo></mrow></mrow><annotation encoding="application/x-tex">\text{LogSigmoid}(x_i)
= \log \left(\frac{1}{1 + \exp(-x_i)}\right)</annotation></semantics></math>LogSigmoid(xi​)=log(1+exp(−xi​)1​)
"torch.nn.functional.logsigmoid") | Applies element-wise $\text{LogSigmoid}(x_i)
= \log \left(\frac{1}{1 + \exp(-x_i)}\right)$LogSigmoid(xi​)=log(1+exp(−xi​)1​)
|'
prefs: []
type: TYPE_TB
......@@ -311,27 +279,18 @@
prefs: []
type: TYPE_TB
- en: '| [`tanhshrink`](generated/torch.nn.functional.tanhshrink.html#torch.nn.functional.tanhshrink
"torch.nn.functional.tanhshrink") | Applies element-wise, <math><semantics><mrow><mtext>Tanhshrink</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>x</mi><mo>−</mo><mtext>Tanh</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">\text{Tanhshrink}(x) = x - \text{Tanh}(x)</annotation></semantics></math>Tanhshrink(x)=x−Tanh(x)
"torch.nn.functional.tanhshrink") | Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$Tanhshrink(x)=x−Tanh(x)
|'
prefs: []
type: TYPE_TB
- en: '| [`softsign`](generated/torch.nn.functional.softsign.html#torch.nn.functional.softsign
"torch.nn.functional.softsign") | Applies element-wise, the function <math><semantics><mrow><mtext>SoftSign</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mi>x</mi><mrow><mn>1</mn><mo>+</mo><mi
mathvariant="normal">∣</mi><mi>x</mi><mi mathvariant="normal">∣</mi></mrow></mfrac></mrow><annotation
encoding="application/x-tex">\text{SoftSign}(x) = \frac{x}{1 + &#124;x&#124;}</annotation></semantics></math>SoftSign(x)=1+∣x∣x​
"torch.nn.functional.softsign") | Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + &#124;x&#124;}$SoftSign(x)=1+∣x∣x​
|'
prefs: []
type: TYPE_TB
- en: '| [`softplus`](generated/torch.nn.functional.softplus.html#torch.nn.functional.softplus
"torch.nn.functional.softplus") | Applies element-wise, the function <math><semantics><mrow><mtext>Softplus</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mi>β</mi></mfrac><mo>∗</mo><mi>log</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mn>1</mn><mo>+</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>β</mi><mo>∗</mo><mi>x</mi><mo
stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\text{Softplus}(x)
= \frac{1}{\beta} * \log(1 + \exp(\beta * x))</annotation></semantics></math>Softplus(x)=β1​∗log(1+exp(β∗x)).
"torch.nn.functional.softplus") | Applies element-wise, the function $\text{Softplus}(x)
= \frac{1}{\beta} * \log(1 + \exp(\beta * x))$Softplus(x)=β1​∗log(1+exp(β∗x)).
|'
prefs: []
type: TYPE_TB
......@@ -360,23 +319,13 @@
prefs: []
type: TYPE_TB
- en: '| [`tanh`](generated/torch.nn.functional.tanh.html#torch.nn.functional.tanh
"torch.nn.functional.tanh") | Applies element-wise, <math><semantics><mrow><mtext>Tanh</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>tanh</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>−</mo><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mo>−</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><mrow><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>+</mo><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mo>−</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation
encoding="application/x-tex">\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x)
+ \exp(-x)}</annotation></semantics></math>Tanh(x)=tanh(x)=exp(x)+exp(−x)exp(x)−exp(−x)​
"torch.nn.functional.tanh") | Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x)
+ \exp(-x)}$Tanh(x)=tanh(x)=exp(x)+exp(−x)exp(x)−exp(−x)​
|'
prefs: []
type: TYPE_TB
- en: '| [`sigmoid`](generated/torch.nn.functional.sigmoid.html#torch.nn.functional.sigmoid
"torch.nn.functional.sigmoid") | Applies the element-wise function <math><semantics><mrow><mtext>Sigmoid</mtext><mo
stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><mi>exp</mi><mo>⁡</mo><mo
stretchy="false">(</mo><mo>−</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation
encoding="application/x-tex">\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}</annotation></semantics></math>Sigmoid(x)=1+exp(−x)1​
"torch.nn.functional.sigmoid") | Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$Sigmoid(x)=1+exp(−x)1​
|'
prefs: []
type: TYPE_TB
......@@ -624,26 +573,17 @@
type: TYPE_NORMAL
- en: '| [`pixel_shuffle`](generated/torch.nn.functional.pixel_shuffle.html#torch.nn.functional.pixel_shuffle
"torch.nn.functional.pixel_shuffle") | Rearranges elements in a tensor of shape
$(*, C \times r^2, H, W)$
to a tensor of shape $(*, C, H \times r, W \times r)$,
where r is the `upscale_factor`. |'
prefs: []
type: TYPE_TB
- en: '| [`pixel_unshuffle`](generated/torch.nn.functional.pixel_unshuffle.html#torch.nn.functional.pixel_unshuffle
"torch.nn.functional.pixel_unshuffle") | Reverses the [`PixelShuffle`](generated/torch.nn.PixelShuffle.html#torch.nn.PixelShuffle
"torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape
$(*, C, H \times r, W \times r)$ to a tensor
of shape $(*, C \times r^2, H, W)$,
where r is the `downscale_factor`. |'
prefs: []
type: TYPE_TB
......
......@@ -626,19 +626,14 @@
prefs: []
type: TYPE_TB
- en: '| [`Tensor.bernoulli`](generated/torch.Tensor.bernoulli.html#torch.Tensor.bernoulli
"torch.Tensor.bernoulli") | Returns a result tensor where each <math><semantics><mrow><mtext
mathvariant="monospace">result[i]</mtext></mrow><annotation encoding="application/x-tex">\texttt{result[i]}</annotation></semantics></math>result[i]
is independently sampled from <math><semantics><mrow><mtext>Bernoulli</mtext><mo
stretchy="false">(</mo><mtext mathvariant="monospace">self[i]</mtext><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">\text{Bernoulli}(\texttt{self[i]})</annotation></semantics></math>Bernoulli(self[i]).
"torch.Tensor.bernoulli") | Returns a result tensor where each $\texttt{result[i]}$result[i]
is independently sampled from $\text{Bernoulli}(\texttt{self[i]})$Bernoulli(self[i]).
|'
prefs: []
type: TYPE_TB
- en: '| [`Tensor.bernoulli_`](generated/torch.Tensor.bernoulli_.html#torch.Tensor.bernoulli_
"torch.Tensor.bernoulli_") | Fills each location of `self` with an independent
sample from $\text{Bernoulli}(\texttt{p})$.
|'
prefs: []
type: TYPE_TB
......
......@@ -19,12 +19,8 @@
and the pathwise derivative estimator. REINFORCE is commonly seen as the basis
for policy gradient methods in reinforcement learning, and the pathwise derivative
estimator is commonly seen in the reparameterization trick in variational autoencoders.
Whilst the score function only requires the value of samples $f(x)$, the pathwise
derivative requires the derivative $f'(x)$.
The next sections discuss these two in a reinforcement learning example. For more
details see [Gradient Estimation Using Stochastic Computation Graphs](https://arxiv.org/abs/1506.05254)
.
......@@ -38,19 +34,13 @@
parameters, we only need `sample()` and `log_prob()` to implement REINFORCE:'
prefs: []
type: TYPE_NORMAL
- en: $\Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta}$
prefs: []
type: TYPE_NORMAL
- en: where $\theta$ are the parameters, $\alpha$ is the learning rate, $r$
is the reward and $p(a|\pi^\theta(s))$
is the probability of taking action $a$ in state $s$ given policy $\pi^\theta$.
prefs: []
type: TYPE_NORMAL
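- en: 'As a minimal, hypothetical sketch of this update rule (toy state, stubbed
reward; `policy_net` is an illustrative stand-in), `sample()` and `log_prob()` give
a surrogate loss whose gradient matches the expression above:'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.distributions import Categorical

  policy_net = torch.nn.Linear(4, 2)        # maps state features to action logits
  optimizer = torch.optim.SGD(policy_net.parameters(), lr=1e-2)

  state = torch.randn(4)                    # toy state s
  m = Categorical(logits=policy_net(state))
  action = m.sample()                       # a ~ p(a | pi^theta(s))
  reward = torch.tensor(1.0)                # r, stubbed in place of an environment

  loss = -reward * m.log_prob(action)       # minimizing this applies the REINFORCE update
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()
  ```
prefs: []
type: TYPE_PRE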
......@@ -371,24 +361,14 @@
is defined below
prefs: []
type: TYPE_NORMAL
- en: $p_{F}(x; \theta) = \exp(\langle t(x), \theta\rangle - F(\theta) + k(x))$
prefs: []
type: TYPE_NORMAL
- en: where $\theta$ denotes the natural parameters, $t(x)$ denotes the sufficient
statistic, $F(\theta)$ is the log normalizer function for a given family and $k(x)$
is the carrier measure.
prefs: []
type: TYPE_NORMAL
......@@ -667,10 +647,8 @@
"torch.multinomial") samples from.
prefs: []
type: TYPE_NORMAL
- en: Samples are integers from $\{0, \ldots, K-1\}$ where K is `probs.size(-1)`.
prefs: []
type: TYPE_NORMAL
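- en: 'For instance (hypothetical probabilities):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.distributions import Categorical

  m = Categorical(torch.tensor([0.1, 0.2, 0.3, 0.4]))  # K = 4
  m.sample()  # returns an integer index in {0, 1, 2, 3}
  ```
prefs: []
type: TYPE_PRE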
- en: If probs is 1-dimensional with length-K, each element is the relative probability
......@@ -1231,28 +1209,19 @@
of Bernoulli trials.
prefs: []
type: TYPE_NORMAL
- en: $P(X=k) = (1-p)^{k} p, k = 0, 1, ...$
prefs: []
type: TYPE_NORMAL
- en: Note
prefs: []
type: TYPE_NORMAL
- en: '[`torch.distributions.geometric.Geometric()`](#torch.distributions.geometric.Geometric
"torch.distributions.geometric.Geometric") <math><semantics><mrow><mo stretchy="false">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo
stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(k+1)</annotation></semantics></math>(k+1)-th
trial is the first success hence draws samples in <math><semantics><mrow><mo stretchy="false">{</mo><mn>0</mn><mo
separator="true">,</mo><mn>1</mn><mo separator="true">,</mo><mo>…</mo><mo stretchy="false">}</mo></mrow><annotation
encoding="application/x-tex">\{0, 1, \ldots\}</annotation></semantics></math>{0,1,…},
"torch.distributions.geometric.Geometric") $(k+1)$(k+1)-th
trial is the first success hence draws samples in $\{0, 1, \ldots\}${0,1,…},
whereas [`torch.Tensor.geometric_()`](generated/torch.Tensor.geometric_.html#torch.Tensor.geometric_
"torch.Tensor.geometric_") k-th trial is the first success hence draws samples
in <math><semantics><mrow><mo stretchy="false">{</mo><mn>1</mn><mo separator="true">,</mo><mn>2</mn><mo
separator="true">,</mo><mo>…</mo><mo stretchy="false">}</mo></mrow><annotation
encoding="application/x-tex">\{1, 2, \ldots\}</annotation></semantics></math>{1,2,…}.'
in $\{1, 2, \ldots\}${1,2,…}.'
prefs: []
type: TYPE_NORMAL
- en: 'Example:'
......@@ -1719,9 +1688,7 @@
- en: 'LKJ distribution for lower Cholesky factor of correlation matrices. The distribution
is controlled by `concentration` parameter $\eta$ to make the
probability of the correlation matrix $M$ generated from
a Cholesky factor proportional to $\det(M)^{\eta - 1}$.
Because of that, when `concentration == 1`, we have a uniform distribution over
Cholesky factors of correlation matrices:'
prefs: []
......@@ -2209,9 +2176,7 @@
a positive definite covariance matrix $\mathbf{\Sigma}$
or a positive definite precision matrix $\mathbf{\Sigma}^{-1}$
or a lower-triangular matrix $\mathbf{L}$ with
positive-valued diagonal entries, such that $\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top$.
This triangular matrix can be obtained via e.g. Cholesky decomposition of the
covariance.
prefs: []
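- en: 'A small sketch of the Cholesky parameterization (illustrative 2-D values):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.distributions import MultivariateNormal

  # Lower-triangular L with positive diagonal; Sigma = L @ L.T
  L = torch.tensor([[1.0, 0.0],
                    [0.5, 1.0]])
  mvn = MultivariateNormal(loc=torch.zeros(2), scale_tril=L)
  mvn.sample()
  ```
prefs: []
type: TYPE_PRE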
......@@ -2623,11 +2588,7 @@
- en: Samples are nonnegative integers, with a pmf given by
prefs: []
type: TYPE_NORMAL
- en: $\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}$
prefs: []
type: TYPE_NORMAL
......@@ -3250,9 +3211,7 @@
type: TYPE_NORMAL
- en: Creates a Wishart distribution parameterized by a symmetric positive definite
matrix $\Sigma$, or its Cholesky
decomposition $\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top$
prefs: []
type: TYPE_NORMAL
- en: Example
......@@ -3369,18 +3328,11 @@
- en: '[PRE525]'
prefs: []
type: TYPE_PRE
- en: Compute Kullback-Leibler divergence $KL(p \| q)$ between two distributions.
prefs: []
type: TYPE_NORMAL
- en: $KL(p \| q) = \int p(x) \log\frac {p(x)} {q(x)} \,dx$
prefs: []
type: TYPE_NORMAL
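- en: 'For example (two hypothetical normals):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.distributions import Normal, kl_divergence

  p = Normal(torch.tensor(0.0), torch.tensor(1.0))
  q = Normal(torch.tensor(1.0), torch.tensor(2.0))
  kl_divergence(p, q)  # closed-form KL(p || q) between two Normals
  ```
prefs: []
type: TYPE_PRE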
- en: Parameters
......@@ -3808,9 +3760,8 @@
- en: '[PRE530]'
prefs: []
type: TYPE_PRE
- en: Transform via the mapping $y = |x|$.
prefs: []
type: TYPE_NORMAL
- en: '[PRE531]'
......@@ -3877,9 +3828,7 @@
- en: '[PRE535]'
prefs: []
type: TYPE_PRE
- en: 'Transforms an unconstrained real vector $x$ with length $D*(D-1)/2$
into the Cholesky factor of a D-dimension correlation matrix. This Cholesky factor
is a lower triangular matrix with positive diagonals and unit Euclidean norm for
each row. The transform is processed as follows:'
......@@ -3904,18 +3853,11 @@
triangular part, we apply a *signed* version of class [`StickBreakingTransform`](#torch.distributions.transforms.StickBreakingTransform
"torch.distributions.transforms.StickBreakingTransform") to transform $X_i$Xi​ into a unit
Euclidean length vector using the following steps: - Scales into the interval
<math><semantics><mrow><mo stretchy="false">(</mo><mo>−</mo><mn>1</mn><mo separator="true">,</mo><mn>1</mn><mo
stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(-1, 1)</annotation></semantics></math>(−1,1)
domain: <math><semantics><mrow><msub><mi>r</mi><mi>i</mi></msub><mo>=</mo><mi>tanh</mi><mo>⁡</mo><mo
stretchy="false">(</mo><msub><mi>X</mi><mi>i</mi></msub><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">r_i = \tanh(X_i)</annotation></semantics></math>ri​=tanh(Xi​).
$(-1, 1)$(−1,1)
domain: $r_i = \tanh(X_i)$ri​=tanh(Xi​).
- Transforms into an unsigned domain: $z_i = r_i^2$zi​=ri2​.
- Applies <math><semantics><mrow><msub><mi>s</mi><mi>i</mi></msub><mo>=</mo><mi>S</mi><mi>t</mi><mi>i</mi><mi>c</mi><mi>k</mi><mi>B</mi><mi>r</mi><mi>e</mi><mi>a</mi><mi>k</mi><mi>i</mi><mi>n</mi><mi>g</mi><mi>T</mi><mi>r</mi><mi>a</mi><mi>n</mi><mi>s</mi><mi>f</mi><mi>o</mi><mi>r</mi><mi>m</mi><mo
stretchy="false">(</mo><msub><mi>z</mi><mi>i</mi></msub><mo stretchy="false">)</mo></mrow><annotation
encoding="application/x-tex">s_i = StickBreakingTransform(z_i)</annotation></semantics></math>si​=StickBreakingTransform(zi​).
- Transforms back into signed domain: <math><semantics><mrow><msub><mi>y</mi><mi>i</mi></msub><mo>=</mo><mi>s</mi><mi>i</mi><mi>g</mi><mi>n</mi><mo
stretchy="false">(</mo><msub><mi>r</mi><mi>i</mi></msub><mo stretchy="false">)</mo><mo>∗</mo><msqrt><msub><mi>s</mi><mi>i</mi></msub></msqrt></mrow><annotation
encoding="application/x-tex">y_i = sign(r_i) * \sqrt{s_i}</annotation></semantics></math>yi​=sign(ri​)∗si​​.'
- Applies $s_i = StickBreakingTransform(z_i)$si​=StickBreakingTransform(zi​).
- Transforms back into signed domain: $y_i = sign(r_i) * \sqrt{s_i}$yi​=sign(ri​)∗si​​.'
prefs:
- PREF_BQ
- PREF_OL
......@@ -3943,9 +3885,7 @@
- en: '[PRE538]'
prefs: []
type: TYPE_PRE
- en: Transform via the mapping $y = \exp(x)$.
prefs: []
type: TYPE_NORMAL
- en: '[PRE539]'
......@@ -4018,30 +3958,22 @@
- en: '[PRE544]'
prefs: []
type: TYPE_PRE
- en: Transform via the mapping $y = \frac{1}{1 + \exp(-x)}$ and $x = \text{logit}(y)$.
prefs: []
type: TYPE_NORMAL
- en: '[PRE545]'
prefs: []
type: TYPE_PRE
- en: Transform via the mapping $\text{Softplus}(x) = \log(1 + \exp(x))$.
The implementation reverts to the linear function when $x > 20$.
prefs: []
type: TYPE_NORMAL
- en: '[PRE546]'
prefs: []
type: TYPE_PRE
- en: Transform via the mapping $y = \tanh(x)$.
prefs: []
type: TYPE_NORMAL
- en: It is equivalent to `` ComposeTransform([AffineTransform(0., 2.), SigmoidTransform(),
......@@ -4055,9 +3987,7 @@
- en: '[PRE547]'
prefs: []
type: TYPE_PRE
- en: Transform from unconstrained space to the simplex via $y = \exp(x)$
then normalizing.
prefs: []
type: TYPE_NORMAL
......
This diff is collapsed.
......@@ -88,9 +88,7 @@
- en: Fill the input Tensor with values drawn from the uniform distribution.
prefs: []
type: TYPE_NORMAL
- en: $\mathcal{U}(a, b)$.
prefs: []
type: TYPE_NORMAL
- en: Parameters
......@@ -135,9 +133,7 @@
- en: Fill the input Tensor with values drawn from the normal distribution.
prefs: []
type: TYPE_NORMAL
- en: $\mathcal{N}(\text{mean}, \text{std}^2)$.
prefs: []
type: TYPE_NORMAL
- en: Parameters
......@@ -317,15 +313,12 @@
type: TYPE_NORMAL
- en: The method is described in Understanding the difficulty of training deep feedforward
neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have
values sampled from $\mathcal{U}(-a, a)$ where
prefs: []
type: TYPE_NORMAL
- en: $a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$
prefs: []
type: TYPE_NORMAL
- en: Also known as Glorot initialization.
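- en: 'A typical call (the `gain` value is illustrative):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  import torch.nn as nn

  w = torch.empty(3, 5)
  nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
  ```
prefs: []
type: TYPE_PRE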
......@@ -370,15 +363,12 @@
type: TYPE_NORMAL
- en: The method is described in Understanding the difficulty of training deep feedforward
neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have
values sampled from $\mathcal{N}(0, \text{std}^2)$ where
prefs: []
type: TYPE_NORMAL
- en: $\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$
prefs: []
type: TYPE_NORMAL
- en: Also known as Glorot initialization.
......@@ -423,10 +413,8 @@
type: TYPE_NORMAL
- en: 'The method is described in Delving deep into rectifiers: Surpassing human-level
performance on ImageNet classification - He, K. et al. (2015). The resulting tensor
will have values sampled from $\mathcal{U}(-\text{bound}, \text{bound})$ where'
prefs: []
type: TYPE_NORMAL
- en: $\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$
......@@ -483,10 +471,8 @@
type: TYPE_NORMAL
- en: 'The method is described in Delving deep into rectifiers: Surpassing human-level
performance on ImageNet classification - He, K. et al. (2015). The resulting tensor
will have values sampled from $\mathcal{N}(0, \text{std}^2)$ where'
prefs: []
type: TYPE_NORMAL
- en: $\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$
......@@ -541,12 +527,9 @@
- en: Fill the input Tensor with values drawn from a truncated normal distribution.
prefs: []
type: TYPE_NORMAL
- en: The values are effectively drawn from the normal distribution $\mathcal{N}(\text{mean},
\text{std}^2)$ with values outside $[a, b]$
redrawn until they are within the bounds. The method used for generating the random
values works best when $a \leq \text{mean} \leq b$.
prefs: []
......@@ -638,10 +621,8 @@
- en: Fill the 2D input Tensor as a sparse matrix.
prefs: []
type: TYPE_NORMAL
- en: The non-zero elements will be drawn from the normal distribution $\mathcal{N}(0, 0.01)$, as described in Deep learning
via Hessian-free optimization - Martens, J. (2010).
prefs: []
type: TYPE_NORMAL
......
......@@ -502,10 +502,8 @@
to the weights:'
prefs: []
type: TYPE_NORMAL
- en: $W^\textrm{EMA}_{t+1} = \alpha W^\textrm{EMA}_{t} + (1 - \alpha) W^\textrm{model}_t$
prefs: []
type: TYPE_NORMAL
- en: where alpha is the EMA decay.
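- en: 'A sketch using the averaging utilities (decay value illustrative; assumes a
recent PyTorch that provides `get_ema_multi_avg_fn`):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.optim.swa_utils import AveragedModel, get_ema_multi_avg_fn

  model = torch.nn.Linear(10, 2)
  ema_model = AveragedModel(model, multi_avg_fn=get_ema_multi_avg_fn(0.999))

  # ... after each optimizer step:
  ema_model.update_parameters(model)
  ```
prefs: []
type: TYPE_PRE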
......
......@@ -12,9 +12,7 @@
numbers frequently occur in mathematics and engineering, especially in topics
like signal processing. Traditionally many users and libraries (e.g., TorchAudio)
have handled complex numbers by representing the data in float tensors with shape
$(..., 2)$
where the last dimension contains the real and imaginary values.
prefs: []
type: TYPE_NORMAL
......@@ -71,9 +69,7 @@
- PREF_H2
type: TYPE_NORMAL
- en: Users who previously worked around the lack of complex tensors with real tensors
of shape $(..., 2)$
can easily switch to using complex tensors in their code using [`torch.view_as_complex()`](generated/torch.view_as_complex.html#torch.view_as_complex
"torch.view_as_complex") and [`torch.view_as_real()`](generated/torch.view_as_real.html#torch.view_as_real
"torch.view_as_real"). Note that these functions don’t perform any copy and return
......
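- en: 'A minimal sketch of the round trip (shapes illustrative; both calls return
views, not copies):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch

  x = torch.randn(4, 2)          # legacy real layout (..., 2)
  z = torch.view_as_complex(x)   # complex view sharing x's memory
  x2 = torch.view_as_real(z)     # back to the (..., 2) real view
  ```
prefs: []
type: TYPE_PRE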
......@@ -418,9 +418,7 @@
prefs: []
type: TYPE_TB
- en: '| [`atan2`](generated/torch.atan2.html#torch.atan2 "torch.atan2") | Element-wise
arctangent of $\text{input}_{i} / \text{other}_{i}$
with consideration of the quadrant. |'
prefs: []
type: TYPE_TB
......@@ -510,9 +508,8 @@
equality |'
prefs: []
type: TYPE_TB
- en: '| [`ne`](generated/torch.ne.html#torch.ne "torch.ne") | Computes $\text{input} \neq \text{other}$ element-wise. |'
prefs: []
type: TYPE_TB
- en: '| [`le`](generated/torch.le.html#torch.le "torch.le") | Computes $\text{input} \leq \text{other}$
......
......@@ -249,12 +249,10 @@
and `torch.bfloat16` and e = 8 for `torch.int8`.
prefs: []
type: TYPE_NORMAL
- en: $M_{dense} = r \times c \times e \\ M_{sparse} = M_{specified} + M_{metadata} = r \times \frac{c}{2} \times e + r \times \frac{c}{2} \times 2 = \frac{rce}{2} + rc = rce(\frac{1}{2} + \frac{1}{e})$
prefs: []
type: TYPE_NORMAL
- en: Using these calculations, we can determine the total memory footprint for both
......@@ -265,9 +263,8 @@
only on the bitwidth of the tensor datatype.
prefs: []
type: TYPE_NORMAL
- en: $C = \frac{M_{sparse}}{M_{dense}} = \frac{1}{2} + \frac{1}{e}$
prefs: []
type: TYPE_NORMAL
- en: By using this formula, we find that the compression ratio is 56.25% for `torch.float16`
......
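- en: 'A quick sanity check of that ratio (e is the element bitwidth, per the formula
above):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  # C = 1/2 + 1/e
  for name, e in [("float16/bfloat16", 16), ("int8", 8)]:
      print(name, 0.5 + 1.0 / e)   # 16-bit -> 0.5625 (56.25%), int8 -> 0.625
  ```
prefs: []
type: TYPE_PRE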
......@@ -16,12 +16,9 @@
they are considered close if
prefs: []
type: TYPE_NORMAL
- en: $\lvert \text{actual} - \text{expected} \rvert \le \texttt{atol} + \texttt{rtol} \cdot \lvert \text{expected} \rvert$
prefs: []
type: TYPE_NORMAL
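- en: 'For instance (tolerances illustrative):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch

  actual = torch.tensor([1.0000, 2.0001])
  expected = torch.tensor([1.0, 2.0])
  # passes: |2.0001 - 2| = 1e-4 <= atol + rtol * |2| = 1e-5 + 2e-3
  torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-5)
  ```
prefs: []
type: TYPE_PRE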
- en: Non-finite values (`-inf` and `inf`) are considered close if and only if
......
......@@ -293,19 +293,11 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'img_tensor: Default is $(3, H, W)$.
You can use `torchvision.utils.make_grid()` to convert a batch of tensors into
3xHxW format or call `add_images` and let us do the job. Tensor with $(1, H, W)$,
$(H, W)$, $(H, W, 3)$ is also suitable as long as the corresponding
`dataformats` argument is passed, e.g. `CHW`, `HWC`, `HW`.'
prefs: []
type: TYPE_NORMAL
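- en: 'A short usage sketch (log directory and tag are illustrative):'
prefs: []
type: TYPE_NORMAL
- en: |-
  ```py
  import torch
  from torch.utils.tensorboard import SummaryWriter

  writer = SummaryWriter()
  img = torch.rand(3, 64, 64)                 # CHW, the default dataformats
  writer.add_image("example/random", img, global_step=0)
  writer.close()
  ```
prefs: []
type: TYPE_PRE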
......@@ -364,10 +356,8 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'img_tensor: Default is $(N, 3, H, W)$. If `dataformats` is specified,
other shape will be accepted. e.g. NCHW or NHWC.'
prefs: []
type: TYPE_NORMAL
......@@ -466,10 +456,7 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'vid_tensor: $(N, T, C, H, W)$.
The values should lie in [0, 255] for type uint8 or [0, 1] for type float.'
prefs: []
type: TYPE_NORMAL
......@@ -511,9 +498,7 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'snd_tensor: $(1, L)$. The
values should lie between [-1, 1].'
prefs: []
type: TYPE_NORMAL
......@@ -624,15 +609,12 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'mat: $(N, D)$,
where N is number of data and D is feature dimension'
prefs: []
type: TYPE_NORMAL
- en: 'label_img: $(N, C, H, W)$'
prefs: []
type: TYPE_NORMAL
- en: 'Examples:'
......@@ -779,21 +761,15 @@
- en: 'Shape:'
prefs: []
type: TYPE_NORMAL
- en: 'vertices: $(B, N, 3)$.
(batch, number_of_vertices, channels)'
prefs: []
type: TYPE_NORMAL
- en: 'colors: $(B, N, 3)$.
The values should lie in [0, 255] for type uint8 or [0, 1] for type float.'
prefs: []
type: TYPE_NORMAL
- en: 'faces: $(B, N, 3)$.
The values should lie in [0, number_of_vertices] for type uint8.'
prefs: []
type: TYPE_NORMAL
......