LaTeX Math Typesetting Guide

2026-01-08T11:52:00+08:00

Style Issue Basics
Example I: Cases
Example II: Text Functions
Example III: Limits
- Example 3-1
- Example 3-2
Example IV: Text Acronyms
Example V: Fences

Style Issue Basics

Variables

Variables should always be set in italic font in both text and in equations.

Vectors

Vectors should always be in bold type.

Functions

Functions should always be set as roman type.
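
As a minimal sketch of the three rules above, the fragment below sets a scalar variable in italic, a vector in bold, and function names in roman. The symbols and the function name "sinc" are illustrative assumptions, not taken from any particular paper; \operatorname assumes that amsmath is loaded.

```latex
% Requires \usepackage{amsmath} in the preamble (for \operatorname).
\[
  y = \operatorname{sinc}(x) + \sin(x), % x, y: italic variables; sinc, sin: roman functions
  \qquad
  \mathbf{v} = (v_1, v_2, v_3)          % v: bold vector with italic components
\]
```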

Alignment and Line Breaks for Display Formulas

Break and Align on Mathematical Verbs

\[\begin{aligned} |q\rangle & =\cos \frac{\theta}{2}|0\rangle+(\cos \varphi+i \sin \varphi) \sin \frac{\theta}{2}|1\rangle \\ & =\cos \frac{\theta}{2}|0\rangle+\cos \varphi \sin \frac{\theta}{2}|1\rangle+i \sin \varphi \sin \frac{\theta}{2}|1\rangle \end{aligned}\]

Break at Mathematical Conjunctions and Align to the Right of the First Mathematical Verb

\[\begin{aligned} |q\rangle = & \cos \frac{\theta(t)}{2}|0\rangle + \cos \varphi(t) \sin \frac{\theta(t)}{2}|1\rangle \\ &+ i \sin \varphi(t) \sin \frac{\theta(t)}{2}|1\rangle \end{aligned}\]

Always Keep Expressions Visually Within Fences

\[\begin{aligned} \overline{P}_{H, rx}^{(E R_{C W}^{f})} = \frac{1}{T_{E R}} & \left( T_{h v} P_{S, tx} |\mathbf{g}^{T} \mathbf{w}|^{2} + T_{tx} P_{S, tx} |\mathbf{g}^{T} \mathbf{w} \right. \\ &\quad \left. + \sqrt{\rho_{1} \eta (P_{E R^{f}, rx})} |\mathbf{f}_{1}^{T} \mathbf{w}| f_{2} e^{-j(\theta_{2}-\theta_{1})} |^{2} \right), \end{aligned}\]

Note the position of the “+” under and to the right of the parentheses surrounding the expression.

Avoid Obsolete Codes and Delimiters (`eqnarray`, `$$` display math delimiters)

Avoid the use of outdated macros, such as eqnarray and $$ math delimiters, for display equations.

Use Appropriate Delimiters for Display Equations

For single-line unnumbered display equations, please use only the following delimiters: \[ . . . \] or \begin{equation*} . . . \end{equation*}
For multiline unnumbered display equations, please use only the following delimiters: \begin{align*} . . . \end{align*}
For single-line numbered display equations, please use only the following delimiters: \begin{equation} . . . \end{equation}
For multiline numbered display equations, please use only the following delimiters: \begin{align} . . . \end{align}
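
The four cases can be sketched side by side as follows. The equations themselves are placeholders chosen for illustration; the environments come from amsmath, where the starred forms suppress numbering.

```latex
% Requires \usepackage{amsmath} in the preamble.

% Single line, unnumbered:
\[ E = mc^{2} \]

% Single line, numbered:
\begin{equation}
  E = mc^{2}
\end{equation}

% Multiline, unnumbered:
\begin{align*}
  (a+b)^{2} &= (a+b)(a+b) \\
            &= a^{2} + 2ab + b^{2}
\end{align*}

% Multiline, numbered (\nonumber suppresses the number on one line):
\begin{align}
  (a-b)^{2} &= (a-b)(a-b) \nonumber \\
            &= a^{2} - 2ab + b^{2}
\end{align}
```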

mathcal vs. RSFS script

Please see the RSFS documentation at https://ctan.org/pkg/rsfs for proper use.
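
For a quick visual comparison, the sketch below sets the same letter in both alphabets. It assumes the mathrsfs package, which provides \mathscr for the RSFS script font, and amsmath for \text; the letter L is an arbitrary choice.

```latex
% Preamble:
\usepackage{amsmath}   % for \text
\usepackage{mathrsfs}  % provides \mathscr (RSFS script)

% Body:
\[
  \mathcal{L} \ \text{(calligraphic)}
  \qquad
  \mathscr{L} \ \text{(RSFS script)}
\]
```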

Example I: Cases

Example 1-1

Incorrect Example: The wrong environment is used (array instead of cases), the tabs are missing, and the text is not formatted correctly (should not be italic).

\begin{equation}
  P(Y=1|\boldsymbol{X_{i}^{j}})=
  \left\lbrace
  \begin{array}{l}
    0, correct \\
    1, erroneous.
  \end{array}
  \right.
  \tag{1}
\end{equation}

Correct Example: The correct environment is used. Using the cases environment saves keystrokes (there is no need to type \left\lbrace and \right.) and automatically provides the correct column alignment. The alignment tabs (&) have been inserted and the text formatting corrected.

\begin{equation*}
  P(Y=1|\boldsymbol{X_{i}^{j}})=
  \begin{cases}
    0, & \text{correct} \\
    1,& \text{erroneous.}
  \end{cases}
  \tag{1a}
\end{equation*}

Example 1-2

Incorrect Example: The wrong environment is used and the column alignment is incorrect. Columns in cases should be left aligned.

\begin{equation}
  {z_m(t)} = \left\lbrace {
    \begin{array}{cc}
      1,&{{\mathrm {if}}\  {{\beta }_m(t)} < \frac{\mathfrak {B}_{m}^{\max }}{{ |\mathcal {U}_m|r_{m,i}^{\min } }}},\\
      {0,}&{{\mathrm {otherwise.}}}
  \end{array}}
  \right.
  \tag{2}
\end{equation}

Correct Example: The cases environment is used, which left-aligns both columns automatically.

\begin{equation*}
  {z_m(t)} =
  \begin{cases}
    1,&{\mathrm {if}}\ {\beta }_m(t) < \frac{\mathfrak {B}_{m}^{\max }}{{ |\mathcal {U}_m|r_{m,i}^{\min } }},\\
    {0,}&{\mathrm {otherwise.}}
  \end{cases}
  \tag{2a}
\end{equation*}

Example 1-3

Incorrect Example: The wrong environment is used; a space is missing after the word “if.” In this instance an extra bit of space is needed.

\begin{align}
  h_{i}(x,y) &= \left\lbrace
  \begin{array}{ll}
    +1 & \mathrm{if} \xi _{i}(x)=\eta _{i}(y),\\
    -1 & \mathrm{otherwise },
  \end{array}
  \right.\nonumber \\
  &=(2 \xi _{i}(x)-1)(2\eta _{i}(y)-1),
  \tag{3}
\end{align}

Correct Example: The correct environment is used. Using the cases environment saves keystrokes (there is no need to type \left\lbrace and \right.) and automatically provides the correct column alignment. The text formatting is corrected by using \text{} to surround the textual elements “if” and “otherwise.”

\begin{align}
  h_{i}(x,y) &=
  \begin{cases}
    +1 & \text{if } \xi _{i}(x)=\eta _{i}(y),\\
    -1 & \text{otherwise},
  \end{cases} \nonumber \\
  &=(2 \xi _{i}(x)-1)(2\eta _{i}(y)-1),
  \tag{3a}
\end{align}

Example II: Text Functions

Example 2-1

Incorrect Example: This example has incorrect text formatting and alignment issues. Please use \max, \min, and \text{...} for the conditions or text. \; should not be used for spacing: when the code is reused in other composition software, it will likely format differently than expected. Using tabs will provide concrete alignment points.

\begin{equation}
  LD(a_{x},b_{y})
  \begin{cases}
    max(x,y) \;\;\;\;\;\;\;\;\;\;\;if\; min(x,y)=0 \\
    min
    \begin{cases}
      L(a,b)(x-1,y)+1 \\
      L(a,b)(x,y-1,j)+1 & Otherwise\\
      L(a,b)(x-1,y-1)+1(a_{x}\neq b_{y})
    \end{cases}
  \end{cases}
  \tag{7}
\end{equation}

Correct Example: This example has the correct text formatting and tabs are used to correctly set column alignment. Note the use of \hfill to replace the multiple \; for alignment purposes.

\begin{equation}
  LD(a_{x},b_{y})
  \begin{cases}
    \max(x,y) \hfill \text{if } \min(x,y)=0 \\
    \min
    \begin{cases}
      L(a,b)(x-1,y)+1 & \\
      L(a,b)(x,y-1,j)+1 & \text{Otherwise} \\
      L(a,b)(x-1,y-1)+1 & (a_{x}\neq b_{y})
    \end{cases}
  \end{cases}
  \tag{7a}
\end{equation}

Example 2-2

Incorrect Example: This example has bad formatting of the function min. When coded as shown, it formats incorrectly as italic text.

\begin{equation*}
  d_{l}^{KM} = \underset {\mathbf {p}_{w}}{min} || \mathbf {p}_{f}^{l} - \mathbf {p}_{w} ||,
  \tag{12}
\end{equation*}

Correct Example: This example shows the use of \min to get the correct formatting of the function min.

\begin{equation*}
  d_{l}^{KM} = \underset {\mathbf {p}_{w}}{\min} || \mathbf {p}_{f}^{l} - \mathbf {p}_{w} ||,
  \tag{12a}
\end{equation*}

Example 2-3

Incorrect Example: This example has bad formatting of the function “arg min.” When coded as shown, it formats incorrectly as italic text.

\begin{equation*}
  d_{R}^{KM} = \underset {d_{l}^{KM}}{arg~{min}} \{ d_{1}^{KM},\ldots,d_{6}^{KM}\}.
  \tag{13}
\end{equation*}

Correct Example: This example shows the use of {\text{arg min}} to get the correct formatting of the function “arg min.”

\begin{equation*}
  d_{R}^{KM} = \underset {d_{l}^{KM}} {\text{arg min}} \{ d_{1}^{KM},\ldots,d_{6}^{KM}\}.
  \tag{13a}
\end{equation*}
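
An alternative that some authors prefer, offered here as an assumption rather than as part of the guidance above, is to declare “arg min” as an operator with amsmath so that its spacing and limit placement are handled the same way as for \min. The macro name \argmin is introduced here purely for illustration.

```latex
% Preamble (requires amsmath); \argmin is a hypothetical macro name.
\DeclareMathOperator*{\argmin}{arg\,min}

% Body: in display style the subscript is set below the operator.
\begin{equation*}
  d_{R}^{KM} = \argmin_{d_{l}^{KM}} \{ d_{1}^{KM},\ldots,d_{6}^{KM}\}.
\end{equation*}
```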

Example III: Limits

Example 3-1

Incorrect Example: The upper and lower limits in a display formula should generally be above and below the operators.

\begin{equation*}
  c_{r_i} = \beta _0+\sum \nolimits _{j=1}^{n}{\beta _j \times c_{r_j}},
  \tag{15}
\end{equation*}

Correct Example: In this example, the \nolimits was removed as it was causing the incorrect formatting. \nolimits has appropriate uses for inline equations and in certain subelements of a display equation.

\begin{equation*}
  c_{r_i} = \beta _0+\sum_{j=1}^{n}{\beta _j \times c_{r_j}},
  \tag{15a}
\end{equation*}

Example 3-2

Incorrect Example: When limits appear in fractions within a display formula, they should be off to the side of the operator.

\begin{equation*}
  {C_{D}} = \frac {{\sum \limits _{i = 1}^{N} {\left ({{C_{D}({n_{\max }}) - {C_{D}}({n_{i}})} }\right)} }}{{ \sum \limits _{i = 1}^{N} {\left ({{C_{D}(n_{\max }^ {*}) - {C_{D}}(n_{i}^{*})} }\right)} }}
  \tag{18}
\end{equation*}

Correct Example: This example shows the proper formatting when \limits are removed. LaTeX will automatically format the limits correctly when within a fraction.

\begin{equation*}
  {C_{D}} = \frac {{\sum _{i = 1}^{N} {\left ({{C_{D}({n_{\max }}) - {C_{D}}({n_{i}})} }\right)} }}{{ \sum _{i = 1}^{N} {\left ({{C_{D}(n_{\max }^ {*}) - {C_{D}}(n_{i}^{*})} }\right)} }}
  \tag{18a}
\end{equation*}

Example IV: Text Acronyms

Example 4-1

Incorrect Example: When the acronym “MSE” is not coded as text, it is set in italic. This is inconsistent with how it appears in the running text; the two should match.

\begin{equation*}
  MSE = \frac {1}{n}\sum _{i=1}^{n}(Y_{i} - \hat {Y_{i}})^{2}
  \tag{19}
\end{equation*}

Correct Example: This example shows where the acronym “MSE” is coded using \text{} to match how it appears in the text.

\begin{equation*}
  \text {MSE} = \frac {1}{n}\sum _{i=1}^{n}(Y_{i} - \hat {Y_{i}})^{2}
  \tag{19a}
\end{equation*}

Example 4-2

Incorrect Example: This example shows an instance where the formatting of the acronym “NCC” is inconsistent between text and its use in a formula.

\begin{equation*}
  {NCC}=\dfrac {\left |{\sum _{i=1}^{n}(a_{i}-\mu _{A})(b_{i}-\mu _{B})}\right |}{l\times \sigma _{A} \times \sigma _{B}},
  \tag{20}
\end{equation*}

Correct Example: This example shows where the acronym “NCC” is coded using \text{} to match how it appears in the text.

\begin{equation*}
  \text {NCC}=\dfrac {\left |{\sum _{i=1}^{n}(a_{i}-\mu _{A})(b_{i}-\mu _{B})}\right |}{l\times \sigma _{A} \times \sigma _{B}},
  \tag{20a}
\end{equation*}

Example 4-3

Incorrect Example: This example shows an instance where the formatting of the acronym “RMS” is inconsistent between text and its use in a formula.

\begin{equation*}
  RMS_{rs}=\sqrt {\frac {1}{N}\sum \limits _{i}^{N} {\left ({{d_{rs}(i)} }\right)^{2}}}
  \tag{32}
\end{equation*}

Correct Example: This example shows where the acronym “RMS” is coded using \text{} to match how it appears in the text.

\begin{equation*}
  \text{RMS}_{rs}=\sqrt {\frac {1}{N}\sum \limits _{i}^{N} {\left ({{d_{rs}(i)} }\right)^{2}}}
  \tag{32a}
\end{equation*}

Example V: Fences

Example 5-1

Incorrect Example: In this example, the parentheses are not growing to properly surround the content in between them.

\begin{equation*}
  \delta \approx 1 - ({e^{-\frac {d^{2}}{2 \times C^{m}_{T}}} \times e^{-\frac {d^{2}}{2 \times C^{m-1}_{T}}}})
  \tag{21}
\end{equation*}

Correct Example: In this example, the use of \left( and \right) enables the parentheses to grow to the height of the content in between them.

\begin{equation*}
  \delta \approx 1 - \left({e^{-\frac {d^{2}}{2 \times C^{m}_{T}}} \times e^{-\frac {d^{2}}{2 \times C^{m-1}_{T}}}}\right)
  \tag{21a}
\end{equation*}

Example 5-2

Incorrect Example: In this example, the square brackets are not growing to properly surround the content in between them.

\begin{equation*}
  [\sqrt {(\Delta x_{i}+d_{x})^{2}+(\Delta y_{i})^{2}} -\mu ^{k}]>\epsilon \mu ^{k}
  \tag{22}
\end{equation*}

Correct Example: In this example, the use of \left[ and \right] enables the square brackets to grow to the height of the content in between them.

\begin{equation*}
  \left[ \sqrt {(\Delta x_{i}+d_{x})^{2}+(\Delta y_{i})^{2}} -\mu ^{k}\right] >\epsilon \mu ^{k}
  \tag{22a}
\end{equation*}

Example 5-3

Incorrect Example: In this example, the parentheses are not growing to properly surround the content in between them.

\begin{equation*}
  \textrm {T} = ({\frac {c}{B}})^{2}
  \tag{34}
\end{equation*}

Correct Example: In this example, the use of \left( and \right) enables the parentheses to grow to the height of the content in between them.

\begin{equation*}
  \textrm {T} = \left({\frac {c}{B}}\right)^{2}
  \tag{34a}
\end{equation*}

Shortcuts Cheat Sheet

2025-12-30T15:12:00+08:00

kanban
  tmux
    [tmux new -s $SESSION]
    [C-b ?]
    [tmux attach -t $SESSION]
    [C-b c : create window]
    [C-b % : split horizontally]
    [C-b " : split vertically]
    [C-b 0]
    [C-b q : print pane numbers]
    [C-b s : show sessions]
    [C-b w : show windows]
    [x : kill selected item]
    [X : kill tagged items]
    [t : toggle tagged]
    [T : tag no items]
    [C-t : tag all items]
    [C-b D : list clients]
    [d : detach selected client]
    [D : detach tagged clients]
    [C-b & : kill current window]
    [C-b x : kill active pane]
    [C-b $ : prompt new name for session]
    [C-b , : prompt new name for window]
    ["C-b { / } : swap the active pane with the pane above / below"]
    [C-b . : prompt new index for window]
    [C-b z : temporarily take up whole window]
    ["C-b [ : enter copy mode"]
    [C-Space : start a selection]
    [C-w : copy selection and exit copy mode]
    ["C-b ] : paste buffer"]
    [C-b = : enter buffer mode]
    [p : paste selected buffer]
    [P : paste tagged buffers]
    [d : delete selected buffer]
    [D : delete tagged buffers]
    [:set -g mouse on]
  vscode
    [Ctrl+Shift+P, F1 : Show Command Palette]
    [Alt+ ↓ / ↑ : Move line down/up]
    [Ctrl+Enter : Insert line below]
    [Ctrl+Shift+Enter : Insert line above]
    ["Ctrl+] / Ctrl+[ : Indent/Outdent line"]
    ["Ctrl+Shift+ [ / ] : Fold/unfold region"]
    [Ctrl+/ : Toggle line comment]
    [Shift+Ctrl+ ↑ / ↓ : Insert cursor above/below]
    [Ctrl+\ : Split editor]
    [Ctrl+` : Show integrated terminal]
    [Ctrl+Shift+` : Create new terminal]
  linux
    [who | w | whoami | last]
    [lscpu | lsblk | lsof]
    [free -h]
    [df -h]
    [env]
    ["uptime [-p|-s]"]
    [date | timedatectl]
    [ls -lt]
    [cd -]
    [rmdir dirname]
    [less filename]
    [tail -f logfile]
    [locate filename]
    ["find /path -name '*.log' -exec rm {} \;"]
    [cp -p file.txt backup.txt]
    [chown user:group filename]
    [chown -R user:group directory/]
    ["ps [aux]"]
    [ps -ef --forest]
    [ps -u username]
    [jobs]
    ["bg %1 | fg %1"]
    [command &]
    [nohup command &]
    ["kill [-9] PID"]
    [killall processname]
    [kill -l]
    [htop]
    [iotop]
    [systemctl enable servicename]
    [journalctl -f]
    ["ip [addr|route] show"]
    [netstat -tuln]
    [netstat -tuln | grep LISTEN]
    [ss -tuln]
    [netstat -tulnp]
    [netstat -tuln | grep :80]
    [traceroute google.com]
    [mtr google.com]
    [grep -r "pattern" /path/]
    [grep -n "pattern" filename]
    ["sort [-n] filename"]
    [uniq filename]
    ["wc [-l] filename"]
    [sed 's/old/new/g' filename]
    [sed '/pattern/d' filename]
    [cut -d',' -f1 file.csv]
    [tar -xzf archive.tar.gz -C /destination/]
    [unzip -l archive.zip]
    [du -h --max-depth=1 /path/]
    [du -h | sort -hr | head -10]
    [cat /proc/meminfo]
    ["vmstat [2]"]
    [swapon --show]
    [top -p PID]
    ["iostat [2]"]
    [iostat -x /dev/sda]
    ["journalctl [-f]"]
    [journalctl -u servicename]
    [dmesg]
    [usermod -aG groupname username]
    ["userdel [-r username]"]
    ["su [-]"]
    [sudo -u username command]
    [groupadd groupname]
    [groups username]
    [cat /etc/group]
    [usermod -g groupname username]
    ["passwd [username]"]
    [chage -l username]
    [command > output.txt]
    [command >> output.txt]
    [command < input.txt]
    [command1 | command2]
    [alias ll='ls -la']
    [alias]
    [unalias ll]
    [printenv]
    [md5sum filename]
    [sha256sum file]
    [sha256sum -c checksums.txt]
    [ufw status verbose]
    [ufw allow ssh]
    [ufw deny 23]
    [iptables -L]
    [tail -f /var/log/auth.log]
    [tail -f /var/log/syslog]
    [journalctl -p err]
    [mount /dev/sda1 /mnt]
    [chroot /mnt]
    [grub-install /dev/sda]
    [update-grub]
    [systemctl status servicename]
    [journalctl -u servicename]
    [systemctl --failed]

Installing TensorFlow on an RTX 5060 Ti

2025-12-19T17:41:00+08:00

The RTX 5060 Ti has compute capability 12.0, a very new architecture. Stable TensorFlow releases do not ship precompiled GPU kernels for it, which causes JIT compilation failures for certain operations (especially float32 ones).

Step by Step Solution

conda create --name tf_gpu python=3.11
conda activate tf_gpu

conda install nvidia/label/cuda-12.5.1::cuda-toolkit

pip install tf-nightly[and-cuda]

conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

conda deactivate
conda activate tf_gpu
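
After reactivating the environment, it can be worth confirming that the variable actually took effect. The sketch below is a sanity check under stated assumptions, not part of the original steps: `on_library_path` is a hypothetical helper that reports whether a directory is one of the colon-separated entries of LD_LIBRARY_PATH.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of LD_LIBRARY_PATH.
on_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage inside the activated tf_gpu environment.
if on_library_path "${CONDA_PREFIX:-}/lib"; then
    echo "conda lib directory is on LD_LIBRARY_PATH"
else
    echo "conda lib directory is missing from LD_LIBRARY_PATH"
fi
```

Once the path is in place, TensorFlow itself can be asked which GPUs it sees, for example with tf.config.list_physical_devices('GPU').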

tmux User Guides

2025-12-19T17:41:00+08:00

Basic concepts
Using tmux interactively

Basic concepts

The tmux server and clients

tmux keeps all its state in a single main process, called the tmux server. This runs in the background and manages all the programs running inside tmux and keeps track of their output. The tmux server is started automatically when the user runs a tmux command and by default exits when there are no running programs.

Users attach to the tmux server by starting a client. This takes over the terminal where it is run and talks to the server using a socket file in /tmp. Each client is identified by the name of the outside terminal where it is started, for example /dev/ttypf.

Sessions, windows and panes

Every terminal inside tmux belongs to one pane, a rectangular area that shows the content of that terminal.

Each pane appears in one window. A window is made up of one or more panes which together cover its entire area, so multiple panes may be visible at the same time.

Every window has a name. Window names do not have to be unique; windows are usually identified by the session and the window index rather than by name.

Each pane is separated from the panes around it by a line called the pane border. One pane in each window is the active pane.

Multiple windows are grouped together into sessions. Windows may be linked to multiple sessions at the same time, although mostly they are only in one. Each window in a session has a number, called the window index - the same window may be linked at different indexes in different sessions.

Each session has one current window.

A session may be attached to one or more clients, which means it is shown on the outside terminal where that client is running. Sessions do not have an index but they do have a name, which must be unique.

Summary

Server
- Client 1
  - Session 1 (attached)
    - Window 1
      - Pane 1
      - Pane 2
    - Window 2
      - Pane 1
  - Session 2
- Client 2

Using tmux interactively

Creating sessions

tmux new -s mysession

The prefix key

The default prefix key is C-b, which means the Ctrl key and b.

Help keys

C-b ?

The command prompt

tmux has an interactive command prompt. This can be opened by pressing C-b :.

Multiple commands may be entered together at the command prompt by separating them with a semicolon (;). This is called a command sequence.

Attaching and detaching

To detach from a session, press C-b d. To reattach later, use the attach command:

tmux attach -t mysession
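
Creating and attaching can be combined so that attaching never fails on a missing session. The following is a minimal sketch, assuming tmux is installed; `ensure_session` is a hypothetical helper name, while has-session and new-session -d are standard tmux commands.

```shell
# Hypothetical helper: create the named session in the background if it does
# not exist yet, so that a later attach always has something to attach to.
ensure_session() {
    # The "=" prefix asks tmux for an exact name match instead of a prefix match.
    tmux has-session -t "=$1" 2>/dev/null || tmux new-session -d -s "$1"
}

# Usage (run from a normal terminal):
#   ensure_session mysession && tmux attach -t mysession
```

Creating the session detached (-d) keeps the helper usable from scripts, since only the final attach needs a real terminal.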

Listing sessions

The list-sessions command (alias ls) shows a list of available sessions that can be attached.

$ tmux ls
1: 3 windows (created Sat Feb 22 11:44:51 2020)
2: 1 windows (created Sat Feb 22 11:44:51 2020)
myothersession: 2 windows (created Sat Feb 22 11:44:51 2020)
mysession: 1 windows (created Sat Feb 22 11:44:51 2020)

Killing tmux entirely

:kill-server

Creating new windows

A new window can be created in an attached session with the C-b c key binding which runs the new-window command.

Splitting the window

A pane is created by splitting a window. This is done with the split-window command which is bound to two keys by default:

C-b % splits the current pane into two horizontally.
C-b " splits the current pane into two vertically.

Changing the current window

C-b 0 changes to window 0, C-b 1 to window 1, and so on up to C-b 9 for window 9.
C-b ' prompts for a window index and changes to that window.
C-b n changes to the next window in the window list by number.
C-b p changes to the previous window in the window list by number.
C-b l changes to the last window.

Changing the active pane

C-b Up, C-b Down, C-b Left and C-b Right change to the pane above, below, left or right of the active pane.
C-b q prints the pane numbers and their sizes on top of the panes for a short time. Pressing one of the number keys before they disappear changes the active pane to the chosen pane, so C-b q 1 will change to pane number 1.
C-b o moves to the next pane by pane number.
C-b C-o swaps the next pane (the one C-b o would move to) with the active pane.

Choosing sessions, windows and panes

tmux includes a mode where sessions, windows or panes can be chosen from a tree; this is called tree mode.

There are two key bindings to enter tree mode:

C-b s starts showing only sessions and with the attached session selected;
C-b w starts with sessions expanded so windows are shown and with the current window in the attached session selected.

Items in the tree are tagged by pressing t and untagged by pressing t again. Tagged items are shown in bold and with * after their name.

All tagged items may be untagged by pressing T.

Tagged items may be killed together by pressing X, or a command applied to them all by pressing : for a prompt.

Each item in the tree has a shortcut key in brackets at the start of the line. Pressing this key will immediately choose that item (as if it had been selected and Enter pressed).

Key	Function
`Enter`	Change the attached session, current window or active pane
`Up`	Select previous item
`Down`	Select next item
`Right`	Expand item
`Left`	Collapse item
`x`	Kill selected item
`X`	Kill tagged items
`<`	Scroll preview left
`>`	Scroll preview right
`C-s`	Search by name
`n`	Repeat last search
`t`	Toggle if item is tagged
`T`	Tag no items
`C-t`	Tag all items
`:`	Prompt for a command to run for the selected item or each tagged item
`O`	Change sort field
`r`	Reverse sort order
`v`	Toggle preview
`q`	Exit tree mode

Detaching other clients

A list of clients is available by pressing C-b D (that is, C-b S-d).

Key	Function
`Enter`	Detach selected client
`d`	Detach selected client, same as `Enter`
`D`	Detach tagged clients
`x`	Detach selected client and try to kill the shell it was started from
`X`	Detach tagged clients and try to kill the shells they were started from

Killing a session, window or pane

Pressing C-b & prompts for confirmation then kills (closes) the current window.

C-b x kills only the active pane.

Renaming sessions and windows

C-b $ will prompt for a new name for the attached session.

C-b , prompts for a new name for the current window.

Swapping and moving

Panes can be swapped with the pane above or below using the C-b { and C-b } key bindings.

Pressing C-b . will prompt for a new index for the current window. If a window already exists at the given index, an error will be shown.

Resizing and zooming panes

Panes may be resized in small steps with C-b C-Left, C-b C-Right, C-b C-Up and C-b C-Down and in larger steps with C-b M-Left, C-b M-Right, C-b M-Up and C-b M-Down. These use the resize-pane command.

A single pane may be temporarily made to take up the whole window with C-b z, hiding any other panes. Pressing C-b z again puts the pane and window layout back to how it was. This is called zooming and unzooming.

Window layouts

The panes in a window may be automatically arranged into one of several named layouts. These may be rotated between with the C-b Space key binding or chosen directly with C-b M-1, C-b M-2 and so on.

Name	Key	Description
even-horizontal	`C-b M-1`	Spread out evenly across
even-vertical	`C-b M-2`	Spread out evenly up and down
main-horizontal	`C-b M-3`	One large pane at the top, the rest spread out evenly across
main-vertical	`C-b M-4`	One large pane on the left, the rest spread out evenly up and down
tiled	`C-b M-5`	Tiled in the same number of rows as columns

Copy and paste

Text is copied using copy mode, entered with C-b [, and the most recently copied text is pasted into the active pane with C-b ].

Key	Action
`Up, Down, Left, Right`	Move the cursor
`C-Space`	Start a selection
`C-w`	Copy the selection and exit copy mode
`q`	Exit copy mode
`C-g`	Stop selecting without copying, or stop searching
`C-a`	Move the cursor to the start of the line
`C-e`	Move the cursor to the end of the line
`C-r`	Search interactively backwards
`M-f`	Move the cursor to the next word
`M-b`	Move the cursor to the previous word

Once some text is copied, the most recent may be pasted with C-b ] or an older buffer pasted by using buffer mode, entered with C-b =.

Key	Function
`Enter`	Paste selected buffer
`p`	Paste selected buffer, same as Enter
`P`	Paste tagged buffers
`d`	Delete selected buffer
`D`	Delete tagged buffers

Finding windows and panes

C-b f prompts for some text and then enters tree mode with a filter to show only panes where that text appears in the visible content or title of the pane or in the window name.

Using the mouse

:set -g mouse on

macOS Monterey Theme for Ubuntu 22.04

2022-05-28T22:03:00+08:00

First, refresh the package sources and upgrade the installed software:

sudo apt update
sudo apt upgrade

Optionally, update the drivers as well:

sudo ubuntu-drivers autoinstall

Next, install gnome-tweaks and gnome-shell-extensions:

sudo apt install gnome-tweaks gnome-shell-extensions

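Go to the GNOME Shell Extensions website and install the browser extension for Google Chrome: https://extensions.gnome.org/

Then install the User Themes extension: https://extensions.gnome.org/extension/19/user-themes/

Clone the WhiteSur-gtk-theme theme from GitHub ( https://github.com/vinceliuice/WhiteSur-gtk-theme ) into any directory you like.

Change into the WhiteSur-gtk-theme directory, locate the install.sh and tweaks.sh scripts, and run the following commands to install the theme: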

./install.sh -t all -N glassy -s 220
sudo ./tweaks.sh -g -f monterey

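Download the Mkos-Big-Sur icon pack and extract it into the .icons directory under your home directory: https://www.gnome-look.org/p/1400021

Open the Ubuntu Tweaks application (gnome-tweaks), select the Appearance menu, and under Icons, Shell, and Legacy Applications apply the WhiteSur- theme and the Mkos-Big-Sur icon pack.

In the Window Titlebars menu, place the Titlebar Buttons on the left.

Install the Blur my Shell extension from the GNOME Shell Extensions website: https://extensions.gnome.org/extension/3193/blur-my-shell/

Install the Compiz alike magic lamp effect extension from the GNOME Shell Extensions website: https://extensions.gnome.org/extension/3740/compiz-alike-magic-lamp-effect/

Run the following command in a terminal: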

gsettings set org.gnome.shell.extensions.dash-to-dock click-action 'minimize'

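Open the Ubuntu Extensions application, find the Blur my Shell extension, click its settings button, and in the Dash tab uncheck the Dash to Dock blur option.

Finally, pick a macOS wallpaper you like; wallpapers can also be downloaded from this GitHub repository: https://github.com/vinceliuice/WhiteSur-wallpapers

Reference: https://youtu.be/Y6k7THQ3x6U

Commonly Used GNOME Shell Extensions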

CASIA-WebMaskedFace: A Simulated Masked-Face Dataset

2022-04-14T10:55:00+08:00

CASIA-WebMaskedFace is built on the CASIA-WebFace dataset, using the MaskTheFace tool to put masks on the face images in the dataset.

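Dataset Description

This dataset is built on top of the source dataset CASIA-WebFace: the MaskTheFace tool is used to add masks to its images. The mask types involved are Surgical (white surgical mask), Surgical Blue (blue surgical mask), N95, KN95, and Cloth (black cloth mask). Mask types are assigned at random from a uniform distribution.

For more on the mask types, colors, and materials, see the original tool's repository.

CASIA-WebMaskedFace contains 10,575 identities and 494,414 face images.

Because the masks are simulated on top of the unmodified CASIA-WebFace dataset, it has the same number of identities and images as the original.

The National Engineering Research Center for Multimedia Software at Wuhan University did the earliest related work, proposing what was then the largest simulated masked-face dataset as well as a real-world masked-face dataset.

Aqeel Anwar and Arijit Raychowdhury later also proposed a real-world masked-face dataset, together with a tool, the MaskTheFace mentioned above, for simulating masks on existing face datasets.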

Data Samples

Download

Kaggle: https://www.kaggle.com/datasets/geekfx/casia-webmaskedface
Kaggle (cropped using MTCNN, 160x160): https://www.kaggle.com/datasets/geekfx/casia-webmaskedface-cropped
Baidu Netdisk
Google Drive
GitHub

LaTeX Workshop Configuration

2022-02-25T12:33:00+08:00

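When writing LaTeX papers in VS Code, the LaTeX Workshop extension provides a great deal of functionality. However, its default build command is latexmk, while Chinese-language papers usually have to be compiled with xelatex. To write Chinese papers with LaTeX Workshop, its settings therefore generally need to be customized.

Below are the settings I adapted from the official documentation. Each setting is preceded by an explanatory comment, which also makes later configuration and changes easier for me.

For the official configuration documentation, see the LaTeX Workshop GitHub Wiki.

This post gives three build recipes:

Compiling twice with the xelatex command

When a table of contents is generated, the first pass usually produces the auxiliary files it needs (for example, the contents entries), and the second pass combines them into the final PDF.

The build sequence needed when using the BibTeX bibliography tool
The build sequence needed when using BibLaTeX for the bibliography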

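// ******** LaTeX Workshop configuration ********
// Do not build automatically when files change
// "never", "onSave", "onFileChange"
"latex-workshop.latex.autoBuild.run": "never",
// The default shortcut for building in LaTeX Workshop is ctrl + alt + b
// In some environments the ctrl + alt shortcuts are already taken
// Set the option below to true to enable the alternative keymap
// ctrl + l / alt + letter
"latex-workshop.bind.altKeymap.enabled": false,
// Which recipe to use when building
// Recipes are defined below
// "first" (default): the first entry in the recipes list below
// "lastUsed": the recipe that was used last
"latex-workshop.latex.recipe.default": "lastUsed",
// How to preview the generated PDF: in a VS Code tab
"latex-workshop.view.pdf.viewer": "tab",
// Clean up auxiliary files automatically after a LaTeX Workshop build
// "never" disables automatic cleanup
// "onFailed" cleans up only when the build fails
"latex-workshop.latex.autoClean.run": "onBuilt",
// The sequence of tools run for each recipe
// The tools themselves are user-defined (see below)
"latex-workshop.latex.recipes": [
    // Build without a bibliography
    // Two passes are usually needed to generate the table of contents correctly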
    {
        "name": "xelatex",
        "tools": [
            "xelatex",
            "xelatex"
        ]
    },
    // Build with the BibTeX bibliography tool
    {
        "name": "xelatex ➞ bibtex ➞ xelatex × 2",
        "tools": [
            "xelatex",
            "bibtex",
            "xelatex",
            "xelatex"
        ]
    },
    // Build with the BibLaTeX bibliography tool
    {
        "name": "xelatex ➞ biber ➞ xelatex × 2",
        "tools": [
            "xelatex",
            "biber",
            "xelatex",
            "xelatex"
        ]
    }
],
// Commands and arguments for the tools used in the recipes
// Placeholders predefined by LaTeX Workshop:
// %DOC%              The root file full path without the extension
// %DOC_W32%          The root file full path without the extension with \ path separator on Windows
// %DOCFILE%          The root file name without the extension
// %DOC_EXT%          The root file full path with the extension
// %DOC_EXT_W32%      The root file full path with the extension with \ path separator on Windows
// %DOCFILE_EXT%      The root file name with the extension
// %DIR%              The root file directory
// %DIR_W32%          The root file directory with \ path separator on Windows
// %TMPDIR%           A temporary folder for storing ancillary files
// %OUTDIR%           The output directory configured in latex-workshop.latex.outDir
// %OUTDIR_W32%       The output directory configured in latex-workshop.latex.outDir with \ path separator on Windows
// %WORKSPACE_FOLDER% The current workspace path
// %RELATIVE_DIR%     The root file directory relative to the workspace folder
// %RELATIVE_DOC%     The root file path relative to the workspace folder
"latex-workshop.latex.tools": [
    {
        "name": "xelatex",
        "command": "xelatex",
        "args": [
            "-synctex=1",
            "-interaction=nonstopmode",
            "-file-line-error",
            "%DOC%"
        ],
        "env": {}
    },
    {
        "name": "bibtex",
        "command": "bibtex",
        "args": [
            "%DOCFILE%"
        ],
        "env": {}
    },
    {
        "name": "biber",
        "command": "biber",
        "args": [
            "%DOCFILE%"
        ],
        "env": {}
    }
],

To use the configuration above, copy the code into your VS Code JSON settings file.

LaTeX Cannot Find Installed Fonts on Linux: Problem and Solution

2021-08-12T22:47:00+08:00

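When I used LaTeX on Ubuntu, compilation failed with an error of the form Font "xxx" does not contain requested, where xxx is the font you are trying to use.

Yet my Ubuntu system already had several common Chinese fonts correctly installed, and LaTeX still reported the error.

I combed through websites in China and abroad and read the official TeX Live documentation, spending a great deal of time and effort without finding anything that solved my problem. People's time is precious, and when no solution turns up the matter is usually just dropped. In the geek (open-source) spirit of the Internet, I am recording and sharing how I solved it here, and I hope others will carry that spirit forward.

I even came across the following passage in the official TeX Live installation guide:

As you can see, handling Chinese characters is one of the real headaches of LaTeX typesetting; fortunately, there are now many good packages that deal with it.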

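First, here is the problem I ran into:

The terminal's error message says that an installed font was not found (installed font not found). That hint made me suspect that a font installed on my system was not being referenced correctly in LaTeX. Looking at the code:

Here, when setting up the fonts, KaiTi is mapped to the font file simkai, so we can check whether the font library contains the KaiTi font file simkai.ttf:

Fonts on Ubuntu are stored under /usr/share/fonts/, which can be searched with the find command:

So simkai.ttf is already on my system. Why did LaTeX still fail to find the font?

After some digging, it turned out that on Ubuntu a font is referred to not by its file name but by a different name, an alias.

We can list the fonts available on the system with fc-list and then use grep to match the simkai.ttf font file to see more about the font:

The name in box 1 of the figure is only the font file name. To use the font in an application, we must give the font name in box 2. In this figure, for instance, using KaiTi requires the name KaiTi or 楷体.

Back in the LaTeX code, replacing simkai, the cause of the error, with KaiTi solves the problem.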
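As a small illustration of the file-name versus font-name distinction, the sketch below splits one line of fc-list output into the file path and the names an application can use. It assumes the default fc-list layout of `path: family[,family...]:style=...`; the sample line and the helper name parse_fc_list_line are made up for illustration and are not taken from the screenshots in this post.

```python
def parse_fc_list_line(line):
    """Split one line of `fc-list` output into (file path, family names, style part).

    Assumes the default layout: "<path>: <family>[,<family>...]:style=<style>".
    """
    path, _, rest = line.partition(": ")
    families_part, _, style_part = rest.partition(":")
    families = [name.strip() for name in families_part.split(",") if name.strip()]
    return path, families, style_part


# Illustrative sample line, not copied from a real machine.
sample = "/usr/share/fonts/truetype/win/simkai.ttf: KaiTi,楷体:style=Regular"
path, families, style = parse_fc_list_line(sample)

print(path)      # the font file, which LaTeX could not use as a name
print(families)  # the names that can be given to the font-selection command
```

Any of the names in families is a candidate for the LaTeX font setting; the file stem simkai is not among them, which matches the error described above.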

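To test the guess further: in the code shown above, SimSun is used right below KaiTi, yet SimSun raises no error. So I expected the names listed for the SimSun font to include SimSun.

So the guess was correct. On different systems and with different font files, the exact form of the "font not found" error may differ from person to person, because the name an application must use to reference a font depends on the font file. When this problem appears, analyze the specific case and treat the actual cause.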

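Backpropagation Derivatives for a Recurrent Neural Network (RNN)

2021-01-11T20:19:00+08:00

This post works through the math of backpropagation for each cell of an RNN, and then uses PyTorch to check the derivative formulas in code.

RNN Architecture

A plain RNN is shown in the figure below:

Here $x^{\langle t \rangle}$ is the input at time $t$; $a^{\langle t \rangle}$ is the network's hidden state at time $t$, which is the value passed on to time $t+1$; and $\hat{y}^{\langle t \rangle}$ is the prediction produced once the input at time $t$ has been fed in. During prediction or sampling, $\hat{y}^{\langle t \rangle}$ is usually used as the input at the next time step $t+1$, that is, $x^{\langle t+1 \rangle}=\hat{y}^{\langle t \rangle}$. The dimensions of the data are as follows.

Input: $x\in\mathbb{R}^{n_x\times m\times T_x}$, where $n_x$ is the length of the input vector at each time step, $m$ is the batch size, and $T_x$ is the number of input time steps.
Hidden state: $a\in\mathbb{R}^{n_a\times m\times T_x}$, where $n_a$ is the length of each hidden state.
Prediction: $y\in\mathbb{R}^{n_y\times m\times T_y}$, where $n_y$ is the length of the predicted output and $T_y$ is the number of output time steps.

RNN Cell

The figure below shows a single RNN cell:

The figure shows how, in the cell at time $t$, the input $x^{\langle t \rangle}$ and the previous hidden state $a^{\langle t-1 \rangle}$ are combined to produce the next hidden state $a^{\langle t \rangle}$ and the prediction $\hat{y}^{\langle t \rangle}$.

The dimensions of the five parameters are: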

$W_{aa}\in\mathbb{R}^{n_a\times n_a}$
$W_{ax}\in\mathbb{R}^{n_a\times n_x}$
$b_a\in\mathbb{R}^{n_a\times 1}$
$W_{ya}\in\mathbb{R}^{n_y\times n_a}$
$b_y\in\mathbb{R}^{n_y\times 1}$

Compute the hidden state $a^{\langle t \rangle}$ at time $t$:

\[\begin{split} z1^{\langle t \rangle} &= W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a\\ a^{\langle t \rangle} &= \tanh(z1^{\langle t \rangle}) \end{split}\]

Predict the output $\hat{y}^{\langle t \rangle}$ at time $t$:

\[\begin{split} z2^{\langle t \rangle} &= W_{ya} a^{\langle t \rangle} + b_y\\ \hat{y}^{\langle t \rangle} &= \mathrm{softmax}(z2^{\langle t \rangle}) = \frac{e^{z2^{\langle t \rangle}}}{\sum_{i=1}^{n_y}e^{z2_i^{\langle t \rangle}}} \end{split}\]
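The two formulas above can be sketched in plain Python for a single example ($m=1$). This is only an illustration under assumed sizes ($n_x=6$, $n_a=3$, $n_y=6$) and random weights; the helper names matvec, softmax, rand_matrix, and rnn_cell_forward are made up here.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def softmax(z):
    """Softmax over a list of floats, shifted by max(z) for numerical stability."""
    z_max = max(z)
    exps = [math.exp(z_i - z_max) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_cell_forward(x_t, a_prev, W_aa, W_ax, b_a, W_ya, b_y):
    """One time step: z1 -> a_t = tanh(z1) -> z2 -> y_hat = softmax(z2)."""
    z1 = [p + q + b for p, q, b in zip(matvec(W_aa, a_prev), matvec(W_ax, x_t), b_a)]
    a_t = [math.tanh(v) for v in z1]
    z2 = [p + b for p, b in zip(matvec(W_ya, a_t), b_y)]
    return a_t, softmax(z2)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_x, n_a, n_y = 6, 3, 6  # assumed sizes for the illustration
W_aa, W_ax, W_ya = rand_matrix(n_a, n_a), rand_matrix(n_a, n_x), rand_matrix(n_y, n_a)
b_a, b_y = [0.0] * n_a, [0.0] * n_y
x_t = [random.gauss(0.0, 1.0) for _ in range(n_x)]
a_prev = [0.0] * n_a

a_t, y_hat = rnn_cell_forward(x_t, a_prev, W_aa, W_ax, b_a, W_ya, b_y)
print(len(a_t), len(y_hat), sum(y_hat))
```

The hidden state has length $n_a$, and the prediction has length $n_y$ with entries that sum to 1, as the softmax requires.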

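RNN Backpropagation

In today's popular deep learning frameworks, we only need to define the structure of a network and its forward pass; the derivatives for backpropagation and the parameter updates are handled entirely by the framework. Even so, while learning it is worth deriving backpropagation by hand to confirm that it works.

Backpropagation Through an RNN Cell

The figure below shows a basic cell of an RNN, annotated with the parameters, outputs, and other quantities needed for backpropagation.

Just as in a fully connected network, the derivative of the loss function $J$ is propagated backward to every time step through the chain rule of calculus.

For convenience, we abbreviate the partial derivative of the loss with respect to a parameter of the cell as $\mathrm{d}\mathit{parameters}$; for example, $\frac{\partial J}{\partial W_{ax}}$ is written $\mathrm{d}W_{ax}$.

The backpropagation shown in the figure above does not include the fully connected layer or the Softmax layer.

Computing the Derivatives

Before computing the partial derivatives of the loss with respect to each parameter, we introduce a computation graph that shows the forward pass of one RNN cell and how the chain rule is applied backward along the graph.

Backpropagation can only start from the last time step and move backward after the inputs for the whole sequence have been fed in, so we assume that time $t$ is the last time step $T_x$.

Suppose we first want $\frac{\partial\ell}{\partial W_{ax}}$. The computation graph shows the backpropagation path:

We take partial derivatives, one step at a time, of every variable on the way from $W_{ax}$ to $\ell$, and by the chain rule multiply together the partial derivatives along the red path to obtain $\frac{\partial\ell}{\partial W_{ax}}$; so we get: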

\[\begin{split} \frac{\partial\ell}{\partial W_{ax}} &= \frac{\partial\ell}{\partial\ell^{\langle t\rangle}} {\color{Red}{ \frac{\partial\ell^{\langle t\rangle}}{\partial\hat{y}^{\langle t\rangle}} \frac{\partial\hat{y}^{\langle t\rangle}}{\partial z2^{\langle t\rangle}} }} \frac{\partial z2^{\langle t\rangle}}{\partial a^{\langle t\rangle}} \frac{\partial a^{\langle t\rangle}}{\partial z1^{\langle t\rangle}} \frac{\partial z1^{\langle t\rangle}}{\partial W_{ax}} \end{split}\]

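In the formula above we only need to evaluate each partial derivative separately. The red part is the derivative of $\mathrm{Softmax}$; for its derivation, see my other post on deriving backpropagation for Softmax regression.

The derivative of $\mathrm{tanh}$ is: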

\[\frac{\partial \tanh(x)} {\partial x} = 1 - \tanh^2(x)\]
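A quick numerical check of this identity, as a minimal sketch: it compares $1-\tanh^2(x)$ with a central-difference estimate at a few arbitrarily chosen points. The step size and sample points are assumptions made for the illustration.

```python
import math


def tanh_grad(x):
    """Analytic derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def central_difference(f, x, h=1e-5):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.0, 0.7, 3.0]
max_error = max(abs(tanh_grad(x) - central_difference(math.tanh, x)) for x in points)
print(max_error)  # expected to be tiny, limited by floating-point precision
```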

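Substituting these into the expression above gives:

\[\frac{\partial\ell}{\partial W_{ax}} = \left( {W_{ya}}^T \left(\hat{y}^{\langle t\rangle} - y^{\langle t\rangle}\right) * \left( 1-\tanh^2(z1^{\langle t\rangle}) \right)\right) x^{\langle t \rangle T}\]

This is the partial derivative with respect to $W_{ax}$ at the last time step $t$.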

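Apart from the derivative of a scalar with respect to a matrix, the expression above also contains two derivatives of one matrix or vector with respect to another matrix or vector, which is quite troublesome in practice.

For example, when computing $\frac{\partial z1^{\langle t\rangle}}{\partial W_{ax}}$, we find that $z1^{\langle t\rangle}$ is a matrix in $\mathbb{R}^{n_a\times m}$ while $W_{ax}$ is a matrix in $\mathbb{R}^{n_a\times n_x}$. This term is the derivative of one matrix with respect to another, and differentiating directly yields a four-dimensional array in $\mathbb{R}^{n_a\times n_x\times n_a\times m}$ (the Jacobian), although many of its entries are $0$.

In a neural network, plugging this high-dimensional array directly into gradient descent to update the parameters is not workable: the gradient we need has the same shape as the variable it belongs to, and we would also have to multiply even higher-dimensional arrays. So the result has to be processed to keep only the information we need.

Deep learning frameworks generally ship an automatic differentiation package. Such packages (PyTorch, for example) only allow a scalar to be differentiated with respect to a vector or matrix; other cases are not allowed unless a weight vector or matrix of the same shape is passed to the backward function.

Let us first compute just the partial derivative $\frac{\partial\ell}{\partial W_{ax}}$, and then verify our formula with PyTorch's autograd package.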

import torch

# Architecture parameters of the network
n_x = 6
n_y = 6
m = 1
T_x = 5
T_y = 5
n_a = 3

# Define all parameter matrices
# requires_grad=True builds the computation graph for operations on the tensor,
# so that backpropagation can be run later
W_ax = torch.randn((n_a, n_x), requires_grad=True)
W_aa = torch.randn((n_a, n_a), requires_grad=True)
ba = torch.randn((n_a, 1), requires_grad=True)
W_ya = torch.randn((n_y, n_a), requires_grad=True)
by = torch.randn((n_y, 1), requires_grad=True)

# Input at time t, the hidden state from the previous time step, and the target y_t
# Note: y_t is drawn from randn, so its entries do not sum to 1 (it is not a probability distribution)
x_t = torch.randn((n_x, m), requires_grad=True)
a_prev = torch.randn((n_a, m), requires_grad=True)
y_t = torch.randn((n_y, m), requires_grad=True)

# Simulate the forward pass of one cell at time t,
# from the input all the way to the loss
z1_t = torch.matmul(W_ax, x_t) + torch.matmul(W_aa, a_prev) + ba
z1_t.retain_grad()
a_t = torch.tanh(z1_t)
a_t.retain_grad()
z2_t = torch.matmul(W_ya, a_t) + by
z2_t.retain_grad()
y_hat = torch.exp(z2_t) / torch.sum(torch.exp(z2_t), dim=0)
y_hat.retain_grad()
loss_t = -torch.sum(y_t * torch.log(y_hat), dim=0)
loss_t.retain_grad()

# Backpropagate from the final scalar loss
loss_t.backward()

# This gives the derivative with respect to W_ax
# The _autograd suffix marks it as computed automatically by the framework
W_ax_autograd = W_ax.grad

# Inspect the derivative computed by the framework
W_ax_autograd

tensor([[ 0.5252,  1.1938, -0.2352,  1.1571, -1.0168,  0.3195],
        [-1.0536, -2.3949,  0.4718, -2.3213,  2.0398, -0.6410],
        [-0.0316, -0.0717,  0.0141, -0.0695,  0.0611, -0.0192]])

# Compute the derivative by hand from the formula derived above
# The _manugrad suffix marks it as computed manually from the formula
W_ax_manugrad = torch.matmul(torch.matmul((y_hat - y_t).T, W_ya).T * (1 - torch.square(torch.tanh(z1_t))), x_t.T)
# Equivalent form:
#torch.matmul(torch.matmul(W_ya.T, y_hat - y_t) * (1 - torch.square(torch.tanh(z1_t))), x_t.T)

# Print the manually computed derivative
W_ax_manugrad

tensor([[ 0.5195,  1.1809, -0.2327,  1.1447, -1.0058,  0.3161],
        [-1.0195, -2.3172,  0.4565, -2.2461,  1.9737, -0.6202],
        [-0.0309, -0.0703,  0.0138, -0.0681,  0.0599, -0.0188]],
       grad_fn=)

# L2 norm of the difference between the two results
torch.norm(W_ax_manugrad - W_ax_autograd)

tensor(0.1356, grad_fn=)

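The output shows that the hand-computed derivative and the one from autograd agree closely entry by entry, but not exactly: the L2 norm of their difference is 0.1356. This gap is most likely not numerical error. The factor $\hat{y}^{\langle t\rangle}-y^{\langle t\rangle}$ used in the manual formula is the gradient of the cross-entropy loss with respect to $z2^{\langle t\rangle}$ only when the entries of $y^{\langle t\rangle}$ sum to $1$; in general the gradient is $\hat{y}^{\langle t\rangle}\sum_i y_i^{\langle t\rangle}-y^{\langle t\rangle}$. Here y_t is drawn with torch.randn, so it is not a probability distribution, and the two results differ by the leftover term. With a one-hot or otherwise normalized y_t, the formula itself is correct.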
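The role of the target's normalization can be checked without PyTorch. The sketch below is an illustration with made-up logits and targets: it compares the simple formula $\hat{y}-y$ and the general formula $\hat{y}\sum_i y_i-y$ against a finite-difference gradient of the same loss $-\sum_i y_i\log\hat{y}_i$ used in the code above.

```python
import math


def softmax(z):
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, y):
    """Loss -sum_i y_i * log(softmax(z)_i)."""
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, softmax(z)))


def numerical_grad(z, y, h=1e-5):
    """Central-difference gradient of the loss with respect to z."""
    grad = []
    for j in range(len(z)):
        z_plus = z[:j] + [z[j] + h] + z[j + 1:]
        z_minus = z[:j] + [z[j] - h] + z[j + 1:]
        grad.append((cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * h))
    return grad


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


z = [0.3, -1.2, 0.8, 0.1]          # hypothetical logits
y_hat = softmax(z)
y_one_hot = [0.0, 0.0, 1.0, 0.0]   # entries sum to 1
y_raw = [0.5, -0.7, 1.4, 0.9]      # entries sum to 2.1, not a distribution


def simple_formula(y):
    """y_hat - y: valid only when sum(y) == 1."""
    return [p - t for p, t in zip(y_hat, y)]


def general_formula(y):
    """y_hat * sum(y) - y: valid for any target y."""
    return [p * sum(y) - t for p, t in zip(y_hat, y)]


err_one_hot = max_abs_diff(simple_formula(y_one_hot), numerical_grad(z, y_one_hot))
err_raw_simple = max_abs_diff(simple_formula(y_raw), numerical_grad(z, y_raw))
err_raw_general = max_abs_diff(general_formula(y_raw), numerical_grad(z, y_raw))
print(err_one_hot, err_raw_simple, err_raw_general)
```

With the one-hot target the simple formula matches the numerical gradient; with the unnormalized target only the general formula does.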

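But the above only covers the case $t=T_x$, where time $t$ is the last input step. The derivative with respect to $W_{ax}$ computed there is therefore only one part of the full derivative: because the parameters are shared, the cell at every time step contributes a derivative with respect to $W_{ax}$, and the contributions from all time steps must be added together.

If $t$ is not the last time step but some intermediate step of the network, then during backpropagation the derivative at time $t$ can only be computed once the derivative from time $t+1$ has been passed back; the chain rule then gives the derivative of the parameters at the current step.

Below is a simplified computation graph that shows only the variables involved in going from $W_{ax}$ to $\ell$ (the weight $W_{ax}$ is shared across the whole network):

The video below shows the overall flow from $W_{ax}$ to the loss $\ell$ of one batch across the whole network:

Once $\ell$ has been computed, the derivative $\frac{\partial\ell}{\partial W_{ax}}$ can be computed, but backpropagation in an RNN differs from backpropagation in a fully connected network.

Next we show how backpropagation proceeds. Note that the derivative for $a^{\langle t\rangle}$ at each time step is computed only after the derivative from $a^{\langle t+1\rangle}$ has been passed back; likewise, the derivative of $W_{ax}$ is not obtained in one step, but is complete only after the derivatives for $a$ at all time steps have arrived.

So for a cell at an intermediate time step $t$ we have: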

\[\begin{split} \frac{\partial\ell}{\partial W_{ax}} &= {\color{Red}{ \left( \frac{\partial\ell}{\partial a^{\langle t\rangle}} +\frac{\partial\ell}{\partial z1^{\langle t+1\rangle}} \frac{\partial z1^{\langle t+1\rangle}}{\partial a^{\langle t\rangle}} \right) }} \frac{\partial a^{\langle t\rangle}}{\partial z1^{\langle t\rangle}} \frac{\partial z1^{\langle t\rangle}}{\partial W_{ax}} \end{split}\]

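The red part means that we must wait for the derivative from time $t+1$ to be passed back before the contribution of time $t+1$ to the current step $t$ can be included, which finally yields one component of the parameter gradient. If each partial derivative is expanded carefully, the process looks like a recursion: the derivative at any time step can only be computed after propagating back from the last time step to the current one.

Chain Rule for Multivariate Composite Functions

If the functions $u=\varphi(t)$ and $v=\psi(t)$ are both differentiable at $t$, and the function $z=f(u,v)$ has continuous partial derivatives at the corresponding point $(u,v)$, then the composite function $z=f[\varphi(t),\psi(t)]$ is differentiable at $t$, and $\frac{\mathrm{d}z}{\mathrm{d}t}=\frac{\partial z}{\partial u}\frac{\mathrm{d}u}{\mathrm{d}t}+\frac{\partial z}{\partial v}\frac{\mathrm{d}v}{\mathrm{d}t}$ .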
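A small worked instance of this rule, as a sketch: the functions $u=\sin t$, $v=t^2$, and $z=uv$ are chosen here purely for illustration. The two-path sum from the chain rule is compared with a central-difference estimate of $\mathrm{d}z/\mathrm{d}t$.

```python
import math


def u(t):
    return math.sin(t)


def v(t):
    return t ** 2


def z(t):
    """Composite function z = f(u, v) = u * v."""
    return u(t) * v(t)


def dz_dt_chain(t):
    """Chain rule: dz/dt = (dz/du)(du/dt) + (dz/dv)(dv/dt)."""
    dz_du = v(t)           # partial of u*v with respect to u
    dz_dv = u(t)           # partial of u*v with respect to v
    du_dt = math.cos(t)
    dv_dt = 2.0 * t
    return dz_du * du_dt + dz_dv * dv_dt


t0, h = 1.3, 1e-5
numeric = (z(t0 + h) - z(t0 - h)) / (2.0 * h)
chain = dz_dt_chain(t0)
print(chain, numeric)
```

The same "sum over paths" structure is what makes the derivative with respect to $a^{\langle t\rangle}$ a sum of two terms in the RNN.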

The computation graph below shows how $\ell$ depends on $a^{\langle t\rangle}$.

That is, at time $t$ the derivative of $\ell$ with respect to $a^{\langle t\rangle}$ is the sum of two parts, one for each of two backpropagation paths. The two paths are $\ell\to\ell^{\langle t\rangle}\to\hat{y}^{\langle t\rangle}\to z2^{\langle t\rangle}\to a^{\langle t\rangle}$ and $\ell\to\ell^{\langle t+1\rangle}\to\hat{y}^{\langle t+1\rangle}\to z2^{\langle t+1\rangle}\to a^{\langle t+1\rangle}\to z1^{\langle t+1\rangle}\to a^{\langle t\rangle}$, and we write $\mathrm{d}a_{\mathrm{next}}$ for the sum of the derivatives along them.

So the derivative with respect to $W_{ax}$ for a cell at an intermediate time step is:

\[\frac{\partial\ell}{\partial W_{ax}}=\left(\mathrm{d}a_{\mathrm{next}} * \left( 1-\tanh^2(z1^{\langle t \rangle}) \right)\right) x^{\langle t \rangle T}\]

The derivatives of the other parameters follow in the same way:

\[\begin{aligned} \frac{\partial\ell}{\partial W_{aa}} &= \left(\mathrm{d}a_{\mathrm{next}} * \left( 1-\tanh^2(z1^{\langle t\rangle}) \right)\right) a^{\langle t-1 \rangle T}\\ \frac{\partial\ell}{\partial b_a} &= \sum_{\mathrm{batch}}\left( \mathrm{d}a_{\mathrm{next}} * \left( 1-\tanh^2(z1^{\langle t\rangle}) \right)\right) \end{aligned}\]

Besides the parameter derivatives, time step $t$ must also pass the derivative of $\ell$ with respect to $a^{\langle t-1\rangle}$ back to time $t-1$. Writing the derivative passed to the previous time step as $\mathrm{d}a_{\mathrm{prev}}$, we get:

\[\begin{split} \mathrm{d}a_{\mathrm{prev}} &= \mathrm{d}a_\mathrm{next}\frac{\partial a^{\langle t\rangle}}{\partial z1^{\langle t\rangle}}\frac{\partial z1^{\langle t\rangle}}{\partial a^{\langle t-1\rangle}}\\ &= { W_{aa}}^T\left(\mathrm{d}a_{\mathrm{next}} * \left( 1-\tanh^2(z1^{\langle t\rangle}) \right)\right) \end{split}\]
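To see the recursion at work, here is a minimal sketch on a scalar RNN. It is deliberately simplified and not the network from this post: every quantity is a single number, and a squared-error loss on the hidden state replaces the softmax and cross-entropy layers (an assumption made only to keep the example short). The backward loop accumulates the per-step contributions to $\mathrm{d}W_{ax}$ and passes $\mathrm{d}a_{\mathrm{prev}}$ to the previous step, and the result is compared with a finite-difference derivative.

```python
import math


def forward(w_ax, w_aa, b_a, xs, a0=0.0):
    """Return [a_0, a_1, ..., a_T] with a_t = tanh(w_aa * a_{t-1} + w_ax * x_t + b_a)."""
    hidden = [a0]
    for x_t in xs:
        hidden.append(math.tanh(w_aa * hidden[-1] + w_ax * x_t + b_a))
    return hidden


def loss(w_ax, w_aa, b_a, xs, targets):
    """Assumed loss for the illustration: sum_t 0.5 * (a_t - target_t)^2."""
    hidden = forward(w_ax, w_aa, b_a, xs)
    return sum(0.5 * (a_t - y_t) ** 2 for a_t, y_t in zip(hidden[1:], targets))


def backward(w_ax, w_aa, b_a, xs, targets):
    """Backpropagation through time; returns (dW_ax, dW_aa, db_a)."""
    hidden = forward(w_ax, w_aa, b_a, xs)
    d_w_ax = d_w_aa = d_b_a = 0.0
    da_from_future = 0.0                       # nothing arrives at the last step
    for t in range(len(xs), 0, -1):
        a_t, a_prev = hidden[t], hidden[t - 1]
        # da_next: direct path through this step's loss plus the path via step t+1
        da_next = (a_t - targets[t - 1]) + da_from_future
        dz1 = da_next * (1.0 - a_t ** 2)       # tanh'(z1) = 1 - a_t^2
        d_w_ax += dz1 * xs[t - 1]              # shared weight: contributions add up
        d_w_aa += dz1 * a_prev
        d_b_a += dz1
        da_from_future = w_aa * dz1            # da_prev, handed to step t-1
    return d_w_ax, d_w_aa, d_b_a


xs = [0.5, -1.0, 0.3, 0.8]
targets = [0.1, -0.2, 0.4, 0.0]
w_ax, w_aa, b_a = 0.7, -0.4, 0.1

d_w_ax, d_w_aa, d_b_a = backward(w_ax, w_aa, b_a, xs, targets)

h = 1e-5
numeric_d_w_ax = (loss(w_ax + h, w_aa, b_a, xs, targets)
                  - loss(w_ax - h, w_aa, b_a, xs, targets)) / (2 * h)
numeric_d_w_aa = (loss(w_ax, w_aa + h, b_a, xs, targets)
                  - loss(w_ax, w_aa - h, b_a, xs, targets)) / (2 * h)
print(d_w_ax, numeric_d_w_ax)
print(d_w_aa, numeric_d_w_aa)
```

In the matrix case the scalar products become the transposed matrix products and element-wise products shown in the formulas above.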

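As we can see, backpropagation through a recurrent neural network is quite involved. The cell at every time step depends on the shared parameters, so the backpropagation paths are tangled, and high-dimensional arrays appear along the way; some matrix algebra is needed to bring them into a form that makes computing the derivatives and updating the parameters convenient.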

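Derivatives of Vectors, Matrices, and Tensors

2021-01-09T15:21:00+08:00

By Erik Learned-Miller

Translated from Vector, Matrix, and Tensor Derivatives

My English is limited; if any part of the translation falls short, please point it out so it can be corrected!

The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher-order tensors, and to take derivatives with respect to vectors, matrices, and higher-order tensors.

1 Simplify, Simplify, Simplify

Much of the confusion about taking derivatives involving arrays comes from trying to do too many things at once. These "things" include taking derivatives of several components at the same time, taking derivatives in the presence of summation notation, and applying the chain rule. By doing all of these at once, we are more likely to make errors, at least until we have a lot of experience.

1.1 Expanding notation into explicit sums and equations for each component

To simplify a given calculation, it is often useful to write out the explicit formula for a single scalar element of the output in terms of nothing but scalar variables. Once there is an explicit formula for a single scalar element of the output in terms of other scalar values, ordinary calculus applies, which is much easier than trying to do all of the matrix math, summations, and derivatives at the same time.

Example. Suppose we have a column vector $\vec{y}$ of length $C$ that is computed as the product of a matrix $W$ with $C$ rows and $D$ columns and a column vector $\vec{x}$ of length $D$:

\[\vec{y}=W\vec{x}\tag{1}\]

Suppose we are interested in the derivative of $\vec{y}$ with respect to $\vec{x}$. A full characterization of this derivative requires the (partial) derivative of each component of $\vec{y}$ with respect to each component of $\vec{x}$, which in this case contains $C\times D$ values, since there are $C$ components in $\vec{y}$ and $D$ components in $\vec{x}$.

Let us start by computing one of these, say the $3$rd component of $\vec{y}$ with respect to the $7$th component of $\vec{x}$; that is, we want to compute:

\[\frac{\partial\vec{y}_3}{\partial\vec{x}_7}\]

This is just the derivative of one scalar with respect to another.

The first thing to do is to write down the formula for computing $\vec{y}_3$ so that we can take its derivative. From the definition of matrix-vector multiplication, the value $\vec{y}_3$ is the dot product of the $3$rd row of $W$ and the vector $\vec{x}$:

\[\vec{y}_3=\sum_{j=1}^DW_{3,j}\vec{x}_j\tag{2}\]

At this point, we have reduced the original matrix equation (Equation $(1)$) to a scalar equation. This makes it much easier to compute the desired derivative.

1.2 Removing summation notation

While it is certainly possible to compute the derivative directly from Equation $(2)$, people frequently make errors when differentiating expressions that contain summation notation ($\sum$) or product notation ($\prod$). When starting out, it is sometimes useful to write the computation without any summation notation to make sure every step is right. Using $1$ as the first index, we have:

\[\vec{y}_3=W_{3,1}\vec{x}_1+W_{3,2}\vec{x}_2+\dots+W_{3,7}\vec{x}_7+\dots+W_{3,D}\vec{x}_D\]

Of course, the term involving $\vec{x}_7$ is written out explicitly because that is the variable we are differentiating with respect to. At this point, we can see that the expression for $\vec{y}_3$ depends on $\vec{x}_7$ only through the single term $W_{3,7}\vec{x}_7$. None of the other terms in the sum contain $\vec{x}_7$, so their derivatives with respect to $\vec{x}_7$ are all $0$. Thus, we have: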

\[\begin{split} \frac{\partial\vec{y}_3}{\partial\vec{x}_7} &=\frac{\partial}{\partial\vec{x}_7}\left[W_{3,1}\vec{x}_1+W_{3,2}\vec{x}_2+\dots+W_{3,7}\vec{x}_7+\dots+W_{3,D}\vec{x}_D\right]\\ &=0+0+\dots+\frac{\partial}{\partial\vec{x}_7}\left[W_{3,7}\vec{x}_7\right]+\dots+0\\ &=\frac{\partial}{\partial\vec{x}_7}\left[W_{3,7}\vec{x}_7\right]\\ &=W_{3,7} \end{split}\]

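By focusing on one component of $\vec{y}$ and one component of $\vec{x}$, we have made the calculation about as simple as it can be. In the future, when you are confused, it can help to reduce a problem to this most basic setting to see where you are going wrong.

1.2.1 Completing the derivative: the Jacobian matrix

Recall that our original goal was to compute the derivative of each component of $\vec{y}$ with respect to each component of $\vec{x}$, and we noted that there would be $C\times D$ of these. They can be written out as a matrix of the following form: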

\[\begin{bmatrix} \frac{\partial\vec{y}_1}{\partial\vec{x}_1}&\frac{\partial\vec{y}_1}{\partial\vec{x}_2}&\frac{\partial\vec{y}_1}{\partial\vec{x}_3}&\dots&\frac{\partial\vec{y}_1}{\partial\vec{x}_D}\\ \frac{\partial\vec{y}_2}{\partial\vec{x}_1}&\frac{\partial\vec{y}_2}{\partial\vec{x}_2}&\frac{\partial\vec{y}_2}{\partial\vec{x}_3}&\dots&\frac{\partial\vec{y}_2}{\partial\vec{x}_D}\\ \vdots&\vdots&\vdots&\ddots &\vdots\\ \frac{\partial\vec{y}_C}{\partial\vec{x}_1}&\frac{\partial\vec{y}_C}{\partial\vec{x}_2}&\frac{\partial\vec{y}_C}{\partial\vec{x}_3}&\dots&\frac{\partial\vec{y}_C}{\partial\vec{x}_D}\\ \end{bmatrix}\]

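In this particular case, this matrix is called the Jacobian matrix, but the terminology is not important for our purposes.

Notice that for the equation

\[\vec{y}=W\vec{x}\]

the partial derivative of $\vec{y}_3$ with respect to $\vec{x}_7$ is simply given by $W_{3,7}$. If you go through the same process for the other components, you will find that, for all $i$ and $j$:

\[\frac{\partial\vec{y}_i}{\partial\vec{x}_j}=W_{i,j}\]

This means that the matrix of partial derivatives is

\[\begin{bmatrix} W_{1,1}&W_{1,2}&W_{1,3}&\dots&W_{1,D}\\ W_{2,1}&W_{2,2}&W_{2,3}&\dots&W_{2,D}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ W_{C,1}&W_{C,2}&W_{C,3}&\dots&W_{C,D} \end{bmatrix}\]

which, of course, is just $W$ itself.

Thus, after all this work, we can conclude that for

\[\vec{y}=W\vec{x}\]

we have

\[\frac{\mathrm{d}\vec{y}}{\mathrm{d}\vec{x}}=W\]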
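This result can be checked numerically. The sketch below uses an arbitrary $3\times 4$ matrix chosen for illustration and made-up helper names: it builds the Jacobian of $\vec{y}=W\vec{x}$ one component at a time with central differences and compares it with $W$.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by column vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def numerical_jacobian(f, x, h=1e-4):
    """J[i][j] approximates d f(x)[i] / d x[j] by central differences."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = x[:j] + [x[j] + h] + x[j + 1:]
        x_minus = x[:j] + [x[j] - h] + x[j + 1:]
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J


# C = 3 rows, D = 4 columns; values are arbitrary.
W = [[1.0, -2.0, 0.5, 3.0],
     [0.0, 4.0, -1.5, 2.0],
     [2.5, 1.0, 0.0, -3.0]]
x = [0.2, -1.0, 0.7, 1.5]

J = numerical_jacobian(lambda v: matvec(W, v), x)
max_error = max(abs(J[i][j] - W[i][j]) for i in range(3) for j in range(4))
print(max_error)
```

Each entry J[i][j] is computed exactly the way the text suggests: one output component, one input component, everything else held fixed.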

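2 Row vectors instead of column vectors

When working with different neural network packages, it is important to pay close attention to how weight matrices, data matrices, and so on are arranged. For example, if a data matrix $X$ contains many different vectors, each of which represents an input, is each data vector a row or a column of $X$?

In the example from the first section, the vector $\vec{x}$ was a column vector. However, you should also be able to apply the same basic ideas when $\vec{x}$ is a row vector.

2.1 Example 2

Let $\vec{y}$ be a row vector of length $C$ that is computed as the product of another row vector $\vec{x}$ of length $D$ and a matrix $W$ with $D$ rows and $C$ columns.

\[\vec{y}=\vec{x}W\]

Importantly, although $\vec{y}$ and $\vec{x}$ still have the same number of components as before, the shape of $W$ is the transpose of the shape we used for $W$ before. In particular, because we are now left-multiplying by $\vec{x}$, whereas before $\vec{x}$ was on the right, $W$ must be transposed for the matrix algebra to work.

In this case, you will see, by writing

\[\vec{y}_3=\sum_{j=1}^D\vec{x}_jW_{j,3}\]

that

\[\frac{\partial\vec{y}_3}{\partial\vec{x}_7}=W_{7,3}\]

Notice that the indexing into $W$ is the opposite of what it was in the first example. However, when we assemble the full Jacobian matrix, we can still see that in this case as well

\[\frac{\mathrm{d}\vec{y}}{\mathrm{d}\vec{x}}=W\tag{7}\]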

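3 Dealing with more than two dimensions

Let us consider another closely related problem, that of computing

\[\frac{\mathrm{d}\vec{y}}{\mathrm{d}W}\]

In this case, $\vec{y}$ varies along one coordinate while $W$ varies along two coordinates. Thus, the entire derivative is most naturally contained in a three-dimensional array. We avoid the term "three-dimensional matrix", since it is not clear how matrix multiplication and other matrix operations are defined on a three-dimensional array.

With three-dimensional arrays, finding a way to display them may become more trouble than it is worth. Instead, we should simply define our results as formulas that can be used to compute any element of the desired three-dimensional array.

Let us again compute a scalar derivative between one component of $\vec{y}$, say $\vec{y}_3$, and one component of $W$, say $W_{7,8}$. We start with the same basic setup, in which we write down an equation for $\vec{y}_3$ in terms of other scalar components. Now we would like an equation that expresses $\vec{y}_3$ in terms of other scalar values and shows the role that $W_{7,8}$ plays in its computation.

However, what we see is that $W_{7,8}$ plays no role in the computation of $\vec{y}_3$, since

\[\vec{y}_3=\vec{x}_1W_{1,3}+\vec{x}_2W_{2,3}+\dots+\vec{x}_DW_{D,3} \tag{8}\]

In other words,

\[\frac{\partial\vec{y}_3}{\partial W_{7,8}}=0\]

However, the partial derivatives of $\vec{y}_3$ with respect to elements of the $3$rd column of $W$ will certainly be non-zero. For example, the derivative of $\vec{y}_3$ with respect to $W_{2,3}$ is given by

\[\frac{\partial\vec{y}_3}{\partial W_{2,3}}=\vec{x}_2 \tag{9}\]

as can easily be seen from Equation $(8)$.

In general, when the index of the $\vec{y}$ component equals the second index of $W$, the derivative is non-zero, and it is zero otherwise. We can write:

\[\frac{\partial\vec{y}_j}{\partial W_{i,j}}=\vec{x}_i\]

but the other elements of the three-dimensional array are $0$. If we let $F$ denote the three-dimensional array holding the derivative of $\vec{y}$ with respect to $W$, where

\[F_{i,j,k}=\frac{\partial\vec{y}_i}{\partial W_{j,k}}\]

then

\[F_{i,j,i}=\vec{x}_j\]

and all other entries of $F$ are zero.

Finally, if we define a new two-dimensional array $G$ as

\[G_{i,j}=F_{i,j,i}\]

we can see that all of the information we need about $F$ can be stored in $G$, and that the non-trivial portion of $F$ is actually two-dimensional, not three-dimensional.

Representing the important part of derivative arrays in a compact way is critical to efficient implementations of neural networks.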
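The sparsity of $F$ can be made concrete with a small sketch. Using arbitrary values for a row vector $\vec{x}$ of length $D=3$ and a $3\times 2$ matrix $W$ (both chosen only for illustration), it fills the full array $F_{i,j,k}=\partial\vec{y}_i/\partial W_{j,k}$ by central differences, checks which entries are non-zero, and then extracts the compact array $G$.

```python
def vecmat(x, W):
    """Row vector x (length D) times matrix W (D rows, C columns)."""
    return [sum(x[j] * W[j][k] for j in range(len(x))) for k in range(len(W[0]))]


x = [0.5, -1.0, 2.0]            # D = 3
W = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 0.5]]                # D = 3 rows, C = 2 columns
D, C = len(W), len(W[0])
h = 1e-4

# F[i][j][k] approximates d y_i / d W_{j,k}
F = [[[0.0] * C for _ in range(D)] for _ in range(C)]
for j in range(D):
    for k in range(C):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[j][k] += h
        W_minus[j][k] -= h
        y_plus, y_minus = vecmat(x, W_plus), vecmat(x, W_minus)
        for i in range(C):
            F[i][j][k] = (y_plus[i] - y_minus[i]) / (2 * h)

# Only entries with k == i are non-zero, and they equal x_j.
max_error = max(
    abs(F[i][j][k] - (x[j] if k == i else 0.0))
    for i in range(C) for j in range(D) for k in range(C)
)

# Compact two-dimensional representation G[i][j] = F[i][j][i]
G = [[F[i][j][i] for j in range(D)] for i in range(C)]
print(max_error)
print(G)
```

Every row of G is (approximately) $\vec{x}$ itself, so nothing is lost by storing G, or even just $\vec{x}$, instead of the full three-dimensional array.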

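4 Multiple data points

It is a good exercise to repeat some of the previous examples using multiple examples of $\vec{x}$, stacked together to form a matrix $X$. Let us assume that each individual $\vec{x}$ is a row vector of length $D$, and that $X$ is a two-dimensional array with $N$ rows and $D$ columns. As in our last example, $W$ is a matrix with $D$ rows and $C$ columns. $Y$, given by

\[Y=XW\]

is then a matrix with $N$ rows and $C$ columns. Thus, each row of $Y$ gives the row vector associated with the corresponding row of the input $X$.

Sticking to our technique of writing down an expression for a given component of the output, we have:

\[Y_{i,j}=\sum_{k=1}^DX_{i,k}W_{k,j}\]

We can see immediately from this equation that the derivatives

\[\frac{\partial Y_{a,b}}{\partial X_{c,d}}\]

are all zero unless $a=c$. That is, since each component of $Y$ is computed using only the corresponding row of $X$, derivatives between components of $Y$ and $X$ that lie in different rows are zero.

Furthermore, we can see that

\[\frac{\partial Y_{i,j}}{\partial X_{i,k}}=W_{k,j} \tag{10}\]

does not depend at all on which row of $Y$ and $X$ we are comparing.

In fact, the matrix $W$ holds all of these partial derivatives as it is; we just have to remember to index into it according to Equation $(10)$ to obtain the specific partial derivative we want.

If we let $Y_{i,:}$ be the $i$th row of $Y$ and $X_{i,:}$ be the $i$th row of $X$, then we see that

\[\frac{\partial Y_{i,:}}{\partial X_{i,:}}=W\]

which is a simple generalization of our previous result from Equation $(7)$.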
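Both claims, that derivatives across different rows vanish and that same-row derivatives are entries of $W$, can be checked with a short sketch. The matrices below are arbitrary ($N=2$, $D=3$, $C=2$) and the helper dY_dX is a made-up name for the illustration.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dY_dX(X, W, a, b, c, d, h=1e-4):
    """Central-difference estimate of dY[a][b] / dX[c][d] for Y = XW."""
    X_plus = [row[:] for row in X]
    X_minus = [row[:] for row in X]
    X_plus[c][d] += h
    X_minus[c][d] -= h
    return (matmul(X_plus, W)[a][b] - matmul(X_minus, W)[a][b]) / (2 * h)


X = [[0.5, -1.0, 2.0],
     [1.5, 0.3, -0.7]]          # N = 2, D = 3
W = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 0.5]]                # D = 3, C = 2
N, D, C = 2, 3, 2

max_error = 0.0
for a in range(N):
    for b in range(C):
        for c in range(N):
            for d in range(D):
                # Equation (10): W[d][b] within the same row, zero across rows
                expected = W[d][b] if a == c else 0.0
                max_error = max(max_error, abs(dY_dX(X, W, a, b, c, d) - expected))
print(max_error)
```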

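5 The chain rule in combination with vectors and matrices

Now that we have worked through a few basic examples, let us combine these ideas in an example of the chain rule. Again assuming that $\vec{y}$ and $\vec{x}$ are column vectors, we start with the equation

\[\vec{y}=VW\vec{x}\]

and try to compute the derivative of $\vec{y}$ with respect to $\vec{x}$. We could simply observe that the product of the two matrices $V$ and $W$ is just another matrix, call it $U$, and therefore

\[\frac{\mathrm{d}\vec{y}}{\mathrm{d}\vec{x}}=VW=U\]

However, we want to go through the process of using the chain rule with intermediate results, so that we can see how the chain rule applies to non-scalar derivatives.

Let us define the intermediate result

\[\vec{m}=W\vec{x}\]

Then we have

\[\vec{y}=V\vec{m}\]

We can then use the chain rule to write

\[\frac{\mathrm{d}\vec{y}}{\mathrm{d}\vec{x}}=\frac{\mathrm{d}\vec{y}}{\mathrm{d}\vec{m}}\frac{\mathrm{d}\vec{m}}{\mathrm{d}\vec{x}}\]

To make sure we know exactly what this means, let us take the old approach of analyzing one component at a time, starting with a single component of $\vec{y}$ and a single component of $\vec{x}$:

\[\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{x}_j}=\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{m}}\frac{\mathrm{d}\vec{m}}{\mathrm{d}\vec{x}_j}\]

But how exactly should we interpret the product on the right? The idea of the chain rule is to multiply the change in $\vec{y}_i$ with respect to each scalar intermediate variable by the change in that scalar intermediate variable with respect to $\vec{x}_j$. In particular, if $\vec{m}$ has $M$ components, then we write

\[\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{x}_j}=\sum_{k=1}^M\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{m}_k}\frac{\mathrm{d}\vec{m}_k}{\mathrm{d}\vec{x}_j}\]

Recall our previous results about the derivative of a vector with respect to a vector:

\[\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{m}_k}\]

is just $V_{i,k}$, and

\[\frac{\mathrm{d}\vec{m}_k}{\mathrm{d}\vec{x}_j}\]

is just $W_{k,j}$. So we can write

\[\frac{\mathrm{d}\vec{y}_i}{\mathrm{d}\vec{x}_j}=\sum_{k=1}^MV_{i,k}W_{k,j}\]

which is just the component expression for $VW$, our original answer to the problem.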
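The component-wise chain rule can be checked with a small sketch. The matrices $V$ ($3\times 2$) and $W$ ($2\times 4$) below are arbitrary values chosen for illustration: the sum over the intermediate components $\vec{m}_k$ is compared with a finite-difference Jacobian of $\vec{y}=V(W\vec{x})$.

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


V = [[1.0, -1.0],
     [0.5, 2.0],
     [3.0, 0.0]]                      # 3 x 2, so m has M = 2 components
W = [[2.0, 0.0, -1.0, 1.0],
     [1.0, 3.0, 0.5, -2.0]]           # 2 x 4
x = [0.1, -0.4, 0.9, 1.2]


def f(v):
    m = matvec(W, v)                  # intermediate result m = W x
    return matvec(V, m)               # y = V m


# Chain rule, one component at a time: dy_i/dx_j = sum_k V[i][k] * W[k][j]
chain = [[sum(V[i][k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
         for i in range(len(V))]

# Finite-difference Jacobian of the composite function for comparison
h = 1e-4
numeric = [[0.0] * len(x) for _ in V]
for j in range(len(x)):
    x_plus = x[:j] + [x[j] + h] + x[j + 1:]
    x_minus = x[:j] + [x[j] - h] + x[j + 1:]
    y_plus, y_minus = f(x_plus), f(x_minus)
    for i in range(len(V)):
        numeric[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)

max_error = max(abs(chain[i][j] - numeric[i][j])
                for i in range(len(V)) for j in range(len(x)))
print(max_error)
```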

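To summarize, we can use the chain rule in the context of vector and matrix derivatives by:

Clearly stating the intermediate results and the variables used to represent them
Expressing the chain rule for individual components of the final derivative
Summing appropriately over the intermediate results within the chain rule expression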