Machine Learning Formulas

#programming #ml

Backprop equations for column-major input $$\begin{gather} \delta^L = \nabla_{a} C \odot \sigma'(z^L) \\ \delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l) \\ \frac{\partial C}{\partial b^l} = \delta^l \\ \frac{\partial C}{\partial w^l} = \delta^l \times (a^{l-1})^T \end{gather}$$

Backprop equations for row-major input $$\begin{gather} \delta^L = \nabla_{a} C \odot \sigma'(z^L) \\ \delta^l = (\delta^{l+1} (w^{l+1})^T) \odot \sigma'(z^l) \\ \frac{\partial C}{\partial b^l} = \delta^l \\ \frac{\partial C}{\partial w^l} = (a^{l-1})^T \times \delta^l \end{gather}$$

The major difference is the order of the matrix multiplications used to propagate $\delta^l$ backward and to form the weight gradients: transposes move to the other operand when activations are stored as rows instead of columns.
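The row-major equations above can be sketched in NumPy as a minimal backward pass for a sigmoid MLP. This is an illustrative sketch, not from the note: the function names are mine, weights are stored as `(inputs, outputs)` matrices, and the output error assumes an (unaveraged) quadratic cost so that $\nabla_a C = a^L - y$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_row_major(ws, bs, x, y):
    """Backward pass with row-major data: x has shape (batch, features),
    ws[l] has shape (in_l, out_l). Assumes cost C = 1/2 * sum((a^L - y)^2)."""
    # Forward pass, caching pre-activations z^l and activations a^l.
    a, acts, zs = x, [x], []
    for w, b in zip(ws, bs):
        z = a @ w + b            # row-major: activations times weights
        zs.append(z)
        a = sigmoid(z)
        acts.append(a)

    # Output error: delta^L = grad_a C ⊙ sigma'(z^L)
    delta = (acts[-1] - y) * sigmoid_prime(zs[-1])
    grads_w, grads_b = [None] * len(ws), [None] * len(bs)
    for l in range(len(ws) - 1, -1, -1):
        grads_w[l] = acts[l].T @ delta     # (a^{l-1})^T delta^l
        grads_b[l] = delta.sum(axis=0)     # sum the bias gradient over the batch
        if l > 0:
            # delta^l = (delta^{l+1} (w^{l+1})^T) ⊙ sigma'(z^l)
            delta = (delta @ ws[l].T) * sigmoid_prime(zs[l - 1])
    return grads_w, grads_b
```

The column-major version is the same loop with every product transposed, e.g. `delta @ ws[l].T` becomes `ws[l].T @ delta`.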

Loss functions

MSE $$C = \frac{1}{2n}\sum_{j}(a_{j}^{L} - y_{j})^2$$
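The MSE formula translates directly to NumPy. A minimal sketch, assuming `a` and `y` are `(n, outputs)` arrays and the sum runs over both samples and output units (the function name is mine):

```python
import numpy as np

def mse_cost(a, y):
    """Quadratic cost C = 1/(2n) * sum_j (a_j - y_j)^2, summed over the batch."""
    n = a.shape[0]  # number of training examples (rows)
    return np.sum((a - y) ** 2) / (2 * n)
```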

Cross entropy $$C = -\frac{1}{n}\sum_{j}\left(y_{j}\ln(a^{L}_{j}) + (1-y_{j})\ln(1-a^{L}_{j})\right)$$

For a sigmoid output layer, the $\sigma'(z^L)$ factor cancels and the cross-entropy output error is simply $$\delta^L = a^L - y$$
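Both the cross-entropy cost and its simplified output error can be sketched as follows (a minimal sketch with hypothetical function names, assuming sigmoid outputs so the cancellation above applies):

```python
import numpy as np

def cross_entropy_cost(a, y):
    """C = -1/n * sum_j ( y_j ln a_j + (1 - y_j) ln(1 - a_j) ), summed over the batch."""
    n = a.shape[0]
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / n

def cross_entropy_output_delta(a, y):
    """delta^L for sigmoid outputs: the sigma'(z^L) factor cancels, leaving a - y."""
    return a - y
```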


