Table of Contents

Deep Learning
- Basics
- Data Preparation
  - Image
- Network
  - Design
  - Architecture
  - Layer
  - Node
  - Activation
- Loss Function
- Optimizer
- Hyperparameter
  - Reinforcement Learning
- Compiler
- References

Deep Learning

Basics

TBW

Data Preparation

Image

Information - How to Resize Input Images (e.g., AlexNet, ResNet)

https://blog.shikoan.com/imagenet-preprocessing/

Information - Automatic Image Augmentation

https://sebastianraschka.com/blog/2023/data-augmentation-pytorch.html

Information - Tensor Layouts in Memory: NCHW vs NHWC

https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html
https://gist.github.com/mingfeima/595f63e5dd2ac6f87fdb47df4ffe4772
https://www.intel.com/content/www/us/en/docs/onednn/developer-guide-reference/2023-1/understanding-memory-formats.html

N: Batch
C: Channels
H: Height
W: Width
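
The two layouts differ only in which dimension varies fastest in memory. A minimal sketch of the flat-offset arithmetic, assuming a dense, unpadded tensor (the function names and the 3x2x2 shape are made up for illustration):

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when W varies fastest (NCHW)."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat offset of the same element when C varies fastest (NHWC)."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical shape: one image, 3 channels, 2x2 pixels.
C, H, W = 3, 2, 2

# Offsets of the three channel values of pixel (h=0, w=0) in image n=0.
nchw_offsets = [offset_nchw(0, c, 0, 0, C, H, W) for c in range(C)]
nhwc_offsets = [offset_nhwc(0, c, 0, 0, C, H, W) for c in range(C)]

print(nchw_offsets)  # [0, 4, 8]: channels of one pixel are H*W elements apart
print(nhwc_offsets)  # [0, 1, 2]: channels of one pixel are adjacent
```

In NHWC all channels of a pixel sit next to each other, while in NCHW each channel forms its own contiguous H x W plane.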

Network

Design

Input
Hidden
Output
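
As a rough illustration of how the input, hidden, and output layers fit together, here is a tiny forward pass in plain Python. The layer sizes and all weights are hypothetical values chosen so the arithmetic is easy to follow; a real network would learn them.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(x, weights, bias):
    """y = W x + b, with weights stored as one row per output node."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical network: 2 input nodes -> 3 hidden nodes -> 1 output node.
x = [1.0, 2.0]
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0]]
b2 = [0.5]

hidden = relu(linear(x, W1, b1))  # hidden layer with a ReLU activation
output = linear(hidden, W2, b2)   # output layer, no activation

print(hidden)  # [0.0, 1.5, 1.0]
print(output)  # [3.0]
```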

Architecture

MLP: Multi-Layer Perceptron
CNN: Convolutional Neural Network

https://cs231n.github.io/convolutional-networks/

RNN: Recurrent Neural Network
Transformer

Layer

Linear
Convolution
Dropout
Pooling
- Max Pooling
- Adaptive Max Pooling
Normalization
- Batch Normalization

https://cvml-expertguide.net/terms/dl/layers/batch-normalization-layer/
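
A minimal sketch of what batch normalization computes in training mode, assuming a single feature observed across one mini-batch. The names gamma, beta, and eps follow common usage and are not taken from the linked article; running statistics for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations of one node
normalized = batch_norm(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(norm_mean, norm_var)  # close to 0 and 1 respectively
```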

Recurrent
- RNN
- LSTM
Transformer
- Encoder
- Decoder
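
Of the layers listed above, the two pooling variants are easy to confuse. The sketch below contrasts them in one dimension: max pooling fixes the window size and stride, while adaptive max pooling fixes the output length and derives the windows from it. The floor/ceil window rule is an assumption made for this sketch.

```python
def max_pool1d(x, kernel_size, stride):
    """Max over fixed-size windows; output length depends on the input length."""
    return [max(x[i:i + kernel_size])
            for i in range(0, len(x) - kernel_size + 1, stride)]


def adaptive_max_pool1d(x, output_size):
    """Max over windows chosen so that exactly output_size values come out."""
    n = len(x)
    out = []
    for i in range(output_size):
        start = (i * n) // output_size                          # floor
        end = ((i + 1) * n + output_size - 1) // output_size    # ceil
        out.append(max(x[start:end]))
    return out


signal = [1, 3, 2, 5, 4, 0, 7]  # hypothetical input of length 7

pooled = max_pool1d(signal, kernel_size=2, stride=2)
adaptive = adaptive_max_pool1d(signal, output_size=3)

print(pooled)    # [3, 5, 4]
print(adaptive)  # [3, 5, 7]
```

With kernel size 2 and stride 2 the last element of the odd-length input is dropped, whereas the adaptive variant covers the whole input with slightly overlapping windows.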

Node

Number of nodes per layer

Activation

ReLU [Common Choice]
Tanh
Sigmoid
SoftMax: Typically used in the output layer for (multi-class) classification problems

GELU: Suitable for transformer networks [Smart Choice]
Leaky ReLU
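
For reference, the activations listed above can be written in a few lines of standard-library Python. This is only a sketch: the leaky slope of 0.01 is a commonly used default assumed here, and GELU is shown in its exact erf form rather than the tanh approximation. Tanh is available directly as math.tanh.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of zero."""
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical output-layer scores
print(relu(-2.0), leaky_relu(-2.0), sigmoid(0.0), math.tanh(0.0), gelu(0.0))
print(probs)
```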

Loss Function

https://qiita.com/Hatomugi/items/d00c1a7df07e0e3925a8

Regression
1. MSE: More sensitive to outliers in the data [Common Choice]
2. MAE: Less sensitive to outliers in the data
3. Huber Loss: Quadratic like MSE for small errors, linear like MAE for errors above a threshold [Smart Choice]
Classification
1. Cross-Entropy: Prime candidate for classification
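
To make the outlier sensitivity concrete, the sketch below evaluates the three regression losses on hypothetical data with one outlier, plus cross-entropy for a single classification sample. The threshold name delta and the 0.5 factor in the quadratic branch follow the usual Huber definition, and the data are invented.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]  # the last prediction is an outlier (error 10)

print(mse(y_true, y_pred))    # 25.0: the squared outlier dominates
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375: the outlier is penalized linearly
print(cross_entropy([0.7, 0.2, 0.1], target_index=0))
```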

Optimizer

https://qiita.com/omiita/items/1735c1d048fe5f611f80

SGD: Stochastic Gradient Descent
Momentum: SGD + moving average of past gradients
RMSProp: Root Mean Square Propagation (2012)
Adam: Momentum + RMSProp (2014) [Common Choice]
RAdam: Rectified Adam (2020) [Smart Choice]
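
The update rules behind the first four optimizers can be compared on a toy problem. The sketch below minimizes f(w) = w^2 starting from w = 5; the learning rate, decay constants, and step count are illustrative assumptions, momentum is written in its moving-average form, and RAdam's rectification term is not shown.

```python
import math


def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


def run_sgd(w, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_momentum(w, lr=0.1, beta=0.9, steps=200):
    avg = 0.0  # moving average of gradients
    for _ in range(steps):
        avg = beta * avg + (1 - beta) * grad(w)
        w -= lr * avg
    return w


def run_rmsprop(w, lr=0.1, rho=0.9, eps=1e-8, steps=200):
    sq = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        sq = rho * sq + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(sq) + eps)
    return w


def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum part
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp part
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


start = 5.0
results = {
    "SGD": run_sgd(start),
    "Momentum": run_momentum(start),
    "RMSProp": run_rmsprop(start),
    "Adam": run_adam(start),
}
for name, w in results.items():
    print(f"{name}: w = {w:.6f}")
```

All four end up closer to the minimum at w = 0 than the starting point; how fast and how smoothly they get there depends heavily on the assumed hyperparameters.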

Hyperparameter

Learning Rate: Initial / Final / Fixed or Time-Based Decay
Batch Size: Larger batches speed up training but can lower accuracy; smaller batches produce noisier gradients, which helps avoid getting trapped in local minima
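
One common reading of time-based decay for the learning rate is lr = initial / (1 + decay_rate * epoch). The sketch below uses that formula as an assumption; the initial rate 0.1 and decay rate 0.5 are arbitrary illustrative values.

```python
def time_based_decay(initial_lr, decay_rate, epoch):
    """Learning rate that shrinks as 1 / (1 + decay_rate * epoch)."""
    return initial_lr / (1.0 + decay_rate * epoch)


# Hypothetical settings: start at 0.1 and decay over five epochs.
schedule = [time_based_decay(0.1, 0.5, epoch) for epoch in range(5)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.4f}")
```

A fixed learning rate corresponds to decay_rate = 0, and the final learning rate is simply the value of the schedule at the last epoch.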

Reinforcement Learning

Gamma
Epsilon
Tau
Entropy
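
The four names above are used differently across algorithms. The sketch below assumes one common reading of each: gamma as the discount factor, epsilon as the exploration rate of an epsilon-greedy policy (in PPO it often denotes the clipping range instead), tau as the soft-update rate of a target network, and entropy as the policy entropy that an entropy coefficient would scale. All function names and values are hypothetical.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Sum of rewards where a reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def soft_update(target, online, tau):
    """Move target parameters a fraction tau toward the online parameters."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def policy_entropy(probs):
    """Entropy of an action distribution; higher means more exploration."""
    return -sum(p * math.log(p) for p in probs if p > 0)


rng = random.Random(0)
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
new_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)
uniform_entropy = policy_entropy([0.5, 0.5])

print(ret)            # 1.75
print(greedy_action)  # 1
print(new_target)
print(uniform_entropy)
```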

Compiler

https://xsig.ipsj.or.jp/wp-content/uploads/sites/6/2019/06/imai_20190529_xSiG_public_s.pdf

References

https://nonbiri-tereka.hatenablog.com/entry/2016/03/10/073633
https://qiita.com/omiita/items/d24568a835da6911b01e
https://acro-engineer.hatenablog.com/entry/2019/12/25/130000
https://medium.com/aureliantactics/ppo-hyperparameters-and-ranges-6fc2d29bccbe
https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network