Overfitting and Underfitting

Overfitting and underfitting are two of the most common challenges in machine learning. The ultimate goal is a model that performs well on both the training data and unseen data.

Overfitting
Occurs when a model learns the training data too well, capturing noise and random fluctuations instead of the underlying patterns. As a result, the model performs poorly on new, unseen data.

Underfitting
Occurs when a model is too simple to capture the underlying patterns in the data. It performs poorly on both training and test data.

Regularization

Regularization is a crucial technique for preventing overfitting in machine learning. It helps the model generalize better to unseen data.

  • L1 Regularization (Lasso) Adds the absolute value of the weights to the loss function. This encourages sparsity, meaning many weights become zero, effectively performing feature selection.

Lasso stands for Least Absolute Shrinkage and Selection Operator. It is a method for estimating linear regression models using L1 regularization. Lasso regression not only estimates parameters but also performs variable selection during estimation: some regression coefficients shrink to exactly zero, which achieves the goal of selecting variables.
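A minimal PyTorch sketch of adding an L1 penalty to the loss (the toy model, shapes, and l1_lambda strength below are assumptions for illustration, not recommendations):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)      # assumed toy model
criterion = nn.MSELoss()
l1_lambda = 0.01              # assumed regularization strength

def loss_with_l1(outputs, targets):
    # Standard loss plus the sum of absolute weight values (the L1 penalty)
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    return criterion(outputs, targets) + l1_lambda * l1_penalty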

  • L2 Regularization (Ridge Regression) Adds the squared magnitude of the weights to the loss function. This prevents weights from growing too large.
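In PyTorch, an optimizer's weight_decay argument applies an L2 penalty on the weights; a minimal sketch (the toy model and hyperparameter values are assumptions for illustration):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)      # assumed toy model
# weight_decay adds the L2 penalty; 1e-4 is an assumed strength
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)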

  • Dropout Randomly sets a fraction of input units to zero at each training update. This prevents the network from relying too heavily on any particular feature.

import torch.nn as nn

dropout_layer = nn.Dropout(p=0.5)  # Dropout rate of 50%
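Note that nn.Dropout is only active in training mode; calling model.eval() disables dropout at inference time.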
  • Early Stopping A regularization technique that prevents overfitting by halting the training process before the model starts to memorize the training data too closely.

Training is stopped when the performance on the validation set starts to deteriorate.
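A minimal sketch of the idea (train_one_epoch and evaluate are hypothetical helpers, and the patience value is an assumed hyperparameter):

best_val_loss = float("inf")
patience = 5                   # assumed: epochs to wait without improvement
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(model)                 # hypothetical training step
    val_loss = evaluate(model)             # hypothetical validation-loss helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation performance stopped improving; halt training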

Data Augmentation

It's a technique used to increase the amount of data available for training a machine learning model, especially when dealing with small datasets. Common image transformations include:

  • Cropping
  • Resizing
  • Rotating
  • Flipping
  • Adding noise
  • Color jittering

Popular libraries for image augmentation include Keras, Augmentor, and Albumentations.
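A minimal sketch using Albumentations, one of the libraries above (all parameter values are illustrative assumptions, not recommendations):

import albumentations as A

# Chain several of the transforms listed above
augment = A.Compose([
    A.RandomCrop(width=224, height=224),   # cropping
    A.Rotate(limit=15, p=0.5),             # rotating
    A.HorizontalFlip(p=0.5),               # flipping
    A.GaussNoise(p=0.3),                   # adding noise
    A.ColorJitter(p=0.5),                  # color jittering
])

# Apply to a NumPy image array of shape (height, width, channels):
# augmented_image = augment(image=image)["image"]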

THE END