Vision Transformer
paper:[2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arxiv.org)
bilibili:ViT论文逐段精读【论文精读】 (ViT paper, read through section by section)
code:WZMIAOMIAO/deep-learning-for-image-processing (github.com)
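As a quick reminder of the core idea before diving into the linked repo, here is a minimal sketch (not the repo's code) of the ViT input pipeline: cut the image into 16x16 patches, linearly project each patch, prepend a [class] token, and add learnable position embeddings. Dimensions follow ViT-Base; everything else is simplified.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into 16x16 patches and linearly project each one.
    A sketch assuming ViT-Base dimensions (patch 16, embed dim 768)."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to
        # flattening each patch and applying a shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)     # (B, 196, 768): one token per patch
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend [class] token -> (B, 197, 768)
        return x + self.pos_embed            # add position embeddings

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768])
```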
DeiT
paper:[2012.12877] Training data-efficient image transformers & distillation through attention (arxiv.org)
zhihu:DeiT:注意力Attention也能蒸馏 (DeiT: attention can be distilled too) (zhihu.com)
code:same repo as ViT
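The "distillation through attention" idea reduces to an extra distillation token: its head is trained against the teacher's hard predictions while the class-token head is trained against the ground truth. A minimal sketch of that hard-distillation loss follows; the two-headed student model is assumed, not shown.

```python
import torch
import torch.nn.functional as F

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    """DeiT hard-label distillation: class-token head vs. ground truth,
    distillation-token head vs. the teacher's argmax, equally weighted.
    A sketch; the student producing the two heads is assumed."""
    teacher_labels = teacher_logits.argmax(dim=1)              # hard teacher decisions
    loss_cls = F.cross_entropy(cls_logits, targets)            # supervised term
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)   # distillation term
    return 0.5 * loss_cls + 0.5 * loss_dist

# toy usage: batch of 4, 10 classes
cls_logits, dist_logits = torch.randn(4, 10), torch.randn(4, 10)
teacher_logits, targets = torch.randn(4, 10), torch.randint(0, 10, (4,))
print(deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets))
```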
MAE
paper:[2111.06377] Masked Autoencoders Are Scalable Vision Learners (arxiv.org)
bilibili:MAE 论文逐段精读【论文精读】 (MAE paper, read through section by section)
bilibili:43、逐行讲解Masked AutoEncoder(MAE)的PyTorch代码 (line-by-line walkthrough of the MAE PyTorch code)
code:facebookresearch/mae
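The piece of the MAE code most worth internalizing is the per-sample random masking: shuffle patch tokens by argsort of uniform noise, keep the first 25%, and remember the inverse permutation so the decoder can restore the original order. Below is a simplified sketch of that routine (shapes assume ViT-Base tokens; details trimmed relative to the repo's `random_masking`).

```python
import torch

def random_masking(x, mask_ratio=0.75):
    """Per-sample random masking via argsort of uniform noise; only the
    kept (visible) tokens go to the encoder. Simplified sketch."""
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                         # one noise value per token
    ids_shuffle = torch.argsort(noise, dim=1)        # ascending: smallest noise kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                          # 1 = masked, 0 = visible
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # unshuffle to original order
    return x_visible, mask, ids_restore

x = torch.randn(2, 196, 768)                         # e.g. ViT-Base patch tokens
visible, mask, _ = random_masking(x)
print(visible.shape, mask.sum(dim=1))                # (2, 49, 768); 147 masked each
```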
MoCo
paper:[1911.05722] Momentum Contrast for Unsupervised Visual Representation Learning (arxiv.org)
There do not seem to be many MoCo code walkthroughs online; if time allows, I may record a detailed MoCo code-walkthrough video later.
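Until such a walkthrough exists, the paper's Algorithm 1 is short enough to sketch directly: the key encoder is a momentum (EMA) copy of the query encoder, and the InfoNCE loss contrasts each query with its positive key plus a queue of negatives. A minimal sketch, with the encoders and queue updates assumed or omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Key encoder as an EMA of the query encoder (Eq. 2 in the paper):
    theta_k <- m * theta_k + (1 - m) * theta_q."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

def info_nce(q, k, queue, T=0.07):
    """InfoNCE over one positive key and a queue of negatives, following
    the paper's Algorithm 1. q, k: (B, C) L2-normalized; queue: (C, K)."""
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (B, 1) positive logits
    l_neg = torch.einsum("nc,ck->nk", q, queue)            # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / T
    labels = torch.zeros(q.shape[0], dtype=torch.long)     # positive is index 0
    return F.cross_entropy(logits, labels)

# toy usage with random features (paper defaults: C=128, K=65536)
q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)
queue = F.normalize(torch.randn(128, 65536), dim=0)
momentum_update(nn.Linear(4, 4), nn.Linear(4, 4))          # EMA step on toy encoders
print(info_nce(q, k, queue))
```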
Swin Transformer
......