The Transformer architecture has long been widely used in natural language processing, but its adoption in computer vision has remained limited. In vision, a growing body of work has shown that the reliance on CNNs is not necessary: applied directly to sequences of image patches, a Transformer can perform image classification very well.
This article gives a brief introduction to the excellent PyTorch Image Models library, timm. It also takes a detailed look at the Vision Transformer code it contains, a well-crafted PyTorch implementation, to help you get started with related experiments more quickly.
What is the timm library?
PyTorch Image Models, abbreviated timm, is a large collection of PyTorch code that includes a range of:
- image models
- layers
- utilities
- optimizers
- schedulers
- data-loaders / augmentations
- training / validation scripts
Author's GitHub:
https://github.com/rwightman
timm repository:
https://github.com/rwightman/pytorch-image-models
All PyTorch models and their corresponding arXiv links:
- Big Transfer ResNetV2 (BiT) – https://arxiv.org/abs/1912.11370
- CspNet (Cross-Stage Partial Networks) – https://arxiv.org/abs/1911.11929
- DeiT (Vision Transformer) – https://arxiv.org/abs/2012.12877
- DenseNet – https://arxiv.org/abs/1608.06993
- DLA – https://arxiv.org/abs/1707.06484
- DPN (Dual-Path Network) – https://arxiv.org/abs/1707.01629
- EfficientNet (MBConvNet Family)
- EfficientNet NoisyStudent (B0-B7, L2) – https://arxiv.org/abs/1911.04252
- EfficientNet AdvProp (B0-B8) – https://arxiv.org/abs/1911.09665
- EfficientNet (B0-B7) – https://arxiv.org/abs/1905.11946
- EfficientNet-EdgeTPU (S, M, L) – https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
- FBNet-C – https://arxiv.org/abs/1812.03443
- MixNet – https://arxiv.org/abs/1907.09595
- MNASNet B1, A1 (Squeeze-Excite), and Small – https://arxiv.org/abs/1807.11626
- MobileNet-V2 – https://arxiv.org/abs/1801.04381
- Single-Path NAS – https://arxiv.org/abs/1904.02877
- GPU-Efficient Networks – https://arxiv.org/abs/2006.14090
- HRNet – https://arxiv.org/abs/1908.07919
- Inception-V3 – https://arxiv.org/abs/1512.00567
- Inception-ResNet-V2 and Inception-V4 – https://arxiv.org/abs/1602.07261
- MobileNet-V3 (MBConvNet w/ Efficient Head) – https://arxiv.org/abs/1905.02244
- NASNet-A – https://arxiv.org/abs/1707.07012
- NFNet-F – https://arxiv.org/abs/2102.06171
- NF-RegNet / NF-ResNet – https://arxiv.org/abs/2101.08692
- PNasNet – https://arxiv.org/abs/1712.00559
- RegNet – https://arxiv.org/abs/2003.13678
- RepVGG – https://arxiv.org/abs/2101.03697
- ResNet/ResNeXt
- ResNet (v1b/v1.5) – https://arxiv.org/abs/1512.03385
- ResNeXt – https://arxiv.org/abs/1611.05431
- 'Bag of Tricks' / Gluon C, D, E, S variations – https://arxiv.org/abs/1812.01187
- Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 – https://arxiv.org/abs/1805.00932
- Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts – https://arxiv.org/abs/1905.00546
- ECA-Net (ECAResNet) – https://arxiv.org/abs/1910.03151v4
- Squeeze-and-Excitation Networks (SEResNet) – https://arxiv.org/abs/1709.01507
- Res2Net – https://arxiv.org/abs/1904.01169
- ResNeSt – https://arxiv.org/abs/2004.08955
- ReXNet – https://arxiv.org/abs/2007.00992
- SelecSLS – https://arxiv.org/abs/1907.00837
- Selective Kernel Networks – https://arxiv.org/abs/1903.06586
- TResNet – https://arxiv.org/abs/2003.13630
- Vision Transformer – https://arxiv.org/abs/2010.11929
- VovNet V2 and V1 – https://arxiv.org/abs/1911.06667
- Xception – https://arxiv.org/abs/1610.02357
- Xception (Modified Aligned, Gluon) – https://arxiv.org/abs/1802.02611
- Xception (Modified Aligned, TF) – https://arxiv.org/abs/1802.02611
Features of the timm library
All models share a common default API:
- accessing/changing the classifier – get_classifier and reset_classifier
- forward pass on the features only – forward_features
All models support multi-scale feature extraction (feature pyramids) via the create_model function:
A dynamic global pooling method can be selected from: average pooling, max pooling, average + max, or concat([average, max]); the default is adaptive average pooling.
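The pooling mode can be chosen at creation time via the global_pool argument of create_model; a short sketch contrasting average pooling with concat([average, max]), which doubles the feature width feeding the classifier:

```python
import timm

# 'avg' (default), 'max', 'avgmax', or 'catavgmax' select the global pooling
model_avg = timm.create_model('resnet18', pretrained=False, global_pool='avg')
model_cat = timm.create_model('resnet18', pretrained=False, global_pool='catavgmax')

# concat([avg, max]) doubles the classifier's input dimension
print(model_avg.get_classifier().in_features)  # 512
print(model_cat.get_classifier().in_features)  # 1024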
Schedulers:
Available schedulers include step, cosine w/ restarts, tanh w/ restarts, and plateau.