Vision Transformers and the timm Open-Source Library

The Transformer architecture has long been widely used in natural language processing, but its adoption in computer vision has remained limited. In computer vision, a growing body of work now shows that the reliance on CNNs is not necessary: applied directly to sequences of image patches, a Transformer can also perform image classification very well.

This article briefly introduces the excellent PyTorch Image Models library, timm. It also walks through the Vision Transformer code it contains, along with a solid PyTorch implementation of the Vision Transformer, to help you get related experiments running faster.

What is timm?

PyTorch Image Models, or timm for short, is a large collection of PyTorch code that includes a series of:

  • image models
  • layers
  • utilities
  • optimizers
  • schedulers
  • data-loaders / augmentations
  • training / validation scripts

Author's GitHub:
https://github.com/rwightman

timm repository:
https://github.com/rwightman/pytorch-image-models

All of the PyTorch models, with their corresponding arXiv links, are listed below:

  • Big Transfer ResNetV2 (BiT) – https://arxiv.org/abs/1912.11370
  • CspNet (Cross-Stage Partial Networks) – https://arxiv.org/abs/1911.11929
  • DeiT (Vision Transformer) – https://arxiv.org/abs/2012.12877
  • DenseNet – https://arxiv.org/abs/1608.06993
  • DLA – https://arxiv.org/abs/1707.06484
  • DPN (Dual-Path Network) – https://arxiv.org/abs/1707.01629
  • EfficientNet (MBConvNet Family)
  • EfficientNet NoisyStudent (B0-B7, L2) – https://arxiv.org/abs/1911.04252
  • EfficientNet AdvProp (B0-B8) – https://arxiv.org/abs/1911.09665
  • EfficientNet (B0-B7) – https://arxiv.org/abs/1905.11946
  • EfficientNet-EdgeTPU (S, M, L) – https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
  • FBNet-C – https://arxiv.org/abs/1812.03443
  • MixNet – https://arxiv.org/abs/1907.09595
  • MNASNet B1, A1 (Squeeze-Excite), and Small – https://arxiv.org/abs/1807.11626
  • MobileNet-V2 – https://arxiv.org/abs/1801.04381
  • Single-Path NAS – https://arxiv.org/abs/1904.02877
  • GPU-Efficient Networks – https://arxiv.org/abs/2006.14090
  • HRNet – https://arxiv.org/abs/1908.07919
  • Inception-V3 – https://arxiv.org/abs/1512.00567
  • Inception-ResNet-V2 and Inception-V4 – https://arxiv.org/abs/1602.07261
  • MobileNet-V3 (MBConvNet w/ Efficient Head) – https://arxiv.org/abs/1905.02244
  • NASNet-A – https://arxiv.org/abs/1707.07012
  • NFNet-F – https://arxiv.org/abs/2102.06171
  • NF-RegNet / NF-ResNet – https://arxiv.org/abs/2101.08692
  • PNasNet – https://arxiv.org/abs/1712.00559
  • RegNet – https://arxiv.org/abs/2003.13678
  • RepVGG – https://arxiv.org/abs/2101.03697
  • ResNet/ResNeXt
  • ResNet (v1b/v1.5) – https://arxiv.org/abs/1512.03385
  • ResNeXt – https://arxiv.org/abs/1611.05431
  • 'Bag of Tricks' / Gluon C, D, E, S variations – https://arxiv.org/abs/1812.01187
  • Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 – https://arxiv.org/abs/1805.00932
  • Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts – https://arxiv.org/abs/1905.00546
  • ECA-Net (ECAResNet) – https://arxiv.org/abs/1910.03151v4
  • Squeeze-and-Excitation Networks (SEResNet) – https://arxiv.org/abs/1709.01507
  • Res2Net – https://arxiv.org/abs/1904.01169
  • ResNeSt – https://arxiv.org/abs/2004.08955
  • ReXNet – https://arxiv.org/abs/2007.00992
  • SelecSLS – https://arxiv.org/abs/1907.00837
  • Selective Kernel Networks – https://arxiv.org/abs/1903.06586
  • TResNet – https://arxiv.org/abs/2003.13630
  • Vision Transformer – https://arxiv.org/abs/2010.11929
  • VovNet V2 and V1 – https://arxiv.org/abs/1911.06667
  • Xception – https://arxiv.org/abs/1610.02357
  • Xception (Modified Aligned, Gluon) – https://arxiv.org/abs/1802.02611
  • Xception (Modified Aligned, TF) – https://arxiv.org/abs/1802.02611

Features of the timm library

All models share a consistent default API:

  • accessing/changing the classifier – get_classifier and reset_classifier
  • forward pass over the features only – forward_features

All models support multi-scale feature extraction (feature pyramids) through the create_model function.

The global pooling mode can be selected dynamically: average pooling, max pooling, average + max, or concat([average, max]); the default is adaptive average pooling.

Schedulers:

The included schedulers are step, cosine w/ restarts, tanh w/ restarts, and plateau.
