In this tutorial you will learn about MobileNet v1, v2, and v3, a family of efficient models for mobile and embedded vision applications.
This tutorial requires the thop library to count a model's multiply-add operations (MAdds). Counting MAdds is a common way to measure the ops of deep models, because most modern processors handle tensors with the FMA (fused multiply-add) instruction set. FMA combines a multiplication and an addition into a single fused multiply-accumulate instruction, executing what would otherwise be repeated computations in one instruction and thereby simplifying the program. Unlike a FLOP, one MAdd denotes one multiplication plus one addition, so roughly MAdds = 0.5 * FLOPs.
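To make the relationship concrete with a small worked example: a dot product of length $n$ needs $n$ multiplications and $n$ additions, i.e. $2n$ FLOPs, but each multiply/add pair fuses into one FMA, so it costs only $n$ MAdds.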
To install thop, run the following commands in a terminal:

```shell
conda info -e
source activate xxx
# xxx is the name of your PyTorch environment, here ipykernel_py3_pytorch1.4
pip install thop
```
Let's test the MAdds and parameter count of resnet50. We get about 4.1G MAdds (which depends on both the input image size and the model) and about 25.6M params (which depends only on the model):
```python
import torch
from torchvision.models import resnet50
from thop import profile

model = resnet50()
input = torch.randn(1, 3, 224, 224)
madds, params = profile(model, inputs=(input, ))
print('resnet50 #madds:{}, #params:{}'.format(madds, params))
```
resnet50 #madds:4113562624.0, #params:25557032.0
MobileNet v1 relies mainly on group convolution to reduce computation. Take a feature map of size $C \times H \times W$ and $K$ 3x3 kernels as an example: the computation can be roughly written as $H \times W \times (C \times 3 \times 3) \times K$. If the kernels are split into $g$ groups, the computation becomes $H \times W \times (\frac{C}{g} \times 3 \times 3) \times \frac{K}{g} \times g$, so splitting the kernels into $g$ groups cuts the computation to about $1/g$ of the original. Let's demonstrate this in code:
```python
import torch.nn as nn

# We define two models, Model1 and Model2; the only difference is that
# Model1's convolution is split into 4 groups
class Model1(nn.Module):
    def __init__(self):
        super(Model1, self).__init__()
        self.conv = nn.Conv2d(16, 32, 3, stride=1, padding=1, groups=4)

    def forward(self, x):
        return self.conv(x)

class Model2(nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        self.conv = nn.Conv2d(16, 32, 3, stride=1, padding=1)

    def forward(self, x):
        return self.conv(x)

# The results show that the two models produce outputs of the same size,
# but model1's MAdds and params are both about 1/4 of model2's
model1 = Model1()
model2 = Model2()
input = torch.randn(1, 16, 224, 224)

output1 = model1(input)
madds, params = profile(model1, inputs=(input, ))
print('model1 output_size:{} #madds:{}, #params:{}'.format(output1.detach().numpy().shape, madds, params))

output2 = model2(input)
madds, params = profile(model2, inputs=(input, ))
print('model2 output_size:{} #madds:{}, #params:{}'.format(output2.detach().numpy().shape, madds, params))
```
model1 output_size:(1, 32, 224, 224) #madds:59408384.0, #params:1184.0
model2 output_size:(1, 32, 224, 224) #madds:232816640.0, #params:4640.0
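These numbers can be reproduced by hand from the formula above. Here is a small check with our own helper functions; the exact match assumes thop also counts one bias addition per output element (the conv layers above keep their default bias):

```python
# Hand check of the profiled numbers above. thop's totals match exactly
# once we also count one bias addition per output element.
def conv_params(c_in, k, c_out, groups=1):
    return (c_in // groups) * k * k * c_out + c_out  # weights + bias

def conv_madds(h, w, c_in, k, c_out, groups=1):
    per_pixel = (c_in // groups) * k * k * (c_out // groups) * groups
    return h * w * (per_pixel + c_out)  # + c_out for the bias adds

print(conv_params(16, 3, 32, groups=4), conv_madds(224, 224, 16, 3, 32, groups=4))
# -> 1184 59408384, matching model1
print(conv_params(16, 3, 32), conv_madds(224, 224, 16, 3, 32))
# -> 4640 232816640, matching model2
```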
MobileNet v1 sets the number of groups equal to the number of input channels, which is called a "depthwise" convolution, and follows it with a 1x1 convolution to change the number of channels, called a "pointwise" convolution:
Standard convolution layer: 3x3 Conv + BN + ReLU
MobileNet v1 convolution layer: 3x3 Depthwise Conv + BN + ReLU followed by 1x1 Pointwise Conv + BN + ReLU
The resulting compression of the computation can roughly be seen as: $\frac{H \times W \times (3 \times 3) \times C + H \times W \times (C \times 1 \times 1) \times K}{H \times W \times (C \times 3 \times 3) \times K} = \frac{1}{K} + \frac{1}{3 \times 3}$
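Plugging in a typical layer with $K = 512$ output channels gives $\frac{1}{512} + \frac{1}{9} \approx 0.113$, i.e. the depthwise-separable layer needs only about 11% of the computation of a standard 3x3 convolution.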
```python
# MobileNetv1 is defined as follows
class MobileNetv1(nn.Module):
    def __init__(self):
        super(MobileNetv1, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                # depthwise: groups == input channels
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),
                # pointwise: 1x1 conv changes the channel count
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.model = nn.Sequential(
            conv_bn(   3,   32, 2),
            conv_dw(  32,   64, 1),
            conv_dw(  64,  128, 2),
            conv_dw( 128,  128, 1),
            conv_dw( 128,  256, 2),
            conv_dw( 256,  256, 1),
            conv_dw( 256,  512, 2),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x

# We also define ConvNet, a network of standard convolutions that mirrors
# the structure of MobileNetv1
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        self.model = nn.Sequential(
            conv_bn(   3,   32, 2),
            conv_bn(  32,   64, 1),
            conv_bn(  64,  128, 2),
            conv_bn( 128,  128, 1),
            conv_bn( 128,  256, 2),
            conv_bn( 256,  256, 1),
            conv_bn( 256,  512, 2),
            conv_bn( 512,  512, 1),
            conv_bn( 512,  512, 1),
            conv_bn( 512,  512, 1),
            conv_bn( 512,  512, 1),
            conv_bn( 512,  512, 1),
            conv_bn( 512, 1024, 2),
            conv_bn(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x
```
```python
# MobileNetv1's MAdds come to about 580M (depends on input size and model)
# and params to about 4.2M (depends on model); ConvNet's MAdds are about 4.9G
# and params about 29.2M
model = MobileNetv1()
input = torch.randn(1, 3, 224, 224)
madds, params = profile(model, inputs=(input, ))
print('MobileNetv1 #madds:{}, #params:{}'.format(madds, params))

model = ConvNet()
input = torch.randn(1, 3, 224, 224)
madds, params = profile(model, inputs=(input, ))
print('ConvNet #madds:{}, #params:{}'.format(madds, params))
```
MobileNetv1 #madds:579850752.0, #params:4231976.0
ConvNet #madds:4874540032.0, #params:29294088.0
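The measured whole-network ratio, $579850752 / 4874540032 \approx 0.119$, lines up well with the per-layer estimate $\frac{1}{K} + \frac{1}{9} \approx 0.11$; the small gap comes from the parts both networks share, such as the first standard convolution and the final fully connected layer.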
MobileNet v2 is the next-generation lightweight network Google proposed after v1. It mainly addresses the feature degradation that v1 is prone to during training, and v2 improves somewhat on v1's accuracy. In the MobileNetV2 paper, the authors visualize how ReLU collapses low-dimensional features: a feature map with few channels should not be followed by ReLU, because neuron outputs easily become 0, and since ReLU's gradient is 0 wherever its output is 0, a neuron stuck at 0 stops learning. The second problem is that v1 has no feature reuse, so MobileNet v2 proposes the Inverted Residual Block, built on the residual structure, as its basic building block.
Residual Block: first a 1x1 convolution to reduce channels, then a 3x3 convolution, then a 1x1 convolution to restore channels, followed by addition with the input. The 1x1 channel reduction is there to save computation; otherwise the 3x3 spatial convolution in the middle would be too expensive. Now that the middle 3x3 convolution is depthwise, its cost drops sharply, so using more channels gives better results.
Inverted Residual Block: first a 1x1 convolution to expand the channels, then a depthwise 3x3 convolution, then a 1x1 convolution to reduce the channels again, as shown in the sketch below.
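Below is a minimal sketch of such a block following the structure described in the MobileNetV2 paper (the class name is ours; the default expansion factor of 6, ReLU6, and the activation-free projection follow the paper, while torchvision's actual implementation may differ in details):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Sketch of MobileNet v2's inverted residual block:
    1x1 expand -> 3x3 depthwise -> 1x1 project (linear, no ReLU),
    with a residual connection when the shapes match."""
    def __init__(self, inp, oup, stride=1, expand=6):
        super(InvertedResidual, self).__init__()
        hidden = inp * expand
        self.use_res = (stride == 1 and inp == oup)
        self.block = nn.Sequential(
            # 1x1 pointwise conv expands the channels
            nn.Conv2d(inp, hidden, 1, 1, 0, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise conv (groups == channels) does the spatial filtering
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise conv projects back down; no ReLU here,
            # to avoid the low-dimensional collapse discussed above
            nn.Conv2d(hidden, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)

# quick shape check
block = InvertedResidual(32, 32)
print(block(torch.randn(1, 32, 56, 56)).shape)  # torch.Size([1, 32, 56, 56])
```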
```python
# We use the MobileNet V2 definition from torchvision directly. MobileNetv2's
# MAdds come to about 315M and params to about 3.5M, a big improvement in model
# complexity over v1; the paper's experiments also show ImageNet accuracy
# improving from 70.6% to 72%
from torchvision.models import mobilenet_v2

model = mobilenet_v2()
input = torch.randn(1, 3, 224, 224)
madds, params = profile(model, inputs=(input, ))
print('MobileNetv2 #madds:{}, #params:{}'.format(madds, params))
```
MobileNetv2 #madds:315410496.0, #params:3504872.0
MobileNet v3 brings two main innovations over v2: first, it uses NAS to search for network modules under constrained resources; second, it makes small adjustments to the model structure, such as adding SE blocks and using the hard-swish activation function. The code can be found in our pre-defined MobileNetV3 network structure, downloadable over the DiDi Cloud S3 intranet:

```shell
wget https://dataset-public.s3-internal.didiyunapi.com/DAI教程/MobileNet-v1-v2-v3/mobilenetv3.py
```
```python
from mobilenetv3 import MobileNetV3

model = MobileNetV3(mode='large')
input = torch.randn(1, 3, 224, 224)
madds, params = profile(model, inputs=(input, ))
print('MobileNetv3 #madds:{}, #params:{}'.format(madds, params))
```
MobileNetv3 #madds:232431296.0, #params:5476416.0
MobileNetV3 (large) has 5.5M parameters, more than v2, but only about 232M MAdds versus v2's 315M, and the paper shows ImageNet accuracy improving from 72% to 75.2%.
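For reference, here is a minimal sketch of the two structural tweaks mentioned above, hard-swish and the SE block, following the descriptions in the MobileNetV3 paper (the reduction ratio of 4 and the hard-sigmoid gate follow the paper; the downloaded mobilenetv3.py may differ in details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardSwish(nn.Module):
    # hard-swish(x) = x * ReLU6(x + 3) / 6, a cheap approximation of swish
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: globally average-pool the feature map, feed it
    through a two-layer bottleneck, and re-weight the channels with the result."""
    def __init__(self, channels, reduction=4):
        super(SEBlock, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        n, c, _, _ = x.size()
        w = x.mean(dim=(2, 3))           # squeeze: (N, C)
        w = self.fc(w)                   # excitation bottleneck
        w = F.relu6(w + 3.0) / 6.0       # hard-sigmoid gate in [0, 1]
        return x * w.view(n, c, 1, 1)    # re-weight each channel

# quick shape check
se = SEBlock(64)
print(se(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])
```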