7 Useful PyTorch Tips, with Demo Code

Source: shared by a developer on Reddit; translated by 丰色 of 量子位 (QbitAI). If there is any infringement, please contact the author for removal.

A developer on Reddit has summarized seven tips for using PyTorch, based on mistakes he has made and things he tends to forget.

The post has so far received 300+ upvotes on Reddit.

Many readers found it useful, and some pointed out that these are not just tips but should be part of the tutorial everyone reads before using PyTorch.

1. Use the device argument to create tensors directly on the target device

It is much faster. The demo code below shows that creating the tensors directly on the GPU takes only 0.009s:

import time

import torch

start_time = time.time()

for _ in range(100):
  # Creating on the CPU, then transferring to the GPU
  cpu_tensor = torch.ones((1000, 64, 64))
  gpu_tensor = cpu_tensor.cuda()

print('Total time: {:.3f}s'.format(time.time() - start_time))

Total time: 0.584s

start_time = time.time()

for _ in range(100):
  # Creating on GPU directly
  gpu_tensor = torch.ones((1000, 64, 64), device='cuda')

print('Total time: {:.3f}s'.format(time.time() - start_time))

Total time: 0.009s

A commenter added that this is faster because the device argument creates the tensor directly on the GPU, instead of creating it on the CPU first and then copying it over. It also uses less RAM and avoids the risk of leaving CPU tensors hanging around.
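As a small follow-up, here is a minimal device-agnostic sketch (not from the original post, reusing the torch import from above) that falls back to the CPU when no GPU is available:

# Pick the target device once, then pass it to every tensor/model creation
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Created directly on the chosen device, no CPU copy involved
tensor = torch.ones((1000, 64, 64), device=device)
print(tensor.device)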

2. Use Sequential layers when you can

It keeps the code cleaner.

Below is the example model written out layer by layer; a Sequential rewrite is sketched after its output:

from torch import nn

class ExampleModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    self.mid_layer = nn.Linear(hidden_size, hidden_size)
    self.mid_activation = nn.ReLU()

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)
    
    z = self.mid_layer(z)
    z = self.mid_activation(z)
    
    out = self.output_layer(z)

    return out

example_model = ExampleModel()
print(example_model)
print('Output shape:', example_model(torch.ones([100, 2])).shape)

Output:

ExampleModel(
  (input_layer): Linear(in_features=2, out_features=16, bias=True)
  (input_activation): ReLU()
  (mid_layer): Linear(in_features=16, out_features=16, bias=True)
  (mid_activation): ReLU()
  (output_layer): Linear(in_features=16, out_features=3, bias=True)
)
Output shape: torch.Size([100, 3])
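For comparison, here is a minimal sketch of how the same model could be expressed with nn.Sequential (the layer sizes match the example above; the class name is illustrative and not from the original post):

class ExampleSequentialModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    # The whole forward pass collapses into one Sequential container
    self.layers = nn.Sequential(
      nn.Linear(input_size, hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size, hidden_size),
      nn.ReLU(),
      nn.Linear(hidden_size, output_size),
    )

  def forward(self, x):
    return self.layers(x)

example_seq_model = ExampleSequentialModel()
print('Output shape:', example_seq_model(torch.ones([100, 2])).shape)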

3. Don't keep layers in a plain Python list

A bad example:

class BadListModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    # Fairly common when using residual layers
    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
      self.mid_layers.append(nn.ReLU())

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)
    
    for layer in self.mid_layers:
      z = layer(z)
    
    out = self.output_layer(z)

    return out

bad_list_model = BadListModel()
print('Output shape:', bad_list_model(torch.ones([100, 2])).shape)

Output shape: torch.Size([100, 3])

The forward pass runs fine on the CPU, but it breaks as soon as the model is actually moved to the GPU:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
gpu_input = torch.ones([100, 2])
gpu_input = gpu_input.to(device)
gpu_bad_list_model = bad_list_model.to(device)
print('Output shape:', gpu_bad_list_model(gpu_input).shape)

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

In PyTorch, you shouldn't collect layers in a plain Python list, because those layers are never registered with the module, so .to(device), .parameters(), and the optimizer never see them. Use torch.nn.ModuleList instead; and if the layers are always applied one after another, you can simply wrap them in nn.Sequential.

The correct version:

class CorrectListModel(nn.Module):
  def __init__(self):
    super().__init__()

    input_size = 2
    output_size = 3
    hidden_size = 16

    self.input_layer = nn.Linear(input_size, hidden_size)
    self.input_activation = nn.ReLU()

    # Fairly common when using residual layers
    self.mid_layers = []
    for _ in range(5):
      self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
      self.mid_layers.append(nn.ReLU())
    self.mid_layers = nn.Sequential(*self.mid_layers)

    self.output_layer = nn.Linear(hidden_size, output_size)

  def forward(self, x):
    z = self.input_layer(x)
    z = self.input_activation(z)
    z = self.mid_layers(z)
    out = self.output_layer(z)

    return out

If the layers in the list are executed strictly in sequence, wrapping them in nn.Sequential as above is enough; if they are not (for example, when residual connections skip around them), use torch.nn.ModuleList instead of a plain list, as sketched below.
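A minimal sketch of the ModuleList case, reusing the imports from above (the class name and the residual wiring are illustrative, not from the original post):

class ResidualListModel(nn.Module):
  def __init__(self):
    super().__init__()

    hidden_size = 16

    # ModuleList registers every layer, so .to(device) and .parameters() see them
    self.mid_layers = nn.ModuleList(
      [nn.Linear(hidden_size, hidden_size) for _ in range(5)]
    )
    self.activation = nn.ReLU()

  def forward(self, x):
    z = x
    for layer in self.mid_layers:
      # Non-sequential wiring: each block adds a residual connection
      z = z + self.activation(layer(z))
    return z

residual_model = ResidualListModel()
print('Output shape:', residual_model(torch.ones([100, 16])).shape)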

4. Make good use of torch.distributions

PyTorch has some nice objects and functions for probability distributions, but in the developer's view the ones in torch.distributions are underused. Here's how they work:

# Setup
example_model = ExampleModel()
input_tensor = torch.rand(5, 2)
output = example_model(input_tensor)
print(output)

Output:

tensor([[ 0.1965,  0.0558, -0.2112],
        [ 0.2035,  0.0650, -0.2077],
        [ 0.2150,  0.0577, -0.2096],
        [ 0.1957,  0.0540, -0.2117],
        [ 0.2045,  0.0566, -0.2085]], grad_fn=<AddmmBackward>)

Import the distribution classes and wrap the model output in a Categorical distribution:

from torch.distributions import Categorical
from torch.distributions.kl import kl_divergence

dist = Categorical(logits=output)
dist

Categorical(logits: torch.Size([5, 3]))

# Get probabilities
dist.probs

Output:

tensor([[0.3946, 0.3428, 0.2625],
        [0.3947, 0.3437, 0.2616],
        [0.3986, 0.3406, 0.2607],
        [0.3947, 0.3426, 0.2627],
        [0.3962, 0.3417, 0.2621]], grad_fn=<SoftmaxBackward>)

Compute the KL divergence between two of the distributions:

# Calculate the KL-Divergence
dist_1 = Categorical(logits=output[0])
dist_2 = Categorical(logits=output[1])
kl_divergence(dist_1, dist_2)

tensor(2.5076e-06, grad_fn=<SumBackward1>)
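These objects offer more than probabilities and KL divergence. A minimal sketch of two other commonly used methods, assuming the dist object from above (the exact values will vary from run to run):

# Draw one class index per row of logits
samples = dist.sample()
print(samples.shape)    # torch.Size([5])

# Per-row entropy of the categorical distribution
print(dist.entropy())   # tensor of shape [5]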

5. Use detach() on long-term metrics

When storing tensor metrics between epochs, make sure to call .detach() on them (or take .item()); otherwise each stored loss keeps its whole computation graph alive and you end up with a memory leak.

# Setup
example_model = ExampleModel()
data_batches = [torch.rand((10, 2)) for _ in range(5)]
criterion = nn.MSELoss(reduction='mean')

A bad example:

losses = []

# Training loop
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output, target)
  losses.append(loss)

  # Optimization happens here

print(losses)

[tensor(0.4718, grad_fn=<MseLossBackward>), tensor(0.5156, grad_fn=<MseLossBackward>), tensor(0.6583, grad_fn=<MseLossBackward>), tensor(0.4429, grad_fn=<MseLossBackward>), tensor(0.4133, grad_fn=<MseLossBackward>)]

A better example:

losses = []

# Training loop
for batch in data_batches:
  output = example_model(batch)

  target = torch.rand((10, 3))
  loss = criterion(output, target)
  losses.append(loss.item())  # Or use loss.detach()

  # Optimization happens here

print(losses)

[0.5439911484718323, 0.5461570620536804, 0.6738904118537903, 0.5780249834060669, 0.5130327939987183]

6. Use torch.cuda.empty_cache() to clear the GPU cache when you delete a model

import gc
example_model = ExampleModel().cuda()

del example_model

gc.collect()
# The model will normally stay in the cache until something takes its place
torch.cuda.empty_cache()
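To verify the effect, PyTorch's built-in memory counters can be checked before and after the snippet above runs (a minimal sketch; these counters are standard torch.cuda APIs, not part of the original post):

# Bytes held by live tensors: drops as soon as the model is deleted and collected
print(torch.cuda.memory_allocated())

# Bytes reserved by PyTorch's caching allocator: only drops after empty_cache()
print(torch.cuda.memory_reserved())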

7. Remember to call model.eval() before making predictions

How many people forget this one?

If you forget to call model.eval(), i.e. forget to switch the model into evaluation mode, the Dropout and Batch Normalization layers will interfere with your predictions.

example_model = ExampleModel()

# Do training

example_model.eval()

# Do testing

example_model.train()

# Do training again
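For the "Do testing" step, a minimal inference sketch, assuming the ExampleModel from above (wrapping the forward pass in torch.no_grad() is a common companion to eval(), though it is not part of the original tip; ExampleModel itself has no Dropout/BatchNorm layers, but the pattern is the same):

example_model.eval()
with torch.no_grad():
  # In eval mode, Dropout layers are disabled and BatchNorm uses running statistics;
  # no_grad() skips building the autograd graph during inference
  predictions = example_model(torch.ones([100, 2]))
print(predictions.shape)  # torch.Size([100, 3])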

Online code demo: https://colab.research.google.com/drive/15vGzXs_ueoKL0jYpC4gr9BCTfWt935DC
