Source: shared by a developer on Reddit. Translation: 量子位 丰色.
Now a developer on Reddit has summed up seven tips for using PyTorch, based on mistakes he has made and things he keeps forgetting, for everyone's reference.
The post has collected 300+ upvotes on Reddit so far.
Many people found it useful, and some pointed out that these are not just tips, but part of the tutorial everyone should read before using PyTorch.
1. Use the device argument to create tensors directly on the target device
This is faster! The online sample code shows that creating the tensor directly on the GPU takes only 0.009s:
import time
import torch

start_time = time.time()
for _ in range(100):
    # Creating on the CPU, then transferring to the GPU
    cpu_tensor = torch.ones((1000, 64, 64))
    gpu_tensor = cpu_tensor.cuda()
print('Total time: {:.3f}s'.format(time.time() - start_time))

Total time: 0.584s
start_time = time.time()
for _ in range(100):
    # Creating on the GPU directly
    gpu_tensor = torch.ones((1000, 64, 64), device='cuda')
print('Total time: {:.3f}s'.format(time.time() - start_time))

Total time: 0.009s
One user added that the reason this is faster is that the device argument creates the tensor directly on the GPU, instead of creating it on the CPU first and then copying it over. It also uses less RAM and leaves no risk of a CPU tensor hanging around.
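A related pattern worth noting (a minimal sketch, not from the original post): pick the device once and pass it to every tensor constructor, so the same code runs with or without a GPU.

# Choose the device once, then reuse it everywhere
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors are allocated directly on the chosen device, no CPU copy involved
weights = torch.randn((64, 64), device=device)
ones = torch.ones((1000, 64, 64), device=device)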
2. Use Sequential layers where possible
It makes for cleaner code.
Here is part of the sample code; a Sequential rewrite is sketched after the output below:
from torch import nn

class ExampleModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()
        self.mid_layer = nn.Linear(hidden_size, hidden_size)
        self.mid_activation = nn.ReLU()
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        z = self.mid_layer(z)
        z = self.mid_activation(z)
        out = self.output_layer(z)
        return out

example_model = ExampleModel()
print(example_model)
print('Output shape:', example_model(torch.ones([100, 2])).shape)
Output:
ExampleModel(
(input_layer): Linear(in_features=2, out_features=16, bias=True)
(input_activation): ReLU()
(mid_layer): Linear(in_features=16, out_features=16, bias=True)
(mid_activation): ReLU()
(output_layer): Linear(in_features=16, out_features=3, bias=True)
)
Output shape: torch.Size([100, 3])
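For comparison, the same network can be collapsed into a single nn.Sequential block. A minimal sketch (the class name ExampleSequentialModel is just illustrative):

class ExampleSequentialModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16
        # The same layer stack, grouped into one Sequential block
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        return self.layers(x)

print('Output shape:', ExampleSequentialModel()(torch.ones([100, 2])).shape)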
3. Avoid using plain Python lists to hold layers
Bad example:
class BadListModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()
        # Fairly common when using residual layers
        self.mid_layers = []
        for _ in range(5):
            self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
            self.mid_layers.append(nn.ReLU())
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        for layer in self.mid_layers:
            z = layer(z)
        out = self.output_layer(z)
        return out

bad_list_model = BadListModel()
print('Output shape:', bad_list_model(torch.ones([100, 2])).shape)
Output shape: torch.Size([100, 3])
The forward pass works on the CPU, but when you actually run the model on the GPU it raises an error:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

gpu_input = torch.ones([100, 2])
gpu_input = gpu_input.to(device)
gpu_bad_list_model = bad_list_model.to(device)
print('Output shape:', gpu_bad_list_model(gpu_input).shape)

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
In PyTorch you cannot register layers by putting them in a plain Python list: the parent module never sees their parameters, so .to(device), .parameters() and state_dict() all miss them. Use torch.nn.ModuleList instead; and if the layers in the list are simply executed one after another, you can also wrap them in nn.Sequential.
Correct example:
class CorrectListModel(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 2
        output_size = 3
        hidden_size = 16
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.input_activation = nn.ReLU()
        # Fairly common when using residual layers
        self.mid_layers = []
        for _ in range(5):
            self.mid_layers.append(nn.Linear(hidden_size, hidden_size))
            self.mid_layers.append(nn.ReLU())
        self.mid_layers = nn.Sequential(*self.mid_layers)
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        z = self.input_layer(x)
        z = self.input_activation(z)
        z = self.mid_layers(z)
        out = self.output_layer(z)
        return out
If the layers collected in a plain list run strictly one after another, wrapping them in nn.Sequential is enough; if they do not run purely sequentially (for example, with residual connections), use torch.nn.ModuleList instead of a plain list, as in the sketch below.
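Here is a hedged sketch of that non-sequential case: with residual connections the layers cannot simply be chained by nn.Sequential, so nn.ModuleList holds them (registering their parameters) and forward() applies them by hand. The class name ResidualListModel is illustrative, not from the original post.

class ResidualListModel(nn.Module):
    def __init__(self):
        super().__init__()
        hidden_size = 16
        # ModuleList registers the sublayers, so .to(device) and .parameters() see them
        self.mid_layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(5)]
        )
        self.activation = nn.ReLU()

    def forward(self, z):
        for layer in self.mid_layers:
            # Residual connection: not expressible as a plain Sequential chain
            z = z + self.activation(layer(z))
        return z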
4. Make good use of torch.distributions
PyTorch has some nice objects and functions for probability distributions, but in this developer's opinion the ones in torch.distributions are underused. They can be used like this:
# Setup
example_model = ExampleModel()
input_tensor = torch.rand(5, 2)
output = example_model(input_tensor)
print(output)
Output:
tensor([[ 0.1965, 0.0558, -0.2112],
[ 0.2035, 0.0650, -0.2077],
[ 0.2150, 0.0577, -0.2096],
[ 0.1957, 0.0540, -0.2117],
[ 0.2045, 0.0566, -0.2085]], grad_fn=<AddmmBackward>)
Import the distribution tools:
from torch.distributions import Categorical
from torch.distributions.kl import kl_divergence
dist = Categorical(logits=output)
dist
Categorical(logits: torch.Size([5, 3]))
# Get probabilities
dist.probs
Output:
tensor([[0.3946, 0.3428, 0.2625],
[0.3947, 0.3437, 0.2616],
[0.3986, 0.3406, 0.2607],
[0.3947, 0.3426, 0.2627],
[0.3962, 0.3417, 0.2621]], grad_fn=<SoftmaxBackward>)
Compute the KL divergence:
# Calculate the KL-Divergence
dist_1 = Categorical(logits=output[0])
dist_2 = Categorical(logits=output[1])
kl_divergence(dist_1, dist_2)
tensor(2.5076e-06, grad_fn=<SumBackward1>)
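The distribution objects offer more than probs and KL divergence. For instance (a quick sketch, not from the original post), you can sample class indices, take log-probabilities, or compute entropies:

# Draw one class index per row (results are random)
samples = dist.sample()

# Log-probability of those samples, e.g. for policy-gradient methods
print(dist.log_prob(samples))

# Entropy of each of the five distributions
print(dist.entropy())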
5. Use detach() on long-term metrics
When storing tensor metrics between epochs, make sure to call .detach() on them to avoid a memory leak.
# Setup
example_model = ExampleModel()
data_batches = [torch.rand((10, 2)) for _ in range(5)]
criterion = nn.MSELoss(reduction='mean')
Bad example:
losses = []

# Training loop
for batch in data_batches:
    output = example_model(batch)
    target = torch.rand((10, 3))
    loss = criterion(output, target)
    losses.append(loss)
    # Optimization happens here

print(losses)

[tensor(0.4718, grad_fn=<MseLossBackward>), tensor(0.5156, grad_fn=<MseLossBackward>), tensor(0.6583, grad_fn=<MseLossBackward>), tensor(0.4429, grad_fn=<MseLossBackward>), tensor(0.4133, grad_fn=<MseLossBackward>)]
Better example:
losses = []

# Training loop
for batch in data_batches:
    output = example_model(batch)
    target = torch.rand((10, 3))
    loss = criterion(output, target)
    losses.append(loss.item())  # Or losses.append(loss.detach())
    # Optimization happens here

print(losses)

[0.5439911484718323, 0.5461570620536804, 0.6738904118537903, 0.5780249834060669, 0.5130327939987183]
6. Use torch.cuda.empty_cache() to clear the GPU cache when deleting a model
import gc

example_model = ExampleModel().cuda()

del example_model
gc.collect()
# The model will normally stay in the cache until something takes its place
torch.cuda.empty_cache()
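If you want to confirm that the memory was actually released, you can check the allocator counters before and after the cleanup (a minimal sketch; the exact numbers depend on your setup):

print(torch.cuda.memory_allocated())  # bytes held by live tensors; should drop after the cleanup above
print(torch.cuda.memory_reserved())   # bytes still reserved by PyTorch's caching allocator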
7. Always remember to call model.eval() before prediction
How many people forget this?
If you forget to call model.eval(), i.e. forget to switch the model into evaluation (test) mode, the Dropout and Batch Normalization layers will interfere with your predictions.
example_model = ExampleModel()
# Do training
example_model.eval()
# Do testing
example_model.train()
# Do training again
Online code demo: https://colab.research.google.com/drive/15vGzXs_ueoKL0jYpC4gr9BCTfWt935DC