Installing and using apex, and the errors encountered
NameError: name 'apex' is not defined
apex is a PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
System: Linux (Ubuntu)
1. Installation (run the following commands in order)
git clone https://github.com/NVIDIA/apex
cd apex  # the directory that contains setup.py
pip3 install -v --no-cache-dir ./
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1
Note: do not install with a plain "pip3 install apex". The package named apex on PyPI is an unrelated project, not NVIDIA apex.
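After installing, it can help to confirm that Python can actually locate the package before touching the training code. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of apex.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    # find_spec returns None when no module with this name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("apex"):
        print("apex is importable")
    else:
        print("apex not found; re-run the source install above")
```

Run this from a directory other than the one holding the cloned repository. Otherwise Python may pick up the clone folder itself as a namespace package and report a false positive.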
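2. Usage
Only three changes to the original training code are needed:
(1) Add from apex import amp
(2) After defining model and optimizer, add model, optimizer = amp.initialize(model, optimizer, opt_level="O1"). Note that the opt_level value starts with the capital letter O, not the digit zero.
(3) In the training loop, comment out loss.backward() and replace it with:
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()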
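Put together, the three changes might look like the sketch below. The toy nn.Linear model, the random data, the train_step helper, and the use_amp flag are assumptions made for illustration only; the amp calls are the ones listed above. The script falls back to an ordinary loss.backward() when apex or a CUDA device is missing, so it can still be run as a smoke test.

```python
import torch
import torch.nn as nn

try:
    from apex import amp  # change (1)
except ImportError:
    amp = None  # assumption: fall back to plain FP32 training without apex

# amp needs the model on a CUDA device, so only enable it when one exists.
use_amp = amp is not None and torch.cuda.is_available()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model and data, purely illustrative.
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

if use_amp:
    # change (2): wrap model and optimizer; "O1" uses the capital letter O
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

criterion = nn.MSELoss()
inputs = torch.randn(8, 4, device=device)
targets = torch.randn(8, 2, device=device)


def train_step() -> float:
    """Run one optimization step and return the loss value."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    if use_amp:
        # change (3): scale the loss before backpropagating
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()
    optimizer.step()
    return loss.item()


losses = [train_step() for _ in range(5)]
print(losses)
```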
Error encountered:
RuntimeError: Found buffer total_ops with type torch.DoubleTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with buffers
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
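Fix: call model.to(config.device) again before amp.initialize. Adding this line resolved the error, even though the model had already been moved to the device earlier in the code.
model.to(config.device)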
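One plausible explanation, which is an assumption and not something confirmed here, is that the total_ops buffer was registered on the model after the earlier .to() call and so stayed on the CPU. The sketch below lists any buffers that are not on a CUDA device, which can be checked just before calling amp.initialize. The Toy class and the non_cuda_buffers helper are hypothetical; the buffer in Toy only mimics the name and dtype from the error message.

```python
import torch
import torch.nn as nn


def non_cuda_buffers(model: nn.Module) -> list:
    """Return the names of registered buffers that are not on a CUDA device."""
    return [name for name, buf in model.named_buffers() if not buf.is_cuda]


class Toy(nn.Module):
    """Hypothetical model carrying a float64 buffer like the one in the error."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        # Illustrative stand-in for the `total_ops` buffer named in the error.
        self.register_buffer("total_ops", torch.zeros(1, dtype=torch.float64))


if __name__ == "__main__":
    model = Toy()
    print("before .to():", non_cuda_buffers(model))
    if torch.cuda.is_available():
        model.to("cuda")  # moves parameters and buffers in place
        print("after .to():", non_cuda_buffers(model))
```

If the list is non-empty right before amp.initialize, calling model.to(...) at that point, as in the fix above, moves the remaining buffers.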