Q: RuntimeError: CUDA out of memory with a pre-trained model

Stack Overflow user

Asked 2021-05-14 21:18:36

Answers: 1, Views: 211, Followers: 0, Votes: 0

I'm using a pre-trained model to enhance images.

https://github.com/swz30/MIRNet.

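I created a demo.py file (code below) to test my own image set with the provided pre-trained model. With my first set of images, all of which have very high resolution, I always get the same error: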

RuntimeError: CUDA out of memory. Tried to allocate 5.38 GiB (GPU 0; 3.95 GiB total capacity; 379.90 MiB already allocated; 2.89 GiB free; 16.10 MiB cached)

When I test just a single lower-resolution image, the error persists, but in a strange way:

RuntimeError: CUDA out of memory. Tried to allocate 1014.00 MiB (GPU 0; 3.95 GiB total capacity; 2.61 GiB already allocated; 527.44 MiB free; 23.25 MiB cached)
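The two messages differ in an important way, and comparing the numbers makes that visible. Below is a minimal sketch, not part of the original demo.py, that pulls the sizes out of each message and compares the requested allocation with the total and the free memory. The helper names `parse_cuda_oom` and `diagnose` are hypothetical, and the regular expressions assume the message wording shown above.

```python
import re

# Conversion factors to MiB for the units that appear in the messages above.
_UNITS = {"MiB": 1.0, "GiB": 1024.0}

# Hypothetical helper: the patterns assume the exact wording of the
# error messages quoted in this question.
_FIELDS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
}


def parse_cuda_oom(message):
    """Return the sizes found in a 'CUDA out of memory' message, in MiB."""
    sizes = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNITS[match.group(2)]
    return sizes


def diagnose(message):
    """Classify the failure by comparing the request with total and free memory."""
    sizes = parse_cuda_oom(message)
    if sizes["requested"] > sizes["total"]:
        return "request exceeds total GPU memory"
    if sizes["requested"] > sizes["free"]:
        return "request fits the GPU but not the memory currently free"
    return "request should fit"


high_res = ("RuntimeError: CUDA out of memory. Tried to allocate 5.38 GiB "
            "(GPU 0; 3.95 GiB total capacity; 379.90 MiB already allocated; "
            "2.89 GiB free; 16.10 MiB cached)")
low_res = ("RuntimeError: CUDA out of memory. Tried to allocate 1014.00 MiB "
           "(GPU 0; 3.95 GiB total capacity; 2.61 GiB already allocated; "
           "527.44 MiB free; 23.25 MiB cached)")

print(diagnose(high_res))
print(diagnose(low_res))
```

Read this way, the first failure asks for a single block larger than the whole card, so no amount of freed memory would help at that resolution. The second asks for less than the card holds, but more than was free at that moment.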

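I adapted the demo.py file from another repository so that I could test MIRNet on my image set. Along the way I had to make some configuration changes related to graphics compatibility, but those were all resolved.

Do you have any suggestions for solving my problem? I'm running the provided pre-trained model in a Linux environment with Anaconda, with all the correct specifications, on an NVIDIA GeForce GTX 960M graphics card with 4 GB of memory.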

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from PIL import Image
import os
from runpy import run_path
from skimage import img_as_ubyte
from collections import OrderedDict
from natsort import natsorted
from glob import glob
import cv2
import argparse

parser = argparse.ArgumentParser(description='Demo MIRNet')
parser.add_argument('--input_dir', default='./samples/', type=str, help='Input images')
parser.add_argument('--result_dir', default='./samples/output/', type=str, help='Directory for results')
parser.add_argument('--task', required=True, type=str, help='Task to run',
                    choices=['fivek', 'Denoising', 'SR_x3'])

args = parser.parse_args()


def save_img(filepath, img):
    cv2.imwrite(filepath, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))


def load_checkpoint(model, weights):
    checkpoint = torch.load(weights)
    try:
        model.load_state_dict(checkpoint["state_dict"])
    except RuntimeError:  # raised by load_state_dict when keys do not match
        state_dict = checkpoint["state_dict"]
        new_state_dict = OrderedDict()
        for k, v in state_dict.items():
            name = k[7:]  # remove `module.`
            new_state_dict[name] = v
        model.load_state_dict(new_state_dict)


task = args.task
inp_dir = args.input_dir
out_dir = args.result_dir

os.makedirs(out_dir, exist_ok=True)

files = natsorted(glob(os.path.join(inp_dir, '*.jpg'))
                  + glob(os.path.join(inp_dir, '*.JPG'))
                  + glob(os.path.join(inp_dir, '*.png'))
                  + glob(os.path.join(inp_dir, '*.PNG')))

if len(files) == 0:
    raise Exception(f"No files found at {inp_dir}")

# Load corresponding model architecture and weights
load_file = run_path(os.path.join("networks", "MIRNet_model.py"))
model = load_file['MIRNet']()
model.cuda()

weights = os.path.join("pretrained_models/denoising", "model_" + task.lower() + ".pth")
load_checkpoint(model, weights)
model.eval()

img_multiple_of = 8

for file_ in files:
    img = Image.open(file_).convert('RGB')
    input_ = TF.to_tensor(img).unsqueeze(0).cuda()

    # Pad the input if not_multiple_of 8
    h, w = input_.shape[2], input_.shape[3]
    H, W = ((h + img_multiple_of) // img_multiple_of) * img_multiple_of, (
                (w + img_multiple_of) // img_multiple_of) * img_multiple_of
    padh = H - h if h % img_multiple_of != 0 else 0
    padw = W - w if w % img_multiple_of != 0 else 0
    input_ = F.pad(input_, (0, padw, 0, padh), 'reflect')

    with torch.no_grad():
        restored = model(input_)
    restored = restored[0]
    restored = torch.clamp(restored, 0, 1)

    # Unpad the output
    restored = restored[:, :, :h, :w]

    restored = restored.permute(0, 2, 3, 1).cpu().detach().numpy()
    restored = img_as_ubyte(restored[0])

    f = os.path.splitext(os.path.split(file_)[-1])[0]
    save_img((os.path.join(out_dir, f + '.png')), restored)

print(f"Files saved at {out_dir}")


pytorch

conv-neural-network

artificial-intelligence

image

machine-learning

1 Answer

Stack Overflow user

Answered 2021-05-16 05:34:53

This may sound silly, but try running the following command in a terminal:

pkill -9 python

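Be careful, though: this command kills all Python processes. Perhaps one of them got stuck while you were trying your code and is still holding GPU memory. If this command doesn't solve your problem, try running the code on Google Colab to see whether the problem persists: Colab should give you GPUs with 10-12 GB of memory. Keep us updated.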
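Before killing every Python process, it may be worth checking which processes are actually holding GPU memory. The following is a minimal sketch that assumes `nvidia-smi` is on the PATH and uses its `--query-compute-apps` CSV output. The function names `parse_compute_apps` and `list_gpu_processes` are hypothetical, and the sample rows in the comment are made up for illustration.

```python
import csv
import io
import subprocess


def parse_compute_apps(csv_text):
    """Parse header-less CSV rows of the form 'pid, process_name, used_memory'."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = record
        rows.append({
            "pid": int(pid),
            "name": name.strip(),
            "used_memory": used.strip(),
        })
    return rows


def list_gpu_processes():
    """Ask nvidia-smi which compute processes currently hold GPU memory."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


# Illustrative (made-up) rows in the expected format:
#   1234, python, 2671 MiB
#   5678, /usr/bin/python3, 120 MiB
```

Calling `list_gpu_processes()` on the affected machine returns one dictionary per process, so a single stale PID can be stopped with `kill` instead of ending every Python process at once.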

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67535027
