How does one install PyTorch 1.9 on an HPC that seems to refuse to cooperate?

Stack Overflow user

Asked 2021-09-17 23:51:09 · 2 answers · 1.4K views · 0 followers · score 3

I have been trying to install PyTorch 1.9 with CUDA (ideally 11) on my HPC, but I can't.

Conda says:

```text
Package typing-extensions conflicts for:
typing-extensions
torchvision -> pytorch==1.8.1 -> typing-extensions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.17=0
  - feature:|@/linux-64::__glibc==2.17=0
  - cffi -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - cudatoolkit=11.0 -> __glibc[version='>=2.17,<3.0.a0']
  - cudatoolkit=11.0 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - freetype -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - jpeg -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - lcms2 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - libffi -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - libgcc-ng -> __glibc[version='>=2.17']
  - libmklml -> libgcc-ng -> __glibc[version='>=2.17']
  - libpng -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - libstdcxx-ng -> __glibc[version='>=2.17']
  - libtiff -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - libwebp-base -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - lz4-c -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - mkl-service -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - mkl_fft -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - mkl_random -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - ncurses -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - ninja -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - numpy -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - numpy-base -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - openjpeg -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - openssl -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - pillow -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - python=3.9 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - pytorch==1.9 -> cudatoolkit[version='>=11.1,<11.2'] -> __glibc[version='>=2.17,<3.0.a0']
  - readline -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - sqlite -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - tk -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - torchvision -> cudatoolkit[version='>=11.1,<11.2'] -> __glibc[version='>=2.17|>=2.17,<3.0.a0']
  - xz -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - zlib -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
  - zstd -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.17
```

But I don't understand how to use this information to install it. Is there anything I can ask my sysadmin to do?
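All of the `__glibc` constraints in that message are `>=2.17` (some with an upper bound of `<3.0.a0`), and conda reports the installed version as 2.17, so the system glibc appears to satisfy every one of them. The list therefore looks more like a symptom of the solver failing for some other reason than a request to upgrade glibc. As a sanity check, the sketch below reads the running glibc version and tests it against those constraint strings. The matching is a simplified reading of conda's version syntax (`,` as AND binding tighter than `|` as OR, non-numeric suffixes such as `.a0` dropped), an assumption rather than conda's own matcher, and the function names are hypothetical.

```python
import operator
import os
import platform
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = (
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
)


def parse_version(text: str) -> Version:
    """Keep only the leading numeric components: '3.0.a0' -> (3, 0)."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def _compare(version: Version, clause: str) -> bool:
    for symbol, op in _OPS:
        if clause.startswith(symbol):
            bound = parse_version(clause[len(symbol):])
            width = max(len(version), len(bound))
            # Pad with zeros so (2, 17) and (2, 17, 0) compare as equal.
            left = version + (0,) * (width - len(version))
            right = bound + (0,) * (width - len(bound))
            return op(left, right)
    raise ValueError(f"unsupported clause: {clause!r}")


def satisfies(version: Version, spec: str) -> bool:
    """Simplified conda-style check: ',' means AND and binds tighter than '|' (OR)."""
    return any(
        all(_compare(version, clause.strip()) for clause in alternative.split(","))
        for alternative in spec.split("|")
    )


def installed_glibc() -> Optional[Version]:
    """Return the running glibc version, or None on non-glibc systems."""
    try:
        raw = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.17"
    except (AttributeError, ValueError, OSError):
        raw = None
    if raw and raw.startswith("glibc "):
        return parse_version(raw.split()[1])
    lib, version = platform.libc_ver()
    return parse_version(version) if lib == "glibc" and version else None


if __name__ == "__main__":
    glibc = installed_glibc()
    print("installed glibc:", glibc)
    if glibc is not None:
        for spec in (">=2.17", ">=2.17,<3.0.a0", ">=2.17|>=2.17,<3.0.a0"):
            print(f"{spec!r}: {satisfies(glibc, spec)}")
```

On a node that reports glibc 2.17, all three constraints from the message evaluate to `True`, which is why a sysadmin-side glibc upgrade is unlikely to be the fix.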

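When I try to install it with conda, I get a message saying it is already installed. However, grepping `conda list` shows that the installed build is CPU-only, not GPU: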

```text
(metalearning_gpu) miranda9~/automl-meta-learning $ conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

(metalearning_gpu) miranda9~/automl-meta-learning $ 
(metalearning_gpu) miranda9~/automl-meta-learning $ conda list | grep torch
cpuonly                   1.0                           0    pytorch
ffmpeg                    4.3                  hf484d3e_0    pytorch
pytorch                   1.9.0               py3.9_cpu_0  [cpuonly]  pytorch
torch                     1.9.0+cpu                pypi_0    pypi
torchaudio                0.9.0                    pypi_0    pypi
torchmeta                 1.7.0                    pypi_0    pypi
torchvision               0.10.0+cpu               pypi_0    pypi
```
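The listing shows two things that could explain why conda considers the request satisfied. First, the `cpuonly` metapackage is present; it exists to steer conda toward CPU builds, so it is a likely reason the CUDA build is never selected. Second, PyTorch is installed twice: once by conda (`pytorch 1.9.0 py3.9_cpu_0`) and once by pip (`torch 1.9.0+cpu`, channel `pypi`). The sketch below flags these patterns in `conda list` output. The markers it checks (`cpuonly`, `_cpu` in the build string, `+cpu` in the version) are heuristics taken from this listing, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, NamedTuple

TORCH_PACKAGES = {"pytorch", "torch", "torchvision", "torchaudio"}


class Entry(NamedTuple):
    name: str
    version: str
    build: str
    channel: str


def parse_conda_list(text: str) -> List[Entry]:
    """Parse `conda list` rows: name, version, build and (optionally) channel."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines and the header comments
        # The channel is the last column; rows from the default channel omit it.
        channel = fields[-1] if len(fields) > 3 else ""
        entries.append(Entry(fields[0], fields[1], fields[2], channel))
    return entries


def torch_warnings(entries: List[Entry]) -> List[str]:
    """Heuristic checks for a CPU-only or mixed conda/pip PyTorch install."""
    names = {e.name for e in entries}
    warnings = []
    if "cpuonly" in names:
        warnings.append("'cpuonly' is installed; it steers conda toward CPU builds")
    for e in entries:
        if e.name in TORCH_PACKAGES and ("_cpu" in e.build or "+cpu" in e.version):
            warnings.append(f"{e.name} {e.version} ({e.build}) looks like a CPU-only build")
    if {"pytorch", "torch"} <= names:
        warnings.append("PyTorch is installed twice: 'pytorch' via conda and 'torch' via pip")
    return warnings


if __name__ == "__main__" and shutil.which("conda"):
    # Run inside the activated environment you want to inspect.
    listing = subprocess.run(["conda", "list"], capture_output=True, text=True).stdout
    for warning in torch_warnings(parse_conda_list(listing)):
        print("warning:", warning)
```

A commonly suggested remedy, not verified on this cluster, is to remove `cpuonly` (`conda remove cpuonly`) or rebuild the environment from scratch, and then install PyTorch with only one of conda or pip.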

Trying to install it with pip fails completely:

```text
(metalearning_gpu) miranda9~/automl-meta-learning $ pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html


Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cu111
ERROR: Exception:
Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 315, in run
    requirement_set = resolver.resolve(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
    return any(self)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
    candidate = func()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 204, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 295, in __init__
    super().__init__(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 305, in _prepare_distribution
    return self._factory.preparer.prepare_linked_requirement(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 550, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 239, in unpack_url
    file = get_http_url(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 102, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/download.py", line 132, in __call__
    resp = _http_get_download(self._session, link)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/download.py", line 115, in _http_get_download
    resp = session.get(target_url, headers=HEADERS, stream=True)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/session.py", line 454, in request
    return super().request(method, url, *args, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/cachecontrol/adapter.py", line 44, in send
    cached_response = self.controller.cached_request(request)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/cachecontrol/controller.py", line 139, in cached_request
    cache_data = self.cache.get(cache_url)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/cache.py", line 54, in get
    return f.read()
MemoryError
```
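The traceback ends inside pip's HTTP cache (`pip/_internal/network/cache.py`, `return f.read()`), so pip ran out of memory while reading a cached response for the very large CUDA wheel into RAM. A commonly suggested workaround, not tested on this cluster, is to bypass that cache with pip's `--no-cache-dir` option, ideally on a node with more memory. The sketch below only assembles the command (and runs it when given `--run`); `pip_install_command` is a hypothetical helper. It invokes pip as `sys.executable -m pip` so the wheels land in the interpreter of the activated environment rather than whichever `pip3` is first on `PATH`.

```python
import subprocess
import sys
from typing import List, Sequence

FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"


def pip_install_command(packages: Sequence[str], no_cache: bool = True) -> List[str]:
    """Build a pip command for the current interpreter; --no-cache-dir skips pip's HTTP cache."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_cache:
        cmd.append("--no-cache-dir")
    cmd += ["-f", FIND_LINKS, *packages]
    return cmd


if __name__ == "__main__":
    cmd = pip_install_command(
        ["torch==1.9.0+cu111", "torchvision==0.10.0+cu111", "torchaudio==0.9.0"]
    )
    print(" ".join(cmd))
    if "--run" in sys.argv:
        # Downloads very large wheels; run it inside a job with enough memory.
        subprocess.run(cmd, check=True)
```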

Current install script:

```bash
## Installation script
# to install do: bash ~/automl-meta-learning/install.sh

#conda update conda

#conda create -y -n metalearning_gpu python=3.9
#conda activate metalearning_gpu
#conda remove --name metalearning_gpu --all

module load cuda-toolkit/11.1
module load gcc/9.2.0

# A40, needs cuda at least 11.0, but 1.9 requires 11
# a non-interactive `bash install.sh` needs conda's shell hook before `conda activate` works
eval "$(conda shell.bash hook)"
conda activate metalearning_gpu
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

#conda activate metalearning_cpu
#conda install pytorch torchvision torchaudio cpuonly -c pytorch
#pip3 install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

# uutils installs
conda install -y dill
conda install -y "networkx>=2.5"  # quoted so bash does not treat ">" as an output redirect
conda install -y scipy
conda install -y scikit-learn
conda install -y lark-parser -c conda-forge

# due to compatibility with torch=1.7.x, https://stackoverflow.com/questions/65575871/torchtext-importerror-in-colab
#conda install -y torchtext==0.8.0 -c pytorch

conda install -y tensorboard
conda install -y pandas
conda install -y progressbar2
conda install -y transformers
conda install -y requests
conda install -y aiohttp
conda install -y numpy
conda install -y plotly
conda install -y matplotlib

pip install wandb

# for automl
conda install -y pyyaml
conda install -y torchviz
#conda install -y graphviz

#pip install tensorflow
#pip install learn2learn

#pip install -U git+https://github.com/brando90/pytorch-meta.git
#pip install --no-deps torchmeta==1.6.1
pip install --no-deps torchmeta==1.7.0
#        'torch>=1.4.0,<1.9.0',
#        'torchvision>=0.5.0,<0.10.0',
#pip install -y numpy
pip install Pillow
pip install h5py
#pip install requests
pip install ordered-set

pip install higher
#    'torch'

#pip install -U git+https://github.com/moskomule/anatome
pip install --no-deps -U git+https://github.com/moskomule/anatome
#    'torch>=1.9.0',
#    'torchvision>=0.10.0',
pip install tqdm

# - using conda develop rather than pip because uutils installs incompatible versions with the vision cluster
## python -c "import sys; [print(p) for p in sys.path]"
conda install conda-build
# conda develop ~/ultimate-utils/ultimate-utils-proj-src
# conda develop ~/automl-meta-learning/automl-proj-src
# pip install ultimate-utils



# -- extra notes

# local editable installs
# HAL installs, make sure to clone from wmlce 1.7.0 that has h5py ~= 2.9.0 and torch 1.3.1 and torchvision 0.4.2
# pip install torchmeta==1.3.1
```
Stack Overflow user

Answered 2021-09-27 15:21:21

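The answer, unfortunately, is that you may have to try installing it with both conda and pip; one of them eventually worked for me. For some reason the vision cluster/HPC refused to download the conda version, and I couldn't test whether that was caused by the HPC. Also, if your HPC is poorly managed, installing things can take a very long time (there is no way around it), and getting an interactive job with a GPU seemed to help the installation process.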

I ran this command inside an interactive job with a GPU and plenty of CPUs (16):

```bash
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
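Because a CPU-only build can be installed without any error (the `conda list` output in the question shows exactly that), it is worth confirming what actually landed. A minimal check, best run inside a GPU job so that `torch.cuda.is_available()` is meaningful, might look like this. `describe_torch_build` is a hypothetical helper; `torch.version.cuda` (which is `None` for CPU-only builds), `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are existing PyTorch APIs.

```python
from typing import Any, Dict


def describe_torch_build() -> Dict[str, Any]:
    """Report whether the importable torch is a CUDA build and whether a GPU is visible."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    info: Dict[str, Any] = {
        "installed": True,
        "version": str(torch.__version__),      # e.g. "1.9.1+cu111" vs "1.9.0+cpu"
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

On a login node without a GPU, `built_with_cuda` can still confirm a CUDA wheel even though `cuda_available` is `False`.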

It also seems to be compatible with torchtext, which I needed as well (using 0.10.0). See: https://github.com/pytorch/text/issues/1394#issuecomment-927484153
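To confirm the environment actually ended up with this combination, the installed distribution versions can be compared against the pins with `importlib.metadata`. The pins below are copied from the command above plus the torchtext version mentioned; `find_mismatches` is a hypothetical helper, and its `lookup` parameter exists only so it can be exercised without PyTorch installed.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins from the pip command above, plus torchtext 0.10.0 as mentioned in the answer.
EXPECTED = {
    "torch": "1.9.1+cu111",
    "torchvision": "0.10.1+cu111",
    "torchaudio": "0.9.1",
    "torchtext": "0.10.0",
}


def _installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = _installed_version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, found in problems.items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
    print("all pins match" if not problems else f"{len(problems)} mismatch(es)")
```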

Score: 0
Page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/69230502
