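I have been trying to install Slurm for a while now and I am really lost. My goal is to install Slurm on a single machine and submit jobs to it from that same machine.
At first I tried installing it with apt install slurm-llnl, but the version packaged for Ubuntu 16.04.3 is far behind the current release.
So the next step was to compile Slurm from source. After downloading and extracting the tarball, I ran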
./configure --prefix=/etc/init.d/ --sysconfdir=/etc/slurm-llnl/
make
make install
Then I added the following to /etc/ld.so.conf.d/SlurmLib.conf
/etc/init.d/lib
/etc/init.d/lib/slurm
Then I created my cgroup.conf, slurm.conf, and slurmdb.conf.
cgroup.conf
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
slurm.conf
ControlMachine=arroyavelab15
AuthType=auth/none
CryptoType=crypto/munge
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/slurm_dir/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/slurm_dir/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/slurm_dir/spool/slurmd/
SlurmUser=danielsauceda
SlurmdUser=danielsauceda
StateSaveLocation=/var/slurm_dir/spool
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/none
AccountingStoreJobComment=YES
ClusterName=cluster
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=5
SlurmctldLogFile=/var/slurm_dir/slurmctld.log
SlurmdDebug=3
NodeName=arroyavelab15 NodeAddr=xxx.xxx.xxx.xxx.xx CPUs=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN RealMemory=8000
PartitionName=debug Nodes=arroyavelab15 Default=YES MaxTime=INFINITE State=UP
slurmdb.conf
# slurmDBD info
DbdAddr=localhost
DbdHost=localhost
SlurmUser=danielsauceda
DebugLevel=4
PidFile=/var/run/slurmdbd.pid
#
# Database info
StorageType=accounting_storage/mysql
StoragePass=slurm
StorageUser=slurm
Finally, after all of that, I ran
./slurmctld -D
./slurmd -D
./slurmdbd -Dv
They all appear to be running (each in its own terminal).
However, when I run
srun -N3 --nodes=1 --ntasks-per-node=1 hostname
I get the following
srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: Couldn't find the specified plugin name for auth/munge looking at all files
srun: error: cannot find auth plugin for auth/munge
srun: error: cannot create auth context for auth/munge
srun: error: authentication: authentication initialization failure
srun: error: Srun communication socket apparently being written to by something other than Slurm
srun: error: Unable to allocate resources: Protocol authentication error
I don't know where the problem lies, and searching online has not helped much.
Posted on 2020-03-23 07:41:14
Install munge from the package manager, then build Slurm with the --with-munge= option; auth_munge.so should then appear under $PREFIX/lib/slurm.
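The steps in this answer can be sketched roughly as follows. This is a minimal sketch under assumptions, not a verified recipe: the Ubuntu package names (munge, libmunge-dev), the /usr path passed to --with-munge=, and the helper name has_munge_plugin are illustrative and not from the answer. The default prefix is the one used in the question.

```shell
#!/bin/sh
# Sketch only. Package names and the munge path below are assumptions.
#
# Rebuild steps, run by hand from the extracted Slurm source tree:
#   sudo apt install munge libmunge-dev      # assumed Ubuntu package names
#   ./configure --prefix=/etc/init.d/ --sysconfdir=/etc/slurm-llnl/ --with-munge=/usr
#   make
#   sudo make install

# Hypothetical helper: succeeds if the munge auth plugin is present
# under the given Slurm install prefix.
has_munge_plugin() {
    prefix="$1"
    [ -f "$prefix/lib/slurm/auth_munge.so" ]
}

# Default to the prefix from the question; override with PREFIX=... if needed.
PREFIX="${PREFIX:-/etc/init.d}"

if has_munge_plugin "$PREFIX"; then
    echo "auth_munge.so found under $PREFIX/lib/slurm"
else
    echo "auth_munge.so missing under $PREFIX/lib/slurm"
fi
```

If the check reports the plugin as missing after the rebuild, the configure output is the first place to look for whether the munge headers and library were detected.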
https://stackoverflow.com/questions/48410583