首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >如何发送新命令到已经运行的nohup进程或在nohup中一起/并发运行两个命令?

如何发送新命令到已经运行的nohup进程或在nohup中一起/并发运行两个命令?
EN

Unix & Linux用户
提问于 2022-11-14 22:32:26
回答 1查看 257关注 0票数 3

我想要运行一个nohup作业,并对该作业运行一个特定的新命令(例如kerberos身份验证)。实际上,理想的解决方案是首先运行reauth命令,然后在相同nohup进程id下运行实际作业all。这样nohup进程就不会丢失kerberos票证。因此,我希望同步运行另一个python脚本,这样就不会丢失它们的kerberos票证-- <#>without屏幕或tmux或任何需要我与脚本交互的内容。基本上,我希望实现一个运行python作业(或任何一个)的守护进程,直到它完成运行,它才会丢失它的票证。一个人是怎么做到的?

这是我目前的尝试,但我不太清楚它是否有效

代码语言:javascript
运行
复制
# - set up this main sh script
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME

source cuda11.1

conda init bash
conda activate metalearning_gpu

# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir

export CUDA_VISIBLE_DEVICES=4
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"

# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9

# - expt python script then inside that python pid attach a reauth process
# should I run rauth within python with subprocess or package both the nohup command and the rauth together in badsh somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
#nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE) &
nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/main.py > $OUT_FILE 2> $ERR_FILE) &
# other option is to run `echo $SU_PASSWORD | /afs/cs/software/bin/reauth` inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID

# - Done
echo "Done with the dispatching (daemon) sh script

可能是因为它跑得不够久?不确定。

我认为另一种选择是在python脚本本身内运行reauth。没有测试过但可能有用吗?主要的诀窍是不喜欢我的流程,因为失去kerberos票证而导致nohup死掉。

reauth的内容如下:

代码语言:javascript
运行
复制
(metalearning_gpu)~ $ cat /afs/cs/software/bin/reauth
#!/usr/bin/perl
# $Id: reauth 2737 2011-06-20 18:14:05Z miles $
#
# Original version (C) Martin Schulz, 2'2002
# University Karlsruhe
#
# Modifications by Miles Davis 
#  Super minimal -- call programs rather than functions to reduce dependence
#  on extra perl modules.
#
# Heimdal patches thanks to Georgios Asimenos 
#

# General:
##########

# This little script aims at maintaining a valid AFS token from a
# users password for long running jobs.

# As everybody knows (or should know) Kerberos tickets and AFS tokens
# only have a limited lifetime. This is so by design and is usually
# reasonable. After 12 hours, it is no more obvious that it is really
# that user sitting in front of the computer that once typed the
# correct password in. Furthermore the damage caused by compromized
# AFS tokens is limited to the lifetime of that ticket.

# However, there are situations when users want to use long running
# jobs that will write to AFS filespace for several days. Renewable
# tickets are not so much of help here, since they can only be renewed
# if ....

# Therefore the secret has somehow deposited on the local computer
# that will run the long time job. This can be eiter done by storing a
# keytab on the local disk, maybe with a cron(*) principal with
# reduces priviledges. The approach taken here is to work with the
# original password and keep it in RAM only.

# When starting this program, the user is asked for his principal and
# the corresponding password. Then the TGT and AFS token is obtained
# and displayed, afterwards, a background process is forked and the
# main process will return to the system prompt. The workload program
# can now be started.

# The background process will periodically attempt to obtain krb
# tickets and AFS tokens. If this fails for some reason (Kerberos
# server not available or anything, the program aborts.

# aklog does not create a new pag if not told so. If you want your
# background process have a separate pag, create it beforehand.

# The reauth.pl program will work until eternity if is not stopped
# somehow. The canonical way is kill it by "kill $pid", where $pid is
# the process id printed before the return of the initial call to
# reauth.pl or found in the output of "ps".


# (*) Cron jobs are another issue. Our institute introduced
# user.cron-style principals to enable cron to obtain a token and then
# work on restricted parts of the users home directories.


# Security issues:
##################

# reauth.pl will run forever if you do not stop it, so don't forget that!

# The password is kept in RAM (of the child process). AFAIK, this can
# only be recovered by local root (who you need to trust anyway). It
# will not survive a reboot of the local machine.

# The password is not kept on any disk. Therefore any bootfloppy
# (reboot to single user mode..)  or screwdriver (take disk away..)
# attacks are not promising.

# Be aware that your NSA-, FBI-, MI5-, KGB-, ElQaida-, or (*insert
# your favorite opponent or competitor here*)-sponsored cleaning
# personnel or coworkers might have even more elaborate means... :-)


# BUGS:
#######

# Only mildly tested only on Linux and Solaris.

# Uses kinit, aklog, klist and tokens programs for a KerberosV/ Ken
# Hornstein's migration kit centered AFS setup. Please adjust to your
# config.




###########################################################################
# Configs:


# kinit program, add path if necessary
if ( -e "/usr/kerberos/bin/kinit" ) {
    $kinit="/usr/kerberos/bin/kinit";
} elsif ( -e "/usr/lib/heimdal/bin/kinit" ) {
    $kinit = "/usr/lib/heimdal/bin/kinit";
} elsif ( -e "/usr/bin/kinit" ) {
    $kinit="/usr/bin/kinit";
} else {
    die("Couln't find kinit.\n");
}


# aklog program, add path if necessary
if ( -e "/usr/bin/aklog" ) {
    $aklog="/usr/bin/aklog";
} elsif ( -e "/usr/lib/heimdal/bin/afslog" ) {
    # or, afslog, for heimdal weirdos
    $aklog="/usr/lib/heimdal/bin/afslog";
} else {
    die("Couln't find aklog or afslog.\n");
}

# klist program, add path if necessary
$klist="/usr/kerberos/bin/klist";

# tokens program, add path if necessary
$tokens="/usr/bin/tokens";


#################################################################
# Program:

use Getopt::Long;
use POSIX qw(setuid);
use POSIX qw(setgid);
use POSIX qw(setsid);

# Defaults for command line options.
my $keytab = '';
my $command = '';
my $username = '';
my $debug = 0;
my $verbose = 0;
my $interval=15000; # time interval in seconds: 4+ hours:

my %opts = (
    # Keytab
    'k=s' => sub {
                    $keytab = @_[1];
                    $kinit_opts .= "-k -t $keytab ";
                },
    # Run command
    'c=s' => sub {
                    $command = @_[1];
                },
    # Run command as user
    'u=s' => sub {
                    $username = @_[1];
                },
    # Time interval to sleep
    'i=i' => sub {
                    $interval = @_[1];
                },
    # Debug
    'd'   => sub {
                    $debug++;
                },
    # Be versbose
    'v'   => sub {
                    $verbose++;
                },
);


GetOptions(%opts) or die "Usage: reauth [ -k=keytab ] [ -u user ] [ -i  ]\n";




if(@ARGV) {
    $princ = $ARGV[0];
    debug_print(2, "Principal name provided by argument = $princ");
} else {
   # Assume we want the login name as the principal name
    $princ = getpwuid($<);
    debug_print(2, "Principal name provided by argument = $princ");
}

if ($keytab) {
    # Don't ask for password, a keytab was provided.
    debug_print(1, "Keytab provided = $keytab");
} else {
    # read password, but turn off echo before:
    print "Password for $princ: ";
    system "stty -echo";
    $passwd = ;
    system "stty echo";
    printf "\n";
    chomp $passwd;
    # Actually get the tickets/tokens
    if(obtain_tokens()!=0) {
        die "Can't obtain kerberos tickets\n";
    }
    if ($verbose) {
        show_tokens();
    }
}

# fork to go into background:
# a) the parent will exit
# b) the child will work on
$pid = fork();
if ($pid) {
    # I am the parent.
    printf "Background process pid is: $pid\n";
    if ($command) {
        debug_print(1,"Waiting for child to die.");
        wait;
        debug_print(1,"Child is dead.");
    }
    exit 0;
} else {
    # I am the child.
    debug_print(2,"I am process $");
    print "Can't set session id\n" unless setsid();

    debug_print(2,"KRB5CCNAME: " . $ENV{KRB5CCNAME});
    #if ($ENV{KRB5CCNAME}) {
        #$ENV{KRB5CCNAME} =  $ENV{KRB5CCNAME} . "_reauth_$";
    #} else {
        #$ENV{KRB5CCNAME} =  "/tmp/krb5cc_reauth_$";
    #}

    #debug_print(2,"Creating " . $ENV{KRB5CCNAME});
    #system "touch $ENV{KRB5CCNAME}";


    if ($username) {
        debug_print(1, "Looking up UID for $username");
        ($name,$passwd,$UID,$GID, @junk) = getpwnam($username);
        debug_print(1, "Changing to UID $UID, GID $GID");
        print "Can't set group id\n" unless setgid($GID);
        print "Can't set user id\n" unless setuid($UID);
        if ($ENV{KRB5CCNAME}) {
            $ENV{KRB5CCNAME} =  $ENV{KRB5CCNAME} . "_reauth_$";
        } else {
            $ENV{KRB5CCNAME} =  "/tmp/krb5cc_reauth_$";
        }
    }

    debug_print(2, "Running as uid " . $<);
    # Actually get the tickets/tokens
    if(obtain_tokens()!=0) {
        die "Can't obtain kerberos tickets\n";
    }

    if ($verbose) {
        show_tokens();
    }

    # If I was told to run a command, do it.
    if ($command) {
        debug_print(1,"About to exec $command");
        exec($command) or die "Can't execute '$command'.\n";
        exit
    }

    debug_print(2,"Going into auth loop (interval is $interval).");

    #close(STDOUT);
    #close(STDERR);

    # Otherwise, work until killed:
    while (1) {
        debug_print(2,"Waking up to obtain new tokens.");
        obtain_tokens();
        if ($verbose) {
            show_tokens();
        }
        sleep $interval;
    };
}

#################################################################


sub obtain_tokens() {

  # ignore sigpipes' (according to perlopentut)
  $SIG{PIPE} = 'IGNORE';

    #debug_print(1,"Running: | $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");

  # run kinit
  open(KINIT, "| $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");

  # pass password to stdin, password does not show up on command line
  if (! $keytab) {
      print(KINIT "${passwd}\n");
  }

  # close pipe and get status
  close(KINIT); $status=$?;

    debug_print(1,"kinit exited with status $status\n");
  # act on status..
  if($status == 256) {
        if ($verbose) {
            print "WARNING: kinit is not able to obtain Kerberos ticket ($status).\n";
            print "         Possible DNS or network problem. Continuing anyway...\n";
        }
        return 1;
  } elsif($status!=0) {
    print "kinit is not able to obtain Kerberos ticket: $status\n";
     return 2;
  };

    debug_print(1,"Running $aklog...\n");
  $status = system "$aklog >/dev/null" ;
    debug_print(1,"aklog exited with status $status\n");
  if($status!=0) {
    print "aklog is not able to obtain AFS token: $status\n";
     return 3;
  };

  return 0;

};

##################################################################

sub show_tokens() {
    system $klist ;
    system $tokens ;
};

##################################################################

sub debug_print($) {
    my $level = shift;
    my $message = shift;

    if ($debug >= $level) {
        print "DEBUG$debug: $message\n";
    }
}

##################################################################

python尝试:

代码语言:javascript
运行
复制
def run_bash_command(cmd: str) -> Any:
    import subprocess

    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    if error:
        raise Exception(error)
    else:
        return output


def stanford_reauth():
    # def stanford_rauth(password: Optional[str] = None):
    # password: str = os.environ['SU_PASSWORD'] if password is None else None
    # assert password is not None, f'Err: {password=}'
    reauth_cmd: str = f'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Output of reauth (/afs/cs/software/bin/reauth with password)')
    print(f'{out=}')

我目前正在运行这两个,看看他们是否通过/没有被杀。

新错误:

我在尝试:

代码语言:javascript
运行
复制
nohup sh -c "echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_diversity_with_task2vec.py --manual_loads_name diversity_ala_task2vec_hdb1_mio > $OUT_FILE 2> $ERR_FILE" > $PWD/main.sh.nohup.out$SLURM_JOBID &

但我不确定它还能不能用。我得到以下错误:

代码语言:javascript
运行
复制
Password for brando9: stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device

Can't obtain kerberos tickets

我想我明白问题是。新命令要求终端发送密码,这样就不会让echo发送密码。我试过:

我有ssh和need来发送密码。有办法用ssh发送密码吗?

有关:

EN

回答 1

Unix & Linux用户

发布于 2022-11-16 02:09:04

似乎所有三次尝试都成功了:

  • 在python内部运行reauth
  • 做以上两件事

我以前运行作业的示例脚本:

代码语言:javascript
运行
复制
# https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-e-g-r
# sh ~/diversity-for-predictive-success-of-meta-learning/main_nohup_snap.sh
# - set up this main sh script
export RUN_PWD=$(pwd)
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME
# since snap .bash.user cd's me into HOME at dfs
cd $RUN_PWD
echo RUN_PWD = $RUN_PWD
realpath .

source cuda11.1

conda init bash
conda activate metalearning_gpu

# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir
echo $OUT_FILE
echo $ERR_FILE

export CUDA_VISIBLE_DEVICES=5
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"

# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9

# - expt python script then inside that python pid attach a reauth process
# should I run rauth within python with subprocess or package both the nohup command and the rauth together in badsh somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &
#nohup python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &

# other option is to run `echo $SU_PASSWORD | /afs/cs/software/bin/reauth` inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID
echo SLURM_JOBID = $SLURM_JOBID

# - Done
echo "Done with the dispatching (daemon) sh script"

主要指挥:

代码语言:javascript
运行
复制
nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &

我认为我将坚持选项3,因此运行这两个选项,因为如果我运行一个分布式作业,我不希望这些进程在没有我的许可下被随机杀死。为此,python脚本:

代码语言:javascript
运行
复制
def run_bash_command(cmd: str) -> Any:
    import subprocess

    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    if error:
        raise Exception(error)
    else:
        return output

def stanford_reauth():
    """"
    re-authenticates the python process in the kerberos system so that the
    python process is not killed randomly.

    ref: https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-or-run
    """
    reauth_cmd: str = f'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Output of reauth (/afs/cs/software/bin/reauth with password): ')
    print(f'{out=}')

别忘了检查reauth,经常会杀了这份工作:

代码语言:javascript
运行
复制
# - CAREFUL, if a job is already running it could do damage to it, rm reauth process
# pkill -9 reauth -u brando9
票数 0
EN
页面原文内容由Unix & Linux提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://unix.stackexchange.com/questions/724902

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档