What does the shell do for us when we type ls -l *.py and press Enter?

somenzz

Published 2021-12-02 11:11:24

This article is part of the column: Python七号

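Have you ever wondered what a Unix shell actually does when you run a command in it? How does the shell parse and interpret the command, and what happens behind the screen? For example, what does the shell do when we run ls -l *.py? Understanding this helps you use Unix-like operating systems more effectively, so today we take a closer look.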

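0. What is a shell

A shell is usually a command-line interface that exposes the operating system's services to human users or to other programs. After it starts, a shell typically displays a prompt and waits for user input. The figure below shows basic UNIX and Windows shell prompts.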

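So the shell prompts the user for a command, and the user types one. How does the shell read that command and interpret it? To understand this, we break the process into 4 steps:

Read and parse the user input
Identify the command and its arguments
Look up the command
Execute the command

Let's go through them in detail: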

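1. Read and parse the user input

Suppose you type ls -l *.py in the shell and press Enter. Internally, a simple shell can call a function named getline() (declared in <stdio.h>; the same notation is used below) to read the command. The typed command string arrives on the standard input stream; pressing Enter ends the line, and getline() then stores the string, including the trailing newline, in a buffer.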

ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);

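Parameters:

lineptr: address of the pointer to the buffer (getline() may reallocate the buffer)
n: address of the variable that holds the buffer size
stream: the stream to read from, here the standard input stream

Now let's look at the code: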

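char *input_buffer;
size_t b_size;
ssize_t n_read;

b_size = 32; // initial size of the buffer
input_buffer = malloc(sizeof(char) * b_size); // the buffer to store the user input
if (input_buffer == NULL)
 exit(EXIT_FAILURE); // out of memory

n_read = getline(&input_buffer, &b_size, stdin); // reads one line; grows input_buffer if needed
if (n_read == -1)
 exit(EXIT_SUCCESS); // end of input (Ctrl-D) or read error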

getline() blocks until the user presses Enter and then stores the typed string, that is, the command, in input_buffer. Now that the shell has the user input, what is the next step?
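The reading step can also be wrapped in a small function. The sketch below is only one possible way to do it: read_command_line is a hypothetical helper name, not part of any library, and it assumes that passing a NULL buffer with size 0 is acceptable so that getline() allocates the buffer itself.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * read_command_line (hypothetical helper): reads one line from stream.
 * Returns a malloc'd string without the trailing newline, or NULL on
 * end of input or read error. The caller frees the result.
 */
char *read_command_line(FILE *stream)
{
    char *line = NULL; /* NULL with size 0 lets getline() allocate the buffer */
    size_t size = 0;
    ssize_t n_read;

    assert(stream != NULL);
    n_read = getline(&line, &size, stream);
    if (n_read == -1)
    {
        free(line); /* getline() may have allocated even on failure */
        return NULL;
    }
    if (n_read > 0 && line[n_read - 1] == '\n')
        line[n_read - 1] = '\0'; /* drop the newline left by Enter */
    return line;
}
```

A shell loop would call read_command_line(stdin) after printing its prompt and stop when the function returns NULL, which is what happens when the user presses Ctrl-D.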

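2. Identify the command and its arguments

The shell now knows that you typed the string 'ls -l *.py', but it still has to work out which part is the command and which parts are its arguments. That is the job of the strtok() function (declared in <string.h>).

strtok() splits a string into tokens at delimiter characters. In this example the delimiter is a space, so a space tells strtok() that a word has ended. The first token in input_buffer is the command (ls), and the remaining tokens (-l and *.py) are its arguments. Once the shell has tokenized the string, it stores the tokens in a variable for later use.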

char *strtok(char *restrict str, const char *restrict delim);

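Parameters:

str: the string to tokenize (NULL on later calls, to continue with the same string)
delim: the set of delimiter characters

strtok() takes the string and the delimiters as arguments and returns a pointer to the next token, or NULL when no tokens remain. Note that it modifies the string in place. The code looks like this: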

char *input_buffer, *args, *command_argv[50];
const char *delim_args;
int i;

i = 0;
delim_args = " \t\r\n\v\f"; // the delimiters (whitespace characters)
args = strtok(input_buffer, delim_args); // stores the first token in args
while (args && i < 49) // leave room for the terminating NULL
{
 command_argv[i] = args; // stores the token in command_argv
 args = strtok(NULL, delim_args); // NULL continues with the same string
 i++;
}
command_argv[i] = NULL; // sets the last entry of command_argv to NULL

command_argv now holds the tokens of the command string:

command_argv[0] = "ls"
command_argv[1] = "-l"
command_argv[2] = "*.py"
command_argv[3] = NULL

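So command_argv[0] is the command, the remaining entries are its arguments, and the final NULL marks the end of the list. Note that in this simplified model *.py is still a literal string; a real shell replaces such a wildcard pattern with the matching file names before it runs the command. With the command string split up, the next step is to look up the command.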
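The walkthrough keeps *.py as a literal argument, whereas real shells expand wildcard patterns before running the command. A minimal sketch of that expansion, assuming the POSIX glob() function from <glob.h> is available, could look like this. expand_pattern is a hypothetical helper name, and a real shell handles quoting and many more cases than this sketch does.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/*
 * expand_pattern (hypothetical helper): expands one wildcard pattern.
 * Returns a NULL-terminated, malloc'd array of malloc'd strings, or NULL
 * on error. GLOB_NOCHECK keeps the pattern itself when nothing matches.
 */
char **expand_pattern(const char *pattern)
{
    glob_t results;
    char **names = NULL;
    size_t i;

    assert(pattern != NULL);
    if (glob(pattern, GLOB_NOCHECK, NULL, &results) == 0)
        names = malloc(sizeof(char *) * (results.gl_pathc + 1));
    if (names == NULL)
    {
        globfree(&results);
        return NULL;
    }
    for (i = 0; i < results.gl_pathc; i++)
    {
        names[i] = strdup(results.gl_pathv[i]); /* copy out of glob's storage */
        if (names[i] == NULL)
        {
            while (i > 0)
                free(names[--i]); /* undo the copies made so far */
            free(names);
            globfree(&results);
            return NULL;
        }
    }
    names[i] = NULL; /* same NULL termination as command_argv */
    globfree(&results);
    return names;
}
```

With this helper, each token that contains a wildcard would be replaced in command_argv by the names that expand_pattern returns for it.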

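3. Look up the command

From step 2 the shell knows that the user wants to run ls, but where does it find this command? The shell searches the directories listed in the PATH environment variable, which stores the locations of executable commands.

PATH usually holds more than one directory, separated by colons.

How does the shell find ls among all these directories? This is where the access() function (declared in <unistd.h>) comes in: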

int access(const char *pathname, int mode);

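Parameters and return value:

pathname: path of the file or executable
mode: the check to perform; we use X_OK, which tests that the file exists and that the caller may execute it
Return value: 0 if the check succeeds, -1 otherwise

The code below first splits PATH into its directories: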

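{
 char *path_buff, *path_dup, *paths, *path_env_name, *path[50];
 int i;

 i = 0;
 path_env_name = "PATH";
 path_buff = getenv(path_env_name); /* get the value of the PATH environment variable */
 if (path_buff == NULL)
  path_buff = ""; /* PATH is not set: there is nothing to search */
 path_dup = _strdup(path_buff); /* copy it, because strtok() modifies its argument; _strdup is found below */
 paths = path_dup ? strtok(path_dup, ":") : NULL; /* tokenizes it */
 while (paths && i < 49) /* leave room for the terminating NULL */
 {
  path[i] = paths;
  paths = strtok(NULL, ":");
  i++;
 }
 path[i] = NULL; /* terminates it with NULL */
}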

/**
* _strdup - duplicates a string
* @from: the string to be duplicated
*
* Return: pointer to the duplicated string, or NULL if malloc fails
*/
char *_strdup(char *from)
{
 int i, len;
 char *dup_str;

 len = strlen(from) + 1; /* strlen() is declared in <string.h> */
 dup_str = malloc(sizeof(char) * len); /* one char per character, plus '\0' */
 if (dup_str == NULL)
  return (NULL);
 i = 0;

 while (*(from + i) != '\0')
 {
  *(dup_str + i) = *(from + i);
  i++;
 }
 *(dup_str + i) = '\0';

 return (dup_str);
}

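The path array in the code above stores every directory in PATH and is terminated with NULL. The shell can therefore join each directory with the command name and use access() to check whether the resulting file exists and is executable: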

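{
 char *command_file, *command_path, *path[50]; /* command_file: the command name, e.g. "ls"; path: filled in above */
 int i, stat_f;

 i = 0;
 command_path = malloc(sizeof(char) * PATH_MAX); /* PATH_MAX is declared in <limits.h>; 50 bytes could overflow */
 if (command_path == NULL)
  return (NULL);
 while (path[i] != NULL)
 {
  _strcat(path[i], command_file, command_path); /* this function is found below */
  stat_f = access(command_path, X_OK); /* checks if it exists and is executable */
  if (stat_f == 0)
   return (command_path); /* returns the concatenated string if found */

  i++;
 }
 free(command_path);
 return (NULL); /* otherwise returns NULL */
}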

/**
* _strcat - concatenates two strings and saves it to a blank string
* @path: the path string
* @command: the command
* @command_path: the string to store the concatenation
*
* Return: Always void
*/
void _strcat(char *path, char *command, char *command_path)
{
 int i, j;

 i = 0;
 j = 0;

 while (*(path + i) != '\0')
 {
  *(command_path + i) = *(path + i);
  i++;
 }
 *(command_path + i) = '/';
 i++;

 while (*(command + j) != '\0')
 {
  *(command_path + i) = *(command + j);
  i++;
  j++;
 }
 *(command_path + i) = '\0';
}

Once the command is found, its full path is returned; otherwise NULL is returned and the shell reports that the command was not found.
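The two fragments above can be combined into one function. The following is a minimal sketch under a few assumptions: find_in_path is a hypothetical helper name, the PATH value is passed in as a parameter rather than read with getenv(), and strtok_r() and snprintf() are used in place of the hand-written _strdup and _strcat helpers.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() and strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * find_in_path (hypothetical helper): searches the colon-separated
 * directory list path_env for an executable named command.
 * Returns a malloc'd full path, or NULL if nothing is found.
 */
char *find_in_path(const char *command, const char *path_env)
{
    char *dirs, *dir, *save, *candidate;
    size_t len;

    assert(command != NULL && path_env != NULL);
    dirs = strdup(path_env); /* strtok_r() modifies its input, so copy it */
    if (dirs == NULL)
        return NULL;

    for (dir = strtok_r(dirs, ":", &save); dir != NULL;
         dir = strtok_r(NULL, ":", &save))
    {
        len = strlen(dir) + 1 + strlen(command) + 1; /* dir + '/' + name + '\0' */
        candidate = malloc(len);
        if (candidate == NULL)
            break;
        snprintf(candidate, len, "%s/%s", dir, command);
        if (access(candidate, X_OK) == 0) /* exists and is executable */
        {
            free(dirs);
            return candidate;
        }
        free(candidate);
    }
    free(dirs);
    return NULL;
}
```

A call such as find_in_path("ls", getenv("PATH")) would return the first match in PATH order, provided PATH is set. The sketch ignores two details that a real shell handles: a command name that contains a slash is not searched for in PATH at all, and access() with X_OK also succeeds for directories.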

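Now suppose the command has been found. What next?

4. Execute the command

Once the command has been found, it is time to execute it. But how?

Executing a command relies on the execve() function (declared in <unistd.h>):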

int execve(const char *pathname, char *const argv[],
                  char *const envp[]);

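Parameters:

pathname: full path of the executable
argv: the argument list, terminated by NULL; by convention argv[0] is the command name
envp: the list of environment variables, terminated by NULL

execve() runs the command that was found. On success it does not return, because the new program replaces the calling process; it returns -1 only if it fails.

That is also why the shell cannot simply call execve() itself: the command would replace the shell, and once the command finished there would be no shell left to read the next command. To solve this, the shell runs the command in a child process. When the child finishes, the parent is notified (it receives SIGCHLD and its wait() call returns) and its program flow continues. So to execute a command, the shell first creates a child process with fork() (declared in <unistd.h>).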

pid_t fork(void);

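fork() creates a new process by duplicating the calling process. The new process is called the child, and the calling process is called the parent. fork() returns the child's process ID in the parent and 0 in the child; if it fails, it returns -1 in the parent and no child is created: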

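{
 char *command, *command_argv[50], **env; /* command: full path from step 3; env: the shell's environment */
 pid_t child_pid;
 int status;

 get_each_command_argv(command_argv, input_buffer); /* this function is found below */
 child_pid = fork();
 if (child_pid == -1)
  return (0); /* fork failed, no child was created */

 if (child_pid == 0)
 {
  /* child: on success execve() does not return */
  execve(command, command_argv, env);
  perror(command); /* reached only if execve() failed */
  _exit(127); /* end the child so it does not continue as a second shell */
 }
 else
  wait(&status); /* parent: wait for the child to finish */
}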

/**
* get_each_command_argv - stores all the arguments
*             of the input command to the list
* @command_argv: the command argument list
* @input_buffer: the input buffer
*
* Return: Always void
*/
void get_each_command_argv(char **command_argv, char *input_buffer)
{
 char *args, *delim_args;
 int i;

 delim_args = " \t\r\n\v\f";
 args = strtok(input_buffer, delim_args);

 i = 0;
 while (args)
 {
  command_argv[i] = args;
  args = strtok(NULL, delim_args);
  i++;
 }
 command_argv[i] = NULL;
}

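The shell uses wait() (declared in <sys/wait.h>) to wait for the child's state to change before its own program flow continues, and then displays the prompt for the user again.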
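The fork, execve and wait sequence can be sketched as one function that also reports the command's exit status. This is an illustration, not the code of any particular shell: run_command is a hypothetical helper name, it uses waitpid() instead of wait() so that it waits for this specific child, and it passes the current environment through the environ variable.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ; /* the environment of the current process */

/*
 * run_command (hypothetical helper): runs the program at path with the
 * NULL-terminated argument list argv in a child process and waits for it.
 * Returns the child's exit status, or -1 if fork() or waitpid() failed
 * or the child did not exit normally.
 */
int run_command(const char *path, char *const argv[])
{
    pid_t child_pid;
    int status;

    assert(path != NULL && argv != NULL);
    child_pid = fork();
    if (child_pid == -1)
        return -1; /* no child was created */

    if (child_pid == 0)
    {
        execve(path, argv, environ); /* returns only on failure */
        perror(path);
        _exit(127); /* conventional "command not found" status */
    }

    if (waitpid(child_pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* the value the command exited with */
    return -1; /* for example, killed by a signal */
}
```

A shell would call this with the full path found in step 3 and the command_argv array from step 2, then print the prompt again once it returns.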