How the main function gets called in a C program on Linux

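When we run a program from a shell, the shell first uses the fork system call to create a new process, and then uses the execve system call to load the target program into memory and push its arguments, environment variables, and related data onto the stack. After that, execution starts at the target program's entry function.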

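Because programs on Linux are generally in ELF format, the address of the entry function is normally stored in the e_entry field of the ELF header, and by default it points to the _start function.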

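We do not write this _start function ourselves. When gcc builds our program, it links in the startup object file that glibc provides (such as crt1.o), which contains the matching _start function.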

// sysdeps/x86_64/start.S
/* This is the canonical entry point, usually the first thing in the text
   segment.  The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
   point runs, most registers' values are unspecified, except for:

   %rdx    Contains a function pointer to be registered with `atexit'.
    This is how the dynamic linker arranges to have DT_FINI
    functions called for shared libraries that have been loaded
    before this code runs.

   %rsp    The stack contains the arguments and environment:
    0(%rsp)        argc
    LP_SIZE(%rsp)      argv[0]
    ...
    (LP_SIZE*argc)(%rsp)    NULL
    (LP_SIZE*(argc+1))(%rsp)  envp[0]
    ...
            NULL
*/

#include <sysdep.h>

ENTRY (_start)
  /* Clearing frame pointer is insufficient, use CFI.  */
  cfi_undefined (rip)
  /* Clear the frame pointer.  The ABI suggests this be done, to mark
     the outermost frame obviously.  */
  xorl %ebp, %ebp

  /* Extract the arguments as encoded on the stack and set up
     the arguments for __libc_start_main (int (*main) (int, char **, char **),
       int argc, char *argv,
       void (*init) (void), void (*fini) (void),
       void (*rtld_fini) (void), void *stack_end).
     The arguments are passed via registers and on the stack:
  main:    %rdi
  argc:    %rsi
  argv:    %rdx
  init:    %rcx
  fini:    %r8
  rtld_fini:  %r9
  stack_end:  stack.  */

  mov %RDX_LP, %R9_LP  /* Address of the shared library termination
           function.  */
#ifdef __ILP32__
  mov (%rsp), %esi  /* Simulate popping 4-byte argument count.  */
  add $4, %esp
#else
  popq %rsi    /* Pop the argument count.  */
#endif
  /* argv starts just at the current stack top.  */
  mov %RSP_LP, %RDX_LP
  /* Align the stack to a 16 byte boundary to follow the ABI.  */
  and  $~15, %RSP_LP

  /* Push garbage because we push 8 more bytes.  */
  pushq %rax

  /* Provide the highest stack address to the user code (for stacks
     which grow downwards).  */
  pushq %rsp

#ifdef PIC
  /* Pass address of our own entry points to .fini and .init.  */
  mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
  mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP

  mov main@GOTPCREL(%rip), %RDI_LP
#else
  /* Pass address of our own entry points to .fini and .init.  */
  mov $__libc_csu_fini, %R8_LP
  mov $__libc_csu_init, %RCX_LP

  mov $main, %RDI_LP
#endif

  /* Call the user's main function, and exit with its value.
     But let the libc call main.  Since __libc_start_main in
     libc.so is called very early, lazy binding isn't relevant
     here.  Use indirect branch via GOT to avoid extra branch
     to PLT slot.  In case of static executable, ld in binutils
     2.26 or above can convert indirect branch into direct
     branch.  */
  call *__libc_start_main@GOTPCREL(%rip)

  hlt      /* Crash if somehow `exit' does return.   */
END (_start)

The code above is glibc's _start function, written in assembly.

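In other words, once the kernel's execve system call has finished loading the target program, the first function of our program to run is the _start function above. (For a dynamically linked executable, the dynamic linker runs first and then jumps to _start, which is why %rdx holds a function pointer on entry.)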

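The comments in the assembly already explain its job clearly. In short, following the C calling convention, it first places the arguments that __libc_start_main needs into the appropriate registers or onto the stack, and then calls __libc_start_main.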

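That is:

The address of main goes into rdi, argc into rsi, argv into rdx, the init function into rcx, the fini function into r8, the rtld_fini function into r9, and stack_end is pushed onto the stack. At that point every argument of __libc_start_main is ready, and the call instruction finally invokes __libc_start_main.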

// csu/libc-start.c
# define LIBC_START_MAIN __libc_start_main
...
/* Note: the fini parameter is ignored here for shared library.  It
   is registered with __cxa_atexit.  This had the disadvantage that
   finalizers were called in more than one place.  */
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                 int argc, char **argv,
                 ...
                 __typeof (main) init,
                 void (*fini) (void),
                 void (*rtld_fini) (void), void *stack_end)
{
  /* Result of the 'main' function.  */
  int result;
  ...
  /* Nothing fancy, just call the function.  */
  result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
  ...
  exit (result);
}

The code above is the corresponding __libc_start_main function. As you can see, its parameters and their order match the arguments that _start prepared according to the C calling convention.

After running a long stretch of setup code, __libc_start_main finally calls our main function.

When main returns, its return value is stored in result, and exit(result) is then called so that this value becomes the program's exit status.

At this point, the program's complete life cycle has come to an end.

The end.

Originally published on the WeChat official account Linux内核及JVM底层相关技术研究 (ytcode)

Original publication date: 2019-05-24
