My xinetd daemon suddenly stopped working after a kernel upgrade (from 2.6.24 to 2.6.33). I ran strace on it and found this:
[...]
close(3) = 0
munmap(0x7f1a93b43000, 4096) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=8*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
close(3) = 4294967287
exit_group(1) = ?
So it looks like the close system call is returning a value that is neither 0 nor -1.
I ran a few tests, and it seems to happen only with 64-bit executables:
$ file closetest32
closetest32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ strace closetest32
execve("./closetest32", ["closetest32"], [/* 286 vars */]) = 0
[ Process PID=4731 runs in 32 bit mode. ]
open("/proc/mounts", O_RDONLY) = 3
close(3) = 0
close(3) = -1 EBADF (Bad file descriptor)
_exit(0) = ?
$ file closetest64
closetest64: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), statically linked, not stripped
$ strace closetest64
execve("./closetest64", ["closetest64"], [/* 286 vars */]) = 0
open("/proc/mounts", O_RDONLY) = 3
close(3) = 0
close(3) = 4294967287
_exit(0) = ?
I am running the following kernel:
Linux foobar01 2.6.33.9-rt31.64.el5rt #1 SMP PREEMPT RT Wed May 4 10:34:12 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Worst of all, I cannot reproduce this bug on another machine running the same kernel.
Any ideas?
Edit: as requested, here is the code used for closetest32 and closetest64.
closetest32.asm:
.section .data
filename:
.asciz "/proc/mounts" # NUL-terminated, since open() expects a C string
.section .text
.globl _start
_start:
xorl %edi, %edi
movl $5, %eax # open() i386 system call
leal filename, %ebx # %ebx ---> filename
movl $0, %ecx # O_RDONLY flag into ecx (int $0x80 takes argument 2 in %ecx, not %esi)
int $0x80
xorl %edi, %edi
movl $6, %eax # close() i386 system call
movl $3, %ebx # fd 3
int $0x80
xorl %edi, %edi
movl $6, %eax # close() i386 system call
movl $3, %ebx # fd 3
int $0x80
## terminate program via _exit () system call
movl $1, %eax # %eax = _exit() i386 system call
xorl %ebx, %ebx # %ebx = 0 normal program return code
int $0x80
Assembled and linked with:
as closetest32.asm -o closetest32.o --32
ld -m elf_i386 closetest32.o -o closetest32
closetest64.asm:
.section .data
filename:
.asciz "/proc/mounts" # NUL-terminated, since open() expects a C string
.section .text
.globl _start
_start:
xorq %rdi, %rdi
movq $2, %rax # open() system call
leaq filename, %rdi # %rdi ---> filename
movq $0, %rsi # O_RDONLY flag into rsi
syscall
xorq %rdi, %rdi
movq $3, %rax # close() system call
movq $3, %rdi # fd 3
syscall
xorq %rdi, %rdi
movq $3, %rax # close() system call
movq $3, %rdi # fd 3
syscall
## terminate program via _exit () system call
movq $60, %rax # %rax = _exit() system call
xorq %rdi, %rdi # %rdi = 0 normal program return code
syscall
Assembled and linked with:
as closetest64.asm -o closetest64.o
ld closetest64.o -o closetest64
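For comparison, the same double-close test can be sketched in C. This is only an illustration, not part of the original test programs: the helper name double_close is made up here, and it goes through the libc syscall() wrapper, so it reports what a libc caller such as xinetd would see rather than the raw register value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` read-only, closes the descriptor twice
 * through syscall(2), and stores both return values.
 * Returns 0 on success, -1 if the file could not be opened. */
int double_close(const char *path, long *first, long *second)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    *first = syscall(SYS_close, fd);   /* descriptor is valid: expect 0 */
    errno = 0;
    *second = syscall(SYS_close, fd);  /* already closed: expect -1, errno EBADF */
    return 0;
}
```

Calling double_close("/proc/mounts", &a, &b) on a healthy kernel should leave a at 0 and b at -1 with errno set to EBADF. Any other value in b would point to the behaviour seen in the 64-bit strace above.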
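Answered 2012-05-08 17:41:35
As expected, rolling back to the previous kernel version fixed the problem. I am not much of a kernel expert, but as far as I can tell the answer given by @R makes sense: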
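This is a 64-bit machine, so (1<<32)-9 (that is, 4294967287) should not be showing up. The issue is that the kernel internally uses unsigned instead of int for the return value of some of these functions and then returns -EBADF, which gets reduced modulo 2^32 instead of modulo 2^64.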
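The arithmetic can be checked with a small C sketch. The function names are hypothetical, and the fixed-width types are an assumption used to stand in for the x86-64 sizes (int64_t for the 64-bit long, uint32_t for the 32-bit unsigned), so the result does not depend on the platform it is compiled on.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */

/* Widening a signed 32-bit value sign-extends it: -9 stays -9. */
int64_t widen_signed(int32_t err)
{
    return (int64_t)err;
}

/* Passing through a 32-bit unsigned first reduces the value modulo 2^32,
 * and the following widening zero-extends it: -9 becomes 2^32 - 9. */
int64_t widen_via_unsigned(int32_t err)
{
    return (int64_t)(uint32_t)err;
}
```

With err set to -9 (that is, -EBADF on Linux), widen_via_unsigned yields 4294967287, the value strace printed for the second close(3).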
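The problem is that the generic code in the libc syscall wrappers that handles syscall error returns must treat the return value as a long (since for some syscalls the return value can be a pointer or a long) when it compares it to decide whether it is a small negative value indicating an error. But the kernel returned (long)(unsigned)-9, which is very different from (long)-9 or (unsigned long)-9 (either of which would have worked).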
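A minimal sketch of that kind of check, assuming the Linux convention that error codes are returned as values in the range -4095 to -1. The names is_syscall_error and syscall_ret are made up for this example and are not taken from any particular libc; int64_t models the 64-bit long of x86-64.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the error test in a libc syscall wrapper.
 * Viewed as unsigned 64-bit, the values -4095..-1 are the only ones
 * strictly greater than (uint64_t)-4096. */
int is_syscall_error(int64_t ret)
{
    return (uint64_t)ret > (uint64_t)-4096;
}

/* Turns a raw kernel return value into the usual -1/errno convention. */
int64_t syscall_ret(int64_t ret)
{
    if (is_syscall_error(ret)) {
        errno = (int)-ret;   /* e.g. -9 becomes EBADF */
        return -1;
    }
    return ret;              /* anything else is passed through unchanged */
}
```

Under these assumptions, syscall_ret(-9) returns -1 and sets errno to EBADF, while syscall_ret(4294967287) is not recognised as an error and is handed back to the caller as is, which matches what the strace of the 64-bit binary shows.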
https://stackoverflow.com/questions/10335448