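While auditing a firmware image I ran into a heap-management exploitation problem with uClibc on mips64. There is not much analysis of this topic online, so I looked into it myself. This write-up is not comprehensive; treat it as an index that I will extend as I learn more.
The Baidu Baike boilerplate
uClibc is a small C standard library for embedded Linux systems. It was originally developed to support uClinux, a Linux variant that does not require a memory management unit (MMU) and is therefore suited to microcontroller systems.
uClibc is much smaller than the GNU C Library (glibc), the C library normally used by Linux distributions. glibc aims to support all C standards across the widest range of hardware and kernel platforms, whereas uClibc focuses on embedded Linux, and many features can be enabled or dropped depending on space requirements.
uClibc runs on both standard and MMU-less Linux systems and supports i386, x86-64, ARM (big/little endian), AVR32, Blackfin, h8300, m68k, MIPS (big/little endian), PowerPC, SuperH (big/little endian), SPARC, and v850 processors.
In plain words
Embedded hardware on certain architectures needs a low-overhead C standard library implementation, and that is why uClibc exists. Because its implementation differs considerably from glibc's, exploitation requires a somewhat different way of thinking. Fortunately uClibc lacks the many checks of the big, clumsy glibc, so the exploitation ideas are comparatively simple and direct.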
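The first thing to look at when analyzing uClibc exploitation is the implementation of memory-management functions such as malloc and free. The source shows that uClibc actually ships three malloc implementations: malloc, malloc-simple, and malloc-standard. malloc-standard is the most recently updated one; it is a port of early glibc's dlmalloc to uClibc. The exploitation analysis in this article focuses on malloc.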
In the malloc-simple version, allocating and releasing memory correspond almost one-to-one to mmap and munmap.
...
[libc/stdlib/malloc-simple/alloc.c]
#ifdef L_malloc
void *malloc(size_t size)
{
    void *result;
    if (unlikely(size == 0)) {
#if defined(__MALLOC_GLIBC_COMPAT__)
        size++;
#else
        /* Some programs will call malloc (0). Lets be strict and return NULL */
        __set_errno(ENOMEM);
        return NULL;
#endif
    }
#ifdef __ARCH_USE_MMU__
# define MMAP_FLAGS MAP_PRIVATE | MAP_ANONYMOUS
#else
# define MMAP_FLAGS MAP_SHARED | MAP_ANONYMOUS | MAP_UNINITIALIZED
#endif
    result = mmap((void *) 0, size + sizeof(size_t), PROT_READ | PROT_WRITE,
                  MMAP_FLAGS, 0, 0);
    if (result == MAP_FAILED) {
        __set_errno(ENOMEM);
        return 0;
    }
    * (size_t *) result = size;
    return(result + sizeof(size_t));
}
#endif
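As you can see, size goes into mmap's arguments with almost no checking or processing, and the returned address is whatever mmap chooses; there is no dedicated heap segment.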
[libc/stdlib/malloc-simple/alloc.c]
#ifdef L_free
void free(void *ptr)
{
    if (unlikely(ptr == NULL))
        return;
    if (unlikely(__libc_free_aligned != NULL)) {
        if (__libc_free_aligned(ptr))
            return;
    }
    ptr -= sizeof(size_t);
    munmap(ptr, * (size_t *) ptr + sizeof(size_t));
}
#endif
free simply steps back over the size header and calls munmap.
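The size-prefixed layout used by malloc-simple can be reproduced in a few lines. The following is a minimal sketch, not the library's code: `demo_alloc`, `demo_stored_size`, and `demo_free` are hypothetical names, and the mmap flags assume an ordinary Linux system with an MMU.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for malloc-simple's malloc: one mmap per request,
   with the requested size stored in a size_t header before the user data. */
void *demo_alloc(size_t size)
{
    if (size == 0)
        return NULL;
    void *base = mmap(NULL, size + sizeof(size_t), PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    *(size_t *)base = size;
    return (char *)base + sizeof(size_t);
}

/* Read back the size header that sits just below a user pointer. */
size_t demo_stored_size(const void *ptr)
{
    return *(const size_t *)((const char *)ptr - sizeof(size_t));
}

/* Hypothetical stand-in for free: the munmap length comes from the header. */
int demo_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    char *base = (char *)ptr - sizeof(size_t);
    return munmap(base, *(size_t *)base + sizeof(size_t));
}
```

The point of the sketch is that the only metadata is the size_t directly below the user pointer, and that the release path trusts it completely when computing the munmap length.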
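malloc-standard is the mechanism used by the firmware I analyzed.
location: libc/stdlib/malloc-standard/*
malloc-standard is comparatively complex; for the detailed logic, refer directly to dlmalloc.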
I like to call this version (malloc) the "invincible nesting doll".
A call to malloc goes through the following call chain:
void *malloc (size_t size)
[libc/stdlib/malloc/malloc.c]
mem = malloc_from_heap (size, &__malloc_heap, &__malloc_heap_lock);
↓
__malloc_from_heap (size_t size, struct heap_free_area **heap)
[libc/stdlib/malloc/malloc.c]
↓
It first tries __heap_alloc to obtain already-freed memory managed by the heap:
/* First try to get memory that's already in our heap. */
mem = __heap_alloc (heap, &size);
↓
__heap_alloc (struct heap_free_area **heap, size_t *size)
[libc/stdlib/malloc/heap_alloc.c]
/* Allocate and return a block at least *SIZE bytes long from HEAP.
   *SIZE is adjusted to reflect the actual amount allocated (which may be
   greater than requested). */
void *
__heap_alloc (struct heap_free_area **heap, size_t *size)
{
  struct heap_free_area *fa;
  size_t _size = *size;
  void *mem = 0;
  _size = HEAP_ADJUST_SIZE (_size);
  if (_size < sizeof (struct heap_free_area))
    /* Because we sometimes must use a freed block to hold a free-area node,
       we must make sure that every allocated block can hold one. */
    _size = HEAP_ADJUST_SIZE (sizeof (struct heap_free_area));
  HEAP_DEBUG (*heap, "before __heap_alloc");
  /* Look for a free area that can contain _SIZE bytes. */
  for (fa = *heap; fa; fa = fa->next)
    if (fa->size >= _size)
      {
        /* Found one! */
        mem = HEAP_FREE_AREA_START (fa);
        *size = __heap_free_area_alloc (heap, fa, _size);
        break;
      }
  HEAP_DEBUG (*heap, "after __heap_alloc");
  return mem;
}
If the requested size is smaller than the following struct, it is enlarged automatically (the reason is given in the comment above):
/* A free-list area `header'. These are actually stored at the _ends_ of
   free areas (to make allocating from the beginning of the area simpler),
   so one might call it a `footer'. */
struct heap_free_area
{
  size_t size;
  struct heap_free_area *next, *prev;
};
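Note that this struct sits at the end (the highest addresses) of a freed block. This matters a lot.
The code then walks a linked list (the &__malloc_heap passed in at the start), finds the first node whose size is greater than or equal to the requested size, and hands it to the inline function __heap_free_area_alloc
[libc/stdlib/malloc/heap.h]: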
static __inline__ size_t
__heap_free_area_alloc (struct heap_free_area **heap,
                        struct heap_free_area *fa, size_t size)
{
  size_t fa_size = fa->size;
  if (fa_size < size + HEAP_MIN_FREE_AREA_SIZE)
    /* There's not enough room left over in FA after allocating the block, so
       just use the whole thing, removing it from the list of free areas. */
    {
      __heap_delete (heap, fa);
      /* Remember that we've alloced the whole area. */
      size = fa_size;
    }
  else
    /* Reduce size of FA to account for this allocation. */
    fa->size = fa_size - size;
  return size;
}
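This function checks whether, after carving out the requested size, the remainder is still at least HEAP_MIN_FREE_AREA_SIZE. If not, the whole area is removed from the list (a doubly linked list unlink); otherwise only the requested amount is taken (a split).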
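You may wonder why the split involves no linked-list operation.
Look back at struct heap_free_area:
it sits at the end of the freed area, so a split only has to decrease its size field and hand out the memory at the start of the area. That saves a lot of list manipulation and improves efficiency.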
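To make the footer trick concrete, here is a small model of a free area whose descriptor lives at its end. It is a sketch under assumptions: the `demo_*` names are hypothetical, the size field is taken to cover the whole area including the footer, and none of the alignment or minimum-size handling of the real heap code is modelled.

```c
#include <stddef.h>

/* Simplified model of the descriptor kept at the END of a free area. */
struct demo_free_area {
    size_t size;                       /* whole area, footer included */
    struct demo_free_area *next, *prev;
};

/* The area ends right after the footer and starts `size` bytes before that. */
void *demo_area_end(struct demo_free_area *fa)
{
    return (void *)(fa + 1);
}

void *demo_area_start(struct demo_free_area *fa)
{
    return (char *)(fa + 1) - fa->size;
}

/* Build a free area covering buf[0..len) by writing the footer at its end. */
struct demo_free_area *demo_make_area(void *buf, size_t len)
{
    struct demo_free_area *fa =
        (struct demo_free_area *)((char *)buf + len) - 1;
    fa->size = len;
    fa->next = fa->prev = NULL;
    return fa;
}

/* Split: hand out the first `size` bytes; only the size field changes,
   the footer stays where it is and the list links are untouched. */
void *demo_split(struct demo_free_area *fa, size_t size)
{
    void *mem = demo_area_start(fa);
    fa->size -= size;
    return mem;
}
```

Because the start of the area is derived from the footer address and the size field, shrinking size is enough to move the start upward past the bytes that were just handed out.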
...
Back in __malloc_from_heap: if no freed area is large enough, new memory is requested from the operating system via mmap or sbrk. Which of the two is used depends on a compile-time macro:
#ifdef MALLOC_USE_SBRK
      /* Built with MALLOC_USE_SBRK: take the sbrk path */
      __malloc_lock_sbrk ();
      /* Use sbrk we can, as it's faster than mmap, and guarantees
         contiguous allocation. */
      block = sbrk (block_size);
      if (likely (block != (void *)-1))
        {
          /* Because sbrk can return results of arbitrary
             alignment, align the result to a MALLOC_ALIGNMENT boundary. */
          long aligned_block = MALLOC_ROUND_UP ((long)block, MALLOC_ALIGNMENT);
          if (block != (void *)aligned_block)
            /* Have to adjust. We should only have to actually do this
               the first time (after which we will have aligned the brk
               correctly). */
            {
              /* Move the brk to reflect the alignment; our next allocation
                 should start on exactly the right alignment. */
              sbrk (aligned_block - (long)block);
              block = (void *)aligned_block;
            }
        }
      __malloc_unlock_sbrk ();
#else /* !MALLOC_USE_SBRK */
      /* Otherwise, use mmap. */
#ifdef __ARCH_USE_MMU__
      block = mmap ((void *)0, block_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
#else
      block = mmap ((void *)0, block_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS | MAP_UNINITIALIZED, 0, 0);
#endif
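The alignment fix-up in the sbrk branch is plain power-of-two rounding. As an illustration only, the sketch below shows the arithmetic with hypothetical helper names (`demo_round_up`, `demo_brk_adjust`); it assumes the alignment is a power of two and does not call sbrk itself.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t demo_round_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Number of bytes the break would have to be moved by so that the block
   returned by sbrk starts on an aligned boundary. Zero if already aligned. */
uintptr_t demo_brk_adjust(uintptr_t block, uintptr_t align)
{
    return demo_round_up(block, align) - block;
}
```

For a break at 0x1003 and a 16-byte alignment the adjustment is 13 bytes, after which later sbrk results stay aligned as long as every request is a multiple of the alignment.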
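Note that before mem is returned to the user it goes through the following macro, which sets up the malloc header and makes mem point at the user area: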
/* Set up the malloc header, and return the user address of a malloc block. */
#define MALLOC_SETUP(base, size) \
(MALLOC_SET_SIZE (base, size), (void *)((char *)base + MALLOC_HEADER_SIZE))
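The relationship between the block base, the header, and the user pointer can be written out as ordinary functions. This is a sketch with hypothetical names; `DEMO_HEADER_SIZE` is an assumption made for the example, since the real header width comes from the library's MALLOC_HEADER_SIZE and depends on the build.

```c
#include <stddef.h>

/* Assumed header width for this sketch only. */
#define DEMO_HEADER_SIZE sizeof(size_t)

/* Store the block size at the base and return the user pointer above it. */
void *demo_setup(void *base, size_t size)
{
    *(size_t *)base = size;
    return (char *)base + DEMO_HEADER_SIZE;
}

/* Inverse direction, as the free path needs it: user pointer back to base. */
void *demo_base(void *user)
{
    return (char *)user - DEMO_HEADER_SIZE;
}

/* Size recorded in the header that belongs to a user pointer. */
size_t demo_size(void *user)
{
    return *(size_t *)demo_base(user);
}
```

Whatever sits directly below the user pointer is what the free path later reads as the block size, which is why an overflow from a lower neighbour into this header is interesting.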
With malloc's logic understood, free's logic is mostly clear as well.
void free (void *mem)
[libc/stdlib/malloc/free.c]
↓
static void __free_to_heap (void *mem, struct heap_free_area **heap)
[libc/stdlib/malloc/free.c]
It first calls __heap_free to put the freed memory back on the list:
/* Put MEM back in the heap, and get the free-area it was placed in. */
fa = __heap_free (heap, mem, size);
↓
struct heap_free_area *__heap_free (struct heap_free_area **heap, void *mem, size_t size)
[libc/stdlib/malloc/heap_free.c]
/* Return the block of memory at MEM, of size SIZE, to HEAP. */
struct heap_free_area *
__heap_free (struct heap_free_area **heap, void *mem, size_t size)
{
  struct heap_free_area *fa, *prev_fa;
  /* At this point mem has already been through the MALLOC_BASE macro and points at the malloc header. */
  void *end = (char *)mem + size;
  HEAP_DEBUG (*heap, "before __heap_free");
  /* Find the right position in the free-list entry to place the new block.
     This is the most speed critical loop in this malloc implementation:
     since we use a simple linked-list for the free-list, and we keep it in
     address-sorted order, it can become very expensive to insert something
     in the free-list when it becomes fragmented and long. [A better
     implemention would use a balanced tree or something for the free-list,
     though that bloats the code-size and complexity quite a bit.] */
  /* The free list is sorted by ascending address; this loop finds where mem should be inserted. */
  for (prev_fa = 0, fa = *heap; fa; prev_fa = fa, fa = fa->next)
    /* Walk until the end of fa is at or above the start of the freed block. */
    if (unlikely (HEAP_FREE_AREA_END (fa) >= mem))
      break;
  /* Check whether the start of fa is at or below the end of the freed block (does this also cover partial overlap?). */
  if (fa && HEAP_FREE_AREA_START (fa) <= end)
    /* The free-area FA is adjacent to the new block, merge them. */
    {
      size_t fa_size = fa->size + size;
      /* The freed block ends exactly where fa begins. */
      if (HEAP_FREE_AREA_START (fa) == end)
        /* FA is just after the new block, grow down to encompass it. */
        {
          /* See if FA can now be merged with its predecessor. */
          /* If the freed block sits exactly between prev_fa and fa, merge all three into one node. */
          if (prev_fa && mem == HEAP_FREE_AREA_END (prev_fa))
            /* Yup; merge PREV_FA's info into FA. */
            {
              fa_size += prev_fa->size;
              __heap_link_free_area_after (heap, fa, prev_fa->prev);
            }
        }
      else
        /* I suspect this branch has a logic error; I am asking someone more expert to confirm and will refine this once I know more. */
        /* FA is just before the new block, expand to encompass it. */
        {
          struct heap_free_area *next_fa = fa->next;
          /* See if FA can now be merged with its successor. */
          if (next_fa && end == HEAP_FREE_AREA_START (next_fa))
            /* Yup; merge FA's info into NEXT_FA. */
            {
              fa_size += next_fa->size;
              __heap_link_free_area_after (heap, next_fa, prev_fa);
              fa = next_fa;
            }
          else
            /* FA can't be merged; move the descriptor for it to the tail-end
               of the memory block. */
            {
              /* The new descriptor is at the end of the extended block,
                 SIZE bytes later than the old descriptor. */
              fa = (struct heap_free_area *)((char *)fa + size);
              /* Update links with the neighbors in the list. */
              __heap_link_free_area (heap, fa, prev_fa, next_fa);
            }
        }
      /* Set the size of the resulting node. */
      fa->size = fa_size;
    }
  else
    /* Make the new block into a separate free-list entry. */
    /* If there is a gap between fa and mem, or mem > HEAP_FREE_AREA_END (fa), mem can simply be inserted between prev_fa and fa. */
    fa = __heap_add_free_area (heap, mem, size, prev_fa, fa);
  HEAP_DEBUG (*heap, "after __heap_free");
  return fa;
}
See the comments.
This code mainly handles merging and insertion when freed memory is put back on the list.
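The branch structure of __heap_free can be summarized as a classification of where the freed block lies relative to fa, the first free area whose end is at or above mem. The sketch below is my own simplification with hypothetical names; it works on plain addresses and leaves out the follow-up merges with prev_fa and next_fa. The DEMO_OVERLAP value does not exist in uClibc: it only marks inputs that the real code sends down the "FA is just before the new block" branch without checking that the two actually touch.

```c
#include <stdint.h>

enum demo_free_case {
    DEMO_SEPARATE,         /* gap between block and fa: new list entry      */
    DEMO_FA_AFTER_BLOCK,   /* block ends where fa starts: fa grows downward */
    DEMO_FA_BEFORE_BLOCK,  /* block starts where fa ends: fa grows upward   */
    DEMO_OVERLAP           /* not adjacent; the code shown above treats this
                              exactly like DEMO_FA_BEFORE_BLOCK             */
};

/* Precondition (established by the search loop): fa_end >= mem.
   The freed block covers [mem, end), the free area covers [fa_start, fa_end). */
enum demo_free_case demo_classify(uintptr_t fa_start, uintptr_t fa_end,
                                  uintptr_t mem, uintptr_t end)
{
    if (fa_start > end)
        return DEMO_SEPARATE;
    if (fa_start == end)
        return DEMO_FA_AFTER_BLOCK;
    if (fa_end == mem)
        return DEMO_FA_BEFORE_BLOCK;
    return DEMO_OVERLAP;
}
```

Seen this way, the else branch is reached for every fa_start < end, which is the situation the inline comment about a possible logic error is asking about.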
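uClibc has no hook mechanism like glibc's __free_hook and __malloc_hook, but some calls between library functions go through a GOT-like mechanism, as can be seen in the disassembly.
I am not sure why it was designed this way...
Given that, if an arbitrary-address write lets us overwrite the GOT entries of certain functions inside libuClibc.so, we may be able to reach system("/bin/sh\x00") and get a shell.
This must be distinguished from the program's own GOT, though: if the program itself imports a given function symbol, overwriting that symbol's GOT entry inside the .so does not change the target that the program's own calls resolve to. (Important)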
.got:00000000000A8510 # Segment type: Pure data
.got:00000000000A8510 .data # .got
.got:00000000000A8510 off_A8510: .dword ___libc_stack_end
.got:00000000000A8510 # DATA XREF: _setjmp+4↑o
.got:00000000000A8510 # setjmp+4↑o ...
.got:00000000000A8518 .dword 0x8000000000000000
.got:00000000000A8520 off_A8520: .dword qword_AA1B0 # DATA XREF: brk+24↑r
.got:00000000000A8528 off_A8528: .dword sub_5C5C0 # DATA XREF: __sigsetjmp_aux+3C↑r
.got:00000000000A8530 .dword sub_64730
.got:00000000000A8538 .dword sub_647F8
.got:00000000000A8540 memcpy_ptr: .dword memcpy
.got:00000000000A8548 off_A8548: .dword loc_20000 # DATA XREF: vwarn+C↑r
.got:00000000000A8548 # vwarnx+C↑r
.got:00000000000A8550 exit_ptr: .dword exit
.got:00000000000A8558 open_ptr: .dword open # DATA XREF: creat+C↑r
...
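Clearly munmap, which releases memory, is a good target: its first argument is a pointer into memory whose contents we largely control, so it can be made to point at a string. If we can hijack its GOT entry, the rest is easy.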
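Most operations are operations on a doubly linked list with essentially no protection, and the heap_free_area that manages the list sits at the end of each free block. This means that with a UAF or a heap overflow we can modify the size field of a free area (free_size) and then allocate from the modified node, producing an overlap toward lower addresses.
Allocation involves a split. If some value near the target region can serve as free_size (ideally a very large one), we can redirect one of the list's next pointers to it; a request of a suitable size then returns memory in the target region. Note that this technique must not trigger __heap_delete, otherwise things are likely to go wrong.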
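The overlap toward lower addresses follows directly from how the start of a free area is computed. The toy model below illustrates this inside a local array only; `toy_area`, `toy_area_start`, and `toy_alloc` are hypothetical names, the first-fit loop models just the split branch, and nothing here touches a real heap.

```c
#include <stddef.h>

struct toy_area {                 /* same shape as heap_free_area */
    size_t size;
    struct toy_area *next, *prev;
};

/* Start of the area as the allocator computes it: footer end minus size. */
void *toy_area_start(struct toy_area *fa)
{
    return (char *)(fa + 1) - fa->size;
}

/* First fit over the list, splitting from the start of the chosen area.
   Only the split branch is modelled; unlinking is deliberately left out. */
void *toy_alloc(struct toy_area *head, size_t size)
{
    for (struct toy_area *fa = head; fa; fa = fa->next)
        if (fa->size >= size) {
            void *mem = toy_area_start(fa);
            fa->size -= size;
            return mem;
        }
    return NULL;
}
```

If the footer's size is enlarged from the outside, toy_area_start moves down by the same amount, so the next allocation begins inside whatever lies below the genuine free area. The request has to stay small enough that the split branch is taken, which matches the warning about not triggering __heap_delete.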
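For this allocator (the dlmalloc port), the structures that matter here are the fastbins and the unsorted bin, and the checks are very loose, so most ptmalloc knowledge carries over. Moreover, taking a chunk out of a forged fastbin entry does not check the size of the target region... which is about as convenient as tcache.
Setting that part aside, the focus here is how to get a shell (since none of the usual hooks exist)...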
The source is heavy on macros, so let's look at the decompiled code directly:
void free(void *__ptr)
{
    longlong *plVar1;
    uint uVar2;
    ulonglong uVar3;
    ulonglong uVar4;
    longlong lVar5;
    ulonglong chunk_true_size;
    longlong total_size;
    longlong chunk_header_ptr;
    ulonglong chunk_size;
    longlong lVar6;
    undefined auStack64 [32];
    undefined1 *local_10;
    if (__ptr == (void *)0x0) {
        return;
    }
    local_10 = &_gp_1;
    _pthread_cleanup_push_defer(auStack64,pthread_mutex_unlock,&DAT_001a82e0);
    pthread_mutex_lock((pthread_mutex_t *)&DAT_001a82e0);
    chunk_size = *(ulonglong *)((longlong)__ptr + -8);
    chunk_true_size = chunk_size & 0xfffffffffffffffc;
    chunk_header_ptr = (longlong)__ptr + -0x10;
    if (DAT_001c2cd8 < chunk_true_size) {
        uVar4 = DAT_001c2cd8 | 1;
        if ((chunk_size & 2) != 0) {
            DAT_001c3370 = DAT_001c3370 + -1;
            total_size = chunk_true_size + *(longlong *)((longlong)__ptr + -0x10);
            _DAT_001c3388 = _DAT_001c3388 - total_size;
            /* Note this call */
            munmap((void *)(chunk_header_ptr - *(longlong *)((longlong)__ptr + -0x10)),(size_t)total_size);
            goto LAB_0015d85c;
......
When the chunk size exceeds a threshold (it may differ between versions; here it is 0x50) and the is_mmap flag bit is 1, the address chunk_header_ptr - prev_size is passed to munmap.
Suppose we can overwrite munmap's GOT entry with system. How do we then make the argument point at "/bin/sh\x00"?
Here is one idea:
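Place "/bin/sh\x00" at the start of the chunk's user data, set prev_size to 0xfffffffffffffff0 (-0x10), and set size to 0x63 (above the threshold, with the is_mmap and inuse bits both set). chunk_header_ptr - prev_size then equals the user pointer, so entering munmap is equivalent to executing system("/bin/sh\x00").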
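The numbers can be checked with plain 64-bit arithmetic. The sketch below mirrors the expressions in the decompiled free; the function names are hypothetical, the 0x10 header offset and the flag bits are taken from that listing, and the 0x50 threshold is the value observed in this particular firmware.

```c
#include <stdint.h>

#define DEMO_PREV_INUSE UINT64_C(0x1)  /* bit 0 of the size field */
#define DEMO_IS_MMAPPED UINT64_C(0x2)  /* bit 1 of the size field */

/* Size with the two flag bits masked off, as in the decompiled free(). */
uint64_t demo_true_size(uint64_t size_field)
{
    return size_field & ~UINT64_C(3);
}

/* Does free() take the munmap branch for this size field and threshold? */
int demo_takes_munmap_path(uint64_t size_field, uint64_t threshold)
{
    return demo_true_size(size_field) > threshold &&
           (size_field & DEMO_IS_MMAPPED) != 0;
}

/* First argument passed to munmap: (user_ptr - 0x10) - prev_size,
   computed modulo 2^64 like the pointer arithmetic in the listing. */
uint64_t demo_munmap_arg(uint64_t user_ptr, uint64_t prev_size)
{
    return (user_ptr - 0x10) - prev_size;
}
```

With size 0x63 the masked size is 0x60, which is above 0x50, and bit 1 is set, so the branch is taken. Subtracting a prev_size of 0xfffffffffffffff0 adds 0x10 back, so the argument equals the user pointer itself.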
References:
https://blog.csdn.net/heliangbin87/article/details/78962425
https://blog.csdn.net/weixin_30596165/article/details/96114098