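Picking up the story where the last chapter left off (just kidding).
While playing DDCTF I ran into a challenge that looked like it could be simplified with fcd's optimizations. I exported the IR with fcd, but the output was full of extra stuff, so this time I want to look carefully at how fcd turns assembly into IR.
The previous post found that the assembly-to-IR conversion happens in createFunction, so let's keep analyzing that function.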
TranslationContext::createFunction
First, it creates a function:
Function* fn = functionMap->createFunction(baseAddress);
assert(fn != nullptr);
Then it gathers some target information. Looking into it, this is the architecture and, when the target is x86-64, the rsp register:
auto targetInfo = TargetInfo::getTargetInfo(*module);
Next it builds blockMap and gets the function's entry point:
AddressToBlock blockMap(*fn);
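Then it takes the function's first argument (the register state) and allocates a local slot for the flags in the entry block:
Argument* registers = &*fn->arg_begin();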
auto flags = new AllocaInst(irgen->getFlagsTy(), "flags", entry);
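I'll skip some of the initialization here and go straight to the part that disassembles and translates to IR.
This picks the next address to disassemble. Stepping through it in a debugger, it turns out to be a loop that repeatedly takes an address out of a block: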
while (blockMap.getOneStub(addressToDisassemble))
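My reading of that loop is a classic worklist: addresses of blocks that are referenced but not yet lifted wait in a queue, and the loop runs until the queue is empty. A minimal standalone sketch under that assumption (StubWorklist and addTarget are hypothetical names, not fcd's AddressToBlock API):

```cpp
#include <cstdint>
#include <deque>
#include <set>

// Hypothetical model of the stub worklist behind blockMap.getOneStub().
// Assumption: a "stub" is an address some branch refers to whose
// instructions have not been lifted yet. A sketch, not fcd's code.
class StubWorklist {
public:
    // Record a branch target; each address is queued at most once.
    void addTarget(uint64_t address) {
        if (seen.insert(address).second)
            pending.push_back(address);
    }

    // Pop the next unlifted address; returns false once nothing is left,
    // which is what ends a `while (getOneStub(addr))` loop.
    bool getOneStub(uint64_t& address) {
        if (pending.empty())
            return false;
        address = pending.front();
        pending.pop_front();
        return true;
    }

private:
    std::set<uint64_t> seen;       // every address ever queued
    std::deque<uint64_t> pending;  // addresses still waiting to be lifted
};
```

In this model, lifting a jump would call addTarget with its destination, so newly discovered blocks get translated in later iterations and duplicates are skipped.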
In short, executable.map returns a pointer to the memory that holds the instruction to disassemble (the file contents have already been read into memory):
if (auto begin = executable.map(addressToDisassemble))
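A rough sketch of what such a mapping has to do. It assumes the image is described by a list of segments. Segment, ExecutableImage and their fields are made up for illustration, and fcd's executable classes are organized differently:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for executable.map(): translate a virtual address
// into a pointer inside the file image already read into memory.
struct Segment {
    uint64_t vaddr;       // where the segment is mapped at run time
    uint64_t fileOffset;  // where its bytes start in the file image
    uint64_t size;        // number of bytes backed by the file
};

struct ExecutableImage {
    std::vector<uint8_t> bytes;     // whole file contents
    std::vector<Segment> segments;

    // Returns nullptr when no segment backs the address, so the result can
    // be tested directly, as in `if (auto begin = map(addr))`.
    const uint8_t* map(uint64_t address) const {
        for (const Segment& s : segments) {
            if (address >= s.vaddr && address - s.vaddr < s.size) {
                uint64_t offset = s.fileOffset + (address - s.vaddr);
                if (offset < bytes.size())
                    return bytes.data() + offset;
            }
        }
        return nullptr;
    }
};
```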
This line disassembles the instruction with Capstone:
if (cs->disassemble(inst.get(), begin, end, addressToDisassemble))
This computes the address of the next instruction and stores it into the instruction pointer:
auto nextInstAddress = inst->address + inst->size;
auto ipValue = ConstantInt::get(ipType, nextInstAddress);
new StoreInst(ipValue, ipPointer, false, thisBlock);
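The next address is probably stored before the instruction body runs because, on x86-64, RIP-relative operands and relative jumps and calls are all computed from the address of the following instruction. For example, `lea rax, [rip + 0x10]` at 0x1000 encodes in 7 bytes, so it refers to 0x1007 + 0x10 = 0x1017. A small arithmetic sketch (the function names are hypothetical):

```cpp
#include <cstdint>

// What the lifter computes as nextInstAddress.
uint64_t nextInstructionAddress(uint64_t address, uint8_t size) {
    return address + size;
}

// Effective address of a RIP-relative operand such as [rip + disp]:
// the displacement is added to the address of the *next* instruction.
// Unsigned wraparound handles negative displacements correctly.
uint64_t ripRelativeTarget(uint64_t address, uint8_t size, int64_t disp) {
    return nextInstructionAddress(address, size) + static_cast<uint64_t>(disp);
}
```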
Then it uses the instruction id to check whether that opcode has been implemented:
if (Function* implementation = irgen->implementationFor(inst->id))
Stepping into the function shows that it is just a lookup in a vector:
llvm::Function* implementationFor(unsigned index)
{
return functionByOpcode.at(index);
}
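The result is used directly as the if condition, so unimplemented opcodes must map to nullptr. Here is a toy model of that table. FakeFunction is a name-only stand-in for llvm::Function, and the ids in the usage are arbitrary:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of functionByOpcode: a vector indexed by instruction id.
struct FakeFunction { std::string name; };

class OpcodeTable {
public:
    // Every slot starts as nullptr, meaning "no implementation".
    explicit OpcodeTable(std::size_t opcodeCount)
        : functionByOpcode(opcodeCount, nullptr) {}

    void set(unsigned id, const FakeFunction* fn) { functionByOpcode.at(id) = fn; }

    // Same shape as implementationFor(): .at() throws std::out_of_range for
    // an id past the end instead of reading out of bounds.
    const FakeFunction* implementationFor(unsigned index) const {
        return functionByOpcode.at(index);
    }

private:
    std::vector<const FakeFunction*> functionByOpcode;
};
```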
While reading, I started wondering where the entries in functionByOpcode come from. So I traced it back step by step
and found where it gets initialized, in class x86CodeGenerator:
#define X86_INSTRUCTION_DECL(e, n) funcs[e] = getFunction("x86_" #n);
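X86_INSTRUCTION_DECL looks like the classic X-macro pattern: a list of (id, name) pairs is expanded once per place that needs it, and `"x86_" #n` stringizes the mnemonic so the compiler joins it with the prefix. This is a self-contained sketch with a made-up three-instruction list. FAKE_INSTRUCTION_LIST, FakeInsn and the string-returning getFunction are all hypothetical, and fcd's real instruction list lives in its own sources:

```cpp
#include <map>
#include <string>

// Made-up ids; real code would use Capstone's instruction ids.
enum FakeInsn { INS_ADD = 1, INS_SUB = 2, INS_JMP = 3 };

#define FAKE_INSTRUCTION_LIST(X) \
    X(INS_ADD, add)              \
    X(INS_SUB, sub)              \
    X(INS_JMP, jmp)

// Stand-in for getFunction(): here it just echoes the symbol name.
std::string getFunction(const std::string& name) { return name; }

std::map<int, std::string> buildTable() {
    std::map<int, std::string> funcs;
    // Same expansion shape as fcd's macro: X(INS_ADD, add) becomes
    // funcs[INS_ADD] = getFunction("x86_" "add"), i.e. "x86_add".
#define X86_INSTRUCTION_DECL(e, n) funcs[e] = getFunction("x86_" #n);
    FAKE_INSTRUCTION_LIST(X86_INSTRUCTION_DECL)
#undef X86_INSTRUCTION_DECL
    return funcs;
}
```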
I remembered seeing a function like this in x86.emulator.cpp,
and sure enough, there it was:
X86_INSTRUCTION_DEF(add)
{
// Capstone lists operands destination first: operands[0] is the destination.
const cs_x86_op* source = &inst->operands[1];
const cs_x86_op* destination = &inst->operands[0];
uint64_t left = x86_read_destination_operand(destination, regs);
uint64_t right = x86_read_source_operand(source, regs);
// Clear every flag, then let x86_add fill in the flags for this addition.
memset(flags, 0, sizeof *flags);
uint64_t result = x86_add(flags, source->size, left, right);
x86_write_destination_operand(destination, regs, result);
}
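To make "the IR computes the same thing as the instruction" concrete, here is a plain C++ sketch of the work an x86_add-style helper has to do: the sum, truncated to the operand size, plus the status flags that ADD defines. Flags and emulateAdd are my own hypothetical names. fcd's real x86_add and its flags structure may be laid out differently.

```cpp
#include <cstdint>

// Hypothetical flags struct, one bool per status flag ADD writes.
struct Flags {
    bool cf = false, pf = false, af = false, zf = false, sf = false, of = false;
};

// x86 ADD semantics for an operand of `size` bytes (1, 2, 4 or 8).
uint64_t emulateAdd(Flags& flags, unsigned size, uint64_t left, uint64_t right) {
    const unsigned bits = size * 8;
    const uint64_t mask = bits == 64 ? ~0ULL : ((1ULL << bits) - 1);
    const uint64_t signBit = 1ULL << (bits - 1);

    left &= mask;
    right &= mask;
    const uint64_t result = (left + right) & mask;

    flags = Flags{};
    flags.cf = result < left;                          // unsigned carry out
    flags.zf = result == 0;
    flags.sf = (result & signBit) != 0;
    // Signed overflow: both inputs share a sign that the result lacks.
    flags.of = ((~(left ^ right) & (left ^ result)) & signBit) != 0;
    flags.af = ((left ^ right ^ result) & 0x10) != 0;  // carry out of bit 3
    uint8_t low = static_cast<uint8_t>(result);
    low ^= low >> 4; low ^= low >> 2; low ^= low >> 1;
    flags.pf = (low & 1) == 0;                         // even parity of low byte
    return result;
}
```

For example, an 8-bit 0x7f + 0x01 gives 0x80 with SF and OF set and CF clear.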
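So this is roughly how each instruction is translated into IR. Stepping through it, all of the instructions defined in x86.emulator.cpp actually end up inlined into x86CodeGenerator's init function.
After going back and rereading lifting-x86-code, I finally understood what is going on. The author wrote an emulator that implements a set of instructions, such as the add function above, and Clang compiles those functions to IR. Executing that IR gives the same result as executing the corresponding add instruction, so the IR can stand in for the assembly.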
Back in translation_context: if the instruction has an implementation, it gets inlined into the block:
// We have an implementation: inline it
Constant* detailAsConstant = irgen->constantForDetail(*inst->detail);
inliningParameters[1] = new GlobalVariable(*module, detailAsConstant->getType(), true, GlobalValue::PrivateLinkage, detailAsConstant);
irgen->inlineFunction(fn, implementation, inliningParameters, *functionMap, blockMap, nextInstAddress);
Let's go through the code above line by line, starting with:
Constant* detailAsConstant = irgen->constantForDetail(*inst->detail);
Inside constantForDetail you can get at all of the instruction's details. The part below packs each of the 8 operand slots in Capstone's x86 detail into a constant struct:
for (size_t i = 0; i < 8; i++)
{
vector<Constant*> structFields {
ConstantInt::get(int32Ty, cs.operands[i].mem.segment),
ConstantInt::get(int32Ty, cs.operands[i].mem.base),
ConstantInt::get(int32Ty, cs.operands[i].mem.index),
ConstantInt::get(int32Ty, static_cast<unsigned>(cs.operands[i].mem.scale)),
ConstantInt::get(int64Ty, static_cast<uint64_t>(cs.operands[i].mem.disp)),
};
Constant* opMem = ConstantStruct::get(x86OpMem, structFields);
Constant* wrapper = ConstantStruct::get(x86OpMemWrapper, opMem, nullptr);
structFields = {
ConstantInt::get(int32Ty, cs.operands[i].type),
wrapper,
ConstantInt::get(int8Ty, cs.operands[i].size),
ConstantInt::get(int32Ty, cs.operands[i].avx_bcast),
ConstantInt::get(int8Ty, cs.operands[i].avx_zero_opmask),
};
operands.push_back(ConstantStruct::get(x86Op, structFields));
}
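About the x86OpMemWrapper around opMem: my guess is that it mirrors how Clang lowers the anonymous union in Capstone's cs_x86_op (reg / imm / mem) when it compiles the emulator to IR. Clang represents the union as a struct holding a single member, here the memory operand, which is the largest one. The sketch below uses simplified types, not Capstone's exact definitions. It shows why one member is enough: reading the union's bytes back through the mem view still recovers a register number or an immediate.

```cpp
#include <cstdint>
#include <cstring>

// Simplified operand shapes; field order follows the constantForDetail loop.
struct OpMem {
    uint32_t segment, base, index;
    int32_t scale;
    int64_t disp;
};

union OpValue {
    uint32_t reg;  // register operand
    int64_t imm;   // immediate operand
    OpMem mem;     // memory operand, the largest member
};

// Emit the union "as its largest member": copy its raw bytes into an OpMem.
OpMem asMemView(const OpValue& value) {
    OpMem view;
    std::memcpy(&view, &value, sizeof view);
    return view;
}

// Read those bytes back as an immediate, the way the emulator side would.
int64_t immFromMemView(const OpMem& view) {
    int64_t imm;
    std::memcpy(&imm, &view, sizeof imm);
    return imm;
}
```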
The next line wraps the instruction details obtained above in a read-only, private GlobalVariable and stores it in inliningParameters[1]:
inliningParameters[1] = new GlobalVariable(*module, detailAsConstant->getType(), true, GlobalValue::PrivateLinkage, detailAsConstant);
After that, this line appears to be the one that actually turns the disassembled instruction into IR:
irgen->inlineFunction(fn, implementation, inliningParameters, *functionMap, blockMap, nextInstAddress);
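It starts by copying the parameters and such, then clones the Function, and then joins the current block to the previous one. I'm not very familiar with the LLVM details after that, so I couldn't quite follow them.
void CodeGenerator::inlineFunction(Function *target, Function *toInline, ArrayRef<Value *> parameters, AddressToFunction& funcMap, AddressToBlock &blockMap, uint64_t nextAddress)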
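The core idea of inlining is to copy the callee's body into the caller and replace every reference to a parameter with the argument actually passed. Here is a toy model of just that step. It is NOT LLVM's inliner and not fcd's inlineFunction. Block, substitute and inlineInto are hypothetical, and instructions are plain strings that name parameters as %0, %1, ...:

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Block = std::vector<std::string>;

// Replace each %i in one instruction with args[i].
// Toy limitations: fewer than ten parameters (%1 would match inside %10),
// and argument strings must not themselves contain '%'.
std::string substitute(std::string text, const std::vector<std::string>& args) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string param = "%" + std::to_string(i);
        for (std::size_t pos = text.find(param); pos != std::string::npos;
             pos = text.find(param, pos + args[i].size())) {
            text.replace(pos, param.size(), args[i]);
        }
    }
    return text;
}

// Append the callee's body to the target block with arguments substituted.
void inlineInto(Block& target, const Block& callee, const std::vector<std::string>& args) {
    for (const std::string& inst : callee)
        target.push_back(substitute(inst, args));
}
```

For instance, inlining {"l = load reg(%0)", "call x86_add(%2, l, %1)"} with arguments {"regs", "detail", "flags"} appends "l = load reg(regs)" and "call x86_add(flags, l, detail)".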
Following resolveIntrinsics: after a pile of loops and checks it finally reaches replaceIntrinsic. But it feels like by the "auto call" line the translation is already done?
virtual void resolveIntrinsics(llvm::Function& targetFunction, AddressToFunction& funcMap, AddressToBlock& blockMap) override
{
Module& module = *targetFunction.getParent();
for (Function* decl : declarations)
{
StringRef name = decl->getName();
Function* matching = module.getFunction(name);
if (isIntrinsic(name))
{
auto iter = matching->use_begin();
while (iter != matching->use_end())
{
auto call = cast<CallInst>(iter->getUser());
++iter;
replaceIntrinsic(funcMap, blockMap, name, call);
}
}
}
}
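One detail of that loop is worth noting: iter is advanced before replaceIntrinsic runs. That matters because replacing a call can delete it. In the x86_jump_intrin branch, the block that starts with the call is erased, which would invalidate an iterator still pointing at that use. Here is the same pattern with a std::list, where "replacing" simply erases the element (Call and replaceAll are illustrative names):

```cpp
#include <list>
#include <string>

struct Call { std::string callee; int arg; };

// Advance the iterator *before* handing the current element to code that
// may erase it, so the loop never touches an invalidated iterator.
int replaceAll(std::list<Call>& uses, const std::string& intrinsic) {
    int replaced = 0;
    auto iter = uses.begin();
    while (iter != uses.end()) {
        auto current = iter;
        ++iter;                   // step first, like `++iter;` in the loop above
        if (current->callee == intrinsic) {
            uses.erase(current);  // safe: iter already points past it
            ++replaced;
        }
    }
    return replaced;
}
```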
Finally we arrive at replaceIntrinsic, which only handles five names: ["x86_jump_intrin", "x86_call_intrin", "x86_ret_intrin", "x86_read_mem", "x86_write_mem"]. My LLVM knowledge is too thin to really tell what it does here, though.
void replaceIntrinsic(AddressToFunction& funcMap, AddressToBlock& blockMap, StringRef name, CallInst* translated)
{
if (name == "x86_jump_intrin")
{
if (auto constantDestination = dyn_cast<ConstantInt>(translated->getOperand(2)))
{
BasicBlock* parent = translated->getParent();
BasicBlock* remainder = parent->splitBasicBlock(translated);
auto terminator = parent->getTerminator();
uint64_t dest = constantDestination->getLimitedValue();
BasicBlock* destination = blockMap.blockToInstruction(dest);
BranchInst::Create(destination, terminator);
terminator->eraseFromParent();
remainder->eraseFromParent();
}
}
else if (name == "x86_call_intrin")