fcd source code analysis, part 2

Picking up where the previous post left off (or something like that).

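While playing DDCTF I ran into a challenge that looked like it could be simplified with fcd. I exported the IR with fcd, but the output contained a huge amount of extra stuff, so this post takes a closer look at how fcd gets from assembly to IR.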

The previous post found that the assembly -> IR conversion happens in createFunction, so let's continue with that function.

TranslationContext::createFunction

It starts by creating a function:

Function* fn = functionMap->createFunction(baseAddress);
assert(fn != nullptr);

Then it fetches some target information. From what I could see, this is the architecture, plus rsp when the target is x86-64:

auto targetInfo = TargetInfo::getTargetInfo(*module);

Next it builds the blockMap and gets the function's entry block:

AddressToBlock blockMap(*fn);
BasicBlock* entry = &fn->back();

Then it takes the function's first argument (registers) and allocates a flags slot in the entry block:

Argument* registers = &*fn->arg_begin();
auto flags = new AllocaInst(irgen->getFlagsTy(), "flags", entry);

I'll skip some initialization here and go straight to the part that disassembles and translates to IR.

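This picks the address to disassemble. Stepping through it in a debugger shows that the loop keeps pulling one address at a time out of the blocks: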

while (blockMap.getOneStub(addressToDisassemble))
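
To make the loop shape concrete, here is a minimal sketch of a work list driven the same way. It assumes getOneStub hands out one not-yet-translated address per call and returns false when none are left. StubQueue, request and translateAll are made-up names for illustration, not fcd's AddressToBlock.

```cpp
#include <cstdint>
#include <deque>
#include <set>
#include <vector>

// Toy work list of addresses that still need translating (hypothetical,
// not fcd's AddressToBlock).
class StubQueue {
    std::deque<uint64_t> pending;
    std::set<uint64_t> seen;
public:
    // Remember an address unless it was already requested before.
    void request(uint64_t address) {
        if (seen.insert(address).second) pending.push_back(address);
    }
    // Pops one pending address into `out`; returns false when none are left.
    bool getOneStub(uint64_t& out) {
        if (pending.empty()) return false;
        out = pending.front();
        pending.pop_front();
        return true;
    }
};

// Drains the queue starting at `entry`; `successors(addr)` lists the
// follow-up addresses discovered while handling `addr`.
template <typename Successors>
std::vector<uint64_t> translateAll(uint64_t entry, Successors successors) {
    StubQueue queue;
    queue.request(entry);
    std::vector<uint64_t> order;
    uint64_t address;
    while (queue.getOneStub(address)) {
        order.push_back(address);
        for (uint64_t next : successors(address)) queue.request(next);
    }
    return order;
}
```

The seen set is what keeps a loop in the control flow from being queued forever.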

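The logic of executable.map, in short: it returns a pointer to the memory that holds the instruction to disassemble (the file contents have already been read into memory).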

if (auto begin = executable.map(addressToDisassemble))
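
A rough sketch of that kind of address-to-pointer mapping, assuming the file is kept as a list of loaded sections. Section and mapAddress are hypothetical names and not fcd's actual Executable interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one loaded section of an executable.
struct Section {
    uint64_t virtualAddress;      // address the section is mapped at
    std::vector<uint8_t> bytes;   // file contents already read into memory
};

// Returns a pointer to the byte at `address`, or nullptr when no section
// covers that address.
const uint8_t* mapAddress(const std::vector<Section>& sections, uint64_t address) {
    for (const Section& s : sections) {
        if (address >= s.virtualAddress && address - s.virtualAddress < s.bytes.size()) {
            return s.bytes.data() + (address - s.virtualAddress);
        }
    }
    return nullptr;
}
```

Returning nullptr for an unmapped address is what lets the caller write the check as `if (auto begin = ...)`.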

This line does the disassembly with capstone:

if (cs->disassemble(inst.get(), begin, end, addressToDisassemble))

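This computes the address of the next instruction (current address plus instruction size) and stores it through ipPointer: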

auto nextInstAddress = inst->address + inst->size;
auto ipValue = ConstantInt::get(ipType, nextInstAddress);
new StoreInst(ipValue, ipPointer, false, thisBlock);

Then it uses the instruction id to check whether the corresponding opcode has been implemented:

if (Function* implementation = irgen->implementationFor(inst->id))

Stepping into the function shows that it is just a lookup in a vector:

llvm::Function* implementationFor(unsigned index)
{
    return functionByOpcode.at(index);
}
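
As a small illustration of this lookup pattern, here is a sketch of a table indexed by instruction id where a null entry means "not implemented". Handler, HandlerTable and addHandler are invented for the example, and a plain function pointer stands in for llvm::Function*.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Handler = int (*)(int, int);   // hypothetical stand-in for llvm::Function*

int addHandler(int a, int b) { return a + b; }

// Table indexed by instruction id; nullptr means "no implementation".
class HandlerTable {
    std::vector<Handler> byOpcode;
public:
    explicit HandlerTable(std::size_t opcodeCount) : byOpcode(opcodeCount, nullptr) {}
    void set(unsigned opcode, Handler h) { byOpcode.at(opcode) = h; }
    // at() is range-checked: a bogus id throws std::out_of_range
    // instead of reading past the end of the vector.
    Handler implementationFor(unsigned opcode) const { return byOpcode.at(opcode); }
};
```

With a table like this, the caller's `if (Function* implementation = ...)` check amounts to a null test on the table entry.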

Reading this raised a question: where do the entries in functionByOpcode come from? So I traced it back step by step.

The initialization step turns out to be in class x86CodeGenerator:

#define X86_INSTRUCTION_DECL(e, n) funcs[e] = getFunction("x86_" #n);
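
The `#n` in that macro is the preprocessor's stringify operator, so "x86_" #n becomes a name such as "x86_add". Presumably the macro is expanded once per instruction from some list, which is my assumption since the list itself isn't shown here. A minimal sketch of that X-macro pattern, with a made-up TOY_INSTRUCTIONS list:

```cpp
#include <map>
#include <string>

// Hypothetical instruction list, standing in for whatever list fcd expands.
#define TOY_INSTRUCTIONS(X) \
    X(1, add)               \
    X(2, sub)               \
    X(3, mov)

// Builds id -> "x86_<name>" by expanding the list with a stringifying macro.
std::map<int, std::string> buildNameTable() {
    std::map<int, std::string> names;
#define TOY_DECL(e, n) names[e] = "x86_" #n;
    TOY_INSTRUCTIONS(TOY_DECL)
#undef TOY_DECL
    return names;
}
```

Adjacent string literals are concatenated by the compiler, which is why "x86_" followed by the stringified name ends up as a single string.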

I remembered seeing functions with these names in x86.emulator.cpp, and a quick search turned this up:

X86_INSTRUCTION_DEF(add)
{
    const cs_x86_op* source = &inst->operands[1];
    const cs_x86_op* destination = &inst->operands[0];
    uint64_t left = x86_read_destination_operand(destination, regs);
    uint64_t right = x86_read_source_operand(source, regs);

    memset(flags, 0, sizeof *flags);
    uint64_t result = x86_add(flags, source->size, left, right);
    x86_write_destination_operand(destination, regs, result);
}

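This is presumably where each instruction gets translated to IR. Debugging shows that the instruction implementations defined in x86.emulator.cpp all end up inlined into the init function of x86CodeGenerator.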

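After going back and rereading lifting-x86-code, I finally understood what is going on. The author wrote an emulator that implements a set of instructions, such as the add function above. Clang compiles each of those functions to IR, and that IR computes the same result as the assembly instruction it models.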
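
To make the idea concrete, here is a tiny emulator-style function in the same spirit. It is only a sketch: ToyFlags and toyAdd32 are invented for the example and much simpler than fcd's x86_add, which also takes an operand size and tracks more flags.

```cpp
#include <cstdint>

// Hypothetical flag set; fcd's real flags struct is not shown in this post.
struct ToyFlags { bool carry; bool zero; bool sign; };

// 32-bit add that also reports the flags a real add would set.
uint32_t toyAdd32(ToyFlags& flags, uint32_t left, uint32_t right) {
    uint32_t result = left + right;      // wraps modulo 2^32
    flags.carry = result < left;         // true when the unsigned add overflowed
    flags.zero = result == 0;
    flags.sign = (result >> 31) != 0;    // top bit of the result
    return result;
}
```

Compiling a function like this with Clang gives IR that does the addition and the flag updates, and that is the kind of IR that can later be inlined wherever an add instruction shows up.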

Back in translation_context: if the instruction has an implementation, that implementation is inlined into the block.

// We have an implementation: inline it
Constant* detailAsConstant = irgen->constantForDetail(*inst->detail);
inliningParameters[1] = new GlobalVariable(*module, detailAsConstant->getType(), true, GlobalValue::PrivateLinkage, detailAsConstant);
irgen->inlineFunction(fn, implementation, inliningParameters, *functionMap, blockMap, nextInstAddress);

Let's go through the code above one line at a time. First:

Constant* detailAsConstant = irgen->constantForDetail(*inst->detail);

Inside constantForDetail, the details of the instruction (its operands and so on) are packed into constants:

for (size_t i = 0; i < 8; i++)
{
    vector<Constant*> structFields {
        ConstantInt::get(int32Ty, cs.operands[i].mem.segment),
        ConstantInt::get(int32Ty, cs.operands[i].mem.base),
        ConstantInt::get(int32Ty, cs.operands[i].mem.index),
        ConstantInt::get(int32Ty, static_cast<unsigned>(cs.operands[i].mem.scale)),
        ConstantInt::get(int64Ty, static_cast<uint64_t>(cs.operands[i].mem.disp)),
    };
    Constant* opMem = ConstantStruct::get(x86OpMem, structFields);
    Constant* wrapper = ConstantStruct::get(x86OpMemWrapper, opMem, nullptr);

    structFields = {
        ConstantInt::get(int32Ty, cs.operands[i].type),
        wrapper,
        ConstantInt::get(int8Ty, cs.operands[i].size),
        ConstantInt::get(int32Ty, cs.operands[i].avx_bcast),
        ConstantInt::get(int8Ty, cs.operands[i].avx_zero_opmask),
    };
    operands.push_back(ConstantStruct::get(x86Op, structFields));
}

The next line wraps the instruction details obtained above in a GlobalVariable and stores it in inliningParameters[1]:

inliningParameters[1] = new GlobalVariable(*module, detailAsConstant->getType(), true, GlobalValue::PrivateLinkage, detailAsConstant);

After that, this line should be the one that turns the disassembly result into IR:

irgen->inlineFunction(fn, implementation, inliningParameters, *functionMap, blockMap, nextInstAddress);

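It first maps the implementation's arguments to the supplied parameters, then clones the implementation Function into the target, then stitches the block that came before the instruction to the first cloned block, and finally redirects the cloned returns to the block of the next instruction. I'm not very familiar with the LLVM side of this, so some of the details are still hazy to me.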

void CodeGenerator::inlineFunction(Function *target, Function *toInline, ArrayRef<Value *> parameters, AddressToFunction& funcMap, AddressToBlock &blockMap, uint64_t nextAddress)
{
    assert(toInline->arg_size() == parameters.size());
    Module& targetModule = *target->getParent();
    auto iter = toInline->arg_begin();

    ValueToValueMapTy valueMap;
    getModuleLevelValueChanges(valueMap, targetModule);
    for (Value* parameter : parameters)
    {
        valueMap[&*iter] = parameter;
        ++iter;
    }

    SmallVector<ReturnInst*, 1> returns;
    Function::iterator blockBeforeInstruction = target->back().getIterator();
    CloneAndPruneFunctionInto(target, toInline, valueMap, true, returns);

    // Stitch blocks together
    Function::iterator firstNewBlock = blockBeforeInstruction;
    ++firstNewBlock;
    BranchInst::Create(&*firstNewBlock, &*blockBeforeInstruction);

    // Redirect returns
    BasicBlock* nextBlock = blockMap.blockToInstruction(nextAddress);
    for (auto ret : returns)
    {
        BranchInst::Create(nextBlock, ret);
        ret->eraseFromParent();
    }

    resolveIntrinsics(*target, funcMap, blockMap);
}
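
The "stitch blocks together" and "redirect returns" steps are easier to see on a toy control-flow graph. The sketch below models a block as a label plus a jump target, with "ret" standing for a return. ToyBlock and inlineInto are invented for illustration and skip everything LLVM actually has to do when cloning (value remapping, pruning and so on).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: a label plus the label its terminator jumps to.
// The special target "ret" marks a return.
struct ToyBlock {
    std::string label;
    std::string target;
};

// Appends a copy of `callee` to `caller`, links the caller's last block to
// the first copied block, and turns every copied "ret" into a jump to
// `nextLabel` (the block of the next instruction).
void inlineInto(std::vector<ToyBlock>& caller, const std::vector<ToyBlock>& callee,
                const std::string& nextLabel) {
    if (caller.empty() || callee.empty()) return;
    std::size_t before = caller.size() - 1;            // block before the instruction
    caller.insert(caller.end(), callee.begin(), callee.end());
    caller[before].target = caller[before + 1].label;  // stitch blocks together
    for (std::size_t i = before + 1; i < caller.size(); ++i) {
        if (caller[i].target == "ret") caller[i].target = nextLabel;  // redirect returns
    }
}
```

After this, running off the end of the inlined implementation continues at the next instruction's block instead of returning from the lifted function.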

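Following resolveIntrinsics next: after a series of loops and checks it ends up in replaceIntrinsic. It looks to me, though, as if the `auto call` there is already the translated call?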

virtual void resolveIntrinsics(llvm::Function& targetFunction, AddressToFunction& funcMap, AddressToBlock& blockMap) override
{
    Module& module = *targetFunction.getParent();
    for (Function* decl : declarations)
    {
        StringRef name = decl->getName();
        Function* matching = module.getFunction(name);
        if (isIntrinsic(name))
        {
            auto iter = matching->use_begin();
            while (iter != matching->use_end())
            {
                auto call = cast<CallInst>(iter->getUser());
                ++iter;
                replaceIntrinsic(funcMap, blockMap, name, call);
            }
        }
    }
}
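
One detail in this loop: it fetches the user and advances iter before calling replaceIntrinsic. My guess (an assumption, I haven't verified it) is that replaceIntrinsic can erase the call, which would remove the use the iterator was pointing at. The same "step first, then erase" pattern on a plain std::list looks like this, with eraseEvens being a made-up example:

```cpp
#include <list>

// Removes even numbers while walking the list, stepping the iterator past
// the current node before that node is erased.
void eraseEvens(std::list<int>& values) {
    auto iter = values.begin();
    while (iter != values.end()) {
        auto current = iter;
        ++iter;                                   // move on first
        if (*current % 2 == 0) values.erase(current);
    }
}
```

Erasing a std::list node only invalidates iterators to that node, so the already-advanced iterator stays valid.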

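Finally we reach replaceIntrinsic. It only handles five intrinsics: "x86_jump_intrin", "x86_call_intrin", "x86_ret_intrin", "x86_read_mem" and "x86_write_mem". My LLVM knowledge is too thin, though, so I can't really follow what it does here.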

void replaceIntrinsic(AddressToFunction& funcMap, AddressToBlock& blockMap, StringRef name, CallInst* translated)
{
    if (name == "x86_jump_intrin")
    {
        if (auto constantDestination = dyn_cast<ConstantInt>(translated->getOperand(2)))
        {
            BasicBlock* parent = translated->getParent();
            BasicBlock* remainder = parent->splitBasicBlock(translated);
            auto terminator = parent->getTerminator();

            uint64_t dest = constantDestination->getLimitedValue();
            BasicBlock* destination = blockMap.blockToInstruction(dest);
            BranchInst::Create(destination, terminator);
            terminator->eraseFromParent();
            remainder->eraseFromParent();
        }
    }
    else if (name == "x86_call_intrin")