fcd源码分析4

前言

这篇文章分析的是fcd自带Pass中的argrec Pass和llvm自带的sroa带来的影响

正文

ArgumentRecovery

ArgumentRecovery这个Pass是一个Module Pass,所以主要功能在runOnModule函数里面

runOnModule

runOnModule函数主要干了这几件事

  • 调用getRegisterPtr获取每个函数的arg指针存到registerPtr这个unordered_map里面
  • 恢复函数的Arguments
  • 除去那些旧的Function
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
bool ArgumentRecovery::runOnModule(Module& module)
{
for (Function& fn : module.getFunctionList())
{
getRegisterPtr(fn);
}

bool changed = false;
for (Function& fn : module.getFunctionList())
{
if (md::areArgumentsRecoverable(fn))
{
changed |= recoverArguments(fn);
}
}

getAnalysis<CallGraphWrapperPass>().releaseMemory();
for (Function* toErase : functionsToErase)
{
toErase->eraseFromParent();
}

return changed;
}

我们主要来看recoverArguments这个函数

recoverArguments

一开始判断这个函数是否是Prototype,假如是的话,就用一系列函数去分析

1
if (Function* prototype = md::getFinalPrototype(fn))

但是我试着去找了下,好像基本没有生成prototype的函数,因此我们跳过这部分,直接去下部分

这一部分就和上一篇indirect的pass差不多,利用CallConvention去分析函数,获取callinfo

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
if (callInfo == nullptr)
{
if (md::isPrototype(fn))
{
// find a call site and consider it canon
for (auto user : fn.users())
{
if (auto call = dyn_cast<CallInst>(user))
{
uniqueCallInfo = paramRegistry.analyzeCallSite(CallSite(call));
callInfo = uniqueCallInfo.get();
break;
}
}
}
else
{
callInfo = paramRegistry.getCallInfo(fn);
}
parameterizedFunction = &createParameterizedFunction(fn, *callInfo);
}

我们可以进一步的去分析下createParameterizedFunction函数

createParameterizedFunction

这里是根据callinfo create了一个新的FunctionType

1
2
3
4
Module& module = *base.getParent();
auto info = TargetInfo::getTargetInfo(*base.getParent());
SmallVector<string, 8> parameterNames;
FunctionType* ft = createFunctionType(*info, callInfo, module, base.getName().str(), parameterNames);

根据新的FunctionType来create一个新的Function,再插回module那里

1
2
Function* newFunc = Function::Create(ft, base.getLinkage());
base.getParent()->getFunctionList().insert(base.getIterator(), newFunc);

下面这一部分是复制旧函数的名字和各种参数,最后把这个函数返回回去,但是其实只是包含参数等东西,主体的部分还没有

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
newFunc->takeName(&base);
newFunc->copyAttributesFrom(&base);
md::copy(base, *newFunc);
for (Argument& arg : newFunc->args())
{
arg.setName(parameterNames[i]);
i++;
}

// set stack pointer
i = 0;
for (const auto& param : callInfo.parameters())
{
if (param.type == ValueInformation::IntegerRegister && param.registerInfo == info->getStackPointer())
{
md::setStackPointerArgument(*newFunc, static_cast<unsigned>(i));
break;
}
++i;
}

return *newFunc;

recoverArguments

回到recoverArguments函数,首先判断callInfo是否存在,假如存在,那么就会调用fixCallSites,去修正所有调用该函数的call,从旧的函数转为新的函数

假如旧的函数有FunctionBody,调用updateFunctionBody去把旧的函数的指令全部复制到新的函数里面,

顺便最后把旧的函数push到functionsToErase Vector里面

1
2
3
4
5
6
7
8
9
10
if (callInfo != nullptr)
{
fixCallSites(fn, *parameterizedFunction, *callInfo);
if (!md::isPrototype(fn))
{
updateFunctionBody(fn, *parameterizedFunction, *callInfo);
functionsToErase.push_back(&fn);
}
return true;
}

我们重点分析一下updateFunctionBody

updateFunctionBody

首先是获取一下目标程序的integer Type和 integerPtr Type

1
2
3
4
5
LLVMContext& ctx = oldFunction.getContext();
auto targetInfo = TargetInfo::getTargetInfo(*oldFunction.getParent());
unsigned pointerSize = targetInfo->getPointerSize() * CHAR_BIT;
Type* integer = Type::getIntNTy(ctx, pointerSize);
Type* integerPtr = Type::getIntNPtrTy(ctx, pointerSize, 1);

接下来把旧的的函数的指令复制到新的函数那里,再把原来的函数删除掉

1
2
3
// move code, delete leftover metadata on oldFunction
newFunction.getBasicBlockList().splice(newFunction.begin(), oldFunction.getBasicBlockList());
oldFunction.deleteBody();

获取旧函数的argument,再获取其类型,在新函数的开始插入一个alloc指令,这个指令alloc了一个registerStruct大小的空间,再把旧Argument的register的使用者全部换成新的那个

1
2
3
4
5
6
7
8
// Create a register structure at the beginning of the function and copy arguments to it.
Argument* oldArg0 = &*oldFunction.arg_begin();
Type* registerStruct = oldArg0->getType()->getPointerElementType();
Instruction* insertionPoint = &*newFunction.begin()->begin();
AllocaInst* newRegisters = new AllocaInst(registerStruct, "registers", insertionPoint);
md::setRegisterStruct(*newRegisters);
oldArg0->replaceAllUsesWith(newRegisters);
registerPtr[&newFunction] = newRegisters;

然后遍历参数,把参数存到新的registerStruct里面

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Copy each argument to the register structure or to the stack.
auto valueIter = ci.begin();
for (Argument& arg : newFunction.args())
{
if (valueIter->type == ValueInformation::IntegerRegister)
{
auto gep = targetInfo->getRegister(newRegisters, *valueIter->registerInfo, *insertionPoint);
auto cast = CastInst::CreatePointerCast(gep, arg.getType()->getPointerTo(), "", insertionPoint);
new StoreInst(&arg, cast, insertionPoint);
}
else if (valueIter->type == ValueInformation::Stack)
{
auto offsetConstant = ConstantInt::get(integer, valueIter->frameBaseOffset);
auto offset = BinaryOperator::Create(BinaryOperator::Add, spValue, offsetConstant, "", insertionPoint);
auto casted = new IntToPtrInst(offset, integerPtr, "", insertionPoint);
new StoreInst(&arg, casted, insertionPoint);
}
else
{
llvm_unreachable("not implemented");
}
valueIter++;
}

最后假如function返回了值,再调整一下

1
2
3
4
5
6
7
8
9
10
11
12
13
// If the function returns, adjust return values.
if (!newFunction.doesNotReturn() && !newFunction.getReturnType()->isVoidTy())
{
for (BasicBlock& bb : newFunction)
{
if (auto ret = dyn_cast<ReturnInst>(bb.getTerminator()))
{
Value* returnValue = createReturnValue(newFunction, ci, ret);
ReturnInst::Create(ctx, returnValue, ret);
ret->eraseFromParent();
}
}
}

runOnModule

回到 runOnModule函数

1
2
3
4
5
getAnalysis<CallGraphWrapperPass>().releaseMemory();
for (Function* toErase : functionsToErase)
{
toErase->eraseFromParent();
}

这里把上面添加到functionsToErase的Function全部去除掉

sroa

我们可以找下llvm的官方文档,里面有介绍sroa

1
The well-known scalar replacement of aggregates transformation. This transform breaks up alloca instructions of aggregate type (structure or array) into individual alloca instructions for each member if possible. Then, if possible, it transforms the individual alloca instructions into nice clean scalar SSA form.

大概意思就是把alloc一个结构体,然后从里面取值进行操作这种行为拆开,变成单独alloc每一部分,这样便于下一步的优化

我们可以看下用了sroa之前和之后

之前:

1
2
3
4
5
6
7
8
9
10
11
12
define void @_init(i64 %rip, i64 %rsp) !fcd.vaddr !2 !fcd.recoverable !3 !fcd.funver !4 !fcd.stackptr !4 {
entry:
%registers = alloca %struct.x86_regs, !fcd.registers !3
%0 = getelementptr inbounds %struct.x86_regs, %struct.x86_regs* %registers, i64 0, i32 8, i32 0
%sp = load i64, i64* %0
%1 = getelementptr inbounds %struct.x86_regs, %struct.x86_regs* %registers, i64 0, i32 9, i32 0
%2 = bitcast i64* %1 to i64*
store i64 %rip, i64* %2
%3 = getelementptr inbounds %struct.x86_regs, %struct.x86_regs* %registers, i64 0, i32 8, i32 0
%4 = bitcast i64* %3 to i64*
store i64 %rsp, i64* %4
%5 = getelementptr inbounds %struct.x86_regs, %struct.x86_regs* %registers, i64 0, i32 9, i32 0

之后

1
2
3
4
5
6
7
8
define void @_init(i64 %rip, i64 %rsp) !fcd.vaddr !2 !fcd.recoverable !3 !fcd.funver !4 !fcd.stackptr !4 {
entry:
%0 = add i64 %rsp, -8
%1 = inttoptr i64 %0 to i64*
store i64 %rip, i64* %1, align 4, !fcd.prgmem !3
%2 = add i64 %rsp, -16
%3 = load i64, i64* inttoptr (i64 6299640 to i64*), align 8
%4 = icmp eq i64 %3, 0

相比一下,用了sroa之后没有了对struct.x86_regs的一堆操作,而且简化了不少指令