fcd Source Code Analysis, Part 1

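This is a source code analysis of the fcd decompiler; unimportant details are skipped.
(Warning: these are rambling step-by-step notes, and further parts may be slow to arrive.)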

Initial Parsing

Below is the first part of the main function:

```cpp
EnablePrettyStackTrace();
sys::PrintStackTraceOnErrorSignal(argv[0]);

pruneOptionList(cl::getRegisteredOptions());
cl::ParseCommandLineOptions(argc, argv, "native program decompiler");

if (customPassPipeline != "default" && additionalPasses.size() > 0)
{
    errs() << sys::path::filename(argv[0]) << ": additional passes only accepted when using the default pipeline\n";
    errs() << "Specify custom passes using the " << customPassPipeline.ArgStr << " parameter\n";
    return 1;
}

Main::initializePasses();

Main mainObj(argc, argv);
string program = mainObj.getProgramName();
```

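It first sets up stack-trace reporting, parses the command-line arguments (rejecting additional passes unless the default pass pipeline is used), and then initializes the passes.

It then constructs the Main object and fetches the program name. Note that program holds the name of fcd itself, which is used as the prefix of error messages (for example the "can't open" message further down); the name of the file to decompile comes from the inputFile option.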

Next comes:

```cpp
// step 0: before even attempting anything, prepare optimization passes
// (the user won't be happy if we work for 5 minutes only to discover that the optimization passes don't load)
if (!mainObj.prepareOptimizationPasses())
{
    return 1;
}

unique_ptr<Executable> executable;
unique_ptr<Module> module;
```

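This only makes sure the optimization passes load before any real work starts, so I won't go into it further.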

```cpp
// step one: create annotated module from executable (or load it from .ll)
```

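This step either builds the IR from the executable or loads it from a .ll file. Since we are mainly interested in how the executable is parsed, only that branch is analyzed here.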

```cpp
PrettyStackTraceFormat parsingIR("Parsing executable \"%s\"", inputFile.c_str());

bufferOrError = MemoryBuffer::getFile(inputFile, -1, false);
if (!bufferOrError)
{
    cerr << program << ": can't open " << inputFile << ": " << errorOf(bufferOrError) << endl;
    return 1;
}

auto executableOrError = mainObj.parseExecutable(*bufferOrError.get());
```

The file is read into memory and then parsed by mainObj.parseExecutable.

Stepping into it:

```cpp
ErrorOr<unique_ptr<Executable>> parseExecutable(MemoryBuffer& executableCode)
{
    auto start = reinterpret_cast<const uint8_t*>(executableCode.getBufferStart());
    auto end = reinterpret_cast<const uint8_t*>(executableCode.getBufferEnd());
    return Executable::parse(start, end);
}
```
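Outside LLVM, the same "load the whole file, then hand the parser a [begin, end) byte range" step can be sketched with the standard library. This is a minimal sketch, not fcd's code: std::ifstream stands in for llvm::MemoryBuffer::getFile, std::optional stands in for ErrorOr, and readWholeFile, parseBytes and parseFile are hypothetical names. parseBytes just returns the byte count as a placeholder for real parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for MemoryBuffer::getFile: read the whole file into memory.
std::optional<std::vector<uint8_t>> readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return std::nullopt; // the caller reports "can't open", as fcd's main does
    }
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Hypothetical parser entry point: like Executable::parse, it only sees [begin, end).
std::size_t parseBytes(const uint8_t* begin, const uint8_t* end)
{
    assert(end >= begin);
    return static_cast<std::size_t>(end - begin); // placeholder for decoding headers
}

// Mirrors parseExecutable: turn the buffer into begin/end pointers and forward them.
std::optional<std::size_t> parseFile(const std::string& path)
{
    auto bytes = readWholeFile(path);
    if (!bytes)
    {
        return std::nullopt;
    }
    const uint8_t* begin = bytes->data();
    return parseBytes(begin, begin + bytes->size());
}
```

Passing raw pointers rather than the container keeps the parser independent of how the bytes were obtained, which is also why fcd's parse functions take begin/end pointers.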

Executable::parse, in turn, just forwards to a factory:

```cpp
ErrorOr<unique_ptr<Executable>> Executable::parse(const uint8_t* begin, const uint8_t* end)
{
    return executableFactory->parse(begin, end);
}
```

The factory appears to choose a parser based on the file format, and only ELF seems to be supported at this point:

```cpp
virtual llvm::ErrorOr<std::unique_ptr<Executable>> parse(const uint8_t* begin, const uint8_t* end) override
{
    assert(end >= begin);
    uintptr_t size = static_cast<uintptr_t>(end - begin);
    if (size >= sizeof elf_magic && memcmp(begin, elf_magic, sizeof elf_magic) == 0)
    {
        return ElfExecutableFactory().parse(begin, end);
    }

    return make_error_code(ExecutableParsingError::Generic_UnknownFormat);
}
```
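The magic-number check above can be isolated into a small standalone function. A minimal sketch, assuming elf_magic holds the standard ELF identification bytes 0x7f 'E' 'L' 'F' (its definition is not shown here); detectFormat and ExecutableFormat are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ExecutableFormat { Elf, Unknown };

// The four ELF identification bytes that start every ELF file.
static const uint8_t kElfMagic[] = {0x7f, 'E', 'L', 'F'};

// The buffer must be large enough to hold the magic and must start with it.
ExecutableFormat detectFormat(const uint8_t* begin, const uint8_t* end)
{
    assert(end >= begin);
    std::size_t size = static_cast<std::size_t>(end - begin);
    if (size >= sizeof kElfMagic && std::memcmp(begin, kElfMagic, sizeof kElfMagic) == 0)
    {
        return ExecutableFormat::Elf;
    }
    return ExecutableFormat::Unknown; // e.g. a PE file starting with "MZ"
}
```

Checking the size before memcmp matters: a file shorter than four bytes must be rejected instead of being read past its end.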

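Stepping into ElfExecutableFactory().parse shows that 32-bit and 64-bit ELF files are parsed separately, depending on the EI_CLASS byte (1 for 32-bit, 2 for 64-bit):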

```cpp
if (auto classByte = bounded_cast<uint8_t>(begin, end, EI_CLASS))
{
    switch (*classByte)
    {
        case 1: return ElfExecutable<Elf32Types>::parse(begin, end);
        case 2: return ElfExecutable<Elf64Types>::parse(begin, end);
        default: break;
    }
}
```
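bounded_cast guards against reading past the end of the buffer before the class byte is inspected. The sketch below reproduces that idea with std::optional; it is not fcd's implementation, and boundedRead and elfBitness are hypothetical names. EI_CLASS is byte 4 of e_ident, and the values 1 and 2 are ELFCLASS32 and ELFCLASS64 in the ELF specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical analogue of bounded_cast: read a T at `offset` only if it fits in [begin, end).
template <typename T>
std::optional<T> boundedRead(const uint8_t* begin, const uint8_t* end, std::size_t offset)
{
    assert(end >= begin);
    std::size_t size = static_cast<std::size_t>(end - begin);
    if (offset > size || size - offset < sizeof(T))
    {
        return std::nullopt;
    }
    T value{};
    std::memcpy(&value, begin + offset, sizeof(T)); // memcpy avoids unaligned access
    return value;
}

constexpr std::size_t kEiClass = 4; // index of EI_CLASS in e_ident
constexpr uint8_t kElfClass32 = 1;  // ELFCLASS32
constexpr uint8_t kElfClass64 = 2;  // ELFCLASS64

// Returns 32 or 64 for a recognized class byte, 0 otherwise.
int elfBitness(const uint8_t* begin, const uint8_t* end)
{
    if (auto classByte = boundedRead<uint8_t>(begin, end, kEiClass))
    {
        switch (*classByte)
        {
            case kElfClass32: return 32;
            case kElfClass64: return 64;
            default: break;
        }
    }
    return 0;
}
```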

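I won't analyze the deeper parse functions. From a quick look, they decode the various ELF structures and identify things like segments and symbols.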
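To give an idea of what "decoding the structures" involves, here is a minimal sketch that pulls a few fields out of a 64-bit ELF header: the entry point, plus the location and size of the program header table, which describes the segments. It assumes a little-endian file (EI_DATA == 1) whose magic has already been checked; readElf64Header and Elf64HeaderInfo are hypothetical names, and the offsets are those of Elf64_Ehdr. fcd's own parse handles far more than this, including the symbols mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// A few fields of Elf64_Ehdr, enough to locate the segments.
struct Elf64HeaderInfo
{
    uint64_t entry;     // e_entry: virtual address of the entry point
    uint64_t phoff;     // e_phoff: file offset of the program header table
    uint16_t phentsize; // e_phentsize: size of one program header
    uint16_t phnum;     // e_phnum: number of program headers (segments)
};

// Decode an unsigned little-endian integer of `width` bytes starting at p.
static uint64_t readLE(const uint8_t* p, std::size_t width)
{
    uint64_t value = 0;
    for (std::size_t i = 0; i < width; ++i)
    {
        value |= static_cast<uint64_t>(p[i]) << (8 * i);
    }
    return value;
}

// Assumes the magic was already checked; rejects short, non-64-bit or big-endian input.
std::optional<Elf64HeaderInfo> readElf64Header(const uint8_t* begin, const uint8_t* end)
{
    assert(end >= begin);
    constexpr std::size_t kEhdrSize = 64; // sizeof(Elf64_Ehdr)
    if (static_cast<std::size_t>(end - begin) < kEhdrSize || begin[4] != 2 || begin[5] != 1)
    {
        return std::nullopt;
    }
    Elf64HeaderInfo info;
    info.entry = readLE(begin + 24, 8);
    info.phoff = readLE(begin + 32, 8);
    info.phentsize = static_cast<uint16_t>(readLE(begin + 54, 2));
    info.phnum = static_cast<uint16_t>(readLE(begin + 56, 2));
    return info;
}
```

Decoding byte by byte keeps the result independent of the host's own endianness.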

Further Parsing

Back in main: now that the executable's structures have been parsed, the next step is translating the machine code into IR.

```cpp
executable = move(executableOrError.get());
string moduleName = sys::path::stem(inputFile);
auto moduleOrError = mainObj.generateAnnotatedModule(*executable, moduleName);
```

Let's step into generateAnnotatedModule:

```cpp
// Load headers here, since this is the earliest point where we have an executable and a module.
auto cDecls = HeaderDeclarations::create(
    transl.get(),
    headerSearchPath.begin(),
    headerSearchPath.end(),
    headers.begin(),
    headers.end(),
    frameworks.begin(),
    frameworks.end(),
    errs());
if (!cDecls)
{
    return make_error_code(FcdError::Main_HeaderParsingError);
}

EntryPointRepository entryPoints;
entryPoints.addProvider(executable);
entryPoints.addProvider(*cDecls);
```

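This loads the C header declarations (from the configured header search paths, headers and frameworks) and registers both the executable and those declarations as entry-point providers.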
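The repository/provider arrangement can be sketched as follows. The code above only shows that two providers are added; the lookup policy here (ask each provider in the order it was added and take the first answer) is an assumption made purely for illustration, and AddressInfoProvider, SymbolTableProvider and ProviderRepository are hypothetical names, not fcd's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical provider interface: something that knows about functions at given addresses.
struct AddressInfoProvider
{
    virtual ~AddressInfoProvider() = default;
    virtual std::optional<std::string> nameForAddress(uint64_t address) const = 0;
};

// Hypothetical provider backed by a simple address -> name table.
struct SymbolTableProvider : AddressInfoProvider
{
    std::map<uint64_t, std::string> symbols;

    std::optional<std::string> nameForAddress(uint64_t address) const override
    {
        auto it = symbols.find(address);
        if (it == symbols.end())
        {
            return std::nullopt;
        }
        return it->second;
    }
};

// Asks providers in the order they were added; the first one that knows the address wins.
class ProviderRepository
{
    std::vector<const AddressInfoProvider*> providers;

public:
    void addProvider(const AddressInfoProvider& provider) { providers.push_back(&provider); }

    std::optional<std::string> nameForAddress(uint64_t address) const
    {
        for (const AddressInfoProvider* provider : providers)
        {
            if (auto name = provider->nameForAddress(address))
            {
                return name;
            }
        }
        return std::nullopt;
    }
};
```

Storing pointers means the providers must outlive the repository, just as executable and cDecls outlive entryPoints in the function above.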

Skipping most of what follows, here is the key part:

```cpp
if (Function* fn = transl.createFunction(functionInfo.virtualAddress))
{
    if (Function* cFunction = cDecls->prototypeForAddress(functionInfo.virtualAddress))
    {
        md::setFinalPrototype(*fn, *cFunction);
    }
}
```

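This creates a Function for the entry point; the actual translation happens inside createFunction. If the header declarations provide a prototype for the same address, it is recorded as the function's final prototype.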

Roughly: disassemble the machine code with Capstone -> convert the disassembly into IR.
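To make "convert the disassembly into IR" concrete, here is a toy lifter. It is not meant to reproduce fcd's translation: it takes already-decoded instructions (standing in for Capstone's output), models each register as a named memory slot, and turns every instruction into load, operate, store steps in LLVM-style textual IR. Only register-to-register mov and add are handled; Insn and liftToIR are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical decoded instruction, standing in for what a disassembler reports.
struct Insn
{
    uint64_t address;
    std::string mnemonic; // only "mov" and "add" are handled
    std::string dst;      // destination register, e.g. "rax"
    std::string src;      // source register
};

// Toy lifter: every instruction becomes loads from register slots, an operation, and a store.
std::string liftToIR(const std::vector<Insn>& insns)
{
    std::ostringstream ir;
    int next = 0; // counter for fresh SSA temporaries %t0, %t1, ...
    for (const Insn& in : insns)
    {
        ir << "; " << std::hex << in.address << std::dec << ": "
           << in.mnemonic << ' ' << in.dst << ", " << in.src << '\n';
        if (in.mnemonic != "mov" && in.mnemonic != "add")
        {
            ir << "; unsupported instruction skipped\n";
            continue;
        }
        int srcVal = next++;
        ir << "%t" << srcVal << " = load i64, i64* %" << in.src << '\n';
        int result = srcVal; // mov: store the source value unchanged
        if (in.mnemonic == "add")
        {
            int dstVal = next++;
            ir << "%t" << dstVal << " = load i64, i64* %" << in.dst << '\n';
            result = next++;
            ir << "%t" << result << " = add i64 %t" << dstVal << ", %t" << srcVal << '\n';
        }
        ir << "store i64 %t" << result << ", i64* %" << in.dst << '\n';
    }
    return ir.str();
}
```

Routing every register access through memory keeps the lifter simple; a later optimization pass (LLVM's mem2reg/SROA, for instance) can then promote those slots to SSA values.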

For the details of the IR translation, see the resolveIntrinsics function in code_generator.cpp and follow it from there; I won't cover it here.