Interprocedural Analysis

过程间分析简介

本小节通过四个部分介绍过程间分析。

Motivation
- 为什么 要引入过程间分析？
Call Graph Construction (CHA)
- 介绍一个过程间分析 必要的数据结构Call Graph
- 当前有数种方法来构建Call Graph，本节介绍其中速度最快的一种（Class hierarchy analysis，简称CHA）
Interprocedural Control-Flow Graph
- 之前的章节关注CFG，引入过程间分析后，我们向CFG中添加相应的元素，得到过程间的控制流图（ICFG）
- 讨论由于添加了新元素而需要增加的操作
Interprocedural Data-Flow Analysis
- 通过一个例子（也就是实验一中做的常量传播分析）来总结过程间分析

Motivation

注意区别 interprocedural 和 intraprocedural，过程间和过程内分析

如何处理 method call：做最保守的假设，即为函数调用返回NAC，而这种情况会丢失精度，引入过程间分析能够提高精度

// 示例代码
int foo() { return 10; }
void bar() { 
    int x = foo();  // 方法调用点
    System.out.println(x); 
}

分析方法	处理逻辑	x的取值分析结果
过程内分析	保守假设：所有外部方法返回NAC（Not A Constant）	NAC
过程间分析	通过调用图追踪foo()方法的具体实现，获取精确返回值	常量10

Call Graph Construction (CHA)

Definition of Call Graph: A representation of calling relationships in the program. A call graph is a set of call edges from call-sites to their target methods(callees)

Call Graph Construction

call graph用在OOPLs，主要是java，这里说明最准确的（K-CFA）和最快速的（CHA）

Call type in java

主要关注Java的调用关系图构建。在Java中call可分为三类（static call、special call、virtual call）

Instruction：指Java的IR中的指令
Receiver objects：方法调用对应的实例对象（static方法调用不需要对应实例）
Target methods：表达IR指令到被调用目标方法的映射关系
Num of target methods：call对应的可能被调用的目标方法的数量。Virtual call与动态绑定和多态实现有关，可以对应多个对象下的重写方法。所以Virtual call的可能对象可能超过1个
Determinacy：指什么时候能够确定这个call的对应方法。Virtual call与多态有关，只能在运行时决定调用哪一个具体方法的实现。其他两种call都和多态机制不相关，编译时刻就可以确定

调用类型	字节码指令	接收对象	目标方法数	确定性	多态支持	典型场景
Static Call	`invokestatic`	无	1	编译时确定	❌	`Math.pow()`, 工具类方法
Special Call	`invokespecial`	有	1	编译时确定	❌	构造函数、`super.method()`
Virtual Call	`invokevirtual`	有	≥1	运行时确定	✔️	普通实例方法调用（非final）
Interface Call	`invokeinterface`	有	≥1	运行时确定	✔️	接口方法调用（如`List.add()`）

Virtual call and dispatch

Virtual call是几种调用中最为复杂的一种，在动态运行时，Virtual call基于两点决定调用哪个具体方法：

Type of object
Method signature
- Signature = class type + method name + descriptor
- Descriptor = return type + parameter types

Java中Dispatch机制决定具体调用哪个方法：c是一个类的定义，m是一个方法。如果能在本类中找到name和descriptor一致的方法，则调用c的方法，否则到父类中寻找

We define function Dispatch(𝑐, 𝑚) to simulate the procedure of run-time method dispatch.

下面为Dispatch机制说明，并给出对应的例子说明

Class Hierarchy Analysis (CHA)

Require the class hierarchy information (inheritance structure) of the whole program

需要首先获得整个程序的类继承关系图

Resolve a virtual call based on the declared type of receiver variable of the call site

通过接收变量的声明类型来解析Virtual call，接收变量的例子：在a.foo()中，a就是接收变量

Assume the receiver variable a may point to objects of class A or all subclasses of A（Resolve target methods by looking up the class hierarchy of class A）

假设一个接收变量能够指向A或A的所有子类

Algorithm of Resolve

call site(cs)就是调用语句，m(method)就是对应的函数签名
T集合中保存找到的结果
三个if分支分别对应之前提到的Java中的三种call类型
1. Static call(所有的静态方法调用)
2. Special call(使用super关键字的调用，构造函数调用和Private instance method)
3. Virtual call(其他所有调用)

调用类型	处理逻辑
Static Call	直接绑定到声明类的静态方法（唯一目标）
Special Call	通过`Dispatch`解析父类或构造方法（如`super()`调用）
Virtual Call	动态遍历接收对象所有子类，生成可能的目标方法集合

Static call

对于不了解OOP中静态方法可以参考这里。具体来说，静态方法调用前写的是类名，而非静态方法调用前写的是变量或指针名。静态方法调用不需要依赖类实例

Special call

Superclass instance method（super关键字）最为复杂，故优先考虑这种情况

处理super调用需要使用Dispatch函数，在下图所示情况中没有Dispatch函数时无法正确解析C类的super.foo调用

而Private instance method和Constructor（一定由类实现或有默认的构造函数）都会在本类的实现中给出，使用Dispatch函数能够将这三种情况都包含，简化代码

Virtual call

receiver variable在例子中就是c
对receiver c和c的所有直接间接子类都作为call site调用Dispatch

CHA的特征

只考虑类继承结构，所以很快
因为忽略了数据流和控制流的信息，所以不太准确

Call Graph Construction

Build call graph for whole program via CHA
- 通过CHA构造整个程序的call graph
Start from entry methods (focus on main method)
- 通常从main函数开始
For each reachable method 𝑚, resolve target methods for each call site 𝑐𝑠 in 𝑚 via CHA (Resolve(𝑐𝑠))
- 递归地处理每个可达的方法
Repeat until no new method is discovered
- 当不能拓展新的可达方法时停止
整个过程和计算理论中求闭包的过程很相似

Algorithm

Worklist记录需要处理的methods
Call graph是需要构建的目标，是call edges的集合
Reachable method (RM) 是已经处理过的目标，在Worklist中取新目标时，不需要再次处理已经在RM中的目标

class Main {
    public static void main(String[] args) {
        A a = new B();
        a.foo();       // 虚方法调用（可能指向B.foo()或A.foo()）
        C.bar();       // 静态方法调用（唯一目标C.bar()）
    }
}

class A { void foo() {...} }
class B extends A { void foo() {...} }
class C { static void bar() {...} }

步骤	操作	WL	RM	CG 新增边
初始化	`WL=[Main.main]`	[Main.main]	∅	∅
第1轮	处理`Main.main`	∅	{Main.main}	`Main.main → C.bar`（静态调用） `Main.main → B.foo`（CHA解析结果）
	新增`C.bar`和`B.foo`到WL	[C.bar, B.foo]
第2轮	处理`C.bar`（无调用点）	[B.foo]	{Main.main, C.bar}	无
第3轮	处理`B.foo`（假设其无新调用点）	∅	{Main.main, C.bar, B.foo}	无

Interprocedural Control-Flow Graph

ICFG = CFGs + call & return edges

ICFG可以通过CFG加上两种边构造得到

Call edges: from call sites to the entry nodes of their callees
Return edges: from return statements of the callees to the statements following their call sites (i.e., return sites)

Interprocedural Data-Flow Analysis

定义与比较

Call edge transfer
- 从调用者向被调用者传递参数
Return edge transfer
- 被调用者向调用者传递返回值
Node transfer
- 大部分与过程内的常数传播分析一样，不过对于每一个函数调用，都要kill掉LHS（Left hand side）的变量

example

对比过程内分析和过程间分析，明显过程内分析精度低

分析维度	过程内分析	过程间分析
方法调用处理	假设所有调用返回NAC（Not A Constant）	精确追踪参数传递与返回值
多态方法解析	无法处理虚方法调用	基于类层次分析（CHA）或指针分析解析目标方法
时间复杂度	O(n)（n为单个方法代码量）	O(k·n)（k为跨方法调用次数，n为全局代码量）
内存消耗	低（单方法内存模型）	高（需存储全局ICFG和跨过程数据流）

工具名称	分析策略	技术亮点
Soot	基于Spark框架的指针分析	支持多种上下文敏感策略
WALA	混合CHA与指针分析	增量式调用图构建
CodeQL	声明式数据流分析	跨语言过程间分析（Java/C++/Python）

参考

南京大学《软件分析》课程 07 （Interprocedural Analysis） - 哔哩哔哩

过程间分析简介 | Static Program Analysis Book (gitbook.io)

换个研究

interprocedural_Analysis

Interprocedural Analysis

过程间分析简介

Motivation

Call Graph Construction (CHA)

Call Graph Construction

Call type in java

Virtual call and dispatch

Class Hierarchy Analysis (CHA)

Interprocedural Control-Flow Graph

Interprocedural Data-Flow Analysis

定义与比较

example

参考