The kernel's BPF virtual machine is versatile; it is possible to load BPF programs into the kernel to carry out a large (and growing) set of tasks. The growing body of BPF code can reasonably be thought of as kernel code in its own right. But, while the kernel can check signatures on loadable modules and prevent the loading of modules that are not properly signed, there is no such mechanism for BPF programs; any sufficiently privileged process can load any program that will pass the verifier. One might think that adding this checking for BPF would be straightforward, but that subsystem has some unique characteristics that make things more challenging than one might expect. There may be a solution in the works, though; fittingly, it works by loading yet another BPF program.
Loadable kernel modules are stored as executable images in the ELF format. When one is loaded, the kernel parses that format and does the work needed to enable the module to run within the kernel; this work includes allocating memory for variables, performing relocations, resolving symbols, and more. All of the necessary information exists within the ELF file. Applying a signature to that file is simply a matter of checksumming the relevant sections and signing the result.
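The module case can be pictured with a toy model in Python: hash a fixed set of sections, then sign the digest. The section names, contents, and key below are made up for illustration, and HMAC stands in for the public-key signature that a real kernel checks; this is a sketch of the idea, not the kernel's module-signing format.

```python
import hashlib
import hmac

# Hypothetical choice of which sections the signature covers.
SIGNED_SECTIONS = (".text", ".data", ".rodata")


def digest_sections(sections):
    """Hash the covered sections in a fixed order."""
    h = hashlib.sha256()
    for name in SIGNED_SECTIONS:
        body = sections.get(name, b"")
        # Length-prefix name and body so section boundaries are unambiguous.
        h.update(len(name).to_bytes(4, "little") + name.encode())
        h.update(len(body).to_bytes(8, "little") + body)
    return h.digest()


def sign(sections, key):
    # Assumption: HMAC as a stand-in for a real public-key signature.
    return hmac.new(key, digest_sections(sections), hashlib.sha256).digest()


def verify(sections, key, signature):
    return hmac.compare_digest(sign(sections, key), signature)


# Made-up module image: three sections, two of them covered by the signature.
module = {".text": b"\x55\x48\x89\xe5", ".data": bytes(8), ".comment": b"GCC"}
key = b"not-a-real-key"
signature = sign(module, key)
```

Everything the check needs is in the one file, which is the property that BPF programs lack.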
BPF programs have similar needs, but the organization of the requisite information is a bit more, for lack of a better word, messy. The code itself is compiled as an executable section that is then linked into a loader program that runs in user space and invokes the bpf() system call to load the BPF program into memory. But BPF programs, too, need to have data areas allocated in the form of BPF maps, and they need relocations (of a sort) applied to be able to cope with different structure layouts on different systems. The necessary maps are “declared” as special ELF sections in the loader program; the libbpf library finds those sections and turns them into more bpf() calls. The BPF program itself is then modified (before loading into the kernel) so that it can find its maps when it runs.
This structure poses a challenge for anybody wanting to implement signed BPF programs. The maps are a part of the program itself; if they are not established as intended, a BPF program might misbehave in interesting ways. But the kernel has no way to enforce any specific map configuration, and thus cannot ensure that a signed BPF program has been properly set up. Additionally, the need to modify the BPF program itself will break signature verification; after all, modifications to BPF programs are just the sort of thing this mechanism is expected to prevent. So, somehow, the kernel has to take a more active role in the loading of BPF programs.
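A small Python sketch shows why patching and signing collide. It encodes the two-slot 64-bit load that BPF uses to refer to a map by file descriptor (opcode 0x18 with the source register set to BPF_PSEUDO_MAP_FD, as defined in the kernel's BPF header), then rewrites the immediate the way a user-space loader would. The helper names and the descriptor value 5 are invented for the example.

```python
import hashlib
import struct

BPF_LD_IMM64 = 0x18      # BPF_LD | BPF_DW | BPF_IMM
BPF_PSEUDO_MAP_FD = 1    # src_reg value meaning "imm holds a map fd"


def ld_map_fd(dst_reg, map_fd):
    """Encode the two-slot 64-bit load that refers to a map by fd."""
    regs = (BPF_PSEUDO_MAP_FD << 4) | dst_reg   # little-endian bitfield layout
    first = struct.pack("<BBhi", BPF_LD_IMM64, regs, 0, map_fd)
    second = struct.pack("<BBhi", 0, 0, 0, 0)    # upper half of the immediate
    return first + second


def patch_map_fd(prog, insn_index, map_fd):
    """Rewrite the imm field of one instruction slot."""
    offset = insn_index * 8 + 4                  # imm is the last 4 bytes of a slot
    return prog[:offset] + struct.pack("<i", map_fd) + prog[offset + 4:]


as_built = ld_map_fd(dst_reg=1, map_fd=0)        # placeholder fd at build time
as_loaded = patch_map_fd(as_built, 0, map_fd=5)  # hypothetical fd of the new map

signed_digest = hashlib.sha256(as_built).hexdigest()
loaded_digest = hashlib.sha256(as_loaded).hexdigest()
```

Only four bytes differ between the two images, but the digests no longer match, so a signature computed over the program as built cannot vouch for the program as loaded.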
In-kernel BPF loading
The old-timers among us will remember that, once upon a time, the kernel's module loader lived in user space. Moving it into the kernel was one of many causes of extended pain during the 2.5 development cycle; 20 years later, some developers still hold a grudge against Rusty Russell for that experience. But those problems are long past and the in-kernel loader has long since ceased to create problems. So one might logically expect that moving the user-space BPF loader into the kernel would be a sensible approach to take.
According to Alexei Starovoitov in the cover letter to a new patch set, that approach has been tried in a couple of forms and “discarded after months of work”. Evidently an attempt was made to move libbpf into the kernel; it is not entirely surprising that this complex body of code did not fit comfortably there. Another idea was to create a new executable file format that contained, in essence, a series of system calls needed to set up a specific BPF program.
The problems that were encountered while implementing that second approach are not spelled out. But the third approach, found in Starovoitov's patch set, can be thought of as a variant on that idea. Rather than have the kernel step through a series of system calls, though, user space loads a special BPF program that does that work instead.
Specifically, the patch set creates yet another type of BPF program — one that exists to execute system calls. This program will run in the context of the process that runs it, and will be limited to a small set of system calls; only bpf() and close() are allowed in the proposed patch set. This program will be expected to make the necessary set of bpf() calls to load and set up the BPF program that the user really wants to run.
The generation of this “loader program” is done by watching what libbpf does to load the BPF program of interest and capturing each of the resulting bpf() calls. Those calls are then collected to generate the loader program, which will reproduce that work, from within the kernel, at the right time. So the kernel is, indeed, stepping through a canned series of system calls to load the program; it's just that this series is formatted as a BPF program in its own right.
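The record-and-replay idea can be modeled in a few lines of Python. This is a toy, not the patch set's implementation: the real loader is itself a BPF program, and the `Recorder`, `FakeKernel`, and `replay` names are invented here. `BPF_MAP_CREATE` and `BPF_PROG_LOAD` are real bpf() commands, but the attributes shown are a simplified, hypothetical subset.

```python
ALLOWED = {"bpf", "close"}  # the only calls the proposed patch set permits


class Recorder:
    """Stand-in for an instrumented libbpf: logs calls instead of making them."""

    def __init__(self):
        self.trace = []

    def bpf(self, cmd, **attr):
        self.trace.append(("bpf", cmd, attr))
        # Return a slot number; real descriptors are unknown at record time.
        return len([t for t in self.trace if t[0] == "bpf"]) - 1

    def close(self, slot):
        self.trace.append(("close", slot))


class FakeKernel:
    """Hands out ascending descriptors so the replay has something to call."""

    def __init__(self):
        self.next_fd = 3
        self.open_fds = set()

    def bpf(self, cmd, **attr):
        fd = self.next_fd
        self.next_fd += 1
        self.open_fds.add(fd)
        return fd

    def close(self, fd):
        self.open_fds.discard(fd)


def replay(trace, kernel):
    """Step through a canned trace, refusing anything outside ALLOWED."""
    fds = []  # slot number -> descriptor obtained during this replay
    for name, *args in trace:
        if name not in ALLOWED:
            raise PermissionError(name + "() is not allowed in a loader program")
        if name == "bpf":
            cmd, attr = args
            fds.append(kernel.bpf(cmd, **attr))
        else:
            kernel.close(fds[args[0]])
    return fds


# Record once, at build time.
rec = Recorder()
map_slot = rec.bpf("BPF_MAP_CREATE", key_size=4, value_size=8, max_entries=16)
rec.bpf("BPF_PROG_LOAD", insn_cnt=2)
rec.close(map_slot)

# Replay later, on the target system.
kernel = FakeKernel()
fds = replay(rec.trace, kernel)
```

The trace refers to descriptors by slot rather than by number because the actual values only exist once the replay runs, which is the same problem the file-descriptor array solves for the BPF program itself.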
The problem of patching the BPF program to find its maps is addressed in the usual way: adding another layer of indirection. An array of file descriptors is set up, and the BPF program references maps by way of that array. When the program is loaded, this array can be populated with the actual file descriptors corresponding to the necessary maps.
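Here is a rough Python model of that indirection. The instruction's immediate holds a slot number in a descriptor array instead of a descriptor, so the program bytes stay the same from one load to the next. The marker value `MAP_BY_SLOT`, the helper names, and the descriptor values are assumptions made for this sketch; the article does not give the patch set's actual encoding.

```python
import hashlib
import struct

BPF_LD_IMM64 = 0x18  # BPF_LD | BPF_DW | BPF_IMM
MAP_BY_SLOT = 5      # assumed marker: "imm is an index into the fd array"


def ld_map_slot(dst_reg, slot):
    """Encode a two-slot 64-bit load whose immediate is an fd-array index."""
    regs = (MAP_BY_SLOT << 4) | dst_reg
    return struct.pack("<BBhi", BPF_LD_IMM64, regs, 0, slot) + bytes(8)


def resolve_maps(prog, fd_array):
    """Load-time view: translate each slot reference into a real descriptor."""
    found = []
    for off in range(0, len(prog), 8):
        opcode, regs, _, imm = struct.unpack_from("<BBhi", prog, off)
        if opcode == BPF_LD_IMM64 and regs >> 4 == MAP_BY_SLOT:
            found.append(fd_array[imm])
    return found


# The program is built once and never modified afterward.
prog = ld_map_slot(1, 0) + ld_map_slot(2, 1)
signed_digest = hashlib.sha256(prog).hexdigest()

# Two loads on different systems get different descriptors for the same maps.
first_load = resolve_maps(prog, [7, 9])
second_load = resolve_maps(prog, [12, 4])
```

Because only the array changes between loads, a digest taken over `prog` at build time still matches at load time.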
Next steps
As Starovoitov noted in the cover letter, this work is not yet a complete solution to the problem; it is a first step to show the direction that this work is taking. A big remaining piece is the offset relocations needed for BPF programs to access structure fields in a configuration-independent way. These relocations, too, require changing the BPF program text, so the solution may not be entirely trivial; more indirection-based schemes run the risk of imposing more of a performance cost than some users may want to pay.
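The relocation problem can be illustrated the same way. In this toy Python model an instruction is a tuple whose last element is a structure-field offset, and a relocation record says which instruction depends on which field. The layouts, offsets, and function names are invented; real relocations are driven by type information rather than a hand-written table.

```python
# Offsets of one field on two hypothetical kernels with different layouts.
BUILD_LAYOUT = {("task_struct", "pid"): 16}
TARGET_LAYOUT = {("task_struct", "pid"): 24}


def compile_access(layout, struct_name, field):
    """Emit one load using the build machine's offset, plus a relocation record."""
    insn = ("ldx_w", "r0", "r1", layout[(struct_name, field)])
    reloc = (0, (struct_name, field))  # (instruction index, field it depends on)
    return [insn], [reloc]


def relocate(insns, relocs, layout):
    """Return a copy of the program with each recorded offset fixed up."""
    patched = list(insns)
    for index, key in relocs:
        op, dst, src, _stale = patched[index]
        patched[index] = (op, dst, src, layout[key])
    return patched


insns, relocs = compile_access(BUILD_LAYOUT, "task_struct", "pid")
relocated = relocate(insns, relocs, TARGET_LAYOUT)
```

As with map descriptors, the fixed-up program differs from the one that was built, so a scheme that avoids touching the program text would need another layer of indirection on every field access.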
There is also, of course, the little matter of signing BPF programs and checking those signatures; this problem is not addressed in this patch set either. There is a brief mention of putting together a skeleton that would allow BPF programs to be packaged into a kernel module; if that were done, then the existing system for checking module signatures could be used for BPF programs as well.
At first glance, the BPF loader looks like a somewhat convoluted solution to the problem. It is worth noting, though, that this mechanism is not all that far removed from what happens in user space, where running a program usually involves launching ld.so to assemble the required pieces for that program to run. So there are well-established precedents for this sort of solution. Whether this design will make it into the mainline kernel remains to be seen, though; this work is young and has not yet seen much review.