The BSD packet filter originated in the Xerox Alto. The motivation, implementation, and performance on various BSD machines is described in [66]. Packet filters are written in a simple stack-based language and inserted into the running kernel. For each incoming data packet, the filters are executed until one accepts the packet. There are no branch instructions and only a rudimentary set of operations. Words at constant offsets in the packet can be examined using comparisons and the three boolean operators AND, OR, and XOR. No other arithmetic operations are available.
Interpretation was chosen to make it possible to move packet filters from user space, with the associated context switches and kernel traps, into the kernel, saving that overhead. This enhanced performance considerably. The interpreter ensures that only words inside a packet are accessed and aborts faulty handlers (after a stack underflow for example). In general, though, security is not enforced. Any filter can accept any packet.
An improved version of the packet filter is presented in [61]. The instruction set has been extended and now includes arithmetic operators such as add, sub, mul, and div, as well as conditional branch instructions. The pseudo machine is now register based. It consists of an accumulator, an index register, a scratch memory store, and an implicit program counter. The authors claim that such a machine can be simulated faster on today's register-based RISC architectures. The performance gain over the earlier implementation seems to validate this claim. However, it may be that the language used in the first BSD implementation is just too simple. For example, it is not possible to read a word from the packet, mask certain bits, and then use this intermediate result several times in comparisons. Instead, the first BSD implementation needs to read the same word every time and has to redo any masking done before. The newer implementation can easily store the intermediate result and reuse it when necessary. Therefore, the claim that a stack-based virtual machine is not as suitable as a register-based virtual machine for modern CPUs is not substantiated.
Performance of both BSD packet filters degrades as more filters are added, since each is executed sequentially and independently. With several sessions active at the same time, many filters that differ only minimally (in matching the destination port number for example), have to be installed. This problem is addressed in [107]. The new filter mechanism is called MPF (Mach Packet Filter) and is an extension of the register-based packet filter [61].
To achieve scalability in the presence of many packet filters, MPF offers a new instruction that is reminiscent of a C switch statement. It replaces an instruction sequence in the register-based packet filter that is used to dispatch among various protocols. MPF collapses the new instruction present in all filters into a single, fast, dispatch routine. Therefore, no matter how many filters are present, the common dispatch code is executed only once. Another reason this is faster, is that several instructions can be replaced by a single virtual machine instruction. This lowers the overhead of virtual machine instruction dispatch.