In addition to the well-known bottleneck of copying data from kernel to user space, three of the largest impediments to using commodity hardware and protocols are the interrupt pressure generated by Ethernet-sized frames, the high latency of messages sent over TCP/IP, and the high overhead of TCP/IP processing on the host processor.
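To give a rough sense of the interrupt pressure, the sketch below computes how many frames per second a saturated link delivers. It assumes, for illustration only, one host interrupt per received frame and no coalescing. The 38 bytes of per-frame wire overhead (preamble and start delimiter, Ethernet header, frame check sequence, and minimum inter-frame gap) are standard Ethernet figures; the 1 Gb/s link speed and the 9000-byte jumbo MTU used for comparison are illustrative choices, not values from this proposal.

```python
# Per-frame overhead on the wire, in bytes: 8 (preamble + SFD) + 14 (Ethernet
# header) + 4 (frame check sequence) + 12 (minimum inter-frame gap).
ETHERNET_WIRE_OVERHEAD = 38


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Frames per second on a saturated link carrying full-size frames.

    Illustrative assumption: every frame raises one host interrupt,
    so this is also the host interrupt rate without coalescing.
    """
    bits_per_frame = (mtu + ETHERNET_WIRE_OVERHEAD) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    for mtu in (1500, 9000):
        rate = frames_per_second(1e9, mtu)
        print(f"1 Gb/s, MTU {mtu}: about {rate:,.0f} interrupts/s")
```

Under these assumptions, standard 1500-byte frames at 1 Gb/s produce roughly 81,000 interrupts per second, almost six times as many as 9000-byte jumbo frames, and the rate scales linearly with link speed.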
One of the most effective tools for increasing the efficiency of high-performance computing has been the use of intelligent Network Interface Cards (NICs). Traditionally, however, offloading work onto the NIC has been an all-or-nothing proposition. At one extreme, interrupt coalescing, a method that interrupts the operating system only after a number of interrupts have accumulated, offloads none of the original processing work of the communication stack; instead, it adds the extra work of counting interrupts at the NIC. At the other extreme, the newest research considers offloading the entire TCP/IP communication stack onto the NIC. Both extremes have drawbacks.
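A minimal sketch of count-based interrupt coalescing as described above: the NIC counts received frames and interrupts the host only when a threshold is reached. The class and method names are hypothetical, and the timer hook is an assumption added for completeness, since practical coalescing schemes pair the count with a timeout so that a partial batch is not held indefinitely. No protocol processing moves to the NIC; the only new work is the counter itself.

```python
class InterruptCoalescer:
    """Hypothetical NIC-side model of count-based interrupt coalescing."""

    def __init__(self, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # frames per host interrupt
        self.pending = 0            # frames received since the last interrupt
        self.host_interrupts = 0    # interrupts actually delivered to the OS

    def on_frame(self) -> None:
        # The only work added at the NIC: increment and compare to the threshold.
        self.pending += 1
        if self.pending >= self.threshold:
            self._interrupt_host()

    def on_timer(self) -> None:
        # Assumed timeout path: flush a partial batch so frames are not stranded.
        if self.pending:
            self._interrupt_host()

    def _interrupt_host(self) -> None:
        self.host_interrupts += 1
        self.pending = 0


nic = InterruptCoalescer(threshold=8)
for _ in range(100):
    nic.on_frame()
nic.on_timer()
print(nic.host_interrupts)  # 13: twelve full batches of 8, plus one timer flush of 4
```

The host still performs all of the per-frame protocol processing; coalescing only trades a lower interrupt count for the extra delay a frame spends waiting for its batch to fill or its timer to expire.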
To take full advantage of offloading, we must carefully determine which functionality in the communication protocol stack yields the most benefit when offloaded. Some functions, such as IP fragmentation and defragmentation or the IP checksum, can be offloaded with positive results; others, such as the TCP checksum, gain little. I propose studying the splintering of a commodity protocol stack. Splintering is the process of determining which functionality to extract from the protocol stack and then extracting it. By splintering the TCP/IP stack, we retain the advantages of commodity protocols while gaining the efficiency of appropriate offloading onto peripheral devices, making communication in high-performance computing faster and more flexible.
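The IP header checksum illustrates why some functions splinter cleanly: it is computed from a single packet's header, with no connection state. The sketch below implements the standard Internet checksum (RFC 1071: the one's complement of the one's-complement sum of 16-bit words) over an IPv4 header. The sample header is an illustrative packet, not data from this work.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over `data`, returned as a 16-bit int."""
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(bytes(header))
print(hex(csum))  # 0xb861

# A receiver (host or NIC) verifies by summing the header including the
# checksum field; a valid header yields 0.
header[10:12] = csum.to_bytes(2, "big")
print(internet_checksum(bytes(header)))  # 0
```

Because the computation needs nothing beyond the bytes of the packet in hand, it can move to the NIC without the NIC having to track any per-connection state.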
Our research group proposes to outline the general problem space. We will
investigate the performance metrics of high-performance computing and
study the advantages and the performance challenges with respect
to commodity hardware, operating systems and protocols. We have
completed work on the process of splintering and offloading IP fragmentation
and defragmentation as a means of relieving interrupt pressure. We have
developed a model for determining the trade-offs of splintering
parts of a protocol in a specific architectural environment (namely, a host
CPU with a PCI bus and an intelligent NIC). We propose to extend this model
into a numeric model that can estimate performance given initial metrics.
We will then determine which TCP functionality should be splintered in order
to relieve the inefficiencies associated with TCP. We expect to find, as
others have, that offloading active TCP connections is the most effective
way to lower TCP latency. Finally, we will propose a method of activating
and deactivating TCP connections that reduces the memory pressure associated
with offloaded connections without adding the latency of new connection setup.
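As a rough illustration of why defragmentation on the NIC relieves interrupt pressure, the sketch below models a NIC that buffers IP fragments and interrupts the host once per reassembled datagram rather than once per fragment. All names are hypothetical, and the model is deliberately simplified: it ignores reassembly timeouts, overlapping fragments, and buffer limits, all of which a real offload must handle. The 1480-byte fragment payload corresponds to a 1500-byte Ethernet MTU minus a 20-byte IPv4 header.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    key: tuple            # (src, dst, protocol, ident) identifying the datagram
    offset: int           # byte offset of this fragment's payload
    more_fragments: bool  # IPv4 MF flag: False only on the last fragment
    payload: bytes


def fragment(key: tuple, data: bytes, max_payload: int = 1480) -> list:
    """Split `data` into fragments; max_payload must be a multiple of 8."""
    if max_payload % 8:
        raise ValueError("IPv4 fragment offsets are in 8-byte units")
    return [
        Fragment(key, off, off + max_payload < len(data), data[off:off + max_payload])
        for off in range(0, len(data), max_payload)
    ]


class NicReassembler:
    """Hypothetical NIC model: one host interrupt per complete datagram."""

    def __init__(self):
        self.pending = {}    # key -> {offset: payload}
        self.total_len = {}  # key -> datagram length, once the last fragment arrives
        self.host_interrupts = 0
        self.delivered = []

    def receive(self, frag: Fragment) -> None:
        parts = self.pending.setdefault(frag.key, {})
        parts[frag.offset] = frag.payload
        if not frag.more_fragments:
            self.total_len[frag.key] = frag.offset + len(frag.payload)
        total = self.total_len.get(frag.key)
        if total is not None and self._contiguous_length(parts) == total:
            # Datagram complete: hand it to the host with a single interrupt.
            self.delivered.append(b"".join(parts[o] for o in sorted(parts)))
            del self.pending[frag.key]
            del self.total_len[frag.key]
            self.host_interrupts += 1

    @staticmethod
    def _contiguous_length(parts: dict) -> int:
        """Bytes covered without gaps from offset 0, or -1 if there is a hole."""
        end = 0
        for off in sorted(parts):
            if off != end:
                return -1
            end += len(parts[off])
        return end


data = bytes(range(256)) * 32           # an 8192-byte datagram
frags = fragment(("10.0.0.1", "10.0.0.2", 17, 42), data)
nic_rx = NicReassembler()
for f in reversed(frags):               # fragments may arrive out of order
    nic_rx.receive(f)
print(len(frags), nic_rx.host_interrupts)  # 6 1
```

Under the one-interrupt-per-frame assumption, the same datagram delivered fragment by fragment would cost the host six interrupts instead of one, and the gap widens as datagrams grow.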