Active messages were introduced in [100] and have enjoyed considerable attention [98, , , , ]. Especially on the CM-5, where the network interface can be mapped into the user's address space, active messages have a very small overhead compared to traditional message passing. This translates into very low message latencies. However, some characteristics make active messages unsuitable as a general solution to the problem of fast-responding handlers. For example, bandwidth is adversely affected by the need to run a handler for every message. A data transfer method that moves data without invoking a handler achieves higher bandwidth.
Most active message implementations use polling to achieve the low latencies reported in the literature. Adjusting the polling frequency introduces a tradeoff between fast response time and the overhead of polls that find no message. In some cases polling complicates a program, and in others it is not appropriate. For example, in a compute-intensive application that spends much of its time in libraries such as the BLAS, polling can only occur between calls to the library; hence, handler response time is poor. Interrupt-driven implementations of active messages avoid polling but suffer from context switch overhead.
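The polling tradeoff can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions: time is counted in abstract work units, exactly one message arrives, and the names `poll_stats` and `simulate_polling` are hypothetical and do not belong to any active message library.

```c
#include <assert.h>

/* Hypothetical result record for the simulation below. */
typedef struct {
    unsigned long polls;        /* total number of polls performed        */
    unsigned long empty_polls;  /* polls that found no message            */
    unsigned long delay;        /* work units from arrival to handling    */
    int           handled;      /* 1 if the message was handled at all    */
} poll_stats;

/* Simulate total_work units of computation. One message arrives at work
 * unit `arrival`; the network is polled once every poll_interval units. */
poll_stats simulate_polling(unsigned long total_work,
                            unsigned long arrival,
                            unsigned long poll_interval)
{
    poll_stats s = {0, 0, 0, 0};
    int pending = 0;

    assert(poll_interval > 0);   /* a zero interval is meaningless here */

    for (unsigned long t = 1; t <= total_work; t++) {
        if (t == arrival)
            pending = 1;                 /* message is now in the network */

        if (t % poll_interval == 0) {    /* time to poll */
            s.polls++;
            if (pending) {
                s.delay = t - arrival;   /* how long the message waited */
                s.handled = 1;
                pending = 0;
            } else {
                s.empty_polls++;         /* pure overhead */
            }
        }
    }
    return s;
}
```

With 1000 work units and a message arriving at unit 501, a poll interval of 10 gives 100 polls, 99 of them empty, and a delay of 9 units. An interval of 500 gives only 2 polls but a delay of 499 units. The second case resembles an application that can poll only between long library calls.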
Originally, active messages were designed for communication among the nodes of the same application. Because the message carries the actual start address of a function, and because protection and recovery mechanisms are nonexistent, active messages are unsuitable for use between arbitrary applications, servers, and machines.
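To make the concern concrete, the following sketch shows dispatch by raw function address within a single process. It is an illustration only: the packet layout and the names `am_packet`, `am_make`, `am_receive`, and `add_handler` are assumptions made for this example, not the interface of [100] or of any real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AM_MAX_PAYLOAD 32

/* Handler type: called with the payload of the message. */
typedef void (*am_handler)(const void *payload, size_t len);

/* Hypothetical packet layout: the sender chooses the handler address. */
typedef struct {
    am_handler    handler;                  /* raw start address */
    size_t        len;                      /* payload length    */
    unsigned char payload[AM_MAX_PAYLOAD];
} am_packet;

long am_sum = 0;   /* state updated by the example handler */

/* Example handler: adds the long carried in the payload to am_sum. */
void add_handler(const void *payload, size_t len)
{
    long v;
    if (len == sizeof v) {
        memcpy(&v, payload, sizeof v);
        am_sum += v;
    }
}

/* Sender side: build a packet that names the handler by address. */
am_packet am_make(am_handler h, const void *data, size_t len)
{
    am_packet p;
    assert(len <= AM_MAX_PAYLOAD);
    p.handler = h;
    p.len = len;
    memset(p.payload, 0, sizeof p.payload);
    memcpy(p.payload, data, len);
    return p;
}

/* Receiver side: jumps to whatever address the packet carries.
 * Nothing checks that the address is a valid handler on this node. */
void am_receive(const am_packet *p)
{
    assert(p->handler != NULL);
    p->handler(p->payload, p->len);
}
```

The address in the packet is meaningful only if sender and receiver run the same program image, so that `add_handler` resides at the same location on both nodes. A packet from an unrelated application would make `am_receive` jump to an arbitrary address.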
A new organization and application programming interface remedies the shortcomings of the original active message design [58], and tries to bring active messages into the mainstream. In principle, the receive part of an active message endpoint as described in [58] resembles a single-block Puma portal with an attached handler. The handler is executed when the data has been deposited into the portal. A single-block Puma portal consists of a memory descriptor specifying the start address and the length of a memory segment into which data is to be stored or from which it is to be retrieved.
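The structure just described can be sketched as follows. This is not the Puma portal interface; the type and function names (`mem_desc`, `portal`, `portal_deliver`, `count_bytes`) are hypothetical, and rejecting a message that does not fit the segment is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Memory descriptor: start address and length of a memory segment. */
typedef struct {
    void  *start;
    size_t length;
} mem_desc;

/* Handler attached to the portal; runs after the data is deposited. */
typedef void (*portal_handler)(void *start, size_t nbytes);

/* Hypothetical single-block portal with an optional attached handler. */
typedef struct {
    mem_desc       md;
    portal_handler handler;   /* NULL means: deposit data only */
} portal;

size_t bytes_seen = 0;   /* state updated by the example handler */

/* Example handler: records how many bytes have been deposited. */
void count_bytes(void *start, size_t nbytes)
{
    (void)start;
    bytes_seen += nbytes;
}

/* Deposit nbytes into the portal's segment, then run the handler.
 * Returns the number of bytes stored, or 0 if the message does not
 * fit (an assumption of this sketch). */
size_t portal_deliver(portal *p, const void *data, size_t nbytes)
{
    assert(p != NULL && p->md.start != NULL);

    if (nbytes > p->md.length)
        return 0;

    memcpy(p->md.start, data, nbytes);      /* data lands first ...   */
    if (p->handler != NULL)
        p->handler(p->md.start, nbytes);    /* ... then handler runs  */
    return nbytes;
}
```

A receiver would initialize a portal with `portal p = { { buf, sizeof buf }, count_bytes };` and each delivered message would then land in `buf` before `count_bytes` runs. Setting the handler to `NULL` in this sketch yields a pure data transfer with no handler invocation.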
This new model addresses security and flexibility concerns, but it offers only the traditional choices for handling incoming messages: polling, or interrupt-driven delivery with the handler in the user's address space.