next up previous contents
Next: Cost Model Up: Using Kernel Extensions to Decrease the Previous: Introduction

Key Ideas and Concepts

 

The candidate's ideas and insights for solving the problem and any preliminary results he may have obtained [55].

Based on the examples listed in the introduction, we define a common model that describes the execution path on a node. During a broadcast, each node receives a message that is to be distributed to its children in the fanout tree. (The originating node is at the root of the tree and generates the message.) Each node identifies its children when the broadcast message is received and then sends a copy of the message to each child. Therefore, for a broadcast a node needs to be able to receive a message, calculate the addresses of its children in the fanout tree, and send messages to them.

For a global sum, a node receives data from its children, sums it, adds its own contribution, and sends the result to the parent. Instead of sum, the operation can be any function, even one specified by the user.

The Cilk work-stealing example is, in principle, no different than the one above. A node receives a work-steal request, checks its task queue, and sends a message back to the originator. Checking the task queue, potentially removing an entry from the queue and sending a work order back, is slightly more complicated than determining the children in a fanout tree. Nevertheless, the basic principle remains the same. A node receives one or more messages, does some processing, and then, potentially, sends one or more messages. The routine performing the processing is called a handler.

Most remote operations in the Split-C runtime system require the update of a counter. A node receives a message, increments or decrements a counter and sends some data or an acknowledgment back to the originating node.

We should note that this work concentrates on efficient and quick processing of external, asynchronous events. Another approach is to have the user program or the runtime system occasionally check for new messages (polling). If a message has arrived, the corresponding handler is called. This approach works especially well on architectures where the network interface is mapped into the user's address space. The routine that polls for new messages can read the message and call the handler specified in the message. Assuming that the polling operation occurs with sufficient frequency, this avoids costly interrupts and allows for rapid dispatching of handlers.

On architectures where incoming messages generate interrupts and calls to the operating system kernel are necessary to send messages, the main advantage of polling disappears. Furthermore, for operations such as a broadcast or work-stealing where the currently running thread does not directly contribute (it is doing something else now), the response time is better and much more predictable, when an interrupt driven implementation is used.

There is another disadvantage of a polling implementation. Sometimes it is impossible to poll frequently enough. For example, during a long computation in a BLAS library call, the application writer cannot insert polling operations.

In Section 2.1 we will concentrate on a model that is common to the above examples and many others in daily use on high-performance systems. This section should convince us that executing a user defined handler in the kernel has performance advantages. In Section 2.2 we will look at different methods to execute untrusted code safely inside an operating system kernel.




next up previous contents
Next: Cost Model Up: Using Kernel Extensions to Decrease the Previous: Introduction

Rolf Riesen
Wed Jan 22 22:24:20 MST 1997