A Proposed Programming Language for Working with Rhythm

Douglas Eck
Departments of Cognitive Science and Computer Science
Indiana University

Introduction

I propose a programming language and underlying operating system, the Entrainment And Reaction Programming LangUaGe (EARPLUG), which makes it easy to implement stimulus and response models where the stimuli and responses are patterns in time. Since I care mainly about music and speech rhythm, EARPLUG need only respond to auditory events. However, it seems clear that such a system would be useful for modeling any cognitive phenomena that unfold in time.

Here is a wish list to guide the creation of EARPLUG:

  1. Programmers shouldn't be forced to make simplifying assumptions about the shape or density of input. Rather, they should be able to change the kind of input they're using without a lot of recoding. For example, they should be able to switch from constant-interval click tracks (as in Povel and Essens) to 44.1 kHz digital recordings of cafeteria noise without tearing apart a program.
  2. As with input, programmers should be able to switch easily between different kinds of output--everything from MIDI events for a soundcard to voltages for a robotic conductor's baton.
  3. The output should be available as input. For example, the voltages which drive a robotic conductor's baton might be relevant for input back into the system. Note that the time lag between producing output and receiving it back as input is a concern.
  4. The system should gracefully keep up with real time, even when processing loads become overwhelming. This may mean throwing away unprocessed information. The desired behavior can be compared to a pianist accompanying a performer: a good accompanist will keep up with a performer even when a mistake is made. That is, the accompanist won't try to go back and correctly replay erroneous passages.
  5. Also, the system should support multiple processing entities and should allow the simulation of parallel processing. Note that this seriously increases the complexity of real time processing.
  6. Programmers should not have to code commonly-used tools from scratch. Since there exists no complete theory of music or speech rhythm, we don't have a full list. However, some things are clearly helpful, for example DSP tools and a connectionist network toolkit. As new models and new methods come to light, corresponding toolkits can be developed.

With this wish list in mind, EARPLUG can be divided into four components. An outline of the components is followed by a more complete write-up of what each component will do. Some components have much longer write-ups than others because most of the important concepts of EARPLUG are embedded in the OS and the API.

  1. An Operating System (OS) that moves streams of information in real time. For the purposes of this discussion, the OS need do nothing more than shuttle information to and from the world. The shape of the information is not of concern to the OS; it need only keep track of one or more streams of bytes. However, the timing of the information is of central concern. The operating system should also support simulated parallel processing via threads. (Threads are processing entities which share common memory.)
  2. An Application Programming Interface (API) which allows low-level interaction with the OS. Specifically, the API should grant the programmer full access to the timing mechanisms of the OS. This would allow the programmer to change how the system responds in the event that processing demands outstrip processing power.
  3. A Programming Language built to work with the API. As far as I am concerned, this programming language can be Java and the API can be tailored to work naturally with Java.
  4. A Programmers Toolkit, written in the Programming Language, which provides the tools mentioned above (e.g. DSP tools). The toolkit, then, is a series of Java classes which exploit the API in useful ways. It can be thought of as a set of objects, each of which takes a stream as input and produces a stream as output.

Operating System

Most of the complexity of EARPLUG is in the operating system. A robust, flexible real time OS will deal with the hardest problem, that of gaining reasonable access to world time so that the system can react. To facilitate the kinds of temporal pattern processing necessary in rhythm cognition, the EARPLUG operating system has two important properties. First, as previously mentioned, it is a real time operating system. Second, since parallel processing is important in cognitive modeling, it is a multi-threaded operating system. These properties warrant separate discussion.

Real Time Processing in EARPLUG

The operating system's job is to facilitate the movement of streams of information in and out of the system in real time. This is no small job since the operating system must simultaneously service processes having nothing to do with rhythm. It must also handle interaction with input and output devices that introduce their own timing irregularities. These factors make it inevitable that processes which implement the cognitive model (called rhythm processes) will not sense events in the world as they happened, but rather with some amount of delay.

Consider how existing real-time OSs handle this problem of delay, or latency. Traditionally, most real time OSs (for example, an OS used to monitor engine parameters during lab tests of a diesel engine) solve the problem of latency either by doing everything in hardware, thereby removing the need for an OS at all, or by implementing some sort of cascaded interrupt scheme so that the OS can respond to appropriate tasks at appropriate times. To use the engine testing example, if oil pressure rises above a certain threshold, it may be vital to shut the engine down within some number of milliseconds. It is important, therefore, that the OS be able to complete a context switch to the appropriate process in a set amount of time. Once the context switch is completed, it is up to the programmer to write code which is fast enough to shut down the engine in time. The important point is that traditional real time systems are concerned with guaranteeing, at all costs, some well-defined maximum latency.

The goal of EARPLUG is different. Since EARPLUG is meant to run on a wide variety of platforms, I would prefer to allow a wide variety of latencies and build into the system a mechanism for working with whatever time quantization the hardware is able to provide. I would also prefer to forgo fixed-duration quantization. If the system is to react to input without some intermediate buffering and timestamping, then it is inevitable, for a variety of reasons, that the rhythm process will not see the data in perfect fixed-duration slices. For example, the rhythm process may get swapped out so that an operating system function can run. When the rhythm process gets swapped back in, the system shouldn't try to catch up with lost time but should rather throw away (or perhaps compress) data and pick up where it left off. This is similar to what a pianist would do if she made a mistake while performing a hard passage. Playing the passage over would interrupt the rhythmic flow of the piece; it's better to simply move on.
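
The discard behavior described above can be sketched in Java as a bounded buffer sitting between an input device and a rhythm process. This is only an illustration under stated assumptions: the class name DropOldestBuffer and its methods are hypothetical and not part of any EARPLUG specification, and dropping the oldest samples is just one possible policy (compressing the backlog would be another).

```java
import java.util.ArrayDeque;

/** Hypothetical bounded buffer that discards the oldest samples when the consumer falls behind. */
class DropOldestBuffer {
    private final ArrayDeque<Integer> samples = new ArrayDeque<>();
    private final int capacity;
    private long dropped = 0;

    DropOldestBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Called by the input side; never blocks, and drops the oldest sample when full. */
    synchronized void offer(int sample) {
        if (samples.size() == capacity) {
            samples.removeFirst();
            dropped++;
        }
        samples.addLast(sample);
    }

    /** Called by the rhythm process; returns null when nothing is waiting. */
    synchronized Integer poll() {
        return samples.pollFirst();
    }

    /** Number of samples thrown away so far. */
    synchronized long droppedCount() {
        return dropped;
    }
}

class DropOldestDemo {
    public static void main(String[] args) {
        DropOldestBuffer buffer = new DropOldestBuffer(3);
        // Simulate the rhythm process being swapped out while six samples arrive.
        for (int s = 1; s <= 6; s++) {
            buffer.offer(s);
        }
        // On resuming, only the three most recent samples (4, 5, 6) remain.
        System.out.println(buffer.poll() + " " + buffer.poll() + " " + buffer.poll());
        System.out.println("dropped: " + buffer.droppedCount());
    }
}
```

The input side never waits for the rhythm process, so a slow consumer loses old data instead of falling further behind world time.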

Multi-threading in EARPLUG

Since parallel processing is an important topic in cognitive modeling, a thread package is of great value in EARPLUG. Multi-threaded systems allow many low-overhead processing entities to be serviced simultaneously without the relatively large time loss incurred in servicing multiple processes. This is true because, unlike processes, the threads in a group share the same pool of main memory and so don't require as many pages of memory to be moved around. Also, multi-threaded systems lend themselves to multiple-processor configurations.

But threads come at some price, because they introduce their own timing complexities. Specifically, threads introduce another kind of latency which makes it even less plausible to ensure some fixed quantization. Threads are generally scheduled in a pseudo-stochastic fashion with such variables as thread queue priority and thread state (alive, dead, waiting for data) interacting to make synchronous thread servicing difficult to achieve. Even if there were some mechanism to make the servicing of the threads synchronous during the time when the thread process has the chip, there is still no way to guarantee synchronous servicing overall given that eventually some other process will be serviced, taking all threads off line for a moment. Therefore, different threads will have different amounts of time elapsed between servicing. For example, Thread A may be serviced with the following time differences (in microseconds): [12, 4321, 13, 12] and Thread B may have these time differences: [12, 14, 4353, 12]. Note that a large time delay, presumably due to a context switch, affected Thread A on a different servicing than it did Thread B. This would happen because Thread B got serviced once more than Thread A did before the context switch happened. The main observation is that in a multi-threaded system, each thread has a unique view of real time because each is scheduled in a pseudo-stochastic manner and each is affected differently by context switches.
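
As a small worked example, the two interval lists above can be examined directly. The sketch below (class and method names are illustrative only) finds which servicing was hit by the long delay in each thread and how much total time each thread has seen.

```java
class ThreadTimeViews {
    /** Returns the zero-based index of the largest interval, i.e. the servicing hit by the long delay. */
    static int indexOfLongest(long[] intervals) {
        int best = 0;
        for (int i = 1; i < intervals.length; i++) {
            if (intervals[i] > intervals[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Total time that has passed from this thread's point of view. */
    static long total(long[] intervals) {
        long sum = 0;
        for (long dt : intervals) {
            sum += dt;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Intervals between servicings, in microseconds, taken from the text.
        long[] threadA = {12, 4321, 13, 12};
        long[] threadB = {12, 14, 4353, 12};
        System.out.println("A: long delay at index " + indexOfLongest(threadA)
                + ", total " + total(threadA) + " us");
        System.out.println("B: long delay at index " + indexOfLongest(threadB)
                + ", total " + total(threadB) + " us");
    }
}
```

The long delay falls at index 1 for Thread A and index 2 for Thread B, and the totals differ (4358 versus 4391 microseconds), so the two threads do not share a common time grid.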

As was already suggested, the solution to this problem is the following: the operating system should be equipped to inform a real time thread exactly how much time has passed since it was last serviced. Once a thread is in a critical stage of processing (i.e. once it has committed to updating shared memory based on this reported time) the thread should be allowed some reasonable fixed amount of time to complete its processing without being swapped out. In this way, computation in the thread can take into account the fact that there may be slippages due to context switches and the like: unlike a traditional dynamical system, there would be no fixed Delta-T between each time step in the system, but rather a variable Delta-T which is supplied by the operating system. This scheme allows the rhythm threads to modify their computation accordingly as servicing times vary due to processing demands.
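
To make the variable Delta-T idea concrete, here is a minimal sketch of a phase oscillator whose update takes the elapsed time as an argument instead of assuming a fixed step. The oscillator model, the class name VariableStepOscillator, and the use of seconds as the time unit are assumptions chosen for illustration; they are not drawn from an EARPLUG specification.

```java
class VariableStepOscillator {
    private final double frequencyHz;
    private double phase; // measured in cycles, kept in [0, 1)

    VariableStepOscillator(double frequencyHz) {
        this.frequencyHz = frequencyHz;
    }

    /** Advance the oscillator by whatever time was reported for this servicing. */
    void advance(double deltaTSeconds) {
        phase += frequencyHz * deltaTSeconds;
        phase -= Math.floor(phase); // wrap back into [0, 1)
    }

    double phase() {
        return phase;
    }
}

class VariableStepDemo {
    public static void main(String[] args) {
        // Four uneven servicings that together cover 0.6 seconds of world time.
        double[] unevenSteps = {0.012, 0.321, 0.013, 0.254};
        VariableStepOscillator uneven = new VariableStepOscillator(2.0);
        for (double dt : unevenSteps) {
            uneven.advance(dt);
        }

        // One coarse servicing covering the same 0.6 seconds.
        VariableStepOscillator coarse = new VariableStepOscillator(2.0);
        coarse.advance(0.6);

        System.out.println("uneven steps phase: " + uneven.phase());
        System.out.println("single step phase:  " + coarse.phase());
    }
}
```

For this linear update the two oscillators end at the same phase up to floating-point rounding, whatever the step sizes. A nonlinear model would agree only approximately, with accuracy degrading as the steps grow coarser.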

Furthermore, this scheme helps EARPLUG be easily adapted to different machine configurations. If EARPLUG is run on a fairly slow or fairly busy machine, the threads will not get serviced very often. Therefore, the computation will be coarsely quantized in time. But, vitally, the computation will still keep up with world time. Even as hardware changes, this scheme of allowing a variable time quantization and reporting the local changes in that quantization to each thread will allow the system to meet its primary goal: EARPLUG will be equipped to react to the world as best it can.

An Application Programming Interface (API)

An API provides the interface through which a programmer sees the operating system. In a programming language like Java, which sits on top of a virtual machine, it's hard to see where the language ends and where the API begins. For example, in a language like C or C++, threads would be part of an external API package provided by the creators of an operating system. In Java, the Thread class comes built in. Since Java doesn't have (at least not now) a good set of real time tools, it will be necessary to provide an API that joins the EARPLUG real-time OS to the EARPLUG Java variant.

I believe the only relevant addition is a set of classes which exploit the real time, multi-threaded aspect of the system. Specifically I want the programmer to have the following control over threads:
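
As one hypothetical illustration of such a class, the sketch below approximates in standard Java the idea of telling a thread how much time has passed since it was last serviced. The names ElapsedTimeReporter and service are assumptions made for this example, not a specified EARPLUG API. Standard Java cannot guarantee that a thread will not be swapped out, so only the time-reporting part is shown. The clock is passed in as a parameter so that the behavior can be checked with a simulated clock.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical helper that reports the time elapsed since the previous servicing. */
class ElapsedTimeReporter {
    private final LongSupplier clockNanos;
    private long lastServiced;

    ElapsedTimeReporter(LongSupplier clockNanos) {
        this.clockNanos = clockNanos;
        this.lastServiced = clockNanos.getAsLong();
    }

    /** Runs the body, handing it the nanoseconds elapsed since the last call (or since construction). */
    void service(LongConsumer body) {
        long now = clockNanos.getAsLong();
        long elapsed = now - lastServiced;
        lastServiced = now;
        body.accept(elapsed);
    }
}

class ElapsedTimeDemo {
    public static void main(String[] args) throws InterruptedException {
        ElapsedTimeReporter reporter = new ElapsedTimeReporter(System::nanoTime);
        Thread rhythm = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                reporter.service(dt -> System.out.println("elapsed ns: " + dt));
                try {
                    Thread.sleep(5); // stand-in for being descheduled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        rhythm.start();
        rhythm.join();
    }
}
```

The reported values will vary from run to run and from machine to machine, which is exactly the irregularity a rhythm thread would have to accommodate.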

A Programming Language

As I mentioned before, Java is a good language and I see no reason not to use it. Java already offers good support for streams, file input and output, access to information on a network, object inheritance, function callbacks via object interfaces, exception handling and multi-threading. It is possible to build the toolkit (described below) entirely in Java.

One part of Java that deserves explicit mention is its support of streams. Java allows the creation of new kinds of streams of information and the transformation of one kind of stream into another. In EARPLUG this would allow a programmer to take, for example, RealAudio from the Internet as a stream, filter it appropriately using the DSP toolkit, and then convert it into a stream that can be used by EARPLUG.
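
A minimal sketch of stream transformation using the standard java.io.FilterInputStream class follows. The filter itself is a stand-in for a real DSP step: it assumes unsigned 8-bit PCM samples centered on 128 and simply halves their amplitude. The class name HalfGainInputStream and the sample format are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical filter: halves the amplitude of unsigned 8-bit samples around the midpoint 128. */
class HalfGainInputStream extends FilterInputStream {
    HalfGainInputStream(InputStream in) {
        super(in);
    }

    private static int attenuate(int sample) {
        return 128 + (sample - 128) / 2;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        return b < 0 ? b : attenuate(b); // pass end-of-stream (-1) through unchanged
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) {
            buf[i] = (byte) attenuate(buf[i] & 0xFF);
        }
        return n;
    }
}

class StreamFilterDemo {
    public static void main(String[] args) throws IOException {
        byte[] raw = {0, (byte) 128, (byte) 200, (byte) 255};
        try (InputStream in = new HalfGainInputStream(new ByteArrayInputStream(raw))) {
            int sample;
            while ((sample = in.read()) != -1) {
                System.out.print(sample + " "); // prints 64 128 164 191
            }
            System.out.println();
        }
    }
}
```

Because the filter is itself an InputStream, it can be wrapped around any other stream source, and further filters can be stacked on top of it in the same way.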

A Programmers Toolkit

As new models are proposed by the cognitive science community, EARPLUG should support fast, easy implementation of new toolkits. Some proposed toolkits were mentioned in the introduction. Here is a more complete description of one toolkit. This outline is meant to provide a sketch of how a toolkit would exploit the real time environment of EARPLUG.

A Connectionist Network Toolkit for EARPLUG

A connectionist network toolkit should support the creation of object-oriented real time connectionist nets. It needs to keep two communities of programmers happy. The first community requires a toolkit which supports fast creation of existing networks. To this end, the toolkit might provide pre-built simulations of existing networks, along with demos of some popular tasks for these networks.

The second community requires a toolkit which supports fast creation of new kinds of networks. To this end, the toolkit needs to provide a good framework for the development of new research models. Though tastes vary, I prefer a wholly object-oriented approach with plug-in modules. Furthermore, I want to push the core EARPLUG analogy of real-time information streaming through real-time filtering devices as deeply into the toolkit as possible. Accordingly, I want to think about models like connectionist networks in terms of streams and filters as much as possible. In other words, a connectionist network is a collection of nodes which are filters (entities which listen to and change information in a stream). Those streams include input and output streams as well as connections between nodes.
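
The node-as-filter view can be sketched in a few lines of Java. Everything here is an assumption made for illustration: the class names Connection and SigmoidNode, the logistic activation function, and the single-value connections. Real-time scheduling is left out so that the structure stays visible.

```java
import java.util.ArrayList;
import java.util.List;

/** A weighted connection, treated as a one-value stream from one node to another. */
class Connection {
    private final double weight;
    private double value;

    Connection(double weight) {
        this.weight = weight;
    }

    void write(double v) {
        value = v;
    }

    /** Returns the most recently written value scaled by the connection weight. */
    double read() {
        return weight * value;
    }
}

/** A node is a filter: it listens to its incoming connections and writes to its outgoing ones. */
class SigmoidNode {
    private final List<Connection> incoming = new ArrayList<>();
    private final List<Connection> outgoing = new ArrayList<>();
    private double activation;

    void addIncoming(Connection c) {
        incoming.add(c);
    }

    void addOutgoing(Connection c) {
        outgoing.add(c);
    }

    /** Sum the weighted inputs, squash with the logistic function, and pass the result on. */
    void update() {
        double net = 0.0;
        for (Connection c : incoming) {
            net += c.read();
        }
        activation = 1.0 / (1.0 + Math.exp(-net));
        for (Connection c : outgoing) {
            c.write(activation);
        }
    }

    double activation() {
        return activation;
    }
}

class NodeFilterDemo {
    public static void main(String[] args) {
        Connection input = new Connection(1.0); // stream from the outside world
        Connection link = new Connection(2.0);  // stream between the two nodes
        SigmoidNode first = new SigmoidNode();
        SigmoidNode second = new SigmoidNode();
        first.addIncoming(input);
        first.addOutgoing(link);
        second.addIncoming(link);

        input.write(0.0);
        first.update();
        second.update();
        System.out.println("first: " + first.activation() + ", second: " + second.activation());
    }
}
```

In a real-time version, each update call would presumably also receive the elapsed time reported for that servicing, so that a node's dynamics could take irregular scheduling into account.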

Even this sparse description contains a lot of information about how a toolkit for EARPLUG would work. The key is that a toolkit do a good job of taking advantage of the real-time streams of data and real-time processing threads acting on that data. In fact, there are plenty of good ways to implement a connectionist toolkit (as well as all the other toolkits) and EARPLUG supports their creation. The default EARPLUG system would contain one implementation and then other programmers could add to that class hierarchy by creating better ones. A cursory visit to Gamelan (www.gamelan.com), the worldwide repository for Java enhancements, will show that allowing other programmers to add functionality to a good base system (EARPLUG API) holds a lot of promise.

Conclusion

One part of the question not dealt with in this write-up of EARPLUG is that of programming environment. It is certainly important to have a good environment, both for creating programs and for working with patterns in time. Anyone who has worked in an overly-engineered graphical programming environment knows what it's like to yearn for a good text editor and a decent shell. Also, anyone who has had to deal with the headaches of wiring audio mixing boards, parsing MIDI files, and ensuring clean data acquisition will agree that these hardware and data issues are a large concern.

In closing I wish to note that EARPLUG doesn't solve these problems nor should it. Monolithic systems are destined to be unworkable, and accordingly I believe that it is enough to wed an operating system to a programming language in order to achieve some good real-time performance. Forcing users into an EARPLUG environment as well would be too constraining. Besides, since EARPLUG uses Java as its underlying language, it should be compatible with current Java development tools. Concerning input and output signal manipulation, I hope that EARPLUG will blend in well with labs that already have good configurations and will lend itself to elegant new configurations. Consider as a contrast the task of building a lab around a Sparc workstation running Sun Solaris with C++ or Java as the programming language. This system offers no default byte-level (MIDI) input devices and no elegant way to get real time signals into or out of the system. Furthermore, since the OS and the programming languages are separate, the programmer is left with the task of learning how to use the Sun-specific native I/O libraries. Though EARPLUG offers no new hardware alternatives, I hope that the system's solution to the problem of real time reaction provides a foundation which helps cognitive scientists bring more complex and interesting ideas to bear on issues involving time.


A bibliography is available
deck@indiana.edu
Last modified: Thu Apr 9 12:13:49 EST 1998