A Proposed Programming Language for Working with Rhythm
Douglas Eck
Departments of Cognitive Science and
Computer Science
Indiana University
Introduction
I propose a programming language and underlying operating system,
Entrainment And Reaction Programming LangUaGe (EARPLUG), which makes
it easy to implement stimulus and response models where the stimuli
and responses are patterns in time. Since I care mainly about music
and speech rhythm, EARPLUG need only respond to auditory
events. However, it seems clear that such a system would be useful for
modeling any cognitive phenomenon that unfolds in time.
Here is a wish list to guide the creation of EARPLUG:
- Programmers shouldn't be forced to make simplifying assumptions about the shape or
density of input. Rather, they should be able to change the kind of input they're
using without a lot of recoding. For example, they should be able to switch
from constant-interval click tracks (as in Povel and Essens) to 44.1 kHz
digital recordings of cafeteria noise without tearing apart a program.
- As with input, programmers should be able to switch easily between different
kinds of output--everything from MIDI events for a soundcard
to voltages for a robotic conductor's baton.
- The output should be available as input. For example, the voltages which
drive a robotic conductor's baton might be relevant for input back
into the system. Note that time lag is of concern.
- The system should gracefully keep up with real time, even when
processing loads become overwhelming. This may mean throwing away
unprocessed information. The desired behavior can be compared to a
pianist accompanying a performer: a good accompanist will keep up with a performer
even when a mistake is made. That is, the accompanist won't
try to go back and correctly replay erroneous passages.
- Also, the system should support multiple processing entities and should
allow the simulation of parallel processing. Note that this seriously
increases the complexity of real time processing.
- Programmers should not have to code commonly used tools from scratch. Since
there exists no complete theory of music or speech rhythm, we don't have a full list.
However, some things are clearly helpful:
  - DSP tools such as resampling algorithms, Fourier transforms, and FIR and IIR filters
  - Dynamical systems tools such as oscillators with
    plug-in coupling functions and driving functions
  - Connectionist network tools such as common learning algorithms
  - Robotics tools such as functions for interacting with a digital-to-analog
    robotic arm controller
As new models and new methods come to light, corresponding toolkits can
be developed.
With this wish list in mind, EARPLUG can be divided into four components. An outline
of the components comes first, followed by a fuller account of what each component
will do. Some components get much longer accounts than others because
most of the important concepts of EARPLUG
are embedded in the OS and the API.
- An Operating System (OS) that moves streams of
information in real time.
For the purposes of this discussion, the OS need do nothing more than
shuttle information to and from the world. The shape
of the information is not of concern to the OS; it need
only keep track of one or more streams of bytes. However, the
timing of the information is of central concern.
The operating system should also support simulated parallel processing
via threads. (Threads are processing entities which share common memory.)
- An Application Programming Interface (API) which allows low-level
interaction with the OS. Specifically, the API should
grant the programmer full access to the timing mechanisms of the OS. This
would allow the programmer to change how the system
responds in the event that processing demands outstrip processing power.
- A Programming Language built to work with the API. As far as I am
concerned, this programming language can be Java and the API can be
tailored to work naturally with Java.
- A Programmer's Toolkit, written in the Programming Language,
which provides the tools mentioned above (e.g. DSP tools). The toolkit,
then, is a series of Java classes which exploit the API in useful ways.
It can be thought of as a set of objects, each of which takes
a stream as input and produces a stream as output.
Operating System
Most of the complexity of EARPLUG is in the
operating system. A robust, flexible real time OS will deal with the
hardest problem, that of gaining reasonable access to world time, and
so enable the system to react. To facilitate the kinds
of temporal pattern processing necessary in rhythm cognition, the EARPLUG
operating system has two important properties. First, as previously
mentioned, it is a real time operating system. Second, since
parallel processing is important in cognitive modeling, it is a
multi-threaded operating system. These properties
warrant separate discussion.
Real Time Processing in EARPLUG
The operating system's job is to facilitate the movement of streams of
information in and out of the system in real time. This is no small
job since the operating system must simultaneously service processes
having nothing to do with rhythm. It must also handle interaction with
input and output devices that introduce their own timing
irregularities. These factors make it inevitable that processes which
implement the cognitive model (called rhythm processes) will
not sense events in the world as they happened, but rather with some
amount of delay.
Consider how existing real-time OSs handle this problem of delay, or
latency. Traditionally, most real time OSs (for example, an OS
used to monitor engine parameters during lab tests of a diesel engine)
solve the problem of latency in one of two ways: either by doing
everything in hardware, which removes the need for an OS at all, or by
implementing some sort of cascaded interrupt scheme so that the OS can
respond to appropriate tasks at appropriate times. To use the engine testing
example, if oil pressure rises above a certain threshold, it may be
vital to shut the engine down within some number of milliseconds. It
is important, therefore, that the OS have the ability to complete a
context switch to the appropriate process in a set amount of
time. Once the context switch is completed, it is up to the programmer
to write code which is fast enough to shut down the engine in time.
The important point is that traditional real time systems are
concerned with guaranteeing at all costs some well-defined maximum latency.
The goal of EARPLUG is different. Since EARPLUG is meant to run on a
wide variety of platforms I would prefer to allow a wide variety of
latencies and build into the system a mechanism for working with
whatever time quantization the hardware is able to provide. I would
also prefer to forgo fixed-duration quantization. If the system is to
react to input without some intermediate buffering and timestamping,
then it is inevitable, for a variety of reasons, that the rhythm
process will not see the data in perfect fixed-duration slices. For
example, the rhythm process may get
swapped out so that an operating system function can run. When the
rhythm process gets swapped back in, the system shouldn't try to catch
up with lost time but rather throw away (or perhaps compress) data and
pick up where it left off. This is similar to what a pianist would do
if she made a mistake while performing a hard passage. Playing the
passage over would interrupt the rhythmic flow of the piece; it's
better to simply move on.
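One way to picture the "throw away rather than catch up" behavior is a bounded buffer that evicts its oldest contents whenever the reader falls behind. The following is only a minimal sketch under that assumption: the class name DropOldestBuffer, its capacity parameter, and the choice to drop the oldest samples (rather than compress them) are illustrative and are not part of any existing EARPLUG design.

```java
import java.util.ArrayDeque;

/** Hypothetical bounded buffer: when the reader falls behind, the oldest samples are discarded. */
class DropOldestBuffer {
    private final ArrayDeque<Integer> samples = new ArrayDeque<>();
    private final int capacity;
    private long dropped = 0;

    DropOldestBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Called by the input side; never blocks, and evicts the oldest sample when full. */
    synchronized void write(int sample) {
        if (samples.size() == capacity) {
            samples.removeFirst();
            dropped++;
        }
        samples.addLast(sample);
    }

    /** Called by the rhythm process; returns null when nothing is waiting. */
    synchronized Integer read() {
        return samples.pollFirst();
    }

    /** Number of samples thrown away so far. */
    synchronized long droppedCount() {
        return dropped;
    }
}

class DropOldestDemo {
    public static void main(String[] args) {
        DropOldestBuffer buffer = new DropOldestBuffer(3);
        // The rhythm process is "swapped out" while five samples arrive.
        for (int i = 1; i <= 5; i++) buffer.write(i);
        // On resuming, it picks up at sample 3; samples 1 and 2 are gone.
        System.out.println(buffer.read() + " dropped=" + buffer.droppedCount()); // prints "3 dropped=2"
    }
}
```

The writer never waits for the reader, so input keeps pace with world time; the cost is that the reader must tolerate gaps, which the dropped count makes visible.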
Multi-threading in EARPLUG
Since parallel processing is an important topic in cognitive modeling,
a thread package is of great value in EARPLUG. Multi-threaded systems
allow many low-overhead processing entities to be serviced
simultaneously without the relatively large time loss incurred in
servicing multiple processes. This is true because, unlike processes,
a group of threads share the same pool of main memory and so don't
require as many pages of memory to be moved around. Also,
multi-threaded systems lend themselves to multiple-processor
configurations.
But threads come at some price, because they introduce their own
timing complexities. Specifically, threads introduce another kind of
latency which makes it even less plausible to ensure some fixed
quantization. Threads are generally scheduled in a pseudo-stochastic
fashion with such variables as thread queue priority and thread state
(alive, dead, waiting for data) interacting to make synchronous thread
servicing difficult to achieve. Even if there were
some mechanism to make the servicing of the threads synchronous while
the threaded process holds the CPU, there would still be
no way to guarantee synchronous servicing overall, given that
eventually some other process will be serviced, taking all threads off
line for a moment. Therefore, different threads will have different
amounts of time elapsed between servicing. For example, Thread A may
be serviced with the following time differences (in microseconds):
[12, 4321, 13, 12] and Thread B may have these time differences: [12,
14, 4353, 12]. Note that a large time delay, presumably due to a
context switch, affected Thread A on a different servicing than it did
Thread B. This would happen because Thread B got serviced once
more than Thread A did before the context switch happened. The main
observation is that in a multi-threaded system,
each thread has a unique view of real time because each is
scheduled in a pseudo-stochastic manner and each is affected
differently by context switches.
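The arithmetic behind the Thread A and Thread B example can be made explicit by summing each thread's time differences. This is a small sketch, assuming (for illustration only) that both threads start counting from the same instant; the class and method names are hypothetical.

```java
import java.util.Arrays;

class ThreadTimeline {
    /** Converts the gaps between servicings into the elapsed time at each servicing. */
    static long[] cumulative(long[] gapsMicros) {
        long[] out = new long[gapsMicros.length];
        long t = 0;
        for (int i = 0; i < gapsMicros.length; i++) {
            t += gapsMicros[i];
            out[i] = t;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] threadA = {12, 4321, 13, 12}; // microseconds, from the example above
        long[] threadB = {12, 14, 4353, 12};
        System.out.println("A: " + Arrays.toString(cumulative(threadA))); // A: [12, 4333, 4346, 4358]
        System.out.println("B: " + Arrays.toString(cumulative(threadB))); // B: [12, 26, 4379, 4391]
    }
}
```

At its second servicing Thread A is already 4333 microseconds into the run while Thread B, at its second servicing, is only 26 microseconds in, so a model that assumed "one servicing equals one time step" would put the two threads thousands of microseconds apart.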
As was already suggested, the solution to this problem is the
following: the operating system should be equipped to inform a real
time thread exactly how much time has passed since it was last
serviced. Once a thread is in a critical stage of processing
(i.e. once it has committed to updating shared memory based on this
reported time) the thread should be allowed some reasonable fixed
amount of time to complete its processing without being swapped out.
In this way, computation in the thread can take into account the fact
that there may be slippages due to context switches and the
like: unlike a traditional dynamical system, there would be no fixed
Delta-T between each time step in the system, but rather a
variable Delta-T which is supplied by the operating system.
This scheme allows the rhythm threads to modify their computation accordingly
as servicing times vary due to processing demands.
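The variable Delta-T idea can be sketched with a simple phase oscillator whose update takes the elapsed time as an argument instead of assuming a fixed step. Everything here is an assumption made for illustration: the class VariableStepOscillator is hypothetical, and System.nanoTime() on a standard JVM merely stands in for the per-thread elapsed time that the EARPLUG operating system would report.

```java
/** Hypothetical phase oscillator advanced by whatever Delta-T the caller reports. */
class VariableStepOscillator {
    private final double frequencyHz;
    private double phase = 0.0; // measured in cycles, kept in [0, 1)

    VariableStepOscillator(double frequencyHz) {
        this.frequencyHz = frequencyHz;
    }

    /** Advances the phase by frequency * deltaT; deltaT may differ on every call. */
    void step(double deltaTSeconds) {
        phase += frequencyHz * deltaTSeconds;
        phase -= Math.floor(phase); // wrap back into [0, 1)
    }

    double phase() {
        return phase;
    }
}

class VariableStepDemo {
    public static void main(String[] args) throws InterruptedException {
        VariableStepOscillator osc = new VariableStepOscillator(2.0);
        long last = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(10 + 15 * (i % 2)); // uneven servicing, as after a context switch
            long now = System.nanoTime();
            double deltaT = (now - last) / 1e9; // stand-in for the OS-supplied Delta-T
            last = now;
            osc.step(deltaT);
            System.out.printf("dt=%.4f s  phase=%.4f%n", deltaT, osc.phase());
        }
    }
}
```

Because the update scales with the reported interval, four uneven steps of 0.25, 0.5, 0.125 and 0.125 seconds leave a 2 Hz oscillator in the same phase as a single step of one second; only the temporal resolution of the trajectory changes, not its relation to world time.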
Furthermore, this scheme helps EARPLUG be easily adapted to different
machine configurations. If EARPLUG is run on a fairly slow or fairly
busy machine, the threads will not get serviced very often. Therefore,
the computation will be coarsely quantized in time. But, vitally, the
computation will still keep up with world time. Even
as hardware changes, this scheme of allowing a variable time
quantization and reporting the local changes in that quantization to
each thread will allow the system to meet its primary goal: EARPLUG
will be equipped to react to the world as best it
can.
An Application Programming Interface (API)
An API provides the interface through which a programmer sees the
operating system. In a programming language like Java, which
sits on top of a virtual machine, it's hard to see where the API starts
and where the language begins. For example, in a language like C or C++
threads would be part of an external API package provided by the
creators of an operating system. In Java, the Thread class comes built
in. Since Java doesn't have (at least not now) a good set of real time
tools, it will be necessary to provide an API that joins the EARPLUG real-time
OS to the EARPLUG Java variant.
I believe the only relevant addition is a set of classes which exploit the
real time, multi-threaded aspect of the system. Specifically I want the
programmer to have the following control over threads:
- Programmable granularity of temporal processing
Programmers should
be able to override the operating system's default behavior,
which is simply to process as many threads as possible, and instead kick into a
more traditional mode that guarantees a certain maximum latency. This may mean
that a program asks for finer granularity than the OS can provide.
In this case the OS should kill the process and exit with some metric of required versus
available processing power.
- A mode allowing simulated time processing
It should be easy to take the system out
of real time by simply changing the type of underlying processing thread. Clearly, not
all cognitive modeling requires on-line real time processing and EARPLUG should
support turning this off. This simply means that the operating
system should ensure that (a) all processing is completed (all threads are waiting)
before advancing an input or output stream and that (b) the computation is known
to be off-line so that no attempt is made by the threads to gauge world time.
- Programmable priority of processing
Like a standard thread package, it
should be possible to mark certain threads as being more important than others.
- Thread-level access to hardware
Java doesn't support access to hardware, at least not
very well. I feel that access to hardware should happen on the thread level. The API should
support the creation of something like an AcmeRoboticArmThread which has access to
an A-to-D card running the ACME robotic arm. This idea of thread-level hardware support
has great advantages because it provides direct access to hardware without using
inelegant hacks to work around the Java virtual machine.
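Two of the controls listed above, the guaranteed-latency mode and the simulated-time mode, can be sketched in plain Java. This is a rough illustration, not a proposed API: ServiceClock, RealTimeClock, SimulatedClock, GranularityException, TimingPolicy and RhythmLoop are all hypothetical names, and an exception carrying the two numbers stands in for the OS killing the process and reporting required versus available power.

```java
/** Hypothetical source of "time since last servicing" for a rhythm thread. */
interface ServiceClock {
    /** Microseconds elapsed since the previous call (or since creation, on the first call). */
    long elapsedMicros();
}

/** Real-time mode: elapsed time comes from the machine's monotonic clock. */
class RealTimeClock implements ServiceClock {
    private long last = System.nanoTime();

    public long elapsedMicros() {
        long now = System.nanoTime();
        long elapsed = (now - last) / 1_000;
        last = now;
        return elapsed;
    }
}

/** Simulated mode: every servicing reports the same nominal interval, whatever the load. */
class SimulatedClock implements ServiceClock {
    private final long stepMicros;

    SimulatedClock(long stepMicros) {
        this.stepMicros = stepMicros;
    }

    public long elapsedMicros() {
        return stepMicros;
    }
}

/** Raised when a program asks for finer granularity than the platform can provide. */
class GranularityException extends RuntimeException {
    final long requestedMicros;
    final long availableMicros;

    GranularityException(long requestedMicros, long availableMicros) {
        super("requested " + requestedMicros + " us, best available " + availableMicros + " us");
        this.requestedMicros = requestedMicros;
        this.availableMicros = availableMicros;
    }
}

class TimingPolicy {
    /** Fails, reporting required versus available latency, if the request cannot be met. */
    static void requireMaxLatency(long requestedMicros, long availableMicros) {
        if (availableMicros > requestedMicros) {
            throw new GranularityException(requestedMicros, availableMicros);
        }
    }
}

class RhythmLoop {
    /** Runs a number of servicings and returns how much time the model believes has passed. */
    static long run(ServiceClock clock, int servicings) {
        long modelTimeMicros = 0;
        for (int i = 0; i < servicings; i++) {
            modelTimeMicros += clock.elapsedMicros();
        }
        return modelTimeMicros;
    }
}
```

With this shape, switching between on-line and off-line processing amounts to passing a different clock: RhythmLoop.run(new SimulatedClock(500), 4) always reports 2000 microseconds of model time, while the same call with a RealTimeClock reports whatever the machine actually took.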
A Programming Language
As I mentioned before, Java is a good language and I see no reason not to use it.
Java already offers good support for streams, file input and output, access to information
on a network, object inheritance, function callbacks via object interfaces, exception
handling and multi-threading. It is possible to build the toolkit (described below)
entirely in Java.
One part of Java that deserves explicit mention is its support of
streams. Java allows the creation of new kinds of streams of
information and the transformation of one kind of stream into
another. In EARPLUG this would allow a programmer to take, for
example, RealAudio from the Internet as a stream, filter it
appropriately using the DSP toolkit, and then cast it into a stream
that can be used by EARPLUG.
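Java's standard java.io.FilterInputStream class is the usual way to turn one kind of stream into another. The sketch below wraps any InputStream and keeps every n-th byte; the class DecimatingInputStream is hypothetical, and this naive decimation (with no low-pass filtering beforehand) is chosen only to keep the example short, not as a recommended resampling method.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical filter that keeps every n-th byte of the wrapped stream. */
class DecimatingInputStream extends FilterInputStream {
    private final int factor;

    DecimatingInputStream(InputStream in, int factor) {
        super(in);
        if (factor < 1) throw new IllegalArgumentException("factor must be >= 1");
        this.factor = factor;
    }

    /** Returns the next kept byte, then skips factor - 1 bytes of the underlying stream. */
    @Override
    public int read() throws IOException {
        int kept = in.read();
        for (int i = 1; i < factor && kept != -1; i++) {
            if (in.read() == -1) break; // underlying stream ended while skipping
        }
        return kept;
    }

    /** Bulk read built on the single-byte read so that decimation is applied consistently. */
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        while (count < len) {
            int b = read();
            if (b == -1) break;
            buf[off + count++] = (byte) b;
        }
        return count == 0 ? -1 : count;
    }
}

class DecimatingDemo {
    public static void main(String[] args) throws IOException {
        byte[] samples = {0, 1, 2, 3, 4, 5, 6};
        try (InputStream in = new DecimatingInputStream(new ByteArrayInputStream(samples), 3)) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.print(b + " "); // prints "0 3 6 "
            }
        }
    }
}
```

Because the filter is itself an InputStream, it can be stacked on a file, a network socket, or another filter without the downstream code knowing where the bytes came from.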
A Programmer's Toolkit
As new models are proposed by the cognitive science community, EARPLUG
should support fast, easy implementation of new toolkits. Some proposed toolkits
were mentioned in the introduction. Here is a more complete description of one toolkit.
This outline is meant to provide a sketch of how a toolkit would exploit the real time
environment of EARPLUG.
A Connectionist Network Toolkit for EARPLUG
A connectionist network toolkit should support the creation of object-oriented
real time connectionist nets. It needs to keep two communities of programmers happy.
The first community requires a toolkit which supports fast creation of existing
networks. To this end, the toolkit might provide the following pre-built simulations:
- Simple Recurrent Networks (Elman Nets)
- Jordan Nets
- Oscillator Nets
The toolkit should also include demos of some popular tasks for these networks:
- Word prediction tasks
- Temporal XOR
- Beat tracking of Povel and Essens-style patterns
The second community requires a toolkit which supports fast creation of
new kinds of networks. To this end, the toolkit needs to provide a good framework
for the development of new research models. Though tastes vary, I prefer
a wholly object-oriented approach with plug-in modules. Furthermore, I want
to push the core EARPLUG analogy of real-time information
streaming through real-time filtering devices as deeply into the toolkit
as possible. Accordingly I want to think about models like connectionist networks
in terms of streams and filters as much as possible:
Class Network extends Object
(A network is a top-level object in the system.)
Class Connection extends Stream.
(A connection is a stream.)
Class OneWayConnection extends Connection
(One way connections are one-way streams.)
Class SymmetricalConnection extends Connection
(Symmetrical connections are two-way streams.)
Class Node extends Thread implements StreamFilter
(Nodes are filters. Instead of thinking of
them as containing input functions, activation functions, and output functions they are
thought of as entities which change the properties of the stream(s) which flow through them.)
Class InputNode extends Node
(Input nodes have the ability to transform different
kinds of input streams (e.g. RealAudio) into the kind of stream expected by the
rest of the nodes.)
Class OutputNode extends Node
(Output nodes have the ability to transform internal
streams into streams for driving such things as robotic arms and MIDI sound cards.)
In other words, a connectionist network is a collection of nodes which are filters (entities which listen
to and change information in a stream). Those streams include input and output streams as well
as connections between nodes.
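Part of this hierarchy can be sketched in ordinary Java to show a node acting as a filter on a stream. The sketch makes several assumptions: Connection is built on a BlockingQueue rather than on an EARPLUG Stream class, only the one-way case is shown, NaN is used as an end-of-stream marker, and LogisticNode is an invented example subclass.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical one-way connection: a stream of activation values between two parties. */
class Connection {
    private final BlockingQueue<Double> queue = new LinkedBlockingQueue<>();

    void send(double value) {
        queue.add(value);
    }

    /** Marks the end of the stream; NaN serves as the sentinel in this sketch. */
    void close() {
        queue.add(Double.NaN);
    }

    /** Blocks until a value arrives; returns NaN once the stream has been closed. */
    double receive() throws InterruptedException {
        return queue.take();
    }
}

/** A filter changes the values flowing through it. */
interface StreamFilter {
    double filter(double input);
}

/** A node is a thread that reads one connection, filters each value, and writes another. */
abstract class Node extends Thread implements StreamFilter {
    private final Connection in;
    private final Connection out;

    Node(Connection in, Connection out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            for (double v = in.receive(); !Double.isNaN(v); v = in.receive()) {
                out.send(filter(v));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            out.close(); // propagate end-of-stream downstream
        }
    }
}

/** Example node whose filter is the logistic squashing function. */
class LogisticNode extends Node {
    LogisticNode(Connection in, Connection out) {
        super(in, out);
    }

    public double filter(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }
}

class NetworkSketchDemo {
    public static void main(String[] args) throws InterruptedException {
        Connection input = new Connection();
        Connection output = new Connection();
        Node node = new LogisticNode(input, output);
        node.start();
        input.send(0.0);
        input.close();
        node.join();
        System.out.println(output.receive()); // prints 0.5
    }
}
```

A new kind of unit is added by subclassing Node and supplying a different filter method, which is the plug-in style described above; input and output nodes would additionally convert between external formats and the internal stream type.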
Even this sparse description contains a lot of information about how a
toolkit for EARPLUG would work. The key is that a toolkit do a good
job of taking advantage of the real-time streams of data and real-time
processing threads acting on that data. In fact, there are plenty of
good ways to implement a connectionist toolkit (as well as all the
other toolkits) and EARPLUG supports their creation. The default
EARPLUG system would contain one implementation and then other
programmers could add to that class hierarchy by creating better ones. A
cursory visit to Gamelan
(www.gamelan.com), the worldwide repository for Java enhancements,
will show that allowing other programmers to add functionality to a
good base system (EARPLUG API) holds a lot of promise.
Conclusion
One part of the question not dealt with in this write-up of EARPLUG is
that of programming environment. It is certainly important to have a
good environment, both for creating programs and for working with
patterns in time. Anyone who has worked in an overly engineered
graphical programming environment knows what it's like to yearn for a
good text editor and a decent shell. Also, anyone who has had to deal
with the headaches of wiring audio mixing boards,
parsing MIDI files, and ensuring clean data acquisition
will agree that these hardware and data issues are a large
concern.
In closing I wish to note that EARPLUG doesn't solve these problems,
nor should it. Monolithic systems are destined to be unworkable, and
accordingly I believe that it is enough to wed an operating system to
a programming language in order to achieve some good real-time
performance. Forcing users into an EARPLUG environment as well would
be too constraining. Besides, since EARPLUG uses Java as its
underlying language, it should be compatible with current Java
development tools. Concerning input and output signal manipulation, I
hope that EARPLUG will blend in well with labs that already have good
configurations and will lend itself to elegant new configurations. Consider as
a contrast the task of building a lab around a Sparc workstation
running Sun Solaris with C++ or Java as the programming language.
This system offers no default byte-level (MIDI) input devices and no
elegant way to get real time signals into or out of the
system. Furthermore, since the OS and the programming languages are
separate, the programmer is left with the task of learning how to use
the Sun-specific native I/O libraries. Though EARPLUG offers no new
hardware alternatives, I hope that the system's solution to the
problem of real time reaction provides a foundation which helps
cognitive scientists bring more complex and interesting ideas to
bear on issues involving time.
A bibliography is available
deck@indiana.edu
Last modified: Thu Apr 9 12:13:49 EST 1998