Email: jhorey AT cs.unm.edu

Random Notes

Here I keep random notes and thoughts. Why not write these thoughts on paper? Well, mainly because paper is relatively expensive compared to most digital media. Also, if I publish these thoughts online, I don't need to worry about keeping and organizing my papers all the time. Admittedly, it is difficult to publish stuff to the web when you don't have a computer handy... but I try to avoid those situations completely :)

08/28/03

Read/skimmed a paper called:

  • Title: Key Message Approach to Optimize Communication of Parallel Applications on Clusters
  • Authors: Ming Zhu, Wentong Cai, Bu-Sung Lee
  • Journal: Cluster Computing 6, 2003
Summary:

The paper describes a method of reducing the communication overhead of network transmissions by prioritizing messages. To do this, they reduce an application to a "task graph" (a weighted DAG), where each vertex represents a task performed on a cluster node and the directed edges represent communication between tasks. They then attempt to optimize the path taken through this graph so that "more important" messages are processed first.
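The summary above doesn't pin down how the paper ranks messages. A common way to decide which messages in a weighted DAG are "more important" is the critical-path (bottom-level) heuristic: a task's level is the longest weighted path from it to an exit task, and a message inherits the level of the task it feeds. The C sketch below assumes that heuristic; it is not necessarily the key-message algorithm from the paper, and the five-task graph and its weights are made up.

```c
#define NTASKS 5

/* Hypothetical task graph: cost[i] is the computation weight of task i,
   comm[i][j] > 0 is the weight of the message sent from task i to task j. */
static const int cost[NTASKS] = {2, 3, 4, 1, 2};
static const int comm[NTASKS][NTASKS] = {
    /* 0 */ {0, 1, 5, 0, 0},
    /* 1 */ {0, 0, 0, 2, 0},
    /* 2 */ {0, 0, 0, 0, 3},
    /* 3 */ {0, 0, 0, 0, 1},
    /* 4 */ {0, 0, 0, 0, 0},
};

static int blevel_memo[NTASKS];
static int blevel_done[NTASKS];

/* Bottom level: longest weighted path from task t to an exit task, counting
   task costs and message weights. Memoized recursion; valid because the
   graph is acyclic. */
int blevel(int t)
{
    if (blevel_done[t])
        return blevel_memo[t];
    int best = 0;
    for (int s = 0; s < NTASKS; s++) {
        if (comm[t][s] > 0) {
            int len = comm[t][s] + blevel(s);
            if (len > best)
                best = len;
        }
    }
    blevel_memo[t] = cost[t] + best;
    blevel_done[t] = 1;
    return blevel_memo[t];
}

/* A message from src to dst is ranked by how much work still depends on it:
   its own weight plus the bottom level of the receiving task. */
int message_priority(int src, int dst)
{
    return comm[src][dst] + blevel(dst);
}
```

Here message 0->2 (priority 14) outranks 0->1 (priority 10) because task 2 sits on the longer remaining path, which is the kind of reordering the key-message approach is after.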

Questions:

Their framework uses this task graph to prioritize messages, but can we avoid such a transformation if the applications are capable of prioritizing their own messages? What type of queues does GM/Portals normally employ? What about applying key-message approach to GM/Portals? Compare Sockets-GM with/out key-message, etc. Also randomly... if they've implemented Sockets/IP over GM, what about Sockets/IP over Portals?

09/02/03

Just some random notes while I'm reading about factotum.

  • Factotum is a separate program from the application and kernel, so that upgrades/fixes to the security protocols can be made without recompiling/relinking the applications. This means that factotum holds the authentication keys for the user's applications. The keys are initially held in a "secure store" -> "secstore". So for instance, if I keep my email server key in the secstore, factotum just asks for my login once, retrieves all the keys, and allows my mail client to work without ever knowing about keys at all!
  • Factotum is implemented as a file server (/mnt/factotum)! Like other Plan 9 services, we access factotum's services by manipulating "virtual" files. The kernel has a factotum process owned by the "host owner" user. However, each local user can have their own factotum process, which doesn't even need to reside on the same machine.
  • Keys are stored as plain UTF-8 text made of name=value pairs. A particular key is selected by giving factotum a query. For instance, a query may look like "server=silla.com proto=pop3 user=? !password=?". The proto attribute specifies which factotum module handles this type of query, and a specific key is chosen accordingly.
  • If we write the message "private" to a process's /proc/pid/ctl file, the memory of that process is no longer accessible through the /proc directory. Also, if we write the message "noswap" to the ctl file, that process won't be swapped out (so you can't look at its memory on the hard drive). These messages exist so that nobody can read the unencrypted factotum keys, either in memory or on disk.
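Since keys are just space-separated name=value attributes, matching a query against a key is mostly string work. The sketch below is a hypothetical, simplified matcher, not factotum's actual code, and it follows the `name=?` notation used in the query above (real factotum syntax may differ): `name=?` only requires the attribute to be present, while `name=value` must match exactly. A name with a leading `!` (the secret-attribute convention in `!password`) is simply treated as part of the name.

```c
#include <string.h>

/* Hypothetical helper: find attribute `name` in a key string such as
   "proto=pop3 server=silla.com user=gre !password=secret".
   Copies its value into out (outsz must be > 0) and returns 1, or returns 0. */
int key_get(const char *key, const char *name, char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    const char *p = key;
    while (*p != '\0') {
        p += strspn(p, " ");                 /* skip separators */
        size_t toklen = strcspn(p, " ");     /* length of this name=value token */
        if (toklen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = toklen - nlen - 1;
            if (vlen >= outsz)
                vlen = outsz - 1;            /* truncate to fit */
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += toklen;
    }
    return 0;
}

/* Returns 1 if every attribute in the query is satisfied by the key. */
int key_matches(const char *key, const char *query)
{
    char tok[128], val[128];
    const char *p = query;
    while (*p != '\0') {
        p += strspn(p, " ");
        if (*p == '\0')
            break;
        size_t toklen = strcspn(p, " ");
        if (toklen >= sizeof tok)
            return 0;
        memcpy(tok, p, toklen);
        tok[toklen] = '\0';
        p += toklen;

        char *eq = strchr(tok, '=');
        if (eq == NULL)
            return 0;                        /* malformed query attribute */
        *eq = '\0';                          /* tok is now the name, eq + 1 the value */
        if (!key_get(key, tok, val, sizeof val))
            return 0;                        /* attribute missing from the key */
        if (strcmp(eq + 1, "?") != 0 && strcmp(eq + 1, val) != 0)
            return 0;                        /* concrete value must match exactly */
    }
    return 1;
}
```

Against the key "proto=pop3 server=silla.com user=gre !password=secret", the query from the bullet above matches, while "server=other.com proto=pop3" does not.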

09/04/03

Just some more notes about Plan 9 authentication...

  • Authentication works in two layers. First, a process must communicate with its factotum using factotum's file interface. For instance, to create a key, a user writes a "key command" to /mnt/factotum/ctl. The actual authentication goes through a file called the "rpc" file: programs authenticate with factotum by writing requests to the rpc file and reading back the replies. Here is a typical "RPC transaction". Remember that the actual authentication protocol/algorithm is handled by factotum. In an RPC transaction, we are simply asking factotum to run a particular authentication protocol for us.
    • Here, a mail client attempts to connect to a mail server using POP3's APOP challenge-response. First, the mail server contacts its factotum to obtain its "challenge".
      (S -> Fs): start proto=apop role=server
      (Fs -> S): ok
      (S -> Fs): read
      (Fs -> S): ok +OK POP3 challenge

      Now the server greets the client with the "challenge" it received.
      (S -> C): +OK POP3 challenge

      Now the client takes the challenge and produces a response.
      (C -> Fc): start proto=apop role=client server=foo.com
      (Fc -> C): ok
      (C -> Fc): write +OK POP3 challenge
      (Fc -> C): ok
      (C -> Fc): read
      (Fc -> C): ok APOP gre response

      Now the client has an APOP response that he can give back to the server.
      (C -> S): APOP gre response

      Finally, the server can take the response and give it to its factotum for final verification.
      (S -> Fs): write APOP gre response
      (Fs -> S): ok
      (S -> Fs): read
      (Fs -> S): ok +OK welcome
      (S -> C): +OK welcome
    • The actual authentication protocol that 9P uses involves the authentication file. This follows from two main design goals of 9P authentication. First, the designers sought to remove the details of authentication from the 9P protocol. Second, they sought to allow an external program (i.e., factotum) to handle the authentication part of the protocol.
    • Just a quick note: the protocol I am about to describe is a newer version. There exists an older 9P authentication protocol that resembles Kerberos, but since I don't care about that old stuff, I won't mention it here.
      Also note: even though Plan 9 is heavily file/message based, the messages tend to be complicated, so system calls were designed to format them for us. For instance, the "mount" system call really just creates and formats the "mount" 9P transaction.
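To make the shape of the rpc file concrete, here is a toy, in-process stand-in for the server-side factotum in the APOP transcript above: `start` selects the protocol, `read` returns the next message to send to the peer, and `write` hands factotum the peer's message. Everything here is hypothetical (the struct, the function, the hard-coded challenge and response strings). Real factotum is a file server under /mnt/factotum, and a real APOP response is an MD5 digest of the challenge and the shared secret, which this sketch does not compute.

```c
#include <stdio.h>
#include <string.h>

/* Phases mirror the requests the mail server writes to the rpc file. */
enum phase { IDLE, SEND_CHALLENGE, WAIT_RESPONSE, SEND_RESULT, DONE };

struct fake_factotum {
    enum phase phase;
    int verified;
};

/* Handle one rpc request and put the reply (e.g. "ok", "ok +OK welcome")
   in reply. Returns 0 on success, -1 for a request out of sequence. */
int rpc(struct fake_factotum *f, const char *req, char *reply, size_t n)
{
    if (strcmp(req, "start proto=apop role=server") == 0 && f->phase == IDLE) {
        f->phase = SEND_CHALLENGE;
        snprintf(reply, n, "ok");
    } else if (strcmp(req, "read") == 0 && f->phase == SEND_CHALLENGE) {
        f->phase = WAIT_RESPONSE;
        snprintf(reply, n, "ok +OK POP3 challenge");
    } else if (strncmp(req, "write ", 6) == 0 && f->phase == WAIT_RESPONSE) {
        /* A real factotum would recompute the digest from the stored key. */
        f->verified = strcmp(req + 6, "APOP gre response") == 0;
        f->phase = SEND_RESULT;
        snprintf(reply, n, "ok");
    } else if (strcmp(req, "read") == 0 && f->phase == SEND_RESULT) {
        f->phase = DONE;
        snprintf(reply, n, f->verified ? "ok +OK welcome" : "error bad response");
    } else {
        snprintf(reply, n, "error out of sequence");
        return -1;
    }
    return 0;
}
```

Feeding it the server's four requests from the transcript yields "ok", "ok +OK POP3 challenge", "ok", and "ok +OK welcome" in turn; any request out of order gets an error.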

09/05/03

Took a little break from Plan 9 today and read some introductory webpages on the L4 microkernel and microkernels in general. Some interesting things done with L4 include L4Linux, which implements a binary-compatible version of Linux on top of the L4 microkernel. Overall, it seems that the performance hit was small (5%-10%) compared to "native" Linux. However, it should be noted that the "platform-independent" parts of Linux weren't touched in this port. If more of Linux were modified to take advantage of L4, performance could have been even better. Other projects include something called "SawMill Linux", which is attempting to build a very configurable OS.

Questions:
Implementing Plan 9 on top of L4?
Can we take a "SawMill" approach and build kernel modules for L4?
Would this lead to a lightweight kernel?

09/12/03

Here we continue Plan 9's authentication protocol. Since 9P is a file service protocol, they just created a new type of file that is served for authentication. This file is called the "authentication file". Initially, all clients can do is open an authentication file by sending a special message to the file server. This message is generated by the system call:

  • afd = fauth(int fd, char *aname)
Note that "fd" is simply the file descriptor for the ESTABLISHED network connection to the file server, and "afd" is the file descriptor for the authentication file. Once you have "afd", you can read and write it like a regular file, which is how the actual authentication protocol is carried out. Once you are authenticated, you can use "afd" as evidence of authority in the "mount" system call.

Once you receive "afd", you usually pass that file to factotum to begin authentication. There are several advantages to using an "authentication file".

  1. Since we use only regular reads/writes, an outside agent (i.e., factotum) can negotiate the security.
  2. The authentication file (once authenticated) can be treated like a capability. The user can give it to another user/process to grant it the same permissions. The user can also keep that file until later so they don't have to authenticate again, or just delete it to ensure no one can use that authentication.
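The capability idea in point 2 can be modeled in a few lines. This is a hypothetical in-memory simulation, not Plan 9's kernel or 9P code: `toy_fauth` hands out a fresh authentication file, writing the right secret to it marks it authenticated, and `toy_mount` refuses any auth file that has not completed the protocol. Because the check is on the file rather than on the caller, whoever holds an authenticated descriptor can use it, and removing it revokes the authority for everyone. The secret "sesame" is made up.

```c
#include <string.h>

#define MAX_AFD 8

/* Hypothetical model of the authentication files held by a file server. */
struct authfile {
    int in_use;
    int authenticated;
};

static struct authfile afiles[MAX_AFD];

/* Like fauth(): allocate a new, unauthenticated auth file (-1 if none free). */
int toy_fauth(void)
{
    for (int i = 0; i < MAX_AFD; i++) {
        if (!afiles[i].in_use) {
            afiles[i].in_use = 1;
            afiles[i].authenticated = 0;
            return i;
        }
    }
    return -1;
}

/* Stand-in for running the real protocol through reads/writes on afd. */
int toy_auth_write(int afd, const char *secret)
{
    if (afd < 0 || afd >= MAX_AFD || !afiles[afd].in_use)
        return -1;
    afiles[afd].authenticated = strcmp(secret, "sesame") == 0;
    return afiles[afd].authenticated ? 0 : -1;
}

/* Like mount(fd, afd, ...): the auth file itself is the proof of authority. */
int toy_mount(int afd)
{
    if (afd < 0 || afd >= MAX_AFD)
        return -1;
    return (afiles[afd].in_use && afiles[afd].authenticated) ? 0 : -1;
}

/* Deleting the auth file revokes it for every holder. */
void toy_remove(int afd)
{
    if (afd >= 0 && afd < MAX_AFD)
        afiles[afd].in_use = 0;
}
```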

09/14/03

The L4Ka architecture reminds me of the architecture behind Puma. Is it possible to implement Puma on top of L4Ka? What about implementing Portals on top of L4? Something along these lines is implementing a persistence server on top of L4Ka, which could come in handy (Grasshopper: an orthogonally persistent operating system, Computing Systems, 1994).

There are still some performance issues with L4 (see The Performance of micro-kernel Based Systems) versus "native" Linux kernels. The gap isn't large, but I wonder what differences there would be between L4-Puma and "native" Puma?

Currently reading "On micro-Kernel Construction" to see if I can figure this all out. Need to read:

  • Puma overview
  • How to build servers on L4Ka

09/17/03

I've finished the Puma Overview paper. I am beginning to read the L4 system programmer's manual to see how difficult it is to program server threads for the kernel. I was able to successfully boot the L4Ka u-kernel by compiling it with g++-3.2 and modifying my grub file to point to the correct place in the filesystem. If I want a Puma-like OS on top of L4, there will be three main obstacles:

  1. Portals
  2. The Q-kernel
  3. The Process Control Thread (PCT)

L4Ka-User's Manual: Chapter 2 IPC

Threads in L4 tend to be lightweight. This means they can be created and destroyed quickly. All IPC between threads obeys the Chief/Clan security model (is this still valid for the Version 4 API??). Two important notes:

  1. IPC is synchronized
  2. IPC is unbuffered
This basically means both sender and receiver need to expect and accept messages. The receiver then can create its own buffer for the data received.

There are three system calls for IPC (send, wait, and a combined send/wait).

  • l4_arch_ipc_send
  • l4_arch_ipc_wait
  • l4_arch_ipc_call

First we shall cover how messages can be passed to other threads. There are three ways to send data:

  1. Inline by-value data (uses first 8 registers)
  2. Strings
  3. Virtual memory mapping (by-reference,using fpages)
Essentially all messages contain an inline portion called the "short message". If we choose to send only register data, we can omit the message buffer. Otherwise, the rest of the message is called the "long message". We can differentiate between these cases by the "message descriptor", which we pass to the system call.

The "message descriptor" does one of two things:

  1. State that there is no long message
  2. Point to the address of the long message buffer and state whether fpages are used

Now the format of the "long message" is as follows:

  1. Message Header
  2. Mwords (fpage descriptors + by-value data)
  3. String dopes (indirect data)

The Message Header's role is to describe the format of the long message. I'm not quite sure what mwords are (maybe because I'm not familiar enough with fpages?). String dopes simply describe the size of a string and the location of the string (or empty buffer) in memory. Really a dope has two parts: 1) the sending string 2) the receiving string buffer. Note that the message header specifies the number of string dopes.

The "timeout" constant is passed to the system call to control how long to wait to send or receive a message. If a certain amount of time passes before a message can be sent, the operation is aborted. Also you can specify what happens if an interrupt occurs during the IPC.

The "Result Status" is a message dope that gives information on the result of your IPC. It can return strings and an error value, where 0 means success and any other value means failure (similar to Unix errno).

Inline vs Strings

Inline must be aligned and copied to a buffer (not sure why if we are just copying to the register though). It can be faster because we can transfer via registers and we don't need to construct a string-dope. So it is better overall for smaller messages.

Strings avoid extra copying and don't need to be aligned. However, we must make sure that the receiver's buffer is at least as large as the sender's string. We must also ensure that the receiver expects the same number of strings the sender plans on sending.
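Both rules (a large enough receive buffer and matching string counts) can be checked before any copying happens. The struct below is modeled on the four-word X.0 string dope (send size and address, receive size and address) as I understand it; the field names and the `copy_strings` helper are hypothetical stand-ins for what the kernel does during the transfer, not the L4Ka API.

```c
#include <stddef.h>
#include <string.h>

/* Modeled on an X.0 string dope: where the outgoing string lives, and where
   the receiver wants an incoming string placed. */
struct strdope {
    size_t snd_size;
    const char *snd_str;
    size_t rcv_size;
    char *rcv_str;
};

/* Hypothetical kernel-side transfer from the sender's dopes into the
   receiver's dopes. Fails without copying anything if the counts differ
   or any receive buffer is too small. */
int copy_strings(const struct strdope *snd, size_t nsnd,
                 struct strdope *rcv, size_t nrcv)
{
    if (nsnd != nrcv)
        return -1;                        /* receiver expects a different number of strings */
    for (size_t i = 0; i < nsnd; i++)
        if (rcv[i].rcv_size < snd[i].snd_size)
            return -1;                    /* receive buffer too small */
    for (size_t i = 0; i < nsnd; i++)
        memcpy(rcv[i].rcv_str, snd[i].snd_str, snd[i].snd_size);
    return 0;
}
```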

Concept -> System Call

  • dest -> The identity of the thread we are sending our message to
  • snd_msg -> This is the message descriptor
  • snd_reg -> This is the short part of the message
  • timeout -> Obviously the timeout
  • result -> The structure that holds our result status
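Because IPC is synchronous and unbuffered, a send only completes if the destination starts waiting for it before the timeout runs out; otherwise nothing is queued and the sender just gets an error. The single-threaded simulation below models that rendezvous rule using the dest, snd_reg, timeout, and result parameters above; snd_msg is left out because it only sends a short message (the "no long message" case). The names (`toy_send`, `struct toy_thread`, the result codes) are hypothetical, and time is counted in abstract ticks rather than real L4 timeouts.

```c
#include <string.h>

#define SHORT_WORDS 2           /* short-message words; the notes give 2 for X.0 (8 on L4/MIPS) */

enum { IPC_OK = 0, IPC_TIMEOUT = 1 };   /* hypothetical result codes */

struct toy_thread {
    int enters_wait_at;         /* tick at which this thread calls wait; -1 = never */
    unsigned long mr[SHORT_WORDS];
    int has_msg;
};

struct toy_result {
    int error;
};

/* Rendezvous rule: deliver only if the receiver starts waiting within
   `timeout` ticks. On failure the message is simply not sent. */
void toy_send(struct toy_thread *dest, const unsigned long snd_reg[SHORT_WORDS],
              int timeout, struct toy_result *result)
{
    if (dest->enters_wait_at >= 0 && dest->enters_wait_at <= timeout) {
        memcpy(dest->mr, snd_reg, sizeof dest->mr);
        dest->has_msg = 1;
        result->error = IPC_OK;
    } else {
        result->error = IPC_TIMEOUT;
    }
}
```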

We should note that when sending an IPC we must know the thread ID of the receiving thread. Sigma0 is special, so it has a thread ID of "SIGMA0_TID" defined in the sigma0.h file. Otherwise we may have to maintain a name server that maps strings to thread IDs both within and between tasks. However, when we are communicating inside a task, we most likely already know the thread IDs.

What the hell is nondeceiving/deceiving IPC? It is when a thread sends a message claiming that the message is from somebody else. For instance, if a chief relays a message from a member of another clan, it can send a deceiving IPC to pretend to be the original sender. That way the receiving thread doesn't know the message was relayed.

Here are some thread related system calls:

  • task_new - This is used to create a new task
  • lthread_ex_regs - This is used to switch instruction and stack pointer along with exception handler and pager
  • thread_switch - used to release the CPU and donate a time slice to another thread
  • l4_thread_schedule - used to change timeslice and priority of a thread
  • id_nearest - given a thread ID, it returns the thread ID of the actual thread the message gets to first (Chief/Clan)
  • fpage_unmap - Used to unmap or make read-only fpages mapped to address spaces

Thread Scheduling

Three components: 1) timeslice 2) priority 3) maximum controlled priority (MCP). Threads of higher priority will always run before threads of lower priority. The MCP is used to define who can manipulate the timeslice and priority of another thread. Specifically thread "A" must meet the following specifications to manipulate thread "B". A.mcp >= B.priority AND A.mcp >= B.new_priority. Another thing to note is that the MCP is TASK specific, not thread specific.
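The MCP rule is a two-part comparison, and since the MCP belongs to the task rather than the thread, the check reads it through the thread's task. A minimal sketch; the struct layout is hypothetical.

```c
/* Per-task maximum controlled priority and per-thread priority. */
struct task   { int mcp; };
struct thread { const struct task *task; int priority; };

/* A may change B's priority to new_priority only if A's task MCP is at
   least both B's current priority and the requested one. */
int may_set_priority(const struct thread *a, const struct thread *b, int new_priority)
{
    return a->task->mcp >= b->priority && a->task->mcp >= new_priority;
}
```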

When Sigma0 starts up all the initial servers (by looking at the boot image), the servers automatically get the highest possible MCP. All other tasks then get an MCP no greater than that of their initial server.

Using L4

Ok now we know how to create threads, schedule threads, and get threads to talk to each other. Here we will cover how to start L4 and talk about the initial stuff L4 does. Then we can talk about writing new servers for L4. Still need to learn more about pagers and exception handlers though :)

Sigma0

Sigma0 is the default pager and exception handler for all initial servers. It does NOT however allocate stack space for any of the initial servers. The servers must do that on their own. The initial servers are known by Sigma0 because they are marked in a special place in the kernel image. (I believe different implementations of L4 have a different kernel image format; generally though it is like header->kernel->initial servers->clients).

All initial servers can ask for certain resources on a first-come first-serve basis from Sigma0. These include

  • physical frames
  • device addresses
  • interrupts
  • tasks

Initially, the servers (or a single paging server) should consume all the memory provided by Sigma0 in order to hand it out to other clients. There is some memory, though, that is reserved by Sigma0, such as the "Kernel Information Page" (KIP) and the "DIT" (this is L4/MIPS specific though... don't know about the x86 stuff). The KIP area includes the kernel, kernel data, and miscellaneous information. The DIT part of reserved memory just describes the kernel boot image (I'm not sure how this is handy for user-level servers or clients BTW). It consists of an initial header followed by 0+ initial server headers.

The KIP can be mapped read-only into several servers by the server sending a short-message IPC to Sigma0 with an invalid memory address (defined as a constant in sigma0.h) in the first register. Something similar can be done for the DIT. Another way to get the DIT is to first map the KIP and then simply access the DIT memory without mapping it. This causes a page fault, and Sigma0 automatically maps that memory into the server. (Note that you can't do this with non-KIP/DIT memory because some other server may have already mapped that memory and this would cause a livelock. The DIT can be mapped into multiple servers, so it's ok.) Finally, mapping a device is similar: you pass the invalid address in the first register (defined in some header file, hopefully) and the device address in the second register (this address is outside the RAM range).

Page Faults and Pagers

A page fault occurs when a task tries to access memory that is not mapped into its address space. A pager is a thread that is called when a page fault occurs to determine what is to be done (often the pager just maps the page containing the faulting address). When a page fault occurs, the kernel takes the fault and performs a deceiving IPC to the pager of the faulting thread; it is deceiving in that the IPC appears to come from the faulting thread. The pager then replies with the appropriate fpages. However, that reply is actually intercepted by the kernel, which does the actual mapping and restarts the faulting thread with the new mapping.

Sigma0 is the root pager since it initially controls the entire address space. A user written pager must simply wait for an IPC (the page fault IPC), get the correct fpages for itself, and then send out another IPC to the faulting thread with those fpages.
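Stripped of the IPC, a pager's job is: take the faulting address, pick a frame, and answer with a mapping for the page that contains the address. The simulation below models one address space as a flat page table with a trivial frame allocator; the 4 KB page size, the table, the frame pool address, and `handle_fault` are assumptions for illustration, not the X.0 pager protocol itself.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assume 4 KB hardware pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     16                   /* tiny toy address space */

/* page_table[v] holds the physical frame mapped at virtual page v, or 0. */
static uint32_t page_table[NPAGES];
static uint32_t next_frame = 0x100000;  /* hypothetical pool of free frames */

/* What a pager does on a fault IPC: round the address down to its page,
   hand out a fresh frame if the page is unmapped, and "reply" by installing
   the mapping. Returns the mapped frame, or 0 if the address is out of range. */
uint32_t handle_fault(uint32_t fault_addr)
{
    uint32_t vpage = fault_addr >> PAGE_SHIFT;
    if (vpage >= NPAGES)
        return 0;
    if (page_table[vpage] == 0) {
        page_table[vpage] = next_frame;
        next_frame += PAGE_SIZE;
    }
    return page_table[vpage];
}
```

Two faults in the same page (0x1234 and 0x1fff) get the same frame, while a fault at 0x3000 gets the next one.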

Exception handling is treated similarly to page faults. However, if the exception handler never replies to the thread that caused the exception, that thread will hang forever (it's essentially killed).

Any thread can be an interrupt handler. The thread simply must accept IPCs from a particular interrupt number with a zero timeout. Interrupt handlers are assigned on a first-come first-serve basis. However, it is possible to disassociate a thread from an interrupt (not sure how to do this... the manual says to associate with a "NULL" interrupt, but I am not sure how this disassociates from specific interrupts instead of ALL interrupts). Since it is first-come first-serve, the initial servers should all have first (maybe only) pick.

OS's and more OS's on top of L4

Since an "OS" just consists of multiple (or single sometimes) servers sitting on top of L4, it is possible to have multiple OS "personalities" running simultaneously. We can even have a "System Call server" that takes a system call from a client (really it is just a library call that produces an IPC) and redirects that system call to an appropriate OS personality. The OS personality can then do its thing and send an IPC back to the client. All these IPC's can be deceiving so that the client thinks it only spoke to the System-Call server, and the OS personality only thinks it spoke with the client :) This trickery is what the book calls "RPC Redirection".
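RPC redirection boils down to a dispatch table in the System Call server: look up which personality a client belongs to, forward the request there, and pass the reply back. The sketch below replaces the IPCs with plain function calls, and the personalities, syscall numbers, and client-to-personality table are all invented for the example.

```c
/* Each OS personality exposes one entry point; in L4 this would be an IPC. */
typedef long (*personality_fn)(int sysno, long arg);

/* Two made-up personalities that implement syscall 1 differently. */
long posix_personality(int sysno, long arg) { return sysno == 1 ? arg + 1 : -1; }
long plan9_personality(int sysno, long arg) { return sysno == 1 ? arg * 2 : -1; }

/* Hypothetical routing table: which personality serves which client. */
#define NCLIENTS 3
static personality_fn client_personality[NCLIENTS] = {
    posix_personality, plan9_personality, posix_personality,
};

/* The System Call server: the client only ever talks to this function, and
   the personality only ever sees (sysno, arg). With deceiving IPCs, neither
   side would learn that the request was relayed. */
long syscall_server(int client, int sysno, long arg)
{
    if (client < 0 || client >= NCLIENTS)
        return -1;
    return client_personality[client](sysno, arg);
}
```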

When an OS personality starts, it should do the following:

  • Register all free interrupts
  • Request all free memory from Sigma0
  • Request pages containing device addresses (unless some initial server acts as a device driver)
  • Set up memory management data structures
  • Acquire all inactive tasks
  • Set up other server threads
  • Set up device drivers
  • Set up service data structures (TCB, file systems, etc)
I am not sure about some of this stuff. For instance, if more than one OS personality exists, how do you handle conflicts between them? Give one OS priority over another? Also, shouldn't the OS personalities try to share as many servers as possible, such as the file system?

Two things should be noted: 1) Since the OS personality is just a user-level task, it can be interrupted and preempted, so all OS data structures need concurrency control. 2) All initial servers must have special startup code that initializes their stack and stack pointer. Also, it is possible to place client code within the kernel boot image, so that the OS will have some client to run after booting. One more thing... all client code should know the thread ID of the System-Call server, or else it won't be able to make system calls! Either the server that creates new client tasks gives this ID to the client, or the client can send everything to Sigma0 and let the Chief/Clan model handle it.

09/22/03

Besides taking a short break from reading the L4 User's Manual, I have discovered something called "lpIP". It is a small, portable TCP/IP stack that I think I can use for L4! Yay, I no longer have to rip apart the BSD code or rely on IBM!

New game plan:

  • Compile and boot L4Env with Pistachio/Fiasco
  • Write a simple application that uses L4Env
  • Start porting lpIP to use L4Env (especially the semaphore library) and L4
  • What about a NIC driver for L4???
  • Write an application to use the network stack
  • Start on porting Portals...
Always keep in mind that it is the network stack that absolutely must run well!

09/23/03

L4Env consists of a set of servers and libraries that interact with those servers and the L4 kernel. One of those servers is a "file server" called tftp that loads files from the network. Currently there are several ethernet drivers written for tftp. I should investigate whether we can use tftp's network code instead of lpIP, and/or use tftp's ethernet drivers for lpIP. At least we can look at the drivers as a tutorial to implement our own if necessary. Also, it should be noted that L4Env's semaphore library applies only to threads within a task. Inter-task locking is not implemented. Will this be a problem for lpIP?

Another thing: L4Env only comes with "fprov-14" as a file provider for local filesystems. It is actually an L4Linux program that asks L4Linux for the file. This of course means that L4Linux must be installed and included as part of the L4Env library. This is obviously bad, and a new file provider for local hard disks must be written. Fortunately, somebody has already written a read-only ext2 file server, which we can probably adapt for L4Env. Finally, the nameserver that comes with L4Env is not hierarchical. We should replace it with a better nameserver.

Currently only the x86 architecture is supported by L4Env. I am not sure which servers/libraries are architecture dependent though.

09/25/03

Quick note: Pistachio supports the following architectures: IA32, IA64, PowerPC-32, Alpha, and MIPS. They are planning to support AMD64, Power4, ARM, and UltraSparc. SMP and NUMA support is experimental at this stage for the IA32 and IA64 architectures. Possible project: get SMP support for AMD64?? BTW, on 32-bit architectures a dword is 32 bits, while on a 64-bit architecture a dword is 64 bits.

Since I have finished reading the L4 user's manual, I am now reading the L4 X.2 reference guide to see what changes have been made. Really I should construct a chart of system calls and conventions for both X.0 and X.2 so that I won't get confused. It might also be nice to have a table contrasting the Pistachio and Fiasco implementations.

First, a huge difference. In X.0, we had to perform an IPC with a special invalid address to get the KIP mapped into a thread's address space. However, in X.2 the KIP is automatically mapped when a new address space is created, and the creator of the new task can specify where it is mapped. A new system call exists for threads to get access to the KIP. The X.2 KIP is upward compatible with the X.0 KIP.

Apparently the X.2 KIP is a little different in function than the X.0 KIP. I need to read a little more about this.

Another big difference. In X.0 and API version 2.0, we had two message registers that we could use (I believe L4/MIPS allowed you to use 8 registers for the short message). However, in X.2 the Pistachio whitepaper states you can use 64 message registers! This can potentially reduce the cost of many IPCs.

Also something I am not completely knowledgeable about... apparently in X.2, task/address-space management and thread management have been separated. Threads are no longer tied as tightly to address spaces and can migrate much more easily between them. This of course also means the security model has somehow been updated. Do they still use Chief/Clan?

Related to this, X.2 no longer has a fixed limit on the number of threads in an address space. Of course, in X.0 all the threads (except lthread0) existed as inactive threads and we just "woke" them up. What is the equivalent now?

In X.2, threads have both a global and a local ID. Global IDs are unique throughout the entire system, while local IDs are unique within an address space. A global ID consists of two sections: the thread number and a version number. Oddly enough, the version number has no built-in semantics within the kernel; it can be used for any purpose. A local ID is distinguished from a global ID by its lower 6 bits being 0; the upper bits hold the actual thread number. Both IDs are encoded in a single word. There are three special thread IDs: 1) nilthread 2) anythread 3) anylocalthread.
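Here is a sketch of packing and unpacking these IDs, assuming the 32-bit field widths as I read them in the X.2 reference: an 18-bit thread number above a 14-bit version for global IDs, and a 26-bit number above six zero bits for local IDs. Double-check those widths against the spec before relying on them; the helper names are my own.

```c
#include <stdint.h>

/* Assumed 32-bit X.2 layout:
   global ID = thread number (upper 18 bits) | version (lower 14 bits)
   local  ID = local number (upper 26 bits)  | 000000 (lower 6 bits)   */
#define VERSION_BITS 14
#define VERSION_MASK ((1u << VERSION_BITS) - 1)
#define LOCAL_SHIFT  6
#define NILTHREAD    0u
#define ANYTHREAD    0xFFFFFFFFu

uint32_t make_global(uint32_t thread_no, uint32_t version)
{
    return (thread_no << VERSION_BITS) | (version & VERSION_MASK);
}

uint32_t global_thread_no(uint32_t id) { return id >> VERSION_BITS; }
uint32_t global_version(uint32_t id)   { return id & VERSION_MASK; }

/* A local ID is recognized by its lower 6 bits being zero, which is why a
   global ID's version must not have its lower 6 bits all zero. */
int is_local(uint32_t id)
{
    return id != NILTHREAD && (id & ((1u << LOCAL_SHIFT) - 1)) == 0;
}
```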

The Clock

The clock is a 64-bit word (struct Clock) that can be read with the SystemClock system call.

Virtual Registers

The X.2 API (and consequently Pistachio) has these things called "Virtual Registers". These objects are abstractions over hardware registers and possibly memory (if there are more virtual registers than hardware registers). Also it is important to note that virtual registers cannot be accessed across address spaces (I am not sure then how inter-task communication is done via registers). There are three types of virtual registers.

  1. Thread Control Registers
  2. Message Registers
  3. Buffer Registers

Thread Control Registers

Contains static control information about the thread. This includes stuff like the pager and exception handler. New system call: ExchangeRegisters. Can be used to exchange registers with another thread or change a single thread's registers. One thing to note: when a thread is created (via ThreadControl system call) it immediately expects an IPC from its pager specifying the IP and SP.

There are two special fpages: 1) the entire address space 2) the nilpage. Just keep in mind an fpage just describes a region in the address space. Fpages are an abstraction over hardware pages. The smallest possible fpage is often the size of the smallest possible hardware page. An fpage can be mapped with the read/write/execute bits on or off, provided the mapper itself holds those rights.
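In L4 an fpage covers 2^s bytes and its base must be aligned to that size, which makes validity and containment checks cheap bit operations. The sketch below models only that idea plus rights masking; the struct, the 4 KB minimum, and the rights constants are assumptions, not the X.2 bit encoding. It also assumes log2size is smaller than the word width, so the special "entire address space" fpage is not representable here.

```c
#include <stdint.h>

/* Conceptual fpage: a 2^log2size region whose base is aligned to its size,
   plus access rights. This models the idea only, not the X.2 encoding. */
enum { FP_R = 4, FP_W = 2, FP_X = 1 };

struct fpage {
    uintptr_t base;
    unsigned  log2size;     /* region covers 2^log2size bytes */
    unsigned  rights;       /* combination of FP_R, FP_W, FP_X */
};

#define MIN_LOG2SIZE 12     /* assume 4 KB is the smallest hardware page */

int fpage_valid(const struct fpage *f)
{
    uintptr_t size = (uintptr_t)1 << f->log2size;
    return f->log2size >= MIN_LOG2SIZE && (f->base & (size - 1)) == 0;
}

/* Unsigned wraparound makes addresses below base fail the comparison too. */
int fpage_contains(const struct fpage *f, uintptr_t addr)
{
    return addr - f->base < ((uintptr_t)1 << f->log2size);
}

/* A mapping can keep or drop rights, never add ones the mapper lacks. */
unsigned mapped_rights(const struct fpage *src, unsigned requested)
{
    return src->rights & requested;
}
```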

Here's a funny system call: SpaceControl. It is used to control address spaces. Oh, apparently address spaces do not have IDs anymore (what happened to task IDs?). So they use a "SpaceSpecifier" as the ID of an address space (really it's just a thread ID of a non-active thread). I am not sure how this works out. Other interesting system calls include ProcessorControl and MemoryControl, which, as the names suggest, control the processor and memory (fpage properties).

It seems the IPC code has been cleaned up a bit. Specifically, they added 64 message registers so most short-to-medium length messages can be sent much faster. It should be noted that once a message is read from a register, it should be considered gone. The first message register (MR0, the Message Tag) states how many typed and untyped words the message contains, along with some flags.
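As I read the X.2 reference, MR0 packs a label, some flags, and the counts of typed (t) and untyped (u) words into one word; on a 32-bit machine the layout is label in bits 31-16, flags in bits 15-12, t in bits 11-6, and u in bits 5-0. The helpers below encode and decode that layout. Treat the bit positions as my assumption and check them against the spec or Pistachio's headers before relying on them.

```c
#include <stdint.h>

/* Assumed 32-bit X.2 MR0 layout:
   bits 31..16 label | 15..12 flags | 11..6 t (typed words) | 5..0 u (untyped words) */
uint32_t make_tag(uint32_t label, uint32_t flags, uint32_t t, uint32_t u)
{
    return ((label & 0xFFFF) << 16) | ((flags & 0xF) << 12)
         | ((t & 0x3F) << 6) | (u & 0x3F);
}

uint32_t tag_label(uint32_t tag)   { return tag >> 16; }
uint32_t tag_flags(uint32_t tag)   { return (tag >> 12) & 0xF; }
uint32_t tag_typed(uint32_t tag)   { return (tag >> 6) & 0x3F; }
uint32_t tag_untyped(uint32_t tag) { return tag & 0x3F; }

/* MR0 plus u untyped words plus t typed words must fit in 64 registers. */
int tag_fits(uint32_t tag)
{
    return 1 + tag_untyped(tag) + tag_typed(tag) <= 64;
}
```

For example, one MapItem (two typed words) plus three untyped words gives t = 2, u = 3, using MR0 through MR5.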

Thankfully IPC has been simplified greatly if we want to map/grant an fpage or send strings. We can send MapItem, GrantItem, and/or StringItems. All three of them are two word structs, so that one of them takes up 2 message registers. However StringItems can consist of a linked list of "substrings" (compound strings), so that they may take up more than 2 message registers.

Obviously with these new structs and message format, the IPC system call has been changed a little.

09/30/03

Buffer Registers hold StringItems for the receiving end. So the first string received is placed in the string buffer specified in BR1, etc.

Question: If you look at the GRUB menu.lst file used to boot Pistachio, it appears as though the root server and Sigma0 are not differentiated from the clients (Pingpong). This kinda makes sense considering everything is user level (whether it's a "server" or a "client"). However, the kernel must differentiate between normal "user servers" and "privileged servers". I am not sure how this is done...

10/01/03

Now that I am *done* looking through the manual and specification, I am going to start trying to implement some servers. Unfortunately, it appears that most examples on how to set up a multi-server OS personality uses either Fiasco or Hazelnut, and not Pistachio. This could be because Pistachio is still pretty new. As such I will use Hazelnut for the time being. Once I get a handle on that, I can begin porting my servers to Pistachio.

Here are my short term goals:

  • What is IDL4 used for?
  • Is it necessary and/or helpful?
  • Create multiple servers using ChacmOS and IDL4
  • Understand what the Makefile for ChacmOS is doing
  • Are there any servers in ChacmOS we don't need?

Here's what I'm learning about the build environment in ChacmOS. Each server is contained within its own directory at the top level. Within each directory are three files: 1) main.c 2) calls.c 3) server.c. Besides these files, there is also an associated "IDL" file called server.idl in the idl directory at the top level.

main.c

main.c contains the code that actually does the work. So for instance, this is where you would print out "hello, world" in a hello-world server.

server.idl

This file contains the IDL interface for your server. This is usually written in CORBA IDL. Read the IDL4 manual to get a better idea how to do this. Basically you have several function definitions in your interface. Presumably these functions are how clients communicate with your server (although I am not quite sure about this). Given this IDL interface, the IDL4 compiler generates two files: 1) server_client.h 2) server_server.h. These header files are included in all source files that use the server, either as a client or as the actual server. So for instance, "server_server.h" is included in "main.c". If my server uses another server, then it would include "otherserver_client.h", etc.

calls.c

This is the file that actually implements the interfaces defined in the IDL file. Say that we have a function called "implements" in your IDL file. You would have a corresponding function called "server_implements_implementation" in your calls.c file. It should be noted that "int" in your IDL file is translated to "CORBA_long" in the calls.c file (not sure why there is such a weird name discrepancy). Finally, once that function is defined in calls.c, you would usually see a line like: "IDL4_PUBLISH_EXAMPLE_IMPLEMENTS(server_implements_implementation)". This essentially "publishes" the fact that the function exists (or at least I think it does).

server.c

Yeah... I'm not sure what the hell this file does. Is it generated by the IDL4 compiler? Here's a piece of code that appears in every (maybe) server.c file:

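    while (1) {
        /* Send the reply to the previous request (if any) and wait for the next one. */
        idl4_reply_and_wait(reply, buffer, partner, msgdope, fnr, w0, w1, w2, dummy);

        /* Presumably the IPC error bits of the result dope: on error, leave the loop. */
        if (msgdope & 0xF0)
            break;

        /* Dispatch to the implementation (from calls.c) selected by the function number. */
        idl4_process_request(james_vtable, fnr & EXAMPLE_FID_MASK, reply, buffer, partner, w0, w1, w2, dummy);
    }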

Misc

Question: What is the relationship between the functions defined in the IDL file and main.c? I understand that when the server runs, its main.c is called. However, when do the functions in the IDL file get called? Question 2: In namesrv's main.c, an "advertise" function is called. Where is this function defined?

It appears as though idl4_reply_and_wait is receiving a message. What's the content of this message? Perhaps we can print the contents of the message to the console?

10/02/03

After reading the IDL4 README, I think I have figured some stuff out. First, it turns out that main.c can be thought of as initialization code that runs before the actual server starts. The call to example_server() actually begins the process of the server accepting/sending IPC. Of course, this IPC functionality is programmed in the file "calls.c" and defined in "example.idl". Finally, remember that funny file "server.c"? Well, that code is generated by the IDL4 compiler and contains the code that actually knows how to accept and send messages. Basically, we shouldn't have to write or touch "server.c". It should be noted that running "idl4 example.idl -t" will create a single file called "example_template.c" that is really just a combination of "server.c" and "calls.c". So it behooves the user to take that file and split it in two.

Now that I have a better idea of the framework, I will begin reading the IDL4 manual. Also, I am trying to compile ChacmOS... it seems I am having trouble with the Hazelnut source. Later... I got ChacmOS to compile by using the ChacmOS that was included with the IDL4 compiler. I also got the Hazelnut kernel to compile, with some pain, by using gcc-2.95 and manually including the "types.h" file in several header files... I still can't get rmgr to compile though :( I am tempted to believe that there is a newer version of the Hazelnut source somewhere... but I can't seem to log in to their damn CVS server.

Later this evening

I was able to download the CVS version of the Hazelnut source. That seemed to fix compiling the kernel. However, I still can't compile rmgr. Also, when I try to boot a pre-compiled version of Hazelnut, I get an error. So overall, I am having a hard time using Hazelnut. This is really stupid, since Pistachio seems to boot up just fine. Why can't a version of ChacmOS exist for Pistachio?! So at this juncture, we must do two things: 1) try to get Hazelnut working with ChacmOS, and 2) see what it will take to port the ChacmOS stuff to Pistachio. The latter will at least require a better understanding of how the ChacmOS servers' Makefiles interact with the Hazelnut source.

10/03/03

Right now I am attempting to use the components from ChacmOS, L4Env, and lwIP to build an OS on top of Pistachio. However, since L4Env already uses OSKit to a degree, I wonder whether I can use more of OSKit. I've been to their webpage, and it seems their latest "snapshot" is over a year old. It seems nice... perhaps someone else is continuing work on that project. Some bad things about OSKit (besides being a little old) are that it is relatively large and may contain many things that we don't really care about.

The most interesting bits of L4Env seem to be the C++ library and the Linux device driver library. The Linux DDE is essentially a library that emulates the Linux kernel interface expected by native Linux device drivers. That way, we can use existing Linux drivers in L4. I am not sure, though, whether it requires L4Linux to also be included to work. Perhaps it just needs some Linux header files?

I should compile a list of the various components I am looking into (L4Env, ChacmOS, lwIP, OSKit, etc.) and compare the pros and cons of those libraries. Currently reading stuff about Multiboot bootloaders and OS images (like GRUB and L4), along with some stuff on the ELF binary format. Also read some stuff on OSKit, which sounds promising (especially the file system, network stack, and POSIX library).

10/06/03

Special note: what about using uClibc as the C library in L4 instead of anything L4Env or OSKit offers? uClibc is probably much smaller than OSKit's stuff and more complete than L4Env's stuff... URL to a neat article is: IBM's Lightweight Linux

Ok, I have begun porting ChacmOS to use Pistachio. So far, in the Makeconf file, I've had to replace the "x0" strings with "V4" to make sure idl4 creates Version 4 stubs. I also outlined several differences I've found so far in the README files in both the pistachio/lib/ and pistachio/ directories. I also replaced an include line in the file "/usr/local/include/idl4/idl4.h" so that it looks for the V4 glue code no matter what. Perhaps this isn't necessary, but for some reason it was always looking for the X0 glue code otherwise. I am still having compilation problems, however... basically I can't find "CORBA_free" declared anywhere, and I inadvertently got rid of "malloc.c". (Actually I found CORBA_free... it was in stdlib.h.) We may have to first port "stdlib" to Pistachio (which shouldn't be too bad). Since Pistachio already includes many "io" files (see the README), we can just make stdlib.h include "l4io.h". In fact, I think the only thing we need to add is "malloc" and its associated functions.

It appears as though both the Pistachio and Hazelnut kernels have a "putc", "print.c", etc. So why did the ChacmOS author write his own stdlib? Linking problems? The solution may lie in how the kernel io stuff is compiled and linked. In fact, I am led to believe that many of the functions the ChacmOS author wrote are a straight "SCO" rip of the original kernel source! As such, instead of modifying his old files, I will just copy the io files from the kernel source.

Later that evening...

I have copied the kernel io stuff, and it seems that libstd.a has compiled. Not sure if it actually works though.

The panic macro/function seems to be missing from stdlib.h? I think I will copy the panic code from PISTACHIO/kernel/include/debug.h. I say it is "missing" because I was getting an "undefined reference" error. I'm not sure how the Hazelnut version dealt with this, considering I didn't really change anything.

10/07/03

I created two directories for Pistachio-ChacmOS. The first one tries to port the stdlib that ChacmOS uses; the other one only ports string.cc and malloc.cc. I think the second approach seems more sane at the moment. Of course I will eventually port a C library, perhaps uClibc, but that doesn't seem necessary now. Also, it appears as though the Makefiles in ChacmOS don't link with libl4.a and libio.a, so I must modify that sometime tomorrow. Perhaps that will remove some of the linking errors with panic and __IPC...

10/09/03

When a thread maps an fpage to another thread, how is the global page-table updated? Does this update have to go back to Sigma0 to update the table? Isn't that inefficient?

10/21/03

I asked Patrick why the LWK was not looking at the BSD kernels (and just the Linux kernel). Apparently BSD isn't as popular in the HPC world. This is pretty unfortunate, as the FreeBSD dev kernel seems to be on par with the Linux 2.6 kernel. Apart from the whole LWK business, it would be a nice project to port the Portals library to FreeBSD.

Largely due to homework and a 561 test, I haven't done much research the past couple of weeks (notice the change in dates). I will try to pick up where I left off... this means I am trying to compile a minimal server using libl4.a and libio.a instead of the given stdlib. I must either force ChacmOS to link with those libraries, or just move those libraries over to the other ChacmOS lib directories.

Also, I think I am going to look over the build system for just plain Pistachio. Initially I thought that the build system for ChacmOS was nicer (and it really is), but the original Pistachio system may be more flexible.

11/05/03

Well, I can compile the "james" server by linking the "l4io" and "l4" libraries. I am in the process of finally porting the bootloader code. The reason I haven't been taking notes for so long is that I have been reading some OS stuff; I felt that I needed to understand some of the basics a little more before I head into this. Now I think I need to separate ChacmOS's bootloader code. It currently serves as both the bootloader and the "task server". I think it's better to have just a bootloader that loads a separate task server. Also, I think I need to write down the list of absolutely essential servers needed to get Portals to work. For instance, at this stage a disk block driver and file manager are not *strictly* necessary, since we can share information over the network. The first step is asking: what is strictly necessary to get lwIP working? I don't think a threading library is necessary if we assume that this benchmark OS will only run a single non-threaded application (some silly benchmark). This may change if we need to port MPI or something... but I think that is a later concern.

Here is a hierarchical rundown of absolute things we need for a single application OS:

  1. bootloader
  2. lwIP
    1. semaphore library
    2. mailbox library
    3. linux device driver
  3. test client
In this scheme, the bootloader assumes certain other roles besides just "init". First, it must act as the memory manager: it asks Sigma0 for all the available memory. It also acts as a process manager by creating the lwIP server and the test client with that memory. I'm assuming that L4's built-in scheduler will suffice, so that I don't need to implement one.

BTW, what the hell is the LINK=00x00800000 line in the bootloader Makefile? Obviously, it is a way to tell the linker to link this module at a particular place in the absolute image, but why? Perhaps this address is different for Pistachio vs Hazelnut? Is the roottask always linked at that address?

Porting lwIP may be difficult just because of the lack of documentation. Perhaps RTLinux's lightweight stack may be easier?

11/10/03

Just some thoughts on the previous post. We don't need to implement a virtual memory pager, since I will assume that the address space of the entire process stays in RAM. This isn't a big deal, since we only run one client process. Also, our memory manager should be very simple. In fact, I'm not sure it's even a server. Once it allocates fpages to the client process, it never dynamically allocates fpages again. As such, our "pager" is also really simple (if not non-existent). Since the client doesn't request any new fpages, the pager doesn't do anything. If it gets a page fault, it just ignores it.

Milestone 0: Port the bootloader and implement a simple client. Also, the ChacmOS bootloader contains "block device" code used for the file system. Right now I should take that code out, since it just makes things more confusing. We may have to change the bootloader so that it requests all the memory from Sigma0; this is essentially the top half of our "memory manager". The bottom half just allocates our fpages to the client appropriately. That way, when anything page faults, no pages are given out without the bootloader's permission. I don't think that "auxpager" is really necessary; we can make Sigma0 the default pager for the client for now.

11/11/03

I've begun stripping the bootloader of the block code and trying to understand how it works overall. Not very fun work. There are several small things I don't understand. The largest of these small things is what the hell the "tasks" array is used for. Wait... I know what it is. It just keeps track of the existing tasks, because originally the bootloader was also the "task server". We can get rid of it!

11/12/03

Something that isn't clear in the bootloader code is where the multiboot headers come from. The bootloader file simply assumes that they are there. Don't we have to look in the KIP to get that information? Also, what is the purpose of elf.c? Do we need that file if we no longer have a file server? Of course, we still need a header defining the Elf32 structures...

I noticed that looking at the SCOS root_task code is helping me understand ChacmOS's bootloader code, since they both do much of the same thing (task management and init). This should be used when we need to implement our own task manager and virtual memory manager...

11/13/03

ChacmOS and SCOS start a new ELF process in different ways. ChacmOS first creates an empty task, then sends IPC to map new fpages (via page faults), and finally sends an IPC with the beginning of the code segment to start the task. SCOS first parses the ELF headers and starts the new task with the appropriate values. I'm not sure how SCOS maps the fpages the task needs, though. Overall, it seems they just differ in their mapping protocols.

11/14/03

ELF webpage

Today I am investigating ELF program headers to try to understand the bootloader code. Here are some preliminary notes on the ELF format. There are three header areas: 1) the file header, 2) the program header(s), and 3) the section header(s). The actual program code usually sits between the program and section headers.

The file header is a struct named Elf32_Ehdr. Within the file header, there are three values of importance to us: 1) e_entry, the virtual address where execution begins (normally the _start label in the code); 2) e_phoff, the byte offset of the array of program headers from the start of the executable file; 3) e_shoff, the same thing as e_phoff except for the section headers.

Program headers describe the segments of the program (executable code and data) that get mapped into the program's address space as it loads. The program header is a struct named Elf32_Phdr. Usually there are at least two of these headers, arranged in an array. The program headers are mainly used to construct the process image (while the section headers are used mainly during the linking phase).

12/05/03

This is stuff continued from the last entry. I realize it has been a while, but there was a very large Prolog project due... not to mention that I've been busy with end of semester crap.
Ok, after the array of program headers comes the ELF body. The actual locations and sizes of portions of the body are described by the program headers. The body is the thing that actually contains the executable instructions, string constants, and global variables.

From the SCOS code, it looks like the BSS segments start after all the code blocks. The two methods I am not sure about right now are "init_segments" and "copy_mem_to_segment" from the SCOS roottask. It might help if I knew what a "vm_node" was... It seems that init_segment is simply collecting information about the blocks and storing it in some sort of struct. Of course, if this struct resides in memory and contains the actual code, we should be able to "jump" to it, but I am not sure that this is what this struct does...

12/09/03

It turns out that the ChacmOS code has something a little more interesting than just the bootloader that I may be interested in using. There is a file called "elf.c" in the booter directory that contains a method to launch an ELF file from the file system. Of course, this means that you must implement a file system to use this launcher, but it is essentially an implementation of fork/exec. Might come in handy later... Also useful for understanding more ELF code :) Something else that might be handy is the "readelf" program in Linux.

Oh I also think the trampoline code is mapped into every process so that all processes have a uniform way of being started using IPC. So for instance, the trampoline code just has several ipc_receives to accept the elf code blocks. Although I am not sure where/when the trampoline code is being mapped or where the trampoline code is even being defined...

Here's a question/remark: to start a new task, we need to hand it the stack address and the address of the first instruction. So theoretically that is easy to do. What happens if the thread then tries to access a page that is already mapped by the parent process? Will it call the pager or just not do anything?

01/24/04

I installed Plan 9 on my home machine and transformed it into a CPU server following the instructions found on the Wiki page. Overall, I think the setup is correct. I plan on installing the same setup on two of the lab "bulk" machines so that I can run some tests. There are a couple of issues I am encountering, however: 1) when the CPU kernel starts, I get a "null list in concatenation" error from cpurc; 2) I can't seem to get rio to start on the CPU server (although it's not that important that I do); 3) the terminal kernel doesn't seem to like my USB keyboard, although the CPU kernel doesn't mind; 4) finally, when I boot the terminal kernel, it thinks it sees a network card (/net/ether0 exists), but the CPU kernel doesn't see this.

Besides fixing these problems, I still need to investigate the Alef and C programming environments. I should also ask the mailing list about any MPI-equivalent library (possibly in Alef).

Note: I did eventually fix the stupid "null list" error by manually setting the sysname in cpurc.

02/13/2004

The terminal is having a hard time connecting to the CPU server. When I attempt to connect, I get an "auth server protocol botch". Anyway, I'm going to try to fix that today.

Ok, I fixed the authentication thing by doing two things: 1) adding a dom=.. entry to both /lib/ndb/local files (CPU server and terminal), and 2) resetting the nvram on the CPU server and re-adding bootes. After that, both the auth/debug and cpu commands work from the terminal and the local CPU server.

Questions to answer: How to actually make a terminal use a CPU server? How to mount a remote namespace/filesystem? How do CPU servers authenticate? How to connect multiple CPU servers? Finally, learn how to program in Plan 9.

02/15/04

Ok, I think I understand how the ethernet and protocol files work. The library functions are also fairly straightforward. Tomorrow I will most likely try to program a simple server and client that use TCP/IP through the library functions. I should also probably try to program something that uses the raw ethernet device. Finally, it may be useful to see the actual command protocol for the protocol files, so that we could write a simple server/client as a shell script (wouldn't that be cool?) instead of using the library functions. I am a little confused by the notation that Plan 9 uses for some network stuff. For instance, what does "net!foo.bar.com!login" mean exactly? I think the CS server translates that to "real" addresses like "tcp!192.168.1.1!8080", but I am still unclear on what the semantics are.

I need to better differentiate between mount, bind, and import (exportfs). To help me do this, I will attempt to use these commands explicitly tomorrow. Here are current thoughts:
import - maps namespace from remote system to local system. It does this by starting exportfs on the remote system and then uses "mount" to connect to files served by exportfs.
mount - Attaches a directory served by a file server to a local directory (similar to Linux mount, except that the device is replaced by a file server).
bind - takes two arguments, "new" and "old". It is essentially like a symlink that makes "old" point to "new". According to the man page, "old" becomes an alias for "new" (which already exists in the namespace). Presumably, this is a per-process namespace change.
Presumably, import is only used when we want to import a namespace that is NOT already being served by a file server (otherwise we could just use mount). The most common case is when the CPU server uses import to map the calling terminal's namespace. I also need to play with the "cpu" command, since this is apparently how one uses the CPU servers (try to take advantage of the fact that the terminal exports its namespace to the server).

So we need to do the following things:

  • Connect to CPU server using "cpu" command and make it use something in terminal's namespace
  • Use the import command
  • Answer question: Do we use import or mount to grab remote file systems?
  • Use the mount and bind command
  • Start the programming

02/16/04

Ok, I think I learned a lot about Plan 9 today. I was able to use the "cpu" command from the terminal. The CPU server asked for authentication and logged me in (it also mounted the terminal's namespace in /mnt/term). Similarly, I was able to use "import" to grab some piece of the CPU server's namespace and map it onto the terminal's namespace. I was also able to use the import command from the CPU server to grab stuff from the Plan 9 terminal (I did have to modify termrc, however, to listen for commands). My last experiment was to use the mount command to mount the CPU server's files by issuing "srv il!cpuipaddr" followed by "mount -c /srv/il!cpuipaddr /n/cpu". If the terminal's user exists on the file server everything should be ok, but since 'jhorey' didn't exist on the file server, I kept getting an error. After adding 'jhorey' to the file server, everything worked as expected.

One question/remark: when I mounted or imported stuff from the Terminal on the CPU server, it never asked for authentication. Why? Also, I wonder if there is a way to make the CPU server ask for authentication before shelling out a terminal so that a person sitting there couldn't mess things up.

02/17/04

Programming projects:

  • C client using library functions
  • C server using library functions
  • Terminal client using files
  • Terminal server using files
Instead of being imaginative, we can probably borrow clients/servers from the Stevens book. In fact, we should probably read that book anyway to understand network programming better (and to highlight the differences between sockets and Plan 9). We also need to know what types of network latency/bandwidth tests people run, such as ping-pong latency and bandwidth tests. Finally, people talk about using Plan 9's namespace concept for high-level parallel programming, so we should investigate that.

Some notes. Mount is used to root a namespace that is being served by a particular file server. Hence, all import does is start exportfs on the remote machine and mount those files. Also, it should be noted that bind is really like a general-purpose version of symlinks. You can bind multiple directories into a single directory (a union directory). For instance, we can have /386/bin/, /mips/bin/, and /sparc/bin/ all bound to /bin/ to get all the binaries from those architectures. The argument naming for bind is stupid, though: bind(char *name, char *old, int flag) should be bind(char *source, char *union, int flag)...

Another note: an easy way to write to or read from the screen is to read/write /dev/cons. Finally, it turns out that 9P uses a kernel-resident file system (the mount device) to translate local 9P operations into RPCs. These RPCs in turn use either the TCP or IL transport protocol. So all we have to do is replace the transport protocol... Not sure if that means we must mess with the mount device implementation or whether this is specified in some file (/lib/ndb/local?).

02/23/04

For Plan 9 to become a high-performance OS, we need to be able to implement zero-copy/OS-bypass semantics. This in turn will help in implementing high-performance protocols. So first, we must figure out how zero-copy/OS-bypass works. This in turn requires figuring out which techniques work for particular hardware (smart NICs, dumb NICs, etc.). For that, we can look at EMP (for Acenic GigE) and GM (for Myrinet) as examples. The interesting question, of course, is: how do we preserve Plan 9 file semantics with OS-bypass?