Project 1 - Data Extractor

1. Specification

In Project 0, you wrote a program to process a text file. In this project, you will explore file processing further: you are given two text files, MTable.txt and Vars.txt, and one binary file, RawData.dat. Your program will generate an output text file.

The binary raw data file RawData.dat contains data from a real NASA experiment. The scientist on the ground wants to extract the experimental data from the raw data file and save the results in a text file so he can process these results later using MATLAB, gnuplot, or JAVA graphics. The basic elements of the raw data file are packets.

A text file called MTable.txt describes the format of each packet. Since each packet contains many variables and the scientist wants to extract only some of them at a time, he creates a text file called Vars.txt to define which variables are to be extracted. With the help of the MTable.txt and Vars.txt files, your program will extract data from RawData.dat and write a nicely formatted text file.

1.1 The three given files are:

RawData.dat is a binary file generated by a PowerPC computer. The data is in JAVA data format. It contains a certain number of packets of raw data. Each packet is 350 bytes long.

           -----------------
           |  packet 1     |   350 bytes
           |               |
           -----------------
           |  packet 2     |   350 bytes
           |               |
           -----------------
           |  packet 3     |   350 bytes
           |               |
           -----------------
                 .....
           -----------------
           |  packet n     |   350 bytes
           |               |
           -----------------
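
Because every packet has the same fixed size, packet k (counting from 0) starts at byte k * 350 of the file. The following is a minimal sketch of that arithmetic, not a required design. The class name PacketSlicer and its method names are hypothetical, and reading the whole file into memory is an assumption that only suits files of modest size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; the class and method names are not part of the assignment.
class PacketSlicer {
    static final int PACKET_SIZE = 350; // bytes per packet, from the specification

    // Number of complete packets in a file of the given length.
    static int packetCount(long fileLength) {
        return (int) (fileLength / PACKET_SIZE);
    }

    // Returns a copy of packet number `index` (0-based) from the raw file contents.
    static byte[] packet(byte[] raw, int index) {
        int start = index * PACKET_SIZE;
        if (index < 0 || start + PACKET_SIZE > raw.length) {
            throw new IndexOutOfBoundsException("no complete packet at index " + index);
        }
        return Arrays.copyOfRange(raw, start, start + PACKET_SIZE);
    }

    // Assumption: the file is small enough to be read into memory in one call.
    static byte[] readAll(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }
}
```

A typical use would be to call readAll("RawData.dat") once and then loop over the indices 0 to packetCount(raw.length) - 1.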

MTable.txt provides the format of each data packet in the raw data file. Each line in this file may be a comment line, a blank line, or a format line that contains 6 columns (the last of which may be empty). Comment lines start with "#". Your program must skip comment lines and blank lines.

Let's see a format line example: 

2    MET_YEAR    4    2    UINT16

This line says that the second variable is "MET_YEAR", the offset of this variable from the beginning of the packet is 4 bytes, its width is 2 bytes, and it has the type named "UINT16". This line has 5 columns, and the columns are separated by 1 or more tabs ('\t'). The 6th (comment) column is empty.

Let's see another format line example:

23    CMD_REG    82    4    UINT32 (bit encoded)

This line says that the variable "CMD_REG" is the 23rd variable in the packet. Its offset is 82 bytes from the beginning of the packet, its width is 4 bytes, and it has the type named "UINT32". Note that this line has 6 columns: the first 5 columns are separated by '\t', and the 5th and 6th columns are separated by one space ' '.

The following is a summary of the format of the MTable.txt file:

The columns are separated by whitespace characters ('\t' or ' ').

Column 1: serial number of variable
Column 2: name of variable
Column 3: offset of variable in the packet
Column 4: width of variable
Column 5: type of variable
Column 6: comment (may be empty)
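
The column layout above can be parsed one line at a time. The following is a sketch only: the class FormatLine and its field names are hypothetical, and it assumes that variable names and type names contain no spaces, so that splitting on runs of tabs or spaces is safe.

```java
// Hypothetical holder for one MTable.txt format line; all names are illustrative.
class FormatLine {
    final int serial;     // column 1
    final String name;    // column 2
    final int offset;     // column 3, in bytes from the start of the packet
    final int width;      // column 4, in bytes
    final String type;    // column 5
    final String comment; // column 6, empty string when absent

    FormatLine(int serial, String name, int offset, int width,
               String type, String comment) {
        this.serial = serial;
        this.name = name;
        this.offset = offset;
        this.width = width;
        this.type = type;
        this.comment = comment;
    }

    // Returns null for comment lines and blank lines, which must be skipped.
    static FormatLine parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        // Split into at most 6 fields on runs of tabs or spaces, so that the
        // comment column keeps any spaces it contains.
        String[] f = trimmed.split("[ \\t]+", 6);
        if (f.length < 5) {
            throw new IllegalArgumentException("malformed format line: " + line);
        }
        return new FormatLine(Integer.parseInt(f[0]), f[1],
                Integer.parseInt(f[2]), Integer.parseInt(f[3]),
                f[4], f.length == 6 ? f[5] : "");
    }
}
```

Limiting the split to 6 fields is a design choice: a comment such as "(bit encoded)" stays in one piece instead of being broken at its internal space.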

There are five types you have to deal with:
TYPE    NAME            LENGTH (in bytes)                                       YOUR INTERNAL REPRESENTATION
STRING  string          variable length, defined in mtable for each appearance  String
UINT8   unsigned byte   1                                                       long
UINT16  unsigned short  2                                                       long
UINT32  unsigned int    4                                                       long
FLOAT   float           4                                                       float

There is another type, ANY, which you can ignore; it is used for padding.
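
The types above could be decoded from a packet's bytes roughly as follows. This is a sketch under stated assumptions, not the required implementation: the class FieldDecoder and its method names are hypothetical, the byte order is assumed to be big-endian (most significant byte first, as used by Java's own data streams), and strings are assumed to be plain ASCII with no special handling of padding bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder; names and byte-order choice are assumptions.
class FieldDecoder {
    // Reads a UINT8, UINT16, or UINT32 (width 1, 2, or 4) as a non-negative long.
    // Assumption: big-endian byte order.
    static long readUnsigned(byte[] packet, int offset, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            // Mask with 0xFF so that bytes above 0x7F are not sign-extended.
            value = (value << 8) | (packet[offset + i] & 0xFF);
        }
        return value;
    }

    // Reads a 4-byte IEEE 754 float. Assumption: big-endian byte order.
    static float readFloat(byte[] packet, int offset) {
        return ByteBuffer.wrap(packet, offset, 4)
                .order(ByteOrder.BIG_ENDIAN)
                .getFloat();
    }

    // Reads `width` bytes as text. Assumption: ASCII encoding; padding or
    // terminator bytes, if the data has any, are not stripped here.
    static String readString(byte[] packet, int offset, int width) {
        return new String(packet, offset, width, StandardCharsets.US_ASCII);
    }
}
```

Java has no unsigned integer types, and the largest UINT32 value, 4294967295, does not fit in an int. That is why the table lists long as the internal representation for the unsigned types.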