MySQL:
- We will use the DBMS MySQL for the project. You should start
getting familiar with MySQL via their
web page.
A good place to start is the
tutorial.
- An account for using MySQL on the CS department machines will be
made for you. We will have more information as this becomes available.
Back to beginning
Due: Monday, January 28, 2013, before class.
What to turn in:
- email names of team, team lead, CS Usernames of team (indicate if not CS student), and team member email addresses to TA and Instructor.
Back to beginning
Due: Monday, Feb 4, 2013, midnight. extended to Wed 2/6/13
What to turn in:
- Written description of database application (description.txt or
description.doc or description.pdf)
- E/R diagram (ERdiagram.doc or ERdiagram.ppt or ERdiagram.pdf)
- group participation statement (participation.txt or participation.doc
or participation.pdf)
- team project cover sheet (by hand)
Your first step is to identify the domain you would like to manage
with your database, and to construct an Entity-Relationship diagram
for the database. We suggest that you pick an application that you
will enjoy working with, since you'll be stuck with it for the whole
semester! It's especially nice if you pick an application where you
can populate your database using real, as opposed to fabricated,
data.
Your project should consist of at least 10 entities/relationships. You can use either synthetic (computer-generated fake) data or realistic data. If you have an application where you can get a large amount of real data, all the better, but it's not necessary.
Try to pick an application with a schema that is relatively
substantial, but not too enormous. For example, your E/R design should
have in the range of five or so entity sets, and a similar number of
relationship sets. This is a ballpark figure only - shooting for
somewhere between 3 and 10 is fine - you'll sense if your design is
too simple or too complex. You should certainly include different
kinds of relationships (e.g., many-one, many-many) and different kinds
of data (strings, integers, etc.), but your application need not
necessarily require advanced features such as weak entity sets, "is-a"
relationships, or roles.
- Write a short (approximately one paragraph) description of the
database application you propose to work with throughout the
course. Your description should be brief and relatively informal. If
there are any unique or particularly difficult aspects of your
proposed application, please point them out. Your description will be
graded on suitability and conciseness.
- Specify an E/R diagram for your proposed database. Don't forget to
underline key attributes for entity sets and include arrowheads
indicating the multiplicity of relationship sets. If there are weak
entity sets or "is-a" relationships, make sure to notate them
appropriately.
If you're having trouble thinking of an application, take a look at
any Web shopping site. They all have a similar theme: products,
customers, orders, shopping baskets, etc., and typically make for an
interesting and appropriately sized application. If you're still
having trouble, or if you're unsure whether your proposed application
is appropriate, please feel free to consult with the TA or the instructor.
Coming up with a good design now will
pay off greatly as the project progresses.
MAKE SURE TO SAVE A COPY OF YOUR E/R DIAGRAM - YOU WILL NEED
IT FOR SUBSEQUENT PROJECT PARTS.
Back to beginning
Due: Monday, February 27, 2013, midnight.
What to turn in:
- E/R diagram for your database (ER-diagram.doc or ER-diagram.ppt
or ER-diagram.pdf)
- if you changed your E/R diagram, explain which parts in the E/R diagram
are changed and the reasons (reason.txt or reason.doc or reason.pdf)
- schemas for all relations in your database, with keys underlined
(schema.txt or schema.doc or schema.pdf)
- for each relation, list the completely nontrivial functional
dependencies (dependency.txt or dependency.doc or dependency.pdf)
- group participation statement (participation.txt or participation.doc
participation.pdf)
- team project cover sheet (by hand)
- graded Part 1, E/R Diagram (by hand or scanned and included in email)
In this second part of the project, you will produce a relational
schema from the entity- relationship diagram you came up with in
Part 1.
- Please attach a copy of your E/R diagram from Part 1. If you
would like to make changes to your original E/R diagram at this point
(due to staff feedback or any other reason), you may do so. The new
E/R design will not be graded but will be used as a basis for grading
part 2.
- Using the method for translating an E/R diagram to relations,
produce a set of relations for your database design. As usual, please
be sure to underline key attributes in your relations.
- For each relation in your schema, specify a set of completely
nontrivial functional dependencies for the relation. Any functional
dependencies that actually hold in the real- world scenario that
you're modeling should be specified, or should follow from the
specified dependencies. Don't worry if you find that some of your
relations have no nontrivial functional dependencies.
- Is each relation in your schema in Boyce-Codd Normal Form (BCNF)
with respect to the functional dependencies you specified? If not,
decompose the relation into smaller relations so that each relation is
in BCNF. Be sure to underline key attributes in your new
relations. Don't worry if you don't have any BCNF violations not many
PDAs will not have any.
- Are there any nontrivial multivalued dependencies that hold on
any of the relations in your schema? (You needn't consider MVD's that
are also functional dependencies.) If so, specify the multivalued
dependencies, then decompose the relations into smaller ones so that
each one is in Fourth Normal Form (4NF). Be sure to underline key
attributes in your new relations. Don't worry if you don't have any
4NF violations - most PDAs will not have any.
- Now that you've decomposed your relations as far as possible, are
there any relations that could be combined without introducing
redundancy (i.e., without creating BCNF or 4NF violations)? If so,
combine them.
- Is there anything you still don't like about the schema (e.g.,
attribute names, relation structure, etc.)? If so, modify the
relational schema to something you prefer. You will be working with
this schema quite a bit, so it's worth spending some time now to make
sure you're happy with it.
ONCE AGAIN, SAVE A COPY OF YOUR FINAL RELATIONAL SCHEMA (ITEM 7 ABOVE),
AS IT WILL BE NEEDED FOR SUBSEQUENT PROJECT PARTS.
Back to beginning
Due: Fri, March 29, 2013, midnight.
What to turn in:
- a script file containing SQL commands for creating each table in you PDA
(createTable.sql)
- a script log file showing a MySQL session in which your tables are
created successfully (createLog.txt)
- a few (approximately 5-10) records of "realistic" data per each relation
(realData.txt or realData.doc or realData.pdf)
- the commented source code for your program to generate or
transform data.
- written explanation of how your program works. This is not line-by-line
commenting, but a high level global idea of how you generate your
synthetic data.
(explanation.txt or explanation.doc or explanation.pdf)
- a small sample of the "synthetic" records generated for each relation
by your program (5 or so records per relation)
(syntheticData.txt or syntheticData.doc or syntheticData.pdf)
- a script log file showing the successful loading of your "synthetic"
data into MySQL per each relation (syntheticLog.txt or syntheticLog.doc
or syntheticLog.pdf)
- a script file containing SQL commands for inserting
synthetic data (approximately 5-10 records as a sample) generated by
your program (insertSyntheticData.sql)
- a script file containing SQL commands (SELECT * FROM
each table_name) for selecting the synthetic data per each table
(selectSyntheticData.sql)
- a script log file showing a MySQL session in which the
SELECT commands are run successfully (selectSyntheticDataLog.txt)
- group participation statement (participation.txt or participation.doc
or participation.pdf)
- team project cover sheet (by hand)
In this part of the project, you will create a relational schema for
your PDA in the MySQL database system, and you will populate the
tables in your database with initial data sets.
- Familiarize yourself with the MySQL relational DBMS by reading the
on-line documentation (referenced above under logistics),
logging into MySQL, trying some of the examples in the document, and
experimenting with the various commands. You don't need to turn
anything in for this part.
- Create relations for your PDA based on your final relational
schema from Part 2. Use the
CREATE TABLE command to specify each
relation, its attributes and attribute types.
If you have an attribute that
represents a date and/or time, you may want to look at the page on
this page.
-
Turn in a script log
showing a MySQL session in which your relations are created
successfully, as described next.
For each relation in your PDA, create an execution script file
and a few (approximately 5-10) records of
"realistic" data. Then execute the script file from the mysql command
line using the \T option. Turn in a listing showing the contents of
the files you created, the successful loading of the data into MySQL,
and the execution of "SELECT *" commands to show the contents of each
relation.
- Write a program in any programming language you like that creates
large files of records for each of your PDA relations. If you have
available real data for your PDA, then your program will need to
transform the data into files of records conforming to your PDA schema
and to MySQL's load format. The rest of you will need to write a
program to fabricate data: your program will generate either random or
nonrandom (e.g., sequential) records conforming to your schema. Note
that it is both fine and expected for your data values, strings
especially, to be meaningless gibberish. The point of generating
large amounts of data is so that you can experiment with a database of
realistic size, rather than the small "toy" databases often used in
classes. The data you generate and load should be on the order of:
- At least two relations with thousands of tuples
- At least one relation with hundreds of tuples
If your application naturally includes relations that are expected to
be relatively small (e.g., schools within a university), then it is
fine to use some small relations, but please ensure that you have
relations of the sizes prescribed above as well. When writing a
program to fabricate data, there are two important points to keep in
mind:
- Make sure not to generate duplicate values for key attributes.
- Your PDA almost certainly includes relations that are expected
to join with each other. For example, you may have a Student relation
with attribute courseNum that's expected to join with attribute number
in relation Course. When generating data, be sure to generate values
that actually do join - otherwise all of your interesting queries will
have empty results! There are a couple of ways to properly generate
joining values. One way is to generate records for multiple relations
(e.g., Course and Student) at the same time. Another way is to
generate the records for one relation first, and then use the joining
values for the other relation. For example, you could generate records
for relation Course first, then use the Course.number values when
creating values for Student.courseNum.
Turn in your program code for generating or transforming data, a small
sample of the records generated for each relation (5 or so records per
relation), and a script log showing the successful loading of your
data into MySQL.
Back to beginning
Due: Fri, April 12, 2013, midnight
What to turn in:
- a copy of all of your SQL commands (query.sql, update.sql and index.sql)
- a script log file showing a MySQL session in which your SQL commands
are run successfully on both your small and large databases
(queryLog.txt, updateLog.txt and indexLog.txt)
- group participation statement (participation.txt or participation.doc or
participation.pdf)
- team project cover sheet (by hand)
In this part of the project, you will issue SQL queries and updates
against your PDA database, and you will experiment with the use of
indexes. Since you will be modifying your data as part of this
assignment, we strongly suggest that you adopt a routine for getting
repeated 'fresh' starts with MySQL.
You need to save and keep all SQL commands such as creating/dropping database,
creating tables, inserting real data and/or synthetic data, querying tuples,
updating tuples, etc., in script files.
If needed, you just can run these script files again in order to re-setup
your database.
(a) Queries and Updates
Develop and test:
- At least ten SQL data retrieval (select) commands
( three general queries, two 'join' queries (related with three tables), one 'union' query,
one 'group by' query, one 'order by' query, one 'DISTINCT' query
and one 'Aggregate' query).
- At least six SQL data modification
( two update, two deletion, two insertion) commands
Please note:
- For this assignment you will be invoking your SQL commands
interactively through mysql. Of course you should certainly build a
script file, rather than typing in the queries each time you run them.
- Please write 'interesting' queries, in particular,
- three general queries with at least two conditions
(For example, SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 > 10 and
t2.name = `Texas`),
- two 'join' queries (related with three tables),
- one 'union' query,
- one 'group by' query,
- one 'order by' query,
- one 'DISTINCT' query, and
- one 'Aggregate' query).
You should try to use most or all of the SQL constructs discussed in class and in the textbook.
You will not receive full credit if your queries and modifications are all extremely
simple.
- We suggest that you experiment with your SQL commands on your
small hand-created database before running them on the large database
for which you generated data. Initial debugging is much easier when
you're operating on small amounts of data. Once you're confident that
your commands are working, try them on your large database. We do
expect that the commands you turn in work properly on both your small
and large databases.
- If you discover that most or all of your 'interesting' queries
return an empty answer on your large database, check whether you
followed the instructions in Part 3 for generating data values
that join properly. You may need to modify your data generator.
- Turn in a copy of all of your SQL commands, along with a script
illustrating their execution on both your small and large
databases. Your script should be sufficient to convince us that your
commands run successfully, but you can and should truncate query
results after a few lines. Please do not turn in query results that
are hundreds of lines long.
(b) Indexes
In Part 4 (a),
you may have discovered that some queries run very slowly
over your large database. As discussed in class, an important
technique for improving the performance of queries is to create
indexes. An index on an attribute A of relation R allows the database
to quickly find all tuples in R with a given value for attribute A
(which is useful when evaluating selection or join conditions
involving attribute A). An index can be created on any attribute of
any relation, or on several attributes combined.
See this
page for more informationon how to create an index in MySQL.
- Create at least three useful indexes for your PDA. Run your queries
from part 4(a) on your large database with the indexes and without the
indexes. Turn in a script showing your commands to create indexes, and
showing the relative times of query execution with and without
indexes. Here too, please truncate any large query results.
Please note:
- MySQL automatically creates indexes for attributes declared as primary
keys.
- Actual timings will be affected by external factors such as system
load. However, for some of your queries, with appropriate indexes you
should see a consistent dramatic difference between the execution
times with indexes and the times without. If others of your queries do
not show performance improvement even when relevant indexes are
created, please include a short note suggesting why this might be the
case.
Back to beginning
Due: Wed, May 1, 2013, midnight
What to turn in:
- written explanation of how your program works. This is not line-by-line
commenting, but a high level global idea of what is going on in your
program.
(explanation.txt or explanation.doc or explanation.pdf)
- commented source code for your programs
- Your embedded SQL based application must provide the following functions:
connecting to a database, querying, inserting, deleting and updating.
The more complex the queries that your application
can handle, the more credit you will get.
- group participation statement (participation.txt or participation.doc or
participation.pdf)
- team project cover sheet (by hand)
In this part of the project, you will
interact with your PDA database from an external program. Your task
is to build a moderately user-friendly interactive application program
front end to your PDA using the a programming language. Your
program should consist of a continuous loop in which:
- A list of at least five alternative options is offered to the user.
(An additional alternative should be quit.)
- The user selects an alternative.
- The system prompts the user for appropriate input values.
- The system accesses the database to perform the appropriate queries
and/or modifications.
- Data or an appropriate acknowledgment is returned to the user.
You should include both queries and modifications among your
options. As in Part 4, please include some
"interesting" queries or modifications, i.e., operations that require
some of the more complex SQL constructs such as subqueries,
aggregates, set operators, etc.
As a general example, if your PDA is a campus applicant database, then
your interface might include in its menu a number of useful queries on
the database, with both input and output in a format more convenient
and pleasing than raw interactive SQL. Some queries perform
statistical analysis requiring multiple levels of grouping, other
queries are simpler.
- Insert a new student record.
- Insert a new application record.
- Update a student's address, GPA, or SAT.
- Update an application's decision field based on user input.
- Update application decision fields automatically, admitting students
based on a combination of numeric values and quotas for each campus
and/or major.
- Quit.
Your application code should interact with the database using the a
call-level interface that depends on the language you are using.
You can refer to following links for JDBC for Java.
Back to beginning
Due: Wed, May 1, 2013
What to turn in:
- a short video demoing your project
(or URL if file is very large-- make sure that your URL works!)
- team project cover sheet (by hand)
Make a short video demoing your project in action. On our CS linux boxes, this can be done using the commandline recordmydesktop foo.ogv. You can view your file using the movie Player app under "Sound and Video". I want to see you stepping through your queries and interface from Part 5.
Another software package that was recommended was Camtasia Studio or Camtasia:mac. It has a free 30 day trial, is extremely intuitive, and your hand through the process. The only thing is that it will automatically try to find good places to zoom. A student from our class recommends that you delete all of the zoom points.
Back to beginning