CS345 – Topics in Data Warehousing Autumn 2004
[ad_1]
Assignments |
There will be five homework assignments: one problem set and four small programming assignments. The goal of the assignments is for students to gain hands-on experience with the ideas convered in the class.Students who find their course projects to be particularly interesting have the option to expand the scope of their project instead of completing one or two of the assignments. Make arrangements with the instructor if you are interested in electing this option. |
Late Policy |
Homework assignments will be given out on Thursdays and will be due 12 days later (one week from the following Tuesday).Late assignments will be accepted without penalty, provided that they are submitted no more than three days late (that is, by 11:59 pm on the Friday following the due date). No submissions will be accepted that are more than three days late.Students are strongly encouraged to complete the assignments by the nominal due date, rather than relying on the automatic three-day extension. Requests for additional extensions due to exceptional circumstances will be considered on a case-by-case basis, provided they are received before the nominal due date of the assignment. |
Programming Environment |
All of the assignments after the first involve writing programs that interact with a database. The database management system we are using for this class is Oracle. Each student in the class will have an Oracle account created. A web page with useful tips about interacting with Oracle is here.You may use the programming language of your choice for the programming assignments. |
Assignment #1: Data Modeling Trade-Offs |
Due Date: Tuesday, October 19 This assignment will be a written problem set. It will consist of various data modeling scenarios. In each scenario, you will be asked to compare alternative schema designs, listing the pros and cons of each.Assignment #1 (Postscript format) |
Assignment #2: Extraction, Transformation, and Load |
Due Date: Tuesday, November 2 In this programming assignment, you will extract data from three different source systems–a web site, a relational database, and a flat file–and load it into a data warehouse. The assignment will include both an initial load and an incremental load.Assignment #2 (Postscript format)Logging into Oracle using sqlplus Also describes how to change your password.CS 145 Oracle Tips Page Lots of helpful info.Using the Oracle bulk loader (sqlldr)Using TOAD / TOra on your Windows or Linux machine“Hello World” in JDBCDifferences between Players source system tables and Players dimension tableUsing the extract.pl scriptTest queries to validate your data warehouse |
Assignment #3: Aggregate Selection and Navigation |
Due Date: Friday, November 19 (Late assignments will be accepted without penalty until Monday, November 22, 11:59 PM.)This programming assignment has three parts. In the first part, you will implement an algorithm for selecting a good set of precomputed aggregate tables (materialized views) to create for a given database. In the second part, you will generate the SQL to create the aggregate tables that were selected in the first part. In the third part, you will implement a query re-writer that takes an aggregation query that is written to run against the fact table, selects the most efficient aggregate table to use for that query, and re-writes the query to execute against the aggregate table.Assignment #3 (Postscript format)Metadata Tables Describes the format of the metadata tables in which configuration information is stored.Scripts that are providedSQL query formatStar schema for testingSpecifics of the aggregate selection algorithmSome tips for the aggregate creation phase of the assignmentResults on test data Results of the aggregate selection algorithm on the two test data sets. Use these to verify that your algorithm is working correctly. |
Assignment #4: Decision Tree Learning |
Due Date: Thursday, December 2 (Late assignments will be accepted without penalty until Sunday, December 5, 11:59 PM.)In this programming assignment, you will implement the ID3 algorithm to learn a decision tree classifier, along with pruning using a holdout set. The data to be classified will be stored in the database, and computation of the statistics needed by the classifier will be performed using SQL.Assignment #4 (Postscript format)Scripts that are providedResults on test data You can compare the decision tree learned by your program with these test results. |
[Button id=”1″]
[ad_2]
Source link
"96% of our customers have reported a 90% and above score. You might want to place an order with us."
