about | photography | music | research | contact
publications
projects
c.v.

Projects

Here are some course projects I've worked on in the past few years. The final reports and sometimes code are included for your reading pleasure.

Annotating DBWorld Messages Using Domain Resources


PS PDF

The DBworld mailing list is a primary channel of communication for database researchers around the world. Conference call-for-papers, job postings, book releases, and many other types of announcements are routinely posted in DBworld. It has become an indispensable tool for researchers who wish to stay up-to-date with the research community. Unfortunately, the only means of distribution of the mailing list is through email or flat text copies on the web. In the world of hyperlinks and semantic data, this format is rather inadequate. A more useful format would be a fully annotated and linked version of the message. All persons' names, conference names, university names, and others would be linked to their appropriate homepages. This presentation would allow the reader to quickly access content on the web related to the message.

We propose to do exactly this task. Given a DBworld message, we wish to annotate all relevant entities and link them to their appropriate websites (or potentially a set of related websites). To this end, we propose a system which will use many domain-specific resources and perform the proposed task. In this work, we will describe the components of our system and show experiment results.

This project was finished with the help of Prof. Doan and my partners Jim Chan and Linus Wong for CS 498 AD Fall 2004.


Refining Basis Functions in Least-Square Approximations of Zero-Sum Markov Games


PS PDF

Learning in Markov games is a very expensive task. The standard Minimax-Q algorithm requires an immense amount of linear programming that is just unacceptable for most practical problems. In the work by Lagoudakis and Parr, they approximated the Q-table in reinforcement learning using a set of basis functions in the least square sense. The optimal policy is then calculated (via policy iteration) from the approximation. This removes almost all of the linear programming.

The downside of the algorithm is that a human expert is needed to create the set of basis functions. In my work, I used Vector Quantization to refine an initial set of basis functions. In the domain of Littman's soccer game, I show that a refined set one-third the original size performed competitively with the original set while requiring much less computation.

This project was finished with the help of Prof. LaValle for CS 497 SML Spring 2003.

For the really interested parties, all the code is in this tarball. Everything was done in MATLAB R13. You'll need the Optimization toolbox (linprog) and Statistics toolbox (princomp) to get it working. Of course, no documentation is included; good luck figuring everything out.


A Performance Study of Ice Cubing Algorithms


PS PDF

For my data mining course, my final project was to do a thorough performance study of the major ice cubing methods: Multi-Way, BUC, and H-Cubing. Using various synthetic datasets, I tested the performance of all algorithms for variations in dimensionality, cardinality, density, skew, and more. This work was completed for CS 497 HAN Fall 2002.


The ART of Differentiating Human Speech


PS PDF

This is a little final project for one of my classes. Its basic goal is to decide if a particular recording of sound was human or not. For the problem, I was given the pre-processed sound clips, and I coded a simple neural network based on the ART2-A model to give me the answers. My neural net followed the ART2-A model fairly closed except for the learning step. The traditional ART class of networks are used for unsupervised learning but since my sound clips were labeled already, I decided to add the element of supervised learning to ART. Basically, when the algorithm passed the vigilance test and decides to incorporate the vector into the prototype, it will perform another level of supervised checking to make sure the classes match. If they don't, it will ``unlearn'' the vector from the stored prototype.

This turns out to help out the algorithm greatly. In some cases, accuracy improved by as much as 30%. Overall, the accuracy was around 90%, pretty high I suppose. I did this with the help of Prof. Ray for CS 497 SRR Fall 2002.

And if you're really interested, you can get all my source code in this tarball. I didn't include any sound clips so I guess email if you really want them.


A Survey of K-Median and Related Algorithms


PS PDF

The K-Median problem is a very interesting and useful problem in computational geometry. It relates closely to clustering and has applications in data mining, statistical data analysis, compression, and vector quantization. Formally, the problem is defined as follows: we are given n data points in a metric space and a positive integer k. Our goal is then to select k medians such that the sum of the distances from all n points to their nearest median is minimized.

To find out more about the problem, read my survey paper on the subject including other variants of the problem such as K-Means, K-Center, and Facility-Location. The paper includes various methods used to approximate solutions and analysis of several solutions. I did this paper with the assistance of Prof. Har-Peled for CS 290 Independent Study Fall 2001.