Projects
Here are some course projects I've worked on in the past few years.
The final reports and sometimes code are included for your reading
pleasure.
Annotating DBWorld Messages Using Domain Resources
 |
 |
The DBworld mailing list is a primary channel of
communication for database researchers around the world. Conference
call-for-papers, job postings, book releases, and many other types
of announcements are routinely posted in DBworld. It has become an
indispensable tool for researchers who wish to stay up-to-date with
the research community. Unfortunately, the only means of
distribution of the mailing list is through email or flat text
copies on the web. In the world of hyperlinks and semantic data,
this format is rather inadequate. A more useful format would be a
fully annotated and linked version of the message. All persons'
names, conference names, university names, and others would be
linked to their appropriate homepages. This presentation would
allow the reader to quickly access content on the web related to the
message.
We propose to do exactly this task. Given a DBworld message,
we wish to annotate all relevant entities and link them to their
appropriate websites (or potentially a set of related websites). To
this end, we propose a system which will use many domain-specific
resources and perform the proposed task. In this work, we will
describe the components of our system and show experiment
results.
This project was finished with the help of Prof. Doan and my partners
Jim Chan and Linus Wong for CS
498 AD Fall 2004. |
Refining Basis Functions in Least-Square Approximations of Zero-Sum
Markov Games
 |
 |
Learning in Markov games is a very expensive task. The
standard Minimax-Q algorithm requires an immense amount of linear
programming that is just unacceptable for most practical problems.
In the work by
Lagoudakis and Parr, they approximated the Q-table in reinforcement
learning using a set of basis functions in the least square sense.
The optimal policy is then calculated (via policy iteration) from
the approximation. This removes almost all of the linear
programming.
The downside of the algorithm is that a human expert is needed to
create the set of basis functions. In my work, I used Vector
Quantization to refine an initial set of basis functions. In the
domain of Littman's soccer game, I show that a refined set one-third
the original size performed competitively with the original set
while requiring much less computation.
This project was finished with the help of Prof. LaValle for CS
497 SML Spring 2003.
For the really interested parties, all the code is in this tarball. Everything was done
in MATLAB R13. You'll need
the Optimization toolbox (linprog) and Statistics toolbox (princomp)
to get it working. Of course, no documentation is included; good
luck figuring everything out. |
A Performance Study of Ice Cubing Algorithms
 |
 |
For my data mining course, my final project was to do a
thorough performance study of the major ice cubing methods:
Multi-Way, BUC, and H-Cubing. Using various synthetic datasets, I
tested the performance of all algorithms for variations in
dimensionality, cardinality, density, skew, and more. This work was
completed for CS 497 HAN Fall 2002. |
The ART of Differentiating Human Speech
 |
 |
This is a little final project for one of my classes. Its
basic goal is to decide if a particular recording of sound was human
or not. For the problem, I was given the pre-processed sound clips,
and I coded a simple neural network based on the ART2-A model to
give me the answers. My neural net followed the ART2-A model fairly
closed except for the learning step. The traditional ART class of
networks are used for unsupervised learning but since my sound clips
were labeled already, I decided to add the element of supervised
learning to ART. Basically, when the algorithm passed the vigilance
test and decides to incorporate the vector into the prototype, it
will perform another level of supervised checking to make sure the
classes match. If they don't, it will ``unlearn'' the vector from
the stored prototype.
This turns out to help out the algorithm greatly. In some cases,
accuracy improved by as much as 30%. Overall, the accuracy was
around 90%, pretty high I suppose. I did this with the help of Prof. Ray for CS 497 SRR
Fall 2002.
And if you're really interested, you can get all my source code
in this tarball. I
didn't include any sound clips so I guess email if you really want
them. |
A Survey of K-Median and Related Algorithms
 |
 |
The K-Median problem is a very interesting and useful problem
in computational geometry. It relates closely to clustering and has
applications in data mining, statistical data analysis, compression,
and vector quantization. Formally, the problem is defined as
follows: we are given n data points in a metric space and a
positive integer k. Our goal is then to select k
medians such that the sum of the distances from all n
points to their nearest median is minimized.
To find out more about the problem, read my survey paper on the
subject including other variants of the problem such as K-Means,
K-Center, and Facility-Location. The paper includes various methods
used to approximate solutions and analysis of several solutions. I
did this paper with the assistance of Prof. Har-Peled for CS
290 Independent Study Fall 2001.
|
|