I received my undergraduate degree in computer science from Tsinghua University, China. I am now a third-year graduate student in the Computer Science Department at CMU.
My research interests lie in machine learning and data mining, with a focus on nonparametric statistics, memory-based learning, and kernel-based learning. I am currently interested in designing high-performance algorithms that solve fundamental tasks (such as k-nearest neighbor and support vector machines) on massive, high-dimensional data sets.
An Investigation of Practical Approximate Nearest Neighbor Algorithms
How can variations on classic exact nearest-neighbor data structures give faster answers if you are prepared to accept approximation?
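The page does not say which data structures the paper studies. As a hypothetical illustration of the general idea, here is a kd-tree search that uses the common (1 + eps) pruning rule: a branch is skipped unless it could contain a point closer than best / (1 + eps). The names `build_kdtree` and `approx_nearest` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))


def approx_nearest(root: Optional[Node], query: Sequence[float],
                   eps: float = 0.0) -> Tuple[Optional[Point], float]:
    """Return a point within (1 + eps) times the true nearest-neighbor distance."""
    best: list = [None, math.inf]

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = math.dist(node.point, query)
        if d < best[1]:
            best[0], best[1] = node.point, d
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Every point on the far side is at least |diff| away, so skipping it is
        # safe whenever |diff| >= best / (1 + eps). eps = 0 gives exact search.
        if abs(diff) * (1.0 + eps) < best[1]:
            visit(far)

    visit(root)
    return best[0], best[1]
```

A larger `eps` prunes more branches, trading answer quality for speed; the returned distance is never more than (1 + eps) times the exact one.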
High-Dimensional Probabilistic Classification for Drug Discovery
Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionality has prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses
The IOC algorithm: Efficient Many-Class Non-parametric Classification for High-Dimensional Data
Performing k-nearest-neighbor classification on many-class problems without actually finding the k nearest neighbors.
Efficient Exact k-NN and Nonparametric Classification in High Dimensions
Can we perform exact (non-approximate) k-NN classification without actually finding the k nearest neighbors?
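One way to see why this is possible, for binary labels: "are at least t of the k nearest neighbors positive?" is equivalent to asking whether the t-th closest positive point has at most k - t negatives closer to the query than it does. The brute-force sketch below shows only this counting reformulation, assuming no distance ties; it is not the published algorithm, and a practical version would presumably replace the linear scans with tree-based bounds. The function name is hypothetical.

```python
import heapq
import math
from typing import Sequence


def at_least_t_positive(query: Sequence[float],
                        positives: Sequence[Sequence[float]],
                        negatives: Sequence[Sequence[float]],
                        k: int) -> bool:
    """True iff a majority (t = k // 2 + 1) of the k nearest neighbors are positive.

    Assumes no distance ties. The k-NN set itself is never materialized.
    """
    t = k // 2 + 1
    if len(positives) < t:
        return False
    # Radius r: distance to the t-th closest positive point.
    r = heapq.nsmallest(t, (math.dist(p, query) for p in positives))[-1]
    # That positive has rank t + (#negatives closer than r). It lies inside the
    # k-NN (so at least t of the k-NN are positive) iff this rank is <= k.
    budget = k - t
    inside = 0
    for x in negatives:
        if math.dist(x, query) < r:
            inside += 1
            if inside > budget:  # answer is already "no": stop scanning
                return False
    return True
```

The negative scan can stop as soon as the budget is exceeded, which hints at why a classification decision can be cheaper than retrieving the neighbors themselves.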
Autonomous Fast Classifiers for Pharmaceutical Data Sets
Muncie, IN, 5/24/04
Sparse Logistic Regression
This program performs fast sparse Logistic Regression classification.
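The program's actual solver is not described here. As a sketch of why sparsity helps, assuming stochastic gradient descent on log-loss, both the prediction and the weight update touch only the nonzero features of each example. Sparse vectors are represented as hypothetical `{feature_index: value}` dicts.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # feature index -> nonzero value


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Dict[int, float], b: float, x: SparseVec) -> float:
    # The dot product visits only the nonzero entries of x.
    return sigmoid(b + sum(w.get(j, 0.0) * v for j, v in x.items()))


def train_sgd(data: List[Tuple[SparseVec, int]], epochs: int = 20,
              lr: float = 0.5, l2: float = 1e-4) -> Tuple[Dict[int, float], float]:
    w: Dict[int, float] = {}
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            g = predict_proba(w, b, x) - y  # d(log-loss)/dz
            for j, v in x.items():
                # L2 shrinkage is applied only to features active in this example,
                # a common sparse-SGD shortcut rather than an exact L2 step.
                wj = w.get(j, 0.0)
                w[j] = wj - lr * (g * v + l2 * wj)
            b -= lr * g
    return w, b
```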
Sparse K Nearest Neighbor
This program performs fast sparse K Nearest Neighbor classification.
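The distance measure used by the program is not stated. Assuming cosine similarity, a standard trick for sparse data is an inverted index from each feature to the training rows where it is nonzero, so a query only touches rows that share at least one feature with it. The `SparseKNN` class below is a hypothetical sketch of that idea.

```python
import heapq
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional

SparseVec = Dict[int, float]  # feature index -> nonzero value


def l2_norm(x: SparseVec) -> float:
    return math.sqrt(sum(v * v for v in x.values()))


class SparseKNN:
    """k-NN with cosine similarity over sparse vectors, via an inverted index."""

    def __init__(self, X: List[SparseVec], y: List[int]):
        self.y = y
        self.norms = [l2_norm(x) for x in X]
        # Inverted index: feature -> [(row, value)] for rows where it is nonzero.
        self.index: Dict[int, list] = defaultdict(list)
        for i, x in enumerate(X):
            for j, v in x.items():
                self.index[j].append((i, v))

    def predict(self, q: SparseVec, k: int = 3) -> Optional[int]:
        qn = l2_norm(q)
        if qn == 0:
            return None
        dots: Dict[int, float] = defaultdict(float)
        # Only rows sharing a nonzero feature with q are touched; all other
        # rows have cosine similarity 0 and are ignored.
        for j, qv in q.items():
            for i, v in self.index.get(j, []):
                dots[i] += qv * v
        sims = ((d / (qn * self.norms[i]), i) for i, d in dots.items() if self.norms[i] > 0)
        top = heapq.nlargest(k, sims)
        if not top:
            return None
        # Majority vote among the k most similar rows.
        return Counter(self.y[i] for _, i in top).most_common(1)[0][0]
```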
Sparse Naive Bayes Classifier
This program performs fast sparse Naive Bayes classification.
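Assuming binary (Bernoulli) features with Laplace smoothing, which the page does not specify, naive Bayes can score a sparse example in time proportional to its number of active features. The identity used is log P(x | c) = sum over all j of log(1 - p_cj), plus, for each active j, log p_cj - log(1 - p_cj); the first sum is precomputed per class. Function names are illustrative.

```python
import math
from collections import defaultdict
from typing import List, Set


def train_bernoulli_nb(X: List[Set[int]], y: List[int], n_features: int,
                       alpha: float = 1.0) -> dict:
    """X holds the set of active (value 1) feature indices for each example."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        counts = defaultdict(int)
        for x in rows:
            for j in x:
                counts[j] += 1
        denom = len(rows) + 2 * alpha
        default_p = alpha / denom  # smoothed p for features never seen in class c
        # Log-likelihood of an all-zero vector: sum_j log(1 - p_cj).
        base = n_features * math.log(1 - default_p)
        delta = {}
        for j, cnt in counts.items():
            p = (cnt + alpha) / denom
            base += math.log(1 - p) - math.log(1 - default_p)
            delta[j] = math.log(p) - math.log(1 - p)
        default_delta = math.log(default_p) - math.log(1 - default_p)
        prior = math.log(len(rows) / len(y))
        model[c] = (prior + base, delta, default_delta)
    return model


def predict(model: dict, x: Set[int]) -> int:
    def score(c):
        const, delta, default_delta = model[c]
        # Only the active features of x are visited.
        return const + sum(delta.get(j, default_delta) for j in x)
    return max(model, key=score)
```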
Dense Naive Bayes Classifier
This program performs fast dense Naive Bayes classification.
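For dense real-valued features, one common choice is a Gaussian class-conditional density per feature. That modeling choice is an assumption here, not something the page states. A minimal sketch, with a small variance floor to avoid division by zero:

```python
import math
from typing import Dict, List, Sequence, Tuple


def train_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int],
                      var_floor: float = 1e-9) -> Dict[int, Tuple[float, List[float], List[float]]]:
    model = {}
    n = len(X)
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # Per-feature (population) variance, floored for numerical safety.
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), var_floor)
                     for j in range(d)]
        model[c] = (math.log(len(rows) / n), means, variances)
    return model


def predict_gaussian_nb(model, x: Sequence[float]) -> int:
    def log_posterior(c):
        log_prior, means, variances = model[c]
        # Sum of per-feature Gaussian log-densities (the naive independence assumption).
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)
```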
Dense Logistic Regression
This program performs fast dense Logistic Regression classification.
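The page does not describe the optimizer. As a sketch only, assuming full-batch gradient descent on the mean log-loss plus an L2 penalty (l2 / 2) * ||w||^2 on the weights (bias unpenalized), the dense case looks like this; function names and defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_dense_lr(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, iters: int = 1000,
                   l2: float = 1e-3) -> Tuple[List[float], float]:
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw = [l2 * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            for j in range(d):
                gw[j] += err * x[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```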
Dense K Nearest Neighbor
This program performs fast dense K Nearest Neighbor classification.
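How the program speeds up dense k-NN is not stated. One simple, exact speedup for a linear scan is partial-distance elimination: keep the current k best in a max-heap and abandon a squared-distance sum as soon as it reaches the current k-th best. The sketch below assumes Euclidean distance; `k_nearest` and `knn_predict` are illustrative names.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence


def k_nearest(X: Sequence[Sequence[float]], q: Sequence[float], k: int) -> List[int]:
    """Indices of the k nearest rows of X to q, closest first."""
    heap: list = []  # max-heap of (-squared_distance, index)
    for i, x in enumerate(X):
        bound = -heap[0][0] if len(heap) == k else math.inf
        d2 = 0.0
        for a, b in zip(x, q):
            d2 += (a - b) ** 2
            if d2 >= bound:  # already no better than the k-th best: stop early
                break
        else:  # loop finished without break, so x belongs in the current top k
            if len(heap) < k:
                heapq.heappush(heap, (-d2, i))
            else:
                heapq.heapreplace(heap, (-d2, i))
    return [i for _, i in sorted(heap, reverse=True)]


def knn_predict(X, y, q, k: int = 3) -> int:
    # Majority vote among the k nearest neighbors.
    return Counter(y[i] for i in k_nearest(X, q, k)).most_common(1)[0][0]
```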
A collection of fast classifiers including k-nearest neighbor (knn), approximate k-nearest neighbor (aknn), naive Bayes, decision tree, and logistic regression.