I started life in Spokane, Washington. In 1997 I received a B.S. in
mathematics from Western Washington University in Bellingham, the northernmost
city in the continental US. Carnegie Mellon University awarded me a Master's in
Algorithms, Combinatorics, and Optimization (ACO) in 1999. My ACO Ph.D. was
awarded in May 2004.
My overall interest is improving human productivity through fast
automated information analysis and concise summarization. My research interests
include caching data structures with low amortized time cost, and the
application or modification of numerical methods for exact or approximate
statistical computations. I am also interested in hardware and software systems
for high-performance computation.
AD-trees, Association Rules, Astrostatistics, Auton Fast Classifiers, Cached Sufficient Statistics, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Life Science Data Mining, Link Analysis, Logistic Regression, Optimization
Making Logistic Regression A Core Data Mining Tool With TR-IRLS
This short paper is the easiest, fastest way
to learn about Truncated Regularized Iteratively Re-weighted Least
Squares (TR-IRLS), my algorithm for fast, parameter-free logistic
regression. TR-IRLS can also be used for any generalized linear model.
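The core of IRLS for logistic regression can be sketched briefly. This is a minimal illustration, not the TR-IRLS implementation itself: TR-IRLS solves each inner weighted least-squares system approximately with truncated conjugate gradient and adds ridge regularization, whereas this sketch uses a direct solve for clarity. The `lam` and `iters` values are illustrative defaults, not values from the paper.

```python
import numpy as np

def irls_logistic(X, y, lam=0.1, iters=20):
    """Ridge-regularized IRLS for logistic regression (illustrative sketch).
    X: (n, d) design matrix; y: (n,) labels in {0, 1}.
    lam (regularization) and iters are hypothetical defaults."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))            # predicted probabilities
        w = mu * (1.0 - mu)                        # IRLS weights
        z = eta + (y - mu) / np.maximum(w, 1e-10)  # working response
        # One weighted, regularized least-squares step.
        # (TR-IRLS would solve this system approximately via truncated CG.)
        A = X.T @ (w[:, None] * X) + lam * np.eye(d)
        b = X.T @ (w * z)
        beta = np.linalg.solve(A, b)
    return beta
```

Each iteration re-weights the data by the current predictive variance and solves a linear system; truncating that inner solve is what makes TR-IRLS fast on large problems.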
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity
Regularized logistic regression can be fast, accurate, and simple. This paper includes the most important findings of my thesis, and a few new details.
High-Dimensional Probabilistic Classification for Drug Discovery
Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionalities have prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses this limitation.
Logistic Regression for Data Mining and High-Dimensional Classification
We document several approaches to logistic regression parameter estimation, and detail the most promising implementation for high-dimensional classification.
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion
This paper examines the task of link completion, relative algorithm performance, and what this can tell us about the structure of the data.
Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs
Logistic regression can provide faster, better results than SVM for life-sciences datasets with hundreds of thousands of attributes.
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets
Fast implementation of on-demand AD-trees for scores of high-arity attributes and millions of rows.
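The idea behind cached sufficient statistics can be illustrated with a toy counting cache. This is only a flat, precomputed stand-in for intuition: a real AD-tree (and especially the dynamic, on-demand variant above) shares structure between queries and builds nodes lazily, which this sketch does not attempt.

```python
from collections import Counter
from itertools import combinations

class CountCache:
    """Toy cached sufficient statistics, in the spirit of AD-trees.
    Precomputes counts of all attribute-value conjunctions up to
    max_order, so conjunctive count queries become dictionary lookups.
    (A real AD-tree is a far more compact, lazily built structure.)"""

    def __init__(self, rows, max_order=2):
        self.counts = Counter()
        for row in rows:
            items = list(enumerate(row))  # (attribute index, value) pairs
            for k in range(1, max_order + 1):
                for combo in combinations(items, k):
                    self.counts[combo] += 1

    def count(self, query):
        """query: iterable of (attribute index, value) pairs."""
        return self.counts[tuple(sorted(query))]
```

Counting conjunctions like "attribute 0 = v and attribute 1 = w" is the basic operation behind contingency tables, association rules, and Bayes-net scoring, which is why caching it pays off.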
Sparse Logistic Regression
This program performs fast sparse Logistic Regression classification.
Sparse K Nearest Neighbor
This program performs fast sparse K Nearest Neighbor classification.
AFDL (Activity From Demographics and Links)
Predicting activity of entities from linkages between entities and their demographics
Sparse Naive Bayes Classifier
This program performs fast sparse Naive Bayes classification.
Dense Naive Bayes Classifier
This program performs fast dense Naive Bayes classification.
Utility to convert between various file formats used by the Auton Lab software.
Dense Logistic Regression
This program performs fast dense Logistic Regression classification.
Dense K Nearest Neighbor
This program performs fast dense K Nearest Neighbor classification.
This is a logistic regression implementation using our truncated regularized iteratively re-weighted least squares (TR-IRLS) algorithm.
The XGDA Learn software takes link information as input, and learns groups, subgroups and friends (i.e., most likely collaborators) from that link information.
A collection of fast classifiers including knn, aknn, naive bayes, decision tree, and logistic regression.