Paul Komarek
Alumni
Biography
I started life in Spokane, Washington. In 1997 I received a B.S. in mathematics from Western Washington University in Bellingham, the northernmost city in the continental US. Carnegie Mellon University awarded me a Master's in Algorithms, Combinatorics, and Optimization (ACO) in 1999. My ACO Ph.D. was awarded in May 2004.
Research Interests
My overall interest is the improvement of human productivity through fast automated information analysis and concise summarization. My research interests include caching data structures with low amortized time cost, and the application or modification of numerical methods for exact or approximate statistical computations. I am also interested in hardware and software systems for high-performance computation.
Tags
AD-trees, Association Rules, Astrostatistics, Auton Fast Classifiers, Cached Sufficient Statistics, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Life Science Data Mining, Link Analysis, Logistic Regression, Optimization
Papers
-
Making Logistic Regression A Core Data Mining Tool With TR-IRLS
(2005)
This short paper is the easiest, fastest way to learn about Truncated Regularized Iteratively Re-weighted Least Squares (TR-IRLS), my algorithm for fast, parameter-free logistic regression. TR-IRLS can also be used for any generalized linear model. This
-
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity
(2005)
Regularized logistic regression can be fast, accurate, and simple. This paper includes the most important findings of my thesis, and a few new details.
-
High-Dimensional Probabilistic Classification for Drug Discovery
(2004)
Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionalities have prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses
-
Logistic Regression for Data Mining and High-Dimensional Classification
(2004)
We document several approaches to logistic regression parameter estimation, and detail the most promising implementation for high-dimensional classification.
-
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion
(2003)
This paper examines the task of link completion, relative algorithm performance, and what this can tell us about the structure of the data.
-
Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs
(2003)
Logistic regression can provide faster, better results than SVM for life-sciences datasets with hundreds of thousands of attributes.
-
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets
(2000)
Fast implementation of on-demand AD-trees for scores of high-arity attributes and millions of rows.
Talks
-
Logistic Regression: Not Dead Yet
Mountain View, CA, 7/28/05
-
Autonomous Fast Classifiers for Pharmaceutical Data Sets
Muncie, IN, 5/24/04
-
Logistic Regression for Data Mining and High-Dimensional Classification
Pittsburgh, PA, 4/14/04
-
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets
Stanford University, CA, 6/30/00
Recently Updated Software
-
Sparse Logistic Regression
This program performs fast sparse Logistic Regression classification.
-
Sparse K Nearest Neighbor
This program performs fast sparse K Nearest Neighbor classification.
-
AFDL (Activity From Demographics and Links)
Predicting activity of entities from linkages between entities and their demographics.
-
Sparse Naive Bayes Classifier
This program performs fast sparse Naive Bayes classification.
-
Dense Naive Bayes Classifier
This program performs fast dense Naive Bayes classification.