Paul Komarek
komarek.paul@gmail.com
Alumni
My Website

Biography
I started life in Spokane, Washington. In 1997 I received a B.S. in
mathematics from Western Washington University in Bellingham, the northernmost
city in the continental US. Carnegie Mellon University awarded me a Master's in
Algorithms, Combinatorics, and Optimization (ACO) in 1999. My ACO Ph.D. was
awarded in May 2004.
Research Interests
My overall interest is the improvement of human productivity through fast
automated information analysis and concise summarization. My research interests
include caching data structures with low amortized time cost, and the
application or modification of numerical methods for exact or approximate
statistical computations. I am also interested in hardware and software systems
for high-performance computation.
Tags
AD-trees, Association Rules, Astrostatistics, Auton Fast Classifiers, Cached Sufficient Statistics, Efficient Statistical Algorithms, Kd-trees and Ball-trees, Life Science Data Mining, Link Analysis, Logistic Regression, Optimization
Papers
-
Making Logistic Regression A Core Data Mining Tool With TR-IRLS
(2005)
This short paper is the easiest, fastest way
to learn about Truncated Regularized Iteratively Re-weighted Least
Squares (TR-IRLS), my algorithm for fast, parameter-free logistic
regression. TR-IRLS can also be used for any generalized linear
model. This
-
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity
(2005)
Regularized logistic regression can be fast, accurate, and simple. This paper includes the most important findings of my thesis, and a few new details.
-
High-Dimensional Probabilistic Classification for Drug Discovery
(2004)
Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionalities have prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses
-
Logistic Regression for Data Mining and High-Dimensional Classification
(2004)
We document several approaches to logistic regression parameter estimation, and detail the most promising implementation for high-dimensional classification.
-
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion
(2003)
This paper examines the task of link completion, relative algorithm performance, and what this can tell us about the structure of the data.
-
Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs
(2003)
Logistic regression can provide faster, better results than SVM for life-sciences datasets with hundreds of thousands of attributes.
-
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets
(2000)
Fast implementation of on-demand AD-trees for scores of high-arity attributes and millions of rows.
Talks
Software
-
Sparse Logistic Regression
This program performs fast sparse Logistic Regression classification.
-
Sparse K Nearest Neighbor
This program performs fast sparse K Nearest Neighbor classification.
-
AFDL (Activity From Demographics and Links)
Predicts the activity of entities from their demographics and the linkages between them.
-
Sparse Naive Bayes Classifier
This program performs fast sparse Naive Bayes classification.
-
Dense Naive Bayes Classifier
This program performs fast dense Naive Bayes classification.
-
convert
Utility to convert between various file formats used by the Auton Lab software.
-
Dense Logistic Regression
This program performs fast dense Logistic Regression classification.
-
Dense K Nearest Neighbor
This program performs fast dense K Nearest Neighbor classification.
-
lr_trirls
This is a logistic regression implementation using our truncated regularized iteratively re-weighted least squares (TR-IRLS) algorithm.
-
XGDA Learn
The XGDA Learn software takes link information as input, and learns groups, subgroups and friends (i.e., most likely collaborators) from that link information.
-
Fast Classifiers
A collection of fast classifiers including knn, aknn, naive bayes, decision tree, and logistic regression.