Jeremy Kubica
Graduate Student
Biography
I came to CMU directly after finishing my undergraduate degree in computer science at Cornell University. As an undergraduate, I also spent two summers working at research labs (PARC and FXPAL) in California, doing robotics/AI research and biking.
Research Interests
My main research interests are in artificial intelligence, machine learning, data mining, and robotics. Specifically, I am interested in "real world" problems, where the available information is usually very noisy and incomplete. How can computers form a meaningful picture and make decisions from this type of information?
Tags
Applications, Astrostatistics, Cached Sufficient Statistics, Clustering, Efficient Statistical Algorithms, GDA, Kd-trees and Ball-trees, Link Analysis, Statistical Data Mining for Astrophysics
Papers
-
A Multiple Tree Algorithm for the Efficient Association of Asteroid Observations
(2005)
-
Efficiently Identifying Close Track/Observation Pairs in Continuous Timed Data
(2005)
-
Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
(2005)
-
Tractable Group Detection on Large Link Data Sets
(2003)
We present the k-groups algorithm, an improvement on the GDA algorithm that offers significant computational advantages. The k-groups algorithm allows tractable group detection on large data sets.
-
Finding Underlying Connections: A Fast Graph-Based Method for Link Analysis and Collaboration Queries
(2003)
CGraph is an algorithm to quickly learn a graph-based model of the underlying connections of a set of entities given link data.
-
cGraph: A Fast Graph-Based Method for Link Analysis and Queries
(2003)
This paper is an extended version of the 2003 ICML conference paper.
-
Probabilistic Noise Identification and Data Cleaning
(2003)
We examine the use of explicit noise and corruption models to aid in the task of noise identification and data cleaning.
-
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion
(2003)
This paper examines the task of link completion, relative algorithm performance, and what this can tell us about the structure of the data.
-
Stochastic Link and Group Detection
(2002)
This paper introduces the GDA algorithm. We use noisy link data (n-tuples of entities) to learn underlying groupings of entities.
Software
-
npt
N-point Spatial Statistics.
-
cGraph
CGraph is an algorithm to quickly learn a graph-based model of the underlying connections of a set of entities given link data.
-
k-groups/GDA
The group detection algorithm (GDA) finds underlying groupings of entities given a set of observed links and demographic information.
-
XGDA Learn
The XGDA Learn software takes link information as input, and learns groups, subgroups and friends (i.e., most likely collaborators) from that link information.