Here is a sample of the problem areas I'm interested in,
along with a few of the related works. The
year-by-year list
has additional works, including some that don't fit into any of these topics.
- Transposable data
Many statistical data matrices have
both many rows and many columns. Often both
rows and columns correspond to entities of
interest (e.g. proteins and genes, or movies
and customers) rather than to IID samples. This is not
just the p>n setting, because in these problems it is
arguable which dimension has p levels and which has n.
Questions then
arise as to how to bootstrap, cross-validate,
and visualize such data. For example:
- Empirical likelihood
Much of statistical inference is organized around the likelihood
function. That usually requires an unpleasant assumption that the
data come from one of the popular parametric families. Empirical
likelihood uses a data-determined likelihood function to avoid this.
There is no loss of power up to second-order asymptotics, and
it can either win or lose compared to the true likelihood at
Web page for the book
1988 Biometrika paper at
JSTOR for univariate mean
1990 Annals of Statistics paper at
JSTOR for multivariate mean
1991 Annals of Statistics paper at
JSTOR for linear models
Escaping the convex hull EJS (with Sarah Emerson)
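To make the idea concrete, here is a minimal sketch of the empirical likelihood ratio for a univariate mean. It is an illustration written for this page, not code from the papers above: the function name `neg2_log_el_ratio`, the bisection solver, and the sample values are all assumptions. The weights have the form w_i = 1/(n(1 + lambda (x_i - mu))), with lambda chosen so that the weighted mean equals mu.

```python
import math


def neg2_log_el_ratio(x, mu, iters=200):
    """Return -2 log R(mu) for the mean of the univariate sample x.

    Hypothetical helper for illustration. R(mu) maximizes prod(n * w_i)
    subject to sum(w_i) = 1 and sum(w_i * (x_i - mu)) = 0.
    """
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    # mu must lie strictly inside the convex hull of the data;
    # otherwise no valid weights exist and the ratio is zero.
    if z_min >= 0.0 or z_max <= 0.0:
        return math.inf
    # All weights are positive only for lambda in (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    lam = 0.0  # always feasible, since 1 + 0 * z_i = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:  # interval collapsed in floating point
            break
        lam = mid
        # g is strictly decreasing in lambda, so bisection finds its root.
        g = sum(zi / (1.0 + lam * zi) for zi in z)
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


# Made-up sample values, used only to exercise the function.
sample = [1.2, 0.7, 2.9, 1.8, 2.2, 0.4, 1.5, 3.1]
for mu in (1.0, 1.5, sum(sample) / len(sample), 5.0):
    print(mu, neg2_log_el_ratio(sample, mu))
```

Values of mu whose statistic falls below about 3.84 (the 0.95 quantile of a chi-squared distribution with one degree of freedom) form an approximate 95% confidence interval. The statistic is infinite for any mu outside the range of the data, which is the convex hull constraint mentioned in the last reference above.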
- Monte Carlo and quasi-Monte Carlo
Monte Carlo integration typically gets a root mean
square error (RMSE) of O(n^(-1/2)). Quasi-Monte Carlo (QMC) sampling uses deterministic
points more uniformly distributed than random ones, and it gets
an error of O(n^(-1+epsilon)) for any epsilon>0.
Randomizing the QMC points (while preserving their uniformity)
allows replication-based error estimates. It can also bring
error cancellation leading to an RMSE of O(n^(-3/2+epsilon)).
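The replication idea can be sketched in a few lines. The example below is an assumption-laden illustration, not the scrambled-net construction behind the O(n^(-3/2+epsilon)) rate: it swaps in a simpler randomization, a rank-1 Fibonacci lattice with a uniform random shift, and uses a made-up smooth integrand with a known integral. All function names are hypothetical.

```python
import math
import random


def shifted_lattice_estimate(f, n, z, rng):
    """One randomized QMC estimate: rank-1 lattice plus a uniform random shift."""
    shift = [rng.random() for _ in z]
    total = 0.0
    for i in range(n):
        # Each shifted point is uniform on the unit cube, so the estimate is unbiased.
        point = [((i * zj) / n + sj) % 1.0 for zj, sj in zip(z, shift)]
        total += f(point)
    return total / n


def plain_mc_estimate(f, n, dim, rng):
    """Ordinary Monte Carlo estimate from n IID uniform points."""
    total = 0.0
    for _ in range(n):
        total += f([rng.random() for _ in range(dim)])
    return total / n


def rmse(estimates, truth):
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def integrand(x):
    # Illustrative smooth integrand on [0,1]^2 with a closed-form integral.
    return math.exp(x[0] + x[1])


TRUE_VALUE = (math.e - 1.0) ** 2

rng = random.Random(0)
n, z, reps = 610, (1, 377), 30  # consecutive Fibonacci numbers 377 and 610
rqmc = [shifted_lattice_estimate(integrand, n, z, rng) for _ in range(reps)]
mc = [plain_mc_estimate(integrand, n, 2, rng) for _ in range(reps)]

rqmc_rmse = rmse(rqmc, TRUE_VALUE)
mc_rmse = rmse(mc, TRUE_VALUE)
print("RQMC RMSE:", rqmc_rmse)
print("MC RMSE:  ", mc_rmse)
```

Because the independent shifts give independent unbiased estimates, the spread across the replicates can serve as an error estimate even when the true integral is unknown; the known value is used here only to compare the two methods.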
The following papers are motivated by large data problems
in biology. Most of the work was done with the
Stuart Kim lab
or as followup on theoretical holes that became
evident in that work.
models with Laura Lazzeroni.
gene recommender for completing partially known
clusters (with Kim lab). It is a kind of supervised correlation.
AGEMAP project (with Kim lab and Kevin Becker's lab
at the National Institute on Aging).
Another look at Karl Pearson's
wrongly thought for over 50 years to be inadmissible. It actually
beats Fisher's test on certain alternatives, when the null hypotheses
tend to be violated in the same direction as each other.
Aging in the human
muscle with the Kim lab.
For correlated hypothesis tests
this paper finds the
of the number of false discoveries.