Reports
Most of these papers are based upon work supported by the
National Science Foundation under Grants:
IIS-1837931,
DMS-1521145,
DMS-1407397,
DMS-0906056,
DMS-0604939,
DMS-0306612,
and DMS-0072445.
Any opinions, findings, and conclusions or recommendations
expressed in this material are those
of the author(s) and do not necessarily reflect the
views of the National Science Foundation.
Slides from talks are beside some of the articles.
More talks are here.
There is work on empirical likelihood,
Monte Carlo & quasi-Monte Carlo, computer experiments,
transposable data, hypothesis testing and bioinformatics. You
may have to scroll down for some of those.
Papers by year of initial posting
Each article goes under the year when it was first
written, usually as a technical report. Revisions don't usually
make an article move up the list. Some of them are marked with
publication information, some not.
2022
-
N. Hama, M. Mase and A. B. Owen
Model free Shapley values for high dimensional data
arXiv
We develop model-free approximate Shapley value methods for a high dimensional setting.
Model-free methods only need predictions from a model and do not have to evaluate
the model at new points. They apply more generally than model-agnostic methods
that can work with any model. For instance they can identify input variables
that are important to a residual. Cohort Shapley is model free. By using an integrated
gradients approach we get a cohort Shapley method with linear cost in the number of variables
instead of exponential cost. We show that the function we use is very close to one
for which integrated gradients match Shapley value over the vast majority of the unit cube.
We apply the method to a problem from high energy physics and one from computational chemistry.
-
Z. Pan and A. B. Owen
Super-polynomial accuracy of multidimensional randomized nets using the median-of-means
arXiv
We show that a median-of-means strategy for randomized quasi-Monte Carlo sampling via linearly scrambled
Sobol' points attains superpolynomial accuracy for analytic functions on \([0,1]^s\).
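Here is a minimal sketch of the median-of-means idea (not the paper's code), using scipy's scrambled Sobol' generator as a stand-in for the random linear scrambling analyzed in the paper; the integrand, dimension and replicate count are illustrative assumptions.
```python
# Hedged sketch only: scipy's scrambled Sobol' points stand in for the random
# linear scrambling in the paper; the integrand and sizes are illustrative.
import numpy as np
from scipy.stats import qmc

def median_rqmc(f, s=3, m=10, replicates=15, seed=0):
    """Median over independent scrambled-Sobol' estimates of the integral of f over [0,1]^s."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(replicates):
        sob = qmc.Sobol(d=s, scramble=True, seed=rng)
        x = sob.random_base2(m)            # n = 2^m points from one randomization
        estimates.append(f(x).mean())
    return np.median(estimates)            # median of the independent estimates

f = lambda x: np.exp(np.sum(x, axis=1))    # analytic test integrand
print(median_rqmc(f), (np.e - 1) ** 3)     # estimate vs exact value
```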
-
N. Hama, M. Mase and A. B. Owen
Deletion and Insertion Tests in Regression Models
arXiv
We study the deletion and insertion tests of Petsiuk et al. (2018). These measure the quality of algorithms that rank variables by importance. Their original application was for image classification and we consider more general regression problems. We obtain expressions for their area under the curve (AUC) quantity in terms of the anchored decomposition of functions used in complexity theory. We show that the ordering by Shapley value does not necessarily optimize AUC. We show that this ordering is optimal for logistic regression or naive Bayes. On two real data problems we find the best AUCs for Kernel SHAP. We also find that a much more scalable integrated gradients method is nearly as good. We also compare LIME, DeepLIFT and vanilla gradients. Integrated gradients are not well defined for binary predictors. We compare three methods to handle them and conclude that simply casting the binary variables to real values is the best choice.
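Here is a hedged toy sketch of a deletion-test AUC for a given variable ordering; the function name, the baseline and the toy model are illustrative assumptions, not the paper's code.
```python
import numpy as np

def deletion_auc(f, x, baseline, order):
    """Area under the deletion curve: average prediction as the variables of x
    are replaced by baseline values in the given importance order."""
    x = np.array(x, dtype=float)
    curve = [f(x)]
    for j in order:
        x[j] = baseline[j]                 # "delete" variable j
        curve.append(f(x))
    curve = np.array(curve)
    # trapezoid rule over an equally spaced grid on [0, 1]
    return float((0.5 * curve[0] + curve[1:-1].sum() + 0.5 * curve[-1]) / (len(curve) - 1))

# Toy model where variable 0 matters most: a good ordering gives a smaller deletion AUC.
f = lambda z: 3.0 * z[0] + 1.0 * z[1] + 0.1 * z[2]
print(deletion_auc(f, [1.0, 1.0, 1.0], np.zeros(3), order=[0, 1, 2]))
print(deletion_auc(f, [1.0, 1.0, 1.0], np.zeros(3), order=[2, 1, 0]))
```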
-
H. H. Li and A. B. Owen
A general characterization of optimal tie-breaker designs
arXiv
This paper finds optimal tie-breaker designs for a two line model when the running variable is not necessarily symmetrically distributed and the treatment fraction is not necessarily 1/2. The optimal design is always piece-wise constant over a small number of intervals. When the economically or even ethically important constraint of monotone treatment probability is imposed, then only two distinct treatment probabilities are needed. The optimal design can be much more efficient than one using only treatment probabilities 0, 0.5 and 1. The proof uses the method of undetermined multipliers. This paper also connects tie-breaker designs to covariate adaptive optimal design. Prior works connected tie-breakers to regression discontinuity.
-
T. P. Morrison and A. B. Owen
Optimality in Multivariate Tie-breaker Designs
arXiv
Tie-breaker designs (TBDs), in which subjects with extreme values are assigned treatment deterministically and those in the middle are randomized, are intermediate between regression discontinuity designs (RDDs) and randomized controlled trials (RCTs). TBDs thus provide a convenient mechanism by which to trade off between the treatment benefit of an RDD and the statistical efficiency gains of an RCT. We study a model where the expected response is one multivariate regression for treated subjects and another one for control subjects. For a given set of subject data we show how to use convex optimization to choose treatment probabilities that optimize a prospective D-optimality condition (expected information gain) adapted from Bayesian optimal design. We can incorporate economically motivated linear constraints on those treatment probabilities as well as monotonicity constraints that have a strong ethical motivation. Our condition can be used in two scenarios: known covariates with random treatments, and random covariates with random treatments. We find that optimality for the treatment effect coincides with optimality for the whole regression, and that the RCT satisfies moment conditions for optimality. For Gaussian data we can find optimal linear scorings of subjects, one for statistical efficiency and another for short term treatment benefit.
We apply the convex optimization solution to some real emergency triage data from MIMIC.
-
S. Liu and A. B. Owen
Pre-integration via Active Subspaces
arXiv
Pre-integration is a quasi-Monte Carlo (QMC) version of conditional Monte Carlo in which one of d input variables is integrated out in closed form. It can leave a d-1 dimensional integrand with improved regularity, bringing greatly increased accuracy for QMC and randomized QMC. For integrals defined by a Gaussian distribution, the problem arises of which linear combination of variables to pre-integrate over. We propose to use the lead eigenvector of \(\mathbb{E}(\nabla f(x)\nabla f(x)^T)\). This is the one dimensional active subspace direction from work of Constantine and others. This choice works very well in some financial valuation problems where there is no established choice such as the first principal component. We show that pre-integration does not increase the RQMC variance.
We show that the lead eigenvector in an active subspace decomposition is closely related to the vector that maximizes a less computationally tractable criterion using a Sobol’ index to find the most important linear combination of Gaussian variables. They optimize similar expectations involving the gradient. We show that the Sobol’ index criterion for the leading eigenvector is invariant to the way that one chooses the remaining d−1 eigenvectors with which to sample the Gaussian vector.
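Here is a minimal Monte Carlo sketch of estimating that active subspace direction; the toy integrand, gradient and sample size are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

def active_direction(grad_f, d, n=4000, seed=1):
    """Monte Carlo estimate of the lead eigenvector of E[grad f(x) grad f(x)^T], x ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    g = np.apply_along_axis(grad_f, 1, x)      # n x d matrix of gradients
    C = g.T @ g / n                            # estimate of E[grad grad^T]
    _, vecs = np.linalg.eigh(C)
    return vecs[:, -1]                         # eigenvector for the largest eigenvalue

# Toy integrand f(x) = exp(theta^T x): its gradient is theta * f(x), so the
# estimated direction should align with theta (up to sign).
theta = np.array([3.0, 1.0, 0.5])
grad_f = lambda x: theta * np.exp(theta @ x)
print(active_direction(grad_f, d=3), theta / np.linalg.norm(theta))
```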
-
D. Kluger and A. B. Owen and D. B. Lobell
Combining randomized field experiments with observational satellite data to assess the benefits of crop rotations on yields
arXiv |
Environmental Research Letters
With climate change threatening agricultural productivity and global food demand increasing, it is important to better understand which farm management practices will maximize crop yields in various climatic conditions. To assess the effectiveness of agricultural practices, researchers often turn to randomized field experiments, which are reliable for identifying causal effects but are often limited in scope and therefore lack external validity. Recently, researchers have also leveraged large observational datasets from satellites and other sources, which can lead to conclusions biased by confounding variables or systematic measurement errors. Because experimental and observational datasets have complementary strengths, in this paper we propose a method that uses a combination of experimental and observational data in the same analysis.
2021
-
Z. Pan and A. B. Owen
Super-polynomial accuracy of one dimensional randomized nets using the median-of-means
arXiv
Let f be analytic on \([0,1]\) with \(|f^{(k)}(1/2)|\leq A\alpha^kk!\)
for some constant A and \(\alpha<2\).
We show that the median estimate of \(\mu=\int_0^1f(x)\,\mathrm{d} x\)
under random linear scrambling
with \(n=2^m\) points converges at the rate \(O(n^{-c\log(n)})\) for any
\(c< 3\log(2)/\pi^2\approx 0.21\). We also get a super-polynomial convergence rate for
the sample median of \(2k-1\) random linearly scrambled estimates, when \(k=\Omega(m)\).
When f has a p'th derivative that satisfies a \(\lambda\)-Hölder condition
then the median-of-means has error \(O( n^{-(p+\lambda)+\epsilon})\)
for any \(\epsilon>0\), if \(k\to\infty\) as \(m\to\infty\).
-
Z. Pan and A. B. Owen
Where are the logs?
arXiv
| Advances in Modeling and Simulation, 381-400
We show that powers of \(\log(n)\) really are needed in the error bounds
for some functions of bounded variation in the sense of Hardy and Krause (BVHK) in a setting where the same integrand
is used for all n.
-
E. Rosenman and A. B. Owen
Designing Experiments Informed by Observational Studies
arXiv |
Journal of Causal Inference
We use past observational data to tune the design
for a future experimental comparison of treatment
to control. Because the past data are observational
there can be some doubt about missing variables
affecting the variance estimates they provide. We address
that concern using a Rosenbaum-style sensitivity analysis
based on a bootstrap proposal of Zhao, Small and
Bhattacharya (2019).
-
S. Ghosh and T. Hastie and A. B. Owen
Scalable logistic regression with crossed random effects
arXiv
Usual algorithms for crossed random effects bring
costs proportional to \(N^{3/2}\) or worse.
An earlier paper got the cost per iteration down
to O(N) for generalized least squares by using
backfitting and also gave conditions where O(1)
iterations suffice.
For generalized linear mixed models there is an additional
complication from a high dimensional integral. We
adapt a penalized quasi-likelihood algorithm from Schall
to get O(N) cost per iteration. Empirically the iteration
count is O(1).
-
C. R. Hoyt and A. B. Owen
Probing neural networks with t-SNE, class-specific projections and a guided tour
PDF | arXiv
We use graphical methods to probe neural nets that classify images. Plots of t-SNE outputs at successive layers in a network reveal increasingly organized arrangement of the data points. They can also reveal how a network can diminish or even forget about within-class structure as the data proceeds through layers. We use class-specific analogues of principal components to visualize how succeeding layers separate the classes. These allow us to sort images from a given class from most typical to least typical (in the data) and they also serve as very useful projection coordinates for data visualization. We find them especially useful when defining versions of guided tours for animated data visualization.
-
Z. Pan and A. B. Owen
The nonzero gain coefficients of Sobol's sequences are always powers of two
PDF | arXiv
The old bound for gain coefficients was \(2^t3^s\) for quality parameter t in s dimensions.
We improve that to \(2^{t+s-1}\).
This bound is a simple, but apparently unnoticed, consequence of a microstructure analysis in Niederreiter and Pirsic (2001). We obtain a sharper bound that is smaller than this for some digital nets. We also show that all nonzero gain coefficients must be powers of two. A consequence of this latter fact is a simplified algorithm for computing gain coefficients of nets in base 2.
-
M. Mase and A. B. Owen and B. Seiler
What makes you unique?
arXiv
This paper proposes a uniqueness Shapley measure to compare the extent to which different variables are able to identify a subject. Revealing the value of a variable on subject t shrinks the set of possible subjects that t could be. The extent of the shrinkage depends on which other variables have also been revealed. We use Shapley value to combine all of the reductions in log cardinality due to revealing a variable after some subset of the other variables has been revealed. This uniqueness Shapley measure can be aggregated over subjects where it becomes a weighted sum of conditional entropies. Aggregation over subsets of subjects can address questions like how identifying age is for people in a given zip code. Such aggregates have a corresponding expression in terms of cross entropies. We use uniqueness Shapley to investigate the differential effects of revealing variables from the North Carolina voter registration rolls and in identifying anomalous solar flares. An enormous speedup (approaching 2000 fold in one example) is obtained by using the all dimension trees of Moore and Lee (1998) to store the cardinalities we need.
-
M. Mase and A. B. Owen and B. Seiler
Cohort Shapley value for algorithmic fairness
arXiv
Cohort Shapley value is a model-free method of variable importance grounded in game theory that does not use any unobserved and potentially impossible feature combinations. We use it to evaluate algorithmic fairness, using the well known COMPAS recidivism data as our example. This approach allows one to identify for each individual in a data set the extent to which they were adversely or beneficially affected by their value of a protected attribute such as their race. The method can do this even if race was not one of the original predictors and even if it does not have access to a proprietary algorithm that has made the predictions. The grounding in game theory lets us define aggregate variable importance for a data set consistently with its per subject definitions. We can investigate variable importance for multiple quantities of interest in the fairness literature including false positive predictions.
-
D. M. Kluger and A. B. Owen
A central limit theorem for the Benjamini-Hochberg
false discovery proportion under a factor model
PDF|
arXiv
The FDP is the proportion of rejected hypotheses that are actually null. The FDR is the
expected value of FDP if we take \(0/0=0\). It can be concerning when the ratio FDP/FDR
has a heavy right tail. We find that this can happen in an asymptotic
setting based on factor analysis.
We also find that a heavy right tail does not necessarily happen. The key is how
strong some long range correlations are. We use a functional central limit
theorem following some earlier work by Delattre and Roquain (2016).
-
S. Liu and A. B. Owen
Quasi-Monte Carlo Quasi-Newton for variational Bayes
PDF
| arXiv
MC methods are commonly used in machine learning
settings to optimize over \(\theta\) the expectation
over z of \(f(z;\theta)\). We look at variational Bayes,
replacing the MC samples by randomized quasi-Monte Carlo (RQMC).
After K iterations, the estimate \(\theta_K\) differs from
the optimum by an amount that exponentially decays with K
plus an amount that can be bounded by the sampling variance.
We find that RQMC reduces that variance and the practical
benefit is that iterations can use fewer z samples.
-
D. M. Kluger and A. B. Owen
Local linear tie-breaker designs
PDF
| arXiv
Tie-breaker designs are a kind of treatment triage: top units
get a treatment, bottom ones do not, and the ones in between
are randomized 50:50 to treatment or control. They can be used
in a tradeoff between statistical efficiency for estimating
the causal effect of a treatment and allocation efficiency
where it makes more sense to give the treatment to higher
rated units.
Owen and Varian (2020)
showed that tie-breaker designs are more statistically efficient than regression discontinuity
designs (RDD). It is becoming standard to use kernel weighted local
regressions in RDD. This paper shows that in the local setting also,
it is more efficient to employ some randomization around a cutoff.
2020
-
A. B. Owen
On dropping the first Sobol' point
PDF|
arXiv
Quasi-Monte Carlo (QMC) methods are a substitute for plain Monte Carlo (MC).
QMC integration can be much more accurate attaining \(o(1/n)\) errors instead of \(O_p(1/\sqrt{n})\).
As a result natural operations for MC like thinning, burn-in and using a round
number n can actually make the rate of convergence worse by a factor of \(\sqrt{n}\).
A common practice is to skip over the first Sobol' point because it equals (0,0,...,0) and take the
next \(n=2^m\) points. Skipping even one point that way breaks the balance property of the Sobol' points
and worsens the convergence rate when those points are scrambled.
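A quick way to see the effect is to compare errors with and without the skipped point; below is a hedged sketch using scipy's scrambled Sobol' points (an assumption; the paper's experiments may differ), with an illustrative integrand and sizes.
```python
import numpy as np
from scipy.stats import qmc

f = lambda x: np.prod(x, axis=1)               # integral over [0,1]^3 is 1/8
m, d, reps = 8, 3, 50
keep, skip = [], []
for r in range(reps):
    sob = qmc.Sobol(d=d, scramble=True, seed=r)
    pts = sob.random(2 ** m + 1)               # scipy warns here: 2^m + 1 is not a power of 2
    keep.append(f(pts[: 2 ** m]).mean() - 0.125)
    skip.append(f(pts[1: 2 ** m + 1]).mean() - 0.125)   # drop the first point
print("RMSE keeping the first point: ", np.sqrt(np.mean(np.square(keep))))
print("RMSE skipping the first point:", np.sqrt(np.mean(np.square(skip))))
```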
-
S. Ghosh and T. Hastie and A. B. Owen
Backfitting for large scale crossed random effects regressions
PDF|
arXiv
Regression models with crossed random effect error models can be very expensive to compute.
The cost of both generalized least squares and Gibbs sampling
can easily grow as \(N^{3/2}\) (or worse) for N observations.
Papaspiliopoulos, Roberts and Zanella (2020)
present a collapsed Gibbs sampler that costs
\(O(N)\), but under an extremely stringent sampling model.
We propose a backfitting algorithm to compute a generalized least squares
estimate and prove that it costs \(O(N)\) under greatly relaxed though still
strict sampling assumptions. Empirically, the backfitting algorithm
costs \(O(N)\) under further relaxed assumptions.
We illustrate the new algorithm on a ratings data set from Stitch Fix.
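Below is a hedged illustration of the backfitting idea on the simplest additive crossed-effects model, with alternating row and column updates at O(N) cost per sweep; it omits the shrinkage by variance components and the GLS weighting in the paper, and all sizes are illustrative.
```python
import numpy as np

def backfit(rows, cols, y, n_iter=50):
    """Alternate row and column mean updates for y_ij = mu + a_i + b_j + noise."""
    a = np.zeros(rows.max() + 1)
    b = np.zeros(cols.max() + 1)
    mu = y.mean()
    for _ in range(n_iter):
        a = np.bincount(rows, y - mu - b[cols]) / np.bincount(rows)   # row means of residuals
        b = np.bincount(cols, y - mu - a[rows]) / np.bincount(cols)   # column means of residuals
    return mu, a, b

# Toy unbalanced crossing of 100 rows by 50 columns with N = 2000 observations.
rng = np.random.default_rng(0)
rows = rng.integers(0, 100, size=2000)
cols = rng.integers(0, 50, size=2000)
a_true, b_true = rng.standard_normal(100), rng.standard_normal(50)
y = 1.0 + a_true[rows] + b_true[cols] + 0.1 * rng.standard_normal(2000)
mu, a, b = backfit(rows, cols, y)
print(np.corrcoef(a, a_true)[0, 1], np.corrcoef(b, b_true)[0, 1])
```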
-
C. R. Hoyt and A. B. Owen
Efficient estimation of the ANOVA mean dimension, with
an application to neural network classification
PDF|
arXiv
We compare sampling strategies for the ANOVA mean dimension which
involves summing Sobol' indices estimated by changing one variable
at a time. We compare a Gibbs sampler (winding stairs) to a radial
strategy and a naive sampler. The naive sampler costs about twice
as much as the others, but can be more efficient when the target
function has high kurtosis.
As an example, a multilayer network to classify MNIST images from 784
pixels typically has mean dimension in the range from 1.5 to 2.0 prior
to the final softmax layer and winding stairs work best on it.
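For concreteness, here is a hedged naive-sampler sketch of the mean dimension as a sum of total Sobol' indices, each estimated by resampling one coordinate at a time; the test functions and sample size are illustrative assumptions, not the paper's estimators.
```python
import numpy as np

def mean_dimension(f, d, n=20000, seed=0):
    """Sum of total Sobol' indices of f on [0,1]^d, i.e. the ANOVA mean dimension,
    estimated by changing one coordinate at a time."""
    rng = np.random.default_rng(seed)
    x, z = rng.random((n, d)), rng.random((n, d))
    fx = f(x)
    total = 0.0
    for j in range(d):
        xj = x.copy()
        xj[:, j] = z[:, j]                                    # resample only coordinate j
        total += 0.5 * np.mean((fx - f(xj)) ** 2)             # unnormalized total index
    return total / fx.var()

print(mean_dimension(lambda x: x.sum(axis=1), d=5))           # additive: mean dimension 1
print(mean_dimension(lambda x: np.prod(1 + x, axis=1), d=5))  # product: a bit above 1
```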
- E. Rosenman, G. Basse, A. B. Owen and M. Baiocchi
Combining Observational and Experimental Datasets Using Shrinkage Estimators
arXiv
We consider the problem of combining data from observational and
experimental sources to make causal conclusions. The goal is to
design a followup experiment to augment existing observational
data. The estimator is based on tuning a Stein shrinkage estimator
and this requires
some knowledge about underlying variances. We use information
about those variances derived from the observational data.
We use Rosenbaum style sensitivity to account for unmeasured
confounders that could have biased those variance estimates.
-
A. B. Owen and D. Rudolf
A strong law of large numbers for scrambled net
integration
PDF|
arXiv
This article provides a strong law of large numbers for integration on digital nets randomized by a nested uniform scramble. The motivating problem is optimization over some variables of an integral over others, arising in Bayesian optimization.
This strong law requires an integrand in \(L^p[0,1]^d\) for some \(p>1\).
This improves on the analogous result for unrandomized QMC which requires Riemann
integrable f (for which bounded variation in the sense of Hardy
and Krause is a sufficient condition). We first prove the result for an integrand in \(L^2\) then extend to \(L^p\) using
the Riesz-Thorin interpolation theorem.
2019
-
M. Mase and A. B. Owen and B. Seiler
Explaining black box decisions by Shapley cohort refinement
PDF
We use Shapley value to quantify the importance of various
inputs to a black box prediction model. A key property of our
method is that we do not need to use implausible/impossible feature
combinations (e.g., a home with many rooms but few square
feet or a graduation date that precedes a birth date). Instead
we start with a cohort of all subjects, and refine it to be similar
to a target subject in some subset of predictors. The predictors whose
refinement makes large changes are the important ones.
-
C. R. Hoyt and A. B. Owen
Mean dimension of ridge functions
PDF
Ridge functions \(f(\boldsymbol{x})=g(\Theta^{\mathsf{T}}\boldsymbol{x})\)
depend on d dimensional inputs only through an \(r\ll d\) projection.
If g is Lipschitz, then f has bounded mean dimension as \(d\to\infty\)
making it potentially very amenable to quasi-Monte Carlo integration.
For discontinuous g, the mean dimension depends on sparsity of \(\Theta\)
and could grow like \(\sqrt{d}\). Pre-integration of f then improves
its smoothness and can reduce mean dimension to O(1) if the pre-integrated
variable's importance does not vanish as \(d\to\infty\).
This article is mostly about integration over \(\mathbb{R}^d\) under
a spherical Gaussian measure.
-
A. Gelman, B. Haig, C. Hennig, A. B. Owen, R. Cousins,
S. Young, C. Robert, C. Yanovsky, E. J. Wagenmakers, R. Kennet
and D. Lakeland
Many perspectives on Deborah Mayo's "Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars"
arXiv
-
Owen, A. B. and Zhou, Y.
The square root rule for adaptive importance sampling
PDF
| arXiv
Suppose you have uncorrelated unbiased estimates \(\hat\mu_k, 1\leqslant k\leqslant K\) with
variances that decay like \(k^{-y}\) for \(0\leqslant y\leqslant 1\).
Not knowing \(y\), you weight them proportionally to \(k^x\) for \(0\leqslant x\leqslant 1\).
If you choose \(x=1/2\), your estimate will remain unbiased and will have variance at most
9/8 times that of the unknown best unbiased combination. Weighting by an estimated variance
would be prone to bias in rare event settings because the sample values are prone to being
correlated with the variance estimates.
This theorem comes from the appendix of an unpublished technical report from 1999.
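A hedged numerical check of the rule, with illustrative values of K and y (the combination's variance should be within 9/8 of the oracle's):
```python
import numpy as np

K, y = 50, 0.7                                  # variances decay like k^{-y}; y is unknown to the user
k = np.arange(1, K + 1)
var = k ** (-y)

def combo_var(w, var):
    w = w / w.sum()                             # normalized weights
    return np.sum(w ** 2 * var)                 # variance of the weighted combination

print(combo_var(np.sqrt(k), var))               # square root rule
print(combo_var(1.0 / var, var))                # oracle inverse-variance weights
```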
2018
-
Owen, A. B.
Unreasonable effectiveness of Monte Carlo
PDF
This is a comment on a paper by
Briol, Oates, Girolami, Osborne and Sejdinovic
entitled "Probabilistic Integration: A Role in Statistical Computation?"
My take is that the Bayesian approach holds tremendous promise but is better
directed at more complicated problems than estimating an integral. For instance
most Bayesian computation is now done by frequentist MCMC methods.
Here is their rejoinder.
The paper, all discussions, and rejoinder will appear in Statistical Science.
-
Owen, A. B. and Varian, H.
Optimizing the tie-breaker regression discontinuity design
PDF
We consider settings where one could use either a
Randomized Controlled Trial (RCT) or a Regression
Discontinuity Design (RDD) to analyze the causal impact
a treatment that offers customers some sort of bonus
or incentive. The RDD is expected to have superior
immediate payoff on the customers under study.
The RCT is well known to have
greater statistical efficiency. We study a hybrid tie-breaker
design in which a fraction \(\Delta\in(0,1)\) of the customers
get a randomized treatment. We find statistical efficiency
increases monotonically with \(\Delta\) in a two regression
line model. The cost of experimentation decreases monotonically
in \(\Delta\) and we find an expression for optimizing the tradeoff.
This is work I did for Google, and it was not part of my Stanford duties.
-
Dobriban, E. and Owen, A. B.
Deterministic parallel analysis
JRSS-B 2019 (online 2018)
It is really hard to pick the number k of factors in a factor analysis.
Parallel analysis is one of the most popular ways. People say it works well.
In Buja and Eyuboglu's version we take a data matrix X and apply uniform random permutations
to each column some number of times, computing the factor analysis each time. If the real
first eigenvalue is larger than say 95% of the simulated ones then we take \(k\geqslant 1\)
and look at the second eigenvalues. And so on. This paper is about how to do that without
actually running all those permutations which are expensive and introduce unneeded randomness into
the outcome. Instead we can use random matrix theory to determine where the upper edge of the spectrum
would be if there were no correlations among the data.
-
A. Ben Abdellah, P. L'Ecuyer, A. B. Owen, F. Puchhammer
Density estimation by randomized quasi-Monte Carlo
arXiv
We have a deterministic function \(x = g(\boldsymbol{u})\) on the unit cube
\((0,1)^s\), or some other domain that
can be mapped to from the unit cube. We want to estimate the probability density function of \(x=g(\boldsymbol{u})\) with respect to \(\boldsymbol{u}\sim U(0,1)^s\). We do this by kernel density estimation and by histograms applied to randomized quasi-Monte Carlo input points in the unit cube. Using RQMC reduces the variance leaving the bias unaffected. That makes it possible to use smaller bandwidths in the KDE. The gain can be mitigated by the reduced smoothness of the implied integrand when h is small. There can be large improvements in low dimensions.
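Here is a hedged sketch of the setup (not the authors' code): a Gaussian kernel density estimate of x = g(u), with u drawn either by plain MC or by scrambled Sobol' RQMC points; the map g, the bandwidth and the sample size are illustrative assumptions.
```python
import numpy as np
from scipy.stats import qmc, norm

# x = g(u) is a sum of s standard normals, so its true density is N(0, s).
s, m, h = 4, 12, 0.3
g = lambda u: np.sum(norm.ppf(np.clip(u, 1e-12, 1 - 1e-12)), axis=1)
grid = np.linspace(-4.0, 4.0, 9)
truth = norm.pdf(grid, scale=np.sqrt(s))

def kde(x_data, grid, h):
    """Gaussian kernel density estimate evaluated on grid."""
    return norm.pdf((grid[:, None] - x_data[None, :]) / h).mean(axis=1) / h

u_mc = np.random.default_rng(0).random((2 ** m, s))
u_rqmc = qmc.Sobol(d=s, scramble=True, seed=0).random_base2(m)
print("MC   max error:", np.max(np.abs(kde(g(u_mc), grid, h) - truth)))
print("RQMC max error:", np.max(np.abs(kde(g(u_rqmc), grid, h) - truth)))
```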
-
Rosenman, E. and Baiocchi, M. and Owen, A. B.
Propensity score methods for merging observational and experimental datasets
PDF
Consider a large observational dataset (ODB) and a much smaller randomized controlled trial (RCT)
about the same treatment intervention. The RCT may support better causal conclusions, while the ODB has much more
data and may have better external validity (a more representative sample).
We consider merging the two, using the propensity score that the RCT data would have had,
had they been in the ODB. We develop two approaches, comparing them by simulations and by
the bias and variance they attain in a delta method approximation. Even a small RCT can greatly
improve the estimation of treatment effects over what the ODB alone can do.
-
Owen, A. B and Glynn, P. (editors)
Monte Carlo and Quasi-Monte Carlo Methods
MCQMC, Stanford CA, August 2016
Springer web site for this book
The proceedings volume of MCQMC 2016
has been published by Springer. It has 26 contributions on Monte Carlo and quasi-Monte Carlo.
It includes 3 tutorials on QMC, written by Fred Hickernell, Pierre L'Ecuyer and Frances Kuo.
2017
-
Owen, A. B., Chertkov, M. and Maximov, Y.
Importance sampling the union of rare events with an application to power systems analysis
arXiv
We consider importance sampling to estimate the probability \(\mu\) of a union of \(J\) rare events \(H_j\)
defined by a random variable \(\boldsymbol{x}\).
The sampler we study has been used in spatial statistics, genomics and combinatorics
going back at least to Frigessi and Vercellis (1985).
It works by sampling one event at random, then sampling \(\boldsymbol{x}\)
conditionally on that event happening and it constructs an unbiased estimate of \(\mu\)
by multiplying an inverse moment of the number of occurring events by the union bound.
We prove some variance bounds for this sampler. For a sample size of \(n\),
it has a variance no larger than \(\mu(\bar\mu-\mu)/n\) where \(\bar\mu\)
is the union bound.
It also has a coefficient of variation no larger than
\(\sqrt{(J+J^{-1}-2)/(4n)}\) regardless of the overlap pattern among the events.
Our motivating problem comes from power system reliability, where the phase
differences between connected nodes have a joint Gaussian distribution and
the \(J\) rare events arise from unacceptably large phase differences.
In the grid reliability problems even some events defined by \(5772\) constraints
in \(326\) dimensions, with probability below \(10^{-22}\),
are estimated with a coefficient of variation of about \(0.0024\) with only \(n=10{,}000\) sample values.
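Below is a hedged toy sketch of this union-of-rare-events sampler for half-space events \(a_j^\top x > b_j\) with Gaussian \(x\); the problem sizes, event geometry and function name are illustrative assumptions, not the power systems application.
```python
import numpy as np
from scipy.stats import norm

def union_is(A, b, n=10000, seed=0):
    """Estimate P(union_j {a_j^T x > b_j}) for x ~ N(0, I) and unit-norm rows a_j."""
    rng = np.random.default_rng(seed)
    p = norm.sf(b)                              # P(H_j) for each half space
    mu_bar = p.sum()                            # union bound
    js = rng.choice(len(b), size=n, p=p / mu_bar)
    est = np.empty(n)
    for i, j in enumerate(js):
        x = rng.standard_normal(A.shape[1])
        t = norm.isf(rng.random() * p[j])       # N(0,1) truncated to (b_j, infinity)
        x = x - (A[j] @ x) * A[j] + t * A[j]    # force event j to occur
        est[i] = 1.0 / np.sum(A @ x > b)        # inverse of the number of occurring events
    return mu_bar * est.mean()

# Toy usage: J = 4 half spaces in d = 3 dimensions.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
print(union_is(A, np.full(4, 2.5)))
```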
-
Owen, A. B.
Effective dimension of some weighted pre-Sobolev spaces with dominating mixed partial derivatives
arXiv|
PDF of SINUM article
Old title: Effective dimension of weighted Sobolev spaces: non-periodic case
This paper considers two notions of effective dimension for
quadrature in weighted pre-Sobolev spaces
with dominating mixed partial derivatives.
We begin by finding a ball in those spaces just barely
large enough to contain a function with unit variance.
If no function in that ball has more than \(\varepsilon\)
of its variance from ANOVA components involving
interactions of order \(s\) or more, then the space
has effective dimension at most \(s\) in the superposition sense.
A similar truncation sense notion replaces the cardinality
of the ANOVA component by the largest index it contains.
Some Poincaré type inequalities
are used to bound variance components by multiples
of these spaces' squared norms and those
in turn provide bounds on effective dimension.
Very low effective dimension in the superposition
sense holds for some spaces defined by product weights
in which quadrature is strongly tractable.
The superposition dimension is
\(O( \log(1/\varepsilon)/\log(\log(1/\varepsilon)))\)
just like the superposition dimension used in the multidimensional
decomposition method. Surprisingly,
even spaces where all subset weights are equal, regardless
of their cardinality or included indices, have low superposition
dimension in this sense.
This paper does not require periodicity of the integrands.
-
Owen, A. B.
On \(L^2\) norms of derivatives of orthogonal polynomials with respect to Sobolev inner products
PDF
For \(\lambda\ge0\),
let \(\langle f,g\rangle_\lambda := \int_{-1}^1 f(x)g(x) d x
+\lambda\int_{-1}^1 f'(x)g'(x) d x\) define an inner product
for differentiable functions on \([-1,1]\).
For \(n\ge0\), let \(S_n=S_n^\lambda\) be the orthogonal polynomials of degree \(n\)
obtained by applying the Gram-Schmidt algorithm in this
inner product to monomials, normalized so that \(S_n(1)=1\).
Then the derivatives \(S_n'\) are given as explicit Legendre series,
it is proved that \(\arg\min_{n\ge1}\int_{-1}^1 S_n'(x)^2 d x/\int_{-1}^1S_n(x)^2 d x=1\),
and an expression is given for \(\int_{-1}^1 S'_n(x)S'_{n'}(x) d x\).
-
Owen, A. B.
A randomized Halton algorithm in R
PDF
Randomized quasi-Monte Carlo (RQMC) sampling can bring orders of magnitude reduction in variance compared to plain Monte Carlo (MC) sampling. The extent of the efficiency gain varies from problem to problem and can be hard to predict. This article presents an R function rhalton that produces scrambled versions of Halton sequences. On some problems it brings efficiency gains of several thousand fold. On other problems, the efficiency gain is minor. The code is designed to make it easy to determine whether a given integrand will benefit from RQMC sampling. An RQMC sample of n points in \([0,1]^d\) can be extended later to a larger n and/or d.
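Here is a hedged illustration of the kind of MC versus RQMC comparison the article describes, using scipy's scrambled Halton points rather than the rhalton R function itself; the integrand and sizes are illustrative assumptions.
```python
import numpy as np
from scipy.stats import qmc

f = lambda x: np.prod(1 + (x - 0.5) / 2, axis=1)   # smooth integrand with integral 1
d, n, reps = 5, 4096, 30
mc_err, rqmc_err = [], []
for r in range(reps):
    u = np.random.default_rng(r).random((n, d))
    h = qmc.Halton(d=d, scramble=True, seed=r).random(n)
    mc_err.append(f(u).mean() - 1.0)
    rqmc_err.append(f(h).mean() - 1.0)
print("MC RMSE:  ", np.sqrt(np.mean(np.square(mc_err))))
print("RQMC RMSE:", np.sqrt(np.mean(np.square(rqmc_err))))
```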
-
Gao, K. and Owen, A. B.
Estimation and inference for very large linear mixed effects models
PDF
| PDF (supplement)
| Slides
| Code
Linear mixed models with large imbalanced crossed random effects structures pose severe computational problems for maximum likelihood estimation and for Bayesian analysis. The costs can grow as fast as \(N^{3/2}\) when there are N observations. Such problems arise in any setting where the underlying factors satisfy a many to many relationship (instead of a nested one) and in electronic commerce applications, the N can be quite large. Methods that do not account for the correlation structure can greatly underestimate uncertainty. We propose a method of moments approach that takes account of the correlation structure and that can be computed at O(N) cost. The method of moments is very amenable to parallel computation and it does not require parametric distributional assumptions, tuning parameters or convergence diagnostics. For the regression coefficients, we give conditions for consistency and asymptotic normality as well as a consistent variance estimate. For the variance components, we give conditions for consistency and we use consistent estimates of a mildly conservative variance estimate. All of these computations can be done in O(N) work. We illustrate the algorithm with some data from Stitch Fix where the crossed random effects correspond to clients and items.
This one appeared in 2016, but has been very substantially revised.
Katelyn Gao's thesis
2016
-
Owen, A. B. and Launay, T.
Multibrand geographic experiments
PDF
It is about geographic experiments. The idea is to run B experiments on G regions at once
and then use a Stein shrinkage method from Xie, Kou and Brown (2012), or Bayesian statistics via Stan, to pool the results.
Pooling means you can work with smaller experiments than if you did them one at a time. B and G are both even.
In the design, each experiment is toggled up in G/2 regions and down in G/2 regions. Each region is
up for B/2 experiments and down for B/2 experiments. The Bayesian approach seemed to work a bit better.
The design is found using a Markov chain studied by Diaconis and Gangolli. For G>B the design is a
two level factorial for each experiment and is also a supersaturated design for the regions. It would be nice
if the GxB matrix had no duplicate rows or columns and no rows or columns that are exact opposites. The paper
provides some sufficient conditions for the existence of such designs (with constructions).
This is work I did for Google, and it was not part of my Stanford duties.
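Below is a hedged sketch of the balance pattern described above: a G x B matrix of +1/-1 entries with every column up in G/2 regions and every row up for B/2 experiments, randomized by sum-preserving 2x2 swaps in the spirit of the Diaconis-Gangolli chain. The block-and-shift start assumes B divides G, and none of this reproduces the paper's construction or its extra conditions on duplicate rows and columns.
```python
import numpy as np

def balanced_design(G, B, sweeps=2000, seed=0):
    """G x B matrix of +1/-1: each column is up in G/2 regions, each row in B/2 experiments."""
    rng = np.random.default_rng(seed)
    X = np.ones((G, B), dtype=int)
    X[G // 2:, :] = -1                            # each column: half up, half down
    for b in range(B):
        X[:, b] = np.roll(X[:, b], b * G // B)    # stagger columns so every row balances
    for _ in range(sweeps):                       # randomize with sum-preserving 2x2 swaps
        i, j = rng.choice(G, 2, replace=False)
        k, l = rng.choice(B, 2, replace=False)
        if X[i, k] == X[j, l] == -X[i, l] == -X[j, k]:
            X[np.ix_([i, j], [k, l])] *= -1       # flip a 2x2 checkerboard
    return X

X = balanced_design(8, 4)
print(X.sum(axis=0), X.sum(axis=1))               # all zeros: balanced both ways
```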
-
Owen, A. B.
(new title) Refiltering hypothesis tests to control sign error
revised PDF (Aug 2019) |
arXiv|
WHOA-PSI 2019 slides
A common, though not recommended, statistical practice
is to report confidence intervals if and only if they exclude a null
value of 0.
The resulting filtered confidence intervals generally do not
have their nominal confidence level. More worryingly, in
low power settings they will frequently lie on the wrong side of zero.
We assume that the confidence intervals were constructed
using an asymptotically Gaussian parameter estimate accompanied
by a weakly consistent estimate of its variance. In these cases,
we can subject the given confidence interval(s) to a second
filtering step such that the probability of a sign error
is controlled. This refiltering step retains only
those confidence intervals that are sufficiently well
separated from the origin.
-
Gao, K. and Owen, A. B.
Estimation and inference for very large linear mixed effects models
PDF
| PDF (supplement)
Very large crossed data sets are increasingly common especially in e-commerce. It is often appropriate to model them with crossed
random effects. Their size provides challenges for statistical analysis. For such large data sets, the computational costs of
estimation and inference should grow at most linearly with the sample size. Commonly used maximum likelihood and Bayesian
approaches do not scale at that rate. We propose a method of moments based algorithm that scales linearly and can be easily
parallelized at the expense of some loss of statistical efficiency. We apply the algorithm to some data from Stitch Fix where
the crossed random effects correspond to clients and items. The random effects analysis is able to account for the increased
variance due to intra-client and intra-item correlations in the data. Ignoring the correlation structure can lead to standard
error underestimates of over 10-fold for that data. This algorithm is proven to give estimates that are asymptotically
consistent and normally distributed. We use a martingale CLT decomposition of U-statistics to establish normality for
the variance components.
Katelyn Gao's thesis
-
Wang, J., Gui, L., Su, W. J., Sabatti, C. and Owen, A. B.
Detecting multiple replicating signals using adaptive filtering procedures
arXiv
| PDF | The Annals of Statistics 50 (4), 1890-1909
The partial conjunction null hypothesis \(H_0^{r/n}\) allows up to
n-r+1 related basic null hypotheses to hold. Rejecting it
allows us to conclude that at least r of the basic nulls
are false, providing a measure of reproducibility.
Motivated by genomic problems we consider a setting with a large number M of partial conjunction null
hypotheses to test, based on an \(n\times M\) matrix of p-values. When r>1 the hypothesis \(H_0^{r/n}\) is composite.
Requiring validity in the worst case, where r-1 alternative hypotheses hold, can lead to very conservative tests. We
develop a filtering approach for \(H_0^{r/n}\) based on the M p-values for \(H_0^{(r-1)/n}\). This filtering
approach has greater power than straightforward PC testing. We prove that it can be used to control the
familywise error rate, the per family error rate, and the false discovery rate among M PC tests. In
simulations we find that our filtering approach properly controls the FDR while achieving good power.
-
Owen, A. B. and Prieur, C.
On Shapley value for measuring importance of dependent inputs
Revised PDF
| arXiv
| HAL
This paper makes the case for using Shapley value to quantify the importance of random input variables to a function. Alternatives based on the ANOVA decomposition can run into conceptual and computational problems when the input variables are dependent. Our main goal here is to show that Shapley value removes the conceptual problems. We do this with some simple examples where Shapley value leads to intuitively reasonable nearly closed form values.
Errata: In Theorem 4.1 the condition that \(\Sigma\) has full rank should instead be
that \(\Sigma\) is positive definite. The matrix in the paragraph following Theorem 4.1 should be
\(\bigl(\begin{smallmatrix} 1 & \rho\\\rho & 1\end{smallmatrix}\bigr)\)
not
\(\bigl(\begin{smallmatrix} 0 & \rho\\\rho & 0\end{smallmatrix}\bigr)\).
-
Basu, K. and Owen, A. B.
Quasi-Monte Carlo for an integrand with a singularity along a diagonal in the square
PDF
A singularity along an arbitrary manifold works differently for QMC than one at an isolated point or along the boundary of the
unit cube. We look at a singularity along the diagonal of \([0,1]^2\) as a basic example. There are several ways to handle this, but the
best one appears to be to transform the integrand into one or more integrands with QMC-friendly singularities (along the boundary
of the unit cube, in this case). For a festschrift marking Ian Sloan's 80th birthday.
-
He, H. Y., Basu, K., Zhao, Q. and Owen, A. B.
Permutation p-value approximation via generalized Stolarsky invariance
Annals of Statistics|
arXiv 1603.02757|
latest PDF | Supplement |
pipeGS R package on CRAN|
data
It is really expensive to get a tiny p value via permutations.
For linear test statistics, the permutation p value is the fraction
of permuted data vectors in a given spherical cap.
Stolarsky's invariance principle from quasi-Monte Carlo sampling
gives the root mean squared discrepancy between the observed and expected
proportions of points in spherical caps. We use and extend this principle
to develop approximate p values with small relative errors. Their accuracy is competitive
with saddlepoint approximations, but they come with an error estimate.
Hera He's thesis
-
Gao, K. and Owen, A. B.
Efficient moment calculations for variance components in large unbalanced crossed random effects models
PDF
| supplement
Large unbalanced crossed random effects models provide extreme challenges.
Both maximum likelihood and MCMC scale superlinearly in problem size. We
look at method of moments estimates that scale linearly, while retaining
competitive accuracy with the MLE on problems small enough to allow computation
of the MLE.
Katelyn Gao's thesis
2015
-
Basu, K. and Owen, A. B.
Transformations and Hardy-Krause variation
PDF
| arXiv
| SINUM
A transformation \(\tau\) may be used to map the unit
cube onto the simplex, disk, sphere
and related spaces. For use with QMC integration of f
we want \(f\circ\tau\) to be of bounded variation in the
sense of Hardy and Krause. We give such conditions using
a multivariable Faa di Bruno formula of Constantine and Savits.
A similar condition makes \(f\circ\tau\) smooth enough to
benefit from scrambled net quadrature. We also apply Faa di
Bruno to find conditions for importance sampling and Rosenblatt-Hlawka-Muck
sequential inversion to be suitable for QMC. Finally we give an importance sampled mapping
from the unit cube to the simplex that results in RQMC quadratures with
RMSE \(O(n^{-3/2+\epsilon})\) for a class of functions generalizing polynomials.
Kinjal Basu's thesis
-
Owen, A. B.
Statistically efficient thinning of a Markov chain sampler
JCGS|
revised PDF |
arXiv | R code
Thinning or subsampling an MCMC is well known to decrease efficiency
if we simply discard computed values. Suppose that it costs one unit of time to
advance the Markov chain, and \(\theta\) units to compute the quantity of
interest. If we thin to every k'th value, \(k\geqslant2\), then we don't have to compute the
quantity of interest at every iteration and so we can run the chain longer.
If the autocorrelations in the process are non-increasing and non-negative then
there is always a \(\theta\) large enough to make thinning to every k'th value
more efficient than not thinning.
Very commonly autocorrelations resemble those of an AR(1) process, \(\rho_\ell = \rho^{|\ell|}\).
This paper gives an algorithm for computing the optimal
subsampling factor k for the AR(1) case. When \(\rho\) is large the optimal thinning value k can
be large. When \(\theta\) is large the efficiency gain can be important. Taking k=1 (no thinning) is
optimal for \(\rho>0\) if and only if \(\theta \leqslant (1-\rho)^2/(2\rho)\). If \(-1<\rho<0\) then k=1
is optimal for any \(\theta\geqslant0\).
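A hedged sketch of the calculation for the AR(1) case: the efficiency criterion below, (k + theta)(1 + rho^k)/(1 - rho^k), is variance per unit budget under the stated costs, and is simply minimized by search rather than by the paper's exact algorithm; the example values are illustrative.
```python
import numpy as np

def optimal_thinning(rho, theta, kmax=10000):
    """Thinning factor minimizing (k + theta)(1 + rho^k)/(1 - rho^k) over k = 1..kmax."""
    k = np.arange(1, kmax + 1)
    v = (k + theta) * (1 + rho ** k) / (1 - rho ** k)
    return int(k[np.argmin(v)])

print(optimal_thinning(rho=0.99, theta=5.0))   # strong autocorrelation: thin heavily
print(optimal_thinning(rho=0.30, theta=0.1))   # cheap quantity of interest: k = 1
print((1 - 0.3) ** 2 / (2 * 0.3))              # threshold on theta for k = 1 to be optimal
```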
-
Wang, J, Zhao, Q., Hastie, T. and Owen, A. B.
Confounder adjustment in multiple hypothesis testing
arXiv
We look at the problem of large scale hypothesis testing when there are latent variables
that induce correlations among the tests, and more seriously, correlate with the primary
variable (e.g. treatment) under study. LEAPP handles this by assuming sparsity of effects
and RUV-4 handles it by assuming some negative controls: tests known a priori to be null.
We provide theoretical guarantees for versions of these algorithms and generalize them to
handle multiple primary variables. When the confounding variables are strong enough, then the
methods can perform as well asymptotically as an oracle that observes the latent variables.
Jingshu Wang's thesis |
Yunting Sun's thesis
-
Wang, J and Owen, A. B.
Admissibility in partial conjunction testing
arXiv
| PDF (to appear in JASA)
Admissibility of meta-analysis has been well understood since Allan Birnbaum's work in the 1950s. Any valid combined p-value obeying a monotonicity constraint is optimal at some alternative and hence admissible. In an exponential family context, the admissible tests reduce to those with a convex acceptance region. The partial conjunction null hypothesis is that at most r - 1 of n independent component hypotheses are non-null with r = 1 corresponding to a usual meta-analysis. Benjamini and Heller (2008) provide a valid test for this null by ignoring the r - 1 smallest p-values and applying a valid meta-analysis p-value to the remaining n - r + 1 p-values. We provide sufficient conditions for the admissibility of their test among monotone tests. A generalization of their test also provides admissible monotone tests and we show that admissible monotone tests are necessarily of that generalized form. If one does not require monotonicity then their test is no longer admissible, but the dominating tests are too unreasonable to be used in practice.
-
Lee, M. and Owen, A. B.
Single nugget Kriging
PDF
We propose a method with better predictions at extreme values than the standard method of Kriging. We construct our predictor
in two ways: by penalizing the mean squared error through conditional bias and by penalizing the conditional likelihood at
the target function value. Our prediction exhibits robustness to model mismatch in the covariance parameters, a desirable
feature for computer simulations with a restricted number of data points. Applications on several functions show that our
predictor is robust to the non-Gaussianity of the function.
Minyong Lee's thesis
- Dobriban, E, Fortney, K. Kim, S. K. and Owen, A. B.
Optimal multiple testing under a Gaussian prior on effect sizes
Biometrika|
arXiv | PDF
We develop a new method for frequentist multiple testing with Bayesian prior information. Our procedure finds a new set of optimal p-value weights called the Bayes weights. Prior information is relevant to many multiple testing problems. Existing methods assume fixed, known effect sizes available from previous studies. However, the case of uncertain information is more usual. For a Gaussian prior on effect sizes, we show that finding the optimal weights is a non-convex problem. Despite the non-convexity, we give an efficient algorithm that solves this problem nearly exactly. We show that our method can discover new loci in genome-wide association studies. On several data sets it compares favorably to other methods. Open source code is available.
-
Lawrence, M., Huntley, M. A., Stawiski, E., Owen, A. B., Wu, T. D., Goldstein, L., Cao, Y., Degenhardt, J., Young, J.,
Guillory, J., Heldens, S., Jackson, A., Seshagiri, S., Gentleman, R.
Genomic variant calling: Flexible tools and a diagnostic data set
bioR\(\chi\)iv
| Supplement
This is about identifying low frequency genetic variants in tumors.
The supplement includes a use of the Cauchy link in binomial regression. A quasi-binomial logistic regression
gave overdispersions like \(10^9\); Cauchy is much better.
-
Fortney, K., Dobriban, E., Garagnani, P., Pirazzini, C., Monti, D., Mari, D., Atzmon, G., Barzilai, N., Franceschi, C., Owen, A. B., Kim, S. K.
Genome-wide scan informed by age-related disease identifies loci for exceptional human longevity
We find some new SNPs associated with extreme longevity using frequentist
multiple testing based on Bayesian priors from the companion Biometrika paper.
PLoS Genetics
- Basu, K. and Owen, A. B.
Scrambled geometric net integration over general product spaces
Foundations
of Computational Mathematics |
arXiv |
PDF
We develop scrambled net quadrature over Cartesian products of triangles, disks,
spherical triangles and generalizations. For smooth functions on an s-fold product
of d-dimensional sets the variance is
\(O(n^{-1-2/d}\log(n)^{s-1})\).
Kinjal Basu's thesis
- Wang, J. and Owen, A. B.
Bi-cross-validation for factor analysis
published article
from Statistical Science
| slides
We use bi-cross-validation, holding out some rows and some columns
of a data matrix, to choose the number of factors for a factor analysis.
The criterion is recovery of the underlying signal matrix.
Jingshu Wang's thesis
- He, Z. and Owen, A. B.
Discussion on `Sequential Quasi-Monte-Carlo Sampling', by Gerber & Chopin
PDF
Text from copyright form:
This is the pre-peer reviewed version of the discussion named above that was published
in its final form at
JRSS-B.
(Technically this is a comment that was not peer reviewed, but the journal
did copy edit what we sent.)
2014
- He, H. Y. and Owen, A. B.
Optimal mixture weights in multiple importance sampling
PDF
We show that the mixture weights in multiple importance sampling
can be optimized via convex optimization.
- He, Z. and Owen, A. B.
Extensible grids:
uniform sampling on a space-filling curve
PDF
We study sampling of \([0,1]^d\) by taking QMC and randomized QMC
points in \([0,1]\) and applying Hilbert's space-filling
curve. We find that the result has the same high-dimensional
performance as sampling on a grid.
As bad as that is in high dimensions,
it is best possible for some hard problems,
and the proposed method does not require sample
sizes of the form \(n=m^d\). An RQMC version attains
MSE \( O(n^{-1-2/d})\) for integration of Lipschitz continuous functions.
- Larson, J. L. and Owen, A. B.
Moment based gene set tests
BMC bioinformatics|
npGSEA Bioconductor package|
old PDF
Permutation tests are a popular way to test
whether a set of genes is associated with a
treatment, or reversing the hoped-for causal arrow,
a phenotype. But they are expensive. We develop
moment based alternatives that are much faster.
Both beta and Gaussian approximations are available
for linear statistics (similar to the JG score).
We also develop a scaled chisquare approximation
to a sum of squared regression coefficients. Such
a test was best overall among 261 gene set tests
investigated by Ackermann and Strimmer (2009). We
illustrate the method on three public data sets
related to Parkinson's disease, and find some enriched
sets not noticed in the original publications.
- Owen, A. B.
A constraint on extensible quadrature rules
Numerische Mathematik
preprint
Suppose that the best possible rate for a quadrature
problem is \(O(n^{-\alpha})\) for \(\alpha>1\), for
a simple average of function values.
Suppose further that a rate optimal sequence uses
sample sizes \(n_k\). This paper extends an idea from
Sobol' (1998) to give a lower bound on \(\rho = n_{k+1}/n_k\).
The bound is between 1 and 2, so it always rules out arithmetic
sequences and never rules out sample size doubling.
This version (Jan 2015) fixes an error in the proof of Theorem 1.
(Neither statement nor conclusion had to change.)
- Basu, K. and Owen, A. B.
Low discrepancy constructions in the triangle
SIAM Journal on Numerical Analysis
| PDF
We give two explicit
constructions of point sets in the triangle
with vanishing discrepancy. One adapts the van der Corput
sequence to the triangle. It has discrepancy
at most 12/\(\sqrt{N}\). The other scales a regular grid
then rotates it through an angle with a badly approximable tangent
attaining discrepancy \(O(\log(N)/N)\). For smooth functions,
randomizing the van der Corput sequence gives RMSE
\(O(1/N)\).
Kinjal Basu's thesis
- Owen, A. B. and Roediger, P. A.
The sign of the logistic regression coefficient
American Statistician (teacher's corner)|
arXiv |
PDF
This paper settles a conjecture that Paul Roediger
made (with D. M. Ray and B. T. Neyer)
in a comment on a paper by Jeff Wu and Y. Tang. In logistic
regression on a scalar \(x\), the MLE of the slope coefficient
\(\beta\) satisfies sign(\(\hat\beta\)) = sign(\(\bar x_1-\bar x_0\)),
where \(\bar x_y\) is the sample mean of x for Y=y.
That this should usually be the case is intuitively obvious.
That one cannot wiggle out of it by tweaking the variances,
skewnesses and/or outliers in the two x groups is less obvious.
One might imagine it follows from the means being sufficient statistics,
but they are only conditionally sufficient. Besides it holds
also for Probit models and others whose inverse link is the
CDF of a log-concave density, and where there is no tiny
sufficient statistic conditional or otherwise. There is
a generalization to vector valued predictors.
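A hedged numerical check of the sign result: we fit a scalar logistic regression by direct maximum likelihood (a generic scipy optimizer, no special package assumed) on toy data with very different group variances, and compare sign(\(\hat\beta\)) to sign(\(\bar x_1-\bar x_0\)).
```python
import numpy as np
from scipy.optimize import minimize

def logistic_slope(x, y):
    """Maximum likelihood slope for logistic regression of y on scalar x (with intercept)."""
    def nll(ab):
        eta = ab[0] + ab[1] * x
        return np.sum(np.logaddexp(0.0, eta) - y * eta)     # negative log likelihood
    return minimize(nll, np.zeros(2)).x[1]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(0.3, 5.0, 60)])  # very unequal spreads
y = np.concatenate([np.zeros(40), np.ones(60)])
print(np.sign(logistic_slope(x, y)), np.sign(x[y == 1].mean() - x[y == 0].mean()))
```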
2013
- Owen, A. B.
Sobol' indices and Shapley value
PDF
Sobol' indices are used to measure the importance of
input variables in black box functions. Shapley value
is used by economists to apportion the value of a team's
efforts among its individual members. This paper compares
the methods. Neither kind of Sobol' index yields the
Shapley value for variance explained. Compared
to Shapley value, Sobol's lower index ignores interactions
while Sobol's upper index overcounts them.
- Billman, D. and He, H. and Owen, A. B.
Grouping tasks and data display items
via the non-negative matrix factorization
PDF
Our data are a list of 119 tasks that a pilot
must perform in flying a modern airliner,
210 input and output variables available to
the pilot, and a matrix indicating whether a given IO variable
is required for a given task. We use biclustering
of this data to aid in designing an interface.
- Owen, A. B. and Dick, J. and Chen. S.
Higher order Sobol' indices
original PDF | Published version:
Information and Inference 2014(3)59-81
We generalize Sobol' indices from an \(L^2\) to \(L^p\)
(integer \(p\ge2\))
setting in order to emphasize those variables that
most affect extreme values of the function.
Our generalizations have integral representations that
allow Monte Carlo or quasi-Monte Carlo estimation.
- Chen, A., Owen, A. B. and Shi, M.
Data enriched linear regression
arXiv |
revised Nov 2014
We have a small high quality data set of (X,Y) values
following a Gaussian linear regression, and a potentially much larger data
set following a possibly different Gaussian linear regression. We apply
a new form of Stein shrinkage to connect the two. It becomes inadmissible to
just use the small data set when there are p\(\ge\)5 predictors
and the error df \(\ge\)10. The new form of shrinkage outperforms
ordinary Stein shrinkage in simulations and we provide an explanation
by comparing them both to an oracle.
2012
2011
-
Ma, L, Wong, W. H. and Owen, A. B.
A sparse transmission disequilibrium test for
haplotypes based on Bradley-Terry graphs
PDF
-
Owen, A. B. and Eckles, D.
Bootstrapping data arrays of arbitrary order
PDF|
slides
McCullagh (2000) showed that
there is no bootstrap for the crossed random effects model.
But resampling each factor (rows and columns) independently
works well and is slightly conservative.
Here we replace resampling by reweighting and then extend
the result to data arrays of arbitrary order. Reweighting
is better suited to large data warehouses than resampling
is.
-
Gleich, D. G. and Owen, A. B.
Moment based estimation of stochastic
Kronecker graph parameters
PDF
It is hard to estimate the parameters
of Kronecker random graphs by maximum
likelihood. Here is a method of moments
strategy based on simple feature counts.
We find that moments are more robust and
give better fits than maximum likelihood.
-
Sun, Y. and Zhang, N. and Owen, A. B.
Multiple hypothesis testing, adjusting
for latent variables
PDF|
R package on CRAN|
R package tar file
We introduce LEAPP (latent effect adjustment after
primary projection), a method for taking account
of unmeasured latent variables when doing multiple
hypothesis testing. Simulations show good performance
compared to alternatives, EIGENSTRAT and SVA
(surrogate variable analysis). When applied to the
16 tissue AGEMAP data, LEAPP gives lists of age-related
genes with much more reproducibility (over tissues)
than the other methods. We prove some results on
the LEAPP estimates.
Yunting Sun's thesis
2010
-
Owen, A. B.
Moment based estimation of stochastic
Kronecker graph parameters
Deprecated. See above article with David Gleich.
It is hard to estimate the parameters
of Kronecker random graphs by maximum
likelihood. Here is a method of moments
strategy based on simple feature counts.
For large data sets the error is dominated
by model lack of fit and so the extra efficiency
of likelihood over moments is less important
than the speed advantage of moments.
-
Chen, S. and Matsumoto, M. and Nishimura, T. and Owen, A. B.
New inputs and methods for Markov chain
quasi-Monte Carlo
PDF
We present some new ways of generating small CUD
sequences. We introduce some embedded antithetic
and round trip variance reductions into MCMC
and prove that they preserve the CUD property.
In some simulations of GARCH and stochastic
volatility models, the new methods greatly
outperform standard IID sampling.
The original publication is available at
www.springerlink.com
(or will be so, once it has been completed).
Su Chen's thesis
-
Dyer, J. and Owen, A.B.
Visualizing bivariate long tailed data
revised PDF|
slides from NIPS
Suppose that we observe two or more categorical
variables with a long tail, such as movies and
customers in ratings data. This paper looks
at a way to visualize the joint distribution of
such data. We use a copula plot based on the
observed ranks of the data. We prove under
a generative model that the observed ranks are
asymptotically close to some underlying true
ranks under a Zipf-Mandelbrot-Poisson assumption.
Some ratings data show a
strong head to tail affinity: busy raters are
over represented at rarely rated items and conversely.
We present two simple generative models that
produce such an effect. One is a saturation
model and the other is bipartite preferential
attachment. We prove bounds on
the marginal distributions for these models.
Justin Dyer's thesis
-
Dyer, J. and Owen, A.B.
Correct ordering in the Zipf-Poisson ensemble
PDF
Revised Jan 2012 PDF
Counted rank data arise commonly, such as most popular
baby names, English words and web sites.
This paper analyzes the reliability of such ordering.
We use a model
where \(X_i\) is Poisson with mean following a Zipf law.
We get estimates for the number n of leading
items \(i=1\dots n\) correctly ordered by their observed counts \(X_i\).
It grows at the rate \((AN/\log(N))^{1/(\alpha+2)}\)
where \(\alpha\) is the Zipf parameter and \(A = \alpha^2(\alpha+2)/4\).
For a Zipf-Poisson model of the British National Corpus of 100,000,000 words,
we estimate that the 72 most frequent words are in
their correct order.
Justin Dyer's thesis
-
She, Y. and Owen, A.B.
Outlier detection using
nonconvex penalized regressions
PDF (orig) |
PDF (revised)
We put in a dummy variable for all n observations
in a regression but regularize their coefficients
via a thresholding rule. The result is robust regression
that empirically is very good at identifying outliers.
A key step is a clean model for which the outliers
become the signal and in which BIC is applicable.
Yiyuan She's thesis
2009
-
L'Ecuyer, P. and Owen, A.B. (eds)
Monte Carlo and Quasi-Monte Carlo Methods 2008
Proceedings of MCQMC 2008, July 6-11 2008, Montreal Canada.
Springer-Verlag. ISBN 978-3-642-04106-8
List of articles
- S.C. Emerson and Owen, A.B.
Calibration of the empirical likelihood method for a vector mean
Electronic
Journal of Statistics
This paper presents an approach for getting
outside of the 'convex hull problem' in empirical
likelihood.
Sarah Emerson's thesis
- Owen, A.B.
Recycling physical random variables
EJS
This paper shows how to get n(n-1)/2 pairwise independent
random vectors out of just n fully independent ones.
Similar constructions are widely used. What is new here is a
statistical analysis of the consequences for Monte Carlo sampling:
the resulting means are degenerate U-statistics with non-normal
limits. Quite surprisingly, their asymptotic distributions come out
symmetric, based on recent results on the spectrum of
circulant matrices. One standard pairwise-independence construction
is sketched below.
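As a hedged aside (the paper's exact construction may differ), one standard way to turn n independent uniform vectors into n(n-1)/2 pairwise independent ones is to add pairs componentwise modulo 1:

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    n, d = 8, 3
    U = rng.random((n, d))          # n fully independent uniform vectors in [0,1)^d

    # Pairwise independent family: (U_i + U_j) mod 1 for i < j.
    # Conditional on U_i, the two vectors sharing U_i are independent uniforms,
    # so each pair of derived vectors is independent, though the family as a whole is not.
    V = np.array([(U[i] + U[j]) % 1.0 for i, j in combinations(range(n), 2)])
    print(V.shape)                  # (n*(n-1)//2, d) = (28, 3)

    # Monte Carlo use: average an integrand over the derived points.
    f = lambda x: np.prod(x, axis=1)   # example integrand with known mean 2**(-d)
    print(f(V).mean())

The dependence across the whole derived family is what leads to the non-standard limiting behaviour discussed above.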
- Southworth, L.K., Owen, A.B. and Kim, S.K.
Aging mice show a decreasing correlation of gene expression
within genetic modules.
PLoS Genetics
PDF
As mice age, the correlations among sets of related
genes grow weaker.
- Chen, S., Dick, J. and Owen, A.B.
Consistency of Markov chain quasi-Monte Carlo on continuous state spaces
PDF
Tribble has made over 1000-fold efficiency improvements by
inserting QMC sampling into MCMC problems.
The earlier consistency results for use of QMC in MCMC
only worked for discrete state spaces. This paper
extends them to continuous problems like the ones in
Seth Tribble's
thesis
Su Chen's thesis
- Xu, Y., Dyer, J.S. and Owen, A.B.
Empirical stationary correlations for semi-supervised learning on graphs
PDF|
Talk
Many methods for semi-supervised learning on graphs turn out to be forms
of kriging. They use a correlation structure derived from the graph, but
without taking account of correlations among the observed response values.
We incorporate the empirical correlations into the covariance. In two
example data sets we find greatly improved prediction.
A kriging-style sketch appears below.
Ya Xu's thesis
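For reference, a minimal kriging sketch under assumptions of my own (a zero-mean Gaussian model with a covariance built from a graph Laplacian); the paper's contribution, replacing parts of this covariance with empirical correlations of the responses, is only indicated in a comment.

    import numpy as np

    def krige(K, labeled, y_labeled, noise=1e-6):
        """Predict responses at unlabeled nodes by kriging with covariance K.
        K can come from the graph, e.g. a function of the Laplacian; the paper
        replaces parts of K with empirical correlations of observed responses."""
        n = K.shape[0]
        unlabeled = np.setdiff1d(np.arange(n), labeled)
        K_ll = K[np.ix_(labeled, labeled)] + noise * np.eye(len(labeled))
        K_ul = K[np.ix_(unlabeled, labeled)]
        return unlabeled, K_ul @ np.linalg.solve(K_ll, y_labeled)

    # Toy example: covariance from a chain-graph Laplacian on 6 nodes.
    n = 6
    L = np.diag([1, 2, 2, 2, 2, 1]) - np.diag(np.ones(5), 1) - np.diag(np.ones(5), -1)
    K = np.linalg.inv(L + 0.1 * np.eye(n))   # one common graph covariance choice
    labeled = np.array([0, 5])
    unlabeled, preds = krige(K, labeled, y_labeled=np.array([1.0, -1.0]))
    print(unlabeled, preds)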
2008
- Owen, A.B.
Monte Carlo and Quasi-Monte Carlo for Statistics
PDF |
Proceedings of MCQMC 2008 (Montreal)
This is a survey which sketches some topics in statistics that
use Monte Carlo and quasi-Monte Carlo methods. The emphasis is
on problems with some open research issues.
- Owen, A.B.
Karl Pearson's meta-analysis revisited
Annals of Statistics | Slides
| Supplementary figures (web)
| Supplementary figures (pdf)
A test of Karl Pearson, thought to be inadmissible for over 50
years, is shown to be admissible. Furthermore it has good power
against alternatives in which all or most of the non-zero
parameters share the same sign. An earlier paper below
used a large Monte Carlo simulation, whereas this one uses an FFT
to compute exact power. This one also compares to standard tests not ordinarily
thought of as meta-analysis.
- Southworth, L.K., Kim. S.K. and Owen, A.B.
Properties of balanced permutations
Journal of Computational Biology,
April 2009, 16(4): 625-638
Balanced permutations are a fascinating idea for microarray
analyses. But we find that they can give very misleading
p values.
2007
- Perry, P.O. and Owen, A.B.
A rotation test to verify latent structure
JMLR Feb 2010
We test for the presence of a latent variable in correlated noise
by employing projection pursuit non-normality measures.
Patrick Perry's thesis
- Zahn, Poosala, Owen, Ingram, Lustig, Carter, Weeratna, Taub, Gorospe, Mazan-Mamczarz, Lakatta,
Boheler, Zu, Mattson, Falco, Ko, Schlessinger, Firman, Kummerfeld, Wood, Longo, Zonderman, Kim, Becker
"AGEMAP: a gene expression database for aging in mice"
PLOS Genetics
This is a preliminary online version of the article. There may be changes.
We look at patterns of aging in mice for 16 different tissues, as measured by gene expression.
- Owen, A.B. and Perry, P.O.
Bi-cross-validation of the SVD and the non-negative matrix factorization
final PDF
from AOAS
| JSM 2009 slides
| talk at Cambridge University
| older slides
We look at how to pick the rank k when approximating a matrix
by a truncated SVD. We hold out a rectangular submatrix, fit an SVD
to a complementary submatrix, truncate it and predict.
The method extends to the non-negative matrix factorization
among other models. A sketch of the held-out residual appears below.
Patrick Perry's thesis
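A minimal sketch of the held-out residual computation, assuming the Gabriel-style estimate of the held-out block from the rank-k truncated SVD of the complementary block; matrix and variable names below are illustrative.

    import numpy as np

    def bcv_residual(X, row_hold, col_hold, k):
        """Bi-cross-validation style residual at rank k.
        Hold out block A = X[row_hold, col_hold], truncate the SVD of the
        complementary block D, and predict A from the held-in rows and columns."""
        rows_in = np.setdiff1d(np.arange(X.shape[0]), row_hold)
        cols_in = np.setdiff1d(np.arange(X.shape[1]), col_hold)
        A = X[np.ix_(row_hold, col_hold)]          # held-out block
        B = X[np.ix_(row_hold, cols_in)]
        C = X[np.ix_(rows_in, col_hold)]
        D = X[np.ix_(rows_in, cols_in)]
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        D_k_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T   # pseudo-inverse of rank-k SVD
        A_hat = B @ D_k_pinv @ C
        return np.sum((A - A_hat) ** 2)

    # Toy example with an (approximately) rank one signal plus noise.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(40, 30)) + np.outer(rng.normal(size=40), rng.normal(size=30))
    row_hold = rng.choice(40, size=10, replace=False)
    col_hold = rng.choice(30, size=8, replace=False)
    print([round(bcv_residual(X, row_hold, col_hold, k), 1) for k in range(1, 5)])

In practice one would average this residual over several random holdouts and choose the rank k that minimizes it.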
- Owen, A.B.
"Pearson's test in a large scale multiple meta-analysis"
PDF
The AGEMAP study generated an 8932 x 16 matrix of p values.
We apply meta-analysis to each row.
A method originally proposed by Pearson (1934), and thought
for over 50 years to be inadmissible, performs best in a simulation. We also
show that Pearson's test really is admissible.
A sketch of the combination statistics appears below.
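As a hedged sketch (one common reading of the statistics involved, not necessarily the paper's exact definitions), Fisher's combination and a Pearson-style variant that combines one-tailed p-values and their complements can be coded as follows; the simulated p-values are illustrative.

    import numpy as np
    from scipy import stats

    def fisher_stat(p):
        # Fisher's combination: -2 sum log p, chi-squared with 2d df under the null.
        return -2.0 * np.sum(np.log(p))

    def pearson_style_stat(p_one_tailed):
        """One reading of Pearson's (1934) proposal: combine the one-tailed
        p-values and their complements, and report the more extreme of the two
        statistics. This rewards alternatives where most nonzero effects share a sign."""
        q_left = -2.0 * np.sum(np.log(p_one_tailed))
        q_right = -2.0 * np.sum(np.log1p(-p_one_tailed))
        return max(q_left, q_right)

    # Toy example: 16 one-tailed p-values, mostly small (effects of a common sign).
    rng = np.random.default_rng(3)
    p = rng.beta(0.5, 3.0, size=16)
    d = len(p)
    print("Fisher:", fisher_stat(p), "reference chi2 df:", 2 * d)
    print("Pearson-style:", pearson_style_stat(p))
    print("Fisher p-value:", stats.chi2.sf(fisher_stat(p), df=2 * d))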
- Owen, A.B.
"The pigeonhole bootstrap"
Annals of Applied Statistics
| PDF
| Slides
McCullagh (2000) showed that
large crossed random effects data sets,
such as are now studied for recommender engines and
information retrieval, are impossible to bootstrap.
This means that even for balanced homoscedastic
random effects models with no missing data, no bootstrap
correctly estimates the variance of a sample mean (let
alone a more complicated procedure).
But one of the methods he studied, that of
independently resampling rows and columns, comes pretty
close. This article shows that the expected bootstrap variance of
that method tracks the desired variance, even
for severely unbalanced and heteroscedastic data sets.
A sketch of the resampling scheme appears below.
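A minimal sketch of the independent row-and-column resampling scheme for the mean of a fully observed crossed layout; weighting each cell by the product of its row and column multiplicities is my reading of the scheme, and the synthetic crossed-effects data are illustrative.

    import numpy as np

    def pigeonhole_boot_means(Y, n_boot=1000, rng=None):
        """Resample rows and columns independently with replacement and recompute
        the mean of the crossed data. The resampled data set contains cell (i, j)
        with multiplicity (row count of i) * (column count of j)."""
        if rng is None:
            rng = np.random.default_rng()
        R, C = Y.shape
        means = np.empty(n_boot)
        for b in range(n_boot):
            row_mult = np.bincount(rng.integers(R, size=R), minlength=R)
            col_mult = np.bincount(rng.integers(C, size=C), minlength=C)
            W = np.outer(row_mult, col_mult)      # multiplicity of each cell
            means[b] = np.sum(W * Y) / np.sum(W)
        return means

    # Toy crossed random effects data: row effects + column effects + noise.
    rng = np.random.default_rng(4)
    R, C = 30, 25
    Y = rng.normal(size=(R, 1)) + rng.normal(size=(1, C)) + 0.5 * rng.normal(size=(R, C))
    boot = pigeonhole_boot_means(Y, n_boot=500, rng=rng)
    print("bootstrap variance of the mean:", boot.var())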
2006
- Owen, A.B.
"A robust hybrid of lasso and ridge regression"
PDF |
Slides
A penalty that behaves like lasso for small coefficients
and like ridge for large coefficients is developed.
This penalty is a reversed Huber function.
The penalty is convex. Like the Huber function it requires
scaling. The scaling parameter can be incorporated into a
criterion that is jointly convex in it and the regression
coefficient vector. The penalty is sketched below.
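A hedged sketch of the reversed Huber ("Berhu") penalty, assuming the standard form that is the absolute value up to a threshold M and quadratic beyond it, matched to be continuous at M; the threshold name M is illustrative.

    import numpy as np

    def berhu(beta, M=1.0):
        """Reversed Huber penalty: behaves like the lasso (|beta|) for small
        coefficients and like ridge (quadratic) for large ones, joined at M."""
        a = np.abs(beta)
        return np.where(a <= M, a, (a**2 + M**2) / (2.0 * M))

    # The penalty is convex; compare with Huber, which is quadratic near zero
    # and linear far away -- here the two regimes are reversed.
    b = np.linspace(-3, 3, 7)
    print(berhu(b, M=1.0))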
- Owen, A.B.
"Infinitely imbalanced logistic regression"
JMLR
Many binary classification problems are very unbalanced, with
one category much more common than the other. This paper
shows what happens to logistic regression in the limit
where one category's sample size tends to infinity while the
other remains finite. For example one could use logistic
regression to separate a Gaussian measure from a finite data
set. Under mild conditions, the limiting coefficient vector
(apart from the intercept) is finite. It can be expressed in
terms of an exponential tilt and found by solving a convex
optimization problem; the limiting equation is sketched below.
Journal of Machine Learning Research (v 8, pp 761-773, 2007)
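In hedged notation (not necessarily the paper's), with \(F_0\) the distribution of the infinitely large class and \(\bar{x}\) the average predictor vector of the finite class, the limiting slope vector \(\beta\) solves the exponential tilt equation
\[
\bar{x} \;=\; \frac{\int x\, e^{x^\top \beta}\, dF_0(x)}{\int e^{x^\top \beta}\, dF_0(x)},
\]
which is the first-order condition of a convex problem in \(\beta\).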
- Zahn, Sonu, Vogel, Crane, Mazan-Mamczarz, Rabkin, Davis,
Becker, Owen, Kim
"Transcriptional profile of aging in human muscle reveals a common
aging signature"
PLOS Genetics
This paper relates human aging to various genes and gene groups.
It includes a new version of Gene Set Enrichment Analysis (GSEA)
geared to handle regressions and covariates.
The electron transport group of genes is found to be age-related
in human kidney, muscle and brain, and in other species.
2005
- Tribble, S.D and Owen, A.B.
"Construction of weakly CUD sequences for MCMC sampling
Electronic Journal of Statistics
An earlier paper below showed that MCMC can be
driven by completely uniformly distributed (CUD)
or weakly CUD (WCUD) point sequences.
This paper shows that
a construction of Liao's that permutes QMC vectors
leads to WCUD point sequences. A theorem
of Niederreiter (1977) implies that certain lattice
constructions satisfy a triangular array version of CUD.
We find QMC methods for MCMC reduce variance by factors
ranging from 10 to several hundred in a 42 dimensional
Gibbs sampling probit example.
A proposal of Liao's for incorporating acceptance-rejection
into QMC-MCMC is also discussed.
Seth Tribble's thesis
- Owen, A.B.
"Local antithetic sampling with scrambled nets"
Annals of Statistics 2008 |
@ arXiv |
@ Project Euclid
A local antithetic reflection strategy can improve
the variance rate of scrambled nets
from \(n^{-3/2+\epsilon}\) to \(n^{-3/2-2/d+\epsilon}\)
in dimension d.
The benefit is similar to that which Haber gets
when moving from stratified sampling to stratified
antithetic sampling.
The method also looks like a merger of scrambled nets
and monomial cubature.
- Owen, A.B.
"On the Warnock-Halton quasi-standard error"
Monte Carlo Methods and Applications
v12 n 1 pp 47--54
DOI: 10.1515/156939606776886652
Warnock and Halton have proposed a
method of treating multiple QMC estimates as
replicates.
This paper reproduces an example where the method
seems to work, but cautions that the method
can fail arbitrarily badly.
-
Lin, Z & Owen, A.B. & Altman, R.
Science
Reply to letters about "Genetic research and human subject privacy".
2004
- Owen, A.B. and Tribble, S.D
"A quasi-Monte Carlo Metropolis algorithm"
PNAS
This paper proves that QMC methods can be applied to
Metropolis-Hastings style MCMC.
The key idea is to use QMC points
that are "completely uniformly distributed".
This is like using the entire period of a small random
number generator; a toy illustration appears below.
Seth Tribble's thesis
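As a toy illustration of the idea (not the paper's construction, which requires care about which generators qualify), one can drive a random-walk Metropolis sampler with the full period of a small linear congruential generator in place of IID uniforms; the generator constants, target, and step size below are illustrative.

    import numpy as np

    def small_lcg(m=2**16, a=3533, c=1):
        """Yield the full period of a small LCG as uniforms in (0,1).
        Driving MCMC with the entire period of a small generator is the
        intuition behind completely uniformly distributed (CUD) sequences."""
        x = 0
        for _ in range(m):
            x = (a * x + c) % m
            yield (x + 0.5) / m

    def metropolis(uniforms, n_steps, step=1.0):
        # Random-walk Metropolis for a standard normal target, two uniforms per step.
        u = iter(uniforms)
        x, chain = 0.0, []
        for _ in range(n_steps):
            prop = x + step * (2.0 * next(u) - 1.0)     # symmetric uniform proposal
            log_accept = -0.5 * (prop**2 - x**2)        # log density ratio, N(0,1) target
            if np.log(next(u)) < log_accept:
                x = prop
            chain.append(x)
        return np.array(chain)

    chain = metropolis(small_lcg(), n_steps=30000)
    print("sample mean and variance:", chain.mean(), chain.var())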
-
Rodwell, Sonu, Zahn, Lund, Wilhelmy, Wang, Xiao, Mindrinos, Crane, Segal, Myers, Brooks, Davis, Higgins, Owen and Kim
"A transcriptional profile of aging in the human kidney"
Public Library of Science
We found genes that change expression with age, in the human kidney.
These genes do not tend to be the ones that serve to distinguish kidney
from other tissue types, consistent with a model that aging is similar
in different tissue types. The genes do not overlap with aging
related genes in other species that we looked at.
- Owen, A.B.
"Randomized QMC and point singularities"
PDF
Randomized QMC is shown to have a superior
rate of convergence to ordinary MC on some
functions with square integrable singularities
at unknown locations. Surprisingly that means
RQMC will generally beat importance sampling
asymptotically. Of course one might combine
them.
- Lin, Z & Owen, A.B. & Altman, R.
"Genetic research and human subject privacy"
Science, Vol 305, Issue 5681, 183, 9 July 2004
-
Owen, A. B.
"Variance of the number of false discoveries".
PDF
| R functions (beta)
| R function documentation (beta)
Given d hypothesis tests at level \(\alpha\) we expect \(d \alpha\)
false positives. When the tests are dependent, the variance
of the number of false positives can be \(O(d^2)\) higher than the
independence value of \(d \alpha(1-\alpha)\). This paper
shows how to estimate such a variance, taking account of
dependence among the tests; the underlying variance
decomposition is sketched below.
The R functions are in beta and subject to change.
Feedback is welcome.
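For reference, the basic identity behind the excess variance: if each test rejects with probability \(\alpha\) under the null and \(\gamma_{ij}\) is the probability that tests \(i\) and \(j\) both reject, then the number \(V\) of false positives satisfies
\[
\mathrm{Var}(V) \;=\; d\,\alpha(1-\alpha) \;+\; \sum_{i \ne j}\bigl(\gamma_{ij} - \alpha^2\bigr).
\]
The second sum can be of order \(d^2\); estimating it from the dependence among the tests is the task the paper addresses.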
- Owen, A. B. "Halton sequences avoid the origin".
SIAM Review
v 48 n 3 pp 487--503
Gives rates of convergence for QMC on unbounded integrands,
using growth conditions on f and a singularity avoidance
pattern for x's. Halton sequences and randomized QMC avoid
the origin suitably.
- Owen, A. B. "Multidimensional variation for quasi-Monte Carlo".
PDF
| BiBTeX
Survey, and some new results, on multidimensional variation
(Vitali and Hardy-Krause) for Quasi-Monte Carlo.
2003
- Nomogram for predicting the likelihood
of delayed graft function in adult cadaveric renal transplant patients
Journal of the American Society of Nephrology
-
Liu, R. Owen, A.B.
"Estimating Mean Dimensionality of ANOVA decompositions"
JASA 2006
Relationships between Sobol's sensitivity indices
and moments of the dimension distribution are established.
The mean dimension is computed for some functions arising in
finance and extreme value theory.
The minimum of d independent uniform random variables is seen
to have strong low dimensional components.
The link between mean dimension and Sobol' total indices is sketched below.
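A hedged sketch of the connection (see the paper for the precise statements): with ANOVA variance components \(\sigma_u^2\) for subsets \(u \subseteq \{1,\dots,d\}\) and total variance \(\sigma^2\), the dimension distribution puts mass \(\sigma_u^2/\sigma^2\) on \(|u|\), so the mean dimension is
\[
\nu \;=\; \sum_{u} |u|\, \frac{\sigma_u^2}{\sigma^2}
\;=\; \frac{1}{\sigma^2}\sum_{j=1}^{d} \overline{\tau}_j^{\,2},
\qquad
\overline{\tau}_j^{\,2} \;=\; \sum_{u \ni j} \sigma_u^2,
\]
the sum of the unnormalized Sobol' total indices divided by \(\sigma^2\).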
-
Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D.
"A Bayesian framework for combining heterogeneous data sources for
gene function prediction (in S. cerevisiae)"
PNAS|
Online
Supplement
- Owen, A.B.
"Quasi-Monte Carlo Sampling"
PDF
| BiBTeX
A Chapter on QMC for a
SIGGRAPH 2003 course. It
motivates QMC as a deterministic law of large
numbers. The algorithms are presented as
extensions of stratification methods, like
those already well known in graphics
(jittering, n-rooks, multi-jittered sampling).
A small MC versus QMC comparison is sketched below.
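A small illustration of the deterministic law of large numbers idea, comparing plain Monte Carlo with a two-dimensional Halton sequence (radical inverses in bases 2 and 3) on a smooth test integrand; the integrand and sample sizes are illustrative.

    import numpy as np

    def radical_inverse(i, base):
        # Van der Corput radical inverse of the integer i in the given base.
        f, result = 1.0 / base, 0.0
        while i > 0:
            result += f * (i % base)
            i //= base
            f /= base
        return result

    def halton(n, bases=(2, 3)):
        # First n points of the Halton sequence in [0,1)^len(bases).
        return np.array([[radical_inverse(i + 1, b) for b in bases] for i in range(n)])

    f = lambda x: np.exp(np.sum(x, axis=1))     # exact integral over [0,1]^2 is (e-1)^2
    exact = (np.e - 1.0) ** 2
    rng = np.random.default_rng(5)
    for n in (64, 256, 1024):
        mc = f(rng.random((n, 2))).mean()
        qmc = f(halton(n)).mean()
        print(n, abs(mc - exact), abs(qmc - exact))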
2002
-
Owen, A.B., Stuart, J., Mach, K. Villeneuve, A,M., Kim, S.
"A gene recommender algorithm to identify co-expressed
genes in C elegans"
Paper |
Software
This paper imitates algorithms from movie and book recommenders
to find new genes related to a group of old genes.
Given a query of genes with common function, we identify
experiments in which the query genes are strongly co-expressed.
Then we rank all the organism's genes according to the extent
to which they agree with the query group, in the selected
experiments. RNA interference knockouts confirmed two new
Retinoblastoma-related genes in C. elegans.
-
Owen, A.B.
"Variance and discrepancy with alternative scramblings"
PostScript |
PDF
There are many computationally efficient proposals for
scrambling digital nets. Generally they preserve mean
squared discrepancy. This paper shows that one
alternative can be detrimental to the sampling variance,
adversely affecting the rate of convergence.
Another scrambling improves the rate of convergence,
at least for d=1.
-
Owen, A.B.
"Necessity of low effective dimension"
PostScript |
PDF
This paper explores the extent to which low superposition
dimension is necessary for QMC to beat MC.
-
Jiang, T. and Owen, A.B.
"Quasi-regression for visualization and interpretation
of black box functions"
PostScript |
PDF
Quasi-regression is applied to the output of a support
vector machine and to a neural network.
The method allows one to peer into a black box and
identify important variables and interactions.
The most vexing issue is how to reconcile a decomposition derived for
independent variables with a function fit to highly dependent data.
-
Hickernell, F. and Lemieux, C. and Owen, A.B.
"Control variates for quasi-Monte Carlo"
PostScript |
PDF
It is easy and natural to combine quasi-Monte Carlo
with control variates. But the proper control
variate coefficients can change, as can the choice
of what constitutes a good control variate.
In MC a good control variate correlates with the
integrand. In QMC it is better to correlate with
some derivative or high-frequency component of the
target integrand. A basic control variate sketch appears below.
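A basic control variate sketch, written for plain Monte Carlo; the same estimator can be evaluated with QMC points in place of IID ones, though as noted above the best control variate and coefficient may then change. The integrand, control, and coefficient estimate below are illustrative.

    import numpy as np

    # Estimate E f(U) with a control variate h(U) whose mean theta_h is known.
    f = lambda u: np.exp(u)        # target integrand on [0,1], true mean e - 1
    h = lambda u: u                # control variate with known mean 1/2
    theta_h = 0.5

    rng = np.random.default_rng(6)
    u = rng.random(10_000)
    fu, hu = f(u), h(u)
    c = np.cov(fu, hu)
    beta = c[0, 1] / c[1, 1]       # estimated (approximately) optimal coefficient
    est = fu.mean() - beta * (hu.mean() - theta_h)
    print(est, np.e - 1.0)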
2001
-
Jiang, T. and Owen, A.B.
"Quasi-regression with shrinkage"
PostScript |
PDF |
Software |
Slides
Quasi-regression is a method of Monte Carlo approximation
useful for global sensitivity analysis.
This paper presents a new version,
incorporating shrinkage parameters of the type used
in wavelet approximation.
As an example application, a black box function from
machine learning is analyzed. That function is nearly
a superposition of functions of one and two variables,
and the first variable acting alone accounts for more
than half of the variance. A sketch of plain quasi-regression appears below.
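A minimal sketch of plain quasi-regression (no shrinkage), assuming a tensor product of normalized Legendre polynomials as the orthonormal basis on \([0,1]^d\); the coefficients are estimated by Monte Carlo averages of \(f\) times the basis functions, and the black box function used below is illustrative.

    import numpy as np
    from itertools import product

    # Normalized Legendre polynomials on [0,1]; orthonormal under the uniform density.
    phis = [
        lambda x: np.ones_like(x),
        lambda x: np.sqrt(3.0) * (2.0 * x - 1.0),
        lambda x: np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0),
    ]

    def quasi_regression(f, d, n, degree=2, rng=None):
        """Estimate tensor-product basis coefficients of f on [0,1]^d by Monte Carlo:
        beta_hat[j] = mean of f(x) * phi_j(x) over uniform x. Squared coefficients
        (other than the constant) indicate which variables and interactions matter."""
        if rng is None:
            rng = np.random.default_rng()
        x = rng.random((n, d))
        fx = f(x)
        coefs = {}
        for idx in product(range(degree + 1), repeat=d):
            basis = np.ones(n)
            for j, k in enumerate(idx):
                basis *= phis[k](x[:, j])
            coefs[idx] = np.mean(fx * basis)
        return coefs

    # Example black box: additive in x1 plus an x2*x3 interaction.
    f = lambda x: np.sin(2 * np.pi * x[:, 0]) + x[:, 1] * x[:, 2]
    coefs = quasi_regression(f, d=3, n=20_000, rng=np.random.default_rng(7))
    top = sorted(coefs.items(), key=lambda kv: -abs(kv[1]))[:5]
    print(top)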
-
Owen, A.B.
"The dimension distribution and quadrature test functions"
PostScript |
PDF
A "dimension distribution" is introduced through which various
measures of effective dimension of a function can be defined.
The idea is explored on some widely used quadrature test
functions. Some isotropic functions are shown to be of low
effective dimension, explaining the success of QMC methods on them.
-
Owen, A.B.
"Empirical Likelihood"
Book and Software home page
-
Lemieux, C. and Owen, A.B.
"Quasi-Regression and the Relative Importance of
the ANOVA Components of a Function"
PostScript |
PDF
Every minute spent doing the research reported here was a minute
somebody was, with good reason, waiting for me to do something else. I thank them
all for their patience.
Sequoia Hall, 390 Serra Mall, Stanford CA 94305