dc.contributor.author | Kallus, Jonatan | |
dc.date.accessioned | 2017-03-30T08:49:04Z | |
dc.date.available | 2017-03-30T08:49:04Z | |
dc.date.issued | 2017 | |
dc.identifier.uri | http://hdl.handle.net/2077/52101 | |
dc.description.abstract | Network modeling is an effective approach for the interpretation of
high-dimensional data sets for which a sparse dependence structure can
be assumed. Genomic data is a challenging and important example. In
genomics, network modeling aids the discovery of biological mechanistic
relationships and therapeutic targets. The usefulness of methods for
network modeling is improved when they produce networks that are
accompanied by a reliability estimate. Furthermore, for methods to
produce reliable networks they need to have a low sensitivity to
occasional outlier observations. In this thesis, the problem of robust
network modeling with error control in terms of the false discovery rate
(FDR) of edges is studied. As a background, existing types of genomic
data are described and the challenges of high-dimensional statistics and
multiple hypothesis testing are explained.
Methods for estimation of sparse dependency structures in single samples
of genomic data are reviewed. Such methods have a regularization
parameter that controls sparsity of estimates. Methods that are based on
a single sample are highly sensitive to outlier observations and to the
value of the regularization parameter. We introduce ROPE (resampling of
penalized estimates), a method that produces robust network estimates
by using many data subsamples and several levels of regularization.
ROPE controls edge FDR at a specified level by modeling edge selection
counts as coming from an overdispersed beta-binomial mixture
distribution. Previously existing resampling-based methods for network
modeling are reviewed. ROPE was evaluated on simulated data and gene
expression data from cancer patients. The evaluation shows that ROPE
outperforms state-of-the-art methods in terms of accuracy of FDR control
and robustness. Robust FDR control makes it possible to make a
principled decision about how many network links to use in subsequent
analysis steps. | sv |
dc.format.extent | 30 p. | sv |
dc.language.iso | eng | sv |
dc.publisher | University of Gothenburg and Chalmers University of Technology | sv |
dc.subject | high-dimensional data | sv |
dc.subject | sparsity | sv |
dc.subject | model selection | sv |
dc.subject | bootstrap | sv |
dc.subject | genomics | sv |
dc.subject | graphical modeling | sv |
dc.title | Resampling in network modeling of high-dimensional genomic data | sv |
dc.type | Text | sv |
dc.type.svep | licentiate thesis | sv |
dc.contributor.organization | Department of Mathematical Sciences | sv |