Seminar on Statistics and Data Science

This seminar series is organized by the research group in mathematical statistics and features talks on advances in methods of data analysis, statistical theory, and their applications.
The speakers are external guests as well as researchers from other groups at TUM.

All talks in the seminar series are listed in the Munich Mathematical Calendar.


The seminar takes place in room BC1 2.01.10 under the current rules and simultaneously via Zoom. To stay up to date on upcoming presentations, please join our mailing list. You will receive an email to confirm your subscription.

Zoom link

Join the seminar. Please use your real name when entering the session. The session will start roughly 10 minutes prior to the talk.


Upcoming talks

(no entries)

Previous talks

19.01.2022 12:30 Tobias Windisch (Robert Bosch GmbH): Learning Bayesian networks on high-dimensional manufacturing data

In our manufacturing plants, many tens of thousands of components for the automotive industry, such as cameras or brake boosters, are produced each day. For many of our products, thousands of quality measurements are collected and checked individually during the assembly process. Understanding the relations and interconnections between these measurements is key to maintaining high production uptime and keeping scrap to a minimum. Graphical models such as Bayesian networks provide a rich statistical framework for investigating these relationships, not least because they represent them as a graph. However, learning their graph structure is an NP-hard problem, and most existing algorithms are designed to handle either a small number of variables or a small number of observations. On our datasets, with many thousands of variables and many hundreds of thousands of observations, classic learning algorithms do not converge. In this talk, we show how we use an adapted version of the NOTEARS algorithm that uses mixture density neural networks to learn the structure of Bayesian networks even for very high-dimensional manufacturing data.
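For readers unfamiliar with NOTEARS: it recasts structure learning as continuous optimization by penalizing a smooth acyclicity function that vanishes exactly on directed acyclic graphs. A minimal NumPy sketch of that penalty only (the full algorithm, and the mixture-density extension used in the talk, add a fitting loss and an augmented-Lagrangian loop):

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W: np.ndarray) -> float:
    """Smooth acyclicity penalty h(W) = tr(exp(W * W)) - d (Zheng et al., 2018).

    h(W) == 0 exactly when the weighted adjacency matrix W describes a DAG;
    it is positive as soon as W contains a directed cycle.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)  # elementwise square inside expm

# A DAG (single edge 0 -> 1) has zero penalty; a 2-cycle does not.
dag = np.array([[0.0, 0.8], [0.0, 0.0]])
cyc = np.array([[0.0, 0.8], [0.5, 0.0]])
print(notears_acyclicity(dag))  # 0.0
print(notears_acyclicity(cyc))  # > 0
```

NOTEARS then minimizes a least-squares (or, here, mixture-density likelihood) loss subject to h(W) = 0.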

15.12.2021 12:30 Dennis Leung (University of Melbourne): ZAP: z-value adaptive procedures for false discovery rate control with side information

In the last five years, adaptive multiple testing with covariates has gained much traction. It has been recognized that the side information provided by auxiliary covariates which are independent of the primary test statistics under the null can be used to devise more powerful testing procedures for controlling the false discovery rate (FDR). For example, in the differential expression analysis of RNA-sequencing data, the average read counts across samples provide useful side information alongside individual p-values, as genetic markers with higher read counts are more likely to display differential expression. However, for two-sided hypotheses, the usual data processing step that transforms the primary test statistics, generally known as z-values, into p-values not only leads to a loss of information carried by the main statistics but can also undermine the ability of the covariates to assist with the FDR inference. Motivated by this and building upon recent theoretical advances, we develop ZAP, a z-value based covariate-adaptive methodology. It operates on the intact structural information encoded jointly by the z-values and covariates, to mimic an oracle testing procedure that is unattainable in practice; the power gain of ZAP can be substantial in comparison with p-value based methods, as demonstrated by our simulations and real data analyses.
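As a point of reference, the classical p-value based FDR procedure that covariate-adaptive methods such as ZAP aim to outperform is Benjamini-Hochberg. A compact sketch of standard BH (not of the ZAP methodology itself):

```python
import numpy as np

def benjamini_hochberg(pvals: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Benjamini-Hochberg step-up procedure: boolean rejection mask with
    FDR controlled at level alpha (for independent or PRDS p-values)."""
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    # Largest k with p_(k) <= (k / m) * alpha; reject the k smallest p-values.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.2, 0.7])
print(benjamini_hochberg(pvals, alpha=0.1))  # rejects the four smallest
```

ZAP replaces the p-value ranking with a decision rule learned jointly from z-values and covariates.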

08.12.2021 12:30 Niels Richard Hansen (University of Copenhagen): Conditional independence testing based on partial copulas

The partial copula provides a method for describing the dependence between two real-valued random variables X and Y conditional on a third random vector Z in terms of nonparametric residuals. These residuals are in practice computed via models of the conditional distributions X|Z and Y|Z. In this talk I will show how the nonparametric residuals can be combined to give a valid test of conditional independence, provided that nonparametric estimators of the conditional distributions converge at a sufficient rate. The rates can be realized via estimators based on quantile regression. If time permits, I will show how the test can be generalized to conditional local independence (Granger noncausality) in a time dynamic framework.
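A toy illustration of the partial-copula construction, in a Gaussian example where the conditional distributions are known in closed form (in practice they would have to be estimated, e.g. via quantile regression as in the talk):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)
x = z + rng.normal(size=n)    # X | Z=z ~ N(z, 1)
y = -z + rng.normal(size=n)   # Y | Z=z ~ N(-z, 1); X independent of Y given Z

# Nonparametric residuals U = F_{X|Z}(X|Z), V = F_{Y|Z}(Y|Z)
# (here the conditional cdfs are known exactly, so no estimation step):
u = norm.cdf(x - z)
v = norm.cdf(y + z)

# X and Y are strongly correlated marginally, but under conditional
# independence the residuals are independent Uniform(0,1) variables.
print(np.corrcoef(x, y)[0, 1])  # clearly negative (about -0.5)
print(np.corrcoef(u, v)[0, 1])  # near 0
```

A conditional independence test then checks the residual pair (U, V) for dependence.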

01.12.2021 12:30 Benito van der Zander (Universität zu Lübeck): t.b.a.


17.11.2021 12:30 Mladen Kolar (University of Chicago): Estimation and Inference for Differential Networks

We present a recent line of work on estimating differential networks and conducting statistical inference about parameters in a high-dimensional setting. First, we consider a Gaussian setting and show how to directly learn the difference between the graph structures. A debiasing procedure will be presented for the construction of an asymptotically normal estimator of the difference. Next, building on the first part, we show how to learn the difference between two graphical models with latent variables. A linear convergence rate is established for an alternating gradient descent procedure with correct initialization. Simulation studies illustrate the performance of the procedure. We also illustrate the procedure on an application in neuroscience. Finally, we will discuss how to conduct statistical inference on differential networks when the data are not Gaussian.

10.11.2021 12:15 Michaël Lalancette (University of Toronto): The extremal graphical lasso

Multivariate extreme value theory studies the dependence structure of multivariate data in unobserved far-tail regions. Multiple characterizations and models exist for such extremal dependence structure. However, statistical inference for those extremal dependence models uses merely a fraction of the available data, which drastically reduces the effective sample size, creating challenges even in moderate dimension. Engelke & Hitz (2020, JRSSB) introduced graphical modelling for multivariate extremes, allowing for enforced sparsity in moderate- to high-dimensional settings. Yet, the model selection and estimation tools that appear therein are limited to simple graph structures. In this work, we propose a novel, scalable method for selection and estimation of extremal graphical models that makes no assumption on the underlying graph structure. It is based on existing tools for Gaussian graphical model selection such as the graphical lasso and the neighborhood selection approach of Meinshausen & Bühlmann (2006, Ann. Stat.). Model selection consistency is established in sparse regimes where the dimension is allowed to be exponentially larger than the effective sample size. Preliminary simulation results appear to support the theoretical results.
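The Gaussian building blocks mentioned above are readily available in standard software. A small sketch of edge recovery with the ordinary graphical lasso on simulated Gaussian data (scikit-learn's GraphicalLasso, not the extremal variant proposed in the talk):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse precision matrix encoding the chain graph 0 - 1 - 2 - 3.
Theta = np.eye(4)
for i in range(3):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4
Sigma = np.linalg.inv(Theta)

X = rng.multivariate_normal(np.zeros(4), Sigma, size=5000)
model = GraphicalLasso(alpha=0.05).fit(X)

# Nonzero off-diagonal entries of the estimated precision matrix are edges.
edges = np.abs(model.precision_) > 1e-3
np.fill_diagonal(edges, False)
print(edges.astype(int))  # chain edges (0,1), (1,2), (2,3) are recovered
```

The extremal graphical lasso applies the same penalized-estimation idea to a variogram-based analogue of the precision matrix for threshold exceedances.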

18.10.2021 14:00 Bernd Sturmfels (MPI Leipzig): Algebraic Statistics with a View towards Physics

We discuss the algebraic geometry of maximum likelihood estimation from the perspective of scattering amplitudes in particle physics. A guiding example is the moduli space of n-pointed rational curves. The scattering potential plays the role of the log-likelihood function, and its critical points are solutions to rational function equations. Their number is an Euler characteristic. Soft limit degenerations are combined with certified numerical methods for concrete computations.

22.09.2021 12:15 Hongjian Shi (TUM): On universally consistent and fully distribution-free rank tests of vector independence

Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-n neighborhoods within the class of quadratic mean differentiable alternatives.

14.04.2021 12:15 Mona Azadkia (ETH Zurich): A Simple Measure Of Conditional Dependence

We propose a coefficient of conditional dependence between two random variables $Y$ and $Z$, given a set of other variables $X_1, \cdots , X_p$, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in $[0, 1]$, where the limit is 0 if and only if $Y$ and $Z$ are conditionally independent given $X_1, \cdots , X_p$, and is 1 if and only if $Y$ is equal to a measurable function of $Z$ given $X_1, \cdots , X_p$. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial $R^2$ statistic for measuring conditional dependence by regression. Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real datasets are worked out.
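The coefficient builds on rank constructions like Chatterjee's correlation. A sketch of the unconditional version for continuous data without ties (the conditional coefficient and FOCI add a nearest-neighbour step over $X_1, \cdots , X_p$, which is omitted here):

```python
import numpy as np

def chatterjee_xi(x: np.ndarray, y: np.ndarray) -> float:
    """Chatterjee's rank correlation xi_n for continuous data (no ties):
    near 0 under independence, near 1 when y is a noiseless measurable
    function of x. Deliberately asymmetric in (x, y)."""
    n = len(x)
    order = np.argsort(x)                  # positions sorted by x
    ranks = np.argsort(np.argsort(y)) + 1  # 1-based ranks of y
    r = ranks[order]                       # y-ranks traversed in x-order
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n * n - 1)

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
print(chatterjee_xi(x, x**2))                   # close to 1: y = f(x)
print(chatterjee_xi(x, rng.normal(size=2000)))  # close to 0: independent
```

FOCI uses the conditional analogue of this quantity to greedily order features, stopping when the next feature adds no conditional dependence.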

14.04.2021 13:15 Armeen Taeb (ETH Zurich): Latent-variable modeling: causal inference and false discovery control

Many driving factors of physical systems are latent or unobserved. Thus, understanding such systems and producing robust predictions crucially relies on accounting for the influence of the latent structure. I will discuss methodological and theoretical advances in two important problems in latent-variable modeling. The first problem focuses on developing false discovery methods for latent-variable models that are parameterized by low-rank matrices, where the traditional perspective on false discovery control is ill-suited due to the non-discrete nature of the underlying decision spaces. To overcome this challenge, I will present a geometric reformulation of the notion of a discovery as well as a specific algorithm to control false discoveries in these settings. The second problem aims to estimate causal relations among a collection of observed variables with latent effects. Given access to data arising from perturbations (interventions), I will introduce a regularized maximum-likelihood framework that provably identifies the underlying causal structure and improves robustness to distributional changes. Throughout, I will explore the utility of the proposed methodologies for real-world applications such as water resource management.

24.02.2021 12:15 Elisabeth Ullmann (TUM): Multilevel estimators for models based on partial differential equations

Many mathematical models of physical processes contain uncertainties due to incomplete models or measurement errors and lack of knowledge associated with the model inputs. We consider processes which are formulated in terms of classical partial differential equations (PDEs). The challenge and novelty is that the PDEs contain random coefficient functions, e.g., some transformations of Gaussian random fields. Random PDEs are much more flexible and can model more complex situations compared to classical PDEs with deterministic coefficients. However, each sample of a PDE-based model is extremely expensive. To alleviate the high costs, the numerical analysis community has developed so-called multilevel estimators which work with a hierarchy of PDE models of different resolution and cost. We review the basic idea of multilevel estimators and discuss our own recent contributions: i) a multilevel best linear unbiased estimator to approximate the expectation of a scalar output quantity of interest associated with a random PDE [1, 2], ii) a multilevel sequential Monte Carlo method for Bayesian inverse problems [3], iii) a multilevel sequential importance method to estimate the probability of rare events [4].

[1] D. Schaden, E. Ullmann: On multilevel best linear unbiased estimators. SIAM/ASA J. Uncert. Quantif. 8(2), pp. 601-635, 2020
[2] D. Schaden, E. Ullmann: Asymptotic analysis of multilevel best linear unbiased estimators. arXiv:2012.03658
[3] J. Latz, I. Papaioannou, E. Ullmann: Multilevel Sequential² Monte Carlo for Bayesian Inverse Problems. J. Comput. Phys. 368, pp. 154-178, 2018
[4] F. Wagner, J. Latz, I. Papaioannou, E. Ullmann: Multilevel sequential importance sampling for rare event estimation. SIAM J. Sci. Comput. 42(4), pp. A2062-A2087, 2020
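The multilevel idea common to these estimators is the telescoping sum E[Q_L] = E[Q_0] + sum over l of E[Q_l - Q_{l-1}]: many cheap samples on the coarse level, few samples of the low-variance corrections. A toy two-level sketch, with a Taylor truncation standing in for a PDE solve at two mesh resolutions:

```python
import numpy as np

rng = np.random.default_rng(42)

def Q(x, level):
    """Level-l approximation of exp(x): Taylor series truncated after
    level + 2 terms. A stand-in for a PDE solve at mesh resolution l
    (finer level = more accurate, more expensive)."""
    terms = [np.ones_like(x), x, x**2 / 2]
    return sum(terms[: level + 2])

# Two-level estimator of E[Q_1(X)] for X ~ N(0, 1):
#   E[Q_1] = E[Q_0] + E[Q_1 - Q_0].
# The correction Q_1 - Q_0 = X^2 / 2 has small variance, so few samples
# suffice there, while level 0 is cheap enough for many samples.
n0, n1 = 100_000, 2_000
x0 = rng.normal(size=n0)
x1 = rng.normal(size=n1)  # coupled: the same samples enter both levels
mlmc = Q(x0, 0).mean() + (Q(x1, 1) - Q(x1, 0)).mean()
print(mlmc)  # close to the exact value E[1 + X + X^2/2] = 1.5
```

Real multilevel estimators optimize the per-level sample sizes from variance and cost estimates; here they are fixed for illustration.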

18.02.2021 17:00 Dorota Kurowicka (TU Delft): Simplified R-vine based forward regression

An extension of the D-vine based forward regression procedure to an R-vine forward regression is proposed. In this extension, any R-vine structure can be taken into account. Moreover, a new heuristic is proposed to determine which R-vine structure is most appropriate for modelling the conditional distribution of the response variable given the covariates. Simulations show that the performance of the heuristic is comparable to the D-vine based approach. Furthermore, it is explained how to extend the heuristic to situations where more than one response variable is of interest. Finally, the proposed R-vine regression is applied to perform a stress analysis of the manufacturing sector, which shows its impact on the economy as a whole. Reference: Zhu, Kurowicka and Nane.

03.02.2021 16:00 Holger Dette (Ruhr-Universität Bochum): Testing relevant hypotheses in functional time series via self-normalization

In this paper we develop methodology for testing relevant hypotheses in a tuning-free way. Our main focus is on functional time series, but extensions to other settings are also discussed. Instead of testing for exact equality, for example the equality of two mean functions from two independent time series, we propose to test for a relevant deviation under the null hypothesis. In the two-sample problem this means that an $L^2$-distance between the two mean functions is smaller than a pre-specified threshold. For such hypotheses self-normalization, which was introduced by Shao (2010) and is commonly used to avoid the estimation of nuisance parameters, is not directly applicable. We develop new self-normalized procedures for testing relevant hypotheses and demonstrate the particular advantages of this approach in the comparison of eigenvalues and eigenfunctions.
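To give a flavour of self-normalization in the simplest setting: for a scalar mean, the normalizer is built from the data's own partial sums, so no long-run variance (and hence no bandwidth parameter) has to be estimated. A sketch in the spirit of Shao (2010); the functional, relevant-hypothesis versions developed in the paper are considerably more involved:

```python
import numpy as np

def self_normalized_stat(x: np.ndarray, mu0: float) -> float:
    """Self-normalized statistic for H0: E[X_t] = mu0 (Shao, 2010 style).

    The normalizer v is a quadratic functional of the centered partial
    sums, so the statistic is asymptotically pivotal without estimating
    the long-run variance of the (possibly dependent) series."""
    n = len(x)
    s = np.cumsum(x - mu0)
    t = np.arange(1, n + 1)
    v = np.sum((s - t / n * s[-1]) ** 2) / n**2  # bridge-type normalizer
    return n * (x.mean() - mu0) ** 2 / v

rng = np.random.default_rng(0)
x = rng.normal(size=1000)            # true mean is 0
print(self_normalized_stat(x, 0.0))  # moderate value: H0 holds
print(self_normalized_stat(x, 1.0))  # very large value: H0 badly violated
```

Under the null, the statistic converges to a fixed functional of Brownian motion, so critical values can be tabulated once and for all.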