The seminar is organised by the Chair of Mathematical Statistics. The speakers are Master's students presenting the results of their Master's theses, PhD students of the chair reporting on their current research, and external researchers in the area of statistics.
If not stated otherwise, the seminar starts at 12:15 and takes place in room BC1 2.01.10.
Title: Clustering Large Time-Series Data Using Regression Components. Abstract: This study tackles the question of how to build customer groups efficiently and meaningfully from large and complex time-series data. Current clustering algorithms create groups of objects such that each group ideally consists of similar data points; however, they usually neither integrate external data into the analysis nor provide additional insights about the clusters. Given that large amounts of internal and external data are nowadays available, for example from search engines and meteorological services, we propose a two-stage time-series clustering strategy that incorporates external data into the analysis. First, we explain the use of harmonic regression and the Fourier decomposition of time series, and show that determining the frequency terms of the sinusoidal functions by periodogram analysis overfits the data. We then compare state-of-the-art algorithms for clustering based on the harmonic regression components and present a first dependence analysis using the copula approach. Finally, we conclude that a clustering approach which models the dependence between the clusters is beneficial.
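The first stage described above, harmonic regression with frequencies chosen from a periodogram, can be sketched in a few lines. The code below is an illustrative sketch, not the method used in the talk: it picks the strongest periodogram peak of a simulated daily series with weekly seasonality and fits the corresponding sinusoid by least squares. All names and the simulated data are hypothetical.

```python
import numpy as np

def harmonic_regression(y, t, n_freqs=1):
    """Fit y via cos/sin terms at the n_freqs strongest periodogram peaks."""
    n = len(y)
    # Periodogram of the centered series via the real FFT
    yc = y - y.mean()
    power = np.abs(np.fft.rfft(yc)) ** 2
    freqs = np.fft.rfftfreq(n, d=t[1] - t[0])
    # Skip the zero frequency, keep the strongest peaks
    peaks = np.argsort(power[1:])[::-1][:n_freqs] + 1
    f_hat = freqs[peaks]
    # Harmonic regression: least squares on a sinusoidal design matrix
    X = np.column_stack(
        [np.cos(2 * np.pi * f * t) for f in f_hat]
        + [np.sin(2 * np.pi * f * t) for f in f_hat]
        + [np.ones(n)]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return f_hat, coef, X @ coef

# Simulated example: daily data with a weekly cycle (frequency 1/7 per day)
rng = np.random.default_rng(0)
t = np.arange(350.0)
y = 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)
f_hat, coef, fitted = harmonic_regression(y, t)
```

With several candidate frequencies, the same peak-picking step tends to latch onto noise peaks, which is the overfitting issue the abstract refers to.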
Title: D- and C-vine quantile regression for large data sets. Abstract: Quantile regression is a field of growing importance in statistical modeling. It has a broad range of applications and has emerged as a complement to linear regression in many fields. Ever since quantile regression was formally defined in 1978, there have been many attempts to remedy its shortfalls; the occurrence of quantile crossings and the linearity assumption are just two disadvantages of linear quantile regression. One idea for overcoming such shortfalls is vine-copula-based quantile regression: vine copulas allow highly flexible modeling of high-dimensional dependence structures. The first work in this field, by Kraus and Czado (2017), introduced D-vine quantile regression for the subclass of D-vines; there, the D-vine copula is built by maximizing the improvement of the conditional log-likelihood in the next tree. Our first goal is to extend this method to the subclass of C-vines, so that more flexible dependence structures can be modeled. The next step is to introduce new algorithms for both D-vine and C-vine copulas that look at the next two trees for the maximal improvement of the conditional log-likelihood. A further goal is to make these algorithms applicable to large data sets; to this end, we modify our two-step-ahead algorithm to reduce its computational complexity. Finally, we examine the performance of the algorithms in an extensive simulation study, comparing the proposed algorithms on several performance measures, including the out-of-sample mean squared error, the conditional log-likelihood, and computation time.
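The greedy forward-selection idea behind building the D-vine (add, at each step, the covariate with the largest one-step-ahead improvement) can be caricatured without any copula machinery. The sketch below is a deliberately crude stand-in: it uses the residual sum of squares of a linear fit instead of the conditional copula log-likelihood of Kraus and Czado (2017), and all names and the simulated data are hypothetical.

```python
import numpy as np

def greedy_order(y, X):
    """Greedily order covariates by one-step-ahead improvement of an
    in-sample criterion (here: residual sum of squares of a linear fit,
    a crude stand-in for the conditional log-likelihood gain used in
    D-vine quantile regression)."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))

    def rss(cols):
        # Least-squares fit of y on an intercept plus the given columns
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    while remaining:
        # Pick the covariate whose addition reduces the criterion the most
        best = min(remaining, key=lambda j: rss(chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Simulated example: the third covariate is the strongest, then the first
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(size=500)
order = greedy_order(y, X)
```

The two-step-ahead variants in the abstract generalize this by scoring pairs of additions at once, which is more expensive; hence the modifications for large data sets.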