Time: June 6, 2017, 1:00-4:00 p.m.
Location: Room 105, Statistics Building
Talk 1:
Point Estimation of Many Normal Means
Chuanhai Liu
Department of Statistics, Purdue University
Abstract: Point estimation of many normal means is one of the most fundamental problems in modern statistics. Although the past 50 years have produced an impressive theoretical literature on this problem and, more generally, on penalty-based inference, it deserves another, deeper look in the era of big data. First, we argue that the method of maximum likelihood mistakenly ignores certain structural information in the data, which has led to the use of penalized likelihood in the hope of obtaining more sensible results. Second, we propose a method based on a simple but powerful idea: the observed data and data simulated from the estimated model must be stochastically indistinguishable. Third, we show numerical results obtained with a simple stochastic approximation algorithm. These results demonstrate the superior performance of the proposed method, especially when packaged in a Bayesian way with the empirical distribution of the naive estimates as the working prior. Finally, we conclude with a few remarks on the implications and further development of the new idea and the proposed method.
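(For illustration only, not from the talk: a minimal Python sketch of the stated idea that data simulated from the fitted model should be stochastically indistinguishable from the observed data, in the many-normal-means setting X_i ~ N(theta_i, 1). The order-statistic matching rule and the Robbins-Monro step size are assumptions made for this sketch, not the speaker's algorithm.)

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: one observation per mean, X_i ~ N(theta_i, 1).
    n = 500
    theta_true = rng.normal(0.0, 2.0, size=n)
    x_obs = theta_true + rng.normal(size=n)

    # Stochastic-approximation sketch: nudge the sorted estimate so that data
    # simulated from N(theta_hat, 1) match the observed order statistics.
    theta_hat = np.sort(x_obs)                 # start from the naive estimate
    x_sorted = np.sort(x_obs)
    for t in range(1, 2001):
        x_sim = np.sort(theta_hat + rng.normal(size=n))  # simulate from fitted model
        step = 1.0 / t**0.7                              # Robbins-Monro step size
        theta_hat += step * (x_sorted - x_sim)           # shrink the mismatch

    # theta_hat estimates the sorted mean vector; map back via the ranks of x_obs.
    theta_est = theta_hat[np.argsort(np.argsort(x_obs))]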
Biographical sketch: Chuanhai Liu obtained his Ph.D. from Harvard and worked at Bell Labs for ten years. He is Professor of Statistics at Purdue. His research interests include optimization methods such as Quasi-Newton, EM, and MCMC algorithms; data analysis, including automated model building; foundations of statistical inference, for which he and his collaborators developed the theory of inferential models; and statistical computing systems for big data analysis.
Talk 2:
Sparse Tensor Decomposition
for Personalized Advertising and Ads Clustering
Wei Sun
Department of Management Science, University of Miami
Abstract: The tensor, as a multi-dimensional generalization of the matrix, has received increasing attention in industry due to its success in personalized recommendation systems. Traditional recommendation systems are mainly based on the user-item matrix, whose entries denote each user's preference for a particular item. To incorporate additional information into the analysis, such as the temporal behavior of users, we encounter a user-item-time tensor. Existing tensor decomposition methods are mostly established in the non-sparse regime, where the decomposition components include all features. In online advertising, however, the ad-click tensor is usually sparse because ad clicks are rare. Hence, many latent features carry essentially no information about the tensor structure, and there is a great need for a method that can simultaneously perform tensor decomposition and variable selection.
In this talk, I will discuss a new sparse tensor decomposition method that incorporates the sparsity of each latent component into the CP tensor decomposition. Specifically, sparsity is achieved via an efficient truncation procedure that directly solves an L0 sparsity constraint. In theory, despite the non-convexity of the optimization problem, an alternating updating algorithm is proven to attain an estimator whose rate of convergence significantly improves on the rates achieved by non-sparse decomposition methods. The potential business impact of the method is demonstrated via two tasks: click-through rate prediction and cluster analysis of ads. Our results provide new insights into ad clicks and the ad industry.
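(For illustration only, not the speaker's algorithm: a minimal Python sketch of alternating updates with hard L0 truncation for a sparse CP factorization, restricted to rank 1 for simplicity. The function names, random initialization, and rank-1 restriction are assumptions of this sketch.)

    import numpy as np

    def truncate(v, s):
        # Keep the s largest-magnitude entries (hard L0 truncation); zero the rest.
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-s:]
        out[idx] = v[idx]
        return out

    def sparse_rank1_cp(T, s, iters=50):
        # Alternating updates for a rank-1 sparse CP fit: T ~ w * (a outer b outer c).
        d1, d2, d3 = T.shape
        rng = np.random.default_rng(0)
        a, b, c = rng.normal(size=d1), rng.normal(size=d2), rng.normal(size=d3)
        a, b, c = (v / np.linalg.norm(v) for v in (a, b, c))
        for _ in range(iters):
            a = truncate(np.einsum('ijk,j,k->i', T, b, c), s); a /= np.linalg.norm(a)
            b = truncate(np.einsum('ijk,i,k->j', T, a, c), s); b /= np.linalg.norm(b)
            c = truncate(np.einsum('ijk,i,j->k', T, a, b), s); c /= np.linalg.norm(c)
        w = np.einsum('ijk,i,j,k->', T, a, b, c)   # scale of the recovered component
        return w, a, b, c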
Biographical sketch: Will Wei Sun is currently an assistant professor in the Department of Management Science, University of Miami, Florida. Before that, he was a research scientist on the advertising science team at Yahoo Labs. He obtained his Ph.D. in Statistics from Purdue University in 2015. Dr. Sun's research focuses on machine learning with applications in computational advertising, personalized recommendation systems, and neuroimaging analysis.
Talk 3:
Automated Model Building and Deep Learning
Xiao Wang
Department of Statistics, Purdue University
Abstract: The analysis of big data demands computer-aided or even automated model building, as it is extremely difficult to analyze such data with traditional statistical models and model-building methods. Deep learning has proved successful for a variety of challenging problems, such as AlphaGo, driverless cars, and image classification. However, the understanding of deep learning remains limited, which makes it difficult for the field to develop fully. In this talk, we focus on neural network models with a single hidden layer and provide an understanding of deep learning from an automated-modeling perspective. This understanding leads to a sequential method of constructing deep learning models that is also adaptive to the unknown underlying model structure. This is joint work with Chuanhai Liu.
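(For illustration only, not the speakers' method: a toy Python sketch of sequentially constructing a single-hidden-layer ReLU network by adding one hidden unit at a time, each fitted to the current residuals. The random-search fitting rule and the toy data are assumptions of this sketch.)

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy regression data for a single-hidden-layer ReLU network.
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)

    relu = lambda z: np.maximum(z, 0.0)

    # Greedy sequential construction: each new unit is fitted to the current
    # residuals by a crude random search over its weight w and bias b.
    units, fit = [], np.zeros_like(y)
    for _ in range(10):
        residual, best = y - fit, None
        for _ in range(200):
            w, b = rng.normal(size=1), rng.normal()
            h = relu(X @ w + b)
            denom = h @ h
            if denom < 1e-12:
                continue
            beta = (h @ residual) / denom      # least-squares output weight
            err = np.sum((residual - beta * h) ** 2)
            if best is None or err < best[0]:
                best = (err, w, b, beta)
        _, w, b, beta = best
        units.append((w, b, beta))
        fit += beta * relu(X @ w + b)          # model grows one unit per pass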
Biographical sketch: Xiao Wang is Professor of Statistics at Purdue University. Dr. Wang's research focuses on nonparametric statistics, machine learning, and artificial intelligence; his application areas include medical image analysis and healthcare engineering. He received his Ph.D. from the University of Michigan. He has served as Associate Editor for several top statistics journals, including the Journal of the American Statistical Association, Technometrics, Lifetime Data Analysis, and the Electronic Journal of Statistics.