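Time: Thursday, October 22, 2020, 14:00-15:00
Venue: Tencent Meeting, ID: 691 447 102
Speaker: Jiang Xuejun (蒋学军), Associate Professor, Southern University of Science and Technology
Title: Variable selection in distributed sparse regression under memory constraints
Abstract: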
This paper studies variable selection using the penalized likelihood method for distributed sparse regression with large sample size n under a limited memory constraint, where the memory of one machine can store only a subset of the data. This is a much-needed research problem in the big data era. A naive divide-and-conquer method for solving this problem is to split the whole data set into N parts, run each part on one of N machines, aggregate the results from all machines via averaging, and finally obtain the selected variables. However, it tends to select more noise variables, and the false discovery rate may not be well controlled. We improve it with a specially designed weighted average in the aggregation step. Theoretically, we establish asymptotic properties of the resulting estimators for the likelihood model with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. Furthermore, the proposed method is evaluated by simulations and a real example.
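For readers unfamiliar with the setting, the naive divide-and-conquer baseline mentioned in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's method: it assumes a Gaussian linear model with an L1 (lasso) penalty fitted by a simple proximal gradient loop, uses a plain unweighted average rather than the specially designed weighted average of the talk, and all function names, the penalty level and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def naive_dc_select(X, y, n_machines, lam, tol=1e-8):
    """Naive divide-and-conquer: fit each block separately, then average.

    Assumption: plain (unweighted) averaging, i.e. the baseline the abstract
    criticizes, not the weighted aggregation proposed in the talk.
    """
    parts = np.array_split(np.arange(X.shape[0]), n_machines)
    local = np.array([lasso_ista(X[idx], y[idx], lam) for idx in parts])
    beta_avg = local.mean(axis=0)
    # A variable is kept whenever its averaged coefficient is nonzero,
    # so a single machine picking up a noise variable is enough to select it.
    selected = np.flatnonzero(np.abs(beta_avg) > tol)
    return beta_avg, selected

# Hypothetical simulated data: 3 true signals among 20 covariates.
rng = np.random.default_rng(0)
n, p = 2000, 20
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

beta_avg, selected = naive_dc_select(X, y, n_machines=10, lam=0.1)
print("selected variables:", selected)
```

Because averaging keeps every variable that any single machine selects, a sketch like this tends to retain extra noise variables, which is the weakness of the naive approach that the abstract points out.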
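Biography:
Jiang Xuejun (蒋学军) is a tenured Associate Professor and doctoral supervisor in the Department of Statistics and Data Science at Southern University of Science and Technology, and Secretary of the Joint General Party Branch of the Departments of Mathematics and Statistics. He received his Ph.D. from The Chinese University of Hong Kong in 2009, carried out postdoctoral research there in 09-10 (2010/09-2010/09), and joined Southern University of Science and Technology in July 2013. He has received the Southern University of Science and Technology Outstanding Teaching Award and the title of Shenzhen Outstanding Teacher. He has led projects funded by the National Natural Science Foundation of China (Young Scientists and General programs), the Guangdong Natural Science Foundation General Program (2 projects), basic research projects of the Shenzhen Science and Technology Innovation Commission, a Shenzhen commissioned technology development project, and a Guangdong teaching reform project. His academic service includes serving as a council member of several branches of the Chinese Association for Applied Statistics (Resources and Environment, High-Dimensional Statistics, Survival Analysis, Industrial Statistics) and as a council member of the National Association for Industrial Statistics Teaching and Research. His main research interests include financial statistics and econometrics, quantile regression, variable selection, high-dimensional statistical inference, and Bayesian applications. He has published more than 40 SCI- and SSCI-indexed papers in mainstream statistics journals and in interdisciplinary journals in finance, economics and related fields, and has published one English-language textbook.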