

Academic Seminar (1)

Title: Monitoring the data quality of data streams using a two-step control scheme

Time: Wednesday, October 17, 2018, 14:30-15:30

Venue: Room 135, Fashang South Building (法商南楼), Minhang Campus

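Speaker: Prof. Wu Chunjie (吴纯杰), Associate Dean, School of Statistics and Management, Shanghai University of Finance and Economics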

Abstract: Data-rich environments provide unprecedented opportunities for monitoring data quality. This paper focuses on the quality of data streams. We use indicator variables to measure the six dimensions of data quality and a glitch index to indicate how poor the quality is. A two-step control scheme is proposed that accounts for two relationships: inter- and intra-correlation. In the first step, the Mahalanobis distance (MD) is applied to the  -type control chart to monitor the quality of a single data stream. In the second step, a Shewhart control chart is built on a weighted-sum statistic, which measures the quality of the whole process. The feasibility and effectiveness of the control scheme are illustrated through detailed simulation studies and one landslide example. The simulation results, covering three cases (no, low, and high correlation), show that the proposed approach detects mean shifts in multiattribute data sensitively and robustly. The example, in which sensors collect acceleration data in Taiwan, demonstrates the superiority of our design over four traditional control charts: it produces the type-I error closest to the given level and the highest power under the same type-I error. This talk is based on joint work with Drs. Miaomiao Yu and Fugee Tsung.
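The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional restriction, the chi-square control limit, the 3-sigma Shewhart limits, and every function name below are assumptions made for illustration, and the glitch index and weights of the paper are not reproduced.

```python
import math
import statistics

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-dimensional point (explicit 2x2 inverse)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mean)^T cov^{-1} (x - mean), with cov^{-1} = [[d, -b], [-c, a]] / det
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def chi2_limit_2d(alpha):
    """Assumed step-1 control limit for a squared MD in 2 dimensions.

    With known in-control parameters and normal data, the statistic is
    chi-square with 2 degrees of freedom, whose upper-alpha quantile
    is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

def weighted_sum(values, weights):
    """Step-2 statistic: weighted sum of per-stream quality statistics."""
    return sum(w * v for w, v in zip(weights, values))

def shewhart_limits(baseline, k=3.0):
    """Lower and upper Shewhart limits (mean +/- k * stdev) from in-control data."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - k * spread, center + k * spread

def out_of_control(value, limits):
    """True when a statistic falls outside the given (lower, upper) limits."""
    lower, upper = limits
    return value < lower or value > upper
```

In use, step 1 would signal on a stream whenever `mahalanobis_sq(...)` exceeds `chi2_limit_2d(alpha)`, and step 2 would signal on the whole process whenever `weighted_sum(...)` falls outside `shewhart_limits(...)` estimated from an in-control baseline.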


Academic Seminar (2)

Title: Surprise sampling: an optimal subsampling design

Time: Wednesday, October 17, 2018, 15:30-16:30

Venue: Room 135, Fashang South Building (法商南楼), Minhang Campus

Speaker: Assoc. Prof. Yu Wen (郁文), School of Management, Fudan University

Abstract: Sampling for surprise is a working principle of efficient sampling, used to save computational workload among other purposes. A sample is deemed surprising if it has a large pilot prediction error or a large absolute score; it is then sampled with a larger probability, since it generally contains more information than non-surprising samples. Such sampling schemes are particularly useful when dealing with imbalanced data. Following this working principle, we propose a sample design called surprise sampling. It caters to the specific forms of a variety of objectives. The estimation procedure is valid even if the model is misspecified and/or the pilot estimator is inconsistent. The proposed surprise sampling includes as a special case the local case-control sampling of Fithian and Hastie (2014), which achieves high efficiency through a clever adjustment that pertains only to the logistic model. The proposed estimator also performs no worse than that of Fithian and Hastie (2014) under the same model specification. We present theoretical justifications of the claimed advantages and optimality of the estimation and the sampling design. Numerical studies are carried out, and the evidence in support of the theory is shown.
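The special case mentioned in the abstract, local case-control sampling, gives a concrete picture of sampling for surprise: a point is kept with probability equal to its pilot prediction error |y - p(x)|. The sketch below covers only that acceptance step, not the proposed surprise sampling design itself; the one-feature logistic pilot model and all function and parameter names are hypothetical.

```python
import math
import random

def pilot_probability(x, intercept, slope):
    """Pilot estimate of P(y = 1 | x) from a one-feature logistic model (assumed)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def acceptance_probability(y, p_pilot):
    """Keep a point with probability equal to its pilot prediction error |y - p|."""
    return abs(y - p_pilot)

def subsample(data, intercept, slope, rng):
    """Return the (x, y) pairs accepted under the pilot-error acceptance rule."""
    kept = []
    for x, y in data:
        p = pilot_probability(x, intercept, slope)
        # Points the pilot predicts badly are "surprising" and kept more often.
        if rng.random() < acceptance_probability(y, p):
            kept.append((x, y))
    return kept
```

A call such as `subsample(data, 0.0, 1.0, random.Random(0))` keeps mostly the points the pilot misclassifies. In Fithian and Hastie (2014), a logistic model is then fitted to the subsample and the pilot coefficients are added back to correct for the selection; that fitting step is omitted here.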

