数据科学前沿国际研讨会

International Seminar on the Frontiers of Data Science

2018.09.23-09.24

Opening Ceremony

Time: Sep 23, 2018 (Sunday) 13:00-14:00

Venue: Room 109, HongYuan Building

| Time | Speaker | Event |
| --- | --- | --- |
| 13:00-13:30 | | Registration for the Workshop |
| 13:30-13:40 | GUO Jianjun (郭建军) | Opening Speech |
| 13:40-13:50 | Ostap Okhrin | Speech |
| 13:50-14:00 | | Group Photo |

Parallel Sessions

Session 1

Time: Sep 23, 2018 (Sunday) 14:00-15:30

Venue: Room 109, HongYuan Building

Session Chair: Ostap Okhrin





| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 14:00-14:30 | XIAO Feng (肖峰) | Day-to-day Flow Dynamics for Stochastic User Equilibrium and A General Lyapunov Function | School of Business Administration, SWUFE |
| 14:30-15:00 | Ostap Okhrin | Flexible HAR Model for Realized Volatility | Institute of Transport and Economics, TUD |
| 15:00-15:30 | XIAO Hui (肖辉) | Ranking and Selection with Input Uncertainty | School of Statistics, SWUFE |
| 15:30-15:50 | | Coffee Break | |

Session 2

Time: Sep 23, 2018 (Sunday) 15:50-17:20

Venue: Room 109, HongYuan Building

Session Chair: CHEN Xuerong (陈雪蓉)


| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 15:50-16:20 | Georg Hirte | International Trade, Geographic Heterogeneity and Interregional Inequality | Institute of Transport and Economics, TUD |
| 16:20-16:50 | CHEN Xuerong (陈雪蓉) | Integrated Powered Density: Screening Ultrahigh Dimensional Covariates with Survival Outcomes | School of Statistics, SWUFE |
| 16:50-17:20 | Regine Gerike | Travel Behavior in Urban Areas: Data, Methods, Findings | Institute of Transport Planning and Road Traffic, TUD |
| 17:30-19:30 | | Welcome Dinner | |




Session 3

Time: Sep 24, 2018 (Monday) 9:30-10:30

Venue: Room 109, HongYuan Building

Session Chair: GUO Mengmeng (郭萌萌)


| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 9:30-10:00 | Bernhard Schipp | Time Dependent Return Distributions, Nonlinear Fokker-Planck Dynamics and the Tsallis Entropy | Institute of Business and Economics, TUD |
| 10:00-10:30 | GUO Mengmeng (郭萌萌) | Does Air Pollution Affect Stock Returns? Evidence from China | Institute of Economics and Management, SWUFE |
| 10:30-10:50 | | Coffee Break | |




Session 4

Time: Sep 24, 2018 (Monday) 10:50-12:20

Venue: Room 109, HongYuan Building

Session Chair: SUN Xiuli (孙秀丽)


| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 10:50-11:20 | SUN Xiuli (孙秀丽) | Firm-level Human Capital and Innovation: Evidence from China | School of Statistics, SWUFE |
| 11:20-11:50 | ZHANG Jia (张佳) | High Dimensional Elliptical Sliced Inverse Regression in non-Gaussian Distributions | School of Statistics, SWUFE |
| 11:50-12:20 | Stefanie Lösch | Measuring Regional Environmental Awareness by Using Internet Query Data | Institute of Transport and Economics, TUD |
| 12:20-14:00 | | Lunch Break: Liulin Restaurant | |

Session 5

Time: Sep 24, 2018 (Monday) 14:00-15:30

Venue: Room 109, HongYuan Building

Session Chair: YANG Dong (杨冬)


| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 14:00-14:30 | Mr. Stephan Hocke | Optimize the Optimization – Parameter Tuning of a Stochastically Metaheuristic | Institute of Transport and Economics, TUD |
| 14:30-15:00 | Mr. YANG Dong (杨冬) | A Misspecification Test for the Higher Order Comoments of the Factor Model | School of Statistics, SWUFE |
| 15:00-15:30 | Mr. Manuel Schmid | Estimating Higher Moments with High Frequency Returns | Institute of Transport and Economics, TUD |
| 15:30-15:50 | | Coffee Break | |

Session 6

Time: Sep 24, 2018 (Monday) 15:50-16:50

Venue: Room 109, HongYuan Building

Session Chair: Sophie Häse


| Time | Speaker | Title | School / Institution |
| --- | --- | --- | --- |
| 15:50-16:20 | Sophie Häse | The Impact of Unexpected and Recurring Flooding Events on House Prices | Institute of Transport and Economics, TUD |
| 16:20-16:50 | WANG Minke (王旻轲) | Modelling and Solving the Location Inventory Problem with Stochastic Demand Considering Carbon Cap-and-Trade | School of Statistics, SWUFE |
| 17:30-20:30 | | Dinner in the City Center | |

Title & Abstract



TUD side:


# Prof. Dr. Ostap Okhrin

Title: Flexible HAR Model for Realized Volatility

Co-Authors: Francesco Audrino and Chen Huang (University of St. Gallen)

Abstract: The Heterogeneous Autoregressive (HAR) model is commonly used in modeling the dynamics of realized volatility. In this paper, we propose a flexible HAR(1,...,p) specification, employing the adaptive LASSO and its statistical inference theory to see whether the lag structure (1, 5, 22) implied from an economic point of view can be recovered by statistical methods. The model differs from Audrino and Knaus (2016), where the authors apply LASSO to the AR(p) model, which does not necessarily lead to a HAR model. Adaptive LASSO estimation and the subsequent hypothesis testing results fail to show strong evidence that such a fixed lag structure can be recovered by a flexible model. We also apply the group LASSO and related tests to check the validity of the classic HAR, which is rejected in most cases. The results justify our intention to use a flexible lag structure while still keeping the HAR frame. In terms of out-of-sample forecasting, the proposed flexible specification works comparably to the benchmark HAR(1, 5, 22). Moreover, the time-varying model combinations show that when the market environment is not stable, the fixed lag structure (1, 5, 22) is not particularly accurate or effective.
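For readers unfamiliar with the benchmark mentioned in the abstract, the following is a minimal sketch of the classic HAR(1, 5, 22) regression, in which next-day realized volatility is regressed on trailing daily, weekly (5-day) and monthly (22-day) averages. It is fitted here by ordinary least squares; it is not the flexible adaptive-LASSO specification proposed in the talk. The function names `har_design` and `fit_har` are hypothetical and chosen only for illustration.

```python
import numpy as np

def har_design(rv, lags=(1, 5, 22)):
    """Build the HAR design matrix from a realized-volatility series.

    Each row holds an intercept plus the trailing mean of `rv` over every
    horizon in `lags`, all ending at day t; the target is rv[t + 1].
    """
    rv = np.asarray(rv, dtype=float)
    max_lag = max(lags)
    # Day t needs max_lag past observations and one future observation.
    days = range(max_lag - 1, len(rv) - 1)
    X = np.array(
        [[1.0] + [rv[t - h + 1 : t + 1].mean() for h in lags] for t in days]
    )
    y = rv[max_lag:]
    return X, y

def fit_har(rv, lags=(1, 5, 22)):
    """Estimate HAR coefficients (intercept first) by least squares."""
    X, y = har_design(rv, lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Passing a different `lags` tuple changes the horizons, which is the kind of lag structure the abstract asks statistical methods to recover rather than fix in advance.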



# Prof. Dr. Georg Hirte

Title: International Trade, Geographic Heterogeneity and Interregional Inequality

Abstract: We study the effect of international trade on interregional inequality from 1992 to 2012 within almost all countries of the world using satellite night-light based inequality proxies. For our analysis, we develop novel indicators for within-country trade cost heterogeneities that are based on exogenous geographical features. In order to deal with potential endogeneity issues, we utilize the occurrence of large natural disasters striking trade partners as an instrument to generate exogenous variation in trade flows. In contrast to previous results, our IV estimates reveal that international trade aggravates economic disparities only in those countries that have higher within-country heterogeneity in their access to the world market and their within-country trade costs.



# Prof. Dr. Regine Gerike

Title: Travel Behavior in Urban Areas: Data, Methods, Findings

Co-Author: Rico Witter

Abstract: Cross-sectional household travel surveys (HTS) are the main data source for analyzing travel behavior. HTS are traditionally based on mixed-mode data collection, but innovative methods such as smartphone-based GPS tracking are increasingly applied. This talk first gives an overview of available data sources in Germany and in the international context, including examples of HTS datasets and the methods used for data collection. In the second part, insights on travel behavior with a focus on car use and the peak-car phenomenon are presented, using the example of a historical HTS analysis for the five European capital cities Berlin, Copenhagen, London, Paris and Vienna. The peak-car phenomenon and its drivers are described based on descriptive statistics and Age-Period-Cohort Analysis.



# Prof. Dr. Bernhard Schipp

Title: Time Dependent Return Distributions, Nonlinear Fokker-Planck Dynamics and the Tsallis Entropy

Co-Author: Sabine Hegewald

Abstract: Econometric analysis of high frequency stock market data that is typically based on one or more Brownian motions lacks the ability to control for abnormal, i.e. time dependent, changes in the noise distribution. In this paper, an approach originally pursued by Barndorff-Nielsen and Shephard (2000) is extended to nonlinear Fokker-Planck equations, of which the Tsallis distribution with time-varying parameters may be regarded as a particular solution. The resulting Tsallis density is able to model even distributions with extreme leptokurtosis in an adequate way. Additionally, relations between the Tsallis distribution and GARCH(1,1)- and GJR(1,1)-models are discussed.






# Mrs. Sophie Häse

Title: The Impact of Unexpected and Recurring Flooding Events on House Prices

Abstract: We study the causal impact of an unexpected major flood event and a sequence of river floods on house prices. Previous literature mainly investigated the impact of single events (redefining floodplains, hurricanes, inundation) in the USA, finding a negative but temporary impact on housing prices within floodplains (Bin and Landry, 2012; Atreya et al., 2013; Daniel et al., 2009). More recent studies report heterogeneous effects (Zhang, 2016). There is also scant literature on the causal effects of river floods on house prices in actually inundated land parcels (Atreya and Ferreira, 2015).

We investigate the causal effects of inundation on house prices, the time pattern and heterogeneity across house types, and ask whether recurrence matters. Our study area is Dresden, a German city with 540,000 inhabitants which is spread along the Elbe river. Dresden constitutes a specific case due to the unexpected major flood event in 2002 (classified as HQ500 before 2002; afterwards HQ100) and on account of subsequent events: an HQ20 flood in 2006 and an HQ50 event in 2013. Further, the housing market in Germany is not comparable to those of most countries because more than 50% of all flats are rented. We thus also study the impact on prices of houses with rented flats.

We use a unique data set of all transactions on the Dresden housing market from 2000 to 2017, including houses with rented flats. The inclusion of comprehensive geodata allows us to consider location, elevation, urban amenities and local public goods, and to differentiate between opposing effects such as the proximity to water and the risk of flooding.



# Mrs. Stefanie Lösch

Title: Measuring Regional Environmental Awareness by Using Internet Query Data

Co-Authors: Ostap Okhrin, Hans Wiesmeth

Abstract: Global climate change will affect the Russian Federation in particular: with regions in permafrost areas, large forested areas, and an agriculture adjusted to the current climatic conditions, Russia will be confronted with the consequences of climate change on a large scale. Are citizens sufficiently aware of these challenges to provoke the necessary support from the public administration?

In this paper, we estimate awareness indices for 81 regions and 28 months, ranging from January 2014 to April 2016, by using a Multiple-Indicator-Multiple-Causes (MIMIC) model. Dependent indicators are derived from the number of certain queries in the search engine Yandex, whereas exogenous causes of environmental awareness are assumed to be characteristics of the Russian regions. The estimated awareness time series reveal seasonal effects, especially a high interest in environmental topics in the winter months, as well as a negative correlation with the regional temperature. The estimated awareness index is larger for the regions in the cold north than in the warmer south of the country. Geographical groups with similar awareness structures are found by using the k-means algorithm. Moreover, a positive dependence between the level of awareness and regional GRP per capita can be shown.



# Mr. Stephan Hocke

Title: Optimize the Optimization – Parameter Tuning of a Stochastically Metaheuristic

Abstract: Optimization problems arise in various contexts, and a provably optimal solution cannot always be determined within a justifiable amount of computation time or resources. Especially for discrete problems, heuristic procedures are required. The sequence of events and the solution quality of metaheuristics and artificial intelligence methods are not predetermined and mostly depend on randomness. Hence, different seeds of the random number generator result in different solutions for the same problem. Consequently, finding the best parameter setting of a metaheuristic (e.g. genetic algorithm: recombination type, mutation probability, mutation type, etc.; simulated annealing: temperature function, iterations; tabu search: acceptance probability, size of tabu list, etc.) is a non-trivial task and represents an optimization problem in itself. Since the literature focuses on proofs of concept, and the necessary computational effort is prohibitively expensive relative to the desired publication, finding the right parameter setting plays a subordinate role. Consequently, the majority of the parameter settings provided are arbitrary or unfounded. This paper presents an exemplary parameter tuning of a Particle Swarm Optimization (PSO) developed for the Vehicle Routing Problem with Temporal Synchronization Constraints. In the process, we evaluate whether there is any generalizable "best" parameter setting, or whether this depends on the problem size and/or structure.


# Mr. Manuel Schmid

Title: Estimating Higher Moments with High Frequency Returns

Co-Authors: Ostap Okhrin, Michael Rockinger

Abstract: In standard return modelling approaches, returns are often assumed to follow a normal distribution. This assumption implies zero skewness as well as zero excess kurtosis. Neither of these implications corresponds to empirical observation, which eventually leads to problems, e.g. in financial risk management. On the other hand, the typical non-parametric estimation of these values requires a huge amount of data to be reliable. For this reason, it is advisable to exploit the availability of high frequency data and construct estimators in the fashion of the well-known realized variance. In this paper, an estimation approach presented by Neuberger and Payne (2018) is extended to non-martingale price processes. On the basis of Monte Carlo simulations, we show that our estimators are unbiased and consistent when the underlying price process can be modelled as a stochastic volatility jump diffusion process. Distribution properties of the estimators are discussed.
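As background to the quantities named in the abstract, the sketch below computes the realized variance of one period as the sum of squared intraday returns, together with the plain sample skewness and excess kurtosis that would both be zero under normality. This illustrates only these textbook definitions; it is not the Neuberger and Payne (2018) estimator or the extension presented in the talk, and the function names are hypothetical.

```python
import numpy as np

def realized_variance(intraday_returns):
    """Realized variance: sum of squared high-frequency returns in one period."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))

def sample_skew_excess_kurt(x):
    """Sample skewness and excess kurtosis from standardized observations.

    Both values are zero in population for a normal distribution.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # population (ddof=0) standardization
    skewness = float(np.mean(z ** 3))
    excess_kurtosis = float(np.mean(z ** 4) - 3.0)
    return skewness, excess_kurtosis
```

With only a few hundred low-frequency observations, the third and fourth sample moments are very noisy, which is the motivation the abstract gives for building estimators from high frequency data instead.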








SWUFE side:


# Prof. Dr. XIAO Hui

Title: Ranking and Selection with Input Uncertainty

Abstract: In this research, we consider the ranking and selection (R&S) problem with input uncertainty. It seeks to maximize the probability of correct selection (PCS) for the best design under a fixed simulation budget, where each design is measured by its worst-case performance. To reduce the complexity of the PCS, we develop an approximate probability measure for it and derive an asymptotically optimal solution of the resulting problem. An efficient selection procedure is then designed within the optimal computing budget allocation (OCBA) framework. More importantly, we provide some useful insights on characterizing an efficient robust selection rule and how it can be achieved by adjusting the simulation budgets allocated to each scenario.



# Prof. Dr. GUO Mengmeng

Title: Does Air Pollution Affect Stock Returns? Evidence from China

Abstract: Building on research linking environmental factors to investors’ sentiments and the local bias literature, we posit that there is a negative relation between air pollution and a firm’s stock return. Consistent with our hypothesis, we find that firms located in cities that experience higher levels of air pollution exhibit lower stock returns and lower trading volumes. In line with our central hypothesis, we observe that the effect of air pollution on stock returns is stronger among firms that are more likely to be held by local investors. Our results hold across alternative measures of air pollution and are not sensitive to the location of the firm, the city size, the pollution level, and the air pollution standards. Moreover, the results remain robust after addressing the endogeneity issue by controlling for firm fundamental factors. This study is the first to establish an association between air pollution and local stock returns.



# Prof. Dr. SUN Xiuli

Title: Firm-level Human Capital and Innovation: Evidence from China

Abstract: This paper explores the role of human capital in firms’ innovation. Based on a World Bank survey of manufacturing firms in China, we use two firm-level datasets: one from the larger metropolitan cities, and one from smaller and mid-sized cities. Patents are used as an indicator of innovation. The human capital indicators we use include the number of highly educated workers, the general manager’s education and tenure, and the management team’s education and age. We use the Negative Binomial and Instrumental Variables estimators to estimate patent production function models that are augmented by our human capital variables. We also use the zero-inflated Poisson model to examine the likelihood of innovation. We find that the human capital indicators play an important role in influencing patenting, and that some of the human capital variables appear to have a greater impact on patenting in the smaller and mid-sized cities.

Our human capital estimates are obtained after controlling for firms’ R&D, size, market share, age, and foreign ownership, as well as fixed effects to control for industry-specific characteristics, and firms’ location and geography. We comment on how our findings play into China’s policies related to innovation and human capital formation.



# Prof. Dr. XIAO Feng

Title: Day-to-day Flow Dynamics for Stochastic User Equilibrium and A General Lyapunov Function

Abstract: This study establishes a general framework for continuous day-to-day models to capture the perceptual errors in travelers’ day-to-day route choice behavior. As the counterpart of the Beckmann transformation (Beckmann et al., 1956), which has been widely used as a candidate Lyapunov function to prove the stability of continuous day-to-day traffic evolution models that converge to deterministic user equilibrium (DUE), Fisk’s formulation (Fisk, 1980; Watling and Cantarella, 2013) is utilized in our study as a general Lyapunov function for the day-to-day models that converge to stochastic user equilibrium (SUE), as long as the path flow growth rates and the “potentials” of the paths satisfy the condition of negative correlation. A sufficient condition which guarantees the nonnegativity of the path flow is also provided. The logit dynamic (Sandholm, 2010), the logit-based Smith dynamic (Smith and Watling, 2016) and the logit-based BNN dynamic (Brown and Von Neumann, 1950) are given as three examples under this framework. Moreover, we extend the second-order day-to-day model in Xiao et al. (2016) to SUE. Some properties of the new model, such as fixed point and stability, are investigated. Interestingly, we find that even when the model converges to SUE, the path flows could still go negative during the oscillation under extreme situations. A numerical experiment is conducted to demonstrate the existence of negative path flow for the second-order model.



# Prof. Dr. CHEN Xuerong

Title: Integrated Powered Density: Screening Ultrahigh Dimensional Covariates with Survival Outcomes

Abstract: Modern biomedical studies have yielded abundant survival data with high-throughput predictors. Variable screening is a crucial first step in analyzing such data, for the purpose of identifying predictive biomarkers, understanding biological mechanisms and making accurate predictions. To nonparametrically quantify the relevance of each candidate variable to the survival outcome, we propose integrated powered density (IPOD), which compares the differences in the covariate-stratified distribution functions. The proposed new class of statistics, with a flexible weighting scheme, is general and includes the Kolmogorov statistic as a special case. Moreover, the method does not rely on rigid regression model assumptions and can be easily implemented. We show that our method possesses sure screening properties, and confirm the utility of the proposal with extensive simulation studies. We apply the method to analyze a multiple myeloma study on detecting gene signatures for cancer patients' survival.



# Mr. YANG Dong

Title: A Misspecification Test for the Higher Order Comoments of the Factor Model

Abstract: The traditional estimation of higher order co-moments of non-normal random variables by the sample analog of the expectation faces a curse of dimensionality, as the number of parameters increases steeply when the dimension increases. Imposing a factor structure on the process solves this problem; however, it leads to the challenging task of selecting an appropriate factor model. This paper contributes by proposing a test that exploits the following feature: when the factor model is correctly specified, the higher order co-moments of the unexplained return variation are sparse. It recommends a general-to-specific approach for selecting the factor model by choosing the most parsimonious specification for which the sparsity assumption is satisfied. This approach uses a Wald or Gumbel test statistic for testing the joint statistical significance of the co-moments that are zero when the factor model is correctly specified. The asymptotic distribution of the test is derived. An extensive simulation study confirms the good finite sample properties of the approach. This paper illustrates the practical usefulness of factor selection on daily returns of random subsets of S&P 100 constituents.



# Mr. WANG Minke

Title: Modelling and Solving the Location Inventory Problem with Stochastic Demand Considering Carbon Cap-and-Trade

Abstract: We address a multi-period facility location-inventory problem with the consideration of carbon emissions in a multi-echelon supply chain network consisting of plants, potential DCs, and retailers. Given the hierarchical structure of the problem, a two-stage stochastic mathematical model is presented to integrate the inventory planning decisions, made under a (t, s, S) inventory policy, with the location-allocation decisions to deal with nonstationary demand. A linear approximation technique and the sample average approximation method are used to increase the tractability of the stochastic program. Because the problem is NP-hard, a three-step hierarchical metaheuristic algorithm is proposed to solve the model. Numerical experiments are conducted to validate the modelling and the three-step algorithm. In addition, the impact of problem sizes, demand types and cost structures on the supply chain design solution and its cost breakdown is presented to give managerial insights.



# Mrs. ZHANG Jia

Title: High Dimensional Elliptical Sliced Inverse Regression in non-Gaussian Distributions

Abstract: Sliced inverse regression (SIR) is the most widely used sufficient dimension reduction method due to its simplicity, generality and computational efficiency. However, when the distribution of the covariates deviates from the multivariate normal distribution, the estimation efficiency of SIR is rather low. In this paper, we propose a robust alternative to SIR, called elliptical sliced inverse regression (ESIR), for analyzing high dimensional, elliptically distributed data. Elliptically distributed data have wide applications, especially in finance and economics, where the distribution of the data is often heavy-tailed. To tackle heavy-tailed elliptically distributed covariates, we make novel use of the multivariate Kendall’s tau matrix in the framework of a so-called generalized eigenvector problem for sufficient dimension reduction. Methodologically, we present a practical algorithm for our method. Theoretically, we investigate the asymptotic behavior of the ESIR estimator in the high dimensional setting. Extensive simulation results show that ESIR significantly improves the estimation efficiency in heavy-tailed scenarios. Analysis of two real data sets also demonstrates the effectiveness of our method. Moreover, ESIR can be easily extended to most other sufficient dimension reduction methods and applied to non-elliptical heavy-tailed distributions.
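To make the central ingredient of the abstract concrete, here is a minimal sketch of the sample multivariate Kendall's tau matrix in its usual form: the average, over all pairs of observations, of the outer product of the pairwise difference divided by its squared Euclidean norm. This covers only that matrix, not the ESIR algorithm itself, and the function name `multivariate_kendall_tau` is hypothetical.

```python
import numpy as np

def multivariate_kendall_tau(X):
    """Sample multivariate Kendall's tau matrix of an (n, d) data array.

    Averages (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over all pairs i < j.
    Pairs of identical rows are skipped to avoid division by zero; the data
    must contain at least two distinct rows.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    K = np.zeros((d, d))
    n_pairs = 0
    for i in range(n - 1):
        diff = X[i + 1:] - X[i]            # differences to all later rows
        sq_norm = np.sum(diff ** 2, axis=1)
        keep = sq_norm > 0
        diff, sq_norm = diff[keep], sq_norm[keep]
        K += (diff / sq_norm[:, None]).T @ diff
        n_pairs += diff.shape[0]
    return K / n_pairs
```

Because every pairwise term is normalized to unit length, the matrix depends only on the directions of the differences and always has trace one, which is why it stays stable when the covariates are heavy-tailed.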