By Seyit Hocuk, Bertin Martens, Patricia Prufer, Bruno Carballa Smichowski & Néstor Duch-Brown
Economies of scope in data aggregation (ESDA) are attracting the attention of policymakers and researchers because of the efficiency gains they could bring about. Antitrust authorities, in turn, are concerned about their potential anti-competitive effects. However, the concept remains ill-defined and lacks empirical backing. We define ESDA as the improvement in the predictive power of a dataset that results from adding complementary variables to it. This differs from traditional economies of scope, which are based on the re-use of data or other resources. After deriving a theoretical model of ESDA, we estimate it by progressively adding explanatory variables to a dataset of health and health-related data that we use to predict health outcomes. Our three main findings confirm the existence of ESDA and lead to novel policy implications. First, in our dataset, a 1% increase in the number of predictor variables improves prediction accuracy by between 0.087% and 0.132%. Second, we find a positive, non-linear relation between variable complementarity and ESDA. Third, in our models, ESDA exhibit increasing returns up to the third quartile of variables and diminishing returns thereafter. Our results support policies that foster the concentration of data in large pools with shared use rights.
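To make the estimation idea concrete, the following is a minimal sketch of how an accuracy-versus-variables curve and its elasticity could be computed. It is not the authors' model or data. Everything in it is an assumption made for illustration: the data are synthetic, the function names `simulate_accuracy_curve` and `log_log_elasticity` are hypothetical, and the toy predictor uses known, equal weights, so no model fitting takes place. Each simulated variable carries a distinct part of the signal, which stands in for complementarity, and the elasticity is taken as the slope of a log-log regression of accuracy on the number of variables used.

```python
import math
import random


def simulate_accuracy_curve(n_vars=16, n_obs=4000, noise_sd=1.0, seed=0):
    """Return [(k, accuracy)] for a toy predictor that sees only the first k variables.

    Synthetic setup (an assumption, not the paper's data): a binary outcome is
    the sign of the sum of n_vars independent standard-normal variables plus
    noise, so each variable contributes distinct information.
    """
    rng = random.Random(seed)
    hits = [0] * n_vars
    for _ in range(n_obs):
        features = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        latent = sum(features) + rng.gauss(0.0, noise_sd)
        outcome = latent > 0
        partial = 0.0
        for k, value in enumerate(features):
            # Predict the outcome from the sign of the sum of the first k+1 variables.
            partial += value
            if (partial > 0) == outcome:
                hits[k] += 1
    return [(k + 1, hits[k] / n_obs) for k in range(n_vars)]


def log_log_elasticity(curve):
    """OLS slope of log(accuracy) on log(number of variables).

    The slope approximates the % change in accuracy per 1% change in variables.
    """
    xs = [math.log(k) for k, _ in curve]
    ys = [math.log(acc) for _, acc in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    curve = simulate_accuracy_curve()
    for k, acc in curve:
        print(f"{k:2d} variables: accuracy = {acc:.3f}")
    print(f"log-log elasticity = {log_log_elasticity(curve):.3f}")
```

A single log-log slope summarizes the whole curve in one number, so it cannot reveal where returns switch from increasing to diminishing. Examining that would require comparing slopes over separate segments of the curve.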