Big data conversion techniques

Including their main features and characteristics

In this paper we are concerned with transforming unstructured big data into a limited number of time series that efficiently summarise the information relevant for nowcasting or short-term forecasting of the economic indicator(s) of interest. Data structuring and conversion is a difficult task, as the researcher must translate the unstructured data and summarise them in a format that is both meaningful and informative for the nowcasting exercise.

The literature on data structuring is in its early stages, with few contributions focusing specifically on this problem in the context of big data for economic forecasting or nowcasting. The common approach is to use readily available summary time series indicators, Google Trends in particular, as highlighted in the literature. In this paper we instead aim to provide general rules and approaches for this complex data transformation and reduction, with the goal of extracting from large unstructured datasets a smaller number of time series indicators, which can then be analysed with either standard or high-dimensional time series tools.

The rest of the paper is organised as follows. Section 2 presents a brief discussion of the related literature. Section 3 provides a general framework for the conversion of big data into time series format, along with simulated cases, and discusses feature extraction, data mining, clustering, and random subsampling in separate subsections. Section 4, which is entirely new, illustrates various empirical big data examples using publicly available samples. Section 5, which is also entirely new, illustrates big data structuring by analysing the steps involved in constructing a daily uncertainty indicator, potentially relevant for nowcasting several economic variables, starting from a dataset of over 3 million economic and financial articles covering a period of about 10 years. Section 6 draws the main conclusions of the paper.

