Strategic decisions made without proper data and analysis may not only be ill-guided but can also have severe consequences. This is especially true in crisis situations, such as the current COVID-19 pandemic. Data should be harnessed more broadly not only by decision-makers but also by researchers – the more quality data is available, the more likely the phenomena under study will be properly understood. Amid an epidemic crisis that can only be overcome by sensible behavior on a societal scale, ensuring wider data access for journalists and citizens is of key importance. In this position statement, we analyze how the available data can be more fully harnessed during the COVID-19 pandemic.
Highly aggregated data on infections, hospitalizations, and deaths, as well as on interventions such as testing and vaccination, is needed for tracking the evolution of the pandemic on an international scale and assessing the effectiveness of different prevention strategies. These objectives are served by global data repositories such as Worldometer, Our World in Data, and the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which rely on modern technologies such as automated retrieval of web-published data. Access to these repositories is open to the public, but they contain only general information.
Decision-making at the local, national, or regional level requires more detailed information, such as which age and occupational groups are getting infected, whether there are local outbreaks, which groups are at risk of severe disease, or what the vaccination status is of those who get sick. Such information is collected as part of epidemic surveillance systems. Additional data is also generated by systems that support administrative processes, such as the isolation, quarantine, and test-order support systems. This constitutes a particularly rich resource when combined with other administrative data, for instance, information on employment status, marital status, parental status, etc. In Poland, however, only basic statistics on COVID-19 incidence are publicly available, and for a long time, even these have been published in a format difficult to download and use.
A separate category consists of data generated by the use of new digital technologies. This includes mobility data from cell phones, data from phone apps that track contacts and quarantine compliance, and data from apps where one can document symptoms and, for example, order a SARS-CoV-2 test. Some of this data is held in the private sector, and some of it has been made publicly available, such as the COVID-19 Mobility Reports. Data collected by public applications, on the other hand, is hardly ever made available to the public.
During this pandemic, additional funding has been allocated to research aimed at understanding the virus itself, the pathophysiology of the disease, the routes of transmission, the social processes involved, and the broader consequences of the pandemic. Some of the data from this research has been shared with other researchers. There have also been initiatives to create repositories of data obtained through publicly funded research projects, but these are so far quite few in number and limited to narrow topics or disciplines. Global research ventures such as the Rapid-Response COVID-19 Project (PSACR) are also good examples here. The purpose of this project is to conduct rigorous international research dedicated to understanding the psychological and behavioral aspects of the COVID-19 crisis. The advantage of such an effort is the large scale of the data collected, which not only increases the reliability of the results obtained but also provides excellent opportunities for cross-cultural comparisons. In Poland, many researchers are pursuing work on various aspects of COVID-19, but their work addresses fragmented issues and is conducted on a small scale, in isolation from other researchers. A lack of coordination and cooperation, and the absence of established habits of sharing ideas and information, hinder the harnessing of the existing research potential in Poland and significantly reduce the importance and standing of the results.
In summary, a great deal of data is being collected on an ongoing basis during this pandemic. This includes epidemic data, administrative data, data from research projects, and data from users of apps and different services. While these types of data are being used by decision-makers, they are not being used to their full potential. Combining administrative resources would make it possible, for example, to study the incidence of COVID-19 in selected occupational groups, to gauge the severity of the disease in patients with comorbidities, or to compare hospitalization rates among vaccinated and unvaccinated individuals. If we could integrate epidemiological data with psychological or social data, we could also better understand the influence that non-medical factors have on the development and course of the disease.
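As a rough illustration of what combining administrative resources could look like in practice, the sketch below links two registers through a shared pseudonymous identifier and compares hospitalization rates by vaccination status. All identifiers, values, and names (`vaccination_register`, `hospitalized_ids`, `hospitalization_rate`) are invented for this example; they do not describe any real register, and the numbers carry no epidemiological meaning.

```python
# Hypothetical pseudonymized registers; every identifier and value is invented.
# Vaccination register: pseudonymous ID -> vaccinated (True/False).
vaccination_register = {
    "p01": True, "p02": True, "p03": False,
    "p04": False, "p05": True, "p06": False,
}
# Hospital register: pseudonymous IDs of people admitted with COVID-19.
hospitalized_ids = {"p03", "p04", "p05"}


def hospitalization_rate(vaccinated):
    """Share of hospitalized people among those with the given vaccination status."""
    group = [pid for pid, status in vaccination_register.items() if status is vaccinated]
    admitted = sum(pid in hospitalized_ids for pid in group)
    return admitted / len(group)


rate_vaccinated = hospitalization_rate(True)
rate_unvaccinated = hospitalization_rate(False)
print(f"vaccinated: {rate_vaccinated:.2f}, unvaccinated: {rate_unvaccinated:.2f}")
```

A real analysis would also need to adjust for age, comorbidities, and time since vaccination, which is exactly why access to richer linked data matters.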
Making more databases available to researchers would provide a unique opportunity to capitalize on the scientific community’s interest in the pandemic. Moreover, the in-depth analyses produced by researchers could inform better, evidence-based administrative decisions. Access to data would also make it possible to verify sensible government decisions aimed at fighting the pandemic, and thus increase their credibility.
Health data is sensitive data, so when it is shared, pains must be taken to ensure that it is fully anonymized and that no individual can be identified. It is also important to note that while it may not be possible to identify an individual using the original dataset, its combination with additional information may allow for the identification of that person. The more information included in a data set, the higher the risk of identification of an individual. Therefore, the sharing of personal data must always be considered from this perspective and must be subject to specific rules.
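The link between the amount of information in a data set and the risk of identification can be sketched with a simple count: for a given set of attributes, how many records share the least common combination of values (the "k" of k-anonymity)? The example below is a minimal illustration on invented records; the attribute choice (age band, district, occupation) and the function name `smallest_group` are assumptions made for this sketch, not a prescribed method.

```python
from collections import Counter

# Hypothetical, made-up records: (age_band, district, occupation). No real data.
records = [
    ("30-39", "A", "teacher"),
    ("30-39", "A", "nurse"),
    ("30-39", "B", "teacher"),
    ("30-39", "B", "teacher"),
    ("40-49", "A", "teacher"),
    ("40-49", "A", "teacher"),
    ("40-49", "A", "nurse"),
]


def smallest_group(rows, n_columns):
    """Size of the smallest group of rows that share identical values
    in the first n_columns attributes (the k in k-anonymity)."""
    counts = Counter(row[:n_columns] for row in rows)
    return min(counts.values())


# Release only age band, then add district, then add occupation.
k_by_columns = [smallest_group(records, n) for n in (1, 2, 3)]
print(k_by_columns)
```

In this toy data set, each additional attribute shrinks the smallest group, until one record is unique and could in principle be singled out by anyone who knows those three facts about a person.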
Many data repositories have been established during this pandemic. One worth noting is the body of data collected, aggregated, and published by the European Centre for Disease Prevention and Control (ECDC). The ECDC is a good example here, as much of the data collected by this institution is made freely available for use for any purpose. However, access to potentially sensitive detailed personal data is only granted on the basis of a specific request from researchers, in which the scope of the data requested and the research objectives are precisely defined. This procedure ensures transparency in the data collection and sharing process and at the same time makes it possible to use data from all over Europe in research work.
Data sharing should be taken into account as early as the database design stage; this helps to ensure a transparent and efficient process of accessing data, in particular administrative data from public registers. It is also necessary to designate an institution responsible for providing this access.
During work with complex databases, when it becomes necessary to integrate data from different sources, non-standard operations to prepare the dataset for research may be required. Consequently, a team of professionals who are familiar with the structure of these resources may be needed to prepare that data for further analyses. Currently, COVID-19 records are kept by several institutions in Poland (the e-Zdrowie government health portal, Chief Sanitary Inspectorate, National Institute of Cardiology – National Research Institute, National Institute of Public Health – NIH – National Research Institute), and data is exchanged between these registries. However, rules for the possible sharing of data for research have not been set out, and there is no designated institution to take charge of this process.
At the same time, there are usually many limitations on using secondary data from administrative sources, and these limitations should be taken into account during data analysis. Detailed knowledge of how the data was collected is therefore essential for secondary analysis, whether the data comes from registries or from regular research and experiments. When working with more complex data sets, collaboration with the institutions responsible for data collection is required. We suggest establishing, as a good practice, that data be published along with its description as a separate publication (a “data paper”), focusing more on the data itself than on the conclusions drawn from it. Such a publication would also serve to secure recognition for the data collection process itself. Understanding this process and verifying data consistency should be among the objectives of the institution responsible for the data sharing process.
Fostering a culture of making data widely available is likely to help instill confidence in the decisions of the government, which is crucial in dealing with the pandemic. Therefore, we recommend the following:
The Interdisciplinary COVID-19 Advisory Team to the President of the Polish Academy of Sciences was set up on 30 June 2020. The team is chaired by Prof. Jerzy Duszyński, President of the PAS, with Prof. Krzysztof Pyrć (Jagiellonian University) as deputy chair and Dr. Anna Plater-Zyberk (Polish Academy of Sciences) as its secretary. Other members of the team are:
• Dr. Aneta Afelt (University of Warsaw)
• Prof. Małgorzata Kossowska (Jagiellonian University)
• Prof. Radosław Owczuk, MD (Medical University of Gdańsk)
• Dr. Anna Ochab-Marcinek (PAS Institute of Physical Chemistry)
• Dr. Wojciech Paczos (PAS Institute of Economics, Cardiff University)
• Dr. Magdalena Rosińska, MD (National Institute for Public Health – National Hygiene Institute, Warsaw)
• Prof. Andrzej Rychard (Institute of Philosophy and Sociology PAN)
• Dr. Tomasz Smiatacz, MD (Medical University of Gdańsk)