Position Statement No. 21: Broader data access crucial for the fight against the pandemic to be effective

26 October 2021

Strategic decisions made without proper data and analysis may not only be ill-guided but can also have severe consequences. This is especially true in crisis situations, such as the current COVID-19 pandemic. Data should be harnessed more broadly not only by decision-makers but also by researchers – the more quality data is available, the more likely the phenomena under study will be properly understood. Amid an epidemic crisis that can only be overcome by sensible behavior on a societal scale, ensuring wider data access for journalists and citizens is of key importance. In this position statement, we analyze how the available data can be more fully harnessed during the COVID-19 pandemic.

Data during the pandemic

Highly aggregated data on infections, hospitalizations, and deaths, as well as on interventions such as testing and vaccination, is needed for tracking the evolution of the pandemic on an international scale and assessing the effectiveness of different prevention strategies. These objectives are served by global data repositories such as Worldometer, Our World in Data, or the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which rely on modern technologies such as automated retrieval of web-published data. Access to these repositories is open to the public, but they contain only general information.

Decision-making at the local, national, or regional level requires more detailed information, such as which age and occupational groups are getting infected, whether there are local outbreaks, which groups are at risk of severe disease, or what the vaccination status is of those who fall ill. Such information is collected as part of epidemic surveillance systems. Additional data is also generated by systems that support administrative processes, such as the isolation, quarantine, and test-order support systems. This constitutes a particularly rich resource when combined with other administrative data, for instance, information on employment status, marital status, or parental status. In Poland, however, only basic statistics on COVID-19 incidence are publicly available, and for a long time even these have been published in a format that is difficult to download and use.

A separate category consists of data generated by the use of new digital technologies. This includes mobility data from cell phones and phone apps that track contacts and quarantine compliance, as well as apps where one can document symptoms and, for example, order a SARS-CoV-2 test. Some of this data is held in the private sector, and some of it has been made publicly available, such as the COVID-19 Mobility Reports. Data collected by public applications, on the other hand, is hardly ever made available to the public.

During this pandemic, additional funding has been allocated to research aimed at understanding the virus itself, the pathophysiology of the disease, the routes of transmission, the social processes involved, and the broader consequences of the pandemic. Some of the resulting data has been shared with other researchers. There have also been initiatives to create repositories of data obtained through publicly funded research projects, but these are so far few in number and limited to narrow topics or disciplines. Global research ventures such as the Rapid-Response COVID-19 Project (PSACR) are also good examples here. The purpose of this project is to conduct rigorous international research dedicated to understanding the psychological and behavioral aspects of the COVID-19 crisis. The advantage of such an effort is the large scale of data collected, which not only increases the reliability of the results obtained but also provides excellent opportunities for cross-cultural comparisons. In Poland, many researchers are pursuing work on various aspects of COVID-19, but their work addresses fragmented issues and is conducted on a small scale, in isolation from other researchers. A lack of coordination, cooperation, and established habits of sharing ideas and information hinders the harnessing of the existing research potential in Poland and significantly reduces the significance and standing of the results.

In summary, a great deal of data is being collected on an ongoing basis during this pandemic. This includes epidemic data, administrative data, data from research projects, and data from users of apps and different services. While these types of data are being used by decision-makers, they are not being used to their full potential. Combining administrative resources would make it possible, for example, to study the incidence of COVID-19 in selected occupational groups, to gauge the severity of the disease in patients with comorbidities, or to compare hospitalization rates among vaccinated and unvaccinated individuals. If we could integrate epidemiological data with psychological or social data, we could also better understand the influence that non-medical factors have on the development and course of the disease.
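
The kind of linkage described above can be illustrated with a minimal sketch. It assumes two hypothetical registers that share a pseudonymous person identifier (here called `pid`); the field names and all records are invented for illustration and do not come from any real register. The sketch links confirmed cases to vaccination status and computes the share of linked cases that were hospitalized in each group.

```python
from collections import defaultdict

# Hypothetical vaccination register: pseudonymous id -> fully vaccinated?
vaccination_register = {
    "p01": True, "p02": True, "p03": False,
    "p04": False, "p05": True, "p06": False,
}

# Hypothetical case register: one row per confirmed case (invented values).
case_register = [
    {"pid": "p01", "hospitalized": False},
    {"pid": "p02", "hospitalized": False},
    {"pid": "p03", "hospitalized": True},
    {"pid": "p04", "hospitalized": False},
    {"pid": "p05", "hospitalized": True},
    {"pid": "p06", "hospitalized": True},
    {"pid": "p99", "hospitalized": True},  # no match in the vaccination register
]


def hospitalization_share(cases, vaccinations):
    """Link cases to vaccination status by pid.

    Returns a dict mapping each status to the share of linked cases that
    were hospitalized, plus the number of cases that could not be linked.
    """
    totals = defaultdict(int)
    hospitalized = defaultdict(int)
    unlinked = 0
    for case in cases:
        status = vaccinations.get(case["pid"])
        if status is None:
            unlinked += 1  # keep track of linkage failures instead of guessing
            continue
        key = "vaccinated" if status else "unvaccinated"
        totals[key] += 1
        hospitalized[key] += int(case["hospitalized"])
    shares = {key: hospitalized[key] / totals[key] for key in totals}
    return shares, unlinked


shares, unlinked = hospitalization_share(case_register, vaccination_register)
print(shares, "unlinked cases:", unlinked)
```

A crude comparison of this kind is only a starting point: a real analysis would need to account at least for age, time since vaccination, and the quality of the linkage, which is one reason why knowledge of how each register was built matters.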

Making more databases available to researchers would provide a unique opportunity to capitalize on the scientific community’s interest in the pandemic. Moreover, researchers’ in-depth analyses could serve as a basis for better, evidence-based administrative decisions. Access to data would also allow sensible government decisions aimed at fighting the pandemic to be verified, thereby increasing their credibility.

Health data is sensitive data, so when it is shared, pains must be taken to ensure that it is fully anonymized and that no individual can be identified. It is also important to note that while it may not be possible to identify an individual using the original dataset, its combination with additional information may allow for the identification of that person. The more information included in a data set, the higher the risk of identification of an individual. Therefore, the sharing of personal data must always be considered from this perspective and must be subject to specific rules.
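
The relationship between the amount of information in a data set and the risk of identification can be shown with a small sketch. It assumes a hypothetical table of records from which names have already been removed; the column names and values are invented. For each set of columns, the function reports the size of the smallest group of records sharing the same values, which is the idea behind the k-anonymity measure: a result of 1 means that at least one person is unique on those columns.

```python
from collections import Counter

# Hypothetical records with names already removed (all values invented).
records = [
    {"age_band": "30-39", "county": "A", "occupation": "teacher"},
    {"age_band": "30-39", "county": "A", "occupation": "nurse"},
    {"age_band": "30-39", "county": "B", "occupation": "teacher"},
    {"age_band": "30-39", "county": "B", "occupation": "teacher"},
    {"age_band": "40-49", "county": "A", "occupation": "nurse"},
    {"age_band": "40-49", "county": "A", "occupation": "nurse"},
    {"age_band": "40-49", "county": "A", "occupation": "nurse"},
]


def smallest_group(rows, columns):
    """Size of the smallest group of rows sharing the same values in `columns`."""
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(groups.values())


# Each added column can only split groups further, never merge them.
k_by_columns = {
    "age_band": smallest_group(records, ["age_band"]),
    "age_band+county": smallest_group(records, ["age_band", "county"]),
    "age_band+county+occupation": smallest_group(
        records, ["age_band", "county", "occupation"]
    ),
}
print(k_by_columns)
```

In this toy table the smallest group shrinks from three records to two and then to one as columns are added. The same effect arises when a shared data set is combined with outside information, which is why the level of detail released has to be decided case by case.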

A culture of data reuse

Many data repositories have been established during this pandemic. One worthy of note is the data collected, aggregated, and published by the European Centre for Disease Prevention and Control (ECDC). ECDC is indeed a good example here, as much of the data collected by this institution is made freely available for use for any purpose. However, access to potentially sensitive detailed personal data is only granted on the basis of a specific request from researchers, in which the scope of the data requested and the research objectives are precisely defined. This procedure allows for transparency in the data collection and sharing process and at the same time makes it possible to use data from all over Europe to undertake research work.

Data sharing should be taken into account as early as the database design stage; this helps to ensure a transparent and efficient process of accessing data, in particular administrative data from public registers. It is also necessary to designate an institution responsible for providing this access.

During work with complex databases, when it becomes necessary to integrate data from different sources, non-standard operations to prepare the dataset for research may be required. Consequently, a team of professionals who are familiar with the structure of these resources may be needed to prepare that data for further analyses. Currently, COVID-19 records are kept by several institutions in Poland (the e-Zdrowie government health portal, Chief Sanitary Inspectorate, National Institute of Cardiology – National Research Institute, National Institute of Public Health – NIH – National Research Institute), and data is exchanged between these registries. However, rules for the possible sharing of data for research have not been set out, and there is no designated institution to take charge of this process.

At the same time, secondary data from administrative sources usually comes with many limitations, and these should be taken into account during data analysis. Secondary analysis therefore requires detailed knowledge of how the data was collected, whether it comes from registries or from regular research and experiments. When working with more complex data sets, collaboration with the institutions responsible for data collection is required. We suggest that it be established as good practice for data to be published along with its description as a separate publication (a “data paper”), focusing more on the data itself than on the conclusions drawn from it. Such a publication would also secure recognition for the data collection process itself. Understanding this process and verifying data consistency should be one of the objectives of the institution responsible for the data sharing process.

Recommendations

Fostering a culture of making data widely available is likely to help instill confidence in the decisions of the government, which is crucial in dealing with the pandemic. Therefore, we recommend the following:

• making as much detailed data as possible publicly available, free of charge and without registration. Such data should be available to the media, businesses, and the general public. A sustainable platform that would allow for the visualization of data, as well as retrieval of up-to-date data in a form that allows for its further analysis, needs to be created and maintained. This will require a clear identification of the acceptable level of detail in data sharing, in compliance with personal and sensitive data protection laws.
• providing far greater access to administrative and research datasets for the purpose of conducting secondary analyses on COVID-19. To share these resources securely and make datasets more widely available, it is necessary to create a suitable infrastructure; that includes establishing transparent rules for data sharing and, very importantly, appointing an institution to be in charge of this process. The policies for data sharing should be developed in collaboration with the scientific community and data protection specialists.
• establishing a specialized independent entity to maintain a research data repository, particularly for data from population-based social surveys on attitudes and behaviors observed during the pandemic. This unit could also coordinate the acquisition of such data to allow for an independent assessment of trends.
• broader sharing of research findings. Scientific publications, while very important, nevertheless take time to appear. Time is of the essence during the pandemic, when it is important to share key findings as soon as possible. In our Position Statement 18 on public communication during the pandemic, we highlighted the critical role of independent institutions and expert groups. Such panels could also provide a forum for discussion on research findings that have not yet been published.
• participation in international initiatives dedicated to data resources available to both researchers and businesses. Providing open access to information resources is seen as a long-term development direction, and is part of the European Strategy for Data. This strategy introduces the principle of open and free use and distribution of data sets that come from public registries and publicly funded research, and emphasizes the need to establish fair and clear rules for access to these data. It is also necessary to invest in infrastructure, including pan-European infrastructure, and to ensure that the data-generating institutions have the right powers, tools, and skills. In accordance with this strategy, the European Commission, in cooperation with scientific communities, has taken the initiative to establish the European Open Science Cloud, in which the Polish National Science Centre takes an active part. In a few years’ time, the European Cloud will be a virtual environment offering accessible services for storing, managing, analyzing, and reusing research data, which will be shared among different scientific disciplines and the EU Member States. Still, further work is needed in this area.

About the team

The Interdisciplinary COVID-19 Advisory Team to the President of the Polish Academy of Sciences was set up on 30 June 2020. The team is chaired by Prof. Jerzy Duszyński, President of the PAS, with Prof. Krzysztof Pyrć (Jagiellonian University) as deputy chair and Dr. Anna Plater-Zyberk (Polish Academy of Sciences) as its secretary. Other members of the team are:

• Dr. Aneta Afelt (University of Warsaw)
• Prof. Małgorzata Kossowska (Jagiellonian University)
• Prof. Radosław Owczuk, MD (Medical University of Gdańsk)
• Dr. Anna Ochab-Marcinek (PAS Institute of Physical Chemistry)
• Dr. Wojciech Paczos (PAS Institute of Economics, Cardiff University)
• Dr. Magdalena Rosińska, MD (National Institute for Public Health – National Hygiene Institute, Warsaw)
• Prof. Andrzej Rychard (PAS Institute of Philosophy and Sociology)
• Dr. Tomasz Smiatacz, MD (Medical University of Gdańsk)