Article by Linda Raftree
At the Digital Development: From Principles to Practice Forum, ICT4D practitioners came together to discuss the inherent tension between making data open and interoperable for transparency and performance improvement and protecting vulnerable populations’ privacy while respecting community concerns around data security.
The session started with a high-level overview of the pros and cons of open data (as seen through a privacy/security lens). Then breakout groups dove deeper into three subcategories – transparency, data for decision-making, and interoperability. Each group looked at definitions and misunderstandings, pros/cons, and possible solutions, and presented definitions and key takeaways for further discussion.
Major opportunities for open, interoperable and shared data in international development included:
- improved data for monitoring and evaluation and performance management,
- improved subnational data,
- creation of data “assets” by different audiences,
- re-use and validation of data.
However, the room quickly found that most of the pros were also cons, and vice versa. For example, open data can improve accountability but it can also increase liability. Tracking personally identifiable information can mean improved transparency but also greater vulnerability.
Many additional concerns were brought up, such as the fact that a lot of data is already being collected but much of it is never used, and yet the international development community keeps asking for more and more data. Participants highlighted that much of the available data is hard to use because it is of poor quality, lacks information about methodology and approach, or is not interoperable.
In addition, key questions arose about who owns the data (especially when it is not owned by the people whose information it contains) and how the owners use it. There are considerable concerns that data without context or nuance can be misleading or biased. There are also issues with the lack of a “sunset” policy for much collected data – future anonymity is not assured, especially as technology keeps changing and the ability to access, analyze and combine data sets keeps growing.
Finally, there was acknowledgement that data privacy and security is a new problem that is now part of our lives. No one has “figured it out” yet, anywhere, and it is only going to get more complex, especially as individuals globally rely more and more on their digital identities for daily living.
3 Key takeaways
- There are many benefits to open data for a wide range of audiences and purposes. But every “pro” is balanced by a “con” based on usage and approach.
- There are significant policy decisions to be made around the balance between access to information and privacy concerns, data quality, how data is used, ownership issues, and the lifecycle of data.
- This is a new problem and no one has “figured it out” yet.
Data for Transparency
The first group looked at one of the explicit goals of open data – improved transparency. They asked the clarifying question of “transparency by whom and for whom?” There are different forms of transparency (and different responses required) based on whether we are talking about the economic markets, government services, or scientific research.
Major questions that need to be asked around data for transparency include: who owns the data, for whom is the data transparency intended, and for what use?
Part of the definition of transparency included access to information and accountability for the information results, and ways to assess the risks/benefits. There is a lack of standardized policies and approaches across the international development community to address many of these questions.
Positives around open data for transparency focused on increasing access to information which can lead to accountability, increased engagement by citizens and other stakeholders, and innovative ideas through analysis. Challenges revolved around privacy concerns, especially for already vulnerable populations, data errors and quality, possibility for manipulation and misuse (such as market pricing) and unintended consequences.
A deeper discussion around informed consent (especially related to government services or health information) and public information (especially for scientific research) is required.
Key takeaways
- There is a balance between privacy and accountability.
- We need methods to weigh the risks vs. benefits of opening data.
- Deeper discussions on informed consent are needed.
Data for Decision-making
The group on data for decision-making started by asking the clarifying question of who is making the decision and for whom. Data can be used by a wide variety of groups, including NGOs, government, military, citizens, financial institutions, businesses (large and small), and criminals. They also specified that there is a continuum of data – data may be in different states, such as raw, aggregated, filtered, analyzed, etc.
One key concern is how to use data without losing the context – i.e. critical information on what the data means. Decisions without this context would be highly misguided or incorrect. The example given was that of Ebola data and Liberia. If you just looked at economic statistics during the Ebola crisis without realizing the crisis was occurring in the background, the numbers would be misleading as to the issues in the country.
Another concern is around having bad quality data and not realizing it. Poor data can lead to poor decisions. Also, raw data is not the same as analyzed data – the analysis involves processing the data and hopefully taking into account quality, context, etc. Getting access to analyzed data used in decision making can be as useful as the raw data.
There is a need for clear policy and standards (from donors, organizations, etc.), preferably saying ‘open it’ or ‘share’ by default, with incentives for sharing (especially data used to perform an analysis).
Key takeaways
- Need defined analytics per specific need/ question/ project.
- Need better quality control in data collection.
- Need guidance on how to use data for analysis without losing context.
- Analytics is not equal to a data dump. There are stages in-between raw and final.
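To make the idea of stages between raw and final data more concrete, here is a minimal Python sketch, not something presented at the session. It walks a handful of invented records through a filtered and an aggregated state and carries a context note along with the result, so that a reader of the summary still sees what was happening in the background (as in the Ebola example). The field names, values, and the `package` helper are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical raw survey records; every value here is invented.
raw = [
    {"district": "A", "month": "2014-06", "household_income": 120},
    {"district": "A", "month": "2014-06", "household_income": None},  # missing value
    {"district": "A", "month": "2014-09", "household_income": 70},
    {"district": "B", "month": "2014-09", "household_income": 90},
]

# Context that should travel with the data at every stage (assumed wording).
context = {
    "note": "Ebola outbreak ongoing during the collection period",
    "method": "hypothetical household survey",
}


def filter_valid(records):
    """Filtered stage: drop records with a missing income value."""
    return [r for r in records if r["household_income"] is not None]


def aggregate(records):
    """Aggregated stage: mean income per (district, month)."""
    groups = {}
    for r in records:
        groups.setdefault((r["district"], r["month"]), []).append(r["household_income"])
    return {key: mean(values) for key, values in groups.items()}


def package(stage, data, context, records_dropped):
    """Bundle a stage's output with its context and a basic quality indicator."""
    return {
        "stage": stage,
        "data": data,
        "context": context,
        "records_dropped": records_dropped,
    }


filtered = filter_valid(raw)
summary = package("aggregated", aggregate(filtered), context, len(raw) - len(filtered))
print(summary)
```

Sharing the packaged summary rather than a bare table of averages is one possible way to keep the context and a basic quality signal attached to analyzed data.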
Interoperability of Data
Interoperability of data means that a set of facts and figures can be interchanged (aggregated, cross-referenced, and layered). This requires a standardized format/structure and definitions (including the methodology of how the data is collected, the time period covered, etc.). It is also important to note that “open data” is not the same as “big data.” A lot of the data being discussed is small to sizable data.
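As an illustration of what a standardized structure makes possible, the following Python sketch maps records from two hypothetical organizations into one shared layout and then cross-references them. The shared field names (`district`, `period_start`, `period_end`, `indicator`, `value`, `method`), the source layouts, and all values are assumptions invented for this example, not an existing standard.

```python
# Two hypothetical organizations report the same indicator in different layouts.
org_a_rows = [
    {"District": " north ", "start": "2015-01-01", "end": "2015-12-31",
     "enrolled": "140", "method": "school census"},
]
org_b_rows = [
    {"admin2": "NORTH", "yr": 2015, "n_children": 60, "survey": "household survey"},
]


def from_org_a(row):
    """Map an organization A record to the shared (assumed) structure."""
    return {
        "district": row["District"].strip().title(),
        "period_start": row["start"],
        "period_end": row["end"],
        "indicator": "children_enrolled",
        "value": int(row["enrolled"]),
        "method": row["method"],
    }


def from_org_b(row):
    """Map an organization B record; yearly data becomes an explicit period."""
    return {
        "district": row["admin2"].strip().title(),
        "period_start": f"{row['yr']}-01-01",
        "period_end": f"{row['yr']}-12-31",
        "indicator": "children_enrolled",
        "value": row["n_children"],
        "method": row["survey"],
    }


combined = [from_org_a(r) for r in org_a_rows] + [from_org_b(r) for r in org_b_rows]

# Cross-reference: same district and period, values kept side by side per method.
layered = {}
for rec in combined:
    key = (rec["district"], rec["period_start"], rec["period_end"])
    layered.setdefault(key, {})[rec["method"]] = rec["value"]

print(layered)
```

The values are kept side by side by collection method rather than summed, because figures gathered with different methodologies are not necessarily comparable, which is why the methodology belongs in the shared definition.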
The pros were too many to mention and have been outlined by most in the open data community. Some of the challenges outlined included the lack of incentives for many organizations to make their data interoperable by default, especially when interoperability can be seen as undermining their competitiveness or as being too costly.
Many organizations are concerned about a lack of flexibility if they are required to follow a standard – they feel that the standard will constrain their ability to collect data for their specific needs. There is also the concern that increased interoperability of data leads naturally to increased vulnerability, as it becomes easier to “reassemble” stripped data, which creates privacy concerns.
Key takeaways
- Pros are too many to mention.
- Lack of incentives to make data interoperable.
- Loss of flexibility in content when following standards.
- Makes it easier to reassemble/re-identify data.
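The concern about “reassembling” stripped data can be sketched in a few lines of Python. In this hypothetical example, a health data set with names removed is linked to a separate data set that includes names, using fields both happen to share (district, birth year, sex). All records, field names, and the `link_records` helper are invented for illustration; real linkage attacks and the defenses against them are more involved.

```python
from collections import defaultdict

# "Anonymized" data set: names stripped, but indirect identifiers remain.
health_survey = [
    {"district": "North", "birth_year": 1984, "sex": "F", "diagnosis": "TB"},
    {"district": "North", "birth_year": 1990, "sex": "M", "diagnosis": "none"},
    {"district": "South", "birth_year": 1984, "sex": "F", "diagnosis": "HIV"},
]

# Separate hypothetical data set that includes names and the same fields.
beneficiary_list = [
    {"name": "Person A", "district": "North", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "district": "South", "birth_year": 1975, "sex": "M"},
]

# Fields shared by both data sets (an assumption of this sketch).
SHARED_FIELDS = ("district", "birth_year", "sex")


def record_key(record):
    """Build a comparison key from the shared fields."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link_records(anonymized, named):
    """Return named records that match exactly one anonymized record."""
    by_key = defaultdict(list)
    for record in anonymized:
        by_key[record_key(record)].append(record)

    matches = []
    for person in named:
        candidates = by_key.get(record_key(person), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches.append({"name": person["name"], **candidates[0]})
    return matches


reidentified = link_records(health_survey, beneficiary_list)
print(reidentified)
```

The more consistently two data sets define and format their shared fields, the more reliably this kind of join works, which is the sense in which interoperability and vulnerability grow together.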
Recommendations for the Foreign Assistance Community
In conclusion, the following is a brief list of recommended topics for the foreign assistance community to address in regard to the tension between open data and privacy and security.
- The need for standardized policies and approaches in the international development community to address:
  - Balance between privacy and accountability and how to weigh the risks vs. benefits of making this data open. Define transparency for whom? For what purpose?
  - Balance of informed consent (especially related to government services or health information) and public information/scientific research.
  - Balance of standardization and context/nuance.
  - How to define decision-making analytics per specific need/question/project.
  - Ownership issues around data.
  - The lifecycle of data, especially in light of changes over time to technology and state of data.
- The need to create incentives in the international development community for:
  - Sharing data by default.
  - Better quality control in data collection, and stages in between.
  - Making data interoperable by default.
  - Identifying and addressing emerging security/privacy needs.
  - Using existing data for decision making, thoughtfully.