Open data: Accountability and transparency
|Full article title||Open data: Accountability and transparency|
|Journal||Big Data & Society|
|Author(s)||Mayernik, Matthew S.|
|Author affiliation(s)||University Corporation for Atmospheric Research|
|Primary contact||Email: mayernik at ucar dot edu|
|Volume and issue||4(2)|
|Distribution license||Creative Commons Attribution-NonCommercial 4.0 International|
Abstract
The movements by national governments, funding agencies, universities, and research communities toward “open data” face many difficult challenges. In high-level visions of open data, researchers’ data and metadata practices are expected to be robust and structured. The integration of the internet into scientific institutions amplifies these expectations. When examined critically, however, the data and metadata practices of scholarly researchers often appear incomplete or deficient. The concepts of “accountability” and “transparency” provide insight into these perceived gaps. Researchers’ primary accountabilities are related to meeting the expectations of research competency, not to external standards of data deposition or metadata creation. Likewise, making data open in a transparent way can involve a significant investment of time and resources with no obvious benefits. This paper uses differing notions of accountability and transparency to conceptualize “open data” as the result of ongoing achievements, not one-time acts.
Keywords: Open data, accountability, transparency, data policy, data, metadata
Introduction
The movements by national governments, funding agencies, universities, and research communities toward “open data” face many difficult challenges. As a slate of recent studies has shown, the phrase “open data” itself raises at least two central questions, namely (1) what are “data”? and (2) what is “open”? Given the vagueness of these terms, individuals, research projects, communities, and organizations define “data” and “openness” in a variety of ways, often via informal norms in lieu of codified policies.
The concepts of “accountability” and “transparency” provide insight into how open data requirements and expectations are achieved in different circumstances. An individual or organization is accountable for “open data” when they are answerable for the act(s) of making data open, whatever those acts might be. Being accountable means having to justify actions and decisions to some individual or organization. Transparency, on the other hand, refers to the notion that information about an individual or organization’s actions can be seen from the outside. Both concepts feature prominently in research and policy discussions concerning the relations that governments, organizations, and other social bodies have with their constituents or communities.
Accountability and transparency
In high-level visions of open data, researchers’ data and metadata practices are expected to be robust and structured. The integration of the internet into scientific institutions amplifies these expectations, as it provides a seemingly ubiquitous data distribution mechanism. When examined critically, however, the data and metadata practices of scholarly researchers often appear incomplete or deficient. The concept of accountability helps to guide explanations for data practices that seem, on the surface, to be insufficient. “Accountability” is a concept drawn from multiple social science traditions, including studies of governance in organizations and nations, and studies of mundane activities in everyday life. It is important to remember that for most researchers, working with data is a very mundane activity. As Pink et al. note, data are intertwined with everyday routines, and often entail significant improvisation, both in data generation and use. For field-based scientists, such as ecologists and archaeologists, data may literally emerge from the dirt. For laboratory and computational scientists, data generation and management are less obviously subject to worldly interference, but are nevertheless imperfect human activities. To be accountable for data, researchers must be able to describe, in a way sufficient for the social situation at hand, how any perceived data problems are anomalous, correctable, or in fact not problematic at all; that is, they must be “answerable” for their data. Simply being answerable for data can be called soft accountability. When soft accountability is coupled with the possibility of sanctions for non-compliance, such as loss of research funding or journal article rejections for a lack of data archiving, researchers face hard accountability.
Turning now to transparency, being transparent is often described as a public value and norm of behavior that counters corruption and enables easy access to and use of information. Diverse political drivers are increasing the attention paid to transparency as it relates to open data. As a result, researchers are increasingly being asked or required to make their data transparent by sharing with colleagues or making data available on the web. Transparency in research is almost always selective, however. Researchers may have numerous incentives to keep particular aspects of the work out of the eye of the public or their research competitors, including the fear of being scooped, or a lack of time to fully clean, process, and package data. This selective character of research openness suggests a distinction between different kinds of transparency, specifically, opaque and clear transparency. Opaque transparency refers to the dissemination of information that does not reveal how people actually behave in practice, while clear transparency involves using information-access policies and programs that do, in fact, reveal reliable information about human or organizational actions.
Categorizing “open data”
The strength of these two concepts — accountability and transparency — emerges when they are coupled together. Enabling one does not necessarily mean enabling the other. Figure 1 presents a model of open data that couples the hard and soft accountability distinction with the opaque and clear notions of transparency. The color scheme indicates the relative likelihood (from green to red) that research practices falling in each cell will achieve the goals of “open data” policy initiatives, namely broad accessibility and usability of data.
The top row in the model illustrates how the possibility of sanctions (hard accountability) may impel data to be made publicly available. The goal of sanctions is to lead to clear transparency, where well-documented data are archived in long-term data repositories. The threat of sanctions, however, can also lead to situations in which researchers make data available somewhere online as quickly and minimally as possible in order to “check the box.” In these situations, data may be freely accessible, but very difficult for anybody outside of a very narrow circle to understand or use (opaque transparency). Posting vaguely documented files on personal web sites is often good enough for members of closely bound communities to evaluate and access data sets, allowing researchers to be accountable for their work even if not being fully transparent to all possible audiences. Questions about data in such cases are typically resolved via informal means like direct communication via email. In the third column (hard accountability with no transparency), sanctions carry consequences.
The middle row depicts how open data might manifest under a soft accountability regime, that is, where researchers are expected to be managing and archiving data effectively, but no concrete sanctions exist in the case of non-compliance. With most national funding agency policies toward data management still in the early stages and enforcement mechanisms lacking for many such policies, this soft accountability regime is probably the most common form faced by research institutions currently. Similar to the hard accountability situations described above, open data under soft accountability ranges from highly structured and robust data archiving to a total lack of data management. The difference is that the motivations for making data open, in whatever fashion, come in the form of positive incentives, as opposed to the potential for negative consequences associated with sanctions.
The bottom row in the model depicts scenarios in which no accountabilities for open data exist, or scenarios in which accountabilities related to data are so diffused in the context of highly distributed scientific activity as to be effectively absent. The distinction between clear and opaque transparency is not very useful in these situations. Not being answerable for making data open (hard or soft accountability) implies that nobody is asking for the data, or about the data. Making data open in a transparent way when there is no known interest in the data can be a risky proposition for researchers, as it can involve a significant investment of time and resources with no obvious benefits. The “open data” challenge in this category, however, is that the future uses (and users) of data can almost never be fully predicted. Finally, the bottom right corner of the figure depicts situations in which researchers face no accountabilities for making data open, and no expectation of transparency. This category might best be encapsulated as a “data gulag,” a term that refers to the “incarceration of large numbers of data resources in dark repositories, to be manipulated and viewed only by their masters.” As open data policies mature and diversify, the population of research endeavors that sit in these “gulags” should be shrinking, but the full extent of the data resources that fall into such disrepair will continue to be difficult to assess for some time.
This model illustrates a few key insights. First, good data management can happen even without sanctions. Many research communities have developed robust data repositories and other institutional support for data archiving without formal requirements from research funders or journals. These efforts find ways to integrate community norms and routines, data and metadata standards for archiving and interchange, professional data management roles, and technical infrastructures. Second, the transparency concept clearly implies accessibility — if something is not accessible, it cannot be transparent — but providing access does not itself make something transparent. Achieving clear transparency can be difficult. What appears to be clearly transparent for one community may be totally opaque to another. An added complication is that some researchers may not have license to allow secondary access to their data. This may be the case in human or health-related data, or for data collected via proprietary social media where data re-distribution is not allowed. Whatever the case, data providers can find themselves facing new kinds of accountabilities over time, corresponding to emergent needs for new kinds of transparency. They may need to create new voices and narratives around data to account for changing expectations and requirements, and may require new forms of data “intermediaries” to successfully navigate these shifting grounds. Transparency and accountability are thus ongoing achievements, not one-time acts.
Conclusion
Being a “competent researcher” has always involved the ability to generate data that meet the standards of evidence in a given domain. In most situations involving daily research tasks (e.g., data collection and documentation, and writing publications and reports), researchers’ daily data practices do not have to be perfect; they just have to be explainable. The integration of the internet into research institutions has changed the kinds of accountabilities that apply to research data, and enabled new kinds of transparency. Achieving “openness” requires the navigation of these context-specific accountabilities and transparencies.
Acknowledgements
The author thanks Karen Baker and Mary Marlino for comments on previous drafts.
Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of NCAR or the NSF.
Declaration of conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The National Center for Atmospheric Research (NCAR) is sponsored by the US National Science Foundation (NSF).
References
- Borgman, C.L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. MIT Press. pp. 416. ISBN 9780262028561.
- Leonelli, S. (2015). "What Counts as Scientific Data? A Relational Framework". Philosophy of Science 82 (5): 810–821. doi:10.1086/684083. PMC 4747116. PMID 26869734. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4747116.
- Levin, N.; Leonelli, S.; Weckowska, D. et al. (2016). "How Do Scientists Define Openness? Exploring the Relationship Between Open Science Policies and Research Practice". Bulletin of Science, Technology, and Society 36 (2): 128–141. doi:10.1177/0270467616668760. PMC 5066505. PMID 27807390. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC5066505.
- Pasquetto, I.V.; Sands, A.E.; Darch, P.T. et al. (2016). "Open Data in Scientific Settings: From Policy to Practice". Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems 2016: 1585–1596. doi:10.1145/2858036.2858543.
- Pomerantz, J.; Peek, R. (2016). "Fifty shades of open". First Monday 21 (5). doi:10.5210/fm.v21i5.6360.
- Leshner, A.I. (2009). "Accountability and Transparency". Science 324 (5925): 313. doi:10.1126/science.1174215.
- Lessig, L. (8 October 2009). "Against Transparency". New Republic. Hamilton Fish. https://newrepublic.com/article/70097/against-transparency.
- McNutt, M.; Lehnert, K.; Hanson, B. et al. (2016). "Liberating field science samples and data". Science 351 (6277): 1024–1026. doi:10.1126/science.aad7048.
- Agre, P.E. (2002). "Real-Time Politics: The Internet and the Political Process". The Information Society 18 (5): 311–331. doi:10.1080/01972240290075174.
- Van Tuyl, S.; Whitmire, A.L. (2016). "Water, water, everywhere: Defining and assessing data sharing in academia". PLOS ONE 11 (2): e0147942. doi:10.1371/journal.pone.0147942. PMC 4757565. PMID 26886581. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4757565.
- Vines, T.H.; Albert, A.Y.; Andrew, R.L. et al. (2014). "The availability of research data declines rapidly with article age". Current Biology 24 (1): 94–7. doi:10.1016/j.cub.2013.11.014. PMID 24361065.
- Bovens, M. (1998). The Quest for Responsibility: Accountability and Citizenship in Complex Organisations. Theories of Institutional Design. Cambridge University Press. pp. 266. ISBN 9780521481632.
- Garfinkel, H. (1967). Studies in Ethnomethodology. Prentice-Hall, Inc. pp. 288.
- Woolgar, S.; Neyland, D. (2014). Mundane Governance: Ontology and Accountability. Oxford University Press. pp. 328. ISBN 9780199584741.
- Pink, S.; Sumartojo, S.; Lipton, D. et al. (2017). "Mundane data: The routines, contingencies and accomplishments of digital living". Big Data & Society 4 (1). doi:10.1177/2053951717700924.
- Gitelman, L., ed. (2013). "Raw Data" Is an Oxymoron. MIT Press. pp. 192. ISBN 9780262518284.
- Fox, J. (2007). "The Uncertain Relationship between Transparency and Accountability". Development in Practice 17 (4/5): 663–71.
- Ball, C. (2009). "What Is Transparency?". Public Integrity 11 (4): 293–308. doi:10.2753/PIN1099-9922110400.
- Levy, K.E.C.; Johns, D.M. (2016). "When open data is a Trojan Horse: The weaponization of transparency in science and governance". Big Data & Society 3 (1). doi:10.1177/2053951715621568.
- Jasanoff, S. (2006). "Transparency in Public Science: Purposes, Reasons, Limits". Law and Contemporary Problems 69 (3): 21–46. https://scholarship.law.duke.edu/lcp/vol69/iss3/2.
- boyd, d. (29 November 2016). "Transparency ≠ Accountability". Points. Data & Society Research Institute. https://points.datasociety.net/transparency-accountability-3c04e4804504. Retrieved 23 June 2017.
- Edwards, P.N.; Mayernik, M.S.; Batcheller, A.L. et al. (2011). "Science friction: data, metadata, and collaboration". Social Studies of Science 41 (5): 667–90. doi:10.1177/0306312711413314. PMID 22164720.
- Leonelli, S. (2016). "Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2083): 20160122. doi:10.1098/rsta.2016.0122. PMC 5124067. PMID 28336799. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC5124067.
- Uhlir, P.F. (2010). "Information Gulags, Intellectual Straightjackets, and Memory Holes: Three Principles to Guide the Preservation of Scientific Data". Data Science Journal 9: ES1–ES5. doi:10.2481/dsj.Essay-001-Uhlir.
- Mayernik, M.S. (2016). "Research data and metadata curation as institutional issues". Journal of the Association for Information Science and Technology 67 (4): 973–993. doi:10.1002/asi.23425.
- Baker, K.S.; Duerr, R.E.; Parsons, M.A. (2015). "Scientific Knowledge Mobilization: Co-evolution of Data Products and Designated Communities". International Journal of Digital Curation 10 (2): 110–135. doi:10.2218/ijdc.v10i2.346.
- Couldry, N.; Powell, A. (2014). "Big Data from the bottom up". Big Data & Society 1 (2). doi:10.1177/2053951714539277.
- Schrock, A.; Shaffer, G. (2017). "Data ideologies of an interested public: A study of grassroots open government data intermediaries". Big Data & Society 4 (1). doi:10.1177/2053951717690750.
Notes
This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance.