The Responsible Operations: Data Science, Machine Learning, and AI in Libraries report, authored by Thomas Padilla and published by OCLC, meets the pudding test. It is a foundational piece of research outlining seven professional challenges to ethical approaches (and practical steps forward) to using computational methods in library practice:
- Committing to Responsible Operations
- Description and Discovery
- Shared Methods and Data
- Machine-Actionable Collections
- Workforce Development
- Data Science Services
- Sustaining Interprofessional and Interdisciplinary Collaboration
There is a lot of really useful detail in this report for library and information professionals (and the GLAMRs more broadly) to work through to start stepping into the space of data science, machine learning and artificial intelligence. The research was undertaken through consultation with advisory and landscape groups. The underlying purpose of the landscape group interested me in particular: to build in, where possible, diversity of viewpoint and skill base (less of the same old and more new faces).
The landscape group is composed of individuals working in libraries and a group of external experts. Library staff were selected with an eye toward diversification of roles and institutional affiliations. (Page 7)
What this post focuses on is a couple of the structural issues and tensions around the use of artificial intelligence in the cultural heritage sector that are lighting up (for me) like the Milky Way: complex and beautiful, and socially and culturally embedded. Those ideas are outlined in brief here, with some questions.
Openness and Open Access
The more I read about machine learning methods for processing data (e.g. supervised, unsupervised, and reinforcement learning), the more obvious it is that higher-level principles need to shape new professional methods (experimentation and diligence) and ethics (transparency and consultation). In the Responsible Operations report the point is made that dealing with probability and certainty will require a professional shift in thinking, and that new benchmarks and standards will need to be established. Cultural heritage documentation practices have been published and, more recently, contested (this is also referred to in the report).
Good to learn recently that the AI4LAM community are kicking off with ideas for project registration as an information sharing exercise. There are practices in other realms that may be fit for purpose here as a further step, i.e. the registration of trials, publishing on data curation, and making datasets available through data journals. By “available” I mean all levels of access, from public access to highly controlled access, coupled with instructions for attribution, rights information etc*. Scholarly communication on data science and computational methods takes a strong stance on transparency, commits practitioners to reviewing and acknowledging errors or issues, and affords the traditional recognition of professional expertise and skill through publication. The commercial publishers have already anticipated this, e.g. the ACM Journal on Computing and Cultural Heritage and the Journal of Cultural Heritage. A community-wide commitment to openness in approach and open access for scholarly communication is needed so that this practice-based information is widely available, and so that practitioners undertaking computational work and data curation can receive and respond to critical feedback.
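As a rough illustration of what a registration entry might capture, here is a minimal Python sketch. It is not drawn from AI4LAM or from the report: the class name, the field names and the three access levels are all assumptions made up for this example, and a real registry would need to agree its own schema.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    # Hypothetical levels spanning "public" to "highly controlled" access.
    PUBLIC = "public"
    REGISTERED = "registered"
    CONTROLLED = "controlled"


@dataclass
class DatasetRegistration:
    # All field names are illustrative assumptions, not an agreed standard.
    title: str
    custodians: list[str]
    access_level: AccessLevel
    attribution: str
    rights_statement: str
    methods_summary: str = ""

    def missing_fields(self) -> list[str]:
        """List the required descriptive fields that were left empty."""
        required = {
            "title": self.title,
            "attribution": self.attribution,
            "rights_statement": self.rights_statement,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if not self.custodians:
            missing.append("custodians")
        return missing


# A made-up entry with no attribution instructions supplied yet.
entry = DatasetRegistration(
    title="Example digitised newspapers",
    custodians=["Example Library"],
    access_level=AccessLevel.CONTROLLED,
    attribution="",
    rights_statement="Example rights statement",
)
print(entry.missing_fields())
```

The point of the check is simply that a dataset is not treated as "available" until its attribution and rights information travel with it, whatever the access level.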
Can cultural heritage curation and documentation practice that integrates AI be transformed or will there be a schism in practice?
Sense and Sensibility
Really wonderful to see the five principles for AI laid out by Floridi and Cowls referenced in the report:
Beneficence, Non-maleficence, Autonomy, Justice, Explicability
Not surprisingly, the language of computer science is reused and ported into the report, e.g. HITL (human in the loop) and “gold standard” training datasets. As the community uses and examines this language, it seems reasonable to anticipate some expansion or changes in definitions, and positions in relation to technology (who or what is in control, making the decisions, why and when) being challenged, upended and diversified. Look ahead for a shift toward the social:
- A conscious move toward CITL (computer in the loop) or human-computer interaction (HCI) and a repositioning of the role of technology in decision-making.
- A move to critical evaluation of datasets as reference tools (for machine learning) using a well-known model from library practice, e.g. the CRAAP (currency, relevance, authority, accuracy, purpose) test.
- The use of two other useful mnemonics to ensure practices are sharp and we consider where the sharp edges of society have already left a mark.
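To make the question of who decides a little more concrete, here is a minimal Python sketch of human-in-the-loop routing. It is an illustration under stated assumptions only: the `Suggestion` class, the `route` function and the 0.9 threshold are invented for this example and do not come from the report or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    # Hypothetical shape for a machine-suggested description of an item.
    item_id: str
    label: str
    confidence: float  # the model's probability for the label, 0.0 to 1.0


def route(suggestion: Suggestion, review_threshold: float = 0.9) -> str:
    """Return 'accept' when confidence meets the threshold, else 'human_review'."""
    if not 0.0 <= suggestion.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if suggestion.confidence >= review_threshold:
        return "accept"
    return "human_review"


# Made-up suggestions: one confident, one uncertain.
print(route(Suggestion("item-1", "map", 0.95)))
print(route(Suggestion("item-2", "map", 0.50)))
```

Where the threshold sits is a professional decision rather than a technical one. Setting it above 1.0 sends every suggestion to a person, which is closer to the computer-in-the-loop position.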
The use of sense and sensibility in this section was a light play on the need for further debate on:
- Reusing commercial tools and existing training datasets, e.g. the advantages and pitfalls of using AutoML (where and how it fits into data analytics platforms and work practice).
- Taking collective approaches to tackling the development of reference datasets (when the digital collections are in distributed custody) that are (relatively) uncontroversial and aimed at mass benefit (e.g. newspapers and magazines) to break the ground and do the learning.
- Not stopping there with the collective action, and working on a conscious curatorial program to strengthen collection access where it is weak or vulnerable and where community engagement drives the agenda (e.g. working with specific community groups, tackling subject areas that lie across collection areas and institution types, and research-driven projects).
This all points to a fairly radical rethink on building up technical and curatorial skills in general (the siloing and scarcity of these skillsets and expertise is mentioned in several places in the report), and to the need for changes in organisational structures, leaders confident in transforming their organisations, and professionals willing to take on new areas of expertise and skillsets.
Can the cultural heritage community mesh technical and social practice change in AI to reposition and reassert its institutions as social institutions, or is this part of a generational shift in practice?
Public interest and trustworthiness play a big part in the social and custodial roles that cultural institutions fulfill in society. The integration of AI into the curatorial and documentation practices of cultural heritage institutions will need to be defined and measured by these same facets. Who benefits, and why? Is this work toward building artificial intelligence into practice improving access, or further advantaging those that have already been privileged with access to cultural heritage collections?
Artificial intelligence is going to upend things… and the cultural heritage community needs to hang onto its social conscience and expose this work to sunlight. This work needs to be both practical and useful and also have a socially active and progressive agenda.
*A good example: the National Library of Australia has a readme.txt file for The Australian Government Gazettes 1832-1968. This collection was formed through collective effort (the National Library of Australia, the State Library of New South Wales and the Office of Parliamentary Counsel) so that a national reference collection is available as data.