My current preoccupation is looking into how and where AI fits into cultural heritage practice and humanities research. Drawing from a range of views on data and AI is paying off. This week’s mix was:
- An interview with Genevieve Bell from the 3AI Institute at the Australian National University, conducted by Jay Hasbrouck, on anthropology, cybernetics, and establishing a new branch of engineering at ANU, from This is Human Centred Design
- A research paper by Stephanie Russo Carroll, Desi Rodriguez-Lonebear and Andrew Martinez, “Indigenous Data Governance: Strategies from United States Native Nations”, published in the CODATA Data Science Journal.
Coming across these two works together has prompted me to reflect on world views, culture, ethics and AI. At one point in the interview, Bell talks about tensions in time: technological change is rapid, while cultural change is slow. A good example of this “slowness” in the heritage collecting world is digital curation and preservation. Inspiring efforts by many in cultural heritage keep digital curation and preservation happening, and high on the agenda. Much of the slowness in digitally transforming heritage collections and adjusting collecting practices is due more to funding constraints than to the pace of culture change. But that slowness has brought some benefits: time for adjustment, reflection, and revision.
There are opportunities arising for cultural heritage institutions to test out and embrace AI as part of their digital collection management practices. But how? Which techniques, and which parts of a collection, should be tackled? What if the aim in testing out AI is to address structural biases that already exist? How can AI techniques be employed with adventurous, practical, critical and culturally sensitive mindsets?
So… onto unpicking the (sort of) easy bit: the technologies and approaches to using AI. My first set of questions: What are the differences between using pre-trained models as-is and choosing to further train those models on your own labelled data (supervised learning)? At what point is it useful to use unsupervised machine learning?
Without having the technical knowledge (yet) to answer any of these questions immediately, it seemed useful to start with one aspect of AI I’m becoming familiar with: natural language processing (NLP). This has meant looking at a code library (spaCy) to think about what it means, technically and curatorially, to use this tool, and then what this might mean in terms of cultural heritage practice.
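As a concrete starting point, here is a minimal sketch of the “pre-trained model as-is” option: running one of spaCy’s standard pipelines, unchanged, over a catalogue-style description. The sentence is invented for illustration, and the sketch assumes the small English model has been installed (python -m spacy download en_core_web_sm).

```python
# A minimal sketch: running a pre-trained spaCy pipeline, as-is,
# over an invented catalogue-style description.
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# A hypothetical catalogue record, for illustration only.
text = ("Photograph of the opening of the Sydney Harbour Bridge, "
        "19 March 1932, donated to the Mitchell Library.")

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
# The model predicts generic entity types (PERSON, DATE, GPE, FAC, ...)
# learned from general web and news text, not from heritage catalogues.
```

Even this small exercise surfaces a curatorial question: the entity types the model knows about reflect the data it was trained on, not the categories a collecting institution cares about.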
The course Advanced NLP with spaCy, developed by Ines Montani, states: “While spaCy comes with a range of pre-trained models to predict linguistic annotations, you almost always want to fine-tune them with more examples. You can do this by training them with more labelled data.” Montani points out what training does not help with (for example, discovering patterns in unlabelled data) and what it does help with (improving model accuracy and learning new classification schemes).
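To make that distinction concrete, the sketch below shows the shape of supervised fine-tuning in spaCy v3: updating a pre-trained pipeline’s named entity recogniser with a handful of labelled examples. The HERITAGE_SITE label and the training sentences are my own invented illustrations, not part of Montani’s course; real fine-tuning needs far more labelled data and a held-out evaluation set.

```python
# A hedged sketch of fine-tuning a pre-trained spaCy pipeline with a
# few labelled examples, in the supervised style Montani describes.
# The HERITAGE_SITE label and the sentences are invented illustrations.
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("HERITAGE_SITE")  # teaching a new classification scheme

# Labelled data: (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("The dig recorded artefacts at Lake Mungo.",
     {"entities": [(30, 40, "HERITAGE_SITE")]}),
    ("Budj Bim was inscribed on the World Heritage List in 2019.",
     {"entities": [(0, 8, "HERITAGE_SITE")]}),
]

with nlp.select_pipes(enable="ner"):  # leave other components untouched
    optimizer = nlp.resume_training()
    for _ in range(20):  # a few passes over the tiny labelled set
        for text, annotations in TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer)
```

Montani’s caveat applies here too: a handful of examples is enough to show the mechanics, but adding a new label this way can degrade the model’s existing predictions unless the training data also includes examples of the original entity types.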
This seems straightforward, but from this point on there are many nuances to consider with regard to cultural material, especially where that material contains data created by, or that denotes, Indigenous culture, language, lives and experiences.
There is already a lot to think about and talk through (ethically) when working with digital cultural heritage material kept in institutional collections. Investigating AI techniques and working with that material “as data” adds another layer of complexity to curatorial responsibilities. There are productive tensions to be worked through between good institutional intentions and community agency. Bell talks about the three “I”s in the 3AI Institute (Indicators, Interfaces, Intention) along with the three “A”s (Agency, Assurance, Automation). The AI lenses Bell provides are a useful addition to the data science toolkit, alongside the mnemonics of FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics). In terms of working ethically with Indigenous data while testing out AI techniques, the need to work first on the CARE principles is all the more critical.
A question posed by Russo Carroll, Rodriguez-Lonebear and Martinez provokes significant pause for thought when contemplating AI in this context:
“what do data-driven futures look like for communities historically plagued by data inequities?”