About

Embracing the FAIR principles of Open Science, the HumanitiesConnect Digital Library, initiated in 2020, is a collaborative effort to establish a digital repository that facilitates access to, and direct citation of, rare printed sources across three Max Planck libraries. The platform combines high-resolution scans of 4,800 volumes with their AI-driven transcriptions, enabling comprehensive text search across the growing collections. The transcriptions are generated and managed by Transkribus, a software solution developed by the European cooperative READ-COOP that uses neural technologies for text and layout recognition. This allows for continuous refinement of the transcriptions and annotation of content directly in the application, as well as access for data mining via an application programming interface (API) to facilitate research in the Digital Humanities.

The digital collection consists of rare books from the libraries of the Bibliotheca Hertziana – Max Planck Institute for Art History (Rome), the Kunsthistorisches Institut in Florenz – Max-Planck-Institut (Florence), and the Max Planck Institute for the History of Science (Berlin). It also incorporates public-domain texts that are related to research projects. Moreover, about 900 volumes of travel literature from the Bibliotheca Hertziana’s holdings were digitized and added in 2023. Several hundred newly digitized volumes about Rome and Naples will follow soon.

Starting from the standard Transkribus features, the Bibliotheca Hertziana has collaborated with READ-COOP on the development of important new functionalities. Public neural models for printed texts, optimized for the specificities of historical language and typographic styles, are now employed for the transcription. While abbreviations in texts after 1520 are expanded directly, a new, reusable model and AI technology has been developed for earlier texts: it retains the abbreviations but annotates their expansions in context. Transkribus’ Read&Search online interface has been optimized for the viewing of ancient texts, with the inclusion of filters based on metadata such as author, title, and publication date. The tolerance level of the fuzzy search can now be selected to include alternative orthographic or typographic forms or declensions, which are common in ancient texts. For the travel literature collection, a new neural model for paragraph layout recognition has been developed. The HumanitiesConnect Digital Library invites scholars to curate volumes by correcting the recognized text, annotating it, and enriching it with named entity recognition and linking.
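To illustrate what a selectable tolerance level means for a fuzzy search, here is a minimal sketch in Python. It is not the Read&Search implementation, whose internals are not described here. It assumes that tolerance can be modelled as a maximum Levenshtein (edit) distance between the query and a word in the text. The function names `edit_distance` and `fuzzy_matches` and the Latin word list are hypothetical and chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, words: list[str], tolerance: int) -> list[str]:
    """Return the words whose edit distance to the query is within tolerance.

    Assumption: tolerance is a maximum edit distance; a real search engine
    may define and compute it differently.
    """
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= tolerance]


# Hypothetical sample: a typographic variant (u/v), declined forms,
# and an unrelated word that merely looks similar.
words = ["urbs", "vrbs", "urbis", "urbem", "orbis"]

exact = fuzzy_matches("urbs", words, tolerance=0)
close = fuzzy_matches("urbs", words, tolerance=1)
loose = fuzzy_matches("urbs", words, tolerance=2)
```

With a tolerance of 0 only the exact form "urbs" is found. A tolerance of 1 adds the typographic variant "vrbs" and the genitive "urbis". A tolerance of 2 also finds "urbem", but it lets in the unrelated word "orbis" as well, which is why leaving the tolerance level to the reader is a sensible design choice.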

The HumanitiesConnect Digital Library, initiated and funded by the Department of Tristan Weddigen and directed by Elisa Bastianello of the Digital Publications unit at the Bibliotheca Hertziana, has been generously supported by READ-COOP and by the heads and staff of the libraries of the three Max Planck Institutes mentioned above.

Contact: Elisa Bastianello (elisa.bastianello@biblhertz.it).