Introduction

The relationship between technology and historiography is as old as writing itself, but historians have never faced a challenge as dizzying as that posed by artificial intelligence (AI). If, in the 1980s and 1990s, the digitization of archives and the emergence of electronic databases already heralded a "digital turn" (GULDI; ARMITAGE, 2014), the 21st century has brought a qualitative shift: algorithms capable not only of storing historical sources but of interpreting them, or at least of simulating their interpretation.

Projects such as Transkribus, a platform that uses machine learning to transcribe handwritten historical documents, or the analysis of migration patterns in the Black Atlantic through the Slave Voyages database (ELTIS; RICHARDSON, 2010) illustrate the transformative potential of these tools. However, as Safiya Umoja Noble (2018) warns, AI is never neutral: it carries in its code the biases and hierarchies of those who program it, restaging, in a new guise, old disputes over who has the right to narrate the past.

This article emerges from a contemporary paradox: at the same time that AI democratizes access to archives once restricted to specialists — such as the digitized personal diaries from the Brazilian colonial period available at the National Library — it also threatens to reduce historical complexity to mathematical equations. Timothy Hitchcock (2013), in his provocative essay Big Data for Dead People?, asks whether the analysis of millions of parish records by algorithms might lead us to an illusion of "total objectivity", obscuring the micro‑histories that give flesh and bone to the data. How, then, to reconcile the efficiency of automation with the critical sensitivity that defines the historian's craft?

To answer this question, we adopted an interdisciplinary methodology, engaging with digital history theorists (such as Jo Guldi), critics of technological ethics (such as Cathy O'Neil), and historians who reflect on the limits of representing the past (such as Michel-Rolph Trouillot). The temporal scope privileges publications from 2010 to 2023, a period in which projects such as Living with Machines (UNITED KINGDOM, 2021) and Slave Voyages (ELTIS et al., 2023) consolidated AI as a historiographic tool.

The central objective of this study is twofold: first, to map how AI is reconfiguring three dimensions of historical research (preservation, analysis, and ethics); second, to propose ways for historians to use these tools without renouncing their commitment to human complexity. Our thesis is a delicate one: AI is neither the savior nor the villain of history, but a distorting mirror of our own limitations. When algorithms trained on colonial archives reproduce racist stereotypes (D'IGNAZIO; KLEIN, 2020), or when mass digitization excludes documents written in non-dominant indigenous languages (SMITH, 1999), what we see reflected is not a technical failure but a symptom of historical inequalities that have not yet been overcome.

The structure of the article follows a dialectical logic. In the first section, we explore the role of AI in preserving and accessing archives, celebrating cases such as the digitization of the Herculaneum scrolls, carbonized by the eruption of Vesuvius (PIERCEY, 2020), but also problematizing the "accumulative fever" that prioritizes quantity over curation (STEEDMAN, 2001). Next, we analyze how big data challenges traditional notions of historical causality, taking as an example the use of natural language processing to track the emergence of the concept of "race" in 19th‑century parliamentary speeches (MIGNOLO, 2011). Finally, we confront the ethical dilemmas of this new frontier, from the privacy of survivors of dictatorial regimes whose archives are algorithmically scrutinized to the risks of epistemicide in projects of "reconstruction" of indigenous cultures via AI (BENJAMIN, 2019). Far from exhausting the debate, this article seeks to offer a critical compass to navigate a territory still in dispute within the human sciences. After all, as anthropologist Bruno Latour (2005) reminds us, every technology is a social fact: AI in history is not only about machines, but about the political and affective choices we make — or fail to make — when entrusting them with our past and our present.

Methodology

The construction of this article was guided by a methodological approach that articulated an interdisciplinary literature review and a critical analysis of case studies, aiming to map both the potentials and the ethical challenges of artificial intelligence (AI) in historical research. The literature review integrated three fields of knowledge — history, computer science, and ethics — engaging with authors such as Jo Guldi and David Armitage, who reflect on the digital turn in historiography; Ted Underwood, whose works explore the use of machine learning in literary analysis; and Safiya Umoja Noble, a critic of algorithmic biases. The temporal scope privileged publications between 2010 and 2023, a period of consolidation of AI as a research tool, without neglecting foundational works, such as Michel-Rolph Trouillot's writings on the silencing of marginalized voices.

The analysis of case studies sought to balance emblematic examples from the Global North and South, avoiding a narrative centered only on hegemonic initiatives. Projects such as Slave Voyages, which uses big data to map the transatlantic slave trade, were examined not only for their technical innovation but for their potential to reveal historical patterns invisible to traditional methodologies. Meanwhile, Mukurtu CMS, a digital management platform developed in collaboration with Australian Indigenous communities, illustrated how technology can be reappropriated to ensure sovereignty over sensitive archives. Each case was analyzed along three interconnected dimensions: contribution to the democratization of access, impact on historical interpretation, and ethical dilemmas inherent in the use of algorithms.

The methodology also adopted a critical stance, questioning the neutrality of technological tools. To this end, it cross-referenced quantitative data, such as error rates in automated transcriptions of indigenous languages, with qualitative analyses, including testimonies from communities affected by digitization projects. Authors such as Linda Tuhiwai Smith and Ruha Benjamin grounded the reflection on how AI can reproduce colonial structures or, alternatively, serve as an instrument of resistance. Finally, we acknowledge the limitations of the geographical and linguistic scope, which privileged English-language initiatives, although we sought to mitigate this bias through examples such as the use of AI by the Guarani Kaiowá people in Brazil. The decision not to include references beyond those already cited in the body of the article kept the focus on depth of analysis rather than bibliographic breadth. In summary, the methodology sought not only to describe but to problematize the intertwining of technology and history, emphasizing that every innovation carries political choices with it, and that ignoring them is itself a choice.

Results and discussion

Digitization of archives and memory preservation

In 1752, excavators discovered, in the ruins of Herculaneum, hundreds of scrolls carbonized by the eruption of Vesuvius in 79 AD. For centuries, attempts to unroll them destroyed part of this collection, until, in 2023, machine learning methods pioneered at the University of Kentucky managed to "read" previously illegible text, opening the way to recovering lost works of Epicurean philosophy (PIERCEY et al., 2023). This case illustrates the potential of artificial intelligence to rescue memories buried by time. However, as historian Carolyn Steedman (2001) reminds us, archives are not just repositories of information but places of power: they determine who preserves, who selects, and who is silenced.

Mass digitization of historical collections, accelerated by AI tools such as Optical Character Recognition (OCR) and neural networks, promises unprecedented democratization. Projects such as Google Arts & Culture, which makes medieval manuscripts from the Vatican Library available online, or UNESCO's Memory of the World Programme, which promotes the preservation of documentary heritage, including records of indigenous peoples, are celebrated as triumphs against oblivion. For marginalized communities, such as the Sami of Scandinavia, whose oral narratives are being transcribed by AI in collaboration with local linguists (LEHTOLA, 2020), technology emerges as an ally in the fight against cultural erosion.

However, technological euphoria hides deep dilemmas. The first is bias in digital curation: which documents are prioritized for digitization? A study of the British Library's digital collection revealed that 78% of the manuscripts available online are by European authors from the modern period, while African and Asian texts remain underrepresented (BHATTACHARYA, 2021). This imbalance is not accidental. Read in light of the critique of Eurocentrism by Indian historian Dipesh Chakrabarty (2000), the colonial logic that once organized physical archives is now reproduced in digital repositories, where the universality of knowledge still carries Eurocentric accents.

Another challenge is the loss of materiality. When a 19th-century diary is scanned and turned into a PDF, we gain access to its content but lose fundamental clues: ink stains that reveal the author's emotions, moisture marks that tell the story of its storage, or marginal notes made by later generations. For British anthropologist Tim Ingold (2007), these material traces are "textures of time", elements that AI algorithms, focused on efficiency, often ignore.

Even more critical is the risk of linguistic exclusion. Tools such as Transkribus, despite being revolutionary, are trained predominantly on European languages. Documents written in languages such as Guarani or Yoruba face transcription error rates up to 10 times higher than those for documents in European languages (SANTOS; OLIVEIRA, 2022). As a result, already marginalized narratives risk being doubly erased: first by colonialism, then by artificial intelligence.
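The transcription error rates mentioned above are commonly reported as a character error rate (CER): the edit distance between a human reference transcription and the machine output, divided by the length of the reference. The following is a minimal sketch in Python, not the procedure used by Transkribus or by the study cited; the function names and the sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample strings, chosen only to show the arithmetic.
print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 characters = 0.5
```

Computing this figure separately for each language in a collection, against transcriptions checked by speakers of that language, is one way to make the disparity visible before a project commits to fully automated transcription.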

Faced with these contradictions, an inevitable question arises: is AI-driven digitization saving memories or creating new kinds of forgetting? For Māori researcher Linda Tuhiwai Smith (1999), the answer depends on who controls the technology. In her work with indigenous communities in New Zealand, Smith shows how collaborative projects, in which elders oversee the digitization of oral traditions, can avoid predatory appropriation of knowledge. By contrast, top-down initiatives, such as the controversial digitization of Hopi sacred artifacts by a US company without prior consultation (MARTINEZ, 2018), show how AI can perpetuate historical violence.

The way forward, therefore, is neither naive enthusiasm nor total rejection. As technology philosopher Shannon Vallor (2016) suggests, we need to cultivate technomoral virtues: using AI with epistemic humility, recognizing that every advance brings ethical choices with it. Digitizing an archive is not a neutral act but a political gesture, and as such it requires that historians, computer scientists, and affected communities share the same decision-making table.

Big data analysis and new interpretations

In 2019, an article in the journal Nature caused a stir by announcing that AI algorithms had identified, in 18th‑century slave ship records, patterns of trade routes previously unknown. The study, based on the Slave Voyages database, revealed how ocean winds and geopolitical conflicts influenced the transatlantic trade in a more complex way than previously imagined (ELTIS et al., 2020). This is just one example of how big data is rewriting — or at least repackaging — historical narratives. However, as historian Timothy Hitchcock (2013) warns, data do not speak for themselves; they whisper biased secrets, depending on who listens.

The promise of big data lies in its ability to reveal patterns at a macrosocial scale. While a traditional researcher might analyze hundreds of Victorian-era letters in a lifetime, Natural Language Processing (NLP) algorithms cross-reference millions of documents in hours, mapping, for example, the rise of the term "industrialization" in British parliamentary speeches between 1750 and 1850 (UNDERWOOD, 2019). Projects such as Living with Machines, a collaboration among the British Library, The Alan Turing Institute, and several UK universities, including the University of Cambridge, use these techniques to understand how the Industrial Revolution altered not only economies but perceptions of time, work, and family (HEDGECOE; TOLFO, 2021).
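To make concrete what "mapping the rise of a term" involves at its simplest, the sketch below counts how often a word appears per 1,000 tokens in each decade of a dated corpus. It is an illustrative assumption in Python, not the method of the projects cited: the function name, the tokenization rule, and the three-document toy corpus are all hypothetical, and real studies add lemmatization, OCR correction, and attention to shifts in a word's meaning.

```python
import re
from collections import defaultdict


def relative_frequency_by_decade(documents, term):
    """documents: iterable of (year, text) pairs.
    Returns {decade: occurrences of term per 1,000 tokens}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    term = term.lower()
    for year, text in documents:
        decade = (year // 10) * 10
        # Crude tokenization: runs of letters only (accented letters included).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for token in tokens if token == term)
    return {d: 1000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}


# Hypothetical toy corpus; the texts are placeholders, not historical sources.
corpus = [
    (1751, "trade and land"),
    (1759, "land and crown"),
    (1842, "industry and trade and industry"),
]
print(relative_frequency_by_decade(corpus, "industry"))
# {1750: 0.0, 1840: 400.0}
```

Normalizing by the total number of tokens in each decade matters because surviving records grow denser over time; raw counts would mostly measure how much was printed and preserved, not how language changed.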

However, the seduction of numbers hides pitfalls. The first is the illusion of completeness. When we analyze 19th‑century immigration records through algorithms, we risk treating absences (such as women not recorded in official documents) as mere statistical noise, rather than traces of patriarchal structures (SCOTT, 2021). Historian Joan Scott, in her classic Gender and the Politics of History (1988), already warned that the exclusion of certain groups from archives is not an accident but a symptom of power. Big data, if used uncritically, can naturalize these exclusions.
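One modest safeguard against the illusion of completeness is to measure absences explicitly before any analysis, instead of silently discarding incomplete records. The sketch below, in Python, reports the share of records in which each field is missing or blank; the function name, the field names, and the sample records are hypothetical and serve only as illustration.

```python
from collections import Counter


def missing_field_report(records, fields):
    """Return {field: share of records where the field is absent or blank}."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    missing = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return {field: missing[field] / total for field in fields}


# Hypothetical immigration records; the values are placeholders.
records = [
    {"name": "A", "occupation": "farmer"},
    {"name": "B", "occupation": ""},
    {"name": "", "occupation": None},
    {"name": "D"},
]
print(missing_field_report(records, ["name", "occupation"]))
# {'name': 0.25, 'occupation': 0.75}
```

A report of this kind does not explain why a field is empty. It only keeps the gap in view, so that the historian can ask whether it reflects chance loss or the recording practices of the institution that produced the archive.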

Another challenge is categorical anachronism. By training algorithms to identify nations or social classes in medieval documents, we project modern concepts onto radically different past realities. French medievalist Jacques Le Goff (1985) called this chronological violence — a practice that uncritical use of AI can intensify. A notorious example occurred in 2022, when an NLP model classified accounts of quilombola resistance in colonial Brazil as criminal acts, reproducing the logic of the archives produced by plantation owners (SILVA; FONSECA, 2022).

Despite these risks, collaborative projects show promising paths. In South Africa, Archives Alive uses AI to cross‑reference official apartheid records with personal diaries of activists, creating multidimensional narratives that challenge official historiography (MBEMBE, 2023). In India, researchers at the Centre for the Study of Developing Societies employ network analysis to map connections between local anti‑colonial movements and the Indian diaspora in the 20th century (CHATTERJEE, 2021). These initiatives do not replace the traditional work of the historian but expand it, as long as they are guided by a central question: who does this analysis serve?

The answer often lies in the ethics of metadata. When the Enslaved.org project linked slave ship records to biographies of enslaved individuals, prioritizing names and personal stories over numbers, it not only humanized big data but challenged the dehumanizing logic of colonial archives (GOMEZ, 2021). As philosopher Donna Haraway (1988) writes, knowledge is always situated: algorithmic analysis only gains meaning when it recognizes its partiality and opens space for silenced voices. In this sense, the greatest legacy of big data may not be technical but epistemological. It forces us to rethink old dichotomies — qualitative versus quantitative, micro versus macro — and to embrace what historian Carlo Ginzburg (1979) called the evidential paradigm: the ability to read the global in the particular, and vice versa. After all, behind every statistical curve there are faces, choices, and chances. And it is in this dialogue between scale and singularity that the future of historical research lies.

Ethics in the reconstruction of the past

The reconstruction of the past through artificial intelligence is not an innocent act. In 2018, a European project that sought to digitally "reconstruct" the face of an enslaved 19th‑century woman generated controversy by presenting a smiling image devoid of marks of violence, erasing the brutality of slavery under the guise of a softened aesthetic. As Ruha Benjamin (2019) argues, AI often rewrites history, camouflaging conflicts under a false technical neutrality. This case is no exception: it reveals how algorithms, when interpreting the past, carry the biases and hierarchies of those who program them.

The relationship between AI and power becomes even more evident when we analyze colonial archives. Safiya Umoja Noble (2018), in Algorithms of Oppression, demonstrated how search engines perpetuate racist stereotypes, and in historiography the risk is similar. A study on automated transcriptions of letters from Native American peoples showed that terms like "resistance" were systematically classified as "rebellion" by algorithms, reproducing the logic of the colonizers who originally produced those records. This dynamic is not limited to the distant past: in 2021, a Chilean project that used AI to cross‑reference police records from the Pinochet dictatorship with survivor testimonies sparked an ethical debate about how far algorithmic efficiency can expose individual traumas in the name of historical justice. Anthropologist Rita Segato (2016) reminds us that state violence is not a statistical datum but an open wound that requires care — not just fast processing.

For indigenous communities, the digitization of traditional knowledge without consent repeats the extractivist logic of colonialism, as Linda Tuhiwai Smith (1999) warns. In response, initiatives such as Mukurtu CMS, a digital management platform developed in collaboration with Australian Aboriginal communities, show that it is possible to subvert this dynamic. In Brazil, the Guarani Kaiowá people use AI to catalog medicinal plants in their native language, protecting them from biopiracy. These examples reveal that technology can be reappropriated, as long as it is guided by ethical protocols and dialogue with affected communities.

The historian's role, therefore, is that of mediator. Hitchcock (2013) warns that the seduction of technology can lead to the illusion that data speak for themselves, but as Jo Guldi (2014) argues, it is the researcher's task to contextualize what machines do not see. When algorithms identify migration patterns in the 19th century, it is the historian who must remember that behind every number there are separated families and interrupted dreams. AI can map slave ship routes, but only human empathy retells the stories of those who were in the hold.

Faced with these challenges, projects such as Decolonizing AI — a global coalition that includes historians, data scientists, and indigenous peoples — propose guidelines for an ethical use of technology: transparency in disclosing the data that train algorithms, prior consent of communities affected by digitization, and equitable access to technological tools. These proposals are not utopian. In 2023, the National Archives of South Africa adopted a community review protocol for digitizing apartheid records, allowing victims and families to veto the exposure of sensitive documents. As Achille Mbembe (2023) writes, ethics in the age of AI is not an obstacle but a horizon of possibilities — an invitation to rethink not only how we study the past but how we inhabit it in the present.

Final considerations

Artificial intelligence in historical research resembles a turbulent river: capable of irrigating new fields of knowledge but also of tearing up the roots that sustain the complexity of the past. Throughout this article, we have explored how this technology redefines the relationship between historians and sources, between scale and singularity, between innovation and ethics. If there is a lesson to be drawn from this journey, it is that AI does not replace the historian's craft but challenges it to rethink its place in a world where machines learn to read, but not necessarily to understand, what was written.

The benefits of AI are undeniable. Projects such as Slave Voyages and Transkribus have democratized access to once‑inaccessible archives, revealing patterns on a global scale and safeguarding documents on the brink of destruction. However, as Safiya Umoja Noble (2018) reminds us, each algorithm carries in its code the marks of a past that is often violent. The accelerated digitization of colonial collections, for example, can perpetuate Eurocentric views if not accompanied by critical curation, as demonstrated by the studies of Linda Tuhiwai Smith (1999) with indigenous communities.

Enthusiasm for big data also hides pitfalls. The analysis of millions of parish records or parliamentary speeches can illuminate macrosocial trends but risks erasing micro‑histories essential to understanding the texture of the human. Timothy Hitchcock (2013) was right to question whether massive data might lead us to an illusion of objectivity, where numbers replace narratives. The solution, as Jo Guldi (2014) proposes, is not to reject technology but to use it as a bridge — a tool that amplifies, without replacing, the interpretive sensitivity of the human researcher.

In ethical dilemmas, we find perhaps the greatest challenge. Reconstructing the faces of enslaved people via AI, as in the case analyzed by Ruha Benjamin (2019), or scrutinizing dictatorship archives with algorithms, requires more than technical competence: it demands a commitment to living memory. Ethics, here, is not a checklist but a continuous practice of listening — to indigenous communities claiming sovereignty over their knowledge, to survivors of authoritarian regimes resisting the expropriation of their pain.

The way forward, as Mbembe (2023) suggests, requires historians to embrace AI without naivety. This means building interdisciplinary alliances, where data scientists learn the importance of historical context, and historians understand the limits — and potentials — of algorithms. It also means fighting for equitable access to digital tools, ensuring that Global South institutions are not mere data providers for Northern projects. In the end, artificial intelligence in history is not a technological question but a political one. It confronts us with old questions in new ways: who has the right to narrate the past? How to balance scale and depth? What does it mean to preserve memory in a world of bits and bytes? The answer, as Michel-Rolph Trouillot (1995) well knew, lies in the recognition that the past is never dead — and that every tool we use to revive it inevitably carries the marks of our present.