A fascinating article from this week's New Scientist magazine. Unfortunately the online edition is behind a paywall, so I will quote only the most relevant sections from the print edition. The article concerns the application of AI: computers have been programmed ("trained") to read and translate cuneiform texts, to reassemble fragmented texts, to help recreate what was once held in ancient libraries, and even to predict sections of missing text. The work being carried out is quite extraordinary. Given that only about half of the 500,000 or so cuneiform texts currently held in the world's museums have been transliterated or translated, what has already been discovered, and what awaits discovery, is very exciting.
Enrique Jiménez is based at Ludwig Maximilian University in Munich and Irving Finkel is based at the British Museum in London.
One issue is that cuneiform is incredibly complex. “The script is very ambiguous. There is no single way of writing a word,” says Jiménez. In addition, most of the tablets are incomplete. The majority of cuneiform tablets are broken, chipped or smashed to pieces. Often, the edges have crumbled away, leaving stories without beginnings or ends, or with gaps in the narrative. [...]
Piecing these fragments together is like assembling a number of complex jigsaw puzzles whose pieces have become jumbled up, with no picture on the boxes to tell you what to aim for, says Jiménez. What’s more, fragments from the same tablet can be scattered around the world. “There’s a tablet where there’s a piece in Chicago, which joins a piece in Berlin and a piece here,” says Finkel. Putting the puzzle back together is a painstaking process that relies on luck and memory. It took more than 100 years to identify the beginning of the Epic of Gilgamesh in a small fragment stored in a museum drawer, for instance. But now computers are involved, things are changing.
The Fragmentarium, part of the Electronic Babylonian Literature project, set up by Jiménez in 2018, is using AI to reassemble Ashurbanipal’s library and other great collections written in cuneiform by working out which fragments belong together. To do this, Jiménez is using algorithms developed to compare different variants of gene sequences, based on the fact that there are often multiple copies of the same text with minor variations. The AI can be trained on transliterations of these texts, in which cuneiform characters have been written in the Latin alphabet according to the way they sound (in the same way that Chinese characters can be written in Pinyin, their Mandarin pronunciation). The AI can then predict which cuneiform signs are likely to be in the missing segments. It can also search for a particular cuneiform sign in a huge database of fragments.
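To give a rough feel for how a gene-sequence style comparison might be applied to transliterated text, here is a minimal sketch of local sequence alignment (the Smith-Waterman scoring scheme, a standard technique from bioinformatics). The article does not say which algorithm the Fragmentarium actually uses, so this is only an illustration under that assumption. The function names, the scoring values and the sign tokens (`s1`, `s2`, ...) are all hypothetical placeholders, not real cuneiform data.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two token lists.

    The scoring values are illustrative assumptions, not taken from the article.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_fragments(query, fragments):
    """Rank candidate fragments by how well they align with the query."""
    scored = [(local_alignment_score(query, signs), name)
              for name, signs in fragments.items()]
    return sorted(scored, reverse=True)


# Hypothetical transliterated sign sequences (placeholders only).
query = ["s1", "s2", "s3", "s4"]
fragments = {
    "frag_A": ["s9", "s1", "s2", "s3", "s4", "s8"],  # contains the full run
    "frag_B": ["s1", "s7", "s3", "s6"],              # a variant with differences
    "frag_C": ["s5", "s6"],                          # unrelated
}
ranking = rank_fragments(query, fragments)
```

Because the alignment is local, a short fragment can score well against a small stretch of a much longer text, and minor variations between copies only lower the score rather than ruling out a match.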
In 2019, this approach assisted with the identification of several missing pieces of the Epic of Gilgamesh, as well as revealing a new genre of ancient literature: a text consisting of parodies (including jokes about donkey dung) that was used by school children to help them learn to write. And together with Anmar Fadhil at the University of Baghdad in Iraq, Jiménez is also piecing together another previously unknown genre, a hymn to a city, in this case the city of Babylon, featuring details of temple life and cultic prostitutes.
Then last year, in the world’s first fully autonomous cuneiform fragment identification using AI, a missing piece of the famous Poem of the Righteous Sufferer (which explores the question of why bad things happen to good people, and seems to be a precursor to the biblical Book of Job) was identified. “Humans would have missed this,” says Jiménez. [...]
To help wade through this sea of administrative information, the Machine Translation and Automated Analysis of Cuneiform Languages project was set up in 2017 by Heather Baker at the University of Toronto, and coordinated by Pagé-Perron. In the most recent experiments, different algorithms trained on 45,500 transliterated phrases, each consisting of up to 19 words, were tested for their ability to translate Sumerian words into English. Results published last year show that one particular algorithm could translate with an accuracy of 95 per cent. The system also pulls out key information from the texts, identifying categories such as people, places and gods.
Last year, computer scientist Gabriel Stanovsky at the Hebrew University of Jerusalem and his colleagues found a way to predict the text on missing parts of fragments, in a similar way to that of automatic prediction of words on mobile phones. They used a deep-learning AI, feeding it transliterations from 10,000 cuneiform tablets, written in Akkadian, and found that it could suggest contextually correct words to fill the gaps with an accuracy of 89 per cent. Another potential application of AI is the dating of tablets whose origin is unknown.
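The idea of suggesting a missing word from its surroundings can be illustrated with something far simpler than the deep-learning system described in the article. The sketch below is a plain counting model, not the researchers' method: it records which word appeared between each pair of neighbouring words in a training corpus and proposes the most frequent candidates for a gap. The function names and the word tokens (`w1`, `w2`, ...) are hypothetical placeholders rather than real Akkadian.

```python
from collections import Counter, defaultdict


def train_context_counts(sentences):
    """Count which word occurs between each (left, right) pair of neighbours."""
    counts = defaultdict(Counter)
    for words in sentences:
        # Pad so that words at the start and end also have two neighbours.
        padded = ["<s>"] + list(words) + ["</s>"]
        for left, mid, right in zip(padded, padded[1:], padded[2:]):
            counts[(left, right)][mid] += 1
    return counts


def suggest_fill(counts, left, right, k=3):
    """Return up to k candidate words for a gap between left and right."""
    candidates = counts.get((left, right), Counter())
    return [word for word, _ in candidates.most_common(k)]


# Hypothetical training corpus of transliterated lines (placeholders only).
corpus = [
    ["w1", "w2", "w3"],
    ["w1", "w2", "w3"],
    ["w1", "w4", "w3"],
    ["w5", "w2", "w6"],
]
counts = train_context_counts(corpus)
suggestions = suggest_fill(counts, "w1", "w3")
```

A neural model of the kind the article describes would look at much more context than the two adjacent words, but the underlying task is the same: rank candidate fillers for a gap by how well they fit what survives around it.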