Tech Professor Uses AI To Preserve Cherokee Language



Tracey Hackett
A Tennessee Tech University computer science professor is part of a collaborative effort using artificial intelligence to help preserve the Cherokee language, one of many endangered Indigenous languages in the United States.

Jesse Roberts is an expert in natural language processing and computational linguistics, and his interest in language preservation was sparked by work being done in Ireland to protect the Gaelic language through AI-driven methods.

"My research looks at how we can use computers to process language, reason, and analyze meaning," Roberts said. "When I saw what was being done in Ireland with Gaelic, I started thinking about how similar techniques could be used for Indigenous languages - particularly Cherokee - here in the U.S."

Cherokee is a polysynthetic language, which works something like Lego blocks for language: many smaller word parts are combined into long, complex words that can express what would take an entire sentence in English.

The language also presents unique challenges for AI modeling. Unlike widely spoken languages such as English, which have vast digital datasets for training AI, Cherokee lacks the same breadth of resources. This makes it difficult to develop AI-driven tools for language learning and preservation.

"This is a typical problem in low-resource languages," Roberts explained. "With English, we have immense amounts of data available to train AI models. Cherokee, on the other hand, has very little online presence, making it difficult to teach AI how to understand and generate it."

In fact, Roberts notes, fewer than 140 first-language Cherokee speakers remain, most of them over the age of 60.

One of the primary goals of the project, he said, is to develop AI systems that can go beyond simply recording spoken Cherokee.

While documentation is essential, Roberts and his collaborators envision a more interactive approach - one where AI can engage with users in conversation, serving as a tool for language learners, educators, and even museum installations.

Ben Frey, a linguist at the University of North Carolina at Asheville and a member of the Eastern Band of Cherokee Indians, is a key collaborator with Roberts - and the project involves other Cherokee language educators and community leaders as well.

"Ben is classically trained in linguistics and has immersive experience in Eastern Band Cherokee. His expertise helps us make sure we're not just preserving words but also the deeper cultural meanings embedded in the language," Roberts said.

Community involvement is critical to the project. The researchers are working closely with James "Bo" Taylor, cultural resource officer with the Eastern Band of Cherokee Indians, and other representatives and organizations to ensure that the AI tools align with the needs and expectations of Cherokee speakers.

"There are strong opinions on how preservation resources should be used," Roberts said. "It's a delicate balance - time spent training an AI model is time that a fluent speaker isn't spending directly teaching another person. We want this project to be synergistic, helping rather than hindering language transmission."

That's why the team is structuring its work to overlap as much as possible with existing language teaching efforts.

The long-term vision for the project is to create an AI model that can facilitate meaningful conversations in Cherokee. While this is still years away, incremental progress could lead to AI-powered tools for distance learning, museum exhibits and language education programs.

"Every step forward helps," Roberts said. "The more we can document, analyze, and model, the better equipped we are to preserve the language for future generations."

Beyond Cherokee, the research has broader implications for other endangered languages worldwide. Similar AI-driven projects could aid in revitalization efforts for African languages, Iroquoian languages and other polysynthetic linguistic systems.

"There's no silver bullet for language preservation," Roberts said. "But AI gives us leverage, room for innovation, and new ways to engage people who want to learn. Language loss is a global issue, and our hope is that the methods we develop here can be transferred to help other communities as well."

According to a 2023 report by the Language Conservancy, approximately nine languages are lost each year. That means a language dies on average every 40 days.

The stakes are high for Roberts and his collaborators. Without active intervention, he estimates that the Cherokee language could disappear within a generation. As the project moves forward, the team encourages those interested in language preservation, AI or Cherokee culture to get involved.

"When a language dies, you don't just lose words - you lose a worldview, a way of understanding and interacting with the world," Roberts said, quoting United Nations Educational, Scientific and Cultural Organization (UNESCO) scholar Lucía Iglesias Kuntz. "Our goal is to help ensure that doesn't happen."