Why is it so hard to type in Indigenous languages?

This article was originally published on The Conversation, an independent and nonprofit source of news, analysis and commentary from academic experts. Disclosure information is available on the original site.

___

Authors: Mark Turin, Associate professor, Department of Anthropology, University of British Columbia; and N̓a̓ṇ̓gáinúx̌v - Robyn Humchitt, Digitization, Information & Technology and Archival Manager, Heiltsuk First Nation

When it comes to digital access and internet technologies, some languages are still more equal than others. Speakers of majority languages, who type in English or text in Korean, assume their message will be transmitted accurately. But Indigenous language communities don’t share this same confidence.

Computers and smartphones don’t come with the ability to type all letters in all languages. The unique characters integral to many Indigenous languages are often mangled as they travel across the ether.

However, the inclusion of two capital letters needed to write Haíɫzaqvḷa in a recent update of the Unicode Standard means this Indigenous language can finally be written and read on all digital platforms.

Why did it take so long? And what challenges do Indigenous communities face when wanting to type in their languages?

Haíɫzaqv: “to act and speak correctly as a human being”

Haíɫzaqvḷa is the language of the Heiltsuk (Haíɫzaqv) Nation whose traditional homeland is Bella Bella, British Columbia. The language has had its own orthography, an agreed written form with established spelling conventions, since the 1970s.

Working in partnership with native speakers, a Dutch linguist was invited by tribal leadership to document their increasingly endangered language and develop learning resources. The results of this collaborative work included an alphabet chart, storybooks and a dictionary.

Before the advent of digital technologies, Indigenous communities used specially modified typewriters to represent their languages in print. Customized typewriters designed to support the Latin, Syllabics and Cherokee scripts allowed users to publish in Indigenous languages like Haíɫzaqvḷa.

The digital divide

The digital age has created many opportunities and some new challenges. The American Standard Code for Information Interchange (ASCII), the first computer text encoding standard, introduced in the early 1960s, did not support 44 of the 129 letters in the Haíɫzaqvḷa orthography. Special fonts and keyboards were required to render these characters on early desktop computers.
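
The gap is easy to see with a few lines of Python. The sketch below is only an illustration, not part of the authors' work: it takes the language name as spelled in this article, lists the letters that fall outside ASCII, and shows that an ASCII-only system refuses to store the word. The function name is made up for this example.

```python
# Illustrative sketch: which letters of a word fall outside 7-bit ASCII?
# The sample word is the language name as spelled in this article.

def non_ascii_letters(text):
    """Return the characters in `text` that ASCII cannot represent."""
    return [ch for ch in text if not ch.isascii()]

# Escapes keep the exact code points visible:
# U+00ED (i with acute), U+026B (l with middle tilde), U+1E37 (l with dot below).
LANGUAGE_NAME = "Ha\u00ed\u026bzaqv\u1e37a"

missing = non_ascii_letters(LANGUAGE_NAME)
print(missing)

try:
    LANGUAGE_NAME.encode("ascii")
except UnicodeEncodeError as err:
    # An ASCII-only system has no way to store these letters.
    print(f"ASCII cannot store this word: {err.reason}")
```

Three of the ten letters in this one word have no ASCII code at all, which is why early systems needed workarounds.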

Designers around the world produced countless fonts to support typing in digitally under-resourced languages, each using a unique font-keyboard pairing to encode a specific language.

But this system had a major weakness: when files using custom fonts were shared, both the creator and the recipient needed to have the same font installed on their device. And if a recipient wanted to send a reply, they would need a keyboard input system that paired with that same custom font. Without these elements in place, the missing characters would be shown as “tofu,” or worse yet, rendered as a random string of meaningless characters.
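
A rough sketch of why such files broke: in a font-keyboard pairing of this kind, the stored text is made of ordinary symbols, and only the custom font draws them as the intended letters. The mapping below is entirely hypothetical (it does not reproduce any real Haíɫzaqvḷa font), but it shows what a recipient without the font would see.

```python
# Hypothetical legacy font hack: the file stores plain ASCII symbols,
# and only the custom font draws them as the intended letters.
# This mapping is invented for illustration; it is not a real font's layout.
LEGACY_FONT_GLYPHS = {
    "!": "\u00ed",  # drawn as i with acute
    "@": "\u026b",  # drawn as l with middle tilde
    "#": "\u1e37",  # drawn as l with dot below
}

def view_with_custom_font(stored):
    """What a reader sees when the matching custom font is installed."""
    return "".join(LEGACY_FONT_GLYPHS.get(ch, ch) for ch in stored)

def view_without_custom_font(stored):
    """What a reader sees otherwise: the raw stored symbols."""
    return stored

stored_text = "Ha!@zaqv#a"
print(view_with_custom_font(stored_text))
print(view_without_custom_font(stored_text))
```

The text itself never contained the real letters, so searching, sorting and sharing all depended on every device having the same font.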

The Unicode Standard’s goal is to represent all characters required by all of the world’s languages and writing systems in digital form. Unicode now defines 154,998 characters covering 168 scripts and has fast become the chosen standard for digital character encoding. Yet, until version 16.0 of the standard, released in September 2024, two capital letters needed to write Haíɫzaqvḷa remained absent.

Encoding Haíɫzaqvḷa

Through a partnership between Heiltsuk Revitalization, the University of British Columbia and the international type design company Typotheque, we have been working to ensure that each and every Haíɫzaqvḷa character is consistently represented and accurately reproduced on all digital platforms and devices.

Before this community-led collaboration, it was not possible to fully encode Haíɫzaqvḷa in digital text. This meant that community members couldn’t access the full range of characters they needed to input their language digitally. That would be like typing English without having access to capital E or S, and relying on workarounds like Σ for E or ∫ for S.

Ensuring accurate character encoding that is predictable on all operating systems is a cornerstone of language justice. Yet the burden is still on communities to petition Unicode to have their scripts included, and the process is exacting.

Harder still, a proposal must consider whether other languages that use the same script might be impacted by the proposed additions, and then mitigate and navigate potential conflicts. The stakes are high for changes to the encoding standard: decisions are almost impossible to reverse on account of the need to maintain stability and ensure both backward and forward compatibility.

Important projects like the Script Encoding Initiative have for decades been helping communities to prepare technical proposals for the encoding of scripts and characters that are not yet supported by Unicode. There is still much work to be done.

Language rights and government documents

'Cúagilákv, also known as Jess H̓áust̓i, is a Haíɫzaqv leader, parent, educator and poet from Bella Bella. In 2021, H̓áust̓i approached Canadian government agencies, both provincial and federal, to change Haíɫzaqv identification documents to remove colonial anglicizations and reclaim the correct spelling of their name.

H̓áust̓i was informed that the existing backend systems were unable to accommodate the representation of diacritic marks.

“The reason why I have an incorrect name is because it was anglicized by Indian agents. I didn’t create the problem, but I’m not getting any help to fix that,” H̓áust̓i told CBC News in 2021. “I feel that it’s important to honour my ancestors and my language by spelling and pronouncing it correctly. I would love for my children to grow up with the correct spelling of their name on their ID.”

The ability to fully encode Haíɫzaqvḷa in the Unicode Standard means the language can now be successfully input into any Unicode-compliant system. This is a baseline requirement for the elimination of many remaining digital language barriers.
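
One practical detail for any system that stores such names: Unicode allows the same accented letter to be written as a single precomposed character or as a base letter plus a combining mark. The sketch below assumes a backend that holds names as Unicode strings and uses Python's standard `unicodedata.normalize` so that both spellings compare as equal. The helper name `same_text` is made up for this example, and nothing here describes how any government system actually works.

```python
import unicodedata

def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The same word typed two ways:
precomposed = "Ha\u00ed\u026bzaqv"    # i with acute as one code point
decomposed = "Hai\u0301\u026bzaqv"    # plain i followed by combining acute

print(precomposed == decomposed)           # raw comparison of code points
print(same_text(precomposed, decomposed))  # comparison after normalization

# Some letters have no precomposed form, so they stay as base + mark
# even after NFC; a system must keep combining marks intact.
h_with_comma_above = unicodedata.normalize("NFC", "H\u0313")
print(len(h_with_comma_above))
```

A system that strips or rejects combining marks would still mangle a name like H̓áust̓i even though every character in it is encoded in Unicode.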

Beyond bilingualism

Canada is fond of celebrating its commitment to bilingualism. Extensive provisions are in place to support English and French. But these colonial languages originated in Europe and were brought by settlers who first traded and then colonized; both have vibrant speech communities in their original homelands and around the globe.

In 2019, the Canadian government passed the Indigenous Languages Act, designed to support the revitalization, maintenance and strengthening of the languages Indigenous to this land.

As Canada works to implement the United Nations Declaration on the Rights of Indigenous Peoples, it should also simultaneously realize the slogan of the Unicode Consortium: “everyone in the world should be able to use their own language on phones and computers.”

The challenges to achieving universal encoding for historically marginalized languages are no longer technical; they are bureaucratic and political. In 2009, Canada’s then Commissioner of Official Languages, Graham Fraser, was quoted as saying:

“In the same way that race is at the core of … American experience and class is at the core of British experience, I think that language is at the core of Canadian experience.”

Through ensuring linguistic justice for all of its citizens, Canada can exercise global leadership in language policy and planning.

This article was co-authored by Bridget Chase, a language technologist and researcher, and Kevin King, a typeface designer at Typotheque.

___

Mark Turin receives funding from the Social Sciences and Humanities Research Council of Canada.

N̓a̓ṇ̓gáinúx̌v - Robyn Humchitt has received funding from the First Peoples' Cultural Council in British Columbia, Canada.

___

This article is republished from The Conversation under a Creative Commons license. Disclosure information is available on the original site. Read the original article: https://theconversation.com/why-is-it-so-hard-to-type-in-indigenous-languages-245247
