On a Sandy Beach: The i18n Issue

Mike's Notes

This is a copy of the March issue of Ajabbi Research.

It is about the history of the effort to make Pipi available in any language, localisation, or script requested by users (i18n).

Ajabbi Research is published on SubStack on the first Friday of each month, and subscriptions are free.

Each issue is a broad historical overview of a research topic, serving as an index to dozens of previously posted related articles. There are now over 650 articles/posts.

This copy of the issue will be updated with additional information as it becomes available. Check the Last Updated date given below.

Eventually, each issue will be reused on the separate Ajabbi Research website as an introduction to a research area comprising multiple research projects.

Resources

https://ajabbi.substack.com/archive

References

SIL.
Unicode

Repository

Home > Ajabbi Research > Library >
Home > Handbook >

Last Updated

19/04/2026

The i18n Issue

By: Mike Peters

Ajabbi Research: 6/03/2026

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

This is the story of the effort to make Pipi available in any human language and script. The steps taken have been part of Pipi's development since 2005, spanning 5 versions.

The NZERN Pipi 2003-2005 Development Plan started it all.

https://www.blog.ajabbi.com/2016/10/nzern-pipi-2003-2005-development-plan.html

Pipi 4 (2005-2008)

The story starts with Pipi 4. It was a big, successful system that supported community-driven Ecological Restoration in NZ. Here is a history of that Pipi version.

https://www.blog.ajabbi.com/2017/09/pipi-4-2005-2008.html

Initially, the large websites that Pipi generated were in English only. Then, as botanical and zoological information was added, Latin, English, and Maori names were used. Eventually, provision for Chinese was planned to support the Chinese community-led conservation programmes. There were no separate language data structures in the 850-table Pipi 4 database; instead, some entities had additional columns for each language.

English

The 25,000 pages of websites that Pipi generated were initially in English.

Latin

Scientific names used in biological data were written in Latin.

Maori

Over time, it was realised that support for the Maori Language, Te Reo Maori, was required. To start, bilingual volunteers provided lists of words for regional areas, towns, etc.

Chinese

An Auckland-based, Chinese community-driven initiative aimed to share conservation information with older residents who didn't speak English.

Pipi 6 (2017-2019)

When Pipi was rebuilt from memory, based on the limited experience with Pipi 4, foundational work was done to prepare Pipi for better multilingual support. This would require extra databases.

https://www.blog.ajabbi.com/2020/01/pipi-6-2017-2019.html

Metadata using international codes was added to every database table to enable future language usage. The codes used were ISO 639-3 (language), ISO 3166-1 alpha-3 (country), Unicode (script), and CLDR/LDML (locale data).

https://www.blog.ajabbi.com/2019/04/i18n-internationalisation-and.html

ISO 639-3

ISO 639 gives comprehensive provisions for the identification and assignment of language identifiers to individual languages, and for the creation of new language code elements or for the modification of existing ones (Terms of Reference of the ISO639/MA). - ISO 639-3

***

It defines three-letter codes for identifying languages. The standard was published by the International Organisation for Standardisation (ISO) on 1 February 2007. As of 2023, this edition of the standard has been officially withdrawn and replaced by ISO 639:2023.

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes used in the Ethnologue (volumes 10–14) published by SIL International, which is now the registration authority for ISO 639-3. It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten. However, it does not include reconstructed languages such as Proto-Indo-European.

ISO 639-3 is intended for use as metadata codes in a wide range of applications. It is widely used in computer and information systems, such as the Internet, in which many languages need to be supported. In archives and other information storage, it is used in cataloging systems, indicating what language a resource is in or about. The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous. - Wikipedia

Examples

eng (English)
fra (French)

ISO 3166-1 alpha-3

ISO 3166-1 alpha-3 codes are three-letter country codes defined in ISO 3166-1, part of the ISO 3166 standard published by the International Organization for Standardization (ISO), to represent countries, dependent territories, and special areas of geographical interest. They allow a better visual association between the codes and the country names than the two-letter alpha-2 codes (the third set of codes is numeric and hence offers no visual association). They were first included as part of the ISO 3166 standard in its first edition in 1974. - Wikipedia

Examples

ABW (Aruba)
AFG (Afghanistan)
AGO (Angola)

Unicode

Unicode (also known as The Unicode Standard and TUS) is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 17.0 defines 159,801 characters and 172 scripts used in various ordinary, literary, academic and technical contexts. - Wikipedia

Examples (four-letter ISO 15924 script codes, as used by Unicode)

Latn (Latin)
Linb (Linear B)
Hebr (Hebrew)

CLDR/LDML

The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. CLDR is written in the Locale Data Markup Language (LDML). - Wikipedia

Example

<?xml version="1.0" encoding="UTF-8" ?>
<ldml>
  <version number="1.1">ldml version 1.1</version>
</ldml>
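As a rough illustration of how an application might read such a file, the sketch below parses the LDML example above with Python's standard library and extracts the version number. The function name ldml_version is hypothetical, and nothing here describes how Pipi itself processes CLDR data.

```python
import xml.etree.ElementTree as ET

# The LDML example from the text, held as bytes so the parser can
# honour the encoding named in the XML declaration.
LDML_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<ldml>
<version number="1.1">ldml version 1.1</version>
</ldml>
"""


def ldml_version(document: bytes) -> str:
    """Return the 'number' attribute of the <version> element (hypothetical helper)."""
    root = ET.fromstring(document)
    if root.tag != "ldml":
        raise ValueError(f"expected <ldml> root, found <{root.tag}>")
    version = root.find("version")
    if version is None:
        raise ValueError("no <version> element found")
    return version.get("number", "")


print(ldml_version(LDML_SAMPLE))
```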

Localisation (L10N)

Language localisation (or language localization) is the process of adapting a product's translation to a specific country or region. It is the second phase of a larger process of product translation and cultural adaptation (for specific countries, regions, cultures or groups) to account for differences in distinct markets, a process known as internationalisation and localisation. - Wikipedia

***

Internally, Pipi automatically stores and uses 3-letter language codes, 4-letter script codes, and 3-letter country codes to define locales.

Examples

eng-Latn-NZL (New Zealand English)
eng-Latn-USA (United States English)
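A minimal sketch of how a language-script-country identifier of this shape could be parsed and validated is given below. The Locale class, the parse_locale function, and the regular expression are assumptions made for illustration, not Pipi's actual implementation. The script part is assumed to be an ISO 15924 code such as Latn.

```python
import re
from dataclasses import dataclass

# Assumed shape: 3 lowercase letters, 4-letter title-case script, 3 uppercase letters.
_LOCALE_RE = re.compile(r"^([a-z]{3})-([A-Z][a-z]{3})-([A-Z]{3})$")


@dataclass(frozen=True)
class Locale:
    language: str  # ISO 639-3 language code, e.g. "eng"
    script: str    # ISO 15924 script code, e.g. "Latn"
    country: str   # ISO 3166-1 alpha-3 country code, e.g. "USA"

    def __str__(self) -> str:
        return f"{self.language}-{self.script}-{self.country}"


def parse_locale(text: str) -> Locale:
    """Split an identifier such as 'eng-Latn-USA' into its three parts."""
    match = _LOCALE_RE.match(text)
    if match is None:
        raise ValueError(f"not a language-script-country locale: {text!r}")
    return Locale(*match.groups())


us_english = parse_locale("eng-Latn-USA")
print(us_english.language, us_english.script, us_english.country)
```

This sketch only checks the shape of each part. A fuller version would also check each code against the published ISO lists.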

Customers can configure the options for their own websites.

Examples

en-NZ
en-GB
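The source does not say how the internal identifiers relate to the shorter tags that customers choose. One plausible approach, sketched here under that assumption, is a lookup from the three-letter codes to their two-letter equivalents. The tables and the to_short_tag function are hypothetical and deliberately tiny.

```python
# Illustrative lookup tables only; a real system would load the full ISO code lists.
LANGUAGE_3_TO_2 = {"eng": "en", "fra": "fr"}
COUNTRY_3_TO_2 = {"NZL": "NZ", "USA": "US", "FRA": "FR"}


def to_short_tag(internal: str) -> str:
    """Convert an 'eng-Latn-USA' style identifier to an 'en-US' style tag."""
    language, _script, country = internal.split("-")
    return f"{LANGUAGE_3_TO_2[language]}-{COUNTRY_3_TO_2[country]}"


print(to_short_tag("eng-Latn-USA"))
```

The script part is dropped in this sketch because the short tags shown above do not carry one.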

More information

Pipi 7 (2020)

Small, simple, static HTML mockups of websites were created to test how different languages could be used. Experiments with HTML and CSS were conducted to display text on a website in Left-to-Right (LTR) and Right-to-Left (RTL) word order.

https://www.blog.ajabbi.com/2021/03/pipi-7-2020.html
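As an illustration of the LTR and RTL experiments described above, the following sketch picks an HTML dir attribute from a script code. The set of right-to-left scripts is a small, assumed sample, and the function names are hypothetical. The original mockups were static HTML and CSS, so this shows only one way such markup could be generated.

```python
from html import escape

# Assumption: a small sample of right-to-left ISO 15924 script codes.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa"}


def text_direction(script: str) -> str:
    """Return the HTML 'dir' value for a script code."""
    return "rtl" if script in RTL_SCRIPTS else "ltr"


def render_paragraph(text: str, lang: str, script: str) -> str:
    """Wrap text in a <p> element carrying lang and dir attributes."""
    direction = text_direction(script)
    return f'<p lang="{escape(lang)}" dir="{direction}">{escape(text)}</p>'


print(render_paragraph("שלום", "he", "Hebr"))
print(render_paragraph("Hello", "en", "Latn"))
```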

Pipi 8 (2021-2022)

System-wide i18n and L10N namespaces were implemented throughout Pipi to enable reliable automation and rapid scaling across multiple languages.

https://www.blog.ajabbi.com/2022/12/pipi-8-2021-2022.html

Pipi 9 (2023-2026)

Pipi 9 joins up all the built systems to self-generate documentation and a front-end User Interface (UI).

https://www.blog.ajabbi.com/2023/12/pipi-9.html

Experiments were conducted to determine how to integrate i18n support with Pipi's other features. It was confirmed that the Pipi core is written in British Standard English and checked by Grammarly.

https://www.blog.ajabbi.com/2023/12/incorporating-languages-for-i18n.html

A source-target data model structure was created to store i18n scripts. It was greatly influenced by the system used by Wikipedia (MediaWiki) and OpenOffice.
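A source-target structure is often modelled as one source string with many target translations, falling back to the source when a translation is missing. The sketch below shows that general idea only. The class name, field names, and fallback rule are assumptions and do not describe Pipi's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SourceString:
    key: str                      # stable identifier for the string
    text: str                     # source text, assumed to be British English
    locale: str = "eng-Latn-GBR"  # assumed source locale
    targets: dict[str, str] = field(default_factory=dict)

    def add_target(self, locale: str, text: str) -> None:
        """Store a translation of the source text for one target locale."""
        self.targets[locale] = text

    def resolve(self, locale: str) -> str:
        """Return the translation for a locale, or the source text if none exists."""
        return self.targets.get(locale, self.text)


greeting = SourceString(key="home.welcome", text="Welcome")
greeting.add_target("fra-Latn-FRA", "Bienvenue")
print(greeting.resolve("fra-Latn-FRA"))
print(greeting.resolve("deu-Latn-DEU"))
```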

Experiments were done using 23 languages and writing scripts to test the CMS Engine (cms), data storage, UI layout, etc.

String Translation

Community translation will require a dedicated workspace.

Account Settings

Each Pipi is built in 1 language and script. Each account can have many languages. An account has many deployments, each in only one language. A deployment can have many workspaces.
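The relationships described above can be sketched as a small set of data classes. All names here are hypothetical, and the rule that a deployment's locale must already be enabled on the account is an assumption rather than something the source states.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str


@dataclass
class Deployment:
    locale: str  # exactly one language and script per deployment
    workspaces: list[Workspace] = field(default_factory=list)


@dataclass
class Account:
    name: str
    locales: set[str] = field(default_factory=set)               # many languages
    deployments: list[Deployment] = field(default_factory=list)  # many deployments

    def deploy(self, locale: str) -> Deployment:
        """Create a deployment in one of the account's enabled locales."""
        if locale not in self.locales:
            raise ValueError(f"locale {locale!r} is not enabled for this account")
        deployment = Deployment(locale)
        self.deployments.append(deployment)
        return deployment


account = Account("demo", locales={"eng-Latn-USA", "fra-Latn-FRA"})
site = account.deploy("fra-Latn-FRA")
site.workspaces.append(Workspace("editing"))
print(site.locale, [w.name for w in site.workspaces])
```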

Localisation

API Endpoints

All API connections include a choice of API version and language/script.
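The source does not show what these endpoints look like. As a purely hypothetical sketch, one layout would place the API version and the language/script in the URL path. The api_url function and the api.example.com host are invented for illustration.

```python
from urllib.parse import urlencode


def api_url(base: str, version: int, locale: str, resource: str, **params: str) -> str:
    """Build a URL of the assumed form <base>/v<version>/<locale>/<resource>."""
    path = f"{base.rstrip('/')}/v{version}/{locale}/{resource.lstrip('/')}"
    return f"{path}?{urlencode(params)}" if params else path


print(api_url("https://api.example.com", 1, "eng-Latn-USA", "pages"))
print(api_url("https://api.example.com/", 2, "fra-Latn-FRA", "/pages", q="kiwi"))
```

Carrying the same two values in request headers or query parameters would work equally well; the path form is used here only because it is easy to read.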

Script Font

Noto from Google was chosen as the default font for Ajabbi due to the number of scripts it supports.

Keyman

SIL provides Keyman, an open-source tool that enables keyboards for 2,500 different languages to be added to websites. Pipi will use a Keyboard Engine (kyb) to provide this integration. This will be built as part of Pipi 10.

URL Naming Pattern

Many experiments were conducted to determine a URL structure that could accommodate websites in many languages. Wikipedia was the main influence.

Examples

eng.example.com
example.com/eng/
en-gb.example.com
example.com/en-nz/
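The two patterns in these examples, a language subdomain and a language path prefix, can be sketched as two small helpers. The function names and the choice of https are assumptions. Codes are lower-cased to match the examples.

```python
def subdomain_url(domain: str, code: str) -> str:
    """Place the language code in a subdomain, e.g. eng.example.com."""
    return f"https://{code.lower()}.{domain}/"


def path_url(domain: str, code: str) -> str:
    """Place the language code in the first path segment, e.g. example.com/eng/."""
    return f"https://{domain}/{code.lower()}/"


for code in ("eng", "en-NZ"):
    print(subdomain_url("example.com", code))
    print(path_url("example.com", code))
```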

Documentation

Documentation and Learning material will need to be provided in many languages. The data models are ready for this. As the British English documentation is completed, it could be auto-translated into US English using Grammarly and into the 9 world languages using Google Translate. It would then need to be checked by volunteer users. This is speculative and will require trial and error to confirm.

Language Prioritisation

English > 9 world languages > 7000 local languages + localisation.

Priority will be given to English, which will then serve as the source for translation into 9 world languages.

Arabic
Bahasa Indonesia
Chinese
French
German
Japanese
Hindi
Portuguese
Russian
Spanish

Carefully edited material in those languages can then be translated into any of the other 7,000 languages by volunteers in response to user requests.
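The prioritisation above can be sketched as three tiers, with each tier translated from the tiers before it. The ISO 639-3 codes are real, but the tier and allowed_sources functions are hypothetical and only illustrate the ordering.

```python
# ISO 639-3 codes for the world languages listed above.
WORLD_LANGUAGES = {
    "ara", "ind", "zho", "fra", "deu", "jpn", "hin", "por", "rus", "spa",
}


def tier(code: str) -> int:
    """0 = English, 1 = world language, 2 = any other (local) language."""
    if code == "eng":
        return 0
    return 1 if code in WORLD_LANGUAGES else 2


def allowed_sources(target: str) -> set[str]:
    """Languages whose edited material may serve as a translation source."""
    level = tier(target)
    if level == 0:
        return set()      # English is the original source
    if level == 1:
        return {"eng"}    # world languages are translated from English
    return {"eng"} | WORLD_LANGUAGES


# Order a mixed request queue by priority; mri is the code for Maori.
print(sorted(["mri", "fra", "eng"], key=tier))
```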

Model-driven UI

The User Interface Description Language (UIDL) was an EU-funded project that was abandoned in 2010 after 10 years of excellent work. It was to enable accessibility on different screens and devices. The research results were reverse-engineered to build a User Interface Engine (usi) that would run in reverse to generate accessibility solutions for Pipi. The CSS Engine (css) replaced some redundant components of the UIDL project. Additional engines for localisation and personalisation were created.

https://www.blog.ajabbi.com/2023/12/model-driven-user-interfaces.html

Pipi CMS Engine (cms)

For a first teaching customer, a decision was made early on to autogenerate a separate website for each language (English, Māori, NZ Sign Language, and AAC picture language). This was the simplest solution for the CMS and the users.

https://www.blog.ajabbi.com/2025/06/some-initial-notes-on-what-my-first.html

Creating UI for each natural language, including sign languages (i18n), requires user requests and volunteer testers.

Sign Language

A scheme was devised to embed NZ Relay Video Interpreting on any webpage and in user workspaces. This is an ongoing experiment, driven by deaf people.

Picture Language

Professor Stephen Hawking used AAC via a computer-generated voice. There are many forms of AAC, including picture language. Providing this as a UI is being explored, with other AAC to follow. This is important for the millions of people with Cerebral Palsy and Motor Neurone Disease.

https://www.blog.ajabbi.com/2026/02/how-intel-gave-stephen-hawking-voice.html

Invented Languages

This system will be able to provide support for Klingon, Elvish, and other invented languages from books and movies, upon request and with volunteers prepared to do the work. This could be useful for fan communities.

Dead Languages

This system will be able to provide support for long-dead languages often studied by linguists and historians, such as Ancient Egyptian, Sumerian, Sanskrit, and Ancient Greek, upon request, with volunteers prepared to do the work. This could be useful for museums and faith communities.

Workspace personalisation

The workspace settings will eventually offer complete personalisation of the UI in other languages. This will use a personalisation form in account settings.

https://www.blog.ajabbi.com/2025/10/workspace-menu.html

Future Ajabbi Foundation Sponsorship

Once Ajabbi has established ongoing sponsorship of Ortus, which provides the open-source BoxLang, the Ajabbi Foundation will sponsor the open-source SIL Keyman on an ongoing basis.

What's next

Pipi 9 is available only in English. However, users can request any other language through their profile. Pipi 10 (2027-) will feature those multiple languages.

https://www.blog.ajabbi.com/2026/02/the-workspace-issue.html

Thanks

The most useful and inspiring resource has been SIL Global.

https://global.sil.org/

Dedication

Every child has the right to be educated in the language of their people and of their birth. This is dedicated to those working tirelessly to record, strengthen or revive human languages.
