Mike's Notes
I'm looking for a list in English of standard phrases or terms commonly used in the User Interface (UI).
For example:
- Align Left
- Align Middle
- Align Right
- Cancel
- Clear
- Close
- Copy
- Create
- Cut
- Delete
This would serve as the basis for creating an online translation database that supports multiple languages and writing systems, again using the words commonly used. It seems pointless to reinvent the wheel.
Maybe 500 common phrases would be enough to make the basic website UI and user workspaces available in different languages, even if only to sign up, create a profile, and help with community translation.
Resources
- https://www.blog.ajabbi.com/2025/09/the-rise-and-fall-of-standard-user.html
- https://www.reddit.com/r/TranslationStudies/comments/1ayr1i1/how_to_turn_a_tbx_file_into_a_spreadsheet/
- https://learn.microsoft.com/en-us/globalization/reference/microsoft-language-resources
- https://download.microsoft.com/download/b/2/d/b2db7a7c-8d33-47f3-b2c1-ee5e6445cf45/MicrosoftTermCollection.zip
- https://en.wikipedia.org/wiki/TermBase_eXchange
- https://resources.gala-global.org/lisa-oscar/
- https://appstore.rws.com/Plugin/198
- https://cerebus.de/glossaryconverter/help/Introduction.html
- https://www.tbxinfo.net/
- https://www.tbxinfo.net/tbx-support/
- https://www.termbases.eu/
- https://i18n.ajabbi.com/
- https://en.wikipedia.org/wiki/Translate_Toolkit
- https://translatewiki.net/wiki/Translating:MediaWiki
Last Updated
23/09/2025
UI Phrases
Mike is the inventor and architect of Pipi and the founder of Ajabbi.
I have been experimenting with importing the Microsoft Term Collection. It is a freely downloadable zip file containing a collection of TBX files, one per language. TBX is an XML format with the file extension ".tbx". There seem to be 110+ languages. Each record in a file contains a source term in US English and a translated term in the target language, e.g. French, German, or Ukrainian.
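To make the record structure concrete, here is a minimal Python sketch that reads source and target terms from one TBX file using only the standard library. It assumes the older, un-namespaced TBX layout (termEntry elements containing one langSet per language, each with one or more term elements) and that the source language is tagged "en-US". The function name read_term_pairs is my own, and the real files may need adjustments.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_term_pairs(tbx_source, source_lang="en-US"):
    """Yield (source_term, target_lang, target_term) tuples from a TBX file.

    tbx_source may be a file path or a binary file object.
    Assumption: elements are not namespaced, as in older TBX files.
    """
    tree = ET.parse(tbx_source)
    for entry in tree.iter("termEntry"):
        # Collect every term in this entry, grouped by language tag.
        terms_by_lang = {}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG) or ""
            for term in lang_set.iter("term"):
                text = (term.text or "").strip()
                if text:
                    terms_by_lang.setdefault(lang, []).append(text)

        # Pair each source-language term with each term in other languages.
        # Language tags are compared case-insensitively.
        for lang, source_terms in terms_by_lang.items():
            if lang.lower() != source_lang.lower():
                continue
            for source_term in source_terms:
                for other_lang, targets in terms_by_lang.items():
                    if other_lang.lower() == source_lang.lower():
                        continue
                    for target_term in targets:
                        yield source_term, other_lang, target_term
```

Calling list(read_term_pairs("some-language.tbx")) on a file with that layout would give one tuple per source/target pair; the file name here is only a placeholder.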
Success
I was able to extract the terms that Microsoft uses. There are about 37,000 unique English terms. Some of them are useful, most are not, while others that are needed are missing. But it is a start.
Language Tools
To extract the language translations, the main challenge is converting the XML files into a tab-delimited format with the correct Unicode encoding.
I'm trying a range of tools to see what works.
There is a list of TBX tools here:
https://www.tbxinfo.net/tbx-support/
I used the free tool Glossary Converter to batch import the 110+ language TBX files and convert them to 110+ Excel 2007 format .xlsx files, ready for batch import into the translation database.
Only four languages had a conversion problem:
- Central Kurdish
- Punjabi (Arabic)
- Tatar (Cyrillic)
- Wayuu
This probably happened because I'm still learning how to use Glossary Converter and had the wrong setting for handling terms.
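As a possible scripted alternative to a GUI converter, the tab-delimited step could be sketched in Python as below. This is not what Glossary Converter does internally; it is only an illustration of writing term rows as tab-separated text with an explicit Unicode encoding. The function name write_tsv and the column names are assumptions of mine.

```python
import csv


def write_tsv(rows, path, encoding="utf-8"):
    """Write (source_term, target_lang, target_term) rows as tab-delimited text.

    The encoding is stated explicitly so that non-Latin scripts survive intact.
    Passing encoding="utf-8-sig" adds a byte order mark, which some
    spreadsheet programs use to detect UTF-8.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["source_term", "target_lang", "target_term"])
        writer.writerows(rows)
```

The csv module quotes any field that happens to contain a tab or a line break, so unusual terms should not break the column layout.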
Importing
Importing has worked so far for these language files.
- Afrikaans
- French
- Hebrew
- German
- Māori
- Russian
- Spanish
- Ukrainian
TermBase eXchange (TBX)
"TermBase eXchange (TBX) is an open XML-based standard that allows you to represent structured, concept-oriented terminological data in a database, which is known as termbase." - Wordbee
.tbx File Format
"TermBase eXchange (TBX) is an international standard (ISO 30042:2019) for the representation of structured concept-oriented terminological data, copublished by ISO and the Localization Industry Standards Association (LISA). Originally released in 2002 by LISA's OSCAR special interest group, TBX was adopted by ISO TC 37 in 2008. In 2019 ISO 30042:2008 was withdrawn and revised by ISO 30042:2019. It is currently available as an ISO standard and as an open, industry standard, available at no charge.
TBX defines an XML format for the exchange of terminology data, and is "an industry standard for terminology exchange"." - Wikipedia
Translation Website
To accommodate the translation needs of Ajabbi, a static translation website using a subdomain is now being created. It will be batch-rendered from a database containing a list of terms in each language and writing system (script), with a page for each term and its usage.
Database
As the database is updated, the website will grow. It will begin with English-UK spelling and Māori, both of which use the Latin script, and gradually expand to cover all languages and dialects requested by users.
The database data model should be compatible with importing and exporting TBX.
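One way to keep the data model close to TBX is to mirror its concept-oriented shape: one row per concept, and one row per term in each language. The SQLite sketch below is only an illustration of that idea, not the actual Ajabbi schema; the table names, column names, and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical schema: a concept groups the terms that mean the same thing,
# much like a TBX termEntry groups its langSet elements.
SCHEMA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    note       TEXT
);
CREATE TABLE term (
    term_id    INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    lang_tag   TEXT NOT NULL,
    term_text  TEXT NOT NULL,
    UNIQUE (concept_id, lang_tag, term_text)
);
"""


def create_termbase(path=":memory:"):
    """Create an empty termbase and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_concept(conn, terms_by_lang):
    """Insert one concept with one term per language tag; return its id."""
    cursor = conn.execute("INSERT INTO concept (note) VALUES (NULL)")
    concept_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO term (concept_id, lang_tag, term_text) VALUES (?, ?, ?)",
        [(concept_id, lang, text) for lang, text in terms_by_lang.items()],
    )
    conn.commit()
    return concept_id
```

For example, add_concept(conn, {"en-GB": "Cancel", "fr-FR": "Annuler"}) stores both terms under a single concept, which maps naturally onto one TBX entry on export.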
Translation Website URLs
Many languages have a local dialect (localisation) and sometimes use more than one script, so there needs to be a robust pattern language for the automated naming of URLs. The proposed naming pattern is a 3-letter language code, plus a 4-letter script code, plus a 2-letter country code. If there is no variation, use the 3-letter language code only.
Some examples to think about:
- i18n.ajabbi.com/mri/.../....html
- i18n.ajabbi.com/eng-uk/term/save-as.html
- i18n.ajabbi.com/heb-hebr/.../....html
- i18n.ajabbi.com/heb-latn/.../....html
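The naming pattern can be sketched as a small Python helper. This is only a possible reading of the pattern: the "/term/" path segment is taken from the single "save-as" example above, and the function names and the slug rule (lowercase, hyphens between words) are assumptions.

```python
import re

BASE = "i18n.ajabbi.com"


def locale_segment(language, script=None, country=None):
    """Join language, optional script, and optional country with hyphens."""
    parts = [language, script, country]
    return "-".join(part.lower() for part in parts if part)


def slugify(term):
    """Lowercase an English term and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def term_url(term, language, script=None, country=None):
    """Build the page URL for one term in one language variant."""
    segment = locale_segment(language, script, country)
    return f"{BASE}/{segment}/term/{slugify(term)}.html"
```

With these rules, term_url("Save As", "eng", country="uk") gives "i18n.ajabbi.com/eng-uk/term/save-as.html", and locale_segment("heb", script="hebr") gives "heb-hebr". Slugs are built from the English source term here, which sidesteps the question of non-Latin characters in URLs.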
Translation Workspace
A dedicated workspace for users to help translate phrases in the UI should be built. Later, this could be extended to include translating help documentation, content, and other materials, similar to how Wikipedia or OpenOffice enable community translation efforts. Of course, each workspace needs to be available in any of the 110+ languages imported so far.
Open-source
The phrase library on the translation website should be made freely available for download in multiple formats, including TBX, CSV, and database formats.
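As a rough idea of what a TBX download could involve, the Python sketch below builds a very small TBX-style document from a list of concepts. It follows the general element layout of older TBX files (martif, termEntry, langSet, tig, term) but is simplified and has not been validated against the ISO 30042 schema. The function name build_tbx and the header text are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this attribute name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tbx(concepts, source_lang="en"):
    """Build a simplified TBX-style tree.

    concepts is a list of dicts mapping a language tag to one term,
    e.g. [{"en": "Cancel", "fr": "Annuler"}].
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})

    # Minimal header with a placeholder description of the source.
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Exported phrase library (illustrative)"

    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for number, terms_by_lang in enumerate(concepts, start=1):
        # One termEntry per concept, one langSet per language.
        entry = ET.SubElement(body, "termEntry", {"id": f"c{number}"})
        for lang, term in terms_by_lang.items():
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.ElementTree(martif)
```

The result can be saved with tree.write("phrases.tbx", encoding="utf-8", xml_declaration=True), where the file name is only an example. A real export would also need checking in a TBX validator before publishing.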