Keys play a crucial role in every successful software localisation project, for two reasons. Firstly, a translation unit stored with a key will always overrule a traditional translation memory match. Secondly, keys make it possible to isolate and lock approved strings and to keep them separate from new strings. This is incredibly useful when coping with frequent sprints.
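The precedence of keys over translation memory matches can be illustrated with a minimal sketch. It does not reflect how any particular TMS is implemented: the function name `best_translation`, the two dictionaries and the 0.75 fuzzy threshold are all assumptions made for illustration, and the similarity score comes from Python's standard `difflib` rather than a real TM engine.

```python
from difflib import SequenceMatcher


def best_translation(key, source, keyed_units, tm, threshold=0.75):
    """Return (translation, origin) for a string; all names are hypothetical.

    keyed_units maps string keys to approved translations.
    tm maps source texts to translations, like a classic translation memory.
    """
    # A unit stored under the key always wins, whatever the TM contains.
    if key in keyed_units:
        return keyed_units[key], "key"

    # Otherwise fall back to the closest TM source text.
    best, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best, best_score = tm_target, score

    if best_score >= threshold:
        return best, "tm"
    return None, "new"  # nothing usable: the string needs translating
```

Strings that come back with the origin "key" are the ones that can be locked, while "new" strings are the ones to route to the translator in the current sprint.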
A proper import of software files into a translation management system (TMS) is usually preceded by preparatory engineering. There are many variables to deal with, including character limitations, embedded HTML tags and characters that should or shouldn’t be escaped.
Preparatory engineering is crucial and may require the software localisation engineer to think outside the box if they are to get the most out of the TMS. When dealing with Asian languages like Japanese, it is particularly important to parse and display every possible particularity. This will ensure accuracy while providing sufficient context and guidelines.
Japanese presents many potential pitfalls, including the fact that the language doesn’t use spaces to separate words. As a consequence, the regular line breaking algorithms don’t always apply. However, in other ways, Japanese can be more straightforward to tackle than some languages. Pluralization is certainly less of an issue than it is with Eastern European languages. Japanese has a single form for both singular and plural, whereas the translation of “%n items” changes with the value of “%n” in some Slavic languages, which use one form when “%n” equals 1, another when it equals 2 or 3, and yet another when it equals 5.
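The difference in plural handling can be sketched in a few lines. This is an illustration only: the Russian rule follows the integer plural categories published in Unicode CLDR, while the function names, the message table and the sample translations are assumptions rather than output from a real project.

```python
def plural_category(n: int, lang: str) -> str:
    """Return the plural category for a non-negative integer n."""
    if lang == "ja":
        return "other"  # Japanese uses one form for every count
    if lang == "ru":
        # Russian integer rules, as defined in Unicode CLDR
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    return "one" if n == 1 else "other"  # English-style default


# Illustrative translations of "%n items"; a real project would load these
# from its resource files.
MESSAGES = {
    "en": {"one": "{n} item", "other": "{n} items"},
    "ja": {"other": "{n} 個のアイテム"},
    "ru": {"one": "{n} элемент", "few": "{n} элемента", "many": "{n} элементов"},
}


def format_items(n: int, lang: str) -> str:
    """Pick the message variant that matches the plural category of n."""
    return MESSAGES[lang][plural_category(n, lang)].format(n=n)
```

Japanese needs a single translation unit for this string, whereas Russian needs three, so the TMS has to expose every plural variant of the key to the translator.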
Consider trading in pseudo translation for machine translation
Pseudo translation is the process of replacing source strings with random characters. This enables the software engineer to test whether international characters are displayed properly in the user interface. They can also check whether all source strings have been extracted from the software and/or imported into the TMS.
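A pseudo translation step might look like the sketch below. It is a deterministic variant rather than a truly random one: vowels and a few consonants are swapped for accented counterparts, the string is padded to simulate text expansion, and full-width brackets mark the start and end so that truncation is easy to spot. The placeholder pattern, the character mapping and the 30% expansion are assumptions chosen for illustration and would need adapting to the file format in question.

```python
import re

# Placeholders such as %n, %s, %1$s or {name}, and HTML tags, must survive
# pseudo translation untouched. The pattern is an assumption, not a standard.
PROTECTED = re.compile(r"(%(?:\d+\$)?[a-zA-Z]|\{[^{}]*\}|<[^<>]+>)")

# Map some ASCII letters to accented counterparts (illustrative choice).
ACCENTED = str.maketrans("aeiouAEIOUcnyCNY", "àéîöüÀÉÎÖÜçñýÇÑÝ")


def pseudo_translate(source: str, expansion: float = 0.3) -> str:
    """Return a pseudo translated copy of source with placeholders intact."""
    parts = PROTECTED.split(source)
    out = []
    for index, part in enumerate(parts):
        # re.split with one capturing group puts protected tokens at odd
        # indexes and ordinary text at even indexes.
        out.append(part if index % 2 else part.translate(ACCENTED))
    padding = "～" * int(len(source) * expansion)
    return "［" + "".join(out) + padding + "］"
```

Any string that appears in the UI without the brackets has either been truncated or was never extracted in the first place.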
There’s no doubt about the importance of pseudo translation. It is a feature of a complex discipline that is often undervalued by translation buyers and LSPs.
When dealing with Japanese, the pseudo translation could be exchanged for a dummy machine translation. A Japanese MT sample will reveal potential problems including corrupt characters and formatting issues. These can then be tackled before commencing the actual localisation process. A good MT pre-translation could also be the start of a post-editing project if time is at a premium or the budget is restricted.
Deliver working software strings back
There may be occasions when the machine translation generates corrupt Japanese characters. In addition, sometimes the developer may not manage to import the MT sample back into the user interface, even though the Japanese characters look good. In these cases, there’s probably a character set or character encoding issue that needs to be addressed.
Characters that are needed for a specific purpose in a computer environment are grouped in character sets such as ASCII or Unicode. Each character in a character set is associated with a number, known as a “code point”. An encoding provides a key to unlock the code through a set of mappings between the bytes in the computer and the characters in the character set. UTF-8, an encoding of Unicode, is the most common and safest encoding for JSON, XML and plain text files. This is because it is a “super encoding” that is able to represent every character in every living language. Nevertheless, developers often provide software strings in an encoding that works perfectly for English and certain target languages like French and Spanish, but not for Asian languages. This may cause Japanese characters to be exported corruptly from the TMS.
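The relationship between characters, code points and bytes can be checked with a few lines of code. The sketch below uses only Python's built-in string methods. The sample text and the choice of Latin-1 as the "Western" encoding are assumptions made for illustration.

```python
text = "日本語"  # sample Japanese text

# Each character has a code point in the Unicode character set.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# UTF-8 maps every code point to bytes and back without loss.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# A Western encoding such as Latin-1 has no mapping for these characters.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Reading UTF-8 bytes with the wrong encoding yields corrupt characters.
mojibake = utf8_bytes.decode("latin-1")
```

Here `round_trip` equals the original text and `latin1_ok` ends up `False`, while `mojibake` holds the kind of garbage a developer sees when an export or import step assumes the wrong encoding.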
If the developer isn’t able to import the localised software strings back into the UI, even when there are no corrupt characters, this indicates that the encoding requirements have not been met. In the case of Java Properties, the target file is traditionally expected to be ISO-8859-1 (Latin-1) and is often restricted to plain ASCII. Neither encoding supports Japanese characters, so they have to be Unicode-escaped, which looks a bit odd, but is perfectly fine: