Unicode in RISC OS



Unicode is a means of supporting characters from many different writing systems of the world, and for them to be manipulated in a uniform manner.


RISC OS 4 does not support Unicode; its only step in that direction is an ISO 8859-15 (Latin-9) character set, which supports the Euro symbol at code 0xA4. However, its implementation is not strictly correct: in its Latin-1 character set it also replaces the international currency symbol at 0xA4 with a Euro:

                                 0x80            0xA4
ISO 8859-1 (Latin-1)             undefined       int'l currency symbol
RISC OS 3 Latin-1                undefined (*)   int'l currency symbol
RISC OS 4 Latin-1                Euro            Euro
RISC OS 5 Latin-1                Euro            int'l currency symbol
Microsoft Latin-1 (CP1252)       Euro            int'l currency symbol

ISO 8859-15 (Latin-9)            undefined       Euro
RISC OS 5 Latin-9                undefined       Euro
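The standard rows of the table can be checked against the codecs that ship with Python. The sketch below decodes bytes 0x80 and 0xA4 under ISO 8859-1, ISO 8859-15 and CP1252 and reports the Unicode code point each maps to. The RISC OS rows cannot be reproduced this way because Python has no RISC OS codecs. Note also that Python maps 0x80 in ISO 8859-1 and 8859-15 to the C1 control U+0080, which is how "undefined" shows up here. The helper name `describe` is purely illustrative.

```python
import unicodedata

# Python codec names for the standard (non-RISC OS) rows of the table.
CODECS = {
    "ISO 8859-1 (Latin-1)": "latin-1",
    "ISO 8859-15 (Latin-9)": "iso8859-15",
    "Microsoft Latin-1 (CP1252)": "cp1252",
}


def describe(byte_value: int, codec: str) -> str:
    """Return the Unicode code point and name that one byte decodes to."""
    try:
        ch = bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        # Some codecs (e.g. cp1252 at 0x81) have no mapping at all.
        return "undefined"
    # C1 control characters such as U+0080 have no Unicode name.
    name = unicodedata.name(ch, "<control character>")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    for label, codec in CODECS.items():
        print(f"{label:28} 0x80: {describe(0x80, codec):28} 0xA4: {describe(0xA4, codec)}")
```

Running it shows that 0xA4 is U+00A4 CURRENCY SIGN in Latin-1 and CP1252 but U+20AC EURO SIGN in Latin-9, while only CP1252 places the Euro at 0x80.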


RISC OS 5 provides a Unicode Font Manager which is able to display Unicode characters and accept text in UTF-8, UTF-16 and UTF-32. Other parts of the RISC OS kernel and core modules support text encoded in UTF-8. A Japanese Input Method Editor is available, as is a specification for other languages.

On currently released versions of RISC OS 5, printing in Unicode is broken. John-Mark Bell writes on the zap-users list (20 Jan 2007):

There are two issues:
1) Printing Unicode to a PostScript printer will break as PDriverPS just
   embeds the Fonts:Encodings.UTF8 encoding file directly in the PS
   output. This file is not valid PostScript.
2) Printing UTF-16 or UTF-32 to any printer driver will fail as they're
   not expecting anything other than an 8 bit encoding. Therefore, they
   pass the string to the FontManager without specifying that it's
   UTF<16,32> and the FontManager ends up interpreting the individual
   bytes of each character code as individual characters. This can result
   in the FontManager seeing control codes in the text string. The bug's
   in the Printer Drivers as they don't pass the encoding information on.
   UTF8 shouldn't be affected in this case, however, as FontManager
   control characters can't occur as continuation bytes.
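Why UTF-16 and UTF-32 break while UTF-8 survives can be seen by looking at the raw bytes. The sketch below encodes a short string and lists every byte below 0x20, i.e. a value that a byte-oriented consumer could mistake for a control code or, in the case of 0x00, a string terminator. Two assumptions are made for illustration only: that FontManager control codes are bytes below 0x20, and that the UTF-16/UTF-32 text is little-endian. The helper `control_bytes` is hypothetical and not part of any RISC OS API.

```python
# Assumption: bytes below this value are treated as control codes
# (0x00 additionally terminating the string) by a byte-oriented reader.
CONTROL_LIMIT = 0x20


def control_bytes(text: str, encoding: str) -> list[tuple[int, int]]:
    """Return (offset, byte) pairs for every byte < CONTROL_LIMIT."""
    data = text.encode(encoding)
    return [(i, b) for i, b in enumerate(data) if b < CONTROL_LIMIT]


if __name__ == "__main__":
    # U+0041 LATIN CAPITAL LETTER A, U+0101 LATIN SMALL LETTER A WITH MACRON
    sample = "A\u0101"
    # "-le" variants are used so that no byte order mark is emitted.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(f"{enc:10} {sample.encode(enc).hex(' '):24} {control_bytes(sample, enc)}")
```

In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so a byte below 0x20 can only ever be a genuine ASCII control character. In UTF-16 and UTF-32, by contrast, even plain letters produce 0x00 and 0x01 bytes when read one byte at a time.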

Issue 1 can be avoided by using the PostScript 3 printer driver developed by John Tytgat and Martin Würthner instead of the native RISC OS PostScript printer driver. The ROOL project has released updated Printer Manager software which fixes issue 2.

Further reading
