Unicode in RISC OS

From RISC OS

Revision as of 04:45, 1 October 2009

Unicode is a standard for representing characters from many of the world's writing systems so that they can be manipulated in a uniform manner.

RISC OS 4

RISC OS 4 does not support Unicode; its only addition is an ISO-8859-15 (Latin9) character set, which provides the Euro symbol at code 0xA4. However, its implementation is not strictly correct: in Latin1 it also replaces the international currency symbol at 0xA4 with a Euro:

                                 0x80            0xA4
ISO 8859-1 (Latin-1)             undefined       int'l currency symbol
RISC OS 3 Latin-1                undefined (*)   int'l currency symbol
RISC OS 4 Latin-1                Euro            Euro
RISC OS 5 Latin-1                Euro            int'l currency symbol
Microsoft Latin-1 (CP1252)       Euro            int'l currency symbol

ISO 8859-15 (Latin-9)            undefined       Euro
RISC OS 5 Latin-9                undefined       Euro
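The ISO and Microsoft rows of the table can be checked with Python's standard codecs. This is only a sketch: Python ships no codecs for the RISC OS alphabets, so those rows are not covered, and the helper names `decode_byte` and `currency_table` are made up for this illustration. Python decodes 0x80 in the ISO sets to the C1 control code U+0080, which corresponds to "undefined" in the table.

```python
# Decode the two byte values from the table with Python's standard codecs.
# Only the ISO and Microsoft character sets are available here; the
# RISC OS alphabets have no Python codec, so they are left out.
CODECS = {
    "ISO 8859-1 (Latin-1)": "iso8859_1",
    "ISO 8859-15 (Latin-9)": "iso8859_15",
    "Microsoft CP1252": "cp1252",
}


def decode_byte(value, codec):
    """Return the Unicode code point that a single byte maps to in codec."""
    return ord(bytes([value]).decode(codec))


def currency_table():
    """Map each character set name to its code points for 0x80 and 0xA4."""
    return {
        name: (decode_byte(0x80, codec), decode_byte(0xA4, codec))
        for name, codec in CODECS.items()
    }


if __name__ == "__main__":
    for name, (c80, ca4) in currency_table().items():
        print(f"{name:24} 0x80 -> U+{c80:04X}   0xA4 -> U+{ca4:04X}")
```

U+20AC is the Euro sign and U+00A4 is the international currency symbol, so the output shows which byte carries which symbol in each character set.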

RISC OS 5

RISC OS 5 provides a Unicode Font Manager which can display Unicode characters and accepts text in UTF-8, UTF-16 and UTF-32. Other parts of the RISC OS kernel and core modules support text encoded in UTF-8. A Japanese Input Method Editor is available, as is a specification for other languages.

On currently released versions of RISC OS 5, printing in Unicode is broken. John-Mark Bell writes on the zap-users list (20 Jan 2007):

There are two issues:                                                                                                                                                 
1) Printing Unicode to a PostScript printer will break as PDriverPS just 
   embeds the Fonts:Encodings.UTF8 encoding file directly in the PS                
   output. This file is not valid PostScript.                                      
                                                                                   
2) Printing UTF-16 or UTF-32 to any printer driver will fail as they're            
   not expecting anything other than an 8 bit encoding. Therefore, they            
   pass the string to the FontManager without specifying that it's                 
   UTF<16,32> and the FontManager ends up interpreting the individual              
   bytes of each character code as individual characters. This can result          
   in the FontManager seeing control codes in the text string. The bug's           
   in the Printer Drivers as they don't pass the encoding information on.          
   UTF8 shouldn't be affected in this case, however, as FontManager                
   control characters can't occur as continuation bytes.                           

The ROOL project has released updated Printer Manager software which fixes issue 2. The PostScript drivers are currently being overhauled as well, which will hopefully resolve issue 1.
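The byte-level reasoning behind issue 2 can be illustrated in Python. This is a sketch under one assumption that is not stated in the quoted post: that the control codes in question are byte values below 0x20. The function names and the sample string are made up for the illustration. It shows that UTF-16 and UTF-32 text contains such bytes when read one byte at a time, while every byte of a multi-byte UTF-8 sequence is 0x80 or above.

```python
# Assumption for this sketch: "control codes" means byte values below 0x20.
SAMPLE = "Euro \u20ac, \u65e5\u672c\u8a9e"  # ASCII text, a Euro sign, three kanji


def bytes_below_0x20(text, encoding):
    """Return the bytes of the encoded text whose value is below 0x20."""
    return [b for b in text.encode(encoding) if b < 0x20]


def multibyte_utf8_bytes(text):
    """Return every byte that belongs to a multi-byte UTF-8 sequence."""
    out = []
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            out.extend(encoded)
    return out


if __name__ == "__main__":
    # The -le variants are used so that no byte order mark is prepended.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        low = bytes_below_0x20(SAMPLE, enc)
        print(f"{enc:10} bytes below 0x20: {len(low)}")
    print("lowest byte in multi-byte UTF-8 sequences:",
          hex(min(multibyte_utf8_bytes(SAMPLE))))
```

A driver that treats a UTF-16 or UTF-32 string as 8-bit text would see each of those low bytes as a separate character, which matches the failure described in the post.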

Further reading