Unicode in RISC OS

Revision as of 18:08, 24 January 2007

Unicode is a means of supporting characters from many different writing systems of the world, and for them to be manipulated in a uniform manner.

RISC OS 4

RISC OS 4 does not support Unicode; its only addition is an ISO 8859-15 (Latin-9) character set, which provides the Euro symbol at code 0xA4. However, its implementation is not strictly correct: in Latin-1 it also replaces the international currency symbol at 0xA4 with a Euro:

                                 0x80            0xA4
ISO 8859-1 (Latin-1)             undefined       int'l currency symbol
RISC OS 3 Latin-1                undefined (*)   int'l currency symbol
RISC OS 4 Latin-1                Euro            Euro
RISC OS 5 Latin-1                Euro            int'l currency symbol
Microsoft Latin-1 (CP1252)       Euro            int'l currency symbol

ISO 8859-15 (Latin-9)            undefined       Euro
RISC OS 5 Latin-9                undefined       Euro
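
The standard rows of the table can be checked with a short Python sketch. It uses Python's built-in `latin-1`, `iso8859-15` and `cp1252` codecs; the RISC OS-specific variants are not available as Python codecs, so they are not modelled here. Note that Python's codecs map 0x80 to the C1 control code point U+0080 where the table says "undefined". The helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(byte_value, codec):
    """Decode one byte under `codec` and return the Unicode character name."""
    ch = bytes([byte_value]).decode(codec)
    # C1 control characters have no name, so fall back to the code point.
    return unicodedata.name(ch, "U+%04X (no name)" % ord(ch))

# Print what bytes 0x80 and 0xA4 mean under each standard codec.
for codec in ("latin-1", "iso8859-15", "cp1252"):
    print(codec, describe(0x80, codec), "|", describe(0xA4, codec))
```

Only the byte at 0xA4 differs between Latin-1 and Latin-9, which is why text containing either currency symbol must be tagged with the right alphabet.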

RISC OS 5

RISC OS 5 provides a Unicode Font Manager which is able to display Unicode characters and accept text in UTF-8, UTF-16 and UTF-32. Other parts of the RISC OS kernel and core modules support text encoded in UTF-8. A Japanese Input Method Editor is available, as is a specification for other languages.

Printing, however, is broken. John-Mark Bell writes on the zap-users list (20 Jan 2007):

There are two issues:                                                                                                                                                 
1) Printing Unicode to a PostScript printer will break as PDriverPS just 
   embeds the Fonts:Encodings.UTF8 encoding file directly in the PS                
   output. This file is not valid PostScript.                                      
                                                                                   
2) Printing UTF-16 or UTF-32 to any printer driver will fail as they're            
   not expecting anything other than an 8 bit encoding. Therefore, they            
   pass the string to the FontManager without specifying that it's                 
   UTF<16,32> and the FontManager ends up interpreting the individual              
   bytes of each character code as individual characters. This can result          
   in the FontManager seeing control codes in the text string. The bug's           
   in the Printer Drivers as they don't pass the encoding information on.          
   UTF8 shouldn't be affected in this case, however, as FontManager                
   control characters can't occur as continuation bytes.                           
                                                                                   
Issue 1 has been known about ever since Unicode support was added to the           
FontManager (so 1998 or so). I reported the second issue to Castle 4 years          
ago.
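
The second issue can be illustrated outside RISC OS with a minimal Python sketch. It assumes, for illustration only, that the byte values treated as control codes in a Font Manager string all lie below 32; the function name `control_bytes` and the sample text are hypothetical. The sketch reads an encoded string one byte at a time, as a driver expecting an 8 bit encoding would, and collects any byte values in that range.

```python
def control_bytes(text, encoding):
    """Return the byte values below 32 seen when `text`, encoded with
    `encoding`, is read one byte at a time (as an 8 bit string would be)."""
    return sorted({b for b in text.encode(encoding) if b < 32})

sample = "A\u0109\u20ac"  # 'A', c-circumflex (U+0109), Euro sign (U+20AC)

# Compare how the same text looks to a byte-oriented reader in each encoding.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, control_bytes(sample, enc))
```

For this sample, the UTF-16 and UTF-32 forms contain low byte values (such as the zero bytes that pad each code unit), while the UTF-8 form contains none, because every byte of a multi-byte UTF-8 sequence is 0x80 or above.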

Further reading