RISC OS Filename Translation

=Introduction=

Compatibility with other operating systems has long been important for all OSes, especially minority ones like RISC OS. A crucial part of this is how RISC OS handles filenames compared to the rest of the world. There's been discussion about the failings of LanManFS on both Usenet and the Iyonix mailing list, and there are perennial misunderstandings about how things ought to work, compounded by incorrect settings.

= The Issue =

A fully qualified RISC OS filename might be something like:

ADFS::MyRiscPC.$.Documents.Letter (filetyped to text)

But under Unix, it might be:

/home/peter/documents/letter.txt

Or Windows:

C:\MyDocuments\letter.txt

We're not going to discuss the Windows/DOS format any further, since the issues are much the same, and the Unix format is, by and large, the canonical format for data swapping. However, it's clear that RISC OS and Unix formats differ considerably. What's appropriate for translation depends upon the circumstance, and there are several we should consider:


 * URL file handling
 * Filenames over network filing systems (NFS, Samba)
 * File handling in RISC OS ports and source files
 * Files on ADFS filing systems under Linux

= Simply Put =

The simple case, which applies to all of these, is to swap the '.' and '/' characters, and this is reasonably obvious:

RISC OS                         Unix
file/txt                        file.txt
program.data                    program/data
SpriteFile (filetyped Sprite)   SpriteFile,ff9

The last one is slightly unusual - in some cases, it's appropriate to add the ,xxx extension containing the RISC OS filetype to preserve it over systems which don't support this concept.

There is one final issue - Unix filenames usually have everything represented on one filesystem under the root (the "/" directory), whilst RISC OS uses different file systems. Usually this is handled with something like:

ATAFS::Erble.$.Internet.!Nettle

Becoming:

/ATAFS::Erble.$/Internet/!Nettle/
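The swap and the root handling above can be sketched in a few lines of Python. This is only an illustration: the function name `riscos_to_unix` is made up for this example, it only recognises names containing a `.$.` root, and it does not add the trailing '/' for directories.

```python
# Translation table that exchanges '.' and '/' in a single pass.
SWAP = str.maketrans("./", "/.")


def riscos_to_unix(path: str) -> str:
    """Translate a RISC OS name into the Unix-style form described above.

    Hypothetical helper: a name with a filing system and disc root
    (e.g. "ATAFS::Erble.$.") keeps that root intact under "/", and only
    the remainder has its '.' and '/' characters swapped.
    """
    root, sep, rest = path.partition(".$.")
    if not sep:
        # No root present: a plain leafname or relative path, just swap.
        return path.translate(SWAP)
    return f"/{root}.$/{rest.translate(SWAP)}"


if __name__ == "__main__":
    for name in ("file/txt", "program.data", "ATAFS::Erble.$.Internet.!Nettle"):
        print(name, "->", riscos_to_unix(name))
```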

=URLs in RISC OS=

Given the above, and because you've probably seen a file: style URL in RISC OS, the result won't be very surprising. RISC OS browsers and web servers need to perform the '.' and '/' swap. We therefore end up with URLs like:

file:///ATAFS::Erble.$/Programming/SDL/SDL12/docs.html
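Going the other way, a browser has to turn such a URL back into a native name. A minimal sketch, assuming the URL always has the `file:///` prefix and a `.$/` root as in the example above; `file_url_to_riscos` is a hypothetical name, and percent-decoding is deliberately left out:

```python
# Translation table that exchanges '.' and '/' in a single pass.
SWAP = str.maketrans("./", "/.")


def file_url_to_riscos(url: str) -> str:
    """Turn a file: URL of the form shown above into a RISC OS filename."""
    prefix = "file:///"
    if not url.startswith(prefix):
        raise ValueError("not a file:/// URL")
    path = url[len(prefix):]
    # The filing system and disc root are kept as-is; only the rest is swapped.
    root, sep, rest = path.partition(".$/")
    if not sep:
        raise ValueError("no RISC OS root ('.$/') found in URL")
    return f"{root}.$.{rest.translate(SWAP)}"


if __name__ == "__main__":
    print(file_url_to_riscos("file:///ATAFS::Erble.$/Programming/SDL/SDL12/docs.html"))
```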

When fetching files, the browser will look at the extension and do a MimeMap lookup to set its type - more on MimeMap below.

=Network Filesystem filenames=

This is probably the most contentious topic of them all - failure to do it properly will result in lots of frustrated users. Recently, Castle released an update to LanManFS, the Samba client bundled with RISC OS 5. Unfortunately, it had a number of issues, including an unhelpful default type for untyped files on the host.

The main issue with network filing systems - and there are several for RISC OS, including LanManFS, LanMan98, ImageNFS and Sunfish - is the need to have sensible filenames at both ends, and handling of RISC OS filetypes. Some, but not all, of the principles also apply to filename handling in DOSFS/Win95FS filesystems.

In short - if there's an ,xxx extension on the host filename, then this becomes the filetype presented to RISC OS for the file. If not, the client looks at the filename extension on the host (the bit after the last full stop) - and tries to match that to a filetype using the RISC OS MimeMap functions. If there's no match, or no extension, then it is given the default filetype.
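The order of those checks can be sketched as follows. This is an illustration only: `host_to_riscos` is a made-up name, and the small dictionary stands in for a real MimeMap lookup (the filetype numbers are the ones quoted in this article).

```python
import re

TEXT = 0xFFF  # the default filetype this article argues for

# Stand-in for a MimeMap lookup; a real client would ask the MimeMap module.
FILETYPE_BY_EXTENSION = {"jpg": 0xC85, "jpeg": 0xC85, "txt": 0xFFF, "zip": 0xA91}

SWAP = str.maketrans("./", "/.")
TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")


def host_to_riscos(host_name: str, default: int = TEXT) -> tuple[str, int]:
    """Return (RISC OS leafname, filetype) for a leafname on the host."""
    match = TYPE_SUFFIX.search(host_name)
    if match:
        # 1. An explicit ,xxx extension always wins and is stripped.
        stem = host_name[: match.start()]
        return stem.translate(SWAP), int(match.group(1), 16)
    # 2. Otherwise try the part after the last full stop.
    stem, dot, ext = host_name.rpartition(".")
    if dot and stem:
        filetype = FILETYPE_BY_EXTENSION.get(ext.lower(), default)
    else:
        # 3. No extension at all: fall back to the default.
        filetype = default
    return host_name.translate(SWAP), filetype


if __name__ == "__main__":
    for name in ("SpriteFile,ff9", "picture.jpg", "README", "notes.xyz"):
        leaf, filetype = host_to_riscos(name)
        print(f"{name} -> {leaf} (&{filetype:03X})")
```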

There's also another incidental issue here - some of the miscellaneous ASCII characters that are valid in RISC OS aren't valid on network filesystems, and need to be translated back and forth.

When going the other way, we don't want to needlessly add ,xxx extensions, so if the RISC OS file has an extension which matches its filetype with a MimeMap lookup, then no ,xxx extension is added, otherwise it is - the exception is if the filetype matches the default, in which case it is also left unchanged.

One behaviour that may be unexpected if you are not familiar with this is the naming of a typed file on RISC OS which lacks an extension. A JPEG typed file 'picture' in RISC OS becomes 'picture,c85', which may not be entirely helpful, in contrast to 'picture/jpg', which becomes 'picture.jpg'.
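The RISC OS to host direction can be sketched the same way. Again, this is a hedged illustration rather than any client's actual code: `riscos_to_host` is a hypothetical name and the dictionary is a stand-in for MimeMap.

```python
TEXT = 0xFFF  # default filetype assumed for this sketch

# Stand-in for a MimeMap lookup; a real client would ask the MimeMap module.
FILETYPE_BY_EXTENSION = {"jpg": 0xC85, "jpeg": 0xC85, "txt": 0xFFF, "zip": 0xA91}

SWAP = str.maketrans("./", "/.")


def riscos_to_host(name: str, filetype: int, default: int = TEXT) -> str:
    """Return the host leafname for a RISC OS leafname and filetype."""
    host_name = name.translate(SWAP)
    # On RISC OS the extension separator is '/', e.g. "picture/jpg".
    _, slash, ext = name.rpartition("/")
    extension_matches = bool(slash) and FILETYPE_BY_EXTENSION.get(ext.lower()) == filetype
    if extension_matches or filetype == default:
        return host_name  # nothing would be gained by adding ,xxx
    return f"{host_name},{filetype:03x}"


if __name__ == "__main__":
    print(riscos_to_host("picture/jpg", 0xC85))
    print(riscos_to_host("picture", 0xC85))
    print(riscos_to_host("notes", 0xFFF))
    print(riscos_to_host("notes", 0xFFF, default=0xFE4))
```

The last call shows the effect of a non-text default: a plain text file picks up a ,fff extension on the host.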

It's easy to suggest that such files get their extension added automatically. However, this creates problems with reversibility, because the filename that gets created is now not identical to what an application expected. To counter this, you might also suggest that file extensions are hidden, and RISC OS is just presented typed files - but again, this causes problems, because we might have several files with the same name, but different extensions. In practice, the best workaround is to make sure you preserve the extension if you plan on transferring files for use on Windows or Unix.

Finally, what should the default filetype be? The only correct answer which happens to work in all circumstances is text, which is what LanMan98 and Sunfish have. Unfortunately, LanManFS has an unhelpful default of the DOS filetype (fe4). On the Iyonix mailing list, Peter Naulls outlined why this would cause problems, which we'll repeat here:


 * From a purely syntactical standpoint, it adds nothing. The DOS filetype isn't anything that's usefully used by any RISC OS application.
 * On a Unix system, many files lack extensions (are "untyped"). Many of these are in fact textual, and the text filetype is certainly what you want. Most of the remainder are native binary executables, and loading into an editor is probably the only sensible thing to do with them on RISC OS.
 * Because of the above, creating a new plain file on a Unix share means that the unhelpful ,fff extension is added when all you want to do is create a plain file, so a tedious rename from a shell is required.
 * There are numerous (mostly source code) files that certainly are also textual in nature. True, these have extensions, and could be added to MimeMap, but there are at least 30 common ones, and these would have to be added to every system. Having the default textual requires no extra effort.
 * AOF/ALF files (Acorn C/C++ output notwithstanding) from GCC on RISC OS are filetyped text. It is common to transfer these back and forth to a system where you're cross compiling - again, there are obvious problems here.
 * On Windows the tangible benefits of a text default tend to be quite few, although there are certainly limited cases of all that I've listed above. Changing from DOS filetype default emphatically causes no problems at all.

Hopefully the default will be changed in the next version of LanManFS.

= File handling in RISC OS ports and source files =

For ports of programs from Unix, there are further issues. Many programs assume Unix style filenames, or their config files have Unix style filenames in them. Sometimes ports can be modified to use RISC OS filenames instead, and often this is preferable. In other cases, the handling is deeply embedded, or just not worth the effort to change, especially when we're talking about lots of different ports.

Fortunately, UnixLib comes to the rescue. At all places where a filename is passed to RISC OS, it goes through a translation function, which performs the types of translations previously discussed, and many others. The behaviour is configurable, and is also very complex, in order to deal with many cases that we don't have room to discuss here. One example, which is on by default, is setting the RISC OS filetype based upon the extension with a MimeMap lookup.

For most ports, the default behaviour works fine, but there might need to be further code to translate RISC OS names to Unix ones (and UnixLib provides another function for this) at the point that filenames are passed to a program. It is sometimes necessary to switch the translation on and off, which is entirely possible.

One special case, which is also dealt with by UnixLib, is source filename handling, in particular, C source files. For historical reasons relating to the old 10/77 limitations on RISC OS, and now something that is still used for convenience, C header and source files are stored in their own 'c' and 'h' directories under RISC OS.

This means that 'program/file.c' under Unix is 'program.c.file' under RISC OS. In some cases, it is even required for UnixLib to create a new directory before a file can be created. The extensions for which this happens are configurable, but for GCC, it applies to about 20.
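The directory swap for source files can be illustrated like this. It is a sketch, not UnixLib's implementation: the function name and the two-entry extension set are assumptions made for the example, and only relative paths are handled.

```python
# Hypothetical subset of the configurable extension list mentioned above.
SOURCE_EXTENSIONS = {"c", "h"}


def unix_source_to_riscos(path: str) -> str:
    """Translate a relative Unix path, moving source extensions into a directory."""
    directory, _, leaf = path.rpartition("/")
    stem, dot, ext = leaf.rpartition(".")
    # Directory components: any '.' inside a component becomes '/'.
    parts = [part.replace(".", "/") for part in directory.split("/") if part]
    if dot and stem and ext in SOURCE_EXTENSIONS:
        # file.c -> c.file: the extension becomes a directory of its own.
        parts += [ext, stem.replace(".", "/")]
    else:
        # Anything else just gets the plain '.' to '/' swap.
        parts.append(leaf.replace(".", "/"))
    return ".".join(parts)


if __name__ == "__main__":
    for name in ("program/file.c", "include/sys/types.h", "program/notes.txt", "Makefile"):
        print(name, "->", unix_source_to_riscos(name))
```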

Sometimes Unix format filenames for source files need to be translated to RISC OS ones when files are unpacked from a Unix archive. One program that can do this automatically, without lots of tedious renaming, is my program, Reverser.

= Files on ADFS filing systems under Linux =

There is one final interesting case. As you may know, it is possible to read ADFS filing systems under Linux. Sometimes it is useful to know the filetypes, especially if you're also exporting this filesystem over Samba or NFS to be subsequently read with a RISC OS client.

In this case, the solution is to always append the ,xxx extension, using the filetype extracted from the ADFS filesystem. There is no gain in this instance from being choosy about when to add the extension. When saving files, the ,xxx extension is stripped and the filetype set from it. When there's no extension, fff is used. However, this is slightly academic, as ADFS saving is not fully implemented under Linux.
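Because the extension is always added, this scheme is much simpler than the network filing system one. A sketch of both directions, with made-up function names (the real logic would live in the Linux ADFS driver, not in Python):

```python
import re

TEXT = 0xFFF  # used when a saved name carries no ,xxx extension

TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")


def adfs_name_for_linux(name: str, filetype: int) -> str:
    """Reading: always append the filetype as a ,xxx extension."""
    return f"{name},{filetype:03x}"


def linux_name_for_adfs(name: str) -> tuple[str, int]:
    """Saving: strip any ,xxx extension and return (name, filetype)."""
    match = TYPE_SUFFIX.search(name)
    if match is None:
        return name, TEXT
    return name[: match.start()], int(match.group(1), 16)


if __name__ == "__main__":
    print(adfs_name_for_linux("Letter", 0xFFF))
    print(linux_name_for_adfs("SpriteFile,ff9"))
    print(linux_name_for_adfs("Letter"))
```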

= MimeMap Behaviour =

Historically, the mapping of filetypes was only used under things like DOSFS and SparkFS, and each was controlled by a separate system. In the case of DOSFS, this was the *DOSMap command. As the number of programs that wanted to do this increased, especially internet-related programs, this quickly became unsatisfactory. Instead, we now use a MimeMap file to control the mapping of filetypes, and this has been the case for many years.

The exact location of the file varies somewhat depending upon the version of RISC OS you have, but it's accessible via a path variable, and through the MimeMap module, which provides a SWI interface. "*show Inet$MimeMappings" will indicate where the file lives.

The file itself contains a formatted list of mappings. For a given type of file, it maps the IANA (Internet Assigned Numbers Authority) MIME type to a RISC OS filetype name, a 12-bit RISC OS filetype number in hexadecimal, and filename extensions. For example, an entry for JPEG is:

image/jpeg JPEG c85 .jpg .jpeg .jpe "JPEG " "JPEG????"

This is actually one of two entries for JPEG, and the second entry contains additional mappings. This entry also contains extra information which allows the filetype to be determined from the first few bytes of the file.
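To make the layout concrete, here is a small sketch that splits an entry like the one above into its fields. It assumes only what that example shows: whitespace-separated fields, extensions starting with a full stop, and quoted trailing fields. The names `MimeEntry` and `parse_mimemap_line` are invented for this illustration, and the quoted fields are kept as opaque strings rather than interpreted.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class MimeEntry:
    mime_type: str      # IANA MIME type, e.g. "image/jpeg"
    type_name: str      # RISC OS filetype name, e.g. "JPEG"
    filetype: int       # 12-bit filetype number
    extensions: list[str] = field(default_factory=list)
    extra: list[str] = field(default_factory=list)  # quoted fields, uninterpreted


def parse_mimemap_line(line: str) -> MimeEntry:
    """Parse one mapping line of the form shown in the example above."""
    # shlex keeps quoted fields such as "JPEG " together, spaces included.
    mime_type, type_name, hex_type, *rest = shlex.split(line)
    extensions = [token[1:] for token in rest if token.startswith(".")]
    extra = [token for token in rest if not token.startswith(".")]
    return MimeEntry(mime_type, type_name, int(hex_type, 16), extensions, extra)


if __name__ == "__main__":
    entry = parse_mimemap_line('image/jpeg JPEG c85 .jpg .jpeg .jpe "JPEG " "JPEG????"')
    print(entry)
```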

= Common Problems =

In most instances, the mapping to and from filetypes works correctly, but sometimes things go awry if settings aren't just so. Problems include additions of ,xxx extensions in unexpected situations. In many cases, this is simply because you don't have text as the default filetype, as mentioned above. It's also possible that faults in old versions of the MimeMap module and its mapping file are to blame.

Another issue is the handling of the filetype for Zip files. Up until recently, Zip files shared a filetype, &DDC, with other archives, especially SparkFS archives. On recent setups, Zip has its own filetype, &A91. If two systems disagree over what filetype is used for Zips, then it's possible to get unexpected ,ddc or ,a91 extensions on your filing system. This is easily corrected by fixing the mapping in your MimeMap so Zip files are distinct.

= Behaviour in certain filesystems =

AcornNFS
I don't have access to this so can't comment on its implementation, but I understand it is the only RISC OS filesystem that implements a method of representing untyped files with an extension to the ,xxx system. Normally, this feature isn't used in modern RISC OS usage, as untyped files are quite rare and not that useful, and I don't judge its loss in other implementations as a failure, but it might be useful in some instances.

Sunfish
Sunfish implements all aspects of the system fully. One thing it can't always do, and this is because of NFS itself rather than Sunfish, is follow symlinks, as they may point outside the mount.

LanMan98
LanMan98 also implements the system correctly. I strongly recommend you do not use its "notypes" option. In some instances this may appear to work more correctly than without, but what's really happening is that it's masking issues. notypes won't allow you to change filetype and it can display ,xxx extensions at the RISC OS end which is almost never useful.

LanManFS
This is mainly in reference to the version bundled with RISC OS 5. I don't believe that LanManFS always gets the filetyping right. Sometimes it appends the ,xxx when it shouldn't. In addition, it has a fixed buffer size, meaning it might not display all the files in a large directory. I've also seen it have issues with certain filename characters causing confusion. I recommend using LanMan98 or Sunfish instead if you can.

HostFS (VirtualRPC)
HostFS has two options for filetype handling, neither of which is particularly satisfactory. One option always adds the ,xxx and the other acts like "notypes" in LanMan98. Its default filetype is also incorrectly set to data rather than text. If you're only looking at files you've created from within VirtualRPC, then the first option will work ok, but once you start accessing files transferred onto the host system or created there, you can quickly run into problems. Note that HostFS is also used for VirtualRPC's CD access.

DOSFS/Win95FS
To the best of my knowledge, neither of these understands the ,xxx system. However, they can make use of a free FAT field to store the RISC OS filetype if it's changed on RISC OS. This generally works just fine.

CDFS/CDROMFS
Again, these do not make use of the ,xxx extension (except when viewed by a remote system which does, like NFS), but there is an extension to the CD format which can store the RISC OS filetype. Usually, however, this is not used, as the MIME mapping from the extension is sufficient for CDs, or RISC OS files are put in zip files on the CD.

SparkFS/Zip files
The zip format too contains an extension to hold the RISC OS filetype if it's set. SparkFS itself does not make use of the ,xxx extension, although I don't know if any of the other RISC OS zip extractors do. The single change I make to my SparkFS setup is to make the default type text rather than data. This is for the reasons I outlined in the article above, and can make some things work more smoothly. For creating zip files on a foreign system, GCCSDK contains a modified version of Infozip which can optionally turn the ,xxx extension on (for example) the Unix filesystem into the filetype in the extra zip field, and strip it from the filename. There is one potential ambiguity however - filenames which have more than fullstop in them (i.e., a Unix/DOS extension) and with no ,xxx extension get an explicit RISC OS textual type for consistency.

UnixLib programs
Most UnixLib programs don't have any need to make use of this system, although there are some exceptions such as John Tytgat's RISC OS port of CVS. We made very sure that UnixLib implements all aspects properly where relevant, since getting filename handling correct can be crucial with Unix ports.

Linux ADFS
When mounting an ADFS filesystem under Linux, the RISC OS filetypes are ignored. However, Peter Naulls did make a modification which allows the ,xxx extensions to be concatenated where appropriate. This can be handy when viewing the exported filesystem on a real RISC OS system. This change should be usable in the Iyonix Linux kernels.

Messenger/Pluto
For attachments, these mailers need to make a MIME match back and forth. I don't have Pluto, nor the latest version of Messenger, but I believe that in most instances they do it ok. The default filetype is not text, however, which may cause issues.