Notes

Oskar Schirmer

2006

PostScript font encoding in Unicode times

When PostScript was designed, character sets were typically encoded in seven- or eight-bit words, with ASCII being the most widespread code. Variants were introduced for various languages, some of which are still in use as the ISO 8859 series. To enable PostScript to depict letters and signs for various languages, and because PostScript programmes themselves are usually encoded in eight-bit words, it made perfect sense to introduce the font encoding vector, which assigns a glyph name to each eight-bit value. To select the desired glyphs from a given font, it is sufficient to set the encoding vector accordingly.
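The idea of an encoding vector can be modelled outside PostScript. The following Python sketch is only an illustration: the names `make_encoding` and `glyph_names` are made up for this note, and the two sample vectors are loosely modelled on ISO 8859-1 and ISO 8859-7 rather than copied from any real font.

```python
# Illustrative model of an encoding vector: 256 slots, each holding a glyph name.
# All names here are hypothetical; this is not PostScript's own API.
NOTDEF = ".notdef"


def make_encoding(assignments):
    """Build a 256-entry encoding vector from a {code: glyph_name} mapping."""
    vector = [NOTDEF] * 256
    for code, name in assignments.items():
        if not 0 <= code <= 255:
            raise ValueError(f"code {code} does not fit in eight bits")
        vector[code] = name
    return vector


def glyph_names(data, encoding):
    """Translate a byte string into the glyph names it selects."""
    return [encoding[byte] for byte in data]


# Two vectors that agree on 0x41 but differ on 0xE4.
latin = make_encoding({0x41: "A", 0xE4: "adieresis"})
greek = make_encoding({0x41: "A", 0xE4: "delta"})
```

With these vectors, the same two bytes `b"A\xe4"` select `A` and `adieresis` under `latin`, but `A` and `delta` under `greek`: the bytes stay the same, only the vector changes.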

The disadvantage is that no more than 256 of the symbols defined in a font can be used at once. Of course, one can set the font anew every now and then to get something like font bank switching, but at the latest when it comes to printing Chinese text, the number of symbols in use grows so large that chances are one cannot print a single line without having to change the encoding vector along the way.
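To get a feeling for how often such bank switching would occur, here is a rough Python sketch. It assumes, purely for illustration, that symbol number n lives in bank n // 256; a real document would choose its own partition of the font, and `count_bank_switches` is a hypothetical helper.

```python
def count_bank_switches(text, bank_size=256):
    """Count how often the active bank must change while printing text.

    Assumption: the symbol with number n lives in bank n // bank_size.
    Selecting the very first bank is not counted as a switch.
    """
    switches = 0
    current = None
    for ch in text:
        bank = ord(ch) // bank_size
        if bank != current:
            if current is not None:
                switches += 1
            current = bank
    return switches
```

Under this assumption, plain ASCII text never switches banks, whereas the four Chinese characters U+4E2D, U+6587, U+5B57 and U+4F53 fall into four different banks and so need three switches within a single short word.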

Later, Unicode was designed, providing one single character encoding vector for almost all symbols that one might encounter. Unicode is complemented by UTF-8, an encoding based on eight-bit words that is backward compatible with ASCII and thus a good candidate for encoding PostScript programmes.
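The backward compatibility with ASCII can be checked with a few lines of Python, using only the standard `str.encode` method. The helper name `utf8_bytes` is introduced here for illustration.

```python
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of eight-bit values."""
    return list(text.encode("utf-8"))


# Pure ASCII text is encoded to exactly the same bytes in ASCII and in UTF-8.
ascii_unchanged = "show".encode("utf-8") == "show".encode("ascii")

# Characters beyond ASCII become sequences of bytes that all lie above 0x7F,
# so they can never be mistaken for ASCII characters.
a_umlaut = utf8_bytes("\u00e4")   # U+00E4 becomes two bytes
zhong = utf8_bytes("\u4e2d")      # U+4E2D becomes three bytes
```

Here `a_umlaut` is `[0xC3, 0xA4]` and `zhong` is `[0xE4, 0xB8, 0xAD]`, while the operator name `show` keeps its ASCII bytes unchanged.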

With Unicode and UTF-8, everything needed to get rid of the encoding-vector-based bank switching model is available: define all glyphs by their Unicode numbers instead of glyph names, and make the show operator accept both strings and arrays. An array of numbers is taken as a list of Unicode characters, whereas a string is interpreted as UTF-8 and thus converted to an array of Unicode numbers before printing. As long as a PostScript programme does not define fonts but simply uses them, this method is backward compatible with current PostScript.
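The proposed behaviour of show might be sketched as follows. This is a Python model under stated assumptions, not an implementation of any existing PostScript interpreter: a font is represented as a plain dictionary from Unicode numbers to glyphs, a PostScript string is modelled as a `bytes` object, and the names `to_code_points` and `show` are chosen for illustration only.

```python
NOTDEF = ".notdef"


def to_code_points(operand):
    """Normalise a show operand to a list of Unicode numbers.

    A string (modelled as bytes) is interpreted as UTF-8;
    an array (modelled as a list of integers) is taken as is.
    """
    if isinstance(operand, (bytes, bytearray)):
        return [ord(ch) for ch in bytes(operand).decode("utf-8")]
    return [int(number) for number in operand]


def show(font, operand):
    """Return the glyphs selected from font, a {unicode_number: glyph} dict."""
    return [font.get(number, NOTDEF) for number in to_code_points(operand)]


# A hypothetical font with glyphs defined by Unicode number.
font = {0x41: "glyph-A", 0x4E2D: "glyph-U+4E2D"}

from_string = show(font, b"A\xe4\xb8\xad")   # UTF-8 string operand
from_array = show(font, [0x41, 0x4E2D])      # array operand
```

Both calls select the same two glyphs, and no encoding vector is consulted at any point. A string consisting only of ASCII bytes behaves as before, which is the backward compatibility the proposal relies on.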