Unicode support
===============

   Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

   http://www.lanana.org/docs/unicode/admin-guide/unicode.rst

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts. By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This subtly changes the semantics of the eight-bit character tables.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set. This
permits, for example, the use of block graphics even with a Latin-1
font loaded.
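As a rough illustration of the table above, the following Python sketch wraps a string in the escape sequence that designates a character table as G0, then switches back to Latin-1. The names ``G0_DESIGNATE`` and ``with_g0_map`` are hypothetical helpers, not kernel interfaces; the sketch assumes output goes to the Linux console (or a compatible terminal), that G0 is the active character set, and that Latin-1 was in effect beforehand.

```python
# Sketch only: the escape sequences come from the table above; the names
# G0_DESIGNATE and with_g0_map are hypothetical helpers, not kernel APIs.
ESC = "\x1b"

# "ESC ( <final byte>" designates a character table as G0.
G0_DESIGNATE = {
    "LAT1_MAP": ESC + "(B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": ESC + "(0",   # DEC VT100 pseudographics
    "IBMPC_MAP": ESC + "(U",  # IBM code page 437
    "USER_MAP": ESC + "(K",   # user defined
}


def with_g0_map(text: str, map_symbol: str) -> str:
    """Return text wrapped so it is drawn via map_symbol, then restore Latin-1."""
    return G0_DESIGNATE[map_symbol] + text + G0_DESIGNATE["LAT1_MAP"]


if __name__ == "__main__":
    # In the DEC VT100 graphics set, 'l', 'q' and 'k' are the upper-left
    # corner, horizontal line and upper-right corner of a box.
    print(with_g0_map("lqqqk", "GRAF_MAP"))
```

On a console that behaves as described, ``lqqqk`` should appear as the top edge of a box rather than as letters. Because the glyphs are looked up through the Unicode-to-font table, this should work even with a Latin-1 font loaded.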

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation. The Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone". U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large
power-of-two boundary (in case 1024- or 2048-character fonts ever
become necessary). This leaves U+E000 to U+EFFF as the End User Zone.

[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table. The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour. In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.


Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map. [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set. I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0,
at U+23BA, U+23BB, U+23BC and U+23BD respectively. Linux now uses the
new values.

[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper,
since they are horribly vendor-specific. This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series. This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1. Thus, it remains a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

   http://www.kli.org/

Since the characters at the beginning of the Linux Zone have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

   This range is now officially managed by the ConScript Unicode
   Registry. The normative reference is at:

      https://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================

Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

   https://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability. For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are driving the addition of Tengwar and Cirth to Unicode
Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected,
and so the above encoding remains official.
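To make the Klingon block concrete, here is a small Python sketch that writes a non-negative integer with the Klingon digits U+F8F0..U+F8F9 listed above and shows the UTF-8 bytes a program would emit. It assumes the digits are used as an ordinary positional base-10 system, most significant digit first (matching the left-to-right writing direction). The function name ``klingon_numeral`` is hypothetical. Because these are private-use code points, they only display correctly with a font that carries the CSUR Klingon glyphs.

```python
# Sketch only: code points come from the Linux Zone Klingon table above;
# klingon_numeral is a hypothetical helper, not part of any library.
KLINGON_DIGIT_ZERO = 0xF8F0  # U+F8F0..U+F8F9 are the digits zero..nine


def klingon_numeral(n: int) -> str:
    """Write a non-negative integer with Klingon digits (positional, base 10)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # str(n) yields the decimal digits most significant first; each digit d
    # maps to the code point U+F8F0 + d.
    return "".join(chr(KLINGON_DIGIT_ZERO + int(d)) for d in str(n))


if __name__ == "__main__":
    numeral = klingon_numeral(42)
    print([f"U+{ord(c):04X}" for c in numeral])  # ['U+F8F4', 'U+F8F2']
    print(numeral.encode("utf-8").hex(" "))      # ef a3 b4 ef a3 b2
```

Every code point in the Linux Zone lies in the U+0800..U+FFFF range, so each Klingon character takes three bytes in UTF-8.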