.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================
Fast & Portable DES encryption & decryption
===========================================

.. note::

   Below is the original README file from the descore.shar package,
   converted to ReST format.

------------------------------------------------------------------------------

des - fast & portable DES encryption & decryption.

Copyright |copy| 1992 Dana L. How

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Library General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Library General Public License for more details.

You should have received a copy of the GNU Library General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Author's address: how@isl.stanford.edu

.. README,v 1.15 1992/05/20 00:25:32 how E

==>> To compile after untarring/unsharring, just ``make`` <<==

This package was designed with the following goals:

1. Highest possible encryption/decryption PERFORMANCE.
2. PORTABILITY to any byte-addressable host with a 32bit unsigned C type.
3. Plug-compatible replacement for KERBEROS's low-level routines.

This second release includes a number of performance enhancements for
register-starved machines. My discussions with Richard Outerbridge,
71755.204@compuserve.com, sparked a number of these enhancements.

To more rapidly understand the code in this package, inspect desSmallFips.i
(created by typing ``make``) BEFORE you tackle desCode.h. The latter is set
up in a parameterized fashion so it can easily be modified by speed-daemon
hackers in pursuit of that last microsecond. You will find it more
illuminating to inspect one specific implementation,
and then move on to the common abstract skeleton with this one in mind.


performance comparison to other available des code which i could
compile on a SPARCStation 1 (cc -O4, gcc -O2):

this code (byte-order independent):

  - 30us per encryption (options: 64k tables, no IP/FP)
  - 33us per encryption (options: 64k tables, FIPS standard bit ordering)
  - 45us per encryption (options: 2k tables, no IP/FP)
  - 48us per encryption (options: 2k tables, FIPS standard bit ordering)
  - 275us to set a new key (uses 1k of key tables)

  this has the quickest encryption/decryption routines i've seen.
  since i was interested in fast des filters rather than crypt(3)
  and password cracking, i haven't really bothered yet to speed up
  the key setting routine. also, i have no interest in re-implementing
  all the other junk in the mit kerberos des library, so i've just
  provided my routines with little stub interfaces so they can be
  used as drop-in replacements with mit's code or any of the
  mit-compatible packages below. (note that the first two timings above
  are highly variable because of cache effects).

kerberos des replacement from australia (version 1.95):

  - 53us per encryption (uses 2k of tables)
  - 96us to set a new key (uses 2.25k of key tables)

  so despite the author's inclusion of some of the performance
  improvements i had suggested to him, this package's
  encryption/decryption is still slower on the sparc and 68000.
  more specifically, 19-40% slower on the 68020 and 11-35% slower
  on the sparc, depending on the compiler;
  in full gory detail (ALT_ECB is a libdes variant):

  ===========  =========  =======  ========  ========  =========
  compiler     machine    desCore  libdes    ALT_ECB   slower by
  ===========  =========  =======  ========  ========  =========
  gcc 2.1 -O2  Sun 3/110  304 us   369.5 us  461.8 us  22%
  cc -O1       Sun 3/110  336 us   436.6 us  399.3 us  19%
  cc -O2       Sun 3/110  360 us   532.4 us  505.1 us  40%
  cc -O4       Sun 3/110  365 us   532.3 us  505.3 us  38%
  gcc 2.1 -O2  Sun 4/50   48 us    53.4 us   57.5 us   11%
  cc -O2       Sun 4/50   48 us    64.6 us   64.7 us   35%
  cc -O4       Sun 4/50   48 us    64.7 us   64.9 us   35%
  ===========  =========  =======  ========  ========  =========

  (my time measurements are not as accurate as his).

the comments in my first release of desCore on version 1.92:

  - 68us per encryption (uses 2k of tables)
  - 96us to set a new key (uses 2.25k of key tables)

  this is a very nice package which implements the most important
  of the optimizations which i did in my encryption routines.
  it's a bit weak on common low-level optimizations which is why
  it's 39%-106% slower. because he was interested in fast crypt(3) and
  password-cracking applications, he also used the same ideas to
  speed up the key-setting routines with impressive results.
  (at some point i may do the same in my package). he also implements
  the rest of the mit des library.

  (code from eay@psych.psy.uq.oz.au via comp.sources.misc)

fast crypt(3) package from denmark:

  the des routine here is buried inside a loop to do the
  crypt function and i didn't feel like ripping it out and measuring
  performance. his code takes 26 sparc instructions to compute one
  des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37.
  he claims to use 280k of tables but the iteration calculation seems
  to use only 128k. his tables and code are machine independent.

  (code from glad@daimi.aau.dk via alt.sources or comp.sources.misc)

swedish reimplementation of Kerberos des library

  - 108us per encryption (uses 34k worth of tables)
  - 134us to set a new key (uses 32k of key tables to get this speed!)

  the tables used seem to be machine-independent;
  he seems to have included a lot of special case code
  so that, e.g., ``long`` loads can be used instead of 4 ``char`` loads
  when the machine's architecture allows it.

  (code obtained from chalmers.se:pub/des)

crack 3.3c package from england:

  as in crypt above, the des routine is buried in a loop. it's
  also very modified for crypt. his iteration code uses 16k
  of tables and appears to be slow.

  (code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc)

``highly optimized`` and tweaked Kerberos/Athena code (byte-order dependent):

  - 165us per encryption (uses 6k worth of tables)
  - 478us to set a new key (uses <1k of key tables)

  so despite the comments in this code, it was possible to get
  faster code AND smaller tables, as well as making the tables
  machine-independent.
  (code obtained from prep.ai.mit.edu)

UC Berkeley code (depends on machine-endedness):

  - 226us per encryption
  - 10848us to set a new key

  table sizes are unclear, but they don't look very small
  (code obtained from wuarchive.wustl.edu)


motivation and history
======================

a while ago i wanted some des routines and the routines documented on sun's
man pages either didn't exist or dumped core. i had heard of kerberos,
and knew that it used des, so i figured i'd use its routines. but once
i got it and looked at the code, it really set off a lot of pet peeves -
it was too convoluted, the code had been written without taking
advantage of the regular structure of operations such as IP, E, and FP
(i.e. the author didn't sit down and think before coding),
it was excessively slow, the author had attempted to clarify the code
by adding MORE statements to make the data movement more ``consistent``
instead of simplifying his implementation and cutting down on all data
movement (in particular, his use of L1, R1, L2, R2), and it was full of
idiotic ``tweaks`` for particular machines which failed to deliver significant
speedups but which did obfuscate everything. so i took the test data
from his verification program and rewrote everything else.

a while later i ran across the great crypt(3) package mentioned above.
the fact that this guy was computing 2 sboxes per table lookup rather
than one (and using a MUCH larger table in the process) emboldened me to
do the same - it was a trivial change from which i had been scared away
by the larger table size. in his case he didn't realize you don't need to keep
the working data in TWO forms, one for easy use of half the sboxes in
indexing, the other for easy use of the other half; instead you can keep
it in the form for the first half and use a simple rotate to get the other
half. this means i have (almost) half the data manipulation and half
the table size. in fairness though he might be encoding something particular
to crypt(3) in his tables - i didn't check.

i'm glad that i implemented it the way i did, because this C version is
portable (the ifdef's are performance enhancements) and it is faster
than versions hand-written in assembly for the sparc!


porting notes
=============

one thing i did not want to do was write an enormous mess
which depended on endedness and other machine quirks,
and which necessarily produced different code and different lookup tables
for different machines. see the kerberos code for an example
of what i didn't want to do; all their endedness-specific ``optimizations``
obfuscate the code and in the end were slower than a simpler machine
independent approach. however, there are always some portability
considerations of some kind, and i have included some options
for varying numbers of register variables.
perhaps some will still regard the result as a mess!

1) i assume everything is byte addressable, although i don't actually
   depend on the byte order, and that bytes are 8 bits.
   i assume word pointers can be freely cast to and from char pointers.
   note that 99% of C programs make these assumptions.
   i always use unsigned char's if the high bit could be set.
2) the typedef ``word`` means a 32 bit unsigned integral type.
   if ``unsigned long`` is not 32 bits, change the typedef in desCore.h.
   i assume sizeof(word) == 4 EVERYWHERE.

the (worst-case) cost of my NOT doing endedness-specific optimizations
in the data loading and storing code surrounding the key iterations
is less than 12%. also, there is the added benefit that
the input and output work areas do not need to be word-aligned.
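
as an illustration of the byte-order-independent loading and storing
described above, here is a minimal sketch. it is not code from the package:
``load_word`` and ``store_word`` are hypothetical names, ``uint32_t`` stands
in for the ``word`` typedef, and the least-significant-byte-first order is an
arbitrary assumption made only for the example. because the word is assembled
one ``unsigned char`` at a time, the result is the same on any host and the
buffer does not need to be word-aligned.

.. code-block:: c

   #include <stdint.h>

   /* hypothetical helper: build a 32-bit word from 4 bytes, lowest byte
    * first (an assumed order), without casting the pointer to a word type.
    */
   static uint32_t load_word(const unsigned char *p)
   {
       return (uint32_t)p[0]
            | ((uint32_t)p[1] << 8)
            | ((uint32_t)p[2] << 16)
            | ((uint32_t)p[3] << 24);
   }

   /* hypothetical helper: the inverse of load_word() */
   static void store_word(unsigned char *p, uint32_t w)
   {
       p[0] = (unsigned char)(w & 0xff);
       p[1] = (unsigned char)((w >> 8) & 0xff);
       p[2] = (unsigned char)((w >> 16) & 0xff);
       p[3] = (unsigned char)((w >> 24) & 0xff);
   }

the price of this style is a few extra shifts and ORs per block compared to
a single word load, which is the kind of overhead the worst-case figure
above refers to.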


OPTIONAL performance optimizations
==================================

1) you should define one of ``i386``, ``vax``, ``mc68000``, or ``sparc``,
   whichever one is closest to the capabilities of your machine.
   see the start of desCode.h to see exactly what this selection implies.
   note that if you select the wrong one, the des code will still work;
   these are just performance tweaks.
2) for those with functional ``asm`` keywords: you should change the
   ROR and ROL macros to use machine rotate instructions if you have them.
   this will save 2 instructions and a temporary per use,
   or about 32 to 40 instructions per en/decryption.

   note that gcc is smart enough to translate the ROL and ROR macros into
   machine rotates!

these optimizations are all rather persnickety, yet with them you should
be able to get performance equal to assembly-coding, except that:

1) with the lack of a bit rotate operator in C, rotates have to be synthesized
   from shifts. so access to ``asm`` will speed things up if your machine
   has rotates, as explained above in (2) (not necessary if you use gcc).
2) if your machine has fewer than 12 32-bit registers i doubt your compiler will
   generate good code.

   ``i386`` tries to configure the code for a 386 by only declaring 3 registers
   (it appears that gcc can use ebx, esi and edi to hold register variables).
   however, if you like assembly coding, the 386 does have 7 32-bit registers,
   and if you use ALL of them, use ``scaled by 8`` address modes with displacement
   and other tricks, you can get reasonable routines for DesQuickCore... with
   about 250 instructions apiece. For DesSmall... it will help to rearrange
   des_keymap, i.e., now the sbox # is the high part of the index and
   the 6 bits of data is the low part; it helps to exchange these.

   since i have no way to conveniently test it i have not provided my
   shoehorned 386 version. note that with this release of desCore, gcc is able
   to put everything in registers(!), and generate about 370 instructions apiece
   for the DesQuickCore... routines!

coding notes
============

the en/decryption routines each use 6 necessary register variables,
with 4 being actively used at once during the inner iterations.
if you don't have 4 register variables get a new machine.
up to 8 more registers are used to hold constants in some configurations.

i assume that the use of a constant is more expensive than using a register:

a) additionally, i have tried to put the larger constants in registers.
   registering priority was by the following:

   - anything more than 12 bits (bad for RISC and CISC)
   - greater than 127 in value (can't use movq or byte immediate on CISC)
   - 9-127 (may not be able to use CISC shift immediate or add/sub quick),
   - 1-8 were never registered, being the cheapest constants.

b) the compiler may be too stupid to realize table and table+256 should
   be assigned to different constant registers and instead repetitively
   do the arithmetic, so i assign these to explicit ``m`` register variables
   when possible and helpful.

i assume that indexing is cheaper or equivalent to auto increment/decrement,
where the index is 7 bits unsigned or smaller.
this assumption is reversed for 68k and vax.

i assume that addresses can be cheaply formed from two registers,
or from a register and a small constant.
for the 68000, the ``two registers and small offset`` form is used sparingly.
all index scaling is done explicitly - no hidden shifts by log2(sizeof).

the code is written so that even a dumb compiler
should never need more than one hidden temporary,
increasing the chance that everything will fit in the registers.
KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING.

(actually, there are some code fragments now which do require two temps,
but fixing it would either break the structure of the macros or
require declaring another temporary).
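
the rotates that have to be synthesized from shifts, as noted under the
optional optimizations above, can be sketched roughly as follows. this is
only an illustration under stated assumptions: ``rol32`` and ``ror32`` are
hypothetical names, and the package's own ROL and ROR macros in desCode.h
may be written differently. each rotate costs two shifts and an OR, which is
where the extra instructions and the temporary come from when no machine
rotate is available.

.. code-block:: c

   #include <stdint.h>

   /* hypothetical helper: rotate a 32-bit word left by n bits using
    * only shifts and an OR; n is reduced modulo 32 so n == 0 is safe.
    */
   static inline uint32_t rol32(uint32_t x, unsigned n)
   {
       n &= 31;
       return (x << n) | (x >> ((32 - n) & 31));
   }

   /* hypothetical helper: rotate right, the mirror image of rol32() */
   static inline uint32_t ror32(uint32_t x, unsigned n)
   {
       n &= 31;
       return (x >> n) | (x << ((32 - n) & 31));
   }

for example, ``rol32(0x80000001, 4)`` moves the top bit around to give
``0x00000018``, and ``ror32`` of that value by 4 restores the original word.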


special efficient data format
=============================

bits are manipulated in this arrangement most of the time (S7 S5 S3 S1)::

	003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx

(the x bits are still there, i'm just emphasizing where the S boxes are).
bits are rotated left 4 when computing S6 S4 S2 S0::

	282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx

the rightmost two bits are usually cleared so the lower byte can be used
as an index into an sbox mapping table. the next two x'd bits are set
to various values to access different parts of the tables.


how to use the routines
=======================

datatypes:
	pointer to 8 byte area of type DesData
	used to hold keys and input/output blocks to des.

	pointer to 128 byte area of type DesKeys
	used to hold full 768-bit key.
	must be long-aligned.

DesQuickInit()
	call this before using any other routine with ``Quick`` in its name.
	it generates the special 64k table these routines need.
DesQuickDone()
	frees this table

DesMethod(m, k)
	m points to a 128byte block, k points to an 8 byte des key
	which must have odd parity (or -1 is returned) and which must
	not be a (semi-)weak key (or -2 is returned).
	normally DesMethod() returns 0.

	m is filled in from k so that when one of the routines below
	is called with m, the routine will act like standard des
	en/decryption with the key k. if you use DesMethod,
	you supply a standard 56bit key; however, if you fill in
	m yourself, you will get a 768bit key - but then it won't
	be standard. it's 768bits not 1024 because the least significant
	two bits of each byte are not used. note that these two bits
	will be set to magic constants which speed up the encryption/decryption
	on some machines. and yes, each byte controls
	a specific sbox during a specific iteration.

	you really shouldn't use the 768bit format directly; i should
	provide a routine that converts 128 6-bit bytes (specified in
	S-box mapping order or something) into the right format for you.
	this would entail some byte concatenation and rotation.

Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s)
	performs des on the 8 bytes at s into the 8 bytes at d
	(d,s: ``char *``).

	uses m as a 768bit key as explained above.

	the Encrypt|Decrypt choice is obvious.

	Fips|Core determines whether a completely standard FIPS initial
	and final permutation is done; if not, then the data is loaded
	and stored in a nonstandard bit order (FIPS w/o IP/FP).

	Fips slows down Quick by 10%, Small by 9%.

	Small|Quick determines whether you use the normal routine
	or the crazy quick one which gobbles up 64k more of memory.
	Small is 50% slower than Quick, but Quick needs 32 times as much
	memory. Quick is included for programs that do nothing but DES,
	e.g., encryption filters, etc.


Getting it to compile on your machine
=====================================

there are no machine-dependencies in the code (see porting),
except perhaps the ``now()`` macro in desTest.c.
ALL generated tables are machine independent.
you should edit the Makefile with the appropriate optimization flags
for your compiler (MAX optimization).


Speeding up kerberos (and/or its des library)
=============================================

note that i have included a kerberos-compatible interface in desUtil.c
through the functions des_key_sched() and des_ecb_encrypt().
to use these with kerberos or kerberos-compatible code put desCore.a
ahead of the kerberos-compatible library on your linker's command line.
you should not need to #include desCore.h; just include the header
file provided with the kerberos library.

Other uses
==========

the macros in desCode.h would be very useful for putting inline des
functions in more complicated encryption routines.
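
when wrapping these routines in other code, it can be handy to check a key
before handing it to DesMethod(), which rejects keys that do not have odd
parity. the following is a minimal sketch of such a check, not the package's
own test: ``key_has_odd_parity`` is a hypothetical name, and it only assumes
the usual DES convention that each of the 8 key bytes must contain an odd
number of set bits.

.. code-block:: c

   #include <stddef.h>

   /* hypothetical helper: returns 1 if every one of the 8 key bytes has
    * an odd number of set bits, 0 otherwise.
    */
   static int key_has_odd_parity(const unsigned char key[8])
   {
       size_t i;

       for (i = 0; i < 8; i++) {
           unsigned b = key[i];

           /* fold the byte onto itself so bit 0 ends up as the XOR
            * of all 8 bits, i.e. 1 when the bit count is odd
            */
           b ^= b >> 4;
           b ^= b >> 2;
           b ^= b >> 1;
           if ((b & 1) == 0)
               return 0;
       }
       return 1;
   }

this check says nothing about (semi-)weak keys, which DesMethod() reports
separately.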