Editor's note: This document is _heavily_ cribbed from the Linux Kernel, with
really only the section about "Alignment vs. Networking" removed.

UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access to the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

    struct foo {
        u16 field1;
        u32 field2;
        u8 field3;
    };

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding in between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.
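
If it helps to see the compiler-inserted padding concretely, below is a
minimal, self-contained sketch (not part of the original document) that
prints the offset of field2 and the total size of the structure. It uses the
<stdint.h> counterparts of the kernel's u8/u16/u32 types and assumes a
typical ABI where 32-bit values require 4-byte alignment:

    #include <stdio.h>
    #include <stddef.h>     /* offsetof() */
    #include <stdint.h>     /* stand-ins for the kernel's u8/u16/u32 */

    struct foo {
        uint16_t field1;
        uint32_t field2;
        uint8_t field3;
    };

    int main(void)
    {
        /* On a typical ABI with 4-byte alignment for 32-bit values, the
         * compiler inserts 2 bytes of padding before field2 and 3 bytes of
         * tail padding, so this prints "field2 offset: 4, size: 12".
         */
        printf("field2 offset: %zu, size: %zu\n",
               offsetof(struct foo, field2), sizeof(struct foo));
        return 0;
    }

The exact numbers depend on the target ABI, but on common 32- and 64-bit
platforms the padding is exactly as described above.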

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is:

    struct foo {
        u32 field2;
        u16 field1;
        u8 field3;
    };

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, useful when you want to use a C struct
to represent some data that comes in a fixed arrangement 'off the wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. Of course,
the extra instructions cause a loss in performance compared to the
non-packed case, so the packed attribute should only be used when avoiding
structure padding is of importance.


Code that causes unaligned access
=================================

With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, taken
from the Linux Kernel's include/linux/etherdevice.h, is an optimized routine
to compare two ethernet MAC addresses for equality.

bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
               ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

    return fold == 0;
#else
    const u16 *a = (const u16 *)addr1;
    const u16 *b = (const u16 *)addr2;
    return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
#endif
}

In the above function, when the hardware has efficient unaligned access
capability, there is no issue with this code.
But when the hardware isn't able to access memory on arbitrary boundaries,
the reference to a[0] causes 2 bytes (16 bits) to be read from memory
starting at address addr1.

Think about what would happen if addr1 was an odd address such as 0x10003.
(Hint: it'd be an unaligned access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway but is understood to only work normally on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in an ethernet networking context.

Here is another example of some code that could cause unaligned accesses:

    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access
problems involve:
 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access:

    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }

To avoid the unaligned memory access, you would rewrite it as follows:

    void myfunc(u8 *data, u32 value)
    {
        [...]
        value = cpu_to_le32(value);
        put_unaligned(value, (u32 *) data);
        [...]
    }

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows:

    u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly
in terms of performance.

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
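
As a rough illustration of that memcpy()-based approach, here is a minimal
sketch. It is not taken from the kernel, and the helper names
load_u32_unaligned()/store_u32_unaligned() are made up for this example; it
uses the <stdint.h> counterparts of the kernel types:

    #include <string.h>     /* memcpy() */
    #include <stdint.h>

    /* Hypothetical helpers, for illustration only; in kernel code the
     * get_unaligned()/put_unaligned() macros above should be preferred.
     */
    static inline uint32_t load_u32_unaligned(const uint8_t *p)
    {
        uint32_t v;

        /* Byte-wise copy: no alignment is assumed for p. */
        memcpy(&v, p, sizeof(v));
        return v;
    }

    static inline void store_u32_unaligned(uint8_t *p, uint32_t v)
    {
        memcpy(p, &v, sizeof(v));
    }

Compilers such as GCC and Clang typically recognise this small, fixed-size
memcpy() and emit a single load or store on hardware that handles unaligned
accesses efficiently, falling back to byte accesses elsewhere, so the pattern
is both portable and usually cheap.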

--
In the Linux Kernel,
Authors: Daniel Drake <dsd@gentoo.org>,
         Johannes Berg <johannes@sipsolutions.net>
With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
Vadim Lobanov