Editor's note: This document is _heavily_ cribbed from the Linux Kernel, with
really only the section about "Alignment vs. Networking" removed.

UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.
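
As a purely illustrative sketch (the buffer is hypothetical; assume it
happens to start at address 0x10004), the difference looks like this at
the C level:

	u8 buf[8];

	u32 ok  = *((u32 *) &buf[0]);	/* address 0x10004: 0x10004 % 4 == 0 */
	u32 bad = *((u32 *) &buf[1]);	/* address 0x10005: 0x10005 % 4 != 0 */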

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access to the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding in between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).
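
For illustration (the exact numbers assume a typical ABI with natural
alignment, i.e. u16 aligned to 2 bytes and u32 to 4 bytes), the inserted
padding can be observed with offsetof() from <stddef.h> and sizeof():

	/* field1 at offset 0, 2 bytes of padding, field2 at offset 4,
	 * field3 at offset 8, then 3 bytes of tail padding. */
	size_t off  = offsetof(struct foo, field2);	/* 4, not 2 */
	size_t size = sizeof(struct foo);		/* 12, not 7 */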

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is:

	struct foo {
		u32 field2;
		u16 field1;
		u8 field3;
	};

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.
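
Under the same illustrative assumptions as before, the reordered layout
places field2 at offset 0, field1 at offset 4 and field3 at offset 6,
so:

	size_t size = sizeof(struct foo);	/* 8 instead of 12 */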

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, useful when you want to use a C struct
to represent some data that comes in a fixed arrangement 'off the wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. Of course,
the extra instructions obviously cause a loss in performance compared to the
non-packed case, so the packed attribute should only be used when avoiding
structure padding is of importance.
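
As a brief, hypothetical sketch of such an 'off the wire' layout (the
structure and field names are made up purely for illustration):

	struct wire_hdr {
		u8 type;
		u32 length;	/* offset 1: misaligned without compiler help */
		u16 checksum;
	} __attribute__((packed));

	/* sizeof(struct wire_hdr) == 7: no padding is inserted, and on
	 * strict-alignment architectures the compiler emits extra
	 * instructions so that accessing 'length' is still safe. */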


Code that causes unaligned access
=================================

With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, taken
from the Linux Kernel's include/linux/etherdevice.h, is an optimized routine
to compare two Ethernet MAC addresses for equality.

bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
		   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

	return fold == 0;
#else
	const u16 *a = (const u16 *)addr1;
	const u16 *b = (const u16 *)addr2;
	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
#endif
}

In the above function, when the hardware has efficient unaligned access
capability, there is no issue with this code.  But when the hardware isn't
able to access memory on arbitrary boundaries, the reference to a[0] causes
2 bytes (16 bits) to be read from memory starting at address addr1.

Think about what would happen if addr1 was an odd address such as 0x10003.
(Hint: it'd be an unaligned access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway, but is understood to work correctly only on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful,
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in an Ethernet networking context.
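
If the caller cannot guarantee that alignment, a plain byte-wise comparison
is a safe (if slower) alternative. A minimal sketch (the function name here
is made up for illustration only):

	bool ether_addr_equal_bytewise(const u8 *addr1, const u8 *addr2)
	{
		/* Single-byte accesses are always naturally aligned, so this
		 * works regardless of the alignment of addr1 and addr2. */
		return memcmp(addr1, addr2, 6) == 0;
	}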


Here is another example of some code that could cause unaligned accesses:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the two main scenarios where you may run into unaligned access
problems involve:
 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data
    (a brief sketch of this case follows below)
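
A minimal illustration of the second scenario (the function and the record
layout are hypothetical):

	void parse_record(u8 *buf)
	{
		[...]
		/* buf + 1 may well be an odd address, so this 2-byte read
		 * can be an unaligned access */
		u16 len = *((u16 *) (buf + 1));
		[...]
	}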


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

To avoid the unaligned memory access, you would rewrite it as follows:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		put_unaligned(value, (u32 *) data);
		[...]
	}

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows:

	u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly in
terms of performance.

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
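
For example, under the same assumptions as the earlier myfunc() sketch, the
store could equally be written with memcpy():

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		memcpy(data, &value, sizeof(value));
		[...]
	}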

--
In the Linux Kernel,
Authors: Daniel Drake <dsd@gentoo.org>,
         Johannes Berg <johannes@sipsolutions.net>
With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
Vadim Lobanov