==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

        ==              ======================
        WB              Write-back (cacheable)
        UC              Uncacheable
        WC              Write-coalescing
        ==              ======================

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping [1].
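
    As a minimal sketch, the aliasing rule can be written as a predicate
    on pairs of attributes.  The enum and function names are hypothetical
    illustrations, not kernel identifiers, and the code encodes only the
    cacheable/uncacheable rule stated above.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three attributes in the table above. */
enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/* Of the three, only WB is cacheable; UC and WC are both uncacheable. */
bool attr_is_cacheable(enum mem_attr a)
{
	return a == ATTR_WB;
}

/*
 * True if mapping one page with both attributes would mix a cacheable
 * and an uncacheable mapping, which the rule forbids.  This encodes
 * only that rule; it makes no claim about UC/WC combinations.
 */
bool attrs_conflict(enum mem_attr a, enum mem_attr b)
{
	return attr_is_cacheable(a) != attr_is_cacheable(b);
}
```

    For example, attrs_conflict(ATTR_WB, ATTR_UC) is true, so a page
    already mapped WB by the kernel must not also be mapped UC.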

    The design of the chipset determines which attributes are supported
    on which regions of the address space.  For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

Memory Map
==========

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot time, the kernel uses
    the EFI GetMemoryMap() interface.  ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.
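
    The lookup this implies can be sketched as below, assuming a
    simplified, hypothetical region array rather than the kernel's
    efi_memory_desc_t.  The capability values mirror the UEFI
    EFI_MEMORY_UC, EFI_MEMORY_WC and EFI_MEMORY_WB attribute bits.  The
    important detail is the separate "not described" result for
    addresses the firmware left out of the map.

```c
#include <stddef.h>
#include <stdint.h>

/* Capability bits; the values match UEFI's EFI_MEMORY_UC/WC/WB. */
#define CAP_UC 0x1ULL
#define CAP_WC 0x2ULL
#define CAP_WB 0x8ULL

/* Hypothetical, simplified map entry covering [start, end). */
struct mem_region {
	uint64_t start;
	uint64_t end;
	uint64_t caps;
};

/*
 * Return the capability bits of the entry containing addr, or 0 if no
 * entry covers it.  Because firmware may omit MMIO regions, 0 means
 * "not described", not "no attribute is safe".
 */
uint64_t map_caps(const struct mem_region *map, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= map[i].start && addr < map[i].end)
			return map[i].caps;
	return 0;
}
```

    Callers therefore need a policy for the 0 case; the mmap and
    ioremap() cases below each describe one.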

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

Kernel Identity Mappings
========================

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
    are speculative [2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions.  This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.
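
    The whole-granule requirement can be sketched as follows, assuming
    16MB granules and a sorted, non-overlapping region array.  The
    structure and function names are hypothetical, and CAP_WB reuses the
    value of UEFI's EFI_MEMORY_WB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16MB granules; the text says 16MB or 64MB. */
#define GRANULE_SHIFT 24
#define GRANULE_SIZE  (1ULL << GRANULE_SHIFT)

#define CAP_WB 0x8ULL	/* same value as UEFI's EFI_MEMORY_WB */

/* Hypothetical, simplified map entry covering [start, end). */
struct mem_region {
	uint64_t start;
	uint64_t end;
	uint64_t caps;
};

/* Round addr down to the start of its granule. */
uint64_t granule_base(uint64_t addr)
{
	return addr & ~(GRANULE_SIZE - 1);
}

/*
 * True if every byte of [start, end) lies in an entry that supports WB.
 * Assumes map is sorted by start and entries do not overlap.
 */
bool range_all_wb(const struct mem_region *map, size_t n,
		  uint64_t start, uint64_t end)
{
	uint64_t cur = start;

	for (size_t i = 0; i < n && cur < end; i++) {
		if (map[i].end <= cur)
			continue;	/* entry lies wholly below cur */
		if (map[i].start > cur || !(map[i].caps & CAP_WB))
			return false;	/* hole in the map, or no WB */
		cur = map[i].end;
	}
	return cur >= end;
}

/* A cacheable identity mapping needs the whole granule to support WB. */
bool granule_ok_for_wb_identity(const struct mem_region *map, size_t n,
				uint64_t addr)
{
	uint64_t base = granule_base(addr);

	return range_all_wb(map, n, base, base + GRANULE_SIZE);
}
```

    Applied to the HP sx1000 map with VGA enabled (see Past Problem
    Cases), the granule at address 0 fails this check because
    0xA0000-0xBFFFF is UC only, even though the rest of the first
    megabyte is WB.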

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

User Mappings
=============

    User mappings are typically done with 16K or 64K pages.  The smaller
    page size allows more flexibility because only each 16K or 64K page,
    rather than a whole granule, has to be homogeneous with respect to
    memory attributes.
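
    The effect of page size can be seen by computing which attributes
    every byte of an aligned page supports.  This sketch assumes the same
    hypothetical region layout and UEFI-valued capability bits as a
    simple model; shift is log2 of the page size (14 for 16K, 16 for
    64K, 24 for a 16MB granule).

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_UC 0x1ULL	/* same value as UEFI's EFI_MEMORY_UC */
#define CAP_WB 0x8ULL	/* same value as UEFI's EFI_MEMORY_WB */

/* Hypothetical, simplified map entry covering [start, end). */
struct mem_region {
	uint64_t start;
	uint64_t end;
	uint64_t caps;
};

/*
 * Return the attributes supported by every byte of [start, end), or 0
 * if the range has a hole or no single attribute works throughout.
 * Assumes map is sorted by start and entries do not overlap.
 */
uint64_t common_caps(const struct mem_region *map, size_t n,
		     uint64_t start, uint64_t end)
{
	uint64_t cur = start, caps = ~0ULL;

	for (size_t i = 0; i < n && cur < end; i++) {
		if (map[i].end <= cur)
			continue;
		if (map[i].start > cur)
			return 0;	/* hole in the map */
		caps &= map[i].caps;
		cur = map[i].end;
	}
	return cur >= end ? caps : 0;
}

/* Attributes usable for the aligned 2^shift-byte page containing addr. */
uint64_t page_caps(const struct mem_region *map, size_t n,
		   uint64_t addr, unsigned int shift)
{
	uint64_t size = 1ULL << shift;
	uint64_t base = addr & ~(size - 1);

	return common_caps(map, n, base, base + size);
}
```

    With the HP sx1000 map with VGA enabled, every 16K page in
    0x0-0xFFFFF has exactly one usable attribute (WB or UC), while the
    16MB granule containing them has none.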

Potential Attribute Aliasing Cases
==================================

    There are several ways the kernel creates new mappings:

mmap of /dev/mem
----------------

	This uses remap_pfn_range(), which creates user mappings.  These
	mappings may be either WB or UC.  If the region being mapped
	happens to be in kern_memmap, meaning that it may also be mapped
	by a kernel identity mapping, the user mapping must use the same
	attribute as the kernel mapping.

	If the region is not in kern_memmap, the user mapping should use
	an attribute reported as being supported in the EFI memory map.

	Since the EFI memory map does not describe MMIO on some
	machines, a region found in neither table should fall back to
	an uncacheable mapping.
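
	The three rules above form a short decision sequence, sketched
	here with a hypothetical per-page summary structure.  Preferring
	WB when the EFI map allows both WB and UC is an assumption
	borrowed from the legacy_mem discussion below; this section itself
	only says "an attribute reported as being supported".

```c
#include <stdbool.h>
#include <stdint.h>

enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

#define CAP_UC 0x1ULL	/* same value as UEFI's EFI_MEMORY_UC */
#define CAP_WB 0x8ULL	/* same value as UEFI's EFI_MEMORY_WB */

/*
 * Hypothetical summary of what the kernel knows about one page:
 * in_kern_memmap/kern_attr come from kern_memmap, and efi_caps from
 * the EFI memory map (0 if the EFI map does not describe the page).
 */
struct page_info {
	bool in_kern_memmap;
	enum mem_attr kern_attr;
	uint64_t efi_caps;
};

enum mem_attr devmem_mmap_attr(const struct page_info *p)
{
	/* Must match any kernel identity mapping of the same page. */
	if (p->in_kern_memmap)
		return p->kern_attr;
	/* Otherwise use an attribute the EFI map reports (WB preferred). */
	if (p->efi_caps & CAP_WB)
		return ATTR_WB;
	if (p->efi_caps & CAP_UC)
		return ATTR_UC;
	/* Not described (e.g. omitted MMIO): fall back to uncacheable. */
	return ATTR_UC;
}
```

	Note that this function never fails, which matches the
	requirement under Past Problem Cases that every /dev/mem mmap
	by X must succeed when the EFI map is incomplete.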

mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

	This is very similar to mmap of /dev/mem, except that legacy_mem
	only allows mmap of the one-megabyte "legacy MMIO" area for a
	specific PCI bus.  Typically this is the first megabyte of
	physical address space, but it may be different on machines with
	several VGA devices.

	"X" uses this to access VGA frame buffers.  Using legacy_mem
	rather than /dev/mem allows multiple instances of X to talk to
	different VGA cards.

	The /dev/mem mmap constraints apply.

mmap of /proc/bus/pci/.../??.?
------------------------------

	This is an MMIO mmap of PCI functions, which may additionally
	request the WC attribute.

	If WC is requested, and the region in kern_memmap is either WC
	or UC, and the EFI memory map designates the region as WC, then
	the WC mapping is allowed.

	Otherwise, the user mapping must use the same attribute as the
	kernel mapping.
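
	The WC rule reduces to a single condition.  In this minimal
	sketch the function name is hypothetical, kern_attr stands for
	the attribute of the kernel mapping, and CAP_WC reuses the value
	of UEFI's EFI_MEMORY_WC.

```c
#include <stdbool.h>
#include <stdint.h>

enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

#define CAP_WC 0x2ULL	/* same value as UEFI's EFI_MEMORY_WC */

/*
 * want_wc:   the user asked for a write-coalescing mapping
 * kern_attr: attribute of the kernel mapping of the region
 * efi_caps:  capability bits from the EFI memory map (0 if absent)
 */
enum mem_attr pci_mmap_attr(bool want_wc, enum mem_attr kern_attr,
			    uint64_t efi_caps)
{
	if (want_wc && (kern_attr == ATTR_WC || kern_attr == ATTR_UC) &&
	    (efi_caps & CAP_WC))
		return ATTR_WC;
	return kern_attr;	/* otherwise match the kernel mapping */
}
```

	A WC request on a region the EFI map does not mark as WC thus
	silently degrades to the kernel's attribute rather than failing.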

read/write of /dev/mem
----------------------

	This uses copy_to_user() (for read) and copy_from_user() (for
	write), which implicitly use a kernel identity mapping.  This is
	obviously safe for things in kern_memmap.

	There may be corner cases of things that are not in kern_memmap,
	but could be accessed this way.  For example, registers in MMIO
	space are not in kern_memmap, but could be accessed with a UC
	mapping.  This would not cause attribute aliasing.  But
	registers typically can be accessed only with four-byte or
	eight-byte accesses, and the copy path doesn't allow any
	control over the access size, so this would be dangerous.
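
	To illustrate the access-size problem, here is a hypothetical
	helper that, unlike a generic copy, performs only aligned 32-bit
	reads and refuses requests that would need narrower ones.  In the
	kernel, drivers use MMIO accessors such as readl() and readq()
	for this; the sketch uses a plain volatile pointer so that it
	runs in user space.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from a register window using only aligned 32-bit
 * volatile reads.  Returns 0 on success, or -1 if src or len would
 * require an access narrower than 4 bytes.
 */
int copy_regs32(uint32_t *dst, const volatile uint32_t *src, size_t len)
{
	if (((uintptr_t)src & 3) || (len & 3))
		return -1;
	for (size_t i = 0; i < len / 4; i++)
		dst[i] = src[i];	/* one volatile 32-bit read each */
	return 0;
}
```

	A generic copy routine is free to split or merge accesses, which
	is harmless for RAM but not for device registers.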

ioremap()
---------

	This returns a mapping for use inside the kernel.

	If the region is in kern_memmap, we should use the attribute
	specified there.

	If the EFI memory map reports that the entire granule supports
	WB, we should use that (granules that are partially reserved
	or occupied by firmware do not appear in kern_memmap).

	If the granule contains non-WB memory, but we can cover the
	region safely with kernel page table mappings, we can use
	ioremap_page_range() as most other architectures do.

	Failing all of the above, we have to fall back to a UC mapping.
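
	The order of these checks matters, so a sketch of the decision
	ladder may help.  Each boolean parameter stands for one of the
	checks above; the enum and function names are hypothetical, and
	only ioremap_page_range() is named in the text.

```c
#include <stdbool.h>

enum ioremap_kind {
	MAP_KERN_ATTR,	/* use the attribute kern_memmap specifies */
	MAP_WB_GRANULE,	/* WB granule, the whole granule supports WB */
	MAP_PAGE_TABLE,	/* ioremap_page_range()-style page mappings */
	MAP_UC_GRANULE,	/* uncacheable fallback */
};

/*
 * in_kern_memmap:        region is in kern_memmap
 * granule_all_wb:        EFI map says the whole granule supports WB
 * page_tables_can_cover: region can be covered safely by page mappings
 */
enum ioremap_kind ioremap_choice(bool in_kern_memmap, bool granule_all_wb,
				 bool page_tables_can_cover)
{
	if (in_kern_memmap)
		return MAP_KERN_ATTR;
	if (granule_all_wb)
		return MAP_WB_GRANULE;
	if (page_tables_can_cover)
		return MAP_PAGE_TABLE;
	return MAP_UC_GRANULE;
}
```

	The rom case under Past Problem Cases is an example of the third
	branch winning over the UC fallback.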

Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

      The EFI memory map may not report these MMIO regions.

      These must be allowed so that X will work.  This means that
      when the EFI memory map is incomplete, every /dev/mem mmap must
      succeed.  It may create either WB or UC user mappings, depending
      on whether the region is in kern_memmap or the EFI memory map.

mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

      The EFI memory map reports the following attributes:

        =============== ========= ==================
        Range           Attribute Notes
        =============== ========= ==================
        0x00000-0x9FFFF WB only
        0xA0000-0xBFFFF UC only   VGA frame buffer
        0xC0000-0xFFFFF WB only
        =============== ========= ==================

      This mmap is done with user pages, not kernel identity mappings,
      so it is safe to use WB mappings.

      The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
      which uses a granule-sized UC mapping.  This granule will cover some
      WB-only memory, but since UC is non-speculative, the processor will
      never generate an uncacheable reference to the WB-only areas unless
      the driver explicitly touches them.

mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

      If the EFI memory map reports that the entire range supports the
      same attributes, we can allow the mmap (and we will prefer WB if
      supported, as is the case with HP sx[12]000 machines with VGA
      disabled).

      If EFI reports the range as partly WB and partly UC (as on sx[12]000
      machines with VGA enabled), we must fail the mmap because there's no
      safe attribute to use.

      If EFI reports some of the range but not all (as on Intel firmware
      that doesn't report the VGA frame buffer at all), we should fail the
      mmap and force the user to map just the specific region of interest.
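
      The three rules above can be sketched as one function over a
      hypothetical, sorted copy of the EFI map.  It interprets "the
      same attributes" as identical capability bits in every entry the
      range touches; the names and the UEFI-valued CAP_* bits are
      assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_UC 0x1ULL	/* same value as UEFI's EFI_MEMORY_UC */
#define CAP_WB 0x8ULL	/* same value as UEFI's EFI_MEMORY_WB */

/* Hypothetical, simplified EFI map entry covering [start, end). */
struct mem_region {
	uint64_t start;
	uint64_t end;
	uint64_t caps;
};

/*
 * Decide how to mmap [start, end) of legacy_mem.  Returns CAP_WB or
 * CAP_UC for the attribute to use, or 0 to fail the mmap.
 * Assumes map is sorted by start and entries do not overlap.
 */
uint64_t legacy_mem_attr(const struct mem_region *map, size_t n,
			 uint64_t start, uint64_t end)
{
	uint64_t cur = start, caps = 0;
	bool seen = false;

	for (size_t i = 0; i < n && cur < end; i++) {
		if (map[i].end <= cur)
			continue;
		if (map[i].start > cur)
			return 0;	/* part of the range is unreported */
		if (seen && map[i].caps != caps)
			return 0;	/* e.g. partly WB, partly UC */
		caps = map[i].caps;
		seen = true;
		cur = map[i].end;
	}
	if (cur < end)
		return 0;		/* tail of the range is unreported */
	if (caps & CAP_WB)
		return CAP_WB;		/* prefer WB when supported */
	if (caps & CAP_UC)
		return CAP_UC;
	return 0;
}
```

      This covers only the three rules of this case.  It does not model
      the VGA-disabled case that follows, which the text says should
      also fail even though that map is uniformly WB.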

mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

      The EFI memory map reports the following attributes::

        0x00000-0xFFFFF WB only (no VGA MMIO hole)

      This is a special case of the previous case, and the mmap should
      fail for the same reason as above.

read of /sys/devices/.../rom
----------------------------

      For VGA devices, this may cause an ioremap() of 0xC0000.  This
      used to be done with a UC mapping, because the VGA frame buffer
      at 0xA0000 prevents use of a WB granule.  The UC mapping causes
      a machine check abort (MCA) on HP sx[12]000 chipsets.

      We should use WB page table mappings to avoid covering the VGA
      frame buffer.
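
      A quick check shows why page table mappings help here: rounding
      a mapping of 0xC0000 out to 16K or 64K pages stays clear of the
      frame buffer at 0xA0000-0xBFFFF, while rounding it out to a 16MB
      granule does not.  The function name is hypothetical, and the
      64K ROM length used in the example is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* VGA frame buffer window from the text: [0xA0000, 0xC0000). */
#define VGA_FB_START 0xA0000ULL
#define VGA_FB_END   0xC0000ULL

/*
 * Does a mapping of [addr, addr + len), rounded out to aligned
 * 2^shift-byte pages, overlap the VGA frame buffer?
 */
bool mapping_covers_vga_fb(uint64_t addr, uint64_t len, unsigned int shift)
{
	uint64_t size = 1ULL << shift;
	uint64_t start = addr & ~(size - 1);
	uint64_t end = (addr + len + size - 1) & ~(size - 1);

	return start < VGA_FB_END && end > VGA_FB_START;
}
```

      For example, mapping_covers_vga_fb(0xC0000, 0x10000, 14) is false
      because 0xC0000 is already 16K-aligned, whereas a shift of 24
      rounds the start down to 0 and pulls in the frame buffer.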

Notes
=====

    [1] Intel Itanium Architecture Software Developer's Manual (SDM),
        rev 2.2, vol 2, sec 4.4.1.
    [2] SDM, rev 2.2, vol 2, sec 4.4.6.
247