==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

        ==  ======================
        WB  Write-back (cacheable)
        UC  Uncacheable
        WC  Write-coalescing
        ==  ======================

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space.  For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.
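
    As a minimal sketch of the aliasing rule above (user-space C, not
    kernel code): the enum and both helper names are hypothetical and
    exist only for illustration.  WB is treated as cacheable, and UC
    and WC as uncacheable, as described in this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three attributes discussed above. */
enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/* WB is cacheable; UC and WC are both uncacheable. */
bool attr_is_cacheable(enum mem_attr attr)
{
	assert(attr == ATTR_WB || attr == ATTR_UC || attr == ATTR_WC);
	return attr == ATTR_WB;
}

/*
 * Return true if mapping a page with "requested" while it is already
 * mapped with "existing" would mix a cacheable and an uncacheable
 * mapping of the same page, which is the combination to avoid.
 */
bool mapping_would_alias(enum mem_attr existing, enum mem_attr requested)
{
	return attr_is_cacheable(existing) != attr_is_cacheable(requested);
}
```

    Under these assumptions a WB mapping conflicts with a UC or WC
    mapping of the same page, while UC and WC do not conflict with
    each other.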

Memory Map
==========

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot-time, the kernel uses
    the EFI GetMemoryMap() interface.  ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

Kernel Identity Mappings
========================

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."
    Cacheable mappings are speculative[2], so the processor can read any
    location in the page at any time, independent of the programmer's
    intentions.  This means that to avoid attribute aliasing, Linux can
    create a cacheable identity mapping only when the entire granule
    supports cacheable access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

User Mappings
=============

    User mappings are typically done with 16K or 64K pages.  The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

Potential Attribute Aliasing Cases
==================================

    There are several ways the kernel creates new mappings:

mmap of /dev/mem
----------------

        This uses remap_pfn_range(), which creates user mappings.  These
        mappings may be either WB or UC.
        If the region being mapped happens to be in kern_memmap, meaning
        that it may also be mapped by a kernel identity mapping, the user
        mapping must use the same attribute as the kernel mapping.

        If the region is not in kern_memmap, the user mapping should use
        an attribute reported as being supported in the EFI memory map.

        Since the EFI memory map does not describe MMIO on some
        machines, this should use an uncacheable mapping as a fallback.

mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

        This is very similar to mmap of /dev/mem, except that legacy_mem
        only allows mmap of the one megabyte "legacy MMIO" area for a
        specific PCI bus.  Typically this is the first megabyte of
        physical address space, but it may be different on machines with
        several VGA devices.

        "X" uses this to access VGA frame buffers.  Using legacy_mem
        rather than /dev/mem allows multiple instances of X to talk to
        different VGA cards.

        The /dev/mem mmap constraints apply.

mmap of /proc/bus/pci/.../??.?
------------------------------

        This is an MMIO mmap of PCI functions, which additionally may or
        may not be requested as using the WC attribute.

        If WC is requested, and the region in kern_memmap is either WC
        or UC, and the EFI memory map designates the region as WC, then
        the WC mapping is allowed.

        Otherwise, the user mapping must use the same attribute as the
        kernel mapping.

read/write of /dev/mem
----------------------

        This uses copy_from_user(), which implicitly uses a kernel
        identity mapping.  This is obviously safe for things in
        kern_memmap.

        There may be corner cases of things that are not in kern_memmap,
        but could be accessed this way.  For example, registers in MMIO
        space are not in kern_memmap, but could be accessed with a UC
        mapping.  This would not cause attribute aliasing.  But
        registers typically can be accessed only with four-byte or
        eight-byte accesses, and the copy_from_user() path doesn't allow
        any control over the access size, so this would be dangerous.

ioremap()
---------

        This returns a mapping for use inside the kernel.

        If the region is in kern_memmap, we should use the attribute
        specified there.

        If the EFI memory map reports that the entire granule supports
        WB, we should use that (granules that are partially reserved
        or occupied by firmware do not appear in kern_memmap).

        If the granule contains non-WB memory, but we can cover the
        region safely with kernel page table mappings, we can use
        ioremap_page_range() as most other architectures do.

        Failing all of the above, we have to fall back to a UC mapping.

Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

        The EFI memory map may not report these MMIO regions.

        These must be allowed so that X will work.  This means that
        when the EFI memory map is incomplete, every /dev/mem mmap must
        succeed.  It may create either WB or UC user mappings, depending
        on whether the region is in kern_memmap or the EFI memory map.

mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

        The EFI memory map reports the following attributes:

        =============== ======= ==================
        0x00000-0x9FFFF WB only
        0xA0000-0xBFFFF UC only (VGA frame buffer)
        0xC0000-0xFFFFF WB only
        =============== ======= ==================

        This mmap is done with user pages, not kernel identity mappings,
        so it is safe to use WB mappings.
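
        The reasoning above can be sketched in user-space C, assuming a
        simplified, address-sorted table in place of the real EFI memory
        map.  The names struct region, range_attrs(), ATTR_UC, ATTR_WB
        and sx1000_vga_map are hypothetical and are not kernel
        interfaces; the flag values are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative attribute flags (hypothetical, not kernel definitions). */
#define ATTR_UC 0x1u
#define ATTR_WB 0x8u

struct region {
	uint64_t start;  /* first byte of the region */
	uint64_t end;    /* last byte of the region (inclusive) */
	unsigned attrs;  /* attributes the region supports */
};

/* The attributes from the table above (sx1000, VGA enabled). */
const struct region sx1000_vga_map[] = {
	{ 0x00000, 0x9FFFF, ATTR_WB },
	{ 0xA0000, 0xBFFFF, ATTR_UC },
	{ 0xC0000, 0xFFFFF, ATTR_WB },
};

/*
 * Return the attributes supported by every byte of [start, end], or 0
 * if no attribute is common to the whole range or if part of the range
 * is not described.  Assumes map[] is sorted and non-overlapping.
 */
unsigned range_attrs(const struct region *map, size_t n,
		     uint64_t start, uint64_t end)
{
	unsigned attrs = ~0u;
	uint64_t next = start;  /* first byte not yet accounted for */

	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		if (map[i].end < next)
			continue;        /* entirely below what remains */
		if (map[i].start > next)
			return 0;        /* hole: range not fully described */
		attrs &= map[i].attrs;
		if (map[i].end >= end)
			return attrs;    /* whole range is now covered */
		next = map[i].end + 1;
	}
	return 0;                        /* map ended before the range did */
}
```

        With this table, the range 0x0-0x9FFFF is uniformly WB, so a WB
        user mapping of just that range is consistent with the map,
        whereas a request spanning 0x0-0xFFFFF has no attribute common
        to the whole range.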

        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
        which uses a granule-sized UC mapping.  This granule will cover
        some WB-only memory, but since UC is non-speculative, the
        processor will never generate an uncacheable reference to the
        WB-only areas unless the driver explicitly touches them.

mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

        If the EFI memory map reports that the entire range supports the
        same attributes, we can allow the mmap (and we will prefer WB if
        supported, as is the case with HP sx[12]000 machines with VGA
        disabled).

        If EFI reports the range as partly WB and partly UC (as on
        sx[12]000 machines with VGA enabled), we must fail the mmap
        because there's no safe attribute to use.

        If EFI reports some of the range but not all (as on Intel
        firmware that doesn't report the VGA frame buffer at all), we
        should fail the mmap and force the user to map just the specific
        region of interest.

mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

        The EFI memory map reports the following attributes::

            0x00000-0xFFFFF WB only (no VGA MMIO hole)

        This is a special case of the previous case, and the mmap should
        fail for the same reason as above.

read of /sys/devices/.../rom
----------------------------

        For VGA devices, this may cause an ioremap() of 0xC0000.  This
        used to be done with a UC mapping, because the VGA frame buffer
        at 0xA0000 prevents use of a WB granule.  The UC mapping causes
        an MCA on HP sx[12]000 chipsets.

        We should use WB page table mappings to avoid covering the VGA
        frame buffer.

Notes
=====

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.