.. SPDX-License-Identifier: GPL-2.0

==========================
PAT (Page Attribute Table)
==========================

x86 Page Attribute Table (PAT) allows for setting the memory attribute at
page-level granularity. PAT is complementary to the MTRR settings, which allow
for setting of memory types over physical address ranges. However, PAT is
more flexible than MTRR due to its capability to set attributes at page level
and also due to the fact that there are no hardware limitations on the number
of such attribute settings allowed. Added flexibility comes with guidelines for
not having memory type aliasing for the same physical memory with multiple
virtual addresses.

PAT allows for different types of memory attributes. The most commonly used
ones that will be supported at this time are:

===  ==============
WB   Write-back
UC   Uncached
WC   Write-combined
WT   Write-through
UC-  Uncached Minus
===  ==============


PAT APIs
========

There are many different APIs in the kernel that allow setting of memory
attributes at the page level. In order to avoid aliasing, these interfaces
should be used thoughtfully. Below is a table of interfaces available,
their intended usage and their memory attribute relationships.
Internally,
these APIs use a reserve_memtype()/free_memtype() interface on the physical
address range to avoid any aliasing.

+------------------------+----------+--------------+------------------+
| API                    |    RAM   |  ACPI,...    |  Reserved/Holes  |
+------------------------+----------+--------------+------------------+
| ioremap                |    --    |    UC-       |       UC-        |
+------------------------+----------+--------------+------------------+
| ioremap_cache          |    --    |    WB        |       WB         |
+------------------------+----------+--------------+------------------+
| ioremap_uc             |    --    |    UC        |       UC         |
+------------------------+----------+--------------+------------------+
| ioremap_wc             |    --    |    --        |       WC         |
+------------------------+----------+--------------+------------------+
| ioremap_wt             |    --    |    --        |       WT         |
+------------------------+----------+--------------+------------------+
| set_memory_uc,         |    UC-   |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wc,         |    WC    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wt,         |    WT    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci sysfs resource     |    --    |    --        |       UC-        |
+------------------------+----------+--------------+------------------+
| pci sysfs resource_wc  |    --    |    --        |       WC         |
| is IORESOURCE_PREFETCH |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       UC-        |
| !PCIIOC_WRITE_COMBINE  |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       WC         |
| PCIIOC_WRITE_COMBINE   |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| read-write             |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    UC-       |       UC-        |
| mmap SYNC flag         |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| mmap !SYNC flag        |          |              |                  |
| and                    |          |(from existing|  (from existing  |
| any alias to this area |          |alias)        |    alias)        |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    WB        |       WB         |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says WB           |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    --        |       UC-        |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says !WB          |          |              |                  |
+------------------------+----------+--------------+------------------+


Advanced APIs for drivers
=========================

A.
Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
vmf_insert_pfn.

Drivers wanting to export some pages to userspace do it by using the mmap
interface and a combination of:

  1) pgprot_noncached()
  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()

With PAT support, a new API pgprot_writecombine is being added. So, drivers can
continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in the memtype
list in order to ensure there is no conflicting mapping.

Note that this set of APIs only works with IO (non RAM) regions. If a driver
wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above, track the usage of those pages, and use set_memory_wb()
before the pages are returned to the free pool.

MTRR effects on PAT / non-PAT systems
=====================================

The following table provides the effects of using write-combining MTRRs when
using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
be a no-op on PAT enabled systems.
The region over which an arch_phys_wc_add() call
is made should already have been ioremapped with WC attributes or PAT entries;
this can be done by using ioremap_wc() / set_memory_wc(). Devices which
combine areas of IO memory desired to remain uncacheable with areas where
write-combining is desirable should consider use of ioremap_uc() followed by
set_memory_wc() to white-list effective write-combined areas. Such use is
nevertheless discouraged as the effective memory type is considered
implementation defined, yet this strategy can be used as a last resort on
devices with size-constrained regions where MTRR write-combining would
otherwise not be effective.
::

  ====  =======  ===  =========================  =====================
  MTRR  Non-PAT  PAT  Linux ioremap value        Effective memory type
  ====  =======  ===  =========================  =====================
        PAT                                        Non-PAT |  PAT
        |PCD                                               |
        ||PWT                                              |
        |||                                                |
  WC    000      WB   _PAGE_CACHE_MODE_WB            WC    |   WC
  WC    001      WC   _PAGE_CACHE_MODE_WC            WC*   |   WC
  WC    010      UC-  _PAGE_CACHE_MODE_UC_MINUS      WC*   |   UC
  WC    011      UC   _PAGE_CACHE_MODE_UC            UC    |   UC
  ====  =======  ===  =========================  =====================

  (*) denotes implementation defined and is discouraged

.. note:: -- in the PAT APIs table above means "Not suggested usage for the
   API". Some of the --'s are strictly enforced by the kernel.
   Some others are not really
   enforced today, but may be enforced in future.

For ioremap and pci access through /sys or /proc - The actual type returned
can be more restrictive, in case of any existing aliasing for that address.
For example: If there is an existing uncached mapping, a new ioremap_wc can
return an uncached mapping in place of the write-combining one requested.

set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where a driver
first makes a region uc, wc or wt and switches it back to wb after use.

Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
interfaces. Users writing to /proc/mtrr are encouraged to use the above
interfaces.

Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.

Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.

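As a rough illustration of the last two guidelines, the sketch below maps a
PCI BAR with ioremap_wc() and pairs set_memory_wc() with set_memory_wb() for
RAM pages. The example_*() helper names, the use of __get_free_pages() as the
allocator and the error handling are assumptions made for this illustration
and are not taken from an existing driver. It is kernel code and is only
meaningful when built as part of a driver.

.. code-block:: c

   #include <linux/gfp.h>
   #include <linux/io.h>
   #include <linux/pci.h>
   #include <linux/set_memory.h>

   /* Hypothetical helper: map one PCI BAR as write-combining. */
   static void __iomem *example_map_bar_wc(struct pci_dev *pdev, int bar)
   {
           return ioremap_wc(pci_resource_start(pdev, bar),
                             pci_resource_len(pdev, bar));
   }

   /* Hypothetical helper: allocate 2^order RAM pages and switch them to WC. */
   static unsigned long example_alloc_wc_pages(unsigned int order)
   {
           unsigned long addr = __get_free_pages(GFP_KERNEL, order);

           if (!addr)
                   return 0;

           if (set_memory_wc(addr, 1 << order)) {
                   /*
                    * Conservative choice for this sketch: request WB again
                    * before freeing, in case the change was partly applied.
                    */
                   set_memory_wb(addr, 1 << order);
                   free_pages(addr, order);
                   return 0;
           }
           return addr;
   }

   /* Counterpart: switch the pages back to WB before freeing them. */
   static void example_free_wc_pages(unsigned long addr, unsigned int order)
   {
           set_memory_wb(addr, 1 << order);
           free_pages(addr, order);
   }

A mapping returned by example_map_bar_wc() would be released with iounmap()
once the driver is done with it.
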


PAT debugging
=============

With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined by::

  # mount -t debugfs debugfs /sys/kernel/debug
  # cat /sys/kernel/debug/x86/pat_memtype_list
  PAT memtype list:
  uncached-minus @ 0x7fadf000-0x7fae0000
  uncached-minus @ 0x7fb19000-0x7fb1a000
  uncached-minus @ 0x7fb1a000-0x7fb1b000
  uncached-minus @ 0x7fb1b000-0x7fb1c000
  uncached-minus @ 0x7fb1c000-0x7fb1d000
  uncached-minus @ 0x7fb1d000-0x7fb1e000
  uncached-minus @ 0x7fb1e000-0x7fb25000
  uncached-minus @ 0x7fb25000-0x7fb26000
  uncached-minus @ 0x7fb26000-0x7fb27000
  uncached-minus @ 0x7fb27000-0x7fb28000
  uncached-minus @ 0x7fb28000-0x7fb2e000
  uncached-minus @ 0x7fb2e000-0x7fb2f000
  uncached-minus @ 0x7fb2f000-0x7fb30000
  uncached-minus @ 0x7fb31000-0x7fb32000
  uncached-minus @ 0x80000000-0x90000000

This list shows physical address ranges and various PAT settings used to
access those physical address ranges.

Another, more verbose way of getting PAT related debug messages is with
"debugpat" boot parameter. With this parameter, various debug messages are
printed to dmesg log.

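When the memtype list is long, it can help to post-process it in userspace.
The following is a minimal sketch of such a tool, assuming entries have
exactly the "<type> @ 0x<start>-0x<end>" form shown in the sample output
above (other kernel versions may print entries differently) and assuming the
end address is exclusive, as the back-to-back ranges in the sample suggest.
The function names are hypothetical.

.. code-block:: c

   #include <stdio.h>

   /*
    * Parse one entry of the form "<type> @ 0x<start>-0x<end>".
    * Returns 1 on success and 0 for lines that do not match, such as the
    * "PAT memtype list:" header.
    */
   static int parse_memtype_line(const char *line, char type[32],
                                 unsigned long long *start,
                                 unsigned long long *end)
   {
           return sscanf(line, " %31s @ %llx-%llx", type, start, end) == 3;
   }

   /* Sum the sizes of all entries read from an already opened stream. */
   static unsigned long long memtype_total_bytes(FILE *f)
   {
           char line[128], type[32];
           unsigned long long start, end, total = 0;

           while (fgets(line, sizeof(line), f))
                   if (parse_memtype_line(line, type, &start, &end) &&
                       end > start)
                           total += end - start;
           return total;
   }

A caller would typically fopen() /sys/kernel/debug/x86/pat_memtype_list,
pass the stream to memtype_total_bytes() and fclose() it afterwards.
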

PAT Initialization
==================

The following table describes how PAT is initialized under various
configurations. The PAT MSR must be updated by Linux in order to support WC
and WT attributes. Otherwise, the PAT MSR has the value programmed in it
by the firmware. Note that Xen enables the WC attribute in the PAT MSR for
guests.

 ==== ===== ==========================  =========  =======
 MTRR PAT   Call Sequence               PAT State  PAT MSR
 ==== ===== ==========================  =========  =======
 E    E     MTRR -> PAT init            Enabled    OS
 E    D     MTRR -> PAT init            Disabled   -
 D    E     MTRR -> PAT disable         Disabled   BIOS
 D    D     MTRR -> PAT disable         Disabled   -
 -    np/E  PAT -> PAT disable          Disabled   BIOS
 -    np/D  PAT -> PAT disable          Disabled   -
 E    !P/E  MTRR -> PAT init            Disabled   BIOS
 D    !P/E  MTRR -> PAT disable         Disabled   BIOS
 !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
 ==== ===== ==========================  =========  =======

 Legend

 ========= =======================================
 E         Feature enabled in CPU
 D         Feature disabled/unsupported in CPU
 np        "nopat" boot option specified
 !P        CONFIG_X86_PAT option unset
 !M        CONFIG_MTRR option unset
 Enabled   PAT state set to enabled
 Disabled  PAT state set to disabled
 OS        PAT initializes PAT MSR with OS setting
 BIOS      PAT keeps PAT MSR with BIOS setting
 ========= =======================================