.. SPDX-License-Identifier: GPL-2.0

==========================
PAT (Page Attribute Table)
==========================

x86 Page Attribute Table (PAT) allows for setting the memory attribute at
page-level granularity. PAT is complementary to the MTRR settings, which allow
for setting of memory types over physical address ranges. However, PAT is
more flexible than MTRR due to its capability to set attributes at page level
and also due to the fact that there are no hardware limitations on the number
of such attribute settings allowed. The added flexibility comes with guidelines
for not having memory type aliasing for the same physical memory with multiple
virtual addresses.

PAT allows for different types of memory attributes. The most commonly used
ones that will be supported at this time are:

===  ==============
WB   Write-back
UC   Uncached
WC   Write-combined
WT   Write-through
UC-  Uncached Minus
===  ==============


PAT APIs
========

There are many different APIs in the kernel that allow setting of memory
attributes at the page level. In order to avoid aliasing, these interfaces
should be used thoughtfully. Below is a table of the interfaces available,
their intended usage and their memory attribute relationships. Internally,
these APIs use a reserve_memtype()/free_memtype() interface on the physical
address range to avoid any aliasing.

+------------------------+----------+--------------+------------------+
| API                    |    RAM   |  ACPI,...    |  Reserved/Holes  |
+------------------------+----------+--------------+------------------+
| ioremap                |    --    |    UC-       |       UC-        |
+------------------------+----------+--------------+------------------+
| ioremap_cache          |    --    |    WB        |       WB         |
+------------------------+----------+--------------+------------------+
| ioremap_uc             |    --    |    UC        |       UC         |
+------------------------+----------+--------------+------------------+
| ioremap_wc             |    --    |    --        |       WC         |
+------------------------+----------+--------------+------------------+
| ioremap_wt             |    --    |    --        |       WT         |
+------------------------+----------+--------------+------------------+
| set_memory_uc,         |    UC-   |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wc,         |    WC    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wt,         |    WT    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci sysfs resource     |    --    |    --        |       UC-        |
+------------------------+----------+--------------+------------------+
| pci sysfs resource_wc  |    --    |    --        |       WC         |
| is IORESOURCE_PREFETCH |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       UC-        |
| !PCIIOC_WRITE_COMBINE  |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       WC         |
| PCIIOC_WRITE_COMBINE   |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |   WB/WC/UC-  |    WB/WC/UC-     |
| read-write             |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    UC-       |       UC-        |
| mmap SYNC flag         |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |   WB/WC/UC-  |  WB/WC/UC-       |
| mmap !SYNC flag        |          |              |                  |
| and                    |          |(from existing|  (from existing  |
| any alias to this area |          |alias)        |  alias)          |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    WB        |       WB         |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says WB           |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    --        |       UC-        |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says !WB          |          |              |                  |
+------------------------+----------+--------------+------------------+


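As a user-space illustration of the two "pci sysfs" rows in the table above, a
process can mmap() a BAR through its sysfs resource file. The following is only
a minimal sketch: the helper name map_pci_resource() is hypothetical, and the
caller is assumed to supply the path of a resource file (for example a
resourceN_wc file of a prefetchable BAR) and a length that fits inside the BAR.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of a PCI sysfs resource file into this process.
 * `path` is chosen by the caller; which file to use is an assumption
 * of this sketch, not something the helper checks.
 * Returns the mapping, or NULL on any failure.
 */
void *map_pci_resource(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */

	return p == MAP_FAILED ? NULL : p;
}
```

According to the table, a mapping obtained through a resource file is UC-,
while one obtained through a resource_wc file is WC. The caller releases the
mapping with munmap() when done.
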
Advanced APIs for drivers
=========================

A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
vmf_insert_pfn.

Drivers wanting to export some pages to userspace do it by using the mmap
interface and a combination of:

  1) pgprot_noncached()
  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()

With PAT support, a new API, pgprot_writecombine(), is being added. So, drivers
can continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in the memtype
list in order to ensure there is no conflicting mapping.

Note that this set of APIs only works with IO (non-RAM) regions. If a driver
wants to export a RAM region, it has to call set_memory_uc() or set_memory_wc()
as step 0 above, track the usage of those pages, and call set_memory_wb()
before the pages are freed to the free pool.

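The two-step sequence above, applied to an IO (non-RAM) region, might look
roughly like the following mmap handler. This is a sketch of a kernel module
fragment, not a standalone program. The names mydev_mmap, mydev_phys_base and
mydev_phys_len are hypothetical, and the driver is assumed to have filled in
the base and length of its IO region during probe.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical: physical base and byte length of the device's IO region. */
static phys_addr_t mydev_phys_base;
static unsigned long mydev_phys_len;

static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long npages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	unsigned long total = mydev_phys_len >> PAGE_SHIFT;

	/* Refuse requests that reach past the end of the IO region. */
	if (vma->vm_pgoff > total || npages > total - vma->vm_pgoff)
		return -EINVAL;

	/* Step 1: choose the attribute (pgprot_noncached() would give UC). */
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

	/* Step 2: map the pages; the region is tracked in the memtype list. */
	return io_remap_pfn_range(vma, vma->vm_start,
				  (mydev_phys_base >> PAGE_SHIFT) + vma->vm_pgoff,
				  npages << PAGE_SHIFT, vma->vm_page_prot);
}
```

The handler would be installed as the .mmap member of the driver's
file_operations. For a RAM region this sketch is not sufficient, because the
set_memory_*() step 0 described above is also needed.
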
MTRR effects on PAT / non-PAT systems
=====================================

The following table provides the effects of using write-combining MTRRs when
using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
mtrr_add() usage will be phased out in favor of arch_phys_wc_add(), which will
be a no-op on PAT-enabled systems. The region over which an arch_phys_wc_add()
call is made should already have been ioremapped with WC attributes or PAT
entries; this can be done by using ioremap_wc() / set_memory_wc(). Devices which
combine areas of IO memory desired to remain uncacheable with areas where
write-combining is desirable should consider use of ioremap_uc() followed by
set_memory_wc() to white-list effective write-combined areas. Such use is
nevertheless discouraged as the effective memory type is considered
implementation defined, yet this strategy can be used as a last resort on
devices with size-constrained regions where MTRR write-combining would
otherwise not be effective.
::

  ====  =======  ===  =========================  =====================
  MTRR  Non-PAT  PAT  Linux ioremap value        Effective memory type
  ====  =======  ===  =========================  =====================
        PAT                                        Non-PAT |  PAT
        |PCD                                               |
        ||PWT                                              |
        |||                                                |
  WC    000      WB   _PAGE_CACHE_MODE_WB             WC   |   WC
  WC    001      WC   _PAGE_CACHE_MODE_WC             WC*  |   WC
  WC    010      UC-  _PAGE_CACHE_MODE_UC_MINUS       WC*  |   UC
  WC    011      UC   _PAGE_CACHE_MODE_UC             UC   |   UC
  ====  =======  ===  =========================  =====================

  (*) denotes implementation defined and is discouraged

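For quick experiments, the "Effective memory type" columns of the table above
can be transcribed into a small lookup. The sketch below only restates the four
documented rows; the function name effective_type_under_wc_mtrr() and the
convention of indexing by the PAT|PCD|PWT bit pattern are assumptions made for
illustration.

```c
#include <stddef.h>

/* Rows are indexed by the PAT|PCD|PWT bits (0..3) as listed in the table. */
static const char *const effective_non_pat[4] = { "WC", "WC*", "WC*", "UC" };
static const char *const effective_pat[4]     = { "WC", "WC",  "UC",  "UC" };

/*
 * Effective memory type for a mapping that falls inside a WC MTRR.
 * A trailing '*' marks a result the table calls implementation defined.
 * Returns NULL for bit patterns outside the four documented rows.
 */
const char *effective_type_under_wc_mtrr(unsigned int pte_bits, int pat_enabled)
{
	if (pte_bits > 3)
		return NULL;
	return pat_enabled ? effective_pat[pte_bits] : effective_non_pat[pte_bits];
}
```

For example, a UC- mapping (bits 010) resolves to "UC" with PAT enabled but to
the discouraged "WC*" on a non-PAT system.
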
.. note:: -- in the PAT APIs table above means "Not suggested usage for the
  API". Some of the --'s are strictly enforced by the kernel. Some others are
  not really enforced today, but may be enforced in future.

For ioremap and pci access through /sys or /proc - The actual type returned
can be more restrictive, in case of any existing aliasing for that address.
For example: If there is an existing uncached mapping, a new ioremap_wc can
return an uncached mapping in place of the write-combine mapping requested.

set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where the
driver will first make a region uc, wc or wt and switch it back to wb after use.

Over time, writes to /proc/mtrr will be deprecated in favor of using PAT-based
interfaces. Users writing to /proc/mtrr are advised to use the above interfaces.

Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.

Drivers should use set_memory_[uc|wc|wt] to set the access type for RAM ranges.


PAT debugging
=============

With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined by::

  # mount -t debugfs debugfs /sys/kernel/debug
  # cat /sys/kernel/debug/x86/pat_memtype_list
  PAT memtype list:
  uncached-minus @ 0x7fadf000-0x7fae0000
  uncached-minus @ 0x7fb19000-0x7fb1a000
  uncached-minus @ 0x7fb1a000-0x7fb1b000
  uncached-minus @ 0x7fb1b000-0x7fb1c000
  uncached-minus @ 0x7fb1c000-0x7fb1d000
  uncached-minus @ 0x7fb1d000-0x7fb1e000
  uncached-minus @ 0x7fb1e000-0x7fb25000
  uncached-minus @ 0x7fb25000-0x7fb26000
  uncached-minus @ 0x7fb26000-0x7fb27000
  uncached-minus @ 0x7fb27000-0x7fb28000
  uncached-minus @ 0x7fb28000-0x7fb2e000
  uncached-minus @ 0x7fb2e000-0x7fb2f000
  uncached-minus @ 0x7fb2f000-0x7fb30000
  uncached-minus @ 0x7fb31000-0x7fb32000
  uncached-minus @ 0x80000000-0x90000000

This list shows physical address ranges and the various PAT settings used to
access those physical address ranges.

Another, more verbose way of getting PAT-related debug messages is with the
"debugpat" boot parameter. With this parameter, various debug messages are
printed to the dmesg log.

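When post-processing this output in a tool, each entry line can be split into
its type and address range. The following is a minimal sketch that assumes the
"type @ start-end" layout shown in the sample above; the struct and function
names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed entry of the memtype list. */
struct memtype_entry {
	char type[32];		/* e.g. "uncached-minus" */
	uint64_t start;		/* first address of the range */
	uint64_t end;		/* end address as printed */
};

/*
 * Parse one line such as "uncached-minus @ 0x7fadf000-0x7fae0000".
 * Returns 0 on success and -1 for lines that do not match, which
 * includes the "PAT memtype list:" header.
 */
int parse_memtype_line(const char *line, struct memtype_entry *e)
{
	int n = sscanf(line, "%31s @ %" SCNx64 "-%" SCNx64,
		       e->type, &e->start, &e->end);

	return n == 3 ? 0 : -1;
}
```

The abutting ranges in the sample suggest that the printed end address is
exclusive, so under that assumption the size of an entry is end - start.
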
PAT Initialization
==================

The following table describes how PAT is initialized under various
configurations. The PAT MSR must be updated by Linux in order to support the WC
and WT attributes. Otherwise, the PAT MSR has the value programmed in it
by the firmware. Note that Xen enables the WC attribute in the PAT MSR for
guests.

 ==== ===== ==========================  =========  =======
 MTRR PAT   Call Sequence               PAT State  PAT MSR
 ==== ===== ==========================  =========  =======
 E    E     MTRR -> PAT init            Enabled    OS
 E    D     MTRR -> PAT init            Disabled    -
 D    E     MTRR -> PAT disable         Disabled   BIOS
 D    D     MTRR -> PAT disable         Disabled    -
 -    np/E  PAT  -> PAT disable         Disabled   BIOS
 -    np/D  PAT  -> PAT disable         Disabled    -
 E    !P/E  MTRR -> PAT init            Disabled   BIOS
 D    !P/E  MTRR -> PAT disable         Disabled   BIOS
 !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
 ==== ===== ==========================  =========  =======

  Legend

 ========= =======================================
 E         Feature enabled in CPU
 D         Feature disabled/unsupported in CPU
 np        "nopat" boot option specified
 !P        CONFIG_X86_PAT option unset
 !M        CONFIG_MTRR option unset
 Enabled   PAT state set to enabled
 Disabled  PAT state set to disabled
 OS        PAT initializes PAT MSR with OS setting
 BIOS      PAT keeps PAT MSR with BIOS setting
 ========= =======================================

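To see what a given PAT MSR value means for the "OS" and "BIOS" cases above, the
MSR can be decoded entry by entry. The sketch below assumes the architectural
layout of eight one-byte entries with the memory type in the low three bits
(0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB, 7 = UC-); WP (write-protected) is an
architectural type that the tables in this document do not use. The function
name pat_entry_type() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Name of the memory type held in PAT entry `idx` (0..7) of a PAT MSR
 * value. Each entry occupies one byte; the low three bits carry the type.
 * Returns NULL for an out-of-range index and "reserved" for the two
 * encodings (2 and 3) that have no memory type assigned.
 */
const char *pat_entry_type(uint64_t pat_msr, unsigned int idx)
{
	static const char *const names[8] = {
		"UC", "WC", "reserved", "reserved", "WT", "WP", "WB", "UC-",
	};

	if (idx > 7)
		return NULL;
	return names[(pat_msr >> (idx * 8)) & 0x7];
}
```

As an illustration, the value 0x0007040600070406 (assumed here to be what
firmware leaves in place) decodes entry 1 as WT, whereas a value such as
0x0407050600070106 decodes entries 0 to 3 as WB, WC, UC-, UC, which matches the
PAT column of the MTRR effects table. Both constants are example inputs and are
not read from any real system by this code.
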