.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a set of interfaces in the kernel (introduced in 2.6.25) that allow
userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   ``fs/proc/task_mmu.c``, above ``pagemap_read``):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bits 57-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1, opens by unprivileged users fail with -EPERM.  Starting
   from 4.2, the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   Reason: information about PFNs helps in exploiting the Rowhammer
   vulnerability.

   If the page is not present but in swap, then the PFN field contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

   The page-types tool in the ``tools/vm`` directory can be used to query the
   number of times a page is mapped.

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above ``kpageflags_read``):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. OFFLINE
    24. ZERO_PAGE
    25. IDLE
    26. PGTABLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN. Only available when
   CONFIG_MEMCG is set.
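
The bit layout of a ``/proc/pid/pagemap`` entry listed above can be decoded with plain shifts and masks. The following is a minimal Python sketch. It assumes a kernel of 4.2 or later, where bits 55-61 carry the flags documented here. The ``PagemapEntry`` class and the ``decode_pagemap_entry()`` helper are hypothetical names used for illustration; they are not part of any kernel or library API.

```python
from dataclasses import dataclass
from typing import Optional

# Masks for the fields of one 64-bit /proc/pid/pagemap entry (see list above).
PFN_MASK = (1 << 55) - 1        # bits 0-54: PFN if present, swap info if swapped
SWAP_TYPE_MASK = (1 << 5) - 1   # bits 0-4: swap type (only when swapped)
SOFT_DIRTY = 1 << 55
EXCLUSIVE = 1 << 56             # since 4.2
FILE_OR_SHARED_ANON = 1 << 61   # since 3.5
SWAPPED = 1 << 62
PRESENT = 1 << 63


@dataclass
class PagemapEntry:             # hypothetical container, for illustration only
    present: bool
    swapped: bool
    soft_dirty: bool
    exclusive: bool
    file_or_shared_anon: bool
    pfn: Optional[int] = None
    swap_type: Optional[int] = None
    swap_offset: Optional[int] = None


def decode_pagemap_entry(entry: int) -> PagemapEntry:
    """Split one raw pagemap value into the documented fields."""
    decoded = PagemapEntry(
        present=bool(entry & PRESENT),
        swapped=bool(entry & SWAPPED),
        soft_dirty=bool(entry & SOFT_DIRTY),
        exclusive=bool(entry & EXCLUSIVE),
        file_or_shared_anon=bool(entry & FILE_OR_SHARED_ANON),
    )
    low = entry & PFN_MASK
    if decoded.present:
        decoded.pfn = low                       # zeroed without CAP_SYS_ADMIN (4.2+)
    elif decoded.swapped:
        decoded.swap_type = low & SWAP_TYPE_MASK
        decoded.swap_offset = low >> 5          # bits 5-54
    return decoded
```

For example, ``decode_pagemap_entry(PRESENT | 0x1234).pfn`` is ``0x1234``. For a present page, ``pfn`` reads as 0 unless the reader has CAP_SYS_ADMIN (4.2 and later). A zero PFN obtained by an unprivileged reader should therefore not be treated as a real frame number.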

Short descriptions of the page flags
====================================

0 - LOCKED
   page is being locked for exclusive access, e.g. by undergoing read/write IO
7 - SLAB
   page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are HugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB and similar memory allocators, and various device drivers.
    However, in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    this is an integral part of a HugeTLB page
19 - HWPOISON
    hardware-detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    no page frame exists at the requested address
21 - KSM
    identical memory pages dynamically shared between one or more processes
22 - THP
    contiguous pages which construct transparent hugepages
23 - OFFLINE
    page is logically offline
24 - ZERO_PAGE
    zero page for pfn_zero or huge_zero page
25 - IDLE
    page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale if the page was accessed via
    a PTE. To make sure the flag is up-to-date one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
    page is in use as a page table
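
The "HTTT" layout described under COMPOUND_HEAD and COMPOUND_TAIL suggests a way to recover compound pages from consecutive ``/proc/kpageflags`` words. The sketch below is hypothetical. It assumes the caller has already read the flag words for a contiguous PFN range into a list (reading ``/proc/kpageflags`` typically requires root). ``compound_runs()`` is an illustrative helper name.

```python
from typing import List, Tuple

# Bit numbers from the /proc/kpageflags list above.
KPF_COMPOUND_HEAD = 15
KPF_COMPOUND_TAIL = 16


def has_flag(word: int, bit: int) -> bool:
    return (word >> bit) & 1 == 1


def compound_runs(flags: List[int], base_pfn: int = 0) -> List[Tuple[int, int]]:
    """Return (head_pfn, nr_pages) for every compound page found in `flags`.

    flags[i] is assumed to be the kpageflags word for PFN base_pfn + i.
    A head page is followed by its tail pages, e.g. "HTTT" for order 2.
    """
    runs = []
    i = 0
    while i < len(flags):
        if has_flag(flags[i], KPF_COMPOUND_HEAD):
            n = 1
            # Count the tail pages that directly follow this head page.
            while i + n < len(flags) and has_flag(flags[i + n], KPF_COMPOUND_TAIL):
                n += 1
            runs.append((base_pfn + i, n))
            i += n
        else:
            i += 1
    return runs
```

For a complete order N compound page, ``nr_pages`` is 2^N. A run that is cut off at the edge of the scanned range will be shorter. Since this interface only exposes huge/giga pages as compound, slab compound pages are not expected to show up.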

IO related page flags
---------------------

1 - ERROR
   IO error occurred
3 - UPTODATE
   page has up-to-date data,
   i.e. for a file-backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
   page has been written to, hence contains new data,
   i.e. for a file-backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
   page is being synced to disk

LRU related page flags
----------------------

5 - LRU
   page is in one of the LRU lists
6 - ACTIVE
   page is in the active LRU list
18 - UNEVICTABLE
   page is in the unevictable (non-)LRU list. It is somehow pinned and
   not a candidate for LRU page reclaim, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments
2 - REFERENCED
   page has been referenced since last LRU list enqueue/requeue
9 - RECLAIM
   page will be reclaimed soon after its pageout IO completes
11 - MMAP
   a memory mapped page
12 - ANON
   a memory mapped page that is not part of a file
13 - SWAPCACHE
   page is mapped to swap space, i.e. has an associated swap entry
14 - SWAPBACKED
   page is backed by swap/RAM

The page-types tool in the ``tools/vm`` directory can be used to query the
above flags.
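
Each ``/proc/kpageflags`` word is a plain bitmask indexed by the bit numbers listed earlier, so turning it into flag names is a table lookup. This is a minimal Python sketch. ``flag_names()`` is a hypothetical helper, and it ignores any bit not documented in this file.

```python
from typing import List

# Index i of this list is the kpageflags bit number documented above.
KPAGEFLAGS_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def flag_names(word: int) -> List[str]:
    """Names of the documented flags set in one kpageflags word."""
    return [name for bit, name in enumerate(KPAGEFLAGS_NAMES) if (word >> bit) & 1]
```

For example, ``flag_names(0x28)`` yields ``['UPTODATE', 'LRU']`` (bits 3 and 5).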

Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine
    (each entry is 8 bytes, indexed by virtual page number).
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file, and read the data you want.
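
The five steps above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. It assumes native-endian 64-bit records and takes the page size from ``os.sysconf("SC_PAGE_SIZE")``. The helpers ``parse_maps_line()``, ``read_pagemap()`` and ``read_kpage_u64()`` are hypothetical names. Reading another process's pagemap needs sufficient privileges, and reading ``/proc/kpagecount`` or ``/proc/kpageflags`` typically requires root.

```python
import os
import struct
from typing import List, Tuple

ENTRY_SIZE = 8  # every pagemap/kpagecount/kpageflags record is one u64


def parse_maps_line(line: str) -> Tuple[int, int, str]:
    """Step 1: return (start, end, pathname) from one /proc/pid/maps line."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, path


def read_pagemap(pid: int, start: int, end: int) -> List[Tuple[int, int]]:
    """Steps 3-4: return (vaddr, raw entry) for each page in [start, end)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    npages = (end - start) // page_size
    fd = os.open(f"/proc/{pid}/pagemap", os.O_RDONLY)
    try:
        # The entry for a virtual page lives at (vaddr / page_size) * 8.
        data = os.pread(fd, npages * ENTRY_SIZE, (start // page_size) * ENTRY_SIZE)
    finally:
        os.close(fd)
    return [(start + i * page_size, entry)
            for i, (entry,) in enumerate(struct.iter_unpack("=Q", data))]


def read_kpage_u64(path: str, pfn: int) -> int:
    """Step 5: read the record for one PFN from /proc/kpagecount or kpageflags."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, ENTRY_SIZE, pfn * ENTRY_SIZE)
    finally:
        os.close(fd)
    return struct.unpack("=Q", data)[0]
```

Step 2 is left to the caller. For instance, passing the bounds of the ``[heap]`` line of ``/proc/self/maps`` to ``read_pagemap(os.getpid(), start, end)`` returns one raw entry per heap page. Those entries can then be decoded and, with privileges, looked up in ``/proc/kpagecount``.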

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
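
The USS tally just described can be sketched as below. ``unique_set_size()`` is a hypothetical helper, not a kernel tool. It takes raw pagemap entries (for example, gathered with the procedure described above) and a caller-supplied lookup function for ``/proc/kpagecount``. It assumes 4 KiB pages by default. Only present pages are counted, and the PFNs are only meaningful when the pagemap reader has CAP_SYS_ADMIN.

```python
from typing import Callable, Iterable

PRESENT = 1 << 63
PFN_MASK = (1 << 55) - 1  # bits 0-54 hold the PFN of a present page


def unique_set_size(entries: Iterable[int],
                    kpagecount: Callable[[int], int],
                    page_size: int = 4096) -> int:
    """Bytes of resident pages whose kpagecount is exactly 1."""
    unique_pages = 0
    for entry in entries:
        if not entry & PRESENT:
            continue  # unmapped or swapped out: not resident, not counted
        if kpagecount(entry & PFN_MASK) == 1:
            unique_pages += 1
    return unique_pages * page_size
```

In a real tool, ``kpagecount`` would read the u64 at offset ``pfn * 8`` in ``/proc/kpagecount``. A dictionary lookup such as ``counts.__getitem__`` is enough to exercise the logic.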

Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11, pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures). Since Linux 3.11, their meaning changes
after the first clear of soft-dirty bits. Since Linux 4.2, they are used for
flags unconditionally.
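
On kernels older than 3.11, those six bits can be read back as the page shift. The hypothetical helper below only makes sense for such kernels. On 4.2 and later, bits 55-60 are flags (for example, soft-dirty at bit 55) and must not be interpreted this way.

```python
PAGE_SHIFT_MASK = 0x3F  # six bits: 55-60


def legacy_page_shift(entry: int) -> int:
    """Page shift stored in bits 55-60 of a pre-3.11 pagemap entry."""
    return (entry >> 55) & PAGE_SHIFT_MASK
```

With the common shift of 12, ``1 << legacy_page_shift(entry)`` gives a 4096-byte page.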