.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   ``fs/proc/task_mmu.c``, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bits 57-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1 opens by unprivileged users fail with -EPERM.  Starting from
   4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   Reason: information about PFNs helps in exploiting the Rowhammer
   vulnerability.

   If the page is not present but in swap, then the PFN contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

The page-types tool in the tools/vm directory can be used to query the
number of times a page is mapped.

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above kpageflags_read):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. OFFLINE
    24. ZERO_PAGE
    25. IDLE
    26. PGTABLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN. Only available when
   CONFIG_MEMCG is set.

Short descriptions of the page flags
====================================

0 - LOCKED
   page is being locked for exclusive access, e.g. by undergoing read/write IO
7 - SLAB
   page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB etc. memory allocators and various device drivers.
    However in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    this is an integral part of a HugeTLB page
19 - HWPOISON
    hardware detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    no page frame exists at the requested address
21 - KSM
    identical memory pages dynamically shared between one or more processes
22 - THP
    contiguous pages which construct transparent hugepages
23 - OFFLINE
    page is logically offline
24 - ZERO_PAGE
    zero page for pfn_zero or huge_zero page
25 - IDLE
    page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale in case the page was accessed via
    a PTE. To make sure the flag is up-to-date one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
    page is in use as a page table

IO related page flags
---------------------

1 - ERROR
   IO error occurred
3 - UPTODATE
   page has up-to-date data
   i.e. for file backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
   page has been written to, hence contains new data
   i.e. for file backed page: (in-memory data revision >  on-disk one)
8 - WRITEBACK
   page is being synced to disk

LRU related page flags
----------------------

5 - LRU
   page is in one of the LRU lists
6 - ACTIVE
   page is in the active LRU list
18 - UNEVICTABLE
   page is in the unevictable (non-)LRU list. It is somehow pinned and
   not a candidate for LRU page reclaims, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments
2 - REFERENCED
   page has been referenced since last LRU list enqueue/requeue
9 - RECLAIM
   page will be reclaimed soon after its pageout IO completes
11 - MMAP
   a memory mapped page
12 - ANON
   a memory mapped page that is not part of a file
13 - SWAPCACHE
   page is mapped to swap space, i.e. has an associated swap entry
14 - SWAPBACKED
   page is backed by swap/RAM

The page-types tool in the tools/vm directory can be used to query the
above flags.

Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process's memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file, and read the data you want.

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.

Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures). Since Linux 3.11 their meaning changes
after the first clear of the soft-dirty bits. Since Linux 4.2 they are used
for flags unconditionally.
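
As an illustration of the entry layout and the seek-and-read procedure
described earlier, the following is a minimal Python sketch, not a reference
implementation. The helper names (``decode_pagemap_entry``,
``decode_kpageflags``, ``entry_offset``, ``read_pagemap``) are hypothetical
and exist only for this example. The sketch assumes the bit layout used since
Linux 4.2, native byte order, page-aligned ``start`` and ``end`` addresses
such as those listed in ``/proc/pid/maps``, and that ``mmap.PAGESIZE`` matches
the page size of the target process.

```python
import mmap
import struct

ENTRY_SIZE = 8                # each file holds one 64-bit value per entry
LOW_MASK = (1 << 55) - 1      # bits 0-54: PFN, or swap type and swap offset

# Flag names in bit order, copied from the kpageflags list in this document.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def decode_pagemap_entry(entry):
    """Split one 64-bit pagemap value into the fields listed above."""
    info = {
        "present": bool(entry >> 63 & 1),
        "swapped": bool(entry >> 62 & 1),
        "file_or_shared_anon": bool(entry >> 61 & 1),
        "exclusive": bool(entry >> 56 & 1),
        "soft_dirty": bool(entry >> 55 & 1),
        "pfn": None,
        "swap_type": None,
        "swap_offset": None,
    }
    low = entry & LOW_MASK
    if info["present"]:
        info["pfn"] = low                 # bits 0-54
    elif info["swapped"]:
        info["swap_type"] = low & 0x1F    # bits 0-4
        info["swap_offset"] = low >> 5    # bits 5-54
    return info


def decode_kpageflags(flags):
    """Return the names of the flags set in one kpageflags value."""
    return [name for bit, name in enumerate(KPF_NAMES) if flags >> bit & 1]


def entry_offset(index):
    """Byte offset of entry ``index`` (a virtual page number or a PFN)."""
    return index * ENTRY_SIZE


def read_pagemap(pid, start, end, page_size=mmap.PAGESIZE):
    """Read the raw pagemap values for the virtual range [start, end)."""
    count = (end - start) // page_size
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        # Seeking to a multiple of 8 and reading a multiple of 8 bytes
        # avoids the -EINVAL case described under "Other notes".
        f.seek(entry_offset(start // page_size))
        data = f.read(count * ENTRY_SIZE)
    usable = len(data) - len(data) % ENTRY_SIZE
    return struct.unpack(f"={usable // ENTRY_SIZE}Q", data[:usable])
```

On a Linux system, ``read_pagemap("self", start, end)`` with a range taken
from ``/proc/self/maps`` returns one value per page, and each value can be
passed to ``decode_pagemap_entry``. Without CAP_SYS_ADMIN the PFN field reads
as zero on 4.2 and later, so the ``/proc/kpagecount`` and ``/proc/kpageflags``
lookups (at ``entry_offset(pfn)``) are only meaningful for privileged callers.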