.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability in
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use split locks for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.
There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps PTE and takes PTE table lock, returns pointer to the taken
	lock;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes the lock, returns pointer
	to the taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns pointer to PTE table lock;
 - pmd_lock()
	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock;
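
A typical caller pairs the first two helpers around whatever it does with
the entry. A sketch of the pattern (``examine_pte()`` is a hypothetical
stand-in for the caller's real work)::

	spinlock_t *ptl;
	pte_t *pte;

	/* Map the PTE table and take its lock in one step. */
	pte = pte_offset_map_lock(mm, pmd, address, &ptl);

	/* The entry cannot change under us while the lock is held. */
	examine_pte(vma, address, pte);

	/* Unlock and unmap, again in one step. */
	pte_unmap_unlock(pte, ptl);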

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).
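
In the kernel sources the two conditions above are captured by a pair of
macros, roughly::

	#define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
	#define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
					 IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))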

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use the split lock only for the
PMD level, but not for the PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes PMD split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock;
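
The helpers hide the page-size check from callers. A sketch of the usual
pattern (``update_huge_pte()`` is a hypothetical placeholder for the
caller's work)::

	struct hstate *h = hstate_vma(vma);
	spinlock_t *ptl;

	/* PMD split lock for a PMD_SIZE page, mm->page_table_lock otherwise. */
	ptl = huge_pte_lock(h, mm, ptep);
	update_huge_pte(mm, ptep);
	spin_unlock(ptl);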

Support of split page table lock by an architecture
===================================================

There is no need to specially enable PTE split page table lock: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.
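
A minimal pte_alloc_one(), close to the generic implementation, shows where
the constructor call belongs; note that the constructor itself can fail::

	pgtable_t pte_alloc_one(struct mm_struct *mm)
	{
		struct page *pte;

		pte = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!pte)
			return NULL;
		/* Initializes page->ptl (and the page-table accounting). */
		if (!pgtable_pte_page_ctor(pte)) {
			__free_page(pte);
			return NULL;
		}
		return pte;
	}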

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g., X86_PAE preallocates a few PMDs on pgd_alloc().
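
A sketch of a pmd_alloc_one() that covers the ctor failure path, modeled on
the generic implementation::

	pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
	{
		struct page *page;

		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
		if (!page)
			return NULL;
		if (!pgtable_pmd_page_ctor(page)) {
			__free_pages(page, 0);
			return NULL;
		}
		return (pmd_t *)page_address(page);
	}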

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- this
must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock, so we
   can avoid indirect access and save a cache line.
 - if size of spinlock_t is bigger than size of long, we use page->ptl as a
   pointer to spinlock_t and allocate it dynamically. This makes it possible
   to use the split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled,
   but costs one more cache line for the indirect access;
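
The choice between the two cases is made at compile time from the spinlock
size, roughly as in include/linux/mm.h::

	#define ALLOC_SPLIT_PTLOCKS	(SPINLOCK_SIZE > BITS_PER_LONG/8)

	#if ALLOC_SPLIT_PTLOCKS
	/* page->ptl is a pointer to a dynamically allocated lock. */
	static inline spinlock_t *ptlock_ptr(struct page *page)
	{
		return page->ptl;
	}
	#else
	/* The lock is embedded in struct page itself. */
	static inline spinlock_t *ptlock_ptr(struct page *page)
	{
		return &page->ptl;
	}
	#endif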

The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.

Please, never access page->ptl directly -- use the appropriate helper.