.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use the split lock for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps the PTE and takes the PTE table lock, returns a pointer to the
	taken lock;
 - pte_unmap_unlock()
	unlocks and unmaps the PTE table;
 - pte_alloc_map_lock()
	allocates the PTE table if needed and takes the lock, returns a
	pointer to the taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns a pointer to the PTE table lock;
 - pmd_lock()
	takes the PMD table lock, returns a pointer to the taken lock;
 - pmd_lockptr()
	returns a pointer to the PMD table lock;

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If the split lock is disabled, all tables are guarded by
mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use the split lock only for the
PMD level, but not for the PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes the PMD split lock for a PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns a pointer to the table lock;

Support of split page table lock by an architecture
===================================================

There is no need for special enabling of the PTE split page table lock:
everything required is done by pgtable_pte_page_ctor() and
pgtable_pte_page_dtor(), which must be called on PTE table allocation /
freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.
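
For illustration, a minimal sketch of a PTE table allocation / freeing pair
that calls the constructor and destructor is shown below. It is loosely
modelled on the generic helpers and is not tied to any particular
architecture: the GFP flags, the exact signatures and the surrounding code
vary per architecture, so treat the names as an example only::

	pgtable_t pte_alloc_one(struct mm_struct *mm)
	{
		struct page *pte;

		pte = alloc_page(GFP_PGTABLE_USER);
		if (!pte)
			return NULL;
		/* Sets up the split PTE lock; note that it can fail. */
		if (!pgtable_pte_page_ctor(pte)) {
			__free_page(pte);
			return NULL;
		}
		return pte;
	}

	void pte_free(struct mm_struct *mm, struct page *pte_page)
	{
		/* The destructor must run before the page is freed. */
		pgtable_pte_page_dtor(pte_page);
		__free_page(pte_page);
	}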
PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g., X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- this
must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into a long, we use page->ptl as the spinlock itself,
   so we can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of a long, we use
   page->ptl as a pointer to a spinlock_t and allocate it dynamically. This
   allows using the split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for the indirect access.

The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.

Please, never access page->ptl directly -- use the appropriate helpers.
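
As a closing illustration, this is a sketch of how the locks are usually
taken and released through the helpers rather than via page->ptl. The mm,
pmd and addr variables stand for the usual mm_struct, PMD entry pointer and
virtual address supplied by the surrounding page table walk, which is
omitted here::

	spinlock_t *ptl;
	pte_t *pte;

	/* PTE level: map the table and take its split lock in one step. */
	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	/* ... read or modify *pte while the PTE table lock is held ... */
	pte_unmap_unlock(pte, ptl);

	/* PMD level: pmd_lock() returns the (possibly split) PMD lock. */
	ptl = pmd_lock(mm, pmd);
	/* ... operate on *pmd ... */
	spin_unlock(ptl);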