.. _highmem:

====================
High Memory Handling
====================

By: Peter Zijlstra <a.p.zijlstra@chello.nl>

.. contents:: :local:

What Is High Memory?
====================

High memory (highmem) is used when the size of physical memory approaches or
exceeds the maximum size of virtual memory.  At that point it becomes
impossible for the kernel to keep all of the available physical memory mapped
at all times.  This means the kernel needs to start using temporary mappings
of the pieces of physical memory that it wants to access.

The part of (physical) memory not covered by a permanent mapping is what we
refer to as 'highmem'.  There are various architecture dependent constraints
on where exactly that border lies.

In the i386 arch, for example, we choose to map the kernel into every
process's VM space so that we don't have to pay the full TLB invalidation
costs for kernel entry/exit.  This means the available virtual memory space
(4GiB on i386) has to be divided between user and kernel space.

The traditional split for architectures using this approach is 3:1, 3GiB for
userspace and the top 1GiB for kernel space::

  +--------+ 0xffffffff
  | Kernel |
  +--------+ 0xc0000000
  |        |
  | User   |
  |        |
  +--------+ 0x00000000

This means that the kernel can at most map 1GiB of physical memory at any one
time, but because we need virtual address space for other things - including
temporary maps to access the rest of the physical memory - the actual direct
map will typically be less (usually around ~896MiB).

Other architectures that have mm context tagged TLBs can have separate kernel
and user maps.  Some hardware (like some ARMs), however, has limited virtual
space when it uses mm context tags.


Temporary Virtual Mappings
==========================

The kernel contains several ways of creating temporary mappings:

* vmap().  This can be used to make a long duration mapping of multiple
  physical pages into a contiguous virtual space.  It needs global
  synchronization to unmap.

* kmap().  This permits a short duration mapping of a single page.  It needs
  global synchronization, but is amortized somewhat.  It is also prone to
  deadlocks when used in a nested fashion, and so it is not recommended for
  new code.

* kmap_atomic().
  This permits a very short duration mapping of a single page.  Since the
  mapping is restricted to the CPU that issued it, it performs well, but the
  issuing task is therefore required to stay on that CPU until it has
  finished, lest some other task displace its mappings.

  kmap_atomic() may also be used by interrupt contexts, since it does not
  sleep and the caller may not sleep until after kunmap_atomic() is called.

  It may be assumed that k[un]map_atomic() won't fail.


Using kmap_atomic
=================

When and where to use kmap_atomic() is straightforward.  It is used when code
wants to access the contents of a page that might be allocated from high
memory (see __GFP_HIGHMEM), for example a page in the pagecache.  The API has
two functions, and they can be used in a manner similar to the following::

  /* Find the page of interest. */
  struct page *page = find_get_page(mapping, offset);

  /* Gain access to the contents of that page. */
  void *vaddr = kmap_atomic(page);

  /* Do something to the contents of that page. */
  memset(vaddr, 0, PAGE_SIZE);

  /* Unmap that page. */
  kunmap_atomic(vaddr);

Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
not the argument.

If you need to map two pages because you want to copy from one page to
another, you need to keep the kmap_atomic calls strictly nested, like::

  vaddr1 = kmap_atomic(page1);
  vaddr2 = kmap_atomic(page2);

  memcpy(vaddr1, vaddr2, PAGE_SIZE);

  kunmap_atomic(vaddr2);
  kunmap_atomic(vaddr1);


Cost of Temporary Mappings
==========================

The cost of creating temporary mappings can be quite high.  The arch has to
manipulate the kernel's page tables, the data TLB and/or the MMU's registers.

If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
simply with a bit of arithmetic that will convert the page struct address into
a pointer to the page contents rather than juggling mappings about.  In such a
case, the unmap operation may be a null operation.

If CONFIG_MMU is not set, then there can be no temporary mappings and no
highmem.  In such a case, the arithmetic approach will also be used.


i386 PAE
========

The i386 arch, under some circumstances, will permit you to stick up to 64GiB
of RAM into your 32-bit machine.
This has a number of consequences:

* Linux needs a page-frame structure for each page in the system and the
  pageframes need to live in the permanent mapping, which means:

  * you can have 896M/sizeof(struct page) page-frames at most; with struct
    page being 32 bytes that would end up being something on the order of
    112G worth of pages; the kernel, however, needs to store more than just
    page-frames in that memory...

* PAE makes your page tables larger - which slows the system down as more
  data has to be accessed to traverse in TLB fills and the like.  One
  advantage is that PAE has more PTE bits and can provide advanced features
  like NX and PAT.

The general recommendation is that you don't use more than 8GiB on a 32-bit
machine - although more might work for you and your workload, you're pretty
much on your own - don't expect kernel developers to really care much if
things come apart.