.. _highmem:

====================
High Memory Handling
====================

By: Peter Zijlstra <a.p.zijlstra@chello.nl>

.. contents:: :local:

What Is High Memory?
====================

High memory (highmem) is used when the size of physical memory approaches or
exceeds the maximum size of virtual memory.  At that point it becomes
impossible for the kernel to keep all of the available physical memory mapped
at all times.  This means the kernel needs to start using temporary mappings of
the pieces of physical memory that it wants to access.

The part of (physical) memory not covered by a permanent mapping is what we
refer to as 'highmem'.  There are various architecture-dependent constraints on
where exactly that border lies.

In the i386 arch, for example, we choose to map the kernel into every process's
VM space so that we don't have to pay the full TLB invalidation costs for
kernel entry/exit.  This means the available virtual memory space (4GiB on
i386) has to be divided between user and kernel space.

The traditional split for architectures using this approach is 3:1, 3GiB for
userspace and the top 1GiB for kernel space::

		+--------+ 0xffffffff
		| Kernel |
		+--------+ 0xc0000000
		|        |
		| User   |
		|        |
		+--------+ 0x00000000

This means that the kernel can at most map 1GiB of physical memory at any one
time, but because we need virtual address space for other things - including
temporary maps to access the rest of the physical memory - the actual direct
map will typically be less (usually around 896MiB).

Other architectures that have mm context tagged TLBs can have separate kernel
and user maps.  Some hardware (like some ARMs), however, has limited virtual
space when it uses mm context tags.

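The arithmetic behind the 3:1 split and the roughly 896MiB direct map can be
sketched in plain userspace C.  This is only an illustration: the 128MiB
reserve for "other things" is an assumption chosen to match the 896MiB figure
above, and the helper names are hypothetical rather than kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* From the text: a 32-bit address space with the kernel at 0xc0000000. */
#define ADDRESS_SPACE_SIZE	(1ULL << 32)
#define KERNEL_BASE		0xc0000000ULL

/*
 * ASSUMPTION: 128MiB of kernel address space is set aside for things
 * other than the direct map (temporary maps and so on).  The text only
 * says the direct map is "around 896MiB".
 */
#define RESERVED_SIZE		(128 * MiB)

/* Size of the kernel's share of the virtual address space. */
static inline uint64_t kernel_space_size(void)
{
	return ADDRESS_SPACE_SIZE - KERNEL_BASE;
}

/* What is left for the permanent direct map once the reserve is taken out. */
static inline uint64_t direct_map_size(void)
{
	assert(RESERVED_SIZE < kernel_space_size());
	return kernel_space_size() - RESERVED_SIZE;
}
```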

Temporary Virtual Mappings
==========================

The kernel contains several ways of creating temporary mappings:

* vmap().  This can be used to make a long duration mapping of multiple
  physical pages into a contiguous virtual space.  It needs global
  synchronization to unmap.

* kmap().  This permits a short duration mapping of a single page.  It needs
  global synchronization, but is amortized somewhat.  It is also prone to
  deadlocks when used in a nested fashion, and so it is not recommended for
  new code.

* kmap_atomic().  This permits a very short duration mapping of a single
  page.  Since the mapping is restricted to the CPU that issued it, it
  performs well, but the issuing task is therefore required to stay on that
  CPU until it has finished, lest some other task displace its mappings.

  kmap_atomic() may also be used by interrupt contexts, since it does not
  sleep and the caller may not sleep until after kunmap_atomic() is called.

  It may be assumed that k[un]map_atomic() won't fail.


Using kmap_atomic
=================

When and where to use kmap_atomic() is straightforward.  It is used when code
wants to access the contents of a page that might be allocated from high memory
(see __GFP_HIGHMEM), for example a page in the pagecache.  The API has two
functions, and they can be used in a manner similar to the following::

	/* Find the page of interest. */
	struct page *page = find_get_page(mapping, offset);

	/* Gain access to the contents of that page. */
	void *vaddr = kmap_atomic(page);

	/* Do something to the contents of that page. */
	memset(vaddr, 0, PAGE_SIZE);

	/* Unmap that page. */
	kunmap_atomic(vaddr);

Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
not its argument.

If you need to map two pages because you want to copy from one page to
another, you need to keep the kmap_atomic() calls strictly nested, like::

	vaddr1 = kmap_atomic(page1);
	vaddr2 = kmap_atomic(page2);

	memcpy(vaddr1, vaddr2, PAGE_SIZE);

	kunmap_atomic(vaddr2);
	kunmap_atomic(vaddr1);

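One way to picture why the unmap order matters is to assume that each CPU
hands out its temporary mapping slots like a stack, so only the most recent
mapping can be released.  The following is a toy userspace model of that
assumption, not the kernel's implementation; every name in it (toy_map(),
toy_unmap(), struct toy_cpu_slots, TOY_NR_SLOTS) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SLOTS 8	/* hypothetical number of slots per CPU */

/* Toy stand-in for one CPU's temporary mapping slots. */
struct toy_cpu_slots {
	const void *mapped[TOY_NR_SLOTS];	/* what each slot points at */
	int depth;				/* number of live mappings */
};

/* Map: take the next free slot and return its index as the "address". */
static inline int toy_map(struct toy_cpu_slots *cpu, const void *page)
{
	assert(cpu->depth < TOY_NR_SLOTS);
	cpu->mapped[cpu->depth] = page;
	return cpu->depth++;
}

/*
 * Unmap: only the most recent mapping may be released.
 * Returns 0 on success, -1 if the caller broke the nesting order.
 */
static inline int toy_unmap(struct toy_cpu_slots *cpu, int slot)
{
	if (cpu->depth == 0 || slot != cpu->depth - 1)
		return -1;
	cpu->mapped[--cpu->depth] = NULL;
	return 0;
}
```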

Cost of Temporary Mappings
==========================

The cost of creating temporary mappings can be quite high.  The arch has to
manipulate the kernel's page tables, the data TLB and/or the MMU's registers.

If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
simply with a bit of arithmetic that will convert the page struct address into
a pointer to the page contents rather than juggling mappings about.  In such a
case, the unmap operation may be a null operation.

If CONFIG_MMU is not set, then there can be no temporary mappings and no
highmem.  In such a case, the arithmetic approach will also be used.

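The "bit of arithmetic" can be sketched as follows, assuming a flat array of
page structs, 4KiB pages, physical memory starting at address 0 and a direct
map based at 0xc0000000.  This is a userspace toy with hypothetical names
(struct toy_page, toy_page_address()), not the kernel's own helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12		/* ASSUMPTION: 4KiB pages */
#define TOY_PAGE_OFFSET	0xc0000000UL	/* ASSUMPTION: 3:1 split base */

/* Stand-in for struct page; only its position in the array matters here. */
struct toy_page {
	unsigned long flags;
};

/*
 * Convert a page struct pointer into the direct-map address of the page
 * contents: the index into the array is the page frame number, shifting
 * it gives the physical address, and adding the base gives the virtual
 * address.  No page table is touched, so "unmapping" is a no-op.
 */
static inline uintptr_t toy_page_address(const struct toy_page *mem_map,
					 const struct toy_page *page)
{
	ptrdiff_t pfn = page - mem_map;

	assert(pfn >= 0);
	return TOY_PAGE_OFFSET + ((uintptr_t)pfn << TOY_PAGE_SHIFT);
}
```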

i386 PAE
========

The i386 arch, under some circumstances, will permit you to stick up to 64GiB
of RAM into your 32-bit machine.  This has a number of consequences:

* Linux needs a page-frame structure for each page in the system and the
  pageframes need to live in the permanent mapping, which means:

* you can have 896MiB/sizeof(struct page) page-frames at most; with struct
  page being 32 bytes that would end up being something in the order of 112GiB
  worth of pages; the kernel, however, needs to store more than just
  page-frames in that memory...

* PAE makes your page tables larger - which slows the system down as more
  data has to be accessed during TLB fills and the like.  One
  advantage is that PAE has more PTE bits and can provide advanced features
  like NX and PAT.

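The page-frame arithmetic in the list above can be checked with a short
userspace sketch.  It takes the 896MiB and 32-byte figures from the text and
assumes 4KiB pages; the helper names are hypothetical.  Under those
assumptions the upper bound comes out at 112GiB, and the page structs for a
full 64GiB of RAM would occupy 512MiB of the permanent mapping on their own.

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)
#define GiB (1024ULL * MiB)

#define DIRECT_MAP_SIZE		(896 * MiB)	/* from the text */
#define STRUCT_PAGE_SIZE	32ULL		/* from the text */
#define PAGE_SIZE_BYTES		(4 * KiB)	/* ASSUMPTION: 4KiB pages */

/* Upper bound on RAM if the direct map held nothing but page structs. */
static inline uint64_t max_describable_ram(void)
{
	return DIRECT_MAP_SIZE / STRUCT_PAGE_SIZE * PAGE_SIZE_BYTES;
}

/* Permanently mapped memory consumed by the page structs for ram_bytes. */
static inline uint64_t page_struct_overhead(uint64_t ram_bytes)
{
	assert(ram_bytes % PAGE_SIZE_BYTES == 0);
	return ram_bytes / PAGE_SIZE_BYTES * STRUCT_PAGE_SIZE;
}
```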
The general recommendation is that you don't use more than 8GiB on a 32-bit
machine - although more might work for you and your workload, you're pretty
much on your own - don't expect kernel developers to really care much if things
come apart.