.. _highmem:

====================
High Memory Handling
====================

By: Peter Zijlstra <a.p.zijlstra@chello.nl>

.. contents:: :local:

What Is High Memory?
====================

High memory (highmem) is used when the size of physical memory approaches or
exceeds the maximum size of virtual memory.  At that point it becomes
impossible for the kernel to keep all of the available physical memory mapped
at all times.  This means the kernel needs to start using temporary mappings
of the pieces of physical memory that it wants to access.

The part of (physical) memory not covered by a permanent mapping is what we
refer to as 'highmem'.  There are various architecture dependent constraints
on where exactly that border lies.

In the i386 arch, for example, we choose to map the kernel into every
process's VM space so that we don't have to pay the full TLB invalidation
costs for kernel entry/exit.  This means the available virtual memory space
(4GiB on i386) has to be divided between user and kernel space.

The traditional split for architectures using this approach is 3:1, 3GiB for
userspace and the top 1GiB for kernel space::

  +--------+ 0xffffffff
  | Kernel |
  +--------+ 0xc0000000
  |        |
  | User   |
  |        |
  +--------+ 0x00000000

This means that the kernel can at most map 1GiB of physical memory at any one
time, but because we need virtual address space for other things - including
temporary maps to access the rest of the physical memory - the actual direct
map will typically be less (usually around ~896MiB).

Other architectures that have mm context tagged TLBs can have separate kernel
and user maps.  Some hardware (like some ARMs), however, has limited virtual
space when it uses mm context tags.


Temporary Virtual Mappings
==========================

The kernel contains several ways of creating temporary mappings:

* vmap().  This can be used to make a long duration mapping of multiple
  physical pages into a contiguous virtual space.  It needs global
  synchronization to unmap.

* kmap().  This permits a short duration mapping of a single page.  It needs
  global synchronization, but is amortized somewhat.  It is also prone to
  deadlocks when used in a nested fashion, and so it is not recommended for
  new code.

* kmap_atomic().
  This permits a very short duration mapping of a single page.  Since the
  mapping is restricted to the CPU that issued it, it performs well, but the
  issuing task is therefore required to stay on that CPU until it has
  finished, lest some other task displace its mappings.

  kmap_atomic() may also be used by interrupt contexts, since it does not
  sleep and the caller may not sleep until after kunmap_atomic() is called.

  It may be assumed that k[un]map_atomic() won't fail.


Using kmap_atomic
=================

When and where to use kmap_atomic() is straightforward.  It is used when code
wants to access the contents of a page that might be allocated from high
memory (see __GFP_HIGHMEM), for example a page in the pagecache.  The API has
two functions, and they can be used in a manner similar to the following::

  /* Find the page of interest. */
  struct page *page = find_get_page(mapping, offset);

  /* Gain access to the contents of that page. */
  void *vaddr = kmap_atomic(page);

  /* Do something to the contents of that page. */
  memset(vaddr, 0, PAGE_SIZE);

  /* Unmap that page. */
  kunmap_atomic(vaddr);

Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
not the argument.

If you need to map two pages because you want to copy from one page to
another, you need to keep the kmap_atomic calls strictly nested, like::

  vaddr1 = kmap_atomic(page1);
  vaddr2 = kmap_atomic(page2);

  memcpy(vaddr1, vaddr2, PAGE_SIZE);

  kunmap_atomic(vaddr2);
  kunmap_atomic(vaddr1);


Cost of Temporary Mappings
==========================

The cost of creating temporary mappings can be quite high.  The arch has to
manipulate the kernel's page tables, the data TLB and/or the MMU's registers.

If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
simply with a bit of arithmetic that will convert the page struct address into
a pointer to the page contents rather than juggling mappings about.  In such a
case, the unmap operation may be a null operation.

If CONFIG_MMU is not set, then there can be no temporary mappings and no
highmem.  In such a case, the arithmetic approach will also be used.


i386 PAE
========

The i386 arch, under some circumstances, will permit you to stick up to 64GiB
of RAM into your 32-bit machine.
This has a number of consequences:

* Linux needs a page-frame structure for each page in the system and the
  pageframes need to live in the permanent mapping, which means:

  * you can have 896M/sizeof(struct page) page-frames at most; with struct
    page being 32 bytes that would end up being something on the order of
    112G worth of pages; the kernel, however, needs to store more than just
    page-frames in that memory...

* PAE makes your page tables larger - which slows the system down as more
  data has to be accessed to traverse in TLB fills and the like.  One
  advantage is that PAE has more PTE bits and can provide advanced features
  like NX and PAT.

The general recommendation is that you don't use more than 8GiB on a 32-bit
machine - although more might work for you and your workload, you're pretty
much on your own - don't expect kernel developers to really care much if
things come apart.