.. _soft_dirty:

===============
Soft-Dirty PTEs
===============

Soft-dirty is a bit on a PTE that helps track which pages a task writes to.
To do this tracking, one should:

  1. Clear soft-dirty bits from the task's PTEs.

     This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
     task in question.

  2. Wait some time.

  3. Read soft-dirty bits from the PTEs.

     This is done by reading ``/proc/PID/pagemap``. Bit 55 of each 64-bit
     entry is the soft-dirty bit. If it is set, the respective PTE has been
     written to since step 1.
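The three steps above can be sketched in Python roughly as follows. This is a
minimal illustration, not a reference tool: the helper names
(``is_soft_dirty``, ``pagemap_offset``, ``clear_soft_dirty``,
``soft_dirty_pages``) are hypothetical, and it assumes that the kernel was
built with soft-dirty support, that the caller is allowed to access the target
task's ``/proc`` files, and that pagemap holds one native-endian 64-bit entry
per virtual page.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PAGEMAP_ENTRY_SIZE = 8        # assumed: one 64-bit entry per virtual page
SOFT_DIRTY_BIT = 1 << 55      # bit 55 of a pagemap entry


def is_soft_dirty(entry):
    """Return True if bit 55 is set in a 64-bit pagemap entry."""
    return bool(entry & SOFT_DIRTY_BIT)


def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """Byte offset in /proc/PID/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def clear_soft_dirty(pid):
    """Step 1: write "4" into /proc/PID/clear_refs."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def soft_dirty_pages(pid, start, length):
    """Step 3: return the page addresses in [start, start + length)
    whose pagemap entry has the soft-dirty bit set."""
    dirty = []
    first = start - start % PAGE_SIZE
    # Unbuffered, so that only the requested entries are read.
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(first))
        for vaddr in range(first, start + length, PAGE_SIZE):
            raw = f.read(PAGEMAP_ENTRY_SIZE)
            if len(raw) < PAGEMAP_ENTRY_SIZE:
                break  # short read: stop rather than guess
            (entry,) = struct.unpack("=Q", raw)
            if is_soft_dirty(entry):
                dirty.append(vaddr)
    return dirty
```

A caller would typically run ``clear_soft_dirty(pid)``, sleep for the interval
of interest (step 2), and then call ``soft_dirty_pages()`` on each address
range it cares about, for example ranges taken from ``/proc/PID/maps``.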


Internally, to do this tracking, the writable bit is cleared from PTEs
when the soft-dirty bit is cleared. After this, when the task tries to
modify a page at some virtual address, a page fault (#PF) occurs and the
kernel sets the soft-dirty bit on the respective PTE.

Note that although the task's entire address space is marked read-only after
the soft-dirty bits are cleared, the #PFs that occur afterwards are processed
quickly. This is because the pages are still mapped to physical memory, so all
the kernel has to do is detect this and set both the writable and soft-dirty
bits on the PTE.
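The mechanism described above can be illustrated with a toy model. This is
not kernel code: ``ToyPTE``, ``clear_refs`` and ``write_page`` are
hypothetical names, and the model ignores everything except the writable and
soft-dirty bits (for instance, it treats every page as belonging to a
writable mapping).

```python
from dataclasses import dataclass


@dataclass
class ToyPTE:
    """Toy page-table entry with only the bits discussed in the text."""
    writable: bool = True
    soft_dirty: bool = True


def clear_refs(ptes):
    """Model of clearing soft-dirty: the writable bit is dropped too."""
    for pte in ptes:
        pte.soft_dirty = False
        pte.writable = False


def write_page(ptes, index):
    """Model of a store to one page; returns True if it faulted."""
    pte = ptes[index]
    if pte.writable:
        return False  # no fault, nothing to track
    # The page is still mapped, so the fault handler only flips two bits.
    pte.writable = True
    pte.soft_dirty = True
    return True


ptes = [ToyPTE() for _ in range(4)]
clear_refs(ptes)
faulted_first = write_page(ptes, 2)   # first write takes the fault
faulted_again = write_page(ptes, 2)   # later writes to the page do not
dirty = [i for i, pte in enumerate(ptes) if pte.soft_dirty]
print(faulted_first, faulted_again, dirty)  # True False [2]
```

Only the first write to a page after the clear pays for a fault, which is why
the tracking stays cheap even though every writable page is write-protected.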

While in most cases tracking memory changes by #PFs is more than enough,
there is still a scenario in which soft-dirty bits can be lost: a task
unmaps a previously mapped memory region and then maps a new one at exactly
the same place. When unmap is called, the kernel internally clears PTE values,
including soft-dirty bits. To notify user-space applications about such
memory region renewal, the kernel always marks new memory regions (and
expanded regions) as soft-dirty.
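The unmap/remap scenario can be observed from within a single process with a
sketch like the one below. All function names are hypothetical, and several
assumptions apply: the kernel has soft-dirty support, the process may read its
own pagemap, and pagemap entries are native-endian 64-bit values. Python cannot
request a specific address, so the new region is not guaranteed to land exactly
where the old one was. Per the text above, it should be reported as soft-dirty
either way.

```python
import ctypes
import mmap
import struct

PAGE = mmap.PAGESIZE
SOFT_DIRTY_BIT = 1 << 55      # bit 55 of a pagemap entry
ANON_PRIVATE = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS


def region_address(region):
    """Virtual address of a writable mmap object, obtained via ctypes."""
    view = ctypes.c_char.from_buffer(region)
    addr = ctypes.addressof(view)
    del view  # drop the buffer export so the region can be closed later
    return addr


def count_soft_dirty(entries):
    """Number of pagemap entries with the soft-dirty bit set."""
    return sum(1 for entry in entries if entry & SOFT_DIRTY_BIT)


def read_entries(addr, npages):
    """Read npages pagemap entries of this process, starting at addr."""
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek(addr // PAGE * 8)
        raw = f.read(npages * 8)
    return struct.unpack(f"={len(raw) // 8}Q", raw)


def demo(npages=4):
    """Return soft-dirty page counts (old region after the clear,
    new region right after it has been mapped)."""
    old = mmap.mmap(-1, npages * PAGE, flags=ANON_PRIVATE)
    old[:] = b"x" * (npages * PAGE)          # touch every page
    with open("/proc/self/clear_refs", "w") as f:
        f.write("4")                         # clear soft-dirty bits
    before = count_soft_dirty(read_entries(region_address(old), npages))
    old.close()                              # unmap the old region
    new = mmap.mmap(-1, npages * PAGE, flags=ANON_PRIVATE)
    after = count_soft_dirty(read_entries(region_address(new), npages))
    new.close()
    return before, after
```

Calling ``demo()`` returns the two counts for comparison. The exact numbers
depend on the kernel configuration, so no expected output is given here.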

This feature is actively used by the checkpoint-restore project. You
can find more details about it at http://criu.org


-- Pavel Emelyanov, Apr 9, 2013