.. _soft_dirty:

===============
Soft-Dirty PTEs
===============

The soft-dirty bit is a bit on a PTE that helps to track which pages a task
writes to. To do this tracking, one should:

  1. Clear soft-dirty bits from the task's PTEs.

     This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
     task in question.

  2. Wait some time.

  3. Read soft-dirty bits from the PTEs.

     This is done by reading from ``/proc/PID/pagemap``. Bit 55 of each
     64-bit qword is the soft-dirty one. If set, the respective PTE was
     written to since step 1.


Internally, to do this tracking, the writable bit is cleared from PTEs
when the soft-dirty bit is cleared. After this, when the task tries to
modify a page at some virtual address, a page fault (#PF) occurs and the
kernel sets the soft-dirty bit on the respective PTE.

Note that although the task's whole address space is marked as r/o after the
soft-dirty bits are cleared, the #PF-s that occur after that are processed
quickly. This is because the pages are still mapped to physical memory, so all
the kernel has to do is find this out and set both the writable and soft-dirty
bits on the PTE.
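
The three steps above can be sketched in user space roughly as follows. This
is a minimal illustration, not a reference implementation: the helper names
(``is_soft_dirty``, ``pagemap_offset``, ``clear_soft_dirty``,
``read_soft_dirty``) are hypothetical, and the sketch assumes that
``/proc/PID/pagemap`` holds one native-endian 64-bit entry per virtual page,
that the caller has permission to access the target task's ``/proc`` files,
and that the kernel was built with soft-dirty support.

```python
import struct
from typing import List

SOFT_DIRTY_BIT = 55       # bit 55 of each 64-bit pagemap entry
PAGEMAP_ENTRY_SIZE = 8    # assumed: one 64-bit qword per virtual page


def is_soft_dirty(entry: int) -> bool:
    """Return True if the soft-dirty bit is set in a pagemap entry."""
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in /proc/PID/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def clear_soft_dirty(pid: int) -> None:
    """Step 1: clear soft-dirty bits by writing "4" to clear_refs."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def read_soft_dirty(pid: int, vaddr: int, npages: int,
                    page_size: int) -> List[bool]:
    """Step 3: read the soft-dirty bit of npages pages starting at vaddr."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        data = f.read(npages * PAGEMAP_ENTRY_SIZE)
    # Decode however many whole entries were actually returned.
    count = len(data) // PAGEMAP_ENTRY_SIZE
    entries = struct.unpack(f"={count}Q", data[:count * PAGEMAP_ENTRY_SIZE])
    return [is_soft_dirty(e) for e in entries]
```

A caller would typically obtain the page size with
``os.sysconf("SC_PAGE_SIZE")``, call ``clear_soft_dirty(pid)``, let the task
run for a while (step 2), and then call ``read_soft_dirty`` over the address
ranges of interest, for example those listed in ``/proc/PID/maps``.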

While in most cases tracking memory changes by #PF-s is more than enough,
there is still a scenario in which soft-dirty bits can be lost: a task
unmaps a previously mapped memory region and then maps a new one at exactly
the same place. When unmap is called, the kernel internally clears PTE values,
including soft-dirty bits. To notify user-space applications about such
memory region renewal, the kernel always marks new memory regions (and
expanded regions) as soft-dirty.

This feature is actively used by the checkpoint-restore project. You
can find more details about it on http://criu.org


-- Pavel Emelyanov, Apr 9, 2013