=============================
No-MMU memory mapping support
=============================

The kernel has limited support for memory mapping under no-MMU conditions, such
as are used in uClinux environments. From the userspace point of view, memory
mapping is used in conjunction with the mmap() system call, the shmat() call
and the execve() system call. From the kernel's point of view, execve() mapping
is actually performed by the binfmt drivers, which call back into the mmap()
routines to do the actual work.

Memory mapping behaviour also involves the way fork(), vfork(), clone() and
ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
the CLONE_VM flag.

The behaviour is similar between the MMU and no-MMU cases, but not identical;
and it's also much more restricted in the latter case:

 (#) Anonymous mapping, MAP_PRIVATE

     In the MMU case: VM regions backed by arbitrary pages; copy-on-write
     across fork.

     In the no-MMU case: VM regions backed by arbitrary contiguous runs of
     pages.

 (#) Anonymous mapping, MAP_SHARED

     These behave very much like private mappings, except that they're
     shared across fork() or clone() without CLONE_VM in the MMU case. Since
     the no-MMU case supports neither fork() nor clone() without CLONE_VM,
     behaviour is identical to MAP_PRIVATE there.

 (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE

     In the MMU case: VM regions backed by pages read from file; changes to
     the underlying file are reflected in the mapping; copied across fork.

     In the no-MMU case:

       - If one exists, the kernel will re-use an existing mapping to the
         same segment of the same file if that has compatible permissions,
         even if this was created by another process.

       - If possible, the file mapping will be directly on the backing device
         if the backing device has the NOMMU_MAP_DIRECT capability and
         appropriate mapping protection capabilities. Ramfs, romfs, cramfs
         and mtd might all permit this.

       - If the backing device can't or won't permit direct sharing,
         but does have the NOMMU_MAP_COPY capability, then a copy of the
         appropriate bit of the file will be read into a contiguous bit of
         memory and any extraneous space beyond the EOF will be cleared.

       - Writes to the file do not affect the mapping; writes to the mapping
         are visible in other processes (no MMU protection), but should not
         happen.

 (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE

     In the MMU case: like the non-PROT_WRITE case, except that the pages in
     question get copied before the write actually happens. From that point
     on writes to the file underneath that page no longer get reflected into
     the mapping's backing pages.
     The page is then backed by swap instead.

     In the no-MMU case: works much like the non-PROT_WRITE case, except
     that a copy is always taken and never shared.

 (#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

     In the MMU case: VM regions backed by pages read from file; changes to
     pages written back to file; writes to file reflected into pages backing
     mapping; shared across fork.

     In the no-MMU case: not supported.

 (#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

     In the MMU case: As for ordinary regular files.

     In the no-MMU case: The filesystem providing the memory-backed file
     (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
     sequence by providing a contiguous sequence of pages to map. In that
     case, a shared-writable memory mapping will be possible. It will work
     as for the MMU case. If the filesystem does not provide any such
     support, then the mapping request will be denied.

 (#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

     In the MMU case: As for ordinary regular files.

     In the no-MMU case: As for memory backed regular files, but the
     blockdev must be able to provide a contiguous run of pages without
     truncate being called. The ramdisk driver could do this if it allocated
     all its memory as a contiguous array upfront.

 (#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

     In the MMU case: As for ordinary regular files.

     In the no-MMU case: The character device driver may choose to honour
     the mmap() by providing direct access to the underlying device if it
     provides memory or quasi-memory that can be accessed directly. Examples
     of such are frame buffers and flash devices. If the driver does not
     provide any such support, then the mapping request will be denied.


Further notes on no-MMU MMAP
============================

 (#) A request for a private mapping of a file may return a buffer that is not
     page-aligned. This is because XIP may take place, and the data may not be
     page aligned in the backing store.

 (#) A request for an anonymous mapping will always be page aligned. If
     possible, the size of the request should be a power of two; otherwise
     some of the space may be wasted, as the kernel must allocate a power-of-2
     granule but will only discard the excess if appropriately configured,
     as this has an effect on fragmentation.

 (#) The memory allocated by a request for an anonymous mapping will normally
     be cleared by the kernel before being returned in accordance with the
     Linux man pages (ver 2.22 or later).

     In the MMU case this can be achieved with reasonable performance as
     regions are backed by virtual pages, with the contents only being mapped
     to cleared physical pages when a write happens on that specific page
     (prior to which, the pages are effectively mapped to the global zero page
     from which reads can take place). This spreads out the time it takes to
     initialize the contents of a page - depending on the write-usage of the
     mapping.

     In the no-MMU case, however, anonymous mappings are backed by physical
     pages, and the entire map is cleared at allocation time. This can cause
     significant delays during a userspace malloc() as the C library does an
     anonymous mapping and the kernel then does a memset for the entire map.

     However, for memory that isn't required to be precleared - such as that
     returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to
     indicate to the kernel that it shouldn't bother clearing the memory before
     returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
     to permit this, otherwise the flag will be ignored.

     uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
     to allocate the brk and stack region.

 (#) A list of all the private copy and anonymous mappings on the system is
     visible through /proc/maps in no-MMU mode.

 (#) A list of all the mappings in use by a process is visible through
     /proc/<pid>/maps in no-MMU mode.

 (#) Supplying MAP_FIXED or requesting a particular mapping address will
     result in an error.

 (#) Files mapped privately usually have to have a read method provided by the
     driver or filesystem so that the contents can be read into the memory
     allocated if mmap() chooses not to map the backing device directly. An
     error will result if they don't. This is most likely to be encountered
     with character device files, pipes, fifos and sockets.


Interprocess shared memory
==========================

Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
mode: the former through the usual mechanism, the latter through files created
on ramfs or tmpfs mounts.


Futexes
=======

Futexes are supported in NOMMU mode if the arch supports them. An error will
be given if an address passed to the futex system call lies outside the
mappings made by a process or if the mapping in which the address lies does not
support futexes (such as an I/O chardev mapping).


No-MMU mremap
=============

The mremap() function is partially supported.
It may change the size of a mapping, and may move it [#]_ if MREMAP_MAYMOVE is
specified and if the new size of the mapping exceeds the size of the slab
object currently occupied by the memory to which the mapping refers, or if a
smaller slab object could be used.

MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.

Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.

The mremap() function must be given an exact match for base address and size of
a previously mapped object. It may not be used to create holes in existing
mappings, move parts of existing mappings or resize parts of mappings. It must
act on a complete mapping.

.. [#] Not currently supported.


Providing shareable character device support
============================================

To provide shareable character device support, a driver must provide a
file->f_op->get_unmapped_area() operation. The mmap() routines will call this
to get a proposed address for the mapping. This may return an error if it
doesn't wish to honour the mapping because it's too long, at a weird offset,
under some unsupported combination of flags or whatever.

The driver should also provide backing device information with capabilities set
to indicate the permitted types of mapping on such devices. The default is
assumed to be readable and writable, not executable, and only shareable
directly (can't be copied).

The file->f_op->mmap() operation will be called to actually inaugurate the
mapping. It can be rejected at that point. Returning the ENOSYS error will
cause the mapping to be copied instead if NOMMU_MAP_COPY is specified.

The vm_ops->close() routine will be invoked when the last mapping on a chardev
is removed. An existing mapping will be shared, partially or not, if possible
without notifying the driver.

It is permitted also for the file->f_op->get_unmapped_area() operation to
return -ENOSYS. This will be taken to mean that this operation just doesn't
want to handle it, despite the fact it's got an operation. For instance, it
might try directing the call to a secondary driver which turns out not to
implement it. Such is the case for the framebuffer driver which attempts to
direct the call to the device-specific driver. Under such circumstances, the
mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a
copy mapped otherwise.

.. important::

   Some types of device may present a different appearance to anyone
   looking at them in certain modes.
   Flash chips can be like this; for
   instance if they're in programming or erase mode, you might see the
   status reflected in the mapping, instead of the data.

   In such a case, care must be taken lest userspace see a shared or a
   private mapping showing such information when the driver is busy
   controlling the device. Remember especially: private executable
   mappings may still be mapped directly off the device under some
   circumstances!


Providing shareable memory-backed file support
==============================================

Provision of shared mappings on memory backed files is similar to the provision
of support for shared mapped character devices. The main difference is that the
filesystem providing the service will probably allocate a contiguous collection
of pages and permit mappings to be made on that.

It is recommended that a truncate operation applied to such a file that
increases the file size, if that file is empty, be taken as a request to gather
enough pages to honour a mapping. This is required to support POSIX shared
memory.

Memory backed devices are indicated by the mapping's backing device info having
the memory_backed flag set.
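
From userspace, the open, truncate, mmap sequence described above can be
sketched as follows. This is only an illustration and is not taken from the
kernel sources: the helper name ``shared_map_roundtrip`` is hypothetical, and
the path it is given is assumed to lie on a ramfs or tmpfs mount whose
filesystem honours the sequence. No address hint is passed to mmap(), since
requesting a particular address is an error in no-MMU mode.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: create a file at 'path'
 * (assumed to be on a ramfs or tmpfs mount), grow it from empty, map it
 * shared twice and check that a store made through one mapping is seen
 * through the other. Returns 0 on success, -1 on any failure.
 */
int shared_map_roundtrip(const char *path, size_t len)
{
	static const char msg[] = "hello";
	char *a, *b;
	int fd, rc = -1;

	assert(path != NULL);
	if (len < sizeof(msg))
		return -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Growing the empty file is the request to gather the pages. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	/* Shared, writable mapping of the whole file. */
	a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED)
		goto out_close;

	/* Second shared mapping of the same file, read-only. */
	b = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (b == MAP_FAILED)
		goto out_unmap_a;

	/* A store through the first mapping must be visible in the second. */
	memcpy(a, msg, sizeof(msg));
	if (memcmp(b, msg, sizeof(msg)) == 0)
		rc = 0;

	munmap(b, len);
out_unmap_a:
	munmap(a, len);
out_close:
	close(fd);
	unlink(path);
	return rc;
}
```

A caller would pass a path on the memory-backed mount and a length of at least
one page. If the filesystem does not support shared mappings in no-MMU mode,
the first mmap() call is where the failure would be expected to show up.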


Providing shareable block device support
========================================

Provision of shared mappings on block device files is exactly the same as for
character devices. If there isn't a real device underneath, then the driver
should allocate sufficient contiguous memory to honour any supported mapping.


Adjusting page trimming behaviour
=================================

NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
when performing an allocation. This can have adverse effects on memory
fragmentation, and as such, is left configurable. The default behaviour is to
aggressively trim allocations and discard any excess pages back into the page
allocator. In order to retain finer-grained control over fragmentation, this
behaviour can either be disabled completely, or bumped up to a higher page
watermark where trimming begins.

Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``.
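
The effect of the watermark can be illustrated with a small model. This is a
simplified sketch rather than the kernel's own code: the function name
``pages_kept`` is hypothetical, and it assumes that a value of 0 disables
trimming and that any value of 1 or more is the number of excess pages at which
trimming begins.

```c
#include <assert.h>
#include <limits.h>

/*
 * Simplified model, not kernel code: given a request of 'req_pages' pages
 * and a trimming watermark 'nr_trim_pages', return how many pages the
 * mapping would end up holding under the assumptions stated above.
 */
unsigned long pages_kept(unsigned long req_pages, unsigned long nr_trim_pages)
{
	unsigned long total = 1;

	assert(req_pages > 0);

	/* Round the request up to the next power-of-2 number of pages. */
	while (total < req_pages && total <= ULONG_MAX / 2)
		total <<= 1;

	/* Request too large to round up without overflow: nothing to trim. */
	if (total < req_pages)
		return req_pages;

	/* 0 disables trimming; otherwise trim once the excess reaches the mark. */
	if (nr_trim_pages && total - req_pages >= nr_trim_pages)
		return req_pages;

	return total;
}
```

Under this model a 5-page request is rounded up to 8 pages. With a watermark of
1 the three excess pages are released; with a watermark of 4 or a value of 0
all 8 pages are retained.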