Lines Matching +full:memory +full:- +full:controller

9 conventions of cgroup v2.  It describes all userland-visible aspects
10 of cgroup including core and specific controller behaviors. All
12 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
17 1-1. Terminology
18 1-2. What is cgroup?
20 2-1. Mounting
21 2-2. Organizing Processes and Threads
22 2-2-1. Processes
23 2-2-2. Threads
24 2-3. [Un]populated Notification
25 2-4. Controlling Controllers
26 2-4-1. Enabling and Disabling
27 2-4-2. Top-down Constraint
28 2-4-3. No Internal Process Constraint
29 2-5. Delegation
30 2-5-1. Model of Delegation
31 2-5-2. Delegation Containment
32 2-6. Guidelines
33 2-6-1. Organize Once and Control
34 2-6-2. Avoid Name Collisions
36 3-1. Weights
37 3-2. Limits
38 3-3. Protections
39 3-4. Allocations
41 4-1. Format
42 4-2. Conventions
43 4-3. Core Interface Files
45 5-1. CPU
46 5-1-1. CPU Interface Files
47 5-2. Memory
48 5-2-1. Memory Interface Files
49 5-2-2. Usage Guidelines
50 5-2-3. Memory Ownership
51 5-3. IO
52 5-3-1. IO Interface Files
53 5-3-2. Writeback
54 5-3-3. IO Latency
55 5-3-3-1. How IO Latency Throttling Works
56 5-3-3-2. IO Latency Interface Files
57 5-3-4. IO Priority
58 5-4. PID
59 5-4-1. PID Interface Files
60 5-5. Cpuset
61 5-5-1. Cpuset Interface Files
62 5-6. Device
63 5-7. RDMA
64 5-7-1. RDMA Interface Files
65 5-8. HugeTLB
66 5-8-1. HugeTLB Interface Files
67 5-9. Misc
68 5-9-1. perf_event
69 5-N. Non-normative information
70 5-N-1. CPU controller root cgroup process behaviour
71 5-N-2. IO controller root cgroup process behaviour
73 6-1. Basics
74 6-2. The Root and Views
75 6-3. Migration and setns(2)
76 6-4. Interaction with Other Namespaces
78 P-1. Filesystem Support for Writeback
81 R-1. Multiple Hierarchies
82 R-2. Thread Granularity
83 R-3. Competition Between Inner Nodes and Threads
84 R-4. Other Interface Issues
85 R-5. Controller Issues and Remedies
86 R-5-1. Memory
93 -----------
102 ---------------
108 cgroup is largely composed of two parts - the core and controllers.
110 processes. A cgroup controller is usually responsible for
123 disabled selectively on a cgroup. All controller behaviors are
124 hierarchical - if a controller is enabled on a cgroup, it affects all
126 sub-hierarchy of the cgroup. When a controller is enabled on a nested
136 --------
141 # mount -t cgroup2 none $MOUNT_POINT
150 A controller can be moved across hierarchies only after the controller
151 is no longer referenced in its current hierarchy. Because per-cgroup
152 controller states are destroyed asynchronously and controllers may
153 have lingering references, a controller may not show up immediately on
155 Similarly, a controller should be fully disabled to be moved out of
157 controller to become available for other hierarchies; furthermore, due
158 to inter-controller dependencies, other controllers may need to be
164 the hierarchies and controller associations before starting to use the
180 ignored on non-init namespace mounts. Please refer to the
185 Only populate memory.events with data for the current cgroup,
190 option is ignored on non-init namespace mounts.
194 Recursively apply memory.min and memory.low protection to
199 behavior but is a mount-option to avoid regressing setups
205 --------------------------------
211 A child cgroup can be created by creating a sub-directory::
216 structure. Each cgroup has a read-writable interface file
218 belong to the cgroup one-per-line. The PIDs are not ordered and the
249 0::/test-cgroup/test-cgroup-nested
256 0::/test-cgroup/test-cgroup-nested (deleted)
282 constraint - threaded controllers can be enabled on non-leaf cgroups
306 - As the cgroup will join the parent's resource domain. The parent
309 - When the parent is an unthreaded domain, it must not have any domain
313 Topology-wise, a cgroup can be in an invalid state. Please consider
316 A (threaded domain) - B (threaded) - C (domain, just created)
331 threads in the cgroup. Except that the operations are per-thread
332 instead of per-process, "cgroup.threads" has the same format and
347 a threaded controller is enabled inside a threaded subtree, it only
353 constraint, a threaded controller must be able to handle competition
354 between threads in a non-leaf cgroup and its child cgroups. Each
355 threaded controller defines how such competitions are handled.
359 --------------------------
361 Each non-root cgroup has a "cgroup.events" file which contains
362 a "populated" field indicating whether the cgroup's sub-hierarchy has
366 example, to start a clean-up operation after all processes of a given
367 sub-hierarchy have exited. The populated state updates and
368 notifications are recursive. Consider the following sub-hierarchy
372 A(4) - B(0) - C(1)
382 -----------------------
391 cpu io memory
393 No controller is enabled by default. Controllers can be enabled and
396 # echo "+cpu +memory -io" > cgroup.subtree_control
400 all succeed or fail. If multiple operations on the same controller
403 Enabling a controller in a cgroup indicates that the distribution of
405 Consider the following sub-hierarchy. The enabled controllers are
408 A(cpu,memory) - B(memory) - C()
411 As A has "cpu" and "memory" enabled, A will control the distribution
412 of CPU cycles and memory to its children, in this case, B. As B has
413 "memory" enabled but not "cpu", C and D will compete freely on CPU
414 cycles but their division of memory available to B will be controlled.
416 As a controller regulates the distribution of the target resource to
417 the cgroup's children, enabling it creates the controller's interface
419 would create the "cpu." prefixed controller interface files in C and
420 D. Likewise, disabling "memory" from B would remove the "memory."
421 prefixed controller interface files from C and D. This means that the
422 controller interface files - anything which doesn't start with
426 Top-down Constraint
429 Resources are distributed top-down and a cgroup can further distribute
431 parent. This means that all non-root "cgroup.subtree_control" files
433 "cgroup.subtree_control" file. A controller can be enabled only if
434 the parent has the controller enabled and a controller can't be
441 Non-root cgroups can distribute domain resources to their children
446 This guarantees that, when a domain controller is looking at the part
455 is up to each controller (for more information on this topic please
456 refer to the Non-normative information section in the Controllers
460 enabled controller in the cgroup's "cgroup.subtree_control". This is
469 ----------
489 delegated, the user can build sub-hierarchy under the directory,
493 happens in the delegated sub-hierarchy, nothing can escape the
497 cgroups in or nesting depth of a delegated sub-hierarchy; however,
504 A delegated sub-hierarchy is contained in the sense that processes
505 can't be moved into or out of the sub-hierarchy by the delegatee.
508 requiring the following conditions for a process with a non-root euid
512 - The writer must have write access to the "cgroup.procs" file.
514 - The writer must have write access to the "cgroup.procs" file of the
518 processes around freely in the delegated sub-hierarchy it can't pull
519 in from or push out to outside the sub-hierarchy.
525 ~~~~~~~~~~~~~ - C0 - C00
528 ~~~~~~~~~~~~~ - C1 - C10
535 will be denied with -EACCES.
540 is not reachable, the migration is rejected with -ENOENT.
544 ----------
550 and stateful resources such as memory are not moved together with the
552 inherent trade-offs between migration and various hot paths in terms
558 resource structure once on start-up. Dynamic adjustments to resource
559 distribution can be made by changing controller configuration through
571 controller's interface files are prefixed with the controller name and
572 a dot. A controller's name is composed of lowercase letters and
591 -------
597 work-conserving. Due to the dynamic nature, this model is usually
613 ------
616 Limits can be over-committed - the sum of the limits of children can
621 As limits can be over-committed, all configuration combinations are
630 -----------
635 soft boundaries. Protections can also be over-committed in which case
642 As protections can be over-committed, all configuration combinations
646 "memory.low" implements best-effort memory protection and is an
651 -----------
654 resource. Allocations can't be over-committed - the sum of the
661 As allocations can't be over-committed, some configuration
666 "cpu.rt.max" hard-allocates realtime slices and is an example of this
674 ------
679 New-line separated values
687 (when read-only or multiple values can be written at once)
713 -----------
715 - Settings for a single feature should be contained in a single file.
717 - The root cgroup should be exempt from resource control and thus
720 - The default time unit is microseconds. If a different unit is ever
723 - A parts-per quantity should use a percentage decimal with at least
724 two digit fractional part - e.g. 13.40.
726 - If a controller implements weight based resource distribution, its
732 - If a controller implements an absolute resource guarantee and/or
734 respectively. If a controller implements best effort resource
741 - If a setting has a configurable default value and keyed specific
755 # cat cgroup-example-interface-file
761 # echo 125 > cgroup-example-interface-file
765 # echo "default 125" > cgroup-example-interface-file
769 # echo "8:16 170" > cgroup-example-interface-file
773 # echo "8:0 default" > cgroup-example-interface-file
774 # cat cgroup-example-interface-file
778 - For events which are not very high frequency, an interface file
785 --------------------
791 A read-write single value file which exists on non-root
797 - "domain" : A normal valid domain cgroup.
799 - "domain threaded" : A threaded domain cgroup which is
802 - "domain invalid" : A cgroup which is in an invalid state.
806 - "threaded" : A threaded cgroup which is a member of a
813 A read-write new-line separated values file which exists on
817 the cgroup one-per-line. The PIDs are not ordered and the
826 - It must have write access to the "cgroup.procs" file.
828 - It must have write access to the "cgroup.procs" file of the
831 When delegating a sub-hierarchy, write access to this file
839 A read-write new-line separated values file which exists on
843 the cgroup one-per-line. The TIDs are not ordered and the
852 - It must have write access to the "cgroup.threads" file.
854 - The cgroup that the thread is currently in must be in the
857 - It must have write access to the "cgroup.procs" file of the
860 When delegating a sub-hierarchy, write access to this file
864 A read-only space separated values file which exists on all
871 A read-write space separated values file which exists on all
878 Space separated list of controllers prefixed with '+' or '-'
879 can be written to enable or disable controllers. A controller
880 name prefixed with '+' enables the controller and '-'
881 disables. If a controller appears more than once on the list,
886 A read-only flat-keyed file which exists on non-root cgroups.
898 A read-write single value file. The default is "max".
905 A read-write single value file. The default is "max".
912 A read-only flat-keyed file with the following entries:
930 A read-write single value file which exists on non-root cgroups.
953 create new sub-cgroups.
959 ---
962 controller implements weight and absolute bandwidth limit models for
974 the cpu controller can only be enabled when all RT processes are in
978 before the cpu controller can be enabled.
987 A read-only flat-keyed file.
988 This file exists whether the controller is enabled or not.
992 - usage_usec
993 - user_usec
994 - system_usec
996 and the following three when the controller is enabled:
998 - nr_periods
999 - nr_throttled
1000 - throttled_usec
1003 A read-write single value file which exists on non-root
1009 A read-write single value file which exists on non-root
1012 The nice value is in the range [-20, 19].
1021 A read-write two value file which exists on non-root cgroups.
1033 A read-only nested-key file which exists on non-root cgroups.
1039 A read-write single value file which exists on non-root cgroups.
1054 A read-write single value file which exists on non-root cgroups.
1066 Memory
1067 ------
1069 The "memory" controller regulates distribution of memory. Memory is
1071 intertwining between memory usage and reclaim pressure and the
1072 stateful nature of memory, the distribution model is relatively
1075 While not completely water-tight, all major memory usages by a given
1076 cgroup are tracked so that the total memory consumption can be
1078 following types of memory usages are tracked.
1080 - Userland memory - page cache and anonymous memory.
1082 - Kernel data structures such as dentries and inodes.
1084 - TCP socket buffers.
1089 Memory Interface Files
1092 All memory amounts are in bytes. If a value which is not aligned to
1096 memory.current
1097 A read-only single value file which exists on non-root
1100 The total amount of memory currently being used by the cgroup
1103 memory.min
1104 A read-write single value file which exists on non-root
1107 Hard memory protection. If the memory usage of a cgroup
1108 is within its effective min boundary, the cgroup's memory
1110 unprotected reclaimable memory available, OOM killer
1116 Effective min boundary is limited by memory.min values of
1117 all ancestor cgroups. If there is memory.min overcommitment
1118 (child cgroup or cgroups are requiring more protected memory
1121 actual memory usage below memory.min.
1123 Putting more memory than generally available under this
1126 If a memory cgroup is not populated with processes,
1127 its memory.min is ignored.
1129 memory.low
1130 A read-write single value file which exists on non-root
1133 Best-effort memory protection. If the memory usage of a
1135 memory won't be reclaimed unless there is no reclaimable
1136 memory available in unprotected cgroups.
1142 Effective low boundary is limited by memory.low values of
1143 all ancestor cgroups. If there is memory.low overcommitment
1144 (child cgroup or cgroups are requiring more protected memory
1147 actual memory usage below memory.low.
1149 Putting more memory than generally available under this
1152 memory.high
1153 A read-write single value file which exists on non-root
1156 Memory usage throttle limit. This is the main mechanism to
1157 control memory usage of a cgroup. If a cgroup's usage goes
1164 memory.max
1165 A read-write single value file which exists on non-root
1168 Memory usage hard limit. This is the final protection
1169 mechanism. If a cgroup's memory usage reaches this limit and
1174 In default configuration regular 0-order allocations always
1179 as -ENOMEM or silently ignore in cases like disk readahead.
1185 memory.oom.group
1186 A read-write single value file which exists on non-root
1192 (if the memory cgroup is not a leaf cgroup) are killed
1196 Tasks with the OOM protection (oom_score_adj set to -1000)
1201 memory.oom.group values of ancestor cgroups.
1203 memory.events
1204 A read-only flat-keyed file which exists on non-root cgroups.
1212 memory.events.local.
1216 high memory pressure even though its usage is under
1218 boundary is over-committed.
1222 throttled and routed to perform direct memory reclaim
1223 because the high memory boundary was exceeded. For a
1224 cgroup whose memory usage is capped by the high limit
1225 rather than global memory pressure, this event's
1229 The number of times the cgroup's memory usage was
1234 The number of times the cgroup's memory usage was
1238 considered as an option, e.g. for failed high-order
1245 memory.events.local
1246 Similar to memory.events but the fields in the file are local
1250 memory.stat
1251 A read-only flat-keyed file which exists on non-root cgroups.
1253 This breaks down the cgroup's memory footprint into different
1254 types of memory, type-specific details, and other information
1255 on the state and past events of the memory management system.
1257 All memory amounts are in bytes.
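A flat-keyed file such as this one can be read with a few lines of code. The following is a minimal Python sketch, assuming each line holds one key and one integer value separated by whitespace. The helper name ``parse_flat_keyed`` and the sample content are hypothetical and not real kernel output.

```python
def parse_flat_keyed(text):
    """Parse flat-keyed content ("KEY VALUE" per line) into a dict of ints.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split()
        stats[key] = int(value)
    return stats


# Made-up sample content in the flat-keyed layout; values are in bytes.
sample = "anon 4096\nfile 8192\n"
print(parse_flat_keyed(sample))
```

On a real system the text would come from reading the file under the cgroup's directory rather than from a string literal.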
1263 If the entry has no per-node counter (or is not shown in
1264 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1268 Amount of memory used in anonymous mappings such as
1272 Amount of memory used to cache filesystem data,
1273 including tmpfs and shared memory.
1276 Amount of memory allocated to kernel stacks.
1279 Amount of memory used for storing per-cpu kernel
1283 Amount of memory used in network transmission buffers
1286 Amount of cached filesystem data that is swap-backed,
1301 Amount of memory used in anonymous mappings backed by
1305 Amount of memory, swap-backed and filesystem-backed,
1306 on the internal memory management lists used by the
1310 memory management lists), inactive_foo + active_foo may not be equal to
1311 the value for the foo counter, since the foo counter is type-based, not
1312 list-based.
1319 Part of "slab" that cannot be reclaimed on memory
1323 Amount of memory used for storing in-kernel data
1372 Amount of pages postponed to be freed under memory pressure
1387 memory.numa_stat
1388 A read-only nested-keyed file which exists on non-root cgroups.
1390 This breaks down the cgroup's memory footprint into different
1391 types of memory, type-specific details, and other information
1392 per node on the state of the memory management system.
1400 All memory amounts are in bytes.
1402 The output format of memory.numa_stat is::
1410 For the meaning of each entry, refer to memory.stat.
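To illustrate how a nested-keyed file of this kind might be consumed, here is a minimal Python sketch. It assumes each line starts with an entry name followed by per-node ``N<id>=<bytes>`` pairs. The helper name ``parse_numa_stat`` and the sample content are hypothetical and not real kernel output.

```python
def parse_numa_stat(text):
    """Parse nested-keyed lines like "anon N0=4096 N1=0" into nested dicts.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        per_node = {}
        for pair in fields[1:]:
            # Each pair is assumed to look like "N0=4096".
            node, _, value = pair.partition("=")
            per_node[node] = int(value)
        result[fields[0]] = per_node
    return result


# Made-up sample content; values are in bytes.
sample = "anon N0=4096 N1=0\nfile N0=8192 N1=1024\n"
print(parse_numa_stat(sample))
```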
1412 memory.swap.current
1413 A read-only single value file which exists on non-root
1419 memory.swap.high
1420 A read-write single value file which exists on non-root
1425 allow userspace to implement custom out-of-memory procedures.
1429 during regular operation. Compare to memory.swap.max, which
1431 continue unimpeded as long as other memory can be reclaimed.
1435 memory.swap.max
1436 A read-write single value file which exists on non-root
1440 limit, anonymous memory of the cgroup will not be swapped out.
1442 memory.swap.events
1443 A read-only flat-keyed file which exists on non-root cgroups.
1459 because of running out of swap system-wide or max
1465 reduces the impact on the workload and memory management.
1467 memory.pressure
1468 A read-only nested-key file which exists on non-root cgroups.
1470 Shows pressure stall information for memory. See
1477 "memory.high" is the main mechanism to control memory usage.
1478 Over-committing on high limit (sum of high limits > available memory)
1479 and letting global memory pressure distribute memory according to
1485 more memory or terminating the workload.
1487 Determining whether a cgroup has enough memory is not trivial as
1488 memory usage doesn't indicate whether the workload can benefit from
1489 more memory. For example, a workload which writes data received from
1490 network to a file can use all available memory but can also operate as
1491 performant with a small amount of memory. A measure of memory
1492 pressure - how much the workload is being impacted due to lack of
1493 memory - is necessary to determine whether a workload needs more
1494 memory; unfortunately, memory pressure monitoring mechanism isn't
1498 Memory Ownership
1501 A memory area is charged to the cgroup which instantiated it and stays
1503 to a different cgroup doesn't move the memory usages that it
1506 A memory area may be used by processes belonging to different cgroups.
1507 To which cgroup the area will be charged is indeterminate; however,
1508 over time, the memory area is likely to end up in a cgroup which has
1509 enough memory allowance to avoid high reclaim pressure.
1511 If a cgroup sweeps a considerable amount of memory which is expected
1513 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
1514 belonging to the affected files to ensure correct memory ownership.
1518 --
1520 The "io" controller regulates the distribution of IO resources. This
1521 controller implements both weight based and absolute bandwidth or IOPS
1523 only if cfq-iosched is in use and neither scheme is available for
1524 blk-mq devices.
1531 A read-only nested-keyed file.
1551 A read-write nested-keyed file which exists only on the root
1555 model based controller (CONFIG_BLK_CGROUP_IOCOST) which
1563 enable Weight-based control enable
1573 The controller is disabled by default and can be enabled by
1575 to zero and the controller uses internal device saturation
1583 shows that on sdb, the controller is enabled, will consider
1595 devices which show wide temporary behavior changes - e.g. a
1606 A read-write nested-keyed file which exists only on the root
1610 controller (CONFIG_BLK_CGROUP_IOCOST) which currently
1619 model The cost model in use - "linear"
1645 generate device-specific coefficients.
1648 A read-write flat-keyed file which exists on non-root cgroups.
1668 A read-write nested-keyed file which exists on non-root
1682 When writing, any number of nested key-value pairs can be
1707 A read-only nested-key file which exists on non-root cgroups.
1718 mechanism. Writeback sits between the memory and IO domains and
1719 regulates the proportion of dirty memory by balancing dirtying and
1722 The io controller, in conjunction with the memory controller,
1723 implements control of page cache writeback IOs. The memory controller
1724 defines the memory domain that dirty memory ratio is calculated and
1725 maintained for and the io controller defines the io domain which
1726 writes out dirty pages for the memory domain. Both system-wide and
1727 per-cgroup dirty memory states are examined and the more restrictive
1735 There are inherent differences in memory and writeback management
1736 which affect how cgroup ownership is tracked. Memory is tracked per
1741 As cgroup ownership for memory is tracked per page, there can be pages
1753 As the memory controller assigns page ownership on the first use and
1764 amount of available memory capped by limits imposed by the
1765 memory controller and system-wide clean memory.
1769 total available memory and applied the same way as
1776 This is a cgroup v2 controller for IO workload protection. You provide a group
1778 controller will throttle any peers that have a lower latency target than the
1798 your real setting, setting at 10-15% higher than the value in io.stat.
1804 target the controller doesn't do anything. Once a group starts missing its
1808 - Queue depth throttling. This is the number of outstanding IO's a group is
1812 - Artificial delay induction. There are certain types of IO that cannot be
1835 If the controller is enabled you will see extra stats in io.stat in
1859 no-change
1862 none-to-rt
1867 restrict-to-be
1878 +-------------+---+
1879 | no-change | 0 |
1880 +-------------+---+
1881 | none-to-rt | 1 |
1882 +-------------+---+
1883 | rt-to-be | 2 |
1884 +-------------+---+
1885 | all-to-idle | 3 |
1886 +-------------+---+
1890 +-------------------------------+---+
1892 +-------------------------------+---+
1893 | IOPRIO_CLASS_RT (real-time) | 1 |
1894 +-------------------------------+---+
1896 +-------------------------------+---+
1898 +-------------------------------+---+
1902 - Translate the I/O priority class policy into a number.
1903 - Change the request I/O priority class into the maximum of the I/O priority
1907 ---
1909 The process number controller is used to allow a cgroup to stop any
1914 controllers cannot prevent, thus warranting its own controller. For
1916 hitting memory restrictions.
1918 Note that PIDs used in this controller refer to TIDs, process IDs as
1926 A read-write single value file which exists on non-root
1932 A read-only single value file which exists on all cgroups.
1942 through fork() or clone(). These will return -EAGAIN if the creation
1947 ------
1949 The "cpuset" controller provides a mechanism for constraining
1950 the CPU and memory node placement of tasks to only the resources
1954 memory placement to reduce cross-node memory access and contention
1957 The "cpuset" controller is hierarchical. That means the controller
1958 cannot use CPUs or memory nodes not allowed in its parent.
1965 A read-write multiple values file which exists on non-root
1966 cpuset-enabled cgroups.
1973 The CPU numbers are comma-separated numbers or ranges.
1977 0-4,6,8-10
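As a rough illustration of the list format above, the following Python sketch expands a string of comma-separated numbers or ranges into individual CPU numbers. The function name ``parse_cpu_list`` is hypothetical and not part of any kernel or library interface, and the sketch assumes well-formed input.

```python
def parse_cpu_list(text):
    """Expand a cpuset-style list such as "0-4,6,8-10" into sorted numbers.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string yields an empty list
        if "-" in part:
            # A range "a-b" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0-4,6,8-10"))
```

The same sketch applies to memory node lists, which use the same comma-separated numbers or ranges.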
1980 setting as the nearest cgroup ancestor with a non-empty
1987 A read-only multiple values file which exists on all
1988 cpuset-enabled cgroups.
2004 A read-write multiple values file which exists on non-root
2005 cpuset-enabled cgroups.
2007 It lists the requested memory nodes to be used by tasks within
2008 this cgroup. The actual list of memory nodes granted, however,
2010 from the requested memory nodes.
2012 The memory node numbers are comma-separated numbers or ranges.
2016 0-1,3
2019 setting as the nearest cgroup ancestor with a non-empty
2020 "cpuset.mems" or all the available memory nodes if none
2024 and won't be affected by any memory nodes hotplug events.
2027 A read-only multiple values file which exists on all
2028 cpuset-enabled cgroups.
2030 It lists the onlined memory nodes that are actually granted to
2031 this cgroup by its parent. These memory nodes are allowed to
2034 If "cpuset.mems" is empty, it shows all the memory nodes from the
2037 the memory nodes listed in "cpuset.mems" can be granted. In this
2040 Its value will be affected by memory nodes hotplug events.
2043 A read-write single value file which exists on non-root
2044 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2049 "root" - a partition root
2050 "member" - a non-root member of a partition
2091 "member" Non-root member of a partition
2116 Device controller
2117 -----------------
2119 Device controller manages access to device files. It includes both
2123 Cgroup v2 device controller has no interface files and is implemented
2128 the attempt will succeed or fail with -EPERM.
2133 If the program returns 0, the attempt fails with -EPERM, otherwise
2141 ----
2143 The "rdma" controller regulates the distribution and accounting of
2150 A read-write nested-keyed file that exists for all the cgroups
2171 A read-only file that describes current resource usage.
2180 -------
2182 The HugeTLB controller allows limiting HugeTLB usage per control group and
2183 enforces the controller limit during page fault.
2197 A read-only flat-keyed file which exists on non-root cgroups.
2208 ----
2213 perf_event controller, if not mounted on a legacy hierarchy, is
2215 always be filtered by cgroup v2 path. The controller can still be
2219 Non-normative information
2220 -------------------------
2226 CPU controller root cgroup process behaviour
2236 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2239 IO controller root cgroup process behaviour
2252 ------
2271 The path '/batchjobs/container_id1' can be considered as system-data
2276 # ls -l /proc/self/ns/cgroup
2277 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2283 # ls -l /proc/self/ns/cgroup
2284 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2288 When some thread from a multi-threaded process unshares its cgroup
2300 ------------------
2311 # ~/unshare -c # unshare cgroupns in some cgroup
2319 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2350 ----------------------
2379 ---------------------------------
2382 running inside a non-init cgroup namespace::
2384 # mount -t cgroup2 none $MOUNT_POINT
2391 the view of cgroup hierarchy by namespace-private cgroupfs mount
2404 --------------------------------
2407 address_space_operations->writepage[s]() to annotate bio's using the
2424 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2441 - Multiple hierarchies including named ones are not supported.
2443 - None of the v1 mount options are supported.
2445 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2447 - "cgroup.clone_children" is removed.
2449 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2457 --------------------
2463 For example, as there is only one instance of each controller, utility
2470 the specific controller.
2474 each controller on its own hierarchy. Only closely related ones, such
2493 Also, as a controller couldn't have any expectation regarding the
2495 controller had to assume that all other controllers were attached to
2502 depending on the specific controller. In other words, hierarchy may
2505 how memory is distributed beyond a certain level while still wanting
2510 ------------------
2518 Generally, in-process knowledge is available only to the process
2519 itself; thus, unlike service-level organization of processes,
2526 sub-hierarchies and control resource distributions along them. This
2527 effectively raised cgroup to the status of a syscall-like API exposed
2537 that the process would actually be operating on its own sub-hierarchy.
2541 system-management pseudo filesystem. cgroup ended up with interface
2544 individual applications through the ill-defined delegation mechanism
2554 -------------------------------------------
2562 The cpu controller considered threads and cgroups as equivalents and
2565 cycles and the number of internal threads fluctuated - the ratios
2571 The io controller implicitly created a hidden leaf node for each
2579 The memory controller didn't have a way to control what happened
2581 clearly defined. There were attempts to add ad-hoc behaviors and
2595 ----------------------
2599 was how an empty cgroup was notified - a userland helper binary was
2602 to in-kernel event delivery filtering mechanism further complicating
2605 Controller interfaces were problematic too. An extreme example is
2617 formats and units even in the same controller.
2623 Controller Issues and Remedies
2624 ------------------------------
2626 Memory
2631 global reclaim prefers is opt-in, rather than opt-out. The costs for
2641 becomes self-defeating.
2643 The memory.low boundary on the other hand is a top-down allocated
2652 available memory. The memory consumption of workloads varies during
2660 The memory.high boundary on the other hand can be set much more
2666 and make corrections until the minimal memory footprint that still
2673 system than killing the group. Otherwise, memory.max is there to
2677 Setting the original memory.limit_in_bytes below the current usage was
2679 limit setting to fail. memory.max on the other hand will first set the
2681 new limit is met - or the task writing to memory.max is killed.
2683 The combined memory+swap accounting and limiting is replaced by real
2686 The main argument for a combined memory+swap facility in the original
2688 able to swap all anonymous memory of a child group, regardless of the
2690 groups can sabotage swapping by other means - such as referencing its
2691 anonymous memory in a tight loop - and an admin can not assume full