Lines Matching +full:memory +full:- +full:controllers

9 conventions of cgroup v2.  It describes all userland-visible aspects
12 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
17 1-1. Terminology
18 1-2. What is cgroup?
20 2-1. Mounting
21 2-2. Organizing Processes and Threads
22 2-2-1. Processes
23 2-2-2. Threads
24 2-3. [Un]populated Notification
25 2-4. Controlling Controllers
26 2-4-1. Enabling and Disabling
27 2-4-2. Top-down Constraint
28 2-4-3. No Internal Process Constraint
29 2-5. Delegation
30 2-5-1. Model of Delegation
31 2-5-2. Delegation Containment
32 2-6. Guidelines
33 2-6-1. Organize Once and Control
34 2-6-2. Avoid Name Collisions
36 3-1. Weights
37 3-2. Limits
38 3-3. Protections
39 3-4. Allocations
41 4-1. Format
42 4-2. Conventions
43 4-3. Core Interface Files
44 5. Controllers
45 5-1. CPU
46 5-1-1. CPU Interface Files
47 5-2. Memory
48 5-2-1. Memory Interface Files
49 5-2-2. Usage Guidelines
50 5-2-3. Memory Ownership
51 5-3. IO
52 5-3-1. IO Interface Files
53 5-3-2. Writeback
54 5-3-3. IO Latency
55 5-3-3-1. How IO Latency Throttling Works
56 5-3-3-2. IO Latency Interface Files
57 5-3-4. IO Priority
58 5-4. PID
59 5-4-1. PID Interface Files
60 5-5. Cpuset
61 5-5-1. Cpuset Interface Files
62 5-6. Device
63 5-7. RDMA
64 5-7-1. RDMA Interface Files
65 5-8. HugeTLB
66 5-8-1. HugeTLB Interface Files
67 5-9. Misc
68 5-9-1. perf_event
69 5-N. Non-normative information
70 5-N-1. CPU controller root cgroup process behaviour
71 5-N-2. IO controller root cgroup process behaviour
73 6-1. Basics
74 6-2. The Root and Views
75 6-3. Migration and setns(2)
76 6-4. Interaction with Other Namespaces
78 P-1. Filesystem Support for Writeback
81 R-1. Multiple Hierarchies
82 R-2. Thread Granularity
83 R-3. Competition Between Inner Nodes and Threads
84 R-4. Other Interface Issues
85 R-5. Controller Issues and Remedies
86 R-5-1. Memory
93 -----------
97 qualifier as in "cgroup controllers". When explicitly referring to
102 ---------------
108 cgroup is largely composed of two parts - the core and controllers.
112 although there are utility controllers which serve purposes other than
122 Following certain structural constraints, controllers may be enabled or
124 hierarchical - if a controller is enabled on a cgroup, it affects all
126 sub-hierarchy of the cgroup. When a controller is enabled on a nested
136 --------
141 # mount -t cgroup2 none $MOUNT_POINT
144 controllers which support v2 and are not bound to a v1 hierarchy are
146 Controllers which are not in active use in the v2 hierarchy can be
151 is no longer referenced in its current hierarchy. Because per-cgroup
152 controller states are destroyed asynchronously and controllers may
158 to inter-controller dependencies, other controllers may need to be
162 controllers dynamically between the v2 and other hierarchies is
165 controllers after system boot.
168 automount the v1 cgroup filesystem and so hijack all controllers
171 disabling controllers in v1 and make them always available in v2.
180 ignored on non-init namespace mounts. Please refer to the
185 Only populate memory.events with data for the current cgroup,
190 option is ignored on non-init namespace mounts.
194 Recursively apply memory.min and memory.low protection to
199 behavior but is a mount-option to avoid regressing setups
205 --------------------------------
211 A child cgroup can be created by creating a sub-directory::
216 structure. Each cgroup has a read-writable interface file
218 belong to the cgroup one-per-line. The PIDs are not ordered and the
249 0::/test-cgroup/test-cgroup-nested
256 0::/test-cgroup/test-cgroup-nested (deleted)
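The "/proc/$PID/cgroup" entries shown above can be split into their fields with a few lines of code. The following is a minimal sketch, assuming the usual "hierarchy-ID:controller-list:path" layout of that file, in which the v2 entry has ID 0 and an empty controller list. The names ``CgroupEntry`` and ``parse_proc_cgroup_line`` are made up for this illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int       # 0 for the cgroup v2 hierarchy
    controllers: List[str]  # empty for the cgroup v2 entry
    path: str               # cgroup path relative to the cgroup root
    deleted: bool           # True if the line carried a " (deleted)" suffix


def parse_proc_cgroup_line(line: str) -> CgroupEntry:
    """Parse one 'ID:CONTROLLERS:PATH' line from /proc/$PID/cgroup."""
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    suffix = " (deleted)"
    deleted = path.endswith(suffix)
    if deleted:
        path = path[: -len(suffix)]
    return CgroupEntry(
        hierarchy_id=int(hierarchy_id),
        controllers=[c for c in controllers.split(",") if c],
        path=path,
        deleted=deleted,
    )
```

Note that a cgroup whose name really ends in " (deleted)" cannot be told apart from a removed one by this sketch.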
262 cgroup v2 supports thread granularity for a subset of controllers to
270 Controllers which support thread mode are called threaded controllers.
271 The ones which don't are called domain controllers.
282 constraint - threaded controllers can be enabled on non-leaf cgroups
306 - As the cgroup will join the parent's resource domain. The parent
309 - When the parent is an unthreaded domain, it must not have any domain
310 controllers enabled or populated domain children. The root is
313 Topology-wise, a cgroup can be in an invalid state. Please consider
316 A (threaded domain) - B (threaded) - C (domain, just created)
325 cgroup becomes threaded or threaded controllers are enabled in the
331 threads in the cgroup. Except that the operations are per-thread
332 instead of per-process, "cgroup.threads" has the same format and
346 Only threaded controllers can be enabled in a threaded subtree. When
354 between threads in a non-leaf cgroup and its child cgroups. Each
359 --------------------------
361 Each non-root cgroup has a "cgroup.events" file which contains
362 "populated" field indicating whether the cgroup's sub-hierarchy has
366 example, to start a clean-up operation after all processes of a given
367 sub-hierarchy have exited. The populated state updates and
368 notifications are recursive. Consider the following sub-hierarchy
372 A(4) - B(0) - C(1)
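The recursive nature of the "populated" field can be modelled in a few lines. This is only an illustrative sketch, not the kernel implementation: the ``Cgroup`` class and the ``populated`` helper are hypothetical, and the tree below covers only the A - B - C chain with the process counts given in parentheses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    name: str
    nr_procs: int = 0
    children: List["Cgroup"] = field(default_factory=list)


def populated(cg: Cgroup) -> int:
    """Return 1 if cg or any of its descendants has live processes, else 0."""
    if cg.nr_procs > 0:
        return 1
    return int(any(populated(child) for child in cg.children))


# A(4) - B(0) - C(1)
c = Cgroup("C", nr_procs=1)
b = Cgroup("B", nr_procs=0, children=[c])
a = Cgroup("A", nr_procs=4, children=[b])
```

With this tree all three cgroups report 1. Once the single process in C exits (``c.nr_procs = 0``), B and C flip to 0 while A stays at 1.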
381 Controlling Controllers
382 -----------------------
387 Each cgroup has a "cgroup.controllers" file which lists all
388 controllers available for the cgroup to enable::
390 # cat cgroup.controllers
391 cpu io memory
393 No controller is enabled by default. Controllers can be enabled and
396 # echo "+cpu +memory -io" > cgroup.subtree_control
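The effect of such a write can be sketched as a pure function over sets of controller names. This is a simplified model and not the kernel's parser: ``apply_subtree_control`` is a hypothetical helper, and it assumes that only enabling an unavailable controller is an error. The request is validated as a whole before anything is returned, which mirrors the all-or-nothing behaviour of multi-operation writes.

```python
def apply_subtree_control(enabled, available, request):
    """Return the new enabled set after a write such as '+cpu +memory -io'.

    enabled   -- controllers currently listed in "cgroup.subtree_control"
    available -- controllers listed in "cgroup.controllers"
    request   -- the string being written
    """
    result = set(enabled)  # work on a copy so a failure changes nothing
    for token in request.split():
        op, name = token[0], token[1:]
        if op not in ("+", "-") or not name:
            raise ValueError(f"malformed token: {token!r}")
        if op == "+":
            if name not in available:
                raise ValueError(f"controller not available: {name!r}")
            result.add(name)
        else:
            result.discard(name)
    return result
```

For example, with "cpu io memory" available and nothing enabled, the write shown above yields the set ``{"cpu", "memory"}``.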
398 Only controllers which are listed in "cgroup.controllers" can be
405 Consider the following sub-hierarchy. The enabled controllers are
408 A(cpu,memory) - B(memory) - C()
411 As A has "cpu" and "memory" enabled, A will control the distribution
412 of CPU cycles and memory to its children, in this case, B. As B has
413 "memory" enabled but not "CPU", C and D will compete freely on CPU
414 cycles but their division of memory available to B will be controlled.
420 D. Likewise, disabling "memory" from B would remove the "memory."
422 controller interface files - anything which doesn't start with
426 Top-down Constraint
429 Resources are distributed top-down and a cgroup can further distribute
431 parent. This means that all non-root "cgroup.subtree_control" files
432 can only contain controllers which are enabled in the parent's
441 Non-root cgroups can distribute domain resources to their children
444 controllers enabled in their "cgroup.subtree_control" files.
454 controllers. How resource consumption in the root cgroup is governed
456 refer to the Non-normative information section in the Controllers
464 children before enabling controllers in its "cgroup.subtree_control"
469 ----------
489 delegated, the user can build a sub-hierarchy under the directory,
492 of all resource controllers are hierarchical and regardless of what
493 happens in the delegated sub-hierarchy, nothing can escape the
497 cgroups in or nesting depth of a delegated sub-hierarchy; however,
504 A delegated sub-hierarchy is contained in the sense that processes
505 can't be moved into or out of the sub-hierarchy by the delegatee.
508 requiring the following conditions for a process with a non-root euid
512 - The writer must have write access to the "cgroup.procs" file.
514 - The writer must have write access to the "cgroup.procs" file of the
518 processes around freely in the delegated sub-hierarchy it can't pull
519 in from or push out to outside the sub-hierarchy.
525 ~~~~~~~~~~~~~ - C0 - C00
528 ~~~~~~~~~~~~~ - C1 - C10
535 will be denied with -EACCES.
540 is not reachable, the migration is rejected with -ENOENT.
544 ----------
550 and stateful resources such as memory are not moved together with the
552 inherent trade-offs between migration and various hot paths in terms
558 resource structure once on start-up. Dynamic adjustments to resource
585 cgroup controllers implement several resource distribution schemes
591 -------
597 work-conserving. Due to the dynamic nature, this model is usually
613 ------
616 Limits can be over-committed - the sum of the limits of children can
621 As limits can be over-committed, all configuration combinations are
630 -----------
635 soft boundaries. Protections can also be over-committed in which case
642 As protections can be over-committed, all configuration combinations
646 "memory.low" implements best-effort memory protection and is an
651 -----------
654 resource. Allocations can't be over-committed - the sum of the
661 As allocations can't be over-committed, some configuration
666 "cpu.rt.max" hard-allocates realtime slices and is an example of this
674 ------
679 New-line separated values
687 (when read-only or multiple values can be written at once)
704 reading; however, controllers may allow omitting later fields or
713 -----------
715 - Settings for a single feature should be contained in a single file.
717 - The root cgroup should be exempt from resource control and thus
720 - The default time unit is microseconds. If a different unit is ever
723 - A parts-per quantity should use a percentage decimal with at least
724 two digit fractional part - e.g. 13.40.
726 - If a controller implements weight based resource distribution, its
732 - If a controller implements an absolute resource guarantee and/or
741 - If a setting has a configurable default value and keyed specific
755 # cat cgroup-example-interface-file
761 # echo 125 > cgroup-example-interface-file
765 # echo "default 125" > cgroup-example-interface-file
769 # echo "8:16 170" > cgroup-example-interface-file
773 # echo "8:0 default" > cgroup-example-interface-file
774 # cat cgroup-example-interface-file
778 - For events which are not very high frequency, an interface file
785 --------------------
791 A read-write single value file which exists on non-root
797 - "domain" : A normal valid domain cgroup.
799 - "domain threaded" : A threaded domain cgroup which is
802 - "domain invalid" : A cgroup which is in an invalid state.
803 It can't be populated or have controllers enabled. It may
806 - "threaded" : A threaded cgroup which is a member of a
813 A read-write new-line separated values file which exists on
817 the cgroup one-per-line. The PIDs are not ordered and the
826 - It must have write access to the "cgroup.procs" file.
828 - It must have write access to the "cgroup.procs" file of the
831 When delegating a sub-hierarchy, write access to this file
839 A read-write new-line separated values file which exists on
843 the cgroup one-per-line. The TIDs are not ordered and the
852 - It must have write access to the "cgroup.threads" file.
854 - The cgroup that the thread is currently in must be in the
857 - It must have write access to the "cgroup.procs" file of the
860 When delegating a sub-hierarchy, write access to this file
863 cgroup.controllers
864 A read-only space separated values file which exists on all
867 It shows a space-separated list of all controllers available to
868 the cgroup. The controllers are not ordered.
871 A read-write space separated values file which exists on all
874 When read, it shows a space-separated list of the controllers
878 A space-separated list of controllers prefixed with '+' or '-'
879 can be written to enable or disable controllers. A controller
880 name prefixed with '+' enables the controller and '-'
886 A read-only flat-keyed file which exists on non-root cgroups.
898 A read-write single value file. The default is "max".
905 A read-write single value file. The default is "max".
912 A read-only flat-keyed file with the following entries:
930 A read-write single value file which exists on non-root cgroups.
953 create new sub-cgroups.
955 Controllers chapter
959 ---
961 The "cpu" controller regulates distribution of CPU cycles. This
987 A read-only flat-keyed file.
992 - usage_usec
993 - user_usec
994 - system_usec
998 - nr_periods
999 - nr_throttled
1000 - throttled_usec
1003 A read-write single value file which exists on non-root
1009 A read-write single value file which exists on non-root
1012 The nice value is in the range [-20, 19].
1021 A read-write two value file which exists on non-root cgroups.
1033 A read-only nested-key file which exists on non-root cgroups.
1039 A read-write single value file which exists on non-root cgroups.
1054 A read-write single value file which exists on non-root cgroups.
1066 Memory section in Controllers
1067 ------
1069 The "memory" controller regulates distribution of memory. Memory is
1071 intertwining between memory usage and reclaim pressure and the
1072 stateful nature of memory, the distribution model is relatively
1075 While not completely water-tight, all major memory usages by a given
1076 cgroup are tracked so that the total memory consumption can be
1078 following types of memory usages are tracked.
1080 - Userland memory - page cache and anonymous memory.
1082 - Kernel data structures such as dentries and inodes.
1084 - TCP socket buffers.
1089 Memory Interface Files argument
1092 All memory amounts are in bytes. If a value which is not aligned to
1096 memory.current
1097 A read-only single value file which exists on non-root
1100 The total amount of memory currently being used by the cgroup
1103 memory.min
1104 A read-write single value file which exists on non-root
1107 Hard memory protection. If the memory usage of a cgroup
1108 is within its effective min boundary, the cgroup's memory
1110 unprotected reclaimable memory available, OOM killer
1116 Effective min boundary is limited by memory.min values of
1117 all ancestor cgroups. If there is memory.min overcommitment
1118 (child cgroup or cgroups are requiring more protected memory
1121 actual memory usage below memory.min.
1123 Putting more memory than generally available under this
1126 If a memory cgroup is not populated with processes,
1127 its memory.min is ignored.
1129 memory.low
1130 A read-write single value file which exists on non-root
1133 Best-effort memory protection. If the memory usage of a
1135 memory won't be reclaimed unless there is no reclaimable
1136 memory available in unprotected cgroups.
1142 Effective low boundary is limited by memory.low values of
1143 all ancestor cgroups. If there is memory.low overcommitment
1144 (child cgroup or cgroups are requiring more protected memory
1147 actual memory usage below memory.low.
1149 Putting more memory than generally available under this
1152 memory.high
1153 A read-write single value file which exists on non-root
1156 Memory usage throttle limit. This is the main mechanism to
1157 control memory usage of a cgroup. If a cgroup's usage goes
1164 memory.max
1165 A read-write single value file which exists on non-root
1168 Memory usage hard limit. This is the final protection
1169 mechanism. If a cgroup's memory usage reaches this limit and
1174 In default configuration regular 0-order allocations always
1179 as -ENOMEM or silently ignore in cases like disk readahead.
1185 memory.oom.group
1186 A read-write single value file which exists on non-root
1192 (if the memory cgroup is not a leaf cgroup) are killed
1196 Tasks with the OOM protection (oom_score_adj set to -1000)
1201 memory.oom.group values of ancestor cgroups.
1203 memory.events
1204 A read-only flat-keyed file which exists on non-root cgroups.
1212 memory.events.local.
1216 high memory pressure even though its usage is under
1218 boundary is over-committed.
1222 throttled and routed to perform direct memory reclaim
1223 because the high memory boundary was exceeded. For a
1224 cgroup whose memory usage is capped by the high limit
1225 rather than global memory pressure, this event's
1229 The number of times the cgroup's memory usage was
1234 The number of times the cgroup's memory usage was
1238 considered as an option, e.g. for failed high-order
1245 memory.events.local
1246 Similar to memory.events but the fields in the file are local
1250 memory.stat
1251 A read-only flat-keyed file which exists on non-root cgroups.
1253 This breaks down the cgroup's memory footprint into different
1254 types of memory, type-specific details, and other information
1255 on the state and past events of the memory management system.
1257 All memory amounts are in bytes.
1263 If the entry has no per-node counter (or is not shown in
1264 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1268 Amount of memory used in anonymous mappings such as
1272 Amount of memory used to cache filesystem data,
1273 including tmpfs and shared memory.
1276 Amount of memory allocated to kernel stacks.
1279 Amount of memory used for storing per-cpu kernel
1283 Amount of memory used in network transmission buffers
1286 Amount of cached filesystem data that is swap-backed,
1301 Amount of memory used in anonymous mappings backed by
1305 Amount of memory, swap-backed and filesystem-backed,
1306 on the internal memory management lists used by the
1310 memory management lists), inactive_foo + active_foo may not be equal to
1311 the value for the foo counter, since the foo counter is type-based, not
1312 list-based.
1319 Part of "slab" that cannot be reclaimed on memory
1323 Amount of memory used for storing in-kernel data
1372 Amount of pages postponed to be freed under memory pressure
1387 memory.numa_stat
1388 A read-only nested-keyed file which exists on non-root cgroups.
1390 This breaks down the cgroup's memory footprint into different
1391 types of memory, type-specific details, and other information
1392 per node on the state of the memory management system.
1400 All memory amounts are in bytes.
1402 The output format of memory.numa_stat is::
1410 The entries can refer to the memory.stat.
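A nested-keyed file such as memory.numa_stat can be read into a dictionary of dictionaries. The sketch below assumes each line has the form ``type N0=<bytes> N1=<bytes> ...``. The ``parse_nested_keyed`` name and the sample values in the usage note are invented for illustration.

```python
def parse_nested_keyed(text):
    """Parse 'KEY SUB0=VAL0 SUB1=VAL1 ...' lines into {KEY: {SUB: int}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, *pairs = line.split()
        result[key] = {
            sub: int(value)
            for sub, value in (pair.split("=", 1) for pair in pairs)
        }
    return result
```

Given the hypothetical input ``"anon N0=4096 N1=8192\n"``, the function returns ``{"anon": {"N0": 4096, "N1": 8192}}``. Summing the per-node values of an entry gives a figure that can be compared against the matching memory.stat entry.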
1412 memory.swap.current
1413 A read-only single value file which exists on non-root
1419 memory.swap.high
1420 A read-write single value file which exists on non-root
1425 allow userspace to implement custom out-of-memory procedures.
1429 during regular operation. Compare to memory.swap.max, which
1431 continue unimpeded as long as other memory can be reclaimed.
1435 memory.swap.max
1436 A read-write single value file which exists on non-root
1440 limit, anonymous memory of the cgroup will not be swapped out.
1442 memory.swap.events
1443 A read-only flat-keyed file which exists on non-root cgroups.
1459 because of running out of swap system-wide or max
1465 reduces the impact on the workload and memory management.
1467 memory.pressure
1468 A read-only nested-key file which exists on non-root cgroups.
1470 Shows pressure stall information for memory. See
1477 "memory.high" is the main mechanism to control memory usage.
1478 Over-committing on high limit (sum of high limits > available memory)
1479 and letting global memory pressure distribute memory according to
1485 more memory or terminating the workload.
1487 Determining whether a cgroup has enough memory is not trivial as
1488 memory usage doesn't indicate whether the workload can benefit from
1489 more memory. For example, a workload which writes data received from
1490 network to a file can use all available memory but can also operate as
1491 performant with a small amount of memory. A measure of memory
1492 pressure - how much the workload is being impacted due to lack of
1493 memory - is necessary to determine whether a workload needs more
1494 memory; unfortunately, memory pressure monitoring mechanism isn't
1498 Memory Ownership argument
1501 A memory area is charged to the cgroup which instantiated it and stays
1503 to a different cgroup doesn't move the memory usages that it
1506 A memory area may be used by processes belonging to different cgroups.
1507 To which cgroup the area will be charged is indeterminate; however,
1508 over time, the memory area is likely to end up in a cgroup which has
1509 enough memory allowance to avoid high reclaim pressure.
1511 If a cgroup sweeps a considerable amount of memory which is expected
1513 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
1514 belonging to the affected files to ensure correct memory ownership.
1518 --
1523 only if cfq-iosched is in use and neither scheme is available for
1524 blk-mq devices.
1531 A read-only nested-keyed file.
1551 A read-write nested-keyed file which exists only on the root
1563 enable Weight-based control enable
1595 devices which show wide temporary behavior changes - e.g. a
1606 A read-write nested-keyed file which exists only on the root
1619 model The cost model in use - "linear"
1645 generate device-specific coefficients.
1648 A read-write flat-keyed file which exists on non-root cgroups.
1668 A read-write nested-keyed file which exists on non-root
1682 When writing, any number of nested key-value pairs can be
1707 A read-only nested-key file which exists on non-root cgroups.
1718 mechanism. Writeback sits between the memory and IO domains and
1719 regulates the proportion of dirty memory by balancing dirtying and
1722 The io controller, in conjunction with the memory controller,
1723 implements control of page cache writeback IOs. The memory controller
1724 defines the memory domain that dirty memory ratio is calculated and
1726 writes out dirty pages for the memory domain. Both system-wide and
1727 per-cgroup dirty memory states are examined and the more restrictive
1735 There are inherent differences in memory and writeback management
1736 which affect how cgroup ownership is tracked. Memory is tracked per
1741 As cgroup ownership for memory is tracked per page, there can be pages
1753 As memory controller assigns page ownership on the first use and
1764 amount of available memory capped by limits imposed by the
1765 memory controller and system-wide clean memory.
1769 total available memory and applied the same way as
1798 your real setting, setting at 10-15% higher than the value in io.stat.
1808 - Queue depth throttling. This is the number of outstanding IO's a group is
1812 - Artificial delay induction. There are certain types of IO that cannot be
1830 This takes a similar format as the other controllers.
1859 no-change
1862 none-to-rt
1867 restrict-to-be
1878 +-------------+---+
1879 | no-change | 0 |
1880 +-------------+---+
1881 | none-to-rt | 1 |
1882 +-------------+---+
1883 | rt-to-be | 2 |
1884 +-------------+---+
1885 | all-to-idle | 3 |
1886 +-------------+---+
1890 +-------------------------------+---+
1892 +-------------------------------+---+
1893 | IOPRIO_CLASS_RT (real-time) | 1 |
1894 +-------------------------------+---+
1896 +-------------------------------+---+
1898 +-------------------------------+---+
1902 - Translate the I/O priority class policy into a number.
1903 - Change the request I/O priority class into the maximum of the I/O priority
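The two steps above amount to taking a maximum over two small integers. The following sketch uses the policy numbers from the first table. For the I/O priority classes it assumes the standard Linux numbering (NONE 0, RT 1, BE 2, IDLE 3). ``effective_class`` is a hypothetical helper name, not a kernel function.

```python
# Policy numbers as listed in the policy table.
POLICY_NUMBER = {
    "no-change": 0,
    "none-to-rt": 1,
    "rt-to-be": 2,
    "all-to-idle": 3,
}

# I/O priority class numbers (assumed standard Linux values).
CLASS_NUMBER = {
    "IOPRIO_CLASS_NONE": 0,
    "IOPRIO_CLASS_RT": 1,
    "IOPRIO_CLASS_BE": 2,
    "IOPRIO_CLASS_IDLE": 3,
}
CLASS_NAME = {number: name for name, number in CLASS_NUMBER.items()}


def effective_class(policy, request_class):
    """Return the class a request ends up with under the given policy."""
    number = max(POLICY_NUMBER[policy], CLASS_NUMBER[request_class])
    return CLASS_NAME[number]
```

Under this model "rt-to-be" demotes a real-time request to best-effort, while "no-change" leaves every class as it was because 0 never exceeds the request's own class number.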
1907 ---
1914 controllers cannot prevent, thus warranting its own controller. For
1916 hitting memory restrictions.
1926 A read-write single value file which exists on non-root
1932 A read-only single value file which exists on all cgroups.
1942 through fork() or clone(). These will return -EAGAIN if the creation
1947 ------
1950 the CPU and memory node placement of tasks to only the resources
1954 memory placement to reduce cross-node memory access and contention
1958 cannot use CPUs or memory nodes not allowed in its parent.
1965 A read-write multiple values file which exists on non-root
1966 cpuset-enabled cgroups.
1973 The CPU numbers are comma-separated numbers or ranges.
1977 0-4,6,8-10
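A list in this format, comma-separated numbers or inclusive ranges, can be expanded into individual IDs as follows. This is a minimal sketch for tooling that reads such files. ``parse_id_list`` is a hypothetical helper, and it works equally for the memory node lists used by "cpuset.mems".

```python
def parse_id_list(spec):
    """Expand a list such as '0-4,6,8-10' into a sorted list of integers."""
    ids = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # an empty file yields an empty list
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low
        if high < low:
            raise ValueError(f"descending range: {part!r}")
        ids.update(range(low, high + 1))  # ranges are inclusive
    return sorted(ids)
```

For the example above, ``parse_id_list("0-4,6,8-10")`` gives ``[0, 1, 2, 3, 4, 6, 8, 9, 10]``.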
1980 setting as the nearest cgroup ancestor with a non-empty
1987 A read-only multiple values file which exists on all
1988 cpuset-enabled cgroups.
2004 A read-write multiple values file which exists on non-root
2005 cpuset-enabled cgroups.
2007 It lists the requested memory nodes to be used by tasks within
2008 this cgroup. The actual list of memory nodes granted, however,
2010 from the requested memory nodes.
2012 The memory node numbers are comma-separated numbers or ranges.
2016 0-1,3
2019 setting as the nearest cgroup ancestor with a non-empty
2020 "cpuset.mems" or all the available memory nodes if none
2024 and won't be affected by any memory nodes hotplug events.
2027 A read-only multiple values file which exists on all
2028 cpuset-enabled cgroups.
2030 It lists the onlined memory nodes that are actually granted to
2031 this cgroup by its parent. These memory nodes are allowed to
2034 If "cpuset.mems" is empty, it shows all the memory nodes from the
2037 the memory nodes listed in "cpuset.mems" can be granted. In this
2040 Its value will be affected by memory nodes hotplug events.
2043 A read-write single value file which exists on non-root
2044 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2049 "root" - a partition root
2050 "member" - a non-root member of a partition
2091 "member" Non-root member of a partition
2117 -----------------
2128 the attempt will succeed or fail with -EPERM.
2133 If the program returns 0, the attempt fails with -EPERM, otherwise
2141 ----
2150 A read-write nested-keyed file that exists for all the cgroups
2171 A read-only file that describes current resource usage.
2180 -------
2197 A read-only flat-keyed file which exists on non-root cgroups.
2208 ----
2219 Non-normative information
2220 -------------------------
2236 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2252 ------
2271 The path '/batchjobs/container_id1' can be considered as system-data
2276 # ls -l /proc/self/ns/cgroup
2277 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2283 # ls -l /proc/self/ns/cgroup
2284 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2288 When some thread from a multi-threaded process unshares its cgroup
2300 ------------------
2311 # ~/unshare -c # unshare cgroupns in some cgroup
2319 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2350 ----------------------
2379 ---------------------------------
2382 running inside a non-init cgroup namespace::
2384 # mount -t cgroup2 none $MOUNT_POINT
2391 the view of cgroup hierarchy by namespace-private cgroupfs mount
2400 controllers are not covered.
2404 --------------------------------
2407 address_space_operations->writepage[s]() to annotate bio's using the
2424 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2441 - Multiple hierarchies including named ones are not supported.
2443 - All v1 mount options are not supported.
2445 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2447 - "cgroup.clone_children" is removed.
2449 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2457 --------------------
2460 hierarchy could host any number of controllers. While this seemed to
2464 type controllers such as freezer which can be useful in all
2466 the fact that controllers couldn't be moved to another hierarchy once
2467 hierarchies were populated. Another issue was that all controllers
2472 In practice, these issues heavily limited which controllers could be
2475 as the cpu and cpuacct controllers, made sense to be put on the same
2483 used in general and what controllers were able to do.
2489 addition of controllers which existed only to identify membership,
2494 topologies of hierarchies other controllers might be on, each
2495 controller had to assume that all other controllers were attached to
2497 least very cumbersome, for controllers to cooperate with each other.
2499 In most use cases, putting controllers on hierarchies which are
2504 controllers. For example, a given configuration might not care about
2505 how memory is distributed beyond a certain level while still wanting
2510 ------------------
2513 This didn't make sense for some controllers and those controllers
2518 Generally, in-process knowledge is available only to the process
2519 itself; thus, unlike service-level organization of processes,
2526 sub-hierarchies and control resource distributions along them. This
2527 effectively raised cgroup to the status of a syscall-like API exposed
2537 that the process would actually be operating on its own sub-hierarchy.
2539 cgroup controllers implemented a number of knobs which would never be
2541 system-management pseudo filesystem. cgroup ended up with interface
2544 individual applications through the ill-defined delegation mechanism
2554 -------------------------------------------
2560 settle it. Different controllers did different things.
2565 cycles and the number of internal threads fluctuated - the ratios
2579 The memory controller didn't have a way to control what happened
2581 clearly defined. There were attempts to add ad-hoc behaviors and
2585 Multiple controllers struggled with internal tasks and came up with
2595 ----------------------
2599 was how an empty cgroup was notified - a userland helper binary was
2602 to in-kernel event delivery filtering mechanism further complicating
2606 controllers completely ignoring hierarchical organization and treating
2608 cgroup. Some controllers exposed a large amount of inconsistent
2611 There also was no consistency across controllers. When a new cgroup
2612 was created, some controllers defaulted to not imposing extra
2620 controllers so that they expose minimal and consistent interfaces.
2624 ------------------------------
2626 Memory subsection
2631 global reclaim prefers is opt-in, rather than opt-out. The costs for
2641 becomes self-defeating.
2643 The memory.low boundary on the other hand is a top-down allocated
2652 available memory. The memory consumption of workloads varies during
2660 The memory.high boundary on the other hand can be set much more
2666 and make corrections until the minimal memory footprint that still
2673 system than killing the group. Otherwise, memory.max is there to
2677 Setting the original memory.limit_in_bytes below the current usage was
2679 limit setting to fail. memory.max on the other hand will first set the
2681 new limit is met - or the task writing to memory.max is killed.
2683 The combined memory+swap accounting and limiting is replaced by real
2686 The main argument for a combined memory+swap facility in the original
2688 able to swap all anonymous memory of a child group, regardless of the
2690 groups can sabotage swapping by other means - such as referencing its
2691 anonymous memory in a tight loop - and an admin cannot assume full
2696 that cgroup controllers should account and limit specific physical