====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 1. Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs.  Linux has a variety of actionable objects, including:

        - Tasks
        - Files/inodes
        - Sockets
        - Message queues
        - Shared memory segments
        - Semaphores
        - Keys

     As a part of the description of all these objects there is a set of
     credentials.  What's in the set depends on the type of object.

 2. Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object.  This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 3. The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object.
     This may or may not be the same set as in (2) - in standard UNIX files,
     for instance, this is defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 4. Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system.  Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
     the file struct will have a subjective context too.

 5. The subjective context.

     A subject has an additional interpretation of its credentials.  A subset
     of its credentials forms the 'subjective context'.  The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 6. Actions.

     Linux has a number of actions available that a subject may perform upon an
     object.  The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking or
     signalling and tracing tasks.

 7. Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made.  This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     a. Discretionary access control (DAC):

         Sometimes the object will include sets of rules as part of its
         description.  This is an 'Access Control List' or 'ACL'.  A Linux
         file may supply more than one ACL.

         A traditional UNIX file, for example, includes a permissions mask that
         is an abbreviated ACL with three fixed classes of subject ('user',
         'group' and 'other'), each of which may be granted certain privileges
         ('read', 'write' and 'execute' - whatever those map to for the object
         in question).  UNIX file permissions do not allow the arbitrary
         specification of subjects, however, and so are of limited use.

         A Linux file might also sport a POSIX ACL.  This is a list of rules
         that grants various permissions to arbitrary subjects.

     b. Mandatory access control (MAC):

         The system as a whole may have one or more sets of rules that get
         applied to all subjects and objects, regardless of their source.
         SELinux and Smack are examples of this.

         In the case of SELinux and Smack, each object is given a label as part
         of its credentials.  When an action is requested, they take the
         subject label, the object label and the action and look for a rule
         that says that this action is either granted or denied.


Types of Credentials
====================

The Linux kernel supports the following types of credentials:

 1. Traditional UNIX credentials.

        - Real User ID
        - Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases it has to be invented (FAT or CIFS files for example, which are
     derived from Windows).  These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

        - Effective, Saved and FS User ID
        - Effective, Saved and FS Group ID
        - Supplementary groups

     These are additional credentials used by tasks only.
     Usually, an EUID/EGID/GROUPS will be used as the subjective context, and
     real UID/GID will be used as the objective.  For tasks, it should be noted
     that this is not always true.

 2. Capabilities.

        - Set of permitted capabilities
        - Set of inheritable capabilities
        - Set of effective capabilities
        - Capability bounding set

     These are only carried by tasks.  They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the ``capset()``
     system call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through ``capset()``.  The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     ``execve()``.

     The bounding set limits the capabilities that may be inherited across
     ``execve()``, especially when a binary is executed that will execute as
     UID 0.

 3. Secure management flags (securebits).

     These are only carried by tasks.
     These govern the way the above credentials are manipulated and inherited
     over certain operations such as ``execve()``.  They aren't used directly
     as objective or subjective credentials.

 4. Keys and keyrings.

     These are only carried by tasks.  They carry and cache security tokens
     that don't fit into the other standard UNIX credentials.  They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about security details involved.

     Keyrings are a special type of key.  They carry sets of other keys and can
     be searched for the desired key.  Each process may subscribe to a number
     of keyrings:

        - Per-thread keyring
        - Per-process keyring
        - Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see ``Documentation/security/keys/*``.

 5. LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do.  Currently Linux supports several LSM
     options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 6. AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created.  This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation.  An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.
Depending on the type of filesystem, this may include one or more of the
following:

 * UNIX UID, GID, mode;
 * Windows user ID;
 * Access control list;
 * LSM security label;
 * UNIX exec privilege escalation bits (SUID/SGID);
 * File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result.  In the case of ``execve()``, the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.


Task Credentials
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers (groups,
keys, LSM security).  Each task points to its credentials by a pointer called
'cred' in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 1. its reference count may be changed;

 2. the reference count on the group_info struct it points to may be changed;

 3. the reference count on the security data it points to may be changed;

 4. the reference count on any keyrings it points to may be changed;

 5. any keyrings it points to may be revoked, expired or have their security
    attributes changed; and

 6. the contents of any keyrings to which it points may be changed (the whole
    point of keyrings being a shared set of credentials, modifiable by anyone
    with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to.  First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy.  There are wrappers to aid
with this (see below).

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials.  This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process.  Also ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no
longer permit attachment to process-specific keyrings in the requesting
process as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

 1. The reference count may be altered.

 2. While the keyring subscriptions of a set of credentials may not be
    changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, which spares callers from casting, although internally they must
temporarily drop the const qualification to alter the reference count.


Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of
locking, which simplifies things greatly.  It can just call::

        const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

        uid_t current_uid(void)                 Current's real UID
        gid_t current_gid(void)                 Current's real GID
        uid_t current_euid(void)                Current's effective UID
        gid_t current_egid(void)                Current's effective GID
        uid_t current_fsuid(void)               Current's file access UID
        gid_t current_fsgid(void)               Current's file access GID
        kernel_cap_t current_cap(void)          Current's effective capabilities
        struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

        void current_uid_gid(uid_t *, gid_t *);
        void current_euid_egid(uid_t *, gid_t *);
        void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.
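
The pair-returning wrappers above have rough counterparts in userspace.  The
following is only an illustrative sketch, not kernel code: ``my_uid_gid()``,
``my_euid_egid()`` and ``ids_differ()`` are hypothetical names, built on the
POSIX calls ``getuid()``, ``getgid()``, ``geteuid()`` and ``getegid()``.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace analogue of current_uid_gid(): returns the real
 * UID/GID pair through its arguments.  Not part of any kernel API. */
void my_uid_gid(uid_t *uid, gid_t *gid)
{
        assert(uid && gid);
        *uid = getuid();        /* real UID */
        *gid = getgid();        /* real GID */
}

/* Hypothetical userspace analogue of current_euid_egid(). */
void my_euid_egid(uid_t *euid, gid_t *egid)
{
        assert(euid && egid);
        *euid = geteuid();      /* effective UID */
        *egid = getegid();      /* effective GID */
}

/* Returns 1 if the real and effective IDs differ, as they would for a
 * process started from a SUID or SGID executable, and 0 otherwise. */
int ids_differ(void)
{
        uid_t uid, euid;
        gid_t gid, egid;

        my_uid_gid(&uid, &gid);
        my_euid_egid(&euid, &egid);
        return uid != euid || gid != egid;
}
```

Unlike the kernel wrappers, each userspace call here is a separate system
call, so the values are not guaranteed to come from a single snapshot of the
credentials.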


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

        const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

        struct user_struct *get_current_user(void);
        struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

While a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials.  It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` is wrapped by::

        const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

        void foo(struct task_struct *t, struct foo_data *f)
        {
                const struct cred *tcred;
                ...
                rcu_read_lock();
                tcred = __task_cred(t);
                f->uid = tcred->uid;
                f->gid = tcred->gid;
                f->groups = get_group_info(tcred->groups);
                rcu_read_unlock();
                ...
        }

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep while doing so, then the caller should get a
reference on them using::

        const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it.  The caller must call ``put_cred()``
on the credentials so obtained when it has finished with them.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

        uid_t task_uid(task)            Task's real UID
        uid_t task_euid(task)           Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

        __task_cred(task)->uid
        __task_cred(task)->euid

should be used instead.
Similarly, if multiple aspects of a task's credentials need to be accessed,
the RCU read lock should be taken, ``__task_cred()`` called, the result stored
in a temporary pointer and the credential aspects then read from that pointer
before dropping the lock.  This prevents the potentially expensive RCU magic
from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

        task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.  For instance::

        uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic.  This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task.  This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

        struct cred *prepare_creds(void);

This locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.  It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
while security checks on credentials construction and changing are taking place
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done.  Both the current and the proposed sets of credentials
are available for this purpose as ``current_cred()`` will return the current
set still at this point.

When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership.  In practice, this means ``groups_sort()`` should be
called before ``set_groups()`` or ``set_current_groups()``.
``groups_sort()`` must not be called on a ``struct group_info`` which
is shared as it may permute elements as part of the sorting process
even if the array is already sorted.
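
The reason for the sort-before-install rule can be seen in a small userspace
model.  This is a sketch under stated assumptions, not the kernel
implementation: ``model_groups_sort()`` and ``model_in_group()`` are
hypothetical names, and the standard ``qsort()`` and ``bsearch()`` stand in
for the kernel's own sorting and searching routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison of two gid_t values for qsort() and bsearch(). */
int gid_cmp(const void *a, const void *b)
{
        gid_t x = *(const gid_t *)a;
        gid_t y = *(const gid_t *)b;

        return (x > y) - (x < y);
}

/* Model of groups_sort(): sorts the array in place, so the caller must be
 * the sole owner of it (never sort a list that is already shared). */
void model_groups_sort(gid_t *gids, size_t n)
{
        assert(gids != NULL);
        qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Model of a membership test: a binary search, which only gives correct
 * answers if the array has been sorted beforehand. */
int model_in_group(const gid_t *gids, size_t n, gid_t gid)
{
        assert(gids != NULL);
        return bsearch(&gid, gids, n, sizeof(*gids), gid_cmp) != NULL;
}
```

Skipping the sort step would not crash the lookup; it would silently report
some groups as absent, which is why the ordering is part of the contract.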

When the credential set is ready, it should be committed to the current process
by calling::

        int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

        void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.
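
The prepare/commit/abort flow can be modelled in plain userspace C to show the
copy-and-replace principle in isolation.  Everything in this sketch is
hypothetical (``struct model_cred``, ``model_current`` and the ``model_*()``
functions are not kernel names), and it deliberately leaves out the mutex,
RCU and reference counting that the real functions rely on.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct cred; the real structure has more fields. */
struct model_cred {
        unsigned int uid, euid, suid;
};

/* Stand-in for current->cred: published sets are only seen as const. */
const struct model_cred *model_current;

/* Installs the first credential set for the model task. */
int model_init(unsigned int uid)
{
        struct model_cred *c = malloc(sizeof(*c));

        if (!c)
                return -ENOMEM;
        c->uid = c->euid = c->suid = uid;
        model_current = c;
        return 0;
}

/* Model of prepare_creds(): returns a private, mutable copy or NULL. */
struct model_cred *model_prepare(void)
{
        struct model_cred *new = malloc(sizeof(*new));

        assert(model_current != NULL);
        if (new)
                memcpy(new, model_current, sizeof(*new));
        return new;
}

/* Model of commit_creds(): publishes the copy and consumes the caller's
 * reference.  The kernel uses rcu_assign_pointer() and defers the release. */
int model_commit(struct model_cred *new)
{
        const struct model_cred *old = model_current;

        model_current = new;
        free((void *)old);
        return 0;
}

/* Model of abort_creds(): discards a copy that was never published. */
void model_abort(struct model_cred *new)
{
        free(new);
}

/* Mirrors the shape of a typical alteration function.  The check against
 * (unsigned int)-1 is a made-up policy standing in for a security hook. */
int model_alter_suid(unsigned int suid)
{
        struct model_cred *new = model_prepare();

        if (!new)
                return -ENOMEM;
        new->suid = suid;
        if (suid == (unsigned int)-1) {
                model_abort(new);
                return -EINVAL;
        }
        return model_commit(new);
}
```

A failed alteration leaves the published set untouched because only the
private copy was ever modified.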


A typical credentials alteration function would look something like this::

        int alter_suid(uid_t suid)
        {
                struct cred *new;
                int ret;

                new = prepare_creds();
                if (!new)
                        return -ENOMEM;

                new->suid = suid;
                ret = security_alter_suid(new);
                if (ret < 0) {
                        abort_creds(new);
                        return ret;
                }

                return commit_creds(new);
        }


Managing Credentials
--------------------

There are some functions to help manage credentials:

 - ``void put_cred(const struct cred *cred);``

     This releases a reference to the given set of credentials.  If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 - ``const struct cred *get_cred(const struct cred *cred);``

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 - ``struct cred *get_new_cred(struct cred *cred);``

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.
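
The get/put pairing on const pointers can be illustrated with a userspace
model.  This is a minimal sketch with hypothetical names (``struct ref_cred``,
``ref_cred_get()``, ``ref_cred_put()``); it uses a plain ``int`` counter and an
immediate ``free()``, whereas the kernel uses an atomic count and defers
destruction through RCU.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a refcounted credential set. */
struct ref_cred {
        int usage;              /* reference count (atomic in the kernel) */
        unsigned int uid;
};

/* Test aid only: counts how many sets have been destroyed. */
int ref_cred_freed;

/* Allocates a set holding one reference, owned by the caller. */
struct ref_cred *ref_cred_alloc(unsigned int uid)
{
        struct ref_cred *c = malloc(sizeof(*c));

        if (c) {
                c->usage = 1;
                c->uid = uid;
        }
        return c;
}

/* Model of get_cred(): accepts a const pointer and drops the qualifier
 * internally, only in order to bump the count. */
const struct ref_cred *ref_cred_get(const struct ref_cred *cred)
{
        struct ref_cred *nonconst = (struct ref_cred *)cred;

        assert(nonconst->usage > 0);
        nonconst->usage++;
        return cred;
}

/* Model of put_cred(): drops one reference and destroys the set when the
 * last reference goes away. */
void ref_cred_put(const struct ref_cred *cred)
{
        struct ref_cred *nonconst = (struct ref_cred *)cred;

        assert(nonconst->usage > 0);
        if (--nonconst->usage == 0) {
                free(nonconst);
                ref_cred_freed++;
        }
}
```

Callers never need a cast of their own: the const qualifier is removed in
exactly two places, both inside the helpers.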


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).

To avoid "confused deputy" privilege escalation attacks, access control checks
during subsequent operations on an opened file should use these credentials
instead of "current"'s credentials, as the file may have been passed to a more
privileged process.

Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials.  This is done in the
following places:

 * ``sys_faccessat()``.
 * ``do_coredump()``.
 * nfs4recover.c.