====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 1. Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs.  Linux has a variety of actionable objects, including:

	- Tasks
	- Files/inodes
	- Sockets
	- Message queues
	- Shared memory segments
	- Semaphores
	- Keys

     As a part of the description of all these objects there is a set of
     credentials.  What's in the set depends on the type of object.

 2. Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object.  This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 3. The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object.  This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 4. Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system.  Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance, an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
     the file struct will have a subjective context too.

 5. The subjective context.

     A subject has an additional interpretation of its credentials.  A subset
     of its credentials forms the 'subjective context'.  The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 6. Actions.

     Linux has a number of actions available that a subject may perform upon an
     object.  The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files, and
     forking, signalling and tracing tasks.

 7. Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made.  This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     a. Discretionary access control (DAC):

	 Sometimes the object will include sets of rules as part of its
	 description.  This is an 'Access Control List' or 'ACL'.  A Linux
	 file may supply more than one ACL.

	 A traditional UNIX file, for example, includes a permissions mask that
	 is an abbreviated ACL with three fixed classes of subject ('user',
	 'group' and 'other'), each of which may be granted certain privileges
	 ('read', 'write' and 'execute' - whatever those map to for the object
	 in question).  UNIX file permissions do not allow the arbitrary
	 specification of subjects, however, and so are of limited use.

	 A Linux file might also sport a POSIX ACL.  This is a list of rules
	 that grants various permissions to arbitrary subjects.

     b. Mandatory access control (MAC):

	 The system as a whole may have one or more sets of rules that get
	 applied to all subjects and objects, regardless of their source.
	 SELinux and Smack are examples of this.

	 In the case of SELinux and Smack, each object is given a label as part
	 of its credentials.  When an action is requested, they take the
	 subject label, the object label and the action and look for a rule
	 that says that this action is either granted or denied.
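
As an illustration of the DAC case, the permissions-mask check described in
(7a) can be sketched in userspace C.  This is a simplified model and not
kernel code: ``struct subject_ctx``, ``struct object_ctx`` and
``dac_permits()`` are names invented for this example, and capabilities,
POSIX ACLs and LSM rules are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model types; these are not kernel structures. */
struct subject_ctx {
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;	/* supplementary groups */
	size_t ngroups;
};

struct object_ctx {
	uid_t uid;
	gid_t gid;
	mode_t mode;		/* low nine bits: rwxrwxrwx */
};

#define WANT_READ  4u
#define WANT_WRITE 2u
#define WANT_EXEC  1u

/* True if the subject's FSGID or any supplementary group matches gid. */
static bool in_group(const struct subject_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/* Pick exactly one class (user, group or other) and test its bits. */
static bool dac_permits(const struct subject_ctx *s,
			const struct object_ctx *o, unsigned int want)
{
	unsigned int bits;

	assert((want & ~7u) == 0);
	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7;
	else if (in_group(s, o->gid))
		bits = (o->mode >> 3) & 7;
	else
		bits = o->mode & 7;
	return (bits & want) == want;
}
```

In this model a subject whose FSUID matches the owner of a mode 0640 object is
granted read and write but refused execute, while a subject that matches
neither the owner nor the group falls through to the 'other' bits.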


Types of Credentials
====================

The Linux kernel supports the following types of credentials:

 1. Traditional UNIX credentials.

	- Real User ID
	- Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases they have to be invented (FAT or CIFS files for example, which
     are derived from Windows).  These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

	- Effective, Saved and FS User ID
	- Effective, Saved and FS Group ID
	- Supplementary groups

     These are additional credentials used by tasks only.  Usually, the
     EUID/EGID/GROUPS will be used as the subjective context, and the real
     UID/GID will be used as the objective context.  For tasks, it should be
     noted that this is not always true.

 2. Capabilities.

	- Set of permitted capabilities
	- Set of inheritable capabilities
	- Set of effective capabilities
	- Capability bounding set

     These are only carried by tasks.  They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the ``capset()``
     system call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through ``capset()``.  The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     ``execve()``.

     The bounding set limits the capabilities that may be inherited across
     ``execve()``, especially when a binary is executed that will execute as
     UID 0.

 3. Secure management flags (securebits).

     These are only carried by tasks.  They govern the way the above
     credentials are manipulated and inherited over certain operations such as
     ``execve()``.  They aren't used directly as objective or subjective
     credentials.

 4. Keys and keyrings.

     These are only carried by tasks.  They carry and cache security tokens
     that don't fit into the other standard UNIX credentials.  They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about the security details involved.

     Keyrings are a special type of key.  They carry sets of other keys and can
     be searched for the desired key.  Each process may subscribe to a number
     of keyrings:

	Per-thread keyring
	Per-process keyring
	Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see ``Documentation/security/keys/*``.

 5. LSM

     The Linux Security Module framework allows extra controls to be placed
     over the operations that a task may do.  Currently Linux supports several
     LSM options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 6. AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.
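
The relationship between the permitted and effective capability sets can be
sketched as a small userspace model.  This is an illustration under stated
assumptions only: ``struct cap_sets`` and ``model_capset()`` are invented
names, each capability is reduced to one bit of a 64-bit mask, and the rules
governing the inheritable and bounding sets are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per capability, not the kernel's own type. */
struct cap_sets {
	uint64_t permitted;
	uint64_t effective;
	uint64_t inheritable;	/* carried along, but not checked here */
};

static bool is_subset(uint64_t a, uint64_t set)
{
	return (a & ~set) == 0;
}

/*
 * Simplified capset()-style update: apply 'want' to 'cur' only if the new
 * permitted set does not grow and the new effective set stays within the
 * new permitted set.  On refusal 'cur' is left untouched.
 */
static bool model_capset(struct cap_sets *cur, const struct cap_sets *want)
{
	assert(cur && want);
	if (!is_subset(want->permitted, cur->permitted))
		return false;
	if (!is_subset(want->effective, want->permitted))
		return false;
	cur->permitted = want->permitted;
	cur->effective = want->effective;
	return true;
}
```

Under this model a task can drop a capability from its permitted set but can
never get it back, and it cannot make effective anything that is not also
permitted.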


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created.  This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation.  An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.  Depending on the type of filesystem,
this may include one or more of the following:

 * UNIX UID, GID, mode;
 * Windows user ID;
 * Access control list;
 * LSM security label;
 * UNIX exec privilege escalation bits (SUID/SGID);
 * File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations are allowed or disallowed as a result.  In the case of
``execve()``, the privilege escalation bits come into play, and may allow the
resulting process extra privileges, based on the annotations on the executable
file.
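
The UNIX markings in the list above can be inspected from userspace with
``stat(2)``.  The following is a minimal sketch; ``struct file_markings`` and
``read_markings()`` are names made up for this example, and ACLs, LSM labels
and file capabilities are not covered because they are stored separately from
the mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of the UNIX markings on one file. */
struct file_markings {
	uid_t uid;
	gid_t gid;
	mode_t perms;	/* permission bits only */
	bool suid;	/* SUID exec privilege escalation bit */
	bool sgid;	/* SGID exec privilege escalation bit */
};

/* Returns 0 on success, -1 if stat() fails (errno is set by stat()). */
static int read_markings(const char *path, struct file_markings *m)
{
	struct stat st;

	assert(path && m);
	if (stat(path, &st) != 0)
		return -1;
	m->uid = st.st_uid;
	m->gid = st.st_gid;
	m->perms = st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
	m->suid = (st.st_mode & S_ISUID) != 0;
	m->sgid = (st.st_mode & S_ISGID) != 0;
	return 0;
}
```

Calling ``read_markings()`` on an executable shows whether ``execve()`` of
that file could raise privileges through the SUID or SGID bits.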


Task Credentials
================

In Linux, all of a task's credentials are held either directly in (uid, gid)
or by reference from (groups, keys, LSM security) a refcounted structure of
type ``struct cred``.  Each task points to its credentials through a pointer
called ``cred`` in its ``task_struct``.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 1. its reference count may be changed;

 2. the reference count on the group_info struct it points to may be changed;

 3. the reference count on the security data it points to may be changed;

 4. the reference count on any keyrings it points to may be changed;

 5. any keyrings it points to may be revoked, expired or have their security
    attributes changed; and

 6. the contents of any keyrings to which it points may be changed (the whole
    point of keyrings being a shared set of credentials, modifiable by anyone
    with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to.  First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy.  There are wrappers to aid
with this (see below).
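
The copy-and-replace principle can be sketched with a single-threaded
userspace model.  Everything here is hypothetical: ``struct cred_model``,
``struct task_model`` and the ``model_*()`` helpers are stand-ins invented for
the example, a plain pointer store takes the place of the RCU update, and the
reference count is an ordinary ``int`` rather than an atomic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct cred and task_struct. */
struct cred_model {
	int usage;		/* reference count */
	unsigned int uid, suid;
};

struct task_model {
	const struct cred_model *cred;	/* published creds are read-only */
};

/* Drop a reference; only the count is modified through the const pointer. */
static void model_put(const struct cred_model *c)
{
	struct cred_model *m = (struct cred_model *)c;

	if (--m->usage == 0)
		free(m);
}

/* Step 1: take a private, writable copy of the task's credentials. */
static struct cred_model *model_prepare(const struct task_model *t)
{
	struct cred_model *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	memcpy(new, t->cred, sizeof(*new));
	new->usage = 1;
	return new;
}

/* Step 3: publish the copy and drop the task's reference to the old set. */
static int model_commit(struct task_model *t, struct cred_model *new)
{
	const struct cred_model *old = t->cred;

	t->cred = new;	/* a real implementation would need an RCU-safe store */
	model_put(old);
	return 0;
}

/* Error path: discard a copy that was never published. */
static void model_abort(struct cred_model *new)
{
	assert(new->usage == 1);
	model_put(new);
}
```

Step 2, altering the copy, happens between ``model_prepare()`` and
``model_commit()``.  Anyone still holding a reference to the old set keeps
seeing the old values, which is the property the principle is meant to
provide.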

A task may only alter its *own* credentials; it is no longer permitted for a
task to alter another's credentials.  This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

 1. The reference count may be altered.

 2. While the keyring subscriptions of a set of credentials may not be
    changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has *const* pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, thus rendering casts unnecessary for callers, but they must
temporarily cast away the const qualification internally to be able to alter
the reference count.


Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
-- which simplifies things greatly.  It can just call::

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

	uid_t current_uid(void)		Current's real UID
	gid_t current_gid(void)		Current's real GID
	uid_t current_euid(void)	Current's effective UID
	gid_t current_egid(void)	Current's effective GID
	uid_t current_fsuid(void)	Current's file access UID
	gid_t current_fsgid(void)	Current's file access GID
	kernel_cap_t current_cap(void)	Current's effective capabilities
	struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

While a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials.  It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` is wrapped by::

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		tcred = __task_cred(t);
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep while doing so, then the caller should get a
reference on them using::

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it.  The caller must call ``put_cred()``
on the credentials so obtained when it has finished with them.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

	uid_t task_uid(task)		Task's real UID
	uid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead.  Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and then the credential
aspects read through that pointer before dropping the lock.  This prevents the
potentially expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.  For instance::

	uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic.  This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task.  This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

	struct cred *prepare_creds(void);

This locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.  It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
while security checks on credentials construction and changing are taking place
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done.  Both the current and the proposed sets of credentials
are available for this purpose as ``current_cred()`` will still return the
current set at this point.

When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership.  In practice, this means ``groups_sort()`` should be
called before ``set_groups()`` or ``set_current_groups()``.
``groups_sort()`` must not be called on a ``struct group_info`` which
is shared as it may permute elements as part of the sorting process
even if the array is already sorted.
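
The reason for the sort-before-publish rule can be illustrated in userspace
with the standard ``qsort()`` and ``bsearch()`` routines.  This is only an
analogy: ``model_groups_sort()`` and ``model_in_groups()`` are invented names,
and nothing is implied about which sorting algorithm ``groups_sort()`` itself
uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison that avoids overflow on unsigned gid_t values. */
static int gid_cmp(const void *a, const void *b)
{
	gid_t x = *(const gid_t *)a, y = *(const gid_t *)b;

	return (x > y) - (x < y);
}

/* Sort a private (unshared) group list in place. */
static void model_groups_sort(gid_t *gids, size_t n)
{
	assert(gids != NULL || n == 0);
	if (n > 1)
		qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Membership test by binary search; only valid on a sorted list. */
static bool model_in_groups(const gid_t *gids, size_t n, gid_t gid)
{
	if (n == 0)
		return false;
	return bsearch(&gid, gids, n, sizeof(*gids), gid_cmp) != NULL;
}
```

If the list were published unsorted, ``model_in_groups()`` could miss a group
that is actually present, which is the failure the rule above guards against.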

When the credential set is ready, it should be committed to the current process
by calling::

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should *not* call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may *not* be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

	void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.


A typical credentials alteration function would look something like this::

	int alter_suid(uid_t suid)
	{
		struct cred *new;
		int ret;

		/* Take a private, modifiable copy of current's credentials. */
		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		new->suid = suid;
		/* Let the security checks inspect the proposed set. */
		ret = security_alter_suid(new);
		if (ret < 0) {
			/* Discard the copy; current's credentials are unchanged. */
			abort_creds(new);
			return ret;
		}

		/* Publish the new set; this consumes the reference on 'new'. */
		return commit_creds(new);
	}


Managing Credentials
--------------------

There are some functions to help manage credentials:

 - ``void put_cred(const struct cred *cred);``

     This releases a reference to the given set of credentials.  If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 - ``const struct cred *get_cred(const struct cred *cred);``

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 - ``struct cred *get_new_cred(struct cred *cred);``

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).

To avoid "confused deputy" privilege escalation attacks, access control checks
during subsequent operations on an opened file should use these credentials
instead of "current"'s credentials, as the file may have been passed to a more
privileged process.
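
The confused-deputy rule can be shown with a deliberately small userspace
model.  The types and the ownership-only check are assumptions made for the
example: ``struct cred_model``, ``struct file_model`` and
``model_may_write()`` do not exist in the kernel, and a real check involves
far more than comparing two UIDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the fields the example needs. */
struct cred_model {
	unsigned int fsuid;
};

struct file_model {
	const struct cred_model *f_cred;	/* opener's credentials */
	unsigned int owner_uid;			/* UID marked on the object */
};

/*
 * Decide using the credentials captured at open time.  The identity of
 * whoever is calling now is deliberately not an input to the decision.
 */
static bool model_may_write(const struct file_model *f)
{
	assert(f && f->f_cred);
	return f->f_cred->fsuid == f->owner_uid;
}
```

If an unprivileged task opens the file and then hands the descriptor to a
process running as UID 0, the answer from ``model_may_write()`` does not
change, because the opener's credentials travel with the file.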

Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials.  This is done in the
following places:

 * ``sys_faccessat()``.
 * ``do_coredump()``.
 * nfs4recover.c.
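
From userspace, the effect of the ``sys_faccessat()`` case is visible through
``faccessat(2)``: by default the check is made against the caller's real UID
and GID, and the ``AT_EACCESS`` flag asks for the effective IDs instead.  The
wrappers below are a sketch with invented names (``accessible_as_real()`` and
``accessible_as_effective()``); the library call and flag are the standard
ones.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Check 'mode' (F_OK, R_OK, W_OK, X_OK) using the real UID and GID. */
static bool accessible_as_real(const char *path, int mode)
{
	assert(path);
	return faccessat(AT_FDCWD, path, mode, 0) == 0;
}

/* Same check, but made with the effective UID and GID. */
static bool accessible_as_effective(const char *path, int mode)
{
	assert(path);
	return faccessat(AT_FDCWD, path, mode, AT_EACCESS) == 0;
}
```

The two wrappers can only disagree when the real and effective IDs differ,
for instance inside a SUID program.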