.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.  This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently.  A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways: programmatically with
padata_set_cpumask() or via sysfs.  The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks.  For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.
Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks.  (Each pair consists of a parallel and a serial
cpumask.)  The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above.  The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses.  So it is
legal to supply a cpumask to padata that contains offline CPUs.  Once an
offline CPU in the user-supplied cpumask comes online, padata will start using
it.

Changing the CPU masks is an expensive operation, so it should not be done with
great frequency.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided.  Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if not, the cb_cpu pointer
is updated to point to the CPU actually chosen).  The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress.  -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.
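
As a rough illustration of the embedding described above, the following
userspace sketch uses a stand-in struct padata_priv that carries only the two
callbacks, and a hypothetical struct demo_request that embeds it.  The names
demo_request, demo_parallel(), demo_serial() and demo_run() are invented for
this example, the container_of() macro is a simplified userspace version, and
demo_run() merely calls the callbacks directly where real code would go
through padata_do_parallel() and padata_do_serial()::

    #include <stddef.h>

    /* Simplified userspace stand-in for the kernel's container_of(). */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Stand-in for the real structure: only the two callbacks are modelled. */
    struct padata_priv {
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

    /* Hypothetical job: padata_priv embedded in a work-specific structure. */
    struct demo_request {
        int input;
        int output;
        int completed;
        struct padata_priv padata;
    };

    /* The parallel step recovers the enclosing request and does the work. */
    void demo_parallel(struct padata_priv *padata)
    {
        struct demo_request *req =
            container_of(padata, struct demo_request, padata);

        req->output = req->input * 2;
    }

    /* The serial step marks the request as finished. */
    void demo_serial(struct padata_priv *padata)
    {
        struct demo_request *req =
            container_of(padata, struct demo_request, padata);

        req->completed = 1;
    }

    /* Zero the job, fill in the callbacks, then invoke them in order. */
    int demo_run(int input)
    {
        struct demo_request req = { 0 };

        req.input = input;
        req.padata.parallel = demo_parallel;
        req.padata.serial = demo_serial;

        req.padata.parallel(&req.padata);
        req.padata.serial(&req.padata);

        return req.completed ? req.output : -1;
    }

Zero-initializing the whole request before filling in the callbacks mirrors
the requirement that the embedded structure be zeroed at initialisation time.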

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs.  parallel() runs with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point.  The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished.  padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job.  First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.  This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread.
Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any.  Prepare the shared state, which is
typically allocated on the main thread's stack.  Last, call
padata_do_multithreaded(), which will return once the job is finished.

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c
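
As an illustration of the thread function described under Running
Multithreaded Jobs, the following userspace sketch defines one that sums a
slice of an array into shared state kept on the caller's stack.  The names
demo_sum_state, demo_sum_chunk() and demo_sum_all() are invented for this
example, and treating the range as half-open, [start, end), is an assumption.
demo_sum_all() is only a serial stand-in that hands out fixed-size chunks one
after another; it does not model padata's threads or load balancing, and a
real multithreaded job would have to synchronize updates to shared state::

    /* Hypothetical shared state for the job. */
    struct demo_sum_state {
        const int *values;
        long total;
    };

    /* Thread function shape from the text: process one chunk of the job. */
    void demo_sum_chunk(unsigned long start, unsigned long end, void *arg)
    {
        struct demo_sum_state *state = arg;
        unsigned long i;

        for (i = start; i < end; i++)
            state->total += state->values[i];
    }

    /* Serial stand-in: call the thread function once per chunk. */
    long demo_sum_all(const int *values, unsigned long count,
                      unsigned long chunk)
    {
        struct demo_sum_state state = { values, 0 };
        unsigned long start, end;

        if (chunk == 0)
            chunk = 1;

        for (start = 0; start < count; start = end) {
            /* The last chunk may be shorter than the others. */
            end = (count - start < chunk) ? count : start + chunk;
            demo_sum_chunk(start, end, &state);
        }

        return state.total;
    }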