# RK3588 NPU SRAM Usage Guide

* The RK3588 SoC contains 1MB of internal SRAM, of which 956KB is available to the IP blocks on the SoC. Assigning part or all of it to the RKNPU is supported.
* SRAM helps RKNPU applications relieve DDR bandwidth pressure. SRAM can currently be assigned to two memory types: Internal and Weight.

---
I. Board Environment Requirements
---
1. Kernel requirements
* RKNPU driver version >= 0.8.0
* The kernel config must enable CONFIG_ROCKCHIP_RKNPU_SRAM=y
    * Config path on Android:
    ```shell
    <path-to-your-kernel>/arch/arm64/configs/rockchip_defconfig
    ```
    * Config path on Linux:
    ```shell
    <path-to-your-kernel>/arch/arm64/configs/rockchip_linux_defconfig
    ```
* The kernel DTS must assign a region of the system SRAM to the RKNPU
    * Assign as much SRAM as the RKNPU needs, up to a maximum of 956KB; the size must be 4K-aligned
    * Note: the system may already assign SRAM to other IP blocks by default, for example the video codec modules. The SRAM regions assigned to different IP blocks must not overlap, otherwise concurrent reads and writes will corrupt data
    * The following example assigns all 956KB to the RKNPU:
    ```dts
    syssram: sram@ff001000 {
        compatible = "mmio-sram";
        reg = <0x0 0xff001000 0x0 0xef000>;

        #address-cells = <1>;
        #size-cells = <1>;
        ranges = <0x0 0x0 0xff001000 0xef000>;
        /* Assign SRAM to the RKNPU */
        /* start address and size should be 4K aligned */
        rknpu_sram: rknpu_sram@0 {
            reg = <0x0 0xef000>; // 956KB
        };
    };
    ```
    * Attach the assigned SRAM to the RKNPU node by editing the following dtsi file:
    ```shell
    <path-to-your-kernel>/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
    ```
    ```dts
    rknpu: npu@fdab0000 {
        compatible = "rockchip,rk3588-rknpu";
        /* ... */
        /* Add a reference to the RKNPU SRAM region */
        rockchip,sram = <&rknpu_sram>;
        status = "disabled";
    };
    ```
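The size and alignment rules for the `rknpu_sram` `reg` values can be checked with shell arithmetic before the device tree is edited. The following is a minimal sketch only: the helper name `check_sram_reg` is hypothetical and not part of the kernel or the RKNN SDK, and `SRAM_TOTAL` is taken from the `0xef000` window in the `syssram` example. It does not check for overlap with regions assigned to other IP blocks.

```shell
# Hypothetical helper: validate an <offset size> pair for an rknpu_sram node.
# SRAM_TOTAL is the 956KB window from the syssram example (0xef000 bytes).
SRAM_TOTAL=$((0xef000))
PAGE=4096

check_sram_reg() {
    offset=$(($1))
    size=$(($2))
    # Both the start offset and the size must be multiples of 4K.
    if [ $((offset % PAGE)) -ne 0 ] || [ $((size % PAGE)) -ne 0 ]; then
        echo "not 4K aligned"
        return 1
    fi
    # The region must be non-empty and end inside the 956KB window.
    if [ "$size" -eq 0 ] || [ $((offset + size)) -gt "$SRAM_TOTAL" ]; then
        echo "out of range"
        return 1
    fi
    echo "ok: $((size / 1024))KB at offset $offset"
}

check_sram_reg 0x0 0xef000   # the full 956KB example
check_sram_reg 0x0 0x80000   # only the first 512KB
```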

2. RKNN SDK version requirements
* RKNPU Runtime library (librknnrt.so) version >= 1.3.4b14

---
II. Usage
---
1. Assigning SRAM to Internal memory:
* Automatic size: the runtime tries to allocate enough of the remaining system SRAM for Internal
    * **export RKNN_INTERNAL_MEM_TYPE=sram**
* Fixed size: the runtime tries to allocate the specified amount of SRAM for Internal, 256KB in this example
    * **export RKNN_INTERNAL_MEM_TYPE=sram#256**

2. Assigning SRAM to Weight memory:
* Automatic size: the runtime tries to allocate enough of the remaining system SRAM for Weight
    * **export RKNN_SEPARATE_WEIGHT_MEM=1**
    * **export RKNN_WEIGHT_MEM_TYPE=sram**
* Fixed size: the runtime tries to allocate the specified amount of SRAM for Weight, 128KB in this example
    * **export RKNN_SEPARATE_WEIGHT_MEM=1**
    * **export RKNN_WEIGHT_MEM_TYPE=sram#128**

3. Combined assignment
* The RKNPU driver manages SRAM allocation, so SRAM can be assigned to Internal and Weight at the same time:
    * **export RKNN_INTERNAL_MEM_TYPE=sram#256**
    * **export RKNN_SEPARATE_WEIGHT_MEM=1**
    * **export RKNN_WEIGHT_MEM_TYPE=sram#128**

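The settings above can be scoped to a single run instead of being exported into the whole shell session. The following is a sketch under assumptions: `run_with_sram` and `sram_type` are hypothetical helper names, and only the three environment variables documented above are used. Each size argument is a number of KB, `auto` for automatic sizing, or `none` to leave that memory type in DDR. On the board, the demo command would be replaced by the RKNN application to launch.

```shell
# Hypothetical wrapper: run one command with SRAM assigned to Internal and/or Weight.
# Usage: run_with_sram <internal_kb|auto|none> <weight_kb|auto|none> <command...>
sram_type() {
    case "$1" in
        auto) echo "sram" ;;      # automatic size
        *)    echo "sram#$1" ;;   # fixed size in KB
    esac
}

run_with_sram() {
    internal=$1
    weight=$2
    shift 2
    (
        # The subshell keeps the exports from leaking into the caller.
        if [ "$internal" != "none" ]; then
            export RKNN_INTERNAL_MEM_TYPE="$(sram_type "$internal")"
        fi
        if [ "$weight" != "none" ]; then
            export RKNN_SEPARATE_WEIGHT_MEM=1
            export RKNN_WEIGHT_MEM_TYPE="$(sram_type "$weight")"
        fi
        "$@"
    )
}

# Demo: print what the child process sees for the combined 256KB/128KB case.
run_with_sram 256 128 sh -c \
    'echo "$RKNN_INTERNAL_MEM_TYPE $RKNN_SEPARATE_WEIGHT_MEM $RKNN_WEIGHT_MEM_TYPE"'
```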
---
III. Debugging
---
1. Checking whether SRAM is enabled
* The boot log shows whether SRAM is enabled, including the address range and size of the SRAM assigned to the RKNPU:
```shell
rk3588_s:/ # dmesg | grep rknpu -i
RKNPU fdab0000.npu: RKNPU: sram region: [0x00000000ff001000, 0x00000000ff0f0000), sram size: 0xef000
```
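The reported region is half-open, so the end address minus the start address should equal the reported size (here 0xff0f0000 - 0xff001000 = 0xef000, which is 956KB). A small sketch that checks this for a log line in the format shown above; `check_sram_region` is a hypothetical helper name, and the pattern assumes the driver prints exactly this message. On the board it could be fed with `dmesg | grep -i rknpu | check_sram_region`.

```shell
# Hypothetical helper: read RKNPU log lines on stdin and check that
# end - start matches the reported SRAM size.
check_sram_region() {
    sed -n 's/.*sram region: \[\(0x[0-9a-fA-F]*\), \(0x[0-9a-fA-F]*\)), sram size: \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p' |
    while read -r start end size; do
        if [ $((end - start)) -eq $((size)) ]; then
            echo "consistent: $(( (end - start) / 1024 ))KB"
        else
            echo "mismatch"
        fi
    done
}

# Demo with the log line from this document.
echo 'RKNPU fdab0000.npu: RKNPU: sram region: [0x00000000ff001000, 0x00000000ff0f0000), sram size: 0xef000' |
    check_sram_region
```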

2. Checking SRAM usage
* SRAM usage can be read from a debugfs node
* The bitmap below shows the SRAM with nothing allocated; each character represents 4KB
```shell
rk3588_s:/ # cat /sys/kernel/debug/rknpu/mm
SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
[000] [................................]
[001] [................................]
[002] [................................]
[003] [................................]
[004] [................................]
[005] [................................]
[006] [................................]
[007] [...............]
SRAM total size: 978944, used: 0, free: 978944
```
* The bitmap below shows the SRAM after 512KB has been allocated
```shell
rk3588_s:/ # cat /sys/kernel/debug/rknpu/mm
SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
[000] [********************************]
[001] [********************************]
[002] [********************************]
[003] [********************************]
[004] [................................]
[005] [................................]
[006] [................................]
[007] [...............]
SRAM total size: 978944, used: 524288, free: 454656
```
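Because each bitmap character stands for 4KB, the summary line can be recomputed from the rows: 239 characters give 978944 bytes in total, and the 128 `*` characters in the second dump give 524288 bytes used. The sketch below does that counting with awk. `sram_bitmap_summary` is a hypothetical helper name, and it assumes the row format shown above. On the board it could be fed with `cat /sys/kernel/debug/rknpu/mm | sram_bitmap_summary`.

```shell
# Hypothetical helper: recompute total/used/free bytes from the bitmap rows
# read on stdin (one character = 4KB).
sram_bitmap_summary() {
    awk '
        /^\[[0-9]+\] \[/ {
            row = $2                      # second field is the "[....]" bitmap
            used += gsub(/\*/, "", row)   # gsub returns the number of matches
            free += gsub(/\./, "", row)
        }
        END {
            printf "total: %d, used: %d, free: %d\n",
                   (used + free) * 4096, used * 4096, free * 4096
        }
    '
}

# Demo with a shortened, made-up bitmap: 2 used and 2 free blocks.
printf '%s\n' 'SRAM bitmap: "*" - used, "." - free (1bit = 4KB)' '[000] [**..]' |
    sram_bitmap_summary
```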

3. Querying the SRAM size through the RKNN API
* Query the SRAM size with rknn_query using the RKNN_QUERY_MEM_SIZE command
```C++
typedef struct _rknn_mem_size {
    uint32_t total_weight_size;
    uint32_t total_internal_size;
    uint64_t total_dma_allocated_size;
    uint32_t total_sram_size;
    uint32_t free_sram_size;
    uint32_t reserved[10];
} rknn_mem_size;
```
* total_sram_size is the total amount of SRAM the system has assigned to the RKNPU
* free_sram_size is the amount of SRAM the RKNPU can still use

4. Viewing a network's SRAM usage
* On the board, set the following environment variable before running the RKNN application to print the predicted SRAM usage:
```shell
export RKNN_LOG_LEVEL=3
```
* The log below shows the per-layer SRAM usage when SRAM is assigned to Internal:
```shell
---------------------------------------------------------------------------
Total allocated Internal SRAM Size: 524288, Addr: [0xff3e0000, 0xff460000)
---------------------------------------------------------------------------
---------------------------------------------------------------------+----------------------------------+-----------
ID  User           Tensor   DataType OrigShape      NativeShape      |     [Start       End)       Size |    SramHit
---------------------------------------------------------------------+----------------------------------+-----------
1   ConvRelu       input0   INT8     (1,3,224,224)  (1,1,224,224,3)  | 0xff3b0000 0xff3d4c00 0x00024c00 | \
2   ConvRelu       output2  INT8     (1,32,112,112) (1,2,112,112,16) | 0xff404c00 0xff466c00 0x00062000 | 0x0005b400
3   ConvRelu       output4  INT8     (1,32,112,112) (1,4,112,112,16) | 0xff466c00 0xff52ac00 0x000c4000 | 0x00000000
4   ConvRelu       output6  INT8     (1,64,112,112) (1,4,112,112,16) | 0xff52ac00*0xff5eec00 0x000c4000 | 0x00000000
5   ConvRelu       output8  INT8     (1,64,56,56)   (1,4,56,56,16)   | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
6   ConvRelu       output10 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff411000 0xff473000 0x00062000 | 0x0004f000
7   ConvRelu       output12 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff473000 0xff4d5000 0x00062000 | 0x00000000
8   ConvRelu       output14 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff3e0000 0xff442000 0x00062000 | 0x00062000
9   ConvRelu       output16 INT8     (1,128,28,28)  (1,8,28,28,16)   | 0xff442000 0xff45a800 0x00018800 | 0x00018800
10  ConvRelu       output18 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
11  ConvRelu       output20 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff411000 0xff442000 0x00031000 | 0x00031000
12  ConvRelu       output22 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
13  ConvRelu       output24 INT8     (1,256,14,14)  (1,16,14,14,16)  | 0xff411000 0xff41d400 0x0000c400 | 0x0000c400
14  ConvRelu       output26 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
15  ConvRelu       output28 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
16  ConvRelu       output30 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
17  ConvRelu       output32 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
18  ConvRelu       output34 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
19  ConvRelu       output36 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
20  ConvRelu       output38 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
21  ConvRelu       output40 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
22  ConvRelu       output42 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
23  ConvRelu       output44 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
24  ConvRelu       output46 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
25  ConvRelu       output48 INT8     (1,512,7,7)    (1,33,7,7,16)    | 0xff3f8800 0xff3ff000 0x00006800 | 0x00006800
26  ConvRelu       output50 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3e0000 0xff3ed000 0x0000d000 | 0x0000d000
27  ConvRelu       output52 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3ed000 0xff3fa000 0x0000d000 | 0x0000d000
28  AveragePool    output54 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3e0000 0xff3ed000 0x0000d000 | 0x0000d000
29  Conv           output55 INT8     (1,1024,1,1)   (1,64,1,1,16)    | 0xff3ed000 0xff3ed400 0x00000400 | 0x00000400
30  Softmax        output56 INT8     (1,1000,1,1)   (1,64,1,1,16)    | 0xff3e0000 0xff3e0400 0x00000400 | 0x00000400
31  OutputOperator output57 FLOAT    (1,1000,1,1)   (1,1000,1,1)     | 0xff3ae000 0xff3aefa0 0x00000fa0 | \
---------------------------------------------------------------------+----------------------------------+-----------
----------------------------------------
Total Weight Memory Size: 4260864
Total Internal Memory Size: 2157568
Predict Internal Memory RW Amount: 11068320
Predict Weight Memory RW Amount: 4260832
Predict SRAM Hit RW Amount: 6688768
----------------------------------------
```
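* In the table above, SramHit is the amount of SRAM occupied by that layer's tensor. In general, each hit saves one read plus one write of that size in DDR bandwidth
* Predict SRAM Hit RW Amount is the predicted SRAM read/write total for the whole network, which approximates the DDR bandwidth saved per frame
* Note: on Linux the log is written to the terminal; on Android it is written to logcat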
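The relationship between the SramHit column and Predict SRAM Hit RW Amount can be checked by hand. In the sample log the SramHit values sum to 3344384 bytes, and counting one read plus one write gives 2 × 3344384 = 6688768, which matches the reported figure. The sketch below repeats that calculation for any log in the same format. `sram_hit_rw` is a hypothetical helper name, and the "read plus write" doubling is an interpretation of the note above, not a documented formula.

```shell
# Hypothetical helper: read an RKNN_LOG_LEVEL=3 log on stdin, sum the SramHit
# column, and double it (one read plus one write per byte that hits SRAM).
sram_hit_rw() {
    total=0
    # Table rows have three "|"-separated parts; the last one is SramHit.
    # Rows whose SramHit is "\" (not placed in SRAM) are skipped.
    for hit in $(awk -F'|' 'NF == 3 && $3 ~ /0x[0-9a-fA-F]+/ { gsub(/ /, "", $3); print $3 }'); do
        total=$((total + hit))
    done
    echo $((total * 2))
}

# Demo with three rows in the same layout as the sample log (middle columns shortened).
sram_hit_rw <<'EOF'
1   ConvRelu  input0  | 0xff3b0000 0xff3d4c00 0x00024c00 | \
2   ConvRelu  output2 | 0xff404c00 0xff466c00 0x00062000 | 0x0005b400
5   ConvRelu  output8 | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
EOF
```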