xref: /OK3568_Linux_fs/external/rknpu2/README.md (revision 4882a59341e53eb6f0b4789bf948001014eff981)
## RKNPU2

RKNPU2 provides an advanced interface to access the Rockchip NPU.

## Supported Platforms

- RK3566/RK3568
- RK3588/RK3588S
- RV1103/RV1106
- RK3562

Note: RKNN models must be generated with RKNN Toolkit 2: https://github.com/rockchip-linux/rknn-toolkit2

**For RK1808/RV1109/RV1126/RK3399Pro, please use:**

- https://github.com/rockchip-linux/rknn-toolkit
- https://github.com/rockchip-linux/rknpu
- https://github.com/airockchip/RK3399Pro_npu

## Release Log

### 1.5.0

- Support RK3562
- Support more fused NPU operators, such as Conv-SiLU, Conv-Swish, Conv-HardSwish, Conv-Sigmoid, and Conv-Gelu
- Improve support for the NHWC output layout
- RK3568/RK3588: raise the maximum input resolution to 8192
- Improve support for Swish/DataConvert/Softmax/LSTM/LayerNorm/Gather/Transpose/Mul/MaxPool/Sigmoid/Pad
- Improve support for CPU operators (Cast, Sin, Cos, RMSNorm, ScatterND, GRU)
- Limited support for dynamic input resolution
- Provide a MATMUL API
- Add an RV1103/RV1106 rknn_server application as a proxy between PC and board
- Add more examples, such as rknn_dynamic_shape_input_demo and a video demo for yolov5
- Bug fixes
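The NHWC output layout mentioned above orders data with channels innermost, rather than the runtime's default NCHW. A minimal pure-Python sketch of the conversion, for illustration only (the runtime performs this internally when NHWC output is requested):

```python
def nchw_to_nhwc(t):
    """Convert a nested-list tensor indexed [n][c][h][w] to [n][h][w][c]."""
    n, c = len(t), len(t[0])
    h, w = len(t[0][0]), len(t[0][0][0])
    return [[[[t[i][k][j][l] for k in range(c)]   # channels become innermost
              for l in range(w)]
             for j in range(h)]
            for i in range(n)]

# A 1x2x2x2 tensor: two 2x2 channel planes.
x = [[[[0, 1], [2, 3]],
      [[4, 5], [6, 7]]]]
print(nchw_to_nhwc(x))  # -> [[[[0, 4], [1, 5]], [[2, 6], [3, 7]]]]
```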

### 1.4.0

- Support more NPU operators, such as Reshape, Transpose, MatMul, Max, Min, exGelu, exSoftmax13, Resize, etc.
- Add a **Weight Share** function to reduce memory usage
- Add a **Weight Compression** function to reduce memory and bandwidth usage (RK3588/RV1103/RV1106)
- RK3588 supports storing weights or feature maps in SRAM, reducing system bandwidth consumption
- RK3588 adds the ability to run a single model on multiple cores at the same time
- Add a new output layout, NHWC (the C dimension has alignment restrictions)
- Improve support for non-4D input
- Add more examples, such as rknn_yolov5_android_apk_demo and rknn_internal_mem_reuse_demo
- Bug fixes
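The alignment restriction on C in the NHWC layout means the channel dimension of the output buffer may be padded. A hedged sketch of the idea (the alignment value of 4 used here is purely illustrative; the actual value depends on the platform and data type):

```python
def aligned_channels(c, align=4):
    """Round the channel count up to the next multiple of `align`."""
    return (c + align - 1) // align * align

def pad_pixel(channels, align=4, fill=0):
    """Pad one pixel's channel values out to the aligned channel count."""
    need = aligned_channels(len(channels), align) - len(channels)
    return channels + [fill] * need

print(aligned_channels(3))     # -> 4
print(pad_pixel([9, 8, 7]))    # -> [9, 8, 7, 0]
```

Readers of an NHWC buffer therefore have to skip the padded channel slots when the original channel count is not already a multiple of the alignment.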

### 1.3.0

- Support RV1103/RV1106 (beta SDK)
- rknn_tensor_attr supports w_stride (renamed from stride) and h_stride
- Rename rknn_destroy_mem()
- Support more NPU operators, such as Where, Resize, Pad, Reshape, Transpose, etc.
- RK3588 supports multi-batch multi-core mode
- When RKNN_LOG_LEVEL=4, the runtime reports the MAC utilization and bandwidth consumption of each layer
- Bug fixes
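The w_stride/h_stride attributes above describe buffers whose rows are wider than the valid image data. A pure-Python sketch of laying an image out in such a stride-padded buffer (illustrative only; the real fields live in rknn_tensor_attr):

```python
def copy_with_w_stride(rows, width, w_stride, fill=0):
    """Lay out `rows` (each `width` pixels wide) in a flat buffer whose
    rows occupy `w_stride` pixels; the trailing pixels are padding."""
    assert w_stride >= width
    buf = []
    for row in rows:
        buf.extend(row[:width])                # valid pixels
        buf.extend([fill] * (w_stride - width))  # row padding
    return buf

# A 2x3 image stored with a row stride of 4 pixels.
print(copy_with_w_stride([[1, 2, 3], [4, 5, 6]], width=3, w_stride=4))
# -> [1, 2, 3, 0, 4, 5, 6, 0]
```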

### 1.2.0

- Support RK3588
- Support more operators, such as GRU, Swish, LayerNorm, etc.
- Reduce memory usage
- Improve the zero-copy interface implementation
- Bug fixes

### 1.1.0

- Support INT8 + FP16 mixed quantization to improve model accuracy
- Support specifying input and output dtypes, which can be baked into the model
- Support a different channel mean/std for each input of a multi-input model
- Improve the stability of multi-thread and multi-process runtime
- Support flushing the cache for user-allocated fds that point to internal tensor memory
- Improve dumping of internal layer results of the model
- Add an rknn_server application as a proxy between PC and board
- Support more operators, such as HardSigmoid, HardSwish, Gather, ReduceMax, and Elu
- Add LSTM support (the CIFG and peephole structures are not supported; the LayerNorm and clip functions are not supported)
- Bug fixes
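The INT8 side of the mixed quantization above maps float values onto 8-bit integers through a scale and zero point, while layers that lose too much accuracy can stay in FP16. A minimal sketch of the INT8 mapping (generic asymmetric affine quantization, not Rockchip's exact implementation):

```python
def quantize_int8(x, scale, zero_point=0):
    """Map a float to INT8: q = clamp(round(x / scale) + zp, -128, 127)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q, scale, zero_point=0):
    """Recover an approximate float from the INT8 code."""
    return (q - zero_point) * scale

scale = 1.0 / 127          # covers roughly [-1, 1]
q = quantize_int8(0.5, scale)
print(q, dequantize_int8(q, scale))  # small round-trip error is expected
```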

### 1.0

- Optimize the performance of rknn_inputs_set()
- Add more functions for zero-copy
- Add new operator support; see the operator support list document for details
- Add multi-process support
- Support per-channel quantized models
- Bug fixes
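Per-channel quantization, added above, gives each output channel of a weight tensor its own scale instead of sharing one tensor-wide scale, which preserves accuracy when channel magnitudes differ widely. A hedged, generic sketch using symmetric INT8 scales (not Rockchip's exact scheme):

```python
def per_channel_scales(weights):
    """weights: one list of values per output channel.
    Returns a symmetric INT8 scale per channel: max(|w|) / 127."""
    return [max(abs(v) for v in ch) / 127.0 for ch in weights]

def per_tensor_scale(weights):
    """A single scale shared by the whole tensor, for comparison."""
    return max(abs(v) for ch in weights for v in ch) / 127.0

w = [[0.01, -0.02], [1.0, -0.5]]  # channel 0 is much smaller than channel 1
print(per_channel_scales(w))      # each channel keeps its own resolution
print(per_tensor_scale(w))        # one coarse scale crushes the small channel
```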

### 0.7

- Optimize the performance of rknn_inputs_set(), especially for models whose input width is 8-byte aligned
- Add new operator support; see the operator support list document for details
- Bug fixes

### 0.6

- Initial version