AMCC suggested setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common 440 DDR setup files (sdram.c &
spd_sdram.c) have been changed accordingly, so all 440 boards using
these setup routines will automatically receive this performance
increase.

The benchmarks below, done by AMCC, demonstrate these performance
changes:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899
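
As a sanity check on the table above, each STREAM rate can be
recomputed as the bytes moved per pass divided by the minimum time.
The sketch below does this from the printed array size and the
"Min time" column. The arrays-touched counts (two for Copy and Scale,
three for Add and Triad) follow the usual STREAM definition and are an
assumption, since this README does not state them; the helper name
stream_rate_mb_s is made up for illustration.

```python
ARRAY_SIZE = 2_000_000   # "Array size" from the benchmark output
WORD_BYTES = 8           # "8 bytes per DOUBLE PRECISION word"

# Arrays touched per element by each kernel (usual STREAM definition;
# an assumption here, the README does not spell it out).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds) from the PMU = 1 run.
MIN_TIME_PMU1 = {"Copy": 0.1246, "Scale": 0.1301,
                 "Add": 0.1882, "Triad": 0.1896}


def stream_rate_mb_s(kernel, min_time):
    """Return the rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD_BYTES * ARRAY_SIZE
    return bytes_moved / min_time / 1e6


for kernel, min_time in MIN_TIME_PMU1.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, min_time):8.1f} MB/s")
```

The recomputed values differ slightly from the printed rates because
the "Min time" column is rounded to four decimal places.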


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance.
* Improves the Mbit/sec for the TTCP benchmark by almost 77%.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw
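
The improvement figures quoted for PMU = 0 can be re-derived from the
two result sets above. The following is a small sketch that does so;
the numbers are copied from the benchmark output, and the helper name
pct_gain is hypothetical.

```python
# STREAM rates (MB/s) copied from the two benchmark runs.
STREAM_PMU1 = {"Copy": 256.7683, "Scale": 246.0157,
               "Add": 255.0316, "Triad": 253.1245}
STREAM_PMU0 = {"Copy": 262.5167, "Scale": 258.4856,
               "Add": 262.5404, "Triad": 266.8594}

# TTCP throughput (Mbit/sec) copied from the two benchmark runs.
TTCP_PMU1 = 454.29
TTCP_PMU0 = 804.06


def pct_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


stream_gains = {name: pct_gain(STREAM_PMU1[name], STREAM_PMU0[name])
                for name in STREAM_PMU1}
ttcp_gain = pct_gain(TTCP_PMU1, TTCP_PMU0)

for name, gain in stream_gains.items():
    print(f"{name:6s} {gain:5.2f} %")
print(f"TTCP   {ttcp_gain:5.2f} %")
```

With these inputs the STREAM gains fall between roughly 2.2% and 5.4%,
and the TTCP gain is about 77%.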


2006-07-28, Stefan Roese <sr@denx.de>