The most frequent cause of problems when porting U-Boot to new
hardware, or when using a sloppy port on some board, is memory errors.
In most cases these are not caused by failing hardware, but by
incorrect initialization of the memory controller.  So it is a good
idea to always test whether the memory is working correctly before
looking for any other potential causes of a problem.

U-Boot implements 3 different approaches to perform memory tests:

1. The get_ram_size() function (see "common/memsize.c").

   This function is supposed to be used in each and every U-Boot port
   to determine the presence and actual size of each of the potential
   memory banks on this piece of hardware.  The code is supposed to be
   very fast, so running it on each reboot does not hurt.  It is a
   little known and generally underrated fact that this code will also
   catch 99% of hardware related (i.e. reliably reproducible) memory
   errors.  It is strongly recommended to always use this function, in
   each and every port of U-Boot.
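
To illustrate why a size probe also catches many wiring faults, here
is a minimal host-side sketch of the idea of probing power-of-two
offsets.  It is not the code from "common/memsize.c": the simulated
bus (bus_write(), bus_read(), real_words) and the function name
probe_ram_words() are made up for this example, and a real probe on
a running system would also have to save and restore the memory
contents it overwrites.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 1024u	/* largest size the probe considers (hypothetical) */

static uint32_t backing[MAX_WORDS];
/* Simulated populated size in words; must be a power of two. */
static size_t real_words = MAX_WORDS;

/* Simulated bus: accesses beyond the populated size wrap around (alias). */
static void bus_write(size_t idx, uint32_t val)
{
	backing[idx & (real_words - 1)] = val;
}

static uint32_t bus_read(size_t idx)
{
	return backing[idx & (real_words - 1)];
}

/* Returns the detected size in words, or 0 if even word 0 is unusable. */
static size_t probe_ram_words(size_t max_words)
{
	size_t cnt;

	/* Write a distinct marker at each power-of-two offset, highest first. */
	for (cnt = max_words >> 1; cnt > 0; cnt >>= 1)
		bus_write(cnt, ~(uint32_t)cnt);
	bus_write(0, 0);

	if (bus_read(0) != 0)
		return 0;

	/* The lowest offset whose marker was clobbered marks the real size. */
	for (cnt = 1; cnt < max_words; cnt <<= 1)
		if (bus_read(cnt) != ~(uint32_t)cnt)
			return cnt;

	return max_words;
}
```

With real_words set to 256, probe_ram_words(MAX_WORDS) returns 256,
because the marker written at offset 256 wraps around to word 0 and
is then overwritten.  A stuck or shorted address line disturbs the
markers in the same way, which is why such a probe tends to notice
static wiring faults as a side effect.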

2. The "mtest" command.

   This is probably the best known memory test utility in U-Boot.
   Unfortunately, it is also the most problematic, and the most
   useless one.

   There are a number of serious problems with this command:

   - It is terribly slow.  Running "mtest" on the whole system RAM
     takes a _long_ time before there is any significance in the fact
     that no errors have been found so far.

   - It is difficult to configure, and to use.  And any errors here
     will reliably crash or hang your system.  "mtest" is dumb and has
     no knowledge about memory ranges that may be in use for other
     purposes, like exception code, U-Boot code and data, stack,
     malloc arena, video buffer, log buffer, etc.  If you let it, it
     will happily "test" all such areas, which of course will cause
     some problems.

   - As a consequence, a large number of systems are seriously
     misconfigured.  The original idea was to test basically the
     whole system RAM, exempting only the areas used by U-Boot
     itself - on most systems these are the areas used for the
     exception vectors (usually at the very bottom of system memory)
     and for U-Boot (code, data, etc. - see above; these are usually
     at the very top of system memory).  But experience has shown
     that a very large number of ports use pretty much bogus settings
     of CONFIG_SYS_MEMTEST_START and CONFIG_SYS_MEMTEST_END; this
     results in useless tests (because the range is too small and/or
     badly located) or in critical failures (system crashes).

   Because of these issues, the "mtest" command is considered
   deprecated.  It should not be enabled in most normal ports of
   U-Boot, especially not in production.  If you really need a memory
   test, then see 1. above and 3. below.
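
The "original idea" described above (test everything except the
areas U-Boot itself uses) can be sketched as a small range
calculation.  This is an illustration only: struct mem_window,
pick_test_window() and all parameters are hypothetical names, not
part of U-Boot, and the sizes of the reserved regions must come from
the actual memory layout of your board.

```c
#include <stdint.h>

struct mem_window {
	uint64_t start;	/* first byte that may be tested */
	uint64_t end;	/* one past the last byte that may be tested */
};

/*
 * ram_base, ram_size: the RAM bank to test.
 * low_reserved:       bytes to skip at the bottom (e.g. exception vectors).
 * uboot_bottom:       lowest address used by U-Boot at the top of RAM
 *                     (code, data, stack, malloc arena, ...).
 *
 * Returns 1 and fills *w if a non-empty window exists, 0 otherwise.
 */
static int pick_test_window(uint64_t ram_base, uint64_t ram_size,
			    uint64_t low_reserved, uint64_t uboot_bottom,
			    struct mem_window *w)
{
	uint64_t ram_end = ram_base + ram_size;
	uint64_t start = ram_base + low_reserved;
	uint64_t end = uboot_bottom < ram_end ? uboot_bottom : ram_end;

	/* Refuse settings that would overlap a reserved region. */
	if (low_reserved >= ram_size || start >= end)
		return 0;

	w->start = start;
	w->end = end;
	return 1;
}
```

Values derived this way are the kind of settings one would expect
CONFIG_SYS_MEMTEST_START and CONFIG_SYS_MEMTEST_END to hold; a window
that the function rejects corresponds to the "critical failures"
case described above.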

3. The most thorough memory test facility is available as part of the
   POST (Power-On Self Test) sub-system, see "post/drivers/memory.c".

   If you really need to perform memory tests (for example, because
   it is a mandatory part of your requirement specification), then
   enable this test, which is generic and should work on all
   architectures.
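
In trees of this generation, POST tests are typically selected by a
bit mask in the board configuration header.  A minimal sketch,
assuming your tree follows this convention and that your board
provides whatever else POST needs (check "doc/README.POST" and
"include/post.h" in your tree before relying on it):

```c
/* Fragment for a board configuration header, not a standalone program. */
#define CONFIG_POST	(CONFIG_SYS_POST_MEMORY)	/* select the memory test */
```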

WARNING:

It should be pointed out that _all_ these memory tests have one
fundamental, unfixable design flaw:  they are based on the assumption
that memory errors can be found by writing to and reading from memory.
Unfortunately, this is only true for the relatively harmless, usually
static errors like shorts between data or address lines, unconnected
pins, etc.  All the really nasty errors which will first turn your
hair gray, only to make you tear it out later, are dynamic errors,
which usually happen not with simple read or write cycles on the bus,
but when performing back-to-back data transfers in burst mode.  Such
accesses usually happen only for certain DMA operations, or for heavy
cache use (instruction fetching, cache flushing).  So far I am not
aware of any freely available code that implements a generic, and
efficient, memory test like that.  The best known test case to stress
a system like that is to boot Linux with the root file system mounted
over NFS, and then build some larger software package natively (say,
compile a Linux kernel on the system) - this will cause enough context
switches, network traffic (and thus DMA transfers from the network
controller), varying RAM use, etc. to trigger any weak spots in this
area.
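
To make concrete what kind of "static" error a write/read test can
find, here is a minimal walking-ones sketch for the data lines.  It
runs against a simulated memory cell with a simple fault model
(stuck_low_mask, cell_write() and cell_read() are made up for this
example); it is not the POST code, and it says nothing about the
burst-mode errors described above.

```c
#include <stdint.h>

/* Simulated fault: bits set in this mask always read back as 0. */
static uint32_t stuck_low_mask;
static uint32_t cell;

static void cell_write(uint32_t val)
{
	cell = val & ~stuck_low_mask;
}

static uint32_t cell_read(void)
{
	return cell;
}

/*
 * Walking-ones test: drive each data line high on its own and check
 * that exactly this value reads back.
 * Returns 0 on success, otherwise the first pattern that failed.
 */
static uint32_t test_data_lines(void)
{
	uint32_t pattern;

	for (pattern = 1; pattern != 0; pattern <<= 1) {
		cell_write(pattern);
		if (cell_read() != pattern)
			return pattern;
	}
	return 0;
}
```

With stuck_low_mask set to (1u << 5), test_data_lines() returns 0x20,
i.e. it points at data line 5.  A fault that only shows up during
back-to-back burst transfers would pass this test unnoticed, which is
exactly the limitation the warning is about.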

Note: An attempt was made once to implement such a test to catch
memory problems on a specific board.  The code is pretty much board
specific (for example, it includes setting specific GPIO signals to
provide triggers for an attached logic analyzer), but you can get an
idea of how it works: see "examples/standalone/test_burst*".

Note 2: Ironically enough, the "test_burst" code did not catch any
RAM errors, not a single one ever.  The problems this code was
supposed to catch did not happen when accessing the RAM, but when
reading from NOR flash.