sysperfana/perfinvestigator/com.nokia.carbide.cpp.pi.doc.user/html/concepts/func_level_load.htm
       
     1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
       
     2 "http://www.w3.org/TR/html4/loose.dtd">
       
     3 <html>
       
     4 <head>
       
     5 	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
       
     6 	<title>Function Level Load Analysis</title>
       
     7     <link href="../../book.css" rel="stylesheet" type="text/css">
       
     8 </head>
       
     9 
       
    10 <body>
       
    11 <h2>Function Level Load Analysis</h2>
       
     12 <p>Function level load analysis is based on the same principles as thread level load analysis. The main difference is that the information collected within the periodic interrupt is used in a different way and is combined with external symbolic information collected at compile time. In function level load analysis, the focus is on the recorded value of the program counter register (PC), which indicates where execution was when the periodic interrupt occurred. By comparing the value of the PC with the memory map in the symbolic information, the name of the function that was executing can be extracted. In addition, the location of the interrupted execution within the function can be calculated as an offset value.</p>
       
     13 <p><a href="#fig2">Figure 1</a> shows how information from the symbolic function list is used to resolve a function&rsquo;s name and an offset from its starting point. The procedure is simple. In the example, the address 0x5032014e was sampled in a periodic interrupt, so execution was at that location when the interrupt occurred. In the analysis phase, the function name is resolved by searching the symbolic function name list for a function that matches the address in question. A match means that the searched address is at least the function start address (0x5032011c) and smaller than the function start address + function length (0x5032011c + 0x50 = 0x5032016c). The entries in the symbolic function name list are ordered by function address. Since functions never overlap each other, at most one function can match each address. The example address 0x5032014e fulfils both conditions, so the function name can be resolved as shown in Figure 1. In addition, a more precise place of execution within the function can be calculated by subtracting the function&rsquo;s start address from the sampled value. The result of the subtraction is always smaller than the function length (in this case 0x50). In this example, 0x5032014e - 0x5032011c = 0x32, so execution was 0x32 bytes from the start of the function.</p>
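<p>The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the Performance Investigator implementation: the table layout, the <code>resolve</code> helper, the function names, and the two neighbouring entries are assumptions made for the example. Only the start address 0x5032011c, the length 0x50, and the sampled address 0x5032014e come from the text.</p>

```python
from bisect import bisect_right

# Hypothetical symbol table: (start address, length in bytes, name), sorted by
# start address. Only the middle entry's address and length come from the
# example in the text; the names and the neighbouring entries are invented.
SYMBOLS = [
    (0x503200D0, 0x4C, "FunctionA"),
    (0x5032011C, 0x50, "FunctionB"),
    (0x5032016C, 0x30, "FunctionC"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(pc):
    """Return (name, offset) for a sampled PC, or None if no function covers it."""
    # Find the last function whose start address is not above the sampled PC.
    index = bisect_right(STARTS, pc) - 1
    if index < 0:
        return None
    start, length, name = SYMBOLS[index]
    offset = pc - start
    # The offset must be smaller than the function length to count as a match.
    if offset >= length:
        return None
    return name, offset


name, offset = resolve(0x5032014E)
print(name, hex(offset))
```

<p>Because the list is sorted and the functions do not overlap, a binary search is enough to find the single candidate, and one comparison against the function length confirms or rejects it.</p>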
       
     14 <p>When a large number of PC values sampled during normal execution is analyzed, the most heavily loaded functions can be identified by mapping the PC values to functions. Without the symbolic information it would be impossible to distinguish between different functions. With periodic sampling, a heavily loaded function has more samples in a time period than other functions. Interpretation of the results is always case-dependent, and accordingly there is no generic value or percentage that would distinguish heavily loaded functions from others. It is also important to notice that the amount of execution time spent in a function reveals neither the actual cause of the load nor the actual number of calls made to the function. The time spent in a function depends on the initial state of the data accessed by the function, the input parameters of the function, and the state of other activities performed in parallel with the function. Therefore, understanding the statistical nature of function-level load analysis is a prerequisite for any further analysis that aims at resolving the actual cause of the load.</p>
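<p>As a rough illustration of the statistical view described above, the following Python sketch turns a list of already resolved samples into a load percentage per function. The <code>load_by_function</code> helper and the sample data are hypothetical and are not part of the Performance Investigator.</p>

```python
from collections import Counter


def load_by_function(resolved_samples):
    """Map each function name to its share of the samples, in percent.

    `resolved_samples` holds one function name per periodic sample
    (hypothetical input; each name was resolved from a sampled PC value).
    """
    counts = Counter(resolved_samples)
    total = sum(counts.values())
    # most_common() orders the result from the most to the least sampled function.
    return {name: 100.0 * n / total for name, n in counts.most_common()}


samples = ["FunctionB", "FunctionA", "FunctionB", "FunctionB"]
for name, share in load_by_function(samples).items():
    print(f"{name}: {share:.1f} %")
```

<p>The percentages only describe where the samples landed. As noted above, they do not reveal how many times a function was called or why it was busy.</p>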
       
    15 <p align="center"><img src="../images/fig3_rslv_func_offset.png" width="580" height="304"></p>
       
    16 <p class="figure"><a name="fig2"></a>Figure 1. Resolving the function name and a relative offset</p>
       
     17 <p>A high load percentage in one function is not always an indication of a performance problem. Certain functions are fully justified in occupying processing time in certain situations. One way to gain more understanding of the results of function-level load analysis is to perform a linked function level load analysis, in which the caller/callee relations of function call chains can be resolved to a certain extent. This, however, requires more complicated instrumentation and analysis methods, as explained in the following section.</p>
       
    18 <h3>Linked function level load tracing and analysis (Function Call trace)</h3>
       
    19 <p>When performing periodic sampling, processing of a normal scheduled load (application threads and processes) is interrupted externally with a timer interrupt. During the time the interrupt service routine executes, the register values within the processor and the data values within the kernel structures can be investigated. As explained in the previous examples of periodic sampling and function level load analysis, the stored value of the program counter register (PC) is used to resolve the location of the interrupted execution.</p>
       
     20 <p>In addition to the value of the PC, another important register within the ARM architecture is the link register (LR). With the GCC (and RVCT) compiler options set for the current Symbian build, entering an arbitrary function (performing a function call) takes place with a machine instruction called branch and link (BL). In the internal operation of the ARM processor, the BL instruction stores the return address in the link register. The return address is the address of the instruction that follows the BL instruction that caused the branch. Returning from a function takes place simply by copying the LR value back to the PC, which forces the processor to continue execution in the calling function from the instruction that follows the branch.</p>
       
     21 <p>Accordingly, it is possible to resolve the function that called the currently executing function by examining the value of the LR at an arbitrary point of execution. The value adds important information to that produced by the function analysis. From a large number of individual samples it is possible to create a statistical distribution of the callers of each function. When the distribution of each function&rsquo;s callers is known, the caller and callee functions can be connected together in the analysis. This way it is possible to construct more complex representations of the relevant functions and their call relationships within the sampled execution. In other words, the individual functions can be connected together into a graph that gives a more comprehensive view of the execution by indicating the functions that initiate complex function call chains.</p>
       
    22 <p align="center"><img src="../images/fig4_prin_lnkd_func.png" width="937" height="485"></p>
       
    23 <p class="figure"> Figure 2. Basic principle of linked function level load analysis</p>
       
     24 <p>To complicate matters, there is one important circumstance in which the value in the LR cannot be used on the ARM processor with the GCC (and RVCT) compilers and Symbian software. When a function is entered, the return address is stored in the LR by the BL instruction. According to common practice, the value of the LR is most often pushed to the stack (depending on the compiler&rsquo;s decisions about register usage within the function). The LR can then be restored from the stack just before leaving the function, or in some cases the value can be copied directly from the stack to the PC in order to make the processor branch. Pushing the LR to the stack makes it possible to overwrite the original value in the LR within the function without having to worry about restoring it before leaving the function, because the value is retrieved from the stack. The compiler takes care of all of this automatically. For the use presented here, however, it is essential to know whether the value in the LR is the correct value (pointing to the original return address).</p>
       
     25 <p>Fortunately, the mechanism by which the LR gets overwritten is in most cases quite trivial. The original value is overwritten by a subsequent branch and link (BL) instruction within the function. After a subsequent branch, the LR points to the return address of the last BL instruction executed, and therefore the original value is lost. In all cases, this last value has to be inside the function in which the execution takes place. Therefore, a simple rule can be applied. If the value in the LR points outside the currently executing function, it is the original return address of the function. Conversely, if the value in the LR points to an address within the currently executing function, it is the result of a subsequent branch that has taken place within that function. In such cases the original value remains unknown. The original value is still on the stack, but due to the dynamic nature of stack utilization within a function, retrieving it would be much more complicated: it would require an instruction-level back-trace of the function&rsquo;s execution.</p>
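<p>The rule above amounts to a single range check per sample. The Python sketch below is a simplified illustration under stated assumptions: the <code>classify_lr</code> helper is hypothetical, and the function&rsquo;s start address and length are assumed to have been resolved from the sampled PC beforehand.</p>

```python
def classify_lr(lr, func_start, func_length):
    """Classify a sampled LR value relative to the currently executing function.

    Returns "caller" when the LR points outside the function, so it still holds
    the original return address, and "internal" when it points inside the
    function, so it was overwritten by a later BL and the caller is unknown.

    Simplification: an LR equal to func_start + func_length is treated as
    pointing outside the function.
    """
    inside = func_start <= lr < func_start + func_length
    return "internal" if inside else "caller"


# Function range taken from the earlier example: start 0x5032011c, length 0x50.
# Both LR values are made up for illustration.
print(classify_lr(0x50320140, 0x5032011C, 0x50))
print(classify_lr(0x50321000, 0x5032011C, 0x50))
```

<p>Samples classified as &ldquo;caller&rdquo; can be resolved to a calling function with the same symbol lookup that is used for the PC value.</p>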
       
    26 <p align="center"><img src="../images/figure5.png" width="849" height="482"></p>
       
    27 <p class="figure">Figure 3. Resolving correct values in linked function level load analysis</p>
       
     28 <p>Thus, through periodic sampling it is always possible to retrieve the address of the interrupted execution from the PC, and in a certain proportion of cases it is also possible to retrieve from the LR the caller of the function in which the execution takes place. The ratio between these two cases is in practice dictated by the proportion of instructions executed on average before and after the first BL instruction within the function. When analyzing the results, this ratio can be calculated for each function as the ratio between the sampled LR values that point outside the function and the values that point inside the function. In the analysis within the Performance Investigator, this ratio is used in extrapolating the number of callers of a certain function. It is assumed that the distribution of callers of a function is the same in the samples from which the caller could not be resolved as in the samples from which it could be resolved. This is shown in Figure 3. In specific circumstances, this assumption can result in an error, mainly by amplifying &ldquo;noise&rdquo; to statistically significant proportions. This has to be taken into consideration when reading the results.</p>
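<p>One way to picture the extrapolation is the following Python sketch. It is an assumption about how such scaling could be done, not the actual Performance Investigator algorithm. The <code>extrapolate_callers</code> helper and its inputs are hypothetical.</p>

```python
from collections import Counter


def extrapolate_callers(caller_samples, unresolved_count):
    """Estimate caller counts for all samples of one function.

    `caller_samples` holds one caller name per sample whose LR pointed outside
    the function. `unresolved_count` is the number of samples whose LR pointed
    inside the function. The unresolved samples are assumed to follow the same
    caller distribution as the resolved ones.
    """
    resolved = Counter(caller_samples)
    resolved_total = sum(resolved.values())
    if resolved_total == 0:
        # No caller was ever observed, so there is nothing to scale.
        return {}
    scale = (resolved_total + unresolved_count) / resolved_total
    return {caller: n * scale for caller, n in resolved.items()}


estimate = extrapolate_callers(["CallerX", "CallerX", "CallerX", "CallerY"], 4)
for caller, count in estimate.items():
    print(caller, count)
```

<p>When only a few samples of a function have a resolvable caller, the scale factor becomes large, and a single stray sample is multiplied accordingly. This is the noise effect mentioned above.</p>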
       
    29 <div id="footer">Copyright &copy; 2009 Nokia Corporation and/or its subsidiary(-ies). All rights reserved. <br>License: <a href="http://www.eclipse.org/legal/epl-v10.html">http://www.eclipse.org/legal/epl-v10.html</a></div>
       
    30 </body>
       
    31 </html>