Impact of Data Paging on User Side Code Guide

Explains cases in which data paging degrades the performance of user side code and sets out strategies to mitigate these effects.

Purpose

Data paging can degrade the performance of user side code. This document describes the cases in which that degradation occurs and the strategies available to mitigate it.

Intended Audience:

Application developers writing modules which involve device drivers.

Mitigating the impact of data paging on user side code

Data paging is a technique which increases the size of virtual RAM by holding data on external media and reading it into physical RAM only when it is accessed. The technique trades an increase in available RAM for decreased performance. Developers must allow for this impact on performance and aim to mitigate it by following the practices described in this document.

Data paging is mainly a property of processes. Processes can be configured to be paged or unpaged when they are built or put into a ROM image. Threads and the data which they use inherit the data paging configuration of the creating process, and that configuration can be overridden for an individual thread or item of data.

Thread scheduling

When a platform uses data paging there is a higher risk of delays, timing-related defects and race conditions.

When a thread accesses paged memory, the required page may be paged in (present in RAM) or paged out (held on external media). If it is paged out, a page fault results, slowing the access by a factor of thousands and sometimes up to a million. The delay can also expose latent race conditions and timing-related defects in existing code: for instance, an asynchronous request to a server which previously appeared to complete synchronously may now return control to the client before the request has completed, with incorrect behavior as a result.

The cure for this problem is to configure data paging when chunks, heaps and threads are created.

When creating a thread of class RThread, you can call the creating function RThread::Create() with an object of class TThreadCreateInfo as argument. You use an instance of this class to set the data paging attribute to one of

  • EUnspecified (the thread inherits the paging attributes of the creating process),

  • EPaged (the thread will data page its stack and heap), or

  • EUnpaged (the thread will not data page its stack and heap).
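The configuration described above can be sketched as follows. This is a minimal sketch assuming the Symbian e32 user library API; the thread name, entry-point function and the use of KDefaultStackSize are illustrative.

```cpp
#include <e32std.h>

_LIT(KWorkerName, "WorkerThread"); // illustrative thread name

// Illustrative thread entry point.
TInt WorkerFunction(TAny* /*aParam*/)
    {
    return KErrNone;
    }

TInt CreateUnpagedThread(RThread& aThread)
    {
    // Name, entry point and stack size for the new thread.
    TThreadCreateInfo info(KWorkerName, WorkerFunction, KDefaultStackSize);
    // The new thread will not data page its stack and heap.
    info.SetPaging(TThreadCreateInfo::EUnpaged);
    return aThread.Create(info);
    }
```

Passing TThreadCreateInfo::EPaged or TThreadCreateInfo::EUnspecified instead selects the other two behaviors listed above.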

When creating a chunk of class RChunk, you can call the creating function RChunk::Create() with an object of class TChunkCreateInfo as argument. You use an instance of this class to set the data paging attribute to one of

  • EUnspecified (the chunk inherits the paging attributes of the creating process),

  • EPaged (the chunk will be data paged), or

  • EUnpaged (the chunk will not be data paged).

The RChunk class also has a function IsPaged() to test whether a chunk is data paged.
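A sketch of creating a paged chunk follows, again assuming the Symbian e32 API; the chunk sizes are illustrative.

```cpp
#include <e32std.h>

TInt CreatePagedChunk(RChunk& aChunk)
    {
    TChunkCreateInfo info;
    info.SetNormal(0x1000, 0x100000);         // committed and maximum size (illustrative)
    info.SetPaging(TChunkCreateInfo::EPaged); // the chunk will be data paged
    TInt r = aChunk.Create(info);
    // On a platform with data paging enabled, aChunk.IsPaged() can now be
    // used to confirm that the chunk is data paged.
    return r;
    }
```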

When creating a chunk heap of class UserHeap, you can call the creating function UserHeap::ChunkHeap() with an object of class TChunkHeapCreateInfo as argument. You use an instance of this class to set the data paging attribute to one of

  • EUnspecified (the heap inherits the paging attributes of the creating process),

  • EPaged (the heap will be data paged), or

  • EUnpaged (the heap will not be data paged).
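A corresponding sketch for heap creation, assuming the Symbian e32 API; the maximum heap size is illustrative.

```cpp
#include <e32std.h>

RHeap* CreateUnpagedHeap()
    {
    // Minimum and maximum heap size (the maximum is illustrative).
    TChunkHeapCreateInfo info(KMinHeapSize, 0x100000);
    // The heap will not be data paged.
    info.SetPaging(TChunkHeapCreateInfo::EUnpaged);
    return UserHeap::ChunkHeap(info); // NULL on failure
    }
```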

Inter-process communication

Data paging affects inter-process communication when a non-paged server accesses paged memory passed to it by a client. If the memory being accessed is not paged in, unpredictable delays may occur, and when the server offers performance guarantees to its clients, all of its other clients are affected as well. There are three separate solutions to this problem:

  • pinning memory automatically,

  • pinning memory as requested by the client, and

  • using separate threads for paged and unpaged clients.

Pinning paged memory means paging it into the RAM cache (if it is not already present) and preventing it from being paged out until it is unpinned.

You can set up a server so that all memory passed to it by a client call is pinned for the duration of the call. You do so by calling the function SetPinClientDescriptors() on a CServer2 object after construction but before calling CServer2::Start() for the first time. This method is easy but wasteful: all memory passed is pinned, not just the data which needs to be, and the performance of the paging cache is impacted. Automatic pinning should therefore only be used as a last resort.
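A sketch of a server configured for automatic pinning follows; it assumes the Symbian CServer2 API, and the server name and class are illustrative (session creation is omitted).

```cpp
#include <e32base.h>

_LIT(KMyServerName, "MyPinningServer"); // illustrative server name

class CMyServer : public CServer2
    {
public:
    CMyServer() : CServer2(EPriorityStandard) {}
    void ConstructL()
        {
        // Request that all client descriptors be pinned for the
        // duration of each call; must precede the first Start().
        SetPinClientDescriptors();
        StartL(KMyServerName);
        }
private:
    // Session creation omitted from this sketch.
    CSession2* NewSessionL(const TVersion&, const RMessage2&) const
        {
        return NULL;
        }
    };
```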

You can pin specified items of memory at the request of the client by calling the PinArgs() function of the TIpcArgs class. This is the more efficient method, as it allows fine-grained control over which memory is pinned.
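On the client side this can be sketched as follows, assuming the Symbian client-server API; the session class and request code are illustrative.

```cpp
#include <e32std.h>

// Illustrative request code for the client-server protocol.
enum TMyRequest { EMyRequest = 1 };

class RMyClient : public RSessionBase
    {
public:
    TInt SendData(const TDesC8& aData)
        {
        TIpcArgs args(&aData);
        // Pin the memory referred to by argument 0 for the duration
        // of the call; the remaining arguments are left unpinned.
        args.PinArgs(ETrue, EFalse, EFalse, EFalse);
        return SendReceive(EMyRequest, args);
        }
    };
```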

The third solution is to use separate server threads for paged and unpaged clients. Requests from clients which use paged memory are serviced in a thread of their own, so that page fault delays do not affect the performance guarantees offered to unpaged clients.

Thread performance

The set of pages accessed by a thread over a given period of time is called its working set. If the working set is paged, the performance of the thread degrades as the working set increases. When working with paged memory it is therefore important to minimise the working set.

The main solution to this problem is to choose data structures with high locality, that is, data structures residing in a single page or in adjacent pages. An example is a preference for arrays over linked lists: an array occupies contiguous memory and therefore adjacent pages, while the elements of a linked list may be scattered across many pages.
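The difference in locality can be illustrated in standard C++. Summing an array walks one contiguous block, touching at most a few pages, while summing a linked list chases pointers through separately allocated nodes, each of which may lie on a different page.

```cpp
#include <cstddef>

struct Node
    {
    int iValue;
    Node* iNext;
    };

// Sequential access over a contiguous block: at most a few pages
// are touched, so the working set stays small.
int SumArray(const int* aData, std::size_t aCount)
    {
    int sum = 0;
    for (std::size_t i = 0; i < aCount; ++i)
        sum += aData[i];
    return sum;
    }

// Pointer chasing: each node is a separate allocation and may lie
// on a different page, inflating the working set.
int SumList(const Node* aHead)
    {
    int sum = 0;
    for (; aHead != NULL; aHead = aHead->iNext)
        sum += aHead->iValue;
    return sum;
    }
```

Both functions compute the same result; only their paging behavior differs when the data is held in paged memory.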