Device Driver Writing and Migration Technology Tutorial

Explains techniques for writing device drivers on data paged systems and migrating device drivers to data paged systems.
Impact of data paging on kernel APIs

The use of data paging affects the task of writing and migrating device drivers in two main ways: the preconditions for kernel API calls and the performance of the kernel APIs.

Firstly, kernel APIs which access user memory may only be called subject to preconditions. The preconditions are that:

  • no fast mutex may be held while calling them, and
  • no kernel mutex may be held while calling them.

The APIs are these:

  • Kern::KUDesInfo
  • Kern::InfoCopy
  • Kern::KUDesGet
  • Kern::KUDesPut
  • Kern::KUDesSetLength
  • umemget, kumemget, umemget32, kumemget32
  • umemput, kumemput, umemput32, kumemput32
  • umemset, kumemset, umemset32, kumemset32
  • Kern::RequestComplete
  • Kern::ThreadRawRead
  • Kern::ThreadRawWrite
  • Kern::ThreadDesRead
  • Kern::ThreadDesWrite
  • Kern::ThreadGetDesLength
  • Kern::ThreadGetDesMaxLength
  • Kern::ThreadGetDesInfo
Impact of data paging on execution

Device drivers use kernel side APIs to access user memory, and even when they are called in accordance with their preconditions they are no longer guaranteed to execute in a short and bounded time. This is because they may access paged memory and incur page faults which propagate from one thread to another. This document discusses how to mitigate the impact of data paging on device drivers.

Mitigating data paging: general principles

Three general principles are involved in mitigating the impact of data paging on device drivers:

  • Device drivers should not use shared DFC queues.
  • Device drivers should, as far as possible, not access paged memory.
  • If a device driver needs to access paged memory, it should do so in the context of the client thread.
Driver frameworks

There are three main categories of device driver:

  • boot-loaded non-channel drivers,
  • media drivers, and
  • dynamically loaded, channel-based IO device drivers.

The document Demand Paged Device Drivers Writing Guide discusses these three frameworks and identifies areas where they are vulnerable to the impact of data paging.

Mitigation techniques

The impact of data paging on device drivers is mitigated by various techniques, which are the subject of the rest of this document.

Passing data by value

Clients should pass data by value, not as pointers. Return values of calls should be return codes, not data.

Using dedicated DFC queues

All drivers which use DFCs should use a dedicated DFC queue to service them. You should not use the kernel queues Kern::DfcQue0 and Kern::DfcQue1 for this purpose. How you create a dedicated DFC queue depends on the nature of the driver.

To service boot-loaded drivers and media drivers, you create a DFC queue by calling Kern::DfcQCreate().

To service dynamically loaded drivers derived from DLogicalChannelBase, you call Kern::DynamicDfcQCreate() with a TDynamicDfcQue as argument:

TInt Kern::DynamicDfcQCreate(TDynamicDfcQue*& aDfcQ, TInt aPriority, const TDesC& aBaseName);

To service a dynamically loaded driver derived from DLogicalChannel, you use the DFC queue supplied with it (the member iDfcQ, accessed by pointer). To use the queue you call the SetDfcQ() function during the second phase construction of the LDD.

You destroy queues by calling their Destroy() function, which also terminates the associated thread.
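
The following sketch shows how a channel derived from DLogicalChannel might create and destroy a dedicated queue during second phase construction; the class DMyChannel, the member iDynamicDfcQ (a TDynamicDfcQue*) and the thread priority and name are illustrative, not part of the kernel API.

const TInt KMyDfcThreadPriority = 27;
_LIT(KMyDfcThreadName, "MyDriverDfcThread");

TInt DMyChannel::DoCreate(TInt /*aUnit*/, const TDesC8* /*aInfo*/, const TVersion& /*aVer*/)
    {
    // Create a dedicated DFC queue rather than sharing Kern::DfcQue0 or Kern::DfcQue1.
    TInt r = Kern::DynamicDfcQCreate(iDynamicDfcQ, KMyDfcThreadPriority, KMyDfcThreadName);
    if (r != KErrNone)
        return r;
    SetDfcQ(iDynamicDfcQ);   // attach the queue to the channel
    iMsgQ.Receive();         // start receiving messages on the new queue
    return KErrNone;
    }

DMyChannel::~DMyChannel()
    {
    if (iDynamicDfcQ)
        iDynamicDfcQ->Destroy();   // also terminates the associated thread
    }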

Setting realtime state

The realtime state of a thread determines whether it is enabled to access paged memory. If a thread is realtime (its realtime state is on) it is guaranteed not to access paged memory, so avoiding unpredictable delays. The realtime state of a thread may be set to ERealtimeStateOn, ERealtimeStateOff or ERealtimeStateWarn, as defined in the enumeration TThreadRealtimeState, and is set by the kernel function Kern::SetRealtimeState().

If a driver uses DFC threads and is subject to performance guarantees, their realtime state should be set to on (this is the default when data paging is enabled). Otherwise the state should be set to off; the warning state is used for debugging.
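
For example, a driver without performance guarantees whose DFC thread must access paged memory could turn the state off from within that thread. This is a minimal sketch; Kern::SetRealtimeState() acts on the current thread.

// Called from the DFC thread itself: allow this thread to access paged
// memory, accepting that it may block on page faults.
Kern::SetRealtimeState(ERealtimeStateOff);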

Validating arguments in client context

It is often necessary to validate the arguments of a request function. This should be done in the context of the client thread as far as possible.

When a driver derived from the class DLogicalChannelBase makes a request, this happens automatically, as the call to the Request() function takes place in the client thread. When the driver is derived from the class DLogicalChannel, the request involves a call to the SendMsg() function inherited from the base class, and it is necessary to override the base implementation to force evaluation of the arguments within the client thread, as shown in the sketch below.
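
The following is a minimal sketch of such an override; the channel class DMyChannel and the request number EMyRequestRead are illustrative, not part of the framework.

TInt DMyChannel::SendMsg(TMessageBase* aMsg)
    {
    TThreadMessage& m = *(TThreadMessage*)aMsg;

    // This code runs in the client thread, so arguments can be checked
    // here, before the message is queued for the DFC thread.
    if (m.iValue == EMyRequestRead && m.Ptr0() == NULL)
        return KErrArgument;   // reject a bad request immediately

    // Arguments are acceptable: let the base class queue the message.
    return DLogicalChannel::SendMsg(aMsg);
    }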

Accessing user memory from client context

The DFC should access user memory as little as possible. Whenever there is a need to access user memory and it can be accessed in the context of the client thread, it should be.

When the driver is derived from the class DLogicalChannelBase, read and write operations on user memory can be performed within the call to the Request() function, which takes place in the context of the client thread.

When the driver is derived from the class DLogicalChannel, it is possible to read from and write to user memory by overriding the SendMsg() function before passing the message on to be processed by the DFC thread if necessary. If the message is passed on, data must be stored kernel side, either on the client thread's kernel stack or in the channel object.

Message data can only be stored on the client thread's kernel stack if the message is synchronous and the size of the data is less than 4KB. Since each client thread has its own kernel stack, this technique can be used when more than one client uses the channel. One way of doing this is to implement SendMsg() with a call to SendControl(), which is itself implemented to perform the copying in the client thread context and then call the SendMsg() function of the parent class.

Where the message is asynchronous you can use a similar strategy, overriding the SendMsg() function, but this time performing the copying to a buffer owned by the channel before calling the SendMsg() function of the parent class. In this case the size of the data must be small (in the region of 4KB), there must be only one client using the buffer, and data cannot be written back to the client. A sketch of the synchronous case follows.
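
The following sketch illustrates the synchronous case described above; DMyChannel, EMyControlSetConfig, TMyConfig and the SendControl() helper are illustrative names, and repointing the message argument is one possible way of passing the kernel-side copy on.

TInt DMyChannel::SendMsg(TMessageBase* aMsg)
    {
    TThreadMessage& m = *(TThreadMessage*)aMsg;
    if (m.iValue == EMyControlSetConfig)     // synchronous request
        return SendControl(aMsg);            // copy arguments in client context
    return DLogicalChannel::SendMsg(aMsg);   // other requests: default path
    }

TInt DMyChannel::SendControl(TMessageBase* aMsg)
    {
    TThreadMessage& m = *(TThreadMessage*)aMsg;

    // Copy the client's data onto this (the client's) kernel stack; a page
    // fault taken here is harmless because we are still in client context.
    TMyConfig config;
    TPckg<TMyConfig> configBuf(config);
    TInt r = Kern::ThreadDesRead(&Kern::CurrentThread(), m.Ptr0(), configBuf, 0);
    if (r != KErrNone)
        return r;

    // Point the message at the kernel-side copy; the DFC thread can now
    // process the request without touching user memory. The stack data
    // remains valid because a synchronous message blocks the client here.
    m.iArg[0] = &config;
    return DLogicalChannel::SendMsg(aMsg);
    }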

Using TClientDataRequest

An asynchronous request often needs to copy a structure of fixed size to its client to complete a request. The TClientDataRequest object exists for this purpose: it writes a fixed size structure to user memory and completes the request in the following steps (a code sketch follows the list).

  1. The driver creates a TClientDataRequest object for each asynchronous request which may be outstanding concurrently: either one per client or one per request as appropriate.
  2. When the client makes a request, the TClientDataRequest object is set to contain the address of the client's buffer or descriptor and the address of the client's TRequestStatus. This takes place in the client context.
  3. The data to be written is copied into the buffer of the TClientDataRequest object.
  4. A call to Kern::QueueRequestComplete() passes the address of the TClientDataRequest object.
  5. The client is signalled immediately.
  6. When the client thread next runs, the buffer contents and completion value are written to the client.
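
The following sketch shows the pattern for a channel class DMyChannel delivering a structure TMyResult. It assumes iDataReq is a TClientDataRequest<TMyResult>* created with Kern::CreateClientDataRequest() during construction and iClient is the client's DThread*; these names, and the helper functions, are illustrative.

// Client context: record the destination and the TRequestStatus.
TInt DMyChannel::StartGetResult(TRequestStatus* aStatus, TAny* aDest)
    {
    TInt r = iDataReq->SetStatus(aStatus);   // KErrInUse if already pending
    if (r != KErrNone)
        return r;
    iDataReq->SetDestPtr(aDest);
    return KErrNone;
    }

// DFC context: fill the kernel-side buffer and queue completion.
void DMyChannel::CompleteGetResult(const TMyResult& aResult)
    {
    iDataReq->Data() = aResult;              // copy into the request's buffer
    Kern::QueueRequestComplete(iClient, iDataReq, KErrNone);
    // The client is signalled immediately; the data and the completion
    // value are written back when the client thread next runs.
    }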

Using TClientBufferRequest

When it is necessary to access user memory from a DFC thread context, that memory must be pinned for the duration of the request and unpinned when the request is completed. The pinning must be performed in the context of the client thread. The TClientBufferRequest object exists for this purpose. It is used in the following way (a code sketch follows the list).

  1. The driver creates a TClientBufferRequest object for each client request which may be outstanding concurrently: either one per client or one per request as appropriate.
  2. When a client makes a request, the TClientBufferRequest object is set to contain the address of any buffers used and the address of the client's TRequestStatus. Doing so pins the contents of the buffers: they can be specified as descriptors or by start address and length. This takes place in the client context.
  3. The driver calls Kern::ThreadBufRead() and Kern::ThreadBufWrite() to access the client's buffers. This takes place in the context of the DFC.
  4. When the request is complete, the driver calls Kern::QueueBufferRequestComplete(), passing the TClientBufferRequest object. This signals the client immediately and unpins the buffers.
  5. When the client thread next runs, the completion value is written back to the client along with the updated length of any descriptors.
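
The following sketch uses the same illustrative channel class as before; it assumes iBufReq is a TClientBufferRequest* created with Kern::CreateClientBufferRequest() and iClientBuffer is a TClientBuffer*, and the mode arguments to Kern::ThreadBufWrite() are simplified.

// Client context: set up the request and pin the client's descriptor.
TInt DMyChannel::StartRead(TRequestStatus* aStatus, TDes8* aDes)
    {
    TInt r = iBufReq->StartSetup(aStatus);        // KErrInUse if already pending
    if (r != KErrNone)
        return r;
    r = iBufReq->AddBuffer(iClientBuffer, aDes);  // pins the buffer contents
    if (r != KErrNone)
        {
        iBufReq->Reset();
        return r;
        }
    iBufReq->EndSetup();
    return KErrNone;
    }

// DFC context: write to the pinned buffer and complete the request.
void DMyChannel::CompleteRead(const TDesC8& aData)
    {
    TInt r = Kern::ThreadBufWrite(iClient, iClientBuffer, aData, 0, 0, iClient);
    Kern::QueueBufferRequestComplete(iClient, iBufReq, r);   // signals and unpins
    }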

Using Kern::RequestComplete()

The function Kern::RequestComplete() exists in two versions:

static void Kern::RequestComplete(DThread* aThread, TRequestStatus*& aStatus, TInt aReason);

which is now deprecated, and its overloaded replacement

static void Kern::RequestComplete(TRequestStatus*& aStatus, TInt aReason);

The overloaded version should always be used: because it does not take a thread pointer argument, it completes the request on the current thread, and so cannot write to the client's memory from the wrong thread context.
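
For illustration, assuming iClient is the stored client thread pointer and iStatus the stored TRequestStatus pointer:

// Deprecated: names the thread explicitly and may be called from any
// context, risking a write to paged memory from the wrong thread.
Kern::RequestComplete(iClient, iStatus, KErrNone);

// Preferred: completes a request on the current thread.
Kern::RequestComplete(iStatus, KErrNone);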

Using shared chunks

Shared chunks are a mechanism by which kernel side code shares buffers with user side code. As an alternative to pinning memory they have the following advantages:

  • Shared chunks cannot be paged, and therefore page faults never arise.
  • Shared chunks transfer data with a minimum number of copying operations and are useful where high speeds and large volumes are required.

Shared chunks present disadvantages when a driver is being migrated rather than written from scratch, as the client API must be rewritten as well as the driver code.
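
As an illustration, kernel side creation of a shared chunk might look like the following sketch; the chunk type, size, mapping attributes and the member iChunk (a DChunk*) are assumptions, and handing the chunk to the client (for example with Kern::MakeHandleAndOpen()) is not shown.

TInt DMyChannel::CreateSharedChunk()
    {
    TChunkCreateInfo info;
    info.iType = TChunkCreateInfo::ESharedKernelSingle;
    info.iMaxSize = 0x10000;                  // 64KB buffer
    info.iMapAttr = EMapAttrFullyBlocking;    // strongly ordered, uncached
    info.iOwnsMemory = ETrue;                 // chunk owns its committed RAM

    TLinAddr kernAddr;
    TUint32 mapAttr;
    TInt r = Kern::ChunkCreate(info, iChunk, kernAddr, mapAttr);
    if (r != KErrNone)
        return r;

    // Commit memory to the chunk before use.
    r = Kern::ChunkCommit(iChunk, 0, info.iMaxSize);
    if (r != KErrNone)
        {
        Kern::ChunkClose(iChunk);
        iChunk = NULL;
        }
    return r;
    }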

See Also

Demand Paged Device Drivers Overview

Performance Guarantee Tutorial
