Device Driver Writing and Migration Technology Tutorial

Explains techniques for writing device drivers on data paged systems and migrating device drivers to data paged systems.

Impact of data paging on kernel APIs

The use of data paging affects the writing and migration of device drivers in two main ways: the preconditions for calling kernel APIs, and the performance of those APIs.

Firstly, kernel APIs which access user memory may only be called subject to preconditions. The preconditions are that

  • no fast mutex may be held while calling them, and

  • no kernel mutex may be held while calling them.

The APIs concerned are those which access user memory, for example Kern::ThreadRawRead(), Kern::ThreadRawWrite(), Kern::ThreadDesRead() and Kern::ThreadDesWrite().

Impact of data paging on execution

Secondly, the kernel side APIs which device drivers use to access user memory are no longer guaranteed to execute in a short and bounded time, even when they are called in accordance with their preconditions. This is because they may access paged memory and incur page faults, whose delays propagate from one thread to another. This document discusses how to mitigate the impact of data paging on device drivers.

Mitigating data paging: general principles

Three general principles are involved in mitigating the impact of data paging on device drivers.

  • Device drivers should not use shared DFC queues.

  • Device drivers should, as far as possible, not access paged memory.

  • If a device driver needs to access paged memory, it should do so in the context of the client thread.

Driver frameworks

There are three main categories of device driver:

  • Boot-Loaded Non-Channel Drivers

    Boot loaded drivers are built as kernel extensions. They are typically simple device drivers, such as keyboard drivers, with a limited client side interface or none at all, and are not much impacted by data paging. It is generally safe for them to pass data structures using the HAL and to execute in the context of a kernel thread: however, this assumption must always be verified for individual cases.

  • Media Drivers

    Media drivers are both channel based drivers and kernel extensions. When written according to the recommended model they either execute wholly in the context of their clients or use a unique DFC queue and associated kernel thread. If these recommendations are followed, no additional measures to mitigate the impact of data paging are required.

  • Dynamically loaded channel based IO device drivers

    Channel based IO device drivers are based on various models: all are dynamically loaded. They are derived either from DLogicalChannelBase or DLogicalChannel.

    Channel based drivers derived from DLogicalChannelBase usually execute in the context of their client, which mitigates the impact of data paging. Where they are multi-threaded, they typically create separate and unique kernel threads and do not use shared DFC queues, which also mitigates the impact: if they do use a shared DFC queue and its associated kernel thread, they are affected by data paging and must be written to mitigate the effects.

    Channel based drivers derived from DLogicalChannel may communicate with the hardware directly (LDD to hardware) or indirectly (LDD to PDD to hardware). If a PDD is involved, mitigation of data paging should take place at that level and not in the LDD. Channel based drivers may have single or multiple clients, channels and hardware. It is these drivers which require the most work to mitigate the impact of data paging.

Mitigation techniques

The impact of data paging on device drivers is mitigated by the various techniques which are the subject of the rest of this document.

Passing data by value

Clients should pass data by value, not by pointer. The return values of calls should be return codes, not data.
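A minimal sketch of a client side call written this way is shown below. The class, request number and rate parameter are illustrative assumptions, not part of any real driver API.

class RExampleDriver : public RBusLogicalChannel
    {
public:
    enum TControl { EControlSetRate };

    TInt SetRate(TInt aRate)
        {
        // Pass the value itself, cast into the argument, rather than &aRate:
        // the driver never needs to dereference (possibly paged) client
        // memory to obtain it, and the result is a plain error code.
        return DoControl(EControlSetRate, (TAny*)aRate);
        }
    };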

Using dedicated DFC queues

All drivers which use DFCs should use a dedicated DFC queue to service them. You should not use the kernel queues returned by Kern::DfcQue0() and Kern::DfcQue1() for this purpose. How you create a dedicated DFC queue depends on the nature of the driver.

To service boot loaded drivers and media drivers, you create a DFC queue by calling Kern::DfcQCreate().

To service dynamically loaded drivers derived from DLogicalChannelBase you call Kern::DynamicDfcQCreate(), passing a reference to a TDynamicDfcQue pointer:

TInt Kern::DynamicDfcQCreate(TDynamicDfcQue*& aDfcQ, TInt aPriority, const TDesC& aBaseName);

To service a dynamically loaded driver derived from DLogicalChannel, you use the DFC queue supplied with it (the member iDfcQ, accessed by pointer). To use the queue you call the SetDfcQ() function during the second phase construction of the LDD.

A queue created with Kern::DynamicDfcQCreate() is destroyed by calling its Destroy() function, which also terminates the associated thread.
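The sketch below shows second phase construction for a dynamically loaded driver derived from DLogicalChannel, creating a dedicated queue with Kern::DynamicDfcQCreate() and attaching it with SetDfcQ(). The class name DExampleChannel, the TDynamicDfcQue* member iDynamicDfcQ, and the queue name and priority are illustrative assumptions.

const TInt KExampleDfcQPriority = 27;   // assumed thread priority, for illustration
_LIT(KExampleDfcQName, "ExampleDfcQ");  // assumed base name for the queue thread

TInt DExampleChannel::DoCreate(TInt /*aUnit*/, const TDesC8* /*aInfo*/, const TVersion& /*aVer*/)
    {
    // Create a dedicated DFC queue rather than using Kern::DfcQue0() or Kern::DfcQue1().
    TInt r = Kern::DynamicDfcQCreate(iDynamicDfcQ, KExampleDfcQPriority, KExampleDfcQName);
    if (r != KErrNone)
        return r;
    SetDfcQ(iDynamicDfcQ);   // attach the queue to the channel (sets iDfcQ)
    iMsgQ.Receive();         // start receiving client messages on the new queue
    return KErrNone;
    }

DExampleChannel::~DExampleChannel()
    {
    if (iDynamicDfcQ)
        iDynamicDfcQ->Destroy();   // also terminates the queue's thread
    }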

Setting realtime state

The realtime state of a thread determines whether it is allowed to access paged memory. If a thread is realtime (its realtime state is on) it is guaranteed not to access paged memory, so avoiding unpredictable delays. The realtime state of a thread may be set to ERealtimeStateOn, ERealtimeStateOff or ERealtimeStateWarn, as defined in the enumeration TThreadRealtimeState, by calling the kernel function Kern::SetRealtimeState().

If a driver uses DFC threads and is subject to performance guarantees, their realtime state should be set to on (this is the default when data paging is enabled). Otherwise the state should be set to off: the warning state is used for debugging.
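Kern::SetRealtimeState() applies to the calling thread, so a driver typically sets the state of its DFC thread from a DFC running on that thread, for example during initialisation. The sketch below assumes a static member function InitDfcFn and a TDfc member iInitDfc queued on the dedicated queue created earlier; both names are illustrative.

void DExampleChannel::InitDfcFn(TAny* /*aSelf*/)
    {
    // Runs in the dedicated DFC thread: from now on this thread must not
    // access paged memory, so its execution time remains bounded.
    Kern::SetRealtimeState(ERealtimeStateOn);
    }

// During second phase construction, after the queue has been created:
//   iInitDfc.SetDfcQ(iDynamicDfcQ);
//   iInitDfc.Enque();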

Validating arguments in client context

It is often necessary to validate the arguments of a request function. This should be done in the context of the client thread as far as possible.

When the client of a driver derived from DLogicalChannelBase makes a request, this happens automatically because the Request() function is called in the context of the client thread. When the driver is derived from DLogicalChannel, a request involves a call to the SendMsg() function inherited from the base class, and it is necessary to override the base implementation to force validation of the arguments within the client thread.
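For example, a driver derived from DLogicalChannelBase can reject bad arguments directly in its Request() function, which always runs in the client thread. The class name, request number, range check and helper below are illustrative assumptions.

const TInt KMaxExampleRate = 48000;   // assumed upper bound, for illustration

TInt DSimpleChannel::Request(TInt aReqNo, TAny* a1, TAny* /*a2*/)
    {
    switch (aReqNo)
        {
    case EControlSetRate:
        {
        // Validate in client context before any work is handed to a DFC.
        TInt rate = (TInt)a1;             // value passed by value, see above
        if (rate <= 0 || rate > KMaxExampleRate)
            return KErrArgument;
        return DoSetRate(rate);           // hypothetical helper
        }
    default:
        return KErrNotSupported;
        }
    }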

Accessing user memory from client context

A DFC should access user memory as little as possible. Whenever user memory needs to be accessed and can be accessed in the context of the client thread, it should be accessed there.

When the driver is derived from the class DLogicalChannelBase, read and write operations on user memory can be performed in the Request() function, which is called in the context of the client thread.

When the driver is derived from the class DLogicalChannel, user memory can be read and written by overriding the SendMsg() function and performing the access before passing the message on to be processed by the DFC thread, if that is still necessary. If the message is passed on, the data must be stored kernel side, either on the client thread's kernel stack or in the channel object.

Message data can only be stored on the client thread's kernel stack if the message is synchronous and the data is smaller than 4KB. Since each client thread has its own kernel stack, this approach can be used by more than one client thread. One way of implementing it is for SendMsg() to call a helper such as SendControl(), which performs the copying in the client thread context and then calls the SendMsg() function of the parent class.

Where the message is asynchronous you can use a similar strategy for overriding SendMsg(), but this time copy the data into a buffer owned by the channel before calling the SendMsg() function of the parent class. In this case the data must be small (in the region of 4KB), only one client may use the buffer, and data cannot be written back to the client.
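The sketch below shows the channel-owned buffer variant for a driver derived from DLogicalChannel: SendMsg() is overridden to copy the client's data in client context before the message is passed on. The request number ERequestWriteData and the member iWriteBuffer (assumed to be a TBuf8 owned by the channel) are illustrative assumptions.

TInt DExampleChannel::SendMsg(TMessageBase* aMsg)
    {
    TThreadMessage& m = *(TThreadMessage*)aMsg;

    if (m.iValue == ERequestWriteData)
        {
        // Running in the client thread: copy the client's descriptor into the
        // channel-owned buffer now, so the DFC thread never has to touch
        // paged user memory. Assumes a single client, as noted above.
        TInt r = Kern::ThreadDesRead(m.Client(), m.Ptr0(), iWriteBuffer, 0);
        if (r != KErrNone)
            return r;
        }

    // Hand the message over to the DFC thread as usual.
    return DLogicalChannel::SendMsg(aMsg);
    }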

Using TClientDataRequest

An asynchronous request often needs to copy a fixed size structure to its client on completion. The TClientDataRequest object exists for this purpose: it writes a fixed size structure to user memory and completes the request, using the following steps.

  1. The driver creates a TClientDataRequest object for each asynchronous request which may be outstanding concurrently: either one per client or one per request as appropriate.

  2. When the client makes a request the TClientDataRequest object is set to contain the address of the client's buffer or descriptor and the address of the client's TRequestStatus. This takes place in the client context.

  3. The data to be written is copied into the buffer of the TClientDataRequest object.

  4. A call to Kern::QueueRequestComplete() passes the address of the TClientDataRequest object.

  5. The client is signalled immediately.

  6. When the client thread next runs, the buffer contents and completion value are written to the client.
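A minimal sketch of these steps follows. The channel class, the result type TExampleResult and the members iDataRequest (a TClientDataRequest<TExampleResult>*) and iClient (the client's DThread*) are illustrative assumptions.

// Step 1: create the request object during channel construction.
TInt DExampleChannel::CreateDataRequestObject()
    {
    return Kern::CreateClientDataRequest(iDataRequest);
    }

// Step 2: called in client context when the asynchronous request is made.
TInt DExampleChannel::StartGetResult(TRequestStatus* aStatus, TAny* aDestPtr)
    {
    TInt r = iDataRequest->SetStatus(aStatus);            // record the client's TRequestStatus
    if (r != KErrNone)
        return r;                                         // e.g. a request is already pending
    iDataRequest->SetDestPtr((TExampleResult*)aDestPtr);  // record the client's buffer address
    return KErrNone;
    }

// Steps 3 to 5: called from the DFC thread when the result is ready; the buffer
// contents and completion value reach the client when its thread next runs (step 6).
void DExampleChannel::CompleteGetResult(const TExampleResult& aResult)
    {
    iDataRequest->Data() = aResult;                               // copy into the kernel side buffer
    Kern::QueueRequestComplete(iClient, iDataRequest, KErrNone);  // signal the client
    }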

Using TClientBufferRequest

When it is necessary to access user memory from a DFC thread context, that memory must be pinned for the duration of the request and unpinned when the request is completed. The pinning must be performed in the context of the client thread. The TClientBufferRequest object exists for this purpose. It is used in the following way.

  1. The driver creates a TClientBufferRequest object for each client request which may be outstanding concurrently: either one per client or one per request as appropriate.

  2. When a client makes a request, the TClientBufferRequest object is set to contain the address of any buffers used and the address of the client's TRequestStatus. Doing so pins the contents of the buffers: they can be specified as descriptors or by start address and length. This takes place in the client context.

  3. The driver calls Kern::ThreadBufRead() and Kern::ThreadBufWrite() to access the client's buffers. This takes place in the context of the DFC.

  4. When the request is complete, the driver calls Kern::QueueBufferRequestComplete() passing the TClientBufferRequest object. This signals the client immediately and unpins the buffers.

  5. When the client thread next runs, the completion value is written back to the client along with the updated length of any descriptors.
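A minimal sketch of these steps follows, assuming the setup API takes the form shown here: the members iBufferRequest (a TClientBufferRequest*), iClientBuffer (a TClientBuffer*) and iClient, the function names, and the use of one buffer per request are all illustrative assumptions.

// Step 1: create the request object once, asking for virtual pinning.
TInt DExampleChannel::CreateBufferRequestObject()
    {
    return Kern::CreateClientBufferRequest(iBufferRequest, 1, TClientBufferRequest::EPinVirtual);
    }

// Step 2: called in client context; records the TRequestStatus and pins the
// client's descriptor for the duration of the request.
TInt DExampleChannel::StartReadData(TRequestStatus* aStatus, TAny* aDes)
    {
    TInt r = iBufferRequest->StartSetup(aStatus);
    if (r != KErrNone)
        return r;                                        // e.g. a request is already pending
    r = iBufferRequest->AddBuffer(iClientBuffer, aDes);  // pins the buffer's memory
    if (r != KErrNone)
        return r;
    iBufferRequest->EndSetup();
    return KErrNone;
    }

// Step 3 takes place in the DFC thread, using Kern::ThreadBufRead() and
// Kern::ThreadBufWrite() with iClientBuffer. Step 4: complete the request,
// which unpins the buffer and signals the client; the completion value and
// descriptor length reach the client when its thread next runs (step 5).
void DExampleChannel::CompleteReadData(TInt aResult)
    {
    Kern::QueueBufferRequestComplete(iClient, iBufferRequest, aResult);
    }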

Using Kern::RequestComplete()

The function Kern::RequestComplete() exists in two versions:

static void Kern::RequestComplete(DThread* aThread, TRequestStatus*& aStatus, TInt aReason);

which is now deprecated, and its overloaded replacement

static void Kern::RequestComplete(TRequestStatus*& aStatus, TInt aReason);

The overloaded version should always be used, as it does not take a thread pointer argument.
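A minimal sketch of completing a request with the non-deprecated overload is shown below; iStatus is assumed to be the TRequestStatus pointer saved when the client made the request.

void DExampleChannel::CompleteRequest(TInt aResult)
    {
    if (iStatus)
        {
        // Completes the request and sets iStatus back to NULL, since the
        // pointer is passed by reference.
        Kern::RequestComplete(iStatus, aResult);
        }
    }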

Using shared chunks

Shared chunks are a mechanism by which kernel side code shares buffers with user side code. As an alternative to pinning memory they have the following advantages:

  • Shared chunks cannot be paged, and therefore page faults never arise.

  • Shared chunks transfer data with a minimum number of copying operations and are useful where high speeds and large volumes are required.

Shared chunks present disadvantages when a driver is being migrated rather than written from scratch, as the client API must be rewritten as well as the driver code.
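For a driver written from scratch, creating a shared chunk on the kernel side looks broadly like the sketch below. The chunk size, mapping attribute, sharing model and the DChunk* member iChunk are illustrative assumptions.

TInt DExampleChannel::CreateSharedChunk()
    {
    TChunkCreateInfo info;
    info.iType       = TChunkCreateInfo::ESharedKernelMultiple;
    info.iMaxSize    = 0x10000;               // 64KB, assumed for illustration
    info.iMapAttr    = EMapAttrFullyBlocking; // assumed uncached mapping
    info.iOwnsMemory = ETrue;                 // commit RAM owned by the chunk

    TLinAddr kernAddr;
    TUint32 mapAttr;
    TInt r = Kern::ChunkCreate(info, iChunk, kernAddr, mapAttr);
    if (r != KErrNone)
        return r;

    // Commit the whole chunk: shared chunk memory is unpaged, so the DFC
    // thread can access it without risking page faults.
    r = Kern::ChunkCommit(iChunk, 0, 0x10000);
    if (r != KErrNone)
        {
        Kern::ChunkClose(iChunk);
        iChunk = NULL;
        }
    return r;
    }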