SMP Troubleshooting Guide

This document describes a few common error patterns to help you identify and correct some of the problems you may have in writing SMP-safe code.

There are two main classes of programming errors that can occur in code running on an SMP device:
  • unsafe assumptions about thread execution order
  • unsafe assumptions about memory/data sharing.

These errors can cause deadlocks or panics on an SMP system, and can also cause data corruption or unexpected results.

Execution order assumptions

Most of the errors found in code so far are not unique to SMP, but they are more likely to surface on an SMP system. Some existing code makes assumptions about how threads are scheduled and how thread priority affects the sequence of operations across multiple threads.

For example, one thread makes a request to a server, and the developer assumes that the server will complete the request before the calling thread becomes active again. This is a dangerous assumption: the server might not be ready to run, or it could block while waiting for a resource. In either case the scheduler could resume the calling thread before the server has completed the request.

This assumption is even less safe on an SMP system, because the scheduler can run the server on one CPU core and the calling thread on another at the same time. The calling thread must therefore be written in a thread-safe manner: it must explicitly wait for the server to complete the request before continuing into code that depends on the result.
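The explicit wait described above can be sketched as follows. This is a minimal illustration in standard C++, used here as a portable stand-in for the platform's own request and synchronisation primitives. The names Request, ServeRequest and IssueAndWait are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical request object shared by the calling thread and the server.
struct Request {
    std::mutex mutex;
    std::condition_variable completed;
    bool done = false;  // guarded by mutex
    int result = 0;     // guarded by mutex
};

// Stand-in for the server: performs the work, then signals completion.
void ServeRequest(Request& req, int input) {
    int value = input * 2;  // placeholder for the real work
    {
        std::lock_guard<std::mutex> lock(req.mutex);
        req.result = value;
        req.done = true;
    }
    req.completed.notify_one();
}

// Calling thread: issues the request and blocks until the server has
// signalled completion. It makes no assumption about which thread the
// scheduler runs first, or on which core.
int IssueAndWait(int input) {
    Request req;
    std::thread server(ServeRequest, std::ref(req), input);
    int result = 0;
    {
        std::unique_lock<std::mutex> lock(req.mutex);
        // The predicate protects against spurious wake-ups and against
        // the server finishing before the wait begins.
        req.completed.wait(lock, [&req] { return req.done; });
        assert(req.done);
        result = req.result;
    }
    server.join();
    return result;
}
```

Calling IssueAndWait(21) returns only after the server thread has stored its result, regardless of thread priorities or the number of cores.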

Shared data/memory

Errors may occur if multiple threads try to update or delete the same memory areas at the same time.

Good programming practice, whether for single core or multiple core, is to lock objects while they are being updated, to prevent them being read by another thread in the middle of an update. For kernel-side code, use a lock to ensure that a particular thread has exclusive access to data and interrupts.
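As an illustration of locking an object while it is updated, the following sketch guards a shared counter with a mutex. It uses standard C++ rather than any platform-specific lock, so it reflects user-side code only; kernel-side code would use the kernel's own locking primitives instead. SharedCounter and CountWithThreads are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared object: every access to value_ takes the lock.
class SharedCounter {
public:
    void Increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }

    long Value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    long value_ = 0;  // guarded by mutex_
};

// Runs several threads that update the same counter concurrently and
// returns the final value once all of them have finished.
long CountWithThreads(int threads, int incrementsPerThread) {
    assert(threads > 0 && incrementsPerThread >= 0);
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                counter.Increment();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.Value();
}
```

Without the lock, increments from different cores could overlap and some updates would be lost. With it, the final value always equals threads multiplied by incrementsPerThread.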

Flexible Memory Model Problems

One problem that has been found, though it is not specific to SMP, occurs when the flexible memory model is used. Memory locations in a global chunk are accessed as an offset from a base address, but each thread may have a different base address for the same global chunk. If applications store or share absolute addresses, they will read the wrong data.
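The difference between offsets and absolute addresses can be sketched as follows. This is a simulation in standard C++, not the platform's chunk API: two byte buffers stand in for the same chunk seen at two different base addresses, and the layout (ChunkHeader, valueOffset) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical chunk layout: a header at the start of the chunk records
// where the value lives as an offset from the base, never as a pointer.
struct ChunkHeader {
    std::uint32_t valueOffset;
};

// Writes the header and the value relative to the given base address.
void WriteValue(unsigned char* base, std::size_t size,
                std::uint32_t offset, std::int32_t value) {
    assert(offset >= sizeof(ChunkHeader));
    assert(offset + sizeof(value) <= size);
    ChunkHeader header{offset};
    std::memcpy(base, &header, sizeof(header));
    std::memcpy(base + offset, &value, sizeof(value));
}

// Resolves the stored offset against the caller's own base address.
std::int32_t ReadValue(const unsigned char* base, std::size_t size) {
    ChunkHeader header;
    std::memcpy(&header, base, sizeof(header));
    assert(header.valueOffset + sizeof(std::int32_t) <= size);
    std::int32_t value = 0;
    std::memcpy(&value, base + header.valueOffset, sizeof(value));
    return value;
}

// Simulates a second view of the chunk at a different base address by
// copying the bytes into another buffer. Because only an offset is
// stored, the reader still finds the value.
std::int32_t ReadThroughSecondMapping(std::int32_t value) {
    std::vector<unsigned char> mappingA(64);
    WriteValue(mappingA.data(), mappingA.size(), 16, value);
    std::vector<unsigned char> mappingB(mappingA);  // same bytes, new base
    return ReadValue(mappingB.data(), mappingB.size());
}
```

Had the header stored a raw pointer into mappingA, a reader using mappingB would have dereferenced an address that is wrong for its own view of the chunk.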