
Transactional memory going mainstream with Intel Haswell


This is actually a Sandy Bridge wafer shot. I couldn't find one of Haswell.

Intel has announced that its Haswell architecture, due to ship some time in 2013, will include hardware support for transactional memory.

Transactional memory is a promising technique designed to make the creation of reliable multithreaded programs easier. It does this by using a transactional model wherein complex operations can be performed concurrently, in isolation from each other, with those operations either completing or being undone as if they'd never been started—a model that developers are already familiar with from database programming.

Transactional memory refresher

For a longer explanation of transactional memory, read our coverage of IBM's hardware transactional memory in the BlueGene/Q supercomputer. Here's a rough outline:

In conventional multithreaded software, programs protect shared resources (which may be files on disk, data held in memory, network connections, or anything else) with "locks." Only one thread can hold a lock at any one time, so the holder can be sure that no other thread is modifying the shared resource at the same time. This tends to be pessimistic: the thread with the lock prevents every other thread from taking the lock, even if they only want to read the shared resource or make a non-conflicting update to it, such as adding two different entries into a dictionary.
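As a rough illustration of that pessimism, here is a minimal sketch using Python's standard threading module. The dictionary phone_book and the helper add_entry are made-up names for this example. Two threads add different entries, yet the single lock still forces them to take turns.

```python
import threading

phone_book = {}                      # hypothetical shared resource
phone_book_lock = threading.Lock()   # one lock guards the whole dictionary

def add_entry(name, number):
    # Only one thread at a time can be inside this block, even though
    # the two updates below touch different keys and cannot conflict.
    with phone_book_lock:
        phone_book[name] = number

threads = [
    threading.Thread(target=add_entry, args=("alice", "555-0100")),
    threading.Thread(target=add_entry, args=("bob", "555-0199")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The result is correct, but the second thread has to wait for the first even though neither touches the other's entry.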

With transactional memory, threads no longer need to take out locks when manipulating data structures held in memory. They start a transaction before attempting any modification to the structure, make their changes, and when they've finished, commit the transaction. During the transaction, the transactional memory system takes note of all the memory that the thread reads and writes.

When the transaction is committed, the system checks that no other thread made any changes to the memory the transaction used. If there were no changes, the transaction is committed and the thread continues. If there were changes, the transaction is aborted, and all its changes are undone. The thread can then retry the operation, try a different strategy (for example, one that uses locks), or give up entirely.
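The start, track, and commit-or-abort cycle can be sketched in software. This is a toy, single-threaded simulation and does not show how any real hardware works. The class names, the per-address version counter, and the demo function are all assumptions made for illustration. A real system would also need the commit step itself to be atomic.

```python
class TransactionAborted(Exception):
    """Raised when a commit detects a conflicting change."""

class Memory:
    """Toy shared memory: each address holds a (value, version) pair."""
    def __init__(self):
        self.cells = {}

    def load(self, addr):
        return self.cells.get(addr, (0, 0))

class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_versions = {}   # addr -> version seen on first access
        self.writes = {}          # addr -> buffered new value

    def read(self, addr):
        if addr in self.writes:              # see our own pending write
            return self.writes[addr]
        value, version = self.memory.load(addr)
        self.read_versions.setdefault(addr, version)
        return value

    def write(self, addr, value):
        # Note the version so a competing write is detected at commit.
        _, version = self.memory.load(addr)
        self.read_versions.setdefault(addr, version)
        self.writes[addr] = value

    def commit(self):
        # Validate: nothing we read or wrote may have changed since.
        for addr, seen in self.read_versions.items():
            if self.memory.load(addr)[1] != seen:
                self.writes.clear()          # undo: discard buffered writes
                raise TransactionAborted(addr)
        for addr, value in self.writes.items():
            _, version = self.memory.load(addr)
            self.memory.cells[addr] = (value, version + 1)

def demo():
    mem = Memory()
    t1, t2, t3 = Transaction(mem), Transaction(mem), Transaction(mem)
    t1.write("x", t1.read("x") + 1)   # t1 and t2 both touch "x"
    t2.write("x", t2.read("x") + 1)
    t3.write("y", 7)                  # t3 touches only "y"
    t1.commit()                       # first committer wins
    t3.commit()                       # no overlap with t1, so it commits
    try:
        t2.commit()
        t2_aborted = False
    except TransactionAborted:
        t2_aborted = True             # "x" changed under t2
    return mem.load("x")[0], mem.load("y")[0], t2_aborted
```

In demo, the transaction that touches only "y" commits alongside the first writer of "x", while the second writer of "x" is aborted and would have to retry or fall back to a lock.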

Intel's dual-pronged approach

Haswell's transactional support, which Intel is calling Transactional Synchronization Extensions (TSX), comes in two parts. Hardware Lock Elision (HLE) allows easy conversion of lock-based programs into transactional programs in a way that's backwards compatible with current processors. Restricted Transactional Memory (RTM) is a more complete transactional memory implementation.

At their core, operating systems implement locks with pieces of memory, typically using the processor's natural integer type (so 32-bit for 32-bit operating systems, 64-bit for 64-bit ones). The thread taking the lock (and blocking all the other threads) does something to this piece of memory, typically something along the lines of incrementing its value from 0 to 1. To release the lock, the operation is reversed (so for example, decrementing from 1 to 0).

These modifications are visible to every processor core (and thread) in the system, and if another thread sees that the value in our example is 1, it knows it cannot take the lock, and must instead wait for it to go back to 0.
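The 0-to-1 lock word can be modeled in a few lines. Python has no atomic compare-and-swap of its own, so the sketch below assumes an internal threading.Lock as a stand-in for the processor's atomic instruction. LockWord, try_acquire, and increment_many are hypothetical names.

```python
import threading
import time

class LockWord:
    """A lock modelled as one integer: 0 means free, 1 means held."""
    def __init__(self):
        self.value = 0
        # Stand-in for the hardware's atomic instruction (an assumption).
        self._atomic = threading.Lock()

    def try_acquire(self):
        # Atomically: if the word is 0, set it to 1 and report success.
        with self._atomic:
            if self.value == 0:
                self.value = 1
                return True
            return False

    def release(self):
        # Reverse the operation so waiting threads can see 0 again.
        with self._atomic:
            self.value = 0

def increment_many(lock, counter, times):
    for _ in range(times):
        while not lock.try_acquire():   # wait until the word reads 0
            time.sleep(0)               # let another thread run meanwhile
        counter[0] += 1                 # critical section
        lock.release()

lock = LockWord()
counter = [0]
workers = [threading.Thread(target=increment_many, args=(lock, counter, 1000))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Every thread that finds the word at 1 has to wait, which is exactly the cost that lock elision tries to avoid.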

Hardware Lock Elision

HLE modifies this by slightly altering the instructions used to change the piece of memory. Specifically, it uses prefixes (extra bytes that do nothing in and of themselves, but which change the interpretation of the instruction that follows) to modify the instructions that acquire and release the lock. Acquisitions are prefixed with XACQUIRE, releases with XRELEASE. With the prefixes, the attempt to acquire the lock succeeds (the thread thinks it has the lock and continues processing) but no global change is made to the lock value itself. So the thread with the lock will think its value is 1, but every other thread in the system will still see it as 0. This allows other threads to simultaneously also acquire the lock; again, the processor will lie to them, so each sees the lock as being held, but no other thread does.

This explains HLE's name: instead of the value going from 0 to 1 and back to 0, it just stays at 0. The "unnecessary" write gets elided.

Between acquiring the lock and releasing it, the processor tracks all the memory that the threads read and write. If something conflicts—if two threads modify the same piece of memory, or one thread reads a value that a second thread changes—then the processor will abort the transaction when the lock is released. If there are no conflicts, however, execution continues normally.

In this way, HLE allows conventional lock-based code to run optimistically. Each thread will still think it's obtained the lock, but the threads will be allowed to run simultaneously, and as long as it's safe, there won't be any aborted transactions.
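To make the elision and the conflict rule concrete, here is a toy model. It assumes nothing about Intel's actual mechanism. ElidedSection, ElidingProcessor, and the conflicts function are invented names, and the sections run in a fixed order rather than on real threads. Each section sees the lock as held while the globally visible word stays at 0, and a section aborts at release if its reads or writes overlap with another section that is still running.

```python
class ElidedSection:
    """One thread's speculative run of a lock-protected region."""
    def __init__(self, reads=(), writes=()):
        self.lock_view = 0            # what this thread believes the lock is
        self.reads = set(reads)
        self.writes = set(writes)

def conflicts(a, b):
    # Two writers of the same address, or a reader and a writer of it.
    return bool(a.writes & b.writes or a.writes & b.reads or a.reads & b.writes)

class ElidingProcessor:
    def __init__(self):
        self.global_lock_word = 0     # what every other thread sees
        self.active = []

    def xacquire(self, section):
        section.lock_view = 1         # the thread "has" the lock...
        self.active.append(section)   # ...but the global word is untouched

    def xrelease(self, section):
        """Return True if the section commits, False if it aborts."""
        self.active.remove(section)
        section.lock_view = 0
        return not any(conflicts(section, other) for other in self.active)

cpu = ElidingProcessor()
push_head = ElidedSection(writes={"head"})
push_tail = ElidedSection(writes={"tail"})
peek_head = ElidedSection(reads={"head"})
for s in (push_head, push_tail, peek_head):
    cpu.xacquire(s)

views_while_held = [s.lock_view for s in (push_head, push_tail, peek_head)]
word_while_held = cpu.global_lock_word
tail_ok = cpu.xrelease(push_tail)   # touches only "tail": commits
peek_ok = cpu.xrelease(peek_head)   # reads "head" while it is written: aborts
head_ok = cpu.xrelease(push_head)   # nothing left to conflict with: commits
```

All three sections "hold" the lock at once, and only the one whose memory accesses really overlap with another pays the price.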

The system is particularly clever because it's backwards compatible. The operations to manipulate the lock are still there, but with a special prefix. Haswell processors will honor the prefix and use transactional execution instead of the lock manipulations. Every other processor will ignore the prefix and just manipulate the lock, falling back to the traditional lock-based behavior. This is because XACQUIRE and XRELEASE reuse prefixes that already exist but are meaningless (and hence ignored) except when paired with a few specific instructions, and those instructions wouldn't be used for implementing a lock anyway.
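The compatibility trick can be sketched as a tiny decoder. The two byte values are the ones shared with the older REPNE and REP prefixes. Everything else, including the execute function and the supports_hle flag, is a hypothetical model and not a real emulator.

```python
XACQUIRE = 0xF2   # same byte as the older REPNE prefix
XRELEASE = 0xF3   # same byte as the older REP/REPE prefix

def execute(prefix, new_value, lock_word, supports_hle):
    """Run one prefixed store to the lock word on a toy CPU.

    Returns (globally visible lock word, value the running thread sees).
    """
    if supports_hle and prefix in (XACQUIRE, XRELEASE):
        # Elide the write: other threads keep seeing the old value.
        return lock_word, new_value
    # Older CPU: the prefix is ignored and the store really happens.
    return new_value, new_value

old_cpu_acquire = execute(XACQUIRE, 1, 0, supports_hle=False)
new_cpu_acquire = execute(XACQUIRE, 1, 0, supports_hle=True)
old_cpu_release = execute(XRELEASE, 0, 1, supports_hle=False)
```

The same instruction bytes therefore take the lock on an older processor and start an elided region on one that understands the prefix.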

With HLE, it will then be possible to write programs and operating systems that will use transactions on Haswell, and hence achieve greater concurrency and have fewer threads waiting around for locks, but will still run correctly on current processors. In turn, this makes adoption of the feature much simpler and safer.

Restricted Transactional Memory

RTM is more involved and sheds that backwards compatibility. Where HLE implicitly uses transactions to allow lock-based code to run concurrently, RTM makes the starting, committing, and aborting of transactions explicit. Whenever a thread starts a transaction with the new XBEGIN instruction, it also specifies a "fallback" routine that will execute if the transaction fails. When the transaction is ended, with the new XEND instruction, the processor commits the transaction if there were no conflicts, or aborts it and switches to the fallback routine if there were. Transactions can also be aborted explicitly by software with a new XABORT instruction.
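The control flow of XBEGIN, XEND, and XABORT can be imitated in software. The sketch below is only a model. ToyRTM, run, and the conflict flag are invented for illustration, and a Python exception stands in for the processor's jump to the fallback routine.

```python
class Abort(Exception):
    """Models an XABORT or a conflict found by the processor."""

class ToyRTM:
    def __init__(self, memory):
        self.memory = memory      # committed state, a plain dict
        self.pending = None       # writes buffered by the open transaction
        self.conflict = False     # set by the caller to inject a conflict

    def xbegin(self):
        self.pending = {}

    def store(self, key, value):
        self.pending[key] = value # invisible to others until xend

    def xabort(self):
        raise Abort()

    def xend(self):
        if self.conflict:
            raise Abort()
        self.memory.update(self.pending)   # commit all writes at once
        self.pending = None

def run(rtm, body, fallback):
    """Start a transaction; on any abort, discard writes and run fallback."""
    rtm.xbegin()
    try:
        body(rtm)
        rtm.xend()
        return "committed"
    except Abort:
        rtm.pending = None        # undo everything the transaction wrote
        fallback(rtm.memory)
        return "fallback"

account = {"balance": 100}
rtm = ToyRTM(account)

def deposit(tx):
    tx.store("balance", 150)

def overdraw(tx):
    tx.store("balance", -1)
    tx.xabort()                   # software decides to give up

def note_failure(memory):
    memory["last_error"] = "aborted"

first = run(rtm, deposit, note_failure)
second = run(rtm, overdraw, note_failure)
```

The aborted transaction's write of -1 never becomes visible; only the fallback routine's effect does.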

Thanks to its explicit transaction boundaries and fallback feature, RTM allows much finer control of the transactions than HLE. In the longer term, with software support for transactions (an area that Intel is working on; the company has a proposal for C++ syntax for controlling transactions), RTM could provide a natural way of implementing transactional features.

Technically, Intel is stopping short of promising to actually implement full transactional memory. The documentation of the new instructions notes that "the hardware provides no guarantees as to whether an RTM region will ever successfully commit transactionally". An implementation could, in principle, always use the fallback path. However, the company expects Haswell to support transactions at least most of the time, provided the applications do not attempt to perform certain forbidden operations within a transaction (such as changing the processor's mode).
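Because a commit is never guaranteed, code built on RTM has to produce the right answer even if every transaction aborts. One common-sense policy is to retry a few times and then take a conventional lock-based path. The sketch below assumes a caller-supplied try_transaction that reports success or failure. The function names and the limit of three attempts are arbitrary choices for illustration and do not come from Intel's documentation.

```python
def update_with_retries(try_transaction, locked_update, max_attempts=3):
    """Attempt a transaction a few times, then take the lock-based path.

    try_transaction() returns True on commit and False on abort.
    locked_update() must produce the same result without transactions.
    """
    for attempt in range(1, max_attempts + 1):
        if try_transaction():
            return ("transaction", attempt)
    locked_update()
    return ("fallback", max_attempts)

counter = {"value": 0}

def never_commits():
    return False                  # models hardware that always aborts

def locked_increment():
    counter["value"] += 1

outcomes = iter([False, True])    # abort once, then commit

def commits_second_time():
    committed = next(outcomes)
    if committed:
        counter["value"] += 1
    return committed

worst_case = update_with_retries(never_commits, locked_increment)
usual_case = update_with_retries(commits_second_time, locked_increment)
```

Even on the imaginary processor that aborts every time, the counter still ends up incremented; transactions only change how fast the program gets there.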

Intel's transactional memory might even one-up IBM's. IBM's system only supports transactions within a single (multicore) processor; Intel's documentation appears not to impose any such restriction on TSX.

Until now, transactional memory has been a technique best described as "experimental." The theoretical gains—a simpler programming model that allows much greater concurrency than lock-based systems—are well-known, but practical (software-based) implementations have offset those gains due to their poor performance. Even IBM's implementation is designed with an eye on experimentation to see if transactional memory is useful in practice. But with Intel planning to include the feature in a mainstream, mass-market processor, that changes: transactional memory will start being used for real. For parallel programmers, that's an exciting prospect indeed.

Listing image by Intel Brasil
