In my previous post, I discussed writing a concurrent stack.
In this post, I am going to describe writing a non-blocking queue using the Interlocked family of functions, as in my previous post. This time, however, the implementation is in C#. (Since C# is a garbage-collected language, the ABA problem should not occur; at least I did not see it in my testing.)
Concurrent Queue
A concurrent queue is a queue that protects its internal state from being mutated inconsistently by multiple threads.
A naive way to implement a concurrent queue is to simply take a lock in the enqueue and dequeue functions whenever they modify the head and tail. On the GitHub page, I have provided a sample implementation of this approach in the class ConcurrentQueueUsingLocks. A more sophisticated implementation might use condition variables as a means for the enqueue and dequeue functions to communicate.
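To make the baseline concrete, here is a minimal sketch of the lock-based approach, assuming a singly linked list with a dummy node; the actual ConcurrentQueueUsingLocks class on GitHub may differ in its details.

```csharp
// Illustrative sketch of a lock-based queue; not the exact
// ConcurrentQueueUsingLocks code from the repository.
public class LockBasedQueue<T>
{
    private class Node
    {
        public T Value;
        public Node Next;
    }

    private readonly object _gate = new object();
    private Node _head; // dummy node; the real first element is _head.Next
    private Node _tail;

    public LockBasedQueue()
    {
        _head = _tail = new Node();
    }

    public void Enqueue(T value)
    {
        var node = new Node { Value = value };
        lock (_gate)
        {
            _tail.Next = node;
            _tail = node;
        }
    }

    public bool TryDequeue(out T value)
    {
        lock (_gate)
        {
            var first = _head.Next;
            if (first == null)
            {
                value = default(T);
                return false;
            }
            value = first.Value;
            _head = first; // the dequeued node becomes the new dummy
            return true;
        }
    }
}
```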
The complexity increases significantly when we try to go lock-free. There are many implementations of a lock-free queue that you can find on the web. In this post, I will show a sample implementation and peek into some of the race conditions that may occur and how the code protects against them. In general, simulating race conditions is hard since we don’t have control over when the CPU will schedule a thread. However, inside a debugger, we can ‘freeze’ and ‘thaw’ threads to get an understanding of what may happen in a real-world scenario. Below are the enqueue and dequeue functions.
Enqueue
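A sketch of the enqueue, in the spirit of the classic Michael–Scott algorithm; the exact code is on GitHub, and the type and field names here are illustrative. The queue starts with a dummy node so that head and tail are never null.

```csharp
using System.Threading;

// Sketch of a Michael–Scott style lock-free queue.
public class LockFreeQueue<T>
{
    internal class Node
    {
        public T Value;
        public Node Next;
    }

    private Node head;
    private Node tail;

    public LockFreeQueue()
    {
        head = tail = new Node(); // dummy node
    }

    public void Enqueue(T value)
    {
        var node = new Node { Value = value };
        while (true)
        {
            Node oldTail = tail;
            Node next = oldTail.Next;

            // Retry if the tail moved under us.
            if (oldTail != tail)
                continue;

            if (next == null)
            {
                // oldTail really is the last node; try to link the new node.
                if (Interlocked.CompareExchange(ref oldTail.Next, node, null) == null)
                {
                    // Linked; now try to swing the tail forward. This CAS may
                    // fail if another thread already helped us, which is fine.
                    Interlocked.CompareExchange(ref tail, node, oldTail);
                    return;
                }
            }
            else
            {
                // The tail is lagging; help the other enqueuer along and retry.
                Interlocked.CompareExchange(ref tail, next, oldTail);
            }
        }
    }
}
```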
Dequeue
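And a matching sketch of the dequeue, with the same caveats; this would live in the same class as the enqueue above.

```csharp
// TryDequeue for the LockFreeQueue<T> sketch shown above.
public bool TryDequeue(out T value)
{
    while (true)
    {
        Node oldHead = head;
        Node oldTail = tail;
        Node next = oldHead.Next;

        // Retry if the head moved under us.
        if (oldHead != head)
            continue;

        if (next == null)
        {
            // Queue is empty: only the dummy node remains.
            value = default(T);
            return false;
        }

        if (oldHead == oldTail)
        {
            // head == tail but next != null: an enqueuer linked a node but
            // has not swung the tail yet. Help it along and retry.
            Interlocked.CompareExchange(ref tail, next, oldTail);
            continue;
        }

        // Read the value before moving the head, as in the original algorithm.
        value = next.Value;
        if (Interlocked.CompareExchange(ref head, next, oldHead) == oldHead)
            return true;
    }
}
```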
Let us look at an example where a thread is suspended before it has had the chance to update the tail of the queue. To simulate this, call the enqueue function from two different threads. When the first thread reaches the last CompareExchange call, we ‘freeze’ it in the debugger:
This means that although the node after the tail is the newly created node, the tail itself has not been updated yet. After freezing this thread, we let the second thread run:
As you can see, the tail’s next does not point to null. We atomically fix the tail.
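In the enqueue sketch above, this fix-up is the ‘helping’ branch that the second thread executes when it finds the tail lagging behind:

```csharp
// Excerpt from the Enqueue sketch: next != null means some enqueuer
// linked a node but has not updated the tail yet, so we help it along.
Interlocked.CompareExchange(ref tail, next, oldTail);
```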
A similar exercise can be performed for the dequeue operation, though it gets a little trickier there. This time, call enqueue on one thread, and when that thread reaches the last CompareExchange instruction, freeze it and give the other thread, which performs a dequeue, a chance to run (notice the placement of the breakpoints):
Press F5 and inspect the state of the queue:
Although both the head and the tail are the same, tail’s next is not null! This is a real possibility when the CPU suspends a thread just before it has had a chance to update the tail, and schedules another thread that performs a dequeue operation. In this case, we need to atomically update the tail and continue with our loop.
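In the dequeue sketch above, this case is handled by the helping branch before any value is read:

```csharp
// Excerpt from the TryDequeue sketch: head == tail while tail's next
// is non-null, so swing the tail forward and retry the loop.
if (oldHead == oldTail)
{
    Interlocked.CompareExchange(ref tail, next, oldTail);
    continue;
}
```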
You can download the code from my GitHub page and play around with it to simulate the other race conditions that may occur (see the comments in the code).
Comparison with the Blocking Queue
To compare the blocking and non-blocking queues, I wrote a quick and dirty test which, apart from measuring the running time, also checks for correctness. The idea is to enqueue a set of known values and then pop them. Since we know which values were enqueued, we can test for two things: that all values are popped, and that no value is popped twice. After verifying that both versions worked correctly for one million to ten million integers, I found that the non-blocking queue was not much faster than the blocking queue when the number of threads was small (2). However, with around 8-10 threads, the non-blocking queue outperformed the blocking queue.
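For illustration, such a test might look like the sketch below. The thread counts, names, and structure here are my own choices, not the exact harness from the repository, and LockFreeQueue<T> refers to the sketch from earlier in the post.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Sketch of the correctness/timing test: producers enqueue disjoint
// integer ranges, consumers dequeue everything, and we check that
// each value is seen exactly once.
class QueueTest
{
    const int Producers = 4, Consumers = 4, ItemsPerProducer = 1000000;

    static void Main()
    {
        var queue = new LockFreeQueue<int>();
        var seen = new int[Producers * ItemsPerProducer];
        int dequeued = 0;

        var threads = Enumerable.Range(0, Producers).Select(p => new Thread(() =>
        {
            for (int i = 0; i < ItemsPerProducer; i++)
                queue.Enqueue(p * ItemsPerProducer + i);
        })).Concat(Enumerable.Range(0, Consumers).Select(_ => new Thread(() =>
        {
            while (Volatile.Read(ref dequeued) < seen.Length)
            {
                int value;
                if (!queue.TryDequeue(out value))
                    continue;
                Interlocked.Increment(ref dequeued);
                // "No value is popped twice."
                if (Interlocked.Increment(ref seen[value]) != 1)
                    throw new Exception("Value popped twice: " + value);
            }
        }))).ToArray();

        var sw = Stopwatch.StartNew();
        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        sw.Stop();

        // "All values are popped."
        if (seen.Any(c => c != 1))
            throw new Exception("Some value was never popped!");
        Console.WriteLine("OK in " + sw.ElapsedMilliseconds + " ms");
    }
}
```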
(Sorry, my Excel skills are not that great; I would have produced a graph otherwise ;))
Hope you liked this post!