Algorithmic Optimization of Multithreaded Queues

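Concurrent queues are used in many places: message queues in high-performance servers, work stealing in parallel algorithms, and more all depend on them. A queue has two main operations: adding a node (enqueue) and removing a node (dequeue). While one or more threads are enqueuing onto a queue, one or more other threads may be dequeuing from it at the same time. Because enqueue and dequeue both operate on the nodes of the same queue, implementations usually ensure thread safety by adding a queue lock to the queue struct (typically something like pthread_mutex_t q_lock). Both enqueue and dequeue acquire this lock first, which locks the entire queue, and only then do their work. If implemented well, this design usually performs quite well. It does, however, have a latent performance bottleneck that can severely degrade a multithreaded program's performance as the number of threads grows.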
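
To make the single-lock design described above concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions, not the implementation the full post discusses: apart from pthread_mutex_t q_lock, every name (node_t, queue_t, queue_init, enqueue, dequeue, demo_count) is hypothetical, and the queue stores plain int values for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names below except q_lock are hypothetical, chosen for illustration. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

typedef struct {
    node_t *head;            /* dequeue end */
    node_t *tail;            /* enqueue end */
    pthread_mutex_t q_lock;  /* one lock guarding the whole queue */
} queue_t;

void queue_init(queue_t *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->q_lock, NULL);
}

/* Appends value at the tail. Returns 0 on success, -1 if malloc fails. */
int enqueue(queue_t *q, int value) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;

    pthread_mutex_lock(&q->q_lock);      /* locks the entire queue */
    if (q->tail != NULL) q->tail->next = n;
    else q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->q_lock);
    return 0;
}

/* Removes the head node. Returns 1 and stores its value in *out,
   or returns 0 if the queue is empty. */
int dequeue(queue_t *q, int *out) {
    pthread_mutex_lock(&q->q_lock);      /* same lock as enqueue */
    node_t *n = q->head;
    if (n == NULL) {
        pthread_mutex_unlock(&q->q_lock);
        return 0;
    }
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->q_lock);

    *out = n->value;
    free(n);
    return 1;
}

/* Producer thread body: enqueues 1000 items onto the shared queue. */
static void *producer(void *arg) {
    queue_t *q = arg;
    for (int i = 0; i < 1000; i++) enqueue(q, i);
    return NULL;
}

/* Runs two producers concurrently, then drains the queue and
   returns how many items were dequeued. */
int demo_count(void) {
    queue_t q;
    pthread_t t[2];
    int v, count = 0;

    queue_init(&q);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, producer, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    while (dequeue(&q, &v)) count++;
    pthread_mutex_destroy(&q.q_lock);
    return count;
}
```

Note that enqueue and dequeue contend for the same q_lock even though one touches the tail and the other the head, so every operation on the queue is serialized. Calling demo_count() from a main function and linking with -pthread exercises the sketch.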


How to do performance analysis on your parallelized program efficiently?

Be a scientist: Gather data. Analyze it. Especially when it comes to parallelism and scalability, there’s just no substitute for the advice to measure, measure, measure, and understand what the results mean. Putting together test harnesses and generating and analyzing numbers is work, but the work will reward you with a priceless understanding of how your code actually runs, especially on parallel hardware—an understanding you will never gain from just reading the code or in any other way. And then, at the end, you will ship high-quality parallel code not because you think it’s fast enough, but because you know under what circumstances it is and isn’t (there will always be an “isn’t”), and why.

Herb Sutter
