    Fix deadlock in ColumnFamilyData::InstallSuperVersion() · 97307d88
    Committed by Mike Kolupaev
    Summary:
    Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
    
    This deadlock is hit all the time on our workload. It blocks our release.
    
    In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
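
    The lock-order inversion described above can be sketched in isolation. This is a minimal illustration, not RocksDB code: registry_mutex and db_mutex are hypothetical stand-ins for the ThreadLocalPtr global mutex and DB::mutex_, and RunHandlerLocked and HandlerWouldBlock are invented names. The handler uses try_lock instead of lock so that the sketch reports the conflict rather than hanging.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins, not the real RocksDB objects.
std::mutex registry_mutex;  // plays the role of the ThreadLocalPtr global mutex
std::mutex db_mutex;        // plays the role of DB::mutex_

// The hazard: an arbitrary callback runs while the global mutex is held.
void RunHandlerLocked(const std::function<void()>& handler) {
  std::lock_guard<std::mutex> guard(registry_mutex);
  handler();
}

// Returns true if a handler that needs db_mutex would have blocked while
// another thread (here, the caller) holds db_mutex.
bool HandlerWouldBlock() {
  bool blocked = false;
  // The "flush" side: hold db_mutex first.
  std::lock_guard<std::mutex> db_guard(db_mutex);

  // The "thread exit" side: take registry_mutex, then want db_mutex.
  std::thread exiting([&blocked] {
    RunHandlerLocked([&blocked] {
      if (db_mutex.try_lock()) {
        db_mutex.unlock();
      } else {
        // A real lock() would wait here with registry_mutex still held.
        blocked = true;
      }
    });
  });
  exiting.join();  // join() makes the write to `blocked` visible here

  // If the flush side now tried to take registry_mutex while the handler
  // was still waiting on db_mutex, neither thread could make progress.
  return blocked;
}
```

    Calling HandlerWouldBlock() returns true: the handler cannot get db_mutex while the flush side holds it. With a blocking lock() in the handler and a subsequent registry_mutex acquisition on the flush side, the two threads would wait on each other indefinitely.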
    
    So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple, lock-free things in them.
    
    This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
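
    One way to picture the invariant is a reference-counting sketch in which the owner sweeps the per-thread caches before dropping its own reference, so a thread-exit handler only ever decrements a counter. Everything here is an assumption made for illustration: SuperVersionSketch, OwnerSketch, and their methods are hypothetical names, the vector stands in for real thread-local slots, and the sketch is single-threaded. It is not the actual change to column_family.cc.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical reference-counted object; starts with the owner's reference.
struct SuperVersionSketch {
  std::atomic<int> refs{1};
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  // Returns true if this call dropped the last reference.
  bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Hypothetical owner; `cached` stands in for per-thread cache slots.
struct OwnerSketch {
  SuperVersionSketch* current = new SuperVersionSketch();
  std::vector<SuperVersionSketch*> cached;

  ~OwnerSketch() { delete current; }

  // A reader thread caches the current version, taking its own reference.
  void CacheForThread() {
    current->Ref();
    cached.push_back(current);
  }

  // Thread-exit handler for slot i: a plain decrement, no locks, no cleanup.
  // Returns true if it dropped the last reference, which the invariant
  // says must never happen.
  bool ThreadExitHandler(std::size_t i) {
    SuperVersionSketch* sv = cached[i];
    cached[i] = nullptr;
    return sv != nullptr && sv->Unref();
  }

  // Install a new version: sweep the cached references first, then drop the
  // owner's reference, so the last Unref() and the cleanup happen here.
  // Returns true if the owner's Unref() was the last one.
  bool Install(SuperVersionSketch* next) {
    for (SuperVersionSketch*& sv : cached) {
      if (sv != nullptr) {
        sv->Unref();  // never the last: the owner still holds a reference
        sv = nullptr;
      }
    }
    SuperVersionSketch* old = current;
    current = next;
    bool was_last = old->Unref();
    if (was_last) delete old;
    return was_last;
  }
};
```

    In this ordering the owner's reference outlives every cached reference to the same object, so ThreadExitHandler() always returns false and any nontrivial cleanup stays on the Install() path, where the caller is free to hold whatever locks it needs.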
    
    I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
    Closes https://github.com/facebook/rocksdb/pull/3510
    
    Reviewed By: sagar0
    
    Differential Revision: D7005346
    
    Pulled By: al13n321
    
    fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f