    KVM: Switch to srcu-less get_dirty_log() · 60c34612
    Committed by Takuya Yoshikawa
    We have seen problems with the current implementation of
    get_dirty_log(), which uses synchronize_srcu_expedited() to update
    the dirty bitmaps; e.g., it sometimes introduces millisecond-order
    latency when a VGA display is in use.
    
    Furthermore the recent discussion on the following thread
        "srcu: Implement call_srcu()"
        http://lkml.org/lkml/2012/1/31/211
    also motivated us to implement get_dirty_log() without SRCU.
    
    This patch achieves this goal without sacrificing the performance of
    either VGA or live migration: in practice the new code is much faster
    than the old one unless there are too many dirty pages.
    
    Implementation:
    
    The key part of the implementation is the use of the xchg() operation
    to clear dirty bits atomically.  Since this lets us update only
    BITS_PER_LONG pages at once, we iterate over the dirty bitmap until
    every dirty bit has been cleared, ready for the next call.
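    The fetch-and-clear loop above can be sketched in plain C.  This is a
    hypothetical userspace illustration, not the kernel code:
    harvest_dirty_bits() is an invented name, and GCC's
    __atomic_exchange_n() stands in for the kernel's xchg().

    ```c
    #include <limits.h>
    #include <stddef.h>

    /* Walk a dirty bitmap, atomically fetching and clearing one long's
     * worth of bits at a time.  Because the fetch and the clear happen in
     * a single atomic step, bits set concurrently by writers are never
     * lost.  Returns the total number of dirty bits harvested. */
    static unsigned long harvest_dirty_bits(unsigned long *bitmap,
                                            size_t nlongs)
    {
        unsigned long total = 0;

        for (size_t i = 0; i < nlongs; i++) {
            if (!bitmap[i])
                continue;           /* skip clean words cheaply */

            unsigned long mask = __atomic_exchange_n(&bitmap[i], 0UL,
                                                     __ATOMIC_SEQ_CST);
            total += (unsigned long)__builtin_popcountl(mask);
        }
        return total;
    }
    ```

    The per-word atomic exchange is what bounds the granularity to
    BITS_PER_LONG pages per step, hence the surrounding iteration.
    
    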
    
    Although some people may worry about issuing atomic memory
    instructions many times against a concurrently accessible bitmap, the
    bitmap is usually accessed with mmu_lock held and concurrent accesses
    are rare: so what we need to care about is the pure xchg() overhead.
    
    Another point to note is that we do not use for_each_set_bit() to check
    which pages within each BITS_PER_LONG block are actually dirty.  Instead
    we simply use __ffs() in a loop.  This is much faster than repeatedly
    calling find_next_bit().
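    The __ffs()-in-a-loop idiom can be sketched as follows (a userspace
    approximation with hypothetical names: __builtin_ctzl() stands in for
    the kernel's __ffs(), and mask &= mask - 1 clears the bit just
    handled):

    ```c
    /* Extract the offsets of all set bits in one word by repeatedly
     * taking the lowest set bit and clearing it, avoiding a full
     * find_next_bit()-style rescan for every bit.  Writes each offset
     * into offsets[] and returns how many bits were set. */
    static int collect_offsets(unsigned long mask, int *offsets)
    {
        int n = 0;

        while (mask) {
            offsets[n++] = __builtin_ctzl(mask); /* index of lowest set bit */
            mask &= mask - 1;                    /* clear that bit */
        }
        return n;
    }
    ```

    Because the loop only runs once per set bit and each step is a couple
    of cheap instructions, it beats scanning the word again from the last
    position on every iteration.
    
    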
    
    Performance:
    
    The dirty-log-perf unit test showed nice improvements, several times
    faster than before, except for some extreme cases; in those cases,
    dirty page information is retrieved much faster than userspace can
    process it.
    
    For real workloads, both VGA and live migration, we have observed
    clear improvements: when the guest was reading a file during live
    migration, we originally saw a few ms of latency, but with the new
    method the latency was less than 200us.
    Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
    Signed-off-by: Avi Kivity <avi@redhat.com>