    x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write · 0d1622d7
    Committed by Avi Kivity
    The Intel Architecture Optimization Reference Manual states that a short
    load that follows a long store to the same object will suffer a store
    forwarding penalty, particularly if the two accesses use different addresses.
    Trivially, a long load that follows a short store will also suffer a penalty.
    
    __downgrade_write() in rwsem incurs both penalties:  the increment operation
    will not be able to reuse a recently-loaded rwsem value, and its result will
    not be reused by any recently-following rwsem operation.
    
    A comment in the code states that this is because 64-bit immediates are
    special and expensive; but while they are slightly special (only a single
    instruction allows them), they aren't expensive: a test shows that two loops,
    one loading a 32-bit immediate and one loading a 64-bit immediate, both take
    1.5 cycles per iteration.
    
    Fix this by changing __downgrade_write to use the same add instruction on
    i386 and on x86_64, so that it uses the same operand size as all the other
    rwsem functions.
    Signed-off-by: Avi Kivity <avi@redhat.com>
    LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
rwsem.h 7.5 KB
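
The operand-size mismatch described in the commit message can be illustrated with a small user-space sketch. This is not the kernel's code: the names `SKETCH_WAITING_BIAS`, `downgrade_narrow` and `downgrade_wide` are hypothetical, the bias value of -2^32 is an assumption made for illustration, and a little-endian layout (as on x86) is assumed. The narrow variant touches only the upper 32 bits of the counter, so it mixes access sizes with neighbouring 64-bit operations; the wide variant uses one full-width add, as the commit proposes.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for illustration only: a waiting bias of -2^32. */
#define SKETCH_WAITING_BIAS (-((int64_t)1 << 32))

/*
 * Narrow variant: increment only the upper 32 bits of the 64-bit count.
 * Assumes little-endian byte order, so the upper half sits at byte offset 4.
 * A 32-bit access at offset 4 differs in both size and address from the
 * 64-bit accesses made by other operations on the same counter.
 */
static int64_t downgrade_narrow(int64_t count)
{
    uint32_t hi;

    memcpy(&hi, (unsigned char *)&count + 4, sizeof hi);
    hi += 1;
    memcpy((unsigned char *)&count + 4, &hi, sizeof hi);
    return count;
}

/*
 * Wide variant: subtract the waiting bias with a single 64-bit operation,
 * matching the operand size of the other counter updates.
 * Unsigned arithmetic avoids signed-overflow undefined behaviour.
 */
static int64_t downgrade_wide(int64_t count)
{
    return (int64_t)((uint64_t)count - (uint64_t)SKETCH_WAITING_BIAS);
}
```

Under these assumptions both variants produce the same counter value; the difference lies only in the width of the memory access, which is what determines whether store forwarding can succeed.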