    [IA64] relax per-cpu TLB requirement to DTC · 00b65985
    Chen, Kenneth W committed
    Instead of pinning the per-cpu TLB entry into a DTR (data translation
    register), use the DTC (data translation cache). This frees up one TLB
    entry for applications, or even for the kernel if the access pattern
    to the per-cpu data area has high temporal locality.
    
    Since the per-cpu area is mapped at the top of the region 7 address
    space, we only need to add a special case in alt_dtlb_miss. The
    physical address of the per-cpu data is already conveniently stored in
    IA64_KR(PER_CPU_DATA). The latency of alt_dtlb_miss is not affected
    because all of the extra work can be hidden: the handler was measured
    at 23 cycles of latency both before and after the patch.
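    The special case described above can be modeled in C as a rough sketch.
    It assumes PERCPU_PAGE_SHIFT is 16 and that PERCPU_ADDR covers the last
    PERCPU_PAGE_SIZE bytes of the address space (names borrowed from the
    ia64 headers, values assumed here). `kr_per_cpu_data` is a hypothetical
    variable standing in for IA64_KR(PER_CPU_DATA), and `alt_dtlb_miss_phys`
    is a hypothetical helper. The sketch only computes the physical address
    that the inserted DTC entry would map; the real assembly handler also
    sets the page size and attributes and handles other regions, none of
    which is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define PERCPU_PAGE_SHIFT 16
#define PERCPU_PAGE_SIZE  (UINT64_C(1) << PERCPU_PAGE_SHIFT)
/* Per-cpu window: the last PERCPU_PAGE_SIZE bytes of region 7. */
#define PERCPU_ADDR       (UINT64_C(0) - PERCPU_PAGE_SIZE)
#define REGION_SHIFT      61
#define REGION_MASK       (UINT64_C(7) << REGION_SHIFT)

/* Hypothetical stand-in for the per-cpu physical base in IA64_KR(PER_CPU_DATA). */
static uint64_t kr_per_cpu_data;

/*
 * Model of the physical address chosen on a region 7 alt DTLB miss:
 * the per-cpu window is redirected to this CPU's per-cpu data, and
 * everything else uses the identity mapping (region bits stripped).
 */
static uint64_t alt_dtlb_miss_phys(uint64_t vaddr)
{
    assert((vaddr >> REGION_SHIFT) == 7 && "sketch only models region 7");

    if (vaddr >= PERCPU_ADDR)
        return kr_per_cpu_data + (vaddr - PERCPU_ADDR);

    return vaddr & ~REGION_MASK;
}
```

    With this model, a miss on PERCPU_ADDR + 0x123 resolves to the per-cpu
    base plus 0x123, while an ordinary region 7 address such as
    0xe000000000100000 resolves to physical 0x100000.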
    
    The performance effect is significant for applications that put heavy
    TLB pressure on the CPU. Workloads such as database online transaction
    processing, or applications that use terabytes of memory, benefit the
    most. Measurements with an industry-standard database benchmark showed
    a gain of upwards of 1.6%. Smaller workloads, such as cpu and java
    benchmarks, also show a small improvement.

    Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>