1. 02 Mar, 2017 18 commits
  2. 28 Feb, 2017 6 commits
  3. 25 Feb, 2017 5 commits
  4. 24 Feb, 2017 1 commit
    • J
      objtool: Improve detection of BUG() and other dead ends · d1091c7f
      Committed by Josh Poimboeuf
      The BUG() macro's use of __builtin_unreachable() via the unreachable()
      macro tells gcc that the instruction is a dead end, and that it's safe
      to assume the current code path will not execute past the previous
      instruction.
      
      On x86, the BUG() macro is implemented with the 'ud2' instruction.  When
      objtool's branch analysis sees that instruction, it knows the current
      code path has come to a dead end.
      
      Peter Zijlstra has been working on a patch to change the WARN macros to
      use 'ud2'.  That patch will break objtool's assumption that 'ud2' is
      always a dead end.
      
      Generally it's best for objtool to avoid making those kinds of
      assumptions anyway.  The more ignorant it is of kernel code internals,
      the better.
      
      So create a more generic way for objtool to detect dead ends by adding
      an annotation to the unreachable() macro.  The annotation stores a
      pointer to the end of the unreachable code path in an '__unreachable'
      section.  Objtool can read that section to find the dead ends.
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/41a6d33971462ebd944a1c60ad4bf5be86c17b77.1487712920.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
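The lookup described above can be modeled in a few lines of C. This is a minimal sketch, not objtool's code: `struct insn`, `mark_dead_ends()`, and the flat arrays are hypothetical names chosen for illustration, and the `unreachable` array stands in for the addresses read from the '__unreachable' section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a decoded instruction record. */
struct insn {
	unsigned long offset;	/* start of the instruction in its section */
	unsigned int len;	/* length in bytes */
	bool dead_end;		/* control flow never continues past this one */
};

/*
 * Each entry in 'unreachable' models one address stored in the
 * '__unreachable' section, i.e. the end of an unreachable code path.
 * The instruction that ends exactly at that address is marked as a
 * dead end. Returns the number of instructions newly marked.
 */
static size_t mark_dead_ends(struct insn *insns, size_t n_insns,
			     const unsigned long *unreachable,
			     size_t n_unreachable)
{
	size_t marked = 0;

	assert(insns && unreachable);

	for (size_t i = 0; i < n_unreachable; i++) {
		for (size_t j = 0; j < n_insns; j++) {
			if (insns[j].offset + insns[j].len != unreachable[i])
				continue;
			if (!insns[j].dead_end)
				marked++;
			insns[j].dead_end = true;
			break;
		}
	}
	return marked;
}
```

With this shape the tool needs no knowledge of which opcode ended the path; it only trusts the annotation table.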
  5. 23 Feb, 2017 2 commits
  6. 22 Feb, 2017 2 commits
  7. 21 Feb, 2017 6 commits
    • W
      x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 · dd0fd8bc
      Committed by Waiman Long
      When running the fio sequential write test with an XFS ramdisk
      on a KVM guest running on a 2-socket x86-64 system, the %CPU times
      reported by perf were as follows:
      
       69.75%  0.59%  fio  [k] down_write
       69.15%  0.01%  fio  [k] call_rwsem_down_write_failed
       67.12%  1.12%  fio  [k] rwsem_down_write_failed
       63.48% 52.77%  fio  [k] osq_lock
        9.46%  7.88%  fio  [k] __raw_callee_save___kvm_vcpu_is_preempt
        3.93%  3.93%  fio  [k] __kvm_vcpu_is_preempted
      
      Making vcpu_is_preempted() a callee-save function has a relatively
      high cost on x86-64, primarily due to at least one more cacheline of
      data access from the saving and restoring of registers (8 of them)
      to and from the stack, as well as one more level of function call.
      
      To reduce this performance overhead, an optimized assembly version
      of the __raw_callee_save___kvm_vcpu_is_preempt() function is
      provided for x86-64.
      
      With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
      system with 16 parallel jobs (8 on each socket), the aggregate
      bandwidth of the fio test on an XFS ramdisk was as follows:
      
         I/O Type      w/o patch    with patch
         --------      ---------    ----------
         random read   8141.2 MB/s  8497.1 MB/s
         seq read      8229.4 MB/s  8304.2 MB/s
         random write  1675.5 MB/s  1701.5 MB/s
         seq write     1681.3 MB/s  1699.9 MB/s
      
      The patch increases the aggregate bandwidth modestly, from about
      0.9% (seq read) to about 4.4% (random read).
      
      The perf data then became:
      
       70.78%  0.58%  fio  [k] down_write
       70.20%  0.01%  fio  [k] call_rwsem_down_write_failed
       69.70%  1.17%  fio  [k] rwsem_down_write_failed
       59.91% 55.42%  fio  [k] osq_lock
       10.14% 10.14%  fio  [k] __kvm_vcpu_is_preempted
      
      The assembly code was verified with a test kernel module that
      compared the output of the C __kvm_vcpu_is_preempted() with that of
      the assembly __raw_callee_save___kvm_vcpu_is_preempt() and confirmed
      that they matched.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
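To see why this function is a good candidate for hand-written assembly, it helps to look at how little work it does. The following is a user-space model only: `NR_CPUS_MODEL`, `struct steal_time_model`, and `vcpu_is_preempted_model()` are hypothetical names, and a plain array stands in for the kernel's per-CPU data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical small CPU count for the model */

/* Hypothetical stand-in for the per-vCPU steal-time record. */
struct steal_time_model {
	uint8_t preempted;	/* nonzero while the vCPU is not running */
};

static struct steal_time_model steal_time[NR_CPUS_MODEL];

/*
 * The whole job is one indexed load and one compare against zero.
 * A generic callee-save wrapper around this would spend more time
 * saving and restoring registers than doing the actual check.
 */
static bool vcpu_is_preempted_model(long cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return steal_time[cpu].preempted != 0;
}
```

A caller such as a spinning lock waiter would poll this for the CPU that holds the lock and stop spinning once it returns true.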
    • W
      x86/paravirt: Change vcp_is_preempted() arg type to long · 6c62985d
      Committed by Waiman Long
      The cpu argument in the function prototype of vcpu_is_preempted()
      is changed from int to long. That makes it easier to provide a better
      optimized assembly version of that function.
      
      For Xen, vcpu_is_preempted(long) calls xen_vcpu_stolen(int); the
      downcast from long to int is not a problem because a vCPU number
      won't exceed 32 bits.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
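The narrowing call pattern can be sketched as follows. The names `stolen`, `vcpu_stolen_model()`, and `vcpu_is_preempted_model()` are hypothetical. As an assumption not spelled out in the commit message: on x86-64 an int argument leaves the upper half of its register unspecified, so it must be sign-extended before it can index memory, whereas a long arrives ready to use.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS_MODEL 8	/* hypothetical vCPU count for the model */

static bool stolen[NR_VCPUS_MODEL];	/* hypothetical per-vCPU state */

/* Stand-in for a helper that still takes an int. */
static bool vcpu_stolen_model(int vcpu)
{
	return stolen[vcpu];
}

/* The public entry point takes a long, as in the changed prototype. */
static bool vcpu_is_preempted_model(long cpu)
{
	assert(cpu >= 0 && cpu < NR_VCPUS_MODEL);
	/* vCPU numbers fit in 32 bits, so this narrowing loses nothing. */
	return vcpu_stolen_model((int)cpu);
}
```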
    • C
      KVM: VMX: use correct vmcs_read/write for guest segment selector/base · 96794e4e
      Committed by Chao Peng
      The guest segment selector is a 16-bit field and the guest segment
      base is a natural-width field. Fix two incorrect invocations accordingly.
      
      Without this patch, build fails when aggressive inlining is used with ICC.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
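The width of a VMCS field is encoded in the field number itself, which is what lets an accessor of the wrong width be caught at build time. A minimal sketch of that decoding, assuming the encoding from the Intel SDM (bits 14:13 select the width); `vmcs_field_width()` and `vmcs_check_width()` are hypothetical helpers, not the kernel's own checks.

```c
#include <assert.h>
#include <stdint.h>

/* Width classes selected by bits 14:13 of a VMCS field encoding. */
enum vmcs_width {
	VMCS_16BIT   = 0,
	VMCS_64BIT   = 1,
	VMCS_32BIT   = 2,
	VMCS_NATURAL = 3,
};

/* Field encodings for one guest segment register, per the Intel SDM. */
#define GUEST_ES_SELECTOR 0x0800
#define GUEST_ES_BASE     0x6806

static enum vmcs_width vmcs_field_width(uint32_t field)
{
	return (enum vmcs_width)((field >> 13) & 0x3);
}

/* Run-time stand-in for a build-time width check on an accessor. */
static void vmcs_check_width(uint32_t field, enum vmcs_width expected)
{
	assert(vmcs_field_width(field) == expected);
}
```

Under this model a selector must go through a 16-bit accessor and a base through a natural-width one; swapping them trips the check.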
    • A
      x86/kvm/vmx: Defer TR reload after VM exit · b7ffc44d
      Committed by Andy Lutomirski
      Intel's VMX is daft and resets the hidden TSS limit register to 0x67
      on VMX reload, and the 0x67 is not configurable.  KVM currently
      reloads TR using the LTR instruction on every exit, but this is quite
      slow because LTR is serializing.
      
      The 0x67 limit is entirely harmless unless ioperm() is in use, so
      defer the reload until a task using ioperm() is actually running.
      
      Here's some poorly done benchmarking using kvm-unit-tests:
      
      Before:
      
      cpuid 1313
      vmcall 1195
      mov_from_cr8 11
      mov_to_cr8 17
      inl_from_pmtimer 6770
      inl_from_qemu 6856
      inl_from_kernel 2435
      outl_to_kernel 1402
      
      After:
      
      cpuid 1291
      vmcall 1181
      mov_from_cr8 11
      mov_to_cr8 16
      inl_from_pmtimer 6457
      inl_from_qemu 6209
      inl_from_kernel 2339
      outl_to_kernel 1391
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      [Force-reload TR in invalidate_tss_limit. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
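The deferral can be modeled as a dirty flag that is only acted on when it matters. This is a user-space sketch under assumptions: every name ending in `_model` is hypothetical, a single global flag stands in for per-CPU state, and a counter stands in for executing LTR. The immediate reload when the current task already uses ioperm() is one reading of the bracketed note in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

static bool tss_limit_stale;	/* hypothetical per-CPU "limit is 0x67" flag */
static unsigned int ltr_count;	/* counts simulated LTR executions */

/* Stands in for the slow, serializing LTR instruction. */
static void reload_tr_model(void)
{
	ltr_count++;
	tss_limit_stale = false;
}

/* After a VM exit: reload now only if the running task needs the bitmap. */
static void invalidate_tss_limit_model(bool current_uses_ioperm)
{
	if (current_uses_ioperm)
		reload_tr_model();
	else
		tss_limit_stale = true;
}

/* When switching to a task that uses ioperm(): pay the cost if needed. */
static void refresh_tss_limit_model(void)
{
	if (tss_limit_stale)
		reload_tr_model();
	assert(!tss_limit_stale);
}
```

Any number of VM exits with no ioperm() user in sight then costs zero reloads, and the first ioperm() task to run pays for exactly one.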
    • A
      x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss · d3273dea
      Committed by Andy Lutomirski
      Historically, the entire TSS + io bitmap structure was cacheline
      aligned, but commit ca241c75 ("x86: unify tss_struct") changed it
      (presumably inadvertently) so that the fixed-layout hardware part is
      cacheline-aligned and the io bitmap is after the padding.  This wastes
      24 bytes (the hardware part should be 104 bytes, but this pads it to
      128 bytes), serves no purpose, and causes sizeof(struct x86_hw_tss)
      to have a confusing value.
      
      Drop the pointless alignment.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
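The 104 versus 128 byte arithmetic can be checked directly with a compiler. The sketch below assumes GCC or Clang attributes and a 64-byte cacheline; `struct hw_tss_model` follows the 64-bit hardware TSS layout, but the type and field names are illustrative rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-layout 64-bit hardware TSS; names chosen for illustration. */
struct hw_tss_model {
	uint32_t reserved1;
	uint64_t sp0, sp1, sp2;
	uint64_t reserved2;
	uint64_t ist[7];
	uint32_t reserved3;
	uint32_t reserved4;
	uint16_t reserved5;
	uint16_t io_bitmap_base;
} __attribute__((packed));

/* The same data forced to a 64-byte boundary (assumed cacheline size). */
struct hw_tss_aligned_model {
	struct hw_tss_model tss;
} __attribute__((aligned(64)));

/* 4 + 3*8 + 8 + 7*8 + 4 + 4 + 2 + 2 = 104 bytes of real content. */
static_assert(sizeof(struct hw_tss_model) == 104,
	      "hardware part is 104 bytes");

/* Alignment rounds the size up to the next multiple of 64. */
static_assert(sizeof(struct hw_tss_aligned_model) == 128,
	      "alignment pads the struct to 128 bytes");
```

The difference between the two sizes is the 24 wasted bytes mentioned above.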
    • A
      x86/kvm/vmx: Simplify segment_base() · 8c2e41f7
      Committed by Andy Lutomirski
      Use actual pointer types for pointers (instead of unsigned long) and
      replace hardcoded constants with the appropriate self-documenting
      macros.
      
      The function is still a bit messy, but this seems a lot better than
      before to me.
      
      This is mostly borrowed from a patch by Thomas Garnier.
      
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
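The style of cleanup described here, typed pointers and named masks instead of raw integers and magic numbers, can be illustrated with a small model. This is not the kernel's segment_base(): `struct seg_desc_model` and the `_model` functions are hypothetical, the mask values are the conventional x86 selector bits, and the model ignores the 16-byte system descriptors used in 64-bit mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3	/* requested privilege level bits */
#define SEGMENT_TI_MASK  0x4	/* table indicator: set means LDT */

/* 8-byte legacy segment descriptor; field names are illustrative. */
struct seg_desc_model {
	uint16_t limit0;
	uint16_t base0;		/* base bits 15:0 */
	uint8_t  base1;		/* base bits 23:16 */
	uint8_t  access;
	uint8_t  limit1_flags;
	uint8_t  base2;		/* base bits 31:24 */
};

static bool selector_uses_ldt(uint16_t selector)
{
	return (selector & SEGMENT_TI_MASK) != 0;
}

/* Reassemble the base from its three scattered pieces. */
static uint32_t desc_base_model(const struct seg_desc_model *d)
{
	return (uint32_t)d->base0 |
	       ((uint32_t)d->base1 << 16) |
	       ((uint32_t)d->base2 << 24);
}

static uint32_t segment_base_model(const struct seg_desc_model *table,
				   uint16_t selector)
{
	assert(table);

	/* A selector with only RPL bits set is the null selector. */
	if (!(selector & ~SEGMENT_RPL_MASK))
		return 0;

	/* The descriptor index lives above the TI and RPL bits. */
	return desc_base_model(&table[selector >> 3]);
}
```

The caller would pick `table` as the GDT or the LDT depending on `selector_uses_ldt()`.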