1. 11 3月, 2017 1 次提交
    • T
      kexec, x86/purgatory: Unbreak it and clean it up · 40c50c1f
      Thomas Gleixner 提交于
      The purgatory code defines global variables which are referenced via a
      symbol lookup in the kexec code (core and arch).
      
      A recent commit addressing sparse warnings made these static and thereby
      broke kexec_file.
      
      Why did this happen? Simply because the whole machinery is undocumented and
      lacks any form of forward declarations. The variable names are unspecific
      and lack a prefix, so adding forward declarations creates shadow variables
      in the core code. Aside of that the code relies on magic constants and
      duplicate struct definitions with no way to ensure that these things stay
      in sync. The section placement of the purgatory variables happened by
      chance and not by design.
      
      Unbreak kexec and cleanup the mess:
      
       - Add proper forward declarations and document the usage
       - Use common struct definition
       - Use the proper common defines instead of magic constants
       - Add a purgatory_ prefix to have a proper name space
       - Use ARRAY_SIZE() instead of a homebrewn reimplementation
       - Add proper sections to the purgatory variables [ From Mike ]
      
      Fixes: 72042a8c ("x86/purgatory: Make functions and variables static")
      Reported-by: NMike Galbraith <&lt;efault@gmx.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Mc Guire <der.herr@hofr.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Tobin C. Harding" <me@tobin.cc>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      40c50c1f
  2. 10 3月, 2017 2 次提交
  3. 06 3月, 2017 1 次提交
  4. 03 3月, 2017 1 次提交
    • I
      sched/headers: Move task->mm handling methods to <linux/sched/mm.h> · 68e21be2
      Ingo Molnar 提交于
      Move the following task->mm helper APIs into a new header file,
      <linux/sched/mm.h>, to further reduce the size and complexity
      of <linux/sched.h>.
      
      Here are how the APIs are used in various kernel files:
      
        # mm_alloc():
        arch/arm/mach-rpc/ecard.c
        fs/exec.c
        include/linux/sched/mm.h
        kernel/fork.c
      
        # __mmdrop():
        arch/arc/include/asm/mmu_context.h
        include/linux/sched/mm.h
        kernel/fork.c
      
        # mmdrop():
        arch/arm/mach-rpc/ecard.c
        arch/m68k/sun3/mmu_emu.c
        arch/x86/mm/tlb.c
        drivers/gpu/drm/amd/amdkfd/kfd_process.c
        drivers/gpu/drm/i915/i915_gem_userptr.c
        drivers/infiniband/hw/hfi1/file_ops.c
        drivers/vfio/vfio_iommu_spapr_tce.c
        fs/exec.c
        fs/proc/base.c
        fs/proc/task_mmu.c
        fs/proc/task_nommu.c
        fs/userfaultfd.c
        include/linux/mmu_notifier.h
        include/linux/sched/mm.h
        kernel/fork.c
        kernel/futex.c
        kernel/sched/core.c
        mm/khugepaged.c
        mm/ksm.c
        mm/mmu_context.c
        mm/mmu_notifier.c
        mm/oom_kill.c
        virt/kvm/kvm_main.c
      
        # mmdrop_async_fn():
        include/linux/sched/mm.h
      
        # mmdrop_async():
        include/linux/sched/mm.h
        kernel/fork.c
      
        # mmget_not_zero():
        fs/userfaultfd.c
        include/linux/sched/mm.h
        mm/oom_kill.c
      
        # mmput():
        arch/arc/include/asm/mmu_context.h
        arch/arc/kernel/troubleshoot.c
        arch/frv/mm/mmu-context.c
        arch/powerpc/platforms/cell/spufs/context.c
        arch/sparc/include/asm/mmu_context_32.h
        drivers/android/binder.c
        drivers/gpu/drm/etnaviv/etnaviv_gem.c
        drivers/gpu/drm/i915/i915_gem_userptr.c
        drivers/infiniband/core/umem.c
        drivers/infiniband/core/umem_odp.c
        drivers/infiniband/core/uverbs_main.c
        drivers/infiniband/hw/mlx4/main.c
        drivers/infiniband/hw/mlx5/main.c
        drivers/infiniband/hw/usnic/usnic_uiom.c
        drivers/iommu/amd_iommu_v2.c
        drivers/iommu/intel-svm.c
        drivers/lguest/lguest_user.c
        drivers/misc/cxl/fault.c
        drivers/misc/mic/scif/scif_rma.c
        drivers/oprofile/buffer_sync.c
        drivers/vfio/vfio_iommu_type1.c
        drivers/vhost/vhost.c
        drivers/xen/gntdev.c
        fs/exec.c
        fs/proc/array.c
        fs/proc/base.c
        fs/proc/task_mmu.c
        fs/proc/task_nommu.c
        fs/userfaultfd.c
        include/linux/sched/mm.h
        kernel/cpuset.c
        kernel/events/core.c
        kernel/events/uprobes.c
        kernel/exit.c
        kernel/fork.c
        kernel/ptrace.c
        kernel/sys.c
        kernel/trace/trace_output.c
        kernel/tsacct.c
        mm/memcontrol.c
        mm/memory.c
        mm/mempolicy.c
        mm/migrate.c
        mm/mmu_notifier.c
        mm/nommu.c
        mm/oom_kill.c
        mm/process_vm_access.c
        mm/rmap.c
        mm/swapfile.c
        mm/util.c
        virt/kvm/async_pf.c
      
        # mmput_async():
        include/linux/sched/mm.h
        kernel/fork.c
        mm/oom_kill.c
      
        # get_task_mm():
        arch/arc/kernel/troubleshoot.c
        arch/powerpc/platforms/cell/spufs/context.c
        drivers/android/binder.c
        drivers/gpu/drm/etnaviv/etnaviv_gem.c
        drivers/infiniband/core/umem.c
        drivers/infiniband/core/umem_odp.c
        drivers/infiniband/hw/mlx4/main.c
        drivers/infiniband/hw/mlx5/main.c
        drivers/infiniband/hw/usnic/usnic_uiom.c
        drivers/iommu/amd_iommu_v2.c
        drivers/iommu/intel-svm.c
        drivers/lguest/lguest_user.c
        drivers/misc/cxl/fault.c
        drivers/misc/mic/scif/scif_rma.c
        drivers/oprofile/buffer_sync.c
        drivers/vfio/vfio_iommu_type1.c
        drivers/vhost/vhost.c
        drivers/xen/gntdev.c
        fs/proc/array.c
        fs/proc/base.c
        fs/proc/task_mmu.c
        include/linux/sched/mm.h
        kernel/cpuset.c
        kernel/events/core.c
        kernel/exit.c
        kernel/fork.c
        kernel/ptrace.c
        kernel/sys.c
        kernel/trace/trace_output.c
        kernel/tsacct.c
        mm/memcontrol.c
        mm/memory.c
        mm/mempolicy.c
        mm/migrate.c
        mm/mmu_notifier.c
        mm/nommu.c
        mm/util.c
      
        # mm_access():
        fs/proc/base.c
        include/linux/sched/mm.h
        kernel/fork.c
        mm/process_vm_access.c
      
        # mm_release():
        arch/arc/include/asm/mmu_context.h
        fs/exec.c
        include/linux/sched/mm.h
        include/uapi/linux/sched.h
        kernel/exit.c
        kernel/fork.c
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      68e21be2
  5. 02 3月, 2017 18 次提交
  6. 01 3月, 2017 7 次提交
  7. 28 2月, 2017 2 次提交
  8. 25 2月, 2017 2 次提交
  9. 24 2月, 2017 1 次提交
    • J
      objtool: Improve detection of BUG() and other dead ends · d1091c7f
      Josh Poimboeuf 提交于
      The BUG() macro's use of __builtin_unreachable() via the unreachable()
      macro tells gcc that the instruction is a dead end, and that it's safe
      to assume the current code path will not execute past the previous
      instruction.
      
      On x86, the BUG() macro is implemented with the 'ud2' instruction.  When
      objtool's branch analysis sees that instruction, it knows the current
      code path has come to a dead end.
      
      Peter Zijlstra has been working on a patch to change the WARN macros to
      use 'ud2'.  That patch will break objtool's assumption that 'ud2' is
      always a dead end.
      
      Generally it's best for objtool to avoid making those kinds of
      assumptions anyway.  The more ignorant it is of kernel code internals,
      the better.
      
      So create a more generic way for objtool to detect dead ends by adding
      an annotation to the unreachable() macro.  The annotation stores a
      pointer to the end of the unreachable code path in an '__unreachable'
      section.  Objtool can read that section to find the dead ends.
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/41a6d33971462ebd944a1c60ad4bf5be86c17b77.1487712920.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d1091c7f
  10. 21 2月, 2017 3 次提交
    • W
      x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 · dd0fd8bc
      Waiman Long 提交于
      It was found when running fio sequential write test with a XFS ramdisk
      on a KVM guest running on a 2-socket x86-64 system, the %CPU times
      as reported by perf were as follows:
      
       69.75%  0.59%  fio  [k] down_write
       69.15%  0.01%  fio  [k] call_rwsem_down_write_failed
       67.12%  1.12%  fio  [k] rwsem_down_write_failed
       63.48% 52.77%  fio  [k] osq_lock
        9.46%  7.88%  fio  [k] __raw_callee_save___kvm_vcpu_is_preempt
        3.93%  3.93%  fio  [k] __kvm_vcpu_is_preempted
      
      Making vcpu_is_preempted() a callee-save function has a relatively
      high cost on x86-64 primarily due to at least one more cacheline of
      data access from the saving and restoring of registers (8 of them)
      to and from stack as well as one more level of function call.
      
      To reduce this performance overhead, an optimized assembly version
      of the the __raw_callee_save___kvm_vcpu_is_preempt() function is
      provided for x86-64.
      
      With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
      system with 16 parallel jobs (8 on each socket), the aggregrate
      bandwidth of the fio test on an XFS ramdisk were as follows:
      
         I/O Type      w/o patch    with patch
         --------      ---------    ----------
         random read   8141.2 MB/s  8497.1 MB/s
         seq read      8229.4 MB/s  8304.2 MB/s
         random write  1675.5 MB/s  1701.5 MB/s
         seq write     1681.3 MB/s  1699.9 MB/s
      
      There are some increases in the aggregated bandwidth because of
      the patch.
      
      The perf data now became:
      
       70.78%  0.58%  fio  [k] down_write
       70.20%  0.01%  fio  [k] call_rwsem_down_write_failed
       69.70%  1.17%  fio  [k] rwsem_down_write_failed
       59.91% 55.42%  fio  [k] osq_lock
       10.14% 10.14%  fio  [k] __kvm_vcpu_is_preempted
      
      The assembly code was verified by using a test kernel module to
      compare the output of C __kvm_vcpu_is_preempted() and that of assembly
      __raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      dd0fd8bc
    • W
      x86/paravirt: Change vcp_is_preempted() arg type to long · 6c62985d
      Waiman Long 提交于
      The cpu argument in the function prototype of vcpu_is_preempted()
      is changed from int to long. That makes it easier to provide a better
      optimized assembly version of that function.
      
      For Xen, vcpu_is_preempted(long) calls xen_vcpu_stolen(int), the
      downcast from long to int is not a problem as vCPU number won't exceed
      32 bits.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6c62985d
    • A
      x86/kvm/vmx: Defer TR reload after VM exit · b7ffc44d
      Andy Lutomirski 提交于
      Intel's VMX is daft and resets the hidden TSS limit register to 0x67
      on VMX reload, and the 0x67 is not configurable.  KVM currently
      reloads TR using the LTR instruction on every exit, but this is quite
      slow because LTR is serializing.
      
      The 0x67 limit is entirely harmless unless ioperm() is in use, so
      defer the reload until a task using ioperm() is actually running.
      
      Here's some poorly done benchmarking using kvm-unit-tests:
      
      Before:
      
      cpuid 1313
      vmcall 1195
      mov_from_cr8 11
      mov_to_cr8 17
      inl_from_pmtimer 6770
      inl_from_qemu 6856
      inl_from_kernel 2435
      outl_to_kernel 1402
      
      After:
      
      cpuid 1291
      vmcall 1181
      mov_from_cr8 11
      mov_to_cr8 16
      inl_from_pmtimer 6457
      inl_from_qemu 6209
      inl_from_kernel 2339
      outl_to_kernel 1391
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      [Force-reload TR in invalidate_tss_limit. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b7ffc44d
  11. 14 2月, 2017 1 次提交
  12. 11 2月, 2017 1 次提交