1. 14 April 2016, 3 commits
    • powerpc/livepatch: Add live patching support on ppc64le · 85baa095
      Committed by Michael Ellerman
      Add the kconfig logic & assembly support for handling live patched
      functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
      depends on the new -mprofile-kernel ftrace ABI, which is only supported
      currently on ppc64le.
      
      Live patching is handled by a special ftrace handler. This means it runs
      from ftrace_caller(). The live patch handler modifies the NIP so as to
      redirect the return from ftrace_caller() to the new patched function.
      
      However there is one particularly tricky case we need to handle.
      
      If a function A calls another function B, and it is known at link time
      that they share the same TOC, then A will not save or restore its TOC,
      and will call the local entry point of B.
      
      When we live patch B, we replace it with a new function C, which may
      not have the same TOC as A. At live patch time it's too late to modify A
      to do the TOC save/restore, so the live patching code must interpose
      itself between A and C, and do the TOC save/restore that A omitted.
      
      An additional complication is that the livepatch code cannot create a
      stack frame in order to save the TOC. That is because if C takes > 8
      arguments, or is varargs, A will have written the arguments for C in
      A's stack frame.
      
      To solve this, we introduce a "livepatch stack" which grows upward from
      the base of the regular stack, and is used to store the TOC & LR when
      calling a live patched function.
      
      When the patched function returns, we retrieve the real LR & TOC from
      the livepatch stack, restore them, and pop the livepatch "stack frame".
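
      The redirection itself happens in the generic livepatch ftrace
      handler; a simplified sketch of the key step (the real handler in
      kernel/livepatch/ also takes locks and walks the patch list):

        static void notrace klp_ftrace_handler(unsigned long ip,
                                               unsigned long parent_ip,
                                               struct ftrace_ops *fops,
                                               struct pt_regs *regs)
        {
                struct klp_func *func;

                /* ... look up the klp_func patching the function at ip ... */

                /*
                 * On powerpc klp_arch_set_pc() updates regs->nip, so the
                 * return from ftrace_caller() lands in the new function.
                 */
                klp_arch_set_pc(regs, (unsigned long)func->new_func);
        }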
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      85baa095
    • powerpc/livepatch: Add livepatch stack to struct thread_info · 5d31a96e
      Committed by Michael Ellerman
      In order to support live patching we need to maintain an alternate
      stack of TOC & LR values. We use the base of the stack for this, and
      store the "live patch stack pointer" in struct thread_info.
      
      Unlike the other fields of thread_info, we cannot statically initialise
      that value, so it must be done at run time.
      
      This patch just adds the code to support that, it is not enabled until
      the next patch which actually adds live patch support.
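
      A sketch of the field and its run-time initialisation, along the
      lines of the patch (the "+ 1" skips the STACK_END_MAGIC word at the
      stack base):

        struct thread_info {
                /* ... existing fields ... */
                unsigned long *livepatch_sp;    /* live patch stack pointer */
        };

        static inline void klp_init_thread_info(struct thread_info *ti)
        {
                /* + 1 to account for STACK_END_MAGIC */
                ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
        }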
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      5d31a96e
    • powerpc/livepatch: Add livepatch header · f63e6d89
      Committed by Michael Ellerman
      Add the powerpc specific livepatch definitions. In particular we provide
      a non-default implementation of klp_get_ftrace_location().
      
      This is required because the location of the mcount call is not constant
      when using -mprofile-kernel (which we always do for live patching).
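
      Because the mcount call sits somewhere within the first few
      instructions of the function rather than at a fixed offset, the
      header provides roughly:

        #define klp_get_ftrace_location klp_get_ftrace_location
        static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
        {
                /*
                 * With -mprofile-kernel the ftrace location is always
                 * within the first 16 bytes of the function.
                 */
                return ftrace_location_range(faddr, faddr + 16);
        }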
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f63e6d89
  2. 09 March 2016, 2 commits
    • PCI: Include pci/hotplug/Kconfig directly from pci/Kconfig · e7e127e3
      Committed by Bjorn Helgaas
      Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't
      have to source both pci/Kconfig and pci/hotplug/Kconfig.
      
      Note that this effectively adds pci/hotplug/Kconfig to the following
      arches, because they already sourced drivers/pci/Kconfig but they
      previously did not source drivers/pci/hotplug/Kconfig:
      
        alpha
        arm
        avr32
        frv
        m68k
        microblaze
        mn10300
        sparc
        unicore32
      
      Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      e7e127e3
    • PCI: Include pci/pcie/Kconfig directly from pci/Kconfig · 5f8fc432
      Committed by Bogicevic Sasa
      Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't
      have to source both pci/Kconfig and pci/pcie/Kconfig.
      
      Note that this effectively adds pci/pcie/Kconfig to the following
      arches, because they already sourced drivers/pci/Kconfig but they
      previously did not source drivers/pci/pcie/Kconfig:
      
        alpha
        avr32
        blackfin
        frv
        m32r
        m68k
        microblaze
        mn10300
        parisc
        sparc
        unicore32
        xtensa
      
      [bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace]
      Signed-off-by: Sasa Bogicevic <brutallesale@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      5f8fc432
  3. 08 March 2016, 2 commits
  4. 07 March 2016, 8 commits
    • powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel · 8c50b72a
      Committed by Torsten Duwe
      Firstly we add logic to Kconfig to allow a user to choose if they want
      mprofile-kernel. This has to be user-selectable because only some
      current toolchains support it. If we enabled it unconditionally we would
      prevent some users from building the kernel entirely.
      
      Arguably it would be nice if we could detect if mprofile-kernel was
      available, and use it then. However that would violate the principle of
      least surprise because a user who had chosen options such as live
      patching would then see them quietly disabled at build time.
      
      We also make the user-selectable option negative, ie. it disables when
      selected, so that allyesconfig continues to build on old toolchains.
      
      Once we've decided we do want to use mprofile-kernel, we then add a
      script which checks it actually works. That is because there are
      versions of gcc that accept the flag but don't generate correct code.
      
      Due to the way kconfig works, we can't error out when we detect a
      non-working toolchain. If we did, a user would never be able to modify
      their config and run oldconfig, because the check would block oldconfig
      from running. Instead we emit a warning and add a bogus flag to CFLAGS
      so that the build will fail.
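
      A sketch of the Kconfig side (option names as in the patch; the
      Makefile additionally runs a small script against $(CC) and poisons
      CFLAGS when the test fails):

        config DISABLE_MPROFILE_KERNEL
                bool "Disable use of mprofile-kernel for kernel tracing"
                depends on PPC64 && CPU_LITTLE_ENDIAN
                default y

        config MPROFILE_KERNEL
                depends on PPC64 && CPU_LITTLE_ENDIAN
                def_bool !DISABLE_MPROFILE_KERNEL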
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8c50b72a
    • powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI · 15308664
      Committed by Torsten Duwe
      The gcc switch -mprofile-kernel defines a new ABI for calling _mcount()
      very early in the function with minimal overhead.
      
      Although mprofile-kernel has been available since GCC 3.4, there were
      bugs which were only fixed recently. Currently it is known to work in
      GCC 4.9, 5 and 6.
      
      Additionally there are two possible code sequences generated by the
      flag, the first uses mflr/std/bl and the second is optimised to omit the
      std. Currently only gcc 6 has the optimised sequence. This patch
      supports both sequences.
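
      At the start of a traced function the two sequences look roughly
      like this:

        mflr    r0
        std     r0, 16(r1)
        bl      _mcount

      while gcc 6 omits the store of the saved LR:

        mflr    r0
        bl      _mcount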
      
      Initial work started by Vojtech Pavlik, used with permission.
      
      Key changes:
       - rework _mcount() to work for both the old and new ABIs.
       - implement new versions of ftrace_caller() and ftrace_graph_caller()
         which deal with the new ABI.
       - updates to __ftrace_make_nop() to recognise the new mcount calling
         sequence.
       - updates to __ftrace_make_call() to recognise the nop'ed sequence.
       - implement ftrace_modify_call().
       - updates to the module loader to suppress the toc save in the module
         stub when calling mcount with the new ABI.
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      15308664
    • powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace · 9a7841ae
      Committed by Torsten Duwe
      Rather than open-coding -pg wherever we want to disable ftrace, use the
      existing $(CC_FLAGS_FTRACE) variable.
      
      This has the advantage that it will work in future when we use a
      different set of flags to enable ftrace.
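
      For example, a Makefile that previously disabled tracing of an
      object with "CFLAGS_REMOVE_ftrace.o = -pg" now uses:

        CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)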
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9a7841ae
    • powerpc/ftrace: Use generic ftrace_modify_all_code() · c96f8385
      Committed by Torsten Duwe
      Convert powerpc's arch_ftrace_update_code() from its own version to use
      the generic default functionality (without stop_machine -- our
      instructions are properly aligned and the replacements atomic).
      
      With this we gain error checking and the much-needed function_trace_op
      handling.
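
      After the conversion the arch hook is just a thin wrapper; skipping
      stop_machine is safe because each patched instruction is a single
      aligned word:

        void arch_ftrace_update_code(int command)
        {
                ftrace_modify_all_code(command);
        }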
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Signed-off-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c96f8385
    • powerpc/module: Create a special stub for ftrace_caller() · 336a7b5d
      Committed by Michael Ellerman
      In order to support the new -mprofile-kernel ABI, we need to be able to
      call from the module back to ftrace_caller() (in the kernel) without
      using the module's r2. That is because the function in this module which
      is calling ftrace_caller() may not have set up r2, if it doesn't
      otherwise need it (ie. it accesses no globals).
      
      To make that work we add a new stub which is used for calling
      ftrace_caller(), which uses the kernel toc instead of the module toc.
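
      The stub loads the kernel TOC from the PACA instead of relying on
      the module's r2; roughly:

        ld      r12, PACATOC(r13)   /* kernel TOC is saved in the PACA    */
        addis   r12, r12, <high>    /* add the offset of ftrace_caller()  */
        addi    r12, r12, <low>
        mtctr   r12
        bctr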
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      336a7b5d
    • powerpc/module: Mark module stubs with a magic value · f17c4e01
      Committed by Michael Ellerman
      When a module is loaded, calls out to the kernel go via a stub which is
      generated at runtime. One of these stubs is used to call _mcount(),
      which is the default target of tracing calls generated by the compiler
      with -pg.
      
      If dynamic ftrace is enabled (which it typically is), another stub is
      used to call ftrace_caller(), which is the target of tracing calls when
      ftrace is actually active.
      
      ftrace then wants to disable the calls to _mcount() at module startup,
      and enable/disable the calls to ftrace_caller() when enabling/disabling
      tracing - all of these it does by patching the code.
      
      As part of that code patching, the ftrace code wants to confirm that the
      branch it is about to modify is in fact a call to a module stub which
      calls _mcount() or ftrace_caller().
      
      Currently it does that by inspecting the instructions and confirming
      they are what it expects. Although that works, the code to do it is
      pretty intricate because it requires lots of knowledge about the exact
      format of the stub.
      
      We can make that process easier by marking the generated stubs with a
      magic value, and then looking for that magic value. Although this is not
      as rigorous as the current method, I believe it is sufficient in
      practice.
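
      A sketch of the marking: the stub grows a magic word which the
      ftrace code can check instead of decoding the instructions:

        #define STUB_MAGIC 0x73747562          /* "stub" */

        struct ppc64_stub_entry {
                u32 jump[7];            /* the branch instruction sequence */
                u32 magic;              /* set to STUB_MAGIC on creation   */
                func_desc_t funcdata;   /* data for the above code         */
        };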
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f17c4e01
    • powerpc/module: Only try to generate the ftrace_caller() stub once · 136cd345
      Committed by Michael Ellerman
      Currently we generate the module stub for ftrace_caller() at the bottom
      of apply_relocate_add(). However apply_relocate_add() is potentially
      called more than once per module, which means we will try to generate
      the ftrace_caller() stub multiple times.
      
      Although the current code deals with that correctly, ie. it only
      generates a stub the first time, it would be clearer to only try to
      generate the stub once.
      
      Note also on first reading it may appear that we generate a different
      stub for each section that requires relocation, but that is not the
      case. The code in stub_for_addr() that searches for an existing stub
      uses sechdrs[me->arch.stubs_section], ie. the single stub section for
      this module.
      
      A cleaner approach is to only generate the ftrace_caller() stub once,
      from module_finalize(). Although the original code didn't check to see
      if the stub was actually generated correctly, it seems prudent to add a
      check, so do that. And an additional benefit is we can clean the ifdefs
      up a little.
      
      Finally we must propagate the const'ness of some of the pointers passed
      to module_finalize(), but that is also an improvement.
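
      A sketch of the resulting flow, assuming helper names along the
      lines of the patch:

        static int module_finalize_ftrace(struct module *mod,
                                          const Elf_Shdr *sechdrs)
        {
                mod->arch.tramp = create_ftrace_stub(sechdrs, mod);
                if (!mod->arch.tramp)
                        return -ENOENT;

                return 0;
        }

        int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
                            struct module *me)
        {
                /* ... existing finalisation ... */
                return module_finalize_ftrace(me, sechdrs);
        }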
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      136cd345
    • powerpc: Create a helper for getting the kernel toc value · a5cab83c
      Committed by Michael Ellerman
      Move the logic to work out the kernel toc pointer into a header. This is
      a good cleanup, and also means we can use it elsewhere in future.
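
      The helper simply reads r2, which holds the kernel TOC whenever we
      are running kernel C code:

        static inline unsigned long kernel_toc_addr(void)
        {
                unsigned long toc_ptr;

                asm volatile("mr %0, 2" : "=r" (toc_ptr));
                return toc_ptr;
        }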
      Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      a5cab83c
  5. 04 March 2016, 1 commit
  6. 03 March 2016, 1 commit
    • powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event · fb822e60
      Committed by Ravi Bangoria
      When destroying a hw_breakpoint event, the kernel oopses as follows:
      
        Unable to handle kernel paging request for data at address 0x00000c07
        NIP [c0000000000291d0] arch_unregister_hw_breakpoint+0x40/0x60
        LR [c00000000020b6b4] release_bp_slot+0x44/0x80
      
      Call chain:
      
        hw_breakpoint_event_init()
          bp->destroy = bp_perf_event_destroy;
      
        do_exit()
          perf_event_exit_task()
            perf_event_exit_task_context()
              WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE);
              perf_event_exit_event()
                free_event()
                  _free_event()
                    bp_perf_event_destroy() // event->destroy(event);
                      release_bp_slot()
                        arch_unregister_hw_breakpoint()
      
      perf_event_exit_task_context() sets child_ctx->task to TASK_TOMBSTONE,
      which is (void *)-1. arch_unregister_hw_breakpoint() tries to fetch the
      'thread' attribute of that 'task', resulting in the oops.
      
      Peterz points out that the code shouldn't be using bp->ctx anyway, but
      fixing that will require a decent amount of rework. So for now, to fix
      the oops, check whether bp->ctx->task has been set to (void *)-1 before
      dereferencing it. We don't use TASK_TOMBSTONE, because that would
      require exporting it, and it's supposed to be an internal detail.
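
      The fix is a guard in arch_unregister_hw_breakpoint(), roughly:

        void arch_unregister_hw_breakpoint(struct perf_event *bp)
        {
                /*
                 * bp->ctx->task is set to (void *)-1 (TASK_TOMBSTONE)
                 * when the owning task has exited, so don't touch it.
                 */
                if (bp->ctx && bp->ctx->task && bp->ctx->task != ((void *)-1L))
                        bp->ctx->task->thread.last_hit_ubp = NULL;
        }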
      
      Fixes: 63b6da39 ("perf: Fix perf_event_exit_task() race")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fb822e60
  7. 02 March 2016, 4 commits
  8. 29 February 2016, 9 commits
    • KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection · 520fe9c6
      Committed by Suresh E. Warrier
      Redirecting the wakeup of a VCPU from the H_IPI hypercall to
      a core running in the host is usually a good idea; most workloads
      seemed to benefit. However, in one heavily interrupt-driven SMT1
      workload, some regression was observed. This patch adds a kvm_hv
      module parameter called h_ipi_redirect to control this feature.
      
      The default value for this tunable is 1, that is, the feature is enabled.
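
      A minimal sketch of the tunable's definition (the exact permission
      flags in kvm-hv may differ):

        static int h_ipi_redirect = 1;
        module_param(h_ipi_redirect, int, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");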
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      520fe9c6
    • KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU · e17769eb
      Committed by Suresh E. Warrier
      This patch adds support to real-mode KVM to search for a core
      running in the host partition and send it an IPI message with
      VCPU to be woken. This avoids having to switch to the host
      partition to complete an H_IPI hypercall when the VCPU which
      is the target of the H_IPI is not loaded (is not running
      in the guest).
      
      The patch also includes the support in the IPI handler running
      in the host to do the wakeup by calling kvmppc_xics_ipi_action
      for the PPC_MSG_RM_HOST_ACTION message.
      
      When a guest is being destroyed, we need to ensure that there
      are no pending IPIs waiting to wake up a VCPU before we free
      the VCPUs of the guest. This is accomplished by:
      - forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
        before freeing any VCPUs in kvm_arch_destroy_vm(), and
      - ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed
        before any other PPC_MSG_CALL_FUNCTION messages.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      e17769eb
    • KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM · 0c2a6606
      Committed by Suresh Warrier
      This patch adds support for the kick VCPU operation in
      kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
      provides the function to be invoked for a host-side operation
      when poked by real-mode KVM. This is initiated by KVM by
      sending an IPI to any free host core.
      
      KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
      rm_data to point to the VCPU to be woken up before sending the IPI.
      Note that we have allocated one kvmppc_host_rm_core structure
      per core. The above values need to be set in the structure
      corresponding to the core to which the IPI will be sent.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      0c2a6606
    • KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs · 6f3bb809
      Committed by Suresh Warrier
      The kvmppc_host_rm_ops structure keeps track of which cores
      are in the host by maintaining a bitmask of active/runnable
      online CPUs that have not entered the guest. This patch adds
      support to manage the bitmask when a CPU is offlined or onlined
      in the host.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      6f3bb809
    • KVM: PPC: Book3S HV: Manage core host state · b8e6a87c
      Committed by Suresh Warrier
      Update the core host state in kvmppc_host_rm_ops whenever
      the primary thread of the core enters the guest or returns
      to the host.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      b8e6a87c
    • KVM: PPC: Book3S HV: Host-side RM data structures · 79b6c247
      Committed by Suresh Warrier
      This patch defines the data structures to support setting up
      host-side operations while running in real mode in the guest,
      and also the functions to allocate and free them.
      
      The operations are for now limited to virtual XICS operations.
      Currently, we have only defined one operation in the data
      structure:
               - Wake up a VCPU sleeping in the host when it
                 receives a virtual interrupt
      
      The operations are assigned at the core level because PowerKVM
      requires that the host run in SMT off mode. For each core,
      we will need to manage its state atomically, where the state
      is defined by:
      1. Is the core running in the host?
      2. Is there a Real Mode (RM) operation pending on the host?
      
      Currently, core state is only managed at the whole-core level
      even when the system is in split-core mode. This just limits
      the number of free or "available" cores in the host to perform
      any host-side operations.
      
      The kvmppc_host_rm_core.rm_data allows any data to be passed by
      KVM in real mode to the host core along with the operation to
      be performed.
      
      The kvmppc_host_rm_ops structure is allocated the very first time
      a guest VM is started. Initial core state is also set - all online
      cores are in the host. This structure is never deleted, not even
      when there are no active guests. However, it needs to be freed
      when the module is unloaded because the kvmppc_host_rm_ops_hv
      can contain function pointers to kvm-hv.ko functions for the
      different supported host operations.
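
      The structures look roughly like this (the pad keeps each per-core
      entry in its own cache line):

        union kvmppc_rm_state {
                unsigned long raw;
                struct {
                        u32 in_host;    /* is the core running in the host? */
                        u32 rm_action;  /* pending real-mode operation      */
                };
        };

        struct kvmppc_host_rm_core {
                union kvmppc_rm_state rm_state;
                void *rm_data;          /* e.g. the VCPU to wake            */
                char pad[112];          /* pad to a full cache line         */
        };

        struct kvmppc_host_rm_ops {
                struct kvmppc_host_rm_core *rm_core;    /* one entry per core */
                void (*vcpu_kick)(struct kvm_vcpu *vcpu);
        };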
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      79b6c247
    • powerpc/xics: Add icp_native_cause_ipi_rm · ec13e9b6
      Committed by Suresh Warrier
      Add a function to cause an IPI by directly updating the MFRR register
      in the XICS. The function is meant for real-mode callers, since
      they cannot use the smp_ops->cause_ipi function, which uses an
      ioremapped address.
      
      Normal usage is for the KVM real mode code to set the IPI message
      using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm
      to cause the actual IPI.
      
      The function requires kvm_hstate.xics_phys to have been initialized
      with the physical address of XICS.
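
      A sketch of the function, which writes the MFRR through the cached
      physical address rather than an ioremapped one:

        void icp_native_cause_ipi_rm(int cpu)
        {
                /* kvm_hstate.xics_phys must already hold the physical
                 * address of the XICS for this cpu. */
                unsigned long xics_phys = paca[cpu].kvm_hstate.xics_phys;

                /* Like the cause_ipi functions, a full barrier is
                 * required before causing the IPI. */
                mb();
                __raw_rm_writeb(IPI_PRIORITY, xics_phys + XICS_MFRR);
        }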
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      ec13e9b6
    • powerpc/smp: Add smp_muxed_ipi_set_message · 31639c77
      Committed by Suresh Warrier
      smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which
      uses an ioremapped address to access registers on the XICS
      interrupt controller to cause the IPI. Because of this, real-mode
      callers cannot call smp_muxed_ipi_message_pass() for IPI
      messaging.
      
      This patch creates a separate function smp_muxed_ipi_set_message
      just to set the IPI message without the cause_ipi routine.
      After calling this function to set the IPI message, real
      mode callers must cause the IPI by writing to the XICS registers
      directly.
      
      As part of this, we also change smp_muxed_ipi_message_pass
      to call smp_muxed_ipi_set_message to set the message instead
      of doing it directly inside the routine.
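
      After the split the two functions look roughly like:

        void smp_muxed_ipi_set_message(int cpu, int msg)
        {
                struct cpu_messages *info = &per_cpu(ipi_message, cpu);
                char *message = (char *)&info->messages;

                /* Order previous accesses before accesses in the IPI handler. */
                smp_mb();
                message[msg] = 1;
        }

        void smp_muxed_ipi_message_pass(int cpu, int msg)
        {
                struct cpu_messages *info = &per_cpu(ipi_message, cpu);

                smp_muxed_ipi_set_message(cpu, msg);
                /*
                 * cause_ipi functions are required to include a full
                 * barrier before doing whatever causes the IPI.
                 */
                info->cause_ipi(cpu, info->data);
        }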
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      31639c77
    • powerpc/smp: Support more IPI messages · bd7f561f
      Committed by Suresh Warrier
      This patch increases the number of demuxed messages for a
      controller with a single IPI to 8 for 64-bit systems.
      
      This is required because we want to use the IPI mechanism
      to send messages from a CPU running in KVM real mode in a
      guest to a CPU in the host to take some action. Currently,
      we only support 4 messages and all 4 are already taken.
      
      Define a fifth message PPC_MSG_RM_HOST_ACTION for this
      purpose.
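
      The message numbers then look roughly like this (the first four
      were already taken):

        #define PPC_MSG_CALL_FUNCTION   0
        #define PPC_MSG_RESCHEDULE      1
        #define PPC_MSG_TICK_BROADCAST  2
        #define PPC_MSG_DEBUGGER_BREAK  3
        /* This is only used by the powernv kernel */
        #define PPC_MSG_RM_HOST_ACTION  4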
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      bd7f561f
  9. 28 February 2016, 1 commit
    • mm: ASLR: use get_random_long() · 5ef11c35
      Committed by Daniel Cashman
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long(). Also address a shifting bug which, in
      the case of x86, removed the entropy mask for mmap_rnd_bits values > 31 bits.
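
      An illustrative before/after for the x86 case (simplified; the real
      code also handles compat tasks separately):

        /* before: only 32 bits of entropy, and the 1 << shift is
         * truncated when mmap_rnd_bits > 31 */
        rnd = (unsigned long)get_random_int() % (1 << mmap_rnd_bits);

        /* after */
        rnd = get_random_long() % (1UL << mmap_rnd_bits);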
      Signed-off-by: Daniel Cashman <dcashman@android.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ef11c35
  10. 25 February 2016, 1 commit
    • KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Committed by Marcelo Tosatti
      The problem:
      
      On -rt, an emulated LAPIC timer instance has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
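
      With the simple waitqueue the conversion is mostly mechanical; an
      illustrative fragment of the resulting change:

        -       wait_queue_head_t wq;
        +       struct swait_queue_head wq;

        -       wake_up_interruptible(&vcpu->wq);
        +       swake_up(&vcpu->wq);    /* raw spinlock, safe from hardirq */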
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test seems not to deliver really stable numbers though most of
      them are smaller. Paolo writes:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  1613252.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After:
      
                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originally based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for tscdeadline_latency test.]
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8577370f
  11. 22 February 2016, 2 commits
    • powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23
      Committed by Aneesh Kumar K.V
      We can get a hash pte fault with 4k base page size and find the pte
      already inserted with 64K base page size. In that case we need to clear
      the existing slot information from the old pte. Fix this correctly.
      
      With THP, we also clear the slot information for all the 64K hash
      ptes mapping that 16MB page; they are all invalid now. This makes
      sure we don't find the slot valid when we fault with a 4k base page
      size. Finding the slot valid should not result in any wrong behavior,
      because we check again in the hash page table for validity, but this
      way we can avoid that check completely.
      
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9ab3ac23
    • powerpc/eeh: Fix partial hotplug criterion · f6bf0fa1
      Committed by Gavin Shan
      During error recovery, the device could be removed as part of a
      partial hotplug. The criterion used to decide on partial hotplug
      is: if the device driver provides error_detected(), slot_reset()
      and resume() callbacks, it's immune from hotplug. Otherwise,
      it's going to experience partial hotplug during EEH recovery. But
      the criterion isn't correct enough: the mlx4_core driver for Mellanox
      adapters provides error_detected() and slot_reset() callbacks, but
      resume() isn't there, so those Mellanox adapters aren't immune
      from the partial hotplug even though they should be.
      
      This fixes the criterion to a practical one: an adapter whose driver
      provides error_detected() and slot_reset() will be immune from
      partial hotplug; resume() isn't mandatory.
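
      A sketch of the relaxed check during device removal (structure per
      the description above; the exact code differs):

        if (removed &&
            driver->err_handler &&
            driver->err_handler->error_detected &&
            driver->err_handler->slot_reset)
                return NULL;    /* immune from partial hotplug */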
      
      Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion")
      Cc: stable@vger.kernel.org #v4.4+
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f6bf0fa1
  12. 17 February 2016, 3 commits
  13. 16 February 2016, 3 commits