1. 25 Sep, 2015 1 commit
  2. 16 Sep, 2015 1 commit
    • KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini committed
      This new statistic can help diagnose VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This would show as an abnormally high number of
      attempted polls compared to the successful polls.
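The arithmetic in the example can be checked with a short sketch. It only reproduces the numbers quoted in the message (the poll windows and the wakeup gaps, in microseconds); it does not model KVM's actual halt_poll_ns autotuning, and the variable names are illustrative.

```python
# Poll windows tried by the VCPU, in microseconds, as listed in the message.
poll_windows_us = [10, 20, 40, 80, 160, 320, 480]
# Gaps between wakeups over the same period, in microseconds.
wakeup_gaps_us = [479, 481, 479, 481, 479, 481, 479]

wasted_us = sum(poll_windows_us)    # time spent in polls that all failed
elapsed_us = sum(wakeup_gaps_us)    # wall-clock time covered by the example
wasted_fraction = wasted_us / elapsed_us

print(f"wasted {wasted_us} us of {elapsed_us} us ({wasted_fraction:.1%})")
```

In a case like this every poll is an attempt and none succeeds, which is the kind of ratio the new halt_attempted_poll counter is meant to expose.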
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 11 Sep, 2015 5 commits
    • dma-mapping: consolidate dma_set_mask · 452e06af
      Christoph Hellwig committed
      Almost everyone implements dma_set_mask the same way, although sometimes
      that's hidden in ->set_dma_mask methods.
      
      This patch consolidates those into a common implementation that either
      calls ->set_dma_mask if present or otherwise uses the default
      implementation.  Some architectures used to only call ->set_dma_mask
      after the initial checks, and those instances have been fixed to do the
      full work.  h8300 implemented dma_set_mask bogusly as a no-op and has
      been fixed.
      
      Unfortunately some architectures overload unrelated semantics like changing
      the dma_ops into it so we still need to allow for an architecture override
      for now.
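The consolidated behaviour described above can be modelled roughly as follows. This is a Python sketch, not the kernel's C code: the `Device` class, the dict standing in for `struct dma_map_ops`, and the simplified `dma_supported` helper are all assumptions made for illustration.

```python
EIO = 5  # errno value for an I/O error on Linux

class Device:
    """Hypothetical stand-in for struct device plus its dma_map_ops."""
    def __init__(self, dma_mask=None, ops=None):
        self.dma_mask = dma_mask            # None models a NULL dev->dma_mask
        self.ops = ops if ops is not None else {}

def dma_supported(dev, mask):
    # Simplified for this sketch: ask the ops if they care, else accept.
    hook = dev.ops.get("dma_supported")
    return 1 if hook is None else hook(dev, mask)

def dma_set_mask(dev, mask):
    # Prefer the ops-provided method when one is present.
    hook = dev.ops.get("set_dma_mask")
    if hook is not None:
        return hook(dev, mask)
    # Default implementation: run the initial checks, then store the mask.
    if dev.dma_mask is None or not dma_supported(dev, mask):
        return -EIO
    dev.dma_mask = mask
    return 0
```

Architectures that overload extra semantics onto dma_set_mask would bypass a common routine like this through the override the message mentions.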
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: consolidate dma_supported · ee196371
      Christoph Hellwig committed
      Most architectures just call into ->dma_supported, but some also return 1
      if the method is not present, or 0 if no dma ops are present (although
      that should never happen). Consolidate this broader version into
      common code.
      
      Also fix h8300 which incorrectly always returned 0, which would have been
      a problem if its dma_set_mask implementation wasn't a similarly buggy
      noop.
      
      As a few architectures have much more elaborate implementations, we
      still allow for arch overrides.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: consolidate dma_mapping_error · efa21e43
      Christoph Hellwig committed
      Currently there are three valid implementations of dma_mapping_error:
      
       (1) call ->mapping_error
       (2) check for a hardcoded error code
       (3) always return 0
      
      This patch provides a common implementation that calls ->mapping_error
      if present, then checks for DMA_ERROR_CODE if defined or otherwise
      returns 0.
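The three-way fallback can be sketched as below. This is a Python model under stated assumptions: the dict stands in for `struct dma_map_ops`, and `error_code=None` models an architecture that does not define DMA_ERROR_CODE (in C that choice is made at compile time, not at run time).

```python
def dma_mapping_error(ops, dma_addr, error_code=None):
    """Model of the common fallback order described in the message.

    ops        -- dict standing in for struct dma_map_ops (hypothetical)
    error_code -- value of DMA_ERROR_CODE, or None if the arch defines none
    """
    hook = ops.get("mapping_error")
    if hook is not None:
        # (1) the ops provide ->mapping_error: let it decide
        return int(bool(hook(dma_addr)))
    if error_code is not None:
        # (2) compare against the hardcoded error code
        return int(dma_addr == error_code)
    # (3) nothing to check against: never report an error
    return 0
```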
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: consolidate dma_{alloc,free}_noncoherent · 1e893752
      Christoph Hellwig committed
      Most architectures do not support non-coherent allocations and either
      define dma_{alloc,free}_noncoherent to their coherent versions or stub
      them out.
      
      Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
      implements them directly.
      
      This patch moves the Openrisc version to common code, and handles the
      DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.
      
      Note that actual non-coherent allocations require a dma_cache_sync
      implementation, so if non-coherent allocations didn't work on
      an architecture before this patch they still won't work after it.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} · 6894258e
      Christoph Hellwig committed
      Since 2009 we have a nice asm-generic header implementing lots of DMA API
      functions for architectures using struct dma_map_ops, but unfortunately
      it's still missing a lot of APIs that all architectures still have to
      duplicate.
      
      This series consolidates the remaining functions, although we still need
      arch opt outs for two of them as a few architectures have very
      non-standard implementations.
      
      This patch (of 5):
      
      The coherent DMA allocator works the same over all architectures supporting
      dma_map operations.
      
      This patch consolidates them and converges the minor differences:
      
       - the debug_dma helpers are now called from all architectures, including
         those that were previously missing them
       - dma_alloc_from_coherent and dma_release_from_coherent are now always
         called from the generic alloc/free routines instead of the ops;
         dma-mapping-common.h always includes dma-coherent.h to get the definitions
         for them, or the stubs if the architecture doesn't support this feature
       - checks for ->alloc / ->free presence are removed.  There is only one
         magic instance of dma_map_ops without them (mic_dma_ops) and that one
         is x86 only anyway.
      
      Besides that only x86 needs special treatment to substitute a default device
      if none is passed and to tweak the gfp_flags.  An optional arch hook is provided
      for that.
      
      [linux@roeck-us.net: fix build]
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 03 Sep, 2015 1 commit
    • KVM: PPC: Book3S: Fix size of the PSPB register · f35f3a48
      Thomas Huth committed
      The size of the Problem State Priority Boost Register is only
      32 bits, but the kvm_vcpu_arch->pspb variable is declared as
      "ulong", i.e. 64-bit. However, the assembler code accesses this
      variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro
      is defined with SIZE_U32, too, so that the current code is
      broken on big endian hosts: kvmppc_get_one_reg_hv() will only
      return zero for this register since it is using the wrong half
      of the pspb variable. Let's fix this problem by adjusting the
      size of the pspb field in the kvm_vcpu_arch structure.
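The endianness problem can be illustrated with a small byte-level model. This is a sketch, not kernel code: an 8-byte buffer stands in for the 64-bit field, the 32-bit store models the assembler access at the field's address, and the function name is made up for the example.

```python
import struct

def read_u32_reg_from_u64_field(value32, byteorder):
    """Store 32 bits into a 64-bit field, then read it back as a 32-bit register."""
    fmt = ">" if byteorder == "big" else "<"
    field = bytearray(8)                              # zeroed 64-bit "ulong" field
    struct.pack_into(fmt + "I", field, 0, value32)    # 32-bit store at the field's address
    (as_u64,) = struct.unpack_from(fmt + "Q", field, 0)  # C code reads the whole ulong
    return as_u64 & 0xFFFFFFFF                        # truncated to the register's 32-bit size

print(hex(read_u32_reg_from_u64_field(0x12345678, "little")))
print(hex(read_u32_reg_from_u64_field(0x12345678, "big")))
```

On a big-endian layout the 32-bit store lands in the upper half of the 64-bit field, so the truncated read sees only the zeroed lower half. Declaring the field as 32 bits wide removes the mismatch.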
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  5. 22 Aug, 2015 6 commits
    • powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver · 5d53be7d
      Michael Ellerman committed
      When I merged the OPAL support for the powernv LEDS driver I missed a
      hunk.
      
      This is slightly modified from the original patch, as the original added
      code to opal-api.h which is not in the skiboot version, which is
      discouraged.
      
      Instead those values are moved into the driver, which is the only place
      they are used.
      
      Fixes: 8a8d9181 ("powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states")
      Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S: correct width in XER handling · c63517c2
      Sam Bobroff committed
      In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
      bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
      accessed as such.
      
      This patch corrects places where it is accessed as a 32 bit field by a
      64 bit kernel.  In some cases this is via a 32 bit load or store
      instruction which, depending on endianness, will cause either the
      lower or upper 32 bits to be missed.  In another case it is cast as a
      u32, causing the upper 32 bits to be cleared.
      
      This patch corrects those places by extending the access methods to
      64 bits.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fix bug in dirty page tracking · 08fe1e7b
      Paul Mackerras committed
      This fixes a bug in the tracking of pages that get modified by the
      guest.  If the guest creates a large-page HPTE, writes to memory
      somewhere within the large page, and then removes the HPTE, we only
      record the modified state for the first normal page within the large
      page, when in fact the guest might have modified some other normal
      page within the large page.
      
      To fix this we use some unused bits in the rmap entry to record the
      order (log base 2) of the size of the page that was modified, when
      removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
      order to return the correct number of modified pages.
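A rough model of the idea follows. The bit positions (`CHG_SHIFT`, `CHANGED`) and function names are hypothetical choices for this sketch, not the kernel's definitions, and keeping the largest recorded order is an assumption of the sketch as well.

```python
CHG_SHIFT = 48                        # hypothetical position of the order field
CHG_ORDER_MASK = 0x3F << CHG_SHIFT    # hypothetical 6-bit field for log2(page size)
CHANGED = 1 << 55                     # hypothetical "page was modified" flag

def record_modified(rmap, page_size_bytes):
    """On HPTE removal, remember that a page of this size was modified."""
    order = page_size_bytes.bit_length() - 1          # log2 of a power-of-two size
    old_order = (rmap & CHG_ORDER_MASK) >> CHG_SHIFT
    order = max(order, old_order)                     # sketch assumption: keep the largest
    return (rmap & ~CHG_ORDER_MASK) | CHANGED | (order << CHG_SHIFT)

def clear_dirty_npages(rmap, page_shift):
    """Return (new rmap, number of normal pages to report as dirty)."""
    if not rmap & CHANGED:
        return rmap, 0
    order = (rmap & CHG_ORDER_MASK) >> CHG_SHIFT
    rmap &= ~(CHANGED | CHG_ORDER_MASK)
    npages = 1 << (order - page_shift) if order > page_shift else 1
    return rmap, npages
```

For example, a modified 16 MiB large page with 64 KiB normal pages (`page_shift` of 16) would be reported as 256 dirty pages rather than one.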
      
      The same thing could in principle happen when removing a HPTE at the
      host's request, i.e. when paging out a page, except that we never
      page out large pages, and the guest can only create large-page HPTEs
      if the guest RAM is backed by large pages.  However, we also fix
      this case for the sake of future-proofing.
      
      The reference bit is also subject to the same loss of information.  We
      don't make the same fix here for the reference bit because there isn't
      an interface for userspace to find out which pages the guest has
      referenced, whereas there is one for userspace to find out which pages
      the guest has modified.  Because of this loss of information, the
      kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
      say that a page has not been referenced when it has, but that doesn't
      matter greatly because we never page or swap out large pages.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 · b4deba5c
      Paul Mackerras committed
      This builds on the ability to run more than one vcore on a physical
      core by using the micro-threading (split-core) modes of the POWER8
      chip.  Previously, only vcores from the same VM could be run together,
      and (on POWER8) only if they had just one thread per core.  With the
      ability to split the core on guest entry and unsplit it on guest exit,
      we can run up to 8 vcpu threads from up to 4 different VMs, and we can
      run multiple vcores with 2 or 4 vcpus per vcore.
      
      Dynamic micro-threading is only available if the static configuration
      of the cores is whole-core mode (unsplit), and only on POWER8.
      
      To manage this, we introduce a new kvm_split_mode struct which is
      shared across all of the subcores in the core, with a pointer in the
      paca on each thread.  In addition we extend the core_info struct to
      have information on each subcore.  When deciding whether to add a
      vcore to the set already on the core, we now have two possibilities:
      (a) piggyback the vcore onto an existing subcore, or (b) start a new
      subcore.
      
      Currently, when any vcpu needs to exit the guest and switch to host
      virtual mode, we interrupt all the threads in all subcores and switch
      the core back to whole-core mode.  It may be possible in future to
      allow some of the subcores to keep executing in the guest while
      subcore 0 switches to the host, but that is not implemented in this
      patch.
      
      This adds a module parameter called dynamic_mt_modes which controls
      which micro-threading (split-core) modes the code will consider, as a
      bitmap.  In other words, if it is 0, no micro-threading mode is
      considered; if it is 2, only 2-way micro-threading is considered; if
      it is 4, only 4-way, and if it is 6, both 2-way and 4-way
      micro-threading mode will be considered.  The default is 6.
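The bitmap interpretation of dynamic_mt_modes can be written out as a short sketch. The helper name is made up for illustration; only the parameter values 0, 2, 4 and 6 come from the message.

```python
def considered_split_modes(dynamic_mt_modes):
    """Return the micro-threading modes (2-way, 4-way) enabled by the bitmap."""
    return [ways for ways in (2, 4) if dynamic_mt_modes & ways]

for value in (0, 2, 4, 6):
    print(value, considered_split_modes(value))
```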
      
      With this, we now have secondary threads which are the primary thread
      for their subcore and therefore need to do the MMU switch.  These
      threads will need to be started even if they have no vcpu to run, so
      we use the vcore pointer in the PACA rather than the vcpu pointer to
      trigger them.
      
      It is now possible for thread 0 to find that an exit has been
      requested before it gets to switch the subcore state to the guest.  In
      that case we haven't added the guest's timebase offset to the
      timebase, so we need to be careful not to subtract the offset in the
      guest exit path.  In fact we just skip the whole path that switches
      back to host context, since we haven't switched to the guest context.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Paul Mackerras committed
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Fix warnings from sparse · 5358a963
      Thomas Huth committed
      When compiling the KVM code for POWER with "make C=1", sparse
      complains about functions missing proper prototypes and a 64-bit
      constant missing the ULL suffix. Let's fix this by making the
      functions static or by including the proper header with the
      prototypes, and by appending a ULL suffix to the constant
      PPC_MPPE_ADDRESS_MASK.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  6. 20 Aug, 2015 1 commit
  7. 18 Aug, 2015 5 commits
    • powerpc/eeh: Disable automatically blocked PCI config · 39bfd715
      Gavin Shan committed
      pcibios_set_pcie_reset_state() can be called to complete a reset
      request when passing through a PCI device. The flag EEH_PE_ISOLATED
      is set before the PCI config space is saved. On some Broadcom
      adapters, EEH_PE_CFG_BLOCKED is automatically set when the flag
      EEH_PE_ISOLATED is marked. This causes bogus data to be saved from
      the PCI config space, which is then restored to the PCI adapter
      after the reset. Eventually, the hardware can't work with the
      corrupted data in its PCI config space.
      
      The patch fixes the issue with eeh_pe_state_mark_no_cfg(), which
      doesn't set EEH_PE_CFG_BLOCKED when seeing EEH_PE_ISOLATED on the
      PE, so that bogus data is no longer saved and restored to the PCI
      config space.
      Reported-by: Rajanikanth H. Adaveeshaiah <rajanikanth.ha@in.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops · 53522982
      Andrew Donnellan committed
      Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
      pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv:
      Move dma_set_mask from pnv_phb to pci_controller_ops").
      
      Previous call chain:
      
        0) call dma_get_required_mask() (kernel/dma.c)
        1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
           points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
        2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
           (platforms/powernv/pci.c)
        3) call phb->dma_get_required_mask if it exists
        4) it only exists in the ioda case, where it points to
             pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)
      
      New call chain:
      
        0) call dma_get_required_mask() (kernel/dma.c)
        1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
           if it exists
        2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
           (platforms/powernv/pci-ioda.c)
      
      In the p5ioc2 case, the call chain remains the same -
      dma_get_required_mask() does not find either a ppc_md call or
      pci_controller_ops call, so it calls __dma_get_required_mask().
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index() · 95300577
      Michael Ellerman committed
      Now that support for 64k pages with a 4K kernel is removed, this code is
      unreachable.
      
      CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
      also true.
      
      But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
      includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
      crucially __real_pte, which means this definition can never be used.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • powerpc/cell: Drop support for 64K local store on 4K kernels · f444f1f8
      Michael Ellerman committed
      Back in the olden days we added support for using 64K pages to map the
      SPU (Synergistic Processing Unit) local store on Cell, when the main
      kernel was using 4K pages.
      
      This was useful at the time because distros were using 4K pages, but
      using 64K pages on the SPUs could reduce TLB pressure there.
      
      However these days the number of Cell users is approaching zero, and
      supporting this option adds unpleasant complexity to the memory
      management code.
      
      So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Jeremy Kerr <jk@ozlabs.org>
    • powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 74b5037b
      Michael Ellerman committed
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
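The failure mode and the shape of the fix can be modelled as follows. All constants and the flat slice array are assumptions for this sketch (the real slice bookkeeping is more involved), and where the C code would read junk past the end of the array, Python raises IndexError instead.

```python
SLICE_SHIFT = 40                    # assumed for the sketch: 1 TiB per slice
SLICE_COUNT = 64                    # assumed: 64 slices covering 64 TiB
KERNEL_START = 0xC000000000000000   # assumed start of kernel addresses
MMU_PAGE_4K, MMU_PAGE_64K = 0, 1    # hypothetical page-size index values

def get_slice_psize(slice_psizes, addr):
    # Array lookup keyed on the address: only valid for user addresses.
    return slice_psizes[addr >> SLICE_SHIFT]

def pte_pagesize_index(slice_psizes, addr):
    # Kernel addresses always use 4K pages in this configuration,
    # so never pass them to the slice lookup.
    if addr >= KERNEL_START:
        return MMU_PAGE_4K
    return get_slice_psize(slice_psizes, addr)
```

Calling `get_slice_psize` directly with a kernel address indexes millions of entries past the 64-entry array, which is the out-of-bounds read the message describes.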
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  8. 14 Aug, 2015 1 commit
  9. 08 Aug, 2015 5 commits
    • powerpc/fsl: Force coherent memory on e500mc derivatives · c6023202
      Scott Wood committed
      In CoreNet systems it is not allowed to mix M and non-M mappings to the
      same memory, and coherent DMA accesses are considered to be M mappings
      for this purpose.  Ignoring this has been observed to cause hard
      lockups in non-SMP kernels on e6500.
      
      Furthermore, e6500 implements the LRAT (logical to real address table)
      which allows KVM guests to control the WIMGE bits.  This means that
      KVM cannot force the M bit on the way it usually does, so the guest had
      better set it itself.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc/booke64: Move mb() to __set_pte_at() with kernel-addr test · 0d61f0b3
      Scott Wood committed
      map_kernel() doesn't catch all places that create kernel PTEs.  In
      particular, vmalloc() calls set_pte_at() directly.  This causes a
      crash when booting a non-SMP kernel on e6500.
      
      Move the sync to __set_pte_at(), to be executed only for kernel addresses.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc/mm: Don't call __flush_dcache_icache_phys() with PA>VA · 2f7d2b74
      Scott Wood committed
      __flush_dcache_icache_phys() requires the ability to access the
      memory with the MMU disabled, which means that on a 32-bit system
      any memory above 4 GiB is inaccessible.  In particular, mpc86xx is
      32-bit and can have more than 4 GiB of RAM.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc: add support for csum_add() · 501c8de7
      LEROY Christophe committed
      The C version of csum_add() as defined in include/net/checksum.h gives
      the following assembly in ppc32:
             0:       7c 04 1a 14     add     r0,r4,r3
             4:       7c 64 00 10     subfc   r3,r4,r0
             8:       7c 63 19 10     subfe   r3,r3,r3
             c:       7c 63 00 50     subf    r3,r3,r0
      and the following in ppc64:
         0xc000000000001af8 <+0>:	add     r3,r3,r4
         0xc000000000001afc <+4>:	cmplw   cr7,r3,r4
         0xc000000000001b00 <+8>:	mfcr    r4
         0xc000000000001b04 <+12>:	rlwinm  r4,r4,29,31,31
         0xc000000000001b08 <+16>:	add     r3,r4,r3
         0xc000000000001b0c <+20>:	clrldi  r3,r3,32
         0xc000000000001b10 <+24>:	blr
      
      include/net/checksum.h also offers the possibility to define an arch
      specific function.  This patch provides a specific csum_add() inline
      function.
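For reference, the operation that csum_add() computes is a 32-bit addition with the carry folded back into the result. The sketch below models that behaviour in Python with explicit 32-bit masking; it mirrors the semantics of the generic C helper as understood here, not the powerpc inline assembly added by this patch.

```python
MASK32 = 0xFFFFFFFF

def csum_add(csum, addend):
    """32-bit add with end-around carry (ones' complement style sum)."""
    res = (csum + addend) & MASK32      # wraps like a u32 addition
    carry = 1 if res < addend else 0    # the sum wrapped iff it is below the addend
    return (res + carry) & MASK32

print(hex(csum_add(0xFFFFFFFF, 0x00000001)))
```

The compiler output quoted above spends several instructions recovering that carry, which is what an arch-specific version can avoid by using the processor's carry flag directly.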
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc: put csum_tcpudp_magic inline · 92c985f1
      LEROY Christophe committed
      csum_tcpudp_magic() is only a few instructions, and modifies
      very few registers. So it is not worth having it as a separate
      function and paying for the function call and the saving of
      volatile registers.
      
      This patch makes it inline by use of the already existing
      csum_tcpudp_nofold() function.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
  10. 06 Aug, 2015 3 commits
  11. 04 Aug, 2015 1 commit
  12. 03 Aug, 2015 2 commits
    • locking/static_keys: Add a new static_key interface · 11276d53
      Peter Zijlstra committed
      There are various problems and shortcomings with the current
      static_key interface:
      
       - static_key_{true,false}() read like a branch depending on the key
         value, instead of the actual likely/unlikely branch depending on
         init value.
      
       - static_key_{true,false}() are, as stated above, tied to the
         static_key init values STATIC_KEY_INIT_{TRUE,FALSE}.
      
       - we're limited to the 2 (out of 4) possible options that compile to
         a default NOP because that's what our arch_static_branch() assembly
         emits.
      
      So provide a new static_key interface:
      
        DEFINE_STATIC_KEY_TRUE(name);
        DEFINE_STATIC_KEY_FALSE(name);
      
      Which define a key of different types with an initial true/false
      value.
      
      Then allow:
      
         static_branch_likely()
         static_branch_unlikely()
      
      to take a key of either type and emit the right instruction for the
      case.
      
      This means adding a second arch_static_branch_jump() assembly helper
      which emits a JMP per default.
      
      In order to determine the right instruction for the right state,
      encode the branch type in the LSB of jump_entry::key.
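The LSB encoding can be sketched as below. This is a Python model with made-up helper names; it assumes key addresses are at least 2-byte aligned so the low bit is free, and the NOP/JMP mapping shown is this sketch's reading of "the right instruction for the right state" (fall through when the key state matches the likely hint, jump otherwise).

```python
def encode_entry_key(key_addr, likely):
    """Pack the branch type (likely=1, unlikely=0) into the key address's LSB."""
    assert key_addr & 1 == 0, "key must be aligned so the low bit is free"
    return key_addr | int(likely)

def decode_entry_key(entry_key):
    """Split the stored value back into (key address, likely flag)."""
    return entry_key & ~1, bool(entry_key & 1)

def instruction_for(enabled, likely):
    # Assumed mapping: the inline path is taken with a NOP when the key's
    # current state agrees with the branch hint, and a JMP otherwise.
    return "NOP" if enabled == likely else "JMP"
```

With two key types and two branch hints this covers all four combinations, which is what the second assembly helper with a JMP default makes possible.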
      
      This is the final step in removing the naming confusion that has led to
      a stream of avoidable bugs such as:
      
        a833581e ("x86, perf: Fix static_key bug in load_mm_cr4()")
      
      ... but it also allows new static key combinations that will give us
      performance enhancements in the subsequent patches.
      
      Tested-by: Rabin Vincent <rabin@rab.in> # arm
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11276d53
    • A
      locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire() · 76695af2
      Andrey Konovalov committed
      Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire()
      with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips,
      powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work
      reliably on non-scalar types.
      
      WRITE_ONCE() and READ_ONCE() were introduced in the following commits:
      
        230fa253 ("kernel: Provide READ_ONCE and ASSIGN_ONCE")
        43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")
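
For readers unfamiliar with the release/acquire pairing that smp_store_release() and smp_load_acquire() provide, here is a user-space analogy built on C11 atomics. It does not use the kernel macros, and the `mailbox` type and function names are hypothetical; it only illustrates how a release store publishes earlier writes to a reader that performs the matching acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox: the payload is published by setting ready last. */
struct mailbox {
	int payload;
	atomic_int ready;
};

static void mailbox_publish(struct mailbox *m, int value)
{
	assert(m);
	m->payload = value;
	/* Release: the payload write cannot be reordered after this store. */
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

static bool mailbox_try_read(struct mailbox *m, int *out)
{
	assert(m && out);
	/* Acquire: the payload read cannot be reordered before this load. */
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}
```

A writer thread would call mailbox_publish() once, and a reader would poll mailbox_try_read() until it returns true.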
      
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      76695af2
  13. 29 Jul, 2015 7 commits
    • L
      arch/*/io.h: Add ioremap_uc() to all architectures · 4c73e892
      Luis R. Rodriguez committed
      This adds ioremap_uc() only for architectures that do not
      include asm-generic/io.h, since that header already provides a
      default definition both with and without CONFIG_MMU. Because of
      this, the patch touches fewer architectures than the
      ioremap_wt() patch did ("arch/*/io.h: Add ioremap_wt() to all
      architectures").
      
      To reduce the number of architectures we have to modify when
      adding new architecture IO APIs, we'll have to review the
      architectures in this patch, see why they can't include
      asm-generic/io.h or what issues doing so would create, and
      then consistently include this header towards the end of
      their own header. For instance, arch/metag includes
      asm-generic/io.h *before* the ioremap*() definitions; it
      should be the other way around, but only once asm-generic/io.h
      also has guard wrappers for the non-MMU case.
      
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Abhilash Kesavan <a.kesavan@samsung.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Kyle McMartin <kyle@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-am33-list@redhat.com
      Cc: linux-arch@vger.kernel.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-sh@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20150728181713.GB30479@wotan.suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4c73e892
    • M
      powerpc/kernel: Add SIG_SYS support for compat tasks · 1b60bab0
      Michael Ellerman committed
      SIG_SYS was added in commit a0727e8c "signal, x86: add SIGSYS info
      and make it synchronous."
      
      Because we use the asm-generic struct siginfo, we got support for
      SIG_SYS for free as part of that commit.
      
      However there was no compat handling added for powerpc. That means we've
      been advertising the existence of siginfo._sifields._sigsys to compat
      tasks, but not actually filling in the fields correctly.
      
      Luckily it looks like no one has noticed, presumably because the only
      user of SIGSYS in the kernel is seccomp filter, which we don't support
      yet.
      
      So before we enable seccomp filter, add compat handling for SIGSYS.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      1b60bab0
    • M
      powerpc: Change syscall_get_nr() to return int · e9fbe686
      Michael Ellerman committed
      The documentation for syscall_get_nr() in asm-generic says:
      
       Note this returns int even on 64-bit machines. Only 32 bits of
       system call number can be meaningful. If the actual arch value
       is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
      
      However our implementation was never updated to reflect this.
      
      Generally it's not important, but there is one case where it matters.
      
      For seccomp filter with SECCOMP_RET_TRACE, the tracer will set
      regs->gpr[0] to -1 to reject the syscall. When the task is a compat
      task, this means we end up with 0xffffffff in r0 because ptrace will
      zero extend the 32-bit value.
      
      If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will
      see a positive value in r0 and will incorrectly allow the syscall
      through seccomp.
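
The sign problem can be reproduced with plain integer arithmetic. The following is a minimal user-space sketch, not the powerpc code: `nr_regs` and the `model_*` helpers are hypothetical, and only the register holding the syscall number is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: only r0 (the syscall number) is modelled. */
struct nr_regs {
	uint64_t gpr0;
};

/* A 32-bit tracer writes -1; the value is zero-extended into 64-bit r0. */
static void model_tracer_reject(struct nr_regs *regs)
{
	regs->gpr0 = (uint64_t)(uint32_t)-1;
	assert(regs->gpr0 == 0xffffffffULL);	/* upper 32 bits stay clear */
}

/* Old behaviour: the full 64-bit register is returned unchanged. */
static uint64_t model_get_nr_wide(const struct nr_regs *regs)
{
	return regs->gpr0;
}

/*
 * New behaviour: truncate to 32 bits so 0xffffffff reads back as -1.
 * The unsigned-to-signed conversion is implementation-defined before C23,
 * but wraps modulo 2^32 on GCC and Clang.
 */
static int model_get_nr_int(const struct nr_regs *regs)
{
	return (int)(int32_t)(uint32_t)regs->gpr0;
}

/* Stand-in for a "negative number means reject the syscall" check. */
static int model_is_rejected(long long nr)
{
	return nr < 0;
}
```

With the wide return type the rejected syscall number looks like 4294967295 and passes the check; with the int return type it is -1 and is rejected.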
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      e9fbe686
    • M
      powerpc: Use orig_gpr3 in syscall_get_arguments() · 1cb9839b
      Michael Ellerman committed
      Currently syscall_get_arguments() is used by syscall tracepoints, and
      collect_syscall() which is used in some debugging as well as
      /proc/pid/syscall.
      
      The current implementation just copies regs->gpr[3 .. 5] out, which is
      fine for all the current use cases.
      
      When we enable seccomp filter, that will also start using
      syscall_get_arguments(). However for seccomp filter we want to use r3
      as the return value of the syscall, and orig_gpr3 as the first
      parameter. This will allow seccomp to modify the return value in r3.
      
      To support this we need to modify syscall_get_arguments() to return
      orig_gpr3 instead of r3. This is safe for all uses because orig_gpr3
      always contains the r3 value that was passed to the syscall. We store it
      in the syscall entry path and never modify it.
      
      Update syscall_set_arguments() while we're here, even though it's never
      used.
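
A single-loop version along these lines might look like the sketch below. It is a user-space model with hypothetical names (`arg_regs`, `model_get_arguments`), and the 32-bit mask for compat tasks is an assumption about how the two original loops differed, not something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down register state for illustration only. */
struct arg_regs {
	uint64_t gpr[32];
	uint64_t orig_gpr3;	/* r3 as saved on syscall entry */
};

/* Copy syscall arguments i .. i+n-1 into args[0 .. n-1]. */
static void model_get_arguments(const struct arg_regs *regs, bool compat,
				unsigned int i, unsigned int n,
				uint64_t *args)
{
	/* Assumption: compat tasks only see the low 32 bits of each register. */
	uint64_t mask = compat ? 0xffffffffULL : UINT64_MAX;

	assert(i + n <= 6);	/* at most six syscall arguments */

	while (n--) {
		uint64_t val;

		/* Argument 0 comes from the saved copy, not from live r3. */
		if (i + n == 0)
			val = regs->orig_gpr3;
		else
			val = regs->gpr[3 + i + n];

		args[n] = val & mask;
	}
}
```

Because argument 0 is read from orig_gpr3, live r3 can be overwritten with a return value without changing what the function reports as the first argument.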
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      1cb9839b
    • M
      powerpc: Rework syscall_get_arguments() so there is only one loop · a7657844
      Michael Ellerman committed
      Currently syscall_get_arguments() has two loops, one for compat and one
      for regular tasks. In preparation for the next patch, which changes which
      registers we use, switch it to only have one loop, so we only have one
      place to update.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      a7657844
    • M
      powerpc: Don't negate error in syscall_set_return_value() · 1b1a3702
      Michael Ellerman committed
      Currently the only caller of syscall_set_return_value() is seccomp
      filter, which is not enabled on powerpc.
      
      This means we have not noticed that our implementation of
      syscall_set_return_value() negates error, even though the value passed
      in is already negative.
      
      So remove the negation in syscall_set_return_value(), and expect the
      caller to do it like all other implementations do.
      
      Also add a comment about the ccr handling.
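
The effect of the extra negation is easy to see in isolation. This sketch models only the value written to r3 and deliberately leaves out the ccr handling; `ret_regs` and both helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical register state: only the return-value register is modelled. */
struct ret_regs {
	long gpr3;
};

/* Old behaviour: negates an error that the caller already passed negative. */
static void model_set_return_old(struct ret_regs *regs, int error, long val)
{
	assert(error <= 0);	/* callers pass zero or a negative errno */
	regs->gpr3 = error ? -(long)error : val;
}

/* New behaviour: stores the caller's (already negative) error unchanged. */
static void model_set_return_new(struct ret_regs *regs, int error, long val)
{
	assert(error <= 0);
	regs->gpr3 = error ? (long)error : val;
}
```

Passing -EPERM to the old helper leaves a positive EPERM in r3, while the new helper keeps -EPERM as the caller intended.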
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      1b1a3702
    • M
      powerpc: Drop unused syscall_get_error() · 2923e6d5
      Michael Ellerman committed
      syscall_get_error() is unused, and never has been.
      
      It's also probably wrong, as it negates r3 before returning it, but that
      depends on what the caller is expecting.
      
      It also doesn't deal with compat, and doesn't deal with TIF_NOERROR.
      
      Although we could fix those, until it has a caller and it's clear what
      semantics the caller wants it's just untested code. So drop it.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      2923e6d5
  14. 28 7月, 2015 1 次提交
    • S
      powerpc/powernv: Add definition of OPAL_MSG_OCC message type · 196ba2d5
      Shilpasri G Bhat committed
      Add an OPAL_MSG_OCC message definition to opal_message_type so that
      the kernel can receive OCC events such as reset, load and throttle.
      Host performance can be affected when the OCC is reset or when it
      throttles the max Pstate. We can register with opal_message_notifier
      to receive messages of type OPAL_MSG_OCC and report them to userspace,
      keeping the user informed about the reason for a performance drop in
      workloads.
      
      The reset and load OCC events are notified to kernel when FSP sends
      OCC_RESET and OCC_LOAD commands.  Both reset and load messages are
      sent to kernel on successful completion of reset and load operation
      respectively.
      
      The throttle OCC event indicates that the Pmax of the chip is reduced.
      The chip_id and the throttle reason for reducing Pmax are also queued
      along with the message.
      
      Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      196ba2d5