1. 22 8月, 2015 2 次提交
  2. 08 7月, 2015 1 次提交
  3. 07 7月, 2015 1 次提交
    • S
      powerpc/powernv: Fix race in updating core_idle_state · b32aadc1
      Shreyas B. Prabhu 提交于
      core_idle_state is maintained for each core. It uses 0-7 bits to track
      whether a thread in the core has entered fastsleep or winkle. 8th bit is
      used as a lock bit.
      The lock bit is set in these 2 scenarios-
       - The thread is first in subcore to wakeup from sleep/winkle.
       - If its the last thread in the core about to enter sleep/winkle
      
      While the lock bit is set, if any other thread in the core wakes up, it
      loops until the lock bit is cleared before proceeding in the wakeup
      path. This helps prevent race conditions w.r.t fastsleep workaround and
      prevents threads from switching to process context before core/subcore
      resources are restored.
      
      But, in the path to sleep/winkle entry, we currently don't check for
      lock-bit. This exposes us to following race when running with subcore
      on-
      
      First thread in the subcorea		Another thread in the same
      waking up		   		core entering sleep/winkle
      
      lwarx   r15,0,r14
      ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
      stwcx.  r15,0,r14
      [Code to restore subcore state]
      
      						lwarx   r15,0,r14
      						[clear thread bit]
      						stwcx.  r15,0,r14
      
      andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
      stw     r15,0(r14)
      
      Here, after the thread entering sleep clears its thread bit in
      core_idle_state, the value is overwritten by the thread waking up.
      In such cases when the core enters fastsleep, code mistakes an idle
      thread as running. Because of this, the first thread waking up from
      fastsleep which is supposed to resync timebase skips it. So we can
      end up having a core with stale timebase value.
      
      This patch fixes the above race by looping on the lock bit even while
      entering the idle states.
      Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
      Cc: stable@vger.kernel.org # 3.19+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b32aadc1
  4. 06 7月, 2015 5 次提交
  5. 26 6月, 2015 1 次提交
  6. 25 6月, 2015 8 次提交
  7. 24 6月, 2015 1 次提交
  8. 19 6月, 2015 4 次提交
  9. 18 6月, 2015 4 次提交
  10. 17 6月, 2015 5 次提交
    • S
      powerpc: Make doorbell check preemption safe · 3609d819
      Shreyas B. Prabhu 提交于
      Doorbell can be used to cause ipi on cpus which are sibling threads on
      the same core. So icp_native_cause_ipi checks if the destination cpu
      is a sibling thread of the current cpu and uses doorbell in such cases.
      
      But while running with CONFIG_PREEMPT=y, since this section is
      preemtible, we can run into issues if after we check if the destination
      cpu is a sibling cpu, the task gets migrated from a sibling cpu to a
      cpu on another core.
      
      Fix this by using get_cpu()/ put_cpu()
      Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3609d819
    • P
      powerpc: don't use module_init for non-modular core hugetlb code · 6f114281
      Paul Gortmaker 提交于
      The hugetlbpage.o is obj-y (always built in).  It will never
      be modular, so using module_init as an alias for __initcall is
      somewhat misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of arch_initcall (which
      makes sense for arch code) will thus change this registration
      from level 6-device to level 3-arch (i.e. slightly earlier).
      However no observable impact of that small difference has
      been observed during testing, or is expected.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      6f114281
    • P
      powerpc: use subsys_initcall for Freescale Local Bus · 383d14a5
      Paul Gortmaker 提交于
      The FSL_SOC option is bool, and hence this code is either
      present or absent.  It will never be modular, so using
      module_init as an alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of subsys_initcall (which
      makes sense for bus code) will thus change this registration
      from level 6-device to level 4-subsys (i.e. slightly earlier).
      However no observable impact of that small difference has
      been observed during testing, or is expected.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Scott Wood <scottwood@freescale.com>
      Acked-by: NScott Wood <scottwood@freescale.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      383d14a5
    • P
      powerpc: don't use module_init in non-modular 83xx suspend code · a390a2f1
      Paul Gortmaker 提交于
      The suspend.o is built for SUSPEND -- which is bool, and hence
      this code is either present or absent.  It will never be modular,
      so using module_init as an alias for __initcall can be somewhat
      misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of device_initcall
      directly in this change means that the runtime impact is
      zero -- it will remain at level 6 in initcall ordering.
      
      Cc: Scott Wood <scottwood@freescale.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      a390a2f1
    • P
      powerpc: use device_initcall for registering rtc devices · 8f6b9512
      Paul Gortmaker 提交于
      Currently these two RTC devices are in core platform code
      where it is not possible for them to be modular.  It will
      never be modular, so using module_init as an alias for
      __initcall can be somewhat misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of device_initcall
      directly in this change means that the runtime impact is
      zero -- they will remain at level 6 in initcall ordering.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Acked-by: NGeoff Levand <geoff@infradead.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      8f6b9512
  11. 15 6月, 2015 1 次提交
  12. 11 6月, 2015 7 次提交
    • A
      powerpc: Don't use gcc specific options on clang · 238abecd
      Anton Blanchard 提交于
      We have code to choose between several options, eg. -mabi=elfv2 vs
      -mcall-aixdesc, and -mcmodel=medium vs -mminimal-toc. But these are all
      GCC specific, so use cc-option on all of them.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      238abecd
    • A
      powerpc: Don't use -mno-strict-align on clang · 92d6cf2d
      Anton Blanchard 提交于
      We added -mno-strict-align in commit f036b368 (powerpc: Work around little
      endian gcc bug) to fix gcc bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57134
      
      Clang doesn't understand it. We need to use a conditional because we can't use the
      simpler call cc-option here.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92d6cf2d
    • A
      powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it · a50a862e
      Anton Blanchard 提交于
      These options are not recognised on LLVM, so use call cc-option to check
      for support.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a50a862e
    • A
      powerpc: Only use -mabi=altivec if toolchain supports it · 1fb3f5a7
      Anton Blanchard 提交于
      The -mabi=altivec option is not recognised on LLVM, so use call cc-option
      to check for support.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1fb3f5a7
    • A
      powerpc: Fix duplicate const clang warning in user access code · b91c1e3e
      Anton Blanchard 提交于
      We see a large number of duplicate const errors in the user access
      code when building with llvm/clang:
      
        include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier
            [-Wduplicate-decl-specifier]
              ret = __get_user(c, uaddr);
      
      The problem is we are doing const __typeof__(*(ptr)), which will hit the
      warning if ptr is marked const.
      
      Removing const does not seem to have any effect on GCC code generation.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b91c1e3e
    • A
      vfio: powerpc/spapr: Support Dynamic DMA windows · e633bc86
      Alexey Kardashevskiy 提交于
      This adds create/remove window ioctls to create and remove DMA windows.
      sPAPR defines a Dynamic DMA windows capability which allows
      para-virtualized guests to create additional DMA windows on a PCI bus.
      The existing linux kernels use this new window to map the entire guest
      memory and switch to the direct DMA operations saving time on map/unmap
      requests which would normally happen in a big amounts.
      
      This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
      VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
      Up to 2 windows are supported now by the hardware and by this driver.
      
      This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
      information such as a number of supported windows and maximum number
      levels of TCE tables.
      
      DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
      as we still want to support v2 on platforms which cannot do DDW for
      the sake of TCE acceleration in KVM (coming soon).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e633bc86
    • A
      vfio: powerpc/spapr: Register memory and define IOMMU v2 · 2157e7b8
      Alexey Kardashevskiy 提交于
      The existing implementation accounts the whole DMA window in
      the locked_vm counter. This is going to be worse with multiple
      containers and huge DMA windows. Also, real-time accounting would requite
      additional tracking of accounted pages due to the page size difference -
      IOMMU uses 4K pages and system uses 4K or 64K pages.
      
      Another issue is that actual pages pinning/unpinning happens on every
      DMA map/unmap request. This does not affect the performance much now as
      we spend way too much time now on switching context between
      guest/userspace/host but this will start to matter when we add in-kernel
      DMA map/unmap acceleration.
      
      This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
      New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
      2 new ioctls to register/unregister DMA memory -
      VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
      which receive user space address and size of a memory region which
      needs to be pinned/unpinned and counted in locked_vm.
      New IOMMU splits physical pages pinning and TCE table update
      into 2 different operations. It requires:
      1) guest pages to be registered first
      2) consequent map/unmap requests to work only with pre-registered memory.
      For the default single window case this means that the entire guest
      (instead of 2GB) needs to be pinned before using VFIO.
      When a huge DMA window is added, no additional pinning will be
      required, otherwise it would be guest RAM + 2GB.
      
      The new memory registration ioctls are not supported by
      VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
      will require memory to be preregistered in order to work.
      
      The accounting is done per the user process.
      
      This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
      can do with v1 or v2 IOMMUs.
      
      In order to support memory pre-registration, we need a way to track
      the use of every registered memory region and only allow unregistration
      if a region is not in use anymore. So we need a way to tell from what
      region the just cleared TCE was from.
      
      This adds a userspace view of the TCE table into iommu_table struct.
      It contains userspace address, one per TCE entry. The table is only
      allocated when the ownership over an IOMMU group is taken which means
      it is only used from outside of the powernv code (such as VFIO).
      
      As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
      DDW API), this creates a default DMA window for IODA2 for consistency.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2157e7b8