1. 23 Nov, 2015 2 commits
    • slub: optimize bulk slowpath free by detached freelist · d0ecd894
      Jesper Dangaard Brouer committed
      This change focuses on improving the speed of object freeing in the
      "slowpath" of kmem_cache_free_bulk.
      
      The calls slab_free (fastpath) and __slab_free (slowpath) have been
      extended with support for bulk free, which amortizes the overhead of
      the (locked) cmpxchg_double.
      
      To use the new bulking feature, we build what I call a detached
      freelist.  The detached freelist takes advantage of three properties:
      
       1) the free function call owns the object that is about to be freed,
          thus writing into this memory is synchronization-free.
      
       2) many freelists can co-exist side-by-side in the same slab-page,
          each with a separate head pointer.
      
       3) it is the visibility of the head pointer that needs synchronization.
      
      Given these properties, the brilliant part is that the detached
      freelist can be constructed directly in the page objects, without any
      need for synchronization.  The detached freelist is allocated on the
      stack of the function call kmem_cache_free_bulk.  Thus, the freelist
      head pointer is not visible to other CPUs.
      
      All objects in a SLUB freelist must belong to the same slab-page.
      Thus, constructing the detached freelist is about matching objects
      that belong to the same slab-page.  The bulk free array is scanned in
      a progressive manner with a limited look-ahead facility.
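
      The scan described above can be sketched in user-space C.  This is a
      simplified illustration, not the kernel code: slab_of() stands in for
      the kernel's object-to-page lookup by assuming one slab per 4 KiB-aligned
      block, and SLAB_SIZE, LOOKAHEAD, set_next() and free_bulk_demo() are
      hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_SIZE 4096u /* assumption: one slab == one 4 KiB-aligned block */
#define LOOKAHEAD 3     /* assumption: foreign objects tolerated per pass */

struct detached_freelist {
	uintptr_t slab; /* slab that all linked objects belong to */
	void *freelist; /* head pointer, private to the caller's stack */
	void *tail;     /* last object in the list */
	int cnt;        /* number of linked objects */
};

/* Stand-in for the kernel's object-to-page lookup. */
static uintptr_t slab_of(const void *obj)
{
	return (uintptr_t)obj & ~((uintptr_t)SLAB_SIZE - 1);
}

/* The caller owns the object, so the link is written into it directly. */
static void set_next(void *obj, void *next)
{
	*(void **)obj = next;
}

/*
 * Scan p[0..size-1] from the end, link every object that shares a slab
 * with the last one, and clear the consumed array slots.  Returns how
 * many leading array entries still need another pass (0 when done).
 */
static size_t build_detached_freelist(size_t size, void **p,
				      struct detached_freelist *df)
{
	size_t first_skipped = 0;
	int lookahead = LOOKAHEAD;
	void *object;

	assert(p != NULL && df != NULL);
	df->slab = 0;
	df->freelist = NULL;
	df->tail = NULL;
	df->cnt = 0;

	/* Drop slots that an earlier pass already consumed. */
	while (size > 0 && p[size - 1] == NULL)
		size--;
	if (size == 0)
		return 0;

	/* The last object decides which slab this list is for. */
	object = p[--size];
	p[size] = NULL;
	df->slab = slab_of(object);
	set_next(object, NULL);
	df->freelist = df->tail = object;
	df->cnt = 1;

	while (size > 0) {
		object = p[--size];
		if (object == NULL)
			continue;
		if (slab_of(object) == df->slab) {
			/* Same slab: push onto the private list. */
			set_next(object, df->freelist);
			df->freelist = object;
			df->cnt++;
			p[size] = NULL;
			continue;
		}
		/* Different slab: give up after a few misses. */
		if (--lookahead == 0)
			break;
		if (first_skipped == 0)
			first_skipped = size + 1;
	}
	return first_skipped;
}

/* Example driver: returns how many detached freelists were needed. */
static int free_bulk_demo(size_t size, void **p)
{
	struct detached_freelist df;
	int lists = 0;

	do {
		size = build_detached_freelist(size, p, &df);
		if (df.cnt == 0)
			continue;
		/* A real allocator would free df.freelist..df.tail here. */
		lists++;
	} while (size > 0);
	return lists;
}
```

      Each pass produces one list whose head exists only in the caller's
      struct, so nothing is published to other CPUs until the whole list is
      handed to the free path in a single step.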
      
      Kmem debug support is handled in the call to slab_free().
      
      Notice that kmem_cache_free_bulk no longer needs to disable IRQs.  This
      only slowed down single-object bulk free by approx 3 cycles.
      
      Performance data:
       Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz
      
      SLUB fastpath single object quick reuse: 47 cycles(tsc) 11.931 ns
      
      To get stable and comparable numbers, the kernel has been booted with
      "slab_nomerge" (this also improves performance for larger bulk sizes).
      
      Performance data, compared against fallback bulking:
      
      bulk -  fallback bulk            - improvement with this patch
         1 -  62 cycles(tsc) 15.662 ns - 49 cycles(tsc) 12.407 ns- improved 21.0%
         2 -  55 cycles(tsc) 13.935 ns - 30 cycles(tsc) 7.506 ns - improved 45.5%
         3 -  53 cycles(tsc) 13.341 ns - 23 cycles(tsc) 5.865 ns - improved 56.6%
         4 -  52 cycles(tsc) 13.081 ns - 20 cycles(tsc) 5.048 ns - improved 61.5%
         8 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.659 ns - improved 64.0%
        16 -  49 cycles(tsc) 12.412 ns - 17 cycles(tsc) 4.495 ns - improved 65.3%
        30 -  49 cycles(tsc) 12.484 ns - 18 cycles(tsc) 4.533 ns - improved 63.3%
        32 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.707 ns - improved 64.0%
        34 -  96 cycles(tsc) 24.243 ns - 23 cycles(tsc) 5.976 ns - improved 76.0%
        48 -  83 cycles(tsc) 20.818 ns - 21 cycles(tsc) 5.329 ns - improved 74.7%
        64 -  74 cycles(tsc) 18.700 ns - 20 cycles(tsc) 5.127 ns - improved 73.0%
       128 -  90 cycles(tsc) 22.734 ns - 27 cycles(tsc) 6.833 ns - improved 70.0%
       158 -  99 cycles(tsc) 24.776 ns - 30 cycles(tsc) 7.583 ns - improved 69.7%
       250 - 104 cycles(tsc) 26.089 ns - 37 cycles(tsc) 9.280 ns - improved 64.4%
      
      Performance data, compared against current in-kernel bulking:
      
      bulk - curr in-kernel  - improvement with this patch
         1 -  46 cycles(tsc) - 49 cycles(tsc) - improved (cycles:-3) -6.5%
         2 -  27 cycles(tsc) - 30 cycles(tsc) - improved (cycles:-3) -11.1%
         3 -  21 cycles(tsc) - 23 cycles(tsc) - improved (cycles:-2) -9.5%
         4 -  18 cycles(tsc) - 20 cycles(tsc) - improved (cycles:-2) -11.1%
         8 -  17 cycles(tsc) - 18 cycles(tsc) - improved (cycles:-1) -5.9%
        16 -  18 cycles(tsc) - 17 cycles(tsc) - improved (cycles: 1)  5.6%
        30 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
        32 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
        34 -  78 cycles(tsc) - 23 cycles(tsc) - improved (cycles:55) 70.5%
        48 -  60 cycles(tsc) - 21 cycles(tsc) - improved (cycles:39) 65.0%
        64 -  49 cycles(tsc) - 20 cycles(tsc) - improved (cycles:29) 59.2%
       128 -  69 cycles(tsc) - 27 cycles(tsc) - improved (cycles:42) 60.9%
       158 -  79 cycles(tsc) - 30 cycles(tsc) - improved (cycles:49) 62.0%
       250 -  86 cycles(tsc) - 37 cycles(tsc) - improved (cycles:49) 57.0%
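
      The percentage columns in the two tables above are consistent with
      the relative change (old - new) / old.  A minimal sketch of that
      arithmetic follows; improvement_pct() and print_row() are hypothetical
      helper names, not part of the benchmark module.

```c
#include <assert.h>
#include <stdio.h>

/* Relative improvement in percent; negative means the new code is slower. */
static double improvement_pct(int old_cycles, int new_cycles)
{
	assert(old_cycles > 0);
	return 100.0 * (old_cycles - new_cycles) / old_cycles;
}

/* Print one table row in roughly the layout used above. */
static void print_row(int bulk, int old_cycles, int new_cycles)
{
	printf("%4d - %3d cycles(tsc) - %2d cycles(tsc) - improved (cycles:%2d) %.1f%%\n",
	       bulk, old_cycles, new_cycles, old_cycles - new_cycles,
	       improvement_pct(old_cycles, new_cycles));
}
```

      A negative result, as in the rows for bulk sizes 1 to 8 of the second
      table, means the patched code needed more cycles than the code it
      replaced.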
      
      Performance with normal SLUB merging is significantly slower for
      larger bulking.  This is believed to (primarily) be an effect of not
      having to share the per-CPU data-structures, as tuning per-CPU size
      can achieve similar performance.
      
      bulk - slab_nomerge   -  normal SLUB merge
         1 -  49 cycles(tsc) - 49 cycles(tsc) - merge slower with cycles:0
         2 -  30 cycles(tsc) - 30 cycles(tsc) - merge slower with cycles:0
         3 -  23 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:0
         4 -  20 cycles(tsc) - 20 cycles(tsc) - merge slower with cycles:0
         8 -  18 cycles(tsc) - 18 cycles(tsc) - merge slower with cycles:0
        16 -  17 cycles(tsc) - 17 cycles(tsc) - merge slower with cycles:0
        30 -  18 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:5
        32 -  18 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:4
        34 -  23 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:-1
        48 -  21 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:1
        64 -  20 cycles(tsc) - 48 cycles(tsc) - merge slower with cycles:28
       128 -  27 cycles(tsc) - 57 cycles(tsc) - merge slower with cycles:30
       158 -  30 cycles(tsc) - 59 cycles(tsc) - merge slower with cycles:29
       250 -  37 cycles(tsc) - 56 cycles(tsc) - merge slower with cycles:19
      
      Joint work with Alexander Duyck.
      
      [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
      
      [akpm@linux-foundation.org: BUG_ON -> WARN_ON;return]
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: support for bulk free with SLUB freelists · 81084651
      Jesper Dangaard Brouer committed
      Make it possible to free a freelist with several objects by adjusting the
      API of slab_free() and __slab_free() to have head, tail and an objects
      counter (cnt).
      
      Tail being NULL indicates a single-object free of the head object.  This
      allows compiler inline constant propagation in slab_free() and
      slab_free_freelist_hook() to avoid adding any overhead in the case of a
      single-object free.
      
      This allows a freelist with several objects (all within the same
      slab-page) to be freed using a single locked cmpxchg_double in
      __slab_free() and with an unlocked cmpxchg_double in slab_free().
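
      The head/tail/cnt convention can be illustrated with a user-space
      sketch built on C11 atomics.  It is a simplification under stated
      assumptions: struct slab, struct object and slab_free_chain() are
      hypothetical names, and a single-pointer compare-and-swap plus a
      separate counter stand in for the kernel's cmpxchg_double, which
      updates the freelist together with its companion word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct object {
	struct object *next; /* free pointer stored inside the object */
};

struct slab {
	_Atomic(struct object *) freelist; /* shared head pointer */
	atomic_int free_objects;           /* simplified free counter */
};

/*
 * Splice the pre-linked chain head..tail (cnt objects) onto the slab
 * freelist with one successful compare-and-swap.  A NULL tail means a
 * single-object free of head.
 */
static void slab_free_chain(struct slab *s, struct object *head,
			    struct object *tail, int cnt)
{
	struct object *old;

	assert(s != NULL && head != NULL && cnt > 0);
	if (tail == NULL)
		tail = head;

	old = atomic_load(&s->freelist);
	do {
		/* Only the tail link changes per retry; the chain stays intact. */
		tail->next = old;
	} while (!atomic_compare_exchange_weak(&s->freelist, &old, head));

	/* Assumption: the real code updates this together with the head. */
	atomic_fetch_add(&s->free_objects, cnt);
}
```

      However long the chain is, only one atomic update of the shared head
      pointer is needed, which is where the amortization comes from.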
      
      Object debugging on the free path is also extended to handle these
      freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
      objects don't belong to the same slab-page.
      
      These changes are needed for the next patch to bulk free the detached
      freelists it introduces and constructs.
      
      Micro benchmarking showed no performance reduction due to this change,
      when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 21 Nov, 2015 22 commits
  3. 20 Nov, 2015 15 commits
    • Merge tag 'dmaengine-fix-4.4-rc2' of git://git.infradead.org/users/vkoul/slave-dma · 86eaf54d
      Linus Torvalds committed
      Pull dmaengine fixes from Vinod Koul:
       "This has odd fixes spread out across drivers, nothing major here
      
         - usbdmac fixes for pm
         - edma build and logic fixes
         - build warn fixes for few drivers"
      
      * tag 'dmaengine-fix-4.4-rc2' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: at_hdmac: use %pad format string for dma_addr_t
        dmaengine: at_xdmac: use %pad format string for dma_addr_t
        dmaengine: imx-sdma: remove __init annotation on sdma_event_remap
        dmaengine: edma: predecence bug in GET_NUM_QDMACH()
        dmaengine: edma: fix build without CONFIG_OF
        dmaengine: of_dma: Correct return code for of_dma_request_slave_channel in case !CONFIG_OF
        dmaengine: sh: usb-dmac: Fix pm_runtime_{enable,disable}() imbalance
        dmaengine: sh: usb-dmac: Fix crash on runtime suspend
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · c69bde78
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "A varied bunch of fixes, the radeon pull is probably a bit larger than
        I'd like, but it contains 2 weeks of stuff, and the Fiji fixes are a
        bit large, but they are Fiji specific.
      
        Otherwise:
      
         - mgag200: One cursor regression oops fix.
         - vc4: A few small fixes and cleanups.
         - core: Atomic fixes and Atomic helper fixes
         - i915: Revert for the backlight regression along with a bunch of
           fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (58 commits)
        drm/atomic-helper: Check encoder/crtc constraints
        Revert "drm/i915: skip modeset if compatible for everyone."
        drm/mgag200: fix kernel hang in cursor code.
        drm/amdgpu: reserve/unreserve objects out of map/unmap operations
        drm/amdgpu: move bo_reserve out of amdgpu_vm_clear_bo
        drm/amdgpu: add lock for interval tree in vm
        drm/amdgpu: keep the owner for VMIDs
        drm/amdgpu: move VM manager clean into the VM code again
        drm/amdgpu: cleanup VM coding style
        drm/amdgpu: remove unused VM manager field
        drm/amdgpu: cleanup scheduler command submission
        drm/amdgpu: fix typo in firmware name
        drm/i915: Consider SPLL as another shared pll, v2.
        drm/i915: Fix gpu frequency change tracing
        drm/vc4: Make sure that planes aren't scaled.
        drm/vc4: Fix some failure to track __iomem decorations on pointers.
        drm/vc4: checking for NULL instead of IS_ERR
        drm/vc4: fix itnull.cocci warnings
        drm/vc4: fix platform_no_drv_owner.cocci warnings
        drm/vc4: vc4_plane_duplicate_state() can be static
        ...
    • Merge tag 'for-linus-4.4' of git://git.code.sf.net/p/openipmi/linux-ipmi · cd6caf55
      Linus Torvalds committed
      Pull IPMI updates from Corey Minyard:
       "Some fixes for small IPMI problems.
      
        The most significant is that the driver wasn't starting the timer for
        some messages, which would result in problems if that message failed
        for some reason.
      
        The others are small optimizations or making things a little neater"
      
      * tag 'for-linus-4.4' of git://git.code.sf.net/p/openipmi/linux-ipmi:
        ipmi watchdog : add panic_wdt_timeout parameter
        char: ipmi: Move MODULE_DEVICE_TABLE() to follow struct
        ipmi: Stop the timer immediately if idle
        ipmi: Start the timer and thread on internal msgs
    • Merge tag 'renesas-sh-drivers-for-v4.4' of... · 8bdddfae
      Linus Torvalds committed
      Merge tag 'renesas-sh-drivers-for-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
      
      Pull SH driver fixlet from Simon Horman:
       "I am sending this change after v4.4-rc1 has been released as it
        depends on SoC changes which are present in that rc:
      
         - Remove now unnecessary reference to CONFIG_ARCH_SHMOBILE_MULTI"
      
      * tag 'renesas-sh-drivers-for-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
        drivers: sh: Get rid of CONFIG_ARCH_SHMOBILE_MULTI
    • Merge branches 'acpi-smbus', 'acpi-ec' and 'acpi-pci' · a3767e3c
      Rafael J. Wysocki committed
      * acpi-smbus:
        Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"
        ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code
      
      * acpi-ec:
        ACPI-EC: Drop unnecessary check made before calling acpi_ec_delete_query()
      
      * acpi-pci:
        PCI: Fix OF logic in pci_dma_configure()
    • Merge branch 'pm-sleep' · 0aba0ab8
      Rafael J. Wysocki committed
      * pm-sleep:
        PM / wakeirq: check that wake IRQ is valid before accepting it
    • Merge branches 'pm-cpufreq' and 'acpi-cppc' · 9832bf3a
      Rafael J. Wysocki committed
      * pm-cpufreq:
        Revert "Documentation: kernel_parameters for Intel P state driver"
        cpufreq: mediatek: fix build error
        cpufreq: intel_pstate: Add separate support for Airmont cores
        cpufreq: intel_pstate: Replace BYT with ATOM
        Revert "cpufreq: intel_pstate: Use ACPI perf configuration"
        Revert "cpufreq: intel_pstate: Avoid calculation for max/min"
      
      * acpi-cppc:
        ACPI / CPPC: Use h/w reduced version of the PCCT structure
    • PCI: Fix OF logic in pci_dma_configure() · 768acd64
      Suravee Suthikulpanit committed
      This patch fixes a bug introduced by the previous commit,
      which incorrectly checks the of_node of the end-point device.
      Instead, it should check the of_node of the host bridge.
      
      Fixes: 50230713 ("PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()")
      Reported-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • Merge tag 'drm-intel-fixes-2015-11-19' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 2d591ab1
      Dave Airlie committed
      i915 fixes for 4.4, including the revert for the backlight regression
      Olof reported. Otherwise fixes all around.
      
      * tag 'drm-intel-fixes-2015-11-19' of git://anongit.freedesktop.org/drm-intel:
        Revert "drm/i915: skip modeset if compatible for everyone."
        drm/i915: Consider SPLL as another shared pll, v2.
        drm/i915: Fix gpu frequency change tracing
        drm/i915: Don't clobber the addfb2 ioctl params
        drm/i915: Clear intel_crtc->atomic before updating it.
        drm/i915: get runtime PM reference around GEM set_caching IOCTL
        drm/i915: Fix GT frequency rounding
        drm/i915: quirk backlight present on Macbook 4, 1
        drm/i915: Fix crtc_y assignment in intel_find_initial_plane_obj()
    • Merge tag 'topic/drm-fixes-2015-11-19' of git://anongit.freedesktop.org/drm-intel into drm-fixes · db395637
      Dave Airlie committed
      Here are some drm core fixes for v4.4 that I've picked up. Atomic fixes
      from Maarten, and atomic helper fixes from Ville and Daniel.
      
      Admittedly the topmost commit didn't sit in our tree for very long, but
      does come with reviews and testing from trustworthy people.
      
      * tag 'topic/drm-fixes-2015-11-19' of git://anongit.freedesktop.org/drm-intel:
        drm/atomic-helper: Check encoder/crtc constraints
        drm: Fix primary plane size for stereo doubled modes for legacy setcrtc
        drm/core: Fix old_fb handling in pan_display_atomic.
        drm/core: Fix old_fb handling in restore_fbdev_mode_atomic.
        drm/atomic: add a drm_atomic_clean_old_fb helper.
        drm/core: Fix old_fb handling in drm_mode_atomic_ioctl.
        drm/core: Set legacy_cursor_update in drm_atomic_helper_disable_plane.
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b4ba1f0f
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix size alignment in __iommu_{alloc,free}_attrs
      
       - Kernel memory mapping fix with CONFIG_DEBUG_RODATA for page sizes
         other than 4KB and a fix of the mark_rodata_ro permissions
      
       - dma_get_ops() simplification and behaviour alignment between DT and
         ACPI
      
       - function_graph trace fix for cpu_suspend() (CPUs returning from deep
         sleep via a different path and confusing the tracer)
      
       - Use of non-global mappings for UEFI run-time services to avoid a
         (potentially theoretical) TLB conflict
      
       - Crypto priority reduction of core AES cipher (the accelerated
         asynchronous implementation is preferred when available)
      
       - Reverting an old commit that removed BogoMIPS from /proc/cpuinfo on
         arm64.  Apparently, we had it for a relatively short time and libvirt
         started checking for its presence
      
       - Compiler warnings fixed (ptrace.h inclusion from compat.h,
         smp_load_acquire with const argument)
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: restore bogomips information in /proc/cpuinfo
        arm64: barriers: fix smp_load_acquire to work with const arguments
        arm64: Fix R/O permissions in mark_rodata_ro
        arm64: crypto: reduce priority of core AES cipher
        arm64: use non-global mappings for UEFI runtime regions
        arm64: kernel: pause/unpause function graph tracer in cpu_suspend()
        arm64: do not include ptrace.h from compat.h
        arm64: simplify dma_get_ops
        arm64: mm: use correct mapping granularity under DEBUG_RODATA
        arm64/dma-mapping: Fix sizes in __iommu_{alloc,free}_attrs
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · a3d66b5a
      Linus Torvalds committed
      Pull livepatching fix from Jiri Kosina:
       "A fix for module handling in case kASLR has been enabled, from Zhou
        Chengming"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: x86: fix relocation computation with kASLR
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 319645ca
      Linus Torvalds committed
      Pull HID fixes from Jiri Kosina:
       "Two functional fixes for wacom HID driver from Ping Cheng and Jiri
        Kosina"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: wacom: fixup quirks setup for WACOM_DEVICETYPE_PAD
        HID: wacom: Add outbounding area for DTU1141
    • Merge tag 'mmc-v4.4-rc1' of git://git.linaro.org/people/ulf.hansson/mmc · 1282ac40
      Linus Torvalds committed
      Pull MMC fixes from Ulf Hansson:
       "Here are some mmc fixes intended for v4.4 rc2.  It's based on a commit
        prior to rc1 as I wanted to get them a bit more tested in next before
        sending you the pull request.
      
        MMC core:
         - Improve reliability when selecting HS200 mode
         - Improve reliability when selecting HS400 mode
         - mmc: remove bondage between REQ_META and reliable write
      
        MMC host:
         - pxamci: Fix read-only gpio detection polarity
         - mtk-sd: Preinitialize delay_phase to fix the case when delay is zero
         - android-goldfish: Fix build dependency by adding HAS_DMA
         - dw_mmc: Remove Seungwon Jeon from MAINTAINERS"
      
      * tag 'mmc-v4.4-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
        mmc: remove bondage between REQ_META and reliable write
        mmc: MMC_GOLDFISH should depend on HAS_DMA
        mmc: mediatek: Preinitialize delay_phase in get_best_delay()
        MAINTAINERS: mmc: Remove Seungwon Jeon from dw_mmc
        mmc: mmc: Improve reliability of mmc_select_hs400()
        mmc: mmc: Move mmc_switch_status()
        mmc: mmc: Fix HS setting in mmc_select_hs400()
        mmc: mmc: Improve reliability of mmc_select_hs200()
        mmc: pxamci: fix read-only gpio detection polarity
    • arm64: restore bogomips information in /proc/cpuinfo · 92e788b7
      Yang Shi committed
      As previously reported, some userspace applications depend on the
      bogomips value shown by /proc/cpuinfo. Although there is much less
      legacy impact on aarch64 than on arm, it does break libvirt.
      
      This patch reverts commit 326b16db ("arm64: delay: don't bother
      reporting bogomips in /proc/cpuinfo"), but with some tweaks due to
      context changes and without the pr_info().
      
      Fixes: 326b16db ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo")
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org> # 3.12+
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  4. 19 Nov, 2015 1 commit