1. 09 9月, 2015 1 次提交
    • M
      mm: add utility for early copy from unmapped ram · 6b0f68e3
      Mark Salter 提交于
      When booting an arm64 kernel w/initrd using UEFI/grub, use of mem= will
      likely cut off part or all of the initrd.  This leaves it outside the
      kernel linear map which leads to failure when unpacking.  The x86 code
      has a similar need to relocate an initrd outside of mapped memory in
      some cases.
      
      The current x86 code uses early_memremap() to copy the original initrd
      from unmapped to mapped RAM.  This patchset creates a generic
      copy_from_early_mem() utility based on that x86 code and has arm64 and
      x86 share it in their respective initrd relocation code.
      
      This patch (of 3):
      
      In some early boot circumstances, it may be necessary to copy from RAM
      outside the kernel linear mapping to mapped RAM.  The need to relocate
      an initrd is one example in the x86 code.  This patch creates a helper
      function based on current x86 code.
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b0f68e3
  2. 25 8月, 2015 1 次提交
    • L
      PCI: Add pci_iomap_wc() variants · 1b3d4200
      Luis R. Rodriguez 提交于
      PCI BARs tell us whether prefetching is safe, but they don't say
      anything about write combining (WC). WC changes ordering rules
      and allows writes to be collapsed, so it's not safe in general
      to use it on a prefetchable region.
      
      Add pci_iomap_wc() and pci_iomap_wc_range() so drivers can take
      advantage of write combining when they know it's safe.
      
      On architectures that don't fully support WC, e.g., x86 without
      PAT, drivers for legacy framebuffers may get some of the benefit
      by using arch_phys_wc_add() in addition to pci_iomap_wc().  But
      arch_phys_wc_add() is unreliable and should be avoided in
      general.  On x86, it uses MTRRs, which are limited in number and
      size, so the results will vary based on driver loading order.
      
      The goals of adding pci_iomap_wc() are to:
      
      - Give drivers an architecture-independent way to use WC so they can stop
        using interfaces like mtrr_add() (on x86, pci_iomap_wc() uses
        PAT when available).
      
      - Move toward using _PAGE_CACHE_MODE_UC, not _PAGE_CACHE_MODE_UC_MINUS,
        on x86 on ioremap_nocache() (see de33c442 ("x86 PAT: fix
        performance drop for glx, use UC minus for ioremap(), ioremap_nocache()
        and pci_mmap_page_range()").
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      [ Move IORESOURCE_IO check up, space out statements for better readability. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: <roger.pau@citrix.com>
      Cc: <syrjala@sci.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Ville Syrjälä <syrjala@sci.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: airlied@linux.ie
      Cc: benh@kernel.crashing.org
      Cc: dan.j.williams@intel.com
      Cc: david.vrabel@citrix.com
      Cc: jbeulich@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-arch@vger.kernel.org
      Cc: linux-fbdev@vger.kernel.org
      Cc: linux-pci@vger.kernel.org
      Cc: venkatesh.pallipadi@intel.com
      Cc: vinod.koul@intel.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1440443613-13696-6-git-send-email-mcgrof@do-not-panic.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1b3d4200
  3. 12 8月, 2015 4 次提交
  4. 03 8月, 2015 2 次提交
    • K
      sched/preempt: Fix cond_resched_lock() and cond_resched_softirq() · fe32d3cd
      Konstantin Khlebnikov 提交于
      These functions check should_resched() before unlocking spinlock/bh-enable:
      preempt_count always non-zero => should_resched() always returns false.
      cond_resched_lock() worked iff spin_needbreak is set.
      
      This patch adds argument "preempt_offset" to should_resched().
      
      preempt_count offset constants for that:
      
        PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
        PREEMPT_LOCK_OFFSET     - offset after spin_lock()
        SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_distable()
        SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: bdb43806 ("sched: Extract the basic add/sub preempt_count modifiers")
      Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fe32d3cd
    • A
      locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire() · 76695af2
      Andrey Konovalov 提交于
      Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire()
      with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips,
      powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work
      reliably on non-scalar types.
      
      WRITE_ONCE() and READ_ONCE() were introduced in the following commits:
      
        230fa253 ("kernel: Provide READ_ONCE and ASSIGN_ONCE")
        43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")
      Signed-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDavidlohr Bueso <dbueso@suse.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      76695af2
  5. 27 7月, 2015 3 次提交
  6. 21 7月, 2015 1 次提交
    • L
      x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default · 8c7ea50c
      Luis R. Rodriguez 提交于
      We currently have no safe way of currently defining architecture
      agnostic IOMMU ioremap_*() variants. The trend is for folks to
      *assume* that ioremap_nocache() should be the default everywhere
      and then add this mapping on each architectures -- this is not
      correct today for a variety of reasons.
      
      We have two options:
      
        1) Sit and wait for every architecture in Linux to get a
           an ioremap_*() variant defined before including it upstream.
      
        2) Gather consensus on a safe architecture agnostic ioremap_*()
           default.
      
      Approach 1) introduces development latencies, and since 2) will
      take time and work on clarifying semantics the only remaining
      sensible thing to do to avoid issues is returning NULL on
      ioremap_*() variants.
      
      In order for this to work we must have all architectures declare
      their own ioremap_*() variants as defined. This will take some
      work, do this for ioremp_uc() to set the example as its only
      currently implemented on x86. Document all this.
      
      We only provide implementation support for ioremap_uc() as the
      other ioremap_*() variants are well defined all over the kernel
      for other architectures already.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: benh@kernel.crashing.org
      Cc: bp@suse.de
      Cc: dan.j.williams@intel.com
      Cc: geert@linux-m68k.org
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: linux-mm@kvack.org
      Cc: luto@amacapital.net
      Cc: mpe@ellerman.id.au
      Cc: mst@redhat.com
      Cc: ralf@linux-mips.org
      Cc: ross.zwisler@linux.intel.com
      Cc: stefan.bader@canonical.com
      Cc: tj@kernel.org
      Cc: tomi.valkeinen@ti.com
      Cc: toshi.kani@hp.com
      Link: http://lkml.kernel.org/r/1436488096-3165-1-git-send-email-mcgrof@do-not-panic.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8c7ea50c
  7. 18 7月, 2015 1 次提交
  8. 06 7月, 2015 2 次提交
  9. 25 6月, 2015 4 次提交
  10. 23 6月, 2015 1 次提交
  11. 08 6月, 2015 1 次提交
  12. 07 6月, 2015 3 次提交
  13. 19 5月, 2015 4 次提交
  14. 13 5月, 2015 1 次提交
    • S
      locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG · cede8841
      Sebastian Andrzej Siewior 提交于
      The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG and we have a few
      other user of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. This
      define was first introduced in 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      which is v2.6.18. The generic cmpxchg was introduced later in 068fbad2
      ("Add cmpxchg_local to asm-generic for per cpu atomic operations") which is
      v2.6.25.
      Back then something was required to get rtmutex working with the fast
      path on architectures without cmpxchg and this seems to be the result.
      
      It popped up recently on rt-users because ARM (v6+) does not define
      __HAVE_ARCH_CMPXCHG (even that it implements it) which results in slower
      locking performance in the fast path.
      To put some numbers on it: preempt -RT, am335x, 10 loops of
      100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is
      the average of the 10 loops for the 100000 invocations, "loop" is
      "total / 100000 * 1000"):
      
           cmpxchg |    slowpath used  ||    cmpxchg used
                   |   total   | loop  ||   total    | loop
           --------|-----------|-------||------------|-------
           ARMv6   | 9129.4 us | 91 ns ||  3311.9 us |  33 ns
           generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
           ----------------------------||--------------------
      
      Forcing it to generic cmpxchg() made things worse for the slowpath and
      even worse in cmpxchg() path. It boils down to 14ns more per lock+unlock
      in a cache hot loop so it might not be that much in real world.
      The last test was a substitute for pre ARMv6 machine but then I was able
      to perform the comparison on imx28 which is ARMv5 and therefore is
      always is using the generic cmpxchg implementation. And the numbers:
      
                    |   total     | loop
           -------- |-----------  |--------
           slowpath | 263937.2 us | 2639 ns
           cmpxchg  |  16934.2 us |  169 ns
           --------------------------------
      
      The numbers are larger since the machine is slower in general. However,
      letting rtmutex use cmpxchg() instead the slowpath seem to improve things.
      
      Since from the ARM (tested on am335x + imx28) point of view always
      using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense I
      would drop the define.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: will.deacon@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cede8841
  15. 12 5月, 2015 1 次提交
    • J
      gpio: remove gpiod_sysfs_set_active_low · 166a85e4
      Johan Hovold 提交于
      Remove gpiod_sysfs_set_active_low (and gpio_sysfs_set_active_low) which
      allowed code to change the polarity of a gpio line even after it had
      been exported through sysfs.
      
      Drivers should not care, and generally does not know, about gpio-line
      polarity which is a hardware feature that needs to be described by
      firmware.
      
      It is currently possible to define gpio-line polarity in device-tree and
      acpi firmware or using platform data. Userspace can also change the
      polarity through sysfs.
      
      Note that drivers using the legacy gpio interface could still use
      GPIOF_ACTIVE_LOW to change the polarity before exporting the gpio.
      
      There are no in-kernel users of this interface.
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Harry Wei <harryxiyou@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@zh-kernel.org
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Reviewed-by: NAlexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      166a85e4
  16. 11 5月, 2015 1 次提交
    • L
      x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) · e4b6be33
      Luis R. Rodriguez 提交于
      ioremap_nocache() currently uses UC- by default. Our goal is to
      eventually make UC the default. Linux maps UC- to PCD=1, PWT=0
      page attributes on non-PAT systems. Linux maps UC to PCD=1,
      PWT=1 page attributes on non-PAT systems. On non-PAT and PAT
      systems a WC MTRR has different effects on pages with either of
      these attributes. In order to help with a smooth transition its
      best to enable use of UC (PCD,1, PWT=1) on a region as that
      ensures a WC MTRR will have no effect on a region, this however
      requires us to have an way to declare a region as UC and we
      currently do not have a way to do this.
      
        WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
        WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.
      
        WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
        WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.
      
      A flip of the default ioremap_nocache() behaviour from UC- to UC
      can therefore regress a memory region from effective memory type
      WC to UC if MTRRs are used. Use of MTRRs should be phased out
      and in the best case only arch_phys_wc_add() use will remain,
      even if this happens arch_phys_wc_add() will have an effect on
      non-PAT systems and changes to default ioremap_nocache()
      behaviour could regress drivers.
      
      Now, ideally we'd use ioremap_nocache() on the regions in which
      we'd need uncachable memory types and avoid any MTRRs on those
      regions. There are however some restrictions on MTRRs use, such
      as the requirement of having the base and size of variable sized
      MTRRs to be powers of two, which could mean having to use a WC
      MTRR over a large area which includes a region in which
      write-combining effects are undesirable.
      
      Add ioremap_uc() to help with the both phasing out of MTRR use
      and also provide a way to blacklist small WC undesirable regions
      in devices with mixed regions which are size-implicated to use
      large WC MTRRs. Use of ioremap_uc() helps phase out MTRR use by
      avoiding regressions with an eventual flip of default behaviour
      or ioremap_nocache() from UC- to UC.
      
      Drivers working with WC MTRRs can use the below table to review
      and consider the use of ioremap*() and similar helpers to ensure
      appropriate behaviour long term even if default
      ioremap_nocache() behaviour changes from UC- to UC.
      
      Although ioremap_uc() is being added we leave set_memory_uc() to
      use UC- as only initial memory type setup is required to be able
      to accommodate existing device drivers and phase out MTRR use.
      It should also be clarified that set_memory_uc() cannot be used
      with IO memory, even though its use will not return any errors,
      it really has no effect.
      
        ----------------------------------------------------------------------
        MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
        ----------------------------------------------------------------------
                                                          Non-PAT |  PAT
             PAT
             |PCD
             ||PWT
             |||
        WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
        WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
        WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
        WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
        ----------------------------------------------------------------------
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Ville Syrjälä <syrjala@sci.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-fbdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1430343851-967-2-git-send-email-mcgrof@do-not-panic.com
      Link: http://lkml.kernel.org/r/1431332153-18566-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4b6be33
  17. 08 5月, 2015 5 次提交
    • P
      locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6
      Peter Zijlstra (Intel) 提交于
      When we detect a hypervisor (!paravirt, see qspinlock paravirt support
      patches), revert to a simple test-and-set lock to avoid the horrors
      of queue preemption.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2aa79af6
    • P
      locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9
      Peter Zijlstra (Intel) 提交于
      When we allow for a max NR_CPUS < 2^14 we can optimize the pending
      wait-acquire and the xchg_tail() operations.
      
      By growing the pending bit to a byte, we reduce the tail to 16bit.
      This means we can use xchg16 for the tail part and do away with all
      the repeated compxchg() operations.
      
      This in turn allows us to unconditionally acquire; the locked state
      as observed by the wait loops cannot change. And because both locked
      and pending are now a full byte we can use simple stores for the
      state transition, obviating one atomic operation entirely.
      
      This optimization is needed to make the qspinlock achieve performance
      parity with ticket spinlock at light load.
      
      All this is horribly broken on Alpha pre EV56 (and any other arch that
      cannot do single-copy atomic byte stores).
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69f9cae9
    • W
      locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d
      Waiman Long 提交于
      This is a preparatory patch that extracts out the following 2 code
      snippets to prepare for the next performance optimization patch.
      
       1) the logic for the exchange of new and previous tail code words
          into a new xchg_tail() function.
       2) the logic for clearing the pending bit and setting the locked bit
          into a new clear_pending_set_locked() function.
      
      This patch also simplifies the trylock operation before queuing by
      calling queued_spin_trylock() directly.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6403bd7d
    • P
      locking/qspinlock: Add pending bit · c1fb159d
      Peter Zijlstra (Intel) 提交于
      Because the qspinlock needs to touch a second cacheline (the per-cpu
      mcs_nodes[]); add a pending bit and allow a single in-word spinner
      before we punt to the second cacheline.
      
      It is possible so observe the pending bit without the locked bit when
      the last owner has just released but the pending owner has not yet
      taken ownership.
      
      In this case we would normally queue -- because the pending bit is
      already taken. However, in this case the pending bit is guaranteed
      to be released 'soon', therefore wait for it and avoid queueing.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1fb159d
    • W
      locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35
      Waiman Long 提交于
      This patch introduces a new generic queued spinlock implementation that
      can serve as an alternative to the default ticket spinlock. Compared
      with the ticket spinlock, this queued spinlock should be almost as fair
      as the ticket spinlock. It has about the same speed in single-thread
      and it can be much faster in high contention situations especially when
      the spinlock is embedded within the data structure to be protected.
      
      Only in light to moderate contention where the average queue depth
      is around 1-3 will this queued spinlock be potentially a bit slower
      due to the higher slowpath overhead.
      
      This queued spinlock is especially suit to NUMA machines with a large
      number of cores as the chance of spinlock contention is much higher
      in those machines. The cost of contention is also higher because of
      slower inter-node memory traffic.
      
      Due to the fact that spinlocks are acquired with preemption disabled,
      the process will not be migrated to another CPU while it is trying
      to get a spinlock. Ignoring interrupt handling, a CPU can only be
      contending in one spinlock at any one time. Counting soft IRQ, hard
      IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
      activities.  By allocating a set of per-cpu queue nodes and used them
      to form a waiting queue, we can encode the queue node address into a
      much smaller 24-bit size (including CPU number and queue node index)
      leaving one byte for the lock.
      
      Please note that the queue node is only needed when waiting for the
      lock. Once the lock is acquired, the queue node can be released to
      be used later.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a33fda35
  18. 06 5月, 2015 1 次提交
  19. 17 4月, 2015 1 次提交
    • K
      seccomp: allow COMPAT sigreturn overrides · ddaa27ee
      Kees Cook 提交于
      Most architectures don't need to do much special for the strict-mode
      seccomp syscall entries.  Remove the redundant headers and reduce the
      others.
      
      This patch (of 8):
      
      Some architectures may need to override the compat sigreturn definition,
      as is already possible in the non-compat case.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddaa27ee
  20. 15 4月, 2015 2 次提交
    • T
      mm: change vunmap to tear down huge KVA mappings · b9820d8f
      Toshi Kani 提交于
      Change vunmap_pmd_range() and vunmap_pud_range() to tear down huge KVA
      mappings when they are set.  pud_clear_huge() and pmd_clear_huge() return
      zero when no-operation is performed, i.e.  huge page mapping was not used.
      
      These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined
      on the architecture.
      
      [akpm@linux-foundation.org: use consistent code layout]
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9820d8f
    • T
      mm: change ioremap to set up huge I/O mappings · e61ce6ad
      Toshi Kani 提交于
      ioremap_pud_range() and ioremap_pmd_range() are changed to create huge I/O
      mappings when their capability is enabled, and a request meets required
      conditions -- both virtual & physical addresses are aligned by their huge
      page size, and a requested range fufills their huge page size.  When
      pud_set_huge() or pmd_set_huge() returns zero, i.e.  no-operation is
      performed, the code simply falls back to the next level.
      
      The changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on
      the architecture.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e61ce6ad