1. 18 Jul, 2015 · 1 commit
  2. 25 Jun, 2015 · 4 commits
  3. 23 Jun, 2015 · 1 commit
  4. 08 Jun, 2015 · 1 commit
  5. 07 Jun, 2015 · 3 commits
  6. 19 May, 2015 · 4 commits
  7. 13 May, 2015 · 1 commit
    • locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG · cede8841
      Committed by Sebastian Andrzej Siewior
      The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG, and we have a few
      other users of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. The
      define was first introduced in 23f78d4a ("[PATCH] pi-futex: rt mutex core"),
      which is v2.6.18. The generic cmpxchg was introduced later in 068fbad2
      ("Add cmpxchg_local to asm-generic for per cpu atomic operations"), which is
      v2.6.25.
      Back then something was required to get rtmutex working with the fast
      path on architectures without cmpxchg, and this define seems to be the result.
      
      It popped up recently on rt-users because ARM (v6+) does not define
      __HAVE_ARCH_CMPXCHG (even though it implements cmpxchg), which results in
      slower locking performance in the fast path.
      To put some numbers on it: preempt-RT, am335x, 10 loops of
      100000 invocations of rt_spin_lock() + rt_spin_unlock() ("total" is
      the average of the 10 loops for the 100000 invocations, "loop" is
      "total / 100000 * 1000"):
      
           cmpxchg |    slowpath used  ||    cmpxchg used
                   |   total   | loop  ||   total    | loop
           --------|-----------|-------||------------|-------
           ARMv6   | 9129.4 us | 91 ns ||  3311.9 us |  33 ns
           generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
           ----------------------------||--------------------
      
      Forcing it to the generic cmpxchg() made things worse for the slowpath and
      even worse in the cmpxchg() path. It boils down to 14 ns more per lock+unlock
      in a cache-hot loop, so it might not be that much in the real world.
      The last test was a substitute for a pre-ARMv6 machine, but I was then able
      to perform the comparison on an imx28, which is ARMv5 and therefore
      always uses the generic cmpxchg implementation. The numbers:
      
                    |   total     | loop
           -------- |-----------  |--------
           slowpath | 263937.2 us | 2639 ns
           cmpxchg  |  16934.2 us |  169 ns
           --------------------------------
      
      The numbers are larger since the machine is slower in general. However,
      letting rtmutex use cmpxchg() instead of the slowpath seems to improve things.
      
      Since, from the ARM (tested on am335x + imx28) point of view, always
      using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense, I
      would drop the define.
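      To make the fast path concrete, here is a minimal, hedged sketch in plain
      C11 rather than the kernel's actual rtmutex code: an uncontended acquire is
      a single compare-and-swap of the owner field, and everything else falls
      through to a slow path (reduced to a spin here so the sketch stays
      self-contained).
      
        #include <stdatomic.h>
        #include <stdbool.h>
        
        struct fake_rtmutex {
                _Atomic(void *) owner;  /* NULL when the lock is free */
        };
        
        /* fast path: one cmpxchg, free -> owned by 'self' */
        static bool fast_trylock(struct fake_rtmutex *m, void *self)
        {
                void *expected = NULL;
                return atomic_compare_exchange_strong(&m->owner, &expected, self);
        }
        
        static void fake_rt_mutex_lock(struct fake_rtmutex *m, void *self)
        {
                while (!fast_trylock(m, self))
                        ;       /* stand-in for the real PI-aware slowpath */
        }
        
        static void fake_rt_mutex_unlock(struct fake_rtmutex *m)
        {
                atomic_store(&m->owner, NULL);
        }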
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: will.deacon@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cede8841
  8. 12 May, 2015 · 1 commit
    • gpio: remove gpiod_sysfs_set_active_low · 166a85e4
      Committed by Johan Hovold
      Remove gpiod_sysfs_set_active_low (and gpio_sysfs_set_active_low) which
      allowed code to change the polarity of a gpio line even after it had
      been exported through sysfs.
      
      Drivers should not care about, and generally do not know about, gpio-line
      polarity, which is a hardware feature that needs to be described by
      firmware.
      
      It is currently possible to define gpio-line polarity in devicetree and
      ACPI firmware or using platform data. Userspace can also change the
      polarity through sysfs.
      
      Note that drivers using the legacy gpio interface could still use
      GPIOF_ACTIVE_LOW to change the polarity before exporting the gpio.
      
      There are no in-kernel users of this interface.
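      As a hedged illustration of the legacy-interface note above (the GPIO
      number and label are invented), a driver can still fix the polarity with
      GPIOF_ACTIVE_LOW at request time, before the line is ever exported:
      
        #include <linux/gpio.h>
        
        #define DEMO_GPIO 42    /* hypothetical GPIO number */
        
        static int demo_export_active_low(void)
        {
                int err;
        
                /* polarity is set via GPIOF_ACTIVE_LOW before the export ... */
                err = gpio_request_one(DEMO_GPIO,
                                       GPIOF_OUT_INIT_LOW | GPIOF_ACTIVE_LOW,
                                       "demo-line");
                if (err)
                        return err;
        
                /* ... so sysfs only ever sees the final polarity */
                return gpio_export(DEMO_GPIO, false);
        }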
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Harry Wei <harryxiyou@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@zh-kernel.org
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      166a85e4
  9. 11 May, 2015 · 1 commit
    • x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) · e4b6be33
      Committed by Luis R. Rodriguez
      ioremap_nocache() currently uses UC- by default. Our goal is to
      eventually make UC the default. Linux maps UC- to PCD=1, PWT=0
      page attributes on non-PAT systems. Linux maps UC to PCD=1,
      PWT=1 page attributes on non-PAT systems. On non-PAT and PAT
      systems a WC MTRR has different effects on pages with either of
      these attributes. In order to help with a smooth transition it is
      best to enable use of UC (PCD=1, PWT=1) on a region, as that
      ensures a WC MTRR will have no effect on it. This, however,
      requires a way to declare a region as UC, and we currently
      do not have one.
      
        WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
        WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.
      
        WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
        WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.
      
      A flip of the default ioremap_nocache() behaviour from UC- to UC
      can therefore regress a memory region from effective memory type
      WC to UC if MTRRs are used. Use of MTRRs should be phased out,
      and in the best case only arch_phys_wc_add() use will remain;
      even then, arch_phys_wc_add() will have an effect on non-PAT
      systems, and changes to the default ioremap_nocache() behaviour
      could regress drivers.
      
      Now, ideally we'd use ioremap_nocache() on the regions in which
      we need uncachable memory types and avoid any MTRRs on those
      regions. There are, however, some restrictions on MTRR use, such
      as the requirement that the base and size of variable-sized
      MTRRs be powers of two, which could mean having to use a WC
      MTRR over a large area which includes a region in which
      write-combining effects are undesirable.
      
      Add ioremap_uc() to help both with phasing out MTRR use and with
      providing a way to blacklist small regions where WC is undesirable
      in devices with mixed regions whose sizes force the use of
      large WC MTRRs. Use of ioremap_uc() helps phase out MTRR use by
      avoiding regressions with an eventual flip of the default behaviour
      of ioremap_nocache() from UC- to UC.
      
      Drivers working with WC MTRRs can use the table below to review
      and consider the use of ioremap*() and similar helpers to ensure
      appropriate behaviour long term, even if the default
      ioremap_nocache() behaviour changes from UC- to UC.
      
      Although ioremap_uc() is being added, we leave set_memory_uc()
      using UC-, as only the initial memory-type setup is required to
      accommodate existing device drivers and phase out MTRR use.
      It should also be clarified that set_memory_uc() cannot be used
      with IO memory; even though its use will not return any errors,
      it really has no effect.
      
        ----------------------------------------------------------------------
        MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
        ----------------------------------------------------------------------
                                                          Non-PAT |  PAT
             PAT
             |PCD
             ||PWT
             |||
        WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
        WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
        WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
        WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
        ----------------------------------------------------------------------
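      A hedged usage sketch (the register window below is invented): a driver
      that must not see write-combining on its MMIO registers, even under a
      covering WC MTRR, can map them with the new helper.
      
        #include <linux/io.h>
        
        /* Map a (made-up) register window as strong UC (PCD=1, PWT=1) so an
         * overlapping WC MTRR cannot turn the region into write-combining. */
        static void __iomem *demo_map_regs(phys_addr_t base, unsigned long len)
        {
                return ioremap_uc(base, len);   /* UC, not ioremap_nocache()'s UC- */
        }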
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Ville Syrjälä <syrjala@sci.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-fbdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1430343851-967-2-git-send-email-mcgrof@do-not-panic.com
      Link: http://lkml.kernel.org/r/1431332153-18566-9-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e4b6be33
  10. 08 May, 2015 · 5 commits
    • locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6
      Committed by Peter Zijlstra (Intel)
      When we detect a hypervisor (!paravirt, see qspinlock paravirt support
      patches), revert to a simple test-and-set lock to avoid the horrors
      of queue preemption.
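      A hedged sketch of what the test-and-set fallback amounts to, written as
      self-contained C11 rather than the kernel's code: test first so waiters spin
      locally in their cache, then take the lock with a single exchange, leaving
      no queue for the hypervisor to preempt.
      
        #include <stdatomic.h>
        
        struct tas_lock {
                atomic_int val;         /* 0 = unlocked, 1 = locked */
        };
        
        static void tas_spin_lock(struct tas_lock *l)
        {
                for (;;) {
                        /* read-only test first to avoid cacheline ping-pong */
                        if (atomic_load_explicit(&l->val, memory_order_relaxed) == 0 &&
                            atomic_exchange_explicit(&l->val, 1,
                                                     memory_order_acquire) == 0)
                                return;
                }
        }
        
        static void tas_spin_unlock(struct tas_lock *l)
        {
                atomic_store_explicit(&l->val, 0, memory_order_release);
        }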
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2aa79af6
    • locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9
      Committed by Peter Zijlstra (Intel)
      When we allow for a max NR_CPUS < 2^14 we can optimize the pending
      wait-acquire and the xchg_tail() operations.
      
      By growing the pending bit to a byte, we reduce the tail to 16 bits.
      This means we can use xchg16 for the tail part and do away with all
      the repeated cmpxchg() operations.
      
      This in turn allows us to unconditionally acquire; the locked state
      as observed by the wait loops cannot change. And because both locked
      and pending are now a full byte we can use simple stores for the
      state transition, obviating one atomic operation entirely.
      
      This optimization is needed to make the qspinlock achieve performance
      parity with ticket spinlock at light load.
      
      All this is horribly broken on Alpha pre EV56 (and any other arch that
      cannot do single-copy atomic byte stores).
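      A hedged sketch of the resulting word layout (field placement is
      illustrative, not the kernel's exact offsets): once pending is a full byte
      and the tail fits in 16 bits, the tail can be swapped with a 16-bit xchg
      and locked/pending can be written with plain byte stores.
      
        #include <stdint.h>
        
        /* 4-byte lock word, little-endian view */
        union qspinlock_sketch {
                uint32_t val;           /* whole word, for full-word atomics */
                struct {
                        uint8_t  locked;  /* set with a plain byte store */
                        uint8_t  pending; /* cleared with a plain byte store */
                        uint16_t tail;    /* swapped with a 16-bit xchg */
                };
        };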
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69f9cae9
    • locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d
      Committed by Waiman Long
      This preparatory patch extracts the following two code snippets ahead
      of the next performance optimization patch:
      
       1) the logic for the exchange of new and previous tail code words
          into a new xchg_tail() function.
       2) the logic for clearing the pending bit and setting the locked bit
          into a new clear_pending_set_locked() function.
      
      This patch also simplifies the trylock operation before queuing by
      calling queued_spin_trylock() directly.
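      A hedged, self-contained C11 sketch of the two extracted helpers, using an
      illustrative 32-bit lock word with bit 0 = locked, bit 8 = pending and the
      tail in the top 16 bits (the generic, cmpxchg-based forms; the following
      patch replaces them with byte stores and xchg16 where NR_CPUS allows):
      
        #include <stdatomic.h>
        #include <stdint.h>
        
        #define SK_LOCKED    (1u << 0)
        #define SK_PENDING   (1u << 8)
        #define SK_TAIL_MASK (0xffffu << 16)
        
        /* pending,locked: 1,0 -> 0,1 in one atomic add; tail bits unaffected */
        static void clear_pending_set_locked(_Atomic uint32_t *lock)
        {
                atomic_fetch_add_explicit(lock, SK_LOCKED - SK_PENDING,
                                          memory_order_acquire);
        }
        
        /* publish our tail code and return whoever was queued before us */
        static uint32_t xchg_tail(_Atomic uint32_t *lock, uint32_t tail)
        {
                uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);
                uint32_t new;
        
                do {
                        new = (old & ~SK_TAIL_MASK) | tail;
                } while (!atomic_compare_exchange_weak_explicit(lock, &old, new,
                                memory_order_release, memory_order_relaxed));
                return old & SK_TAIL_MASK;
        }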
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6403bd7d
    • locking/qspinlock: Add pending bit · c1fb159d
      Committed by Peter Zijlstra (Intel)
      Because the qspinlock needs to touch a second cacheline (the per-cpu
      mcs_nodes[]), add a pending bit and allow a single in-word spinner
      before we punt to the second cacheline.
      
      It is possible to observe the pending bit without the locked bit when
      the last owner has just released but the pending owner has not yet
      taken ownership.
      
      In this case we would normally queue -- because the pending bit is
      already taken. However, in this case the pending bit is guaranteed
      to be released 'soon', so wait for it and avoid queueing.
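      A hedged sketch of the idea in self-contained C11 (same illustrative layout
      as the snippet above: bit 0 = locked, bit 8 = pending; not the kernel's
      exact code): the first waiter claims the pending bit with a cmpxchg, spins
      on the lock word itself, and only queues when it sees other waiters.
      
        #include <stdatomic.h>
        #include <stdint.h>
        
        #define SK_LOCKED  (1u << 0)
        #define SK_PENDING (1u << 8)
        
        /* 1 = lock taken via the pending bit, 0 = caller must queue */
        static int pending_path(_Atomic uint32_t *lock)
        {
                uint32_t val = atomic_load_explicit(lock, memory_order_relaxed);
        
                /* pending or queued waiters already? go touch the MCS nodes */
                if (val & ~(uint32_t)SK_LOCKED)
                        return 0;
        
                if (!atomic_compare_exchange_strong_explicit(lock, &val,
                                val | SK_PENDING,
                                memory_order_acquire, memory_order_relaxed))
                        return 0;       /* raced with someone: queue instead */
        
                /* we own the pending bit: wait for the holder to drop locked... */
                while (atomic_load_explicit(lock, memory_order_acquire) & SK_LOCKED)
                        ;
                /* ...then pending,locked: 1,0 -> 0,1 */
                atomic_fetch_add_explicit(lock, SK_LOCKED - SK_PENDING,
                                          memory_order_acquire);
                return 1;
        }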
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c1fb159d
    • locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35
      Committed by Waiman Long
      This patch introduces a new generic queued spinlock implementation that
      can serve as an alternative to the default ticket spinlock. Compared
      with the ticket spinlock, this queued spinlock should be almost as fair
      as the ticket spinlock. It has about the same speed in the single-threaded
      case, and it can be much faster in high-contention situations, especially when
      the spinlock is embedded within the data structure to be protected.
      
      Only in light to moderate contention where the average queue depth
      is around 1-3 will this queued spinlock be potentially a bit slower
      due to the higher slowpath overhead.
      
      This queued spinlock is especially suited to NUMA machines with a large
      number of cores, as the chance of spinlock contention is much higher
      on those machines. The cost of contention is also higher because of
      slower inter-node memory traffic.
      
      Because spinlocks are acquired with preemption disabled, the process
      will not be migrated to another CPU while it is trying to get a
      spinlock. Ignoring interrupt handling, a CPU can only be contending
      on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI,
      a CPU can only have a maximum of 4 concurrent lock-waiting
      activities.  By allocating a set of per-cpu queue nodes and using them
      to form a waiting queue, we can encode the queue node address into a
      much smaller 24-bit value (including CPU number and queue node index),
      leaving one byte for the lock.
      
      Please note that the queue node is only needed when waiting for the
      lock. Once the lock is acquired, the queue node can be released to
      be used later.
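      A hedged sketch of the per-cpu node scheme described above (array sizes and
      bit offsets are illustrative, not the kernel's own layout): each CPU owns
      MAX_NODES MCS nodes, one per context, and the lock word stores only the
      small (cpu + 1, index) code instead of a node pointer.
      
        #include <stdint.h>
        
        #define NR_CPUS_SKETCH 4096
        #define MAX_NODES      4        /* task, softirq, hardirq, NMI */
        
        struct mcs_node {
                struct mcs_node *next;
                int locked;
                int count;              /* nesting level on this cpu */
        };
        
        static struct mcs_node qnodes[NR_CPUS_SKETCH][MAX_NODES];
        
        /* (cpu, idx) -> tail code; 0 means "no queue", hence the cpu + 1 */
        static uint32_t encode_tail(int cpu, int idx)
        {
                return (((uint32_t)cpu + 1) << 2 | (uint32_t)idx) << 8;
        }
        
        static struct mcs_node *decode_tail(uint32_t tail)
        {
                uint32_t code = tail >> 8;
        
                return &qnodes[(code >> 2) - 1][code & 3];
        }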
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a33fda35
  11. 06 May, 2015 · 1 commit
  12. 17 Apr, 2015 · 1 commit
    • seccomp: allow COMPAT sigreturn overrides · ddaa27ee
      Committed by Kees Cook
      Most architectures don't need to do much special for the strict-mode
      seccomp syscall entries.  Remove the redundant headers and reduce the
      others.
      
      This patch (of 8):
      
      Some architectures may need to override the compat sigreturn definition,
      as is already possible in the non-compat case.
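      A hedged sketch of the mechanism (not a verbatim copy of
      asm-generic/seccomp.h): the generic default only applies when the
      architecture has not already provided its own compat sigreturn number.
      
        /* in the generic header: */
        #ifndef __NR_seccomp_sigreturn_32
        #define __NR_seccomp_sigreturn_32 __NR_rt_sigreturn
        #endif
        
        /* an arch that needs something else simply defines the symbol before
         * including the generic header. */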
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddaa27ee
  13. 15 Apr, 2015 · 3 commits
    • mm: change vunmap to tear down huge KVA mappings · b9820d8f
      Committed by Toshi Kani
      Change vunmap_pmd_range() and vunmap_pud_range() to tear down huge KVA
      mappings when they are set.  pud_clear_huge() and pmd_clear_huge() return
      zero when no operation is performed, i.e. the huge page mapping was not used.
      
      These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined
      on the architecture.
      
      [akpm@linux-foundation.org: use consistent code layout]
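      A hedged sketch of the PMD-level walk after this change (close to, but not
      a verbatim copy of, mm/vmalloc.c; vunmap_pte_range() is the pre-existing
      PTE-level helper): a non-zero return from pmd_clear_huge() means the huge
      mapping was torn down and the PTE descent is skipped.
      
        static void vunmap_pmd_range(pud_t *pud, unsigned long addr,
                                     unsigned long end)
        {
                pmd_t *pmd = pmd_offset(pud, addr);
                unsigned long next;
        
                do {
                        next = pmd_addr_end(addr, end);
                        if (pmd_clear_huge(pmd))
                                continue;       /* huge mapping torn down */
                        if (pmd_none_or_clear_bad(pmd))
                                continue;
                        vunmap_pte_range(pmd, addr, next);
                } while (pmd++, addr = next, addr != end);
        }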
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9820d8f
    • mm: change ioremap to set up huge I/O mappings · e61ce6ad
      Committed by Toshi Kani
      ioremap_pud_range() and ioremap_pmd_range() are changed to create huge I/O
      mappings when the capability is enabled and a request meets the required
      conditions -- both the virtual and physical addresses are aligned to the
      huge page size and the requested range covers a whole huge page.  When
      pud_set_huge() or pmd_set_huge() returns zero, i.e. no operation is
      performed, the code simply falls back to the next level.
      
      The changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on
      the architecture.
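      A hedged sketch of the PMD-level decision (simplified; huge_iomap_enabled
      is a stand-in for the real capability check, and ioremap_pte_range() is the
      pre-existing PTE-level helper): a chunk is mapped with pmd_set_huge() only
      when it spans a whole, aligned PMD, and anything else falls back to the
      PTE level exactly as before.
      
        static int ioremap_pmd_range_sketch(pud_t *pud, unsigned long addr,
                                            unsigned long end,
                                            phys_addr_t phys_addr, pgprot_t prot)
        {
                pmd_t *pmd = pmd_alloc(&init_mm, pud, addr);
                unsigned long next;
        
                if (!pmd)
                        return -ENOMEM;
                phys_addr -= addr;      /* so phys_addr + addr tracks addr */
                do {
                        next = pmd_addr_end(addr, end);
                        if (huge_iomap_enabled &&       /* stand-in flag */
                            (next - addr) == PMD_SIZE &&
                            IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
                            pmd_set_huge(pmd, phys_addr + addr, prot))
                                continue;       /* whole PMD mapped in one go */
                        if (ioremap_pte_range(pmd, addr, next,
                                              phys_addr + addr, prot))
                                return -ENOMEM;
                } while (pmd++, addr = next, addr != end);
                return 0;
        }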
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e61ce6ad
    • mm: define default PGTABLE_LEVELS to two · 235a8f02
      Committed by Kirill A. Shutemov
      By this time all architectures which support more than two page table
      levels should be covered.  This patch adds a default definition of
      PGTABLE_LEVELS equal to 2.
      
      We also add an assert to detect inconsistency between CONFIG_PGTABLE_LEVELS
      and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.
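      One way to express that consistency check (a hedged sketch; the in-tree
      wording may differ): each folded level removes one from the maximum of
      four, and the result must match the configured value.
      
        #if 4 - defined(__PAGETABLE_PUD_FOLDED) - defined(__PAGETABLE_PMD_FOLDED) \
                != CONFIG_PGTABLE_LEVELS
        #error CONFIG_PGTABLE_LEVELS is not consistent with __PAGETABLE_{PUD,PMD}_FOLDED
        #endif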
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      235a8f02
  14. 10 Apr, 2015 · 1 commit
  15. 08 Apr, 2015 · 1 commit
    • tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values · 0c564a53
      Committed by Steven Rostedt (Red Hat)
      Several tracepoints use the helper functions __print_symbolic() or
      __print_flags() and pass in enums that do the mapping between the
      binary data stored and the value to print. This works well for reading
      the ASCII trace files, but when the data is read via userspace tools
      such as perf and trace-cmd, the conversion of the binary value to a
      human-readable string is lost if an enum is used, as userspace does not
      know what value the enum stands for.
      
      For example, the tracepoint trace_tlb_flush() has:
      
       __print_symbolic(REC->reason,
          { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
          { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
          { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
          { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
      
      Which maps the enum values to the strings they represent. But perf and
      trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
      not be able to map it.
      
      With TRACE_DEFINE_ENUM(), developers can place these in the event header
      files and ftrace will convert the enums to their values:
      
      By adding:
      
       TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
       TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
       TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
       TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
      
       $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
      [...]
       __print_symbolic(REC->reason,
          { 0, "flush on task switch" },
          { 1, "remote shootdown" },
          { 2, "local shootdown" },
          { 3, "local mm shootdown" })
      
      The above is what userspace expects to see, and tools do not need to
      be modified to parse them.
      
      Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
      
      Cc: Guilherme Cox <cox@computer.org>
      Cc: Tony Luck <tony.luck@gmail.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0c564a53
  16. 30 Mar, 2015 · 1 commit
  17. 27 Mar, 2015 · 1 commit
    • serial: earlycon: Enable earlycon without command line param · 470ca0de
      Committed by Peter Hurley
      Earlycon matching can only be triggered if 'earlycon=...' has been
      specified on the kernel command line. Working around this limitation
      requires tight coupling between arches and specific serial drivers
      in order to start an earlycon. Devicetree avoids this limitation
      with a link table that contains the required data to match earlycons.
      
      Mirror this approach for earlycon matching by name. Re-purpose
      EARLYCON_DECLARE to generate a table entry which associates a name with
      a setup() function. Re-purpose setup_earlycon() to scan this table for
      an earlycon match, which is registered if found.
      
      Declare one "earlycon" early_param, which calls setup_earlycon().
      
      This design allows setup_earlycon() to be called directly with a
      param string (as if 'earlycon=...' had been set on the command line).
      Re-registration (either directly or by early_param) is prevented.
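      A hedged sketch of what a driver-side registration looks like under this
      scheme (the UART name and setup function are invented; only the
      EARLYCON_DECLARE() pattern and the setup() signature follow the text
      above):
      
        #include <linux/serial_core.h>
        
        static int __init demo_uart_early_setup(struct earlycon_device *dev,
                                                const char *options)
        {
                /* program the hardware from dev->port, hook up dev->con->write */
                return 0;
        }
        EARLYCON_DECLARE(demo_uart, demo_uart_early_setup);
        
        /* boot with "earlycon=demo_uart,0x10000000", or have arch code call
         * setup_earlycon() directly with the same parameter string. */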
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      470ca0de
  18. 24 Mar, 2015 · 1 commit
  19. 19 Mar, 2015 · 1 commit
  20. 09 Mar, 2015 · 1 commit
  21. 14 Feb, 2015 · 1 commit
    • kernel: add support for .init_array.* constructors · 9ddf8252
      Committed by Andrey Ryabinin
      KASan uses constructors for initializing redzones for global variables.
      Globals instrumentation in GCC 4.9.2 produces constructors with a priority
      (.init_array.00099).
      
      Currently the kernel ignores such constructors; only constructors with the
      default priority are supported (.init_array).
      
      This patch adds support for constructors with priorities.  For the kernel
      image we put pointers to constructors between __ctors_start/__ctors_end
      and do_ctors() will call them on start up.  For modules we merge the
      .init_array.* sections into the resulting .init_array.  Module code properly
      handles constructors in the .init_array section.
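      A hedged user-space illustration of the sections involved (the section
      names are what GCC conventionally emits; the kernel-side change is in the
      linker scripts and module loader, which are not shown):
      
        #include <stdio.h>
        
        /* priorities up to 100 are reserved for the implementation (KASan's 99
         * above comes from the compiler itself), so user code starts at 101 */
        __attribute__((constructor(201)))       /* goes into .init_array.00201 */
        static void prioritized_ctor(void)
        {
                puts("runs first (lower priority number = earlier)");
        }
        
        __attribute__((constructor))            /* goes into plain .init_array */
        static void default_ctor(void)
        {
                puts("runs after the prioritized constructors");
        }
        
        int main(void)
        {
                return 0;
        }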
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9ddf8252
  22. 13 Feb, 2015 · 2 commits
    • mm: remove remaining references to NUMA hinting bits and helpers · 21d9ee3e
      Committed by Mel Gorman
      This patch removes the NUMA PTE bits and associated helpers.  As a
      side-effect it increases the maximum possible swap space on x86-64.
      
      One potential source of problems is races between the marking of PTEs
      PROT_NONE, NUMA hinting faults and migration.  It must be guaranteed that
      a PTE being protected is not faulted in parallel, seen as a pte_none and
      corrupting memory.  The base case is safe, but transhuge has had problems
      in the past due to a different migration mechanism and a dependence on the
      page lock to serialise migrations, and warrants a closer look.
      
      task_work hinting update			parallel fault
      ------------------------			--------------
      change_pmd_range
        change_huge_pmd
          __pmd_trans_huge_lock
            pmdp_get_and_clear
      						__handle_mm_fault
      						pmd_none
      						  do_huge_pmd_anonymous_page
      						  read? pmd_lock blocks until hinting complete, fail !pmd_none test
      						  write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
            pmd_modify
            set_pmd_at
      
      task_work hinting update			parallel migration
      ------------------------			------------------
      change_pmd_range
        change_huge_pmd
          __pmd_trans_huge_lock
            pmdp_get_and_clear
      						__handle_mm_fault
      						  do_huge_pmd_numa_page
      						    migrate_misplaced_transhuge_page
      						    pmd_lock waits for updates to complete, recheck pmd_same
            pmd_modify
            set_pmd_at
      
      Both of those are safe and the case where a transhuge page is inserted
      during a protection update is unchanged.  The case where two processes try
      migrating at the same time is unchanged by this series so should still be
      ok.  I could not find a case where we are accidentally depending on the
      PTE not being cleared and flushed.  If one is missed, it'll manifest as
      corruption problems that start triggering shortly after this series is
      merged and only happen when NUMA balancing is enabled.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      21d9ee3e
    • mm: add p[te|md] protnone helpers for use by NUMA balancing · e7bb4b6d
      Committed by Mel Gorman
      This is a preparatory patch that introduces protnone helpers for automatic
      NUMA balancing.
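      A hedged sketch of the generic fallback this relies on (arch-specific
      versions, e.g. on x86, instead test the real protnone encoding): when
      automatic NUMA balancing is not configured, the helpers simply report that
      nothing is protnone.
      
        /* asm-generic fallback (sketch): */
        #ifndef CONFIG_NUMA_BALANCING
        static inline int pte_protnone(pte_t pte)
        {
                return 0;
        }
        
        static inline int pmd_protnone(pmd_t pmd)
        {
                return 0;
        }
        #endif /* CONFIG_NUMA_BALANCING */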
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7bb4b6d
  23. 12 Feb, 2015 · 1 commit
    • mm, asm-generic: define PUD_SHIFT in <asm-generic/4level-fixup.h> · 4155b8e0
      Committed by Kirill A. Shutemov
      If an architecture uses <asm-generic/4level-fixup.h>, the build fails if we
      try to use PUD_SHIFT in generic code:
      
         In file included from arch/microblaze/include/asm/bug.h:1:0,
                          from include/linux/bug.h:4,
                          from include/linux/thread_info.h:11,
                          from include/asm-generic/preempt.h:4,
                          from arch/microblaze/include/generated/asm/preempt.h:1,
                          from include/linux/preempt.h:18,
                          from include/linux/spinlock.h:50,
                          from include/linux/mmzone.h:7,
                          from include/linux/gfp.h:5,
                          from include/linux/slab.h:14,
                          from mm/mmap.c:12:
         mm/mmap.c: In function 'exit_mmap':
      >> mm/mmap.c:2858:46: error: 'PUD_SHIFT' undeclared (first use in this function)
             round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
                                                       ^
         include/asm-generic/bug.h:86:25: note: in definition of macro 'WARN_ON'
           int __ret_warn_on = !!(condition);    \
                                  ^
         mm/mmap.c:2858:46: note: each undeclared identifier is reported only once for each function it appears in
             round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
                                                       ^
         include/asm-generic/bug.h:86:25: note: in definition of macro 'WARN_ON'
           int __ret_warn_on = !!(condition);    \
                                  ^
      As with <asm-generic/pgtable-nopud.h>, let's define PUD_SHIFT to
      PGDIR_SHIFT.
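      As described, the fix is a one-line definition in
      <asm-generic/4level-fixup.h>, mirroring what <asm-generic/pgtable-nopud.h>
      already does:
      
        #define PUD_SHIFT       PGDIR_SHIFT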
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4155b8e0
  24. 11 Feb, 2015 · 1 commit
  25. 21 Jan, 2015 · 1 commit