1. 07 May, 2014 (1 commit)
    • sched: Rework sched_domain topology definition · 143e1e28
      Authored by Vincent Guittot
      We replace the old way to configure the scheduler topology with a new
      method which enables a platform to declare additional levels (if needed).
      
      We still have a default topology table definition that can be used by
      platforms that don't want more levels than the SMT, MC, CPU and NUMA
      ones. This table can be overridden by an arch which either wants to add a
      new level where load balancing makes sense (such as BOOK or a
      power-gating level) or wants to change the flags configuration of some levels.
      
      For each level, we need a function pointer that returns the cpumask for
      each cpu, a function pointer that returns the flags for the level, and a
      name. Only flags that describe the topology can be set by an
      architecture. The current topology flags are:
      
       SD_SHARE_CPUPOWER
       SD_SHARE_PKG_RESOURCES
       SD_NUMA
       SD_ASYM_PACKING
      
      Each level must be a subset of the next one. The sched_domain build
      sequence will take care of removing useless levels, such as those with
      only 1 CPU and those with the same CPU span and no more relevant
      load-balancing information than their children.
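      A sketch of the resulting shape; the struct layout and default table
      below follow the description above, but the details are an approximation
      rather than the literal patch:
      
          struct sched_domain_topology_level {
                  sched_domain_mask_f  mask;      /* cpumask for a given cpu at this level */
                  sched_domain_flags_f sd_flags;  /* topology flags for this level */
                  /* plus a name, set via SD_INIT_NAME(), used for debugging */
          };
      
          /* Default table; an arch can install its own table instead. */
          static struct sched_domain_topology_level default_topology[] = {
          #ifdef CONFIG_SCHED_SMT
                  { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
          #endif
          #ifdef CONFIG_SCHED_MC
                  { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
          #endif
                  { cpu_cpu_mask, SD_INIT_NAME(DIE) },
                  { NULL, },
          };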
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux390@de.ibm.com
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/1397209481-28542-2-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      143e1e28
  2. 11 Mar, 2014 (1 commit)
  3. 10 Feb, 2014 (2 commits)
    • locking/mcs: Allow architecture specific asm files to be used for contended case · ddf1d169
      Authored by Tim Chen
      This patch allows each architecture to add its own assembly-optimized
      arch_mcs_spin_lock_contended and arch_mcs_spin_unlock_contended for the
      MCS lock and unlock functions.
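      For context, here is a sketch of the generic fallbacks such an arch asm
      file would override; the macro bodies are an approximation built on the
      smp_load_acquire()/smp_store_release() primitives:
      
          /* Spin until our predecessor hands the lock over. */
          #ifndef arch_mcs_spin_lock_contended
          #define arch_mcs_spin_lock_contended(l)                 \
          do {                                                    \
                  while (!(smp_load_acquire(l)))                  \
                          cpu_relax();                            \
          } while (0)
          #endif
      
          /* Publish the hand-off to the next waiter in the queue. */
          #ifndef arch_mcs_spin_unlock_contended
          #define arch_mcs_spin_unlock_contended(l)               \
                  smp_store_release((l), 1)
          #endif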
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Figo.zhang" <figo1802@gmail.com>
      Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ddf1d169
    • locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order · b119fa61
      Authored by Tim Chen
      We perform a cleanup of the Kbuild files in each architecture,
      ordering the files in each Kbuild alphabetically
      by running the script below.
      
      for i in arch/*/include/asm/Kbuild
      do
              cat $i | gawk '/^generic-y/ {
                      i = 3;
                      do {
                              for (; i <= NF; i++) {
                                      if ($i == "\\") {
                                              getline;
                                              i = 1;
                                              continue;
                                      }
                                      if ($i != "")
                                              hdr[$i] = $i;
                              }
                              break;
                      } while (1);
                      next;
              }
              // {
                      print $0;
              }
              END {
                      n = asort(hdr);
                      for (i = 1; i <= n; i++)
                              print "generic-y += " hdr[i];
              }' > ${i}.sorted;
              mv ${i}.sorted $i;
      done
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Figo.zhang" <figo1802@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      [ Fixed build bug. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b119fa61
  4. 04 Feb, 2014 (1 commit)
  5. 29 Jan, 2014 (1 commit)
  6. 24 Jan, 2014 (2 commits)
  7. 12 Jan, 2014 (1 commit)
    • arch: Introduce smp_load_acquire(), smp_store_release() · 47933ad4
      Authored by Peter Zijlstra
      A number of situations currently require the heavyweight smp_mb(),
      even though there is no need to order prior stores against later
      loads.  Many architectures have much cheaper ways to handle these
      situations, but the Linux kernel currently has no portable way
      to make use of them.
      
      This commit therefore supplies smp_load_acquire() and
      smp_store_release() to remedy this situation.  The new
      smp_load_acquire() primitive orders the specified load against
      any subsequent reads or writes, while the new smp_store_release()
      primitive orders the specified store against any prior reads or
      writes.  These primitives allow array-based circular FIFOs to be
      implemented without an smp_mb(), and also allow a theoretical
      hole in rcu_assign_pointer() to be closed at no additional
      expense on most architectures.
      
      In addition, the RCU experience transitioning from explicit
      smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
      and rcu_assign_pointer(), respectively, resulted in substantial
      improvements in readability.  It therefore seems likely that
      replacing other explicit barriers with smp_load_acquire() and
      smp_store_release() will provide similar benefits.  It appears
      that roughly half of the explicit barriers in core kernel code
      might be so replaced.
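      
      A minimal sketch of the circular-FIFO pattern mentioned above; the ring
      structure and names are illustrative assumptions, not the kernel's
      kfifo API:
      
          #define RING_SIZE 256   /* power of two */
      
          struct ring {
                  unsigned int head;      /* written only by the producer */
                  unsigned int tail;      /* written only by the consumer */
                  void *buf[RING_SIZE];
          };
      
          /* Producer: fill the slot, then publish the new head with release
           * semantics so the consumer cannot see head before the slot data. */
          static int ring_push(struct ring *r, void *item)
          {
                  unsigned int tail = smp_load_acquire(&r->tail);
      
                  if (r->head - tail >= RING_SIZE)
                          return -1;      /* full */
                  r->buf[r->head & (RING_SIZE - 1)] = item;
                  smp_store_release(&r->head, r->head + 1);
                  return 0;
          }
      
          /* Consumer: the acquire on head orders the subsequent slot read. */
          static void *ring_pop(struct ring *r)
          {
                  unsigned int head = smp_load_acquire(&r->head);
                  void *item;
      
                  if (r->tail == head)
                          return NULL;    /* empty */
                  item = r->buf[r->tail & (RING_SIZE - 1)];
                  smp_store_release(&r->tail, r->tail + 1);
                  return item;
          }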
      
      [Changelog by PaulMck]
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47933ad4
  8. 18 Dec, 2013 (1 commit)
  9. 11 Dec, 2013 (1 commit)
  10. 15 Nov, 2013 (2 commits)
    • ia64: handle pgtable_page_ctor() fail · ca973d86
      Authored by Kirill A. Shutemov
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca973d86
    • ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node · 7b199811
      Authored by Rafael J. Wysocki
      Modify struct acpi_dev_node to contain a pointer to struct acpi_device
      associated with the given device object (that is, its ACPI companion
      device) instead of an ACPI handle corresponding to it.  Introduce two
      new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
      ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
      ACPI_HANDLE() macro to take the above changes into account.
      Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
      use ACPI_COMPANION_SET() instead.  For some of them who used to
      pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
      introduce a helper routine acpi_preset_companion() doing an
      equivalent thing.
      
      The main motivation for doing this is that there are things
      represented by struct acpi_device objects that don't have valid
      ACPI handles (so called fixed ACPI hardware features, such as
      power and sleep buttons) and we would like to create platform
      device objects for them and "glue" them to their ACPI companions
      in the usual way (which currently is impossible due to the
      lack of valid ACPI handles).  However, there are more reasons
      why it may be useful.
      
      First, struct acpi_device pointers allow much better type checking
      than the void pointers used as ACPI handles, so it should be more
      difficult to write buggy code using the modified struct acpi_dev_node
      and the new macros.  Second, the change should help to reduce (over
      time) the number of places in which the result of ACPI_HANDLE() is
      passed to acpi_bus_get_device() in order to obtain a pointer to the
      struct acpi_device associated with the given "physical" device,
      because now that pointer is returned by ACPI_COMPANION() directly.
      Finally, the change should make it easier to write generic code that
      will build both for CONFIG_ACPI set and unset without adding explicit
      compiler directives to it.
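      
      A rough sketch of the reworked macros for the CONFIG_ACPI case
      (simplified; the real definitions also provide no-op fallbacks when
      ACPI is disabled):
      
          #define ACPI_COMPANION(dev)            ((dev)->acpi_node.companion)
          #define ACPI_COMPANION_SET(dev, adev)  (ACPI_COMPANION(dev) = (adev))
          /* ACPI_HANDLE() now derives the handle from the companion device. */
          #define ACPI_HANDLE(dev)               acpi_device_handle(ACPI_COMPANION(dev))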
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
      7b199811
  11. 14 Nov, 2013 (1 commit)
  12. 13 Nov, 2013 (1 commit)
    • exec/ptrace: fix get_dumpable() incorrect tests · d049f74f
      Authored by Kees Cook
      The get_dumpable() return value is not boolean.  Most users of the
      function actually want to be testing for non-SUID_DUMP_USER (1) rather
      than SUID_DUMP_DISABLE (0).  SUID_DUMP_ROOT (2) is also considered a
      protected state.  Almost all places did this correctly, except the two
      places fixed in this patch.
      
      Wrong logic:
          if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
              or
          if (dumpable == 0) { /* be protective */ }
              or
          if (!dumpable) { /* be protective */ }
      
      Correct logic:
          if (dumpable != SUID_DUMP_USER) { /* be protective */ }
              or
          if (dumpable != 1) { /* be protective */ }
      
      Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
      user was able to ptrace attach to processes that had dropped privileges to
      that user.  (This may have been partially mitigated if Yama was enabled.)
      
      The macros have been moved into the file that declares get/set_dumpable(),
      which means things like the ia64 code can see them too.
      
      CVE-2013-2929
      Reported-by: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d049f74f
  13. 31 Oct, 2013 (1 commit)
  14. 25 Oct, 2013 (1 commit)
  15. 14 Oct, 2013 (1 commit)
  16. 10 Oct, 2013 (1 commit)
    • xen: introduce xen_alloc/free_coherent_pages · d6fe76c5
      Authored by Stefano Stabellini
      xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
      and devices. On native x86 it is sufficient to call __get_free_pages in
      order to get a coherent buffer, while on ARM (and potentially ARM64) we
      need to call the native dma_ops->alloc implementation.
      
      Introduce xen_alloc_coherent_pages to abstract the arch specific buffer
      allocation.
      
      Similarly, introduce xen_free_coherent_pages to free a coherent buffer:
      on x86 it is simply a call to free_pages, while on ARM and ARM64 it is
      arm_dma_ops.free.
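      
      A sketch of the x86 side of the new abstraction, close to what the
      changelog describes; the exact signatures are an approximation:
      
          /* On native x86 an ordinary page allocation is already coherent. */
          static inline void *
          xen_alloc_coherent_pages(struct device *hwdev, size_t size,
                                   dma_addr_t *dma_handle, gfp_t flags,
                                   struct dma_attrs *attrs)
          {
                  void *vstart = (void *)__get_free_pages(flags, get_order(size));
                  *dma_handle = virt_to_phys(vstart);
                  return vstart;
          }
      
          static inline void
          xen_free_coherent_pages(struct device *hwdev, size_t size, void *cpu_addr,
                                  dma_addr_t dma_handle, struct dma_attrs *attrs)
          {
                  free_pages((unsigned long)cpu_addr, get_order(size));
          }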
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      
      Changes in v7:
      - rename __get_dma_ops to __generic_dma_ops;
      - call __generic_dma_ops(hwdev)->alloc/free on arm64 too.
      
      Changes in v6:
      - call __get_dma_ops to get the native dma_ops pointer on arm.
      d6fe76c5
  17. 25 Sep, 2013 (1 commit)
  18. 05 Sep, 2013 (2 commits)
  19. 27 Aug, 2013 (1 commit)
  20. 20 Aug, 2013 (1 commit)
  21. 16 Aug, 2013 (1 commit)
    • Fix TLB gather virtual address range invalidation corner cases · 2b047252
      Authored by Linus Torvalds
      Ben Tebulin reported:
      
       "Since v3.7.2 on two independent machines a very specific Git
        repository fails in 9/10 cases on git-fsck due to an SHA1/memory
        failures.  This only occurs on a very specific repository and can be
        reproduced stably on two independent laptops.  Git mailing list ran
        out of ideas and for me this looks like some very exotic kernel issue"
      
      and bisected the failure to the backport of commit 53a59fc6 ("mm:
      limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
      
      That commit itself is not actually buggy, but what it does is to make it
      much more likely to hit the partial TLB invalidation case, since it
      introduces a new case in tlb_next_batch() that previously only ever
      happened when running out of memory.
      
      The real bug is that the TLB gather virtual memory range setup is subtly
      buggered.  It was introduced in commit 597e1c35 ("mm/mmu_gather:
      enable tlb flush range in generic mmu_gather"), and the range handling
      was already fixed at least once in commit e6c495a9 ("mm: fix the TLB
      range flushed when __tlb_remove_page() runs out of slots"), but that fix
      was not complete.
      
      The problem with the TLB gather virtual address range is that it isn't
      set up by the initial tlb_gather_mmu() initialization (which didn't get
      the TLB range information), but it is set up ad-hoc later by the
      functions that actually flush the TLB.  And so any such case that forgot
      to update the TLB range entries would potentially miss TLB invalidates.
      
      Rather than try to figure out exactly which particular ad-hoc range
      setup was missing (I personally suspect it's the hugetlb case in
      zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
      did), this patch just gets rid of the problem at the source: make the
      TLB range information available to tlb_gather_mmu(), and initialize it
      when initializing all the other tlb gather fields.
      
      This makes the patch larger, but conceptually much simpler.  And the end
      result is much more understandable; even if you want to play games with
      partial ranges when invalidating the TLB contents in chunks, now the
      range information is always there, and anybody who doesn't want to
      bother with it won't introduce subtle bugs.
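      
      A sketch of what the interface change means for a caller; the parameter
      names are assumptions:
      
          /* Before: no range at init time; flush paths patched it in ad hoc. */
          tlb_gather_mmu(&tlb, mm, 0 /* fullmm */);
      
          /* After: the virtual range is part of initialization, so partial
           * invalidation always has correct start/end information. */
          tlb_gather_mmu(&tlb, mm, start, end);
          unmap_vmas(&tlb, vma, start, end);
          tlb_finish_mmu(&tlb, start, end);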
      
      Ben verified that this fixes his problem.
      Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
      Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2b047252
  22. 14 Aug, 2013 (1 commit)
    • vtime: Describe overridden functions in dedicated arch headers · a5725ac2
      Authored by Frederic Weisbecker
      If an arch overrides some of the generic vtime APIs, let it describe
      these in a dedicated, standalone header. This way it becomes convenient
      to include it from the generic vtime headers without pulling irrelevant
      material into such a low-level header.
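      A minimal sketch of the pattern; the s390 example and guard names are
      assumptions based on this description:
      
          /* arch/s390/include/asm/vtime.h -- declares only the overrides. */
          #ifndef _S390_VTIME_H
          #define _S390_VTIME_H
      
          #define __ARCH_HAS_VTIME_ACCOUNT
          #define __ARCH_HAS_VTIME_TASK_SWITCH
      
          #endif /* _S390_VTIME_H */
      
          /* The generic header can then include it without dragging in
           * the rest of the arch's low-level definitions: */
          #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
          #include <asm/vtime.h>
          #endif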
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      a5725ac2
  23. 29 Jun, 2013 (1 commit)
  24. 26 Jun, 2013 (1 commit)
  25. 19 Jun, 2013 (2 commits)
  26. 18 Jun, 2013 (1 commit)
  27. 06 Jun, 2013 (1 commit)
    • arch, mm: Remove tlb_fast_mode() · 29eb7782
      Authored by Peter Zijlstra
      Since the introduction of preemptible mmu_gather TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; it was found to be the
      better option versus trying to patch the hole by entangling TLB
      invalidation with the scheduler.
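      
      For reference, a sketch of the removed predicate and the premise it
      encoded (the structure layout is approximate):
      
          /* Removed: "fast mode" freed pages before flushing the TLB, which
           * is only safe if nothing can touch the mm concurrently. With a
           * preemptible mmu_gather the task can migrate mid-gather, so that
           * premise no longer holds. */
          static inline int tlb_fast_mode(struct mmu_gather *tlb)
          {
          #ifdef CONFIG_SMP
                  return tlb->fast_mode;
          #else
                  return 1;       /* UP: no concurrency, ordering is moot */
          #endif
          }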
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  28. 30 Apr, 2013 (1 commit)
    • mm/hugetlb: add more arch-defined huge_pte functions · 106c992a
      Authored by Gerald Schaefer
      Commit abf09bed ("s390/mm: implement software dirty bits")
      introduced another difference in the pte layout vs.  the pmd layout on
      s390, thoroughly breaking the s390 support for hugetlbfs.  This requires
      replacing some more pte_xxx functions in mm/hugetlbfs.c with a
      huge_pte_xxx version.
      
      This patch introduces those huge_pte_xxx functions and their generic
      implementation in asm-generic/hugetlb.h, which will now be included on
      all architectures supporting hugetlbfs apart from s390.  This change
      will be a no-op for those architectures.
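      
      A representative sketch of the generic wrappers (a small subset; on
      s390, pmd-aware versions are provided instead of including this header):
      
          /* asm-generic/hugetlb.h: on most architectures a huge pte has the
           * normal pte layout, so the huge_pte_xxx helpers simply forward. */
          static inline pte_t huge_pte_mkwrite(pte_t pte)
          {
                  return pte_mkwrite(pte);
          }
      
          static inline pte_t huge_pte_mkdirty(pte_t pte)
          {
                  return pte_mkdirty(pte);
          }
      
          static inline int huge_pte_write(pte_t pte)
          {
                  return pte_write(pte);
          }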
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>	[for !s390 parts]
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      106c992a
  29. 27 Apr, 2013 (1 commit)
  30. 17 Apr, 2013 (1 commit)
  31. 08 Apr, 2013 (1 commit)
  32. 03 Apr, 2013 (2 commits)
    • Fix build error for numa_clear_node() under IA64 · eee46b3d
      Authored by Yijing Wang
      The numa_clear_node() function is not implemented on IA64, but it is
      called from unmap_cpu_on_node() in mm/memory_hotplug.c. This causes a
      build error on IA64; this patch adds numa_clear_node() for IA64 to fix
      the problem.
      
      [Added __cpuinit notation to numa_clear_node() to keep linker happy -Tony]
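      
      A sketch of the added helper; the body is an assumption that mirrors the
      existing IA64 cpu/node map helpers:
      
          /* Undo the cpu->node mapping; called from unmap_cpu_on_node()
           * during memory hot-remove. */
          void __cpuinit numa_clear_node(int cpu)
          {
                  unmap_cpu_from_node(cpu, NUMA_NO_NODE);
          }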
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      eee46b3d
    • Fix initialization of CMCI/CMCP interrupts · d303e9e9
      Authored by Tony Luck
      Back in 2010, during a revamp of the irq code, some initializations
      were moved from ia64_mca_init() to ia64_mca_late_init() in
      
      	commit c75f2aa1
      	Cannot use register_percpu_irq() from ia64_mca_init()
      
      But this was hideously wrong. First of all, these initializations
      are now done far too late, specifically after all the other cpus
      have been brought up and have initialized their own CMC vectors from
      smp_callin(). Also, ia64_mca_late_init() may be called from any cpu,
      so the line:
      	ia64_mca_cmc_vector_setup();       /* Setup vector on BSP */
      is generally not executed on the BSP, and so the CMC vector isn't
      set up at all on that processor.
      
      Make use of the arch_early_irq_init() hook to get this code executed
      at just the right moment: not too early, not too late.
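      
      A sketch of the hook wiring this describes; the ia64_mca_irq_init()
      helper name is assumed from the changelog's intent:
      
          /* Runs on the BSP early in boot, before secondary CPUs call
           * smp_callin(), so the CMC/CMCP vectors are registered in time. */
          int __init arch_early_irq_init(void)
          {
                  ia64_mca_irq_init();    /* register_percpu_irq() etc. */
                  return 0;
          }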
      Reported-by: Fred Hartnett <fred.hartnett@hp.com>
      Tested-by: Fred Hartnett <fred.hartnett@hp.com>
      Cc: stable@kernel.org # v2.6.37+
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      d303e9e9
  33. 20 Mar, 2013 (1 commit)
    • Wrong asm register constraints in the futex implementation · 136f39dd
      Authored by Stephan Schreiber
      The Linux kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/include/asm/futex.h.
      
      I observed this on kernel 3.2.23, but it is also true of the most
      recent kernel, 3.9-rc1.
      
      File arch/ia64/include/asm/futex.h:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8");
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov %0=r0				\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "=r" (r8), "=r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      The list of output registers is
      			: "=r" (r8), "=r" (prev)
      The constraint "=r" means that GCC has to keep these vars in
      registers, containing valid info, when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. The input registers here are uaddr, newval and
      oldval.
      The second assembly instruction
      			"	mov %0=r0				\n"
      is the first one which writes to a register; it sets %0 to 0. %0 means
      the first register operand; it is r8 here. (The r0 is read-only and
      always 0 on the Itanium; it can be used if an immediate zero value is
      needed.)
      This instruction might overwrite one of the other registers which are
      still needed.
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The objdump utility can give us disassembly.
      The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
      look for a module that uses the function. This is the
      cmpxchg_futex_value_locked() function in
      kernel/futex.c:
      
      static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
      				      u32 uval, u32 newval)
      {
      	int ret;
      
      	pagefault_disable();
      	ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
      	pagefault_enable();
      
      	return ret;
      }
      
      Now the disassembly. First, from the kernel package 3.2.23, compiled
      with GCC 4.4; remember, this kernel seemed to work:
      objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	00 00 00 02 00 00 	            nop.m 0x0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 88 00 08 e0 	[MLX]       addp4 r8=r34,r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 84 40 90 11 	[MIB]       st4 [r32]=r33
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 58 20 1a 19 21 	[MMI]       adds r11=3208,r13;;
            2f6:	20 01 2c 20 20 00 	            ld4 r18=[r11]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 88 fc 25 3f 23 	[MMI]       adds r17=-1,r18;;
            306:	00 88 2c 20 23 00 	            st4 [r11]=r17
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      The lines
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
      are the instructions of the assembly block.
      The line
            2b6:	80 00 00 00 42 00 	            mov r8=r0
      sets the r8 register to 0 and after that
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
      is wrong.
      What happened here is what I explained above: an input register that
      is still needed gets overwritten.
      The register operand constraints in futex.h are wrong.
      
      (The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
      
      The attached patch fixes the register operand constraints in futex.h.
      The code after patching of it:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8") = 0;
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "+r" (r8), "=&r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      I also initialized the 'r8' var in plain C.
      The __asm qualifier on the definition of the 'r8' var forces GCC to use
      the r8 processor register for it.
      I don't believe that we should use inline assembly for zeroing out a
      local variable.
      The constraint is
      "+r" (r8)
      which means that it is both an input register and an output register.
      Note that the page fault handler will modify the r8 register which
      will be the return value of the function.
      The real fix is
      "=&r" (prev)
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      Here is kernel 3.2.23 patched and compiled with GCC 4.4:
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	20 12 01 10 40 00 	            addp4 r34=r34,r0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0b 00 00 00 22 00 	[MMI]       mf;;
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
            2bc:	00 00 04 00       	            nop.i 0x0;;
            2c0:	09 58 8c 42 11 10 	[MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
            2c6:	00 00 00 02 00 00 	            nop.m 0x0
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 2c 40 90 11 	[MIB]       st4 [r32]=r11
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 88 20 1a 19 21 	[MMI]       adds r17=3208,r13;;
            2f6:	30 01 44 20 20 00 	            ld4 r19=[r17]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 90 fc 27 3f 23 	[MMI]       adds r18=-1,r19;;
            306:	00 90 44 20 23 00 	            st4 [r17]=r18
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      Much better.
      There is a
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
      which was generated by the C statement r8 = 0. Below,
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
      which means that oldval is no longer overwritten.
      
      This is Debian bug#702641
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
      
      The patch applies to kernels 3.9-rc1, 3.2.23 and many other versions.
      Signed-off-by: Stephan Schreiber <info@fs-driver.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      136f39dd
  34. 04 Mar, 2013 (1 commit)