1. 21 7月, 2015 2 次提交
  2. 11 7月, 2015 1 次提交
    • J
      parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · 01ab6057
      John David Anglin 提交于
      The increased use of pdtlb/pitlb instructions seemed to increase the
      frequency of random segmentation faults building packages. Further, we
      had a number of cases where TLB inserts would repeatedly fail and all
      forward progress would stop. The Haskell ghc package caused a lot of
      trouble in this area. The final indication of a race in pte handling was
      this syslog entry on sibaris (C8000):
      
       swap_free: Unused swap offset entry 00000004
       BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
       addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
       CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
       Backtrace:
        [<0000000040173eb0>] show_stack+0x20/0x38
        [<0000000040444424>] dump_stack+0x9c/0x110
        [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
        [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
        [<00000000402a4090>] zap_page_range+0xf0/0x198
        [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
      
      Note that the pte value is 0 except for the accessed bit 0x100. This bit
      shouldn't be set without the present bit.
      
      It should be noted that the madvise system call is probably a trigger for many
      of the random segmentation faults.
      
      In looking at the kernel code, I found the following problems:
      
      1) The pte_clear define didn't take TLB lock when clearing a pte.
      2) We didn't test pte present bit inside lock in exception support.
      3) The pte and tlb locks needed to merged in order to ensure consistency
      between page table and TLB. This also has the effect of serializing TLB
      broadcasts on SMP systems.
      
      The attached change implements the above and a few other tweaks to try
      to improve performance. Based on the timing code, TLB purges are very
      slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
      beneficial to test the split_tlb variable to avoid duplicate purges.
      Probably, all PA 2.0 machines have combined TLBs.
      
      I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
      applications and most threads have a stack size that is too large to
      make this useful. I added some comments to this effect.
      
      Since implementing 1 through 3, I haven't had any random segmentation
      faults on mx3210 (rp3440) in about one week of building code and running
      as a Debian buildd.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      01ab6057
  3. 10 7月, 2015 6 次提交
  4. 09 7月, 2015 19 次提交
  5. 08 7月, 2015 6 次提交
  6. 07 7月, 2015 6 次提交
    • A
      ACPI / ARM64 : use the new BAD_MADT_GICC_ENTRY macro · 99e3e3ae
      Al Stone 提交于
      For those parts of the arm64 ACPI code that need to check GICC subtables
      in the MADT, use the new BAD_MADT_GICC_ENTRY macro instead of the previous
      BAD_MADT_ENTRY.  The new macro takes into account differences in the size
      of the GICC subtable that the old macro did not; this caused failures even
      though the subtable entries are valid.
      
      Fixes: aeb823bb ("ACPICA: ACPI 6.0: Add changes for FADT table.")
      Signed-off-by: NAl Stone <al.stone@linaro.org>
      Reviewed-by: NHanjun Guo <hanjun.guo@linaro.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      99e3e3ae
    • A
      ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro · b6cfb277
      Al Stone 提交于
      The BAD_MADT_ENTRY() macro is designed to work for all of the subtables
      of the MADT.  In the ACPI 5.1 version of the spec, the struct for the
      GICC subtable (struct acpi_madt_generic_interrupt) is 76 bytes long; in
      ACPI 6.0, the struct is 80 bytes long.  But, there is only one definition
      in ACPICA for this struct -- and that is the 6.0 version.  Hence, when
      BAD_MADT_ENTRY() compares the struct size to the length in the GICC
      subtable, it fails if 5.1 structs are in use, and there are systems in
      the wild that have them.
      
      This patch adds the BAD_MADT_GICC_ENTRY() that checks the GICC subtable
      only, accounting for the difference in specification versions that are
      possible.  The BAD_MADT_ENTRY() will continue to work as is for all other
      MADT subtables.
      
      This code is being added to an arm64 header file since that is currently
      the only architecture using the GICC subtable of the MADT.  As a GIC is
      specific to ARM, it is also unlikely the subtable will be used elsewhere.
      
      Fixes: aeb823bb ("ACPICA: ACPI 6.0: Add changes for FADT table.")
      Signed-off-by: NAl Stone <al.stone@linaro.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net>
      [catalin.marinas@arm.com: extra brackets around macro arguments]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      b6cfb277
    • T
      x86/irq: Retrieve irq data after locking irq_desc · 09cf92b7
      Thomas Gleixner 提交于
      irq_data is protected by irq_desc->lock, so retrieving the irq chip
      from irq_data outside the lock is racy vs. an concurrent update. Move
      it into the lock held region.
      
      While at it add a comment why the vector walk does not require
      vector_lock.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
      09cf92b7
    • T
      x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() · cbb24dc7
      Thomas Gleixner 提交于
      It's unsafe to examine fields in the irq descriptor w/o holding the
      descriptor lock. Add proper locking.
      
      While at it add a comment why the vector check can run lock less
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
      cbb24dc7
    • T
      x86/irq: Plug irq vector hotplug race · 5a3f75e3
      Thomas Gleixner 提交于
      Jin debugged a nasty cpu hotplug race which results in leaking a irq
      vector on the newly hotplugged cpu.
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             setup_vector_irq                  arch_teardown_msi_irq
              __setup_vector_irq		   native_teardown_msi_irq
                lock(vector_lock)		     destroy_irq 
                install vectors
                unlock(vector_lock)
      					       lock(vector_lock)
      --->                                  	       __clear_irq_vector
                                          	       unlock(vector_lock)
          lock(vector_lock)
          set_cpu_online
          unlock(vector_lock)
      
      This leaves the irq vector(s) which are torn down on CPU M stale in
      the vector array of CPU N, because CPU M does not see CPU N online
      yet. There is a similar issue with concurrent newly setup interrupts.
      
      The alloc/free protection of irq descriptors does not prevent the
      above race, because it merily prevents interrupt descriptors from
      going away or changing concurrently.
      
      Prevent this by moving the call to setup_vector_irq() into the
      vector_lock held region which protects set_cpu_online():
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             lock(vector_lock)                arch_teardown_msi_irq
             setup_vector_irq()
              __setup_vector_irq		   native_teardown_msi_irq
                install vectors		     destroy_irq 
             set_cpu_online
             unlock(vector_lock)
      					       lock(vector_lock)
                                        	       __clear_irq_vector
                                          	       unlock(vector_lock)
      
      So cpu M either sees the cpu N online before clearing the vector or
      cpu N installs the vectors after cpu M has cleared it.
      Reported-by: Nxiao jin <jin.xiao@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
      5a3f75e3
    • S
      powerpc/powernv: Fix race in updating core_idle_state · b32aadc1
      Shreyas B. Prabhu 提交于
      core_idle_state is maintained for each core. It uses 0-7 bits to track
      whether a thread in the core has entered fastsleep or winkle. 8th bit is
      used as a lock bit.
      The lock bit is set in these 2 scenarios-
       - The thread is first in subcore to wakeup from sleep/winkle.
       - If its the last thread in the core about to enter sleep/winkle
      
      While the lock bit is set, if any other thread in the core wakes up, it
      loops until the lock bit is cleared before proceeding in the wakeup
      path. This helps prevent race conditions w.r.t fastsleep workaround and
      prevents threads from switching to process context before core/subcore
      resources are restored.
      
      But, in the path to sleep/winkle entry, we currently don't check for
      lock-bit. This exposes us to following race when running with subcore
      on-
      
      First thread in the subcorea		Another thread in the same
      waking up		   		core entering sleep/winkle
      
      lwarx   r15,0,r14
      ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
      stwcx.  r15,0,r14
      [Code to restore subcore state]
      
      						lwarx   r15,0,r14
      						[clear thread bit]
      						stwcx.  r15,0,r14
      
      andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
      stw     r15,0(r14)
      
      Here, after the thread entering sleep clears its thread bit in
      core_idle_state, the value is overwritten by the thread waking up.
      In such cases when the core enters fastsleep, code mistakes an idle
      thread as running. Because of this, the first thread waking up from
      fastsleep which is supposed to resync timebase skips it. So we can
      end up having a core with stale timebase value.
      
      This patch fixes the above race by looping on the lock bit even while
      entering the idle states.
      Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
      Cc: stable@vger.kernel.org # 3.19+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b32aadc1