  1. 14 Feb 2011, 4 commits
    • x86: Avoid tlbstate lock if not enough cpus · 7064d865
      Shaohua Li committed
      This one isn't related to the previous patch. If the number of
      online cpus is below NUM_INVALIDATE_TLB_VECTORS, we don't need
      the lock. The comments in the code declare we don't need the
      check, but taking a hot lock still needs an atomic operation
      and is expensive, so add the check here.
      
      Use nr_cpu_ids here, as suggested by Eric Dumazet.
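
      A minimal sketch of the check (the flush-state pointer f is
      hypothetical, not the exact patch):

          if (nr_cpu_ids > NUM_INVALIDATE_TLB_VECTORS)
              raw_spin_lock(&f->tlbstate_lock);   /* vectors are shared */
          /* ... send the invalidate IPI and wait for the acks ... */
          if (nr_cpu_ids > NUM_INVALIDATE_TLB_VECTORS)
              raw_spin_unlock(&f->tlbstate_lock);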
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <1295232730.1949.710.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7064d865
    • x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 · 70e4a369
      Shaohua Li committed
      Make the maximum number of TLB invalidate vectors scale linearly
      with NR_CPUS, up to a maximum of 32 vectors.
      
      We currently have only 8 vectors for TLB invalidation and that is clearly
      inadequate. If we have a lot of CPUs, the CPUs need to share the 8 vectors,
      and tlbstate_lock is used to protect them. flush_tlb_page() is
      heavily used in page reclaim, which causes a lot of lock
      contention on tlbstate_lock.
      
      Andi Kleen suggested increasing the number of vectors to 32, which should be
      enough for current typical systems to reduce tlbstate_lock contention.
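
      A sketch of one way to express "linear in NR_CPUS, capped at 32"
      (the exact expression in the patch may differ):

          #if NR_CPUS <= 32
          # define NUM_INVALIDATE_TLB_VECTORS  NR_CPUS   /* one per cpu */
          #else
          # define NUM_INVALIDATE_TLB_VECTORS  32        /* hard cap    */
          #endif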
      
      My test system has 4 sockets, 64G of memory, and 64 CPUs. My
      workload creates 64 processes. Each process mmaps and reads a
      big, empty sparse file. The total size of the files is
      2*total_mem, so this causes a lot of page reclaim.
      
      Below are the results I get from perf call-graph profiling:
      
       without the patch:
       ------------------
      
          24.25%           usemem  [kernel]                                   [k] _raw_spin_lock
                           |
                           --- _raw_spin_lock
                              |
                              |--42.15%-- native_flush_tlb_others
      
       with the patch:
       ------------------
      
          14.96%           usemem  [kernel]                                   [k] _raw_spin_lock
                           |
                           --- _raw_spin_lock
                           |
                           |--13.89%-- native_flush_tlb_others
      
      So this heavily reduces the tlbstate_lock contention.
      Suggested-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      70e4a369
    • x86: Allocate 32 tlb_invalidate_interrupt handler stubs · 3a09fb45
      Shaohua Li committed
      Add up to 32 invalidate_interrupt handler stubs. How many handlers
      are added depends on NUM_INVALIDATE_TLB_VECTORS, so if
      NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
      size.
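
      A sketch of the compile-time gating idea (DECLARE_TLB_STUB is a
      hypothetical macro standing in for the real stub-emitting code):

          /* Emit a handler stub only if its vector can actually be used. */
          #if NUM_INVALIDATE_TLB_VECTORS > 8
          DECLARE_TLB_STUB(8)
          #endif
          #if NUM_INVALIDATE_TLB_VECTORS > 9
          DECLARE_TLB_STUB(9)
          #endif
          /* ... and so on, up to stub 31 ... */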
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3a09fb45
    • x86: Cleanup vector usage · 60f6e65d
      Shaohua Li committed
      Clean up the vector usage and make the vectors contiguous where possible.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <1295232722.1949.707.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      60f6e65d
  2. 11 Feb 2011, 3 commits
  3. 10 Feb 2011, 1 commit
    • KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index · 893a5ab6
      Joerg Roedel committed
      The gs_index loading code uses the swapgs instruction to
      switch to the user gs_base temporarily. This is unsafe in a
      lightweight exit path in KVM on AMD because the
      KERNEL_GS_BASE MSR is switched lazily. An NMI happening in
      the critical path of load_gs_index may then use the wrong
      GS_BASE value, leading to unpredictable behavior, e.g. a
      triple fault.
      
      This patch fixes the issue by making sure that load_gs_index
      is called in KVM only with a valid KERNEL_GS_BASE value
      loaded.
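
      A minimal sketch of the ordering constraint (the variable names
      are hypothetical, not the exact patch):

          /* Make KERNEL_GS_BASE valid before load_gs_index() runs its
           * internal swapgs, so an NMI in that window sees a sane base. */
          wrmsrl(MSR_KERNEL_GS_BASE, host_kernel_gs_base);
          load_gs_index(host_gs_selector);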
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      893a5ab6
  4. 08 Feb 2011, 4 commits
  5. 07 Feb 2011, 15 commits
  6. 06 Feb 2011, 3 commits
  7. 05 Feb 2011, 2 commits
  8. 04 Feb 2011, 2 commits
    • serial: bfin_5xx: split uart RX lock from uart port lock to avoid deadlock · 0f66e50a
      Sonic Zhang committed
      The RX lock is used to protect the RX buffer from concurrent access in DMA
      mode between the timer and RX interrupt routines.  It is independent of
      the uart lock, which is used to protect the TX buffer.  It is possible for
      a uart TX transfer to be started from the RX interrupt handler if low
      latency is enabled.  So we need to split the locks to avoid deadlocking in
      this situation.
      
      In PIO mode, the RX lock is not necessary because the handle_simple_irq
      and handle_level_irq functions ensure driver interrupt handlers are called
      once on one core.
      
      And now that the RX path has its own lock, the TX interrupt has nothing to
      do with the RX path, so disable it at the same time.
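
      A structural sketch of the split (simplified; the rest of the
      driver state is omitted):

          struct bfin_serial_port {
              struct uart_port port;  /* port.lock protects the TX side   */
              spinlock_t rx_lock;     /* separate lock for the RX buffer, */
                                      /* taken by timer and RX IRQ paths  */
              /* ... */
          };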
      Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      0f66e50a
    • x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm · 831d52bc
      Suresh Siddha committed
      Clearing the cpu in prev's mm_cpumask early means that cpu stops
      receiving flush TLB IPIs while cr3 still points to the prev mm.  And
      this window can lead to the possibility of bogus TLB fills resulting
      in strange failures.  One such problematic scenario is described below.
      
       T1. CPU-1 is context switching from mm1 to mm2 and gets an NMI
           etc. between the point of clearing the cpu from mm_cpumask(mm1)
           and reloading cr3 with the new mm2.
      
       T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
           flushing the TLB for mm1.  It doesn't send the flush TLB to CPU-1
           as it doesn't see that cpu listed in the mm_cpumask(mm1).
      
       T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
           page-table pages associated with the removed vma mapping.
      
       T4. CPU-2 now allocates those freed page-table pages for something
           else.
      
       T5. As the CR3 and TLB caches for mm1 are still active on CPU-1, CPU-1
           can potentially speculate and walk through the page-table caches
           and can insert new TLB entries.  As the page-table pages are
           already freed and being used on CPU-2, this page walk can
           potentially insert a bogus global TLB entry depending on the
           (random) contents of the page that is being used on CPU-2.
      
       T6. This bogus TLB entry being global will be active across future CR3
           changes and can result in weird memory corruption etc.
      
      To avoid this issue, for the prev mm that is handing over the cpu to
      another mm, clear the cpu from mm_cpumask(prev) only after cr3 has
      been changed.
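
      A minimal sketch of the corrected ordering in switch_mm()
      (simplified; surrounding code omitted):

          load_cr3(next->pgd);                       /* switch to mm2 first */
          cpumask_clear_cpu(cpu, mm_cpumask(prev));  /* only now stop the
                                                        flush IPIs for mm1 */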
      
      Marking it for -stable, though we haven't seen any reported failure that
      can be attributed to this.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org	[v2.6.32+]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      831d52bc
  9. 03 Feb 2011, 4 commits
    • x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms · f7448548
      Suresh Siddha committed
      Markus Kohn ran into a hard hang regression on an acer aspire
      1310, when acpi is enabled. git bisect showed the following
      commit as the bad one that introduced the boot regression.
      
      	commit d0af9eed
      	Author: Suresh Siddha <suresh.b.siddha@intel.com>
      	Date:   Wed Aug 19 18:05:36 2009 -0700
      
      	    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
      
      Because of the UP configuration of that platform,
      native_smp_prepare_cpus() bailed out (in smp_sanity_check())
      before calling set_mtrr_aps_delayed_init().
      
      Further down the boot path, native_smp_cpus_done() calls the
      delayed MTRR initialization for the APs (mtrr_aps_init()) with
      mtrr_aps_delayed_init not set. This resulted in the boot
      processor reprogramming its MTRRs to the values seen during the
      start of the OS boot. While not ideally needed, this shouldn't
      have caused any side effects, because the reprogramming of
      MTRRs (set_mtrr_state(), which gets called via set_mtrr())
      checks whether the live register contents differ from what is
      being asked to be written, and does the actual write only if
      they differ.
      
      The BP's MTRR state is read during the start of the OS boot, and
      typically nothing has changed by the time we ask to reprogram it
      on the BP again in the above scenario on a UP platform. So on a
      normal UP platform no reprogramming of the BP's MTRR MSRs
      happens and all is well.
      
      However, on this platform the BIOS seems to modify the fixed
      MTRR range registers between the start of the OS boot and the
      point where we double-check the live registers before
      reprogramming the BP's MTRR registers. And as the live registers
      are modified, we end up reprogramming the MTRRs back to the
      state seen during the start of the OS boot.
      
      During ACPI initialization, something in the BIOS (probably an
      SMI handler?) doesn't like this and results in a hard lockup.
      
      We didn't see this boot hang issue on this platform before
      commit d0af9eed, because back then only the APs (if any) would
      program their MTRRs to the values that the BP had at the start
      of the OS boot.
      
      Fix this issue by checking mtrr_aps_delayed_init before
      continuing further in mtrr_aps_init(). Now only the APs (if any)
      program their MTRRs to the BP values during boot.
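
      A sketch of the guard described above (simplified; the use_intel()
      check is an assumption about the surrounding code):

          void mtrr_aps_init(void)
          {
              /* Bail out unless the delayed AP init was actually set up. */
              if (!use_intel() || !mtrr_aps_delayed_init)
                  return;
              /* ... rendezvous all CPUs and program the AP MTRRs ... */
          }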
      
      Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
      
        [ By the way, this behavior of the BIOS modifying MTRRs after the
          start of the OS boot is not common, and the kernel is not prepared
          to handle this situation well. Irrespective of this issue, during
          suspend/resume the Linux kernel will try to reprogram the BP's MTRR
          values to the values seen during the start of the OS boot. So
          suspend/resume might already be broken on this platform for all
          Linux kernel versions. ]
      Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
      Tested-by: Markus Kohn <jabber@gmx.org>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Thomas Renninger <trenn@novell.com>
      Cc: Rafael Wysocki <rjw@novell.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: stable@kernel.org # [v2.6.32+]
      LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f7448548
    • x86, nx: Don't force pages RW when setting NX bits · f12d3d04
      Matthieu CASTET committed
      Xen wants page table pages to be read-only.
      
      But the initial page tables (from head_*.S) live in .data or .bss.
      
      That was broken by 64edc8ed.  There is
      absolutely no reason to force these pages RW after they have already
      been marked RO.
      Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      f12d3d04
    • arm: omap4: panda: remove usb_nop_xceiv_register(v1) · ed2af92b
      Ming Lei committed
      Panda uses both the twl6030 otg phy (vbus, id) and the internal
      phy (data lines, DP/DM), so remove usb_nop_xceiv_register to make
      the twl6030 otg driver work, since the current otg code only
      supports one global transceiver. Without this removal, musb
      doesn't work.
      Reviewed-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      ed2af92b
    • OMAP1: Fix non-working LCD on OMAP310 · 719078a6
      Marek Vasut committed
      This patch fixes a bug introduced in revision:
      
      f8e9e984
      omap1: DMA: move LCD related code from plat-omap to mach-omap1
      
      The code introduced by that patch didn't consider any CPUs other than the
      OMAP1510, which rendered the OMAP310 -- which has the same LCD controller --
      non-working. Use cpu_is_omap15xx() instead of cpu_is_omap1510() to squash
      this issue.
      
      Bug found on Palm Zire 71 hardware.
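
      A sketch of the fix (the callee is hypothetical, shown only to
      illustrate the changed check):

          /* was: if (cpu_is_omap1510()) -- misses the OMAP310 */
          if (cpu_is_omap15xx())      /* matches the whole 15xx family */
              setup_lcd_dma();        /* hypothetical callee */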
      Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      719078a6
  10. 02 Feb 2011, 2 commits
    • OMAP3: Devkit8000: Change lcd power pin · daf7aabc
      Thomas Weber committed
      This patch fixes a wrongly used lcd enable pin.
      
      The Devkit8000 uses twl4030_ledA, configured as an output gpio, only for
      the lcd enable line. twl4030_gpio.1 is used through the generic
      gpio functions, while ledA is used via low-level twl4030 calls.
      
      This patch removes the low-level calls and uses the generic gpio functions
      for the initialization and use of ledA. This patch also fixes a bug where
      the lcd would not power down when blanking.
      
      Further, this patch fixes an indentation issue: a comment line that used
      eight spaces is changed to use a hard tab.
      
      gpio_request + gpio_direction_output are replaced with gpio_request_one.
      The return value of gpio_request_one is used to set the gpio number to
      -EINVAL when the request is unsuccessful, so that gpio_is_valid() can
      detect the failure. Gpios that were already requested successfully are
      not freed, though.
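
      A sketch of the pattern described above (the gpio variable and
      label are hypothetical):

          int ret;

          /* request + set direction and initial level in one call */
          ret = gpio_request_one(lcd_pwr_gpio, GPIOF_OUT_INIT_HIGH, "lcd_pwr");
          if (ret < 0)
              lcd_pwr_gpio = -EINVAL; /* gpio_is_valid() now reports failure */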
      Reported-by: Daniel Morsing <daniel.morsing@gmail.com>
      Signed-off-by: Thomas Weber <weber@corscience.de>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      daf7aabc
    • omap1: remove duplicated #include · 190910cb
      Huang Weiyi committed
      Remove duplicated #include(s) in
        arch/arm/mach-omap1/time.c
      Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      190910cb