1. 24 May 2011, 1 commit
  2. 23 May 2011, 16 commits
  3. 22 May 2011, 6 commits
    • KVM: make guest mode entry to be rcu quiescent state · 8fa22068
      Gleb Natapov committed
      KVM does not hold any references to rcu protected data when it switches
      the CPU into guest mode. In fact, switching to guest mode is very similar
      to exiting to userspace from an rcu point of view. In addition, the CPU
      may stay in guest mode for quite a long time (up to one time slice).
      Let's treat guest mode as a quiescent state, just like we do with
      user-mode execution.
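      A minimal sketch of what this means at the entry hook (the RCU-side
      helper name follows the rcutree API of this period, but treat it as
      an assumption here):

        static inline void kvm_guest_enter(void)
        {
                BUG_ON(preemptible());
                account_system_vtime(current);
                current->flags |= PF_VCPU;
                /*
                 * No RCU read-side critical sections are held across guest
                 * entry, so tell RCU this CPU reached a quiescent state,
                 * just as a switch to user mode would.
                 */
                rcu_virt_note_context_switch(smp_processor_id());
        }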
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      8fa22068
    • KVM: PPC: booke: add sregs support · 5ce941ee
      Scott Wood committed
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      5ce941ee
    • KVM: Use pci_store/load_saved_state() around VM device usage · f8fcfd77
      Alex Williamson committed
      Store the device saved state so that we can reload the device back
      to the original state when it's unassigned.  This has the benefit
      that the state survives across pci_reset_function() calls via
      the PCI sysfs reset interface while the VM is using the device.
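      A hedged sketch of the resulting usage pattern around assignment
      (the store/load interfaces come from the PCI commits below; the
      surrounding flow is illustrative):

        /* at assignment: snapshot config space into a detached buffer */
        pci_save_state(pdev);
        saved_state = pci_store_saved_state(pdev);

        /* ... guest owns the device; pci_reset_function() may run ... */

        /* at deassignment: reload the snapshot, then restore the device */
        pci_load_and_free_saved_state(pdev, &saved_state);
        pci_restore_state(pdev);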
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      f8fcfd77
    • PCI: Add interfaces to store and load the device saved state · ffbdd3f7
      Alex Williamson committed
      For KVM device assignment, we'd like to save off the state of a device
      prior to passing it to the guest and restore it later.  We also want
      to allow pci_reset_function() to be called while the device is owned
      by the guest.  This however overwrites and invalidates the struct pci_dev
      buffers, so we can't just manually call save and restore.  Add generic
      interfaces for the saved state to be stored and reloaded back into
      struct pci_dev at a later time.
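      The new interfaces, sketched (prototypes as this series describes
      them; struct pci_saved_state stays opaque to callers):

        struct pci_saved_state; /* detached copy of config + capability state */

        struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
        int pci_load_saved_state(struct pci_dev *dev,
                                 struct pci_saved_state *state);
        int pci_load_and_free_saved_state(struct pci_dev *dev,
                                          struct pci_saved_state **state);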
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      ffbdd3f7
    • PCI: Track the size of each saved capability data area · 24a4742f
      Alex Williamson committed
      This will allow us to store and load it later.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      24a4742f
    • PCI/e1000e: Add and use pci_disable_link_state_locked() · 9f728f53
      Yinghai Lu committed
      Need to use it in _e1000e_disable_aspm.  This routine is used for error
      recovery, where the pci_bus_sem is already held, and we don't want
      pci_disable_link_state to try to take it again.  So add a locked variant
      for use in cases like this.
      
      Found lock up:
      
      [ 2374.654557] kworker/32:1    D ffff881027f6b0f0     0  6075      2 0x00000000
      [ 2374.654816]  ffff88503f099a68 0000000000000046 ffff88503f098000 0000000000004000
      [ 2374.654837]  00000000001d1ec0 ffff88503f099fd8 00000000001d1ec0 ffff88503f099fd8
      [ 2374.654860]  0000000000004000 00000000001d1ec0 ffff88503dcc8000 ffff88503f090000
      [ 2374.654880] Call Trace:
      [ 2374.654898]  [<ffffffff810b1302>] ? __lock_acquired+0x3a/0x224
      [ 2374.654914]  [<ffffffff81c2b59c>] ? _raw_spin_unlock_irq+0x30/0x36
      [ 2374.654925]  [<ffffffff810b069d>] ? trace_hardirqs_on_caller+0x1f/0x178
      [ 2374.654936]  [<ffffffff81c2ab24>] rwsem_down_failed_common+0xd3/0x103
      [ 2374.654945]  [<ffffffff810b158f>] ? __lock_contended+0x3a/0x2a2
      [ 2374.654955]  [<ffffffff81c2ab7b>] rwsem_down_read_failed+0x12/0x14
      [ 2374.654967]  [<ffffffff813371e4>] call_rwsem_down_read_failed+0x14/0x30
      [ 2374.654981]  [<ffffffff8135df20>] ? pci_disable_link_state+0x5f/0xf5
      [ 2374.654990]  [<ffffffff81c2a0e6>] ? down_read+0x7e/0x91
      [ 2374.654999]  [<ffffffff8135df20>] ? pci_disable_link_state+0x5f/0xf5
      [ 2374.655008]  [<ffffffff8135df20>] pci_disable_link_state+0x5f/0xf5
      [ 2374.655024]  [<ffffffff81661796>] e1000e_disable_aspm+0x55/0x5a
      [ 2374.655037]  [<ffffffff816677eb>] e1000_io_slot_reset+0x59/0xea
      [ 2374.655048]  [<ffffffff8135fe0d>] ? report_mmio_enabled+0x5d/0x5d
      [ 2374.655057]  [<ffffffff8135fe3b>] report_slot_reset+0x2e/0x5d
      [ 2374.655072]  [<ffffffff8135369e>] pci_walk_bus+0x8a/0xb7
      [ 2374.655081]  [<ffffffff8135fe0d>] ? report_mmio_enabled+0x5d/0x5d
      [ 2374.655091]  [<ffffffff813603be>] broadcast_error_message+0xa4/0xb2
      [ 2374.655101]  [<ffffffff81352c71>] ? pci_bus_read_config_dword+0x72/0x80
      [ 2374.655110]  [<ffffffff813606df>] do_recovery+0x9e/0xf9
      [ 2374.655120]  [<ffffffff81360786>] handle_error_source+0x4c/0x51
      [ 2374.655129]  [<ffffffff81360974>] aer_isr_one_error+0x1e9/0x21a
      [ 2374.655138]  [<ffffffff81360a6c>] aer_isr+0xc7/0xcc
      [ 2374.655147]  [<ffffffff813609a5>] ? aer_isr_one_error+0x21a/0x21a
      [ 2374.655159]  [<ffffffff81096d9f>] process_one_work+0x237/0x3ec
      [ 2374.655168]  [<ffffffff81096d10>] ? process_one_work+0x1a8/0x3ec
      [ 2374.655178]  [<ffffffff8109728d>] worker_thread+0x17c/0x240
      [ 2374.655186]  [<ffffffff810b0803>] ? trace_hardirqs_on+0xd/0xf
      [ 2374.655196]  [<ffffffff81097111>] ? manage_workers+0xab/0xab
      [ 2374.655209]  [<ffffffff8109c8ed>] kthread+0xa0/0xa8
      [ 2374.655223]  [<ffffffff81c332d4>] kernel_thread_helper+0x4/0x10
      [ 2374.655232]  [<ffffffff81c2b880>] ? retint_restore_args+0xe/0xe
      [ 2374.655243]  [<ffffffff8109c84d>] ? __init_kthread_worker+0x5b/0x5b
      [ 2374.655252]  [<ffffffff81c332d0>] ? gs_change+0xb/0xb
      
      When AER happens,
      pci_walk_bus() already holds down_read(&pci_bus_sem)...
      then report_slot_reset
              ==> e1000_io_slot_reset
                      ==> e1000e_disable_aspm
                              ==> pci_disable_link_state...
      
      We can not use pci_disable_link_state here, since it will try to take
      pci_bus_sem again.
      
      So add __pci_disable_link_state(), which does not need to take pci_bus_sem.
      
      -v2: change name to pci_disable_link_state_locked() according to Jesse.
      
      [jbarnes: make sure new function is exported for modules]
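      A hedged sketch of the resulting split (the common helper takes a
      flag saying whether to acquire pci_bus_sem; exact internals may
      differ from the patch):

        /* caller already holds pci_bus_sem, e.g. inside pci_walk_bus() */
        void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
        {
                __pci_disable_link_state(pdev, state, false);
        }
        EXPORT_SYMBOL(pci_disable_link_state_locked);

        /* normal path: takes pci_bus_sem itself */
        void pci_disable_link_state(struct pci_dev *pdev, int state)
        {
                __pci_disable_link_state(pdev, state, true);
        }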
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      9f728f53
  4. 21 May 2011, 5 commits
  5. 20 May 2011, 11 commits
    • sched: Increase SCHED_LOAD_SCALE resolution · c8b28116
      Nikhil Rao committed
      Introduce SCHED_LOAD_RESOLUTION, which is added to
      SCHED_LOAD_SHIFT and increases the resolution of
      SCHED_LOAD_SCALE. This patch sets the value of
      SCHED_LOAD_RESOLUTION to 10, scaling up the weights for all
      sched entities by a factor of 1024. With this extra resolution,
      we can handle deeper cgroup hierarchies and the scheduler can do
      better shares distribution and load balancing on larger
      systems (especially for low weight task groups).
      
      This does not change the existing user interface, the scaled
      weights are only used internally. We do not modify
      prio_to_weight values or inverses, but use the original weights
      when calculating the inverse which is used to scale execution
      time delta in calc_delta_mine(). This ensures we do not lose
      accuracy when accounting time to the sched entities. Thanks to
      Nikunj Dadhania for fixing a bug in c_d_m() that broke fairness.
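      A sketch of the scaling this describes (the 10-bit shift from above;
      any 32-bit guards in the actual patch are omitted here):

        #define SCHED_LOAD_RESOLUTION   10
        #define scale_load(w)           ((w) << SCHED_LOAD_RESOLUTION)
        #define scale_load_down(w)      ((w) >> SCHED_LOAD_RESOLUTION)

        /* SCHED_LOAD_SCALE grows from 2^10 to 2^20 */
        #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
        #define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)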
      
      Below is some analysis of the performance costs/improvements of
      this patch.
      
      1. Micro-arch performance costs:
      
      Experiment was to run Ingo's pipe_test_100k 200 times with the
      task pinned to one cpu. I measured instruction, cycles and
      stalled-cycles for the runs. See:
      
         http://thread.gmane.org/gmane.linux.kernel/1129232/focus=1129389
      
      for more info.
      
      -tip (baseline):
      
       Performance counter stats for '/root/load-scale/pipe-test-100k' (200 runs):
      
             964,991,769 instructions             #    0.82  insns per cycle
                                                  #    0.33  stalled cycles per insn
                                                  #    ( +-  0.05% )
           1,171,186,635 cycles                   #    0.000 GHz                      ( +-  0.08% )
             306,373,664 stalled-cycles-backend   #   26.16% backend  cycles idle     ( +-  0.28% )
             314,933,621 stalled-cycles-frontend  #   26.89% frontend cycles idle     ( +-  0.34% )
      
              1.122405684  seconds time elapsed  ( +-  0.05% )
      
      -tip+patches:
      
       Performance counter stats for './load-scale/pipe-test-100k' (200 runs):
      
             963,624,821 instructions             #    0.82  insns per cycle
                                                  #    0.33  stalled cycles per insn
                                                  #    ( +-  0.04% )
           1,175,215,649 cycles                   #    0.000 GHz                      ( +-  0.08% )
             315,321,126 stalled-cycles-backend   #   26.83% backend  cycles idle     ( +-  0.28% )
             316,835,873 stalled-cycles-frontend  #   26.96% frontend cycles idle     ( +-  0.29% )
      
              1.122238659  seconds time elapsed  ( +-  0.06% )
      
      With this patch, instructions decrease by ~0.10% and cycles
      increase by 0.27%. This doesn't look statistically significant.
      The number of stalled cycles in the backend increased from
      26.16% to 26.83%. This can be attributed to the shifts we do in
      c_d_m() and other places. The fraction of stalled cycles in the
      frontend remains about the same, at 26.96% compared to 26.89% in -tip.
      
      2. Balancing low-weight task groups
      
      Test setup: run 50 tasks with random sleep/busy times (biased
      around 100ms) in a low weight container (with cpu.shares = 2).
      Measure %idle as reported by mpstat over a 10s window.
      
      -tip (baseline):
      
      06:47:48 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
      06:47:49 PM  all   94.32    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.62  15888.00
      06:47:50 PM  all   94.57    0.00    0.62    0.00    0.00    0.00    0.00    0.00    4.81  16180.00
      06:47:51 PM  all   94.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.25  15966.00
      06:47:52 PM  all   95.81    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.19  16053.00
      06:47:53 PM  all   94.88    0.06    0.00    0.00    0.00    0.00    0.00    0.00    5.06  15984.00
      06:47:54 PM  all   93.31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    6.69  15806.00
      06:47:55 PM  all   94.19    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.75  15896.00
      06:47:56 PM  all   92.87    0.00    0.00    0.00    0.00    0.00    0.00    0.00    7.13  15716.00
      06:47:57 PM  all   94.88    0.00    0.00    0.00    0.00    0.00    0.00    0.00    5.12  15982.00
      06:47:58 PM  all   95.44    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.56  16075.00
      Average:     all   94.49    0.01    0.08    0.00    0.00    0.00    0.00    0.00    5.42  15954.60
      
      -tip+patches:
      
      06:47:03 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
      06:47:04 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16630.00
      06:47:05 PM  all   99.69    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.31  16580.20
      06:47:06 PM  all   99.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.25  16596.00
      06:47:07 PM  all   99.20    0.00    0.74    0.00    0.00    0.06    0.00    0.00    0.00  17838.61
      06:47:08 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16540.00
      06:47:09 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16575.00
      06:47:10 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16614.00
      06:47:11 PM  all   99.94    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.06  16588.00
      06:47:12 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16593.00
      06:47:13 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16551.00
      Average:     all   99.84    0.00    0.09    0.00    0.00    0.01    0.00    0.00    0.06  16711.58
      
      We see an improvement in idle% on the system (drops from 5.42% on -tip to 0.06%
      with the patches).
      
      Signed-off-by: Nikhil Rao <ncrao@google.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
      Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1305754668-18792-1-git-send-email-ncrao@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c8b28116
    • sched: Introduce SCHED_POWER_SCALE to scale cpu_power calculations · 1399fa78
      Nikhil Rao committed
      SCHED_LOAD_SCALE is used to increase nice resolution and to
      scale cpu_power calculations in the scheduler. This patch
      introduces SCHED_POWER_SCALE and converts all uses of
      SCHED_LOAD_SCALE for scaling cpu_power to use SCHED_POWER_SCALE
      instead.
      
      This is a preparatory patch for increasing the resolution of
      SCHED_LOAD_SCALE, and there is no need to increase resolution
      for cpu_power calculations.
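      Sketch of the new define (cpu_power keeps the original 2^10
      resolution while the load-weight scale is free to grow):

        #define SCHED_POWER_SHIFT       10
        #define SCHED_POWER_SCALE       (1L << SCHED_POWER_SHIFT)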
      Signed-off-by: Nikhil Rao <ncrao@google.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
      Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Link: http://lkml.kernel.org/r/1305738580-9924-3-git-send-email-ncrao@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1399fa78
    • macvlan: remove one synchronize_rcu() call · 449f4544
      Eric Dumazet committed
      When one macvlan device is dismantled, we can avoid one
      synchronize_rcu() call done after deletion from hash list, since caller
      will perform a synchronize_net() call after its ndo_stop() call.
      
      Add a new netdev->dismantle field to signal this dismantle intent.
      
      Reduces RTNL hold time.
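      A hedged sketch of the mechanism (field and check as described;
      exact names and placement in the patch may differ):

        /* core sets this before calling ndo_stop() during unregister */
        dev->dismantle = true;

        static void macvlan_hash_del(struct macvlan_dev *vlan, bool sync)
        {
                hlist_del_rcu(&vlan->hlist);
                /* callers pass !dev->dismantle: when the device is being
                 * torn down, the final synchronize_net() covers us */
                if (sync)
                        synchronize_rcu();
        }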
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      449f4544
    • signal.h needs a definition of struct task_struct · 1477fcc2
      Stephen Rothwell committed
      This fixes these build errors on powerpc:
      
        In file included from arch/powerpc/mm/fault.c:18:
        include/linux/signal.h:239: error: 'struct task_struct' declared inside parameter list
        include/linux/signal.h:239: error: its scope is only this definition or declaration, which is probably not what you want
        include/linux/signal.h:240: error: 'struct task_struct' declared inside parameter list
        ..
      
      Exposed by commit e66eed65 ("list: remove prefetching from regular
      list iterators"), which removed the include of <linux/prefetch.h> from
      <linux/list.h>.
      
      Without that, linux/signal.h no longer accidentally got the declaration
      of 'struct task_struct'.
      
      Fix by properly declaring the struct, rather than introducing any new
      header file dependency.
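      The fix in sketch form: a bare forward declaration, since the
      prototypes in this header only use pointers to the struct:

        /* include/linux/signal.h */
        struct task_struct;     /* forward declaration; no need to pull in
                                   the full definition from <linux/sched.h> */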
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1477fcc2
    • libata: Power off empty ports · 8a745f1f
      Kristen Carlson Accardi committed
      Give users the option of completely powering off unoccupied
      SATA ports using the existing min_power link_power_management_policy
      option.  When the user selects this option on an empty port, we
      will power the port off by setting DET to off.  For occupied ports,
      behavior is unchanged.
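      A hedged sketch of the power-off itself (SControl.DET = 0x4 is the
      SATA "disable interface" value; the helper name is illustrative,
      not the driver's actual function):

        static int empty_port_power_off(struct ata_link *link)
        {
                u32 scontrol;
                int rc;

                rc = sata_scr_read(link, SCR_CONTROL, &scontrol);
                if (rc)
                        return rc;
                scontrol &= ~0xf;
                scontrol |= 0x4;        /* DET: disable the SATA interface */
                return sata_scr_write(link, SCR_CONTROL, scontrol);
        }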
      Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
      Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
      8a745f1f
    • tty/serial: Fix break handling for PORT_TEGRA · 5f873bae
      Stephen Warren committed
      When a break is received, Tegra's UART apparently fills the FIFO with
      0 bytes. These must be drained so that they aren't interpreted as actual
      data received. This allows e.g. MAGIC_SYSRQ to work on Tegra's UARTs.
      
      v2: Added FIXME comment to clear_rx_fifo
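      A hedged sketch of the drain loop (clear_rx_fifo per the v2 note
      above; accessors follow the 8250 driver's serial_in() convention):

        static void clear_rx_fifo(struct uart_8250_port *up)
        {
                unsigned char lsr;

                /* FIXME: a break may still be arriving; keep reading
                 * until the FIFO reports empty */
                do {
                        lsr = serial_in(up, UART_LSR);
                        if (lsr & (UART_LSR_DR | UART_LSR_BI))
                                serial_in(up, UART_RX); /* discard 0 byte */
                } while (lsr & (UART_LSR_DR | UART_LSR_BI));
        }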
      Originally-by: Laxman Dewangan <ldewangan@nvidia.com>
      Cc: Laxman Dewangan <ldewangan@nvidia.com>
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Acked-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      5f873bae
    • tty/serial: Add explicit PORT_TEGRA type · 4539c24f
      Stephen Warren committed
      Tegra's UART is currently auto-detected as PORT_XSCALE due to register
      bit UART_IER.UUE being writable. However, the Tegra documentation states
      that this register bit is reserved. Hence, we should not program it.
      
      Instead, the documentation specifies that the UART is 16550 compatible.
      However, Tegra does need register bit UART_IER.RTOIE set, which is not
      enabled by any 16550 port type. This was not noticed before, since
      PORT_XSCALE enables CAP_UUE, which conflates both UUE and RTOIE bit
      programming.
      
      This change defines PORT_TEGRA that doesn't set UART_CAP_UUE, but does
      set UART_CAP_RTOIE, which is a new capability indicating that the RTOIE
      bit needs to be enabled.
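      A hedged sketch of the new uart_config[] entry (the FIFO sizing
      values here are assumptions, not taken from the patch):

        [PORT_TEGRA] = {
                .name           = "Tegra",
                .fifo_size      = 32,   /* assumed */
                .tx_loadsz      = 8,    /* assumed */
                .fcr            = UART_FCR_ENABLE_FIFO | UART_FCR_R_TRIG_01,
                .flags          = UART_CAP_FIFO | UART_CAP_RTOIE,
        },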
      Based-on-code-by: Laxman Dewangan <ldewangan@nvidia.com>
      Cc: Laxman Dewangan <ldewangan@nvidia.com>
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Acked-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      4539c24f
    • list: remove prefetching from regular list iterators · e66eed65
      Linus Torvalds committed
      This removes the use of software prefetching from the regular list
      iterators.  We don't want it.  If you do want to prefetch in some
      iterator of yours, go right ahead.  Just don't expect the iterator to do
      it, since normally the downsides are bigger than the upsides.
      
      It also replaces <linux/prefetch.h> with <linux/const.h>, because the
      use of LIST_POISON ends up needing it.  <linux/poison.h> is sadly not
      self-contained, and including prefetch.h just happened to hide that.
      
      Suggested by David Miller (networking has a lot of regular lists that
      are often empty or a single entry, and prefetching is not going to do
      anything but add useless instructions).
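      Before/after sketch for one of the iterators (the other list
      iterators change the same way):

        /* before: prefetch folded into the loop condition */
        #define list_for_each(pos, head) \
                for (pos = (head)->next; prefetch(pos->next), pos != (head); \
                     pos = pos->next)

        /* after: plain traversal */
        #define list_for_each(pos, head) \
                for (pos = (head)->next; pos != (head); pos = pos->next)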
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e66eed65
    • ASoC: Asahi Kasei AK4641 codec driver · 00d27010
      Dmitry Artamonow committed
      A driver for the AK4641 codec used in iPAQ hx4700 and Glofiish M800
      among others.
      Signed-off-by: Harald Welte <laforge@gnumonks.org>
      Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
      Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
      Acked-by: Liam Girdwood <lrg@ti.com>
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      00d27010
    • hlist: remove software prefetching in hlist iterators · 75d65a42
      Linus Torvalds committed
      They not only increase the code footprint, they actually make things
      slower rather than faster.  On internationally acclaimed benchmarks
      ("make -j16" on an already fully built kernel source tree) the hlist
      prefetching slows down the build by up to 1%.
      
      (Almost all of it comes from hlist_for_each_entry_rcu() as used by
      avc_has_perm_noaudit(), which is very hot due to all the pathname
      lookups to see if there is anything to do).
      
      The cause seems to be two-fold:
      
       - on at least some Intel cores, prefetch(NULL) ends up with some
         microarchitectural stall due to the TLB miss that it incurs.  The
         hlist case triggers this very commonly, since the NULL pointer is the
         last entry in the list.
      
       - the prefetch appears to cause more D$ activity, probably because it
         prefetches hash list entries that are never actually used (because we
         ended the search early due to a hit).
      
      Regardless, the numbers clearly say that the implicit prefetching is
      simply a bad idea.  If some _particular_ user of the hlist iterators
      wants to prefetch the next list entry, they can do so themselves
      explicitly, rather than depend on all list iterators doing so
      implicitly.
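      The same before/after shape, sketched for hlist, where the
      terminating NULL is exactly what gets prefetched:

        /* before: prefetch runs even when pos->next is the NULL at the end */
        #define hlist_for_each(pos, head) \
                for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
                     pos = pos->next)

        /* after */
        #define hlist_for_each(pos, head) \
                for (pos = (head)->first; pos; pos = pos->next)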
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      75d65a42
    • ipv6: reduce per device ICMP mib sizes · be281e55
      Eric Dumazet committed
      ipv6 has per device ICMP SNMP counters, taking too much space because
      they use percpu storage.
      
      The needed size per device is:
      (512+4)*sizeof(long)*number_of_possible_cpus*2
      
      On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of
      memory per ipv6 enabled network device, taken in vmalloc pool.
      
      Since ICMP messages are rare, just use shared counters (atomic_long_t).
      
      Per network space ICMP counters are still using percpu memory, we might
      also convert them to shared counters in a future patch.
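      A hedged sketch of the shared-counter form (struct and macro names
      approximate the patch):

        /* one shared array per device instead of percpu copies */
        struct icmpv6_mib_device {
                atomic_long_t   mibs[ICMP6_MIB_MAX];
        };

        #define SNMP_INC_STATS_ATOMIC_LONG(mib, field) \
                atomic_long_inc(&(mib)->mibs[field])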
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be281e55
  6. 19 May 2011, 1 commit