1. 22 11月, 2010 1 次提交
    • M
      drm/vblank: Add support for precise vblank timestamping. · 27641c3f
      Mario Kleiner 提交于
      The DRI2 swap & sync implementation needs precise
      vblank counts and precise timestamps corresponding
      to those vblank counts. For conformance to the OpenML
      OML_sync_control extension specification the DRM
      timestamp associated with a vblank count should
      correspond to the start of video scanout of the first
      scanline of the video frame following the vblank
      interval for that vblank count.
      
      Therefore we need to carry around precise timestamps
      for vblanks. Currently the DRM and KMS drivers generate
      timestamps ad-hoc via do_gettimeofday() in some
      places. The resulting timestamps are sometimes not
      very precise due to interrupt handling delays, they
      don't conform to OML_sync_control and some are wrong,
      as they aren't taken synchronized to the vblank.
      
      This patch implements support inside the drm core
      for precise and robust timestamping. It consists
      of the following interrelated pieces.
      
      1. Vblank timestamp caching:
      
      A per-crtc ringbuffer stores the most recent vblank
      timestamps corresponding to vblank counts.
      
      The ringbuffer can be read out lock-free via the
      accessor function:
      
      struct timeval timestamp;
      vblankcount = drm_vblank_count_and_time(dev, crtcid, &timestamp).
      
      The function returns the current vblank count and
      the corresponding timestamp for start of video
      scanout following the vblank interval. It can be
      used anywhere between enclosing drm_vblank_get(dev, crtcid)
      and drm_vblank_put(dev,crtcid) statements. It is used
      inside the drmWaitVblank ioctl and in the vblank event
      queueing and handling. It should be used by kms drivers for
      timestamping of bufferswap completion.
      
      The timestamp ringbuffer is reinitialized each time
      vblank irq's get reenabled in drm_vblank_get()/
      drm_update_vblank_count(). It is invalidated when
      vblank irq's get disabled.
      
      The ringbuffer is updated inside drm_handle_vblank()
      at each vblank irq.
      
      2. Calculation of precise vblank timestamps:
      
      drm_get_last_vbltimestamp() is used to compute the
      timestamp for the end of the most recent vblank (if
      inside active scanout), or the expected end of the
      current vblank interval (if called inside a vblank
      interval). The function calls into a new optional kms
      driver entry point dev->driver->get_vblank_timestamp()
      which is supposed to provide the precise timestamp.
      If a kms driver doesn't implement the entry point or
      if the call fails, a simple do_gettimeofday() timestamp
      is returned as crude approximation of the true vblank time.
      
      A new drm module parameter drm.timestamp_precision_usec
      allows to disable high precision timestamps (if set to
      zero) or to specify the maximum acceptable error in
      the timestamps in microseconds.
      
      Kms drivers could implement their get_vblank_timestamp()
      function in a gpu specific way, as long as returned
      timestamps conform to OML_sync_control, e.g., by use
      of gpu specific hardware timestamps.
      
      Optionally, kms drivers can simply wrap and use the new
      utility function drm_calc_vbltimestamp_from_scanoutpos().
      This function calls a new optional kms driver function
      dev->driver->get_scanout_position() which returns the
      current horizontal and vertical video scanout position
      of the crtc. The scanout position together with the
      drm_display_timing of the current video mode is used
      to calculate elapsed time relative to start of active scanout
      for the current video frame. This elapsed time is subtracted
      from the current do_gettimeofday() time to get the timestamp
      corresponding to start of video scanout. Currently
      non-interlaced, non-doublescan video modes, with or
      without panel scaling are handled correctly. Interlaced/
      doublescan modes are tbd in a future patch.
      
      3. Filtering of redundant vblank irq's and removal of
      some race-conditions in the vblank irq enable/disable path:
      
      Some gpu's (e.g., Radeon R500/R600) send spurious vblank
      irq's outside the vblank if vblank irq's get reenabled.
      These get detected by use of the vblank timestamps and
      filtered out to avoid miscounting of vblanks.
      
      Some race-conditions between the vblank irq enable/disable
      functions, the vblank irq handler and the gpu itself (updating
      its hardware vblank counter in the "wrong" moment) are
      fixed inside vblank_disable_and_save() and
      drm_update_vblank_count() by use of the vblank timestamps and
      a new spinlock dev->vblank_time_lock.
      
      The time until vblank irq disable is now configurable via
      a new drm module parameter drm.vblankoffdelay to allow
      experimentation with timeouts that are much shorter than
      the current 5 seconds and should allow longer vblank off
      periods for better power savings.
      
      Followup patches will use these new functions to
      implement precise timestamping for the intel and radeon
      kms drivers.
      Signed-off-by: NMario Kleiner <mario.kleiner@tuebingen.mpg.de>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      27641c3f
  2. 20 11月, 2010 1 次提交
  3. 19 11月, 2010 2 次提交
  4. 18 11月, 2010 4 次提交
  5. 17 11月, 2010 2 次提交
  6. 16 11月, 2010 8 次提交
    • T
      NLM: Fix a regression in lockd · 8e35f8e7
      Trond Myklebust 提交于
      Nick Bowler reports:
      There are no unusual messages on the client... but I just logged into
      the server and I see lots of messages of the following form:
      
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
      
      Bisected to commit 92476850 (SUNRPC:
      Properly initialize sock_xprt.srcaddr in all cases)
      
      Apparently, removing the 'transport->srcaddr.ss_family = family' from
      xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly
      initialising the srcaddr family to AF_UNSPEC.
      Reported-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      8e35f8e7
    • E
      capabilities/syslog: open code cap_syslog logic to fix build failure · 12b3052c
      Eric Paris 提交于
      The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
      failure when CONFIG_PRINTK=n.  This is because the capabilities code
      which used the new option was built even though the variable in question
      didn't exist.
      
      The patch here fixes this by moving the capabilities checks out of the
      LSM and into the caller.  All (known) LSMs should have been calling the
      capabilities hook already so it actually makes the code organization
      better to eliminate the hook altogether.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12b3052c
    • J
      i2c: Mark i2c_adapter.id as deprecated · e1e18ee1
      Jean Delvare 提交于
      It's about time to make it clear that i2c_adapter.id is deprecated.
      Hopefully this will remind the last user to move over to a different
      strategy.
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      Acked-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: NHans Verkuil <hverkuil@xs4all.nl>
      e1e18ee1
    • J
      i2c: Delete unused adapter IDs · dfdee5f0
      Jean Delvare 提交于
      Delete unused I2C adapter IDs. Special cases are:
      
      * I2C_HW_B_RIVA was still set in driver rivafb, however no other
        driver is ever looking for this value, so we can safely remove it.
      * I2C_HW_B_HDPVR is used in staging driver lirc_zilog, however no
        adapter ID is ever set to this value, so the code in question never
        runs. As the code additionally expects that I2C_HW_B_HDPVR may not
        be defined, we can delete it now and let the lirc_zilog driver
        maintainer rewrite this piece of code.
      
      Big thanks for Hans Verkuil for doing all the hard work :)
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      Acked-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: NHans Verkuil <hverkuil@xs4all.nl>
      dfdee5f0
    • L
      include/linux/kernel.h: Move logging bits to include/linux/printk.h · 968ab183
      Linus Torvalds 提交于
      Move the logging bits from kernel.h into printk.h so that
      there is a bit more logical separation of the generic from
      the printk logging specific parts.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      968ab183
    • J
      cfg80211: fix WIPHY_FLAG_IBSS_RSN bit · 309075cf
      Jussi Kivilinna 提交于
      WIPHY_FLAG_IBSS_RSN is BIT(7) as is WIPHY_FLAG_CONTROL_PORT_PROTOCOL. Change
      to BIT(8).
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      309075cf
    • A
      b43legacy: Fix compile on ARM architecture · 62370e2b
      Arnd Hannemann 提交于
      When b43legacy is compiled on the arm platform, the following errors are seen:
      
        CC [M]  drivers/net/wireless/b43legacy/xmit.o
      In file included from include/net/dst.h:11,
      from drivers/net/wireless/b43legacy/xmit.c:31:
      include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__'
         before '____cacheline_aligned_in_smp'
      include/net/dst_ops.h: In function 'dst_entries_get_fast':
      include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named
         'pcpuc_entries'
      include/net/dst_ops.h: In function 'dst_entries_get_slow':
      include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named
         'pcpuc_entries'
      include/net/dst_ops.h: In function 'dst_entries_add':
      include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named
         'pcpuc_entries'
      include/net/dst_ops.h: In function 'dst_entries_init':
      include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named
         'pcpuc_entries'
      include/net/dst_ops.h: In function 'dst_entries_destroy':
      include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named
         'pcpuc_entries'
      make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
      make[3]: *** [drivers/net/wireless/b43legacy] Error 2
      make[2]: *** [drivers/net/wireless] Error 2
      make[1]: *** [drivers/net] Error 2
      make: *** [drivers] Error 2
      
      The cause is a missing include of <linux/cache.h>, which is present for
      i386 and x86_64 architectures, but not for arm.
      Signed-off-by: NArnd Hannemann <arnd@arndnet.de>
      Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Stable <stable@kernel.org>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      62370e2b
    • A
      net: rtnetlink.h -- only include linux/netdevice.h when used by the kernel · 3b42a96d
      Andy Whitcroft 提交于
      The commit below added a new helper dev_ingress_queue to cleanly obtain the
      ingress queue pointer.  This necessitated including 'linux/netdevice.h':
      
        commit 24824a09
        Author: Eric Dumazet <eric.dumazet@gmail.com>
        Date:   Sat Oct 2 06:11:55 2010 +0000
      
          net: dynamic ingress_queue allocation
      
      However this include triggers issues for applications in userspace
      which use the rtnetlink interfaces.  Commonly this requires they include
      'net/if.h' and 'linux/rtnetlink.h' leading to a compiler error as below:
      
        In file included from /usr/include/linux/netdevice.h:28:0,
                         from /usr/include/linux/rtnetlink.h:9,
                         from t.c:2:
        /usr/include/linux/if.h:135:8: error: redefinition of ‘struct ifmap’
        /usr/include/net/if.h:112:8: note: originally defined here
        /usr/include/linux/if.h:169:8: error: redefinition of ‘struct ifreq’
        /usr/include/net/if.h:127:8: note: originally defined here
        /usr/include/linux/if.h:218:8: error: redefinition of ‘struct ifconf’
        /usr/include/net/if.h:177:8: note: originally defined here
      
      The new helper is only defined for the kernel and protected by __KERNEL__
      therefore we can simply pull the include down into the same protected
      section.
      Signed-off-by: NAndy Whitcroft <apw@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b42a96d
  7. 15 11月, 2010 1 次提交
  8. 13 11月, 2010 1 次提交
  9. 12 11月, 2010 10 次提交
    • A
      backlight: add low threshold to pwm backlight · fef7764f
      Arun Murthy 提交于
      The intensity of the backlight can be varied from a range of
      max_brightness to zero.  Though most, if not all the pwm based backlight
      devices start flickering at lower brightness value.  And also for each
      device there exists a brightness value below which the backlight appears
      to be turned off though the value is not equal to zero.
      
      If the range of brightness for a device is from zero to max_brightness.  A
      graph is plotted for brightness Vs intensity for the pwm based backlight
      device has to be a linear graph.
      
      intensity
      	  |   /
      	  |  /
      	  | /
      	  |/
      	  ---------
      	 0	max_brightness
      
      But pratically on measuring the above we note that the intensity of
      backlight goes to zero(OFF) when the value in not zero almost nearing to
      zero(some x%).  so the graph looks like
      
      intensity
      	  |    /
      	  |   /
      	  |  /
      	  |  |
      	  ------------
      	 0   x	 max_brightness
      
      In order to overcome this drawback knowing this x% i.e nothing but the low
      threshold beyond which the backlight is off and will have no effect, the
      brightness value is being offset by the low threshold value(retaining the
      linearity of the graph).  Now the graph becomes
      
      intensity
      	  |     /
      	  |    /
      	  |   /
      	  |  /
      	  -------------
      	   0	  max_brightness
      
      With this for each and every digit increment in the brightness from zero
      there is a change in the intensity of backlight.  Devices having this
      behaviour can set the low threshold brightness(lth_brightness) and pass
      the same as platform data else can have it as zero.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NArun Murthy <arun.murthy@stericsson.com>
      Acked-by: NLinus Walleij <linus.walleij@stericsson.com>
      Acked-by: NRichard Purdie <rpurdie@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fef7764f
    • S
      leds: driver for National Semiconductors LP5523 chip · 0efba16c
      Samu Onkalo 提交于
      LP5523 chip is nine channel led driver with programmable engines.  Driver
      provides support for that chip for direct access via led class or via
      programmable engines.
      Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0efba16c
    • S
      leds: driver for National Semiconductor LP5521 chip · 500fe141
      Samu Onkalo 提交于
      This patchset provides support for LP5521 and LP5523 LED driver chips from
      National Semicondutor.  Both drivers supports programmable engines and
      naturally LED class features.
      
      Documentation is provided as a part of the patchset.  I created "leds"
      subdirectory under Documentation.  Perhaps the rest of the leds*
      documentation should be moved there.
      
      Datasheets are freely available at National Semiconductor www pages.
      
      This patch:
      
      LP5521 chip is three channel led driver with programmable engines.  Driver
      provides support for that chip for direct access via led class or via
      programmable engines.
      Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      500fe141
    • J
      led-class: always implement blinking · 5ada28bf
      Johannes Berg 提交于
      Currently, blinking LEDs can be awkward because it is not guaranteed that
      all LEDs implement blinking.  The trigger that wants it to blink then
      needs to implement its own timer solution.
      
      Rather than require that, add led_blink_set() API that triggers can use.
      This function will attempt to use hw blinking, but if that fails
      implements a timer for it.  To stop blinking again, brightness_set() also
      needs to be wrapped into API that will stop the software blink.
      
      As a result of this, the timer trigger becomes a very trivial one, and
      hopefully we can finally see triggers using blinking as well because it's
      always easy to use.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Acked-by: NRichard Purdie <rpurdie@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ada28bf
    • N
      radix-tree: fix RCU bug · 27d20fdd
      Nick Piggin 提交于
      Salman Qazi describes the following radix-tree bug:
      
      In the following case, we get can get a deadlock:
      
      0.  The radix tree contains two items, one has the index 0.
      1.  The reader (in this case find_get_pages) takes the rcu_read_lock.
      2.  The reader acquires slot(s) for item(s) including the index 0 item.
      3.  The non-zero index item is deleted, and as a consequence the other item is
          moved to the root of the tree. The place where it used to be is queued for
          deletion after the readers finish.
      3b. The zero item is deleted, removing it from the direct slot, it remains in
          the rcu-delayed indirect node.
      4.  The reader looks at the index 0 slot, and finds that the page has 0 ref
          count
      5.  The reader looks at it again, hoping that the item will either be freed or
          the ref count will increase. This never happens, as the slot it is looking
          at will never be updated. Also, this slot can never be reclaimed because
          the reader is holding rcu_read_lock and is in an infinite loop.
      
      The fix is to re-use the same "indirect" pointer case that requires a slot
      lookup retry into a general "retry the lookup" bit.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Reported-by: NSalman Qazi <sqazi@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27d20fdd
    • D
      Restrict unprivileged access to kernel syslog · eaf06b24
      Dan Rosenberg 提交于
      The kernel syslog contains debugging information that is often useful
      during exploitation of other vulnerabilities, such as kernel heap
      addresses.  Rather than futilely attempt to sanitize hundreds (or
      thousands) of printk statements and simultaneously cripple useful
      debugging functionality, it is far simpler to create an option that
      prevents unprivileged users from reading the syslog.
      
      This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
      dmesg_restrict sysctl.  When set to "0", the default, no restrictions are
      enforced.  When set to "1", only users with CAP_SYS_ADMIN can read the
      kernel syslog via dmesg(8) or other mechanisms.
      
      [akpm@linux-foundation.org: explain the config option in kernel.txt]
      Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NEugene Teo <eugeneteo@kernel.org>
      Acked-by: NKees Cook <kees.cook@canonical.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eaf06b24
    • C
      include/linux/highmem.h needs hardirq.h · 43b3a0c7
      Catalin Marinas 提交于
      Commit 3e4d3af5 ("mm: stack based kmap_atomic()") introduced the
      kmap_atomic_idx_push() function which warns on in_irq() with
      CONFIG_DEBUG_HIGHMEM enabled.  This patch includes linux/hardirq.h for
      the in_irq definition.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43b3a0c7
    • E
      atomic: add atomic_inc_not_zero_hint() · 3f9d35b9
      Eric Dumazet 提交于
      Followup of perf tools session in Netfilter WorkShop 2010
      
      In the network stack we make high usage of atomic_inc_not_zero() in
      contexts we know the probable value of atomic before increment (2 for udp
      sockets for example)
      
      Using a special version of atomic_inc_not_zero() giving this hint can help
      processor to use less bus transactions.
      
      On x86 (MESI protocol) for example, this avoids entering Shared state,
      because "lock cmpxchg" issues an RFO (Read For Ownership)
      
      akpm: Adds a new include/linux/atomic.h.  This means that new code should
      henceforth include linux/atomic.h and not asm/atomic.h.  The presence of
      include/linux/atomic.h will in fact cause checkpatch.pl to warn about use
      of asm/atomic.h.  The new include/linux/atomic.h becomes the place where
      arch-neutral atomic_t code should be placed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f9d35b9
    • J
      include/linux/resource.h needs types.h · 8705a1ba
      Jean Delvare 提交于
      Fix the following warning:
      usr/include/linux/resource.h:49: found __[us]{8,16,32,64} type without #include <linux/types.h>
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8705a1ba
    • E
      netfilter: NF_HOOK_COND has wrong conditional · ac5aa2e3
      Eric Paris 提交于
      The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an
      error in the code as the order of operations is not what was intended.  C will
      evalutate == before =.  Which means ret is getting set to the bool result,
      rather than the return value of the function call.  The code says
      
      if (ret = function() == 1)
      when it meant to say:
      if ((ret = function()) == 1)
      
      Normally the compiler would warn, but it doesn't notice it because its
      a actually complex conditional and so the wrong code is wrapped in an explict
      set of () [exactly what the compiler wants you to do if this was intentional].
      Fixing this means that errors when netfilter denies a packet get propagated
      back up the stack rather than lost.
      
      Problem introduced by commit 2249065f (netfilter: get rid of the grossness
      in netfilter.h).
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      ac5aa2e3
  10. 11 11月, 2010 4 次提交
    • J
      block: remove unused copy_io_context() · cedb4a7d
      Jens Axboe 提交于
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      cedb4a7d
    • S
      sched: Use group weight, idle cpu metrics to fix imbalances during idle · aae6d3dd
      Suresh Siddha 提交于
      Currently we consider a sched domain to be well balanced when the imbalance
      is less than the domain's imablance_pct. As the number of cores and threads
      are increasing, current values of imbalance_pct (for example 25% for a
      NUMA domain) are not enough to detect imbalances like:
      
      a) On a WSM-EP system (two sockets, each having 6 cores and 12 logical threads),
      24 cpu-hogging tasks get scheduled as 13 on one socket and 11 on another
      socket. Leading to an idle HT cpu.
      
      b) On a hypothetial 2 socket NHM-EX system (each socket having 8 cores and
      16 logical threads), 16 cpu-hogging tasks can get scheduled as 9 on one
      socket and 7 on another socket. Leaving one core in a socket idle
      whereas in another socket we have a core having both its HT siblings busy.
      
      While this issue can be fixed by decreasing the domain's imbalance_pct
      (by making it a function of number of logical cpus in the domain), it
      can potentially cause more task migrations across sched groups in an
      overloaded case.
      
      Fix this by using imbalance_pct only during newly_idle and busy
      load balancing. And during idle load balancing, check if there
      is an imbalance in number of idle cpu's across the busiest and this
      sched_group or if the busiest group has more tasks than its weight that
      the idle cpu in this_group can pull.
      Reported-by: NNikhil Rao <ncrao@google.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1284760952.2676.11.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      aae6d3dd
    • S
      perf_events: Fix time tracking in samples · eed01528
      Stephane Eranian 提交于
      This patch corrects time tracking in samples. Without this patch
      both time_enabled and time_running are bogus when user asks for
      PERF_SAMPLE_READ.
      
      One uses PERF_SAMPLE_READ to sample the values of other counters
      in each sample. Because of multiplexing, it is necessary to know
      both time_enabled, time_running to be able to scale counts correctly.
      
      In this second version of the patch, we maintain a shadow
      copy of ctx->time which allows us to compute ctx->time without
      calling update_context_time() from NMI context. We avoid the
      issue that update_context_time() must always be called with
      ctx->lock held.
      
      We do not keep shadow copies of the other event timings
      because if the lead event is overflowing then it is active
      and thus it's been scheduled in via event_sched_in() in
      which case neither tstamp_stopped, tstamp_running can be modified.
      
      This timing logic only applies to samples when PERF_SAMPLE_READ
      is used.
      
      Note that this patch does not address timing issues related
      to sampling inheritance between tasks. This will be addressed
      in a future patch.
      
      With this patch, the libpfm4 example task_smpl now reports
      correct counts (shown on 2.4GHz Core 2):
      
      $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears  noploop 5
      noploop for 5 seconds
      IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3
      	2,400,000,254 unhalted_core_cycles:u (33)
      	2,399,273,744 instructions_retired:u (34)
      	53,340 baclears (35)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eed01528
    • E
      net: avoid limits overflow · 8d987e5c
      Eric Dumazet 提交于
      Robin Holt tried to boot a 16TB machine and found some limits were
      reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]
      
      We can switch infrastructure to use long "instead" of "int", now
      atomic_long_t primitives are available for free.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reported-by: NRobin Holt <holt@sgi.com>
      Reviewed-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d987e5c
  11. 10 11月, 2010 6 次提交