1. 04 Mar, 2010 3 commits
  2. 01 Mar, 2010 1 commit
  3. 28 Feb, 2010 1 commit
  4. 27 Feb, 2010 11 commits
    • rcu: Fix accelerated GPs for last non-dynticked CPU · 71da8132
      Committed by Paul E. McKenney
      This patch disables irqs across the call to rcu_needs_cpu().  It
      also enforces a hold-off period so that the idle loop doesn't
      softirq itself to death when there are lots of RCU callbacks in
      flight on the last non-dynticked CPU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267231138-27856-3-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix accelerated grace periods for last non-dynticked CPU · a47cd880
      Committed by Paul E. McKenney
      It is invalid to invoke __rcu_process_callbacks() with irqs
      disabled, so do it indirectly via raise_softirq().  This
      requires a state-machine implementation to cycle through the
      grace-period machinery the required number of times.
      Located-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267231138-27856-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
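
The state-machine idea in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `DrainState`, `needs_cpu`, `MAX_PASSES` and the limit of 5 are hypothetical names and values chosen for illustration, and the counter `softirqs_raised` merely stands in for a call to raise_softirq(). It is not the kernel code.

```python
MAX_PASSES = 5  # hypothetical limit, chosen only for illustration


class DrainState:
    """Per-CPU bookkeeping for one drain sequence (all names are hypothetical)."""

    def __init__(self) -> None:
        self.remaining = 0        # passes left in the current sequence
        self.softirqs_raised = 0  # stand-in for calls to raise_softirq()

    def needs_cpu(self, callbacks_pending: bool) -> bool:
        """Model of a check made with irqs disabled: request work, never do it."""
        if not callbacks_pending:
            self.remaining = 0
            return False
        if self.remaining == 0:
            self.remaining = MAX_PASSES  # start a new sequence
        else:
            self.remaining -= 1
            if self.remaining == 0:
                return True              # limit reached: stop requesting passes
        self.softirqs_raised += 1        # defer the real work to softirq context
        return True
```

In this toy model the next call after the limit is reached would start a fresh sequence at once; commit 71da8132 above describes a hold-off period for exactly that situation.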
    • ftrace: Add function names to dangling } in function graph tracer · f1c7f517
      Committed by Steven Rostedt
      The function graph tracer is currently the most invasive tracer
      in the ftrace family. It can easily overflow the buffer even with
      10megs per CPU. This means that events can often be lost.
      
      On start up, or after events are lost, if the function return is
      recorded but the function enter was lost, all we get to see is the
      exiting '}'.
      
      Here is how a typical trace output starts:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  }
        0) ! 567.961 us  |                }
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              }
        0) ! 603.361 us  |            }
        0) ! 613.574 us  |          }
        0) ! 623.554 us  |        }
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      
      There is a series of '}' with no matching "func() {", and no indication
      of which functions these closing braces belong to.
      
      This patch adds a stack on the per cpu structure used in outputting
      the function graph tracer to keep track of what function was output.
      Then on a function exit event, it checks the depth to see if the
      function exit has a matching entry event. If it does, then it only
      prints the '}', otherwise it adds the function name after the '}'.
      
      This allows function exit events to show what function they belong to
      at trace output startup, when the entry was lost due to ring buffer
      overflow, or even after a new task is scheduled in.
      
      Here is what the above trace will look like after this patch:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  } (irq_exit)
        0) ! 567.961 us  |                } (smp_apic_timer_interrupt)
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              } (add_wait_queue)
        0) ! 603.361 us  |            } (__pollwait)
        0) ! 613.574 us  |          } (tcp_poll)
        0) ! 623.554 us  |        } (sock_poll)
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
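
The matching logic described in the commit above can be sketched in a few lines of user-space Python. This is a simplified model, not the tracer's code: the event tuples, the `render` function and the `printed` dictionary (standing in for the per-CPU stack) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, int]  # (kind, function name, call depth)


def render(events: List[Event]) -> List[str]:
    """Render entry/exit events, naming any '}' whose entry was never printed."""
    printed: Dict[int, str] = {}  # depth -> function whose entry line was printed
    lines = []
    for kind, func, depth in events:
        indent = "  " * depth
        if kind == "entry":
            printed[depth] = func
            lines.append(f"{indent}{func}() {{")
        else:
            # An exit matches only if the entry at this depth was the same function.
            matched = printed.pop(depth, None) == func
            lines.append(f"{indent}}}" if matched else f"{indent}}} ({func})")
    return lines
```

Feeding it two exit events whose entries were lost, followed by a complete entry/exit pair, names the first two closing braces and leaves the last one bare.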
    • PM / Hibernate: Fix preallocating of memory · a9c9b442
      Committed by Rafael J. Wysocki
      The hibernate memory preallocation code allocates memory to push some
      user space data out of physical RAM, so that the hibernation image is
      not too large.  It allocates more memory than necessary for creating
      the image, so it has to release some pages to make room for
      allocations made while suspending devices and disabling nonboot CPUs,
      or the system will hang due to the lack of free pages to allocate
      from.  Unfortunately, the function used for freeing these pages,
      free_unnecessary_pages(), contains a bug that prevents it from doing
      the job on all systems without highmem.
      
      Fix this problem, which is a regression from the 2.6.30 kernel, by
      using the right condition for the termination of the loop in
      free_unnecessary_pages().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-and-tested-by: Alan Jenkins <sourcejedi.lkml@googlemail.com>
      Cc: stable@kernel.org
    • PM / Hibernate: Remove swsusp.c finally · f8bb0db8
      Committed by Jiri Slaby
      Its contents and entry in Makefile were already removed in 8e60c6a1
      (Shift remaining code from swsusp.c to hibernate.c), but somehow it
      remained in place (rjw: which most likely was my mistake).
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Acked-by: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Remove trailing space in message · 07c3bb57
      Committed by Frans Pop
      Remove a trailing space from a message in swsusp_save().
      Signed-off-by: Frans Pop <elendil@planet.nl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Swap, remove useless check from swsusp_read() · 09c09bc6
      Committed by Jiri Slaby
      It will never reach here if the sws_resume_bdev is erratic.
      swsusp_read() is called only from software_resume(), but after
      swsusp_check() which would catch the error state.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Really deprecate deprecated user ioctls · b694e52e
      Committed by Jiri Slaby
      They were deprecated and removed from exported headers more than 2
      years ago. Inform users about their removal in the future now.
      
      (Switch cases needed to be reordered for an easy fall through.)
      
      And add an entry to feature-removal-schedule.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM: Add facility for advanced testing of async suspend/resume · 5a2eb858
      Committed by Rafael J. Wysocki
      Add configuration switch CONFIG_PM_ADVANCED_DEBUG for compiling in
      extra PM debugging/testing code allowing one to access some
      PM-related attributes of devices from the user space via sysfs.
      
      If CONFIG_PM_ADVANCED_DEBUG is set, add sysfs attribute power/async
      for every device allowing the user space to access the device's
      power.async_suspend flag and modify it, if desired.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM: Add a switch for disabling/enabling asynchronous suspend/resume · 0e06b4a8
      Committed by Rafael J. Wysocki
      Add sysfs attribute /sys/power/pm_async allowing the user space to
      disable/enable asynchronous suspend/resume of devices.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
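
A user-space helper for toggling the attribute from the commit above might look like the following sketch. It assumes the attribute accepts the strings "0" and "1", which the commit message does not spell out, and the helper names `set_pm_async` and `get_pm_async` are hypothetical. Writing to the real file normally requires root.

```python
from pathlib import Path

PM_ASYNC = Path("/sys/power/pm_async")  # attribute named in the commit message


def set_pm_async(enabled: bool, path: Path = PM_ASYNC) -> None:
    """Write "1" or "0" to the attribute (value format is an assumption)."""
    path.write_text("1" if enabled else "0")


def get_pm_async(path: Path = PM_ASYNC) -> bool:
    """Return True if the attribute currently reads as "1"."""
    return path.read_text().strip() == "1"
```

The `path` parameter exists so the helpers can be exercised against an ordinary file instead of the live sysfs node.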
    • perf_event: Fix preempt warning in perf_clock() · 24691ea9
      Committed by Peter Zijlstra
      A recent commit introduced a preemption warning for
      perf_clock(). Use raw_smp_processor_id() to avoid this; it
      really doesn't matter which cpu we use here.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1267198583.22519.684.camel@laptop>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 26 Feb, 2010 8 commits
    • sched: Fix SCHED_MC regression caused by change in sched cpu_power · dd5feea1
      Committed by Suresh Siddha
      On platforms such as a dual-socket quad-core system, the scheduler load
      balancer does not detect load imbalances in certain scenarios. This
      can leave one socket completely busy (all 4 cores running 4 tasks)
      while another socket sits completely idle. This causes performance
      issues, as those 4 tasks share the memory controller, last-level cache
      bandwidth, etc. It also means we cannot take advantage of turbo mode
      as much as we would like.
      
      Some of the comparisons in the scheduler load balancing code are
      comparing the "weighted cpu load that is scaled wrt sched_group's
      cpu_power" with the "weighted average load per task that is not scaled
      wrt sched_group's cpu_power". While this has probably been broken for a
      longer time (for multi socket numa nodes etc), the problem got aggravated
      via this recent change:
      
       |
       |  commit f93e65c1
       |  Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
       |  Date:   Tue Sep 1 10:34:32 2009 +0200
       |
       |	sched: Restore __cpu_power to a straight sum of power
       |
      
      Also with this change, the sched group cpu power alone no longer reflects
      the group capacity that is needed to implement MC, MT performance
      (default) and power-savings (user-selectable) policies.
      
      We need to use the computed group capacity (sgs.group_capacity, that is
      computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to
      find out if the group with the max load is above its capacity and how
      much load to move etc.
      Reported-by: Ma Ling <ling.ma@intel.com>
      Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      [ -v2: build fix ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x]
      LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in() · 6e37738a
      Committed by Peter Zijlstra
      Since the cpu argument to hw_perf_group_sched_in() is always
      smp_processor_id(), simplify the code a little by removing this argument
      and using the current cpu where needed.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1265890918.5396.3.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events, x86: AMD event scheduling · 38331f62
      Committed by Stephane Eranian
      This patch adds correct AMD NorthBridge event scheduling.
      
      NB events are events measuring L3 cache and HyperTransport traffic. They are
      identified by an event code >= 0xe0. They measure events on the
      Northbridge, which is shared by all cores on a package. NB events are
      counted on a shared set of counters. When a NB event is programmed in a
      counter, the data actually comes from a shared counter. Thus, access to
      those counters needs to be synchronized.
      
      We implement the synchronization such that no two cores can be measuring
      NB events using the same counters. Thus, we maintain a per-NB allocation
      table. The available slot is propagated using the event_constraint
      structure.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703957.0702d00a.6bf2.7b7d@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
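
The per-NB allocation table described in the commit above can be pictured with a toy model. Everything here is an assumption made for illustration: the class `NorthbridgeTable`, its methods, and the default of four counters. The sketch is single-threaded and ignores the atomic updates a real shared table would need. Only the `>= 0xe0` test comes from the commit message.

```python
from typing import List, Optional


def is_nb_event(event_code: int) -> bool:
    """NB events are identified by an event code >= 0xe0 (from the commit text)."""
    return event_code >= 0xE0


class NorthbridgeTable:
    """Toy allocation table shared by all cores on one package."""

    def __init__(self, num_counters: int = 4) -> None:  # 4 is an assumption
        self.owners: List[Optional[int]] = [None] * num_counters

    def claim(self, core: int) -> Optional[int]:
        """Give `core` the first unowned counter, or None if all are taken."""
        for idx, owner in enumerate(self.owners):
            if owner is None:
                self.owners[idx] = core
                return idx
        return None

    def release(self, core: int, idx: int) -> None:
        """Free a counter, but only if `core` actually owns it."""
        if self.owners[idx] == core:
            self.owners[idx] = None
```

Because each slot records a single owning core, two cores can never count NB events on the same counter in this model.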
    • perf_events: Add new start/stop PMU callbacks · d76a0812
      Committed by Stephane Eranian
      In certain situations, the kernel may need to stop and start the same
      event rapidly. The current PMU callbacks do not distinguish between stop
      and release (i.e., stop + free the resource). Thus, a counter may be
      released, then it will be immediately re-acquired. Event scheduling will
      again take place with no guarantee to assign the same counter. On some
      processors, this may even lead to failure to assign the event back due
      to competition between cores.
      
      This patch adds a new pair of callbacks to stop and restart a counter
      without actually releasing the underlying counter resource. On stop, the
      counter is stopped, its values saved and that's it. On start, the value
      is reloaded and counter is restarted (on x86, actual restart is delayed
      until perf_enable()).
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added fallback to ->enable/->disable for all other PMUs
        fixed x86_pmu_start() to call x86_pmu.enable()
        merged __x86_pmu_disable into x86_pmu_stop() ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
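
The difference between stop/start and disable/enable in the commit above can be shown with a toy PMU. This is a sketch under stated assumptions: `ToyPmu` and its method bodies are hypothetical, counter values are left out entirely, and only the ownership behaviour is modelled.

```python
from typing import List, Optional


class ToyPmu:
    """Toy PMU tracking which event owns each counter (hypothetical model)."""

    def __init__(self, num_counters: int) -> None:
        self.owner: List[Optional[str]] = [None] * num_counters
        self.running: List[bool] = [False] * num_counters

    def enable(self, event: str) -> int:
        """Acquire a free counter and start it; ValueError if none is free."""
        idx = self.owner.index(None)
        self.owner[idx] = event
        self.running[idx] = True
        return idx

    def disable(self, event: str) -> None:
        """Stop the counter and release it for other events."""
        idx = self.owner.index(event)
        self.running[idx] = False
        self.owner[idx] = None

    def stop(self, event: str) -> int:
        """Stop counting but keep ownership of the counter."""
        idx = self.owner.index(event)
        self.running[idx] = False
        return idx

    def start(self, event: str) -> int:
        """Resume counting on the counter the event already owns."""
        idx = self.owner.index(event)
        self.running[idx] = True
        return idx
```

With one counter, an event that is disabled can lose its slot to a competitor and fail to re-enable, while an event that is only stopped always restarts on the same counter.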
    • perf_events: Report the MMAP pgoff value in bytes · 3a0304e9
      Committed by Peter Zijlstra
      DaveM reported that currently perf interprets the pgoff value reported by
      the MMAP events as a byte range, but the kernel reports it as a page
      offset.
      
      Since it's broken (and unusable) anyway, change the kernel behaviour (ABI)
      to report bytes indeed, avoiding the need for userspace to deal with
      PAGE_SIZE things.
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
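
The unit change in the commit above is a page-to-byte conversion, sketched below. The helper name `pgoff_to_bytes` is hypothetical, and the default shift of 12 assumes 4 KiB pages; other page sizes need a different shift, which is exactly the detail user space no longer has to know once the kernel reports bytes.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pgoff_to_bytes(pgoff_pages: int, page_shift: int = PAGE_SHIFT) -> int:
    """Convert a page-granular mapping offset into a byte offset."""
    return pgoff_pages << page_shift
```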
    • rcu: Export rcu_scheduler_active · f5f65409
      Committed by Paul E. McKenney
      Kernel modules using rcu_read_lock_sched_held() must now have
      access to rcu_scheduler_active, so it must be exported.
      
      This should fix the fix for the boot-time RCU-lockdep splat.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20100226030230.GA7743@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Make rcu_read_lock_sched_held() take boot time into account · d9f1bb6a
      Committed by Paul E. McKenney
      Before the scheduler starts, all tasks are non-preemptible by
      definition. So, during that time, rcu_read_lock_sched_held()
      needs to always return "true".  This patch makes that be so.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267135607-7056-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Make lockdep_rcu_dereference() message less alarmist · 056ba4a9
      Committed by Paul E. McKenney
      Change from "unsafe" to "suspicious", given that there will be
      false alarms.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267135607-7056-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 25 Feb, 2010 16 commits