1. 09 3月, 2015 1 次提交
    • T
      workqueue: dump workqueues on sysrq-t · 3494fc30
      Tejun Heo 提交于
      Workqueues are used extensively throughout the kernel but sometimes
      it's difficult to debug stalls involving work items because visibility
      into its inner workings is fairly limited.  Although sysrq-t task dump
      annotates each active worker task with the information on the work
      item being executed, it is challenging to find out which work items
      are pending or delayed on which queues and how pools are being
      managed.
      
      This patch implements show_workqueue_state() which dumps all busy
      workqueues and pools and is called from the sysrq-t handler.  At the
      end of sysrq-t dump, something like the following is printed.
      
       Showing busy workqueues and worker pools:
       ...
       workqueue filler_wq: flags=0x0
         pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
           in-flight: 491:filler_workfn, 507:filler_workfn
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
           in-flight: 501:filler_workfn
           pending: filler_workfn
       ...
       workqueue test_wq: flags=0x8
         pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
           in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
           delayed: test_workfn1 BAR(492), test_workfn2
       ...
       pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
       pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
       pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
       pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62
      
      The above shows that test_wq is executing test_workfn() on pid 510
      which is the rescuer and also that there are two tasks 69 and 500
      waiting for the work item to finish in flush_work().  As test_wq has
      max_active of 1, there are two work items for test_workfn1() and
      test_workfn2() which are delayed till the current work item is
      finished.  In addition, pid 492 is flushing test_workfn1().
      
      The work item for test_workfn() is being executed on pwq of pool 2
      which is the normal priority per-cpu pool for CPU 1.  The pool has
      three workers, two of which are executing filler_workfn() for
      filler_wq and the last one is assuming the manager role trying to
      create more workers.
      
      This extra workqueue state dump will hopefully help chasing down hangs
      involving workqueues.
      
      v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.
      
      v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
          printk()'s replaced with pr_info()'s, and cpumask printing now
          uses cpulist_pr_cont().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      CC: Ingo Molnar <mingo@redhat.com>
      3494fc30
  2. 12 2月, 2015 2 次提交
    • M
      oom, PM: make OOM detection in the freezer path raceless · c32b3cbe
      Michal Hocko 提交于
      Commit 5695be14 ("OOM, PM: OOM killed task shouldn't escape PM
      suspend") has left a race window when OOM killer manages to
      note_oom_kill after freeze_processes checks the counter.  The race
      window is quite small and really unlikely and partial solution deemed
      sufficient at the time of submission.
      
      Tejun wasn't happy about this partial solution though and insisted on a
      full solution.  That requires the full OOM and freezer's task freezing
      exclusion, though.  This is done by this patch which introduces oom_sem
      RW lock and turns oom_killer_disable() into a full OOM barrier.
      
      oom_killer_disabled check is moved from the allocation path to the OOM
      level and we take oom_sem for reading for both the check and the whole
      OOM invocation.
      
      oom_killer_disable() takes oom_sem for writing so it waits for all
      currently running OOM killer invocations.  Then it disable all the further
      OOMs by setting oom_killer_disabled and checks for any oom victims.
      Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
      last victim wakes up all waiters enqueued by oom_killer_disable().
      Therefore this function acts as the full OOM barrier.
      
      The page fault path is covered now as well although it was assumed to be
      safe before.  As per Tejun, "We used to have freezing points deep in file
      system code which may be reacheable from page fault." so it would be
      better and more robust to not rely on freezing points here.  Same applies
      to the memcg OOM killer.
      
      out_of_memory tells the caller whether the OOM was allowed to trigger and
      the callers are supposed to handle the situation.  The page allocation
      path simply fails the allocation same as before.  The page fault path will
      retry the fault (more on that later) and Sysrq OOM trigger will simply
      complain to the log.
      
      Normally there wouldn't be any unfrozen user tasks after
      try_to_freeze_tasks so the function will not block. But if there was an
      OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
      finish yet then we have to wait for it. This should complete in a finite
      time, though, because
      
      	- the victim cannot loop in the page fault handler (it would die
      	  on the way out from the exception)
      	- it cannot loop in the page allocator because all the further
      	  allocation would fail and __GFP_NOFAIL allocations are not
      	  acceptable at this stage
      	- it shouldn't be blocked on any locks held by frozen tasks
      	  (try_to_freeze expects lockless context) and kernel threads and
      	  work queues are not frozen yet
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c32b3cbe
    • M
      sysrq: convert printk to pr_* equivalent · 401e4a7c
      Michal Hocko 提交于
      While touching this area let's convert printk to pr_*.  This also makes
      the printing of continuation lines done properly.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      401e4a7c
  3. 07 8月, 2014 1 次提交
  4. 07 6月, 2014 2 次提交
    • R
      sysrq,rcu: suppress RCU stall warnings while sysrq runs · 722773af
      Rik van Riel 提交于
      Some sysrq handlers can run for a long time, because they dump a lot of
      data onto a serial console.  Having RCU stall warnings pop up in the
      middle of them only makes the problem worse.
      
      This patch temporarily disables RCU stall warnings while a sysrq request
      is handled.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Suggested-by: NPaul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Madper Xie <cxie@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      722773af
    • R
      sysrq: rcu-ify __handle_sysrq · 984d74a7
      Rik van Riel 提交于
      Echoing values into /proc/sysrq-trigger seems to be a popular way to get
      information out of the kernel.  However, dumping information about
      thousands of processes, or hundreds of CPUs to serial console can result
      in IRQs being blocked for minutes, resulting in various kinds of cascade
      failures.
      
      The most common failure is due to interrupts being blocked for a very
      long time.  This can lead to things like failed IO requests, and other
      things the system cannot easily recover from.
      
      This problem is easily fixable by making __handle_sysrq use RCU instead
      of spin_lock_irqsave.
      
      This leaves the warning that RCU grace periods have not elapsed for a
      long time, but the system will come back from that automatically.
      
      It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
      but that is probably desired since people want that to work in
      situations where the system is already hosed.
      
      The callers of register_sysrq_key and unregister_sysrq_key appear to be
      capable of sleeping.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Reported-by: NMadper Xie <cxie@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      984d74a7
  5. 05 6月, 2014 1 次提交
  6. 17 10月, 2013 1 次提交
  7. 13 8月, 2013 1 次提交
  8. 07 6月, 2013 1 次提交
  9. 04 6月, 2013 1 次提交
  10. 02 4月, 2013 1 次提交
  11. 16 3月, 2013 1 次提交
  12. 28 2月, 2013 1 次提交
    • L
      sysrq: don't depend on weak undefined arrays to have an address that compares as NULL · adf96e6f
      Linus Torvalds 提交于
      When taking an address of an extern array, gcc quite naturally should be
      able to say "an address of an object can never be NULL" and just
      optimize away the test entirely.
      
      However, the new alternate sysrq reset code (commit 154b7a48:
      "Input: sysrq - allow specifying alternate reset sequence") did exactly
      that, and declared platform_sysrq_reset_seq[] as a weak array, and
      expecting that testing the address of the array would show whether it
      actually got linked against something or not.
      
      And that doesn't work with all gcc versions.  Clearly it works with
      *some* versions of gcc, and maybe it's even supposed to work, but it
      really is a very fragile concept.
      
      So instead of testing the address of the weak variable, just create a
      weak instance of that array that is empty.  If some platform then has a
      real platform_sysrq_reset_seq[] that overrides our weak one, the linker
      will switch to that one, and it all works without any run-time
      conditionals at all.
      Reported-by: NDave Airlie <airlied@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      adf96e6f
  13. 08 2月, 2013 1 次提交
  14. 17 1月, 2013 1 次提交
  15. 16 11月, 2012 1 次提交
  16. 17 10月, 2012 1 次提交
  17. 06 4月, 2012 1 次提交
    • A
      sysrq: use SEND_SIG_FORCED instead of force_sig() · b82c3287
      Anton Vorontsov 提交于
      Change send_sig_all() to use do_send_sig_info(SEND_SIG_FORCED) instead
      of force_sig(SIGKILL).  With the recent changes we do not need force_ to
      kill the CLONE_NEWPID tasks.
      
      And this is more correct.  force_sig() can race with the exiting thread,
      while do_send_sig_info(group => true) kill the whole process.
      
      Some more notes from Oleg Nesterov:
      
      > Just one note. This change makes no difference for sysrq_handle_kill().
      > But it obviously changes the behaviour sysrq_handle_term(). I think
      > this is fine, if you want to really kill the task which blocks/ignores
      > SIGTERM you can use sysrq_handle_kill().
      >
      > Even ignoring the reasons why force_sig() is simply wrong here,
      > force_sig(SIGTERM) looks strange. The task won't be killed if it has
      > a handler, but SIG_IGN can't help. However if it has the handler
      > but blocks SIGTERM temporary (this is very common) it will be killed.
      
      Also,
      
      > force_sig() can't kill the process if the main thread has already
      > exited. IOW, it is trivial to create the process which can't be
      > killed by sysrq.
      
      So, this patch fixes the issue.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Alan Cox <alan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b82c3287
  18. 22 3月, 2012 1 次提交
  19. 09 3月, 2012 1 次提交
    • A
      vt:tackle kbd_table · 079c9534
      Alan Cox 提交于
      Keyboard struct lifetime is easy, but the locking is not and is completely
      ignored by the existing code. Tackle this one head on
      
      - Make the kbd_table private so we can run down all direct users
      - Hoick the relevant ioctl handlers into the keyboard layer
      - Lock them with the keyboard lock so they don't change mid keypress
      - Add helpers for things like console stop/start so we isolate the poking
        around properly
      - Tweak the braille console so it still builds
      
      There are a couple of FIXME locking cases left for ioctls that are so hideous
      they should be addressed in a later patch. After this patch the kbd_table is
      private and all the keyboard jiggery pokery is in one place.
      
      This update fixes speakup and also a memory leak in the original.
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      079c9534
  20. 10 2月, 2012 2 次提交
  21. 04 1月, 2012 1 次提交
  22. 25 3月, 2011 1 次提交
  23. 05 11月, 2010 1 次提交
  24. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  25. 30 9月, 2010 1 次提交
  26. 21 8月, 2010 1 次提交
  27. 20 8月, 2010 1 次提交
  28. 22 7月, 2010 1 次提交
  29. 09 6月, 2010 1 次提交
  30. 22 4月, 2010 1 次提交
    • F
      tracing: Dump either the oops's cpu source or all cpus buffers · cecbca96
      Frederic Weisbecker 提交于
      The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
      dump every cpu buffers when an oops or panic happens.
      
      It's nice when you have few cpus but it may take ages if have many,
      plus you miss the real origin of the problem in all the cpu traces.
      
      Sometimes, all you need is to dump the cpu buffer that triggered the
      opps, most of the time it is our main interest.
      
      This patch modifies ftrace_dump_on_oops to handle this choice.
      
      The ftrace_dump_on_oops kernel parameter, when it comes alone, has
      the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
      will only dump the buffer of the cpu that oops'ed.
      
      Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
      echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
      behaviour. But setting 2 jumps into cpu origin dump mode.
      
      v2: Fix double setup
      v3: Fix spelling issues reported by Randy Dunlap
      v4: Also update __ftrace_dump in the selftests
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      cecbca96
  31. 14 4月, 2010 1 次提交
    • D
      Input: implement SysRq as a separate input handler · 97f5f0cd
      Dmitry Torokhov 提交于
      Instead of keeping SysRq support inside of legacy keyboard driver split
      it out into a separate input handler (filter). This stops most SysRq input
      events from leaking into evdev clients (some events, such as first SysRq
      scancode - not keycode - event, are still leaked into both legacy keyboard
      and evdev).
      
      [martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
       not defined]
      Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
      97f5f0cd
  32. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  33. 16 12月, 2009 1 次提交
    • K
      oom-kill: fix NUMA constraint check with nodemask · 4365a567
      KAMEZAWA Hiroyuki 提交于
      Fix node-oriented allocation handling in oom-kill.c I myself think of this
      as a bugfix not as an ehnancement.
      
      In these days, things are changed as
        - alloc_pages() eats nodemask as its arguments, __alloc_pages_nodemask().
        - mempolicy don't maintain its own private zonelists.
        (And cpuset doesn't use nodemask for __alloc_pages_nodemask())
      
      So, current oom-killer's check function is wrong.
      
      This patch does
        - check nodemask, if nodemask && nodemask doesn't cover all
          node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
        - Scan all zonelist under nodemask, if it hits cpuset's wall
          this faiulre is from cpuset.
      And
        - modifies the caller of out_of_memory not to call oom if __GFP_THISNODE.
          This doesn't change "current" behavior. If callers use __GFP_THISNODE
          it should handle "page allocation failure" by itself.
      
        - handle __GFP_NOFAIL+__GFP_THISNODE path.
          This is something like a FIXME but this gfpmask is not used now.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4365a567
  34. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  35. 03 8月, 2009 1 次提交
    • I
      debug lockups: Improve lockup detection, fix generic arch fallback · 47cab6a7
      Ingo Molnar 提交于
      As Andrew noted, my previous patch ("debug lockups: Improve lockup
      detection") broke/removed SysRq-L support from architecture that do
      not provide a __trigger_all_cpu_backtrace implementation.
      
      Restore a fallback path and clean up the SysRq-L machinery a bit:
      
       - Rename the arch method to arch_trigger_all_cpu_backtrace()
      
       - Simplify the define
      
       - Document the method a bit - in the hope of more architectures
         adding support for it.
      
      [ The patch touches Sparc code for the rename. ]
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      LKML-Reference: <20090802140809.7ec4bb6b.akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      47cab6a7
  36. 02 8月, 2009 1 次提交
    • I
      debug lockups: Improve lockup detection · c1dc0b9c
      Ingo Molnar 提交于
      When debugging a recent lockup bug i found various deficiencies
      in how our current lockup detection helpers work:
      
       - SysRq-L is not very efficient as it uses a workqueue, hence
         it cannot punch through hard lockups and cannot see through
         most soft lockups either.
      
       - The SysRq-L code depends on the NMI watchdog - which is off
         by default.
      
       - We dont print backtraces from the RCU code's built-in
         'RCU state machine is stuck' debug code. This debug
         code tends to be one of the first (and only) mechanisms
         that show that a lockup has occured.
      
      This patch changes the code so taht we:
      
       - Trigger the NMI backtrace code from SysRq-L instead of using
         a workqueue (which cannot punch through hard lockups)
      
       - Trigger print-all-CPU-backtraces from the RCU lockup detection
         code
      
      Also decouple the backtrace printing code from the NMI watchdog:
      
       - Dont use variable size cpumasks (it might not be initialized
         and they are a bit more fragile anyway)
      
       - Trigger an NMI immediately via an IPI, instead of waiting
         for the NMI tick to occur. This is a lot faster and can
         produce more relevant backtraces. It will also work if the
         NMI watchdog is disabled.
      
       - Dont print the 'dazed and confused' message when we print
         a backtrace from the NMI
      
       - Do a show_regs() plus a dump_stack() to get maximum info
         out of the dump. Worst-case we get two stacktraces - which
         is not a big deal. Sometimes, if register content is
         corrupted, the precise stack walker in show_regs() wont
         give us a full backtrace - in this case dump_stack() will
         do it.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c1dc0b9c
  37. 30 7月, 2009 1 次提交
    • H
      sysrq, kdump: make sysrq-c consistent · cab8bd34
      Hidetoshi Seto 提交于
      commit d6580a9f ("kexec: sysrq: simplify
      sysrq-c handler") changed the behavior of sysrq-c to unconditional
      dereference of NULL pointer.  So in cases with CONFIG_KEXEC, where
      crash_kexec() was directly called from sysrq-c before, now it can be said
      that a step of "real oops" was inserted before starting kdump.
      
      However, in contrast to oops via SysRq-c from keyboard which results in
      panic due to in_interrupt(), oops via "echo c > /proc/sysrq-trigger" will
      not become panic unless panic_on_oops=1.  It means that even if dump is
      properly configured to be taken on panic, the sysrq-c from proc interface
      might not start crashdump while the sysrq-c from keyboard can start
      crashdump.  This confuses traditional users of kdump, i.e.  people who
      expect sysrq-c to do common behavior in both of the keyboard and proc
      interface.
      
      This patch brings the keyboard and proc interface behavior of sysrq-c in
      line, by forcing panic_on_oops=1 before oops in sysrq-c handler.
      
      And some updates in documentation are included, to clarify that there is
      no longer dependency with CONFIG_KEXEC, and that now the system can just
      crash by sysrq-c if no dump mechanism is configured.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Brayan Arraes <brayan@yack.com.br>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cab8bd34