1. 12 Nov 2009, 4 commits
  2. 11 Nov 2009, 5 commits
    • E
      sysctl: Don't look at ctl_name and strategy in the generic code · 2315ffa0
      Eric W. Biederman authored
      The ctl_name and strategy fields are unused, now that sys_sysctl
      is a compatibility wrapper around /proc/sys.  No longer looking
      at them in the generic code is effectively what we are doing
      now and provides the guarantee that during further cleanups
      we can just remove references to those fields and everything
      will work ok.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      2315ffa0
    • E
      sysctl: Remove references to ctl_name and strategy from the generic sysctl table · 6fce56ec
      Eric W. Biederman authored
      Now that sys_sysctl is a generic wrapper around /proc/sys, the .ctl_name
      and .strategy members of sysctl tables are dead code.  Remove them.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      6fce56ec
    • E
      sysctl: Remove dead code from sysctl_check · 83ac201b
      Eric W. Biederman authored
      Now that sys_sysctl is a compatibility wrapper around
      /proc/sys, we can remove much of sysctl_check and reduce it
      to a few remaining sanity checks.  This completely decouples
      it from the binary sysctl system call.
      
      Little things like ensuring that the sysctl has not already
      been registered are all that remain.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      83ac201b
    • E
      sysctl: Neuter the generic sysctl strategy routines. · a965cf94
      Eric W. Biederman authored
      Now that sys_sysctl is a compatibility layer on top of /proc/sys,
      these routines are never called but are still put in sysctl
      tables, so I have reduced them to stubs until they can be
      removed entirely.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      a965cf94
    • E
      sysctl: Reduce sys_sysctl to a compatibility wrapper around /proc/sys · 26a7034b
      Eric W. Biederman authored
      To simplify maintenance and to be able to remove all of the binary
      sysctl support from various subsystems, I have rewritten the binary
      sysctl code as a compatibility wrapper around /proc/sys.
      
      The code is built around a hard-coded table, based on the table
      in sysctl_check.c, that lists all of our current binary sysctls
      and provides enough information to convert the binary sysctl
      input into ASCII and back again.  New in this patch is the
      realization that the only dynamic entries that need to be
      handled have ifname as the ASCII string and ifindex as their
      ctl_name.
      
      When sys_sysctl is called, the code now looks in the
      translation table, converting the binary name to the
      path under /proc where the value is to be found.  It opens
      that file and calls into a format conversion wrapper
      that calls fop->read and then fop->write as appropriate.
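
      A rough sketch of what one entry of such a translation table could
      look like (the structure and field names here are illustrative, not
      the actual kernel definitions):

      /* Illustrative sketch only -- not the real kernel structure. */
      struct bin_sysctl_entry {
              int ctl_name;            /* binary sysctl id, e.g. KERN_OSTYPE     */
              const char *procname;    /* matching /proc/sys component, "ostype" */
              /* convert between the binary representation handed to
               * sys_sysctl and the ASCII text in the proc file          */
              ssize_t (*convert)(int dir, void __user *oldval, size_t oldlen,
                                 void __user *newval, size_t newlen, char *buf);
      };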
      
      Since practically no one uses or tests sys_sysctl, rewriting
      the code to be beautiful is a little silly.  The redeeming
      merit of this work is that it allows us to rip out all of the
      binary sysctl syscall support from everywhere else in the tree,
      letting us remove a lot of dead (after this patch) and barely
      maintained code.
      
      In addition it becomes much easier to optimize the sysctl
      implementation for being the backing store of /proc/sys,
      without having to worry about sys_sysctl.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      26a7034b
  3. 06 Nov 2009, 5 commits
  4. 03 Nov 2009, 4 commits
    • I
      Correct nr_processes() when CPUs have been unplugged · 1d510750
      Ian Campbell authored
      nr_processes() returns the sum of the per cpu counter process_counts for
      all online CPUs. This counter is incremented for the current CPU on
      fork() and decremented for the current CPU on exit(). Since a process
      does not necessarily fork and exit on the same CPU the process_count for
      an individual CPU can be either positive or negative and effectively has
      no meaning in isolation.
      
      Therefore calculating the sum of process_counts over only the online
      CPUs omits the processes which were started or stopped on any CPU which
      has since been unplugged. Only the sum of process_counts across all
      possible CPUs has meaning.
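
      The fix therefore amounts to summing over all possible CPUs rather
      than only the online ones; a minimal sketch of the resulting helper:

      int nr_processes(void)
      {
              int cpu;
              int total = 0;

              /* sum over every CPU that ever existed, not just the online
               * ones, so counts from since-unplugged CPUs are not lost    */
              for_each_possible_cpu(cpu)
                      total += per_cpu(process_counts, cpu);

              return total;
      }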
      
      The only caller of nr_processes() is proc_root_getattr() which
      calculates the number of links to /proc as
              stat->nlink = proc_root.nlink + nr_processes();
      
      You don't have to be all that unlucky for nr_processes() to return a
      negative value, leading to a negative number of links (or rather, an
      apparently enormous number of links). If this happens then you can get
      failures where things like "ls /proc" start to fail because they get
      -EOVERFLOW from some stat() call.
      
      Example with some debugging inserted to show what goes on:
              # ps haux|wc -l
              nr_processes: CPU0:     90
              nr_processes: CPU1:     1030
              nr_processes: CPU2:     -900
              nr_processes: CPU3:     -136
              nr_processes: TOTAL:    84
              proc_root_getattr. nlink 12 + nr_processes() 84 = 96
              84
              # echo 0 >/sys/devices/system/cpu/cpu1/online
              # ps haux|wc -l
              nr_processes: CPU0:     85
              nr_processes: CPU2:     -901
              nr_processes: CPU3:     -137
              nr_processes: TOTAL:    -953
              proc_root_getattr. nlink 12 + nr_processes() -953 = -941
              75
              # stat /proc/
              nr_processes: CPU0:     84
              nr_processes: CPU2:     -901
              nr_processes: CPU3:     -137
              nr_processes: TOTAL:    -954
              proc_root_getattr. nlink 12 + nr_processes() -954 = -942
                File: `/proc/'
                Size: 0               Blocks: 0          IO Block: 1024   directory
              Device: 3h/3d   Inode: 1           Links: 4294966354
              Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
              Access: 2009-11-03 09:06:55.000000000 +0000
              Modify: 2009-11-03 09:06:55.000000000 +0000
              Change: 2009-11-03 09:06:55.000000000 +0000
      
      I'm not 100% convinced that the per_cpu regions remain valid for offline
      CPUs, although my testing suggests that they do. If not then I think the
      correct solution would be to aggregate the process_count for a given CPU
      into a global base value in cpu_down().
      
      This bug appears to pre-date the transition to git and it looks like it
      may even have been present in linux-2.6.0-test7-bk3 since it looks like
      the code Rusty patched in http://lwn.net/Articles/64773/ was already
      wrong.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d510750
    • J
      PM / Hibernate: Add newline to load_image() fail path · bf9fd67a
      Jiri Slaby authored
      Terminate the output line with \n when load_image() fails in the middle
      of loading.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      bf9fd67a
    • J
      PM / Hibernate: Fix error handling in save_image() · 4ff277f9
      Jiri Slaby authored
      There are too many retval variables in save_image().  As a result, the
      error return value from snapshot_read_next() may be ignored and only
      part of the snapshot written while success is reported.
      
      Remove 'error' variable, invert the condition in the do-while loop
      and convert the loop to use only 'ret' variable.
      
      Switch the rest of the function to consider only 'ret'.
      
      Also make sure we end the printed line with \n if an error occurs.
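
      A hedged sketch of the resulting loop shape (simplified, not the
      verbatim kernel code): a single 'ret' carries both the
      snapshot_read_next() result and the write result, so no error can be
      silently dropped:

      do {
              ret = snapshot_read_next(&snapshot, PAGE_SIZE);
              if (ret > 0) {
                      ret = swap_write_page(handle, data_of(snapshot), &bio);
                      if (ret)
                              break;
                      nr_pages++;
              }
      } while (ret > 0);

      if (ret)
              printk("\n");   /* terminate the progress line on error */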
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      4ff277f9
    • J
      PM / Hibernate: Fix blkdev refleaks · 76b57e61
      Jiri Slaby authored
      While cruising through the swsusp code I found a few blkdev reference
      leaks of resume_bdev.
      
      swsusp_read: remove blkdev_put altogether. Some fail paths do
                   not do that.
      swsusp_check: make sure we always put a reference on fail paths
      software_resume: all fail paths between swsusp_check and swsusp_read
                       omit swsusp_close. Add it in those cases. And since
                       swsusp_read doesn't drop the reference anymore, do
                       it here unconditionally.
      
      [rjw: Fixed a small coding style issue.]
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      76b57e61
  5. 29 Oct 2009, 7 commits
    • A
      sysctl: fix false positives when PROC_SYSCTL=n · 8c85dd87
      Alexey Dobriyan authored
      Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
      people use such combination to reduce ifdefs with non-standard handlers.
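
      The check in sysctl_check_table() can therefore be made conditional;
      roughly (a sketch of the idea, not the exact diff):

      #ifdef CONFIG_PROC_SYSCTL
              /* only meaningful when the /proc/sys interface is built in */
              if (table->procname && !table->proc_handler)
                      set_fail(&fail, table, "No proc_handler");
      #endif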
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reported-by: Peter Teoh <htmldeveloper@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c85dd87
    • K
      cgroup: fix strstrip() misuse · 478988d3
      KOSAKI Motohiro authored
      cgroup_write_X64() and cgroup_write_string() ignore the return value of
      strstrip(), which leads to slightly inconsistent behavior.
      
      example:
      =========================
       # cd /mnt/cgroup/hoge
       # cat memory.swappiness
       60
       # echo "59 " > memory.swappiness
       # cat memory.swappiness
       59
       # echo " 58" > memory.swappiness
       bash: echo: write error: Invalid argument
      
      This patch fixes it.
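
      The essence of the fix (a minimal sketch): strstrip() returns a pointer
      past any leading whitespace, and that returned pointer is what must be
      handed on to the parser:

      char *stripped;

      stripped = strstrip(buffer);    /* must not be ignored: skips leading blanks */
      retval = cft->write_u64(cgrp, cft, simple_strtoull(stripped, NULL, 0));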
      
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      478988d3
    • C
      connector: fix regression introduced by sid connector · 0d0df599
      Christian Borntraeger authored
      Since commit 02b51df1 (proc connector: add
      event for process becoming session leader) we have the following warning:
      
      Badness at kernel/softirq.c:143
      [...]
      Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
      [...]
      Call Trace:
      ([<000000013fe04100>] 0x13fe04100)
       [<000000000048a946>] sk_filter+0x9a/0xd0
       [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
       [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
       [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
       [<0000000000142604>] __set_special_pids+0x58/0x90
       [<0000000000159938>] sys_setsid+0xb4/0xd8
       [<00000000001187fe>] sysc_noemu+0x10/0x16
       [<00000041616cb266>] 0x41616cb266
      
      The warning is
      --->    WARN_ON_ONCE(in_irq() || irqs_disabled());
      
      The network code must not be called with disabled interrupts but
      sys_setsid holds the tasklist_lock with spinlock_irq while calling the
      connector.
      
      After a discussion we agreed that we can move proc_sid_connector from
      __set_special_pids to sys_setsid.
      
      We also agreed that it is sufficient to change the check from
      task_session(curr) != pid into err > 0, since if we don't change the
      session, this means we were already the leader and return -EPERM.
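
      A hedged sketch of the resulting tail of sys_setsid() (simplified): the
      connector is invoked only after the irq-disabled tasklist_lock section
      is left, and only when the session actually changed:

              err = session;
      out:
              write_unlock_irq(&tasklist_lock);
              if (err > 0)                    /* session really changed */
                      proc_sid_connector(group_leader);  /* group_leader: local in sys_setsid */
              return err;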
      
      One last thing:
      There is also daemonize(), and some people might want to get a
      notification in that case.  Since daemonize() is only needed if
      user space does a kernel_thread, this does not look important (and
      there seems to be no consensus on whether this connector should be
      called in daemonize).  If we really want this, we can add
      proc_sid_connector to daemonize() in an additional patch (Scott?).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Scott James Remnant <scott@ubuntu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0d0df599
    • R
      param: fix setting arrays of bool · 3c7d76e3
      Rusty Russell authored
      We create a dummy struct kernel_param on the stack for parsing each
      array element, but we didn't initialize the flags word.  This matters
      for arrays of type "bool", where the flag indicates if it really is
      an array of bools or unsigned int (old-style).
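
      A sketch of the idea behind the fix (shown schematically): the on-stack
      kernel_param built for each array element must carry the real
      parameter's flags, so param_set_bool() can tell a bool array from a
      legacy unsigned-int one:

              /* dummy parameter used while parsing one array element */
              struct kernel_param elem_kp = {
                      .name  = name,
                      .arg   = elem,
                      .flags = flags,         /* previously left uninitialized */
              };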
      Reported-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      3c7d76e3
    • R
      param: fix NULL comparison on oom · d553ad86
      Rusty Russell authored
      kp->arg is always true: it's the contents of that pointer we care about.
      Reported-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      d553ad86
    • R
      param: fix lots of bugs with writing charp params from sysfs, by leaking mem. · 65afac7d
      Rusty Russell authored
      e180a6b7 "param: fix charp parameters set via sysfs" fixed the case
      where charp parameters written via sysfs were freed, leaving drivers
      accessing random memory.
      
      Unfortunately, storing a flag in the kparam struct was a bad idea: it's
      rodata so setting it causes an oops on some archs.  But that's not all:
      
      1) module_param_array() on charp doesn't work reliably, since we use an
         uninitialized temporary struct kernel_param.
      2) there's a fundamental race if a module uses this parameter and then
         it's changed: they will still access the old, freed, memory.
      
      The simplest fix (ie. for 2.6.32) is to never free the memory.  This
      prevents all these problems, at cost of a memory leak.  In practice, there
      are only 18 places where a charp is writable via sysfs, and all are
      root-only writable.
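
      A minimal sketch of the resulting behaviour (assuming a kstrdup-based
      copy; the real code also has to cope with writes before the slab
      allocator is up): take a private copy of the new value and deliberately
      never free the old one:

      int param_set_charp(const char *val, struct kernel_param *kp)
      {
              char *p;

              if (strlen(val) > 1024) {
                      printk(KERN_ERR "%s: string parameter too long\n", kp->name);
                      return -ENOSPC;
              }

              p = kstrdup(val, GFP_KERNEL);   /* old value is intentionally leaked */
              if (!p)
                      return -ENOMEM;

              *(char **)kp->arg = p;
              return 0;
      }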
      Reported-by: Takashi Iwai <tiwai@suse.de>
      Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christof Schmitt <christof.schmitt@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      65afac7d
    • T
      futex: Fix spurious wakeup for requeue_pi really · 11df6ddd
      Thomas Gleixner authored
      The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
      NULL test) nor does it use the wake_list of futex_wake(), which were
      the reasons for commit 41890f2 (futex: Handle spurious wake up).
      
      See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
      
      The changes in this fix to the wait_requeue_pi path were considered
      a likely unnecessary, but harmless safety net.  But it turns out
      that, because for unknown $@#!*( reasons EWOULDBLOCK is defined as
      EAGAIN, we built an endless loop in the code path which correctly
      returns EWOULDBLOCK.
      
      Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
      take the easy solution and return EWOULDBLOCK^WEAGAIN to user space
      and let it deal with the spurious wakeup.
      
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      LKML-Reference: <4AE23C74.1090502@us.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      11df6ddd
  6. 28 Oct 2009, 1 commit
    • J
      sched: move rq_weight data array out of .percpu · 4a6cc4bd
      Jiri Kosina authored
      Commit 34d76c41 introduced the percpu array update_shares_data, whose
      size is proportional to NR_CPUS.  Unfortunately this blows up ia64 for
      large NR_CPUS configurations, as ia64 allows only 64k for the .percpu
      section.

      Fix this by allocating the array dynamically and keeping only a percpu
      pointer to it.

      The per-cpu handling doesn't impose a significant performance penalty
      on the potentially contended path in tg_shares_up().
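
      A rough sketch of the shape of the change (not the literal diff): only
      a pointer lives in static storage, and the backing per-cpu arrays are
      allocated at boot with __alloc_percpu() (the init hook name is
      illustrative):

      /* was roughly: a static per-cpu array of NR_CPUS longs -- too big for
       * ia64's 64k .percpu section                                          */
      static unsigned long *update_shares_data;   /* points at dynamic percpu memory */

      void __init init_update_shares_data(void)   /* illustrative init hook */
      {
              /* one nr_cpu_ids-sized array of rq weights per CPU */
              update_shares_data = __alloc_percpu(nr_cpu_ids * sizeof(unsigned long),
                                                  __alignof__(unsigned long));
              BUG_ON(!update_shares_data);
      }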
      
      Before:

      ...
      ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
      ffffffff81043383:       00 00
      ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
      ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff81043393:       00
      ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff8104339b:       00
      ffffffff8104339c:       48 01 d0                add    %rdx,%rax
      ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
      ffffffff810433a6:       00
      ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
      ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
      ...
      
      After:
      
      ...
      ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
      ffffffff81043383:       00
      ffffffff81043384:       48 98                   cltq
      ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
      ffffffff8104338d:       00
      ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
      ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
      ffffffff8104339c:       81
      ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff810433a4:       00
      ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
      ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff810433b5:       00
      ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
      ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
      ...
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      4a6cc4bd
  7. 24 Oct 2009, 4 commits
  8. 23 Oct 2009, 2 commits
    • S
      perf events: Don't generate events for the idle task when exclude_idle is set · 54f44076
      Soeren Sandmann authored
      Getting samples for the idle task is often not interesting, so
      don't generate them when exclude_idle is set for the event in
      question.
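
      A hedged sketch of the check this adds on the sampling path (placement
      and exact form in perf_event.c may differ):

              /* the idle task is pid 0; honour the event's exclude_idle request */
              if (event->attr.exclude_idle && current->pid == 0)
                      return;         /* drop the sample for the idle task */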
      Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <ye8pr8fmlq7.fsf@camel16.daimi.au.dk>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      54f44076
    • S
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time... · 721a669b
      Soeren Sandmann authored
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
      
      Make the hrtimer based events work for sysprof.
      
      Whenever a swevent is scheduled out, the hrtimer is canceled.
      When it is scheduled back in, the timer is restarted. This
      happens every scheduler tick, which means the timer never
      expired because it was getting repeatedly restarted over and
      over with the same period.
      
      To fix that, save the remaining time when disabling; when
      reenabling, use that saved time as the period instead of the
      user-specified sampling period.
      
      Also, move the starting and stopping of the hrtimers to helper
      functions instead of duplicating the code.
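
      A hedged sketch of the two helpers (the 'remaining' field name is
      illustrative; the real hw_perf_event layout may differ):

      static void swevent_hrtimer_stop(struct perf_event *event)
      {
              ktime_t left = hrtimer_get_remaining(&event->hw.hrtimer);

              /* remember how much of the current period was left */
              if (ktime_to_ns(left) > 0)
                      event->hw.remaining = ktime_to_ns(left);
              hrtimer_cancel(&event->hw.hrtimer);
      }

      static void swevent_hrtimer_start(struct perf_event *event)
      {
              s64 period = event->hw.remaining ?: event->hw.sample_period;

              event->hw.remaining = 0;
              /* restart with the leftover time, not the full user period */
              hrtimer_start(&event->hw.hrtimer, ns_to_ktime(period),
                            HRTIMER_MODE_REL);
      }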
      Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
      LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      721a669b
  9. 22 Oct 2009, 1 commit
  10. 19 Oct 2009, 1 commit
    • A
      HWPOISON: Allow schedule_on_each_cpu() from keventd · 65a64464
      Andi Kleen authored
      Right now, when calling schedule_on_each_cpu() from keventd, there is a
      deadlock because it tries to schedule a work item on the current CPU
      too.  This happens via lru_add_drain_all() in hwpoison.
      
      Just call the function for the current CPU in this case. This is actually
      faster too.
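
      A hedged sketch of the reworked loop inside schedule_on_each_cpu()
      (simplified; helper names as I recall them, so treat the details as
      illustrative):

              int this_cpu = -1;

              /* when we already run in keventd, queueing work on our own CPU
               * and then flushing it would deadlock -- run it inline instead */
              if (current_is_keventd())
                      this_cpu = raw_smp_processor_id();

              for_each_online_cpu(cpu) {
                      struct work_struct *work = per_cpu_ptr(works, cpu);

                      INIT_WORK(work, func);
                      if (cpu != this_cpu)
                              schedule_work_on(cpu, work);
              }
              if (this_cpu != -1)
                      func(per_cpu_ptr(works, this_cpu));   /* direct call, also faster */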
      
      Debugging with Fengguang Wu & Max Asbock
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      65a64464
  11. 16 Oct 2009, 2 commits
    • D
      futex: Move drop_futex_key_refs out of spinlock'ed region · 89061d3d
      Darren Hart authored
      When requeuing tasks from one futex to another, the reference held
      by the requeued task to the original futex location needs to be
      dropped eventually.
      
      Dropping the reference may ultimately lead to a call to
      "iput_final" and subsequently call into filesystem-specific code,
      which may be non-atomic.
      
      It is therefore safer to defer this drop operation until after the
      futex_hash_bucket spinlock has been dropped.
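
      A hedged sketch of the reordering in futex_requeue() (simplified): the
      references taken while requeueing are dropped only after both
      hash-bucket locks have been released:

      out_unlock:
              double_unlock_hb(hb1, hb2);

              /* may end up in iput_final() and filesystem code, so it must
               * happen outside the hb spinlocks                             */
              while (--drop_count >= 0)
                      drop_futex_key_refs(&key1);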
      
      Originally-From: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <4AD7A298.5040802@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      89061d3d
    • P
      rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5
      Paul E. McKenney authored
      If the following sequence of events occurs, then
      TREE_PREEMPT_RCU will hang waiting for a grace period to
      complete, eventually OOMing the system:
      
      o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
      	with more than 64 physical CPUs present (32 on a 32-bit system).
      	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
      	with RCU_FANOUT set to a sufficiently small value that the
      	physical CPUs populate two or more leaf rcu_node structures.
      
      o	A task is preempted in an RCU read-side critical section
      	while running on a CPU corresponding to a given leaf rcu_node
      	structure.
      
      o	All CPUs corresponding to this same leaf rcu_node structure
      	record quiescent states for the current grace period.
      
      o	All of these same CPUs go offline (hence the need for enough
      	physical CPUs to populate more than one leaf rcu_node structure).
      	This causes the preempted task to be moved to the root rcu_node
      	structure.
      
      At this point, there is nothing left to cause the quiescent
      state to be propagated up the rcu_node tree, so the current
      grace period never completes.
      
      The simplest fix, especially after considering the deadlock
      possibilities, is to detect this situation when the last CPU is
      offlined, and to set that CPU's ->qsmask bit in its leaf
      rcu_node structure.  This will cause the next invocation of
      force_quiescent_state() to end the grace period.
      
      Without this fix, this hang can be triggered in an hour or so on
      some machines with rcutorture and random CPU onlining/offlining.
      With this fix, these same machines pass a full 10 hours of this
      sort of abuse.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      237c80c5
  12. 15 Oct 2009, 4 commits
    • P
      rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5
      Paul E. McKenney authored
      For the short term, map synchronize_rcu_expedited() to
      synchronize_rcu() for TREE_PREEMPT_RCU and to
      synchronize_sched_expedited() for TREE_RCU.
      
      Longer term, there needs to be a real expedited grace period for
      TREE_PREEMPT_RCU, but candidate patches to date are considerably
      more complex and intrusive.
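
      A sketch of the stopgap mapping described above (the real definitions
      live in the rcutree headers):

      #ifdef CONFIG_TREE_PREEMPT_RCU
      static inline void synchronize_rcu_expedited(void)
      {
              synchronize_rcu();              /* no real expedited grace period yet */
      }
      #else  /* CONFIG_TREE_RCU */
      static inline void synchronize_rcu_expedited(void)
      {
              synchronize_sched_expedited();  /* expedited sched grace period suffices */
      }
      #endif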
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592331-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      019129d5
    • P
      rcu: Prevent RCU IPI storms in presence of high call_rcu() load · 37c72e56
      Paul E. McKenney authored
      As the number of callbacks on a given CPU rises, invoke
      force_quiescent_state() only every blimit number of callbacks
      (defaults to 10,000), and even then only if no other CPU has
      invoked force_quiescent_state() in the meantime.
      
      This should fix the performance regression reported by Nick.
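
      A hedged sketch of the throttling rule on the call_rcu() enqueue path
      (field names are illustrative):

              if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + blimit)) {
                      rdp->qlen_last_fqs_check = rdp->qlen;
                      /* only force if nobody else forced a quiescent state
                       * since we last looked                                */
                      if (rsp->n_force_qs == rdp->n_force_qs_snap)
                              force_quiescent_state(rsp, 0);
                      rdp->n_force_qs_snap = rsp->n_force_qs;
              }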
      Reported-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592133-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      37c72e56
    • L
      workqueue: add 'flush_delayed_work()' to run and wait for delayed work · 8c53e463
      Linus Torvalds authored
      It basically turns a delayed work into an immediate work, and then waits
      for it to finish, thus allowing you to force (and wait for) an immediate
      flush of a delayed work.
      
      We'll want to use this in the tty layer to clean up tty_flush_to_ldisc().
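
      A simplified sketch of the helper (the in-tree version queues onto the
      same per-cpu keventd queue rather than via schedule_work()):

      void flush_delayed_work(struct delayed_work *dwork)
      {
              /* if the timer has not fired yet, cancel it and run the work now */
              if (del_timer_sync(&dwork->timer))
                      schedule_work(&dwork->work);

              /* wait for the work item to finish executing */
              flush_work(&dwork->work);
      }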
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      [ Fixed to use 'del_timer_sync()' as noted by Oleg ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c53e463
    • D
      futex: Check for NULL keys in match_futex · 2bc87203
      Darren Hart authored
      If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
      it will find the futex_q->requeue_pi_key to be NULL and OOPS.
      
      Check for NULL in match_futex() instead of doing explicit NULL pointer
      checks on all call sites.  While match_futex(NULL, NULL) returning
      false is a little odd, it's still correct as we expect valid key
      references.
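
      A sketch of match_futex() with the added NULL guard (close to the
      actual helper, though treat it as illustrative):

      static inline int match_futex(union futex_key *key1, union futex_key *key2)
      {
              return (key1 && key2                    /* a NULL key never matches */
                      && key1->both.word == key2->both.word
                      && key1->both.ptr == key2->both.ptr
                      && key1->both.offset == key2->both.offset);
      }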
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Dinakar Guniguntala <dino@in.ibm.com>
      CC: John Stultz <johnstul@us.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <4AD60687.10306@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2bc87203