  1. 18 Jan, 2008: 1 commit
  2. 16 Jan, 2008: 2 commits
  3. 15 Jan, 2008: 1 commit
  4. 14 Jan, 2008: 1 commit
    • remove task_ppid_nr_ns · 84427eae
      Committed by Roland McGrath
      task_ppid_nr_ns is called in three places.  One of these should never
      have called it.  In the other two, using it broke the existing
      semantics.  This was presumably accidental.  If the function had not
      been there, it would have been much more obvious to the eye that those
      patches were changing the behavior.  We don't need this function.
      
      In task_state, the pid of the ptracer is not the ppid of the ptracer.
      
      In do_task_stat, ppid is the tgid of the real_parent, not its pid.
      I also moved the call outside of lock_task_sighand, since it doesn't
      need it.
      
      In sys_getppid, ppid is the tgid of the real_parent, not its pid.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
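The distinction the message draws between a parent's pid and its tgid can be sketched in plain C. This is a userland model only: `struct task_model`, `model_ppid()` and `model_ppid_wrong()` are hypothetical names invented for illustration, not kernel code.

```c
#include <stddef.h>

/* Hypothetical model of the few task fields the commit message discusses.
 * This is not the kernel's struct task_struct. */
struct task_model {
    int pid;                         /* per-thread id */
    int tgid;                        /* thread-group (process) id */
    struct task_model *real_parent;  /* thread that created this task */
};

/* What ppid is meant to be: the tgid of real_parent. */
int model_ppid(const struct task_model *t)
{
    return t->real_parent ? t->real_parent->tgid : 0;
}

/* The changed behavior the message objects to: the parent's per-thread pid. */
int model_ppid_wrong(const struct task_model *t)
{
    return t->real_parent ? t->real_parent->pid : 0;
}
```

The two functions only differ when the child was created by a non-leader thread, which is probably why the change in behavior was easy to miss.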
  5. 12 Jan, 2008: 1 commit
  6. 10 Jan, 2008: 1 commit
  7. 09 Jan, 2008: 2 commits
    • futex: Prevent stale futex owner when interrupted/timeout · cdf71a10
      Committed by Thomas Gleixner
      Roland Westrelin did a great analysis of a long standing thinko in the
      return path of futex_lock_pi.
      
      While we fixed the lock steal case long ago, which was easy to trigger,
      we never had a test case which exposed this problem and stupidly never
      thought about the reverse lock stealing scenario and the return to user
      space with a stale state.
      
      When a blocked task returns from rt_mutex_timed_lock() without holding
      the rt_mutex (due to a signal or timeout) and at the same time the task
      holding the futex is releasing the futex and assigning the ownership of
      the futex to the returning task, then it might happen that a third task
      acquires the rt_mutex before the final rt_mutex_trylock() of the
      returning task happens under the futex hash bucket lock. The returning
      task returns to user space with ETIMEDOUT or EINTR, but the user space
      futex value still names this task as the owner. The task which acquired
      the rt_mutex fixes the user space futex value right after the hash bucket
      lock has been released by the returning task, but for a short period of
      time the user space value is wrong.
      
      Detailed description is available at:
      
         https://bugzilla.redhat.com/show_bug.cgi?id=400541
      
      The fix for this is the same as we do when the rt_mutex was acquired by
      a higher priority task via lock stealing from the designated new owner.
      In that case we already fix the user space value and the internal
      pi_state up before we return. This mechanism can be used to fix up the
      above corner case as well. When the returning task, which failed to
      acquire the rt_mutex, notices that it is the designated owner of the
      futex, then it fixes up the stale user space value and the pi_state,
      before returning to user space. This happens with the futex hash bucket
      lock held, so the task which acquired the rt_mutex is guaranteed to be
      blocked on the hash bucket lock. We can access the rt_mutex owner, which
      gives us the pid of the new owner, safely here as the owner is not able
      to modify (release) it while waiting on the hash bucket lock.
      
      Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to
      avoid confusion with current and add the check for the stale state into
      the failure path of rt_mutex_trylock() in the return path of
      unlock_futex_pi(). If the situation is detected use
      fixup_pi_state_owner() to assign everything to the owner of the
      rt_mutex.
      Pointed-out-and-tested-by: Roland Westrelin <roland.westrelin@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
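The fixup described above can be modelled in userland C. This is a minimal sketch under stated assumptions: `struct pi_futex_model` and `model_fixup_stale_owner()` are hypothetical, locking is not modelled at all, and the TID mask is redefined locally with the value used for PI futex words so the sketch builds without kernel headers.

```c
#include <stdint.h>

/* Low 30 bits of a PI futex word hold the owner TID; the top two bits are
 * flags. Redefined under a MODEL_ name for this sketch. */
#define MODEL_FUTEX_TID_MASK 0x3fffffffu

/* Hypothetical stand-in for the state the commit message talks about. */
struct pi_futex_model {
    uint32_t uval;      /* user space futex word */
    uint32_t rt_owner;  /* TID of the task that really holds the rt_mutex */
};

/* Models the failure path after the final trylock has failed. It is assumed
 * to run with the hash bucket lock held, so rt_owner cannot change.
 * Returns 1 if a stale owner was repaired, 0 otherwise. */
int model_fixup_stale_owner(struct pi_futex_model *f, uint32_t self_tid)
{
    if ((f->uval & MODEL_FUTEX_TID_MASK) != self_tid)
        return 0;  /* we are not the designated owner, nothing is stale */

    /* Keep the flag bits, replace only the owner TID. */
    f->uval = (f->uval & ~MODEL_FUTEX_TID_MASK) | f->rt_owner;
    return 1;
}
```

In the model, the returning task still reports its timeout or signal error to the caller; the only change is that the futex word no longer names it as owner when it leaves.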
    • vmcoreinfo: add the array length of "free_list" for filtering free pages · 83a08e7c
      Committed by Ken'ichi Ohmichi
      This patch adds the array length of "free_area.free_list" to the vmcoreinfo
      data so that makedumpfile (dump filtering command) can exclude all free pages
      in linux-2.6.24.
      
      makedumpfile creates a small dumpfile by excluding unnecessary pages for the
      analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo
      data which has the minimum debugging information only for dump filtering.
      
      In 2.6.24-rc1 or later, free_area.free_list is an array which has one list
      for each migrate type instead of a single list. makedumpfile needs the array
      length of "free_area.free_list", so the vmcoreinfo data should contain it.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Tested-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Acked-by: Simon Horman <horms@verge.net.au>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 08 Jan, 2008: 1 commit
  9. 03 Jan, 2008: 2 commits
  10. 31 Dec, 2007: 1 commit
    • sched: fix gcc warnings · 90b2628f
      Committed by Ingo Molnar
      Meelis Roos reported these warnings on sparc64:
      
        CC      kernel/sched.o
        In file included from kernel/sched.c:879:
        kernel/sched_debug.c: In function 'nsec_high':
        kernel/sched_debug.c:38: warning: comparison of distinct pointer types lacks a cast
      
      the debug check in do_div() is over-eager here, because the long long
      is always positive in these places. Mark this by casting them to
      unsigned long long.
      
      no change in code output:
      
         text    data     bss     dec     hex filename
        51471    6582     376   58429    e43d sched.o.before
        51471    6582     376   58429    e43d sched.o.after
      
        md5:
         7f7729c111f185bf3ccea4d542abc049  sched.o.before.asm
         7f7729c111f185bf3ccea4d542abc049  sched.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
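The warning comes from a pointer-type check of the kind do_div() performs on its first argument. The sketch below imitates that check in userland; `model_do_div()` and `model_nsec_high()` are hypothetical stand-ins, not the kernel's definitions, and the macro relies on the GCC/Clang statement-expression and `__typeof__` extensions.

```c
#include <stdint.h>

/* Hypothetical userland imitation of a do_div()-style macro: divides n in
 * place and evaluates to the remainder. The pointer comparison is the
 * "debug check": it only compiles cleanly when n is unsigned long long. */
#define model_do_div(n, base) ({                                   \
        uint32_t rem_;                                             \
        (void)(((__typeof__(n) *)0) == ((unsigned long long *)0)); \
        rem_ = (uint32_t)((n) % (base));                           \
        (n) = (n) / (base);                                        \
        rem_;                                                      \
})

/* Milliseconds part of a nanosecond count. The sign is handled separately,
 * so the value passed to the macro is always non-negative and can safely
 * be held in an unsigned long long. */
long long model_nsec_high(long long nsec)
{
    unsigned long long abs_ns;

    if (nsec < 0) {
        abs_ns = -(unsigned long long)nsec;
        (void)model_do_div(abs_ns, 1000000);
        return -(long long)abs_ns;
    }
    abs_ns = (unsigned long long)nsec;
    (void)model_do_div(abs_ns, 1000000);
    return (long long)abs_ns;
}
```

Passing a plain `long long` variable to `model_do_div()` should reproduce a "comparison of distinct pointer types" diagnostic on GCC, which is the situation the cast avoids.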
  11. 29 Dec, 2007: 1 commit
    • [SERIAL]: Fix section mismatches in Sun serial console drivers. · fb445ee5
      Committed by David S. Miller
      We're exporting an __init function, oops :-)
      
      The core issue here is that add_preferred_console() is marked
      as __init, which makes it impossible to invoke it from
      a driver probe routine, which is what the Sparc serial drivers
      need to do.
      
      There is no harm in dropping the __init marker.  This code will
      actually work properly when invoked from a modular driver,
      except that init will probably not pick up the console change
      without some other support code.
      
      Then we can drop the __init from sunserial_console_match()
      and we're no longer exporting an __init function to modules.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 23 Dec, 2007: 1 commit
    • Modules: fix memory leak of module names · d172f4ef
      Committed by Greg Kroah-Hartman
      Due to the change in kobject name handling, the module kobject needs to
      have a null release function to ensure that the name it previously set
      will be properly cleaned up.
      
      All of this weirdness goes away in 2.6.25 with the rework of the kobject
      name and cleanup logic, but this is required for 2.6.24.
      
      Thanks to Alexey Dobriyan for finding the problem, and to Kay Sievers
      for pointing out the simple way to fix it after I tried many complex
      ways.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  13. 20 Dec, 2007: 2 commits
    • debug: add end-of-oops marker · 2c3b20e9
      Committed by Arjan van de Ven
      Right now it's nearly impossible for parsers that collect kernel crashes
      from logs or emails (such as www.kerneloops.org) to detect the
      end-of-oops condition. In addition, it's not currently possible to
      detect whether or not 2 oopses that look alike are actually the same
      oops reported twice, or are truly two unique oopses.
      
      This patch adds an end-of-oops marker, and makes the end marker include
      a very simple 64-bit random ID to be able to detect duplicate reports.
      
      Normally, this ID is calculated as a late_initcall() (in the hope that
      at that time there is enough entropy to get a unique enough ID); however
      for early oopses the oops_exit() function needs to generate the ID on
      the fly.
      
      We do this all at the _end_ of an oops printout, so this does not impact
      our ability to get the most important portions of a crash out to the
      console first.
      
      [ Sidenote: the already existing oopses-since-bootup counter we print
        during crashes serves as the differentiator between multiple oopses
        that trigger during the same bootup. ]
      
      Tested on 32-bit and 64-bit x86. Artificially injected very early
      crashes as well, as expected they result in this constant ID after
      multiple bootups:
      
        ---[ end trace ca143223eefdc828 ]---
        ---[ end trace ca143223eefdc828 ]---
      
      because the random pools are still all zero. But it all still works
      fine and causes no additional problems (which is the main goal of
      instrumentation code).
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
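A log parser could use the marker roughly as follows. This is a sketch, assuming the marker text has exactly the shape shown in the sample output above (a 16-digit hexadecimal ID); the function names are hypothetical and the code is plain userland C, not the kernel's printing path.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats an end-of-oops marker for a 64-bit ID into buf.
 * Returns the number of characters snprintf() would have written. */
int model_format_end_marker(char *buf, size_t len, unsigned long long oops_id)
{
    return snprintf(buf, len, "---[ end trace %016llx ]---", oops_id);
}

/* Parser side: extracts the ID from a marker line.
 * Returns 1 on success, 0 if the line is not an end-of-oops marker. */
int model_parse_end_marker(const char *line, unsigned long long *oops_id)
{
    return sscanf(line, "---[ end trace %16llx ]---", oops_id) == 1;
}
```

Two reports whose parsed IDs match can then be treated as candidate duplicates, with the oopses-since-bootup counter separating multiple oopses from the same boot.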
    • sched: rt: account the cpu time during the tick · 67e2be02
      Committed by Peter Zijlstra
      Realtime tasks would not account their runtime during ticks, which would
      lead to:

              struct sched_param param = { .sched_priority = 10 };
              pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

              while (1) ;

      not showing up in top.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  14. 19 Dec, 2007: 3 commits
    • genirq: revert lazy irq disable for simple irqs · 971e5b35
      Committed by Steven Rostedt
      In commit 76d21601 lazy irq disabling
      was implemented, and masking was added to the simple irq handler.
      
      Remy Bohmer discovered that some devices in the ARM architecture
      would trigger the mask, but never unmask it. Russell King questioned
      his patch to do the unmasking, asking why simple irqs were masked
      to begin with. Looking further, it was discovered that the problems
      Remy was seeing were due to improper use of the simple handler by
      devices, and he later submitted patches to fix those. But the issue
      that was uncovered was that the simple handler should never mask.
      
      This patch reverts the masking in the simple handler.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • timer: kernel/timer.c section fixes · b4be6258
      Committed by Adrian Bunk
      This patch fixes the following section mismatches with CONFIG_HOTPLUG=n,
      CONFIG_HOTPLUG_CPU=y:
      
      ...
      WARNING: vmlinux.o(.text+0x41cd3): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
      WARNING: vmlinux.o(.text+0x41d67): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
      ...
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: fix reprogramming decision in oneshot broadcast · cdc6f27d
      Committed by Thomas Gleixner
      Resolve the following regression of a choppy, almost unusable laptop:
      
       http://lkml.org/lkml/2007/12/7/299
       http://bugzilla.kernel.org/show_bug.cgi?id=9525
      
      A previous version of the code did the reprogramming of the broadcast
      device in the return from idle code. This was removed, but the logic in
      tick_handle_oneshot_broadcast() was kept the same.
      
      When a broadcast interrupt happens we signal the expiry to all CPUs
      which have an expired event. If none of the CPUs has an expired event,
      which can happen in dyntick mode, then we reprogram the broadcast
      device. We do not reprogram otherwise, but this is only correct if all
      CPUs which are in the idle broadcast state have been woken up.
      
      The code ignores that there might be pending, not yet expired events on
      other CPUs which are in the idle broadcast state, so the delivery of
      those events can be delayed for quite some time.
      
      Change the tick_handle_oneshot_broadcast() function to check for CPUs
      which are in broadcast state and are not woken up by the current event,
      and enforce the rearming of the broadcast device for those CPUs.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
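The corrected decision can be illustrated with a small model: signal every broadcast CPU whose event has expired, and rearm the device for the earliest event still pending on the others. Everything here is hypothetical (plain arrays and bit masks instead of the kernel's per-CPU data and cpumasks, at most 32 CPUs), so it is a sketch of the logic rather than the actual handler.

```c
#include <stdint.h>

/* Sentinel meaning "no CPU in broadcast state has a pending event". */
#define MODEL_NO_EVENT UINT64_MAX

/* next_event[cpu] is that CPU's next expiry; broadcast_mask has one bit per
 * CPU currently in the idle broadcast state. On return, *expired_mask has a
 * bit set for every CPU that must be woken now, and the return value is the
 * expiry the broadcast device should be rearmed for (or MODEL_NO_EVENT). */
uint64_t model_handle_oneshot_broadcast(const uint64_t *next_event, int ncpus,
                                        uint32_t broadcast_mask, uint64_t now,
                                        uint32_t *expired_mask)
{
    uint64_t next = MODEL_NO_EVENT;
    int cpu;

    *expired_mask = 0;
    for (cpu = 0; cpu < ncpus; cpu++) {
        if (!(broadcast_mask & (1u << cpu)))
            continue;                    /* CPU is not relying on broadcast */
        if (next_event[cpu] <= now)
            *expired_mask |= 1u << cpu;  /* event expired: wake this CPU */
        else if (next_event[cpu] < next)
            next = next_event[cpu];      /* earliest event still pending */
    }
    return next;
}
```

The old behavior corresponds to rearming only when `*expired_mask` ends up empty; in the model, the rearm value is computed on every call, so a pending event on a still-sleeping CPU is never left without a programmed expiry.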
  15. 18 Dec, 2007: 8 commits
  16. 08 Dec, 2007: 4 commits
  17. 07 Dec, 2007: 2 commits
  18. 06 Dec, 2007: 2 commits
    • Avoid potential NULL dereference in unregister_sysctl_table · f1dad166
      Committed by Pavel Emelyanov
      register_sysctl_table() can return NULL sometimes, e.g.  when kmalloc()
      returns NULL or when sysctl check fails.
      
      I've also noticed that much (most?) code in the kernel doesn't check
      the return value from register_sysctl_table() and later simply calls
      unregister_sysctl_table() with a potentially NULL argument.
      
      This is unlikely on a common kernel configuration, but in case we're
      dealing with modules and/or fault-injection support, there's a slight
      possibility of an OOPS.
      
      Changing all the users to check the return code from the registration does
      not look like a good solution: there is too much code doing this, and a
      failure in sysctl table registration is not a good reason to abort module
      loading (in most cases).
      
      So I think that we can just have this check in unregister_sysctl_table()
      to avoid accidental OOPSes (actually, unregister_sysctl_table()
      did exactly this before start_unregistering() appeared).
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
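The pattern being argued for is a NULL-tolerant teardown function, so that callers which never checked the registration result stay safe. A minimal sketch with hypothetical names follows; the return value exists only to make the behavior observable here and is not meant to mirror the kernel function's signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for a registered table header. */
struct sysctl_header_model {
    int registered;
};

/* Tears down a registration. A NULL header (registration failed earlier)
 * is silently ignored instead of being dereferenced.
 * Returns 1 if something was unregistered, 0 for the NULL case. */
int model_unregister_sysctl_table(struct sysctl_header_model *header)
{
    if (header == NULL)
        return 0;

    header->registered = 0;
    return 1;
}
```

This keeps the check in one place rather than in every caller, which is the trade-off the message describes.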
    • fix clone(CLONE_NEWPID) · 5cd17569
      Committed by Eric W. Biederman
      Currently we are complicating the code in copy_process, the clone ABI, and
      (if we fix the bugs) sys_setsid itself, with an unnecessary open-coded
      version of sys_setsid.
      
      So just simplify everything and don't special case the session and pgrp of
      the initial process in a pid namespace.
      
      Having this special case actually presents to user space the classic linux
      startup conditions with session == pgrp == 0 for /sbin/init.
      
      We already handle sending signals to processes in a child pid namespace.
      
      We need to handle sending signals to processes in a parent pid namespace
      for cases like SIGCHLD and SIGIO.
      
      This makes nothing extra visible inside a pid namespace.  So this extra
      special case appears to have no redeeming merits.
      
      Further removing this special case increases the flexibility of how we can
      use pid namespaces, by not requiring the initial process in a pid namespace
      to be a daemon.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 05 Dec, 2007: 4 commits