1. 03 1月, 2008 1 次提交
  2. 29 12月, 2007 1 次提交
    • D
      [SERIAL]: Fix section mismatches in Sun serial console drivers. · fb445ee5
      David S. Miller 提交于
      We're exporting an __init function, oops :-)
      
      The core issue here is that add_preferred_console() is marked
      as __init, this makes it impossible to invoke this thing from
      a driver probe routine which is what the Sparc serial drivers
      need to do.
      
      There is no harm in dropping the __init marker.  This code will
      actually work properly when invoked from a modular driver,
      except that init will probably not pick up the console change
      without some other support code.
      
      Then we can drop the __init from sunserial_console_match()
      and we're no longer exporting an __init function to modules.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb445ee5
  3. 23 12月, 2007 1 次提交
    • G
      Modules: fix memory leak of module names · d172f4ef
      Greg Kroah-Hartman 提交于
      Due to the change in kobject name handling, the module kobject needs to
      have a null release function to ensure that the name it previously set
      will be properly cleaned up.
      
      All of this wierdness goes away in 2.6.25 with the rework of the kobject
      name and cleanup logic, but this is required for 2.6.24.
      
      Thanks to Alexey Dobriyan for finding the problem, and to Kay Sievers
      for pointing out the simple way to fix it after I tried many complex
      ways.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      d172f4ef
  4. 20 12月, 2007 2 次提交
    • A
      debug: add end-of-oops marker · 2c3b20e9
      Arjan van de Ven 提交于
      Right now it's nearly impossible for parsers that collect kernel crashes
      from logs or emails (such as www.kerneloops.org) to detect the
      end-of-oops condition. In addition, it's not currently possible to
      detect whether or not 2 oopses that look alike are actually the same
      oops reported twice, or are truly two unique oopses.
      
      This patch adds an end-of-oops marker, and makes the end marker include
      a very simple 64-bit random ID to be able to detect duplicate reports.
      
      Normally, this ID is calculated as a late_initcall() (in the hope that
      at that time there is enough entropy to get a unique enough ID); however
      for early oopses the oops_exit() function needs to generate the ID on
      the fly.
      
      We do this all at the _end_ of an oops printout, so this does not impact
      our ability to get the most important portions of a crash out to the
      console first.
      
      [ Sidenote: the already existing oopses-since-bootup counter we print
        during crashes serves as the differentiator between multiple oopses
        that trigger during the same bootup. ]
      
      Tested on 32-bit and 64-bit x86. Artificially injected very early
      crashes as well, as expected they result in this constant ID after
      multiple bootups:
      
        ---[ end trace ca143223eefdc828 ]---
        ---[ end trace ca143223eefdc828 ]---
      
      because the random pools are still all zero. But it all still works
      fine and causes no additional problems (which is the main goal of
      instrumentation code).
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2c3b20e9
    • P
      sched: rt: account the cpu time during the tick · 67e2be02
      Peter Zijlstra 提交于
      Realtime tasks would not account their runtime during ticks. Which would lead
      to:
      
              struct sched_param param = { .sched_priority = 10 };
              pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
      
      	while (1) ;
      
      Not showing up in top.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      67e2be02
  5. 19 12月, 2007 3 次提交
    • S
      genirq: revert lazy irq disable for simple irqs · 971e5b35
      Steven Rostedt 提交于
      In commit 76d21601 lazy irq disabling
      was implemented, and the simple irq handler had a masking set to it.
      
      Remy Bohmer discovered that some devices in the ARM architecture
      would trigger the mask, but never unmask it. His patch to do the
      unmasking was questioned by Russell King about masking simple irqs
      to begin with. Looking further, it was discovered that the problems
      Remy was seeing was due to improper use of the simple handler by
      devices, and he later submitted patches to fix those. But the issue
      that was uncovered was that the simple handler should never mask.
      
      This patch reverts the masking in the simple handler.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      971e5b35
    • A
      timer: kernel/timer.c section fixes · b4be6258
      Adrian Bunk 提交于
      This patch fixes the following section mismatches with CONFIG_HOTPLUG=n,
      CONFIG_HOTPLUG_CPU=y:
      
      ...
      WARNING: vmlinux.o(.text+0x41cd3): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
      WARNING: vmlinux.o(.text+0x41d67): Section mismatch: reference to .init.data:tvec_base_done.22610 (between 'timer_cpu_notify' and 'run_timer_softirq')
      ...
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b4be6258
    • T
      clockevents: fix reprogramming decision in oneshot broadcast · cdc6f27d
      Thomas Gleixner 提交于
      Resolve the following regression of a choppy, almost unusable laptop:
      
       http://lkml.org/lkml/2007/12/7/299
       http://bugzilla.kernel.org/show_bug.cgi?id=9525
      
      A previous version of the code did the reprogramming of the broadcast
      device in the return from idle code. This was removed, but the logic in
      tick_handle_oneshot_broadcast() was kept the same.
      
      When a broadcast interrupt happens we signal the expiry to all CPUs
      which have an expired event. If none of the CPUs has an expired event,
      which can happen in dyntick mode, then we reprogram the broadcast
      device. We do not reprogram otherwise, but this is only correct if all
      CPUs, which are in the idle broadcast state have been woken up.
      
      The code ignores, that there might be pending not yet expired events on
      other CPUs, which are in the idle broadcast state. So the delivery of
      those events can be delayed for quite a time.
      
      Change the tick_handle_oneshot_broadcast() function to check for CPUs,
      which are in broadcast state and are not woken up by the current event,
      and enforce the rearming of the broadcast device for those CPUs.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdc6f27d
  6. 18 12月, 2007 8 次提交
  7. 08 12月, 2007 4 次提交
  8. 07 12月, 2007 2 次提交
  9. 06 12月, 2007 2 次提交
    • P
      Avoid potential NULL dereference in unregister_sysctl_table · f1dad166
      Pavel Emelyanov 提交于
      register_sysctl_table() can return NULL sometimes, e.g.  when kmalloc()
      returns NULL or when sysctl check fails.
      
      I've also noticed, that many (most?) code in the kernel doesn't check for
      the return value from register_sysctl_table() and later simply calls the
      unregister_sysctl_table() with potentially NULL argument.
      
      This is unlikely on a common kernel configuration, but in case we're
      dealing with modules and/or fault-injection support, there's a slight
      possibility of an OOPS.
      
      Changing all the users to check for return code from the registering does
      not look like a good solution - there are too many code doing this and
      failure in sysctl tables registration is not a good reason to abort module
      loading (in most of the cases).
      
      So I think, that we can just have this check in unregister_sysctl_table
      just to avoid accidental OOPS-es (actually, the unregister_sysctl_table()
      did exactly this, before the start_unregistering() appeared).
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1dad166
    • E
      fix clone(CLONE_NEWPID) · 5cd17569
      Eric W. Biederman 提交于
      Currently we are complicating the code in copy_process, the clone ABI, and
      if we fix the bugs sys_setsid itself, with an unnecessary open coded
      version of sys_setsid.
      
      So just simplify everything and don't special case the session and pgrp of
      the initial process in a pid namespace.
      
      Having this special case actually presents to user space the classic linux
      startup conditions with session == pgrp == 0 for /sbin/init.
      
      We already handle sending signals to processes in a child pid namespace.
      
      We need to handle sending signals to processes in a parent pid namespace
      for cases like SIGCHILD and SIGIO.
      
      This makes nothing extra visible inside a pid namespace.  So this extra
      special case appears to have no redeeming merits.
      
      Further removing this special case increases the flexibility of how we can
      use pid namespaces, by not requiring the initial process in a pid namespace
      to be a daemon.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5cd17569
  10. 05 12月, 2007 8 次提交
    • T
      futex: correctly return -EFAULT not -EINVAL · cde898fa
      Thomas Gleixner 提交于
      return -EFAULT not -EINVAL. Found by review.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cde898fa
    • O
      lockdep: in_range() fix · 54561783
      Oleg Nesterov 提交于
      Torsten Kaiser wrote:
      
      | static inline int in_range(const void *start, const void *addr, const void *end)
      | {
      |         return addr >= start && addr <= end;
      | }
      | This  will return true, if addr is in the range of start (including)
      | to end (including).
      |
      | But debug_check_no_locks_freed() seems does:
      | const void *mem_to = mem_from + mem_len
      | -> mem_to is the last byte of the freed range, that fits in_range
      | lock_from = (void *)hlock->instance;
      | -> first byte of the lock
      | lock_to = (void *)(hlock->instance + 1);
      | -> first byte of the next lock, not last byte of the lock that is being checked!
      |
      | The test is:
      | if (!in_range(mem_from, lock_from, mem_to) &&
      |                                         !in_range(mem_from, lock_to, mem_to))
      |                         continue;
      | So it tests, if the first byte of the lock is in the range that is freed ->OK
      | And if the first byte of the *next* lock is in the range that is freed
      | -> Not OK.
      
      We can also simplify in_range checks, we need only 2 comparisons, not 4.
      If the lock is not in memory range, it should be either at the left of range
      or at the right.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      54561783
    • I
      lockdep: fix debug_show_all_locks() · 85684873
      Ingo Molnar 提交于
      fix the oops that can be seen in:
      
         http://bugzilla.kernel.org/attachment.cgi?id=13828&action=view
      
      it is not safe to print the locks of running tasks.
      
      (even with this fix we have a small race - but this is a debug
       function after all.)
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      85684873
    • I
      sched: style cleanups · 41a2d6cf
      Ingo Molnar 提交于
      style cleanup of various changes that were done recently.
      
      no code changed:
      
            text    data     bss     dec     hex filename
           23680    2542      28   26250    668a sched.o.before
           23680    2542      28   26250    668a sched.o.after
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      41a2d6cf
    • S
      futex: fix for futex_wait signal stack corruption · ce6bd420
      Steven Rostedt 提交于
      David Holmes found a bug in the -rt tree with respect to
      pthread_cond_timedwait. After trying his test program on the latest git
      from mainline, I found the bug was there too.  The bug he was seeing
      that his test program showed, was that if one were to do a "Ctrl-Z" on a
      process that was in the pthread_cond_timedwait, and then did a "bg" on
      that process, it would return with a "-ETIMEDOUT" but early. That is,
      the timer would go off early.
      
      Looking into this, I found the source of the problem. And it is a rather
      nasty bug at that.
      
      Here's the relevant code from kernel/futex.c: (not in order in the file)
      
      [...]
      smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                                struct timespec __user *utime, u32 __user *uaddr2,
                                u32 val3)
      {
              struct timespec ts;
              ktime_t t, *tp = NULL;
              u32 val2 = 0;
              int cmd = op & FUTEX_CMD_MASK;
      
              if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                      if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                              return -EFAULT;
                      if (!timespec_valid(&ts))
                              return -EINVAL;
      
                      t = timespec_to_ktime(ts);
                      if (cmd == FUTEX_WAIT)
                              t = ktime_add(ktime_get(), t);
                      tp = &t;
              }
      [...]
              return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
      }
      
      [...]
      
      long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                      u32 __user *uaddr2, u32 val2, u32 val3)
      {
              int ret;
              int cmd = op & FUTEX_CMD_MASK;
              struct rw_semaphore *fshared = NULL;
      
              if (!(op & FUTEX_PRIVATE_FLAG))
                      fshared = &current->mm->mmap_sem;
      
              switch (cmd) {
              case FUTEX_WAIT:
                      ret = futex_wait(uaddr, fshared, val, timeout);
      
      [...]
      
      static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                            u32 val, ktime_t *abs_time)
      {
      [...]
                     struct restart_block *restart;
                      restart = &current_thread_info()->restart_block;
                      restart->fn = futex_wait_restart;
                      restart->arg0 = (unsigned long)uaddr;
                      restart->arg1 = (unsigned long)val;
                      restart->arg2 = (unsigned long)abs_time;
                      restart->arg3 = 0;
                      if (fshared)
                              restart->arg3 |= ARG3_SHARED;
                      return -ERESTART_RESTARTBLOCK;
      [...]
      
      static long futex_wait_restart(struct restart_block *restart)
      {
              u32 __user *uaddr = (u32 __user *)restart->arg0;
              u32 val = (u32)restart->arg1;
              ktime_t *abs_time = (ktime_t *)restart->arg2;
              struct rw_semaphore *fshared = NULL;
      
              restart->fn = do_no_restart_syscall;
              if (restart->arg3 & ARG3_SHARED)
                      fshared = &current->mm->mmap_sem;
              return (long)futex_wait(uaddr, fshared, val, abs_time);
      }
      
      So when the futex_wait is interrupt by a signal we break out of the
      hrtimer code and set up or return from signal. This code does not return
      back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
      save the "abs_time" which is a pointer to the stack variable "ktime_t t"
      from sys_futex.
      
      This returns and unwinds the stack before we get to call our signal. On
      return from the signal we go to futex_wait_restart, where we update all
      the parameters for futex_wait and call it. But here we have a problem
      where abs_time is no longer valid.
      
      I verified this with print statements, and sure enough, what abs_time
      was set to ends up being garbage when we get to futex_wait_restart.
      
      The solution I did to solve this (with input from Linus Torvalds)
      was to add unions to the restart_block to allow system calls to
      use the restart with specific parameters.  This way the futex code now
      saves the time in a 64bit value in the restart block instead of storing
      it on the stack.
      
      Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
      in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
      Not sure what that is there for.  If this turns out to be a problem, I've
      tested this with using "unsigned int" for u32 and "unsigned long long" for
      u64 and it worked just the same. I'm using u32 and u64 just to be
      consistent with what the futex code uses.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce6bd420
    • D
      [SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string. · 874a5f87
      David S. Miller 提交于
      Based upon a report by Mikael Pettersson.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      874a5f87
    • I
      sched: default to more agressive yield for SCHED_BATCH tasks · db292ca3
      Ingo Molnar 提交于
      do more agressive yield for SCHED_BATCH tuned tasks: they are all
      about throughput anyway. This allows a gentler migration path for
      any apps that relied on stronger yield.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      db292ca3
    • I
      sched: fix crash in sys_sched_rr_get_interval() · 77034937
      Ingo Molnar 提交于
      Luiz Fernando N. Capitulino reported that sched_rr_get_interval()
      crashes for SCHED_OTHER tasks that are on an idle runqueue.
      
      The fix is to return a 0 timeslice for tasks that are on an idle
      runqueue. (and which are not running, obviously)
      
      this also shrinks the code a bit:
      
         text    data     bss     dec     hex filename
        47903    3934     336   52173    cbcd sched.o.before
        47885    3934     336   52155    cbbb sched.o.after
      Reported-by: NLuiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      77034937
  11. 04 12月, 2007 1 次提交
  12. 03 12月, 2007 1 次提交
    • S
      sched: cpu accounting controller (V2) · d842de87
      Srivatsa Vaddagiri 提交于
      Commit cfb52856 removed a useful feature for
      us, which provided a cpu accounting resource controller.  This feature would be
      useful if someone wants to group tasks only for accounting purpose and doesnt
      really want to exercise any control over their cpu consumption.
      
      The patch below reintroduces the feature. It is based on Paul Menage's
      original patch (Commit 62d0df64), with
      these differences:
      
              - Removed load average information. I felt it needs more thought (esp
      	  to deal with SMP and virtualized platforms) and can be added for
      	  2.6.25 after more discussions.
              - Convert group cpu usage to be nanosecond accurate (as rest of the cfs
      	  stats are) and invoke cpuacct_charge() from the respective scheduler
      	  classes
      	- Make accounting scalable on SMP systems by splitting the usage
      	  counter to be per-cpu
      	- Move the code from kernel/cpu_acct.c to kernel/sched.c (since the
      	  code is not big enough to warrant a new file and also this rightly
      	  needs to live inside the scheduler. Also things like accessing
      	  rq->lock while reading cpu usage becomes easier if the code lived in
      	  kernel/sched.c)
      
      The patch also modifies the cpu controller not to provide the same accounting
      information.
      Tested-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      
       Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
       some simple tests like cpuspin (spin on the cpu), ran several tasks in
       the same group and timed them. Compared their time stamps with
       cpuacct.usage.
      Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d842de87
  13. 30 11月, 2007 4 次提交
  14. 28 11月, 2007 2 次提交