  1. 09 Dec 2010 (2 commits)
  2. 07 Dec 2010 (2 commits)
    • PM / Hibernate: Fix memory corruption related to swap · c9e664f1
      Authored by Rafael J. Wysocki
      There is a problem that swap pages allocated before the creation of
      a hibernation image can be released and used for storing the contents
      of different memory pages while the image is being saved.  Since the
      kernel stored in the image doesn't know about this, memory corruption
      occurs after resume from hibernation, especially on systems with
      relatively small RAM that need to swap often.
      
      This issue can be addressed by keeping the GFP_IOFS bits clear
      in gfp_allowed_mask during the entire hibernation, including the
      saving of the image, until the system is finally turned off or
      the hibernation is aborted.  Unfortunately, for this purpose
      it's necessary to rework the way in which the hibernate and
      suspend code manipulates gfp_allowed_mask.
      
      This change is based on an earlier patch from Hugh Dickins.
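      
      A minimal sketch of the idea, assuming helpers roughly like the ones
      below (names and placement are illustrative, not necessarily the literal
      patch): the I/O and FS bits are cleared from gfp_allowed_mask for the
      whole hibernation window and only restored once the image has been
      written out or hibernation has been aborted.
      
        /* sketch only, not the verbatim upstream code */
        #include <linux/gfp.h>

        static gfp_t saved_gfp_mask;

        void pm_restrict_gfp_mask(void)
        {
                WARN_ON(saved_gfp_mask);
                saved_gfp_mask = gfp_allowed_mask;
                /* allocations may no longer start I/O, so the swap pages
                   holding the image cannot be recycled behind its back */
                gfp_allowed_mask &= ~GFP_IOFS;
        }

        void pm_restore_gfp_mask(void)
        {
                if (saved_gfp_mask) {
                        gfp_allowed_mask = saved_gfp_mask;
                        saved_gfp_mask = 0;
                }
        }
      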
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-by: Ondrej Zary <linux@rainbow-software.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: stable@kernel.org
    • PM / Hibernate: Use async I/O when reading compressed hibernation image · 9f339caf
      Authored by Bojan Smojver
      This is a fix for reading an LZO-compressed image using async I/O.
      Essentially, instead of having just one page into which we keep
      reading blocks from swap, we allocate enough of them to cover the
      largest compressed size and then let block I/O pick them all up. Once
      we have them all (and here we wait), we decompress them, as usual.
      Obviously, the very first block we still pick up synchronously,
      because we need to know the size of the lot before we pick up the
      rest.
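      
      A rough sketch of that flow; the helper names below (swap_read_page(),
      hib_wait_on_bio_chain(), LZO_HEADER) are assumptions modelled on
      kernel/power/swap.c and may not match the actual patch line for line.
      
        static int load_compressed_chunk(struct swap_map_handle *handle,
                                         unsigned char *cmp_buf, size_t *cmp_len)
        {
                struct bio *bio_chain = NULL;
                unsigned int nr_pages, i;
                int error;

                /* First page synchronously: it carries the compressed length. */
                error = swap_read_page(handle, cmp_buf, NULL);
                if (error)
                        return error;
                *cmp_len = *(size_t *)cmp_buf;

                nr_pages = DIV_ROUND_UP(LZO_HEADER + *cmp_len, PAGE_SIZE);

                /* Remaining pages asynchronously, chained into one bio list. */
                for (i = 1; i < nr_pages && !error; i++)
                        error = swap_read_page(handle, cmp_buf + i * PAGE_SIZE,
                                               &bio_chain);

                /* Wait for the whole chain before handing the buffer to LZO. */
                if (!error)
                        error = hib_wait_on_bio_chain(&bio_chain);
                return error;
        }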
      
      Also fixed the copyright line, which I had forgotten before.
      Signed-off-by: Bojan Smojver <bojan@rexursive.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
  3. 03 Dec 2010 (1 commit)
    • do_exit(): make sure that we run with get_fs() == USER_DS · 33dd94ae
      Authored by Nelson Elhage
      If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
      otherwise reset before do_exit().  do_exit may later (via mm_release in
      fork.c) do a put_user to a user-controlled address, potentially allowing
      a user to leverage an oops into a controlled write into kernel memory.
      
      This is only triggerable in the presence of another bug, but this
      potentially turns a lot of DoS bugs into privilege escalations, so it's
      worth fixing.  I have proof-of-concept code which uses this bug along
      with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
      I've tested that this is not theoretical.
      
      A more logical place to put this fix might be when we know an oops has
      occurred, before we call do_exit(), but that would involve changing
      every architecture, in multiple places.
      
      Let's just stick it in do_exit instead.
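      
      As a sketch (not the verbatim diff, and the exact placement within the
      function may differ), the fix boils down to a single line near the top
      of do_exit():
      
        NORET_TYPE void do_exit(long code)
        {
                /*
                 * If we oopsed while running with get_fs() == KERNEL_DS,
                 * restore USER_DS so that any later put_user() done on our
                 * behalf (e.g. clearing child_tid in mm_release()) cannot
                 * write to a kernel address.
                 */
                set_fs(USER_DS);

                /* ... the rest of do_exit() is unchanged ... */
        }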
      
      [akpm@linux-foundation.org: update code comment]
      Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 01 Dec 2010 (1 commit)
  5. 30 Nov 2010 (2 commits)
    • sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Authored by Mike Galbraith
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the previous task group is dropped.  Child
      processes inherit this task group thereafter, and increase its
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable by setting its nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of a nice <level> task.
      Setting the nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
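      
      A simplified sketch of the structures and the setsid() hook described
      above; names are modelled on the patch but should be read as
      illustrative rather than authoritative:
      
        struct autogroup {
                struct kref             kref;   /* dropped as referencing processes exit */
                struct task_group       *tg;    /* backing scheduler task group */
        };

        /* Called from setsid(): move the caller into a fresh per-session group. */
        void sched_autogroup_create_attach(struct task_struct *p)
        {
                struct autogroup *ag = autogroup_create();

                autogroup_move_group(p, ag);    /* attach p, drop ref on its old group */
                autogroup_kref_put(ag);         /* move_group took its own reference */
        }
      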
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix unregister_fair_sched_group() · 822bc180
      Authored by Paul Turner
      In the flipping and flopping between calling
      unregister_fair_sched_group() on a per-cpu versus per-group basis
      we ended up in a bad state.
      
      Remove from the list for the passed cpu as opposed to some
      arbitrary index.
      
      ( This fixes explosions w/ autogroup as well as a group
        creation/destruction stress test. )
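      
      Sketched from the description (details may differ from the actual diff),
      the essence is that the leaf cfs_rq is removed for the cpu that was
      passed in, rather than for some other index:
      
        void unregister_fair_sched_group(struct task_group *tg, int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                raw_spin_lock_irqsave(&rq->lock, flags);
                /* remove the entry belonging to 'cpu', not an arbitrary index */
                list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
                raw_spin_unlock_irqrestore(&rq->lock, flags);
        }
      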
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20101130005740.080828123@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 26 Nov 2010 (6 commits)
    • sched: Remove unused argument dest_cpu to migrate_task() · b7a2b39d
      Authored by Nikanth Karthikesan
      Remove the unused argument 'dest_cpu' of migrate_task() and pass the
      runqueue instead, as it is always known at the call site.
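      
      Illustratively, the change is just a signature swap (the exact
      prototypes shown here are an assumption about the 2010-era code):
      
        /* before: dest_cpu was accepted but never used */
        static bool migrate_task(struct task_struct *p, int dest_cpu);

        /* after: callers already have the runqueue in hand, so pass that */
        static bool migrate_task(struct task_struct *p, struct rq *rq);
      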
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <201011261237.09187.knikanth@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • mutexes, sched: Introduce arch_mutex_cpu_relax() · 335d7afb
      Authored by Gerald Schaefer
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in these specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.
      
      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect architectures other than System z for now.
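      
      A sketch of the default wiring described above (the call site shown is
      illustrative):
      
        /* default in the mutex code: fall back to a plain cpu_relax() */
        #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
        #define arch_mutex_cpu_relax()  cpu_relax()
        #endif

        /* ...and the optimistic spin loops then use it instead of cpu_relax(): */
        while (lock->owner == owner)
                arch_mutex_cpu_relax();
      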
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • nohz: Fix printk_needs_cpu() return value on offline cpus · 61ab2544
      Authored by Heiko Carstens
      This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued
      on offline cpus.
      
      printk_needs_cpu() may return 1 if called on offline cpus. When a cpu gets
      offlined it schedules the idle process which, before killing its own cpu, will
      call tick_nohz_stop_sched_tick(). That function in turn will call
      printk_needs_cpu() in order to check if the local tick can be disabled. On
      offline cpus this function should naturally return 0 since, whether or not the
      tick gets disabled, the cpu will be dead shortly after. That is besides the
      fact that __cpu_disable() should already have made sure that no interrupts on
      the offlined cpu will be delivered anyway.
      
      In this case it prevents tick_nohz_stop_sched_tick() from calling
      select_nohz_load_balancer(). No idea if that really is a problem. However what
      made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is
      used within __mod_timer() to select a cpu on which a timer gets enqueued. If
      printk_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get
      updated when a cpu gets offlined. It may contain the cpu number of an offline
      cpu. In turn timers get enqueued on an offline cpu and not very surprisingly
      they never expire and cause system hangs.
      
      This has been observed on 2.6.32 kernels. On current kernels __mod_timer() uses
      get_nohz_timer_target() which doesn't have that problem. However there might be
      other problems because of the too-early exit from tick_nohz_stop_sched_tick() in
      case a cpu goes offline.
      
      The easiest way to fix this is just to test if the current cpu is offline and call
      printk_tick() directly which clears the condition.
      
      Alternatively I tried a cpu hotplug notifier which would clear the condition,
      however between calling the notifier function and printk_needs_cpu() something
      could have called printk() again and the problem is back again. This seems to
      be the safest fix.
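      
      The resulting check is tiny; sketched from the description above, so the
      exact body may differ:
      
        int printk_needs_cpu(int cpu)
        {
                /* An offlined cpu must not keep the tick alive: flush the
                   pending-printk condition directly instead of returning 1. */
                if (cpu_is_offline(cpu))
                        printk_tick();
                return __this_cpu_read(printk_pending);
        }
      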
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <20101126120235.406766476@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • printk: Fix wake_up_klogd() vs cpu hotplug · 49f41383
      Authored by Heiko Carstens
      wake_up_klogd() may get called from preemptible context but uses
      __raw_get_cpu_var() to write to a per cpu variable. If it gets preempted
      between getting the address and writing to it, the cpu in question could be
      offline by the time the process gets scheduled back, and the write then hits
      the per cpu data of an offline cpu.
      
      This buggy behaviour was introduced with fa33507a "printk: robustify
      printk, fix #2" which was supposed to fix a "using smp_processor_id() in
      preemptible" warning.
      
      Let's use this_cpu_write() instead which disables preemption and makes sure
      that the outlined scenario cannot happen.
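      
      Sketch of the change; the surrounding waitqueue check is an assumption
      about the rest of the function:
      
        void wake_up_klogd(void)
        {
                if (waitqueue_active(&log_wait))
                        /* was: __raw_get_cpu_var(printk_pending) = 1; */
                        this_cpu_write(printk_pending, 1);
        }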
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101126124247.GC7023@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix the software context switch counter · ee6dcfa4
      Authored by Peter Zijlstra
      Stephane noticed that because the perf_sw_event() call is inside the
      perf_event_task_sched_out() call it won't get called unless we
      have a per-task counter.
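      
      Sketched from the description (the prototypes are assumptions for the
      2010-era API): the software event is now counted unconditionally before
      the per-task switch-out path runs.
      
        static inline void perf_event_task_sched_out(struct task_struct *task,
                                                     struct task_struct *next)
        {
                /* count the context switch even when no per-task events exist */
                perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

                __perf_event_task_sched_out(task, next);
        }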
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix inherit vs. context rotation bug · dddd3379
      Authored by Thomas Gleixner
      It was found that sometimes children of tasks with inherited events had
      one extra event. Eventually it turned out to be due to the list rotation
      not being exclusive with the list iteration in the inheritance code.
      
      Cure this by temporarily disabling the rotation while we inherit the events.
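      
      A sketch of that mechanism (field and function names are assumptions):
      the context gains a flag that the rotation path checks, and the
      inheritance code sets and clears it, under the context lock, around its
      walk of the event list.
      
        struct perf_event_context {
                /* ... */
                int     rotate_disable;  /* non-zero while events are inherited */
        };

        static void rotate_ctx(struct perf_event_context *ctx)
        {
                if (ctx->rotate_disable)
                        return;          /* inheritance is iterating the list */
                /* ... rotate the event list as before ... */
        }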
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 23 Nov 2010 (5 commits)
  8. 20 Nov 2010 (1 commit)
    • Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" · 33e0d57f
      Authored by Linus Torvalds
      This reverts commit 59365d13.
      
      It turns out that this can break certain existing user land setups.
      Quoth Sarah Sharp:
      
       "On Wednesday, I updated my branch to commit 460781b5 from linus' tree,
        and my box would not boot.  klogd segfaulted, which stalled the whole
        system.
      
        At first I thought it actually hung the box, but it continued booting
        after 5 minutes, and I was able to log in.  It dropped back to the
        text console instead of the graphical bootup display for that period
        of time.  dmesg surprisingly still works.  I've bisected the problem
        down to this commit (commit 59365d13)
      
        The box is running klogd 1.5.5ubuntu3 (from Jaunty).  Yes, I know
        that's old.  I read the bit in the commit about changing the
        permissions of kallsyms after boot, but if I can't boot that doesn't
        help."
      
      So let's just keep the old default, and encourage distributions to do
      the "chmod -r /proc/kallsyms" in their bootup scripts.  This is not
      worth a kernel option to change default behavior, since it's so easily
      done in user space.
      Reported-and-bisected-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Cc: Marcus Meissner <meissner@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 18 Nov 2010 (19 commits)
  10. 17 Nov 2010 (1 commit)
    • kernel: make /proc/kallsyms mode 400 to reduce ease of attacking · 59365d13
      Authored by Marcus Meissner
      Making /proc/kallsyms readable only for root by default makes it
      slightly harder for attackers to write generic kernel exploits by
      removing one source of knowledge where things are in the kernel.
      
      This is the second submission; discussion on the first submission mostly
      concerned that this is just one hole in the sieve ...  but one of the
      bigger ones.
      
      Changing the permissions of at least System.map and vmlinux is also
      required to close the same set of leaks, but that is a packaging issue.
      
      The target of this starter patch and its follow-ups is removing any kind
      of kernel-space address information leak from the kernel.
      
      [ Side note: the default of root-only reading is the "safe" value, and
        it's easy enough to then override at any time after boot.  The /proc
        filesystem allows root to change the permissions with a regular
        chmod, so you can "revert" this at run-time by simply doing
      
          chmod og+r /proc/kallsyms
      
        as root if you really want regular users to see the kernel symbols.
        It does help some tools like "perf" figure them out without any
        setup, so it may well make sense in some situations.  - Linus ]
      Signed-off-by: Marcus Meissner <meissner@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Eugene Teo <eugeneteo@kernel.org>
      Reviewed-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>