1. 05 Aug 2010, 1 commit
  2. 03 Aug 2010, 1 commit
  3. 31 Jul 2010, 4 commits
  4. 30 Jul 2010, 1 commit
    • D
      CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials · de09a977
      Authored by David Howells
      It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
      credentials by incrementing their usage count after their replacement by the
      task being accessed.
      
      What happens is that get_task_cred() can race with commit_creds():
      
      	TASK_1			TASK_2			RCU_CLEANER
      	-->get_task_cred(TASK_2)
      	rcu_read_lock()
      	__cred = __task_cred(TASK_2)
      				-->commit_creds()
      				old_cred = TASK_2->real_cred
      				TASK_2->real_cred = ...
      				put_cred(old_cred)
      				  call_rcu(old_cred)
      		[__cred->usage == 0]
      	get_cred(__cred)
      		[__cred->usage == 1]
      	rcu_read_unlock()
      							-->put_cred_rcu()
      							[__cred->usage == 1]
      							panic()
      
      However, since a task's credentials are generally not changed very often, we can
      reasonably make use of a loop involving reading the creds pointer and using
      atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
      
      If successful, we can safely return the credentials in the knowledge that, even
      if the task we're accessing has released them, they haven't gone to the RCU
      cleanup code.
      
      We then change task_state() in procfs to use get_task_cred() rather than
      calling get_cred() on the result of __task_cred(), as that suffers from the
      same problem.
      
      Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
      tripped when it is noticed that the usage count is not zero as it ought to be,
      for example:
      
      kernel BUG at kernel/cred.c:168!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/kernel/mm/ksm/run
      CPU 0
      Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
      745
      RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
      RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
      RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
      RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
      R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
      FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
      Stack:
       ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
      <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
      <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
      Call Trace:
       [<ffffffff810698cd>] put_cred+0x13/0x15
       [<ffffffff81069b45>] commit_creds+0x16b/0x175
       [<ffffffff8106aace>] set_current_groups+0x47/0x4e
       [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
       [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
      Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
      48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
      04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
      RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
       RSP <ffff88019e7e9eb8>
      ---[ end trace df391256a100ebdd ]---
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de09a977
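The retry pattern this commit relies on can be modeled in userspace. The following is a minimal sketch using C11 atomics in place of the kernel's `atomic_inc_not_zero()`; the `cred_model` type and `get_if_live()` name are hypothetical, chosen only to illustrate the "take a reference only while the count is still nonzero" idea, not the actual kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of the fix: take a reference only if the
 * usage count has not already dropped to zero, mirroring what
 * atomic_inc_not_zero() does under rcu_read_lock(). */
struct cred_model {
	atomic_int usage;
};

/* Returns 1 and bumps the count if it was nonzero; returns 0 if the
 * credentials are already on their way to RCU cleanup, in which case the
 * caller must re-read the cred pointer and retry. */
static int get_if_live(struct cred_model *cred)
{
	int old = atomic_load(&cred->usage);

	while (old != 0) {
		/* On failure, 'old' is refreshed and the zero check reruns. */
		if (atomic_compare_exchange_weak(&cred->usage, &old, old + 1))
			return 1;	/* reference safely taken */
	}
	return 0;	/* never resurrect a dead count */
}
```

Because the increment is conditional on the count being nonzero, the TASK_1/RCU_CLEANER race in the diagram above becomes harmless: once `put_cred()` has dropped the count to zero, the reader's increment attempt fails instead of corrupting the refcount.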
  5. 28 Jul 2010, 2 commits
  6. 26 Jul 2010, 3 commits
  7. 22 Jul 2010, 5 commits
  8. 21 Jul 2010, 1 commit
    • N
      drop_monitor: convert some kfree_skb call sites to consume_skb · 70d4bf6d
      Authored by Neil Horman
      Convert a few calls from kfree_skb to consume_skb
      
      While working on dropwatch, I noticed that lots of internal skb drops
      were being detected in several places.  While some are legitimate,
      several were not: they freed skbs that were at the end of their life,
      rather than being discarded due to an error.  This patch converts those
      call sites from kfree_skb to consume_skb, which stops the in-kernel
      drop_monitor code from detecting them as drops.  Tested successfully by
      myself.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70d4bf6d
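The kfree_skb()/consume_skb() distinction can be modeled outside the kernel. A hedged sketch with hypothetical names (`skb_model`, `dropped_packets`): both helpers free the buffer, but only the error-path one is reported to drop monitoring, which is exactly why converting end-of-life frees quiets drop_monitor.

```c
#include <assert.h>
#include <stdlib.h>

/* Stands in for drop_monitor's drop counter. */
static int dropped_packets;

struct skb_model {
	void *data;
};

/* Model of consume_skb(): the packet reached the end of its life
 * normally, so the free is not reported as a drop. */
static void consume_skb_model(struct skb_model *skb)
{
	free(skb->data);
	skb->data = NULL;
}

/* Model of kfree_skb(): the packet is being discarded on an error path,
 * so the free is reported to the drop monitor. */
static void kfree_skb_model(struct skb_model *skb)
{
	dropped_packets++;
	free(skb->data);
	skb->data = NULL;
}
```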
  9. 19 Jul 2010, 11 commits
    • C
      kmemleak: Add support for NO_BOOTMEM configurations · 9078370c
      Authored by Catalin Marinas
      With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and
      friends use the early_res functions for memory management when
      NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the
      corresponding code paths for bootmem allocations.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      9078370c
    • P
      update email address · a2531293
      Authored by Pavel Machek
      pavel@suse.cz no longer works; replace it with a working address.
      Signed-off-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      a2531293
    • D
      padata: Added sysfs primitives to padata subsystem · 5e017dc3
      Authored by Dan Kruchinin
      Added sysfs primitives to the padata subsystem. An API user may now
      embed the kobject each padata instance contains into any sysfs
      hierarchy. For now the padata sysfs interface provides only
      two objects:
          serial_cpumask   [RW] - cpumask for serial workers
          parallel_cpumask [RW] - cpumask for parallel workers
      Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      5e017dc3
    • D
      padata: Make two separate cpumasks · e15bacbe
      Authored by Dan Kruchinin
      The aim of this patch is to make two separate cpumasks
      for padata parallel and serial workers respectively.
      This allows the user to build finer-grained and more sophisticated
      configurations of the padata framework. For example, the user may bind
      parallel and serial workers to non-intersecting CPU groups to gain
      better performance. Each padata instance now also has a notifier chain
      for its cpumasks. If either the parallel or the serial mask (or both)
      is changed, all interested subsystems are notified. This is especially
      useful if the padata user selects the callback CPU according to the
      serial cpumask.
      Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      e15bacbe
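The notifier behaviour described above can be sketched as a tiny callback chain. All names here (`cpumask_chain`, `record_mask`, and so on) are hypothetical; the kernel uses its generic notifier-chain API, but the shape is the same: subsystems register a callback and are invoked whenever a cpumask is rewritten.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NOTIFIERS 8

typedef void (*cpumask_notifier_fn)(unsigned long new_mask, void *priv);

/* Hypothetical miniature notifier chain for cpumask changes. */
struct cpumask_chain {
	cpumask_notifier_fn fn[MAX_NOTIFIERS];
	void *priv[MAX_NOTIFIERS];
	int count;
};

static int chain_register(struct cpumask_chain *c,
			  cpumask_notifier_fn fn, void *priv)
{
	if (c->count >= MAX_NOTIFIERS)
		return -1;
	c->fn[c->count] = fn;
	c->priv[c->count] = priv;
	c->count++;
	return 0;
}

/* Called when either the parallel or the serial cpumask changes. */
static void chain_notify(struct cpumask_chain *c, unsigned long new_mask)
{
	for (int i = 0; i < c->count; i++)
		c->fn[i](new_mask, c->priv[i]);
}

/* Example subscriber: remembers the last mask it was told about, as a
 * subsystem picking a callback CPU from the serial cpumask might. */
static void record_mask(unsigned long new_mask, void *priv)
{
	*(unsigned long *)priv = new_mask;
}
```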
    • R
      PM / Suspend: Fix ordering of calls in suspend error paths · ce441011
      Authored by Rafael J. Wysocki
      The ACPI suspend code calls suspend_nvs_free() at a wrong place,
      which may lead to a memory leak if there's an error executing
      acpi_pm_prepare(), because acpi_pm_finish() will not be called in
      that case.  However, the root cause of this problem is the
      apparently confusing ordering of calls in suspend error paths that
      needs to be fixed.
      
      In addition to that, fix a typo in a label name in suspend.c.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Len Brown <len.brown@intel.com>
      ce441011
    • R
      PM / Hibernate: Fix snapshot error code path · d074ee02
      Authored by Rafael J. Wysocki
      There is an inconsistency between hibernation_platform_enter()
      and hibernation_snapshot(), because the latter calls
      hibernation_ops->end() after failing hibernation_ops->begin(), while
      the former doesn't do that.  Make hibernation_snapshot() behave in
      the same way as hibernation_platform_enter() in that respect.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Len Brown <len.brown@intel.com>
      d074ee02
    • R
      PM / Hibernate: Fix hibernation_platform_enter() · f6f71f18
      Authored by Rafael J. Wysocki
      The hibernation_platform_enter() function calls dpm_suspend_noirq()
      instead of dpm_resume_noirq() by mistake.  Fix this.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Len Brown <len.brown@intel.com>
      f6f71f18
    • J
      pm_qos: Get rid of the allocation in pm_qos_add_request() · 82f68251
      Authored by James Bottomley
      All current users of pm_qos_add_request() have the ability to supply
      the memory required by the pm_qos routines, so make them do this and
      eliminate the kmalloc() inside pm_qos_add_request().  This has the
      double benefit of making the call never fail and allowing it to be
      called from atomic context.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: mark gross <markgross@thegnar.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      82f68251
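The API change can be illustrated by a minimal sketch with hypothetical names (`qos_request`, `qos_add_request`): instead of the interface allocating a handle internally, the caller embeds the request object in its own storage, so registration cannot fail and needs no allocation context.

```c
#include <assert.h>

/* Hypothetical model of the reworked interface: the request object is
 * caller-owned, so there is no kmalloc() and no failure path, which also
 * makes registration safe in atomic context. */
struct qos_request {
	int value;
	int active;
};

static void qos_add_request(struct qos_request *req, int value)
{
	req->value = value;	/* caller-supplied memory: nothing to allocate */
	req->active = 1;
}

static void qos_remove_request(struct qos_request *req)
{
	req->active = 0;
}

/* Typical usage pattern: the request is embedded in the subsystem's own
 * state rather than handed back as an allocated pointer. */
struct driver_state {
	struct qos_request latency_req;
};
```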
    • J
      pm_qos: Reimplement using plists · 5f279845
      Authored by James Bottomley
      A lot of the pm_qos extremal value handling is really duplicating what a
      priority-ordered list does, just in a less efficient fashion.  Simply
      redoing the implementation in terms of a plist gets rid of a lot of this
      junk (although there are several other strange things that could do with
      tidying up, like the fact that pm_qos_request_list has to carry the
      pm_qos_class with every node, simply because it doesn't get passed in to
      pm_qos_update_request even though every caller knows full well what
      parameter it's updating).

      I think this redo is a win independent of Android, so we should do
      something like this now.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: mark gross <markgross@thegnar.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      5f279845
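The win can be seen in miniature: with a priority-ordered list, the extremal value is simply the head element, so add, remove, and update need no separate min/max bookkeeping. Below is a hedged sketch using a singly linked list with hypothetical names; the kernel's `plist` is doubly linked and more elaborate, but the invariant is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority-ordered list: largest value first, so the
 * extremal (maximum) value is always at the head. */
struct qreq {
	int value;
	struct qreq *next;
};

static void plist_model_add(struct qreq **head, struct qreq *node)
{
	struct qreq **p = head;

	/* Walk past entries with larger-or-equal values, then splice in. */
	while (*p && (*p)->value >= node->value)
		p = &(*p)->next;
	node->next = *p;
	*p = node;
}

/* The extremal value is the head; no scan of all nodes is needed. */
static int plist_model_max(const struct qreq *head, int default_value)
{
	return head ? head->value : default_value;
}
```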
    • R
      PM: Make it possible to avoid races between wakeup and system sleep · c125e96f
      Authored by Rafael J. Wysocki
      One of the arguments during the suspend blockers discussion was that
      the mainline kernel didn't contain any mechanisms making it possible
      to avoid races between wakeup and system suspend.
      
      Generally, there are two problems in that area.  First, if a wakeup
      event occurs exactly when /sys/power/state is being written to, it
      may be delivered to user space right before the freezer kicks in, so
      the user space consumer of the event may not be able to process it
      before the system is suspended.  Second, if a wakeup event occurs
      after user space has been frozen, it is not generally guaranteed that
      the ongoing transition of the system into a sleep state will be
      aborted.
      
      To address these issues introduce a new global sysfs attribute,
      /sys/power/wakeup_count, associated with a running counter of wakeup
      events and three helper functions, pm_stay_awake(), pm_relax(), and
      pm_wakeup_event(), that may be used by kernel subsystems to control
      the behavior of this attribute and to request the PM core to abort
      system transitions into a sleep state already in progress.
      
      The /sys/power/wakeup_count file may be read from or written to by
      user space.  Reads will always succeed (unless interrupted by a
      signal) and return the current value of the wakeup events counter.
      Writes, however, will only succeed if the written number is equal to
      the current value of the wakeup events counter.  If a write is
      successful, it will cause the kernel to save the current value of the
      wakeup events counter and to abort the subsequent system transition
      into a sleep state if any wakeup events are reported after the write
      has returned.
      
      [The assumption is that before writing to /sys/power/state user space
      will first read from /sys/power/wakeup_count.  Next, user space
      consumers of wakeup events will have a chance to acknowledge or
      veto the upcoming system transition to a sleep state.  Finally, if
      the transition is allowed to proceed, /sys/power/wakeup_count will
      be written to and if that succeeds, /sys/power/state will be written
      to as well.  Still, if any wakeup events are reported to the PM core
      by kernel subsystems after that point, the transition will be
      aborted.]
      
      Additionally, put a wakeup events counter into struct dev_pm_info and
      make these per-device wakeup event counters available via sysfs,
      so that it's possible to check the activity of various wakeup event
      sources within the kernel.
      
      To illustrate how subsystems can use pm_wakeup_event(), make the
      low-level PCI runtime PM wakeup-handling code use it.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: markgross <markgross@thegnar.org>
      Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
      c125e96f
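The counter handshake described above can be modeled directly. A minimal sketch with hypothetical names (`wakeup_model`, `wakeup_count_store`): a write to the counter file succeeds only if the written value matches the current count, and any wakeup event after a successful write invalidates the pending transition.

```c
#include <assert.h>

/* Hypothetical model of the /sys/power/wakeup_count handshake. */
struct wakeup_model {
	unsigned long events;	/* running count of wakeup events */
	unsigned long saved;	/* value saved by a successful write */
	int armed;		/* a write succeeded; suspend may be attempted */
};

/* Models pm_wakeup_event()/pm_stay_awake() reporting an event. */
static void wakeup_event(struct wakeup_model *m)
{
	m->events++;
}

/* Models a write to wakeup_count: fails if events arrived since the
 * value was read back by user space. */
static int wakeup_count_store(struct wakeup_model *m, unsigned long val)
{
	if (val != m->events)
		return -1;
	m->saved = val;
	m->armed = 1;
	return 0;
}

/* Models the check made during the write to /sys/power/state: the
 * transition is aborted if any events were reported after arming. */
static int suspend_may_proceed(const struct wakeup_model *m)
{
	return m->armed && m->events == m->saved;
}
```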
    • C
      PM / Hibernate: Fix typos in comments in kernel/power/swap.c · 90133673
      Authored by Cesar Eduardo Barros
      There are a few typos in kernel/power/swap.c.  Fix them.
      Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      90133673
  10. 14 Jul 2010, 5 commits
  11. 12 Jul 2010, 1 commit
  12. 05 Jul 2010, 1 commit
  13. 01 Jul 2010, 2 commits
    • P
      sched: Cure nr_iowait_cpu() users · 8c215bd3
      Authored by Peter Zijlstra
      Commit 0224cf4c (sched: Intoduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu() which took the iowait value of
      the current cpu.
      
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8c215bd3
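The shape of this fix can be sketched as follows, with hypothetical names: instead of an argument-less helper that implicitly reads the current CPU (and therefore requires preemption to be disabled across the call), the caller passes the CPU number it has already pinned down.

```c
#include <assert.h>

/* Hypothetical per-CPU iowait counters standing in for the scheduler's. */
#define NR_CPUS_MODEL 4
static unsigned long iowait_count[NR_CPUS_MODEL];

/* After the fix: the caller names the CPU explicitly, so the helper no
 * longer depends on the current CPU staying stable across the call. */
static unsigned long nr_iowait_cpu_model(int cpu)
{
	return iowait_count[cpu];
}
```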
    • M
      futex: futex_find_get_task remove credentails check · 7a0ea09a
      Authored by Michal Hocko
      futex_find_get_task is currently used (through lookup_pi_state) from two
      contexts, futex_requeue and futex_lock_pi_atomic.  None of the paths
      looks like it needs the credentials check, though.  Different (e)uids
      shouldn't matter at all, because the only thing that is important for a
      shared futex is the accessibility of the shared memory.
      
      The credential check results in a glibc assert failure or a process
      hang (if glibc is compiled without assert support) for a shared robust
      pthread mutex with priority inheritance, if a process tries to lock an
      already held lock owned by a process with a different euid:
      
      pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
      
      The problem is that futex_lock_pi_atomic, which is called when we try
      to lock an already held lock, checks the current holder (the tid is
      stored in the futex value) to get the PI state.  It uses
      lookup_pi_state, which in turn gets the task struct from
      futex_find_get_task.  ESRCH is returned either when the task is not
      found or if the credentials check fails.
      
      futex_lock_pi_atomic simply returns if it gets ESRCH.  glibc code,
      however, doesn't expect a robust lock to return ESRCH, because it
      should get either success or "owner died".
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7a0ea09a
  14. 30 Jun 2010, 1 commit
  15. 25 Jun 2010, 1 commit