1. 11 Aug, 2011 2 commits
    •
      alarmtimers: Change alarmtimer functions to return alarmtimer_restart values · 4b41308d
      John Stultz committed
      In order to properly fix the denial of service issue with high freq
      periodic alarm timers, we need to push the re-arming logic into the
      alarm timer handler, much as the hrtimer code does.
      
      This patch introduces alarmtimer_restart enum and changes the
      alarmtimer handler declarations to use it as a return value. Further,
      to ease following changes, it extends the alarmtimer handler functions
      to also take the time at expiration. No logic is yet modified.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
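The handler shape described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct alarm` here is a hypothetical stand-in, times are plain nanosecond integers rather than `ktime_t`, the enum member names are assumed to follow the hrtimer convention, and `demo_handler` and `fire_alarm` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return type named in the commit message; member names are assumed
 * to mirror the hrtimer convention. */
enum alarmtimer_restart {
    ALARMTIMER_NORESTART,
    ALARMTIMER_RESTART,
};

/* Hypothetical stand-in for the kernel's alarm structure. */
struct alarm {
    int64_t expires_ns;   /* absolute expiry time */
    int64_t period_ns;    /* 0 means one-shot */
    /* The handler now returns a restart value and receives the
     * time at expiration as a second argument. */
    enum alarmtimer_restart (*function)(struct alarm *alarm, int64_t now_ns);
};

/* Illustrative handler: a periodic alarm asks to be restarted,
 * a one-shot alarm does not. */
static enum alarmtimer_restart demo_handler(struct alarm *alarm, int64_t now_ns)
{
    (void)now_ns; /* unused in this sketch */
    return alarm->period_ns > 0 ? ALARMTIMER_RESTART : ALARMTIMER_NORESTART;
}

/* Invoke the handler and hand its decision back to the caller,
 * which would be responsible for re-arming. */
static enum alarmtimer_restart fire_alarm(struct alarm *alarm, int64_t now_ns)
{
    assert(alarm->function != NULL);
    return alarm->function(alarm, now_ns);
}
```

Returning the decision from the handler, rather than re-arming from outside it, is what lets the later changes move the re-arm logic into the handler, as the hrtimer code does.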
    •
      alarmtimers: Avoid possible denial of service with high freq periodic timers · 6af7e471
      John Stultz committed
      It's possible to jam up the alarm timers by setting very small interval
      timers, which will cause the alarmtimer subsystem to spend all of its time
      firing and restarting timers. This can effectively lock up a box.
      
      A deeper fix is needed, closely mimicking the hrtimer code, but for now
      just cap the interval to 100us to avoid userland hanging the system.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: stable@kernel.org
      Signed-off-by: John Stultz <john.stultz@linaro.org>
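The stopgap described above amounts to enforcing a minimum period. A minimal sketch, assuming nanosecond integers in place of `ktime_t`; the function name `cap_alarm_period` and the macro `MIN_ALARM_PERIOD_NS` are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL
/* 100us floor from the commit message; the macro name is illustrative. */
#define MIN_ALARM_PERIOD_NS (100 * NSEC_PER_USEC)

/* Raise any non-zero period shorter than 100us up to 100us.
 * A period of 0 denotes a one-shot timer and is left alone. */
static int64_t cap_alarm_period(int64_t period_ns)
{
    assert(period_ns >= 0);
    if (period_ns != 0 && period_ns < MIN_ALARM_PERIOD_NS)
        return MIN_ALARM_PERIOD_NS;
    return period_ns;
}
```

With this floor in place, a request for a 1ns interval is treated as a 100us interval, so the subsystem can no longer be made to fire back to back.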
  2. 10 Aug, 2011 2 commits
  3. 04 Aug, 2011 6 commits
  4. 02 Aug, 2011 4 commits
  5. 31 Jul, 2011 1 commit
  6. 28 Jul, 2011 4 commits
  7. 27 Jul, 2011 7 commits
    •
      atomic: use <linux/atomic.h> · 60063497
      Arun Sharma committed
      This allows us to move duplicated code in <asm/atomic.h>
      (atomic_inc_not_zero() for now) to <linux/atomic.h>

      Signed-off-by: Arun Sharma <asharma@fb.com>
      Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      panic: panic=-1 for immediate reboot · 4302fbc8
      Hugh Dickins committed
      When a kernel BUG or oops occurs, ChromeOS intends to panic and
      immediately reboot, with stacktrace and other messages preserved in RAM
      across reboot.
      
      But the longer we delay, the more likely the user is to poweroff and
      lose the info.
      
      panic_timeout (seconds before rebooting) is set by panic= boot option or
      sysctl or /proc/sys/kernel/panic; but 0 means wait forever, so at
      present we have to delay at least 1 second.
      
      Let a negative number mean reboot immediately (with the small cosmetic
      benefit of suppressing that newline-less "Rebooting in %d seconds.."
      message).

      Signed-off-by: Hugh Dickins <hughd@chromium.org>
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
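The three meanings of panic_timeout after this change can be summarised in a small sketch. The enum and the function `classify_panic_timeout` are hypothetical names used only for illustration; the kernel's panic path is not structured this way.

```c
#include <assert.h> /* used only by the usage checks */

/* Illustrative names for the three behaviours described above. */
enum panic_action {
    PANIC_WAIT_FOREVER,       /* panic=0: never reboot automatically */
    PANIC_REBOOT_AFTER_DELAY, /* panic=N, N > 0: reboot after N seconds */
    PANIC_REBOOT_NOW,         /* panic=N, N < 0: reboot immediately */
};

/* Map a panic_timeout value onto the behaviour it selects. */
static enum panic_action classify_panic_timeout(int panic_timeout)
{
    if (panic_timeout > 0)
        return PANIC_REBOOT_AFTER_DELAY;
    if (panic_timeout < 0)
        return PANIC_REBOOT_NOW;
    return PANIC_WAIT_FOREVER;
}
```

Under these assumptions, `classify_panic_timeout(-1)` yields `PANIC_REBOOT_NOW`, which corresponds to booting with `panic=-1`.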
    •
      gcov: disable CONSTRUCTORS for UML · 947be5df
      Vitaliy Ivanov committed
      Selecting GCOV for UML causes a configuration mismatch:
      
        warning: (GCOV_KERNEL) selects CONSTRUCTORS which has unmet direct dependencies (!UML)
      
      Constructors are not needed for UML.

      Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Acked-by: Richard Weinberger <richard@nod.at>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov committed
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatibility
      reasons.
      
      The feature was previously implemented in -ow as a configure option.
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      cpusets: randomize node rotor used in cpuset_mem_spread_node() · 778d3b0f
      Michal Hocko committed
      [ This patch has already been accepted as commit 0ac0c0d0 but later
        reverted (commit 35926ff5) because it introduced arch specific
        __node_random which was defined only for x86 code so it broke other
        archs.  This is a followup without any arch specific code.  Other than
        that there are no functional changes.]
      
      Some workloads that create a large number of small files tend to assign
      too many pages to node 0 (multi-node systems).  Part of the reason is
      that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
      at node 0 for newly created tasks.
      
      This patch changes the rotor to be initialized to a random node number
      of the cpuset.
      
      [akpm@linux-foundation.org: fix layout]
      [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
      [mhocko@suse.cz: Make it arch independent]
      [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
      Signed-off-by: Jack Steiner <steiner@sgi.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Paul Menage <menage@google.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Menage <menage@google.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
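The effect of randomizing the rotor can be sketched with a simplified round-robin model. This assumes nodes are numbered contiguously from 0 to nr_nodes - 1, whereas a real cpuset works on a node mask; `struct task_rotor`, `rotor_init_random`, and `rotor_next` are hypothetical names, and `rand()` stands in for whatever random source the kernel uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-task state: the node to hand out next. */
struct task_rotor {
    int next_node;
};

/* Start the rotor at a random node instead of always at node 0,
 * so that new tasks do not all begin by allocating from node 0. */
static void rotor_init_random(struct task_rotor *r, int nr_nodes, unsigned int seed)
{
    assert(nr_nodes > 0);
    srand(seed);
    r->next_node = rand() % nr_nodes;
}

/* Return the current node and advance round-robin. */
static int rotor_next(struct task_rotor *r, int nr_nodes)
{
    int node = r->next_node;

    r->next_node = (node + 1) % nr_nodes;
    return node;
}
```

Only the starting point changes; the spreading order after the first allocation is the same round-robin as before.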
    •
      futex: Fix regression with read only mappings · 9ea71503
      Shawn Bohrer committed
      Commit 7485d0d3 (futexes: Remove rw parameter from get_futex_key()) in
      2.6.33 fixed two problems.  First, it prevented a loop when encountering
      a ZERO_PAGE.  Second, it fixed RW MAP_PRIVATE futex operations by forcing
      the COW to occur by unconditionally performing a write access
      get_user_pages_fast() to get the page.  The commit also introduced a
      user-mode regression in that it broke futex operations on read-only
      memory maps.  For example, this breaks workloads that have one or more
      reader processes doing a FUTEX_WAIT on a futex within a read-only shared
      file mapping, and a writer process that has a writable mapping issuing
      the FUTEX_WAKE.
      
      This fixes the regression for valid futex operations on RO mappings by
      trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
      fails. This change makes it necessary to also check for invalid use
      cases, such as anonymous RO mappings (which can never change) and the
      ZERO_PAGE which the commit referenced above was written to address.
      
      This patch does restore the original behavior with RO MAP_PRIVATE
      mappings, which have inherent user-mode usage problems and don't really
      make sense.  With this patch performing a FUTEX_WAIT within a RO
      MAP_PRIVATE mapping will be successfully woken provided another process
      updates the region of the underlying mapped file.  However, the mmap()
      man page states that for a MAP_PRIVATE mapping:
      
        It is unspecified whether changes made to the file after
        the mmap() call are visible in the mapped region.
      
      So user-mode users attempting to use futex operations on RO MAP_PRIVATE
      mappings are depending on unspecified behavior.  Additionally a
      RO MAP_PRIVATE mapping could fail to wake up in the following case.
      
        Thread-A: call futex(FUTEX_WAIT, memory-region-A).
                  get_futex_key() return inode based key.
                  sleep on the key
        Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
        Thread-B: write memory-region-A.
                  COW happen. This process's memory-region-A become related
                  to new COWed private (ie PageAnon=1) page.
        Thread-B: call futex(FUTEX_WAKE, memory-region-A).
                  get_futex_key() return mm based key.
                  IOW, we fail to wake up Thread-A.
      
      Once again doing something like this is just silly and users who do
      something like this get what they deserve.
      
      While RO MAP_PRIVATE mappings are nonsensical, checking for a private
      mapping requires walking the vmas and was deemed too costly to avoid a
      userspace hang.
      
      This patch is based on Peter Zijlstra's initial patch with modifications to
      only allow RO mappings for futex operations that need VERIFY_READ access.

      Reported-by: David Oliver <david@rgmadvisors.com>
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: peterz@infradead.org
      Cc: eric.dumazet@gmail.com
      Cc: zvonler@rgmadvisors.com
      Cc: hughd@google.com
      Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
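The fallback described in this commit (try a read-write pin first, then retry read-only when only read access is needed) can be modelled in user space. This is a sketch under stated assumptions: `mock_pin_page` stands in for get_user_pages_fast(), the `futex_access` enum stands in for VERIFY_READ/VERIFY_WRITE, and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for VERIFY_READ / VERIFY_WRITE. */
enum futex_access { FUTEX_ACCESS_READ, FUTEX_ACCESS_WRITE };

/* Hypothetical model of a user mapping's permissions. */
struct mock_mapping {
    bool readable;
    bool writable;
};

/* Stand-in for get_user_pages_fast(): 0 on success, -EFAULT if the
 * mapping does not allow the requested kind of access. */
static int mock_pin_page(const struct mock_mapping *m, bool write)
{
    if (write)
        return m->writable ? 0 : -EFAULT;
    return m->readable ? 0 : -EFAULT;
}

/* Try a writable pin first; if that faults and the futex operation
 * only needs to read (e.g. FUTEX_WAIT), retry read-only. */
static int get_key_page(const struct mock_mapping *m,
                        enum futex_access access, bool *ro)
{
    int err = mock_pin_page(m, true);

    *ro = false;
    if (err == -EFAULT && access == FUTEX_ACCESS_READ) {
        err = mock_pin_page(m, false);
        *ro = (err == 0);
    }
    return err;
}
```

The caller would still have to reject the invalid read-only cases the message lists (anonymous RO mappings and the ZERO_PAGE); the sketch leaves those checks out.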
  8. 26 Jul, 2011 4 commits
    •
      kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n · 626a0312
      Stephen Boyd committed
      If CONFIG_IKCONFIG=m but CONFIG_IKCONFIG_PROC=n we get a module that has
      no MODULE_LICENSE definition.  Move the MODULE_*() definitions outside the
      CONFIG_IKCONFIG_PROC #ifdef to prevent this configuration from tainting
      the kernel.

      Signed-off-by: Stephen Boyd <bebarino@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      notifiers: sys: move reboot notifiers into reboot.h · c5f41752
      Amerigo Wang committed
      It is not necessary to share the same notifier.h.
      
      This patch also moves register_reboot_notifier() and
      unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.
      
      [amwang@redhat.com: make allyesconfig succeed on ppc64]
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      devres: fix possible use after free · ae891a1b
      Maxin B John committed
      devres uses the pointer value as key after it's freed, which is safe but
      triggers spurious use-after-free warnings on some static analysis tools.
      Rearrange code to avoid such warnings.

      Signed-off-by: Maxin B. John <maxin.john@gmail.com>
      Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      mm/futex: fix futex writes on archs with SW tracking of dirty & young · 2efaca92
      Benjamin Herrenschmidt committed
      I haven't reproduced it myself but the fail scenario is that on such
      machines (notably ARM and some embedded powerpc), if you manage to hit
      that futex path on a writable page whose dirty bit has gone from the PTE,
      you'll livelock inside the kernel from what I can tell.
      
      It will go in a loop of trying the atomic access, failing, trying gup to
      "fix it up", getting success from gup, going back to the atomic access,
      failing again because dirty wasn't fixed, etc...
      
      So I think you essentially hang in the kernel.
      
      The scenario is probably rare'ish because affected architectures are
      embedded and tend to not swap much (if at all) so we probably rarely hit
      the case where dirty is missing or young is missing, but I think Shan has
      a piece of SW that can reliably reproduce it using a shared writable
      mapping & fork or something like that.
      
      On archs that use SW tracking of dirty & young, a page without dirty is
      effectively mapped read-only and a page without young is inaccessible in
      the PTE.
      
      Additionally, some architectures might lazily flush the TLB when relaxing
      write protection (by doing only a local flush), and expect a fault to
      invalidate the stale entry if it's still present on another processor.
      
      The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
      "fix it up" by causing get_user_pages() which would then be equivalent to
      taking the fault.
      
      However that isn't the case.  get_user_pages() will not call
      handle_mm_fault() in the case where the PTE seems to have the right
      permissions, regardless of the dirty and young state.  It will eventually
      update those bits ...  in the struct page, but not in the PTE.
      
      Additionally, it will not handle the lazy TLB flushing that can be
      required by some architectures in the fault case.
      
      Basically, gup is the wrong interface for the job.  The patch provides a
      more appropriate one which boils down to just calling handle_mm_fault()
      since what we are trying to do is simulate a real page fault.
      
      The futex code currently attempts to write to user memory within a
      pagefault disabled section, and if that fails, tries to fix it up using
      get_user_pages().
      
      This doesn't work on archs where the dirty and young bits are maintained
      by software, since they will gate access permission in the TLB, and will
      not be updated by gup().
      
      In addition, there's an expectation on some archs that a spurious write
      fault triggers a local TLB flush, and that is missing from the picture as
      well.
      
      I decided that adding those "features" to gup() would be too much for this
      already too complex function, and instead added a new simpler
      fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
      which the futex code can call.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-by: Shan Hai <haishan.bai@gmail.com>
      Tested-by: Shan Hai <haishan.bai@gmail.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Darren Hart <darren.hart@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
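The livelock and its fix can be illustrated with a toy model of a PTE whose dirty and young bits are tracked in software. Everything here is hypothetical: `struct mock_pte`, the two fixup functions, and the bounded retry loop (which returns -EAGAIN where the kernel would spin) are stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy PTE for an arch with software-tracked dirty and young bits. */
struct mock_pte {
    bool write_allowed; /* the mapping itself permits writes */
    bool dirty;
    bool young;
};

/* The atomic user write only succeeds if dirty and young are set,
 * since they gate access on such archs. */
static int atomic_user_write(const struct mock_pte *pte)
{
    return (pte->write_allowed && pte->dirty && pte->young) ? 0 : -EFAULT;
}

/* gup-style fixup: sees that permissions look right and reports
 * success without touching dirty/young in the PTE. */
static int fixup_gup_like(struct mock_pte *pte)
{
    return pte->write_allowed ? 0 : -EFAULT;
}

/* Fault-style fixup: behaves like a real write fault and updates
 * the bits in the PTE. */
static int fixup_fault_like(struct mock_pte *pte)
{
    if (!pte->write_allowed)
        return -EFAULT;
    pte->dirty = true;
    pte->young = true;
    return 0;
}

/* Retry loop: attempt the write, fix up on failure, try again.
 * The bound stands in for what would otherwise be a livelock. */
static int futex_write_retry(struct mock_pte *pte,
                             int (*fixup)(struct mock_pte *), int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (atomic_user_write(pte) == 0)
            return 0;
        if (fixup(pte) != 0)
            return -EFAULT;
    }
    return -EAGAIN;
}
```

With the gup-style fixup the loop never makes progress on a clean page, while the fault-style fixup lets the second attempt succeed, which is the behaviour the new fixup_user_fault() helper is meant to provide.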
  9. 24 Jul, 2011 4 commits
  10. 22 Jul, 2011 6 commits