1. 15 11月, 2007 2 次提交
    • L
      Migration: find correct vma in new_vma_page() · 3ad33b24
      Lee Schermerhorn 提交于
      We hit the BUG_ON() in mm/rmap.c:vma_address() when trying to migrate via
      mbind(MPOL_MF_MOVE) a non-anon region that spans multiple vmas.  For
      anon-regions, we just fail to migrate any pages beyond the 1st vma in the
      range.
      
      This occurs because do_mbind() collects a list of pages to migrate by
      calling check_range().  check_range() walks the task's mm, spanning vmas as
      necessary, to collect the migratable pages into a list.  Then, do_mbind()
      calls migrate_pages() passing the list of pages, a function to allocate new
      pages based on vma policy [new_vma_page()], and a pointer to the first vma
      of the range.
      
      For each page in the list, new_vma_page() calls page_address_in_vma()
      passing the page and the vma [first in range] to obtain the address to get
      for alloc_page_vma().  The page address is needed to get interleaving
      policy correct.  If the pages in the list come from multiple vmas,
      eventually, new_page_address() will pass that page to page_address_in_vma()
      with the incorrect vma.  For !PageAnon pages, this will result in a bug
      check in rmap.c:vma_address().  For anon pages, vma_address() will just
      return EFAULT and fail the migration.
      
      This patch modifies new_vma_page() to check the return value from
      page_address_in_vma().  If the return value is EFAULT, new_vma_page()
      searchs forward via vm_next for the vma that maps the page--i.e., that does
      not return EFAULT.  This assumes that the pages in the list handed to
      migrate_pages() is in address order.  This is currently case.  The patch
      documents this assumption in a new comment block for new_vma_page().
      
      If new_vma_page() cannot locate the vma mapping the page in a forward
      search in the mm, it will pass a NULL vma to alloc_page_vma().  This will
      result in the allocation using the task policy, if any, else system default
      policy.  This situation is unlikely, but the patch documents this behavior
      with a comment.
      
      Note, this patch results in restarting from the first vma in a multi-vma
      range each time new_vma_page() is called.  If this is not acceptable, we
      can make the vma argument a pointer, both in new_vma_page() and it's caller
      unmap_and_move() so that the value held by the loop in migrate_pages()
      always passes down the last vma in which a page was found.  This will
      require changes to all new_page_t functions passed to migrate_pages().  Is
      this necessary?
      
      For this patch to work, we can't bug check in vma_address() for pages
      outside the argument vma.  This patch removes the BUG_ON().  All other
      callers [besides new_vma_page()] already check the return status.
      
      Tested on x86_64, 4 node NUMA platform.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ad33b24
    • A
      slab: fix typo in allocation failure handling · cc550def
      Akinobu Mita 提交于
      This patch fixes wrong array index in allocation failure handling.
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc550def
  2. 13 11月, 2007 2 次提交
  3. 06 11月, 2007 1 次提交
  4. 05 11月, 2007 1 次提交
  5. 01 11月, 2007 1 次提交
    • L
      Remove broken ptrace() special-case code from file mapping · 5307cc1a
      Linus Torvalds 提交于
      The kernel has for random historical reasons allowed ptrace() accesses
      to access (and insert) pages into the page cache above the size of the
      file.
      
      However, Nick broke that by mistake when doing the new fault handling in
      commit 54cb8821 ("mm: merge populate and
      nopage into fault (fixes nonlinear)".  The breakage caused a hang with
      gdb when trying to access the invalid page.
      
      The ptrace "feature" really isn't worth resurrecting, since it really is
      wrong both from a portability _and_ from an internal page cache validity
      standpoint.  So this removes those old broken remnants, and fixes the
      ptrace() hang in the process.
      
      Noticed and bisected by Duane Griffin, who also supplied a test-case
      (quoth Nick: "Well that's probably the best bug report I've ever had,
      thanks Duane!").
      
      Cc: Duane Griffin <duaneg@dghda.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5307cc1a
  6. 31 10月, 2007 1 次提交
    • Z
      dio: fix cache invalidation after sync writes · bdb76ef5
      Zach Brown 提交于
      Commit commit 65b8291c ("dio: invalidate
      clean pages before dio write") introduced a bug which stopped dio from
      ever invalidating the page cache after writes.  It still invalidated it
      before writes so most users were fine.
      
      Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
      this bug when he had a buffered reader immediately reading file data
      after an O_DIRECT wirter had written the data.  The kernel issued
      read-ahead beyond the position of the reader which overlapped with the
      O_DIRECT writer.  The failure to invalidate after writes caused the
      reader to see stale data from the read-ahead.
      
      The following patch is originally from Karl.  The following commentary
      is his:
      
      	The below 3rd try takes on your suggestion of just invalidating
      	no matter what the retval from the direct_IO call.  I ran it
      	thru the test-case several times and it has worked every time.
      	The post-invalidate is probably still too early for async-directio,
      	but I don't have a testcase for that;  just sync.  And, this
      	won't be any worse in the async case.
      
      I added a test to the aio-dio-regress repository which mimics Karl's IO
      pattern.  It verifed the bad behaviour and that the patch fixed it.  I
      agree with Karl, this still doesn't help the case where a buffered
      reader follows an AIO O_DIRECT writer.  That will require a bit more
      work.
      
      This gives up on the idea of returning EIO to indicate to userspace that
      stale data remains if the invalidation failed.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      Cc: Karl Schendel <kschendel@datallegro.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdb76ef5
  7. 30 10月, 2007 3 次提交
    • H
      fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE · 487e9bf2
      Hugh Dickins 提交于
      It's possible to provoke unionfs (not yet in mainline, though in mm and
      some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)).  I expect
      it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
      2.6.24 ecryptfs no longer calls lower level's ->writepage).
      
      This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
      leak from tmpfs via write_cache_pages and unionfs to userspace.  There's
      already a fix (e4230030 - writeback: don't
      propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
      far as it goes; but insufficient because it doesn't address the underlying
      issue, that shmem_writepage expects to be called only by vmscan (relying on
      backing_dev_info capabilities to prevent the normal writeback path from
      ever approaching it).
      
      That's an increasingly fragile assumption, and ramdisk_writepage (the other
      source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
      wbc->for_reclaim before returning it.  Make the same check in
      shmem_writepage, thereby sidestepping the page_mapped BUG also.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Erez Zadok <ezk@cs.sunysb.edu>
      Cc: <stable@kernel.org>
      Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      487e9bf2
    • G
      mm/sparse-vmemmap.c: make sure init_mm is included · 8bca44bb
      Glauber de Oliveira Costa 提交于
      mm/sparse-vmemmap.c uses init_mm in some places.  However, it is not
      present in any of the headers currently included in the file.
      
      init_mm is defined as extern in sched.h, so we add it to the headers list
      
      Up to now, this problem was masked by the fact that functions like
      set_pte_at() and pmd_populate_kernel() are usually macros that expand to
      simpler variants that does not use the first parameter at all.
      Signed-off-by: NGlauber de Oliveira Costa <gcosta@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bca44bb
    • L
      Revert "x86_64: allocate sparsemem memmap above 4G" · 6a22c57b
      Linus Torvalds 提交于
      This reverts commit 2e1c49db.
      
      First off, testing in Fedora has shown it to cause boot failures,
      bisected down by Martin Ebourne, and reported by Dave Jobes.  So the
      commit will likely be reverted in the 2.6.23 stable kernels.
      
      Secondly, in the 2.6.24 model, x86-64 has now grown support for
      SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
      bug is not visible any more, it's become invisible due to the code just
      being irrelevant and no longer enabled on the only architecture that
      this ever affected.
      Reported-by: NDave Jones <davej@redhat.com>
      Tested-by: NMartin Ebourne <fedora@ebourne.me.uk>
      Cc: Zou Nan hai <nanhai.zou@intel.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a22c57b
  8. 29 10月, 2007 3 次提交
  9. 23 10月, 2007 1 次提交
  10. 22 10月, 2007 4 次提交
  11. 21 10月, 2007 1 次提交
  12. 20 10月, 2007 14 次提交
  13. 19 10月, 2007 4 次提交
  14. 17 10月, 2007 2 次提交
    • A
      security/ cleanups · cbfee345
      Adrian Bunk 提交于
      This patch contains the following cleanups that are now possible:
      - remove the unused security_operations->inode_xattr_getsuffix
      - remove the no longer used security_operations->unregister_security
      - remove some no longer required exit code
      - remove a bunch of no longer used exports
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Acked-by: NJames Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbfee345
    • S
      Implement file posix capabilities · b5376771
      Serge E. Hallyn 提交于
      Implement file posix capabilities.  This allows programs to be given a
      subset of root's powers regardless of who runs them, without having to use
      setuid and giving the binary all of root's powers.
      
      This version works with Kaigai Kohei's userspace tools, found at
      http://www.kaigai.gr.jp/index.php.  For more information on how to use this
      patch, Chris Friedhoff has posted a nice page at
      http://www.friedhoff.org/fscaps.html.
      
      Changelog:
      	Nov 27:
      	Incorporate fixes from Andrew Morton
      	(security-introduce-file-caps-tweaks and
      	security-introduce-file-caps-warning-fix)
      	Fix Kconfig dependency.
      	Fix change signaling behavior when file caps are not compiled in.
      
      	Nov 13:
      	Integrate comments from Alexey: Remove CONFIG_ ifdef from
      	capability.h, and use %zd for printing a size_t.
      
      	Nov 13:
      	Fix endianness warnings by sparse as suggested by Alexey
      	Dobriyan.
      
      	Nov 09:
      	Address warnings of unused variables at cap_bprm_set_security
      	when file capabilities are disabled, and simultaneously clean
      	up the code a little, by pulling the new code into a helper
      	function.
      
      	Nov 08:
      	For pointers to required userspace tools and how to use
      	them, see http://www.friedhoff.org/fscaps.html.
      
      	Nov 07:
      	Fix the calculation of the highest bit checked in
      	check_cap_sanity().
      
      	Nov 07:
      	Allow file caps to be enabled without CONFIG_SECURITY, since
      	capabilities are the default.
      	Hook cap_task_setscheduler when !CONFIG_SECURITY.
      	Move capable(TASK_KILL) to end of cap_task_kill to reduce
      	audit messages.
      
      	Nov 05:
      	Add secondary calls in selinux/hooks.c to task_setioprio and
      	task_setscheduler so that selinux and capabilities with file
      	cap support can be stacked.
      
      	Sep 05:
      	As Seth Arnold points out, uid checks are out of place
      	for capability code.
      
      	Sep 01:
      	Define task_setscheduler, task_setioprio, cap_task_kill, and
      	task_setnice to make sure a user cannot affect a process in which
      	they called a program with some fscaps.
      
      	One remaining question is the note under task_setscheduler: are we
      	ok with CAP_SYS_NICE being sufficient to confine a process to a
      	cpuset?
      
      	It is a semantic change, as without fsccaps, attach_task doesn't
      	allow CAP_SYS_NICE to override the uid equivalence check.  But since
      	it uses security_task_setscheduler, which elsewhere is used where
      	CAP_SYS_NICE can be used to override the uid equivalence check,
      	fixing it might be tough.
      
      	     task_setscheduler
      		 note: this also controls cpuset:attach_task.  Are we ok with
      		     CAP_SYS_NICE being used to confine to a cpuset?
      	     task_setioprio
      	     task_setnice
      		 sys_setpriority uses this (through set_one_prio) for another
      		 process.  Need same checks as setrlimit
      
      	Aug 21:
      	Updated secureexec implementation to reflect the fact that
      	euid and uid might be the same and nonzero, but the process
      	might still have elevated caps.
      
      	Aug 15:
      	Handle endianness of xattrs.
      	Enforce capability version match between kernel and disk.
      	Enforce that no bits beyond the known max capability are
      	set, else return -EPERM.
      	With this extra processing, it may be worth reconsidering
      	doing all the work at bprm_set_security rather than
      	d_instantiate.
      
      	Aug 10:
      	Always call getxattr at bprm_set_security, rather than
      	caching it at d_instantiate.
      
      [morgan@kernel.org: file-caps clean up for linux/capability.h]
      [bunk@kernel.org: unexport cap_inode_killpriv]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andrew Morgan <morgan@kernel.org>
      Signed-off-by: NAndrew Morgan <morgan@kernel.org>
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5376771