1. 07 2月, 2008 26 次提交
  2. 06 2月, 2008 14 次提交
    • A
      deprecate smbfs in favour of cifs · c7736339
      Andrew Morton 提交于
      smbfs is a bit buggy and has no maintainer.  Change it to shout at the user on
      the first five mount attempts - tell them to switch to CIFS.
      
      Come December we'll mark it BROKEN and see what happens.
      
      [olecom@flower.upol.cz: documentation update]
      Cc: Urban Widmark <urban@teststation.com>
      Acked-by: NSteven French <sfrench@us.ibm.com>
      Signed-off-by: NOleg Verych <olecom@flower.upol.cz>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7736339
    • D
      uml: fix hostfs tv_usec calculations · d7b88513
      Dominique Quatravaux 提交于
      To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply.
      Signed-off-by: NDominique Quatravaux <dominique@quatravaux.org>
      Signed-off-by: NJeff Dike <jdike@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7b88513
    • A
      Add 64-bit capability support to the kernel · e338d263
      Andrew Morgan 提交于
      The patch supports legacy (32-bit) capability userspace, and where possible
      translates 32-bit capabilities to/from userspace and the VFS to 64-bit
      kernel space capabilities.  If a capability set cannot be compressed into
      32-bits for consumption by user space, the system call fails, with -ERANGE.
      
      FWIW libcap-2.00 supports this change (and earlier capability formats)
      
       http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
      
      [akpm@linux-foundation.org: coding-syle fixes]
      [akpm@linux-foundation.org: use get_task_comm()]
      [ezk@cs.sunysb.edu: build fix]
      [akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
      [akpm@linux-foundation.org: unused var]
      [serue@us.ibm.com: export __cap_ symbols]
      Signed-off-by: NAndrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e338d263
    • D
      VFS: Reorder vfs_getxattr to avoid unnecessary calls to the LSM · 4bea5805
      David P. Quigley 提交于
      Originally vfs_getxattr would pull the security xattr variable using
      the inode getxattr handle and then proceed to clobber it with a subsequent call
      to the LSM.
      
      This patch reorders the two operations such that when the xattr requested is
      in the security namespace it first attempts to grab the value from the LSM
      directly.
      
      If it fails to obtain the value because there is no module present or the
      module does not support the operation it will fall back to using the inode
      getxattr operation.
      
      In the event that both are inaccessible it returns EOPNOTSUPP.
      Signed-off-by: NDavid P. Quigley <dpquigl@tycho.nsa.gov>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Acked-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4bea5805
    • D
      VFS/Security: Rework inode_getsecurity and callers to return resulting buffer · 42492594
      David P. Quigley 提交于
      This patch modifies the interface to inode_getsecurity to have the function
      return a buffer containing the security blob and its length via parameters
      instead of relying on the calling function to give it an appropriately sized
      buffer.
      
      Security blobs obtained with this function should be freed using the
      release_secctx LSM hook.  This alleviates the problem of the caller having to
      guess a length and preallocate a buffer for this function allowing it to be
      used elsewhere for Labeled NFS.
      
      The patch also removed the unused err parameter.  The conversion is similar to
      the one performed by Al Viro for the security_getprocattr hook.
      Signed-off-by: NDavid P. Quigley <dpquigl@tycho.nsa.gov>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Acked-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42492594
    • F
      writeback: speed up writeback of big dirty files · 8bc3be27
      Fengguang Wu 提交于
      After making dirty a 100M file, the normal behavior is to start the
      writeback for all data after 30s delays.  But sometimes the following
      happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analyze shows that the internal io dispatch queues goes like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      1) initial state with a 100M file and a 1K file
      
      2) 4M written, nr_to_write <= 0, so write more
      
      3) 1K written, nr_to_write > 0, no more writes(BUG)
      
      nr_to_write > 0 in (3) fools the upper layer to think that data have all
      been written out.  The big dirty file is actually still sitting in
      s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
      becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
      may starve newly expired inodes in s_dirty.  It is also not an option to
      draw inodes from both s_more_io and s_dirty, an let the loop go on: this
      might lead to live locks, and might also starve other superblocks in sync
      time(well kupdate may still starve some superblocks, that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0
      does not necessarily mean that "all data are written".  This patch
      introduces a flag writeback_control.more_io to indicate that more io should
      be done.  With it the big dirty file no longer has to wait for the next
      kupdate invokation 5s later.
      
      In sync_sb_inodes() we only set more_io on super_blocks we actually
      visited.  This avoids the interaction between two pdflush deamons.
      
      Also in __sync_single_inode() we don't blindly keep requeuing the io if the
      filesystem cannot progress.  Failing to do so may lead to 100% iowait.
      Tested-by: NMike Snitzer <snitzer@gmail.com>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Michael Rubin <mrubin@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bc3be27
    • Q
      skip writing data pages when inode is under I_SYNC · 2d544564
      Qi Yong 提交于
      Since I_SYNC was split out from I_LOCK, the concern in commit
      4b89eed9 ("Write back inode data pages
      even when the inode itself is locked") is not longer valid.
      
      We should revert to the original behavior: in __writeback_single_inode(),
      when we find an I_SYNC-ed inode and we're not doing a data-integrity sync,
      skip writing entirely.  Otherwise, we are double calling do_writepages()
      Signed-off-by: NQi Yong <qiyong@fc-cn.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Cc: Michael Rubin <mrubin@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d544564
    • A
      Fix /proc dcache deadlock in do_exit · 7766755a
      Andrea Arcangeli 提交于
      This patch fixes a sles9 system hang in start_this_handle from a customer
      with some heavy workload where all tasks are waiting on kjournald to commit
      the transaction, but kjournald waits on t_updates to go down to zero (it
      never does).
      
      This was reported as a lowmem shortage deadlock but when checking the debug
      data I noticed the VM wasn't under pressure at all (well it was really
      under vm pressure, because lots of tasks hanged in the VM prune_dcache
      methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS
      mode, the holder of the journal handle should have if this was a vm issue
      in the first place).
      
      No task was apparently holding the leftover handle in the committing
      transaction, so I deduced t_updates was stuck to 1 because a journal_stop
      was never run by some path (this turned out to be correct).  With a debug
      patch adding proper reverse links and stack trace logging in ext3 deployed
      in production, I found journal_stop is never run because
      mark_inode_dirty_sync is called inside release_task called by do_exit.
      (that was quite fun because I would have never thought about this
      subtleness, I thought a regular path in ext3 had a bug and it forgot to
      call journal_stop)
      
      do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
      come back to run journal_stop)
      
      The reason is that shrink_dcache_parent is racy by design (feature not
      a bug) and it can do blocking I/O in some case, but the point is that
      calling shrink_dcache_parent at the last stage of do_exit isn't safe
      for self-reaping tasks.
      
      I guess the memory pressure of the unbalanced highmem system allowed
      to trigger this more easily.
      
      Now mainline doesn't have this line in iput (like sles9 has):
      
          	     if (inode->i_state & I_DIRTY_DELAYED)
      	     			mark_inode_dirty_sync(inode);
      
      so it will probably not crash with ext3, but for example ext2 implements an
      I/O-blocking ext2_put_inode that will lead to similar screwups with
      ext2_free_blocks never coming back and it's definitely wrong to call
      blocking-IO paths inside do_exit.  So this should fix a subtle bug in
      mainline too (not verified in practice though).  The equivalent fix for
      ext3 is also not verified yet to fix the problem in sles9 but I don't have
      doubt it will (it usually takes days to crash, so it'll take weeks to be
      sure).
      
      An alternate fix would be to offload that work to a kernel thread, but I
      don't think a reschedule for this is worth it, the vm should be able to
      collect those entries for the synchronous release_task.
      Signed-off-by: NAndrea Arcangeli <andrea@suse.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7766755a
    • M
      maps4: make page monitoring /proc file optional · 1e883281
      Matt Mackall 提交于
      Make /proc/ page monitoring configurable
      
      This puts the following files under an embedded config option:
      
      /proc/pid/clear_refs
      /proc/pid/smaps
      /proc/pid/pagemap
      /proc/kpagecount
      /proc/kpageflags
      
      [akpm@linux-foundation.org: Kconfig fix]
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e883281
    • M
      maps4: add /proc/kpageflags interface · 304daa81
      Matt Mackall 提交于
      This makes a subset of physical page flags available to userspace. Together
      with /proc/pid/kpagemap, this allows tracking of a wide variety of VM behaviors.
      
      Exported flags are decoupled from the kernel's internal flags. This
      allows us to reorder flag bits, and synthesize any bits that get
      redefined in terms of other bits.
      
      [akpm@linux-foundation.org: remove unneeded access_ok()]
      [akpm@linux-foundation.org: s/0/NULL/]
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      304daa81
    • M
      maps4: add /proc/kpagecount interface · 161f47bf
      Matt Mackall 提交于
      This makes physical page map counts available to userspace. Together
      with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to
      monitor memory usage on a per-page basis.
      
      [akpm@linux-foundation.org: remove unneeded access_ok()]
      [bunk@stusta.de: make struct proc_kpagemap static]
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      161f47bf
    • M
      maps4: add /proc/pid/pagemap interface · 85863e47
      Matt Mackall 提交于
      This interface provides a mapping for each page in an address space to its
      physical page frame number, allowing precise determination of what pages are
      mapped and what pages are shared between processes.
      
      New in this version:
      
      - headers gone again (as recommended by Dave Hansen and Alan Cox)
      - 64-bit entries (as per discussion with Andi Kleen)
      - swap pte information exported (from Dave Hansen)
      - page walker callback for holes (from Dave Hansen)
      - direct put_user I/O (as suggested by Rusty Russell)
      
      This patch folds in cleanups and swap PTE support from Dave Hansen
      <haveblue@us.ibm.com>.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85863e47
    • M
      maps4: regroup task_mmu by interface · a6198797
      Matt Mackall 提交于
      Reorder source so that all the code and data for each interface is together.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6198797
    • M
      maps4: move clear_refs code to task_mmu.c · f248dcb3
      Matt Mackall 提交于
      This puts all the clear_refs code where it belongs and probably lets things
      compile on MMU-less systems as well.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f248dcb3