1. 13 Mar, 2009 3 commits
    • fs: new inode i_state corruption fix · 7ef0d737
      Nick Piggin committed
      There was a report of data corruption at
      http://lkml.org/lkml/2008/11/14/121.  There is a script included to
      reproduce the problem.
      
      During testing, I encountered a number of strange things with ext3, so I
      tried ext2 to attempt to reduce complexity of the problem.  I found that
      fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
      cleared, even though instrumentation showed that unlock_new_inode had
      already been called for that inode.  This points to a memory scribble or
      a synchronisation problem.
      
      i_state of I_NEW inodes is not protected by inode_lock because other
      processes are not supposed to touch them until I_LOCK (and I_NEW) is
      cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
      i_state revealed that generic_sync_sb_inodes is picking up new inodes from
      the inode lists and passing them to __writeback_single_inode without
      waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
      my case it would look like this:
      
      CPU0                            CPU1
      unlock_new_inode()              __sync_single_inode()
       reg <- inode->i_state
       reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
       reg -> inode->i_state          reg -> reg | I_SYNC
                                      reg -> inode->i_state
      
      Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.
      
      The fix is to skip over I_NEW inodes rather than wait for them:
      inodes concurrently being created are not subject to data integrity
      operations, and should not significantly contribute to dirty memory
      either.
      
      After this change, I'm unable to reproduce any of the added warnings or
      hangs after ~1 hour of running.  Previously, the new warnings would start
      immediately and a hang would happen in under 5 minutes.
      
      I'm also testing on ext3 now, and so far no problems there either.  I
      don't know whether this fixes the problem reported above, but it fixes a
      real problem for me.
      
      Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
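The CPU0/CPU1 interleaving in the i_state commit message can be replayed deterministically on a single thread. The following is a minimal sketch, not kernel code: the flag values are illustrative placeholders (the kernel's real bit assignments differ), and `replay_racy_interleaving` and `replay_with_skip` are hypothetical names introduced here.

```c
/* Illustrative flag values only; not the kernel's actual bit layout. */
#define I_LOCK 0x1u
#define I_NEW  0x2u
#define I_SYNC 0x4u

/*
 * Replay the interleaving from the commit message. Each "CPU" loads
 * i_state into a private register, modifies it, and stores it back.
 * CPU1 loads before CPU0 stores, so CPU1 works on a stale value.
 */
unsigned int replay_racy_interleaving(unsigned int i_state)
{
    unsigned int reg0, reg1;

    reg0 = i_state;              /* CPU0: unlock_new_inode() loads     */
    reg0 &= ~(I_LOCK | I_NEW);   /* CPU0: clears I_LOCK|I_NEW          */
    reg1 = i_state;              /* CPU1: loads the stale value        */
    i_state = reg0;              /* CPU0: stores the cleared state     */
    reg1 |= I_SYNC;              /* CPU1: sets I_SYNC on the stale copy */
    i_state = reg1;              /* CPU1: store overwrites CPU0's store */
    return i_state;
}

/*
 * Same interleaving, but the writeback side skips inodes that still
 * have I_NEW set, so it never writes i_state for a new inode.
 */
unsigned int replay_with_skip(unsigned int i_state)
{
    unsigned int reg0, reg1;

    reg0 = i_state;              /* CPU0: loads                        */
    reg0 &= ~(I_LOCK | I_NEW);   /* CPU0: clears I_LOCK|I_NEW          */
    reg1 = i_state;              /* CPU1: samples the state            */
    if (!(reg1 & I_NEW)) {       /* CPU1: only touch non-new inodes    */
        reg1 |= I_SYNC;
        i_state = reg1;
    }
    i_state = reg0;              /* CPU0: stores the cleared state     */
    return i_state;
}
```

Starting from `I_LOCK | I_NEW`, the racy replay ends with all three flags set, which matches the hang in wait_on_inode described in the message; the skipping variant ends with the flags cleared.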
    • vfs: add missing unlock in sget() · a3cfbb53
      Li Zefan committed
      In sget(), destroy_super(s) is called with s->s_umount held, which makes
      lockdep unhappy.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pipe_rdwr_fasync: fix the error handling to prevent the leak/crash · e5bc49ba
      Oleg Nesterov committed
      If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
      but leaves the file on ->fasync_readers.
      
      This was always wrong, but since 233e70f4
      "saner FASYNC handling on file close" we have a new problem: because
      setfl() doesn't set the FASYNC bit in this case, __fput() will not do
      ->fasync(0), and we leak the fasync_struct with ->fa_file pointing to
      the freed file.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
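The pipe_rdwr_fasync problem is a two-step registration whose second step can fail after the first has succeeded. The following user-space sketch shows one way to roll back the first step; it is not the kernel patch itself, and `toy_pipe`, `toy_fasync_helper`, and `toy_rdwr_fasync` are hypothetical stand-ins for the pipe's two fasync lists and for fasync_helper().

```c
#include <stdbool.h>

/* Hypothetical model of a pipe's reader and writer fasync lists. */
struct toy_pipe {
    bool on_readers;     /* file is registered on the readers list */
    bool on_writers;     /* file is registered on the writers list */
    bool fail_writers;   /* knob: force the second registration to fail */
};

/* Hypothetical helper: add (on=true) or remove the file from one list. */
int toy_fasync_helper(bool *list, bool on, bool fail)
{
    if (on && fail)
        return -1;       /* models a failure such as out-of-memory */
    *list = on;
    return 0;
}

/* Register on both lists, undoing the first step if the second fails. */
int toy_rdwr_fasync(struct toy_pipe *p, bool on)
{
    int ret = toy_fasync_helper(&p->on_readers, on, false);

    if (ret >= 0) {
        ret = toy_fasync_helper(&p->on_writers, on, p->fail_writers);
        if (ret < 0)     /* in this model, only possible when on is true */
            toy_fasync_helper(&p->on_readers, false, false);
    }
    return ret;
}
```

With the rollback in place, a failed call leaves the file on neither list, so nothing keeps pointing at the file after it is freed.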
  2. 12 Mar, 2009 2 commits
  3. 11 Mar, 2009 2 commits
  4. 09 Mar, 2009 1 commit
    • Btrfs: fix spinlock assertions on UP systems · b9447ef8
      Chris Mason committed
      btrfs_tree_locked was being used to make sure a given extent_buffer was
      properly locked in a few places.  But, it wasn't correct for UP compiled
      kernels.
      
      This switches it to using assert_spin_locked instead, and renames it to
      btrfs_assert_tree_locked to better reflect how it was really being used.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 07 Mar, 2009 3 commits
  6. 05 Mar, 2009 3 commits
  7. 28 Feb, 2009 2 commits
  8. 27 Feb, 2009 9 commits
  9. 26 Feb, 2009 2 commits
  10. 28 Feb, 2009 2 commits
    • ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2 · d8ae4601
      Theodore Ts'o committed
      In fs/Makefile, ext3 was placed before ext2 so that a root filesystem
      that possessed a journal would be mounted as ext3 instead of ext2.
      This was necessary because a cleanly unmounted ext3 filesystem was
      fully backwards compatible with ext2, and could be mounted by ext2 ---
      but it was desirable that it be mounted with ext3 so that the
      journaling would be enabled.
      
      The ext4 filesystem supports new incompatible features, so there is no
      danger of an ext4 filesystem being mistaken for an ext2 filesystem.
      At that point, the relative ordering of ext4 with respect to ext2
      didn't matter until ext4 gained the ability to mount filesystems
      without a journal starting in 2.6.29-rc1.  Now that this is the case,
      given that ext4 is before ext2, it means that root filesystems that
      were using the plain-jane ext2 format are getting mounted using the
      ext4 filesystem driver, which is a change in behavior which could be
      surprising to users.
      
      It's doubtful that there are that many ext2-only root filesystem users
      that would also have ext4 compiled into the kernel, but to adhere to
      the principle of least surprise, the correct ordering in fs/Makefile
      is ext3, followed by ext2, and finally ext4.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
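The ordering argument in the fs/Makefile commit comes down to "the first driver that accepts the filesystem wins". The sketch below models that under simplifying assumptions: each driver is reduced to a hypothetical probe function, a filesystem is reduced to two booleans, and the pre-patch order is taken to be ext3, ext4, ext2 as the message implies. None of these names exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of an on-disk filesystem. */
struct toy_fs {
    bool has_journal;
    bool has_ext4_features;   /* uses ext4-only incompatible features */
};

struct toy_driver {
    const char *name;
    bool (*can_mount)(const struct toy_fs *fs);
};

/* ext2 accepts anything without ext4-only features (journal ignored). */
bool toy_ext2_probe(const struct toy_fs *fs)
{
    return !fs->has_ext4_features;
}

/* ext3 needs a journal and no ext4-only features. */
bool toy_ext3_probe(const struct toy_fs *fs)
{
    return fs->has_journal && !fs->has_ext4_features;
}

/* ext4 accepts everything here, including journal-less filesystems. */
bool toy_ext4_probe(const struct toy_fs *fs)
{
    (void)fs;
    return true;
}

/* Assumed order before the patch: ext4 is tried before ext2. */
const struct toy_driver toy_old_order[] = {
    { "ext3", toy_ext3_probe },
    { "ext4", toy_ext4_probe },
    { "ext2", toy_ext2_probe },
};

/* Order the commit message calls correct: ext3, ext2, then ext4. */
const struct toy_driver toy_new_order[] = {
    { "ext3", toy_ext3_probe },
    { "ext2", toy_ext2_probe },
    { "ext4", toy_ext4_probe },
};

/* Return the name of the first driver that accepts fs, or NULL. */
const char *toy_pick_driver(const struct toy_driver *order, size_t n,
                            const struct toy_fs *fs)
{
    for (size_t i = 0; i < n; i++)
        if (order[i].can_mount(fs))
            return order[i].name;
    return NULL;
}
```

In this model a plain ext2 filesystem is claimed by ext4 under the old order and by ext2 under the new one, while a journaled ext3 filesystem is still claimed by ext3 in both.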
    • ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze() · 8b1a8ff8
      Theodore Ts'o committed
      Commit c4be0c1d added error checking to ext4_freeze() when calling
      ext4_commit_super().  Unfortunately the patch failed to remove the
      original call to ext4_commit_super(), with the net result that when
      freezing the filesystem, the superblock gets written twice, the first
      time without error checking.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. 25 Feb, 2009 1 commit
  12. 24 Feb, 2009 1 commit
  13. 23 Feb, 2009 1 commit
  14. 22 Feb, 2009 1 commit
    • ext4: Add fallback for find_group_flex · 05bf9e83
      Theodore Ts'o committed
      This is a workaround for find_group_flex() which badly needs to be
      replaced.  One of its problems (besides ignoring the Orlov algorithm)
      is that it is a bit hyperactive about returning failure under
      suspicious circumstances.  This can lead to spurious ENOSPC failures
      even when there are inodes still available.
      
      Work around this for now by retrying the search using
      find_group_other() if find_group_flex() returns -1.  If
      find_group_other() succeeds when find_group_flex() has failed, log a
      warning message.
      
      A better block/inode allocator that will fix this problem for real has
      been queued up for the next merge window.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
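The retry-and-warn workaround described for find_group_flex() has a simple control flow. The sketch below illustrates it with hypothetical stand-ins (`toy_find_group_flex`, `toy_find_group_other`, `toy_pick_group`); the real ext4 functions take different arguments and do far more work, and the `looks_suspicious` knob is an assumption used only to trigger the spurious failure.

```c
#include <stdio.h>

/*
 * Hypothetical primary allocator: gives up (-1) whenever the situation
 * "looks suspicious", even if free inodes exist.
 */
int toy_find_group_flex(int free_inodes, int looks_suspicious)
{
    if (looks_suspicious)
        return -1;
    return free_inodes > 0 ? 0 : -1;   /* group 0 on success */
}

/* Hypothetical fallback allocator: fails only when nothing is free. */
int toy_find_group_other(int free_inodes)
{
    return free_inodes > 0 ? 0 : -1;
}

/*
 * Try the primary allocator, retry with the fallback on -1, and warn
 * when the fallback succeeds where the primary failed.
 */
int toy_pick_group(int free_inodes, int looks_suspicious, int *warned)
{
    int group = toy_find_group_flex(free_inodes, looks_suspicious);

    *warned = 0;
    if (group == -1) {
        group = toy_find_group_other(free_inodes);
        if (group != -1) {
            *warned = 1;
            fprintf(stderr, "warning: flex search failed, fallback found group %d\n",
                    group);
        }
    }
    return group;
}
```

A genuine out-of-inodes condition still returns -1 from both searches, so only the spurious failures are masked.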
  15. 21 Feb, 2009 7 commits
    • [JFFS2] fix mount crash caused by removed nodes · 4c41bd0e
      Thomas Gleixner committed
      At scan time we observed the following scenario:
      
         node A inserted
         node B inserted
         node C inserted -> sets overlapped flag on node B
      
         node A is removed due to CRC failure -> overlapped flag on node B remains
      
         while (tn->overlapped)
         	 tn = tn_prev(tn);
      
         ==> crash, when tn_prev(B) is referenced.
      
      When the ultimate node is removed at scan time and the overlapped flag
      is set on the penultimate node, then nothing updates the overlapped
      flag of that node. The overlapped iterators blindly expect that the
      ultimate node does not have the overlapped flag set, which causes the
      scan code to crash.
      
      It would be a huge overhead to go through the node chain on node
      removal and fix up the overlapped flags, so detecting such a case on
      the fly in the overlapped iterators is a simpler and reliable
      solution.
      
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
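Detecting the stale flag "on the fly" in the overlapped iterator, as the JFFS2 message puts it, can be sketched as a guarded version of the `while (tn->overlapped)` loop. This is an illustration only: `toy_tn` is a hypothetical node with a plain `prev` pointer, whereas the real code walks a tree through tn_prev().

```c
#include <stddef.h>

/* Hypothetical node: prev is NULL when no earlier node exists. */
struct toy_tn {
    struct toy_tn *prev;
    int overlapped;
};

/*
 * Walk backwards past overlapped nodes. If a node still carries the
 * overlapped flag but its predecessor has been removed, stop there
 * instead of dereferencing a NULL pointer.
 */
struct toy_tn *toy_skip_overlapped(struct toy_tn *tn)
{
    while (tn->overlapped) {
        struct toy_tn *prev = tn->prev;

        if (!prev)       /* stale flag: the predecessor is gone */
            break;
        tn = prev;
    }
    return tn;
}
```

With nodes A, B, C chained and B and C flagged, the walk from C ends at A; once A is removed and B's flag is left stale, the walk ends at B rather than crashing.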
    • [CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts · eca6acf9
      Steve French committed
      When two different users mount the same Windows 2003 Server share using CIFS,
      the first session mounted can be invalidated.  Some servers invalidate the first
      smb session when a second similar user (e.g. two users who get mapped by server to "guest")
      authenticates an smb session from the same client.
      
      Setting the 2nd and subsequent vc numbers to nonzero values
      avoids this problem.
      
      Fixes Samba bug 6004, problem description follows:
      How to reproduce:
      
      - configure an "open share" (full permissions to Guest user) on Windows 2003
      Server (I couldn't reproduce the problem with Samba server or Windows older
      than 2003)
      - mount the share twice with different users who will be authenticated as guest.
      
       noacl,noperm,user=john,dir_mode=0700,domain=DOMAIN,rw
       noacl,noperm,user=jeff,dir_mode=0700,domain=DOMAIN,rw
      
      Result:
      
      - just the mount point mounted last is accessible:
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • [CIFS] improve posix semantics of file create · c3b2a0c6
      Steve French committed
      Samba server added support for a new posix open/create/mkdir operation
      a year or so ago, and we added support to cifs for mkdir to use it,
      but had not added the corresponding code to file create.
      
      The following patch helps improve the performance of the cifs create
      path (to Samba and servers which support the cifs posix protocol
      extensions).  Using Connectathon basic test1, with 2000 files, the
      performance improved about 15%, and also helped reduce network traffic
      (17% fewer SMBs sent over the wire) due to saving a network round trip
      for the SetPathInfo on every file create.
      
      It should also help the semantics (and probably the performance) of
      write (e.g. when posix byte range locks are on the file) on file
      handles opened with posix create, and adds support for a few flags
      which would have to be ignored otherwise.
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • [CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS · 69765529
      Steve French committed
      Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451
      
      Certain NAS appliances do not set the operating system or network operating system
      fields in the session setup response on the wire.  cifs was oopsing on the unexpected
      zero length response fields (when trying to null terminate a zero length field).
      
      This fixes the oops.
      Acked-by: Jeff Layton <jlayton@redhat.com>
      CC: stable <stable@kernel.org>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
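The commit message for the cifs_strfromUCS_le oops does not show the faulting expression, so the following is only a generic sketch of null-terminating a field that may legitimately be zero length. `toy_copy_field` is a hypothetical user-space helper, not a cifs function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated, NUL-terminated copy of a field of len
 * bytes. A missing or zero-length field yields an empty string, so the
 * terminator is always written at a valid index. The caller frees the
 * result; NULL is returned only on allocation failure.
 */
char *toy_copy_field(const char *src, size_t len)
{
    char *dst;

    if (!src)
        len = 0;             /* treat an absent field as empty */

    dst = malloc(len + 1);   /* always room for the terminator */
    if (!dst)
        return NULL;

    if (len > 0)
        memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

The design choice here is to size the buffer as `len + 1` and index the terminator from `len`, which stays in bounds for an empty field.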
    • cifs: posix fill in inode needed by posix open · 44f68fad
      Jeff Layton committed
      function needed to prepare for posix open
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • cifs: properly handle case where CIFSGetSrvInodeNumber fails · 950ec528
      Jeff Layton committed
      ...if it does then we pass a pointer to an uninitialized variable for
      the inode number to cifs_new_inode. Have it pass a NULL pointer instead.
      
      Also tweak the function prototypes to reduce the amount of casting.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • cifs: refactor new_inode() calls and inode initialization · 132ac7b7
      Jeff Layton committed
      Move new inode creation into a separate routine and refactor the
      callers to take advantage of it.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>