1. 26 5月, 2011 9 次提交
    • S
      vfs: push dentry_unhash on rmdir into file systems · 79bf7c73
      Sage Weil 提交于
      Only a few file systems need this.  Start by pushing it down into each
      fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
      basis.
      
      This does not change behavior for any in-tree file systems.
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      79bf7c73
    • S
      vfs: remove dget() from dentry_unhash() · 64252c75
      Sage Weil 提交于
      This serves no useful purpose that I can discern.  All callers (rename,
      rmdir) hold their own reference to the dentry.
      
      A quick audit of all file systems showed no relevant checks on the value
      of d_count in vfs_rmdir/vfs_rename_dir paths.
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      64252c75
    • S
      vfs: dentry_unhash immediately prior to rmdir · 48293699
      Sage Weil 提交于
      This presumes that there is no reason to unhash a dentry if we fail because
      it is a mountpoint or the LSM check fails, and that the LSM checks do not
      depend on the dentry being unhashed.
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      48293699
    • J
      vfs: Block mmapped writes while the fs is frozen · ea13a864
      Jan Kara 提交于
      We should not allow file modification via mmap while the filesystem is
      frozen. So block in block_page_mkwrite() while the filesystem is frozen.
      We cannot do the blocking wait in __block_page_mkwrite() since e.g. ext4
      will want to call that function with transaction started in some cases
      and that would deadlock. But we can at least do the non-blocking reliable
      check in __block_page_mkwrite() which is the hardest part anyway.
      
      We have to check for frozen filesystem with the page marked dirty and under
      page lock with which we then return from ->page_mkwrite(). Only that way we
      cannot race with writeback done by freezing code - either we mark the page
      dirty after the writeback has started, see freezing in progress and block, or
      writeback will wait for our page lock which is released only when the fault is
      done and then writeback will writeout and writeprotect the page again.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ea13a864
    • J
      vfs: Create __block_page_mkwrite() helper passing error values back · 24da4fab
      Jan Kara 提交于
      Create __block_page_mkwrite() helper which does all what block_page_mkwrite()
      does except that it passes back errors from __block_write_begin /
      block_commit_write calls.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      24da4fab
    • R
      fs/namespace.c: bound mount propagation fix · 7c6e984d
      Roman Borisov 提交于
      This issue was discovered by users of busybox.  And the bug is actual for
      busybox users, I don't know how it affects others.  Apparently, mount is
      called with and without MS_SILENT, and this affects mount() behaviour.
      But MS_SILENT is only supposed to affect kernel logging verbosity.
      
      The following script was run in an empty test directory:
      
      mkdir -p mount.dir mount.shared1 mount.shared2
      touch mount.dir/a mount.dir/b
      mount -vv --bind         mount.shared1 mount.shared1
      mount -vv --make-rshared mount.shared1
      mount -vv --bind         mount.shared2 mount.shared2
      mount -vv --make-rshared mount.shared2
      mount -vv --bind mount.shared2 mount.shared1
      mount -vv --bind mount.dir     mount.shared2
      ls -R mount.dir mount.shared1 mount.shared2
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      rm -f mount.dir/a mount.dir/b mount.dir/c
      rmdir mount.dir mount.shared1 mount.shared2
      
      mount -vv was used to show the mount() call arguments and result.
      Output shows that flag argument has 0x00008000 = MS_SILENT bit:
      
      mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
      mount: mount('','mount.shared1','',0x0010c000,''):0
      mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
      mount: mount('','mount.shared2','',0x0010c000,''):0
      mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
      mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
      mount.dir:
      a
      b
      
      mount.shared1:
      
      mount.shared2:
      a
      b
      
      After adding --loud option to remove MS_SILENT bit from just one mount cmd:
      
      mkdir -p mount.dir mount.shared1 mount.shared2
      touch mount.dir/a mount.dir/b
      mount -vv --bind         mount.shared1 mount.shared1 2>&1
      mount -vv --make-rshared mount.shared1               2>&1
      mount -vv --bind         mount.shared2 mount.shared2 2>&1
      mount -vv --loud --make-rshared mount.shared2               2>&1  # <-HERE
      mount -vv --bind mount.shared2 mount.shared1         2>&1
      mount -vv --bind mount.dir     mount.shared2         2>&1
      ls -R mount.dir mount.shared1 mount.shared2      2>&1
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      umount mount.dir mount.shared1 mount.shared2 2>/dev/null
      rm -f mount.dir/a mount.dir/b mount.dir/c
      rmdir mount.dir mount.shared1 mount.shared2
      
      The result is different now - look closely at mount.shared1 directory listing.
      Now it does show files 'a' and 'b':
      
      mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
      mount: mount('','mount.shared1','',0x0010c000,''):0
      mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
      mount: mount('','mount.shared2','',0x00104000,''):0
      mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
      mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
      
      mount.dir:
      a
      b
      
      mount.shared1:
      a
      b
      
      mount.shared2:
      a
      b
      
      The analysis shows that MS_SILENT flag which is ON by default in any
      busybox-> mount operations cames to flags_to_propagation_type function and
      causes the error return while is_power_of_2 checking because the function
      expects only one bit set.  This doesn't allow to do busybox->mount with
      any --make-[r]shared, --make-[r]private etc options.
      
      Moreover, the recently added flags_to_propagation_type() function doesn't
      allow us to do such operations as --make-[r]private --make-[r]shared etc.
      when MS_SILENT is on.  The idea or clearing the MS_SILENT flag came from
      to Denys Vlasenko.
      Signed-off-by: NRoman Borisov <ext-roman.borisov@nokia.com>
      Reported-by: NDenys Vlasenko <vda.linux@googlemail.com>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Cc: Alexander Shishkin <virtuoso@slind.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7c6e984d
    • J
      exportfs: reallow building as a module · 79fead47
      Jonas Gorski 提交于
      Commit 990d6c2d ("vfs: Add name to file
      handle conversion support") changed EXPORTFS to be a bool.
      This was needed for earlier revisions of the original patch, but the actual
      commit put the code needing it into its own file that only gets compiled
      when FHANDLE is selected which in turn selects EXPORTFS.
      So EXPORTFS can be safely compiled as a module when not selecting FHANDLE.
      Signed-off-by: NJonas Gorski <jonas.gorski@gmail.com>
      Acked-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      79fead47
    • A
      merge handle_reval_dot and nameidata_drop_rcu_last · 9f1fafee
      Al Viro 提交于
      new helper: complete_walk().  Done on successful completion
      of walk, drops out of RCU mode, does d_revalidate of final
      result if that hadn't been done already.
      
      handle_reval_dot() and nameidata_drop_rcu_last() subsumed into
      that one; callers converted to use of complete_walk().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9f1fafee
    • A
      consolidate nameidata_..._drop_rcu() · 19660af7
      Al Viro 提交于
      Merge these into a single function (unlazy_walk(nd, dentry)),
      kill ..._maybe variants
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      19660af7
  2. 29 4月, 2011 1 次提交
  3. 28 4月, 2011 3 次提交
  4. 27 4月, 2011 1 次提交
  5. 26 4月, 2011 14 次提交
  6. 25 4月, 2011 2 次提交
  7. 24 4月, 2011 2 次提交
    • L
      vfs: get rid of insane dentry hashing rules · dea3667b
      Linus Torvalds 提交于
      The dentry hashing rules have been really quite complicated for a long
      while, in odd ways.  That made functions like __d_drop() very fragile
      and non-obvious.
      
      In particular, whether a dentry was hashed or not was indicated with an
      explicit DCACHE_UNHASHED bit.  That's despite the fact that the hash
      abstraction that the dentries use actually have a 'is this entry hashed
      or not' model (which is a simple test of the 'pprev' pointer).
      
      The reason that was done is because we used the normal 'is this entry
      unhashed' model to mark whether the dentry had _ever_ been hashed in the
      dentry hash tables, and that logic goes back many years (commit
      b3423415: "dcache: avoid RCU for never-hashed dentries").
      
      That, in turn, meant that __d_drop had totally different unhashing logic
      for the dentry hash table case and for the anonymous dcache case,
      because in order to use the "is this dentry hashed" logic as a flag for
      whether it had ever been on the RCU hash table, we had to unhash such a
      dentry differently so that we'd never think that it wasn't 'unhashed'
      and wouldn't be free'd correctly.
      
      That's just insane.  It made the logic really hard to follow, when there
      were two different kinds of "unhashed" states, and one of them (the one
      that used "list_bl_unhashed()") really had nothing at all to do with
      being unhashed per se, but with a very subtle lifetime rule instead.
      
      So turn all of it around, and make it logical.
      
      Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
      the dentry is on the hash chains or not, use the hash chain unhashed
      logic for that.  Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
      and everything makes sense.
      
      And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
      If we ever insert the dentry into the dentry hash table so that it is
      visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
      needs the RCU lifetime rules.  Now suddently that test at dentry free
      time makes sense too.
      
      And because unhashing now is sane and doesn't depend on where the dentry
      got unhashed from (because the dentry hash chain details doesn't have
      some subtle side effects), we can re-unify the __d_drop() logic and use
      common code for the unhashing.
      
      Also fix one more open-coded hash chain bit_spin_lock() that I missed in
      the previous chain locking cleanup commit.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dea3667b
    • L
      vfs: get rid of 'struct dcache_hash_bucket' abstraction · b07ad996
      Linus Torvalds 提交于
      It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
      help anything - quite the reverse.  All the users end up having to know
      about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
      etc. So it just makes the code look confusing.
      
      And the cost of it is extra '&b->head' syntactic noise, but more
      importantly it spuriously makes the hash table dentry list look
      different from the per-superblock DCACHE_DISCONNECTED dentry list.
      
      As a result, the code ended up using ad-hoc locking for one case and
      special helper functions for what is really another totally identical
      case in the very same function.
      
      Make it all look and work the same.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b07ad996
  8. 22 4月, 2011 1 次提交
  9. 21 4月, 2011 4 次提交
    • J
      vfs: Pass setxattr(2) flags properly · df7e1303
      Jan Kara 提交于
      For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
      XATTR_REPLACE) to the filesystem specific helper. This caused that
      setxattr(2) syscall just ignored these flags.
      
      Fix the bug by passing flags correctly.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      df7e1303
    • A
      UBIFS: fix master node recovery · 6e0d9fd3
      Artem Bityutskiy 提交于
      This patch fixes the following symptoms:
      1. Unmount UBIFS cleanly.
      2. Start mounting UBIFS R/W and have a power cut immediately
      3. Start mounting UBIFS R/O, this succeeds
      4. Try to re-mount UBIFS R/W - this fails immediately or later on,
         because UBIFS will write the master node to the flash area
         which has been written before.
      
      The analysis of the problem:
      
      1. UBIFS is unmounted cleanly, both copies of the master node are clean.
      2. UBIFS is being mounter R/W, starts changing master node copy 1, and
         a power cut happens. The copy N1 becomes corrupted.
      3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and
         reads copy N2. Copy N2 is clean.
      4. Because of R/O mode, UBIFS cannot recover copy 1.
      5. The mount code (ubifs_mount()) sees that the master node is clean,
         so it decides that no recovery is needed.
      6. We are re-mounting R/W. UBIFS believes no recovery is needed and
         starts updating the master node, but copy N1 is still corrupted
         and was not recovered!
      
      Fix this problem by marking the master node as dirty every time we
      recover it and we are in R/O mode. This forces further recovery and
      the UBIFS cleans-up the corruptions and recovers the copy N1 when
      re-mounting R/W later.
      Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: stable@kernel.org
      6e0d9fd3
    • A
      UBIFS: fix false assertion warning in case of I/O failures · 1a067a22
      Artem Bityutskiy 提交于
      When UBIFS switches to R/O mode because it detects I/O failures, then
      when we unmount, we still may have allocated budget, and the assertions
      which verify that we have not budget will fire. But it is expected to
      have the budget in case of I/O failures, so the assertion warnings will
      be false. Suppress them for the I/O failure case.
      Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      1a067a22
    • D
      xfs: fix duplicate message output · 3eff1268
      Dave Chinner 提交于
      Commit 957935dc ("xfs: fix xfs_debug warnings" broke the logic in
      __xfs_printk(). Instead of only printing one of two possible output
      strings based on whether the fs has a name or not, it outputs both.
      Fix it to only output one message again.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      3eff1268
  10. 20 4月, 2011 3 次提交
    • A
      UBIFS: fix false space checking failure · 8c230d9a
      Artem Bityutskiy 提交于
      This patch fixes UBIFS mount failure when the debugging support is enabled,
      we are recovering from a power cut, we were first mounter R/O and we are
      re-mounting R/W. In this case we should not assume that the amount of free
      space before we have re-mounted R/W and after are equivalent, because
      when we have mounted R/O the file-system is in a non-committed state so
      the amount of free space is slightly smaller, due to the fact that we cannot
      predict the amount of free space precisely before we commit.
      
      This patch fixes the issue by skipping the debugging check in case of
      recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
      here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reported-by: NCaizhiyong <caizhiyong@huawei.com>
      Cc: stable@kernel.org [2.6.30+]
      8c230d9a
    • S
      Open with O_CREAT flag set fails to open existing files on non writable directories · 1574dff8
      Sachin Prabhu 提交于
      An open on a NFS4 share using the O_CREAT flag on an existing file for
      which we have permissions to open but contained in a directory with no
      write permissions will fail with EACCES.
      
      A tcpdump shows that the client had set the open mode to UNCHECKED which
      indicates that the file should be created if it doesn't exist and
      encountering an existing flag is not an error. Since in this case the
      file exists and can be opened by the user, the NFS server is wrong in
      attempting to check create permissions on the parent directory.
      
      The patch adds a conditional statement to check for create permissions
      only if the file doesn't exist.
      Signed-off-by: NSachin S. Prabhu <sprabhu@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1574dff8
    • C
      Btrfs: do some plugging in the submit_bio threads · 211588ad
      Chris Mason 提交于
      The Btrfs submit bio threads have a small number of
      threads responsible for pushing down bios we've collected
      for a large number of devices.
      
      Since we do all the bios for a single device at once,
      we want to make sure we unplug and send down the bios
      for each device as we're done processing them.
      
      The new plugging API removed the btrfs code to
      unplug while processing bios, this adds it back with
      the new API.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      211588ad