1. 28 Nov, 2014 5 commits
    • E
      xfs: catch invalid negative blknos in _xfs_buf_find() · db52d09e
      Eric Sandeen committed
      Here blkno is a daddr_t, which is a __s64; it's possible to hold
      a value which is negative, and thus pass the (blkno >= eofs)
      test.  Then we try to do a xfs_perag_get() for a ridiculous
      agno via xfs_daddr_to_agno(), and bad things happen when that
      fails, and returns a null pag which is dereferenced shortly
      thereafter.
      
      Found via a user-supplied fuzzed image...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      db52d09e
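The range check described above can be sketched in user-space C. This is a minimal illustration, not the kernel patch: `daddr_sketch_t` and `blkno_in_range()` are hypothetical names, and only the signedness of the block number is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The commit message says daddr_t is a signed 64-bit type; modelled here. */
typedef int64_t daddr_sketch_t;

/*
 * Hypothetical helper: true only when blkno lies in [0, eofs).
 * Testing (blkno >= eofs) alone would let negative values through,
 * because a negative signed value is always below eofs.
 */
static bool blkno_in_range(daddr_sketch_t blkno, daddr_sketch_t eofs)
{
	assert(eofs >= 0);
	return blkno >= 0 && blkno < eofs;
}
```

With this shape, a fuzzed negative block number is rejected before any per-AG lookup is attempted.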
    • B
      xfs: allow lazy sb counter sync during filesystem freeze sequence · 91ee575f
      Brian Foster committed
      The expectation since the introduction of the lazy superblock counters is
      that the counters are synced and superblock logged appropriately as part
      of the filesystem freeze sequence. This does not occur, however, due to
      the logic in xfs_fs_writable() that prevents progress when the fs is in
      any state other than SB_UNFROZEN.
      
      While this is a bug, it has not been exposed to date because the last
      thing XFS does during freeze is dirty the log. The log recovery process
      recalculates the counters from AGI/AGF metadata to ensure everything is
      correct. Therefore should a crash occur while an fs is frozen, the
      subsequent log recovery puts everything back in order. See the following
      commit for reference:
      
      	92821e2b [XFS] Lazy Superblock Counters
      
      We might not always want to rely on dirtying the log on a frozen fs.
      Modify xfs_log_sbcount() to proceed when the filesystem is freezing but
      not once the freeze process has completed. Modify xfs_fs_writable() to
      accept the minimum freeze level for which modifications should be
      blocked to support various codepaths.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      91ee575f
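The idea of passing a minimum freeze level to the writability check can be sketched as follows. This is a user-space model under stated assumptions: `struct mount_sketch` and its fields are hypothetical, and the enum only mirrors the ordering of the kernel's freeze states rather than reproducing the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze states in increasing order; names and values are an assumption. */
enum freeze_level {
	SB_UNFROZEN = 0,
	SB_FREEZE_WRITE = 1,
	SB_FREEZE_PAGEFAULT = 2,
	SB_FREEZE_FS = 3,
	SB_FREEZE_COMPLETE = 4,
};

/* Hypothetical stand-in for the per-mount state. */
struct mount_sketch {
	enum freeze_level frozen;
	bool read_only;
	bool shut_down;
};

/*
 * Modifications are blocked once the freeze has reached `level`.
 * A caller that must still run while the fs is freezing passes
 * SB_FREEZE_COMPLETE; ordinary callers pass SB_FREEZE_WRITE.
 */
static bool fs_writable(const struct mount_sketch *mp, enum freeze_level level)
{
	assert(level > SB_UNFROZEN);
	if (mp->frozen >= level || mp->shut_down || mp->read_only)
		return false;
	return true;
}
```

Under this model, a counter sync that passes `SB_FREEZE_COMPLETE` proceeds during the freeze sequence but is refused once the freeze has completed.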
    • B
      xfs: fix error handling in xfs_qm_log_quotaoff() · 5d45ee1b
      Brian Foster committed
      The error handling in xfs_qm_log_quotaoff() has a couple of problems. If
      xfs_trans_commit() fails, we fall through to the error block and call
      xfs_trans_cancel(). This is incorrect on commit failure. If
      xfs_trans_reserve() fails, we jump to the error block, cancel the tp and
      restore the superblock qflags to oldsbqflag. However, oldsbqflag has
      been initialized to zero and not yet updated from the original flags so
      we set the flags to zero.
      
      Fix up the error handling in xfs_qm_log_quotaoff() to not restore flags
      if they haven't been modified and not cancel the tp on commit failure.
      Remove the flag restore code altogether because commit error is the only
      failure condition and we don't know whether the transaction made it to
      disk.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      5d45ee1b
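The corrected control flow can be modelled in a few lines. This is only a sketch with mock helpers: `trans_sketch`, `reserve_sketch()`, `commit_sketch()` and `cancel_sketch()` are hypothetical stand-ins, not the XFS transaction API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction handle that records what was done to it. */
struct trans_sketch {
	bool cancelled;
	bool commit_called;
};

static int reserve_sketch(int err)
{
	return err;
}

static int commit_sketch(struct trans_sketch *tp, int err)
{
	tp->commit_called = true;
	return err;
}

static void cancel_sketch(struct trans_sketch *tp)
{
	/* Cancelling after a commit attempt is the bug being avoided. */
	assert(!tp->commit_called);
	tp->cancelled = true;
}

static int log_quotaoff_sketch(struct trans_sketch *tp,
			       int reserve_err, int commit_err)
{
	int error = reserve_sketch(reserve_err);

	if (error) {
		/* Nothing was modified yet, so only the tp is cancelled. */
		cancel_sketch(tp);
		return error;
	}
	/* Flags would be modified and logged here. */
	return commit_sketch(tp, commit_err);	/* no cancel, no flag restore */
}
```

The commit result is returned directly because, as the message notes, it is unknown whether a failed commit reached the disk.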
    • B
      xfs: replace on-stack xfs_trans_res with pointer in xfs_create() · 062647a8
      Brian Foster committed
      There's no need to store a full struct xfs_trans_res on the stack in
      xfs_create() and copy the fields. Use a pointer to the appropriate
      structures embedded in the xfs_mount.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      062647a8
    • B
      xfs: replace global xfslogd wq with per-mount wq · 78c931b8
      Brian Foster committed
      The xfslogd workqueue is a global, single-job workqueue for buffer ioend
      processing. This means we allow for a single work item at a time for all
      possible XFS mounts on a system. fsstress testing in loopback XFS over
      XFS configurations has reproduced xfslogd deadlocks due to the single
      threaded nature of the queue and dependencies introduced between the
      separate XFS instances by online discard (-o discard).
      
      Discard over a loopback device converts the discard request to a hole
      punch (fallocate) on the underlying file. Online discard requests are
      issued synchronously and from xfslogd context in XFS, hence the xfslogd
      workqueue is blocked in the upper fs waiting on a hole punch request to
      be serviced in the lower fs. If the lower fs issues I/O that depends on
      xfslogd to complete, both filesystems end up hung indefinitely. This is
      reproduced reliably by generic/013 on XFS->loop->XFS test devices with
      the '-o discard' mount option.
      
      Further, docker implementations appear to use this kind of configuration
      for container instance filesystems by default (container fs->dm->
      loop->base fs) and therefore are subject to this deadlock when running
      on XFS.
      
      Replace the global xfslogd workqueue with a per-mount variant. This
      guarantees each mount access to a single worker and prevents deadlocks
      due to inter-fs dependencies introduced by discard. Since the queue is
      only responsible for buffer iodone processing at this point in time,
      rename xfslogd to xfs-buf.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      78c931b8
  2. 25 Oct, 2014 4 commits
  3. 24 Oct, 2014 13 commits
    • A
      fix inode leaks on d_splice_alias() failure exits · 51486b90
      Al Viro committed
      d_splice_alias() callers expect it to either stash the inode reference
      into a new alias, or drop the inode reference.  That makes it possible
      to just return d_splice_alias() result from ->lookup() instance, without
      any extra housekeeping required.
      
      Unfortunately, that should include the failure exits.  If d_splice_alias()
      returns an error, it leaves the dentry it has been given negative and
      thus it *must* drop the inode reference.  Easily fixed, but it goes way
      back and will need backporting.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      51486b90
    • M
      fs: limit filesystem stacking depth · 69c433ed
      Miklos Szeredi committed
      Add a simple read-only counter to super_block that indicates how deep this
      is in the stack of filesystems.  Previously ecryptfs was the only stackable
      filesystem and it explicitly disallowed multiple layers of itself.
      
      Overlayfs, however, can be stacked recursively and also may be stacked
      on top of ecryptfs or vice versa.
      
      To limit the kernel stack usage we must limit the depth of the
      filesystem stack.  Initially the limit is set to 2.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      69c433ed
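The depth counter can be illustrated with a small user-space model. The names `sb_sketch`, `stack_on()` and `MAX_STACK_DEPTH` are hypothetical; only the limit of 2 comes from the commit message, and taking the deeper of the two layers plus one is an assumption about how a stacked filesystem would compute its own depth.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STACK_DEPTH 2	/* initial limit stated in the commit message */

/* Hypothetical superblock stand-in; depth 0 means a non-stacked filesystem. */
struct sb_sketch {
	int stack_depth;
};

/*
 * Compute the depth of a new filesystem stacked on two layers and
 * refuse the mount when the limit would be exceeded.
 */
static bool stack_on(struct sb_sketch *sb,
		     const struct sb_sketch *upper,
		     const struct sb_sketch *lower)
{
	int below = upper->stack_depth > lower->stack_depth ?
		    upper->stack_depth : lower->stack_depth;
	int depth = below + 1;

	if (depth > MAX_STACK_DEPTH)
		return false;
	sb->stack_depth = depth;
	return true;
}
```

With a limit of 2, an overlay on a plain filesystem and an overlay on top of that overlay are both accepted, while a third level is refused.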
    • E
      overlayfs: implement show_options · f45827e8
      Erez Zadok committed
      This is useful because of the stacking nature of overlayfs.  Users like to
      find out (via /proc/mounts) which lower/upper directories were used at mount
      time.
      
      AV: even failing ovl_parse_opt() could've done some kstrdup()
      AV: failure of ovl_alloc_entry() should end up with ENOMEM, not EINVAL
      Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      f45827e8
    • A
      overlayfs: add statfs support · cc259639
      Andy Whitcroft committed
      Add support for statfs to the overlayfs filesystem.  As the upper layer
      is the target of all write operations assume that the space in that
      filesystem is the space in the overlayfs.  There will be some inaccuracy as
      overwriting a file will copy it up and consume space we were not expecting,
      but it is better than nothing.
      
      Use the upper layer dentry and mount from the overlayfs root inode,
      passing the statfs call to that filesystem.
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      cc259639
    • M
      overlay filesystem · e9be9d5e
      Miklos Szeredi committed
      Overlayfs allows one, usually read-write, directory tree to be
      overlaid onto another, read-only directory tree.  All modifications
      go to the upper, writable layer.
      
      This type of mechanism is most often used for live CDs but there's a
      wide variety of other uses.
      
      The implementation differs from other "union filesystem"
      implementations in that after a file is opened all operations go
      directly to the underlying, lower or upper, filesystems.  This
      simplifies the implementation and allows native performance in these
      cases.
      
      The dentry tree is duplicated from the underlying filesystems; this
      enables fast cached lookups without adding special support into the
      VFS.  This uses slightly more memory than union mounts, but dentries
      are relatively small.
      
      Currently inodes are duplicated as well, but it is a possible
      optimization to share inodes for non-directories.
      
      Opening non directories results in the open forwarded to the
      underlying filesystem.  This makes the behavior very similar to union
      mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
      descriptors).
      
      Usage:
      
        mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
      
      The following contributions have been folded into this patch:
      
      Neil Brown <neilb@suse.de>:
       - minimal remount support
       - use correct seek function for directories
       - initialise is_real before use
       - rename ovl_fill_cache to ovl_dir_read
      
      Felix Fietkau <nbd@openwrt.org>:
       - fix a deadlock in ovl_dir_read_merged
       - fix a deadlock in ovl_remove_whiteouts
      
      Erez Zadok <ezk@fsl.cs.sunysb.edu>
       - fix cleanup after WARN_ON
      
      Sedat Dilek <sedat.dilek@googlemail.com>
       - fix up permission to conform to new API
      
      Robin Dong <hao.bigrat@gmail.com>
       - fix possible leak in ovl_new_inode
       - create new inode in ovl_link
      
      Andy Whitcroft <apw@canonical.com>
       - switch to __inode_permission()
       - copy up i_uid/i_gid from the underlying inode
      
      AV:
       - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
       - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
         lack of check for udir being the parent of upper, dropping and regaining
         the lock on udir (which would require _another_ check for parent being
         right).
       - bogus d_drop() in copyup and rename [fix from your mail]
       - copyup/remove and copyup/rename races [fix from your mail]
       - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
       - ovl_entry_free() is pointless - it's just a kfree_rcu()
       - fold ovl_do_lookup() into ovl_lookup()
       - manually assigning ->d_op is wrong.  Just use ->s_d_op.
       [patches picked from Miklos]:
       * copyup/remove and copyup/rename races
       * bogus d_drop() in copyup and rename
      
      Also thanks to the following people for testing and reporting bugs:
      
        Jordi Pujol <jordipujolp@gmail.com>
        Andy Whitcroft <apw@canonical.com>
        Michal Suchanek <hramrach@centrum.cz>
        Felix Fietkau <nbd@openwrt.org>
        Erez Zadok <ezk@fsl.cs.sunysb.edu>
        Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      e9be9d5e
    • M
      ext4: support RENAME_WHITEOUT · cd808dec
      Miklos Szeredi committed
      Add whiteout support to ext4_rename().  A whiteout inode (chrdev/0,0) is
      created before the rename takes place.  The whiteout inode is added to the
      old entry instead of deleting it.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      cd808dec
    • M
      vfs: add RENAME_WHITEOUT · 0d7a8555
      Miklos Szeredi committed
      This adds a new RENAME_WHITEOUT flag.  This flag makes rename() create a
      whiteout of source.  The whiteout creation is atomic relative to the
      rename.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      0d7a8555
    • M
      vfs: add whiteout support · 787fb6bc
      Miklos Szeredi committed
      Whiteout isn't actually a new file type, but is represented as a char
      device (Linus's idea) with 0/0 device number.
      
      This has several advantages compared to introducing a new whiteout file
      type:
      
       - no userspace API changes (e.g. trivial to make backups of upper layer
         filesystem, without losing whiteouts)
      
       - no fs image format changes (you can boot an old kernel/fsck without
         whiteout support and things won't break)
      
       - implementation is trivial
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      787fb6bc
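Because a whiteout is an ordinary character device with device number 0/0, user space can recognise one from `stat` fields alone. The helper below is a hypothetical sketch for Linux with glibc (it relies on `major()`/`minor()` from `<sys/sysmacros.h>`); `is_whiteout()` is an illustrative name and not a kernel or libc function.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper: true when the given st_mode/st_rdev pair
 * describes a char device whose major and minor numbers are both 0.
 */
static bool is_whiteout(mode_t mode, dev_t rdev)
{
	return S_ISCHR(mode) && major(rdev) == 0 && minor(rdev) == 0;
}
```

A backup tool could call this with the `st_mode` and `st_rdev` fields returned by `lstat()` on an entry in the upper layer; since the representation is a plain device node, archivers that already preserve device nodes keep whiteouts intact.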
    • M
      vfs: export check_sticky() · cbdf35bc
      Miklos Szeredi committed
      It's already duplicated in btrfs and about to be used in overlayfs too.
      
      Move the sticky bit check to an inline helper and call the out-of-line
      helper only in the unlikely case of the sticky bit being set.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      cbdf35bc
    • M
      vfs: introduce clone_private_mount() · c771d683
      Miklos Szeredi committed
      Overlayfs needs a private clone of the mount, so create a function for
      this and export to modules.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      c771d683
    • M
      vfs: export __inode_permission() to modules · bd5d0856
      Miklos Szeredi committed
      We need to be able to check inode permissions (but not filesystem implied
      permissions) for stackable filesystems.  Expose this interface for overlayfs.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      bd5d0856
    • M
      vfs: export do_splice_direct() to modules · 1c118596
      Miklos Szeredi committed
      Export do_splice_direct() to modules.  Needed by overlay filesystem.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      1c118596
    • M
      vfs: add i_op->dentry_open() · 4aa7c634
      Miklos Szeredi committed
      Add a new inode operation i_op->dentry_open().  This is for stacked filesystems
      that want to return a struct file from a different filesystem.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      4aa7c634
  4. 20 Oct, 2014 1 commit
  5. 18 Oct, 2014 1 commit
  6. 17 Oct, 2014 10 commits
    • C
      Revert "Btrfs: race free update of commit root for ro snapshots" · d3797308
      Chris Mason committed
      This reverts commit 9c3b306e.
      
      Switching only one commit root during a transaction is wrong because it
      leads the fs into an inconsistent state. All commit roots should be
      switched at once, at transaction commit time, otherwise backref walking
      can often miss important references that were only accessible through
      the old commit root.  Plus, the root item for the snapshot's root wasn't
      getting updated, and the next transaction commit was prevented from doing it.
      
      This made several users get into random corruption issues after creation
      of readonly snapshots.
      
      A regression test for xfstests will follow soon.
      
      Cc: stable@vger.kernel.org # 3.17
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      d3797308
    • S
    • S
      Workaround Mac server problem · b5b374ea
      Steve French committed
      The Mac server reports that it supports the CIFS Unix Extensions but
      does not actually support QUERY_FILE_UNIX_BASIC, so the mount fails.
      
      Work around this problem by disabling use of the CIFS Unix protocol
      extensions if the server returns an EOPNOTSUPP error on
      QUERY_FILE_UNIX_BASIC during mount.
      Signed-off-by: Steve French <smfrench@gmail.com>
      b5b374ea
    • S
      Remap reserved posix characters by default (part 3/3) · 2baa2682
      Steve French committed
      This is a bigger patch, but its size is mostly due to
      a single change for how we check for remapping illegal characters
      in file names - a lot of repeated, small changes to
      the way callers request converting file names.
      
      The final patch in the series does the following:
      
      1) changes default behavior for cifs to be more intuitive.
      Currently we do not map the seven reserved characters by default,
      ie those valid in POSIX but not in NTFS/CIFS/SMB3/Windows,
      unless a mount option (mapchars) is specified.  Change this
      to always map by default, using the SFM mapping
      (like the Mac uses) unless the server negotiates the CIFS Unix
      Extensions (like Samba does when mounting with the cifs protocol)
      when the remapping of the characters is unnecessary.  This should
      help SMB3 mounts in particular since Samba will likely be
      able to implement this mapping with its new "vfs_fruit" module
      as it will be doing for the Mac.
      2) if the user specifies the existing "mapchars" mount option then
      use the "SFU" (Microsoft Services for Unix, SUA) style mapping of
      the seven characters instead.
      3) if the user specifies "nomapposix" then disable SFM/MAC style mapping
      (so no character remapping would be used unless the user specifies
      "mapchars" on mount as well, as above).
      4) change all the places in the code that check for the superblock
      flag on the mount which is set by mapchars and passed in on all
      path based operation and change it to use a small function call
      instead to set the mapping type properly (and check for the
      mapping type in the cifs unicode functions)
      Signed-off-by: Steve French <smfrench@gmail.com>
      2baa2682
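The option handling listed in items 1 to 3 above can be summarised as one small selection function. This is a hedged sketch, not the cifs.ko code: the type and field names are hypothetical, and the precedence between the options (mapchars first, then nomapposix, then Unix Extensions) is an assumption, since the commit message does not spell out how conflicting settings combine.

```c
#include <assert.h>
#include <stdbool.h>

enum remap_mode {
	REMAP_NONE,	/* names pass through unchanged */
	REMAP_SFM,	/* "Services for Mac" style range */
	REMAP_SFU,	/* "Services for Unix" style range */
};

/* Hypothetical view of the relevant mount state. */
struct mount_opts_sketch {
	bool unix_extensions;	/* server negotiated CIFS Unix Extensions */
	bool mapchars;		/* user asked for SFU style mapping */
	bool nomapposix;	/* user disabled the SFM default */
};

static enum remap_mode choose_remap(const struct mount_opts_sketch *o)
{
	if (o->mapchars)
		return REMAP_SFU;	/* item 2 */
	if (o->nomapposix)
		return REMAP_NONE;	/* item 3 */
	if (o->unix_extensions)
		return REMAP_NONE;	/* item 1: remapping unnecessary */
	return REMAP_SFM;		/* item 1: the new default */
}
```

Centralising the decision in one helper matches the intent of item 4, where callers stop checking a superblock flag directly and ask a small function for the mapping type instead.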
    • S
      Allow conversion of characters in Mac remap range (part 2) · a4153cb1
      Steve French committed
      The previous patch allowed remapping reserved characters from directory
      listings; this patch adds conversion in the other direction, allowing
      opening of files with any of the seven reserved characters.
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      a4153cb1
    • S
      Allow conversion of characters in Mac remap range. Part 1 · b693855f
      Steve French committed
      This allows directory listings to Mac to display filenames
      correctly which have been created with illegal (to Windows)
      characters in their filename. It does not allow
      converting in the other direction yet, ie opening files with
      these characters (follow-on patch).
      
      There are seven reserved characters that need to be remapped when
      mounting to Windows, Mac (or any server without Unix Extensions) which
      are valid in POSIX but not in the other OS.
      
      : \ < > ? * |
      
      We used the normal UCS-2 remap range for this in order to convert this
      to/from UTF8 as did Windows Services for Unix (basically add 0xF000 to
      any of the 7 reserved characters), at least when the "mapchars" mount
      option was specified.
      
      Mac used a very slightly different "Services for Mac" remap range
      0xF021 through 0xF027.  The attached patch allows cifs.ko (the kernel
      client) to read directories on macs containing files with these
      characters and display their names properly.  In theory this even
      might be useful on mounts to Samba when the vfs_catia or new
      "vfs_fruit" module is loaded.
      
      Currently the 7 reserved characters look very strange in directory
      listings from cifs.ko to Mac server.  This patch allows these file
      name characters to be read (requires specifying mapchars on mount).
      
      Two additional changes are needed:
      1) Make it more automatic: a way of detecting enough info so that
      we know whether to always remap these characters. Various people
      have suggested that the SFM approach be made the default when
      the server does not support POSIX Unix extensions (cifs mounts
      to Samba for example) so need to make SFM remapping the default
      unless mapchars (SFU style mapping) specified on mount or no
      mapping explicitly requested or no mapping needed (cifs mounts to Samba).
      
      2) Adding a patch to map the characters the other direction
      (ie UTF-8 to UCS-2 on open).  This patch does it for translating
      readdir entries (ie UCS-2 to UTF-8)
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      b693855f
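The SFU style remapping described above (add 0xF000 to any of the seven reserved characters) is simple enough to sketch directly. The function names are hypothetical and the code works on single UCS-2 code units only; the SFM range 0xF021 through 0xF027 is not modelled here because the commit message does not give its per-character assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SFU_REMAP_BASE 0xF000u

/* True for the seven characters valid in POSIX names but reserved elsewhere. */
static bool is_reserved(uint16_t c)
{
	return c != 0 && c < 0x80 && strchr(":\\<>?*|", (int)c) != NULL;
}

/* Client to server: move a reserved character into the private-use range. */
static uint16_t sfu_to_server(uint16_t c)
{
	return is_reserved(c) ? (uint16_t)(c + SFU_REMAP_BASE) : c;
}

/* Server to client: undo the shift, leaving other code units untouched. */
static uint16_t sfu_from_server(uint16_t c)
{
	if (c > SFU_REMAP_BASE && is_reserved((uint16_t)(c - SFU_REMAP_BASE)))
		return (uint16_t)(c - SFU_REMAP_BASE);
	return c;
}
```

For example, a colon (0x3A) travels to the server as 0xF03A and is turned back into a colon when a directory listing is read.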
    • S
      mfsymlinks support for SMB2.1/SMB3. Part 2 query symlink · c22870ea
      Steve French committed
      Adds support on SMB2.1 and SMB3 mounts for emulation of symlinks
      via the "Minshall/French" symlink format already used for cifs
      mounts when mfsymlinks mount option is used (and also used by Apple).
        http://wiki.samba.org/index.php/UNIX_Extensions#Minshall.2BFrench_symlinks
      This second patch adds support to query them (recognize them as symlinks
      and read them).  Third version of patch makes minor corrections
      to error handling.
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Stefan Metzmacher <metze@samba.org>
      c22870ea
    • S
      Add mfsymlinks support for SMB2.1/SMB3. Part 1 create symlink · 5ab97578
      Steve French committed
      Adds support on SMB2.1 and SMB3 mounts for emulation of symlinks
      via the "Minshall/French" symlink format already used for cifs
      mounts when mfsymlinks mount option is used (and also used by Apple).
      http://wiki.samba.org/index.php/UNIX_Extensions#Minshall.2BFrench_symlinks
      This first patch adds support to create them.  The next patch will
      add support for recognizing them and reading them.  Although CIFS/SMB3
      have other types of symlinks, in many use cases they aren't
      practical (e.g. either require cifs only mounts with unix extensions
      to Samba, or require the user to be Administrator to Windows for SMB3).
      This also helps enable running additional xfstests over SMB3 (since some
      xfstests directly or indirectly require symlink support).
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stefan Metzmacher <metze@samba.org>
      5ab97578
    • S
      Allow mknod and mkfifo on SMB2/SMB3 mounts · db8b631d
      Steve French committed
      The "sfu" mount option did not work on SMB2/SMB3 mounts.
      With these changes when the "sfu" mount option is passed in
      on an smb2/smb2.1/smb3 mount the client can emulate (and
      recognize) fifo and device files (character and block devices).
      
      In addition the "sfu" mount option should not conflict
      with "mfsymlinks" (symlink emulation) as we will never
      create "sfu" style symlinks, but using "sfu" mount option
      will allow us to recognize existing symlinks, created with
      Microsoft "Services for Unix" (SFU and SUA).
      
      To enable the "sfu" mount option for SMB2/SMB3 the calling
      syntax of the generic cifs/smb2/smb3 sync_read and sync_write
      protocol dependent function needed to be changed (we
      don't have a file struct in all cases), but this actually
      ended up simplifying the code a little.
      Signed-off-by: Steve French <smfrench@gmail.com>
      db8b631d
    • S
      73322979
  7. 16 Oct, 2014 4 commits
  8. 15 Oct, 2014 2 commits