1. 20 Apr, 2018 1 commit
  2. 16 Apr, 2018 3 commits
  3. 14 Apr, 2018 1 commit
  4. 13 Apr, 2018 12 commits
  5. 12 Apr, 2018 23 commits
    • D
      btrfs: add SPDX header to Kconfig · 852eb3ae
      Committed by David Sterba
      Signed-off-by: David Sterba <dsterba@suse.com>
      852eb3ae
    • D
      btrfs: replace GPL boilerplate by SPDX -- sources · c1d7c514
      Committed by David Sterba
      Remove GPL boilerplate text (long, short, one-line) and keep the rest,
      i.e. personal, company or original source copyright statements. Add the
      SPDX header.
      Signed-off-by: David Sterba <dsterba@suse.com>
      c1d7c514
    • D
      btrfs: replace GPL boilerplate by SPDX -- headers · 9888c340
      Committed by David Sterba
      Remove GPL boilerplate text (long, short, one-line) and keep the rest,
      i.e. personal, company or original source copyright statements. Add the
      SPDX header.
      
      Unify the include protection macros to match the file names.
      Signed-off-by: David Sterba <dsterba@suse.com>
      9888c340
    • F
      Btrfs: fix loss of prealloc extents past i_size after fsync log replay · 471d557a
      Committed by Filipe Manana
      Currently if we allocate extents beyond an inode's i_size (through the
      fallocate system call) and then fsync the file, we log the extents but
      after a power failure we replay them and then immediately drop them.
      This behaviour has existed since about 2009, commit c71bf099 ("Btrfs:
      Avoid orphan inodes cleanup while replaying log"), because it marks
      the inode as an orphan instead of dropping any extents beyond i_size
      before replaying logged extents, so after the log replay, and while
      the mount operation is still ongoing, we find the inode marked as an
      orphan and then perform a truncation (drop extents beyond the inode's
      i_size). Because the processing of orphan inodes is still done
      right after replaying the log and before the mount operation finishes,
      the intention of that commit does not make any sense (at least as
      of today). However reverting that behaviour is not enough, because
      we can not simply discard all extents beyond i_size and then replay
      logged extents, because we risk dropping extents beyond i_size created
      in past transactions, for example:
      
        add prealloc extent beyond i_size
        fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
        transaction commit
        add another prealloc extent beyond i_size
        fsync - triggers the fast fsync path
        power failure
      
      In that scenario, we would drop the first extent and then replay the
      second one. To fix this just make sure that all prealloc extents
      beyond i_size are logged, and if we find too many (which is far from
      a common case), fall back to a full transaction commit (like we do when
      logging regular extents in the fast fsync path).
      
      Trivial reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
       $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
       $ sync
       $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
       $ xfs_io -c "fsync" /mnt/foo
       <power failure>
      
       # mount to replay log
       $ mount /dev/sdb /mnt
       # at this point the file only has one extent, at offset 0, size 256K
      
      A test case for fstests follows soon, covering multiple scenarios that
      involve adding prealloc extents with previous shrinking truncates and
      without such truncates.
      
      Fixes: c71bf099 ("Btrfs: Avoid orphan inodes cleanup while replaying log")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      471d557a
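The ordering problem described in 471d557a can be illustrated with a toy model. This is only a sketch, not btrfs code: the function name `replay` and the tuple representation of extents are invented for illustration. It models the "drop extents beyond i_size, then replay logged extents" strategy and shows why the fast fsync path has to log every prealloc extent beyond i_size.

```python
KIB = 1024

def replay(on_disk, logged, i_size):
    """Toy log replay (hypothetical model, not kernel code).

    Drops every extent that starts at or past i_size, then re-adds
    whatever the fsync log contains. Extents are (offset, length) tuples.
    """
    kept = {e for e in on_disk if e[0] < i_size}
    return sorted(kept | set(logged))

i_size = 256 * KIB
data = (0, 256 * KIB)
prealloc1 = (256 * KIB, 1024 * KIB)    # persisted by an earlier transaction commit
prealloc2 = (1280 * KIB, 1024 * KIB)   # added after that commit

on_disk = [data, prealloc1]

# Fast fsync that logs only the new extent: prealloc1 is lost on replay.
lossy = replay(on_disk, [prealloc2], i_size)

# Logging all prealloc extents beyond i_size keeps both of them.
safe = replay(on_disk, [prealloc1, prealloc2], i_size)
```

In this model `lossy` no longer contains `prealloc1`, which corresponds to the "drop the first extent and then replay the second one" outcome in the commit message, while `safe` retains all three extents.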
    • L
      Btrfs: clean up resources during umount after trans is aborted · af722733
      Committed by Liu Bo
      Currently if some fatal errors occur, like all IO get -EIO, resources
      would be cleaned up when
      a) transaction is being committed or
      b) BTRFS_FS_STATE_ERROR is set
      
      However, in some rare cases, resources may be left alone after transaction
      gets aborted and umount may run into some ASSERT(), e.g.
      ASSERT(list_empty(&block_group->dirty_list));
      
      For case a), in btrfs_commit_transaction(), there are several places at the
      beginning where we just call btrfs_end_transaction() without cleaning up
      resources.  For case b), it is possible that the trans handle doesn't have
      any dirty stuff, then only the trans handle is marked as aborted while
      BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
      
      This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
      all resources won't stay in memory after umount.
      Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      af722733
    • A
      ovl: add support for "xino" mount and config options · 795939a9
      Committed by Amir Goldstein
      With mount option "xino=on", mounter declares that there are enough
      free high bits in underlying fs to hold the layer fsid.
      If overlayfs does encounter underlying inodes using the high xino
      bits reserved for layer fsid, a warning will be emitted and the original
      inode number will be used.
      
      The mount option name "xino" is taken from an aufs mount option with a
      similar meaning, but in the overlayfs case, the mapping is stateless.
      
      An example for a use case of "xino=on" is when upper/lower is on an xfs
      filesystem. xfs uses 64bit inode numbers, but it currently never uses the
      upper 8bit for inode numbers exposed via stat(2) and that is not likely to
      change in the future without user opting-in for a new xfs feature. The
      actual number of unused upper bits is much larger and is determined by the
      xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That means
      that for all practical purposes, there are enough unused bits in xfs
      inode numbers for more than OVL_MAX_STACK unique fsid's.
      
      Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode
      numbers are allocated sequentially since boot, so they will practically
      never use the high inode number bits.
      
      For compatibility with applications that expect 32bit inodes, the feature
      can be disabled with "xino=off". The option "xino=auto" automatically
      detects underlying filesystems that use 32bit inodes and enables the
      feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of
      the same name, determine if the default mode for overlayfs mount is
      "xino=auto" or "xino=off".
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      795939a9
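The bit-budget argument in 795939a9 can be checked with a little arithmetic. The sketch below uses the formula quoted in the commit message (64 - agno_log - agblklog - inopblog); the helper names, the sample geometry values and the stack limit of 500 are assumptions chosen only for illustration.

```python
def xfs_unused_high_bits(agno_log, agblklog, inopblog):
    """Unused upper bits of a 64-bit xfs inode number, per the formula
    quoted in the commit message."""
    return 64 - agno_log - agblklog - inopblog

def fsid_bits_needed(num_fsids):
    """Smallest number of bits that can hold num_fsids distinct values."""
    return max(1, (num_fsids - 1).bit_length())

# Hypothetical geometry: 4 allocation groups, 2^22 blocks per group,
# 8 inodes per block.
unused = xfs_unused_high_bits(agno_log=2, agblklog=22, inopblog=3)

# Hypothetical stack limit standing in for OVL_MAX_STACK.
needed = fsid_bits_needed(500)

enough_room = unused >= needed
```

With these sample numbers there are 37 unused high bits while only 9 are needed, which matches the claim that xfs leaves ample room for the layer fsid.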
    • A
      ovl: consistent d_ino for non-samefs with xino · adbf4f7e
      Committed by Amir Goldstein
      When overlay layers are not all on the same fs, but all inode numbers
      of underlying fs do not use the high 'xino' bits, overlay st_ino values
      are constant and persistent.
      
      In that case, relax non-samefs constraint for consistent d_ino and always
      iterate non-merge dir using ovl_fill_real() actor so we can remap lower
      inode numbers to unique lower fs range.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      adbf4f7e
    • A
      ovl: consistent i_ino for non-samefs with xino · 12574a9f
      Committed by Amir Goldstein
      When overlay layers are not all on the same fs, but all inode numbers
      of underlying fs do not use the high 'xino' bits, overlay st_ino values
      are constant and persistent.
      
      In that case, set i_ino value to the same value as st_ino for nfsd
      readdirplus validator.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      12574a9f
    • A
      ovl: constant st_ino for non-samefs with xino · e487d889
      Committed by Amir Goldstein
      On 64bit systems, when overlay layers are not all on the same fs, but
      all inode numbers of underlying fs are not using the high bits, use the
      high bits to partition the overlay st_ino address space.  The high bits
      hold the fsid (upper fsid is 0).  This way overlay inode numbers are unique
      and all inodes use overlay st_dev.  Inode numbers are also persistent
      for a given layer configuration.
      
      Currently, our only indication for available high ino bits is from a
      filesystem that supports file handles and uses the default encode_fh()
      operation, which encodes a 32bit inode number.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      e487d889
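The partitioning of the st_ino address space described in e487d889 can be sketched as follows. This is an illustrative model, not the kernel implementation: `map_ino` and its return convention are hypothetical, and the fallback branch reflects the behaviour described for "xino" (keep the original inode number when the high bits are already in use).

```python
def map_ino(real_ino, fsid, xino_bits):
    """Place the layer fsid in the top xino_bits of a 64-bit inode number.

    Returns (ino, mapped). If the real inode number already uses the
    reserved high bits, it is returned unchanged and mapped is False.
    """
    shift = 64 - xino_bits
    if real_ino >> shift:
        return real_ino, False
    return real_ino | (fsid << shift), True

# The upper fs has fsid 0, so its inode numbers pass through unchanged.
upper = map_ino(0x1234, fsid=0, xino_bits=8)

# A lower fs with fsid 2 gets its own range of the address space.
lower = map_ino(0x1234, fsid=2, xino_bits=8)

# An inode number that already occupies the high bits is not remapped.
overflow = map_ino(1 << 60, fsid=2, xino_bits=8)
```

Because the mapping depends only on the fsid and the real inode number, the resulting values stay the same across mounts for a given layer configuration.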
    • A
      ovl: allocate anon bdev per unique lower fs · 5148626b
      Committed by Amir Goldstein
      Instead of allocating an anonymous bdev per lower layer, allocate
      one anonymous bdev for every unique lower fs that differs from the
      upper fs.
      
      Every unique lower fs is assigned an fsid > 0 and the number of
      unique lower fs is stored in ofs->numlowerfs.
      
      The assigned fsid is stored in the lower layer struct and will be
      used also for inode number multiplexing.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      5148626b
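The fsid assignment described in 5148626b amounts to numbering the distinct filesystems behind the layers. The following sketch assumes filesystems can be identified by a simple label; `assign_fsids` and the device names are hypothetical and only illustrate the bookkeeping.

```python
def assign_fsids(upper_fs, lower_layer_fs):
    """Give the upper fs fsid 0 and each distinct other fs the next fsid.

    Returns (per_layer_fsids, numlowerfs), where numlowerfs counts the
    unique lower filesystems that differ from the upper fs.
    """
    fsids = {upper_fs: 0}
    per_layer = []
    for fs in lower_layer_fs:
        if fs not in fsids:
            fsids[fs] = len(fsids)   # next unused fsid, always > 0
        per_layer.append(fsids[fs])
    return per_layer, len(fsids) - 1

# Four lower layers spread over three filesystems, one shared with upper.
per_layer, numlowerfs = assign_fsids("sda1", ["sda1", "sdb1", "sdc1", "sdb1"])
```

In this example only two anonymous bdevs would be needed (one per unique lower fs other than the upper fs), instead of one per lower layer.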
    • A
      ovl: factor out ovl_map_dev_ino() helper · da309e8c
      Committed by Amir Goldstein
      A helper for ovl_getattr() to map the values of st_dev and st_ino
      according to constant st_ino rules.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      da309e8c
    • M
      ovl: cleanup ovl_update_time() · 8f35cf51
      Committed by Miklos Szeredi
      No need to mess with an alias, the upperdentry can be retrieved directly
      from the overlay inode.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      8f35cf51
    • M
      ovl: add WARN_ON() for non-dir redirect cases · 3a291774
      Committed by Miklos Szeredi
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      3a291774
    • V
      ovl: cleanup setting OVL_INDEX · 0471a9cd
      Committed by Vivek Goyal
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      0471a9cd
    • V
      ovl: set d->is_dir and d->opaque for last path element · 102b0d11
      Committed by Vivek Goyal
      Certain properties in ovl_lookup_data should be set only for the last
      element of the path. IOW, if we are calling ovl_lookup_single() for an
      absolute redirect, then d->is_dir and d->opaque do not make much sense
      for intermediate path elements. Instead, set them only if the dentry being
      looked up is the last path element.
      
      As of now we do not seem to be making use of d->opaque if it is set for
      a path/dentry in lower. But just define the semantics so that future code
      can make use of this assumption.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      102b0d11
    • V
      ovl: Do not check for redirect if this is last layer · e9b77f90
      Committed by Vivek Goyal
      If we are looking in the last layer, then there should not be any need to
      process redirects. Redirect information is used only for lookup in the next
      lower layer, and there is no more lower layer to look into. So no need
      to process redirects.
      
      IOW, ignore redirects on lowest layer.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      e9b77f90
    • A
      ovl: lookup in inode cache first when decoding lower file handle · 8b58924a
      Committed by Amir Goldstein
      When decoding a lower file handle, we need to check if lower file was
      copied up and indexed and if it has a whiteout index, we need to check
      if this is an unlinked but open non-dir before returning -ESTALE.
      
      To find out if this is an unlinked but open non-dir we need to lookup
      an overlay inode in inode cache by lower inode and that requires decoding
      the lower file handle before looking in inode cache.
      
      Before this change, if the lower inode turned out to be a directory, we
      may have paid an expensive cost to reconnect that lower directory for
      nothing.
      
      After this change, we start by decoding a disconnected lower dentry and
      using the lower inode for looking up an overlay inode in inode cache.
      If we find overlay inode and dentry in cache, we avoid the index lookup
      overhead. If we don't find an overlay inode and dentry in cache, then we
      only need to decode a connected lower dentry in case the lower dentry is
      a non-indexed directory.
      
      The xfstests group overlay/exportfs tests decoding overlayfs file
      handles after drop_caches with different states of the file at encode
      and decode time. Overall the tests in the group call ovl_lower_fh_to_d()
      89 times to decode a lower file handle.
      
      Before this change, the tests called ovl_get_index_fh() 75 times and
      reconnect_one() 61 times.
      After this change, the tests call ovl_get_index_fh() 70 times and
      reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided
      are cases where a non-upper directory file handle was encoded, then the
      directory removed and then file handle was decoded.
      
      To demonstrate the effect on decoding file handles with hot inode/dentry
      cache, the drop_caches call in the tests was disabled. Without
      drop_caches, there are no reconnect_one() calls at all before or after
      the change. Before the change, there are 75 calls to ovl_get_index_fh(),
      exactly as the case with drop_caches. After the change, there are only
      10 calls to ovl_get_index_fh().
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      8b58924a
    • A
      ovl: do not try to reconnect a disconnected origin dentry · 8a22efa1
      Committed by Amir Goldstein
      On lookup of a non-directory, we try to decode the origin file handle
      stored in upper inode. The origin file handle is supposed to be decoded
      to a disconnected non-dir dentry, which is fine, because we only need
      the lower inode of a copy up origin.
      
      However, if the origin file handle somehow turns out to be a directory
      we pay the expensive cost of reconnecting the directory dentry, only to
      get a mismatch file type and drop the dentry.
      
      Optimize this case by explicitly opting out of reconnecting the dentry.
      Opting-out of reconnect is done by passing a NULL acceptable callback
      to exportfs_decode_fh().
      
      While the case described above is a strange corner case that does not
      really need to be optimized, the API added for this optimization will
      be used by a following patch to optimize a more common case of decoding
      an overlayfs file handle.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      8a22efa1
    • A
      ovl: disambiguate ovl_encode_fh() · 5b2cccd3
      Committed by Amir Goldstein
      Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the
      exportfs function ovl_encode_inode_fh() and change the latter to
      ovl_encode_fh() to match the exportfs method name.
      
      Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      5b2cccd3
    • A
      ovl: set lower layer st_dev only if setting lower st_ino · 9f99e50d
      Committed by Amir Goldstein
      For broken hardlinks, we do not return lower st_ino, so we should
      also not return lower pseudo st_dev.
      
      Fixes: a0c5ad30 ("ovl: relax same fs constraint for constant st_ino")
      Cc: <stable@vger.kernel.org> #v4.15
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      9f99e50d
    • A
      ovl: fix lookup with middle layer opaque dir and absolute path redirects · 3ec9b3fa
      Committed by Amir Goldstein
      As of now if we encounter an opaque dir while looking for a dentry, we set
      d->last=true. This means that there is no need to look further in any of
      the lower layers. This works fine as long as there are no redirects or
      only relative redirects. But what if there is an absolute redirect on a
      child dentry of the opaque directory? We still need to continue to look
      into the next lower layer. This patch fixes it.
      
      Here is an example to demonstrate the issue. Say you have following setup.
      
      upper:  /redirect (redirect=/a/b/c)
      lower1: /a/[b]/c       ([b] is opaque) (c has absolute redirect=/a/b/d/)
      lower0: /a/b/d/foo
      
      Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
      Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look
      into lower0 because the child c has an absolute redirect.
      
      Following is a reproducer.
      
      Watch me make foo disappear:
      
       $ mkdir lower middle upper work work2 merged
       $ mkdir lower/origin
       $ touch lower/origin/foo
       $ mount -t overlay none merged/ \
               -olowerdir=lower,upperdir=middle,workdir=work2
       $ mkdir merged/pure
       $ mv merged/origin merged/pure/redirect
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
       $ mv merged/pure/redirect merged/redirect
      
      Now you see foo inside a twice redirected merged dir:
      
       $ ls merged/redirect
       foo
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
      
      After mount cycle you don't see foo inside the same dir:
      
       $ ls merged/redirect
      
      During middle layer lookup, the opaqueness of middle/pure is left in
      the lookup state and then middle/pure/redirect is wrongly treated as
      opaque.
      
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      3ec9b3fa
    • V
      ovl: Set d->last properly during lookup · 452061fd
      Committed by Vivek Goyal
      d->last signifies that this is the last layer we are looking into and there
      is no more. And that means this allows for some optimization opportunities
      during lookup. For example, in ovl_lookup_single() we don't have to check
      for opaque xattr of a directory if this is the last layer we are looking
      into (d->last = true).
      
      But knowing for sure whether we are looking into the last layer can be very
      tricky. If redirects are not enabled, then we can look at poe->numlower and
      figure out if the lookup we are about to do is in the last layer or not. But
      if redirects are enabled then it is possible poe->numlower suggests that we
      are looking in the last layer, but there is an absolute redirect present in
      the found element and that redirects us to a layer in root and that means
      lookup will continue in lower layers further.
      
      For example, consider following.
      
      /upperdir/pure (opaque=y)
      /upperdir/pure/foo (opaque=y,redirect=/bar)
      /lowerdir/bar
      
      In this case pure is "pure upper". When we look for "foo", that time
      poe->numlower=0. But that alone does not mean that we will not search for a
      merge candidate in /lowerdir. Absolute redirect changes that.
      
      IOW, d->last should not be set just based on poe->numlower if redirects are
      enabled. That can lead to setting d->last while it should not have and that
      means we will not check for opaque xattr while we should have.
      
      So do this.
      
       - If redirects are not enabled, then continue to rely on poe->numlower
         information to determine if it is last layer or not.
      
       - If redirects are enabled, then set d->last = true only if this is the
         last layer in root ovl_entry (roe).
      Suggested-by: Amir Goldstein <amir73il@gmail.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
      452061fd
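The two rules listed at the end of 452061fd can be written as a small decision function. This is a simplified sketch with hypothetical parameter names, not the kernel code: `parent_layers_remaining` stands in for what poe->numlower tells us, and `at_last_root_layer` for "this is the last layer in the root ovl_entry".

```python
def is_last_layer(redirects_enabled, parent_layers_remaining, at_last_root_layer):
    """Decide whether d->last may be set for the current lookup step."""
    if redirects_enabled:
        # An absolute redirect may send the lookup to any lower layer of
        # the root, so only the root's last layer is really the last one.
        return at_last_root_layer
    # Without redirects the parent's own lower layers are all that matter.
    return parent_layers_remaining == 0

# Example from the commit message: "pure" is pure upper, so the parent has
# no lower layers, yet /lowerdir still exists below the root.
with_redirects = is_last_layer(True, parent_layers_remaining=0,
                               at_last_root_layer=False)
without_redirects = is_last_layer(False, parent_layers_remaining=0,
                                  at_last_root_layer=False)
```

With redirects enabled the function does not treat the upper lookup of "foo" as the last one, so the opaque xattr is still checked and the redirect to /bar can be followed into /lowerdir.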
    • A
      ovl: set i_ino to the value of st_ino for NFS export · 695b46e7
      Committed by Amir Goldstein
      Eddie Horng reported that readdir of an overlayfs directory that
      was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN.
      The reason is that while preparing the response for readdirplus, nfsd
      checks inside encode_entryplus_baggage() that a child dentry's inode
      number matches the value of d_ino returned by overlayfs readdir iterator.
      
      Because the overlayfs inodes use arbitrary inode numbers that are not
      correlated with the values of st_ino/d_ino, NFSv3 falls back to not
      encoding d_type. Although this is an allowed behavior, we can fix it for
      the case of all overlayfs layers on the same underlying filesystem.
      
      When NFS export is enabled and d_ino is consistent with st_ino
      (samefs), set the same value also to i_ino in ovl_fill_inode() for all
      overlayfs inodes, so that nfsd readdirplus sanity checks will pass.
      ovl_fill_inode() may be called from ovl_new_inode(), before real inode
      was created with ino arg 0. In that case, i_ino will be updated to real
      upper inode i_ino on ovl_inode_init() or ovl_inode_update().
      Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
      Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Fixes: 8383f174 ("ovl: wire up NFS export operations")
      Cc: <stable@vger.kernel.org> #v4.16
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      695b46e7
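The readdirplus check described in 695b46e7 can be modelled in a few lines. This is a loose sketch of the behaviour as the commit message describes it, not nfsd code: `entry_d_type` is a hypothetical helper, and only the DT_UNKNOWN and DT_REG values are taken from the standard dirent definitions.

```python
DT_UNKNOWN = 0   # values as defined for struct dirent d_type
DT_REG = 8

def entry_d_type(d_ino, child_i_ino, child_type):
    """Report the child's type only when the inode number returned by
    readdir matches the inode number of the looked-up child."""
    return child_type if d_ino == child_i_ino else DT_UNKNOWN

# Before the fix: overlayfs i_ino is arbitrary and differs from d_ino.
before = entry_d_type(d_ino=1234, child_i_ino=98765, child_type=DT_REG)

# After the fix (samefs with NFS export): i_ino is set to the st_ino value.
after = entry_d_type(d_ino=1234, child_i_ino=1234, child_type=DT_REG)
```

Setting i_ino to the same value as st_ino/d_ino makes the comparison succeed, which is why the entries stop showing up as DT_UNKNOWN on the client.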