1. 03 12月, 2014 9 次提交
    • M
      Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 · 4245215d
      Miao Xie 提交于
      The commit c404e0dc (Btrfs: fix use-after-free in the finishing
      procedure of the device replace) fixed a use-after-free problem
      which happened when removing the source device at the end of device
      replace, but at that time, btrfs didn't support device replace
      on raid56, so we didn't fix the problem on the raid56 profile.
      Currently, we implemented device replace for raid56, so we need
      kick that problem out before we enable that function for raid56.
      
      The fix method is very simple, we just increase the bio per-cpu
      counter before we submit a raid56 io, and decrease the counter
      when the raid56 io ends.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      4245215d
    • M
      Btrfs, replace: write raid56 parity into the replace target device · 76035976
      Miao Xie 提交于
      This function reused the code of parity scrub, and we just write
      the right parity or corrected parity into the target device before
      the parity scrub end.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      76035976
    • M
      Btrfs, replace: write dirty pages into the replace target device · 2c8cdd6e
      Miao Xie 提交于
      The implementation is simple:
      - In order to avoid changing the code logic of btrfs_map_bio and
        RAID56, we add the stripes of the replace target devices at the
        end of the stripe array in btrfs bio, and we sort those target
        device stripes in the array. And we keep the number of the target
        device stripes in the btrfs bio.
      - Except write operation on RAID56, all the other operation don't
        take the target device stripes into account.
      - When we do write operation, we read the data from the common devices
        and calculate the parity. Then write the dirty data and new parity
        out, at this time, we will find the relative replace target stripes
        and wirte the relative data into it.
      
      Note: The function that copying old data on the source device to
      the target device was implemented in the past, it is similar to
      the other RAID type.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2c8cdd6e
    • M
      Btrfs, raid56: support parity scrub on raid56 · 5a6ac9ea
      Miao Xie 提交于
      The implementation is:
      - Read and check all the data with checksum in the same stripe.
        All the data which has checksum is COW data, and we are sure
        that it is not changed though we don't lock the stripe. because
        the space of that data just can be reclaimed after the current
        transction is committed, and then the fs can use it to store the
        other data, but when doing scrub, we hold the current transaction,
        that is that data can not be recovered, it is safe that read and check
        it out of the stripe lock.
      - Lock the stripe
      - Read out all the data without checksum and parity
        The data without checksum and the parity may be changed if we don't
        lock the stripe, so we need read it in the stripe lock context.
      - Check the parity
      - Re-calculate the new parity and write back it if the old parity
        is not right
      - Unlock the stripe
      
      If we can not read out the data or the data we read is corrupted,
      we will try to repair it. If the repair fails. we will mark the
      horizontal sub-stripe(pages on the same horizontal) as corrupted
      sub-stripe, and we will skip the parity check and repair of that
      horizontal sub-stripe.
      
      And in order to skip the horizontal sub-stripe that has no data, we
      introduce a bitmap. If there is some data on the horizontal sub-stripe,
      we will the relative bit to 1, and when we check and repair the
      parity, we will skip those horizontal sub-stripes that the relative
      bits is 0.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      5a6ac9ea
    • M
      Btrfs, raid56: use a variant to record the operation type · 1b94b556
      Miao Xie 提交于
      We will introduce new operation type later, if we still use integer
      variant as bool variant to record the operation type, we would add new
      variant and increase the size of raid bio structure. It is not good,
      by this patch, we define different number for different operation,
      and we can just use a variant to record the operation type.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      1b94b556
    • M
      Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d
      Miao Xie 提交于
      This patch implement the RAID5/6 common data repair function, the
      implementation is similar to the scrub on the other RAID such as
      RAID1, the differentia is that we don't read the data from the
      mirror, we use the data repair function of RAID5/6.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      af8e2d1d
    • M
      Btrfs, raid56: don't change bbio and raid_map · b89e1b01
      Miao Xie 提交于
      Because we will reuse bbio and raid_map during the scrub later, it is
      better that we don't change any variant of bbio and don't free it at
      the end of IO request. So we introduced similar variants into the raid
      bio, and don't access those bbio's variants any more.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      b89e1b01
    • Z
      Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block · 6de65650
      Zhao Lei 提交于
      stripe_index's value was set again in latter line:
      stripe_index = 0;
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      6de65650
    • Z
      Btrfs: remove noused bbio_ret in __btrfs_map_block in condition · f90523d1
      Zhao Lei 提交于
      bbio_ret in this condition is always !NULL because previous code
      already have a check-and-skip:
      4908 if (!bbio_ret)
      4909     goto out;
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      f90523d1
  2. 20 11月, 2014 10 次提交
    • M
      ovl: ovl_dir_fsync() cleanup · 7676895f
      Miklos Szeredi 提交于
      Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE.  For a copied up
      directory the two are currently equivalent.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      7676895f
    • M
      ovl: pass dentry into ovl_dir_read_merged() · c9f00fdb
      Miklos Szeredi 提交于
      Pass dentry into ovl_dir_read_merged() insted of upperpath and lowerpath.
      This cleans up callers and paves the way for multi-layer directory reads.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      c9f00fdb
    • M
      ovl: use lockless_dereference() for upperdentry · 71d50928
      Miklos Szeredi 提交于
      Don't open code lockless_dereference() in ovl_upperdentry_dereference().
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      71d50928
    • M
      ovl: allow filenames with comma · 91c77947
      Miklos Szeredi 提交于
      Allow option separator (comma) to be escaped with backslash.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      91c77947
    • M
      ovl: fix race in private xattr checks · 52148463
      Miklos Szeredi 提交于
      Xattr operations can race with copy up.  This does not matter as long as
      we consistently fiter out "trunsted.overlay.opaque" attribute on upper
      directories.
      
      Previously we checked parent against OVL_PATH_MERGE.  This is too general,
      and prone to race with copy-up.  I.e. we found the parent to be on the
      lower layer but ovl_dentry_real() would return the copied-up dentry,
      possibly with the "opaque" attribute.
      
      So instead use ovl_path_real() and decide to filter the attributes based on
      the actual type of the dentry we'll use.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      52148463
    • M
      ovl: fix remove/copy-up race · a105d685
      Miklos Szeredi 提交于
      ovl_remove_and_whiteout() needs to check if upper dentry exists or not
      after having locked upper parent directory.
      
      Previously we used a "type" value computed before locking the upper parent
      directory, which is susceptible to racing with copy-up.
      
      There's a similar check in ovl_check_empty_and_clear().  This one is not
      actually racy, since copy-up doesn't change the "emptyness" property of a
      directory.  Add a comment to this effect, and check the existence of upper
      dentry locally to make the code cleaner.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      a105d685
    • M
      ovl: rename filesystem type to "overlay" · ef94b186
      Miklos Szeredi 提交于
      Some distributions carry an "old" format of overlayfs while mainline has a
      "new" format.
      
      The distros will possibly want to keep the old overlayfs alongside the new
      for compatibility reasons.
      
      To make it possible to differentiate the two versions change the name of
      the new one from "overlayfs" to "overlay".
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reported-by: NSerge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      ef94b186
    • C
      btrfs: fix lockups from btrfs_clear_path_blocking · f82c458a
      Chris Mason 提交于
      The fair reader/writer locks mean that btrfs_clear_path_blocking needs
      to strictly follow lock ordering rules even when we already have
      blocking locks on a given path.
      
      Before we can clear a blocking lock on the path, we need to make sure
      all of the locks have been converted to blocking.  This will remove lock
      inversions against anyone spinning in write_lock() against the buffers
      we're trying to get read locks on.  These inversions didn't exist before
      the fair read/writer locks, but now we need to be more careful.
      
      We papered over this deadlock in the past by changing
      btrfs_try_read_lock() to be a true trylock against both the spinlock and
      the blocking lock.  This was slower, and not sufficient to fix all the
      deadlocks.  This patch adds a btrfs_tree_read_lock_atomic(), which
      basically means get the spinlock but trylock on the blocking lock.
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reported-by: NPatrick Schmid <schmid@phys.ethz.ch>
      cc: stable@vger.kernel.org #v3.15+
      f82c458a
    • A
      isofs: avoid unused function warning · 7ca2f234
      Arnd Bergmann 提交于
      With the isofs_hash() function removed, isofs_hash_ms() is the only user
      of isofs_hash_common(), but it's defined inside of an #ifdef, which triggers
      this gcc warning in ARM axm55xx_defconfig starting with v3.18-rc3:
      
      fs/isofs/inode.c:177:1: warning: 'isofs_hash_common' defined but not used [-Wunused-function]
      
      This patch moves the function inside of the same #ifdef section to avoid that
      warning, which seems the best compromise of a relatively harmless patch for
      a late -rc.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: b0afd8e5 ("isofs: don't bother with ->d_op for normal case")
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7ca2f234
    • Y
      vfs: fix reference leak in d_prune_aliases() · 4a7795d3
      Yan, Zheng 提交于
      In "d_prune_alias(): just lock the parent and call __dentry_kill()" the old
      dget + d_drop + dput has been replaced with lock_parent + __dentry_kill;
      unfortunately, dput() does more than just killing dentry - it also drops the
      reference to parent.  New variant leaks that reference and needs dput(parent)
      after killing the child off.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4a7795d3
  3. 14 11月, 2014 2 次提交
  4. 13 11月, 2014 11 次提交
  5. 07 11月, 2014 6 次提交
    • D
      xfs: track bulkstat progress by agino · 00275899
      Dave Chinner 提交于
      The bulkstat main loop progress is tracked by the "lastino"
      variable, which is a full 64 bit inode. However, the loop actually
      works on agno/agino pairs, and so there's a significant disconnect
      between the rest of the loop and the main cursor. Convert this to
      use the agino, and pass the agino into the chunk formatting function
      and convert it too.
      
      This gets rid of the inconsistency in the loop processing, and
      finally makes it simple for us to skip inodes at any point in the
      loop simply by incrementing the agino cursor.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      00275899
    • D
      xfs: bulkstat error handling is broken · febe3cbe
      Dave Chinner 提交于
      The error propagation is a horror - xfs_bulkstat() returns
      a rval variable which is only set if there are formatter errors. Any
      sort of btree walk error or corruption will cause the bulkstat walk
      to terminate but will not pass an error back to userspace. Worse
      is the fact that formatter errors will also be ignored if any inodes
      were correctly formatted into the user buffer.
      
      Hence bulkstat can fail badly yet still report success to userspace.
      This causes significant issues with xfsdump not dumping everything
      in the filesystem yet reporting success. It's not until a restore
      fails that there is any indication that the dump was bad and tha
      bulkstat failed. This patch now triggers xfsdump to fail with
      bulkstat errors rather than silently missing files in the dump.
      
      This now causes bulkstat to fail when the lastino cookie does not
      fall inside an existing inode chunk. The pre-3.17 code tolerated
      that error by allowing the code to move to the next inode chunk
      as the agino target is guaranteed to fall into the next btree
      record.
      
      With the fixes up to this point in the series, xfsdump now passes on
      the troublesome filesystem image that exposes all these bugs.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      febe3cbe
    • D
      xfs: bulkstat main loop logic is a mess · 6e57c542
      Dave Chinner 提交于
      There are a bunch of variables tha tare more wildy scoped than they
      need to be, obfuscated user buffer checks and tortured "next inode"
      tracking. This all needs cleaning up to expose the real issues that
      need fixing.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6e57c542
    • D
      xfs: bulkstat chunk-formatter has issues · 2b831ac6
      Dave Chinner 提交于
      The loop construct has issues:
      	- clustidx is completely unused, so remove it.
      	- the loop tries to be smart by terminating when the
      	  "freecount" tells it that all inodes are free. Just drop
      	  it as in most cases we have to scan all inodes in the
      	  chunk anyway.
      	- move the "user buffer left" condition check to the only
      	  point where we consume space int eh user buffer.
      	- move the initialisation of agino out of the loop, leaving
      	  just a simple loop control logic using the clusteridx.
      
      Also, double handling of the user buffer variables leads to problems
      tracking the current state - use the cursor variables directly
      rather than keeping local copies and then having to update the
      cursor before returning.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2b831ac6
    • D
      xfs: bulkstat chunk formatting cursor is broken · bf4a5af2
      Dave Chinner 提交于
      The xfs_bulkstat_agichunk formatting cursor takes buffer values from
      the main loop and passes them via the structure to the chunk
      formatter, and the writes the changed values back into the main loop
      local variables. Unfortunately, this complex dance is full of corner
      cases that aren't handled correctly.
      
      The biggest problem is that it is double handling the information in
      both the main loop and the chunk formatting function, leading to
      inconsistent updates and endless loops where progress is not made.
      
      To fix this, push the struct xfs_bulkstat_agichunk outwards to be
      the primary holder of user buffer information. this removes the
      double handling in the main loop.
      
      Also, pass the last inode processed by the chunk formatter as a
      separate parameter as it purely an output variable and is not
      related to the user buffer consumption cursor.
      
      Finally, the chunk formatting code is not shared by anyone, so make
      it local to xfs_itable.c.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      bf4a5af2
    • D
      xfs: bulkstat btree walk doesn't terminate · afa947cb
      Dave Chinner 提交于
      The bulkstat code has several different ways of detecting the end of
      an AG when doing a walk. They are not consistently detected, and the
      code that checks for the end of AG conditions is not consistently
      coded. Hence the are conditions where the walk code can get stuck in
      an endless loop making no progress and not triggering any
      termination conditions.
      
      Convert all the "tmp/i" status return codes from btree operations
      to a common name (stat) and apply end-of-ag detection to these
      operations consistently.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      afa947cb
  6. 06 11月, 2014 1 次提交
  7. 05 11月, 2014 1 次提交