1. 29 Nov, 2022 3 commits
    • xfs: add debug knob to slow down write for fun · 254e3459
      Darrick J. Wong committed
      Add a new error injection knob so that we can arbitrarily slow down
      pagecache writes to test for race conditions and aberrant reclaim
      behavior if the writeback mechanisms are slow to issue writeback.  This
      will enable functional testing for the ifork sequence counters
      introduced in commit 304a68b9 ("xfs: use iomap_valid method to
      detect stale cached iomaps") that fixes write racing with reclaim
      writeback.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      254e3459
    • xfs: add debug knob to slow down writeback for fun · c2beff99
      Darrick J. Wong committed
      Add a new error injection knob so that we can arbitrarily slow down
      writeback to test for race conditions and aberrant reclaim behavior if
      the writeback mechanisms are slow to issue writeback.  This will enable
      functional testing for the ifork sequence counters introduced in commit
      745b3f76 ("xfs: maintain a sequence count for inode fork
      manipulations").
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      c2beff99
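Both knobs above rest on the same idea: a tunable that is zero in normal operation and, once a test sets it, stalls one code path so that a race window stays open long enough to be hit. A minimal sketch of that idea follows. The `DelayKnob` class, its millisecond unit, and the injectable `sleep` callable are assumptions made for illustration and are not the XFS error injection interface.

```python
import time


class DelayKnob:
    """Hypothetical debug knob: 0 disables it, N > 0 stalls the caller for N ms."""

    def __init__(self, sleep=time.sleep):
        self.delay_ms = 0
        self._sleep = sleep  # injectable so a test does not have to really sleep

    def maybe_delay(self):
        """Stall if the knob is armed; return True when a delay was injected."""
        if self.delay_ms > 0:
            self._sleep(self.delay_ms / 1000.0)
            return True
        return False


def issue_writeback(knob, log):
    """Toy writeback path: the knob sits just before the work it slows down."""
    knob.maybe_delay()
    log.append("writeback issued")
```

A test would arm the knob, start the racing operation, and check that the slowed path still observes consistent state.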
    • xfs: drop write error injection is unfixable, remove it · 6e8af15c
      Dave Chinner committed
      With the changes to scan the page cache for dirty data to avoid data
      corruptions from partial write cleanup racing with other page cache
      operations, the drop writes error injection no longer works the same
      way it used to and causes xfs/196 to fail. This is because xfs/196
      writes to the file and populates the page cache before it turns on
      the error injection and starts failing -overwrites-.
      
      The result is that the original drop-writes code failed writes only
      -after- overwriting the data in the cache, followed by invalidating
      the cached data, then punching out the delalloc extent from under
      that data.
      
      On the surface, this looks fine. The problem is that page cache
      invalidation *doesn't guarantee that it removes anything from the
      page cache* and it doesn't change the dirty state of the folio. When
      block size == page size and we do page aligned IO (as xfs/196 does)
      everything happens to align perfectly and page cache invalidation
      removes the single page folios that span the written data. Hence the
      followup delalloc punch pass does not find cached data over that
      range and it can punch the extent out.
      
      IOWs, xfs/196 "works" for block size == page size with the new
      code. I say "works", because it actually only works for the case
      where IO is page aligned, and no data was read from disk before
      writes occur. Because the moment we actually read data first, the
      readahead code allocates multipage folios and suddenly the
      invalidate code goes back to zeroing subfolio ranges without
      changing dirty state.
      
      Hence, with multipage folios in play, block size == page size is
      functionally identical to block size < page size behaviour, and
      drop-writes is manifestly broken w.r.t. this case. Invalidation of
      a subfolio range doesn't result in the folio being removed from the
      cache, just the range gets zeroed. Hence after we've sequentially
      walked over a folio that we've dirtied (via write data) and then
      invalidated, we end up with a dirty folio full of zeroed data.
      
      And because the new code skips punching ranges that have dirty
      folios covering them, we end up leaving the delalloc range intact
      after failing all the writes. Hence failed writes now end up
      writing zeroes to disk in the cases where invalidation zeroes folios
      rather than removing them from cache.
      
      This is a fundamental change of behaviour that is needed to avoid
      the data corruption vectors that exist in the old write fail path,
      and it renders the drop-writes injection non-functional and
      unworkable as it stands.
      
      As it is, I think the error injection is also now unnecessary, as
      partial writes that need delalloc extent are going to be a lot more
      common with stale iomap detection in place. Hence this patch removes
      the drop-writes error injection completely. xfs/196 can remain for
      testing kernels that don't have this data corruption fix, but those
      that do will report:
      
      xfs/196 3s ... [not run] XFS error injection drop_writes unknown on this kernel.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      6e8af15c
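The difference between single-page and multipage folios described in this commit can be walked through with a toy model. Everything in the sketch is an assumption made for illustration (the `Folio` class, the 4096-byte page, the set of delalloc offsets); it models only the three behaviours the message names: invalidation removes a folio only when the range covers it entirely, a partial invalidation zeroes the range without clearing the dirty flag, and the punch skips offsets covered by a dirty folio.

```python
PAGE = 4096


class Folio:
    def __init__(self, start, npages):
        self.start = start
        self.end = start + npages * PAGE
        self.data = bytearray(b"\xaa" * (npages * PAGE))  # pretend cached file data
        self.dirty = False


def invalidate(cache, start, end):
    """Drop fully covered folios; only zero the overlap of partly covered ones."""
    for folio in list(cache):
        if start <= folio.start and folio.end <= end:
            cache.remove(folio)
        elif folio.start < end and start < folio.end:
            lo = max(start, folio.start) - folio.start
            hi = min(end, folio.end) - folio.start
            folio.data[lo:hi] = bytes(hi - lo)  # zeroed, dirty flag untouched


def punch_delalloc(cache, delalloc, start, end):
    """Remove delalloc offsets unless a dirty folio still covers them."""
    for off in range(start, end, PAGE):
        if not any(f.dirty and f.start <= off < f.end for f in cache):
            delalloc.discard(off)


def failed_write_pass(cache, npages):
    """Overwrite each page, then fail the write: invalidate and punch."""
    delalloc = set()
    for i in range(npages):
        start, end = i * PAGE, (i + 1) * PAGE
        folio = next(f for f in cache if f.start <= start < f.end)
        folio.data[start - folio.start:end - folio.start] = b"\x55" * PAGE
        folio.dirty = True
        delalloc.add(start)
        invalidate(cache, start, end)
        punch_delalloc(cache, delalloc, start, end)
    return delalloc
```

With four single-page folios the pass leaves no cached data and no delalloc offsets. With one four-page folio it leaves a dirty folio full of zeroes and all four delalloc offsets intact, which is the case the message says ends up writing zeroes to disk.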
  2. 21 Oct, 2022 1 commit
    • xfs: fix memory leak in xfs_errortag_init · cf4f4c12
      Zeng Heng committed
      When `xfs_sysfs_init` fails, `mp->m_errortag` needs to be freed.
      Otherwise kmemleak reports a memory leak after mounting an xfs image:
      
      unreferenced object 0xffff888101364900 (size 192):
        comm "mount", pid 13099, jiffies 4294915218 (age 335.207s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f08ad25c>] __kmalloc+0x41/0x1b0
          [<00000000dca9aeb6>] kmem_alloc+0xfd/0x430
          [<0000000040361882>] xfs_errortag_init+0x20/0x110
          [<00000000b384a0f6>] xfs_mountfs+0x6ea/0x1a30
          [<000000003774395d>] xfs_fs_fill_super+0xe10/0x1a80
          [<000000009cf07b6c>] get_tree_bdev+0x3e7/0x700
          [<00000000046b5426>] vfs_get_tree+0x8e/0x2e0
          [<00000000952ec082>] path_mount+0xf8c/0x1990
          [<00000000beb1f838>] do_mount+0xee/0x110
          [<000000000e9c41bb>] __x64_sys_mount+0x14b/0x1f0
          [<00000000f7bb938e>] do_syscall_64+0x3b/0x90
          [<000000003fcd67a9>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: c6840101 ("xfs: expose errortag knobs via sysfs")
      Signed-off-by: Zeng Heng <zengheng4@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      cf4f4c12
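The shape of this fix, releasing an allocation when a later initialisation step fails, can be shown with a small tracking allocator. This is only a model: `alloc`, `free`, `live_allocations`, and `errortag_init_model` are hypothetical names, and the `free_on_error` switch exists purely to contrast the leaking path with the fixed one.

```python
import errno

live_allocations = set()  # stands in for what kmemleak would track


def alloc(name):
    live_allocations.add(name)
    return name


def free(name):
    live_allocations.discard(name)


def errortag_init_model(sysfs_init, free_on_error=True):
    """Allocate the tag array, then register it; undo the allocation on failure."""
    tags = alloc("m_errortag")
    ret = sysfs_init()
    if ret:
        if free_on_error:
            free(tags)  # the fix: do not leave the array behind
        return ret
    return 0
```

Calling it with a failing `sysfs_init` and `free_on_error=False` leaves `"m_errortag"` in `live_allocations`, which corresponds to the unreferenced object in the report above.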
  3. 12 Oct, 2022 1 commit
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld committed
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81895a65
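Two pieces of arithmetic in the script above are easy to check by hand. The third pattern treats `((u64)(E) * get_random_u32() >> 32)` as equivalent to `prandom_u32_max(E)`, and the Python rule only rewrites a literal mask when `value & (value + 1) == 0`, that is, when the mask is a power of two minus one. The sketch below restates both in plain Python; `bounded_mulshift` and `mask_to_ceil` are names invented for this illustration.

```python
def bounded_mulshift(rand32, ceil):
    """Map a 32-bit random value into [0, ceil) with a multiply and a shift."""
    assert 0 <= rand32 < 2**32
    return (ceil * rand32) >> 32


def mask_to_ceil(mask):
    """Return mask + 1 when mask is a power of two minus one, else None."""
    if mask & (mask + 1) != 0:
        return None  # e.g. 0x5: masking with it is not a uniform range
    return mask + 1
```

For a mask such as `0xff` the rewritten call uses a ceiling of `0x100`. The masked form keeps the low bits of the random value while the multiply-and-shift form keeps the high bits, so the two generally return different numbers from the same input even though both cover the same range.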
  4. 11 May, 2022 3 commits
  5. 07 Jan, 2022 1 commit
  6. 20 Aug, 2021 1 commit
  7. 26 Mar, 2021 2 commits
  8. 23 Jan, 2021 2 commits
  9. 07 May, 2020 1 commit
  10. 12 Mar, 2020 2 commits
  11. 19 Nov, 2019 1 commit
  12. 06 Nov, 2019 1 commit
  13. 05 Nov, 2019 2 commits
  14. 27 Aug, 2019 1 commit
  15. 29 Jun, 2019 2 commits
  16. 12 Feb, 2019 2 commits
    • xfs: cache unlinked pointers in an rhashtable · 9b247179
      Darrick J. Wong committed
      Use a rhashtable to cache the unlinked list incore.  This should speed
      up unlinked processing considerably when there are a lot of inodes on
      the unlinked list because iunlink_remove no longer has to traverse an
      entire bucket list to find which inode points to the one being removed.
      
      The incore list structure records "X.next_unlinked = Y" relations, with
      the rhashtable using Y to index the records.  This makes finding the
      inode X that points to an inode Y very quick.  If our cache fails to find
      anything we can always fall back on the old method.
      
      FWIW this drastically reduces the amount of time it takes to remove
      inodes from the unlinked list.  I wrote a program to open a lot of
      O_TMPFILE files and then close them in the same order, which takes
      a very long time if we have to traverse the unlinked lists.  With the
      patch, I see:
      
      + /d/t/tmpfile/tmpfile
      Opened 193531 files in 6.33s.
      Closed 193531 files in 5.86s
      
      real    0m12.192s
      user    0m0.064s
      sys     0m11.619s
      + cd /
      + umount /mnt
      
      real    0m0.050s
      user    0m0.004s
      sys     0m0.030s
      
      And without the patch:
      
      + /d/t/tmpfile/tmpfile
      Opened 193588 files in 6.35s.
      Closed 193588 files in 751.61s
      
      real    12m38.853s
      user    0m0.084s
      sys     12m34.470s
      + cd /
      + umount /mnt
      
      real    0m0.086s
      user    0m0.000s
      sys     0m0.060s
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      9b247179
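The indexing scheme in this commit, recording "X.next_unlinked = Y" and keying the record by Y, can be sketched with an ordinary dictionary standing in for the rhashtable. The names `find_prev_by_walk` and `BackrefCache` are hypothetical, and the bucket list is modelled as a plain mapping from inode number to next inode number.

```python
def find_prev_by_walk(head, next_unlinked, target):
    """Old method: walk the bucket list from its head; returns (prev, steps)."""
    prev, cur, steps = None, head, 0
    while cur is not None and cur != target:
        prev, cur = cur, next_unlinked[cur]
        steps += 1
    return (prev if cur == target else None), steps


class BackrefCache:
    """Records 'X.next_unlinked = Y' relations, indexed by Y."""

    def __init__(self):
        self._by_next = {}

    def insert(self, x, y):
        self._by_next[y] = x

    def remove(self, y):
        self._by_next.pop(y, None)

    def lookup_prev(self, y):
        """Return the inode pointing at y, or None so the caller can fall back."""
        return self._by_next.get(y)
```

For a chain 1 → 2 → 3 → 4 the walk needs three steps to find that inode 3 points to inode 4, while `lookup_prev(4)` answers with a single lookup. A miss returns `None`, which matches the message's note that the old traversal remains available as a fallback.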
    • xfs: Introduce XFS_PTAG_VERIFIER_ERROR panic mask · d519da41
      Marco Benatto committed
      Currently we have a few PTAGs in place allowing us to transform a filesystem
      error into a BUG() call.  However, we don't have a panic tag for corrupt
      metadata, so introduce XFS_PTAG_VERIFIER_ERROR so that the administrator can
      use the fs.xfs.panic_mask sysctl knob to convert any error detected by buffer
      verifiers into a kernel panic.
      Signed-off-by: Marco Benatto <mbenatto@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      [darrick: light editing of commit message]
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d519da41
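A panic mask of this kind is a bit field: each tag owns one bit, and an error escalates only when the administrator has set that tag's bit. The sketch below shows the check in isolation. The two constants carry illustrative values chosen for this example and should not be read as the kernel's definitions; `should_panic` is a hypothetical helper.

```python
# Illustrative bit assignments only; not the values used by the kernel.
PTAG_EXAMPLE_OTHER = 0x01
PTAG_EXAMPLE_VERIFIER_ERROR = 0x02


def should_panic(panic_mask, tag):
    """True when the administrator has enabled escalation for this tag."""
    return bool(panic_mask & tag)
```

With a mask of `PTAG_EXAMPLE_VERIFIER_ERROR`, a verifier failure would escalate while an error reported under `PTAG_EXAMPLE_OTHER` would be handled normally.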
  17. 24 Jul, 2018 1 commit
  18. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner committed
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  19. 05 Jun, 2018 1 commit
  20. 16 May, 2018 1 commit
  21. 24 Mar, 2018 1 commit
  22. 29 Jan, 2018 1 commit
  23. 09 Jan, 2018 4 commits
  24. 07 Nov, 2017 1 commit
  25. 02 Nov, 2017 1 commit
  26. 27 Oct, 2017 1 commit
    • xfs: buffer lru reference count error injection tag · 7561d27e
      Brian Foster committed
      XFS uses a fixed reference count for certain types of buffers in the
      internal LRU cache. These reference counts dictate how aggressively
      certain buffers are reclaimed vs. others. While the reference counts
      implement priority across different buffer types, all buffers
      (other than uncached buffers) are typically cached for at least one
      reclaim cycle.
      
      We've had at least one bug recently that has been hidden by a
      released buffer sitting around in the LRU. Users hitting the problem
      were able to reproduce under enough memory pressure to cause
      aggressive reclaim in a particular window of time.
      
      To support future xfstests cases, add an error injection tag to
      hardcode the buffer reference count to zero. When enabled, this
      bypasses caching of associated buffers and facilitates test cases
      that depend on this behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      7561d27e
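The effect of forcing the reference count to zero can be seen in a toy LRU. The `BufCache` class and its method names are assumptions for illustration; the model captures only what the message states: a released buffer with a non-zero count stays cached until reclaim has counted it down, and the injection tag bypasses caching altogether.

```python
class BufCache:
    """Toy buffer LRU: a released buffer survives one reclaim pass per count."""

    def __init__(self, inject_zero_ref=False):
        self.inject_zero_ref = inject_zero_ref  # models the error injection tag
        self.lru = {}  # buffer name -> remaining LRU reference count

    def release(self, name, lru_ref):
        if self.inject_zero_ref:
            lru_ref = 0  # hardcode the count to zero, as the tag does
        if lru_ref > 0:
            self.lru[name] = lru_ref
            return "cached"
        return "freed"

    def reclaim_pass(self):
        """Decrement every count and free the buffers that reach zero."""
        freed = []
        for name in list(self.lru):
            self.lru[name] -= 1
            if self.lru[name] == 0:
                del self.lru[name]
                freed.append(name)
        return freed
```

Without injection, a buffer released with a count of 2 is still cached after the first reclaim pass, which is the window in which a use-after-release bug can stay hidden. With injection enabled the buffer is freed on release, so a test hits the problem without needing memory pressure.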
  27. 26 Sep, 2017 1 commit