1. 30 Oct, 2019 10 commits
  2. 21 Oct, 2019 1 commit
  3. 06 Sep, 2019 1 commit
    • xfs: prevent CIL push holdoff in log recovery · 8ab39f11
      Dave Chinner authored
      generic/530 on a machine with enough ram and a non-preemptible
      kernel can run the AGI processing phase of log recovery entirely out
      of cache. This means it never blocks on locks, never waits for IO
      and runs entirely through the unlinked lists until it either
      completes or blocks and hangs because it has run out of log space.
      
      It runs out of log space because the background CIL push is
      scheduled but never runs. queue_work() queues the CIL work on the
      current CPU that is busy, and the workqueue code will not run it on
      any other CPU. Hence if the unlinked list processing never yields
      the CPU voluntarily, the push work is delayed indefinitely. This
      results in the CIL aggregating changes until all the log space is
      consumed.
      
      When the log recovery processing eventually blocks, the CIL flushes,
      but the last iclog isn't submitted for IO because it isn't full, so
      the CIL flush never completes. Nothing ever moves the log head
      forwards, or indeed inserts anything into the tail of the log, and
      hence nothing is able to get the log moving again and recovery
      hangs.
      
      There are several problems here, but the two obvious ones from
      the trace are that:
      	a) log recovery does not yield the CPU for over 4 seconds,
      	b) binding CIL pushes to a single CPU is a really bad idea.
      
      This patch addresses just these two aspects of the problem, and is
      suitable for backporting to work around any issues in older kernels.
      The more fundamental problem of preventing the CIL from consuming
      more than 50% of the log without committing will take more invasive
      and complex work, so will be done as followup work.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. 30 Aug, 2019 1 commit
    • fs: Fill in max and min timestamps in superblock · 22b13969
      Deepa Dinamani authored
      Fill in the appropriate limits to avoid inconsistencies
      in the vfs cached inode times when timestamps are
      outside the permitted range.
      
      Even though some filesystems are read-only, fill in the
      timestamps to reflect the on-disk representation.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Tigran Aivazian <aivazian.tigran@gmail.com>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Cc: aivazian.tigran@gmail.com
      Cc: al@alarsen.net
      Cc: coda@cs.cmu.edu
      Cc: darrick.wong@oracle.com
      Cc: dushistov@mail.ru
      Cc: dwmw2@infradead.org
      Cc: hch@infradead.org
      Cc: jack@suse.com
      Cc: jaharkes@cs.cmu.edu
      Cc: luisbg@kernel.org
      Cc: nico@fluxnic.net
      Cc: phillip@squashfs.org.uk
      Cc: richard@nod.at
      Cc: salah.triki@gmail.com
      Cc: shaggy@kernel.org
      Cc: linux-xfs@vger.kernel.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: reiserfs-devel@vger.kernel.org
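
To illustrate why the limits matter, here is a minimal sketch of clamping a timestamp into a representable range. The helper name `clamp_timestamp` and the signed 32-bit limits are assumptions chosen for the example; the real limits differ per filesystem and the kernel does this in C, not shell.

```shell
#!/bin/bash
# Hypothetical helper: clamp a timestamp (seconds since the epoch) into
# the range [min, max] that a filesystem can store on disk.
clamp_timestamp() {
    local ts=$1 min=$2 max=$3
    if (( ts < min )); then
        echo "$min"
    elif (( ts > max )); then
        echo "$max"
    else
        echo "$ts"
    fi
}

# Assumed limits for the example: signed 32-bit seconds.
T_MIN=$(( -(1 << 31) ))
T_MAX=$(( (1 << 31) - 1 ))

# A timestamp in the year 2100 is beyond the assumed maximum,
# so the upper limit is printed instead.
clamp_timestamp 4102444800 "$T_MIN" "$T_MAX"
```

With limits recorded in the superblock, cached inode times can be kept consistent with what the on-disk format is able to hold.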
  5. 29 Jun, 2019 3 commits
  6. 12 Jun, 2019 1 commit
  7. 02 May, 2019 1 commit
  8. 30 Apr, 2019 1 commit
  9. 27 Apr, 2019 2 commits
  10. 17 Apr, 2019 1 commit
  11. 07 Apr, 2019 1 commit
    • block: remove CONFIG_LBDAF · 72deb455
      Christoph Hellwig authored
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
      
      Dropping this option removes a somewhat weird non-default config that
      has caused various bugs or compiler warnings when actually used.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
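
The 2 TiB figure quoted above follows from a 32-bit sector count combined with 512-byte sectors. The arithmetic can be checked with a short sketch; the function name `max_tib_for_sector_bits` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: largest size, in TiB, addressable with a sector
# counter of the given width, assuming 512-byte sectors. Only meaningful
# for widths small enough not to overflow bash's 64-bit arithmetic.
max_tib_for_sector_bits() {
    local bits=$1
    local sector_bytes=512
    # (2^bits sectors) * 512 bytes, converted to TiB (2^40 bytes).
    echo $(( ((1 << bits) * sector_bytes) >> 40 ))
}

max_tib_for_sector_bits 32   # prints 2
```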
  12. 21 Feb, 2019 1 commit
    • xfs: introduce an always_cow mode · 66ae56a5
      Christoph Hellwig authored
      Add a mode where XFS never overwrites existing blocks in place.  This
      is to aid debugging our COW code, and also to put infrastructure in
      place for things like possible future support for zoned block devices,
      which can't support overwrites.
      
      This mode is enabled globally by doing a:
      
          echo 1 > /sys/fs/xfs/debug/always_cow
      
      Note that the parameter is global to allow running all tests in xfstests
      easily in this mode, which would not easily be possible with a per-fs
      sysfs file.
      
      In always_cow mode persistent preallocations are disabled, and fallocate
      will fail when called with a 0 mode (with or without
      FALLOC_FL_KEEP_SIZE), and will not create unwritten extents for zeroed
      space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.
      
      There are a few interesting xfstests failures when run in always_cow
      mode:
      
       - generic/392 fails because the bytes used in the file used to test
         hole punch recovery are less after the log replay.  This is
         because the blocks written and then punched out are only freed
         with a delay due to the logging mechanism.
       - xfs/170 will fail as the already fragile file streams mechanism
         doesn't seem to interact well with the COW allocator
       - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
         the file system is badly fragmented, but there is not much we
         can do to avoid that when always writing out of place
       - xfs/205 fails because overwriting a file in always_cow mode
         will require new space allocation, so the assumptions in the
         test no longer hold.
       - xfs/326 fails to modify the file at all in always_cow mode after
         injecting the refcount error, leading to an unexpected md5sum
         after the remount, but that again is expected
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
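
Because the knob is global, a test run has to switch it on beforehand and off again afterwards. Below is a minimal sketch of a wrapper around the `echo` shown in the commit message. The function name and the `XFS_DEBUG_DIR` override are hypothetical, added only so the helper can be pointed at a scratch directory; the default path is the one from the commit, and the file is probably only present on debug builds of XFS.

```shell
#!/bin/bash
# Hypothetical helper: write a value to the global always_cow knob.
# XFS_DEBUG_DIR is an assumed override for experimentation; by default
# the sysfs path quoted in the commit message is used.
set_always_cow() {
    local knob="${XFS_DEBUG_DIR:-/sys/fs/xfs/debug}/always_cow"
    if [ ! -w "$knob" ]; then
        # Missing or read-only knob: report and fail instead of guessing.
        echo "always_cow knob not writable: $knob" >&2
        return 1
    fi
    echo "$1" > "$knob"
}
```

One way to use it would be `set_always_cow 1` before an xfstests run and `set_always_cow 0` once the run has finished.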
  13. 15 Feb, 2019 1 commit
  14. 13 Dec, 2018 2 commits
  15. 18 Oct, 2018 2 commits
  16. 30 Jul, 2018 1 commit
  17. 27 Jul, 2018 1 commit
  18. 12 Jul, 2018 1 commit
  19. 09 Jun, 2018 1 commit
  20. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner authored
      Remove the verbose license text from XFS files and replace it
      with SPDX tags. This does not change the license of any of the code,
      it merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
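
After a scripted conversion like this one, it is useful to confirm that no file was missed. The sketch below is not part of the original change; the function name is hypothetical, and it only looks at `.c` and `.h` files, so hand-edited files such as a Makefile would need a separate check.

```shell
#!/bin/bash
# Hypothetical helper: list C sources and headers under a directory
# that do not contain an SPDX tag anywhere in the file.
list_missing_spdx() {
    # grep -L prints the names of files with no matching line. Its exit
    # status is ignored because it varies between grep versions.
    find "$1" -type f -name '*.[ch]' \
        -exec grep -L 'SPDX-License-Identifier' {} + || true
}
```

Running `list_missing_spdx fs/xfs` from the top of a kernel tree should print nothing if every matching file carries a tag.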
  21. 31 May, 2018 3 commits
  22. 16 May, 2018 3 commits
    • xfs: clear sb->s_fs_info on mount failure · c9fbd7bb
      Dave Chinner authored
      We recently had an oops reported on a 4.14 kernel in
      xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
      and so the m_perag_tree lookup walked into lala land.
      
      Essentially, the machine was under memory pressure when the mount
      was being run, xfs_fs_fill_super() failed after allocating the
      xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and
      freed the xfs_mount, but the sb->s_fs_info field still pointed to
      the freed memory. Hence when the superblock shrinker then ran
      it fell off the bad pointer.
      
      With the superblock shrinker problem fixed at the VFS level, this
      stale s_fs_info pointer is still a problem - we use it
      unconditionally in ->put_super when the superblock is being torn
      down, and hence we can still trip over it after a ->fill_super
      call failure. Hence we need to clear s_fs_info if
      xfs_fs_fill_super() fails, and we need to check if it's valid in
      the places it can potentially be dereferenced after a ->fill_super
      failure.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: add mount delay debug option · dae5cd81
      Dave Chinner authored
      Similar to log_recovery_delay, this delay occurs between the VFS
      superblock being initialised and the xfs_mount being fully
      initialised. It also poisons the per-ag radix tree node so that it
      can be used for triggering shrinker races during mount
      such as the following:
      
      <run memory pressure workload in background>
      
      $ cat dirty-mount.sh
      #! /bin/bash
      
      umount -f /dev/pmem0
      mkfs.xfs -f /dev/pmem0
      mount /dev/pmem0 /mnt/test
      rm -f /mnt/test/foo
      xfs_io -fxc "pwrite 0 4k" -c fsync -c "shutdown" /mnt/test/foo
      umount /dev/pmem0
      
      # let's crash it now!
      echo 30 > /sys/fs/xfs/debug/mount_delay
      mount /dev/pmem0 /mnt/test
      echo 0 > /sys/fs/xfs/debug/mount_delay
      umount /dev/pmem0
      $ sudo ./dirty-mount.sh
      .....
      [   60.378118] CPU: 3 PID: 3577 Comm: fs_mark Tainted: G      D W        4.16.0-rc5-dgc #440
      [   60.378120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   60.378124] RIP: 0010:radix_tree_next_chunk+0x76/0x320
      [   60.378127] RSP: 0018:ffffc9000276f4f8 EFLAGS: 00010282
      [   60.383670] RAX: a5a5a5a5a5a5a5a4 RBX: 0000000000000010 RCX: 000000000000001a
      [   60.385277] RDX: 0000000000000000 RSI: ffffc9000276f540 RDI: 0000000000000000
      [   60.386554] RBP: 0000000000000000 R08: 0000000000000000 R09: a5a5a5a5a5a5a5a5
      [   60.388194] R10: 0000000000000006 R11: 0000000000000001 R12: ffffc9000276f598
      [   60.389288] R13: 0000000000000040 R14: 0000000000000228 R15: ffff880816cd6458
      [   60.390827] FS:  00007f5c124b9740(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
      [   60.392253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   60.393423] CR2: 00007f5c11bba0b8 CR3: 000000035580e001 CR4: 00000000000606e0
      [   60.394519] Call Trace:
      [   60.395252]  radix_tree_gang_lookup_tag+0xc4/0x130
      [   60.395948]  xfs_perag_get_tag+0x37/0xf0
      [   60.396522]  xfs_reclaim_inodes_count+0x32/0x40
      [   60.397178]  xfs_fs_nr_cached_objects+0x11/0x20
      [   60.397837]  super_cache_count+0x35/0xc0
      [   60.399159]  shrink_slab.part.66+0xb1/0x370
      [   60.400194]  shrink_node+0x7e/0x1a0
      [   60.401058]  try_to_free_pages+0x199/0x470
      [   60.402081]  __alloc_pages_slowpath+0x3a1/0xd20
      [   60.403729]  __alloc_pages_nodemask+0x1c3/0x200
      [   60.404941]  cache_grow_begin+0x20b/0x2e0
      [   60.406164]  fallback_alloc+0x160/0x200
      [   60.407088]  kmem_cache_alloc+0x111/0x4e0
      [   60.408038]  ? xfs_buf_rele+0x61/0x430
      [   60.408925]  kmem_zone_alloc+0x61/0xe0
      [   60.409965]  xfs_inode_alloc+0x24/0x1d0
      .....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
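
The reproducer above sets `mount_delay` and resets it by hand, so a mount that fails part-way could leave the delay active for later mounts. Here is a minimal sketch of a wrapper that always restores the knob. The function name and the `XFS_DEBUG_DIR` override are hypothetical; the default path is the one used in the script above.

```shell
#!/bin/bash
# Hypothetical helper: run a command with mount_delay set to the given
# number of seconds, then reset the knob to 0 whatever the outcome.
# XFS_DEBUG_DIR is an assumed override so the helper can be tried
# against a scratch directory instead of sysfs.
with_mount_delay() {
    local knob="${XFS_DEBUG_DIR:-/sys/fs/xfs/debug}/mount_delay"
    local secs=$1
    shift
    echo "$secs" > "$knob" || return 1
    local rc=0
    "$@" || rc=$?          # remember the command's exit status
    echo 0 > "$knob"       # always switch the delay back off
    return "$rc"
}
```

In the reproducer this would replace the three lines around the second mount with `with_mount_delay 30 mount /dev/pmem0 /mnt/test`.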
    • xfs: halt auto-reclamation activities while rebuilding rmap · d6b636eb
      Darrick J. Wong authored
      Rebuilding the reverse-mapping tree requires us to quiesce all inodes in
      the filesystem, so we must stop background reclamation of post-EOF and
      CoW prealloc blocks.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>