1. 29 March 2023 (1 commit)
    • xfs, iomap: limit individual ioend chain lengths in writeback · c5883137
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc3
      commit ebb7fb15
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb7fb1557b1d03b906b668aa2164b51e6b7d19a
      
      --------------------------------
      
      Trond Myklebust reported soft lockups in XFS IO completion such as
      this:
      
       watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
       CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
       Workqueue: xfs-conv/md127 xfs_end_io [xfs]
       RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
       Call Trace:
        wake_up_page_bit+0x8a/0x110
        iomap_finish_ioend+0xd7/0x1c0
        iomap_finish_ioends+0x7f/0xb0
        xfs_end_ioend+0x6b/0x100 [xfs]
        xfs_end_io+0xb9/0xe0 [xfs]
        process_one_work+0x1a7/0x360
        worker_thread+0x1fa/0x390
        kthread+0x116/0x130
        ret_from_fork+0x35/0x40
      
      Ioends are processed as an atomic completion unit when all the
      chained bios in the ioend have completed their IO. Logically
      contiguous ioends can also be merged and completed as a single,
      larger unit. Both of these things can be problematic because the
      bio chains per ioend and the size of the merged ioends processed
      as a single completion are unbound.
      
      If we have a large sequential dirty region in the page cache,
      write_cache_pages() will keep feeding us sequential pages and we
      will keep mapping them into ioends and bios until we get a dirty
      page at a non-sequential file offset. These large sequential runs
      will result in bio and ioend chaining to optimise the IO
      patterns. The pages under writeback are pinned within these chains
      until the submission chaining is broken, allowing the entire chain
      to be completed. This can result in huge chains being processed
      in IO completion context.
      
      We get deep bio chaining if we have large contiguous physical
      extents. We will keep adding pages to the current bio until it is
      full, then we'll chain a new bio to keep adding pages for writeback.
      Hence we can build bio chains that map millions of pages and tens of
      gigabytes of RAM if the page cache contains big enough contiguous
      dirty file regions. This long bio chain pins those pages until the
      final bio in the chain completes and the ioend can iterate all the
      chained bios and complete them.
      
      OTOH, if we have a physically fragmented file, we end up submitting
      one ioend per physical fragment that each have a small bio or bio
      chain attached to them. We do not chain these at IO submission time,
      but instead we chain them at completion time based on file
      offset via iomap_ioend_try_merge(). Hence we can end up with unbound
      ioend chains being built via completion merging.
      
      XFS can then do COW remapping or unwritten extent conversion on that
      merged chain, which involves walking an extent fragment at a time
      and running a transaction to modify the physical extent information.
      IOWs, we merge all the discontiguous ioends together into a
      contiguous file range, only to then process them individually as
      discontiguous extents.
      
      This extent manipulation is computationally expensive and can run
      in a tight loop, so merging logically contiguous but physically
      discontiguous ioends gains us nothing except hiding the fact that
      we broke the ioends up into individual physical extents at
      submission and then need to loop over those individual physical
      extents at completion.
      
      Hence we need mechanisms to limit ioend sizes and to break up
      completion processing of large merged ioend chains; all three
      mechanisms are sketched in code after this list:
      
      1. bio chains per ioend need to be bound in length. Pure overwrites
      go straight to iomap_finish_ioend() in softirq context with the
      exact bio chain attached to the ioend by submission. Hence the only
      way to prevent long holdoffs here is to bound ioend submission
      sizes because we can't reschedule in softirq context.
      
      2. iomap_finish_ioends() has to handle unbound merged ioend chains
      correctly. This relies on any one call to iomap_finish_ioend() being
      bound in runtime so that cond_resched() can be issued regularly as
      the long ioend chain is processed. I.e. this relies on mechanism
      #1 limiting individual ioend sizes in order to work correctly.
      
      3. filesystems have to loop over the merged ioends to process
      physical extent manipulations. This means they can loop internally,
      and so we break merging at physical extent boundaries so the
      filesystem can easily insert reschedule points between individual
      extent manipulations.
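      
      The following C sketch illustrates all three mechanisms. It is a
      simplified reading of the upstream commit ebb7fb15, not the
      verbatim patch: IOEND_BATCH_SIZE and the contiguity checks follow
      the upstream naming, the io_folios counter is a plain page count
      on pre-folio kernels such as this one, and secondary merge checks
      (error status, shared/unwritten flags) are elided.
      
      /* Submit an ioend once it tracks this many pages/folios. */
      #define IOEND_BATCH_SIZE        4096
      
      /*
       * Mechanism #1: refuse to extend the current ioend past the batch
       * size, forcing submission of a new, bounded ioend. This bounds
       * the bio chain a softirq completion has to walk.
       */
      static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc,
                      loff_t offset, sector_t sector)
      {
              if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
                      return false;
              if (sector != bio_end_sector(wpc->ioend->io_bio))
                      return false;
              if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE)
                      return false;
              return true;
      }
      
      /*
       * Mechanism #2: walk a merged chain one bounded ioend at a time,
       * rescheduling between ioends so a long chain cannot hold the CPU.
       */
      void iomap_finish_ioends(struct iomap_ioend *ioend, int error)
      {
              struct list_head tmp;
      
              list_replace_init(&ioend->io_list, &tmp);
              iomap_finish_ioend(ioend, error);
      
              while (!list_empty(&tmp)) {
                      ioend = list_first_entry(&tmp, struct iomap_ioend,
                                      io_list);
                      list_del_init(&ioend->io_list);
                      iomap_finish_ioend(ioend, error);
                      cond_resched();
              }
      }
      
      /*
       * Mechanism #3: only merge ioends at completion time if they are
       * physically contiguous, so per-extent filesystem processing can
       * reschedule between extents.
       */
      static bool iomap_ioend_can_merge(struct iomap_ioend *ioend,
                      struct iomap_ioend *next)
      {
              if (ioend->io_offset + ioend->io_size != next->io_offset)
                      return false;
              if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector)
                      return false;
              return true;
      }
      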
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Conflicts:
      	include/linux/iomap.h
      	fs/iomap/buffered-io.c
      	fs/xfs/xfs_aops.c
      
      	[ 6e552494 ("iomap: remove unused private field from ioend")
      	  is not applied.
      	  95c4cd05 ("iomap: Convert to_iomap_page to take a folio") is
      	  not applied.
      	  8ffd74e9 ("iomap: Convert bio completions to use folios") is
      	  not applied.
      	  044c6449 ("xfs: drop unused ioend private merge and
      	  setfilesize code") is not applied. ]
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  2. 18 January 2023 (1 commit)
  3. 27 September 2022 (1 commit)
  4. 15 November 2021 (1 commit)
  5. 30 October 2021 (1 commit)
  6. 21 October 2021 (1 commit)
  7. 19 October 2021 (1 commit)
    • mm/swap: consider max pages in iomap_swapfile_add_extent · c32d79ae
      Committed by Xu Yu
      stable inclusion
      from stable-5.10.65
      commit 9295566a136cc26a3f6166eff6e2f5525432f73b
      bugzilla: 182361 https://gitee.com/openeuler/kernel/issues/I4EH3U
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9295566a136cc26a3f6166eff6e2f5525432f73b
      
      --------------------------------
      
      [ Upstream commit 36ca7943 ]
      
      When the max pages (last_page in the swap header + 1) is smaller than
      the total pages (inode size) of the swapfile, iomap_swapfile_activate
      overwrites sis->max with total pages.
      
      However, frontswap_map is a swap page state bitmap allocated using the
      initial sis->max page count read from the swap header.  If swapfile
      activation increases sis->max, it's possible for the frontswap code to
      walk off the end of the bitmap, thereby corrupting kernel memory.
      
      [djwong: modify the description a bit; the original paragraph reads:
      
      "However, frontswap_map is allocated using max pages. When test and clear
      the sis offset, which is larger than max pages, of frontswap_map in
      __frontswap_invalidate_page(), neighbors of frontswap_map may be
      overwritten, i.e., slab is polluted."
      
      Note also that this bug resulted in a behavioral change: activating a
      swap file that was formatted and later extended results in all pages
      being activated, not the number of pages recorded in the swap header.]
      
      Fix the issue by clamping the number of pages added in
      iomap_swapfile_add_extent() to the max page count recorded in the
      swap info.
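      
      The core of the clamp in iomap_swapfile_add_extent(), roughly
      (following the upstream commit; the extent rounding around it is
      elided):
      
              /*
               * Don't add pages beyond sis->max: the frontswap bitmap
               * was sized from the swap header's page count, so going
               * past it walks off the end of that allocation.
               */
              if (unlikely(isi->nr_pages >= isi->sis->max))
                      return 0;
              max_pages = isi->sis->max - isi->nr_pages;
      
              /* ... compute first_ppage/next_ppage as before ... */
      
              nr_pages = next_ppage - first_ppage;
              nr_pages = min(nr_pages, max_pages);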
      
      To reproduce the case, compile kernel with slub RED ZONE, then run test:
      $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \
       --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
       --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
       --temp-path /root/tmpdir/stress-ng/
      
      We'll get the error log as below:
      
      [ 1151.015141] =============================================================================
      [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
      [ 1151.017486] -----------------------------------------------------------------------------
      [ 1151.017486]
      [ 1151.018997] Disabling lock debugging due to kernel taint
      [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
      [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
      [ 1151.022538]  __slab_alloc+0xe/0x20
      [ 1151.023069]  __kmalloc_node+0xfd/0x4b0
      [ 1151.023704]  __do_sys_swapon+0xcf6/0x1170
      [ 1151.024346]  do_syscall_64+0x33/0x40
      [ 1151.024925]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
      [ 1151.026889]  kfree+0x276/0x2b0
      [ 1151.027405]  put_cred_rcu+0xa1/0xc0
      [ 1151.027949]  rcu_do_batch+0x17d/0x410
      [ 1151.028566]  rcu_core+0x14e/0x2b0
      [ 1151.029084]  __do_softirq+0x101/0x29e
      [ 1151.029645]  asm_call_irq_on_stack+0x12/0x20
      [ 1151.030381]  do_softirq_own_stack+0x37/0x40
      [ 1151.031037]  do_softirq.part.15+0x2b/0x30
      [ 1151.031710]  __local_bh_enable_ip+0x4b/0x50
      [ 1151.032412]  copy_fpstate_to_sigframe+0x111/0x360
      [ 1151.033197]  __setup_rt_frame+0xce/0x480
      [ 1151.033809]  arch_do_signal+0x1a3/0x250
      [ 1151.034463]  exit_to_user_mode_prepare+0xcf/0x110
      [ 1151.035242]  syscall_exit_to_user_mode+0x27/0x190
      [ 1151.035970]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
      [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
      [ 1151.038323]
      [ 1151.039683] Redzone  000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
      [ 1151.041180] Object   000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [ 1151.042714] Redzone  0000000084e43932: 00 00 00 c0 cc cc cc cc                          ........
      [ 1151.044120] Padding  000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G    B             5.10.50+ #7
      [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [ 1151.048633] Call Trace:
      [ 1151.049072]  dump_stack+0x57/0x6a
      [ 1151.049585]  check_bytes_and_report+0xed/0x110
      [ 1151.050320]  check_object+0x1eb/0x290
      [ 1151.050924]  ? __x64_sys_swapoff+0x39a/0x540
      [ 1151.051646]  free_debug_processing+0x151/0x350
      [ 1151.052333]  __slab_free+0x21a/0x3a0
      [ 1151.052938]  ? _cond_resched+0x2d/0x40
      [ 1151.053529]  ? __vunmap+0x1de/0x220
      [ 1151.054139]  ? __x64_sys_swapoff+0x39a/0x540
      [ 1151.054796]  ? kfree+0x276/0x2b0
      [ 1151.055307]  kfree+0x276/0x2b0
      [ 1151.055832]  __x64_sys_swapoff+0x39a/0x540
      [ 1151.056466]  do_syscall_64+0x33/0x40
      [ 1151.057084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1151.057866] RIP: 0033:0x150340b0ffb7
      [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
      [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
      [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
      [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
      [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
      [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
      [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
      [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
      [ 1151.066413]
      [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
      
      Fixes: 67482129 ("iomap: add a swapfile activation function")
      Fixes: a45c0ecc ("iomap: move the swapfile code into a separate file")
      Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  8. 15 October 2021 (2 commits)
  9. 14 July 2021 (1 commit)
  10. 22 April 2021 (1 commit)
    • iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate · 9b07e3cc
      Committed by Ritesh Harjani
      stable inclusion
      from stable-5.10.28
      commit 4eff80b14014508134a1ae84ac03ad18d0a3dee7
      bugzilla: 51779
      
      --------------------------------
      
      [ Upstream commit 5808fecc ]
      
      If isi.nr_pages is 0, we make sis->pages (which is an unsigned
      int) a huge value in iomap_swapfile_activate() by assigning -1.
      This could cause a kernel crash on kernel v4.18 (with the
      signature below), or could lead to unknown issues on recent
      kernels if the fake big swap gets used.
      
      Fix this issue by returning -EINVAL when nr_pages is 0, since such
      a swapfile is invalid anyway. This issue is likely to be hit with
      a pagesize < blocksize type of configuration.
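      
      The shape of the fix at the end of iomap_swapfile_activate(),
      roughly (per the upstream commit):
      
              /*
               * If the file has no usable page-aligned extents at all,
               * fail activation instead of letting the unsigned
               * sis->pages wrap to a huge value via nr_pages - 1.
               */
              if (isi.nr_pages == 0)
                      return -EINVAL;
      
              *pagespan = 1 + isi.highest_ppage - isi.lowest_ppage;
              sis->max = isi.nr_pages;
              sis->pages = isi.nr_pages - 1;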
      
      I was able to hit the issue with a tiny swap file using the test
      script below:
      https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/scripts/swap-issue.sh
      
      kernel crash analysis on v4.18
      
      ==============================
      On the v4.18 kernel, this causes a kernel panic, since sis->pages
      becomes a huge value while isi.nr_extents is 0. When 0 is
      returned, the file is treated as a swapfile over NFS and SWP_FILE
      is set (sis->flags |= SWP_FILE). swapoff then calls
      a_ops->swap_deactivate() when (sis->flags & SWP_FILE) is true.
      Since a_ops->swap_deactivate() is NULL for XFS, this causes the
      panic below.
      
      Panic signature on v4.18 kernel:
      =======================================
      root@qemu:/home/qemu# [ 8291.723351] XFS (loop2): Unmounting Filesystem
      [ 8292.123104] XFS (loop2): Mounting V5 Filesystem
      [ 8292.132451] XFS (loop2): Ending clean mount
      [ 8292.263362] Adding 4294967232k swap on /mnt1/test/swapfile.  Priority:-2 extents:1 across:274877906880k
      [ 8292.277834] Unable to handle kernel paging request for instruction fetch
      [ 8292.278677] Faulting instruction address: 0x00000000
      cpu 0x19: Vector: 400 (Instruction Access) at [c0000009dd5b7ad0]
          pc: 0000000000000000
          lr: c0000000003eb9dc: destroy_swap_extents+0xfc/0x120
          sp: c0000009dd5b7d50
         msr: 8000000040009033
        current = 0xc0000009b6710080
        paca    = 0xc00000003ffcb280   irqmask: 0x03   irq_happened: 0x01
          pid   = 5604, comm = swapoff
      Linux version 4.18.0 (riteshh@xxxxxxx) (gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04)) #57 SMP Wed Mar 3 01:33:04 CST 2021
      enter ? for help
      [link register   ] c0000000003eb9dc destroy_swap_extents+0xfc/0x120
      [c0000009dd5b7d50] c0000000025a7058 proc_poll_event+0x0/0x4 (unreliable)
      [c0000009dd5b7da0] c0000000003f0498 sys_swapoff+0x3f8/0x910
      [c0000009dd5b7e30] c00000000000bbe4 system_call+0x5c/0x70
      Exception: c01 (System Call) at 00007ffff7d208d8
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      [djwong: rework the comment to provide more details]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  11. 05 November 2020 (2 commits)
    • iomap: clean up writeback state logic on writepage error · 50e7d6c7
      Committed by Brian Foster
      The iomap writepage error handling logic is a mash of old and
      slightly broken XFS writepage logic. When keepwrite writeback state
      tracking was introduced in XFS in commit 0d085a52 ("xfs: ensure
      WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
      additional cluster writeback context that scanned ahead of
      ->writepage() to process dirty pages over the current ->writepage()
      extent mapping. This context expected a dirty page and required
      retention of the TOWRITE tag on partial page processing so the
      higher level writeback context would revisit the page (in contrast
      to ->writepage(), which passes a page with the dirty bit already
      cleared).
      
      The cluster writeback mechanism was eventually removed and some of
      the error handling logic folded into the primary writeback path in
      commit 150d5be0 ("xfs: remove xfs_cancel_ioend"). This patch
      accidentally conflated the two contexts by using the keepwrite logic
      in ->writepage() without accounting for the fact that the page is
      not dirty. Further, the keepwrite logic has no practical effect on
      the core ->writepage() caller (write_cache_pages()) because it never
      revisits a page in the current function invocation.
      
      Technically, the page should be redirtied for the keepwrite logic to
      have any effect. Otherwise, write_cache_pages() may find the tagged
      page but will skip it since it is clean. Even if the page was
      redirtied, however, there is still no practical effect to keepwrite
      since write_cache_pages() does not wrap around within a single
      invocation of the function. Therefore, the dirty page would simply
      end up retagged on the next writeback sequence over the associated
      range.
      
      All that being said, none of this really matters because redirtying
      a partially processed page introduces a potential infinite redirty
      -> writeback failure loop that deviates from the current design
      principle of clearing the dirty state on writepage failure to avoid
      building up too much dirty, unreclaimable memory on the system.
      Therefore, drop the spurious keepwrite usage and dirty state
      clearing logic from iomap_writepage_map(), treat the partially
      processed page the same as a fully processed page, and let the
      imminent ioend failure clean up the writeback state.
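      
      After this cleanup (combined with the partial-discard change in
      the next entry), the error leg of iomap_writepage_map() reduces
      to roughly the following sketch:
      
              if (unlikely(error)) {
                      /*
                       * No keepwrite tagging and no redirtying: tell
                       * the filesystem what failed to map, and let the
                       * imminent ioend failure clean up the writeback
                       * state.
                       */
                      if (wpc->ops->discard_page)
                              wpc->ops->discard_page(page, file_offset);
                      if (!count) {
                              /* page was never added to an ioend */
                              ClearPageUptodate(page);
                              unlock_page(page);
                              goto done;
                      }
              }
      
              set_page_writeback(page);
              unlock_page(page);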
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • iomap: support partial page discard on writeback block mapping failure · 763e4cdc
      Committed by Brian Foster
      iomap writeback mapping failure only calls into ->discard_page() if
      the current page has not been added to the ioend. Accordingly, the
      XFS callback assumes a full page discard and invalidation. This is
      problematic for sub-page block size filesystems where some portion
      of a page might have been mapped successfully before a failure to
      map a delalloc block occurs. ->discard_page() is not called in that
      error scenario and the bio is explicitly failed by iomap via the
      error return from ->prepare_ioend(). As a result, the filesystem
      leaks delalloc blocks and corrupts the filesystem block counters.
      
      Since XFS is the only user of ->discard_page(), tweak the semantics
      to invoke the callback unconditionally on mapping errors and provide
      the file offset that failed to map. Update xfs_discard_page() to
      discard the corresponding portion of the file and pass the range
      along to iomap_invalidatepage(). The latter already properly handles
      both full and sub-page scenarios by not changing any iomap or page
      state on sub-page invalidations.
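      
      A sketch of the reworked XFS callback (simplified; the delalloc
      punching detail is elided):
      
              static void
              xfs_discard_page(
                      struct page     *page,
                      loff_t          fileoff)
              {
                      unsigned int    pageoff = offset_in_page(fileoff);
      
                      /*
                       * Punch out the delalloc blocks backing the
                       * failed portion of the page (elided), then
                       * invalidate only from the failed offset onward;
                       * earlier blocks were mapped successfully and
                       * stay under writeback.
                       */
                      iomap_invalidatepage(page, pageoff,
                                      PAGE_SIZE - pageoff);
              }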
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  12. 28 September 2020 (3 commits)
  13. 21 September 2020 (10 commits)
  14. 10 September 2020 (4 commits)
  15. 24 August 2020 (1 commit)
  16. 06 August 2020 (2 commits)
    • iomap: fall back to buffered writes for invalidation failures · 60263d58
      Committed by Christoph Hellwig
      Failing to invalidate the page cache means data is incoherent,
      which is a very bad state for the system.  Always fall back to
      buffered I/O through the page cache if we can't invalidate
      mappings.
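      
      In iomap_dio_rw(), the change amounts to roughly this sketch
      (where end is pos + count - 1):
      
              /*
               * Try to invalidate cache pages for the range we are
               * writing. If this fails, return -ENOTBLK so the caller
               * falls back to buffered I/O rather than proceeding with
               * an incoherent page cache.
               */
              if (invalidate_inode_pages2_range(mapping,
                              pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
                      ret = -ENOTBLK;
                      goto out_free_dio;
              }
      
      Filesystem callers then treat -ENOTBLK as "retry buffered", e.g.
      in xfs_file_write_iter():
      
              ret = xfs_file_dio_aio_write(iocb, from);
              if (ret != -ENOTBLK)
                      return ret;
              return xfs_file_buffered_aio_write(iocb, from);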
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
    • iomap: Only invalidate page cache pages on direct IO writes · 54752de9
      Committed by Dave Chinner
      The historic requirement for XFS to invalidate cached pages on
      direct IO reads has been lost in the twisty pages of history - it was
      inherited from Irix, which implemented page cache invalidation on
      read as a method of working around problems synchronising page
      cache state with uncached IO.
      
      XFS has carried this ever since. In the initial linux ports it was
      necessary to get mmap and DIO to play "ok" together and not
      immediately corrupt data. This was the state of play until the linux
      kernel had infrastructure to track unwritten extents and synchronise
      page faults with allocations and unwritten extent conversions
      (->page_mkwrite infrastructure). IOWs, the page cache invalidation
      on DIO read was necessary to prevent trivial data corruptions.
      This didn't solve all the problems, though.
      
      There were performance problems if we didn't invalidate the entire
      page cache over the file on read - we couldn't easily determine if
      the cached pages were over the range of the IO, and invalidation
      required taking a serialising lock (i_mutex) on the inode. This
      serialising lock was an issue for XFS, as it was the only
      exclusive lock in the direct IO read path.
      
      Hence if there were any cached pages, we'd just invalidate the
      entire file in one go so that subsequent IOs didn't need to take the
      serialising lock. This was a problem that prevented ranged
      invalidation from being particularly useful for avoiding the
      remaining coherency issues. This was solved with the conversion of
      i_mutex to i_rwsem and the conversion of the XFS inode IO lock to
      use i_rwsem. Hence we could now just do ranged invalidation and the
      performance problem went away.
      
      However, page cache invalidation was still needed to serialise
      sub-page/sub-block zeroing via direct IO against buffered IO because
      bufferhead state attached to the cached page could get out of whack
      when direct IOs were issued.  We've removed bufferheads from the
      XFS code, and we don't carry any extent state on the cached pages
      anymore, and so this problem has gone away, too.
      
      IOWs, it would appear that we don't have any good reason to be
      invalidating the page cache on DIO reads anymore. Hence remove the
      invalidation on read because it is unnecessary overhead,
      not needed to maintain coherency between mmap/buffered access and
      direct IO anymore, and prevents anyone from using direct IO reads
      from intentionally invalidating the page cache of a file.
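      
      The result, sketched: invalidation in iomap_dio_rw() is gated on
      the IO direction (the failure handling shown here predates the
      buffered-write fallback in the previous entry):
      
              if (iov_iter_rw(iter) == WRITE) {
                      /*
                       * Only writes need the cached range invalidated;
                       * reads just write back any dirty overlapping
                       * pages and leave the page cache alone.
                       */
                      ret = invalidate_inode_pages2_range(mapping,
                                      pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
                      if (ret)
                              dio_warn_stale_pagecache(iocb->ki_filp);
                      ret = 0;
              }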
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  17. 07 July 2020 (1 commit)
  18. 09 June 2020 (1 commit)
  19. 04 June 2020 (4 commits)
  20. 03 June 2020 (1 commit)