1. 21 10月, 2020 3 次提交
    • B
      gfs2: Eliminate gl_vm · 23cfb0c3
      Bob Peterson 提交于
      The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4
      ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores
      the location of resource groups within their address space.  This structure is
      in a union with iopen glock specific fields.  It was introduced because at
      unmount time, the resource group objects were destroyed before flushing out any
      pending resource group glock work, and flushing out such work could require
      flushing / truncating the address space.
      
      Since commit b3422cac ("gfs2: Rework how rgrp buffer_heads are managed"),
      any pending resource group glock work is flushed out before destroying the
      resource group objects.  So the resource group objects will now always exist in
      rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values
      where needed instead of caching them.  This also eliminates the union.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      23cfb0c3
    • B
      gfs2: Only access gl_delete for iopen glocks · 2ffed529
      Bob Peterson 提交于
      Only initialize gl_delete for iopen glocks, but more importantly, only access
      it for iopen glocks in flush_delete_work: flush_delete_work is called for
      different types of glocks including rgrp glocks, and those use gl_vm which is
      in a union with gl_delete.  Without this fix, we'll end up clobbering gl_vm,
      which results in general memory corruption.
      
      Fixes: a0e3cc65 ("gfs2: Turn gl_delete into a delayed work")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      2ffed529
    • B
      gfs2: Fix comments to glock_hash_walk · dbffb29d
      Bob Peterson 提交于
      The comments before function glock_hash_walk had the wrong name and
      an extra parameter. This simply fixes the comments.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      dbffb29d
  2. 15 10月, 2020 23 次提交
    • B
      gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) · e2c6c8a7
      Bob Peterson 提交于
      Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
      when a glock had a holder queued. It was only checked for inode glocks,
      although set and cleared by all glocks, and it was only used to determine
      whether the glock should be held for the minimum hold time before releasing.
      
      The problem is that the flag is not accurate at all. If a process holds
      the glock, the flag is set. When they dequeue the glock, it only cleared
      the flag in cases when the state actually changed. So if the state doesn't
      change, the flag may still be set, even when nothing is queued.
      
      This happens to iopen glocks often: the get held in SH, then the file is
      closed, but the glock remains in SH mode.
      
      We don't need a special flag to indicate this: we can simply tell whether
      the glock has any items queued to the holders queue. It's a waste of cpu
      time to maintain it.
      
      This patch eliminates the flag in favor of simply checking list_empty
      on the glock holders.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      e2c6c8a7
    • B
      gfs2: Ignore journal log writes for jdata holes · b2a846db
      Bob Peterson 提交于
      When flushing out its ail1 list, gfs2_write_jdata_page calls function
      __block_write_full_page passing in function gfs2_get_block_noalloc.
      But there was a problem when a process wrote to a jdata file, then
      truncated it or punched a hole, leaving references to the blocks within
      the new hole in its ail list, which are to be written to the journal log.
      
      In writing them to the journal, after calling gfs2_block_map, function
      gfs2_get_block_noalloc determined that the (hole-punched) block was not
      mapped, so it returned -EIO to generic_writepages, which passed it back
      to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
      there was a real IO error writing to the journal.
      
      This might be a valid error when writing metadata to the journal, but for
      journaled data writes, it does not warrant a withdraw.
      
      This patch adds a check to function gfs2_block_map that makes an exception
      for journaled data writes that correspond to jdata holes: If the iomap
      get function returns a block type of IOMAP_HOLE, it instead returns
      -ENODATA which does not cause the withdraw. Other errors are returned as
      before.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      b2a846db
    • B
      gfs2: simplify gfs2_block_map · a6645745
      Bob Peterson 提交于
      Function gfs2_block_map had a lot of redundancy between its create and
      no_create paths. This patch simplifies the code to eliminate the redundancy.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      a6645745
    • B
      gfs2: Only set PageChecked if we have a transaction · 6302d6f4
      Bob Peterson 提交于
      With jdata writes, we frequently got into situations where gfs2 deadlocked
      because of this calling sequence:
      
      gfs2_ail1_start
         gfs2_ail1_flush - for every tr on the sd_ail1_list:
            gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
               generic_writepages
      	    write_cache_pages passing __writepage()
      	       calls clear_page_dirty_for_io which calls set_page_dirty:
      	          which calls jdata_set_page_dirty which sets PageChecked.
      	       __writepage() calls
      	          mapping->a_ops->writepage AKA gfs2_jdata_writepage
      
      However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
      ignores the write and redirties the page. The problem is that write_cache_pages
      calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
      in page-writeback.c starting with "Yes, Virginia". If it's jdata,
      set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
      That causes a conflict because it makes it look like the page has been
      redirtied by another writer, in which case we need to skip writing it and
      redirty the page. That ends up in a deadlock because it isn't a "real" writer
      and nothing will ever clear PageChecked.
      
      If we do have a real writer, it will have started a transaction. So this
      patch checks if a transaction is in use, and if not, it skips setting
      PageChecked. That way, the page will be dirtied, cleaned, and written
      appropriately.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      6302d6f4
    • B
      gfs2: don't lock sd_ail_lock in gfs2_releasepage · 249ffe18
      Bob Peterson 提交于
      Patch 380f7c65 changed gfs2_releasepage
      so that it held the sd_ail_lock spin_lock for most of its processing.
      It did this for some mysterious undocumented bug somewhere in the
      evict code path. But in the nine years since, evict has been reworked
      and fixed many times, and so have the transactions and ail list.
      I can't see a reason to hold the sd_ail_lock unless it's protecting
      the actual ail lists hung off the transactions. Therefore, this patch
      removes the locking to increase speed and efficiency, and to further help
      us rework the log flush code to be more concurrent with transactions.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      249ffe18
    • B
      gfs2: make gfs2_ail1_empty_one return the count of active items · 36c78309
      Bob Peterson 提交于
      This patch is one baby step toward simplifying the journal management.
      It simply changes function gfs2_ail1_empty_one from a void to an int and
      makes it return a count of active items. This allows the caller to check
      the return code rather than list_empty on the tr_ail1_list. This way
      we can, in a later patch, combine transaction ail1 and ail2 lists.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      36c78309
    • B
      gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe · 68942870
      Bob Peterson 提交于
      Before this patch, when blocks were freed, it called gfs2_meta_wipe to
      take the metadata out of the pending journal blocks. It did this mostly
      by calling another function called gfs2_remove_from_journal. This is
      shortsighted because it does not do anything with jdata blocks which
      may also be in the journal.
      
      This patch expands the function so that it wipes out jdata blocks from
      the journal as well, and it wipes it from the ail1 list if it hasn't
      been written back yet. Since it now processes jdata blocks as well,
      the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
      
      New function gfs2_ail1_wipe wants a static view of the ail list, so it
      locks the sd_ail_lock when removing items. To accomplish this, function
      gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
      the caller's responsibility to do so.
      
      I was going to make sd_ail_lock locking conditional, but the practice is
      generally frowned upon. For details, see: https://lwn.net/Articles/109066/Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      68942870
    • B
      gfs2: enhance log_blocks trace point to show log blocks free · 97c5e43d
      Bob Peterson 提交于
      This patch adds some code to enhance the log_blocks trace point. It
      reports the number of free log blocks. This makes the trace point much
      more useful, especially for debugging performance problems when we can
      tell when the journal gets full and needs to wait for flushes, etc.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      97c5e43d
    • B
      gfs2: add missing log_blocks trace points in gfs2_write_revokes · 77650bdb
      Bob Peterson 提交于
      Function gfs2_write_revokes was incrementing and decrementing the number
      of log blocks free, but there was never a log_blocks trace point for it.
      Thus, the free blocks from a log_blocks trace would jump around
      mysteriously.
      
      This patch adds the missing trace points so the trace makes more sense.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      77650bdb
    • B
      gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm · 21b6924b
      Bob Peterson 提交于
      Since the function is only used for writing jdata pages, this patch
      simply renames function gfs2_write_full_page to a more appropriate
      name: gfs2_write_jdata_page. This makes the code easier to understand.
      
      The function was only called in one place, which passed in a pointer to
      function gfs2_get_block_noalloc. The function doesn't need to be
      passed in. Therefore, this also eliminates the unnecessary parameter
      to increase efficiency.
      
      I also took the liberty of cleaning up the function comments.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      21b6924b
    • A
      gfs2: add validation checks for size of superblock · 0ddc5154
      Anant Thazhemadam 提交于
      In gfs2_check_sb(), no validation checks are performed with regards to
      the size of the superblock.
      syzkaller detected a slab-out-of-bounds bug that was primarily caused
      because the block size for a superblock was set to zero.
      A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
      Performing validation checks and ensuring that the size of the superblock
      is valid fixes this bug.
      
      Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Suggested-by: NAndrew Price <anprice@redhat.com>
      Signed-off-by: NAnant Thazhemadam <anant.thazhemadam@gmail.com>
      [Minor code reordering.]
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      0ddc5154
    • J
      gfs2: use-after-free in sysfs deregistration · c2a04b02
      Jamie Iles 提交于
      syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:
      
        Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228
      
        CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
        Hardware name: linux,dummy-virt (DT)
        Workqueue: events kobject_delayed_cleanup
        Call trace:
         dump_backtrace+0x0/0x4d8
         show_stack+0x34/0x48
         dump_stack+0x174/0x1f8
         print_address_description.constprop.0+0x5c/0x550
         kasan_report+0x13c/0x1c0
         __asan_report_load1_noabort+0x34/0x60
         memcmp+0xd0/0xd8
         gfs2_uevent+0xc4/0x188
         kobject_uevent_env+0x54c/0x1240
         kobject_uevent+0x2c/0x40
         __kobject_del+0x190/0x1d8
         kobject_delayed_cleanup+0x2bc/0x3b8
         process_one_work+0x96c/0x18c0
         worker_thread+0x3f0/0xc30
         kthread+0x390/0x498
         ret_from_fork+0x10/0x18
      
        Allocated by task 1110:
         kasan_save_stack+0x28/0x58
         __kasan_kmalloc.isra.0+0xc8/0xe8
         kasan_kmalloc+0x10/0x20
         kmem_cache_alloc_trace+0x1d8/0x2f0
         alloc_super+0x64/0x8c0
         sget_fc+0x110/0x620
         get_tree_bdev+0x190/0x648
         gfs2_get_tree+0x50/0x228
         vfs_get_tree+0x84/0x2e8
         path_mount+0x1134/0x1da8
         do_mount+0x124/0x138
         __arm64_sys_mount+0x164/0x238
         el0_svc_common.constprop.0+0x15c/0x598
         do_el0_svc+0x60/0x150
         el0_svc+0x34/0xb0
         el0_sync_handler+0xc8/0x5b4
         el0_sync+0x15c/0x180
      
        Freed by task 228:
         kasan_save_stack+0x28/0x58
         kasan_set_track+0x28/0x40
         kasan_set_free_info+0x24/0x48
         __kasan_slab_free+0x118/0x190
         kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0x6c/0x210
         kfree+0x13c/0x460
      
      Use the same pattern as f2fs + ext4 where the kobject destruction must
      complete before allowing the FS itself to be freed.  This means that we
      need an explicit free_sbd in the callers.
      
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NJamie Iles <jamie@nuviainc.com>
      [Also go to fail_free when init_names fails.]
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      c2a04b02
    • A
      gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump · 0e539ca1
      Andrew Price 提交于
      When an rindex entry is found to be corrupt, compute_bitstructs() calls
      gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:
      
          gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
      
      gfs2_rgrp_dump then dereferences the gl without checking it and we get
      
          BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280
      
      because there's no rgrp glock involved while reading the rindex on mount.
      
      Fix this by changing gfs2_rgrp_dump to take an rgrp argument.
      
      Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
      Signed-off-by: NAndrew Price <anprice@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      0e539ca1
    • C
      gfs2: use iomap for buffered I/O in ordered and writeback mode · 2164f9b9
      Christoph Hellwig 提交于
      Switch to using the iomap readpage and writepage helpers for all I/O in
      the ordered and writeback modes, and thus eliminate using buffer_heads
      for I/O in these cases.  The journaled data mode is left untouched.
      
      (Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
      to set_page_dirty instead of accidentally leaving the page / buffer clean.)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      2164f9b9
    • B
      gfs2: call truncate_inode_pages_final for address space glocks · ee1e2c77
      Bob Peterson 提交于
      Before this patch, we were not calling truncate_inode_pages_final for the
      address space for glocks, which left the possibility of a leak. We now
      take care of the problem instead of complaining, and we do it during
      glock tear-down..
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      ee1e2c77
    • B
      gfs2: simplify the logic in gfs2_evict_inode · 0a0d9f55
      Bob Peterson 提交于
      Now that we've factored out the deleted and undeleted dinode cases
      in gfs2_evict_inode, we can greatly simplify the logic. Now the
      function is easy to read and understand.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      0a0d9f55
    • B
      gfs2: factor evict_linked_inode out of gfs2_evict_inode · d90be6ab
      Bob Peterson 提交于
      Now that we've factored out the delete-dinode case to simplify
      gfs2_evict_inode, we take it a step further and factor out the other
      case: where we don't delete the inode.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      d90be6ab
    • B
      gfs2: further simplify gfs2_evict_inode with new func evict_should_delete · 53dbc27e
      Bob Peterson 提交于
      This patch further simplifies function gfs2_evict_inode() by adding a
      new function evict_should_delete. The function may also lock the inode
      glock.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      53dbc27e
    • B
      gfs2: factor evict_unlinked_inode out of gfs2_evict_inode · 6e7e9a50
      Bob Peterson 提交于
      Function gfs2_evict_inode is way too big, complex and unreadable. This
      is a baby step toward breaking it apart to be more readable. It factors
      out the portion that deletes the online bits for a dinode that is
      unlinked and needs to be deleted. A future patch will factor out more.
      (If I factor out too much, the patch itself becomes unreadable).
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      6e7e9a50
    • B
      gfs2: rename variable error to ret in gfs2_evict_inode · 23d828fc
      Bob Peterson 提交于
      Function gfs2_evict_inode is too big and unreadable. This patch is just
      a baby step toward improving that. This first step just renames variable
      error to ret. This will help make future patches more readable.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      23d828fc
    • L
      gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro · e8a8023e
      Liu Shixin 提交于
      Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
      Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      e8a8023e
    • B
      gfs2: Fix bad comment for trans_drain · 521031fa
      Bob Peterson 提交于
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      521031fa
    • A
      gfs2: Make sure we don't miss any delayed withdraws · 5a61ae14
      Andreas Gruenbacher 提交于
      Commit ca399c96 changes gfs2_log_flush to not withdraw the
      filesystem while holding the log flush lock, but it fails to check if
      the filesystem needs to be withdrawn once the log flush lock has been
      released.  Likewise, commit f05b86db depends on gfs2_log_flush to
      trigger for delayed withdraws.  Add that and clean up the code flow
      somewhat.
      
      In gfs2_put_super, add a check for delayed withdraws that have been
      missed to prevent these kinds of bugs in the future.
      
      Fixes: ca399c96 ("gfs2: flesh out delayed withdraw for gfs2_log_flush")
      Fixes: f05b86db ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write")
      Cc: stable@vger.kernel.org # v5.7+: 462582b9: gfs2: add some much needed cleanup for log flushes that fail
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      5a61ae14
  3. 12 10月, 2020 10 次提交
  4. 11 10月, 2020 3 次提交
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · da690031
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Some more driver bugfixes for I2C. Including a revert - the updated
        series for it will come during the next merge window"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: owl: Clear NACK and BUS error bits
        Revert "i2c: imx: Fix reset of I2SR_IAL flag"
        i2c: meson: fixup rate calculation with filter delay
        i2c: meson: keep peripheral clock enabled
        i2c: meson: fix clock setting overwrite
        i2c: imx: Fix reset of I2SR_IAL flag
      da690031
    • V
      cifs: Fix incomplete memory allocation on setxattr path · 64b7f674
      Vladimir Zapolskiy 提交于
      On setxattr() syscall path due to an apprent typo the size of a dynamically
      allocated memory chunk for storing struct smb2_file_full_ea_info object is
      computed incorrectly, to be more precise the first addend is the size of
      a pointer instead of the wanted object size. Coincidentally it makes no
      difference on 64-bit platforms, however on 32-bit targets the following
      memcpy() writes 4 bytes of data outside of the dynamically allocated memory.
      
        =============================================================================
        BUG kmalloc-16 (Not tainted): Redzone overwritten
        -----------------------------------------------------------------------------
      
        Disabling lock debugging due to kernel taint
        INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
        INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
        INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
      
        Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
        Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
        Redzone 79e69a6f: 73 68 32 0a                                      sh2.
        Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
         dump_stack+0x54/0x6e
         print_trailer+0x12c/0x134
         check_bytes_and_report.cold+0x3e/0x69
         check_object+0x18c/0x250
         free_debug_processing+0xfe/0x230
         __slab_free+0x1c0/0x300
         kfree+0x1d3/0x220
         smb2_set_ea+0x27d/0x540
         cifs_xattr_set+0x57f/0x620
         __vfs_setxattr+0x4e/0x60
         __vfs_setxattr_noperm+0x4e/0x100
         __vfs_setxattr_locked+0xae/0xd0
         vfs_setxattr+0x4e/0xe0
         setxattr+0x12c/0x1a0
         path_setxattr+0xa4/0xc0
         __ia32_sys_lsetxattr+0x1d/0x20
         __do_fast_syscall_32+0x40/0x70
         do_fast_syscall_32+0x29/0x60
         do_SYSENTER_32+0x15/0x20
         entry_SYSENTER_32+0x9f/0xf2
      
      Fixes: 5517554e ("cifs: Add support for writing attributes on SMB2+")
      Signed-off-by: NVladimir Zapolskiy <vladimir@tuxera.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64b7f674
    • H
      mm/khugepaged: fix filemap page_to_pgoff(page) != offset · 033b5d77
      Hugh Dickins 提交于
      There have been elusive reports of filemap_fault() hitting its
      VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
      with CONFIG_READ_ONLY_THP_FOR_FS=y.
      
      Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
      CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
      without NUMA reuses the same huge page after collapse_file() failed
      (whereas NUMA targets its allocation to the respective node each time).
      And most of us were usually testing with CONFIG_NUMA=y kernels.
      
      collapse_file(old start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=old offset
        new_page->mapping = mapping
        xas_store(&xas, new_page)
      
                                filemap_fault
                                  page = find_get_page(mapping, offset)
                                  // if offset falls inside hpage then
                                  // compound_head(page) == hpage
                                  lock_page_maybe_drop_mmap()
                                    __lock_page(page)
      
        // collapse fails
        xas_store(&xas, old page)
        new_page->mapping = NULL
        unlock_page(new_page)
      
      collapse_file(new start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=new offset
        new_page->mapping = mapping // mapping becomes valid again
      
                                  // since compound_head(page) == hpage
                                  // page_to_pgoff(page) got changed
                                  VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
      
      An initial patch replaced __SetPageLocked() by lock_page(), which did
      fix the race which Suren illustrates above.  But testing showed that it's
      not good enough: if the racing task's __lock_page() gets delayed long
      after its find_get_page(), then it may follow collapse_file(new start)'s
      successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
      
      It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
      check and retry (as is done for mapping), with similar relaxations in
      find_lock_entry() and pagecache_get_page(): but it's not obvious what
      else might get caught out; and khugepaged non-NUMA appears to be unique
      in exposing a page to page cache, then revoking, without going through
      a full cycle of freeing before reuse.
      
      Instead, non-NUMA khugepaged_prealloc_page() release the old page
      if anyone else has a reference to it (1% of cases when I tested).
      
      Although never reported on huge tmpfs, I believe its find_lock_entry()
      has been at similar risk; but huge tmpfs does not rely on khugepaged
      for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
      Reported-by: NDenis Lisov <dennis.lissov@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
      Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.orgReported-by: NQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pwReported-and-analyzed-by: NSuren Baghdasaryan <surenb@google.com>
      Fixes: 87c460a0 ("mm/khugepaged: collapse_shmem() without freezing new_page")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v4.9+
      Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      033b5d77
  5. 10 10月, 2020 1 次提交