1. 09 Dec, 2021 1 commit
  2. 08 Dec, 2021 4 commits
  3. 05 Dec, 2021 4 commits
  4. 29 Nov, 2021 1 commit
  5. 26 Nov, 2021 2 commits
  6. 25 Nov, 2021 4 commits
  7. 24 Nov, 2021 4 commits
  8. 22 Nov, 2021 1 commit
  9. 21 Nov, 2021 1 commit
    • proc/vmcore: fix clearing user buffer by properly using clear_user() · c1e63117
      Committed by David Hildenbrand
      To clear a user buffer we cannot simply use memset(); we have to use
      clear_user().  With a virtio-mem device that registers a vmcore_cb and
      has some logically unplugged memory inside an added Linux memory block,
      I can easily trigger a BUG by copying the vmcore via "cp":
      
        systemd[1]: Starting Kdump Vmcore Save Service...
        kdump[420]: Kdump is using the default log level(3).
        kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
        kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
        kdump[465]: saving vmcore-dmesg.txt complete
        kdump[467]: saving vmcore
        BUG: unable to handle page fault for address: 00007f2374e01000
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
        Oops: 0003 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
        RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
        Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
        RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
        RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
        RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
        RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
        R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
        R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
        FS:  00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
        Call Trace:
         read_vmcore+0x236/0x2c0
         proc_reg_read+0x55/0xa0
         vfs_read+0x95/0x190
         ksys_read+0x4f/0xc0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
      Prevention (SMAP)", which is used to detect wrong access from the kernel
      to user buffers like this: SMAP triggers a permissions violation on
      wrong access.  In the x86-64 variant of clear_user(), SMAP is properly
      handled via clac()+stac().
      
      To fix, properly use clear_user() when we're dealing with a user buffer.
      
      Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
      Fixes: 997c136f ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Nov, 2021 5 commits
    • fs: handle circular mappings correctly · 96821970
      Committed by Christian Brauner
      When calling setattr_prepare() to determine the validity of the attributes the
      ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id.
      When the {g,u}id attribute of the file isn't altered and the caller's fs{g,u}id
      matches the current {g,u}id attribute the attribute change is allowed.
      
      The value in ia_{g,u}id does already account for idmapped mounts and will have
      taken the relevant idmapping into account. So in order to verify that the
      {g,u}id attribute isn't changed we simply need to compare the ia_{g,u}id value
      against the inode's i_{g,u}id value.
      
      This only has any meaning for idmapped mounts as idmapping helpers are
      idempotent without them. And for idmapped mounts this really only has a meaning
      when circular idmappings are used, i.e. mappings where e.g. id 1000 is mapped
      to id 1001 and id 1001 is mapped to id 1000. Such circular mappings can e.g. be
      useful when sharing the same home directory between multiple users at the same
      time.
      
      As an example consider a directory with two files: /source/file1 owned by
      {g,u}id 1000 and /source/file2 owned by {g,u}id 1001. Assume we create an
      idmapped mount at /target with an idmapping that maps files owned by {g,u}id
      1000 to being owned by {g,u}id 1001 and files owned by {g,u}id 1001 to being
      owned by {g,u}id 1000. In effect, the idmapped mount at /target switches the
      ownership of /source/file1 and /source/file2, i.e. /target/file1 will be owned
      by {g,u}id 1001 and /target/file2 will be owned by {g,u}id 1000.
      
      This means that a user with fs{g,u}id 1000 must be allowed to setattr
      /target/file2 from {g,u}id 1000 to {g,u}id 1000. Similarly, a user with fs{g,u}id
      1001 must be allowed to setattr /target/file1 from {g,u}id 1001 to {g,u}id
      1001. Conversely, a user with fs{g,u}id 1000 must fail to setattr /target/file1
      from {g,u}id 1001 to {g,u}id 1000. And a user with fs{g,u}id 1001 must fail to
      setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Both cases must fail
      with EPERM for non-capable callers.
      
      Before this patch we could end up denying legitimate attribute changes and
      allowing invalid attribute changes when circular mappings are used. To even get
      into this situation the caller must've been privileged both to create that
      mapping and to create that idmapped mount.
      
      This hasn't been seen in the wild anywhere but came up when expanding the
      testsuite during work on a series of hardening patches. All idmapped fstests
      pass without any regressions and we add new tests to verify the behavior of
      circular mappings.
      
      Link: https://lore.kernel.org/r/20211109145713.1868404-1-brauner@kernel.org
      Fixes: 2f221d6f ("attr: handle idmapped mounts")
      Cc: Seth Forshee <seth.forshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
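
The four setattr cases listed in the commit message above can be checked with a small user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: `swap_map()` and `may_setattr_owner()` are hypothetical names, the mapping is hard-coded to the 1000 <-> 1001 example, and only the "owner is not altered" rule for a non-capable caller is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical circular idmapping of the mount at /target: 1000 <-> 1001.
 * Because it only swaps two ids, the mapping is its own inverse. */
static uint32_t swap_map(uint32_t id)
{
    if (id == 1000)
        return 1001;
    if (id == 1001)
        return 1000;
    return id;
}

/*
 * Simplified model of the "owner is not altered" rule for a non-capable
 * caller. disk_id is the owner stored under /source, caller_fsid is the
 * caller's fs{g,u}id, requested_id is the id requested through /target.
 */
bool may_setattr_owner(uint32_t disk_id, uint32_t caller_fsid,
                       uint32_t requested_id)
{
    /* Value that would be written to the inode (already mapped back). */
    uint32_t ia_id = swap_map(requested_id);
    /* The caller must own the file as it appears through the mount. */
    bool caller_owns = caller_fsid == swap_map(disk_id);
    /* The owner is unchanged if the value to write equals the stored one. */
    bool unchanged = ia_id == disk_id;

    return caller_owns && unchanged;
}

/* The four cases from the commit message: file1 is stored as 1000,
 * file2 is stored as 1001. */
void circular_mapping_example(void)
{
    assert(may_setattr_owner(1001, 1000, 1000));   /* /target/file2 by 1000 */
    assert(may_setattr_owner(1000, 1001, 1001));   /* /target/file1 by 1001 */
    assert(!may_setattr_owner(1000, 1000, 1000));  /* /target/file1 by 1000 */
    assert(!may_setattr_owner(1001, 1001, 1000));  /* /target/file2 by 1001 */
}
```

The point of the model is that the "unchanged" comparison is made between two values in the same id space (the value to be written and the value stored), which is what the commit message describes.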
    • Revert "mark pstore-blk as broken" · d1faacbf
      Committed by Kees Cook
      This reverts commit d07f3b08.
      
      pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b
      ("pstore/blk: Use the normal block device I/O path"), which landed in
      the same release as the commit adding BROKEN.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cifs: introduce cifs_ses_mark_for_reconnect() helper · 8ae87bbe
      Committed by Paulo Alcantara
      Use the new cifs_ses_mark_for_reconnect() helper to mark all session
      channels for reconnect instead of duplicating it in different places.
      
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: protect srv_count with cifs_tcp_ses_lock · 446e2148
      Committed by Steve French
      Updates to the srv_count field are protected elsewhere
      with the cifs_tcp_ses_lock spinlock.  Add one missing place
      (cifs_get_tcp_session).
      
      CC: Shyam Prasad N <sprasad@microsoft.com>
      Addresses-Coverity: 1494149 ("Data Race Condition")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: move debug print out of spinlock · 0226487a
      Committed by Steve French
      It is better to print debug messages outside of the chan_lock
      spinlock where possible.
      
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Addresses-Coverity: 1493854 ("Thread deadlock")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
  11. 16 Nov, 2021 7 commits
    • btrfs: deprecate BTRFS_IOC_BALANCE ioctl · 6c405b24
      Committed by Nikolay Borisov
      The v2 balance ioctl was introduced more than 9 years ago. Users of
      the old v1 ioctl should have migrated to it long ago. It's time to
      deprecate v1 and eventually remove it.
      
      The only known user is btrfs-progs, which tries v1 as a fallback in
      case v2 is not supported. This is not necessary anymore.
      
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make 1-bit bit-fields of scrub_page unsigned int · d08e38b6
      Committed by Colin Ian King
      The bit-fields have_csum and io_error are currently signed, which is not
      recommended because the representation is implementation-defined.
      Fix this by making the bit-fields unsigned ints.
      
      Fixes: 2c363954 ("btrfs: scrub: remove the anonymous structure from scrub_page")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
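
The difference between a plain `int` and an `unsigned int` 1-bit bit-field, as discussed in the commit above, can be seen in a few lines of standalone C. This is an illustrative sketch, not the btrfs code: the struct and function names are hypothetical and only the field name `have_csum` is borrowed from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-ins for the flag layout; not the btrfs struct. */
struct flags_plain    { int have_csum : 1; };          /* signedness is up to the compiler */
struct flags_unsigned { unsigned int have_csum : 1; }; /* can hold exactly 0 and 1 */

/* Store v in a plain int 1-bit field and read it back. If the compiler
 * treats the field as signed (GCC and Clang do by default), the value 1
 * does not fit and typically reads back as -1. */
int roundtrip_plain(int v)
{
    struct flags_plain f;

    f.have_csum = v;
    return f.have_csum;
}

/* Store a boolean in an unsigned 1-bit field and read it back.
 * The result is always 0 or 1. */
int roundtrip_unsigned(int v)
{
    struct flags_unsigned f;

    f.have_csum = (v != 0);
    return (int)f.have_csum;
}

void bitfield_example(void)
{
    assert(roundtrip_unsigned(0) == 0);
    assert(roundtrip_unsigned(1) == 1);
    /* Only "non-zero" is portable for the plain field. */
    assert(roundtrip_plain(1) != 0);
}
```

Code that tests such a flag only for truthiness works either way, but a comparison like `flag == 1` is reliable only with the unsigned variant.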
    • btrfs: check-integrity: fix a warning on write caching disabled disk · a91cf0ff
      Committed by Wang Yugui
      When a disk has write caching disabled, we skip submission of a bio with
      flush and sync requests before writing the superblock, since it's not
      needed. However, when the integrity checker is enabled, this results in
      reports that there are metadata blocks referred to by a superblock that
      were not properly flushed. So, for the sake of simplicity, don't skip the
      bio submission when the integrity checker is enabled, since this is a
      debug tool and not meant for use in non-debug builds.
      
      fstests/btrfs/220 triggers a check-integrity warning like the following
      when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk has WCE=0.
      
        btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
        ------------[ cut here ]------------
        WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
        CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
        Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
        RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
        RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
        RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
        RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
        R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
        R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
        FS:  00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
        Call Trace:
         btrfsic_process_written_block+0x2f7/0x850 [btrfs]
         __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
         ? bio_associate_blkg_from_css+0xa4/0x2c0
         btrfsic_submit_bio+0x18/0x30 [btrfs]
         write_dev_supers+0x81/0x2a0 [btrfs]
         ? find_get_pages_range_tag+0x219/0x280
         ? pagevec_lookup_range_tag+0x24/0x30
         ? __filemap_fdatawait_range+0x6d/0xf0
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         ? find_first_extent_bit+0x9b/0x160 [btrfs]
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         write_all_supers+0x1b3/0xa70 [btrfs]
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         btrfs_commit_transaction+0x59d/0xac0 [btrfs]
         close_ctree+0x11d/0x339 [btrfs]
         generic_shutdown_super+0x71/0x110
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0xb8/0x140
         task_work_run+0x6d/0xb0
         exit_to_user_mode_prepare+0x1f0/0x200
         syscall_exit_to_user_mode+0x12/0x30
         do_syscall_64+0x46/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7f009f711dfb
        RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
        RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
        R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
        R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
        ---[ end trace 2c4b82abcef9eec4 ]---
        S-65536(sdb2/65536/1)
         -->
        M-1064960(sdb2/1064960/1)
      
      Reviewed-by: Filipe Manana <fdmanana@gmail.com>
      Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: silence lockdep when reading chunk tree during mount · 4d9380e0
      Committed by Filipe Manana
      Test cases like btrfs/161 often trigger lockdep splats that complain
      about a possible unsafe locking scenario, because during mount, when
      reading the chunk tree, we end up calling blkdev_get_by_path() while
      holding a read lock on a leaf of the chunk tree. That produces a lockdep
      splat like the following:
      
      [ 3653.683975] ======================================================
      [ 3653.685148] WARNING: possible circular locking dependency detected
      [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted
      [ 3653.687239] ------------------------------------------------------
      [ 3653.688400] mount/447465 is trying to acquire lock:
      [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.691054]
                     but task is already holding lock:
      [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
      [ 3653.693978]
                     which lock already depends on the new lock.
      
      [ 3653.695510]
                     the existing dependency chain (in reverse order) is:
      [ 3653.696915]
                     -> #3 (btrfs-chunk-00){++++}-{3:3}:
      [ 3653.698053]        down_read_nested+0x4b/0x140
      [ 3653.698893]        __btrfs_tree_read_lock+0x24/0x110 [btrfs]
      [ 3653.699988]        btrfs_read_lock_root_node+0x31/0x40 [btrfs]
      [ 3653.701205]        btrfs_search_slot+0x537/0xc00 [btrfs]
      [ 3653.702234]        btrfs_insert_empty_items+0x32/0x70 [btrfs]
      [ 3653.703332]        btrfs_init_new_device+0x563/0x15b0 [btrfs]
      [ 3653.704439]        btrfs_ioctl+0x2110/0x3530 [btrfs]
      [ 3653.705405]        __x64_sys_ioctl+0x83/0xb0
      [ 3653.706215]        do_syscall_64+0x3b/0xc0
      [ 3653.706990]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 3653.708040]
                     -> #2 (sb_internal#2){.+.+}-{0:0}:
      [ 3653.708994]        lock_release+0x13d/0x4a0
      [ 3653.709533]        up_write+0x18/0x160
      [ 3653.710017]        btrfs_sync_file+0x3f3/0x5b0 [btrfs]
      [ 3653.710699]        __loop_update_dio+0xbd/0x170 [loop]
      [ 3653.711360]        lo_ioctl+0x3b1/0x8a0 [loop]
      [ 3653.711929]        block_ioctl+0x48/0x50
      [ 3653.712442]        __x64_sys_ioctl+0x83/0xb0
      [ 3653.712991]        do_syscall_64+0x3b/0xc0
      [ 3653.713519]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 3653.714233]
                     -> #1 (&lo->lo_mutex){+.+.}-{3:3}:
      [ 3653.715026]        __mutex_lock+0x92/0x900
      [ 3653.715648]        lo_open+0x28/0x60 [loop]
      [ 3653.716275]        blkdev_get_whole+0x28/0x90
      [ 3653.716867]        blkdev_get_by_dev.part.0+0x142/0x320
      [ 3653.717537]        blkdev_open+0x5e/0xa0
      [ 3653.718043]        do_dentry_open+0x163/0x390
      [ 3653.718604]        path_openat+0x3f0/0xa80
      [ 3653.719128]        do_filp_open+0xa9/0x150
      [ 3653.719652]        do_sys_openat2+0x97/0x160
      [ 3653.720197]        __x64_sys_openat+0x54/0x90
      [ 3653.720766]        do_syscall_64+0x3b/0xc0
      [ 3653.721285]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 3653.721986]
                     -> #0 (&disk->open_mutex){+.+.}-{3:3}:
      [ 3653.722775]        __lock_acquire+0x130e/0x2210
      [ 3653.723348]        lock_acquire+0xd7/0x310
      [ 3653.723867]        __mutex_lock+0x92/0x900
      [ 3653.724394]        blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.725041]        blkdev_get_by_path+0xb8/0xd0
      [ 3653.725614]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
      [ 3653.726332]        open_fs_devices+0xd7/0x2c0 [btrfs]
      [ 3653.726999]        btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
      [ 3653.727739]        open_ctree+0xb8e/0x17bf [btrfs]
      [ 3653.728384]        btrfs_mount_root.cold+0x12/0xde [btrfs]
      [ 3653.729130]        legacy_get_tree+0x30/0x50
      [ 3653.729676]        vfs_get_tree+0x28/0xc0
      [ 3653.730192]        vfs_kern_mount.part.0+0x71/0xb0
      [ 3653.730800]        btrfs_mount+0x11d/0x3a0 [btrfs]
      [ 3653.731427]        legacy_get_tree+0x30/0x50
      [ 3653.731970]        vfs_get_tree+0x28/0xc0
      [ 3653.732486]        path_mount+0x2d4/0xbe0
      [ 3653.732997]        __x64_sys_mount+0x103/0x140
      [ 3653.733560]        do_syscall_64+0x3b/0xc0
      [ 3653.734080]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 3653.734782]
                     other info that might help us debug this:
      
      [ 3653.735784] Chain exists of:
                       &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00
      
      [ 3653.737123]  Possible unsafe locking scenario:
      
      [ 3653.737865]        CPU0                    CPU1
      [ 3653.738435]        ----                    ----
      [ 3653.739007]   lock(btrfs-chunk-00);
      [ 3653.739449]                                lock(sb_internal#2);
      [ 3653.740193]                                lock(btrfs-chunk-00);
      [ 3653.740955]   lock(&disk->open_mutex);
      [ 3653.741431]
                      *** DEADLOCK ***
      
      [ 3653.742176] 3 locks held by mount/447465:
      [ 3653.742739]  #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
      [ 3653.744114]  #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs]
      [ 3653.745563]  #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
      [ 3653.747066]
                     stack backtrace:
      [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1
      [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [ 3653.750592] Call Trace:
      [ 3653.750967]  dump_stack_lvl+0x57/0x72
      [ 3653.751526]  check_noncircular+0xf3/0x110
      [ 3653.752136]  ? stack_trace_save+0x4b/0x70
      [ 3653.752748]  __lock_acquire+0x130e/0x2210
      [ 3653.753356]  lock_acquire+0xd7/0x310
      [ 3653.753898]  ? blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.754596]  ? lock_is_held_type+0xe8/0x140
      [ 3653.755125]  ? blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.755729]  ? blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.756338]  __mutex_lock+0x92/0x900
      [ 3653.756794]  ? blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.757400]  ? do_raw_spin_unlock+0x4b/0xa0
      [ 3653.757930]  ? _raw_spin_unlock+0x29/0x40
      [ 3653.758437]  ? bd_prepare_to_claim+0x129/0x150
      [ 3653.758999]  ? trace_module_get+0x2b/0xd0
      [ 3653.759508]  ? try_module_get.part.0+0x50/0x80
      [ 3653.760072]  blkdev_get_by_dev.part.0+0xe7/0x320
      [ 3653.760661]  ? devcgroup_check_permission+0xc1/0x1f0
      [ 3653.761288]  blkdev_get_by_path+0xb8/0xd0
      [ 3653.761797]  btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
      [ 3653.762454]  open_fs_devices+0xd7/0x2c0 [btrfs]
      [ 3653.763055]  ? clone_fs_devices+0x8f/0x170 [btrfs]
      [ 3653.763689]  btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
      [ 3653.764370]  ? kvm_sched_clock_read+0x14/0x40
      [ 3653.764922]  open_ctree+0xb8e/0x17bf [btrfs]
      [ 3653.765493]  ? super_setup_bdi_name+0x79/0xd0
      [ 3653.766043]  btrfs_mount_root.cold+0x12/0xde [btrfs]
      [ 3653.766780]  ? rcu_read_lock_sched_held+0x3f/0x80
      [ 3653.767488]  ? kfree+0x1f2/0x3c0
      [ 3653.767979]  legacy_get_tree+0x30/0x50
      [ 3653.768548]  vfs_get_tree+0x28/0xc0
      [ 3653.769076]  vfs_kern_mount.part.0+0x71/0xb0
      [ 3653.769718]  btrfs_mount+0x11d/0x3a0 [btrfs]
      [ 3653.770381]  ? rcu_read_lock_sched_held+0x3f/0x80
      [ 3653.771086]  ? kfree+0x1f2/0x3c0
      [ 3653.771574]  legacy_get_tree+0x30/0x50
      [ 3653.772136]  vfs_get_tree+0x28/0xc0
      [ 3653.772673]  path_mount+0x2d4/0xbe0
      [ 3653.773201]  __x64_sys_mount+0x103/0x140
      [ 3653.773793]  do_syscall_64+0x3b/0xc0
      [ 3653.774333]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 3653.775094] RIP: 0033:0x7f648bc45aaa
      
      This happens because btrfs_read_chunk_tree(), which is called only
      during mount, ends up acquiring the open_mutex of a block device
      while holding a read lock on a leaf of the chunk tree, while other paths
      need to acquire other locks before locking extent buffers of the chunk
      tree.
      
      Since at mount time when we call btrfs_read_chunk_tree() we know that
      we don't have other tasks running in parallel and modifying the chunk
      tree, we can simply skip locking of chunk tree extent buffers. So do
      that and move the assertion that checks the fs is not yet mounted to the
      top block of btrfs_read_chunk_tree(), with a comment before doing it.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix memory ordering between normal and ordered work functions · 45da9c17
      Committed by Nikolay Borisov
      Ordered work functions aren't guaranteed to be handled by the same thread
      which executed the normal work functions. The only way execution between
      normal/ordered functions is synchronized is via the WORK_DONE_BIT;
      unfortunately, the bitops used don't guarantee any ordering whatsoever.
      
      This manifested as seemingly inexplicable crashes on ARM64, where
      async_chunk::inode is seen as non-null in async_cow_submit, which causes
      submit_compressed_extents to be called, and a crash occurs because
      async_chunk::inode suddenly became NULL. The call trace was similar to:
      
          pc : submit_compressed_extents+0x38/0x3d0
          lr : async_cow_submit+0x50/0xd0
          sp : ffff800015d4bc20
      
          <registers omitted for brevity>
      
          Call trace:
           submit_compressed_extents+0x38/0x3d0
           async_cow_submit+0x50/0xd0
           run_ordered_work+0xc8/0x280
           btrfs_work_helper+0x98/0x250
           process_one_work+0x1f0/0x4ac
           worker_thread+0x188/0x504
           kthread+0x110/0x114
           ret_from_fork+0x10/0x18
      
      Fix this by adding the respective barrier calls, which ensure that all
      accesses preceding the setting of WORK_DONE_BIT are strictly ordered
      before setting the flag. At the same time, add a read barrier after
      reading WORK_DONE_BIT in run_ordered_work, which ensures all subsequent
      loads are strictly ordered after reading the bit. This in turn ensures
      that all accesses before WORK_DONE_BIT are strictly ordered
      before any access that can occur in ordered_func.
      
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Fixes: 08a9ff32 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
      CC: stable@vger.kernel.org # 4.4+
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Tested-by: Chris Murphy <chris@colorremedies.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
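
The ordering requirement described in the commit above (writes made before the "done" flag must be visible to whoever observes the flag) has a close user-space analogue in C11 release/acquire atomics. The sketch below uses that analogue instead of the kernel's barrier primitives, so it is not the patch itself; `struct work_item`, `normal_func_finish()` and `ordered_func_try()` are hypothetical names, and `done` merely stands in for WORK_DONE_BIT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item: payload is produced by the "normal" function,
 * done stands in for WORK_DONE_BIT. */
struct work_item {
    void *payload;
    atomic_int done;
};

/* Producer side: publish the payload, then set the flag.
 * The release store keeps the payload write ordered before the flag. */
void normal_func_finish(struct work_item *w, void *result)
{
    w->payload = result;
    atomic_store_explicit(&w->done, 1, memory_order_release);
}

/* Consumer side: read the flag, then the payload.
 * The acquire load keeps the payload read ordered after the flag read.
 * Returns NULL while the work is not yet marked done. */
void *ordered_func_try(struct work_item *w)
{
    if (!atomic_load_explicit(&w->done, memory_order_acquire))
        return NULL;
    return w->payload;
}

/* Sequential usage; in real use the two functions would run on
 * different threads, which is where the ordering matters. */
void ordering_example(void)
{
    struct work_item w = { 0 };
    int value = 42;

    assert(ordered_func_try(&w) == NULL);
    normal_func_finish(&w, &value);
    assert(ordered_func_try(&w) == &value);
}
```

With relaxed (unordered) accesses in place of release/acquire, a consumer on a weakly ordered CPU could observe the flag as set and still read a stale payload, which matches the symptom the commit message reports on ARM64.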
    • btrfs: fix a out-of-bound access in copy_compressed_data_to_page() · 6f019c0e
      Committed by Qu Wenruo
      [BUG]
      The following script can cause btrfs to crash:
      
        $ mount -o compress-force=lzo $DEV /mnt
        $ dd if=/dev/urandom of=/mnt/foo bs=4k count=1
        $ sync
      
      The call trace looks like this:
      
        general protection fault, probably for non-canonical address 0xe04b37fccce3b000: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 5 PID: 164 Comm: kworker/u20:3 Not tainted 5.15.0-rc7-custom+ #4
        Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
        RIP: 0010:__memcpy+0x12/0x20
        Call Trace:
         lzo_compress_pages+0x236/0x540 [btrfs]
         btrfs_compress_pages+0xaa/0xf0 [btrfs]
         compress_file_range+0x431/0x8e0 [btrfs]
         async_cow_start+0x12/0x30 [btrfs]
         btrfs_work_helper+0xf6/0x3e0 [btrfs]
         process_one_work+0x294/0x5d0
         worker_thread+0x55/0x3c0
         kthread+0x140/0x170
         ret_from_fork+0x22/0x30
        ---[ end trace 63c3c0f131e61982 ]---
      
      [CAUSE]
      In lzo_compress_pages(), parameter @out_pages is not only an output
      parameter (for the number of compressed pages), but also an input
      parameter, as the upper limit of compressed pages we can utilize.
      
      In commit d4088803 ("btrfs: subpage: make lzo_compress_pages()
      compatible"), the refactoring doesn't take @out_pages as an input, thus
      completely ignoring the limit.
      
      And in the compress-force case, we could hit incompressible data whose
      compressed size goes beyond the page limit, causing the above
      crash.
      
      [FIX]
      Save @out_pages as @max_nr_page, and pass it to lzo_compress_pages(),
      and check if we're beyond the limit before accessing the pages.
      
      Note: this also fixes a crash on 32-bit architectures that was suspected
      to be caused by the merge of btrfs patches to 5.16-rc1. Reported in
      https://lore.kernel.org/all/20211104115001.GU20319@twin.jikos.cz/ .
      
      Reported-by: Omar Sandoval <osandov@fb.com>
      Fixes: d4088803 ("btrfs: subpage: make lzo_compress_pages() compatible")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add note ]
      Signed-off-by: David Sterba <dsterba@suse.com>
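
The fix described above (treat the page count as an upper limit and check it before touching a page) follows a general pattern that can be sketched in user-space C. This is not the btrfs function: `copy_to_pages()`, `PAGE_SZ` and the `-1` error value are assumptions made for the example, and only the name `max_nr_page` is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u   /* assumed page size for this sketch */

/*
 * Copy len bytes from src into an array of page-sized buffers.
 * max_nr_page is the number of buffers the caller provides; the index is
 * checked against it before every access. Returns 0 and stores the number
 * of pages used in *out_pages, or -1 if the data does not fit.
 */
int copy_to_pages(unsigned char **pages, size_t max_nr_page,
                  const unsigned char *src, size_t len, size_t *out_pages)
{
    size_t copied = 0;

    while (copied < len) {
        size_t idx = copied / PAGE_SZ;
        size_t off = copied % PAGE_SZ;
        size_t chunk = PAGE_SZ - off;

        if (idx >= max_nr_page)
            return -1;               /* beyond the limit: stop before writing */
        if (chunk > len - copied)
            chunk = len - copied;
        memcpy(pages[idx] + off, src + copied, chunk);
        copied += chunk;
    }
    *out_pages = (len + PAGE_SZ - 1) / PAGE_SZ;
    return 0;
}

void page_limit_example(void)
{
    static unsigned char p0[PAGE_SZ], p1[PAGE_SZ], src[6000];
    unsigned char *pages[2] = { p0, p1 };
    size_t used = 0;

    memset(src, 0xab, sizeof(src));
    assert(copy_to_pages(pages, 2, src, sizeof(src), &used) == 0);
    assert(used == 2);
    /* The same data with only one page available is rejected. */
    assert(copy_to_pages(pages, 1, src, sizeof(src), &used) == -1);
}
```

Without the `idx >= max_nr_page` check, the second call would write through `pages[1]` even though the caller offered only one page, which is the kind of out-of-bounds access the commit message describes for incompressible data.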
    • NFSD: Fix exposure in nfsd4_decode_bitmap() · c0019b7d
      Committed by Chuck Lever
      rtm@csail.mit.edu reports:
      > nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
      > directs it to do so. This can cause nfsd4_decode_state_protect4_a()
      > to write client-supplied data beyond the end of
      > nfsd4_exchange_id.spo_must_allow[] when called by
      > nfsd4_decode_exchange_id().
      
      Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
      @bmlen.
      
      Reported by: rtm@csail.mit.edu
      Fixes: d1c263a0 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
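
The idea behind the NFSD fix above, a decode loop bounded by the size of the destination array rather than by a length supplied by the client, can be sketched as follows. This is a simplified stand-alone model and not the NFSD code: `decode_bitmap_sketch()` is a hypothetical function, the input is assumed to be already in host byte order, and silently ignoring extra words is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * wire[0] holds the number of bitmap words the sender claims to provide,
 * followed by that many words. At most bmlen words are stored in bmval;
 * any remaining slots are zeroed. Returns 0 on success or -1 if the
 * claimed count exceeds the available input.
 */
int decode_bitmap_sketch(const uint32_t *wire, size_t wire_words,
                         uint32_t *bmval, size_t bmlen)
{
    size_t count, i;

    if (wire_words < 1)
        return -1;
    count = wire[0];
    if (count > wire_words - 1)
        return -1;                   /* claimed length exceeds the input */

    /* The loop bound is bmlen, never the sender-supplied count. */
    for (i = 0; i < bmlen; i++)
        bmval[i] = (i < count) ? wire[1 + i] : 0;
    return 0;
}

void decode_bitmap_example(void)
{
    const uint32_t wire[] = { 3, 0x11, 0x22, 0x33 };
    uint32_t out[3] = { 0, 0, 0xdeadbeef };  /* out[2] acts as a guard word */

    /* The sender offers 3 words but the destination holds only 2. */
    assert(decode_bitmap_sketch(wire, 4, out, 2) == 0);
    assert(out[0] == 0x11 && out[1] == 0x22);
    assert(out[2] == 0xdeadbeef);            /* nothing written past bmlen */
}
```
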
  12. 13 Nov, 2021 6 commits