1. 23 January 2019, 27 commits
  2. 17 January 2019, 13 commits
    • Linux 4.19.16 · 9c5931b6
      Greg Kroah-Hartman committed
    • Btrfs: use nofs context when initializing security xattrs to avoid deadlock · 7a1b9b76
      Filipe Manana committed
      commit 827aa18e7b903c5ff3b3cd8fec328a99b1dbd411 upstream.
      
      When initializing the security xattrs, we are holding a transaction
      handle; therefore we need to use a GFP_NOFS context in order to avoid a
      deadlock with reclaim in case it's triggered.
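
      A minimal sketch of the approach (not the verbatim patch; the call into
      btrfs_xattr_security_init() is simplified here): enter a scoped NOFS
      allocation context around the security xattr initialization, so that any
      allocation it performs cannot recurse into filesystem reclaim while the
      transaction handle is held:

              unsigned int nofs_flag;
              int ret;

              /* We hold a transaction handle, so allocations made while the
               * security layer initializes the xattrs must not enter
               * filesystem reclaim; memalloc_nofs_save() scopes GFP_NOFS. */
              nofs_flag = memalloc_nofs_save();
              ret = security_inode_init_security(inode, dir, qstr,
                                                 &btrfs_initxattrs, trans);
              memalloc_nofs_restore(nofs_flag);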
      
      Fixes: 39a27ec1 ("btrfs: use GFP_KERNEL for xattr and acl allocations")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation · 79aa5c0d
      Filipe Manana committed
      commit 9a6f209e36500efac51528132a3e3083586eda5f upstream.
      
      If the quota enable and snapshot creation ioctls are called concurrently
      we can get into a deadlock where the task enabling quotas will deadlock
      on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
      twice, or the task creating a snapshot tries to commit the transaction
      while the task enabling quota waits for the former task to commit the
      transaction while holding the mutex. The following time diagrams show how
      both cases happen.
      
      First scenario:
      
                 CPU 0                                    CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
          btrfs_start_transaction()
      
                                                   btrfs_ioctl()
                                                    btrfs_ioctl_snap_create_v2
                                                     create_snapshot()
                                                      --> adds snapshot to the
                                                          list pending_snapshots
                                                          of the current
                                                          transaction
      
          btrfs_commit_transaction()
           create_pending_snapshots()
             create_pending_snapshot()
              qgroup_account_snapshot()
               btrfs_qgroup_inherit()
                 mutex_lock(fs_info->qgroup_ioctl_lock)
                  --> deadlock, mutex already locked
                      by this task at
                      btrfs_quota_enable()
      
      Second scenario:
      
                 CPU 0                                    CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
          btrfs_start_transaction()
      
                                                   btrfs_ioctl()
                                                    btrfs_ioctl_snap_create_v2
                                                     create_snapshot()
                                                      --> adds snapshot to the
                                                          list pending_snapshots
                                                          of the current
                                                          transaction
      
                                                      btrfs_commit_transaction()
                                                       --> waits for task at
                                                           CPU 0 to release
                                                           its transaction
                                                           handle
      
          btrfs_commit_transaction()
           --> sees another task started
               the transaction commit first
           --> releases its transaction
               handle
           --> waits for the transaction
               commit to be completed by
               the task at CPU 1
      
                                                       create_pending_snapshot()
                                                        qgroup_account_snapshot()
                                                         btrfs_qgroup_inherit()
                                                          mutex_lock(fs_info->qgroup_ioctl_lock)
                                                           --> deadlock, task at CPU 0
                                                               has the mutex locked but
                                                               it is waiting for us to
                                                               finish the transaction
                                                               commit
      
      So fix this by setting the quota enabled flag in fs_info after committing
      the transaction at btrfs_quota_enable(). This ends up serializing quota
      enable and snapshot creation as if the snapshot creation happened just
      before the quota enable request. The quota rescan task, scheduled after
      committing the transaction in btrfs_quota_enable(), will do the accounting.
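
      Roughly, the resulting order in btrfs_quota_enable() looks like this
      (a simplified sketch; error handling and the quota tree setup are
      elided into comments):

              mutex_lock(&fs_info->qgroup_ioctl_lock);
              trans = btrfs_start_transaction(fs_info->tree_root, 2);
              /* ... create the quota root and the initial qgroup items ... */
              ret = btrfs_commit_transaction(trans);
              /* Flip the "quotas enabled" state only after the commit, so a
               * snapshot creation racing with us never reaches
               * btrfs_qgroup_inherit() -> mutex_lock(qgroup_ioctl_lock) from
               * inside our own commit. */
              if (!ret) {
                      fs_info->quota_root = quota_root;
                      set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
              }
              mutex_unlock(&fs_info->qgroup_ioctl_lock);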
      
      Fixes: 6426c7ad ("btrfs: qgroup: Fix qgroup accounting when creating snapshot")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix access to available allocation bits when starting balance · 829431a2
      Filipe Manana committed
      commit 5a8067c0d17feb7579db0476191417b441a8996e upstream.
      
      The available allocation bits members from struct btrfs_fs_info are
      protected by a sequence lock, and when starting balance we access them
      incorrectly in two different ways:
      
      1) In the read sequence lock loop at btrfs_balance() we use the values we
         read from fs_info->avail_*_alloc_bits and we can immediately do actions
         that have side effects and can not be undone (printing a message and
         jumping to a label). This is wrong because a retry might be needed, so
         our actions must not have side effects and must be repeatable as long
         as read_seqretry() returns a non-zero value. In other words, we were
         essentially ignoring the sequence lock;
      
      2) Right below the read sequence lock loop, we were reading the values
         from avail_metadata_alloc_bits and avail_data_alloc_bits without any
         protection from concurrent writers, that is, reading them outside of
         the read sequence lock critical section.
      
      So fix this by making sure we only read the available allocation bits
      while in a read sequence lock critical section and that what we do in the
      critical section is repeatable (has nothing that can not be undone) so
      that any eventual retry that is needed is handled properly.
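
      A sketch of the corrected pattern (assuming the protecting seqlock is
      fs_info->profiles_lock): snapshot the bits into a local inside the
      retry loop and only act on them once the loop has exited cleanly:

              u64 allowed;
              unsigned int seq;

              do {
                      seq = read_seqbegin(&fs_info->profiles_lock);
                      /* Only side-effect-free reads in here; this section
                       * may be repeated if a writer raced with us. */
                      allowed = fs_info->avail_data_alloc_bits |
                                fs_info->avail_metadata_alloc_bits |
                                fs_info->avail_system_alloc_bits;
              } while (read_seqretry(&fs_info->profiles_lock, seq));

              /* Messages, validation and jumps happen here, outside the
               * loop, using the consistent snapshot in 'allowed'. */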
      
      Fixes: de98ced9 ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits")
      Fixes: 14506127 ("btrfs: fix a bogus warning when converting only data or metadata")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: compat: Don't pull syscall number from regs in arm_compat_syscall · 6c9a2046
      Will Deacon committed
      commit 53290432145a8eb143fe29e06e9c1465d43dc723 upstream.
      
      The syscall number may have been changed by a tracer, so we should pass
      the actual number in from the caller instead of pulling it from the
      saved r7 value directly.
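
      The shape of the interface change, as a sketch (the handler lives in
      arch/arm64/kernel/sys_compat.c; bodies omitted):

              /* before: the handler pulled the number out of the saved r7
               * itself, ignoring any rewrite done by a tracer */
              long compat_arm_syscall(struct pt_regs *regs);

              /* after: the caller passes in the number it actually
               * dispatched on, so a tracer-modified value is honoured */
              long compat_arm_syscall(struct pt_regs *regs, int scno);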
      
      Cc: <stable@vger.kernel.org>
      Cc: Pi-Hsun Shih <pihsun@chromium.org>
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · 4f14f446
      Christoffer Dall committed
      commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.
      
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      
      As far as I can tell, this can lead to the same kind of race:
      
        VM 0, VCPU 0			VM 0, VCPU 1
        ------------			------------
        update_vttbr (vmid 254)
        				update_vttbr (vmid 1) // roll over
      				read_lock(kvm_vmid_lock);
      				force_vm_exit()
        local_irq_disable
        need_new_vmid_gen == false //because vmid gen matches
      
        enter_guest (vmid 254)
        				kvm_arch.vttbr = <PGD>:<VMID 1>
      				read_unlock(kvm_vmid_lock);
      
        				enter_guest (vmid 1)
      
      This results in two VCPUs of the same VM running with different VMIDs,
      and (even worse) other VCPUs from other VMs could now allocate the
      clashing VMID 254 from the new generation as long as VCPU 0 is not exiting.
      
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
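
      The lock-less ordering is roughly the following (a sketch, not the
      verbatim patch; the vttbr composition is simplified):

              /* Writer side (update_vttbr), after picking a fresh VMID:
               * publish the new vttbr before the new generation value can
               * be observed. */
              kvm->arch.vttbr = pgd_phys | vmid;      /* simplified */
              smp_wmb();
              WRITE_ONCE(kvm->arch.vmid_gen, atomic64_read(&kvm_vmid_gen));

              /* Reader side (need_new_vmid_gen), called with interrupts
               * disabled before entering the guest: */
              u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
              smp_rmb();      /* pairs with the smp_wmb() above */
              return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);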
      
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9 ("KVM: arm/arm64: Close VMID generation race")
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: use-after-free in svc_process_common() · 44e7bab3
      Vasily Averin committed
      commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.
      
      If a node has NFSv41+ mounts inside several net namespaces,
      it can lead to a use-after-free in svc_process_common():
      
      svc_process_common()
              /* Setup reply header */
              rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
      
      svc_process_common() can use an incorrect rqstp->rq_xprt, because
      its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
      The problem is that serv is a global structure but sv_bc_xprt
      is assigned per net namespace.
      
      According to Trond, the whole "let's set up rqstp->rq_xprt
      for the back channel" is nothing but a giant hack in order
      to work around the fact that svc_process_common() uses it
      to find the xpt_ops, and perform a couple of (meaningless
      for the back channel) tests of xpt_flags.
      
      All we really need in svc_process_common() is to be able to run
      rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
      
      J. Bruce Fields points out that this xpo_prep_reply_hdr() call
      is an awfully roundabout way just to do "svc_putnl(resv, 0);"
      in the TCP case.
      
      This patch does not initialize rqstp->rq_xprt in bc_svc_process();
      it now calls svc_process_common() with rqstp->rq_xprt = NULL.
      
      To adjust the reply header, svc_process_common() now just checks
      rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for the TCP case.
      
      To handle the rqstp->rq_xprt = NULL case in functions called from
      svc_process_common(), the patch introduces a net namespace pointer
      svc_rqst->rq_bc_net and adjusts the SVC_NET() definition.
      Some other functions were also adapted to properly handle the described case.
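
      In sketch form (simplified; the real patch also touches the callers
      and the back-channel setup):

              /* svc_process_common(): rq_xprt may now be NULL for the back
               * channel, so pick the reply-header helper by protocol
               * instead of going through rq_xprt->xpt_ops. */
              if (rqstp->rq_prot == IPPROTO_TCP)
                      svc_tcp_prep_reply_hdr(rqstp);  /* svc_putnl(resv, 0) */

              /* SVC_NET() tolerates a NULL rq_xprt by falling back to the
               * new per-request back-channel net pointer: */
              #define SVC_NET(rqst) ((rqst)->rq_xprt ? \
                              (rqst)->rq_xprt->xpt_net : (rqst)->rq_bc_net)
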
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Fixes: 23c20ecd ("NFS: callback up - users counting cleanup")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      v2: added lost extern svc_tcp_prep_reply_hdr()
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: page_mapped: don't assume compound page is huge or THP · 160f79c0
      Jan Stancek committed
      commit 8ab88c7169b7fba98812ead6524b9d05bc76cf00 upstream.
      
      LTP proc01 testcase has been observed to rarely trigger crashes
      on arm64:
          page_mapped+0x78/0xb4
          stable_page_flags+0x27c/0x338
          kpageflags_read+0xfc/0x164
          proc_reg_read+0x7c/0xb8
          __vfs_read+0x58/0x178
          vfs_read+0x90/0x14c
          SyS_read+0x60/0xc0
      
      The issue is that page_mapped() assumes that if a compound page is not
      huge, then it must be THP.  But if this is a 'normal' compound page
      (COMPOUND_PAGE_DTOR), then the following loop can keep running (for
      HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
      mapped and triggers a panic:
      
              for (i = 0; i < hpage_nr_pages(page); i++) {
                      if (atomic_read(&page[i]._mapcount) >= 0)
                              return true;
              }
      
      I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
      with a custom kernel module [1] which:
       - allocates compound page (PAGEC) of order 1
       - allocates 2 normal pages (COPY), which are initialized to 0xff (to
         satisfy _mapcount >= 0)
       - 2 PAGEC page structs are copied to address of first COPY page
       - second page of COPY is marked as not present
       - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
         page at offset 0x30 (_mapcount)
      
      [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c
      
      Fix the loop to iterate for "1 << compound_order" pages.
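
      That is, roughly:

              /* Walk every page of this compound page, however it was
               * constructed, instead of assuming HPAGE_PMD_NR tail pages. */
              for (i = 0; i < (1 << compound_order(page)); i++) {
                      if (atomic_read(&page[i]._mapcount) >= 0)
                              return true;
              }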
      
      Kirill said "IIRC, sound subsystem can produce custom mapped compound
      pages".
      
      Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
      Fixes: e1534ae9 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Debugged-by: Laszlo Ersek <lersek@redhat.com>
      Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix special inode number checks in __ext4_iget() · 5dc41af3
      Theodore Ts'o committed
      commit 191ce17876c9367819c4b0a25b503c0f6d9054d8 upstream.
      
      The check for special (reserved) inode numbers in __ext4_iget()
      was broken by commit 8a363970d1dc ("ext4: avoid declaring fs
      inconsistent due to invalid file handles").  This was caused by a
      botched reversal of the sense of the flag now known as
      EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
      Fix the logic appropriately.
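
      A sketch of the intended check (simplified; the real error handling
      and the upper-bound check on the inode number are omitted):

              /* Reserved inode numbers are only acceptable when the caller
               * explicitly asked for a special inode; on a normal iget they
               * indicate a corrupted or forged file handle. */
              if ((ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO) &&
                  !(flags & EXT4_IGET_SPECIAL))
                      return ERR_PTR(-EFSCORRUPTED);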
      
      Fixes: 8a363970d1dc ("ext4: avoid declaring fs inconsistent...")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: track writeback errors using the generic tracking infrastructure · bb80ad0d
      Theodore Ts'o committed
      commit 95cb67138746451cc84cf8e516e14989746e93b0 upstream.
      
      We are already using mapping_set_error() in fs/ext4/page_io.c, so all we
      need to do is to use file_check_and_advance_wb_err() when handling
      fsync() requests in ext4_sync_file().
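
      In ext4_sync_file() that amounts to something like this (sketch):

              /* mapping_set_error() (already called from fs/ext4/page_io.c
               * on writeback failure) records the error; report it to this
               * fsync() caller and advance the per-file error cursor so the
               * same error is not reported twice on this descriptor. */
              err = file_check_and_advance_wb_err(file);
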
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: use ext4_write_inode() when fsyncing w/o a journal · da38a1b4
      Theodore Ts'o committed
      commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a upstream.
      
      In no-journal mode, we previously used __generic_file_fsync().
      This triggers a lockdep warning and, in addition, it's not safe to
      depend on the inode writeback mechanism in the case of ext4.  We can
      solve both problems by calling ext4_write_inode() directly.
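
      A sketch of the no-journal path, assuming a metadata-only
      writeback_control:

              struct writeback_control wbc = {
                      .sync_mode   = WB_SYNC_ALL,
                      .nr_to_write = 0,       /* only write out the inode */
              };

              /* Write the inode ourselves instead of relying on the generic
               * inode writeback done by __generic_file_fsync(). */
              ret = ext4_write_inode(inode, &wbc);
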
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid kernel warning when writing the superblock to a dead device · 01db6e5c
      Theodore Ts'o committed
      commit e86807862e6880809f191c4cea7f88a489f0ed34 upstream.
      
      The xfstests generic/475 test switches the underlying device with
      dm-error while running a stress test.  This results in a large number
      of file system errors, and since we can't lock the buffer head when
      marking the superblock dirty in the ext4_grp_locked_error() case, it's
      possible for the superblock to be !buffer_uptodate() without
      buffer_write_io_error() being true.
      
      We need to set buffer_uptodate() before we call mark_buffer_dirty() or
      this will trigger a WARN_ON.  It's safe to do this since the
      superblock must have been properly read into memory or the mount would
      have been successful.  So if buffer_uptodate() is not set, we can
      safely assume that this happened due to a failed attempt to write the
      superblock.
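
      In ext4_commit_super() terms the idea is roughly this (sbh being the
      superblock buffer head):

              if (!buffer_uptodate(sbh)) {
                      /* The in-memory superblock is valid (the mount
                       * succeeded), so a cleared uptodate bit can only mean
                       * an earlier failed write; re-assert it before
                       * redirtying to avoid the WARN_ON. */
                      set_buffer_uptodate(sbh);
              }
              mark_buffer_dirty(sbh);
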
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix a potential fiemap/page fault deadlock w/ inline_data · 926cdac1
      Theodore Ts'o committed
      commit 2b08b1f12cd664dc7d5c84ead9ff25ae97ad5491 upstream.
      
      The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
      while still holding the xattr semaphore.  This is not necessary and it
      triggers a circular lockdep warning.  This is because
      fiemap_fill_next_extent() could trigger a page fault when it writes
      into a page.  If that page is mmapped from the inline file in
      question, this could very well result in a deadlock.
      
      This problem can be reproduced using generic/519 with a file system
      configuration which has the inline_data feature enabled.
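
      The shape of the fix, as a sketch (variable names assumed): copy what
      is needed while holding the xattr semaphore, drop it, and only then
      call fiemap_fill_next_extent(), which may fault in user pages:

              down_read(&EXT4_I(inode)->xattr_sem);
              /* ... compute physical, inline_len and flags here ... */
              up_read(&EXT4_I(inode)->xattr_sem);

              /* Safe to fault now: no ext4 locks are held if the destination
               * page happens to be an mmap of this same inline file. */
              if (physical)
                      error = fiemap_fill_next_extent(fieinfo, start, physical,
                                                      inline_len, flags);
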
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>