1. 15 Mar, 2020 1 commit
    • P
      io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN} · f1d96a8f
      Pavel Begunkov committed
      When processing links, io_submit_sqe() prepares requests, drops sqes, and
      passes them with sqe=NULL to io_queue_sqe(). There, IOSQE_DRAIN and/or
      IOSQE_ASYNC requests go through the same prep, which doesn't expect
      sqe=NULL and fails with a NULL pointer dereference.
      
      Always do full prepare including io_alloc_async_ctx() for linked
      requests, and then it can skip the second preparation.
      
      Cc: stable@vger.kernel.org # 5.5
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f1d96a8f
  2. 09 Mar, 2020 1 commit
    • J
      io_uring: ensure RCU callback ordering with rcu_barrier() · 805b13ad
      Jens Axboe committed
      After more careful studying, Paul informs me that we cannot rely on
      ordering of RCU callbacks in the way that the tagged commit did.
      The current construct looks like this:
      
      	void C(struct rcu_head *rhp)
      	{
      		do_something(rhp);
      		call_rcu(&p->rh, B);
      	}
      
      	call_rcu(&p->rh, A);
      	call_rcu(&p->rh, C);
      
      and we're relying on ordering between A and B, which isn't guaranteed.
      Make this explicit instead, and have a work item issue the rcu_barrier()
      to ensure that A has run before we manually execute B.
      
      While thorough testing never showed this issue, it's dependent on the
      per-cpu load in terms of RCU callbacks. The updated method simplifies
      the code as well, and eliminates the need to maintain an rcu_head in
      the fileset data.
      
      Fixes: c1e2148f ("io_uring: free fixed_file_data after RCU grace period")
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      805b13ad
  3. 07 Mar, 2020 2 commits
    • P
      io_uring: fix lockup with timeouts · f0e20b89
      Pavel Begunkov committed
      There is a recipe to deadlock the kernel: submit a timeout sqe with a
      linked_timeout (e.g.  test_single_link_timeout_ception() from liburing),
      and SIGKILL the process.
      
      Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout
      isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it
      during io_put_free() to cancel the linked timeout. Probably, the same
      can happen with another io_kill_timeout() call site, that is
      io_commit_cqring().
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f0e20b89
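The double grab described in the commit above can be modelled in userspace. The following is a minimal sketch, not kernel code: `toy_lock`, `req_sketch`, `put_req_sketch()` and `kill_timeout_sketch()` are hypothetical names, and the toy lock reports failure at the point where a real non-recursive spinlock would spin forever. It assumes that marking the request as "completion lock already held" (the role the message gives to REQ_F_COMP_LOCKED) is what lets the put path skip the second acquisition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire fails instead of spinning. */
struct toy_lock {
	bool held;
};

bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical request: comp_locked plays the role of REQ_F_COMP_LOCKED. */
struct req_sketch {
	struct toy_lock *completion_lock;
	bool comp_locked;
	bool link_cancelled;
};

/* Put path: cancels the linked timeout under the completion lock.
 * Returns false where a real spinlock would deadlock. */
bool put_req_sketch(struct req_sketch *req)
{
	if (req->comp_locked) {
		/* Caller already holds the lock, so do not take it again. */
		req->link_cancelled = true;
		return true;
	}
	if (!toy_trylock(req->completion_lock))
		return false;
	req->link_cancelled = true;
	toy_unlock(req->completion_lock);
	return true;
}

/* Kill path: holds the completion lock while putting the request. */
bool kill_timeout_sketch(struct req_sketch *req, bool flag_as_locked)
{
	bool ok;

	if (!toy_trylock(req->completion_lock))
		return false;
	if (flag_as_locked)
		req->comp_locked = true;
	ok = put_req_sketch(req);
	toy_unlock(req->completion_lock);
	return ok;
}
```

Calling `kill_timeout_sketch(&req, false)` fails inside the put path because the lock is already held by the caller; passing `true` lets the same call complete and cancel the link.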
    • J
      io_uring: free fixed_file_data after RCU grace period · c1e2148f
      Jens Axboe committed
      The percpu refcount protects this structure, and we can have an atomic
      switch in progress when exiting. This makes it unsafe to just free the
      struct normally, and can trigger the following KASAN warning:
      
      BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
      Read of size 1 at addr ffff888181a19a30 by task swapper/0/0
      
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x3b/0x60
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       __kasan_report.cold+0x1a/0x3d
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       rcu_core+0x370/0x830
       ? percpu_ref_exit+0x50/0x50
       ? rcu_note_context_switch+0x7b0/0x7b0
       ? run_rebalance_domains+0x11d/0x140
       __do_softirq+0x10a/0x3e9
       irq_exit+0xd5/0xe0
       smp_apic_timer_interrupt+0x86/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x26/0x1f0
      
      Fix this by punting the final exit and free of the struct to RCU, then
      we know that it's safe to do so. Jann suggested the approach of using a
      double rcu callback to achieve this. It's important that we do a nested
      call_rcu() callback, as otherwise the free could be ordered before the
      atomic switch, even if the latter was already queued.
      
      Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
      Suggested-by: Jann Horn <jannh@google.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c1e2148f
  4. 03 Mar, 2020 1 commit
  5. 02 Mar, 2020 5 commits
    • P
      io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation · fc04c39b
      Pavel Begunkov committed
      To cancel a work item, io-wq sets IO_WQ_WORK_CANCEL and executes the
      callback. However, IO_WQ_WORK_NO_CANCEL work items will just execute and
      may return the next work item, which will be ignored and lost.
      
      Cancel the whole link.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fc04c39b
    • L
      Linux 5.6-rc4 · 98d54f81
      Linus Torvalds committed
      98d54f81
    • L
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · e7086982
      Linus Torvalds committed
      Pull ext4 fixes from Ted Ts'o:
       "Two more bug fixes (including a regression) for 5.6"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
        jbd2: fix data races at struct journal_head
      e7086982
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f853ed90
      Linus Torvalds committed
      Pull KVM fixes from Paolo Bonzini:
       "More bugfixes, including a few remaining "make W=1" issues such as too
        large frame sizes on some configurations.
      
        On the ARM side, the compiler was messing up shadow stacks between EL1
        and EL2 code, which is easily fixed with __always_inline"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: VMX: check descriptor table exits on instruction emulation
        kvm: x86: Limit the number of "kvm: disabled by bios" messages
        KVM: x86: avoid useless copy of cpufreq policy
        KVM: allow disabling -Werror
        KVM: x86: allow compiling as non-module with W=1
        KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
        KVM: Introduce pv check helpers
        KVM: let declaration of kvm_get_running_vcpus match implementation
        KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
        arm64: Ask the compiler to __always_inline functions used by KVM at HYP
        KVM: arm64: Define our own swab32() to avoid a uapi static inline
        KVM: arm64: Ask the compiler to __always_inline functions used at HYP
        kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
        KVM: arm/arm64: Fix up includes for trace.h
      f853ed90
    • O
      KVM: VMX: check descriptor table exits on instruction emulation · 86f7e90c
      Oliver Upton committed
      KVM emulates UMIP on hardware that doesn't support it by setting the
      'descriptor table exiting' VM-execution control and performing
      instruction emulation. When running nested, this emulation is broken as
      KVM refuses to emulate L2 instructions by default.
      
      Correct this regression by allowing the emulation of descriptor table
      instructions if L1 hasn't requested 'descriptor table exiting'.
      
      Fixes: 07721fee ("KVM: nVMX: Don't emulate instructions in guest mode")
      Reported-by: Jan Kiszka <jan.kiszka@web.de>
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      86f7e90c
  6. 01 Mar, 2020 4 commits
    • L
      Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · fb279f4e
      Linus Torvalds committed
      Pull i2c fixes from Wolfram Sang:
       "I2C has three driver bugfixes for you. We agreed on the Mac regression
        to go in via I2C"
      
      * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        macintosh: therm_windtunnel: fix regression when instantiating devices
        i2c: altera: Fix potential integer overflow
        i2c: jz4780: silence log flood on txabrt
      fb279f4e
    • D
      ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() · 37b0b6b8
      Dan Carpenter committed
      If sbi->s_flex_groups_allocated is zero and the first allocation fails
      then this code will crash.  The problem is that "i--" will set "i" to
      -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
      is type promoted to unsigned and becomes UINT_MAX.  Since UINT_MAX
      is more than zero, the condition is true so we call kvfree(new_groups[-1]).
      The loop will carry on freeing invalid memory until it crashes.
      
      Fixes: 7c990728 ("ext4: fix potential race between s_flex_groups online resizing and access")
      Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      37b0b6b8
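The signed/unsigned promotion that the ext4 commit above describes is easy to reproduce outside the kernel. The sketch below is illustrative only: `buggy_cleanup_enters_loop()` and `safe_cleanup_count()` are hypothetical helpers, not functions from fs/ext4, and the count-up loop is one possible way to avoid the comparison, not necessarily the upstream change.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the loop condition described above: after "i--" the int i is
 * compared against an unsigned count, so i is converted to unsigned and
 * -1 becomes UINT_MAX. */
bool buggy_cleanup_enters_loop(int failed_at, unsigned int allocated)
{
	int i = failed_at;

	i--;                   /* step back from the slot that failed */
	return i >= allocated; /* unsigned comparison: -1 acts as UINT_MAX */
}

/* Assumed alternative: walk upwards from the already-allocated count to
 * the failing slot, so no index can ever go negative. Returns how many
 * slots the cleanup would free. */
unsigned int safe_cleanup_count(int failed_at, unsigned int allocated)
{
	unsigned int freed = 0;
	unsigned int j;

	for (j = allocated; failed_at > 0 && j < (unsigned int)failed_at; j++)
		freed++;       /* a real loop would free slot j here */
	return freed;
}
```

With `failed_at == 0` and `allocated == 0`, the buggy condition is true even though there is nothing to free, while the count-up form frees zero slots.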
    • W
      macintosh: therm_windtunnel: fix regression when instantiating devices · 38b17afb
      Wolfram Sang committed
      Removing attach_adapter from this driver caused a regression for at
      least some machines. Those machines had the sensors described in their
      DT, too, so they didn't need manual creation of the sensor devices. The
      old code worked, though, because manual creation came first. Creation of
      DT devices then failed later and caused error logs, but the sensors
      worked nonetheless because of the manually created devices.
      
      With attach_adapter removed, manual creation now comes later and loses
      the race. The sensor devices were already registered via DT, yet with
      another binding, so the driver could not be bound to them.
      
      This fix refactors the code to remove the race and only manually creates
      devices if there are no DT nodes present. Also, the DT binding is updated
      to match both the DT and the manually created devices. Because we don't
      know which device creation will be used at runtime, the code to start
      the kthread is moved to do_probe() which will be called by both methods.
      
      Fixes: 3e7bed52 ("macintosh: therm_windtunnel: drop using attach_adapter")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Tested-by: Erhard Furtner <erhard_f@mailbox.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org # v4.19+
      38b17afb
    • Q
      jbd2: fix data races at struct journal_head · 6c5d9112
      Qian Cai committed
      journal_head::b_transaction and journal_head::b_next_transaction could
      be accessed concurrently, as noticed by KCSAN:
      
       LTP: starting fsync04
       /dev/zero: Can't open blockdev
       EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
       EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
       ==================================================================
       BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
      
       write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
        __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
        __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
        jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
        (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
        kjournald2+0x13b/0x450 [jbd2]
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
        jbd2_write_access_granted+0x1b2/0x250 [jbd2]
        jbd2_write_access_granted at fs/jbd2/transaction.c:1155
        jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
        __ext4_journal_get_write_access+0x50/0x90 [ext4]
        ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
        ext4_mb_new_blocks+0x54f/0xca0 [ext4]
        ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
        ext4_map_blocks+0x3b4/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block+0x3b/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       5 locks held by fsync04/25724:
        #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
        #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
        #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
        #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
       irq event stamp: 1407125
       hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The plain reads are outside of the jh->b_state_lock critical section, which
      results in data races. Fix them by adding pairs of READ|WRITE_ONCE().
      
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      6c5d9112
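The READ|WRITE_ONCE() pairing mentioned in the jbd2 commit above can be sketched in userspace as follows. The macros are simplified approximations of the kernel's (a volatile cast, assuming the GCC/Clang `__typeof__` extension), and `journal_head_sketch`, `refile_sketch()` and `write_access_granted_sketch()` are hypothetical stand-ins for the structure and the two racing code paths named in the KCSAN report.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified approximations of the kernel macros: force exactly one
 * access through a volatile-qualified lvalue. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct transaction {
	int tid;
};

/* Hypothetical cut-down journal_head with only the two racy fields. */
struct journal_head_sketch {
	struct transaction *b_transaction;
	struct transaction *b_next_transaction;
};

/* Writer side (would run under the state lock in the kernel): move the
 * buffer to its next transaction. */
void refile_sketch(struct journal_head_sketch *jh)
{
	WRITE_ONCE(jh->b_transaction, jh->b_next_transaction);
	WRITE_ONCE(jh->b_next_transaction, NULL);
}

/* Lockless reader side: each field is loaded exactly once. */
int write_access_granted_sketch(struct journal_head_sketch *jh,
				struct transaction *handle_tx)
{
	return READ_ONCE(jh->b_transaction) == handle_tx ||
	       READ_ONCE(jh->b_next_transaction) == handle_tx;
}
```

These macros only keep the compiler from tearing, fusing or re-reading the accesses; they provide neither mutual exclusion nor memory ordering, so the reader must still tolerate seeing either the old or the new value.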
  7. 29 Feb, 2020 8 commits
  8. 28 Feb, 2020 18 commits