1. 20 7月, 2018 1 次提交
    • T
      net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan 提交于
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at:
      __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>]
      rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>]
      rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4905bd9a
  2. 19 7月, 2018 28 次提交
  3. 18 7月, 2018 8 次提交
    • T
      ALSA: hda/realtek - Yet another Clevo P950 quirk entry · f3d737b6
      Takashi Iwai 提交于
      The PCI SSID 1558:95e1 needs the same quirk for other Clevo P950
      models, too.  Otherwise no sound comes out of speakers.
      
      Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1101143
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      f3d737b6
    • P
      kvmclock: fix TSC calibration for nested guests · e10f7805
      Peng Hao 提交于
      Inside a nested guest, access to hardware can be slow enough that
      tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
      to be called periodically and the nested guest to spend a lot of time
      reading the ACPI timer.
      
      However, if the TSC frequency is available from the pvclock page,
      we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
      'refine' operation.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
      [Commit message rewritten. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e10f7805
    • L
      KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c
      Liran Alon 提交于
      When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
      with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
      by MSR_IA32_VMX_BASIC.
      
      However, even though not explictly documented by TLFS, VMXArea passed
      as VMXON argument should still be marked with revision_id reported by
      physical CPU.
      
      This issue was found by the following setup:
      * L0 = KVM which expose eVMCS to it's L1 guest.
      * L1 = KVM which consume eVMCS reported by L0.
      This setup caused the following to occur:
      1) L1 execute hardware_enable().
      2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
      3) L0 intercept L1 VMXON and execute handle_vmon() which notes
      vmxarea->revision_id != VMCS12_REVISION and therefore fails with
      nested_vmx_failInvalid() which sets RFLAGS.CF.
      4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
      hardware_enable() continues as usual.
      5) L1 hardware_enable() then calls ept_sync_global() which executes
      INVEPT.
      6) L0 intercept INVEPT and execute handle_invept() which notes
      !vmx->nested.vmxon and thus raise a #UD to L1.
      7) Raised #UD caused L1 to panic.
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 773e8a04Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2307af1c
    • P
      KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer · 9432a317
      Paolo Bonzini 提交于
      A comment warning against this bug is there, but the code is not doing what
      the comment says.  Therefore it is possible that an EPOLLHUP races against
      irq_bypass_register_consumer.  The EPOLLHUP handler schedules irqfd_shutdown,
      and if that runs soon enough, you get a use-after-free.
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      9432a317
    • L
      KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. · b5020a8e
      Lan Tianyu 提交于
      Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
      when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
      for one specific eventfd. When the assign path hasn't finished but irqfd
      has been added to kvm->irqfds.items list, another thead may deassign the
      eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
      the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
      such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
      has been added to kvm->irqfds.items list, and call synchronize_srcu()
      in irq_shutdown() to make sure that irqfd has been fully initialized in
      the assign path.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NTianyu Lan <tianyu.lan@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b5020a8e
    • D
      net: usb: rtl8150: demote allmulti message to dev_dbg() · 3a9b0455
      David Lechner 提交于
      This driver can spam the kernel log with multiple messages of:
      
          net eth0: eth0: allmulti set
      
      Usually 4 or 8 at a time (probably because of using ConnMan).
      
      This message doesn't seem useful, so let's demote it from dev_info()
      to dev_dbg().
      Signed-off-by: NDavid Lechner <david@lechnology.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a9b0455
    • A
      octeon_mgmt: Fix MIX registers configuration on MTU setup · 4aac0b43
      Alexander Sverdlin 提交于
      octeon_mgmt driver doesn't drop RX frames that are 1-4 bytes bigger than
      MTU set for the corresponding interface. The problem is in the
      AGL_GMX_RX0/1_FRM_MAX register setting, which should not account for VLAN
      tagging.
      
      According to Octeon HW manual:
      "For tagged frames, MAX increases by four bytes for each VLAN found up to a
      maximum of two VLANs, or MAX + 8 bytes."
      
      OCTEON_FRAME_HEADER_LEN "define" is fine for ring buffer management, but
      should not be used for AGL_GMX_RX0/1_FRM_MAX.
      
      The problem could be easily reproduced using "ping" command. If affected
      system has default MTU 1500, other host (having MTU >= 1504) can
      successfully "ping" the affected system with payload size 1473-1476,
      resulting in IP packets of size 1501-1504 accepted by the mgmt driver.
      Fixed system still accepts IP packets of 1500 bytes even with VLAN tagging,
      because the limits are lifted in HW as expected, for every VLAN tag.
      Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4aac0b43
    • L
      Mark HI and TASKLET softirq synchronous · 3c53776e
      Linus Torvalds 提交于
      Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
      its job"), and ever since we've had small nagging issues with it.  For
      example, we've had:
      
        1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred")
        8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
        217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      
      all of which worked around some of the effects of that commit.
      
      The DVB people have also complained that the commit causes excessive USB
      URB latencies, which seems to be due to the USB code using tasklets to
      schedule USB traffic.  This seems to be an issue mainly when already
      living on the edge, but waiting for ksoftirqd to handle it really does
      seem to cause excessive latencies.
      
      Now Hanna Hawa reports that this issue isn't just limited to USB URB and
      DVB, but also causes timeout problems for the Marvell SoC team:
      
       "I'm facing kernel panic issue while running raid 5 on sata disks
        connected to Macchiatobin (Marvell community board with Armada-8040
        SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
        and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
        uses a tasklet to clean the done descriptors from the queue"
      
      The latency problem causes a panic:
      
        mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
        Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
      
      We've discussed simply just reverting the original commit entirely, and
      also much more involved solutions (with per-softirq threads etc).  This
      patch is intentionally stupid and fairly limited, because the issue
      still remains, and the other solutions either got sidetracked or had
      other issues.
      
      We should probably also consider the timer softirqs to be synchronous
      and not be delayed to ksoftirqd (since they were the issue with the
      earlier watchdog problems), but that should be done as a separate patch.
      This does only the tasklet cases.
      Reported-and-tested-by: NHanna Hawa <hannah@marvell.com>
      Reported-and-tested-by: NJosef Griebichler <griebichler.josef@gmx.at>
      Reported-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c53776e
  4. 17 7月, 2018 3 次提交
    • T
      ALSA: rawmidi: Change resized buffers atomically · 39675f7a
      Takashi Iwai 提交于
      The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the
      current code is racy.  For example, the sequencer client may write to
      buffer while it being resized.
      
      As a simple workaround, let's switch to the resized buffer inside the
      stream runtime lock.
      
      Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      39675f7a
    • Q
      btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
      Qu Wenruo 提交于
      In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
      replace") we removed the branch of copy_nocow_pages() to avoid
      corruption for compressed nodatasum extents.
      
      However above commit only solves the problem in scrub_extent(), if
      during scrub_pages() we failed to read some pages,
      sctx->no_io_error_seen will be non-zero and we go to fixup function
      scrub_handle_errored_block().
      
      In scrub_handle_errored_block(), for sctx without csum (no matter if
      we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
      which does the similar thing with copy_nocow_pages(), but does it
      without the extra check in copy_nocow_pages() routine.
      
      So for test cases like btrfs/100, where we emulate read errors during
      replace/scrub, we could corrupt compressed extent data again.
      
      This patch will fix it just by avoiding any "optimization" for
      nodatasum, just falls back to the normal fixup routine by try read from
      any good copy.
      
      This also solves WARN_ON() or dead lock caused by lame backref iteration
      in scrub_fixup_nodatasum() routine.
      
      The deadlock or WARN_ON() won't be triggered before commit ac0b4145
      ("btrfs: scrub: Don't use inode pages for device replace") since
      copy_nocow_pages() have better locking and extra check for data extent,
      and it's already doing the fixup work by try to read data from any good
      copy, so it won't go scrub_fixup_nodatasum() anyway.
      
      This patch disables the faulty code and will be removed completely in a
      followup patch.
      
      Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      665d4953
    • U
      net/smc: take sock lock in smc_ioctl() · 1992d998
      Ursula Braun 提交于
      SMC ioctl processing requires the sock lock to work properly in
      all thinkable scenarios.
      Problem has been found with RaceFuzzer and fixes:
         KASAN: null-ptr-deref Read in smc_ioctl
      Reported-by: NByoungyoung Lee <lifeasageek@gmail.com>
      Reported-by: syzbot+35b2c5aa76fd398b9fd4@syzkaller.appspotmail.com
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1992d998