1. 05 Dec, 2021 2 commits
    • parisc: Mark cr16 CPU clocksource unstable on all SMP machines · afdb4a5b
      Committed by Helge Deller
      In commit c8c37359 ("parisc: Enhance detection of synchronous cr16
      clocksources") I assumed that CPUs on the same physical core are synchronous.
      While booting up the kernel on two different C8000 machines, one with a
      dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be
      wrong. The symptom was that I saw a jump in the internal clocks printed to the
      syslog and strange overall behaviour.  On machines which have 4 cores (2
      dual-cores) the problem isn't visible, because the current logic already marked
      the cr16 clocksource unstable in this case.
      
      This patch now marks the cr16 interval timers unstable if we have more than one
      CPU in the system, and it fixes this issue.
      
      Fixes: c8c37359 ("parisc: Enhance detection of synchronous cr16 clocksources")
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v5.15+
    • parisc: Fix "make install" on newer debian releases · 0f9fee4c
      Committed by Helge Deller
      On newer debian releases the debian-provided "installkernel" script is
      installed in /usr/sbin. Fix the kernel install.sh script to look for the
      script in this directory as well.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v3.13+
  2. 01 Dec, 2021 3 commits
  3. 29 Nov, 2021 7 commits
  4. 28 Nov, 2021 16 commits
  5. 27 Nov, 2021 12 commits
    • io_uring: Fix undefined-behaviour in io_issue_sqe · f6223ff7
      Committed by Ye Bin
      We got issue as follows:
      ================================================================================
      UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
      signed integer overflow:
      -4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
      CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x170/0x1dc lib/dump_stack.c:118
       ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
       handle_overflow+0x188/0x1dc lib/ubsan.c:192
       __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
       ktime_set include/linux/ktime.h:42 [inline]
       timespec64_to_ktime include/linux/ktime.h:78 [inline]
       io_timeout fs/io_uring.c:5153 [inline]
       io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
       __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
       io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
       io_submit_sqe fs/io_uring.c:6137 [inline]
       io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
       __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
       __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
       invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
       el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
       el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
       el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
      ================================================================================
      
      ktime_set() only checks whether 'secs' is larger than KTIME_SEC_MAX, so
      passing a negative value may lead to overflow.
      To address this issue, we must check whether 'sec' is negative.
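
The overflow and the guard described above can be illustrated with a small user-space sketch. Everything in it is an assumption for illustration: `struct ts64`, `ktime_set_sketch()` and `timeout_to_ktime()` are hypothetical stand-ins rather than the kernel's code, and the upper bound on `tv_nsec` is added only to keep the sketch itself free of overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define KTIME_MAX     INT64_MAX
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/* Simplified stand-in for the kernel's struct timespec64 (assumption). */
struct ts64 {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Same shape as the conversion described above: only the upper bound of
 * 'secs' is clamped. A large negative 'secs' would overflow the multiply,
 * so callers are expected to have rejected it already.
 */
static int64_t ktime_set_sketch(int64_t secs, unsigned long nsecs)
{
    if (secs >= KTIME_SEC_MAX)
        return KTIME_MAX;
    assert(secs >= 0); /* precondition enforced by the caller below */
    return secs * NSEC_PER_SEC + (int64_t)nsecs;
}

/* Hypothetical validation step: refuse negative values before converting. */
static int timeout_to_ktime(const struct ts64 *ts, int64_t *out)
{
    if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
        return -EINVAL;
    *out = ktime_set_sketch(ts->tv_sec, (unsigned long)ts->tv_nsec);
    return 0;
}
```

With the value from the UBSAN report (-4966321760114568020 seconds), `timeout_to_ktime()` returns `-EINVAL` instead of reaching the multiplication.
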
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6
      Committed by Ye Bin
      I got issue as follows:
      [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      [  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  594.364987] Modules linked in:
      [  594.365405] irq event stamp: 604180238
      [  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
      [  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
      [  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
      [  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
      [  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
      [  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  594.373604] Workqueue: events_unbound io_ring_exit_work
      [  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
      [  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
      [  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
      [  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
      [  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
      [  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
      [  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
      [  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
      [  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
      [  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
      [  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  594.387403] Call Trace:
      [  594.387738]  <TASK>
      [  594.388042]  find_and_remove_object+0x118/0x160
      [  594.389321]  delete_object_full+0xc/0x20
      [  594.389852]  kfree+0x193/0x470
      [  594.390275]  __io_remove_buffers.part.0+0xed/0x147
      [  594.390931]  io_ring_ctx_free+0x342/0x6a2
      [  594.392159]  io_ring_exit_work+0x41e/0x486
      [  594.396419]  process_one_work+0x906/0x15a0
      [  594.399185]  worker_thread+0x8b/0xd80
      [  594.400259]  kthread+0x3bf/0x4a0
      [  594.401847]  ret_from_fork+0x22/0x30
      [  594.402343]  </TASK>
      
      Message from syslogd@localhost at Nov 13 09:09:54 ...
      kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      
      We can reproduce this issue with the following syzkaller log:
      r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
      sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
      syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
      io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
      
      The cause of the above issue is that 'buf->list' has 2,100,000 nodes, so
      freeing them occupies the CPU long enough to trigger a soft lockup.
      To solve this issue, we need to add a schedule point to the while loop in
      '__io_remove_buffers'.
      After adding the schedule point we ran a regression test and got the following data:
      [  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      [  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
      [  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      ...
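
The idea of a schedule point inside a long teardown loop can be sketched in user space as follows. This is only an analogy under stated assumptions: `struct buf_node`, `make_list()` and `remove_buffers()` are hypothetical names, and POSIX `sched_yield()` stands in for the kernel's `cond_resched()`, which is not available outside the kernel.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry on a provided-buffer list. */
struct buf_node {
    struct buf_node *next;
};

/* Build a singly linked list of up to n nodes (stops early if malloc fails). */
static struct buf_node *make_list(unsigned n)
{
    struct buf_node *head = NULL;

    while (n--) {
        struct buf_node *node = malloc(sizeof(*node));
        if (!node)
            break;
        node->next = head;
        head = node;
    }
    return head;
}

/*
 * Free up to nbufs nodes and return how many were freed. The yield after
 * each node is the "schedule point": it gives other runnable work a chance
 * even when the list holds millions of entries.
 */
static unsigned remove_buffers(struct buf_node **headp, unsigned nbufs)
{
    unsigned freed = 0;

    assert(headp != NULL);
    while (*headp && freed < nbufs) {
        struct buf_node *node = *headp;

        *headp = node->next;
        free(node);
        freed++;
        sched_yield(); /* user-space stand-in for cond_resched() */
    }
    return freed;
}
```

Yielding on every iteration keeps the sketch simple; a real implementation might prefer a cheaper check that only reschedules when needed.
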
      
      Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • tracing: Fix pid filtering when triggers are attached · a55f224f
      Committed by Steven Rostedt (VMware)
      If an event is filtered by pid and a trigger that requires processing of
      the event to happen is attached to the event, the discard portion does
      not take the pid filtering into account, and the event will then be
      recorded when it should not have been.
      
      Cc: stable@vger.kernel.org
      Fixes: 3fdaf80f ("tracing: Implement event pid filtering")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • iommu/vt-d: Fix unmap_pages support · 86dc40c7
      Committed by Alex Williamson
      When supporting only the .map and .unmap callbacks of iommu_ops,
      the IOMMU driver can make assumptions about the size and alignment
      used for mappings based on the driver provided pgsize_bitmap.  VT-d
      previously used essentially PAGE_MASK for this bitmap as any power
      of two mapping was acceptably filled by native page sizes.
      
      However, with the .map_pages and .unmap_pages interface we're now
      getting page-size and count arguments.  If we simply combine these
      as (page-size * count) and make use of the previous map/unmap
      functions internally, any size and alignment assumptions are very
      different.
      
      As an example, a given vfio device assignment VM will often create
      a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
      does not support IOMMU super pages, the unmap_pages interface will
      ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
      will recurse down to level 2 of the page table where the first half
      of the pfn range exactly matches the entire pte level.  We clear the
      pte, increment the pfn by the level size, but (oops) the next pte is
      on a new page, so we exit the loop and pop back up a level.  When we
      then update the pfn based on that higher level, we seem to assume
      that the previous pfn value was at the start of the level.  In this
      case the level size is 256K pfns, which we add to the base pfn and
      get a result of 0x7fe00, which is clearly greater than 0x401ff,
      so we're done.  Meanwhile we never cleared the ptes for the remainder
      of the range.  When the VM remaps this range, we're overwriting valid
      ptes and the VT-d driver complains loudly, as reported by the user
      report linked below.
      
      The fix for this seems relatively simple, if each iteration of the
      loop in dma_pte_clear_level() is assumed to clear to the end of the
      level pte page, then our next pfn should be calculated from level_pfn
      rather than our working pfn.
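
The arithmetic in the example above can be checked with a short user-space sketch. It assumes a 9-bit stride per page-table level, so that one level-3 entry spans 256K pfns as stated above. The helpers `next_pfn_buggy()` and `next_pfn_fixed()` are hypothetical names that model only the pfn update, not the real dma_pte_clear_level().

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL_STRIDE 9 /* assumed: 512 entries per page-table page */

/* Number of 4KB pfns covered by a single pte at the given level. */
static uint64_t level_size(int level)
{
    assert(level >= 1 && level <= 6);
    return 1ULL << ((level - 1) * LEVEL_STRIDE);
}

/* Mask that rounds a pfn down to the start of its level-sized region. */
static uint64_t level_mask(int level)
{
    return ~(level_size(level) - 1);
}

/* Old behaviour: advance from the working pfn, wherever it happens to be. */
static uint64_t next_pfn_buggy(uint64_t pfn, int level)
{
    return pfn + level_size(level);
}

/* Behaviour described by the fix: advance from the aligned level_pfn. */
static uint64_t next_pfn_fixed(uint64_t pfn, int level)
{
    uint64_t level_pfn = pfn & level_mask(level);

    return level_pfn + level_size(level);
}
```

For the range [0x3fe00 - 0x401ff] at level 3, `next_pfn_buggy(0x3fe00, 3)` gives 0x7fe00, which is already past the end of the range, while `next_pfn_fixed(0x3fe00, 3)` gives 0x40000, so the loop would continue and clear the remaining ptes.
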
      
      Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
      Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
      Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() · 4e5973dd
      Committed by Christophe JAILLET
      If we return -EOPNOTSUPP, the rcu lock remains locked. This is spurious.
      Go through the end of the function instead. This way, the missing
      'rcu_read_unlock()' is called.
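
The "go through the end of the function" pattern can be sketched in plain C as below. This is not the driver code: `mock_read_lock()`, `mock_read_unlock()` and `check_caps()` are hypothetical, and a simple nesting counter stands in for the RCU read-side lock so that an unbalanced exit is easy to observe.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock read-side lock: a nesting counter instead of real RCU (assumption). */
static int read_lock_depth;

static void mock_read_lock(void)
{
    read_lock_depth++;
}

static void mock_read_unlock(void)
{
    assert(read_lock_depth > 0); /* catches an unlock without a lock */
    read_lock_depth--;
}

/*
 * Hypothetical capability check. The error path jumps to the common exit
 * label instead of returning directly, so the unlock always runs.
 */
static int check_caps(bool supported)
{
    int ret = 0;

    mock_read_lock();
    if (!supported) {
        ret = -EOPNOTSUPP;
        goto out; /* a bare 'return' here would leave the lock held */
    }
    /* ... work that needs the read-side lock would go here ... */
out:
    mock_read_unlock();
    return ret;
}
```

After either outcome of `check_caps()`, `read_lock_depth` is back at zero, which is the balance the commit restores.
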
      
      Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 · f7ff3cff
      Committed by Alex Bee
      With the submission of iommu driver for RK3568 a subtle bug was
      introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
      the other way around - that leads to random errors, especially when
      addresses beyond 32 bit are used.
      
      Fix it.
      
      Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2")
      Signed-off-by: Alex Bee <knaerzche@gmail.com>
      Tested-by: Peter Geis <pgwipeout@gmail.com>
      Reviewed-by: Heiko Stuebner <heiko@sntech.de>
      Tested-by: Dan Johansen <strit@manjaro.org>
      Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
      Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/amd: Clarify AMD IOMMUv2 initialization messages · 717e88aa
      Committed by Joerg Roedel
      The messages printed on the initialization of the AMD IOMMUv2 driver
      have caused some confusion in the past. Clarify the messages to lower
      the confusion in the future.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
    • iommu/vt-d: Remove unused PASID_DISABLED · 21e96a20
      Committed by Joerg Roedel
      The macro is unused after commit 00ecd540 so it can be removed.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: 00ecd540 ("iommu/vt-d: Clean up unused PASID updating functions")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
    • Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c5c17547
      Committed by Linus Torvalds
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from netfilter.
      
        Current release - regressions:
      
         - r8169: fix incorrect mac address assignment
      
         - vlan: fix underflow for the real_dev refcnt when vlan creation
           fails
      
         - smc: avoid warning of possible recursive locking
      
        Current release - new code bugs:
      
         - vsock/virtio: suppress used length validation
      
         - neigh: fix crash in v6 module initialization error path
      
        Previous releases - regressions:
      
         - af_unix: fix change in behavior in read after shutdown
      
         - igb: fix netpoll exit with traffic, avoid warning
      
         - tls: fix splice_read() when starting mid-record
      
         - lan743x: fix deadlock in lan743x_phy_link_status_change()
      
         - marvell: prestera: fix bridge port operation
      
        Previous releases - always broken:
      
         - tcp_cubic: fix spurious Hystart ACK train detections for
           not-cwnd-limited flows
      
         - nexthop: fix refcount issues when replacing IPv6 groups
      
         - nexthop: fix null pointer dereference when IPv6 is not enabled
      
         - phylink: force link down and retrigger resolve on interface change
      
         - mptcp: fix delack timer length calculation and incorrect early
           clearing
      
         - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
      
         - nfc: virtual_ncidev: change default device permissions
      
         - netfilter: ctnetlink: fix error codes and flags used for kernel
           side filtering of dumps
      
         - netfilter: flowtable: fix IPv6 tunnel addr match
      
         - ncsi: align payload to 32-bit to fix dropped packets
      
         - iavf: fix deadlock and loss of config during VF interface reset
      
         - ice: avoid bpf_prog refcount underflow
      
         - ocelot: fix broken PTP over IP and PTP API violations
      
        Misc:
      
         - marvell: mvpp2: increase MTU limit when XDP enabled"
      
      * tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
        net: dsa: microchip: implement multi-bridge support
        net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
        net: mscc: ocelot: set up traps for PTP packets
        net: ptp: add a definition for the UDP port for IEEE 1588 general messages
        net: mscc: ocelot: create a function that replaces an existing VCAP filter
        net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
        net: hns3: fix incorrect components info of ethtool --reset command
        net: hns3: fix one incorrect value of page pool info when queried by debugfs
        net: hns3: add check NULL address for page pool
        net: hns3: fix VF RSS failed problem after PF enable multi-TCs
        net: qed: fix the array may be out of bound
        net/smc: Don't call clcsock shutdown twice when smc shutdown
        net: vlan: fix underflow for the real_dev refcnt
        ptp: fix filter names in the documentation
        ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
        nfc: virtual_ncidev: change default device permissions
        net/sched: sch_ets: don't peek at classes beyond 'nbands'
        net: stmmac: Disable Tx queues when reconfiguring the interface
        selftests: tls: test for correct proto_ops
        tls: fix replacing proto_ops
        ...
    • net: dsa: microchip: implement multi-bridge support · b3612ccd
      Committed by Oleksij Rempel
      The current driver version can handle only one bridge at a time.
      Configuring two bridges on two different ports would end up shorting these
      bridges in hardware. To reproduce it:
      
      	ip l a name br0 type bridge
      	ip l a name br1 type bridge
      	ip l s dev br0 up
      	ip l s dev br1 up
      	ip l s lan1 master br0
      	ip l s dev lan1 up
      	ip l s lan2 master br1
      	ip l s dev lan2 up
      
      	Ping on lan1 and get response on lan2, which should not happen.
      
      This happened because the current driver version stores one global "Port VLAN
      Membership" and applies it to all ports which are members of any
      bridge.
      To solve this issue, we need to handle each port separately.
      
      This patch is dropping the global port member storage and calculating
      membership dynamically depending on STP state and bridge participation.
      
      Note: STP support was broken before this patch and should be fixed
      separately.
      
      Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5367cf1c
      Committed by Linus Torvalds
      Pull ACPI fixes from Rafael Wysocki:
       "These fix a NULL pointer dereference in the CPPC library code and a
        locking issue related to printing the names of ACPI device nodes in
        the device properties framework.
      
        Specifics:
      
         - Fix NULL pointer dereference in the CPPC library code occurring on
           hybrid systems without CPPC support (Rafael Wysocki).
      
         - Avoid attempts to acquire a semaphore with interrupts off when
           printing the names of ACPI device nodes and clean up code on top of
           that fix (Sakari Ailus)"
      
      * tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
        ACPI: Make acpi_node_get_parent() local
        ACPI: Get acpi_device's parent from the parent field
    • Merge tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0ce629b1
      Committed by Linus Torvalds
      Pull power management fixes from Rafael Wysocki:
       "These address three issues in the intel_pstate driver and fix two
        problems related to hibernation.
      
        Specifics:
      
         - Make intel_pstate work correctly on Ice Lake server systems with
           out-of-band performance control enabled (Adamos Ttofari).
      
         - Fix EPP handling in intel_pstate during CPU offline and online in
           the active mode (Rafael Wysocki).
      
         - Make intel_pstate support ITMT on asymmetric systems with
           overclocking enabled (Srinivas Pandruvada).
      
         - Fix hibernation image saving when using the user space interface
           based on the snapshot special device file (Evan Green).
      
         - Make the hibernation code release the snapshot block device using
           the same mode that was used when acquiring it (Thomas Zeitlhofer)"
      
      * tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: hibernate: Fix snapshot partial write lengths
        PM: hibernate: use correct mode for swsusp_close()
        cpufreq: intel_pstate: ITMT support for overclocked system
        cpufreq: intel_pstate: Fix active mode offline/online EPP handling
        cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs