1. 23 6月, 2021 8 次提交
  2. 22 6月, 2021 2 次提交
    • T
      x86/fpu: Make init_fpstate correct with optimized XSAVE · f9dfb5e3
      Thomas Gleixner 提交于
      The XSAVE init code initializes all enabled and supported components with
      XRSTOR(S) to init state. Then it XSAVEs the state of the components back
      into init_fpstate which is used in several places to fill in the init state
      of components.
      
      This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
      those use the init optimization and skip writing state of components which
      are in init state. So init_fpstate.xsave still contains all zeroes after
      this operation.
      
      There are two ways to solve that:
      
         1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when
            XSAVES is enabled because XSAVES uses compacted format.
      
         2) Save the components which are known to have a non-zero init state by other
            means.
      
      Looking deeper, #2 is the right thing to do because all components the
      kernel supports have all-zeroes init state except the legacy features (FP,
      SSE). Those cannot be hard coded because the states are not identical on all
      CPUs, but they can be saved with FXSAVE which avoids all conditionals.
      
      Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
      a BUILD_BUG_ON() which reminds developers to validate that a newly added
      component has all zeroes init state. As a bonus remove the now unused
      copy_xregs_to_kernel_booting() crutch.
      
      The XSAVE and reshuffle method can still be implemented in the unlikely
      case that components are added which have a non-zero init state and no
      other means to save them. For now, FXSAVE is just simple and good enough.
      
        [ bp: Fix a typo or two in the text. ]
      
      Fixes: 6bad06b7 ("x86, xsave: Use xsaveopt in context-switch path when supported")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
      f9dfb5e3
    • T
      x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate() · 9301982c
      Thomas Gleixner 提交于
      sanitize_restored_user_xstate() preserves the supervisor states only
      when the fx_only argument is zero, which allows unprivileged user space
      to put supervisor states back into init state.
      
      Preserve them unconditionally.
      
       [ bp: Fix a typo or two in the text. ]
      
      Fixes: 5d6b6a6f ("x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.438635017@linutronix.de
      9301982c
  3. 21 6月, 2021 4 次提交
  4. 20 6月, 2021 2 次提交
    • L
      Merge tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b84a7c28
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "Fix initrd corruption caused by our recent change to use relative jump
        labels.
      
        Fix a crash using perf record on systems without a hardware PMU
        backend.
      
        Rework our 64-bit signal handling slighty to make it more closely
        match the old behaviour, after the recent change to use unsafe user
        accessors.
      
        Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
        Axtens, Greg Kurz, and Roman Bolshakov"
      
      * tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
        powerpc: Fix initrd corruption with relative jump labels
        powerpc/signal64: Copy siginfo before changing regs->nip
        powerpc/mem: Add back missing header to fix 'no previous prototype' error
      b84a7c28
    • L
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of... · 913ec3c2
      Linus Torvalds 提交于
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix refcount usage when processing PERF_RECORD_KSYMBOL.
      
       - 'perf stat' metric group fixes.
      
       - Fix 'perf test' non-bash issue with stat bpf counters.
      
       - Update unistd, in.h and socket.h with the kernel sources, silencing
         perf build warnings.
      
      * tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        tools headers UAPI: Sync linux/in.h copy with the kernel sources
        tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
        perf beauty: Update copy of linux/socket.h with the kernel sources
        perf test: Fix non-bash issue with stat bpf counters
        perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
        perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
        perf metricgroup: Fix find_evsel_group() event selector
      913ec3c2
  5. 19 6月, 2021 24 次提交
    • L
      Merge tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d9403d30
      Linus Torvalds 提交于
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A build fix to always build modules with the 'medany' code model, as
         the module loader doesn't support 'medlow'.
      
       - A Kconfig warning fix for the SiFive errata.
      
       - A pair of fixes that for regressions to the recent memory layout
         changes.
      
       - A fix for the FU740 device tree.
      
      * tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: fu740: fix cache-controller interrupts
        riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
        riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
        riscv: sifive: fix Kconfig errata warning
        riscv32: Use medany C model for modules
      d9403d30
    • L
      Merge tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · e14c779a
      Linus Torvalds 提交于
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix zcrypt ioctl hang due to AP queue msg counter dropping below 0
         when pending requests are purged.
      
       - Two fixes for the machine check handler in the entry code.
      
      * tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/ap: Fix hanging ioctl caused by wrong msg counter
        s390/mcck: fix invalid KVM guest condition check
        s390/mcck: fix calculation of SIE critical section size
      e14c779a
    • A
      tools headers UAPI: Sync linux/in.h copy with the kernel sources · 1792a59e
      Arnaldo Carvalho de Melo 提交于
      To pick the changes in:
      
        32182747 ("icmp: don't send out ICMP messages with a source address of 0.0.0.0")
      
      That don't result in any change in tooling, as INADDR_ are not used to
      generate id->string tables used by 'perf trace'.
      
      This addresses this build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
        diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1792a59e
    • A
      tools headers UAPI: Sync asm-generic/unistd.h with the kernel original · 17d27fc3
      Arnaldo Carvalho de Melo 提交于
      To pick the changes in:
      
        8b1462b6 ("quota: finish disable quotactl_path syscall")
      
      Those headers are used in some arches to generate the syscall table used
      in 'perf trace' to translate syscall numbers into strings.
      
      This addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      17d27fc3
    • A
      perf beauty: Update copy of linux/socket.h with the kernel sources · ef83f9ef
      Arnaldo Carvalho de Melo 提交于
      To pick the changes in:
      
        ea6932d7 ("net: make get_net_ns return error if NET_NS is disabled")
      
      That don't result in any changes in the tables generated from that
      header.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
        diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
      
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ef83f9ef
    • I
      perf test: Fix non-bash issue with stat bpf counters · 482698c2
      Ian Rogers 提交于
      $(( .. )) is a bash feature but the test's interpreter is !/bin/sh,
      switch the code to use expr.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210617184216.2075588-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      482698c2
    • R
      perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL · c087e948
      Riccardo Mancini 提交于
      ASan reported a memory leak of BPF-related ksymbols map and dso. The
      leak is caused by refount never reaching 0, due to missing __put calls
      in the function machine__process_ksymbol_register.
      
      Once the dso is inserted in the map, dso__put() should be called
      (map__new2() increases the refcount to 2).
      
      The same thing applies for the map when it's inserted into maps
      (maps__insert() increases the refcount to 2).
      
        $ sudo ./perf record -- sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]
      
        =================================================================
        ==297735==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 6992 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x8e4e53 in map__new2 /home/user/linux/tools/perf/util/map.c:216:20
            #2 0x8cf68c in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:778:10
            [...]
      
        Indirect leak of 8702 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x8728d7 in dso__new_id /home/user/linux/tools/perf/util/dso.c:1256:20
            #2 0x872015 in dso__new /home/user/linux/tools/perf/util/dso.c:1295:9
            #3 0x8cf623 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:774:21
            [...]
      
        Indirect leak of 1520 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
            #2 0x888954 in map__process_kallsym_symbol /home/user/linux/tools/perf/util/symbol.c:710:8
            [...]
      
        Indirect leak of 1406 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
            #2 0x8cfbd8 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:803:8
            [...]
      Signed-off-by: NRiccardo Mancini <rickyman7@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
      Link: http://lore.kernel.org/lkml/20210612173751.188582-1-rickyman7@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c087e948
    • J
      perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter() · fe7a98b9
      John Garry 提交于
      The error code is not set at all in the sys event iter function.
      
      This may lead to an uninitialized value of "ret" in
      metricgroup__add_metric() when no CPU metric is added.
      
      Fix by properly setting the error code.
      
      It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
      if we have no CPU or sys event metric matching, then "has_match" should
      be 0 and "ret" is set to -EINVAL.
      
      However gcc cannot detect that it may not have been set after the
      map_for_each_metric() loop for CPU metrics, which is strange.
      
      Fixes: be335ec2 ("perf metricgroup: Support adding metrics for system PMUs")
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Acked-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1623335580-187317-3-git-send-email-john.garry@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fe7a98b9
    • J
      perf metricgroup: Fix find_evsel_group() event selector · fc96ec4d
      John Garry 提交于
      The following command segfaults on my x86 broadwell:
      
        $ ./perf stat  -M frontend_bound,retiring,backend_bound,bad_speculation sleep 1
        WARNING: grouped events cpus do not match, disabling group:
          anon group { raw 0x10e }
          anon group { raw 0x10e }
        perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' failed.
        Aborted (core dumped)
      
      The issue shows itself as a use-after-free in evlist__check_cpu_maps(),
      whereby the leader of an event selector (evsel) has been deleted (yet we
      still attempt to verify for an evsel).
      
      Fundamentally the problem comes from metricgroup__setup_events() ->
      find_evsel_group(), and has developed from the previous fix attempt in
      commit 9c880c24 ("perf metricgroup: Fix for metrics containing
      duration_time").
      
      The problem now is that the logic in checking if an evsel is in the same
      group is subtly broken for the "cycles" event. For the "cycles" event,
      the pmu_name is NULL; however the logic in find_evsel_group() may set an
      event matched against "cycles" as used, when it should not be.
      
      This leads to a condition where an evsel is set, yet its leader is not.
      
      Fix the check for evsel pmu_name by not matching evsels when either has a
      NULL pmu_name.
      
      There is still a pre-existing metric issue whereby the ordering of the
      metrics may break the 'stat' function, as discussed at:
      https://lore.kernel.org/lkml/49c6fccb-b716-1bf0-18a6-cace1cdb66b9@huawei.com/
      
      Fixes: 9c880c24 ("perf metricgroup: Fix for metrics containing duration_time")
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> # On a Thinkpad T450S
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1623335580-187317-2-git-send-email-john.garry@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fc96ec4d
    • D
      riscv: dts: fu740: fix cache-controller interrupts · 7ede12b0
      David Abdurachmanov 提交于
      The order of interrupt numbers is incorrect.
      
      The order for FU740 is: DirError, DataError, DataFail, DirFail
      
      From SiFive FU740-C000 Manual:
      19 - L2 Cache DirError
      20 - L2 Cache DirFail
      21 - L2 Cache DataError
      22 - L2 Cache DataFail
      Signed-off-by: NDavid Abdurachmanov <david.abdurachmanov@sifive.com>
      Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
      7ede12b0
    • J
      riscv: Ensure BPF_JIT_REGION_START aligned with PMD size · 3a02764c
      Jisheng Zhang 提交于
      Andreas reported commit fc850476 ("riscv: bpf: Avoid breaking W^X")
      breaks booting with one kind of defconfig, I reproduced a kernel panic
      with the defconfig:
      
      [    0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
      [    0.139159] Oops [#1]
      [    0.139303] Modules linked in:
      [    0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
      [    0.139934] Hardware name: riscv-virtio,qemu (DT)
      [    0.140193] epc : __memset+0xc4/0xfc
      [    0.140416]  ra : skb_flow_dissector_init+0x1e/0x82
      [    0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
      [    0.140878]  gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
      [    0.141156]  t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
      [    0.141424]  s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
      [    0.141654]  a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
      [    0.141893]  a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
      [    0.142126]  s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
      [    0.142353]  s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
      [    0.142584]  s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
      [    0.142810]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
      [    0.143042]  t5 : 0000000000000155 t6 : 00000000000003ff
      [    0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
      [    0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
      [    0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
      [    0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
      [    0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
      [    0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
      [    0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
      [    0.145124] ---[ end trace f1e9643daa46d591 ]---
      
      After some investigation, I think I found the root cause: commit
      2bfc6cd8 ("move kernel mapping outside of linear mapping") moves
      BPF JIT region after the kernel:
      
      | #define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
      
      The &_end is unlikely aligned with PMD size, so the front bpf jit
      region sits with part of kernel .data section in one PMD size mapping.
      But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is
      called to make the first bpf jit prog ROX, we will make part of kernel
      .data section RO too, so when we write to, for example memset the
      .data section, MMU will trigger a store page fault.
      
      To fix the issue, we need to ensure the BPF JIT region is PMD size
      aligned. This patch acchieve this goal by restoring the BPF JIT region
      to original position, I.E the 128MB before kernel .text section. The
      modification to kasan_init.c is inspired by Alexandre.
      
      Fixes: fc850476 ("riscv: bpf: Avoid breaking W^X")
      Reported-by: NAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: NJisheng Zhang <jszhang@kernel.org>
      Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
      3a02764c
    • J
      riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name · 314b7817
      Jisheng Zhang 提交于
      commit 2bfc6cd8 ("riscv: Move kernel mapping outside of linear
      mapping") makes use of MODULES_VADDR to populate kernel, BPF, modules
      mapping. Currently, MODULES_VADDR is defined as below for RV64:
      
      | #define MODULES_VADDR   (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
      
      But kasan_init() has two local variables which are also named as _start,
      _end, so MODULES_VADDR is evaluated with the local variable _end
      rather than the global "_end" as we expected. Fix this issue by
      renaming the two local variables.
      
      Fixes: 2bfc6cd8 ("riscv: Move kernel mapping outside of linear mapping")
      Signed-off-by: NJisheng Zhang <jszhang@kernel.org>
      Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
      314b7817
    • L
      Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9ed13a17
      Linus Torvalds 提交于
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
        bluetooth, netfilter and can.
      
        Current release - regressions:
      
         - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
           to fix modifying offloaded qdiscs
      
         - lantiq: net: fix duplicated skb in rx descriptor ring
      
         - rtnetlink: fix regression in bridge VLAN configuration, empty info
           is not an error, bot-generated "fix" was not needed
      
         - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
           creation
      
        Current release - new code bugs:
      
         - ethtool: fix NULL pointer dereference during module EEPROM dump via
           the new netlink API
      
         - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
           queue should not be visible to the stack
      
         - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
      
         - mlx5e: verify dev is present in get devlink port ndo, avoid a panic
      
        Previous releases - regressions:
      
         - neighbour: allow NUD_NOARP entries to be force GCed
      
         - further fixes for fallout from reorg of WiFi locking (staging:
           rtl8723bs, mac80211, cfg80211)
      
         - skbuff: fix incorrect msg_zerocopy copy notifications
      
         - mac80211: fix NULL ptr deref for injected rate info
      
         - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
      
        Previous releases - always broken:
      
         - bpf: more speculative execution fixes
      
         - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
      
         - udp: fix race between close() and udp_abort() resulting in a panic
      
         - fix out of bounds when parsing TCP options before packets are
           validated (in netfilter: synproxy, tc: sch_cake and mptcp)
      
         - mptcp: improve operation under memory pressure, add missing
           wake-ups
      
         - mptcp: fix double-lock/soft lookup in subflow_error_report()
      
         - bridge: fix races (null pointer deref and UAF) in vlan tunnel
           egress
      
         - ena: fix DMA mapping function issues in XDP
      
         - rds: fix memory leak in rds_recvmsg
      
        Misc:
      
         - vrf: allow larger MTUs
      
         - icmp: don't send out ICMP messages with a source address of 0.0.0.0
      
         - cdc_ncm: switch to eth%d interface naming"
      
      * tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
        net: ethernet: fix potential use-after-free in ec_bhf_remove
        selftests/net: Add icmp.sh for testing ICMP dummy address responses
        icmp: don't send out ICMP messages with a source address of 0.0.0.0
        net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
        net: ll_temac: Fix TX BD buffer overwrite
        net: ll_temac: Add memory-barriers for TX BD access
        net: ll_temac: Make sure to free skb when it is completely used
        MAINTAINERS: add Guvenc as SMC maintainer
        bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
        bnxt_en: Fix TQM fastpath ring backing store computation
        bnxt_en: Rediscover PHY capabilities after firmware reset
        cxgb4: fix wrong shift.
        mac80211: handle various extensible elements correctly
        mac80211: reset profile_periodicity/ema_ap
        cfg80211: avoid double free of PMSR request
        cfg80211: make certificate generation more robust
        mac80211: minstrel_ht: fix sample time check
        net: qed: Fix memcpy() overflow of qed_dcbx_params()
        net: cdc_eem: fix tx fixup skb leak
        net: hamradio: fix memory leak in mkiss_close
        ...
      9ed13a17
    • L
      Merge tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6fab154a
      Linus Torvalds 提交于
      Pull btrfs fix from David Sterba:
       "One more fix, for a space accounting bug in zoned mode. It happens
        when a block group is switched back rw->ro and unusable bytes (due to
        zoned constraints) are subtracted twice.
      
        It has user visible effects so I consider it important enough for late
        -rc inclusion and backport to stable"
      
      * tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: zoned: fix negative space_info->bytes_readonly
      6fab154a
    • L
      Merge tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 728a748b
      Linus Torvalds 提交于
      Pull PCI fixes from Bjorn Helgaas:
      
       - Clear 64-bit flag for host bridge windows below 4GB to fix a resource
         allocation regression added in -rc1 (Punit Agrawal)
      
       - Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter)
      
       - Avoid secondary bus resets on TI KeyStone C667X devices (Antti
         Järvinen)
      
       - Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni)
      
       - Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun)
      
       - Avoid broken ATS on AMD Navi14 GPU (Evan Quan)
      
       - Trust Broadcom BCM57414 NIC to isolate functions even though it
         doesn't advertise ACS support (Sriharsha Basavapatna)
      
       - Work around AMD RS690 BIOSes that don't configure DMA above 4GB
         (Mikel Rychliski)
      
       - Fix panic during PIO transfer on Aardvark controller (Pali Rohár)
      
      * tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: aardvark: Fix kernel panic during PIO transfer
        PCI: Add AMD RS690 quirk to enable 64-bit DMA
        PCI: Add ACS quirk for Broadcom BCM57414 NIC
        PCI: Mark AMD Navi14 GPU ATS as broken
        PCI: Work around Huawei Intelligent NIC VF FLR erratum
        PCI: Mark some NVIDIA GPUs to avoid bus reset
        PCI: Mark TI C667X to avoid bus reset
        PCI: tegra194: Fix MCFG quirk build regressions
        PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
      728a748b
    • M
      afs: Re-enable freezing once a page fault is interrupted · 9620ad86
      Matthew Wilcox (Oracle) 提交于
      If a task is killed during a page fault, it does not currently call
      sb_end_pagefault(), which means that the filesystem cannot be frozen
      at any time thereafter.  This may be reported by lockdep like this:
      
      ====================================
      WARNING: fsstress/10757 still has locks held!
      5.13.0-rc4-build4+ #91 Not tainted
      ------------------------------------
      1 lock held by fsstress/10757:
       #0: ffff888104eac530
       (
      sb_pagefaults
      
      as filesystem freezing is modelled as a lock.
      
      Fix this by removing all the direct returns from within the function,
      and using 'ret' to indicate whether we were interrupted or successful.
      
      Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9620ad86
    • P
      net: ethernet: fix potential use-after-free in ec_bhf_remove · 9cca0c2d
      Pavel Skripkin 提交于
      static void ec_bhf_remove(struct pci_dev *dev)
      {
      ...
      	struct ec_bhf_priv *priv = netdev_priv(net_dev);
      
      	unregister_netdev(net_dev);
      	free_netdev(net_dev);
      
      	pci_iounmap(dev, priv->dma_io);
      	pci_iounmap(dev, priv->io);
      ...
      }
      
      priv is netdev private data, but it is used
      after free_netdev(). It can cause use-after-free when accessing priv
      pointer. So, fix it by moving free_netdev() after pci_iounmap()
      calls.
      
      Fixes: 6af55ff5 ("Driver for Beckhoff CX5020 EtherCAT master module.")
      Signed-off-by: NPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cca0c2d
    • D
      Merge tag 'mac80211-for-net-2021-06-18' of... · 0d1dc9e1
      David S. Miller 提交于
      Merge tag 'mac80211-for-net-2021-06-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A couple of straggler fixes:
       * a minstrel HT sample check fix
       * peer measurement could double-free on races
       * certificate file generation at build time could
         sometimes hang
       * some parameters weren't reset between connections
         in mac80211
       * some extensible elements were treated as non-
         extensible, possibly causuing bad connections
         (or failures) if the AP adds data
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d1dc9e1
    • T
      selftests/net: Add icmp.sh for testing ICMP dummy address responses · 7e9838b7
      Toke Høiland-Jørgensen 提交于
      This adds a new icmp.sh selftest for testing that the kernel will respond
      correctly with an ICMP unreachable message with the dummy (192.0.0.8)
      source address when there are no IPv4 addresses configured to use as source
      addresses.
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e9838b7
    • T
      icmp: don't send out ICMP messages with a source address of 0.0.0.0 · 32182747
      Toke Høiland-Jørgensen 提交于
      When constructing ICMP response messages, the kernel will try to pick a
      suitable source address for the outgoing packet. However, if no IPv4
      addresses are configured on the system at all, this will fail and we end up
      producing an ICMP message with a source address of 0.0.0.0. This can happen
      on a box routing IPv4 traffic via v6 nexthops, for instance.
      
      Since 0.0.0.0 is not generally routable on the internet, there's a good
      chance that such ICMP messages will never make it back to the sender of the
      original packet that the ICMP message was sent in response to. This, in
      turn, can create connectivity and PMTUd problems for senders. Fortunately,
      RFC7600 reserves a dummy address to be used as a source for ICMP
      messages (192.0.0.8/32), so let's teach the kernel to substitute that
      address as a last resort if the regular source address selection procedure
      fails.
      
      Below is a quick example reproducing this issue with network namespaces:
      
      ip netns add ns0
      ip l add type veth peer netns ns0
      ip l set dev veth0 up
      ip a add 10.0.0.1/24 dev veth0
      ip a add fc00:dead:cafe:42::1/64 dev veth0
      ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
      ip -n ns0 l set dev veth0 up
      ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
      ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
      ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
      ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
      tcpdump -tpni veth0 -c 2 icmp &
      ping -w 1 10.1.0.1 > /dev/null
      tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
      listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
      IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
      IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
      2 packets captured
      2 packets received by filter
      0 packets dropped by kernel
      
      With this patch the above capture changes to:
      IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
      IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: NJuliusz Chroboczek <jch@irif.fr>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32182747
    • E
      net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY · f6396341
      Esben Haabendal 提交于
      As documented in Documentation/networking/driver.rst, the ndo_start_xmit
      method must not return NETDEV_TX_BUSY under any normal circumstances, and
      as recommended, we simply stop the tx queue in advance, when there is a
      risk that the next xmit would cause a NETDEV_TX_BUSY return.
      Signed-off-by: NEsben Haabendal <esben@geanix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6396341
    • E
      net: ll_temac: Fix TX BD buffer overwrite · c364df24
      Esben Haabendal 提交于
      Just as the initial check, we need to ensure num_frag+1 buffers available,
      as that is the number of buffers we are going to use.
      
      This fixes a buffer overflow, which might be seen during heavy network
      load. Complete lockup of TEMAC was reproducible within about 10 minutes of
      a particular load.
      
      Fixes: 84823ff8 ("net: ll_temac: Fix race condition causing TX hang")
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: NEsben Haabendal <esben@geanix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c364df24
    • E
      net: ll_temac: Add memory-barriers for TX BD access · 28d9fab4
      Esben Haabendal 提交于
      Add a couple of memory-barriers to ensure correct ordering of read/write
      access to TX BDs.
      
      In xmit_done, we should ensure that reading the additional BD fields are
      only done after STS_CTRL_APP0_CMPLT bit is set.
      
      When xmit_done marks the BD as free by setting APP0=0, we need to ensure
      that the other BD fields are reset first, so we avoid racing with the xmit
      path, which writes to the same fields.
      
      Finally, making sure to read APP0 of next BD after the current BD, ensures
      that we see all available buffers.
      Signed-off-by: NEsben Haabendal <esben@geanix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28d9fab4
    • E
      net: ll_temac: Make sure to free skb when it is completely used · 6aa32217
      Esben Haabendal 提交于
      With the skb pointer piggy-backed on the TX BD, we have a simple and
      efficient way to free the skb buffer when the frame has been transmitted.
      But in order to avoid freeing the skb while there are still fragments from
      the skb in use, we need to piggy-back on the TX BD of the skb, not the
      first.
      
      Without this, we are doing use-after-free on the DMA side, when the first
      BD of a multi TX BD packet is seen as completed in xmit_done, and the
      remaining BDs are still being processed.
      
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: NEsben Haabendal <esben@geanix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6aa32217