1. 03 Nov, 2020 30 commits
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 32b734e0
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        29dcc60f ("x86/boot/compressed/64: Add stage1 #VC handler")
        36e1be8a ("perf/x86/amd/ibs: Fix raw sample data accumulation")
        59a854e2 ("perf/x86/intel: Support TopDown metrics on Ice Lake")
        7b2c05a1 ("perf/x86/intel: Generic support for hardware TopDown metrics")
        99e40204 ("x86/msr: Move the F15h MSRs where they belong")
        b57de6cd ("x86/sev-es: Add SEV-ES Feature Detection")
        ed7bde7a ("cpufreq: intel_pstate: Allow enable/disable energy efficiency")
        f0f2f9fe ("x86/msr-index: Define an IA32_PASID MSR")
      
      That cause these changes in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-10-19 13:27:33.195274425 -0300
        +++ after	2020-10-19 13:27:44.144507610 -0300
        @@ -113,6 +113,8 @@
         	[0x00000309] = "CORE_PERF_FIXED_CTR0",
         	[0x0000030a] = "CORE_PERF_FIXED_CTR1",
         	[0x0000030b] = "CORE_PERF_FIXED_CTR2",
        +	[0x0000030c] = "CORE_PERF_FIXED_CTR3",
        +	[0x00000329] = "PERF_METRICS",
         	[0x00000345] = "IA32_PERF_CAPABILITIES",
         	[0x0000038d] = "CORE_PERF_FIXED_CTR_CTRL",
         	[0x0000038e] = "CORE_PERF_GLOBAL_STATUS",
        @@ -222,6 +224,7 @@
         	[0x00000774] = "HWP_REQUEST",
         	[0x00000777] = "HWP_STATUS",
         	[0x00000d90] = "IA32_BNDCFGS",
        +	[0x00000d93] = "IA32_PASID",
         	[0x00000da0] = "IA32_XSS",
         	[0x00000dc0] = "LBR_INFO_0",
         	[0x00000ffc] = "IA32_BNDCFGS_RSVD",
        @@ -279,6 +282,7 @@
         	[0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
         	[0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
         	[0xc001011f - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VIRT_SPEC_CTRL",
        +	[0xc0010130 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV_ES_GHCB",
         	[0xc0010131 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV",
         	[0xc0010140 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_ID_LENGTH",
         	[0xc0010141 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_STATUS",
        $
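
The table in the diff above is emitted as a C array with designated initializers, indexed by MSR number. A minimal sketch of how such a table can be queried, using a few entries copied from the diff; the names `msr_names_sketch` and `msr_name_sketch` are hypothetical and are not the identifiers perf uses:

```c
#include <stddef.h>

/* Sparse name table indexed by MSR number; slots not listed here are NULL. */
static const char *msr_names_sketch[] = {
	[0x00000309] = "CORE_PERF_FIXED_CTR0",
	[0x0000030a] = "CORE_PERF_FIXED_CTR1",
	[0x0000030b] = "CORE_PERF_FIXED_CTR2",
	[0x0000030c] = "CORE_PERF_FIXED_CTR3",
	[0x00000329] = "PERF_METRICS",
	[0x00000345] = "IA32_PERF_CAPABILITIES",
};

/* Return the symbolic name for an MSR number, or NULL if it is not in the table. */
const char *msr_name_sketch(unsigned int msr)
{
	size_t nr = sizeof(msr_names_sketch) / sizeof(msr_names_sketch[0]);

	if (msr >= nr)
		return NULL;
	return msr_names_sketch[msr];
}
```

A caller would print the name when the lookup succeeds and fall back to the raw hexadecimal number when it returns NULL.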
      
      Which causes these parts of tools/perf/ to be rebuilt:
      
        CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
        DESCEND  plugins
        GEN      /tmp/build/perf/python/perf.so
        INSTALL  trace_plugins
        LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
        LD       /tmp/build/perf/trace/beauty/perf-in.o
        LD       /tmp/build/perf/perf-in.o
        LINK     /tmp/build/perf/per
      
      At some point these should just be tables read by perf on demand.
      
      This addresses this perf tools build warning:
      
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      32b734e0
    • tools x86 headers: Update required-features.h header from the kernel · 8b2fc25a
      Arnaldo Carvalho de Melo committed
      To pick the changes from:
      
        ecac7181 ("x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT")
      
      That doesn't entail any changes in tooling, it just addresses this perf
      tools build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h'
        diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8b2fc25a
    • tools x86 headers: Update cpufeatures.h headers copies · 40a6bbf5
      Arnaldo Carvalho de Melo committed
      To pick the changes from:
      
        5866e920 ("x86/cpu: Add hardware-enforced cache coherency as a CPUID feature")
        ff4f8281 ("x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions")
        360e7c5c ("x86/cpufeatures: Add SEV-ES CPU feature")
        18ec63fa ("x86/cpufeatures: Enumerate TSX suspend load address tracking instructions")
        e48cb1a3 ("x86/resctrl: Enumerate per-thread MBA controls")
      
      Which don't cause any changes in tooling, they just address these build
      warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
        diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: Kyung Min Park <kyung.min.park@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      40a6bbf5
    • tools headers UAPI: Update fscrypt.h copy · d0448d6a
      Arnaldo Carvalho de Melo committed
      To get the changes from:
      
        c7f0207b ("fscrypt: make "#define fscrypt_policy" user-only")
      
      That doesn't cause any changes in tools/perf, it only addresses this perf
      tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
        diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0448d6a
    • tools headers UAPI: Sync drm/i915_drm.h with the kernel sources · 9e228f48
      Arnaldo Carvalho de Melo committed
      To pick the changes in:
      
        13149e8b ("drm/i915: add syncobj timeline support")
        cda9edd0 ("drm/i915: introduce a mechanism to extend execbuf2")
      
      That don't result in any changes in tooling, they just silence this perf
      build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
        diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e228f48
    • tools headers UAPI: Sync prctl.h with the kernel sources · ad6330ac
      Arnaldo Carvalho de Melo committed
      To get the changes in:
      
        1c101da8 ("arm64: mte: Allow user control of the tag check mode via prctl()")
        af5ce952 ("arm64: mte: Allow user control of the generated random tags via prctl()")
      
      Which don't cause any change in tooling, they only address this perf build
      warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
        diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ad6330ac
    • perf scripting python: Avoid declaring function pointers with a visibility attribute · d0e7b0c7
      Arnaldo Carvalho de Melo committed
      To avoid this:
      
        util/scripting-engines/trace-event-python.c: In function 'python_start_script':
        util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes]
         1595 |  PyMODINIT_FUNC (*initfunc)(void);
              |  ^~~~~~~~~~~~~~
      
      That started breaking when building with PYTHON=python3 and these gcc
      versions (I haven't checked with the clang ones, maybe it breaks there
      as well):
      
        # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz
        # dm  fedora:33 fedora:rawhide
           1   107.80 fedora:33         : Ok   gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33)
           2    92.47 fedora:rawhide    : Ok   gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34)
        #
      
      Avoid that by ditching that 'initfunc' function pointer with its:
      
          #define Py_EXPORTED_SYMBOL __attribute__ ((visibility ("default")))
          #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject*
      
      And just call PyImport_AppendInittab() at the end of the ifdef python3
      block with the functions that were being attributed to that initfunc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0e7b0c7
    • perf tools: Remove broken __no_tail_call attribute · 9ae1e990
      Peter Zijlstra committed
      The GCC-specific __attribute__((optimize)) attribute does not do what is
      commonly expected, and the GCC people explicitly recommend against using
      it in production code.
      
      Unlike what is often expected, it doesn't add to the optimization flags;
      it fully replaces them, losing any and all optimization flags provided on
      the compiler command line.
      
      The only guaranteed means of inhibiting tail-calls is to place a
      volatile asm with side-effects after the call, such that the tail-call
      simply cannot be done.
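
As an illustration of that last point, here is a minimal sketch assuming GCC or Clang; `leaf_sketch` and `caller_sketch` are hypothetical names, not code from this patch. The empty volatile asm sits between the call and the return, so the call is no longer in tail position:

```c
/* Hypothetical callee; noinline keeps it a real call (GCC/Clang attribute). */
__attribute__((noinline)) int leaf_sketch(int x)
{
	return 2 * x + 1;
}

int caller_sketch(int x)
{
	int ret = leaf_sketch(x);

	/*
	 * Volatile asm with a memory clobber: it has to be emitted after the
	 * call returns, so the compiler cannot turn the call into a jump.
	 */
	__asm__ __volatile__("" ::: "memory");
	return ret;
}
```

Unlike the optimize attribute, this affects only the one call site and leaves the command-line optimization flags untouched.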
      
      Given the original commit wasn't specific on which calls were the problem, this
      removal might re-introduce the problem, which can then be re-analyzed and cured
      properly.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Miguel Ojeda <ojeda@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9ae1e990
    • perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX · 0dfbe4c6
      Jin Yao committed
      Ian reports an issue that the metric DRAM_BW_Use often remains 0.
      
      The metric expression for DRAM_BW_Use on CLX/SKX:
      
      "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time"
      
      The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/
      are multiplied by 64 to turn a count of cache lines into bytes, and the
      result is then divided by 1000000000 to give GB.
      
      However, the counts of uncore_imc/cas_count_read/ and
      uncore_imc/cas_count_write/ have already been scaled.
      
      The scale values are from sysfs, such as
      /sys/devices/uncore_imc_0/events/cas_count_read.scale.
      It's 6.103515625e-5 (64 / 1024.0 / 1024.0).
      
      So if we use the original metric expression, the result is not correct.
      
      But the difficulty is, for SKL client, the counts are not scaled.
      
      The metric expression for DRAM_BW_Use on SKL:
      
      "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000"
      
      root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1
      
       Performance counter stats for 'system wide':
      
                     190      arb/event=0x84,umask=0x1/ #     1.86 DRAM_BW_Use
              29,093,178      arb/event=0x81,umask=0x1/
           1,000,703,287 ns   duration_time
      
             1.000703287 seconds time elapsed
      
      The result is as expected.
      
      So the easy way is to just change the metric expression for CLX/SKX.
      This patch changes the metric expression to:
      
      "( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time"
      
      1048576 = 1024 * 1024.
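
To make the arithmetic concrete, the two expressions can be written as plain C functions. This is only a sketch: the function names are hypothetical, and the duration is assumed to be passed in seconds.

```c
/* Old expression: treats the inputs as raw cache-line counts (64 bytes each). */
double dram_bw_old_sketch(double read_cnt, double write_cnt, double seconds)
{
	return (64.0 * (read_cnt + write_cnt) / 1000000000.0) / seconds;
}

/* New expression: the inputs are already scaled to MiB, so convert MiB to bytes. */
double dram_bw_new_sketch(double read_mib, double write_mib, double seconds)
{
	return (((read_mib + write_mib) * 1048576.0) / 1000000000.0) / seconds;
}
```

Fed with already-scaled values of a few hundred MiB over about one second, the old expression yields a value far below 0.01, which is displayed as 0.00, while the new one yields roughly 0.8.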
      
      Before (tested on CLX):
      
      root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
      
       Performance counter stats for 'system wide':
      
                  765.35 MiB  uncore_imc/cas_count_read/ #     0.00 DRAM_BW_Use
                    5.42 MiB  uncore_imc/cas_count_write/
              1001515088 ns   duration_time
      
             1.001515088 seconds time elapsed
      
      After:
      
      root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
      
       Performance counter stats for 'system wide':
      
                  767.95 MiB  uncore_imc/cas_count_read/ #     0.80 DRAM_BW_Use
                    5.02 MiB  uncore_imc/cas_count_write/
              1001900010 ns   duration_time
      
             1.001900010 seconds time elapsed
      
      Fixes: 038d3b53 ("perf vendor events intel: Update CascadelakeX events to v1.08")
      Fixes: b5ff7f27 ("perf vendor events: Update SkylakeX events to v1.21")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20201023005334.7869-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0dfbe4c6
    • perf trace: Fix segfault when trying to trace events by cgroup · a6293f36
      Stanislav Ivanichkin committed
        # ./perf trace -e sched:sched_switch -G test -a sleep 1
        perf: Segmentation fault
        Obtained 11 stack frames.
        ./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3]
        /lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf]
        ./perf(parse_cgroups+0x36) [0x55cfdc673f36]
        ./perf(+0x3186ed) [0x55cfdc70d6ed]
        ./perf(parse_options_subcommand+0x629) [0x55cfdc70e999]
        ./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2]
        ./perf(+0x1e8ae0) [0x55cfdc5ddae0]
        ./perf(+0x1e8ded) [0x55cfdc5ddded]
        ./perf(main+0x370) [0x55cfdc556f00]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96]
        ./perf(_start+0x29) [0x55cfdc557389]
        Segmentation fault
        #
      
       It happens because "struct trace" in option->value is passed to the
       parse_cgroups function instead of "struct evlist".
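
The class of bug can be sketched with simplified stand-in types. Everything below is hypothetical (the `_sketch` structs and functions are not perf's real definitions): a parser that expects `opt->value` to point at an evlist pointer misbehaves if it is handed an option whose `value` is a different struct, so a wrapper unwraps the right member before delegating.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for perf's types. */
struct evlist_sketch { const char *cgroup; };
struct trace_sketch  { int flags; struct evlist_sketch *evlist; };
struct option_sketch { void *value; };

/* Expects opt->value to point at a 'struct evlist_sketch *'. */
int parse_cgroups_sketch(const struct option_sketch *opt, const char *name)
{
	struct evlist_sketch *evlist = *(struct evlist_sketch **)opt->value;

	if (evlist == NULL)
		return -1;
	evlist->cgroup = name;
	return 0;
}

/* Wrapper for an option whose value is a 'struct trace_sketch *'. */
int trace_parse_cgroups_sketch(const struct option_sketch *opt, const char *name)
{
	struct trace_sketch *trace = opt->value;
	struct option_sketch inner = { .value = &trace->evlist };

	return parse_cgroups_sketch(&inner, name);
}
```

Because `value` is a `void *`, the compiler cannot catch the mismatch; the wrapper makes the expected type explicit at the one place where the option is registered.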
      
      Fixes: 9ea42ba4 ("perf trace: Support setting cgroups as targets")
      Signed-off-by: Stanislav Ivanichkin <sivanichkin@yandex-team.ru>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
      Link: http://lore.kernel.org/lkml/20201027094357.94881-1-sivanichkin@yandex-team.ru
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a6293f36
    • perf tools: Fix crash with non-jited bpf progs · ab8bf5f2
      Tommi Rantala committed
      The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to
      the bpf interpreter, ie. within kernel text section. When processing the
      unregister event, this causes unexpected removal of vmlinux_map,
      crashing perf later in cleanup:
      
        # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop
        PCOMM            PID    PPID   RET ARGS
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ]
        perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
        Aborted (core dumped)
      
        # perf script -D|grep KSYM
        0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
        0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2
        0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
        108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve
        108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
        109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve
        109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
        perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
      
      Here the addresses match the bpf interpreter:
      
        # grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms
        ffffffffa9b6b3b0 t __bpf_prog_run224
        ffffffffa9b6b3f0 t __bpf_prog_run192
        ffffffffa9b6b530 t __bpf_prog_run32
      
      Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL
      unregister event.
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201016114718.54332-1-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab8bf5f2
    • tools headers UAPI: Update process_madvise affected files · 263e452e
      Arnaldo Carvalho de Melo committed
      To pick the changes from:
      
        ecb8ac8b ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
      
      That addresses these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
        Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
        diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      263e452e
    • perf tools: Update copy of libbpf's hashmap.c · e555b4b8
      Arnaldo Carvalho de Melo committed
      To pick the changes in:
      
        85367030 ("libbpf: Centralize poisoning and poison reallocarray()")
        7d9c71e1 ("libbpf: Extract generic string hashing function for reuse")
      
      That don't entail any changes in tools/perf.
      
      This addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
        diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
      
      Not a kernel ABI, it's just that this uses the mechanism in place for
      checking kernel ABI files drift.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e555b4b8
    • perf tools: Remove LTO compiler options when building perl support · b773ea65
      Justin M. Forbes committed
      To avoid breaking the build by mixing files compiled with things coming
      from distro specific compiler options for perl with the rest of perf,
      i.e. to avoid this:
      
        `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o
      
      Noticed on Fedora 33.
      Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://src.fedoraproject.org/rpms/kernel-tools/c/589a32b62f0c12516ab7b34e3dd30d450145bfa4?branch=master
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b773ea65
    • Merge branch 'akpm' (patches from Andrew) · b7cbaf59
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "Subsystems affected by this patch series: mm (memremap, memcg,
        slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap),
        kthread, signals, lib, epoll, and core-kernel"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kernel/hung_task.c: make type annotations consistent
        epoll: add a selftest for epoll timeout race
        mm: always have io_remap_pfn_range() set pgprot_decrypted()
        mm, oom: keep oom_adj under or at upper limit when printing
        kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
        mm/truncate.c: make __invalidate_mapping_pages() static
        lib/crc32test: remove extra local_irq_disable/enable
        ptrace: fix task_join_group_stop() for the case when current is traced
        mm: mempolicy: fix potential pte_unmap_unlock pte error
        kasan: adopt KUNIT tests to SW_TAGS mode
        mm: memcg: link page counters to root if use_hierarchy is false
        mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
        hugetlb_cgroup: fix reservation accounting
        mm/mremap_pages: fix static key devmap_managed_key updates
      b7cbaf59
    • kernel/hung_task.c: make type annotations consistent · 3b70ae4f
      Lukas Bulwahn committed
      Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      removed various __user annotations from function signatures as part of
      its refactoring.
      
      It also removed the __user annotation for proc_dohung_task_timeout_secs()
      at its declaration in sched/sysctl.h, but not at its definition in
      kernel/hung_task.c.
      
      Hence, sparse complains:
      
        kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))
      
      Adjust the annotation at the definition fitting to that refactoring to make
      sparse happy again, which also resolves this warning from sparse:
      
        kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
        kernel/hung_task.c:277:52:    expected void *
        kernel/hung_task.c:277:52:    got void [noderef] __user *buffer
      
      No functional change. No change in object code.
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Ignatov <rdna@fb.com>
      Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b70ae4f
    • epoll: add a selftest for epoll timeout race · afabdf33
      Soheil Hassas Yeganeh committed
      Add a test case to ensure an event is observed by at least one poller
      when an epoll timeout is used.
      Signed-off-by: Guantao Liu <guantaol@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Link: https://lkml.kernel.org/r/20201028180202.952079-2-soheil.kdev@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afabdf33
    • mm: always have io_remap_pfn_range() set pgprot_decrypted() · f8f6ae5d
      Jason Gunthorpe committed
      The purpose of io_remap_pfn_range() is to map IO memory, such as a
      memory mapped IO exposed through a PCI BAR.  IO devices do not
      understand encryption, so this memory must always be decrypted.
      Automatically call pgprot_decrypted() as part of the generic
      implementation.
      
      This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
      using io_remap_pfn_range() to expose BAR pages to user space to fail.
      The CPU will encrypt access to those BAR pages instead of passing
      unencrypted IO directly to the device.
      
      Places not mapping IO should use remap_pfn_range().
      
      Fixes: aca20d54 ("x86/mm: Add support to make use of Secure Memory Encryption")
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Dave Young" <dyoung@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/0-v1-025d64bdf6c4+e-amd_sme_fix_jgg@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8f6ae5d
    • mm, oom: keep oom_adj under or at upper limit when printing · 66606567
      Charles Haithcock committed
      For oom_score_adj values in the range [942,999], the current
      calculations will print 16 for oom_adj.  This patch simply limits the
      output so that it is in line with the docs.
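
The range follows from integer arithmetic when mapping the newer scale onto the legacy one. A user-space sketch, assuming oom_adj spans -17..15 and oom_score_adj spans -1000..1000; the constants and function names are illustrative stand-ins, not copied from the kernel source:

```c
#define OOM_DISABLE_SKETCH        (-17)
#define OOM_ADJUST_MAX_SKETCH     15
#define OOM_SCORE_ADJ_MAX_SKETCH  1000

/* Conversion without the upper clamp: scale by 17/1000 with integer division. */
int oom_adj_unclamped_sketch(int oom_score_adj)
{
	if (oom_score_adj == OOM_SCORE_ADJ_MAX_SKETCH)
		return OOM_ADJUST_MAX_SKETCH;
	return (oom_score_adj * -OOM_DISABLE_SKETCH) / OOM_SCORE_ADJ_MAX_SKETCH;
}

/* Same conversion, but never report more than the documented maximum. */
int oom_adj_clamped_sketch(int oom_score_adj)
{
	int oom_adj = oom_adj_unclamped_sketch(oom_score_adj);

	if (oom_adj > OOM_ADJUST_MAX_SKETCH)
		oom_adj = OOM_ADJUST_MAX_SKETCH;
	return oom_adj;
}
```

For example, 942 * 17 / 1000 truncates to 16 while 941 * 17 / 1000 truncates to 15, which is why the out-of-range value starts at 942.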
      Signed-off-by: Charles Haithcock <chaithco@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Link: https://lkml.kernel.org/r/20201020165130.33927-1-chaithco@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66606567
    • kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled · 6993d0fd
      Zqiang committed
      There is a small race window when a delayed work is being canceled and
      the work still might be queued from the timer_fn:
      
      	CPU0						CPU1
      kthread_cancel_delayed_work_sync()
         __kthread_cancel_work_sync()
           __kthread_cancel_work()
              work->canceling++;
      					      kthread_delayed_work_timer_fn()
      						   kthread_insert_work();
      
      BUG: kthread_insert_work() should not get called when work->canceling is
      set.
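
The guard implied by that statement can be modelled in a few lines. This is a single-threaded, user-space sketch with hypothetical names; in the kernel the check would have to run under the worker's lock so that it cannot race with the cancel path:

```c
#include <stdbool.h>

/* Hypothetical model of a delayed work item. */
struct work_sketch {
	int canceling;   /* non-zero while a cancel is in progress */
	bool queued;     /* set once the work has been inserted */
};

/* Timer callback model: only insert the work if nobody is canceling it. */
void timer_fn_sketch(struct work_sketch *work)
{
	if (!work->canceling)
		work->queued = true;
}
```

With `canceling` set, the timer callback becomes a no-op, so the cancel path never sees the work re-queued behind its back.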
      Signed-off-by: Zqiang <qiang.zhang@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6993d0fd
    • mm/truncate.c: make __invalidate_mapping_pages() static · a77eedbc
      Jason Yan committed
      Fix the following sparse warning:
      
        mm/truncate.c:531:15: warning: symbol '__invalidate_mapping_pages' was not declared. Should it be static?
      
      Fixes: eb1d7a65 ("mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED")
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lkml.kernel.org/r/20201015054808.2445904-1-yanaijie@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a77eedbc
    • lib/crc32test: remove extra local_irq_disable/enable · aa4e460f
      Vasily Gorbik committed
      Commit 4d004099 ("lockdep: Fix lockdep recursion") uncovered the
      following issue in lib/crc32test reported on s390:
      
        BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1
        caller is lockdep_hardirqs_on_prepare+0x48/0x270
        CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.9.0-next-20201015-15164-g03d992bd2de6 #19
        Hardware name: IBM 3906 M04 704 (LPAR)
        Call Trace:
          lockdep_hardirqs_on_prepare+0x48/0x270
          trace_hardirqs_on+0x9c/0x1b8
          crc32_test.isra.0+0x170/0x1c0
          crc32test_init+0x1c/0x40
          do_one_initcall+0x40/0x130
          do_initcalls+0x126/0x150
          kernel_init_freeable+0x1f6/0x230
          kernel_init+0x22/0x150
          ret_from_fork+0x24/0x2c
        no locks held by swapper/0/1.
      
      Remove extra local_irq_disable/local_irq_enable helpers calls.
      
      Fixes: 5fb7f874 ("lib: add module support to crc32 tests")
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lkml.kernel.org/r/patch.git-4369da00c06e.your-ad-here.call-01602859837-ext-1679@work.hours
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa4e460f
    • O
      ptrace: fix task_join_group_stop() for the case when current is traced · 7b3c36fc
      Oleg Nesterov committed
      This testcase
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	void *tf(void *arg)
      	{
      		return NULL;
      	}
      
      	int main(void)
      	{
      		int pid = fork();
      		if (!pid) {
      			kill(getpid(), SIGSTOP);
      
      			pthread_t th;
      			pthread_create(&th, NULL, tf, NULL);
      
      			return 0;
      		}
      
      		waitpid(pid, NULL, WSTOPPED);
      
      		ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
      		waitpid(pid, NULL, 0);
      
      		ptrace(PTRACE_CONT, pid, 0,0);
      		waitpid(pid, NULL, 0);
      
      		int status;
      		int thread = waitpid(-1, &status, 0);
      		assert(thread > 0 && thread != pid);
      		assert(status == 0x80137f);
      
      		return 0;
      	}
      
      fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
      
      This is because task_join_group_stop() has 2 problems when current is traced:
      
      	1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
      	   can be woken up by debugger and it can clone another thread which
      	   should join the group-stop.
      
      	   We need to check group_stop_count || SIGNAL_STOP_STOPPED.
      
      	2. If SIGNAL_STOP_STOPPED is already set, we should not increment
      	   sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
      	   should stop without another do_notify_parent_cldstop() report.
      
      To clarify, the problem is very old and we should blame
      ptrace_init_task().  But now that we have task_join_group_stop() it makes
      more sense to fix this helper to avoid the code duplication.
      
      Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b3c36fc
    • S
      mm: mempolicy: fix potential pte_unmap_unlock pte error · 3f088420
      Shijie Luo committed
      When flags in queue_pages_pte_range() don't have the MPOL_MF_MOVE or
      MPOL_MF_MOVE_ALL bits set, the loop breaks early, and passing the
      original pte - 1 to pte_unmap_unlock() is then not a good idea.
      
      queue_pages_pte_range() can run in MPOL_MF_MOVE_ALL mode which doesn't
      migrate misplaced pages but returns with EIO when encountering such a
      page.  Since commit a7f40cfe ("mm: mempolicy: make mbind() return
      -EIO when MPOL_MF_STRICT is specified"), an early break on the first pte
      in the range results in pte_unmap_unlock() on an underflowed pte.  This
      can lead to lockups later on when somebody tries to lock the pte resp.
      page_table_lock again.
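
      The underflow can be illustrated with a small user-space model that
      works on array indices instead of real page table entries. Both
      function names are hypothetical, and remembering the starting position
      is only one plausible way to avoid the problem, assumed here for
      illustration:

```c
#include <stddef.h>

/*
 * Models the old pattern: walk the range, possibly break early, then
 * hand "current position - 1" to the cleanup step. Returns the index
 * that cleanup would receive.
 */
ptrdiff_t unlock_index_old(const int *vals, size_t n, int stop)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (vals[i] == stop)
            break;              /* early exit before i was advanced */
    }
    return (ptrdiff_t)i - 1;    /* -1 when the break hit element 0 */
}

/*
 * Models a safer pattern: remember where the walk started and hand
 * that position to cleanup, no matter where the loop stopped.
 */
ptrdiff_t unlock_index_saved(const int *vals, size_t n, int stop)
{
    const size_t start = 0;     /* saved before the loop runs */
    size_t i;

    for (i = 0; i < n; i++) {
        if (vals[i] == stop)
            break;
    }
    return (ptrdiff_t)start;
}
```

      With the first element matching `stop`, the old pattern yields index
      -1, which corresponds to the underflowed pte in the description, while
      the saved start stays at 0.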
      
      Fixes: a7f40cfe ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
      Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Feilong Lin <linfeilong@huawei.com>
      Cc: Shijie Luo <luoshijie1@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f088420
    • A
      kasan: adopt KUNIT tests to SW_TAGS mode · 58b999d7
      Andrey Konovalov committed
      Now that we have KASAN-KUNIT tests integration, it's easy to see that
      some KASAN tests are not adapted to the SW_TAGS mode and are failing.
      
      Adjust the allocation size for kasan_memchr() and kasan_memcmp() by
      rounding it up to OOB_TAG_OFF so the bad access ends up in a separate
      memory granule.
      
      Add a new kmalloc_uaf_16() test that relies on UAF, and a new
      kasan_bitops_tags() test that is tailored to tag-based mode, as it's
      hard to adapt the existing kmalloc_oob_16() and kasan_bitops_generic()
      (renamed from kasan_bitops()) without losing the precision.
      
      Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs,
      as it's hard to adapt the existing kmalloc_oob_16() and
      kasan_bitops_oob() (renamed from kasan_bitops()) without losing the
      precision.
      
      Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS
      mode doesn't instrument globals nor dynamic allocas.
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: David Gow <davidgow@google.com>
      Link: https://lkml.kernel.org/r/76eee17b6531ca8b3ca92b240cb2fd23204aaff7.1603129942.git.andreyknvl@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58b999d7
    • R
      mm: memcg: link page counters to root if use_hierarchy is false · 8de15e92
      Roman Gushchin committed
      Richard reported a warning which can be reproduced by running the LTP
      madvise6 test (cgroup v1 in the non-hierarchical mode should be used):
      
        WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
        Modules linked in:
        CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
        Workqueue: events drain_local_stock
        RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
        Call Trace:
          __memcg_kmem_uncharge (mm/memcontrol.c:3022)
          drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
          drain_local_stock (mm/memcontrol.c:2255)
          process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
          worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
          kthread (kernel/kthread.c:292)
          ret_from_fork (arch/x86/entry/entry_64.S:300)
      
      The problem occurs because in the non-hierarchical mode non-root page
      counters are not linked to root page counters, so the charge is not
      propagated to the root memory cgroup.
      
      After the removal of the original memory cgroup and reparenting of the
      object cgroup, the root cgroup might be uncharged by draining an objcg
      stock, for example.  It leads to an eventual underflow of the charge and
      triggers a warning.
      
      Fix it by linking all page counters to corresponding root page counters
      in the non-hierarchical mode.
      
      Please note, that in the non-hierarchical mode all objcgs are always
      reparented to the root memory cgroup, even if the hierarchy has more
      than 1 level.  This patch doesn't change it.
      
      The patch also doesn't affect how the hierarchical mode is working,
      which is the only sane and truly supported mode now.
      
      Thanks to Richard for reporting, debugging and providing an alternative
      version of the fix!
      
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Reported-by: <ltp@lists.linux.it>
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
      Debugged-by: Richard Palethorpe <rpalethorpe@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8de15e92
    • Z
      mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg · 7de2e9f1
      zhongjiang-ali committed
      memcg_page_state() returns the requested counter for the hierarchical
      memcg.  If the item is NR_ANON_THPS, the value should be multiplied by
      HPAGE_PMD_NR rather than treated as a count of single pages.
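
      As a rough illustration of the scaling, the sketch below converts a
      huge page count into bytes. The constants `MODEL_HPAGE_PMD_NR` (512)
      and `MODEL_PAGE_SIZE` (4096) are assumptions matching a common x86-64
      configuration, and `thp_count_to_bytes` is a hypothetical helper, not a
      kernel function. Widening to 64 bits before multiplying keeps the
      result from overflowing when the count type is only 32 bits wide:

```c
#include <stdint.h>

/* Assumed values for illustration: 2 MiB huge pages on 4 KiB base pages. */
#define MODEL_HPAGE_PMD_NR 512u
#define MODEL_PAGE_SIZE    4096u

/*
 * Convert a counter kept in units of huge pages into bytes.
 * The cast comes first so the whole product is computed in 64 bits.
 */
uint64_t thp_count_to_bytes(unsigned long thp_count)
{
    return (uint64_t)thp_count * MODEL_HPAGE_PMD_NR * MODEL_PAGE_SIZE;
}
```

      For example, 3000 huge pages correspond to 6291456000 bytes, which
      does not fit in 32 bits.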
      
      [akpm@linux-foundation.org: fix printk warning]
      [akpm@linux-foundation.org: use u64 cast, per Michal]
      
      Fixes: 468c3982 ("mm: memcontrol: switch to native NR_ANON_THPS counter")
      Signed-off-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Link: https://lkml.kernel.org/r/1603722395-72443-1-git-send-email-zhongjiang-ali@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7de2e9f1
    • M
      hugetlb_cgroup: fix reservation accounting · 79aa925b
      Mike Kravetz committed
      Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
      with hugetlbfs and hit the warning below.  QEMU with free page hinting
      uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
      as free by a VM.  The reporting granularity is in pageblock granularity.
      So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
      one huge page in QEMU.
      
        WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
        Modules linked in: ...
        CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
        Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
        RIP: 0010:page_counter_uncharge+0x4b/0x50
        ...
        Call Trace:
          hugetlb_cgroup_uncharge_file_region+0x4b/0x80
          region_del+0x1d3/0x300
          hugetlb_unreserve_pages+0x39/0xb0
          remove_inode_hugepages+0x1a8/0x3d0
          hugetlbfs_fallocate+0x3c4/0x5c0
          vfs_fallocate+0x146/0x290
          __x64_sys_fallocate+0x3e/0x70
          do_syscall_64+0x33/0x40
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Investigation of the issue uncovered bugs in hugetlb cgroup reservation
      accounting.  This patch addresses the found issues.
      
      Fixes: 075a61d0 ("hugetlb_cgroup: add accounting for shared mappings")
      Reported-by: Michal Privoznik <mprivozn@redhat.com>
      Co-developed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Michal Privoznik <mprivozn@redhat.com>
      Reviewed-by: Mina Almasry <almasrymina@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79aa925b
    • R
      mm/mremap_pages: fix static key devmap_managed_key updates · 46b1ee38
      Ralph Campbell committed
      Commit 6f42193f ("memremap: don't use a separate devm action for
      devmap_managed_enable_get") changed the static key updates such that we
      now call devmap_managed_enable_put() without doing the equivalent
      devmap_managed_enable_get().
      
      devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
      MEMORY_DEVICE_FS_DAX, but memunmap_pages() gets called for other pgmap
      types too.  This results in the below warning when switching between
      system-ram and devdax mode for a devdax namespace.
      
         jump label: negative count!
         WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
         Modules linked in:
         ....
      
         NIP static_key_slow_try_dec+0x88/0xa0
         LR static_key_slow_try_dec+0x84/0xa0
         Call Trace:
           static_key_slow_try_dec+0x84/0xa0
           __static_key_slow_dec_cpuslocked+0x34/0xd0
           static_key_slow_dec+0x54/0xf0
           memunmap_pages+0x36c/0x500
           devm_action_release+0x30/0x50
           release_nodes+0x2f4/0x3e0
           device_release_driver_internal+0x17c/0x280
           bus_remove_device+0x124/0x210
           device_del+0x1d4/0x530
           unregister_dev_dax+0x48/0xe0
           devm_action_release+0x30/0x50
           release_nodes+0x2f4/0x3e0
           device_release_driver_internal+0x17c/0x280
           unbind_store+0x130/0x170
           drv_attr_store+0x40/0x60
           sysfs_kf_write+0x6c/0xb0
           kernfs_fop_write+0x118/0x280
           vfs_write+0xe8/0x2a0
           ksys_write+0x84/0x140
           system_call_exception+0x120/0x270
           system_call_common+0xf0/0x27c
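
      The imbalance behind the "negative count" warning can be modelled with
      a plain counter. The sketch assumes the fix is to apply the same type
      check on the put side as on the get side; all names (`model_key`,
      `model_enable_get`, `model_enable_put`, and the `MODEL_*` enum values)
      are hypothetical stand-ins rather than kernel identifiers:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pgmap types named in the description. */
enum model_pgmap_type {
    MODEL_DEVICE_PRIVATE,
    MODEL_DEVICE_FS_DAX,
    MODEL_DEVICE_GENERIC
};

/* Plain counter standing in for the static key's enable count. */
struct model_key {
    int count;
};

/* Only these two types take a reference on the key in this model. */
bool model_type_uses_key(enum model_pgmap_type type)
{
    return type == MODEL_DEVICE_PRIVATE || type == MODEL_DEVICE_FS_DAX;
}

void model_enable_get(struct model_key *key, enum model_pgmap_type type)
{
    if (model_type_uses_key(type))
        key->count++;
}

/* The put side repeats the type check, so every put pairs with a get. */
void model_enable_put(struct model_key *key, enum model_pgmap_type type)
{
    if (model_type_uses_key(type))
        key->count--;
}
```

      With the check on both sides, a get/put pair for any type leaves the
      counter where it started, so it cannot go negative.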
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Link: https://lkml.kernel.org/r/20201023183222.13186-1-rcampbell@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      46b1ee38
    • G
      of: Drop superfluous ULL suffix for ~0 · 495023e4
      Geert Uytterhoeven committed
      There is no need to specify a "ULL" suffix for "all bits set": "~0" is
      sufficient, and works regardless of type.  In fact adding the suffix
      makes the code more fragile.
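
      A short sketch of why a suffix can make "all bits set" fragile. It
      uses a `U` suffix rather than `ULL` so the effect is visible on common
      platforms, and it assumes `unsigned int` is 32 bits wide. Both function
      names are hypothetical:

```c
#include <stdint.h>

/*
 * Plain ~0 has type int and value -1. Converting -1 to any unsigned
 * type yields that type's maximum value, so every bit ends up set
 * whatever the width of the destination.
 */
uint64_t all_ones_plain(void)
{
    return ~0;
}

/*
 * ~0U is evaluated as unsigned int first. Assuming a 32-bit unsigned
 * int, the value is 0xffffffff and is zero-extended on conversion,
 * leaving the upper 32 bits of the result clear.
 */
uint64_t all_ones_narrow_suffix(void)
{
    return ~0U;
}
```

      The same reasoning applies to any suffix that is narrower than the
      destination type: the suffix fixes the width before the conversion
      happens, while the unsuffixed form adapts to the destination.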
      
      Fixes: 48ab6d5d ("dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n")
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      495023e4
  2. 02 Nov, 2020 10 commits
    • L
      Linux 5.10-rc2 · 3cea11cd
      Linus Torvalds committed
      3cea11cd
    • L
      Merge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b56fbd8
      Linus Torvalds committed
      Pull x86 fixes from Thomas Gleixner:
       "Three fixes all related to #DB:
      
         - Handle the BTF bit correctly so it doesn't get lost due to a kernel
           #DB
      
         - Only clear and set the virtual DR6 value used by ptrace on user
           space triggered #DB. A kernel #DB must leave it alone to ensure
           data consistency for ptrace.
      
         - Make the bitmasking of the virtual DR6 storage correct so it does
           not lose DR_STEP"
      
      * tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
        x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
        x86/debug: Fix BTF handling
      7b56fbd8
    • L
      Merge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4312e0e8
      Linus Torvalds committed
      Pull timer fixes from Thomas Gleixner:
       "A few fixes for timers/timekeeping:
      
         - Prevent undefined behaviour in the timespec64_to_ns() conversion
           which is used for converting user supplied time input to
           nanoseconds. It lacked overflow protection.
      
         - Mark sched_clock_read_begin/retry() to prevent recursion in the
           tracer
      
         - Remove unused debug functions in the hrtimer and timerlist code"
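
      The overflow protection mentioned for timespec64_to_ns() can be
      sketched in user space as a saturating conversion. This is an
      illustration under assumed names (`model_timespec`, `model_ts_to_ns`,
      and the `MODEL_*` limits are hypothetical), not the kernel's code:

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000LL
#define MODEL_NS_MAX       INT64_MAX
#define MODEL_NS_MIN       INT64_MIN
/* Whole-second bounds beyond which the multiplication could overflow. */
#define MODEL_SEC_MAX      (MODEL_NS_MAX / MODEL_NSEC_PER_SEC)
#define MODEL_SEC_MIN      (MODEL_NS_MIN / MODEL_NSEC_PER_SEC)

/* Hypothetical stand-in for a 64-bit timespec. */
struct model_timespec {
    int64_t tv_sec;
    long    tv_nsec;   /* expected to be in [0, 999999999] */
};

/*
 * Convert to nanoseconds, clamping to the representable range
 * instead of letting the signed multiplication overflow.
 */
int64_t model_ts_to_ns(const struct model_timespec *ts)
{
    if (ts->tv_sec >= MODEL_SEC_MAX)
        return MODEL_NS_MAX;
    if (ts->tv_sec <= MODEL_SEC_MIN)
        return MODEL_NS_MIN;
    return ts->tv_sec * MODEL_NSEC_PER_SEC + ts->tv_nsec;
}
```

      Signed integer overflow is undefined behaviour in C, so checking the
      seconds value against the bounds before multiplying is what keeps
      user-supplied input from reaching that case.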
      
      * tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        time: Prevent undefined behaviour in timespec64_to_ns()
        timers: Remove unused inline funtion debug_timer_free()
        hrtimer: Remove unused inline function debug_hrtimer_free()
        time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
      4312e0e8
    • L
      Merge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82423b46
      Linus Torvalds committed
      Pull smp fix from Thomas Gleixner:
       "A single fix for stop machine.
      
        Mark functions no trace to prevent a crash caused by recursion when
        enabling or disabling a tracer on RISC-V (probably all architectures
        which patch through stop machine)"
      
      * tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        stop_machine, rcu: Mark functions as notrace
      82423b46
    • L
      Merge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8d99084e
      Linus Torvalds committed
      Pull locking fixes from Thomas Gleixner:
       "A couple of locking fixes:
      
         - Fix incorrect failure injection handling in the futex code
      
         - Prevent a preemption warning in lockdep when tracking
           local_irq_enable() and interrupts are already enabled
      
         - Remove more raw_cpu_read() usage from lockdep which causes state
           corruption on !X86 architectures.
      
         - Make the nr_unused_locks accounting in lockdep correct again"
      
      * tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Fix nr_unused_locks accounting
        locking/lockdep: Remove more raw_cpu_read() usage
        futex: Fix incorrect should_fail_futex() handling
        lockdep: Fix preemption WARN for spurious IRQ-enable
      8d99084e
    • L
      Merge tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 31f02006
      Linus Torvalds committed
      Pull char/misc fixes/removals from Greg KH:
       "Here's some small fixes for 5.10-rc2 and a big driver removal.
      
        The fixes are for some reported issues in the interconnect and
        coresight drivers, nothing major.
      
        The "big" driver removal is the MIC drivers have been asked to be
        removed as the hardware never shipped and Intel no longer wants to
        maintain something that no one can use. This is welcomed by many as
        the DMA usage of these drivers was "interesting" and the security
        people were starting to question some issues that were starting to be
        found in the codebase.
      
        Note, one of the subsystems for this driver, the "VOP" code, will
        probably come back in future kernel versions as it was looking to
        potentially solve some PCIe virtualization issues that a number of
        other vendors were wanting to solve. But as-is, this codebase didn't
        work for anyone else so no actual functionality is being removed.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        coresight: cti: Initialize dynamic sysfs attributes
        coresight: Fix uninitialised pointer bug in etm_setup_aux()
        coresight: add module license
        misc: mic: remove the MIC drivers
        interconnect: qcom: use icc_sync state for sm8[12]50
        interconnect: qcom: Ensure that the floor bandwidth value is enforced
        interconnect: qcom: sc7180: Init BCMs before creating the nodes
        interconnect: qcom: sdm845: Init BCMs before creating the nodes
        interconnect: Aggregate before setting initial bandwidth
        interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM
      31f02006
    • L
      Merge tag 'driver-core-5.10-rc2' of... · 9c75b68b
      Linus Torvalds committed
      Merge tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core and documentation fixes from Greg KH:
       "Here is one tiny debugfs change to fix up an API where the last user
        was successfully fixed up in 5.10-rc1 (so it couldn't be merged
        earlier), and a much larger Documentation/ABI/ update to the files so
        they can be automatically parsed by our tools.
      
        The Documentation/ABI/ updates are just formatting issues, small ones
        to bring the files into parsable format, and have been acked by
        numerous subsystem maintainers and the documentation maintainer. I
        figured it was good to get this into 5.10-rc2 to help with the merge
        issues that would arise if these were to stick in linux-next until
        5.11-rc1.
      
        The debugfs change has been in linux-next for a long time, and the
        Documentation updates only for the last linux-next release"
      
      * tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
        scripts: get_abi.pl: assume ReST format by default
        docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
        docs: ABI: sysfs-class-backlight: unify ABI documentation
        docs: ABI: sysfs-c2port: remove a duplicated entry
        docs: ABI: sysfs-class-power: unify duplicated properties
        docs: ABI: unify /sys/class/leds/<led>/brightness documentation
        docs: ABI: stable: remove a duplicated documentation
        docs: ABI: change read/write attributes
        docs: ABI: cleanup several ABI documents
        docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
        docs: ABI: vdso: use the right format for ABI
        docs: ABI: fix syntax to be parsed using ReST notation
        docs: ABI: convert testing/configfs-acpi to ReST
        docs: Kconfig/Makefile: add a check for broken ABI files
        docs: abi-testing.rst: enable --rst-sources when building docs
        docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
        docs: ABI: create a 2-depth index for ABI
        docs: ABI: make it parse ABI/stable as ReST-compatible files
        docs: ABI: sysfs-uevent: make it compatible with ReST output
        docs: ABI: testing: make the files compatible with ReST output
        ...
      9c75b68b
    • L
      Merge tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 2376cca0
      Linus Torvalds committed
      Pull staging driver fixes from Greg KH:
       "Here are some small staging driver fixes for issues that have been
        reported in 5.10-rc1:
      
         - octeon driver fixes
      
         - wfx driver fixes
      
         - memory leak fix in vchiq driver
      
         - fieldbus driver bugfix
      
         - comedi driver bugfix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: fieldbus: anybuss: jump to correct label in an error path
        staging: wfx: fix test on return value of gpiod_get_value()
        staging: wfx: fix use of uninitialized pointer
        staging: mmal-vchiq: Fix memory leak for vchiq_instance
        staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
        staging: octeon: Drop on uncorrectable alignment or FCS error
        staging: octeon: repair "fixed-link" support
      2376cca0
    • L
      Merge tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 2754a42e
      Linus Torvalds committed
      Pull tty/serial fixes from Greg KH:
       "Here are some small TTY and Serial driver fixes for reported issues
        for 5.10-rc2. They include:
      
         - vt ioctl bugfix for reported problems
      
         - fsl_lpuart serial driver fix
      
         - 21285 serial driver bugfix
      
        All have been in linux-next with no reported issues"
      
      * tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        vt_ioctl: fix GIO_UNIMAP regression
        vt: keyboard, extend func_buf_lock to readers
        vt: keyboard, simplify vt_kdgkbsent
        tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
        tty: serial: 21285: fix lockup on open
      2754a42e
    • L
      Merge tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 9b5ff3c9
      Linus Torvalds committed
      Pull USB driver fixes from Greg KH:
       "Here are a number of small bugfixes for reported issues in some USB
        drivers. They include:
      
         - typec bugfixes
      
         - xhci bugfixes and lockdep warning fixes
      
         - cdc-acm driver regression fix
      
         - kernel doc fixes
      
         - cdns3 driver bugfixes for a bunch of reported issues
      
         - other tiny USB driver fixes
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: cdns3: gadget: own the lock wrongly at the suspend routine
        usb: cdns3: Fix on-chip memory overflow issue
        usb: cdns3: gadget: suspicious implicit sign extension
        xhci: Don't create stream debugfs files with spinlock held.
        usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
        xhci: Fix sizeof() mismatch
        usb: typec: stusb160x: fix signedness comparison issue with enum variables
        usb: typec: add missing MODULE_DEVICE_TABLE() to stusb160x
        USB: apple-mfi-fastcharge: don't probe unhandled devices
        usbcore: Check both id_table and match() when both available
        usb: host: ehci-tegra: Fix error handling in tegra_ehci_probe()
        usb: typec: stusb160x: fix an IS_ERR() vs NULL check in probe
        usb: typec: tcpm: reset hard_reset_count for any disconnect
        usb: cdc-acm: fix cooldown mechanism
        usb: host: fsl-mph-dr-of: check return of dma_set_mask()
        usb: fix kernel-doc markups
        usb: typec: stusb160x: fix some signedness bugs
        usb: cdns3: Variable 'length' set but not used
      9b5ff3c9