1. 02 Jun, 2018: 4 commits
    • perf tools intel-pt-decoder: Update insn.h from the kernel sources · 0b3a1838
      Committed by Arnaldo Carvalho de Melo
      To pick up the changes in:
      
        ee6a7354 ("kprobes/x86: Prohibit probing on exception masking instructions")
      
      That doesn't entail changes in tooling, but silences this perf build
      warning:
      
        Warning: Intel PT: x86 instruction decoder header at 'tools/perf/util/intel-pt-decoder/insn.h' differs from latest version at 'arch/x86/include/asm/insn.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-o3wfwjnyh7r8l0gi9q3y9f44@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers: Sync x86 cpufeatures.h with the kernel sources · a20d23bb
      Committed by Arnaldo Carvalho de Melo
      To pick up changes found in these csets:
      
       11fb0683 x86/speculation: Add virtualized speculative store bypass disable support
       d1035d97 x86/cpufeatures: Add FEATURE_ZEN
       52817587 x86/cpufeatures: Disentangle SSBD enumeration
       7eb8956a x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
       e7c587da x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
       9f65fb29 x86/bugs: Rename _RDS to _SSBD
       764f3c21 x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
       24f7fc83 x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
       0cc5fa00 x86/cpufeatures: Add X86_FEATURE_RDS
       c456442c x86/bugs: Expose /sys/../spec_store_bypass
      
      The tools code that includes this file doesn't use any of the newly added
      X86_FEATURE_ defines:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
        LD       /tmp/build/perf/bench/perf-in.o
        LD       /tmp/build/perf/perf-in.o
      
      Silencing this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-mrwyauyov8c7s048abg26khg@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers: Synchronize prctl.h ABI header · 63b89a19
      Committed by Arnaldo Carvalho de Melo
      To pick up changes from:
      
        $ git log --oneline -2 -i include/uapi/linux/prctl.h
        356e4bff prctl: Add force disable speculation
        b617cfc8 prctl: Add speculation control prctls
      
        $ tools/perf/trace/beauty/prctl_option.sh > before.c
        $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
        $ tools/perf/trace/beauty/prctl_option.sh > after.c
        $ diff -u before.c after.c
        --- before.c	2018-06-01 10:39:53.834073962 -0300
        +++ after.c	2018-06-01 10:42:11.307985394 -0300
        @@ -35,6 +35,8 @@
                [42] = "GET_THP_DISABLE",
                [45] = "SET_FP_MODE",
                [46] = "GET_FP_MODE",
        +       [52] = "GET_SPECULATION_CTRL",
        +       [53] = "SET_SPECULATION_CTRL",
         };
         static const char *prctl_set_mm_options[] = {
       	  [1] = "START_CODE",
        $
      
      This will be used by 'perf trace' to show these strings when beautifying
      the prctl syscall args. At some point we'll be able to say something
      like:
      
      	'perf trace --all-cpus -e prctl(option=*SPEC*)'
      
      to filter by the symbolic name of an argument's value.
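      The generated table is a sparse array indexed by the prctl option value,
      with gaps left as NULL. A minimal C sketch of how such a table could
      serve both uses mentioned here, naming a raw option and selecting
      options by a glob such as *SPEC*, is shown below. The table is an
      excerpt of the script output; prctl_option_name() and
      prctl_options_matching() are hypothetical helpers, not perf's code, and
      the glob matching is assumed to use POSIX fnmatch().

```c
#include <fnmatch.h>
#include <stddef.h>

/* Excerpt of the generated table; slots not listed are implicitly NULL. */
static const char *prctl_options[] = {
	[1]  = "SET_PDEATHSIG",
	[2]  = "GET_PDEATHSIG",
	[45] = "SET_FP_MODE",
	[46] = "GET_FP_MODE",
	[52] = "GET_SPECULATION_CTRL",
	[53] = "SET_SPECULATION_CTRL",
};

#define NR_PRCTL_OPTIONS (sizeof(prctl_options) / sizeof(prctl_options[0]))

/* Hypothetical helper: name for a raw option value, or NULL if unknown. */
static const char *prctl_option_name(long option)
{
	if (option < 0 || (size_t)option >= NR_PRCTL_OPTIONS)
		return NULL;
	return prctl_options[option];
}

/*
 * Hypothetical helper: store in out[] (up to max entries) the option values
 * whose names match the shell-style glob pattern; returns how many were stored.
 */
static size_t prctl_options_matching(const char *pattern, long *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PRCTL_OPTIONS && n < max; i++) {
		if (prctl_options[i] != NULL &&
		    fnmatch(pattern, prctl_options[i], 0) == 0)
			out[n++] = (long)i;
	}
	return n;
}
```

      With this excerpt, prctl_options_matching("*SPEC*", out, 8) stores 52
      and 53, the two values added by the speculation control prctls.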
      
      This silences this warning when building tools/perf:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zztsptwhc264r8wg44tqh5gp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace beauty prctl: Default header_dir to cwd to work without parms · 0d690fc0
      Committed by Arnaldo Carvalho de Melo
      This is useful when checking the effects of header syncs on the files
      the script uses as input to generate string tables. In retrospect, this
      is how it should have been done from day one, without requiring
      header_dir to be set in the Makefile. Everything will be changed later
      so that the only parameters, common to all generators, will be
      $(srctree) and $(beauty_outdir).
      
      So, to see what it generates, just call it without any parameters:
      
        $ tools/perf/trace/beauty/prctl_option.sh
        static const char *prctl_options[] = {
      	  [1] = "SET_PDEATHSIG",
      	  [2] = "GET_PDEATHSIG",
      	  [3] = "GET_DUMPABLE",
      	  [4] = "SET_DUMPABLE",
      	  [5] = "GET_UNALIGN",
      	  [6] = "SET_UNALIGN",
      	  [7] = "GET_KEEPCAPS",
      	  [8] = "SET_KEEPCAPS",
      	  [9] = "GET_FPEMU",
      	  [10] = "SET_FPEMU",
      	  [11] = "GET_FPEXC",
      	  [12] = "SET_FPEXC",
      	  [13] = "GET_TIMING",
      	  [14] = "SET_TIMING",
      	  [15] = "SET_NAME",
      	  [16] = "GET_NAME",
      	  [19] = "GET_ENDIAN",
      	  [20] = "SET_ENDIAN",
      	  [21] = "GET_SECCOMP",
      	  [22] = "SET_SECCOMP",
      	  [25] = "GET_TSC",
      	  [26] = "SET_TSC",
      	  [27] = "GET_SECUREBITS",
      	  [28] = "SET_SECUREBITS",
      	  [29] = "SET_TIMERSLACK",
      	  [30] = "GET_TIMERSLACK",
      	  [35] = "SET_MM",
      	  [36] = "SET_CHILD_SUBREAPER",
      	  [37] = "GET_CHILD_SUBREAPER",
      	  [38] = "SET_NO_NEW_PRIVS",
      	  [39] = "GET_NO_NEW_PRIVS",
      	  [40] = "GET_TID_ADDRESS",
      	  [41] = "SET_THP_DISABLE",
      	  [42] = "GET_THP_DISABLE",
      	  [45] = "SET_FP_MODE",
      	  [46] = "GET_FP_MODE",
        };
        static const char *prctl_set_mm_options[] = {
      	  [1] = "START_CODE",
      	  [2] = "END_CODE",
      	  [3] = "START_DATA",
      	  [4] = "END_DATA",
      	  [5] = "START_STACK",
      	  [6] = "START_BRK",
      	  [7] = "BRK",
      	  [8] = "ARG_START",
      	  [9] = "ARG_END",
      	  [10] = "ENV_START",
      	  [11] = "ENV_END",
      	  [12] = "AUXV",
      	  [13] = "EXE_FILE",
      	  [14] = "MAP",
      	  [15] = "MAP_SIZE",
        };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-qtotspuztydjttxi7k6mec6h@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 31 May, 2018: 6 commits
  3. 30 May, 2018: 2 commits
    • perf test: "Session topology" dumps core on s390 · d1211091
      Committed by Thomas Richter
      The "perf test Session topology" entry fails with a core dump on s390.
      The root cause is a NULL pointer dereference in function
      check_cpu_topology() at line 76 (or line 82 without -v).
      
      The session->header.env.cpu variable is NULL because on s390 the
      function process_cpu_topology() returns with an error:
      
          socket_id number is too big.
          You may need to upgrade the perf tool.
      
      and releases the env.cpu variable via zfree() and sets it to NULL.
      
      Here is the gdb output:
      (gdb) n
      76                      pr_debug("CPU %d, core %d, socket %d\n", i,
      (gdb) n
      
      Program received signal SIGSEGV, Segmentation fault.
      0x00000000010f4d9e in check_cpu_topology (path=0x3ffffffd6c8
      	"/tmp/perf-test-J6CHMa", map=0x14a1740) at tests/topology.c:76
      76  pr_debug("CPU %d, core %d, socket %d\n", i,
      (gdb)
      
      Make sure the env.cpu variable is not used when it's NULL: test for a
      NULL pointer and return TEST_SKIP if so.
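      A minimal sketch of that guard, using simplified stand-ins for perf's
      types. The struct layouts and the TEST_* values here are assumptions
      for illustration rather than copies of perf's headers, and
      check_cpu_topology_sketch() is a hypothetical name.

```c
#include <stdio.h>

/* Assumed result codes, modelled on perf test's TEST_OK/TEST_FAIL/TEST_SKIP. */
enum { TEST_OK = 0, TEST_FAIL = -1, TEST_SKIP = -2 };

/* Simplified stand-ins for perf's topology and env structures. */
struct cpu_topology_map {
	int socket_id;
	int core_id;
};

struct perf_env {
	int nr_cpus_avail;
	struct cpu_topology_map *cpu; /* NULL after process_cpu_topology() fails */
};

/* Hypothetical check: skip rather than dereference a NULL env->cpu. */
static int check_cpu_topology_sketch(const struct perf_env *env)
{
	if (env->cpu == NULL)
		return TEST_SKIP;

	for (int i = 0; i < env->nr_cpus_avail; i++)
		printf("CPU %d, core %d, socket %d\n", i,
		       env->cpu[i].core_id, env->cpu[i].socket_id);
	return TEST_OK;
}
```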
      
      Output before:
      
        [root@p23lp27 perf]# ./perf test -F 39
        39: Session topology  :Segmentation fault (core dumped)
        [root@p23lp27 perf]#
      
      Output after:
      
        [root@p23lp27 perf]# ./perf test -vF 39
        39: Session topology                                      :
        --- start ---
        templ file: /tmp/perf-test-Ajx59D
        socket_id number is too big.You may need to upgrade the perf tool.
        ---- end ----
        Session topology: Skip
        [root@p23lp27 perf]#
      
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180528073657.11743-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Handle uncore event aliases in small groups properly · 369b2308
      Committed by Kan Liang
      Perf stat doesn't count the uncore event aliases from the same uncore
      block in a group, for example:
      
        perf stat -e '{unc_m_cas_count.all,unc_m_clockticks}' -a -I 1000
        #           time             counts unit events
             1.000447342      <not counted>      unc_m_cas_count.all
             1.000447342      <not counted>      unc_m_clockticks
             2.000740654      <not counted>      unc_m_cas_count.all
             2.000740654      <not counted>      unc_m_clockticks
      
      The output is very misleading. It gives a wrong impression that the
      uncore event doesn't work.
      
      An uncore block could be composed of several PMUs. An uncore event alias
      is a joint name, meaning the same event runs on all PMUs of a block.
      Perf doesn't support mixing events from different PMUs in the same
      group, so it is wrong to put uncore event aliases in one big group.
      
      The right way is to split the big group into multiple small groups which
      only include the events from the same PMU.
      
      Only uncore event aliases from the same uncore block should be specially
      handled here. It doesn't make sense to mix the uncore events with other
      uncore events from different blocks or even core events in a group.
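      The splitting rule can be modelled as assigning each expanded alias
      event to a group keyed by the PMU instance it was created for, so that
      no group mixes PMUs. The sketch below is a toy model, not the
      parse-events code: struct evsel_sketch, split_group_by_pmu() and the
      uncore_imc_N PMU names are hypothetical, and it assumes the caller has
      already checked that all events come from the same uncore block.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PMUS 16

/* Hypothetical, flattened view of an event after alias expansion. */
struct evsel_sketch {
	const char *name; /* alias, e.g. "unc_m_clockticks" */
	const char *pmu;  /* PMU instance this copy was created for */
	int group;        /* output: index of the small group */
};

/*
 * Put every event into a group keyed by its PMU, numbering groups in order
 * of first appearance. Returns the number of groups, or -1 if there are
 * more than MAX_PMUS distinct PMUs.
 */
static int split_group_by_pmu(struct evsel_sketch *evs, size_t n)
{
	const char *pmus[MAX_PMUS];
	size_t nr_pmus = 0;

	for (size_t i = 0; i < n; i++) {
		size_t g = 0;

		while (g < nr_pmus && strcmp(pmus[g], evs[i].pmu) != 0)
			g++;
		if (g == nr_pmus) {
			if (nr_pmus == MAX_PMUS)
				return -1;
			pmus[nr_pmus++] = evs[i].pmu;
		}
		evs[i].group = (int)g;
	}
	return (int)nr_pmus;
}
```

      With the two aliases from the example expanded over two memory
      controller PMUs, this yields two groups, each holding one
      unc_m_cas_count.all and one unc_m_clockticks event.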
      
      With the patch:
        #           time             counts unit events
           1.001557653            140,833      unc_m_cas_count.all
           1.001557653      1,330,231,332      unc_m_clockticks
           2.002709483             85,007      unc_m_cas_count.all
           2.002709483      1,429,494,563      unc_m_clockticks
      
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
      Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1525727623-19768-1-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 28 May, 2018: 3 commits
    • Merge tag 'nds32-for-linus-4.17-fixes' of... · 786b71f5
      Committed by Linus Torvalds
      Merge tag 'nds32-for-linus-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
      
      Pull nds32 fixes from Greentime Hu:
       "Bug fixes and build error fixes for nds32"
      
      * tag 'nds32-for-linus-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
        nds32: Fix compiler warning, Wstringop-overflow, in vdso.c
        nds32: Disable local irq before calling cpu_dcache_wb_page in copy_user_highpage
        nds32: Flush the cache of the page at vmaddr instead of kaddr in flush_anon_page
        nds32: Correct flush_dcache_page function
        nds32: Fix the unaligned access handler
        nds32: Renaming the file for unaligned access
        nds32: To fix a cache inconsistency issue by setting correct cacheability of NTC
        nds32: To refine readability of INT_MASK_INITAIAL_VAL
        nds32: Fix the virtual address may map too much range by tlbop issue.
        nds32: Fix the allmodconfig build. To make sure CONFIG_CPU_LITTLE_ENDIAN is default y
        nds32: Fix build failed because arch_trace_hardirqs_off is changed to trace_hardirqs_off.
        nds32: Fix the unknown type u8 issue.
        nds32: Fix the symbols undefined issue by exporting them.
        nds32: Fix xfs_buf built failed by export invalidate_kernel_vmap_range and flush_kernel_vmap_range
        nds32: Fix drivers/gpu/drm/udl/udl_fb.c building error by defining PAGE_SHARED
        nds32: Fix building error of crypto/xor.c by adding xor.h
        nds32: Fix building error when CONFIG_FREEZE is enabled.
        nds32: lib: To use generic lib instead of libgcc to prevent the symbol undefined issue.
    • Linux 4.17-rc7 · b04e2177
      Committed by Linus Torvalds
    • Merge tag 'kbuild-fixes-v4.17-2' of... · 861d9dd3
      Committed by Linus Torvalds
      Merge tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull more Kbuild fixes from Masahiro Yamada:
      
       - enable '-fno-tree-loop-im' only when supported
      
       - add '-fno-PIE' option before the asm-goto test
      
      * tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Makefile: disable PIE before testing asm goto
        kbuild: gcov: enable -fno-tree-loop-im if supported
  5. 27 May, 2018: 7 commits
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 7fbb6157
      Committed by Linus Torvalds
      Pull ARM SoC fixes from Olof Johansson:
       "A few more fixes for v4.17:
      
         - a fix for a crash in scm_call_atomic on qcom platforms
      
         - display fix for Allwinner A10
      
         - a fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
      
         - a fix for eMMC corruption on hikey
      
         - i2c-gpio descriptor tables for ixp4xx
      
        ... plus a small typo fix"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: Fix i2c-gpio GPIO descriptor tables
        arm64: dts: hikey: Fix eMMC corruption regression
        firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
        ARM: sun8i: v3s: fix spelling mistake: "disbaled" -> "disabled"
        ARM: dts: sun4i: Fix incorrect clocks for displays
        ARM: dts: sun8i: h3: Re-enable EMAC on Orange Pi One
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b2096a5e
      Committed by Linus Torvalds
      Pull x86 store buffer fixes from Thomas Gleixner:
       "Two fixes for the SSBD mitigation code:
      
         - expose SSBD properly to guests. This got broken when the CPU
           feature flags got reshuffled.
      
         - simplify the CPU detection logic to avoid duplicate entries in the
           tables"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/speculation: Simplify the CPU bug detection logic
        KVM/VMX: Expose SSBD properly to guests
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cc71efda
      Committed by Linus Torvalds
      Pull scheduler fixes from Thomas Gleixner:
       "Three fixes for scheduler and kthread code:
      
         - allow calling kthread_park() on an already parked thread
      
         - restore the sched_pi_setprio() tracepoint behaviour
      
         - clarify the unclear string for the scheduling domain debug output"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched, tracing: Fix trace_sched_pi_setprio() for deboosting
        kthread: Allow kthread_park() on a parked kthread
        sched/topology: Clarify root domain(s) debug string
    • Merge tag 'hisi-fixes-for-4.17v2' of git://github.com/hisilicon/linux-hisi into fixes · e5dd6154
      Committed by Olof Johansson
      ARM64: hisi fixes for 4.17
      
      - Remove eMMC max-frequency property to fix eMMC corruption on hikey board
      
      * tag 'hisi-fixes-for-4.17v2' of git://github.com/hisilicon/linux-hisi:
        arm64: dts: hikey: Fix eMMC corruption regression
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • ARM: Fix i2c-gpio GPIO descriptor tables · f59c303b
      Committed by Linus Walleij
      I used bad names in my clumsiness when rewriting many board
      files to use GPIO descriptors instead of platform data. A few
      had the platform_device ID set to -1 which would indeed give
      the device name "i2c-gpio".
      
      But several had it set to >=0 which gives the names
      "i2c-gpio.0", "i2c-gpio.1" ...
      
      Fix the offending instances in the ARM tree. Sorry for the
      mess.
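      The naming rule described above can be sketched as follows. It mirrors
      how the platform bus names a device from its name and id, where an id
      of -1 (PLATFORM_DEVID_NONE) means no numeric suffix; platform_dev_name()
      is a hypothetical helper, and the automatic-id case is left out. This is
      presumably why the tables had to be fixed: a lookup keyed on "i2c-gpio"
      would not match a device registered as "i2c-gpio.0".

```c
#include <stdio.h>

#define PLATFORM_DEVID_NONE (-1)

/*
 * Hypothetical helper: format the device name as "name" when id is
 * PLATFORM_DEVID_NONE and as "name.id" otherwise. Returns snprintf()'s
 * result, i.e. the length of the full name.
 */
static int platform_dev_name(char *buf, size_t len, const char *name, int id)
{
	if (id == PLATFORM_DEVID_NONE)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%s.%d", name, id);
}
```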
      
      Fixes: b2e63555 ("i2c: gpio: Convert to use descriptors")
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: Simon Guinot <simon.guinot@sequanux.org>
      Reported-by: Simon Guinot <simon.guinot@sequanux.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · ec30dcf7
      Committed by Linus Torvalds
      Pull KVM fixes from Radim Krčmář:
       "PPC:
      
         - Close a hole which could possibly lead to the host timebase getting
           out of sync.
      
         - Three fixes relating to PTEs and TLB entries for radix guests.
      
         - Fix a bug which could lead to an interrupt never getting delivered
           to the guest, if it is pending for a guest vCPU when the vCPU gets
           offlined.
      
        s390:
      
         - Fix false negatives in VSIE validity check (Cc stable)
      
        x86:
      
         - Fix time drift of VMX preemption timer when a guest uses LAPIC
           timer in periodic mode (Cc stable)
      
         - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
           migration from hosts that don't need retpoline mitigation (Cc
           stable)
      
         - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
           CPUID.OSXSAVE (Cc stable)
      
         - Report correct RIP after Hyper-V hypercall #UD (introduced in
           -rc6)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix #UD address of failed Hyper-V hypercalls
        kvm: x86: IA32_ARCH_CAPABILITIES is always supported
        KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
        x86/kvm: fix LAPIC timer drift when guest uses periodic mode
        KVM: s390: vsie: fix < 8k check for the itdba
        KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
        KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
        KVM: PPC: Book3S HV: Make radix clear pte when unmapping
        KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
        KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
    • arm64: dts: hikey: Fix eMMC corruption regression · 9c6d26df
      Committed by John Stultz
      This patch is a partial revert of
      commit abd7d097 ("arm64: dts: hikey: Enable HS200 mode on eMMC")
      
      which has been causing eMMC corruption on my HiKey board.
      
      Symptoms usually looked like:
      
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      ...
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc0: new HS200 MMC card at address 0001
      ...
      dwmmc_k3 f723d000.dwmmc0: Unexpected command timeout, state 3
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      print_req_error: I/O error, dev mmcblk0, sector 8810504
      Aborting journal on device mmcblk0p10-8.
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
      mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
      EXT4-fs error (device mmcblk0p10): ext4_journal_check_start:61: Detected aborted journal
      EXT4-fs (mmcblk0p10): Remounting filesystem read-only
      
      And quite often this would result in a disk that wouldn't properly
      boot even with older kernels.
      
      It seems the max-frequency property added by the above patch is
      causing the problem, so remove it.
      
      Cc: Ryan Grachek <ryan@edited.us>
      Cc: Wei Xu <xuwei5@hisilicon.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: YongQin Liu <yongqin.liu@linaro.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Wei Xu <xuwei04@gmail.com>
  6. 26 May, 2018: 18 commits
    • Merge branch 'akpm' (patches from Andrew) · bc2dbc54
      Committed by Linus Torvalds
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kasan: fix memory hotplug during boot
        kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
        checkpatch: fix macro argument precedence test
        init/main.c: include <linux/mem_encrypt.h>
        kernel/sys.c: fix potential Spectre v1 issue
        mm/memory_hotplug: fix leftover use of struct page during hotplug
        proc: fix smaps and meminfo alignment
        mm: do not warn on offline nodes unless the specific node is explicitly requested
        mm, memory_hotplug: make has_unmovable_pages more robust
        mm/kasan: don't vfree() nonexistent vm_area
        MAINTAINERS: change hugetlbfs maintainer and update files
        ipc/shm: fix shmat() nil address after round-down when remapping
        Revert "ipc/shm: Fix shmat mmap nil-page protection"
        idr: fix invalid ptr dereference on item delete
        ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
        mm: fix nr_rotate_swap leak in swapon() error case
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 03250e10
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
       "Let's begin the holiday weekend with some networking fixes:
      
         1) Whoops need to restrict cfg80211 wiphy names even more to 64
            bytes. From Eric Biggers.
      
         2) Fix flags being ignored when using kernel_connect() with SCTP,
            from Xin Long.
      
         3) Use after free in DCCP, from Alexey Kodanev.
      
         4) Need to check rhltable_init() return value in ipmr code, from Eric
            Dumazet.
      
         5) XDP handling fixes in virtio_net from Jason Wang.
      
         6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
      
         7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
            Morgenstein.
      
         8) Prevent out-of-bounds speculation using indexes in BPF, from
            Daniel Borkmann.
      
         9) Fix regression added by AF_PACKET link layer cure, from Willem de
            Bruijn.
      
        10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
      
        11) Missing config options for PMTU tests, from Stefano Brivio"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
        ibmvnic: Fix partial success login retries
        selftests/net: Add missing config options for PMTU tests
        mlx4_core: allocate ICM memory in page size chunks
        enic: set DMA mask to 47 bit
        ppp: remove the PPPIOCDETACH ioctl
        ipv4: remove warning in ip_recv_error
        net : sched: cls_api: deal with egdev path only if needed
        vhost: synchronize IOTLB message with dev cleanup
        packet: fix reserve calculation
        net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
        net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
        bpf: properly enforce index mask to prevent out-of-bounds speculation
        net/mlx4: Fix irq-unsafe spinlock usage
        net: phy: broadcom: Fix bcm_write_exp()
        net: phy: broadcom: Fix auxiliary control register reads
        net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
        net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
        ibmvnic: Only do H_EOI for mobility events
        tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
        virtio-net: fix leaking page for gso packet during mergeable XDP
        ...
    • kasan: fix memory hotplug during boot · 3f195972
      Committed by David Hildenbrand
      Using module_init() is wrong.  E.g.  ACPI adds and onlines memory before
      our memory notifier gets registered.
      
      This makes sure that ACPI memory detected during boot up will not result
      in a kernel crash.
      
      Easily reproducible with QEMU, just specify a DIMM when starting up.
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
      Fixes: 786a8959 ("kasan: disable memory hotplug")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: free allocated shadow memory on MEM_CANCEL_ONLINE · ed1596f9
      Committed by David Hildenbrand
      We have to free memory again when we cancel onlining, otherwise a later
      onlining attempt will fail.
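      A toy userspace model of the notifier pattern involved: whatever is
      allocated on MEM_GOING_ONLINE has to be released on MEM_CANCEL_ONLINE,
      otherwise the leftover allocation makes the next onlining attempt fail.
      The action names follow the kernel's memory-hotplug notifier events, but
      the enum values, struct shadow_state and shadow_notifier() are
      hypothetical stand-ins, not the kasan code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative values; the kernel defines its own MEM_* notifier actions. */
enum mem_action { MEM_GOING_ONLINE, MEM_ONLINE, MEM_CANCEL_ONLINE };

struct shadow_state {
	void *shadow; /* stands in for the shadow memory of the new range */
};

/* Returns true if the notifier step succeeded. */
static bool shadow_notifier(struct shadow_state *st, enum mem_action action)
{
	switch (action) {
	case MEM_GOING_ONLINE:
		if (st->shadow != NULL)
			return false; /* leftover from an earlier attempt */
		st->shadow = malloc(64);
		return st->shadow != NULL;
	case MEM_CANCEL_ONLINE:
		/* The fix: undo the allocation when onlining is cancelled. */
		free(st->shadow);
		st->shadow = NULL;
		return true;
	case MEM_ONLINE:
		return true;
	}
	return false;
}
```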
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • checkpatch: fix macro argument precedence test · d41362ed
      Committed by Joe Perches
      checkpatch's macro argument precedence test is broken so fix it.
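      For context, a macro argument precedence problem arises when a macro
      uses an argument without parentheses, so an expression passed as the
      argument gets split by operator precedence. The macro and function
      names below are made up for illustration.

```c
/* Unsafe: the argument is pasted in without parentheses. */
#define DOUBLE_UNSAFE(x) x * 2
/* Safe: parenthesize the argument and the whole expansion. */
#define DOUBLE_SAFE(x) ((x) * 2)

static int double_unsafe(int a, int b)
{
	return DOUBLE_UNSAFE(a + b); /* expands to a + b * 2 */
}

static int double_safe(int a, int b)
{
	return DOUBLE_SAFE(a + b); /* expands to ((a + b) * 2) */
}
```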
      
      Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • init/main.c: include <linux/mem_encrypt.h> · ae67d58d
      Committed by Mathieu Malaterre
      In commit c7753208 ("x86, swiotlb: Add memory encryption support") a
      call to the function `mem_encrypt_init' was added.  Include the
      prototype defined in the header <linux/mem_encrypt.h> to prevent a
      warning reported during compilation with W=1:
      
        init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
      
      Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/sys.c: fix potential Spectre v1 issue · 23d6aef7
      Committed by Gustavo A. R. Silva
      `resource' can be controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
        kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
        kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
      
      Fix this by sanitizing *resource* before using it to index
      current->signal->rlim.
      
      Notice that given that speculation windows are large, the policy is to
      kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
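      In the kernel this sanitization is normally done with
      array_index_nospec() from <linux/nospec.h>, after the usual bounds
      check. A userspace sketch of the generic (non-architecture-specific)
      mask computation follows. It assumes that right-shifting a negative
      long is arithmetic, as it is with GCC and Clang, and it only reproduces
      the branch-free arithmetic: on its own it is not a speculation barrier,
      and x86 uses its own assembly version of the mask.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * ~0UL when index < size and 0 otherwise, computed without a branch:
 * index | (size - 1 - index) has its top bit set only if index >= size
 * (the subtraction wraps). Assumes size <= LONG_MAX and an arithmetic
 * right shift of negative longs.
 */
static unsigned long array_index_mask_nospec(unsigned long index,
					     unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Returns index when it is in bounds, 0 otherwise. */
static unsigned long array_index_nospec(unsigned long index,
					unsigned long size)
{
	return index & array_index_mask_nospec(index, size);
}
```

      A caller would first reject resource >= RLIM_NLIMITS and then reassign
      resource = array_index_nospec(resource, RLIM_NLIMITS) before indexing
      the rlim array.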
      
      Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: fix leftover use of struct page during hotplug · a2155861
      Jonathan Cameron committed
      The case of a new NUMA node was missed when we stopped using the node
      info from struct page during hotplug.  In this path, there is a call to
      register_mem_sect_under_node() (which lets the caller specify that this
      is hotplug, so the node should not be changed) via link_mem_sections(),
      which unfortunately does not offer that option.
      
      The fix is to pass check_nid through link_mem_sections() as well and to
      disable it in the new NUMA node path.
      
      Note that the bug only manifests 'sometimes', depending on what happens
      to be in the struct page structures: there are lots of them, and only
      one of them needs to match.
      
      The result of the bug is that (with a new memory-only node) we never
      successfully call register_mem_sect_under_node(), so the memory is not
      associated with the node in sysfs and the node's meminfo does not
      report it.
      
      It came up whilst testing some arm64 hotplug patches, but appears to be
      universal.  Whilst I'm triggering it by removing and then reinserting
      memory in a node with no other elements (thus making the node disappear
      and then appear again), it appears it would also happen when hotplugging
      memory where there was none before, and it doesn't seem to be related
      to the arm64 patches.
      
      These patches call __add_pages() (where most of the issue was fixed by
      Pavel's patch).  If a node exists at the time of the __add_pages() call,
      all is well, because it calls register_mem_sect_under_node() from there
      with check_nid set to false.  Without a node, that function returns
      without doing the sysfs-related work, as there is no node to use.
      This is expected, but it is the resulting path that fails.
      
      The exact path to the problem is as follows:

        mm/memory_hotplug.c: add_memory_resource()
          The node is not online, so we enter the 'if (new_node)' block
          twice; the second such block calls link_mem_sections(), which
          calls into

        drivers/base/node.c: link_mem_sections(), which calls

        drivers/base/node.c: register_mem_sect_under_node(), which calls
          get_nid_for_pfn() and keeps trying until its output matches the
          expected node (passed all the way down from
          add_memory_resource())
      
      It is effectively the same fix as the one referred to in the Fixes tag,
      just in the code path for a new node, where the comments point out that
      we have to rerun the link creation because it will have failed in
      register_new_memory() (as there was no node at the time).  (Actually,
      that comment is now wrong: register_new_memory() no longer exists; it
      was renamed to hotplug_memory_register() in Pavel's patch.)
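      The failure mode can be modeled with a toy structure in which each
      "page" carries a possibly stale node id. This is a simplified sketch,
      not the drivers/base/node.c code: the demo_* names are hypothetical, and
      a single section stands in for the pfn walk. It shows why the nid check
      only "sometimes" fails and why forwarding check_nid = false through the
      link step fixes the new-node path.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define DEMO_PAGES 8

      /* Toy memory section: each "page" records a node id that may be stale. */
      struct demo_section {
          int page_nid[DEMO_PAGES]; /* may hold garbage during hotplug */
          int linked_nid;           /* -1 until linked to a node */
      };

      /* Hypothetical analogue of register_mem_sect_under_node(). */
      static bool demo_register_under_node(struct demo_section *s, int nid,
                                           bool check_nid)
      {
          for (size_t i = 0; i < DEMO_PAGES; i++) {
              /* With check_nid, trust the per-page nid and skip mismatches. */
              if (check_nid && s->page_nid[i] != nid)
                  continue;
              s->linked_nid = nid;
              return true;
          }
          return false; /* no page matched: memory never shows up under the node */
      }

      /* Hypothetical analogue of link_mem_sections(), now forwarding check_nid. */
      static bool demo_link_sections(struct demo_section *s, int nid, bool check_nid)
      {
          return demo_register_under_node(s, nid, check_nid);
      }
      ```

      If a single stale entry happens to equal the target nid, the checked path
      succeeds by luck. This is the intermittent behaviour described above.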
      
      Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
      Fixes: fc44f7f9 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: fix smaps and meminfo alignment · 6c04ab0e
      Hugh Dickins committed
      The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit
      numbers (commonly 0) are misaligned.
      
      Remove seq_put_decimal_ull_width()'s leftover optimization for single
      digits: it's wrong now that num_to_str() takes care of the width.
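      Here is a minimal sketch of the bug pattern. It assumes snprintf() as a
      stand-in for num_to_str(). The demo_* helpers are hypothetical and only
      mimic seq_put_decimal_ull_width(). The buggy variant emits single digits
      without padding, which is exactly what misaligned the zero-valued
      columns.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Old behaviour: the single-digit fast path skips the width padding. */
      static void demo_put_width_buggy(char *buf, size_t size,
                                       unsigned long long num, int width)
      {
          if (num < 10) {
              snprintf(buf, size, "%c", (char)('0' + num));
              return;
          }
          snprintf(buf, size, "%*llu", width, num);
      }

      /* Fixed behaviour: every value goes through the width-aware path. */
      static void demo_put_width(char *buf, size_t size,
                                 unsigned long long num, int width)
      {
          snprintf(buf, size, "%*llu", width, num);
      }
      ```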
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils
      Fixes: d1be35cb ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do not warn on offline nodes unless the specific node is explicitly requested · 8addc2d0
      Michal Hocko committed
      Oscar has noticed that we splat
      
         WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
         [...]
         CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G        W   E     4.17.0-rc5-next-20180517-1-default+ #66
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
         Workqueue: kacpi_hotplug acpi_hotplug_work_fn
         Call Trace:
          vmemmap_populate+0xf2/0x2ae
          sparse_mem_map_populate+0x28/0x35
          sparse_add_one_section+0x4c/0x187
          __add_pages+0xe7/0x1a0
          add_pages+0x16/0x70
          add_memory_resource+0xa3/0x1d0
          add_memory+0xe4/0x110
          acpi_memory_device_add+0x134/0x2e0
          acpi_bus_attach+0xd9/0x190
          acpi_bus_scan+0x37/0x70
          acpi_device_hotplug+0x389/0x4e0
          acpi_hotplug_work_fn+0x1a/0x30
          process_one_work+0x146/0x340
          worker_thread+0x47/0x3e0
          kthread+0xf5/0x130
          ret_from_fork+0x35/0x40
      
      when adding memory to a node that is currently offline.
      
      The VM_WARN_ON is just too loud without a good reason.  In this
      particular case we are doing
      
      	alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
      
      so we do not insist on allocating from the given node (it is more of a
      hint); we can fall back to any other populated node, and moreover we
      explicitly ask not to warn on allocation failure.
      
      Restrict the warning to cases where somebody explicitly asks for the
      given node with __GFP_THISNODE.
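      The softened condition can be expressed as a predicate. This sketch uses
      hypothetical flag values (the real __GFP_* bits differ) and plain
      booleans in place of node_online(). It contrasts warning on every
      offline node with warning only when __GFP_THISNODE makes the node a
      hard requirement.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical flag bits; the kernel's __GFP_* values are different. */
      #define DEMO_GFP_KERNEL   0x1u
      #define DEMO_GFP_NOWARN   0x2u
      #define DEMO_GFP_THISNODE 0x4u

      /* Old condition: warn whenever the requested node is offline. */
      static bool demo_should_warn_old(bool node_online, unsigned int gfp)
      {
          (void)gfp;
          return !node_online;
      }

      /*
       * New condition: warn only if the caller insists on this node.
       * Without THISNODE the node is a hint and the allocator may fall back.
       */
      static bool demo_should_warn_new(bool node_online, unsigned int gfp)
      {
          return (gfp & DEMO_GFP_THISNODE) && !node_online;
      }
      ```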
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Oscar Salvador <osalvador@techadventures.net>
      Tested-by: Oscar Salvador <osalvador@techadventures.net>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: make has_unmovable_pages more robust · 15c30bc0
      Michal Hocko committed
      Oscar has reported:
      : Due to an unfortunate setting with movablecore, memblocks containing bootmem
      : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
      : So while trying to remove that memory, the system failed in do_migrate_range
      : and __offline_pages never returned.
      :
      : This can be reproduced by running
      : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
      : and movablecore=4G kernel command line
      :
      : linux kernel: BIOS-provided physical RAM map:
      : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
      : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
      : linux kernel: NX (Execute Disable) protection: active
      : linux kernel: SMBIOS 2.8 present.
      : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
      : linux kernel: Hypervisor detected: KVM
      : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
      : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
      : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
      :
      : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
      : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
      : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
      : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
      : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
      :
      : zoneinfo shows that the zone movable is placed into both numa nodes:
      : Node 0, zone  Movable
      :   pages free     160140
      :         min      1823
      :         low      2278
      :         high     2733
      :         spanned  262144
      :         present  262144
      :         managed  245670
      : Node 1, zone  Movable
      :   pages free     448427
      :         min      3827
      :         low      4783
      :         high     5739
      :         spanned  524288
      :         present  524288
      :         managed  515766
      
      Note how only Node 0 has a hotpluggable memory region, which would rule
      it out from the early memblock allocations (most likely memmap).  Node 1
      will surely contain memmaps on the same node, and those would prevent
      offlining from succeeding.  So this is arguably a configuration issue,
      although one could argue that we should be more clever and rule out
      early allocations from the movable zone.  That would be correct, but
      probably not worth the effort considering what a hack movablecore is.
      
      Anyway, we could do better for those cases.  We rely on
      start_isolate_page_range() and has_unmovable_pages() to do their job.
      The former isolates the whole range to be offlined so that we no longer
      allocate from it, and the latter makes sure we are not stumbling over
      non-migratable pages.
      
      has_unmovable_pages() is overly optimistic, however.  It doesn't check
      all the pages if we are within zone_movable, because we rely on those
      pages always being migratable.  As it turns out, we are still not
      perfect there.  While bootmem pages in zone_movable sound like a clear
      bug which should be fixed, let's remove the optimization for now and
      warn if we encounter unmovable pages in zone_movable in the meantime.
      That should help for now, at least.
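      Here is a rough model of the revised check. It assumes a toy page
      descriptor with a single movable flag. The real has_unmovable_pages()
      inspects struct page state in far more detail, and
      demo_has_unmovable_pages() is hypothetical. The point is the removed
      shortcut: a movable zone is scanned like any other, and an unmovable page
      found there is reported rather than silently trusted.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Toy page descriptor: real code inspects struct page state. */
      struct demo_page {
          bool movable;
      };

      /*
       * Scan every page, even in a movable zone (the old shortcut returned
       * "no unmovable pages" for ZONE_MOVABLE without looking). An unmovable
       * page inside a movable zone is still reported, with a warning.
       */
      static bool demo_has_unmovable_pages(const struct demo_page *pages, size_t n,
                                           bool zone_is_movable)
      {
          for (size_t i = 0; i < n; i++) {
              if (pages[i].movable)
                  continue;
              if (zone_is_movable)
                  fprintf(stderr, "warning: unmovable page %zu in movable zone\n", i);
              return true;
          }
          return false;
      }
      ```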
      
      By the way, this wasn't a real problem until commit 72b39cfc ("mm,
      memory_hotplug: do not fail offlining too early"), because we used to
      retry a small number of times and then fail.  That turned out to be too
      fragile, though.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Oscar Salvador <osalvador@techadventures.net>
      Tested-by: Oscar Salvador <osalvador@techadventures.net>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/kasan: don't vfree() nonexistent vm_area · 0f901dcb
      Andrey Ryabinin committed
      KASAN uses different routines to map shadow for hot-added memory and for
      memory obtained during boot.  An attempt to offline memory onlined by
      the normal boot process leads to this:
      
          Trying to vfree() nonexistent vm area (000000005d3b34b9)
          WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
      
          Call Trace:
           kasan_mem_notifier+0xad/0xb9
           notifier_call_chain+0x166/0x260
           __blocking_notifier_call_chain+0xdb/0x140
           __offline_pages+0x96a/0xb10
           memory_subsys_offline+0x76/0xc0
           device_offline+0xb8/0x120
           store_mem_state+0xfa/0x120
           kernfs_fop_write+0x1d5/0x320
           __vfs_write+0xd4/0x530
           vfs_write+0x105/0x340
           SyS_write+0xb0/0x140
      
      Obviously we can't call vfree() to free memory that wasn't allocated via
      vmalloc().  Use find_vm_area() to see if we can call vfree().
      
      Unfortunately, it's a bit tricky to properly unmap and free shadow
      allocated during boot, so we'll have to keep it.  If the memory comes
      online again, that shadow will be reused.
      
      Matthew asked: how can you call vfree() on something that isn't a
      vmalloc address?
      
        vfree() is able to free any address returned by
        __vmalloc_node_range().  And __vmalloc_node_range() gives you any
        address you ask.  It doesn't have to be an address in [VMALLOC_START,
        VMALLOC_END] range.
      
        That's also how module_alloc()/module_memfree() work on
        architectures that have a designated area for modules.
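      The fix boils down to "free only what this allocator handed out". Here
      is a userspace sketch of that ownership check. It assumes a small
      registry array in place of the vmalloc area list, and malloc()/free() in
      place of __vmalloc_node_range()/vfree(). demo_find_area() plays the role
      of find_vm_area(), and all demo_* names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdlib.h>

      #define DEMO_MAX_AREAS 8

      /* Registry of regions handed out by the (toy) dynamic allocator. */
      static void *demo_areas[DEMO_MAX_AREAS];

      /* Stand-in for __vmalloc_node_range(): allocate and record the region. */
      static void *demo_area_alloc(size_t size)
      {
          for (size_t i = 0; i < DEMO_MAX_AREAS; i++) {
              if (!demo_areas[i]) {
                  demo_areas[i] = malloc(size);
                  return demo_areas[i];
              }
          }
          return NULL;
      }

      /* Stand-in for find_vm_area(): the registry slot for addr, or NULL. */
      static void **demo_find_area(const void *addr)
      {
          for (size_t i = 0; i < DEMO_MAX_AREAS; i++)
              if (addr && demo_areas[i] == addr)
                  return &demo_areas[i];
          return NULL;
      }

      /*
       * Offline path: free the shadow only if it came from the allocator.
       * Boot-time shadow is not registered, so it is kept for later reuse.
       */
      static bool demo_release_shadow(void *shadow)
      {
          void **slot = demo_find_area(shadow);

          if (!slot)
              return false;
          free(*slot);
          *slot = NULL;
          return true;
      }
      ```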
      
      [aryabinin@virtuozzo.com: improve comments]
        Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
      [akpm@linux-foundation.org: fix typos, reflow comment]
      Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: change hugetlbfs maintainer and update files · b9ddff9b
      Mike Kravetz committed
      The current hugetlbfs maintainer has not been active for more than a few
      years.  I have been active in this area for more than two years and
      plan to remain active in the foreseeable future.
      
      Also, update the hugetlbfs entry to include the linux-mm mailing list
      and additional hugetlbfs-related files.  hugetlb.c and hugetlb.h are not
      100% hugetlbfs, but the majority of their content is hugetlbfs-related.
      
      Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/shm: fix shmat() nil address after round-down when remapping · 8f89c007
      Davidlohr Bueso committed
      shmat()'s SHM_REMAP option forbids passing a nil address; this is in
      fact the very first thing we check.  Andrea reported that in
      SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
      so we need to check again whether the address was rounded down to nil.
      As of this patch, such cases return -EINVAL.
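      Here is a simplified sketch of the address validation. It assumes a
      4096-byte SHMLBA and hypothetical flag constants; the real
      SHM_RND/SHM_REMAP come from <sys/shm.h>, and the real code handles more
      cases. It shows both nil checks: the original one on the caller's
      address and the new one after rounding down. demo_shmat_addr() is
      hypothetical.

      ```c
      #include <assert.h>
      #include <errno.h>

      /* Hypothetical constants; SHMLBA is architecture-specific in reality. */
      #define DEMO_SHM_RND   0x1
      #define DEMO_SHM_REMAP 0x2
      #define DEMO_SHMLBA    4096UL

      /* Returns 0 and stores the (possibly rounded) address, or -EINVAL. */
      static int demo_shmat_addr(unsigned long addr, int shmflg, unsigned long *out)
      {
          if (addr) {
              if (addr & (DEMO_SHMLBA - 1)) {
                  if (!(shmflg & DEMO_SHM_RND))
                      return -EINVAL;
                  addr &= ~(DEMO_SHMLBA - 1); /* round down */
                  /* New check: rounding an address below SHMLBA yields nil. */
                  if (!addr && (shmflg & DEMO_SHM_REMAP))
                      return -EINVAL;
              }
          } else if (shmflg & DEMO_SHM_REMAP) {
              /* Original check: SHM_REMAP with a nil address. */
              return -EINVAL;
          }
          *out = addr;
          return 0;
      }
      ```

      SHM_RND without SHM_REMAP still accepts an address that rounds down to
      0, as the X11 use case in the companion revert requires.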
      
      Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "ipc/shm: Fix shmat mmap nil-page protection" · a73ab244
      Davidlohr Bueso committed
      Patch series "ipc/shm: shmat() fixes around nil-page".
      
      These patches fix two issues reported[1] a while back by Joe and Andrea
      around how shmat(2) behaves with nil-page.
      
      The first reverts a commit that was based on the incorrect belief that
      mapping the nil-page (address=0) with MAP_FIXED is a no-no.  This is not
      the case, with the exception of SHM_REMAP, which is addressed in the
      second patch.
      
      I chose two patches because this is easier to backport and explicitly
      reverts the bogus behaviour.  Both patches ought to be in -stable, and
      the LTP testcases need to be updated (the testcase added for the CVE
      can be modified to test just SHM_RND|SHM_REMAP).
      
      [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
      
      This patch (of 2):
      
      Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      worked on the idea that we should not be mapping as root addr=0 and
      MAP_FIXED.  However, it was reported that this scenario is in fact
      valid, which makes the patch bogus and breaks userspace as well.
      
      For example, X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
      initialization [1].
      
      [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
      Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
      Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: fix invalid ptr dereference on item delete · 7a4deea1
      Matthew Wilcox committed
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
      
      Roman said:
      
        The syzkaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        the KVM_HYPERV_EVENTFD ioctl, which is new in 4.17. But I guess there
        are other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
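      The bug pattern is an out-parameter that is only written on some paths.
      Here is a toy sketch that assumes a fixed four-slot table instead of a
      radix tree; demo_lookup() and demo_remove() are hypothetical. The lookup
      returns early for an out-of-range index without touching *slotp. In the
      spirit of the fix, the caller therefore starts with slot = NULL and
      treats a NULL slot as "not present" instead of dereferencing whatever
      was on the stack.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Toy fixed-size table standing in for the tree under an IDR. */
      #define DEMO_SLOTS 4
      static void *demo_table[DEMO_SLOTS];

      /* Writes *slotp only when the index maps to a real slot. */
      static void *demo_lookup(unsigned long index, void ***slotp)
      {
          if (index >= DEMO_SLOTS)
              return NULL; /* early return: *slotp left untouched */
          *slotp = &demo_table[index];
          return demo_table[index];
      }

      /* Remove an entry; safe for ids larger than any id in the table. */
      static void *demo_remove(unsigned long index)
      {
          void **slot = NULL; /* initialised, so "not found" is detectable */
          void *entry = demo_lookup(index, &slot);

          if (!slot)
              return NULL;
          *slot = NULL;
          return entry;
      }
      ```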
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" · 3373de20
      Changwei Ge committed
      This reverts commit ba16ddfb ("ocfs2/o2hb: check len for
      bio_add_page() to avoid getting incorrect bio").
      
      In my testing, this patch introduces a problem: mkfs cannot create more
      than 16 slots with a 4k block size.
      
      The original logic is in fact safe in the situation the commit
      describes, so revert it.
      
      Test log:
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
        (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192
        (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
        (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
      
      Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix nr_rotate_swap leak in swapon() error case · 7cbf3192
      Omar Sandoval committed
      If swapon() fails after incrementing nr_rotate_swap, we don't decrement
      it and thus effectively leak it.  Make sure we decrement it if we
      incremented it.
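      A common shape for this kind of fix is a local flag that records
      whether this call incremented the counter, checked on the error path.
      Here is a sketch with C11 atomics. It assumes a hypothetical
      demo_swapon() whose failure is simulated by a parameter. The flag name
      is illustrative rather than quoted from mm/swapfile.c.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for the global nr_rotate_swap counter. */
      static atomic_int demo_nr_rotate_swap;

      static int demo_swapon(bool rotational, bool later_step_fails)
      {
          bool inced_nr_rotate_swap = false; /* did *this* call bump it? */

          if (rotational) {
              atomic_fetch_add(&demo_nr_rotate_swap, 1);
              inced_nr_rotate_swap = true;
          }

          if (later_step_fails)
              goto bad_swap;
          return 0;

      bad_swap:
          /* Undo only what this call did, so the counter cannot leak. */
          if (inced_nr_rotate_swap)
              atomic_fetch_sub(&demo_nr_rotate_swap, 1);
          return -1;
      }
      ```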
      
      Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com
      Fixes: 81a0298b ("mm, swap: don't use VMA based swap readahead if HDD is used as swap")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>