1. 17 Feb 2018, 3 commits
  2. 16 Feb 2018, 5 commits
  3. 15 Feb 2018, 14 commits
  4. 13 Feb 2018, 18 commits
    • x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck committed
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space to avoid speculative access to the
      page logging additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. A 32-bit kernel doesn't map
      all of memory into kernel space, and it isn't worth adding code to unmap
      the piece that is mapped, because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd0e786d
    • x86/error_inject: Make just_return_func() globally visible · 01684e72
      Arnd Bergmann committed
      With link time optimizations enabled, I get a link failure:
      
        ./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
        <artificial>:(.text+0x7f3): undefined reference to `just_return_func'
      
      Marking the symbol .globl makes it work as expected.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 540adea3 ("error-injection: Separate error-injection from kprobe")
      Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      01684e72
    • x86/platform/UV: Fix GAM Range Table entries less than 1GB · c25d99d2
      mike.travis@hpe.com committed
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: Mike Travis <mike.travis@hpe.com>
      Reviewed-by: Andrew Banman <andrew.banman@hpe.com>
      Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c25d99d2
    • MIPS: Fix incorrect mem=X@Y handling · 67a3ba25
      Marcin Nowakowski committed
      Commit 73fbc1eb ("MIPS: fix mem=X@Y commandline processing") added a
      fix to ensure that the memory range between PHYS_OFFSET and low memory
      address specified by mem= cmdline argument is not later processed by
      free_all_bootmem.  This change was incorrect for systems where the
      commandline specifies more than one mem argument, as it caused all
      memory between PHYS_OFFSET and each of the memory offsets to be marked
      as reserved, leaving usable parts of RAM reserved (Creator CI20's
      u-boot has a default commandline argument 'mem=256M@0x0
      mem=768M@0x30000000').
      
      Change the behaviour to ensure that only the range between PHYS_OFFSET
      and the lowest start address of the memories is marked as protected.
      
      This change also ensures that the range is marked protected even if it
      is defined only through the devicetree rather than via commandline
      arguments.
      Reported-by: Mathieu Malaterre <mathieu.malaterre@gmail.com>
      Signed-off-by: Marcin Nowakowski <marcin.nowakowski@mips.com>
      Fixes: 73fbc1eb ("MIPS: fix mem=X@Y commandline processing")
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # v4.11+
      Tested-by: Mathieu Malaterre <malat@debian.org>
      Patchwork: https://patchwork.linux-mips.org/patch/18562/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      67a3ba25
    • x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore · 74eb816b
      Progyan Bhattacharya committed
      The file is generated by the make command and should not be in the source tree.
      Signed-off-by: Progyan Bhattacharya <progyanb@acm.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      74eb816b
    • MIPS: BMIPS: Fix section mismatch warning · 627f4a2b
      Jaedon Shin committed
      Remove the __init annotation from bmips_cpu_setup() to avoid the
      following warning.
      
      WARNING: vmlinux.o(.text+0x35c950): Section mismatch in reference from the function brcmstb_pm_s3() to the function .init.text:bmips_cpu_setup()
      The function brcmstb_pm_s3() references
      the function __init bmips_cpu_setup().
      This is often because brcmstb_pm_s3 lacks a __init
      annotation or the annotation of bmips_cpu_setup is wrong.
      Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Kevin Cernekee <cernekee@gmail.com>
      Cc: linux-mips@linux-mips.org
      Reviewed-by: James Hogan <jhogan@kernel.org>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/18589/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      627f4a2b
    • x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU · 295cc7eb
      Masayoshi Mizuma committed
      When a physical CPU is hot-removed, the following warning messages
      are shown while the uncore device is removed in uncore_pci_remove():
      
        WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
        uncore_pci_remove+0xf1/0x110
        ...
        CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
        Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        ...
        Call Trace:
        pci_device_remove+0x36/0xb0
        device_release_driver_internal+0x145/0x210
        pci_stop_bus_device+0x76/0xa0
        pci_stop_root_bus+0x44/0x60
        acpi_pci_root_remove+0x1f/0x80
        acpi_bus_trim+0x54/0x90
        acpi_bus_trim+0x2e/0x90
        acpi_device_hotplug+0x2bc/0x4b0
        acpi_hotplug_work_fn+0x1a/0x30
        process_one_work+0x141/0x340
        worker_thread+0x47/0x3e0
        kthread+0xf5/0x130
      
      When uncore_pci_remove() runs, it tries to get the package ID to
      clear the value of uncore_extra_pci_dev[].dev[] by using
      topology_phys_to_logical_pkg(). The warning messages are
      shown because topology_phys_to_logical_pkg() returns -1.
      
        arch/x86/events/intel/uncore.c:
        static void uncore_pci_remove(struct pci_dev *pdev)
        {
        ...
                phys_id = uncore_pcibus_to_physid(pdev->bus);
        ...
                        pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                        for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                                if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                        uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                        break;
                                }
                        }
                        WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
      
      topology_phys_to_logical_pkg() tries to find
      cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
      
        arch/x86/kernel/smpboot.c:
        int topology_phys_to_logical_pkg(unsigned int phys_pkg)
        {
                int cpu;
      
                for_each_possible_cpu(cpu) {
                        struct cpuinfo_x86 *c = &cpu_data(cpu);
      
                        if (c->initialized && c->phys_proc_id == phys_pkg)
                                return c->logical_proc_id;
                }
                return -1;
        }
      
      However, the phys_proc_id was already set to 0 by remove_siblinginfo()
      when the CPU was offlined.
      
      So, topology_phys_to_logical_pkg() cannot find the correct
      logical_proc_id and always returns -1.
      
      As a result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
      messages are shown.
      
      What is worse is that the bogus 'pkg' index results in two bugs:
      
       - We dereference uncore_extra_pci_dev[] with a negative index
       - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
      
      To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
      
      This should not cause any problems, because ->phys_proc_id is not
      used after it is hot-removed and it is re-set while hot-adding.
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yasu.isimatu@gmail.com
      Cc: <stable@vger.kernel.org>
      Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array")
      Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      295cc7eb
    • powerpc/kdump: Fix powernv build break when KEXEC_CORE=n · 91096175
      Guenter Roeck committed
      If KEXEC_CORE is not enabled, powernv builds fail as follows.
      
        arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
        arch/powerpc/platforms/powernv/smp.c:236:4: error:
        	implicit declaration of function 'crash_ipi_callback'
      
      Add a dummy function, similar to kdump_in_progress(), to solve the
      problem.
      
      Fixes: 4145f358 ("powernv/kdump: Fix cases where the kdump kernel can get HMI's")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      91096175
    • powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug · 82343484
      Guenter Roeck committed
      Commit e67e02a5 ("powerpc/pseries: Fix cpu hotplug crash with
      memoryless nodes") adds an unconditional call to
      find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR
      is enabled. This results in the following build error if this is not
      the case.
      
        arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
        arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
        			undefined reference to `.find_and_online_cpu_nid'
      
      Follow the guideline provided by similar functions and provide a dummy
      function if CONFIG_PPC_SPLPAR is not enabled. This also moves the
      external function declaration into an include file where it should be.
      
      Fixes: e67e02a5 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      [mpe: Change subject to emphasise the build fix]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      82343484
    • powerpc/mm/hash64: Zero PGD pages on allocation · fc5c2f4a
      Aneesh Kumar K.V committed
      On powerpc we allocate page table pages from slab caches of different
      sizes. Currently we have a constructor that zeroes out the objects when
      we allocate them for the first time.
      
      We expect the objects to be zeroed out when we free the object
      back to the slab cache. This happens in the unmap path. For hugetlb pages
      we call huge_pte_get_and_clear() to do that.
      
      With the current configuration of page table size, both PUD and PGD
      level tables are allocated from the same slab cache. At the PUD level,
      we use the second half of the table to store the slot information. But
      we never clear that when unmapping.
      
      When such a freed object is then allocated for a PGD page, the second
      half of the page table page will not be zeroed as expected. This
      results in a kernel crash.
      
      Fix it by always clearing PGD pages when they're allocated.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log wording and formatting, add whitespace]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fc5c2f4a
    • powerpc/mm/hash64: Store the slot information at the right offset for hugetlb · ff31e105
      Aneesh Kumar K.V committed
      The hugetlb pte entries are at the PMD and PUD level, so we can't use
      PTRS_PER_PTE to find the second half of the page table. Use the right
      offset for PUD/PMD to get to the second half of the table.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ff31e105
    • powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled · 4a7aa4fe
      Aneesh Kumar K.V committed
      We use the second half of the page table to store slot information, so
      we must always allocate it when hugetlb is possible.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4a7aa4fe
    • powerpc/mm: Fix crashes with 16G huge pages · fae22116
      Aneesh Kumar K.V committed
      To support memory keys, we moved the hash pte slot information to the
      second half of the page table. This was ok with PTE entries at level
      4 (PTE page) and level 3 (PMD). We already allocate larger page table
      pages at those levels to accommodate extra details. For level 4 we
      already have the extra space which was used to track 4k hash page
      table entry details and at level 3 the extra space was allocated to
      track the THP details.
      
      With hugetlbfs PTE, we used this extra space at the PMD level to store
      the slot details. But we also support hugetlbfs PTE at PUD level for
      16GB pages and PUD level page didn't allocate extra space. This
      resulted in memory corruption.
      
      Fix this by allocating extra space at PUD level when HUGETLB is
      enabled.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fae22116
    • powerpc/mm: Flush radix process translations when setting MMU type · 62e984dd
      Alexey Kardashevskiy committed
      Radix guests normally invalidate process-scoped translations when a
      new pid is allocated, but migrated guests do not, so migrated guests
      eventually crash. This is especially easy to reproduce when migration
      happens within the first 10 seconds after guest boot starts on the
      same machine.
      
      This adds the "Invalidate process-scoped translations" flush to fix
      radix guests migration.
      
      Fixes: 2ee13be3 ("KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00")
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Tested-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      62e984dd
    • powerpc/vas: Don't set uses_vas for kernel windows · b00b6289
      Nicholas Piggin committed
      cp_abort is only required for user windows, because kernel context
      must not be preempted between a copy/paste pair.
      
      Without this patch, the init task gets used_vas set when it runs the
      nx842_powernv_init initcall, which opens windows for kernel usage.
      
      used_vas is then never cleared anywhere, so it gets propagated into
      all other tasks. It's a property of the address space, so it should
      really be cleared when a new mm is created (or in dup_mmap if the
      mmaps are marked as VM_DONTCOPY). For now we seem to have no such
      driver, so leave that for another patch.
      
      Fixes: 6c8e6bb2 ("powerpc/vas: Add support for user receive window")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b00b6289
    • powerpc/pseries: Enable RAS hotplug events later · c9dccf1d
      Sam Bobroff committed
      Currently if the kernel receives a memory hot-unplug event early
      enough, it may get stuck in an infinite loop in
      dissolve_free_huge_pages(). This appears as a stall just after:
      
        pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
      
      It appears to be caused by "minimum_order" being uninitialized, due to
      init_ras_IRQ() executing before hugetlb_init().
      
      To correct this, extract the part of init_ras_IRQ() that enables
      hotplug event processing and place it in the machine_late_initcall
      phase, which is guaranteed to be after hugetlb_init() is called.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Reorder the functions to make the diff readable]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c9dccf1d
    • x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally · cd026ca2
      Jia Zhang committed
      The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore.
      Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cd026ca2
    • vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page · 595dd46e
      Jia Zhang committed
      Commit:
      
        df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      
      ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
      However, accessing the vsyscall user page will cause an SMAP fault.
      
      Replacing memcpy() with copy_from_user() fixes this bug, but adding
      a common way to handle this sort of user page may be useful in the future.
      
      Currently, only vsyscall page requires KCORE_USER.
      Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      595dd46e