1. 10 Nov 2014, 1 commit
  2. 28 Oct 2014, 1 commit
    • sched: stop the unbound recursion in preempt_schedule_context() · 009f60e2
      Committed by Oleg Nesterov
      preempt_schedule_context() does preempt_enable_notrace() at the end
      and this can call the same function again; exception_exit() is heavy
      and it is quite possible that need-resched is true again.
      
      1. Change this code to decrement preempt_count() and check need_resched()
         by hand.

      2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
         the enable/disable dance around __schedule(). But in this case
         we need to move it into sched/core.c.

      3. Cosmetic, but x86 forgets to declare this function. This doesn't
         really matter because it is only called by asm helpers; still, it
         makes sense to add the declaration into asm/preempt.h to match
         preempt_schedule().
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
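A user-space model of the loop described in points 1 and 2, assuming a single CPU. Every name in it (`preempt_count`, `resched_requests`, `PREEMPT_ACTIVE_MODEL`, the `*_model` functions) is a hypothetical stand-in, not a kernel symbol. Rather than finishing with preempt_enable_notrace(), which may re-enter the function once per pending reschedule, the model clears its marker bit by hand and re-checks need_resched() in a loop, so the call depth stays at one however often the reschedule request comes back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-CPU scheduler state, not kernel symbols. */
#define PREEMPT_ACTIVE_MODEL 0x10000000u

static unsigned int preempt_count;
static int resched_requests;	/* how many more times need_resched reads true */
static int schedule_calls;
static int depth, max_depth;	/* recursion depth of the function below */

static bool need_resched_model(void)
{
	return resched_requests > 0;
}

/* Models __schedule() followed by a "heavy" exception_exit(): one request
 * is consumed, but others may still be pending afterwards. */
static void schedule_model(void)
{
	schedule_calls++;
	resched_requests--;
}

static void preempt_schedule_context_model(void)
{
	/* Nothing to do if preemption is currently disabled. */
	if (preempt_count != 0)
		return;

	if (++depth > max_depth)
		max_depth = depth;

	/* Mark the context, schedule, clear the mark by hand and re-check
	 * need_resched(), instead of calling preempt_enable_notrace(),
	 * which could call back into this function. */
	do {
		preempt_count += PREEMPT_ACTIVE_MODEL;
		schedule_model();
		preempt_count -= PREEMPT_ACTIVE_MODEL;
	} while (need_resched_model());

	assert(preempt_count == 0);	/* counter is balanced on exit */
	depth--;
}
```

Setting `resched_requests = 5` and calling the function once gives five trips through `schedule_model()` with `max_depth` still equal to 1.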
  3. 24 Oct 2014, 2 commits
    • KVM: x86: Prevent host from panicking on shared MSR writes. · 8b3c3104
      Committed by Andy Honig
      The previous patch blocked invalid writes directly when the MSR
      is written.  As a precaution, prevent future similar mistakes by
      gracefully handling GPs caused by writes to shared MSRs.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      [Remove parts obsoleted by Nadav's patch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
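A sketch of the "handle the fault instead of crashing" idea, under loudly stated assumptions: `wrmsr_safe_model()` pretends the hardware rejects any value with bits in a made-up reserved mask, and `set_shared_msr_model()` is a hypothetical simplification of a shared-MSR setter. Only the control flow is the point: a failed write is reported to the caller, which can then inject #GP into the guest, and the cached value is left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: pretend the CPU faults on any value with these bits set. */
#define MODEL_RESERVED_BITS 0xFFFF000000000000ULL

/* Model of a "safe" MSR write: report failure instead of an unhandled #GP. */
static int wrmsr_safe_model(uint64_t value)
{
	return (value & MODEL_RESERVED_BITS) ? -1 : 0;
}

struct shared_msr_model {
	uint64_t curr;	/* value currently loaded on this CPU */
};

/* Returns 0 on success, 1 if the write faulted. */
static int set_shared_msr_model(struct shared_msr_model *m, uint64_t value,
				uint64_t mask)
{
	assert(m != NULL);
	if (((value ^ m->curr) & mask) == 0)
		return 0;		/* nothing to change */
	if (wrmsr_safe_model(value))
		return 1;		/* keep the old cached value; caller injects #GP */
	m->curr = value;
	return 0;
}
```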
    • KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1
      Committed by Nadav Amit
      Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
      written to certain MSRs. The behavior is "almost" identical for AMD and Intel
      (ignoring MSRs that are not implemented in either architecture since they would
      #GP anyway). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if a
      non-canonical address is written on Intel but not on AMD (which ignores the top
      32 bits).
      
      Accordingly, this patch injects a #GP on the MSRs which behave identically on
      Intel and AMD.  To eliminate the differences between the architectures, the
      value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is converted
      to a canonical value before writing instead of injecting a #GP.
      
      Some references from Intel and AMD manuals:
      
      According to the Intel SDM description of the WRMSR instruction, #GP is expected on
      WRMSR "If the source register contains a non-canonical address and ECX
      specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
      IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

      According to the AMD instruction manual:
      LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
      LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
      form, a general-protection exception (#GP) occurs."
      IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
      base field must be in canonical form or a #GP fault will occur."
      IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
      be in canonical form."
      
      This patch fixes CVE-2014-3610.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
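The checks described above can be sketched as follows, assuming 48-bit virtual addresses, where an address is canonical when bits 63 through 47 are all equal. The helper names `get_canonical()` and `is_noncanonical_address()` are chosen for readability and the bodies are a portable illustration; the `MODEL_*` MSR identifiers and `model_wrmsr()` are hypothetical. MSRs that behave the same on both vendors get a #GP, while the two SYSENTER MSRs are silently canonicalized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend bit 47 into bits 63..48 (48-bit virtual addresses assumed). */
static uint64_t get_canonical(uint64_t la)
{
	const uint64_t low48 = la & 0x0000FFFFFFFFFFFFULL;

	return (la & (1ULL << 47)) ? (low48 | 0xFFFF000000000000ULL) : low48;
}

static bool is_noncanonical_address(uint64_t la)
{
	return get_canonical(la) != la;
}

/* Hypothetical MSR identifiers for this model only. */
enum model_msr {
	MODEL_FS_BASE, MODEL_GS_BASE, MODEL_KERNEL_GS_BASE,
	MODEL_LSTAR, MODEL_CSTAR,
	MODEL_SYSENTER_EIP, MODEL_SYSENTER_ESP,
};

/* Returns true if the guest should get #GP; otherwise stores the
 * (possibly canonicalized) value in *stored. */
static bool model_wrmsr(enum model_msr msr, uint64_t data, uint64_t *stored)
{
	switch (msr) {
	case MODEL_FS_BASE:
	case MODEL_GS_BASE:
	case MODEL_KERNEL_GS_BASE:
	case MODEL_LSTAR:
	case MODEL_CSTAR:
		if (is_noncanonical_address(data))
			return true;	/* same on Intel and AMD: inject #GP */
		break;
	case MODEL_SYSENTER_EIP:
	case MODEL_SYSENTER_ESP:
		data = get_canonical(data);	/* vendors differ: canonicalize */
		break;
	}
	*stored = data;
	return false;
}
```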
  4. 17 Oct 2014, 1 commit
  5. 10 Oct 2014, 1 commit
    • mm: remove misleading ARCH_USES_NUMA_PROT_NONE · 6a33979d
      Committed by Mel Gorman
      ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
      _PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
      relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
      fault scanner.  This was found to be conceptually confusing with a lot of
      implicit assumptions and it was asked that an alternative be found.
      
      Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
      PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
      bits and shrunk the maximum possible swap size but it did not go far
      enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
      but the relics still exist.
      
      This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
      duplication in powerpc vs the generic implementation by defining the types
      the core NUMA helpers expected to exist from x86 with their ppc64
      equivalent.  This necessitated that a PTE bit mask be created that
      identified the bits that distinguish present from NUMA pte entries but it
      is expected this will only differ between arches based on _PAGE_PROTNONE.
      The naming for the generic helpers was taken from x86 originally but ppc64
      has types that are equivalent for the purposes of the helper so they are
      mapped instead of duplicating code.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
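A hedged illustration of how a single mask can tell a NUMA hinting PTE apart from a present one. The bit positions and `MODEL_*` names are invented for this sketch and do not match x86 or ppc64; only the shape of the test matters: mask the relevant bits, then compare against the NUMA bit alone, so that a PROT_NONE entry is not mistaken for a NUMA one.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_model_t;

/* Hypothetical bit assignments; real values are architecture-specific. */
#define MODEL_PAGE_PRESENT	(1ULL << 0)
#define MODEL_PAGE_ACCESSED	(1ULL << 5)
#define MODEL_PAGE_PROTNONE	(1ULL << 8)
#define MODEL_PAGE_NUMA		(1ULL << 9)

/* Bits that distinguish a present PTE from a NUMA hinting PTE. */
#define MODEL_PAGE_NUMA_MASK \
	(MODEL_PAGE_NUMA | MODEL_PAGE_PROTNONE | MODEL_PAGE_PRESENT)

static int pte_numa_model(pteval_model_t pte)
{
	return (pte & MODEL_PAGE_NUMA_MASK) == MODEL_PAGE_NUMA;
}

/* Turn a present PTE into a NUMA hinting PTE: the next access faults. */
static pteval_model_t pte_mknuma_model(pteval_model_t pte)
{
	return (pte | MODEL_PAGE_NUMA) & ~MODEL_PAGE_PRESENT;
}

/* Undo the above once the hinting fault has been handled. */
static pteval_model_t pte_mknonnuma_model(pteval_model_t pte)
{
	return (pte & ~MODEL_PAGE_NUMA) | MODEL_PAGE_PRESENT | MODEL_PAGE_ACCESSED;
}
```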
  6. 08 Oct 2014, 1 commit
  7. 04 Oct 2014, 5 commits
    • efi: Delete the in_nmi() conditional runtime locking · 60b4dc77
      Committed by Matt Fleming
      commit 5dc3826d9f08 ("efi: Implement mandatory locking for UEFI Runtime
      Services") implemented some conditional locking when accessing variable
      runtime services that Ingo described as "pretty disgusting".
      
      The intention with the !efi_in_nmi() checks was to avoid live-locks when
      trying to write pstore crash data into an EFI variable. Such lockless
      accesses are allowed according to the UEFI specification when we're in a
      "non-recoverable" state, but whether or not things are implemented
      correctly in actual firmware implementations remains an unanswered
      question, and so it would seem sensible to avoid doing any kind of
      unsynchronized variable accesses.
      
      Furthermore, the efi_in_nmi() tests are inadequate because they don't
      account for the case where we call EFI variable services from panic or
      oops callbacks and aren't executing in NMI context. In other words,
      live-locking is still possible.
      
      Let's just remove the conditional locking altogether. Now that we've got the
      ->set_variable_nonblocking() EFI variable operation, we can abort if the
      runtime lock is already held. Aborting is by far the safest option.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
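The abort-if-held behaviour can be sketched with a C11 `atomic_flag` standing in for the runtime spinlock. `set_variable_nonblocking_model()`, the lock and the status enum are hypothetical; `MODEL_NOT_READY` merely plays the role of an EFI error status. The caller never spins and never bypasses the lock: if the lock is already taken, for example by the context that a panic interrupted, it gives up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status codes; MODEL_NOT_READY plays the role of an EFI error. */
enum model_status { MODEL_SUCCESS, MODEL_NOT_READY };

static atomic_flag runtime_lock_model = ATOMIC_FLAG_INIT;
static int variable_writes;

/* Returns true if we acquired the lock (the previous flag value was clear). */
static bool runtime_trylock_model(void)
{
	return !atomic_flag_test_and_set_explicit(&runtime_lock_model,
						  memory_order_acquire);
}

static void runtime_unlock_model(void)
{
	atomic_flag_clear_explicit(&runtime_lock_model, memory_order_release);
}

/* Never spins and never skips the lock: if it is held, give up. */
static enum model_status set_variable_nonblocking_model(void)
{
	if (!runtime_trylock_model())
		return MODEL_NOT_READY;
	variable_writes++;	/* the firmware call would happen here */
	runtime_unlock_model();
	return MODEL_SUCCESS;
}
```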
    • x86/efi: Mark initialization code as such · 4e78eb05
      Committed by Mathias Krause
      The 32 bit and 64 bit implementations differ in their __init annotations
      for some functions referenced from the common EFI code. Namely, the 32
      bit variant is missing some of the __init annotations the 64 bit variant
      has.
      
      To solve the colliding annotations, mark the corresponding functions in
      efi_32.c as initialization code, too -- as it is such.
      
      Actually, quite a few more functions are only used during initialization
      and can therefore be marked __init, so they are annotated, too.
      Also add the __init annotation to the prototypes in the efi.h header so
      users of those functions will see it's meant as initialization code
      only.
      
      This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might
      be more appropriate but this is C code after all, not an opera! :D)
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • x86/efi: Unexport add_efi_memmap variable · 60920685
      Committed by Mathias Krause
      This variable was accidentally exported, even though it's only used in
      this compilation unit and only during initialization.
      
      Remove the bogus export, make the variable static instead and mark it
      as __initdata.
      
      Fixes: 200001eb ("x86 boot: only pick up additional EFI memmap...")
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • x86/efi: Remove unused efi_call* macros · 24ffd84b
      Committed by Mathias Krause
      Complement commit 62fa6e69 ("x86/efi: Delete most of the efi_call*
      macros") and delete the stub macros for the !CONFIG_EFI case, too. In
      fact, there are no EFI calls in this case, so we don't even need a
      dummy for efi_call().
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • efi: Implement mandatory locking for UEFI Runtime Services · 161485e8
      Committed by Ard Biesheuvel
      According to section 7.1 of the UEFI spec, Runtime Services are not fully
      reentrant, and there are particular combinations of calls that need to be
      serialized. Use a spinlock to serialize all Runtime Services with respect
      to all others, even if this is more than strictly needed.
      
      We've managed to get away without requiring a runtime services lock
      until now because most of the interactions with EFI involve EFI
      variables, and those operations are already serialised with
      __efivars->lock.
      
      Some of the assumptions underlying the decision whether locks are
      needed or not (e.g., SetVariable() against ResetSystem()) may not
      apply universally to all [new] architectures that implement UEFI.
      Rather than try to reason our way out of this, let's just implement at
      least what the spec requires in terms of locking.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
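A minimal sketch of "one lock around every runtime service", using a C11 `atomic_flag` spinlock as a stand-in for the kernel's lock. The `fw_*_model()` functions are hypothetical firmware entry points that check no other caller is inside firmware at the same time; routing every call through one wrapper is what keeps that invariant. As the commit says, this deliberately serializes more than strictly needed instead of encoding which pairs of calls actually conflict.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag efi_runtime_lock_model = ATOMIC_FLAG_INIT;
static int inside_firmware, firmware_calls;

static void runtime_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&efi_runtime_lock_model,
						 memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void runtime_unlock(void)
{
	atomic_flag_clear_explicit(&efi_runtime_lock_model, memory_order_release);
}

/* Hypothetical firmware entry points that check exclusive entry. */
static long fw_enter(void)
{
	inside_firmware++;
	assert(inside_firmware == 1);	/* no other caller is inside firmware */
	firmware_calls++;
	inside_firmware--;
	return 0;			/* success */
}

static long fw_get_time_model(void)     { return fw_enter(); }
static long fw_set_variable_model(void) { return fw_enter(); }

typedef long (*runtime_service_model)(void);

/* Every runtime service goes through this one serialized path. */
static long call_runtime_service(runtime_service_model fn)
{
	long status;

	runtime_lock();
	status = fn();
	runtime_unlock();
	return status;
}
```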
  8. 03 Oct 2014, 1 commit
  9. 24 Sep 2014, 7 commits
  10. 23 Sep 2014, 1 commit
    • x86: remove the Xen-specific _PAGE_IOMAP PTE flag · f955371c
      Committed by David Vrabel
      The _PAGE_IOMAP PTE flag was only used by Xen PV guests to mark PTEs
      that were used to map I/O regions that are 1:1 in the p2m.  This
      allowed Xen to obtain the correct PFN when converting the MFNs read
      from a PTE back to their PFN.
      
      Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
      returns the correct PFN by using a combination of the m2p and p2m to
      determine if an MFN corresponds to a 1:1 mapping in the p2m.
      
      Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
      future uses of the PTE flag.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
  11. 22 Sep 2014, 1 commit
  12. 17 Sep 2014, 1 commit
    • kvm: Remove ept_identity_pagetable from struct kvm_arch. · a255d479
      Committed by Tang Chen
      kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
      it is never used to refer to the page at all.
      
      In vcpu initialization, it indicates two things:
      1. whether the ept page is allocated
      2. whether a memory slot for the identity page is initialized
      
      Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
      identity pagetable is initialized. So we can remove ept_identity_pagetable.
      
      NOTE: In the original code, the ept identity pagetable page is pinned in memory.
            As a result, it cannot be migrated/hot-removed. After this patch, since
            kvm_arch->ept_identity_pagetable is removed, the ept identity pagetable page
            is no longer pinned in memory, and it can be migrated/hot-removed.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 16 Sep 2014, 2 commits
    • x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() · 8b375f64
      Committed by Luiz Capitulino
      The setup_node_data() function allocates a pg_data_t object,
      inserts it into the node_data[] array and initializes the
      following fields: node_id, node_start_pfn and
      node_spanned_pages.
      
      However, a few function calls later during the kernel boot,
      free_area_init_node() re-initializes those fields, possibly with
      different values. This means that the initialization done by
      setup_node_data() is not used.
      
      This causes a small glitch when running Linux as a hyperv numa
      guest:
      
        SRAT: PXM 0 -> APIC 0x00 -> Node 0
        SRAT: PXM 0 -> APIC 0x01 -> Node 0
        SRAT: PXM 1 -> APIC 0x02 -> Node 1
        SRAT: PXM 1 -> APIC 0x03 -> Node 1
        SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
        SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
        SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
        NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
        Initmem setup node 0 [mem 0x00000000-0x7fffffff]
          NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
        Initmem setup node 1 [mem 0x80800000-0x1081fffff]
          NODE_DATA [mem 0x1081ea000-0x1081fdfff]
        crashkernel: memory value expected
         [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
         [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
        Zone ranges:
          DMA      [mem 0x00001000-0x00ffffff]
          DMA32    [mem 0x01000000-0xffffffff]
          Normal   [mem 0x100000000-0x1081fffff]
        Movable zone start for each node
        Early memory node ranges
          node   0: [mem 0x00001000-0x0009efff]
          node   0: [mem 0x00100000-0x7ffeffff]
          node   1: [mem 0x80200000-0xf7ffffff]
          node   1: [mem 0x100000000-0x1081fffff]
        On node 0 totalpages: 524174
          DMA zone: 64 pages used for memmap
          DMA zone: 21 pages reserved
          DMA zone: 3998 pages, LIFO batch:0
          DMA32 zone: 8128 pages used for memmap
          DMA32 zone: 520176 pages, LIFO batch:31
        On node 1 totalpages: 524288
          DMA32 zone: 7672 pages used for memmap
          DMA32 zone: 491008 pages, LIFO batch:31
          Normal zone: 520 pages used for memmap
          Normal zone: 33280 pages, LIFO batch:7
      
      In this dmesg, the SRAT table reports that the memory range for
      node 1 starts at 0x80200000.  However, the line starting with
      "Initmem" reports that the node 1 memory range starts at 0x80800000.
      The "Initmem" line is reported by setup_node_data() and is
      wrong, because the kernel ends up using the range as reported in
      the SRAT table.
      
      This commit drops all that dead code from setup_node_data(),
      renames it to alloc_node_data() and adds a printk() to
      free_area_init_node() so that we report a node's memory range
      accurately.
      
      Here's the same dmesg section with this patch applied:
      
         SRAT: PXM 0 -> APIC 0x00 -> Node 0
         SRAT: PXM 0 -> APIC 0x01 -> Node 0
         SRAT: PXM 1 -> APIC 0x02 -> Node 1
         SRAT: PXM 1 -> APIC 0x03 -> Node 1
         SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
         SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
         SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
         NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
         NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
         NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
         crashkernel: memory value expected
          [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
          [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
         Zone ranges:
           DMA      [mem 0x00001000-0x00ffffff]
           DMA32    [mem 0x01000000-0xffffffff]
           Normal   [mem 0x100000000-0x1081fffff]
         Movable zone start for each node
         Early memory node ranges
           node   0: [mem 0x00001000-0x0009efff]
           node   0: [mem 0x00100000-0x7ffeffff]
           node   1: [mem 0x80200000-0xf7ffffff]
           node   1: [mem 0x100000000-0x1081fffff]
         Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
         On node 0 totalpages: 524174
           DMA zone: 64 pages used for memmap
           DMA zone: 21 pages reserved
           DMA zone: 3998 pages, LIFO batch:0
           DMA32 zone: 8128 pages used for memmap
           DMA32 zone: 520176 pages, LIFO batch:31
         Initmem setup node 1 [mem 0x80200000-0x1081fffff]
         On node 1 totalpages: 524288
           DMA32 zone: 7672 pages used for memmap
           DMA32 zone: 491008 pages, LIFO batch:31
           Normal zone: 520 pages used for memmap
           Normal zone: 33280 pages, LIFO batch:7
      
      This commit was tested on a two-node bare-metal NUMA machine and
      with Linux as a NUMA guest on Hyper-V and qemu/kvm.
      
      PS: The wrong memory range reported by setup_node_data() seems to be
          harmless in the current kernel because it's just not used.  However,
          that bad range is used in kernel 2.6.32 to initialize the old boot
          memory allocator, which causes a crash during boot.
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
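In the patched dmesg, each "Initmem setup" line spans from the lowest start to the highest end of that node's "Early memory node ranges". The sketch below reproduces that arithmetic from the logged values; `struct mem_range_model` and `node_span_model()` are hypothetical, and this is not a claim about how free_area_init_node() computes the range internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one early memory range of a node, inclusive end. */
struct mem_range_model {
	uint64_t start, end;
};

/* Span of a node = lowest start and highest end of its ranges.
 * Returns -1 if the node has no ranges. */
static int node_span_model(const struct mem_range_model *r, size_t n,
			   uint64_t *start, uint64_t *end)
{
	if (n == 0)
		return -1;
	*start = r[0].start;
	*end = r[0].end;
	for (size_t i = 1; i < n; i++) {
		if (r[i].start < *start)
			*start = r[i].start;
		if (r[i].end > *end)
			*end = r[i].end;
	}
	return 0;
}
```

Fed node 1's two ranges from the log, it yields 0x80200000-0x1081fffff, the range the SRAT table reports, rather than the 0x80800000 start printed by the old code.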
    • x86/mm/hotplug: Modify PGD entry when removing memory · 9661d5bc
      Committed by Yasuaki Ishimatsu
      When hot-adding/removing memory, sync_global_pgds() is called
      to synchronize the PGD into the PGD entries of all processes' mm.
      But when hot-removing memory, sync_global_pgds() does not work
      correctly.
      
      First, sync_global_pgds() checks whether the target PGD is none,
      and if it is, the PGD is skipped.  But when hot-removing memory,
      the PGD may be none since it may have been cleared by
      free_pud_table().  So when sync_global_pgds() is called after
      hot-removing memory, it should not skip the PGD even if the PGD
      is none, and it must clear the PGD entries of all processes' mm.
      
      Currently sync_global_pgds() does not clear the PGD entries of all
      processes' mm when hot-removing memory.  So when memory in the same
      range as the removed memory is hot-added again after hot-removing,
      the following call trace is shown:
      
       kernel BUG at arch/x86/mm/init_64.c:206!
       ...
       [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
       [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
       [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
       [<ffffffff815d03d9>] add_memory+0xb9/0x1b0
       [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
       [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
       [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
       [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
       [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
       [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
       [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
       [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
       [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
       [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
       [<ffffffff8107e02b>] process_one_work+0x17b/0x460
       [<ffffffff8107edfb>] worker_thread+0x11b/0x400
       [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
       [<ffffffff81085aef>] kthread+0xcf/0xe0
       [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
      
      This patch clears the PGD entries of all processes' mm when
      sync_global_pgds() is called after hot-removing memory.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
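A single-slot model of the fix, with hypothetical names: `kernel_pgd` stands in for the reference (init_mm) entry and `proc_pgd[]` for the same slot in each process's page tables, where 0 means "none". The real function also walks an address range and takes locks, which this sketch omits. It only shows why a `removed` flag is needed: after hot-remove the reference entry is none, so a plain "skip if none" rule would leave stale process entries behind for the next hot-add of the same range to find.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PROCS_MODEL 3

/* Hypothetical model: a PGD entry is a plain value, 0 means "none". */
typedef unsigned long pgd_model_t;

static pgd_model_t kernel_pgd;			/* reference entry */
static pgd_model_t proc_pgd[NR_PROCS_MODEL];	/* same slot per process */

static bool pgd_none_model(pgd_model_t e)
{
	return e == 0;
}

/* After hot-add, copy the reference entry into processes lacking it.
 * After hot-remove (removed == true), clear stale process entries even
 * though the reference entry is now none. */
static void sync_global_pgds_model(bool removed)
{
	if (pgd_none_model(kernel_pgd) && !removed)
		return;

	for (size_t i = 0; i < NR_PROCS_MODEL; i++) {
		if (removed) {
			if (pgd_none_model(kernel_pgd) &&
			    !pgd_none_model(proc_pgd[i]))
				proc_pgd[i] = 0;
		} else if (pgd_none_model(proc_pgd[i])) {
			proc_pgd[i] = kernel_pgd;
		}
	}
}
```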
  14. 14 Sep 2014, 4 commits
    • x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8 · 3eddc69f
      Committed by Dave Young
      The 3.16 kernel fails to boot with earlyprintk=efi; it keeps scrolling at
      the bottom line of the screen.

      Bisecting shows that the first bad commit is:
      commit 86dfc6f3
      Author: Lv Zheng <lv.zheng@intel.com>
      Date:   Fri Apr 4 12:38:57 2014 +0800
      
          ACPICA: Tables: Fix table checksums verification before installation.
      
      I did some debugging by enabling both serial and efi earlyprintk. The
      debug dmesg below suggests that early_ioremap fails in the scroll-up
      function because there is no free slot:
      
        WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4()
        __early_ioremap(ed00c800, 00000c80) not found slot
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204
        Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013
        Call Trace:
          dump_stack+0x4e/0x7a
          warn_slowpath_common+0x75/0x8e
          ? __early_ioremap+0x90/0x1c4
          warn_slowpath_fmt+0x47/0x49
          __early_ioremap+0x90/0x1c4
          ? sprintf+0x46/0x48
          early_ioremap+0x13/0x15
          early_efi_map+0x24/0x26
          early_efi_scroll_up+0x6d/0xc0
          early_efi_write+0x1b0/0x214
          call_console_drivers.constprop.21+0x73/0x7e
          console_unlock+0x151/0x3b2
          ? vprintk_emit+0x49f/0x532
          vprintk_emit+0x521/0x532
          ? console_unlock+0x383/0x3b2
          printk+0x4f/0x51
          acpi_os_vprintf+0x2b/0x2d
          acpi_os_printf+0x43/0x45
          acpi_info+0x5c/0x63
          ? __acpi_map_table+0x13/0x18
          ? acpi_os_map_iomem+0x21/0x147
          acpi_tb_print_table_header+0x177/0x186
          acpi_tb_install_table_with_override+0x4b/0x62
          acpi_tb_install_standard_table+0xd9/0x215
          ? early_ioremap+0x13/0x15
          ? __acpi_map_table+0x13/0x18
          acpi_tb_parse_root_table+0x16e/0x1b4
          acpi_initialize_tables+0x57/0x59
          acpi_table_init+0x50/0xce
          acpi_boot_table_init+0x1e/0x85
          setup_arch+0x9b7/0xcc4
          start_kernel+0x94/0x42d
          ? early_idt_handlers+0x120/0x120
          x86_64_start_reservations+0x2a/0x2c
          x86_64_start_kernel+0xf3/0x100
      
      Quoting Lv Zheng's reply about early ioremap slot usage in this case:
      
      """
      In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer.
      In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table().
      We now need 2 mapping entries:
      1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table.
      2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it.
      
      When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used.
      The current 4 slots setting of early_ioremap() seems to be too small for such a use case.
      """
      
      Thus, this patch increases the number of slots to 8 to fix this issue.
      With this patch, boot-time mappings grow to 512 pages.
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.16
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
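The figures in the last paragraph imply 512 / 8 = 64 pages per slot. The sketch below models slot allocation as a first-free search over a table; the `*_model` names are hypothetical and the search is a simplification of early_ioremap's bookkeeping. It shows the arithmetic and that a fixed slot count caps how many early mappings can be held at once, which is the limit the nested ACPI and EFI console mappings ran into.

```c
#include <assert.h>

#define NR_FIX_BTMAPS_MODEL	64	/* pages per slot: 512 pages / 8 slots */
#define MAX_SLOTS_MODEL		8

static int slot_in_use[MAX_SLOTS_MODEL];

/* Grab the first free slot among the first nslots; -1 means no free slot. */
static int early_ioremap_slot_model(int nslots)
{
	assert(nslots > 0 && nslots <= MAX_SLOTS_MODEL);
	for (int i = 0; i < nslots; i++) {
		if (!slot_in_use[i]) {
			slot_in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

static void early_iounmap_slot_model(int slot)
{
	slot_in_use[slot] = 0;
}

/* Total boot-time mapping pages for a given slot count. */
static int total_fix_btmaps_model(int nslots)
{
	return NR_FIX_BTMAPS_MODEL * nslots;
}
```

With four slots, a fifth simultaneous mapping fails in this model; with eight it still succeeds.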
    • Make ARCH_HAS_FAST_MULTIPLIER a real config variable · 72d93104
      Committed by Linus Torvalds
      It used to be an ad-hoc hack defined by the x86 version of
      <asm/bitops.h> that enabled a couple of library routines to know whether
      an integer multiply is faster than repeated shifts and additions.
      
      This just makes it use the real Kconfig system instead, and makes x86
      (which was the only architecture that did this) select the option.
      
      NOTE! Even for x86, this really is kind of wrong.  If we cared, we would
      probably not enable this for builds optimized for netburst (P4), where
      shifts-and-adds are generally faster than multiplies.  This patch does
      *not* change that kind of logic, though, it is purely a syntactic change
      with no code changes.
      
      This was triggered by the fact that we have other places that really
      want to know "do I want to expand multiples by constants by hand or
      not", particularly the hash generation code.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
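As an illustration of the trade-off, here is a 32-bit population count written both ways: one variant folds the per-byte counts with a single multiply, the other with shifts and adds. A population count (hweight) is a typical library routine of this kind, but these are standalone sketches; the function names and the `hweight32_model` switch are hypothetical, and the Kconfig symbol this commit introduces, CONFIG_ARCH_HAS_FAST_MULTIPLIER, is used here only as a compile-time selector.

```c
#include <assert.h>
#include <stdint.h>

/* Final fold with one multiply: 0x01010101 sums the four byte counts
 * into the top byte. */
static unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (w * 0x01010101u) >> 24;
}

/* Same result, final fold done with shifts and adds only. */
static unsigned int hweight32_shift_add(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);

	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u);
	res = (res + (res >> 4)) & 0x0f0f0f0fu;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0xffu;
}

#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
#define hweight32_model hweight32_mul
#else
#define hweight32_model hweight32_shift_add
#endif
```

Both return the same counts; only the final folding step differs, which is exactly the choice the config option is meant to drive.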
    • x86: Tell irq work about self IPI support · 3010279f
      Committed by Frederic Weisbecker
      x86 supports irq work self-IPIs when a local APIC is available. This
      is partly determined at runtime, so let's implement
      arch_irq_work_has_interrupt() accordingly.

      It can safely be called after setup_arch().
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • irq_work: Introduce arch_irq_work_has_interrupt() · c5c38ef3
      Committed by Peter Zijlstra
      The nohz full code needs irq work to trigger its own interrupt so that
      the subsystem can work even when the tick is stopped.
      
      Let's introduce arch_irq_work_has_interrupt(), which archs can override
      to report whether they support this ability.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
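The override pattern from these two commits, reduced to one file: a generic default that reports no irq work interrupt, and an x86-style override that answers according to local APIC availability, which is only known at runtime. The kernel selects the implementation through per-architecture headers; here an `#ifdef` on a hypothetical `MODEL_ARCH_OVERRIDE` macro stands in for that, and `cpu_has_apic_model` and `nohz_full_possible_model()` are likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch: define for the x86-style override, leave
 * undefined for the generic default. */
#define MODEL_ARCH_OVERRIDE 1

/* Hypothetical: filled in during boot-time CPU setup (after setup_arch()). */
static bool cpu_has_apic_model;

#ifdef MODEL_ARCH_OVERRIDE
/* x86-style: self-IPIs are available when a local APIC is present. */
static inline bool arch_irq_work_has_interrupt(void)
{
	return cpu_has_apic_model;
}
#else
/* Generic default: no dedicated irq work interrupt. */
static inline bool arch_irq_work_has_interrupt(void)
{
	return false;
}
#endif

/* Hypothetical consumer: nohz full needs irq work to raise its own
 * interrupt while the tick is stopped. */
static bool nohz_full_possible_model(void)
{
	return arch_irq_work_has_interrupt();
}
```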
  15. 12 Sep 2014, 3 commits
    • x86: Add more disabled features · 9298b815
      Committed by Dave Hansen
      The original motivation for these patches was for an Intel CPU
      feature called MPX.  The patch to add a disabled feature for it
      will go in with the other parts of the support.
      
      But, in the meantime, there are a few features other than MPX
      that we can make assumptions about at compile-time based on
      compile options.  Add them to disabled-features.h and check them
      with cpu_feature_enabled().
      
      Note that this gets rid of the last things that needed an #ifdef
      CONFIG_X86_64 in cpufeature.h.  Yay!
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Introduce disabled-features · 381aa07a
      Committed by Dave Hansen
      I believe the REQUIRED_MASK approach was taken so that it was
      easier to consult in assembly (arch/x86/kernel/verify_cpu.S).
      DISABLED_MASK does not have the same restriction, but I
      implemented it the same way for consistency.
      
      We have a REQUIRED_MASK... which does two things:
      1. Keeps a list of cpuid bits to check in very early boot and
         refuse to boot if those are not present.
      2. Is consulted during cpu_has() checks, which allows us to
         optimize out things at compile-time.  In other words, if we
         *KNOW* we will not boot with the feature off, then we can
         safely assume that it will be present forever.
      
      But, we don't have a similar mechanism for CPU features which
      may be present but that we know we will not use.  We simply
      use our existing mechanisms to repeatedly check the status of
      the bit at runtime (well, the alternatives patching helps here
      but it does not provide compile-time optimization).
      
      Adding a feature to disabled-features.h allows the bit to be
      checked via a new macro: cpu_feature_enabled().  Note that
      for features in DISABLED_MASK, checks with this macro have
      all of the benefits of an #ifdef.  Before, we would have done
      this in a header:
      
      #ifdef CONFIG_X86_INTEL_MPX
      #define cpu_has_mpx cpu_has(X86_FEATURE_MPX)
      #else
      #define cpu_has_mpx 0
      #endif
      
      and this in the code:
      
      	if (cpu_has_mpx)
      		do_some_mpx_thing();
      
      Now, just add your feature to DISABLED_MASK and you can do this
      everywhere, and get the same benefits you would have from
      #ifdefs:
      
      	if (cpu_feature_enabled(X86_FEATURE_MPX))
      		do_some_mpx_thing();
      
      We need a new function and *not* a modification to cpu_has()
      because there are cases where we actually need to check the CPU
      itself, regardless of which features the kernel supports.  The best
      example of this is a hypervisor which has no control over what
      features its guests are using and where the guest does not depend
      on the host for support.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
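A reduced model of the cpu_feature_enabled() idea, using one 32-bit feature word and hypothetical `MODEL_*` bit numbers. A feature listed in the build-time disabled mask folds to a constant false, so the guarded code can be dropped just as with an #ifdef, while other features fall back to the runtime CPUID check. The plain `cpu_has_model()` still reports what the CPU itself supports, which is the case the last paragraph needs to keep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit numbers (one 32-bit word for simplicity). */
#define MODEL_FEATURE_FPU	0
#define MODEL_FEATURE_MPX	14

/* Build-time choice: pretend the kernel was configured without MPX. */
#define MODEL_DISABLED_MASK	(1u << MODEL_FEATURE_MPX)

/* What CPUID reported at boot (runtime state). */
static uint32_t cpuid_word_model;

/* Raw CPU capability, independent of kernel configuration. */
static bool cpu_has_model(int bit)
{
	return (cpuid_word_model >> bit) & 1u;
}

/* Disabled features become a constant false the compiler can fold;
 * everything else falls back to the runtime check. */
#define cpu_feature_enabled_model(bit) \
	((MODEL_DISABLED_MASK & (1u << (bit))) ? false : cpu_has_model(bit))
```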
    • x86: Axe the lightly-used cpu_has_pae · c8128cce
      Committed by Dave Hansen
      cpu_has_pae is only referenced in one place: the X86_32 kexec
      code (in a file not even built on 64-bit).  It hardly warrants
      its own macro, or the trouble we go to ensuring that it can't
      be called in X86_64 code.
      
      Axe the macro and replace it with a direct cpu feature check.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  16. 10 Sep 2014, 3 commits
  17. 09 Sep 2014, 2 commits
  18. 06 Sep 2014, 2 commits
  19. 05 Sep 2014, 1 commit