1. 05 6月, 2014 2 次提交
    • M
      x86: require x86-64 for automatic NUMA balancing · 4468dd76
      Mel Gorman 提交于
      32-bit support for NUMA is an oddity on its own but with automatic NUMA
      balancing on top there is a reasonable risk that the CPUPID information
      cannot be stored in the page flags.  This patch removes support for
      automatic NUMA support on 32-bit x86.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4468dd76
    • N
      hugetlb: restrict hugepage_migration_support() to x86_64 · c177c81e
      Naoya Horiguchi 提交于
      Currently hugepage migration is available for all archs which support
      pmd-level hugepage, but testing is done only for x86_64 and there're
      bugs for other archs.  So to avoid breaking such archs, this patch
      limits the availability strictly to x86_64 until developers of other
      archs get interested in enabling this feature.
      
      Simply disabling hugepage migration on non-x86_64 archs is not enough to
      fix the reported problem where sys_move_pages() hits the BUG_ON() in
      follow_page(FOLL_GET), so let's fix this by checking if hugepage
      migration is supported in vma_migratable().
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c177c81e
  2. 31 5月, 2014 3 次提交
    • B
      mce: Panic when a core has reached a timeout · 716079f6
      Borislav Petkov 提交于
      There is very little and maybe practically nothing we can do to recover
      from a system where at least one core has reached a timeout during the
      whole monarch cores gathering. So panic when that happens.
      
      Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
      716079f6
    • M
      x86/mce: Improve mcheck_init_device() error handling · 9c15a24b
      Mathieu Souchaud 提交于
      Check return code of every function called by mcheck_init_device().
      Signed-off-by: NMathieu Souchaud <mattieu.souchaud@free.fr>
      Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.frSigned-off-by: NBorislav Petkov <bp@suse.de>
      9c15a24b
    • M
      x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim 提交于
      While I play inhouse patches with much memory pressure on qemu-kvm,
      3.14 kernel was randomly crashed. The reason was kernel stack overflow.
      
      When I investigated the problem, the callstack was a little bit deeper
      by involve with reclaim functions but not direct reclaim path.
      
      I tried to diet stack size of some functions related with alloc/reclaim
      so did a hundred of byte but overflow was't disappeard so that I encounter
      overflow by another deeper callstack on reclaim/allocator path.
      
      Of course, we might sweep every sites we have found for reducing
      stack usage but I'm not sure how long it saves the world(surely,
      lots of developer start to add nice features which will use stack
      agains) and if we consider another more complex feature in I/O layer
      and/or reclaim path, it might be better to increase stack size(
      meanwhile, stack usage on 64bit machine was doubled compared to 32bit
      while it have sticked to 8K. Hmm, it's not a fair to me and arm64
      already expaned to 16K. )
      
      So, my stupid idea is just let's expand stack size and keep an eye
      toward stack consumption on each kernel functions via stacktrace of ftrace.
      For example, we can have a bar like that each funcion shouldn't exceed 200K
      and emit the warning when some function consumes more in runtime.
      Of course, it could make false positive but at least, it could make a
      chance to think over it.
      
      I guess this topic was discussed several time so there might be
      strong reason not to increase kernel stack size on x86_64, for me not
      knowing so Ccing x86_64 maintainers, other MM guys and virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6538b8ea
  3. 29 5月, 2014 1 次提交
  4. 28 5月, 2014 5 次提交
    • H
      PCI: Turn pcibios_penalize_isa_irq() into a weak function · a43ae58c
      Hanjun Guo 提交于
      pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
      is not used by some architectures.  Make pcibios_penalize_isa_irq() a
      __weak function to simplify the code.  This removes the need for new
      platforms to add stub implementations of pcibios_penalize_isa_irq().
      
      [bhelgaas: changelog, comments]
      Signed-off-by: NHanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      a43ae58c
    • Y
      x86/PCI: Use pci_is_bridge() to simplify code · 56a41f99
      Yijing Wang 提交于
      Use pci_is_bridge() to simplify code.  No functional change.
      
      Requires: 326c1cda PCI: Rename pci_is_bridge() to pci_has_subordinate()
      Requires: 1c86438c PCI: Add new pci_is_bridge() interface
      Signed-off-by: NYijing Wang <wangyijing@huawei.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      56a41f99
    • S
      x86/PCI: Clean up and mark early_root_info_init() as deprecated · 9e7f7231
      Suravee Suthikulpanit 提交于
      early_root_info_init() is now deprecated in favor of info in ACPI.  Add a
      note to that effect.  Also, clean up the code a bit.
      
      There is no functional change.
      Signed-off-by: NSuravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      9e7f7231
    • L
      ACPICA: Clean up redudant definitions already defined elsewhere · 92985ef1
      Lv Zheng 提交于
      Since mis-order issues have been solved, we can cleanup redundant
      definitions that already have defaults in <acpi/platform/acenv.h>.
      
      This patch removes redudant environments for __KERNEL__ surrounded code.
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      92985ef1
    • L
      ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h> · 07d83914
      Lv Zheng 提交于
      There is a mis-order inclusion for <asm/acpi.h>.
      
      As we will enforce including <linux/acpi.h> for all Linux ACPI users, we
      can find the inclusion order is as follows:
      
      <linux/acpi.h>
        <acpi/acpi.h>
         <acpi/platform/acenv.h>
          (acenv.h before including aclinux.h)
          <acpi/platform/aclinux.h>
      ...........................................................................
           (aclinux.h before including asm/acpi.h)
           <asm/acpi.h>                             @Redundant@
            (ACPICA specific stuff)
      ...........................................................................
      ...........................................................................
            (Linux ACPI specific stuff) ? - - - - - - - - - - - - +
           (aclinux.h after including asm/acpi.h)   @Invisible@   |
          (acenv.h after including aclinux.h)       @Invisible@   |
         other ACPICA headers                       @Invisible@   |
      ............................................................|..............
        <acpi/acpi_bus.h>                                         |
        <acpi/acpi_drivers.h>                                     |
        <asm/acpi.h> (Excluded)                                   |
         (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - +
      
      NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig
      generated <generated/autoconf.h> for Linux, it is meant to be included
      before including any ACPICA code.
      
      In the above figure, there is a question mark for "Linux ACPI specific
      stuff" in <asm/acpi.h> which should be included after including all other
      ACPICA header files.  Thus they really need to be moved to the position
      marked with exclaimation mark or the definitions in the blocks marked with
      "@Invisible@" will be invisible to such architecture specific "Linux ACPI
      specific stuff" header blocks.  This leaves 2 issues:
      1. All environmental definitions in these blocks should have a copy in the
         area marked with "@Redundant@" if they are required by the "Linux ACPI
         specific stuff".
      2. We cannot use any ACPICA defined types in <asm/acpi.h>.
      
      This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to
      fix this issue.
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      07d83914
  5. 27 5月, 2014 3 次提交
    • M
      x86/xen: map foreign pfns for autotranslated guests · 77945ca7
      Mukesh Rathor 提交于
      When running as a dom0 in PVH mode, foreign pfns that are accessed
      must be added to our p2m which is managed by xen. This is done via
      XENMEM_add_to_physmap_range hypercall. This is needed for toolstack
      building guests and mapping guest memory, xentrace mapping xen pages,
      etc.
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      77945ca7
    • N
      KVM: x86: MOV CR/DR emulation should ignore mod · 9b88ae99
      Nadav Amit 提交于
      MOV CR/DR instructions ignore the mod field (in the ModR/M byte). As the SDM
      states: "The 2 bits in the mod field are ignored".  Accordingly, the second
      operand of these instructions is always a general purpose register.
      
      The current emulator implementation does not do so. If the mod bits do not
      equal 3, it expects the second operand to be in memory.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9b88ae99
    • P
      KVM: lapic: sync highest ISR to hardware apic on EOI · fc57ac2c
      Paolo Bonzini 提交于
      When Hyper-V enlightenments are in effect, Windows prefers to issue an
      Hyper-V MSR write to issue an EOI rather than an x2apic MSR write.
      The Hyper-V MSR write is not handled by the processor, and besides
      being slower, this also causes bugs with APIC virtualization.  The
      reason is that on EOI the processor will modify the highest in-service
      interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
      the SDM; every other step in EOI virtualization is already done by
      apic_send_eoi or on VM entry, but this one is missing.
      
      We need to do the same, and be careful not to muck with the isr_count
      and highest_isr_cache fields that are unused when virtual interrupt
      delivery is enabled.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NYang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fc57ac2c
  6. 24 5月, 2014 3 次提交
  7. 22 5月, 2014 11 次提交
  8. 21 5月, 2014 2 次提交
  9. 19 5月, 2014 1 次提交
  10. 16 5月, 2014 1 次提交
  11. 15 5月, 2014 8 次提交