1. 31 May 2014 (1 commit)
    • x86_64: expand kernel stack to 16K · 6538b8ea
      Authored by Minchan Kim
      While testing in-house patches under heavy memory pressure on qemu-kvm,
      the 3.14 kernel crashed randomly. The cause was kernel stack overflow.
      
      When I investigated the problem, the call stack was a little deeper
      because reclaim functions were involved, though not via the direct
      reclaim path.
      
      I tried to slim down the stack usage of some functions related to
      alloc/reclaim and shaved off a hundred or so bytes, but the overflow
      didn't disappear; I just hit it again via another, deeper call stack
      on the reclaim/allocator path.
      
      Of course, we could sweep every site we have found to reduce stack
      usage, but I'm not sure how long that would save us (surely, lots of
      developers will start adding nice features that use the stack again),
      and if we consider more complex features in the I/O layer and/or the
      reclaim path, it may be better to increase the stack size. (Meanwhile,
      stack usage on 64-bit machines has doubled compared to 32-bit while
      the stack has stayed at 8K. That hardly seems fair, and arm64 has
      already expanded to 16K.)
      
      So my admittedly naive idea is: let's just expand the stack size and
      keep an eye on the stack consumption of each kernel function via
      ftrace's stack tracing. For example, we could set a bar such that no
      function should exceed 200K, and emit a warning when some function
      consumes more at runtime. Of course, this could produce false
      positives, but at least it would give us a chance to think it over.
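
      To make that concrete, such a runtime check might look like the sketch
      below; this is not part of the patch, and the warning threshold and
      hook point are assumptions (THREAD_SIZE, task_stack_page() and
      current_stack_pointer are real kernel symbols):

        /* Hedged sketch: warn when the current task is close to
         * exhausting its kernel stack. Threshold is illustrative. */
        static inline void check_stack_usage(void)
        {
                unsigned long sp   = current_stack_pointer;
                unsigned long base = (unsigned long)task_stack_page(current);
                unsigned long used = base + THREAD_SIZE - sp;

                /* assumed bar: warn when within 1K of the bottom */
                WARN_ONCE(used > THREAD_SIZE - 1024,
                          "kernel stack nearly exhausted: %lu bytes used\n",
                          used);
        }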
      
      I guess this topic has been discussed several times, so there may be a
      strong reason not to increase the kernel stack size on x86_64 that I
      am not aware of; hence I'm Cc'ing the x86_64 maintainers, other MM
      folks, and the virtio maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
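
      [ For reference, the mechanical change itself is tiny: on x86_64 it
        amounts to doubling the stack-page order, roughly as sketched below
        (exact file context assumed, per arch/x86/include/asm/page_64_types.h):

        -#define THREAD_SIZE_ORDER      1      /* 2 pages =  8K */
        +#define THREAD_SIZE_ORDER      2      /* 4 pages = 16K */
         #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)      ]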
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 22 May 2014 (1 commit)
  3. 16 May 2014 (2 commits)
    • net: filter: x86: internal BPF JIT · 62258278
      Authored by Alexei Starovoitov
      This maps all internal BPF instructions to x86_64 instructions and
      replaces the original BPF x64 JIT with an internal-BPF x64 JIT. The
      sysctl net.core.bpf_jit_enable is reused as the on/off switch.
      
      Performance:
      
      1. The old BPF JIT and the internal BPF JIT generate equivalent
        x86_64 code. No performance difference is observed for filters
        that were JIT-able before.
      
      Example assembler code for BPF filter "tcpdump port 22"
      
      original BPF -> old JIT:            original BPF -> internal BPF -> new JIT:
         0:   push   %rbp                      0:     push   %rbp
         1:   mov    %rsp,%rbp                 1:     mov    %rsp,%rbp
         4:   sub    $0x60,%rsp                4:     sub    $0x228,%rsp
         8:   mov    %rbx,-0x8(%rbp)           b:     mov    %rbx,-0x228(%rbp) // prologue
                                              12:     mov    %r13,-0x220(%rbp)
                                              19:     mov    %r14,-0x218(%rbp)
                                              20:     mov    %r15,-0x210(%rbp)
                                              27:     xor    %eax,%eax         // clear A
         c:   xor    %ebx,%ebx                29:     xor    %r13,%r13         // clear X
         e:   mov    0x68(%rdi),%r9d          2c:     mov    0x68(%rdi),%r9d
        12:   sub    0x6c(%rdi),%r9d          30:     sub    0x6c(%rdi),%r9d
        16:   mov    0xd8(%rdi),%r8           34:     mov    0xd8(%rdi),%r10
                                              3b:     mov    %rdi,%rbx
        1d:   mov    $0xc,%esi                3e:     mov    $0xc,%esi
        22:   callq  0xffffffffe1021e15       43:     callq  0xffffffffe102bd75
        27:   cmp    $0x86dd,%eax             48:     cmp    $0x86dd,%rax
        2c:   jne    0x0000000000000069       4f:     jne    0x000000000000009a
        2e:   mov    $0x14,%esi               51:     mov    $0x14,%esi
        33:   callq  0xffffffffe1021e31       56:     callq  0xffffffffe102bd91
        38:   cmp    $0x84,%eax               5b:     cmp    $0x84,%rax
        3d:   je     0x0000000000000049       62:     je     0x0000000000000074
        3f:   cmp    $0x6,%eax                64:     cmp    $0x6,%rax
        42:   je     0x0000000000000049       68:     je     0x0000000000000074
        44:   cmp    $0x11,%eax               6a:     cmp    $0x11,%rax
        47:   jne    0x00000000000000c6       6e:     jne    0x0000000000000117
        49:   mov    $0x36,%esi               74:     mov    $0x36,%esi
        4e:   callq  0xffffffffe1021e15       79:     callq  0xffffffffe102bd75
        53:   cmp    $0x16,%eax               7e:     cmp    $0x16,%rax
        56:   je     0x00000000000000bf       82:     je     0x0000000000000110
        58:   mov    $0x38,%esi               88:     mov    $0x38,%esi
        5d:   callq  0xffffffffe1021e15       8d:     callq  0xffffffffe102bd75
        62:   cmp    $0x16,%eax               92:     cmp    $0x16,%rax
        65:   je     0x00000000000000bf       96:     je     0x0000000000000110
        67:   jmp    0x00000000000000c6       98:     jmp    0x0000000000000117
        69:   cmp    $0x800,%eax              9a:     cmp    $0x800,%rax
        6e:   jne    0x00000000000000c6       a1:     jne    0x0000000000000117
        70:   mov    $0x17,%esi               a3:     mov    $0x17,%esi
        75:   callq  0xffffffffe1021e31       a8:     callq  0xffffffffe102bd91
        7a:   cmp    $0x84,%eax               ad:     cmp    $0x84,%rax
        7f:   je     0x000000000000008b       b4:     je     0x00000000000000c2
        81:   cmp    $0x6,%eax                b6:     cmp    $0x6,%rax
        84:   je     0x000000000000008b       ba:     je     0x00000000000000c2
        86:   cmp    $0x11,%eax               bc:     cmp    $0x11,%rax
        89:   jne    0x00000000000000c6       c0:     jne    0x0000000000000117
        8b:   mov    $0x14,%esi               c2:     mov    $0x14,%esi
        90:   callq  0xffffffffe1021e15       c7:     callq  0xffffffffe102bd75
        95:   test   $0x1fff,%ax              cc:     test   $0x1fff,%rax
        99:   jne    0x00000000000000c6       d3:     jne    0x0000000000000117
                                              d5:     mov    %rax,%r14
        9b:   mov    $0xe,%esi                d8:     mov    $0xe,%esi
        a0:   callq  0xffffffffe1021e44       dd:     callq  0xffffffffe102bd91 // MSH
                                              e2:     and    $0xf,%eax
                                              e5:     shl    $0x2,%eax
                                              e8:     mov    %rax,%r13
                                              eb:     mov    %r14,%rax
                                              ee:     mov    %r13,%rsi
        a5:   lea    0xe(%rbx),%esi           f1:     add    $0xe,%esi
        a8:   callq  0xffffffffe1021e0d       f4:     callq  0xffffffffe102bd6d
        ad:   cmp    $0x16,%eax               f9:     cmp    $0x16,%rax
        b0:   je     0x00000000000000bf       fd:     je     0x0000000000000110
                                              ff:     mov    %r13,%rsi
        b2:   lea    0x10(%rbx),%esi         102:     add    $0x10,%esi
        b5:   callq  0xffffffffe1021e0d      105:     callq  0xffffffffe102bd6d
        ba:   cmp    $0x16,%eax              10a:     cmp    $0x16,%rax
        bd:   jne    0x00000000000000c6      10e:     jne    0x0000000000000117
        bf:   mov    $0xffff,%eax            110:     mov    $0xffff,%eax
        c4:   jmp    0x00000000000000c8      115:     jmp    0x000000000000011c
        c6:   xor    %eax,%eax               117:     mov    $0x0,%eax
        c8:   mov    -0x8(%rbp),%rbx         11c:     mov    -0x228(%rbp),%rbx // epilogue
        cc:   leaveq                         123:     mov    -0x220(%rbp),%r13
        cd:   retq                           12a:     mov    -0x218(%rbp),%r14
                                             131:     mov    -0x210(%rbp),%r15
                                             138:     leaveq
                                             139:     retq
      
      On fully cached SKBs both JITed functions take 12 nsec to execute.
      BPF interpreter executes the program in 30 nsec.
      
      The difference in generated assembler is due to the following:
      
      The old BPF JIT implements the LDX_MSH instruction via the
      sk_load_byte_msh() helper function inside bpf_jit.S. The new JIT
      removes the helper and does it explicitly, so the ldx_msh cost is the
      same for both JITs, but the generated code looks longer.
      
      The new JIT has 4 registers to save, so the prologue/epilogue are
      larger, but the cost is within noise on x64.
      
      The old JIT checks whether the first insn clears A and, if not, emits
      'xor %eax,%eax'. The new JIT clears %rax unconditionally.
      
      2. The old BPF JIT doesn't support the ANC_NLATTR, ANC_PAY_OFFSET,
        and ANC_RANDOM extensions; the new JIT supports all BPF extensions.
        Performance of such filters improves 2-4x depending on the filter;
        the longer the filter, the higher the gain. Synthetic benchmarks
        with many ancillary loads see a 20x speedup, which seems to be the
        maximum gain from the JIT.
      
      Notes:
      
      . net.core.bpf_jit_enable=2 plus tools/net/bpf_jit_disasm are still
        functional and can be used to inspect the generated assembler
      
      . there are two jit_compile() functions; the code flow for classic
        filters is:
        sk_attach_filter() - load classic BPF
        bpf_jit_compile() - try to JIT from classic BPF
        sk_convert_filter() - convert classic to internal
        bpf_int_jit_compile() - JIT from internal BPF
      
        seccomp and tracing filters call bpf_int_jit_compile() directly; a
        userspace view of how a classic filter enters this path is sketched
        below
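
      For illustration, here is a minimal userspace view of that classic
      path: the filter is a single classic-BPF instruction ("return
      0xffff", i.e. accept everything), attached with SO_ATTACH_FILTER so
      the kernel runs sk_attach_filter() and may then JIT it (a sketch,
      not from the patch):

        #include <linux/filter.h>
        #include <sys/socket.h>
        #include <stdio.h>

        int main(void)
        {
                /* BPF_RET | BPF_K, k = 0xffff: accept the whole packet */
                struct sock_filter insns[] = {
                        { 0x06, 0, 0, 0x0000ffff },
                };
                struct sock_fprog prog = {
                        .len    = 1,
                        .filter = insns,
                };
                int fd = socket(AF_INET, SOCK_DGRAM, 0);

                if (fd < 0) {
                        perror("socket");
                        return 1;
                }
                if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                               &prog, sizeof(prog)) < 0)
                        perror("setsockopt(SO_ATTACH_FILTER)");
                return 0;
        }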
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: x86: split bpf_jit_compile() · f3c2af7b
      Authored by Alexei Starovoitov
      Split bpf_jit_compile() into two functions to improve the readability
      of the for(pass++) loop. The change follows the style of the JIT
      compilers for arm, powerpc, and s390.
      
      The body of the new do_jit() was not reformatted, to reduce noise in
      this patch, since the following patch replaces most of it.
      
      Tested with BPF testsuite.
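
      To illustrate why the JIT iterates at all, here is a standalone toy
      of the for(pass++) sizing loop that do_jit() serves: each pass
      re-encodes all instructions using the previous pass's addresses, and
      stops once the image size converges. Everything here (the 2/5-byte
      jump encodings, the names) is illustrative, not kernel code:

        #include <stdio.h>

        enum { NINSN = 4 };
        static const int target[NINSN] = { 3, 3, 3, 0 }; /* jump targets  */
        static int addr[NINSN + 1];                      /* byte offsets  */

        static int encode_len(int from, int to)
        {
                int disp = addr[to] - addr[from + 1];    /* rel. to next  */
                return (disp >= -128 && disp <= 127) ? 2 : 5;
        }

        int main(void)
        {
                int pass, i, len, prev = -1;

                for (i = 0; i <= NINSN; i++)    /* pessimistic first guess */
                        addr[i] = i * 5;

                for (pass = 0; pass < 10; pass++) {   /* same shape as JIT */
                        for (len = 0, i = 0; i < NINSN; i++) {
                                addr[i] = len;
                                len += encode_len(i, target[i]);
                        }
                        addr[NINSN] = len;
                        printf("pass %d: image is %d bytes\n", pass, len);
                        if (len == prev)        /* converged: emit code */
                                break;
                        prev = len;
                }
                return 0;
        }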
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 May 2014 (1 commit)
    • x86-64, modify_ldt: Make support for 16-bit segments a runtime option · fa81511b
      Authored by Linus Torvalds
      Checkin:
      
      b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      disabled 16-bit segments on 64-bit kernels due to an information
      leak.  However, it does seem that people are genuinely using Wine to
      run old 16-bit Windows programs on Linux.
      
      A proper fix for this ("espfix64") is coming in the upcoming merge
      window, but as a temporary fix, create a sysctl to allow the
      administrator to re-enable support for 16-bit segments.
      
      It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
      you hit this issue and care about your old Windows program more than
      you care about a kernel stack address information leak, you can do
      
         echo 1 > /proc/sys/abi/ldt16
      
      as root (add it to your startup scripts), and you should be ok.
      
      The sysctl table is only added if you have COMPAT support enabled on
      x86-64, but I assume anybody who runs old Windows binaries very much
      does that ;)
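
      The registration boils down to a one-entry sysctl table under
      /proc/sys/abi/; a sketch follows (variable names and placement are
      assumptions, not quoted from the patch):

        static int sysctl_ldt16;        /* 0 = 16-bit segments stay off */

        static struct ctl_table abi_table2[] = {
                {
                        .procname     = "ldt16",
                        .data         = &sysctl_ldt16,
                        .maxlen       = sizeof(int),
                        .mode         = 0644,
                        .proc_handler = proc_dointvec,
                },
                {}
        };

        /* registered at init time, only under CONFIG_COMPAT, e.g.:
         *      register_sysctl("abi", abi_table2);
         */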
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
      Cc: <stable@vger.kernel.org>
  5. 14 May 2014 (3 commits)
  6. 12 May 2014 (1 commit)
  7. 09 May 2014 (3 commits)
  8. 08 May 2014 (3 commits)
  9. 07 May 2014 (3 commits)
  10. 06 May 2014 (4 commits)
  11. 03 May 2014 (1 commit)
    • x86/efi: earlyprintk=efi,keep fix · 5f35eb0e
      Authored by Dave Young
      earlyprintk=efi,keep causes kernel hangs while freeing initmem, like
      below:
      
        VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
        devtmpfs: mounted
        Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)
      
      It is caused by the EFI earlyprintk code using __init functions which
      are freed later. For example, early_efi_write() is marked __init, and
      it also uses early_ioremap(), which is an __init function as well.
      
      To fix this issue, I added an early initcall, early_efi_map_fb(),
      which maps the whole EFI framebuffer for later use, along with a
      wrapper function, early_efi_map(), which calls early_ioremap() before
      ioremap() is available.
      
      With this patch applied, EFI boots fine with earlyprintk=efi,keep
      console=efi.
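
      The shape of the fix, per the description above (a sketch; function
      signatures, the flag, and the efi_fb_* variables are assumptions):

        /* Map the whole EFI framebuffer once, early, so the console
         * write path no longer needs __init early_ioremap() after
         * initmem has been freed. */
        static __init int early_efi_map_fb(void)
        {
                if (keep_early_console)         /* hypothetical flag */
                        efi_fb = ioremap(efi_fb_phys, efi_fb_size);
                return 0;
        }
        early_initcall(early_efi_map_fb);

        /* Before ioremap() is available, fall back to early_ioremap() */
        static void __iomem *early_efi_map(unsigned long ofs,
                                           unsigned long len)
        {
                if (efi_fb)
                        return efi_fb + ofs;
                return early_ioremap(efi_fb_phys + ofs, len);
        }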
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
  12. 28 April 2014 (3 commits)
    • genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2
      Authored by Thomas Gleixner
      On x86 the allocation of irq descriptors may allocate interrupts
      which are in the range of the GSI interrupts. That's wrong, as those
      interrupts are hardwired and we don't have the irq domain translation
      that PPC has. So one of these interrupts can later be hooked up to
      one of the devices which is hardwired to it, and the io_apic init
      code for that particular interrupt line happily reuses that
      descriptor with a completely different configuration, so hell breaks
      loose.
      
      Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
      except for a few usage sites which have not yet blown up in our face
      for whatever reason. But for drivers which need an irq range, like the
      GPIO drivers, we have no limit in place and we don't want to expose
      such a detail to a driver.
      
      To cure this introduce a function which an architecture can implement
      to impose a lower bound on the dynamic interrupt allocations.
      
      Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
      the end of the hardwired interrupt space, so all dynamic allocations
      happen above.
      
      That not only allows the GPIO driver to work sanely, it also protects
      the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
      htirq code. They need to be cleaned up as well, but that's a separate
      issue.
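
      The hook might look roughly like the sketch below: the irq core gains
      a weak default and x86 overrides it ("nr_gsi_irqs" follows the
      changelog wording; the in-tree variable name may differ):

        /* irq core: weak default imposes no extra lower bound */
        unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
        {
                return from;
        }

        /* x86: dynamic allocations start above the hardwired GSI space */
        unsigned int arch_dynirq_lower_bound(unsigned int from)
        {
                return from < nr_gsi_irqs ? nr_gsi_irqs : from;
        }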
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Krogerus Heikki <heikki.krogerus@intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • KVM: x86: Check for host supported fields in shadow vmcs · fe2b201b
      Authored by Bandan Das
      We track shadow VMCS fields through two static lists, one for
      read-only and another for r/w fields. However, with the addition of
      new VMCS fields, not all fields may be supported on all hosts; in
      that case, copy_vmcs12_to_shadow() attempting to vmwrite an
      unsupported field will result in a vmwrite error. For example, commit
      36be0b9d introduced GUEST_BNDCFGS, which is not supported by all
      processors. Filter out host-unsupported fields before letting guests
      use the shadow VMCS.
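
      One way to do the filtering, sketched below: probe each candidate
      field on the host once at init and compact the list in place (the
      array names match the description; the probe helper is hypothetical):

        static void init_vmcs_shadow_fields(void)
        {
                int i, j;

                for (i = j = 0; i < max_shadow_read_write_fields; i++) {
                        /* hypothetical probe: host supports this field? */
                        if (!vmcs_field_supported(shadow_read_write_fields[i]))
                                continue;
                        shadow_read_write_fields[j++] =
                                shadow_read_write_fields[i];
                }
                max_shadow_read_write_fields = j;
                /* ... likewise for the read-only field list ... */
        }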
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/vsmp: Fix irq routing · 39025ba3
      Authored by Oren Twaig
      Correct IRQ routing in the case where a vSMP box is detected but the
      Interrupt Routing Comply (IRC) value is set to "comply", which
      previously led to incorrect IRQ routing.
      
      Before the patch:
      
      When a vSMP box was detected and IRC was set to "comply", users (and
      the kernel) couldn't effectively set the destination of the IRQs.
      This is because the hook inside vsmp_64.c always set up all CPUs as
      the IRQ destination, using cpumask_setall() as the return value for
      the IRQ allocation mask. Later, this overriding mask caused the
      kernel to set the IRQ destination to the lowest online CPU in the
      mask (usually CPU0).
      
      After the patch:
      
      When the IRC is set to "comply", users (and the kernel) can control
      the destination of the IRQs, because we no longer change the default
      "apic->vector_allocation_domain".
      Signed-off-by: Oren Twaig <oren@scalemp.com>
      Acked-by: Shai Fultheim <shai@scalemp.com>
      Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
      [ Minor readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 24 April 2014 (1 commit)
  14. 22 April 2014 (1 commit)
  15. 18 April 2014 (1 commit)
  16. 17 April 2014 (2 commits)
    • kprobes/x86: Fix page-fault handling logic · 6381c24c
      Authored by Masami Hiramatsu
      The current kprobes in-kernel page-fault handler doesn't expect that
      its single-stepping can be interrupted by an NMI handler which may
      itself cause a page fault (e.g. perf with callback tracing).
      
      In that case, the page fault is handled by kprobes, which
      misunderstands it as having been caused by the single-stepping code
      and tries to recover the IP address to the probed address.
      
      But the truth is that the page fault was caused by the NMI handler,
      and do_page_fault() then fails to handle the real page fault because
      the IP address has been modified, causing kernel BUGs like the one
      below.
      
       ----
       [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
       [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
      
      To handle this correctly, I fixed the kprobes fault handler to check
      that the faulting IP address lies within its own single-step buffer,
      instead of just checking the current kprobe state.
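
      In other words, the single-step case in the fault handler should be
      gated on where the fault happened, not merely on kprobe state; a
      sketch follows (cur->ainsn.insn and MAX_INSN_SIZE are real x86
      kprobes symbols, but the helper itself is illustrative):

        /* Treat the fault as caused by single-stepping only if the
         * faulting IP lies inside the out-of-line single-step buffer. */
        static int fault_in_ss_buffer(struct kprobe *cur,
                                      struct pt_regs *regs)
        {
                unsigned long buf = (unsigned long)cur->ainsn.insn;

                return regs->ip >= buf && regs->ip < buf + MAX_INSN_SIZE;
        }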
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fche@redhat.com
      Cc: systemtap@sourceware.org
      Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mce: Fix CMCI preemption bugs · ea431643
      Authored by Ingo Molnar
      The following commit:
      
        27f6c573 ("x86, CMCI: Add proper detection of end of CMCI storms")
      
      added two preemption bugs:
      
       - machine_check_poll() does a get_cpu_var() without a matching
         put_cpu_var(), which causes preemption imbalance and crashes upon
         bootup.
      
       - it does percpu ops without disabling preemption. Preemption is not
         disabled due to the mistaken use of a raw spinlock.
      
      To fix these bugs, restore the get/put balance and change
      cmci_discover_lock to a regular spinlock.
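
      For reference, the pairing rule the first fix restores (a generic
      illustration, not the patch itself; the per-CPU variable name is
      made up):

        /* get_cpu_var() disables preemption; it must always be paired
         * with put_cpu_var() on the same variable. */
        struct poll_state *st = &get_cpu_var(mce_poll_state);
        /* ... touch per-CPU state with preemption disabled ... */
        put_cpu_var(mce_poll_state);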
      Reported-by: Owen Kibel <qmewlo@gmail.com>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Todorov <atodorov@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      --
       arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
       arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
       2 files changed, 10 insertions(+), 12 deletions(-)
  17. 16 April 2014 (2 commits)
    • x86: Remove the PCI reboot method from the default chain · 5be44a6f
      Authored by Ingo Molnar
      Steve reported a reboot hang and bisected it back to this commit:
      
        a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list
      
      He heroically tested all reboot methods and found the following:
      
        reboot=t       # triple fault                  ok
        reboot=k       # keyboard ctrl                 FAIL
        reboot=b       # BIOS                          ok
        reboot=a       # ACPI                          FAIL
        reboot=e       # EFI                           FAIL   [system has no EFI]
        reboot=p       # PCI 0xcf9                     FAIL
      
      And I think it's pretty obvious that we should only try PCI 0xcf9 as a
      last resort - if at all.
      
      The other observation is that (on this box) we should never try
      the PCI reboot method, but close with either the 'triple fault'
      or the 'BIOS' (terminal!) reboot methods.
      
      Thirdly, CF9_COND is a total misnomer - it should be something like
      CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
      
      So this patch fixes the worst problems:
      
       - it orders the actual reboot logic to follow the reboot ordering
         pattern - it was in a pretty random order before for no good
         reason.
      
       - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
         BOOT_CF9_SAFE flags to make the code more obvious.
      
       - it tries the BIOS reboot method before the PCI reboot method.
         (Since 'BIOS' is a terminal reboot method resulting in a hang
          if it does not work, this is essentially equivalent to removing
          the PCI reboot method from the default reboot chain.)
      
       - just in case the terminal (resulting-in-hang) reboot methods of
         triple fault or BIOS miraculously return without having done their
         job, there's an ordering between them as well.
      Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • xen/spinlock: Don't enable them unconditionally. · e0fc17a9
      Authored by Konrad Rzeszutek Wilk
      The git commit a945928e
      ('xen: Do not enable spinlocks before jump_label_init() has executed')
      was added to deal with the jump-label machinery. Earlier, the code
      that turned on the jump label was only called by Xen-specific
      functions. But now that it has been moved to the initcall machinery,
      it gets called on Xen, KVM, and bare metal - ouch! And the check to
      only enable it on Xen wasn't remembered in the heat of merge-window
      excitement.
      
      This means that the slowpath is enabled on bare metal while it should
      not be.
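
      The fix, per the description, is an early bail-out in the initcall so
      that only Xen guests flip the key (a sketch; names approximate the
      Xen code of that era):

        static __init int xen_init_spinlocks_jump(void)
        {
                if (!xen_pvspin)
                        return 0;
                if (!xen_domain())      /* the missing check */
                        return 0;
                static_key_slow_inc(&paravirt_ticketlocks_enabled);
                return 0;
        }
        early_initcall(xen_init_spinlocks_jump);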
      Reported-by: Waiman Long <waiman.long@hp.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      CC: stable@vger.kernel.org
      CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  18. 15 April 2014 (7 commits)