1. 29 Mar, 2014 1 commit
    • x86, CMCI: Add proper detection of end of CMCI storms · 27f6c573
      Committed by Chen, Gong
      When a CMCI storm persists for a long time (at least beyond the
      predefined threshold, currently 30 seconds), we can see a CMCI storm
      being detected again immediately after it is reported to have subsided.
      
      ...
      Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
      Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
      ...
      
      The problem is that our logic that determines that the storm has
      ended is incorrect. We announce the end, re-enable interrupts and
      realize that the storm is still going on, so we switch back to
      polling mode. Rinse, repeat.
      
      When a storm happens we disable signaling of errors via CMCI and begin
      polling machine check banks instead. If we find any logged errors,
      then we need to set a per-cpu flag so that our per-cpu tests that
      check whether the storm is ongoing will see that errors are still
      being logged independently of whether mce_notify_irq() says that the
      error has been fully processed.
      
      cmci_clear() is not the right tool to disable a bank. It disables the
      interrupt for the bank as desired, but it also clears the bit for
      this bank in "mce_banks_owned" so we will skip the bank when polling
      (so we fail to see that the storm continues because we stop looking).
      New cmci_storm_disable_banks() just disables the interrupt while
      allowing polling to continue.
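
      The difference between the two helpers can be pictured with a minimal
      user-space sketch. Only cmci_clear(), cmci_storm_disable_banks() and
      "mce_banks_owned" come from the text above; the struct, the model_*
      functions and the 32-bit masks are hypothetical stand-ins, not the
      kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU model; the kernel uses different structures. */
struct model_cpu_banks {
    uint32_t owned;        /* banks this CPU polls (models "mce_banks_owned") */
    uint32_t cmci_enabled; /* banks whose CMCI interrupt is enabled */
};

/* Models cmci_clear(): interrupt off AND ownership dropped. */
static void model_cmci_clear(struct model_cpu_banks *c, unsigned bank)
{
    assert(bank < 32);
    c->cmci_enabled &= ~(UINT32_C(1) << bank);
    c->owned &= ~(UINT32_C(1) << bank);
}

/* Models cmci_storm_disable_banks() for one bank: interrupt off only. */
static void model_storm_disable_bank(struct model_cpu_banks *c, unsigned bank)
{
    assert(bank < 32);
    c->cmci_enabled &= ~(UINT32_C(1) << bank);
}

/* The poller only visits banks that are still owned. */
static bool model_poll_visits(const struct model_cpu_banks *c, unsigned bank)
{
    assert(bank < 32);
    return ((c->owned >> bank) & 1u) != 0;
}
```

      In this model, a bank disabled through model_cmci_clear() disappears
      from polling, while one disabled through model_storm_disable_bank()
      is still visited, so continued errors remain visible.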
      Reported-by: William Dauchy <wdauchy@gmail.com>
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  2. 19 Mar, 2014 1 commit
    • Revert "[PATCH] Insert GART region into resource map" · 707d4eef
      Committed by Bjorn Helgaas
      This reverts commit 56dd669a, which makes the GART visible in
      /proc/iomem.  This fixes a regression: e501b3d8 ("agp: Support 64-bit
      APBASE") exposed an existing problem with a conflict between the GART
      region and a PCI BAR region.
      
      The GART addresses are bus addresses, not CPU addresses, and therefore
      should not be inserted in iomem_resource.
      
      On many machines, the GART region is addressable by the CPU as well as by
      an AGP master, but CPU addressability is not required by the spec.  On some
      of these machines, the GART is mapped by a PCI BAR, and in that case, the
      PCI core automatically inserts it into iomem_resource, just as it does for
      all BARs.
      
      Inserting it here means we'll have a conflict if the PCI core later tries
      to claim the GART region, so let's drop the insertion here.
      
      The conflict indirectly causes X failures, as reported by Jouni in the
      bugzilla below.  We detected the conflict even before e501b3d8, but
      after it the AGP code (fix_northbridge()) uses the PCI resource (which is
      zeroed because of the conflict) instead of reading the BAR again.
      
      Conflicts:
      	arch/x86_64/kernel/aperture.c
      
      Fixes: e501b3d8 agp: Support 64-bit APBASE
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=72201
      Reported-and-tested-by: Jouni Mettälä <jtmettala@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  3. 14 Mar, 2014 1 commit
  4. 13 Mar, 2014 1 commit
  5. 12 Mar, 2014 3 commits
    • x86: bpf_jit: support negative offsets · fdfaf64e
      Committed by Alexei Starovoitov
      Commit a998d434 claimed to introduce negative offset support to the x86 JIT,
      but it could not have worked: when LD+ABS or LD+IND instructions were
      executed via a call into bpf_internal_load_pointer_neg_helper(), %edx
      (the 3rd argument of this function) held a junk value instead of the
      access size in bytes (1, 2 or 4).

      Store the size into %edx instead of %ecx (which is what the original
      commit intended to do).
      
      Fixes: a998d434 ("bpf jit: Let the x86 jit handle negative offsets")
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jan Seiffert <kaffeemonster@googlemail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU · 731bd6a9
      Committed by Suresh Siddha
      In non-eager fpu mode, a thread's fpu state is allocated during the first
      fpu usage (in the context of the device-not-available exception). This
      (math_state_restore()) can be a blocking call, and hence we enable
      interrupts (which were originally disabled when the exception happened),
      allocate memory, disable interrupts again, etc.

      But eager-fpu mode calls the same math_state_restore() from
      kernel_fpu_end(). The assumption is that tsk_used_math() is always
      set in eager-fpu mode, which avoids the code path that enables
      interrupts, allocates fpu state using a blocking call and disables
      interrupts again.

      But the issue below was noticed by Maarten Baert, Nate Eldredge and
      a few others:
      
      If a user process dumps core on an ecrypt fs while aesni-intel is loaded,
      we get a BUG() in __find_get_block() complaining that it was called with
      interrupts disabled; then all further accesses to our ecrypt fs hang
      and we have to reboot.
      
      The aesni-intel code (encrypting the core file that we are writing) needs
      the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(),
      the latter of which calls math_state_restore(). So after kernel_fpu_end(),
      interrupts may be disabled, which nobody seems to expect, and they stay
      that way until we eventually get to __find_get_block() which barfs.
      
      For eager fpu, tsk_used_math() is true most of the time. In a few
      instances, such as during thread exit or signal return handling,
      tsk_used_math() might be false.

      In kernel_fpu_end(), for eager-fpu, call math_state_restore()
      only if tsk_used_math() is set. Otherwise, don't bother. The kernel code
      path that cleared tsk_used_math() knows what needs to be done
      with the fpu state.
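
      A minimal user-space sketch of the behaviour described above. The
      model_* names and the two structs are hypothetical; only the control
      flow (enable interrupts, allocate, disable interrupts when
      tsk_used_math() is false) is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; the kernel tracks these very differently. */
struct model_cpu  { bool irqs_enabled; };
struct model_task { bool used_math; };

/*
 * Models the slow path of math_state_restore(): it was written for
 * exception context, where interrupts are disabled on entry, so it
 * unconditionally leaves them disabled on exit.
 */
static void model_math_state_restore(struct model_cpu *cpu,
                                     struct model_task *t)
{
    assert(cpu && t);
    if (!t->used_math) {
        cpu->irqs_enabled = true;   /* allow the blocking allocation */
        t->used_math = true;        /* fpu state is now allocated */
        cpu->irqs_enabled = false;  /* "disable interrupts" on the way out */
    }
    /* restoring the FPU registers is omitted from this model */
}

/* Old eager-fpu behaviour: always call math_state_restore(). */
static void model_kernel_fpu_end_old(struct model_cpu *cpu,
                                     struct model_task *t)
{
    model_math_state_restore(cpu, t);
}

/* Behaviour after this change: only restore if used_math is set. */
static void model_kernel_fpu_end_fixed(struct model_cpu *cpu,
                                       struct model_task *t)
{
    if (t->used_math)
        model_math_state_restore(cpu, t);
}
```

      Starting with interrupts enabled and used_math false, the old variant
      returns with interrupts disabled in this model, which matches the
      reported symptom; the fixed variant leaves them untouched.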
      Reported-by: Maarten Baert <maarten-baert@hotmail.com>
      Reported-by: Nate Eldredge <nate@thatsmathematics.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Suresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa
      Cc: George Spelvin <linux@horizon.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Remove CONFIG_X86_OOSTORE · 09df7c4c
      Committed by Dave Jones
      This was an optimization that made memcpy type benchmarks a little
      faster on ancient (circa 1998) IDT Winchip CPUs.  In real-life
      workloads, it wasn't even noticeable, and I doubt anyone is running
      benchmarks on 16-year-old silicon any more.
      
      Given this code has likely seen very little use over the last decade,
      let's just remove it.
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 11 Mar, 2014 1 commit
  7. 08 Mar, 2014 2 commits
    • x86: fix compile error due to X86_TRAP_NMI use in asm files · b01d4e68
      Committed by Linus Torvalds
      It's an enum, not a #define, you can't use it in asm files.
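
      The distinction can be seen in plain C, without any assembly. In this
      sketch the names MODEL_TRAP_NMI_ENUM and MODEL_TRAP_NMI_MACRO are
      hypothetical; the point is only that the preprocessor (which is all an
      asm file goes through) never sees enum constants, whereas the C
      compiler sees both.

```c
#include <assert.h>

enum { MODEL_TRAP_NMI_ENUM = 2 };  /* known only to the C compiler */
#define MODEL_TRAP_NMI_MACRO 2     /* known to the preprocessor as well */

/* The compiler proper sees both names with the same value. */
static_assert(MODEL_TRAP_NMI_ENUM == MODEL_TRAP_NMI_MACRO,
              "compiler sees the enum constant");

/* In #if, an identifier that is not a macro is replaced by 0. */
#if MODEL_TRAP_NMI_ENUM == 2
#define MODEL_PP_SEES_ENUM 1
#else
#define MODEL_PP_SEES_ENUM 0
#endif

#if MODEL_TRAP_NMI_MACRO == 2
#define MODEL_PP_SEES_MACRO 1
#else
#define MODEL_PP_SEES_MACRO 0
#endif

static int model_pp_sees_enum(void)  { return MODEL_PP_SEES_ENUM; }
static int model_pp_sees_macro(void) { return MODEL_PP_SEES_MACRO; }
```

      An assembler source that references the enum name is left with an
      undefined symbol after preprocessing, whereas a #define is expanded
      to its numeric value.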
      
      Introduced in commit 5fa10196 ("x86: Ignore NMIs that come in during
      early boot"), and sadly I didn't compile-test things like I should have
      before pushing out.
      
      My weak excuse is that the x86 tree generally doesn't introduce stupid
      things like this (and the ARM pull afterwards doesn't cause me to do a
      compile-test either, since I don't cross-compile).
      
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: Ignore NMIs that come in during early boot · 5fa10196
      Committed by H. Peter Anvin
      Don Zickus reports:
      
      A customer generated an external NMI using their iLO to test that kdump
      worked.  Unfortunately, the machine hung.  Disabling the nmi_watchdog
      made things work.
      
      I speculated the external NMI fired, caused the machine to panic (as
      expected) and the perf NMI from the watchdog came in and was latched.
      My guess was this somehow caused the hang.
      
         ----
      
      It appears that the latched NMI stays latched until the early page
      table generation on 64 bits, which causes exceptions that end in
      IRET, which re-enables NMIs.  Therefore, ignore NMIs that come in
      during early execution, until we have proper exception handling.
      Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
  8. 07 Mar, 2014 1 commit
  9. 05 Mar, 2014 2 commits
  10. 28 Feb, 2014 2 commits
    • kvm, vmx: Really fix lazy FPU on nested guest · 1b385cbd
      Committed by Paolo Bonzini
      Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
      highlighted a real problem, but the fix was subtly wrong.
      
      nested_read_cr0 is the CR0 as read by L2, but here we want to look at
      the CR0 value reflecting L1's setup.  In other words, L2 might think
      that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
      running it with TS=1, we should inject the fault into L1.
      
      The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
      it.
      
      Fixes: e504c909
      Reported-by: Kashyap Chamarty <kchamart@redhat.com>
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Tested-by: Kashyap Chamarty <kchamart@redhat.com>
      Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b
      Committed by Andrew Honig
      The problem occurs when the guest performs a pusha with the stack
      address initially pointing to an mmio address (or an invalid guest
      physical address), but then extending into an ordinary guest
      physical address.  When doing repeated emulated pushes,
      emulator_read_write sets mmio_needed to 1 on the first one.  On a
      later push, when the stack points to regular memory,
      mmio_nr_fragments is set to 0, but mmio_needed is not set to 0.

      As a result, KVM exits to userspace, and then returns to
      complete_emulated_mmio.  In complete_emulated_mmio,
      vcpu->mmio_cur_fragment is incremented.  The termination condition of
      vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
      The code bounces back and forth to userspace, incrementing
      mmio_cur_fragment past its buffer.  If the guest does nothing else, it
      eventually leads to a crash on a memcpy from an invalid memory address.
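
      Why the equality test never fires can be shown with a small model.
      The function below is hypothetical and bounded by a step limit so
      that it always returns; the ">=" variant is included only as one
      defensive way to terminate such a loop and is not claimed here to be
      the change this commit makes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Models the "increment cur, then test for completion" pattern.
 * Returns the number of steps until completion, or -1 if the limit
 * is reached first (standing in for running past the buffer).
 */
static int model_steps_until_done(unsigned nr_fragments, bool use_ge,
                                  int limit)
{
    unsigned cur = 0;

    assert(limit > 0);
    for (int step = 1; step <= limit; step++) {
        cur++;  /* models the increment of mmio_cur_fragment */
        bool done = use_ge ? (cur >= nr_fragments)
                           : (cur == nr_fragments);
        if (done)
            return step;
    }
    return -1;
}
```

      With nr_fragments equal to 0, cur is already 1 at the first test, so
      "==" cannot match within any realistic number of steps, while a ">="
      test stops at the first step.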
      
      However, if guest code can cause the VM to be destroyed in another
      vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
      can be used by the guest to control the data that's pointed to by the
      call to cancel_work_item, which can be used to gain execution.
      
      Fixes: f78146b0
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org (3.5+)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 27 Feb, 2014 2 commits
    • perf/x86: Fix event scheduling · 26e61e89
      Committed by Peter Zijlstra
      Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
      with perf WARN_ON()s triggering. He also provided traces of the failures.
      
      This is I think the relevant bit:
      
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
      	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
      
      So we add the insn:p event (fd[23]).
      
      At this point we should have:
      
        n_events = 2, n_added = 1, n_txn = 1
      
      	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
      	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
      
      We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
      These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
      that's not visible.
      
      	group_sched_in()
      	  pmu->start_txn() /* nop - BP pmu */
      	  event_sched_in()
      	     event->pmu->add()
      
      So here we should end up with:
      
        0: n_events = 3, n_added = 2, n_txn = 2
        4: n_events = 4, n_added = 3, n_txn = 3
      
      But seeing the below state on x86_pmu_enable(), they must have failed,
      because the 0 and 4 events aren't there anymore.
      
      Looking at group_sched_in(), since the BP is the leader, its
      event_sched_in() must have succeeded, for otherwise we would not have
      seen the sibling adds.
      
      But since neither 0 nor 4 is in the below state, their event_sched_in()
      must have failed; but I don't see why, the complete state: 0,0,1:p,4
      fits perfectly fine on a core2.
      
      However, since we try and schedule 4 it means the 0 event must have
      succeeded!  Therefore the 4 event must have failed, its failure will
      have put group_sched_in() into the fail path, which will call:
      
      	event_sched_out()
      	  event->pmu->del()
      
      on 0 and the BP event.
      
      Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
      giving what we see below:
      
        n_events = 2, n_added = 2, n_txn = 2
      
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
      
      So the problem is that x86_pmu_del(), when called from a
      group_sched_in() that fails (for whatever reason), and without x86_pmu
      TXN support (because the leader is !x86_pmu), will corrupt the n_added
      state.
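
      The counter drift can be replayed with a tiny model. The struct and
      model_* functions are hypothetical and ignore n_txn; the "balanced"
      variant only illustrates what consistent bookkeeping would look like
      when the deleted event is one that was added since the last enable,
      and is not claimed to be the exact change made by this commit.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU event bookkeeping. */
struct model_cpuc {
    int n_events; /* events currently collected */
    int n_added;  /* events added since the last enable */
};

/* Models a successful add: both counters grow. */
static void model_add(struct model_cpuc *c)
{
    c->n_events++;
    c->n_added++;
}

/* Models the described del behaviour: n_added is left untouched. */
static void model_del_old(struct model_cpuc *c)
{
    assert(c->n_events > 0);
    c->n_events--;
}

/* Illustrative alternative: undo both halves of a recent add. */
static void model_del_balanced(struct model_cpuc *c)
{
    assert(c->n_events > 0 && c->n_added > 0);
    c->n_events--;
    c->n_added--;
}
```

      Replaying the trace (start at n_events = 1, n_added = 0; add 1:p; add
      0; the add of 4 fails; delete 0) with model_del_old() ends at
      n_events = 2, n_added = 2, the inconsistent state quoted above.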
      Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • KVM: MMU: drop read-only large sptes when creating lower level sptes · 404381c5
      Committed by Marcelo Tosatti
      Read-only large sptes can be created due to read-only faults as
      follows:
      
      - The QEMU pagetable entry that maps guest memory is read-only
        due to COW.
      - A guest read faults such memory; COW is not broken, because
        it is a read-only fault.
      - Dirty logging is enabled; the large spte is not nuked because it is
        read-only.
      - A write fault on such memory causes the guest to loop endlessly
        (the fault must go down to level 1 because dirty logging is enabled).
      
      Fix by dropping large spte when necessary.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 26 Feb, 2014 2 commits
  13. 22 Feb, 2014 3 commits
  14. 20 Feb, 2014 2 commits
    • x86: tsc: Add missing Baytrail frequency to the table · 3e11e818
      Committed by Mika Westerberg
      Intel Baytrail is based on the Silvermont core, so MSR_FSB_FREQ[2:0] == 0
      means that the CPU reference clock runs at 83.3MHz. Add this missing
      frequency to the table.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, tsc: Fallback to normal calibration if fast MSR calibration fails · 5f0e0309
      Committed by Thomas Gleixner
      If we cannot calibrate the TSC via MSR-based calibration,
      try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
      to the caller. This value then gets propagated further to the clockevents
      code, resulting in a division-by-zero oops like the one below:
      
       divide error: 0000 [#1] PREEMPT SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
       task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
       RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
       RSP: 0000:ffff880075507e58  EFLAGS: 00010246
       RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
       RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
       R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
       R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
       Stack:
        ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
        ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
        ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
       Call Trace:
        [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
        [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
        [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
        [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
        [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
        [<ffffffff8177c91e>] kernel_init+0xe/0x120
        [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
      
      Prevent this from happening by:
       1) Modifying try_msr_calibrate_tsc() to return the calibration value,
          or zero if it fails.
       2) Checking this return value in native_calibrate_tsc() and, in case
          of zero, falling back to the normal non-MSR based calibration.
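
      The two steps above can be sketched in user space as follows. The
      model_* functions and their parameters are hypothetical; only the
      contract (zero means "MSR calibration failed, use the normal path")
      comes from the commit message.

```c
#include <assert.h>

/* Stand-in for the MSR path: returns a frequency in kHz, or 0 on failure. */
static unsigned long model_try_msr_calibrate(int msr_usable,
                                             unsigned long msr_khz)
{
    return msr_usable ? msr_khz : 0;
}

/* Stand-in for the caller: fall back when the fast path reports 0. */
static unsigned long model_native_calibrate(int msr_usable,
                                            unsigned long msr_khz,
                                            unsigned long slow_khz)
{
    unsigned long fast = model_try_msr_calibrate(msr_usable, msr_khz);

    if (fast)
        return fast;
    assert(slow_khz != 0);  /* the normal path must yield a real value */
    return slow_khz;
}

/* Any consumer that divides by the frequency needs it to be non-zero. */
static unsigned long model_ns_per_1000_cycles(unsigned long khz)
{
    assert(khz != 0);
    return 1000000000UL / khz;
}
```

      Because the caller never hands a zero frequency onward, later code
      that divides by it, as model_ns_per_1000_cycles() does here, stays
      safe.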
      
      [mw: Added subject and changelog]
      Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  15. 14 Feb, 2014 3 commits
  16. 13 Feb, 2014 1 commit
  17. 12 Feb, 2014 1 commit
    • ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Committed by Steven Rostedt (Red Hat)
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller was still
      done directly, as though it were being done under stop machine.

      As it is no longer converted via stop machine, there is a possibility
      that the code could be laid across cache lines, and if another CPU is
      accessing that function graph call while it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  18. 11 Feb, 2014 2 commits
    • x86: dma-mapping: fix GFP_ATOMIC macro usage · c091c71a
      Committed by Marek Szyprowski
      GFP_ATOMIC is not a single gfp flag, but a macro which expands to other
      flags; what is meaningful is the LACK of the __GFP_WAIT flag. To check
      whether the caller wants to perform an atomic allocation, the code must
      test for the absence of the __GFP_WAIT flag. This patch fixes the issue
      introduced in v3.5-rc1.
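
      A small sketch of the two tests. The MODEL_* bit values and the
      helper names are illustrative assumptions, not the kernel's
      definitions; the point is only that masking with a composite macro
      answers a different question than testing for a missing WAIT bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; real values depend on the kernel version. */
#define MODEL_GFP_WAIT   0x10u
#define MODEL_GFP_HIGH   0x20u
#define MODEL_GFP_IO     0x40u
#define MODEL_GFP_FS     0x80u
#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* What makes an allocation atomic is that the WAIT bit is absent. */
static_assert((MODEL_GFP_ATOMIC & MODEL_GFP_WAIT) == 0,
              "atomic mask must not contain the WAIT bit");

/* Flawed test: checks for bits that happen to be in the macro. */
static bool model_is_atomic_wrong(unsigned gfp)
{
    return (gfp & MODEL_GFP_ATOMIC) != 0;
}

/* Test described above: atomic means the caller may not sleep. */
static bool model_is_atomic(unsigned gfp)
{
    return (gfp & MODEL_GFP_WAIT) == 0;
}
```

      In this model, a sleeping allocation that also sets the HIGH bit is
      misread as atomic by the flawed test, and a no-wait request with no
      HIGH bit is misread as non-atomic; the WAIT-bit test classifies both
      correctly.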
      
      CC: stable@vger.kernel.org
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
    • xen: properly account for _PAGE_NUMA during xen pte translations · a9c8e4be
      Committed by Mel Gorman
      Steven Noonan forwarded a user's report of a problem starting
      vsftpd on a Xen paravirtualized guest, with this in dmesg:
      
        BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
        page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
        page flags: 0x2ffc0000000014(referenced|dirty)
        addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
        CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
        Call Trace:
          dump_stack+0x45/0x56
          print_bad_pte+0x22e/0x250
          unmap_single_vma+0x583/0x890
          unmap_vmas+0x65/0x90
          exit_mmap+0xc5/0x170
          mmput+0x65/0x100
          do_exit+0x393/0x9e0
          do_group_exit+0xcc/0x140
          SyS_exit_group+0x14/0x20
          system_call_fastpath+0x1a/0x1f
        Disabling lock debugging due to kernel taint
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
      
      The issue could not be reproduced under an HVM instance with the same
      kernel, so it appears to be exclusive to paravirtual Xen guests.  He
      bisected the problem to commit 1667918b ("mm: numa: clear numa
      hinting information on mprotect") that was also included in 3.12-stable.
      
      The problem was related to how xen translates ptes because it was not
      accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
      a pteval_present helper for use by xen so both bare metal and xen use
      the same code when checking if a PTE is present.
      
      [mgorman@suse.de: wrote changelog, proposed minor modifications]
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: Steven Noonan <steven@uplinklabs.net>
      Tested-by: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 09 Feb, 2014 3 commits
  20. 07 Feb, 2014 3 commits
  21. 06 Feb, 2014 2 commits
    • x86/efi: Allow mapping BGRT on x86-32 · 081cd62a
      Committed by Matt Fleming
      CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
      map (see commit 70087011 ("x86, efi: Don't map Boot Services on
      i386")), and so efi_lookup_mapped_addr() will fail to return a valid
      address. Executing the ioremap() path in efi_bgrt_init() causes the
      following warning on x86-32 because we're trying to ioremap() RAM,
      
       WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
       Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
        00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
        00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
        00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
       Call Trace:
        [<c09a5196>] dump_stack+0x41/0x52
        [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c0448ce2>] warn_slowpath_null+0x22/0x30
        [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
        [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
        [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
        [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
        [<c043bc2b>] ioremap_nocache+0x1b/0x20
        [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
        [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
        [<c0cafd82>] efi_late_init+0x8/0xa
        [<c0c9bab2>] start_kernel+0x3ae/0x3c3
        [<c0c9b53b>] ? repair_env_string+0x51/0x51
        [<c0c9b378>] i386_start_kernel+0x12e/0x131
      
      Switch to using early_memremap(), which won't trigger this warning, and
      has the added benefit of more accurately conveying what we're trying to
      do - map a chunk of memory.
      
      This patch addresses the following bug report,
      
        https://bugzilla.kernel.org/show_bug.cgi?id=67911

      Reported-by: Adam Williamson <awilliam@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs · f8f20234
      Committed by Ingo Molnar
      It can take some time to validate the image, make sure
      {allyes|allmod}config doesn't enable it.
      
      I'd say randconfig will cover it often enough, and the failure is also
      borderline build coverage related: you cannot really make the decoder
      test fail via source level changes, only with changes in the build
      environment, so I agree with Andi that we can disable this one too.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Suggested-and-acked-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 04 Feb, 2014 1 commit