1. 08 3月, 2014 1 次提交
  2. 07 3月, 2014 1 次提交
  3. 05 3月, 2014 2 次提交
  4. 28 2月, 2014 2 次提交
    • P
      kvm, vmx: Really fix lazy FPU on nested guest · 1b385cbd
      Paolo Bonzini 提交于
      Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
      highlighted a real problem, but the fix was subtly wrong.
      
      nested_read_cr0 is the CR0 as read by L2, but here we want to look at
      the CR0 value reflecting L1's setup.  In other words, L2 might think
      that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
      running it with TS=1, we should inject the fault into L1.
      
      The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
      it.
      
      Fixes: e504c909Reported-by: NKashyap Chamarty <kchamart@redhat.com>
      Reported-by: NStefan Bader <stefan.bader@canonical.com>
      Tested-by: NKashyap Chamarty <kchamart@redhat.com>
      Tested-by: NAnthoine Bourgeois <bourgeois@bertin.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1b385cbd
    • A
      kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b
      Andrew Honig 提交于
      The problem occurs when the guest performs a pusha with the stack
      address pointing to an mmio address (or an invalid guest physical
      address) to start with, but then extending into an ordinary guest
      physical address.  When doing repeated emulated pushes
      emulator_read_write sets mmio_needed to 1 on the first one.  On a
      later push when the stack points to regular memory,
      mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.
      
      As a result, KVM exits to userspace, and then returns to
      complete_emulated_mmio.  In complete_emulated_mmio
      vcpu->mmio_cur_fragment is incremented.  The termination condition of
      vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
      The code bounces back and fourth to userspace incrementing
      mmio_cur_fragment past it's buffer.  If the guest does nothing else it
      eventually leads to a a crash on a memcpy from invalid memory address.
      
      However if a guest code can cause the vm to be destroyed in another
      vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
      can be used by the guest to control the data that's pointed to by the
      call to cancel_work_item, which can be used to gain execution.
      
      Fixes: f78146b0Signed-off-by: NAndrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org (3.5+)
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a08d3b3b
  5. 27 2月, 2014 2 次提交
    • P
      perf/x86: Fix event scheduling · 26e61e89
      Peter Zijlstra 提交于
      Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
      with perf WARN_ON()s triggering. He also provided traces of the failures.
      
      This is I think the relevant bit:
      
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
      	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
      
      So we add the insn:p event (fd[23]).
      
      At this point we should have:
      
        n_events = 2, n_added = 1, n_txn = 1
      
      	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
      	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
      
      We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
      These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
      that's not visible.
      
      	group_sched_in()
      	  pmu->start_txn() /* nop - BP pmu */
      	  event_sched_in()
      	     event->pmu->add()
      
      So here we should end up with:
      
        0: n_events = 3, n_added = 2, n_txn = 2
        4: n_events = 4, n_added = 3, n_txn = 3
      
      But seeing the below state on x86_pmu_enable(), the must have failed,
      because the 0 and 4 events aren't there anymore.
      
      Looking at group_sched_in(), since the BP is the leader, its
      event_sched_in() must have succeeded, for otherwise we would not have
      seen the sibling adds.
      
      But since neither 0 or 4 are in the below state; their event_sched_in()
      must have failed; but I don't see why, the complete state: 0,0,1:p,4
      fits perfectly fine on a core2.
      
      However, since we try and schedule 4 it means the 0 event must have
      succeeded!  Therefore the 4 event must have failed, its failure will
      have put group_sched_in() into the fail path, which will call:
      
      	event_sched_out()
      	  event->pmu->del()
      
      on 0 and the BP event.
      
      Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
      giving what we see below:
      
       n_event = 2, n_added = 2, n_txn = 2
      
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
      
      So the problem is that x86_pmu_del(), when called from a
      group_sched_in() that fails (for whatever reason), and without x86_pmu
      TXN support (because the leader is !x86_pmu), will corrupt the n_added
      state.
      Reported-and-Tested-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      26e61e89
    • M
      KVM: MMU: drop read-only large sptes when creating lower level sptes · 404381c5
      Marcelo Tosatti 提交于
      Read-only large sptes can be created due to read-only faults as
      follows:
      
      - QEMU pagetable entry that maps guest memory is read-only
      due to COW.
      - Guest read faults such memory, COW is not broken, because
      it is a read-only fault.
      - Enable dirty logging, large spte not nuked because it is read-only.
      - Write-fault on such memory causes guest to loop endlessly
      (which must go down to level 1 because dirty logging is enabled).
      
      Fix by dropping large spte when necessary.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      404381c5
  6. 26 2月, 2014 2 次提交
  7. 22 2月, 2014 3 次提交
  8. 20 2月, 2014 2 次提交
    • M
      x86: tsc: Add missing Baytrail frequency to the table · 3e11e818
      Mika Westerberg 提交于
      Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means
      that the CPU reference clock runs at 83.3MHz. Add this missing frequency to
      the table.
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      3e11e818
    • T
      x86, tsc: Fallback to normal calibration if fast MSR calibration fails · 5f0e0309
      Thomas Gleixner 提交于
      If we cannot calibrate TSC via MSR based calibration
      try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
      to the caller. This value gets then propagated further to clockevents
      code resulting division by zero oops like the one below:
      
       divide error: 0000 [#1] PREEMPT SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
       task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
       RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
       RSP: 0000:ffff880075507e58  EFLAGS: 00010246
       RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
       RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
       R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
       R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
       Stack:
        ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
        ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
        ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
       Call Trace:
        [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
        [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
        [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
        [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
        [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
        [<ffffffff8177c91e>] kernel_init+0xe/0x120
        [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
      
      Prevent this from happening by:
       1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
          if it fails.
       2) Check this return value in native_calibrate_tsc() and in case of zero
          fallback to use normal non-MSR based calibration.
      
      [mw: Added subject and changelog]
      Reported-and-tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.comSigned-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5f0e0309
  9. 14 2月, 2014 3 次提交
  10. 13 2月, 2014 1 次提交
  11. 12 2月, 2014 1 次提交
    • S
      ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Steven Rostedt (Red Hat) 提交于
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller is still
      done directly as though it was being done under stop machine.
      
      As it is not converted via stop machine anymore, there is a possibility
      that the code could be layed across cache lines and if another CPU is
      accessing that function graph call when it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87fbb2ac
  12. 11 2月, 2014 2 次提交
    • M
      x86: dma-mapping: fix GFP_ATOMIC macro usage · c091c71a
      Marek Szyprowski 提交于
      GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other
      flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller
      wants to perform an atomic allocation, the code must test for a lack of the
      __GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      c091c71a
    • M
      xen: properly account for _PAGE_NUMA during xen pte translations · a9c8e4be
      Mel Gorman 提交于
      Steven Noonan forwarded a users report where they had a problem starting
      vsftpd on a Xen paravirtualized guest, with this in dmesg:
      
        BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
        page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
        page flags: 0x2ffc0000000014(referenced|dirty)
        addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
        CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
        Call Trace:
          dump_stack+0x45/0x56
          print_bad_pte+0x22e/0x250
          unmap_single_vma+0x583/0x890
          unmap_vmas+0x65/0x90
          exit_mmap+0xc5/0x170
          mmput+0x65/0x100
          do_exit+0x393/0x9e0
          do_group_exit+0xcc/0x140
          SyS_exit_group+0x14/0x20
          system_call_fastpath+0x1a/0x1f
        Disabling lock debugging due to kernel taint
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
      
      The issue could not be reproduced under an HVM instance with the same
      kernel, so it appears to be exclusive to paravirtual Xen guests.  He
      bisected the problem to commit 1667918b ("mm: numa: clear numa
      hinting information on mprotect") that was also included in 3.12-stable.
      
      The problem was related to how xen translates ptes because it was not
      accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
      a pteval_present helper for use by xen so both bare metal and xen use
      the same code when checking if a PTE is present.
      
      [mgorman@suse.de: wrote changelog, proposed minor modifications]
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: NSteven Noonan <steven@uplinklabs.net>
      Tested-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NElena Ufimtseva <ufimtseva@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9c8e4be
  13. 09 2月, 2014 3 次提交
  14. 07 2月, 2014 3 次提交
  15. 06 2月, 2014 2 次提交
    • M
      x86/efi: Allow mapping BGRT on x86-32 · 081cd62a
      Matt Fleming 提交于
      CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
      map (see commit 70087011 ("x86, efi: Don't map Boot Services on
      i386")), and so efi_lookup_mapped_addr() will fail to return a valid
      address. Executing the ioremap() path in efi_bgrt_init() causes the
      following warning on x86-32 because we're trying to ioremap() RAM,
      
       WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
       Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
        00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
        00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
        00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
       Call Trace:
        [<c09a5196>] dump_stack+0x41/0x52
        [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c0448ce2>] warn_slowpath_null+0x22/0x30
        [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
        [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
        [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
        [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
        [<c043bc2b>] ioremap_nocache+0x1b/0x20
        [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
        [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
        [<c0cafd82>] efi_late_init+0x8/0xa
        [<c0c9bab2>] start_kernel+0x3ae/0x3c3
        [<c0c9b53b>] ? repair_env_string+0x51/0x51
        [<c0c9b378>] i386_start_kernel+0x12e/0x131
      
      Switch to using early_memremap(), which won't trigger this warning, and
      has the added benefit of more accurately conveying what we're trying to
      do - map a chunk of memory.
      
      This patch addresses the following bug report,
      
        https://bugzilla.kernel.org/show_bug.cgi?id=67911Reported-by: NAdam Williamson <awilliam@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      081cd62a
    • I
      x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs · f8f20234
      Ingo Molnar 提交于
      It can take some time to validate the image, make sure
      {allyes|allmod}config doesn't enable it.
      
      I'd say randconfig will cover it often enough, and the failure is also
      borderline build coverage related: you cannot really make the decoder
      test fail via source level changes, only with changes in the build
      environment, so I agree with Andi that we can disable this one too.
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: Paul Gortmaker paul.gortmaker@windriver.com>
      Suggested-and-acked-by: Andi Kleen andi@firstfloor.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8f20234
  16. 04 2月, 2014 1 次提交
  17. 03 2月, 2014 1 次提交
  18. 02 2月, 2014 1 次提交
  19. 31 1月, 2014 6 次提交
  20. 30 1月, 2014 1 次提交