1. 03 12月, 2010 1 次提交
    • J
      vmalloc: eagerly clear ptes on vunmap · 64141da5
      Jeremy Fitzhardinge 提交于
      On stock 2.6.37-rc4, running:
      
        # mount lilith:/export /mnt/lilith
        # find  /mnt/lilith/ -type f -print0 | xargs -0 file
      
      crashes the machine fairly quickly under Xen.  Often it results in oops
      messages, but the couple of times I tried just now, it just hung quietly
      and made Xen print some rude messages:
      
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
          3000000000000000) for mfn 1d7058 (pfn 18fa7)
          (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
          1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
          (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04
      
      Which means the domain tried to map a pagetable page RW, which would
      allow it to map arbitrary memory, so Xen stopped it.  This is because
      vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
      finished with them, and those pages got recycled as pagetable pages
      while still having these RW aliases.
      
      Removing those mappings immediately removes the Xen-visible aliases, and
      so it has no problem with those pages being reused as pagetable pages.
      Deferring the TLB flush doesn't upset Xen because it can flush the TLB
      itself as needed to maintain its invariants.
      
      When unmapping a region in the vmalloc space, clear the ptes
      immediately.  There's no point in deferring this because there's no
      amortization benefit.
      
      The TLBs are left dirty, and they are flushed lazily to amortize the
      cost of the IPIs.
      
      This specific motivation for this patch is an oops-causing regression
      since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
      of vm_map_ram() introduced in 56e4ebf8 ("NFS: readdir with vmapped
      pages") .  XFS also uses vm_map_ram() and could cause similar problems.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64141da5
  2. 02 12月, 2010 2 次提交
    • S
      xen: unplug the emulated devices at resume time · 512b109e
      Stefano Stabellini 提交于
      Early after being resumed we need to unplug again the emulated devices.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      512b109e
    • S
      xen: fix MSI setup and teardown for PV on HVM guests · af42b8d1
      Stefano Stabellini 提交于
      When remapping MSIs into pirqs for PV on HVM guests, qemu is responsible
      for doing the actual mapping and unmapping.
      We only give qemu the desired pirq number when we ask to do the mapping
      the first time, after that we should be reading back the pirq number
      from qemu every time we want to re-enable the MSI.
      
      This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when
      trying to enable the same MSI for the second time: the old MSI to pirq
      mapping is still valid at this point but xen_hvm_setup_msi_irqs would
      try to assign a new pirq anyway.
      A simple way to reproduce this bug is to assign an MSI capable network
      card to a PV on HVM guest, if the user brings down the corresponding
      ethernet interface and up again, Linux would fail to enable MSIs on the
      device.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      af42b8d1
  3. 30 11月, 2010 2 次提交
    • I
      xen: x86/32: perform initial startup on initial_page_table · 805e3f49
      Ian Campbell 提交于
      Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
      (which also starts on initial_page_table) switches to it.  This helps ensure
      that the generic setup paths work on Xen unmodified. In particular
      clone_pgd_range writes directly to the destination pgd entries and is used to
      initialise swapper_pg_dir so we need to ensure that it remains writeable until
      the last possible moment during bring up.
      
      This is complicated slightly by the need to avoid sharing kernel PMD entries
      when running under Xen, therefore the Xen implementation must make a copy of
      the kernel PMD (which is otherwise referred to by both intial_page_table and
      swapper_pg_dir) before switching to swapper_pg_dir.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      805e3f49
    • J
      xen: don't bother to stop other cpus on shutdown/reboot · 31e323cc
      Jeremy Fitzhardinge 提交于
      Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
      no need to do it manually.
      
      In any case it will fail because all the IPI irqs have been pulled
      down by this point, so the cross-CPU calls will simply hang forever.
      
      Until change 76fac077 the function calls
      were not synchronously waited for, so this wasn't apparent.  However after
      that change the calls became synchronous leading to a hang on shutdown
      on multi-VCPU guests.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      Cc: Alok Kataria <akataria@vmware.com>
      31e323cc
  4. 28 11月, 2010 1 次提交
  5. 26 11月, 2010 2 次提交
  6. 25 11月, 2010 3 次提交
  7. 23 11月, 2010 3 次提交
    • J
      xen: use default_idle · bc15fde7
      Jeremy Fitzhardinge 提交于
      We just need the idle loop to drop into safe_halt, which default_idle()
      is perfectly capable of doing.  There's no need to duplicate it.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      bc15fde7
    • J
      xen: clean up "extra" memory handling some more · c2d08791
      Jeremy Fitzhardinge 提交于
      Make sure that extra_pages is added for all E820_RAM regions beyond
      mem_end - completely excluded regions as well as the remains of partially
      included regions.
      
      Also makes sure the extra region is not unnecessarily high, and simplifies
      the logic to decide which regions should be added.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      c2d08791
    • K
      xen: set IO permission early (before early_cpu_init()) · ec35a69c
      Konrad Rzeszutek Wilk 提交于
      This patch is based off "xen dom0: Set up basic IO permissions for dom0."
      by Juan Quintela <quintela@redhat.com>.
      
      On AMD machines when we boot the kernel as Domain 0 we get this nasty:
      
      mapping kernel into physical memory
      Xen: setup ISA identity maps
      about to get started...
      (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1-101116  x86_64  debug=y  Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff8130271b>]
      (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
      (XEN) rax: 000000008000c068   rbx: ffffffff8186c680   rcx: 0000000000000068
      (XEN) rdx: 0000000000000cf8   rsi: 000000000000c000   rdi: 0000000000000000
      (XEN) rbp: ffffffff81801e98   rsp: ffffffff81801e50   r8:  ffffffff81801eac
      (XEN) r9:  ffffffff81801ea8   r10: ffffffff81801eb4   r11: 00000000ffffffff
      (XEN) r12: ffffffff8186c694   r13: ffffffff81801f90   r14: ffffffffffffffff
      (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000221803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801e50:
      
      RIP points to read_pci_config() function.
      
      The issue is that we don't set IO permissions for the Linux kernel early enough.
      
      The call sequence used to be:
      
          xen_start_kernel()
      	x86_init.oem.arch_setup = xen_setup_arch;
              setup_arch:
                 - early_cpu_init
                     - early_init_amd
                        - read_pci_config
                 - x86_init.oem.arch_setup [ xen_arch_setup ]
                     - set IO permissions.
      
      We need to set the IO permissions earlier on, which this patch does.
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ec35a69c
  8. 20 11月, 2010 1 次提交
  9. 18 11月, 2010 10 次提交
  10. 13 11月, 2010 1 次提交
  11. 12 11月, 2010 3 次提交
  12. 11 11月, 2010 3 次提交
    • S
      tracing: Force arch_local_irq_* notrace for paravirt · b5908548
      Steven Rostedt 提交于
      When running ktest.pl randconfig tests, I would sometimes trigger
      a lockdep annotation bug (possible reason: unannotated irqs-on).
      
      This triggering happened right after function tracer self test was
      executed. After doing a config bisect I found that this was caused with
      having function tracer, paravirt guest, prove locking, and rcu torture
      all enabled.
      
      The rcu torture just enhanced the likelyhood of triggering the bug.
      Prove locking was needed, since it was the thing that was bugging.
      Function tracer would trace and disable interrupts in all sorts
      of funny places.
      paravirt guest would turn arch_local_irq_* into functions that would
      be traced.
      
      Besides the fact that tracing arch_local_irq_* is just a bad idea,
      this is what is happening.
      
      The bug happened simply in the local_irq_restore() code:
      
      		if (raw_irqs_disabled_flags(flags)) {	\
      			raw_local_irq_restore(flags);	\
      			trace_hardirqs_off();		\
      		} else {				\
      			trace_hardirqs_on();		\
      			raw_local_irq_restore(flags);	\
      		}					\
      
      The raw_local_irq_restore() was defined as arch_local_irq_restore().
      
      Now imagine, we are about to enable interrupts. We go into the else
      case and call trace_hardirqs_on() which tells lockdep that we are enabling
      interrupts, so it sets the current->hardirqs_enabled = 1.
      
      Then we call raw_local_irq_restore() which calls arch_local_irq_restore()
      which gets traced!
      
      Now in the function tracer we disable interrupts with local_irq_save().
      This is fine, but flags is stored that we have interrupts disabled.
      
      When the function tracer calls local_irq_restore() it does it, but this
      time with flags set as disabled, so we go into the if () path.
      This keeps interrupts disabled and calls trace_hardirqs_off() which
      sets current->hardirqs_enabled = 0.
      
      When the tracer is finished and proceeds with the original code,
      we enable interrupts but leave current->hardirqs_enabled as 0. Which
      now breaks lockdeps internal processing.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b5908548
    • I
      xen: do not release any memory under 1M in domain 0 · 9ec23a7f
      Ian Campbell 提交于
      We already deliberately setup a 1-1 P2M for the region up to 1M in
      order to allow code which assumes this region is already mapped to
      work without having to convert everything to ioremap.
      
      Domain 0 should not return any apparently unused memory regions
      (reserved or otherwise) in this region to Xen since the e820 may not
      accurately reflect what the BIOS has stashed in this region.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      9ec23a7f
    • P
      perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation · 034c6efa
      Peter Zijlstra 提交于
      Jasper suggested we use the zeroing capability of the allocators
      instead of calling memset ourselves. Add node affinity while we're at
      it.
      Reported-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      034c6efa
  13. 10 11月, 2010 6 次提交
  14. 09 11月, 2010 1 次提交
  15. 06 11月, 2010 1 次提交