1. 20 2月, 2012 1 次提交
  2. 04 2月, 2012 1 次提交
    • K
      xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic. · 41bd956d
      Konrad Rzeszutek Wilk 提交于
      When a user offlines a VCPU and then onlines it, we get:
      
      NMI watchdog disabled (cpu2): hardware events not enabled
      BUG: scheduling while atomic: swapper/2/0/0x00000002
      Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod libcrc32c crc32c radeon fbco
       ttm bitblit softcursor drm_kms_helper xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs [last unloaded:
      
      Pid: 0, comm: swapper/2 Tainted: G           O 3.2.0phase15.1-00003-gd6f7f5b-dirty #4
      Call Trace:
       [<ffffffff81070571>] __schedule_bug+0x61/0x70
       [<ffffffff8158eb78>] __schedule+0x798/0x850
       [<ffffffff8158ed6a>] schedule+0x3a/0x50
       [<ffffffff810349be>] cpu_idle+0xbe/0xe0
       [<ffffffff81583599>] cpu_bringup_and_idle+0xe/0x10
      
      The reason for this should be obvious from this call-chain:
      cpu_bringup_and_idle:
       \- cpu_bringup
        |   \-[preempt_disable]
        |
        |- cpu_idle
             \- play_dead [assuming the user offlined the VCPU]
             |     \
             |     +- (xen_play_dead)
             |          \- HYPERVISOR_VCPU_off [so VCPU is dead, once user
             |          |                       onlines it starts from here]
             |          \- cpu_bringup [preempt_disable]
             |
             +- preempt_enable_no_reschedule()
             +- schedule()
             \- preempt_enable()
      
      So we have two preempt_disble() and one preempt_enable(). Calling
      preempt_enable() after the cpu_bringup() in the xen_play_dead
      fixes the imbalance.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      41bd956d
  3. 25 1月, 2012 1 次提交
    • D
      x86: xen: size struct xen_spinlock to always fit in arch_spinlock_t · 7a7546b3
      David Vrabel 提交于
      If NR_CPUS < 256 then arch_spinlock_t is only 16 bits wide but struct
      xen_spinlock is 32 bits.  When a spin lock is contended and
      xl->spinners is modified the two bytes immediately after the spin lock
      would be corrupted.
      
      This is a regression caused by 84eb950d
      (x86, ticketlock: Clean up types and accessors) which reduced the size
      of arch_spinlock_t.
      
      Fix this by making xl->spinners a u8 if NR_CPUS < 256.  A
      BUILD_BUG_ON() is also added to check the sizes of the two structures
      are compatible.
      
      In many cases this was not noticable as there would often be padding
      bytes after the lock (e.g., if any of CONFIG_GENERIC_LOCKBREAK,
      CONFIG_DEBUG_SPINLOCK, or CONFIG_DEBUG_LOCK_ALLOC were enabled).
      
      The bnx2 driver is affected. In struct bnx2, phy_lock and
      indirect_lock may have no padding after them.  Contention on phy_lock
      would corrupt indirect_lock making it appear locked and the driver
      would deadlock.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      CC: stable@kernel.org #only 3.2
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7a7546b3
  4. 10 1月, 2012 1 次提交
  5. 04 1月, 2012 1 次提交
  6. 16 12月, 2011 1 次提交
    • I
      xen: only limit memory map to maximum reservation for domain 0. · d3db7281
      Ian Campbell 提交于
      d312ae87 "xen: use maximum reservation to limit amount of usable RAM"
      clamped the total amount of RAM to the current maximum reservation. This is
      correct for dom0 but is not correct for guest domains. In order to boot a guest
      "pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
      future memory expansion the guest must derive max_pfn from the e820 provided by
      the toolstack and not the current maximum reservation (which can reflect only
      the current maximum, not the guest lifetime max). The existing algorithm
      already behaves this correctly if we do not artificially limit the maximum
      number of pages for the guest case.
      
      For a guest booted with maxmem=512, memory=128 this results in:
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
       [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
      -[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
      ...
       [    0.000000] NX (Execute Disable) protection: active
       [    0.000000] DMI not present or invalid.
       [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
       [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
      -[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
      +[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
       [    0.000000] initial memory mapped : 0 - 027ff000
       [    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
      -[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
      -[    0.000000]  0000000000 - 0008100000 page 4k
      -[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
      +[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
      +[    0.000000]  0000000000 - 0020800000 page 4k
      +[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
       [    0.000000] xen: setting RW the range 27e8000 - 27ff000
       [    0.000000] 0MB HIGHMEM available.
      -[    0.000000] 129MB LOWMEM available.
      -[    0.000000]   mapped low ram: 0 - 08100000
      -[    0.000000]   low ram: 0 - 08100000
      +[    0.000000] 520MB LOWMEM available.
      +[    0.000000]   mapped low ram: 0 - 20800000
      +[    0.000000]   low ram: 0 - 20800000
      
      With this change "xl mem-set <domain> 512M" will successfully increase the
      guest RAM (by reducing the balloon).
      
      There is no change for dom0.
      Reported-and-Tested-by: NGeorge Shuklin <george.shuklin@gmail.com>
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: stable@kernel.org
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d3db7281
  7. 09 12月, 2011 1 次提交
    • T
      memblock: Kill memblock_init() · fe091c20
      Tejun Heo 提交于
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
  8. 04 12月, 2011 1 次提交
    • K
      xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf
      Konrad Rzeszutek Wilk 提交于
      The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
      pm_idle and default_idle") was to have one call - disable_cpuidle()
      which would make pm_idle not be molested by other code.  It disallows
      cpuidle_idle_call to be set to pm_idle (which is excellent).
      
      But in the select_idle_routine() and idle_setup(), the pm_idle can still
      be set to either: amd_e400_idle, mwait_idle or default_idle.  This
      depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.
      
      In case of mwait_idle we can hit some instances where the hypervisor
      (Amazon EC2 specifically) sets the MWAIT and we get:
      
        Brought up 2 CPUs
        invalid opcode: 0000 [#1] SMP
      
        Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
        RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
        ...
        Call Trace:
         [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
         [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
        RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
         RSP <ffff8801d28ddf10>
      
      In the case of amd_e400_idle we don't get so spectacular crashes, but we
      do end up making an MSR which is trapped in the hypervisor, and then
      follow it up with a yield hypercall.  Meaning we end up going to
      hypervisor twice instead of just once.
      
      The previous behavior before v3.0 was that pm_idle was set to
      default_idle regardless of select_idle_routine/idle_setup.
      
      We want to do that, but only for one specific case: Xen.  This patch
      does that.
      
      Fixes RH BZ #739499 and Ubuntu #881076
      Reported-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5fd47bf
  9. 22 11月, 2011 3 次提交
    • A
      xen/granttable: Grant tables V2 implementation · 85ff6acb
      Annie Li 提交于
      Receiver-side copying of packets is based on this implementation, it gives
      better performance and better CPU accounting. It totally supports three types:
      full-page, sub-page and transitive grants.
      
      However this patch does not cover sub-page and transitive grants, it mainly
      focus on Full-page part and implements grant table V2 interfaces corresponding
      to what already exists in grant table V1, such as: grant table V2
      initialization, mapping, releasing and exported interfaces.
      
      Each guest can only supports one type of grant table type, every entry in grant
      table should be the same version. It is necessary to set V1 or V2 version before
      initializing the grant table.
      
      Grant table exported interfaces of V2 are same with those of V1, Xen is
      responsible to judge what grant table version guests are using in every grant
      operation.
      
      V2 fulfills the same role of V1, and it is totally backwards compitable with V1.
      If dom0 support grant table V2, the guests runing on it can run with either V1
      or V2.
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NAnnie Li <annie.li@oracle.com>
      [v1: Modified alloc_vm_area call (new parameters), indentation, and cleanpatch
           warnings]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      85ff6acb
    • A
      xen/granttable: Introducing grant table V2 stucture · 0f9f5a95
      Annie Li 提交于
      This patch introduces new structures of grant table V2, grant table V2 is an
      extension from V1. Grant table is shared between guest and Xen, and Xen is
      responsible to do corresponding work for grant operations, such as: figure
      out guest's grant table version, perform different actions based on
      different grant table version, etc. Although full-page structure of V2
      is different from V1, it play the same role as V1.
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NAnnie Li <annie.li@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0f9f5a95
    • M
      xen: Make XEN_MAX_DOMAIN_MEMORY have more sensible defaults · 80df4649
      Maxim Uvarov 提交于
      Which is that 128GB is not going to happen with 32-bit PV DomU.
      Lets use something more realistic. Also update the 64-bit to 500GB
      which is the max a PV guest can do.
      Signed-off-by: NMaxim Uvarov <maxim.uvarov@oracle.com>
      [v1: Updated 128GB->500GB for 64-bit]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      80df4649
  10. 17 11月, 2011 2 次提交
  11. 20 10月, 2011 3 次提交
  12. 30 9月, 2011 1 次提交
  13. 29 9月, 2011 6 次提交
    • D
      xen: release all pages within 1-1 p2m mappings · f3f436e3
      David Vrabel 提交于
      In xen_memory_setup() all reserved regions and gaps are set to an
      identity (1-1) p2m mapping.  If an available page has a PFN within one
      of these 1-1 mappings it will become inaccessible (as it MFN is lost)
      so release them before setting up the mapping.
      
      This can make an additional 256 MiB or more of RAM available
      (depending on the size of the reserved regions in the memory map) if
      the initial pages overlap with reserved regions.
      
      The 1:1 p2m mappings are also extended to cover partial pages.  This
      fixes an issue with (for example) systems with a BIOS that puts the
      DMI tables in a reserved region that begins on a non-page boundary.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f3f436e3
    • D
      xen: allow extra memory to be in multiple regions · dc91c728
      David Vrabel 提交于
      Allow the extra memory (used by the balloon driver) to be in multiple
      regions (typically two regions, one for low memory and one for high
      memory).  This allows the balloon driver to increase the number of
      available low pages (if the initial number if pages is small).
      
      As a side effect, the algorithm for building the e820 memory map is
      simpler and more obviously correct as the map supplied by the
      hypervisor is (almost) used as is (in particular, all reserved regions
      and gaps are preserved).  Only RAM regions are altered and RAM regions
      above max_pfn + extra_pages are marked as unused (the region is split
      in two if necessary).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      dc91c728
    • D
      xen: allow balloon driver to use more than one memory region · 8b5d44a5
      David Vrabel 提交于
      Allow the xen balloon driver to populate its list of extra pages from
      more than one region of memory.  This will allow platforms to provide
      (for example) a region of low memory and a region of high memory.
      
      The maximum possible number of extra regions is 128 (== E820MAX) which
      is quite large so xen_extra_mem is placed in __initdata.  This is safe
      as both xen_memory_setup() and balloon_init() are in __init.
      
      The balloon regions themselves are not altered (i.e., there is still
      only the one region).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8b5d44a5
    • D
      xen/balloon: account for pages released during memory setup · aa24411b
      David Vrabel 提交于
      In xen_memory_setup() pages that occur in gaps in the memory map are
      released back to Xen.  This reduces the domain's current page count in
      the hypervisor.  The Xen balloon driver does not correctly decrease
      its initial current_pages count to reflect this.  If 'delta' pages are
      released and the target is adjusted the resulting reservation is
      always 'delta' less than the requested target.
      
      This affects dom0 if the initial allocation of pages overlaps the PCI
      memory region but won't affect most domU guests that have been setup
      with pseudo-physical memory maps that don't have gaps.
      
      Fix this by accouting for the released pages when starting the balloon
      driver.
      
      If the domain's targets are managed by xapi, the domain may eventually
      run out of memory and die because xapi currently gets its target
      calculations wrong and whenever it is restarted it always reduces the
      target by 'delta'.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      aa24411b
    • S
      xen: XEN_PVHVM depends on PCI · b17d0b5c
      Stefano Stabellini 提交于
      Xen PV on HVM guests require PCI support because they need the
      xen-platform-pci driver in order to initialize xenbus.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b17d0b5c
    • S
      xen: modify kernel mappings corresponding to granted pages · 0930bba6
      Stefano Stabellini 提交于
      If we want to use granted pages for AIO, changing the mappings of a user
      vma and the corresponding p2m is not enough, we also need to update the
      kernel mappings accordingly.
      Currently this is only needed for pages that are created for user usages
      through /dev/xen/gntdev. As in, pages that have been in use by the
      kernel and use the P2M will not need this special mapping.
      However there are no guarantees that in the future the kernel won't
      start accessing pages through the 1:1 even for internal usage.
      
      In order to avoid the complexity of dealing with highmem, we allocated
      the pages lowmem.
      We issue a HYPERVISOR_grant_table_op right away in
      m2p_add_override and we remove the mappings using another
      HYPERVISOR_grant_table_op in m2p_remove_override.
      Considering that m2p_add_override and m2p_remove_override are called
      once per page we use multicalls and hypercall batching.
      
      Use the kmap_op pointer directly as argument to do the mapping as it is
      guaranteed to be present up until the unmapping is done.
      Before issuing any unmapping multicalls, we need to make sure that the
      mapping has already being done, because we need the kmap->handle to be
      set correctly.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v1: Removed GRANT_FRAME_BIT usage]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0930bba6
  14. 27 9月, 2011 1 次提交
  15. 24 9月, 2011 2 次提交
  16. 15 9月, 2011 1 次提交
  17. 13 9月, 2011 1 次提交
    • D
      xen/e820: if there is no dom0_mem=, don't tweak extra_pages. · e3b73c4a
      David Vrabel 提交于
      The patch "xen: use maximum reservation to limit amount of usable RAM"
      (d312ae87) breaks machines that
      do not use 'dom0_mem=' argument with:
      
      reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
      (XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      ...
      
      The reason being that the last E820 entry is created using the
      'extra_pages' (which is based on how many pages have been freed).
      The mentioned git commit sets the initial value of 'extra_pages'
      using a hypercall which returns the number of pages (if dom0_mem
      has been used) or -1 otherwise. If the later we return with
      MAX_DOMAIN_PAGES as basis for calculation:
      
          return min(max_pages, MAX_DOMAIN_PAGES);
      
      and use it:
      
           extra_limit = xen_get_max_pages();
           if (extra_limit >= max_pfn)
                   extra_pages = extra_limit - max_pfn;
           else
                   extra_pages = 0;
      
      which means we end up with extra_pages = 128GB in PFNs (33554432)
      - 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
      and then we add that value to the E820 making it:
      
        Xen: 00000000ff000000 - 0000000100000000 (reserved)
        Xen: 0000000100000000 - 000000133f2e2000 (usable)
      
      which is clearly wrong. It should look as so:
      
        Xen: 00000000ff000000 - 0000000100000000 (reserved)
        Xen: 0000000100000000 - 000000027fbda000 (usable)
      
      Naturally this problem does not present itself if dom0_mem=max:X
      is used.
      
      CC: stable@kernel.org
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      e3b73c4a
  18. 09 9月, 2011 1 次提交
  19. 02 9月, 2011 2 次提交
  20. 01 9月, 2011 1 次提交
    • D
      xen: use maximum reservation to limit amount of usable RAM · d312ae87
      David Vrabel 提交于
      Use the domain's maximum reservation to limit the amount of extra RAM
      for the memory balloon. This reduces the size of the pages tables and
      the amount of reserved low memory (which defaults to about 1/32 of the
      total RAM).
      
      On a system with 8 GiB of RAM with the domain limited to 1 GiB the
      kernel reports:
      
      Before:
      
      Memory: 627792k/4472000k available
      
      After:
      
      Memory: 549740k/11132224k available
      
      A increase of about 76 MiB (~1.5% of the unused 7 GiB).  The reserved
      low memory is also reduced from 253 MiB to 32 MiB.  The total
      additional usable RAM is 329 MiB.
      
      For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
      the number of pages for dom0') (c/s 23790)
      
      CC: stable@kernel.org
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d312ae87
  21. 25 8月, 2011 1 次提交
  22. 22 8月, 2011 2 次提交
  23. 17 8月, 2011 1 次提交
    • J
      xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Jan Beulich 提交于
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ccbcdf7c
  24. 10 8月, 2011 1 次提交
  25. 05 8月, 2011 3 次提交