1. 30 5月, 2012 1 次提交
    • K
      xen/balloon: Subtract from xen_released_pages the count that is populated. · 58b7b53a
      Konrad Rzeszutek Wilk 提交于
      We did not take into account that xen_released_pages would be
      used outside the initial E820 parsing code. As such we would
      did not subtract from xen_released_pages the count of pages
      that we had populated back (instead we just did a simple
      extra_pages = released - populated).
      
      The balloon driver uses xen_released_pages to set the initial
      current_pages count.  If this is wrong (too low) then when a new
      (higher) target is set, the balloon driver will request too many pages
      from Xen."
      
      This fixes errors such as:
      
      (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512)
      during bootup and
      free_memory            : 0
      
      where the free_memory should be 128.
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      [v1: Per David's review made the git commit better]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      58b7b53a
  2. 22 5月, 2012 1 次提交
  3. 21 5月, 2012 1 次提交
    • K
      xen/smp: unbind irqworkX when unplugging vCPUs. · 2f1bd67d
      Konrad Rzeszutek Wilk 提交于
      The git commit  1ff2b0c3
      "xen: implement IRQ_WORK_VECTOR handler" added the functionality
      to have a per-cpu "irqworkX" for the IPI APIC functionality.
      However it missed the unbind when a vCPU is unplugged resulting
      in an orphaned per-cpu interrupt line for unplugged vCPU:
      
        30:        216          0   xen-dyn-event     hvc_console
        31:        810          4   xen-dyn-event     eth0
        32:         29          0   xen-dyn-event     blkif
      - 36:          0          0  xen-percpu-ipi       irqwork2
      - 37:        287          0   xen-dyn-event     xenbus
      + 36:        287          0   xen-dyn-event     xenbus
       NMI:          0          0   Non-maskable interrupts
       LOC:          0          0   Local timer interrupts
       SPU:          0          0   Spurious interrupts
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2f1bd67d
  4. 08 5月, 2012 7 次提交
    • K
      xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep · 211063dc
      Konrad Rzeszutek Wilk 提交于
      Provide the registration callback to call in the Xen's
      ACPI sleep functionality. This means that during S3/S5
      we make a hypercall XENPF_enter_acpi_sleep with the
      proper PM1A/PM1B registers.
      
      Based of Ke Yu's <ke.yu@intel.com> initial idea.
      [ From http://xenbits.xensource.com/linux-2.6.18-xen.hg
      change c68699484a65 ]
      
      [v1: Added Copyright and license]
      [v2: Added check if PM1A/B the 16-bits MSB contain something. The spec
           only uses 16-bits but might have more in future]
      Signed-off-by: NLiang Tang <liang.tang@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      211063dc
    • L
      xen: implement IRQ_WORK_VECTOR handler · 1ff2b0c3
      Lin Ming 提交于
      Signed-off-by: NLin Ming <mlin@ss.pku.edu.cn>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1ff2b0c3
    • B
      xen: implement apic ipi interface · f447d56d
      Ben Guthro 提交于
      Map native ipi vector to xen vector.
      Implement apic ipi interface with xen_send_IPI_one.
      Tested-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NBen Guthro <ben@guthro.net>
      Signed-off-by: NLin Ming <mlin@ss.pku.edu.cn>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f447d56d
    • D
      xen/setup: update VA mapping when releasing memory during setup · 83d51ab4
      David Vrabel 提交于
      In xen_memory_setup(), if a page that is being released has a VA
      mapping this must also be updated.  Otherwise, the page will be not
      released completely -- it will still be referenced in Xen and won't be
      freed util the mapping is removed and this prevents it from being
      reallocated at a different PFN.
      
      This was already being done for the ISA memory region in
      xen_ident_map_ISA() but on many systems this was omitting a few pages
      as many systems marked a few pages below the ISA memory region as
      reserved in the e820 map.
      
      This fixes errors such as:
      
      (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
      (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      83d51ab4
    • K
      xen/setup: Combine the two hypercall functions - since they are quite similar. · 96dc08b3
      Konrad Rzeszutek Wilk 提交于
      They use the same set of arguments, so it is just the matter
      of using the proper hypercall.
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      96dc08b3
    • K
      xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM · 2e2fb754
      Konrad Rzeszutek Wilk 提交于
      When the Xen hypervisor boots a PV kernel it hands it two pieces
      of information: nr_pages and a made up E820 entry.
      
      The nr_pages value defines the range from zero to nr_pages of PFNs
      which have a valid Machine Frame Number (MFN) underneath it. The
      E820 mirrors that (with the VGA hole):
      BIOS-provided physical RAM map:
       Xen: 0000000000000000 - 00000000000a0000 (usable)
       Xen: 00000000000a0000 - 0000000000100000 (reserved)
       Xen: 0000000000100000 - 0000000080800000 (usable)
      
      The fun comes when a PV guest that is run with a machine E820 - that
      can either be the initial domain or a PCI PV guest, where the E820
      looks like the normal thing:
      
      BIOS-provided physical RAM map:
       Xen: 0000000000000000 - 000000000009e000 (usable)
       Xen: 000000000009ec00 - 0000000000100000 (reserved)
       Xen: 0000000000100000 - 0000000020000000 (usable)
       Xen: 0000000020000000 - 0000000020200000 (reserved)
       Xen: 0000000020200000 - 0000000040000000 (usable)
       Xen: 0000000040000000 - 0000000040200000 (reserved)
       Xen: 0000000040200000 - 00000000bad80000 (usable)
       Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
      ..
      With that overlaying the nr_pages directly on the E820 does not
      work as there are gaps and non-RAM regions that won't be used
      by the memory allocator. The 'xen_release_chunk' helps with that
      by punching holes in the P2M (PFN to MFN lookup tree) for those
      regions and tells us that:
      
      Freeing  20000-20200 pfn range: 512 pages freed
      Freeing  40000-40200 pfn range: 512 pages freed
      Freeing  bad80-badf4 pfn range: 116 pages freed
      Freeing  badf6-bae7f pfn range: 137 pages freed
      Freeing  bb000-100000 pfn range: 282624 pages freed
      Released 283999 pages of unused memory
      
      Those 283999 pages are subtracted from the nr_pages and are returned
      to the hypervisor. The end result is that the initial domain
      boots with 1GB less memory as the nr_pages has been subtracted by
      the amount of pages residing within the PCI hole. It can balloon up
      to that if desired using 'xl mem-set 0 8092', but the balloon driver
      is not always compiled in for the initial domain.
      
      This patch, implements the populate hypercall (XENMEM_populate_physmap)
      which increases the the domain with the same amount of pages that
      were released.
      
      The other solution (that did not work) was to transplant the MFN in
      the P2M tree - the ones that were going to be freed were put in
      the E820_RAM regions past the nr_pages. But the modifications to the
      M2P array (the other side of creating PTEs) were not carried away.
      As the hypervisor is the only one capable of modifying that and the
      only two hypercalls that would do this are: the update_va_mapping
      (which won't work, as during initial bootup only PFNs up to nr_pages
      are mapped in the guest) or via the populate hypercall.
      
      The end result is that the kernel can now boot with the
      nr_pages without having to subtract the 283999 pages.
      
      On a 8GB machine, with various dom0_mem= parameters this is what we get:
      
      no dom0_mem
      -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
      +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
      
      dom0_mem=3G
      -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
      +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
      
      dom0_mem=max:3G
      -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
      +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
      
      And the 'xm list' or 'xl list' now reflect what the dom0_mem=
      argument is.
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      [v2: Use populate hypercall]
      [v3: Remove debug printks]
      [v4: Simplify code]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2e2fb754
    • K
      xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 · ca118238
      Konrad Rzeszutek Wilk 提交于
      Otherwise we can get these meaningless:
      Freeing  bad80-badf4 pfn range: 0 pages freed
      
      We also can do this for the summary ones - no point of printing
      "Set 0 page(s) to 1-1 mapping"
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      [v1: Extended to the summary printks]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ca118238
  5. 17 4月, 2012 1 次提交
  6. 16 4月, 2012 1 次提交
    • L
      x86-32: fix up strncpy_from_user() sign error · 12e993b8
      Linus Torvalds 提交于
      The 'max' range needs to be unsigned, since the size of the user address
      space is bigger than 2GB.
      
      We know that 'count' is positive in 'long' (that is checked in the
      caller), so we will truncate 'max' down to something that fits in a
      signed long, but before we actually do that, that comparison needs to be
      done in unsigned.
      
      Bug introduced in commit 92ae03f2 ("x86: merge 32/64-bit versions of
      'strncpy_from_user()' and speed it up").  On x86-64 you can't trigger
      this, since the user address space is much smaller than 63 bits, and on
      x86-32 it works in practice, since you would seldom hit the strncpy
      limits anyway.
      
      I had actually tested the corner-cases, I had only tested them on
      x86-64.  Besides, I had only worried about the case of a pointer *close*
      to the end of the address space, rather than really far away from it ;)
      
      This also changes the "we hit the user-specified maximum" to return
      'res', for the trivial reason that gcc seems to generate better code
      that way.  'res' and 'count' are the same in that case, so it really
      doesn't matter which one we return.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12e993b8
  7. 12 4月, 2012 1 次提交
    • L
      x86: merge 32/64-bit versions of 'strncpy_from_user()' and speed it up · 92ae03f2
      Linus Torvalds 提交于
      This merges the 32- and 64-bit versions of the x86 strncpy_from_user()
      by just rewriting it in C rather than the ancient inline asm versions
      that used lodsb/stosb and had been duplicated for (trivial) differences
      between the 32-bit and 64-bit versions.
      
      While doing that, it also speeds them up by doing the accesses a word at
      a time.  Finally, the new routines also properly handle the case of
      hitting the end of the address space, which we have never done correctly
      before (fs/namei.c has a hack around it for that reason).
      
      Despite all these improvements, it actually removes more lines than it
      adds, due to the de-duplication.  Also, we no longer export (or define)
      the legacy __strncpy_from_user() function (that was defined to not do
      the user permission checks), since it's not actually used anywhere, and
      the user address space checks are built in to the new code.
      
      Other architecture maintainers have been notified that the old hack in
      fs/namei.c will be going away in the 3.5 merge window, in case they
      copied the x86 approach of being a bit cavalier about the end of the
      address space.
      
      Cc: linux-arch@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92ae03f2
  8. 10 4月, 2012 3 次提交
  9. 07 4月, 2012 9 次提交
    • K
      xen/p2m: An early bootup variant of set_phys_to_machine · 940713bb
      Konrad Rzeszutek Wilk 提交于
      During early bootup we can't use alloc_page, so to allocate
      leaf pages in the P2M we need to use extend_brk. For that
      we are utilizing the early_alloc_p2m and early_alloc_p2m_middle
      functions to do the job for us. This function follows the
      same logic as set_phys_to_machine.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      940713bb
    • K
      xen/p2m: Collapse early_alloc_p2m_middle redundant checks. · d5096850
      Konrad Rzeszutek Wilk 提交于
      At the start of the function we were checking for idx != 0
      and bailing out. And later calling extend_brk if idx != 0.
      
      That is unnecessary so remove that checks.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5096850
    • K
      xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument · cef4cca5
      Konrad Rzeszutek Wilk 提交于
      For identity cases we want to call reserve_brk only on the boundary
      conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is
      to work around identify regions (PCI spaces, gaps in E820) which are not
      aligned on 2MB regions.
      
      However for the case were we want to allocate P2M middle leafs at the
      early bootup stage, irregardless of this alignment check we need some
      means of doing that.  For that we provide the new argument.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cef4cca5
    • K
      xen/p2m: Move code around to allow for better re-usage. · 3f3aaea2
      Konrad Rzeszutek Wilk 提交于
      We are going to be using the early_alloc_p2m (and
      early_alloc_p2m_middle) code in follow up patches which
      are not related to setting identity pages.
      
      Hence lets move the code out in its own function and
      rename them as appropiate.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3f3aaea2
    • L
      Make the "word-at-a-time" helper functions more commonly usable · f68e556e
      Linus Torvalds 提交于
      I have a new optimized x86 "strncpy_from_user()" that will use these
      same helper functions for all the same reasons the name lookup code uses
      them.  This is preparation for that.
      
      This moves them into an architecture-specific header file.  It's
      architecture-specific for two reasons:
      
       - some of the functions are likely to want architecture-specific
         implementations.  Even if the current code happens to be "generic" in
         the sense that it should work on any little-endian machine, it's
         likely that the "multiply by a big constant and shift" implementation
         is less than optimal for an architecture that has a guaranteed fast
         bit count instruction, for example.
      
       - I expect that if architectures like sparc want to start playing
         around with this, we'll need to abstract out a few more details (in
         particular the actual unaligned accesses).  So we're likely to have
         more architecture-specific stuff if non-x86 architectures start using
         this.
      
         (and if it turns out that non-x86 architectures don't start using
         this, then having it in an architecture-specific header is still the
         right thing to do, of course)
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f68e556e
    • H
      x86: Use correct byte-sized register constraint in __add() · 8c91c532
      H. Peter Anvin 提交于
      Similar to:
      
       2ca052a3 x86: Use correct byte-sized register constraint in __xchg_op()
      
      ... the __add() macro also needs to use a "q" constraint in the
      byte-sized case, lest we try to generate an illegal register.
      
      Link: http://lkml.kernel.org/r/4F7A3315.501@goop.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Leigh Scott <leigh123linux@googlemail.com>
      Cc: Thomas Reitmayr <treitmayr@devbase.at>
      Cc: <stable@vger.kernel.org> v3.3
      8c91c532
    • J
      x86: Use correct byte-sized register constraint in __xchg_op() · 2ca052a3
      Jeremy Fitzhardinge 提交于
      x86-64 can access the low half of any register, but i386 can only do
      it with a subset of registers.  'r' causes compilation failures on i386,
      but 'q' expresses the constraint properly.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/4F7A3315.501@goop.orgReported-by: NLeigh Scott <leigh123linux@googlemail.com>
      Tested-by: NThomas Reitmayr <treitmayr@devbase.at>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: <stable@vger.kernel.org> v3.3
      2ca052a3
    • S
      xen/smp: Remove unnecessary call to smp_processor_id() · e8c9e788
      Srivatsa S. Bhat 提交于
      There is an extra and unnecessary call to smp_processor_id()
      in cpu_bringup(). Remove it.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      e8c9e788
    • K
      xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' · 2531d64b
      Konrad Rzeszutek Wilk 提交于
      The above mentioned patch checks the IOAPIC and if it contains
      -1, then it unmaps said IOAPIC. But under Xen we get this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      PGD 0
      Oops: 0002 [#1] SMP
      CPU 0
      Modules linked in:
      
      Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
      1525                  /0U990C
      RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
      RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
      RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
      FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
      CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
      Stack:
       00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
       ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
       0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
      Call Trace:
       [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
       [<ffffffff8100a9b2>] ? check_events+0x12+0x20
       [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
       [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
       [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
       [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
       [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
       [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
       [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
       [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
       [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
      
      The reason we are dying is b/c the call acpi_get_override_irq() is used,
      which returns the polarity and trigger for the IRQs. That function calls
      mp_find_ioapics to get the 'struct ioapic' structure - which along with the
      mp_irq[x] is used to figure out the default values and the polarity/trigger
      overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled
      with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the
      mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt.
      
      The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
      struct so that platforms can override it. But for v3.4 lets carry this
      work-around. This patch does that by providing a slightly different variant
      of the fake IOAPIC entries.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2531d64b
  10. 06 4月, 2012 5 次提交
  11. 04 4月, 2012 1 次提交
  12. 03 4月, 2012 1 次提交
  13. 02 4月, 2012 1 次提交
  14. 31 3月, 2012 1 次提交
  15. 30 3月, 2012 5 次提交
  16. 29 3月, 2012 1 次提交
    • L
      x86: Preserve lazy irq disable semantics in fixup_irqs() · 99dd5497
      Liu, Chuansheng 提交于
      The default irq_disable() sematics are to mark the interrupt disabled,
      but keep it unmasked. If the interrupt is delivered while marked
      disabled, the low level interrupt handler masks it and marks it
      pending. This is important for detecting wakeup interrupts during
      suspend and for edge type interrupts to avoid losing interrupts.
      
      fixup_irqs() moves the interrupts away from an offlined cpu. For
      certain interrupt types it needs to mask the interrupt line before
      changing the affinity. After affinity has changed the interrupt line
      is unmasked again, but only if it is not marked disabled.
      
      This breaks the lazy irq disable semantics and causes problems in
      suspend as the interrupt can be lost or wakeup functionality is
      broken.
      
      Check irqd_irq_masked() instead of irqd_irq_disabled() because
      irqd_irq_masked() is only set, when the core code actually masked the
      interrupt line. If it's not set, we unmask the interrupt and let the
      lazy irq disable logic deal with an eventually incoming interrupt.
      
      [ tglx: Massaged changelog and added a comment ]
      Signed-off-by: Nliu chuansheng <chuansheng.liu@intel.com>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A05DFB3@SHSMSX101.ccr.corp.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      99dd5497