1. 15 Mar, 2011 5 commits
  2. 14 Mar, 2011 9 commits
    • S
      xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. · 706cc9d2
      Stefano Stabellini committed
      If there is no proper PFN value in the M2P for the MFN
      (so we get 0xFFFFF.. or 0x55555, or 0x0), we should
      consult the M2P override to see if there is an entry for this.
      [Note: we also consult the M2P override if the MFN
      is past our machine_to_phys size].
      
      We consult the P2M with the PFN. In case the returned
      MFN is one of the special values: 0xFFF.., 0x5555
      (which signify that the MFN can be either "missing" or it
      belongs to DOMID_IO) or the p2m(m2p(mfn)) != mfn, we check
      the M2P override. If we fail the M2P override check, we reset
      the PFN value to INVALID_P2M_ENTRY.
      
      Next we try to find the MFN in the P2M using the MFN
      value (not the PFN value) and if found, we know
      that this MFN is an identity value and return it as such.
      
      Otherwise we have exhausted all the possibilities and we
      return the PFN, which at this stage can either be a real
      PFN value found in the machine_to_phys.. array, or
      INVALID_P2M_ENTRY value.
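
The lookup order described above can be sketched with toy tables. This is only an illustration, not the kernel's mfn_to_pfn(): the table contents, the toy_* names, the single override entry and the position of the identity bit are all assumptions made for the example.

```c
#define INVALID_P2M_ENTRY (~0UL)
/* Assumption: identity entries carry a marker bit just below the top bit. */
#define IDENTITY_FRAME_BIT (1UL << (sizeof(unsigned long) * 8 - 2))
#define IDENTITY_FRAME(m)  ((m) | IDENTITY_FRAME_BIT)
#define TOY_NR 8UL /* hypothetical size of both toy tables */

/* Toy M2P: index is an MFN, value is a PFN (or an invalid/junk value). */
static const unsigned long toy_m2p[TOY_NR] = {
    INVALID_P2M_ENTRY, INVALID_P2M_ENTRY, INVALID_P2M_ENTRY, INVALID_P2M_ENTRY,
    INVALID_P2M_ENTRY, 0, 1, 0x55555UL,
};

/* Toy P2M: index is a PFN, value is an MFN, an identity entry, or invalid. */
static const unsigned long toy_p2m[TOY_NR] = {
    5, 6, INVALID_P2M_ENTRY, IDENTITY_FRAME(3UL),
    INVALID_P2M_ENTRY, INVALID_P2M_ENTRY, INVALID_P2M_ENTRY, INVALID_P2M_ENTRY,
};

static unsigned long toy_p2m_lookup(unsigned long pfn)
{
    return pfn < TOY_NR ? toy_p2m[pfn] : INVALID_P2M_ENTRY;
}

/* Hypothetical M2P override table with one entry: MFN 4 -> PFN 2. */
static unsigned long toy_m2p_override(unsigned long mfn)
{
    return mfn == 4 ? 2 : INVALID_P2M_ENTRY;
}

static unsigned long toy_mfn_to_pfn(unsigned long mfn)
{
    unsigned long pfn = INVALID_P2M_ENTRY;

    /* Step 1: read the M2P unless the MFN is past the end of the table. */
    if (mfn < TOY_NR)
        pfn = toy_m2p[mfn];

    /* Step 2: if p2m(m2p(mfn)) != mfn, fall back to the M2P override. */
    if (toy_p2m_lookup(pfn) != mfn)
        pfn = toy_m2p_override(mfn);

    /* Step 3: look the MFN itself up in the P2M; an identity entry wins. */
    if (pfn == INVALID_P2M_ENTRY && toy_p2m_lookup(mfn) == IDENTITY_FRAME(mfn))
        pfn = mfn;

    /* Step 4: either a real PFN or INVALID_P2M_ENTRY. */
    return pfn;
}
```

With these toy tables, MFN 5 resolves through the M2P, MFN 4 through the override, MFN 3 as an identity value, and MFN 7 (whose M2P slot holds 0x55555) ends up as INVALID_P2M_ENTRY.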
      
      [v1: Added Review-by tag]
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      706cc9d2
    • K
      xen/m2p: No need to catch exceptions when we know that there is no RAM · 146c4e51
      Konrad Rzeszutek Wilk committed
      .. beyond what we think is the end of memory. However, there might
      be more System RAM, but assigned to a guest. Hence jump to the
      M2P override check and consult it.
      
      [v1: Added Review-by tag]
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      146c4e51
    • K
      xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. · fc25151d
      Konrad Rzeszutek Wilk committed
      Only enabled if XEN_DEBUG is enabled. We print a warning
      when:
      
       pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set
      	(and pfn is an identity mapped pfn)
       pfn_to_mfn(pfn) != pfn, and VM_IO flag is set.
      	(ditto, pfn is an identity mapped pfn)
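
The two warning conditions can be written as a single predicate. This is a sketch only: toy_should_warn and its boolean parameters are hypothetical names, and the real check operates on PTE values rather than on pre-computed flags.

```c
#include <stdbool.h>

/*
 * Returns true when an identity-mapped PFN and its _PAGE_IOMAP flag
 * disagree, i.e. in either of the two cases listed above.
 */
static bool toy_should_warn(unsigned long pfn, unsigned long mfn,
                            bool pfn_is_identity, bool has_iomap)
{
    if (!pfn_is_identity)
        return false;
    if (mfn == pfn && !has_iomap)   /* 1-1 mapping without the IO flag */
        return true;
    if (mfn != pfn && has_iomap)    /* IO flag set, but not a 1-1 mapping */
        return true;
    return false;
}
```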
      
      [v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS]
      [v3: Fix compiler warning]
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fc25151d
    • K
      xen/debugfs: Add 'p2m' file for printing out the P2M layout. · 2222e71b
      Konrad Rzeszutek Wilk committed
      We walk over the whole P2M tree and construct a simplified view of
      which PFN regions belong to what level and what type they are.
      
      Only enabled if CONFIG_XEN_DEBUG_FS is set.
      
      [v2: UNKN->UNKNOWN, use uninitialized_var]
      [v3: Rebased on top of mmu->p2m code split]
      [v4: Fixed the else if]
      Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2222e71b
    • K
      xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. · 68df0da7
      Konrad Rzeszutek Wilk committed
      We walk the E820 region and start at 0 (for PV guests we start
      at ISA_END_ADDRESS) and skip any E820 RAM regions. All other
      regions, as well as the gaps, we set to be identity mappings.
      
      The reason we do not set the identity mapping from 0 to
      ISA_END_ADDRESS when running as PV is that the kernel would
      try to read DMI information and fail (no permissions to read that).
      There is a lot of gnarly code to deal with that weird region so
      we won't try to do a cleanup in this patch.
      
      This code ends up calling 'set_phys_to_identity' with the start
      and end PFN of the E820 regions that are non-RAM or gaps.
      On 99% of machines that means one big region right underneath the
      4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to
      0x100000.
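
A simplified model of that walk is sketched below. It assumes a sorted, non-overlapping map; the struct and function names (toy_e820_entry, toy_identity_ranges) are made up for the example, and the kernel code handles more corner cases than this does.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_E820_RAM   1
#define TOY_E820_RSVD  2

struct toy_e820_entry { uint64_t addr, size; int type; };
struct toy_pfn_range  { uint64_t start_pfn, end_pfn; };  /* end is exclusive */

/*
 * Collect the PFN ranges (non-RAM regions plus gaps) at or above start_addr.
 * Returns the number of ranges written to out[].
 */
static size_t toy_identity_ranges(const struct toy_e820_entry *map, size_t n,
                                  uint64_t start_addr,
                                  struct toy_pfn_range *out, size_t max_out)
{
    uint64_t cursor = start_addr;  /* start of the current non-RAM run */
    uint64_t last = start_addr;    /* highest address seen so far */
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s = map[i].addr;
        uint64_t e = map[i].addr + map[i].size;

        if (e <= cursor)
            continue;              /* entirely below the area of interest */
        if (s < cursor)
            s = cursor;            /* clamp the start */

        if (map[i].type == TOY_E820_RAM) {
            /* Everything between the previous RAM and this RAM is identity. */
            if (s > cursor && count < max_out) {
                out[count].start_pfn = (cursor + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
                out[count].end_pfn = s >> TOY_PAGE_SHIFT;
                count++;
            }
            cursor = e;
        }
        if (e > last)
            last = e;
    }

    /* Trailing non-RAM run after the last RAM region. */
    if (last > cursor && count < max_out) {
        out[count].start_pfn = (cursor + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
        out[count].end_pfn = last >> TOY_PAGE_SHIFT;
        count++;
    }
    return count;
}
```

For a map with RAM up to 1GB and a reserved region ending at 4GB, starting the walk at 0x100000, this yields one range covering PFNs 0x40000 up to 0x100000: the gap and the reserved region merged into a single identity range below the 4GB mark.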
      
      [v2: Fix for E820 crossing 1MB region and clamp the start]
      [v3: Squashed in code that does this over ranges]
      [v4: Moved the comment to the correct spot]
      [v5: Use the "raw" E820 from the hypervisor]
      [v6: Added Review-by tag]
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      68df0da7
    • K
      xen/mmu: WARN_ON when racing to swap middle leaf. · c7617798
      Konrad Rzeszutek Wilk committed
      The initial bootup code uses set_phys_to_machine quite a lot, and after
      bootup it would be used by the balloon driver. The balloon driver does have
      a mutex lock so this should not be necessary - but just in case, add
      a WARN_ON if we do hit this scenario. If we do fail this, it is OK
      to continue as there is a backup mechanism (VM_IO) that can bypass
      the P2M and still set the _PAGE_IOMAP flags.
      
      [v2: Change from WARN to BUG_ON]
      [v3: Rebased on top of xen->p2m code split]
      [v4: Change from BUG_ON to WARN]
      Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c7617798
    • K
      xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. · fb38923e
      Konrad Rzeszutek Wilk committed
      If we find that the PFN is within the P2M as an identity
      PFN make sure to tack on the _PAGE_IOMAP flag.
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fb38923e
    • K
      xen/mmu: Add the notion of identity (1-1) mapping. · f4cec35b
      Konrad Rzeszutek Wilk committed
      Our P2M tree structure has three levels. On the leaf nodes
      we set the Machine Frame Number (MFN) of the PFN. What this means
      is that when one does: pfn_to_mfn(pfn), which is used when creating
      PTE entries, you get the real MFN of the hardware. When Xen sets
      up a guest it initially populates an array which has descending
      (or ascending) MFN values, like so:
      
       idx: 0,  1,       2
       [0x290F, 0x290E, 0x290D, ..]
      
      so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that
      list starts looking quite random.
      
      We graft this structure on our P2M tree structure and stick
      those MFNs in the leaves. But for all other leaf entries, or for the top
      root, or middle one, for which there is a void entry, we assume it is
      "missing". So
       pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
      
      We add the possibility of setting 1-1 mappings on certain regions, so
      that:
       pfn_to_mfn(0xc0000)=0xc0000
      
      The benefit of this is, that we can assume for non-RAM regions (think
      PCI BARs, or ACPI spaces), we can create mappings easily b/c we
      get the PFN value to match the MFN.
      
      For this to work efficiently we introduce one new page p2m_identity and
      allocate (via reserved_brk) any other pages we need to cover the sides
      (1GB or 4MB boundary violations). All entries in p2m_identity are set to
      INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
      no other fancy value).
      
      On lookup we spot that the entry points to p2m_identity and return the identity
      value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
      points to an allocated page, we just proceed as before and return the PFN.
      If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
      (pfn_to_mfn).
      
      The reason for having the IDENTITY_FRAME_BIT instead of just returning the
      PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a
      non-identity pfn. To protect ourselves against that, we elect to set (and get) the
      IDENTITY_FRAME_BIT on all identity mapped PFNs.
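
A minimal sketch of why the marker bit helps, assuming the bit sits just below the top bit of an unsigned long (the exact position and the toy_* helper names are assumptions for this example):

```c
#include <stdbool.h>

#define INVALID_P2M_ENTRY  (~0UL)
/* Assumed position of the marker bit; not taken from the text above. */
#define IDENTITY_FRAME_BIT (1UL << (sizeof(unsigned long) * 8 - 2))
#define IDENTITY_FRAME(m)  ((m) | IDENTITY_FRAME_BIT)

/* An entry is an identity entry only if it is the PFN with the bit set. */
static bool toy_is_identity(unsigned long pfn, unsigned long p2m_entry)
{
    return p2m_entry == IDENTITY_FRAME(pfn);
}

/* What a pfn_to_mfn-style accessor would hand back: the bit is masked off. */
static unsigned long toy_entry_to_mfn(unsigned long p2m_entry)
{
    if (p2m_entry == INVALID_P2M_ENTRY)
        return INVALID_P2M_ENTRY;  /* check first: ~0 has every bit set */
    return p2m_entry & ~IDENTITY_FRAME_BIT;
}
```

A RAM page that happens to have pfn == mfn stores the plain value, so toy_is_identity(0x1234, 0x1234) is false, while an entry stored as IDENTITY_FRAME(0xc0000) is recognised as identity and still unmasks to 0xc0000.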
      
      This simplistic diagram is used to explain the more subtle piece of code.
      There is also a diagram of the P2M at the end that can help.
      Imagine your E820 looking as so:
      
                         1GB                                           2GB
      /-------------------+---------\/----\         /----------\    /---+-----\
      | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
      \-------------------+---------/\----/         \----------/    \---+-----/
                                    ^- 1029MB                       ^- 2001MB
      
      [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
      
      And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
      is actually not present (would have to kick the balloon driver to put it in).
      
      When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
      Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
      of the PFN and the end PFN (263424 and 512256 respectively). The first step is
      to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
      covers 512^2 of page estate (1GB) and in case the start or end PFN is not
      aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to
      end pfn.  We reserve_brk top leaf pages if they are missing (means they point
      to p2m_mid_missing).
      
      With the E820 example above, 263424 is not 1GB aligned so we allocate a
      reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
      Each entry in the allocated page is "missing" (points to p2m_missing).
      
      Next stage is to determine if we need to do a more granular boundary check
      on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
      We check if the start pfn and end pfn violate that boundary check, and if
      so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
      granularity of setting which PFNs are missing and which ones are identity.
      In our example 263424 and 512256 both fail the check so we reserve_brk two
      pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values)
      and assign them to p2m[1][2] and p2m[1][488] respectively.
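
The indices in this example can be checked with a little arithmetic. The sketch assumes 4KB pages and 512 entries per P2M page (the 64-bit case); the toy_* names are made up for the illustration.

```c
#define TOY_PAGE_SHIFT   12     /* 4KB pages */
#define TOY_P2M_PER_PAGE 512UL  /* assumption: 4096-byte page / 8-byte entry */

struct toy_p2m_index { unsigned long top, mid, idx; };

/* Megabytes to page frame number: 1MB is 256 pages of 4KB. */
static unsigned long toy_mb_to_pfn(unsigned long mb)
{
    return mb << (20 - TOY_PAGE_SHIFT);
}

/* Split a PFN into its p2m[top][mid][idx] coordinates. */
static struct toy_p2m_index toy_p2m_split(unsigned long pfn)
{
    struct toy_p2m_index i;

    i.top = pfn / (TOY_P2M_PER_PAGE * TOY_P2M_PER_PAGE);  /* 1GB per slot */
    i.mid = (pfn / TOY_P2M_PER_PAGE) % TOY_P2M_PER_PAGE;  /* 2MB per slot */
    i.idx = pfn % TOY_P2M_PER_PAGE;                       /* one page */
    return i;
}
```

With these assumptions 1029MB is PFN 263424 (0x40500), which lands at p2m[1][2][256], and 2001MB is PFN 512256 (0x7D100), which lands at p2m[1][488][256]. Neither PFN is a multiple of 512, which is why both middle pages have to be allocated.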
      
      At this point we would at minimum reserve_brk one page, but could be up to
      three. Each call to set_phys_range_identity has at maximum a three page
      cost. If we were to query the P2M at this stage, all those entries from
      start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
      ("missing").
      
      The next step is to walk from the start pfn to the end pfn setting
      the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'.
      If we find that the middle leaf is pointing to p2m_missing we can swap it over
      to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this point we
      do not need to worry about boundary alignment (so no need to reserve_brk a middle
      page, figure out which PFNs are "missing" and which ones are identity), as that
      has been done earlier.  If we find that the middle leaf is not occupied by
      p2m_identity or p2m_missing, we dereference that page (which covers
      512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example
      263424 and 512256 end up there, and we set from p2m[1][2][256->511] and
      p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
      
      All other regions that are void (or not filled) either point to p2m_missing
      (considered missing) or have the default value of INVALID_P2M_ENTRY (also
      considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
      contain the INVALID_P2M_ENTRY value and are considered "missing."
      
      This is what the p2m ends up looking like (for the E820 above) with this
      fabulous drawing:
      
         p2m         /--------------\
       /-----\       | &mfn_list[0],|                           /-----------------\
       |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
       |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
       |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
       |-----|    \                      | [p2m_identity]+\\    | ....            |
       |  2  |--\  \-------------------->|  ...          | \\   \----------------/
       |-----|   \                       \---------------/  \\
       |  3  |\   \                                          \\  p2m_identity
       |-----| \   \-------------------->/---------------\   /-----------------\
       | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
       \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
              / /---------------\        | ....          |   \-----------------/
             /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
            /   | IDENTITY[@256]|<----/  \---------------/
           /    | ~0, ~0, ....  |
          |     \---------------/
          |
          p2m_missing             p2m_missing
      /------------------\     /------------\
      | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
      | [p2m_mid_missing]+---->| ..., ~0    |
      \------------------/     \------------/
      
      where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      [v5: Changed code to use ranges, added ASCII art]
      [v6: Rebased on top of xen->p2m code split]
      [v4: Squished patches in just this one]
      [v7: Added RESERVE_BRK for potentially allocated pages]
      [v8: Fixed alignment problem]
      [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X]
      [v10: Copied git commit description in the p2m code + Add Review tag]
      [v11: Title had '2-1' - should be '1-1' mapping]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f4cec35b
    • S
      x86: ce4100: Set pci ops via callback instead of module init · 03150171
      Sebastian Andrzej Siewior committed
      Setting the pci ops on subsys initcall unconditionally will break
      multi platform kernels on anything except ce4100.
      
      Use x86_init.pci.init ops to call this only on real ce4100 platforms.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: sodaville@linutronix.de
      LKML-Reference: <20110314093340.GA21026@www.tglx.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      03150171
  3. 12 Mar, 2011 12 commits
    • T
      x86: Enable forced interrupt threading support · c0185808
      Thomas Gleixner committed
      All non-threadable interrupts are marked. Enable forced irq threading
      support.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c0185808
    • T
      x86: Mark low level interrupts IRQF_NO_THREAD · 9bbbff25
      Thomas Gleixner committed
      These cannot be threaded.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9bbbff25
    • T
      x86: Use generic show_interrupts · 517e4981
      Thomas Gleixner committed
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      517e4981
    • T
      x86: ioapic: Avoid redundant lookup of irq_cfg · 1a0e62a4
      Thomas Gleixner committed
      The caller of ioapic_register_intr() has a pointer to the irq_cfg for
      the irq already. Hand it in to avoid a full lookup.
      
      In msi_compose_msg() the pointer to irq_cfg is already available. No
      need to look it up again.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1a0e62a4
    • T
      x86: ioapic: Use new move_irq functions · 08221110
      Thomas Gleixner committed
      Use the functions which take irq_data. We already have a pointer to
      irq_data. That avoids a sparse irq lookup in move_*_irq.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      08221110
    • T
      51c43ac6
    • T
      x86: ioapic: Use irq_data->state · 5451ddc5
      Thomas Gleixner committed
      Use the state information in irq_data. That avoids a radix-tree lookup
      from apic_ack_level() and simplifies setup_ioapic_dest().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5451ddc5
    • T
      x86: ioapic: Simplify irq chip and handler setup · c60eaf25
      Thomas Gleixner committed
      Use pointers instead of ugly multiline if/else constructs.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c60eaf25
    • T
      x86: Cleanup the genirq name space · 2c778651
      Thomas Gleixner committed
      genirq is switching to a consistent name space for the irq related
      functions. Convert x86. Conversion was done with coccinelle.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2c778651
    • T
      x86-64, NUMA: Don't call numa_set_distance() for all possible node combinations during emulation · 56396e68
      Tejun Heo committed
      The distance transforming in numa_emulation() used to call
      numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node
      combinations regardless of which are enabled.  As numa_set_distance()
      ignores all out-of-bound distance settings, this doesn't cause any
      problem other than looping unnecessarily many times during boot.
      
      However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the
      code such that it iterates through only the enabled combinations.
      
      Yinghai Lu identified the issue and provided an initial patch to
      address the issue; however, the patch was incorrect in that it didn't
      build emulated distance table when there's no physical distance table
      and unnecessarily complex.
      
        http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      56396e68
    • A
      x86, binutils, xen: Fix another wrong size directive · 371c394a
      Alexander van Heukelum committed
      The latest binutils (2.21.0.20110302/Ubuntu) breaks the build
      yet another time, under CONFIG_XEN=y due to a .size directive that
      refers to a slightly differently named (hence, to the now very
      strict and unforgiving assembler, non-existent) symbol.
      
      [ mingo:
      
         This unnecessary build breakage caused by new binutils
         version 2.21 gets escalated back several kernel releases spanning
         several years of Linux history, affecting over 130,000 upstream
         kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially
         affecting all major Linux distro kernel configs).
      
         Git annotate tells us that this slight debug symbol code mismatch
         bug has been introduced in 2008 in commit 3d75e1b8:
      
           3d75e1b8        (Jeremy Fitzhardinge    2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
      
         The 'bug' is just a slight asymmetry in ENTRY()/END()
         debug-symbols sequences, with lots of assembly code between the
         ENTRY() and the END():
      
           ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
             ...
           END(do_hypervisor_callback)
      
         Human reviewers almost never catch such small mismatches, and binutils
         never even warned about it either.
      
         This new binutils version thus breaks the Xen build on all upstream kernels
         since v2.6.27, out of the blue.
      
         This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels
         impossible on such binutils, for a bisection window of over a hundred
         thousand historic commits. (!)
      
         This is a major fail on the side of binutils and binutils needs to turn
         this show-stopper build failure into a warning ASAP. ]
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      371c394a
    • K
      xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. · 86b32122
      Konrad Rzeszutek Wilk committed
      If we have a guest that asked for:
      
      memory=1024
      maxmem=2048
      
      Which means we want 1GB now, and create pagetables so that we can expand
      up to 2GB, we would have this E820 layout:
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
      [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      [    0.000000]  Xen: 0000000000100000 - 0000000080800000 (usable)
      
      Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps."
      we would mark the memory past the 1GB mark as unusable resulting in:
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
      [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      [    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
      [    0.000000]  Xen: 0000000040000000 - 0000000080800000 (unusable)
      
      which meant that we could not balloon up anymore. We could still
      balloon the guest down. The fix is to run the code introduced
      by the above mentioned patch only for the initial domain.
      
      We will have to revisit this once we start introducing a modified
      E820 for PCI passthrough so that we can utilize the P2M identity code.
      
      We also fix an overflow caused by using UL instead of ULL on 32-bit machines.
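
The kind of overflow meant here can be illustrated generically. This is not the code from the patch: uint32_t stands in for unsigned long on a 32-bit machine, uint64_t for unsigned long long, and the toy_* helpers are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* 32-bit arithmetic: the byte address wraps once it reaches 4GB. */
static uint32_t toy_pfn_to_phys_ul(uint32_t pfn)
{
    return pfn << TOY_PAGE_SHIFT;
}

/* Widen before shifting so addresses at or above 4GB survive. */
static uint64_t toy_pfn_to_phys_ull(uint32_t pfn)
{
    return (uint64_t)pfn << TOY_PAGE_SHIFT;
}
```

PFN 0x100000 is the 4GB boundary: the 32-bit version wraps to 0, while the 64-bit version gives 0x100000000.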
      
      [v2: Ian pointed to the overflow issue]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      86b32122
  4. 11 Mar, 2011 12 commits
  5. 10 Mar, 2011 2 commits
    • S
      ftrace/graph: Trace function entry before updating index · 722b3c74
      Steven Rostedt committed
      Currently the index to the ret_stack is updated and the real return address
      is saved in the ret_stack. Then we call the trace function. The trace
      function could decide that it doesn't want to trace this function
      (ex. set_graph_function does not match) and it will return 0 which means
      not to trace this call.
      
      The normal function graph tracer has this code:
      
      	if (!(trace->depth || ftrace_graph_addr(trace->func)) ||
      	      ftrace_graph_ignore_irqs())
      		return 0;
      
      What this states is, if the trace depth (which is curr_ret_stack)
      is zero (top of nested functions) then test if we want to trace this
      function. If this function is not to be traced, then return  0 and
      the rest of the function graph tracer logic will not trace this function.
      
      The problem arises when an interrupt comes in after we updated the
      curr_ret_stack. The next function that gets called will have a trace->depth
      of 1. Which fools this trace code into thinking that we are in a nested
      function, and that we should trace. This causes interrupts to be traced
      when they should not be.
      
      The solution is to trace the function first and then update the ret_stack.
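
The race and the reordering can be modelled in a few lines. This is a toy model, not the tracer code: curr_ret_stack is a local variable here, the interrupt is simulated as a plain call at the point where it would arrive, and neither function matches the filter.

```c
/* Mirrors the quoted check: trace only when nested or when the filter matches. */
static int toy_graph_entry(int depth, int matches_filter)
{
    return depth || matches_filter;
}

/* Old order: update the index first, then ask the tracer.
 * Returns 1 if the interrupt's function ended up being traced. */
static int toy_old_order_traces_irq(void)
{
    int curr_ret_stack = -1;  /* empty return stack */
    int irq_traced;

    curr_ret_stack++;         /* index updated for the outer function */

    /* Interrupt arrives here; its first function sees depth 1. */
    irq_traced = toy_graph_entry(curr_ret_stack + 1, 0);

    /* The outer function fails the filter, so its slot is released. */
    if (!toy_graph_entry(curr_ret_stack, 0))
        curr_ret_stack--;
    return irq_traced;
}

/* New order: ask the tracer first, update the index only when tracing. */
static int toy_new_order_traces_irq(void)
{
    int curr_ret_stack = -1;
    int irq_traced;

    /* Interrupt arrives before any index update; depth is still 0. */
    irq_traced = toy_graph_entry(curr_ret_stack + 1, 0);

    /* The outer function is checked first and pushed only when traced. */
    if (toy_graph_entry(curr_ret_stack + 1, 0))
        curr_ret_stack++;
    return irq_traced;
}
```

In the old ordering the interrupt's function sees a depth of 1 and is traced; in the new ordering the index is never bumped for a function that is not traced, so the interrupt sees depth 0 and is filtered out.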
      Reported-by: zhiping zhong <xzhong86@163.com>
      Reported-by: wu zhangjin <wuzhangjin@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      722b3c74
    • D
      tracing: Fix event alignment: kvm:kvm_hv_hypercall · d5bf2ff0
      David Sharp committed
      Acked-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-8-git-send-email-dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d5bf2ff0