1. 09 Oct 2015, 9 commits
  2. 07 Oct 2015, 1 commit
  3. 06 Oct 2015, 8 commits
    • vfio: Allow hotplug of containers onto existing guest IOMMU mappings · 508ce5eb
      Committed by David Gibson
      At present the memory listener used by vfio to keep host IOMMU mappings
      in sync with the guest memory image assumes that if a guest IOMMU
      appears, then it has no existing mappings.
      
      This may not be true if a VFIO device is hotplugged onto a guest bus
      which didn't previously include a VFIO device, and which has existing
      guest IOMMU mappings.
      
      Fix this by using memory_region_register_iommu_notifier_replay() to
      replay the existing guest IOMMU mappings, bringing the host IOMMU into
      sync with the guest IOMMU.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Record host IOMMU's available IO page sizes · 7a140a57
      Committed by David Gibson
      Depending on the host IOMMU type, we determine and record the available
      page sizes for IOMMU translation.  We'll need this for other validation
      in future patches.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Check guest IOVA ranges against host IOMMU capabilities · 3898aad3
      Committed by David Gibson
      The current vfio core code assumes that the host IOMMU is capable of
      mapping any IOVA the guest wants to use to wherever we need it mapped.
      However, real IOMMUs generally only support translating a certain range
      of IOVAs (the "DMA window"), not a full 64-bit address space.
      
      The common x86 IOMMUs support a wide enough range that guests are very
      unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
      machines, in the default configuration, supports only a much more limited
      IOVA range, usually 0..2GiB.
      
      If the guest attempts to set up an IOVA range that the host IOMMU can't
      map, qemu won't report an error until it actually attempts to map a bad
      IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no
      guest-visible IOMMU), this will show up very quickly.  If there is a
      guest-visible IOMMU, however, the problem might not show up until much
      later, when the guest actually attempts to DMA with an IOVA the host
      can't handle.
      
      This patch adds a test so that we will detect earlier if the guest is
      attempting to use IOVA ranges that the host IOMMU won't be able to deal
      with.
      
      For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this is
      incorrect, but no worse than what we have already.  We can't do better for
      now because the Type1 kernel interface doesn't tell us what IOVA range the
      IOMMU actually supports.
      
      For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
      IOVA range and validate guest IOVA ranges against it, and this patch does
      so.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
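      The validation described above amounts to a range check of each guest
      IOVA region against the host's DMA window. A minimal sketch, with
      hypothetical structure and function names (not QEMU's actual API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DMA window: the contiguous IOVA range the host IOMMU
 * can translate (e.g. 0..2GiB on a default-configured IBM Power host). */
typedef struct {
    uint64_t min_iova;
    uint64_t max_iova;   /* inclusive */
} DMAWindow;

/* Return true iff the guest range [iova, iova + size) fits entirely
 * inside the host IOMMU's DMA window. */
static bool iova_range_ok(const DMAWindow *win, uint64_t iova, uint64_t size)
{
    if (size == 0 || iova + size - 1 < iova) {
        return false;   /* reject empty or wrapping ranges */
    }
    return iova >= win->min_iova && iova + size - 1 <= win->max_iova;
}
```

      Rejecting the region at listener time, rather than waiting for a DMA map
      to fail, is what lets the error surface as soon as the guest programs a
      bad IOVA range.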
    • vfio: Generalize vfio_listener_region_add failure path · ac6dc389
      Committed by David Gibson
      If a DMA mapping operation fails in vfio_listener_region_add(), it
      checks to see if we've already completed initial setup of the
      container.  If so, it reports an error so the setup code can fail
      gracefully; otherwise it throws a hw_error().
      
      There are other potential failure cases in vfio_listener_region_add()
      which could benefit from the same logic, so move it to its own
      fail: block.  Later patches can use this to extend other failure cases
      to fail as gracefully as possible under the circumstances.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
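      The consolidated failure path is the classic C goto-fail idiom. A
      minimal sketch of the shape described above, with hypothetical names and
      a flag standing in for the real mapping call:

```c
#include <stdio.h>

/* Every failure inside the region-add path jumps to one fail: block,
 * which distinguishes a recoverable error during initial container
 * setup from a fatal error once the container is live. */
static int region_add(int container_initialized, int map_ok,
                      char *err, size_t errlen)
{
    if (!map_ok) {
        goto fail;           /* later patches can add more failure cases */
    }
    return 0;

fail:
    if (!container_initialized) {
        /* Initial setup: report the error so setup can fail gracefully. */
        snprintf(err, errlen, "DMA mapping failed during container setup");
        return -1;
    }
    /* Container already live: in QEMU this path would call hw_error(). */
    snprintf(err, errlen, "DMA mapping failed on a live container");
    return -2;
}
```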
    • vfio: Remove unneeded union from VFIOContainer · ee0bf0e5
      Committed by David Gibson
      Currently the VFIOContainer iommu_data field contains a union with
      different information for different host IOMMU types.  However:
         * It only actually contains information for the x86-like "Type1" IOMMU
         * Because we have a common listener, the Type1 fields are actually used
      for all IOMMU types, including the sPAPR TCE type
      
      In fact we now have a general structure for the listener which is unlikely
      ever to need per-IOMMU-type information, so this patch removes the union.
      
      Similarly, we can unify the setup of the vfio memory listener in
      vfio_connect_container(), which is currently split across a switch on
      IOMMU type but is effectively the same in both cases.
      
      The iommu_data.release pointer was only needed as a cleanup function
      which would handle potentially different data in the union.  With the
      union gone, it too can be removed.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • hw/vfio/platform: do not set resamplefd for edge-sensitive IRQs · a5b39cd3
      Committed by Eric Auger
      In irqfd mode, the current code attempts to set a resamplefd regardless
      of the IRQ type. For an edge-sensitive IRQ this attempt fails, and as a
      consequence the whole irqfd setup fails and we fall back to the slow
      mode. This patch bypasses the resamplefd setting for non-level-sensitive
      IRQs.
      Signed-off-by: Eric Auger <eric.auger@linaro.org>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • hw/vfio/platform: change interrupt/unmask fields into pointer · a22313de
      Committed by Eric Auger
      The unmask EventNotifier might not be initialized in the case of an
      edge-sensitive IRQ. Using EventNotifier pointers makes the
      edge-sensitive irqfd setup simpler to handle.
      Signed-off-by: Eric Auger <eric.auger@linaro.org>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • hw/vfio/platform: irqfd setup sequence update · 58892b44
      Committed by Eric Auger
      With the current implementation, eventfd VFIO signaling is first set up
      and then irqfd is set up, if supported and allowed.
      
      This start sequence causes several issues with IRQ forwarding setup
      which, if supported, is transparently attempted on irqfd setup:
      IRQ forwarding setup is likely to fail if the IRQ is detected as under
      injection into the guest (active at irqchip level or VFIO masked).
      
      This currently always happens because the current sequence explicitly
      VFIO-masks the IRQ before setting up the irqfd.
      
      Even if that masking were removed, we couldn't prevent the case where
      the IRQ is under injection into the guest.
      
      So the simpler solution is to remove this 2-step startup and directly
      attempt irqfd setup. This is what this patch does.
      
      Also, in case the eventfd setup fails, there is no reason to go further:
      let's abort.
      Signed-off-by: Eric Auger <eric.auger@linaro.org>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  4. 03 Oct 2015, 6 commits
  5. 02 Oct 2015, 4 commits
  6. 01 Oct 2015, 5 commits
  7. 25 Sep 2015, 7 commits
    • bt: remove muldiv64() · fdfea124
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds.
      
      As get_ticks_per_sec() is 10^9,
      
          a = muldiv64(b, get_ticks_per_sec(), 100);
          y = muldiv64(x, get_ticks_per_sec(), 1000000);
      
      can be converted to
      
          a = b * 10000000;
          y = x * 1000;
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
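      The substitution relies only on get_ticks_per_sec() being 10^9. A small
      check (using a simplified, overflow-unsafe muldiv64 just for
      demonstration) confirms the two forms agree:

```c
#include <stdint.h>

#define TICKS_PER_SEC 1000000000ULL   /* get_ticks_per_sec() == 10^9 */

/* Simplified stand-in for QEMU's muldiv64(): computes a * b / c.
 * The real helper does the multiply in 128 bits to avoid overflow. */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return a * b / c;
}
```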
    • hpet: remove muldiv64() · 0a4f9240
      Committed by Laurent Vivier
      hpet defines a clock period in femtoseconds but
      then converts it to nanoseconds to use the internal
      timers.
      
      We can define the period in nanoseconds and use it
      directly; this allows us to remove muldiv64().
      
      We only need to convert the period to femtoseconds
      to put it in the internal hpet capability register.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
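      Keeping the period in nanoseconds leaves a single multiply for the
      capability register. A minimal sketch with illustrative constants (not
      QEMU's actual HPET values):

```c
#include <stdint.h>

#define FS_PER_NS       1000000ULL   /* 10^6 femtoseconds per nanosecond */
#define HPET_PERIOD_NS  100ULL       /* illustrative period: a 10 MHz clock */

/* Convert the internal nanosecond period to femtoseconds only when
 * exposing it in the HPET capability register. */
static uint64_t period_fs(uint64_t period_ns)
{
    return period_ns * FS_PER_NS;
}
```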
    • openrisc: remove muldiv64() · ccaf1749
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds by
      doing something like:
      
          y = muldiv64(x, get_ticks_per_sec(), TIMER_FREQ)
      
      where x is the number of device ticks and y the number of system ticks.
      
      y is used as nanoseconds in timer functions;
      it works because 1 tick is 1 nanosecond
      (get_ticks_per_sec() is 10^9).
      
      But as the openrisc timer frequency is 20 MHz, we can also do:
      
          y = x * 50; /* 20 MHz period is 50 ns */
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
    • mips: remove muldiv64() · 683dca6b
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds by
      doing something like:
      
          y = muldiv64(x, get_ticks_per_sec(), TIMER_FREQ)
      
      where x is the number of device ticks and y the number of system ticks.
      
      y is used as nanoseconds in timer functions;
      it works because 1 tick is 1 nanosecond
      (get_ticks_per_sec() is 10^9).
      
      But as the MIPS timer frequency is 100 MHz, we can also do:
      
          y = x * 10; /* 100 MHz period is 10 ns */
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
    • pcnet: remove muldiv64() · c6acbe86
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds by
      doing something like:
      
          y = muldiv64(x, get_ticks_per_sec(), PCI_FREQUENCY)
      
      where x is the number of device ticks and y the number of system ticks.
      
      y is used as nanoseconds in timer functions;
      it works because 1 tick is 1 nanosecond
      (get_ticks_per_sec() is 10^9).
      
      But as the PCI frequency is 33 MHz, we can also do:
      
          y = x * 30; /* 33 MHz PCI period is 30 ns */
      
      which is much simpler.
      
      This implies a 33.333333 MHz PCI frequency,
      which is in fact correct.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • rtl8139: remove muldiv64() · 37b9ab92
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds by
      doing something like:
      
          y = muldiv64(x, get_ticks_per_sec(), PCI_FREQUENCY)
      
      where x is the number of device ticks and y the number of system ticks.
      
      y is used as nanoseconds in timer functions;
      it works because 1 tick is 1 nanosecond
      (get_ticks_per_sec() is 10^9).
      
      But as the PCI frequency is 33 MHz, we can also do:
      
          y = x * 30; /* 33 MHz PCI period is 30 ns */
      
      which is much simpler.
      
      This implies a 33.333333 MHz PCI frequency,
      which is in fact correct.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • i6300esb: remove muldiv64() · 9491e9bc
      Committed by Laurent Vivier
      Originally, timers were tick-based, and it made sense to
      add ticks to the current time to know when to trigger an alarm.
      
      But since commit:
      
      74475455 change all other clock references to use nanosecond resolution accessors
      
      all timers use nanoseconds, and we need to convert ticks to nanoseconds by
      doing something like:
      
          y = muldiv64(x, get_ticks_per_sec(), PCI_FREQUENCY)
      
      where x is the number of device ticks and y the number of system ticks.
      
      y is used as nanoseconds in timer functions;
      it works because 1 tick is 1 nanosecond
      (get_ticks_per_sec() is 10^9).
      
      But as the PCI frequency is 33 MHz, we can also do:
      
          y = x * 30; /* 33 MHz PCI period is 30 ns */
      
      which is much simpler.
      
      This implies a 33.333333 MHz PCI frequency,
      which is in fact correct.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>