Commits · 06409bd91b0a40f30d2e41159627a6eb8f5ac322 · openeuler / qemu

14 Jun, 2016 · 14 commits

virtio-ccw: Provide traces for indicator changes · 06409bd9

Committed by Christian Borntraeger on Jun 02, 2016

This allows tracing changes in the summary and queue indicators
for the non-irqfd case. For irqfd, kernel traces are needed instead.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

s390x/css: introduce property type for device ids · 06e686ea

Committed by Cornelia Huck on Apr 01, 2016

Let's introduce a CssDevId to handle device ids of the xx.x.xxxx
type used for channel devices. This has some benefits:

- We can use them in virtio-ccw and split the validity checks for
  a channel device id in general from the constraint checking
  within the virtio-ccw scope.
- We can reuse the device id type for future non-virtio channel
  devices.

While we're at it, improve the validity checks and disallow e.g.
trailing characters.
Suggested-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
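
The validity check described in this commit message can be sketched as follows. This is a minimal illustration that assumes a strict fixed-width `xx.x.xxxx` hexadecimal form; `DevIdSketch`, `hex_val` and `parse_dev_id` are hypothetical names, not the identifiers introduced by the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a parsed channel device id (cssid.ssid.devno). */
typedef struct {
    uint8_t cssid;
    uint8_t ssid;
    uint16_t devno;
} DevIdSketch;

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Accept exactly "xx.x.xxxx" (assumed fixed widths) and reject anything
 * else, including trailing characters.
 */
bool parse_dev_id(const char *s, DevIdSketch *out)
{
    static const char pattern[] = "xx.x.xxxx";
    uint32_t fields[3] = { 0, 0, 0 };
    int field = 0;

    if (s == NULL || out == NULL || strlen(s) != strlen(pattern)) {
        return false;
    }
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == '.') {
            if (s[i] != '.') return false;
            field++;
        } else {
            int v = hex_val(s[i]);
            if (v < 0) return false;
            fields[field] = fields[field] * 16 + (uint32_t)v;
        }
    }
    out->cssid = (uint8_t)fields[0];
    out->ssid = (uint8_t)fields[1];
    out->devno = (uint16_t)fields[2];
    return true;
}
```

In this sketch, `parse_dev_id("fe.0.1234", &id)` succeeds, while `"fe.0.1234x"` is rejected because of the trailing character. Constraints specific to virtio-ccw would be layered on top of such a general check.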

s390x/css: clear IO irqs when generating IPI CRW · c1755b14

Committed by Halil Pasic on Jan 27, 2016

According to the Principles of Operation (more precisely the subsection
'Channel-Report Word'), a subchannel put into the installed parameters
initialized state is in the same state as after an I/O system reset (just
parameters possibly changed). This implies that any I/O interrupts for that
subchannel are no longer pending (as I/O system resets clear I/O
interrupts). Therefore, we need an interface to clear pending I/O
interrupts. Make css_generate_sch_crws clear the pending IO interrupts for
the subchannel.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

s390x/kvm: add interface for clearing IO irqs · 9eccb862

Committed by Halil Pasic on Jan 27, 2016

According to the platform specification, under certain conditions,
pending IO interruptions have to be cleared. Let's add an interface
for that.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

linux-headers: update · ff804f15

Committed by Cornelia Huck on Jun 07, 2016

Update to 4.7-rc2.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160614' into staging · a28aae04

Committed by Peter Maydell on Jun 14, 2016

ppc patch queue for 2016-06-14

Latest patch queue for ppc.
    * Allow qemu to support a generic architecture 2.07 (POWER8-era)
      compatibility mode.  This is useful for guests which are POWER8
      aware, but don't know about the specific POWER8 variant that
      qemu (and/or KVM) is emulating. (Thomas Huth)
    * Fix a bug where macio wasn't removing DMA mappings (Mark Cave-Ayland)
    * Add a workaround for Linux guest's miscalculation of maximum
      memory address (including hotplugged memory), which could break
      when hotplug memory was combined with VFIO.  The previous
      approach was technically correct by spec, but differed from
      PowerVM's behaviour enough to trip a guest kernel bug.  This
      works around the bug, while remaining correct-to-spec. (Bharata Rao)

# gpg: Signature made Tue 14 Jun 2016 06:53:58 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160614:
  spapr: Ensure all LMBs are represented in ibm,dynamic-memory
  macio: call dma_memory_unmap() at the end of each DMA transfer
  Add PowerPC AT_HWCAP2 definitions
  ppc: Add PowerISA 2.07 compatibility mode
  ppc: Improve PCR bit selection in ppc_set_compat()
  ppc: Provide function to get CPU class of the host CPU
  ppc: Split pcr_mask settings into supported bits and the register mask
  ppc/spapr: Refactor h_client_architecture_support() CPU parsing code
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

spapr: Ensure all LMBs are represented in ibm,dynamic-memory · d0e5a8f2

Committed by Bharata B Rao on Jun 10, 2016

Memory hotplug can fail for some combinations of RAM and maxmem when
DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
on maximum addressable memory returned by guest and this value is currently
being calculated wrongly by the guest kernel routine memory_hotplug_max().
While there is an attempt to fix the guest kernel, this patch works
around the problem within QEMU itself.

memory_hotplug_max() routine in the guest kernel arrives at max
addressable memory by multiplying lmb-size with the lmb-count obtained
from ibm,dynamic-memory property. There are two assumptions here:

- All LMBs are part of ibm,dynamic-memory: This is not true for PowerKVM,
  where only hot-pluggable LMBs are present in this property.
- The memory area comprising RAM and the hotplug region is contiguous: This
  needn't always be true for PowerKVM, as there can be a gap between
  boot-time RAM and the hotplug region.

To work around this guest kernel bug, ensure that ibm,dynamic-memory
has information about all the LMBs (RMA, boot-time LMBs, future
hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
hotpluggable region).

RMA is represented separately by the memory@0 node. Hence mark the RMA LMBs,
and also the LMBs for the gap between RAM and the hotpluggable region, as
reserved and as having no valid DRC so that these LMBs are not considered
by the guest.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
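
The arithmetic behind the miscalculation can be sketched as follows. This is only an illustration of the lmb-size times lmb-count product described in the commit message; the function names and the memory layout used in the usage note are assumptions, not values from QEMU or the guest kernel.

```c
#include <stdint.h>

/* Guest-style estimate as described above: lmb-size multiplied by lmb-count. */
uint64_t sk_guest_estimated_max(uint64_t lmb_size, uint64_t lmb_count)
{
    return lmb_size * lmb_count;
}

/*
 * Number of LMB entries needed so that the product above reaches the end
 * of the hotplug region, counting from address 0 (RMA, boot-time RAM,
 * any gap, and the hotplug region itself).
 */
uint64_t sk_lmbs_covering(uint64_t lmb_size, uint64_t hotplug_base,
                          uint64_t hotplug_size)
{
    return (hotplug_base + hotplug_size) / lmb_size;
}
```

Take a hypothetical layout with 256 MiB LMBs and a 4 GiB hotplug region starting at 8 GiB. If only the 16 hot-pluggable LMBs are listed, the estimate is 4 GiB, well below the real top of memory at 12 GiB. Listing all 48 LMBs makes the same product come out at 12 GiB.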

macio: call dma_memory_unmap() at the end of each DMA transfer · bc9ca595

Committed by Mark Cave-Ayland on Jun 10, 2016

This ensures that the underlying memory is marked dirty once the transfer
is complete and resolves cache coherency problems under MacOS 9.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Add PowerPC AT_HWCAP2 definitions · 42bff477

Committed by Anton Blanchard on Jun 07, 2016

We need the PPC_FEATURE2_HAS_HTM bit in a subsequent patch, so
add the PowerPC AT_HWCAP2 definitions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Add PowerISA 2.07 compatibility mode · b30ff227

Committed by Thomas Huth on Jun 07, 2016

Make sure that guests can use the PowerISA 2.07 CPU sPAPR
compatibility mode when they request it and the target CPU
supports it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Improve PCR bit selection in ppc_set_compat() · eac4fba9

Committed by Thomas Huth on Jun 07, 2016

When using an older PowerISA level, all the upper compatibility
bits have to be enabled, too. For example, when we want to run
something in PowerISA 2.05 compatibility mode on POWER8, the bit
for 2.06 has to be set beside the bit for 2.05.
Additionally, to make sure that we do not set bits that are not
supported by the host, we apply a mask with the known-to-be-good
bits here, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Added some #ifs to fix compile on 32-bit targets]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
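
The bit selection described in this commit message can be sketched as follows. The bit values, the `sk_` names and the host mask are assumptions made for illustration only; they are not copied from QEMU's headers or from `ppc_set_compat()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; assumed for this sketch. */
#define SK_PCR_COMPAT_2_05 (UINT64_C(1) << 1)
#define SK_PCR_COMPAT_2_06 (UINT64_C(1) << 2)
#define SK_PCR_COMPAT_2_07 (UINT64_C(1) << 3)

/* Compatibility levels ordered from oldest to newest. */
enum sk_compat_level {
    SK_LEVEL_2_05,
    SK_LEVEL_2_06,
    SK_LEVEL_2_07,
    SK_LEVEL_COUNT
};

/*
 * Set the bit for the requested level plus the bits of all newer levels,
 * then drop every bit the host does not support.
 */
uint64_t sk_compat_pcr_bits(enum sk_compat_level level, uint64_t host_mask)
{
    static const uint64_t level_bit[SK_LEVEL_COUNT] = {
        SK_PCR_COMPAT_2_05,
        SK_PCR_COMPAT_2_06,
        SK_PCR_COMPAT_2_07,
    };
    uint64_t bits = 0;

    for (size_t i = (size_t)level; i < SK_LEVEL_COUNT; i++) {
        bits |= level_bit[i];
    }
    return bits & host_mask;
}
```

With a hypothetical host mask that contains only the 2.05 and 2.06 bits, requesting 2.05 yields both of those bits, which mirrors the POWER8 example in the commit message.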

ppc: Provide function to get CPU class of the host CPU · 52b2519c

Committed by Thomas Huth on Jun 07, 2016

When running with KVM, we might be interested in some details
of the host CPU class, too, so provide a function to get the
corresponding CPU class.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Split pcr_mask settings into supported bits and the register mask · 8cd2ce7a

Committed by Thomas Huth on Jun 07, 2016

The current pcr_mask values are ambiguous: Should these be the mask
that defines valid bits in the PCR register? Or should these rather
indicate which compatibility levels are possible? Anyway, POWER6 and
POWER7 should certainly not use the same values here. So let's
introduce an additional variable "pcr_supported" here which is
used to indicate the valid compatibility levels, and use pcr_mask
to signal the valid bits in the PCR register.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/spapr: Refactor h_client_architecture_support() CPU parsing code · 7386ae63

Committed by Thomas Huth on Jun 07, 2016

The h_client_architecture_support() function has become quite big
and nested already. So factor out the code that takes care of the
sPAPR compatibility PVRs (which will be modified by the following
patches).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

13 Jun, 2016 · 14 commits

Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160613-1' into staging · 2c96c379

Committed by Peter Maydell on Jun 13, 2016

usb: misc fixes.

# gpg: Signature made Mon 13 Jun 2016 14:09:15 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-usb-20160613-1:
  vl: Eliminate usb_enabled()
  pxa2xx: Unconditionally enable USB controller
  hw/usb/dev-network.c: Use ldl_le_p() and stl_le_p()
  usb-host: add special case for bus+addr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-2016-06-13-v1' into staging · 55e5c3a2

Committed by Peter Maydell on Jun 13, 2016

Merge qcrypto-next 2016/06/13 v1

# gpg: Signature made Mon 13 Jun 2016 12:43:22 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/qcrypto-next-2016-06-13-v1:
  crypto: aes: always rename internal symbols
  crypto: assert that qcrypto_hash_digest_len is in range
  crypto: remove temp files on completion of secrets test
  TLS: provide slightly more information when TLS certificate loading fails
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

crypto: aes: always rename internal symbols · c8d70e59

Committed by Mike Frysinger on Jun 06, 2016

OpenSSL's libcrypto always defines AES symbols with the same names as
qemu's local aes code.  This is problematic when enabling at least curl
as that frequently also uses libcrypto.  It might not be noticed when
running, but if you try to statically link, everything falls down.

An example snippet:
  LINK  qemu-nbd
.../libcrypto.a(aes-x86_64.o): In function 'AES_encrypt':
(.text+0x460): multiple definition of 'AES_encrypt'
crypto/aes.o:aes.c:(.text+0x670): first defined here
.../libcrypto.a(aes-x86_64.o): In function 'AES_decrypt':
(.text+0x9f0): multiple definition of 'AES_decrypt'
crypto/aes.o:aes.c:(.text+0xb30): first defined here
.../libcrypto.a(aes-x86_64.o): In function 'AES_cbc_encrypt':
(.text+0xf90): multiple definition of 'AES_cbc_encrypt'
crypto/aes.o:aes.c:(.text+0xff0): first defined here
collect2: error: ld returned 1 exit status
.../qemu-2.6.0/rules.mak:105: recipe for target 'qemu-nbd' failed
make: *** [qemu-nbd] Error 1

The aes.h header has redefines already for FreeBSD, but go ahead and
enable that for everyone since there's no real good reason to not use
a namespace all the time.
Signed-off-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
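
The namespacing technique mentioned in this commit message can be sketched with a preprocessor rename. The `SKETCH_` prefix is a hypothetical choice for illustration, and the function body is a stand-in that only shows the renaming; it is not AES and not the code in aes.h.

```c
/*
 * Rename the public identifier before it is declared, so every later
 * declaration, definition and call resolves to the prefixed symbol and
 * cannot clash with another library that exports plain AES_encrypt.
 */
#define AES_encrypt SKETCH_AES_encrypt

/* After macro expansion this declares and defines SKETCH_AES_encrypt. */
unsigned AES_encrypt(unsigned block);

unsigned AES_encrypt(unsigned block)
{
    /* Stand-in body for the sketch; NOT a real cipher. */
    return block ^ 0x5au;
}
```

Callers keep writing `AES_encrypt(...)`, while the object file exports only `SKETCH_AES_encrypt`, so a static link against a second definition of `AES_encrypt` no longer produces a duplicate symbol.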

crypto: assert that qcrypto_hash_digest_len is in range · b35c1f33

Committed by Paolo Bonzini on May 20, 2016

Otherwise unintended results could happen.  For example,
Coverity reports a division by zero in qcrypto_afsplit_hash.
While this cannot really happen, it shows that the contract
of qcrypto_hash_digest_len can be improved.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
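
A tightened contract of this kind can be sketched as follows. The enum, table and function names are hypothetical stand-ins rather than the qcrypto API; the digest sizes are the standard ones for MD5, SHA-1 and SHA-256.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm list for the sketch. */
enum sk_hash_alg {
    SK_HASH_MD5,
    SK_HASH_SHA1,
    SK_HASH_SHA256,
    SK_HASH__MAX
};

/* Digest lengths in bytes, indexed by algorithm. */
static const size_t sk_digest_len[SK_HASH__MAX] = { 16, 20, 32 };

/*
 * Asserting on the range means the function never hands back a
 * "no such algorithm" value such as 0, so callers that divide by the
 * result cannot hit a division by zero.
 */
size_t sk_hash_digest_len(enum sk_hash_alg alg)
{
    assert((unsigned)alg < (unsigned)SK_HASH__MAX);
    return sk_digest_len[alg];
}
```

The assertion documents the precondition for both human readers and static analyzers: every value the function returns is a nonzero table entry.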

crypto: remove temp files on completion of secrets test · e7ed11f0

Committed by Daniel P. Berrange on Apr 26, 2016

The secret object tests left some temporary files on disk
when completing. Ensure they are unlinked, and rename them
to make it more obvious where they come from.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

TLS: provide slightly more information when TLS certificate loading fails · b7b68166

Committed by Alex Bligh on Apr 05, 2016

Give slightly more information when certificate loading fails.
Rather than have no information, you now get gnutls's only slightly
less unhelpful error messages.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

vl: Eliminate usb_enabled() · 4bcbe0b6

Committed by Eduardo Habkost on Jun 08, 2016

This wrapper for machine_usb(current_machine) is not necessary;
replace all usages of usb_enabled() with machine_usb().

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-arm@nongnu.org
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1465419025-21519-3-git-send-email-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

pxa2xx: Unconditionally enable USB controller · c92cfba8

Committed by Eduardo Habkost on Jun 08, 2016

Simplify initialization logic by removing the usb_enabled()
check. The USB controller is part of the SoC, so it doesn't make
sense to create a system where it is not present.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: qemu-arm@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1465419025-21519-2-git-send-email-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

hw/usb/dev-network.c: Use ldl_le_p() and stl_le_p() · ec9125bc

Committed by Peter Maydell on Jun 10, 2016

Use stl_le_p() and ldl_le_p() to read and write data from
buffers, rather than using pointer casts and cpu_to_le32()
for writes and le32_to_cpup() for reads. This:
 * avoids lots of casts
 * works even if the buffer isn't as aligned as the host would like
 * avoids using the *_to_cpup() functions which we want to get rid of

Note that there may still be some places where a pointer from the
guest is cast to a pointer to a host structure; these would also
have to be changed for the device to work on a host CPU which
enforces alignment restrictions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1465573077-29221-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
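
The benefit of byte-wise accessors over pointer casts can be sketched as follows. `sk_load_le32` and `sk_store_le32` are hypothetical helpers written for illustration; they are not QEMU's `ldl_le_p()` and `stl_le_p()`, although they aim at the same behavior for 32-bit values.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a buffer of any alignment. */
uint32_t sk_load_le32(const void *p)
{
    uint8_t b[4];

    memcpy(b, p, sizeof(b)); /* no aligned access is required */
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Write a 32-bit value as little-endian bytes to a buffer of any alignment. */
void sk_store_le32(void *p, uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v & 0xff),
        (uint8_t)((v >> 8) & 0xff),
        (uint8_t)((v >> 16) & 0xff),
        (uint8_t)((v >> 24) & 0xff),
    };

    memcpy(p, b, sizeof(b));
}
```

Because the value is assembled from individual bytes, the result does not depend on host endianness, and the call site needs no cast to `uint32_t *`.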

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160613-tag' into staging · 8fdf0387

Committed by Peter Maydell on Jun 13, 2016

Xen 2016/06/13

# gpg: Signature made Mon 13 Jun 2016 11:53:18 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20160613-tag:
  Introduce "xen-load-devices-state"
  exec: Fix qemu_ram_block_from_host for Xen
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

usb-host: add special case for bus+addr · e058fa2d

Committed by Gerd Hoffmann on Jun 03, 2016

This patch changes usb-host behavior in case the hostbus= and hostaddr=
properties are used to identify the usb device in question. Instead of
adding the device to the hotplug watchlist we try to open it directly using
the given bus number and device address.

Putting a device specified by hostaddr to the hotplug watchlist isn't
a great idea as the address isn't a fixed property. It changes every
time the device is plugged in. So considering this case as "use the
device at bus:addr _now_" is more sane. Also usb-host will throw errors
in case it can't initialize the host device.

Note: For devices on the hotplug watchlist (hostport or vendorid or
productid specified) qemu continues to ignore errors and keeps
monitoring the usb bus to see if the device eventually shows up.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1464945175-28939-1-git-send-email-kraxel@redhat.com

Introduce "xen-load-devices-state" · 88c16567

Committed by Wen Congyang on Jun 03, 2016

Introduce a "xen-load-devices-state" QAPI command that can be used to
load the state of all devices, but not the RAM or the block devices of
the VM.

So far we only have the HMP commands savevm/loadvm and the QMP command
xen-save-devices-state.

We use this new command for COLO:
1. suspend both primary vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm

In such a case, we need to be able to update the state of all devices at any time.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

exec: Fix qemu_ram_block_from_host for Xen · d6b6aec4

Committed by Anthony PERARD on Jun 09, 2016

Since f615f396 (exec: remove ram_addr argument from
qemu_ram_block_from_host), migration under Xen is likely to fail, with a
SEGV of QEMU. But the commit only reveals a bug in the calculation of
the offset value in qemu_ram_block_from_host().

This patch calculates the offset from the ram_addr as
qemu_ram_addr_from_host() will later calculate the ram_addr from the
offset.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20160611' into staging · da2fdd0b

Committed by Peter Maydell on Jun 13, 2016

TB hashing improvements

# gpg: Signature made Sun 12 Jun 2016 01:12:50 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-tcg-20160611:
  translate-all: add tb hash bucket info to 'info jit' dump
  tb hash: track translated blocks with qht
  qht: add test-qht-par to invoke qht-bench from 'check' target
  qht: add qht-bench, a performance benchmark
  qht: add test program
  qht: QEMU's fast, resizable and scalable Hash Table
  qdist: add test program
  qdist: add module to represent frequency distributions of data
  tb hash: hash phys_pc, pc, and flags with xxhash
  exec: add tb_hash_func5, derived from xxhash
  qemu-thread: add simple test-and-set spinlock
  include/processor.h: define cpu_relax()
  seqlock: rename write_lock/unlock to write_begin/end
  seqlock: remove optional mutex
  compiler.h: add QEMU_ALIGNED() to enforce struct alignment
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

12 Jun, 2016 · 12 commits

translate-all: add tb hash bucket info to 'info jit' dump · 329844d4

Committed by Emilio G. Cota on Jun 08, 2016

Examples:

- Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
TB count            715135/2684354
[...]
TB hash buckets     388775/524288 (74.15% head buckets used)
TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3

- Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
TB count            712636/2684354
[...]
TB hash buckets     344924/524288 (65.79% head buckets used)
TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  ▅▁▃▁▂|[90,100]%
TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4

- Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
TB count            702818/2684354
[...]
TB hash buckets     112741/524288 (21.50% head buckets used)
TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  ▁▁▁▁▁|[90,100]%
TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁▁▁▁▁▁▁▁▁|[83.8,93.0]

- Good hashing, but no auto-resize:
TB count            715634/2684354
TB hash buckets     8192/8192 (100.00% head buckets used)
TB hash occupancy   98.30% avg chain occ. Histogram: [95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
TB hash avg chain   22.070 buckets. Histogram: [15.0,16.7)|▁▂▅▄█▅▁▁▁▁|[30.3,32.0]
Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-16-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tb hash: track translated blocks with qht · 909eaac9

Committed by Emilio G. Cota on Jun 08, 2016

Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.

The appended patch fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
              MRU is implemented here by adding an intermediate struct
              that contains the u32 hash and a pointer to the TB; this
              allows us, on an MRU promotion, to copy said struct (that is not
              at the head), and put this new copy at the head. After a grace
              period, the original non-head struct can be eliminated, and
              after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
                   no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
                 no MRU for lookups; MRU for inserts.

The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
  http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.

                            Host: Intel Xeon E5-2690

  28 ++------------+-------------+-------------+-------------+------------++
     A*****        +             +             +             master **A*** +
  27 ++    *                                                 xxhash ##B###++
     |      A******A******                               xxhash-rcu $$C$$$ |
  26 C$$                  A******A******            qht-fixed-nomru*%%D%%%++
     D%%$$                              A******A******A*qht-dyn-mru A*E****A
  25 ++ %%$$                                          qht-dyn-nomru &&F&&&++
     B#####%                                                               |
  24 ++    #C$$$$$                                                        ++
     |      B###  $                                                        |
     |          ## C$$$$$$                                                 |
  23 ++           #       C$$$$$$                                         ++
     |             B######       C$$$$$$                                %%%D
  22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
     |                    D%%%%%%B######      @E@@@@@@    %%%D%%%@@@E@@@@@@E
  21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
     +             E@@@   F&&&   +      E@     +      F&&&   +             +
  20 ++------------+-------------+-------------+-------------+------------++
     14            16            18            20            22            24
                             log2 number of buckets

                                 Host: Intel i7-4790K

  14.5 ++------------+------------+-------------+------------+------------++
       A**           +            +             +            master **A*** +
    14 ++ **                                                 xxhash ##B###++
  13.5 ++   **                                           xxhash-rcu $$C$$$++
       |                                            qht-fixed-nomru %%D%%% |
    13 ++     A******                                   qht-dyn-mru @@E@@@++
       |             A*****A******A******             qht-dyn-nomru &&F&&& |
  12.5 C$$                               A******A******A*****A******    ***A
    12 ++ $$                                                        A***  ++
       D%%% $$                                                             |
  11.5 ++  %%                                                             ++
       B###  %C$$$$$$                                                      |
    11 ++  ## D%%%%% C$$$$$                                               ++
       |     #      %      C$$$$$$                                         |
  10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $$$C
    10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
       +             F&&          D%%%%%%B######B######B#####B###@@@D%%%   +
   9.5 ++------------+------------+-------------+------------+------------++
       14            16           18            20           22            24
                              log2 number of buckets

Note that the original point before this patch series is X=15 for "master";
the little sensitivity to the increased number of buckets is due to the
poor hashing function in master.

xxhash-rcu has significant overhead due to the constant churn of allocating
and deallocating intermediate structs for implementing MRU. An alternative
would be to consider failed lookups as "maybe not there", and then
acquire the external lock (tb_lock in this case) to really confirm that
there was indeed a failed lookup. This, however, would not be enough
to implement dynamic resizing--this is more complex: see
"Resizable, Scalable, Concurrent Hash Tables via Relativistic
Programming" by Triplett, McKenney and Walpole. This solution was
discarded due to the very coarse RCU read critical sections that we have
in MTTCG; resizing requires waiting for readers after every pointer update,
and resizes require many pointer updates, so this would quickly become
prohibitive.

qht-fixed-nomru shows that MRU promotion is advisable for undersized
hash tables.

However, qht-dyn-mru shows that MRU promotion is not important if the
hash table is properly sized: there is virtually no difference in
performance between qht-dyn-nomru and qht-dyn-mru.

Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
can achieve with optimum sizing of the hash table, while keeping the hash
table scalable for readers.

The improvement we get before and after this patch for booting debian jessie
with arm-softmmu is:

- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time

We could get this same improvement _for this particular workload_ by
statically increasing the size of the hash table. But this would hurt
workloads that do not need a large hash table. The dynamic (upward)
resizing allows us to start small and enlarge the hash table as needed.

A quick note on downsizing: the table is resized back to 2**15 buckets
on every tb_flush; this makes sense because it is not guaranteed that the
table will reach the same number of TBs later on (e.g. most bootup code is
thrown away after boot); it makes sense to grow the hash table as
more code blocks are translated. This also avoids the complication of
having to build downsizing hysteresis logic into qht.
Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
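
The point about MRU promotion turning lookups into writes can be sketched with a plain chained bucket. This is a generic move-to-front illustration under assumed names (`sk_node`, `sk_lookup_mru`, `sk_lookup_ro`); it is not the code of the old TB hash table or of qht.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry in a singly linked hash bucket (hypothetical layout). */
struct sk_node {
    uint32_t key;
    struct sk_node *next;
};

/*
 * Lookup with MRU promotion: a hit that is not at the head rewrites
 * three pointers, so concurrent readers of the bucket see writes.
 */
struct sk_node *sk_lookup_mru(struct sk_node **head, uint32_t key)
{
    struct sk_node **pp = head;

    for (struct sk_node *n = *head; n != NULL; pp = &n->next, n = n->next) {
        if (n->key == key) {
            if (n != *head) {
                *pp = n->next;   /* unlink from the current position */
                n->next = *head; /* relink in front of the old head */
                *head = n;
            }
            return n;
        }
    }
    return NULL;
}

/* Read-only lookup: walks the chain and never modifies it. */
const struct sk_node *sk_lookup_ro(const struct sk_node *head, uint32_t key)
{
    for (const struct sk_node *n = head; n != NULL; n = n->next) {
        if (n->key == key) {
            return n;
        }
    }
    return NULL;
}
```

In the promoting variant every hit deeper in the chain dirties shared memory, which is the scalability concern raised in the commit message. The read-only variant leaves the chain untouched and relies on the table being sized well enough to keep chains short.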

qht: add test-qht-par to invoke qht-bench from 'check' target · 896a9ee9

Committed by Emilio G. Cota on Jun 08, 2016

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-14-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

qht: add qht-bench, a performance benchmark · 515864a0

Committed by Emilio G. Cota on Jun 08, 2016

This serves as a performance benchmark as well as a stress test
for QHT. We can tweak quite a number of things, including the
number of resize threads and how frequently resizes are triggered.

A performance comparison of QHT vs CLHT[1] and ck_hs[2] using
this same benchmark program can be found here:
  http://imgur.com/a/0Bms4

The tests are run on a 64-core AMD Opteron 6376, pinning threads
to cores favoring same-socket cores. For each run, qht-bench is
invoked with:
  $ tests/qht-bench -d $duration -n $n -u $u -g $range
where $duration is in seconds, $n is the number of threads,
$u is the update rate (0.0 to 100.0), and $range is the number
of keys.

Note that ck_hs's performance drops significantly as writes go
up, since it requires an external lock (I used a ck_spinlock)
around every write.

Also, note that CLHT, instead of using a seqlock, relies on an
allocator that does not ever return the same address during the
same read-critical section. This gives it a slight performance
advantage over QHT on read-heavy workloads, since the seqlock
writes aren't there.

[1] CLHT: https://github.com/LPD-EPFL/CLHT
          https://infoscience.epfl.ch/record/207109/files/ascy_asplos15.pdf

[2] ck_hs: http://concurrencykit.org/
           http://backtrace.io/blog/blog/2015/03/13/workload-specialization/

A few of those plots are shown in text here, since that site
might not be online forever. Throughput is in Mops/s on the Y axis.

                             200K keys, 0 % updates

  450 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +      +N+  |
  400 ++                                                           ---+E+ ++
      |                                                       +++----      |
  350 ++          9 ++------+------++                       --+E+    -+H+ ++
      |             |      +H+-     |                 -+N+----   ---- +++  |
  300 ++          8 ++     +E+     ++             -----+E+  --+H+         ++
      |             |      +++      |         -+N+-----+H+--               |
  250 ++          7 ++------+------++  +++-----+E+----                    ++
  200 ++                    1         -+E+-----+H+                        ++
      |                           ----                     qht +-E--+      |
  150 ++                      -+E+                        clht +-H--+     ++
      |                   ----                              ck +-N--+      |
  100 ++               +E+                                                ++
      |            ----                                                    |
   50 ++       -+E+                                                       ++
      |   +E+E+  +      +       +       +       +       +      +       +   |
    0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                             200K keys, 1 % updates

  350 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +     -+E+  |
  300 ++                                                         -----+H+ ++
      |                                                       +E+--        |
      |           9 ++------+------++                  +++----             |
  250 ++            |      +E+   -- |                 -+E+                ++
      |           8 ++         --  ++             ----                     |
  200 ++            |      +++-     |  +++  ---+E+                        ++
      |           7 ++------N------++ -+E+--               qht +-E--+      |
      |                     1  +++----                    clht +-H--+      |
  150 ++                      -+E+                          ck +-N--+     ++
      |                   ----                                             |
  100 ++               +E+                                                ++
      |            ----                                                    |
      |        -+E+                                                        |
   50 ++    +H+-+N+----+N+-----+N+------                                  ++
      |   +E+E+  +      +       +      +N+-----+N+-----+N+----+N+-----+N+  |
    0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                             200K keys, 20 % updates

  300 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +       +   |
      |                                                              -+H+  |
  250 ++                                                         ----     ++
      |           9 ++------+------++                       --+H+  ---+E+  |
      |           8 ++     +H+--   ++                 -+H+----+E+--        |
  200 ++            |      +E+    --|             -----+E+--  +++         ++
      |           7 ++      + ---- ++       ---+H+---- +++ qht +-E--+      |
  150 ++          6 ++------N------++ -+H+-----+E+        clht +-H--+     ++
      |                     1     -----+E+--                ck +-N--+      |
      |                       -+H+----                                     |
  100 ++                  -----+E+                                        ++
      |                +E+--                                               |
      |            ----+++                                                 |
   50 ++       -+E+                                                       ++
      |     +E+ +++                                                        |
      |   +E+N+-+N+-----+       +       +       +       +      +       +   |
    0 ++--E------+------N-------N-------N-------N-------N------N-------N--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                            200K keys, 100 % updates       qht +-E--+
                                                          clht +-H--+
  160 ++--+------+------+-------+-------+-------+-------+---ck-+-N-----+--++
      |   +      +      +       +       +       +       +      +   ----H   |
  140 ++                                                      +H+--  -+E+ ++
      |                                                +++----   ----      |
  120 ++          8 ++------+------++                 -+H+    +E+         ++
      |           7 ++     +H+---- ++             ---- +++----             |
  100 ++            |      +E+      |  +++  ---+H+    -+E+                ++
      |           6 ++     +++     ++ -+H+--   +++----                     |
   80 ++          5 ++------N----------+E+-----+E+                        ++
      |                     1 -+H+---- +++                                 |
      |                   -----+E+                                         |
   60 ++               +H+---- +++                                        ++
      |            ----+E+                                                 |
   40 ++        +H+----                                                   ++
      |       --+E+                                                        |
   20 ++    +E+                                                           ++
      |  +EE+    +      +       +       +       +       +      +       +   |
    0 ++--+N-N---N------N-------N-------N-------N-------N------N-------N--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

515864a0

qht: add test program · 1a95404f

Committed by Emilio G. Cota on Jun 08, 2016

Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-12-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

1a95404f

qht: QEMU's fast, resizable and scalable Hash Table · 2e11264a

Committed by Emilio G. Cota on Jun 08, 2016

This is a fast, scalable chained hash table with optional auto-resizing, allowing
reads that are concurrent with reads, and reads/writes that are concurrent
with writes to separate buckets.

A hash table with these features will be necessary for the scalability
of the ongoing MTTCG work; before those changes arrive we can already
benefit from the single-threaded speedup that qht also provides.
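
The two structural ideas named above, chaining and automatic resizing,
can be illustrated with a deliberately single-threaded sketch. It is
not the qht implementation and omits everything that makes qht
concurrent; struct ht, ht_insert(), ht_lookup() and the "double at an
average of two entries per bucket" policy are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct ht_node {
    uint32_t hash;
    void *obj;
    struct ht_node *next;
};

struct ht {
    struct ht_node **buckets;
    size_t n_buckets;   /* always a power of two */
    size_t n_entries;
};

typedef bool (*ht_cmp_fn)(const void *obj, const void *key);

static void ht_init(struct ht *t, size_t n_buckets)
{
    assert(n_buckets && (n_buckets & (n_buckets - 1)) == 0);
    t->buckets = calloc(n_buckets, sizeof(*t->buckets));
    assert(t->buckets);
    t->n_buckets = n_buckets;
    t->n_entries = 0;
}

/* Move every node into a new bucket array of n_new buckets. */
static void ht_resize(struct ht *t, size_t n_new)
{
    struct ht_node **nb = calloc(n_new, sizeof(*nb));

    assert(nb);
    for (size_t i = 0; i < t->n_buckets; i++) {
        struct ht_node *n = t->buckets[i];

        while (n) {
            struct ht_node *next = n->next;
            size_t j = n->hash & (n_new - 1);

            n->next = nb[j];
            nb[j] = n;
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->n_buckets = n_new;
}

static void ht_insert(struct ht *t, void *obj, uint32_t hash)
{
    struct ht_node *n = malloc(sizeof(*n));
    size_t i;

    assert(n);
    /* Auto-resize: double once chains average two entries (arbitrary). */
    if (t->n_entries >= 2 * t->n_buckets) {
        ht_resize(t, 2 * t->n_buckets);
    }
    i = hash & (t->n_buckets - 1);
    n->hash = hash;
    n->obj = obj;
    n->next = t->buckets[i];
    t->buckets[i] = n;
    t->n_entries++;
}

static void *ht_lookup(const struct ht *t, uint32_t hash,
                       ht_cmp_fn cmp, const void *key)
{
    const struct ht_node *n = t->buckets[hash & (t->n_buckets - 1)];

    for (; n; n = n->next) {
        if (n->hash == hash && cmp(n->obj, key)) {
            return n->obj;
        }
    }
    return NULL;
}

/* Example comparison callback: objects and keys are both uint32_t. */
static bool u32_eq(const void *obj, const void *key)
{
    return *(const uint32_t *)obj == *(const uint32_t *)key;
}
```

Storing the full hash in each node lets a resize redistribute entries
without calling the hash function again, and lets lookups skip most
comparison callbacks.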

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-11-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

2e11264a

qdist: add test program · ff9249b7

Committed by Emilio G. Cota on Jun 08, 2016

Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-10-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

ff9249b7

qdist: add module to represent frequency distributions of data · bf3afd5f

Committed by Emilio G. Cota on Jun 08, 2016

Sometimes it is useful to have a quick histogram to represent a certain
distribution -- for example, when investigating a performance regression
in a hash table due to inadequate hashing.

The appended patch allows us to easily represent a distribution using Unicode
characters. Further, the data structure keeping track of the distribution
is so simple that obtaining its values for off-line processing is trivial.

Example, taking the last 10 commits to QEMU:

 Characters in commit title  Count
-----------------------------------
                         39      1
                         48      1
                         53      1
                         54      2
                         57      1
                         61      1
                         67      1
                         78      1
                         80      1

struct qdist dist;

qdist_init(&dist);
qdist_inc(&dist, 39);
[...]
qdist_inc(&dist, 80);

/* 9 bins, labelled with the ranges of the first and last bin */
char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
// -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
g_free(str);

/* the same distribution squeezed into 4 bins */
str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
// -> [39.0,49.2)▁█▁▁[69.8,80.0]
g_free(str);
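
For readers curious how such a one-line histogram can be produced, the
following is a rough, self-contained sketch. It is not the qdist code:
hist_render() and HIST_MAX_BINS are hypothetical, and the rule used to
scale counts to glyph heights is an assumption, so its output will not
match qdist_pr() glyph for glyph. It bins the samples uniformly between
the minimum and maximum and emits one UTF-8 block character (U+2581 to
U+2588) per non-empty bin.

```c
#include <assert.h>
#include <stddef.h>

#define HIST_MAX_BINS 64

/*
 * Render n samples as n_bins glyphs into out, which must hold at least
 * 3 * n_bins + 1 bytes. Returns the number of bytes written, excluding
 * the terminating NUL. Empty bins are rendered as a space.
 */
static size_t hist_render(const double *xs, size_t n, size_t n_bins, char *out)
{
    size_t counts[HIST_MAX_BINS] = { 0 };
    size_t max_count = 0, len = 0;

    assert(n > 0 && n_bins > 0 && n_bins <= HIST_MAX_BINS);

    double lo = xs[0], hi = xs[0];
    for (size_t i = 1; i < n; i++) {
        if (xs[i] < lo) {
            lo = xs[i];
        }
        if (xs[i] > hi) {
            hi = xs[i];
        }
    }

    for (size_t i = 0; i < n; i++) {
        size_t b = hi > lo ? (size_t)((xs[i] - lo) / (hi - lo) * n_bins) : 0;

        if (b >= n_bins) {
            b = n_bins - 1;     /* the maximum lands in the last bin */
        }
        if (++counts[b] > max_count) {
            max_count = counts[b];
        }
    }

    for (size_t b = 0; b < n_bins; b++) {
        if (counts[b] == 0) {
            out[len++] = ' ';
            continue;
        }
        /* Scale the count to a height of 1..8, rounding up. */
        size_t level = (counts[b] * 8 + max_count - 1) / max_count;

        out[len++] = (char)0xE2;            /* UTF-8 for U+2580 + level */
        out[len++] = (char)0x96;
        out[len++] = (char)(0x80 + level);
    }
    out[len] = '\0';
    return len;
}
```

Keeping the raw per-bin counts in a plain array is what makes the
values easy to dump for off-line processing.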

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-9-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

bf3afd5f

tb hash: hash phys_pc, pc, and flags with xxhash · 42bd3228

Committed by Emilio G. Cota on Jun 08, 2016

For some workloads such as ARM bootup, tb_phys_hash is performance-critical.
This is due to the high frequency of accesses to the hash table, caused
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
  https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html

To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.

The appended patch addresses this with two changes:

1) Use xxhash as the hash table's hash function. xxhash is a fast,
   high-quality hashing function.

2) Feed the hashing function with not just tb_phys, but also pc and flags.

This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets holding many TBs while others got very few;
with these changes, the longest observed chain in a single hash bucket is
brought down from ~550 to ~40.
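
As an illustration of the two changes, the sketch below mixes phys_pc,
pc and flags with the round structure of 32-bit xxhash. It is a sketch
under assumptions, not the function added in this series:
tb_hash_sketch(), bucket_index(), the 64-bit widths of phys_pc and pc,
and the seed value are all chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Prime constants of 32-bit xxhash. */
#define P32_1 2654435761U
#define P32_2 2246822519U
#define P32_3 3266489917U
#define P32_4  668265263U

#define SKETCH_SEED 1U  /* arbitrary seed, an assumption of this sketch */

static inline uint32_t rol32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));   /* valid for 0 < r < 32 */
}

/* One xxhash32 accumulator round over a 32-bit input word. */
static inline uint32_t xx_round(uint32_t acc, uint32_t in)
{
    acc += in * P32_2;
    acc = rol32(acc, 13);
    return acc * P32_1;
}

/* Hash 20 bytes of input: two 64-bit values and one 32-bit value. */
static uint32_t tb_hash_sketch(uint64_t phys_pc, uint64_t pc, uint32_t flags)
{
    uint32_t v1 = xx_round(SKETCH_SEED + P32_1 + P32_2,
                           (uint32_t)(phys_pc >> 32));
    uint32_t v2 = xx_round(SKETCH_SEED + P32_2, (uint32_t)phys_pc);
    uint32_t v3 = xx_round(SKETCH_SEED, (uint32_t)(pc >> 32));
    uint32_t v4 = xx_round(SKETCH_SEED - P32_1, (uint32_t)pc);
    uint32_t h = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);

    h += 20;                    /* total input length in bytes */
    h += flags * P32_3;         /* the trailing 4-byte word */
    h = rol32(h, 17) * P32_4;

    /* Final avalanche so that every input bit affects every output bit. */
    h ^= h >> 15;
    h *= P32_2;
    h ^= h >> 13;
    h *= P32_3;
    h ^= h >> 16;
    return h;
}

/* Reduce a hash to a bucket index; n_buckets must be a power of two. */
static size_t bucket_index(uint32_t hash, size_t n_buckets)
{
    assert(n_buckets && (n_buckets & (n_buckets - 1)) == 0);
    return hash & (n_buckets - 1);
}
```

Because the bucket index is taken from the low bits of the hash, a
function that avalanches well matters: two TBs that differ only in pc
or only in flags should not be biased toward the same bucket.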

Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though. UPDATE:
On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
> The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
> consisting of only a delay slot).
> It may well still turn out to be reasonable to ignore cs_base for hashing.

BTW, after this change the hash table should not be called "tb_phys_hash"
anymore; this is addressed later in this series.

This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time

Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

42bd3228

exec: add tb_hash_func5, derived from xxhash · dc8b295d

Committed by Emilio G. Cota on Jun 08, 2016

This will be used by upcoming changes to hash entries in the TB hash table.

Add it in a separate file so that the copyright notice from xxhash can
be included.

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-7-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

dc8b295d

qemu-thread: add simple test-and-set spinlock · ac9a9eba

Committed by Guillaume Delbergue on Jun 08, 2016

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
[Rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Emilio's additions: use TAS instead of atomic_xchg; emit acquire/release
 barriers; return bool from trylock; call cpu_relax() while spinning;
 optimize for uncontended locks by acquiring the lock with TAS instead
 of TATAS; add qemu_spin_locked().]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
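
The design points listed in the bracketed note can be sketched with C11
atomics as follows. This is an illustration under assumptions, not the
code added to qemu-thread: struct spin and the spin_*() helpers are
hypothetical names, atomic_exchange stands in for the TAS primitive,
spin_relax() stands in for cpu_relax() and assumes GCC/Clang inline
asm, and the trylock return convention is a choice of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct spin {
    atomic_bool locked;
};

/* Stand-in for cpu_relax(); GCC/Clang inline asm is assumed. */
static inline void spin_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier only */
#endif
}

static inline void spin_init(struct spin *s)
{
    atomic_init(&s->locked, false);
}

static inline bool spin_locked(struct spin *s)
{
    return atomic_load_explicit(&s->locked, memory_order_relaxed);
}

/* Returns true if the lock was acquired (a convention of this sketch). */
static inline bool spin_trylock(struct spin *s)
{
    return !atomic_exchange_explicit(&s->locked, true, memory_order_acquire);
}

static inline void spin_lock(struct spin *s)
{
    /* Fast path: a single test-and-set succeeds when uncontended. */
    while (atomic_exchange_explicit(&s->locked, true, memory_order_acquire)) {
        /* Slow path: spin on plain loads until the lock looks free. */
        while (spin_locked(s)) {
            spin_relax();
        }
    }
}

static inline void spin_unlock(struct spin *s)
{
    assert(spin_locked(s));   /* unlocking a free lock is a caller bug */
    atomic_store_explicit(&s->locked, false, memory_order_release);
}
```

Trying the test-and-set first keeps the uncontended path to one atomic
operation; only a failed attempt falls back to read-only spinning,
which avoids bouncing the cache line between waiting CPUs.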

ac9a9eba

include/processor.h: define cpu_relax() · 462cda50

Committed by Emilio G. Cota on Jun 08, 2016

Taken from the Linux kernel.

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-5-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

462cda50