提交 · 7df69deadf2f25c19b3ac121404ee31f71dce845 · openeuler / qemu

27 7月, 2015 5 次提交

tcg: correctly mark dead inputs for mov with a constant · 7df69dea

由 Aurelien Jarno 提交于 7月 27, 2015

When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configure with --enable-debug-tcg:

qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.

Cc: Richard Henderson <rth@twiddle.net>
Reported-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

7df69dea

Merge remote-tracking branch 'remotes/jnsnow/tags/cve-2015-5154-pull-request' into staging · e40db4c6

由 Peter Maydell 提交于 7月 27, 2015

# gpg: Signature made Mon Jul 27 13:01:10 2015 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/cve-2015-5154-pull-request:
  ide: Clear DRQ after handling all expected accesses
  ide/atapi: Fix START STOP UNIT command completion
  ide: Check array bounds before writing to io_buffer (CVE-2015-5154)
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

e40db4c6

ide: Clear DRQ after handling all expected accesses · cb72cba8

由 Kevin Wolf 提交于 7月 26, 2015

This is additional hardening against an end_transfer_func that fails to
clear the DRQ status bit. The bit must be unset as soon as the PIO
transfer has completed, so it's better to do this in a central place
instead of duplicating the code in all commands (and forgetting it in
some).
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>

cb72cba8

ide/atapi: Fix START STOP UNIT command completion · 03441c3a

由 Kevin Wolf 提交于 7月 26, 2015

The command must be completed on all code paths. START STOP UNIT with
pwrcnd set should succeed without doing anything.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>

03441c3a

ide: Check array bounds before writing to io_buffer (CVE-2015-5154) · d2ff8585

由 Kevin Wolf 提交于 7月 26, 2015

If the end_transfer_func of a command is called because enough data has
been read or written for the current PIO transfer, and it fails to
correctly call the command completion functions, the DRQ bit in the
status register and s->end_transfer_func may remain set. This allows the
guest to access further bytes in s->io_buffer beyond s->data_end, and
eventually overflowing the io_buffer.

One case where this currently happens is emulation of the ATAPI command
START STOP UNIT.

This patch fixes the problem by adding explicit array bounds checks
before accessing the buffer instead of relying on end_transfer_func to
function correctly.

Cc: qemu-stable@nongnu.org
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>

d2ff8585

24 7月, 2015 17 次提交

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging · f793d97e

由 Peter Maydell 提交于 7月 24, 2015

* qemu-char fixes
* SCSI fixes (including CVE-2015-5158)
* RCU fixes
* Framebuffer logic to set DIRTY_MEMORY_VGA
* Fix compiler warning for --disable-vnc
* qemu-doc fixes
* x86 TCG pasto fix

# gpg: Signature made Fri Jul 24 12:57:52 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  target-i386/FPU: a misprint in helper_fistll_ST0
  qemu-doc: fix typos
  framebuffer: set DIRTY_MEMORY_VGA on RAM that is used for the framebuffer
  memory: count number of active VGA logging clients
  vl: Fix compiler warning for builds without VNC
  scsi: Handle no media case for scsi_get_configuration
  rcu: actually register threads that have RCU read-side critical sections
  scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)
  vnc: fix memory leak
  qemu-char: Fix missed data on unix socket
  qemu-char: handle EINTR for TCP character devices
  exec.c: Use atomic_rcu_read() to access dispatch in memory_region_section_get_iotlb()
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

f793d97e

target-i386/FPU: a misprint in helper_fistll_ST0 · 178846bd

由 Dmitry Poletaev 提交于 7月 08, 2015

There is a cut-and-paste mistake in the patch
https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01657.html .
It cause errors in guest work.  Here is the bugfix.
Signed-off-by: NDmitry Poletaev <poletaev-qemu@yandex.ru>
Reported-by: NKirill Batuzov <batuzovk@ispras.ru>
Message-Id: <2692911436348920@web2m.yandex.ru>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

178846bd

qemu-doc: fix typos · d274e07c

由 Gonglei 提交于 7月 03, 2015

Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Message-Id: <1435917057-9396-1-git-send-email-arei.gonglei@huawei.com>
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d274e07c

framebuffer: set DIRTY_MEMORY_VGA on RAM that is used for the framebuffer · c1076c3e

由 Paolo Bonzini 提交于 7月 13, 2015

The MemoryRegionSection contains enough information to access the
RAM region underlying the framebuffer, and can be cached inside the
display device.

By doing this, the new framebuffer_update_memory_section function can
enable dirty memory logging on the relevant RAM region.  The function
must be called whenever the stride or base of the framebuffer changes;
a simple way to cover these cases is to call it on every full frame
invalidation, which is a rare case.

framebuffer_update_display now works entirely on a MemoryRegionSection,
without going through cpu_physical_memory_map/unmap.
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c1076c3e

memory: count number of active VGA logging clients · deb809ed

由 Paolo Bonzini 提交于 7月 14, 2015

For a board that has multiple framebuffer devices, both of them
might want to use DIRTY_MEMORY_VGA on the same memory region.
The lack of reference counting in memory_region_set_log makes
this very awkward to implement.
Suggested-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

deb809ed

vl: Fix compiler warning for builds without VNC · fb430969

由 Stefan Weil 提交于 7月 22, 2015

This regression was caused by commit 70b94331.

  CC    vl.o
vl.c: In function ‘select_display’:
vl.c:2064:12: error: unused variable ‘err’ [-Werror=unused-variable]
     Error *err = NULL;
            ^
Reported-by: NClaudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: NStefan Weil <sw@weilnetz.de>
Message-Id: <1437587610-26433-1-git-send-email-sw@weilnetz.de>
Reviewed-by: NWen Congyang <wency@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fb430969

scsi: Handle no media case for scsi_get_configuration · 7d99f4c1

由 Matthew Rosato 提交于 7月 15, 2015

Currently, scsi_get_configuration always returns a current
profile (DVD or CD), even when there is actually no media present.
By comparison, ide/atapi uses a default profile of 0 (MMC_PROFILE_NONE)
for this case and checks for tray_open, so let's do the same for scsi.

This fixes a problem I'm seeing with Fedora 22 guests where systemd
cdrom_id fails to unmount after a QEMU-initiated eject against a
scsi cdrom device because it believes the media is still present
(but unreadable).
Signed-off-by: NMatthew Rosato <mjrosato@linux.vnet.ibm.com>
Message-Id: <1436986352-10695-1-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7d99f4c1

P
rcu: actually register threads that have RCU read-side critical sections · ab28bd23
由 Paolo Bonzini 提交于 7月 09, 2015
```
Otherwise, grace periods are detected too early!
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
ab28bd23

scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158) · c170aad8

由 Paolo Bonzini 提交于 7月 21, 2015

This is a guest-triggerable buffer overflow present in QEMU 2.2.0
and newer.  scsi_cdb_length returns -1 as an error value, but the
caller does not check it.

Luckily, the massive overflow means that QEMU will just SIGSEGV,
making the impact much smaller.
Reported-by: NZhu Donghai (朱东海) <donghai.zdh@alibaba-inc.com>
Fixes: 1894df02Reviewed-by: NFam Zheng <famz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c170aad8

vnc: fix memory leak · 60928458

由 Gonglei 提交于 7月 22, 2015

If vnc's password is configured, it will leak memory
which cipher variable pointed on every vnc connection.

Cc: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: NDaniel P. Berrange <berrange@redhat.com>
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Message-Id: <1437556133-11268-1-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

60928458

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150723' into staging · 30fdfae4

由 Peter Maydell 提交于 7月 24, 2015

Last minute fixes for 2.4.

# gpg: Signature made Fri Jul 24 04:42:31 2015 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20150723:
  tcg/optimize: fix tcg_opt_gen_movi
  tcg/aarch64: use 32-bit offset for 32-bit softmmu emulation
  tcg/aarch64: use 32-bit offset for 32-bit user-mode emulation
  tcg/aarch64: add ext argument to tcg_out_insn_3310
  tcg/i386: Extend addresses for 32-bit guests
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

30fdfae4

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20150723.0' into staging · f75b7098

由 Peter Maydell 提交于 7月 24, 2015

VFIO fixes for v2.4.0-rc3
- Fix Realtek NIC quirk (Alex Williamson)
- Restore bootindex functionality (Alex Williamson)

# gpg: Signature made Thu Jul 23 19:51:23 2015 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-fixes-20150723.0:
  vfio/pci: Fix bootindex
  vfio/pci: Fix RTL8168 NIC quirks
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

f75b7098

tcg/optimize: fix tcg_opt_gen_movi · 96152126

由 Aurelien Jarno 提交于 7月 10, 2015

Due to a copy&paste, the new op value is tested against mov_i32 instead
of movi_i32. The test is therefore always false. Fix that.
Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

96152126

tcg/aarch64: use 32-bit offset for 32-bit softmmu emulation · 80adb8fc

由 Richard Henderson 提交于 7月 23, 2015

Similar to the same fix for user-mode, except this instance
occurs on the softmmu path.  Again, the tlb addend must be
the base register, while the guest address is the index.
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

80adb8fc

tcg/aarch64: use 32-bit offset for 32-bit user-mode emulation · ffc63728

由 Paolo Bonzini 提交于 7月 15, 2015

Thanks to the previous patch, it is now easy for tcg_out_qemu_ld and
tcg_out_qemu_st to use a 32-bit zero extended offset.  However, the
guest base register x28 must be the base and addr_reg must be the
index.
Reported-by: NLeon Alrae <leon.alrae@imgtec.com>
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ffc63728

tcg/aarch64: add ext argument to tcg_out_insn_3310 · 6c0f0c0f

由 Paolo Bonzini 提交于 7月 15, 2015

The new argument lets you pick uxtw or uxtx mode for the offset
register.  For now, all callers pass TCG_TYPE_I64 so that uxtx
is generated.  The bits for uxtx are removed from I3312_TO_I3310.
Reported-by: NLeon Alrae <leon.alrae@imgtec.com>
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-2-git-send-email-pbonzini@redhat.com>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

6c0f0c0f

tcg/i386: Extend addresses for 32-bit guests · ee8ba9e4

由 Richard Henderson 提交于 7月 16, 2015

Removing the ??? comment explaining why it (mostly) worked.
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1437081950-7206-2-git-send-email-rth@twiddle.net>

ee8ba9e4

23 7月, 2015 8 次提交

Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging · 12e21eb0

由 Peter Maydell 提交于 7月 23, 2015

NUMA queue, 2015-07-22

# gpg: Signature made Wed Jul 22 19:11:04 2015 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/numa-pull-request:
  hostmem: Fix qemu_opt_get_bool() crash in host_memory_backend_init()
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

12e21eb0

qemu-char: Fix missed data on unix socket · 4bf1cb03

由 Nils Carlson 提交于 7月 19, 2015

Commit 812c1057 introduced HUP detection on unix and tcp sockets prior
to a read in tcp_chr_read. This unfortunately broke CloudStack 4.2
which relied on the old behaviour where data on a socket was readable
even if a HUP was present.

A working solution is to properly check the return values from recv,
handling a closed socket once there is no more data to read.

Also enable polling for G_IO_NVAL to ensure the callback is called
for all possible events as these should now be possible to handle
with the improved error detection.
Signed-off-by: NNils Carlson <pyssling@ludd.ltu.se>
Message-Id: <1437338396-22336-1-git-send-email-pyssling@ludd.ltu.se>
[Do not handle EINTR; use socket_error(). - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4bf1cb03

P
qemu-char: handle EINTR for TCP character devices · 9172f428
由 Paolo Bonzini 提交于 7月 21, 2015
```
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
9172f428

exec.c: Use atomic_rcu_read() to access dispatch in memory_region_section_get_iotlb() · 0b8e2c10

由 Peter Maydell 提交于 7月 20, 2015

When accessing the dispatch pointer in an AddressSpace within an RCU
critical section we should always use atomic_rcu_read(). Fix an
access within memory_region_section_get_iotlb() which was incorrectly
doing a direct pointer access.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Message-Id: <1437391637-31576-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0b8e2c10

vfio/pci: Fix bootindex · 759b484c

由 Alex Williamson 提交于 7月 22, 2015

bootindex was incorrectly changed to a device Property during the
platform code split, resulting in it no longer working.  Remove it.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v2.3+

759b484c

vfio/pci: Fix RTL8168 NIC quirks · 69970fce

由 Alex Williamson 提交于 7月 22, 2015

The RTL8168 quirk correctly describes using bit 31 as a signal to
mark a latch/completion, but the code mistakenly uses bit 28. This
causes the Realtek driver to spin on this register for quite a while,
20k cycles on Windows 7 v7.092 driver. Then it gets frustrated and
tries to set the bit itself and spins for another 20k cycles. For
some this still results in a working driver, for others not. About
the only thing the code really does in its current form is protect
the guest from sneaking in writes to the real hardware MSI-X table.
The fix is obviously to use bit 31 as we document that we should.

The other problem doesn't seem to affect current drivers as nobody
seems to use these window registers for writes to the MSI-X table, but
we need to use the stored data when a write is triggered, not the
value of the current write, which only provides the offset.

Note that only the Windows drivers from Realtek seem to use these
registers, the Microsoft drivers provided with Windows 8.1 do not
access them, nor do Linux in-kernel drivers.

Link: https://bugs.launchpad.net/qemu/+bug/1384892Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v2.1+

69970fce

hostmem: Fix qemu_opt_get_bool() crash in host_memory_backend_init() · 6b269967

由 Eduardo Habkost 提交于 7月 16, 2015

This fixes the following crash, introduced by commit
49d2e648:

  $ gdb --args qemu-system-x86_64 -machine pc,mem-merge=off -object memory-backend-ram,id=ram-node0,size=1024
  [...]
  Program received signal SIGABRT, Aborted.
  (gdb) bt
  #0  0x00007ffff253b8c7 in raise () at /lib64/libc.so.6
  #1  0x00007ffff253d52a in abort () at /lib64/libc.so.6
  #2  0x00007ffff253446d in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007ffff2534522 in  () at /lib64/libc.so.6
  #4  0x00005555558bb80a in qemu_opt_get_bool_helper (opts=0x55555621b650, name=name@entry=0x5555558ec922 "mem-merge", defval=defval@entry=true, del=del@entry=false) at qemu/util/qemu-option.c:388
  #5  0x00005555558bbb5a in qemu_opt_get_bool (opts=<optimized out>, name=name@entry=0x5555558ec922 "mem-merge", defval=defval@entry=true) at qemu/util/qemu-option.c:398
  #6  0x0000555555720a24 in host_memory_backend_init (obj=0x5555562ac970) at qemu/backends/hostmem.c:226

Instead of using qemu_opt_get_bool(), that didn't work with
qemu_machine_opts for a long time, we can use the corresponding
MachineState fields.
Reviewed-by: NMarcel Apfelbaum <marcel@redhat.com>
Signed-off-by: NEduardo Habkost <ehabkost@redhat.com>

6b269967

P
Update version for v2.4.0-rc2 release · b69b3053
由 Peter Maydell 提交于 7月 22, 2015
```
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
```
b69b3053

22 7月, 2015 10 次提交

Merge remote-tracking branch 'remotes/elmarco/tags/for-upstream' into staging · 3edf6b3f

由 Peter Maydell 提交于 7月 22, 2015

qxl: build fix for 2.4

# gpg: Signature made Wed Jul 22 15:55:00 2015 BST using DSA key ID F43F0992
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>"
# gpg:                 aka "Marc-Andre Lureau <marcandre.lureau@gmail.com>"
# gpg:                 aka "Marc-Andre Lureau <marc-andre.lureau@nokia.com>"
# gpg:                 aka "Marc-André Lureau <marc-andre.lureau@nokia.com>"
# gpg:                 aka "Marc-André Lureau (elmarco) <marcandre.lureau@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 7346 2483 9404 4E20 ABFF  7D48 D864 9487 F43F 0992

* remotes/elmarco/tags/for-upstream:
  qxl: Fix new function name for spice-server library
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

3edf6b3f

qxl: Fix new function name for spice-server library · a52b2cbf

由 Frediano Ziglio 提交于 7月 20, 2015

The new spice-server function to limit the number of monitors (0.12.6)
changed while development from spice_qxl_set_monitors_config_limit to
spice_qxl_max_monitors (accepted upstream).
By mistake I post patch with former name.
This patch fix the function name.
Signed-off-by: NFrediano Ziglio <fziglio@redhat.com>
Acked-by: NChristophe Fergeau <cfergeau@redhat.com>
Acked-by: NMartin Kletzander <mkletzan@redhat.com>
Signed-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>

a52b2cbf

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging · dc94bd91

由 Peter Maydell 提交于 7月 22, 2015

# gpg: Signature made Wed Jul 22 12:43:35 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  AioContext: optimize clearing the EventNotifier
  AioContext: fix broken placement of event_notifier_test_and_clear
  AioContext: fix broken ctx->dispatching optimization
  aio-win32: reorganize polling loop
  tests: remove irrelevant assertions from test-aio
  qemu-timer: initialize "timers_done_ev" to set
  mirror: Speed up bitmap initial scanning
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

dc94bd91

AioContext: optimize clearing the EventNotifier · 05e514b1

由 Paolo Bonzini 提交于 7月 21, 2015

It is pretty rare for aio_notify to actually set the EventNotifier.  It
can happen with worker threads such as thread-pool.c's, but otherwise it
should never be set thanks to the ctx->notify_me optimization.  The
previous patch, unfortunately, added an unconditional call to
event_notifier_test_and_clear; now add a userspace fast path that
avoids the call.

Note that it is not possible to do the same with event_notifier_set;
it would break, as proved (again) by the included formal model.

This patch survived over 3000 reboots on aarch64 KVM.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-7-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

05e514b1

AioContext: fix broken placement of event_notifier_test_and_clear · 21a03d17

由 Paolo Bonzini 提交于 7月 21, 2015

event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread       vcpu thread                     worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                                                     bh->scheduled = 1
                                                     event_notifier_set
                    lock_iothread
                    notify_me = 3
                    ppoll
                    notify_me = 1
                    aio_dispatch
                     aio_bh_poll
                      thread_pool_completion_bh
                                                     bh->scheduled = 1
                                                     event_notifier_set
                     node->io_read(node->opaque)
                      event_notifier_test_and_clear
   ppoll
   *** hang ***

"Tracing" with qemu_clock_get_ns shows pretty much the same behavior as
in the previous bug, so there are no new tricks here---just stare more
at the code until it is apparent.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  The next one optimizes it,
with help (this time for real rather than a posteriori :)) from
another, similar formal model.
Reported-by: NRichard W. M. Jones <rjones@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-6-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

21a03d17

AioContext: fix broken ctx->dispatching optimization · eabc9779

由 Paolo Bonzini 提交于 7月 21, 2015

This patch rewrites the ctx->dispatching optimization, which was the cause
of some mysterious hangs that could be reproduced on aarch64 KVM only.
The hangs were indirectly caused by aio_poll() and in particular by
flash memory updates's call to blk_write(), which invokes aio_poll().
Fun stuff: they had an extremely short race window, so much that
adding all kind of tracing to either the kernel or QEMU made it
go away (a single printf made it half as reproducible).

On the plus side, the failure mode (a hang until the next keypress)
made it very easy to examine the state of the process with a debugger.
And there was a very nice reproducer from Laszlo, which failed pretty
often (more than half of the time) on any version of QEMU with a non-debug
kernel; it also failed fast, while still in the firmware.  So, it could
have been worse.

For some unknown reason they happened only with virtio-scsi, but
that's not important.  It's more interesting that they disappeared with
io=native, making thread-pool.c a likely suspect for where the bug arose.
thread-pool.c is also one of the few places which use bottom halves
across threads, by the way.

I hope that no other similar bugs exist, but just in case :) I am
going to describe how the successful debugging went...  Since the
likely culprit was the ctx->dispatching optimization, which mostly
affects bottom halves, the first observation was that there are two
qemu_bh_schedule() invocations in the thread pool: the one in the aio
worker and the one in thread_pool_completion_bh.  The latter always
causes the optimization to trigger, the former may or may not.  In
order to restrict the possibilities, I introduced new functions
qemu_bh_schedule_slow() and qemu_bh_schedule_fast():

     /* qemu_bh_schedule_slow: */
     ctx = bh->ctx;
     bh->idle = 0;
     if (atomic_xchg(&bh->scheduled, 1) == 0) {
         event_notifier_set(&ctx->notifier);
     }

     /* qemu_bh_schedule_fast: */
     ctx = bh->ctx;
     bh->idle = 0;
     assert(ctx->dispatching);
     atomic_xchg(&bh->scheduled, 1);

Notice how the atomic_xchg is still in qemu_bh_schedule_slow().  This
was already debated a few months ago, so I assumed it to be correct.
In retrospect this was a very good idea, as you'll see later.

Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't
trigger the assertion (as expected).  Changing the worker's invocation
to qemu_bh_schedule_slow() didn't hide the bug (another assumption
which luckily held).  This already limited heavily the amount of
interaction between the threads, hinting that the problematic events
must have triggered around thread_pool_completion_bh().

As mentioned early, invoking a debugger to examine the state of a
hung process was pretty easy; the iothread was always waiting on a
poll(..., -1) system call.  Infinite timeouts are much rarer on x86,
and this could be the reason why the bug was never observed there.
With the buggy sequence more or less resolved to an interaction between
thread_pool_completion_bh() and poll(..., -1), my "tracing" strategy was
to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping
that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and
qemu_bh_schedule_fast() would provide some hint.  The output was:

    (gdb) p last_prepare
    $3 = 103885451
    (gdb) p last_dispatch
    $4 = 103876492
    (gdb) p last_poll
    $5 = 115909333
    (gdb) p last_schedule
    $6 = 115925212

Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch().
This makes little sense unless there is an aio_poll() call involved,
and indeed with a slightly different instrumentation you can see that
there is one:

    (gdb) p last_prepare
    $3 = 107569679
    (gdb) p last_dispatch
    $4 = 107561600
    (gdb) p last_aio_poll
    $5 = 110671400
    (gdb) p last_schedule
    $6 = 110698917

So the scenario becomes clearer:

   iothread                   VCPU thread
--------------------------------------------------------------------------
   aio_ctx_prepare
   aio_ctx_check
   qemu_poll_ns(timeout=-1)
                              aio_poll
                                aio_dispatch
                                  thread_pool_completion_bh
                                    qemu_bh_schedule()

At this point bh->scheduled = 1 and the iothread has not been woken up.
The solution must be close, but this alone should not be a problem,
because the bottom half is only rescheduled to account for rare situations
(see commit 3c80ca15, thread-pool: avoid deadlock in nested aio_poll()
calls, 2014-07-15).

Introducing a third thread---a thread pool worker thread, which
also does qemu_bh_schedule()---does bring out the problematic case.
The third thread must be awakened *after* the callback is complete and
thread_pool_completion_bh has redone the whole loop, explaining the
short race window.  And then this is what happens:

                                                      thread pool worker
--------------------------------------------------------------------------
                                                      <I/O completes>
                                                      qemu_bh_schedule()

Tada, bh->scheduled is already 1, so qemu_bh_schedule() does nothing
and the iothread is never woken up.  This is where the bh->scheduled
optimization comes into play---it is correct, but removing it would
have masked the bug.

So, what is the bug?

Well, the question asked by the ctx->dispatching optimization ("is any
active aio_poll dispatching?") was wrong.  The right question to ask
instead is "is any active aio_poll *not* dispatching", i.e. in the prepare
or poll phases?  In that case, the aio_poll is sleeping or might go to
sleep anytime soon, and the EventNotifier must be invoked to wake
it up.

In any other case (including if there is *no* active aio_poll at all!)
we can just wait for the next prepare phase to pick up the event (e.g. a
bottom half); the prepare phase will avoid the blocking and service the
bottom half.

Expressing the invariant with a logic formula, the broken one looked like:

   !(exists(thread): in_dispatching(thread)) => !optimize

or equivalently:

   !(exists(thread):
          in_aio_poll(thread) && in_dispatching(thread)) => !optimize

In the correct one, the negation is in a slightly different place:

   (exists(thread):
         in_aio_poll(thread) && !in_dispatching(thread)) => !optimize

or equivalently:

   (exists(thread): in_prepare_or_poll(thread)) => !optimize

Even if the difference boils down to moving an exclamation mark :)
the implementation is quite different.  However, I think the new
one is simpler to understand.

In the old implementation, the "exists" was implemented with a boolean
value.  This didn't really support well the case of multiple concurrent
event loops, but I thought that this was okay: aio_poll holds the
AioContext lock so there cannot be concurrent aio_poll invocations, and
I was just considering nested event loops.  However, aio_poll _could_
indeed be concurrent with the GSource.  This is why I came up with the
wrong invariant.

In the new implementation, "exists" is computed simply by counting how many
threads are in the prepare or poll phases.  There are some interesting
points to consider, but the gist of the idea remains:

1) AioContext can be used through GSource as well; as mentioned in the
patch, bit 0 of the counter is reserved for the GSource.

2) the counter need not be updated for a non-blocking aio_poll, because
it won't sleep forever anyway.  This is just a matter of checking
the "blocking" variable.  This requires some changes to the win32
implementation, but is otherwise not too complicated.

3) as mentioned above, the new implementation will not call aio_notify
when there is *no* active aio_poll at all.  The tests have to be
adjusted for this change.  The calls to aio_notify in async.c are fine;
they only want to kick aio_poll out of a blocking wait, but need not
do anything if aio_poll is not running.

4) nested aio_poll: these just work with the new implementation; when
a nested event loop is invoked, the outer event loop is never in the
prepare or poll phases.  The outer event loop thus has already decremented
the counter.
Reported-by: NRichard W. M. Jones <rjones@redhat.com>
Reported-by: NLaszlo Ersek <lersek@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-5-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

eabc9779

aio-win32: reorganize polling loop · 6493c975

由 Paolo Bonzini 提交于 7月 21, 2015

Preparatory bugfixes and tweaks to the loop before the next patch:

- disable dispatch optimization during aio_prepare.  This fixes a bug.

- do not modify "blocking" until after the first WaitForMultipleObjects
call.  This is needed in the next patch.

- change the loop to do...while.  This makes it obvious that the loop
is always entered at least once.  In the next patch this is important
because the first iteration undoes the ctx->notify_me increment that
happened before entering the loop.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-4-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

6493c975

tests: remove irrelevant assertions from test-aio · 12d69ac0

由 Paolo Bonzini 提交于 7月 21, 2015

In these tests, the purpose of the initial calls to aio_poll and
g_main_context_iteration is simply to put the AioContext in a
known state; the return value of the function does not really
matter.  The next patch will change those return values; change
the assertions to a while loop which expresses the intention
better.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-3-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

12d69ac0

qemu-timer: initialize "timers_done_ev" to set · e4efd8a4

由 Paolo Bonzini 提交于 7月 21, 2015

The normal value for the event is to be set.  If we do not do
this, pause_all_vcpus (through qemu_clock_enable) hangs unless
timerlist_run_timers has been run at least once for the timerlist.
This can happen with the following patches, that make aio_notify do
nothing most of the time.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-2-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

e4efd8a4

mirror: Speed up bitmap initial scanning · 99900697

由 Fam Zheng 提交于 7月 09, 2015

Limiting to sectors_per_chunk for each bdrv_is_allocated_above is slow,
because the underlying protocol driver would issue much more queries
than necessary. We should coalesce the query.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: <1436413678-7114-4-git-send-email-famz@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

99900697