提交 · 1ab8f867ef2ef0aa259e0e0d1040d67350af1bec · openeuler / qemu

18 11月, 2014 15 次提交

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging · 1ab8f867

由 Peter Maydell 提交于 11月 18, 2014

Block patches for 2.2.0-rc2

# gpg: Signature made Tue 18 Nov 2014 11:32:55 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  block/raw-posix: Catch fsync() errors
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Fix preallocating write() loop
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

1ab8f867

Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2' into staging · ea5b201a

由 Peter Maydell 提交于 11月 18, 2014

Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
migration by Michael S. Tsirkin.

# gpg: Signature made Tue 18 Nov 2014 11:23:00 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/for-2.2:
  migration: fix parameter validation on ram load
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

ea5b201a

linux-headers: update to 3.18-rc5 · 444b1996

由 Ard Biesheuvel 提交于 11月 17, 2014

This updates the Linux header to version 3.18-rc5, adding support for
(among other things) read-only memslots on ARM and arm64.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Message-id: 1416248898-6302-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

444b1996

migration: fix parameter validation on ram load · 0be839a2

由 Michael S. Tsirkin 提交于 11月 12, 2014

During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840

Note: follow-up patches add extra checks on each block->host access.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NAmit Shah <amit.shah@redhat.com>

0be839a2

block/raw-posix: Catch fsync() errors · 098ffa66

由 Max Reitz 提交于 11月 18, 2014

fsync() may fail, and that case should be handled.
Reported-by: NLászló Érsek <lersek@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

098ffa66

block/raw-posix: Only sync after successful preallocation · 731de380

由 Max Reitz 提交于 11月 18, 2014

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

731de380

block/raw-posix: Fix preallocating write() loop · 39411cf3

由 Max Reitz 提交于 11月 18, 2014

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.
Reported-by: NLászló Érsek <lersek@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

39411cf3

exec: Handle multipage ranges in invalidate_and_set_dirty() · f874bf90

由 Peter Maydell 提交于 11月 16, 2014

The code in invalidate_and_set_dirty() needs to handle addr/length
combinations which cross guest physical page boundaries. This can happen,
for example, when disk I/O reads large blocks into guest RAM which previously
held code that we have cached translations for. Unfortunately we were only
checking the clean/dirty status of the first page in the range, and then
were calling a tb_invalidate function which only handles ranges that don't
cross page boundaries. Fix the function to deal with multipage ranges.

The symptoms of this bug were that guest code would misbehave (eg segfault),
in particular after a guest reboot but potentially any time the guest
reused a page of its physical RAM for new code.

Cc: qemu-stable@nongnu.org
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1416167061-13203-1-git-send-email-peter.maydell@linaro.org

f874bf90

Merge remote-tracking branch 'mreitz/block' into queue-block · 86767853

由 Kevin Wolf 提交于 11月 18, 2014

* mreitz/block:
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()

86767853

raw-posix: The SEEK_HOLE code is flawed, rewrite it · d1f06fe6

由 Markus Armbruster 提交于 11月 17, 2014

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

d1f06fe6

raw-posix: SEEK_HOLE suffices, get rid of FIEMAP · c4875e5b

由 Markus Armbruster 提交于 11月 17, 2014

Commit 5500316d (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8a (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0ae (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.htmlSigned-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

c4875e5b

raw-posix: Fix comment for raw_co_get_block_status() · be2ebc6d

由 Markus Armbruster 提交于 11月 17, 2014

Missed in commit 705be728.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

be2ebc6d

target-arm: handle address translations that start at level 3 · d6be29e3

由 Peter Maydell 提交于 11月 13, 2014

The ARMv8 address translation system defines that a page table walk
starts at a level which depends on the translation granule size
and the number of bits of virtual address that need to be resolved.
Where the translation granule is 64KB and the guest sets the
TCR.TxSZ field to between 35 and 39, it's actually possible to
start at level 3 (the final level). QEMU's implementation failed
to handle this case, and so we would set level to 2 and behave
incorrectly (including invoking the C undefined behaviour of
shifting left by a negative number). Correct the code that
determines the starting level to deal with the start-at-3 case,
by replacing the if-else ladder with an expression derived from
the ARM ARM pseudocode version.

This error was detected by the Coverity scan, which spotted
the potential shift by a negative number.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Message-id: 1415890569-7454-1-git-send-email-peter.maydell@linaro.org

d6be29e3

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging · 1aba4be9

由 Peter Maydell 提交于 11月 17, 2014

A smattering of fixes for problems that Coverity reported.

# gpg: Signature made Mon 17 Nov 2014 17:03:25 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  hcd-musb: fix dereference null return value
  target-cris/translate.c: fix out of bounds read
  shpc: fix error propaagation
  qemu-char: fix MISSING_COMMA
  acl: fix memory leak
  nvme: remove superfluous check
  loader: fix NEGATIVE_RETURNS
  qga: fix false negative argument passing
  mips_mipssim: fix use-after-free for filename
  l2tpv3: fix fd leak
  l2tpv3: fix possible double free
  libcacard: fix resource leak
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

1aba4be9

hcd-musb: fix dereference null return value · a9be7657

由 Paolo Bonzini 提交于 11月 17, 2014

usb_ep_get and usb_handle_packet can deal with a NULL device, but we have
to avoid dereferencing NULL pointers when building the id.

Thanks to Gonglei for an initial stab at fixing this.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a9be7657

17 11月, 2014 10 次提交

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging · d8edf52a

由 Peter Maydell 提交于 11月 17, 2014

Update OpenBIOS images

# gpg: Signature made Sat 15 Nov 2014 13:12:02 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

d8edf52a

target-cris/translate.c: fix out of bounds read · fae38221

由 zhanghailiang 提交于 11月 17, 2014

In function t_gen_mov_TN_preg and t_gen_mov_preg_TN, The begin check about the
validity of in-parameter 'r' is useless. We still access cpu_PR[r] in the
follow code if it is invalid. Which will be an out-of-bounds read error.

Fix it by using assert() to ensure it is valid before using it.
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fae38221

shpc: fix error propaagation · 0e8b439a

由 Gonglei 提交于 11月 15, 2014

Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0e8b439a

qemu-char: fix MISSING_COMMA · 86d10328

由 Gonglei 提交于 11月 15, 2014

Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

86d10328

acl: fix memory leak · 6cfcd864

由 Gonglei 提交于 11月 15, 2014

If 'i != index' for all acl->entries, variable
entry leaks the storage it points to.
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6cfcd864

nvme: remove superfluous check · 720fdd6f

由 Gonglei 提交于 11月 15, 2014

Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
((n->bar.aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) > 4095
is always false regardless of the values of its operands.
This occurs as the logical second operand of '||'.
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

720fdd6f

loader: fix NEGATIVE_RETURNS · ddd2eab7

由 Gonglei 提交于 11月 15, 2014

lseek will return -1 on error, g_malloc0(size) and read(,,size)
paramenters cannot be negative. We should add a check for return
value of lseek().
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ddd2eab7

qga: fix false negative argument passing · 1def7454

由 Gonglei 提交于 11月 15, 2014

Function send_response(s, &qdict->base) returns a negative number
when any failures occured. But strerror()'s parameter cannot be
negative. Let's change the testing condition and pass '-ret' to
strerr().
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1def7454

mips_mipssim: fix use-after-free for filename · 77e205a5

由 Gonglei 提交于 11月 15, 2014

May pass freed pointer filename as an argument to error_report.
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77e205a5

l2tpv3: fix fd leak · d4754a95

由 Gonglei 提交于 11月 15, 2014

In this false branch, fd will leak when it is zero.
Change the testing condition.
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
[Fix net_l2tpv3_cleanup as well. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d4754a95

15 11月, 2014 1 次提交

Update OpenBIOS images · 35fb5b73

由 Mark Cave-Ayland 提交于 11月 15, 2014

Update OpenBIOS images to SVN r1327 built from submodule.
Signed-off-by: NMark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

35fb5b73

14 11月, 2014 14 次提交

Merge remote-tracking branch 'remotes/sstabellini/xen-2014-11-14' into staging · 4e70f927

由 Peter Maydell 提交于 11月 14, 2014

* remotes/sstabellini/xen-2014-11-14:
  xen_disk: fix unmapping of persistent grants
  pc: piix4_pm: init legacy PCI hotplug when running on Xen
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

4e70f927

l2tpv3: fix possible double free · 77374582

由 zhanghailiang 提交于 11月 14, 2014

freeaddrinfo(result) does not assign result = NULL, after frees it.
There will be a double free when it goes error case.
It is reported by covertiy.
Reviewed-by: NGonglei <arei.gonglei@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77374582

libcacard: fix resource leak · 5bbebf62

由 zhanghailiang 提交于 11月 14, 2014

In function connect_to_qemu(), getaddrinfo() will allocate memory
that is stored into server, it should be freed by using freeaddrinfo()
before connect_to_qemu() return.

Cc: qemu-stable@nongnu.org
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5bbebf62

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging · b87dcdd0

由 Peter Maydell 提交于 11月 14, 2014

# gpg: Signature made Fri 14 Nov 2014 11:05:54 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info
  block: Fix max nb_sectors in bdrv_make_zero
  ahci: factor out FIS decomposition from handle_cmd
  ahci: Check cmd_fis[1] more explicitly
  ahci: Reorder error cases in handle_cmd
  ahci: Fix FIS decomposition
  ahci: add is_ncq predicate helper
  ide: Correct handling of malformed/short PRDTs
  ahci: unify sglist preparation
  ide: repair PIO transfers for cases where nsector > 1
  ahci: Fix byte count regression for ATAPI/PIO
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

b87dcdd0

xen_disk: fix unmapping of persistent grants · 2f01dfac

由 Roger Pau Monne 提交于 11月 13, 2014

This patch fixes two issues with persistent grants and the disk PV backend
(Qdisk):

 - Keep track of memory regions where persistent grants have been mapped
   since we need to unmap them as a whole. It is not possible to unmap a
   single grant if it has been batch-mapped. A new check has also been added
   to make sure persistent grants are only used if the whole mapped region
   can be persistently mapped in the batch_maps case.
 - Unmap persistent grants before switching to the closed state, so the
   frontend can also free them.
Signed-off-by: NRoger Pau Monné <roger.pau@citrix.com>
Reported-by: NGeorge Dunlap <george.dunlap@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

2f01dfac

pc: piix4_pm: init legacy PCI hotplug when running on Xen · 91ab2ed7

由 Igor Mammedov 提交于 11月 14, 2014

If user starts QEMU with "-machine pc,accel=xen", then
compat property in xenfv won't work and it would cause error:
"Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
when PCI device is added with -device on QEMU CLI.

From: Igor Mammedov <imammedo@redhat.com>

In case of Xen instead of using compat property, just use the fact
that xen doesn't use QEMU's fw_cfg/acpi tables to switch piix4_pm
into legacy PCI hotplug mode when Xen is enabled.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NLi Liang <liang.z.li@intel.com>
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>

91ab2ed7

vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info · 5f583307

由 Fam Zheng 提交于 11月 14, 2014

When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 1415938161-16217-1-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

5f583307

block: Fix max nb_sectors in bdrv_make_zero · f3a9cfdd

由 Fam Zheng 提交于 11月 10, 2014

In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX /
BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

f3a9cfdd

ahci: factor out FIS decomposition from handle_cmd · 107f0d46

由 John Snow 提交于 11月 03, 2014

In order to make handle_cmd more readable at the macro level,
the details of how to decompose particular types of FIS packets
are left to helper functions.

In our case, the only type of FIS packet we currently expect to
see is a Register H2D FIS packet, but the gory details of its
decomposition are of no particular interest in handle_cmd.

This patch keeps the receipt of FIS packets and the decomposition
thereof separated to two different functions.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1415058979-16604-6-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

107f0d46

ahci: Check cmd_fis[1] more explicitly · 102e5625

由 John Snow 提交于 11月 03, 2014

Instead of checking for a known byte, inspect the
fields of this byte explicitly to produce more meaningful
error messages and improve the readability of this section.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1415058979-16604-5-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

102e5625

ahci: Reorder error cases in handle_cmd · 36ab3c34

由 John Snow 提交于 11月 03, 2014

Error checking in ahci's handle_cmd is re-ordered so that we
initialize as few things as possible before we've done our
sanity checking. This simplifies returning from this call
in case of an error.

A check to make sure the DMA memory map succeeds with the
correct size is also added, and the debug print of the
command fis is cleaned up with its size corrected.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1415058979-16604-4-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

36ab3c34

ahci: Fix FIS decomposition · 1cbdd968

由 John Snow 提交于 11月 03, 2014

This patch introduces a few changes to how FIS packets are
deciphered in the AHCI virtual device. The summary of
changes can be grouped into two pieces:

[A] Changes to how we apply a preliminary sieve to FISes,
[B] Changes in how we internalize a decomposed FIS.

== Changes to how we apply a preliminary sieve to FISes ==

(1) Packets may now either update the Control register or
    the Command register, but not both. This is according
    to the SATA 3.2 specification which states:
    "...the device either initiates processing of the command
    indicated in the Command register or initiates processing
    of the control request indicated [...] depending on the
    state of the C bit in the FIS."

    See SATA 3.2 section 10.5.5.4, "Reception" in the 10.5.5
    "Register Host to Device FIS" section.

    This change accounts for the first two regions of change
    within the diff. All other changes belong to the following
    changes.

== Changes in how we internalize a decomposed FIS ==

(2) Instead of trying to extract the sector number out of the
    FIS from bytes 4-10 and setting it with ide_set_sector,
    we set the appropriate IDEState registers and trust that
    ide_get_sector can retrieve the correct sector later.

    By "constructing" the sector for use with ide_set_sector,
    we are duplicating the mechanisms of ide_get_sector.
    This change makes the FIS decomposition more obvious.

    SATA 3.2 as a specification does not make the legacy
    register mapping with respect to the D2H FIS obvious.
    However, SATA 3.2 section 10.5.5.1 "Register Host to
    Device FIS layout" describes all of the "cmd_fis"
    bytes:

    0 - FIS Type (0x27)
    1 - Port Multiplier Port and Command Update flag
    2 - ATA Command
    3 - Features_Low
    4 - LBA 7:0
    5 - LBA 15:8
    6 - LBA 23:16
    7 - Device, AKA "Drive Select."
    8 - LBA 31:24
    9 - LBA 39:32
    10 - LBA 47:40
    11 - Features_High
    12 - Count Low
    13 - Count High
    14 - ICC
    15 - Control
    16-19 - Auxiliary (for NCQ, defined per-command)

    Most of these registers map to existing IDEState registers
    in obvious ways, especially features, select, hob_features,
    and nsector (count). ICC is reserved in older specifications
    but is not supported in our implementation, and remains
    unused here. The Control register is not valid for a command
    that is trying to update the command register and is to be
    considered reserved at this point.

    What is not obvious is the LBA register mappings, but SATA 1.0
    can help inform of us legacy device support, see SATA 1.0 section
    8.5.2 "Register - Host to Device."

    LBA 7:0   - Sector Number    (sector)
    LBA 15:8  - Cyl Low          (lcyl)
    LBA 23:16 - Cyl High         (hcyl)
    LBA 31:24 - Sector Num Exp.  (hob_sector)
    LBA 39:32 - Cyl Low Exp.     (hob_lcyl)
    LBA 47:40 - Cyl High Exp.    (hob_hcyl)

    These mappings help guide which registers the FIS should be decomposed
    into/towards for CHS, LBA28 and LBA48 commands.

    As a note: The prior confusion that can be seen in the documentation
    arises from the fact that CHS and LBA28 commands use the low nybble
    of the drive select register to store LBA 27:24, whereas LNA48 commands
    use the hob_sector, hob_lcyl and hob_hcyl registers as explained above.

    The decomposition as it stands now will correctly decompose CHS, LBA28
    and LBA48 commands into their appropriate registers where the core
    IDE/ATAPI layers can deal with them correctly.

    See the below point for more information.

(3) We save cmd_fis[7] as ide_state->select, which informs
    decisions about if we are using LBA or CHS.
    This corrects a bug in AHCI wherein we attempt to set and/or
    retrieve the sector number by using ide_set_sector and
    ide_get_sector, which depend on the select register to
    determine if we are using LBA or CHS.

    Without this adjustment, LBA48 read/writes are currently
    broken. Thanks to Eniac Zheng @ HP for pointing this out.

(4) Save cmd_fis[11] as ide_state->hob_feature, as defined in SATA 3.2.

(5) For several ATA commands, the sector count register set to 0
    is a magic number that means 256 sectors. For LBA48 commands,
    this means 65,536 sectors. We drop the magic sector correction
    here, and trust the ide core layer to handle the conversion
    appropriately, in ide_cmd_lba48_transform(). As it stands,
    the current AHCI code is only compliant with LBA28 commands.
    By simply removing the magic, it will work with LBA28 and LBA48.

(6) We expand FIS decomposition to include both ATAPI and IDE devices.
    We leave the logic of determining if the fields are valid or not
    to the respective layers.

    This change intends to make it clearer that AHCI is only a
    composition mechanism for the FIS packets: the meanings of
    the registers is best left to the implementation layers for
    those devices.

(7) Forcefully setting the feature, hcyl and lcyl registers for ATAPI
    commands is removed.
    - The hcyl and lcyl magic present here is valid at boot only,
      and should not be overridden for every PACKET command.
    - The feature register is defined as valid for the PACKET command,
      so we should not suppress it. The ATAPI layer does not even
      currently depend on or require 0x01 as mandatory.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1415058979-16604-3-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

1cbdd968

ahci: add is_ncq predicate helper · 72a065db

由 John Snow 提交于 11月 03, 2014

A small helper to determine which S/ATA commands
are destined to be routed to the NCQ pathways.

This references SATA 3.2 section 13.6,
Native Command Queueing. See sections 13.6.4,
13.6.5, 13.6.6, 13.6.7 and 13.6.8 for all
SATA commands considered to be part of the
NCQ feature set. This is summarized in a small
list in section 13.6.3.1 and again in 13.6.3.2.

Not all of these NCQ commands are currently supported,
so the error pathways are adjusted slightly to be more
informative in the case they are encountered.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1415058979-16604-2-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

72a065db

ide: Correct handling of malformed/short PRDTs · 3251bdcf

由 John Snow 提交于 10月 31, 2014

This impacts both BMDMA and AHCI HBA interfaces for IDE.
Currently, we confuse the difference between a PRDT having
"0 bytes" and a PRDT having "0 complete sectors."

When we receive an incomplete sector, inconsistent error checking
leads to an infinite loop wherein the call succeeds, but it
didn't give us enough bytes -- leading us to re-call the
DMA chain over and over again. This leads to, in the BMDMA case,
leaked memory for short PRDTs, and infinite loops and resource
usage in the AHCI case.

The .prepare_buf() callback is reworked to return the number of
bytes that it successfully prepared. 0 is a valid, non-error
answer that means the table was empty and described no bytes.
-1 indicates an error.

Our current implementation uses the io_buffer in IDEState to
ultimately describe the size of a prepared scatter-gather list.
Even though the AHCI PRDT/SGList can be as large as 256GiB, the
AHCI command header limits transactions to just 4GiB. ATA8-ACS3,
however, defines the largest transaction to be an LBA48 command
that transfers 65,536 sectors. With a 512 byte sector size, this
is just 32MiB.

Since our current state structures use the int type to describe
the size of the buffer, and this state is migrated as int32, we
are limited to describing 2GiB buffer sizes unless we change the
migration protocol.

For this reason, this patch begins to unify the assertions in the
IDE pathways that the scatter-gather list provided by either the
AHCI PRDT or the PCI BMDMA PRDs can only describe, at a maximum,
2GiB. This should be resilient enough unless we need a sector
size that exceeds 32KiB.

Further, the likelihood of any guest operating system actually
attempting to transfer this much data in a single operation is
very slim.

To this end, the IDEState variables have been updated to more
explicitly clarify our maximum supported size. Callers to the
prepare_buf callback have been reworked to understand the new
return code, and all versions of the prepare_buf callback have
been adjusted accordingly.

Lastly, the ahci_populate_sglist helper, relied upon by the
AHCI implementation of .prepare_buf() as well as the PCI
implementation of the callback have had overflow assertions
added to help make clear the reasonings behind the various
type changes.

[Added %d -> %"PRId64" fix John sent because off_pos changed from int to
int64_t.
--Stefan]
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1414785819-26209-4-git-send-email-jsnow@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

3251bdcf