1. 18 Nov, 2014 15 commits
    • Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging · 1ab8f867
      Peter Maydell committed
      Block patches for 2.2.0-rc2
      
      # gpg: Signature made Tue 18 Nov 2014 11:32:55 GMT using RSA key ID C88F2FD6
      # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
      
      * remotes/kevin/tags/for-upstream:
        block/raw-posix: Catch fsync() errors
        block/raw-posix: Only sync after successful preallocation
        block/raw-posix: Fix preallocating write() loop
        raw-posix: The SEEK_HOLE code is flawed, rewrite it
        raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
        raw-posix: Fix comment for raw_co_get_block_status()
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2' into staging · ea5b201a
      Peter Maydell committed
      Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
      migration by Michael S. Tsirkin.
      
      # gpg: Signature made Tue 18 Nov 2014 11:23:00 GMT using RSA key ID 854083B6
      # gpg: Good signature from "Amit Shah <amit@amitshah.net>"
      # gpg:                 aka "Amit Shah <amit@kernel.org>"
      # gpg:                 aka "Amit Shah <amitshah@gmx.net>"
      
      * remotes/amit-migration/tags/for-2.2:
        migration: fix parameter validation on ram load
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • linux-headers: update to 3.18-rc5 · 444b1996
      Ard Biesheuvel committed
      This updates the Linux header to version 3.18-rc5, adding support for
      (among other things) read-only memslots on ARM and arm64.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Message-id: 1416248898-6302-1-git-send-email-ard.biesheuvel@linaro.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • migration: fix parameter validation on ram load · 0be839a2
      Michael S. Tsirkin committed
      During migration, the values read from the migration stream during RAM load
      are not validated, in particular the offset in host_from_stream_offset() and
      the length of the writes in the callers of that function.
      
      To fix this, we need to make sure that the [offset, offset + length]
      range fits into one of the allocated memory regions.
      
      Validating addr < len should be sufficient since data seems to always be
      managed in TARGET_PAGE_SIZE chunks.
      
      Fixes: CVE-2014-7840
      
      Note: follow-up patches add extra checks on each block->host access.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
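
      The range check described in this message can be sketched as follows.
      This is a minimal illustration, not the code from the patch: the
      function name range_fits() and its parameters are hypothetical, and
      the comparison is arranged so that offset + len cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether the byte range starting at
 * `offset` with length `len` lies entirely inside a memory block of
 * `block_len` bytes. Comparing against block_len - offset avoids
 * computing offset + len, which could wrap around.
 */
bool range_fits(uint64_t offset, uint64_t len, uint64_t block_len)
{
    return offset <= block_len && len <= block_len - offset;
}
```

      A loader would reject the incoming stream whenever such a check fails
      for every known block, instead of writing through the unchecked offset.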
    • block/raw-posix: Catch fsync() errors · 098ffa66
      Max Reitz committed
      fsync() may fail, and that case should be handled.
      Reported-by: László Érsek <lersek@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block/raw-posix: Only sync after successful preallocation · 731de380
      Max Reitz committed
      The loop which filled the file with zeroes may have been left early due
      to an error. In that case, the fsync() should be skipped.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block/raw-posix: Fix preallocating write() loop · 39411cf3
      Max Reitz committed
      write() may write fewer bytes than requested; in this case, the number of
      bytes written is returned. This is the byte count we should be
      subtracting from the number of bytes still to be written, and not the
      byte count we requested to write.
      Reported-by: László Érsek <lersek@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
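
      The three raw-posix preallocation fixes (short writes, syncing only
      after success, and checking fsync()) can be sketched together. This
      is an illustration under assumptions, not the QEMU code: the function
      names write_zeroes() and preallocate_file() and the 64 KiB buffer
      size are made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write `total` zero bytes to fd.
 * Returns 0 on success or a negative errno value. */
static int write_zeroes(int fd, size_t total)
{
    static const char zeroes[65536];
    size_t left = total;

    while (left > 0) {
        size_t want = left < sizeof(zeroes) ? left : sizeof(zeroes);
        ssize_t done = write(fd, zeroes, want);

        if (done < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, retry */
            }
            return -errno;
        }
        if (done == 0) {
            return -EIO;            /* no progress, avoid spinning */
        }
        /* Subtract what was actually written, not what was requested. */
        left -= (size_t)done;
    }
    return 0;
}

/* Hypothetical caller: create `path`, fill it with zeroes, flush it. */
int preallocate_file(const char *path, size_t total)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int ret;

    if (fd < 0) {
        return -errno;
    }
    ret = write_zeroes(fd, total);
    /* Only sync after a successful fill, and report fsync() failure. */
    if (ret == 0 && fsync(fd) < 0) {
        ret = -errno;
    }
    if (close(fd) < 0 && ret == 0) {
        ret = -errno;
    }
    return ret;
}
```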
    • exec: Handle multipage ranges in invalidate_and_set_dirty() · f874bf90
      Peter Maydell committed
      The code in invalidate_and_set_dirty() needs to handle addr/length
      combinations which cross guest physical page boundaries. This can happen,
      for example, when disk I/O reads large blocks into guest RAM which previously
      held code that we have cached translations for. Unfortunately we were only
      checking the clean/dirty status of the first page in the range, and then
      were calling a tb_invalidate function which only handles ranges that don't
      cross page boundaries. Fix the function to deal with multipage ranges.
      
      The symptoms of this bug were that guest code would misbehave (eg segfault),
      in particular after a guest reboot but potentially any time the guest
      reused a page of its physical RAM for new code.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1416167061-13203-1-git-send-email-peter.maydell@linaro.org
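
      The general idea of handling a range that crosses page boundaries
      can be sketched by splitting it into per-page pieces. This is not
      the QEMU implementation: the 4 KiB page size, struct chunk, and
      split_by_page() are assumptions chosen for the illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this example */

struct chunk {
    uint64_t addr;   /* start of the piece */
    uint64_t len;    /* length of the piece, never crossing a page */
};

/*
 * Hypothetical helper: split [addr, addr + len) into pieces that each
 * stay within one page. Stores at most `max` pieces in `out` and
 * returns the total number of pieces the range needs.
 */
unsigned split_by_page(uint64_t addr, uint64_t len,
                       struct chunk *out, unsigned max)
{
    unsigned n = 0;

    while (len > 0) {
        uint64_t room = SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1));
        uint64_t piece = len < room ? len : room;

        if (n < max) {
            out[n].addr = addr;
            out[n].len = piece;
        }
        n++;
        addr += piece;
        len -= piece;
    }
    return n;
}
```

      Each piece can then be checked and invalidated on its own, so a
      routine that only handles single-page ranges is never handed a
      range that spans two pages.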
    • Merge remote-tracking branch 'mreitz/block' into queue-block · 86767853
      Kevin Wolf committed
      * mreitz/block:
        raw-posix: The SEEK_HOLE code is flawed, rewrite it
        raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
        raw-posix: Fix comment for raw_co_get_block_status()
    • raw-posix: The SEEK_HOLE code is flawed, rewrite it · d1f06fe6
      Markus Armbruster committed
      On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
      but not Linux), try_seek_hole() reports trailing data instead.
      
      Additionally, unlikely lseek() failures are treated badly:
      
      * When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
        -ENXIO, there's in fact a trailing hole.  Can happen only when
        something truncated the file since we opened it.
      
      * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
        then try_seek_hole() reports a trailing hole.  This is okay only
        when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
        found by SEEK_HOLE has since become trailing somehow).  For other
        failures (unlikely), it's wrong.
      
      * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
        then try_seek_hole() reports bogus data [-1,start), which its caller
        raw_co_get_block_status() turns into zero sectors of data.  Could
        theoretically lead to infinite loops in code that attempts to scan
        data vs. hole forward.
      
      Rewrite from scratch, with very careful comments.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
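
      One way to probe data versus hole with lseek() while keeping every
      failure distinct is sketched below. It is an illustration written
      for this log, not the rewritten QEMU function: the name
      find_allocation(), its signature, and the choice of -EBUSY for an
      inconsistent answer are assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch. On success returns 0 and sets the outputs:
 *   *data == start: [start, *hole) is data
 *   *hole == start: [start, *data) is a hole
 * On failure returns a negative errno value and leaves the outputs
 * untouched. -ENXIO means `start` is in a trailing hole or past EOF.
 */
int find_allocation(int fd, off_t start, off_t *data, off_t *hole)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
    off_t offs = lseek(fd, start, SEEK_DATA);

    if (offs < 0) {
        return -errno;
    }
    if (offs > start) {
        /* start is inside a hole; the next data begins at offs */
        *hole = start;
        *data = offs;
        return 0;
    }

    /* start is inside data; find where that data ends */
    offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;
    }
    if (offs > start) {
        *data = start;
        *hole = offs;
        return 0;
    }
    return -EBUSY;      /* the file changed between the two calls */
#else
    (void)fd; (void)start; (void)data; (void)hole;
    return -ENOTSUP;    /* no SEEK_DATA/SEEK_HOLE on this platform */
#endif
}
```

      A caller that gets an error other than -ENXIO can fall back to
      treating the whole range as data, which is always safe.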
    • raw-posix: SEEK_HOLE suffices, get rid of FIEMAP · c4875e5b
      Markus Armbruster committed
      Commit 5500316d (May 2012) implemented raw_co_is_allocated() as
      follows:
      
      1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
      
      2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
      
      3. Else pretend there are no holes
      
      Later on, raw_co_is_allocated() was generalized to
      raw_co_get_block_status().
      
      Commit 4f11aa8a (May 2014) changed it to try the three methods in order
      until success, because "there may be implementations which support
      [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
      versa."
      
      Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
      Commit 38c4d0ae (Sep 2014) added it.  Because that's a significant
      speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
      
      As you see, the obvious use of FIEMAP is wrong, and the correct use is
      slow.  I guess this puts it somewhere between -7 "The obvious use is
      wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
      to Misuse scale[*].
      
      "Fortunately", the FIEMAP code is used only when
      
      * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
      
        Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
        was introduced for ext4 and btrfs) and 2012.
      
      * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
      
        Unlikely.
      
      Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
      bugs.  Worse, bugs hiding there can theoretically bite even on a host
      that has SEEK_HOLE/SEEK_DATA.
      
      I don't want to worry about this crap, not even theoretically.  Get
      rid of it.
      
      [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • raw-posix: Fix comment for raw_co_get_block_status() · be2ebc6d
      Markus Armbruster committed
      Missed in commit 705be728.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • target-arm: handle address translations that start at level 3 · d6be29e3
      Peter Maydell committed
      The ARMv8 address translation system defines that a page table walk
      starts at a level which depends on the translation granule size
      and the number of bits of virtual address that need to be resolved.
      Where the translation granule is 64KB and the guest sets the
      TCR.TxSZ field to between 35 and 39, it's actually possible to
      start at level 3 (the final level). QEMU's implementation failed
      to handle this case, and so we would set level to 2 and behave
      incorrectly (including invoking the C undefined behaviour of
      shifting left by a negative number). Correct the code that
      determines the starting level to deal with the start-at-3 case,
      by replacing the if-else ladder with an expression derived from
      the ARM ARM pseudocode version.
      
      This error was detected by the Coverity scan, which spotted
      the potential shift by a negative number.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-id: 1415890569-7454-1-git-send-email-peter.maydell@linaro.org
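
      The starting-level rule can be sketched as a single expression. The
      formula used here, level = 4 - RoundUp((inputsize - grainsize) /
      stride), is my reading of the ARM ARM pseudocode and should be
      checked against the manual. The function name and parameters are
      hypothetical: inputsize is 64 - TxSZ, and grainsize is log2 of the
      granule in bytes (12 for 4KB, 16 for 64KB).

```c
/*
 * Hypothetical sketch of the starting level of a page table walk.
 * Each level resolves `stride` bits, where stride = grainsize - 3
 * because a table of one granule holds 8-byte descriptors.
 */
int start_level(int inputsize, int grainsize)
{
    int stride = grainsize - 3;

    /* rounded-up division: (m + n - 1) / n */
    return 4 - (inputsize - grainsize + stride - 1) / stride;
}
```

      With a 64KB granule, TxSZ values from 35 to 39 give an inputsize of
      29 down to 25 bits, which yields level 3, while TxSZ of 34 yields
      level 2.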
    • Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging · 1aba4be9
      Peter Maydell committed
      A smattering of fixes for problems that Coverity reported.
      
      # gpg: Signature made Mon 17 Nov 2014 17:03:25 GMT using RSA key ID 78C7AE83
      # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
      # gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
      #      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83
      
      * remotes/bonzini/tags/for-upstream:
        hcd-musb: fix dereference null return value
        target-cris/translate.c: fix out of bounds read
        shpc: fix error propaagation
        qemu-char: fix MISSING_COMMA
        acl: fix memory leak
        nvme: remove superfluous check
        loader: fix NEGATIVE_RETURNS
        qga: fix false negative argument passing
        mips_mipssim: fix use-after-free for filename
        l2tpv3: fix fd leak
        l2tpv3: fix possible double free
        libcacard: fix resource leak
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • hcd-musb: fix dereference null return value · a9be7657
      Paolo Bonzini committed
      usb_ep_get and usb_handle_packet can deal with a NULL device, but we have
      to avoid dereferencing NULL pointers when building the id.
      
      Thanks to Gonglei for an initial stab at fixing this.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 17 Nov, 2014 10 commits
  3. 15 Nov, 2014 1 commit
  4. 14 Nov, 2014 14 commits
    • Merge remote-tracking branch 'remotes/sstabellini/xen-2014-11-14' into staging · 4e70f927
      Peter Maydell committed
      * remotes/sstabellini/xen-2014-11-14:
        xen_disk: fix unmapping of persistent grants
        pc: piix4_pm: init legacy PCI hotplug when running on Xen
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • l2tpv3: fix possible double free · 77374582
      zhanghailiang committed
      freeaddrinfo(result) does not set result to NULL after freeing it, so
      the error path frees it a second time (double free).
      Reported by Coverity.
      Reviewed-by: Gonglei <arei.gonglei@huawei.com>
      Cc: qemu-stable@nongnu.org
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
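
      The pattern behind this fix, clearing the pointer right after
      freeaddrinfo() so that a shared cleanup path cannot free it again,
      can be sketched as below. This is not the l2tpv3 code: the function
      count_numeric_addrs() is hypothetical, and AI_NUMERICHOST is used
      only so the example needs no name resolution.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: count the addresses a numeric host string
 * resolves to. Returns -1 if the lookup fails. */
int count_numeric_addrs(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;
    struct addrinfo *p;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;

    if (getaddrinfo(host, port, &hints, &result) != 0) {
        n = -1;
        goto out;
    }
    for (p = result; p != NULL; p = p->ai_next) {
        n++;
    }

    freeaddrinfo(result);
    result = NULL;      /* the cleanup below must not free it again */

out:
    if (result != NULL) {
        freeaddrinfo(result);
    }
    return n;
}
```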
    • libcacard: fix resource leak · 5bbebf62
      zhanghailiang committed
      In connect_to_qemu(), getaddrinfo() allocates memory that is stored in
      server. It should be released with freeaddrinfo() before
      connect_to_qemu() returns.
      
      Cc: qemu-stable@nongnu.org
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging · b87dcdd0
      Peter Maydell committed
      # gpg: Signature made Fri 14 Nov 2014 11:05:54 GMT using RSA key ID 81AB73C8
      # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
      # gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
      
      * remotes/stefanha/tags/block-pull-request:
        vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info
        block: Fix max nb_sectors in bdrv_make_zero
        ahci: factor out FIS decomposition from handle_cmd
        ahci: Check cmd_fis[1] more explicitly
        ahci: Reorder error cases in handle_cmd
        ahci: Fix FIS decomposition
        ahci: add is_ncq predicate helper
        ide: Correct handling of malformed/short PRDTs
        ahci: unify sglist preparation
        ide: repair PIO transfers for cases where nsector > 1
        ahci: Fix byte count regression for ATAPI/PIO
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • xen_disk: fix unmapping of persistent grants · 2f01dfac
      Roger Pau Monne committed
      This patch fixes two issues with persistent grants and the disk PV backend
      (Qdisk):
      
       - Keep track of memory regions where persistent grants have been mapped
         since we need to unmap them as a whole. It is not possible to unmap a
         single grant if it has been batch-mapped. A new check has also been added
         to make sure persistent grants are only used if the whole mapped region
         can be persistently mapped in the batch_maps case.
       - Unmap persistent grants before switching to the closed state, so the
         frontend can also free them.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reported-by: George Dunlap <george.dunlap@eu.citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Kevin Wolf <kwolf@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: George Dunlap <george.dunlap@eu.citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • pc: piix4_pm: init legacy PCI hotplug when running on Xen · 91ab2ed7
      Igor Mammedov committed
      If the user starts QEMU with "-machine pc,accel=xen", the compat
      property in xenfv has no effect, which causes the error
      "Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
      when a PCI device is added with -device on the QEMU CLI.
      
      From: Igor Mammedov <imammedo@redhat.com>
      
      In case of Xen instead of using compat property, just use the fact
      that xen doesn't use QEMU's fw_cfg/acpi tables to switch piix4_pm
      into legacy PCI hotplug mode when Xen is enabled.
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Li Liang <liang.z.li@intel.com>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    • vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info · 5f583307
      Fam Zheng committed
      When extent types don't match, we return -ENOTSUP. In this case, be
      polite to the caller and don't modify bdi.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 1415938161-16217-1-git-send-email-famz@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Fix max nb_sectors in bdrv_make_zero · f3a9cfdd
      Fam Zheng committed
      In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX /
      BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
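
      The limit named in this message can be illustrated by clamping each
      request to INT_MAX / BDRV_SECTOR_SIZE sectors. The sketch below is
      not the bdrv_make_zero() code: SKETCH_SECTOR_SIZE stands in for
      BDRV_SECTOR_SIZE (assumed to be 512), and count_requests() is a
      hypothetical helper that only counts how many requests a range
      would take.

```c
#include <limits.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512                       /* assumed value */
#define SKETCH_MAX_NB_SECTORS (INT_MAX / SKETCH_SECTOR_SIZE)

/*
 * Hypothetical helper: number of requests needed to cover
 * `total_sectors` when no single request may exceed
 * SKETCH_MAX_NB_SECTORS sectors.
 */
int64_t count_requests(int64_t total_sectors)
{
    int64_t requests = 0;

    while (total_sectors > 0) {
        int nb_sectors = total_sectors > SKETCH_MAX_NB_SECTORS
                         ? SKETCH_MAX_NB_SECTORS
                         : (int)total_sectors;

        /* a real caller would issue one write of nb_sectors here */
        total_sectors -= nb_sectors;
        requests++;
    }
    return requests;
}
```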
    • ahci: factor out FIS decomposition from handle_cmd · 107f0d46
      John Snow committed
      In order to make handle_cmd more readable at the macro level,
      the details of how to decompose particular types of FIS packets
      are left to helper functions.
      
      In our case, the only type of FIS packet we currently expect to
      see is a Register H2D FIS packet, but the gory details of its
      decomposition are of no particular interest in handle_cmd.
      
      This patch keeps the receipt of FIS packets and the decomposition
      thereof separated to two different functions.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1415058979-16604-6-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • ahci: Check cmd_fis[1] more explicitly · 102e5625
      John Snow committed
      Instead of checking for a known byte, inspect the
      fields of this byte explicitly to produce more meaningful
      error messages and improve the readability of this section.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1415058979-16604-5-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • ahci: Reorder error cases in handle_cmd · 36ab3c34
      John Snow committed
      Error checking in ahci's handle_cmd is re-ordered so that we
      initialize as few things as possible before we've done our
      sanity checking. This simplifies returning from this call
      in case of an error.
      
      A check to make sure the DMA memory map succeeds with the
      correct size is also added, and the debug print of the
      command fis is cleaned up with its size corrected.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1415058979-16604-4-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • ahci: Fix FIS decomposition · 1cbdd968
      John Snow committed
      This patch introduces a few changes to how FIS packets are
      deciphered in the AHCI virtual device. The summary of
      changes can be grouped into two pieces:
      
      [A] Changes to how we apply a preliminary sieve to FISes,
      [B] Changes in how we internalize a decomposed FIS.
      
      == Changes to how we apply a preliminary sieve to FISes ==
      
      (1) Packets may now either update the Control register or
          the Command register, but not both. This is according
          to the SATA 3.2 specification which states:
          "...the device either initiates processing of the command
          indicated in the Command register or initiates processing
          of the control request indicated [...] depending on the
          state of the C bit in the FIS."
      
          See SATA 3.2 section 10.5.5.4, "Reception" in the 10.5.5
          "Register Host to Device FIS" section.
      
          This change accounts for the first two regions of change
          within the diff. All other changes belong to the following
          changes.
      
      == Changes in how we internalize a decomposed FIS ==
      
      (2) Instead of trying to extract the sector number out of the
          FIS from bytes 4-10 and setting it with ide_set_sector,
          we set the appropriate IDEState registers and trust that
          ide_get_sector can retrieve the correct sector later.
      
          By "constructing" the sector for use with ide_set_sector,
          we are duplicating the mechanisms of ide_get_sector.
          This change makes the FIS decomposition more obvious.
      
          SATA 3.2 as a specification does not make the legacy
          register mapping with respect to the D2H FIS obvious.
          However, SATA 3.2 section 10.5.5.1 "Register Host to
          Device FIS layout" describes all of the "cmd_fis"
          bytes:
      
          0 - FIS Type (0x27)
          1 - Port Multiplier Port and Command Update flag
          2 - ATA Command
          3 - Features_Low
          4 - LBA 7:0
          5 - LBA 15:8
          6 - LBA 23:16
          7 - Device, AKA "Drive Select."
          8 - LBA 31:24
          9 - LBA 39:32
          10 - LBA 47:40
          11 - Features_High
          12 - Count Low
          13 - Count High
          14 - ICC
          15 - Control
          16-19 - Auxiliary (for NCQ, defined per-command)
      
          Most of these registers map to existing IDEState registers
          in obvious ways, especially features, select, hob_features,
          and nsector (count). ICC is reserved in older specifications
          but is not supported in our implementation, and remains
          unused here. The Control register is not valid for a command
          that is trying to update the command register and is to be
          considered reserved at this point.
      
          What is not obvious is the LBA register mappings, but SATA 1.0
          can help inform us of legacy device support, see SATA 1.0 section
          8.5.2 "Register - Host to Device."
      
          LBA 7:0   - Sector Number    (sector)
          LBA 15:8  - Cyl Low          (lcyl)
          LBA 23:16 - Cyl High         (hcyl)
          LBA 31:24 - Sector Num Exp.  (hob_sector)
          LBA 39:32 - Cyl Low Exp.     (hob_lcyl)
          LBA 47:40 - Cyl High Exp.    (hob_hcyl)
      
          These mappings help guide which registers the FIS should be decomposed
          into/towards for CHS, LBA28 and LBA48 commands.
      
          As a note: The prior confusion that can be seen in the documentation
          arises from the fact that CHS and LBA28 commands use the low nybble
          of the drive select register to store LBA 27:24, whereas LBA48 commands
          use the hob_sector, hob_lcyl and hob_hcyl registers as explained above.
      
          The decomposition as it stands now will correctly decompose CHS, LBA28
          and LBA48 commands into their appropriate registers where the core
          IDE/ATAPI layers can deal with them correctly.
      
          See the below point for more information.
      
      (3) We save cmd_fis[7] as ide_state->select, which informs
          decisions about if we are using LBA or CHS.
          This corrects a bug in AHCI wherein we attempt to set and/or
          retrieve the sector number by using ide_set_sector and
          ide_get_sector, which depend on the select register to
          determine if we are using LBA or CHS.
      
          Without this adjustment, LBA48 read/writes are currently
          broken. Thanks to Eniac Zheng @ HP for pointing this out.
      
      (4) Save cmd_fis[11] as ide_state->hob_feature, as defined in SATA 3.2.
      
      (5) For several ATA commands, the sector count register set to 0
          is a magic number that means 256 sectors. For LBA48 commands,
          this means 65,536 sectors. We drop the magic sector correction
          here, and trust the ide core layer to handle the conversion
          appropriately, in ide_cmd_lba48_transform(). As it stands,
          the current AHCI code is only compliant with LBA28 commands.
          By simply removing the magic, it will work with LBA28 and LBA48.
      
      (6) We expand FIS decomposition to include both ATAPI and IDE devices.
          We leave the logic of determining if the fields are valid or not
          to the respective layers.
      
          This change intends to make it clearer that AHCI is only a
          composition mechanism for the FIS packets: the meaning of
          the registers is best left to the implementation layers for
          those devices.
      
      (7) Forcefully setting the feature, hcyl and lcyl registers for ATAPI
          commands is removed.
          - The hcyl and lcyl magic present here is valid at boot only,
            and should not be overridden for every PACKET command.
          - The feature register is defined as valid for the PACKET command,
            so we should not suppress it. The ATAPI layer does not even
            currently depend on or require 0x01 as mandatory.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1415058979-16604-3-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
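
      The byte layout and register mapping listed in this message can be
      sketched as a plain decode step. This is an illustration, not the
      AHCI code: struct fis_regs and the helper names are hypothetical,
      with field names borrowed from the registers the message mentions.
      The transfer_sectors() helper only illustrates the "0 means 256 or
      65,536" rule; per the message, QEMU leaves that conversion to the
      IDE core.

```c
#include <stdint.h>

/* Hypothetical register set filled from a Register H2D FIS. */
struct fis_regs {
    uint8_t feature, hob_feature;
    uint8_t sector, lcyl, hcyl;              /* LBA 7:0, 15:8, 23:16 */
    uint8_t hob_sector, hob_lcyl, hob_hcyl;  /* LBA 31:24, 39:32, 47:40 */
    uint8_t select;                          /* Device, "Drive Select" */
    uint16_t nsector;                        /* Count Low and High */
};

/* Copy each FIS byte into the register it maps to. */
void decode_h2d_fis(const uint8_t fis[20], struct fis_regs *r)
{
    r->feature     = fis[3];
    r->sector      = fis[4];
    r->lcyl        = fis[5];
    r->hcyl        = fis[6];
    r->select      = fis[7];
    r->hob_sector  = fis[8];
    r->hob_lcyl    = fis[9];
    r->hob_hcyl    = fis[10];
    r->hob_feature = fis[11];
    r->nsector     = (uint16_t)(fis[12] | (fis[13] << 8));
}

/* LBA48: all six LBA bytes come from the sector/cylinder registers. */
uint64_t lba48_of(const struct fis_regs *r)
{
    return ((uint64_t)r->hob_hcyl << 40) | ((uint64_t)r->hob_lcyl << 32) |
           ((uint64_t)r->hob_sector << 24) | ((uint64_t)r->hcyl << 16) |
           ((uint64_t)r->lcyl << 8) | r->sector;
}

/* LBA28: bits 27:24 live in the low nybble of the select register. */
uint32_t lba28_of(const struct fis_regs *r)
{
    return ((uint32_t)(r->select & 0x0f) << 24) | ((uint32_t)r->hcyl << 16) |
           ((uint32_t)r->lcyl << 8) | r->sector;
}

/* A count of 0 means 256 sectors (LBA28) or 65,536 sectors (LBA48). */
uint32_t transfer_sectors(uint16_t nsector, int is_lba48)
{
    uint32_t n = is_lba48 ? nsector : (uint32_t)(nsector & 0xff);

    if (n == 0) {
        n = is_lba48 ? 65536 : 256;
    }
    return n;
}
```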
    • ahci: add is_ncq predicate helper · 72a065db
      John Snow committed
      A small helper to determine which S/ATA commands
      are destined to be routed to the NCQ pathways.
      
      This references SATA 3.2 section 13.6,
      Native Command Queueing. See sections 13.6.4,
      13.6.5, 13.6.6, 13.6.7 and 13.6.8 for all
      SATA commands considered to be part of the
      NCQ feature set. This is summarized in a small
      list in section 13.6.3.1 and again in 13.6.3.2.
      
      Not all of these NCQ commands are currently supported,
      so the error pathways are adjusted slightly to be more
      informative in the case they are encountered.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1415058979-16604-2-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
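
      A predicate of this kind might look like the sketch below. The
      enumerator names and opcode values are written from memory of the
      ATA command set rather than taken from this commit, so treat them
      as assumptions to verify against the specification sections cited
      above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed NCQ opcodes; verify against the SATA/ATA specification. */
enum {
    SKETCH_READ_FPDMA_QUEUED    = 0x60,
    SKETCH_WRITE_FPDMA_QUEUED   = 0x61,
    SKETCH_NCQ_NON_DATA         = 0x63,
    SKETCH_SEND_FPDMA_QUEUED    = 0x64,
    SKETCH_RECEIVE_FPDMA_QUEUED = 0x65
};

/* Returns true when the ATA command belongs to the NCQ feature set. */
bool is_ncq_sketch(uint8_t ata_cmd)
{
    switch (ata_cmd) {
    case SKETCH_READ_FPDMA_QUEUED:
    case SKETCH_WRITE_FPDMA_QUEUED:
    case SKETCH_NCQ_NON_DATA:
    case SKETCH_SEND_FPDMA_QUEUED:
    case SKETCH_RECEIVE_FPDMA_QUEUED:
        return true;
    default:
        return false;
    }
}
```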
    • ide: Correct handling of malformed/short PRDTs · 3251bdcf
      John Snow committed
      This impacts both BMDMA and AHCI HBA interfaces for IDE.
      Currently, we confuse the difference between a PRDT having
      "0 bytes" and a PRDT having "0 complete sectors."
      
      When we receive an incomplete sector, inconsistent error checking
      leads to an infinite loop wherein the call succeeds, but it
      didn't give us enough bytes -- leading us to re-call the
      DMA chain over and over again. This leads to, in the BMDMA case,
      leaked memory for short PRDTs, and infinite loops and resource
      usage in the AHCI case.
      
      The .prepare_buf() callback is reworked to return the number of
      bytes that it successfully prepared. 0 is a valid, non-error
      answer that means the table was empty and described no bytes.
      -1 indicates an error.
      
      Our current implementation uses the io_buffer in IDEState to
      ultimately describe the size of a prepared scatter-gather list.
      Even though the AHCI PRDT/SGList can be as large as 256GiB, the
      AHCI command header limits transactions to just 4GiB. ATA8-ACS3,
      however, defines the largest transaction to be an LBA48 command
      that transfers 65,536 sectors. With a 512 byte sector size, this
      is just 32MiB.
      
      Since our current state structures use the int type to describe
      the size of the buffer, and this state is migrated as int32, we
      are limited to describing 2GiB buffer sizes unless we change the
      migration protocol.
      
      For this reason, this patch begins to unify the assertions in the
      IDE pathways that the scatter-gather list provided by either the
      AHCI PRDT or the PCI BMDMA PRDs can only describe, at a maximum,
      2GiB. This should be resilient enough unless we need a sector
      size that exceeds 32KiB.
      
      Further, the likelihood of any guest operating system actually
      attempting to transfer this much data in a single operation is
      very slim.
      
      To this end, the IDEState variables have been updated to more
      explicitly clarify our maximum supported size. Callers to the
      prepare_buf callback have been reworked to understand the new
      return code, and all versions of the prepare_buf callback have
      been adjusted accordingly.
      
      Lastly, the ahci_populate_sglist helper, relied upon by the
      AHCI implementation of .prepare_buf() as well as the PCI
      implementation of the callback have had overflow assertions
      added to help make clear the reasonings behind the various
      type changes.
      
      [Added %d -> %"PRId64" fix John sent because off_pos changed from int to
      int64_t.
      --Stefan]
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1414785819-26209-4-git-send-email-jsnow@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
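
      The return convention described for .prepare_buf() (a byte count,
      with 0 meaning an empty table and -1 meaning an error) can be
      sketched as follows. This is not the callback from the patch:
      prepare_buf_sketch() is hypothetical and takes a flat array of
      entry lengths instead of a real PRDT. The 2GiB cap mirrors the
      int32 limit the message explains.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a prepare_buf callback.
 * Returns the number of bytes described by the table, 0 if the table
 * is empty, or -1 if the total does not fit in a signed 32-bit value.
 */
int32_t prepare_buf_sketch(const uint32_t *entry_len, size_t entries)
{
    int64_t total = 0;
    size_t i;

    for (i = 0; i < entries; i++) {
        total += entry_len[i];
        if (total > INT32_MAX) {
            return -1;          /* would overflow the migrated int32 */
        }
    }
    return (int32_t)total;
}
```

      For scale, the largest LBA48 transfer mentioned above is 65,536
      sectors of 512 bytes, which is 32MiB and far below the cap.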