1. 13 Jun, 2016: 4 commits
    • pxa2xx: Unconditionally enable USB controller · c92cfba8
      Authored by Eduardo Habkost
      Simplify initialization logic by removing the usb_enabled()
      check. The USB controller is part of the SoC, so it doesn't make
      sense to create a system where it is not present.
      
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Cc: Andrzej Zaborowski <balrogg@gmail.com>
      Cc: qemu-arm@nongnu.org,
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Message-id: 1465419025-21519-2-git-send-email-ehabkost@redhat.com
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
    • hw/usb/dev-network.c: Use ldl_le_p() and stl_le_p() · ec9125bc
      Authored by Peter Maydell
      Use stl_le_p() and ldl_le_p() to read and write data from
      buffers, rather than using pointer casts and cpu_to_le32()
      for writes and le32_to_cpup() for reads. This:
       * avoids lots of casts
       * works even if the buffer isn't as aligned as the host would like
       * avoids using the *_to_cpup() functions which we want to get rid of
      
      Note that there may still be some places where a pointer from the
      guest is cast to a pointer to a host structure; these would also
      have to be changed for the device to work on a host CPU which
      enforces alignment restrictions.
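      As a rough illustration of why byte-wise accessors avoid both the casts and the alignment problem, here is a minimal sketch of unaligned-safe little-endian 32-bit load/store helpers. The names my_ldl_le_p() and my_stl_le_p() are hypothetical stand-ins, not QEMU's ldl_le_p()/stl_le_p() implementations. The sketch assembles the value from individual bytes, so it never dereferences a misaligned uint32_t pointer and behaves the same on little- and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from any address. */
static inline uint32_t my_ldl_le_p(const void *ptr)
{
    const uint8_t *p = ptr;

    /* Byte-wise access: no alignment requirement, no host-endian dependency. */
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: write v to any address in little-endian byte order. */
static inline void my_stl_le_p(void *ptr, uint32_t v)
{
    uint8_t *p = ptr;

    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

      For example, my_stl_le_p(buf + 1, 0x12345678) stores the bytes 78 56 34 12 starting at the odd offset buf + 1, and my_ldl_le_p(buf + 1) reads 0x12345678 back.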
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-id: 1465573077-29221-1-git-send-email-peter.maydell@linaro.org
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
    • usb-host: add special case for bus+addr · e058fa2d
      Authored by Gerd Hoffmann
      This patch changes usb-host behavior when the hostbus= and hostaddr=
      properties are used to identify the usb device in question.  Instead of
      adding the device to the hotplug watchlist, we try to open it directly
      using the given bus number and device address.

      Putting a device specified by hostaddr on the hotplug watchlist isn't
      a great idea, as the address isn't a fixed property: it changes every
      time the device is plugged in.  So treating this case as "use the
      device at bus:addr _now_" is more sane.  Also, usb-host will now raise
      errors if it can't initialize the host device.
      
      Note: For devices on the hotplug watchlist (hostport or vendorid or
      productid specified) qemu continues to ignore errors and keeps
      monitoring the usb bus to see if the device eventually shows up.
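      Here is a minimal sketch of the selection rule described above. The struct, enum, and function names are hypothetical, and only the property names hostbus, hostaddr, hostport, vendorid and productid come from the text. The sketch assumes that 0/NULL means "unset". The parameter device_present stands in for an actual attempt to open the host device.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of usb-host's matching properties; 0/NULL = unset. */
struct usb_host_match {
    int hostbus;
    int hostaddr;
    const char *hostport;
    int vendorid;
    int productid;
};

enum usb_host_mode {
    USB_HOST_OPEN_NOW,      /* bus+addr given: open that device right away */
    USB_HOST_WATCH_HOTPLUG, /* otherwise: keep watching the bus for a match */
};

static enum usb_host_mode usb_host_pick_mode(const struct usb_host_match *m)
{
    /* An address is reassigned on every plug-in, so bus+addr only names
     * "the device there now"; it is not a stable hotplug key. */
    if (m->hostbus && m->hostaddr) {
        return USB_HOST_OPEN_NOW;
    }
    return USB_HOST_WATCH_HOTPLUG;
}

/* Returns 0 on success, -1 on error. */
static int usb_host_start(const struct usb_host_match *m, bool device_present)
{
    if (usb_host_pick_mode(m) == USB_HOST_OPEN_NOW) {
        if (!device_present) {
            fprintf(stderr, "usb-host: no device at bus %d addr %d\n",
                    m->hostbus, m->hostaddr);
            return -1;          /* direct case: report the error */
        }
        return 0;
    }
    /* Watchlist case: ignore absence and keep monitoring the bus. */
    return 0;
}
```

      The design point is that only the bus+addr case can fail at startup; every other combination defers to the hotplug watcher.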
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Message-id: 1464945175-28939-1-git-send-email-kraxel@redhat.com
    • Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20160611' into staging · da2fdd0b
      Authored by Peter Maydell
      TB hashing improvements
      
      # gpg: Signature made Sun 12 Jun 2016 01:12:50 BST
      # gpg:                using RSA key 0xAD1270CC4DD0279B
      # gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
      # gpg:                 aka "Richard Henderson <rth@redhat.com>"
      # gpg:                 aka "Richard Henderson <rth@twiddle.net>"
      # Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B
      
      * remotes/rth/tags/pull-tcg-20160611:
        translate-all: add tb hash bucket info to 'info jit' dump
        tb hash: track translated blocks with qht
        qht: add test-qht-par to invoke qht-bench from 'check' target
        qht: add qht-bench, a performance benchmark
        qht: add test program
        qht: QEMU's fast, resizable and scalable Hash Table
        qdist: add test program
        qdist: add module to represent frequency distributions of data
        tb hash: hash phys_pc, pc, and flags with xxhash
        exec: add tb_hash_func5, derived from xxhash
        qemu-thread: add simple test-and-set spinlock
        include/processor.h: define cpu_relax()
        seqlock: rename write_lock/unlock to write_begin/end
        seqlock: remove optional mutex
        compiler.h: add QEMU_ALIGNED() to enforce struct alignment
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  2. 12 Jun, 2016: 15 commits
    • translate-all: add tb hash bucket info to 'info jit' dump · 329844d4
      Authored by Emilio G. Cota
      Examples:
      
      - Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
      TB count            715135/2684354
      [...]
      TB hash buckets     388775/524288 (74.15% head buckets used)
      TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
      TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3
      
      - Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
      TB count            712636/2684354
      [...]
      TB hash buckets     344924/524288 (65.79% head buckets used)
      TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  ▅▁▃▁▂|[90,100]%
      TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4
      
      - Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
      TB count            702818/2684354
      [...]
      TB hash buckets     112741/524288 (21.50% head buckets used)
      TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  ▁▁▁▁▁|[90,100]%
      TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁▁▁▁▁▁▁▁▁|[83.8,93.0]
      
      - Good hashing, but no auto-resize:
      TB count            715634/2684354
      TB hash buckets     8192/8192 (100.00% head buckets used)
      TB hash occupancy   98.30% avg chain occ. Histogram: [95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
      TB hash avg chain   22.070 buckets. Histogram: [15.0,16.7)|▁▂▅▄█▅▁▁▁▁|[30.3,32.0]
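      The "head buckets used" percentage is the used/total ratio printed next to it (388775/524288 is about 74.15%). For "avg chain", one plausible reading is the mean number of buckets per non-empty chain, which is consistent with values just above 1.0 when about 74% of heads are used. This reading is an assumption, not taken from the 'info jit' code. Below is a small sketch that computes both figures from a hypothetical per-head chain-length array.

```c
#include <stddef.h>

struct chain_stats {
    size_t used_heads;   /* head buckets holding at least one entry */
    size_t total_heads;
    double used_pct;     /* 100 * used_heads / total_heads */
    double avg_chain;    /* assumed: mean chain length (buckets) over used heads */
};

/* chain_len[i] = number of buckets chained from head i (0 = empty head). */
static struct chain_stats compute_chain_stats(const size_t *chain_len, size_t n)
{
    struct chain_stats s = { 0, n, 0.0, 0.0 };
    size_t total_buckets = 0;

    for (size_t i = 0; i < n; i++) {
        if (chain_len[i]) {
            s.used_heads++;
            total_buckets += chain_len[i];
        }
    }
    if (n) {
        s.used_pct = 100.0 * s.used_heads / n;
    }
    if (s.used_heads) {
        s.avg_chain = (double)total_buckets / s.used_heads;
    }
    return s;
}
```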
      Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Suggested-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-16-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • tb hash: track translated blocks with qht · 909eaac9
      Authored by Emilio G. Cota
      Having a fixed-size hash table for keeping track of all translation blocks
      is suboptimal: some workloads are just too big or too small to get maximum
      performance from the hash table. The MRU promotion policy helps improve
      performance when the hash table is a little undersized, but it cannot
      make up for severely undersized hash tables.
      
      Furthermore, frequent MRU promotions result in writes that are a scalability
      bottleneck. For scalability, lookups should only perform reads, not writes.
      This is not a big deal for now, but it will become one once MTTCG matures.
      
      The appended patch fixes these issues by using qht as the implementation of
      the TB hash table. This solution is superior to the other alternatives
      considered, namely:
      
      - master: implementation in QEMU before this patchset
      - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
      - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
                    MRU is implemented here by adding an intermediate struct
                    that contains the u32 hash and a pointer to the TB; this
                    allows us, on an MRU promotion, to copy said struct (that is not
                    at the head), and put this new copy at the head. After a grace
                    period, the original non-head struct can be eliminated, and
                    after another grace period, freed.
      - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
                         no MRU for lookups; MRU for inserts.
      The appended solution is the following:
      - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
                       no MRU for lookups; MRU for inserts.
      
      The plots below compare the considered solutions. The Y axis shows the
      boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
      sweeps the number of buckets (or initial number of buckets for qht-autoresize).
      The plots in PNG format (and with errorbars) can be seen here:
        http://imgur.com/a/Awgnq
      
      Each test runs 5 times, and the entire QEMU process is pinned to a
      single core for repeatability of results.
      
                                  Host: Intel Xeon E5-2690
      
        28 ++------------+-------------+-------------+-------------+------------++
           A*****        +             +             +             master **A*** +
        27 ++    *                                                 xxhash ##B###++
           |      A******A******                               xxhash-rcu $$C$$$ |
        26 C$$                  A******A******            qht-fixed-nomru*%%D%%%++
           D%%$$                              A******A******A*qht-dyn-mru A*E****A
        25 ++ %%$$                                          qht-dyn-nomru &&F&&&++
           B#####%                                                               |
        24 ++    #C$$$$$                                                        ++
           |      B###  $                                                        |
           |          ## C$$$$$$                                                 |
        23 ++           #       C$$$$$$                                         ++
           |             B######       C$$$$$$                                %%%D
        22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
           |                    D%%%%%%B######      @E@@@@@@    %%%D%%%@@@E@@@@@@E
        21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
           +             E@@@   F&&&   +      E@     +      F&&&   +             +
        20 ++------------+-------------+-------------+-------------+------------++
           14            16            18            20            22            24
                                   log2 number of buckets
      
                                       Host: Intel i7-4790K
      
        14.5 ++------------+------------+-------------+------------+------------++
             A**           +            +             +            master **A*** +
          14 ++ **                                                 xxhash ##B###++
        13.5 ++   **                                           xxhash-rcu $$C$$$++
             |                                            qht-fixed-nomru %%D%%% |
          13 ++     A******                                   qht-dyn-mru @@E@@@++
             |             A*****A******A******             qht-dyn-nomru &&F&&& |
        12.5 C$$                               A******A******A*****A******    ***A
          12 ++ $$                                                        A***  ++
             D%%% $$                                                             |
        11.5 ++  %%                                                             ++
             B###  %C$$$$$$                                                      |
          11 ++  ## D%%%%% C$$$$$                                               ++
             |     #      %      C$$$$$$                                         |
        10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $$$C
          10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
             +             F&&          D%%%%%%B######B######B#####B###@@@D%%%   +
         9.5 ++------------+------------+-------------+------------+------------++
             14            16           18            20           22            24
                                    log2 number of buckets
      
      Note that the original point before this patch series is X=15 for "master";
      master's low sensitivity to a larger number of buckets is due to its
      poor hashing function.
      
      xxhash-rcu has significant overhead due to the constant churn of allocating
      and deallocating intermediate structs for implementing MRU. An alternative
      would be to consider failed lookups as "maybe not there", and then
      acquire the external lock (tb_lock in this case) to really confirm that
      there was indeed a failed lookup. This, however, would not be enough
      to implement dynamic resizing--this is more complex: see
      "Resizable, Scalable, Concurrent Hash Tables via Relativistic
      Programming" by Triplett, McKenney and Walpole. This solution was
      discarded due to the very coarse RCU read critical sections that we have
      in MTTCG; resizing requires waiting for readers after every pointer update,
      and resizes require many pointer updates, so this would quickly become
      prohibitive.
      
      qht-fixed-nomru shows that MRU promotion is advisable for undersized
      hash tables.
      
      However, qht-dyn-mru shows that MRU promotion is not important if the
      hash table is properly sized: there is virtually no difference in
      performance between qht-dyn-nomru and qht-dyn-mru.
      
      Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
      X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
      can achieve with optimum sizing of the hash table, while keeping the hash
      table scalable for readers.
      
      The improvement we get before and after this patch for booting debian jessie
      with arm-softmmu is:
      
      - Intel Xeon E5-2690: 10.5% less time
      - Intel i7-4790K: 5.2% less time
      
      We could get this same improvement _for this particular workload_ by
      statically increasing the size of the hash table. But this would hurt
      workloads that do not need a large hash table. The dynamic (upward)
      resizing allows us to start small and enlarge the hash table as needed.
      
      A quick note on downsizing: the table is resized back to 2**15 buckets
      on every tb_flush; this makes sense because it is not guaranteed that the
      table will reach the same number of TBs later on (e.g. most bootup code is
      thrown away after boot); it makes sense to grow the hash table as
      more code blocks are translated. This also avoids the complication of
      having to build downsizing hysteresis logic into qht.
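      The grow-then-reset policy described above can be sketched with a plain single-threaded chained hash table. This is only an illustration under stated assumptions. It has none of qht's concurrency machinery (no seqlocks, no RCU). The "grow when items exceed MAX_LOAD times the bucket count" trigger is a hypothetical stand-in for qht's own resize heuristic. All names are invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define INIT_BUCKETS (1u << 15)  /* the 2**15 reset size mentioned above */
#define MAX_LOAD     2           /* hypothetical growth trigger */

struct entry {
    uint32_t hash;
    uint64_t key;
    void *value;
    struct entry *next;
};

struct table {
    struct entry **buckets;
    size_t n_buckets;            /* always a power of two */
    size_t n_items;
};

static void *xcalloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (!p) {
        abort();                 /* out of memory: keep the sketch simple */
    }
    return p;
}

static void table_init(struct table *t, size_t n_buckets)
{
    t->buckets = xcalloc(n_buckets, sizeof(*t->buckets));
    t->n_buckets = n_buckets;
    t->n_items = 0;
}

/* Upward resize: double the bucket array and rehash every entry into it. */
static void table_grow(struct table *t)
{
    size_t new_n = t->n_buckets * 2;
    struct entry **nb = xcalloc(new_n, sizeof(*nb));

    for (size_t i = 0; i < t->n_buckets; i++) {
        struct entry *e = t->buckets[i];
        while (e) {
            struct entry *next = e->next;
            size_t j = e->hash & (new_n - 1);
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->n_buckets = new_n;
}

static void table_insert(struct table *t, uint32_t hash, uint64_t key, void *value)
{
    struct entry *e = xcalloc(1, sizeof(*e));
    size_t i = hash & (t->n_buckets - 1);

    e->hash = hash;
    e->key = key;
    e->value = value;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    if (++t->n_items > MAX_LOAD * t->n_buckets) {
        table_grow(t);
    }
}

static void *table_lookup(const struct table *t, uint32_t hash, uint64_t key)
{
    const struct entry *e;

    for (e = t->buckets[hash & (t->n_buckets - 1)]; e; e = e->next) {
        if (e->hash == hash && e->key == key) {
            return e->value;
        }
    }
    return NULL;
}

/* Drop every entry and shrink back to the initial size, as on tb_flush. */
static void table_flush(struct table *t)
{
    for (size_t i = 0; i < t->n_buckets; i++) {
        struct entry *e = t->buckets[i];
        while (e) {
            struct entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->buckets);
    table_init(t, INIT_BUCKETS);
}
```

      Inserting 200000 keys from a 2**15 start grows the table twice, to 2**17 buckets. table_flush() then returns it to 2**15, mirroring the reset on tb_flush. Because the table only ever grows between flushes, no downsizing hysteresis is needed.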
      Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qht: add test-qht-par to invoke qht-bench from 'check' target · 896a9ee9
      Authored by Emilio G. Cota
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-14-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qht: add qht-bench, a performance benchmark · 515864a0
      Authored by Emilio G. Cota
      This serves as a performance benchmark as well as a stress test
      for QHT. We can tweak quite a number of things, including the
      number of resize threads and how frequently resizes are triggered.
      
      A performance comparison of QHT vs CLHT[1] and ck_hs[2] using
      this same benchmark program can be found here:
        http://imgur.com/a/0Bms4
      
      The tests are run on a 64-core AMD Opteron 6376, pinning threads
      to cores favoring same-socket cores. For each run, qht-bench is
      invoked with:
        $ tests/qht-bench -d $duration -n $n -u $u -g $range
      where $duration is in seconds, $n is the number of threads,
      $u is the update rate (0.0 to 100.0), and $range is the number
      of keys.
      
      Note that ck_hs's performance drops significantly as writes go
      up, since it requires an external lock (I used a ck_spinlock)
      around every write.
      
      Also, note that CLHT, instead of using a seqlock, relies on an
      allocator that never returns the same address during the
      same read-critical section. This gives it a slight performance
      advantage over QHT on read-heavy workloads, since the seqlock
      writes aren't there.
      
      [1] CLHT: https://github.com/LPD-EPFL/CLHT
                https://infoscience.epfl.ch/record/207109/files/ascy_asplos15.pdf
      
      [2] ck_hs: http://concurrencykit.org/
                 http://backtrace.io/blog/blog/2015/03/13/workload-specialization/
      
      A few of those plots are shown in text here, since that site
      might not be online forever. Throughput (in Mops/s) is on the Y axis.
      
                                   200K keys, 0 % updates
      
        450 ++--+------+------+-------+-------+-------+-------+------+-------+--++
            |   +      +      +       +       +       +       +      +      +N+  |
        400 ++                                                           ---+E+ ++
            |                                                       +++----      |
        350 ++          9 ++------+------++                       --+E+    -+H+ ++
            |             |      +H+-     |                 -+N+----   ---- +++  |
        300 ++          8 ++     +E+     ++             -----+E+  --+H+         ++
            |             |      +++      |         -+N+-----+H+--               |
        250 ++          7 ++------+------++  +++-----+E+----                    ++
        200 ++                    1         -+E+-----+H+                        ++
            |                           ----                     qht +-E--+      |
        150 ++                      -+E+                        clht +-H--+     ++
            |                   ----                              ck +-N--+      |
        100 ++               +E+                                                ++
            |            ----                                                    |
         50 ++       -+E+                                                       ++
            |   +E+E+  +      +       +       +       +       +      +       +   |
          0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
                1      8      16      24      32      40      48     56      64
                                      Number of threads
      
                                   200K keys, 1 % updates
      
        350 ++--+------+------+-------+-------+-------+-------+------+-------+--++
            |   +      +      +       +       +       +       +      +     -+E+  |
        300 ++                                                         -----+H+ ++
            |                                                       +E+--        |
            |           9 ++------+------++                  +++----             |
        250 ++            |      +E+   -- |                 -+E+                ++
            |           8 ++         --  ++             ----                     |
        200 ++            |      +++-     |  +++  ---+E+                        ++
            |           7 ++------N------++ -+E+--               qht +-E--+      |
            |                     1  +++----                    clht +-H--+      |
        150 ++                      -+E+                          ck +-N--+     ++
            |                   ----                                             |
        100 ++               +E+                                                ++
            |            ----                                                    |
            |        -+E+                                                        |
         50 ++    +H+-+N+----+N+-----+N+------                                  ++
            |   +E+E+  +      +       +      +N+-----+N+-----+N+----+N+-----+N+  |
          0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
                1      8      16      24      32      40      48     56      64
                                      Number of threads
      
                                   200K keys, 20 % updates
      
        300 ++--+------+------+-------+-------+-------+-------+------+-------+--++
            |   +      +      +       +       +       +       +      +       +   |
            |                                                              -+H+  |
        250 ++                                                         ----     ++
            |           9 ++------+------++                       --+H+  ---+E+  |
            |           8 ++     +H+--   ++                 -+H+----+E+--        |
        200 ++            |      +E+    --|             -----+E+--  +++         ++
            |           7 ++      + ---- ++       ---+H+---- +++ qht +-E--+      |
        150 ++          6 ++------N------++ -+H+-----+E+        clht +-H--+     ++
            |                     1     -----+E+--                ck +-N--+      |
            |                       -+H+----                                     |
        100 ++                  -----+E+                                        ++
            |                +E+--                                               |
            |            ----+++                                                 |
         50 ++       -+E+                                                       ++
            |     +E+ +++                                                        |
            |   +E+N+-+N+-----+       +       +       +       +      +       +   |
          0 ++--E------+------N-------N-------N-------N-------N------N-------N--++
                1      8      16      24      32      40      48     56      64
                                      Number of threads
      
                                  200K keys, 100 % updates       qht +-E--+
                                                                clht +-H--+
        160 ++--+------+------+-------+-------+-------+-------+---ck-+-N-----+--++
            |   +      +      +       +       +       +       +      +   ----H   |
        140 ++                                                      +H+--  -+E+ ++
            |                                                +++----   ----      |
        120 ++          8 ++------+------++                 -+H+    +E+         ++
            |           7 ++     +H+---- ++             ---- +++----             |
        100 ++            |      +E+      |  +++  ---+H+    -+E+                ++
            |           6 ++     +++     ++ -+H+--   +++----                     |
         80 ++          5 ++------N----------+E+-----+E+                        ++
            |                     1 -+H+---- +++                                 |
            |                   -----+E+                                         |
         60 ++               +H+---- +++                                        ++
            |            ----+E+                                                 |
         40 ++        +H+----                                                   ++
            |       --+E+                                                        |
         20 ++    +E+                                                           ++
            |  +EE+    +      +       +       +       +       +      +       +   |
          0 ++--+N-N---N------N-------N-------N-------N-------N------N-------N--++
                1      8      16      24      32      40      48     56      64
                                      Number of threads
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-13-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qht: add test program · 1a95404f
      Authored by Emilio G. Cota
      Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-12-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qht: QEMU's fast, resizable and scalable Hash Table · 2e11264a
      Authored by Emilio G. Cota
      This is a fast, scalable chained hash table with optional auto-resizing, allowing
      reads that are concurrent with reads, and reads/writes that are concurrent
      with writes to separate buckets.
      
      A hash table with these features will be necessary for the scalability
      of the ongoing MTTCG work; before those changes arrive we can already
      benefit from the single-threaded speedup that qht also provides.
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-11-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qdist: add test program · ff9249b7
      Authored by Emilio G. Cota
      Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-10-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qdist: add module to represent frequency distributions of data · bf3afd5f
      Authored by Emilio G. Cota
      Sometimes it is useful to have a quick histogram to represent a certain
      distribution -- for example, when investigating a performance regression
      in a hash table due to inadequate hashing.
      
      The appended patch allows us to easily represent a distribution using Unicode
      characters. Further, the data structure keeping track of the distribution
      is so simple that obtaining its values for off-line processing is trivial.
      
      Example, taking the last 10 commits to QEMU:
      
       Characters in commit title  Count
      -----------------------------------
                               39      1
                               48      1
                               53      1
                               54      2
                               57      1
                               61      1
                               67      1
                               78      1
                               80      1

      struct qdist dist;

      qdist_init(&dist);
      qdist_inc(&dist, 39);
      [...]
      qdist_inc(&dist, 80);

      char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
      // -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
      g_free(str);

      str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
      // -> [39.0,49.2)▁█▁▁[69.8,80.0]
      g_free(str);
      qdist_destroy(&dist);
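      The two bars in the example can be reproduced with a small self-contained sketch. This is not QEMU's util/qdist.c. histogram_bar() and its binning rules are hypothetical, chosen so that three rules together yield exactly the 9-group and 4-group strings shown (labels omitted):
      - equal-width bins over [min, max], with the last bin closed;
      - a space for empty bins;
      - linear scaling of non-empty bins between the smallest and largest bin count onto the eight block characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINS 64

/* U+2581..U+2588, lowest to full block; each is 3 bytes in UTF-8. */
static const char *const blocks[] = {
    "▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"
};
#define N_BLOCKS (sizeof(blocks) / sizeof(blocks[0]))

/*
 * Sort xs[0..n-1] into n_bins equal-width bins covering [min, max] (the
 * last bin also takes max) and write one character per bin into out,
 * which must hold at least 3 * n_bins + 1 bytes.
 */
static void histogram_bar(const double *xs, size_t n, size_t n_bins, char *out)
{
    long counts[MAX_BINS] = { 0 };
    double xmin = xs[0], xmax = xs[0];
    long cmin, cmax;
    size_t i;

    assert(n > 0 && n_bins > 0 && n_bins <= MAX_BINS);
    for (i = 1; i < n; i++) {
        if (xs[i] < xmin) xmin = xs[i];
        if (xs[i] > xmax) xmax = xs[i];
    }
    double step = (xmax - xmin) / n_bins;
    for (i = 0; i < n; i++) {
        size_t bin = step > 0 ? (size_t)((xs[i] - xmin) / step) : 0;
        if (bin >= n_bins) {
            bin = n_bins - 1;   /* xmax lands in the last (closed) bin */
        }
        counts[bin]++;
    }

    cmin = cmax = counts[0];
    for (i = 1; i < n_bins; i++) {
        if (counts[i] < cmin) cmin = counts[i];
        if (counts[i] > cmax) cmax = counts[i];
    }

    out[0] = '\0';
    for (i = 0; i < n_bins; i++) {
        if (counts[i] == 0) {
            strcat(out, " ");                  /* empty bin */
        } else if (cmax == cmin) {
            strcat(out, blocks[N_BLOCKS - 1]); /* all bins equal: full block */
        } else {
            /* Scale between the smallest and largest bin count. */
            size_t idx = (size_t)((double)(counts[i] - cmin) / (cmax - cmin)
                                  * (N_BLOCKS - 1));
            strcat(out, blocks[idx]);
        }
    }
}
```

      Feeding it the ten title lengths from the table (with 54 twice) gives "▂▂ █▂ ▂ ▄" for 9 groups and "▁█▁▁" for 4 groups. With only 4 groups, the smallest non-empty count is 2, so those bins drop to the lowest block.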
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-9-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • tb hash: hash phys_pc, pc, and flags with xxhash · 42bd3228
      Authored by Emilio G. Cota
      For some workloads such as arm bootup, tb_phys_hash is performance-critical.
      This is due to the high frequency of accesses to the hash table, caused
      by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
      More info:
        https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
      
      To dig further into this I modified an arm image booting debian jessie to
      immediately shut down after boot. Analysis revealed that quite a bit of time
      is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
      results in very uneven loading of chains in the hash table's buckets;
      the longest observed chain had ~550 elements.
      
      The appended addresses this with two changes:
      
      1) Use xxhash as the hash table's hash function. xxhash is a fast,
         high-quality hashing function.
      
      2) Feed the hashing function with not just tb_phys, but also pc and flags.
      
      This improves performance over using just tb_phys for hashing, since that
      resulted in some hash buckets having many TBs while others got very few;
      with these changes, the longest observed chain on a single hash bucket is
      brought down from ~550 to ~40.
      
      Tests show that the other element checked for in tb_find_physical,
      cs_base, is always a match when tb_phys+pc+flags are a match,
      so hashing cs_base is wasteful. It could be that this is an ARM-only
      thing, though. UPDATE:
      On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
      > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
      > consisting of only a delay slot).
      > It may well still turn out to be reasonable to ignore cs_base for hashing.
      
      BTW, after this change the hash table should not be called "tb_hash_phys"
      anymore; this is addressed later in this series.
      
      This change gives consistent bootup time improvements. I tested two
      host machines:
      - Intel Xeon E5-2690: 11.6% less time
      - Intel i7-4790K: 19.2% less time
      
      Increasing the number of hash buckets yields further improvements. However,
      using a larger, fixed number of buckets can degrade performance for other
      workloads that do not translate as many blocks (600K+ for debian-jessie arm
      bootup). This is dealt with later in this series.
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • exec: add tb_hash_func5, derived from xxhash · dc8b295d
      Authored by Emilio G. Cota
      This will be used by upcoming changes to compute the TB hash.
      
      Add this into a separate file to include the copyright notice from
      xxhash.
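      For readers unfamiliar with xxhash, here is a sketch of how a fixed-size hash over five 32-bit words can follow the public XXH32 structure. It processes one 16-byte stripe across four accumulator lanes, then one trailing 4-byte word, then applies the final avalanche. The name hash_func5(), the seed, and the word order are assumptions. The sketch is not claimed to be bit-for-bit identical to QEMU's tb_hash_func5().

```c
#include <stdint.h>

/* XXH32 prime constants (from the public xxHash specification). */
#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U
#define PRIME32_3 3266489917U
#define PRIME32_4  668265263U

#define HASH5_SEED 1U  /* hypothetical seed */

static inline uint32_t rol32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));   /* r is never 0 below */
}

/* One XXH32 accumulator round: mix a 32-bit input word into lane v. */
static inline uint32_t xxh32_round(uint32_t v, uint32_t input)
{
    v += input * PRIME32_2;
    v = rol32(v, 13);
    return v * PRIME32_1;
}

/* Hypothetical: hash two 64-bit words and one 32-bit word (20 bytes). */
static uint32_t hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
{
    uint32_t v1 = HASH5_SEED + PRIME32_1 + PRIME32_2;
    uint32_t v2 = HASH5_SEED + PRIME32_2;
    uint32_t v3 = HASH5_SEED;
    uint32_t v4 = HASH5_SEED - PRIME32_1;
    uint32_t h32;

    /* The 16-byte stripe: the two 64-bit words, split into four lanes. */
    v1 = xxh32_round(v1, (uint32_t)(a0 >> 32));
    v2 = xxh32_round(v2, (uint32_t)a0);
    v3 = xxh32_round(v3, (uint32_t)(b0 >> 32));
    v4 = xxh32_round(v4, (uint32_t)b0);

    h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
    h32 += 20;                        /* total input length in bytes */

    h32 += e * PRIME32_3;             /* trailing 4-byte word */
    h32 = rol32(h32, 17) * PRIME32_4;

    /* Final avalanche. */
    h32 ^= h32 >> 15;
    h32 *= PRIME32_2;
    h32 ^= h32 >> 13;
    h32 *= PRIME32_3;
    h32 ^= h32 >> 16;
    return h32;
}
```

      In the TB context, a0, b0 and e would carry phys_pc, pc and flags, as the "tb hash: hash phys_pc, pc, and flags with xxhash" commit describes. Every step after the lane rounds is a bijection on 32 bits. As a result, changing only e, or only one lane's input, always changes the result.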
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-7-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • qemu-thread: add simple test-and-set spinlock · ac9a9eba
      Authored by Guillaume Delbergue
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
      [Rewritten. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Emilio's additions: use TAS instead of atomic_xchg; emit acquire/release
       barriers; return bool from trylock; call cpu_relax() while spinning;
       optimize for uncontended locks by acquiring the lock with TAS instead
       of TATAS; add qemu_spin_locked().]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-6-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
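      The techniques in the bracketed note can be sketched with C11 atomics:
      - test-and-set acquisition with acquire/release ordering;
      - a bool-returning trylock;
      - a relax hint while spinning;
      - a TAS-first fast path for the uncontended case.
      The names are hypothetical, not QEMU's qemu_spin_* API. atomic_exchange with acquire ordering stands in for the test-and-set primitive. The x86-only __builtin_ia32_pause() stands in for cpu_relax().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU spin-wait hint (x86 PAUSE); a no-op elsewhere. */
static inline void spin_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

struct my_spinlock {
    atomic_bool value;
};

static void my_spin_init(struct my_spinlock *l)
{
    atomic_init(&l->value, false);
}

static void my_spin_lock(struct my_spinlock *l)
{
    /* Uncontended fast path: a single test-and-set with acquire ordering. */
    while (atomic_exchange_explicit(&l->value, true, memory_order_acquire)) {
        /* Contended: wait with plain loads until the lock looks free,
         * then retry the test-and-set (test-and-test-and-set). */
        while (atomic_load_explicit(&l->value, memory_order_relaxed)) {
            spin_relax();
        }
    }
}

/* Returns true if the lock was acquired (this sketch's own convention). */
static bool my_spin_trylock(struct my_spinlock *l)
{
    return !atomic_exchange_explicit(&l->value, true, memory_order_acquire);
}

static bool my_spin_locked(struct my_spinlock *l)
{
    return atomic_load_explicit(&l->value, memory_order_relaxed);
}

static void my_spin_unlock(struct my_spinlock *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_store_explicit(&l->value, false, memory_order_release);
}
```

      Spinning on relaxed loads rather than repeated exchanges keeps waiting CPUs from bouncing the cache line in exclusive state. The fast path still pays only one atomic operation when the lock is free.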
    • include/processor.h: define cpu_relax() · 462cda50
      Authored by Emilio G. Cota
      Taken from the Linux kernel.
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-5-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • seqlock: rename write_lock/unlock to write_begin/end · 03719e44
      Emilio G. Cota committed
      It is a more appropriate name, now that the mutex embedded
      in the seqlock is gone.
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-4-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
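      Once the embedded mutex is gone, a seqlock is just a sequence counter with write_begin/write_end around each update. Writers serialize among themselves with a lock of their own. The C11 sketch below shows that split under these assumptions:

      - All names are hypothetical.
      - The fences follow the usual seqlock pattern rather than QEMU's own barrier macros.
      - The protected fields are C11 atomics accessed with relaxed ordering, so that overlapping reads are not data races.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical seqlock: only a sequence counter, no embedded mutex. */
typedef struct {
    atomic_uint sequence;  /* odd while a write is in progress */
} SeqSketch;

/* Callers must already hold their own writer lock. */
static void seq_write_begin(SeqSketch *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
    assert((s & 1u) == 0);
    atomic_store_explicit(&sl->sequence, s + 1, memory_order_relaxed);
    /* Order the (now odd) count before the data stores that follow. */
    atomic_thread_fence(memory_order_release);
}

static void seq_write_end(SeqSketch *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
    /* Release: the data stores become visible before the even count. */
    atomic_store_explicit(&sl->sequence, s + 1, memory_order_release);
}

static unsigned seq_read_begin(SeqSketch *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_acquire);
    return s & ~1u;  /* an odd count can never match in seq_read_retry() */
}

static bool seq_read_retry(SeqSketch *sl, unsigned start)
{
    /* Order the data loads before re-reading the count. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

/* Example protected data (invariant: y == -x) and the writers' own mutex. */
static SeqSketch pair_seq;
static pthread_mutex_t pair_writer_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int pair_x, pair_y;

static void write_pair(int v)
{
    pthread_mutex_lock(&pair_writer_lock);
    seq_write_begin(&pair_seq);
    atomic_store_explicit(&pair_x, v, memory_order_relaxed);
    atomic_store_explicit(&pair_y, -v, memory_order_relaxed);
    seq_write_end(&pair_seq);
    pthread_mutex_unlock(&pair_writer_lock);
}

static void read_pair(int *x, int *y)
{
    unsigned start;
    do {
        start = seq_read_begin(&pair_seq);
        *x = atomic_load_explicit(&pair_x, memory_order_relaxed);
        *y = atomic_load_explicit(&pair_y, memory_order_relaxed);
    } while (seq_read_retry(&pair_seq, start));
}
```

      A reader that overlaps a write retries in one of two ways. If it started during the write, it saw an odd count, which is masked to an even value that can never match. If it started before the write, the count has changed by the time it re-reads it.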
    • seqlock: remove optional mutex · ccdb3c1f
      Emilio G. Cota committed
      This option is unused; besides, it bloats the struct when not needed.
      Let's just let writers define their own locks elsewhere.
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-3-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • compiler.h: add QEMU_ALIGNED() to enforce struct alignment · 911a4d22
      Emilio G. Cota committed
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-2-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
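      The sketch below shows how a struct-alignment macro of this kind can keep each element of an array on its own cache line. The macro is a hypothetical stand-in built on the GCC/Clang `aligned` attribute, and 64 bytes is an assumed cache-line size.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical stand-in for an alignment macro. */
#define QEMU_ALIGNED_SKETCH(X) __attribute__((aligned(X)))

/* Each counter gets its own 64-byte slot, avoiding false sharing. */
struct padded_counter {
    unsigned long count;
} QEMU_ALIGNED_SKETCH(64);

/* The alignment also rounds the struct size up to a multiple of 64. */
static_assert(alignof(struct padded_counter) == 64, "64-byte alignment");
static_assert(sizeof(struct padded_counter) % 64 == 0, "size is a multiple of 64");

static struct padded_counter counters[4];
```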
  3. 10 Jun 2016, 6 commits
  4. 09 Jun 2016, 9 commits
    • target-i386: Move user-mode exception actions out of user-exec.c · 0c33682d
      Peter Maydell committed
      The exception_action() function in user-exec.c is just a call to
      cpu_loop_exit() for every target CPU except i386.  Since this
      function is only called if the target's handle_mmu_fault() hook has
      indicated an MMU fault, and that hook is only called from the
      handle_cpu_signal() code path, we can simply move the x86-specific
      setup into that hook, which allows us to remove the TARGET_I386
      ifdef from user-exec.c.
      
      Of the actions that were done by the call to raise_interrupt_err():
       * cpu_svm_check_intercept_param() is a no-op in user mode
       * check_exception() is a no-op since double faults are impossible
         for user-mode
       * assignments to cs->exception_index and env->error_code are no-ops
       * assigning to env->exception_next_eip is unnecessary because it
         is not used unless env->exception_is_int is true
       * cpu_loop_exit_restore() is equivalent to cpu_loop_exit() since
         pc is 0
      which leaves setting env->exception_is_int as the only action that
      needs to be added to x86_cpu_handle_mmu_fault().
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-7-git-send-email-peter.maydell@linaro.org
    • target-i386: Add comment about do_interrupt_user() next_eip argument · 33271823
      Peter Maydell committed
      Add a comment to do_interrupt_user() along the same lines as the
      existing one for do_interrupt_all() noting that the next_eip
      argument is not used unless is_int is true or intno is EXCP_SYSCALL.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-6-git-send-email-peter.maydell@linaro.org
    • user-exec: Don't reextract sigmask from usercontext pointer · a5852dc5
      Peter Maydell committed
      Extracting the old signal mask from the usercontext pointer passed to
      a signal handler is a pain because it is OS and CPU dependent.
      Since we've already done it once and passed it to handle_cpu_signal(),
      there's no need to do it again in cpu_exit_tb_from_sighandler().
      This then means we don't need to pass a usercontext pointer in to
      handle_cpu_signal() at all.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-5-git-send-email-peter.maydell@linaro.org
    • cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() · 6886b980
      Peter Maydell committed
      The function cpu_resume_from_signal() is now always called with a
      NULL puc argument, and is rather misnamed since it is never called
      from a signal handler. It is essentially forcing an exit to the
      top level cpu loop but without raising any exception, so rename
      it to cpu_loop_exit_noexc() and drop the unused argument.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-4-git-send-email-peter.maydell@linaro.org
    • user-exec: Push resume-from-signal code out to handle_cpu_signal() · f213e72f
      Peter Maydell committed
      Since the only caller of page_unprotect() which might cause it to
      need to call cpu_resume_from_signal() is handle_cpu_signal() in
      the user-mode code, push the longjump handling out to that function.
      
      Since this is the only caller of cpu_resume_from_signal() which
      passes a non-NULL puc argument, split the non-NULL handling into
      a new cpu_exit_tb_from_sighandler() function. This allows us
      to merge the softmmu and usermode implementations of the
      cpu_resume_from_signal() function, which are now identical.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-3-git-send-email-peter.maydell@linaro.org
    • translate-all.c: Don't pass puc, locked to tb_invalidate_phys_page() · 75809229
      Peter Maydell committed
      The user-mode-only function tb_invalidate_phys_page() is only
      called from two places:
       * page_unprotect(), which passes in a non-zero pc, a puc pointer
         and the value 'true' for the locked argument
       * page_set_flags(), which passes in a zero pc, a NULL puc pointer
         and a 'false' locked argument
      
      If the pc is non-zero then we may call cpu_resume_from_signal(),
      which does a longjmp out of the calling code (and out of the
      signal handler); this is to cover the case of a target CPU with
      "precise self-modifying code" (currently only x86) executing
      a store instruction which modifies code in the same TB as the
      store itself. Rather than doing the longjump directly here,
      return a flag to the caller which indicates whether the current
      TB was modified, and move the longjump to page_unprotect.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: Eduardo Habkost <ehabkost@redhat.com>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 1463494687-25947-2-git-send-email-peter.maydell@linaro.org
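      The control-flow change in this commit can be sketched with plain setjmp()/longjmp(): the callee only reports whether the current TB was modified, and its caller performs the longjmp. Everything below is a hypothetical stand-in, not QEMU code. The jmp_buf plays the role of the CPU loop's restart point, and a fixed address range plays the role of the currently executing TB.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf cpu_loop_env;  /* hypothetical restart point of the CPU loop */
static unsigned long cur_tb_start = 0x1000, cur_tb_end = 0x1040;

/* Invalidate translations on a page; report, rather than act on, a hit
 * on the block that is currently executing. */
static bool invalidate_page_sketch(unsigned long write_addr)
{
    assert(cur_tb_start < cur_tb_end);
    /* ... discard translations overlapping write_addr (omitted) ... */
    return write_addr >= cur_tb_start && write_addr < cur_tb_end;
}

/* The caller (the page_unprotect() role) decides whether to unwind. */
static void unprotect_sketch(unsigned long write_addr)
{
    if (invalidate_page_sketch(write_addr)) {
        longjmp(cpu_loop_env, 1);
    }
}

/* Returns 1 if execution had to restart from the loop, 0 otherwise. */
static int run_store_sketch(unsigned long write_addr)
{
    if (setjmp(cpu_loop_env) != 0) {
        return 1;  /* the store modified its own block: restart */
    }
    unprotect_sketch(write_addr);
    return 0;
}
```

      The design keeps non-local exits out of the invalidation helper. A second caller that must never unwind, such as one that always passes a zero pc, can simply ignore the returned flag.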
    • hw/arm: virt uart fix · 9bbbf649
      xiaoqiang zhao committed
      Commit f0d1d2c1 ("hw/char: QOM'ify pl011 model") broke the
      qemu-system-arm virt machine when the '-machine secure=on' option
      is given.
      
      create_uart() is called twice, so make the CharDriverState pointer
      a parameter of create_uart() instead of hardcoding it.
      Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
      Tested-by: Jerome Forissier <jerome.forissier@linaro.org>
      Message-id: 1465353045-26323-1-git-send-email-zxq_yx_007@163.com
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160608' into staging · b66e10e4
      Peter Maydell committed
      linux-user pull request for June 2016
      
      # gpg: Signature made Wed 08 Jun 2016 14:27:14 BST
      # gpg:                using RSA key 0xB44890DEDE3C9BC0
      # gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
      # gpg:                 aka "Riku Voipio <riku.voipio@linaro.org>"
      
      * remotes/riku/tags/pull-linux-user-20160608: (44 commits)
        linux-user: In fork_end(), remove correct CPUs from CPU list
        linux-user: Special-case ERESTARTSYS in target_strerror()
        linux-user: Make target_strerror() return 'const char *'
        linux-user: Correct signedness of target_flock l_start and l_len fields
        linux-user: Use safe_syscall wrapper for ioctl
        linux-user: Use safe_syscall wrapper for accept and accept4 syscalls
        linux-user: Use safe_syscall wrapper for semop
        linux-user: Use safe_syscall wrapper for epoll_wait syscalls
        linux-user: Use safe_syscall wrapper for poll and ppoll syscalls
        linux-user: Use safe_syscall wrapper for sleep syscalls
        linux-user: Use safe_syscall wrapper for rt_sigtimedwait syscall
        linux-user: Use safe_syscall wrapper for flock
        linux-user: Use safe_syscall wrapper for mq_timedsend and mq_timedreceive
        linux-user: Use safe_syscall wrapper for msgsnd and msgrcv
        linux-user: Use safe_syscall wrapper for send* and recv* syscalls
        linux-user: Use safe_syscall wrapper for connect syscall
        linux-user: Use safe_syscall wrapper for readv and writev syscalls
        linux-user: Fix error conversion in 64-bit fadvise syscall
        linux-user: Fix NR_fadvise64 and NR_fadvise64_64 for 32-bit guests
        linux-user: Fix handling of arm_fadvise64_64 syscall
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      
      Conflicts:
      	configure
      	scripts/qemu-binfmt-conf.sh
    • Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging · 6f50f25c
      Peter Maydell committed
      Block layer patches
      
      # gpg: Signature made Wed 08 Jun 2016 09:31:38 BST
      # gpg:                using RSA key 0x7F09B272C88F2FD6
      # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
      
      * remotes/kevin/tags/for-upstream: (31 commits)
        qemu-img bench: Add --flush-interval
        qemu-img bench: Implement -S (step size)
        qemu-img bench: Make start offset configurable
        qemu-img bench: Sequential writes
        qemu-img bench
        block: Don't emulate natively supported pwritev flags
        blockdev: clean up error handling in do_open_tray
        block: Fix bdrv_all_delete_snapshot() error handling
        qcow2: avoid extra flushes in qcow2
        raw-posix: Fetch max sectors for host block device
        block: assert that bs->request_alignment is a power of 2
        migration/block: Convert saving to BlockBackend
        migration/block: Convert load to BlockBackend
        block: Kill bdrv_co_write_zeroes()
        vmdk: Convert to bdrv_co_pwrite_zeroes()
        raw_bsd: Convert to bdrv_co_pwrite_zeroes()
        raw-posix: Convert to bdrv_co_pwrite_zeroes()
        qed: Convert to bdrv_co_pwrite_zeroes()
        gluster: Convert to bdrv_co_pwrite_zeroes()
        blkreplay: Convert to bdrv_co_pwrite_zeroes()
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  5. 08 Jun 2016, 6 commits