    Rework image & texture management to use concurrent message queues. (#9486) · ad582b50
    Committed by Chinmay Garde
    This patch reworks image decompression and collection to fix misbehavior in
    the edge cases described below.
    
    The current flow for realizing a texture on the GPU from a blob of compressed
    bytes is to first pass it to the IO thread for image decompression and then
    upload to the GPU. The handle to the texture on the GPU is then passed back to
    the UI thread so that it can be included in subsequent layer trees for
    rendering. The GPU contexts on the Render & IO threads are in the same
    sharegroup so the texture ends up being visible to the Render Thread context
    during rendering. This works fine and does not block the UI thread. All
    references to the image are owned on the UI thread by Dart objects. When the
    final reference to the image is dropped, the texture cannot be collected on the
    UI thread (because it has no GPU context). Instead, it must be passed to either
    the GPU or IO thread. The GPU thread is usually in the middle of a frame
    workload, so we redirect the collection to the IO thread. While
    texture collections are usually (comparatively) fast, texture decompression and
    upload are slow (order of magnitude of frame intervals).
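
    For concreteness, below is a rough sketch of that pre-patch flow. The names
    (`CompressedImage`, `DecodeAndUpload`, `TaskRunner`, and so on) are hypothetical
    stand-ins rather than the engine's actual classes:

    ```cpp
    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    struct CompressedImage {
      std::vector<uint8_t> bytes;
    };

    struct GpuTexture {
      uint32_t handle = 0;
    };

    // Assumed to run on the IO thread, whose GL context is in the same
    // sharegroup as the Render thread's context.
    GpuTexture DecodeAndUpload(const CompressedImage& image);
    void CollectTexture(GpuTexture texture);

    // Hypothetical stand-in for a serial task runner bound to a single thread.
    struct TaskRunner {
      void PostTask(std::function<void()> task);
    };

    void RealizeImage(TaskRunner& io_runner, TaskRunner& ui_runner,
                      CompressedImage image,
                      std::function<void(GpuTexture)> on_ready_on_ui_thread) {
      io_runner.PostTask([&ui_runner, image = std::move(image),
                          callback = std::move(on_ready_on_ui_thread)]() {
        // Slow: decompression and GPU upload run inline on the IO thread.
        GpuTexture texture = DecodeAndUpload(image);
        // The texture handle is handed back to the UI thread for use in
        // subsequent layer trees.
        ui_runner.PostTask([texture, callback]() { callback(texture); });
      });
    }

    void OnLastDartReferenceDropped(TaskRunner& io_runner, GpuTexture texture) {
      // The UI thread has no GPU context, so collection must be posted to the
      // IO thread, behind any decode and upload work already queued there.
      io_runner.PostTask([texture]() { CollectTexture(texture); });
    }
    ```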
    
    For applications that end up creating (but not necessarily using) numerous
    large textures in straight-line execution, texture collection tasks can remain
    pending on the IO task runner until all the image decompressions (and uploads)
    are done. Put simply, the collection of the first image could be waiting for
    the decompression and upload of the last image in the queue.
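
    As a toy illustration of this head-of-line blocking (the task names and costs
    below are made up for the example, not measurements):

    ```cpp
    #include <iostream>
    #include <queue>
    #include <string>
    #include <utility>

    // Toy model of the single FIFO IO queue described above: each decode/upload
    // costs on the order of a frame interval, while a collection costs almost
    // nothing, yet the collection cannot run until everything ahead of it does.
    int main() {
      std::queue<std::pair<std::string, int>> io_queue;  // {task, cost in ms}
      for (int i = 1; i <= 3; ++i) {
        io_queue.push({"decode+upload image " + std::to_string(i), 32});
      }
      io_queue.push({"collect texture of image 1", 1});

      int elapsed_ms = 0;
      while (!io_queue.empty()) {
        const auto [task, cost] = io_queue.front();
        io_queue.pop();
        elapsed_ms += cost;
        std::cout << task << " finishes at t=" << elapsed_ms << "ms\n";
      }
      // The 1ms collection completes only after ~96ms of decode work queued
      // ahead of it.
      return 0;
    }
    ```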
    
    This is exacerbated by two other hacks added to work around unrelated issues.
    * First, creating a codec with a single image frame immediately kicks off
      decompression and upload of that frame image (even if the frame was never
      requested from the codec). This hack was added because we wanted to get rid
      of the compressed image allocation ASAP. The expectation was that codecs
      would only be created for the sole purpose of getting the decompressed image
      bytes. However, for applications that only create codecs to get image sizes
      (but never actually decompress the frames), we would end up replacing the
      compressed image allocation with a larger allocation (device resident no
      less) for no obvious use. This issue is particularly insidious when you
      consider that the codec is usually asked for the native image size first
      before the frame is requested at a smaller size (usually using a new codec
      with the same data but a new target size). This would cause the creation of
      a whole extra texture (at 1:1) when the caller was trying to “optimize” for
      memory use by requesting a texture of a smaller size.
    * Second, all image collections were delayed by the unref queue by 250ms
      because of observations that the calling thread (the UI thread) was being
      descheduled unnecessarily when a task with a timeout of zero was posted from
      it (recall that a task has to be posted to the IO thread for the collection
      of each texture). 250ms is multiple frame intervals' worth of potentially
      unnecessary textures. (A minimal sketch of such a delayed unref queue
      follows this list.)
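
    For reference, a minimal sketch of such a delayed unref queue is below, under
    the assumption of a task runner with delayed-task support; the names are
    hypothetical rather than the engine's actual types:

    ```cpp
    #include <chrono>
    #include <functional>
    #include <mutex>
    #include <vector>

    struct SkiaObject {};  // Stand-in for a GPU-backed object (e.g. a texture).

    // Hypothetical runner bound to the IO thread.
    struct TaskRunner {
      void PostDelayedTask(std::function<void()> task,
                           std::chrono::milliseconds delay);
    };

    class UnrefQueue {
     public:
      UnrefQueue(TaskRunner& io_runner, std::chrono::milliseconds drain_delay)
          : io_runner_(io_runner), drain_delay_(drain_delay) {}

      // Called from the UI thread when the last Dart reference is dropped.
      void Unref(SkiaObject* object) {
        std::lock_guard<std::mutex> lock(mutex_);
        objects_.push_back(object);
        if (!drain_pending_) {
          drain_pending_ = true;
          // The hack in question: the drain is deferred by a fixed delay
          // (250ms before this patch, sub-frame-interval after it) instead of
          // running as soon as possible.
          io_runner_.PostDelayedTask([this] { Drain(); }, drain_delay_);
        }
      }

     private:
      void Drain() {
        std::vector<SkiaObject*> to_delete;
        {
          std::lock_guard<std::mutex> lock(mutex_);
          to_delete.swap(objects_);
          drain_pending_ = false;
        }
        for (SkiaObject* object : to_delete) {
          delete object;  // Needs the IO thread's GPU context to be current.
        }
      }

      TaskRunner& io_runner_;
      const std::chrono::milliseconds drain_delay_;
      std::mutex mutex_;
      bool drain_pending_ = false;
      std::vector<SkiaObject*> objects_;
    };
    ```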
    
    The net result of these issues is that we may end up creating textures when
    all the application needs is to ask its codec for details about the image (but
    not necessarily access its bytes). Texture collection could also be delayed
    behind other image decompression jobs on the IO thread. Also, all texture
    collections are delayed for an arbitrary amount of time.
    
    These issues cause applications to be susceptible to OOM situations. These
    situations manifest in various ways. Host memory exhaustion causes the usual OOM
    issues. Device memory exhaustion seems to manifest in different ways on iOS and
    Android. On Android, allocation of a new texture seems to trigger an assertion
    in the driver. On iOS, the call hangs (presumably waiting for another thread to
    release textures, which we won’t do because those tasks are blocked behind the
    current task completing).
    
    To address peak memory usage, the following changes have been made:
    * Image decompression and upload/collection no longer happen on the same
      thread. All image decompression will now be handled on a workqueue. The
      number of worker threads in this workqueue is equal to the number of
      processors on the device. These threads have a lower priority than either
      the UI or Render threads. These workers are shared between all Flutter
      applications in the process. (A simplified sketch of such a workqueue
      follows this list.)
    * Both the images and their codecs now report the correct allocation size to
      Dart for GC purposes. The Dart VM uses this to pick objects for collection.
      Earlier, the image allocation was assumed to be 32bpp with no mipmapping
      overhead reported. Now, the correct image size is reported and the mipmapping
      overhead is accounted for (a rough size estimate is sketched after this
      list). Image codec sizes were not reported to the VM earlier and now are.
      Expect “External” VM allocations to be higher than previously reported and
      the numbers in Observatory to line up more closely with actual memory usage
      (device and host).
    * Decoding images to a specific size used to decode to 1:1 and then resize to
      the requested dimensions before texture upload. This has now been
      reworked so that images are first decompressed to a smaller size supported
      natively by the codec before final resizing to the requested target size. The
      intermediate copy is now smaller and more promptly collected. Resizing also
      happens on the workqueue worker.
    * The drain interval of the unref queue is now sub-frame-interval. I am
      hesitant to remove the delay entirely because I have not been able to
      instrument the performance overhead of removing it. That is next on my list.
      But now, multiple frame intervals' worth of textures no longer stick around.
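
    Conceptually, the decompression workqueue from the first bullet behaves like
    the sketch below. This is a simplified illustration; the engine's actual
    concurrent message loop differs in detail, and the thread-priority lowering is
    platform specific and omitted:

    ```cpp
    #include <algorithm>
    #include <condition_variable>
    #include <cstddef>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ConcurrentWorkQueue {
     public:
      // One worker per processor on the device, shared by all posters.
      ConcurrentWorkQueue()
          : worker_count_(std::max(1u, std::thread::hardware_concurrency())) {
        for (size_t i = 0; i < worker_count_; ++i) {
          workers_.emplace_back([this] { WorkerMain(); });
        }
      }

      ~ConcurrentWorkQueue() {
        {
          std::lock_guard<std::mutex> lock(mutex_);
          shutdown_ = true;
        }
        cv_.notify_all();
        for (std::thread& worker : workers_) {
          worker.join();
        }
      }

      // Image decompression and resize tasks are posted here instead of onto
      // the IO thread, so texture collection on the IO thread no longer waits
      // behind them.
      void PostTask(std::function<void()> task) {
        {
          std::lock_guard<std::mutex> lock(mutex_);
          tasks_.push(std::move(task));
        }
        cv_.notify_one();
      }

     private:
      void WorkerMain() {
        // A real implementation would also lower this thread's priority below
        // the UI and Render threads here.
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
            if (shutdown_ && tasks_.empty()) {
              return;
            }
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();
        }
      }

      const size_t worker_count_;
      std::vector<std::thread> workers_;
      std::mutex mutex_;
      std::condition_variable cv_;
      std::queue<std::function<void()>> tasks_;
      bool shutdown_ = false;
    };
    ```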
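
    And for the allocation sizes now reported to the Dart VM, the mipmapping
    overhead amounts to roughly a third on top of the base level. A
    back-of-the-envelope helper (hypothetical name; the engine's real accounting
    may differ) would be:

    ```cpp
    #include <cstddef>

    // Rough external allocation size for an uploaded texture: the base level's
    // pixel data plus about one third extra for a full mip chain, since
    // 1/4 + 1/16 + 1/64 + ... converges to 1/3.
    size_t EstimateTextureAllocationSize(size_t width, size_t height,
                                         size_t bytes_per_pixel, bool mipmapped) {
      const size_t base = width * height * bytes_per_pixel;
      return mipmapped ? base + base / 3 : base;
    }
    ```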
    
    The following issues have been addressed:
    * https://github.com/flutter/flutter/issues/34070 Since this was the first
      usage of concurrent message loops, the number of idle wakes was determined
      to be too high, and this component has been rewritten to be simpler and to
      not use the existing task runner and MessageLoopImpl interface.
    * Image decoding had no tests. A new `ui_unittests` harness has been added
      that sets up a GPU test environment on the host using SwiftShader. Tests
      have been added for image decompression, upload and resizing.
    * The device memory exhaustion in this benchmark has been addressed. That
      benchmark is still not viable for inclusion in any harness, however, because
      it creates 9 million codecs in straight-line execution. Because these codecs
      are destroyed in microtask callbacks, they remain referenced until those
      callbacks are executed. So now, instead of device memory exhaustion, this
      will lead to (slower) exhaustion of host memory. This is expected and working
      as intended.
    
    This patch only addresses peak memory use and makes collection of unused images
    and textures more prompt. It does NOT address memory use by images referenced
    strongly by the application or framework.