1. 17 January 2018 (4 commits)
  2. 08 December 2017 (1 commit)
    • dm bufio: fix shrinker scans when (nr_to_scan < retain_target) · fbc7c07e
      Authored by Suren Baghdasaryan
When the system is under memory pressure, the dm-bufio shrinker often
reclaims only one buffer per scan. This change fixes the following two
issues in the dm-bufio shrinker that cause this behavior:
      
1. The ((nr_to_scan - freed) <= retain_target) condition is used to
terminate the slab scan. This assumes that nr_to_scan is equal to the
LRU size, which is not necessarily true because do_shrink_slab() in
vmscan.c calculates nr_to_scan from multiple inputs. As a result, when
nr_to_scan is less than retain_target (64) the scan terminates after
the first iteration, effectively reclaiming one buffer per scan and
making scans very inefficient. This hurts vmscan performance, especially
because the mutex is acquired and released every time
dm_bufio_shrink_scan() is called. The new implementation uses a
((LRU size - freed) <= retain_target) condition for scan termination.
The LRU size can be safely determined inside __scan() because that
function is called after dm_bufio_lock().
      
2. do_shrink_slab() uses the value returned by dm_bufio_shrink_count()
to determine the number of freeable objects in the slab. However,
dm-bufio always retains retain_target buffers in its LRU and terminates
a scan when that mark is reached. Returning the entire LRU size from
dm_bufio_shrink_count() is therefore misleading, because it does not
represent the number of objects a scan can actually free. Returning
(LRU size - retain_target) better represents the number of freeable
objects in the slab. This way do_shrink_slab() returns 0 when
(LRU size < retain_target), and vmscan will not try to scan this
shrinker, avoiding scans that cannot reclaim any memory. Both changes
are sketched below.
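
A minimal C sketch of the two changes, using the upstream dm-bufio
names (__scan(), get_retain_buffers(), the n_buffers[] counters); treat
it as an illustration of the description above, not the literal patch:

    /* inside __scan(), which runs under dm_bufio_lock() */
    unsigned long count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];
    ...
    if (__try_evict_buffer(b, gfp_mask))
            freed++;
    /* terminate on the LRU size, not on nr_to_scan */
    if (!--nr_to_scan || ((count - freed) <= retain_target))
            return freed;

    static unsigned long dm_bufio_shrink_count(struct shrinker *shrink,
                                               struct shrink_control *sc)
    {
            struct dm_bufio_client *c =
                    container_of(shrink, struct dm_bufio_client, shrinker);
            unsigned long count = READ_ONCE(c->n_buffers[LIST_CLEAN]) +
                                  READ_ONCE(c->n_buffers[LIST_DIRTY]);
            unsigned long retain_target = get_retain_buffers(c);

            /* report only what a scan could actually free */
            return (count < retain_target) ? 0 : (count - retain_target);
    }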
      
Test: tested using an Android device running
<AOSP>/system/extras/alloc-stress, which generates memory pressure
and causes intensive shrinker scans.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  3. 17 November 2017 (1 commit)
    • dm bufio: fix integer overflow when limiting maximum cache size · 74d4108d
      Authored by Eric Biggers
      The default max_cache_size_bytes for dm-bufio is meant to be the lesser
      of 25% of the size of the vmalloc area and 2% of the size of lowmem.
      However, on 32-bit systems the intermediate result in the expression
      
          (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
      
      overflows, causing the wrong result to be computed.  For example, on a
      32-bit system where the vmalloc area is 520093696 bytes, the result is
      1174405 rather than the expected 130023424, which makes the maximum
      cache size much too small (far less than 2% of lowmem).  This causes
      severe performance problems for dm-verity users on affected systems.
      
      Fix this by using mult_frac() to correctly multiply by a percentage.  Do
      this for all places in dm-bufio that multiply by a percentage.  Also
      replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
      to the comment is now defined in include/linux/vmalloc.h.
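
As a sketch of the fix (mult_frac() is the standard kernel helper;
DM_BUFIO_VMALLOC_PERCENT is the dm-bufio constant from the expression
above):

    /* before: the intermediate product overflows 32-bit unsigned long */
    mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;

    /* after: mult_frac(x, n, d) computes (x/d)*n + ((x%d)*n)/d, so the
     * intermediates stay small; e.g. mult_frac(520093696, 25, 100)
     * = 5200936*25 + (96*25)/100 = 130023424, the expected result */
    mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);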
      
      Depends-on: 9993bc63 ("sched/x86: Fix overflow in cyc2ns_offset")
      Fixes: 95d402f0 ("dm: add bufio")
      Cc: <stable@vger.kernel.org> # v3.2+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 25 October 2017 (1 commit)
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Authored by Mark Rutland
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
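A trivial instance of the conversion the script produces looks like this
(dm_bufio_cache_size is a real dm-bufio module parameter; new_size is a
hypothetical local used only for illustration):

    -	val = ACCESS_ONCE(dm_bufio_cache_size);
    +	val = READ_ONCE(dm_bufio_cache_size);

    -	ACCESS_ONCE(dm_bufio_cache_size) = new_size;
    +	WRITE_ONCE(dm_bufio_cache_size, new_size);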
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 28 August 2017 (1 commit)
  6. 24 August 2017 (1 commit)
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
This way we don't need a block_device structure to submit I/O. The
block_device has different lifetime rules from the gendisk and
request_queue, and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
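
A hedged sketch of the new routing (bi_disk and bi_partno are the fields
this change introduces; bio_set_dev() is the companion helper from the
same series):

    /* before: I/O was routed through a block_device */
    bio->bi_bdev = bdev;

    /* after: route through the gendisk plus a partition index,
     * which generic_make_request() uses for partition remapping */
    bio_set_dev(bio, bdev);	/* sets bio->bi_disk and bio->bi_partno */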
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 25 July 2017 (1 commit)
  8. 09 June 2017 (1 commit)
  9. 31 May 2017 (1 commit)
    • dm: make flush bios explicitly sync · ff0361b3
      Authored by Jan Kara
Commit b685d3d6 ("block: treat REQ_FUA and REQ_PREFLUSH as
synchronous") removed the REQ_SYNC flag from the WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks(), however, strips the REQ_FUA
and REQ_PREFLUSH flags from a bio when the storage doesn't report a
volatile write cache, so the write effectively becomes asynchronous,
which can lead to performance regressions.
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
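
In code terms this is a one-flag addition where dm builds its flush bios
(the flag names are the real block-layer ones; the exact call sites are
the dm targets touched by the patch):

    /* before: may degrade to a plain async write on write-through devices */
    bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;

    /* after: stays on the synchronous path even if REQ_PREFLUSH is stripped */
    bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;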
      
      Fixes: b685d3d6 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
      Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 17 May 2017 (1 commit)
    • dm bufio: make the parameter "retain_bytes" unsigned long · 13840d38
      Authored by Mikulas Patocka
      Change the type of the parameter "retain_bytes" from unsigned to
      unsigned long, so that on 64-bit machines the user can set more than
      4GiB of data to be retained.
      
Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long.  The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could overflow when the unsigned long sum is truncated to unsigned, and
that could result in buffers not being freed when they should be.

While at it, avoid division in get_retain_buffers().  Division is slow;
we can change it to a shift because we have precalculated the log2 of
the block size.
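
A sketch of the shift-based helper, following the upstream dm-bufio
names of that era (sectors_per_block_bits is the precomputed log2;
ACCESS_ONCE() was later converted to READ_ONCE() by the treewide patch
above):

    static unsigned long get_retain_buffers(struct dm_bufio_client *c)
    {
            unsigned long retain_bytes = ACCESS_ONCE(dm_bufio_retain_bytes);

            /* block size is a power of two: bytes -> buffers is a shift */
            return retain_bytes >> (c->sectors_per_block_bits + SECTOR_SHIFT);
    }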
      
      Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 09 May 2017 (1 commit)
  12. 02 May 2017 (2 commits)
    • dm bufio: check new buffer allocation watermark every 30 seconds · 390020ad
      Authored by Mikulas Patocka
      dm-bufio checks a watermark when it allocates a new buffer in
      __bufio_new().  However, it doesn't check the watermark when the user
      changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.
      
This may result in a problem: if the watermark is high enough that all
possible buffers are already allocated, and the user then lowers
"max_cache_size_bytes", the watermark will never be checked against the
new value because no new buffer will be allocated.
      
      To fix this, change __evict_old_buffers() so that it checks the
      watermark.  __evict_old_buffers() is called every 30 seconds, so if the
      user reduces "max_cache_size_bytes", dm-bufio will react to this change
      within 30 seconds and decrease memory consumption.
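
An outline of the change, using the upstream helper names
(__check_watermark(), __flush_write_list()); treat it as a sketch of the
description rather than the full patch:

    static void __evict_old_buffers(struct dm_bufio_client *c, unsigned long age_hz)
    {
            LIST_HEAD(write_list);
            ...
            dm_bufio_lock(c);

            /* new: re-check the watermark on every 30-second pass, so a
             * lowered max_cache_size_bytes takes effect without waiting
             * for a new allocation */
            __check_watermark(c, &write_list);
            if (unlikely(!list_empty(&write_list))) {
                    dm_bufio_unlock(c);
                    __flush_write_list(&write_list);
                    dm_bufio_lock(c);
            }
            ...
            dm_bufio_unlock(c);
    }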
      
      Depends-on: 1b0fb5a5 ("dm bufio: avoid a possible ABBA deadlock")
      Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm bufio: avoid a possible ABBA deadlock · 1b0fb5a5
      Authored by Mikulas Patocka
      __get_memory_limit() tests if dm_bufio_cache_size changed and calls
      __cache_size_refresh() if it did.  It takes dm_bufio_clients_lock while
      it already holds the client lock.  However, lock ordering is violated
      because in cleanup_old_buffers() dm_bufio_clients_lock is taken before
      the client lock.
      
This results in a possible deadlock and a lockdep warning.
      
      Fix this deadlock by changing mutex_lock() to mutex_trylock().  If the
      lock can't be taken, it will be re-checked next time when a new buffer
      is allocated.
      
      Also add "unlikely" to the if condition, so that the optimizer assumes
      that the condition is false.
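
Sketched against __get_memory_limit() (names per upstream dm-bufio;
illustration only):

    if (unlikely(ACCESS_ONCE(dm_bufio_cache_size) != dm_bufio_cache_size_latch)) {
            if (mutex_trylock(&dm_bufio_clients_lock)) {
                    __cache_size_refresh();
                    mutex_unlock(&dm_bufio_clients_lock);
            }
            /* on trylock failure, simply retry at the next buffer
             * allocation; no lock-order inversion */
    }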
      
      Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  13. 08 March 2017 (1 commit)
  14. 02 March 2017 (1 commit)
  15. 14 January 2017 (1 commit)
    • sched/core: Remove set_task_state() · 642fa448
      Authored by Davidlohr Bueso
      This is a nasty interface and setting the state of a foreign task must
      not be done. As of the following commit:
      
        be628be0 ("bcache: Make gc wakeup sane, remove set_task_state()")
      
      ... everyone in the kernel calls set_task_state() with current, allowing
      the helper to be removed.
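
The conversion is mechanical; an illustrative before/after:

    /* before */
    set_task_state(current, TASK_UNINTERRUPTIBLE);

    /* after */
    set_current_state(TASK_UNINTERRUPTIBLE);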
      
However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64; however, this
has now been addressed [1] and performance is up to par, making no
difference between the two calls.
      
      Of all the callers, if any, it's the locking bits that would care most
      about this -- ie: we end up passing a tsk pointer to a lot of the lock
      slowpath, and setting ->state on that. The following numbers are based
      on two tests: a custom ad-hoc microbenchmark that just measures
      latencies (for ~65 million calls) between get_task_state() vs
      get_current_state().
      
Secondly, for a higher-level overview, an unlink microbenchmark was
used, which pounds on a single file with open, close, unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex (now an rwsem).
      
      [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
Avg runtime set_task_state():    938 msecs
Avg runtime set_current_state(): 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      
x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms) does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs, but always within a 20 to 50ms increase; ppc was more constant.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 09 December 2016 (3 commits)
    • dm bufio: drop the lock when doing GFP_NOIO allocation · 41c73a49
      Authored by Mikulas Patocka
If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using a GFP_NOIO allocation (the lock is dropped because the
allocation can take some time).
      
      Note that we won't do GFP_NOIO allocation when we loop for the second
      time, because the lock shouldn't be dropped between __wait_for_free_buffer
      and __get_unclaimed_buffer.
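
A hedged outline of the resulting retry logic inside the
buffer-allocation wait loop (names such as alloc_buffer() and
tried_noio_alloc follow upstream dm-bufio; the extra __GFP_ modifiers
are an assumption from the surrounding code):

    /* first try: cheap and non-sleeping, while holding the dm-bufio lock */
    b = alloc_buffer(c, GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
    if (b)
            return b;

    if (!tried_noio_alloc) {
            /* GFP_NOIO may sleep in reclaim, so drop the lock around it */
            dm_bufio_unlock(c);
            b = alloc_buffer(c, GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NOWARN);
            dm_bufio_lock(c);
            if (b)
                    return b;
            tried_noio_alloc = true;	/* don't drop the lock on later loops */
    }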
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm bufio: don't take the lock in dm_bufio_shrink_count · d12067f4
      Authored by Mikulas Patocka
dm_bufio_shrink_count() is called from do_shrink_slab() to find out how
many freeable objects there are. The reported value doesn't have to be
precise, so we don't need to take the dm-bufio lock.
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm bufio: avoid sleeping while holding the dm_bufio lock · 9ea61cac
      Authored by Douglas Anderson
      We've seen in-field reports showing _lots_ (18 in one case, 41 in
      another) of tasks all sitting there blocked on:
      
        mutex_lock+0x4c/0x68
        dm_bufio_shrink_count+0x38/0x78
        shrink_slab.part.54.constprop.65+0x100/0x464
        shrink_zone+0xa8/0x198
      
      In the two cases analyzed, we see one task that looks like this:
      
        Workqueue: kverityd verity_prefetch_io
      
        __switch_to+0x9c/0xa8
        __schedule+0x440/0x6d8
        schedule+0x94/0xb4
        schedule_timeout+0x204/0x27c
        schedule_timeout_uninterruptible+0x44/0x50
        wait_iff_congested+0x9c/0x1f0
        shrink_inactive_list+0x3a0/0x4cc
        shrink_lruvec+0x418/0x5cc
        shrink_zone+0x88/0x198
        try_to_free_pages+0x51c/0x588
        __alloc_pages_nodemask+0x648/0xa88
        __get_free_pages+0x34/0x7c
        alloc_buffer+0xa4/0x144
        __bufio_new+0x84/0x278
        dm_bufio_prefetch+0x9c/0x154
        verity_prefetch_io+0xe8/0x10c
        process_one_work+0x240/0x424
        worker_thread+0x2fc/0x424
        kthread+0x10c/0x114
      
      ...and that looks to be the one holding the mutex.
      
The problem can be reproduced fairly easily:
      0. Be running Chrome OS w/ verity enabled on the root filesystem
      1. Pick test patch: http://crosreview.com/412360
      2. Install launchBalloons.sh and balloon.arm from
           http://crbug.com/468342
         ...that's just a memory stress test app.
      3. On a 4GB rk3399 machine, run
           nice ./launchBalloons.sh 4 900 100000
         ...that tries to eat 4 * 900 MB of memory and keep accessing.
      4. Login to the Chrome web browser and restore many tabs
      
      With that, I've seen printouts like:
        DOUG: long bufio 90758 ms
...and the stack trace always shows we're in dm_bufio_prefetch().
      
      The problem is that we try to allocate memory with GFP_NOIO while
      we're holding the dm_bufio lock.  Instead we should be using
      GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
      lock and that causes the above problems.
      
The current behavior, as explained by David Rientjes:
      
        It will still try reclaim initially because __GFP_WAIT (or
        __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
        contention on dm_bufio_lock() that the thread holds.  You want to
        pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
        mutex that can be contended by a concurrent slab shrinker (if
        count_objects didn't use a trylock, this pattern would trivially
        deadlock).
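
In code terms the fix is a one-flag change at the allocation attempt
made while the mutex is held (a sketch; the __GFP_ modifiers are
assumed from the surrounding code):

    -	b = alloc_buffer(c, GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NOWARN);
    +	b = alloc_buffer(c, GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);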
      
      This change significantly increases responsiveness of the system while
      in this state.  It makes a real difference because it unblocks kswapd.
      In the bug report analyzed, kswapd was hung:
      
         kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
         Call trace:
         [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
         [<ffffffc00090b794>] __schedule+0x440/0x6d8
         [<ffffffc00090bac0>] schedule+0x94/0xb4
         [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
         [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
         [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
         [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
         [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
         [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
         [<ffffffc00030e578>] balance_pgdat+0x328/0x508
         [<ffffffc00030eb7c>] kswapd+0x424/0x51c
         [<ffffffc00023f06c>] kthread+0x10c/0x114
         [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
      
By unblocking kswapd, memory pressure should be reduced.
Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  17. 22 November 2016 (1 commit)
  18. 01 November 2016 (1 commit)
  19. 22 September 2016 (1 commit)
  20. 31 August 2016 (1 commit)
  21. 08 June 2016 (2 commits)
  22. 04 January 2016 (1 commit)
  23. 10 December 2015 (3 commits)
  24. 01 November 2015 (2 commits)
  25. 29 July 2015 (1 commit)
    • block: add a bi_error field to struct bio · 4246a0b6
      Authored by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and of not being passed along from child to
parent bio in the ever more popular chaining scenario.  Having both
mechanisms available has the additional drawback of utterly confusing
driver authors and introducing bugs where various I/O submitters only
deal with one of them, and the others have to add boilerplate code to
deal with both kinds of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
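
Illustrated as a hedged before/after (the call shapes match the old and
new bio_endio() signatures):

    /* before: two disjoint mechanisms */
    clear_bit(BIO_UPTODATE, &bio->bi_flags);	/* can only ever mean -EIO */
    bio_endio(bio, -EIO);			/* errno handed to ->bi_end_io */

    /* after: one persistent field carried on the bio itself */
    bio->bi_error = -EIO;
    bio_endio(bio);				/* ->bi_end_io reads bio->bi_error */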
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
  26. 10 February 2015 (1 commit)
  27. 02 December 2014 (1 commit)
  28. 11 November 2014 (2 commits)
  29. 17 October 2014 (1 commit)
    • dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks · 9d28eb12
      Authored by Mikulas Patocka
The shrinker uses gfp flags to indicate what kinds of operations the
driver can wait for. If the __GFP_IO flag is present, the driver can wait
for block I/O operations; if the __GFP_FS flag is present, the driver can
wait on operations involving the filesystem.
      
      dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
      device that makes calls into the filesystem. If __GFP_IO is present and
      __GFP_FS isn't, dm-bufio could still block on filesystem operations if it
      runs on a loop block device.
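
A sketch of the affected check in the shrinker callbacks (shapes per the
dm-bufio source of that era; illustration only):

    static unsigned long
    dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
    {
            struct dm_bufio_client *c =
                    container_of(shrink, struct dm_bufio_client, shrinker);

            if (sc->gfp_mask & __GFP_FS)	/* was: __GFP_IO */
                    dm_bufio_lock(c);	/* safe to wait on the filesystem */
            else if (!dm_bufio_trylock(c))
                    return SHRINK_STOP;	/* never block reclaim here */
            ...
    }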
      
      The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
      unreproducible) deadlock involving dm-bufio and loop device.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org