1. 17 May 2017 (1 commit)
    • dm bufio: make the parameter "retain_bytes" unsigned long · 13840d38
      Mikulas Patocka authored
      Change the type of the parameter "retain_bytes" from unsigned to
      unsigned long, so that on 64-bit machines the user can set more than
      4GiB of data to be retained.
      
      Also, change the type of the variable "count" in the function
      "__evict_old_buffers" to unsigned long.  The assignment
      "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
      could truncate the unsigned long sum to unsigned, and that overflow
      could result in buffers not being freed when they should be.
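
      A minimal user-space sketch of the truncation (illustrative only, not
      the dm-bufio code): summing two large unsigned long counts into a plain
      unsigned silently drops the high bits on 64-bit machines.

        #include <stdio.h>

        int main(void)
        {
                unsigned long clean = 3UL << 31, dirty = 3UL << 31;
                unsigned count_bad = clean + dirty;     /* truncated to 32 bits */
                unsigned long count_ok = clean + dirty; /* keeps the full sum */

                printf("bad: %u, ok: %lu\n", count_bad, count_ok);
                return 0;
        }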
      
      While at it, avoid division in get_retain_buffers().  Division is slow;
      we can change it to a shift because we have precalculated the log2 of
      the block size.
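
      A hedged sketch of the division-to-shift idea (the parameter list is
      illustrative; the real function reads the client's precomputed
      block-size bits):

        /* block_size == 1 << block_bits, so division becomes a shift */
        static unsigned long get_retain_buffers(unsigned long retain_bytes,
                                                unsigned char block_bits)
        {
                return retain_bytes >> block_bits;
        }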
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      13840d38
  2. 09 May 2017 (1 commit)
  3. 02 May 2017 (2 commits)
    • dm bufio: check new buffer allocation watermark every 30 seconds · 390020ad
      Mikulas Patocka authored
      dm-bufio checks a watermark when it allocates a new buffer in
      __bufio_new().  However, it doesn't check the watermark when the user
      changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.
      
      This can cause a problem: if the watermark is high enough that all
      possible buffers are allocated, and the user then lowers the value of
      "max_cache_size_bytes", the watermark will never be checked against the
      new value because no new buffer would be allocated.
      
      To fix this, change __evict_old_buffers() so that it checks the
      watermark.  __evict_old_buffers() is called every 30 seconds, so if the
      user reduces "max_cache_size_bytes", dm-bufio will react to this change
      within 30 seconds and decrease memory consumption.
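
      A hedged sketch of the periodic check (function names follow the
      commit text; the bodies are assumptions, not the exact dm-bufio code):

        /* Runs from delayed work roughly every 30 seconds. */
        static void __evict_old_buffers(struct dm_bufio_client *c,
                                        unsigned long age_hz)
        {
                /* React to a lowered max_cache_size_bytes even when no
                 * new buffer is being allocated. */
                __check_watermark(c);

                /* ... then evict buffers older than age_hz as before ... */
        }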
      
      Depends-on: 1b0fb5a5 ("dm bufio: avoid a possible ABBA deadlock")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      390020ad
    • dm bufio: avoid a possible ABBA deadlock · 1b0fb5a5
      Mikulas Patocka authored
      __get_memory_limit() tests if dm_bufio_cache_size changed and calls
      __cache_size_refresh() if it did.  It takes dm_bufio_clients_lock while
      it already holds the client lock.  However, lock ordering is violated
      because in cleanup_old_buffers() dm_bufio_clients_lock is taken before
      the client lock.
      
      This results in a possible deadlock and a lockdep warning.
      
      Fix this deadlock by changing mutex_lock() to mutex_trylock().  If the
      lock can't be taken, it will be re-checked the next time a new buffer
      is allocated.
      
      Also add "unlikely" to the if condition, so that the optimizer assumes
      that the condition is false.
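
      A hedged sketch of the resulting pattern (the shape follows the commit
      text; exact names and surrounding code are assumptions):

        if (unlikely(dm_bufio_cache_size != dm_bufio_cache_size_latch)) {
                if (mutex_trylock(&dm_bufio_clients_lock)) {
                        __cache_size_refresh();
                        mutex_unlock(&dm_bufio_clients_lock);
                }
                /* else: skip for now; re-checked on the next allocation */
        }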
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      1b0fb5a5
  4. 08 Mar 2017 (1 commit)
  5. 02 Mar 2017 (1 commit)
  6. 14 Jan 2017 (1 commit)
    • sched/core: Remove set_task_state() · 642fa448
      Davidlohr Bueso authored
      This is a nasty interface and setting the state of a foreign task must
      not be done. As of the following commit:
      
        be628be0 ("bcache: Make gc wakeup sane, remove set_task_state()")
      
      ... everyone in the kernel calls set_task_state() with current, allowing
      the helper to be removed.
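
      The typical conversion this enables (a representative example, not a
      specific hunk from the patch):

        /* before */ set_task_state(current, TASK_UNINTERRUPTIBLE);
        /* after  */ set_current_state(TASK_UNINTERRUPTIBLE);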
      
      However, as the comment indicates, it is still around for those archs
      where computing current is more expensive than using a pointer, at least
      in theory. An important arch that is affected is arm64; however, this has
      been addressed now [1] and performance is up to par, making no difference
      between the two calls.
      
      Of all the callers, if any, it's the locking bits that would care most
      about this -- ie: we end up passing a tsk pointer to a lot of the lock
      slowpath, and setting ->state on that. The following numbers are based
      on two tests: first, a custom ad-hoc microbenchmark that just measures
      latencies (for ~65 million calls) of set_task_state() vs
      set_current_state().
      
      Secondly, for a higher-level overview, an unlink microbenchmark was
      used, which pounds on a single file with open, close, unlink combos
      with increasing thread counts (up to 4x ncpus). While the workload is
      quite unrealistic, it does contend a lot on the inode mutex (now an
      rwsem).
      
      [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():  938 msecs
      Avg runtime set_current_state(): 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      
      x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
      and ppc64 (with paca) show similar improvements in the unlink microbenches.
      The small delta for ppc64 (2ms) does not represent the gains on the unlink
      runs. In the case of x86, there was a decent amount of variation in the
      latency runs (but always within a 20 to 50ms increase); ppc was more constant.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      642fa448
  7. 09 Dec 2016 (3 commits)
    • dm bufio: drop the lock when doing GFP_NOIO allocation · 41c73a49
      Mikulas Patocka authored
      If the first allocation attempt using GFP_NOWAIT fails, drop the lock
      and retry using a GFP_NOIO allocation (the lock is dropped because the
      allocation can take some time).
      
      Note that we won't do GFP_NOIO allocation when we loop for the second
      time, because the lock shouldn't be dropped between __wait_for_free_buffer
      and __get_unclaimed_buffer.
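
      A hedged sketch of the two-step allocation described above (the
      structure follows the commit text; names and surrounding code are
      assumptions):

        /* Fast attempt while the dm_bufio lock is held. */
        b = alloc_buffer(c, GFP_NOWAIT);
        if (!b) {
                dm_bufio_unlock(c);     /* GFP_NOIO may take a while */
                b = alloc_buffer(c, GFP_NOIO);
                dm_bufio_lock(c);
        }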
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      41c73a49
    • dm bufio: don't take the lock in dm_bufio_shrink_count · d12067f4
      Mikulas Patocka authored
      dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
      freeable objects there are. The reported value doesn't have to be precise,
      so we don't need to take the dm-bufio lock.
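
      A hedged sketch of the lockless count (the count/scan shrinker API is
      real for this era; the exact field reads are assumptions):

        static unsigned long dm_bufio_shrink_count(struct shrinker *shrink,
                                                   struct shrink_control *sc)
        {
                struct dm_bufio_client *c =
                        container_of(shrink, struct dm_bufio_client, shrinker);

                /* Racy but fine: do_shrink_slab only needs an estimate. */
                return READ_ONCE(c->n_buffers[LIST_CLEAN]) +
                       READ_ONCE(c->n_buffers[LIST_DIRTY]);
        }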
      Suggested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      d12067f4
    • dm bufio: avoid sleeping while holding the dm_bufio lock · 9ea61cac
      Douglas Anderson authored
      We've seen in-field reports showing _lots_ (18 in one case, 41 in
      another) of tasks all sitting there blocked on:
      
        mutex_lock+0x4c/0x68
        dm_bufio_shrink_count+0x38/0x78
        shrink_slab.part.54.constprop.65+0x100/0x464
        shrink_zone+0xa8/0x198
      
      In the two cases analyzed, we see one task that looks like this:
      
        Workqueue: kverityd verity_prefetch_io
      
        __switch_to+0x9c/0xa8
        __schedule+0x440/0x6d8
        schedule+0x94/0xb4
        schedule_timeout+0x204/0x27c
        schedule_timeout_uninterruptible+0x44/0x50
        wait_iff_congested+0x9c/0x1f0
        shrink_inactive_list+0x3a0/0x4cc
        shrink_lruvec+0x418/0x5cc
        shrink_zone+0x88/0x198
        try_to_free_pages+0x51c/0x588
        __alloc_pages_nodemask+0x648/0xa88
        __get_free_pages+0x34/0x7c
        alloc_buffer+0xa4/0x144
        __bufio_new+0x84/0x278
        dm_bufio_prefetch+0x9c/0x154
        verity_prefetch_io+0xe8/0x10c
        process_one_work+0x240/0x424
        worker_thread+0x2fc/0x424
        kthread+0x10c/0x114
      
      ...and that looks to be the one holding the mutex.
      
      The problem can be reproduced fairly easily:
      0. Be running Chrome OS w/ verity enabled on the root filesystem
      1. Pick test patch: http://crosreview.com/412360
      2. Install launchBalloons.sh and balloon.arm from
           http://crbug.com/468342
         ...that's just a memory stress test app.
      3. On a 4GB rk3399 machine, run
           nice ./launchBalloons.sh 4 900 100000
         ...that tries to eat 4 * 900 MB of memory and keeps accessing it.
      4. Login to the Chrome web browser and restore many tabs
      
      With that, I've seen printouts like:
        DOUG: long bufio 90758 ms
      ...and the stack trace always shows we're in dm_bufio_prefetch().
      
      The problem is that we try to allocate memory with GFP_NOIO while
      we're holding the dm_bufio lock.  Instead we should be using
      GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
      lock and that causes the above problems.
      
      The current behavior, as explained by David Rientjes:
      
        It will still try reclaim initially because __GFP_WAIT (or
        __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
        contention on dm_bufio_lock() that the thread holds.  You want to
        pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
        mutex that can be contended by a concurrent slab shrinker (if
        count_objects didn't use a trylock, this pattern would trivially
        deadlock).
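
      A minimal before/after sketch of the allocation call (additional gfp
      modifier flags used by the real code are omitted):

        /* before: can sleep in reclaim while holding the dm_bufio lock */
        b = alloc_buffer(c, GFP_NOIO);

        /* after: fail fast instead; fall back without the lock held */
        b = alloc_buffer(c, GFP_NOWAIT);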
      
      This change significantly increases responsiveness of the system while
      in this state.  It makes a real difference because it unblocks kswapd.
      In the bug report analyzed, kswapd was hung:
      
         kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
         Call trace:
         [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
         [<ffffffc00090b794>] __schedule+0x440/0x6d8
         [<ffffffc00090bac0>] schedule+0x94/0xb4
         [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
         [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
         [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
         [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
         [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
         [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
         [<ffffffc00030e578>] balance_pgdat+0x328/0x508
         [<ffffffc00030eb7c>] kswapd+0x424/0x51c
         [<ffffffc00023f06c>] kthread+0x10c/0x114
         [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
      
      By unblocking kswapd, memory pressure should be reduced.
      Suggested-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      9ea61cac
  8. 22 Nov 2016 (1 commit)
  9. 01 Nov 2016 (1 commit)
  10. 22 Sep 2016 (1 commit)
  11. 31 Aug 2016 (1 commit)
  12. 08 Jun 2016 (2 commits)
  13. 04 Jan 2016 (1 commit)
  14. 10 Dec 2015 (3 commits)
  15. 01 Nov 2015 (2 commits)
  16. 29 Jul 2015 (1 commit)
    • block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig authored
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawbacks of not being persistent
      when bios are queued up and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
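
      Before/after of how a driver signals an error under the new scheme (a
      minimal sketch):

        /* before: flag and errno passed separately */
        clear_bit(BIO_UPTODATE, &bio->bi_flags);
        bio_endio(bio, -EIO);

        /* after: the errno lives in the bio itself */
        bio->bi_error = -EIO;
        bio_endio(bio);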
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4246a0b6
  17. 10 Feb 2015 (1 commit)
  18. 02 Dec 2014 (1 commit)
  19. 11 Nov 2014 (2 commits)
  20. 17 Oct 2014 (1 commit)
    • dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks · 9d28eb12
      Mikulas Patocka authored
      The shrinker uses gfp flags to indicate what kinds of operations the
      driver can wait for: if the __GFP_IO flag is present, the driver can wait
      for block I/O operations; if the __GFP_FS flag is present, the driver can
      wait on operations involving the filesystem.
      
      dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
      device that makes calls into the filesystem. If __GFP_IO is present and
      __GFP_FS isn't, dm-bufio could still block on filesystem operations if it
      runs on a loop block device.
      
      The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
      unreproducible) deadlock involving dm-bufio and loop device.
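
      A hedged sketch of the check in the shrinker callbacks (the flag test
      follows the commit text; the surrounding lock helpers are assumptions):

        /* Waiting may recurse into the filesystem (e.g. via a loop
         * device), so require __GFP_FS rather than just __GFP_IO. */
        if (sc->gfp_mask & __GFP_FS)
                dm_bufio_lock(c);
        else if (!dm_bufio_trylock(c))
                return SHRINK_STOP;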
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9d28eb12
  21. 06 Oct 2014 (2 commits)
    • dm bufio: when done scanning return from __scan immediately · 0e825862
      Mikulas Patocka authored
      When __scan frees the required number of buffer entries that the
      shrinker requested (nr_to_scan becomes zero) it must return.  Before
      this fix the __scan code exited only the inner loop and continued in the
      outer loop -- which could result in reduced performance due to extra
      buffers being freed (e.g. unnecessarily evicted thinp metadata needing
      to be synchronously re-read into bufio's cache).
      
      Also, move dm_bufio_cond_resched to __scan's inner loop, so that
      iterating the bufio client's lru lists doesn't result in scheduling
      latency.
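
      A hedged sketch of the loop structure after the fix (list and helper
      names are assumptions based on the commit text):

        for (l = 0; l < LIST_SIZE; l++) {
                list_for_each_entry_safe_reverse(b, tmp, &c->lru[l], lru_list) {
                        freed += __try_evict_buffer(b, gfp_mask);
                        if (!--nr_to_scan)
                                return freed;    /* leave both loops at once */
                        dm_bufio_cond_resched(); /* now inside the inner loop */
                }
        }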
      Reported-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.2+
      0e825862
    • dm bufio: update last_accessed when relinking a buffer · eb76faf5
      Joe Thornber authored
      The 'last_accessed' member of the dm_buffer structure was only set when
      the buffer was created.  This led to each buffer being discarded
      after dm_bufio_max_age time even if it was used recently.  In practice
      this resulted in all thinp metadata being evicted soon after being read
      -- this is particularly problematic for metadata intensive workloads
      like multithreaded small random IO.
      
      'last_accessed' is now updated each time the buffer is moved to the head
      of the LRU list, so the buffer is now properly discarded if it was not
      used in dm_bufio_max_age time.
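
      A hedged one-line sketch of the fix inside the LRU relink helper (the
      helper name and body details are assumptions):

        static void __relink_lru(struct dm_buffer *b, int dirty)
        {
                struct dm_bufio_client *c = b->c;

                b->last_accessed = jiffies;     /* refresh age on every move */

                list_move(&b->lru_list, &c->lru[dirty]);
        }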
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.2+
      eb76faf5
  22. 19 Sep 2014 (1 commit)
    • sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
      Kirill Tkhai authored
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
      (All places in the patch are visibly correct; the only exception is
       kiblnd_scheduler() from:
      
            drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
      
       Its schedule() sits one line above the standard 3 lines of unified
       diff context.)
      
      There are no places where set_current_state() is used as an implicit
      memory barrier (mb()).
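
      The pattern removed throughout the tree (a representative example, not
      a specific hunk):

        /* before */
        schedule();
        set_current_state(TASK_RUNNING);        /* redundant */

        /* after: schedule() already returns in TASK_RUNNING */
        schedule();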
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f139caf2
  23. 02 Aug 2014 (1 commit)
    • dm bufio: fully initialize shrinker · d8c712ea
      Greg Thelen authored
      1d3d4437 ("vmscan: per-node deferred work") added a flags field to
      struct shrinker assuming that all shrinkers were zero filled.  The dm
      bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
      in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
      But there are proposed patches which add other bits to shrinker.flags
      (e.g. memcg awareness).
      
      Rather than simply initializing the shrinker, this patch uses kzalloc()
      when allocating the dm_bufio_client to ensure that the embedded shrinker
      and any other similar structures are zeroed.
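
      The essence of the fix (a minimal before/after sketch of the client
      allocation; the real allocation size includes trailing data):

        /* before: the embedded shrinker inherits stale kmalloc() data */
        c = kmalloc(sizeof(*c), GFP_KERNEL);

        /* after: everything, including shrinker.flags, starts zeroed */
        c = kzalloc(sizeof(*c), GFP_KERNEL);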
      
      This fixes theoretical over-aggressive shrinking of dm bufio objects.
      If the uninitialized dm_bufio_client.shrinker.flags contains
      SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
      each NUMA node rather than just once.  This has been broken since 3.12.
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.12+
      d8c712ea
  24. 16 Jul 2014 (1 commit)
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown authored
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
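
      Old versus new call style (a minimal sketch; my_io_wait_action and
      BIT_NR are placeholders):

        /* before: the caller supplies an action function */
        wait_on_bit(&word, BIT_NR, my_io_wait_action, TASK_UNINTERRUPTIBLE);

        /* after: the standard io_schedule()-based wait is built in */
        wait_on_bit_io(&word, BIT_NR, TASK_UNINTERRUPTIBLE);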
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack will.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15), two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS now also
      uses an action function that makes the same freezer-aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      74316201
  25. 18 Apr 2014 (1 commit)
  26. 15 Jan 2014 (2 commits)
    • dm snapshot: use dm-bufio prefetch · 55b082e6
      Mikulas Patocka authored
      This patch modifies dm-snapshot so that it prefetches the buffers when
      loading the exceptions.
      
      The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS
      macro.  The current value for DM_PREFETCH_CHUNKS (12) was found to
      provide the best performance on a single 15k SCSI spindle.  In the
      future we may modify this default or make it configurable.
      
      Also, introduce the function dm_bufio_set_minimum_buffers to set up
      bufio's number of internal buffers before freeing happens.  dm-bufio may
      hold more buffers if enough memory is available.  There is no guarantee
      that the specified number of buffers will be available; if you need a
      guarantee, use the argument reserved_buffers for
      dm_bufio_client_create.
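
      A hedged usage sketch from the snapshot side (DM_PREFETCH_CHUNKS and
      both function names follow the commit text; the call sites are
      assumptions):

        #define DM_PREFETCH_CHUNKS 12   /* tuned on one 15k SCSI spindle */

        /* Hint bufio to keep this many buffers resident (best effort). */
        dm_bufio_set_minimum_buffers(client, DM_PREFETCH_CHUNKS);

        /* Read ahead while loading the exceptions. */
        dm_bufio_prefetch(client, chunk, DM_PREFETCH_CHUNKS);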
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      55b082e6
    • dm snapshot: use dm-bufio · 55494bf2
      Mikulas Patocka authored
      Use dm-bufio for initial loading of the exceptions.
      Introduce a new function dm_bufio_forget that frees the given buffer.
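
      A hedged sketch of the new helper's use (the prototype is inferred
      from the description; the call site is an assumption):

        /* Free a buffer we know we will not need again. */
        dm_bufio_forget(client, block);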
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      55494bf2
  27. 11 Dec 2013 (1 commit)
    • dm bufio: initialize read-only module parameters · 4cb57ab4
      Mikulas Patocka authored
      Some module parameters in dm-bufio are read-only. These parameters
      inform the user about memory consumption. They are not supposed to be
      changed by the user.
      
      However, despite being read-only, these parameters can be set on
      modprobe or insmod command line, for example:
      modprobe dm-bufio current_allocated_bytes=12345
      
      The kernel doesn't expect these variables to be non-zero at module
      initialization, and if the user sets them, it results in a BUG().
      
      This patch initializes the variables in the module init routine, so that
      user-supplied values are ignored.
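
      A hedged sketch of the init-time reset (the exact set of variables is
      an assumption; the parameter names follow dm-bufio's sysfs names):

        static int __init dm_bufio_init(void)
        {
                /* Ignore any values supplied on the modprobe command line
                 * for the read-only accounting parameters. */
                dm_bufio_current_allocated = 0;
                dm_bufio_allocated_kmem_cache = 0;
                dm_bufio_allocated_get_free_pages = 0;
                dm_bufio_allocated_vmalloc = 0;

                /* ... rest of module initialization ... */
                return 0;
        }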
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.2+
      4cb57ab4
  28. 24 Nov 2013 (1 commit)
    • block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
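
      Representative effect of the rename on driver code (a minimal sketch):

        /* before */
        sector_t s = bio->bi_sector;

        /* after: the per-bio position lives in an explicit iterator */
        sector_t s = bio->bi_iter.bi_sector;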
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      4f024f37
  29. 11 Sep 2013 (1 commit)
    • drivers: convert shrinkers to new count/scan API · 7dc19d5a
      Dave Chinner authored
      Convert the driver shrinkers to the new API.  Most changes are compile
      tested only because I either don't have the hardware or it's staging
      stuff.
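
      The shape of the new API drivers were converted to (a minimal sketch;
      the my_* names are placeholders):

        static unsigned long my_count(struct shrinker *s,
                                      struct shrink_control *sc)
        {
                return my_freeable_objects();   /* cheap, possibly racy */
        }

        static unsigned long my_scan(struct shrinker *s,
                                     struct shrink_control *sc)
        {
                /* Free up to sc->nr_to_scan objects; report how many. */
                return my_free_some(sc->nr_to_scan);    /* or SHRINK_STOP */
        }

        static struct shrinker my_shrinker = {
                .count_objects  = my_count,
                .scan_objects   = my_scan,
                .seeks          = DEFAULT_SEEKS,
        };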
      
      FWIW, the md and android code is pretty good, but the rest of it makes me
      want to claw my eyes out.  The amount of broken code I just encountered is
      mind boggling.  I've added comments explaining what is broken, but I fear
      that some of the code would be best dealt with by being dragged behind the
      bike shed, buried in mud up to its neck and then run over repeatedly
      with a blunt lawn mower.
      
      Special mention goes to the zcache/zcache2 drivers.  They can't co-exist
      in the build at the same time, they are under different menu options in
      menuconfig, they only show up when you've got the right set of mm
      subsystem options configured and so even compile testing is an exercise in
      pulling teeth.  And that doesn't even take into account the horrible,
      broken code...
      
      [glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7dc19d5a
  30. 11 Jul 2013 (1 commit)