1. 27 May, 2011 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig committed
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
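
The flag-passing idea in aa385729 can be sketched in user-space C. This is a minimal model, not kernel code: `demo_inode`, `demo_dirty_inode`, and the numeric flag values are assumptions made for illustration. Only `I_DIRTY_SYNC` and the `->dirty_inode` callback are named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for inode dirty flags; the values are assumed. */
#define I_DIRTY_SYNC     (1 << 0)  /* only timestamps were updated */
#define I_DIRTY_DATASYNC (1 << 1)  /* something fdatasync must persist */

/* Hypothetical per-inode bookkeeping a filesystem might keep. */
struct demo_inode {
	bool needs_sync_commit;      /* fsync has to push a transaction */
	bool needs_datasync_commit;  /* fdatasync has to push a transaction */
};

/* Hypothetical ->dirty_inode implementation that receives the flags. */
static void demo_dirty_inode(struct demo_inode *inode, int flags)
{
	assert(flags != 0);

	/* Any kind of dirtying must be visible to a full fsync. */
	inode->needs_sync_commit = true;

	/* Anything beyond a pure timestamp update also matters to fdatasync. */
	if (flags & ~I_DIRTY_SYNC)
		inode->needs_datasync_commit = true;
}
```

With only `I_DIRTY_SYNC` passed, the model leaves `needs_datasync_commit` unset, which is the case the commit wants filesystems to be able to detect.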
  2. 31 Mar, 2011 1 commit
  3. 25 Mar, 2011 4 commits
    • fs: pull inode->i_lock up out of writeback_single_inode · 0f1b1fd8
      Dave Chinner committed
      First thing we do in writeback_single_inode() is take the i_lock and
      the last thing we do is drop it. A caller already holds the i_lock,
      so pull the i_lock out of writeback_single_inode() to reduce the
      round trips on this lock during inode writeback.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: move i_wb_list out from under inode_lock · a66979ab
      Dave Chinner committed
      Protect the inode writeback list with a new global lock
      inode_wb_list_lock and use it to protect the list manipulations and
      traversals. This lock replaces the inode_lock as the inodes on the
      list can be validity checked while holding the inode->i_lock and
      hence the inode_lock is no longer needed to protect the list.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: move i_sb_list out from under inode_lock · 55fa6091
      Dave Chinner committed
      Protect the per-sb inode list with a new global lock
      inode_sb_list_lock and use it to protect the list manipulations and
      traversals. This lock replaces the inode_lock as the inodes on the
      list can be validity checked while holding the inode->i_lock and
      hence the inode_lock is no longer needed to protect the list.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: protect inode->i_state with inode->i_lock · 250df6ed
      Dave Chinner committed
      Protect inode state transitions and validity checks with the
      inode->i_lock. This enables us to make inode state transitions
      independently of the inode_lock and is the first step to peeling
      away the inode_lock from the code.
      
      This requires that __iget() is done atomically with i_state checks
      during list traversals so that we don't race with another thread
      marking the inode I_FREEING between the state check and grabbing the
      reference.
      
      Also remove the unlock_new_inode() memory barrier optimisation
      required to avoid taking the inode_lock when clearing I_NEW.
      Simplify the code by simply taking the inode->i_lock around the
      state change and wakeup. Because the wakeup is no longer tricky,
      remove the wake_up_inode() function and open code the wakeup where
      necessary.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 14 Jan, 2011 6 commits
    • fs/fs-writeback.c: fix sync_inodes_sb() return value kernel-doc · cb9ef8d5
      Stefan Hajnoczi committed
      The sync_inodes_sb() function does not have a return value.  Remove the
      outdated documentation comment.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sync_inode_metadata: fix comment · c691b9d9
      Andrew Morton committed
      Use correct function name, remove incorrect apostrophe
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: avoid livelocking WB_SYNC_ALL writeback · b9543dac
      Jan Kara committed
      When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
      usually set to LONG_MAX.  The logic in wb_writeback() then calls
      __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we
      easily end up with non-positive nr_to_write after the function returns, if
      the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment.
      
      When nr_to_write is <= 0 wb_writeback() decides we need another round of
      writeback but this is wrong in some cases!  For example when a single
      large file is continuously dirtied, we would never finish syncing it
      because each pass would be able to write MAX_WRITEBACK_PAGES and inode
      dirty timestamp never gets updated (as inode is never completely clean).
      Thus __writeback_inodes_sb() would write the redirtied inode again and
      again.
      
      Fix the issue by setting nr_to_write to LONG_MAX in WB_SYNC_ALL mode.  We
      do not need nr_to_write in WB_SYNC_ALL mode anyway since
      write_cache_pages() does livelock avoidance using page tagging in
      WB_SYNC_ALL mode.
      
      This makes wb_writeback() call __writeback_inodes_sb() only once on
      WB_SYNC_ALL.  The latter function won't livelock because it works on
      
      - a finite set of files by doing queue_io() once at the beginning
      - a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging
      
      After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no
      longer able to stall sync forever.
      
      [fengguang.wu@intel.com: fix locking comment]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Engelhardt <jengelh@medozas.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
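
The budget problem described in b9543dac can be modelled in a few lines of user-space C. This is a sketch under assumptions: the helper names are invented, and 1024 is assumed for `MAX_WRITEBACK_PAGES`. It only shows why a finite per-call budget makes the caller believe another round is needed, while a `LONG_MAX` budget does not.

```c
#include <assert.h>
#include <limits.h>

enum demo_sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

#define MAX_WRITEBACK_PAGES 1024L  /* assumed chunk size */

/* Budget handed to one writeback pass, as proposed by the fix. */
static long demo_write_chunk(enum demo_sync_mode mode)
{
	return mode == WB_SYNC_ALL ? LONG_MAX : MAX_WRITEBACK_PAGES;
}

/*
 * Model of the caller's decision: if the budget is used up
 * (nr_to_write <= 0), the caller assumes more work is pending.
 */
static int demo_wants_another_round(enum demo_sync_mode mode, long dirty_pages)
{
	long budget = demo_write_chunk(mode);
	long written;

	assert(dirty_pages >= 0);
	written = dirty_pages < budget ? dirty_pages : budget;
	return budget - written <= 0;
}
```

For an inode with 5000 dirty pages, the model asks for another round in `WB_SYNC_NONE` mode but finishes in a single pass in `WB_SYNC_ALL` mode.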
    • writeback: stop background/kupdate works from livelocking other works · aa373cf5
      Jan Kara committed
      Background writeback is easily livelockable in a loop in wb_writeback() by
      a process continuously re-dirtying pages (or continuously appending to a
      file).  This is in fact intended as the target of background writeback is
      to write dirty pages it can find as long as we are over
      dirty_background_threshold.
      
      But the above behavior gets inconvenient at times because no other work
      queued in the flusher thread's queue gets processed.  In particular, since
      e.g.  sync(1) relies on flusher thread to do all the IO for it, sync(1)
      can hang forever waiting for flusher thread to do the work.
      
      Generally, when a flusher thread has some work queued, someone submitted
      the work to achieve a goal more specific than what background writeback
      does.  Moreover by working on the specific work, we also reduce amount of
      dirty pages which is exactly the target of background writeout.  So it
      makes sense to give specific work a priority over a generic page cleaning.
      
      Thus we interrupt background writeback if there is some other work to do.
      We return to the background writeback after completing all the queued
      work.
      
      This may delay the writeback of expired inodes for a while, however the
      expired inodes will eventually be flushed to disk as long as the other
      works won't livelock.
      
      [fengguang.wu@intel.com: update comment]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Engelhardt <jengelh@medozas.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
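
The policy in aa373cf5 (background writeback yields as soon as other work is queued) can be illustrated with a small user-space model. All names here (`demo_wb`, `demo_background_writeback`) are hypothetical, and the model ignores re-dirtying; it is a sketch of the control flow, not the kernel loop.

```c
#include <assert.h>

/* Hypothetical snapshot of a flusher thread's state. */
struct demo_wb {
	int  queued_works;       /* other work items waiting in the queue */
	long dirty_pages;        /* dirty pages currently outstanding */
	long background_thresh;  /* background writeback threshold */
};

/*
 * Write in chunks while over the threshold, but stop as soon as
 * more specific work is waiting. Returns the number of pages written.
 */
static long demo_background_writeback(struct demo_wb *wb, long chunk)
{
	long written = 0;

	assert(chunk > 0);
	while (wb->dirty_pages > wb->background_thresh) {
		long n;

		if (wb->queued_works > 0)
			break;  /* give priority to the queued work */

		n = wb->dirty_pages < chunk ? wb->dirty_pages : chunk;
		wb->dirty_pages -= n;
		written += n;
	}
	return written;
}
```

In this model a caller would process the queued work first and then call the function again, which mirrors "we return to the background writeback after completing all the queued work".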
    • writeback: trace wakeup event for background writeback · 71927e84
      Wu Fengguang committed
      This tracks when balance_dirty_pages() tries to wakeup the flusher thread
      for background writeback (if it was not started already).
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Engelhardt <jengelh@medozas.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: integrated background writeback work · 6585027a
      Jan Kara committed
      Check whether background writeback is needed after finishing each work.
      
      When bdi flusher thread finishes doing some work check whether any kind of
      background writeback needs to be done (either because
      dirty_background_ratio is exceeded or because we need to start flushing
      old inodes).  If so, just do background write back.
      
      This way, bdi_start_background_writeback() just needs to wake up the
      flusher thread.  It will do background writeback as soon as there is no
      other work.
      
      This is a preparatory patch for the next patch which stops background
      writeback as soon as there is other work to do.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Engelhardt <jengelh@medozas.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 30 Oct, 2010 1 commit
    • fs-writeback.c: unify some common code · cdf01dd5
      Linus Torvalds committed
      The btrfs merge looks like hell, because it changes fs-writeback.c, and
      the crazy code has this repeated "estimate number of dirty pages"
      counting that involves three different helper functions.  And it's done
      in two different places.
      
      Just unify that whole calculation as a "get_nr_dirty_pages()" helper
      function, and the merge result will look half-way decent.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
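
The refactoring in cdf01dd5 amounts to folding a repeated three-term sum into one helper. The sketch below shows that shape in user-space C. The counter names in `demo_counters` are placeholders: the commit message does not say which three helper functions are combined, so treat the fields as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for the three kernel helpers. */
struct demo_counters {
	unsigned long file_dirty;    /* assumed: dirty page-cache pages */
	unsigned long unstable_nfs;  /* assumed: unstable NFS pages */
	unsigned long dirty_inodes;  /* assumed: estimate for dirty inodes */
};

/* Single place that computes the estimate, instead of two copies. */
static unsigned long demo_get_nr_dirty_pages(const struct demo_counters *c)
{
	assert(c != NULL);
	return c->file_dirty + c->unstable_nfs + c->dirty_inodes;
}
```

Both former call sites would then call the helper, so a later change to the estimate touches one function only.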
  6. 29 Oct, 2010 1 commit
    • Add new functions for triggering inode writeback · 3259f8be
      Chris Mason committed
      When btrfs is running low on metadata space, it needs to force delayed
      allocation pages to disk.  It currently does this with a suboptimal walk
      of a private list of inodes with delayed allocation, and it would be
      much better if we used the generic flusher threads.
      
      writeback_inodes_sb_if_idle would be ideal, but it waits for the flusher
      thread to start IO on all the dirty pages in the FS before it returns.
      This adds variants of writeback_inodes_sb* that allow the caller to
      control how many pages get sent down.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  7. 27 Oct, 2010 3 commits
  8. 26 Oct, 2010 6 commits
  9. 04 Oct, 2010 1 commit
    • writeback: always use sb->s_bdi for writeback purposes · aaead25b
      Christoph Hellwig committed
      We currently use struct backing_dev_info for various different purposes.
      Originally it was introduced to describe a backing device which includes
      an unplug and congestion function and various bits of readahead information
      and VM-relevant flags.  We're also using it for tracking dirty inodes for
      writeback.
      
      To make writeback properly find all inodes we need to only access the
      per-filesystem backing_device pointed to by the superblock in ->s_bdi
      inside the writeback code, and not the instances pointed to by
      inode->i_mapping->backing_dev which can be overridden by special devices
      or might not be set at all by some filesystems.
      
      Long term we should split out the writeback-relevant bits of struct
      backing_device_info (which includes more than the current bdi_writeback)
      and only point to it from the superblock while leaving the traditional
      backing device as a separate structure that can be overridden by devices.
      
      The one exception for now is the block device filesystem which really
      wants different writeback contexts for its different (internal) inodes
      to handle the writeout more efficiently.  For now we do this with
      a hack in fs-writeback.c because we're so late in the cycle, but in
      the future I plan to replace this with a superblock method that allows
      for multiple writeback contexts per filesystem.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  10. 22 Sep, 2010 1 commit
    • bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends · 692ebd17
      Jan Kara committed
      Inodes of devices such as /dev/zero can get dirty for example via
      utime(2) syscall or due to atime update. Backing device of such inodes
      (zero_bdi, etc.) is however unable to handle dirty inodes and thus
      __mark_inode_dirty complains.  In fact, inode should be rather dirtied
      against backing device of the filesystem holding it. This is generally a
      good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes
      in these pseudofilesystems are referenced from ordinary filesystem
      inodes and carry mapping with real data of the device. Thus for these
      inodes we have to use inode->i_mapping->backing_dev_info as we did so
      far. We distinguish these filesystems by checking whether sb->s_bdi
      points to a non-trivial backing device or not.
      
      Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
      There's a device inode A described by a path "/dev/sdb" on this
      filesystem. This inode will be dirtied against backing device "8:0"
      after this patch. bdev filesystem contains block device inode B coupled
      with our inode A. When someone modifies a page of /dev/sdb, it's B that
      gets dirtied and the dirtying happens against the backing device "8:16".
      Thus both inodes get filed to a correct bdi list.
      
      Cc: stable@kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
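
The selection rule described in 692ebd17 can be sketched as a user-space model. The `demo_*` types and `demo_noop_bdi` are hypothetical stand-ins for the kernel structures; the rule itself follows the commit text: use the superblock's bdi when it is non-trivial, otherwise fall back to the bdi of the inode's mapping.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bdi { const char *name; };

/* Stand-in for a trivial backing device used by pseudo filesystems. */
static struct demo_bdi demo_noop_bdi = { "noop" };

struct demo_super_block { struct demo_bdi *s_bdi; };

struct demo_inode {
	struct demo_super_block *i_sb;
	struct demo_bdi *mapping_bdi;  /* models i_mapping->backing_dev_info */
};

/* Pick the bdi an inode should be dirtied against. */
static struct demo_bdi *demo_inode_to_bdi(const struct demo_inode *inode)
{
	struct demo_bdi *sb_bdi;

	assert(inode != NULL && inode->i_sb != NULL);
	sb_bdi = inode->i_sb->s_bdi;

	/* Ordinary filesystem: the superblock's bdi wins. */
	if (sb_bdi != NULL && sb_bdi != &demo_noop_bdi)
		return sb_bdi;

	/* Pseudo filesystem such as 'bdev': use the mapping's bdi. */
	return inode->mapping_bdi;
}
```

Applied to the example in the commit message, inode A on ext3 resolves to the "8:0" device, while block device inode B resolves to "8:16".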
  11. 28 Aug, 2010 1 commit
    • writeback: Fix lost wake-up shutting down writeback thread · b76b4014
      J. Bruce Fields committed
      Setting the task state here may cause us to miss the wake up from
      kthread_stop(), so we need to recheck kthread_should_stop() or risk
      sleeping forever in the following schedule().
      
      Symptom was an indefinite hang on an NFSv4 mount.  (NFSv4 may create
      multiple mounts in a temporary namespace while traversing the mount
      path, and since the temporary namespace is immediately destroyed, it may
      end up destroying a mount very soon after it was created, possibly
      making this race more likely.)
      
      INFO: task mount.nfs4:4314 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount.nfs4    D 0000000000000000  2880  4314   4313 0x00000000
       ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8
       ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0
       ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8
      Call Trace:
       [<ffffffff8196090d>] schedule_timeout+0x1cd/0x2e0
       [<ffffffff8106a31c>] ? mark_held_locks+0x6c/0xa0
       [<ffffffff819639a0>] ? _raw_spin_unlock_irq+0x30/0x60
       [<ffffffff8106a5fd>] ? trace_hardirqs_on_caller+0x14d/0x190
       [<ffffffff819671fe>] ? sub_preempt_count+0xe/0xd0
       [<ffffffff8195fc80>] wait_for_common+0x120/0x190
       [<ffffffff81033c70>] ? default_wake_function+0x0/0x20
       [<ffffffff8195fdcd>] wait_for_completion+0x1d/0x20
       [<ffffffff810595fa>] kthread_stop+0x4a/0x150
       [<ffffffff81061a60>] ? thaw_process+0x70/0x80
       [<ffffffff810cc68a>] bdi_unregister+0x10a/0x1a0
       [<ffffffff81229dc9>] nfs_put_super+0x19/0x20
       [<ffffffff810ee8c4>] generic_shutdown_super+0x54/0xe0
       [<ffffffff810ee9b6>] kill_anon_super+0x16/0x60
       [<ffffffff8122d3b9>] nfs4_kill_super+0x39/0x90
       [<ffffffff810eda45>] deactivate_locked_super+0x45/0x60
       [<ffffffff810edfb9>] deactivate_super+0x49/0x70
       [<ffffffff81108294>] mntput_no_expire+0x84/0xe0
       [<ffffffff811084ef>] release_mounts+0x9f/0xc0
       [<ffffffff81108575>] put_mnt_ns+0x65/0x80
       [<ffffffff8122cc56>] nfs_follow_remote_path+0x1e6/0x420
       [<ffffffff8122cfbf>] nfs4_try_mount+0x6f/0xd0
       [<ffffffff8122d0c2>] nfs4_get_sb+0xa2/0x360
       [<ffffffff810edcb8>] vfs_kern_mount+0x88/0x1f0
       [<ffffffff810ede92>] do_kern_mount+0x52/0x130
       [<ffffffff81963d9a>] ? _lock_kernel+0x6a/0x170
       [<ffffffff81108e9e>] do_mount+0x26e/0x7f0
       [<ffffffff81106b3a>] ? copy_mount_options+0xea/0x190
       [<ffffffff811094b8>] sys_mount+0x98/0xf0
       [<ffffffff810024d8>] system_call_fastpath+0x16/0x1b
      1 lock held by mount.nfs4/4314:
       #0:  (&type->s_umount_key#24){+.+...}, at: [<ffffffff810edfb1>] deactivate_super+0x41/0x70
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
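
The race fixed by b76b4014 can be replayed deterministically in a single-threaded user-space model. This is not kernel code: `demo_thread` and the helpers are invented, and the "stop" request is injected at the exact point of the race window. The model shows why the stop condition has to be rechecked after the task state is set.

```c
#include <assert.h>

enum demo_task_state { DEMO_TASK_RUNNING, DEMO_TASK_INTERRUPTIBLE };

struct demo_thread {
	enum demo_task_state state;
	int should_stop;
};

/* Models kthread_stop(): set the flag and make the task runnable. */
static void demo_kthread_stop(struct demo_thread *t)
{
	t->should_stop = 1;
	t->state = DEMO_TASK_RUNNING;
}

/*
 * One loop iteration of the writeback thread, with the stop request
 * arriving just before the state change. Returns 1 if the following
 * schedule() would sleep with nobody left to wake the thread.
 */
static int demo_iteration_sleeps_forever(int recheck_after_state_change)
{
	struct demo_thread t = { DEMO_TASK_RUNNING, 0 };

	assert(!t.should_stop);  /* the early check saw nothing to do */
	demo_kthread_stop(&t);   /* race window: the wake-up hits a running task */

	t.state = DEMO_TASK_INTERRUPTIBLE;  /* this overwrites the wake-up */
	if (recheck_after_state_change && t.should_stop) {
		t.state = DEMO_TASK_RUNNING;
		return 0;  /* skip the sleep and exit cleanly */
	}
	return t.state == DEMO_TASK_INTERRUPTIBLE;
}
```

Without the recheck the model reports an endless sleep; with it the iteration notices the stop request and does not sleep.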
  12. 12 Aug, 2010 5 commits
  13. 10 Aug, 2010 2 commits
    • mm: avoid resetting wb_start after each writeback round · 7624ee72
      Jan Kara committed
      WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't
      write out some huge inode for too long while starving writeout of other
      inodes.  To avoid livelocks, we record time we started writeback in
      wbc->wb_start and do not write out inodes which were dirtied after this
      time.  But currently, writeback_inodes_wb() resets wb_start each time it
      is called thus effectively invalidating this logic and making any
      WB_SYNC_NONE writeback prone to livelocks.
      
      This patch makes sure wb_start is set only once when we start writeback.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
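
The effect of resetting `wb_start` every round, as described in 7624ee72, can be shown with a small user-space model. The struct and helper names are hypothetical, timestamps are plain integers, and jiffies wrap-around is ignored.

```c
#include <assert.h>

/* Hypothetical subset of the writeback control state. */
struct demo_wbc {
	unsigned long wb_start;  /* time the whole writeback started */
	int wb_start_set;
};

/* Called at the start of every round of (assumed) 1024 pages. */
static void demo_begin_round(struct demo_wbc *wbc, unsigned long now,
			     int reset_each_round)
{
	if (reset_each_round || !wbc->wb_start_set) {
		wbc->wb_start = now;
		wbc->wb_start_set = 1;
	}
}

/* Inodes dirtied after wb_start are skipped to avoid livelock. */
static int demo_should_write(const struct demo_wbc *wbc,
			     unsigned long dirtied_when)
{
	assert(wbc->wb_start_set);
	return dirtied_when <= wbc->wb_start;
}
```

If `wb_start` is reset on each round, an inode dirtied between two rounds is picked up by the second round, so a steady stream of new dirtying can keep the writeback going. Setting it once keeps such inodes out.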
    • simplify checks for I_CLEAR/I_FREEING · a4ffdde6
      Al Viro committed
      add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
      equivalent to I_FREEING for almost all code looking at either;
      it's there to keep track of having called clear_inode() exactly
      once per inode lifetime, at some point after having set I_FREEING.
      I_CLEAR and I_FREEING never get set at the same time with the
      current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
      instead of I_CLEAR without loss of information.  As the result of
      such change, checks become simpler and the amount of code that needs
      to know about I_CLEAR shrinks a lot.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
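
The simplification in a4ffdde6 is easiest to see with plain bit flags. The sketch below is a user-space illustration; the numeric flag values and the `demo_*` helpers are assumptions, and only the flag names come from the commit message.

```c
#include <assert.h>

/* Illustrative values; the kernel's actual bit positions may differ. */
#define I_FREEING (1u << 5)
#define I_CLEAR   (1u << 6)

/* After the change: I_CLEAR is added on top of I_FREEING. */
static unsigned int demo_state_after_clear_inode(unsigned int state)
{
	assert(state & I_FREEING);  /* I_FREEING is set before clearing */
	return I_FREEING | I_CLEAR;
}

/* Callers that only care whether the inode is going away test one bit. */
static int demo_inode_is_going_away(unsigned int state)
{
	return (state & I_FREEING) != 0;
}
```

Before the change such a check would have had to test `I_FREEING | I_CLEAR`, because `I_CLEAR` replaced `I_FREEING` instead of accompanying it.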
  14. 08 Aug, 2010 7 commits
    • writeback: optimize periodic bdi thread wakeups · 6467716a
      Artem Bityutskiy committed
      When the first inode for a bdi is marked dirty, we wake up the bdi thread which
      should take care of the periodic background write-out. However, the write-out
      will actually start only 'dirty_writeback_interval' centisecs later, so we can
      delay the wake-up.
      
      This change was requested by Nick Piggin who pointed out that if we delay the
      wake-up, we weed out 2 unnecessary context switches, which matters because
      '__mark_inode_dirty()' is a hot-path function.
      
      This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
      sets up a timer to wake-up the bdi thread and returns. So the wake-up is
      delayed.
      
      We also delete the timer in bdi threads just before writing-back. And
      synchronously delete it when unregistering bdi. At the unregister point the bdi
      does not have any users, so no one can arm it again.
      
      Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
      context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
      this change as well.
      
      This patch also moves the 'bdi_wb_init()' function down in the file to avoid
      forward-declaration of 'bdi_wakeup_thread_delayed()'.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: prevent unnecessary bdi threads wakeups · 253c34e9
      Artem Bityutskiy committed
      Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
      bad for battery-driven devices.
      
      There are two types of activities bdi threads do:
      1. process bdi works from the 'bdi->work_list'
      2. periodic write-back
      
      So there are 2 sources of wake-up events for bdi threads:
      
      1. 'bdi_queue_work()' - submits bdi works
      2. '__mark_inode_dirty()' - adds dirty I/O to bdi's
      
      The former already has bdi wake-up code. The latter does not, and this patch
      adds it.
      
      '__mark_inode_dirty()' is hot-path function, but this patch adds another
      'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when
      the bdi has no dirty inodes. So adding this spinlock should be fine and should
      not affect performance.
      
      This patch makes sure bdi threads and the forker thread do not wake-up if there
      is nothing to do. The forker thread will nevertheless wake up at least every
      5 min. to check whether it has to kill a bdi thread. This can also be optimized,
      but is not worth it.
      
      This patch also tidies up the warning about unregistered bdi, and turns it from
      an ugly crocodile to a simple 'WARN()' statement.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
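
The wake-up rule from 253c34e9 (wake the bdi thread only when the bdi goes from no dirty inodes to one) can be sketched as follows. This is a user-space model with invented names; the counter stands in for taking 'bdi->wb_lock' and waking the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a bdi for this illustration. */
struct demo_bdi {
	int nr_dirty_inodes;  /* inodes currently on the dirty lists */
	int wakeups;          /* how often the bdi thread was woken */
};

/* Models the dirtying path: wake only on the first dirty inode. */
static void demo_mark_inode_dirty(struct demo_bdi *bdi)
{
	assert(bdi != NULL && bdi->nr_dirty_inodes >= 0);

	/* Rare case: the bdi had nothing dirty, so the thread may be idle. */
	if (bdi->nr_dirty_inodes == 0)
		bdi->wakeups++;

	bdi->nr_dirty_inodes++;
}
```

Dirtying three inodes in a row triggers a single wake-up in this model, which is why the extra lock in the hot path is taken only rarely.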
    • writeback: move bdi threads exiting logic to the forker thread · fff5b85a
      Artem Bityutskiy committed
      Currently, bdi threads can decide to exit if there were no useful activities
      for 5 minutes. However, this causes nasty races: we can easily oops in the
      'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up.
      
      And even if we do not oops, but the bdi thread exits immediately after we wake
      it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs)
      in the bdi work processing.
      
      This patch makes the forker thread the central place which not only
      creates bdi threads, but also kills them if they were inactive long enough.
      This is better design-wise.
      
      Another reason why this change was done is to prepare for the further changes
      which will prevent the bdi threads from waking up every 5 sec and wasting
      power. Indeed, when the task does not wake up periodically anymore, it won't be
      able to exit either.
      
      This patch also moves the 'wake_up_bit()' call from the bdi thread to the
      forker thread as well. So now the forker thread sets the BDI_pending bit, then
      forks the task or kills it, then clears the bit and wakes up the waiting
      process.
      
      The only process which may wait on the bit is 'bdi_wb_shutdown()'. This
      function was changed as well - now it first removes the bdi from the
      'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
      guaranteed that the forker thread won't race with it, because the bdi is not
      visible. Note, the forker thread sets the 'BDI_pending' bit under the
      'bdi->wb_lock' which is essential for proper serialization.
      
      And additionally, when we change 'bdi->wb.task', we now take the
      'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise
      would when raced with, say, 'bdi_queue_work()'.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: move last_active to bdi · ecd58403
      Artem Bityutskiy committed
      Currently bdi threads use local variable 'last_active' which stores last time
      when the bdi thread did some useful work. Move this local variable to 'struct
      bdi_writeback'. This is just a preparation for the further patches which will
      make the forker thread decide when bdi threads should be killed.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: do not remove bdi from bdi_list · 78c40cb6
      Artem Bityutskiy committed
      The forker thread removes bdis from 'bdi_list' before forking the bdi thread.
      But this is wrong for at least 2 reasons.
      
      Reason #1: if we temporarily remove a bdi from the list, we may miss works which
                 would otherwise be given to us.
      
      Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are
                 always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when
                 it races with the forker thread, it can shut down the bdi thread
                 at the same time as the forker creates it.
      
      This patch makes sure the forker thread never removes bdis from 'bdi_list'
      (which was suggested by Christoph Hellwig).
      
      In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
      hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
      flag.
      
      NOTE! The error path is interesting. Currently, when we fail to create a bdi
      thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the
      bdi from the list, we cannot move it to the tail either, because then we can
      mess up the RCU readers which walk the list. And also, we'll have the race
      described above in "Reason #2".
      
      But I do not think that adding to the tail is important, so I just do not do
      that.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: do not lose wake-ups in bdi threads · 297252c8
      Artem Bityutskiy committed
      Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For
      example, a wake-up is lost if 'bdi_queue_work()' is executed after the bdi
      thread has finished 'wb_do_writeback()' but before it has called
      'schedule_timeout_interruptible()'.
      
      To fix this issue, we have to check whether we have works to process after we
      have changed the task state to 'TASK_INTERRUPTIBLE'.
      
      This patch also cleans up handling of the cases when 'dirty_writeback_interval'
      is zero or non-zero.
      
      Additionally, this patch also removes unneeded 'list_empty_careful()' call.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: harmonize writeback threads naming · 6f904ff0
      Artem Bityutskiy committed
      The write-back code mixes words "thread" and "task" for the same things. This
      is not a big deal, but still an inconsistency.
      
      hch: a convention I tend to use and I've seen in various places
      is to always use _task for the storage of the task_struct pointer,
      and thread everywhere else.  This especially helps with having
      foo_thread for the actual thread and foo_task for a global
      variable keeping the task_struct pointer
      
      This patch renames:
      * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
      * 'bdi_forker_task()'              -> 'bdi_forker_thread()'
      
      because bdi threads are 'bdi_writeback_thread()', so these names are more
      consistent.
      
      This patch also amends the comments and makes them refer to the forker and bdi
      threads as "thread", not "task".
      
      Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
      'static void' instead of 'void static' and make checkpatch.pl happy.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>