1. 01 6月, 2016 1 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 02 2月, 2016 1 次提交
    • D
      CacheFiles: Provide read-and-reset release counters for cachefilesd · a5b3a80b
      David Howells 提交于
      Provide read-and-reset objects- and blocks-released counters for cachefilesd
      to use to work out whether there's anything new that can be culled.
      
      One of the problems cachefilesd has is that if all the objects in the cache
      are pinned by inodes lying dormant in the kernel inode cache, there isn't
      anything for it to cull.  In such a case, it just spins around walking the
      filesystem tree and scanning for something to cull.  This eats up a lot of
      CPU time.
      
      By telling cachefilesd if there have been any releases, the daemon can
      sleep until there is the possibility of something to do.
      
      cachefilesd finds this information by the following means:
      
       (1) When the control fd is read, the kernel presents a list of values of
           interest.  "freleased=N" and "breleased=N" are added to this list to
           indicate the number of files released and number of blocks released
           since the last read call.  At this point the counters are reset.
      
       (2) POLLIN is signalled if the number of files released becomes greater
           than 0.
      
      Note that by 'released' it just means that the kernel has released its
      interest in those files for the moment, not necessarily that the files
      should be deleted from the cache.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NSteve Dickson <steved@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a5b3a80b
  4. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  5. 04 1月, 2016 1 次提交
  6. 17 11月, 2015 1 次提交
  7. 11 11月, 2015 2 次提交
    • D
      FS-Cache: Handle a write to the page immediately beyond the EOF marker · 102f4d90
      David Howells 提交于
      Handle a write being requested to the page immediately beyond the EOF
      marker on a cache object.  Currently this gets an assertion failure in
      CacheFiles because the EOF marker is used there to encode information about
      a partial page at the EOF - which could lead to an unknown blank spot in
      the file if we extend the file over it.
      
      The problem is actually in fscache where we check the index of the page
      being written against store_limit.  store_limit is set to the number of
      pages that we're allowed to store by fscache_set_store_limit() - which
      means it's one more than the index of the last page we're allowed to store.
      The problem is that we permit writing to a page with an index _equal_ to
      the store limit - when we should reject that case.
      
      Whilst we're at it, change the triggered assertion in CacheFiles to just
      return -ENOBUFS instead.
      
      The assertion failure looks something like this:
      
      CacheFiles: Assertion failed
      1000 < 7b1 is false
      ------------[ cut here ]------------
      kernel BUG at fs/cachefiles/rdwr.c:962!
      ...
      RIP: 0010:[<ffffffffa02c9e83>]  [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
      
      Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754fb (at least)
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      102f4d90
    • N
      cachefiles: perform test on s_blocksize when opening cache file. · 95201a40
      NeilBrown 提交于
      cachefiles requires that s_blocksize in the cache is not greater than
      PAGE_SIZE, and performs the check every time a block is accessed.
      
      Move the test to the place where the file is "opened", where other
      file-validity tests are performed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      95201a40
  8. 07 11月, 2015 1 次提交
  9. 16 4月, 2015 2 次提交
  10. 24 2月, 2015 1 次提交
  11. 23 2月, 2015 2 次提交
    • D
      Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · ce40fa78
      David Howells 提交于
      Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
      thereof) in cachefiles:
      
       (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
           it doesn't want to deal with automounts in its cache.
      
       (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
           cachefiles.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ce40fa78
    • D
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells 提交于
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  12. 20 11月, 2014 1 次提交
  13. 14 10月, 2014 1 次提交
    • D
      CacheFiles: Fix incorrect test for in-memory object collision · a30efe26
      David Howells 提交于
      When CacheFiles cache objects are in use, they have in-memory representations,
      as defined by the cachefiles_object struct.  These are kept in a tree rooted in
      the cache and indexed by dentry pointer (since there's a unique mapping between
      object index key and dentry).
      
      Collisions can occur between a representation already in the tree and a new
      representation being set up because it takes time to dispose of an old
      representation - particularly if it must be unlinked or renamed.
      
      When such a collision occurs, cachefiles_mark_object_active() is meant to check
      to see if the old, already-present representation is in the process of being
      discarded (ie. FSCACHE_OBJECT_IS_LIVE is not set on it) - and, if so, wait for
      the representation to be removed (ie. CACHEFILES_OBJECT_ACTIVE is then
      cleared).
      
      However, the test for whether the old representation is still live is checking
      the new object - which always will be live at this point.  This leads to an
      oops looking like:
      
      	CacheFiles: Error: Unexpected object collision
      	object: OBJ1b354
      	objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
      	ops=0 inp=0 exc=0
      	parent=ffff88053f5417c0
      	cookie=ffff880538f202a0 [pr=ffff8805381b7160 nd=ffff880509c6eb78 fl=27]
      	key=[8] '2490000000000000'
      	xobject: OBJ1a600
      	xobjstate=DROP_OBJECT fl=70 wbusy=2 ev=0[0]
      	xops=0 inp=0 exc=0
      	xparent=ffff88053f5417c0
      	xcookie=ffff88050f4cbf70 [pr=ffff8805381b7160 nd=          (null) fl=12]
      	------------[ cut here ]------------
      	kernel BUG at fs/cachefiles/namei.c:200!
      	...
      	Workqueue: fscache_object fscache_object_work_func [fscache]
      	...
      	RIP: ... cachefiles_walk_to_object+0x7ea/0x860 [cachefiles]
      	...
      	Call Trace:
      	 [<ffffffffa04dadd8>] ? cachefiles_lookup_object+0x58/0x100 [cachefiles]
      	 [<ffffffffa01affe9>] ? fscache_look_up_object+0xb9/0x1d0 [fscache]
      	 [<ffffffffa01afc4d>] ? fscache_parent_ready+0x2d/0x80 [fscache]
      	 [<ffffffffa01b0672>] ? fscache_object_work_func+0x92/0x1f0 [fscache]
      	 [<ffffffff8107e82b>] ? process_one_work+0x16b/0x400
      	 [<ffffffff8107fc16>] ? worker_thread+0x116/0x380
      	 [<ffffffff8107fb00>] ? manage_workers.isra.21+0x290/0x290
      	 [<ffffffff81085edc>] ? kthread+0xbc/0xe0
      	 [<ffffffff81085e20>] ? flush_kthread_worker+0x80/0x80
      	 [<ffffffff81502d0c>] ? ret_from_fork+0x7c/0xb0
      	 [<ffffffff81085e20>] ? flush_kthread_worker+0x80/0x80
      Reported-by: NManuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSteve Dickson <steved@redhat.com>
      a30efe26
  14. 09 10月, 2014 1 次提交
  15. 30 9月, 2014 1 次提交
    • D
      CacheFiles: Handle object being killed before being set up · a3b7c004
      David Howells 提交于
      If a cache object gets killed whilst in the process of being set up - for
      instance if the netfs relinquishes the cookie that the object is associated
      with - then the object's state machine will transit to the DROP_OBJECT state
      without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.
      
      This is a problem for CacheFiles because cachefiles_drop_object() assumes that
      object->dentry will be set upon reaching the DROP_OBJECT state and has an
      ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
      set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
      they fail).
      
      To fix this, just make the dentry cleanup in cachefiles_drop_object()
      conditional on the dentry actually being set and remove the assertion.
      
      	CacheFiles: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at .../fs/cachefiles/namei.c:425!
      	...
      	Workqueue: fscache_object fscache_object_work_func [fscache]
      	...
      	RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
      	...
      	Call Trace:
      	 [<ffffffffa043280f>] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
      	 [<ffffffffa02ac511>] ? fscache_drop_object+0xd1/0x1d0 [fscache]
      	 [<ffffffffa02ac697>] ? fscache_object_work_func+0x87/0x210 [fscache]
      	 [<ffffffff81080635>] ? process_one_work+0x155/0x450
      	 [<ffffffff81081c44>] ? worker_thread+0x114/0x370
      	 [<ffffffff81081b30>] ? manage_workers.isra.21+0x2c0/0x2c0
      	 [<ffffffff81087fcc>] ? kthread+0xbc/0xe0
      	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
      	 [<ffffffff8150638c>] ? ret_from_fork+0x7c/0xb0
      	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
      Reported-by: NManuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSteve Dickson <steved@redhat.com>
      a3b7c004
  16. 26 9月, 2014 1 次提交
  17. 18 9月, 2014 2 次提交
    • D
      CacheFiles: Handle rename2 · e2cf1f1c
      David Howells 提交于
      Not all filesystems now provide the rename i_op - ext4 for one - but rather
      provide the rename2 i_op.  CacheFiles checks that the filesystem has rename
      and so will reject ext4 now with EPERM:
      
      	CacheFiles: Failed to register: -1
      
      Fix this by checking for rename2 as an alternative.  The call to vfs_rename()
      actually handles selection of the appropriate function, so we needn't worry
      about that.
      
      Turning on debugging shows:
      
      	[cachef] ==> cachefiles_get_directory(,,cache)
      	[cachef] subdir -> ffff88000b22b778 positive
      	[cachef] <== cachefiles_get_directory() = -1 [check]
      
      where -1 is EPERM.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e2cf1f1c
    • N
      cachefiles: remove two unused pagevecs. · 696382f9
      NeilBrown 提交于
      
      These two have been unused since
      
      commit c4d6d8db
          CacheFiles: Fix the marking of cached pages
      
      in 3.8.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      696382f9
  18. 07 6月, 2014 2 次提交
  19. 04 4月, 2014 1 次提交
    • J
      fs: cachefiles: use add_to_page_cache_lru() · 55881bc7
      Johannes Weiner 提交于
      This code used to have its own lru cache pagevec up until a0b8cab3 ("mm:
      remove lru parameter from __pagevec_lru_add and remove parts of pagevec
      API").  Now it's just add_to_page_cache() followed by lru_cache_add(),
      might as well use add_to_page_cache_lru() directly.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55881bc7
  20. 02 4月, 2014 1 次提交
  21. 01 4月, 2014 2 次提交
  22. 09 11月, 2013 3 次提交
    • J
      locks: break delegations on any attribute modification · 27ac0ffe
      J. Bruce Fields 提交于
      NFSv4 uses leases to guarantee that clients can cache metadata as well
      as data.
      
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      27ac0ffe
    • J
      locks: break delegations on rename · 8e6d782c
      J. Bruce Fields 提交于
      Cc: David Howells <dhowells@redhat.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8e6d782c
    • J
      locks: break delegations on unlink · b21996e3
      J. Bruce Fields 提交于
      We need to break delegations on any operation that changes the set of
      links pointing to an inode.  Start with unlink.
      
      Such operations also hold the i_mutex on a parent directory.  Breaking a
      delegation may require waiting for a timeout (by default 90 seconds) in
      the case of a unresponsive NFS client.  To avoid blocking all directory
      operations, we therefore drop locks before waiting for the delegation.
      The logic then looks like:
      
      	acquire locks
      	...
      	test for delegation; if found:
      		take reference on inode
      		release locks
      		wait for delegation break
      		drop reference on inode
      		retry
      
      It is possible this could never terminate.  (Even if we take precautions
      to prevent another delegation being acquired on the same inode, we could
      get a different inode on each retry.)  But this seems very unlikely.
      
      The initial test for a delegation happens after the lock on the target
      inode is acquired, but the directory inode may have been acquired
      further up the call stack.  We therefore add a "struct inode **"
      argument to any intervening functions, which we use to pass the inode
      back up to the caller in the case it needs a delegation synchronously
      broken.
      
      Cc: David Howells <dhowells@redhat.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b21996e3
  23. 28 9月, 2013 1 次提交
    • D
      FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      David Howells 提交于
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      
      but still allows:
      
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      
      Two new operations are provided:
      
       (1) Disable a cookie:
      
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
      
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
      
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
      
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
      
       (2) Enable a cookie:
      
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data)
      
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
      
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           begin.
      
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
      
      One operation has its API modified:
      
       (3) Acquire a cookie.
      
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
      
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
      
      Signed-off-by: David Howells <dhowells@redhat.com
      94d30ae9
  24. 21 9月, 2013 2 次提交
  25. 06 9月, 2013 1 次提交
  26. 04 7月, 2013 1 次提交
  27. 19 6月, 2013 5 次提交
    • H
    • D
      FS-Cache: Simplify cookie retention for fscache_objects, fixing oops · 1362729b
      David Howells 提交于
      Simplify the way fscache cache objects retain their cookie.  The way I
      implemented the cookie storage handling made synchronisation a pain (ie. the
      object state machine can't rely on the cookie actually still being there).
      
      Instead of the the object being detached from the cookie and the cookie being
      freed in __fscache_relinquish_cookie(), we defer both operations:
      
       (*) The detachment of the object from the list in the cookie now takes place
           in fscache_drop_object() and is thus governed by the object state machine
           (fscache_detach_from_cookie() has been removed).
      
       (*) The release of the cookie is now in fscache_object_destroy() - which is
           called by the cache backend just before it frees the object.
      
      This means that the fscache_cookie struct is now available to the cache all the
      way through from ->alloc_object() to ->drop_object() and ->put_object() -
      meaning that it's no longer necessary to take object->lock to guarantee access.
      
      However, __fscache_relinquish_cookie() doesn't wait for the object to go all
      the way through to destruction before letting the netfs proceed.  That would
      massively slow down the netfs.  Since __fscache_relinquish_cookie() leaves the
      cookie around, in must therefore break all attachments to the netfs - which
      includes ->def, ->netfs_data and any outstanding page read/writes.
      
      To handle this, struct fscache_cookie now has an n_active counter:
      
       (1) This starts off initialised to 1.
      
       (2) Any time the cache needs to get at the netfs data, it calls
           fscache_use_cookie() to increment it - if it is not zero.  If it was zero,
           then access is not permitted.
      
       (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
           to decrement it.  This does a wake-up on it if it reaches 0.
      
       (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
           reach 0.  The initialisation to 1 in step (1) ensures that we only get
           wake ups when we're trying to get rid of the cookie.
      
      This leaves __fscache_relinquish_cookie() a lot simpler.
      
      
      ***
      This fixes a problem in the current code whereby if fscache_invalidate() is
      followed sufficiently quickly by fscache_relinquish_cookie() then it is
      possible for __fscache_relinquish_cookie() to have detached the cookie from the
      object and cleared the pointer before a thread is dispatched to process the
      invalidation state in the object state machine.
      
      Since the pending write clearance was deferred to the invalidation state to
      make it asynchronous, we need to either wait in relinquishment for the stores
      tree to be cleared in the invalidation state or we need to handle the clearance
      in relinquishment.
      
      Further, if the relinquishment code does clear the tree, then the invalidation
      state need to make the clearance contingent on still having the cookie to hand
      (since that's where the tree is rooted) and we have to prevent the cookie from
      disappearing for the duration.
      
      This can lead to an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
      ...
      RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
      ...
      CR2: 000000000000000c ...
      ...
      Process kslowd002 (...)
      ....
      Call Trace:
       [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
       [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
       [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
       [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff8110e233>] slow_work_execute+0x233/0x310
       [<ffffffff8110e515>] slow_work_thread+0x205/0x360
       [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
       [<ffffffff81096936>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      
      The parameter to fscache_invalidate_writes() was object->cookie which is NULL.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-By: NMilosz Tanski <milosz@adfin.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      1362729b
    • D
      FS-Cache: Fix object state machine to have separate work and wait states · caaef690
      David Howells 提交于
      Fix object state machine to have separate work and wait states as that makes
      it easier to envision.
      
      There are now three kinds of state:
      
       (1) Work state.  This is an execution state.  No event processing is performed
           by a work state.  The function attached to a work state returns a pointer
           indicating the next state to which the OSM should transition.  Returning
           NO_TRANSIT repeats the current state, but goes back to the scheduler
           first.
      
       (2) Wait state.  This is an event processing state.  No execution is
           performed by a wait state.  Wait states are just tables of "if event X
           occurs, clear it and transition to state Y".  The dispatcher returns to
           the scheduler if none of the events in which the wait state has an
           interest are currently pending.
      
       (3) Out-of-band state.  This is a special work state.  Transitions to normal
           states can be overridden when an unexpected event occurs (eg. I/O error).
           Instead the dispatcher disables and clears the OOB event and transits to
           the specified work state.  This then acts as an ordinary work state,
           though object->state points to the overridden destination.  Returning
           NO_TRANSIT resumes the overridden transition.
      
      In addition, the states have names in their definitions, so there's no need for
      tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
      that is automatic for work states.
      
      Since the states are now separate structs rather than values in an enum, it's
      not possible to use comparisons other than (non-)equality between them, so use
      some object->flags to indicate what phase an object is in.
      
      The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
      (EV_KILL).  An object flag now carries the information about retirement.
      
      Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
      into an KILL_OBJECT state and additional states have been added for handling
      waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).
      
      A state has also been added for synchronising with parent object initialisation
      (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-By: NMilosz Tanski <milosz@adfin.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      caaef690
    • D
      FS-Cache: Wrap checks on object state · 493f7bc1
      David Howells 提交于
      Wrap checks on object state (mostly outside of fs/fscache/object.c) with
      inline functions so that the mechanism can be replaced.
      
      Some of the state checks within object.c are left as-is as they will be
      replaced.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-By: NMilosz Tanski <milosz@adfin.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      493f7bc1
    • J
      CacheFiles: name i_mutex lock class explicitly · 6bd5e82b
      J. Bruce Fields 提交于
      Just some cleanup.
      
      (And note the caller of this function may, for example, call vfs_unlink
      on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
      here.)
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-By: NMilosz Tanski <milosz@adfin.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      6bd5e82b