1. 28 Sep, 2013 1 commit
    • FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      David Howells committed
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      
      but still allows:
      
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      
      Two new operations are provided:
      
       (1) Disable a cookie:
      
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
      
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
      
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
      
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
      
       (2) Enable a cookie:
      
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data);
      
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
      
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           begin.
      
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
      
      One operation has its API modified:
      
       (3) Acquire a cookie.
      
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
      
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
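The enable/disable semantics described in this commit can be sketched with a small userspace model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, the `writecount` field merely stands in for `i_writecount`, and locking and backing objects are left out entirely. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct fscache_cookie. */
struct toy_cookie {
	bool enabled;
};

/* Hypothetical stand-in for the inode state a netfs might consult. */
struct toy_inode {
	int writecount;	/* models i_writecount */
};

/* Models a can_enable() callback: permit caching only with no writers. */
static bool toy_can_enable(void *data)
{
	const struct toy_inode *inode = data;

	return inode->writecount <= 0;
}

/* Models enabling a cookie: does nothing if it is already enabled. */
static void toy_enable_cookie(struct toy_cookie *cookie,
			      bool (*can_enable)(void *data), void *data)
{
	assert(cookie != NULL);
	if (cookie->enabled)
		return;
	/* The callback is optional; NULL is treated as "always allowed". */
	if (can_enable == NULL || can_enable(data))
		cookie->enabled = true;
}

/* Models disabling a cookie: does nothing if it is already disabled. */
static void toy_disable_cookie(struct toy_cookie *cookie)
{
	assert(cookie != NULL);
	cookie->enabled = false;
}
```

In this model, a cookie for an inode with `writecount` of 1 stays disabled after `toy_enable_cookie()`, while one with `writecount` of 0 becomes enabled, which mirrors the open(O_RDONLY) versus open(O_WRONLY/O_RDWR) check described above.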
  2. 17 Sep, 2013 1 commit
    • vfs: improve i_op->atomic_open() documentation · 0854d450
      Miklos Szeredi committed
      Fix documentation of ->atomic_open() and related functions: finish_open()
      and finish_no_open().  Also add details that seem to be unclear and a
      source of bugs (some of which are fixed in the following series).
      
      Cc-ing maintainers of all filesystems implementing ->atomic_open().
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 14 Sep, 2013 1 commit
  4. 12 Sep, 2013 2 commits
  5. 11 Sep, 2013 2 commits
  6. 09 Sep, 2013 3 commits
  7. 06 Sep, 2013 3 commits
    • fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski committed
      Currently the fscache code expects the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In cases where the netfs fails to
      read all the pages (which is legal), it ends up returning to readahead and
      triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:          (null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
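The failure mode and the cancel step described in this commit can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, a plain boolean stands in for PG_private_2, and no real cache backend is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; private_2 models PG_private_2. */
struct toy_page {
	bool private_2;
};

/* Mark every page on the readahead list as claimed by the cache. */
static void toy_mark_pages(struct toy_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		pages[i].private_2 = true;
}

/* A page that is still marked must not be freed ("Bad page state"). */
static bool toy_page_is_freeable(const struct toy_page *page)
{
	return !page->private_2;
}

/* Models the cancel step: unmark whatever is left on the list. */
static void toy_readpages_cancel(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].private_2 = false;
}

/* Count the pages that are still marked. */
static size_t toy_count_marked(const struct toy_page *pages, size_t n)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i].private_2)
			marked++;
	return marked;
}
```

If the netfs handles only the first two of four marked pages, the remaining two are still marked and not freeable in this model until `toy_readpages_cancel()` is run over the leftover part of the list.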
    • FS-Cache: Fix heading in documentation · 696f69b6
      David Howells committed
      Fix a heading in the documentation to make it consistent with the contents
      list.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells committed
      Extend the fscache netfs API so that the netfs can ask whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
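The return convention described in this commit (0 when the cached object is up to date, -ESTALE when it is not) can be sketched in userspace as follows. The `toy_*` names and the string-valued auxiliary data are hypothetical, and the -ENOMEM and -ERESTARTSYS cases are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cached object holding a copy of the netfs auxiliary data. */
struct toy_object {
	char aux[16];
};

/* Returns 0 if the cached aux data matches the current one, else -ESTALE. */
static int toy_check_consistency(const struct toy_object *obj,
				 const char *current_aux)
{
	assert(obj != NULL && current_aux != NULL);
	if (strncmp(obj->aux, current_aux, sizeof(obj->aux)) == 0)
		return 0;
	return -ESTALE;
}

/* A caller would typically use the cache only when the check returns 0. */
static bool toy_cache_usable(const struct toy_object *obj,
			     const char *current_aux)
{
	return toy_check_consistency(obj, current_aux) == 0;
}
```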
  8. 29 Aug, 2013 1 commit
    • ext4: allow specifying external journal by pathname mount option · ad4eec61
      Eric Sandeen committed
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE", would
      help, since then we can do, e.g.:
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0 
      # mkfs.ext4 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile 
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT4-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
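The `Journal device: 0x0701` value in the dumpe2fs output above can be read as a packed major/minor pair. The sketch below assumes the simple 16-bit layout (major in the high byte, minor in the low byte), which only covers device numbers below 256; the `toy_*` helpers are hypothetical and are not the kernel's own encoding routines. Under that assumption, 0x0701 decodes to major 7, minor 1, which matches /dev/loop1 (loop devices use major number 7).

```c
#include <assert.h>
#include <stdint.h>

/* Pack a small major/minor pair as (major << 8) | minor. */
static uint32_t toy_encode_dev(unsigned int major, unsigned int minor)
{
	/* This toy layout only has room for 8 bits of each. */
	assert(major < 256 && minor < 256);
	return ((uint32_t)major << 8) | minor;
}

/* Extract the major number from the packed value. */
static unsigned int toy_dev_major(uint32_t dev)
{
	return (dev >> 8) & 0xff;
}

/* Extract the minor number from the packed value. */
static unsigned int toy_dev_minor(uint32_t dev)
{
	return dev & 0xff;
}
```

Because the journal is recorded by number in this way, moving the journal file from /dev/loop0 (7, 0) to /dev/loop1 (7, 1) changes the stored value, which is why a path-based option is more robust.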
  9. 20 Aug, 2013 1 commit
  10. 06 Aug, 2013 2 commits
  11. 01 Aug, 2013 1 commit
    • ext3: allow specifying external journal by pathname mount option · cf7eff46
      Eric Sandeen committed
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE", would
      help, since then we can do, e.g.:
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0
      # mkfs.ext3 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT3-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  12. 30 Jul, 2013 1 commit
  13. 25 Jul, 2013 1 commit
  14. 10 Jul, 2013 1 commit
  15. 04 Jul, 2013 3 commits
  16. 03 Jul, 2013 1 commit
  17. 29 Jun, 2013 6 commits
    • locks: give the blocked_hash its own spinlock · 7b2296af
      Jeff Layton committed
      There's no reason we have to protect the blocked_hash and file_lock_list
      with the same spinlock. With the tests I have, breaking it in two gives
      a barely measurable performance benefit, but it seems reasonable to make
      this locking as granular as possible.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • locks: add a new "lm_owner_key" lock operation · 3999e493
      Jeff Layton committed
      Currently, the hash that the locking code uses to add these values
      to the blocked_hash is simply calculated using the fl_owner field. That's
      valid in most cases except for server-side lockd, which validates the
      owner of a lock based on fl_owner and fl_pid.
      
      In the case where you have a small number of NFS clients doing a lot
      of locking between different processes, you could end up with all
      the blocked requests sitting in a very small number of hash buckets.
      
      Add a new lm_owner_key operation to the lock_manager_operations that
      will generate an unsigned long to use as the key in the hashtable.
      That function is only implemented for server-side lockd, and simply
      XORs the fl_owner and fl_pid.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
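The hashing change described in this commit can be sketched in userspace as below. The `toy_*` names are hypothetical; the only detail taken from the commit text is that the server-side lockd key XORs fl_owner with fl_pid, while the default key uses fl_owner alone. The bucket selection by modulo is an assumption for illustration and is not the kernel's hash function.

```c
#include <assert.h>
#include <stdint.h>

/* Default key: derived from the owner pointer alone. */
static unsigned long toy_default_key(const void *fl_owner)
{
	return (unsigned long)(uintptr_t)fl_owner;
}

/* lockd-style key: XOR the owner with the pid, as the commit describes. */
static unsigned long toy_lockd_owner_key(const void *fl_owner,
					 unsigned int fl_pid)
{
	return (unsigned long)(uintptr_t)fl_owner ^ (unsigned long)fl_pid;
}

/* Illustrative bucket selection; the modulo is an assumption of this toy. */
static unsigned long toy_bucket(unsigned long key, unsigned long nr_buckets)
{
	assert(nr_buckets > 0);
	return key % nr_buckets;
}
```

With the default key, every blocked request from one NFS client (one owner) would share a key no matter which process issued it; mixing in the pid gives requests from different processes different keys.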
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton committed
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the  blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Don't pass inode to ->d_hash() and ->d_compare() · da53be12
      Linus Torvalds committed
      Instances either don't look at it at all (the majority of cases) or
      only want it to find the superblock (which can be had as dentry->d_sb).
      The few cases that want more are actually safe with dentry->d_inode;
      the only precaution needed is to check that it hasn't been replaced with
      NULL by rmdir() or by an overwriting rename(), in which case it should
      simply be treated as a cache miss.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] ->readdir() is gone · 2233f31a
      Al Viro committed
      everything's converted to ->iterate()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] introduce iterate_dir() and dir_context · 5c0ba4e0
      Al Viro committed
      iterate_dir(): new helper, replacing vfs_readdir().
      
      struct dir_context: contains the readdir callback (and will get more stuff
      in it), embedded into whatever data that callback wants to deal with;
      eventually, we'll be passing it to ->readdir() replacement instead of
      (data,filldir) pair.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
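The embedding pattern described in this commit, where a context holding the callback sits inside whatever data the callback needs, can be sketched in userspace as follows. All `toy_*` names, the callback signature, and the array of names standing in for a directory are hypothetical; this is not the kernel's struct dir_context or iterate_dir().

```c
#include <assert.h>
#include <stddef.h>

struct toy_dir_context;

/* Callback invoked once per directory entry; nonzero stops the walk. */
typedef int (*toy_actor_t)(struct toy_dir_context *ctx, const char *name);

/* The context itself only carries the callback. */
struct toy_dir_context {
	toy_actor_t actor;
};

/* Caller-private data with the context embedded in it. */
struct toy_counter {
	struct toy_dir_context ctx;
	int nr_entries;
};

/* Recover the enclosing structure from a pointer to its member. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Example callback: count the entries it is shown. */
static int toy_count_actor(struct toy_dir_context *ctx, const char *name)
{
	struct toy_counter *c = toy_container_of(ctx, struct toy_counter, ctx);

	(void)name;	/* the name itself is not needed for counting */
	c->nr_entries++;
	return 0;
}

/* Models the iteration helper: feed each entry to ctx->actor. */
static int toy_iterate_dir(const char *const *names, size_t n,
			   struct toy_dir_context *ctx)
{
	assert(ctx != NULL && ctx->actor != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = ctx->actor(ctx, names[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

The design choice this illustrates is that a single pointer replaces the (data, filldir) pair: the callback recovers its private data from the context pointer instead of receiving it as a separate argument.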
  18. 17 Jun, 2013 1 commit
    • f2fs: add remount_fs callback support · 696c018c
      Namjae Jeon committed
      Add the f2fs_remount function call, which will be used
      during filesystem remounting. This function
      allows the mount options specific to f2fs to be changed.
      
      Also modify the f2fs background_gc mount option, which
      will allow the user to dynamically turn on/off the
      garbage collection in f2fs based on the background_gc
      value. If background_gc=on, garbage collection will
      be turned on, and if background_gc=off, garbage collection
      will be turned off.
      
      By default the garbage collection is on in f2fs.
      
      Change Log:
      v2: Incorporated the review comments by Gu Zheng.
          Removing the restore part for VFS flags
          Updating comments with proper flag conditions
          Display GC background option as ON/OFF
          Revised conditions to stop GC in case of remount
      
      v1: Initial changes for adding remount_fs callback
      support.
      
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: change /** with /* for the coding style]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
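How a remount might interpret the background_gc option can be sketched as a small userspace parser. The `toy_parse_background_gc` helper is hypothetical and is not the f2fs option parser; it only illustrates the on/off mapping, with unrecognised values leaving the current setting untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Parse a single mount option string.  Returns 0 and updates *gc_on for
 * "background_gc=on" or "background_gc=off"; returns -1 and leaves
 * *gc_on unchanged for anything else.
 */
static int toy_parse_background_gc(const char *opt, bool *gc_on)
{
	assert(opt != NULL && gc_on != NULL);
	if (strcmp(opt, "background_gc=on") == 0) {
		*gc_on = true;
		return 0;
	}
	if (strcmp(opt, "background_gc=off") == 0) {
		*gc_on = false;
		return 0;
	}
	return -1;
}
```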
  19. 06 Jun, 2013 2 commits
  20. 28 May, 2013 1 commit
  21. 22 May, 2013 1 commit
    • mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate a partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the existing functionality was enough for the file system
      truncate operation to work properly. However, more file systems now
      support the punch hole feature, and they can benefit from mm supporting
      truncating a page just up to a certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change block_invalidatepage() in the same way and actually
      make use of the new length argument, implementing range invalidation.
      
      Actual file system implementations will follow, except for the file
      systems where the changes are really simple and should not change the
      behaviour in any way. An implementation of truncate_page_range(), which
      will be able to accept page-unaligned ranges, will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
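The idea of invalidating a range within a page, rather than everything from an offset to the end of the page, can be sketched with a toy userspace model. This assumes an (offset, length) pair as the range description and a page split into fixed-size blocks; the `toy_*` names and sizes are hypothetical, and the rule used here (discard only blocks that lie entirely inside the range) is an illustrative assumption rather than a copy of block_invalidatepage().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_BLOCK_SIZE	1024u
#define TOY_NR_BLOCKS	(TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

/* Hypothetical page whose blocks each carry a "mapped" flag. */
struct toy_page {
	bool mapped[TOY_NR_BLOCKS];
};

/* Discard every block that lies entirely within [offset, offset + length). */
static void toy_invalidatepage(struct toy_page *page, unsigned int offset,
			       unsigned int length)
{
	unsigned int stop = offset + length;

	/* The range must not wrap around or extend past the page. */
	assert(stop >= offset && stop <= TOY_PAGE_SIZE);
	for (unsigned int i = 0; i < TOY_NR_BLOCKS; i++) {
		unsigned int start = i * TOY_BLOCK_SIZE;
		unsigned int end = start + TOY_BLOCK_SIZE;

		if (start >= offset && end <= stop)
			page->mapped[i] = false;
	}
}

/* Count the blocks that are still mapped. */
static unsigned int toy_count_mapped(const struct toy_page *page)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < TOY_NR_BLOCKS; i++)
		if (page->mapped[i])
			n++;
	return n;
}
```

In this model, invalidating offset 1024 with length 2048 discards only the two middle blocks, a case that an offset-only interface (always running to the end of the page) could not express.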
  22. 07 May, 2013 1 commit
  23. 30 Apr, 2013 1 commit
  24. 28 Apr, 2013 1 commit
  25. 26 Apr, 2013 1 commit
    • SUNRPC: Use gssproxy upcall for server RPCGSS authentication. · 030d794b
      Simo Sorce committed
      The main advantage of this new upcall mechanism is that it can handle
      big tickets as seen in Kerberos implementations where tickets carry
      authorization data like the MS-PAC buffer with AD or the Posix Authorization
      Data being discussed in IETF on the krbwg working group.
      
      The Gssproxy program is used to perform the accept_sec_context call on the
      kernel's behalf. The code is changed to also pass the input buffer straight
      to the upcall mechanism to avoid allocating and copying many pages, as tokens
      can be as big as 64KiB (potentially more in future).
      Signed-off-by: Simo Sorce <simo@redhat.com>
      [bfields: containerization, negotiation api]
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>