1. 23 12月, 2013 2 次提交
    • J
      f2fs: add description about small_discards in document · ba0697ec
      Jaegeuk Kim 提交于
      This patch adds a description about small_disacrds in the f2fs document.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      ba0697ec
    • J
      f2fs: introduce sysfs entry to control in-place-update policy · 216fbd64
      Jaegeuk Kim 提交于
      This patch introduces new sysfs entries for users to control the policy of
      in-place-updates, namely IPU, in f2fs.
      
      Sometimes f2fs suffers from performance degradation due to its out-of-place
      update policy that produces many additional node block writes.
      If the storage performance is very dependant on the amount of data writes
      instead of IO patterns, we'd better drop this out-of-place update policy.
      
      This patch suggests 5 polcies and their triggering conditions as follows.
      
      [sysfs entry name = ipu_policy]
      
      0: F2FS_IPU_FORCE       all the time,
      1: F2FS_IPU_SSR         if SSR mode is activated,
      2: F2FS_IPU_UTIL        if FS utilization is over threashold,
      3: F2FS_IPU_SSR_UTIL    if SSR mode is activated and FS utilization is over
                              threashold,
      4: F2FS_IPU_DISABLE    disable IPU. (=default option)
      
      [sysfs entry name = min_ipu_util]
      
      This parameter controls the threshold to trigger in-place-updates.
      The number indicates percentage of the filesystem utilization, and used by
      F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
      
      For more details, see need_inplace_update() in segment.h.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      216fbd64
  2. 22 11月, 2013 2 次提交
  3. 13 11月, 2013 2 次提交
  4. 09 11月, 2013 2 次提交
    • J
      vfs: take i_mutex on renamed file · 6cedba89
      J. Bruce Fields 提交于
      A read delegation is used by NFSv4 as a guarantee that a client can
      perform local read opens without informing the server.
      
      The open operation takes the last component of the pathname as an
      argument, thus is also a lookup operation, and giving the client the
      above guarantee means informing the client before we allow anything that
      would change the set of names pointing to the inode.
      
      Therefore, we need to break delegations on rename, link, and unlink.
      
      We also need to prevent new delegations from being acquired while one of
      these operations is in progress.
      
      We could add some completely new locking for that purpose, but it's
      simpler to use the i_mutex, since that's already taken by all the
      operations we care about.
      
      The single exception is rename.  So, modify rename to take the i_mutex
      on the file that is being renamed.
      
      Also fix up lockdep and Documentation/filesystems/directory-locking to
      reflect the change.
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6cedba89
    • A
      5a3cd992
  5. 01 11月, 2013 1 次提交
  6. 25 10月, 2013 1 次提交
  7. 28 9月, 2013 1 次提交
    • D
      FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      David Howells 提交于
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      
      but still allows:
      
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      
      Two new operations are provided:
      
       (1) Disable a cookie:
      
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
      
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
      
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
      
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
      
       (2) Enable a cookie:
      
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data)
      
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
      
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           begin.
      
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
      
      One operation has its API modified:
      
       (3) Acquire a cookie.
      
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
      
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
      
      Signed-off-by: David Howells <dhowells@redhat.com
      94d30ae9
  8. 17 9月, 2013 1 次提交
    • M
      vfs: improve i_op->atomic_open() documentation · 0854d450
      Miklos Szeredi 提交于
      Fix documentation of ->atomic_open() and related functions: finish_open()
      and finish_no_open().  Also add details that seem to be unclear and a
      source of bugs (some of which are fixed in the following series).
      
      Cc-ing maintainers of all filesystems implementing ->atomic_open().
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0854d450
  9. 14 9月, 2013 1 次提交
  10. 12 9月, 2013 2 次提交
  11. 11 9月, 2013 2 次提交
  12. 09 9月, 2013 3 次提交
  13. 06 9月, 2013 3 次提交
    • M
      fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski 提交于
      Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In the cases that the netfs fails to
      read all the pages (which is legal) it ends up returning to the readahead and
      triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
      	(null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: NMilosz Tanski <milosz@adfin.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a6f282a
    • D
      FS-Cache: Fix heading in documentation · 696f69b6
      David Howells 提交于
      Fix a heading in the documentation to make it consistent with the contents
      list.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      696f69b6
    • D
      FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells 提交于
      Extend the fscache netfs API so that the netfs can ask as to whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
      da9803bc
  14. 29 8月, 2013 1 次提交
    • E
      ext4: allow specifying external journal by pathname mount option · ad4eec61
      Eric Sandeen 提交于
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE" would
      help, since then we can do i.e.
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0 
      # mkfs.ext4 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile 
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT4-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      ad4eec61
  15. 20 8月, 2013 1 次提交
  16. 06 8月, 2013 2 次提交
  17. 01 8月, 2013 1 次提交
    • E
      ext3: allow specifying external journal by pathname mount option · cf7eff46
      Eric Sandeen 提交于
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE" would
      help, since then we can do i.e.
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0
      # mkfs.ext3 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT3-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      cf7eff46
  18. 30 7月, 2013 1 次提交
  19. 25 7月, 2013 1 次提交
  20. 10 7月, 2013 1 次提交
  21. 04 7月, 2013 3 次提交
  22. 03 7月, 2013 1 次提交
  23. 29 6月, 2013 5 次提交
    • J
      locks: give the blocked_hash its own spinlock · 7b2296af
      Jeff Layton 提交于
      There's no reason we have to protect the blocked_hash and file_lock_list
      with the same spinlock. With the tests I have, breaking it in two gives
      a barely measurable performance benefit, but it seems reasonable to make
      this locking as granular as possible.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7b2296af
    • J
      locks: add a new "lm_owner_key" lock operation · 3999e493
      Jeff Layton 提交于
      Currently, the hashing that the locking code uses to add these values
      to the blocked_hash is simply calculated using fl_owner field. That's
      valid in most cases except for server-side lockd, which validates the
      owner of a lock based on fl_owner and fl_pid.
      
      In the case where you have a small number of NFS clients doing a lot
      of locking between different processes, you could end up with all
      the blocked requests sitting in a very small number of hash buckets.
      
      Add a new lm_owner_key operation to the lock_manager_operations that
      will generate an unsigned long to use as the key in the hashtable.
      That function is only implemented for server-side lockd, and simply
      XORs the fl_owner and fl_pid.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3999e493
    • J
      locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton 提交于
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the  blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1c8c601a
    • L
      Don't pass inode to ->d_hash() and ->d_compare() · da53be12
      Linus Torvalds 提交于
      Instances either don't look at it at all (the majority of cases) or
      only want it to find the superblock (which can be had as dentry->d_sb).
      A few cases that want more are actually safe with dentry->d_inode -
      the only precaution needed is the check that it hadn't been replaced with
      NULL by rmdir() or by overwriting rename(), which case should be simply
      treated as cache miss.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      da53be12
    • A
      [readdir] ->readdir() is gone · 2233f31a
      Al Viro 提交于
      everything's converted to ->iterate()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2233f31a