1. 12 Sep, 2013 4 commits
    • ocfs2: ac_bits_wanted should be local_alloc_bits when returns -ENOSPC · 7e9b7937
      Younger Liu committed
      There is an issue in reserving and claiming space for localalloc. When
      localalloc space is not enough, it claims space from global_bitmap.
      If there is not enough free space in global_bitmap, the size of the
      claimed space is set to half of the original size and the claim is retried.
      
      The issue is as follows: osb->local_alloc_bits is set to half of the original
      size in ocfs2_recalc_la_window(), but ac->ac_bits_wanted is set to
      osb->local_alloc_default_bits, which is not changed.  localalloc therefore
      always reserves and claims local_alloc_default_bits of space and returns ENOSPC.
      
      So, ac->ac_bits_wanted should be osb->local_alloc_bits which would be
      changed.
      Signed-off-by: Younger Liu <younger.liu@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Jeff Liu <jeff.liu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
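The retry behaviour described in the ocfs2 local alloc commit above can be sketched as a small userspace model. This is an illustration in Python, not the kernel code: `reserve_local_alloc`, `use_current_window`, and `min_bits` are hypothetical names, and the loop only assumes that the window is halved after each failed attempt, as the commit message states.

```python
def reserve_local_alloc(global_free_bits, default_bits, use_current_window, min_bits=1):
    """Model of reserving a local alloc window from the global bitmap.

    Returns the number of bits reserved, or None to model -ENOSPC.
    """
    window = default_bits
    while window >= min_bits:
        # Behaviour before the fix: always ask for the unchanged default size.
        # Behaviour after the fix: ask for the current, possibly shrunk, window.
        bits_wanted = window if use_current_window else default_bits
        if bits_wanted <= global_free_bits:
            return bits_wanted      # reservation succeeded
        window //= 2                # halve the window and retry
    return None                     # every attempt failed


# With 300 free bits and a default window of 1024 bits:
fixed = reserve_local_alloc(300, 1024, use_current_window=True)
buggy = reserve_local_alloc(300, 1024, use_current_window=False)
```

In this model the fixed variant succeeds once the window has shrunk to 256 bits, while the variant that keeps asking for the default size never succeeds, no matter how often the window is halved.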
    • ocfs2: dlm_request_all_locks() should deal with the status sent from target node · 98ac9125
      Xue jiufei committed
      dlm_request_all_locks() should deal with the status sent from target node
      if DLM_LOCK_REQUEST_MSG is sent successfully, or recovery master will fall
      into endless loop, waiting for other nodes to send locks and
      DLM_RECO_DATA_DONE_MSG to me.
      
              NodeA                                  NodeB
                                           selected as recovery master
                                           dlm_remaster_locks()
                                           ->dlm_request_all_locks()
                                           send DLM_LOCK_REQUEST_MSG to nodeA
      
      It happened that NodeA could not allocate memory when processing this
      message.  dlm_request_all_locks_handler() does not queue
      dlm_request_all_locks_worker and returns -ENOMEM.  It will never send
      locks and DLM_RECO_DATA_DONE_MSG to NodeB.
      
                                          NodeB does not deal with the status
                                          sent from nodeA, and will fall into an
                                          endless loop waiting for the
                                          recovery state of NodeA to be
                                          changed.
      Signed-off-by: joyce <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Jeff Liu <jeff.liu@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
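The distinction the DLM commit above draws, between a message that was sent successfully and a handler that nevertheless failed, might be modelled as follows. This is a hedged Python sketch, not the o2dlm implementation: `request_all_locks` and the `send_message` callback are hypothetical stand-ins, and the tuple return value is an assumption made for illustration.

```python
import errno


def request_all_locks(send_message):
    """Model of a lock request sent by the recovery master.

    send_message() is assumed to return (send_ret, handler_status):
    send_ret reports whether the message was delivered, handler_status
    is the value returned by the handler on the target node.
    """
    send_ret, handler_status = send_message()
    if send_ret < 0:
        return send_ret         # the message itself was not delivered
    # Delivered: the target's status must still be checked, otherwise a
    # failure such as -ENOMEM on the target goes unnoticed by the sender.
    return handler_status


status = request_all_locks(lambda: (0, -errno.ENOMEM))
```

Returning the handler status lets the caller react to the failure instead of waiting for data that the target node will never send.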
    • ocfs2: use i_size_read() to access i_size · f17c20dd
      Junxiao Bi committed
      Though ocfs2 uses inode->i_mutex to protect i_size, there are both
      i_size_read/write() and direct accesses.  Clean up all direct access to
      eliminate confusion.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: lighten up allocate transaction · 2b1e55c3
      Younger Liu committed
      The issue scenario is as follows:
      
      When fallocating a very large disk space for a small file,
      __ocfs2_extend_allocation attempts to get a very large transaction.  For
      some journal sizes, there may not be enough room for this transaction,
      and the fallocate will fail.
      
      The patch below extends & restarts the transaction as necessary while
      allocating space, and should work with even the smallest journal.  This
      patch follows the approach of ext4 resize.
      
      Test:
      # mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc
      ...(journal size is 32M)
      # mount.ocfs2 /dev/sdc /mnt/ocfs2/
      # touch /mnt/ocfs2/1.log
      # fallocate -o 0 -l 400G /mnt/ocfs2/1.log
      fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory
      # tail -f /var/log/messages
      [ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048)
      [ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12
      [ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12
      [ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12
      ^C
      With this patch, the test works well.
      Signed-off-by: Younger Liu <younger.liu@huawei.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
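The idea in the allocate-transaction commit above, splitting one oversized transaction into several that each fit the journal, can be illustrated with a small Python model. It is only a sketch under stated assumptions: `transactions_needed`, `credits_per_cluster`, and `max_credits` are hypothetical names, and a fixed credit cost per cluster is a simplification that the commit does not claim.

```python
def transactions_needed(clusters, credits_per_cluster, max_credits):
    """Count how many bounded transactions are needed to allocate `clusters`.

    Each transaction may use at most `max_credits` journal credits, so the
    allocation proceeds in chunks and the transaction is restarted whenever
    the credit budget of the current one is used up.
    """
    if credits_per_cluster > max_credits:
        raise ValueError("journal cannot hold even a single cluster allocation")
    per_transaction = max_credits // credits_per_cluster
    count = 0
    remaining = clusters
    while remaining > 0:
        remaining -= min(per_transaction, remaining)   # allocate one chunk
        count += 1                                     # then restart
    return count


# 10 clusters at 3 credits each would need 30 credits in one transaction;
# with an 8-credit budget the work is split into chunks of 2 clusters.
chunks = transactions_needed(10, 3, 8)
```

The single large request is what fails in the log above (2051 credits wanted against a limit of 2048); chunking keeps every individual request under the limit.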
  2. 11 Sep, 2013 1 commit
    • vfs: make sure we don't have a stale root path if unlazy_walk() fails · d0d27277
      Linus Torvalds committed
      When I moved the RCU walk termination into unlazy_walk(), I didn't copy
      quite all of it: for the successful RCU termination we properly add the
      necessary reference counts to our temporary copy of the root path, but
      for the failure case we need to make sure that any temporary root path
      information is cleared out (since it does _not_ have the proper
      reference counts from the RCU lookup).
      
      We could clean up this mess by just always dropping the temporary root
      information, but Al points out that that would mean that a single lookup
      through symlinks could see multiple different root entries if it races
      with another thread doing chroot.  Not that I think we should really
      care (we had that before too, back before we had a copy of the root path
      in the nameidata).
      
      Al says he has a cunning plan.  In the meantime, this is the minimal fix
      for the problem, even if it's not all that pretty.
      Reported-by: Mace Moneta <moneta.mace@gmail.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 10 Sep, 2013 3 commits
    • split read_seqretry_or_unlock(), convert d_walk() to resulting primitives · 48f5ec21
      Al Viro committed
      Separate "check if we need to retry" from "unlock if we are done and
      had seq_writelock"; that allows to use these guys in d_walk(), where
      we need to recheck every time we ascend back to parent, but do *not*
      want to unlock until the very end.  Lift rcu_read_lock/rcu_read_unlock
      out into callers.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • direct-io: Use return from cmpxchg to decide of assignment happened · 45150c43
      Olof Johansson committed
      Not using the return value can in the generic case be racy, so it's
      in general good practice to check the return value instead.
      
      This also resolved the warning caused on ARM and other architectures:
      
        fs/direct-io.c: In function 'sb_init_dio_done_wq':
        fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value]
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: H Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
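Why the return value of a compare-and-exchange matters, as the direct-io commit above argues, can be shown with a single-threaded Python model. This is not atomic and not the kernel's `cmpxchg()`: the `Slot` class, the `cmpxchg` helper, and `install_workqueue` are hypothetical names that only mimic the "store if still empty, return the previous value" contract.

```python
class Slot:
    """Holds one shared pointer-like value, initially empty."""

    def __init__(self):
        self.value = None


def cmpxchg(slot, old, new):
    """Store `new` only if the slot still holds `old`; return the previous value."""
    prev = slot.value
    if prev is old:
        slot.value = new
    return prev


def install_workqueue(slot, make_wq, destroy_wq):
    """Install a freshly created workqueue unless someone else already did."""
    wq = make_wq()
    prev = cmpxchg(slot, None, wq)
    if prev is not None:
        # The return value shows that the assignment did not happen,
        # so the object created here is surplus and must be released.
        destroy_wq(wq)
    return slot.value


slot = Slot()
destroyed = []
first = install_workqueue(slot, lambda: "wq1", destroyed.append)
second = install_workqueue(slot, lambda: "wq2", destroyed.append)
```

Deciding from the returned value avoids re-reading the slot afterwards, which is the step that can race in real concurrent code.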
    • dcache: Translating dentry into pathname without taking rename_lock · 232d2d60
      Waiman Long committed
      When running the AIM7's short workload, Linus' lockref patch eliminated
      most of the spinlock contention. However, there were still some left:
      
           8.46%     reaim  [kernel.kallsyms]     [k] _raw_spin_lock
                       |--42.21%-- d_path
                       |          proc_pid_readlink
                       |          SyS_readlinkat
                       |          SyS_readlink
                       |          system_call
                       |          __GI___readlink
                       |
                       |--40.97%-- sys_getcwd
                       |          system_call
                       |          __getcwd
      
      The big one here is the rename_lock (seqlock) contention in d_path()
      and the getcwd system call. This patch will eliminate the need to take
      the rename_lock while translating dentries into the full pathnames.
      
      The need to take the rename_lock is to make sure that no rename
      operation can be ongoing while the translation is in progress. However,
      only one thread can take the rename_lock thus blocking all the other
      threads that need it even though the translation process won't make
      any change to the dentries.
      
      This patch will replace the writer's write_seqlock/write_sequnlock
      sequence of the rename_lock of the callers of the prepend_path() and
      __dentry_path() functions with the reader's read_seqbegin/read_seqretry
      sequence within these 2 functions. As a result, the code will have to
      retry if one or more rename operations had been performed. In addition,
      RCU read lock will be taken during the translation process to make sure
      that no dentries will go away. To prevent live-lock from happening,
      the code will fall back to taking the rename_lock if read_seqretry()
      fails three times.
      
      To further reduce spinlock contention, this patch does not take the
      dentry's d_lock when copying the filename from the dentries. Instead,
      it treats the name pointer and length as unreliable and just copies
      the string byte-by-byte until it hits a null byte or the end of the
      string as specified by the length. This should avoid stepping into
      an invalid memory address. The error cases are left to be handled by
      the sequence number check.
      
      The following code refactorings are also made:
      1. Move prepend('/') into prepend_name() to remove one conditional
         check.
      2. Move the global root check in prepend_path() back to the top of
         the while loop.
      
      With this patch, the _raw_spin_lock will now account for only 1.2%
      of the total CPU cycles for the short workload. This patch also has
      the effect of reducing the effect of running perf on its profile
      since the perf command itself can be a heavy user of the d_path()
      function depending on the complexity of the workload.
      
      When taking the perf profile of the high-systime workload, the amount
      of spinlock contention contributed by running perf without this patch
      was about 16%. With this patch, the spinlock contention caused by
      the running of perf will go away and we will have a more accurate
      perf profile.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
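The lockless read with a bounded number of retries that the dcache commit above describes can be sketched in Python. This is a simplified, single-threaded model rather than the kernel seqlock: `SeqLock`, `rename`, `read_path`, and `exclusive_takes` are hypothetical names, and the counter only records completed renames instead of tracking in-progress writers.

```python
class SeqLock:
    """Toy sequence lock: the counter changes whenever a rename completes."""

    def __init__(self):
        self.sequence = 0
        self.exclusive_takes = 0    # how often a reader fell back to the lock

    def read_begin(self):
        return self.sequence

    def read_retry(self, start):
        return self.sequence != start   # True if a rename happened meanwhile

    def rename(self):
        self.sequence += 1


def read_path(lock, build_path, max_lockless_tries=3):
    """Build a pathname without the lock, falling back after repeated retries."""
    for _ in range(max_lockless_tries):
        start = lock.read_begin()
        path = build_path()
        if not lock.read_retry(start):
            return path                 # no rename raced with the walk
    # Too many retries: take the lock exclusively to guarantee progress.
    lock.exclusive_takes += 1
    return build_path()


lock = SeqLock()
quiet = read_path(lock, lambda: "/a/b")
```

Readers that see no concurrent rename never touch the lock, which is where the contention reduction reported in the commit comes from; the fallback exists only to rule out live-lock.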
  4. 09 Sep, 2013 7 commits
    • autofs4 - fix device ioctl mount lookup · ac838719
      Ian Kent committed
      When reconnecting to automounts at startup an autofs ioctl is used
      to find the device and inode of existing mounts so they can be used
      to open a file descriptor of possibly covered mounts.
      
      At this time the caller might not yet "own" the mount so it can
      trigger calling ->d_automount(). This causes automount to hang when
      trying to reconnect to direct or offset mount types.
      
      Consequently kern_path() can't be used but kern_path_mountpoint() can be.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: fix dentry RCU to refcounting possibly sleeping dput() · e5c832d5
      Linus Torvalds committed
      This is the fix that the last two commits indirectly led up to - making
      sure that we don't call dput() in a bad context on the dentries we've
      looked up in RCU mode after the sequence count validation fails.
      
      This basically expands d_rcu_to_refcount() into the callers, and then
      fixes the callers to delay the dput() in the failure case until _after_
      we've dropped all locks and are no longer in an RCU-locked region.
      
      The case of 'complete_walk()' was trivial, since its failure case did
      the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
      and as such that is just a pure expansion of the function with a trivial
      movement of the resulting dput() to after 'unlock_rcu_walk()'.
      
      In contrast, the unlazy_walk() case was much more complicated, because
      not only does it convert two different dentries from RCU to be reference
      counted, but it used to not call unlock_rcu_walk() at all, and instead
      just returned an error and let the caller clean everything up in
      "terminate_walk()".
      
      Happily, one of the dentries in question (called "parent" inside
      unlazy_walk()) is the dentry of "nd->path", which terminate_walk() wants
      a refcount to anyway for the non-RCU case.
      
      So what the new and improved unlazy_walk() does is to first turn that
      dentry into a refcounted one, and once that is set up, the error cases
      can continue to use the terminate_walk() helper for cleanup, but for the
      non-RCU case.  Which makes it possible to drop out of RCU mode if we
      actually hit the sequence number failure case.
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • introduce kern_path_mountpoint() · 2d864651
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rename user_path_umountat() to user_path_mountpoint_at() · 197df04c
      Al Viro committed
      ... and move the extern from linux/namei.h to fs/internal.h,
      along with that of vfs_path_lookup().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • take unlazy_walk() into umount_lookup_last() · 35759521
      Al Viro committed
      ... and massage it a bit to reduce nesting
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: use lockred "dead" flag to mark unrecoverably dead dentries · 0d98439e
      Linus Torvalds committed
      This simplifies the RCU to refcounting code in particular.
      
      I was originally intending to leave this for later, but walking through
      all the dput() logic (see previous commit), I realized that the dput()
      "might_sleep()" check was misleadingly weak.  And I removed it as
      misleading, both for performance profiling and for debugging.
      
      However, the might_sleep() debugging case is actually true: the final
      dput() can indeed sleep, if the inode of the dentry that you are
      releasing ends up sleeping at iput time (see dentry_iput()).  So the
      problem with the might_sleep() in dput() wasn't that it wasn't true, it
      was that it wasn't actually testing and triggering on the interesting
      case.
      
      In particular, just about *any* dput() can indeed sleep, if you happen
      to race with another thread deleting the file in question, and you then
      lose the race to be the last dput() for that file.  But because it's
      a very rare race, the debugging code would never trigger it in practice.
      
      Why is this problematic? The new d_rcu_to_refcount() (see commit
      15570086: "vfs: reimplement d_rcu_to_refcount() using
      lockref_get_or_lock()") does a dput() for the failure case, and it does
      it under the RCU lock.  So potentially sleeping really is a bug.
      
      But there's no way I'm going to fix this with the previous complicated
      "lockref_get_or_lock()" interface.  And rather than revert to the old
      and crufty nested dentry locking code (which did get this right by
      delaying the reference count updates until they were verified to be
      safe), let's make forward progress.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: reorganize dput() memory accesses · 8aab6a27
      Linus Torvalds committed
      This is me being a bit OCD after all the dentry optimization work this
      merge window: profiles end up showing 'dput()' as a rather expensive
      operation, and there were two unrelated bad reasons for that.
      
      The first reason was reading d_lockref.count for debugging purposes,
      which touches the lockref cacheline (for reads) before we really need to.
      More importantly, the debugging test in question is _wrong_, and has
      hidden bugs.  It's true that we can only sleep when the count goes down
      to zero, but the test as-is hides the much more subtle bug that happens
      if we race with somebody else deleting the file.
      
      Anyway we _will_ touch that cacheline, but let's do it for a write and
      in the right routine (ie in "lockref_put_or_lock()") which annotates the
      costs better.  So remove the misleading debug code.
      
      The other was an unnecessary access to the cacheline that contains the
      d_lru list, just to check whether we already were on the LRU list or
      not.  This is exactly what we have d_flags for, so that we can avoid
      touching extra cache lines for the common case.  So just add another bit
      for "is this dentry on the LRU".
      
      Finally, mark the tests properly likely/unlikely, so that the common
      fast-paths are dense in the instruction stream.
      
      This makes the profiles look much saner.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 08 Sep, 2013 10 commits
  6. 07 Sep, 2013 10 commits
    • um: hostfs: Fix writeback · 65984ff9
      Richard Weinberger committed
      We have to implement ->release() and trigger writeback from it.
      Otherwise we might lose dirty pages at munmap().
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ceph: use d_invalidate() to invalidate aliases · a8d436f0
      Yan, Zheng committed
      d_invalidate() is the standard VFS method to invalidate a dentry.
      Compared to d_delete(), it also tries to shrink child dentries.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: remove ceph_lookup_inode() · ed284c49
      Yan, Zheng committed
      commit 6f60f889 (ceph: fix freeing inode vs removing session caps race)
      introduced ceph_lookup_inode(). But there is already a ceph_find_inode()
      which provides similar function. So remove ceph_lookup_inode(), use
      ceph_find_inode() instead.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Alex Elder <alex.elder@linary.org>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • NFSv4.1 Use MDS auth flavor for data server connection · 0e20162e
      Andy Adamson committed
      Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible"
      uses the nfs_client cl_rpcclient for all state management operations, and
      will use krb5i or auth_sys with no regard to the mount command authflavor
      choice.
      
      The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all
      non-state management operations with a different nfs_server for each fsid
      encountered traversing the mount point, each with a potentially different
      auth flavor.
      
      pNFS data servers are not mounted in the normal sense as there is no associated
      nfs_server structure. Data servers can also export multiple fsids, each with
      a potentially different auth flavor.
      
      Data servers need to use the same authflavor as the MDS server rpc client for
      non-state management operations. Populate a list of rpc clients with the MDS
      server rpc client auth flavor for the DS to use.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • ceph: trivial buildbot warnings fix · 971f0bde
      Milosz Tanski committed
      The linux-next build bot found three warnings; this addresses all of them.
      
       * non-ANSI function declaration of function 'ceph_fscache_register' and
         'ceph_fscache_unregister'
       * symbol 'ceph_cache_netfs' was not declared, now it's extern in the header.
       * warning: "pr_fmt" redefined
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: Do not do invalidate if the filesystem is mounted nofsc · e81568eb
      Milosz Tanski committed
      Previously we would always try to enqueue work even if the filesystem is not
      mounted with fscache enabled (or the file has no cookie). In the case of the
      filesystem mounted nofsc (but with fscache compiled in) this would lead to a
      crash.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: page still marked private_2 · d4d3aa38
      Milosz Tanski committed
      A previous patch allowed us to clean up most of the issues with pages marked
      as private_2 when calling ceph_readpages. However, there seems to be a case in
      the error-case cleanup in start read that still triggers this from time to
      time. I've only seen this a couple of times.
      
      BUG: Bad page state in process petabucket  pfn:335b82
      page:ffffea000cd6e080 count:0 mapcount:0 mapping:          (null) index:0x0
      page flags: 0x200000000001000(private_2)
      Call Trace:
       [<ffffffff81563442>] dump_stack+0x46/0x58
       [<ffffffff8112c7f7>] bad_page+0xc7/0x120
       [<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
       [<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
       [<ffffffff81132427>] __put_single_page+0x27/0x30
       [<ffffffff81132d95>] put_page+0x25/0x40
       [<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
       [<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
    • ceph: ceph_readpage_to_fscache didn't check if marked · 9b8dd1e8
      Milosz Tanski committed
      Previously ceph_readpage_to_fscache did not check if the page was marked as
      cached before calling fscache_write_page, resulting in a BUG inside of fscache.
      
      FS-Cache: Assertion failed
      ------------[ cut here ]------------
      kernel BUG at fs/fscache/page.c:874!
      invalid opcode: 0000 [#1] SMP
      Call Trace:
       [<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph]
       [<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph]
       [<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph]
       [<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700
       [<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph]
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
    • ceph: clean PgPrivate2 on returning from readpages · 76be778b
      Milosz Tanski committed
      In some cases the ceph readpages code bails without filling all the pages
      already marked by fscache. When we return to the readahead code this causes
      a BUG.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: use fscache as a local presisent cache · 99ccbd22
      Milosz Tanski committed
      Add support for fscache to the Ceph filesystem. This brings it on par
      with some of the other network filesystems in Linux (like NFS, AFS, etc.).
      
      In order to mount the filesystem with fscache the 'fsc' mount option must be
      passed.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
  7. 06 Sep, 2013 5 commits
    • NFS: Don't check lock owner compatability unless file is locked (part 2) · 4109bb74
      Trond Myklebust committed
      When coalescing requests into a single READ or WRITE RPC call, and there
      is no file locking involved, we don't have to refuse coalescing for
      requests where the lock owner information doesn't match.
      Reported-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
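The relaxed coalescing rule from the NFS commit above can be expressed as a small predicate. The Python below is a sketch only: `Request`, `can_coalesce`, and `file_has_locks` are hypothetical names, and real request coalescing involves further conditions that this model leaves out.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One pending read or write request, tagged with its lock owner."""
    lock_owner: int


def can_coalesce(prev, req, file_has_locks):
    """Decide whether two requests may share one READ or WRITE RPC call.

    Lock owner information only matters when file locking is involved;
    without locks, a mismatch is no reason to refuse coalescing.
    """
    if file_has_locks and prev.lock_owner != req.lock_owner:
        return False
    return True


parent = Request(lock_owner=1)
child = Request(lock_owner=2)
merged = can_coalesce(parent, child, file_has_locks=False)
```

With no locks held, requests from different owners (for example a parent and a fork()ed child) can be merged in this model; once locks are involved the owner comparison applies again.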
    • fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski committed
      Currently the fscache code expects the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In the cases that the netfs fails to
      read all the pages (which is legal) it ends up returning to the readahead and
      triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:          (null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
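The cleanup step that the fscache commit above introduces can be pictured with a Python model of a readpages callback. This is a sketch under assumptions, not the fscache API: pages are plain integers, `marked` is a set standing in for the PG_private_2 flag, and `readpages`, `readpages_cancel`, and `read_one` are hypothetical names.

```python
def readpages_cancel(unread_pages, marked):
    """Unmark every page that the netfs did not get around to reading."""
    for page in unread_pages:
        marked.discard(page)


def readpages(pages, marked, read_one):
    """Read pages in order and stop at the first failure.

    Pages that were read leave the list; whatever is still on the list when
    the function returns is handed to readpages_cancel() so that no marked
    page goes back to the readahead code.
    """
    unread = list(pages)
    while unread and read_one(unread[0]):
        unread.pop(0)
    readpages_cancel(unread, marked)
    return unread


marked = {1, 2, 3}                       # all pages were marked up front
leftover = readpages([1, 2, 3], marked, lambda page: page < 2)
```

Here the read stops after page 1, pages 2 and 3 are returned unread, and neither of them is still marked, which is the state the readahead code expects.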
    • CacheFiles: Implement interface to check cache consistency · 5002d7be
      David Howells committed
      Implement the FS-Cache interface to check the consistency of a cache object in
      CacheFiles.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
    • FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells committed
      Extend the fscache netfs API so that the netfs can ask as to whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
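The return convention of the consistency check described in the FS-Cache commit above (0 when the cached object is up to date, -ESTALE when it is not) can be modelled briefly. This Python sketch is not the fscache interface: `check_consistency` here takes two byte strings standing in for the auxiliary data, and the comparison by equality is an assumption made for illustration. The -ENOMEM and -ERESTARTSYS cases mentioned in the commit are not modelled.

```python
import errno


def check_consistency(cached_aux, current_aux):
    """Compare the auxiliary data stored with a cached object against the
    netfs's current view of the same object.

    Returns 0 if they agree and -ESTALE if the cached copy is out of date.
    """
    if cached_aux == current_aux:
        return 0
    return -errno.ESTALE


fresh = check_consistency(b"version-1", b"version-1")
stale = check_consistency(b"version-1", b"version-2")
```

A caller would typically treat a non-zero result as a reason to drop or refetch the cached data, though that policy is up to the netfs.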
    • NFS: Don't check lock owner compatibility in writes unless file is locked · 0f1d2605
      Trond Myklebust committed
      If we're doing buffered writes, and there is no file locking involved,
      then we don't have to worry about whether or not the lock owner information
      is identical.
      By relaxing this check, we ensure that fork()ed child processes can write
      to a page without having to first sync dirty data that was written
      by the parent to disk.
      Reported-by: Quentin Barnes <qbarnes@gmail.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: Quentin Barnes <qbarnes@gmail.com>