1. 29 5月, 2014 16 次提交
  2. 16 4月, 2014 1 次提交
  3. 29 1月, 2014 1 次提交
  4. 28 1月, 2014 1 次提交
    • J
      NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping · d529ef83
      Jeff Layton 提交于
      There is a possible race in how the nfs_invalidate_mapping function is
      handled.  Currently, we go and invalidate the pages in the file and then
      clear NFS_INO_INVALID_DATA.
      
      The problem is that it's possible for a stale page to creep into the
      mapping after the page was invalidated (i.e., via readahead). If another
      writer comes along and sets the flag after that happens but before
      invalidate_inode_pages2 returns then we could clear the flag
      without the cache having been properly invalidated.
      
      So, we must clear the flag first and then invalidate the pages. Doing
      this however, opens another race:
      
      It's possible to have two concurrent read() calls that end up in
      nfs_revalidate_mapping at the same time. The first one clears the
      NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.
      
      Just before calling that though, the other task races in, checks the
      flag and finds it cleared. At that point, it trusts that the mapping is
      good and gets the lock on the page, allowing the read() to be satisfied
      from the cache even though the data is no longer valid.
      
      These effects are easily manifested by running diotest3 from the LTP
      test suite on NFS. That program does a series of DIO writes and buffered
      reads. The operations are serialized and page-aligned but the existing
      code fails the test since it occasionally allows a read to come out of
      the cache incorrectly. While mixing direct and buffered I/O isn't
      recommended, I believe it's possible to hit this in other ways that just
      use buffered I/O, though that situation is much harder to reproduce.
      
      The problem is that the checking/clearing of that flag and the
      invalidation of the mapping really need to be atomic. Fix this by
      serializing concurrent invalidations with a bitlock.
      
      At the same time, we also need to allow other places that check
      NFS_INO_INVALID_DATA to check whether we might be in the middle of
      invalidating the file, so fix up a couple of places that do that
      to look for the new NFS_INO_INVALIDATING flag.
      
      Doing this requires us to be careful not to set the bitlock
      unnecessarily, so this code only does that if it believes it will
      be doing an invalidation.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d529ef83
  5. 18 1月, 2014 1 次提交
  6. 06 1月, 2014 1 次提交
  7. 25 10月, 2013 1 次提交
  8. 06 9月, 2013 1 次提交
  9. 05 9月, 2013 2 次提交
    • W
      nfs4.1: Add SP4_MACH_CRED write and commit support · 8c21c62c
      Weston Andros Adamson 提交于
      WRITE and COMMIT can use the machine credential.
      
      If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.
      Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      8c21c62c
    • N
      NFSv4: Don't try to recover NFSv4 locks when they are lost. · ef1820f9
      NeilBrown 提交于
      When an NFSv4 client loses contact with the server it can lose any
      locks that it holds.
      
      Currently when it reconnects to the server it simply tries to reclaim
      those locks.  This might succeed even though some other client has
      held and released a lock in the mean time.  So the first client might
      think the file is unchanged, but it isn't.  This isn't good.
      
      If, when recovery happens, the locks cannot be claimed because some
      other client still holds the lock, then we get a message in the kernel
      logs, but the client can still write.  So two clients can both think
      they have a lock and can both write at the same time.  This is equally
      not good.
      
      There was a patch a while ago
        http://comments.gmane.org/gmane.linux.nfs/41917
      
      which tried to address some of this, but it didn't seem to go
      anywhere.  That patch would also send a signal to the process.  That
      might be useful but for now this patch just causes writes to fail.
      
      For NFSv4 (unlike v2/v3) there is a strong link between the lock and
      the write request so we can fairly easily fail any IO of the lock is
      gone.  While some applications might not expect this, it is still
      safer than allowing the write to succeed.
      
      Because this is a fairly big change in behaviour a module parameter,
      "recover_locks", is introduced which defaults to true (the current
      behaviour) but can be set to "false" to tell the client not to try to
      recover things that were lost.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      ef1820f9
  10. 04 9月, 2013 1 次提交
    • A
      NFS avoid expired credential keys for buffered writes · dc24826b
      Andy Adamson 提交于
      We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
      context key) that is about to expire or has expired.  We currently will
      paint ourselves into a corner by returning success to the applciation
      for such a buffered WRITE, only to discover that we do not have permission when
      we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
      
      Use the RPC layer credential key timeout and expire routines which use a
      a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.
      
      If a WRITE is using a credential with a key that will expire within
      watermark seconds, flush the inode in nfs_write_end and send only
      NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
      Note that this results in single page NFS_FILE_SYNC WRITEs.
      Signed-off-by: NAndy Adamson <andros@netapp.com>
      [Trond: removed a pr_warn_ratelimited() for now]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      dc24826b
  11. 22 8月, 2013 1 次提交
  12. 10 7月, 2013 1 次提交
  13. 26 3月, 2013 1 次提交
  14. 05 1月, 2013 1 次提交
  15. 21 12月, 2012 1 次提交
    • D
      NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page · 8c209ce7
      David Howells 提交于
      nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
      leading to the following bad-page-state:
      
       BUG: Bad page state in process python-bin  pfn:17d39b
       page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
      index:38686 (Tainted: G    B      ---------------- )
       Pid: 31053, comm: python-bin Tainted: G    B      ----------------
      2.6.32-71.24.1.el6.x86_64 #1
       Call Trace:
       [<ffffffff8111bfe7>] bad_page+0x107/0x160
       [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
       [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
       [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
       [<ffffffff8112230c>] release_pages+0x21c/0x250
       [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
       [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
       [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
       [<ffffffff811226f8>] __lru_cache_add+0x58/0x70
       [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
       [<ffffffff81123f49>] putback_lru_page+0x69/0x100
       [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
       [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
       [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
       [<ffffffff8115255c>] compact_zone+0x4cc/0x600
       [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
       [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
       [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
       [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
       [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
       [<ffffffff814c9136>] ? thread_return+0x4e/0x778
       [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
       [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
       [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
       [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
       [<ffffffff814ce383>] do_page_fault+0x123/0x3a0
       [<ffffffff814cbdf5>] page_fault+0x25/0x30
      
      nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
      - even if __GFP_WAIT is set.  The reason that doesn't wait is that
      fscache_maybe_release_page() might deadlock the allocator as the work threads
      writing to the cache may all end up sleeping on memory allocation.
      
      However, I wonder if that is actually a problem.  There are a number of things
      I can do to deal with this:
      
       (1) Make nfs_migrate_page() wait.
      
       (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.
      
       (3) Set a timeout around the wait.
      
       (4) Make nfs_migrate_page() return an error if the page is still busy.
      
      For the moment, I'll select (2) and (4).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      8c209ce7
  16. 16 12月, 2012 1 次提交
    • T
      NFS: Don't use SetPageError in the NFS writeback code · ada8e20d
      Trond Myklebust 提交于
      The writeback code is already capable of passing errors back to user space
      by means of the open_context->error. In the case of ENOSPC, Neil Brown
      is reporting seeing 2 errors being returned.
      
      Neil writes:
      
      "e.g. if /mnt2/ if an nfs mounted filesystem that has no space then
      
      strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1
      
      reported Input/output error and the relevant parts of the strace output are:
      
      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
      fsync(1)                                = -1 EIO (Input/output error)
      close(1)                                = -1 ENOSPC (No space left on device)"
      
      Neil then shows that the duplication of error messages appears to be due to
      the use of the PageError() mechanism, which causes filemap_fdatawait_range
      to return the extra EIO. The regression was introduced by
      commit 7b281ee0 (NFS: fsync() must exit
      with an error if page writeback failed).
      
      Fix this by removing the call to SetPageError(), and just relying on
      open_context->error reporting the ENOSPC back to fsync().
      Reported-by: NNeil Brown <neilb@suse.de>
      Tested-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org [3.6+]
      ada8e20d
  17. 11 12月, 2012 1 次提交
    • J
      nfs: don't extend writes to cover entire page if pagecache is invalid · 81d9bce5
      Jeff Layton 提交于
      Jian reported that the following sequence would leave "testfile" with
      corrupt data:
      
          # mount localhost:/export /mnt/nfs/ -o vers=3
          # echo abc > /mnt/nfs/testfile; echo def >> /export/testfile; echo ghi >> /mnt/nfs/testfile
          # cat -v /export/testfile
          abc
          ^@^@^@^@ghi
      
      While there's no locking involved here, the operations are serialized,
      so CTO should prevent corruption.
      
      The first write to the file is fine and writes 4 bytes. The file is then
      extended on the server. When it's reopened a GETATTR is issued and the
      size change is noticed. This causes NFS_INO_INVALID_DATA to be set on
      the file. Because the file is opened for write only,
      nfs_want_read_modify_write() returns 0 to nfs_write_begin().
      nfs_updatepage then calls nfs_write_pageuptodate() to see if it should
      extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is
      still set on the file at that point, but that flag is ignored and
      nfs_pageuptodate erroneously extends the write to cover the whole page,
      with the write done on the server side filled in with zeroes.
      
      This patch just has that function check for NFS_INO_INVALID_DATA in
      addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking
      over the code, I wonder if we might have a similar bug in
      nfs_revalidate_size(). The difference between those two flags is very
      subtle, so it seems like we ought to be checking for
      NFS_INO_INVALID_DATA in most of the places that we look for
      NFS_INO_REVAL_PAGECACHE.
      
      I believe this is regression introduced by commit 8d197a56. The code
      did check for NFS_INO_INVALID_DATA prior to that patch.
      
      Original bug report is here:
      
          https://bugzilla.redhat.com/show_bug.cgi?id=885743
      
      Cc: <stable@vger.kernel.org> # 3.5+
      Reported-by: NJian Li <jiali@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      81d9bce5
  18. 26 11月, 2012 1 次提交
  19. 05 11月, 2012 1 次提交
  20. 29 9月, 2012 2 次提交
  21. 03 8月, 2012 1 次提交
  22. 01 8月, 2012 2 次提交