1. 25 3月, 2015 14 次提交
  2. 24 2月, 2015 10 次提交
  3. 23 2月, 2015 16 次提交
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_RETURN · 5fb5aeee
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5fb5aeee
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_GOTO · c29aad41
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      c29aad41
    • D
      xfs: inodes are new until the dentry cache is set up · 58c90473
      Dave Chinner 提交于
      Al Viro noticed a generic set of issues to do with filehandle lookup
      racing with dentry cache setup. They involve a filehandle lookup
      occurring while an inode is being created and the filehandle lookup
      racing with the dentry creation for the real file. This can lead to
      multiple dentries for the one path being instantiated. There are a
      host of other issues around this same set of paths.
      
      The underlying cause is that file handle lookup only waits on inode
      cache instantiation rather than full dentry cache instantiation. XFS
      is mostly immune to the problems discovered due to it's own internal
      inode cache, but there are a couple of corner cases where races can
      happen.
      
      We currently clear the XFS_INEW flag when the inode is fully set up
      after insertion into the cache. Newly allocated inodes are inserted
      locked and so aren't usable until the allocation transaction
      commits. This, however, occurs before the dentry and security
      information is fully initialised and hence the inode is unlocked and
      available for lookups to find too early.
      
      To solve the problem, only clear the XFS_INEW flag for newly created
      inodes once the dentry is fully instantiated. This means lookups
      will retry until the XFS_INEW flag is removed from the inode and
      hence avoids the race conditions in questions.
      
      THis also means that xfs_create(), xfs_create_tmpfile() and
      xfs_symlink() need to finish the setup of the inode in their error
      paths if we had allocated the inode but failed later in the creation
      process. xfs_symlink(), in particular, needed a lot of help to make
      it's error handling match that of xfs_create().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      58c90473
    • D
      xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner 提交于
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5885ebda
    • J
      xfs: Fix quota type in quota structures when reusing quota file · dfcc70a8
      Jan Kara 提交于
      For filesystems without separate project quota inode field in the
      superblock we just reuse project quota file for group quotas (and vice
      versa) if project quota file is allocated and we need group quota file.
      When we reuse the file, quota structures on disk suddenly have wrong
      type stored in d_flags though. Nobody really cares about this (although
      structure type reported to userspace was wrong as well) except
      that after commit 14bf61ff (quota: Switch ->get_dqblk() and
      ->set_dqblk() to use bytes as space units) assertion in
      xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
      was testing without XFS_DEBUG so I didn't notice when submitting the
      above commit).
      
      Fix the problem by properly resetting ddq->d_flags when running quotacheck
      for a quota file.
      
      CC: stable@vger.kernel.org
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dfcc70a8
    • D
      xfs: lock out page faults from extent swap operations · 723cac48
      Dave Chinner 提交于
      Extent swap operations are another extent manipulation operation
      that we need to ensure does not race against mmap page faults. The
      current code returns if the file is mapped prior to the swap being
      done, but it could potentially race against new page faults while
      the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL
      for this operation, too.
      
      While there, fix the error path handling that can result in double
      unlocks of the inodes when cancelling the swapext transaction.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      723cac48
    • D
      xfs: xfs_setattr_size no longer races with page faults · 0f9160b4
      Dave Chinner 提交于
      Now that truncate locks out new page faults, we no longer need to do
      special writeback hacks in truncate to work around potential races
      between page faults, page cache truncation and file size updates to
      ensure we get write page faults for extending truncates on sub-page
      block size filesystems. Hence we can remove the code in
      xfs_setattr_size() that handles this and update the comments around
      the code tha thandles page cache truncate and size updates to
      reflect the new reality.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0f9160b4
    • D
      xfs: take i_mmap_lock on extent manipulation operations · e8e9ad42
      Dave Chinner 提交于
      Now we have the i_mmap_lock being held across the page fault IO
      path, we now add extent manipulation operation exclusion by adding
      the lock to the paths that directly modify extent maps. This
      includes truncate, hole punching and other fallocate based
      operations. The operations will now take both the i_iolock and the
      i_mmaplock in exclusive mode, thereby ensuring that all IO and page
      faults block without holding any page locks while the extent
      manipulation is in progress.
      
      This gives us the lock order during truncate of i_iolock ->
      i_mmaplock -> page_lock -> i_lock, hence providing the same
      lock order as the iolock provides the normal IO path without
      involving the mmap_sem.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e8e9ad42
    • D
      xfs: use i_mmaplock on write faults · 075a924d
      Dave Chinner 提交于
      Take the i_mmaplock over write page faults. These come through the
      ->page_mkwrite callout, so we need to wrap that calls with the
      i_mmaplock.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      
      Also, move the page_mkwrite wrapper to the same region of xfs_file.c
      as the read fault wrappers and add a tracepoint.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      075a924d
    • D
      xfs: use i_mmaplock on read faults · de0e8c20
      Dave Chinner 提交于
      Take the i_mmaplock over read page faults. These come through the
      ->fault callout, so we need to wrap the generic implementation
      with the i_mmaplock. While there, add tracepoints for the read
      fault as it passes through XFS.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      de0e8c20
    • D
      xfs: introduce mmap/truncate lock · 653c60b6
      Dave Chinner 提交于
      Right now we cannot serialise mmap against truncate or hole punch
      sanely. ->page_mkwrite is not able to take locks that the read IO
      path normally takes (i.e. the inode iolock) because that could
      result in lock inversions (read - iolock - page fault - page_mkwrite
      - iolock) and so we cannot use an IO path lock to serialise page
      write faults against truncate operations.
      
      Instead, introduce a new lock that is used *only* in the
      ->page_mkwrite path that is the equivalent of the iolock. The lock
      ordering in a page fault is i_mmaplock -> page lock -> i_ilock,
      and so in truncate we can i_iolock -> i_mmaplock and so lock out
      new write faults during the process of truncation.
      
      Because i_mmap_lock is outside the page lock, we can hold it across
      all the same operations we hold the i_iolock for. The only
      difference is that we never hold the i_mmaplock in the normal IO
      path and so do not ever have the possibility that we can page fault
      inside it. Hence there are no recursion issues on the i_mmap_lock
      and so we can use it to serialise page fault IO against inode
      modification operations that affect the IO path.
      
      This patch introduces the i_mmaplock infrastructure, lockdep
      annotations and initialisation/destruction code. Use of the new lock
      will be in subsequent patches.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      653c60b6
    • D
      xfs: remove xfs_mod_incore_sb API · 964aa8d9
      Dave Chinner 提交于
      Now that there are no users of the bitfield based incore superblock
      modification API, just remove the whole damn lot of it, including
      all the bitfield definitions. This finally removes a lot of cruft
      that has been around for a long time.
      
      Credit goes to Christoph Hellwig for providing a great patch
      connecting all the dots to enale us to do this. This patch is
      derived from that work.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      964aa8d9
    • D
      xfs: replace xfs_mod_incore_sb_batched · 0bd5dded
      Dave Chinner 提交于
      Introduce helper functions for modifying fields in the superblock
      into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch().  We
      can then use these directly in xfs_trans_unreserve_and_mod_sb() and
      so remove another user of the xfs_mode_incore_sb() API without
      losing any functionality or scalability of the transaction commit
      code..
      
      Based on a patch from Christoph Hellwig.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0bd5dded
    • D
      xfs: introduce xfs_mod_frextents · bab98bbe
      Dave Chinner 提交于
      Add a new helper to modify the incore counter of free realtime
      extents. This matches the helpers used for inode and data block
      counters, and removes a significant users of the xfs_mod_incore_sb()
      interface.
      
      Based on a patch originally from Christoph Hellwig.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      bab98bbe
    • D
      xfs: Remove icsb infrastructure · 5681ca40
      Dave Chinner 提交于
      Now that the in-core superblock infrastructure has been replaced with
      generic per-cpu counters, we don't need it anymore. Nuke it from
      orbit so we are sure that it won't haunt us again...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5681ca40
    • D
      xfs: use generic percpu counters for free block counter · 0d485ada
      Dave Chinner 提交于
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. The free block counter is
      special in that it is used for ENOSPC detection outside transaction
      contexts for for delayed allocation. This means that the counter
      needs to be accurate at zero. The current per-cpu counter code jumps
      through lots of hoops to ensure we never run past zero, but we don't
      need to make all those jumps with the generic counter
      implementation.
      
      The generic counter implementation allows us to pass a "batch"
      threshold at which the addition/subtraction to the counter value
      will be folded back into global value under lock. We can use this
      feature to reduce the batch size as we approach 0 in a very similar
      manner to the existing counters and their rebalance algorithm. If we
      use a batch size of 1 as we approach 0, then every addition and
      subtraction will be done against the global value and hence allow
      accurate detection of zero threshold crossing.
      
      Hence we can replace the handrolled, accurate-at-zero counters with
      generic percpu counters.
      
      Note: this removes just enough of the icsb infrastructure to compile
      without warnings. The rest will go in subsequent commits.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0d485ada