1. 02 10月, 2014 3 次提交
  2. 23 9月, 2014 7 次提交
    • D
      xfs: flush entire last page of old EOF on truncate up · 2ebff7bb
      Dave Chinner 提交于
      On a sub-page sized filesystem, truncating a mapped region down
      leaves us in a world of hurt. We truncate the pagecache, zeroing the
      newly unused tail, then punch blocks out from under the page. If we
      then truncate the file back up immediately, we expose that unmapped
      hole to a dirty page mapped into the user application, and that's
      where it all goes wrong.
      
      In truncating the page cache, we avoid unmapping the tail page of
      the cache because it still contains valid data. The problem is that
      it also contains a hole after the truncate, but nobody told the mm
      subsystem that. Therefore, if the page is dirty before the truncate,
      we'll never get a .page_mkwrite callout after we extend the file and
      the application writes data into the hole on the page.  Hence when
      we come to writing that region of the page, it has no blocks and no
      delayed allocation reservation and hence we toss the data away.
      
      This patch adds code to the truncate up case to solve it, by
      ensuring the partial page at the old EOF is always cleaned after we
      do any zeroing and move the EOF upwards. We can't actually serialise
      the page writeback and truncate against page faults (yes, that
      problem AGAIN) so this is really just a best effort and assumes it
      is extremely unlikely that someone is concurrently writing to the
      page at the EOF while extending the file.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2ebff7bb
    • D
      xfs: xfs_swap_extent_flush can be static · 7abbb8f9
      Dave Chinner 提交于
      Fix sparse warning introduced by commit 4ef897a2 ("xfs: flush both
      inodes in xfs_swap_extents").
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      7abbb8f9
    • D
      xfs: xfs_buf_write_fail_rl_state can be static · 02cc1876
      Dave Chinner 提交于
      Fix sparse warning introduced by commit ac8809f9 ("xfs: abort
      metadata writeback on permanent errors").
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      02cc1876
    • F
      xfs: xfs_rtget_summary can be static · ea95961d
      Fengguang Wu 提交于
      Fix sparse warning introduced by commit afabfd30 ("xfs: combine
      xfs_rtmodify_summary and xfs_rtget_summary").
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ea95961d
    • F
      xfs: remove second xfs_quota.h inclusion in xfs_icache.c · e3cf1796
      Fabian Frederick 提交于
      xfs_quota.h was included twice.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      e3cf1796
    • E
      xfs: don't ASSERT on corrupt ftype · fb040131
      Eric Sandeen 提交于
      xfs_dir3_data_get_ftype() gets the file type off disk, but ASSERTs
      if it's invalid:
      
           ASSERT(type < XFS_DIR3_FT_MAX);
      
      We shouldn't ASSERT on bad values read from disk.  V3 dirs are
      CRC-protected, but V2 dirs + ftype are not.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      fb040131
    • D
      xfs: xlog_cil_force_lsn doesn't always wait correctly · 8af3dcd3
      Dave Chinner 提交于
      When running a tight mount/unmount loop on an older kernel, RedHat
      QE found that unmount would occasionally hang in
      xfs_buf_unpin_wait() on the superblock buffer. Tracing and other
      debug work by Eric Sandeen indicated that it was hanging on the
      writing of the superblock during unmount immediately after logging
      the superblock counters in a synchronous transaction. Further debug
      indicated that the synchronous transaction was not waiting for
      completion correctly, and we narrowed it down to
      xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing
      the transaction in the iclog buffer to disk correctly.
      
      While this unmount superblock write code is now very different in
      mainline kernels, the xlog_cil_force_lsn() code is identical, and it
      was bisected to the backport of commit f876e446 ("xfs: always do log
      forces via the workqueue"). This commit made the CIL push
      asynchronous for log forces and hence exposed a race condition that
      couldn't occur on a synchronous push.
      
      Essentially, the xlog_cil_force_lsn() relied implicitly on the fact
      that the sequence push would be complete by the time
      xlog_cil_push_now() returned, resulting in the context being pushed
      being in the committing list. When it was made asynchronous, it was
      recognised that there was a race condition in detecting whether an
      asynchronous push has started or not and code was added to handle
      it.
      
      Unfortunately, the fix was not quite right and left a race condition
      where it it would detect an empty CIL while a push was in progress
      before the context had been added to the committing list. This was
      incorrectly seen as a "nothing to do" condition and so would tell
      xfs_log_force_lsn() that there is nothing to wait for, and hence it
      would push the iclogbufs in memory.
      
      The fix is simple, but explaining the logic and the race condition
      is a lot more complex. The fix is to add the context to the
      committing list before we start emptying the CIL. This allows us to
      detect the difference between an empty "do nothing" push and a push
      that has not started by adding a discrete "emptying the CIL" state
      to avoid the transient, incorrect "empty" condition that the
      (unchanged) waiting code was seeing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8af3dcd3
  3. 09 9月, 2014 11 次提交
    • E
      xfs: remove rbpp check from xfs_rtmodify_summary_int · ab6978c2
      Eric Sandeen 提交于
      rbpp is always passed into xfs_rtmodify_summary
      and xfs_rtget_summary, so there is no need to
      test for it in xfs_rtmodify_summary_int.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ab6978c2
    • E
      xfs: combine xfs_rtmodify_summary and xfs_rtget_summary · afabfd30
      Eric Sandeen 提交于
      xfs_rtmodify_summary and xfs_rtget_summary are almost identical;
      fold them into xfs_rtmodify_summary_int(), with wrappers for each of
      the original calls.
      
      The _int function modifies if a delta is passed, and returns a
      summary pointer if *sum is passed.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      afabfd30
    • E
      xfs: combine xfs_dir_canenter into xfs_dir_createname · b16ed7c1
      Eric Sandeen 提交于
      xfs_dir_canenter and xfs_dir_createname are
      almost identical.
      
      Fold the former into the latter, with a helpful
      wrapper for the former.  If createname is called without
      an inode number, it now only checks for space, and does
      not actually add the entry.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b16ed7c1
    • E
      xfs: check resblks before calling xfs_dir_canenter · 94f3cad5
      Eric Sandeen 提交于
      Move the resblks test out of the xfs_dir_canenter,
      and into the caller.
      
      This makes a little more sense on the face of it;
      xfs_dir_canenter immediately returns if resblks !=0;
      and given some of the comments preceding the calls:
      
       * Check for ability to enter directory entry, if no space reserved.
      
      even more so.
      
      It also facilitates the next patch.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      94f3cad5
    • E
      xfs: deduplicate xlog_do_recovery_pass() · 970fd3f0
      Eric Sandeen 提交于
      In xlog_do_recovery_pass(), there are 2 distinct cases:
      non-wrapped and wrapped log recovery.
      
      If we find a wrapped log, we recover around the end
      of the log, and then handle the rest of recovery
      exactly as in the non-wrapped case - using exactly the same
      (duplicated) code.
      
      Rather than having the same code in both cases, we can
      get the wrapped portion out of the way first if needed,
      and then recover the non-wrapped portion of the log.
      
      There should be no functional change here, just code
      reorganization & deduplication.
      
      The patch looks a bit bigger than it really is; the last
      hunk is whitespace changes (un-indenting).
      
      Tested with xfstests "check -g log" on a stock configuration.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      970fd3f0
    • E
      xfs: lseek: the "whence" argument is called "whence" · 59f9c004
      Eric Sandeen 提交于
      For some reason, the older commit:
      
          965c8e59 lseek: the "whence" argument is called "whence"
      
          lseek: the "whence" argument is called "whence"
      
          But the kernel decided to call it "origin" instead.
          Fix most of the sites.
      
      left out xfs.  So fix xfs.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      59f9c004
    • E
      xfs: combine xfs_seek_hole & xfs_seek_data · 49c69591
      Eric Sandeen 提交于
      xfs_seek_hole & xfs_seek_data are remarkably similar;
      so much so that they can be combined, saving a fair
      bit of semi-complex code duplication.
      
      The following patch passes generic/285 and generic/286,
      which specifically test seek behavior.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      49c69591
    • B
      xfs: export log_recovery_delay to delay mount time log recovery · 2e227178
      Brian Foster 提交于
      XFS log recovery has been discovered to have race conditions with
      buffers when I/O errors occur. External tools are available to simulate
      I/O errors to XFS, but this alone is not sufficient for testing log
      recovery. XFS unconditionally resets the inactive region of the log
      prior to log recovery to avoid confusion over processing any partially
      written log records that might have been written before an unclean
      shutdown. Therefore, unconditional write I/O failures at mount time are
      caught by the reset sequence rather than log recovery and hinder the
      ability to test the latter.
      
      The device-mapper dm-flakey module uses an up/down timer to define a
      cycle for when to fail I/Os. Create a pre log recovery delay tunable
      that can be used to coordinate XFS log recovery with I/O errors
      simulated by dm-flakey. This facilitates coordination in userspace that
      allows the reset of stale log blocks to succeed and writes due to log
      recovery to fail. For example, define a dm-flakey instance with an
      uptime long enough to allow log reset to succeed and a log recovery
      delay long enough to allow the dm-flakey uptime to expire.
      
      The 'log_recovery_delay' sysfs tunable is exported under
      /sys/fs/xfs/debug and is only enabled for kernels compiled in XFS debug
      mode. The value is exported in units of seconds and allows for a delay
      of up to 60 seconds. Note that this is for XFS debug and test
      instrumentation purposes only and should not be used by applications. No
      delay is enabled by default.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2e227178
    • B
      xfs: add debug sysfs attribute set · 65b65735
      Brian Foster 提交于
      Create a top-level debug directory for global debug sysfs attributes.
      This directory is added and removed on XFS module initialization and
      removal respectively for DEBUG mode kernels only. It typically resides
      at /sys/fs/xfs/debug. It is located at the top level of the xfs sysfs
      hierarchy as attributes might define global behavior or behavior that
      must be configured before an xfs mount is available (e.g., log recovery
      behavior).
      
      Define the global debug kobject that represents the debug sysfs
      directory and add generic attribute show/store helpers to support future
      attributes. No debug attributes are exported as of yet.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      65b65735
    • E
      xfs: add a few more verifier tests · e1b05723
      Eric Sandeen 提交于
      These were exposed by fsfuzzer runs; without them we fail
      in various exciting and sometimes convoluted ways when we
      encounter disk corruption.
      
      Without the MAXLEVELS tests we tend to walk off the end of
      an array in a loop like this:
      
              for (i = 0; i < cur->bc_nlevels; i++) {
                      if (cur->bc_bufs[i])
      
      Without the dirblklog test we try to allocate more memory
      than we could possibly hope for and loop forever:
      
      xfs_dabuf_map()
      	nfsb = mp->m_dir_geo->fsbcount;
      	irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_SLEEP...
      
      As for the logbsize check, that's the convoluted one.
      
      If logbsize is specified at mount time, it's sanitized
      in xfs_parseargs; in particular it makes sure that it's
      not > XLOG_MAX_RECORD_BSIZE.
      
      If not specified at mount time, it comes from the superblock
      via sb_logsunit; this is limited to 256k at mkfs time as well;
      it's copied into m_logbsize in xfs_finish_flags().
      
      However, if for some reason the on-disk value is corrupt and
      too large, nothing catches it.  It's a circuitous path, but
      that size eventually finds its way to places that make the kernel
      very unhappy, leading to oopses in xlog_pack_data() because we
      use the size as an index into iclog->ic_data, but the array
      is not necessarily that big.
      
      Anyway - bounds checking when we read from disk is a good thing!
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e1b05723
    • B
      xfs: mark all internal workqueues as freezable · 8018ec08
      Brian Foster 提交于
      Workqueues must be explicitly set as freezable to ensure they are frozen
      in the assocated part of the hibernation/suspend sequence. Freezing of
      workqueues and kernel threads is important to ensure that modifications
      are not made on-disk after the hibernation image has been created.
      Otherwise, the in-memory state can become inconsistent with what is on
      disk and eventually lead to filesystem corruption. We have reports of
      free space btree corruptions that occur immediately after restore from
      hibernate that suggest the xfs-eofblocks workqueue could be causing
      such problems if it races with hibernation.
      
      Mark all of the internal XFS workqueues as freezable to ensure nothing
      changes on-disk once the freezer infrastructure freezes kernel threads
      and creates the hibernation image.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reported-by: NCarlos E. R. <carlos.e.r@opensuse.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8018ec08
  4. 25 8月, 2014 1 次提交
    • B
      aio: fix reqs_available handling · d856f32a
      Benjamin LaHaise 提交于
      As reported by Dan Aloni, commit f8567a38 ("aio: fix aio request
      leak when events are reaped by userspace") introduces a regression when
      user code attempts to perform io_submit() with more events than are
      available in the ring buffer.  Reverting that commit would reintroduce a
      regression when user space event reaping is used.
      
      Fixing this bug is a bit more involved than the previous attempts to fix
      this regression.  Since we do not have a single point at which we can
      count events as being reaped by user space and io_getevents(), we have
      to track event completion by looking at the number of events left in the
      event ring.  So long as there are as many events in the ring buffer as
      there have been completion events generate, we cannot call
      put_reqs_available().  The code to check for this is now placed in
      refill_reqs_available().
      
      A test program from Dan and modified by me for verifying this bug is available
      at http://www.kvack.org/~bcrl/20140824-aio_bug.c .
      Reported-by: NDan Aloni <dan@kernelim.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      Acked-by: NDan Aloni <dan@kernelim.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: stable@vger.kernel.org      # v3.16 and anything that f8567a38 was backported to
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d856f32a
  5. 23 8月, 2014 8 次提交
  6. 20 8月, 2014 3 次提交
  7. 18 8月, 2014 3 次提交
  8. 17 8月, 2014 3 次提交
  9. 16 8月, 2014 1 次提交
    • S
      [CIFS] Workaround MacOS server problem with SMB2.1 write · 754789a1
      Steve French 提交于
       response
      
      Writes fail to Mac servers with SMB2.1 mounts (works with cifs though) due
      to them sending an incorrect RFC1001 length for the SMB2.1 Write response.
      Workaround this problem. MacOS server sends a write response with 3 bytes
      of pad beyond the end of the SMB itself.  The RFC1001 length is 3 bytes
      more than the sum of the SMB2.1 header length + the write reponse.
      
      Incorporate feedback from Jeff and JRA to allow servers to send
      a tcp frame that is even more than three bytes too long
      (ie much longer than the SMB2/SMB3 request that it contains) but
      we do log it once now. In the earlier version of the patch I had
      limited how far off the length field could be before we fail the request.
      Signed-off-by: NSteve French <smfrench@gmail.com>
      754789a1