1. 01 2月, 2019 1 次提交
    • T
      Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" · 8fdd60f2
      Theodore Ts'o 提交于
      This reverts commit ad211f3e.
      
      As Jan Kara pointed out, this change was unsafe since it means we lose
      the call to sync_mapping_buffers() in the nojournal case.  The
      original point of the commit was avoid taking the inode mutex (since
      it causes a lockdep warning in generic/113); but we need the mutex in
      order to call sync_mapping_buffers().
      
      The real fix to this problem was discussed here:
      
      https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
      
      The proposed patch was to fix a syzbot complaint, but the problem can
      also demonstrated via "kvm-xfstests -c nojournal generic/113".
      Multiple solutions were discused in the e-mail thread, but none have
      landed in the kernel as of this writing.  Anyway, commit
      ad211f3e is absolutely the wrong way to suppress the lockdep, so
      revert it.
      
      Fixes: ad211f3e ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported: Jan Kara <jack@suse.cz>
      8fdd60f2
  2. 31 12月, 2018 2 次提交
  3. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  4. 17 7月, 2017 1 次提交
    • D
      VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      David Howells 提交于
      Firstly by applying the following with coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove left over excess bracketage and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bc98a42c
  5. 06 7月, 2017 1 次提交
  6. 05 2月, 2017 1 次提交
  7. 06 9月, 2016 1 次提交
  8. 27 6月, 2016 1 次提交
  9. 16 4月, 2015 1 次提交
  10. 03 4月, 2015 1 次提交
  11. 13 6月, 2013 1 次提交
  12. 05 6月, 2013 3 次提交
  13. 04 4月, 2013 1 次提交
    • T
      ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · d76a3a77
      Theodore Ts'o 提交于
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, that the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tid's.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NBen Hutchings <ben@decadent.org.uk>
      Reported-by: NGeorge Barnett <gbarnett@atlassian.com>
      Cc: stable@vger.kernel.org
      
      d76a3a77
  14. 26 12月, 2012 1 次提交
  15. 11 12月, 2012 1 次提交
  16. 05 10月, 2012 1 次提交
    • D
      ext4: fix ext4_flush_completed_IO wait semantics · c278531d
      Dmitry Monakhov 提交于
      BUG #1) All places where we call ext4_flush_completed_IO are broken
          because buffered io and DIO/AIO goes through three stages
          1) submitted io,
          2) completed io (in i_completed_io_list) conversion pended
          3) finished  io (conversion done)
          And by calling ext4_flush_completed_IO we will flush only
          requests which were in (2) stage, which is wrong because:
           1) punch_hole and truncate _must_ wait for all outstanding unwritten io
            regardless to it's state.
           2) fsync and nolock_dio_read should also wait because there is
              a time window between end_page_writeback() and ext4_add_complete_io()
              As result integrity fsync is broken in case of buffered write
              to fallocated region:
              fsync                                      blkdev_completion
      	 ->filemap_write_and_wait_range
                                                         ->ext4_end_bio
                                                           ->end_page_writeback
                <-- filemap_write_and_wait_range return
      	 ->ext4_flush_completed_IO
         	 sees empty i_completed_io_list but pended
         	 conversion still exist
                                                           ->ext4_add_complete_io
      
      BUG #2) Race window becomes wider due to the 'ext4: completed_io
      locking cleanup V4' patch series
      
      This patch make following changes:
      1) ext4_flush_completed_io() now first try to flush completed io and when
         wait for any outstanding unwritten io via ext4_unwritten_wait()
      2) Rename function to more appropriate name.
      3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
         prevent endless wait
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      c278531d
  17. 29 9月, 2012 1 次提交
    • D
      ext4: completed_io locking cleanup · 28a535f9
      Dmitry Monakhov 提交于
      Current unwritten extent conversion state-machine is very fuzzy.
      - For unknown reason it performs conversion under i_mutex. What for?
        My diagnosis:
        We already protect extent tree with i_data_sem, truncate and punch_hole
        should wait for DIO, so the only data we have to protect is end_io->flags
        modification, but only flush_completed_IO and end_io_work modified this
        flags and we can serialize them via i_completed_io_lock.
      
        Currently all these games with mutex_trylock result in the following deadlock
         truncate:                          kworker:
          ext4_setattr                       ext4_end_io_work
          mutex_lock(i_mutex)
          inode_dio_wait(inode)  ->BLOCK
                                   DEADLOCK<- mutex_trylock()
                                              inode_dio_done()
        #TEST_CASE1_BEGIN
        MNT=/mnt_scrach
        unlink $MNT/file
        fallocate -l $((1024*1024*1024)) $MNT/file
        aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
        sleep 2
        truncate -s 0 $MNT/file
        #TEST_CASE1_END
      
      Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286
      
      This patch makes state machine simple and clean:
      
      (1) xxx_end_io schedule final extent conversion simply by calling
          ext4_add_complete_io(), which append it to ei->i_completed_io_list
          NOTE1: because of (2A) work should be queued only if
          ->i_completed_io_list was empty, otherwise the work is scheduled already.
      
      (2) ext4_flush_completed_IO is responsible for handling all pending
          end_io from ei->i_completed_io_list
          Flushing sequence consists of following stages:
          A) LOCKED: Atomically drain completed_io_list to local_list
          B) Perform extents conversion
          C) LOCKED: move converted io's to to_free list for final deletion
             	     This logic depends on context which we was called from.
          D) Final end_io context destruction
          NOTE1: i_mutex is no longer required because end_io->flags modification
          is protected by ei->ext4_complete_io_lock
      
      Full list of changes:
      - Move all completion end_io related routines to page-io.c in order to improve
        logic locality
      - Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
      - remove EXT4_IO_END_FSYNC
      - Improve SMP scalability by removing useless i_mutex which does not
        protect io->flags anymore.
      - Reduce lock contention on i_completed_io_lock by optimizing list walk.
      - Rename ext4_end_io_nolock to end4_end_io and make it static
      - Check flush completion status to ext4_ext_punch_hole(). Because it is
        not good idea to punch blocks from corrupted inode.
      
      Changes since V3 (in request to Jan's comments):
        Fall back to active flush_completed_IO() approach in order to prevent
        performance issues with nolocked DIO reads.
      Changes since V2:
        Fix use-after-free caused by race truncate vs end_io_work
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      28a535f9
  18. 17 8月, 2012 1 次提交
  19. 14 7月, 2012 2 次提交
  20. 05 3月, 2012 1 次提交
    • J
      ext4: fix race between sync and completed io work · 491caa43
      Jeff Moyer 提交于
      The following command line will leave the aio-stress process unkillable
      on an ext4 file system (in my case, mounted on /mnt/test):
      
      aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
      
      This is using the aio-stress program from the xfstests test suite.
      That particular command line tells aio-stress to do random writes to
      20 files from 20 threads (one thread per file).  The files are NOT
      preallocated, so you will get writes to random offsets within the
      file, thus creating holes and extending i_size.  It also opens the
      file with O_DIRECT and O_SYNC.
      
      On to the problem.  When an I/O requires unwritten extent conversion,
      it is queued onto the completed_io_list for the ext4 inode.  Two code
      paths will pull work items from this list.  The first is the
      ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
      which is called via the fsync path (and O_SYNC handling, as well).
      There are two issues I've found in these code paths.  First, if the
      fsync path beats the work routine to a particular I/O, the work
      routine will free the io_end structure!  It does not take into account
      the fact that the io_end may still be in use by the fsync path.  I've
      fixed this issue by adding yet another IO_END flag, indicating that
      the io_end is being processed by the fsync path.
      
      The second problem is that the work routine will make an assignment to
      io->flag outside of the lock.  I have witnessed this result in a hang
      at umount.  Moving the flag setting inside the lock resolved that
      problem.
      
      The problem was introduced by commit b82e384c ("ext4: optimize
      locking for end_io extent conversion"), which first appeared in 3.2.
      As such, the fix should be backported to that release (probably along
      with the unwritten extent conversion race fix).
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      CC: stable@kernel.org
      491caa43
  21. 31 10月, 2011 2 次提交
    • T
      ext4: optimize locking for end_io extent conversion · b82e384c
      Theodore Ts'o 提交于
      Now that we are doing the locking correctly, we need to grab the
      i_completed_io_lock() twice per end_io.  We can clean this up by
      removing the structure from the i_complted_io_list, and use this as
      the locking mechanism to prevent ext4_flush_completed_IO() racing
      against ext4_end_io_work(), instead of clearing the
      EXT4_IO_END_UNWRITTEN in io->flag.
      
      In addition, if the ext4_convert_unwritten_extents() returns an error,
      we no longer keep the end_io structure on the linked list.  This
      doesn't help, because it tends to lock up the file system and wedges
      the system.  That's one way to call attention to the problem, but it
      doesn't help the overall robustness of the system.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b82e384c
    • T
      ext4: Use correct locking for ext4_end_io_nolock() · d73d5046
      Tao Ma 提交于
      We must hold i_completed_io_lock when manipulating anything on the
      i_completed_io_list linked list.  This includes io->lock, which we
      were checking in ext4_end_io_nolock().
      
      So move this check to ext4_end_io_work().  This also has the bonus of
      avoiding extra work if it is already done without needing to take the
      mutex.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d73d5046
  22. 18 10月, 2011 1 次提交
  23. 31 7月, 2011 1 次提交
    • T
      ext4: fix races in ext4_sync_parent() · d59729f4
      Theodore Ts'o 提交于
      Fix problems if fsync() races against a rename of a parent directory
      as pointed out by Al Viro in his own inimitable way:
      
      >While we are at it, could somebody please explain what the hell is ext4
      >doing in
      >static int ext4_sync_parent(struct inode *inode)
      >{
      >        struct writeback_control wbc;
      >        struct dentry *dentry = NULL;
      >        int ret = 0;
      >
      >        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
      >                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
      >                dentry = list_entry(inode->i_dentry.next,
      >                                    struct dentry, d_alias);
      >                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
      >                        break;
      >                inode = dentry->d_parent->d_inode;
      >                ret = sync_mapping_buffers(inode->i_mapping);
      >                ...
      >Note that dentry obviously can't be NULL there.  dentry->d_parent is never
      >NULL.  And dentry->d_parent would better not be negative, for crying out
      >loud!  What's worse, there's no guarantees that dentry->d_parent will
      >remain our parent over that sync_mapping_buffers() *and* that inode won't
      >just be freed under us (after rename() and memory pressure leading to
      >eviction of what used to be our dentry->d_parent)......
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d59729f4
  24. 21 7月, 2011 1 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  25. 25 5月, 2011 1 次提交
    • J
      ext4: fix waiting and sending of a barrier in ext4_sync_file() · 93628ffb
      Jan Kara 提交于
      jbd2_log_start_commit() returns 1 only when we really start a
      transaction.  But we also need to wait for a transaction when the
      commit is already running.  Fix this problem by waiting for
      transaction commit unconditionally (which is just a quick check if the
      transaction is already committed).
      
      Also we have to be more careful with sending of a barrier because when
      transaction is being committed in parallel to ext4_sync_file()
      running, we cannot be sure that the barrier the journalling code sends
      happens after we wrote all the data for fsync (note that not every
      data writeout needs to trigger metadata changes thus commit of some
      metadata changes can be running while other data is still written
      out). So use jbd2_will_send_data_barrier() helper to detect the common
      cases when we can be sure barrier will be issued by the commit code
      and issue the barrier ourselves in the remaining cases.
      Reported-by: NEdward Goggin <egoggin@vmware.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      93628ffb
  26. 09 5月, 2011 1 次提交
    • T
      ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.c · e8bbe8c4
      Tao Ma 提交于
      We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG
      for the new mballoc debug, but there isn't any EXT4_DEBUG.
      
      As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use
      EXT4FS_DEBUG in fsync.c.
      
      [ It doesn't really matter; although I'm including this commit for
        consistency's sake.  The whole point of the #ifdef's is to disable
        the debugging code.  In general you're not going to want to enable
        all of the code protected by EXT4FS_DEBUG at the same time.  -- Ted ]
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e8bbe8c4
  27. 11 4月, 2011 1 次提交
    • C
      ext4: sync the directory inode in ext4_sync_parent() · 0893ed45
      Curt Wohlgemuth 提交于
      ext4 has taken the stance that, in the absence of a journal,
      when an fsync/fdatasync of an inode is done, the parent
      directory should be sync'ed if this inode entry is new.
      ext4_sync_parent(), which implements this, does indeed sync
      the dirent pages for parent directories, but it does not
      sync the directory *inode*.  This patch fixes this.
      
      Also now return error status from ext4_sync_parent().
      
      I tested this using a power fail test, which panics a
      machine running a file server getting requests from a
      client.  Without this patch, on about every other test run,
      the server is missing many, many files that had been synced.
      With this patch, on > 6 runs, I see zero files being lost.
      
      Google-Bug-Id: 4179519
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0893ed45
  28. 31 3月, 2011 1 次提交
  29. 22 3月, 2011 1 次提交
  30. 11 1月, 2011 1 次提交
    • J
      ext4: flush the i_completed_io_list during ext4_truncate · 3889fd57
      Jiaying Zhang 提交于
      Ted first found the bug when running 2.6.36 kernel with dioread_nolock
      mount option that xfstests #13 complained about wrong file size during fsck.
      However, the bug exists in the older kernels as well although it is
      somehow harder to trigger.
      
      The problem is that ext4_end_io_work() can happen after we have truncated an
      inode to a smaller size. Then when ext4_end_io_work() calls 
      ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
      been truncated, so the inode size becomes inconsistent with the allocated
      blocks. 
      
      The following patch flushes the i_completed_io_list during truncate to reduce 
      the risk that some pending end_io requests are executed later and convert 
      already truncated blocks to initialized. 
      
      Note that although the fix helps reduce the problem a lot there may still 
      be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
      problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
      held, it can race with an ongoing write request so that the io_end request is
      processed later when the corresponding blocks have been truncated.
      
      Ted and I have discussed the problem offline and we saw a few ways to fix
      the race completely:
      
      a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
      whenever vmtruncate() is called. The i_mutex lock prevents any new write
      requests from entering writeback and the i_alloc_sem prevents the race
      from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
      is called from do_truncate(), which is probably the most common case.
      However, there are places where we may call vmtruncate() without holding
      either i_mutex or i_alloc_sem. I would like to ask for other people's
      opinions on what locks are expected to be held before calling vmtruncate().
      There seems a disagreement among the callers of that function.
      
      b) We change the ext4 write path so that we change the extent tree to contain 
      the newly allocated blocks and update i_size both at the same time --- when 
      the write of the data blocks is completed.
      
      c) We add some additional locking to synchronize vmtruncate() and 
      ext4_end_io_work(). This approach may have performance implications so we
      need to be careful.
      
      All of the above proposals may require more substantial changes, so
      we may consider to take the following patch as a bandaid.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3889fd57
  31. 28 10月, 2010 1 次提交
  32. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  33. 28 5月, 2010 2 次提交
  34. 17 5月, 2010 1 次提交