1. 29 8月, 2013 1 次提交
  2. 05 6月, 2013 2 次提交
  3. 10 4月, 2013 1 次提交
  4. 04 4月, 2013 1 次提交
    • D
      ext4: fix journal callback list traversal · 5d3ee208
      Dmitry Monakhov 提交于
      It is incorrect to use list_for_each_entry_safe() for journal callback
      traversial because ->next may be removed by other task:
      ->ext4_mb_free_metadata()
        ->ext4_mb_free_metadata()
          ->ext4_journal_callback_del()
      
      This results in the following issue:
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      This patch fix the issue as follows:
      - ext4_journal_commit_callback() make list truly traversial safe
        simply by always starting from list_head
      - fix race between two ext4_journal_callback_del() and
        ext4_journal_callback_try_del()
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.com
      5d3ee208
  5. 10 2月, 2013 2 次提交
    • T
      ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd
      Theodore Ts'o 提交于
      Operations which modify extended attributes may need extra journal
      credits if inline data is used, since there is a chance that some
      extended attributes may need to get pushed to an external attribute
      block.
      
      Changes to reflect this was made in xattr.c, but they were missed in
      fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
      credits needed for xattr operations to an inline function defined in
      ext4_jbd2.h, and use it in acl.c and xattr.c.
      
      Also move the function declarations used in inline.c from xattr.h
      (where they are non-obviously hidden, and caused problems since
      ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
      them to ext4.h.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NTao Ma <boyu.mt@taobao.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      95eaefbd
    • T
      ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf
      Theodore Ts'o 提交于
      The ext4_unlink() and ext4_rmdir() don't actually release the blocks
      associated with the file/directory.  This gets done in a separate jbd2
      handle called via ext4_evict_inode().  Thus, we don't need to reserve
      lots of journal credits for the truncate.
      
      Note that using too many journal credits is non-optimal because it can
      leading to the journal transmit getting closed too early, before it is
      strictly necessary.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      64044abf
  6. 09 2月, 2013 1 次提交
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o 提交于
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  7. 09 11月, 2012 1 次提交
  8. 23 7月, 2012 2 次提交
    • A
      ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() · b50924c2
      Artem Bityutskiy 提交于
      The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
      anymore and we can kill it.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      b50924c2
    • A
      ext4: make quota as first class supported feature · 7c319d32
      Aditya Kali 提交于
      This patch adds support for quotas as a first class feature in ext4;
      which is to say, the quota files are stored in hidden inodes as file
      system metadata, instead of as separate files visible in the file system
      directory hierarchy.
      
      It is based on the proposal at:                                                                                                           
      https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4
      
      This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA
      which, when turned on, enables quota accounting at mount time
      iteself. Also, the quota inodes are stored in two additional superblock
      fields.  Some changes introduced by this patch that should be pointed
      out are:
      
      1) Two new ext4-superblock fields - s_usr_quota_inum and
         s_grp_quota_inum for storing the quota inodes in use.
      2) Default quota inodes are: inode#3 for tracking userquota and inode#4
         for tracking group quota. The superblock fields can be set to use
         other inodes as well.
      3) If the QUOTA feature and corresponding quota inodes are set in
         superblock, the quota usage tracking is turned on at mount time. On
         'quotaon' ioctl, the quota limits enforcement is turned
         on. 'quotaoff' ioctl turns off only the limits enforcement in this
         case.
      4) When QUOTA feature is in use, the quota mount options 'quota',
         'usrquota', 'grpquota' are ignored by the kernel.
      5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize
         quota inodes. The default reserved inodes will not be visible to user
         as regular files.
      6) The quota-tools will need to be modified to support hidden quota
         files on ext4. E2fsprogs will also include support for creating and
         fixing quota files.
      7) Support is only for the new V2 quota file format.
      Tested-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJohann Lombardi <johann@whamcloud.com>
      Signed-off-by: NAditya Kali <adityakali@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7c319d32
  9. 30 4月, 2012 1 次提交
  10. 21 2月, 2012 2 次提交
    • B
      ext4: expand commit callback and · 18aadd47
      Bobi Jam 提交于
      The per-commit callback was used by mballoc code to manage free space
      bitmaps after deleted blocks have been released.  This patch expands
      it to support multiple different callbacks, to allow other things to
      be done after the commit has been completed.
      Signed-off-by: NBobi Jam <bobijam@whamcloud.com>
      Signed-off-by: NAndreas Dilger <adilger@whamcloud.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      18aadd47
    • L
      ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc · 3d2b1582
      Lukas Czerner 提交于
      Ext4 does not support data journalling with delayed allocation enabled.
      We even do not allow to mount the file system with delayed allocation
      and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
      so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
      system mounted with delayed allocation (default) and that's where
      problem arises. The easies way to reproduce this problem is with the
      following set of commands:
      
       mkfs.ext4 /dev/sdd
       mount /dev/sdd /mnt/test1
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
       chattr +j /mnt/test1/file
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
       chattr -j /mnt/test1/file
      
      Additionally it can be reproduced quite reliably with xfstests 272 and
      269. In fact the above reproducer is a part of test 272.
      
      To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
      the file system is mounted with delayed allocation. This can be easily
      done by fixing ext4_should_*_data() functions do ignore data journal
      flag when delalloc is set (suggested by Ted). We also have to set the
      appropriate address space operations for the inode (again, ignoring data
      journal flag if delalloc enabled).
      
      Additionally this commit introduces ext4_inode_journal_mode() function
      because ext4_should_*_data() has already had a lot of common code and
      this change is putting it all into one function so it is easier to
      read.
      
      Successfully tested with xfstests in following configurations:
      
      delalloc + data=ordered
      delalloc + data=writeback
      data=journal
      nodelalloc + data=ordered
      nodelalloc + data=writeback
      nodelalloc + data=journal
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      3d2b1582
  11. 13 8月, 2011 1 次提交
    • C
      ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508
      Curt Wohlgemuth 提交于
      ext4_should_writeback_data() had an incorrect sequence of
      tests to determine if it should return 0 or 1: in
      particular, even in no-journal mode, 0 was being returned
      for a non-regular-file inode.
      
      This meant that, in non-journal mode, we would use
      ext4_journalled_aops for directories, symlinks, and other
      non-regular files.  However, calling journalled aop
      callbacks when there is no valid handle, can cause problems.
      
      This would cause a kernel crash with Jan Kara's commit
      2d859db3 ("ext4: fix data corruption in inodes with
      journalled data"), because we now dereference 'handle' in
      ext4_journalled_write_end().
      
      I also added BUG_ONs to check for a valid handle in the
      obviously journal-only aops callbacks.
      
      I tested this running xfstests with a scratch device in
      these modes:
      
         - no-journal
         - data=ordered
         - data=writeback
         - data=journal
      
      All work fine; the data=journal run has many failures and a
      crash in xfstests 074, but this is no different from a
      vanilla kernel.
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      441c8508
  12. 09 5月, 2011 1 次提交
    • T
      ext4: remove unneeded ext4_journal_get_undo_access · 2cd05cc3
      Theodore Ts'o 提交于
      The block allocation code used to use jbd2_journal_get_undo_access as
      a way to make changes that wouldn't show up until the commit took
      place.  The new multi-block allocation code has a its own way of
      preventing newly freed blocks from getting reused until the commit
      takes place (it avoids updating the buddy bitmaps until the commit is
      done), so we don't need to use jbd2_journal_get_undo_access(), which
      has extra overhead compared to jbd2_journal_get_write_access().
      
      There was one last vestigal use of ext4_journal_get_undo_access() in
      ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
      and then remove the ext4_journal_get_undo_access() support.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2cd05cc3
  13. 05 4月, 2011 1 次提交
  14. 21 3月, 2011 1 次提交
  15. 11 1月, 2011 1 次提交
  16. 27 7月, 2010 1 次提交
  17. 30 6月, 2010 1 次提交
  18. 15 6月, 2010 1 次提交
  19. 12 6月, 2010 1 次提交
    • T
      ext4: Clean up s_dirt handling · a0375156
      Theodore Ts'o 提交于
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a0375156
  20. 17 5月, 2010 1 次提交
  21. 05 3月, 2010 1 次提交
    • J
      ext4: use ext4_get_block_write in buffer write · 744692dc
      Jiaying Zhang 提交于
      Allocate uninitialized extent before ext4 buffer write and
      convert the extent to initialized after io completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-thread DIO
      read performance on high-speed disks.
      
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      744692dc
  22. 09 12月, 2009 2 次提交
  23. 23 11月, 2009 1 次提交
  24. 25 11月, 2009 1 次提交
  25. 23 11月, 2009 1 次提交
    • T
      ext4: move ext4_forget() to ext4_jbd2.c · d6797d14
      Theodore Ts'o 提交于
      The ext4_forget() function better belongs in ext4_jbd2.c.  This will
      allow us to do some cleanup of the ext4_journal_revoke() and
      ext4_journal_forget() functions, as well as giving us better error
      reporting since we can report the caller of ext4_forget() when things
      go wrong.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d6797d14
  26. 29 9月, 2009 1 次提交
  27. 13 7月, 2009 1 次提交
    • C
      ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth 提交于
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget() would
      not to a brelse() on the input buffer head, which will cause pages they
      belong to to not be reclaimable.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e6b5d301
  28. 09 7月, 2009 1 次提交
    • T
      ext4: fix no journal corruption with locale-gen · 5adfee9c
      Theodore Ts'o 提交于
      If there is no journal, ext4_should_writeback_data() should return
      TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
      delayed allocation; otherwise ext4_journaled_aops gets used by
      default, which doesn't handle delayed allocation properly.
      
      The advantage of using ext4_should_writeback_data() approach is that
      it should handle nobh better as well.
      
      Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
      Kumar for suggesting this approach.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5adfee9c
  29. 07 1月, 2009 2 次提交
    • T
      ext4: Remove "extents" mount option · 83982b6f
      Theodore Ts'o 提交于
      This mount option is largely superfluous, and in fact the way it was
      implemented was buggy; if a filesystem which did not have the extents
      feature flag was mounted -o extents, the filesystem would attempt to
      create and use extents-based file even though the extents feature flag
      was not eabled.  The simplest thing to do is to nuke the mount option
      entirely.  It's not all that useful to force the non-creation of new
      extent-based files if the filesystem can support it.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      83982b6f
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  30. 20 8月, 2008 1 次提交
    • M
      ext4: journal credits calulation cleanup and fix for non-extent writepage · a02908f1
      Mingming Cao 提交于
      When considering how many journal credits are needed for modifying a
      chunk of data, we need to account for the super block, inode block,
      quota blocks and xattr block, indirect/index blocks, also, group bitmap
      and group descriptor blocks for new allocation (including data and
      indirect/index blocks). There are many places in ext4 do the calculation
      on their own and often missed one or two meta blocks, and often they
      assume single block allocation, and did not considering the multile
      chunk of allocation case.
      
      This patch is trying to cleanup current journal credit code, provides
      some common helper funtion to calculate the journal credits, to be used
      for writepage, writepages, DIO, fallocate, migration, defrag, and for
      both nonextent and extent files.
      
      This patch modified the writepage/write_begin credit caculation for
      nonextent files, to use the new helper function. It also fixed the
      problem that writepage on nonextent files did not consider the case
      blocksize <pagesize, thus could possibelly need multiple block
      allocation in a single transaction.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a02908f1
  31. 12 7月, 2008 1 次提交
  32. 14 7月, 2008 1 次提交
  33. 30 4月, 2008 1 次提交
  34. 18 10月, 2007 1 次提交