1. 15 Mar 2019, 1 commit
      ext4: fix NULL pointer dereference while journal is aborted · fa30dde3
      Committed by Jiufei Xue
      We see the following NULL pointer dereference while running xfstests
      generic/475:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
      RIP: 0010:ext4_do_update_inode+0x4ec/0x760
      ...
      Call Trace:
      ? jbd2_journal_get_write_access+0x42/0x50
      ? __ext4_journal_get_write_access+0x2c/0x70
      ? ext4_truncate+0x186/0x3f0
      ext4_mark_iloc_dirty+0x61/0x80
      ext4_mark_inode_dirty+0x62/0x1b0
      ext4_truncate+0x186/0x3f0
      ? unmap_mapping_pages+0x56/0x100
      ext4_setattr+0x817/0x8b0
      notify_change+0x1df/0x430
      do_truncate+0x5e/0x90
      ? generic_permission+0x12b/0x1a0
      
      This is triggered because the NULL pointer handle->h_transaction was
      dereferenced in function ext4_update_inode_fsync_trans().
      I found that h_transaction was set to NULL in jbd2__journal_restart(),
      but the handle failed to attach to a new transaction because the journal was aborted.
      
      Fix this by checking the handle before updating the inode.
      
      Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: stable@kernel.org
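The failure mode can be reproduced in a small user-space model. This is a sketch only: `struct model_transaction`, `struct model_handle`, `struct model_inode` and `model_update_fsync_trans()` are hypothetical, simplified stand-ins for jbd2's transaction/handle and for ext4_update_inode_fsync_trans(), not the kernel definitions. It shows the idea of the fix: once a restart against an aborted journal has left `h_transaction` NULL, the handle must be checked before the transaction ID is read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a jbd2 transaction. */
struct model_transaction {
	unsigned int t_tid;                      /* transaction ID */
};

/* Hypothetical, simplified stand-in for a jbd2 handle. */
struct model_handle {
	struct model_transaction *h_transaction; /* NULL after a failed restart */
	bool h_aborted;                          /* journal has been aborted */
};

struct model_inode {
	unsigned int i_sync_tid;                 /* tid fsync() must wait for */
};

/*
 * Record the transaction an fsync() on this inode must wait for.
 * Skip the update when there is no live transaction behind the handle,
 * instead of dereferencing a NULL h_transaction.
 */
static void model_update_fsync_trans(const struct model_handle *handle,
				     struct model_inode *inode)
{
	if (handle == NULL || handle->h_aborted ||
	    handle->h_transaction == NULL)
		return;
	inode->i_sync_tid = handle->h_transaction->t_tid;
}
```

With a live handle the inode picks up the transaction ID; with an aborted handle whose transaction pointer is NULL, the call is a no-op rather than an oops.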
  2. 24 Jan 2019, 1 commit
  3. 18 Dec 2017, 1 commit
      ext4: fix up remaining files with SPDX cleanups · f5166768
      Committed by Theodore Ts'o
      A number of ext4 source files were skipped because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 06 Aug 2017, 1 commit
  5. 22 Jun 2017, 2 commits
  6. 11 Dec 2016, 1 commit
      ext4: do not perform data journaling when data is encrypted · 73b92a2a
      Committed by Sergey Karamov
      Currently data journalling is incompatible with encryption: enabling both
      at the same time has never been supported by design, and would result in
      unpredictable behavior. However, users are not precluded from turning on
      both features simultaneously. This change programmatically replaces data
      journaling for encrypted regular files with the ordered data mode (data=ordered).
      
      Background:
      Journaling encrypted data has not been supported because it operates on
      buffer heads of the page in the page cache. Namely, when the commit
      happens, which could be up to five seconds after caching, the commit
      thread uses the buffer heads attached to the page to copy the contents of
      the page to the journal. With encryption, the bounce buffer holding the
      ciphertext would have had to be kept for up to the aforementioned five
      seconds, since the page cache can only hold plaintext and could not be
      used for journaling. Alternatively, the journal would have had to be set
      up to initiate a callback at commit time to perform deferred
      encryption; in this case, not only would the data have to be written
      twice, but it would also have to be encrypted twice. This level of
      complexity was not justified for a mode that in practice is very rarely
      used because of the overhead from the data journalling.
      
      Solution:
      If data=journal has been set as a mount option for a filesystem, or if
      journaling is enabled on a regular file, do not journal the file's data
      if the file is also encrypted; instead, fall back to data=ordered mode
      for the file.
      
      Rationale:
      The intent is to allow seamless and proper filesystem operation when
      journaling and encryption have both been enabled, and have these two
      conflicting features gracefully resolved by the filesystem.
      
      Fixes: 44614711
      Signed-off-by: Sergey Karamov <skaramov@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
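The decision rule in the Solution paragraph can be written as a single predicate. This is a minimal sketch under stated assumptions: `enum model_data_mode` and `model_file_data_mode()` are hypothetical names, and the model assumes the filesystem's non-journaled default is data=ordered. The kernel implements the rule inside its own journal-mode selection code, which is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical data modes; illustrative names, not ext4's. */
enum model_data_mode {
	MODEL_ORDERED_DATA,
	MODEL_JOURNAL_DATA,
};

/*
 * Pick the data mode for a regular file. Data journaling is requested
 * by the data=journal mount option or by the per-file journal flag,
 * but an encrypted file always falls back to ordered mode.
 * Assumption: the default mode is data=ordered.
 */
static enum model_data_mode model_file_data_mode(bool mount_data_journal,
						 bool inode_journal_flag,
						 bool encrypted)
{
	if ((mount_data_journal || inode_journal_flag) && !encrypted)
		return MODEL_JOURNAL_DATA;
	return MODEL_ORDERED_DATA;
}
```

Putting the encryption check in the same predicate that honours the mount option and the per-file flag means neither way of requesting data journaling can bypass it.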
  7. 27 Jun 2016, 1 commit
  8. 24 Apr 2016, 2 commits
      ext4: do not ask jbd2 to write data for delalloc buffers · ee0876bc
      Committed by Jan Kara
      Currently we ask jbd2 to write all dirty allocated buffers before
      committing a transaction when doing writeback of delayed allocated blocks.
      However this is unnecessary since we move all pages to writeback state
      before dropping a transaction handle and then submit all the necessary
      IO. We still need the transaction commit to wait for all the outstanding
      writeback before flushing disk caches during transaction commit to avoid
      data exposure issues though. Use the new jbd2 capability and ask it to
      only wait for outstanding writeback during transaction commit when
      writing back data in ext4_writepages().
      Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      jbd2: add support for avoiding data writes during transaction commits · 41617e1a
      Committed by Jan Kara
      Currently, when a filesystem needs to make sure data is on permanent
      storage before committing a transaction, it adds the inode to the
      transaction's inode list. During transaction commit, jbd2 writes back
      all dirty buffers that have allocated underlying blocks and waits for
      the IO to finish. However, when doing writeback for delayed allocated
      data, we allocate blocks and immediately submit the data. Thus asking
      jbd2 to write dirty pages just adds unnecessary work to jbd2, which may
      end up writing back other redirtied blocks.
      
      Add support to jbd2 to allow a filesystem to ask jbd2 to only wait for
      outstanding data writes before committing a transaction and thus avoid
      unnecessary writes.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
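The split between "write and wait" and "wait only" can be modelled as two per-inode flags that the commit code tests independently. This is a hypothetical sketch: `MODEL_WRITE_DATA`, `MODEL_WAIT_DATA`, `struct model_jinode` and `model_commit_data()` are illustrative names, not jbd2's, and the model only counts the work a commit would do instead of issuing IO.

```c
#include <stddef.h>

/* Hypothetical per-inode flags modelling the two commit-time behaviours. */
#define MODEL_WRITE_DATA 0x1  /* commit must start writeback itself */
#define MODEL_WAIT_DATA  0x2  /* commit must wait for writeback to finish */

struct model_jinode {
	unsigned int flags;
};

struct model_commit_stats {
	size_t writes_started;  /* inodes whose dirty buffers jbd2 writes */
	size_t waits;           /* inodes jbd2 waits on before committing */
};

/*
 * Walk the transaction's inode list as a commit would: only inodes
 * flagged MODEL_WRITE_DATA have their dirty buffers written by the
 * journal, while every inode flagged MODEL_WAIT_DATA is waited on, so
 * data already submitted by the filesystem is on disk before the
 * commit block and the cache flush.
 */
static struct model_commit_stats
model_commit_data(const struct model_jinode *inodes, size_t n)
{
	struct model_commit_stats st = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].flags & MODEL_WRITE_DATA)
			st.writes_started++;
		if (inodes[i].flags & MODEL_WAIT_DATA)
			st.waits++;
	}
	return st;
}
```

An inode added on the delalloc writeback path would carry only the wait flag, which is the behaviour ee0876bc asks for in ext4_writepages().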
  9. 18 Oct 2015, 1 commit
  10. 11 Sep 2014, 1 commit
  11. 12 May 2014, 1 commit
  12. 29 Aug 2013, 1 commit
  13. 05 Jun 2013, 2 commits
  14. 10 Apr 2013, 1 commit
  15. 04 Apr 2013, 1 commit
      ext4: fix journal callback list traversal · 5d3ee208
      Committed by Dmitry Monakhov
      It is incorrect to use list_for_each_entry_safe() for journal callback
      traversal because ->next may be removed by another task:
      ->ext4_mb_free_metadata()
        ->ext4_mb_free_metadata()
          ->ext4_journal_callback_del()
      
      This results in the following issue:
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      This patch fixes the issue as follows:
      - ext4_journal_commit_callback(): make the list traversal truly safe
        by always starting from the list head
      - fix the race between ext4_journal_callback_del() and
        ext4_journal_callback_try_del()
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.com
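The "always start from the list head" traversal can be shown with a minimal user-space list. This is a sketch under stated assumptions: the list helpers imitate the style of `<linux/list.h>` but are hypothetical reimplementations, `struct model_cb` is not ext4's callback entry, and the spinlock the kernel drops around each callback is omitted. The demo callback deletes another entry, which is exactly what invalidates a `->next` pointer cached by `list_for_each_entry_safe()`.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head->prev = head;
}

static void model_list_add_tail(struct model_list *n, struct model_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void model_list_del_init(struct model_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	model_list_init(n);
}

/* Hypothetical callback entry; the list node is the first member. */
struct model_cb {
	struct model_list node;
	void (*func)(struct model_cb *cb);
};

/*
 * Run every callback, always taking the entry at the head of the list
 * instead of caching ->next: a callback (or another task) may remove
 * other entries, so a saved ->next could point at a freed entry, while
 * re-reading head->next after each call cannot.
 */
static void model_run_callbacks(struct model_list *head)
{
	while (head->next != head) {
		/* Valid cast: node is the first member of struct model_cb. */
		struct model_cb *cb = (struct model_cb *)head->next;

		model_list_del_init(&cb->node);
		cb->func(cb);
	}
}

/* Demo state: a callback that removes another pending entry. */
static int model_calls;
static struct model_cb *model_victim;

static void model_demo_cb(struct model_cb *cb)
{
	(void)cb;
	model_calls++;
	if (model_victim) {
		model_list_del_init(&model_victim->node);
		model_victim = NULL;
	}
}
```

With entries a, b and c queued and b marked as the victim, only a and c run, and the list ends empty; a loop that had saved b as "next" before calling a would have visited the removed entry.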
  16. 10 Feb 2013, 2 commits
      ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd
      Committed by Theodore Ts'o
      Operations which modify extended attributes may need extra journal
      credits if inline data is used, since there is a chance that some
      extended attributes may need to get pushed to an external attribute
      block.
      
      Changes to reflect this were made in xattr.c, but they were missed in
      fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
      credits needed for xattr operations to an inline function defined in
      ext4_jbd2.h, and use it in acl.c and xattr.c.
      
      Also move the function declarations used in inline.c out of xattr.h
      (where they were non-obviously hidden, and caused problems because
      ext4_jbd2.h needs to use the function ext4_has_inline_data()) and into
      ext4.h.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
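The point of the refactoring is that every xattr-modifying path asks one helper for its credit count. The sketch below is hypothetical: `model_xattr_credits()` and both constants are illustrative, and the real helper derives its numbers from ext4's own transaction-size macros (such as EXT4_DATA_TRANS_BLOCKS()) rather than from fixed values.

```c
#include <stdbool.h>

/* Hypothetical credit counts, for illustration only. */
#define MODEL_XATTR_BASE_CREDITS   8
#define MODEL_INLINE_PUSHOUT_EXTRA 4

/*
 * One helper shared by the xattr code and the ACL code, so an extra
 * reservation for inline-data inodes cannot be added in one caller
 * and forgotten in the other.
 */
static inline int model_xattr_credits(bool has_inline_data)
{
	int credits = MODEL_XATTR_BASE_CREDITS;

	/* An xattr may have to be pushed out to an external block. */
	if (has_inline_data)
		credits += MODEL_INLINE_PUSHOUT_EXTRA;
	return credits;
}
```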
      ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf
      Committed by Theodore Ts'o
      The ext4_unlink() and ext4_rmdir() don't actually release the blocks
      associated with the file/directory.  This gets done in a separate jbd2
      handle called via ext4_evict_inode().  Thus, we don't need to reserve
      lots of journal credits for the truncate.
      
      Note that using too many journal credits is non-optimal because it can
      lead to the journal transaction getting closed too early, before it is
      strictly necessary.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
  17. 09 Feb 2013, 1 commit
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Committed by Theodore Ts'o
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
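The `interval` field and the filter value are in jiffies, while the text speaks of milliseconds, so the two only agree for a particular tick rate. The helper below is a hypothetical illustration (`model_jiffies_to_ms()` is not a kernel function; in-kernel code would use jiffies_to_msecs()). HZ = 250 is an assumption, chosen because it is the value for which 5 jiffies is 20 ms and 311 jiffies is about 1.24 seconds, matching the numbers quoted above.

```c
/*
 * Convert a jiffies count to milliseconds for a given HZ.
 * Integer arithmetic; the result is truncated toward zero.
 */
static unsigned long model_jiffies_to_ms(unsigned long j, unsigned long hz)
{
	return j * 1000UL / hz;
}
```

On a kernel built with a different CONFIG_HZ, the same `interval > 5` filter selects a different time threshold (for example 5 ms at HZ = 1000).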
  18. 09 Nov 2012, 1 commit
  19. 23 Jul 2012, 2 commits
      ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() · b50924c2
      Committed by Artem Bityutskiy
      __ext4_handle_dirty_metadata() no longer needs the 'now' argument, so
      we can kill it.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      ext4: make quota as first class supported feature · 7c319d32
      Committed by Aditya Kali
      This patch adds support for quotas as a first class feature in ext4;
      which is to say, the quota files are stored in hidden inodes as file
      system metadata, instead of as separate files visible in the file system
      directory hierarchy.
      
      It is based on the proposal at:
      https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4
      
      This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA
      which, when turned on, enables quota accounting at mount time
      itself. Also, the quota inodes are stored in two additional superblock
      fields.  Some changes introduced by this patch that should be pointed
      out are:
      
      1) Two new ext4-superblock fields - s_usr_quota_inum and
         s_grp_quota_inum for storing the quota inodes in use.
      2) The default quota inodes are inode #3 for tracking user quota and
         inode #4 for tracking group quota. The superblock fields can be set to use
         other inodes as well.
      3) If the QUOTA feature and corresponding quota inodes are set in
         superblock, the quota usage tracking is turned on at mount time. On
         'quotaon' ioctl, the quota limits enforcement is turned
         on. 'quotaoff' ioctl turns off only the limits enforcement in this
         case.
      4) When QUOTA feature is in use, the quota mount options 'quota',
         'usrquota', 'grpquota' are ignored by the kernel.
      5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize
         quota inodes. The default reserved inodes will not be visible to user
         as regular files.
      6) The quota-tools will need to be modified to support hidden quota
         files on ext4. E2fsprogs will also include support for creating and
         fixing quota files.
      7) Support is only for the new V2 quota file format.
      Tested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Johann Lombardi <johann@whamcloud.com>
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
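Points 1) to 4) describe a split between usage tracking, decided at mount time from the superblock, and limit enforcement, toggled by quotaon/quotaoff. A minimal sketch of that split follows; `struct model_sb`, `struct model_quota_state`, `model_mount_quota()` and `model_quotactl()` are hypothetical names, and only the two field names s_usr_quota_inum and s_grp_quota_inum come from the text above.

```c
#include <stdbool.h>

/* Hypothetical view of the superblock state named in the commit text. */
struct model_sb {
	bool quota_feature;              /* QUOTA feature enabled */
	unsigned int s_usr_quota_inum;   /* inode #3 by default */
	unsigned int s_grp_quota_inum;   /* inode #4 by default */
};

struct model_quota_state {
	bool usr_tracking;               /* user usage accounting on */
	bool grp_tracking;               /* group usage accounting on */
	bool enforcing;                  /* limits enforced */
};

/*
 * Mount time: with the feature set, usage tracking starts for every
 * quota inode recorded in the superblock; quota mount options are
 * not consulted. Enforcement starts off.
 */
static struct model_quota_state model_mount_quota(const struct model_sb *sb)
{
	struct model_quota_state st = { false, false, false };

	if (sb->quota_feature) {
		st.usr_tracking = sb->s_usr_quota_inum != 0;
		st.grp_tracking = sb->s_grp_quota_inum != 0;
	}
	return st;
}

/* quotaon/quotaoff: toggle only limit enforcement, never tracking. */
static void model_quotactl(struct model_quota_state *st, bool on)
{
	st->enforcing = on;
}
```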
  20. 30 Apr 2012, 1 commit
  21. 21 Feb 2012, 2 commits
      ext4: expand commit callback and · 18aadd47
      Committed by Bobi Jam
      The per-commit callback was used by mballoc code to manage free space
      bitmaps after deleted blocks have been released.  This patch expands
      it to support multiple different callbacks, to allow other things to
      be done after the commit has been completed.
      Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
      Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc · 3d2b1582
      Committed by Lukas Czerner
      Ext4 does not support data journalling with delayed allocation enabled.
      We do not even allow the file system to be mounted with both delayed
      allocation and data journalling enabled. However, the flag can be set
      via FS_IOC_SETFLAGS, so we can hit an inode with EXT4_INODE_JOURNAL_DATA
      set even on a file system mounted with delayed allocation (the default),
      and that is where the problem arises. The easiest way to reproduce this
      problem is with the following set of commands:
      
       mkfs.ext4 /dev/sdd
       mount /dev/sdd /mnt/test1
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
       chattr +j /mnt/test1/file
       dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
       chattr -j /mnt/test1/file
      
      Additionally it can be reproduced quite reliably with xfstests 272 and
      269. In fact the above reproducer is a part of test 272.
      
      To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
      the file system is mounted with delayed allocation. This can be easily
      done by fixing the ext4_should_*_data() functions to ignore the data
      journal flag when delalloc is set (suggested by Ted). We also have to set the
      appropriate address space operations for the inode (again, ignoring data
      journal flag if delalloc enabled).
      
      Additionally, this commit introduces the ext4_inode_journal_mode()
      function, because the ext4_should_*_data() functions already shared a
      lot of common code; this change puts it all into one function so it is
      easier to read.
      
      Successfully tested with xfstests in following configurations:
      
      delalloc + data=ordered
      delalloc + data=writeback
      data=journal
      nodelalloc + data=ordered
      nodelalloc + data=writeback
      nodelalloc + data=journal
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
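Folding the ext4_should_*_data() checks into one mode function can be modelled as follows. This is a hypothetical sketch, not the kernel's ext4_inode_journal_mode(): the enum, struct and function names are illustrative, and the order of checks is an assumption that reflects both this commit (the per-inode journal flag counts only without delalloc) and the earlier no-journal fix below (with no journal, writeback mode is chosen before any file-type test). Treating non-regular files as journalled is also an assumption, on the grounds that their data blocks are metadata.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three ext4 data modes. */
enum model_jmode { MODEL_WRITEBACK, MODEL_ORDERED, MODEL_JOURNAL };

struct model_inode_ctx {
	bool has_journal;             /* false in no-journal mode */
	bool is_regular;              /* regular file? */
	bool journal_flag;            /* per-inode flag set via chattr +j */
	bool delalloc;                /* mounted with delayed allocation */
	enum model_jmode mount_mode;  /* data= mount option */
};

/* Single place that decides the data mode for an inode. */
static enum model_jmode model_inode_journal_mode(const struct model_inode_ctx *c)
{
	if (!c->has_journal)
		return MODEL_WRITEBACK;          /* checked first: no handles exist */
	if (!c->is_regular || c->mount_mode == MODEL_JOURNAL)
		return MODEL_JOURNAL;
	if (c->journal_flag && !c->delalloc)
		return MODEL_JOURNAL;            /* flag ignored under delalloc */
	return c->mount_mode;                    /* MODEL_ORDERED or MODEL_WRITEBACK */
}

/* The should_*_data()-style helpers then reduce to one comparison. */
static bool model_should_journal_data(const struct model_inode_ctx *c)
{
	return model_inode_journal_mode(c) == MODEL_JOURNAL;
}
```

In the reproducer above (delalloc on, chattr +j), the model keeps the inode in ordered mode instead of switching it to journalled aops.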
  22. 13 Aug 2011, 1 commit
      ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508
      Committed by Curt Wohlgemuth
      ext4_should_writeback_data() had an incorrect sequence of
      tests to determine if it should return 0 or 1: in
      particular, even in no-journal mode, 0 was being returned
      for a non-regular-file inode.
      
      This meant that, in non-journal mode, we would use
      ext4_journalled_aops for directories, symlinks, and other
      non-regular files.  However, calling journalled aop
      callbacks when there is no valid handle can cause problems.
      
      This would cause a kernel crash with Jan Kara's commit
      2d859db3 ("ext4: fix data corruption in inodes with
      journalled data"), because we now dereference 'handle' in
      ext4_journalled_write_end().
      
      I also added BUG_ONs to check for a valid handle in the
      obviously journal-only aops callbacks.
      
      I tested this running xfstests with a scratch device in
      these modes:
      
         - no-journal
         - data=ordered
         - data=writeback
         - data=journal
      
      All work fine; the data=journal run has many failures and a
      crash in xfstests 074, but this is no different from a
      vanilla kernel.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  23. 09 May 2011, 1 commit
      ext4: remove unneeded ext4_journal_get_undo_access · 2cd05cc3
      Committed by Theodore Ts'o
      The block allocation code used to use jbd2_journal_get_undo_access as
      a way to make changes that wouldn't show up until the commit took
      place.  The new multi-block allocation code has its own way of
      preventing newly freed blocks from getting reused until the commit
      takes place (it avoids updating the buddy bitmaps until the commit is
      done), so we don't need to use jbd2_journal_get_undo_access(), which
      has extra overhead compared to jbd2_journal_get_write_access().
      
      There was one last vestigial use of ext4_journal_get_undo_access() in
      ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
      and then remove the ext4_journal_get_undo_access() support.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  24. 05 Apr 2011, 1 commit
  25. 21 Mar 2011, 1 commit
  26. 11 Jan 2011, 1 commit
  27. 27 Jul 2010, 1 commit
  28. 30 Jun 2010, 1 commit
  29. 15 Jun 2010, 1 commit
  30. 12 Jun 2010, 1 commit
      ext4: Clean up s_dirt handling · a0375156
      Committed by Theodore Ts'o
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  31. 17 May 2010, 1 commit
  32. 05 Mar 2010, 1 commit
      ext4: use ext4_get_block_write in buffer write · 744692dc
      Committed by Jiaying Zhang
      Allocate uninitialized extent before ext4 buffer write and
      convert the extent to initialized after io completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-thread DIO
      read performance on high-speed disks.
      
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
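The ordering that makes lock-free DIO reads safe (allocate as uninitialized, write, then convert) can be modelled on a single extent. This is a hypothetical sketch: `struct model_extent` and the three functions are illustrative, the "disk" is a 16-byte array, and real ext4 tracks uninitialized (unwritten) state in its extent tree rather than in a flag like this.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical one-extent model of a file's block mapping. */
struct model_extent {
	bool allocated;
	bool unwritten;      /* allocated, but new data not yet on disk */
	char disk[16];       /* whatever the block currently holds */
};

/* Step 1: before the buffered write, allocate the extent as unwritten. */
static void model_alloc_unwritten(struct model_extent *e)
{
	e->allocated = true;
	e->unwritten = true;
}

/* Step 2: on IO completion the data is on disk; only then convert. */
static void model_io_complete(struct model_extent *e, const char *data)
{
	strncpy(e->disk, data, sizeof(e->disk) - 1);
	e->disk[sizeof(e->disk) - 1] = '\0';
	e->unwritten = false;
}

/*
 * A DIO read that does not take i_mutex: an unallocated or unwritten
 * extent reads as zeroes, so stale block contents are never exposed.
 * len must not exceed sizeof(e->disk).
 */
static void model_dio_read(const struct model_extent *e, char *out, size_t len)
{
	if (!e->allocated || e->unwritten)
		memset(out, 0, len);
	else
		memcpy(out, e->disk, len);
}
```

A read that races with the write sees zeroes rather than the block's previous contents, because the extent only becomes initialized after the new data has landed.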
  33. 09 Dec 2009, 2 commits