1. 22 6月, 2017 1 次提交
  2. 25 5月, 2017 1 次提交
  3. 22 5月, 2017 1 次提交
  4. 09 5月, 2017 2 次提交
    • M
      mm: introduce kv[mz]alloc helpers · a7c3e901
      Michal Hocko 提交于
      Patch series "kvmalloc", v5.
      
      There are many open coded kmalloc with vmalloc fallback instances in the
      tree.  Most of them are not careful enough or simply do not care about
      the underlying semantic of the kmalloc/page allocator which means that
      a) some vmalloc fallbacks are basically unreachable because the kmalloc
      part will keep retrying until it succeeds b) the page allocator can
      invoke a really disruptive steps like the OOM killer to move forward
      which doesn't sound appropriate when we consider that the vmalloc
      fallback is available.
      
      As it can be seen implementing kvmalloc requires quite an intimate
      knowledge if the page allocator and the memory reclaim internals which
      strongly suggests that a helper should be implemented in the memory
      subsystem proper.
      
      Most callers, I could find, have been converted to use the helper
      instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
      in the networking stack which I have converted as well and Eric Dumazet
      was not opposed [2] to convert them as well.
      
      [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
      
      This patch (of 9):
      
      Using kmalloc with the vmalloc fallback for larger allocations is a
      common pattern in the kernel code.  Yet we do not have any common helper
      for that and so users have invented their own helpers.  Some of them are
      really creative when doing so.  Let's just add kv[mz]alloc and make sure
      it is implemented properly.  This implementation makes sure to not make
      a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
      to not warn about allocation failures.  This also rules out the OOM
      killer as the vmalloc is a more approapriate fallback than a disruptive
      user visible action.
      
      This patch also changes some existing users and removes helpers which
      are specific for them.  In some cases this is not possible (e.g.
      ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
      require GFP_NO{FS,IO} context which is not vmalloc compatible in general
      (note that the page table allocation is GFP_KERNEL).  Those need to be
      fixed separately.
      
      While we are at it, document that __vmalloc{_node} about unsupported gfp
      mask because there seems to be a lot of confusion out there.
      kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
      superset) flags to catch new abusers.  Existing ones would have to die
      slowly.
      
      [sfr@canb.auug.org.au: f2fs fixup]
        Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7c3e901
    • D
      block, dax: move "select DAX" from BLOCK to FS_DAX · ef510424
      Dan Williams 提交于
      For configurations that do not enable DAX filesystems or drivers, do not
      require the DAX core to be built.
      
      Given that the 'direct_access' method has been removed from
      'block_device_operations', we can also go ahead and remove the
      block-related dax helper functions from fs/block_dev.c to
      drivers/dax/super.c. This keeps dax details out of the block layer and
      lets the DAX core be built as a module in the FS_DAX=n case.
      
      Filesystems need to include dax.h to call bdev_dax_supported().
      
      Cc: linux-xfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ef510424
  5. 04 5月, 2017 1 次提交
    • J
      ext4: mark superblock writes synchronous for nobarrier mounts · 00473374
      Jan Kara 提交于
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_FUA implementation.
      generic_make_request_checks() however strips REQ_FUA flag from a bio
      when the storage doesn't report volatile write cache and thus write
      effectively becomes asynchronous which can lead to performance
      regressions. This affects superblock writes for ext4. Fix the problem
      by marking superblock writes always as synchronous.
      
      Fixes: b685d3d6
      CC: linux-ext4@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      00473374
  6. 30 4月, 2017 3 次提交
  7. 24 4月, 2017 1 次提交
  8. 19 4月, 2017 1 次提交
    • J
      ext4: Set flags on quota files directly · 957153fc
      Jan Kara 提交于
      Currently immutable and noatime flags on quota files are set by quota
      code which requires us to copy inode->i_flags to our on disk version of
      quota flags in GETFLAGS ioctl and ext4_do_update_inode(). Move to
      setting / clearing these on-disk flags directly to save that copying.
      Signed-off-by: NJan Kara <jack@suse.cz>
      957153fc
  9. 16 3月, 2017 1 次提交
    • E
      fscrypt: eliminate ->prepare_context() operation · 94840e3c
      Eric Biggers 提交于
      The only use of the ->prepare_context() fscrypt operation was to allow
      ext4 to evict inline data from the inode before ->set_context().
      However, there is no reason why this cannot be done as simply the first
      step in ->set_context(), and in fact it makes more sense to do it that
      way because then the policy modes and flags get validated before any
      real work is done.  Therefore, merge ext4_prepare_context() into
      ext4_set_context(), and remove ->prepare_context().
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      94840e3c
  10. 15 2月, 2017 1 次提交
  11. 10 2月, 2017 1 次提交
  12. 08 2月, 2017 1 次提交
  13. 06 2月, 2017 1 次提交
  14. 05 2月, 2017 3 次提交
  15. 12 1月, 2017 1 次提交
  16. 08 1月, 2017 1 次提交
  17. 25 12月, 2016 1 次提交
  18. 11 12月, 2016 1 次提交
    • S
      ext4: do not perform data journaling when data is encrypted · 73b92a2a
      Sergey Karamov 提交于
      Currently data journalling is incompatible with encryption: enabling both
      at the same time has never been supported by design, and would result in
      unpredictable behavior. However, users are not precluded from turning on
      both features simultaneously. This change programmatically replaces data
      journaling for encrypted regular files with ordered data journaling mode.
      
      Background:
      Journaling encrypted data has not been supported because it operates on
      buffer heads of the page in the page cache. Namely, when the commit
      happens, which could be up to five seconds after caching, the commit
      thread uses the buffer heads attached to the page to copy the contents of
      the page to the journal. With encryption, it would have been required to
      keep the bounce buffer with ciphertext for up to the aforementioned five
      seconds, since the page cache can only hold plaintext and could not be
      used for journaling. Alternatively, it would be required to setup the
      journal to initiate a callback at the commit time to perform deferred
      encryption - in this case, not only would the data have to be written
      twice, but it would also have to be encrypted twice. This level of
      complexity was not justified for a mode that in practice is very rarely
      used because of the overhead from the data journalling.
      
      Solution:
      If data=journaled has been set as a mount option for a filesystem, or if
      journaling is enabled on a regular file, do not perform journaling if the
      file is also encrypted, instead fall back to the data=ordered mode for the
      file.
      
      Rationale:
      The intent is to allow seamless and proper filesystem operation when
      journaling and encryption have both been enabled, and have these two
      conflicting features gracefully resolved by the filesystem.
      
      Fixes: 44614711Signed-off-by: NSergey Karamov <skaramov@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      73b92a2a
  19. 06 12月, 2016 1 次提交
  20. 04 12月, 2016 1 次提交
  21. 02 12月, 2016 1 次提交
    • E
      ext4: validate s_first_meta_bg at mount time · 3a4b77cd
      Eryu Guan 提交于
      Ralf Spenneberg reported that he hit a kernel crash when mounting a
      modified ext4 image. And it turns out that kernel crashed when
      calculating fs overhead (ext4_calculate_overhead()), this is because
      the image has very large s_first_meta_bg (debug code shows it's
      842150400), and ext4 overruns the memory in count_overhead() when
      setting bitmap buffer, which is PAGE_SIZE.
      
      ext4_calculate_overhead():
        buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
        blks = count_overhead(sb, i, buf);
      
      count_overhead():
        for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
                ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
                count++;
        }
      
      This can be reproduced easily for me by this script:
      
        #!/bin/bash
        rm -f fs.img
        mkdir -p /mnt/ext4
        fallocate -l 16M fs.img
        mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
        debugfs -w -R "ssv first_meta_bg 842150400" fs.img
        mount -o loop fs.img /mnt/ext4
      
      Fix it by validating s_first_meta_bg first at mount time, and
      refusing to mount if its value exceeds the largest possible meta_bg
      number.
      Reported-by: NRalf Spenneberg <ralf@os-t.de>
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      3a4b77cd
  22. 27 11月, 2016 1 次提交
  23. 22 11月, 2016 1 次提交
    • E
      ext4: avoid lockdep warning when inheriting encryption context · 2f8f5e76
      Eric Biggers 提交于
      On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep
      warning when run on ext4 mounted with -o test_dummy_encryption:
      
          xfs_io/4594 is trying to acquire lock:
           (jbd2_handle
          ){++++.+}, at:
          [<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b
      
          but task is already holding lock:
           (jbd2_handle
          ){++++.+}, at:
          [<ffffffff813000de>] start_this_handle+0x354/0x3d8
      
      The abbreviated call stack is:
      
       [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
       [<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b
       [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b
       [<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6
       [<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6
       [<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18
       [<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79
       [<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f
       [<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b
       [<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103
       [<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a
       [<ffffffff812bd371>] ext4_create+0xb7/0x161
      
      When a file is created in an encrypted directory, ext4_set_context() is
      called to set an encryption context on the new file.  This calls
      ext4_xattr_set(), which contains a retry loop where the journal is
      forced to commit if an ENOSPC error is encountered.
      
      If the task actually were to wait for the journal to commit in this
      case, then it would deadlock because a handle remains open from
      __ext4_new_inode(), so the running transaction can't be committed yet.
      Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not
      allowing the running transaction to be committed while the current task
      has it open.  However, the above lockdep warning is still triggered.
      
      This was a false positive which was introduced by: 1eaa566d: jbd2:
      track more dependencies on transaction commit
      
      Fix the problem by passing the handle through the 'fs_data' argument to
      ext4_set_context(), then using ext4_xattr_set_handle() instead of
      ext4_xattr_set().  And in the case where no journal handle is specified
      and ext4_set_context() has to open one, add an ENOSPC retry loop since
      in that case it is the outermost transaction.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      2f8f5e76
  24. 21 11月, 2016 1 次提交
  25. 20 11月, 2016 1 次提交
  26. 19 11月, 2016 4 次提交
  27. 15 11月, 2016 1 次提交
    • D
      ext4: use current_time() for inode timestamps · eeca7ea1
      Deepa Dinamani 提交于
      CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
      current_time() will be transitioned to be y2038 safe
      along with vfs.
      
      current_time() returns timestamps according to the
      granularities set in the super_block.
      The granularity check in ext4_current_time() to call
      current_time() or CURRENT_TIME_SEC is not required.
      Use current_time() directly to obtain timestamps
      unconditionally, and remove ext4_current_time().
      
      Quota files are assumed to be on the same filesystem.
      Hence, use current_time() for these files as well.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      eeca7ea1
  28. 14 11月, 2016 2 次提交
    • T
      ext4: don't lock buffer in ext4_commit_super if holding spinlock · 1566a48a
      Theodore Ts'o 提交于
      If there is an error reported in mballoc via ext4_grp_locked_error(),
      the code is holding a spinlock, so ext4_commit_super() must not try to
      lock the buffer head, or else it will trigger a BUG:
      
        BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
        in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount
        CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
         ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166
         ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153
         ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0
        Call Trace:
          [<ffffffff81318c89>] dump_stack+0x67/0x9e
          [<ffffffff810810b0>] ___might_sleep+0xf0/0x140
          [<ffffffff81081153>] __might_sleep+0x53/0xb0
          [<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290
          [<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230
          [<ffffffff81081153>] ? __might_sleep+0x53/0xb0
          [<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320
      
      Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0
      (and it is the only caller which does so), avoid locking and unlocking
      the buffer in this case.
      
      This can result in races with ext4_commit_super() if there are other
      problems (which is what commit 4743f839 was trying to address),
      but a Warning is better than BUG.
      
      Fixes: 4743f839
      Cc: stable@vger.kernel.org # 4.9
      Reported-by: NNikolay Borisov <kernel@kyup.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      1566a48a
    • T
      ext4: allow ext4_truncate() to return an error · 2c98eb5e
      Theodore Ts'o 提交于
      This allows us to properly propagate errors back up to
      ext4_truncate()'s callers.  This also means we no longer have to
      silently ignore some errors (e.g., when trying to add the inode to the
      orphan inode list).
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      2c98eb5e
  29. 01 11月, 2016 1 次提交
  30. 13 10月, 2016 1 次提交
  31. 30 9月, 2016 1 次提交