1. 02 5月, 2016 1 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 27 3月, 2016 2 次提交
  4. 10 3月, 2016 1 次提交
    • J
      ext4: more efficient SEEK_DATA implementation · 2d90c160
      Jan Kara 提交于
      Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as
      ext4_seek_data() iterates hole block-by-block. Fix the problem by using
      returned hole size from ext4_map_blocks() and thus skip the hole in one
      go.
      
      Update also SEEK_HOLE implementation to follow the same pattern as
      SEEK_DATA to make future maintenance easier.
      
      Furthermore we add cond_resched() to both ext4_seek_data() and
      ext4_seek_hole() to avoid softlockups in case evil user creates huge
      fragmented file and we have to go through lots of extents.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2d90c160
  5. 09 3月, 2016 1 次提交
    • J
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara 提交于
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e142d052
  6. 28 2月, 2016 1 次提交
    • R
      ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() · 1e9d180b
      Ross Zwisler 提交于
      As it is currently written ext4_dax_mkwrite() assumes that the call into
      __dax_mkwrite() will not have to do a block allocation so it doesn't create
      a journal entry.  For a read that creates a zero page to cover a hole
      followed by a write that actually allocates storage this is incorrect.  The
      ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
      get_blocks() to allocate storage.
      
      Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
      as this function already has all the logic needed to allocate a journal
      entry and call __dax_fault().
      
      Also update the ext2 fault handlers in this same way to remove duplicate
      code and keep the logic between ext2 and ext4 the same.
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1e9d180b
  7. 08 2月, 2016 1 次提交
  8. 23 1月, 2016 2 次提交
    • R
      ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03
      Ross Zwisler 提交于
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5be7a03
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  9. 08 12月, 2015 2 次提交
    • J
      ext4: use pre-zeroed blocks for DAX page faults · ba5843f5
      Jan Kara 提交于
      Make DAX fault path use pre-zeroed blocks to avoid races with extent
      conversion and zeroing when two page faults to the same block happen.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ba5843f5
    • J
      ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara 提交于
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in page fault faulting in a page into a range that we
      are punching after truncate_pagecache_range() has been called and thus
      we can end up with a page mapped to disk blocks that will be shortly
      freed. Filesystem corruption will shortly follow. Note that the same
      race is avoided for truncate by checking page fault offset against
      i_size but there isn't similar mechanism available for punching holes.
      
      Fix the problem by creating new rw semaphore i_mmap_sem in inode and
      grab it for writing over truncate, hole punching, and other functions
      removing blocks from extent tree and for read over page faults. We
      cannot easily use i_data_sem for this since that ranks below transaction
      start and we need something ranking above it so that it can be held over
      the whole truncate / hole punching operation. Also remove various
      workarounds we had in the code to reduce race window when page fault
      could have created pages with stale mapping information.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ea3d7209
  10. 09 9月, 2015 5 次提交
  11. 04 6月, 2015 1 次提交
    • D
      dax: don't abuse get_block mapping for endio callbacks · e842f290
      Dave Chinner 提交于
      dax_fault() currently relies on the get_block callback to attach an
      io completion callback to the mapping buffer head so that it can
      run unwritten extent conversion after zeroing allocated blocks.
      
      Instead of this hack, pass the conversion callback directly into
      dax_fault() similar to the get_block callback. When the filesystem
      allocates unwritten extents, it will set the buffer_unwritten()
      flag, and hence the dax_fault code can call the completion function
      in the contexts where it is necessary without overloading the
      mapping buffer head.
      
      Note: The changes to ext4 to use this interface are suspect at best.
      In fact, the way ext4 did this end_io assignment in the first place
      looks suspect because it only set a completion callback when there
      wasn't already some other write() call taking place on the same
      inode. The ext4 end_io code looks rather intricate and fragile with
      all it's reference counting and passing to different contexts for
      modification via inode private pointers that aren't protected by
      locks...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e842f290
  12. 01 6月, 2015 1 次提交
  13. 19 5月, 2015 1 次提交
    • T
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o 提交于
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b7236e21
  14. 16 4月, 2015 2 次提交
  15. 12 4月, 2015 7 次提交
  16. 03 4月, 2015 1 次提交
  17. 26 3月, 2015 1 次提交
  18. 17 2月, 2015 1 次提交
  19. 11 2月, 2015 1 次提交
  20. 03 1月, 2015 1 次提交
  21. 03 12月, 2014 1 次提交
  22. 30 10月, 2014 1 次提交
    • D
      ext4: prevent bugon on race between write/fcntl · a41537e6
      Dmitry Monakhov 提交于
      O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
      twice inside ext4_file_write_iter() and __generic_file_write() which
      result in BUG_ON inside ext4_direct_IO.
      
      Let's initialize iocb->private unconditionally.
      
      TESTCASE: xfstest:generic/036  https://patchwork.ozlabs.org/patch/402445/
      
      #TYPICAL STACK TRACE:
      kernel BUG at fs/ext4/inode.c:2960!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
      task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000
      RIP: 0010:[<ffffffff811fabf2>]  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
      RSP: 0018:ffff88080f90bb58  EFLAGS: 00010246
      RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818
      RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001
      RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80
      R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28
      FS:  00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0
      Stack:
       ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30
       0000000000000200 0000000000000200 0000000000000001 0000000000000200
       ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200
      Call Trace:
       [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180
       [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370
       [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700
       [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0
       [<ffffffff811bd756>] aio_run_iocb+0x286/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0
       [<ffffffff811bde3b>] do_io_submit+0x55b/0x740
       [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740
       [<ffffffff811be030>] SyS_io_submit+0x10/0x20
       [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b
      Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c
      24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89
      RIP  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
       RSP <ffff88080f90bb58>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
      a41537e6
  23. 15 7月, 2014 1 次提交
  24. 12 6月, 2014 1 次提交
    • A
      ->splice_write() via ->write_iter() · 8d020765
      Al Viro 提交于
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases coverted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8d020765
  25. 13 5月, 2014 1 次提交
  26. 12 5月, 2014 1 次提交