1. 09 3月, 2016 1 次提交
    • J
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara 提交于
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e142d052
  2. 08 2月, 2016 1 次提交
  3. 23 1月, 2016 2 次提交
    • R
      ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03
      Ross Zwisler 提交于
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5be7a03
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  4. 08 12月, 2015 2 次提交
    • J
      ext4: use pre-zeroed blocks for DAX page faults · ba5843f5
      Jan Kara 提交于
      Make DAX fault path use pre-zeroed blocks to avoid races with extent
      conversion and zeroing when two page faults to the same block happen.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ba5843f5
    • J
      ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara 提交于
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in page fault faulting in a page into a range that we
      are punching after truncate_pagecache_range() has been called and thus
      we can end up with a page mapped to disk blocks that will be shortly
      freed. Filesystem corruption will shortly follow. Note that the same
      race is avoided for truncate by checking page fault offset against
      i_size but there isn't similar mechanism available for punching holes.
      
      Fix the problem by creating new rw semaphore i_mmap_sem in inode and
      grab it for writing over truncate, hole punching, and other functions
      removing blocks from extent tree and for read over page faults. We
      cannot easily use i_data_sem for this since that ranks below transaction
      start and we need something ranking above it so that it can be held over
      the whole truncate / hole punching operation. Also remove various
      workarounds we had in the code to reduce race window when page fault
      could have created pages with stale mapping information.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ea3d7209
  5. 09 9月, 2015 5 次提交
  6. 04 6月, 2015 1 次提交
    • D
      dax: don't abuse get_block mapping for endio callbacks · e842f290
      Dave Chinner 提交于
      dax_fault() currently relies on the get_block callback to attach an
      io completion callback to the mapping buffer head so that it can
      run unwritten extent conversion after zeroing allocated blocks.
      
      Instead of this hack, pass the conversion callback directly into
      dax_fault() similar to the get_block callback. When the filesystem
      allocates unwritten extents, it will set the buffer_unwritten()
      flag, and hence the dax_fault code can call the completion function
      in the contexts where it is necessary without overloading the
      mapping buffer head.
      
      Note: The changes to ext4 to use this interface are suspect at best.
      In fact, the way ext4 did this end_io assignment in the first place
      looks suspect because it only set a completion callback when there
      wasn't already some other write() call taking place on the same
      inode. The ext4 end_io code looks rather intricate and fragile with
      all it's reference counting and passing to different contexts for
      modification via inode private pointers that aren't protected by
      locks...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e842f290
  7. 01 6月, 2015 1 次提交
  8. 19 5月, 2015 1 次提交
    • T
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o 提交于
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b7236e21
  9. 16 4月, 2015 2 次提交
  10. 12 4月, 2015 7 次提交
  11. 03 4月, 2015 1 次提交
  12. 26 3月, 2015 1 次提交
  13. 17 2月, 2015 1 次提交
  14. 11 2月, 2015 1 次提交
  15. 03 1月, 2015 1 次提交
  16. 03 12月, 2014 1 次提交
  17. 30 10月, 2014 1 次提交
    • D
      ext4: prevent bugon on race between write/fcntl · a41537e6
      Dmitry Monakhov 提交于
      O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
      twice inside ext4_file_write_iter() and __generic_file_write() which
      result in BUG_ON inside ext4_direct_IO.
      
      Let's initialize iocb->private unconditionally.
      
      TESTCASE: xfstest:generic/036  https://patchwork.ozlabs.org/patch/402445/
      
      #TYPICAL STACK TRACE:
      kernel BUG at fs/ext4/inode.c:2960!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
      task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000
      RIP: 0010:[<ffffffff811fabf2>]  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
      RSP: 0018:ffff88080f90bb58  EFLAGS: 00010246
      RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818
      RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001
      RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80
      R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28
      FS:  00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0
      Stack:
       ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30
       0000000000000200 0000000000000200 0000000000000001 0000000000000200
       ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200
      Call Trace:
       [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180
       [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370
       [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700
       [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0
       [<ffffffff811bd756>] aio_run_iocb+0x286/0x410
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0
       [<ffffffff811bde3b>] do_io_submit+0x55b/0x740
       [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740
       [<ffffffff811be030>] SyS_io_submit+0x10/0x20
       [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b
      Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c
      24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89
      RIP  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
       RSP <ffff88080f90bb58>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
      a41537e6
  18. 15 7月, 2014 1 次提交
  19. 12 6月, 2014 1 次提交
    • A
      ->splice_write() via ->write_iter() · 8d020765
      Al Viro 提交于
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases coverted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8d020765
  20. 13 5月, 2014 1 次提交
  21. 12 5月, 2014 1 次提交
  22. 07 5月, 2014 2 次提交
  23. 22 4月, 2014 4 次提交