1. 06 8月, 2017 1 次提交
    • J
      ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize · fcf5ea10
      Jan Kara 提交于
      ext4_find_unwritten_pgoff() does not properly handle a situation when
      starting index is in the middle of a page and blocksize < pagesize. The
      following command shows the bug on filesystem with 1k blocksize:
      
        xfs_io -f -c "falloc 0 4k" \
                  -c "pwrite 1k 1k" \
                  -c "pwrite 3k 1k" \
                  -c "seek -a -r 0" foo
      
      In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
      SEEK_DATA) will return the correct result.
      
      Fix the problem by neglecting buffers in a page before starting offset.
      Reported-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org # 3.8+
      fcf5ea10
  2. 24 6月, 2017 1 次提交
  3. 20 6月, 2017 1 次提交
  4. 25 5月, 2017 1 次提交
    • E
      ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff() · 624327f8
      Eryu Guan 提交于
      ext4_find_unwritten_pgoff() is used to search for offset of hole or
      data in page range [index, end] (both inclusive), and the max number
      of pages to search should be at least one, if end == index.
      Otherwise the only page is missed and no hole or data is found,
      which is not correct.
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size ext4 on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/ext4/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This is unconvered by generic/285 subtest 07 and 08 on ppc64 host,
      where pagesize is 64k. Because a recent change to generic/285
      reduced the preallocated file size to smaller than 64k.
      Signed-off-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      624327f8
  5. 22 5月, 2017 2 次提交
    • J
      ext4: fix off-by-in in loop termination in ext4_find_unwritten_pgoff() · 3f1d5bad
      Jan Kara 提交于
      There is an off-by-one error in loop termination conditions in
      ext4_find_unwritten_pgoff() since 'end' may index a page beyond end of
      desired range if 'endoff' is page aligned. It doesn't have any visible
      effects but still it is good to fix it.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3f1d5bad
    • J
      ext4: fix SEEK_HOLE · 7d95eddf
      Jan Kara 提交于
      Currently, SEEK_HOLE implementation in ext4 may both return that there's
      a hole at some offset although that offset already has data and skip
      some holes during a search for the next hole. The first problem is
      demostrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there. The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offsets 56k..128k has been ignored by the
      SEEK_HOLE call.
      
      The underlying problem is in the ext4_find_unwritten_pgoff() which is
      just buggy. In some cases it fails to update returned offset when it
      finds a hole (when no pages are found or when the first found page has
      higher index than expected), in some cases conditions for detecting hole
      are just missing (we fail to detect a situation where indices of
      returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices and also handle all cases where we got less pages then expected
      in one place and handle it properly there.
      
      CC: stable@vger.kernel.org
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7d95eddf
  6. 13 5月, 2017 1 次提交
  7. 03 4月, 2017 1 次提交
    • D
      ext4: Add statx support · 99652ea5
      David Howells 提交于
      Return enhanced file attributes from the Ext4 filesystem.  This includes
      the following:
      
       (1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.
      
       (2) Certain FS_xxx_FL flags are mapped to stx_attribute flags.
      
      This requires that all ext4 inodes have a getattr call, not just some of
      them, so to this end, split the ext4_getattr() function and only call part
      of it where appropriate.
      
      Example output:
      
      	[root@andromeda ~]# touch foo
      	[root@andromeda ~]# chattr +ai foo
      	[root@andromeda ~]# /tmp/test-statx foo
      	statx(foo) = 0
      	results=fff
      	  Size: 0               Blocks: 0          IO Block: 4096    regular file
      	Device: 08:12           Inode: 2101950     Links: 1
      	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
      	Access: 2016-02-11 17:08:29.031795451+0000
      	Modify: 2016-02-11 17:08:29.031795451+0000
      	Change: 2016-02-11 17:11:11.987790114+0000
      	 Birth: 2016-02-11 17:08:29.031795451+0000
      	Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      99652ea5
  8. 25 2月, 2017 3 次提交
  9. 23 2月, 2017 2 次提交
  10. 09 2月, 2017 1 次提交
  11. 05 2月, 2017 1 次提交
  12. 27 12月, 2016 1 次提交
  13. 21 11月, 2016 4 次提交
  14. 08 10月, 2016 2 次提交
  15. 30 9月, 2016 2 次提交
  16. 15 9月, 2016 1 次提交
  17. 27 7月, 2016 1 次提交
    • R
      dax: remote unused fault wrappers · 6b524995
      Ross Zwisler 提交于
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b524995
  18. 11 7月, 2016 1 次提交
  19. 17 5月, 2016 1 次提交
  20. 13 5月, 2016 1 次提交
    • J
      ext4: pre-zero allocated blocks for DAX IO · 12735f88
      Jan Kara 提交于
      Currently ext4 treats DAX IO the same way as direct IO. I.e., it
      allocates unwritten extents before IO is done and converts unwritten
      extents afterwards. However this way DAX IO can race with page fault to
      the same area:
      
      ext4_ext_direct_IO()				dax_fault()
        dax_io()
          get_block() - allocates unwritten extent
          copy_from_iter_pmem()
      						  get_block() - converts
      						    unwritten block to
      						    written and zeroes it
      						    out
        ext4_convert_unwritten_extents()
      
      So data written with DAX IO gets lost. Similarly dax_new_buf() called
      from dax_io() can overwrite data that has been already written to the
      block via mmap.
      
      Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
      use them for DAX mmap. The downside of this solution is that every
      allocating write writes each block twice (once zeros, once data). Fixing
      the race with locking is possible as well however we would need to
      lock-out faults for the whole range written to by DAX IO. And that is
      not easy to do without locking-out faults for the whole file which seems
      too aggressive.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      12735f88
  21. 02 5月, 2016 2 次提交
  22. 27 4月, 2016 1 次提交
  23. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  24. 27 3月, 2016 2 次提交
  25. 10 3月, 2016 1 次提交
    • J
      ext4: more efficient SEEK_DATA implementation · 2d90c160
      Jan Kara 提交于
      Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as
      ext4_seek_data() iterates hole block-by-block. Fix the problem by using
      returned hole size from ext4_map_blocks() and thus skip the hole in one
      go.
      
      Update also SEEK_HOLE implementation to follow the same pattern as
      SEEK_DATA to make future maintenance easier.
      
      Furthermore we add cond_resched() to both ext4_seek_data() and
      ext4_seek_hole() to avoid softlockups in case evil user creates huge
      fragmented file and we have to go through lots of extents.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2d90c160
  26. 09 3月, 2016 1 次提交
    • J
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara 提交于
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e142d052
  27. 28 2月, 2016 1 次提交
    • R
      ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() · 1e9d180b
      Ross Zwisler 提交于
      As it is currently written ext4_dax_mkwrite() assumes that the call into
      __dax_mkwrite() will not have to do a block allocation so it doesn't create
      a journal entry.  For a read that creates a zero page to cover a hole
      followed by a write that actually allocates storage this is incorrect.  The
      ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
      get_blocks() to allocate storage.
      
      Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
      as this function already has all the logic needed to allocate a journal
      entry and call __dax_fault().
      
      Also update the ext2 fault handlers in this same way to remove duplicate
      code and keep the logic between ext2 and ext4 the same.
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1e9d180b
  28. 08 2月, 2016 1 次提交
  29. 23 1月, 2016 1 次提交
    • R
      ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03
      Ross Zwisler 提交于
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5be7a03