1. 07 Jul 2018, 2 commits
  2. 07 Jun 2018, 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Authored by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
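
      For illustration, here is the script's effect on a typical (hypothetical)
      XFS header of the era; when the "any later version." wording is detected,
      GPL-2.0+ is emitted instead:

      /* Before: */
      /*
       * Copyright (c) 2000-2005 Silicon Graphics, Inc.
       * All Rights Reserved.
       *
       * This program is free software; you can redistribute it and/or
       * modify it under the terms of the GNU General Public License as
       * published by the Free Software Foundation.
       * [...]
       */

      /* After: the boilerplate paragraph is gone, the copyright lines remain */
      // SPDX-License-Identifier: GPL-2.0
      /*
       * Copyright (c) 2000-2005 Silicon Graphics, Inc.
       * All Rights Reserved.
       */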
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 30 May 2018, 1 commit
  4. 22 May 2018, 3 commits
    • xfs, dax: introduce xfs_break_dax_layouts() · d6dc57e2
      Authored by Dan Williams
      xfs_break_dax_layouts(), similar to xfs_break_leased_layouts(), scans
      for busy / pinned dax pages and waits for those pages to go idle before
      any potential extent unmap operation.
      
      dax_layout_busy_page() handles synchronizing against new page-busy
      events (get_user_pages). It invalidates all mappings to trigger the
      get_user_pages slow path which will eventually block on the xfs inode
      lock held in XFS_MMAPLOCK_EXCL mode. If dax_layout_busy_page() finds a
      busy page it returns it for xfs to wait for the page-idle event that
      will fire when the page reference count reaches 1 (recall ZONE_DEVICE
      pages are idle at count 1, see generic_dax_pagefree()).
      
      While waiting, the XFS_MMAPLOCK_EXCL lock is dropped in order to not
      deadlock the process that might be trying to elevate the page count of
      more pages before arranging for any of them to go idle. I.e. the typical
      case of submitting I/O is that iov_iter_get_pages() elevates the
      reference count of all pages in the I/O before starting I/O on the first
      page. The process of elevating the reference count of all pages involved
      in an I/O may cause faults that need to take XFS_MMAPLOCK_EXCL.
      
      Although XFS_MMAPLOCK_EXCL is dropped while waiting, XFS_IOLOCK_EXCL is
      held while sleeping. We need this to prevent starvation of the truncate
      path as continuous submission of direct-I/O could starve the truncate
      path indefinitely if the lock is dropped.
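
      A condensed sketch of the helper pair this describes (the did_unlock
      plumbing, a detail assumed from the description above, reports the lock
      cycling back to the caller so it can retry the scan):

      static void
      xfs_wait_dax_page(
      	struct inode		*inode,
      	bool			*did_unlock)
      {
      	struct xfs_inode	*ip = XFS_I(inode);

      	/* Drop the mmap lock so page-count holders can fault and drain. */
      	*did_unlock = true;
      	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
      	schedule();
      	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
      }

      static int
      xfs_break_dax_layouts(
      	struct inode		*inode,
      	bool			*did_unlock)
      {
      	struct page		*page;

      	ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));

      	/* Find any dax page with an elevated reference count... */
      	page = dax_layout_busy_page(inode->i_mapping);
      	if (!page)
      		return 0;

      	/* ...and wait for it to go idle (ZONE_DEVICE idle == refcount 1). */
      	return ___wait_var_event(&page->_refcount,
      			atomic_read(&page->_refcount) == 1,
      			TASK_INTERRUPTIBLE, 0, 0,
      			xfs_wait_dax_page(inode, did_unlock));
      }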
      
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • xfs: prepare xfs_break_layouts() for another layout type · 69eb5fa1
      Authored by Dan Williams
      When xfs is operating as the back-end of a pNFS block server, it
      prevents collisions between local and remote operations by requiring a
      lease to be held for remotely accessed blocks. Local filesystem
      operations break those leases before writing or mutating the extent map
      of the file.
      
      A similar mechanism is needed to prevent operations on pinned dax
      mappings, like device-DMA, from colliding with extent unmap operations.
      
      BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of
      layout breaking.
      
      Layouts are broken in the BREAK_WRITE case to ensure that layout-holders
      do not collide with local writes. Additionally, layouts are broken in
      the BREAK_UNMAP case to make sure the layout-holder has a consistent
      view of the file's extent map. While BREAK_WRITE breaks can be satisfied
by recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require
      waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL.
      
      After this refactoring xfs_break_layouts() becomes the entry point for
      coordinating both types of breaks. Finally, xfs_break_leased_layouts()
      becomes just the BREAK_WRITE handler.
      
Note that the unlock tracking is needed in a follow-on change. That will
      coordinate retrying either break handler until both successfully test
      for a lease break while maintaining the lock state.
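
      Condensed, the dispatch this refactoring introduces looks roughly like
      the sketch below (the retry/unlock bookkeeping mentioned above is
      elided, so this is a shape, not the final function):

      enum layout_break_reason {
      	BREAK_WRITE,	/* layouts broken for a local write */
      	BREAK_UNMAP,	/* layouts broken before changing the extent map */
      };

      int
      xfs_break_layouts(
      	struct inode		*inode,
      	uint			*iolock,
      	enum layout_break_reason reason)
      {
      	switch (reason) {
      	case BREAK_UNMAP:
      		/* An unmap break additionally requires XFS_MMAPLOCK_EXCL. */
      		ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
      		/* fall through: an unmap break implies a write break */
      	case BREAK_WRITE:
      		return xfs_break_leased_layouts(inode, iolock);
      	default:
      		ASSERT(0);
      		return -EINVAL;
      	}
      }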
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL · c63a8eae
      Authored by Dan Williams
      In preparation for adding coordination between extent unmap operations
      and busy dax-pages, update xfs_break_layouts() to permit it to be called
      with the mmap lock held. This lock scheme will be required for
      coordinating the break of 'dax layouts' (non-idle dax (ZONE_DEVICE)
      pages mapped into the file's address space). Breaking dax layouts will
be added to xfs_break_layouts() in a future patch; for now this preps
      the unmap call sites to take and hold XFS_MMAPLOCK_EXCL over the call to
      xfs_break_layouts().
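
      The prepared call sites then follow a pattern along these lines (a
      hypothetical condensed call site; xfs_free_file_space() stands in for
      whichever unmap-type operation the real site performs):

      STATIC long
      xfs_file_punch_hole(	/* hypothetical wrapper for illustration */
      	struct xfs_inode	*ip,
      	loff_t			offset,
      	loff_t			len)
      {
      	struct inode		*inode = VFS_I(ip);
      	uint			iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
      	int			error;

      	/* Take both locks up front and hold them across the break. */
      	xfs_ilock(ip, iolock);
      	error = xfs_break_layouts(inode, &iolock);
      	if (error)
      		goto out_unlock;

      	/* The unmap runs with XFS_MMAPLOCK_EXCL still held. */
      	error = xfs_free_file_space(ip, offset, len);

      out_unlock:
      	xfs_iunlock(ip, iolock);
      	return error;
      }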
      
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: "Darrick J. Wong" <darrick.wong@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  5. 10 May 2018, 2 commits
  6. 03 May 2018, 1 commit
  7. 18 Apr 2018, 1 commit
    • xfs: prevent creating negative-sized file via INSERT_RANGE · 7d83fb14
      Authored by Darrick J. Wong
      
      During the "insert range" fallocate operation, i_size grows by the
      specified 'len' bytes.  XFS verifies that i_size + len < s_maxbytes, as
      it should.  But this comparison is done using the signed 'loff_t', and
      'i_size + len' can wrap around to a negative value, causing the check to
      incorrectly pass, resulting in an inode with "negative" i_size.  This is
      possible on 64-bit platforms, where XFS sets s_maxbytes = LLONG_MAX.
      ext4 and f2fs don't run into this because they set a smaller s_maxbytes.
      
      Fix it by using subtraction instead.
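
      In condensed form (illustrative, not the literal patch), the broken and
      fixed checks look like this:

      	loff_t	isize = i_size_read(inode);

      	/*
      	 * Broken: when isize is near LLONG_MAX, isize + len wraps to a
      	 * negative value, which compares as less than s_maxbytes, so the
      	 * check passes and i_size goes "negative".
      	 */
      	if (isize + len > inode->i_sb->s_maxbytes)
      		return -EFBIG;

      	/*
      	 * Fixed: rearranged as a subtraction; with 0 <= isize <=
      	 * s_maxbytes, neither operand can overflow.
      	 */
      	if (inode->i_sb->s_maxbytes - isize < len)
      		return -EFBIG;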
      
      Reproducer:
          xfs_io -f file -c "truncate $(((1<<63)-1))" -c "finsert 0 4096"
      
      Fixes: a904b1ca ("xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate")
      Cc: <stable@vger.kernel.org> # v4.1+
      Originally-From: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix signed integer addition overflow too]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  8. 16 Mar 2018, 1 commit
  9. 15 Mar 2018, 1 commit
  10. 08 Jan 2018, 1 commit
  11. 03 Nov 2017, 3 commits
  12. 27 Oct 2017, 1 commit
  13. 24 Oct 2017, 1 commit
  14. 12 Oct 2017, 1 commit
  15. 27 Sep 2017, 1 commit
  16. 26 Sep 2017, 2 commits
  17. 07 Sep 2017, 1 commit
    • dax: use common 4k zero page for dax mmap reads · 91d25ba8
      Authored by Ross Zwisler
      When servicing mmap() reads from file holes the current DAX code
      allocates a page cache page of all zeroes and places the struct page
      pointer in the mapping->page_tree radix tree.
      
      This has three major drawbacks:
      
      1) It consumes memory unnecessarily. For every 4k page that is read via
         a DAX mmap() over a hole, we allocate a new page cache page. This
         means that if you read 1GiB worth of pages, you end up using 1GiB of
         zeroed memory. This is easily visible by looking at the overall
         memory consumption of the system or by looking at /proc/[pid]/smaps:
      
      	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:             1048576 kB
      	Pss:             1048576 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:   1048576 kB
      	Private_Dirty:         0 kB
      	Referenced:      1048576 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      2) It is slower than using a common zero page because each page fault
         has more work to do. Instead of just inserting a common zero page we
         have to allocate a page cache page, zero it, and then insert it. Here
         are the average latencies of dax_load_hole() as measured by ftrace on
         a random test box:
      
          Old method, using zeroed page cache pages:	3.4 us
          New method, using the common 4k zero page:	0.8 us
      
         This was the average latency over 1 GiB of sequential reads done by
         this simple fio script:
      
           [global]
           size=1G
           filename=/root/dax/data
           fallocate=none
           [io]
           rw=read
           ioengine=mmap
      
      3) The fact that we had to check for both DAX exceptional entries and
         for page cache pages in the radix tree made the DAX code more
         complex.
      
      Solve these issues by following the lead of the DAX PMD code and using a
      common 4k zero page instead.  As with the PMD code we will now insert a
      DAX exceptional entry into the radix tree instead of a struct page
      pointer which allows us to remove all the special casing in the DAX
      code.
      
      Note that we do still pretty aggressively check for regular pages in the
      DAX radix tree, especially where we take action based on the bits set in
      the page.  If we ever find a regular page in our radix tree now that
      most likely means that someone besides DAX is inserting pages (which has
      happened lots of times in the past), and we want to find that out early
      and fail loudly.
      
      This solution also removes the extra memory consumption.  Here is that
      same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
      code:
      
      	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:                   0 kB
      	Pss:                   0 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:         0 kB
      	Private_Dirty:         0 kB
      	Referenced:            0 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      Overall system memory consumption is similarly improved.
      
      Another major change is that we remove dax_pfn_mkwrite() from our fault
      flow, and instead rely on the page fault itself to make the PTE dirty
      and writeable.  The following description from the patch adding the
      vm_insert_mixed_mkwrite() call explains this a little more:
      
         "To be able to use the common 4k zero page in DAX we need to have our
          PTE fault path look more like our PMD fault path where a PTE entry
          can be marked as dirty and writeable as it is first inserted rather
          than waiting for a follow-up dax_pfn_mkwrite() =>
          finish_mkwrite_fault() call.
      
          Right now we can rely on having a dax_pfn_mkwrite() call because we
          can distinguish between these two cases in do_wp_page():
      
                  case 1: 4k zero page => writable DAX storage
                  case 2: read-only DAX storage => writeable DAX storage
      
    This distinction is made via vm_normal_page(). vm_normal_page()
          returns false for the common 4k zero page, though, just as it does
          for DAX ptes. Instead of special casing the DAX + 4k zero page case
          we will simplify our DAX PTE page fault sequence so that it matches
          our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
          We will instead use dax_iomap_fault() to handle write-protection
          faults.
      
          This means that insert_pfn() needs to follow the lead of
          insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
          'mkwrite' is set insert_pfn() will do the work that was previously
          done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"
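
      Put together, the hole-read fault path becomes roughly the following
      sketch (error handling and tracing elided; names follow the commit's
      description):

      static int dax_load_hole(struct address_space *mapping, void *entry,
      			 struct vm_fault *vmf)
      {
      	/* Every hole read maps the one shared zero page... */
      	struct page *zero_page = ZERO_PAGE(0);

      	/*
      	 * ...while the radix tree records a DAX exceptional entry
      	 * instead of a struct page pointer.
      	 */
      	entry = dax_insert_mapping_entry(mapping, vmf, entry, 0,
      					 RADIX_DAX_ZERO_PAGE);
      	if (IS_ERR(entry))
      		return VM_FAULT_SIGBUS;

      	vm_insert_mixed(vmf->vma, vmf->address, page_to_pfn_t(zero_page));
      	return VM_FAULT_NOPAGE;
      }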
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 05 Sep 2017, 1 commit
  19. 02 Sep 2017, 2 commits
  20. 06 Jul 2017, 1 commit
  21. 03 Jul 2017, 1 commit
  22. 28 Jun 2017, 1 commit
  23. 21 Jun 2017, 1 commit
  24. 20 Jun 2017, 1 commit
  25. 26 May 2017, 4 commits
  26. 28 Feb 2017, 1 commit
  27. 25 Feb 2017, 3 commits