1. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  2. 07 9月, 2017 4 次提交
  3. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  4. 11 7月, 2017 1 次提交
  5. 06 7月, 2017 2 次提交
  6. 05 7月, 2017 1 次提交
    • A
      fs: generic_block_bmap(): initialize all of the fields in the temp bh · 2a527d68
      Alexander Potapenko 提交于
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports the
      use of uninitialized memory in ext4_update_bh_state():
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 3 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0-rc6+ #597
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008
       ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8
       ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f30856>] dump_stack+0xa6/0xc0 lib/dump_stack.c:51
       [<ffffffff812fc1fc>] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:?
       [<ffffffff812fc33b>] __msan_warning+0x2b/0x40 ??:?
       [<     inline     >] ext4_update_bh_state fs/ext4/inode.c:727
       [<ffffffff8169742a>] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759
       [<ffffffff81696d4c>] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769
       [<ffffffff814a2d36>] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991
       [<ffffffff816ca30e>] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177
      ...
      origin description: ----tmp@generic_block_bmap
      ==================================================================
      
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      The local |tmp| is created in generic_block_bmap() and then passed into
      ext4_bmap() => ext4_get_block() => _ext4_get_block() =>
      ext4_update_bh_state(). Along the way tmp.b_page is never initialized
      before ext4_update_bh_state() checks its value.
      
      [ Use the approach suggested by Kees Cook of initializing the whole bh
        structure.]
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2a527d68
  7. 03 7月, 2017 1 次提交
    • A
      vfs: Add page_cache_seek_hole_data helper · 334fd34d
      Andreas Gruenbacher 提交于
      Both ext4 and xfs implement seeking for the next hole or piece of data
      in unwritten extents by scanning the page cache, and both versions share
      the same bug when iterating the buffers of a page: the start offset into
      the page isn't taken into account, so when a page fits more than two
      filesystem blocks, things will go wrong.  For example, on a filesystem
      with a block size of 1k, the following command will fail:
      
        xfs_io -f -c "falloc 0 4k" \
                  -c "pwrite 1k 1k" \
                  -c "pwrite 3k 1k" \
                  -c "seek -a -r 0" foo
      
      In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
      SEEK_DATA) will return the correct result.
      
      Introduce a generic vfs helper for seeking in the page cache that gets
      this right.  The next commits will replace the filesystem specific
      implementations.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      [hch: dropped the export]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      334fd34d
  8. 28 6月, 2017 1 次提交
  9. 09 6月, 2017 1 次提交
  10. 09 5月, 2017 1 次提交
  11. 27 4月, 2017 1 次提交
    • E
      fs: remove _submit_bh() · 020c2833
      Eric Biggers 提交于
      _submit_bh() allowed submitting a buffer_head for I/O using custom
      bio_flags.  It used to be used by jbd to set BIO_SNAP_STABLE, introduced
      by commit 71368511 ("mm: make snapshotting pages for stable writes a
      per-bio operation").  However, the code and flag has since been removed
      and no _submit_bh() users remain.
      
      These days, bio_flags are mostly used internally by the block layer to
      track the state of bio's.  As such, it doesn't really make sense for
      filesystems to use them instead of op_flags when wanting special
      behavior for block requests.
      
      Therefore, remove _submit_bh() and trim the bio_flags argument from
      submit_bh_wbc().
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      020c2833
  12. 02 3月, 2017 1 次提交
  13. 28 2月, 2017 1 次提交
  14. 03 1月, 2017 1 次提交
  15. 10 11月, 2016 1 次提交
  16. 05 11月, 2016 3 次提交
  17. 03 11月, 2016 1 次提交
  18. 01 11月, 2016 1 次提交
  19. 28 10月, 2016 1 次提交
    • C
      block: better op and flags encoding · ef295ecf
      Christoph Hellwig 提交于
      Now that we don't need the common flags to overflow outside the range
      of a 32-bit type we can encode them the same way for both the bio and
      request fields.  This in addition allows us to place the operation
      first (and make some room for more ops while we're at it) and to
      stop having to shift around the operation values.
      
      In addition this allows passing around only one value in the block layer
      instead of two (and eventuall also in the file systems, but we can do
      that later) and thus clean up a lot of code.
      
      Last but not least this allows decreasing the size of the cmd_flags
      field in struct request to 32-bits.  Various functions passing this
      value could also be updated, but I'd like to avoid the churn for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ef295ecf
  20. 12 10月, 2016 1 次提交
  21. 28 9月, 2016 1 次提交
  22. 21 7月, 2016 1 次提交
  23. 27 6月, 2016 1 次提交
    • B
      fs: export __block_write_full_page · b4bba389
      Benjamin Marzinski 提交于
      gfs2 needs to be able to skip the check to see if a page is outside of
      the file size when writing it out. gfs2 can get into a situation where
      it needs to flush its in-memory log to disk while a truncate is in
      progress. If the file being trucated has data journaling enabled, it is
      possible that there are data blocks in the log that are past the end of
      the file. gfs can't finish the log flush without either writing these
      blocks out or revoking them. Otherwise, if the node crashed, it could
      overwrite subsequent changes made by other nodes in the cluster when
      it's journal was replayed.
      
      Unfortunately, there is no way to add log entries to the log during a
      flush. So gfs2 simply writes out the page instead. This situation can
      only occur when the truncate code still has the file locked exclusively,
      and hasn't marked this block as free in the metadata (which happens
      later in truc_dealloc).  After gfs2 writes this page out, the truncation
      code will shortly invalidate it and write out any revokes if necessary.
      
      In order to make this work, gfs2 needs to be able to skip the check for
      writes outside the file size. Since the check exists in
      block_write_full_page, this patch exports __block_write_full_page, which
      doesn't have the check.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      b4bba389
  24. 21 6月, 2016 1 次提交
    • C
      fs: introduce iomap infrastructure · ae259a9c
      Christoph Hellwig 提交于
      Add infrastructure for multipage buffered writes.  This is implemented
      using an main iterator that applies an actor function to a range that
      can be written.
      
      This infrastucture is used to implement a buffered write helper, one
      to zero file ranges and one to implement the ->page_mkwrite VM
      operations.  All of them borrow a fair amount of code from fs/buffers.
      for now by using an internal version of __block_write_begin that
      gets passed an iomap and builds the corresponding buffer head.
      
      The file system is gets a set of paired ->iomap_begin and ->iomap_end
      calls which allow it to map/reserve a range and get a notification
      once the write code is finished with it.
      
      Based on earlier code from Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ae259a9c
  25. 08 6月, 2016 3 次提交
  26. 20 5月, 2016 1 次提交
    • M
      mm, page_alloc: avoid looking up the first zone in a zonelist twice · c33d6c06
      Mel Gorman 提交于
      The allocator fast path looks up the first usable zone in a zonelist and
      then get_page_from_freelist does the same job in the zonelist iterator.
      This patch preserves the necessary information.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              fastmark-v1r20             initonce-v1r20
        Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
        Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
        Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
        Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
        Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
        Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
        Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
        Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
        Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)
      
      The benefit is negligible and the results are within the noise but each
      cycle counts.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c33d6c06
  27. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  28. 16 3月, 2016 2 次提交
  29. 07 1月, 2016 1 次提交
  30. 11 11月, 2015 1 次提交
    • R
      vfs: remove unused wrapper block_page_mkwrite() · 5c500029
      Ross Zwisler 提交于
      The function currently called "__block_page_mkwrite()" used to be called
      "block_page_mkwrite()" until a wrapper for this function was added by:
      
      commit 24da4fab ("vfs: Create __block_page_mkwrite() helper passing
      	error values back")
      
      This wrapper, the current "block_page_mkwrite()", is currently unused.
      __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.
      
      Remove the unused wrapper, rename __block_page_mkwrite() back to
      block_page_mkwrite() and update the comment above block_page_mkwrite().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5c500029
  31. 07 11月, 2015 1 次提交