1. 29 4月, 2016 1 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 18 2月, 2016 1 次提交
  4. 04 2月, 2016 1 次提交
  5. 18 12月, 2015 2 次提交
  6. 07 12月, 2015 6 次提交
  7. 03 12月, 2015 4 次提交
  8. 22 10月, 2015 4 次提交
  9. 17 2月, 2015 1 次提交
  10. 22 1月, 2015 1 次提交
    • D
      btrfs: switch extent_state state to unsigned · 9ee49a04
      David Sterba 提交于
      Currently there's a 4B hole in the structure between refs and state and there
      are only 16 bits used so we can make it unsigned. This will get a better
      packing and may save some stack space for local variables.
      
      The size of extent_state gets reduced by 8B and there are usually a lot
      of slab objects.
      
      struct extent_state {
      	u64                        start;                /*     0     8 */
      	u64                        end;                  /*     8     8 */
      	struct rb_node             rb_node;              /*    16    24 */
      	wait_queue_head_t          wq;                   /*    40    24 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	atomic_t                   refs;                 /*    64     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	long unsigned int          state;                /*    72     8 */
      	u64                        private;              /*    80     8 */
      
      	/* size: 88, cachelines: 2, members: 7 */
      	/* sum members: 84, holes: 1, sum holes: 4 */
      	/* last cacheline: 24 bytes */
      };
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      9ee49a04
  11. 13 12月, 2014 2 次提交
  12. 21 11月, 2014 1 次提交
    • F
      Btrfs: set page and mapping error on compressed write failure · 704de49d
      Filipe Manana 提交于
      If we fail in submit_compressed_extents() before calling btrfs_submit_compressed_write(),
      we start and end the writeback for the pages (clear their dirty flag, unlock them, etc)
      but we don't tag the pages, nor the inode's mapping, with an error. This makes it
      impossible for a caller of filemap_fdatawait_range() (fsync, or transaction commit
      for e.g.) know that there was an error.
      
      Note that the return value of submit_compressed_extents() is useless, as that function
      is executed by a workqueue task and not directly by the fill_delalloc callback. This
      means the writepage/s callbacks of the inode's address space operations don't get that
      return value.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      704de49d
  13. 08 10月, 2014 1 次提交
  14. 04 10月, 2014 1 次提交
    • F
      Btrfs: be aware of btree inode write errors to avoid fs corruption · 656f30db
      Filipe Manana 提交于
      While we have a transaction ongoing, the VM might decide at any time
      to call btree_inode->i_mapping->a_ops->writepages(), which will start
      writeback of dirty pages belonging to btree nodes/leafs. This call
      might return an error or the writeback might finish with an error
      before we attempt to commit the running transaction. If this happens,
      we might have no way of knowing that such error happened when we are
      committing the transaction - because the pages might no longer be
      marked dirty nor tagged for writeback (if a subsequent modification
      to the extent buffer didn't happen before the transaction commit) which
      makes filemap_fdata[write|wait]_range unable to find such pages (even
      if they're marked with SetPageError).
      So if this happens we must abort the transaction, otherwise we commit
      a super block with btree roots that point to btree nodes/leafs whose
      content on disk is invalid - either garbage or the content of some
      node/leaf from a past generation that got cowed or deleted and is no
      longer valid (for this later case we end up getting error messages like
      "parent transid verify failed on 10826481664 wanted 25748 found 29562"
      when reading btree nodes/leafs from disk).
      
      Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
      i_mapping would not be enough because we need to distinguish between
      log tree extents (not fatal) vs non-log tree extents (fatal) and
      because the next call to filemap_fdatawait_range() will catch and clear
      such errors in the mapping - and that call might be from a log sync and
      not from a transaction commit, which means we would not know about the
      error at transaction commit time. Also, checking for the eb flag
      EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
      not be completely reliable, as the eb might be removed from memory and
      read back when trying to get it, which clears that flag right before
      reading the eb's pages from disk, making us not know about the previous
      write error.
      
      Using the new 3 flags for the btree inode also makes us achieve the
      goal of AS_EIO/AS_ENOSPC when writepages() returns success, started
      writeback for all dirty pages and before filemap_fdatawait_range() is
      called, the writeback for all dirty pages had already finished with
      errors - because we were not using AS_EIO/AS_ENOSPC,
      filemap_fdatawait_range() would return success, as it could not know
      that writeback errors happened (the pages were no longer tagged for
      writeback).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      656f30db
  15. 02 10月, 2014 2 次提交
  16. 18 9月, 2014 7 次提交
    • M
      Btrfs: cleanup the read failure record after write or when the inode is freeing · f612496b
      Miao Xie 提交于
      After the data is written successfully, we should cleanup the read failure record
      in that range because
      - If we set data COW for the file, the range that the failure record pointed to is
        mapped to a new place, so it is invalid.
      - If we set no data COW for the file, and if there is no error during writting,
        the corrupted data is corrected, so the failure record can be removed. And if
        some errors happen on the mirrors, we also needn't worry about it because the
        failure record will be recreated if we read the same place again.
      
      Sometimes, we may fail to correct the data, so the failure records will be left
      in the tree, we need free them when we free the inode or the memory leak happens.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f612496b
    • M
      Btrfs: implement repair function when direct read fails · 8b110e39
      Miao Xie 提交于
      This patch implement data repair function when direct read fails.
      
      The detail of the implementation is:
      - When we find the data is not right, we try to read the data from the other
        mirror.
      - When the io on the mirror ends, we will insert the endio work into the
        dedicated btrfs workqueue, not common read endio workqueue, because the
        original endio work is still blocked in the btrfs endio workqueue, if we
        insert the endio work of the io on the mirror into that workqueue, deadlock
        would happen.
      - After we get right data, we write it back to the corrupted mirror.
      - And if the data on the new mirror is still corrupted, we will try next
        mirror until we read right data or all the mirrors are traversed.
      - After the above work, we set the uptodate flag according to the result.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8b110e39
    • M
      Btrfs: modify clean_io_failure and make it suit direct io · 1203b681
      Miao Xie 提交于
      We could not use clean_io_failure in the direct IO path because it got the
      filesystem information from the page structure, but the page in the direct
      IO bio didn't have the filesystem information in its structure. So we need
      modify it and pass all the information it need by parameters.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1203b681
    • M
      Btrfs: modify repair_io_failure and make it suit direct io · ffdd2018
      Miao Xie 提交于
      The original code of repair_io_failure was just used for buffered read,
      because it got some filesystem data from page structure, it is safe for
      the page in the page cache. But when we do a direct read, the pages in bio
      are not in the page cache, that is there is no filesystem data in the page
      structure. In order to implement direct read data repair, we need modify
      repair_io_failure and pass all filesystem data it need by function
      parameters.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ffdd2018
    • M
      Btrfs: split bio_readpage_error into several functions · 2fe6303e
      Miao Xie 提交于
      The data repair function of direct read will be implemented later, and some code
      in bio_readpage_error will be reused, so split bio_readpage_error into
      several functions which will be used in direct read repair later.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2fe6303e
    • F
      Btrfs: shrink further sizeof(struct extent_buffer) · 2a39e598
      Filipe Manana 提交于
      The map_start and map_len fields aren't used anywhere, so just remove
      them. On a x86_64 system, this reduced sizeof(struct extent_buffer)
      from 296 bytes to 280 bytes, and therefore 14 extent_buffer structs can
      now fit into a page instead of 13.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      2a39e598
    • F
      Btrfs: reduce size of struct extent_state · 27a3507d
      Filipe Manana 提交于
      The tree field of struct extent_state was only used to figure out if
      an extent state was connected to an inode's io tree or not. For this
      we can just use the rb_node field itself.
      
      On a x86_64 system with this change the sizeof(struct extent_state) is
      reduced from 96 bytes down to 88 bytes, meaning that with a page size
      of 4096 bytes we can now store 46 extent states per page instead of 42.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      27a3507d
  17. 20 6月, 2014 1 次提交
  18. 13 6月, 2014 1 次提交
  19. 10 6月, 2014 1 次提交
    • J
      Btrfs: add sanity tests for new qgroup accounting code · faa2dbf0
      Josef Bacik 提交于
      This exercises the various parts of the new qgroup accounting code.  We do some
      basic stuff and do some things with the shared refs to make sure all that code
      works.  I had to add a bunch of infrastructure because I needed to be able to
      insert items into a fake tree without having to do all the hard work myself,
      hopefully this will be usefull in the future.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      faa2dbf0
  20. 07 4月, 2014 1 次提交
    • J
      Btrfs: don't clear uptodate if the eb is under IO · a26e8c9f
      Josef Bacik 提交于
      So I have an awful exercise script that will run snapshot, balance and
      send/receive in parallel.  This sometimes would crash spectacularly and when it
      came back up the fs would be completely hosed.  Turns out this is because of a
      bad interaction of balance and send/receive.  Send will hold onto its entire
      path for the whole send, but its blocks could get relocated out from underneath
      it, and because it doesn't old tree locks theres nothing to keep this from
      happening.  So it will go to read in a slot with an old transid, and we could
      have re-allocated this block for something else and it could have a completely
      different transid.  But because we think it is invalid we clear uptodate and
      re-read in the block.  If we do this before we actually write out the new block
      we could write back stale data to the fs, and boom we're screwed.
      
      Now we definitely need to fix this disconnect between send and balance, but we
      really really need to not allow ourselves to accidently read in stale data over
      new data.  So make sure we check if the extent buffer is not under io before
      clearing uptodate, this will kick back EIO to the caller instead of reading in
      stale data and keep us from corrupting the fs.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a26e8c9f