1. 16 Nov 2017, 2 commits
  2. 09 Sep 2017, 1 commit
    • mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY · 2916ecc0
      Committed by Jérôme Glisse
      Introduce a new migration mode that allows offloading the copy to a device
      DMA engine.  This changes the migration workflow, and not every
      address_space migratepage callback can support it.
      
      This is intended to be used by migrate_vma(), which itself is used for things
      like HMM (see include/linux/hmm.h).
      
      No additional per-filesystem migratepage testing is needed.  I disabled
      MIGRATE_SYNC_NO_COPY in all problematic migratepage() callbacks and added
      comments there to explain why (part of this patch).  To be clear, any
      callback that wishes to support this new mode needs to be aware of how
      the migration flow differs from the other modes.
      
      Some of these callbacks do extra locking while copying (aio, zsmalloc,
      balloon, ...), and for DMA to be effective you want to copy multiple
      pages in one DMA operation.  But in the problematic cases you cannot
      easily hold the extra lock across multiple calls to this callback.
      
      Usual flow is:
      
      For each page {
       1 - lock page
       2 - call migratepage() callback
       3 - (extra locking in some migratepage() callback)
       4 - migrate page state (freeze refcount, update page cache, buffer
           head, ...)
       5 - copy page
       6 - (unlock any extra lock of migratepage() callback)
       7 - return from migratepage() callback
       8 - unlock page
      }
      
      The new mode MIGRATE_SYNC_NO_COPY:
       1 - lock multiple pages
      For each page {
       2 - call migratepage() callback
       3 - abort in all problematic migratepage() callback
       4 - migrate page state (freeze refcount, update page cache, buffer
           head, ...)
      } // finished all calls to migratepage() callback
       5 - DMA copy multiple pages
       6 - unlock all the pages
      
      To support MIGRATE_SYNC_NO_COPY in the problematic cases we would need a
      new callback, migratepages() for instance, that deals with multiple
      pages in one transaction.
      
      Because the problematic cases are not important for current usage, I did
      not want to complicate this patchset even further for no good reason.
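      A minimal userspace C sketch of the two flows described above (stub types
      and helper names such as migrate_state() and dma_copy() are illustrative
      only, not the actual kernel code):
      
      #include <stdio.h>
      #include <stdbool.h>
      
      #define NR_PAGES 4
      
      struct page { int id; bool locked; };
      
      /* Stubs standing in for the real migration steps. */
      static void lock_page(struct page *p)       { p->locked = true;  }
      static void unlock_page(struct page *p)     { p->locked = false; }
      static void migrate_state(struct page *p)   { printf("migrate state of page %d\n", p->id); }
      static void copy_page_sync(struct page *p)  { printf("CPU copy of page %d\n", p->id); }
      static void dma_copy(struct page *p, int n) { (void)p; printf("one DMA copy of %d pages\n", n); }
      
      /* Usual flow: lock, migrate state, and copy one page at a time. */
      static void migrate_usual(struct page *pages, int n)
      {
      	for (int i = 0; i < n; i++) {
      		lock_page(&pages[i]);
      		migrate_state(&pages[i]);	/* migratepage() callback */
      		copy_page_sync(&pages[i]);
      		unlock_page(&pages[i]);
      	}
      }
      
      /* MIGRATE_SYNC_NO_COPY: migrate state for all pages first, then issue a
       * single DMA copy covering all of them, then unlock.  Callbacks must not
       * rely on the copy happening inside the per-page call. */
      static void migrate_sync_no_copy(struct page *pages, int n)
      {
      	for (int i = 0; i < n; i++)
      		lock_page(&pages[i]);
      	for (int i = 0; i < n; i++)
      		migrate_state(&pages[i]);	/* no copy here */
      	dma_copy(pages, n);
      	for (int i = 0; i < n; i++)
      		unlock_page(&pages[i]);
      }
      
      int main(void)
      {
      	struct page pages[NR_PAGES] = { {0, false}, {1, false}, {2, false}, {3, false} };
      
      	migrate_usual(pages, NR_PAGES);
      	migrate_sync_no_copy(pages, NR_PAGES);
      	return 0;
      }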
      
      Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2916ecc0
  3. 08 Sep 2017, 1 commit
  4. 06 Sep 2017, 2 commits
  5. 24 Aug 2017, 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
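      A standalone C sketch of the idea (simplified illustrative types, not the
      kernel's struct bio or struct gendisk): the submission path carries a gendisk
      pointer plus a partition index, and partition remapping turns the pair into
      an absolute sector on the whole disk.
      
      #include <stdio.h>
      
      struct gendisk {
      	const char *name;
      	unsigned long long part_start[4];	/* start sector of each partition */
      };
      
      struct bio {
      	struct gendisk *bi_disk;	/* replaces the old bi_bdev block_device pointer */
      	unsigned int bi_partno;		/* partition index used for remapping */
      	unsigned long long bi_sector;
      };
      
      /* Model of the remapping done during submission: translate a
       * partition-relative sector into a whole-disk sector. */
      static void remap_to_disk(struct bio *bio)
      {
      	bio->bi_sector += bio->bi_disk->part_start[bio->bi_partno];
      	bio->bi_partno = 0;
      }
      
      int main(void)
      {
      	struct gendisk disk = { "vda", { 0, 2048, 1050624, 0 } };
      	struct bio bio = { &disk, 1, 100 };	/* sector 100 of partition 1 */
      
      	remap_to_disk(&bio);
      	printf("%s: absolute sector %llu\n", disk.name, bio.bi_sector);
      	return 0;
      }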
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
  6. 22 Aug 2017, 1 commit
  7. 10 Aug 2017, 1 commit
  8. 01 Aug 2017, 2 commits
    • f2fs: enhance on-disk inode structure scalability · 7a2af766
      Committed by Chao Yu
      This patch adds a new flag, F2FS_EXTRA_ATTR, stored in inode.i_inline,
      to indicate that the on-disk structure of the current inode has been extended.
      
      To support the extension, we changed the inode structure a bit:
      
      Original one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	__le32 i_addr[DEF_ADDRS_PER_INODE];
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Extended one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	union {
      		struct {
      			__le16 i_extra_isize;
      			__le16 i_padding;
      			__le32 i_extra_end[0];
      		};
      		__le32 i_addr[DEF_ADDRS_PER_INODE];
      	};
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Once F2FS_EXTRA_ATTR is set, we steal four bytes at the head of the
      i_addr field to store i_extra_isize and i_padding.  With i_extra_isize,
      we can calculate the actual size of the reserved space in i_addr; the
      attribute fields available in the total extra attribute area of the
      current inode can be described as below:
      
        +--------------------+
        | .i_mode            |
        | ...                |
        | .i_ext             |
        +--------------------+
        | .i_extra_isize     |-----+
        | .i_padding         |     |
        | .i_prjid           |     |
        | .i_atime_extra     |     |
        | .i_ctime_extra     |     |
        | .i_mtime_extra     |<----+
        | .i_inode_cs        |<----- store blkaddr/inline from here
        | .i_xattr_cs        |
        | ...                |
        +--------------------+
        |                    |
        |    block address   |
        |                    |
        +--------------------+
        | .i_nid             |
        +--------------------+
        |   node_footer      |
        | (nid, ino, offset) |
        +--------------------+
      
      Hence, with this patch, we enhance the scalability of the f2fs inode for
      storing newly added attributes.
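      A compilable sketch of the layout change (userspace types; the constant and
      the helper below are illustrative, not the real f2fs definitions):
      
      #include <stdio.h>
      #include <stdint.h>
      
      #define DEF_ADDRS_PER_INODE 923	/* i_addr slots, 4 bytes each */
      
      struct f2fs_inode_tail {
      	union {
      		struct {
      			uint16_t i_extra_isize;	/* total bytes reserved for extra attrs */
      			uint16_t i_padding;
      			uint32_t i_extra_end[0];
      		};
      		uint32_t i_addr[DEF_ADDRS_PER_INODE];
      	};
      };
      
      /* With F2FS_EXTRA_ATTR set, the first i_extra_isize bytes of i_addr hold
       * the extra attributes; block addresses start after them. */
      static unsigned int addr_slots_left(uint16_t i_extra_isize)
      {
      	return DEF_ADDRS_PER_INODE - i_extra_isize / sizeof(uint32_t);
      }
      
      int main(void)
      {
      	printf("i_addr slots without extra attrs: %u\n", addr_slots_left(0));
      	printf("i_addr slots with 36 extra bytes:  %u\n", addr_slots_left(36));
      	return 0;
      }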
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7a2af766
    • f2fs: make max inline size changeable · f2470371
      Committed by Chao Yu
      This patch makes the macros below calculate the max inline size and the
      inline dentry field sizes while taking the size-changeable reserved space
      into account:
      - MAX_INLINE_DATA
      - NR_INLINE_DENTRY
      - INLINE_DENTRY_BITMAP_SIZE
      - INLINE_RESERVED_SIZE
      
      Then, when the inline_{data,dentry} options are enabled, it allows us to
      flexibly reserve inline space of different sizes for newly introduced
      inode attributes.
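      A hedged sketch of how such size macros can be derived from the reserved
      space (simplified arithmetic with illustrative names, not the exact macros
      in f2fs.h):
      
      #include <stdio.h>
      
      #define DEF_ADDRS_PER_INODE 923	/* i_addr slots, 4 bytes each */
      
      /* Inline data space = i_addr area, minus the extra attribute area, minus
       * the size-changeable reserved space, minus one slot kept for bookkeeping. */
      static unsigned int max_inline_data(unsigned int extra_isize_bytes,
      				    unsigned int inline_reserved_bytes)
      {
      	return DEF_ADDRS_PER_INODE * 4 - extra_isize_bytes
      		- inline_reserved_bytes - 4;
      }
      
      int main(void)
      {
      	printf("no extra attrs, no reserve:   %u bytes\n", max_inline_data(0, 0));
      	printf("36B extra attrs, 4B reserve:  %u bytes\n", max_inline_data(36, 4));
      	return 0;
      }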
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f2470371
  9. 09 Jul 2017, 1 commit
    • f2fs: support plain user/group quota · 0abd675e
      Committed by Chao Yu
      This patch adds support for plain user/group quotas.
      
      Change Note by Jaegeuk Kim.
      
      - Use the f2fs page cache for quota files in order to account for garbage
        collection.  As a result, quota files cannot tolerate sudden power-cuts,
        so the user needs to run quotacheck.
      
      - setattr() calls dquot_transfer(), which will transfer inode->i_blocks.
        We can't reclaim that during f2fs_evict_inode().  So, we need to count
        node blocks as well in order to match i_blocks with dquot's space.
      
        Note that Chao wrote a patch to count inode->i_blocks without the inode block.
        (f2fs: don't count inode block in in-memory inode.i_blocks)
      
      - In f2fs_remount, we need to make the filesystem RW prior to dquot_resume.
      
      - Handle the fault_injection case during f2fs_quota_off_umount.
      
      - TODO: Project quota
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0abd675e
  10. 08 Jul 2017, 3 commits
  11. 04 Jul 2017, 1 commit
    • f2fs: dax: fix races between page faults and truncating pages · 5a3a2d83
      Committed by Qiuyang Sun
      Currently in F2FS, page faults and operations that truncate the pagecache
      or data blocks are completely unsynchronized.  This can result in a page
      fault bringing a page into a range that we are truncating, so we can end
      up with a page mapped to disk blocks that will shortly be freed.
      Filesystem corruption will shortly follow.
      
      This patch fixes the problem by creating a new rw semaphore, i_mmap_sem,
      in f2fs_inode_info and grabbing it in functions that remove blocks from the
      extent tree and in reads over page faults.  The mechanism is similar to ext4's.
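      A userspace sketch of the locking pattern (a pthread rwlock standing in for
      the kernel rw_semaphore; function names are illustrative): page faults take
      the lock shared, truncation takes it exclusive, so a fault can no longer map
      blocks that a concurrent truncate is about to free.
      
      #include <pthread.h>
      #include <stdio.h>
      
      /* Stand-in for the i_mmap_sem added to f2fs_inode_info. */
      static pthread_rwlock_t i_mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
      
      /* Fault path: shared (read) side, so many faults may proceed in parallel. */
      static void page_fault(long offset)
      {
      	pthread_rwlock_rdlock(&i_mmap_sem);
      	printf("fault: map page at offset %ld\n", offset);
      	pthread_rwlock_unlock(&i_mmap_sem);
      }
      
      /* Truncate path: exclusive (write) side, waits for in-flight faults and
       * blocks new ones while blocks are removed from the extent tree. */
      static void truncate_blocks(long new_size)
      {
      	pthread_rwlock_wrlock(&i_mmap_sem);
      	printf("truncate: drop blocks beyond offset %ld\n", new_size);
      	pthread_rwlock_unlock(&i_mmap_sem);
      }
      
      int main(void)
      {
      	page_fault(4096);
      	truncate_blocks(0);
      	return 0;
      }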
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      5a3a2d83
  12. 09 Jun 2017, 1 commit
  13. 24 May 2017, 4 commits
  14. 04 May 2017, 2 commits
  15. 03 May 2017, 3 commits
  16. 25 Apr 2017, 1 commit
    • f2fs: fix out-of free segments · a7881893
      Committed by Jaegeuk Kim
      This patch also reverts d0db7703 ("f2fs: do SSR in higher priority").
      
      This patch fixes running out of free segments caused by the bursty creation of
      many small files:
      1) mkfs -s 1 2G
      2) mount
      3) untar
       - produce 60000 small files in a burst
      4) sync
       - flush node pages
       - flush imeta
      
      Here, when we do f2fs_balance_fs, we missed the number of imeta blocks, which
      resulted in skipping the has_not_enough_free_secs check.
      
      Another test is done by
      1) mkfs -s 12 2G
      2) mount
      3) untar
       - produce 60000 small files in a burst
      4) sync
       - flush node pages
       - flush imeta
      
      In this case, this patch also fixes wrong block allocation under a large
      section size.
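      A hedged arithmetic sketch of the accounting problem (illustrative numbers and
      names, not the real f2fs helpers): if dirty imeta blocks are left out of the
      needed count, the free-section check can pass even though the device is about
      to run out of space.
      
      #include <stdio.h>
      #include <stdbool.h>
      
      #define BLOCKS_PER_SEC 512	/* illustrative per-section capacity */
      
      static bool has_not_enough_free_secs(unsigned int free_secs, unsigned int needed_blocks)
      {
      	return free_secs * BLOCKS_PER_SEC < needed_blocks;
      }
      
      int main(void)
      {
      	unsigned int free_secs = 2;
      	unsigned int dirty_nodes = 900, dirty_imeta = 300;
      
      	/* Old accounting: imeta blocks missed, so the shortage goes unnoticed. */
      	printf("without imeta: short of space? %d\n",
      	       has_not_enough_free_secs(free_secs, dirty_nodes));
      	/* Fixed accounting: imeta included, shortage is detected. */
      	printf("with imeta:    short of space? %d\n",
      	       has_not_enough_free_secs(free_secs, dirty_nodes + dirty_imeta));
      	return 0;
      }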
      Reported-by: William Brana <wbrana@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      a7881893
  17. 20 Apr 2017, 2 commits
    • f2fs: introduce async IPU policy · 04485987
      Committed by Hou Pengyang
      This patch introduces an ASYNC IPU policy.
      
      Under scenarios with a large number of async updates (e.g. log writing in
      Android), the disk would become seriously fragmented, and more frequent GC
      would be triggered.
      
      This patch uses IPU to rewrite the async update writes, since async writes
      are NOT sensitive to IO latency.
      Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
      04485987
    • f2fs: unlock cp_rwsem early for IPU writes · 001c584c
      Committed by Chao Yu
      For IPU writes, there won't be any updates in the dnode page, since we
      will reuse the old block address instead of allocating a new one, so we
      don't need to lock cp_rwsem during IPU IO submission.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      001c584c
  18. 11 Apr 2017, 1 commit
  19. 06 Apr 2017, 3 commits
  20. 25 Mar 2017, 1 commit
  21. 22 Mar 2017, 2 commits
    • f2fs: fix bad prefetchw of NULL page · a83d50bc
      Committed by Kinglong Mee
      For f2fs_read_data_pages, f2fs_mpage_readpages gets "page == NULL",
      so prefetchw(&page->flags) operates on a NULL pointer.
      
      Fixes: f1e88660 ("f2fs: expose f2fs_mpage_readpages")
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      a83d50bc
    • f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer · 8c242db9
      Committed by Jaegeuk Kim
      When I intentionally forced atomic operations to be enabled, I could hit the
      panic below, since we didn't clear page->private in f2fs_invalidate_page
      called by file truncation.
      
      The panic occurs because a page with a NULL mapping still carries page->private.
      
      BUG: unable to handle kernel paging request at ffffffffffffffff
      IP: drop_buffers+0x38/0xe0
      PGD 5d00c067
      PUD 5d00e067
      PMD 0
      CPU: 3 PID: 1648 Comm: fsstress Tainted: G      D    OE   4.10.0+ #5
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      task: ffff9151952863c0 task.stack: ffffaaec40db4000
      RIP: 0010:drop_buffers+0x38/0xe0
      RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292
      Call Trace:
       ? page_referenced+0x8b/0x170
       try_to_free_buffers+0xc5/0xe0
       try_to_release_page+0x49/0x50
       shrink_page_list+0x8bc/0x9f0
       shrink_inactive_list+0x1dd/0x500
       ? shrink_active_list+0x2c0/0x430
       shrink_node_memcg+0x5eb/0x7c0
       shrink_node+0xe1/0x320
       do_try_to_free_pages+0xef/0x2e0
       try_to_free_pages+0xe9/0x190
       __alloc_pages_slowpath+0x390/0xe70
       __alloc_pages_nodemask+0x291/0x2b0
       alloc_pages_current+0x95/0x140
       __page_cache_alloc+0xc4/0xe0
       pagecache_get_page+0xab/0x2a0
       grab_cache_page_write_begin+0x20/0x40
       get_read_data_page+0x2e6/0x4c0 [f2fs]
       ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
       ? truncate_data_blocks_range+0x238/0x2b0 [f2fs]
       get_lock_data_page+0x30/0x190 [f2fs]
       __exchange_data_block+0xaaf/0xf40 [f2fs]
       f2fs_fallocate+0x418/0xd00 [f2fs]
       vfs_fallocate+0x157/0x220
       SyS_fallocate+0x48/0x80
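      A userspace sketch of the fix's idea (stub types and a hypothetical helper,
      not the real f2fs code): when the page is invalidated by truncation, drop the
      atomic-write marker and clear page->private, so a later release/reclaim path
      never interprets the stale value as buffer heads.
      
      #include <stdio.h>
      #include <stddef.h>
      
      #define ATOMIC_WRITTEN_PAGE 0x1UL	/* illustrative marker value */
      
      struct page {
      	unsigned long private;	/* the stale value that caused the crash */
      	void *mapping;
      };
      
      /* Model of the invalidate path: clear the marker and the private field so a
       * later try_to_release_page()-like path sees a clean page. */
      static void invalidate_page(struct page *page)
      {
      	if (page->private == ATOMIC_WRITTEN_PAGE)
      		page->private = 0;	/* real code also drops the in-memory atomic entry */
      	page->mapping = NULL;
      }
      
      int main(void)
      {
      	struct page p = { ATOMIC_WRITTEN_PAGE, &p };
      
      	invalidate_page(&p);
      	printf("private after invalidate: %lu\n", p.private);
      	return 0;
      }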
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Chao Yu: use INMEM_INVALIDATE for better tracing]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      8c242db9
  22. 02 Mar 2017, 1 commit
  23. 28 Feb 2017, 3 commits