1. 05 12月, 2015 2 次提交
    • C
      f2fs: support file defragment · d323d005
      Chao Yu 提交于
      This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file
      defragment in a specified range of regular file.
      
      This ioctl can be used in very limited workload: if user expects high
      sequential read performance in randomly written file, this interface
      can be used for defragmentation, after that file can be written as
      continuous as possible in the device.
      
      Meanwhile, it has side-effect, it will make holes in segments where
      blocks located originally, so it's better to trigger GC to eliminate
      fragment in segments.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d323d005
    • C
      f2fs: commit atomic written page in LFS mode · 2da3e027
      Chao Yu 提交于
      We should always commit atomic written pages in LFS mode, otherwise data
      will become corrupted if we encounter suddent power cut after partial
      pages committed in IPU mode.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2da3e027
  2. 21 10月, 2015 2 次提交
  3. 14 10月, 2015 1 次提交
    • C
      f2fs crypto: fix racing of accessing encrypted page among · 08b39fbd
      Chao Yu 提交于
       different competitors
      
      Since we use different page cache (normally inode's page cache for R/W
      and meta inode's page cache for GC) to cache the same physical block
      which is belong to an encrypted inode. Writeback of these two page
      cache should be exclusive, but now we didn't handle writeback state
      well, so there may be potential racing problem:
      
      a)
      kworker:				f2fs_gc:
       - f2fs_write_data_pages
        - f2fs_write_data_page
         - do_write_data_page
          - write_data_page
           - f2fs_submit_page_mbio
      (page#1 in inode's page cache was queued
      in f2fs bio cache, and be ready to write
      to new blkaddr)
      					 - gc_data_segment
      					  - move_encrypted_block
      					   - pagecache_get_page
      					(page#2 in meta inode's page cache
      					was cached with the invalid datas
      					of physical block located in new
      					blkaddr)
      					   - f2fs_submit_page_mbio
      					(page#1 was submitted, later, page#2
      					with invalid data will be submitted)
      
      b)
      f2fs_gc:
       - gc_data_segment
        - move_encrypted_block
         - f2fs_submit_page_mbio
      (page#1 in meta inode's page cache was
      queued in f2fs bio cache, and be ready
      to write to new blkaddr)
      					user thread:
      					 - f2fs_write_begin
      					  - f2fs_submit_page_bio
      					(we submit the request to block layer
      					to update page#2 in inode's page cache
      					with physical block located in new
      					blkaddr, so here we may read gabbage
      					data from new blkaddr since GC hasn't
      					writebacked the page#1 yet)
      
      This patch fixes above potential racing problem for encrypted inode.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      08b39fbd
  4. 13 10月, 2015 3 次提交
  5. 10 10月, 2015 5 次提交
    • J
      f2fs: do not skip dentry block writes · 90b803e6
      Jaegeuk Kim 提交于
      Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
      pressure and the number of dirty pages is pretty small.
      
      But, we didn't skip for normal data writes, which gives us not much big impact
      on overall performance.
      Moreover, by skipping some data writes, kworker falls into infinite loop to try
      to write blocks, when many dir inodes have only one dentry block.
      
      So, this patch removes skipping data writes.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      90b803e6
    • C
      f2fs: use correct flag in f2fs_map_blocks() · 46c9e141
      Chao Yu 提交于
      We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc ("f2fs: fix
      incorrect mapping for bmap"), but forget to use this flag in the right
      place, fix it.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      46c9e141
    • C
      f2fs: fix to handle io error in ->direct_IO · f9811703
      Chao Yu 提交于
      Here is a oops reported as following message when testing generic/019 of
      xfstest:
      
       ------------[ cut here ]------------
       kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
       invalid opcode: 0000 [#1] SMP
       Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
      nf_def
       CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
       task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
       RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
       RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
       RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
       RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
       RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
       R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
       FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
       Stack:
        000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
        0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
        ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
       Call Trace:
        [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
        [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
        [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
        [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
        [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
        [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
        [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
        [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
        [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
        [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
        [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
        [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
        [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
        [<ffffffff8120b913>] do_io_submit+0x373/0x630
        [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
        [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
        [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
       Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
       RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
        RSP <ffff8803fd61f918>
       ---[ end trace 2e577d7f711ddb86 ]---
      
      The reason is that: in the test of generic/019, we will trigger a manmade
      IO error in block layer through debugfs, after that, prefree segment will
      no longer be freed, because we always skip doing gc or checkpoint when
      there occurs an IO error.
      
      Meanwhile fio with aio engine generated a large number of direct IOs,
      which continue allocating spaces in free segment until we run out of them,
      eventually, results in panic in new_curseg as no more free segment was
      found.
      
      So, this patch changes to return EIO in direct_IO for this condition.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9811703
    • C
      f2fs: reorganize f2fs_map_blocks · 973163fc
      Chao Yu 提交于
      In this patch, we try to reorganize f2fs_map_blocks to make block mapping
      flow more clear by using following structure:
      
      /* check status of mapping */
      
      if (unmapped) {
      	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */
      
      	if (create) {
      		/* write path, handle dio write case here */
      		alloc_and_map;
      	} else {
      		/*
      		 * handle read cases from all call paths:
      		 *     1. generic read;
      		 *     2. dio read;
      		 *     3. fiemap;
      		 *     4. bmap
      		 */
      	}
      }
      
      /* map buffer_header */
      
      Besides, this patch handles the missing case correctly for dio write:
      When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
      not allocate blocks correctly for preallocated blocks, but returning with
      an unmapped buffer head, which will result in failure of dio write.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      973163fc
    • C
      f2fs: fix overflow of size calculation · 9edcdabf
      Chao Yu 提交于
      We have potential overflow issue when calculating size of object, when
      we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
      32-bits space in 32-bit architecture, left shifting will incur overflow,
      i.e:
      
      pgoff_t index =  0xFFFFFFFF;
      loff_t size = index << PAGE_CACHE_SHIFT;
      size: 0xFFFFF000
      
      So we should cast index with 64-bits type to avoid this issue.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9edcdabf
  6. 22 8月, 2015 1 次提交
    • C
      f2fs: fix incorrect mapping for bmap · e2b4e2bc
      Chao Yu 提交于
      The test step is like below:
      1. touch file
      2. truncate -s $((1024*1024)) file
      3. fallocate -o 0 -l $((1024*1024)) file
      4. fibmap.f2fs file
      
      Our result of fibmap.f2fs showed below is not correct:
      
      file_pos   start_blk     end_blk        blks
             0    -937166132    -937166132           1
          4096    -937166132    -937166132           1
          8192    -937166132    -937166132           1
         12288    -937166132    -937166132           1
         16384    -937166132    -937166132           1
         20480    -937166132    -937166132           1
      ...
       1040384    -937166132    -937166132           1
       1044480    -937166132    -937166132           1
      
      This is because f2fs_map_blocks will return with no error when meeting
      a hole or preallocated block, the caller __get_data_block will map the
      uninitialized variable value to bh->b_blocknr.
      
      Unfortunately generic_block_bmap will neither check the return value of
      get_data() nor check mapping info of buffer_head, result in returning
      the random block address.
      
      After fixing the issue, our result shows correctly:
      
      file_pos   start_blk     end_blk        blks
             0           0           0         256
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e2b4e2bc
  7. 21 8月, 2015 1 次提交
    • J
      f2fs: handle failed bio allocation · 740432f8
      Jaegeuk Kim 提交于
      As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the
      same time. So, we can't guarantee that bio is allocated all the time.
      
      "
       *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
       *   able to allocate a bio. This is due to the mempool guarantees. To make this
       *   work, callers must never allocate more than 1 bio at a time from this pool.
       *   Callers that need to allocate more than 1 bio must always submit the
       *   previously allocated bio for IO before attempting to allocate a new one.
       *   Failure to do so can cause deadlocks under memory pressure.
      "
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      740432f8
  8. 14 8月, 2015 1 次提交
  9. 12 8月, 2015 2 次提交
    • C
      f2fs: remove inmem radix tree · decd36b6
      Chao Yu 提交于
      Previously, we use radix tree to index all registered page entries for
      atomic file, but now we only use radix tree to see whether current page
      is indexed or not, since the other user of radix tree is gone in commit
      042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages").
      
      So in this patch, we try to use one more efficient way:
      Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
      value to indicate page indexing status. By using this way, we can save
      memory and lookup time.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      decd36b6
    • C
      f2fs: report EINVAL for unalignment direct IO · c15e8599
      Chao Yu 提交于
      We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
      detail is as fallow:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432278894
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
      diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
      diotest4    5  TPASS  :  Read beyond the file size
      ......
      
      the result of ext4 with same environment:
      
      dio04
      
      <<<test_start>>>
      tag=dio04 stime=1432259643
      cmdline="diotest4"
      contacts=""
      analysis=exit
      <<<test_output>>>
      diotest4    1  TPASS  :  Negative Offset
      diotest4    2  TPASS  :  removed
      diotest4    3  TPASS  :  Odd count of read and write
      diotest4    4  TPASS  :  Read beyond the file size
      ......
      
      The reason is that when triggering DIO in f2fs, we will return zero value
      in ->direct_IO if writer's buffer offset, file offset and transfer size is
      not alignment to block size of filesystem, resulting in falling back into
      buffered write instead of returning -EINVAL.
      
      This patch fixes that problem by returning correct error number for above
      case, and removing the judgement condition in check_direct_IO to make sure
      the verification will be enabled for direct reader too.
      
      Besides, Jaegeuk Kim pointed out that there is expectional cases we should
      always make direct-io falling back into buffered write, such as dio in
      encrypted file.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      [Chao Yu make small change and add detail description in commit message]
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c15e8599
  10. 06 8月, 2015 1 次提交
  11. 05 8月, 2015 17 次提交
    • C
      f2fs: fix to release inode page correctly · 470f00e9
      Chao Yu 提交于
      In following call path, we will pass a locked and referenced ipage
      pointer to get_new_data_page:
       - init_inode_metadata
        - make_empty_dir
         - get_new_data_page
      
      There are two exit paths in get_new_data_page when error occurs:
      1) grab_cache_page fails, ipage will not be released;
      2) f2fs_reserve_block fails, ipage will be released in callee.
      
      So, it's not consistent for error handling in get_new_data_page.
      
      For f2fs_reserve_block, it's not very easy to change the rule
      of error handling, since it's already complicated.
      
      Here we deside to choose an easy way to fix this issue:
      If any error occur in get_new_data_page, we will ensure releasing
      ipage in this function.
      
      The same issue is in f2fs_convert_inline_dir, fix that too.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      470f00e9
    • F
      f2fs: change the timing of f2fs_wait_on_page_writeback · 5768dcdd
      Fan Li 提交于
      some backing devices need pages to be stable during writeback. It doesn't
      matter if
      the page is completely overwritten or already uptodate, it needs to wait
      before write.
      Signed-off-by: NFan li <fanofcode.li@samsung.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5768dcdd
    • C
      f2fs: skip writing in ->writepages when no dirty pages exist · 6a290544
      Chao Yu 提交于
      When flushing comes from background, if there is no dirty page in the
      mapping of inode, we'd better to skip seeking dirty page from mapping
      for writebacking.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6a290544
    • T
      f2fs: optimize f2fs_write_cache_pages · 737f1899
      Tiezhu Yang 提交于
      The if statement "goto continue_unlock" is exactly the same when
      each if condition is true that is depended on the value of both
      "step" and "is_cold_data(page)" are 0 or 1. That means when the
      value of "step" equals to "is_cold_data(page)", the if condition
      is true and the if statement "goto continue_unlock" appears only
      once, so it can be optimized to reduce the duplicated code.
      Signed-off-by: NTiezhu Yang <kernelpatch@126.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      737f1899
    • J
      f2fs: callers take care of the page from bio error · 86531d6b
      Jaegeuk Kim 提交于
      This patch changes for a caller to handle the page after its bio gets an error.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      86531d6b
    • C
      f2fs: expose f2fs_write_cache_pages · 8f46dcae
      Chao Yu 提交于
      If there are gced dirty pages and normal dirty pages in the mapping
      of one inode, we might writeback them alternately with discontinuous
      block address, resulting in low performance.
      
      This patch introduces f2fs_write_cache_pages with codes copied from
      write_cache_pages in mm/page-writeback.c.
      
      In this function, we refactor flow with two steps:
      1) writeback all cold type pages.
      2) writeback all non-cold type pages.
      
      By using this method, f2fs will writeback dirty pages with the same
      temperature in bunch mode, it makes writeouted block being with
      more continuous address, so they can be merged as much as possible
      in f2fs bio cache, and also it will reduce the chance of submiting
      small IO from block layer.
      
      Test environment: 8g nokia sd card (very old sd card, but it shows
      better effect when testing with this patch, and with a 32g kingston
      sd card, I didn't see much more improvement).
      
      Test step:
      1. touch testfile;
      2. truncate -s 512K testfile;
      3. write all pages with odd index;
      4. trigger gc by ioctl;
      5. write all pages with even index;
      6. time fsync testfile.
      
      before:
      real	0m0.402s
      user	0m0.000s
      sys	0m0.000s
      
      after:
      real	0m0.143s
      user	0m0.004s
      sys	0m0.004s
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8f46dcae
    • C
      f2fs: maintain extent cache in separated file · a28ef1f5
      Chao Yu 提交于
      This patch moves extent cache related code from data.c into extent_cache.c
      since extent cache is independent feature, and its codes are not relate to
      others in data.c, it's better for us to maintain them in separated place.
      
      There is no functionality change, but several small coding style fixes
      including:
      * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
      * rename misspelled word 'untill' to 'until';
      * remove unneeded 'return' in the end of f2fs_destroy_extent_tree().
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a28ef1f5
    • F
      f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN · 3c7df87d
      Fan Li 提交于
      Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will
      be kept in extent cache after split, extents already shorter than
      F2FS_MIN_EXTENT_LEN don't need to try split at all.
      Signed-off-by: NFan Li <fanofcode.li@samsung.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3c7df87d
    • C
      f2fs: fix to update page flag · 90d4388a
      Chao Yu 提交于
      This patch fixes to update page flag (e.g. Uptodate/cold flag) in
      ->write_begin.
      
      Otherwise, page will be non-uptodate when we try to write entire
      page, and cold data flag in page will not be clean when gced page
      is being rewritten.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      90d4388a
    • J
      f2fs: shrink unreferenced extent_caches first · 7023a1ad
      Jaegeuk Kim 提交于
      If an extent_tree entry has a zero reference count, we can drop it from the
      cache in higher priority rather than currently referencing entries.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7023a1ad
    • C
      f2fs: enhance multithread performance · bb96a8d5
      Chao Yu 提交于
      In ->writepages, we use writepages mutex lock to serialize all block
      address allocation and page submitting pairs from different inodes.
      This method makes our delayed dirty pages of one inode being written
      continously as many as possible.
      
      But there is one problem that we did not submit current cached bio in
      protection region of writepages mutex lock, so there is a small chance
      that we submit the one of other thread's as below, resulting in
      splitting more bios.
      
      thread 1			thread 2
      ->writepages
        lock(writepages)
        ->write_cache_pages
        unlock(writepages)
      				  lock(writepages)
      				  ->write_cache_pages
        ->f2fs_submit_merged_bio
      				    ->writepage
      				  unlock(writepages)
      
      fs_mark-6535  [002] ....  2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
      fs_mark-6536  [000] ....  2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
      fs_mark-6536  [000] ....  2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
      fs_mark-6535  [002] ....  2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096
      
      This may really increase time of block layer works, and may cause
      larger IO lantency.
      
      This patch moves the submitting operation into region of writepages
      mutex lock to avoid bio splits when concurrently writebacking is
      intensive.
      
      my test environment: virtual machine,
      intel cpu i5 2500, 8GB size memory, 4GB size ramdisk
      
      time fs_mark  -t  16  -L  1  -s  524288  -S  1  -d  /mnt/f2fs/
      
      before:
      real	0m4.244s
      user	0m0.088s
      sys	0m12.336s
      
      after:
      real	0m3.822s
      user	0m0.072s
      sys	0m10.760s
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bb96a8d5
    • J
      f2fs: check the largest extent at look-up time · 84bc926c
      Jaegeuk Kim 提交于
      Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee
      that the largest extent would be cached in the tree all the time.
      
      Instead of relying on extent_tree, we can simply check the cached one in extent
      tree accordingly.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      84bc926c
    • J
      f2fs: use extent_cache by default · 3e72f721
      Jaegeuk Kim 提交于
      We don't need to handle the duplicate extent information.
      
      The integrated rule is:
       - update on-disk extent with largest one tracked by in-memory extent_cache
       - destroy extent_tree for the truncation case
       - drop per-inode extent_cache by shrinker
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3e72f721
    • J
      f2fs: shrink extent_cache entries · 554df79e
      Jaegeuk Kim 提交于
      This patch registers shrinking extent_caches.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      554df79e
    • J
      f2fs: set cached_en after checking finally · 244f4fc1
      Jaegeuk Kim 提交于
      This patch relocates cached_en not only to be covered by spin_lock, but also
      to set once after checking out completely.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      244f4fc1
    • J
      f2fs: update on-disk extents even under extent_cache · cbe91923
      Jaegeuk Kim 提交于
      Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the
      time, and then finally preserves its up-to-date extent into on-disk one during
      f2fs_evict_inode.
      
      But, in the following scenario:
      
      1. mount
      2. open & write an extent X
      3. f2fs_evict_inode; on-disk extent is X
      4. open & update the extent X with Y
      5. sync; trigger checkpoint
      6. power-cut
      
      after power-on, f2fs should serve extent Y, but we have an on-disk extent X.
      
      This causes a failure on xfstests/311.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      cbe91923
    • J
      f2fs: fix wrong block address calculation for a split extent · 7a2cb678
      Jaegeuk Kim 提交于
      This patch fixes wrong calculation on block address field when an extent is
      split.
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7a2cb678
  12. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  13. 25 7月, 2015 1 次提交
    • J
      f2fs: call set_page_dirty to attach i_wb for cgroup · 6282adbf
      Jaegeuk Kim 提交于
      The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
      is called, __inc_wb_stat() updates i_wb's stat.
      
      So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
      any writebacking pages.
      
      This patch should resolve the following kernel panic reported by Andreas Reis.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=101801
      
      --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
      PGD 2951ff067 PUD 2df43f067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
      Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
      T01 02/03/2015
      task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
      RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
      __percpu_counter_add+0x1a/0x90
      RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
      RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
      RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
      RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
      R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
      R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
      FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
       ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
       0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
      Call Trace:
       [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
       [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
       [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
       [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
       [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
       [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
       [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
       [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
       [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
       [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
       [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
      Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
      89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
      65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
      RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
       RSP <ffff880295143ac8>
      CR2: 00000000000000a8
      ---[ end trace 5132449a58ed93a3 ]---
      note: gcc[10356] exited with preempt_count 2
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6282adbf
  14. 02 6月, 2015 1 次提交
  15. 29 5月, 2015 1 次提交