1. 29 1月, 2017 4 次提交
  2. 12 1月, 2017 1 次提交
    • D
      block: Rename blk_queue_zone_size and bdev_zone_size · f99e8648
      Damien Le Moal 提交于
      All block device data fields and functions returning a number of 512B
      sectors are by convention named xxx_sectors while names in the form
      xxx_size are generally used for a number of bytes. The blk_queue_zone_size
      and bdev_zone_size functions were not following this convention so rename
      them.
      
      No functional change is introduced by this patch.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      
      Collapsed the two patches, they were nonsensically split and broke
      bisection.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f99e8648
  3. 13 12月, 2016 1 次提交
  4. 12 12月, 2016 2 次提交
  5. 09 12月, 2016 1 次提交
  6. 08 12月, 2016 3 次提交
  7. 06 12月, 2016 2 次提交
  8. 30 11月, 2016 2 次提交
  9. 29 11月, 2016 1 次提交
  10. 26 11月, 2016 17 次提交
    • A
      f2fs: fix 32-bit build · 19c52651
      Arnd Bergmann 提交于
      The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
      on 32-bit machines because of a 64-bit division:
      
      fs/f2fs/f2fs.o: In function `__issue_discard_async':
      extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'
      
      Fortunately, bdev_zone_size() is guaranteed to return a power-of-two
      number, so we can replace the % operator with a cheaper bit mask.
      
      Fixes: 792b84b74b54 ("f2fs: support multiple devices")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      19c52651
    • N
      f2fs: set ->owner for debugfs status file's file_operations · 05e6ea26
      Nicolai Stange 提交于
      The struct file_operations instance serving the f2fs/status debugfs file
      lacks an initialization of its ->owner.
      
      This means that although that file might have been opened, the f2fs module
      can still get removed. Any further operation on that opened file, releasing
      included,  will cause accesses to unmapped memory.
      
      Indeed, Mike Marshall reported the following:
      
        BUG: unable to handle kernel paging request at ffffffffa0307430
        IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
        <...>
        Call Trace:
         [] __fput+0xdf/0x1d0
         [] ____fput+0xe/0x10
         [] task_work_run+0x8e/0xc0
         [] do_exit+0x2ae/0xae0
         [] ? __audit_syscall_entry+0xae/0x100
         [] ? syscall_trace_enter+0x1ca/0x310
         [] do_group_exit+0x44/0xc0
         [] SyS_exit_group+0x14/0x20
         [] do_syscall_64+0x61/0x150
         [] entry_SYSCALL64_slow_path+0x25/0x25
        <...>
        ---[ end trace f22ae883fa3ea6b8 ]---
        Fixing recursive fault but reboot is needed!
      
      Fix this by initializing the f2fs/status file_operations' ->owner with
      THIS_MODULE.
      
      This will allow debugfs to grab a reference to the f2fs module upon any
      open on that file, thus preventing it from getting removed.
      
      Fixes: 902829aa ("f2fs: move proc files to debugfs")
      Reported-by: NMike Marshall <hubcap@omnibond.com>
      Reported-by: NMartin Brandenburg <martin@omnibond.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      05e6ea26
    • C
      f2fs: fix incorrect free inode count in ->statfs · b08b12d2
      Chao Yu 提交于
      While calculating inode count that we can create at most in the left space,
      we should consider space which data/node blocks occupied, since we create
      data/node mixly in main area. So fix the wrong calculation in ->statfs.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b08b12d2
    • G
      f2fs: drop duplicate header timer.h · b4ceec29
      Geliang Tang 提交于
      Drop duplicate header timer.h from segment.c.
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b4ceec29
    • J
      f2fs: fix wrong AUTO_RECOVER condition · 97dd26ad
      Jaegeuk Kim 提交于
      If i_size is not aligned to the f2fs's block size, we should not skip inode
      update during fsync.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      97dd26ad
    • J
      f2fs: do not recover i_size if it's valid · 3a3a5ead
      Jaegeuk Kim 提交于
      If i_size is already valid during roll_forward recovery, we should not update
      it according to the block alignment.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3a3a5ead
    • C
      f2fs: fix fdatasync · 281518c6
      Chao Yu 提交于
      For below two cases, we can't guarantee data consistence:
      
      a)
      1. xfs_io "pwrite 0 4195328" "fsync"
      2. xfs_io "pwrite 4195328 1024" "fdatasync"
      3. godown
      4. umount & mount
      --> isize we updated before fdatasync won't be recovered
      
      b)
      1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
      2. xfs_io "fpunch 4194304 4096" "fdatasync"
      3. godown
      4. umount & mount
      --> dnode we punched before fdatasync won't be recovered
      
      The reason is that normally fdatasync won't be aware of modification
      of metadata in file, e.g. isize changing, dnode updating, so in ->fsync
      we will skip flushing node pages for above cases, result in making
      fdatasynced file being lost during recovery.
      
      Currently we have introduced DIRTY_META global list in sbi for tracking
      dirty inode selectively, so in fdatasync we can choose to flush nodes
      depend on dirty state of current inode in the list.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      281518c6
    • C
      f2fs: fix to account total free nid correctly · 04d47e67
      Chao Yu 提交于
      Thread A		Thread B		Thread C
      - f2fs_create
       - f2fs_new_inode
        - f2fs_lock_op
         - alloc_nid
          alloc last nid
        - f2fs_unlock_op
      			- f2fs_create
      			 - f2fs_new_inode
      			  - f2fs_lock_op
      			   - alloc_nid
      			    as node count still not
      			    be increased, we will
      			    loop in alloc_nid
      						- f2fs_write_node_pages
      						 - f2fs_balance_fs_bg
      						  - f2fs_sync_fs
      						   - write_checkpoint
      						    - block_operations
      						     - f2fs_lock_all
       - f2fs_lock_op
      
      While creating new inode, we do not allocate and account nid atomically,
      so that when there is almost no free nids left, we may encounter deadloop
      like above stack.
      
      In order to avoid that, reuse nm_i::available_nids for accounting free nids
      and make nid allocation and counting being atomical during node creation.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      04d47e67
    • Y
      f2fs: fix an infinite loop when flush nodes in cp · d40a43af
      Yunlei He 提交于
      Thread A			Thread B
      
      - write_checkpoint
       - block_operations
         -blk_start_plug
          -sync_node_pages		- f2fs_do_sync_file
      				 - fsync_node_pages
      				  - f2fs_wait_on_page_writeback
      
      Thread A wait for global F2FS_DIRTY_NODES decreased to zero,
      it start a plug list, some requests have been added to this list.
      Thread B lock one dirty node page, and wait this page write back.
      But this page has been in plug list of thread A with PG_writeback flag.
      Thread A keep on running and its plug list has no chance to finish,
      so it seems a deadlock between cp and fsync path.
      
      This patch add a wait on page write back before set node page dirty
      to avoid this problem.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NPengyang Hou <houpengyang@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d40a43af
    • C
      f2fs: don't wait writeback for datas during checkpoint · 36951b38
      Chao Yu 提交于
      Normally, while committing checkpoint, we will wait on all pages to be
      writebacked no matter the page is data or metadata, so in scenario where
      there are lots of data IO being submitted with metadata, we may suffer
      long latency for waiting writeback during checkpoint.
      
      Indeed, we only care about persistence for pages with metadata, but not
      pages with data, as file system consistent are only related to metadate,
      so in order to avoid encountering long latency in above scenario, let's
      recognize and reference metadata in submitted IOs, wait writeback only
      for metadatas.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      36951b38
    • J
      f2fs: fix wrong written_valid_blocks counting · c79b7ff1
      Jaegeuk Kim 提交于
      Previously, written_valid_blocks was got by ckpt->valid_block_count. But if
      the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value.
      Fix it to get the number from actual written block count from sit entries.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c79b7ff1
    • J
      f2fs: avoid BG_GC in f2fs_balance_fs · 7702bdbe
      Jaegeuk Kim 提交于
      If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
      time, all the threads would do FG_GC or BG_GC.
      In this critical path, we totally don't need to do BG_GC at all.
      Let's avoid that.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7702bdbe
    • J
      f2fs: fix redundant block allocation · c040ff9d
      Jaegeuk Kim 提交于
      In direct_IO path of f2fs_file_write_iter(),
      1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
         -> allocate LBA X
      2. f2fs_direct_IO()
         -> return 0;
      
      Then,
      f2fs_write_data_page() will allocate another LBA X+1.
      
      This makes EIO triggered by HM-SMR.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c040ff9d
    • J
      f2fs: use err for f2fs_preallocate_blocks · a7de6086
      Jaegeuk Kim 提交于
      This patch has no functional change.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a7de6086
    • J
      f2fs: support multiple devices · 3c62be17
      Jaegeuk Kim 提交于
      This patch implements multiple devices support for f2fs.
      Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
      volume under one f2fs instance.
      
      Internal block management is very simple, but we will modify block allocation
      and background GC policy to boost IO speed by exploiting them accoording to
      each device speed.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3c62be17
    • J
      f2fs: allow dio read for LFS mode · e57e9ae5
      Jaegeuk Kim 提交于
      We can allow dio reads for LFS mode, while doing buffered writes for dio writes.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e57e9ae5
    • J
      f2fs: revert segment allocation for direct IO · 6ae1be13
      Jaegeuk Kim 提交于
      Now we don't need to be too much careful about storage alignment for dio, since
      its speed becomes quite fast and we'd better avoid any misalignment first.
      
      Revert: 38aa0889 (f2fs: align direct_io'ed data to section)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6ae1be13
  11. 24 11月, 2016 6 次提交
    • Y
      f2fs: return directly if block has been removed from the victim · 20614711
      Yunlei He 提交于
      If one block has been to written to a new place, just return
      in move data process. This patch check it again with holding
      page lock.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      20614711
    • C
      Revert "f2fs: do not recover from previous remained wrong dnodes" · d47b8715
      Chao Yu 提交于
      i_times of inode will be set with current system time which can be
      configured through 'date', so it's not safe to judge dnode block as
      garbage data or unchanged inode depend on i_times.
      
      Now, we have used enhanced 'cp_ver + cp' crc method to verify valid
      dnode block, so I expect recoverying invalid dnode is almost not
      possible.
      
      This reverts commit 807b1e1c.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d47b8715
    • J
      f2fs: remove checkpoint in f2fs_freeze · b4b9d34c
      Jaegeuk Kim 提交于
      The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
      So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068,
      it triggers circular locking problem below due to gc_mutex for checkpoint.
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      4.9.0-rc1+ #132 Tainted: G           OE
      -------------------------------------------------------
      
      1. wait for __sb_start_write() by
      
       [<ffffffff9845f353>] dump_stack+0x85/0xc2
       [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
       [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
       [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
       [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
       [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffff98289991>] iput+0x171/0x2c0
       [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
       [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
       [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
       [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
       [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
       [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
       [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
       [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
       [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
       [<ffffffff982a50b5>] sys_sync+0x55/0x90
       [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6
      
      2. wait for sbi->gc_mutex by
      
       [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
       [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
       [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
       [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
       [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
       [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
       [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b4b9d34c
    • J
      f2fs: assign segments correctly for direct_io · bdb7d964
      Jaegeuk Kim 提交于
      Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or
      four logs, we do not use that type at all.
      Let's fix it.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bdb7d964
    • C
      f2fs: fix wrong i_atime recovery · 9f0552e0
      Chao Yu 提交于
      Shouldn't update in-memory i_atime with on-disk i_mtime of inode when
      recovering inode.
      
      Shuoran found this bug which is hidden for a long time, honour is belong
      to him.
      Signed-off-by: NShuoran Liu <liushuoran@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9f0552e0
    • C
      f2fs: record inode updating status correctly · 60dcedc9
      Chao Yu 提交于
      We should record updating status of inode only for living inode, for those
      unlinked inode it needs to clear its ino cache, otherwise after the ino
      was been reused, it will cause unneeded node page writing during ->fsync.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      60dcedc9