1. 06 Sep, 2017 2 commits
  2. 22 Aug, 2017 3 commits
    • C
      f2fs: introduce discard_granularity sysfs entry · 969d1b18
      Chao Yu committed
      Commit d618ebaf ("f2fs: enable small discard by default") enables
      f2fs to issue 4K-size discards in real-time discard mode. However, issuing
      smaller discards may cost more device lifetime while releasing less free
      space in the flash device. Since f2fs can separate hot/cold data and run
      garbage collection, we can expect a small invalid region to expand soon
      through OPU (out-of-place update), deletion, or garbage collection of
      valid data. It is therefore better to delay or skip issuing smaller
      discards, which helps reduce excessive consumption of IO bandwidth and
      flash storage lifetime.
      
      This patch makes f2fs select 64K as its default minimal granularity
      and issue only discards that are not smaller than the minimal
      granularity. It also exposes the discard granularity as a sysfs entry
      so it can be configured for different scenarios.
      
      Jaegeuk Kim:
       We must issue all the accumulated discard commands when fstrim is called.
       So, I've added pend_list_tag[] to indicate whether we should issue the
       commands or not. If the tag is set to P_ACTIVE or P_TRIM, we have to
       issue them. P_TRIM is set once per fstrim trigger.
       In addition, issue_discard_thread is called too often because of the
       number of discard commands remaining in the pending list. I added a
       timer to control it, as gc_thread does.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
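The granularity rule described in this commit can be sketched in user space as follows. This is a minimal illustration, not the f2fs implementation: the function names, the 4 KiB block size, and the single `trim_requested` flag (standing in for the P_TRIM tag) are assumptions made for the example.

```c
#include <assert.h>

/* Assumed sizes: 4 KiB blocks and the 64 KiB default named in the commit. */
#define SKETCH_BLOCK_BYTES        4096u
#define SKETCH_DEFAULT_GRAN_BYTES (64u * 1024u)

/* Default minimal granularity expressed in blocks (64 KiB / 4 KiB). */
unsigned int sketch_default_granularity(void)
{
	return SKETCH_DEFAULT_GRAN_BYTES / SKETCH_BLOCK_BYTES;
}

/*
 * Return 1 if a pending discard of len_blocks should be issued now,
 * 0 if it should stay queued. trim_requested is a hypothetical stand-in
 * for the P_TRIM tag: an fstrim call forces every accumulated command out.
 */
int sketch_should_issue(unsigned int len_blocks,
			unsigned int granularity_blocks,
			int trim_requested)
{
	assert(granularity_blocks > 0);

	if (trim_requested)
		return 1;
	return len_blocks >= granularity_blocks;
}
```

With the default of 16 blocks, a single-block invalid region stays queued until it grows to 16 blocks or fstrim is called.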
    • Q
      f2fs: merge equivalent flags F2FS_GET_BLOCK_[READ|DIO] · f2220c7f
      Qiuyang Sun committed
      Currently, the two flags F2FS_GET_BLOCK_[READ|DIO] are totally equivalent
      and can be used interchangeably in all scenarios they are involved in.
      Neither of the flags is referenced in f2fs_map_blocks(), making them both
      the default case. To remove the ambiguity, this patch merges both flags
      into F2FS_GET_BLOCK_DEFAULT, and introduces an enum for all distinct flags.
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: support journalled quota · 4b2414d0
      Chao Yu committed
      This patch enables f2fs to accept quota information through the
      following mount options:
      - {usr,grp,prj}jquota=<quota file path>
      - jqfmt=<quota type>
      
      Then, in the ->mount flow, we can recover the quota file during log
      replay, which is how journalled quota is supported.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Fix wrong return values.]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 10 Aug, 2017 1 commit
  4. 04 Aug, 2017 2 commits
  5. 01 Aug, 2017 6 commits
    • C
      f2fs: support F2FS_IOC_FS{GET,SET}XATTR · 2c1d0305
      Chao Yu committed
      This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
      support for f2fs. The interface is kept consistent with that of
      ext4/xfs.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
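From user space, the generic interface this commit wires up can be exercised roughly as below. It is a sketch that assumes a Linux system whose `<linux/fs.h>` provides `struct fsxattr` and `FS_IOC_FSGETXATTR`; the helper name `sketch_get_project_id` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read the project ID of the file at path through FS_IOC_FSGETXATTR.
 * Returns 0 and fills *projid on success, -1 if the file cannot be
 * opened or the filesystem does not support the ioctl.
 */
int sketch_get_project_id(const char *path, unsigned int *projid)
{
	struct fsxattr fsx;
	int fd, ret;

	assert(path != NULL && projid != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	close(fd);
	if (ret < 0)
		return -1;

	*projid = fsx.fsx_projid;
	return 0;
}
```

Setting attributes follows the same shape: read the structure, modify the field of interest, and pass it back with FS_IOC_FSSETXATTR.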
    • J
      f2fs: avoid naming confusion of sysfs init · dc6b2055
      Jaegeuk Kim committed
      This patch changes the function names of sysfs init to follow ext4.
      
      f2fs_init_sysfs <-> f2fs_register_sysfs
      f2fs_exit_sysfs <-> f2fs_unregister_sysfs
      Suggested-by: Chao Yu <yuchao0@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: support project quota · 5c57132e
      Chao Yu committed
      This patch adds support for plain project quota.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: enhance on-disk inode structure scalability · 7a2af766
      Chao Yu committed
      This patch adds a new flag, F2FS_EXTRA_ATTR, stored in inode.i_inline
      to indicate that the on-disk structure of the current inode is extended.
      
      In order to extend, we changed the inode structure a bit:
      
      Original one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	__le32 i_addr[DEF_ADDRS_PER_INODE];
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Extended one:
      
      struct f2fs_inode {
      	...
      	struct f2fs_extent i_ext;
      	union {
      		struct {
      			__le16 i_extra_isize;
      			__le16 i_padding;
      			__le32 i_extra_end[0];
      		};
      		__le32 i_addr[DEF_ADDRS_PER_INODE];
      	};
      	__le32 i_nid[DEF_NIDS_PER_INODE];
      }
      
      Once F2FS_EXTRA_ATTR is set, we steal four bytes at the head of the
      i_addr field to store i_extra_isize and i_padding. With i_extra_isize,
      we can calculate the actual size of the reserved space in i_addr. The
      attribute fields available to the current inode, out of all the extra
      attribute fields, can be described as below:
      
        +--------------------+
        | .i_mode            |
        | ...                |
        | .i_ext             |
        +--------------------+
        | .i_extra_isize     |-----+
        | .i_padding         |     |
        | .i_prjid           |     |
        | .i_atime_extra     |     |
        | .i_ctime_extra     |     |
        | .i_mtime_extra     |<----+
        | .i_inode_cs        |<----- store blkaddr/inline from here
        | .i_xattr_cs        |
        | ...                |
        +--------------------+
        |                    |
        |    block address   |
        |                    |
        +--------------------+
        | .i_nid             |
        +--------------------+
        |   node_footer      |
        | (nid, ino, offset) |
        +--------------------+
      
      Hence, with this patch, we enhance the scalability of the f2fs inode
      for storing newly added attributes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
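The effect of the union on the number of usable block-address slots can be sketched in user space like this. It is an illustration only: the `sketch_` names are hypothetical, the values 923 and 0x20 are assumed stand-ins for DEF_ADDRS_PER_INODE and F2FS_EXTRA_ATTR, on-disk little-endian conversion is ignored, and i_extra_isize is assumed to count bytes starting at i_extra_isize itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ADDRS_PER_INODE 923  /* assumed value of DEF_ADDRS_PER_INODE */
#define SKETCH_EXTRA_ATTR      0x20 /* assumed bit for F2FS_EXTRA_ATTR */

/* Simplified inode: only the inline flags and the i_addr union are modelled. */
struct sketch_inode {
	uint8_t i_inline;
	union {
		struct {
			uint16_t i_extra_isize; /* bytes of extra attributes */
			uint16_t i_padding;
		};
		uint32_t i_addr[SKETCH_ADDRS_PER_INODE];
	};
};

/* Number of i_addr slots still available for block addresses or inline data. */
unsigned int sketch_usable_addrs(const struct sketch_inode *ino)
{
	unsigned int extra_slots;

	/* Without the flag, the whole array holds block addresses. */
	if (!(ino->i_inline & SKETCH_EXTRA_ATTR))
		return SKETCH_ADDRS_PER_INODE;

	assert(ino->i_extra_isize % sizeof(uint32_t) == 0);
	extra_slots = ino->i_extra_isize / sizeof(uint32_t);
	assert(extra_slots <= SKETCH_ADDRS_PER_INODE);
	return SKETCH_ADDRS_PER_INODE - extra_slots;
}
```

For example, an inode with the flag set and 24 bytes of extra attributes gives up six 4-byte slots at the head of i_addr.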
    • C
      f2fs: make max inline size changeable · f2470371
      Chao Yu committed
      This patch makes the macros below, which calculate the max inline
      size and the inline dentry field sizes, take the size-changeable
      reserved space into account:
      - MAX_INLINE_DATA
      - NR_INLINE_DENTRY
      - INLINE_DENTRY_BITMAP_SIZE
      - INLINE_RESERVED_SIZE
      
      Then, when the inline_{data,dentry} options are enabled, this lets us
      flexibly reserve inline space of different sizes for newly introduced
      inode attributes.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
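The arithmetic behind a size-changeable inline area can be sketched as follows. The commit message does not spell out the formula, so this is an assumed shape: the inline payload is whatever remains of the address slots after the extra-attribute, inline-xattr, and reserved slots are subtracted. The function name and the example slot counts (923, 50, 1) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOT_BYTES 4u /* each address slot is a 32-bit value */

/*
 * Bytes available for inline data, given the total number of address
 * slots and the slots taken by extra attributes, inline xattrs, and
 * reserved space. All parameters are counts of 4-byte slots.
 */
size_t sketch_max_inline_data(unsigned int total_slots,
			      unsigned int extra_slots,
			      unsigned int xattr_slots,
			      unsigned int reserved_slots)
{
	unsigned int taken = extra_slots + xattr_slots + reserved_slots;

	assert(taken <= total_slots);
	return (size_t)SKETCH_SLOT_BYTES * (total_slots - taken);
}
```

Under these assumed counts, reserving six extra-attribute slots shrinks the inline area from 3488 to 3464 bytes.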
    • J
      f2fs: add ioctl to expose current features · e65ef207
      Jaegeuk Kim committed
      This patch adds an ioctl to provide feature information to the user.
      For example, SQLite can use this ioctl to detect whether f2fs supports
      atomic writes.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
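An application-side probe for atomic-write support might look like the sketch below. The ioctl number and feature bit are not given in the commit message; the `SKETCH_` constants are assumptions believed to mirror the kernel's f2fs definitions and should be checked against the headers of the kernel in use. The helper name is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to mirror the kernel's f2fs definitions; verify before relying on them. */
#define SKETCH_F2FS_IOCTL_MAGIC          0xf5
#define SKETCH_F2FS_IOC_GET_FEATURES     _IOR(SKETCH_F2FS_IOCTL_MAGIC, 12, uint32_t)
#define SKETCH_F2FS_FEATURE_ATOMIC_WRITE 0x0004

/*
 * Returns 1 if the filesystem holding path reports atomic-write support,
 * 0 if it does not, and -1 if the file cannot be opened or the ioctl
 * is not supported (for example, on a non-f2fs filesystem).
 */
int sketch_supports_atomic_write(const char *path)
{
	uint32_t features = 0;
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, SKETCH_F2FS_IOC_GET_FEATURES, &features);
	close(fd);
	if (ret < 0)
		return -1;

	return (features & SKETCH_F2FS_FEATURE_ATOMIC_WRITE) ? 1 : 0;
}
```

A caller would treat -1 as "unknown" and fall back to its ordinary journaling path.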
  6. 29 Jul, 2017 1 commit
  7. 27 Jul, 2017 2 commits
  8. 09 Jul, 2017 1 commit
    • C
      f2fs: support plain user/group quota · 0abd675e
      Chao Yu committed
      This patch adds support for plain user/group quota.
      
      Change Note by Jaegeuk Kim.
      
      - Use the f2fs page cache for quota files so that garbage collection is
        taken into account. As a result, quota files do not tolerate sudden
        power cuts, so the user needs to run quotacheck.
      
      - setattr() calls dquot_transfer which will transfer inode->i_blocks.
        We can't reclaim that during f2fs_evict_inode(). So, we need to count
        node blocks as well in order to match i_blocks with dquot's space.
      
        Note that, Chao wrote a patch to count inode->i_blocks without inode block.
        (f2fs: don't count inode block in in-memory inode.i_blocks)
      
      - in f2fs_remount, we need to make the filesystem RW prior to dquot_resume.
      
      - handle fault_injection case during f2fs_quota_off_umount
      
      - TODO: Project quota
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. 08 Jul, 2017 4 commits
    • C
      f2fs: use spin_{,un}lock_irq{save,restore} · d1aa2453
      Chao Yu committed
      generic/361 reports the warning below. The cause is that once someone
      has entered the critical region of sbi.cp_lock, if f2fs_write_end_io
      calls f2fs_stop_checkpoint from a triggered IRQ, we will encounter a
      deadlock.
      
      So this patch changes to use spin_{,un}lock_irq{save,restore} to create
      a critical region with IRQs disabled, avoiding the potential deadlock.
      
       irq event stamp: 83391573
       loop: Write error at byte offset 438729728, length 1024.
       hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
       hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
       loop: Write error at byte offset 438860288, length 1536.
       softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
       softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
       loop: Write error at byte offset 438990848, length 2048.
       ================================
       WARNING: inconsistent lock state
       4.12.0-rc2+ #30 Tainted: G           O
       --------------------------------
       inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
       xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
        (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
       {HARDIRQ-ON-W} state was registered at:
         __lock_acquire+0x527/0x7b0
         lock_acquire+0xae/0x220
         _raw_spin_lock+0x42/0x50
         do_checkpoint+0x165/0x9e0 [f2fs]
         write_checkpoint+0x33f/0x740 [f2fs]
         __f2fs_sync_fs+0x92/0x1f0 [f2fs]
         f2fs_sync_fs+0x12/0x20 [f2fs]
         sync_filesystem+0x67/0x80
         generic_shutdown_super+0x27/0x100
         kill_block_super+0x22/0x50
         kill_f2fs_super+0x3a/0x40 [f2fs]
         deactivate_locked_super+0x3d/0x70
         deactivate_super+0x40/0x60
         cleanup_mnt+0x39/0x70
         __cleanup_mnt+0x10/0x20
         task_work_run+0x69/0x80
         exit_to_usermode_loop+0x57/0x85
         do_fast_syscall_32+0x18c/0x1b0
         entry_SYSENTER_32+0x4c/0x7b
       irq event stamp: 1957420
       hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
       hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
       softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
       softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&(&sbi->cp_lock)->rlock);
         <Interrupt>
           lock(&(&sbi->cp_lock)->rlock);
      
        *** DEADLOCK ***
      
       2 locks held by xfs_io/7959:
        #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
        #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]
      
       stack backtrace:
       CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
       Call Trace:
        dump_stack+0x5f/0x92
        print_usage_bug+0x1d3/0x1dd
        ? check_usage_backwards+0xe0/0xe0
        mark_lock+0x23d/0x280
        __lock_acquire+0x699/0x7b0
        ? __this_cpu_preempt_check+0xf/0x20
        ? trace_hardirqs_off_caller+0x91/0xe0
        lock_acquire+0xae/0x220
        ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        _raw_spin_lock+0x42/0x50
        ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        f2fs_write_end_io+0x147/0x150 [f2fs]
        bio_endio+0x7a/0x1e0
        blk_update_request+0xad/0x410
        blk_mq_end_request+0x16/0x60
        lo_complete_rq+0x3c/0x70
        __blk_mq_complete_request_remote+0x11/0x20
        flush_smp_call_function_queue+0x6d/0x120
        ? debug_smp_processor_id+0x12/0x20
        generic_smp_call_function_single_interrupt+0x12/0x30
        smp_call_function_single_interrupt+0x25/0x40
        call_function_single_interrupt+0x37/0x3c
       EIP: _raw_spin_unlock_irq+0x2d/0x50
       EFLAGS: 00000296 CPU: 2
       EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
       ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
        DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
        ? inherit_task_group.isra.98.part.99+0x6b/0xb0
        __add_to_page_cache_locked+0x1d4/0x290
        add_to_page_cache_lru+0x38/0xb0
        pagecache_get_page+0x8e/0x200
        f2fs_write_begin+0x96/0xf00 [f2fs]
        ? trace_hardirqs_on_caller+0xdd/0x1c0
        ? current_time+0x17/0x50
        ? trace_hardirqs_on+0xb/0x10
        generic_perform_write+0xa9/0x170
        __generic_file_write_iter+0x1a2/0x1f0
        ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
        f2fs_file_write_iter+0x6e/0x140 [f2fs]
        ? __lock_acquire+0x429/0x7b0
        __vfs_write+0xc1/0x140
        vfs_write+0x9b/0x190
        SyS_pwrite64+0x63/0xa0
        do_fast_syscall_32+0xa1/0x1b0
        entry_SYSENTER_32+0x4c/0x7b
       EIP: 0xb7786c61
       EFLAGS: 00000293 CPU: 2
       EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
       ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
        DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      
      Fixes: aaec2b1d ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
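Kernel spinlocks cannot be run outside the kernel, so the sketch below uses a user-space analogy instead: POSIX signals play the role of interrupts, and `sigprocmask` plays the role of `spin_lock_irqsave`/`spin_unlock_irqrestore`. It is not the f2fs fix itself; all names are hypothetical, and it assumes a single-threaded process. The point it illustrates is the same: while the "lock" is held, the asynchronous handler that would also want the lock cannot run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t lock_held;        /* models holding cp_lock */
static volatile sig_atomic_t handler_runs;     /* times the handler ran */
static volatile sig_atomic_t handler_saw_lock; /* handler ran inside the section */

/* Stands in for the completion path that would take the same lock. */
static void on_sigusr1(int signo)
{
	(void)signo;
	handler_runs++;
	if (lock_held)
		handler_saw_lock = 1;
}

/* Returns 0 if the handler ran exactly once and only after the unlock. */
int sketch_masked_critical_section(void)
{
	struct sigaction sa;
	sigset_t block, saved;

	handler_runs = 0;
	handler_saw_lock = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);

	/* Analogue of spin_lock_irqsave: save the mask, block the "IRQ". */
	if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
		return -1;
	lock_held = 1;

	raise(SIGUSR1);            /* the "interrupt" fires but stays pending */
	assert(handler_runs == 0); /* it cannot preempt the critical section */

	lock_held = 0;
	/* Analogue of spin_unlock_irqrestore: the pending signal is delivered. */
	if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
		return -1;

	return (handler_runs == 1 && !handler_saw_lock) ? 0 : -1;
}
```

Without the masking step, the handler could run while `lock_held` is set, which corresponds to the self-deadlock lockdep warns about in the log above.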
    • C
      f2fs: don't count inode block in in-memory inode.i_blocks · 000519f2
      Chao Yu committed
      Previously, we counted all blocks consumed by an inode, including the
      inode block, xattr block, index blocks, and data blocks, in i_blocks.
      Other generic filesystems do not count the inode block in i_blocks, so
      userspace applications or the quota system may see an incorrect block
      count from the i_blocks value in the inode.
      
      This patch changes inode.i_blocks to count all blocks except the
      inode block. For the on-disk i_blocks, we keep counting the inode block
      for backward compatibility.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: stop gc/discard thread in prior during umount · cce13252
      Chao Yu committed
      This patch resolves a kernel panic in xfstests/081, caused by the
      f2fs_bug_on added recently in:
      
       f2fs: add f2fs_bug_on in __remove_discard_cmd
      
      To fix it, we stop the gc/discard threads first in ->kill_sb in order to
      avoid a race between referencing and releasing them.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: introduce reserved_blocks in sysfs · daeb433e
      Chao Yu committed
      This patch adds a new sysfs interface that controls the number of
      reserved blocks in the system, which cannot be used by the user. It
      lets the user adjust the over-provision ratio dynamically instead of
      changing it with mkfs.
      
      So we can expect it to help reserve more free space for relieving
      GC in both the filesystem and the flash device.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  10. 04 Jul, 2017 4 commits
  11. 08 Jun, 2017 1 commit
    • D
      crypto: Work around deallocated stack frame reference gcc bug on sparc. · d41519a6
      David Miller committed
      On sparc, if we have an alloca() like situation, as is the case with
      SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
      memory.  The result can be that the value is clobbered if a trap
      or interrupt arrives at just the right instruction.
      
      It only occurs if the function ends up returning a value from that
      alloca() area and that value can be placed into the return value
      register using a single instruction.
      
      For example, in lib/libcrc32c.c:crc32c() we end up with a return
      sequence like:
      
              return  %i7+8
               lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],
      
      %o5 holds the base of the on-stack area allocated for the shash
      descriptor.  But the return released the stack frame and the
      register window.
      
      So if an interrupt arrives between 'return' and 'lduw', then
      the value read at %o5+16 can be corrupted.
      
      Add a data compiler barrier to work around this problem.  This is
      exactly what the gcc fix will end up doing as well, and it absolutely
      should not change the code generated for other cpus (unless gcc
      on them has the same bug :-)
      
      With crucial insight from Eric Sandeen.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
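The workaround pattern, copying the result out of the on-stack area and then placing a data compiler barrier before returning, can be sketched in user space as below. It assumes a GCC- or Clang-compatible compiler for the inline asm, uses a variable-length array as a stand-in for the alloca()-like SHASH_DESC_ON_STACK() area, and replaces the real CRC32C with a toy checksum. The `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Data compiler barrier: tells the compiler the pointed-to memory is used here. */
#define sketch_barrier_data(ptr) \
	__asm__ __volatile__("" : : "r"(ptr) : "memory")

/*
 * Toy checksum computed in an on-stack context of ctx_words 32-bit words.
 * The result is copied to a local and the barrier is placed before the
 * return, so the load from the stack area is not deferred past the point
 * where the frame is released.
 */
uint32_t sketch_checksum(const unsigned char *data, size_t len,
			 size_t ctx_words)
{
	assert(ctx_words >= 1);

	uint32_t ctx[ctx_words]; /* VLA standing in for the alloca-like area */
	uint32_t ret;
	size_t i;

	ctx[0] = 0;
	for (i = 0; i < len; i++)
		ctx[0] = ctx[0] * 31u + data[i];

	ret = ctx[0];
	sketch_barrier_data(&ctx[0]);
	return ret;
}
```

On compilers without the bug the barrier emits no instructions, which matches the commit's expectation that generated code on other CPUs is unaffected.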
  12. 24 May, 2017 5 commits
  13. 09 May, 2017 1 commit
    • M
      mm: introduce kv[mz]alloc helpers · a7c3e901
      Michal Hocko committed
      Patch series "kvmalloc", v5.
      
      There are many open coded kmalloc with vmalloc fallback instances in the
      tree.  Most of them are not careful enough or simply do not care about
      the underlying semantic of the kmalloc/page allocator which means that
      a) some vmalloc fallbacks are basically unreachable because the kmalloc
      part will keep retrying until it succeeds b) the page allocator can
      invoke really disruptive steps like the OOM killer to move forward
      which doesn't sound appropriate when we consider that the vmalloc
      fallback is available.
      
      As can be seen, implementing kvmalloc requires quite an intimate
      knowledge of the page allocator and the memory reclaim internals which
      strongly suggests that a helper should be implemented in the memory
      subsystem proper.
      
      Most callers, I could find, have been converted to use the helper
      instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
      in the networking stack which I have converted as well and Eric Dumazet
      was not opposed [2] to convert them as well.
      
      [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
      
      This patch (of 9):
      
      Using kmalloc with the vmalloc fallback for larger allocations is a
      common pattern in the kernel code.  Yet we do not have any common helper
      for that and so users have invented their own helpers.  Some of them are
      really creative when doing so.  Let's just add kv[mz]alloc and make sure
      it is implemented properly.  This implementation makes sure not to
      create large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY)
      and also not to warn about allocation failures.  This also rules out
      the OOM killer as the vmalloc is a more appropriate fallback than a
      disruptive user visible action.
      
      This patch also changes some existing users and removes helpers which
      are specific to them.  In some cases this is not possible (e.g.
      ext4_kvmalloc, libcfs_kvzalloc) because those seem to be broken and
      require GFP_NO{FS,IO} context which is not vmalloc compatible in general
      (note that the page table allocation is GFP_KERNEL).  Those need to be
      fixed separately.
      
      While we are at it, document the gfp masks that __vmalloc{_node} does
      not support, because there seems to be a lot of confusion out there.
      kvmalloc_node will warn about flags that are incompatible with
      GFP_KERNEL (that is, not a superset of it) to catch new abusers.
      Existing ones would have to die slowly.
      
      [sfr@canb.auug.org.au: f2fs fixup]
        Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
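The "try the fast allocator, then fall back" shape of kv[mz]alloc can be illustrated with a user-space analogy. This is not the kernel helper: `malloc` stands in for kmalloc, an anonymous `mmap` stands in for vmalloc, the 64 KiB cut-off is an arbitrary assumption that simulates the fast path declining large requests, and a small header replaces the kernel's address-based check for choosing the matching free routine. All `sketch_` names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#define SKETCH_FAST_LIMIT (64u * 1024u) /* assumed cut-off for the fast path */
#define SKETCH_HDR_BYTES  16u           /* header size, keeps payload aligned */

struct sketch_hdr {
	size_t total;  /* bytes obtained, header included */
	int from_mmap; /* 1 if the fallback path was used */
};

/* Allocate size bytes: fast allocator first, mapping as the fallback. */
void *sketch_kvmalloc(size_t size)
{
	struct sketch_hdr *h = NULL;
	size_t total;
	int from_mmap = 0;

	assert(sizeof(struct sketch_hdr) <= SKETCH_HDR_BYTES);
	if (size > SIZE_MAX - SKETCH_HDR_BYTES)
		return NULL;
	total = size + SKETCH_HDR_BYTES;

	if (total <= SKETCH_FAST_LIMIT)
		h = malloc(total);
	if (h == NULL) {
		void *m = mmap(NULL, total, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (m == MAP_FAILED)
			return NULL;
		h = m;
		from_mmap = 1;
	}

	h->total = total;
	h->from_mmap = from_mmap;
	return (char *)h + SKETCH_HDR_BYTES;
}

/* Report which path served the allocation (1 = fallback mapping). */
int sketch_is_mapped(const void *p)
{
	const struct sketch_hdr *h =
		(const struct sketch_hdr *)((const char *)p - SKETCH_HDR_BYTES);
	return h->from_mmap;
}

/* Free memory from sketch_kvmalloc with the routine matching its origin. */
void sketch_kvfree(void *p)
{
	struct sketch_hdr *h;

	if (p == NULL)
		return;
	h = (struct sketch_hdr *)((char *)p - SKETCH_HDR_BYTES);
	if (h->from_mmap)
		munmap(h, h->total);
	else
		free(h);
}
```

The caller sees one allocate/free pair regardless of which path was taken, which is the convenience the commit argues open-coded fallbacks fail to provide safely.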
  14. 04 May, 2017 5 commits
  15. 03 May, 2017 2 commits