1. 23 Feb, 2016 28 commits
    • f2fs: give scheduling point in shrinking path · 6fe2bc95
      Jaegeuk Kim committed
      The shrinking path needs a scheduling point so the task can be
      rescheduled while shrinking slab entries.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      6fe2bc95
    • f2fs: improve shrink performance of extent nodes · 201ef5e0
      Hou Pengyang committed
      In the worst case, we need to scan the whole radix tree and every rb-tree to
      free the victim extent_nodes when shrinking.
      
      Pengyang initially introduced a victim_list to record the victim extent_nodes,
      and freed these extent_nodes by just scanning a list.
      
      Later, Chao Yu enhanced the original patch to reduce its memory footprint by
      removing the victim list.
      
      The policy of lru list shrinking becomes:
      1) lock lru list's lock
      2) trylock extent tree's lock
      3) remove extent node from lru list
      4) unlock lru list's lock
      5) do shrink
      6) repeat 1) to 5)
      Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      201ef5e0
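The six-step LRU shrinking policy above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `struct tree`, `struct node`, `lru_lock` and `shrink_lru` are hypothetical stand-ins, not the f2fs definitions, and C11 `atomic_flag` emulates the kernel locks (spin on the LRU lock, trylock on the tree lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the f2fs ones). */
struct tree {
    atomic_flag lock;   /* emulates the extent tree's lock */
    int node_count;     /* extent nodes still owned by this tree */
};

struct node {
    struct tree *owner;
    bool on_lru;        /* true while the node is linked on the LRU list */
};

static atomic_flag lru_lock = ATOMIC_FLAG_INIT;

/* Try to free up to nr nodes from lru[0..len); returns how many were freed. */
static int shrink_lru(struct node *lru, size_t len, int nr)
{
    int freed = 0;

    assert(lru != NULL || len == 0);
    for (size_t i = 0; i < len && freed < nr; i++) {
        struct node *en = &lru[i];

        /* 1) lock the LRU list's lock (spin until acquired) */
        while (atomic_flag_test_and_set(&lru_lock))
            ;
        if (!en->on_lru) {
            atomic_flag_clear(&lru_lock);
            continue;
        }
        /* 2) trylock the extent tree's lock; skip the node on contention */
        if (atomic_flag_test_and_set(&en->owner->lock)) {
            atomic_flag_clear(&lru_lock);
            continue;
        }
        /* 3) remove the extent node from the LRU list */
        en->on_lru = false;
        /* 4) unlock the LRU list's lock */
        atomic_flag_clear(&lru_lock);
        /* 5) do the shrink while still holding the tree lock */
        en->owner->node_count--;
        freed++;
        atomic_flag_clear(&en->owner->lock);
        /* 6) the loop repeats steps 1) to 5) for the next node */
    }
    return freed;
}
```

Because step 2 is a trylock, a node whose tree is busy is simply skipped and stays on the list, so the shrinker never blocks on a tree lock while holding the LRU lock.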
    • f2fs: don't set cached_en if it will be freed · 42926744
      Jaegeuk Kim committed
      If en has an empty list pointer, it will be freed soon, so we don't need to
      set cached_en to it.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      42926744
    • f2fs: move extent_node list operations being coupled with rbtree operation · 43a2fa18
      Jaegeuk Kim committed
      This patch moves extent_node list operations to be handled together with
      its rbtree operations.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      43a2fa18
    • f2fs: reconstruct the code to free an extent_node · a03f01f2
      Hou Pengyang committed
      There are three steps to free an extent node:
      1) list_del_init, 2) __detach_extent_node, 3) kmem_cache_free
      
      In the f2fs_destroy_extent_tree path, a node is freed in the order 1->2->3,
      but in the f2fs_update_extent_tree_range path, the order is 2->1->3.
      
      This patch makes the order 1->2->3 in all paths.
      This makes sense because the next patch introduces a victim list in the
      shrink_extent_tree path, where we can check whether an extent_node is in the
      victim list by calling list_empty(). So it is necessary to put 1) first.
      Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      a03f01f2
    • f2fs: use wq_has_sleeper for cp_wait wait_queue · 7c506896
      Jaegeuk Kim committed
      We need to use wq_has_sleeper, which includes smp_mb, to handle cp_wait concurrency.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7c506896
    • f2fs: avoid unnecessary search while finding victim in gc · 688159b6
      Fan Li committed
      The variable nsearched in get_victim_by_default() indicates the number of
      dirty segments we have already checked. There are 2 problems with the way
      it is updated:
      1. When p.ofs_unit is greater than 1, the victim we find consists
         of multiple segments, possibly more than 1 dirty segment.
         But nsearched always increases by 1.
      2. If segments have been found but not chosen, nsearched won't
         increase. So even if we have checked all dirty segments, nsearched
         may still be less than p.max_search.
      Both problems could cause unnecessary searching after all dirty
      segments have already been checked.
      Signed-off-by: Fan li <fanofcode.li@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      688159b6
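The counting rule this commit message argues for can be sketched as below: every examined unit advances nsearched by the number of dirty segments it really contains, whether or not the unit is chosen. This is a simplified model, not the f2fs code; `dirty_in_unit` and `scan_units` are hypothetical names, and a plain `bool` array stands in for the dirty segment bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the dirty segments inside the unit [start, start + ofs_unit). */
static unsigned int dirty_in_unit(const bool *dirty, size_t nsegs,
                                  size_t start, size_t ofs_unit)
{
    unsigned int n = 0;

    for (size_t i = start; i < start + ofs_unit && i < nsegs; i++)
        if (dirty[i])
            n++;
    return n;
}

/*
 * Walk the units in order and stop once max_search dirty segments have been
 * examined. Returns the number of units visited.
 */
static size_t scan_units(const bool *dirty, size_t nsegs,
                         size_t ofs_unit, unsigned int max_search)
{
    unsigned int nsearched = 0;
    size_t visited = 0;

    assert(ofs_unit > 0);
    for (size_t start = 0; start < nsegs && nsearched < max_search;
         start += ofs_unit) {
        /* Count every examined unit by its real number of dirty segments. */
        nsearched += dirty_in_unit(dirty, nsegs, start, ofs_unit);
        visited++;
    }
    return visited;
}
```

With max_search set to the total number of dirty segments, the walk stops as soon as the last dirty segment has been seen instead of continuing over clean units.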
    • f2fs: delete unnecessary wait for page writeback · 85ead818
      Yunlei He committed
      There is no need to wait for writeback of an inline file's page, since no
      one uses it, so this patch deletes the unnecessary wait.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      85ead818
    • f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Jaegeuk Kim committed
      In write_begin, if storage supports stable_page, we don't need to wait for
      writeback to update its contents.
      This patch uses wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fec1d657
    • f2fs: enhance foreground GC · 718e53fa
      Chao Yu committed
      If a section is configured to consist of multiple segments, foreground GC
      does garbage collection with the following approach:
      
      	for each segment in victim section
      		blk_start_plug
      		for each valid block in segment
      			write out by OPU method
      		submit bio cache   <---
      		blk_finish_plug   <---
      
      There are two issues:
      1) Most of the time, 'submit bio cache' breaks the merging of writes from
      the next segments into the current bio buffer, so smaller bios are
      submitted.
      2) The block plug only covers IO submission within one segment, which reduces
      the opportunity to merge IOs across multiple segments in the plug.
      
      So refactor the code into the structure below to maximize the
      opportunity to merge IOs:
      
      	blk_start_plug
      	for each segment in victim section
      		for each valid block in segment
      			write out by OPU method
      	submit bio cache
      	blk_finish_plug
      
      Test method:
      1. mkfs.f2fs -s 8 /dev/sdX
      2. touch 32 files
      3. write 2M data into each file
      4. punch 1.5M data from offset 0 for each file
      5. trigger foreground gc through ioctl
      
      Before the patch, 40 bios are submitted in total.
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
      ----repeat for 8 times
      
      After the patch, 35 bios are submitted in total.
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
      ----repeat 34 times
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      718e53fa
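The bio counts in the trace above follow from simple arithmetic, sketched here in C. The numbers are read off the trace, not from the f2fs source: a full bio is 122880 bytes (30 blocks of 4096 bytes), and each of the 8 segments contributes 4 x 122880 + 32768 bytes, i.e. 128 blocks. The function names are hypothetical, and the model assumes all written blocks are contiguous and mergeable, as the consecutive sector numbers in the trace suggest.

```c
#include <assert.h>

/* Bios needed for `blocks` contiguous blocks, at most max_per_bio per bio. */
static unsigned int bios_needed(unsigned int blocks, unsigned int max_per_bio)
{
    assert(max_per_bio > 0);
    return (blocks + max_per_bio - 1) / max_per_bio;   /* ceiling division */
}

/* Old structure: the bio cache is submitted after every segment. */
static unsigned int bios_flush_per_segment(unsigned int segs,
                                           unsigned int blocks_per_seg,
                                           unsigned int max_per_bio)
{
    return segs * bios_needed(blocks_per_seg, max_per_bio);
}

/* New structure: the bio cache is submitted once for the whole section. */
static unsigned int bios_flush_per_section(unsigned int segs,
                                           unsigned int blocks_per_seg,
                                           unsigned int max_per_bio)
{
    return bios_needed(segs * blocks_per_seg, max_per_bio);
}
```

With 8 segments, 128 blocks per segment and 30 blocks per bio, the per-segment variant yields 8 x 5 = 40 bios and the per-section variant yields 35 (34 full bios plus one 4-block bio of 16384 bytes), matching both traces.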
    • f2fs: don't need to call set_page_dirty for io error · e3ef1876
      Jaegeuk Kim committed
      If end_io gets an error, we don't need to set the page as dirty, since we
      already set f2fs_stop_checkpoint which will not flush any data.
      
      This will resolve the following warning.
      
      ======================================================
      [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
      4.4.0+ #9 Tainted: G           O
      ------------------------------------------------------
      xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
       (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs]
      
      and this task is already holding:
       (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490
      which would create a new lock dependency:
       (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...}
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      e3ef1876
    • f2fs: avoid needless sync_inode_page when reading inline_data · ae96e7bd
      Jaegeuk Kim committed
      In write_begin, if there is inline_data, f2fs loads it into the 0th data page.
      Since it's the read path, we don't need to sync its inode page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      ae96e7bd
    • f2fs: don't need to sync node page at every time · 52f80337
      Jaegeuk Kim committed
      In write_end, we don't need to sync the inode page every time.
      Instead, we can expect f2fs_write_inode to update it later.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      52f80337
    • f2fs: avoid multiple node page writes due to inline_data · 2049d4fc
      Jaegeuk Kim committed
      The scenario is:
      1. create fully node blocks
      2. flush node blocks
      3. write inline_data for all the node blocks again
      4. flush node blocks redundantly
      
      So, this patch tries to flush inline_data when flushing node blocks.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2049d4fc
    • f2fs: do f2fs_balance_fs when block is allocated · 3c082b7b
      Jaegeuk Kim committed
      We should consider data block allocation to trigger f2fs_balance_fs.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      3c082b7b
    • f2fs: fix to overcome inline_data floods · 6e17bfbc
      Jaegeuk Kim committed
      The scenario is:
      1. create lots of node blocks
      2. sync
      3. write lots of inline_data
      -> got panic due to no free space
      
      In that case, we should flush node blocks when writing inline_data in #3,
      and trigger gc as well.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      6e17bfbc
    • f2fs: use writepages->lock for WB_SYNC_ALL · 25c13551
      Jaegeuk Kim committed
      If there are many writepages calls by multiple threads in the background, we
      don't need to serialize them to merge all the bios, since it's background work.
      In such a case, it'd be better to run writepages concurrently.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      25c13551
    • f2fs: remove needless condition check · b483fadf
      Jaegeuk Kim committed
      This patch removes a needless condition variable.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      b483fadf
    • f2fs: correct search area in get_new_segment · 0ab14356
      Chao Yu committed
      get_new_segment starts from the current segment position and tries to find a
      free segment among its right neighbors located in the same section.
      
      But previously our search area was set to [current segment, max segment],
      which means we have to search more bits in the free_segmap bitmap in some
      worse cases. So here we correct the search area to [current segment, last
      segment in section] to avoid unnecessary searching.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0ab14356
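The corrected search area can be illustrated with a small bounded scan. This is a sketch under assumptions, not the f2fs implementation: `find_free_in_section` and `NO_FREE_SEG` are hypothetical names, a `bool` array stands in for the free_segmap bitmap, and sections are assumed to be aligned groups of `segs_per_sec` segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_FREE_SEG ((size_t)-1)   /* hypothetical "not found" marker */

/*
 * Look for a free segment among the right neighbors of segno, but only up to
 * the last segment of the section that contains segno.
 */
static size_t find_free_in_section(const bool *in_use, size_t total_segs,
                                   size_t segno, size_t segs_per_sec)
{
    size_t sec_end;

    assert(segs_per_sec > 0 && segno < total_segs);
    /* One past the last segment of the current section. */
    sec_end = (segno / segs_per_sec + 1) * segs_per_sec;
    if (sec_end > total_segs)
        sec_end = total_segs;

    for (size_t i = segno + 1; i < sec_end; i++)
        if (!in_use[i])
            return i;
    return NO_FREE_SEG;   /* nothing free in this section; stop here */
}
```

The scan never crosses the section boundary, so a free segment in a later section is deliberately not reported by this step.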
    • f2fs: export dirty_nats_ratio in sysfs · 2304cb0c
      Chao Yu committed
      This patch exports a new sysfs entry 'dirty_nats_ratio' to control the
      threshold of dirty nat entries. If the current ratio exceeds the configured
      threshold, a checkpoint will be triggered in f2fs_balance_fs_bg to flush
      dirty nats.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2304cb0c
    • f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c
      Chao Yu committed
      When testing f2fs with xfstests, generic/251 is stuck for a long time.
      The case uses the series of steps below to obtain freshly released space on
      the device, in order to prepare for the following fstrim test.
      
      1. rm -rf /mnt/dir
      2. mkdir /mnt/dir/
      3. cp -axT `pwd`/ /mnt/dir/
      4. goto 1
      
      During the preparation step, all nat entries will be cached in the nat cache.
      Most of them are dirty entries with an invalid blkaddr, which means the
      nodes related to these entries have been truncated, and they can
      be reused after the dirty entries have been checkpointed.
      
      However, no checkpoint was triggered, so nid allocators
      (e.g. mkdir, creat) will run into a long journey of iterating all NAT
      pages, looking for free nids in alloc_nid->build_free_nids.
      
      Here, in f2fs_balance_fs_bg we give another chance to do a checkpoint
      to flush nat entries, so they can be reused in the free nid cache, when the
      dirty entry count exceeds 10% of the max count.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7d768d2c
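The threshold test described in this commit message (checkpoint when the dirty entry count exceeds 10% of the max count) can be sketched as a single comparison. The function name and parameters are hypothetical, and the strict "greater than" follows the wording "exceeds" in the message; how the kernel treats the exact boundary is not stated here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when dirty entries exceed ratio_percent of max_count.
 * Hypothetical helper for illustration; the comparison is done by
 * cross-multiplying to avoid integer division.
 */
static bool exceeds_dirty_nat_ratio(unsigned long dirty_count,
                                    unsigned long max_count,
                                    unsigned int ratio_percent)
{
    assert(ratio_percent <= 100);
    return dirty_count * 100 > max_count * ratio_percent;
}
```

For example, with a max count of 1000 and a ratio of 10, a dirty count of 100 does not trigger the checkpoint in this sketch, while 101 does.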
    • f2fs: relocate is_merged_page · 0fd785eb
      Chao Yu committed
      Operations in is_merged_page are related to the inner bio cache, so move it
      to data.c.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0fd785eb
    • Merge tag 'trace-fixes-v4.5-rc5' of... · 4de8ebef
      Linus Torvalds committed
      Merge tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "Two more small fixes.
      
        One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
        stack made by the stack tracer.  As the stack tracer scans the entire
        kernel stack, KASAN triggers seeing it as a "stack out of bounds"
        error, since the scan is looking at the contents of the stack from
        parent functions.  The NOCHECK() tells KASAN that this is done on
        purpose, and is not some kind of stack overflow.
      
        The second fix is to the ftrace selftests, to retrieve the PID of
        executed commands from the shell with '$!' and not by parsing 'jobs'"
      
      * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing, kasan: Silence Kasan warning in check_stack of stack_tracer
        ftracetest: Fix instance test to use proper shell command for pids
      4de8ebef
    • Merge tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 692b8c66
      Linus Torvalds committed
      Pull xen bug fixes from David Vrabel:
      
       - Two scsiback fixes (resource leak and spurious warning).
      
       - Fix DMA mapping of compound pages on arm/arm64.
      
       - Fix some pciback regressions in MSI-X handling.
      
       - Fix a pcifront crash due to some uninitialized state.
      
      * tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
        xen/pcifront: Report the errors better.
        xen/pciback: Save the number of MSI-X entries to be copied later.
        xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
        xen: fix potential integer overflow in queue_reply
        xen/arm: correctly handle DMA mapping of compound pages
        xen/scsiback: avoid warnings when adding multiple LUNs to a domain
        xen/scsiback: correct frontend counting
      692b8c66
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · dea08e60
      Linus Torvalds committed
      Pull networking fixes from David Miller:
       "Looks like a lot, but mostly driver fixes scattered all over as usual.
      
        Of note:
      
         1) Add conditional sched in nf conntrack in cleanup to avoid NMI
            watchdogs.  From Florian Westphal.
      
         2) Fix deadlock in nfnetlink cttimeout, also from Florian.
      
         3) Fix handling of slaves in bonding ARP monitor validation, from Jay
            Vosburgh.
      
         4) Callers of ip_cmsg_send() are responsible for freeing IP options,
            some were not doing so.  Fix from Eric Dumazet.
      
         5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.
      
         6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.
      
         7) bcm7xxx PHY driver bug fixes from Florian Fainelli.
      
         8) Avoid unaligned accesses to protocol headers wrt.  GRE, from
            Alexander Duyck.
      
         9) SKB leaks and other problems in arc_emac driver, from Alexander
            Kochetkov.
      
        10) tcp_v4_inbound_md5_hash() releases listener socket instead of
            request socket on error path, oops.  Fix from Eric Dumazet.
      
        11) Missing socket release in pppoe_rcv_core() that seems to have
            existed basically forever.  From Guillaume Nault.
      
        12) Missing slave_dev unregister in dsa_slave_create() error path,
            from Florian Fainelli.
      
        13) crypto_alloc_hash() never returns NULL, fix return value check in
            __tcp_alloc_md5sig_pool.  From Insu Yun.
      
        14) Properly expire exception route entries in ipv4, from Xin Long.
      
        15) Fix races in tcp/dccp listener socket dismantle, from Eric
            Dumazet.
      
        16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
            legal.  These drivers modify the SKB on transmit.  From Jiri Benc.
      
        17) Fix regression in the initialization of netdev->tx_queue_len.
            From Phil Sutter.
      
        18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.
      
        19) SCTP port hash sizing does not properly ensure that table is a
            power of two in size.  From Neil Horman.
      
        20) Fix initializing of software copy of MAC address in fmvj18x_cs
            driver, from Ken Kawasaki"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
        bnx2x: Fix 84833 phy command handler
        bnx2x: Fix led setting for 84858 phy.
        bnx2x: Correct 84858 PHY fw version
        bnx2x: Fix 84833 RX CRC
        bnx2x: Fix link-forcing for KR2
        net: ethernet: davicom: fix devicetree irq resource
        fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
        Driver: Vmxnet3: Update Rx ring 2 max size
        net: netcp: rework the code for get/set sw_data in dma desc
        soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
        net: ti: netcp: restore get/set_pad_info() functionality
        MAINTAINERS: Drop myself as xen netback maintainer
        sctp: Fix port hash table size computation
        can: ems_usb: Fix possible tx overflow
        Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
        net: bcmgenet: Fix internal PHY link state
        af_unix: Don't use continue to re-execute unix_stream_read_generic loop
        unix_diag: fix incorrect sign extension in unix_lookup_by_ino
        bnxt_en: Failure to update PHY is not fatal condition.
        bnxt_en: Remove unnecessary call to update PHY settings.
        ...
      dea08e60
    • Merge tag 'hwmon-for-linus-v4.5-rc6' of... · 5c102d0e
      Linus Torvalds committed
      Merge tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Two fixes headed for stable:
      
         - Remove an unnecessary speed_index lookup for thermal hook in the
           gpio-fan driver.  The unnecessary speed lookup can hog the system.
      
         - Handle negative conversion values correctly in the ads1015 driver"
      
      * tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
        hwmon: (ads1015) Handle negative conversion values correctly
      5c102d0e
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · a16152c8
      Linus Torvalds committed
      Pull rdma fixes from Doug Ledford:
       "One ocrdma fix:
      
         - The new CQ API support was added to ocrdma, but they got the arming
           logic wrong, so without this, transfers eventually fail when they
           fail to arm the interrupt properly under load
      
        Two related fixes for mlx4:
      
         - When we added the 64bit extended counters support to the core IB
           code, they forgot to update the RoCE side of the mlx4 driver (the
           IB side they properly updated).
      
           I debated whether or not to include these patches as they could be
           considered feature enablement patches, but the existing code will
           blindly copy the 32bit counters, whether any counters were requested
           at all (a bug).
      
           These two patches make it (a) check to see that counters were
           requested and (b) copy the right counters (the 64bit support is
           new, the 32bit is not).  For that reason I went ahead and took
           them"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/mlx4: Add support for the port info class for RoCE ports
        IB/mlx4: Add support for extended counters over RoCE ports
        RDMA/ocrdma: Fix arm logic to align with new cq API
      a16152c8
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 7ee302f6
      Linus Torvalds committed
      Pull i2c fixes from Wolfram Sang:
       "Some bugfixes from I2C for you:
      
        A fix for a RuntimePM regression with OMAP, a fix to enable TCO for
        Lewisburg platforms, and a typo fix while we are here"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: i801: Adding Intel Lewisburg support for iTCO
        i2c: uniphier: fix typos in error messages
        i2c: omap: Fix PM regression with deferred probe for pm_runtime_reinit
      7ee302f6
  2. 22 Feb, 2016 12 commits