1. 22 Mar, 2017: 12 commits
  2. 20 Mar, 2017: 5 commits
      f2fs: combine nat_bits and free_nid_bitmap cache · 7041d5d2
      Chao Yu committed
      Both the nat_bits cache and the free_nid_bitmap cache provide the same
      functionality, acting as an intermediate cache between the free nid cache
      and disk, but with different granularity for indicating free nid ranges
      and different persistence policies. The nat_bits cache provides better
      persistence, and free_nid_bitmap provides finer granularity.
      
      In this patch we combine the advantages of both caches, so the final
      policy of the intermediate cache is:
      - init: load free nid status from nat_bits into free_nid_bitmap
      - lookup: scan free_nid_bitmap before loading NAT blocks
      - update: update free_nid_bitmap in real time
      - persistence: update and persist nat_bits in checkpoint
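      The four-part policy above can be modelled in a few lines. The sketch below is a scaled-down user-space illustration, not the f2fs implementation. The struct, the function names (cache_init, cache_lookup, cache_update, cache_checkpoint) and the block sizes are hypothetical. The nat_bits summary is reduced to one "full" and one "empty" flag per NAT block.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, scaled-down geometry; real f2fs NAT blocks are larger. */
#define NAT_BLOCKS     4
#define NIDS_PER_BLOCK 8

struct nid_cache {
	/* Coarse, persistent summary in the spirit of nat_bits. */
	bool full_block[NAT_BLOCKS];   /* every nid in the block is in use */
	bool empty_block[NAT_BLOCKS];  /* every nid in the block is free */
	/* Fine-grained in-memory bitmap: 1 means the nid is free. */
	unsigned char free_bitmap[NAT_BLOCKS][NIDS_PER_BLOCK];
	bool bitmap_loaded[NAT_BLOCKS];
};

/* init: seed the bitmap from the summary wherever the summary is decisive. */
static void cache_init(struct nid_cache *c)
{
	for (int b = 0; b < NAT_BLOCKS; b++) {
		c->bitmap_loaded[b] = c->full_block[b] || c->empty_block[b];
		if (c->bitmap_loaded[b])
			memset(c->free_bitmap[b], c->empty_block[b] ? 1 : 0,
			       NIDS_PER_BLOCK);
	}
}

/* lookup: scan loaded bitmaps first; -1 means "read NAT blocks from disk". */
static int cache_lookup(const struct nid_cache *c)
{
	for (int b = 0; b < NAT_BLOCKS; b++) {
		if (!c->bitmap_loaded[b])
			continue;
		for (int i = 0; i < NIDS_PER_BLOCK; i++)
			if (c->free_bitmap[b][i])
				return b * NIDS_PER_BLOCK + i;
	}
	return -1;
}

/* update: keep the bitmap in step with every allocation or free. */
static void cache_update(struct nid_cache *c, int nid, bool is_free)
{
	int b = nid / NIDS_PER_BLOCK;

	assert(nid >= 0 && b < NAT_BLOCKS && c->bitmap_loaded[b]);
	c->free_bitmap[b][nid % NIDS_PER_BLOCK] = is_free;
}

/* persistence: at checkpoint, rebuild the summary from the bitmap. */
static void cache_checkpoint(struct nid_cache *c)
{
	for (int b = 0; b < NAT_BLOCKS; b++) {
		int nfree = 0;

		if (!c->bitmap_loaded[b])
			continue; /* summary for unloaded blocks stays as it was */
		for (int i = 0; i < NIDS_PER_BLOCK; i++)
			nfree += c->free_bitmap[b][i];
		c->full_block[b] = (nfree == 0);
		c->empty_block[b] = (nfree == NIDS_PER_BLOCK);
	}
}
```

      In this model a block whose summary is neither full nor empty is left unloaded. The lookup then reports -1, which is where the real code would fall back to reading NAT blocks.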
      
      This patch also resolves the performance regression reported by lkp-robot.
      
      commit:
        4ac91242 ("f2fs: introduce free nid bitmap")
        d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le")
        1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache")
      
      4ac91242 d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15
      ---------------- -------------------------- --------------------------
               %stddev     %change         %stddev     %change         %stddev
                   \          |                \          |                \
           77863 ±  0%      +2.1%      79485 ±  1%     +50.8%     117404 ±  0%  aim7.jobs-per-min
          231.63 ±  0%      -2.0%     227.01 ±  1%     -33.6%     153.80 ±  0%  aim7.time.elapsed_time
          231.63 ±  0%      -2.0%     227.01 ±  1%     -33.6%     153.80 ±  0%  aim7.time.elapsed_time.max
          896604 ±  0%      -0.8%     889221 ±  3%     -20.2%     715260 ±  1%  aim7.time.involuntary_context_switches
            2394 ±  1%      +4.6%       2503 ±  1%      +3.7%       2481 ±  2%  aim7.time.maximum_resident_set_size
            6240 ±  0%      -1.5%       6145 ±  1%     -14.1%       5360 ±  1%  aim7.time.system_time
         1111357 ±  3%      +1.9%    1132509 ±  2%      -6.2%    1041932 ±  2%  aim7.time.voluntary_context_switches
      ...
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f2fs: skip scanning free nid bitmap of full NAT blocks · 586d1492
      Chao Yu committed
      This patch counts free nids for each NAT block and, while scanning the
      whole free nid bitmap, checks that count and skips lookups in full NAT
      blocks.
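      A minimal sketch of the idea follows. It assumes a per-block counter that is kept exact whenever a nid's free bit flips. The names and sizes are hypothetical, and the bits_examined counter exists only to show that full blocks are skipped without their bitmap being touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, scaled-down geometry. */
#define NAT_BLOCKS     4
#define NIDS_PER_BLOCK 8

static bool free_bitmap[NAT_BLOCKS][NIDS_PER_BLOCK];
static unsigned int free_count[NAT_BLOCKS]; /* free nids per NAT block */
static unsigned int bits_examined;          /* demo instrumentation only */

/* Flip a nid's free bit and keep the per-block count exact. */
static void set_nid_free(int nid, bool is_free)
{
	int b = nid / NIDS_PER_BLOCK, i = nid % NIDS_PER_BLOCK;

	assert(nid >= 0 && nid < NAT_BLOCKS * NIDS_PER_BLOCK);
	if (free_bitmap[b][i] == is_free)
		return; /* no state change, so the count must not move */
	free_bitmap[b][i] = is_free;
	if (is_free)
		free_count[b]++;
	else
		free_count[b]--;
}

/* Return the first free nid, skipping full blocks without reading them. */
static int scan_free_nid(void)
{
	for (int b = 0; b < NAT_BLOCKS; b++) {
		if (free_count[b] == 0)
			continue; /* full NAT block: nothing to find here */
		for (int i = 0; i < NIDS_PER_BLOCK; i++) {
			bits_examined++;
			if (free_bitmap[b][i])
				return b * NIDS_PER_BLOCK + i;
		}
	}
	return -1;
}
```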
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f2fs: use __set{__clear}_bit_le · 23380b85
      Jaegeuk Kim committed
      This patch uses __set{__clear}_bit_le for higher speed.
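      In the kernel, the double-underscore bitops are the non-atomic variants, which is presumably where the speedup comes from. Callers must already serialize access to the bitmap. The little-endian bit numbering itself can be sketched in user space: bit nr lives in byte nr / 8 at position nr % 8, on any host. The demo_* functions are hypothetical stand-ins, not the kernel helpers.

```c
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for __set_bit_le(), __clear_bit_le()
 * and test_bit_le(). Like the kernel's double-underscore forms, they are
 * plain, non-atomic read-modify-write operations. Little-endian bit
 * numbering: bit nr is bit (nr % 8) of byte (nr / 8), on any host.
 */
static void demo_set_bit_le(size_t nr, unsigned char *addr)
{
	addr[nr / 8] |= (unsigned char)(1u << (nr % 8));
}

static void demo_clear_bit_le(size_t nr, unsigned char *addr)
{
	addr[nr / 8] &= (unsigned char)~(1u << (nr % 8));
}

static int demo_test_bit_le(size_t nr, const unsigned char *addr)
{
	return (addr[nr / 8] >> (nr % 8)) & 1;
}
```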
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f2fs: declare static functions · 9f7e4a2c
      Jaegeuk Kim committed
      This avoids a build warning reported by the kbuild test robot.
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      f2fs: don't overwrite node block by SSR · 720037f9
      Jaegeuk Kim committed
      This patch fixes a case where SSR could overwrite a previous warm node block
      that is part of a node chain written since the last checkpoint.
      
      Fixes: 5b6c6be2 ("f2fs: use SSR for warm node as well")
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 18 Mar, 2017: 6 commits
      pNFS/flexfiles: never nfs4_mark_deviceid_unavailable · da066f3f
      Weston Andros Adamson committed
      The flexfiles layout should never mark a device unavailable.
      
      Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
      it directly from the files layout, where it's still needed.
      
      The flexfiles driver still handles marked devices in error paths, but will
      now print a rate-limited warning.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      pNFS: return status from nfs4_pnfs_ds_connect · a33e4b03
      Weston Andros Adamson committed
      The nfs4_pnfs_ds_connect path can call rpc_create, which can fail, or it
      can wait on another context that reaches the same failure.
      
      This patch checks that rpc_create succeeded and returns the error to the
      caller.
      
      When an error is returned, both the files and flexfiles layouts will return
      NULL from _prepare_ds(). The flexfiles layout will also return the layout
      with the error NFS4ERR_NXIO.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      NFSv4.1 respect server's max size in CREATE_SESSION · 03385332
      Olga Kornievskaia committed
      Currently the client doesn't respect the max sizes the server returns in
      CREATE_SESSION. When nfs4_session_set_rwsize() gets called, server->rsize
      and server->wsize are 0, so they never get set to the sizes returned by
      the server.
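      One way to picture the fix is to treat a size of 0 as "not yet set" when clamping to the server's limit. This is a minimal sketch under that assumption. clamp_rwsize() is a hypothetical helper, and the real nfs4_session_set_rwsize() may derive the limits differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cur is the currently configured rsize or wsize
 * (0 = not set yet) and server_max is the limit derived from the
 * CREATE_SESSION reply. Testing only "cur > server_max" would leave
 * a 0 untouched, which is the behaviour the message describes.
 */
static uint32_t clamp_rwsize(uint32_t cur, uint32_t server_max)
{
	if (cur == 0 || cur > server_max)
		return server_max;
	return cur;
}
```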
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      NFS prevent double free in async nfs4_exchange_id · 63513232
      Olga Kornievskaia committed
      Since the rpc_task is async, the release function should be called, which
      will free the impl_id, scope, and owner.
      
      Trond pointed out two more problems:
      -- use of the client pointer after free in the nfs4_exchangeid_release() function
      -- cl_count mismatch if rpc_run_task() isn't run
      
      Fixes: 8d89bd70 ("NFS setup async exchange_id")
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # 4.9
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      nfs: make nfs4_cb_sv_ops static · 05fae7bb
      Jason Yan committed
      Fixes the following sparse warning:
      
      fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not
      declared. Should it be static?
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      NFS: fix the fault nrequests decreasing for nfs_inode COPY · 38a33101
      Kinglong Mee committed
      nfs_commit_file(), used by NFSv4.2's COPY operation, goes through the
      commit path for a normal WRITE but without incrementing nrequests, so
      the decrement of nrequests in nfs_commit_release_pages() is wrong.
      After that, nrequests holds an incorrect value.
      
      [ 5670.299881] ------------[ cut here ]------------
      [ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs]
      [ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache]
      [ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G        W   E   4.11.0-rc1+ #519
      [ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [ 5670.304094] Call Trace:
      [ 5670.304510]  dump_stack+0x63/0x86
      [ 5670.304917]  __warn+0xcb/0xf0
      [ 5670.305276]  warn_slowpath_null+0x1d/0x20
      [ 5670.305661]  nfs_clear_inode+0x66/0x90 [nfs]
      [ 5670.306093]  nfs4_evict_inode+0x61/0x70 [nfsv4]
      [ 5670.306480]  evict+0xbb/0x1c0
      [ 5670.306888]  dispose_list+0x4d/0x70
      [ 5670.307233]  evict_inodes+0x178/0x1a0
      [ 5670.307579]  generic_shutdown_super+0x44/0xf0
      [ 5670.307985]  nfs_kill_super+0x21/0x40 [nfs]
      [ 5670.308325]  deactivate_locked_super+0x43/0x70
      [ 5670.308698]  deactivate_super+0x5a/0x60
      [ 5670.309036]  cleanup_mnt+0x3f/0x90
      [ 5670.309407]  __cleanup_mnt+0x12/0x20
      [ 5670.309837]  task_work_run+0x80/0xa0
      [ 5670.310162]  exit_to_usermode_loop+0x89/0x90
      [ 5670.310497]  syscall_return_slowpath+0xaa/0xb0
      [ 5670.310875]  entry_SYSCALL_64_fastpath+0xa7/0xa9
      [ 5670.311197] RIP: 0033:0x7f1bb3617fe7
      [ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
      [ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7
      [ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0
      [ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000
      [ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a
      [ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000
      [ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]---
      
      Cc: linux-kernel@vger.kernel.org
      Fixes: 67911c8f ("NFS: Add nfs_commit_file()")
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  4. 17 Mar, 2017: 17 commits
      afs: Don't wait for page writeback with the page lock held · c5051c7b
      David Howells committed
      Drop the page lock before waiting for page writeback.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: ->writepage() shouldn't call clear_page_dirty_for_io() · 65a15109
      David Howells committed
      The ->writepage() op shouldn't call clear_page_dirty_for_io() as that has
      already been called by the caller.
      
      Fix afs_writepage() by moving the call out of
      afs_write_back_from_locked_page() to afs_writepages_region() where it is
      needed.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Fix abort on signal while waiting for call completion · 954cd6dc
      David Howells committed
      Fix the way in which a call that's in progress and being waited for is
      aborted in the case that EINTR is detected.  We should be sending
      RX_USER_ABORT rather than RX_CALL_DEAD as the abort code.
      
      Note that since the only two ways out of the loop are if the call completes
      or if a signal happens, the kill-the-call clause after the loop has
      finished can only happen in the case of EINTR.  This means that we only
      have one abort case to deal with, not two, and the "KWC" case can never
      happen and so can be deleted.
      
      Note further that simply aborting the call isn't necessarily the best thing
      here since, at this point, the request has been entirely sent and it's
      likely the server will perform the operation anyway, whether we abort it or
      not.  In future, we should punt the handling of the remainder of the call
      off to a background thread.
      Reported-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Fix an off-by-one error in afs_send_pages() · 445783d0
      David Howells committed
      afs_send_pages() should only put the call into the AFS_CALL_AWAIT_REPLY
      state if it has sent all the pages, but the check it makes is incorrect
      and sometimes it finishes the loop early.
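      The message doesn't show the faulty check. The sketch below therefore only illustrates the inclusive-range arithmetic where this kind of off-by-one usually lives. Pages first..last are sent in batches, and the await-reply flag is set only once the batch containing page last has gone out. send_pages() and its parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: send pages first..last (inclusive) in batches of at
 * most batch_max pages. Returns the number of batches sent and sets
 * *awaiting_reply only once the batch containing page last has been sent.
 */
static int send_pages(unsigned long first, unsigned long last,
		      unsigned long batch_max, bool *awaiting_reply)
{
	int batches = 0;

	*awaiting_reply = false;
	while (batch_max > 0 && first <= last) {
		/* Inclusive range: last - first + 1 pages remain, not last - first. */
		unsigned long n = last - first + 1;

		if (n > batch_max)
			n = batch_max;
		/* ... transmit pages first .. first + n - 1 here ... */
		first += n;
		batches++;
		if (first > last)
			*awaiting_reply = true; /* the final page has gone out */
	}
	return batches;
}
```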
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Fix afs_kill_pages() · 7286a35e
      David Howells committed
      Fix afs_kill_pages() in two ways:
      
       (1) If a writeback has been partially flushed and we then try to kill the
           pages it contains, some of them may no longer be undergoing writeback
           and end_page_writeback() will assert.
      
           Fix this by checking whether the page in question is actually
           undergoing writeback before ending that writeback.
      
       (2) The loop that scans for pages to kill doesn't advance the first page
           index, so the loop may never terminate; instead it will try to
           process the same pages over and over again.
      
           Fix this by increasing the first page index to one after the last page
           we processed.
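      Both fixes can be sketched on a toy page array. struct fake_page, fake_end_page_writeback() and kill_pages() are hypothetical. The assert() in fake_end_page_writeback() stands in for the assertion in end_page_writeback() mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: only the two flags this sketch needs. */
struct fake_page {
	bool writeback;
	bool killed;
};

/* Like end_page_writeback(), complain if the page isn't under writeback. */
static void fake_end_page_writeback(struct fake_page *p)
{
	assert(p->writeback);
	p->writeback = false;
}

/* Kill pages first..last (inclusive), at most batch_max pages per pass. */
static void kill_pages(struct fake_page *pages, unsigned long first,
		       unsigned long last, unsigned long batch_max)
{
	assert(batch_max > 0);
	while (first <= last) {
		unsigned long n = last - first + 1;

		if (n > batch_max)
			n = batch_max;
		for (unsigned long i = first; i < first + n; i++) {
			pages[i].killed = true;
			/* Fix (1): only end writeback on pages still under writeback. */
			if (pages[i].writeback)
				fake_end_page_writeback(&pages[i]);
		}
		/* Fix (2): move to one after the last page processed. */
		first += n;
	}
}
```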
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Fix page leak in afs_write_begin() · 6d06b0d2
      David Howells committed
      afs_write_begin() leaks a ref and a lock on a page if afs_fill_page()
      fails.  Fix the leak by unlocking and releasing the page in the error path.
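      The fixed error path follows the usual kernel goto-cleanup pattern: whatever was acquired before the failing call is released again. This sketch uses a hypothetical page model in which the two cleanup lines stand in for unlock_page() and put_page(). fake_fill_page() is a hypothetical stand-in for afs_fill_page(), and its -EIO error is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model: a reference count and a lock flag. */
struct fake_page {
	int refcount;
	bool locked;
};

/* Hypothetical stand-in for afs_fill_page(); fails on demand. */
static int fake_fill_page(struct fake_page *page, bool fail)
{
	(void)page;
	return fail ? -EIO : 0;
}

static int fake_write_begin(struct fake_page *page, bool fill_fails)
{
	int ret;

	page->refcount++;     /* take a reference on the page */
	page->locked = true;  /* and lock it */

	ret = fake_fill_page(page, fill_fails);
	if (ret < 0)
		goto error;
	return 0;             /* success: the caller now owns ref + lock */

error:
	page->locked = false; /* unlock_page() in the kernel */
	page->refcount--;     /* put_page() in the kernel */
	return ret;
}
```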
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Don't set PG_error on local EINTR or ENOMEM when filling a page · 68ae849d
      David Howells committed
      Don't set PG_error on a page if we get a local EINTR or ENOMEM when filling
      a page for writing.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Populate and use client modification time · ab94f5d0
      Marc Dionne committed
      The inode timestamps should be set from the client time
      in the status received from the server, rather than the
      server time which is meant for internal server use.
      
      Set AFS_SET_MTIME and populate the mtime for operations
      that take an input status, such as file/dir creation
      and StoreData.  If an input time is not provided, the
      server will set the vnode times based on the current server
      time.
      
      In a situation where the server's clock is skewed relative to the
      client's, this could lead to the client seeing a timestamp
      in the future for a file that it just created or wrote.
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Better abort and net error handling · 70af0e3b
      David Howells committed
      If we receive a network error, a remote abort or a protocol error whilst
      we're still transmitting data, make sure we return an appropriate error to
      the caller rather than ESHUTDOWN or ECONNABORTED.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Invalid op ID should abort with RXGEN_OPCODE · 1157f153
      David Howells committed
      When we are given an invalid operation ID, we should abort that with
      RXGEN_OPCODE rather than RX_INVALID_OPERATION.
      
      Also map RXGEN_OPCODE to -ENOTSUPP.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Fix the maths in afs_fs_store_data() · 146a1192
      David Howells committed
      afs_fs_store_data() works out the size of the write it's going to make,
      but it uses 32-bit unsigned subtraction in one place that gets
      automatically cast to loff_t.
      
      However, if to < offset, then the number goes negative, but as the result
      isn't signed, it doesn't get sign-extended to 64 bits when placed in a
      loff_t.
      
      Fix by casting the operands to loff_t.
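      The difference between the two computations is easy to reproduce in user space. In this sketch demo_loff_t is a hypothetical stand-in for the kernel's signed 64-bit loff_t. size_buggy() and size_fixed() mirror only the single subtraction the message describes, not the whole of afs_fs_store_data().

```c
#include <stdint.h>

typedef int64_t demo_loff_t; /* stand-in for the kernel's signed 64-bit loff_t */

/*
 * Buggy: the subtraction happens in 32-bit unsigned arithmetic, and the
 * result is then zero-extended, never sign-extended, into the 64-bit type.
 */
static demo_loff_t size_buggy(uint32_t offset, uint32_t to)
{
	demo_loff_t size = to - offset;

	return size;
}

/* Fixed: widen both operands first, so the subtraction is signed 64-bit. */
static demo_loff_t size_fixed(uint32_t offset, uint32_t to)
{
	return (demo_loff_t)to - (demo_loff_t)offset;
}
```

      With offset = 1 and to = 0, size_buggy() yields 4294967295 while size_fixed() yields -1. When to >= offset, the two agree.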
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Use a bvec rather than a kvec in afs_send_pages() · 2f5705a5
      David Howells committed
      Use a bvec rather than a kvec in afs_send_pages() as we don't then have to
      call kmap() in advance.  This allows us to pass the array of contiguous
      pages that we extracted through to rxrpc in one go rather than passing a
      single page at a time.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Make struct afs_read::remain 64-bit · 6a0e3999
      David Howells committed
      Make struct afs_read::remain 64-bit so that it can handle huge transfers if
      we ever request them or the server decides to give us a bit extra data (the
      other fields there are already 64-bit).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      afs: Fix AFS read bug · 29f06985
      David Howells committed
      Fix a bug in AFS read whereby the request page index, afs_read::index, isn't
      incremented after calling ->page_done() if ->remain reaches 0, indicating
      that the data read is complete.
      
      Without this fix, a NULL pointer dereference happens when ->page_done() is
      called twice for the last page, because the page-clearing loop also calls
      it and afs_readpages_page_done() clears the current entry in the page list.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: afs_readpages_page_done+0x21/0xa4 [kafs]
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in: kafs(E)
      CPU: 2 PID: 3002 Comm: md5sum Tainted: G            E   4.10.0-fscache #485
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804017d86c0 task.stack: ffff8803fc1d8000
      RIP: 0010:afs_readpages_page_done+0x21/0xa4 [kafs]
      RSP: 0018:ffff8803fc1db978 EFLAGS: 00010282
      RAX: ffff880405d39af8 RBX: 0000000000000000 RCX: ffff880407d83ed4
      RDX: 0000000000000000 RSI: ffff880405d39a00 RDI: ffff880405c6f400
      RBP: ffff8803fc1db988 R08: 0000000000000000 R09: 0000000000000001
      R10: ffff8803fc1db820 R11: ffff88040cf56000 R12: ffff8804088f1780
      R13: ffff8804017d86c0 R14: ffff8804088f1780 R15: 0000000000003840
      FS:  00007f8154469700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000004016ec000 CR4: 00000000001406e0
      Call Trace:
       afs_deliver_fs_fetch_data+0x5b9/0x60e [kafs]
       ? afs_make_call+0x316/0x4e8 [kafs]
       ? afs_make_call+0x359/0x4e8 [kafs]
       afs_deliver_to_call+0x173/0x2e8 [kafs]
       ? afs_make_call+0x316/0x4e8 [kafs]
       afs_make_call+0x37a/0x4e8 [kafs]
       ? wake_up_q+0x4f/0x4f
       ? __init_waitqueue_head+0x36/0x49
       afs_fs_fetch_data+0x21c/0x227 [kafs]
       ? afs_fs_fetch_data+0x21c/0x227 [kafs]
       afs_vnode_fetch_data+0xf3/0x1d2 [kafs]
       afs_readpages+0x314/0x3fd [kafs]
       __do_page_cache_readahead+0x208/0x2c5
       ondemand_readahead+0x3a2/0x3b7
       ? ondemand_readahead+0x3a2/0x3b7
       page_cache_async_readahead+0x5e/0x67
       generic_file_read_iter+0x23b/0x70c
       ? __inode_security_revalidate+0x2f/0x62
       __vfs_read+0xc4/0xe8
       vfs_read+0xd1/0x15a
       SyS_read+0x4c/0x89
       do_syscall_64+0x80/0x191
       entry_SYSCALL64_slow_path+0x25/0x25
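      The following is a toy model of the sequence described above. Completion calls the page-done hook and must then advance the index, because a later clearing loop calls the same hook for every remaining slot. All names here are hypothetical, and the assert() marks where the real code hit the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES 3 /* hypothetical */

struct fake_read {        /* loosely modelled on struct afs_read */
	int *pages[NR_PAGES];
	unsigned int index;   /* page currently being filled */
	unsigned int nr_pages;
};

/* Stand-in for ->page_done(): consumes and clears the current entry. */
static void page_done(struct fake_read *req)
{
	assert(req->pages[req->index] != NULL); /* the real code oopsed here */
	req->pages[req->index] = NULL;
}

/* Called when ->remain reaches 0: finish the current page, then advance. */
static void data_complete(struct fake_read *req)
{
	page_done(req);
	req->index++; /* the fix: never leave index on an already-cleared slot */
}

/* Clearing loop for any pages that were not filled. */
static void clear_remaining(struct fake_read *req)
{
	for (; req->index < req->nr_pages; req->index++)
		page_done(req);
}
```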
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      afs: Prevent callback expiry timer overflow · 56e71431
      Tina Ruchandani committed
      get_seconds() returns real wall-clock seconds. On 32-bit systems
      this value will overflow in the year 2038 and beyond. This patch changes
      the afs_vnode record to use ktime_get_real_seconds() instead for the
      fields cb_expires and cb_expires_at.
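      A signed 32-bit count of seconds since the epoch tops out at 2147483647, which is 2038-01-19 03:14:07 UTC. The sketch below computes an expiry in 64 bits, as with the time64_t returned by ktime_get_real_seconds(), and shows it running past that limit without wrapping. needs_64bit() and callback_expires_at() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest second count a signed 32-bit time value can hold:
 * 2038-01-19 03:14:07 UTC. */
#define DEMO_TIME32_MAX INT32_MAX

/* True if a time no longer fits a signed 32-bit field. */
static bool needs_64bit(int64_t secs)
{
	return secs > DEMO_TIME32_MAX;
}

/* Hypothetical expiry computation done entirely in 64 bits, so the
 * result stays correct past 2038 instead of wrapping. */
static int64_t callback_expires_at(int64_t now, int64_t lifetime)
{
	return now + lifetime;
}
```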
      Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Migrate vlocation fields to 64-bit · 8a79790b
      Tina Ruchandani committed
      get_seconds() returns real wall-clock seconds. On 32-bit systems
      this value will overflow in the year 2038 and beyond. This patch changes
      afs's vlocation record to use ktime_get_real_seconds() instead for the
      fields time_of_death and update_at.
      Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: security: Replace rcu_assign_pointer() with RCU_INIT_POINTER() · df8a09d1
      Andreea-Cristina Bernat committed
      This use of rcu_assign_pointer() is NULLing out the pointer.
      According to RCU_INIT_POINTER()'s block comment, case
      "1.   This use of RCU_INIT_POINTER() is NULLing out the pointer",
      it is better to use RCU_INIT_POINTER() instead of rcu_assign_pointer()
      here because it has smaller overhead.
      
      The following Coccinelle semantic patch was used:
      @@
      @@
      
      - rcu_assign_pointer
      + RCU_INIT_POINTER
        (..., NULL)
      Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>