1. 21 2月, 2019 2 次提交
  2. 02 12月, 2018 1 次提交
    • D
      nfs: don't dirty kernel pages read by direct-io · ad3cba22
      Dave Kleikamp 提交于
      When we use direct_IO with an NFS backing store, we can trigger a
      WARNING in __set_page_dirty(), as below, since we're dirtying the page
      unnecessarily in nfs_direct_read_completion().
      
      To fix, replicate the logic in commit 53cbf3b1 ("fs: direct-io:
      don't dirtying pages for ITER_BVEC/ITER_KVEC direct read").
      
      Other filesystems that implement direct_IO handle this; most use
      blockdev_direct_IO(). ceph and cifs have similar logic.
      
      mount 127.0.0.1:/export /nfs
      dd if=/dev/zero of=/nfs/image bs=1M count=200
      losetup --direct-io=on -f /nfs/image
      mkfs.btrfs /dev/loop0
      mount -t btrfs /dev/loop0 /mnt/
      
      kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0
      kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_
      kernel:  snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li
      kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G            E     4.20.0-rc1.master.20181111.ol7.x86_64 #1
      kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      kernel: Workqueue: nfsiod rpc_async_release [sunrpc]
      kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0
      kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8
      kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046
      kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001
      kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0
      kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080
      kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0
      kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001
      kernel: FS:  0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0
      kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      kernel: Call Trace:
      kernel:  __set_page_dirty_buffers+0xb6/0x110
      kernel:  set_page_dirty+0x52/0xb0
      kernel:  nfs_direct_read_completion+0xc4/0x120 [nfs]
      kernel:  nfs_pgio_release+0x10/0x20 [nfs]
      kernel:  rpc_free_task+0x30/0x70 [sunrpc]
      kernel:  rpc_async_release+0x12/0x20 [sunrpc]
      kernel:  process_one_work+0x174/0x390
      kernel:  worker_thread+0x4f/0x3e0
      kernel:  kthread+0x102/0x140
      kernel:  ? drain_workqueue+0x130/0x130
      kernel:  ? kthread_stop+0x110/0x110
      kernel:  ret_from_fork+0x35/0x40
      kernel: ---[ end trace 01341980905412c9 ]---
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      
      [forward-ported to v4.20]
      Signed-off-by: NCalum Mackay <calum.mackay@oracle.com>
      Reviewed-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      ad3cba22
  3. 09 8月, 2018 1 次提交
  4. 09 3月, 2018 1 次提交
  5. 16 1月, 2018 1 次提交
  6. 15 8月, 2017 1 次提交
  7. 21 4月, 2017 2 次提交
  8. 18 4月, 2017 1 次提交
  9. 25 12月, 2016 1 次提交
  10. 02 12月, 2016 1 次提交
  11. 23 9月, 2016 1 次提交
    • D
      NFS: direct: use complete() instead of complete_all() · 024de8f1
      Daniel Wagner 提交于
      There is only one waiter for the completion, therefore there
      is no need to use complete_all(). Let's make that clear by
      using complete() instead of complete_all().
      
      nfs_file_direct_write() or nfs_file_direct_read() allocated a request
      object via nfs_direct_req_alloc(), which initializes the
      completion. The request object then is freed later in the exit path.
      Between the initialization and the release either
      nfs_direct_write_schedule_iovec() resp
      nfs_direct_read_schedule_iovec() are called which will asynchronously
      process the request. The calling function waits via nfs_direct_wait()
      till the async work has been done. Thus there is only one waiter on
      the completion.
      
      nfs_direct_pgio_init() and nfs_direct_read_completion() are passed via
      function pointers to nfs pageio. The first function does a ref
      counting (get_dreq() and put_dreq()) which ensures that
      nfs_direct_read_completion() and nfs_direct_read_schedule_iovec() only
      call the completion path once.
      
      The usage pattern of the completion is:
      
      waiter context                          waker context
      
      nfs_file_direct_write()
        dreq = nfs_direct_req_alloc()
          init_completion()
        nfs_direct_write_schedule_iovec()
        nfs_direct_wait()
          wait_for_completion_killable()
      
                                              nfs_direct_write_schedule_work()
                                                nfs_direct_complete()
                                                  complete()
      
      nfs_file_direct_read()
        dreq = nfs_direct_req_all()
          init_completion()
        nfs_direct_read_schedule_iovec()
        nfs_direct_wait()
          wait_for_completion_killable()
                                              nfs_direct_read_schedule_iovec()
                                                nfs_direct_complete()
                                                  complete()
      
                                              nfs_direct_read_completion()
                                                nfs_direct_complete()
                                                  complete()
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      024de8f1
  12. 06 7月, 2016 6 次提交
  13. 25 6月, 2016 1 次提交
  14. 30 5月, 2016 1 次提交
  15. 09 5月, 2016 2 次提交
  16. 02 5月, 2016 3 次提交
  17. 11 4月, 2016 1 次提交
  18. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  19. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  20. 01 1月, 2016 3 次提交
  21. 29 12月, 2015 1 次提交
  22. 23 9月, 2015 1 次提交
    • K
      NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set · 834e465b
      Kinglong Mee 提交于
      When lseg's commit_through_mds is set, pnfs client always WARN once
      in nfs_direct_select_verf after checking ds_cinfo.nbuckets.
      
      nfs should use the DS verf except commit_through_mds is set for
      layout segment where nbuckets is zero.
      
      [17844.666094] ------------[ cut here ]------------
      [17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]()
      [17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
      [17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G        W  OE   4.3.0-rc1-pnfs+ #245
      [17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc]
      [17844.699212]  0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4
      [17844.699990]  ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000
      [17844.700844]  ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58
      [17844.701637] Call Trace:
      [17844.725252]  [<ffffffff813680c4>] dump_stack+0x19/0x25
      [17844.732693]  [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0
      [17844.733855]  [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20
      [17844.735015]  [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs]
      [17844.735999]  [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs]
      [17844.736846]  [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs]
      [17844.737782]  [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs]
      [17844.738597]  [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4]
      [17844.739486]  [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc]
      [17844.740326]  [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc]
      [17844.741173]  [<ffffffff810a387c>] process_one_work+0x21c/0x4c0
      [17844.741984]  [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0
      [17844.742837]  [<ffffffff810a3b6a>] worker_thread+0x4a/0x440
      [17844.743639]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.744399]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.745176]  [<ffffffff810a8d75>] kthread+0xf5/0x110
      [17844.745927]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.747105]  [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70
      [17844.747856]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.748642] ---[ end trace 336a2845d42b83f0 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      834e465b
  23. 25 4月, 2015 1 次提交
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
  24. 24 4月, 2015 2 次提交
  25. 16 4月, 2015 2 次提交
  26. 12 4月, 2015 1 次提交