- 04 2月, 2015 21 次提交
-
-
由 Tom Haynes 提交于
The flexfile layout is a new layout that extends the file layout. It is currently being drafted as a specification at https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com> Signed-off-by: NTao Peng <bergwolf@primarydata.com>
-
由 Peng Tao 提交于
Also take care to stop waiting if someone clears retry bit. Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
-
由 Peng Tao 提交于
Use it to indicate that LD wants to retry layoutget. LD can set it whenever it wants the common pnfs code to return and retry pnfs path through a new layout. The bit gets cleared when client does a new layoutget, when client closes the file (ROC case), or when kernel needs to evict the inode (non-ROC case). Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
-
由 Peng Tao 提交于
Otherwise we'll lose error tracking information when encoding layoutreturn. pnfs_put_lseg may be called from rpc callbacks. So we should not call pnfs_send_layoutreturn directly because it can deadlock in the rpc layer. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
When it is set, generic pnfs would try to send layoutreturn right before last close/delegation_return regard less NFS_LAYOUT_ROC is set or not. LD can then make sure layoutreturn is always sent rather than being omitted. The difference against NFS_LAYOUT_RETURN is that NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so LD can set it and expect generic layer to try pnfs path at the same time. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
So that callers can specify which range to return. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
If current IO cannot be completed due to some transient errors, LD may want to ask generic layer to resend the request through pnfs again. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count. For read, we always use pg_mirrors[0], so this effectively gives us freedom to use pg_mirror_idx to track the actual mirror to read from through out the IO stack. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <loghyr@primarydata.com>
-
由 Peng Tao 提交于
So that we can detect the case if some layout segments are still pinned which is surely a bug that we need to fix. Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
-
由 Weston Andros Adamson 提交于
This patch adds mirrored write support to the pgio layer. The default is to use one mirror, but pgio callers may define callbacks to change this to any value up to the (arbitrarily selected) limit of 16. The basic idea is to break out members of nfs_pageio_descriptor that cannot be shared between mirrored DSes and put them in a new structure. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
-
由 Weston Andros Adamson 提交于
This is needed to support mirrored writes - the first write can't just trash the lseg, we need to keep it around until all mirrors have written. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
-
由 Peng Tao 提交于
So that pnfs path is not disabled for ever. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
If current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN, send layoutreturn. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
And if we are to return the same type of layouts, don't bother sending more layoutgets. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN, which is an indicator for pnfs_put_lseg() to send layoutreturn, and also prevents pnfs_update_layout() from using the returning segments. Once it is set, it never gets cleared. It also sets proper io failure bit so that pnfs path can be retried after PNFS_LAYOUTGET_RETRY_TIMEOUT second. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
It allows to specify different iomode to return. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
So that it is possible to return a specific iomode layouts. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
Per RFC 5661 Errata 3208: | A client MAY always forget its layout state and associated | layout stateid at any time (See also section 12.5.5.1). | In such case, the client MUST use a non-layout stateid for the next | LAYOUTGET operation. This will signal the server that the client has | no more layouts on the file and its respective layout state can be | released before issuing a new layout in response to LAYOUTGET. In order to make such a signal unique to server, client needs to serialize all layoutgets using non-layout stateid. We implement this by serializing layoutgets when client has no layout segments at hand. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTom Haynes <Thomas.Haynes@primarydata.com>
-
由 Peng Tao 提交于
flexfiles needs to start layoutcommit when necessary Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
-
- 09 10月, 2014 1 次提交
-
-
由 Trond Myklebust 提交于
You cannot call pnfs_put_lseg_async() more than once per lseg, so it is really an inappropriate way to deal with a refcount issue. Instead, replace it with a function that decrements the refcount, and puts the final 'free' operation (which is incompatible with locks) on the workqueue. Cc: Weston Andros Adamson <dros@primarydata.com> Fixes: e6cf82d1: pnfs: add pnfs_put_lseg_async Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 11 9月, 2014 7 次提交
-
-
由 Christoph Hellwig 提交于
If a layout driver keeps per-inode state outside of the layout segments it needs to be notified of any layout returns or recalls on an inode, and not just about the freeing of layout segments. Add a method to acomplish this, which will allow the block layout driver to handle the case of truncated and re-expanded files properly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
Expedite layout recall processing by forcing a layout commit when we see busy segments. Without it the layout recall might have to wait until the VM decided to start writeback for the file, which can introduce long delays. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
Currently there is no XDR buffer space allocated for the per-layout driver layoutcommit payload, which leads to server buffer overflows in the blocklayout driver even under simple workloads. As we can't do per-layout sizes for XDR operations we'll have to splice a previously encoded list of pages into the XDR stream, similar to how we handle ACL buffers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
After we issued a layoutreturn operations the may free the layout stateid and will thus cause bad stateid error when the client uses it again. We currently try to avoid this case by chosing the open stateid if not lsegs are present for this inode. But various places can hold refererence on lsegs and thus cause the list not to be empty shortly after a layout return. Add an explicit flag to mark the current layout stateid invalid and force usage of the openstateid after we did a full file layoutreturn. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
When layoutget returns an entirely new layout stateid it should not check the generation counter as the new stateid will start with a new counter entirely unrelated to old one. The current behavior causes constant layoutget failures against a block server which allocates a new stateid after an recall that removed all outstanding layouts. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
Ensure the lsegs are initialized early so that we don't pass an unitialized one back to ->free_lseg during error processing. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Peng Tao 提交于
Track lwb in nfs_commit_data so that we can use it to setup layoutcommit in commit_done callback. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 04 8月, 2014 1 次提交
-
-
由 Weston Andros Adamson 提交于
This is useful when lsegs need to be released while holding locks. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 16 7月, 2014 1 次提交
-
-
由 NeilBrown 提交于
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 6月, 2014 4 次提交
-
-
由 Weston Andros Adamson 提交于
Clean up pnfs_read_done_resend_to_mds and pnfs_write_done_resend_to_mds: - instead of passing all arguments from a nfs_pgio_header, just pass the header - share the common code Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
The refcounting on nfs_pgio_header was related to there being (possibly) more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into nfs_pgio_header, there is no reason to do this ref counting. Just call the completion callback on nfs_pgio_release/nfs_pgio_error. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 10 6月, 2014 1 次提交
-
-
由 Weston Andros Adamson 提交于
end_offset and req_offset both return u64 - avoid casting to u32 until it's needed, when it's less than the (u32) size returned by nfs_generic_pg_test. Also, fix the comments in pnfs_generic_pg_test. Running the cthon04 special tests caused this lockup in the "write/read at 2GB, 4GB edges" test when running against a file layout server: BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4 irq event stamp: 205958 hardirqs last enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30 hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000 RIP: 0010:[<ffffffffa02ce51f>] [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs] RSP: 0018:ffff880078c4fab0 EFLAGS: 00000202 RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300 RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035 R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0 Stack: ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8 00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0 ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34 Call Trace: [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs] [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs] [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs] [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs] [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff8149cf9e>] ? printk+0x54/0x56 [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9 [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc] [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs] [<ffffffff810ec59c>] do_writepages+0x21/0x2f [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4] [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20 [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs] [<ffffffff81129215>] do_sync_write+0x59/0x78 [<ffffffff8112956c>] vfs_write+0xab/0x107 [<ffffffff81129c8b>] SyS_write+0x49/0x7f [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b Reported-by: NAnna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 29 5月, 2014 4 次提交
-
-
由 Weston Andros Adamson 提交于
Remove alignment checks that would revert to MDS and change pg_test to return the max ammount left in the segment (or other pg_test call) up to size of passed request, or 0 if no space is left. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Since the ability to split pages into subpage requests has been added, nfs_pgio_header->rpc_list only ever has one pgio data. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Now that pg_test can change the size of the request (by returning a non-zero size smaller than the request), pg_test functions that call other pg_test functions must return the minimum of the result - or 0 if any fail. Also clean up the logic of some pg_test functions so that all checks are for contitions where coalescing is not possible. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
This is a step toward allowing pg_test to inform the the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior as it acts the same as when the pg_test functions returned bool. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-