- 16 8月, 2011 1 次提交
-
-
由 Jeff Moyer 提交于
Commit ae1b1539, block: reimplement FLUSH/FUA to support merge, introduced a performance regression when running any sort of fsyncing workload using dm-multipath and certain storage (in our case, an HP EVA). The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out that dm-multipath always advertised flush+fua support, and passed commands on down the stack, where those flags used to get stripped off. The above commit changed that behavior: static inline struct request *__elv_next_request(struct request_queue *q) { struct request *rq; while (1) { - while (!list_empty(&q->queue_head)) { + if (!list_empty(&q->queue_head)) { rq = list_entry_rq(q->queue_head.next); - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || - (rq->cmd_flags & REQ_FLUSH_SEQ)) - return rq; - rq = blk_do_flush(q, rq); - if (rq) - return rq; + return rq; } Note that previously, a command would come in here, have REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush: struct request *blk_do_flush(struct request_queue *q, struct request *rq) { unsigned int fflags = q->flush_flags; /* may change, cache it */ bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); unsigned skip = 0; ... if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { rq->cmd_flags &= ~REQ_FLUSH; if (!has_fua) rq->cmd_flags &= ~REQ_FUA; return rq; } So, the flush machinery was bypassed in such cases (q->flush_flags == 0 && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)). Now, however, we don't get into the flush machinery at all. Instead, __elv_next_request just hands a request with flush and fua bits set to the scsi_request_fn, even if the underlying request_queue does not support flush or fua. The agreed upon approach is to fix the flush machinery to allow stacking. While this isn't used in practice (since there is only one request-based dm target, and that target will now reflect the flush flags of the underlying device), it does future-proof the solution, and make it function as designed. In order to make this work, I had to add a field to the struct request, inside the flush structure (to store the original req->end_io). Shaohua had suggested overloading the union with rb_node and completion_data, but the completion data is used by device mapper and can also be used by other drivers. So, I didn't see a way around the additional field. I tested this patch on an HP EVA with both ext4 and xfs, and it recovers the lost performance. Comments and other testers, as always, are appreciated. Cheers, Jeff Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 11 8月, 2011 3 次提交
-
-
由 Shaohua Li 提交于
This patch reverts commit 35ae66e0(block: Make rq_affinity = 1 work as expected). The purpose is to avoid an unnecessary IPI. Let's take an example. My test box has cpu 0-7, one socket. Say request is added from CPU 1, blk_complete_request() occurs at CPU 7. Without the reverted patch, softirq will be done at CPU 7. With it, an IPI will be directed to CPU 0, and softirq will be done at CPU 0. In this case, doing softirq at CPU 0 and CPU 7 have no difference from cache sharing point view and we can avoid an ipi if doing it in CPU 7. An immediate concern is this is just like QUEUE_FLAG_SAME_FORCE, but actually not. blk_complete_request() is running in interrupt handler, and currently I/O controller doesn't support multiple interrupts (I checked several LSI cards and AHCI), so only one CPU can run blk_complete_request(). This is still quite different as QUEUE_FLAG_SAME_FORCE. Since only one CPU runs softirq, the only difference with below patch is softirq not always runs at the first CPU of a group. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Namhyung Kim 提交于
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Matthew Wilcox 提交于
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as on a request, so relocate them to the shared part of the enum. Signed-off-by: NMatthew Wilcox <willy@linux.intel.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 8月, 2011 2 次提交
-
-
由 Jens Axboe 提交于
Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
-
由 Jeff Moyer 提交于
blk_insert_flush has the following check: /* * If there's data but flush is not necessary, the request can be * processed directly without going through flush machinery. Queue * for normal execution. */ if ((policy & REQ_FSEQ_DATA) && !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) { list_add_tail(&rq->queuelist, &q->queue_head); return; } However, blk_flush_policy will not return with policy set to only REQ_FSEQ_DATA: static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq) { unsigned int policy = 0; if (fflags & REQ_FLUSH) { if (rq->cmd_flags & REQ_FLUSH) policy |= REQ_FSEQ_PREFLUSH; if (blk_rq_sectors(rq)) policy |= REQ_FSEQ_DATA; if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA)) policy |= REQ_FSEQ_POSTFLUSH; } return policy; } Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set. Fix this mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH check. Tejun notes: Hmmm... yes, this can become a correctness issue if (and only if) blk_queue_flush() is called to change q->flush_flags while requests are in-flight; otherwise, requests wouldn't reach the function at all. Also, I think it would be a generally good idea to always set FSEQ_DATA if the request has data. Cheers, Jeff Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 09 8月, 2011 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
With the frontend having Xen but the backend not, it just looks odd: <*> Xen virtual block device support <*> Block-device backend driver Fix it to have the 'Xen' in front of it. Reported-by: NSander Eikelenboom <linux@eikelenboom.it> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 05 8月, 2011 2 次提交
-
-
由 Vivek Goyal 提交于
There are always questions about why CFQ is idling on various conditions. Recent ones is Christoph asking again why to idle on REQ_NOIDLE. His assertion is that XFS is relying more and more on workqueues and is concerned that CFQ idling on IO from every workqueue will impact XFS badly. So he suggested that I add some more documentation about CFQ idling and that can provide more clarity on the topic and also gives an opprotunity to poke a hole in theory and lead to improvements. So here is my attempt at that. Any comments are welcome. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Tao Ma 提交于
Commit 5757a6d7 introduced a new rq_affinity = 2 so as to make the request completed in the __make_request cpu. But it makes the old rq_affinity = 1 not work any more. The root cause is that if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu, ccpu will be the same as group_cpu, so the completion will be excuted in the 'cpu' not 'group_cpu'. This patch fix problem by simpling removing group_cpu and the codes are more explicit now. If ccpu == cpu, we complete in cpu, otherwise we raise_blk_irq to ccpu. Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: NTao Ma <boyu.mt@taobao.com> Reviewed-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 03 8月, 2011 1 次提交
-
-
由 Axel Lin 提交于
of_device_id structures need a NULL terminating entry, add it. Signed-off-by: NAxel Lin <axel.lin@gmail.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 02 8月, 2011 5 次提交
-
-
由 Herbert Poetzl 提交于
Remove the (unsigned long long) cast in diskstats_show() and adjusts the seq_printf() format string to 'unsigned long' diskstats_show() uses part_stat_read() to get the stats, which either accesses the specified field in the struct disk_stats directly (non SMP) or sums up the per CPU values in a variable of the same type as the field, so in any case the result will have the same type and range as the specified field which for all disk_stats entries is unsigned long Also, for unsigned long ranges the output of %lu should be identical to the one of %llu, so no change in the actual proc entry contents. Signed-off-by: NHerbert Poetzl <herbert@13thfloor.at> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Andrew Morton 提交于
The report has an ISO which has a very long manufacturer ID. It seems that Linux is wrong, not the ISO maker. Relax the check for the length of this field: emit a warning and truncate the incoming data to 2048 bytes rather than rejecting the entire thing. dvd_manufact.value isn't null-terminated. I'm not even sure if it's a string. The kernel doesn't apepar to use it anyway. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=39062 Reported-by: <ale.goujon@gmail.com> Tested-by: <ale.goujon@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 H Hartley Sweeten 提交于
The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
Due to conflicts with the moduleh tree in linux-next, we run into an include file mess. We really need export.h in that tree, but if we add module.h locally then the issue is easier to resolve. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Vivek Goyal 提交于
FQ keeps track of number of groups which are linked on blkcg->blkg_list. This is useful to avoid races between queue exit and cgroup exit code paths. So if at the request queue exit time linked group count is not zero, that means there are some group out there which is yet to be deleted under rcu read period and queue exit code should wait for on rcu period. In my previous patch I forgot to decrease the number of group count. So in current form, we nr_blkcg_linked_grps is always non-zero and we will always wait one rcu period (if BLK_CGROUP=y). The side effect of this is that it can increase boot time. I am surprised, nobody complained so far. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 01 8月, 2011 25 次提交
-
-
由 Shaohua Li 提交于
read request is always sync. Using rw_is_sync() to determine if a bio is sync. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Kay Sievers 提交于
LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD waits for show() to finish to remove the sysfs file. cat /sys/class/block/loop0/loop/backing_file mutex_lock_nested+0x176/0x350 ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] loop_attr_do_show_backing_file+0x2f/0xd0 [loop] dev_attr_show+0x1b/0x60 ? sysfs_read_file+0x86/0x1a0 ? __get_free_pages+0x12/0x50 sysfs_read_file+0xaf/0x1a0 ioctl(LOOP_CLR_FD): wait_for_common+0x12c/0x180 ? try_to_wake_up+0x2a0/0x2a0 wait_for_completion+0x18/0x20 sysfs_deactivate+0x178/0x180 ? sysfs_addrm_finish+0x43/0x70 ? sysfs_addrm_start+0x1d/0x20 sysfs_addrm_finish+0x43/0x70 sysfs_hash_and_remove+0x85/0xa0 sysfs_remove_group+0x59/0x100 loop_clr_fd+0x1dc/0x3f0 [loop] lo_ioctl+0x223/0x7a0 [loop] Instead of taking the lo_ctl_mutex from sysfs code, take the inner lo->lo_lock, to protect the access to the backing_file data. Thanks to Tejun for help debugging and finding a solution. Cc: Milan Broz <mbroz@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Kay Sievers 提交于
Instead of unconditionally creating a fixed number of dead loop devices which need to be investigated by storage handling services, even when they are never used, we allow distros start with 0 loop devices and have losetup(8) and similar switch to the dynamic /dev/loop-control interface instead of searching /dev/loop%i for free devices. Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Kay Sievers 提交于
Loop devices today have a fixed pre-allocated number of usually 8. The number can only be changed at module init time. To find a free device to use, /dev/loop%i needs to be scanned, and all devices need to be opened until a free one is possibly found. This adds a new /dev/loop-control device node, that allows to dynamically find or allocate a free device, and to add and remove loop devices from the running system: LOOP_CTL_ADD adds a specific device. Arg is the number of the device. It returns the device i or a negative error code. LOOP_CTL_REMOVE removes a specific device, Arg is the number the device. It returns the device i or a negative error code. LOOP_CTL_GET_FREE finds the next unbound device or allocates a new one. No arg is given. It returns the device i or a negative error code. The loop kernel module gets automatically loaded when /dev/loop-control is accessed the first time. The alias specified in the module, instructs udev to create this 'dead' device node, even when the module is not loaded. Example: cfd = open("/dev/loop-control", O_RDWR); # add a new specific loop device err = ioctl(cfd, LOOP_CTL_ADD, devnr); # remove a specific loop device err = ioctl(cfd, LOOP_CTL_REMOVE, devnr); # find or allocate a free loop device to use devnr = ioctl(cfd, LOOP_CTL_GET_FREE); sprintf(loopname, "/dev/loop%i", devnr); ffd = open("backing-file", O_RDWR); lfd = open(loopname, O_RDWR); err = ioctl(lfd, LOOP_SET_FD, ffd); Cc: Tejun Heo <tj@kernel.org> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Kay Sievers 提交于
Replace the linked list, that keeps track of allocated devices, with an idr index to allow a more efficient lookup of devices. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Mike Christie 提交于
This moves the FC classes bsg code to the block layer and makes it a lib so that other classes like iscsi and SAS can use it. It is helpful because working with the request queue, bios, creating scatterlists, etc are a pain that the LLD does not have to worry about with normal IOs and should not have to worry about for bsg requests. Signed-off-by: NMike Christie <michaelc@cs.wisc.edu> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
git://git.linux-nfs.org/projects/trondmy/linux-nfs由 Linus Torvalds 提交于
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) pnfsblock: write_pagelist handle zero invalid extents pnfsblock: note written INVAL areas for layoutcommit pnfsblock: bl_write_pagelist pnfsblock: bl_read_pagelist pnfsblock: cleanup_layoutcommit pnfsblock: encode_layoutcommit pnfsblock: merge rw extents pnfsblock: add extent manipulation functions pnfsblock: bl_find_get_extent pnfsblock: xdr decode pnfs_block_layout4 pnfsblock: call and parse getdevicelist pnfsblock: merge extents pnfsblock: lseg alloc and free pnfsblock: remove device operations pnfsblock: add device operations pnfsblock: basic extent code pnfsblock: use pageio_ops api pnfsblock: add blocklayout Kconfig option, Makefile, and stubs pnfs: cleanup_layoutcommit pnfs: ask for layout_blksize and save it in nfs_server ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6由 Linus Torvalds 提交于
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slab: use NUMA_NO_NODE slab: remove one NR_CPUS dependency
-
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback由 Linus Torvalds 提交于
* 'urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: don't busy retry the inode on failed grab_super_passive()
-
git://git.infradead.org/battery-2.6由 Linus Torvalds 提交于
* git://git.infradead.org/battery-2.6: gpio-charger: Fix checking return value of request_any_context_irq power_supply: MAX17042: Support additional properties max8903_charger: Allow platform data to be __initdata power_supply: Add charger driver for MAX8998/LP3974 power_supply: Add charger driver for MAX8997/8966 max17042_battery: Remove obsolete cleanup for clientdata twl4030_charger: Fix warnings wm831x_power: Support multiple instances wm831x_backup: Support multiple instances apm_power: Fix style error in macros s3c_adc_battery: Fix annotation for s3c_adc_battery_probe() bq20z75: Enable detection after registering bq20z75: Add support for external notification
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/cpupowerutils由 Linus Torvalds 提交于
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/cpupowerutils: cpupower: Do detect IDA (opportunistic processor performance) via cpuid cpupower: Show Intel turbo ratio support via ./cpupower frequency-info cpupowerutils: increase MAX_LINE_LEN cpupower: Rename package from cpupowerutils to cpupower cpupowerutils: Rename: libcpufreq->libcpupower cpupowerutils: use kernel version-derived version string cpupowerutils: utils - ConfigStyle bugfixes cpupowerutils: helpers - ConfigStyle bugfixes cpupowerutils: idle_monitor - ConfigStyle bugfixes cpupowerutils: lib - ConfigStyle bugfixes cpupowerutils: bench - ConfigStyle bugfixes cpupowerutils: do not update po files on each and every compile cpupowerutils: remove ccdv, use kernel quiet/verbose mechanism cpupowerutils: use COPYING, CREDITS from top-level directory cpupowerutils - cpufrequtils extended with quite some features
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6由 Linus Torvalds 提交于
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: smc91c92_cs.c: fix bogus compiler warning orinoco_cs: be more careful when matching cards with ID 0x0156:0x0002 hostap_cs: support cards with "Version 01.02" as third product ID pcmcia: add PCMCIA_DEVICE_MANF_CARD_PROD_ID3 pxa2xx pcmcia - stargate 2 use gpio array. pcmcia: pxa2xx: remove empty socket_init / socket_resume functions. drivers:pcmcia:soc_common: make socket_init and socket_suspend optional
-
由 Peng Tao 提交于
For invalid extents, find other pages in the same fsblock and write them out. [pnfsblock: write_begin] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NPeng Tao <peng_tao@emc.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Signed-off-by: NPeng Tao <peng_tao@emc.com> Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: bl_write_pagelist support functions] [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: NZhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> [SQUASHME: pnfsblock: mds_offset is set in the generic layer] Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: NPeng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: fixup blksize alignment in bl_setup_layoutcommit] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: NZhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: NPeng Tao <peng_tao@emc.com> [pnfsblock: read path error handling] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: NZhang Jingwang <yyalone@gmail.com> [pnfs-block: use new read_pagelist api] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: NJim Rees <rees@umich.edu> [pnfsblock: introduce bl_committing list] Signed-off-by: NPeng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: NTao Guo <guotao@nrchpc.ac.cn> [pnfsblock: cleanup_layoutcommit wants a status parameter] Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NPeng Tao <peng_tao@emc.com> Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: NTao Guo <guotao@nrchpc.ac.cn> [pnfsblock: prevent commit list corruption] [pnfsblock: fix layoutcommit with an empty opaque] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Adds working implementations of various support functions to handle INVAL extents, needed by writes, such as bl_mark_sectors_init and bl_is_sector_init. [pnfsblock: fix 64-bit compiler warnings for extent manipulation] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> [Implement release_inval_marks] Signed-off-by: NZhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Implement bl_find_get_extent(), one of the core extent manipulation routines. [pnfsblock: Lookup list entry of layouts and tags in reverse order] Signed-off-by: NZhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJim Rees <rees@umich.edu> pnfsblock: fix print format warnings for sector_t and size_t gcc spews warnings about these on x86_64, e.g.: fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’ fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
XDR decodes the block layout payload sent in LAYOUTGET result, storing the result in an extent list. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: NTao Guo <guotao@nrchpc.ac.cn> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO for each device returned. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: NJim Rees <rees@umich.edu> [pnfsblock: fix pnfs_deviceid references] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix print format warnings for sector_t and size_t] [pnfs-block: #include <linux/vmalloc.h>] [pnfsblock: no PNFS_NFS_SERVER] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [pnfsblock: fix bug determining size of striped volume] [pnfsblock: fix oops when using multiple devices] Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> [pnfsblock: get rid of vmap and deviceid->area structure] Signed-off-by: NPeng Tao <peng_tao@emc.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Replace a stub, so that extents underlying the layouts are properly added, merged, or ignored as necessary. Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: delete the new node before put it] Signed-off-by: NMingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NPeng Tao <peng_tao@emc.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Fred Isaman 提交于
Signed-off-by: NFred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: NTao Guo <guotao@nrchpc.ac.cn> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NZhang Jingwang <Jingwang.Zhang@emc.com> Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJim Rees <rees@umich.edu> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-