- 23 Apr 2015, 1 commit
-
-
Submitted by Vinod Koul
HSU_DMA is selected by the HSU_DMA_PCI driver, so this symbol should not be user-selectable; remove the user prompt. Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Apr 2015, 36 commits
-
-
Submitted by Ilya Dryomov
After the switch to blk-mq, rbd_wq processes requests, not devices. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Submitted by Wolfram Sang
My Pengutronix address is not valid anymore; redirect people to the Pengutronix kernel team. Reported-by: Harald Geyer <harald@ccbib.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Robert Schwebel <r.schwebel@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Wolfram Sang
My Pengutronix address is not valid anymore; redirect people to the Pengutronix kernel team. Reported-by: Harald Geyer <harald@ccbib.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Robert Schwebel <r.schwebel@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Aaro Koskinen
Use fixed-length strings for register names. This saves 416 bytes of text size. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Aaro Koskinen
Fix some trivial coding style issues to reduce noise from static analyzers. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Aaro Koskinen
Convert the OCTEON watchdog to the WATCHDOG_CORE API. This enables support for multiple watchdogs on OCTEON boards. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
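For context, a WATCHDOG_CORE conversion generally boils down to filling in a struct watchdog_device and registering it with the framework, which then provides the /dev/watchdogN character devices (and so allows several instances). A minimal sketch with placeholder names, not the octeon-wdt code itself:

    #include <linux/module.h>
    #include <linux/watchdog.h>

    static int my_wdt_start(struct watchdog_device *wdd)
    {
        /* kick/enable the hardware timer here */
        return 0;
    }

    static int my_wdt_stop(struct watchdog_device *wdd)
    {
        /* disable the hardware timer here */
        return 0;
    }

    static const struct watchdog_info my_wdt_info = {
        .options  = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING | WDIOF_MAGICCLOSE,
        .identity = "example-wdt",
    };

    static const struct watchdog_ops my_wdt_ops = {
        .owner = THIS_MODULE,
        .start = my_wdt_start,
        .stop  = my_wdt_stop,
    };

    static struct watchdog_device my_wdt = {
        .info    = &my_wdt_info,
        .ops     = &my_wdt_ops,
        .timeout = 60,
    };

    /* in probe(): watchdog_register_device(&my_wdt);
     * the core then exposes /dev/watchdogN for this instance */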
-
Submitted by Michal Simek
Remove the Kconfig dependency and enable the driver for all architectures. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Mathieu Olivari
MSM watchdog configuration happens in the same register block as the timer, so we'll use the same binding as the existing timer. The qcom-wdt will now be probed when the devicetree has an entry compatible with "qcom,kpss-timer" or "qcom,scss-timer". Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Submitted by Joe Perches
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
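The intended pattern after such conversions is to stop checking seq_printf()'s return value altogether; a minimal sketch of a generic show() callback (the callback name and output are illustrative):

    #include <linux/seq_file.h>

    /* Illustrative show() callback: seq_printf() output is not checked;
     * the seq_file core uses seq_has_overflowed() internally and retries
     * with a larger buffer when needed. */
    static int example_show(struct seq_file *m, void *v)
    {
        seq_printf(m, "status: %d\n", 0);
        seq_puts(m, "more data\n");
        return 0;
    }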
-
Submitted by Vinod Koul
DW_DMAC_CORE is selected by the PCI or platform driver, so this symbol shouldn't be user-selectable; remove the prompt. Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Submitted by Eric Mei
When the array is degraded, a read that lands on a failed drive results in reading the rest of the data in the stripe, so a single sequential read ends up reading the same data twice. This patch avoids chunk-aligned reads for degraded arrays. The downside is that it involves the stripe cache, which means the associated CPU overhead and an extra memory copy.

Test results: the following tests were done on an enterprise storage node with Seagate 6T SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9 GHz), 10-disk MD RAID6 8+2, chunk size 128 KiB. I used FIO with direct I/O, various block sizes and enough queue depth, and tested sequential and 100% random reads against three array configurations: 1) optimal, as baseline; 2) degraded; 3) degraded with this patch. Kernel version is 4.0-rc3. Each individual test was run only once, so there might be some variation, but we just focus on the big trend.

Sequential Read:
  bs=(KiB)  optimal(MiB/s)  degraded(MiB/s)  degraded-with-patch(MiB/s)
      1024            1608              656                          995
       512            1624              710                          956
       256            1635              728                          980
       128            1636              771                          983
        64            1612             1119                         1000
        32            1580             1420                         1004
        16            1368              688                          986
         8             768              647                          953
         4             411              413                          850

Random Read:
  bs=(KiB)  optimal(IOPS)  degraded(IOPS)  degraded-with-patch(IOPS)
      1024            163             160                        156
       512            274             273                        272
       256            426             428                        424
       128            576             592                        591
        64            726             724                        726
        32            849             848                        837
        16            900             970                        971
         8            927             940                        929
         4            948             940                        955

Some notes:
* In sequential + optimal, as the block size gets smaller the FIO thread becomes CPU bound.
* In sequential + degraded, there is a big increase when bs is 64K and 32K; I don't have an explanation.
* In sequential + degraded-with-patch, the MD thread mostly becomes CPU bound.

If you want, we can discuss specific data points. In general it seems that with this patch we get more predictable and in most cases significantly better sequential read performance when the array is degraded, with almost no noticeable impact on random reads. Performance is a complicated thing; the patch works well for this particular configuration but may not be universal. For example, I imagine testing on an all-SSD array may give very different results. But I personally think that in most cases IO bandwidth is a scarcer resource than CPU. Signed-off-by: Eric Mei <eric.mei@seagate.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
The default setting of 256 stripe_heads is probably much too small for many configurations, so it is best to make it auto-configure. Shrinking the cache under memory pressure is easy. The only interesting part here is that we put a fairly high cost ('seeks') on shrinking the cache, as the cost is greater than just having to read more data: it reduces parallelism. Growing the cache on demand needs to be done carefully. If we allow fast growth, that can upset the memory balance, as lots of dirty memory can quickly turn into lots of memory queued in the stripe_cache. It is important for the raid5 block device to appear congested to allow write-throttling to work, so we only add stripes slowly. We set a flag when an allocation fails because all stripes are in use, allocate at a convenient time when that flag is set, and don't allow it to be set again until at least one stripe_head has been released for re-use. This means that a spurt of requests will only cause one stripe_head to be allocated, but a steady stream of requests will slowly increase the cache size, until memory pressure puts it back again. It could take hours to reach a steady state. The value written to, and displayed in, stripe_cache_size is used as a minimum. The cache can grow above this and shrink back down to it. The actual size is not directly visible, though it can be deduced to some extent by watching stripe_cache_active. Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
This allows us to easily add more (atomic) flags. Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
Rather than adjusting max_nr_stripes whenever {grow,drop}_one_stripe() succeeds, do it inside the functions. Also choose the correct hash to handle next inside the functions. This removes duplication and will help with future new uses of {grow,drop}_one_stripe. This also fixes a minor bug where the "md/raid:%md: allocate XXkB" message always said "0kB". Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
This is needed for a future improvement to stripe cache management. Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Markus Stockhausen
Depending on the available coding we allow optimized rmw logic for write operations. To support easier testing, this patch allows manual control of the rmw/rcw decision through the interface /sys/block/mdX/md/rmw_level. The configuration can handle three levels of control:

rmw_level=0: Disable rmw for all RAID types. Hardware-assisted P/Q calculation has no implementation path yet to factor chunks in/out of a syndrome. Enforcing this level can be beneficial for slow CPUs with hardware syndrome support and fast SSDs.

rmw_level=1: Estimate rmw IOs and rcw IOs and execute rmw only if we will save IOs. This equals the "old" unpatched behaviour and will be the default.

rmw_level=2: Execute rmw even if the calculated IOs for rmw and rcw are equal. We might have higher CPU consumption because of calculating the parity twice, but it can be beneficial otherwise, e.g. RAID4 with a fast dedicated parity disk/SSD. The option is implemented just to be forward-looking and will ONLY work with this patch!

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
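For testing, the knob can simply be written from user space; a minimal sketch (the helper and device names below are illustrative, not part of the patch):

    #include <stdio.h>

    /* Hypothetical helper: set the rmw/rcw policy of an md array,
     * e.g. set_rmw_level("md0", 0) to force rcw-only behaviour. */
    static int set_rmw_level(const char *md, int level)
    {
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/md/rmw_level", md);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%d\n", level);
        return fclose(f);
    }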
-
Submitted by Markus Stockhausen
Glue it all together. The raid6 rmw path should work the same as the already existing raid5 logic, so emulate the prexor handling/flags and split functions as needed:
1) Enable xor_syndrome() in the async layer.
2) Split ops_run_prexor() into RAID4/5 and RAID6 logic. Xor the syndrome at the start of an rmw run as we did before for the single parity.
3) Take care of the rmw run in ops_run_reconstruct6(). Again, process only the changed pages to get the syndrome back into sync.
4) Enhance set_syndrome_sources() to fill NULL pages if we are in an rmw run. The lower layers will calculate start & end pages from that and call xor_syndrome() correspondingly.
5) Adapt the several places where we ignored Q handling up to now.

Performance numbers for a single E5630 system with a mix of 10 7200k desktop/server disks. 300 seconds of random writes with 8 threads onto a 3.2 TB (10*400GB) RAID6, 64K chunk, without spare (group_thread_cnt=4):

  bsize   rmw_level=1   rmw_level=0   rmw_level=1   rmw_level=0
          skip_copy=1   skip_copy=1   skip_copy=0   skip_copy=0
     4K      115 KB/s      141 KB/s      165 KB/s      140 KB/s
     8K      225 KB/s      275 KB/s      324 KB/s      274 KB/s
    16K      434 KB/s      536 KB/s      640 KB/s      534 KB/s
    32K      751 KB/s    1,051 KB/s    1,234 KB/s    1,045 KB/s
    64K    1,339 KB/s    1,958 KB/s    2,282 KB/s    1,962 KB/s
   128K    2,673 KB/s    3,862 KB/s    4,113 KB/s    3,898 KB/s
   256K    7,685 KB/s    7,539 KB/s    7,557 KB/s    7,638 KB/s
   512K   19,556 KB/s   19,558 KB/s   19,652 KB/s   19,688 KB/s

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
Expansion/resync can grab a stripe while the stripe is in a batch list. Since all stripes in a batch list must be in the same state, we can't allow some stripes to run into expansion/resync, so we delay expansion/resync for stripes in a batch list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
If an I/O error happens in any stripe of a batch list, the batch list will be split, and the normal process will then run for the stripes in the list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
The stripe cache is 4k in size, so even adjacent full-stripe writes are handled in 4k units. Ideally we should use a bigger size for adjacent full-stripe writes: a bigger stripe cache size means fewer stripes running in the state machine, which reduces CPU overhead, and it also leads to bigger IO sizes dispatched to the underlying disks. With the patch below, we automatically batch adjacent full-stripe writes together. Such stripes are added to a batch list. Only the first stripe of the list is put on handle_list and so runs handle_stripe(). Some steps of handle_stripe() are extended to cover all stripes of the list, including ops_run_io, ops_run_biodrain and so on. With this patch, we have fewer stripes running in handle_stripe() and we send the IO of the whole stripe list together, increasing the IO size. Stripes added to a batch list have some limitations: a batch list can only include full-stripe writes and can't cross a chunk boundary, to make sure the stripes have the same parity disks. Stripes in a batch list must be in the same state (no written, toread and so on). If a stripe is in a batch list, all new reads/writes to add_stripe_bio will be blocked on overlap conflict until the batch list is handled. These limitations make sure the stripes in a batch list stay in exactly the same state over their life cycle. I tested 160k randwrite in a RAID5 array with 32k chunk size and 6 PCIe SSDs. This patch improves performance by around 30%, and the IO size to the underlying disks is exactly 32k. I also ran a 4k randwrite test on the same array to make sure performance isn't changed by the patch. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
Track the overwrite disk count, so we can know if a stripe is a full-stripe write. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
A freshly new stripe with a write request can be batched. Any time the stripe is handled or a new read is queued, the flag will be cleared. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by shli@kernel.org
Use a flex_array for the scribble data. The next patch will batch several stripes together, so the scribble data should be able to cover several stripes; this patch therefore also allocates scribble data for stripes across a chunk. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
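As background, the flex_array API referred to here is used roughly as follows; this is a generic sketch (element size, count and the helper name are placeholders, not the raid5 scribble layout):

    #include <linux/flex_array.h>
    #include <linux/slab.h>

    /* Allocate room for nr elements of elem_size bytes, backed by
     * individually allocated pages rather than one large allocation,
     * and pre-populate the pages so later lookups cannot fail. */
    static struct flex_array *alloc_scratch(int elem_size, unsigned int nr)
    {
        struct flex_array *fa;

        fa = flex_array_alloc(elem_size, nr, GFP_KERNEL);
        if (!fa)
            return NULL;
        if (flex_array_prealloc(fa, 0, nr, GFP_KERNEL)) {
            flex_array_free(fa);
            return NULL;
        }
        return fa;
    }

    /* elements are then reached through the accessor rather than
     * pointer arithmetic: void *elem = flex_array_get(fa, i); */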
-
Submitted by Heinz Mauelshagen
md raid0: access mddev->queue (the request queue member) conditionally, because it is not set when accessed from dm-raid. The patch makes the 3 references to mddev->queue in the raid0 personality conditional in order to allow it to be accessed from dm-raid. This is mandatory, because md instances underneath dm-raid don't manage a request queue of their own, which would lead to oopses without the patch. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
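The shape of the change is simply to guard each queue access; a hedged sketch of the pattern (the particular limit call and wrapper are illustrative, not the exact raid0.c hunks):

    #include <linux/blkdev.h>
    #include "md.h"    /* struct mddev (illustrative include) */

    /* Only touch the request queue when md owns one; under dm-raid
     * mddev->queue is NULL and device-mapper manages the limits instead. */
    static void example_set_io_opt(struct mddev *mddev, int raid_disks)
    {
        if (mddev->queue)
            blk_queue_io_opt(mddev->queue,
                             (mddev->chunk_sectors << 9) * raid_disks);
    }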
-
Submitted by NeilBrown
When md notices non-sync IO happening while it is trying to resync (or reshape or recover), it slows down to the set minimum. The default minimum might have made sense many years ago, but drives have become faster, and changing the default to match the times isn't really a long-term solution. This patch changes the code so that instead of waiting until the speed has dropped to the target, it just waits until pending requests have completed. This means that the delay inserted is a function of the speed of the devices. Testing shows that:
- for some loads, the resync speed is unchanged. For those loads, increasing the minimum doesn't change the speed either, so this is a good result. To increase resync speed under such loads we would probably need to increase the resync window size.
- for other loads, resync speed does increase to a reasonable fraction (e.g. 20%) of the maximum possible, and throughput of the load only drops a little bit (e.g. 10%).
- for other loads, throughput of the non-sync load drops quite a bit more. These seem to be latency-sensitive loads.
So it isn't a perfect solution, but it is mostly an improvement. Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
This option is not well justified, and testing suggests that it hardly ever makes any difference. The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting; however, raise_barrier() already waits for all of that. So just remove it to simplify reasoning about speed limiting. This also allows us to remove a 'FIXME' comment from raid5.c, as that code never used the flag. Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown
There is really no need for sync_min to be a multiple of chunk_size, and values read from here often aren't. That means you cannot read a value and expect to be able to write it back later. So remove the chunk_size check, and round down to a multiple of 4K, to be sure everything works with 4K-sector devices. Signed-off-by: NeilBrown <neilb@suse.de>
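In other words, whatever sector count is written gets aligned down to a 4 KiB boundary; a minimal sketch of that arithmetic, assuming the value is a sector count (the helper name is illustrative):

    #include <linux/kernel.h>    /* round_down() */
    #include <linux/types.h>     /* sector_t */

    /* Align a requested resync boundary down to 8 sectors
     * (8 * 512 B = 4 KiB) instead of insisting on a chunk_size multiple. */
    static sector_t align_sync_boundary(sector_t sectors)
    {
        return round_down(sectors, 8);
    }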
-
Submitted by Goldwyn Rodrigues
When "re-add" is written to /sys/block/mdXX/md/dev-YYY/state, the clustered md:
1. Sends a RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags.
2. The node initiating the re-add gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap from which it is copying.
3. The initiating node schedules an md recovery to sync the devices.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Goldwyn Rodrigues
This adds the capability of re-adding a failed disk by writing "re-add" to /sys/block/mdXX/md/dev-YYY/state. This facilitates adding disks which have encountered a temporary error, such as a network disconnection/hiccup in an iSCSI device, or a SAN cable disconnection which has been restored. In such a situation you do not need to remove and re-add the device. Writing "re-add" to the failed device's state adds it to the array again and performs recovery of only the blocks which were written after the device failed. This works for generic md and is not related to clustering; however, this patch is meant to ease the re-add operations listed above in clustering environments. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
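From user space the operation is just a sysfs write; a small illustrative helper (the md and member device names are placeholders):

    #include <stdio.h>

    /* Hypothetical helper: ask md to re-add a previously failed member,
     * e.g. readd_member("md0", "dev-sdb1") writes "re-add" to its state. */
    static int readd_member(const char *md, const char *dev)
    {
        char path[160];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/md/%s/state", md, dev);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fputs("re-add\n", f);
        return fclose(f);
    }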
-
Submitted by Goldwyn Rodrigues
This adds "remove" capability for the clustered environment. When a user initiates removal of a device from the array, a REMOVE message with the disk number in the array is sent to all the nodes, which kick the respective device in their own arrays. This facilitates the removal of failed devices. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Goldwyn Rodrigues
This is required by the clustering module (patches to follow) to find the device to remove or re-add. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Goldwyn Rodrigues
This export is required by the clustering module in order to coordinate removing/re-adding an rdev across all nodes. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Guoqing Jiang
Since md-cluster node numbers start from zero, and cinfo->slot_number represents the DLM slot number, there is no need to check for equality. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by Eran Ben Elisha
Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP; fix it to use the right offset. Fixes: 0b131561 ('net/mlx4_en: Add Flow control statistics [..]') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Matthew Wilcox
The only reason to keep parisc's private asm/scatterlist.h was that it had the macro sg_virt_addr(). Convert all callers to use something else (sometimes just sg->offset was enough; others should use sg_virt()), and then we can just use the asm-generic scatterlist.h instead. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Dave Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
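The per-call-site conversion is mechanical; a hedged before/after sketch (the surrounding wrapper is made up, not a particular parisc driver):

    #include <linux/scatterlist.h>

    /* before: the parisc-private macro, now removed
     *   void *buf = sg_virt_addr(sg);
     * after: the generic helper from <linux/scatterlist.h> */
    static void *example_map(struct scatterlist *sg)
    {
        return sg_virt(sg);    /* page_address(sg_page(sg)) + sg->offset */
    }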
-
Submitted by Linus Torvalds
My 'allmodconfig' build is _almost_ free of warnings, and most of the remaining ones are for legacy drivers that just do bad things that I can't find it in my black heart to care too much about. But this one was just annoying me:
drivers/media/v4l2-core/videobuf2-core.c:3256:26: warning: unused variable 'fileio' [-Wunused-variable]
because commit 0e661006 ("[media] vb2: fix 'UNBALANCED' warnings when calling vb2_thread_stop()") removed all users of 'fileio' and instead calls "__vb2_cleanup_fileio(q)" to clean up q->fileio. But the now unused 'fileio' variable was left around. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 Apr 2015, 3 commits
-
-
Submitted by Sumit Semwal
Fixes: 817bd7253291 ("dma-buf: cleanup dma_buf_export() to make it easily extensible"). A stupid copy-paste from me in the above patch leads to the following static checker warning:
drivers/staging/android/ion/ion.c:1112 ion_share_dma_buf() error: potentially dereferencing uninitialized 'buffer'.
drivers/staging/android/ion/ion.c
  1103  struct dma_buf *ion_share_dma_buf(struct ion_client *client,
  1104                                    struct ion_handle *handle)
  1105  {
  1106          struct ion_buffer *buffer;
                                    ^^^^^^
  1107          struct dma_buf *dmabuf;
  1108          bool valid_handle;
  1109          DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
  1110
  1111          exp_info.ops = &dma_buf_ops;
  1112          exp_info.size = buffer->size;
                                ^^^^^^
  1113          exp_info.flags = O_RDWR;
  1114          exp_info.priv = buffer;
                                ^^^^^^
                And here also.
  1115
This patch corrects this stupidity. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
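The fix amounts to resolving the buffer from the handle before filling in exp_info; a hedged sketch of the corrected ordering (the handle lookup shown is illustrative, and the real function's locking and validity checks are omitted):

    struct ion_buffer *buffer;
    DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

    /* resolve the buffer first, so the exp_info assignments below no
     * longer read an uninitialized pointer */
    buffer = handle->buffer;

    exp_info.ops = &dma_buf_ops;
    exp_info.size = buffer->size;
    exp_info.flags = O_RDWR;
    exp_info.priv = buffer;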
-
Submitted by Sumit Semwal
At present, dma_buf_export() takes a series of parameters, which makes it difficult to add any new parameters for exporters if required. Make it simpler by moving all these parameters into a struct, and pass the struct * as the parameter to dma_buf_export(). While at it, unite dma_buf_export_named() with dma_buf_export(), and change all callers accordingly. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
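For exporters, the call then takes this shape; a minimal sketch in which my_dmabuf_ops, size and my_priv stand in for the exporter's own objects:

    #include <linux/dma-buf.h>
    #include <linux/fcntl.h>

    /* the exporter's own operations table (placeholder, callbacks omitted) */
    static const struct dma_buf_ops my_dmabuf_ops;

    static struct dma_buf *example_export(size_t size, void *my_priv)
    {
        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

        exp_info.ops   = &my_dmabuf_ops;
        exp_info.size  = size;
        exp_info.flags = O_RDWR;
        exp_info.priv  = my_priv;

        /* one struct argument replaces the old parameter list */
        return dma_buf_export(&exp_info);
    }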
-
Submitted by Andreas Oetken
The Error bit on the Avalon streaming interface of the TX DMA channel was always set. In SGMII configurations this leads to error symbols on the PCS and packet rejection on the receiver side (e.g. an SGMII/1000Base-X connected switch). This only applies to the TSE configuration with MSGDMA. The issue was detected and fixed on a custom board with a direct connection to a Marvell switch in SGMII PHY mode (incl. custom patches for the SGMII PCS). According to the datasheet, if ff_tx_err (Avalon streaming) is set, it is forwarded to gm_tx_err; as a result the PCS forwards the error by sending a "/V/" character. Signed-off-by: Andreas Oetken <ennoerlangen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-