- 22 April 2015, 21 commits
-
-
Committed by NeilBrown
This allows us to easily add more (atomic) flags. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Rather than adjusting max_nr_stripes whenever {grow,drop}_one_stripe() succeeds, do it inside the functions. Also choose the correct hash to handle next inside the functions. This removes duplication and will help with future new uses of {grow,drop}_one_stripe. This also fixes a minor bug where the "md/raid:%md: allocate XXkB" message always said "0kB". Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This is needed for a future improvement to stripe cache management. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Markus Stockhausen
Depending on the available coding we allow optimized rmw logic for write operations. To support easier testing, this patch allows manual control of the rmw/rcw decision through the interface /sys/block/mdX/md/rmw_level. The configuration can handle three levels of control:
rmw_level=0: Disable rmw for all RAID types. Hardware-assisted P/Q calculation has no implementation path yet to factor in/out chunks of a syndrome. Enforcing this level can be beneficial for slow CPUs with hardware syndrome support and fast SSDs.
rmw_level=1: Estimate rmw IOs and rcw IOs. Execute rmw only if we will save IOs. This equals the "old" unpatched behaviour and will be the default.
rmw_level=2: Execute rmw even if the calculated IOs for rmw and rcw are equal. We might have higher CPU consumption because of calculating the parity twice, but it can be beneficial otherwise, e.g. RAID4 with a fast dedicated parity disk/SSD.
The option is implemented just to be forward-looking and will ONLY work with this patch! Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
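The policy the three levels encode fits in a few lines of C. The sketch below is illustrative only; the enum and helper names are assumptions made for the example, not identifiers from raid5.c:

    /* Hedged model of the rmw_level policy; names are illustrative. */
    enum rmw_policy { RMW_DISABLED = 0, RMW_IF_FEWER_IOS = 1, RMW_IF_EQUAL_IOS = 2 };

    /* rmw and rcw hold the estimated IO counts for each strategy. */
    static int prefer_rmw(int rmw, int rcw, enum rmw_policy level)
    {
            if (level == RMW_DISABLED)
                    return 0;               /* always reconstruct-write */
            if (rmw < rcw)
                    return 1;               /* rmw saves IOs */
            if (rmw == rcw && level == RMW_IF_EQUAL_IOS)
                    return 1;               /* break the tie in favour of rmw */
            return 0;
    }

At runtime the level is simply written to the sysfs file, e.g. echo 2 > /sys/block/mdX/md/rmw_level.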
-
Committed by Markus Stockhausen
Glue it all together. The raid6 rmw path should work the same as the already existing raid5 logic, so emulate the prexor handling/flags and split functions as needed:
1) Enable xor_syndrome() in the async layer.
2) Split ops_run_prexor() into RAID4/5 and RAID6 logic. Xor the syndrome at the start of an rmw run, as we did before for the single parity.
3) Take care of the rmw run in ops_run_reconstruct6(). Again, process only the changed pages to get the syndrome back into sync.
4) Enhance set_syndrome_sources() to fill NULL pages if we are in an rmw run. The lower layers will calculate start & end pages from that and call xor_syndrome() accordingly.
5) Adapt the several places where we ignored Q handling up to now.
Performance numbers for a single E5630 system with a mix of 10 7200 RPM desktop/server disks: 300 seconds of random writes with 8 threads onto a 3.2TB (10*400GB) RAID6 with a 64K chunk and no spare (group_thread_cnt=4):

bsize    rmw_level=1   rmw_level=0   rmw_level=1   rmw_level=0
         skip_copy=1   skip_copy=1   skip_copy=0   skip_copy=0
4K          115 KB/s      141 KB/s      165 KB/s      140 KB/s
8K          225 KB/s      275 KB/s      324 KB/s      274 KB/s
16K         434 KB/s      536 KB/s      640 KB/s      534 KB/s
32K         751 KB/s    1,051 KB/s    1,234 KB/s    1,045 KB/s
64K       1,339 KB/s    1,958 KB/s    2,282 KB/s    1,962 KB/s
128K      2,673 KB/s    3,862 KB/s    4,113 KB/s    3,898 KB/s
256K      7,685 KB/s    7,539 KB/s    7,557 KB/s    7,638 KB/s
512K     19,556 KB/s   19,558 KB/s   19,652 KB/s   19,688 KB/s

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by shli@kernel.org
expansion/resync can grab a stripe while the stripe is in a batch list. Since all stripes in a batch list must be in the same state, we can't allow some stripes to run into expansion/resync. So we delay expansion/resync for stripes in a batch list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by shli@kernel.org
If an IO error happens in any stripe of a batch list, the batch list will be split, and normal processing will then run for the stripes in the list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by shli@kernel.org
The stripe cache is 4k in size; even adjacent full stripe writes are handled in 4k units. Ideally we should use a bigger size for adjacent full stripe writes. A bigger stripe cache size means fewer stripes running in the state machine, which reduces CPU overhead, and it also produces bigger IO sizes dispatched to the underlying disks. With this patch, we automatically batch adjacent full stripe writes together. Such stripes are added to a batch list. Only the first stripe of the list is put on the handle_list and so runs handle_stripe(). Some steps of handle_stripe() are extended to cover all stripes of the list, including ops_run_io, ops_run_biodrain and so on. With this patch, we have fewer stripes running in handle_stripe() and we send the IO of the whole stripe list together, increasing IO size. Stripes added to a batch list have some limitations: a batch list can only include full stripe writes and can't cross a chunk boundary, to make sure the stripes have the same parity disks (a sketch of the eligibility check follows below). Stripes in a batch list must be in the same state (no written, toread and so on). If a stripe is in a batch list, all new reads/writes in add_stripe_bio will be blocked with an overlap conflict until the batch list is handled. These limitations make sure stripes in a batch list stay in exactly the same state throughout their life cycle. I tested by running 160k randwrite in a RAID5 array with 32k chunk size and 6 PCIe SSDs. This patch improves performance by around 30%, and the IO size to the underlying disks is exactly 32k. I also ran a 4k randwrite test on the same array to make sure performance isn't changed by the patch. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
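A self-contained model of the batching rule described above; the field names are assumptions made for illustration, not necessarily the exact members of struct stripe_head:

    #include <stdbool.h>

    struct stripe_model {
            int  overwrite_disks;  /* data members fully overwritten by writes */
            int  disks;            /* total devices in the stripe */
            int  max_degraded;     /* parity devices: 1 for RAID5, 2 for RAID6 */
            bool batch_ready;      /* fresh write-only stripe, not yet handled */
    };

    /* A stripe may join a batch only if every data block is overwritten
     * (a full stripe write) and it has not yet entered handle_stripe(). */
    static bool stripe_can_batch(const struct stripe_model *sh)
    {
            return sh->batch_ready &&
                   sh->overwrite_disks == sh->disks - sh->max_degraded;
    }

The chunk-boundary restriction is enforced separately when stripes are linked, since all members of a batch must map to the same parity disks.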
-
Committed by shli@kernel.org
Track the overwrite disk count, so we can know whether a stripe is a full stripe write. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by shli@kernel.org
A fresh stripe with a write request can be batched. Any time the stripe is handled or a new read is queued, the flag will be cleared. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by shli@kernel.org
Use a flex_array for the scribble data. The next patch will batch several stripes together, so the scribble data should be able to cover several stripes; this patch therefore also allocates scribble data for stripes across a chunk. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
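A hedged sketch of the allocation pattern with the kernel's flex_array API of that era; the helper name and sizes are placeholders, not raid5.c's exact formula:

    #include <linux/flex_array.h>

    static struct flex_array *alloc_scribble(size_t elem_size, unsigned int cnt)
    {
            struct flex_array *fa;

            fa = flex_array_alloc(elem_size, cnt, GFP_NOIO);
            if (!fa)
                    return NULL;
            /* back every element with pages now, so later lookups cannot fail */
            if (flex_array_prealloc(fa, 0, cnt, GFP_NOIO)) {
                    flex_array_free(fa);
                    return NULL;
            }
            return fa;
    }

Elements are then fetched with flex_array_get(fa, i), which avoids needing one large physically contiguous allocation.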
-
Committed by Heinz Mauelshagen
md raid0: access mddev->queue (the request queue member) conditionally, because it is not set when accessed from dm-raid. The patch makes the 3 references to mddev->queue in the raid0 personality conditional in order to allow it to be accessed from dm-raid. This is mandatory, because md instances underneath dm-raid don't manage a request queue of their own, which would lead to oopses without the patch. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
When md notices non-sync IO happening while it is trying to resync (or reshape or recover), it slows down to the set minimum. The default minimum might have made sense many years ago, but drives have become faster; changing the default to match the times isn't really a long-term solution. This patch changes the code so that instead of waiting until the speed has dropped to the target, it just waits until pending requests have completed. This means the delay inserted is a function of the speed of the devices. Testing shows that:
- for some loads, the resync speed is unchanged. For those loads, increasing the minimum doesn't change the speed either, so this is a good result. To increase resync speed under such loads we would probably need to increase the resync window size.
- for other loads, resync speed does increase to a reasonable fraction (e.g. 20%) of the maximum possible, and throughput of the load only drops a little (e.g. 10%).
- for other loads, throughput of the non-sync load drops quite a bit more. These seem to be latency-sensitive loads.
So it isn't a perfect solution, but it is mostly an improvement. Signed-off-by: NeilBrown <neilb@suse.de>
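A sketch of the shape of the change in md_do_sync(), assuming md's existing recovery_active counter and recovery_wait queue; reconstructed from the description above, not quoted from the diff:

    if (!is_mddev_idle(mddev, 0)) {
            /* Non-sync IO is competing: instead of sleeping until currspeed
             * drops below the minimum, wait for in-flight resync requests
             * to drain.  The faster the devices, the shorter the wait. */
            wait_event(mddev->recovery_wait,
                       !atomic_read(&mddev->recovery_active));
    }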
-
Committed by NeilBrown
This option is not well justified, and testing suggests that it hardly ever makes any difference. The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting, but raise_barrier() already waits for all of that. So just remove it to simplify reasoning about speed limiting. This allows us to remove a 'FIXME' comment from raid5.c, as that never used the flag. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
There is really no need for sync_min to be a multiple of chunk_size, and values read from here often aren't. That means you cannot read a value and expect to be able to write it back later. So remove the chunk_size check, and round down to a multiple of 4K, to be sure everything works with 4K-sector devices. Signed-off-by: NeilBrown <neilb@suse.de>
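Since the value is kept in 512-byte sectors, rounding down to a multiple of 8 sectors keeps it 4K-aligned. A one-line sketch of the idea (reconstructed from the description, not quoted from the patch):

    /* round down to a 4K boundary: 8 sectors * 512 bytes = 4096 bytes */
    mddev->resync_min = round_down(min, 8);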
-
Committed by Goldwyn Rodrigues
When "re-add" is written to /sys/block/mdXX/md/dev-YYY/state, the clustered md:
1. Sends a RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags.
2. The node initiating the re-add gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap it is copying from.
3. The initiating node schedules an md recovery to sync the devices.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Goldwyn Rodrigues
This adds the capability of re-adding a failed disk by writing "re-add" to /sys/block/mdXX/md/dev-YYY/state. This facilitates adding disks which have encountered a temporary error, such as a network disconnection/hiccup in an iSCSI device, or a SAN cable disconnection which has since been restored. In such a situation, you do not need to remove and re-add the device. Writing "re-add" to the failed device's state adds it to the array again and performs recovery of only the blocks which were written after the device failed. This works for generic md and is not related to clustering; however, this patch eases the re-add operations listed above in clustering environments. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Goldwyn Rodrigues
This adds "remove" capabilities for the clustered environment. When a user initiates removal of a device from the array, a REMOVE message with the disk number in the array is sent to all nodes, each of which kicks the respective device from its own array. This facilitates the removal of failed devices. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Goldwyn Rodrigues
This is required by the clustering module (patches to follow) to find the device to remove or re-add. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Goldwyn Rodrigues
This export is required by the clustering module in order to coordinate removing/re-adding an rdev from all nodes. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Guoqing Jiang
Since md-cluster node numbers count from zero, while cinfo->slot_number represents the dlm slot number, there is no need to check for equality. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 10 April 2015, 1 commit
-
-
Committed by NeilBrown
Since commit 20d0189b in v3.14-rc1, RAID0 has performed incorrect calculations when the chunksize is not a power of 2. This happens because sector_div() modifies its first argument, but this wasn't taken into account in the patch. So restore that first arg before re-using the variable. Reported-by: Joe Landman <joe.landman@gmail.com> Reported-by: Dave Chinner <david@fromorbit.com> Fixes: 20d0189b Cc: stable@vger.kernel.org (3.14 and later). Signed-off-by: NeilBrown <neilb@suse.de>
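The bug pattern is easy to reproduce, because sector_div() divides its 64-bit first argument in place and returns the remainder. A hedged sketch (the field names follow the block layer of that era; this illustrates the pattern, not the exact raid0.c hunk):

    sector_t sector = bio->bi_iter.bi_sector;
    unsigned int chunk_sects = mddev->chunk_sectors;

    /* after this call, "sector" holds the quotient, not the sector number */
    unsigned int offset = sector_div(sector, chunk_sects);

    /* the fix: restore the variable before it is re-used as a sector */
    sector = bio->bi_iter.bi_sector;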
-
- 08 April 2015, 1 commit
-
-
Committed by Gu Zheng
Simon reported the md io stats accounting issue: "I'm seeing "iostat -x -k 1" print this after a RAID1 rebuild on 4.0-rc5. It's not abnormal other than it's 3-disk, with one being an SSD (sdc) and the other two being write-mostly:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00    0.00  0.00  0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00  0.00  0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00  0.00  0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md0        0.00    0.00  0.00  0.00   0.00   0.00     0.00   345.00   0.00    0.00    0.00   0.00 100.00
md2        0.00    0.00  0.00  0.00   0.00   0.00     0.00 58779.00   0.00    0.00    0.00   0.00 100.00
md1        0.00    0.00  0.00  0.00   0.00   0.00     0.00    12.00   0.00    0.00    0.00   0.00 100.00
"
The cause is that commit 18c0b223 uses generic_start_io_acct to account the disk stats rather than the open code, but it also introduced the increment of .in_flight[rw], which is needless for md. So we re-use the open code here to fix it. Reported-by: Simon Kirby <sim@hostway.ca> Cc: <stable@vger.kernel.org> 3.19 Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de>
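A hedged sketch of the restored open-coded accounting: it bumps the per-partition IO and sector counters but deliberately leaves .in_flight[rw] untouched, since md never decrements it on completion:

    const int rw = bio_data_dir(bio);
    int cpu;

    cpu = part_stat_lock();
    part_stat_inc(cpu, &mddev->gendisk->part0, ios[rw]);
    part_stat_add(cpu, &mddev->gendisk->part0, sectors[rw], sectors);
    part_stat_unlock();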
-
- 07 April 2015, 2 commits
-
-
Committed by Jack Morgenstein
Commit 1daa4303 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") did the deprecation only for port 1 of the card; we need to deprecate for port 2 as well. Fixes: 1daa4303 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stas Sergeev
mvneta_adjust_link() is a callback for of_phy_connect() and should not be called directly. The result of calling it directly is as below: Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 April 2015, 2 commits
-
-
Committed by Hans de Goede
On V2 devices the DualPoint Stick reports bare packets; these should be reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also has the INPUT_PROP_POINTING_STICK propbit set. Note that since there is no way to distinguish these packets from an external PS/2 mouse (insofar as these laptops have an external PS/2 port), this means we will be reporting PS/2 mouse events via this evdev node too, as we've been doing in kernel 3.19 and older. This has been tested on a Dell Latitude D620 and a Dell Latitude E6400, which both have a V2 touchpad plus a DualPoint Stick which reports bare packets. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Committed by Hans de Goede
Bare packets should be reported via the same evdev device independent of whether they are detected at the beginning of a packet or in the middle of a packet. This has been tested on a Dell Latitude E6400, where the DualPoint Stick reports bare packets, which get reported via dev3 when the touchpad is idle, and via dev2 when the touchpad and stick are used simultaneously. This commit fixes this inconsistency by always reporting bare packets via dev3. Note that since they come from a DualPoint Stick they really should be reported via dev2; this gets fixed in a later commit. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 03 April 2015, 2 commits
-
-
Committed by Jonathan Davies
xen-netfront limits transmitted skbs to at most 44 segments in size. However, GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448 bytes each. This slight reduction in the size of packets means a slight loss in efficiency. Since c/s 9ecd1a75, xen-netfront sets gso_max_size to XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER, where XEN_NETIF_MAX_TX_SIZE is 65535 bytes. The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09) to determine when to split an skb into two is sk->sk_gso_max_size - 1 - MAX_TCP_HEADER. So the maximum permitted size of an skb is calculated to be (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER. Intuitively, this looks like the wrong formula: we don't need two TCP headers. Instead, there is no need to deviate from the default gso_max_size of 65536, as this already accommodates the size of the header. Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments of 1448 bytes each), as observed via tcpdump. This patch makes netfront send skbs of up to 65160 bytes (45 segments of 1448 bytes each). Similarly, the maximum allowable MTU does not need to subtract MAX_TCP_HEADER, as it relates to the size of the whole packet, including the header. Fixes: 9ecd1a75 ("xen-netfront: reduce gso_max_size to account for max TCP header") Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
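The segment arithmetic can be checked in isolation. The value of MAX_TCP_HEADER below is an assumption (it is config-dependent, typically in the low hundreds); any value in that range yields the same 44-versus-45 result:

    #include <stdio.h>

    int main(void)
    {
            const int mss = 1448, max_tcp_header = 320;   /* assumed value */
            /* old: gso_max_size = 65535 - MAX_TCP_HEADER */
            int old_goal = (65535 - max_tcp_header) - 1 - max_tcp_header;
            /* new: default gso_max_size of 65536 */
            int new_goal = 65536 - 1 - max_tcp_header;

            printf("old: %d segments = %d bytes\n",
                   old_goal / mss, (old_goal / mss) * mss);  /* 44 -> 63712 */
            printf("new: %d segments = %d bytes\n",
                   new_goal / mss, (new_goal / mss) * mss);  /* 45 -> 65160 */
            return 0;
    }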
-
Committed by Shachar Raindel
Properly verify that the resulting page-aligned end address is larger than both the start address and the length of the requested memory area. Both the start and length arguments of ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page-aligned end address. This overflow can also cause miscalculation of the number of pages mapped, and further logic issues. Addresses: CVE-2014-8159 Cc: <stable@vger.kernel.org> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
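The added validation amounts to two comparisons; this is a sketch reconstructed from the description above, close to but not quoted from the actual diff:

    /* reject values whose page-aligned end would wrap around */
    if (((addr + size) < addr) ||
        PAGE_ALIGN(addr + size) < (addr + size))
            return ERR_PTR(-EINVAL);

The first test catches the raw sum overflowing; the second catches the overflow introduced by rounding the end address up to a page boundary.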
-
- 02 April 2015, 9 commits
-
-
Committed by Christian König
We need to wait for all fences, not just the exclusive one. Signed-off-by: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Christian König
We somehow try to free the SG table twice. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=89734 Signed-off-by: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Daniel Stone
When performing a modeset, use the framebuffer pitch value to set the FIMD IMG_SIZE and Mixer SPAN registers. These are both defined as pitch, the distance between contiguous lines (bytes for FIMD, pixels for the Mixer). Fixes display on Snow (1366x768). Signed-off-by: Daniel Stone <daniels@collabora.com> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Inki Dae <inki.dae@samsung.com>
-
Committed by Ville Syrjälä
The legacy colorkey ioctls are only implemented for sprite planes, so reject the ioctl for primary/cursor planes. If we want to support colorkeying with these planes (assuming we have hw support, of course) we should just move ahead with the colorkey property conversion. Testcase: kms_legacy_colorkey Cc: Tommi Rantala <tt.rantala@gmail.com> Cc: stable@vger.kernel.org Reference: http://mid.gmane.org/CA+ydwtr+bCo7LJ44JFmUkVRx144UDFgOS+aJTfK6KHtvBDVuAw@mail.gmail.com Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
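A hedged sketch of the guard in the ioctl handler: only overlay (sprite) planes may be targeted; everything else is rejected before any colorkey state is touched. Reconstructed from the description, not quoted from the patch:

    plane = drm_plane_find(dev, set->plane_id);
    if (!plane || plane->type != DRM_PLANE_TYPE_OVERLAY)
            return -ENOENT;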
-
Committed by Hariprasad Shenai
Add new Common Code routines to retrieve the Firmware Device Log parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very early on and stores the parameters for its location/size in that register. Using the parameters from the register allows us to access the Firmware Device Log even when the firmware crashes very early on or when we're not attached to the firmware. Based on original work by Casey Leedom <leedom@chelsio.com>. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hariprasad Shenai
Adds a new macro and a few macro changes for firmware version 1.13.32.0; also changes the version string in the driver to match 1.13.32.0. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rusty Russell
Since commit 8e709469 ("lguest: add a dummy PCI host bridge.") lguest uses PCI, but it needs you to frob the ports directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Yuval Mintz
When IOMMU (VT-d) is active, once the main kernel crashes, unfinished DMAE transactions will be blocked, putting the HW in an error state which will cause further transactions to time out. The currently employed logic uses wrong macros, causing the first function to be the only function that cleans up that error state during its probe/load. This patch allows all the functions to successfully re-load in the kdump kernel. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Yuval Mintz
When running in a kdump kernel, it's very likely that, due to loss of synchronization with the management firmware, the first PCI function to probe and reach the previous-unload flow would decide it can reset the chip and continue onward. While doing so, it will only close its own Rx port. On a 4-port device where the 2nd port on the engine is a 1g port, the 2nd port would allow ingress traffic after the chip is reset [assuming it was active in the first kernel]. This would later cause a HW attention. This changes the driver flow to close both ports' 1g capabilities during the previous-unload flow, prior to the chip reset. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 April 2015, 2 commits
-
-
Committed by Filip Ayazi
Commit 98dc0703 ("Input: synaptics - add quirk for Thinkpad E440") had a typo in ymax; this changes the value to the one reported by touchpad-edge-detector and mentioned in that commit. Signed-off-by: Filip Ayazi <filipayazi@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Committed by Christian Hesse
This device is sold as 'Lenovo ThinkPad USB 3.0 Ethernet 4X90E51405'. The chipset is RTL8153 and it works with r8152. Signed-off-by: Christian Hesse <mail@eworm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-