1. 27 April 2015 (7 commits)
  2. 25 April 2015 (2 commits)
    • platform/chrome: chromeos_laptop - instantiate Atmel at primary address · 96cba9b0
      Dmitry Torokhov authored
      The new Atmel MXT driver expects the i2c client's address to contain
      the primary (main) address of the chip, and calculates the expected
      bootloader address from the primary address. Unfortunately
      chromeos_laptop probes the devices, and if the touchpad (or
      touchscreen, or both) comes up in bootloader mode the i2c device gets
      instantiated with the bootloader address, which confuses the driver.
      
      To work around this issue, probe the primary address first. If the
      device is not detected at the primary address, probe the alternative
      addresses as "dummy" devices. If one of them is found, destroy the
      dummy client and still instantiate the client, with the proper name,
      at the primary address, as sketched below.
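
      A minimal sketch of that probing strategy, assuming the i2c APIs of
      this kernel era (the helper name and exact flow here are
      illustrative, not the literal commit):

          #include <linux/i2c.h>

          static struct i2c_client *
          add_probed_i2c_device(struct i2c_adapter *adap,
                                struct i2c_board_info *info,
                                const unsigned short *alt_addrs)
          {
                  const unsigned short primary[] = { info->addr, I2C_CLIENT_END };
                  struct i2c_board_info dummy_info = { I2C_BOARD_INFO("dummy", 0) };
                  struct i2c_client *client;

                  /* Try the primary (main) address first. */
                  client = i2c_new_probed_device(adap, info, primary, NULL);
                  if (client)
                          return client;

                  /* Not found: the chip may be in bootloader mode, so probe
                   * the alternative addresses as "dummy" devices. */
                  client = i2c_new_probed_device(adap, &dummy_info, alt_addrs, NULL);
                  if (!client)
                          return NULL;

                  /* Found at a bootloader address: destroy the dummy client
                   * and still instantiate the client with the proper name at
                   * the primary address the driver expects. */
                  i2c_unregister_device(client);
                  return i2c_new_device(adap, info);
          }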
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • toshiba_acpi: Do not register vendor backlight when acpi_video bl is available · 358d6a2c
      Hans de Goede authored
      commit a39f46df ("toshiba_acpi: Fix regression caused by backlight extra
      check code") causes the backlight to no longer work on the Toshiba
      Z30; reverting that commit fixes this, but restores the original issue
      that commit fixed.
      
      Looking at the toshiba_acpi backlight code for a fix for this, I
      noticed that it is the only code under platform/x86 which
      unconditionally registers a vendor ACPI backlight interface, without
      checking for acpi_video backlight support first.
      
      This commit adds the necessary checks, bringing toshiba_acpi in line
      with the other drivers and fixing the Z30 regression without needing
      to revert the commit causing it.
      
      Chances are that there will be some Toshiba models with a non-working
      acpi-video implementation whose toshiba vendor backlight interface
      does work. This commit adds an empty dmi_id table where such systems
      can be added, identical to how other drivers handle such systems.
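
      A minimal sketch of the shape of that check (the table and helper
      names here are assumptions, not the literal commit):

          #include <linux/dmi.h>
          #include <acpi/video.h>

          /* Toshiba models whose acpi-video backlight is broken but whose
           * vendor interface works would be listed here. */
          static const struct dmi_system_id toshiba_vendor_backlight_dmi[] = {
                  { }
          };

          static bool toshiba_want_vendor_backlight(void)
          {
                  /* Defer to acpi-video unless this machine is known to
                   * need the vendor interface. */
                  if (acpi_video_backlight_support() &&
                      !dmi_check_system(toshiba_vendor_backlight_dmi))
                          return false;

                  return true;
          }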
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1206036
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=86521
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-and-tested-by: Azael Avalos <coproscefalo@gmail.com>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
  3. 24 April 2015 (6 commits)
    • crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA · 8c98ebd7
      Geert Uytterhoeven authored
      If NO_DMA=y:
      
          drivers/built-in.o: In function `img_hash_write_via_dma_stop':
          img-hash.c:(.text+0xa2b822): undefined reference to `dma_unmap_sg'
          drivers/built-in.o: In function `img_hash_xmit_dma':
          img-hash.c:(.text+0xa2b8d8): undefined reference to `dma_map_sg'
          img-hash.c:(.text+0xa2b948): undefined reference to `dma_unmap_sg'
      
      Also move the "depends" section below the "tristate" line while we're at
      it.
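
      The resulting Kconfig entry would look roughly like this (the depends
      and select lines besides HAS_DMA are illustrative):

          config CRYPTO_DEV_IMGTEC_HASH
                  tristate "Imagination Technologies hardware hash accelerator"
                  depends on MIPS || COMPILE_TEST
                  depends on HAS_DMA
                  select CRYPTO_MD5
                  select CRYPTO_SHA1
                  select CRYPTO_SHA256
                  select CRYPTO_HASH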
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • ACPI / scan: Add a scan handler for PRP0001 · 7d284352
      Rafael J. Wysocki authored
      If the special PRP0001 device ID is present in the given device's list
      of ACPI/PNP IDs and the device has a valid "compatible" property in
      the _DSD, it should be enumerated using the default mechanism,
      unless some scan handlers match the IDs preceding PRP0001 in the
      device's list of ACPI/PNP IDs.  In addition to that, no scan handlers
      matching the IDs following PRP0001 in that list should be attached
      to the device.
      
      To make that happen, define a scan handler that will match PRP0001
      and trigger the default enumeration for the matching devices if the
      "compatible" property is present for them.
      
      Since that requires the check for platform_id and device->handler
      to be removed from acpi_default_enumeration(), move the fallback
      invocation of acpi_default_enumeration() to acpi_bus_attach()
      (after it's checked if there's a matching ACPI driver for the
      device), which is a better place to call it, and do the platform_id
      check in there too (device->handler is guaranteed to be unset at
      the point where the function is looking for a matching ACPI driver).
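
      A sketch of such a scan handler (this mirrors the shape of the change;
      details are abbreviated):

          static const struct acpi_device_id generic_device_ids[] = {
                  { "PRP0001" },
                  { "" },
          };

          static int acpi_generic_device_attach(struct acpi_device *adev,
                                                const struct acpi_device_id *not_used)
          {
                  /* Trigger the default enumeration only for devices whose
                   * _DSD carries a valid "compatible" property. */
                  if (adev->data.of_compatible)
                          acpi_default_enumeration(adev);

                  /* Claim the device so scan handlers matching the IDs
                   * following PRP0001 cannot attach to it. */
                  return 1;
          }

          static struct acpi_scan_handler generic_device_handler = {
                  .ids = generic_device_ids,
                  .attach = acpi_generic_device_attach,
          };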
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Darren Hart <dvhart@linux.intel.com>
    • ACPI / scan: Annotate physical_node_lock in acpi_scan_is_offline() · 4c533c80
      Rafael J. Wysocki authored
      acpi_scan_is_offline() may be called under the physical_node_lock
      lock of the given device object's parent, so prevent lockdep from
      complaining about that by annotating that instance with
      SINGLE_DEPTH_NESTING.
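
      A minimal sketch of the annotation (surrounding code abbreviated):

          /* The caller may already hold the parent's physical_node_lock;
           * SINGLE_DEPTH_NESTING tells lockdep this is a distinct, nested
           * lock instance rather than recursion on the same lock. */
          mutex_lock_nested(&adev->physical_node_lock, SINGLE_DEPTH_NESTING);
          /* ... walk adev->physical_node_list and check offline state ... */
          mutex_unlock(&adev->physical_node_lock);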
      
      Fixes: caa73ea1 (ACPI / hotplug / driver core: Handle containers in a special way)
      Reported-and-tested-by: Xie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
      Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg · b5f1c97f
      Imre Deak authored
      Due to a typo we don't save/restore the GFX_MAX_REQ_COUNT register
      across suspend/resume, so fix this.
      
      This was introduced in
      
      commit ddeea5b0
      Author: Imre Deak <imre.deak@intel.com>
      Date:   Mon May 5 15:19:56 2014 +0300
      
          drm/i915: vlv: add runtime PM support
      
      I noticed this only by reading the code. To my knowledge it shouldn't
      cause any real problems at the moment, since the power well backing this
      register remains on across a runtime s/r. This may change once
      system-wide s0ix functionality is enabled in the kernel.
      
      v2:
      - resend after a missing git add -u :/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Workaround to avoid lite restore with HEAD==TAIL · 53292cdb
      Michel Thierry authored
      WaIdleLiteRestore is an execlists-only workaround, and requires the driver
      to ensure that any context always has HEAD!=TAIL when attempting lite
      restore.
      
      Add two extra MI_NOOP instructions at the end of each request, but
      keep the request's tail pointing before the MI_NOOPs. We may not need
      to execute them, which is why request->tail is sampled before adding
      these extra instructions.
      
      If we submit a context to the ELSP which has previously been submitted,
      move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.
      
      v2: Move overallocation to gen8_emit_request, and added note about
      sampling request->tail in commit message (Chris).
      
      v3: Remove redundant request->tail assignment in __i915_add_request;
      in lrc mode this is already set in execlists_context_queue.
      Do not add wa implementation details inside gem (Chris).
      
      v4: Apply the wa whenever the req has been resubmitted and update
      comment (Chris).
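
      A sketch of the two halves of the workaround, following the execlists
      helpers of this era (function names and the resubmission hook are
      approximations):

          static void emit_wa_tail(struct drm_i915_gem_request *request,
                                   struct intel_ringbuffer *ringbuf)
          {
                  /* Sample the tail before the padding: the MI_NOOPs need
                   * not be executed on a first submission. */
                  request->tail = ringbuf->tail;
                  intel_logical_ring_emit(ringbuf, MI_NOOP);
                  intel_logical_ring_emit(ringbuf, MI_NOOP);
                  intel_logical_ring_advance(ringbuf);
          }

          static void apply_wa_on_resubmit(struct drm_i915_gem_request *request,
                                           struct intel_ringbuffer *ringbuf)
          {
                  /* The context has been submitted to the ELSP before: move
                   * the tail past the padding so HEAD != TAIL on a lite
                   * restore. */
                  request->tail += 2 * sizeof(u32);
                  request->tail &= ringbuf->size - 1;
          }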
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: cope with large i2c transfers · 9535c475
      Dmitry Torokhov authored
      The hardware, according to the specs, is limited to 256-byte
      transfers, and the current driver has no protection in case users
      attempt larger transfers. The code will just stomp over the status
      register and mayhem ensues.
      
      Let's split larger transfers into digestible chunks. Doing this allows
      the Atmel MXT driver on the Pixel 1 to function properly (it hasn't
      since commit 9d8dc3e5 "Input: atmel_mxt_ts -
      implement T44 message handling", which tries to consume multiple
      touchscreen/touchpad reports in a single transaction).
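
      A sketch of the chunking loop (gmbus_xfer_read_chunk() is assumed to
      program a single hardware transaction of at most 256 bytes):

          #define GMBUS_BYTE_COUNT_MAX 256U

          static int gmbus_xfer_read(struct drm_i915_private *dev_priv,
                                     struct i2c_msg *msg, u32 gmbus1_index)
          {
                  u8 *buf = msg->buf;
                  unsigned int rx_size = msg->len;
                  int ret;

                  /* Split a large read into chunks the hardware can do. */
                  do {
                          unsigned int len = min(rx_size, GMBUS_BYTE_COUNT_MAX);

                          ret = gmbus_xfer_read_chunk(dev_priv, msg->addr,
                                                      buf, len, gmbus1_index);
                          if (ret)
                                  return ret;

                          rx_size -= len;
                          buf += len;
                  } while (rx_size != 0);

                  return 0;
          }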
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  4. 23 April 2015 (2 commits)
  5. 22 April 2015 (23 commits)
    • rbd: rbd_wq comment is obsolete · f77303bd
      Ilya Dryomov authored
      After the switch to blk-mq rbd_wq processes requests, not devices.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • watchdog: stmp3xxx_rtc_wdt: fix broken email address · cf82f52d
      Wolfram Sang authored
      My Pengutronix address is not valid anymore; redirect people to the
      Pengutronix kernel team.
      Reported-by: Harald Geyer <harald@ccbib.org>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Acked-by: Robert Schwebel <r.schwebel@pengutronix.de>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: pnx4008_wdt: fix broken email address · e8cc5366
      Wolfram Sang authored
      My Pengutronix address is not valid anymore; redirect people to the
      Pengutronix kernel team.
      Reported-by: Harald Geyer <harald@ccbib.org>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Acked-by: Robert Schwebel <r.schwebel@pengutronix.de>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: octeon: use fixed length string for register names · 3a30c07e
      Aaro Koskinen authored
      Use fixed-length strings for register names. This saves 416 bytes of
      text size.
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: octeon: fix some trivial coding style issues · 8692cf0a
      Aaro Koskinen authored
      Fix some trivial coding style issues to reduce noise from static analyzers.
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: octeon: convert to WATCHDOG_CORE API · 3d588c93
      Aaro Koskinen authored
      Convert OCTEON watchdog to WATCHDOG_CORE API. This enables support
      for multiple watchdogs on OCTEON boards.
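
      A minimal sketch of such a conversion (the octeon_wdt_* callbacks and
      info struct are assumed to already exist in the driver):

          #include <linux/watchdog.h>

          static const struct watchdog_ops octeon_wdt_ops = {
                  .owner = THIS_MODULE,
                  .start = octeon_wdt_start,
                  .stop  = octeon_wdt_stop,
                  .ping  = octeon_wdt_ping,
          };

          static struct watchdog_device octeon_wdt = {
                  .info = &octeon_wdt_info,
                  .ops  = &octeon_wdt_ops,
          };

          static int octeon_wdt_register(void)
          {
                  /* The core provides the /dev/watchdogN device nodes,
                   * which is what lets several instances coexist. */
                  return watchdog_register_device(&octeon_wdt);
          }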
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: cadence: Remove Kconfig dependency on ARCH · 6290d8c8
      Michal Simek authored
      Remove the Kconfig ARCH dependency and enable the driver for all
      architectures.
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: qcom: use timer devicetree binding · 0dfd582e
      Mathieu Olivari authored
      MSM watchdog configuration happens in the same register block as the
      timer, so we'll use the same binding as the existing timer.
      
      The qcom-wdt will now be probed when devicetree has an entry compatible
      with "qcom,kpss-timer" or "qcom-scss-timer".
      Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: bcm281xx: Remove use of seq_printf return value · e1dbde29
      Joe Perches authored
      The seq_printf return value, because it's frequently misused,
      will eventually be converted to void.
      
      See: commit 1f33c41c ("seq_file: Rename seq_overflow() to
           seq_has_overflowed() and make public")
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • dmaengine: dw: don't prompt for DW_DMAC_CORE · cdde0e61
      Vinod Koul authored
      DW_DMAC_CORE is selected by the PCI or platform driver, so this symbol
      shouldn't be user-selectable; remove the prompt.
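
      With the prompt string removed, the symbol can only be enabled via
      "select" from the PCI/platform driver entries; roughly:

          config DW_DMAC_CORE
                  tristate
                  select DMA_ENGINE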
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
    • ACPI / EC: fix NULL pointer dereference in acpi_ec_remove_query_handler() · 6b5eab54
      Chris Bainbridge authored
      Use list_for_each_entry_safe for iterating, because the handler may be
      freed in the loop.
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000002c
      IP: [<ffffffff814d69c8>] acpi_ec_put_query_handler+0x7/0x1a
      Call Trace:
       acpi_ec_remove_query_handler+0x87/0x97
       acpi_smbus_hc_remove+0x2a/0x44 [sbshc]
       acpi_device_remove+0x7b/0x9a
       __device_release_driver+0x7e/0x110
       driver_detach+0xb0/0xc0
       bus_remove_driver+0x54/0xe0
       driver_unregister+0x2b/0x60
       acpi_bus_unregister_driver+0x10/0x12
       acpi_smb_hc_driver_exit+0x10/0x12 [sbshc]
       SyS_delete_module+0x1b8/0x210
       system_call_fastpath+0x12/0x6a
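
      A sketch of the fix: the _safe iterator caches the next node before
      the loop body runs, so dropping what may be the last reference is
      safe.

          static void acpi_ec_remove_query_handler(struct acpi_ec *ec,
                                                   u8 query_bit)
          {
                  struct acpi_ec_query_handler *handler, *tmp;

                  mutex_lock(&ec->mutex);
                  list_for_each_entry_safe(handler, tmp, &ec->list, node) {
                          if (query_bit == handler->query_bit) {
                                  list_del(&handler->node);
                                  /* May free 'handler'; the cursor already
                                   * points at 'tmp', so iteration is safe. */
                                  acpi_ec_put_query_handler(handler);
                          }
                  }
                  mutex_unlock(&ec->mutex);
          }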
      Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • md/raid5: don't do chunk aligned read on degraded array. · 9ffc8f7c
      Eric Mei authored
      When the array is degraded, reads of data landing on failed drives
      result in reading the rest of the data in the stripe. So a single
      sequential read would result in the same data being read twice.
      
      This patch avoids chunk-aligned reads for degraded arrays. The
      downside is involving the stripe cache, which means associated CPU
      overhead and an extra memory copy.
      
      Test Results:
      The following tests were done on an enterprise storage node with
      Seagate 6T SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9 GHz),
      in a 10-disk MD RAID6 8+2 layout with a 128 KiB chunk size.
      
      I used FIO with direct I/O, various bs sizes, and plenty of queue
      depth, and tested sequential and 100% random reads against 3 array
      configs:
       1) optimal, as baseline;
       2) degraded;
       3) degraded with this patch.
      Kernel version is 4.0-rc3.
      
      Each individual test was only run once, so there may be some
      variation, but we focus on the big trend.
      
      Sequential Read:
        bs=(KiB)  optimal(MiB/s)  degraded(MiB/s)  degraded-with-patch (MiB/s)
         1024       1608            656              995
          512       1624            710              956
          256       1635            728              980
          128       1636            771              983
           64       1612           1119             1000
           32       1580           1420             1004
           16       1368            688              986
            8        768            647              953
            4        411            413              850
      
      Random Read:
        bs=(KiB)  optimal(IOPS)  degraded(IOPS)  degraded-with-patch (IOPS)
         1024        163            160              156
          512        274            273              272
          256        426            428              424
          128        576            592              591
           64        726            724              726
           32        849            848              837
           16        900            970              971
            8        927            940              929
            4        948            940              955
      
      Some notes:
        * In sequential + optimal, as the bs size gets smaller, the FIO
      thread becomes CPU bound.
        * In sequential + degraded, there's a big increase when bs is 64K
      and 32K; I don't have an explanation.
        * In sequential + degraded-with-patch, the MD thread mostly becomes
      CPU bound.
      
      If you want, we can discuss specific data points in this data. But in
      general it seems that with this patch we have more predictable and in
      most cases significantly better sequential read performance when the
      array is degraded, and almost no noticeable impact on random reads.
      
      Performance is a complicated thing; the patch works well for this
      particular configuration, but may not be universal. For example, I
      imagine testing on an all-SSD array may give very different results.
      But I personally think that in most cases IO bandwidth is a more
      scarce resource than CPU.
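
      The gating itself is small; a sketch of the check in make_request()
      (surrounding code abbreviated):

          /* Only take the chunk-aligned fast path on a non-degraded,
           * non-reshaping array; otherwise fall through to the stripe
           * cache, which can reconstruct data for failed drives. */
          if (rw == READ && mddev->degraded == 0 &&
              mddev->reshape_position == MaxSector &&
              chunk_aligned_read(mddev, bi))
                  return;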
      Signed-off-by: Eric Mei <eric.mei@seagate.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: allow the stripe_cache to grow and shrink. · edbe83ab
      NeilBrown authored
      The default setting of 256 stripe_heads is probably
      much too small for many configurations.  So it is best to make it
      auto-configure.
      
      Shrinking the cache under memory pressure is easy.  The only
      interesting part here is that we put a fairly high cost ('seeks') on
      shrinking the cache, as the cost is greater than just having to read
      more data: it reduces parallelism.
      
      Growing the cache on demand needs to be done carefully.  If we allow
      fast growth, that can upset memory balance as lots of dirty memory can
      quickly turn into lots of memory queued in the stripe_cache.
      It is important for the raid5 block device to appear congested to
      allow write-throttling to work.
      
      So we only add stripes slowly. We set a flag when an allocation
      fails because all stripes are in use, allocate at a convenient
      time when that flag is set, and don't allow it to be set again
      until at least one stripe_head has been released for re-use.
      
      This means that a spurt of requests will only cause one stripe_head
      to be allocated, but a steady stream of requests will slowly
      increase the cache size - until memory pressure puts it back again.
      
      It could take hours to reach a steady state.
      
      The value written to, and displayed in, stripe_cache_size is
      used as a minimum.  The cache can grow above this and shrink back
      down to it.  The actual size is not directly visible, though it can
      be deduced to some extent by watching stripe_cache_active.
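
      A sketch of the slow-growth step (the flag names follow the
      description above, not necessarily the final code):

          /* Run from raid5d at a convenient time: grow by at most one
           * stripe_head per allocation failure. */
          if (test_bit(R5_ALLOC_MORE, &conf->cache_state)) {
                  if (grow_one_stripe(conf, GFP_NOIO))
                          /* Cleared again when a stripe_head is released
                           * for re-use, allowing the next grow. */
                          set_bit(R5_DID_ALLOC, &conf->cache_state);
                  clear_bit(R5_ALLOC_MORE, &conf->cache_state);
          }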
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: change ->inactive_blocked to a bit-flag. · 5423399a
      NeilBrown authored
      This allows us to easily add more (atomic) flags.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: move max_nr_stripes management into grow_one_stripe and drop_one_stripe · 486f0644
      NeilBrown authored
      Rather than adjusting max_nr_stripes whenever {grow,drop}_one_stripe()
      succeeds, do it inside the functions.
      
      Also choose the correct hash to handle next inside the functions.
      
      This removes duplication and will help with future new uses of
      {grow,drop}_one_stripe.
      
      This also fixes a minor bug where the "md/raid:%md: allocate XXkB"
      message always said "0kB".
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: pass gfp_t arg to grow_one_stripe() · a9683a79
      NeilBrown authored
      This is needed for future improvement to stripe cache management.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: introduce configuration option rmw_level · d06f191f
      Markus Stockhausen authored
      Depending on the available coding we allow optimized rmw logic for
      write operations. To support easier testing, this patch allows manual
      control of the rmw/rcw decision through the interface
      /sys/block/mdX/md/rmw_level.
      
      The configuration can handle three levels of control.
      
      rmw_level=0: Disable rmw for all RAID types. Hardware-assisted P/Q
      calculation has no implementation path yet to factor in/out chunks of
      a syndrome. Enforcing this level can be beneficial for slow CPUs with
      hardware syndrome support and fast SSDs.
      
      rmw_level=1: Estimate rmw IOs and rcw IOs. Execute rmw only if we will
      save IOs. This equals the "old" unpatched behaviour and will be the
      default.
      
      rmw_level=2: Execute rmw even if the calculated IOs for rmw and rcw
      are equal. We might have higher CPU consumption because of calculating
      the parity twice, but it can be beneficial otherwise, e.g. RAID4 with
      a fast dedicated parity disk/SSD. The option is implemented just to be
      forward-looking and will ONLY work with this patch!
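
      A sketch of the sysfs store for the new knob (the PARITY_* constants
      mirror the three levels described above):

          static ssize_t
          raid5_store_rmw_level(struct mddev *mddev, const char *page,
                                size_t len)
          {
                  struct r5conf *conf = mddev->private;
                  unsigned long new;

                  if (!conf)
                          return -ENODEV;

                  if (kstrtoul(page, 10, &new))
                          return -EINVAL;

                  if (new != PARITY_DISABLE_RMW &&
                      new != PARITY_ENABLE_RMW &&
                      new != PARITY_PREFER_RMW)
                          return -EINVAL;

                  conf->rmw_level = new;
                  return len;
          }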
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: activate raid6 rmw feature · 584acdd4
      Markus Stockhausen authored
      Glue it all together. The raid6 rmw path should work the same as the
      already existing raid5 logic. So emulate the prexor handling/flags and
      split functions as needed.
      
      1) Enable xor_syndrome() in the async layer.
      
      2) Split ops_run_prexor() into RAID4/5 and RAID6 logic. Xor the syndrome
      at the start of a rmw run as we did it before for the single parity.
      
      3) Take care of rmw run in ops_run_reconstruct6(). Again process only
      the changed pages to get syndrome back into sync.
      
      4) Enhance set_syndrome_sources() to fill NULL pages if we are in a rmw
      run. The lower layers will calculate start & end pages from that and
      call the xor_syndrome() correspondingly.
      
      5) Adapt the several places where we ignored Q handling up to now.
      
      Performance numbers for a single E5630 system with a mix of 10 7200rpm
      desktop/server disks: 300 seconds of random writes with 8 threads onto
      a 3.2TB (10*400GB) RAID6 with a 64K chunk and no spare
      (group_thread_cnt=4):
      
      bsize   rmw_level=1   rmw_level=0   rmw_level=1   rmw_level=0
              skip_copy=1   skip_copy=1   skip_copy=0   skip_copy=0
         4K      115 KB/s      141 KB/s      165 KB/s      140 KB/s
         8K      225 KB/s      275 KB/s      324 KB/s      274 KB/s
        16K      434 KB/s      536 KB/s      640 KB/s      534 KB/s
        32K      751 KB/s    1,051 KB/s    1,234 KB/s    1,045 KB/s
        64K    1,339 KB/s    1,958 KB/s    2,282 KB/s    1,962 KB/s
       128K    2,673 KB/s    3,862 KB/s    4,113 KB/s    3,898 KB/s
       256K    7,685 KB/s    7,539 KB/s    7,557 KB/s    7,638 KB/s
       512K   19,556 KB/s   19,558 KB/s   19,652 KB/s   19,688 KB/s
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: handle expansion/resync case with stripe batching · dabc4ec6
      shli@kernel.org authored
      Expansion/resync can grab a stripe while the stripe is in a batch
      list. Since all stripes in a batch list must be in the same state, we
      can't allow some of them to run into expansion/resync. So we delay
      expansion/resync for stripes in a batch list.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: handle io error of batch list · 72ac7330
      shli@kernel.org authored
      If an IO error happens in any stripe of a batch list, the batch list
      will be split, and normal processing will then run for the stripes in
      the list.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • RAID5: batch adjacent full stripe write · 59fc630b
      shli@kernel.org authored
      The stripe cache is 4k in size, so even adjacent full stripe writes
      are handled in 4k units. Ideally we should use a bigger size for
      adjacent full stripe writes. A bigger stripe cache size means fewer
      stripes running in the state machine, which reduces CPU overhead, and
      a bigger size also means bigger IOs dispatched to the underlying
      disks.
      
      With the patch below, we automatically batch adjacent full stripe
      writes together. Such stripes are added to the batch list. Only the
      first stripe of the list is put on the handle_list and so runs
      handle_stripe(). Some steps of handle_stripe() are extended to cover
      all stripes of the list, including ops_run_io, ops_run_biodrain and so
      on. With this patch, we have fewer stripes running in handle_stripe()
      and we send the IO of a whole stripe list together to increase IO
      size.
      
      Stripes added to a batch list have some limitations. A batch list can
      only include full stripe writes and can't cross a chunk boundary, to
      make sure the stripes have the same parity disks. Stripes in a batch
      list must be in the same state (no written, toread and so on). If a
      stripe is in a batch list, all new reads/writes to add_stripe_bio will
      be blocked as overlap conflicts until the batch list is handled. These
      limitations make sure the stripes in a batch list stay in exactly the
      same state over their life cycle.
      
      I tested 160k random writes on a RAID5 array with a 32k chunk size and
      6 PCIe SSDs. This patch improves performance by around 30%, and the IO
      size to the underlying disks is exactly 32k. I also ran a 4k random
      write test on the same array to make sure performance isn't changed by
      the patch.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: track overwrite disk count · 7a87f434
      shli@kernel.org authored
      Track overwrite disk count, so we can know if a stripe is a full stripe write.
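
      With that counter, the full-stripe-write test becomes trivial; a
      sketch:

          static bool is_full_stripe_write(struct stripe_head *sh)
          {
                  /* Every data disk (all disks minus parity) is being
                   * fully overwritten. */
                  return sh->overwrite_disks ==
                         (sh->disks - sh->raid_conf->max_degraded);
          }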
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: add a new flag to track if a stripe can be batched · da41ba65
      shli@kernel.org authored
      A freshly created stripe with a write request can be batched. Any time
      the stripe is handled or a new read is queued, the flag will be
      cleared.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>