提交 · 6ee57bcc1e61d39c0579438055bc84087210f9b6 · openeuler / raspberrypi-kernel

18 12月, 2012 4 次提交

virtio-net: correct capacity math on ring full · 6ee57bcc

由 Michael S. Tsirkin 提交于 10月 16, 2012

Capacity math on ring full is wrong: we are
looking at num_sg but that might be optimistic
because of indirect buffer use.

The implementation also penalizes fast path
with extra memory accesses for the benefit of
ring full condition handling which is slow path.

It's easy to query ring capacity so let's do just that.

This change also makes it easier to move vnet header
for tx around as follow-up patch does.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>

6ee57bcc

virtio: move queue_index and num_free fields into core struct virtqueue. · 06ca287d

由 Rusty Russell 提交于 10月 16, 2012

They're generic concepts, so hoist them.  This also avoids accessor
functions (though kept around for merge with DaveM's net tree).

This goes even further than Jason Wang's 17bb6d40 patch
("virtio-ring: move queue_index to vring_virtqueue") which moved the
queue_index from the specific transport.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

06ca287d

virtio-pci: use module_pci_driver to simplify the code · 1ce6853a

由 Wei Yongjun 提交于 10月 16, 2012

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

1ce6853a

lguest: fix typo · 681f2066

由 Alex Russell 提交于 10月 16, 2012

Signed-off-by: NAlex Russell <giles.alex@hotmail.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

681f2066

22 10月, 2012 1 次提交

virtio: force vring descriptors to be allocated from lowmem · b92b1b89

由 Will Deacon 提交于 10月 19, 2012

Virtio devices may attempt to add descriptors to a virtqueue from atomic
context using GFP_ATOMIC allocation. This is problematic because such
allocations can fall outside of the lowmem mapping, causing virt_to_phys
to report bogus physical addresses which are subsequently passed to
userspace via the buffers for the virtual device.

This patch masks out __GFP_HIGH and __GFP_HIGHMEM from the requested
flags when allocating descriptors for a virtqueue. If an atomic
allocation is requested and later fails, we will return -ENOSPC which
will be handled by the driver.

Cc: stable@kernel.org
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

b92b1b89

16 10月, 2012 1 次提交

thermal, cpufreq: Fix build when CPU_FREQ_TABLE isn't configured · dd8e8c4a

由 David Rientjes 提交于 10月 15, 2012

Commit 02361418 ("thermal: add generic cpufreq cooling
implementation") requires cpufreq_frequency_get_table(), but that
function is only defined for CONFIG_CPU_FREQ_TABLE resulting in the
following build error:

  drivers/built-in.o: In function `cpufreq_get_max_state':
  drivers/thermal/cpu_cooling.c:259: undefined reference to `cpufreq_frequency_get_table'
  drivers/built-in.o: In function `get_cpu_frequency':
  drivers/thermal/cpu_cooling.c:129: undefined reference to `cpufreq_frequency_get_table'

Fix it by selecting CONFIG_CPU_FREQ_TABLE for such a configuration.

It turns out CONFIG_EXYNOS_THERMAL also needs CONFIG_CPU_FREQ_TABLE, so
select it there as well.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd8e8c4a

13 10月, 2012 5 次提交

dm: store dm_target_io in bio front_pad · dba14160

由 Mikulas Patocka 提交于 10月 12, 2012

Use the recently-added bio front_pad field to allocate struct dm_target_io.

Prior to this patch, dm_target_io was allocated from a mempool. For each
dm_target_io, there is exactly one bio allocated from a bioset.

This patch merges these two allocations into one allocation: we create a
bioset with front_pad equal to the size of dm_target_io so that every
bio allocated from the bioset has sizeof(struct dm_target_io) bytes
before it. We allocate a bio and use the bytes before the bio as
dm_target_io.

_tio_cache is removed and the tio_pool mempool is now only used for
request-based devices.

This idea was introduced by Kent Overstreet.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: tj@kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Bill Pemberton <wfp5p@viridian.itc.virginia.edu>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

dba14160

dm thin: move bio_prison code to separate module · 4f81a417

由 Mike Snitzer 提交于 10月 12, 2012

The bio prison code will be useful to other future DM targets so
move it to a separate module.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4f81a417

dm thin: prepare to separate bio_prison code · 44feb387

由 Mike Snitzer 提交于 10月 12, 2012

The bio prison code will be useful to share with future DM targets.

Prepare to move this code into a separate module, adding a dm prefix
to structures and functions that will be exported.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

44feb387

dm thin: support discard with non power of two block size · 28eed34e

由 Mike Snitzer 提交于 10月 12, 2012

Support discards when the pool's block size is not a power of 2.
The block layer assumes discard_granularity is a power of 2 (in
blkdev_issue_discard), so we set this to the largest power of 2 that is
a divides into the number of sectors in each block, but never less than
DATA_DEV_BLOCK_SIZE_MIN_SECTORS.

This patch eliminates the "Discard support must be disabled when the
block size is not a power of 2" constraint that was imposed in commit
55f2b8bd ("dm thin: support for non power of 2 pool blocksize").  That
commit was incomplete: using a block size that is not a power of 2
shouldn't mean disabling discard support on the device completely.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

28eed34e

mcs7830: Fix link state detection · dabdaf0c

由 Ondrej Zary 提交于 10月 11, 2012

The device had an undocumented "feature": it can provide a sequence of
spurious link-down status data even if the link is up all the time.
A sequence of 10 was seen so update the link state only after the device
reports the same link state 20 times.
Signed-off-by: NOndrej Zary <linux@rainbow-software.org>
Reported-by: NMichael Leun <lkml20120218@newton.leun.net>
Tested-by: NMichael Leun <lkml20120218@newton.leun.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dabdaf0c

12 10月, 2012 12 次提交

dm persistent data: convert to use le32_add_cpu · 0bcf0879

由 Wei Yongjun 提交于 10月 12, 2012

Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

0bcf0879

dm: use ACCESS_ONCE for sysfs values · fe5fe906

由 Mikulas Patocka 提交于 10月 12, 2012

Use the ACCESS_ONCE macro in dm-bufio and dm-verity where a variable
can be modified asynchronously (through sysfs) and we want to prevent
compiler optimizations that assume that the variable hasn't changed.
(See Documentation/atomic_ops.txt.)
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fe5fe906

dm bufio: use list_move · 54499afb

由 Wei Yongjun 提交于 10月 12, 2012

Use list_move() instead of list_del() + list_add().

spatch with a semantic match was used to find this.
(http://coccinelle.lip6.fr/)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

54499afb

dm mpath: fix check for null mpio in end_io fn · a71a261f

由 Wei Yongjun 提交于 10月 12, 2012

The mpio dereference should be moved below the BUG_ON NULL test
in multipath_end_io().

spatch with a semantic match was used to found this.
(http://coccinelle.lip6.fr/)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

a71a261f

xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches. · cb6b6df1

由 Konrad Rzeszutek Wilk 提交于 10月 10, 2012

The commit 254d1a3f, titled
"xen/pv-on-hvm kexec: shutdown watches from old kernel" assumes that the
XenBus backend can deal with reading of values from:
 "control/platform-feature-xs_reset_watches":

    ... a patch for xenstored is required so that it
    accepts the XS_RESET_WATCHES request from a client (see changeset
    23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
    the registration of watches will fail and some features of a PVonHVM
    guest are not available. The guest is still able to boot, but repeated
    kexec boots will fail."

Sadly this is not true when using a Xen 3.4 hypervisor and booting a PVHVM
guest. We end up hanging at:

  err = xenbus_scanf(XBT_NIL, "control",
                        "platform-feature-xs_reset_watches", "%d", &supported);

This can easily be seen with guests hanging at xenbus_init:

NX (Execute Disable) protection: active
SMBIOS 2.4 present.
DMI: Xen HVM domU, BIOS 3.4.0 05/13/2011
Hypervisor detected: Xen HVM
Xen version 3.4.
Xen Platform PCI: I/O protocol version 1
... snip ..
calling  xenbus_init+0x0/0x27e @ 1

Reverting the commit or using the attached patch fixes the issue. This fix
checks whether the hypervisor is older than 4.0 and if so does not try to
perform the read.

Fixes-Oracle-Bug: 14708233
CC: stable@vger.kernel.org
Acked-by: NOlaf Hering <olaf@aepfle.de>
[v2: Added a comment in the source code]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cb6b6df1

kdb,vt_console: Fix missed data due to pager overruns · 17b572e8

由 Jason Wessel 提交于 8月 26, 2012

It is possible to miss data when using the kdb pager.  The kdb pager
does not pay attention to the maximum column constraint of the screen
or serial terminal.  This result is not incrementing the shown lines
correctly and the pager will print more lines that fit on the screen.
Obviously that is less than useful when using a VGA console where you
cannot scroll back.

The pager will now look at the kdb_buffer string to see how many
characters are printed.  It might not be perfect considering you can
output ASCII that might move the cursor position, but it is a
substantially better approximation for viewing dmesg and trace logs.

This also means that the vt screen needs to set the kdb COLUMNS
variable.

Cc: <stable@vger.kernel.org>
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

17b572e8

kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell · 24b8592e

由 Jason Wessel 提交于 8月 10, 2012

It is a common enough mistake for people to specify "kdb" when they
meant to type "kbd" that the kgdboc can just accept both since they
both mean the same thing anyway.  Specifically it is for the case
where you want kdb to be active on your graphics console + keyboard
(where kbd was the original abbreviation for keyboard).

With this change kgdboc will now accept either to mean the same thing:
   kgdboc=kbd
   kgdboc=kdb
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

24b8592e

mmc: mxs-mmc: Fix merge issue causing build error · 4c5bb2e4

由 Marek Vasut 提交于 10月 01, 2012

The following error appeared due to a merge problem; the patches:

fc108d24 "mmc: mxs-mmc: fix deadlock caused by recursion loop"
829c1bf4 "mmc: spi: Pull out parts shared between MMC and SPI"

came in through separate branches and cause this build error when
combined.

drivers/mmc/host/mxs-mmc.c: In function 'mxs_mmc_enable_sdio_irq':
drivers/mmc/host/mxs-mmc.c:527:3: error: 'struct mxs_mmc_host' has no member named 'base'
drivers/mmc/host/mxs-mmc.c:527:3: error: 'struct mxs_mmc_host' has no member named 'devid'
make[3]: *** [drivers/mmc/host/mxs-mmc.o] Error 1

This patch corrects the issue.
Signed-off-by: NMarek Vasut <marex@denx.de>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: NChris Ball <cjb@laptop.org>

4c5bb2e4

usbnet: Support devices reporting idleness · 5d9d01a3

由 Oliver Neukum 提交于 10月 11, 2012

Some device types support a form of power management in which
the device suggests to the host that the device may be suspended
now. Support for that is best located in usbnet.
Signed-off-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d9d01a3

Add CDC-ACM support for the CX93010-2x UCMxx USB Modem · e7d491a1

由 Jean-Christian de Rivaz 提交于 10月 10, 2012

This USB V.92/V.32bis Controllered Modem have the USB vendor ID 0x0572
and device ID 0x1340. It need the NO_UNION_NORMAL quirk to be recognized.

Reference:
http://www.conexant.com/servlets/DownloadServlet/DSH-201723-005.pdf?docid=1725&revid=5
See idVendor and idProduct in table 6-1. Device Descriptors
Signed-off-by: NJean-Christian de Rivaz <jc@eclis.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7d491a1

net/ethernet/jme: disable ASPM · aac9453b

由 Kevin Baradon 提交于 10月 10, 2012

Based on patch from Matthew Garrett <mjg@redhat.com> (https://lkml.org/lkml/2011/11/11/168).

http://driveragent.com/archive/30421/7-0-14 indicates that ASPM is
disabled on the 250 and 260. Duplicate for sanity.

Fixes random RX engine hangs I experienced with JMC250 on Clevo W270HU.
Signed-off-by: NKevin Baradon <kevin.baradon@gmail.com>
Cc: Guo-Fu Tseng <cooldavid@cooldavid.org>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aac9453b

kaweth: print correct debug ptr · 0abc1cee

由 Alan Cox 提交于 10月 11, 2012

We nowdays copy the buffer and free fw->data, so make the debug printk use
the right thing.
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0abc1cee

11 10月, 2012 17 次提交

md: refine reporting of resync/reshape delays. · 72f36d59

由 NeilBrown 提交于 10月 11, 2012

If 'resync_max' is set to 0 (as is often done when starting a
reshape, so the mdadm can remain in control during a sensitive
period), and if the reshape request is initially delayed because
another array using the same array is resyncing or reshaping etc,
when user-space cannot easily tell when the delay changes from being
due to a conflicting reshape, to being due to resync_max = 0.

So introduce a new state: (curr_resync == 3) to reflect this, make
sure it is visible both via /proc/mdstat and via the "sync_completed"
sysfs attribute, and ensure that the event transition from one delay
state to the other is properly notified.
Signed-off-by: NNeilBrown <neilb@suse.de>

72f36d59

md/raid5: be careful not to resize_stripes too big. · e56108d6

由 NeilBrown 提交于 10月 11, 2012

When a RAID5 is reshaping, conf->raid_disks is increased
before mddev->delta_disks becomes zero.
This can result in check_reshape calling resize_stripes with a
number that is too large.  This particularly happens
when md_check_recovery calls ->check_reshape().

If we use ->previous_raid_disks, we don't risk this.
Signed-off-by: NNeilBrown <neilb@suse.de>

e56108d6

md: make sure manual changes to recovery checkpoint are saved. · db07d85e

由 NeilBrown 提交于 10月 11, 2012

If you make an array bigger but suppress resync of the new region with
  mdadm --grow /dev/mdX --size=max --assume-clean

then stop the array before anything is written to it, the effect of
the "--assume-clean" is lost and the array will resync the new space
when restarted.
So ensure that we update the metadata in the case.
Reported-by: NSebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

db07d85e

md/raid10: use correct limit variable · 91502f09

由 Dan Carpenter 提交于 10月 11, 2012

Clang complains that we are assigning a variable to itself.  This should
be using bad_sectors like the similar earlier check does.

Bug has been present since 3.1-rc1.  It is minor but could
conceivably cause corruption or other bad behaviour.

Cc: stable@vger.kernel.org
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

91502f09

md: writing to sync_action should clear the read-auto state. · 48c26ddc

由 NeilBrown 提交于 10月 11, 2012

In some cases array are started in 'read-auto' state where in
nothing gets written to any device until the array is written
to.  The purpose of this is to make accidental auto-assembly
of the wrong arrays less of a risk, and to allow arrays to be
started to read suspend-to-disk images without actually changing
anything (as might happen if the array were dirty and a
resync seemed necessary).

Explicitly writing the 'sync_action' for a read-auto array currently
doesn't clear the read-auto state, so the sync action doesn't
happen, which can be confusing.

So allow any successful write to sync_action to clear any read-auto
state.
Reported-by: NAlexander Kühn <alexander.kuehn@nagilum.de>
Signed-off-by: NNeilBrown <neilb@suse.de>

48c26ddc

Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races · 7f7583d4

由 Jianpeng Ma 提交于 10月 11, 2012

Now that multiple threads can handle stripes, it is safer to
use an atomic64_t for resync_mismatches, to avoid update races.
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

7f7583d4

e1000e: Change wthresh to 1 to avoid possible Tx stalls · 8edc0e62

由 Hiroaki SHIMODA 提交于 10月 10, 2012

This patch originated from Hiroaki SHIMODA but has been modified
by Intel with some minor cleanups and additional commit log text.

Denys Fedoryshchenko and others reported Tx stalls on e1000e with
BQL enabled.  Issue was root caused to hardware delays. They were
introduced because some of the e1000e hardware with transmit
writeback bursting enabled, waits until the driver does an
explict flush OR there are WTHRESH descriptors to write back.

Sometimes the delays in question were on the order of seconds,
causing visible lag for ssh sessions and unacceptable tx
completion latency, especially for BQL enabled kernels.

To avoid possible Tx stalls, change WTHRESH back to 1.

The current plan is to investigate a method for re-enabling
WTHRESH while not harming BQL, but those patches will be later
for net-next if they work.

please enqueue for stable since v3.3 as this bug was introduced in
commit 3f0cfa3b
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 28 16:33:16 2011 +0000

    e1000e: Support for byte queue limits

    Changes to e1000e to use byte queue limits.
Reported-by: NDenys Fedoryshchenko <denys@visp.net.lb>
Tested-by: NDenys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
CC: eric.dumazet@gmail.com
CC: therbert@google.com
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8edc0e62

xen: netback: handle compound page fragments on transmit. · 6a8ed462

由 Ian Campbell 提交于 10月 10, 2012

An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a8ed462

isdn: fix a wrapping bug in isdn_ppp_ioctl() · 435f08a7

由 Dan Carpenter 提交于 10月 09, 2012

"protos" is an array of unsigned longs and "i" is the number of bits in
an unsigned long so we need to use 1UL as well to prevent the shift
from wrapping around.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

435f08a7

md/raid5: make sure to_read and to_write never go negative. · 1ed850f3

由 NeilBrown 提交于 10月 11, 2012

to_read and to_write are part of the result of analysing
a stripe before handling it.
Their use is to avoid some loops and tests if the values are
known to be zero.  Thus it is not a problem if they are a
little bit larger than they should be.

So decrementing them in handle_failed_stripe serves little value, and
due to races it could cause some loops to be skipped incorrectly.

So remove those decrements.
Reported-by: N"Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

1ed850f3

md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. · a7854487

由 Alexander Lyakas 提交于 10月 11, 2012

Signed-off-by: NAlex Lyakas <alex@zadarastorage.com>
Suggested-by: NYair Hershko <yair@zadarastorage.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

a7854487

md/raid5: protect debug message against NULL derefernce. · b97390ae

由 NeilBrown 提交于 10月 11, 2012

The pr_debug in add_stripe_bio could race with something
changing *bip, so it is best to hold the lock until
after the pr_debug.
Reported-by: N"Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

b97390ae

md/raid5: add some missing locking in handle_failed_stripe. · 143c4d05

由 NeilBrown 提交于 10月 11, 2012

We really should hold the stripe_lock while accessing
'toread' else we could race with add_stripe_bio and corrupt
a list.
Reported-by: N"Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

143c4d05

MD: raid5 avoid unnecessary zero page for trim · 9e444768

由 Shaohua Li 提交于 10月 11, 2012

We want to avoid zero discarded dev page, because it's useless for discard.
But if we don't zero it, another read/write hit such page in the cache and will
get inconsistent data.

To avoid zero the page, we don't set R5_UPTODATE flag after construction is
done. In this way, discard write request is still issued and finished, but read
will not hit the page. If the stripe gets accessed soon, we need reread the
stripe, but since the chance is low, the reread isn't a big deal.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

9e444768

MD: raid5 trim support · 620125f2

由 Shaohua Li 提交于 10月 11, 2012


Discard for raid4/5/6 has limitation. If discard request size is
small, we do discard for one disk, but we need calculate parity and
write parity disk.  To correctly calculate parity, zero_after_discard
must be guaranteed. Even it's true, we need do discard for one disk
but write another disks, which makes the parity disks wear out
fast. This doesn't make sense. So an efficient discard for raid4/5/6
should discard all data disks and parity disks, which requires the
write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size
is smaller than chunk_size, such pattern is almost impossible in
practice. So in this patch, I only handle the case that A's size
equals to chunk_size. That is discard request should be aligned to
stripe size and its size is multiple of stripe size.

Since we can only handle request with specific alignment and size (or
part of the request fitting stripes), we can't guarantee
zero_after_discard even zero_after_discard is true in low level
drives.

The block layer doesn't send down correctly aligned requests even
correct discard alignment is set, so I must filter out.

For raid4/5/6 parity calculation, if data is 0, parity is 0. So if
zero_after_discard is true for all disks, data is consistent after
discard.  Otherwise, data might be lost. Let's consider a scenario:
discard a stripe, write data to one disk and write parity disk. The
stripe could be still inconsistent till then depending on using data
from other data disks or parity disks to calculate new parity. If the
disk is broken, we can't restore it. So in this patch, we only enable
discard support if all disks have zero_after_discard.

If discard fails in one disk, we face the similar inconsistent issue
above. The patch will make discard follow the same path as normal
write request. If discard fails, a resync will be scheduled to make
the data consistent. This isn't good to have extra writes, but data
consistency is important.

If a subsequent read/write request hits raid5 cache of a discarded
stripe, the discarded dev page should have zero filled, so the data is
consistent. This patch will always zero dev page for discarded request
stripe. This isn't optimal because discard request doesn't need such
payload. Next patch will avoid it.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

620125f2

J
md/bitmap:Don't use IS_ERR to judge alloc_page(). · 582e2e05
由 Jianpeng Ma 提交于 10月 11, 2012
```
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>
```
582e2e05

md/raid1: Don't release reference to device while handling read error. · 7ad4d4a6

由 NeilBrown 提交于 10月 11, 2012

When we get a read error, we arrange for raid1d to handle it.
Currently we release the reference on the device.  This can result
in
   conf->mirrors[read_disk].rdev
being NULL in fix_read_error, if the device happens to get removed
before the read error is handled.

So instead keep the reference until the read error has been fully
handled.
Reported-by: Nhank <pyu@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

7ad4d4a6