- 31 July 2016, 1 commit
Committed by Tomasz Majchrzak
The md pending write counter must be incremented after the bio is split, otherwise it gets decremented too many times in the end-bio callback and becomes negative. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 20 July 2016, 2 commits
Committed by Tomasz Majchrzak
RAID10 random read performance is lower than expected due to excessive spinlock utilisation, which is required mostly for rebuild/resync. Simplify allow_barrier as it is in the IO path and encounters a lot of unnecessary congestion. As lower_barrier just takes a lock in order to decrement a counter, convert the counter (nr_pending) into an atomic variable and remove the spinlock. There is also congestion from wake_up (it takes a lock internally), so call it only when it is really needed. As wake_up is no longer called constantly, ensure a process waiting to raise a barrier is notified when there are no more waiting IOs. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
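A minimal sketch of the lock-free counter described above, with a simplified structure standing in for the real r10conf (field names and the exact wake-up condition are assumptions, not the verbatim upstream code):

```c
#include <linux/atomic.h>
#include <linux/wait.h>

struct r10conf_sketch {
	atomic_t nr_pending;		/* was: int protected by resync_lock */
	atomic_t nr_waiting;		/* IOs queued behind a barrier */
	wait_queue_head_t wait_barrier;
};

/* Finish one IO: decrement without taking resync_lock, and only pay for
 * wake_up() (which locks internally) when a waiter could actually make
 * progress, i.e. the count just hit zero or somebody is waiting. */
static void allow_barrier(struct r10conf_sketch *conf)
{
	if (atomic_dec_and_test(&conf->nr_pending) ||
	    atomic_read(&conf->nr_waiting))
		wake_up(&conf->wait_barrier);
}
```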
-
Committed by Arnd Bergmann
The md code stores the exact time of the last error in the last_read_error variable using a timespec structure. It only ever uses the seconds portion of that though, so we can use a scalar for it. There won't be an overflow in 2038 here, because it already used monotonic time and 32-bit is enough for that, but I've decided to use time64_t for consistency in the conversion. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Shaohua Li <shli@fb.com>
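A sketch of the kind of conversion described, assuming ktime_get_seconds() as the monotonic seconds source and a simplified mirror-info structure:

```c
#include <linux/ktime.h>

struct mirror_info_sketch {
	/* was: struct timespec last_read_error; */
	time64_t last_read_error;	/* monotonic time, whole seconds */
};

static void record_read_error(struct mirror_info_sketch *mirror)
{
	/* Only whole seconds are ever compared, so a scalar is enough,
	 * and monotonic time has no year-2038 problem anyway. */
	mirror->last_read_error = ktime_get_seconds();
}
```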
-
- 14 June 2016, 10 commits
Committed by NeilBrown
Every time a device is removed with ->hot_remove_disk(), a synchronize_rcu() call is made, which can delay several milliseconds in some cases. If lots of devices fail at once - as could happen with a large RAID10 where one set of devices is removed all at once - these delays can add up to be very inconvenient. As failure is not reversible, we can check for that first, setting a separate flag if it is found, and then call synchronize_rcu() once for all the flagged devices. The ->hot_remove_disk() function can then skip the synchronize_rcu() step if the flag is set. Fix build error (Shaohua). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
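A hedged sketch of the batching pattern described above; it assumes the usual md types from drivers/md/md.h, and the flag name is taken from the description rather than quoted from the patch, so treat the identifiers as illustrative:

```c
#include <linux/rcupdate.h>
#include <linux/list.h>

/* Mark every already-failed, idle device first, then pay for a single
 * RCU grace period that covers the whole batch. */
static void flag_removable_devices(struct mddev *mddev)
{
	struct md_rdev *rdev;
	bool need_sync = false;

	list_for_each_entry(rdev, &mddev->disks, same_set) {
		if (test_bit(Faulty, &rdev->flags) &&
		    atomic_read(&rdev->nr_pending) == 0) {
			set_bit(RemoveSynchronized, &rdev->flags);
			need_sync = true;
		}
	}

	if (need_sync)
		synchronize_rcu();	/* one grace period for all flagged devices */

	/* ->hot_remove_disk() can later skip its own synchronize_rcu()
	 * whenever the flag is set on the rdev. */
}
```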
-
Committed by NeilBrown
It is important that we never increment rdev->nr_pending on a Faulty device, as ->hot_remove_disk() assumes that once the Faulty flag is visible no code will take a new reference. Some places take a new reference after only checking In_sync. This should be safe, as the two are changed together. However, to make the code more obviously safe, add checks for 'Faulty' as well. Note: the actual rule is: never increment nr_pending if Faulty is set and Blocked is clear, never clear Faulty, and never set Blocked without holding a reference through nr_pending. Fix build error (Shaohua). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
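A sketch of the reader-side pattern the note describes; the flag and field names follow md conventions, but the surrounding lookup is simplified and should be read as illustrative:

```c
struct md_rdev *rdev;

rcu_read_lock();
rdev = rcu_dereference(conf->mirrors[disk].rdev);
if (rdev &&
    !test_bit(Faulty, &rdev->flags) &&	/* new: refuse Faulty devices */
    test_bit(In_sync, &rdev->flags))
	atomic_inc(&rdev->nr_pending);	/* reference taken inside the RCU section */
else
	rdev = NULL;
rcu_read_unlock();
```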
-
Committed by NeilBrown
'tmp' is only ever used to extract 'tmp->rdev', so just use 'rdev' directly. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown
rdev already holds conf->mirrors[d].rdev, so there is no need to load it again. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown
mirrors[].rdev can become NULL at any point unless a counted reference is held, ->reconfig_mutex is held, or rcu_read_lock() is held. Reshape isn't always suitably careful, as in the past rdev couldn't be removed during reshape. It can now, so add protection. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown
mirrors[].rdev can become NULL at any point unless a counted reference is held, ->reconfig_mutex is held, or rcu_read_lock() is held. Previously they could not become NULL during a resync/recovery/reshape either. However, when remove_and_add_spares() was added to hot_remove_disk(), that changed. So raid10_sync_request didn't previously need to protect rdev access, but now it does. Fix missed check (Shaohua). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown
mirrors[].rdev can become NULL at any point unless a counted reference is held, ->reconfig_mutex is held, or rcu_read_lock() is held. raid10_status holds none of these, so add rcu_read_lock() protection. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
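A sketch of what an RCU-protected status walk looks like; it mirrors the shape of raid10's status output but is simplified and should be read as illustrative rather than the exact upstream function:

```c
#include <linux/seq_file.h>
#include <linux/rcupdate.h>

static void raid10_status_sketch(struct seq_file *seq, struct r10conf *conf)
{
	int i;

	rcu_read_lock();
	for (i = 0; i < conf->geo.raid_disks; i++) {
		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);

		/* rdev is only guaranteed to stay valid inside this
		 * rcu_read_lock() section, which is all we need here. */
		seq_printf(seq, "%s",
			   rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
	}
	rcu_read_unlock();
}
```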
-
Committed by NeilBrown
If you have a RAID10 with a replacement device that is resyncing - e.g. after a crash before the replacement was complete - the write to the replacement will increment nr_pending on the wrong device, which will lead to strangeness. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by NeilBrown
Re-checking the Faulty flag here brings no value. The comment about "risk" refers to the risk that the device could be in the process of being removed by ->hot_remove_disk(). However, provided that the ->nr_pending count is incremented inside an rcu_read_lock()ed region, there is no risk of that happening. This is because the rdev pointer (in the personalities array) is set to NULL before synchronize_rcu(), and ->nr_pending is tested afterwards. If the rcu_read_lock()ed region happens before the synchronize_rcu(), the test will see that nr_pending has been incremented. If it happens afterwards, the rdev pointer will be NULL, so there is nothing to increment. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
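For the ordering argument above, the removal side looks roughly like this; a sketch of the pattern (assumed to run under ->reconfig_mutex), not the verbatim raid10_remove_disk code:

```c
/* Removal side: hide the pointer, wait out readers, then test. */
static int remove_mirror_sketch(struct r10conf *conf, int number)
{
	struct md_rdev *rdev = conf->mirrors[number].rdev;

	conf->mirrors[number].rdev = NULL;	/* new RCU readers now see NULL */
	synchronize_rcu();			/* wait out rcu_read_lock() sections */

	if (atomic_read(&rdev->nr_pending)) {
		/* A reader took its reference before the pointer was
		 * hidden: back out and let removal be retried later. */
		conf->mirrors[number].rdev = rdev;
		return -EBUSY;
	}
	/* Nobody can hold or gain a reference any more; rdev may be freed. */
	return 0;
}
```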
-
Committed by Tomasz Majchrzak
A performance drop of mkfs has been observed on RAID10 during resync since commit 09314799 ("md: remove 'go_faster' option from ->sync_request()"). Resync sends so many IOs that it slows down non-resync IOs significantly (by a factor of a few). Add a short delay to resync. The previous long sleep (1s) has proven unnecessary; even a very short delay restores performance. The change is also applied to raid1. The problem has not been observed on raid1, however it shares barrier code with raid10, so it might be an issue for some setups too. Suggested-by: NeilBrown <neilb@suse.com> Link: http://lkml.kernel.org/r/20160609134555.GA9104@proton.igk.intel.com Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
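The shape of the change, as a hedged sketch: before issuing the next resync chunk, yield briefly when regular IO is already queued behind the barrier. The nr_waiting field follows raid1/raid10 convention, but the exact placement and condition are assumptions:

```c
/* In the resync path, replacing the old 1-second sleep: */
if (conf->nr_waiting)
	schedule_timeout_uninterruptible(1);	/* ~1 jiffy is enough to let
						 * normal IO through */
```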
-
- 09 June 2016, 1 commit
Committed by Christoph Hellwig
Stop overloading the discard support with the REQ_SECURE flag. Use the opportunity to rename the queue flag as well, and remove the dead checks for this flag in the RAID 1 and RAID 10 drivers, which don't claim support for secure erase. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 08 June 2016, 3 commits
Committed by Mike Christie
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting that the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Mike Christie
Separate the op from the rq_flag_bits and have md set/get the bio op using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
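For readers unfamiliar with that era of the block layer, a small sketch of the accessor pair named above; the surrounding function is illustrative:

```c
#include <linux/kernel.h>
#include <linux/bio.h>

static void prepare_resync_write(struct bio *bio)
{
	/* Store the operation and any rq_flag_bits together ... */
	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);

	/* ... and read the operation back without touching the flags. */
	WARN_ON(bio_op(bio) != REQ_OP_WRITE);
}
```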
-
Committed by Mike Christie
This has callers of submit_bio/submit_bio_wait set bio->bi_rw instead of passing it in, which matches how generic_make_request is used and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c. Signed-off-by: Jens Axboe <axboe@fb.com>
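A before/after sketch of the calling-convention change; the flag values are just for illustration:

```c
#include <linux/bio.h>

static void submit_sync_write(struct bio *bio)
{
	/* Before: submit_bio(WRITE | REQ_SYNC, bio);
	 * After: the caller fills in bi_rw, as it already does for the
	 * other bio fields, and then submits. */
	bio->bi_rw = WRITE | REQ_SYNC;
	submit_bio(bio);
}
```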
-
- 10 May 2016, 2 commits
Committed by Guoqing Jiang
Some code waits for a metadata update by: 1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN), 2. setting MD_CHANGE_PENDING and waking the management thread, 3. waiting for MD_CHANGE_PENDING to be cleared. If the first two are done without locking, the code in md_update_sb() which checks if it needs to repeat might test whether an update is needed before step 1, then clear MD_CHANGE_PENDING after step 2, resulting in the wait returning early. So make sure all places that set MD_CHANGE_PENDING do so atomically, and introduce bit_clear_unless (suggested by Neil) for the purpose. Cc: Martin Kepplinger <martink@posteo.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: <linux-kernel@vger.kernel.org> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
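A sketch of how the new helper closes the race on the superblock-writer side; the exact call site in md_update_sb() may differ, so read this as an assumed usage (against the md.h flag definitions) rather than the verbatim patch:

```c
/* Returns true when MD_CHANGE_PENDING was cleared, false when a new
 * update request arrived and the superblock must be written again.
 * bit_clear_unless() folds the test and the clear into a single
 * cmpxchg loop, so a concurrent "set bits, then wake" sequence can no
 * longer slip in between the test and the clear. */
static bool finish_sb_update(struct mddev *mddev)
{
	return bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
				BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_CLEAN));
}
```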
-
Committed by Heinz Mauelshagen
In case md runs underneath the dm-raid target, the mddev does not have a request queue or gendisk, so avoid accesses to it. This patch adds two missing conditionals to the raid10 personality. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com>
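The guard itself is the familiar md pattern below; which two call sites the patch wraps is not spelled out here, so the queue-limit calls are only an illustration:

```c
/* Only touch the request queue if one exists; under dm-raid the mddev
 * has neither a queue nor a gendisk. */
if (mddev->queue) {
	int chunk_bytes = mddev->chunk_sectors << 9;

	blk_queue_io_min(mddev->queue, chunk_bytes);
	blk_queue_io_opt(mddev->queue, chunk_bytes * conf->geo.raid_disks);
}
```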
-
- 18 March 2016, 1 commit
Committed by Shaohua Li
This is the raid10 counterpart of the bug fixed by Nate (raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang). Fixes: 95af587e ("md/raid10: ensure device failure recorded before write request returns") Cc: stable@vger.kernel.org (v4.3+) Cc: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 21 January 2016, 1 commit
Committed by Shaohua Li
These short function names are hard to search for. Rename them to make vim happy. Signed-off-by: Shaohua Li <shli@fb.com>
-
- 14 January 2016, 1 commit
Committed by Dan Williams
It is not safe for an integrity profile to be changed while i/o is in flight in the queue. Prevent adding new disks, or otherwise onlining spares, to an array if the device has an incompatible integrity profile. The original change to the blk_integrity_unregister implementation in md, commit c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister"), introduced an immediate hang regression. This policy of disallowing changes to the integrity profile once one has been established is shared with DM. Here is an abbreviated log from a test run that: 1/ creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127], 2/ tries to add an integrity-disabled device (pmem1m) [ 90.489209], 3/ retries with an integrity-enabled device (pmem1s) [ 205.671277]: [ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors [ 59.078302] md: data integrity enabled on md0 [..] [ 90.489209] md0: incompatible integrity profile for pmem1m [..] [ 205.671277] md: super_written gets error=-5 [ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device. [ 205.677386] md/raid1:md0: Operation continuing on 1 devices. [ 205.683037] RAID1 conf printout: [ 205.684699] --- wd:1 rd:2 [ 205.685972] disk 0, wo:0, o:1, dev:pmem0s [ 205.687562] disk 1, wo:1, o:1, dev:pmem1s [ 205.691717] md: recovery of RAID array md0 Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister") Cc: <stable@vger.kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.com>
-
- 18 December 2015, 1 commit
Committed by Artur Paszkiewicz
The commit c31df25f ("md/raid10: make sync_request_write() call bio_copy_data()") replaced manual data copying with bio_copy_data() but it doesn't work as intended. The source bio (fbio) is already processed, so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of this, bio_copy_data() either does not copy anything, or worse, copies data from the ->bi_next bio if it is set. This causes wrong data to be written to drives during resync and sometimes lockups/crashes in bio_copy_data(): [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319] [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1 [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015 [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000 [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0 [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246 [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000 [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0 [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980 [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000 [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000 [ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000 [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0 [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 517.566144] Stack: [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000 [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800 [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000 [ 517.595773] Call Trace: [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10] [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2 [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140 [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80 [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520 [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0 [ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70 [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70 [ 517.649929] [<ffffffff8109eec0>] ? 
flush_kthread_worker+0x70/0x70 Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Shaohua Li <shli@kernel.org> Cc: stable@vger.kernel.org (v4.2+) Fixes: c31df25f ("md/raid10: make sync_request_write() call bio_copy_data()") Signed-off-by: NeilBrown <neilb@suse.com>
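The fix described above amounts to rewinding the already-consumed source bio before reusing it; a hedged sketch of that step inside sync_request_write() (fbio, tbio and r10_bio come from the surrounding function):

```c
/* fbio has already completed, so its iterator is exhausted
 * (bi_size == 0, bi_idx == bi_vcnt).  Rewind it before using it as the
 * source of the copy into each out-of-sync target bio. */
fbio->bi_iter.bi_size = r10_bio->sectors << 9;	/* bytes to copy */
fbio->bi_iter.bi_idx = 0;			/* start at the first bvec */

bio_copy_data(tbio, fbio);			/* dst, src */
```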
-
- 24 October 2015, 2 commits
Committed by NeilBrown
In Linux 3.9 we introduced a new 'far' layout for RAID10 which was supposed to rotate the replicas differently and so provide better resilience. In particular it could survive more combinations of 2 drive failures. Unfortunately, due to a coding error, this sometimes did what was wanted, sometimes improved things less than we hoped, and sometimes - in very unlikely circumstances - put multiple replicas on the same device, so the redundancy was harmed. No public user-space tool has created arrays using this layout, so it is very unlikely that zero-redundancy arrays actually exist. Probably no arrays using any form of the new layout exist. But we cannot be certain. So use another bit in the 'layout' number and introduce a bug-fixed version of the layout. Also, when assembling an array, if it has a zero-redundancy layout, give a warning. Reported-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
When a write fails and a bad-block list is present, we can update the bad-block list instead of writing the data. If this succeeds then it is OK to clear the relevant bitmap bit, as no further 'sync' of the block is needed. However, if writing the bad-block list fails then we need to treat the write as failed and in particular must not clear the bitmap bit. Otherwise the device can be re-added (after any hardware connection issues are resolved) and, because the relevant bit in the bitmap is clear, that block will not be resynced. This leads to data corruption. We already delay the final bio_endio() on the write until the bad-block list is written, so that when the write returns either the data is safe, the bad-block record is safe, or the fact that the device is faulty is safe. However we *don't* delay the clearing of the bitmap, so the bitmap bit can be recorded as cleared before we know if the bad-block list was written safely. So: delay that until the write really is safe, i.e. move the call to close_write() until just before calling bio_endio(), and recheck the 'is array degraded' status before making that call. This bug goes back to v3.1 when bad-block lists were introduced, though it only affects arrays created with mdadm-3.3 or later, as only those have bad-block lists. Backports will require at least commit 95af587e ("md/raid10: ensure device failure recorded before write request returns.") as well. I'll send that to 'stable' separately. Note that of the two tests of R10BIO_WriteError that this patch adds, the first is certain to fail and the second is certain to succeed. However, doing it this way makes the patch more obviously correct. I will tidy the code up in a future merge window. Reported-by: Nate Dailey <nate.dailey@stratus.com> Fixes: bd870a16 ("md/raid10: Handle write errors by updating badblock log.") Signed-off-by: NeilBrown <neilb@suse.com>
-
- 22 October 2015, 1 commit
Committed by Dan Williams
Synchronize pending i/o against a change in the integrity profile to avoid the possibility of spurious integrity errors. Given that linear_add() suspends the mddev before manipulating it, do the same for the other personalities. Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 October 2015, 1 commit
Committed by Jes Sorensen
This was introduced with 9e882242, which changed the return value of submit_bio_wait() to return != 0 on error, but didn't update the caller accordingly. Fixes: 9e882242 ("block: Add submit_bio_wait(), remove from md") Cc: stable@vger.kernel.org (v3.10) Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
-
- 12 October 2015, 1 commit
Committed by Goldwyn Rodrigues
Suspending the entire device for resync could take too long, so resync in small chunks. The cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window: 1. Set cluster_sync_low to curr_resync_completed. 2. Check if the sync will fit in the new window; if not, issue a wait_barrier() and set cluster_sync_low to sector_nr. 3. Set cluster_sync_high to cluster_sync_low + resync_window. 4. Send a message to all nodes so they may add it to their suspension list. bitmap_cond_end_sync is modified to allow forcing a sync in order to get curr_resync_completed up to date with the sector passed. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 09 October 2015, 1 commit
Committed by Mikulas Patocka
The commit 55ce74d4 (md/raid1: ensure device failure recorded before write request returns) is causing a crash in the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100% reproducible. The reason for the crash is that the newly added code in raid1d moves the list from conf->bio_end_io_list to tmp, then tests if tmp is non-empty and then incorrectly pops the bio from conf->bio_end_io_list (which is empty because the list was already moved). RAID10 has a similar bug. Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000) CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35 task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111111000001111 Not tainted r00-03 000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38 r04-07 000000001059d000 000000007f16be08 0000000000200200 0000000000000001 r08-11 000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000 r12-15 000000004056f320 0000000000000000 0000000000013dd0 0000000000000000 r16-19 00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001 r20-23 000000000800000f 0000000042200390 0000000000000000 0000000000000000 r24-27 0000000000000001 000000000800000f 000000007f16be08 000000001059d000 r28-31 0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000 sr00-03 0000000000249800 0000000000000000 0000000000000000 0000000000249800 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620 IIR: 0f8010c6 ISR: 0000000000000000 IOR: 0000000100000000 CPU: 3 CR30: 000000006ccb8000 CR31: 0000000000000000 ORIG_R28: 000000001059d000 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1] IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1] RP(r2): raid_end_bio_io+0x88/0x168 [raid1] Backtrace: [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1] [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1] [<000000004017fd5c>] kthread+0x144/0x160 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.") Fixes: 95af587e ("md/raid10: ensure device failure recorded before write request returns.") Signed-off-by: NeilBrown <neilb@suse.com>
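The fix boils down to popping entries from the spliced-off local list rather than from the now-empty original list; a simplified sketch for the raid10 side (assumes the raid10.h definitions, and omits the surrounding flag checks the real daemon performs):

```c
static void flush_bio_end_io_list(struct r10conf *conf)
{
	LIST_HEAD(tmp);
	unsigned long flags;

	spin_lock_irqsave(&conf->device_lock, flags);
	list_splice_init(&conf->bio_end_io_list, &tmp);	/* conf list is now empty */
	spin_unlock_irqrestore(&conf->device_lock, flags);

	while (!list_empty(&tmp)) {
		struct r10bio *r10_bio;

		/* The bug: entries were taken from conf->bio_end_io_list,
		 * which the splice above has already emptied; they must
		 * come from the local 'tmp' list instead. */
		r10_bio = list_first_entry(&tmp, struct r10bio, retry_list);
		list_del(&r10_bio->retry_list);
		raid_end_bio_io(r10_bio);
	}
}
```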
-
- 02 October 2015, 1 commit
Committed by Julia Lawall
Remove an unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NeilBrown <neilb@suse.com>
-
- 01 September 2015, 2 commits
Committed by NeilBrown
When a write to one of the legs of a RAID10 fails, the failure is recorded in the metadata of the other legs so that after a restart the data on the failed drive won't be trusted even if that drive seems to be working again (maybe a cable was unplugged). Currently there is no interlock between the write request completing and the metadata update, so it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes. This is an extremely small hole for a race to fit in, but it is theoretically possible and so should be closed. So: set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes; queue requests that experienced an error on a new queue which is only processed after the metadata update completes; and call raid_end_bio_io() on bios in that queue when the time comes. Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 14 August 2015, 1 commit
Committed by Kent Overstreet
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 July 2015, 2 commits
Committed by Jens Axboe
Some places use helpers now, others don't. We only have the 'is set' helper; add helpers for setting and clearing flags too. It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags, so relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained) we already handle separately. Signed-off-by: Jens Axboe <axboe@fb.com>
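A sketch of the resulting helper trio in use; the flag chosen here (BIO_CLONED) and the wrapping function are just for illustration:

```c
#include <linux/bio.h>

static void flag_helpers_sketch(struct bio *bio)
{
	/* Previously callers open-coded bit operations on bio->bi_flags,
	 * sometimes atomically and sometimes not.  Now there is one
	 * non-atomic helper for each direction plus the existing test. */
	bio_set_flag(bio, BIO_CLONED);

	if (bio_flagged(bio, BIO_CLONED))
		bio_clear_flag(bio, BIO_CLONED);
}
```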
-
Committed by Christoph Hellwig
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag, or (2) by returning a Linux errno value to the bi_end_io callback. The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not being persistent when bios are queued up, and of not being passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
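A sketch of the post-change completion convention; the functions are illustrative, while the field and call names follow the 4.3-era block layer:

```c
#include <linux/printk.h>
#include <linux/bio.h>

/* Completion side: the error now lives in bio->bi_error, and bi_end_io
 * takes only the bio. */
static void sketch_end_io(struct bio *bio)
{
	if (bio->bi_error)
		pr_err("IO failed: %d\n", bio->bi_error);
	bio_put(bio);
}

/* Submission side: an error is reported by filling in bi_error and
 * completing the bio, instead of clearing BIO_UPTODATE or passing an
 * errno to the callback. */
static void fail_bio_sketch(struct bio *bio)
{
	bio->bi_error = -EIO;
	bio_endio(bio);
}
```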
-
- 22 July 2015, 1 commit
Committed by NeilBrown
'reshape_position' tracks where in the reshape we have reached. 'reshape_safe' tracks where in the reshape we have safely recorded in the metadata. These are compared to determine when to update the metadata, so it is important that reshape_safe is initialised properly. Currently it isn't: when starting a reshape from the beginning it usually has the correct value by luck, but when reducing the number of devices in a RAID10 it has the wrong value, and this leads to the metadata not being updated correctly. This can lead to corruption if the reshape is not allowed to complete. This patch is suitable for any -stable kernel which supports RAID10 reshape, which is 3.5 and later. Fixes: 3ea7daa5 ("md/raid10: add reshape support") Cc: stable@vger.kernel.org (v3.5+, please wait for -final to be out for 2 weeks) Signed-off-by: NeilBrown <neilb@suse.com>
-
- 17 June 2015, 1 commit
Committed by Kent Overstreet
Refactor sync_request_write() of md/raid10 to use bio_copy_data() instead of open-coding bio_vec iterations. Cc: Christoph Hellwig <hch@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <mlin@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 12 June 2015, 1 commit
Committed by NeilBrown
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a resync etc. has finished. However it is possible for raid5_start_reshape to race and start a reshape before MD_RECOVERY_DONE is cleared. This can lead to multiple reshapes running at the same time, which isn't good. So make sure it is cleared before starting a reshape, and also clear it when reaping a thread, just to be safe. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 02 June 2015, 1 commit
Committed by Tejun Heo
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's, where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeroing of wb->state, as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state, introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 22 April 2015, 1 commit
Committed by NeilBrown
This option is not well justified and testing suggests that it hardly ever makes any difference. The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting, however raise_barrier() already waits for all of that. So just remove it to simplify reasoning about speed limiting. This allows us to remove a 'FIXME' comment from raid5.c, as that never used the flag. Signed-off-by: NeilBrown <neilb@suse.de>
-