Commits · 26308eab69aa193f7b3fb50764a64ae14544a39b · openanolis / cloud-kernel

07 Apr, 2009 1 commit

block: fix inconsistency in I/O stat accounting code · 26308eab

Committed by Jerome Marchand on Mar 27, 2009

This forces in_flight to be zero when turning off or on the I/O stat
accounting and stops updating I/O stats in attempt_merge() when
accounting is turned off.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

26 Mar, 2009 1 commit

block: WARN in __blk_put_request() for potential bio leak · 1cd96c24

Committed by Boaz Harrosh on Mar 24, 2009

Put a WARN_ON in __blk_put_request if it is about to
leak bio(s). This is a serious bug that can happen in error
handling code paths.

For this to work I have fixed a couple of places in block/ where
request->bio != NULL ownership was not honored. And a small cleanup
at sg_io() while at it.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

06 Mar, 2009 1 commit

block: fix missing bio back/front segment size setting in blk_recount_segments() · 59247eae

Committed by Jens Axboe on Mar 06, 2009

Commit 1e428079 introduced a bug where we
don't get front/back segment sizes in the bio in blk_recount_segments().
Fix this by tracking the back bio as well as the front bio in
__blk_recalc_rq_segments(), this also cleans up the interface by getting
rid of the segment size pointer passing.
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

26 Feb, 2009 1 commit

block: reduce stack footprint of blk_recount_segments() · 1e428079

Committed by Jens Axboe on Feb 23, 2009

blk_recalc_rq_segments() requires a request structure passed in, which
we don't have from blk_recount_segments(). So the latter allocates one on
the stack, using > 400 bytes of stack for that. This can cause us to spill
over one page of stack from ext4 at least:

 0)     4560     400   blk_recount_segments+0x43/0x62
 1)     4160      32   bio_phys_segments+0x1c/0x24
 2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
 3)     4096      32   init_request_from_bio+0xf9/0xfe
 4)     4064     112   __make_request+0x33c/0x3f6
 5)     3952     144   generic_make_request+0x2d1/0x321
 6)     3808      64   submit_bio+0xb9/0xc3
 7)     3744      48   submit_bh+0xea/0x10e
 8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
 9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
16)     1664      32   do_writepages+0x2d/0x3c
17)     1632     144   __writeback_single_inode+0x162/0x2cd
18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
19)     1392      16   sync_sb_inodes+0xe/0x10
20)     1376      48   writeback_inodes+0x69/0xb3
21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
24)      720      80   generic_file_aio_write+0x6c/0xc8
25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
26)      560     320   do_sync_write+0xf0/0x137
27)      240      48   vfs_write+0xb3/0x13c
28)      192      64   sys_write+0x4c/0x74
29)      128     128   system_call_fastpath+0x16/0x1b

Split the segment counting out into a __blk_recalc_rq_segments() helper
to avoid allocating an onstack request just for checking the physical
segment count.
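
The split described above can be illustrated with a minimal user-space sketch. All `toy_*` names are hypothetical stand-ins, not kernel definitions, and segment counting is simplified to summing a per-bio count with no merging. The only point is that the counting helper takes a bio chain, so the single-bio path never needs a request structure on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's bio and request. */
struct toy_bio {
    unsigned int nr_vecs;      /* segments contributed by this bio (assumption: no merging) */
    struct toy_bio *next;      /* next bio in the chain, or NULL */
};

struct toy_request {
    struct toy_bio *bio;       /* head of the bio chain */
    unsigned int nr_phys_segments;
    char other_state[400];     /* models the bulk that made an on-stack request costly */
};

/* Helper that walks a bio chain directly; it needs no request structure. */
unsigned int toy_count_segments(const struct toy_bio *bio)
{
    unsigned int nr = 0;

    for (; bio != NULL; bio = bio->next)
        nr += bio->nr_vecs;
    return nr;
}

/* Request path: a thin wrapper around the helper. */
void toy_recalc_rq_segments(struct toy_request *rq)
{
    assert(rq != NULL);
    rq->nr_phys_segments = toy_count_segments(rq->bio);
}

/* Single-bio path: temporarily detach the bio so only it is counted. */
unsigned int toy_recount_segments(struct toy_bio *bio)
{
    struct toy_bio *saved = bio->next;
    unsigned int nr;

    bio->next = NULL;
    nr = toy_count_segments(bio);
    bio->next = saved;
    return nr;
}
```

In this sketch the single-bio path uses only a pointer and a counter of stack space, whereas building a temporary `struct toy_request` would cost more than 400 bytes.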
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

06 Nov, 2008 1 commit

block: remove unused ll_new_mergeable() · 43381785

Committed by FUJITA Tomonori on Oct 20, 2008

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

17 Oct, 2008 1 commit

block: fix nr_phys_segments miscalculation bug · 86771427

Committed by FUJITA Tomonori on Oct 13, 2008

This fixes the bug reported by Nikanth Karthikesan <knikanth@suse.de>:

http://lkml.org/lkml/2008/10/2/203

The root cause of the bug is that blk_phys_contig_segment
miscalculates q->max_segment_size.

blk_phys_contig_segment checks:

req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size

But blk_recalc_rq_segments might expect that req->biotail and the
previous bio in the req are supposed to be merged into one
segment. blk_recalc_rq_segments might also expect that next_req->bio
and the next bio in the next_req are supposed to be merged into one
segment. In such a case, we merge two requests that can't be merged
here. Later, blk_rq_map_sg gives more segments than it should.

We need to keep track of segment size in blk_recalc_rq_segments and
use it to see if two requests can be merged. This patch implements it
in a way similar to what we used to do for hw merging (virtual
merging).
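
The difference between the two checks can be shown with a small user-space sketch. The `toy_*` types and functions are hypothetical simplifications, not the kernel's structures: each bio carries only a size and a flag saying whether it is physically contiguous with its predecessor. The recalculation records the size of the first and last segment, and the boundary check then compares whole segments instead of single bios.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's struct bio/request. */
struct toy_bio {
    unsigned int size;         /* bytes in this bio */
    bool contig_with_prev;     /* physically contiguous with the previous bio */
};

struct toy_req {
    const struct toy_bio *bios;
    size_t nr_bios;
    unsigned int front_seg_size;  /* size of the first segment */
    unsigned int back_seg_size;   /* size of the last segment */
    unsigned int nr_segments;
};

/* Count segments and remember the sizes of the first and last one. */
void toy_recalc_segments(struct toy_req *rq, unsigned int max_seg)
{
    unsigned int seg_size = 0, nr = 0;

    assert(rq->nr_bios > 0);
    for (size_t i = 0; i < rq->nr_bios; i++) {
        const struct toy_bio *b = &rq->bios[i];

        if (i > 0 && b->contig_with_prev && seg_size + b->size <= max_seg) {
            seg_size += b->size;              /* grow the current segment */
        } else {
            if (nr == 1)
                rq->front_seg_size = seg_size; /* first segment just closed */
            nr++;
            seg_size = b->size;               /* start a new segment */
        }
    }
    if (nr == 1)
        rq->front_seg_size = seg_size;        /* only one segment in total */
    rq->back_seg_size = seg_size;
    rq->nr_segments = nr;
}

/* Old-style check: looks only at the two bios that touch. */
bool toy_boundary_fits_naive(const struct toy_req *a, const struct toy_req *b,
                             unsigned int max_seg)
{
    return a->bios[a->nr_bios - 1].size + b->bios[0].size <= max_seg;
}

/* Tracked check: uses the sizes of the segments that would be joined. */
bool toy_boundary_fits_tracked(const struct toy_req *a, const struct toy_req *b,
                               unsigned int max_seg)
{
    return a->back_seg_size + b->front_seg_size <= max_seg;
}
```

With a 4096-byte limit, a request whose last two 2048-byte bios already form one full segment passes the naive check against a following 512-byte bio but fails the tracked one, which is the class of mismatch the commit message describes.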
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

09 Oct, 2008 8 commits

block: inherit CPU completion on bio->rq and rq->rq merges · ab780f1e

Committed by Jens Axboe on Aug 26, 2008

Somewhat incomplete, as we do allow merges of requests and bios
that have different completion CPUs given. This is done on the
assumption that a larger IO is still more beneficial than CPU
locality.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: move stats from disk to part0 · 074a7aca

Committed by Tejun Heo on Aug 25, 2008

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
  is not part0.  ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similarly.  It handles part0 stats
  automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_flight() is implemented which automatically updates
  part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
  Combined with the above changes, this makes NULL special case
  handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
  stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix diskstats access · c9959059

Committed by Tejun Heo on Aug 25, 2008

There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters.  It's unclear
whether the underbarred ones assume that preemption is disabled on
entry as some callers don't do that.

This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access).  diskstats access
should always be enclosed between the two functions.  As such, there's
no need for the versions which disable preemption.  They're removed
and double underbars ones are renamed to drop the underbars.  As an
extra argument is added, there's no danger of using the old version
unconverted.

disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now have a @cpu
argument to help RT.

This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others.  Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.
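
The bracketed access pattern described above can be sketched in user space. Everything here is a hypothetical model: `toy_stat_lock()` only bumps a depth counter where the real function would handle RCU and preemption, and `toy_current_cpu` is a plain variable standing in for the CPU the caller runs on. The sketch shows the shape of the interface, in which the lock call yields a CPU index that every counter update must be given.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Hypothetical partition with one per-CPU counter. */
struct toy_part {
    unsigned long ios[TOY_NR_CPUS];
};

int toy_current_cpu;   /* stand-in for the CPU we are running on */
int toy_lock_depth;    /* stand-in for the RCU/preemption state */

/* Enter the stats section and report which per-CPU slot to use. */
int toy_stat_lock(void)
{
    toy_lock_depth++;
    return toy_current_cpu;
}

/* Leave the stats section. */
void toy_stat_unlock(void)
{
    assert(toy_lock_depth > 0);
    toy_lock_depth--;
}

/* The extra cpu argument means callers must have obtained it from the lock. */
void toy_stat_add_ios(struct toy_part *part, int cpu, unsigned long n)
{
    assert(toy_lock_depth > 0);
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    part->ios[cpu] += n;
}

/* Readers sum over all per-CPU slots. */
unsigned long toy_stat_read_ios(const struct toy_part *part)
{
    unsigned long sum = 0;

    for (int i = 0; i < TOY_NR_CPUS; i++)
        sum += part->ios[i];
    return sum;
}

/* Typical caller: every update sits between lock and unlock. */
void toy_account_io(struct toy_part *part, unsigned long n)
{
    int cpu = toy_stat_lock();

    toy_stat_add_ios(part, cpu, n);
    toy_stat_unlock();
}
```

Because the update function requires the `cpu` argument, a caller written against an older two-argument form would fail to compile rather than silently skip the locking, which mirrors the safety argument in the commit message.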
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix disk->part[] dereferencing race · e71bf0d0

Committed by Tejun Heo on Sep 03, 2008

disk->part[] is protected by its matching bdev's lock.  However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking.  As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses.  As some of those accesses are
writes, this theoretically can lead to silent corruption.

This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.

* Rename disk->part[] to disk->__part[] to make sure no one outside
  genhd layer proper accesses it directly.

* Use RCU for disk->__part[] dereferencing.

* Implement disk_{get|put}_part() which can be used to get and put
  partitions from gendisk respectively.

* Iterators are implemented to help iterate through all partitions
  safely.

* Functions which require RCU readlock are marked with _rcu suffix.

* Use disk_put_part() in __blkdev_put() instead of directly putting
  the contained kobject.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: misc updates · 310a2c10

Committed by Tejun Heo on Aug 25, 2008

This patch makes the following misc updates in preparation for
disk->part dereference fix and extended block devt support.

* implement part_to_disk()

* fix comment about gendisk->part indexing

* rename get_part() to disk_map_sector()

* don't use n which is always zero while printing disk information in
  diskstats_show()
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

drop vmerge accounting · 5df97b91

Committed by Mikulas Patocka on Aug 15, 2008

Remove hw_segments field from struct bio and struct request. Without virtual
merge accounting they have no purpose.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: drop virtual merging accounting · b8b3e16c

Committed by Mikulas Patocka on Aug 15, 2008

Remove virtual merge accounting.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Allow elevators to sort/merge discard requests · e17fc0a1

Committed by David Woodhouse on Aug 09, 2008

But blkdev_issue_discard() still emits requests which are interpreted as
soft barriers, because naïve callers might otherwise issue subsequent
writes to those same sectors, which might cross on the queue (if they're
reallocated quickly enough).

Callers still _can_ issue non-barrier discard requests, but they have to
take care of queue ordering for themselves.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

03 Jul, 2008 1 commit

block: Block layer data integrity support · 7ba1ba12

Committed by Martin K. Petersen on Jun 30, 2008

Some block devices support verifying the integrity of requests by way
of checksums or other protection information that is submitted along
with the I/O.

This patch implements support for generating and verifying integrity
metadata, as well as correctly merging, splitting and cloning bios and
requests that have this extra information attached.

See Documentation/block/data-integrity.txt for more information.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

07 May, 2008 1 commit

block: get rid of likely/unlikely predictions in merge logic · 2cdf79ca

Committed by Jens Axboe on May 07, 2008

They tend to depend a lot on the workload, so not a clear-cut
likely or unlikely fit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

29 Apr, 2008 1 commit

block: make queue flags non-atomic · 75ad23bc

Committed by Nick Piggin on Apr 29, 2008

We can save some atomic ops in the IO path, if we clearly define
the rules of how to modify the queue flags.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

21 Apr, 2008 1 commit

block: move the padding adjustment to blk_rq_map_sg · f18573ab

Committed by FUJITA Tomonori on Apr 11, 2008

blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule
that req->data_len (the true data length) is equal to sum(bio). It
broke the scsi command completion code.

commit e97a294e was introduced to fix
the above issue. However, the partial completion code doesn't work
with it. The commit is also a layer violation (scsi mid-layer should
not know about the block layer's padding).

This patch moves the padding adjustment to blk_rq_map_sg (suggested by
James). The padding works like the drain buffer. This patch breaks the
rule that req->data_len is equal to sum(sg), however, the drain buffer
already broke it. So this patch just restores the rule that
req->data_len is equal to sum(bio) without breaking anything new.

Now when a low level driver needs padding, blk_rq_map_user and
blk_rq_map_user_iov guarantee there's enough room for padding.
blk_rq_map_sg can safely extend the last entry of a scatter list.

blk_rq_map_sg must extend the last entry of a scatter list only for a
request that got through bio_copy_user_iov. This patch introduces a
new REQ_COPY_USER flag.
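
The padding step can be sketched as a small user-space function. The struct and names are hypothetical, the `extra_len` field is borrowed from the related data_len commit rather than from this message, and the pad mask is assumed to have the form 2^k - 1. The sketch extends only the last scatter-list entry and leaves `data_len` untouched, which is the invariant the commit restores.

```c
#include <assert.h>

/* Hypothetical, simplified request; not the kernel's struct request. */
struct toy_req {
    unsigned int data_len;     /* true data length, never modified here */
    unsigned int extra_len;    /* bytes added on top of data_len (assumed field) */
    unsigned int last_sg_len;  /* length of the last scatter-list entry */
    int copy_user;             /* models the REQ_COPY_USER flag */
};

/* Extend the last scatter-list entry so the transfer length is aligned. */
void toy_pad_last_sg(struct toy_req *rq, unsigned int pad_mask)
{
    /* Assumption: pad_mask is 2^k - 1, for example 3 for 4-byte alignment. */
    assert((pad_mask & (pad_mask + 1)) == 0);

    /* Only copied user buffers are known to have room for the padding. */
    if (rq->copy_user && (rq->data_len & pad_mask)) {
        unsigned int pad_len = (pad_mask & ~rq->data_len) + 1;

        rq->last_sg_len += pad_len;
        rq->extra_len += pad_len;
    }
}
```

For a 510-byte request with a mask of 3, this adds 2 bytes to the last entry while `data_len` stays at 510, so the sum over the bios still equals the true data length.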
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

04 Mar, 2008 1 commit

block: restore the meaning of rq->data_len to the true data length · 7a85f889

Committed by FUJITA Tomonori on Mar 04, 2008

The meaning of rq->data_len was changed to the length of an allocated
buffer from the true data length. It breaks SG_IO friends and
bsg. This patch restores the meaning of rq->data_len to the true data
length and adds rq->extra_len to store an extended length (due to
drain buffer and padding).

This patch also removes the code to update bio in blk_rq_map_user
introduced by the commit 40b01b9b.
The commit adjusts bio according to memory alignment
(queue_dma_alignment). However, memory alignment is NOT padding
alignment. This adjustment also breaks SG_IO friends and bsg. Padding
alignment needs to be fixed in a proper way (by a separate patch).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>

19 Feb, 2008 3 commits

block: clear drain buffer if draining for write command · db0a2e00

Committed by Tejun Heo on Feb 19, 2008

Clear drain buffer before chaining if the command in question is a
write.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: implement request_queue->dma_drain_needed · 2fb98e84

Committed by Tejun Heo on Feb 19, 2008

Draining shouldn't be done for commands where overflow may indicate
data integrity issues.  Add dma_drain_needed callback to
request_queue.  Drain buffer is appended iff this function returns
non-zero.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add request->raw_data_len · 6b00769f

Committed by Tejun Heo on Feb 19, 2008

With padding and draining moved into it, block layer now may extend
requests as directed by queue parameters, so now a request has two
sizes - the original request size and the extended size which matches
the size of area pointed to by bios and later by sgs. The latter size
is what lower layers are primarily interested in when allocating,
filling up DMA tables and setting up the controller.

Both padding and draining extend the data area to accommodate
controller characteristics. As any controller which speaks SCSI can
handle underflows, feeding larger data area is safe.

So, this patch makes the primary data length field, request->data_len,
indicate the size of full data area and adds a separate length field,
request->raw_data_len, for the unmodified request size. The latter is
used to report to higher layer (userland) and where the original
request size should be fed to the controller or device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

08 Feb, 2008 1 commit

Enhanced partition statistics: update partition statistics · 6f2576af

Committed by Jerome Marchand on Feb 08, 2008

Updates the enhanced partition statistics in generic block layer
besides the disk statistics.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

01 Feb, 2008 1 commit

block: make core bits checkpatch compliant · 6728cb0e

Committed by Jens Axboe on Jan 31, 2008

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

30 Jan, 2008 1 commit

block: ll_rw_blk.c split, add blk-merge.c · d6d48196

Committed by Jens Axboe on Jan 29, 2008

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

openanolis / cloud-kernel: synced successfully about 1 year ago