Commits · 8288f496eb1b1905c425e92eaf1abbb29119217b · bug2833 / cloud-kernel

27 Sep, 2014 3 commits

block: Add prefix to block integrity profile flags · 8288f496

Authored by Martin K. Petersen on Sep 26, 2014

Add a BLK_ prefix to the integrity profile flags. Also rename the flags
to be more consistent with the generate/verify terminology in the rest
of the integrity code.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8288f496

block: Deprecate the use of the term sector in the context of block integrity · 3be91c4a

Authored by Martin K. Petersen on Sep 26, 2014

The protection interval is not necessarily tied to the logical block
size of a block device. Stop using the terms "sector" and "sectors".

Going forward we will use the term "seed" to describe the initial
reference tag value for a given I/O. "Interval" will be used to describe
the portion of the data buffer that a given piece of protection
information is associated with.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
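
The seed and interval terminology from this commit message can be illustrated with a small user-space Python model (not kernel code). It assumes, as in T10 DIF Type 1 protection, that the reference tag is a 32-bit value that increases by one per protection interval. The function name `reference_tags` and its parameters are hypothetical.

```python
def reference_tags(seed, data_len, interval_size):
    """Return one reference tag per protection interval of the data buffer.

    seed          -- initial reference tag value for this I/O
    data_len      -- length of the data buffer in bytes
    interval_size -- bytes of data covered by one piece of protection info
    """
    if interval_size <= 0 or data_len % interval_size:
        raise ValueError("data_len must be a multiple of interval_size")
    intervals = data_len // interval_size
    # Assumption: the tag is 32 bits wide and increments once per interval.
    return [(seed + i) & 0xFFFFFFFF for i in range(intervals)]
```

With a 512-byte interval, a 4096-byte buffer carries eight pieces of protection information; with a 4096-byte interval the same buffer carries one. Neither number depends on the logical block size of the device, which is the point of dropping the word "sector".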

3be91c4a

block: Remove integrity tagging functions · 8492b68b

Authored by Martin K. Petersen on Sep 26, 2014

None of the filesystems appear interested in using the integrity tagging
feature, potentially because very few storage devices actually permit
using the application tag space.

Remove the tagging functions.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8492b68b

24 Nov, 2013 1 commit

bio-integrity: Convert to bvec_iter · d57a5f7c

Authored by Kent Overstreet on Nov 23, 2013

The bio integrity is also stored in a bvec array, so if we use the bvec
iter code we just added, the integrity code won't need to implement its
own iteration stuff (bio_integrity_mark_head(), bio_integrity_mark_tail()).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>

d57a5f7c

20 Mar, 2013 1 commit

scatterlist: introduce sg_unmark_end · c8164d89

Authored by Paolo Bonzini on Mar 20, 2013

This is useful in places that recycle the same scatterlist multiple
times, and do not want to incur the cost of sg_init_table every
time in hot paths.
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
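
The recycling pattern this commit describes can be sketched with a toy Python model of a scatterlist. This is not the kernel data structure: `SgEntry`, `sg_length` and `recycle` are hypothetical names, and the three `sg_*` helpers only mimic the intent of the kernel functions of the same name.

```python
class SgEntry:
    """Toy stand-in for one scatterlist entry."""

    def __init__(self):
        self.page = None
        self.is_end = False


def sg_init_table(table):
    """Full reset: clear every entry, then mark the last one as the end."""
    for entry in table:
        entry.page = None
        entry.is_end = False
    table[-1].is_end = True


def sg_mark_end(entry):
    entry.is_end = True


def sg_unmark_end(entry):
    entry.is_end = False


def sg_length(table):
    """Count entries up to and including the end marker."""
    for count, entry in enumerate(table, start=1):
        if entry.is_end:
            return count
    raise ValueError("table has no end marker")


def recycle(table, old_used, new_used):
    """Move the end marker without touching the other entries."""
    sg_unmark_end(table[old_used - 1])
    sg_mark_end(table[new_used - 1])
```

In this model `recycle` touches two entries however large the table is, whereas `sg_init_table` walks all of them. That difference is the hot-path cost the commit message refers to.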

c8164d89

22 Feb, 2013 1 commit

bdi: allow block devices to say that they require stable page writes · 7d311cda

Authored by Darrick J. Wong on Feb 21, 2013

This patchset ("stable page writes, part 2") makes some key
modifications to the original 'stable page writes' patchset.  First, it
provides creators (devices and filesystems) of a backing_dev_info a flag
that declares whether or not it is necessary to ensure that page
contents cannot change during writeout.  It is no longer assumed that
this is true of all devices (which was never true anyway).  Second, the
flag is used to relax the wait_on_page_writeback calls so that wait
only occurs if the device needs it.  Third, it fixes up the remaining
disk-backed filesystems to use this improved conditional-wait logic to
provide stable page writes on those filesystems.
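
The conditional wait described in this paragraph can be sketched as a user-space Python model. The class and field names below (`BackingDevInfo.stable_pages_required`, `Page.under_writeback`, `must_wait_before_modify`) are hypothetical and only illustrate the decision, not the kernel implementation.

```python
from dataclasses import dataclass


@dataclass
class BackingDevInfo:
    # Set by the device or filesystem that creates the backing_dev_info,
    # for example a device that checksums data during writeout.
    stable_pages_required: bool = False


@dataclass
class Page:
    bdi: BackingDevInfo
    under_writeback: bool = False


def must_wait_before_modify(page):
    """Wait only if the page is being written out AND the device needs it."""
    return page.under_writeback and page.bdi.stable_pages_required
```

Under the original stable page patchset the second condition was effectively always true; making it a per-device flag is what lets devices without checksumming skip the wait.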

It is hoped that (for people not using checksumming devices, anyway)
this patchset will give back unnecessary performance decreases since the
original stable page write patchset went into 3.0.  Sorry about not
fixing it sooner.

Complaints were registered by several people about the long write
latencies introduced by the original stable page write patchset.
Generally speaking, the kernel ought to allocate as little extra memory
as possible to facilitate writeout, but for people who simply cannot
wait, a second page stability strategy is (re)introduced: snapshotting
page contents.  The waiting behavior is still the default strategy; to
enable page snapshotting, a superblock flag (MS_SNAP_STABLE) must be
set.  This flag is used to bandaid^Henable stable page writeback on
ext3[1], and is not used anywhere else.

Given that there are already a few storage devices and network FSes that
have rolled their own page stability wait/page snapshot code, it would
be nice to move towards consolidating all of these.  It seems possible
that iscsi and raid5 may wish to use the new stable page write support
to enable zero-copy writeout.

Thank you to Jan Kara for helping fix a couple more filesystems.

Per Andrew Morton's request, here are the results of using dbench to measure
latencies on ext2:

3.8.0-rc3:
   Operation      Count    AvgLat    MaxLat
   ----------------------------------------
   WriteX        109347     0.028    59.817
   ReadX         347180     0.004     3.391
   Flush          15514    29.828   287.283

  Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms

3.8.0-rc3 + patches:
   WriteX        105556     0.029     4.273
   ReadX         335004     0.005     4.112
   Flush          14982    30.540   298.634

  Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms

As you can see, for ext2 the maximum write latency decreases from ~60ms
on a laptop hard disk to ~4ms.  I'm not sure why the flush latencies
increase, though I suspect that being able to dirty pages faster gives
the flusher more work to do.

On ext4, the average write latency decreases as well as all the maximum
latencies:

3.8.0-rc3:
   WriteX         85624     0.152    33.078
   ReadX         272090     0.010    61.210
   Flush          12129    36.219   168.260

  Throughput 44.8618 MB/sec  4 clients  4 procs  max_latency=168.276 ms

3.8.0-rc3 + patches:
   WriteX         86082     0.141    30.928
   ReadX         273358     0.010    36.124
   Flush          12214    34.800   165.689

  Throughput 44.9941 MB/sec  4 clients  4 procs  max_latency=165.722 ms

XFS seems to exhibit similar latency improvements as ext2:

3.8.0-rc3:
   WriteX        125739     0.028   104.343
   ReadX         399070     0.005     4.115
   Flush          17851    25.004   131.390

  Throughput 66.0024 MB/sec  4 clients  4 procs  max_latency=131.406 ms

3.8.0-rc3 + patches:
   WriteX        123529     0.028     6.299
   ReadX         392434     0.005     4.287
   Flush          17549    25.120   188.687

  Throughput 64.9113 MB/sec  4 clients  4 procs  max_latency=188.704 ms

...and btrfs, just to round things out, also shows some latency
decreases:

3.8.0-rc3:
   WriteX         67122     0.083    82.355
   ReadX         212719     0.005     2.828
   Flush           9547    47.561   147.418

  Throughput 35.3391 MB/sec  4 clients  4 procs  max_latency=147.433 ms

3.8.0-rc3 + patches:
   WriteX         64898     0.101    71.631
   ReadX         206673     0.005     7.123
   Flush           9190    47.963   219.034

  Throughput 34.0795 MB/sec  4 clients  4 procs  max_latency=219.044 ms
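
The WriteX MaxLat figures from the eight dbench tables above can be compared directly. The short Python snippet below copies those numbers (in milliseconds) and computes the relative change per filesystem; the helper name `percent_reduction` is hypothetical.

```python
# (before, after) maximum WriteX latency in ms, copied from the tables above.
max_write_latency_ms = {
    "ext2": (59.817, 4.273),
    "ext4": (33.078, 30.928),
    "xfs": (104.343, 6.299),
    "btrfs": (82.355, 71.631),
}


def percent_reduction(before, after):
    """Relative drop from 'before' to 'after', in percent."""
    return (before - after) / before * 100.0


for fs, (before, after) in max_write_latency_ms.items():
    change = percent_reduction(before, after)
    print(f"{fs}: {before} ms -> {after} ms ({change:.1f}% lower)")
```

The drop is above 90% for ext2 and XFS and far smaller for ext4 and btrfs, which matches the wording of the commit message ("similar latency improvements as ext2" versus "some latency decreases").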

Before this patchset, all filesystems would block, regardless of whether
or not it was necessary.  ext3 would wait, but still generate occasional
checksum errors.  The network filesystems were left to do their own
thing, so they'd wait too.

After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it.  ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait.  Network filesystems haven't been touched, so either they
provide their own wait code, or they don't block at all.  The blocking
behavior is back to what it was before 3.0 if you don't have a disk
requiring stable page writes.

This patchset has been tested on 3.8.0-rc3 on x64 with ext3, ext4, and
xfs.  I've spot-checked 3.8.0-rc4 and seem to be getting the same
results as -rc3.

[1] The alternative fixes to ext3 include fixing the locking order and
page bit handling like we did for ext4 (but then why not just use
ext4?), or setting PG_writeback so early that ext3 becomes extremely
slow.  I tried that, but the number of write()s I could initiate dropped
by nearly an order of magnitude.  That was a bit much even for the
author of the stable page series! :)

This patch:

Creates a per-backing-device flag that tracks whether or not pages must
be held immutable during writeout.  Eventually it will be used to waive
wait_on_page_writeback() if nothing requires stable pages.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7d311cda

01 Nov, 2011 1 commit

block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros · d5decd3b

Authored by Paul Gortmaker on May 26, 2011

These files were getting <linux/module.h> via an implicit include
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason.  Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

d5decd3b

06 Apr, 2011 1 commit

dm: improve block integrity support · a63a5cf8

Authored by Mike Snitzer on Apr 01, 2011

The current block integrity (DIF/DIX) support in DM is verifying that
all devices' integrity profiles match during DM device resume (which
is past the point of no return).  To some degree that is unavoidable
(stacked DM devices force this late checking).  But for most DM
devices (which aren't stacking on other DM devices) the ideal time to
verify all integrity profiles match is during table load.

Introduce the notion of an "initialized" integrity profile: a profile
that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
template.  Add blk_integrity_is_initialized() to allow checking if a
profile was initialized.

Update DM integrity support to:
- check all devices with _initialized_ integrity profiles match
  during table load; uninitialized profiles (e.g. for underlying DM
  device(s) of a stacked DM device) are ignored.
- disallow a table load that would result in an integrity profile that
  conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate all integrity profiles match during resume; but if they
  don't all we can do is report the mismatch (during resume we're past
  the point of no return)
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
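
The table-load check described in this commit message can be sketched as a user-space Python model. `IntegrityProfile`, its fields and `table_load_profile` are hypothetical names; an uninitialized profile is modeled here as one whose `name` is `None`, standing in for a profile registered without a template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegrityProfile:
    name: Optional[str] = None  # None models "registered without a template"
    tuple_size: int = 0

    def is_initialized(self):
        return self.name is not None


def table_load_profile(profiles):
    """Return the profile shared by all initialized devices.

    Uninitialized profiles are ignored. Returns None when no device has an
    initialized profile and raises ValueError when two of them differ.
    """
    chosen = None
    for profile in profiles:
        if not profile.is_initialized():
            continue
        if chosen is None:
            chosen = profile
        elif profile != chosen:
            raise ValueError("integrity profile mismatch")
    return chosen
```

Raising at table load is the point of the change: in this model a mismatch is rejected before the table is used, rather than being merely reported during resume.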

a63a5cf8

15 Oct, 2010 1 commit

block: Fix double free in blk_integrity_unregister · e817bf3f

Authored by Martin K. Petersen on Oct 15, 2010

Commit 3839e4b2 introduced a kobject_put but failed to remove the
kmem_cache_free beneath it, leading to a double free.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

e817bf3f

11 Sep, 2010 1 commit

block/scsi: Provide a limit on the number of integrity segments · 13f05c8d

Authored by Martin K. Petersen on Sep 10, 2010

Some controllers have a hardware limit on the number of protection
information scatter-gather list segments they can handle.

Introduce a max_integrity_segments limit in the block layer and provide
a new scsi_host_template setting that allows HBA drivers to provide a
value suitable for the hardware.

Add support for honoring the integrity segment limit when merging both
bios and requests.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
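
How a segment limit constrains merging can be sketched in a few lines of Python. This is a simplified model, not the kernel merge path: `Request` and `integrity_merge_ok` are hypothetical names, and only the segment-count condition is modeled.

```python
from dataclasses import dataclass


@dataclass
class Request:
    nr_integrity_segments: int


def integrity_merge_ok(req, nxt, max_integrity_segments):
    """Allow a merge only if the combined protection information still
    fits in the number of scatter-gather segments the HBA can handle."""
    total = req.nr_integrity_segments + nxt.nr_integrity_segments
    return total <= max_integrity_segments
```

For a controller that accepts at most 8 integrity segments, merging requests with 5 and 3 segments is allowed in this model, while merging 5 and 4 is refused and the requests are issued separately.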

13f05c8d

30 Mar, 2010 1 commit

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

Authored by Tejun Heo on Mar 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.
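
The first rule in the list above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as a toy Python function. This is not the slabh-sweep.py script referenced in the commit message: the regular expressions and the name `needed_include` are assumptions chosen for illustration and cover only a few common identifiers.

```python
import re

# Assumed, deliberately incomplete patterns for "uses slab" and "uses gfp".
SLAB_RE = re.compile(r"\bk[mzc]alloc\b|\bkfree\b|\bkmem_cache_\w+\b")
GFP_RE = re.compile(r"\bGFP_[A-Z_]+\b|\b__get_free_pages?\b")


def needed_include(source):
    """Return the header a C source text should include, or None.

    slab.h wins when any slab API is used; gfp.h is chosen only when
    gfp identifiers appear without slab usage.
    """
    if SLAB_RE.search(source):
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None
```

A real conversion also has to decide where to place the new include and what to do when no suitable include block exists, which the remaining bullets describe and this sketch does not attempt.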

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

08 Mar, 2010 1 commit

Driver core: Constify struct sysfs_ops in struct kobj_type · 52cf25d0

Authored by Emese Revfy on Jan 19, 2010

Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

52cf25d0

28 Jul, 2009 1 commit

block: fix improper kobject release in blk_integrity_unregister · 3839e4b2

Authored by Xiaotian Feng on Jul 28, 2009

blk_integrity_unregister should use kobject_put to release the kobject,
otherwise after bi is freed, memory of bi->kobj->name is leaked.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3839e4b2

23 May, 2009 1 commit

block: Do away with the notion of hardsect_size · e1defc4f

Authored by Martin K. Petersen on May 22, 2009

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case.  The
sector size will be 4KB but the logical block size will remain
512-bytes.  Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
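
The distinction between the two block sizes can be made concrete with a little arithmetic, sketched here in Python. The sizes are the ones named in the commit message (4KB physical, 512-byte logical); the function names and the alignment check are hypothetical illustrations and are not part of the rename itself.

```python
def logical_blocks_per_physical(logical_block_size=512,
                                physical_block_size=4096):
    """How many addressable logical blocks share one physical block."""
    if physical_block_size % logical_block_size:
        raise ValueError("physical size must be a multiple of logical size")
    return physical_block_size // logical_block_size


def is_physically_aligned(lba, nr_blocks, ratio):
    """True if an I/O of nr_blocks logical blocks starting at lba covers
    whole physical blocks only."""
    return lba % ratio == 0 and nr_blocks % ratio == 0
```

With the default sizes the ratio is 8, so an I/O starting at logical block 8 and covering 16 logical blocks maps onto whole physical blocks, while one starting at logical block 1 does not.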

e1defc4f

30 Jan, 2009 1 commit

block: Allow empty integrity profile · 32231638

Authored by Martin K. Petersen on Jan 04, 2009

Allow a block device to allocate and register an integrity profile
without providing a template.  This allows DM to preallocate a profile
to avoid deadlocks during table reconfiguration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

32231638

09 Oct, 2008 4 commits

block: Switch blk_integrity_compare from bdev to gendisk · ad7fce93

Authored by Martin K. Petersen on Oct 01, 2008

The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.

Also update comparison logic so that two NULL profiles are a valid
configuration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ad7fce93

block: Fix double put in blk_integrity_unregister · 0c032ab8

Authored by Martin K. Petersen on Oct 01, 2008

- kobject_del already puts the parent.

- Set integrity profile to NULL to prevent stale data.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0c032ab8

block: implement and use {disk|part}_to_dev() · ed9e1982

Authored by Tejun Heo on Aug 25, 2008

Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev.  To make sure no
user is left behind, rename generic devices fields to __dev.

This is in preparation of unifying partition 0 handling with other
partitions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ed9e1982

Add some block/ source files to the kernel-api docbook. Fix kernel-doc... · 710027a4

Authored by Randy Dunlap on Aug 19, 2008

Add some block/ source files to the kernel-api docbook. Fix kernel-doc notation in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

710027a4

03 Jul, 2008 3 commits

block: integrity flags can't use bit ops on unsigned short · b24498d4

Authored by Jens Axboe on Jun 27, 2008

Just use normal open coded bit operations instead, they need not be
atomic.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b24498d4

block: integrity checkpatch cleanups · b984679e

Authored by Jens Axboe on Jun 17, 2008

> 80 char lines and that sort of thing.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b984679e

block: Block layer data integrity support · 7ba1ba12

Authored by Martin K. Petersen on Jun 30, 2008

Some block devices support verifying the integrity of requests by way
of checksums or other protection information that is submitted along
with the I/O.

This patch implements support for generating and verifying integrity
metadata, as well as correctly merging, splitting and cloning bios and
requests that have this extra information attached.

See Documentation/block/data-integrity.txt for more information.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

7ba1ba12
