1. 06 Sep 2017, 10 commits
    • bcache: silence static checker warning · da22f0ee
      Authored by Dan Carpenter
      In olden times, closure_return() used to have a hidden return built in.
      We removed the hidden return but forgot to add a new return here.  If
      "c" were NULL we would oops on the next line, but fortunately "c" is
      never NULL.  Let's just remove the if statement.
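
      A rough illustration of the pattern (the surrounding function and the
      line that dereferences "c" are placeholders, not the exact bcache code):

          if (!c)
                  closure_return(cl);     /* used to return implicitly; no longer does */

          kobject_get(&c->kobj);          /* placeholder: any use of c would oops if c were NULL */

      Because "c" can never be NULL at this point, the patch simply deletes the
      dead if statement rather than adding the missing return.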
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix for gc and write-back race · 9baf3097
      Authored by Tang Junhui
      gc and write-back race with each other (see the earlier email
      "bcache get stucked"):
      gc thread                               write-back thread
      |                                       |bch_writeback_thread()
      |bch_gc_thread()                        |
      |                                       |==>read_dirty()
      |==>bch_btree_gc()                      |
      |==>btree_root() //get btree root       |
      |                //node write lock      |
      |==>bch_btree_gc_root()                 |
      |                                       |==>read_dirty_submit()
      |                                       |==>write_dirty()
      |                                       |==>continue_at(cl,
      |                                       |               write_dirty_finish,
      |                                       |               system_wq);
      |                                       |==>write_dirty_finish()//execute
      |                                       |               //in system_wq
      |                                       |==>bch_btree_insert()
      |                                       |==>bch_btree_map_leaf_nodes()
      |                                       |==>__bch_btree_map_nodes()
      |                                       |==>btree_root //try to get btree
      |                                       |              //root node read
      |                                       |              //lock
      |                                       |-----stuck here
      |==>bch_btree_set_root()
      |==>bch_journal_meta()
      |==>bch_journal()
      |==>journal_try_write()
      |==>journal_write_unlocked() //journal_full(&c->journal)
      |                            //condition satisfied
      |==>continue_at(cl, journal_write, system_wq); //try to execute
      |                               //journal_write in system_wq
      |                               //but work queue is executing
      |                               //write_dirty_finish()
      |==>closure_sync(); //wait for journal_write to
      |                   //finish and wake up gc,
      |-------------stuck here
      |==>release root node write lock
      
      This patch allocates a separate workqueue for the write-back thread to
      avoid such a race, as sketched below.
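
      A sketch of the idea; the workqueue field and names here are assumptions
      for illustration and may not match the patch exactly:

          /* give write-back its own queue so its closures can never be stuck
           * behind gc-related work queued on system_wq */
          dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq",
                                                   WQ_MEM_RECLAIM, 0);
          if (!dc->writeback_write_wq)
                  return -ENOMEM;

          /* in write_dirty(): finish on the private queue, not system_wq */
          continue_at(cl, write_dirty_finish, dc->writeback_write_wq);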
      
      (Commit log re-organized by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: increase the number of open buckets · 89b1fc54
      Authored by Tang Junhui
      Currently we only allocate 6 open buckets for each cache set, but we
      usually attach about 10 backend devices to each cache set, and each
      bcache device is accessed by about 10 threads in the application layer.
      So 6 open buckets are too few: writes from the same thread end up
      spread across different buckets, which makes write-back inefficient,
      uses buckets inefficiently, and makes them very easy to run out of.
      
      I added a debug message in bch_open_buckets_alloc() to print bucket
      allocation info, and tested with ten bcache devices on one cache set,
      each bcache device accessed by ten threads.
      
      From the debug messages we can see that, after the modification, a
      bucket is more likely to stay assigned to the same thread, and data from
      the same thread is more likely to be written to the same bucket. Since
      the same thread usually reads/writes the same backend device, this is
      good for write-back and also improves the usage efficiency of buckets.
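
      The change itself is essentially just enlarging the open-bucket pool; a
      sketch (the macro name and new value are illustrative, not necessarily
      what the patch uses):

          #define MAX_OPEN_BUCKETS	128	/* was a hard-coded 6 */

          /* in bch_open_buckets_alloc() */
          for (i = 0; i < MAX_OPEN_BUCKETS; i++) {
                  struct open_bucket *b = kzalloc(sizeof(*b), GFP_KERNEL);

                  if (!b)
                          return -ENOMEM;

                  list_add(&b->list, &c->data_buckets);
          }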
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Correct return value for sysfs attach errors · 77fa100f
      Authored by Tony Asleson
      If you encounter any errors in bch_cached_dev_attach it will return
      a negative error code.  The variable 'v' which stores the result is
      unsigned, so user space sees a very large value returned for bytes
      written, which can cause incorrect user-space behavior.  Use a single
      signed variable throughout the function to preserve the error return.
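
      Roughly, the fix keeps the result in a signed type so a negative errno
      survives back to the sysfs write (variable name illustrative):

          /* before: unsigned, so a negative errno became a huge byte count */
          /* unsigned v; */

          /* after: signed end to end */
          ssize_t v;

          v = bch_cached_dev_attach(dc, c);
          if (v)
                  return v;       /* negative errno reaches user space intact */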
      Signed-off-by: Tony Asleson <tasleson@redhat.com>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: correct cache_dirty_target in __update_writeback_rate() · a8394090
      Authored by Tang Junhui
      __update_writeback_rate() uses a proportional-differential (PD)
      controller algorithm to control the writeback rate. A dirty target
      number is used in this PD controller to control the writeback rate: a
      larger target number makes the writeback rate smaller; conversely, a
      smaller target number makes the writeback rate larger.
      
      bcache uses the following steps to calculate the target number,
      1) cache_sectors = all-buckets-of-cache-set * bucket-size
      2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
      3) target = cache_dirty_target *
      (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)
      
      The calculation of cache_sectors at step 1) is incorrect: it does not
      consider dirty blocks occupied by flash-only volumes.
      
      A flash-only volume can be thought of as a bcache device without a
      cached device. All data sectors allocated for it are persistent on the
      cache device and marked dirty; they are not touched by the bcache
      writeback and garbage collection code. So data blocks of flash-only
      volumes should be ignored when calculating cache_sectors of a cache set.
      
      The current code does not subtract the dirty sectors of flash-only
      volumes, which results in a larger target number from the above 3 steps.
      Consequently the cached devices' writeback rate is smaller than the
      correct value and writeback is slower on all cached devices.
      
      This patch fixes the incorrectly slow writeback rate by subtracting the
      dirty sectors of flash-only volumes in __update_writeback_rate().
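
      A sketch of the corrected step 1) and the resulting target, assuming
      helper and field names close to the patch (treat them as approximate):

          /* in __update_writeback_rate() */
          uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
                                   bcache_flash_devs_sectors_dirty(c);
          uint64_t cache_dirty_target =
                  div_u64(cache_sectors * dc->writeback_percent, 100);
          int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
                                     c->cached_dev_sectors);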
      
      (Commit log composed by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: gc does not work when triggering by manual command · 0b43f49d
      Authored by Tang Junhui
      I tried to execute the following command to trigger the gc thread:
      [root@localhost internal]# echo 1 > trigger_gc
      But it did not work. Debugging the code in gc_should_run() shows that gc
      only runs when invalidating or when sectors_to_gc < 0, so set
      sectors_to_gc to -1 to meet that condition when gc is triggered by a
      manual command.
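
      A sketch of the sysfs handler change described above (helper names as
      commonly used in bcache, but treat them as approximate):

          if (attr == &sysfs_trigger_gc) {
                  /* gc_should_run() only fires while invalidating or when
                   * sectors_to_gc < 0, so force the latter before waking gc */
                  atomic_set(&c->sectors_to_gc, -1);
                  wake_up_gc(c);
          }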
      
      (Code comments added by Coly Li)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Don't reinvent the wheel but use existing llist API · 09b3efec
      Authored by Byungchul Park
      Although llist provides proper APIs, they are not used here. Use them
      instead of the open-coded list handling.
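
      For example, the closure wake-up path can rely on the llist helpers
      rather than hand-rolling the reversal and traversal; a sketch (assuming
      the waiters form a struct closure llist linked via a "list" member):

          struct llist_node *list = llist_del_all(&wait_list->list);
          struct closure *cl, *t;

          /* reverse to preserve FIFO ordering, then walk with the helper */
          list = llist_reverse_order(list);

          llist_for_each_entry_safe(cl, t, list, list) {
                  closure_set_waiting(cl, 0);
                  closure_sub(cl, CLOSURE_WAITING + 1);
          }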
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Acked-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: do not subtract sectors_to_gc for bypassed IO · 69daf03a
      Authored by Tang Junhui
      Bypassed IOs use no bucket, so do not subtract their sectors from
      sectors_to_gc (whose underflow is what triggers the gc thread).
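
      Roughly, the subtraction just moves below the bypass check in
      bch_data_insert_start() (a sketch, not the exact hunk):

          if (op->bypass)
                  return bch_data_invalidate(cl);

          /* only IOs that will allocate buckets count toward triggering gc */
          if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
                  wake_up_gc(op->c);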
      Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
      Acked-by: Coly Li <colyli@suse.de>
      Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix sequential large write IO bypass · c81ffa32
      Authored by Tang Junhui
      Sequential write IOs were tested with bs=1M by fio in writeback cache
      mode. These IOs were expected to be bypassed, but actually they were
      not. Debugging the code, we find in check_should_bypass():
          if (!congested &&
              mode == CACHE_MODE_WRITEBACK &&
              op_is_write(bio_op(bio)) &&
              (bio->bi_opf & REQ_SYNC))
              goto rescale;
      That means that in writeback mode a write IO with the REQ_SYNC flag will
      not be bypassed even though it is a sequential large IO. That is not the
      correct thing to do, so this patch removes that code.
      Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Fix leak of bdev reference · 4b758df2
      Authored by Jan Kara
      If blkdev_get_by_path() in register_bcache() fails, we try to look up
      the block device using lookup_bdev() to detect which situation we are in
      and report the error properly. However, we never drop the reference
      returned to us by lookup_bdev(). Fix that.
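
      The missing drop is the usual bdput() paired with lookup_bdev(); a
      sketch of the fixed error path (surrounding details approximate):

          bdev = lookup_bdev(strim(path));
          mutex_lock(&bch_register_lock);
          if (!IS_ERR(bdev) && bch_is_open(bdev))
                  err = "device already registered";
          else
                  err = "device busy";
          mutex_unlock(&bch_register_lock);
          if (!IS_ERR(bdev))
                  bdput(bdev);            /* the previously missing reference drop */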
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 24 Aug 2017, 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
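
      In practice this replaces bio->bi_bdev with a bi_disk/bi_partno pair,
      and submitters fill it via the bio_set_dev() helper; a brief sketch:

          /* struct bio now carries the gendisk and partition index directly:
           *   struct gendisk *bi_disk;
           *   u8              bi_partno;
           */
          bio_set_dev(bio, bdev);         /* derives bi_disk/bi_partno from the bdev */
          generic_make_request(bio);      /* partition remapping uses bi_partno */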
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 10 Aug 2017, 1 commit
  4. 20 Jun 2017, 1 commit
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Authored by Ingo Molnar
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
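
      For users of the API the rename is mechanical; for example:

          wait_queue_head_t	wq_head;	/* the actual queue, as before */
          wait_queue_entry_t	wait;		/* was: wait_queue_t */

          init_waitqueue_head(&wq_head);
          init_waitqueue_entry(&wait, current);
          add_wait_queue(&wq_head, &wait);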
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 19 Jun 2017, 3 commits
  6. 09 Jun 2017, 1 commit
  7. 09 May 2017, 2 commits
    • drivers/md/bcache/super.c: use kvmalloc · bc4e54f6
      Authored by Michal Hocko
      bcache_device_init uses kmalloc for small requests and vmalloc for those
      which are larger than 64 pages.  This alone is a strange criterion.
      Moreover kmalloc can fall back to vmalloc on failure.  Let's simply use
      kvmalloc instead, as it knows how to handle the fallback properly.
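
      Roughly, the open-coded size check becomes a single call (shape only,
      field names omitted):

          /* before: pick the allocator by hand based on the request size */
          if (size <= 64 * PAGE_SIZE)
                  p = kzalloc(size, GFP_KERNEL);
          else
                  p = vzalloc(size);

          /* after: kvzalloc picks for us and handles the fallback correctly */
          p = kvzalloc(size, GFP_KERNEL);

          /* ... */
          kvfree(p);      /* works regardless of which allocator was used */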
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Authored by Michal Hocko
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fall back to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
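
      A typical conversion from this series looks like the following (shape
      only; the actual call sites vary):

          /* opencoded variant */
          ptr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
          if (!ptr)
                  ptr = vmalloc(size);

          /* becomes */
          ptr = kvmalloc(size, GFP_KERNEL);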
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 10 Mar 2017, 1 commit
  9. 02 Mar 2017, 3 commits
  10. 02 Feb 2017, 1 commit
  11. 28 Jan 2017, 1 commit
  12. 18 Dec 2016, 2 commits
  13. 22 Nov 2016, 2 commits
  14. 01 Nov 2016, 2 commits
  15. 22 Sep 2016, 1 commit
  16. 19 Aug 2016, 3 commits
    • bcache: pr_err: more meaningful error message when nr_stripes is invalid · 90706094
      Authored by Eric Wheeler
      The original error was thought to be corruption, but was actually caused by:
      	make-bcache --data-offset N
      where N was in bytes but should have been in sectors.  While the userspace
      tools should be updated to reject a --data-offset beyond the end of the
      volume, hopefully this will help others who might not have noticed the
      units.
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
    • bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two. · acc9cf8c
      Authored by Kent Overstreet
      This patch fixes a cachedev registration-time allocation deadlock.
      This can deadlock on boot if your initrd auto-registers bcache devices:
      
      Allocator thread:
      [  720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
      [  720.732361]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.732963]  [<ffffffffa05192b8>] bch_bucket_alloc+0x188/0x360 [bcache]
      [  720.733538]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.734137]  [<ffffffffa05302bd>] bch_prio_write+0x19d/0x340 [bcache]
      [  720.734715]  [<ffffffffa05190bf>] bch_allocator_thread+0x3ff/0x470 [bcache]
      [  720.735311]  [<ffffffff816ee41c>] ? __schedule+0x2dc/0x950
      [  720.735884]  [<ffffffffa0518cc0>] ? invalidate_buckets+0x980/0x980 [bcache]
      
      Registration thread:
      [  720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
      [  720.715226]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.715805]  [<ffffffffa05235cd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [  720.716409]  [<ffffffffa0522d30>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
      [  720.717008]  [<ffffffffa05236e4>] bch_btree_insert+0xf4/0x170 [bcache]
      [  720.717586]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.718191]  [<ffffffffa0527d9a>] bch_journal_replay+0x14a/0x290 [bcache]
      [  720.718766]  [<ffffffff810cc90d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
      [  720.719369]  [<ffffffff810cf684>] ? try_to_wake_up+0x1d4/0x350
      [  720.719968]  [<ffffffffa05317d0>] run_cache_set+0x580/0x8e0 [bcache]
      [  720.720553]  [<ffffffffa053302e>] register_bcache+0xe2e/0x13b0 [bcache]
      [  720.721153]  [<ffffffff81354cef>] kobj_attr_store+0xf/0x20
      [  720.721730]  [<ffffffff812a2dad>] sysfs_kf_write+0x3d/0x50
      [  720.722327]  [<ffffffff812a225a>] kernfs_fop_write+0x12a/0x180
      [  720.722904]  [<ffffffff81225177>] __vfs_write+0x37/0x110
      [  720.723503]  [<ffffffff81228048>] ? __sb_start_write+0x58/0x110
      [  720.724100]  [<ffffffff812cedb3>] ? security_file_permission+0x23/0xa0
      [  720.724675]  [<ffffffff812258a9>] vfs_write+0xa9/0x1b0
      [  720.725275]  [<ffffffff8102479c>] ? do_audit_syscall_entry+0x6c/0x70
      [  720.725849]  [<ffffffff81226755>] SyS_write+0x55/0xd0
      [  720.726451]  [<ffffffff8106a390>] ? do_page_fault+0x30/0x80
      [  720.727045]  [<ffffffff816f2cae>] system_call_fastpath+0x12/0x71
      
      The fifo code in upstream bcache can't use the last element in the buffer,
      which was the cause of the bug: if you asked for a power of two size,
      it'd give you a fifo that could hold one less than what you asked for
      rather than allocating a buffer twice as big.
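
      So the fix is to size the RESERVE_PRIO freelist with one extra element;
      a sketch of the call site (details approximate):

          /* the fifo keeps one slot unused, so ask for prio_buckets(ca) + 1
           * or a power-of-two request silently holds one element too few */
          if (!init_fifo(&ca->free[RESERVE_PRIO], prio_buckets(ca) + 1, GFP_KERNEL))
                  return -ENOMEM;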
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
    • bcache: register_bcache(): call blkdev_put() when cache_alloc() fails · d9dc1702
      Authored by Eric Wheeler
      register_cache() is supposed to return an error string on error so that
      register_bcache() will call blkdev_put() and clean up other usage
      counters, but it does not set 'char *err' when cache_alloc() fails (e.g.,
      due to memory pressure), and thus register_bcache() performs no cleanup.
      
      register_bcache() <----------\  <- no jump to err_close, no blkdev_put()
         |                         |
         +->register_cache()       |  <- fails to set char *err
               |                   |
               +->cache_alloc() ---/  <- returns error
      
      This patch sets `char *err` for this failure case so that register_cache()
      will cause register_bcache() to correctly jump to err_close and do
      cleanup.  This was tested under OOM conditions that triggered the bug.
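
      A sketch of the fix in register_cache() (argument list elided, error
      string and label illustrative):

          /* make sure the caller learns about the failure, so that
           * register_bcache() takes err_close and calls blkdev_put() */
          if (cache_alloc(ca) != 0) {
                  err = "cache_alloc() failed";
                  goto err;
          }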
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
  17. 08 Aug 2016, 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Authored by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portion. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
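
      For code that touched the field directly, the rename simply means (a
      sketch; the handler name is a placeholder):

          bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);  /* fills bio->bi_opf */

          if (op_is_write(bio_op(bio)) && (bio->bi_opf & REQ_SYNC))
                  handle_sync_write(bio);                 /* placeholder */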
      Signed-off-by: Jens Axboe <axboe@fb.com>
  18. 21 Jul 2016, 1 commit
  19. 06 Jul 2016, 3 commits