Commits · 5fdee2127faa77c9c91862ad5e001dfab7013e92 · openanolis / cloud-kernel

06 Oct, 2017 1 commit

block: remove QUEUE_FLAG_STACKABLE · 5fdee212

Authored by Christoph Hellwig on Oct 05, 2017

We already have a queue_is_rq_based helper to check if a request_queue
is request based, so we can remove the flag for it.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
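
To illustrate why a separate flag carries no extra information, here is a minimal sketch in plain C. The type `model_queue`, its two fields, and `model_queue_is_rq_based` are simplified stand-ins invented for this example; the check only approximates what the `queue_is_rq_based` helper is understood to test and is not quoted from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct request_queue. */
struct model_queue {
    void (*request_fn)(struct model_queue *q); /* legacy request path */
    const void *mq_ops;                        /* blk-mq path */
};

/* A queue counts as request based when either request path is wired
 * up, so a dedicated "stackable" flag would only duplicate this. */
static bool model_queue_is_rq_based(const struct model_queue *q)
{
    assert(q != NULL);
    return q->request_fn != NULL || q->mq_ops != NULL;
}

/* Placeholder handler so a legacy-style queue can be constructed. */
static void dummy_request_fn(struct model_queue *q)
{
    (void)q;
}
```

A queue with neither field set (a purely bio-based device in this model) is reported as not request based, while setting either field is enough to flip the answer.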

05 Oct, 2017 4 commits

sysctl: remove /proc/sys/vm/nr_pdflush_threads · b35bd0d9

Authored by Jens Axboe on Sep 30, 2017

This tunable has been obsolete since 2.6.32, and writes to the
file have been failing and complaining in dmesg since then:

nr_pdflush_threads exported in /proc is scheduled for removal

That was 8 years ago. Remove the ABI obsolete notice for the file,
and the /proc/sys file itself.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: eliminate work item allocation in bd_start_writeback() · 85009b4f

Authored by Jens Axboe on Sep 30, 2017

Handle start-all writeback like we do periodic or kupdate
style writeback - by marking the bdi_writeback as needing a full
flush, and simply waking the thread. This eliminates the need to
allocate and queue a specific work item just for this purpose.

After this change, we truly only ever have one of them running at
any point in time. We mark the need to start all flushes, and the
writeback thread will clear it once it has processed the request.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: document the need to have STARTED and COMPLETED share a byte · fc13457f

Authored by Jens Axboe on Oct 04, 2017

For memory ordering guarantees on stores, we need to ensure that
these two bits share the same byte of storage in the unsigned
long. Add a comment as to why, and a BUILD_BUG_ON() to ensure that
we don't violate this requirement.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
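
The compile-time check described above can be sketched in user-space C with `_Static_assert`. The enum names and bit values below are illustrative assumptions, not the kernel's definitions, and `MODEL_BYTE_OF` is a helper invented for this example.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative bit numbers inside one unsigned long of flags. */
enum model_rq_atomic_flags {
    MODEL_ATOM_COMPLETE = 0,
    MODEL_ATOM_STARTED  = 1,
};

/* Index of the byte that holds a given bit number. */
#define MODEL_BYTE_OF(bit) ((bit) / CHAR_BIT)

/* Compile-time check in the spirit of BUILD_BUG_ON(): the build fails
 * if the two flags ever end up in different bytes. */
_Static_assert(MODEL_BYTE_OF(MODEL_ATOM_STARTED) ==
               MODEL_BYTE_OF(MODEL_ATOM_COMPLETE),
               "STARTED and COMPLETE must share a byte");

/* Runtime version of the same predicate, for arbitrary bit numbers. */
static int model_flags_share_byte(int bit_a, int bit_b)
{
    assert(bit_a >= 0 && bit_b >= 0);
    return MODEL_BYTE_OF(bit_a) == MODEL_BYTE_OF(bit_b);
}
```

Moving one of the flags to bit 8 or higher while the other stays below 8 would make the static assertion fail at build time, which is the kind of guard the commit message asks for.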

blk-mq: attempt to fix atomic flag memory ordering · a7af0af3

Authored by Peter Zijlstra on Sep 06, 2017

Attempt to untangle the ordering in blk-mq. The patch introducing the
single smp_mb__before_atomic() is obviously broken in that it doesn't
clearly specify a pairing barrier and an obtained guarantee.

The comment is further misleading in that it hints that the
deadline store and the COMPLETE store also need to be ordered, but
AFAICT there is no such dependency. However what does appear to be
important is the clear happening _after_ the store, and that worked by
pure accident.

This clarifies blk_mq_start_request() -- we should not get there with
STARTING set -- this simplifies the code and makes the barrier usage
sane (the old code could be read to allow not having _any_ atomic after
the barrier, in which case the barrier hasn't got anything to order). We
then also introduce the missing pairing barrier for it.

Also down-grade the barrier to smp_wmb(), this is cheaper for
PowerPC/ARM and doesn't cost anything extra on x86.

And it documents the STARTING vs COMPLETE ordering. Although I've not
been entirely successful in reverse engineering the blk-mq state
machine so there might still be more funnies around timeout vs
requeue.

If I got anything wrong, feel free to educate me by adding comments to
clarify things ;-)

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: 538b7534 ("blk-mq: request deadline must be visible before marking rq as started")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03 Oct, 2017 17 commits

block: move __elv_next_request to blk-core.c · 9c988374

Authored by Christoph Hellwig on Oct 03, 2017

No need to have this helper inline in a header. Also drop the __ prefix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: decrease burst size when queues in burst exit · 7cb04004

Authored by Paolo Valente on Sep 21, 2017

If many queues belonging to the same group happen to be created
shortly after each other, then the concurrent processes associated
with these queues have typically a common goal, and they get it done
as soon as possible if not hampered by device idling. Examples are
processes spawned by git grep, or by systemd during boot. As for
device idling, this mechanism is currently necessary for weight
raising to succeed in its goal: privileging I/O. In view of these
facts, BFQ does not provide the above queues with either weight
raising or device idling.

On the other hand, a burst of queue creations may be caused also by
the start-up of a complex application. In this case, these queues need
usually to be served one after the other, and as quickly as possible,
to maximise responsiveness. Therefore, in this case the best strategy
is to weight-raise all the queues created during the burst, i.e., the
exact opposite of the strategy for the above case.

To distinguish between the two cases, BFQ uses an empirical burst-size
threshold, found through extensive tests and monitoring of daily
usage. Only large bursts, i.e., bursts with a size above this
threshold, are considered as generated by a high number of parallel
processes. In this respect, upstart-based boot proved to be rather
hard to detect as generating a large burst of queue creations, because
with upstart most of the queues created in a burst exit *before* the
next queues in the same burst are created. To address this issue, I
changed the burst-detection mechanism so as to not decrease the size
of the current burst even if one of the queues in the burst is
eliminated.

Unfortunately, this missing decrease causes false positives on very
fast systems: on the start-up of a complex application, such as
libreoffice writer, so many queues are created, served and exited
shortly after each other, that a large burst of queue creations is
wrongly detected as occurring. These false positives just disappear if
the size of a burst is decreased when one of the queues in the burst
exits. This commit restores the missing burst-size decrease, relying
on the fact that upstart is apparently unlikely to be used on systems
running this and future versions of the kernel.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
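
The effect of decreasing the burst size on queue exit can be shown with a toy counter. Everything here is a simplified assumption: `model_burst`, its fields, and the threshold value are invented for the sketch, and the real detection logic in BFQ tracks more state than a single counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical burst tracker; the threshold value is made up. */
struct model_burst {
    int size;            /* queues currently counted in the burst */
    int large_threshold; /* above this, the burst counts as large */
};

/* A new queue was created while the burst is being tracked. */
static void model_burst_queue_created(struct model_burst *b)
{
    b->size++;
}

/* The behaviour this commit restores: a queue that exits no longer
 * counts towards the size of the burst. */
static void model_burst_queue_exited(struct model_burst *b)
{
    assert(b->size > 0);
    b->size--;
}

/* Only bursts above the threshold are treated as large. */
static bool model_burst_is_large(const struct model_burst *b)
{
    return b->size > b->large_threshold;
}
```

With a threshold of 2, five queues that are each created and then exit in turn leave the counter at 0, so no large burst is reported; three queues that stay alive together push it over the threshold.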

block, bfq: let early-merged queues be weight-raised on split too · 894df937

Authored by Paolo Valente on Sep 21, 2017

A just-created bfq_queue, say Q, may happen to be merged with another
bfq_queue on the very first invocation of the function
__bfq_insert_request. In such a case, even if Q would clearly deserve
interactive weight raising (as it has just been created), the function
bfq_add_request is never invoked for Q, and so never gets to activate
weight raising for Q. As a consequence, when the state of Q is saved
for a possible future restore, after a split of Q from the other
bfq_queue(s), that state happens to be (unjustly) non-weight-raised.
The bfq_queue then will not enjoy any weight raising on the split,
even if it should still be in an interactive weight-raising period
when the split occurs.

This commit solves this problem as follows, for a just-created
bfq_queue that is being early-merged: it stores directly, in the saved
state of the bfq_queue, the weight-raising state that would have been
assigned to the bfq_queue if not early-merged.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: check and switch back to interactive wr also on queue split · 3e2bdd6d

Authored by Paolo Valente on Sep 21, 2017

As already explained in the message of commit "block, bfq: fix
wrong init of saved start time for weight raising", if a soft
real-time weight-raising period happens to be nested in a larger
interactive weight-raising period, then BFQ restores the interactive
weight raising at the end of the soft real-time weight raising. In
particular, BFQ checks whether the latter has ended only on request
dispatches.

Unfortunately, the above scheme fails to restore interactive weight
raising in the following corner case: if a bfq_queue, say Q,
1) Is merged with another bfq_queue while it is in a nested soft
real-time weight-raising period. The weight-raising state of Q is
then saved, and not considered any longer until a split occurs.
2) Is split from the other bfq_queue(s) at a time instant when its
soft real-time weight raising is already finished.
On the split, while resuming the previous, soft real-time
weight-raised state of the bfq_queue Q, BFQ checks whether the
current soft real-time weight-raising period is actually over. If so,
BFQ switches weight raising off for Q, *without* checking whether the
soft real-time period was actually nested in a not-yet-finished
interactive weight-raising period.

This commit addresses this issue by adding the above missing check in
bfq_queue splits, and restoring interactive weight raising if needed.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: fix wrong init of saved start time for weight raising · 4baa8bb1

Authored by Paolo Valente on Sep 21, 2017

This commit fixes a bug that causes bfq to fail to guarantee a high
responsiveness on some drives, if there is heavy random read+write I/O
in the background. More precisely, such a failure allowed this bug to
be found [1], but the bug may well cause other yet unreported
anomalies.

BFQ raises the weight of the bfq_queues associated with soft real-time
applications, to privilege the I/O, and thus reduce latency, for these
applications. This mechanism is named soft-real-time weight raising in
BFQ. A soft real-time period may happen to be nested into an
interactive weight raising period, i.e., it may happen that, when a
bfq_queue switches to a soft real-time weight-raised state, the
bfq_queue is already being weight-raised because deemed interactive
too. In this case, BFQ saves in a special variable
wr_start_at_switch_to_srt, the time instant when the interactive
weight-raising period started for the bfq_queue, i.e., the time
instant when BFQ started to deem the bfq_queue interactive. This value
is then used to check whether the interactive weight-raising period
would still be in progress when the soft real-time weight-raising
period ends. If so, interactive weight raising is restored for the
bfq_queue. This restore is useful, in particular, because it prevents
bfq_queues from losing their interactive weight raising prematurely,
as a consequence of spurious, short-lived soft real-time
weight-raising periods caused by wrong detections as soft real-time.

If, instead, a bfq_queue switches to soft-real-time weight raising
while it *is not* already in an interactive weight-raising period,
then the variable wr_start_at_switch_to_srt has no meaning during the
following soft real-time weight-raising period. Unfortunately the
handling of this case is wrong in BFQ: not only the variable is not
flagged somehow as meaningless, but it is also set to the time when
the switch to soft real-time weight-raising occurs. This may cause an
interactive weight-raising period to be considered mistakenly as still
in progress, and thus a spurious interactive weight-raising period to
start for the bfq_queue, at the end of the soft-real-time
weight-raising period. In particular the spurious interactive
weight-raising period will be considered as still in progress, if the
soft-real-time weight-raising period does not last very long. The
bfq_queue will then be wrongly privileged and, if I/O bound, will
unjustly steal bandwidth from truly interactive or soft real-time
bfq_queues, harming responsiveness and low latency.

This commit fixes this issue by just setting wr_start_at_switch_to_srt
to minus infinity (farthest past time instant according to jiffies
macros): when the soft-real-time weight-raising period ends, certainly
no interactive weight-raising period will be considered as still in
progress.

[1] Background I/O Type: Random - Background I/O mix: Reads and writes
- Application to start: LibreOffice Writer in
http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.13-IO-Laptop

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
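
The "minus infinity" trick relies on wraparound-safe tick comparisons. The sketch below is a user-space model under assumptions: the `model_` names are invented, `MODEL_MAX_JIFFY_OFFSET` is assumed to mirror the kernel's MAX_JIFFY_OFFSET, and the comparison follows the style of the time_after() macro rather than reproducing BFQ's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's MAX_JIFFY_OFFSET definition. */
#define MODEL_MAX_JIFFY_OFFSET ((unsigned long)((LONG_MAX >> 1) - 1))

/* Wraparound-safe "a is after b", in the style of time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* "Minus infinity": the farthest point in the past that still compares
 * as earlier than now under the wraparound-safe comparison. */
static unsigned long model_smallest_from_now(unsigned long now)
{
    return now - MODEL_MAX_JIFFY_OFFSET;
}

/* Would an interactive period that began at wr_start and lasts
 * wr_duration ticks still be in progress at time now? */
static bool model_interactive_still_running(unsigned long now,
                                            unsigned long wr_start,
                                            unsigned long wr_duration)
{
    assert(wr_duration < MODEL_MAX_JIFFY_OFFSET);
    return model_time_after(wr_start + wr_duration, now);
}
```

If the saved start time is set to the moment of the switch (the old behaviour), a short soft real-time period makes the interactive period look as if it were still running. With the start time set to the farthest past instant, the check comes out false for any reasonable duration, including right after the tick counter has wrapped.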

writeback: only allow one inflight and pending full flush · aac8d41c

Authored by Jens Axboe on Sep 28, 2017

When someone calls wakeup_flusher_threads() or
wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
pages in the system (or on that bdi). If we are tight on memory, we
can get tons of these queued from kswapd/vmscan. This causes (at
least) two problems:

1) We consume a ton of memory just allocating writeback work items.
   We've seen as much as 600 million of these writeback work items
   pending. That's a lot of memory to pointlessly hold hostage,
   while the box is under memory pressure.

2) We spend so much time processing these work items, that we
   introduce a softlockup in writeback processing. This is because
   each of the writeback work items doesn't end up doing any work (it's
   hard when you have millions of identical ones coming in to the
   flush machinery), so we just sit in a tight loop pulling work
   items and deleting/freeing them.

Fix this by adding a 'start_all' bit to the writeback structure, and
set that when someone attempts to flush all dirty pages. The bit is
cleared when we start writeback on that work item. If the bit is
already set when we attempt to queue !nr_pages writeback, then we
simply ignore it.

This provides us one full flush in flight, with one pending as well,
and makes for more efficient handling of this type of writeback.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: move nr_pages == 0 logic to one location · e8e8a0c6

Authored by Jens Axboe on Sep 28, 2017

Now that we have no external callers of wb_start_writeback(), we
can shuffle the passing in of 'nr_pages'. Everybody passes in 0
at this point, so just kill the argument and move the dirty
count retrieval to that function.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: make wb_start_writeback() static · 9dfb176f

Authored by Jens Axboe on Sep 28, 2017

We don't have any callers outside of fs-writeback.c anymore,
make it private.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: pass in '0' for nr_pages writeback in laptop mode · 0ab29fd0

Authored by Jens Axboe on Sep 28, 2017

Laptop mode really wants to writeback the number of dirty
pages and inodes. Instead of calculating this in the caller,
just pass in 0 and let wakeup_flusher_threads() handle it.

Use the new wakeup_flusher_threads_bdi() instead of rolling
our own.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: provide a wakeup_flusher_threads_bdi() · 595043e5

Authored by Jens Axboe on Sep 28, 2017

Similar to wakeup_flusher_threads(), except that we only wake
up the flusher threads on the specified backing device.

No functional changes in this patch.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: remove 'range_cyclic' argument for wb_start_writeback() · 47410d88

Authored by Jens Axboe on Sep 28, 2017

All the callers pass in 'true' for range_cyclic, so kill the
argument.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: switch wakeup_flusher_threads() to cyclic writeback · d31cd9d3

Authored by Jens Axboe on Sep 27, 2017

We're writing back the full range of dirty pages on the devices,
there's no point in making this special and not do normal range
cyclic writeback.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs: kill 'nr_pages' argument from wakeup_flusher_threads() · 9ba4b2df

Authored by Jens Axboe on Sep 20, 2017

Everybody is passing in 0 now, let's get rid of the argument.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

buffer: eliminate the need to call free_more_memory() in __getblk_slow() · bc48f001

Authored by Jens Axboe on Sep 27, 2017

Since the previous commit removed any case where grow_buffers()
would return failure due to memory allocations, we can safely
remove the case where we have to call free_more_memory() in
this function.

Since this is also the last user of free_more_memory(), kill
it off completely.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

buffer: grow_dev_page() should use __GFP_NOFAIL for all cases · 94dc24c0

Authored by Jens Axboe on Sep 27, 2017

We currently use it for find_or_create_page(), which means that it
cannot fail. Ensure we also pass in 'retry == true' to
alloc_page_buffers(), which also ensures that it cannot fail.

After this, there are no failure cases in grow_dev_page() that
occur because of a failed memory allocation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

buffer: have alloc_page_buffers() use __GFP_NOFAIL · 640ab98f

Authored by Jens Axboe on Sep 27, 2017

Instead of adding weird retry logic in that function, utilize
__GFP_NOFAIL to ensure that the vm takes care of handling any
potential retries appropriately. This means we don't have to
call free_more_memory() from here.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: wire up completion notifier for laptop mode · 7beb2f84

Authored by Jens Axboe on Sep 30, 2017

For some reason, the laptop mode IO completion notifier was never wired
up for blk-mq. Ensure that we trigger the callback appropriately, to arm
the laptop mode flush timer.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Oct, 2017 1 commit

blk-mq-tag: kill unused tag enums · 5385fa47

Authored by Jens Axboe on Oct 01, 2017

We don't have any notion of a tagging cache anymore, and haven't
for a long time. Kill off the unused enums.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Sep, 2017 2 commits

blk-mq: remove unused function hctx_allow_merges · 54724873

Authored by weiping zhang on Sep 30, 2017

Since 9bddeb2a "blk-mq: make per-sw-queue bio merge as default .bio_merge"
there is no caller for this function.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: add "no_sched" module parameter · b3cffc38

Authored by weiping zhang on Sep 30, 2017

Add an option that disables the I/O scheduler for the null block device.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Sep, 2017 1 commit

block: fix a build error · 0b508bc9

Authored by Shaohua Li on Sep 26, 2017

The code is only for blkcg, not for all cgroups.

Fixes: d4478e92 ("block/loop: make loop cgroup aware")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Sep, 2017 14 commits

block: cryptoloop - Fix build warning · 9979d545

Authored by Corentin Labbe on Sep 25, 2017

This patch fixes the following build warning:
drivers/block/cryptoloop.c:46:8: warning: variable 'cipher' set but not used [-Wunused-but-set-variable]

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/loop: make loop cgroup aware · d4478e92

Authored by Shaohua Li on Sep 25, 2017

loop block device handles IO in a separate thread. The actual IO
dispatched isn't cloned from the IO loop device received, so the
dispatched IO loses the cgroup context.

I'm ignoring the buffered IO case for now, which is quite complicated.
Making the loop thread aware of the cgroup context doesn't really help
there: the loop device only writes to a single file, and in the current
writeback cgroup implementation the file can only belong to one cgroup.

For the direct IO case, we could work around the issue in theory. For
example, say we assign cgroup1 5M/s BW for the loop device and cgroup2
10M/s. We can create a special cgroup for the loop thread and assign at
least 15M/s for the underlying disk. In this way, we correctly throttle
the two cgroups. But this is tricky to set up.

This patch tries to address the issue. We record bio's css in loop
command. When loop thread is handling the command, we then use the API
provided in patch 1 to set the css for current task. The bio layer will
use the css for new IO (from patch 3).

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: make blkcg aware of kthread stored original cgroup info · 902ec5b6

Authored by Shaohua Li on Sep 14, 2017

bio_blkcg is the only API to get cgroup info for a bio right now. If
bio_blkcg finds that the current task is a kthread and has an original
blkcg associated, it will use that css instead of associating the bio
with the current task. This makes it possible for a kthread to dispatch
bios on behalf of other threads.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: delete unused APIs · af551fb3

Authored by Shaohua Li on Sep 14, 2017

Nobody uses the APIs right now.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

kthread: add a mechanism to store cgroup info · 05e3db95

Authored by Shaohua Li on Sep 14, 2017

A kthread usually runs jobs on behalf of other threads. The jobs should
be charged to the cgroup of the original threads. But the jobs run in a
kthread, where we lose the cgroup context of the original threads. The
patch adds a mechanism to record the cgroup info of the original threads
in the kthread context. Later we can retrieve the cgroup info and attach
it to jobs.

Since this mechanism is only required by kthread, we store the cgroup
info in kthread data instead of generic task_struct.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
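
The store-then-look-up mechanism from this commit, and the bio_blkcg fallback that builds on it, can be modelled in a few lines. All types and function names below are hypothetical stand-ins chosen for this sketch; none of them are the kernel's real structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup subsystem state. */
struct model_css {
    const char *name;
};

/* Hypothetical stand-in for a task; only kthreads use borrowed_css. */
struct model_task {
    bool is_kthread;
    struct model_css *own_css;      /* cgroup the task itself is in */
    struct model_css *borrowed_css; /* cgroup of the original submitter */
};

/* Record (or clear, with NULL) the cgroup a kthread is acting for. */
static void model_kthread_associate(struct model_task *t,
                                    struct model_css *css)
{
    assert(t->is_kthread);
    t->borrowed_css = css;
}

/* Which cgroup should new I/O issued by this task be charged to? */
static struct model_css *model_io_css(const struct model_task *t)
{
    if (t->is_kthread && t->borrowed_css != NULL)
        return t->borrowed_css;
    return t->own_css;
}
```

A worker such as the loop thread would associate the submitter's css before handling a command and clear it afterwards, so that I/O issued in between is charged to the submitter rather than to the worker's own cgroup.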

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · e365806a

Authored by Linus Torvalds on Sep 25, 2017

Pull compat fix from Al Viro:
 "I really wish gcc warned about conversions from pointer to function
  into void *..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a typo in put_compat_shm_info()

fix a typo in put_compat_shm_info() · b776e4b1

Authored by Al Viro on Sep 25, 2017

"uip" misspelled as "up"; unfortunately, the latter happens to be
a function and gcc is happy to convert it to void *...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 19240e6b

Authored by Linus Torvalds on Sep 25, 2017

Pull block fixes from Jens Axboe:

 - Two sets of NVMe pull requests from Christoph:
      - Fixes for the Fibre Channel host/target to fix spec compliance
      - Allow a zero keep alive timeout
      - Make the debug printk for broken SGLs work better
      - Fix queue zeroing during initialization
      - Set of RDMA and FC fixes
      - Target div-by-zero fix

 - bsg double-free fix.

 - nbd unknown ioctl fix from Josef.

 - Buffered vs O_DIRECT page cache inconsistency fix. Has been floating
   around for a long time, well reviewed. From Lukas.

 - brd overflow fix from Mikulas.

 - Fix for a loop regression in this merge window, where using a union
   for two members of the loop_cmd turned out to be a really bad idea.
   From Omar.

 - Fix for an iostat regression in this series, using the wrong API
   to get at the block queue. From Shaohua.

 - Fix for a potential blktrace deletion deadlock. From Waiman.

* 'for-linus' of git://git.kernel.dk/linux-block: (30 commits)
  nvme-fcloop: fix port deletes and callbacks
  nvmet-fc: sync header templates with comments
  nvmet-fc: ensure target queue id within range.
  nvmet-fc: on port remove call put outside lock
  nvme-rdma: don't fully stop the controller in error recovery
  nvme-rdma: give up reconnect if state change fails
  nvme-core: Use nvme_wq to queue async events and fw activation
  nvme: fix sqhd reference when admin queue connect fails
  block: fix a crash caused by wrong API
  fs: Fix page cache inconsistency when mixing buffered and AIO DIO
  nvmet: implement valid sqhd values in completions
  nvme-fabrics: Allow 0 as KATO value
  nvme: allow timed-out ios to retry
  nvme: stop aer posting if controller state not live
  nvme-pci: Print invalid SGL only once
  nvme-pci: initialize queue memory before interrupts
  nvmet-fc: fix failing max io queue connections
  nvme-fc: use transport-specific sgl format
  nvme: add transport SGL definitions
  nvme.h: remove FC transport-specific error values
  ...


Merge tag 'gfs2-for-linus-4.14-rc3' of... · 17763641

Authored by Linus Torvalds on Sep 25, 2017

Merge tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
 "GFS2: Fix an old regression in GFS2's debugfs interface

 This fixes a regression introduced by commit 88ffbf3e ("GFS2: Use
 resizable hash table for glocks"). The regression caused the glock dump
 in debugfs to not report all the glocks, which makes debugging
 extremely difficult"

* tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix debugfs glocks dump

Merge tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze · cf034616

Authored by Linus Torvalds on Sep 25, 2017

Pull Microblaze fixes from Michal Simek:

 - Kbuild fix

 - use vma_pages

 - setup default little endians

* tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
  arch: change default endian for microblaze
  microblaze: Cocci spatch "vma_pages"
  microblaze: Add missing kvm_para.h to Kbuild

Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · ac0a3646

Authored by Linus Torvalds on Sep 25, 2017

Pull tracing fixes from Steven Rostedt:
"Stack tracing and RCU have been having issues with each other and
lockdep has been pointing out constant problems.

The changes have been going into the stack tracer, but it has been
discovered that the problem isn't with the stack tracer itself, but it
is with calling save_stack_trace() from within the internals of RCU.

The stack tracer is the one that can trigger the issue the easiest,
but examining the problem further, it could also happen from a WARN()
in the wrong place, or even if an NMI happened in this area and it did
an rcu_read_lock().

The critical area is where RCU is not watching. Which can happen while
going to and from idle, or bringing up or taking down a CPU.

The final fix was to put the protection in kernel_text_address() as it
is the one that requires RCU to be watching while doing the stack
trace.

To make this work properly, Paul had to allow rcu_irq_enter() happen
after rcu_nmi_enter(). This should have been done anyway, since an NMI
can page fault (reading vmalloc area), and a page fault triggers
rcu_irq_enter().

One patch is just a consolidation of code so that the fix only needed
to be done in one location"

* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove RCU work arounds from stack tracer
extable: Enable RCU if it is not watching in kernel_text_address()
extable: Consolidate *kernel_text_address() functions
rcu: Allow for page faults in NMI handlers


nvme-fcloop: fix port deletes and callbacks · fddc9923

Authored by James Smart on Sep 19, 2017

Now that there are potentially long delays between when a remoteport or
targetport delete call is made and when the callback occurs (dev_loss_tmo
timeout), no longer block in the delete routines and move the final nport
puts to the callbacks.

Moved the fcloop_nport_get/put/free routines to avoid forward declarations.

Ensure port_info structs used in registrations are nulled in case fields
are not set (ex: devloss_tmo values).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: sync header templates with comments · 6b71f9e1

Authored by James Smart on Sep 20, 2017

Comments were incorrect:
- defer_rcv was in the host port template. Moved to the target port template
- Added Mandatory statements for target port template items

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: ensure target queue id within range. · 0c319d3a

Authored by James Smart on Sep 19, 2017

When searching for queue ids, ensure they are within the expected range.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
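
A range check of this kind usually sits right in front of the array lookup. The following is a generic sketch only: `model_assoc`, `model_queue_slot`, `MODEL_MAX_QUEUES`, and `model_find_queue` are invented for the example and do not reflect the actual nvmet-fc data structures or limits.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_QUEUES 4 /* made-up limit for the sketch */

/* Hypothetical per-queue state. */
struct model_queue_slot {
    int in_use;
};

/* Hypothetical association holding queue ids 0..MODEL_MAX_QUEUES. */
struct model_assoc {
    struct model_queue_slot queues[MODEL_MAX_QUEUES + 1];
};

/* Look up a queue by id; ids outside the table are rejected before
 * the array is indexed. */
static struct model_queue_slot *model_find_queue(struct model_assoc *a,
                                                 unsigned int qid)
{
    assert(a != NULL);
    if (qid > MODEL_MAX_QUEUES)
        return NULL;
    return a->queues[qid].in_use ? &a->queues[qid] : NULL;
}
```

An id taken from an incoming message is untrusted input, so rejecting out-of-range values up front keeps the lookup from reading past the end of the table.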

openanolis / cloud-kernel synced successfully more than 1 year ago