提交 · a711d91cd97e6c9a554ccd1652527a7f36661857 · openeuler / Kernel

05 5月, 2020 1 次提交

block: add a cdrom_device_info pointer to struct gendisk · a711d91c

由 Christoph Hellwig 提交于 4月 25, 2020

Add a pointer to the CDROM information structure to struct gendisk.
This will allow various removable media file systems to call directly
into the CDROM layer instead of abusing ioctls with kernel pointers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a711d91c

01 5月, 2020 4 次提交

iocost_monitor: drop string wrap around numbers when outputting json · 21f3cfea

由 Tejun Heo 提交于 4月 13, 2020

Wrapping numbers in strings is used by some to work around bit-width issues in
some enviroments. The problem isn't innate to json and the workaround seems to
cause more integration problems than help. Let's drop the string wrapping.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

21f3cfea

iocost_monitor: exit successfully if interval is zero · f4fe3ea6

由 Tejun Heo 提交于 4月 13, 2020

This is to help external tools to decide whether iocost_monitor has all its
requirements met or not based on the exit status of an -i0 run.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f4fe3ea6

blk-iocost: account for IO size when testing latencies · cd006509

由 Tejun Heo 提交于 4月 13, 2020

On each IO completion, iocost decides whether the IO met or missed its latency
target. Currently, the targets are fixed numbers per IO type. While this can be
good enough for loose latency targets way higher than typical completion
latencies, the effect of IO size makes it difficult to tighten the latency
target - a target adequate for 4k IOs might be too tight for 512k IOs and
vice-versa.

iocost already has all the necessary information to account for different IO
sizes when testing whether the latency target is met as iocost can calculate the
size vtime cost of a given IO. This patch updates the completion path to
calculate the size vtime cost of the IO, deduct the nsec equivalent from the
observed latency and use the adjusted value to decide whether the target is met.

This makes latency targets independent from IO size and enables determining
adequate latency targets with fixed size fio runs.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cd006509

blk-iocost: switch to fixed non-auto-decaying use_delay · 54c52e10

由 Tejun Heo 提交于 4月 13, 2020

The use_delay mechanism was introduced by blk-iolatency to hold memory
allocators accountable for the reclaim and other shared IOs they cause. The
duration of the delay is dynamically balanced between iolatency increasing the
value on each target miss and it auto-decaying as time passes and threads get
delayed on it.

While this works well for iolatency, iocost's control model isn't compatible
with it. There is no repeated "violation" events which can be balanced against
auto-decaying. iocost instead knows how much a given cgroup is over budget and
wants to prevent that cgroup from issuing IOs while over budget. Until now,
iocost has been adding the cost of force-issued IOs. However, this doesn't
reflect the amount which is already over budget and is simply not enough to
counter the auto-decaying allowing anon-memory leaking low priority cgroup to
go over its alloted share of IOs.

As auto-decaying doesn't make much sense for iocost, this patch introduces a
different mode of operation for use_delay - when blkcg_set_delay() are used
insted of blkcg_add/use_delay(), the delay duration is not auto-decayed until it
is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the
delay duration synchronized to the budget overage amount.

With this change, iocost can effectively police cgroups which generate
significant amount of force-issued IOs.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

54c52e10

29 4月, 2020 5 次提交

block: add a bio_queue_enter helper · accea322

由 Christoph Hellwig 提交于 4月 28, 2020

Add a little helper that passes the right nowait flag to blk_queue_enter
based on the bio flag, and terminates the bio with the right error code
if entering the queue fails.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

accea322

block: replace BIO_QUEUE_ENTERED with BIO_CGROUP_ACCT · 0376e9ef

由 Christoph Hellwig 提交于 4月 28, 2020

BIO_QUEUE_ENTERED is only used for cgroup accounting now, so rename
the flag and move setting it into the cgroup code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0376e9ef

block: cleanup the memory stall accounting in submit_bio · 760f83ea

由 Christoph Hellwig 提交于 4月 28, 2020

Instead of a convoluted chain just check for REQ_OP_READ directly,
and keep all the memory stall code together in a single unlikely
branch.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

760f83ea

block: improve the submit_bio and generic_make_request documentation · 3fdd4086

由 Christoph Hellwig 提交于 4月 28, 2020

The current documentation is a little weird, as it doesn't clearly
explain which function to use, and also has the guts of the information
on generic_make_request, which is the internal interface for stacking
drivers.

Fix this up by properly documenting submit_bio, and only documenting
the differences and the use case for generic_make_request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3fdd4086

blk-mq: make function '__blk_mq_sched_dispatch_requests' static · e1b586f2

由 Zheng Bin 提交于 4月 29, 2020

Fix sparse warnings:

block/blk-mq-sched.c:209:5: warning: symbol '__blk_mq_sched_dispatch_requests' was not declared. Should it be static?
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NZheng Bin <zhengbin13@huawei.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e1b586f2

25 4月, 2020 4 次提交

block: bypass ->make_request_fn for blk-mq drivers · 8cf7961d

由 Christoph Hellwig 提交于 4月 25, 2020

Call blk_mq_make_request when no ->make_request_fn is set.  This is
safe now that blk_alloc_queue always sets up the pointer for make_request
based drivers.  This avoids an indirect call in the blk-mq driver I/O
fast path, which is rather expensive due to spectre mitigations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8cf7961d

C
dm: remove the make_request_fn check in device_area_is_invalid · ae3cc8d8
由 Christoph Hellwig 提交于 4月 25, 2020
```
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
ae3cc8d8

bcache: remove a duplicate ->make_request_fn assignment · a91b2014

由 Christoph Hellwig 提交于 4月 25, 2020

The make_request_fn pointer should only be assigned by blk_alloc_queue.
Fix a left over manual initialization.

Fixes: ff27668c ("bcache: pass the make_request methods to blk_queue_make_request")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a91b2014

block: remove create_io_context · 3e82c348

由 Christoph Hellwig 提交于 4月 25, 2020

create_io_context just has a single caller, which also happens to not
even use the return value.  Just open code it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3e82c348

24 4月, 2020 2 次提交

block: Limit number of items taken from the I/O scheduler in one go · 28d65729

由 Salman Qazi 提交于 4月 24, 2020

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is running
hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time,
because the I/O scheduler can feed an arbitrary number of commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx and return from the run_work handler and let it
rerun.
Signed-off-by: NSalman Qazi <sqazi@google.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

28d65729

block: unexport bdev_read_page and bdev_write_page · 895d4775

由 Christoph Hellwig 提交于 4月 24, 2020

Each one just has two callers, both in always built-in code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

895d4775

23 4月, 2020 5 次提交

block: move dma_pad handling from blk_rq_map_sg into the callers · bdf8710d

由 Christoph Hellwig 提交于 4月 14, 2020

There are only two callers of blk_rq_map_sg/__blk_rq_map_sg that set
the dma_pad value in the queue.  Move the handling into those callers
instead of burdening the common code, and move the ->extra_len field
from struct request to struct scsi_cmnd.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bdf8710d

block: move dma drain handling to scsi · cc97923a

由 Christoph Hellwig 提交于 4月 14, 2020

Don't burden the common block code with with specifics of the libata DMA
draining mechanism.  Instead move most of the code to the scsi midlayer.

That also means the nr_phys_segments adjustments in the blk-mq fast path
can go away entirely, given that SCSI never looks at nr_phys_segments
after mapping the request to a scatterlist.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cc97923a

scsi: merge scsi_init_sgtable into scsi_init_io · 0475bd6c

由 Christoph Hellwig 提交于 4月 14, 2020

scsi_init_io is the only caller of scsi_init_sgtable.  Merge the two
function to make upcoming changes a little easier.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0475bd6c

block: provide a blk_rq_map_sg variant that returns the last element · 89de1504

由 Christoph Hellwig 提交于 4月 14, 2020

To be able to move some of the special purpose hacks in blk_rq_map_sg
into the callers we need a variant that returns the last mapped
S/G list element to the caller.  Add that variant as __blk_rq_map_sg
and make blk_rq_map_sg a trivial inline wrapper around it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

89de1504

block: remove RQF_COPY_USER · e64a0e16

由 Christoph Hellwig 提交于 4月 14, 2020

The RQF_COPY_USER is set for bio where the passthrough request mapping
helpers decided that bounce buffering is required. It is then used to
pad scatterlist for drivers that required it. But given that
non-passthrough requests are per definition aligned, and directly mapped
pass-through request must be aligned it is not actually required at all.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e64a0e16

21 4月, 2020 14 次提交

block: fold bdev_unhash_inode into invalidate_partition · 9bc5c397

由 Christoph Hellwig 提交于 4月 14, 2020

invalidate_partition and bdev_unhash_inode are always paired, and
invalidate_partition already does an icache lookup for the block device
inode.  Piggy back on that to remove the inode from the hash.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9bc5c397

block: mark invalidate_partition static · 02d33b67

由 Christoph Hellwig 提交于 4月 14, 2020

invalidate_partition is only used in genhd.c, so mark it static.  Also
drop the return value given that is is always ignored.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

02d33b67

block: simplify block device syncing in bdev_del_partition · d5f3178e

由 Christoph Hellwig 提交于 4月 14, 2020

We just checked a little above that the block device for the partition
im busy.  That implies no file system is mounted, and thus the only
thing in fsync_bdev that actually is used is sync_blockdev.  Just call
sync_blockdev directly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d5f3178e

block: don't call invalidate_partition from blk_drop_partitions · e669c1da

由 Christoph Hellwig 提交于 4月 14, 2020

Given that the device must not be busy, most of the calls from
invalidate_partition that are related to file system metadata are
guranteed to not happen.  Just open code the calls to sync_blockdev
and invalidate_bdev instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e669c1da

dasd: use blk_drop_partitions instead of badly reimplementing it · 21be6cdc

由 Christoph Hellwig 提交于 4月 14, 2020

Use the blk_drop_partitions function instead of messing around with
ioctls that get kernel pointers.  For this blk_drop_partitions needs
to be exported, which it normally shouldn't - make an exception for
s390 only.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

21be6cdc

block: remove the disk argument from blk_drop_partitions · d46430bf

由 Christoph Hellwig 提交于 4月 14, 2020

The gendisk can be trivially deducted from the block_device.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d46430bf

block: remove hd_struct_kill · 4377b48d

由 Christoph Hellwig 提交于 4月 14, 2020

The function has a single caller, so just open code it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4377b48d

block: cleanup hd_struct freeing · 8da2892e

由 Christoph Hellwig 提交于 4月 14, 2020

Move hd_ref_init out of line as there it isn't anywhere near a fast path,
and rename the rcu ref freeing callbacks to be more descriptive.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8da2892e

block: pass a hd_struct to delete_partition · cddae808

由 Christoph Hellwig 提交于 4月 14, 2020

All callers have the hd_struct at hand, so pass it instead of performing
another lookup.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cddae808

block: refactor blkpg_ioctl · fa9156ae

由 Christoph Hellwig 提交于 4月 14, 2020

Split each sub-command out into a separate helper, and move those helpers
to block/partitions/core.c instead of having a lot of partition
manipulation logic open coded in block/ioctl.c.

Signed-off-by: Christoph Hellwig <hch@lst.de
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fa9156ae

Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle" · b4fd63f4

由 Douglas Anderson 提交于 4月 20, 2020

This reverts commit 7e70aa78.

Now that we have the patches ("blk-mq: In blk_mq_dispatch_rq_list()
"no budget" is a reason to kick") and ("blk-mq: Rerun dispatching in
the case of budget contention") we should no longer need the fix in
the SCSI code.  Revert it, resolving conflicts with other patches that
have touched this code.

With this revert (and the two new patches) I can run the script that
was in commit 7e70aa78 ("scsi: core: run queue if SCSI device
queue isn't ready and queue is idle") in a loop with no failure.  If I
do this revert without the two new patches I can easily get a failure.
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b4fd63f4

blk-mq: Rerun dispatching in the case of budget contention · a0823421

由 Douglas Anderson 提交于 4月 20, 2020

If ever a thread running blk-mq code tries to get budget and fails it
immediately stops doing work and assumes that whenever budget is freed
up that queues will be kicked and whatever work the thread was trying
to do will be tried again.

One path where budget is freed and queues are kicked in the normal
case can be seen in scsi_finish_command().  Specifically:
- scsi_finish_command()
  - scsi_device_unbusy()
    - # Decrement "device_busy", AKA release budget
  - scsi_io_completion()
    - scsi_end_request()
      - blk_mq_run_hw_queues()

The above is all well and good.  The problem comes up when a thread
claims the budget but then releases it without actually dispatching
any work.  Since we didn't schedule any work we'll never run the path
of finishing work / kicking the queues.

This isn't often actually a problem which is why this issue has
existed for a while and nobody noticed.  Specifically we only get into
this situation when we unexpectedly found that we weren't going to do
any work.  Code that later receives new work kicks the queues.  All
good, right?

The problem shows up, however, if timing is just wrong and we hit a
race.  To see this race let's think about the case where we only have
a budget of 1 (only one thread can hold budget).  Now imagine that a
thread got budget and then decided not to dispatch work.  It's about
to call put_budget() but then the thread gets context switched out for
a long, long time.  While in this state, any and all kicks of the
queue (like the when we received new work) will be no-ops because
nobody can get budget.  Finally the thread holding budget gets to run
again and returns.  All the normal kicks will have been no-ops and we
have an I/O stall.

As you can see from the above, you need just the right timing to see
the race.  To start with, the only case it happens if we thought we
had work, actually managed to get the budget, but then actually didn't
have work.  That's pretty rare to start with.  Even then, there's
usually a very small amount of time between realizing that there's no
work and putting the budget.  During this small amount of time new
work has to come in and the queue kick has to make it all the way to
trying to get the budget and fail.  It's pretty unlikely.

One case where this could have failed is illustrated by an example of
threads running blk_mq_do_dispatch_sched():

* Threads A and B both run has_work() at the same time with the same
  "hctx".  Imagine has_work() is exact.  There's no lock, so it's OK
  if Thread A and B both get back true.
* Thread B gets interrupted for a long time right after it decides
  that there is work.  Maybe its CPU gets an interrupt and the
  interrupt handler is slow.
* Thread A runs, get budget, dispatches work.
* Thread A's work finishes and budget is released.
* Thread B finally runs again and gets budget.
* Since Thread A already took care of the work and no new work has
  come in, Thread B will get NULL from dispatch_request().  I believe
  this is specifically why dispatch_request() is allowed to return
  NULL in the first place if has_work() must be exact.
* Thread B will now be holding the budget and is about to call
  put_budget(), but hasn't called it yet.
* Thread B gets interrupted for a long time (again).  Dang interrupts.
* Now Thread C (maybe with a different "hctx" but the same queue)
  comes along and runs blk_mq_do_dispatch_sched().
* Thread C won't do anything because it can't get budget.
* Finally Thread B will run again and put the budget without kicking
  any queues.

Even though the example above is with blk_mq_do_dispatch_sched() I
believe the race is possible any time someone is holding budget but
doesn't do work.

Unfortunately, the unlikely has become more likely if you happen to be
using the BFQ I/O scheduler.  BFQ, by design, sometimes returns "true"
for has_work() but then NULL for dispatch_request() and stays in this
state for a while (currently up to 9 ms).  Suddenly you only need one
race to hit, not two races in a row.  With my current setup this is
easy to reproduce in reboot tests and traces have actually shown that
we hit a race similar to the one described above.

Note that we only need to fix blk_mq_do_dispatch_sched() and
blk_mq_do_dispatch_ctx() and not the other places that put budget.  In
other cases we know that we have work to do on at least one "hctx" and
code already exists to kick that "hctx"'s queue.  When that work
finally finishes all the queues will be kicked using the normal flow.

One last note is that (at least in the SCSI case) budget is shared by
all "hctx"s that have the same queue.  Thus we need to make sure to
kick the whole queue, not just re-run dispatching on a single "hctx".
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a0823421

blk-mq: Add blk_mq_delay_run_hw_queues() API call · b9151e7b

由 Douglas Anderson 提交于 4月 20, 2020

We have:
* blk_mq_run_hw_queue()
* blk_mq_delay_run_hw_queue()
* blk_mq_run_hw_queues()

...but not blk_mq_delay_run_hw_queues(), presumably because nobody
needed it before now.  Since we need it for a later patch in this
series, add it.
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b9151e7b

blk-mq: In blk_mq_dispatch_rq_list() "no budget" is a reason to kick · ab3cee37

由 Douglas Anderson 提交于 4月 20, 2020

In blk_mq_dispatch_rq_list(), if blk_mq_sched_needs_restart() returns
true and the driver returns BLK_STS_RESOURCE then we'll kick the
queue.  However, there's another case where we might need to kick it.
If we were unable to get budget we can be in much the same state as
when the driver returns BLK_STS_RESOURCE, so we should treat it the
same.

It should be noted that even if we add a whole bunch of extra kicking
to the queue in other patches this patch is still important.
Specifically any kicking that happened before we re-spliced leftover
requests into 'hctx->dispatch' wouldn't have found any work, so we
really need to make sure we kick ourselves after we've done the
splicing.
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ab3cee37

20 4月, 2020 5 次提交

L

Linux 5.7-rc2 · ae83d0b4
由 Linus Torvalds 提交于 4月 19, 2020

ae83d0b4

mm: Fix MREMAP_DONTUNMAP accounting on VMA merge · dadbd85f

由 Brian Geffon 提交于 4月 17, 2020

When remapping a mapping where a portion of a VMA is remapped
into another portion of the VMA it can cause the VMA to become
split. During the copy_vma operation the VMA can actually
be remerged if it's an anonymous VMA whose pages have not yet
been faulted. This isn't normally a problem because at the end
of the remap the original portion is unmapped causing it to
become split again.

However, MREMAP_DONTUNMAP leaves that original portion in place which
means that the VMA which was split and then remerged is not actually
split at the end of the mremap. This patch fixes a bug where
we don't detect that the VMAs got remerged and we end up
putting back VM_ACCOUNT on the next mapping which is completely
unreleated. When that next mapping is unmapped it results in
incorrectly unaccounting for the memory which was never accounted,
and eventually we will underflow on the memory comittment.

There is also another issue which is similar, we're currently
accouting for the number of pages in the new_vma but that's wrong.
We need to account for the length of the remap operation as that's
all that is being added. If there was a mapping already at that
location its comittment would have been adjusted as part of
the munmap at the start of the mremap.

A really simple repro can be seen in:
https://gist.github.com/bgaff/e101ce99da7d9a8c60acc641d07f312c

Fixes: e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NBrian Geffon <bgeffon@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dadbd85f

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 86cc3398

由 Linus Torvalds 提交于 4月 19, 2020

Pull clk fixes from Stephen Boyd:
 "Two build fixes for a couple clk drivers and a fix for the Unisoc
  serial clk where we want to keep it on for earlycon"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sprd: don't gate uart console clock
  clk: mmp2: fix link error without mmp2
  clk: asm9260: fix __clk_hw_register_fixed_rate_with_accuracy typo

86cc3398

Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0fe5f9ca

由 Linus Torvalds 提交于 4月 19, 2020

Pull x86 and objtool fixes from Thomas Gleixner:
 "A set of fixes for x86 and objtool:

  objtool:

   - Ignore the double UD2 which is emitted in BUG() when
     CONFIG_UBSAN_TRAP is enabled.

   - Support clang non-section symbols in objtool ORC dump

   - Fix switch table detection in .text.unlikely

   - Make the BP scratch register warning more robust.

  x86:

   - Increase microcode maximum patch size for AMD to cope with new CPUs
     which have a larger patch size.

   - Fix a crash in the resource control filesystem when the removal of
     the default resource group is attempted.

   - Preserve Code and Data Prioritization enabled state accross CPU
     hotplug.

   - Update split lock cpu matching to use the new X86_MATCH macros.

   - Change the split lock enumeration as Intel finaly decided that the
     IA32_CORE_CAPABILITIES bits are not architectural contrary to what
     the SDM claims. !@#%$^!

   - Add Tremont CPU models to the split lock detection cpu match.

   - Add a missing static attribute to make sparse happy"

* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Add Tremont family CPU models
  x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
  x86/resctrl: Preserve CDP enable over CPU hotplug
  x86/resctrl: Fix invalid attempt at removing the default resource group
  x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
  x86/umip: Make umip_insns static
  x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
  objtool: Make BP scratch register warning more robust
  objtool: Fix switch table detection in .text.unlikely
  objtool: Support Clang non-section symbols in ORC generation
  objtool: Support Clang non-section symbols in ORC dump
  objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings

0fe5f9ca

Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3e0dea57

由 Linus Torvalds 提交于 4月 19, 2020

Pull time namespace fix from Thomas Gleixner:
 "An update for the proc interface of time namespaces: Use symbolic
  names instead of clockid numbers. The usability nuisance of numbers
  was noticed by Michael when polishing the man page"

* tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets

3e0dea57

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功