提交 · f94e5b1a2ccb63e07bc180cd8fe321b41bae08dc · openanolis / cloud-kernel

17 1月, 2020 1 次提交

alinux: hotfix: Add Cloud Kernel hotfix enhancement · f94e5b1a

由 Xunlei Pang 提交于 7月 18, 2019

We reserve some fields beforehand for core structures prone to change,
so that we won't hurt when extra fields have to be added for hotfix,
thereby inceasing the success rate, we even can hot add features with
this enhancement.

After reserving, normally cache does not matter as the reserved fields
(usually at tail) are not accessed at all.

Currently involve the following structures:
    MM:
    struct zone
    struct pglist_data
    struct mm_struct
    struct vm_area_struct
    struct mem_cgroup
    struct writeback_control

    Block:
    struct gendisk
    struct backing_dev_info
    struct bio
    struct queue_limits
    struct request_queue
    struct blkcg
    struct blkcg_policy
    struct blk_mq_hw_ctx
    struct blk_mq_tag_set
    struct blk_mq_queue_data
    struct blk_mq_ops
    struct elevator_mq_ops
    struct inode
    struct dentry
    struct address_space
    struct block_device
    struct hd_struct
    struct bio_set

    Network:
    struct sk_buff
    struct sock
    struct net_device_ops
    struct xt_target
    struct dst_entry
    struct dst_ops
    struct fib_rule

    Scheduler:
    struct task_struct
    struct cfs_rq
    struct rq
    struct sched_statistics
    struct sched_entity
    struct signal_struct
    struct task_group
    struct cpuacct

    cgroup:
    struct cgroup_root
    struct cgroup_subsys_state
    struct cgroup_subsys
    struct css_set
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com>
[ caspar: use SPDX-License-Identifier ]
Signed-off-by: NCaspar Zhang <caspar@linux.alibaba.com>

f94e5b1a

15 1月, 2020 1 次提交

alinux: blk-throttle: limit bios to fix amount of pages entering writeback prematurely · 0fd4aa6d

由 Xiaoguang Wang 提交于 12月 28, 2018

Currently in blk_throtl_bio(), if one bio exceeds its throtl_grp's bps
or iops limit, this bio will be queued throtl_grp's throtl_service_queue,
then obviously mm subsys will submit more pages, even underlying device
can not handle these io requests, also this will make large amount of pages
entering writeback prematurely, later if some process writes some of these
pages, it will wait for long time.

I have done some tests: one process does buffered writes on a 1GB file,
and make this process's blkcg max bps limit be 10MB/s, I observe this:
	#cat /proc/meminfo  | grep -i back
	Writeback:        900024 kB
	WritebackTmp:          0 kB

I think this Writeback value is just too big, indeed many bios have been
queued in throtl_grp's throtl_service_queue, if one process try to write
the last bio's page in this queue, it will call wait_on_page_writeback(page),
which must wait the previous bios to finish and will take long time, we
have also see 120s hung task warning in our server.

 INFO: task kworker/u128:0:30072 blocked for more than 120 seconds.
       Tainted: G            E 4.9.147-013.ali3000_015_test.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/u128:0  D    0 30072      2 0x00000000
 Workqueue: writeback wb_workfn (flush-8:16)
  ffff882ddd066b40 0000000000000000 ffff882e5cad3400 ffff882fbe959e80
  ffff882fa50b1a00 ffffc9003a5a3768 ffffffff8173325d ffffc9003a5a3780
  00ff882e5cad3400 ffff882fbe959e80 ffffffff81360b49 ffff882e5cad3400
 Call Trace:
  [<ffffffff8173325d>] ? __schedule+0x23d/0x6d0
  [<ffffffff81360b49>] ? alloc_request_struct+0x19/0x20
  [<ffffffff81733726>] schedule+0x36/0x80
  [<ffffffff81736c56>] schedule_timeout+0x206/0x4b0
  [<ffffffff81036c69>] ? sched_clock+0x9/0x10
  [<ffffffff81363073>] ? get_request+0x403/0x810
  [<ffffffff8110ca10>] ? ktime_get+0x40/0xb0
  [<ffffffff81732f8a>] io_schedule_timeout+0xda/0x170
  [<ffffffff81733f90>] ? bit_wait+0x60/0x60
  [<ffffffff81733fab>] bit_wait_io+0x1b/0x60
  [<ffffffff81733b28>] __wait_on_bit+0x58/0x90
  [<ffffffff811b0d91>] ? find_get_pages_tag+0x161/0x2e0
  [<ffffffff811aff62>] wait_on_page_bit+0x82/0xa0
  [<ffffffff810d47f0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffa02fc181>] mpage_prepare_extent_to_map+0x2d1/0x310 [ext4]
  [<ffffffff8121ff65>] ? kmem_cache_alloc+0x185/0x1a0
  [<ffffffffa0305a2f>] ? ext4_init_io_end+0x1f/0x40 [ext4]
  [<ffffffffa0300294>] ext4_writepages+0x404/0xef0 [ext4]
  [<ffffffff81508c64>] ? scsi_init_io+0x44/0x200
  [<ffffffff81398a0f>] ? fprop_fraction_percpu+0x2f/0x80
  [<ffffffff811c139e>] do_writepages+0x1e/0x30
  [<ffffffff8127c0f5>] __writeback_single_inode+0x45/0x320
  [<ffffffff8127c942>] writeback_sb_inodes+0x272/0x600
  [<ffffffff8127cf6b>] wb_writeback+0x10b/0x300
  [<ffffffff8127d884>] wb_workfn+0xb4/0x380
  [<ffffffff810b85e9>] ? try_to_wake_up+0x59/0x3e0
  [<ffffffff810a5759>] process_one_work+0x189/0x420
  [<ffffffff810a5a3e>] worker_thread+0x4e/0x4b0
  [<ffffffff810a59f0>] ? process_one_work+0x420/0x420
  [<ffffffff810ac026>] kthread+0xe6/0x100
  [<ffffffff810abf40>] ? kthread_park+0x60/0x60
  [<ffffffff81738499>] ret_from_fork+0x39/0x50

To fix this issue, we can simply limit throtl_service_queue's max queued
bios, currently we limit it to throtl_grp's bps_limit or iops limit, if it
still exteeds, we just sleep for a while.
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

0fd4aa6d

27 12月, 2019 2 次提交

blkcg: separate blkcg_conf_get_disk() out of blkg_conf_prep() · cf569826

由 Tejun Heo 提交于 8月 28, 2019

commit 015d254cb02b6d8eec4b3366274bf4672f9e0b64 upstream.

Separate out blkcg_conf_get_disk() so that it can be used by blkcg
policy interface file input parsers before the policy is actually
enabled.  This doesn't introduce any functional changes.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

cf569826

blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn() · 15ac60d3

由 Tejun Heo 提交于 8月 28, 2019

commit cf09a8ee19ad1f78b4e18cdde9f2a61133efacf5 upstream.

Instead of @node, pass in @q and @blkcg so that the alloc function has
more context.  This doesn't cause any behavior change and will be used
by io.weight implementation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
[Joseph: resolve conflicts for cfq]
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

15ac60d3

01 9月, 2018 2 次提交

blkcg: delay blkg destruction until after writeback has finished · 59b57717

由 Dennis Zhou (Facebook) 提交于 8月 31, 2018

Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

59b57717

Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462

由 Dennis Zhou (Facebook) 提交于 8月 31, 2018

This reverts commit 4c699480.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c699480) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6b065462

12 8月, 2018 1 次提交

blkcg: Make blkg_root_lookup() work for queues in bypass mode · b86d865c

由 Bart Van Assche 提交于 8月 10, 2018

For legacy queues the only call of blkg_root_lookup() happens after
bypass mode has been enabled. Since blkg_lookup() returns NULL for
queues in bypass mode, modify the blkg_root_lookup() such that it
no longer depends on bypass mode. Rename the function into
blk_queue_root_blkg() as suggested by Tejun.
Suggested-by: NTejun Heo <tj@kernel.org>
Fixes: 6bad9b21 ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b86d865c

09 8月, 2018 1 次提交

blkcg: Introduce blkg_root_lookup() · 6bad9b21

由 Bart Van Assche 提交于 8月 09, 2018

This new function will be used in a later patch to verify whether a
queue has been dissociated from the cgroup controller before being
released.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6bad9b21

30 7月, 2018 1 次提交

block: don't account for split bio's size in cgroup stats · c454edc2

由 Josef Bacik 提交于 7月 30, 2018

We need to check in blkcg_bio_issue_check if the bio is flagged as
QUEUE_ENTERED, because if it is then we've already accounted for the
size of the IO in the cgroup stats.  We can still however account for
the extra IO since it'll be another request.
Reported-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c454edc2

18 7月, 2018 1 次提交

blkcg: Track DISCARD statistics and output them in cgroup io.stat · 636620b6

由 Tejun Heo 提交于 7月 18, 2018

Add tracking of REQ_OP_DISCARD ios to the per-cgroup io.stat.  Two
fields, dbytes and dios, to respectively count the total bytes and
number of discards are added.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Cc: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

636620b6

09 7月, 2018 4 次提交

blkcg: add generic throttling mechanism · d09d8df3

由 Josef Bacik 提交于 7月 03, 2018

Since IO can be issued from literally anywhere it's almost impossible to
do throttling without having some sort of adverse effect somewhere else
in the system because of locking or other dependencies.  The best way to
solve this is to do the throttling when we know we aren't holding any
other kernel resources.  Do this by tracking throttling in a per-blkg
basis, and if we require throttling flag the task that it needs to check
before it returns to user space and possibly sleep there.

This is to address the case where a process is doing work that is
generating IO that can't be throttled, whether that is directly with a
lot of REQ_META IO, or indirectly by allocating so much memory that it
is swamping the disk with REQ_SWAP.  We can't use task_add_work as we
don't want to induce a memory allocation in the IO path, so simply
saving the request queue in the task and flagging it to do the
notify_resume thing achieves the same result without the overhead of a
memory allocation.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d09d8df3

blk: introduce REQ_SWAP · 0d1e0c7c

由 Josef Bacik 提交于 7月 03, 2018

Just like REQ_META, it's important to know the IO coming down is swap
in order to guard against potential IO priority inversion issues with
cgroups.  Add REQ_SWAP and use it for all swap IO, and add it to our
bio_issue_as_root_blkg helper.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0d1e0c7c

blk-cgroup: allow controllers to output their own stats · 903d23f0

由 Josef Bacik 提交于 7月 03, 2018

blk-iolatency has a few stats that it would like to print out, and
instead of adding a bunch of crap to the generic code just provide a
helper so that controllers can add stuff to the stat line if they want
to.

Hide it behind a boot option since it changes the output of io.stat from
normal, and these stats are only interesting to developers.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

903d23f0

block: introduce bio_issue_as_root_blkg · c7c98fd3

由 Josef Bacik 提交于 7月 03, 2018

Instead of forcing all file systems to get the right context on their
bio's, simply check for REQ_META to see if we need to issue as the root
blkg.  We don't want to force all bio's to have the root blkg associated
with them if REQ_META is set, as some controllers (blk-iolatency) need
to know who the originating cgroup is so it can backcharge them for the
work they are doing.  This helper will make sure that the controllers do
the proper thing wrt the IO priority and backcharging.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c7c98fd3

17 3月, 2018 1 次提交

blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir() · 4c699480

由 Joseph Qi 提交于 3月 16, 2018

We've triggered a WARNING in blk_throtl_bio() when throttling writeback
io, which complains blkg->refcnt is already 0 when calling blkg_get(),
and then kernel crashes with invalid page request.
After investigating this issue, we've found it is caused by a race
between blkcg_bio_issue_check() and cgroup_rmdir(), which is described
below:

writeback kworker               cgroup_rmdir
                                  cgroup_destroy_locked
                                    kill_css
                                      css_killed_ref_fn
                                        css_killed_work_fn
                                          offline_css
                                            blkcg_css_offline
  blkcg_bio_issue_check
    rcu_read_lock
    blkg_lookup
                                              spin_trylock(q->queue_lock)
                                              blkg_destroy
                                              spin_unlock(q->queue_lock)
    blk_throtl_bio
    spin_lock_irq(q->queue_lock)
    ...
    spin_unlock_irq(q->queue_lock)
  rcu_read_unlock

Since rcu can only prevent blkg from releasing when it is being used,
the blkg->refcnt can be decreased to 0 during blkg_destroy() and schedule
blkg release.
Then trying to blkg_get() in blk_throtl_bio() will complains the WARNING.
And then the corresponding blkg_put() will schedule blkg release again,
which result in double free.
This race is introduced by commit ae118896 ("blkcg: consolidate blkg
creation in blkcg_bio_issue_check()"). Before this commit, it will
lookup first and then try to lookup/create again with queue_lock. Since
revive this logic is a bit drastic, so fix it by only offlining pd during
blkcg_css_offline(), and move the rest destruction (especially
blkg_put()) into blkcg_css_free(), which should be the right way as
discussed.

Fixes: ae118896 ("blkcg: consolidate blkg creation in blkcg_bio_issue_check()")
Reported-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4c699480

16 1月, 2018 1 次提交

blkcg: simplify statistic accumulation code · ddc21231

由 Arnd Bergmann 提交于 1月 16, 2018

Some older compilers (gcc-4.4 through 4.6 in particular) struggle
with the way that blkg_rwstat_read() returns a structure, leading
to excessive stack usage and rather inefficient code:

block/blk-cgroup.c: In function 'blkg_destroy':
block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
block/bfq-cgroup.c: In function 'bfqg_stats_add_aux':
block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

I also notice that there is no point in using atomic accesses
for the local variables, so storing the temporaries in simple 'u64'
variables not only avoids the stack usage on older compilers but
also improves the object code on modern versions.

Fixes: e6269c44 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it")
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ddc21231

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

26 9月, 2017 2 次提交

block: make blkcg aware of kthread stored original cgroup info · 902ec5b6

由 Shaohua Li 提交于 9月 14, 2017

bio_blkcg is the only API to get cgroup info for a bio right now. If
bio_blkcg finds current task is a kthread and has original blkcg
associated, it will use the css instead of associating the bio to
current task. This makes it possible that kthread dispatches bios on
behalf of other threads.
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

902ec5b6

blkcg: delete unused APIs · af551fb3

由 Shaohua Li 提交于 9月 14, 2017

Nobody uses the APIs right now.
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

af551fb3

29 7月, 2017 1 次提交

block: always attach cgroup info into bio · 007cc56b

由 Shaohua Li 提交于 7月 12, 2017

blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no point we don't attach the cgroup info into bio at
blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup
info.
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

007cc56b

21 6月, 2017 1 次提交

percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch · 104b4e51

由 Nikolay Borisov 提交于 6月 20, 2017

Currently, percpu_counter_add is a wrapper around __percpu_counter_add
which is preempt safe due to explicit calls to preempt_disable.  Given
how __ prefix is used in percpu related interfaces, the naming
unfortunately creates the false sense that __percpu_counter_add is
less safe than percpu_counter_add.  In terms of context-safety,
they're equivalent.  The only difference is that the __ version takes
a batch parameter.

Make this a bit more explicit by just renaming __percpu_counter_add to
percpu_counter_add_batch.

This patch doesn't cause any functional changes.

tj: Minor updates to patch description for clarity.  Cosmetic
    indentation updates.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: linux-mm@kvack.org
Cc: "David S. Miller" <davem@davemloft.net>

104b4e51

30 3月, 2017 1 次提交

Revert "blkcg: allocate struct blkcg_gq outside request queue spinlock" · d708f0d5

由 Jens Axboe 提交于 3月 29, 2017

I inadvertently applied the v5 version of this patch, whereas
the agreed upon version was v5. Revert this one so we can apply
the right one.

This reverts commit 7fc6b87a.

d708f0d5

29 3月, 2017 1 次提交

blkcg: allocate struct blkcg_gq outside request queue spinlock · 7fc6b87a

由 Tahsin Erdogan 提交于 3月 09, 2017

blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:

  pcpu_alloc+0x68f/0x710
  __alloc_percpu_gfp+0xd/0x10
  __percpu_counter_init+0x55/0xc0
  cfq_pd_alloc+0x3b2/0x4e0
  blkg_alloc+0x187/0x230
  blkg_create+0x489/0x670
  blkg_lookup_create+0x9a/0x230
  blkg_conf_prep+0x1fb/0x240
  __cfqg_set_weight_device.isra.105+0x5c/0x180
  cfq_set_weight_on_dfl+0x69/0xc0
  cgroup_file_write+0x39/0x1c0
  kernfs_fop_write+0x13f/0x1d0
  __vfs_write+0x23/0x120
  vfs_write+0xc2/0x1f0
  SyS_write+0x44/0xb0
  entry_SYSCALL_64_fastpath+0x18/0xad

In the code path above, percpu allocator cannot call vmalloc() due to
queue spinlock.

A failure in this call path gives grief to tools which are trying to
configure io weights. We see occasional failures happen shortly after
reboots even when system is not under any memory pressure. Machines
with a lot of cpus are more vulnerable to this condition.

Update blkg_create() function to temporarily drop the rcu and queue
locks when it is allowed by gfp mask.
Suggested-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NTahsin Erdogan <tahsin@google.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

7fc6b87a

01 11月, 2016 1 次提交
- C
  blk-cgroup: use op_is_sync to check for synchronous requests · d71d9ae1
  由 Christoph Hellwig 提交于 11月 01, 2016
```
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>
```
  d71d9ae1
28 10月, 2016 1 次提交

block: better op and flags encoding · ef295ecf

由 Christoph Hellwig 提交于 10月 28, 2016

Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

ef295ecf

24 9月, 2016 1 次提交

blkcg: Annotate blkg_hint correctly · 55679c8d

由 Bart Van Assche 提交于 9月 23, 2016

Avoid that sparse complains about blkg_hint manipulations.

Fixes: a637120e ("blkcg: use radix tree to index blkgs from blkcg")
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

55679c8d

10 8月, 2016 1 次提交

cgroup: make cgroup_path() and friends behave in the style of strlcpy() · 4c737b41

由 Tejun Heo 提交于 8月 10, 2016

cgroup_path() and friends used to format the path from the end and
thus the resulting path usually didn't start at the start of the
passed in buffer.  Also, when the buffer was too small, the partial
result was truncated from the head rather than tail and there was no
way to tell how long the full path would be.  These make the functions
less robust and more awkward to use.

With recent updates to kernfs_path(), cgroup_path() and friends can be
made to behave in strlcpy() style.

* cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now
  always return the length of the full path.  If buffer is too small,
  it contains nul terminated truncated output.

* All users updated accordingly.

v2: cgroup_path() usage in kernel/sched/debug.c converted.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Peter Zijlstra <peterz@infradead.org>

4c737b41

08 8月, 2016 1 次提交

block: rename bio bi_rw to bi_opf · 1eff9d32

由 Jens Axboe 提交于 8月 05, 2016

Since commit 63a4cc24, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: NJens Axboe <axboe@fb.com>

1eff9d32

08 6月, 2016 2 次提交

blkg_rwstat: separate op from flags · 63a4cc24

由 Mike Christie 提交于 6月 05, 2016

The bio and request operation and flags are going to be separate
definitions, so we cannot pass them in as a bitmap. This patch
converts the blkg_rwstat code and its caller, cfq, to pass in the
values separately.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

63a4cc24

block, drivers, cgroup: use op_is_write helper instead of checking for REQ_WRITE · a8ebb056

由 Mike Christie 提交于 6月 05, 2016

We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.

This patch converts the drivers and cgroup to use the
op_is_write helper. This should just cover the simple
cases. I did dm, md and bcache in their own patches
because they were more involved.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a8ebb056

27 10月, 2015 1 次提交

blkcg: fix incorrect read/write sync/async stat accounting · 174fd8d3

由 Tejun Heo 提交于 10月 22, 2015

While unifying how blkcg stats are collected, 77ea7338 ("blkcg:
move io_service_bytes and io_serviced stats into blkcg_gq")
incorrectly used bio->flags instead of bio->rw to tell the IO type.
This made IOs to be accounted as the wrong type.  Fix it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Fixes: 77ea7338 ("blkcg: move io_service_bytes and io_serviced stats into blkcg_gq")
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

174fd8d3

19 8月, 2015 9 次提交

blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy · 69d7fde5

由 Tejun Heo 提交于 8月 18, 2015

cgroup is trying to make interface consistent across different
controllers.  For weight based resource control, the knob should have
the range [1, 10000] and default to 100.  This patch updates
cfq-iosched so that the weight range conforms.  The internal
calculations have enough range and the widening of the weight range
shouldn't cause any problem.

* blkcg_policy->cpd_bind_fn() is added.  If present, this is invoked
  when blkcg is attached to a hierarchy.

* cfq_cpd_init() is updated to use the new default value on the
  unified hierarchy.

* cfq_cpd_bind() callback is implemented to clear per-blkg configs and
  apply the default config matching the hierarchy type.

* cfqd->root_group->[leaf_]weight initialization in cfq_init_queue()
  is moved into !CONFIG_CFQ_GROUP_IOSCHED block.  cfq_cpd_bind() is
  now responsible for initializing the initial weights when blkcg is
  enabled.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

69d7fde5

blkcg: implement interface for the unified hierarchy · 2ee867dc

由 Tejun Heo 提交于 8月 18, 2015

blkcg interface grew to be the biggest of all controllers and
unfortunately most inconsistent too.  The interface files are
inconsistent with a number of cloes duplicates.  Some files have
recursive variants while others don't.  There's distinction between
normal and leaf weights which isn't intuitive and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too much implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks rendering the two structural issues
twisting the current interface.  The interface has to be updated in a
significant anyway and this is a good chance to revamp it as a whole.
This patch implements blkcg interface for the unified hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
  "blkio" on the unified hierarchy.  Given that the whole interface is
  updated anyway, the rename shouldn't carry noticeable conversion
  overhead.

* The original interface consisted of 27 files is replaced with the
  following three files.

  blkio.stat	: per-blkcg stats
  blkio.weight	: per-cgroup and per-cgroup-queue weight settings
  blkio.max	: per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

v2: blkcg_policy->dfl_cftypes wasn't removed on
    blkcg_policy_unregister() corrupting the cftypes list.  Fixed.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

2ee867dc

blkcg: misc preparations for unified hierarchy interface · dd165eb3

由 Tejun Heo 提交于 8月 18, 2015

* Export blkg_dev_name()

* Drop unnecessary @cft from __cfq_set_weight().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

dd165eb3

blkcg: move body parsing from blkg_conf_prep() to its callers · 36aa9e5f

由 Tejun Heo 提交于 8月 18, 2015

Currently, blkg_conf_prep() expects input to be of the following form

 MAJ:MIN NUM

and reads the NUM part into blkg_conf_ctx->v.  This is quite
restrictive and gets in the way in implementing blkcg interface for
the unified hierarchy.  This patch updates blkg_conf_prep() so that it
expects

 MAJ:MIN BODY_STR

where BODY_STR is an arbitrary string.  blkg_conf_ctx->v is replaced
with ->body which is a char pointer pointing to the start of BODY_STR.
Parsing of the body is moved to blkg_conf_prep()'s callers.

To allow using, for example, strsep() on blkg_conf_ctx->val, it is a
non-const pointer and to accommodate that const is dropped from @input
too.

This doesn't cause any behavior changes.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

36aa9e5f

blkcg: mark existing cftypes as legacy · 880f50e2

由 Tejun Heo 提交于 8月 18, 2015

blkcg is about to grow interface for the unified hierarchy.  Add
legacy to existing cftypes.

* blkcg_policy->cftypes -> blkcg_policy->legacy_cftypes
* blk-cgroup.c:blkcg_files -> blkcg_legacy_files
* cfq-iosched.c:cfq_blkcg_files -> cfq_blkcg_legacy_files
* blk-throttle.c:throtl_files -> throtl_legacy_files

Pure renames.  No functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

880f50e2

blkcg: rename subsystem name from blkio to io · c165b3e3

由 Tejun Heo 提交于 8月 18, 2015

blkio interface has become messy over time and is currently the
largest.  In addition to the inconsistent naming scheme, it has
multiple stat files which report more or less the same thing, a number
of debug stat files which expose internal details which shouldn't have
been part of the public interface in the first place, recursive and
non-recursive stats and leaf and non-leaf knobs.

Both recursive vs. non-recursive and leaf vs. non-leaf distinctions
don't make any sense on the unified hierarchy as only leaf cgroups can
contain processes.  cgroups is going through a major interface
revision with the unified hierarchy involving significant fundamental
usage changes and given that a significant portion of the interface
doesn't make sense anymore, it's a good time to reorganize the
interface.

As the first step, this patch renames the external visible subsystem
name from "blkio" to "io".  This is more concise, matches the other
two major subsystem names, "cpu" and "memory", and better suited as
blkcg will be involved in anything writeback related too whether an
actual block device is involved or not.

As the subsystem legacy_name is set to "blkio", the only userland
visible change outside the unified hierarchy is that blkcg is reported
as "io" instead of "blkio" in the subsystem initialized message during
boot.  On the unified hierarchy, blkcg now appears as "io".
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

c165b3e3

blkcg: move io_service_bytes and io_serviced stats into blkcg_gq · 77ea7338

由 Tejun Heo 提交于 8月 18, 2015

Currently, both cfq-iosched and blk-throttle keep track of
io_service_bytes and io_serviced stats.  While keeping track of them
separately may be useful during development, it doesn't make much
sense otherwise.  Also, blk-throttle was counting bio's as IOs while
cfq-iosched request's, which is more confusing than informative.

This patch adds ->stat_bytes and ->stat_ios to blkg (blkcg_gq),
removes the counterparts from cfq-iosched and blk-throttle and let
them print from the common blkg counters.  The common counters are
incremented during bio issue in blkcg_bio_issue_check().

The outputs are still filtered by whether the policy has
blkg_policy_data on a given blkg, so cfq's output won't show up if it
has never been used for a given blkg.  The only times when the outputs
would differ significantly are when policies are attached on the fly
or elevators are switched back and forth.  Those are quite exceptional
operations and I don't think they warrant keeping separate counters.

v3: Update blkio-controller.txt accordingly.

v2: Account IOs during bio issues instead of request completions so
    that bio-based drivers can be handled the same way.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

77ea7338

blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq · f12c74ca

由 Tejun Heo 提交于 8月 18, 2015

Currently, blkg_[rw]stat_recursive_sum() assume that the target
counter is located in pd (blkg_policy_data); however, some counters
are planned to be moved to blkg (blkcg_gq).

This patch updates blkg_[rw]stat_recursive_sum() to take blkg and
blkg_policy pointers instead of pd.  If policy is NULL, it indexes
into blkg.  If non-NULL, into the blkg's pd of the policy.

The existing usages are updated to maintain the current behaviors.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f12c74ca

blkcg: make blkcg_[rw]stat per-cpu · 24bdb8ef

由 Tejun Heo 提交于 8月 18, 2015

blkcg_[rw]stat are used as stat counters for blkcg policies.  It isn't
per-cpu by itself and blk-throttle makes it per-cpu by wrapping around
it.  This patch makes blkcg_[rw]stat per-cpu and drop the ad-hoc
per-cpu wrapping in blk-throttle.

* blkg_[rw]stat->cnt is replaced with cpu_cnt which is struct
  percpu_counter.  This makes syncp unnecessary as remote accesses are
  handled by percpu_counter itself.

* blkg_[rw]stat_init() can now fail due to percpu allocation failure
  and thus are updated to return int.

* percpu_counters need explicit freeing.  blkg_[rw]stat_exit() added.

* As blkg_rwstat->cpu_cnt[] can't be read directly anymore, reading
  and summing results are stored in ->aux_cnt[] instead.

* Custom per-cpu stat implementation in blk-throttle is removed.

This makes all blkcg stat counters per-cpu without complicating policy
implmentations.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

24bdb8ef

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功