Commits · e92e113cabc1d3e47dc4740a814adb413f022e2f · openanolis / cloud-kernel

06 Dec, 2013 4 commits

netprio_cgroup: convert away from cftype->read_map() · e92e113c

Authored by Tejun Heo on Dec 05, 2013

In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

cftype->read_map() doesn't add any value and is being replaced with
->read_seq_string().  Update read_priomap() to use ->read_seq_string()
instead.

This patch doesn't make any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Li Zefan <lizefan@huawei.com>

memcg: convert away from cftype->read() and ->read_map() · 791badbd

Authored by Tejun Heo on Dec 05, 2013

In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

cftype->read_map() doesn't add any value and is being replaced with
->read_seq_string(), and all users of cftype->read() can be easily
served, usually better, by seq_file and other methods.

Update mem_cgroup_read() to return u64 instead of printing itself and
rename it to mem_cgroup_read_u64(), and update
mem_cgroup_oom_control_read() to use ->read_seq_string() instead of
->read_map().

This patch doesn't make any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

cpuset: convert away from cftype->read() · 51ffe411

Authored by Tejun Heo on Dec 05, 2013

In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

All users of cftype->read() can be easily served, usually better, by
seq_file and other methods.  Rename cpuset_common_file_read() to
cpuset_common_read_seq_string() and convert it to use
read_seq_string() interface instead.  This not only simplifies the
code but also makes it more versatile.  Before, the file couldn't
output if the result is longer than PAGE_SIZE.  After the conversion,
seq_file automatically grows the buffer until the output can fit.

This patch doesn't make any visible behavior changes except for being
able to handle output larger than PAGE_SIZE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

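The buffer-growing behavior mentioned in the cpuset commit above can be pictured with a small userspace sketch. This is not the kernel's seq_file code; render_fn, render_grow() and render_numbers() are hypothetical names used only for illustration. The idea is to format into a buffer and, if the output did not fit, double the buffer and render again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: formats into buf (at most size bytes including
 * the terminating NUL) and returns the length the full output needs,
 * in the style of snprintf(). */
typedef int (*render_fn)(char *buf, size_t size, void *arg);

/* Retry with a doubled buffer until the whole output fits.
 * Returns a malloc()ed string the caller must free(), or NULL. */
static char *render_grow(render_fn fn, void *arg, size_t start_size)
{
	size_t size = start_size ? start_size : 1;
	char *buf = NULL;

	assert(fn != NULL);
	for (;;) {
		char *tmp = realloc(buf, size);
		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;

		int need = fn(buf, size, arg);
		if (need < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)need < size)
			return buf;	/* output fit, including the NUL */
		size *= 2;		/* too small: grow and render again */
	}
}

/* Example renderer: writes "0 1 2 ... n-1 " and reports the full length. */
static int render_numbers(char *buf, size_t size, void *arg)
{
	int n = *(int *)arg;
	size_t used = 0;

	if (size)
		buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		/* Once the buffer is full, keep counting without writing. */
		int len = snprintf(used < size ? buf + used : NULL,
				   used < size ? size - used : 0, "%d ", i);
		if (len < 0)
			return -1;
		used += (size_t)len;
	}
	return (int)used;
}
```

A caller would use it as render_grow(render_numbers, &n, 16) and free() the result. A fixed one-page buffer, by contrast, would have to fail or truncate once the output outgrows it.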

cgroup, sched: convert away from cftype->read_map() · 44ffc75b

Authored by Tejun Heo on Dec 05, 2013

In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

cftype->read_map() doesn't add any value and is being replaced with
->read_seq_string().  Update cpu_stats_show() and cpuacct_stats_show()
accordingly.

This patch doesn't make any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>

29 Nov, 2013 9 commits

cgroup: don't guarantee cgroup.procs is sorted if sane_behavior · afb2bc14

Authored by Tejun Heo on Nov 29, 2013

For some reason, tasks and cgroup.procs guarantee that the result is
sorted.  This is the only reason this whole pidlist logic is necessary
instead of just iterating through sorted member tasks.  We can't do
anything about the existing interface but at least ensure that such
expectation doesn't exist for the new interface so that pidlist logic
may be removed in the distant future.

This patch scrambles the sort order if sane_behavior so that the
output is usually not sorted in the new interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

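One way to scramble a sort order, as the commit above describes, is to sort on a key derived from each pid rather than on the pid itself. The sketch below is an assumption for illustration, not necessarily what the patch does: pid_scramble() swaps every pair of adjacent bits, and the list is sorted by that key. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Swap each pair of adjacent bits: a cheap, reversible scramble.
 * Illustrative choice; any key that breaks numeric order would do. */
static unsigned int pid_scramble(unsigned int pid)
{
	unsigned int even = pid & 0x55555555u;	/* bits 0, 2, 4, ... */
	unsigned int odd  = pid & 0xAAAAAAAAu;	/* bits 1, 3, 5, ... */

	return (even << 1) | (odd >> 1);
}

/* qsort() comparator ordering pids by their scrambled key. */
static int cmp_scrambled(const void *a, const void *b)
{
	unsigned int ka = pid_scramble(*(const unsigned int *)a);
	unsigned int kb = pid_scramble(*(const unsigned int *)b);

	return (ka > kb) - (ka < kb);
}

/* Sort in place; the result is deterministic but not in numeric order. */
static void sort_pids_scrambled(unsigned int *pids, size_t n)
{
	assert(pids != NULL || n == 0);
	qsort(pids, n, sizeof(*pids), cmp_scrambled);
}
```

Because the scramble is a bijection, a sort on the key still places duplicate pids next to each other, so duplicates can be removed in one pass. Readers simply can no longer rely on ascending order.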

cgroup: remove cgroup_pidlist->use_count · 04502365

Authored by Tejun Heo on Nov 29, 2013

After the recent changes, pidlist ref is held only between
cgroup_pidlist_start() and cgroup_pidlist_stop() during which
cgroup->pidlist_mutex is also held.  IOW, the reference count is
redundant now.  While in use, it's always one and pidlist_mutex is
held - holding the mutex has exactly the same protection.

This patch collapses destroy_dwork queueing into cgroup_pidlist_stop()
so that pidlist_mutex is not released in between and drops
pidlist->use_count.

This patch shouldn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: load and release pidlists from seq_file start and stop respectively · 4bac00d1

Authored by Tejun Heo on Nov 29, 2013

Currently, pidlists are reference counted from file open and release
methods.  This means that holding onto an open file may waste memory
and reads may return data which is very stale.  Both aren't critical
because pidlists are keyed and shared per namespace and, well, the
user isn't supposed to have large delay between open and reads.

cgroup is planned to be converted to use kernfs and it'd be best if we
can stick to just the seq_file operations - start, next, stop and
show.  This can be achieved by loading pidlist on demand from start
and release with time delay from stop, so that consecutive reads don't
end up reloading the pidlist on each iteration.  This would remove the
need for hooking into open and release while also avoiding issues with
holding onto pidlist for too long.

The previous patches implemented delayed release and restructured
pidlist handling so that pidlists can be loaded and released from
seq_file start / stop.  This patch actually moves pidlist load to
start and release to stop.

This means that pidlist is pinned only between start and stop and may
go away between two consecutive read calls if the two calls are apart
by more than CGROUP_PIDLIST_DESTROY_DELAY.  cgroup_pidlist_start()
thus can't re-use the stored cgroup_pid_list_open_file->pidlist
directly.  During start, it's only used as a hint indicating whether
this is the first start after open or not and pidlist is always looked
up or created.

pidlist_mutex locking and reference counting are moved out of
pidlist_array_load() so that pidlist_array_load() can perform lookup
and creation atomically.  While this enlarges the area covered by
pidlist_mutex, given how the lock is used, it's highly unlikely to be
noticeable.

v2: Refreshed on top of the updated "cgroup: introduce struct
    cgroup_pidlist_open_file".

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: remove cgroup_pidlist->rwsem · 069df3b7

Authored by Tejun Heo on Nov 29, 2013

cgroup_pidlist locking is needlessly complicated.  It has outer
cgroup->pidlist_mutex to protect the list of pidlists associated with
a cgroup and then each pidlist has rwsem to synchronize updates and
reads.  Given that the only read access is from seq_file operations
which are always invoked back-to-back, the rwsem is a giant overkill.
All it does is adding unnecessary complexity.

This patch removes cgroup_pidlist->rwsem and protects all accesses to
pidlists belonging to a cgroup with cgroup->pidlist_mutex.
pidlist->rwsem locking is removed if it's nested inside
cgroup->pidlist_mutex; otherwise, it's replaced with
cgroup->pidlist_mutex locking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: refactor cgroup_pidlist_find() · e6b81710

Authored by Tejun Heo on Nov 29, 2013

Rename cgroup_pidlist_find() to cgroup_pidlist_find_create() and
separate out finding proper to cgroup_pidlist_find().  Also, move
locking to the caller.

This patch is preparation for pidlist restructure and doesn't
introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: introduce struct cgroup_pidlist_open_file · 62236858

Authored by Tejun Heo on Nov 29, 2013

For pidlist files, seq_file->private pointed to the loaded
cgroup_pidlist; however, pidlist loading is planned to be moved to
cgroup_pidlist_start() for kernfs conversion and seq_file->private
needs to carry more information from open to allow that.

This patch introduces struct cgroup_pidlist_open_file which contains
type, cgrp and pidlist and updates pidlist seq_file->private to point
to it using seq_open_private() and seq_release_private().  Note that
this eventually will be replaced by kernfs_open_file.

While this patch makes more information available to seq_file
operations, they don't use it yet and this patch doesn't introduce any
behavior changes except for allocation of the extra private struct.

v2: use __seq_open_private() instead of seq_open_private() for brevity
    as suggested by Li.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: implement delayed destruction for cgroup_pidlist · b1a21367

Authored by Tejun Heo on Nov 29, 2013

Currently, pidlists are reference counted from file open and release
methods. This means that holding onto an open file may waste memory
and reads may return data which is very stale. Both aren't critical
because pidlists are keyed and shared per namespace and, well, the
user isn't supposed to have large delay between open and reads.

cgroup is planned to be converted to use kernfs and it'd be best if we
can stick to just the seq_file operations - start, next, stop and
show. This can be achieved by loading pidlist on demand from start
and release with time delay from stop, so that consecutive reads don't
end up reloading the pidlist on each iteration. This would remove the
need for hooking into open and release while also avoiding issues with
holding onto pidlist for too long.

This patch implements delayed release of pidlist. As pidlists could
be lingering on cgroup removal waiting for the timer to expire, cgroup
free path needs to queue the destruction work item immediately and
flush. As those work items are self-destroying, each work item can't
be flushed directly. A new workqueue - cgroup_pidlist_destroy_wq - is
added to serve as flush domain.

Note that this patch just adds delayed release on top of the current
implementation and doesn't change where pidlist is loaded and
released. Following patches will make those changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: remove cftype->release() · b9f3ceca

Authored by Tejun Heo on Nov 29, 2013

Now that pidlist files don't use cftype->release(), it doesn't have
any user left.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

cgroup: don't skip seq_open on write only opens on pidlist files · ac1e69aa

Authored by Tejun Heo on Nov 29, 2013

Currently, cgroup_pidlist_open() skips seq_open() and pidlist loading
if the file is opened write-only, which is a sensible optimization as
pidlist loading can be costly and there often are occasions where
tasks or cgroup.procs is opened write-only.  However, pidlist init and
release are planned to be moved to cgroup_pidlist_start/stop()
respectively which would make this optimization unnecessary.

This patch removes the optimization and always fully initializes
pidlist files regardless of open mode.  This will help moving pidlist
handling to start/stop by unifying rw paths and removes the need for
specifying cftype->release() in addition to .release in
cgroup_pidlist_operations as file->f_op is now always overridden.  As
pidlist files were the only user of cftype->release(), the next patch
will remove the method.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

28 Nov, 2013 3 commits

cgroup: Merge branch 'for-3.13-fixes' into for-3.14 · c729b11e

Authored by Tejun Heo on Nov 27, 2013

Pull to receive e605b365 ("cgroup: fix cgroup_subsys_state leak
for seq_files") as for-3.14 is scheduled to have a lot of changes
which depend on it.

Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: fix cgroup_subsys_state leak for seq_files · e605b365

Authored by Tejun Heo on Nov 27, 2013

If a cgroup file implements either read_map() or read_seq_string(),
such file is served using seq_file by overriding file->f_op to
cgroup_seqfile_operations, which also overrides the release method to
single_release() from cgroup_file_release().

Because cgroup_file_open() didn't use to acquire any resources, this
used to be fine, but since f7d58818 ("cgroup: pin
cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open()
pins the css (cgroup_subsys_state) which is put by
cgroup_file_release().  The patch forgot to update the release path
for seq_files and each open/release cycle leaks a css reference.

Fix it by updating cgroup_file_release() to also handle seq_files and
using it for seq_file release path too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.12

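The leak described in the commit above comes down to an open path that always takes a reference and a release path that only sometimes drops it. The following userspace model is a sketch under that reading; css_model, file_model and the model_*() functions are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects named in the commit message. */
struct css_model {
	int refcnt;		/* references pinned by open files */
};

struct file_model {
	struct css_model *css;
	bool is_seq_file;	/* served through the seq_file path? */
};

/* Open always pins the css, whichever read path serves the file. */
static void model_open(struct file_model *f, struct css_model *css,
		       bool is_seq_file)
{
	assert(f != NULL && css != NULL);
	f->css = css;
	f->is_seq_file = is_seq_file;
	css->refcnt++;
}

/* Buggy release: the seq_file path never drops the css reference. */
static void model_release_buggy(struct file_model *f)
{
	if (f->is_seq_file)
		return;		/* seq_file teardown only: reference leaked */
	f->css->refcnt--;
}

/* Fixed release: one function handles both kinds of file. */
static void model_release_fixed(struct file_model *f)
{
	/* seq_file-specific teardown would go here when f->is_seq_file */
	f->css->refcnt--;
}
```

With the buggy release, every open/release cycle of a seq_file-backed file leaves the count one higher than before. With the fixed release the count returns to its starting value.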

cpuset: Fix memory allocator deadlock · 0fc0287c

Authored by Peter Zijlstra on Nov 26, 2013

Juri hit the below lockdep report:

[    4.303391] ======================================================
[    4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[    4.303394] 3.12.0-dl-peterz+ #144 Not tainted
[    4.303395] ------------------------------------------------------
[    4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[    4.303399]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
[    4.303417]
[    4.303417] and this task is already holding:
[    4.303418]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
[    4.303431] which would create a new lock dependency:
[    4.303432]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[    4.303436]

[    4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[    4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
[    4.303922]    HARDIRQ-ON-W at:
[    4.303923]                     [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
[    4.303926]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[    4.303929]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
[    4.303931]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[    4.303933]    SOFTIRQ-ON-W at:
[    4.303933]                     [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
[    4.303935]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[    4.303940]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
[    4.303955]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[    4.303959]    INITIAL USE at:
[    4.303960]                    [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
[    4.303963]                    [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[    4.303966]                    [<ffffffff81063dd6>] kthreadd+0x86/0x180
[    4.303969]                    [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[    4.303972]  }

Which reports that we take mems_allowed_seq with interrupts enabled. A
little digging found that this can only be from
cpuset_change_task_nodemask().

This is an actual deadlock because an interrupt doing an allocation will
hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
forever waiting for the write side to complete.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Juri Lelli <juri.lelli@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

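The spin described in the commit above follows from how a sequence counter works: the writer makes the counter odd while it updates, and a reader waits for an even value before it proceeds. Below is a simplified, single-threaded model of that protocol. The names are hypothetical, and the real kernel primitives also deal with memory ordering, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

struct seqcount_model {
	unsigned int sequence;	/* odd while a write is in progress */
};

static void model_write_begin(struct seqcount_model *s)
{
	s->sequence++;		/* becomes odd: readers must wait */
}

static void model_write_end(struct seqcount_model *s)
{
	assert(s->sequence & 1u);	/* must be inside a write section */
	s->sequence++;		/* becomes even: readers may proceed */
}

/* One iteration of the reader's wait loop.  A real reader keeps
 * calling this until it returns true. */
static bool model_read_try_begin(const struct seqcount_model *s,
				 unsigned int *seq)
{
	*seq = s->sequence;
	return (*seq & 1u) == 0;
}

/* After reading, the reader checks whether a write happened meanwhile. */
static bool model_read_retry(const struct seqcount_model *s,
			     unsigned int seq)
{
	return s->sequence != seq;
}
```

If an interrupt arrives between model_write_begin() and model_write_end() and its handler runs the reader loop on the same CPU, the writer cannot resume until the handler returns, so the counter stays odd and the loop never ends. Keeping interrupts off for the duration of the write section would avoid that situation.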

23 Nov, 2013 24 commits

cgroup: Merge branch 'memcg_event' into for-3.14 · edab9510

Authored by Tejun Heo on Nov 22, 2013

Merge v3.12 based patch series to move cgroup_event implementation to
memcg into for-3.14.  The following two commits cause a conflict in
kernel/cgroup.c

  2ff2a7d0 ("cgroup: kill css_id")
  79bd9814 ("cgroup, memcg: move cgroup_event implementation to memcg")

Each patch removes a struct definition from kernel/cgroup.c.  As the
two are adjacent, they cause a context conflict.  Easily resolved by
removing both structs.

Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: unexport cgroup_css() and remove __file_cft() · b36824c7

Authored by Tejun Heo on Nov 22, 2013

Now that cgroup_event is made memcg specific, the temporarily exported
functions are no longer necessary.  Unexport cgroup_css() and remove
__file_cft() which doesn't have any user left.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

memcg: rename cgroup_event to mem_cgroup_event · 3bc942f3

Authored by Tejun Heo on Nov 22, 2013

cgroup_event is only available in memcg now.  Let's brand it that way.
While at it, add a comment encouraging deprecation of the feature and
remove the respective section from cgroup documentation.

This patch is cosmetic.

v3: Typo update as per Li Zefan.

v2: Index in cgroups.txt updated accordingly as suggested by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>

memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state · 59b6f873

Authored by Tejun Heo on Nov 22, 2013

cgroup_event is now memcg specific.  Replace cgroup_event->css with
->memcg and convert [un]register_event() callbacks to take mem_cgroup
pointer instead of cgroup_subsys_state one.  This simplifies the code
slightly and makes css_to_vmpressure() unnecessary which is removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>

memcg: remove cgroup_event->cft · 347c4a87

Authored by Tejun Heo on Nov 22, 2013

The only use of cgroup_event->cft is distinguishing "usage_in_bytes"
and "memsw.usage_in_bytes" for mem_cgroup_usage_[un]register_event(),
which can be done by adding an explicit argument to the function and
implementing two wrappers so that the two cases can be distinguished
from the function alone.

Remove cgroup_event->cft and the related code including
[un]register_events() methods.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>

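The "explicit argument plus two wrappers" pattern from the commit above can be sketched in a few lines. The names below (res_type, memcg_model and the *_register_event() functions) are hypothetical and only illustrate the shape of the change: one common function takes the resource type as a parameter, and each wrapper fixes that parameter so the callback pointer alone identifies the case.

```c
#include <assert.h>
#include <stddef.h>

enum res_type { RES_MEM, RES_MEMSWAP };	/* hypothetical names */

struct memcg_model {
	int mem_thresholds;	/* events registered on "usage_in_bytes" */
	int memsw_thresholds;	/* events registered on the memsw file */
};

/* Common implementation: the resource type is an explicit argument
 * instead of being inferred from a file descriptor structure. */
static int usage_register_event_common(struct memcg_model *memcg,
				       enum res_type type)
{
	assert(memcg != NULL);
	switch (type) {
	case RES_MEM:
		memcg->mem_thresholds++;
		return 0;
	case RES_MEMSWAP:
		memcg->memsw_thresholds++;
		return 0;
	}
	return -1;	/* unknown type */
}

/* Two thin wrappers share one callback signature. */
static int mem_usage_register_event(struct memcg_model *memcg)
{
	return usage_register_event_common(memcg, RES_MEM);
}

static int memsw_usage_register_event(struct memcg_model *memcg)
{
	return usage_register_event_common(memcg, RES_MEMSWAP);
}

typedef int (*register_event_fn)(struct memcg_model *memcg);
```

A caller that stores a register_event_fn no longer needs any side information about which file the event was registered on, because choosing the wrapper already encodes it.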

cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg · fba94807

Authored by Tejun Heo on Nov 22, 2013

cgroup_event is being moved from cgroup core to memcg and the
implementation is already moved by the previous patch.  This patch
moves the data fields and callbacks.

* cgroup->event_list[_lock] are moved to mem_cgroup.

* cftype->[un]register_event() are moved to cgroup_event.  This makes
  it impossible for individual cftype definitions to specify their
  event callbacks.  This is worked around by simply hard-coding
  filename to event callback mapping in cgroup_write_event_control().
  This is awkward and inflexible, which is actually desirable given
  that we don't want to grow more usages of this feature.

* eventfd_ctx declaration is removed from cgroup.h, which makes
  vmpressure.h miss eventfd_ctx declaration.  Include eventfd.h from
  vmpressure.h.

v2: Use file name from dentry instead of cftype.  This will allow
    removing all cftype handling in the function.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>

memcg: cgroup_write_event_control() now knows @css is for memcg · b5557c4c

Authored by Tejun Heo on Nov 22, 2013

@css for cgroup_write_event_control() is now always for memcg and the
target file should be a memcg file too.  Drop code which assumes @css
is dummy_css and the target file may belong to different subsystems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

cgroup, memcg: move cgroup_event implementation to memcg · 79bd9814

Authored by Tejun Heo on Nov 22, 2013

cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent.  Simply generating
events at fixed points, or if that's too restrictive, configurable
cadence or single set of configurable points should be enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Aside from the above change, this is pure code relocation.

v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>

cgroup: use a dedicated workqueue for cgroup destruction · e5fca243

Authored by Tejun Heo on Nov 22, 2013

Since be445626 ("cgroup: remove synchronize_rcu() from
cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
freeing is performed from a work item from that point on and a later
commit, ea15f8cc ("cgroup: split cgroup destruction into two
steps"), moves css offlining to workqueue too.

As cgroup destruction isn't depended upon for memory reclaim, the
destruction work items were put on the system_wq; unfortunately, some
controller may block in the destruction path for considerable duration
while holding cgroup_mutex.  As large part of destruction path is
synchronized through cgroup_mutex, when combined with high rate of
cgroup removals, this has potential to fill up system_wq's max_active
of 256.

Also, it turns out that memcg's css destruction path ends up queueing
and waiting for work items on system_wq through work_on_cpu().  If
such operation happens while system_wq is fully occupied by cgroup
destruction work items, work_on_cpu() can't make forward progress
because system_wq is full and other destruction work items on
system_wq can't make forward progress because the work item waiting
for work_on_cpu() is holding cgroup_mutex, leading to deadlock.

This can be fixed by queueing destruction work items on a separate
workqueue.  This patch creates a dedicated workqueue -
cgroup_destroy_wq - for this purpose.  As these work items shouldn't
have inter-dependencies and mostly serialized by cgroup_mutex anyway,
giving high concurrency level doesn't buy anything and the workqueue's
@max_active is set to 1 so that destruction work items are executed
one by one on each CPU.

Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
separate core_initcall().  In the future, we probably want to reorder
so that workqueue init happens before cgroup_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
Cc: stable@vger.kernel.org # v3.9+

Linux 3.13-rc1 · 6ce4eac1

Authored by Linus Torvalds on Nov 22, 2013

Merge tag 'ecryptfs-3.13-rc1-quiet-checkers' of... · 57498f9c

Authored by Linus Torvalds on Nov 22, 2013

Merge tag 'ecryptfs-3.13-rc1-quiet-checkers' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull minor eCryptfs fix from Tyler Hicks:
 "Quiet static checkers by removing unneeded conditionals"

* tag 'ecryptfs-3.13-rc1-quiet-checkers' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: file->private_data is always valid

Merge tag 'sound-fix2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e48f88a3

Authored by Linus Torvalds on Nov 22, 2013

Pull second set of sound fixes from Takashi Iwai:
 "A collection of small fixes in HD-audio quirks and runtime PM, ASoC
  rcar, ab8500 and other codecs.  Most of commits are for stable
  kernels, too"

* tag 'sound-fix2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Set current_headset_type to ALC_HEADSET_TYPE_ENUM (janitorial)
  ALSA: hda - Provide missing pin configs for VAIO with ALC260
  ALSA: hda - Add headset quirk for Dell Inspiron 3135
  ALSA: hda - Fix the headphone jack detection on Sony VAIO TX
  ALSA: hda - Fix missing bass speaker on ASUS N550
  ALSA: hda - Fix unbalanced runtime PM notification at resume
  ASoC: arizona: Set FLL to free-run before disabling
  ALSA: hda - A casual Dell Headset quirk
  ASoC: rcar: fixup dma_async_issue_pending() timing
  ASoC: rcar: off by one in rsnd_scu_set_route()
  ASoC: wm5110: Add post SYSCLK register patch for rev D chip
  ASoC: ab8500: Revert to using custom I/O functions
  ALSA: hda - Also enable mute/micmute LED control for "Lenovo dock" fixup
  ALSA: firewire-lib: include sound/asound.h to refer to snd_pcm_format_t
  ALSA: hda - Select FW_LOADER from CONFIG_SND_HDA_CODEC_CA0132_DSP
  ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Realtek codec
  ASoC: rcar: fixup mod access before checking

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · aecde27c

Authored by Linus Torvalds on Nov 22, 2013

Pull DRM fixes from Dave Airlie:
 "I was going to leave this until post -rc1 but sysfs fixes broke
  hotplug in userspace, so I had to fix it harder, otherwise a set of
  pulls from intel, radeon and vmware,

  The vmware/ttm changes are bit larger but since its early and they are
  unlikely to break anything else I put them in, it lets vmware work
  with dri3"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (36 commits)
  drm/sysfs: fix hotplug regression since lifetime changes
  drm/exynos: g2d: fix memory leak to userptr
  drm/i915: Fix gen3 self-refresh watermarks
  drm/ttm: Remove set_need_resched from the ttm fault handler
  drm/ttm: Don't move non-existing data
  drm/radeon: hook up backlight functions for CI and KV family.
  drm/i915: Replicate BIOS eDP bpp clamping hack for hsw
  drm/i915: Do not enable package C8 on unsupported hardware
  drm/i915: Hold pc8 lock around toggling pc8.gpu_idle
  drm/i915: encoder->get_config is no longer optional
  drm/i915/tv: add ->get_config callback
  drm/radeon/cik: Add macrotile mode array query
  drm/radeon/cik: Return backend map information to userspace
  drm/vmwgfx: Make vmwgfx dma buffers prime aware
  drm/vmwgfx: Make surfaces prime-aware
  drm/vmwgfx: Hook up the prime ioctls
  drm/ttm: Add a minimal prime implementation for ttm base objects
  drm/vmwgfx: Fix false lockdep warning
  drm/ttm: Allow execbuf util reserves without ticket
  drm/i915: restore the early forcewake cleanup
  ...

Merge tag 'pci-v3.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · e3414786

Authored by Linus Torvalds on Nov 22, 2013

Pull PCI updates from Bjorn Helgaas:
 "Miscellaneous
   - Remove duplicate disable from pcie_portdrv_remove() (Yinghai Lu)
   - Fix whitespace, capitalization, and spelling errors (Bjorn Helgaas)"

* tag 'pci-v3.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove()
  PCI: Fix whitespace, capitalization, and spelling errors

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · b0e3636f

Authored by Linus Torvalds on Nov 22, 2013

Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 0032cdef

Authored by Linus Torvalds on Nov 22, 2013

Pull hwmon fixes from Guenter Roeck:
 - acpi_power_meter: Fix return value check from call to
   acpi_bus_get_device
 - nct6775: Fix/improve NCT6791 support
 - lm75: Add support for GMT G751

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (acpi_power_meter) Fix acpi_bus_get_device() return value check
  hwmon: (nct6775) NCT6791 supports weight control only for CPUFAN
  hwmon: (nct6775) Monitor additional temperature registers
  hwmon: (lm75) Add support for GMT G751 chip

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · d2c2ad54

Authored by Linus Torvalds on Nov 22, 2013

Pull networking fixes from David Miller:

 1) Fix memory leaks and other issues in mwifiex driver, from Amitkumar
    Karwar.

 2) skb_segment() can choke on packets using frag lists, fix from
    Herbert Xu with help from Eric Dumazet and others.

 3) IPv4 output cached route instantiation properly handles races
    involving two threads trying to install the same route, but we
    forgot to propagate this logic to input routes as well.  Fix from
    Alexei Starovoitov.

 4) Put protections in place to make sure that recvmsg() paths never
    accidentally copy uninitialized memory back into userspace and also
    make sure that we never try to use more than sockaddr_storage for
    building the on-kernel-stack copy of a sockaddr.  Fixes from Hannes
    Frederic Sowa.

 5) R8152 driver transmit flow bug fixes from Hayes Wang.

 6) Fix some minor fallouts from genetlink changes, from Johannes Berg
    and Michael Opdenacker.

 7) AF_PACKET sendmsg path can race with netdevice unregister notifier,
    fix by using RCU to make sure the network device doesn't go away
    from under us.  Fix from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  gso: handle new frag_list of frags GRO packets
  genetlink: fix genl_set_err() group ID
  genetlink: fix genlmsg_multicast() bug
  packet: fix use after free race in send path when dev is released
  xen-netback: stop the VIF thread before unbinding IRQs
  wimax: remove dead code
  net/phy: Add the autocross feature for forced links on VSC82x4
  net/phy: Add VSC8662 support
  net/phy: Add VSC8574 support
  net/phy: Add VSC8234 support
  net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage)
  net: rework recvmsg handler msg_name and msg_namelen logic
  bridge: flush br's address entry in fdb when remove the
  net: core: Always propagate flag changes to interfaces
  ipv4: fix race in concurrent ip_route_input_slow()
  r8152: fix incorrect type in assignment
  r8152: support stopping/waking tx queue
  r8152: modify the tx flow
  r8152: fix tx/rx memory overflow
  netfilter: ebt_ip6: fix source and destination matching
  ...

d2c2ad54

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 7fa850ab

Committed by Linus Torvalds on Nov 22, 2013

Pull ARM fixes from Russell King:
 "Some small fixes for this merge window, most of them quite self
  explanatory - the biggest thing here is a fix for the ARMv7 LPAE
  suspend/resume support"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7894/1: kconfig: select GENERIC_CLOCKEVENTS if HAVE_ARM_ARCH_TIMER
  ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP
  ARM: 7892/1: Fix warning for V7M builds
  ARM: 7888/1: seccomp: not compatible with ARM OABI
  ARM: 7886/1: make OABI default to off
  ARM: 7885/1: Save/Restore 64-bit TTBR registers on LPAE suspend/resume
  ARM: 7884/1: mm: Fix ECC mem policy printk
  ARM: 7883/1: fix mov to mvn conversion in case of 64 bit phys_addr_t and BE
  ARM: 7882/1: mm: fix __phys_to_virt to work with 64 bit phys_addr_t in BE case
  ARM: 7881/1: __fixup_smp read of SCU config should do byteswap in BE case
  ARM: Fix nommu.c build warning

7fa850ab

Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm · c874e6fc

Committed by Linus Torvalds on Nov 22, 2013

Pull KVM fixes from Gleb Natapov.

* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: kvm_clear_guest_page(): fix empty_zero_page usage
  kvm: mmu: delay mmu audit activation
  arm/arm64: KVM: Fix hyp mappings of vmalloc regions

c874e6fc

Merge git://git.kvack.org/~bcrl/aio-next · d0f278c1

Committed by Linus Torvalds on Nov 22, 2013

Pull aio fixes from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: nullify aio->ring_pages after freeing it
  aio: prevent double free in ioctx_alloc
  aio: Fix a trinity splat

d0f278c1

Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linux · 533db9b3

Committed by Linus Torvalds on Nov 22, 2013

Pull nfsd bugfixes from Bruce Fields:
 "A couple nfsd bugfixes"

* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix xdr decoding of large non-write compounds
  nfsd: make sure to balance get/put_write_access
  nfsd: split up nfsd_setattr

533db9b3

Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes · c85e0727

Committed by Linus Torvalds on Nov 22, 2013

Pull GFS2 fixes from Steven Whitehouse:
 "A couple of small, but important bug fixes for GFS2.  The first one
  fixes a possible NULL pointer dereference, and the second one resolves
  a reference counting issue in one of the lesser used paths through
  atomic_open"

* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Fix ref count bug relating to atomic_open
  GFS2: fix potential NULL pointer dereference

c85e0727

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · fb0d1eb8

Committed by Linus Torvalds on Nov 22, 2013

Pull btrfs fixes from Chris Mason:
 "Almost all of these are bug fixes.  Dave Sterba's documentation update
  is the big exception because he removed our promises to set any
  machine running Btrfs on fire"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Documentation: filesystems: update btrfs tools section
  Documentation: filesystems: add new btrfs mount options
  btrfs: update kconfig help text
  btrfs: fix bio_size_ok() for max_sectors > 0xffff
  btrfs: Use trace condition for get_extent tracepoint
  btrfs: fix typo in the log message
  Btrfs: fix list delete warning when removing ordered root from the list
  Btrfs: print bytenr instead of page pointer in check-int
  Btrfs: remove dead codes from ctree.h
  Btrfs: don't wait for ordered data outside desired range
  Btrfs: fix lockdep error in async commit
  Btrfs: avoid heavy operations in btrfs_commit_super
  Btrfs: fix __btrfs_start_workers retval
  Btrfs: disable online raid-repair on ro mounts
  Btrfs: do not inc uncorrectable_errors counter on ro scrubs
  Btrfs: only drop modified extents if we logged the whole inode
  Btrfs: make sure to copy everything if we rename
  Btrfs: don't BUG_ON() if we get an error walking backrefs

fb0d1eb8

Merge tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs · 6ea9786e

Committed by Linus Torvalds on Nov 22, 2013

Pull second xfs update from Ben Myers:
 "There are a couple of patches that I wasn't quite sure about in time
  for our initial 3.13 pull request, a bugfix, and an update to add Dave
  to MAINTAINERS:

  Here we have a performance fix for inode iversion, increased inode
  cluster size for v5 superblock filesystems, a fix for error handling
  in xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave"

* tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: open code inc_inode_iversion when logging an inode
  xfs: increase inode cluster size for v5 filesystems
  xfs: fix unlock in xfs_bmap_add_attrfork
  xfs: update maintainers

6ea9786e

openanolis / cloud-kernel: synced successfully more than a year ago
