提交 · 86a3db5643c7d29bb36ca85c7a4bb67ad4d88d77 · openeuler / raspberrypi-kernel

25 1月, 2013 3 次提交

cgroup: remove duplicate RCU free on struct cgroup · 86a3db56

由 Li Zefan 提交于 1月 24, 2013

When destroying a cgroup, though in cgroup_diput() we've called
synchronize_rcu(), we then still have to free it via call_rcu().

The story is, long ago to fix a race between reading /proc/sched_debug
and freeing cgroup, the code was changed to utilize call_rcu(). See
commit a47295e6 ("cgroups: make
cgroup_path() RCU-safe")

As we've fixed cpu cgroup that cpu_cgroup_offline_css() is used
to unregister a task_group so there won't be concurrent access
to this task_group after synchronize_rcu() in diput(). Now we can
just kfree(cgrp).
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

86a3db56

cgroup: initialize cgrp->dentry before css_alloc() · fe1c06ca

由 Li Zefan 提交于 1月 24, 2013

With this change, we're guaranteed that cgroup_path() won't see NULL
cgrp->dentry, and thus we can remove the NULL check in it.

(Well, it's not strictly true, because dummptop.dentry is always NULL
 but we already handle that separately.)
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

fe1c06ca

cgroup: remove a NULL check in cgroup_exit() · b5d646f5

由 Li Zefan 提交于 1月 24, 2013

init_task.cgroups is initialized at boot phase, and whenver a ask
is forked, it's cgroups pointer is inherited from its parent, and
it's never set to NULL afterwards.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

b5d646f5

23 1月, 2013 1 次提交

cgroup: fix bogus kernel warnings when cgroup_create() failed · 2739d3cc

由 Li Zefan 提交于 1月 21, 2013

If cgroup_create() failed and cgroup_destroy_locked() is called to
do cleanup, we'll see a bunch of warnings:

cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2
cgroup_addrm_files: failed to remove prioidx, err=-2
cgroup_addrm_files: failed to remove ifpriomap, err=-2
...

We failed to remove those files, because cgroup_create() has failed
before creating those cgroup files.

To fix this, we simply don't warn if cgroup_rm_file() can't find the
cft entry.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

2739d3cc

15 1月, 2013 2 次提交

cgroup: remove synchronize_rcu() from rebind_subsystems() · 130e3695

由 Li Zefan 提交于 1月 14, 2013

Nothing's protected by RCU in rebind_subsystems(), and I can't think
of a reason why it is needed.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

130e3695

cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() · 5d65bc0c

由 Li Zefan 提交于 1月 14, 2013

These 2 syncronize_rcu()s make attaching a task to a cgroup
quite slow, and it can't be ignored in some situations.

A real case from Colin Cross: Android uses cgroups heavily to
manage thread priorities, putting threads in a background group
with reduced cpu.shares when they are not visible to the user,
and in a foreground group when they are. Some RPCs from foreground
threads to background threads will temporarily move the background
thread into the foreground group for the duration of the RPC.
This results in many calls to cgroup_attach_task.

In cgroup_attach_task() it's task->cgroups that is protected by RCU,
and put_css_set() calls kfree_rcu() to free it.

If we remove this synchronize_rcu(), there can be threads in RCU-read
sections accessing their old cgroup via current->cgroups with
concurrent rmdir operation, but this is safe.

 # time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }

real    0m2.524s
user    0m0.008s
sys     0m0.004s

With this patch:

real    0m0.004s
user    0m0.004s
sys     0m0.000s

tj: These synchronize_rcu()s are utterly confused.  synchornize_rcu()
    necessarily has to come between two operations to guarantee that
    the changes made by the former operation are visible to all rcu
    readers before proceeding to the latter operation.  Here,
    synchornize_rcu() are at the end of attach operations with nothing
    beyond it.  Its only effect would be delaying completion of
    write(2) to sysfs tasks/procs files until all rcu readers see the
    change, which doesn't mean anything.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NColin Cross <ccross@google.com>

5d65bc0c

11 1月, 2013 1 次提交

cgroup: use new hashtable implementation · 0ac801fe

由 Li Zefan 提交于 1月 10, 2013

Switch cgroup to use the new hashtable implementation. No functional changes.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

0ac801fe

08 1月, 2013 1 次提交

cgroup: implement cgroup_rightmost_descendant() · 12a9d2fe

由 Tejun Heo 提交于 1月 07, 2013

Implement cgroup_rightmost_descendant() which returns the right most
descendant of the specified cgroup.  This can be used to skip the
cgroup's subtree while iterating with
cgroup_for_each_descendant_pre().
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NLi Zefan <lizefan@huawei.com>

12a9d2fe

18 12月, 2012 1 次提交

kernel: remove reference to feature-removal-schedule.txt · 8ec7d50f

由 Tao Ma 提交于 12月 17, 2012

In commit 9c0ece06 ("Get rid of Documentation/feature-removal.txt"),
Linus removed feature-removal-schedule.txt from Documentation, but there
is still some reference to this file.  So remove them.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ec7d50f

07 12月, 2012 1 次提交

cgroup_rm_file: don't delete the uncreated files · f33fddc2

由 Gao feng 提交于 12月 06, 2012

in cgroup_add_file,when creating files for cgroup,
some of creation may be skipped. So we need to avoid
deleting these uncreated files in cgroup_rm_file,
otherwise the warning msg will be triggered.

"cgroup_addrm_files: failed to remove memory_pressure_enabled, err=-2"
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@redhat.com>
Cc: stable@vger.kernel.org

f33fddc2

04 12月, 2012 1 次提交

cgroup: remove subsystem files when remounting cgroup · 7083d037

由 Gao feng 提交于 12月 03, 2012

cgroup_clear_directroy is called by cgroup_d_remove_dir
and cgroup_remount.

when we call cgroup_remount to remount the cgroup,the subsystem
may be unlinked from cgroupfs_root->subsys_list in rebind_subsystem,this
subsystem's files will not be removed in cgroup_clear_directroy.
And the system will panic when we try to access these files.

this patch removes subsystems's files before rebind_subsystems,
if rebind_subsystems failed, repopulate these removed files.

With help from Tejun.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

7083d037

01 12月, 2012 1 次提交

cgroup: use cgroup_addrm_files() in cgroup_clear_directory() · 879a3d9d

由 Gao feng 提交于 12月 01, 2012

cgroup_clear_directory() incorrectly invokes cgroup_rm_file() on each
cftset of the target subsystems, which only removes the first file of
each set.  This leaves dangling files after subsystems are removed
from a cgroup root via remount.

Use cgroup_addrm_files() to remove all files of target subsystems.

tj: Move cgroup_addrm_files() prototype decl upwards next to other
    global declarations.  Commit message updated.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

879a3d9d

30 11月, 2012 1 次提交

cgroup: warn about broken hierarchies only after css_online · 1f869e87

由 Glauber Costa 提交于 11月 30, 2012

If everything goes right, it shouldn't really matter if we are spitting
this warning after css_alloc or css_online. If we fail between then,
there are some ill cases where we would previously see the message and
now we won't (like if the files fail to be created).

I believe it really shouldn't matter: this message is intended in spirit
to be shown when creation succeeds, but with insane settings.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

1f869e87

29 11月, 2012 2 次提交

cgroup: list_del_init() on removed events · 9718ceb3

由 Greg Thelen 提交于 11月 28, 2012

Use list_del_init() rather than list_del() to remove events from
cgrp->event_list.  No functional change.  This is just defensive
coding.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

9718ceb3

cgroup: fix lockdep warning for event_control · 205a872b

由 Greg Thelen 提交于 11月 28, 2012

The cgroup_event_wake() function is called with the wait queue head
locked and it takes cgrp->event_list_lock. However, in cgroup_rmdir()
remove_wait_queue() was being called after taking
cgrp->event_list_lock.  Correct the lock ordering by using a temporary
list to obtain the event list to remove from the wait queue.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Signed-off-by: NAaron Durbin <adurbin@google.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

205a872b

28 11月, 2012 1 次提交

cgroup: move list add after list head initilization · fddfb02a

由 Li Zhong 提交于 11月 28, 2012

2243076a ("cgroup: initialize cgrp->allcg_node in
init_cgroup_housekeeping()") initializes cgrp->allcg_node in
init_cgroup_housekeeping().  Then in init_cgroup_root(), we should
call init_cgroup_housekeeping() before adding it to &root->allcg_list;
otherwise, we are initializing an entry already in a list.
Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

fddfb02a

20 11月, 2012 20 次提交

cgroup: remove obsolete guarantee from cgroup_task_migrate. · d0b2fdd2

由 Tao Ma 提交于 11月 20, 2012

'guarantee' is already removed from cgroup_task_migrate, so remove
the corresponding comments. Some other typos in cgroup are also
changed.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

d0b2fdd2

cgroup: add cgroup->id · 0a950f65

由 Tejun Heo 提交于 11月 19, 2012

With the introduction of generic cgroup hierarchy iterators, css_id is
being phased out.  It was unnecessarily complex, id'ing the wrong
thing (cgroups need IDs, not CSSes) and has other oddities like not
being available at ->css_alloc().

This patch adds cgroup->id, which is a simple per-hierarchy
ida-allocated ID which is assigned before ->css_alloc() and released
after ->css_free().
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>

0a950f65

cgroup, cpuset: remove cgroup_subsys->post_clone() · 033fa1c5

由 Tejun Heo 提交于 11月 19, 2012

Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone().  Now
that clone_children is cpuset specific, there's no reason to have this
rather odd option activation mechanism in cgroup core.  cpuset can
check the flag from its ->css_allocate() and take the necessary
action.

Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().

Loosely based on Glauber's "generalize post_clone into post_create"
patch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Original-patch-by: NGlauber Costa <glommer@parallels.com>
Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

033fa1c5

cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ · 2260e7fc

由 Tejun Heo 提交于 11月 19, 2012

clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation.  Also, drop
clone_children() wrapper in cgroup.c.  The thin wrapper is used only a
few times and one of them will go away soon.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

2260e7fc

cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() · 92fb9748

由 Tejun Heo 提交于 11月 19, 2012

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are.  Also, update documentation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

92fb9748

cgroup: allow ->post_create() to fail · b1929db4

由 Tejun Heo 提交于 11月 19, 2012

There could be cases where controllers want to do initialization
operations which may fail from ->post_create().  This patch makes
->post_create() return -errno to indicate failure and online_css()
relay such failures.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

b1929db4

cgroup: update cgroup_create() failure path · 4b8b47eb

由 Tejun Heo 提交于 11月 19, 2012

cgroup_create() was ignoring failure of cgroupfs files.  Update it
such that, if file creation fails, it rolls back by calling
cgroup_destroy_locked() and returns failure.

Note that error out goto labels are renamed.  The labels are a bit
confusing but will become better w/ later cgroup operation renames.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

4b8b47eb

cgroup: use mutex_trylock() when grabbing i_mutex of a new cgroup directory · b8a2df6a

由 Tejun Heo 提交于 11月 19, 2012

All cgroup directory i_mutexes nest outside cgroup_mutex; however, new
directory creation is a special case.  A new cgroup directory is
created while holding cgroup_mutex.  Populating the new directory
requires both the new directory's i_mutex and cgroup_mutex.  Because
all directory i_mutexes nest outside cgroup_mutex, grabbing both
requires releasing cgroup_mutex first, which isn't a good idea as the
new cgroup isn't yet ready to be manipulated by other cgroup
opreations.

This is worked around by grabbing the new directory's i_mutex while
holding cgroup_mutex before making it visible.  As there's no other
user at that point, grabbing the i_mutex under cgroup_mutex can't lead
to deadlock.

cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to
worry about the reverse locking order; however, this creates pseudo
locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true -
all directory i_mutexes are still nested outside cgroup_mutex.  This
pseudo locking dependency can lead to spurious lockdep warnings.

Use mutex_trylock() instead.  This will always succeed and lockdep
doesn't create any locking dependency for it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

b8a2df6a

cgroup: simplify cgroup_load_subsys() failure path · d19e19de

由 Tejun Heo 提交于 11月 19, 2012

Now that cgroup_unload_subsys() can tell whether the root css is
online or not, we can safely call cgroup_unload_subsys() after idr
init failure in cgroup_load_subsys().

Replace the manual unrolling and invoke cgroup_unload_subsys() on
failure.  This drops cgroup_mutex inbetween but should be safe as the
subsystem will fail try_module_get() and thus can't be mounted
inbetween.  As this means that cgroup_unload_subsys() can be called
before css_sets are rehashed, remove BUG_ON() on %NULL
css_set->subsys[] from cgroup_unload_subsys().
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

d19e19de

cgroup: introduce CSS_ONLINE flag and on/offline_css() helpers · a31f2d3f

由 Tejun Heo 提交于 11月 19, 2012

New helpers on/offline_css() respectively wrap ->post_create() and
->pre_destroy() invocations.  online_css() sets CSS_ONLINE after
->post_create() is complete and offline_css() invokes ->pre_destroy()
iff CSS_ONLINE is set and clears it while also handling the temporary
dropping of cgroup_mutex.

This patch doesn't introduce any behavior change at the moment but
will be used to improve cgroup_create() failure path and allow
->post_create() to fail.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

a31f2d3f

cgroup: separate out cgroup_destroy_locked() · 42809dd4

由 Tejun Heo 提交于 11月 19, 2012

Separate out cgroup_destroy_locked() from cgroup_destroy().  This will
be later used in cgroup_create() failure path.

While at it, add lockdep asserts on i_mutex and cgroup_mutex, and move
@d and @parent assignments to their declarations.

This patch doesn't introduce any functional difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

42809dd4

cgroup: fix harmless bugs in cgroup_load_subsys() fail path and cgroup_unload_subsys() · 02ae7486

由 Tejun Heo 提交于 11月 19, 2012

* If idr init fails, cgroup_load_subsys() cleared dummytop->subsys[]
  before calilng ->destroy() making CSS inaccessible to the callback,
  and didn't unlink ss->sibling.  As no modular controller uses
  ->use_id, this doesn't cause any actual problems.

* cgroup_unload_subsys() was forgetting to free idr, call
  ->pre_destroy() and clear ->active.  As there currently is no
  modular controller which uses ->use_id, ->pre_destroy() or ->active,
  this doesn't cause any actual problems.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

02ae7486

cgroup: lock cgroup_mutex in cgroup_init_subsys() · 648bb56d

由 Tejun Heo 提交于 11月 19, 2012

Make cgroup_init_subsys() grab cgroup_mutex while initializing a
subsystem so that all helpers and callbacks are called under the
context they expect.  This isn't strictly necessary as
cgroup_init_subsys() doesn't race with anybody but will allow adding
lockdep assertions.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

648bb56d

cgroup: trivial cleanup for cgroup_init/load_subsys() · b48c6a80

由 Tejun Heo 提交于 11月 19, 2012

Consistently use @css and @dummytop in these two functions instead of
referring to them indirectly.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

b48c6a80

cgroup: make CSS_* flags bit masks instead of bit positions · 38b53aba

由 Tejun Heo 提交于 11月 19, 2012

Currently, CSS_* flags are defined as bit positions and manipulated
using atomic bitops.  There's no reason to use atomic bitops for them
and bit positions are clunkier to deal with than bit masks.  Make
CSS_* bit masks instead and use the usual C bitwise operators to
access them.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

38b53aba

cgroup: cgroup->dentry isn't a RCU pointer · febfcef6

由 Tejun Heo 提交于 11月 19, 2012

cgroup->dentry is marked and used as a RCU pointer; however, it isn't
one - the final dentry put doesn't go through call_rcu(). cgroup and
dentry share the same RCU freeing rule via synchronize_rcu() in
cgroup_diput() (kfree_rcu() used on cgrp is unnecessary). If cgrp is
accessible under RCU read lock, so is its dentry and dereferencing
cgrp->dentry doesn't need any further RCU protection or annotation.

While not being accurate, before the previous patch, the RCU accessors
served a purpose as memory barriers - cgroup->dentry used to be
assigned after the cgroup was made visible to cgroup_path(), so the
assignment and dereferencing in cgroup_path() needed the memory
barrier pair. Now that list_add_tail_rcu() happens after
cgroup->dentry is assigned, this no longer is necessary.

Remove the now unnecessary and misleading RCU annotations from
cgroup->dentry. To make up for the removal of rcu_dereference_check()
in cgroup_path(), add an explicit rcu_lockdep_assert(), which asserts
the dereference rule of @cgrp, not cgrp->dentry.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

febfcef6

cgroup: create directory before linking while creating a new cgroup · 4e139afc

由 Tejun Heo 提交于 11月 19, 2012

While creating a new cgroup, cgroup_create() links the newly allocated
cgroup into various places before trying to create its directory.
Because cgroup life-cycle is tied to the vfs objects, this makes it
impossible to use cgroup_rmdir() for rolling back creation - the
removal logic depends on having full vfs objects.

This patch moves directory creation above linking and collect linking
operations to one place.  This allows directory creation failure to
share error exit path with css allocation failures and any failure
sites afterwards (to be added later) can use cgroup_rmdir() logic to
undo creation.

Note that this also makes the memory barriers around cgroup->dentry,
which currently is misleadingly using RCU operations, unnecessary.
This will be handled in the next patch.

While at it, locking BUG_ON() on i_mutex is converted to
lockdep_assert_held().

v2: Patch originally removed %NULL dentry check in cgroup_path();
    however, Li pointed out that this patch doesn't make it
    unnecessary as ->create() may call cgroup_path().  Drop the
    change for now.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

4e139afc

cgroup: open-code cgroup_create_dir() · 28fd6f30

由 Tejun Heo 提交于 11月 19, 2012

The operation order of cgroup creation is about to change and
cgroup_create_dir() is more of a hindrance than a proper abstraction.
Open-code it by moving the parent nlink adjustment next to self nlink
adjustment in cgroup_create_file() and the rest to cgroup_create().

This patch doesn't introduce any behavior change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

28fd6f30

cgroup: initialize cgrp->allcg_node in init_cgroup_housekeeping() · 2243076a

由 Tejun Heo 提交于 11月 19, 2012

Not strictly necessary but it's annoying to have uninitialized
list_head around.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

2243076a

cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() · 17543163

由 Tejun Heo 提交于 11月 19, 2012

cgroup_create_dir() does weird dancing with dentry refcnt.  On
success, it gets and then puts it achieving nothing.  On failure, it
puts but there isn't no matching get anywhere leading to the following
oops if cgroup_create_file() fails for whatever reason.

  ------------[ cut here ]------------
  kernel BUG at /work/os/work/fs/dcache.c:552!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU 2
  Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff811d9c0c>]  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
  RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
  RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
  RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
  R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
  R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
  FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
  Stack:
   ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
   ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
   ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
  Call Trace:
   [<ffffffff811cc889>] done_path_create+0x19/0x50
   [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80
   [<ffffffff811d2009>] sys_mkdir+0x19/0x20
   [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b
  Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
  RIP  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
   RSP <ffff88001a3ebef8>
  ---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org

17543163

19 11月, 2012 1 次提交

pidns: Use task_active_pid_ns where appropriate · 17cf22c3

由 Eric W. Biederman 提交于 3月 02, 2010

The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a processes life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork since the pid has not yet been attached to the
process I use ns_of_pid, to achieve the same effect.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

17cf22c3

10 11月, 2012 3 次提交

cgroup: implement generic child / descendant walk macros · 574bd9f7

由 Tejun Heo 提交于 11月 09, 2012

Currently, cgroup doesn't provide any generic helper for walking a
given cgroup's children or descendants.  This patch adds the following
three macros.

* cgroup_for_each_child() - walk immediate children of a cgroup.

* cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
  in pre-order tree traversal.

* cgroup_for_each_descendant_post() - visit all descendants of a
  cgroup in post-order tree traversal.

All three only require the user to hold RCU read lock during
traversal.  Verifying that each iterated cgroup is online is the
responsibility of the user.  When used with proper synchronization,
cgroup_for_each_descendant_pre() can be used to propagate state
updates to descendants in reliable way.  See comments for details.

v2: s/config/state/ in commit message and comments per Michal.  More
    documentation on synchronization rules.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NLi Zefan <lizefan@huawei.com>

574bd9f7

cgroup: use rculist ops for cgroup->children · eb6fd504

由 Tejun Heo 提交于 11月 09, 2012

Use RCU safe list operations for cgroup->children.  This will be used
to implement cgroup children / descendant walking which can be used by
controllers.

Note that cgroup_create() now puts a new cgroup at the end of the
->children list instead of head.  This isn't strictly necessary but is
done so that the iteration order is more conventional.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>

eb6fd504

cgroup: add cgroup_subsys->post_create() · a8638030

由 Tejun Heo 提交于 11月 09, 2012

Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.

This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.

This patch adds ->post_create().  It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().

When used in combination with the to-be-added generic descendant
iterators, ->post_create() can be used to implement reliable state
inheritance.  It will be explained with the descendant iterators.

v2: Added a paragraph about its future use w/ descendant iterators per
    Michal.

v3: Forgot to add ->post_create() invocation to cgroup_load_subsys().
    Fixed.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

a8638030