1. 15 4月, 2013 4 次提交
    • T
      cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo 提交于
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane consistent behavior while
      maintaining staying fully compatible with what already has been
      exposed to userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes are still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controller's behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mount it again with different
        option, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behaviors changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting sane_behavior
          cftype entry as suggested by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      873fe09e
    • T
      move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo 提交于
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kern/cgroup.c makes somethings pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      25a7e684
    • T
      cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Tejun Heo 提交于
      There's no reason to be using bitops, which tends to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      93438629
    • T
      cgroup: make cgroup_path() not print double slashes · da1f296f
      Tejun Heo 提交于
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slahses at the beginning, which is
      caused by treating the root cgroup which has the name '/' like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special casing root cgroup case and not looping over it in
      the normal path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      da1f296f
  2. 13 4月, 2013 1 次提交
  3. 11 4月, 2013 4 次提交
    • T
      perf: make perf_event cgroup hierarchical · ef824fa1
      Tejun Heo 提交于
      perf_event is one of a couple remaining cgroup controllers with broken
      hierarchy support.  Converting it to support hierarchy is almost
      trivial.  The only thing necessary is to consider a task belonging to
      a descendant cgroup as a match.  IOW, if the cgroup of the currently
      executing task (@cpuctx->cgrp) equals or is a descendant of the
      event's cgroup (@event->cgrp), then the event should be enabled.
      
      Implement hierarchy support and remove .broken_hierarchy tag along
      with the incorrect comment on what needs to be done for hierarchy
      support.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      ef824fa1
    • L
      cgroup: implement cgroup_is_descendant() · 78574cf9
      Li Zefan 提交于
      A couple controllers want to determine whether two cgroups are in
      ancestor/descendant relationship.  As it's more likely that the
      descendant is the primary subject of interest and there are other
      operations focusing on the descendants, let's ask is_descendent rather
      than is_ancestor.
      
      Implementation is trivial as the previous patch guarantees that all
      ancestors of a cgroup stay accessible as long as the cgroup is
      accessible.
      
      tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
          rewrote descriptions.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      78574cf9
    • L
      cgroup: make sure parent won't be destroyed before its children · 415cf07a
      Li Zefan 提交于
      Suppose we rmdir a cgroup and there're still css refs, this cgroup won't
      be freed. Then we rmdir the parent cgroup, and the parent is freed
      immediately due to css ref draining to 0. Now it would be a disaster if
      the still-alive child cgroup tries to access its parent.
      
      Make sure this won't happen.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      415cf07a
    • R
      cgroup: remove bind() method from cgroup_subsys. · 84cfb6ab
      Rami Rosen 提交于
      The bind() method of cgroup_subsys is not used in any of the
      controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
      devices, perf, hugetlb, cpu and cpuacct)
      
      tj: Removed the entry on ->bind() from
          Documentation/cgroups/cgroups.txt.  Also updated a couple
          paragraphs which were suggesting that dynamic re-binding may be
          implemented.  It's not gonna.
      Signed-off-by: NRami Rosen <ramirose@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      84cfb6ab
  4. 08 4月, 2013 7 次提交
  5. 04 4月, 2013 2 次提交
  6. 20 3月, 2013 6 次提交
    • L
      cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458
      Li Zefan 提交于
      These two functions share most of the code.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      081aa458
    • A
      devcg: propagate local changes down the hierarchy · bd2953eb
      Aristeu Rozanski 提交于
      This patch makes exception changes to propagate down in hierarchy respecting
      when possible local exceptions.
      
      New exceptions allowing additional access to devices won't be propagated, but
      it'll be possible to add an exception to access all of part of the newly
      allowed device(s).
      
      New exceptions disallowing access to devices will be propagated down and the
      local group's exceptions will be revalidated for the new situation.
      Example:
            A
           / \
              B
      
          group        behavior          exceptions
          A            allow             "b 8:* rwm", "c 116:1 rw"
          B            deny              "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
      
      If a new exception is added to group A:
      	# echo "c 116:* r" > A/devices.deny
      it'll propagate down and after revalidating B's local exceptions, the exception
      "c 116:2 rwm" will be removed.
      
      In case parent's exceptions change and local exceptions are not allowed anymore,
      they'll be deleted.
      
      v7:
      - do not allow behavior change when the cgroup has children
      - update documentation
      
      v6: fixed issues pointed by Serge Hallyn
      - only copy parent's exceptions while propagating behavior if the local
        behavior is different
      - while propagating exceptions, do not clear and copy parent's: it'd be against
        the premise we don't propagate access to more devices
      
      v5: fixed issues pointed by Serge Hallyn
      - updated documentation
      - not propagating when an exception is written to devices.allow
      - when propagating a new behavior, clean the local exceptions list if they're
        for a different behavior
      
      v4: fixed issues pointed by Tejun Heo
      - separated function to walk the tree and collect valid propagation targets
      
      v3: fixed issues pointed by Tejun Heo
      - update documentation
      - move css_online/css_offline changes to a new patch
      - use cgroup_for_each_descendant_pre() instead of own descendant walk
      - move exception_copy rework to a separared patch
      - move exception_clean rework to a separated patch
      
      v2: fixed issues pointed by Tejun Heo
      - instead of keeping the local settings that won't apply anymore, remove them
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      bd2953eb
    • A
      devcg: use css_online and css_offline · 1909554c
      Aristeu Rozanski 提交于
      Allocate resources and change behavior only when online. This is needed in
      order to determine if a node is suitable for hierarchy propagation or if it's
      being removed.
      
      Locking:
      Both functions take devcgroup_mutex to make changes to device_cgroup structure.
      Hierarchy propagation will also take devcgroup_mutex before walking the
      tree while walking the tree itself is protected by rcu lock.
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1909554c
    • A
      devcg: prepare may_access() for hierarchy support · c39a2a30
      Aristeu Rozanski 提交于
      Currently may_access() is only able to verify if an exception is valid for the
      current cgroup, which has the same behavior. With hierarchy, it'll be also used
      to verify if a cgroup local exception is valid towards its cgroup parent, which
      might have different behavior.
      
      v2:
      - updated patch description
      - rebased on top of a new patch to expand the may_access() logic to make it
        more clear
      - fixed argument description order in may_access()
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      c39a2a30
    • A
      devcg: expand may_access() logic · 26898fdf
      Aristeu Rozanski 提交于
      In order to make the next patch more clear, expand may_access() logic.
      
      v2: may_access() returns bool now
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      26898fdf
    • L
      cgroup: fix an off-by-one bug which may trigger BUG_ON() · 3ac1707a
      Li Zefan 提交于
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array is allocated with
      GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
      allocate memory for it, it'll trigger a BUG_ON().
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      3ac1707a
  7. 13 3月, 2013 6 次提交
  8. 06 3月, 2013 3 次提交
  9. 05 3月, 2013 3 次提交
    • L
      cgroup: no need to check css refs for release notification · f50daa70
      Li Zefan 提交于
      We no longer fail rmdir() when there're still css refs, so we don't
      need to check css refs in check_for_release().
      
      This also voids a bug. cgroup_has_css_refs() accesses subsys[i]
      without cgroup_mutex, so it can race with cgroup_unload_subsys().
      
      cgroup_has_css_refs()
      ...
        if (ss == NULL || ss->root != cgrp->root)
      
      if ss pointers to net_cls_subsys, and cls_cgroup module is unloaded
      right after the former check but before the latter, the memory that
      net_cls_subsys resides has become invalid.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f50daa70
    • L
      cpuset: use cgroup_name() in cpuset_print_task_mems_allowed() · f440d98f
      Li Zefan 提交于
      Use cgroup_name() instead of cgrp->dentry->name. This makes the code
      a bit simpler.
      
      While at it, remove cpuset_name and make cpuset_nodelist a local variable
      to cpuset_print_task_mems_allowed().
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f440d98f
    • L
      cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan 提交于
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing partially rewritten name, but we might access
      a stale pointer because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->name must be protected by dentry->d_lock or
      parent inode's i_mutex, while on the other hand cgroup-path() can
      be called with some irq-safe spinlocks held, we can't generate
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      65dff759
  10. 04 3月, 2013 4 次提交
    • L
      Linux 3.9-rc1 · 6dbe51c2
      Linus Torvalds 提交于
      6dbe51c2
    • L
      Merge tag 'disintegrate-fbdev-20121220' of git://git.infradead.org/users/dhowells/linux-headers · ea882c2e
      Linus Torvalds 提交于
      Pull fbdev UAPI disintegration from David Howells:
       "You'll be glad to here that the end is nigh for the UAPI patches.
        Only the fbdev/framebuffer piece remains now that the SCSI stuff has
        gone in.
      
        Here are the UAPI disintegration bits for the fbdev drivers.  It
        appears that Florian hasn't had time to deal with my patch, but back
        in December he did say he didn't mind if I pushed it forward."
      
      Yay.  No more uapi movement.  And hopefully no more big header file
      cleanups coming up either, it just tends to be very painful.
      
      * tag 'disintegrate-fbdev-20121220' of git://git.infradead.org/users/dhowells/linux-headers:
        UAPI: (Scripted) Disintegrate include/video
      ea882c2e
    • L
      Merge tag 'stable/for-linus-3.9-rc1-tag' of... · 8e8b180a
      Linus Torvalds 提交于
      Merge tag 'stable/for-linus-3.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
       - Update the Xen ACPI memory and CPU hotplug locking mechanism.
       - Fix PAT issues wherein various applications would not start
       - Fix handling of multiple MSI as AHCI now does it.
       - Fix ARM compile failures.
      
      * tag 'stable/for-linus-3.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xenbus: fix compile failure on ARM with Xen enabled
        xen/pci: We don't do multiple MSI's.
        xen/pat: Disable PAT using pat_enabled value.
        xen/acpi: xen cpu hotplug minor updates
        xen/acpi: xen memory hotplug minor updates
      8e8b180a
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 56a79b7b
      Linus Torvalds 提交于
      Pull  more VFS bits from Al Viro:
       "Unfortunately, it looks like xattr series will have to wait until the
        next cycle ;-/
      
        This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
        etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
        more file_inode() work"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        constify path_get/path_put and fs_struct.c stuff
        fix nommu breakage in shmem.c
        cache the value of file_inode() in struct file
        9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
        9p: make sure ->lookup() adds fid to the right dentry
        9p: untangle ->lookup() a bit
        9p: double iput() in ->lookup() if d_materialise_unique() fails
        9p: v9fs_fid_add() can't fail now
        v9fs: get rid of v9fs_dentry
        9p: turn fid->dlist into hlist
        9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
        more file_inode() open-coded instances
        selinux: opened file can't have NULL or negative ->f_path.dentry
      
      (In the meantime, the hlist traversal macros have changed, so this
      required a semantic conflict fixup for the newly hlistified fid->dlist)
      56a79b7b