1. 05 12月, 2013 1 次提交
    • P
      selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output() · da2ea0d0
      Paul Moore 提交于
      In selinux_ip_output() we always label packets based on the parent
      socket.  While this approach works in almost all cases, it doesn't
      work in the case of TCP SYN-ACK packets when the correct label is not
      the label of the parent socket, but rather the label of the larval
      socket represented by the request_sock struct.
      
      Unfortunately, since the request_sock isn't queued on the parent
      socket until *after* the SYN-ACK packet is sent, we can't lookup the
      request_sock to determine the correct label for the packet; at this
      point in time the best we can do is simply pass/NF_ACCEPT the packet.
      It must be said that simply passing the packet without any explicit
      labeling action, while far from ideal, is not terrible as the SYN-ACK
      packet will inherit any IP option based labeling from the initial
      connection request so the label *should* be correct and all our
      access controls remain in place so we shouldn't have to worry about
      information leaks.
      Reported-by: NJanak Desai <Janak.Desai@gtri.gatech.edu>
      Tested-by: NJanak Desai <Janak.Desai@gtri.gatech.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      da2ea0d0
  2. 26 11月, 2013 1 次提交
  3. 20 11月, 2013 2 次提交
  4. 16 10月, 2013 2 次提交
    • J
      apparmor: fix bad lock balance when introspecting policy · ed2c7da3
      John Johansen 提交于
      BugLink: http://bugs.launchpad.net/bugs/1235977
      
      The profile introspection seq file has a locking bug when policy is viewed
      from a virtual root (task in a policy namespace), introspection from the
      real root is not affected.
      
      The test for root
          while (parent) {
      is correct for the real root, but incorrect for tasks in a policy namespace.
      This allows the task to walk backup the policy tree past its virtual root
      causing it to be unlocked before the virtual root should be in the p_stop
      fn.
      
      This results in the following lockdep back trace:
      [   78.479744] [ BUG: bad unlock balance detected! ]
      [   78.479792] 3.11.0-11-generic #17 Not tainted
      [   78.479838] -------------------------------------
      [   78.479885] grep/2223 is trying to release lock (&ns->lock) at:
      [   78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
      [   78.480002] but there are no more locks to release!
      [   78.480037]
      [   78.480037] other info that might help us debug this:
      [   78.480037] 1 lock held by grep/2223:
      [   78.480037]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0
      [   78.480037]
      [   78.480037] stack backtrace:
      [   78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17
      [   78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   78.480037]  ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190
      [   78.480037]  ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190
      [   78.480037]  ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7
      [   78.480037] Call Trace:
      [   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
      [   78.480037]  [<ffffffff817b97ef>] dump_stack+0x54/0x74
      [   78.480037]  [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100
      [   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
      [   78.480037]  [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300
      [   78.480037]  [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180
      [   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
      [   78.480037]  [<ffffffff810e5d5c>] lock_release+0xac/0x310
      [   78.480037]  [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180
      [   78.480037]  [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
      [   78.480037]  [<ffffffff81376c91>] p_stop+0x51/0x90
      [   78.480037]  [<ffffffff81211408>] seq_read+0x288/0x3d0
      [   78.480037]  [<ffffffff811e9d9e>] vfs_read+0x9e/0x170
      [   78.480037]  [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0
      [   78.480037]  [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f
      Signed-off-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      ed2c7da3
    • J
      apparmor: fix memleak of the profile hash · 5cb3e91e
      John Johansen 提交于
      BugLink: http://bugs.launchpad.net/bugs/1235523
      
      This fixes the following kmemleak trace:
      unreferenced object 0xffff8801e8c35680 (size 32):
        comm "apparmor_parser", pid 691, jiffies 4294895667 (age 13230.876s)
        hex dump (first 32 bytes):
          e0 d3 4e b5 ac 6d f4 ed 3f cb ee 48 1c fd 40 cf  ..N..m..?..H..@.
          5b cc e9 93 00 00 00 00 00 00 00 00 00 00 00 00  [...............
        backtrace:
          [<ffffffff817a97ee>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811ca9f3>] __kmalloc+0x103/0x290
          [<ffffffff8138acbc>] aa_calc_profile_hash+0x6c/0x150
          [<ffffffff8138074d>] aa_unpack+0x39d/0xd50
          [<ffffffff8137eced>] aa_replace_profiles+0x3d/0xd80
          [<ffffffff81376937>] profile_replace+0x37/0x50
          [<ffffffff811e9f2d>] vfs_write+0xbd/0x1e0
          [<ffffffff811ea96c>] SyS_write+0x4c/0xa0
          [<ffffffff817ccb1d>] system_call_fastpath+0x1a/0x1f
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      5cb3e91e
  5. 05 10月, 2013 3 次提交
  6. 30 9月, 2013 2 次提交
    • J
      apparmor: fix suspicious RCU usage warning in policy.c/policy.h · 4cd4fc77
      John Johansen 提交于
      The recent 3.12 pull request for apparmor was missing a couple rcu _protected
      access modifiers. Resulting in the follow suspicious RCU usage
      
       [   29.804534] [ INFO: suspicious RCU usage. ]
       [   29.804539] 3.11.0+ #5 Not tainted
       [   29.804541] -------------------------------
       [   29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage!
       [   29.804548]
       [   29.804548] other info that might help us debug this:
       [   29.804548]
       [   29.804553]
       [   29.804553] rcu_scheduler_active = 1, debug_locks = 1
       [   29.804558] 2 locks held by apparmor_parser/1268:
       [   29.804560]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
       [   29.804576]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
       [   29.804589]
       [   29.804589] stack backtrace:
       [   29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
       [   29.804599] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
       [   29.804602]  0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540
       [   29.804611]  ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18
       [   29.804619]  ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084
       [   29.804628] Call Trace:
       [   29.804636]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
       [   29.804642]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
       [   29.804649]  [<ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f
       [   29.804655]  [<ffffffff811f5408>] __replace_profile+0x11f/0x1ed
       [   29.804661]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
       [   29.804668]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
       [   29.804674]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
       [   29.804680]  [<ffffffff81121609>] SyS_write+0x44/0x7a
       [   29.804687]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
       [   29.804691]
       [   29.804694] ===============================
       [   29.804697] [ INFO: suspicious RCU usage. ]
       [   29.804700] 3.11.0+ #5 Not tainted
       [   29.804703] -------------------------------
       [   29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage!
       [   29.804709]
       [   29.804709] other info that might help us debug this:
       [   29.804709]
       [   29.804714]
       [   29.804714] rcu_scheduler_active = 1, debug_locks = 1
       [   29.804718] 2 locks held by apparmor_parser/1268:
       [   29.804721]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
       [   29.804733]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
       [   29.804744]
       [   29.804744] stack backtrace:
       [   29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
       [   29.804753] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
       [   29.804756]  0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540
       [   29.804764]  ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000
       [   29.804772]  ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94
       [   29.804779] Call Trace:
       [   29.804786]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
       [   29.804791]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
       [   29.804798]  [<ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62
       [   29.804804]  [<ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17
       [   29.804810]  [<ffffffff811f4f0b>] kref_put+0x36/0x40
       [   29.804816]  [<ffffffff811f5423>] __replace_profile+0x13a/0x1ed
       [   29.804822]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
       [   29.804829]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
       [   29.804835]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
       [   29.804840]  [<ffffffff81121609>] SyS_write+0x44/0x7a
       [   29.804847]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
      
      Reported-by: miles.lane@gmail.com
      CC: paulmck@linux.vnet.ibm.com
      Signed-off-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      4cd4fc77
    • T
      apparmor: Use shash crypto API interface for profile hashes · 71ac7f62
      Tyler Hicks 提交于
      Use the shash interface, rather than the hash interface, when hashing
      AppArmor profiles. The shash interface does not use scatterlists and it
      is a better fit for what AppArmor needs.
      
      This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a
      buffer from vmalloc(). The hash interface requires callers to handle
      vmalloc() buffers differently than what AppArmor was doing. Due to
      vmalloc() memory not being physically contiguous, each individual page
      behind the buffer must be assigned to a scatterlist with sg_set_page()
      and then the scatterlist passed to crypto_hash_update().
      
      The shash interface does not have that limitation and allows vmalloc()
      and kmalloc() buffers to be handled in the same manner.
      
      BugLink: https://launchpad.net/bugs/1216294/
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Acked-by: NSeth Arnold <seth.arnold@canonical.com>
      Signed-off-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      71ac7f62
  7. 27 9月, 2013 2 次提交
    • P
      selinux: correct locking in selinux_netlbl_socket_connect) · 42d64e1a
      Paul Moore 提交于
      The SELinux/NetLabel glue code has a locking bug that affects systems
      with NetLabel enabled, see the kernel error message below.  This patch
      corrects this problem by converting the bottom half socket lock to a
      more conventional, and correct for this call-path, lock_sock() call.
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.11.0-rc3+ #19 Not tainted
       -------------------------------
       net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       2 locks held by ping/731:
        #0:  (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect
        #1:  (rcu_read_lock){.+.+..}, at: [<...>] netlbl_conn_setattr
      
       stack backtrace:
       CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500
        ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000
        000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7
       Call Trace:
        [<ffffffff81726b6a>] dump_stack+0x54/0x74
        [<ffffffff810e4457>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff8169bec7>] cipso_v4_sock_setattr+0x187/0x1a0
        [<ffffffff8170f317>] netlbl_conn_setattr+0x187/0x190
        [<ffffffff8170f195>] ? netlbl_conn_setattr+0x5/0x190
        [<ffffffff8131ac9e>] selinux_netlbl_socket_connect+0xae/0xc0
        [<ffffffff81303025>] selinux_socket_connect+0x135/0x170
        [<ffffffff8119d127>] ? might_fault+0x57/0xb0
        [<ffffffff812fb146>] security_socket_connect+0x16/0x20
        [<ffffffff815d3ad3>] SYSC_connect+0x73/0x130
        [<ffffffff81739a85>] ? sysret_check+0x22/0x5d
        [<ffffffff810e5e2d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff81373d4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff815d52be>] SyS_connect+0xe/0x10
        [<ffffffff81739a59>] system_call_fastpath+0x16/0x1b
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      42d64e1a
    • D
      7d1db4b2
  8. 31 8月, 2013 2 次提交
  9. 29 8月, 2013 2 次提交
    • E
      Revert "SELinux: do not handle seclabel as a special flag" · 0b4bdb35
      Eric Paris 提交于
      This reverts commit 308ab70c.
      
      It breaks my FC6 test box.  /dev/pts is not mounted.  dmesg says
      
      SELinux: mount invalid.  Same superblock, different security settings
      for (dev devpts, type devpts)
      
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      0b4bdb35
    • A
      selinux: consider filesystem subtype in policies · 102aefdd
      Anand Avati 提交于
      Not considering sub filesystem has the following limitation. Support
      for SELinux in FUSE is dependent on the particular userspace
      filesystem, which is identified by the subtype. For e.g, GlusterFS,
      a FUSE based filesystem supports SELinux (by mounting and processing
      FUSE requests in different threads, avoiding the mount time
      deadlock), whereas other FUSE based filesystems (identified by a
      different subtype) have the mount time deadlock.
      
      By considering the subtype of the filesytem in the SELinux policies,
      allows us to specify a filesystem subtype, in the following way:
      
      fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);
      
      This way not all FUSE filesystems are put in the same bucket and
      subjected to the limitations of the other subtypes.
      Signed-off-by: NAnand Avati <avati@redhat.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      102aefdd
  10. 20 8月, 2013 1 次提交
  11. 15 8月, 2013 15 次提交
  12. 13 8月, 2013 1 次提交
    • R
      Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes · 10289b0f
      Rafal Krypa 提交于
      Smack interface for loading rules has always parsed only single rule from
      data written to it. This requires user program to call one write() per
      each rule it wants to load.
      This change makes it possible to write multiple rules, separated by new
      line character. Smack will load at most PAGE_SIZE-1 characters and properly
      return number of processed bytes. In case when user buffer is larger, it
      will be additionally truncated. All characters after last \n will not get
      parsed to avoid partial rule near input buffer boundary.
      Signed-off-by: NRafal Krypa <r.krypa@samsung.com>
      10289b0f
  13. 09 8月, 2013 6 次提交
    • T
      cgroup: make css_for_each_descendant() and friends include the origin css in the iteration · bd8815a6
      Tejun Heo 提交于
      Previously, all css descendant iterators didn't include the origin
      (root of subtree) css in the iteration.  The reasons were maintaining
      consistency with css_for_each_child() and that at the time of
      introduction more use cases needed skipping the origin anyway;
      however, given that css_is_descendant() considers self to be a
      descendant, omitting the origin css has become more confusing and
      looking at the accumulated use cases rather clearly indicates that
      including origin would result in simpler code overall.
      
      While this is a change which can easily lead to subtle bugs, cgroup
      API including the iterators has recently gone through major
      restructuring and no out-of-tree changes will be applicable without
      adjustments making this a relatively acceptable opportunity for this
      type of change.
      
      The conversions are mostly straight-forward.  If the iteration block
      had explicit origin handling before or after, it's moved inside the
      iteration.  If not, if (pos == origin) continue; is added.  Some
      conversions add extra reference get/put around origin handling by
      consolidating origin handling and the rest.  While the extra ref
      operations aren't strictly necessary, this shouldn't cause any
      noticeable difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      bd8815a6
    • T
      cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup · 492eb21b
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using css
      (cgroup_subsys_state) as the primary handle instead of cgroup in
      subsystem API.  For hierarchy iterators, this is beneficial because
      
      * In most cases, css is the only thing subsystems care about anyway.
      
      * On the planned unified hierarchy, iterations for different
        subsystems will need to skip over different subtrees of the
        hierarchy depending on which subsystems are enabled on each cgroup.
        Passing around css makes it unnecessary to explicitly specify the
        subsystem in question as css is intersection between cgroup and
        subsystem
      
      * For the planned unified hierarchy, css's would need to be created
        and destroyed dynamically independent from cgroup hierarchy.  Having
        cgroup core manage css iteration makes enforcing deref rules a lot
        easier.
      
      Most subsystem conversions are straight-forward.  Noteworthy changes
      are
      
      * blkio: cgroup_to_blkcg() is no longer used.  Removed.
      
      * freezer: cgroup_freezer() is no longer used.  Removed.
      
      * devices: cgroup_to_devcgroup() is no longer used.  Removed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      492eb21b
    • T
      cgroup: pass around cgroup_subsys_state instead of cgroup in file methods · 182446d0
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup.
      Please see the previous commit which converts the subsystem methods
      for rationale.
      
      This patch converts all cftype file operations to take @css instead of
      @cgroup.  cftypes for the cgroup core files don't have their subsytem
      pointer set.  These will automatically use the dummy_css added by the
      previous patch and can be converted the same way.
      
      Most subsystem conversions are straight forwards but there are some
      interesting ones.
      
      * freezer: update_if_frozen() is also converted to take @css instead
        of @cgroup for consistency.  This will make the code look simpler
        too once iterators are converted to use css.
      
      * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
        vmpressure while mem_cgroup_from_cont() can be made static.
        Updated accordingly.
      
      * cpu: cgroup_tg() doesn't have any user left.  Removed.
      
      * cpuacct: cgroup_ca() doesn't have any user left.  Removed.
      
      * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
        Removed.
      
      * net_cls: cgrp_cls_state() doesn't have any user left.  Removed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      182446d0
    • T
      cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods · eb95419b
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup *
      in subsystem implementations for the following reasons.
      
      * With unified hierarchy, subsystems will be dynamically bound and
        unbound from cgroups and thus css's (cgroup_subsys_state) may be
        created and destroyed dynamically over the lifetime of a cgroup,
        which is different from the current state where all css's are
        allocated and destroyed together with the associated cgroup.  This
        in turn means that cgroup_css() should be synchronized and may
        return NULL, making it more cumbersome to use.
      
      * Differing levels of per-subsystem granularity in the unified
        hierarchy means that the task and descendant iterators should behave
        differently depending on the specific subsystem the iteration is
        being performed for.
      
      * In majority of the cases, subsystems only care about its part in the
        cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
        often obtain the matching css pointer from the cgroup and don't
        bother with the cgroup pointer itself.  Passing around css fits
        much better.
      
      This patch converts all cgroup_subsys methods to take @css instead of
      @cgroup.  The conversions are mostly straight-forward.  A few
      noteworthy changes are
      
      * ->css_alloc() now takes css of the parent cgroup rather than the
        pointer to the new cgroup as the css for the new cgroup doesn't
        exist yet.  Knowing the parent css is enough for all the existing
        subsystems.
      
      * In kernel/cgroup.c::offline_css(), unnecessary open coded css
        dereference is replaced with local variable access.
      
      This patch shouldn't cause any behavior differences.
      
      v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
          with local variable @css as suggested by Li Zefan.
      
          Rebased on top of new for-3.12 which includes for-3.11-fixes so
          that ->css_free() invocation added by da0a12ca ("cgroup: fix a
          leak when percpu_ref_init() fails") is converted too.  Suggested
          by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eb95419b
    • T
      cgroup: add css_parent() · 63876986
      Tejun Heo 提交于
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guarnateed to valid non-NULL parent css as
      long as the target css is not at the top of the hierarchy.
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      63876986
    • T
      cgroup: add/update accessors which obtain subsys specific data from css · a7c6d554
      Tejun Heo 提交于
      css (cgroup_subsys_state) is usually embedded in a subsys specific
      data structure.  Subsystems either use container_of() directly to cast
      from css to such data structure or has an accessor function wrapping
      such cast.  As cgroup as whole is moving towards using css as the main
      interface handle, add and update such accessors to ease dealing with
      css's.
      
      All accessors explicitly handle NULL input and return NULL in those
      cases.  While this looks like an extra branch in the code, as all
      controllers specific data structures have css as the first field, the
      casting doesn't involve any offsetting and the compiler can trivially
      optimize out the branch.
      
      * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
        accessor.  Added.
      
      * memory, hugetlb and devices already had one but didn't explicitly
        handle NULL input.  Updated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      a7c6d554