1. 13 5月, 2014 1 次提交
    • T
      kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs · 555724a8
      Tejun Heo 提交于
      The kernfs open method - kernfs_fop_open() - inherited extra
      permission checks from sysfs.  While the vfs layer allows ignoring the
      read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
      sysfs explicitly denied open regardless of the cap if the file doesn't
      have any of the UGO perms of the requested access or doesn't implement
      the requested operation.  It can be debated whether this was a good
      idea or not but the behavior is too subtle and dangerous to change at
      this point.
      
      After cgroup got converted to kernfs, this extra perm check also got
      applied to cgroup breaking libcgroup which opens write-only files with
      O_RDWR as root.  This patch gates the extra open permission check with
      a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs.
      For sysfs, nothing changes.  For cgroup, root now can perform any
      operation regardless of the permissions as it was before kernfs
      conversion.  Note that kernfs still fails unimplemented operations
      with -EINVAL.
      
      While at it, add comments explaining KERNFS_ROOT flags.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Tested-by: NAndrey Wagin <avagin@gmail.com>
      Cc: Li Zefan <lizefan@huawei.com>
      References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
      Fixes: 2bd59d48 ("cgroup: convert to kernfs")
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      555724a8
  2. 25 2月, 2014 1 次提交
    • L
      sysfs: fix namespace refcnt leak · fed95bab
      Li Zefan 提交于
      As mount() and kill_sb() is not a one-to-one match, we shoudn't get
      ns refcnt unconditionally in sysfs_mount(), and instead we should
      get the refcnt only when kernfs_mount() allocated a new superblock.
      
      v2:
      - Changed the name of the new argument, suggested by Tejun.
      - Made the argument optional, suggested by Tejun.
      
      v3:
      - Make the new argument as second-to-last arg, suggested by Tejun.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NTejun Heo <tj@kernel.org>
       ---
       fs/kernfs/mount.c      | 8 +++++++-
       fs/sysfs/mount.c       | 5 +++--
       include/linux/kernfs.h | 9 +++++----
       3 files changed, 15 insertions(+), 7 deletions(-)
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fed95bab
  3. 08 2月, 2014 1 次提交
    • T
      kernfs: allow nodes to be created in the deactivated state · d35258ef
      Tejun Heo 提交于
      Currently, kernfs_nodes are made visible to userland on creation,
      which makes it difficult for kernfs users to atomically succeed or
      fail creation of multiple nodes.  In addition, if something fails
      after creating some nodes, the created nodes might already be in use
      and their active refs need to be drained for removal, which has the
      potential to introduce tricky reverse locking dependency on active_ref
      depending on how the error path is synchronized.
      
      This patch introduces per-root flag KERNFS_ROOT_CREATE_DEACTIVATED.
      If set, all nodes under the root are created in the deactivated state
      and stay invisible to userland until explicitly enabled by the new
      kernfs_activate() API.  Also, nodes which have never been activated
      are guaranteed to bypass draining on removal thus allowing error paths
      to not worry about lockding dependency on active_ref draining.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d35258ef
  4. 18 12月, 2013 1 次提交
    • T
      kernfs: add kernfs_dir_ops · 80b9bbef
      Tejun Heo 提交于
      Add support for mkdir(2), rmdir(2) and rename(2) syscalls.  This is
      implemented through optional kernfs_dir_ops callback table which can
      be specified on kernfs_create_root().  An implemented callback is
      invoked when the matching syscall is invoked.
      
      As kernfs keep dcache syncs with internal representation and
      revalidates dentries on each access, the implementation of these
      methods is extremely simple.  Each just discovers the relevant
      kernfs_node(s) and invokes the requested callback which is allowed to
      do any kernfs operations and the end result doesn't necessarily have
      to match the expected semantics of the syscall.
      
      This will be used to convert cgroup to use kernfs instead of its own
      filesystem implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80b9bbef
  5. 12 12月, 2013 1 次提交
    • T
      kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      324a56e1
  6. 11 12月, 2013 1 次提交
  7. 30 11月, 2013 9 次提交
    • T
      sysfs, kernfs: move mount core code to fs/kernfs/mount.c · fa736a95
      Tejun Heo 提交于
      Move core mount code to fs/kernfs/mount.c.  The respective
      declarations in fs/sysfs/sysfs.h are moved to
      fs/kernfs/kernfs-internal.h.
      
      This is pure relocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa736a95
    • T
      sysfs, kernfs: prepare mount path for kernfs · 4b93dc9b
      Tejun Heo 提交于
      We're in the process of separating out core sysfs functionality into
      kernfs which will deal with sysfs_dirents directly.  This patch
      rearranges mount path so that the kernfs and sysfs parts are separate.
      
      * As sysfs_super_info won't be visible outside kernfs proper,
        kernfs_super_ns() is added to allow kernfs users to access a
        super_block's namespace tag.
      
      * Generic mount operation is separated out into kernfs_mount_ns().
        sysfs_mount() now just performs sysfs-specific permission check,
        acquires namespace tag, and invokes kernfs_mount_ns().
      
      * Generic superblock release is separated out into kernfs_kill_sb()
        which can be used directly as file_system_type->kill_sb().  As sysfs
        needs to put the namespace tag, sysfs_kill_sb() wraps
        kernfs_kill_sb() with ns tag put.
      
      * sysfs_dir_cachep init and sysfs_inode_init() are separated out into
        kernfs_init().  kernfs_init() uses only small amount of memory and
        trying to handle and propagate kernfs_init() failure doesn't make
        much sense.  Use SLAB_PANIC for sysfs_dir_cachep and make
        sysfs_inode_init() panic on failure.
      
        After this change, kernfs_init() should be called before
        sysfs_init(), fs/namespace.c::mnt_init() modified accordingly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b93dc9b
    • T
      sysfs, kernfs: make super_blocks bind to different kernfs_roots · df394fb5
      Tejun Heo 提交于
      kernfs is being updated to allow multiple sysfs_dirent hierarchies so
      that it can also be used by other users.  Currently, sysfs
      super_blocks are always attached to one kernfs_root - sysfs_root - and
      distinguished only by their namespace tags.
      
      This patch adds sysfs_super_info->root and update
      sysfs_fill/test_super() so that super_blocks are identified by the
      combination of both the associated kernfs_root and namespace tag.
      This allows mounting different kernfs hierarchies.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df394fb5
    • T
      sysfs, kernfs: implement kernfs_create/destroy_root() · ba7443bc
      Tejun Heo 提交于
      There currently is single kernfs hierarchy in the whole system which
      is used for sysfs.  kernfs needs to support multiple hierarchies to
      allow other users.  This patch introduces struct kernfs_root which
      serves as the root of each kernfs hierarchy and implements
      kernfs_create/destroy_root().
      
      * Each kernfs_root is associated with a root sd (sysfs_dentry).  The
        root is freed when the root sd is released and kernfs_destory_root()
        simply invokes kernfs_remove() on the root sd.  sysfs_remove_one()
        is updated to handle release of the root sd.  Note that ps_iattr
        update in sysfs_remove_one() is trivially updated for readability.
      
      * Root sd's are now dynamically allocated using sysfs_new_dirent().
        Update sysfs_alloc_ino() so that it gives out ino from 1 so that the
        root sd still gets ino 1.
      
      * While kernfs currently only points to the root sd, it'll soon grow
        fields which are specific to each hierarchy.  As determining a given
        sd's root will be necessary, sd->s_dir.root is added.  This backlink
        fits better as a separate field in sd; however, sd->s_dir is inside
        union with space to spare, so use it to save space and provide
        kernfs_root() accessor to determine the root sd.
      
      * As hierarchies may be destroyed now, each mount needs to hold onto
        the hierarchy it's attached to.  Update sysfs_fill_super() and
        sysfs_kill_sb() so that they get and put the kernfs_root
        respectively.
      
      * sysfs_root is replaced with kernfs_root which is dynamically created
        by invoking kernfs_create_root() from sysfs_init().
      
      This patch doesn't introduce any visible behavior changes.
      
      v2: kernfs_create_root() forgot to set @sd->priv.  Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba7443bc
    • T
      sysfs, kernfs: introduce sysfs_root_sd · 061447a4
      Tejun Heo 提交于
      Currently, it's assumed that there's a single kernfs hierarchy in the
      system anchored at sysfs_root which is defined as a global struct.  To
      allow other users of kernfs, this will be made dynamic.  Introduce a
      new global variable sysfs_root_sd which points to &sysfs_root and
      convert all &sysfs_root users.
      
      This patch doesn't introduce any behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      061447a4
    • T
      sysfs, kernfs: no need to kern_mount() sysfs from sysfs_init() · 9e30cc95
      Tejun Heo 提交于
      It has been very long since sysfs depended on vfs to keep track of
      internal states and whether sysfs is mounted or not doesn't make any
      difference to sysfs's internal operation.
      
      In addition to init and filesystem type registration, sysfs_init()
      invokes kern_mount() to create in-kernel mount of sysfs.  This
      internal mounting doesn't server any purpose anymore.  Remove it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e30cc95
    • T
      sysfs, kernfs: make sysfs_super_info->ns const · 51a35e9f
      Tejun Heo 提交于
      Add const qualifier to sysfs_super_info->ns so that it's consistent
      with other namespace tag usages in sysfs.  Because kobject doesn't use
      const qualifier for namespace tags, this ends up requiring an explicit
      cast to drop const qualifier in free_sysfs_super_info().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51a35e9f
    • T
      sysfs, kernfs: drop unused params from sysfs_fill_super() · ccc532dc
      Tejun Heo 提交于
      sysfs_fill_super() takes three params - @sb, @data and @silent - but
      uses only @sb.  Drop the latter two.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccc532dc
    • T
      sysfs, kernfs: introduce kernfs[_find_and]_get() and kernfs_put() · ccf73cf3
      Tejun Heo 提交于
      Introduce kernfs interface for finding, getting and putting
      sysfs_dirents.
      
      * sysfs_find_dirent() is renamed to kernfs_find_ns() and lockdep
        assertion for sysfs_mutex is added.
      
      * sysfs_get_dirent_ns() is renamed to kernfs_find_and_get().
      
      * Macro inline dancing around __sysfs_get/put() are removed and
        kernfs_get/put() are made proper functions implemented in
        fs/sysfs/dir.c.
      
      While the conversions are mostly equivalent, there's one difference -
      kernfs_get() doesn't return the input param as its return value.  This
      change is intentional.  While passing through the input increases
      writability in some areas, it is unnecessary and has been shown to
      cause confusion regarding how the last ref is handled.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccf73cf3
  8. 28 11月, 2013 1 次提交
    • T
      sysfs: drop kobj_ns_type handling, take #2 · c84a3b27
      Tejun Heo 提交于
      The way namespace tags are implemented in sysfs is more complicated
      than necessary.  As each tag is a pointer value and required to be
      non-NULL under a namespace enabled parent, there's no need to record
      separately what type each tag is.  If multiple namespace types are
      needed, which currently aren't, we can simply compare the tag to a set
      of allowed tags in the superblock assuming that the tags, being
      pointers, won't have the same value across multiple types.
      
      This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
      an enable switch to turn on namespace under a node.  If enabled, all
      children are required to have non-NULL namespace tags and filtered
      against the super_block's tag.
      
      kobject namespace determination is now performed in
      lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
      The sanity checks are also moved.  create_dir() is restructured to
      ease such addition.  This removes most kobject namespace knowledge
      from sysfs proper which will enable proper separation and layering of
      sysfs.
      
      This is the second try.  The first one was cb26a311 ("sysfs: drop
      kobj_ns_type handling") which tried to automatically enable namespace
      if there are children with non-NULL namespace tags; however, it was
      broken for symlinks as they should inherit the target's tag iff
      namespace is enabled in the parent.  This led to namespace filtering
      enabled incorrectly for wireless net class devices through phy80211
      symlinks and thus network configuration failure.  a1212d27
      ("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.
      
      This shouldn't introduce any behavior changes, for real.
      
      v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
          missing and caused build failure.  Reported by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c84a3b27
  9. 07 11月, 2013 1 次提交
    • L
      Revert "sysfs: drop kobj_ns_type handling" · a1212d27
      Linus Torvalds 提交于
      This reverts commit cb26a311.
      
      It mysteriously causes NetworkManager to not find the wireless device
      for me.  As far as I can tell, Tejun *meant* for this commit to not make
      any semantic changes, but there clearly are some.  So revert it, taking
      into account some of the calling convention changes that happened in
      this area in subsequent commits.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1212d27
  10. 27 9月, 2013 1 次提交
    • T
      sysfs: drop kobj_ns_type handling · cb26a311
      Tejun Heo 提交于
      The way namespace tags are implemented in sysfs is more complicated
      than necessary.  As each tag is a pointer value and required to be
      non-NULL under a namespace enabled parent, there's no need to record
      separately what type each tag is or where namespace is enabled.
      
      If multiple namespace types are needed, which currently aren't, we can
      simply compare the tag to a set of allowed tags in the superblock
      assuming that the tags, being pointers, won't have the same value
      across multiple types.  Also, whether to filter by namespace tag or
      not can be trivially determined by whether the node has any tagged
      children or not.
      
      This patch rips out kobj_ns_type handling from sysfs.  sysfs no longer
      cares whether specific type of namespace is enabled or not.  If a
      sysfs_dirent has a non-NULL tag, the parent is marked as needing
      namespace filtering and the value is tested against the allowed set of
      tags for the superblock (currently only one but increasing this number
      isn't difficult) and the sysfs_dirent is ignored if it doesn't match.
      
      This removes most kobject namespace knowledge from sysfs proper which
      will enable proper separation and layering of sysfs.  The namespace
      sanity checks in fs/sysfs/dir.c are replaced by the new sanity check
      in kobject_namespace().  As this is the only place ktype->namespace()
      is called for sysfs, this doesn't weaken the sanity check
      significantly.  I omitted converting the sanity check in
      sysfs_do_create_link_sd().  While the check can be shifted to upper
      layer, mistakes there are well contained and should be easily visible
      anyway.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb26a311
  11. 29 8月, 2013 1 次提交
    • E
      sysfs: Restrict mounting sysfs · 7dc5dbc8
      Eric W. Biederman 提交于
      Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
      over the net namespace.  The principle here is if you create or have
      capabilities over it you can mount it, otherwise you get to live with
      what other people have mounted.
      
      Instead of testing this with a straight forward ns_capable call,
      perform this check the long and torturous way with kobject helpers,
      this keeps direct knowledge of namespaces out of sysfs, and preserves
      the existing sysfs abstractions.
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7dc5dbc8
  12. 27 8月, 2013 1 次提交
    • E
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman 提交于
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e51db735
  13. 22 8月, 2013 1 次提交
  14. 27 3月, 2013 1 次提交
    • E
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman 提交于
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
  15. 18 1月, 2013 1 次提交
  16. 20 11月, 2012 1 次提交
  17. 14 7月, 2012 2 次提交
  18. 21 3月, 2012 1 次提交
  19. 25 1月, 2012 1 次提交
  20. 13 6月, 2011 1 次提交
    • A
      Delay struct net freeing while there's a sysfs instance refering to it · a685e089
      Al Viro 提交于
      	* new refcount in struct net, controlling actual freeing of the memory
      	* new method in kobj_ns_type_operations (->drop_ns())
      	* ->current_ns() semantics change - it's supposed to be followed by
      corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
      the new refcount; net_drop_ns() decrements it and calls net_free() if the
      last reference has been dropped.  Method renamed to ->grab_current_ns().
      	* old net_free() callers call net_drop_ns() instead.
      	* sysfs_exit_ns() is gone, along with a large part of callchain
      leading to it; now that the references stored in ->ns[...] stay valid we
      do not need to hunt them down and replace them with NULL.  That fixes
      problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
      of sb->s_instances abuse.
      
      	Note that struct net *shutdown* logics has not changed - net_cleanup()
      is called exactly when it used to be called.  The only thing postponed by
      having a sysfs instance refering to that struct net is actual freeing of
      memory occupied by struct net.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a685e089
  21. 29 10月, 2010 1 次提交
  22. 10 8月, 2010 1 次提交
  23. 22 5月, 2010 4 次提交
    • E
      sysfs: Implement sysfs tagged directory support. · 3ff195b0
      Eric W. Biederman 提交于
      The problem.  When implementing a network namespace I need to be able
      to have multiple network devices with the same name.  Currently this
      is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
      potentially a few other directories of the form /sys/ ... /net/*.
      
      What this patch does is to add an additional tag field to the
      sysfs dirent structure.  For directories that should show different
      contents depending on the context such as /sys/class/net/, and
      /sys/devices/virtual/net/ this tag field is used to specify the
      context in which those directories should be visible.  Effectively
      this is the same as creating multiple distinct directories with
      the same name but internally to sysfs the result is nicer.
      
      I am calling the concept of a single directory that looks like multiple
      directories all at the same path in the filesystem tagged directories.
      
      For the networking namespace the set of directories whose contents I need
      to filter with tags can depend on the presence or absence of hotplug
      hardware or which modules are currently loaded.  Which means I need
      a simple race free way to setup those directories as tagged.
      
      To achieve a reace free design all tagged directories are created
      and managed by sysfs itself.
      
      Users of this interface:
      - define a type in the sysfs_tag_type enumeration.
      - call sysfs_register_ns_types with the type and it's operations
      - sysfs_exit_ns when an individual tag is no longer valid
      
      - Implement mount_ns() which returns the ns of the calling process
        so we can attach it to a sysfs superblock.
      - Implement ktype.namespace() which returns the ns of a syfs kobject.
      
      Everything else is left up to sysfs and the driver layer.
      
      For the network namespace mount_ns and namespace() are essentially
      one line functions, and look to remain that.
      
      Tags are currently represented a const void * pointers as that is
      both generic, prevides enough information for equality comparisons,
      and is trivial to create for current users, as it is just the
      existing namespace pointer.
      
      The work needed in sysfs is more extensive.  At each directory
      or symlink creating I need to check if the directory it is being
      created in is a tagged directory and if so generate the appropriate
      tag to place on the sysfs_dirent.  Likewise at each symlink or
      directory removal I need to check if the sysfs directory it is
      being removed from is a tagged directory and if so figure out
      which tag goes along with the name I am deleting.
      
      Currently only directories which hold kobjects, and
      symlinks are supported.  There is not enough information
      in the current file attribute interfaces to give us anything
      to discriminate on which makes it useless, and there are
      no potential users which makes it an uninteresting problem
      to solve.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NBenjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      3ff195b0
    • E
      sysfs: Remove usage of S_BIAS to avoid merge conflict with the vfs tree · 68d75ed4
      Eric W. Biederman 提交于
      In Al's latest vfs tree the code is reworked and S_BIAS has been removed.
      
      It turns out that checking to see if a super block is in the
      middle of an unmount in sysfs_exit_ns is unnecessary because we
      remove the super_block from the s_supers/s_instances list before
      struct sysfs_super_info pointed to by sb->s_fs_info is freed.
      
      For now just delete the unnecessary check to see if a superblock is in the
      middle of an unmount, it isn't necessary with or without Al's changes
      and it just causes a needless conflict.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      68d75ed4
    • E
      ba514a57
    • E
      sysfs: Basic support for multiple super blocks · 9e7fdd25
      Eric W. Biederman 提交于
      Add all of the necessary bioler plate to support
      multiple superblocks in sysfs.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      9e7fdd25
  24. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  25. 08 3月, 2010 2 次提交
  26. 25 3月, 2009 2 次提交