1. 17 2月, 2016 1 次提交
  2. 08 2月, 2016 1 次提交
  3. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  4. 15 1月, 2016 1 次提交
    • V
      Revert "kernfs: do not account ino_ida allocations to memcg" · b2a209ff
      Vladimir Davydov 提交于
      Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
      alloc_kmem_pages call) are accounted to memory cgroup automatically.
      Callers have to explicitly opt out if they don't want/need accounting
      for some reason.  Such a design decision leads to several problems:
      
       - kmalloc users are highly sensitive to failures, many of them
         implicitly rely on the fact that kmalloc never fails, while memcg
         makes failures quite plausible.
      
       - A lot of objects are shared among different containers by design.
         Accounting such objects to one of containers is just unfair.
         Moreover, it might lead to pinning a dead memcg along with its kmem
         caches, which aren't tiny, which might result in noticeable increase
         in memory consumption for no apparent reason in the long run.
      
       - There are tons of short-lived objects. Accounting them to memcg will
         only result in slight noise and won't change the overall picture, but
         we still have to pay accounting overhead.
      
      For more info, see
      
       - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
       - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza
      
      Therefore this patchset switches to the white list policy.  Now kmalloc
      users have to explicitly opt in by passing __GFP_ACCOUNT flag.
      
      Currently, the list of accounted objects is quite limited and only
      includes those allocations that (1) are known to be easily triggered
      from userspace and (2) can fail gracefully (for the full list see patch
      no.  6) and it still misses many object types.  However, accounting only
      those objects should be a satisfactory approximation of the behavior we
      used to have for most sane workloads.
      
      This patch (of 6):
      
      Revert 499611ed ("kernfs: do not account ino_ida allocations
      to memcg").
      
      Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
      fragile and difficult to maintain, because there seem to be many more
      allocations that should not be accounted than those that should be.
      Besides, false accounting an allocation might result in much worse
      consequences than not accounting at all, namely increased memory
      consumption due to pinned dead kmem caches.
      
      So it was decided to switch to the white-list policy.  This patch reverts
      bits introducing the black-list policy.  The white-list policy will be
      introduced later in the series.
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2a209ff
  5. 21 11月, 2015 1 次提交
  6. 19 8月, 2015 1 次提交
  7. 01 7月, 2015 1 次提交
  8. 15 5月, 2015 1 次提交
    • V
      kernfs: do not account ino_ida allocations to memcg · 499611ed
      Vladimir Davydov 提交于
      root->ino_ida is used for kernfs inode number allocations. Since IDA has
      a layered structure, different IDs can reside on the same layer, which
      is currently accounted to some memory cgroup. The problem is that each
      kmem cache of a memory cgroup has its own directory on sysfs (under
      /sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a
      directory or any file in it gets allocated from a layer accounted to the
      cgroup which the cache is created for, the cgroup will get pinned for
      good, because one has to free all kmem allocations accounted to a cgroup
      in order to release it and destroy all its kmem caches. That said we
      must not account layers of ino_ida to any memory cgroup.
      
      Since per net init operations may create new sysfs entries directly
      (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
      per each namespace, which, in turn, creates new sysfs entries), an easy
      way to reproduce this issue is by creating network namespace(s) from
      inside a kmem-active memory cgroup.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      499611ed
  9. 16 4月, 2015 1 次提交
  10. 14 2月, 2015 2 次提交
  11. 10 1月, 2015 1 次提交
  12. 20 11月, 2014 1 次提交
  13. 09 10月, 2014 1 次提交
  14. 26 4月, 2014 2 次提交
  15. 09 3月, 2014 1 次提交
  16. 15 2月, 2014 1 次提交
  17. 12 2月, 2014 1 次提交
    • T
      cgroup: remove cgroup->name · e61734c5
      Tejun Heo 提交于
      cgroup->name handling became quite complicated over time involving
      dedicated struct cgroup_name for RCU protection.  Now that cgroup is
      on kernfs, we can drop all of it and simply use kernfs_name/path() and
      friends.  Replace cgroup->name and all related code with kernfs
      name/path constructs.
      
      * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
        of kernfs counterparts, which involves semantic changes.
        pr_cont_cgroup_name() and pr_cont_cgroup_path() added.
      
      * cgroup->name handling dropped from cgroup_rename().
      
      * All users of cgroup_name/path() updated to the new semantics.  Users
        which were formatting the string just to printk them are converted
        to use pr_cont_cgroup_name/path() instead, which simplifies things
        quite a bit.  As cgroup_name() no longer requires RCU read lock
        around it, RCU lockings which were protecting only cgroup_name() are
        removed.
      
      v2: Comment above oom_info_lock updated as suggested by Michal.
      
      v3: dummy_top doesn't have a kn associated and
          pr_cont_cgroup_name/path() ended up calling the matching kernfs
          functions with NULL kn leading to oops.  Test for NULL kn and
          print "/" if so.  This issue was reported by Fengguang Wu.
      
      v4: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      e61734c5
  18. 11 2月, 2014 1 次提交
  19. 08 2月, 2014 14 次提交
    • T
      kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends · 3eef34ad
      Tejun Heo 提交于
      kernfs_node->parent and ->name are currently marked as "published"
      indicating that kernfs users may access them directly; however, those
      fields may get updated by kernfs_rename[_ns]() and unrestricted access
      may lead to erroneous values or oops.
      
      Protect ->parent and ->name updates with a irq-safe spinlock
      kernfs_rename_lock and implement the following accessors for these
      fields.
      
      * kernfs_name()		- format the node's name into the specified buffer
      * kernfs_path()		- format the node's path into the specified buffer
      * pr_cont_kernfs_name()	- pr_cont a node's name (doesn't need buffer)
      * pr_cont_kernfs_path()	- pr_cont a node's path (doesn't need buffer)
      * kernfs_get_parent()	- pin and return a node's parent
      
      All can be called under any context.  The recursive sysfs_pathname()
      in fs/sysfs/dir.c is replaced with kernfs_path() and
      sysfs_rename_dir_ns() is updated to use kernfs_get_parent() instead of
      dereferencing parent directly.
      
      v2: Dummy definition of kernfs_path() for !CONFIG_KERNFS was missing
          static inline making it cause a lot of build warnings.  Add it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3eef34ad
    • T
      kernfs: implement kernfs_node_from_dentry(), kernfs_root_from_sb() and kernfs_rename() · 0c23b225
      Tejun Heo 提交于
      Implement helpers to determine node from dentry and root from
      super_block.  Also add a kernfs_rename_ns() wrapper which assumes NULL
      namespace.  These generally make sense and will be used by cgroup.
      
      v2: Some dummy implementations for !CONFIG_SYSFS was missing.  Fixed.
          Reported by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c23b225
    • T
      kernfs: allow nodes to be created in the deactivated state · d35258ef
      Tejun Heo 提交于
      Currently, kernfs_nodes are made visible to userland on creation,
      which makes it difficult for kernfs users to atomically succeed or
      fail creation of multiple nodes.  In addition, if something fails
      after creating some nodes, the created nodes might already be in use
      and their active refs need to be drained for removal, which has the
      potential to introduce tricky reverse locking dependency on active_ref
      depending on how the error path is synchronized.
      
      This patch introduces per-root flag KERNFS_ROOT_CREATE_DEACTIVATED.
      If set, all nodes under the root are created in the deactivated state
      and stay invisible to userland until explicitly enabled by the new
      kernfs_activate() API.  Also, nodes which have never been activated
      are guaranteed to bypass draining on removal thus allowing error paths
      to not worry about lockding dependency on active_ref draining.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d35258ef
    • T
      kernfs: add missing kernfs_active() checks in directory operations · b9c9dad0
      Tejun Heo 提交于
      kernfs_iop_lookup(), kernfs_dir_pos() and kernfs_dir_next_pos() were
      missing kernfs_active() tests before using the found kernfs_node.  As
      deactivated state is currently visible only while a node is being
      removed, this doesn't pose an actual problem.  e.g. lookup succeeding
      on a deactivated node doesn't harm anything as the eventual file
      operations are gonna fail and those failures are indistinguishible
      from the cases in which the lookups had happened before the node was
      deactivated.
      
      However, we're gonna allow new nodes to be created deactivated and
      then activated explicitly by the kernfs user when it sees fit.  This
      is to support atomically making multiple nodes visible to userland and
      thus those nodes must not be visible to userland before activated.
      
      Let's plug the lookup and readdir holes so that deactivated nodes are
      invisible to userland.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9c9dad0
    • T
      kernfs: rename kernfs_dir_ops to kernfs_syscall_ops · 90c07c89
      Tejun Heo 提交于
      We're gonna need non-dir syscall callbacks, which will make dir_ops a
      misnomer.  Let's rename kernfs_dir_ops to kernfs_syscall_ops.
      
      This is pure rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90c07c89
    • T
      kernfs: invoke dir_ops while holding active ref of the target node · 07c7530d
      Tejun Heo 提交于
      kernfs_dir_ops are currently being invoked without any active
      reference, which makes it tricky for the invoked operations to
      determine whether the objects associated those nodes are safe to
      access and will remain that way for the duration of such operations.
      
      kernfs already has active_ref mechanism to deal with this which makes
      the removal of a given node the synchronization point for gating the
      file operations.  There's no reason for dir_ops to be any different.
      Update the dir_ops handling so that active_ref is held while the
      dir_ops are executing.  This guarantees that while a dir_ops is
      executing the target nodes stay alive.
      
      As kernfs_dir_ops doesn't have any in-kernel user at this point, this
      doesn't affect anybody.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07c7530d
    • T
      kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers · 6b0afc2a
      Tejun Heo 提交于
      Sometimes it's necessary to implement a node which wants to delete
      nodes including itself.  This isn't straightforward because of kernfs
      active reference.  While a file operation is in progress, an active
      reference is held and kernfs_remove() waits for all such references to
      drain before completing.  For a self-deleting node, this is a deadlock
      as kernfs_remove() ends up waiting for an active reference that itself
      is sitting on top of.
      
      This currently is worked around in the sysfs layer using
      sysfs_schedule_callback() which makes such removals asynchronous.
      While it works, it's rather cumbersome and inherently breaks
      synchronicity of the operation - the file operation which triggered
      the operation may complete before the removal is finished (or even
      started) and the removal may fail asynchronously.  If a removal
      operation is immmediately followed by another operation which expects
      the specific name to be available (e.g. removal followed by rename
      onto the same name), there's no way to make the latter operation
      reliable.
      
      The thing is there's no inherent reason for this to be asynchrnous.
      All that's necessary to do this synchronous is a dedicated operation
      which drops its own active ref and deactivates self.  This patch
      implements kernfs_remove_self() and its wrappers in sysfs and driver
      core.  kernfs_remove_self() is to be called from one of the file
      operations, drops the active ref the task is holding, removes the self
      node, and restores active ref to the dead node so that the ref is
      balanced afterwards.  __kernfs_remove() is updated so that it takes an
      early exit if the target node is already fully removed so that the
      active ref restored by kernfs_remove_self() after removal doesn't
      confuse the deactivation path.
      
      This makes implementing self-deleting nodes very easy.  The normal
      removal path doesn't even need to be changed to use
      kernfs_remove_self() for the self-deleting node.  The method can
      invoke kernfs_remove_self() on itself before proceeding the normal
      removal path.  kernfs_remove() invoked on the node by the normal
      deletion path will simply be ignored.
      
      This will replace sysfs_schedule_callback().  A subtle feature of
      sysfs_schedule_callback() is that it collapses multiple invocations -
      even if multiple removals are triggered, the removal callback is run
      only once.  An equivalent effect can be achieved by testing the return
      value of kernfs_remove_self() - only the one which gets %true return
      value should proceed with actual deletion.  All other instances of
      kernfs_remove_self() will wait till the enclosing kernfs operation
      which invoked the winning instance of kernfs_remove_self() finishes
      and then return %false.  This trivially makes all users of
      kernfs_remove_self() automatically show correct synchronous behavior
      even when there are multiple concurrent operations - all "echo 1 >
      delete" instances will finish only after the whole operation is
      completed by one of the instances.
      
      Note that manipulation of active ref is implemented in separate public
      functions - kernfs_[un]break_active_protection().
      kernfs_remove_self() is the only user at the moment but this will be
      used to cater to more complex cases.
      
      v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
          and sysfs_remove_file_self() had incorrect return type.  Fix it.
          Reported by kbuild test bot.
      
      v3: kernfs_[un]break_active_protection() separated out from
          kernfs_remove_self() and exposed as public API.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b0afc2a
    • T
      kernfs: remove KERNFS_REMOVED · 81c173cb
      Tejun Heo 提交于
      KERNFS_REMOVED is used to mark half-initialized and dying nodes so
      that they don't show up in lookups and deny adding new nodes under or
      renaming it; however, its role overlaps that of deactivation.
      
      It's necessary to deny addition of new children while removal is in
      progress; however, this role considerably intersects with deactivation
      - KERNFS_REMOVED prevents new children while deactivation prevents new
      file operations.  There's no reason to have them separate making
      things more complex than necessary.
      
      This patch removes KERNFS_REMOVED.
      
      * Instead of KERNFS_REMOVED, each node now starts its life
        deactivated.  This means that we now use both atomic_add() and
        atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN.  The compiler
        generates an overflow warnings when negating INT_MIN as the negation
        can't be represented as a positive number.  Nothing is actually
        broken but let's bump BIAS by one to avoid the warnings for archs
        which negates the subtrahend..
      
      * A new helper kernfs_active() which tests whether kn->active >= 0 is
        added for convenience and lockdep annotation.  All KERNFS_REMOVED
        tests are replaced with negated kernfs_active() tests.
      
      * __kernfs_remove() is updated to deactivate, but not drain, all nodes
        in the subtree instead of setting KERNFS_REMOVED.  This removes
        deactivation from kernfs_deactivate(), which is now renamed to
        kernfs_drain().
      
      * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
        checks on the active ref.
      
      * Some comment style updates in the affected area.
      
      v2: Reordered before removal path restructuring.  kernfs_active()
          dropped and kernfs_get/put_active() used instead.  RB_EMPTY_NODE()
          used in the lookup paths.
      
      v3: Reverted most of v2 except for creating a new node with
          KN_DEACTIVATED_BIAS.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81c173cb
    • T
      kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() · 182fd64b
      Tejun Heo 提交于
      There currently are two mechanisms gating active ref lockdep
      annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
      The former disables lockdep annotations in kernfs_get/put_active()
      while the latter disables all of kernfs_deactivate().
      
      While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
      deactivation step for non-file nodes, the benefit is marginal and it
      needlessly diverges code paths.  Let's drop KERNFS_ACTIVE_REF.
      
      While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
      flag so that it's more convenient and the related code can be compiled
      out when not enabled.
      
      v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
          KERNFS_LOCKDEP flag").  As the earlier patch already added
          KERNFS_LOCKDEP tests to kernfs_deactivate(), those additions are
          dropped from this patch and the existing ones are simply converted
          to kernfs_lockdep().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      182fd64b
    • T
      kernfs: remove kernfs_addrm_cxt · 988cd7af
      Tejun Heo 提交于
      kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
      added because there were operations which should be performed outside
      kernfs_mutex after adding and removing kernfs_nodes.  The necessary
      operations were recorded in kernfs_addrm_cxt and performed by
      kernfs_addrm_finish(); however, after the recent changes which
      relocated deactivation and unmapping so that they're performed
      directly during removal, the only operation kernfs_addrm_finish()
      performs is kernfs_put(), which can be moved inside the removal path
      too.
      
      This patch moves the kernfs_put() of the base ref to __kernfs_remove()
      and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().
      
      * kernfs_add_one() is updated to grab and release kernfs_mutex itself.
        sysfs_addrm_start/finish() invocations around it are removed from
        all users.
      
      * __kernfs_remove() puts an unlinked node directly instead of chaining
        it to kernfs_addrm_cxt.  Its callers are updated to grab and release
        kernfs_mutex instead of calling kernfs_addrm_start/finish() around
        it.
      
      v2: Rebased on top of "kernfs: associate a new kernfs_node with its
          parent on creation" which dropped @parent from kernfs_add_one().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      988cd7af
    • T
      kernfs: invoke kernfs_unmap_bin_file() directly from kernfs_deactivate() · ccf02aaf
      Tejun Heo 提交于
      kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
      the target file before kernfs_remove() finishes; however, it currently
      is being called from kernfs_addrm_finish() and has the same race
      problem as the original implementation of deactivation when there are
      multiple removers - only the remover which snatches the node to its
      addrm_cxt->removed list is guaranteed to wait for its completion
      before returning.
      
      It can be easily fixed by moving kernfs_unmap_bin_file() invocation
      from kernfs_addrm_finish() to kernfs_deactivated().  The function may
      be called multiple times but that shouldn't do any harm.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccf02aaf
    • T
      kernfs: restructure removal path to fix possible premature return · 35beab06
      Tejun Heo 提交于
      The recursive nature of kernfs_remove() means that, even if
      kernfs_remove() is not allowed to be called multiple times on the same
      node, there may be race conditions between removal of parent and its
      descendants.  While we can claim that kernfs_remove() shouldn't be
      called on one of the descendants while the removal of an ancestor is
      in progress, such rule is unnecessarily restrictive and very difficult
      to enforce.  It's better to simply allow invoking kernfs_remove() as
      the caller sees fit as long as the caller ensures that the node is
      accessible.
      
      The current behavior in such situations is broken.  Whoever enters
      removal path first takes the node off the hierarchy and then
      deactivates.  Following removers either return as soon as it notices
      that it's not the first one or can't even find the target node as it
      has already been removed from the hierarchy.  In both cases, the
      following removers may finish prematurely while the nodes which should
      be removed and drained are still being processed by the first one.
      
      This patch restructures so that multiple removers, whether through
      recursion or direction invocation, always follow the following rules.
      
      * When there are multiple concurrent removers, only one puts the base
        ref.
      
      * Regardless of which one puts the base ref, all removers are blocked
        until the target node is fully deactivated and removed.
      
      To achieve the above, removal path now first marks all descendants
      including self REMOVED and then deactivates and unlinks leftmost
      descendant one-by-one.  kernfs_deactivate() is called directly from
      __kernfs_removal() and drops and regrabs kernfs_mutex for each
      descendant to drain active refs.  As this means that multiple removers
      can enter kernfs_deactivate() for the same node, the function is
      updated so that it can handle multiple deactivators of the same node -
      only one actually deactivates but all wait till drain completion.
      
      The restructured removal path guarantees that a removed node gets
      unlinked only after the node is deactivated and drained.  Combined
      with proper multiple deactivator handling, this guarantees that any
      invocation of kernfs_remove() returns only after the node itself and
      all its descendants are deactivated, drained and removed.
      
      v2: Draining separated into a separate loop (used to be in the same
          loop as unlink) and done from __kernfs_deactivate().  This is to
          allow exposing deactivation as a separate interface later.
      
          Root node removal was broken in v1 patch.  Fixed.
      
      v3: Revert most of v2 except for root node removal fix and
          simplification of KERNFS_REMOVED setting loop.
      
      v4: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
          KERNFS_LOCKDEP flag").
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35beab06
    • T
      kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq · abd54f02
      Tejun Heo 提交于
      kernfs_node->u.completion is used to notify deactivation completion
      from kernfs_put_active() to kernfs_deactivate().  We now allow
      multiple racing removals of the same node and the current removal
      scheme is no longer correct - kernfs_remove() invocation may return
      before the node is properly deactivated if it races against another
      removal.  The removal path will be restructured to address the issue.
      
      To help such restructure which requires supporting multiple waiters,
      this patch replaces kernfs_node->u.completion with
      kernfs_root->deactivate_waitq.  This makes deactivation event
      notifications share a per-root waitqueue_head; however, the wait path
      is quite cold and this will also allow shaving one pointer off
      kernfs_node.
      
      v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
          KERNFS_LOCKDEP flag").
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abd54f02
    • T
      kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag · a6607930
      Tejun Heo 提交于
      kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
      before performing lockdep annotations and ends up feeding
      uninitialized lockdep_map to lockdep triggering warning like the
      following on USB stick hotunplug.
      
       usb 1-2: USB disconnect, device number 2
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
        ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
        ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
        ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
       Call Trace:
        [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a
        [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70
        [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0
        [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130
        [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60
        [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0
        [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80
        [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0
        [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50
        [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80
        [<ffffffff8177e25e>] device_del+0x12e/0x1d0
        [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0
        [<ffffffff819749b5>] hub_thread+0x3c5/0x1290
        [<ffffffff810c6a6d>] kthread+0xed/0x110
        [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0
      
      Fix it by making kernfs_deactivate() perform lockdep annotations only
      if KERNFS_LOCKDEP is set.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NFabio Estevam <festevam@gmail.com>
      Reported-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6607930
  20. 06 2月, 2014 1 次提交
    • T
      kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag · da9846ae
      Tejun Heo 提交于
      kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
      before performing lockdep annotations and ends up feeding
      uninitialized lockdep_map to lockdep triggering warning like the
      following on USB stick hotunplug.
      
       usb 1-2: USB disconnect, device number 2
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
        ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
        ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
        ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
       Call Trace:
        [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a
        [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70
        [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0
        [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130
        [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60
        [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0
        [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80
        [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0
        [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50
        [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80
        [<ffffffff8177e25e>] device_del+0x12e/0x1d0
        [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0
        [<ffffffff819749b5>] hub_thread+0x3c5/0x1290
        [<ffffffff810c6a6d>] kthread+0xed/0x110
        [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0
      
      Fix it by making kernfs_deactivate() perform lockdep annotations only
      if KERNFS_LOCKDEP is set.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NFabio Estevam <festevam@gmail.com>
      Reported-by: NAlan Stern <stern@rowland.harvard.edu>
      Reported-by: NJiri Kosina <jkosina@suse.cz>
      Reported-by: NDave Jones <davej@redhat.com>
      Tested-by: NFabio Estevam <fabio.estevam@freescale.com>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da9846ae
  21. 18 1月, 2014 1 次提交
    • T
      kernfs: associate a new kernfs_node with its parent on creation · db4aad20
      Tejun Heo 提交于
      Once created, a kernfs_node is always destroyed by kernfs_put().
      Since ba7443bc ("sysfs, kernfs: implement
      kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
      to locate the ino_ida.  kernfs_root() in turn depends on
      kernfs_node->parent being set for !dir nodes.  This means that
      kernfs_put() of a !dir node requires its ->parent to be initialized.
      
      This leads to oops when a newly created !dir node is destroyed without
      going through kernfs_add_one() or after failing kernfs_add_one()
      before ->parent is set.  kernfs_root() invoked from kernfs_put() will
      try to dereference NULL parent.
      
      Fix it by moving parent association to kernfs_new_node() from
      kernfs_add_one().  kernfs_new_node() now takes @parent instead of
      @root and determines the root from the parent and also sets the new
      node's parent properly.  @parent parameter is removed from
      kernfs_add_one().  As there's no parent when creating the root node,
      __kernfs_new_node() which takes @root as before and doesn't set the
      parent is used in that case.
      
      This ensures that a kernfs_node in any stage in its life has its
      parent associated and thus can be put.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db4aad20
  22. 14 1月, 2014 4 次提交