1. 07 1月, 2011 2 次提交
  2. 01 5月, 2009 1 次提交
    • L
      configfs: Silence lockdep on mkdir() and rmdir() · e74cc06d
      Louis Rilling 提交于
      When attaching default groups (subdirs) of a new group (in mkdir() or
      in configfs_register()), configfs recursively takes inode's mutexes
      along the path from the parent of the new group to the default
      subdirs. This is needed to ensure that the VFS will not race with
      operations on these sub-dirs. This is safe for the following reasons:
      
      - the VFS allows one to lock first an inode and second one of its
        children (The lock subclasses for this pattern are respectively
        I_MUTEX_PARENT and I_MUTEX_CHILD);
      - from this rule any inode path can be recursively locked in
        descending order as long as it stays under a single mountpoint and
        does not follow symlinks.
      
      Unfortunately lockdep does not know (yet?) how to handle such
      recursion.
      
      I've tried to use Peter Zijlstra's lock_set_subclass() helper to
      upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
      that we might recursively lock some of their descendant, but this
      usage does not seem to fit the purpose of lock_set_subclass() because
      it leads to several i_mutex locked with subclass I_MUTEX_PARENT by
      the same task.
      
      >From inside configfs it is not possible to serialize those recursive
      locking with a top-level one, because mkdir() and rmdir() are already
      called with inodes locked by the VFS. So using some
      mutex_lock_nest_lock() is not an option.
      
      I am proposing two solutions:
      1) one that wraps recursive mutex_lock()s with
         lockdep_off()/lockdep_on().
      2) (as suggested earlier by Peter Zijlstra) one that puts the
         i_mutexes recursively locked in different classes based on their
         depth from the top-level config_group created. This
         induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
         nesting of configfs default groups whenever lockdep is activated
         but this limit looks reasonably high. Unfortunately, this also
         isolates VFS operations on configfs default groups from the others
         and thus lowers the chances to detect locking issues.
      
      Nobody likes solution 1), which I can understand.
      
      This patch implements solution 2). However lockdep is still not happy with
      configfs_depend_item(). Next patch reworks the locking of
      configfs_depend_item() and finally makes lockdep happy.
      
      [ Note: This hides a few locking interactions with the VFS from lockdep.
        That was my big concern, because we like lockdep's protection.  However,
        the current state always dumps a spurious warning.  The locking is
        correct, so I tell people to ignore the warning and that we'll keep
        our eyes on the locking to make sure it stays correct.  With this patch,
        we eliminate the warning.  We do lose some of the lockdep protections,
        but this only means that we still have to keep our eyes on the locking.
        We're going to do that anyway.  -- Joel ]
      Signed-off-by: NLouis Rilling <louis.rilling@kerlabs.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      e74cc06d
  3. 01 8月, 2008 2 次提交
    • L
      [PATCH] configfs: Prevent userspace from creating new entries under attaching directories · 2a109f2a
      Louis Rilling 提交于
      process 1: 					process 2:
      configfs_mkdir("A")
        attach_group("A")
          attach_item("A")
            d_instantiate("A")
          populate_groups("A")
            mutex_lock("A")
            attach_group("A/B")
              attach_item("A")
                d_instantiate("A/B")
      						mkdir("A/B/C")
      						  do_path_lookup("A/B/C", LOOKUP_PARENT)
      						    ok
      						  lookup_create("A/B/C")
      						    mutex_lock("A/B")
      						    ok
      						  configfs_mkdir("A/B/C")
      						    ok
            attach_group("A/C")
              attach_item("A/C")
                d_instantiate("A/C")
              populate_groups("A/C")
                mutex_lock("A/C")
                attach_group("A/C/D")
                  attach_item("A/C/D")
                    failure
                mutex_unlock("A/C")
                detach_groups("A/C")
                  nothing to do
      						mkdir("A/C/E")
      						  do_path_lookup("A/C/E", LOOKUP_PARENT)
      						    ok
      						  lookup_create("A/C/E")
      						    mutex_lock("A/C")
      						    ok
      						  configfs_mkdir("A/C/E")
      						    ok
              detach_item("A/C")
              d_delete("A/C")
            mutex_unlock("A")
            detach_groups("A")
              mutex_lock("A/B")
              detach_group("A/B")
      	  detach_groups("A/B")
      	    nothing since no _default_ group
                detach_item("A/B")
              mutex_unlock("A/B")
              d_delete("A/B")
          detach_item("A")
          d_delete("A")
      
      Two bugs:
      
      1/ "A/B/C" and "A/C/E" are created, but never removed while their parent are
      removed in the end. The same could happen with symlink() instead of mkdir().
      
      2/ "A" and "A/C" inodes are not locked while detach_item() is called on them,
         which may probably confuse VFS.
      
      This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before
      building the inode and instantiating the dentry, and validating the whole
      group+default groups hierarchy in a second pass by clearing
      CONFIGFS_USET_CREATING.
      	mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if
      called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This
      does not prevent userspace from calling stat() successfuly on such directories,
      but this prevents userspace from adding (children to | symlinking from/to |
      read/write attributes of | listing the contents of) not validated items. In
      other words, userspace will not interact with the subsystem on a new item until
      the new item creation completes correctly.
      	It was first proposed to re-use CONFIGFS_USET_IN_MKDIR instead of a new
      flag CONFIGFS_USET_CREATING, but this generated conflicts when checking the
      target of a new symlink: a valid target directory in the middle of attaching
      a new user-created child item could be wrongly detected as being attached.
      
      2/ is fixed by next commit.
      Signed-off-by: NLouis Rilling <louis.rilling@kerlabs.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      2a109f2a
    • L
      [PATCH] configfs: Fix failing symlink() making rmdir() fail · 9a73d78c
      Louis Rilling 提交于
      On a similar pattern as mkdir() vs rmdir(), a failing symlink() may make rmdir()
      fail for the symlink's parent and the symlink's target as well.
      
      failing symlink() making target's rmdir() fail:
      
      	process 1:				process 2:
      	symlink("A/S" -> "B")
      	  allow_link()
      	  create_link()
      	    attach to "B" links list
      						rmdir("B")
      						  detach_prep("B")
      						    error because of new link
      	    configfs_create_link("A", "S")
      	      error (eg -ENOMEM)
      
      failing symlink() making parent's rmdir() fail:
      
      	process 1:				process 2:
      	symlink("A/D/S" -> "B")
      	  allow_link()
      	  create_link()
      	    attach to "B" links list
      	    configfs_create_link("A/D", "S")
      	      make_dirent("A/D", "S")
      						rmdir("A")
      						  detach_prep("A")
      						    detach_prep("A/D")
      						      error because of "S"
      	      create("S")
      	        error (eg -ENOMEM)
      
      We cannot use the same solution as for mkdir() vs rmdir(), since rmdir() on the
      target cannot wait on the i_mutex of the new symlink's parent without risking a
      deadlock (with other symlink() or sys_rename()). Instead we define a global
      mutex protecting all configfs symlinks attachment, so that rmdir() can avoid the
      races above.
      Signed-off-by: NLouis Rilling <louis.rilling@kerlabs.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9a73d78c
  4. 15 7月, 2008 2 次提交
    • L
      configfs: Fix failing mkdir() making racing rmdir() fail · 6d8344ba
      Louis Rilling 提交于
      When fixing the rename() vs rmdir() deadlock, we stopped locking default groups'
      inodes in configfs_detach_prep(), letting racing mkdir() in default groups
      proceed concurrently. This enables races like below happen, which leads to a
      failing mkdir() making rmdir() fail, despite the group to remove having no
      user-created directory under it in the end.
      
      	process A: 			process B:
      	/* PWD=A/B */
      	mkdir("C")
      	  make_item("C")
      	  attach_group("C")
      					rmdir("A")
      					  detach_prep("A")
      					    detach_prep("B")
      					      error because of "C"
      					  return -ENOTEMPTY
      	    attach_group("C/D")
      	      error (eg -ENOMEM)
      	  return -ENOMEM
      
      This patch prevents such scenarii by making rmdir() wait as long as
      detach_prep() fails because a racing mkdir() is in the middle of attach_group().
      To achieve this, mkdir() sets a flag CONFIGFS_USET_IN_MKDIR in parent's
      configfs_dirent before calling attach_group(), and clears the flag once
      attach_group() is done. detach_prep() fails with -EAGAIN whenever the flag is
      hit and returns the guilty inode's mutex so that rmdir() can wait on it.
      Signed-off-by: NLouis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      6d8344ba
    • L
      configfs: Introduce configfs_dirent_lock · 6f610764
      Louis Rilling 提交于
      This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
      traversals against linkage mutations (add/del/move). This will allow
      configfs_detach_prep() to avoid locking i_mutexes.
      
      Locking rules for configfs_dirent linkage mutations are the same plus the
      requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
      either take appropriate i_mutex as before, or take configfs_dirent_lock.
      
      The spinlock could actually be a mutex, but the critical sections are either
      O(1) or should not be too long (default groups walking in last patch).
      
      ChangeLog:
        - Clarify the comment on configfs_dirent_lock usage
        - Move sd->s_element init before linking the new dirent
        - In lseek(), do not release configfs_dirent_lock before the dirent is
          relinked.
      Signed-off-by: NLouis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      6f610764
  5. 17 10月, 2007 1 次提交
  6. 11 7月, 2007 1 次提交
    • J
      configfs: config item dependancies. · 631d1feb
      Joel Becker 提交于
      Sometimes other drivers depend on particular configfs items.  For
      example, ocfs2 mounts depend on a heartbeat region item.  If that
      region item is removed with rmdir(2), the ocfs2 mount must BUG or go
      readonly.  Not happy.
      
      This provides two additional API calls: configfs_depend_item() and
      configfs_undepend_item().  A client driver can call
      configfs_depend_item() on an existing item to tell configfs that it is
      depended on.  configfs will then return -EBUSY from rmdir(2) for that
      item.  When the item is no longer depended on, the client driver calls
      configfs_undepend_item() on it.
      
      These API cannot be called underneath any configfs callbacks, as
      they will conflict.  They can block and allocate.  A client driver
      probably shouldn't calling them of its own gumption.  Rather it should
      be providing an API that external subsystems call.
      
      How does this work?  Imagine the ocfs2 mount process.  When it mounts,
      it asks for a heart region item.  This is done via a call into the
      heartbeat code.  Inside the heartbeat code, the region item is looked
      up.  Here, the heartbeat code calls configfs_depend_item().  If it
      succeeds, then heartbeat knows the region is safe to give to ocfs2.
      If it fails, it was being torn down anyway, and heartbeat can gracefully
      pass up an error.
      
      [ Fixed some bad whitespace in configfs.txt. --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      631d1feb
  7. 13 2月, 2007 1 次提交
  8. 08 12月, 2006 1 次提交
  9. 29 3月, 2006 1 次提交
  10. 04 2月, 2006 1 次提交
  11. 04 1月, 2006 1 次提交