1. 03 Feb 2009, 3 commits
    • configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() · 0e033342
      Committed by Joel Becker
      When attaching default groups (subdirs) of a new group (in mkdir() or
      in configfs_register()), configfs recursively takes the inodes' mutexes
      along the path from the parent of the new group to the default
      subdirs. This is needed to ensure that the VFS will not race with
      operations on these sub-dirs. This is safe for the following reasons:
      
      - the VFS allows one to lock first an inode and then one of its
        children (the lock subclasses for this pattern are I_MUTEX_PARENT
        and I_MUTEX_CHILD, respectively);
      - from this rule any inode path can be recursively locked in
        descending order as long as it stays under a single mountpoint and
        does not follow symlinks.
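
      The descending lock order described above can be sketched in userspace
      C. This is a minimal illustration under stated assumptions, not configfs
      code: `struct group`, `lock_default_groups()` and
      `unlock_default_groups()` are hypothetical names, a `pthread_mutex_t`
      stands in for each directory inode's i_mutex, and the caller is assumed
      to already hold the top-level group's mutex, as the VFS does for mkdir().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CHILDREN 4  /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a config_group: one mutex (playing the role of
 * the directory inode's i_mutex) plus its default subgroups. */
struct group {
    pthread_mutex_t mutex;
    struct group *children[MAX_CHILDREN];
    size_t nr_children;
};

static void group_init(struct group *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    g->nr_children = 0;
}

static void group_add_child(struct group *parent, struct group *child)
{
    assert(parent->nr_children < MAX_CHILDREN);
    parent->children[parent->nr_children++] = child;
}

/* Lock all default subgroups below g, which the caller already holds.
 * Each child is locked while its parent is held, then we descend into it,
 * so along any path the order is always parent before child.
 * Returns the number of mutexes taken. */
static size_t lock_default_groups(struct group *g)
{
    size_t taken = 0;
    for (size_t i = 0; i < g->nr_children; i++) {
        struct group *child = g->children[i];
        pthread_mutex_lock(&child->mutex);
        taken++;
        taken += lock_default_groups(child);
    }
    return taken;
}

/* Release in exactly the reverse order of lock_default_groups(). */
static void unlock_default_groups(struct group *g)
{
    for (size_t i = g->nr_children; i-- > 0;) {
        struct group *child = g->children[i];
        unlock_default_groups(child);
        pthread_mutex_unlock(&child->mutex);
    }
}
```

      For example, with root → {a, b} and a → {c}, a caller holding root's
      mutex takes a, c, b in that order and releases b, c, a. The sketch only
      shows the acquisition order; it does not model the subclass annotations
      that lockdep relies on.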
      
      Unfortunately lockdep does not know (yet?) how to handle such
      recursion.
      
      I've tried to use Peter Zijlstra's lock_set_subclass() helper to
      upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
      that we might recursively lock some of their descendants, but this
      usage does not seem to fit the purpose of lock_set_subclass() because
      it leads to several i_mutexes being locked with subclass
      I_MUTEX_PARENT by the same task.
      
      From inside configfs it is not possible to serialize this recursive
      locking with a top-level lock, because mkdir() and rmdir() are already
      called with inodes locked by the VFS. So using
      mutex_lock_nest_lock() is not an option.
      
      I am proposing two solutions:
      1) one that wraps recursive mutex_lock()s with
         lockdep_off()/lockdep_on().
      2) (as suggested earlier by Peter Zijlstra) one that puts the
         i_mutexes recursively locked in different classes based on their
         depth from the top-level config_group created. This
         induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
         nesting of configfs default groups whenever lockdep is activated
         but this limit looks reasonably high. Unfortunately, this also
         isolates VFS operations on configfs default groups from the others
         and thus lowers the chances of detecting locking issues.
      
      This patch implements solution 1).
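
      To see why the recursion bothers a class-based validator, and what
      solution 1) trades away, here is a toy checker in C. It is hypothetical
      and far simpler than lockdep: it only counts held locks per class, and
      `toy_lockdep_off()`/`toy_lockdep_on()` merely mimic the idea of
      lockdep_off()/lockdep_on() by suspending all checks for the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy validator (hypothetical, not the kernel's lockdep): it only knows lock
 * classes, so holding two locks of the same class looks like recursion. */
struct toy_lockdep {
    int held_per_class[8];  /* locks of each class currently held */
    int off;                /* > 0 while checking is suspended */
    int reports;            /* "possible recursive locking" warnings */
};

enum { CLASS_CHILD = 1 };   /* every nested i_mutex gets the same class */

static void toy_lockdep_off(struct toy_lockdep *ld) { ld->off++; }
static void toy_lockdep_on(struct toy_lockdep *ld)  { ld->off--; }

static void toy_acquire(struct toy_lockdep *ld, int cls)
{
    if (ld->off > 0)
        return;                     /* checks suspended: record nothing */
    if (ld->held_per_class[cls] > 0)
        ld->reports++;              /* same class already held: warn */
    ld->held_per_class[cls]++;
}

static void toy_release(struct toy_lockdep *ld, int cls)
{
    if (ld->off > 0)
        return;
    ld->held_per_class[cls]--;
}

/* Take and release `depth` nested locks of one class, optionally inside an
 * off/on window that covers both the locking and the unlocking. */
static void lock_chain(struct toy_lockdep *ld, int depth, bool silence)
{
    if (silence)
        toy_lockdep_off(ld);
    for (int i = 0; i < depth; i++)
        toy_acquire(ld, CLASS_CHILD);
    for (int i = 0; i < depth; i++)
        toy_release(ld, CLASS_CHILD);
    if (silence)
        toy_lockdep_on(ld);
}
```

      A chain of three nested locks produces two spurious reports without the
      window and none with it. The flip side is that any genuine ordering bug
      inside the window would go unreported as well.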
      
      Solution 2) looks better from lockdep's point of view, but fails with
      configfs_depend_item(). Fixing that requires reworking the locking
      scheme of configfs_depend_item() to remove the variable lock recursion
      depth, and I think that this is doable thanks to configfs_dirent_lock.
      For now, let's stick to solution 1).
      Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Fix possible deadlock in ocfs2_write_dquot() · f8afead7
      Committed by Jan Kara
      It could happen that some limit has been set via quotactl() while, in
      parallel, ->mark_dirty() is called from another thread doing e.g.
      dquot_alloc_space(). In such a case ocfs2_write_dquot() must not try to
      sync the dquot, because that needs the global quota lock, which ranks
      above transaction start.
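
      A minimal sketch of the rule in C, assuming a simple numeric lock
      ranking in which the global quota lock must be taken before a
      transaction is started. `struct task_ctx`, `toy_write_dquot()` and the
      rank values are hypothetical; the actual ocfs2 fix may differ in how it
      defers the sync.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be acquired before a higher one.
 * Assumption: "ranks above transaction start" = taken before it. */
enum rank { RANK_NONE = 0, RANK_GLOBAL_QUOTA = 1, RANK_TRANSACTION = 2 };

struct task_ctx {
    enum rank highest_held;  /* highest-ranked lock the task holds */
    int order_violations;    /* acquisitions that broke the ranking */
};

/* Acquire a lock of rank r; returns the previous level for release. */
static enum rank toy_take(struct task_ctx *t, enum rank r)
{
    enum rank prev = t->highest_held;
    if (r <= t->highest_held)
        t->order_violations++;       /* would invert the lock order */
    else
        t->highest_held = r;
    return prev;
}

static void toy_release(struct task_ctx *t, enum rank prev)
{
    t->highest_held = prev;
}

struct toy_dquot { bool dirty; bool synced; };

/* Sync to the global quota file only when no transaction is running;
 * otherwise leave the dquot dirty so a later, lock-safe path can sync it. */
static void toy_write_dquot(struct task_ctx *t, struct toy_dquot *dq)
{
    if (t->highest_held >= RANK_TRANSACTION) {
        dq->dirty = true;            /* defer: syncing needs the global lock */
        return;
    }
    enum rank p1 = toy_take(t, RANK_GLOBAL_QUOTA);
    enum rank p2 = toy_take(t, RANK_TRANSACTION);
    dq->synced = true;
    dq->dirty = false;
    toy_release(t, p2);
    toy_release(t, p1);
}
```
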
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Push out dropping of dentry lock to ocfs2_wq · ea455f8a
      Committed by Jan Kara
      Dropping the last reference to a dentry lock is a complicated operation
      that involves dropping a reference to the inode. This can get
      complicated, and the quota code in particular needs to obtain some quota
      locks, which leads to a potential deadlock. Thus we defer dropping the
      inode reference to ocfs2_wq.
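
      The deferral pattern can be sketched in C with a hypothetical,
      single-threaded queue standing in for ocfs2_wq: the last put of a dentry
      lock only queues the inode, and the inode reference is dropped later by
      the worker. None of these types or functions are ocfs2's own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: a dentry lock that pins a reference to an inode. */
struct toy_inode { int refcount; };
struct toy_dentry_lock { int refcount; struct toy_inode *inode; };

/* Stand-in for a work queue: inodes whose reference still has to be
 * dropped, drained later from a context that holds no caller locks. */
#define QUEUE_CAP 16
struct toy_wq { struct toy_inode *pending[QUEUE_CAP]; size_t len; };

static void toy_iput(struct toy_inode *inode)
{
    /* In the real code this step may need quota locks; here it only
     * drops the reference. */
    inode->refcount--;
}

/* Drop one dentry-lock reference. On the last one, do not call toy_iput()
 * in the caller's (possibly lock-holding) context: queue it instead. */
static void toy_dentry_lock_put(struct toy_wq *wq, struct toy_dentry_lock *dl)
{
    if (--dl->refcount > 0)
        return;
    assert(wq->len < QUEUE_CAP);
    wq->pending[wq->len++] = dl->inode;
    dl->inode = NULL;
}

/* Worker side: runs later, without the caller's locks held. */
static void toy_wq_run(struct toy_wq *wq)
{
    for (size_t i = 0; i < wq->len; i++)
        toy_iput(wq->pending[i]);
    wq->len = 0;
}
```
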
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  2. 02 Feb 2009, 1 commit
    • Manually revert "mlock: downgrade mmap sem while populating mlocked regions" · 27421e21
      Committed by Linus Torvalds
      This essentially reverts commit 8edb08ca.
      
      It downgraded our mmap semaphore to a read-lock while mlocking pages, in
      order to allow other threads (and external accesses like "ps" et al) to
      walk the vma lists and take page faults etc.  Which is a nice idea, but
      the implementation does not work.
      
      Because we cannot upgrade the lock back to a write lock without
      releasing the mmap semaphore, the code had to release the lock entirely
      and then re-take it as a writelock.  However, that meant that the caller
      possibly lost the vma chain that it was following, since now another
      thread could come in and mmap/munmap the range.
      
      The code tried to work around that by just looking up the vma again and
      erroring out if that happened, but quite frankly, that was just a buggy
      hack that doesn't actually protect against anything (the other thread
      could just have replaced the vma with another one instead of totally
      unmapping it).
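
      A toy C model of why the re-lookup check is insufficient. The
      array-based `struct toy_mm` and the helper names are hypothetical; the
      point is that a lookup by address succeeds whether or not the VMA it
      finds is the one the caller was following.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified address space: a small array of VMAs,
 * each tagged with an id that identifies the mapping. */
struct toy_vma { unsigned long start, end; int id; };
struct toy_mm { struct toy_vma vmas[4]; size_t nr; int next_id; };

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
    for (size_t i = 0; i < mm->nr; i++)
        if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
            return &mm->vmas[i];
    return NULL;
}

/* What another thread can do while the semaphore is fully released:
 * munmap the range and mmap a different mapping at the same addresses. */
static void toy_replace_vma(struct toy_mm *mm, struct toy_vma *vma)
{
    vma->id = mm->next_id++;   /* same range, different mapping */
}

/* The reverted workaround, in spirit: after re-taking the lock, only check
 * that some VMA still covers the address. */
static bool toy_revalidate_by_lookup(struct toy_mm *mm, unsigned long addr)
{
    return toy_find_vma(mm, addr) != NULL;
}
```

      After a replacement the check still passes, even though the VMA now
      found is a different mapping from the one the caller had been walking.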
      
      The only way to downgrade to a read map _reliably_ is to do it at the
      end, which is likely the right thing to do: do all the 'vma' operations
      with the write-lock held, then downgrade to a read after completing them
      all, and then do the "populate the newly mlocked regions" while holding
      just the read lock.  And then just drop the read-lock and return to user
      space.
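
      The first ordering can be sketched with a toy reader/writer semaphore in
      C. POSIX `pthread_rwlock_t` has no downgrade operation, so
      `struct toy_rwsem` and its functions are hypothetical, built from a
      mutex and a condition variable; the kernel's rw_semaphore provides
      downgrade_write() for this purpose. `toy_mlock()` is likewise a
      hypothetical outline, not mm/mlock.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy reader/writer semaphore with an atomic write-to-read downgrade. */
struct toy_rwsem {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int readers;   /* active readers */
    bool writer;   /* a writer holds the semaphore */
};

static void toy_rwsem_init(struct toy_rwsem *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->readers = 0;
    s->writer = false;
}

static void toy_down_write(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->writer || s->readers > 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->writer = true;
    pthread_mutex_unlock(&s->lock);
}

/* The writer becomes a reader without the semaphore ever being free, so no
 * other writer can change the VMAs between the two phases. */
static void toy_downgrade_write(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->writer = false;
    s->readers = 1;
    pthread_cond_broadcast(&s->cond);   /* waiting readers may enter */
    pthread_mutex_unlock(&s->lock);
}

static void toy_up_read(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    if (--s->readers == 0)
        pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* All VMA changes under the write lock, one downgrade, populate under the
 * read lock, then drop it before returning to user space. */
static int toy_mlock(struct toy_rwsem *mmap_sem, int nr_regions, int *populated)
{
    toy_down_write(mmap_sem);
    /* ...apply the mlock changes to every affected VMA here (omitted)... */
    toy_downgrade_write(mmap_sem);
    for (int i = 0; i < nr_regions; i++)
        (*populated)++;          /* stand-in for faulting the pages in */
    toy_up_read(mmap_sem);
    return 0;
}
```

      The design point is that the lock is never released between the VMA
      changes and the population phase, unlike dropping it and re-taking it
      as a write lock.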
      
      The (perhaps somewhat simpler) alternative is to just make all the
      callers of mlock_vma_pages_range() know that the mmap lock got dropped,
      and just re-grab the mmap semaphore if it needs to mlock more than one
      vma region.
      
      So we can do this "downgrade mmap sem while populating mlocked regions"
      thing right, but the way it was done here was absolutely not correct.
      Thus the revert, in the expectation that we will do it all correctly
      some day.
      
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Feb 2009, 11 commits
  4. 31 Jan 2009, 25 commits