1. 04 4月, 2014 2 次提交
    • X
      ocfs2: __ocfs2_mknod_locked should return error when ocfs2_create_new_inode_locks() failed · 466e68c4
      Xue jiufei 提交于
      When ocfs2_create_new_inode_locks() return error, inode open lock may
      not be obtainted for this inode.  So other nodes can remove this file
      and free dinode when inode still remain in memory on this node, which is
      not correct and may trigger BUG.  So __ocfs2_mknod_locked should return
      error when ocfs2_create_new_inode_locks() failed.
      
                    Node_1                              Node_2
      create fileA, call ocfs2_mknod()
        -> ocfs2_get_init_inode(), allocate inodeA
        -> ocfs2_claim_new_inode(), claim dinode(dinodeA)
        -> call ocfs2_create_new_inode_locks(),
           create open lock failed, return error
        -> __ocfs2_mknod_locked return success
      
                                                      unlink fileA
                                                      try open lock succeed,
                                                      and free dinodeA
      
      create another file, call ocfs2_mknod()
        -> ocfs2_get_init_inode(), allocate inodeB
        -> ocfs2_claim_new_inode(), as Node_2 had freed dinodeA,
           so claim dinodeA and update generation for dinodeA
      
      call __ocfs2_drop_dl_inodes()->ocfs2_delete_inode()
      to free inodeA, and finally triggers BUG
      on(inode->i_generation != le32_to_cpu(fe->i_generation))
      in function ocfs2_inode_lock_update().
      Signed-off-by: Njoyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      466e68c4
    • D
      ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb
      Darrick J. Wong 提交于
      Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
      transaction to complete.  This isn't terribly efficient, since sync_file
      really only needs to wait for the last transaction involving that inode
      to complete, and this doesn't require i_mutex.
      
      Therefore, implement the necessary bits to track the newest tid
      associated with an inode, and teach sync_file to wait for that instead
      of waiting for everything in the journal to commit.  Furthermore, only
      issue the flush request to the drive if jbd2 hasn't already done so.
      
      This also eliminates the deadlock between ocfs2_file_aio_write() and
      ocfs2_sync_file().  aio_write takes i_mutex then calls
      ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
      However, if that dio completion involves calling fsync, then we can get
      into trouble when some ocfs2_sync_file tries to take i_mutex.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2931cdcb
  2. 11 2月, 2014 1 次提交
    • X
      ocfs2: check existence of old dentry in ocfs2_link() · 0e048316
      Xue jiufei 提交于
      System call linkat first calls user_path_at(), check the existence of
      old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
      work.  There may exist a race when Node A create a hard link for file
      while node B rm it.
      
               Node A                          Node B
      user_path_at()
        ->ocfs2_lookup(),
      find old dentry exist
                                      rm file, add inode say inodeA
                                      to orphan_dir
      
      call ocfs2_link(),create a
      hard link for inodeA.
      
                                      rm the link, add inodeA to orphan_dir
                                      again
      
      When orphan_scan work start, it calls ocfs2_queue_orphans() to do the
      main work.  It first tranverses entrys in orphan_dir, linking all inodes
      in this orphan_dir to a list look like this:
      
      	inodeA->inodeB->...->inodeA
      
      When tranvering this list, it will fall into loop, calling iput() again
      and again.  And finally trigger BUG_ON(inode->i_state & I_CLEAR).
      Signed-off-by: Njoyce <xuejiufei@huawei.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e048316
  3. 28 1月, 2014 1 次提交
  4. 26 1月, 2014 1 次提交
  5. 13 11月, 2013 1 次提交
  6. 04 7月, 2013 3 次提交
  7. 13 6月, 2013 2 次提交
  8. 13 2月, 2013 1 次提交
  9. 14 7月, 2012 2 次提交
  10. 30 5月, 2012 1 次提交
  11. 14 2月, 2012 1 次提交
  12. 04 1月, 2012 4 次提交
  13. 02 11月, 2011 3 次提交
  14. 26 7月, 2011 1 次提交
  15. 20 7月, 2011 1 次提交
  16. 26 5月, 2011 3 次提交
  17. 31 3月, 2011 1 次提交
  18. 23 2月, 2011 1 次提交
  19. 07 3月, 2011 1 次提交
    • T
      ocfs2: Remove EXIT from masklog. · c1e8d35e
      Tao Ma 提交于
      mlog_exit is used to record the exit status of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      This patch just try to remove it or change it. So:
      1. if all the error paths already use mlog_errno, it is just removed.
         Otherwise, it will be replaced by mlog_errno.
      2. if it is used to print some return value, it is replaced with
         mlog(0,...).
      mlog_exit_ptr is changed to mlog(0.
      All those mlog(0,...) will be replaced with trace events later.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      c1e8d35e
  20. 21 2月, 2011 1 次提交
    • T
      ocfs2: Remove ENTRY from masklog. · ef6b689b
      Tao Ma 提交于
      ENTRY is used to record the entry of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      So for mlog_entry_void, we just remove it.
      for mlog_entry(...), we replace it with mlog(0,...), and they
      will be replace by trace event later.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      ef6b689b
  21. 02 2月, 2011 1 次提交
    • E
      fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris 提交于
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and than userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement new
      behavior, that is obviously contained in a seperate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      2a7dba39
  22. 13 1月, 2011 1 次提交
  23. 07 1月, 2011 1 次提交
    • N
      fs: dcache reduce branches in lookup path · fb045adb
      Nick Piggin 提交于
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fb045adb
  24. 23 12月, 2010 1 次提交
  25. 26 10月, 2010 1 次提交
  26. 11 9月, 2010 1 次提交
    • G
      Track negative entries v3 · 5e98d492
      Goldwyn Rodrigues 提交于
      Track negative dentries by recording the generation number of the parent
      directory in d_fsdata. The generation number for the parent directory is
      recorded in the inode_info, which increments every time the lock on the
      directory is dropped.
      
      If the generation number of the parent directory and the negative dentry
      matches, there is no need to perform the revalidate, else a revalidate
      is forced. This improves performance in situations where nodes look for
      the same non-existent file multiple times.
      
      Thanks Mark for explaining the DLM sequence.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      5e98d492
  27. 08 9月, 2010 2 次提交
    • M
      ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan · 97b8f4a9
      Mark Fasheh 提交于
      ocfs2_create_inode_in_orphan() is used by reflink to create the newly
      reflinked inode simultaneously in the orphan dir. This allows us to easily
      handle partially-reflinked files during recovery cleanup.
      
      We have a problem though - the orphan dir stringifies inode # to determine
      a unique name under which the orphan entry dirent can be created. Since
      ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir
      before it can allocate the inode, we currently call into the orphan code:
      
             /*
              * We give the orphan dir the root blkno to fake an orphan name,
              * and allocate enough space for our insertion.
              */
             status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
                                               osb->root_blkno,
                                               orphan_name, &orphan_insert);
      
      Using osb->root_blkno might work fine on unindexed directories, but the
      orphan dir can have an index.  When it has that index, the above code fails
      to allocate the proper index entry.  Later, when we try to remove the file
      from the orphan dir (using the actual inode #), the reflink operation will
      fail.
      
      To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the
      newly split out orphan and inode alloc code to figure out what the inode
      block number will be (once allocated) and then prepare the orphan dir from
      that data.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      97b8f4a9
    • M
      ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions · dd43bcde
      Mark Fasheh 提交于
      We do this because ocfs2_create_inode_in_orphan() wants to order locking of
      the orphan dir with respect to locking of the inode allocator *before*
      making any changes to the directory.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      dd43bcde