1. 11 Feb, 2015 6 commits
  2. 09 Jan, 2015 2 commits
  3. 19 Dec, 2014 3 commits
    • ocfs2: fix journal commit deadlock · 136f49b9
      Committed by Junxiao Bi
      For a buffered write, the page lock is taken in write_begin and released
      in write_end.  In ocfs2_write_end_nolock(), before the page is unlocked
      in ocfs2_free_write_ctxt(), ocfs2_run_deallocs() is called, and this asks
      for the read lock of journal->j_trans_barrier.  Holding the page lock
      while asking for journal->j_trans_barrier breaks the locking order.
      
      This causes a deadlock with the journal commit threads: ocfs2cmt takes
      the write lock of journal->j_trans_barrier first, then wakes up
      kjournald2 to do the commit work, and finally waits until it is done.
      To commit the journal, kjournald2 needs to flush data first, which
      requires the cache page lock.
      
      Since some ocfs2 cluster locks are held by the writing process, this
      deadlock may hang the whole cluster.
      
      Unlocking the pages before ocfs2_run_deallocs() fixes the locking order.
      Also put the unlock before ocfs2_commit_trans() so that the page lock is
      released before j_trans_barrier, preserving the unlocking order.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
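
The locking order described in 136f49b9 can be sketched in userspace as a small rank checker. This is only an illustrative model, not the ocfs2 code: the ranks, the `held` array and every function name here (`model_lock`, `write_end_old`, `write_end_fixed`, and so on) are hypothetical, and the only thing taken from the commit message is the rule that j_trans_barrier must be requested before the page lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Lower rank must be taken first. The ranks are an assumption that
 * encodes "j_trans_barrier before page lock" from the commit message. */
enum lock_rank { RANK_TRANS_BARRIER = 1, RANK_PAGE = 2 };

static bool held[3];         /* held[rank] is true while that lock is held */
static int order_violations; /* acquisitions made while a higher rank was held */

static void model_lock(enum lock_rank r)
{
    /* Taking a lock while a higher-ranked one is held breaks the order. */
    for (int higher = r + 1; higher < 3; higher++)
        if (held[higher])
            order_violations++;
    held[r] = true;
}

static void model_unlock(enum lock_rank r)
{
    held[r] = false;
}

/* Stand-in for ocfs2_run_deallocs(): takes the barrier briefly. */
static void run_deallocs_model(void)
{
    model_lock(RANK_TRANS_BARRIER);
    model_unlock(RANK_TRANS_BARRIER);
}

/* Old ordering: deallocs run while the page is still locked. */
static int write_end_old(void)
{
    order_violations = 0;
    model_lock(RANK_PAGE);   /* taken in write_begin */
    run_deallocs_model();    /* barrier requested under the page lock */
    model_unlock(RANK_PAGE);
    return order_violations;
}

/* Fixed ordering: unlock the page, then run deallocs. */
static int write_end_fixed(void)
{
    order_violations = 0;
    model_lock(RANK_PAGE);   /* taken in write_begin */
    model_unlock(RANK_PAGE); /* page released first */
    run_deallocs_model();
    return order_violations;
}
```

Calling `write_end_old()` reports one ordering violation and `write_end_fixed()` reports none, which is the whole point of moving the unlock ahead of ocfs2_run_deallocs().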
    • ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker · 1e589581
      Committed by Joseph Qi
      Commit ac4fef4d ("ocfs2/dlm: do not purge lockres that is queued for
      assert master") can lead to the following race:
      
        dlm_dispatch_assert_master       dlm_wq
        ========================================================================
        queue_work(dlm->dlm_worker,
            &dlm->dispatched_work);
                                       dispatch work,
                                       dlm_lockres_drop_inflight_worker
                                       *BUG_ON(res->inflight_assert_workers == 0)*
        dlm_lockres_grab_inflight_worker
        inflight_assert_workers++
      
      So ensure inflight_assert_workers is incremented first, before the work
      is queued.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Xue jiufei <xuejiufei@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
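
The interleaving in 1e589581 can be replayed deterministically in a single-threaded sketch. This is a model under stated assumptions, not the dlm code: `struct lockres_model`, `queue_work_model()` and the two `dispatch_*` helpers are hypothetical names, and the worker is assumed to run the instant it is queued, which is the worst case the diagram describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the lock resource; only the counter matters here. */
struct lockres_model {
    int inflight_assert_workers;
    bool underflow; /* set where the kernel would hit BUG_ON() */
};

static void grab_inflight_worker(struct lockres_model *res)
{
    res->inflight_assert_workers++;
}

static void drop_inflight_worker(struct lockres_model *res)
{
    if (res->inflight_assert_workers == 0) {
        res->underflow = true; /* models BUG_ON(inflight_assert_workers == 0) */
        return;
    }
    res->inflight_assert_workers--;
}

/* Assumption: the queued work runs immediately and drops the counter. */
static void queue_work_model(struct lockres_model *res)
{
    drop_inflight_worker(res);
}

/* Racy order from the diagram: queue first, grab afterwards. */
static bool dispatch_queue_then_grab(void)
{
    struct lockres_model res = { 0, false };
    queue_work_model(&res);
    grab_inflight_worker(&res);
    return res.underflow;
}

/* Fixed order: the counter is raised before the work can run. */
static bool dispatch_grab_then_queue(void)
{
    struct lockres_model res = { 0, false };
    grab_inflight_worker(&res);
    queue_work_model(&res);
    return res.underflow;
}
```

In this model only `dispatch_queue_then_grab()` trips the underflow check; raising the counter before queueing removes the window.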
    • ocfs2: reflink: fix slow unlink for refcounted file · f62f12b3
      Committed by Junxiao Bi
      When running the multiple-node reflink stress test from the ocfs2 test
      suite on a 4-node cluster, every unlink() of a refcounted file takes
      about 700s.
      
      The slow unlink is caused by contention on the refcount tree lock, since
      all nodes are unlinking files that use the same refcount tree.  When the
      file being unlinked has many extents (over 1600 in our test), most of the
      extents have the refcounted flag set.  ocfs2_commit_truncate() executes
      the following call trace for every extent, which means it acquires and
      releases the refcount tree lock about 1600 times.  When several nodes do
      this at the same time, performance is very poor.
      
        ocfs2_remove_btree_range()
        --  ocfs2_lock_refcount_tree()
        ----  ocfs2_refcount_lock()
        ------  __ocfs2_cluster_lock()
      
      ocfs2_refcount_lock() is costly.  Moving it to ocfs2_commit_truncate() so
      that the lock/unlock is done only once improves performance considerably.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Wengang <wen.gang.wang@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
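
The effect of hoisting the lock in f62f12b3 can be shown by counting acquisitions in a toy model. Everything below is hypothetical scaffolding (`refcount_lock_model()`, `remove_range_model()`, the two `truncate_*` helpers); it does not reproduce ocfs2_commit_truncate(), only the difference between locking per extent and locking once around the loop.

```c
#include <assert.h>

static int cluster_lock_calls; /* how many times the costly lock was taken */

/* Stand-in for ocfs2_refcount_lock(): each call is a cluster round trip. */
static void refcount_lock_model(void)
{
    cluster_lock_calls++;
}

static void refcount_unlock_model(void)
{
    /* nothing to count on release in this model */
}

/* Stand-in for removing one extent; optionally locks around the work. */
static void remove_range_model(int take_lock)
{
    if (take_lock)
        refcount_lock_model();
    /* ... drop the refcount for this extent ... */
    if (take_lock)
        refcount_unlock_model();
}

/* Old shape: lock and unlock inside the per-extent call. */
static int truncate_lock_per_extent(int extents)
{
    cluster_lock_calls = 0;
    for (int i = 0; i < extents; i++)
        remove_range_model(1);
    return cluster_lock_calls;
}

/* New shape: take the lock once around the whole loop. */
static int truncate_lock_once(int extents)
{
    cluster_lock_calls = 0;
    refcount_lock_model();
    for (int i = 0; i < extents; i++)
        remove_range_model(0);
    refcount_unlock_model();
    return cluster_lock_calls;
}
```

With the 1600 extents mentioned in the message, the per-extent shape takes the lock 1600 times and the hoisted shape takes it once.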
  4. 11 Dec, 2014 13 commits
  5. 20 Nov, 2014 1 commit
  6. 10 Nov, 2014 1 commit
  7. 06 Nov, 2014 1 commit
  8. 04 Nov, 2014 1 commit
  9. 01 Nov, 2014 1 commit
  10. 30 Oct, 2014 1 commit
  11. 14 Oct, 2014 1 commit
  12. 10 Oct, 2014 9 commits
    • ocfs2: fix a deadlock while o2net_wq doing direct memory reclaim · b246d3d1
      Committed by Xue jiufei
      Fix a deadlock problem caused by direct memory reclaim in o2net_wq.  The
      situation is as follows:
      
      1) A connect message is received from another node, and the node queues
         a work_struct o2net_listen_work.
      
      2) o2net_wq processes this work and calls the following functions:
      
      o2net_wq
      -> o2net_accept_one
        -> sock_create_lite
          -> sock_alloc()
            -> kmem_cache_alloc with GFP_KERNEL
              -> ____cache_alloc_node
                -> __alloc_pages_nodemask
                  -> do_try_to_free_pages
                    -> shrink_slab
                      -> evict
                        -> ocfs2_evict_inode
                          -> ocfs2_drop_lock
                            -> dlmunlock
                              -> o2net_send_message_vec
      
         then o2net_wq waits for the unlock reply from the master.
      
      3) The tcp layer receives the reply, calls o2net_data_ready() and queues
         sc_rx_work, which waits for o2net_wq to process it.
      
      4) o2net_wq is a single-threaded workqueue that processes work items one
         by one.  Right now it is still running o2net_listen_work and cannot
         handle sc_rx_work, so we deadlock.
      
      Junxiao Bi's patch "mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"
      (http://ozlabs.org/~akpm/mmots/broken-out/mm-clear-__gfp_fs-when-pf_memalloc_noio-is-set.patch)
      clears __GFP_FS in memalloc_noio_flags() in addition to __GFP_IO.  We use
      memalloc_noio_save() to set the process flag PF_MEMALLOC_NOIO so that all
      allocations done by this process are done as if GFP_NOIO was specified.
      This way we do not reenter the filesystem while doing memory reclaim.
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
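
The scoped-flag idea behind memalloc_noio_save() in b246d3d1 can be modelled in userspace as follows. This is a sketch, not the kernel implementation: the `MODEL_GFP_*` bit values are invented for illustration, `task_noio` stands in for PF_MEMALLOC_NOIO on the current task, and all `*_model` function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the kernel's real values differ. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

static int task_noio; /* models PF_MEMALLOC_NOIO on the current task */

/* Set the flag and return the previous state so scopes can nest. */
static int noio_save_model(void)
{
    int old = task_noio;
    task_noio = 1;
    return old;
}

static void noio_restore_model(int old)
{
    task_noio = old;
}

/* Models the masking step: with the flag set, both IO and FS are cleared. */
static unsigned int effective_gfp_model(unsigned int gfp)
{
    if (task_noio)
        gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
    return gfp;
}

/* Reclaim may call back into the filesystem only if FS is still allowed. */
static int reclaim_may_enter_fs(unsigned int gfp)
{
    return (effective_gfp_model(gfp) & MODEL_GFP_FS) != 0;
}
```

Between `noio_save_model()` and `noio_restore_model()`, an allocation requested with `MODEL_GFP_KERNEL` is treated as if neither IO nor FS were permitted, so the model's reclaim path cannot reach an eviction like the one in the call chain above.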
    • ocfs2: fix deadlock due to wrong locking order · f775da2f
      Committed by Junxiao Bi
      To commit the ocfs2 journal, the ocfs2 journal thread acquires the
      read-write semaphore osb->journal->j_trans_barrier and wakes up the jbd2
      commit thread, then waits until the jbd2 commit thread is done.  In
      ordered journal mode, jbd2 needs to flush dirty data pages first, which
      requires the page lock.  So osb->journal->j_trans_barrier should be
      taken before the page lock.
      
      But ocfs2_write_zero_page() and ocfs2_write_begin_inline() do not obey
      this locking order, which causes a deadlock and hangs the whole cluster.
      
      One deadlock that was caught is the following:
      
      PID: 13449  TASK: ffff8802e2f08180  CPU: 31  COMMAND: "oracle"
       #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524
       #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf
       #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85
       #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55
       #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4
       #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2]
       #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2]
       #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2]
       #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2]
       #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2]
       #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2]
       #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2]
       #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29
       #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1
       #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd
       #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142
          RIP: 00007f8de750c6f7  RSP: 00007fffe786e478  RFLAGS: 00000206
          RAX: 000000000000004d  RBX: ffffffff81515142  RCX: 0000000000000000
          RDX: 0000000000000200  RSI: 0000000000028400  RDI: 000000000000000d
          RBP: 00007fffe786e040   R8: 0000000000000000   R9: 000000000000000d
          R10: 0000000000000000  R11: 0000000000000206  R12: 000000000000000d
          R13: 00007fffe786e710  R14: 00007f8de70f8340  R15: 0000000000028400
          ORIG_RAX: 000000000000004d  CS: 0033  SS: 002b
      
      crash64> bt
      PID: 7610   TASK: ffff88100fd56140  CPU: 1   COMMAND: "ocfs2cmt"
       #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524
       #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf
       #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2]
       #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2]
       #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2]
       #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2]
       #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6
       #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284
      
      crash64> bt
      PID: 7609   TASK: ffff88100f2d4480  CPU: 0   COMMAND: "jbd2/dm-20-86"
       #0 [ffff88100def3920] __schedule at ffffffff8150a524
       #1 [ffff88100def39c8] schedule at ffffffff8150acbf
       #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c
       #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e
       #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a
       #5 [ffff88100def3a58] __lock_page at ffffffff81110687
       #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752
       #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901
       #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2]
       #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372 [jbd2]
       #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2]
       #11 [ffff88100def3ee8] kthread at ffffffff81090db6
       #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Alex Chen <alex.chen@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
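
The three backtraces in f775da2f form a wait-for cycle, which a few lines of C can make explicit. The reading of the traces is an interpretation: the "oracle" writer is blocked on the barrier held by ocfs2cmt, ocfs2cmt is waiting for the jbd2 commit, and jbd2 is waiting for a page the writer has locked. The enum, `NO_TASK` and `has_wait_cycle()` are hypothetical names for this sketch, which assumes each task waits on at most one other task.

```c
#include <assert.h>
#include <stdbool.h>

/* The three tasks from the backtraces; TASK_COUNT is the array size. */
enum task { TASK_WRITER, TASK_OCFS2CMT, TASK_JBD2, TASK_COUNT };

#define NO_TASK (-1)

/* waits_for[t] is the task holding what t is blocked on, or NO_TASK.
 * Follow the chain from start; returning to start means deadlock. */
static bool has_wait_cycle(const int waits_for[TASK_COUNT], int start)
{
    int cur = start;

    for (int steps = 0; steps < TASK_COUNT; steps++) {
        cur = waits_for[cur];
        if (cur == NO_TASK)
            return false; /* chain ends: someone can make progress */
        if (cur == start)
            return true;  /* came back around: circular wait */
    }
    return false;
}
```

As a usage example, the table `{ TASK_OCFS2CMT, TASK_JBD2, TASK_WRITER }` (writer waits for ocfs2cmt, ocfs2cmt for jbd2, jbd2 for the writer) is reported as a cycle. If the writer takes the barrier before the page lock, jbd2 no longer waits on the writer, its entry becomes `NO_TASK`, and the chain terminates.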
    • ocfs2: fix deadlock between o2hb thread and o2net_wq · 70e82a12
      Committed by Joseph Qi
      The following case may lead to a deadlock between o2net_wq and the o2hb
      thread on o2hb_callback_sem.
      Suppose there are currently 2 nodes in the cluster, N1 and N2.  N2 goes
      down and, at the same time, N3 tries to join the cluster.  So N1 handles
      the node down (N2) and the join (N3) simultaneously.
          o2hb                               o2net_wq
          ->o2hb_do_disk_heartbeat
          ->o2hb_check_slot
          ->o2hb_run_event_list
          ->o2hb_fire_callbacks
          ->down_write(&o2hb_callback_sem)
          ->o2net_hb_node_down_cb
          ->flush_workqueue(o2net_wq)
                                             ->o2net_process_message
                                             ->dlm_query_join_handler
                                             ->o2hb_check_node_heartbeating
                                             ->o2hb_fill_node_map
                                             ->down_read(&o2hb_callback_sem)
      
      There is no need to take o2hb_callback_sem in dlm_query_join_handler;
      o2hb_live_lock is enough to protect the live node map.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: jiangyiwen <jiangyiwen@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: don't fire quorum before connection established · 5046f18d
      Committed by Junxiao Bi
      Firing quorum before the connection is established can cause a node to
      reboot unexpectedly.
      
      Assume there are 3 nodes in the cluster: Node 1, 2 and 3.  Node 2 and 3
      have the wrong ip address for Node 1 in cluster.conf, and global heartbeat
      is enabled in the cluster.  After the heartbeats are started on these
      three nodes, Node 1 will reboot due to quorum fencing.  The case is
      similar if Node 1's networking is not ready when the global heartbeat
      starts.
      
      The reboot is unfriendly, as the customer is not yet fully ready for ocfs2
      to work.  Fix it by not allowing quorum to fire before the connection is
      established.  In this case, ocfs2 waits until the wrong configuration is
      fixed or networking is up before continuing.  Also update the log message
      to guide the user on where to check when the connection has not been
      established for a long time.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ocfs2/dlmglue.c: use __seq_open_private() not seq_open() · 1848cb55
      Committed by Rob Jones
      Reduce boilerplate code by using __seq_open_private() instead of seq_open()
      Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ocfs2/cluster/netdebug.c: use seq_open_private() not seq_open() · f3288338
      Committed by Rob Jones
      Reduce boilerplate code by using seq_open_private() instead of seq_open()
      
      Note that the code in and using sc_common_open() has been quite
      extensively changed.  Not least because there was a latent memory leak in
      the code as was: if sc_common_open() failed, the previously allocated
      buffer was not freed.
      Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ocfs2/dlm/dlmdebug.c: use seq_open_private() not seq_open() · 8f9ac032
      Committed by Rob Jones
      Reduce boilerplate code by using seq_open_private() instead of seq_open()
      Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove unused code in dlm_new_lockres() · 6ae07548
      Committed by Xue jiufei
      Remove the branch that frees res->lockname.name, because its condition
      is never satisfied when jumping to the error label.
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2/dlm: call dlm_lockres_put without resource spinlock · 9a7e6b5a
      Committed by alex chen
      dlm_lockres_put() should be called without holding &res->spinlock,
      otherwise the following self-deadlock may occur:
      
      spin_lock(&res->spinlock)
      ...
      dlm_lockres_put
        ->dlm_lockres_release
          ->dlm_print_one_lock_resource
            ->spin_lock(&res->spinlock)
      Signed-off-by: Alex Chen <alex.chen@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
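
The self-deadlock in 9a7e6b5a comes from a non-recursive lock being retaken on the release path. A minimal userspace model is sketched below. `struct res_model` and all function names are hypothetical, and the model assumes the final put always reaches the code that takes the resource lock, as in the call chain shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

struct res_model {
    bool locked;        /* models res->spinlock, which is not recursive */
    int refs;           /* reference count */
    bool self_deadlock; /* set where a real spinlock would spin forever */
};

static void res_lock(struct res_model *r)
{
    if (r->locked) {
        r->self_deadlock = true; /* the lock is already held by this caller */
        return;
    }
    r->locked = true;
}

static void res_unlock(struct res_model *r)
{
    r->locked = false;
}

/* Dropping the last reference runs a release path that takes the lock. */
static void res_put(struct res_model *r)
{
    if (--r->refs > 0)
        return;
    res_lock(r); /* models the spin_lock() at the end of the call chain */
    if (!r->self_deadlock)
        res_unlock(r);
}

/* Problematic shape: the final put happens while the lock is held. */
static bool put_under_lock(void)
{
    struct res_model r = { false, 1, false };
    res_lock(&r);
    res_put(&r);
    res_unlock(&r);
    return r.self_deadlock;
}

/* Fixed shape: release the lock first, then drop the reference. */
static bool put_after_unlock(void)
{
    struct res_model r = { false, 1, false };
    res_lock(&r);
    res_unlock(&r);
    res_put(&r);
    return r.self_deadlock;
}
```

In this model `put_under_lock()` flags the self-deadlock and `put_after_unlock()` does not, mirroring the change of calling dlm_lockres_put() only after the spinlock has been dropped.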