1. 04 Dec, 2018 1 commit
    • jbd2: avoid long hold times of j_state_lock while committing a transaction · 96f1e097
      Jan Kara committed
      We can hold j_state_lock for writing at the beginning of
      jbd2_journal_commit_transaction() for a rather long time (reportedly for
      30 ms) due to cleaning revoke bits of all revoked buffers under it. The
      handling of revoke tables, as well as cleaning of t_reserved_list and
      checkpoint lists, does not need j_state_lock for anything. It is only
      needed to prevent new handles from joining the transaction. Generally
      T_LOCKED transaction state prevents new handles from joining the
      transaction - except for reserved handles which have to be allowed to
      join while we wait for other handles to complete.
      
      To prevent reserved handles from joining the transaction while cleaning
      up lists, add new transaction state T_SWITCH and watch for it when
      starting reserved handles. With this we can just drop the lock for
      operations that don't need it.
      Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com>
      Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
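The joining rule described in this commit message can be sketched as a small predicate. This is a simplified model, not the kernel code: the enum lists only a subset of states, and `handle_may_join` is a hypothetical helper name used for illustration.

```c
#include <stdbool.h>

/* Simplified subset of transaction states (assumption: order is illustrative). */
enum tx_state_sketch { T_RUNNING, T_LOCKED, T_SWITCH, T_FLUSH, T_COMMIT };

/*
 * Hypothetical helper: may a handle join a transaction in this state?
 * - T_RUNNING: anyone may join.
 * - T_LOCKED:  only reserved handles may join.
 * - T_SWITCH:  nobody may join, so list cleanup can run without j_state_lock.
 */
static bool handle_may_join(enum tx_state_sketch state, bool reserved)
{
	switch (state) {
	case T_RUNNING:
		return true;
	case T_LOCKED:
		return reserved;
	default:
		return false;
	}
}
```

In the real code a handle that may not join waits and retries rather than failing; the predicate only captures which combinations are admitted.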
  2. 06 Oct, 2018 1 commit
    • jbd2: fix use after free in jbd2_log_do_checkpoint() · ccd3c437
      Jan Kara committed
      The code cleaning transaction's lists of checkpoint buffers has a bug
      where it increases bh refcount only after releasing
      journal->j_list_lock. Thus the following race is possible:
      
      CPU0					CPU1
      jbd2_log_do_checkpoint()
      					jbd2_journal_try_to_free_buffers()
      					  __journal_try_to_free_buffer(bh)
        ...
        while (transaction->t_checkpoint_io_list)
        ...
          if (buffer_locked(bh)) {
      
      <-- IO completes now, buffer gets unlocked -->
      
            spin_unlock(&journal->j_list_lock);
      					    spin_lock(&journal->j_list_lock);
      					    __jbd2_journal_remove_checkpoint(jh);
      					    spin_unlock(&journal->j_list_lock);
      					  try_to_free_buffers(page);
            get_bh(bh) <-- accesses freed bh
      
      Fix the problem by grabbing bh reference before unlocking
      journal->j_list_lock.
      
      Fixes: dc6e8d66 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
      Fixes: be1158cc ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
      Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
      CC: stable@vger.kernel.org
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
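The ordering that this fix establishes (take the reference first, drop the list lock second) can be modelled in userspace with C11 atomics. Everything here is an illustrative stand-in: `buf_sketch`, `list_lock_sketch` and `pin_then_unlock` are hypothetical names, and a spinning `atomic_flag` replaces the kernel spinlock.

```c
#include <stdatomic.h>

struct buf_sketch { atomic_int refcount; };       /* stands in for a buffer head */
struct list_lock_sketch { atomic_flag held; };    /* stands in for j_list_lock */

static void list_lock(struct list_lock_sketch *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;
}

static void list_unlock(struct list_lock_sketch *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Caller holds the list lock. Pin the buffer while the lock still protects
 * it, and only then release the lock, so a concurrent path that frees
 * unreferenced buffers cannot free it in between.
 */
static struct buf_sketch *pin_then_unlock(struct list_lock_sketch *l,
					  struct buf_sketch *b)
{
	atomic_fetch_add(&b->refcount, 1);
	list_unlock(l);
	return b;
}
```

The buggy order would be `list_unlock()` followed by the increment, which leaves a window in which the buffer has no extra reference and no lock protecting it.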
  3. 30 Jul, 2018 1 commit
  4. 17 Jun, 2018 1 commit
  5. 13 Jun, 2018 1 commit
    • treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook committed
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:

              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
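The point of the 2-factor form is that the multiplication is checked for overflow before anything is allocated. A minimal userspace sketch of that idea follows; `alloc_array_sketch` and `array3_size_sketch` are hypothetical stand-ins built on `malloc()`, not the kernel implementations.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for kmalloc_array(): refuse the request if n * size would overflow. */
static void *alloc_array_sketch(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

/*
 * Stand-in for array3_size(): return a * b * c, saturating to SIZE_MAX on
 * overflow so that the allocation which follows is certain to fail.
 */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;
	if (c != 0 && a * b > SIZE_MAX / c)
		return SIZE_MAX;
	return a * b * c;
}
```

With the open-coded `kmalloc(a * b, gfp)` form, an overflowing product silently wraps and yields a buffer that is too small; the checked form turns that into an allocation failure.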
  6. 21 May, 2018 2 commits
  7. 18 Apr, 2018 1 commit
    • ext4: set h_journal if there is a failure starting a reserved handle · b2569260
      Theodore Ts'o committed
      If ext4 tries to start a reserved handle via
      jbd2_journal_start_reserved(), and the journal has been aborted, this
      can result in a NULL pointer dereference.  This is because the fields
      h_journal and h_transaction in the handle structure share the same
      memory, via a union, so jbd2_journal_start_reserved() will clear
      h_journal before calling start_this_handle().  If this function fails
      due to an aborted handle, h_journal will still be NULL, and the call
      to jbd2_journal_free_reserved() will pass a NULL journal to
      sub_reserve_credits().
      
      This can be reproduced by running "kvm-xfstests -c dioread_nolock
      generic/475".
      
      Cc: stable@kernel.org # 3.11
      Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Reviewed-by: Jan Kara <jack@suse.cz>
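The union aliasing behind this bug, and the shape of the fix, can be sketched in a few lines. The types and functions below are simplified, hypothetical models (all names carry a `_sketch` suffix), and the failure code `-1` is a placeholder rather than the errno the kernel returns.

```c
#include <stddef.h>

struct journal_sketch { int aborted; };
struct transaction_sketch { int tid; };

/* h_transaction and h_journal share storage, as described in the message. */
struct handle_sketch {
	union {
		struct transaction_sketch *h_transaction;
		struct journal_sketch *h_journal;
	};
};

/* Model of start_this_handle(): fails without touching the handle if aborted. */
static int start_this_handle_sketch(struct journal_sketch *j,
				    struct handle_sketch *h)
{
	static struct transaction_sketch running = { .tid = 1 };

	if (j->aborted)
		return -1;
	h->h_transaction = &running;
	return 0;
}

/* Model of the fixed jbd2_journal_start_reserved() path. */
static int start_reserved_sketch(struct handle_sketch *h)
{
	struct journal_sketch *journal = h->h_journal;
	int ret;

	h->h_journal = NULL;	/* clears the shared slot */
	ret = start_this_handle_sketch(journal, h);
	if (ret < 0)
		h->h_journal = journal;	/* the fix: restore before cleanup runs */
	return ret;
}
```

Without the restore on the failure path, any cleanup code that reads `h_journal` afterwards sees NULL.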
  8. 20 Feb, 2018 1 commit
  9. 19 Feb, 2018 2 commits
    • ext4: pass -ESHUTDOWN code to jbd2 layer · fb7c0244
      Theodore Ts'o committed
      Previously the jbd2 layer assumed that a file system check would be
      required after a journal abort.  In the case of the deliberate file
      system shutdown, this should not be necessary.  Allow the jbd2 layer
      to distinguish between these two cases by using the ESHUTDOWN errno.
      
      Also add proper locking to __journal_abort_soft().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • jbd2: clarify bad journal block checksum message · ed65b00f
      Theodore Ts'o committed
      There were two error messages emitted by jbd2, one for a bad checksum
      for a jbd2 descriptor block, and one for a bad checksum for a jbd2
      data block.  Change the data block checksum error so that the two can
      be disambiguated.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  10. 10 Jan, 2018 1 commit
    • jbd2: fix sphinx kernel-doc build warnings · f69120ce
      Tobin C. Harding committed
      Sphinx emits various (26) warnings when building make target 'htmldocs'.
      Currently struct definitions contain duplicate documentation, some as
      kernel-docs and some as standard c89 comments.  We can reduce
      duplication while cleaning up the kernel docs.
      
      Move all kernel-docs to right above each struct member.  Use the set of
      all existing comments (kernel-doc and c89).  Add documentation for
      missing struct members and function arguments.
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  11. 18 Dec, 2017 1 commit
    • ext4: fix up remaining files with SPDX cleanups · f5166768
      Theodore Ts'o committed
      A number of ext4 source files were skipped because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  12. 03 Nov, 2017 1 commit
    • ext4: Support for synchronous DAX faults · b8a6176c
      Jan Kara committed
      We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
      prepare blocks for writing and the inode has some uncommitted metadata
      changes. In the fault handler ext4_dax_fault() we then detect this case
      (through VM_FAULT_NEEDDSYNC return value) and call helper
      dax_finish_sync_fault() to flush metadata changes and insert page table
      entry. Note that this will also dirty corresponding radix tree entry
      which is what we want - fsync(2) will still provide data integrity
      guarantees for applications not using userspace flushing. And
      applications using userspace flushing can avoid calling fsync(2) and
      thus avoid the performance overhead.
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  13. 19 Oct, 2017 1 commit
  14. 06 Jul, 2017 1 commit
  15. 20 Jun, 2017 1 commit
  16. 22 May, 2017 1 commit
    • jbd2: preserve original nofs flag during journal restart · b4709067
      Tahsin Erdogan committed
      When a transaction starts, start_this_handle() saves current
      PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
      Journal restart is a special case that calls start_this_handle() without
      stopping the transaction. start_this_handle() isn't aware that the
      original value is already stored so it overwrites it with current value.
      
      For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
      at the end:
      
        jbd2_journal_start()
        jbd2__journal_restart()
        jbd2_journal_stop()
      
      Make jbd2__journal_restart() restore the original value before calling
      start_this_handle().
      
      Fixes: 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
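The save/restore bookkeeping described above can be modelled with a single flag word. This is a userspace sketch under stated assumptions: `task_flags_sketch` stands in for the task's flags, `NOFS_FLAG_SKETCH` for PF_MEMALLOC_NOFS, and the three handle functions are hypothetical reductions of the start, restart and stop paths.

```c
#define NOFS_FLAG_SKETCH 0x1u

static unsigned int task_flags_sketch;	/* stands in for the task's flag word */

struct handle_sketch { unsigned int saved_alloc_context; };

/* Set the flag and return its previous value. */
static unsigned int nofs_save_sketch(void)
{
	unsigned int old = task_flags_sketch & NOFS_FLAG_SKETCH;

	task_flags_sketch |= NOFS_FLAG_SKETCH;
	return old;
}

/* Put the flag back to a previously saved value. */
static void nofs_restore_sketch(unsigned int old)
{
	task_flags_sketch = (task_flags_sketch & ~NOFS_FLAG_SKETCH) | old;
}

static void handle_start_sketch(struct handle_sketch *h)
{
	h->saved_alloc_context = nofs_save_sketch();
}

static void handle_stop_sketch(struct handle_sketch *h)
{
	nofs_restore_sketch(h->saved_alloc_context);
}

/* The fix: restore the original value before starting again. */
static void handle_restart_sketch(struct handle_sketch *h)
{
	nofs_restore_sketch(h->saved_alloc_context);
	handle_start_sketch(h);
}
```

If the restart path skipped the restore, the second save would record the flag as already set, and the final stop would leave it set.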
  17. 16 May, 2017 2 commits
  18. 04 May, 2017 3 commits
  19. 30 Apr, 2017 2 commits
    • jbd2: fix dbench4 performance regression for 'nobarrier' mounts · 5052b069
      Jan Kara committed
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
      JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
      filesystem is mounted with nobarrier mount option, journal superblock
      writes ended up being async writes after this patch and that caused
      heavy performance regression for dbench4 benchmark with high number of
      processes. In my test setup with HP RAID array with non-volatile write
      cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.
      
      Fix the problem by making sure journal superblock writes are always
      treated as synchronous since they generally block progress of the
      journalling machinery and thus the whole filesystem.
      
      Fixes: b685d3d6
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: Fix lockdep splat with generic/270 test · c52c47e4
      Jan Kara committed
      I've hit a lockdep splat with generic/270 test complaining that:
      
      3216.fsstress.b/3533 is trying to acquire lock:
       (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150
      
      but task is already holding lock:
       (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850
      
      The underlying problem is that jbd2_journal_force_commit_nested()
      (called from ext4_should_retry_alloc()) may get called while a
      transaction handle is started. In such case it takes care to not wait
      for commit of the running transaction (which would deadlock) but only
      for a commit of a transaction that is already committing (which is safe
      as that doesn't wait for any filesystem locks).
      
      In fact there are also other callers of jbd2_log_wait_commit() that take
      care to pass tid of a transaction that is already committing and for
      those cases, the lockdep instrumentation is too restrictive and leading
      to false positive reports. Fix the problem by calling
      jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
      transaction isn't already committing.
      
      Fixes: 1eaa566d
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  20. 19 Apr, 2017 1 commit
    • mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a
      Paul E. McKenney committed
      A group of Linux kernel hackers reported chasing a bug that resulted
      from their assumption that SLAB_DESTROY_BY_RCU provided an existence
      guarantee, that is, that no block from such a slab would be reallocated
      during an RCU read-side critical section.  Of course, that is not the
      case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
      slab of blocks.
      
      However, there is a phrase for this, namely "type safety".  This commit
      therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
      to avoid future instances of this sort of confusion.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      [ paulmck: Add comments mentioning the old name, as requested by Eric
        Dumazet, in order to help people familiar with the old name find
        the new one. ]
      Acked-by: David Rientjes <rientjes@google.com>
  21. 16 Mar, 2017 1 commit
  22. 05 Feb, 2017 1 commit
  23. 02 Feb, 2017 1 commit
    • jbd2: fix use after free in kjournald2() · dbfcef6b
      Sahitya Tummala committed
      Below is a synchronization issue between the unmount and kjournald2
      contexts, which results in a use-after-free in kjournald2().
      Fix this issue by using journal->j_state_lock to synchronize the
      wait_event() done in journal_kill_thread() and the wake_up() done
      in kjournald2().
      
      TASK 1:
      umount cmd:
         |--jbd2_journal_destroy() {
             |--journal_kill_thread() {
                  write_lock(&journal->j_state_lock);
      	    journal->j_flags |= JBD2_UNMOUNT;
      	    ...
      	    write_unlock(&journal->j_state_lock);
      	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
      	    					   kjournald2() {
      						     ...
      						     checks JBD2_UNMOUNT flag and calls goto end_loop;
      						     ...
      						     end_loop:
      						       write_unlock(&journal->j_state_lock);
      						       journal->j_task = NULL; --> If this thread gets
      						       pre-empted here, then TASK 1 wait_event will
      						       exit even before this thread is completely
      						       done.
      	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
      	    ...
      	    write_lock(&journal->j_state_lock);
      	    write_unlock(&journal->j_state_lock);
      	  }
             |--kfree(journal);
           }
      }
      						       wake_up(&journal->j_wait_done_commit); --> this step
      						       now results in a use-after-free.
      						   }
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  24. 14 Jan, 2017 1 commit
    • fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex · 6fa7aa50
      Tejun Heo committed
      When an ext4 fs is bogged down by a lot of metadata IOs (in the
      reported case, it was deletion of millions of files, but any massive
      amount of journal writes would do), after the journal is filled up,
      tasks which try to access the filesystem and aren't currently
      performing the journal writes end up waiting in
      __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex.
      
      Because those mutex sleeps aren't marked as iowait, this condition can
      lead to misleadingly low iowait and /proc/stat:procs_blocked.  While
      iowait propagation is far from strict, this condition can be triggered
      fairly easily and annotating these sleeps correctly helps initial
      diagnosis quite a bit.
      
      Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that
      these sleeps are properly marked as iowait.
      Reported-by: Mingbo Wan <mingbo@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  25. 25 Dec, 2016 1 commit
  26. 01 Nov, 2016 1 commit
  27. 13 Oct, 2016 1 commit
  28. 12 Oct, 2016 1 commit
  29. 22 Sep, 2016 1 commit
  30. 16 Sep, 2016 1 commit
  31. 30 Jun, 2016 4 commits
    • jbd2: make journal y2038 safe · abcfb5d9
      Arnd Bergmann committed
      The jbd2 journal stores the commit time in 64-bit seconds and 32-bit
      nanoseconds, which avoids an overflow in 2038, but it gets the numbers
      from current_kernel_time(), which uses 'long' seconds on 32-bit
      architectures.
      
      This simply changes the code to call current_kernel_time64() so
      we use 64-bit seconds consistently.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
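The overflow this commit avoids comes from the range of a signed 32-bit seconds counter, whose last representable second falls on 19 January 2038. A minimal sketch of that range check follows; `fits_in_32bit_seconds` is a hypothetical helper written for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can this count of seconds since the epoch be stored
 * in a signed 32-bit field, as a 'long' is on 32-bit architectures?
 */
static bool fits_in_32bit_seconds(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

A 64-bit seconds value, as the journal format already stores, has no such limit in practice, which is why reading the clock through a 64-bit interface keeps the whole path consistent.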
    • jbd2: track more dependencies on transaction commit · 1eaa566d
      Jan Kara committed
      So far we were tracking only the dependency on transaction commit due
      to starting a new handle (which may require commit to start a new
      transaction). Now also add tracking for other cases where we wait for
      transaction commit. This way lockdep can catch deadlocks, e.g. because
      we call jbd2_journal_stop() for a synchronous handle with some locks
      held which rank below transaction start.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: move lockdep tracking to journal_s · ab714aff
      Jan Kara committed
      Currently lockdep map is tracked in each journal handle. To be able to
      expand lockdep support to cover also other cases where we depend on
      transaction commit and where handle is not available, move lockdep map
      into struct journal_s. Since this makes the lockdep map shared for all
      handles, we have to use rwsem_acquire_read() for acquisitions now.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: move lockdep instrumentation for jbd2 handles · 7a4b188f
      Jan Kara committed
      The transaction the handle references is free to commit once we've
      decremented t_updates counter. Move the lockdep instrumentation to that
      place. Currently it was a bit later which did not really matter but
      subsequent improvements to lockdep instrumentation would cause false
      positives with it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>