1. 16 3月, 2017 1 次提交
    • E
      fscrypt: remove broken support for detecting keyring key revocation · 1b53cf98
      Eric Biggers 提交于
      Filesystem encryption ostensibly supported revoking a keyring key that
      had been used to "unlock" encrypted files, causing those files to become
      "locked" again.  This was, however, buggy for several reasons, the most
      severe of which was that when key revocation happened to be detected for
      an inode, its fscrypt_info was immediately freed, even while other
      threads could be using it for encryption or decryption concurrently.
      This could be exploited to crash the kernel or worse.
      
      This patch fixes the use-after-free by removing the code which detects
      the keyring key having been revoked, invalidated, or expired.  Instead,
      an encrypted inode that is "unlocked" now simply remains unlocked until
      it is evicted from memory.  Note that this is no worse than the case for
      block device-level encryption, e.g. dm-crypt, and it still remains
      possible for a privileged user to evict unused pages, inodes, and
      dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
      simply unmounting the filesystem.  In fact, one of those actions was
      already needed anyway for key revocation to work even somewhat sanely.
      This change is not expected to break any applications.
      
      In the future I'd like to implement a real API for fscrypt key
      revocation that interacts sanely with ongoing filesystem operations ---
      waiting for existing operations to complete and blocking new operations,
      and invalidating and sanitizing key material and plaintext from the VFS
      caches.  But this is a hard problem, and for now this bug must be fixed.
      
      This bug affected almost all versions of ext4, f2fs, and ubifs
      encryption, and it was potentially reachable in any kernel configured
      with encryption support (CONFIG_EXT4_ENCRYPTION=y,
      CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
      CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
      shared fs/crypto/ code, but due to the potential security implications
      of this bug, it may still be worthwhile to backport this fix to them.
      
      Fixes: b7236e21 ("ext4 crypto: reorganize how we store keys in the inode")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Acked-by: NMichael Halcrow <mhalcrow@google.com>
      1b53cf98
  2. 21 2月, 2017 1 次提交
  3. 17 2月, 2017 1 次提交
  4. 16 2月, 2017 2 次提交
  5. 15 2月, 2017 3 次提交
    • S
      fuse: fix use after free issue in fuse_dev_do_read() · 6ba4d272
      Sahitya Tummala 提交于
      There is a potential race between fuse_dev_do_write()
      and request_wait_answer() contexts as shown below:
      
      TASK 1:
      __fuse_request_send():
        |--spin_lock(&fiq->waitq.lock);
        |--queue_request();
        |--spin_unlock(&fiq->waitq.lock);
        |--request_wait_answer():
             |--if (test_bit(FR_SENT, &req->flags))
             <gets pre-empted after it is validated true>
                                         TASK 2:
                                         fuse_dev_do_write():
                                           |--clears bit FR_SENT,
                                           |--request_end():
                                              |--sets bit FR_FINISHED
                                              |--spin_lock(&fiq->waitq.lock);
                                              |--list_del_init(&req->intr_entry);
                                              |--spin_unlock(&fiq->waitq.lock);
                                              |--fuse_put_request();
             |--queue_interrupt();
             <request gets queued to interrupts list>
                  |--wake_up_locked(&fiq->waitq);
             |--wait_event_freezable();
             <as FR_FINISHED is set, it returns and then
             the caller frees this request>
      
      Now, the next fuse_dev_do_read(), see interrupts list is not empty
      and then calls fuse_read_interrupt() which tries to access the request
      which is already free'd and gets the below crash:
      
      [11432.401266] Unable to handle kernel paging request at virtual address
      6b6b6b6b6b6b6b6b
      ...
      [11432.418518] Kernel BUG at ffffff80083720e0
      [11432.456168] PC is at __list_del_entry+0x6c/0xc4
      [11432.463573] LR is at fuse_dev_do_read+0x1ac/0x474
      ...
      [11432.679999] [<ffffff80083720e0>] __list_del_entry+0x6c/0xc4
      [11432.687794] [<ffffff80082c65e0>] fuse_dev_do_read+0x1ac/0x474
      [11432.693180] [<ffffff80082c6b14>] fuse_dev_read+0x6c/0x78
      [11432.699082] [<ffffff80081d5638>] __vfs_read+0xc0/0xe8
      [11432.704459] [<ffffff80081d5efc>] vfs_read+0x90/0x108
      [11432.709406] [<ffffff80081d67f0>] SyS_read+0x58/0x94
      
      As FR_FINISHED bit is set before deleting the intr_entry with input
      queue lock in request completion path, do the testing of this flag and
      queueing atomically with the same lock in queue_interrupt().
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: fd22d62e ("fuse: no fc->lock for iqueue parts")
      Cc: <stable@vger.kernel.org> # 4.2+
      6ba4d272
    • T
      ext4: fix fencepost in s_first_meta_bg validation · 2ba3e6e8
      Theodore Ts'o 提交于
      It is OK for s_first_meta_bg to be equal to the number of block group
      descriptor blocks.  (It rarely happens, but it shouldn't cause any
      problems.)
      
      https://bugzilla.kernel.org/show_bug.cgi?id=194567
      
      Fixes: 3a4b77cdSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      2ba3e6e8
    • T
      ext4: don't BUG when truncating encrypted inodes on the orphan list · 0d06863f
      Theodore Ts'o 提交于
      Fix a BUG when the kernel tries to mount a file system constructed as
      follows:
      
      echo foo > foo.txt
      mke2fs -Fq -t ext4 -O encrypt foo.img 100
      debugfs -w foo.img << EOF
      write foo.txt a
      set_inode_field a i_flags 0x80800
      set_super_value s_last_orphan 12
      quit
      EOF
      
      root@kvm-xfstests:~# mount -o loop foo.img /mnt
      [  160.238770] ------------[ cut here ]------------
      [  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
      [  160.240106] invalid opcode: 0000 [#1] SMP
      [  160.240106] Modules linked in:
      [  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
      [  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
      [  160.240106] task: f4518000 task.stack: f47b6000
      [  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
      [  160.240106] EFLAGS: 00010246 CPU: 0
      [  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
      [  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
      [  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      [  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
      [  160.240106] Call Trace:
      [  160.240106]  ext4_truncate+0x1e9/0x3e5
      [  160.240106]  ext4_fill_super+0x286f/0x2b1e
      [  160.240106]  ? set_blocksize+0x2e/0x7e
      [  160.240106]  mount_bdev+0x114/0x15f
      [  160.240106]  ext4_mount+0x15/0x17
      [  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
      [  160.240106]  mount_fs+0x58/0x115
      [  160.240106]  vfs_kern_mount+0x4b/0xae
      [  160.240106]  do_mount+0x671/0x8c3
      [  160.240106]  ? _copy_from_user+0x70/0x83
      [  160.240106]  ? strndup_user+0x31/0x46
      [  160.240106]  SyS_mount+0x57/0x7b
      [  160.240106]  do_int80_syscall_32+0x4f/0x61
      [  160.240106]  entry_INT80_32+0x2f/0x2f
      [  160.240106] EIP: 0xb76b919e
      [  160.240106] EFLAGS: 00000246 CPU: 0
      [  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
      [  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
      [  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      [  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
      [  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
      [  160.317241] ---[ end trace d6a773a375c810a5 ]---
      
      The problem is that when the kernel tries to truncate an inode in
      ext4_truncate(), it tries to clear any on-disk data beyond i_size.
      Without the encryption key, it can't do that, and so it triggers a
      BUG.
      
      E2fsck does *not* provide this service, and in practice most file
      systems have their orphan list processed by e2fsck, so to avoid
      crashing, this patch skips this step if we don't have access to the
      encryption key (which is the case when processing the orphan list; in
      all other cases, we will have the encryption key, or the kernel
      wouldn't have allowed the file to be opened).
      
      An open question is whether the fact that e2fsck isn't clearing the
      bytes beyond i_size causing problems --- and if we've lived with it
      not doing it for so long, can we drop this from the kernel replay of
      the orphan list in all cases (not just when we don't have the key for
      encrypted inodes).
      
      Addresses-Google-Bug: #35209576
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0d06863f
  6. 14 2月, 2017 1 次提交
  7. 11 2月, 2017 1 次提交
    • O
      Btrfs: fix btrfs_decompress_buf2page() · 6e78b3f7
      Omar Sandoval 提交于
      If btrfs_decompress_buf2page() is handed a bio with its page in the
      middle of the working buffer, then we adjust the offset into the working
      buffer. After we copy into the bio, we advance the iterator by the
      number of bytes we copied. Then, we have some logic to handle the case
      of discontiguous pages and adjust the offset into the working buffer
      again. However, if we didn't advance the bio to a new page, we may enter
      this case in error, essentially repeating the adjustment that we already
      made when we entered the function. The end result is bogus data in the
      bio.
      
      Previously, we only checked for this case when we advanced to a new
      page, but the conversion to bio iterators changed that. This restores
      the old, correct behavior.
      
      A case I saw when testing with zlib was:
      
          buf_start = 42769
          total_out = 46865
          working_bytes = total_out - buf_start = 4096
          start_byte = 45056
      
      The condition (total_out > start_byte && buf_start < start_byte) is
      true, so we adjust the offset:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes -= buf_offset = 1809
          current_buf_start = buf_start = 42769
      
      Then, we copy
      
          bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
          buf_offset += bytes = 4096
          working_bytes -= bytes = 0
          current_buf_start += bytes = 44578
      
      After bio_advance(), we are still in the same page, so start_byte is the
      same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
      which is true! So, we adjust the values again:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes = total_out - start_byte = 1809
          current_buf_start = buf_start + buf_offset = 45056
      
      But note that working_bytes was already zero before this, so we should
      have stopped copying.
      
      Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers")
      Reported-by: NPat Erley <pat-lkml@erley.org>
      Reviewed-by: NChris Mason <clm@fb.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Tested-by: NLiu Bo <bo.li.liu@oracle.com>
      6e78b3f7
  8. 10 2月, 2017 5 次提交
  9. 09 2月, 2017 3 次提交
  10. 08 2月, 2017 2 次提交
  11. 07 2月, 2017 3 次提交
  12. 06 2月, 2017 1 次提交
  13. 05 2月, 2017 6 次提交
  14. 04 2月, 2017 1 次提交
    • M
      fs: break out of iomap_file_buffered_write on fatal signals · d1908f52
      Michal Hocko 提交于
      Tetsuo has noticed that an OOM stress test which performs large write
      requests can cause the full memory reserves depletion.  He has tracked
      this down to the following path
      
      	__alloc_pages_nodemask+0x436/0x4d0
      	alloc_pages_current+0x97/0x1b0
      	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
      	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
      	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
      	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
      	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
      	? iomap_write_end+0x80/0x80             fs/iomap.c:150
      	iomap_apply+0xb3/0x130                  fs/iomap.c:79
      	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
      	? iomap_write_end+0x80/0x80
      	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
      	? remove_wait_queue+0x59/0x60
      	xfs_file_write_iter+0x90/0x130 [xfs]
      	__vfs_write+0xe5/0x140
      	vfs_write+0xc7/0x1f0
      	? syscall_trace_enter+0x1d0/0x380
      	SyS_write+0x58/0xc0
      	do_syscall_64+0x6c/0x200
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      the oom victim has access to all memory reserves to make a forward
      progress to exit easier.  But iomap_file_buffered_write and other
      callers of iomap_apply loop to complete the full request.  We need to
      check for fatal signals and back off with a short write instead.
      
      As the iomap_apply delegates all the work down to the actor we have to
      hook into those.  All callers that work with the page cache are calling
      iomap_write_begin so we will check for signals there.  dax_iomap_actor
      has to handle the situation explicitly because it copies data to the
      userspace directly.  Other callers like iomap_page_mkwrite work on a
      single page or iomap_fiemap_actor do not allocate memory based on the
      given len.
      
      Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path")
      Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1908f52
  15. 03 2月, 2017 1 次提交
    • J
      ext4: move halfmd4 into hash.c directly · 1c83a9aa
      Jason A. Donenfeld 提交于
      The "half md4" transform should not be used by any new code. And
      fortunately, it's only used now by ext4. Since ext4 supports several
      hashing methods, at some point it might be desirable to move to
      something like SipHash. As an intermediate step, remove half md4 from
      cryptohash.h and lib, and make it just a local function in ext4's
      hash.c. There's precedent for doing this; the other function ext can use
      for its hashes -- TEA -- is also implemented in the same place. Also, by
      being a local function, this might allow gcc to perform some additional
      optimizations.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1c83a9aa
  16. 02 2月, 2017 2 次提交
    • E
      ext4: fix use-after-iput when fscrypt contexts are inconsistent · dd01b690
      Eric Biggers 提交于
      In the case where the child's encryption context was inconsistent with
      its parent directory, we were using inode->i_sb and inode->i_ino after
      the inode had already been iput().  Fix this by doing the iput() in the
      correct places.
      
      Note: only ext4 had this bug, not f2fs and ubifs.
      
      Fixes: d9cdc903 ("ext4 crypto: enforce context consistency")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      dd01b690
    • S
      jbd2: fix use after free in kjournald2() · dbfcef6b
      Sahitya Tummala 提交于
      Below is the synchronization issue between unmount and kjournald2
      contexts, which results into use after free issue in kjournald2().
      Fix this issue by using journal->j_state_lock to synchronize the
      wait_event() done in journal_kill_thread() and the wake_up() done
      in kjournald2().
      
      TASK 1:
      umount cmd:
         |--jbd2_journal_destroy() {
             |--journal_kill_thread() {
                  write_lock(&journal->j_state_lock);
      	    journal->j_flags |= JBD2_UNMOUNT;
      	    ...
      	    write_unlock(&journal->j_state_lock);
      	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
      	    					   kjournald2() {
      						     ...
      						     checks JBD2_UNMOUNT flag and calls goto end-loop;
      						     ...
      						     end_loop:
      						       write_unlock(&journal->j_state_lock);
      						       journal->j_task = NULL; --> If this thread gets
      						       pre-empted here, then TASK 1 wait_event will
      						       exit even before this thread is completely
      						       done.
      	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
      	    ...
      	    write_lock(&journal->j_state_lock);
      	    write_unlock(&journal->j_state_lock);
      	  }
             |--kfree(journal);
           }
      }
      						       wake_up(&journal->j_wait_done_commit); --> this step
      						       now results into use after free issue.
      						   }
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      dbfcef6b
  17. 01 2月, 2017 6 次提交