1. 15 Feb, 2017 1 commit
    • ext4: don't BUG when truncating encrypted inodes on the orphan list · 0d06863f
      Theodore Ts'o committed
      Fix a BUG when the kernel tries to mount a file system constructed as
      follows:
      
      echo foo > foo.txt
      mke2fs -Fq -t ext4 -O encrypt foo.img 100
      debugfs -w foo.img << EOF
      write foo.txt a
      set_inode_field a i_flags 0x80800
      set_super_value s_last_orphan 12
      quit
      EOF
      
      root@kvm-xfstests:~# mount -o loop foo.img /mnt
      [  160.238770] ------------[ cut here ]------------
      [  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
      [  160.240106] invalid opcode: 0000 [#1] SMP
      [  160.240106] Modules linked in:
      [  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
      [  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
      [  160.240106] task: f4518000 task.stack: f47b6000
      [  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
      [  160.240106] EFLAGS: 00010246 CPU: 0
      [  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
      [  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
      [  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      [  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
      [  160.240106] Call Trace:
      [  160.240106]  ext4_truncate+0x1e9/0x3e5
      [  160.240106]  ext4_fill_super+0x286f/0x2b1e
      [  160.240106]  ? set_blocksize+0x2e/0x7e
      [  160.240106]  mount_bdev+0x114/0x15f
      [  160.240106]  ext4_mount+0x15/0x17
      [  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
      [  160.240106]  mount_fs+0x58/0x115
      [  160.240106]  vfs_kern_mount+0x4b/0xae
      [  160.240106]  do_mount+0x671/0x8c3
      [  160.240106]  ? _copy_from_user+0x70/0x83
      [  160.240106]  ? strndup_user+0x31/0x46
      [  160.240106]  SyS_mount+0x57/0x7b
      [  160.240106]  do_int80_syscall_32+0x4f/0x61
      [  160.240106]  entry_INT80_32+0x2f/0x2f
      [  160.240106] EIP: 0xb76b919e
      [  160.240106] EFLAGS: 00000246 CPU: 0
      [  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
      [  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
      [  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      [  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
      [  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
      [  160.317241] ---[ end trace d6a773a375c810a5 ]---
      
      The problem is that when the kernel tries to truncate an inode in
      ext4_truncate(), it tries to clear any on-disk data beyond i_size.
      Without the encryption key, it can't do that, and so it triggers a
      BUG.
      
      E2fsck does *not* provide this service, and in practice most file
      systems have their orphan list processed by e2fsck, so to avoid
      crashing, this patch skips this step if we don't have access to the
      encryption key (which is the case when processing the orphan list; in
      all other cases, we will have the encryption key, or the kernel
      wouldn't have allowed the file to be opened).
      
      An open question is whether the fact that e2fsck isn't clearing the
      bytes beyond i_size is causing problems; and, if we've lived with it
      not doing so for this long, whether we can drop this step from the
      kernel's replay of the orphan list in all cases (not just when we
      don't have the key for encrypted inodes).
      
      Addresses-Google-Bug: #35209576
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0d06863f
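The decision described in this commit message can be modelled in user space as below. This is only a sketch: `struct inode_model`, `enum tail_action` and `tail_action_on_truncate()` are hypothetical names made up for illustration and are not the kernel's actual code. It shows the rule that the tail beyond i_size is zeroed unless the inode is encrypted and no key is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two inode properties the commit message relies on. */
struct inode_model {
    bool encrypted;   /* inode carries the encrypt flag */
    bool has_key;     /* encryption key is currently available */
};

enum tail_action {
    TAIL_ZERO,  /* clear the data beyond i_size in the last block */
    TAIL_SKIP   /* leave the tail alone, which is also what e2fsck does */
};

/* Decide what truncate should do with the partial block beyond i_size. */
static enum tail_action tail_action_on_truncate(const struct inode_model *inode)
{
    assert(inode != NULL);
    /* Without the key the block cannot be decrypted, zeroed and re-encrypted. */
    if (inode->encrypted && !inode->has_key)
        return TAIL_SKIP;
    return TAIL_ZERO;
}
```

In this model, an encrypted inode met during orphan-list replay (no key loaded yet) takes the `TAIL_SKIP` branch, while every inode that could legitimately be opened takes `TAIL_ZERO`.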
  2. 10 Feb, 2017 2 commits
    • ext4: do not use stripe_width if it is not set · 5469d7c3
      Jan Kara committed
      Avoid using stripe_width for the sbi->s_stripe value if it is not actually
      set. An unset stripe_width otherwise prevents the stride from being used
      for sbi->s_stripe.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      5469d7c3
    • ext4: fix stripe-unaligned allocations · d9b22cf9
      Jan Kara committed
      When a filesystem is created using:
      
      	mkfs.ext4 -b 4096 -E stride=512 <dev>
      
      and we try to allocate a 64MB extent, we end up directly in
      ext4_mb_complex_scan_group(). This is because the request is detected
      as a power-of-two allocation (so we start in ext4_mb_regular_allocator()
      with ac_criteria == 0), but the check before
      ext4_mb_simple_scan_group() refuses the direct buddy scan because the
      allocation request is too large. Since cr == 0, the check whether we
      should use ext4_mb_scan_aligned() fails as well and we fall back to
      ext4_mb_complex_scan_group().
      
      Fix the problem by checking for an upper limit on power-of-two requests
      directly when detecting them.
      Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d9b22cf9
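The idea of bounding power-of-two detection from above can be sketched in plain C as follows. This is a user-space illustration, not the kernel code: `fls_u32()` and `direct_buddy_order()` are hypothetical helpers, and the order limits are passed in as parameters because the real bounds depend on the filesystem's block size and tunables.

```c
#include <assert.h>

/* 1-based position of the highest set bit; 0 when x == 0. */
static int fls_u32(unsigned int x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Return the buddy order for a request of `len` blocks if it is an exact
 * power of two whose order lies within [min_order, max_order]; otherwise
 * return -1 so the caller does not treat it as a power-of-two request.
 */
static int direct_buddy_order(unsigned int len, int min_order, int max_order)
{
    int order;

    assert(min_order <= max_order);
    if (len == 0 || (len & (len - 1)) != 0)
        return -1;              /* not a power of two */
    order = fls_u32(len) - 1;
    if (order < min_order || order > max_order)
        return -1;              /* too small or too large for a direct scan */
    return order;
}
```

With 4096-byte blocks, a 64MB request is 16384 blocks (order 14). Assuming, for illustration only, that the largest buddy order is 13, the helper returns -1, so the request is no longer classified as power-of-two and can take the stripe-aligned path instead.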
  3. 09 Feb, 2017 2 commits
  4. 06 Feb, 2017 1 commit
  5. 05 Feb, 2017 6 commits
  6. 03 Feb, 2017 1 commit
    • ext4: move halfmd4 into hash.c directly · 1c83a9aa
      Jason A. Donenfeld committed
      The "half md4" transform should not be used by any new code. And
      fortunately, it's only used now by ext4. Since ext4 supports several
      hashing methods, at some point it might be desirable to move to
      something like SipHash. As an intermediate step, remove half md4 from
      cryptohash.h and lib, and make it just a local function in ext4's
      hash.c. There's precedent for doing this; the other function ext4 can use
      for its hashes -- TEA -- is also implemented in the same place. Also, by
      being a local function, this might allow gcc to perform some additional
      optimizations.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      1c83a9aa
  7. 02 Feb, 2017 2 commits
    • ext4: fix use-after-iput when fscrypt contexts are inconsistent · dd01b690
      Eric Biggers committed
      In the case where the child's encryption context was inconsistent with
      its parent directory, we were using inode->i_sb and inode->i_ino after
      the inode had already been iput().  Fix this by doing the iput() in the
      correct places.
      
      Note: only ext4 had this bug, not f2fs and ubifs.
      
      Fixes: d9cdc903 ("ext4 crypto: enforce context consistency")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      dd01b690
    • jbd2: fix use after free in kjournald2() · dbfcef6b
      Sahitya Tummala committed
      Below is a synchronization issue between the unmount and kjournald2
      contexts, which results in a use-after-free in kjournald2().
      Fix it by using journal->j_state_lock to synchronize the
      wait_event() done in journal_kill_thread() and the wake_up() done
      in kjournald2().
      
      TASK 1:
      umount cmd:
         |--jbd2_journal_destroy() {
             |--journal_kill_thread() {
                  write_lock(&journal->j_state_lock);
      	    journal->j_flags |= JBD2_UNMOUNT;
      	    ...
      	    write_unlock(&journal->j_state_lock);
      	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
      	    					   kjournald2() {
      						     ...
      						     checks JBD2_UNMOUNT flag and calls goto end-loop;
      						     ...
      						     end_loop:
      						       write_unlock(&journal->j_state_lock);
      						       journal->j_task = NULL; --> If this thread gets
      						       pre-empted here, then TASK 1 wait_event will
      						       exit even before this thread is completely
      						       done.
      	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
      	    ...
      	    write_lock(&journal->j_state_lock);
      	    write_unlock(&journal->j_state_lock);
      	  }
             |--kfree(journal);
           }
      }
      						       wake_up(&journal->j_wait_done_commit); --> this step
      						       now results into use after free issue.
      						   }
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      dbfcef6b
  8. 28 Jan, 2017 2 commits
  9. 23 Jan, 2017 2 commits
  10. 12 Jan, 2017 3 commits
    • ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores · b907f2d5
      Theodore Ts'o committed
      There is no need to call ext4_mark_inode_dirty while holding xattr_sem
      or i_data_sem, so where it's easy to avoid it, move it out from the
      critical region.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      b907f2d5
    • ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() · c755e251
      Theodore Ts'o committed
      The xattr_sem deadlock problems fixed in commit 2e81a4ee: "ext4:
      avoid deadlock when expanding inode size" didn't include the use of
      xattr_sem in fs/ext4/inline.c.  With the addition of project quota
      which added a new extra inode field, this exposed deadlocks in the
      inline_data code similar to the ones fixed by 2e81a4ee.
      
      The deadlock can be reproduced via:
      
         dmesg -n 7
         mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
         mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
         mkdir /vdc/a
         umount /vdc
         mount -t ext4 /dev/vdc /vdc
         echo foo > /vdc/a/foo
      
      and looks like this:
      
      [   11.158815] 
      [   11.160276] =============================================
      [   11.161960] [ INFO: possible recursive locking detected ]
      [   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W      
      [   11.161960] ---------------------------------------------
      [   11.161960] bash/2519 is trying to acquire lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960] 
      [   11.161960] but task is already holding lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960] 
      [   11.161960] other info that might help us debug this:
      [   11.161960]  Possible unsafe locking scenario:
      [   11.161960] 
      [   11.161960]        CPU0
      [   11.161960]        ----
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960] 
      [   11.161960]  *** DEADLOCK ***
      [   11.161960] 
      [   11.161960]  May be due to missing lock nesting notation
      [   11.161960] 
      [   11.161960] 4 locks held by bash/2519:
      [   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
      [   11.161960]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
      [   11.161960]  #2:  (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
      [   11.161960]  #3:  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960] 
      [   11.161960] stack backtrace:
      [   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
      [   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
      [   11.161960] Call Trace:
      [   11.161960]  dump_stack+0x72/0xa3
      [   11.161960]  __lock_acquire+0xb7c/0xcb9
      [   11.161960]  ? kvm_clock_read+0x1f/0x29
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  lock_acquire+0x106/0x18a
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  down_write+0x39/0x72
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ? _raw_read_unlock+0x22/0x2c
      [   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
      [   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
      [   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
      [   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_try_add_inline_entry+0x69/0x152
      [   11.161960]  ext4_add_entry+0xa3/0x848
      [   11.161960]  ? __brelse+0x14/0x2f
      [   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
      [   11.161960]  ext4_add_nondir+0x17/0x5b
      [   11.161960]  ext4_create+0xcf/0x133
      [   11.161960]  ? ext4_mknod+0x12f/0x12f
      [   11.161960]  lookup_open+0x39e/0x3fb
      [   11.161960]  ? __wake_up+0x1a/0x40
      [   11.161960]  ? lock_acquire+0x11e/0x18a
      [   11.161960]  path_openat+0x35c/0x67a
      [   11.161960]  ? sched_clock_cpu+0xd7/0xf2
      [   11.161960]  do_filp_open+0x36/0x7c
      [   11.161960]  ? _raw_spin_unlock+0x22/0x2c
      [   11.161960]  ? __alloc_fd+0x169/0x173
      [   11.161960]  do_sys_open+0x59/0xcc
      [   11.161960]  SyS_open+0x1d/0x1f
      [   11.161960]  do_int80_syscall_32+0x4f/0x61
      [   11.161960]  entry_INT80_32+0x2f/0x2f
      [   11.161960] EIP: 0xb76ad469
      [   11.161960] EFLAGS: 00000286 CPU: 0
      [   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
      [   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
      [   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      
      Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4ee as a prereq)
      Reported-by: George Spelvin <linux@sciencehorizons.net>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      c755e251
    • ext4: add debug_want_extra_isize mount option · 670e9875
      Theodore Ts'o committed
      In order to test the inode extra isize expansion code, it is useful to
      be able to easily create file systems that have inodes with extra
      isize values smaller than the current desired value.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      670e9875
  11. 09 Jan, 2017 7 commits
    • ext4: do not pollute the extents cache while shifting extents · 03e916fa
      Roman Pen committed
      Inside the ext4_ext_shift_extents() function, ext4_find_extent() is called
      without the EXT4_EX_NOCACHE flag, the flag that prevents cache population.
      
      This leads to outdated offsets in the extents tree and wrong blocks
      afterwards.
      
      The patch fixes the problem by passing the EXT4_EX_NOCACHE flag to each
      ext4_find_extent() call inside the ext4_ext_shift_extents() function.
      
      Fixes: 331573fe
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: stable@vger.kernel.org
      03e916fa
    • ext4: Include forgotten start block on fallocate insert range · 2a9b8cba
      Roman Pen committed
      While doing 'insert range', the start block should also be shifted right.
      The bug can be easily reproduced by the following test:
      
          ptr = malloc(4096);
          assert(ptr);
      
          fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
          assert(fd >= 0);
      
          rc = fallocate(fd, 0, 0, 8192);
          assert(rc == 0);
          for (i = 0; i < 2048; i++)
                  *((unsigned short *)ptr + i) = 0xbeef;
          rc = pwrite(fd, ptr, 4096, 0);
          assert(rc == 4096);
          rc = pwrite(fd, ptr, 4096, 4096);
          assert(rc == 4096);
      
          for (block = 2; block < 1000; block++) {
                  rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
                  assert(rc == 0);
      
                  for (i = 0; i < 2048; i++)
                          *((unsigned short *)ptr + i) = block;
      
                  rc = pwrite(fd, ptr, 4096, 4096);
                  assert(rc == 4096);
          }
      
      Because the start block is not included in the range, the hole appears at
      the wrong offset (just after the desired offset) and the following
      pwrite() overwrites an already existing block, leaving the hole untouched.
      
      A simple way to verify the wrong behaviour is to check for zeroed blocks
      after the test:
      
         $ hexdump ./ext4.file | grep '0000 0000'
      
      The root cause of the bug is a wrong range (start, stop], where start
      should be inclusive, i.e. [start, stop].
      
      This patch fixes the problem by including start in the range.  But so
      as not to break left shift (range collapse), stop points to the
      beginning of a block, not to the end.
      
      The other non-obvious change is an iterator validity check in the
      main loop.  Because the iterator is unsigned, the following corner case
      must be handled with care: when a block is inserted at offset 0, the stop
      variable overflows and never becomes less than start, which is 0.
      To handle this special case, the iterator is set to NULL to indicate that
      the end of the loop has been reached.
      
      Fixes: 331573fe
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: stable@vger.kernel.org
      2a9b8cba
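The unsigned-iterator corner case mentioned in this commit can be reproduced in a few lines of user-space C. The sketch below is not the ext4 code: `count_down_closed()` is a hypothetical helper that walks the closed range [start, stop] from the top down, and it uses an explicit `done` flag where the patch is described as setting its iterator to NULL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Visit stop, stop - 1, ..., start (both ends inclusive) and return how
 * many positions were visited.
 *
 * A naive `for (cur = stop; cur >= start; cur--)` never terminates when
 * start == 0, because an unsigned `cur` wraps around instead of going
 * below zero. The loop therefore ends through an explicit marker.
 */
static size_t count_down_closed(unsigned int start, unsigned int stop)
{
    size_t visited = 0;
    unsigned int cur = stop;
    int done = 0;

    assert(start <= stop);
    while (!done) {
        visited++;          /* stand-in for "shift the extent at cur" */
        if (cur == start)
            done = 1;       /* reached the inclusive lower bound */
        else
            cur--;          /* safe: cur > start >= 0 here */
    }
    return visited;
}
```

Calling `count_down_closed(0, 3)` visits four positions, including position 0, which is exactly the start block that a half-open (start, stop] walk would skip.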
    • Merge branch 'fscrypt' into d · 56735be0
      Theodore Ts'o committed
      56735be0
    • Linux 4.10-rc3 · a121103c
      Linus Torvalds committed
      a121103c
    • Merge tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 83280e90
      Linus Torvalds committed
      Pull USB fixes from Greg KH:
       "Here are a bunch of USB fixes for 4.10-rc3. Yeah, it's a lot, an
        artifact of the holiday break I think.
      
        Lots of gadget and the usual XHCI fixups for reported issues (one day
        that driver will calm down...) Also included are a bunch of usb-serial
        driver fixes, and for good measure, a number of much-reported MUSB
        driver issues have finally been resolved.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (72 commits)
        USB: fix problems with duplicate endpoint addresses
        usb: ohci-at91: use descriptor-based gpio APIs correctly
        usb: storage: unusual_uas: Add JMicron JMS56x to unusual device
        usb: hub: Move hub_port_disable() to fix warning if PM is disabled
        usb: musb: blackfin: add bfin_fifo_offset in bfin_ops
        usb: musb: fix compilation warning on unused function
        usb: musb: Fix trying to free already-free IRQ 4
        usb: musb: dsps: implement clear_ep_rxintr() callback
        usb: musb: core: add clear_ep_rxintr() to musb_platform_ops
        USB: serial: ti_usb_3410_5052: fix NULL-deref at open
        USB: serial: spcp8x5: fix NULL-deref at open
        USB: serial: quatech2: fix sleep-while-atomic in close
        USB: serial: pl2303: fix NULL-deref at open
        USB: serial: oti6858: fix NULL-deref at open
        USB: serial: omninet: fix NULL-derefs at open and disconnect
        USB: serial: mos7840: fix misleading interrupt-URB comment
        USB: serial: mos7840: remove unused write URB
        USB: serial: mos7840: fix NULL-deref at open
        USB: serial: mos7720: remove obsolete port initialisation
        USB: serial: mos7720: fix parallel probe
        ...
      83280e90
    • Merge tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · cc250e26
      Linus Torvalds committed
      Pull char/misc fixes from Greg KH:
       "Here are a few small char/misc driver fixes for 4.10-rc3.
      
        Two MEI driver fixes, and three NVMEM patches for reported issues, and
        a new Hyper-V driver MAINTAINER update. Nothing major at all, all have
        been in linux-next with no reported issues"
      
      * tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        hyper-v: Add myself as additional MAINTAINER
        nvmem: fix nvmem_cell_read() return type doc
        nvmem: imx-ocotp: Fix wrong register size
        nvmem: qfprom: Allow single byte accesses for read/write
        mei: move write cb to completion on credentials failures
        mei: bus: fix mei_cldev_enable KDoc
      cc250e26
    • Merge tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 6ea17ed1
      Linus Torvalds committed
      Pull staging/IIO fixes from Greg KH:
       "Here are some staging and IIO driver fixes for 4.10-rc3.
      
        Most of these are minor IIO fixes of reported issues, along with one
        network driver fix to resolve an issue. And a MAINTAINERS update with
        a new mailing list. All of these, except the MAINTAINERS file update,
        have been in linux-next with no reported issues (the MAINTAINERS patch
        happened on Friday...)"
      
      * tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        MAINTAINERS: add greybus subsystem mailing list
        staging: octeon: Call SET_NETDEV_DEV()
        iio: accel: st_accel: fix LIS3LV02 reading and scaling
        iio: common: st_sensors: fix channel data parsing
        iio: max44000: correct value in illuminance_integration_time_available
        iio: adc: TI_AM335X_ADC should depend on HAS_DMA
        iio: bmi160: Fix time needed to sleep after command execution
        iio: 104-quad-8: Fix active level mismatch for the preset enable option
        iio: 104-quad-8: Fix off-by-one errors when addressing IOR
        iio: 104-quad-8: Fix index control configuration
      6ea17ed1
  12. 08 Jan, 2017 7 commits
    • fscrypt: make fscrypt_operations.key_prefix a string · a5d431ef
      Eric Biggers committed
      filesystem-specific key prefix.  It was unclear why; perhaps it was
      envisioned that different instances of the same filesystem type could
      use different key prefixes, or that key prefixes could be binary.
      However, neither of those things were implemented or really make sense
      at all.  So simplify the code by making key_prefix a const char *.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a5d431ef
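To illustrate why a plain string is enough for a key prefix, here is a small user-space sketch. Everything in it is hypothetical: `struct crypt_ops_model`, `DESCRIPTOR_SIZE` and `build_key_description()` are names invented for this example, and the "prefix followed by the hex-encoded descriptor" layout is an assumption made for illustration rather than a statement about the fscrypt implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DESCRIPTOR_SIZE 8   /* assumed descriptor length, for illustration */

/* Hypothetical per-filesystem operations table with a string prefix. */
struct crypt_ops_model {
    const char *key_prefix;   /* e.g. "ext4:"; a constant, NUL-terminated string */
};

/*
 * Write "<prefix><hex descriptor>" into out. Returns 0 on success or -1 if
 * the buffer is too small. No callback and no length out-parameter are
 * needed because strlen() recovers the prefix length.
 */
static int build_key_description(const struct crypt_ops_model *ops,
                                 const unsigned char desc[DESCRIPTOR_SIZE],
                                 char *out, size_t out_len)
{
    size_t prefix_len;
    size_t i;

    assert(ops != NULL && ops->key_prefix != NULL);
    prefix_len = strlen(ops->key_prefix);
    if (out_len < prefix_len + 2 * DESCRIPTOR_SIZE + 1)
        return -1;
    memcpy(out, ops->key_prefix, prefix_len);
    for (i = 0; i < DESCRIPTOR_SIZE; i++)
        snprintf(out + prefix_len + 2 * i, 3, "%02x", desc[i]);
    return 0;
}
```

Because the prefix is constant data, each filesystem in this model only has to initialise one field, and callers can use it directly without an indirect call.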
    • fscrypt: remove unused 'mode' member of fscrypt_ctx · f099d616
      Eric Biggers committed
      Nothing reads or writes fscrypt_ctx.mode, and it doesn't belong there
      because a fscrypt_ctx is not tied to a specific encryption mode.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      f099d616
    • ext4: don't allow encrypted operations without keys · 173b8439
      Theodore Ts'o committed
      While we allow deletes without the key, the following should not be
      permitted:
      
      # cd /vdc/encrypted-dir-without-key
      # ls -l
      total 4
      -rw-r--r-- 1 root root   0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB
      -rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD
      # mv uRJ5vJh9gE7vcomYMqTAyD  6,LKNRJsp209FbXoSvJWzB
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      173b8439
    • mm: workingset: fix use-after-free in shadow node shrinker · ea07b862
      Johannes Weiner committed
      Several people report seeing warnings about inconsistent radix tree
      nodes followed by crashes in the workingset code, which all looked like
      use-after-free access from the shadow node shrinker.
      
      Dave Jones managed to reproduce the issue with a debug patch applied,
      which confirmed that the radix tree shrinking indeed frees shadow nodes
      while they are still linked to the shadow LRU:
      
        WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
        CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
        Call Trace:
           delete_node+0x1e4/0x200
           __radix_tree_delete_node+0xd/0x10
           shadow_lru_isolate+0xe6/0x220
           __list_lru_walk_one.isra.4+0x9b/0x190
           list_lru_walk_one+0x23/0x30
           scan_shadow_nodes+0x2e/0x40
           shrink_slab.part.44+0x23d/0x5d0
           shrink_node+0x22c/0x330
           kswapd+0x392/0x8f0
      
      This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
      inlined radix_tree_shrink().
      
      The problem is with 14b46879 ("mm: workingset: move shadow entry
      tracking to radix tree exceptional tracking"), which passes an update
      callback into the radix tree to link and unlink shadow leaf nodes when
      tree entries change, but forgot to pass the callback when reclaiming a
      shadow node.
      
      While the reclaimed shadow node itself is unlinked by the shrinker, its
      deletion from the tree can cause the left-most leaf node in the tree to
      be shrunk.  If that happens to be a shadow node as well, we don't unlink
      it from the LRU as we should.
      
      Consider this tree, where the s are shadow entries:
      
             root->rnode
                  |
             [0       n]
              |       |
           [s    ] [sssss]
      
      Now the shadow node shrinker reclaims the rightmost leaf node through
      the shadow node LRU:
      
             root->rnode
                  |
             [0        ]
              |
          [s     ]
      
      Because the parent of the deleted node is the first level below the
      root and has only one child in the left-most slot, the intermediate
      level is shrunk and the node containing the single shadow is put in
      its place:
      
             root->rnode
                  |
             [s        ]
      
      The shrinker again sees a single left-most slot in a first level node
      and thus decides to store the shadow in root->rnode directly and free
      the node - which is a leaf node on the shadow node LRU.
      
        root->rnode
             |
             s
      
      Without the update callback, the freed node remains on the shadow LRU,
      where it causes later shrinker runs to crash.
      
      Pass the node updater callback into __radix_tree_delete_node() in case
      the deletion causes the left-most branch in the tree to collapse too.
      
      Also add warnings when linked nodes are freed right away, rather than
      wait for the use-after-free when the list is scanned much later.
      
      Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea07b862
    • mm: stop leaking PageTables · b0b9b3df
      Hugh Dickins committed
      4.10-rc loadtest (even on x86, and even without THPCache) fails with
      "fork: Cannot allocate memory" or some such; and /proc/meminfo shows
      PageTables growing.
      
      Commit 953c66c2 ("mm: THP page cache support for ppc64") that got
      merged in rc1 removed the freeing of an unused preallocated pagetable
      after do_fault_around() has called map_pages().
      
      This is usually a good optimization, so that the followup doesn't have
      to reallocate one; but it's not sufficient to shift the freeing into
      alloc_set_pte(), since there are failure cases (most commonly
      VM_FAULT_RETRY) which never reach finish_fault().
      
      Check and free it at the outer level in do_fault(), then we don't need
      to worry in alloc_set_pte(), and can restore that to how it was (I
      cannot find any reason to pte_free() under lock as it was doing).
      
      And fix a separate pagetable leak, or crash, introduced by the same
      change, that could only show up on some ppc64: why does do_set_pmd()'s
      failure case attempt to withdraw a pagetable when it never deposited
      one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
      Residue of an earlier implementation, perhaps? Delete it.
      
      Fixes: 953c66c2 ("mm: THP page cache support for ppc64")
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b0b9b3df
    • Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 87bc6107
      Linus Torvalds committed
      Pull kbuild fix from Michal Marek:
       "The asm-prototypes.h file added in the last merge window results in
        invalid code with CONFIG_KMEMCHECK=y. The net result is that genksyms
        segfaults.
      
        This pull request fixes the header, the genksyms fix is in my kbuild
        branch for 4.11"
      
      * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        asm-prototypes: Clear any CPP defines before declaring the functions
      87bc6107
    • MAINTAINERS: add greybus subsystem mailing list · 01d0f715
      Greg Kroah-Hartman committed
      The Greybus driver subsystem has a mailing list, so list it in the
      MAINTAINERS file so that people know to send patches there as well.
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Johan Hovold <johan@kernel.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01d0f715
  13. 07 Jan, 2017 4 commits