1. 18 Oct, 2015 2 commits
  2. 15 Oct, 2015 1 commit
    • T
      ext4: use private version of page_zero_new_buffers() for data=journal mode · b90197b6
      Theodore Ts'o committed
      If there is an error while copying data from userspace into the page
      cache during a write(2) system call, in data=journal mode, in
      ext4_journalled_write_end() we were using page_zero_new_buffers() from
      fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
      no good if journalling is enabled.  This is a long-standing bug that
      goes back for years and years in ext3, but a combination of (a)
      data=journal not being very common, (b) in many cases it only results
      in a warning message, and (c) only very rarely causes the kernel to
      hang, means that we only really noticed this as a problem when commit
      998ef75d caused this failure to happen frequently enough to cause
      generic/208 to fail when run in data=journal mode.
      
      The fix is to have our own version of this function that doesn't call
      mark_buffer_dirty(), since we will end up calling
      ext4_handle_dirty_metadata() on the buffer head(s) in question very
      shortly afterwards in ext4_journalled_write_end().
      
      Thanks to Dave Hansen and Linus Torvalds for helping to identify the
      root cause of the problem.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.com>
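The difference between the generic helper and the private ext4 version described above can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: `BufferHead`, `zero_new_buffers`, and the `mark_dirty` switch are hypothetical names, and the model ignores the page-level uptodate check that the real helper performs.

```python
from dataclasses import dataclass


@dataclass
class BufferHead:
    """Toy stand-in for a kernel buffer head: only the flags used here."""
    new: bool = False
    uptodate: bool = False
    dirty: bool = False


def zero_new_buffers(page, buffers, block_size, start, end, mark_dirty):
    """Zero the part of each *new* buffer that overlaps [start, end).

    mark_dirty=True models the generic helper, which also sets the dirty
    flag; mark_dirty=False models the private variant, which leaves the
    dirtying to the journalling code that runs afterwards.
    """
    block_start = 0
    for bh in buffers:
        block_end = block_start + block_size
        if bh.new and block_end > start and block_start < end:
            lo = max(start, block_start)
            hi = min(end, block_end)
            page[lo:hi] = bytes(hi - lo)  # zero only the overlapping bytes
            bh.uptodate = True
            bh.new = False
            if mark_dirty:
                bh.dirty = True
        block_start = block_end


# An 8-byte "page" split into two 4-byte blocks; only the first is new.
page = bytearray(b"\xff" * 8)
buffers = [BufferHead(new=True), BufferHead()]
zero_new_buffers(page, buffers, block_size=4, start=2, end=8, mark_dirty=False)
print(page.hex(), buffers[0])
```

With `mark_dirty=False` the new buffer is zeroed and marked up to date, but its dirty flag stays clear, which is the behaviour the commit message asks for in data=journal mode.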
  3. 03 Oct, 2015 5 commits
    • T
      ext4 crypto: fix bugs in ext4_encrypted_zeroout() · 36086d43
      Theodore Ts'o committed
      Fix multiple bugs in ext4_encrypted_zeroout(), including one that
      could cause us to write an encrypted zero page to the wrong location
      on disk, potentially causing data and file system corruption.
      Fortunately, this tends to only show up in stress tests, but even with
      these fixes, we are seeing some test failures with generic/127 --- but
      these are now caused by data failures instead of metadata corruption.
      
      Since ext4_encrypted_zeroout() is only used for some optimizations to
      keep the extent tree from being too fragmented, and
      ext4_encrypted_zeroout() itself isn't all that optimized from a time
      or IOPS perspective, disable the extent tree optimization for
      encrypted inodes for now.  This prevents the data corruption issues
      reported by generic/127 until we can figure out what's going wrong.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • T
      ext4 crypto: replace some BUG_ON()'s with error checks · 687c3c36
      Theodore Ts'o committed
      Buggy (or hostile) userspace should not be able to cause the kernel to
      crash.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • T
      ext4 crypto: ext4_page_crypto() doesn't need an encryption context · 3684de8c
      Theodore Ts'o committed
      Since ext4_page_crypto() doesn't need an encryption context (at least
      not any more), this allows us to simplify a number of function
      signatures and also allows us to avoid needing to allocate a context in
      ext4_block_write_begin().  It also means we no longer need a separate
      ext4_decrypt_one() function.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • T
      ext4: optimize ext4_writepage() for attempted 4k delalloc writes · cccd147a
      Theodore Ts'o committed
      In cases where the file system block size is the same as the page
      size, and ext4_writepage() is asked to write out a page which
      either has the unwritten bit set in the extent tree, or which does not
      yet have a block assigned due to delayed allocation, we can bail out
      early, unlocking the page earlier and avoiding a round trip
      through ext4_bio_write_page() with the attendant calls to
      set_page_writeback() and redirty_page_for_writeback().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • T
      ext4 crypto: fix memory leak in ext4_bio_write_page() · 937d7b84
      Theodore Ts'o committed
      There are times when ext4_bio_write_page() is called even though we
      don't actually need to do any I/O.  This happens when ext4_writepage()
      gets called by the jbd2 commit path when an inode needs to force its
      pages written out in order to provide data=ordered guarantees --- and
      a page is backed by an unwritten (e.g., uninitialized) block on disk,
      or if delayed allocation means the page's backing store hasn't been
      allocated yet.  In that case, we need to skip the call to
      ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
      bounce page and an ext4 crypto context getting leaked.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  4. 24 Sep, 2015 3 commits
  5. 09 Sep, 2015 5 commits
  6. 05 Sep, 2015 1 commit
    • K
      fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook committed
      Many file systems that implement the show_options hook fail to correctly
      escape their output, which could lead to unescaped characters (e.g.,
      newlines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Jan Kara <jack@suse.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
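The effect of the escaping helpers on the hostile workdir value from the example can be modelled in user space. The Python sketch below is not the kernel implementation: `escape_chars` and `show_option` are hypothetical names, and both the escape sets (`,= \t\n\\` for the option name, `, \t\n\\` for the value) and the three-digit octal output format are assumptions about how the helpers behave.

```python
def escape_chars(text, esc):
    """Replace every character of `text` found in `esc` with \\ooo (octal)."""
    out = []
    for ch in text:
        if ch in esc:
            out.append("\\%03o" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def show_option(name, value=None):
    """Model of emitting one ',name[=value]' mount option with escaping."""
    result = "," + escape_chars(name, ",= \t\n\\")
    if value is not None:
        result += "=" + escape_chars(value, ", \t\n\\")
    return result


# The workdir value from the example above, with its embedded newline.
hostile = "ovl/work/ 0 0\nnone /proc fuse.pwn user_id=1000"
print(show_option("workdir", hostile))
```

Because spaces and the newline are emitted as octal escapes, the value stays on a single line of the mounts file and can no longer be read as a second, spoofed entry.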
  7. 16 Aug, 2015 2 commits
    • T
      Revert "ext4: remove block_device_ejected" · bdfe0cbd
      Theodore Ts'o committed
      This reverts commit 08439fec.
      
      Unfortunately we still need to test for bdi->dev to avoid a crash when a
      USB stick is yanked out while a file system is mounted:
      
         usb 2-2: USB disconnect, device number 2
         Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
         JBD2: Error -5 detected when updating journal superblock for sdb1-8.
         BUG: unable to handle kernel paging request at 34beb000
         IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
         *pdpt = 0000000023db9001 *pde = 0000000000000000 
         Oops: 0000 [#1] SMP 
         CPU: 0 PID: 4083 Comm: umount Tainted: G     U     OE   4.1.1-040101-generic #201507011435
         Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
         task: ebf06b50 ti: ebebc000 task.ti: ebebc000
         EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
         EIP is at __percpu_counter_add+0x18/0xc0
         EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
         ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
          DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
         CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
         Stack:
          c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
          ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
          f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
         Call Trace:
          [<c1160454>] account_page_dirtied+0x74/0x110
          [<c11e613f>] __set_page_dirty+0x3f/0xb0
          [<c11e6203>] mark_buffer_dirty+0x53/0xc0
          [<c124a0cb>] ext4_commit_super+0x17b/0x250
          [<c124ac71>] ext4_put_super+0xc1/0x320
          [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
          [<c11cfeda>] ? evict_inodes+0xca/0xe0
          [<c11b925a>] generic_shutdown_super+0x6a/0xe0
          [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
          [<c1165a50>] ? unregister_shrinker+0x40/0x50
          [<c11b92f6>] kill_block_super+0x26/0x70
          [<c11b94f5>] deactivate_locked_super+0x45/0x80
          [<c11ba007>] deactivate_super+0x47/0x60
          [<c11d2b39>] cleanup_mnt+0x39/0x80
          [<c11d2bc0>] __cleanup_mnt+0x10/0x20
          [<c1080b51>] task_work_run+0x91/0xd0
          [<c1011e3c>] do_notify_resume+0x7c/0x90
          [<c1720da5>] work_notify
         Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
         EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
         CR2: 0000000034beb000
         ---[ end trace dd564a7bea834ecd ]---
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • T
      ext4: ratelimit the file system mounted message · e294a537
      Theodore Ts'o committed
      The xfstests ext4/305 will mount and unmount the same file system over
      4,000 times, and each one of these will cause a system log message.
      Ratelimit this message since if we are getting more than a few dozen
      of these messages, they probably aren't going to be helpful.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
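The idea of rate limiting a log message can be sketched as an interval-and-burst counter. The Python model below is an illustration only: the `RateLimit` class, its `allow` method, and the interval and burst values are hypothetical and are not taken from the ext4 change itself. The clock is passed in explicitly so the behaviour is deterministic.

```python
class RateLimit:
    """Allow at most `burst` events per `interval` seconds; count the rest."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.begin = None   # start time of the current window
        self.printed = 0    # events allowed in the current window
        self.missed = 0     # events suppressed in the current window

    def allow(self, now):
        # Open a new window on first use or once the interval has elapsed.
        if self.begin is None or now - self.begin >= self.interval:
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


# Hypothetical settings: at most 3 "mounted" messages per 5 seconds.
limiter = RateLimit(interval=5.0, burst=3)
for attempt in range(5):
    if limiter.allow(now=0.0):
        print("mounted filesystem (attempt %d)" % attempt)
print("suppressed:", limiter.missed)
```

A test that mounts and unmounts thousands of times in quick succession would then log only the first few messages of each window.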
  8. 15 Aug, 2015 3 commits
  9. 14 Aug, 2015 1 commit
  10. 29 Jul, 2015 3 commits
  11. 27 Jul, 2015 1 commit
  12. 24 Jul, 2015 3 commits
  13. 23 Jul, 2015 1 commit
    • D
      ext4, jbd2: add REQ_FUA flag when recording an error in the superblock · 564bc402
      Daeho Jeong committed
      When an error condition is detected, an error status should be recorded in
      the superblock of EXT4 or JBD2. However, the write request is currently
      submitted without the REQ_FUA flag, even in "barrier=1" mode, and in
      "errors=panic" mode it is followed by a call to panic(). On mobile devices
      that reset the whole system as soon as a kernel panic occurs, the write
      request containing the error flag can be lost from the storage cache
      without ever being written to the physical cells. As a result, the error
      flag never appears in either superblock on the next boot or any later
      one, and e2fsck cannot fix the filesystem problems automatically unless
      it is run in forced-check mode.
      
      [ Changed to use test_opt(sb, BARRIER) instead of checking the journal flags -- TYT ]
      Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  14. 22 Jul, 2015 5 commits
  15. 17 Jul, 2015 2 commits
    • T
      ext4 crypto: check for too-short encrypted file names · 27977b69
      Theodore Ts'o committed
      An encrypted file name should never be shorter than 16 bytes, the
      AES block size.  The 3.10 crypto layer will oops and crash the kernel
      if ciphertext shorter than the block size is passed to it.
      
      Fortunately, in modern kernels the crypto layer will not crash the
      kernel in this scenario, but nevertheless, it represents a corrupted
      directory, and we should detect it and mark the file system as
      corrupted so that e2fsck can fix this.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
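The length check described above amounts to rejecting any ciphertext name shorter than one AES block before it reaches the crypto layer. A minimal Python sketch of that rule follows; `check_encrypted_name` is a hypothetical helper, and raising `ValueError` merely stands in for however the file system actually reports the corruption.

```python
AES_BLOCK_SIZE = 16  # bytes, as stated in the commit message


def check_encrypted_name(ciphertext):
    """Reject encrypted file names that are shorter than one AES block."""
    if len(ciphertext) < AES_BLOCK_SIZE:
        raise ValueError(
            "encrypted name is %d bytes, shorter than the %d-byte AES block"
            % (len(ciphertext), AES_BLOCK_SIZE)
        )
    return ciphertext


check_encrypted_name(bytes(16))      # exactly one block: accepted
try:
    check_encrypted_name(bytes(7))   # too short: treated as corruption
except ValueError as err:
    print("rejected:", err)
```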
    • T
      ext4 crypto: use a jbd2 transaction when adding a crypto policy · 806c24ad
      Theodore Ts'o committed
      Start a jbd2 transaction, and mark the inode dirty under that
      transaction after setting the encrypt flag.  Otherwise if the
      directory isn't modified after setting the crypto policy, the
      encrypted flag might not survive the inode getting pushed out from
      memory, or the file system getting unmounted and remounted.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  16. 10 Jul, 2015 1 commit
  17. 06 Jul, 2015 1 commit