1. 14 11月, 2015 1 次提交
  2. 11 11月, 2015 1 次提交
    • R
      vfs: remove unused wrapper block_page_mkwrite() · 5c500029
      Ross Zwisler 提交于
      The function currently called "__block_page_mkwrite()" used to be called
      "block_page_mkwrite()" until a wrapper for this function was added by:
      
      commit 24da4fab ("vfs: Create __block_page_mkwrite() helper passing
      	error values back")
      
      This wrapper, the current "block_page_mkwrite()", is currently unused.
      __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.
      
      Remove the unused wrapper, rename __block_page_mkwrite() back to
      block_page_mkwrite() and update the comment above block_page_mkwrite().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5c500029
  3. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  4. 07 11月, 2015 2 次提交
    • M
      mm, fs: introduce mapping_gfp_constraint() · c62d2555
      Michal Hocko 提交于
      There are many places which use mapping_gfp_mask to restrict a more
      generic gfp mask which would be used for allocations which are not
      directly related to the page cache but they are performed in the same
      context.
      
      Let's introduce a helper function which makes the restriction explicit and
      easier to track.  This patch doesn't introduce any functional changes.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c62d2555
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  5. 30 10月, 2015 1 次提交
  6. 21 10月, 2015 1 次提交
    • D
      KEYS: Merge the type-specific data with the payload data · 146aa8b1
      David Howells 提交于
      Merge the type-specific data with the payload data into one four-word chunk
      as it seems pointless to keep them separate.
      
      Use user_key_payload() for accessing the payloads of overloaded
      user-defined keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: linux-cifs@vger.kernel.org
      cc: ecryptfs@vger.kernel.org
      cc: linux-ext4@vger.kernel.org
      cc: linux-f2fs-devel@lists.sourceforge.net
      cc: linux-nfs@vger.kernel.org
      cc: ceph-devel@vger.kernel.org
      cc: linux-ima-devel@lists.sourceforge.net
      146aa8b1
  7. 19 10月, 2015 4 次提交
  8. 18 10月, 2015 8 次提交
  9. 17 10月, 2015 1 次提交
    • M
      mm, fs: obey gfp_mapping for add_to_page_cache() · 063d99b4
      Michal Hocko 提交于
      Commit 6afdb859 ("mm: do not ignore mapping_gfp_mask in page cache
      allocation paths") has caught some users of hardcoded GFP_KERNEL used in
      the page cache allocation paths.  This, however, wasn't complete and
      there were others which went unnoticed.
      
      Dave Chinner has reported the following deadlock for xfs on loop device:
      : With the recent merge of the loop device changes, I'm now seeing
      : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
      :
      : The deadlocked is as follows:
      :
      : kloopd1: loop_queue_read_work
      :       xfs_file_iter_read
      :       lock XFS inode XFS_IOLOCK_SHARED (on image file)
      :       page cache read (GFP_KERNEL)
      :       radix tree alloc
      :       memory reclaim
      :       reclaim XFS inodes
      :       log force to unpin inodes
      :       <wait for log IO completion>
      :
      : xfs-cil/loop1: <does log force IO work>
      :       xlog_cil_push
      :       xlog_write
      :       <loop issuing log writes>
      :               xlog_state_get_iclog_space()
      :               <blocks due to all log buffers under write io>
      :               <waits for IO completion>
      :
      : kloopd1: loop_queue_write_work
      :       xfs_file_write_iter
      :       lock XFS inode XFS_IOLOCK_EXCL (on image file)
      :       <wait for inode to be unlocked>
      :
      : i.e. the kloopd, with it's split read and write work queues, has
      : introduced a dependency through memory reclaim. i.e. that writes
      : need to be able to progress for reads make progress.
      :
      : The problem, fundamentally, is that mpage_readpages() does a
      : GFP_KERNEL allocation, rather than paying attention to the inode's
      : mapping gfp mask, which is set to GFP_NOFS.
      :
      : The didn't used to happen, because the loop device used to issue
      : reads through the splice path and that does:
      :
      :       error = add_to_page_cache_lru(page, mapping, index,
      :                       GFP_KERNEL & mapping_gfp_mask(mapping));
      
      This has changed by commit aa4d8616 ("block: loop: switch to VFS
      ITER_BVEC").
      
      This patch changes mpage_readpage{s} to follow gfp mask set for the
      mapping.  There are, however, other places which are doing basically the
      same.
      
      lustre:ll_dir_filler is doing GFP_KERNEL from the function which
      apparently uses GFP_NOFS for other allocations so let's make this
      consistent.
      
      cifs:readpages_get_pages is called from cifs_readpages and
      __cifs_readpages_from_fscache called from the same path obeys mapping
      gfp.
      
      ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
      regardless it uses mapping_gfp_mask for the page allocation.
      
      ext4_mpage_readpages is the called from the page cache allocation path
      same as read_pages and read_cache_pages
      
      As I've noticed in my previous post I cannot say I would be happy about
      sprinkling mapping_gfp_mask all over the place and it sounds like we
      should drop gfp_mask argument altogether and use it internally in
      __add_to_page_cache_locked that would require all the filesystems to use
      mapping gfp consistently which I am not sure is the case here.  From a
      quick glance it seems that some file system use it all the time while
      others are selective.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NDave Chinner <david@fromorbit.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      063d99b4
  10. 15 10月, 2015 1 次提交
    • T
      ext4: use private version of page_zero_new_buffers() for data=journal mode · b90197b6
      Theodore Ts'o 提交于
      If there is a error while copying data from userspace into the page
      cache during a write(2) system call, in data=journal mode, in
      ext4_journalled_write_end() were using page_zero_new_buffers() from
      fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
      no good if journalling is enabled.  This is a long-standing bug that
      goes back for years and years in ext3, but a combination of (a)
      data=journal not being very common, (b) in many case it only results
      in a warning message. and (c) only very rarely causes the kernel hang,
      means that we only really noticed this as a problem when commit
      998ef75d caused this failure to happen frequently enough to cause
      generic/208 to fail when run in data=journal mode.
      
      The fix is to have our own version of this function that doesn't call
      mark_dirty_buffer(), since we will end up calling
      ext4_handle_dirty_metadata() on the buffer head(s) in questions very
      shortly afterwards in ext4_journalled_write_end().
      
      Thanks to Dave Hansen and Linus Torvalds for helping to identify the
      root cause of the problem.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.com>
      b90197b6
  11. 03 10月, 2015 5 次提交
    • T
      ext4 crypto: fix bugs in ext4_encrypted_zeroout() · 36086d43
      Theodore Ts'o 提交于
      Fix multiple bugs in ext4_encrypted_zeroout(), including one that
      could cause us to write an encrypted zero page to the wrong location
      on disk, potentially causing data and file system corruption.
      Fortunately, this tends to only show up in stress tests, but even with
      these fixes, we are seeing some test failures with generic/127 --- but
      these are now caused by data failures instead of metadata corruption.
      
      Since ext4_encrypted_zeroout() is only used for some optimizations to
      keep the extent tree from being too fragmented, and
      ext4_encrypted_zeroout() itself isn't all that optimized from a time
      or IOPS perspective, disable the extent tree optimization for
      encrypted inodes for now.  This prevents the data corruption issues
      reported by generic/127 until we can figure out what's going wrong.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      36086d43
    • T
      ext4 crypto: replace some BUG_ON()'s with error checks · 687c3c36
      Theodore Ts'o 提交于
      Buggy (or hostile) userspace should not be able to cause the kernel to
      crash.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      687c3c36
    • T
      ext4 crypto: ext4_page_crypto() doesn't need a encryption context · 3684de8c
      Theodore Ts'o 提交于
      Since ext4_page_crypto() doesn't need an encryption context (at least
      not any more), this allows us to simplify a number function signature
      and also allows us to avoid needing to allocate a context in
      ext4_block_write_begin().  It also means we no longer need a separate
      ext4_decrypt_one() function.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3684de8c
    • T
      ext4: optimize ext4_writepage() for attempted 4k delalloc writes · cccd147a
      Theodore Ts'o 提交于
      In cases where the file system block size is the same as the page
      size, and ext4_writepage() is asked to write out a page which is
      either has the unwritten bit set in the extent tree, or which does not
      yet have a block assigned due to delayed allocation, we can bail out
      early and, unlocking the page earlier and avoiding a round trip
      through ext4_bio_write_page() with the attendant calls to
      set_page_writeback() and redirty_page_for_writeback().
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      cccd147a
    • T
      ext4 crypto: fix memory leak in ext4_bio_write_page() · 937d7b84
      Theodore Ts'o 提交于
      There are times when ext4_bio_write_page() is called even though we
      don't actually need to do any I/O.  This happens when ext4_writepage()
      gets called by the jbd2 commit path when an inode needs to force its
      pages written out in order to provide data=ordered guarantees --- and
      a page is backed by an unwritten (e.g., uninitialized) block on disk,
      or if delayed allocation means the page's backing store hasn't been
      allocated yet.  In that case, we need to skip the call to
      ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
      bounce page and an ext4 crypto context getting leaked.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      937d7b84
  12. 29 9月, 2015 1 次提交
  13. 24 9月, 2015 4 次提交
  14. 09 9月, 2015 5 次提交
  15. 05 9月, 2015 1 次提交
    • K
      fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook 提交于
      Many file systems that implement the show_options hook fail to correctly
      escape their output which could lead to unescaped characters (e.g.  new
      lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NJan Kara <jack@suse.com>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a068acf2
  16. 16 8月, 2015 2 次提交
    • T
      Revert "ext4: remove block_device_ejected" · bdfe0cbd
      Theodore Ts'o 提交于
      This reverts commit 08439fec.
      
      Unfortunately we still need to test for bdi->dev to avoid a crash when a
      USB stick is yanked out while a file system is mounted:
      
         usb 2-2: USB disconnect, device number 2
         Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
         JBD2: Error -5 detected when updating journal superblock for sdb1-8.
         BUG: unable to handle kernel paging request at 34beb000
         IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
         *pdpt = 0000000023db9001 *pde = 0000000000000000 
         Oops: 0000 [#1] SMP 
         CPU: 0 PID: 4083 Comm: umount Tainted: G     U     OE   4.1.1-040101-generic #201507011435
         Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
         task: ebf06b50 ti: ebebc000 task.ti: ebebc000
         EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
         EIP is at __percpu_counter_add+0x18/0xc0
         EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
         ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
          DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
         CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
         Stack:
          c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
          ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
          f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
         Call Trace:
          [<c1160454>] account_page_dirtied+0x74/0x110
          [<c11e613f>] __set_page_dirty+0x3f/0xb0
          [<c11e6203>] mark_buffer_dirty+0x53/0xc0
          [<c124a0cb>] ext4_commit_super+0x17b/0x250
          [<c124ac71>] ext4_put_super+0xc1/0x320
          [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
          [<c11cfeda>] ? evict_inodes+0xca/0xe0
          [<c11b925a>] generic_shutdown_super+0x6a/0xe0
          [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
          [<c1165a50>] ? unregister_shrinker+0x40/0x50
          [<c11b92f6>] kill_block_super+0x26/0x70
          [<c11b94f5>] deactivate_locked_super+0x45/0x80
          [<c11ba007>] deactivate_super+0x47/0x60
          [<c11d2b39>] cleanup_mnt+0x39/0x80
          [<c11d2bc0>] __cleanup_mnt+0x10/0x20
          [<c1080b51>] task_work_run+0x91/0xd0
          [<c1011e3c>] do_notify_resume+0x7c/0x90
          [<c1720da5>] work_notify
         Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
         EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
         CR2: 0000000034beb000
         ---[ end trace dd564a7bea834ecd ]---
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bdfe0cbd
    • T
      ext4: ratelimit the file system mounted message · e294a537
      Theodore Ts'o 提交于
      The xfstests ext4/305 will mount and unmount the same file system over
      4,000 times, and each one of these will cause a system log message.
      Ratelimit this message since if we are getting more than a few dozen
      of these messages, they probably aren't going to be helpful.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      
      e294a537
  17. 15 8月, 2015 1 次提交