1. 09 3月, 2016 2 次提交
    • J
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara 提交于
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e142d052
    • J
      ext4: pack ioend structure better · 3bd6ad7b
      Jan Kara 提交于
      On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
      Order entries better to avoid this and thus make the structure occupy
      64 instead of 72 bytes for 64-bit architectures.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3bd6ad7b
  2. 23 2月, 2016 6 次提交
  3. 22 2月, 2016 2 次提交
  4. 19 2月, 2016 2 次提交
    • J
      ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara 提交于
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      74dae427
    • J
      ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara 提交于
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine since bh is just a
      temporary storage for mapping information on stack but in some cases it
      can be fully living bh attached to a page. In such case non-atomic
      update of bh->b_state can race with an atomic update which then gets
      lost. Usually when we are mapping bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh and so things work out fine
      but there is one case to especially worry about: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race e.g. with delalloc trying to map that buffer. The result is
      that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
      corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      
      CC: stable@vger.kernel.org
      Reported-by: NNikolay Borisov <kernel@kyup.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ed8ad838
  5. 16 2月, 2016 1 次提交
  6. 12 2月, 2016 6 次提交
    • E
      ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c
      Eryu Guan 提交于
      The "newblock" parameter is not used in convert_initialized_extent(),
      remove it.
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      56263b4c
    • E
      ext4: don't read blocks from disk after extents being swapped · bcff2488
      Eryu Guan 提交于
      I notice ext4/307 fails occasionally on ppc64 host, reporting md5
      checksum mismatch after moving data from original file to donor file.
      
      The reason is that move_extent_per_page() calls __block_write_begin()
      and block_commit_write() to write saved data from original inode blocks
      to donor inode blocks, but __block_write_begin() not only maps buffer
      heads but also reads block content from disk if the size is not block
      size aligned.  At this time the physical block number in mapped buffer
      head is pointing to the donor file not the original file, and that
      results in reading wrong data to page, which get written to disk in
      following block_commit_write call.
      
      This also can be reproduced by the following script on 1k block size ext4
      on x86_64 host:
      
          mnt=/mnt/ext4
          donorfile=$mnt/donor
          testfile=$mnt/testfile
          e4compact=~/xfstests/src/e4compact
      
          rm -f $donorfile $testfile
      
          # reserve space for donor file, written by 0xaa and sync to disk to
          # avoid EBUSY on EXT4_IOC_MOVE_EXT
          xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
      
          # create test file written by 0xbb
          xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
      
          # compute initial md5sum
          md5sum $testfile | tee md5sum.txt
          # drop cache, force e4compact to read data from disk
          echo 3 > /proc/sys/vm/drop_caches
      
          # test defrag
          echo "$testfile" | $e4compact -i -v -f $donorfile
          # check md5sum
          md5sum -c md5sum.txt
      
      Fix it by creating & mapping buffer heads only but not reading blocks
      from disk, because all the data in page is guaranteed to be up-to-date
      in mext_page_mkuptodate().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bcff2488
    • I
      ext4: fix potential integer overflow · 46901760
      Insu Yun 提交于
      Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
      integer overflow could be happened.
      Therefore, need to fix integer overflow sanitization.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NInsu Yun <wuninsu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      46901760
    • H
      ext4: add a line break for proc mb_groups display · 802cf1f9
      Huaitong Han 提交于
      This patch adds a line break for proc mb_groups display.
      Signed-off-by: NHuaitong Han <huaitong.han@intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      802cf1f9
    • A
      ext4: ioctl: fix erroneous return value · fdde368e
      Anton Protopopov 提交于
      The ext4_ioctl_setflags() function which is used in the ioctls
      EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
      EPERM instead of -EPERM in case of error. This bug was introduced by a
      recent commit 9b7365fc.
      
      The following program can be used to illustrate the wrong behavior:
      
          #include <sys/types.h>
          #include <sys/ioctl.h>
          #include <sys/stat.h>
          #include <fcntl.h>
          #include <err.h>
      
          #define FS_IOC_GETFLAGS _IOR('f', 1, long)
          #define FS_IOC_SETFLAGS _IOW('f', 2, long)
          #define FS_IMMUTABLE_FL 0x00000010
      
          int main(void)
          {
              int fd;
              long flags;
      
              fd = open("file", O_RDWR|O_CREAT, 0600);
              if (fd < 0)
                  err(1, "open");
      
              if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
                  err(1, "ioctl: FS_IOC_GETFLAGS");
      
              flags |= FS_IMMUTABLE_FL;
      
              if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                  err(1, "ioctl: FS_IOC_SETFLAGS");
      
              warnx("ioctl returned no error");
      
              return 0;
          }
      
      Running it gives the following result:
      
          $ strace -e ioctl ./test
          ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
          ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
          test: ioctl returned no error
          +++ exited with 0 +++
      
      Running the program on a kernel with the bug fixed gives the proper result:
      
          $ strace -e ioctl ./test
          ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
          ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
          test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
          +++ exited with 1 +++
      Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      fdde368e
    • J
      ext4: fix scheduling in atomic on group checksum failure · 05145bd7
      Jan Kara 提交于
      When block group checksum is wrong, we call ext4_error() while holding
      group spinlock from ext4_init_block_bitmap() or
      ext4_init_inode_bitmap() which results in scheduling while in atomic.
      Fix the issue by calling ext4_error() later after dropping the spinlock.
      
      CC: stable@vger.kernel.org
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      05145bd7
  7. 08 2月, 2016 2 次提交
  8. 23 1月, 2016 2 次提交
    • R
      ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03
      Ross Zwisler 提交于
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5be7a03
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  9. 15 1月, 2016 1 次提交
    • V
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov 提交于
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d097056
  10. 09 1月, 2016 4 次提交
  11. 07 1月, 2016 1 次提交
  12. 31 12月, 2015 1 次提交
  13. 14 12月, 2015 1 次提交
  14. 10 12月, 2015 1 次提交
  15. 09 12月, 2015 2 次提交
    • A
      replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro 提交于
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      and should return ERR_PTR(-ECHILD) if it needs to be called
      in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b255391
    • A
      don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro 提交于
      kmap() in page_follow_link_light() needed to go - allowing to hold
      an arbitrary number of kmaps for long is a great way to deadlocking
      the system.
      
      new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlinks inodes; done for all in-tree cases.  page_follow_link_light()
      instrumented to yell about anything missed.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      21fc61c7
  16. 08 12月, 2015 6 次提交