1. 04 3月, 2015 9 次提交
  2. 01 3月, 2015 1 次提交
    • R
      nilfs2: fix potential memory overrun on inode · 957ed60b
      Ryusuke Konishi 提交于
      Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
      have a memory overrun issue:
      
      Each b-tree node of nilfs2 stores a set of key-value pairs and the number
      of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
      a few other "bn_*" members.
      
      Since the value of "bn_nchildren" is used for operations on the key-values
      within the b-tree node, it can cause memory access overrun if a large
      number is incorrectly set to "bn_nchildren".
      
      For instance, nilfs_btree_node_lookup() function determines the range of
      binary search with it, and too large "bn_nchildren" leads
      nilfs_btree_node_get_key() in that function to overrun.
      
      As for intermediate b-tree nodes, this is prevented by a sanity check
      performed when each node is read from a drive, however, no sanity check
      has been done for root nodes stored in inodes.
      
      This patch fixes the issue by adding missing sanity check against b-tree
      root nodes so that it's called when on-memory inodes are read from ifile,
      inode metadata file.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      957ed60b
  3. 24 2月, 2015 2 次提交
  4. 23 2月, 2015 11 次提交
    • D
      xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner 提交于
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5885ebda
    • J
      xfs: Fix quota type in quota structures when reusing quota file · dfcc70a8
      Jan Kara 提交于
      For filesystems without separate project quota inode field in the
      superblock we just reuse project quota file for group quotas (and vice
      versa) if project quota file is allocated and we need group quota file.
      When we reuse the file, quota structures on disk suddenly have wrong
      type stored in d_flags though. Nobody really cares about this (although
      structure type reported to userspace was wrong as well) except
      that after commit 14bf61ff (quota: Switch ->get_dqblk() and
      ->set_dqblk() to use bytes as space units) assertion in
      xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
      was testing without XFS_DEBUG so I didn't notice when submitting the
      above commit).
      
      Fix the problem by properly resetting ddq->d_flags when running quotacheck
      for a quota file.
      
      CC: stable@vger.kernel.org
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dfcc70a8
    • A
      autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 0a280962
      Al Viro 提交于
      X-Coverup: just ask spender
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0a280962
    • A
      procfs: fix race between symlink removals and traversals · 7e0e953b
      Al Viro 提交于
      use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7e0e953b
    • A
      debugfs: leave freeing a symlink body until inode eviction · 0db59e59
      Al Viro 提交于
      As it is, we have debugfs_remove() racing with symlink traversals.
      Supply ->evict_inode() and do freeing there - inode will remain
      pinned until we are done with the symlink body.
      
      And rip the idiocy with checking if dentry is positive right after
      we'd verified debugfs_positive(), which is a stronger check...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0db59e59
    • K
      trylock_super(): replacement for grab_super_passive() · eb6ef3df
      Konstantin Khlebnikov 提交于
      I've noticed significant locking contention in memory reclaimer around
      sb_lock inside grab_super_passive(). Grab_super_passive() is called from
      two places: in icache/dcache shrinkers (function super_cache_scan) and
      from writeback (function __writeback_inodes_wb). Both are required for
      progress in memory allocator.
      
      Grab_super_passive() acquires sb_lock to increment sb->s_count and check
      sb->s_instances. It seems sb->s_umount locked for read is enough here:
      super-block deactivation always runs under sb->s_umount locked for write.
      Protecting super-block itself isn't a problem: in super_cache_scan() sb
      is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
      are still active. Inside writeback super-block comes from inode from bdi
      writeback list under wb->list_lock.
      
      This patch removes locking sb_lock and checks s_instances under s_umount:
      generic_shutdown_super() unlinks it under sb->s_umount locked for write.
      New variant is called trylock_super() and since it only locks semaphore,
      callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
      they're done.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eb6ef3df
    • D
      fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4
      David Howells 提交于
      Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
      rather than d_is_dir() when checking a dir watch and give an error on fake
      directories.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      54f2a2f4
    • D
      Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · ce40fa78
      David Howells 提交于
      Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
      thereof) in cachefiles:
      
       (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
           it doesn't want to deal with automounts in its cache.
      
       (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
           cachefiles.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ce40fa78
    • D
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells 提交于
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
    • D
      VFS: Split DCACHE_FILE_TYPE into regular and special types · 44bdb5e5
      David Howells 提交于
      Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
      files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
      socket files).
      
      d_is_reg() and d_is_special() are added to detect these subtypes and
      d_is_file() is left as the union of the two.
      
      This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
      use d_is_reg(dentry) instead.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      44bdb5e5
    • D
      VFS: Add a fallthrough flag for marking virtual dentries · df1a085a
      David Howells 提交于
      Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
      a virtual dentry that covers another one in a lower layer that should be used
      instead.  This may be recorded on medium if directory integration is stored
      there.
      
      The flag can be set with d_set_fallthru() and tested with d_is_fallthru().
      
      Original-author: Valerie Aurora <vaurora@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      df1a085a
  5. 20 2月, 2015 7 次提交
  6. 19 2月, 2015 10 次提交
    • H
      x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4
      Hector Marco-Gisbert 提交于
      The issue is that the stack for processes is not properly randomized on
      64 bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file
      "fs/binfmt_elf.c":
      
        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                 unsigned int random_variable = 0;
      
                 if ((current->flags & PF_RANDOMIZE) &&
                         !(current->personality & ADDR_NO_RANDOMIZE)) {
                         random_variable = get_random_int() & STACK_RND_MASK;
                         random_variable <<= PAGE_SHIFT;
                 }
                 return PAGE_ALIGN(stack_top) + random_variable;
                 return PAGE_ALIGN(stack_top) - random_variable;
        }
      
      Note that, it declares the "random_variable" variable as "unsigned int".
      Since the result of the shifting operation between STACK_RND_MASK (which
      is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
      
      	  random_variable <<= PAGE_SHIFT;
      
      then the two leftmost bits are dropped when storing the result in the
      "random_variable". This variable shall be at least 34 bits long to hold
      the (22+12) result.
      
      These two dropped bits have an impact on the entropy of process stack.
      Concretely, the total stack entropy is reduced by four: from 2^28 to
      2^30 (One fourth of expected entropy).
      
      This patch restores back the entropy by correcting the types involved
      in the operations in the functions randomize_stack_top() and
      stack_maxrandom_size().
      
      The successful fix can be tested with:
      
        $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
        7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
        7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
        7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
        7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
        ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff,
      rather than always being 7fff.
      Signed-off-by: NHector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: NIsmael Ripoll <iripoll@upv.es>
      [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: CVE-2015-1593
      Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.netSigned-off-by: NBorislav Petkov <bp@suse.de>
      4e7c22d4
    • Y
      ceph: return error for traceless reply race · 4d41cef2
      Yan, Zheng 提交于
      When we receives traceless reply for request that created new inode,
      we re-send a lookup request to MDS get information of the newly created
      inode. (VFS expects FS' callback return an inode in create case)
      This breaks one request into two requests. Other client may modify or
      move to the new inode in the middle.
      
      When the race happens, ceph_handle_notrace_create() unconditionally
      links the dentry for 'create' operation to the inode returned by lookup.
      This may confuse VFS when the inode is a directory (VFS does not allow
      multiple linkages for directory inode).
      
      This patch makes ceph_handle_notrace_create() when it detect a race.
      This event should be rare and it happens only when we talk to old MDS.
      Recent MDS does not send traceless reply for request that creates new
      inode.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      4d41cef2
    • Y
      ceph: fix dentry leaks · 5cba372c
      Yan, Zheng 提交于
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      5cba372c
    • Y
      ceph: re-send requests when MDS enters reconnecting stage · 3de22be6
      Yan, Zheng 提交于
      So that MDS can check if any request is already completed and process
      completed requests in clientreplay stage. When completed requests are
      processed in clientreplay stage, MDS can avoid sending traceless
      replies.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      3de22be6
    • I
    • Y
      ceph: fix atomic_open snapdir · bf91c315
      Yan, Zheng 提交于
      ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
      and creates snapdir inode if it's -ENOENT
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      bf91c315
    • Y
      ceph: properly mark empty directory as complete · 2f92b3d0
      Yan, Zheng 提交于
      ceph_add_cap() calls __check_cap_issue(), which clears directory
      inode' complete flag. so we should set the complete flag for empty
      directory should be set after calling ceph_add_cap().
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      2f92b3d0
    • Y
      client: include kernel version in client metadata · a6a5ce4f
      Yan, Zheng 提交于
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      a6a5ce4f
    • Y
      ceph: provide seperate {inode,file}_operations for snapdir · 38c48b5f
      Yan, Zheng 提交于
      remove all unsupported operations from {inode,file}_operations.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      38c48b5f
    • Y
      ceph: fix request time stamp encoding · 1f041a89
      Yan, Zheng 提交于
      struct timespec uses 'long' to present second and nanosecond. 'long'
      is 64 bits on 64bits machine. ceph MDS expects time stamp to be
      encoded as struct ceph_timespec, which uses 'u32' to present second
      and nanosecond.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      1f041a89