1. 16 6月, 2015 2 次提交
  2. 12 6月, 2015 1 次提交
  3. 11 6月, 2015 4 次提交
  4. 02 6月, 2015 4 次提交
    • J
      NFS: drop unneeded goto · 13985b1f
      Julia Lawall 提交于
      Delete jump to a label on the next line, when that label is not
      used elsewhere.
      
      A simplified version of the semantic patch that makes this change is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier l;
      @@
      
      -if (...) goto l;
      -l:
      // </smpl>
      
      Also drop the unnecessary ret variable.
      Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      13985b1f
    • C
      NFS: Fix size of NFSACL SETACL operations · d683cc49
      Chuck Lever 提交于
      When encoding the NFSACL SETACL operation, reserve just the estimated
      size of the ACL rather than a fixed maximum. This eliminates needless
      zero padding on the wire that the server ignores.
      
      Fixes: ee5dc773 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d683cc49
    • N
      NFS: report more appropriate block size for directories. · 7ef5ca4f
      NeilBrown 提交于
      In glibc 2.21 (and several previous), a call to opendir() will
      result in a 32K (BUFSIZ*4) buffer being allocated and passed to
      getdents.
      
      However a call to fdopendir() results in an 'fstat' request to
      determine block size and a matching buffer allocated for subsequent
      use with getdents.  This will typically be 1M.
      
      The first getdents call on an NFS directory will always use
      READDIR_PLUS (or NFSv4 equivalent) if available.  Subsequent getdents
      calls only use this more expensive version if some 'stat' requests are
      made between the getdents calls.
      
      For this reason it is good to keep at least that first getdents call
      relatively short.  When fdopendir() and readdir() is used on a large
      directory, it takes approximately 32 times as long to complete as
      using "opendir".  Current versions of 'find' use fdopendir() and
      demonstrate this slowness.
      
      'stat' on a directory currently returns the 'wsize'.  This number has
      no meaning on directories.
      Actual READDIR requests are limited to ->dtsize, which itself is
      capped at 4 pages, coincidently the same as BUFSIZ*4.
      So this is a meaningful number to use as the blocksize on directories,
      and has the effect of making 'find' on large directories go a lot
      faster.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      7ef5ca4f
    • T
      NFSv4: Always drain the slot table before re-establishing the lease · 5cae02f4
      Trond Myklebust 提交于
      While the NFSv4.1 code has always drained the slot tables in order to stop
      non-recovery related RPC calls when doing lease recovery, the NFSv4 code
      did not.
      The reason for the difference in behaviour is that NFSv4 does not have
      session state, and so RPC calls can in theory proceed while recovery is
      happening. In practice, however, anything I/O or state related needs to
      wait until recovery is over.
      
      This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
      we can simplify the state recovery code by assuming that we do not have to
      deal with races between recovery and ordinary I/O.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      5cae02f4
  5. 01 6月, 2015 1 次提交
    • O
      fixing infinite OPEN loop in 4.0 stateid recovery · e8d975e7
      Olga Kornievskaia 提交于
      Problem: When an operation like WRITE receives a BAD_STATEID, even though
      recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
      the open state, because of clearing delegation state for the associated
      inode, nfs_inode_find_state_and_recover() gets called and it makes the
      same state with RECLAIM_NOGRACE flag again. As a results, when we restart
      looking over the open states, we end up in the infinite loop instead of
      breaking out in the next test of state flags.
      
      Solution: unset the RECLAIM_NOGRACE set because of
      calling of nfs_inode_find_state_and_recover() after returning from calling
      recover_open() function.
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      e8d975e7
  6. 29 5月, 2015 12 次提交
    • A
      d_walk() might skip too much · 2159184e
      Al Viro 提交于
      when we find that a child has died while we'd been trying to ascend,
      we should go into the first live sibling itself, rather than its sibling.
      
      Off-by-one in question had been introduced in "deal with deadlock in
      d_walk()" and the fix needs to be backported to all branches this one
      has been backported to.
      
      Cc: stable@vger.kernel.org # 3.2 and later
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2159184e
    • B
      omfs: fix potential integer overflow in allocator · 5a6b2b36
      Bob Copeland 提交于
      Both 'i' and 'bits_per_entry' are signed integers but the result is a
      u64 block number.  Cast i to u64 to avoid truncation on 32-bit targets.
      
      Found by Coverity (CID 200679).
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a6b2b36
    • B
      omfs: fix sign confusion for bitmap loop counter · c0345ee5
      Bob Copeland 提交于
      The count variable is used to iterate down to (below) zero from the size
      of the bitmap and handle the one-filling the remainder of the last
      partial bitmap block.  The loop conditional expects count to be signed
      in order to detect when the final block is processed, after which count
      goes negative.
      
      Unfortunately, a recent change made this unsigned along with some other
      related fields.  The result of is this is that during mount,
      omfs_get_imap will overrun the bitmap array and corrupt memory unless
      number of blocks happens to be a multiple of 8 * blocksize.
      
      Fix by changing count back to signed: it is guaranteed to fit in an s32
      without overflow due to an enforced limit on the number of blocks in the
      filesystem.
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0345ee5
    • B
      omfs: set error return when d_make_root() fails · 3a281f94
      Bob Copeland 提交于
      A static checker found the following issue in the error path for
      omfs_fill_super:
      
          fs/omfs/inode.c:552 omfs_fill_super()
          warn: missing error code here? 'd_make_root()' failed. 'ret' = '0'
      
      Fix by returning -ENOMEM in this case.
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3a281f94
    • S
      fs, omfs: add NULL terminator in the end up the token list · dcbff39d
      Sasha Levin 提交于
      match_token() expects a NULL terminator at the end of the token list so
      that it would know where to stop.  Not having one causes it to overrun
      to invalid memory.
      
      In practice, passing a mount option that omfs didn't recognize would
      sometimes panic the system.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcbff39d
    • A
      fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings · 2b1d3ae9
      Andrew Morton 提交于
      load_elf_binary() returns `retval', not `error'.
      
      Fixes: a87938b2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
      Reported-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Michael Davidson <md@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2b1d3ae9
    • B
      xfs: fix broken i_nlink accounting for whiteout tmpfile inode · 22419ac9
      Brian Foster 提交于
      XFS uses the internal tmpfile() infrastructure for the whiteout inode
      used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
      the inode, drops di_nlink, adds the inode to the agi unlinked list,
      calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
      and then finishes the common inode setup (e.g., clear I_NEW and unlock).
      
      The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
      pulled up out of that function as part of the following commit to
      resolve a deadlock issue:
      
      	330033d6 xfs: fix tmpfile/selinux deadlock and initialize security
      
      As a result, callers of xfs_create_tmpfile() are responsible for either
      calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
      tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
      becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
      it back into the source dentry and calls xfs_bumplink().
      
      Update the assert in xfs_rename() to help detect this problem in the
      future and update xfs_rename_alloc_whiteout() to decrement the link
      count as part of the manual tmpfile inode setup.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      22419ac9
    • D
      xfs: xfs_iozero can return positive errno · cddc1162
      Dave Chinner 提交于
      It was missed when we converted everything in XFs to use negative error
      numbers, so fix it now. Bug introduced in 3.17 by commit 2451337d ("xfs: global
      error sign conversion"), and should go back to stable kernels.
      
      Thanks to Brian Foster for noticing it.
      
      cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      cddc1162
    • D
      xfs: xfs_attr_inactive leaves inconsistent attr fork state behind · 6dfe5a04
      Dave Chinner 提交于
      xfs_attr_inactive() is supposed to clean up the attribute fork when
      the inode is being freed. While it removes attribute fork extents,
      it completely ignores attributes in local format, which means that
      there can still be active attributes on the inode after
      xfs_attr_inactive() has run.
      
      This leads to problems with concurrent inode writeback - the in-core
      inode attribute fork is removed without locking on the assumption
      that nothing will be attempting to access the attribute fork after a
      call to xfs_attr_inactive() because it isn't supposed to exist on
      disk any more.
      
      To fix this, make xfs_attr_inactive() completely remove all traces
      of the attribute fork from the inode, regardless of it's state.
      Further, also remove the in-core attribute fork structure safely so
      that there is nothing further that needs to be done by callers to
      clean up the attribute fork. This means we can remove the in-core
      and on-disk attribute forks atomically.
      
      Also, on error simply remove the in-memory attribute fork. There's
      nothing that can be done with it once we have failed to remove the
      on-disk attribute fork, so we may as well just blow it away here
      anyway.
      
      cc: <stable@vger.kernel.org> # 3.12 to 4.0
      Reported-by: NWaiman Long <waiman.long@hp.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      6dfe5a04
    • D
      xfs: extent size hints can round up extents past MAXEXTLEN · 6dea405e
      Dave Chinner 提交于
      This results in BMBT corruption, as seen by this test:
      
      # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
      ....
      # mount /dev/vdc /mnt/scratch
      # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo
      
      which results in this failure on a debug kernel:
      
      XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
      ....
      Call Trace:
       [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
       [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
       [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
       [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
       [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
       [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
       [<ffffffff81ddcc29>] ? down_write+0x29/0x60
       [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
       [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
       [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
       [<ffffffff811cd680>] vfs_fallocate+0x140/0x260
       [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
       [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17
      
      The tracepoint that indicates the extent that triggered the assert
      failure is:
      
      xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1
      
      Clearly indicating that the extent length is greater than MAXEXTLEN,
      which is 2097151. A prior trace point shows the allocation was an
      exact size match and that a length greater than MAXEXTLEN was asked
      for:
      
      xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
      					    ^^^^^^^        ^^^^^^^
      
      We don't see this problem with extent size hints through the IO path
      because we can't do single IOs large enough to trigger MAXEXTLEN
      allocation. fallocate(), OTOH, is not limited in it's allocation
      sizes and so needs help here.
      
      The issue is that the extent size hint alignment is rounding up the
      extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
      into account extent size hints when calculating the maximum extent
      length to allocate. xfs_bmapi_reserve_delalloc() is already doing
      this, but direct extent allocation is not.
      
      Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
      wrong, and it works only because delayed allocation extents are not
      limited in size to MAXEXTLEN in the in-core extent tree. hence this
      calculation does not work for direct allocation, and the delalloc
      code needs fixing. This may, in fact be the underlying bug that
      occassionally causes transaction overruns in delayed allocation
      extent conversion, so now we know it's wrong we should fix it, too.
      Many thanks to Brian Foster for finding this problem during review
      of this patch.
      
      Hence the fix, after much code reading, is to allow
      xfs_bmap_extsize_align() to align partial extents when full
      alignment would extend the alignment past MAXEXTLEN. We can safely
      do this because all callers have higher layer allocation loops that
      already handle short allocations, and so will simply run another
      allocation to cover the remainder of the requested allocation range
      that we ignored during alignment. The advantage of this approach is
      that it also removes the need for callers to do anything other than
      limit their requests to MAXEXTLEN - they don't really need to be
      aware of extent size hints at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6dea405e
    • D
      xfs: inode and free block counters need to use __percpu_counter_compare · 8c1903d3
      Dave Chinner 提交于
      Because the counters use a custom batch size, the comparison
      functions need to be aware of that batch size otherwise the
      comparison does not work correctly. This leads to ASSERT failures
      on generic/027 like this:
      
       XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
       ------------[ cut here ]------------
      ....
       Call Trace:
        [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
        [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
        [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
        [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
        [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
        [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
        [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
        [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
        [<ffffffff811f3958>] evict+0xb8/0x190
        [<ffffffff811f433b>] iput+0x18b/0x1f0
        [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
        [<ffffffff811d548a>] ? filp_close+0x5a/0x80
        [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
        [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71
      
      This is a regression introduced by commit 501ab323 ("xfs: use generic
      percpu counters for inode counter").
      
      This patch fixes the same problem for both the inode counter and the
      free block counter in the superblocks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8c1903d3
    • G
      xfs: use percpu_counter_read_positive for mp->m_icount · 74f9ce1c
      George Wang 提交于
      Function percpu_counter_read just return the current counter, which can be
      negative. This will cause the checking of "allocated inode
      counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
      solve this problem, and be consistent with the purpose to introduce percpu
      mechanism to xfs.
      Signed-off-by: NGeorge Wang <xuw2015@gmail.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      74f9ce1c
  7. 21 5月, 2015 6 次提交
  8. 20 5月, 2015 3 次提交
    • S
      [cifs] fix null pointer check · 1dc92c45
      Steve French 提交于
      Dan Carpenter pointed out an inconsistent null pointer check
      in smb2_hdr_assemble that was pointed out by static checker.
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Reviewed-by: NSachin Prabhu <sprabhu@redhat.com>
      CC: Dan Carpenter <dan.carpenter@oracle.com>w
      1dc92c45
    • F
      Btrfs: fix racy system chunk allocation when setting block group ro · a9629596
      Filipe Manana 提交于
      If while setting a block group read-only we end up allocating a system
      chunk, through check_system_chunk(), we were not doing it while holding
      the chunk mutex which is a problem if a concurrent chunk allocation is
      happening, through do_chunk_alloc(), as it means both block groups can
      end up using the same logical addresses and physical regions in the
      device(s). So make sure we hold the chunk mutex.
      
      Cc: stable@vger.kernel.org  # 4.0+
      Fixes: 2f081088 ("btrfs: delete chunk allocation attemp when
                            setting block group ro")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a9629596
    • M
      btrfs: clear 'ret' in btrfs_check_shared() loop · 2c2ed5aa
      Mark Fasheh 提交于
      btrfs_check_shared() is leaking a return value of '1' from
      find_parent_nodes(). As a result, callers (in this case, extent_fiemap())
      are told extents are shared when they are not. This in turn broke fiemap on
      btrfs for kernels v3.18 and up.
      
      The fix is simple - we just have to clear 'ret' after we are done processing
      the results of find_parent_nodes().
      
      It wasn't clear to me at first what was happening with return values in
      btrfs_check_shared() and find_parent_nodes() - thanks to Josef for the help
      on irc. I added documentation to both functions to make things more clear
      for the next hacker who might come across them.
      
      If we could queue this up for -stable too that would be great.
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2c2ed5aa
  9. 19 5月, 2015 1 次提交
    • M
      ovl: mount read-only if workdir can't be created · cc6f67bc
      Miklos Szeredi 提交于
      OpenWRT folks reported that overlayfs fails to mount if upper fs is full,
      because workdir can't be created.  Wordir creation can fail for various
      other reasons too.
      
      There's no reason that the mount itself should fail, overlayfs can work
      fine without a workdir, as long as the overlay isn't modified.
      
      So mount it read-only and don't allow remounting read-write.
      
      Add a couple of WARN_ON()s for the impossible case of workdir being used
      despite being read-only.
      
      Reported-by: Bastian Bittorf <bittorf@bluebottle.com> 
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: <stable@vger.kernel.org> # v3.18+
      cc6f67bc
  10. 15 5月, 2015 6 次提交
    • T
      ext4: fix an ext3 collapse range regression in xfstests · b9576fc3
      Theodore Ts'o 提交于
      The xfstests test suite assumes that an attempt to collapse range on
      the range (0, 1) will return EOPNOTSUPP if the file system does not
      support collapse range.  Commit 280227a7: "ext4: move check under
      lock scope to close a race" broke this, and this caused xfstests to
      fail when run when testing file systems that did not have the extents
      feature enabled.
      Reported-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b9576fc3
    • V
      kernfs: do not account ino_ida allocations to memcg · 499611ed
      Vladimir Davydov 提交于
      root->ino_ida is used for kernfs inode number allocations. Since IDA has
      a layered structure, different IDs can reside on the same layer, which
      is currently accounted to some memory cgroup. The problem is that each
      kmem cache of a memory cgroup has its own directory on sysfs (under
      /sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a
      directory or any file in it gets allocated from a layer accounted to the
      cgroup which the cache is created for, the cgroup will get pinned for
      good, because one has to free all kmem allocations accounted to a cgroup
      in order to release it and destroy all its kmem caches. That said we
      must not account layers of ino_ida to any memory cgroup.
      
      Since per net init operations may create new sysfs entries directly
      (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
      per each namespace, which, in turn, creates new sysfs entries), an easy
      way to reproduce this issue is by creating network namespace(s) from
      inside a kmem-active memory cgroup.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      499611ed
    • D
      jbd2: fix r_count overflows leading to buffer overflow in journal recovery · e531d0bc
      Darrick J. Wong 提交于
      The journal revoke block recovery code does not check r_count for
      sanity, which means that an evil value of r_count could result in
      the kernel reading off the end of the revoke table and into whatever
      garbage lies beyond.  This could crash the kernel, so fix that.
      
      However, in testing this fix, I discovered that the code to write
      out the revoke tables also was not correctly checking to see if the
      block was full -- the current offset check is fine so long as the
      revoke table space size is a multiple of the record size, but this
      is not true when either journal_csum_v[23] are set.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      e531d0bc
    • E
      ext4: check for zero length extent explicitly · 2f974865
      Eryu Guan 提交于
      The following commit introduced a bug when checking for zero length extent
      
      5946d089 ext4: check for overlapping extents in ext4_valid_extent_entries()
      
      Zero length extent could pass the check if lblock is zero.
      
      Adding the explicit check for zero length back.
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      2f974865
    • L
      ext4: fix NULL pointer dereference when journal restart fails · 9d506594
      Lukas Czerner 提交于
      Currently when journal restart fails, we'll have the h_transaction of
      the handle set to NULL to indicate that the handle has been effectively
      aborted. We handle this situation quietly in the jbd2_journal_stop() and just
      free the handle and exit because everything else has been done before we
      attempted (and failed) to restart the journal.
      
      Unfortunately there are a number of problems with that approach
      introduced with commit
      
      41a5b913 "jbd2: invalidate handle if jbd2_journal_restart()
      fails"
      
      First of all in ext4 jbd2_journal_stop() will be called through
      __ext4_journal_stop() where we would try to get a hold of the superblock
      by dereferencing h_transaction which in this case would lead to NULL
      pointer dereference and crash.
      
      In addition we're going to free the handle regardless of the refcount
      which is bad as well, because others up the call chain will still
      reference the handle so we might potentially reference already freed
      memory.
      
      Moreover it's expected that we'll get aborted handle as well as detached
      handle in some of the journalling function as the error propagates up
      the stack, so it's unnecessary to call WARN_ON every time we get
      detached handle.
      
      And finally we might leak some memory by forgetting to free reserved
      handle in jbd2_journal_stop() in the case where handle was detached from
      the transaction (h_transaction is NULL).
      
      Fix the NULL pointer dereference in __ext4_journal_stop() by just
      calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
      the potential memory leak in jbd2_journal_stop() and use proper
      handle refcounting before we attempt to free it to avoid use-after-free
      issues.
      
      And finally remove all WARN_ON(!transaction) from the code so that we do
      not get random traces when something goes wrong because when journal
      restart fails we will get to some of those functions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      9d506594
    • T
      ext4: remove unused function prototype from ext4.h · 92c82639
      Theodore Ts'o 提交于
      The ext4_extent_tree_init() function hasn't been in the ext4 code for
      a long time ago, except in an unused function prototype in ext4.h
      
      Google-Bug-Id: 4530137
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      92c82639