1. 16 6月, 2011 6 次提交
    • D
      AFS: Use i_generation not i_version for the vnode uniquifier · d6e43f75
      David Howells 提交于
      Store the AFS vnode uniquifier in the i_generation field, not the i_version
      field of the inode struct.  i_version can then be given the AFS data version
      number.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d6e43f75
    • D
      AFS: Set s_id in the superblock to the volume name · 2e41ae22
      David Howells 提交于
      Set s_id in the superblock to the name of the AFS volume that this superblock
      corresponds to.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2e41ae22
    • J
      vfs: Fix data corruption after failed write in __block_write_begin() · f9f07b6c
      Jan Kara 提交于
      I've got a report of a file corruption from fsxlinux on ext3. The important
      operations to the page were:
      mapwrite to a hole
      partial write to the page
      read - found the page zeroed from the end of the normal write
      
      The culprit seems to be that if get_block() fails in __block_write_begin()
      (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
      Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
      of the page is needed and overwrites old data.  In fact, I don't see why we
      should ever need to zero the uptodate bit here - either the page was uptodate
      when we entered __block_write_begin() and it should stay so when we leave it,
      or it was not uptodate and noone had right to set it uptodate during
      __block_write_begin() so it remains !uptodate when we leave as well. So just
      remove clearing of the bit.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f9f07b6c
    • A
      afs: afs_fill_page reads too much, or wrong data · 5e7f2337
      Anton Blanchard 提交于
      afs_fill_page should read the page that is about to be written but
      the current implementation has a number of issues. If we aren't
      extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
      are extending the file we try to read the entire file.
      
      Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
      clamped to i_size.
      
      While here, avoid calling afs_fill_page when we are doing a
      PAGE_CACHE_SIZE write.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5e7f2337
    • A
      VFS: Fix vfsmount overput on simultaneous automount · 8aef1884
      Al Viro 提交于
      [Kudos to dhowells for tracking that crap down]
      
      If two processes attempt to cause automounting on the same mountpoint at the
      same time, the vfsmount holding the mountpoint will be left with one too few
      references on it, causing a BUG when the kernel tries to clean up.
      
      The problem is that lock_mount() drops the caller's reference to the
      mountpoint's vfsmount in the case where it finds something already mounted on
      the mountpoint as it transits to the mounted filesystem and replaces path->mnt
      with the new mountpoint vfsmount.
      
      During a pathwalk, however, we don't take a reference on the vfsmount if it is
      the same as the one in the nameidata struct, but do_add_mount() doesn't know
      this.
      
      The fix is to make sure we have a ref on the vfsmount of the mountpoint before
      calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
      left with an extra ref on the mountpoint vfsmount which needs releasing.
      We can handle that in follow_managed() by not making assumptions about what
      we can and what we cannot get from lookup_mnt() as the current code does.
      
      The callers of follow_managed() expect that reference to path->mnt will be
      grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
      keep track of whether such reference has been grabbed and assume that it'll
      happen in those and only those cases that'll have us return with changed
      path->mnt.  That assumption is almost correct - it breaks in case of
      racing automounts and in even harder to hit race between following a mountpoint
      and a couple of mount --move.  The thing is, we don't need to make that
      assumption at all - after the end of loop in follow_manage() we can check
      if path->mnt has ended up unchanged and do mntput() if needed.
      
      The BUG can be reproduced with the following test program:
      
      	#include <stdio.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	int main(int argc, char **argv)
      	{
      		int pid, ws;
      		struct stat buf;
      		pid = fork();
      		stat(argv[1], &buf);
      		if (pid > 0) wait(&ws);
      		return 0;
      	}
      
      and the following procedure:
      
       (1) Mount an NFS volume that on the server has something else mounted on a
           subdirectory.  For instance, I can mount / from my server:
      
      	mount warthog:/ /mnt -t nfs4 -r
      
           On the server /data has another filesystem mounted on it, so NFS will see
           a change in FSID as it walks down the path, and will mark /mnt/data as
           being a mountpoint.  This will cause the automount code to be triggered.
      
           !!! Do not look inside the mounted fs at this point !!!
      
       (2) Run the above program on a file within the submount to generate two
           simultaneous automount requests:
      
      	/tmp/forkstat /mnt/data/testfile
      
       (3) Unmount the automounted submount:
      
      	umount /mnt/data
      
       (4) Unmount the original mount:
      
      	umount /mnt
      
           At this point the kernel should throw a BUG with something like the
           following:
      
      	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
      
      Note that the bug appears on the root dentry of the original mount, not the
      mountpoint and not the submount because sys_umount() hasn't got to its final
      mntput_no_expire() yet, but this isn't so obvious from the call trace:
      
       [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
       [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
       [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
       [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
       [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
       [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
       [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
       [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
       [<ffffffff811862a8>] mntput+0x3b/0x44
       [<ffffffff81186d87>] release_mounts+0xa2/0xbf
       [<ffffffff811876af>] sys_umount+0x47a/0x4ba
       [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
       [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
      
      as do_umount() is inlined.  However, you can see release_mounts() in there.
      
      Note also that it may be necessary to have multiple CPU cores to be able to
      trigger this bug.
      Tested-by: NJeff Layton <jlayton@redhat.com>
      Tested-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8aef1884
    • T
      fix wrong iput on d_inode introduced by e6bc45d6 · 50338b88
      Török Edwin 提交于
      Git bisection shows that commit e6bc45d6 causes
      BUG_ONs under high I/O load:
      
      kernel BUG at fs/inode.c:1368!
      [ 2862.501007] Call Trace:
      [ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
      [ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
      [ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
      [ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
      [ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
      [ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b
      
      A reliable way to reproduce this bug is:
      Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
      and apt-get remove openjdk-6-jdk.
      
      The buggy part of the patch is this:
      	struct inode *inode = NULL;
      .....
      -               if (nd.last.name[nd.last.len])
      -                       goto slashes;
                      inode = dentry->d_inode;
      -               if (inode)
      -                       ihold(inode);
      +               if (nd.last.name[nd.last.len] || !inode)
      +                       goto slashes;
      +               ihold(inode)
      ...
      	if (inode)
      		iput(inode);	/* truncate the inode here */
      
      If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
      and dentry->d_inode is non-NULL, then this code now does an additional iput on
      the inode, which is wrong.
      
      Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
      
      Reference: https://lkml.org/lkml/2011/6/15/50Reported-by: NNorbert Preining <preining@logic.at>
      Reported-by: NTörök Edwin <edwintorok@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      50338b88
  2. 13 6月, 2011 5 次提交
    • A
      Delay struct net freeing while there's a sysfs instance refering to it · a685e089
      Al Viro 提交于
      	* new refcount in struct net, controlling actual freeing of the memory
      	* new method in kobj_ns_type_operations (->drop_ns())
      	* ->current_ns() semantics change - it's supposed to be followed by
      corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
      the new refcount; net_drop_ns() decrements it and calls net_free() if the
      last reference has been dropped.  Method renamed to ->grab_current_ns().
      	* old net_free() callers call net_drop_ns() instead.
      	* sysfs_exit_ns() is gone, along with a large part of callchain
      leading to it; now that the references stored in ->ns[...] stay valid we
      do not need to hunt them down and replace them with NULL.  That fixes
      problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
      of sb->s_instances abuse.
      
      	Note that struct net *shutdown* logics has not changed - net_cleanup()
      is called exactly when it used to be called.  The only thing postponed by
      having a sysfs instance refering to that struct net is actual freeing of
      memory occupied by struct net.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a685e089
    • A
      afs: fix sget() races, close leak on umount · dde194a6
      Al Viro 提交于
      * set ->s_fs_info in set() callback passed to sget()
      * allocate the thing and set it up enough for afs_test_super() before
      making it visible
      * have it freed in ->kill_sb() (current tree simply leaks it)
      * have ->put_super() leave ->s_fs_info->volume alone; it's too early for
      dropping it; do that from ->kill_sb() after having called kill_anon_super().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dde194a6
    • A
      ubifs: fix sget races · d251ed27
      Al Viro 提交于
      * allocate ubifs_info in ->mount(), fill it enough for sb_test() and
      set ->s_fs_info to it in set() callback passed to sget().
      * do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
      done kill_anon_super().
      * don't free it in ubifs_fill_super() either - deactivate_locked_super()
      done by caller when ubifs_fill_super() returns an error will take care
      of that sucker.
      * get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
      we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
      only ubifs_info.  Which it will find in ->s_fs_info just fine, no need to
      reassign anything...
      
      As the result, sb_test() becomes safe to apply to all superblocks that
      can be found by sget() (and a kludge with temporary use of ->s_fs_info
      to store a pointer to very different structure goes away).
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d251ed27
    • A
      ubifs: split allocation of ubifs_info into a separate function · b1c27ab3
      Al Viro 提交于
      preparation to ubifs sget() race fixes
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b1c27ab3
    • A
      fix leak in proc_set_super() · ff78fca2
      Al Viro 提交于
      set_anon_super() can fail...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ff78fca2
  3. 10 6月, 2011 1 次提交
    • M
      exec: delay address limit change until point of no return · dac853ae
      Mathias Krause 提交于
      Unconditionally changing the address limit to USER_DS and not restoring
      it to its old value in the error path is wrong because it prevents us
      using kernel memory on repeated calls to this function.  This, in fact,
      breaks the fallback of hard coded paths to the init program from being
      ever successful if the first candidate fails to load.
      
      With this patch applied switching to USER_DS is delayed until the point
      of no return is reached which makes it possible to have a multi-arch
      rootfs with one arch specific init binary for each of the (hard coded)
      probed paths.
      
      Since the address limit is already set to USER_DS when start_thread()
      will be invoked, this redundancy can be safely removed.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dac853ae
  4. 09 6月, 2011 1 次提交
  5. 07 6月, 2011 4 次提交
  6. 06 6月, 2011 1 次提交
  7. 04 6月, 2011 13 次提交
  8. 03 6月, 2011 7 次提交
  9. 01 6月, 2011 2 次提交
    • A
      UBIFS: fix recovery broken by the previous recovery fix · da8b94ea
      Artem Bityutskiy 提交于
      Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
      (UBIFS: fix extremely rare mount failure) broke recovery. This commit make
      UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
      for the GC head. And this does not work for non-GC heads. For example, if
      suppose we have min. I/O units A and B, and A contains a valid node X, which
      was fsynced, and then a group of nodes Y which spans the rest of A and B. In
      this case we'll drop not only Y, but also X, which is obviously incorrect.
      
      This patch fixes the issue and additionally makes recovery to drop last min.
      I/O unit only for the GC head, and leave things as they have been for ages for
      the other heads - this is safer.
      Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      da8b94ea
    • A
      UBIFS: amend ubifs_recover_leb interface · efcfde54
      Artem Bityutskiy 提交于
      Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
      whether the nodes are grouped in the LEB to recover, pass the journal head
      number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.
      
      This patch is a preparation to a further fix where we'll need to know the
      journal head number for other purposes.
      Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      efcfde54