1. 20 Apr, 2010: 1 commit
  2. 17 Apr, 2010: 1 commit
  3. 03 Apr, 2010: 4 commits
  4. 23 Mar, 2010: 1 commit
    • nfsd: don't break lease while servicing a COMMIT · 91885258
      Jeff Layton committed
      This is the second attempt to fix the problem whereby a COMMIT call
      causes a lease break and triggers a possible deadlock.
      
      The problem is that nfsd attempts to break a lease on a COMMIT call.
      This triggers a delegation recall if the lease is held for a delegation.
      If the client holding the delegation is the same one issuing the
      COMMIT, then it can't return that delegation until the COMMIT is
      complete. But nfsd won't complete the COMMIT until the delegation is
      returned. The client and server are essentially
      deadlocked until the state is marked bad (due to the client not
      responding on the callback channel).
      
      The first patch attempted to deal with this by eliminating the open of
      the file altogether and having nfsd_commit simply pass a NULL file
      pointer to vfs_fsync_range(). That would conflict with some work in
      progress by Christoph Hellwig to clean up the fsync interface, so this
      patch takes a different approach.
      
      This patch declares a new NFSD_MAY_NOT_BREAK_LEASE access flag that
      tells nfsd_open not to break any leases when opening the file, and has
      nfsd_commit set that flag on its nfsd_open call.
      
      For now, this patch leaves nfsd_commit opening the file with write
      access since I'm not clear on what sort of access would be more
      appropriate.

      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  5. 17 Mar, 2010: 1 commit
  6. 08 Mar, 2010: 15 commits
  7. 07 Mar, 2010: 17 commits
    • nfsd4: document lease/grace-period limits · e7b184f1
      J. Bruce Fields committed
      The current documentation here is out of date, and not quite right.
      
      (Future work: some user documentation would be useful.)

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • nfsd4: allow setting grace period time · efc4bb4f
      J. Bruce Fields committed
      Allow explicit configuration of the grace period time as well as the
      lease period time.

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • nfsd4: reshuffle lease-setting code to allow reuse · f0135740
      J. Bruce Fields committed
      We'll soon allow setting the grace period, so we'll want to share this
      code.

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • nfsd4: remove unnecessary lease-setting function · f958a132
      J. Bruce Fields committed
      This is another layer of indirection that doesn't really buy us
      anything.

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • nfsd4: simplify lease/grace interaction · e46b498c
      J. Bruce Fields committed
      The original code here assumed we'd allow the user to change the lease
      any time, but only allow the change to take effect on restart.  Since
      then we modified the code to allow setting the lease only when the server
      is down.  Update the rest of the code to reflect that fact, clarify
      variable names, and add documentation.
      
      Also, the code insisted that the grace period always be the longer of
      the old and new lease periods, but that's overly conservative--as long
      as it lasts at least the old lease period, old clients should still know
      to recover in time.

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • nfsd4: simplify references to nfsd4 lease time · cf07d2ea
      J. Bruce Fields committed
      Instead of accessing the lease time directly, some users call
      nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as
      nfs4_lease_time().  Neither layer of indirection serves any purpose.

      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • coredump: suppress uid comparison test if core output files are pipes · 76595f79
      Neil Horman committed
      Modify uid check in do_coredump so as to not apply it in the case of
      pipes.
      
      This just got noticed in testing.  The end of do_coredump validates the
      uid of the inode for the created file against the uid of the crashing
      process to ensure that no one can pre-create a core file with different
      ownership and grab the information contained in the core when they
      shouldn't be able to.  This causes failures when using pipes for core
      dumps if the crashing process is not root, since root owns the pipe
      when it is created.
      
      The fix is simple.  Since the check for matching uids isn't relevant for
      pipes (a process can't create a pipe that the usermodehelper code will open
      anyway), we can just skip it in the event ispipe is non-zero.
      
      Reverts a pipe-affecting change which was accidentally made in
      
      : commit c46f739d
      : Author:     Ingo Molnar <mingo@elte.hu>
      : AuthorDate: Wed Nov 28 13:59:18 2007 +0100
      : Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
      : CommitDate: Wed Nov 28 10:58:01 2007 -0800
      :
      :     vfs: coredumping fix

      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: set ->group_exit_code for other CLONE_VM tasks too · 5c99cbf4
      Oleg Nesterov committed
      User-visible change.
      
      do_coredump() kills all threads which share the same ->mm but only the
      coredumping process gets the proper exit_code.  Other tasks which share
      the same ->mm die "silently" and return status == 0 to parent.
      
      This is historical behaviour, not actually a bug.  But I think Frank
      Heckenbach rightly dislikes the current behaviour.  Simple test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/wait.h>
      
      	int main(void)
      	{
      		int stat;
      
      		if (!fork()) {
      			if (!vfork())
      				kill(getpid(), SIGQUIT);
      		}
      
      		wait(&stat);
      		printf("stat=%x\n", stat);
      		return 0;
      	}
      
      Before this patch it prints "stat=0" despite the fact the child was killed
      by SIGQUIT.  After this patch the output is "stat=3" which obviously makes
      more sense.
      
      Even with this patch, only the task which originates the coredumping gets
      "|= 0x80" if the core was actually dumped, but at least the coredumping
      signal is visible to do_wait/etc.

      Reported-by: Frank Heckenbach <f.heckenbach@fh-soft.de>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: pass mm->flags as a coredump parameter for consistency · 30736a4d
      Masami Hiramatsu committed
      Pass mm->flags as a coredump parameter for consistency.
      
       ---
              if (mm->core_state || !get_dumpable(mm)) {  <- (1)
                      up_write(&mm->mmap_sem);
                      put_cred(cred);
                      goto fail;
              }

      [...]
              if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */ <- (2)
                      flag = O_EXCL;          /* Stop rewrite attacks */
                      cred->fsuid = 0;        /* Dump root private */
              }
       ---
      
      Since the dumpable bits are not protected by a lock, they can change
      between (1) and (2).
      
      To solve this issue, this patch copies mm->flags to
      coredump_params.mm_flags at the beginning of do_coredump() and uses it
      instead of get_dumpable() while dumping core.
      
      This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
      dump_filter bits in mm->flags.
      
      [akpm@linux-foundation.org: fix merge]

      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: add extended numbering support · 8d9032bb
      Daisuke HATAYAMA committed
      The current ELF dumper implementation can produce broken corefiles if
      program headers exceed 65535.  This number is determined by the number of
      vmas the process has.  In particular, some extreme programs may use
      more than 65535 vmas.  (If you google max_map_count, you can find some
      users facing this problem.)  Such programs can never generate correct
      coredumps.
      
      This patch implements ``extended numbering'', which uses the sh_info
      field of the first section header instead of the e_phnum field in order
      to represent up to 4294967295 vmas.
      
      This is supported by
      AMD64-ABI(http://www.x86-64.org/documentation.html) and
      Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
      Of course, we are preparing patches for gdb and binutils.

      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: make offset calculation process and writing process explicit · 93eb211e
      Daisuke HATAYAMA committed
      With the next patch, elf_core_dump() and elf_fdpic_core_dump() will
      support extended numbering and so will, in a special case, produce
      corefiles with a section header table.
      
      The problem is writing the file offset of the section header table into
      the e_shoff field of the ELF header.  The ELF header is positioned at the
      beginning of the corefile, while the section header table is at the end.
      So, we need to take one of the following approaches:
      
       1. Seek backward to retry writing operation for ELF header
          after writing process for a whole part
      
       2. Make offset calculation process and writing process
          totally sequential
      
      Option 1 is not always possible: one cannot assume that the file system
      supports seeking.  Consider the no_llseek case.

      Therefore, this patch adopts option 2.

      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: replace ELF_CORE_EXTRA_* macros by functions · 1fcccbac
      Daisuke HATAYAMA committed
      elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
      macros to hide _multiline_ logic in functions.  This patch removes the
      #ifdef and replaces ELF_CORE_EXTRA_* with corresponding functions.  For
      architectures not implementing ELF_CORE_EXTRA_*, we use weak functions in
      order to reduce the scope of modification.
      
      This cleanup is for my next patches, but I think this cleanup itself is
      worth doing regardless of my final purpose.

      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: move dump_write() and dump_seek() into a header file · 088e7af7
      Daisuke HATAYAMA committed
      My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
      them into other newly created *.c files.  Then each file will contain
      dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
      same.  So, this patch moves them into a header file with dump_seek().
      Also, the patch deletes the confusing DUMP_WRITE macros in each file.

      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: unify dump_seek() implementations for each binfmt_*.c · 05f47fda
      Daisuke HATAYAMA committed
      The current ELF dumper can produce broken corefiles if program headers
      exceed 65535.  In particular, programs in 64-bit environments often
      demand more than 65535 mmaps.  If you google max_map_count, then you can
      find many users facing this problem.
      
      Solaris has already dealt with this issue, and other OSes have also
      adopted the same method as Solaris.  Currently, Sun's documentation and
      the AMD64 ABI describe this extension, which they call Extended
      Numbering.  See References for further information.
      
      I believe that linux kernel should adopt the same way as they did, so I've
      written this patch.
      
      I am also preparing patches for GDB and binutils.
      
      How to fix
      ==========
      
      In the new dumping process, there are two cases according to whether or
      not the number of program headers is equal to or more than 65535.
      
       - if less than 65535, the produced corefile format is exactly the same
         as the ordinary one.
      
       - if equal to or more than 65535, then e_phnum field is set to newly
         introduced constant PN_XNUM(0xffff) and the actual number of program
         headers is set to sh_info field of the section header at index 0.
      
      Compatibility Concern
      =====================
      
       * As already mentioned above, Sun and AMD64 have already adopted
         this.  See References.
      
       * There are four combinations according to whether the kernel and the
         userland tools are modified or not.  The following table summarizes
         each combination.
      
                       |   Original Kernel   |   Modified Kernel
          phdr count   | < 65535  | >= 65535 | < 65535  | >= 65535
        ------------------------------------------------------------
        Original Tools |    OK    |  broken  |    OK    | broken (#)
        ------------------------------------------------------------
        Modified Tools |    OK    |  broken  |    OK    |    OK
        ------------------------------------------------------------
      
        Note that there is no case where `OK' changes to `broken'.

        (#) Although this case remains broken, O-M (original tools, modified
        kernel) behaves better than O-O (original tools, original kernel).
        That is, while in the O-O case the e_phnum field would be extremely
        small due to integer overflow, in the O-M case it is set to
        PN_XNUM (0xFFFF), which is much closer to the actual correct value.
      
      Test Program
      ============
      
      Here is a test program, mkmmaps.c, that is useful for producing a
      corefile with many mmaps. To use it, take the following
      steps:
      
      $ ulimit -c unlimited
      $ sysctl vm.max_map_count=70000 # default 65530 is too small
      $ sysctl fs.file-max=70000
      $ mkmmaps 65535
      
      Then, the program will abort and a corefile will be generated.
      
      If it fails, there are two cases according to the error message
      displayed.

       * ``out of memory'' means vm.max_map_count is still too small

       * ``too many open files'' means fs.file-max is still too small

      In either case, change the value to a larger one and retry.
      
      mkmmaps.c
      ==
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/mman.h>
      #include <fcntl.h>
      #include <unistd.h>
      int main(int argc, char **argv)
      {
      	int maps_num;
      	if (argc < 2) {
      		fprintf(stderr, "mkmmaps [number of maps to be created]\n");
      		exit(1);
      	}
      	/* sscanf() returns the number of items converted; require one. */
      	if (sscanf(argv[1], "%d", &maps_num) != 1) {
      		fprintf(stderr, "%s is not a number\n", argv[1]);
      		exit(2);
      	}
      	if (maps_num < 0) {
      		fprintf(stderr, "%d is invalid\n", maps_num);
      		exit(3);
      	}
      	/* Create maps_num separate 1-byte shared anonymous mappings. */
      	for (; maps_num > 0; --maps_num) {
      		if (MAP_FAILED == mmap(NULL, (size_t) 1, PROT_READ,
      					MAP_SHARED | MAP_ANONYMOUS, -1,
      					(off_t) 0)) {
      			perror("mmap");
      			exit(4);
      		}
      	}
      	/* Count the mappings first: nothing after abort() ever runs. */
      	{
      		char buffer[128];
      		snprintf(buffer, sizeof(buffer), "wc -l /proc/%d/maps",
      			 (int) getpid());
      		system(buffer);
      	}
      	abort();
      }
      
      Tested on i386, ia64 and um/sys-i386.
      Built on sh4 (which covers fs/binfmt_elf_fdpic.c)
      
      References
      ==========
      
       - Sun Microsystems: Linker and Libraries.
         Part No: 817-1984-17, September 2008.
         URL: http://docs.sun.com/app/docs/doc/817-1984

       - System V ABI AMD64 Architecture Processor Supplement
         Draft Version 0.99, May 11, 2009.
         URL: http://www.x86-64.org/
      
      This patch:
      
      There are three different definitions of dump_seek() in binfmt_aout.c,
      binfmt_elf.c and binfmt_elf_fdpic.c, respectively.  Past fixes to
      dump_seek() have been applied only to the one in binfmt_elf.c.
      
      My next patch will move dump_seek() into a header file in order to share
      the same implementations of dump_write() and dump_seek().  As the first
      step, this patch unifies these three definitions of dump_seek() by applying
      the past commits that have been applied only to binfmt_elf.c.
      
      Specifically, the modification made here is part of the following commits:
      
        * d025c9db
        * 7f14daa1
      
      This patch does not change the format of corefiles.

      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: warn on non-existing proc entries · 12bac0d9
      Alexey Dobriyan committed
      * warn if an entry is created in a non-existent directory
      * warn if an entry is removed from a non-existent directory
      * warn if a non-existent proc entry is removed

      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: do translation + unlink atomically at remove_proc_entry() · e17a5765
      Alexey Dobriyan committed
      remove_proc_entry() does
      
      	lock
      	lookup parent
      	unlock
      	lock
      	unlink proc entry from lists
      	unlock
      
      which can be made a bit more correct by doing parent translation + unlink
      without dropping the lock.

      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/compat_ioctl.c: suppress two warnings · 45bf5cd7
      Andrew Morton committed
      fs/compat_ioctl.c: In function 'do_ioctl_trans':
      fs/compat_ioctl.c:534: warning: 'karg' may be used uninitialized in this function
      fs/compat_ioctl.c:533: warning: 'kcmd' may be used uninitialized in this function
      fs/compat_ioctl.c:656: warning: 'ret' may be used uninitialized in this function
      
      Reduces text size by 44 bytes.
      
      If someone calls one of these functions with an unexpected argument, the
      code's buggy as-is.
      
      Amerigo Wang <amwang@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>