1. 28 May, 2010 27 commits
    •
      kill spurious reference to vmtruncate · 15c6fd97
      npiggin@suse.de committed
      Lots of filesystems call vmtruncate despite not implementing the old
      ->truncate method.  Switch them to use simple_setsize and add some
      comments about the truncate code where it seems fitting.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de committed
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      fs/super: fix kernel-doc warning · 7000d3c4
      Randy Dunlap committed
      Fix fs/super.c kernel-doc warning and function notation:
      Warning(fs/super.c:957): No description found for parameter 'sb'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      fs/minix: bugfix, number of indirect block ptrs per block depends on block size · 0ab7620a
      Erik van der Kouwe committed
      The MINIX filesystem driver used a constant number of indirect block
      pointers in an indirect block. This worked only for filesystems with 1kb
      block, while the MINIX default block size is now 4kb. As a consequence,
      large files were read incorrectly on such filesystems and writing a
      large file would cause the filesystem to become corrupted. This patch
      computes the number of indirect block pointers based on the block size,
      making the driver work for each block size.
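
      The arithmetic behind this fix can be sketched as follows. This is a
      user-space illustration, not the driver code: the function names are
      hypothetical, and the 4-byte pointer size and the count of direct
      pointers used in the usage note are assumptions passed in as
      parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of block pointers that fit in one indirect block. */
static size_t ptrs_per_block(size_t block_size, size_t ptr_size)
{
    assert(ptr_size != 0 && block_size % ptr_size == 0);
    return block_size / ptr_size;
}

/*
 * Blocks addressable through `direct` direct pointers plus one
 * single-indirect and one double-indirect block.
 */
static uint64_t addressable_blocks(size_t block_size, size_t ptr_size,
                                   unsigned direct)
{
    uint64_t n = ptrs_per_block(block_size, ptr_size);

    return direct + n + n * n;
}
```

      With 4-byte pointers, a 1kb block holds 256 pointers while a 4kb block
      holds 1024, so a constant tuned for 1kb blocks resolves the wrong
      pointer slots once a file grows into the indirect blocks of a 4kb
      filesystem.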
      
      I would like to thank Feiran Zheng ('Fam') for pointing out the cause
      of the corruption.
      Signed-off-by: Erik van der Kouwe <vdkouwe@cs.vu.nl>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      rename the generic fsync implementations · 1b061d92
      Christoph Hellwig committed
      We don't name our generic fsync implementations very well currently.
      The no-op implementation for in-memory filesystems currently is called
      simple_sync_file which doesn't make too much sense to start with,
      and the generic one for simple filesystems is called simple_fsync
      which can lead to some confusion.
      
      This patch renames the generic file fsync method to generic_file_fsync
      to match the other generic_file_* routines it is supposed to be used
      with, and the no-op implementation to noop_fsync to make it obvious
      what to expect.  In addition add some documentation for both methods.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      drop unused dentry argument to ->fsync · 7ea80859
      Christoph Hellwig committed
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      fs: Add missing mutex_unlock · cc967be5
      Julia Lawall committed
      Add a mutex_unlock missing on the error path.  At other exits from the
      function that return an error flag, the mutex is unlocked, so do the same
      here.
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression E1;
      @@
      
      * mutex_lock(E1,...);
        <+... when != E1
        if (...) {
          ... when != E1
      *   return ...;
        }
        ...+>
      * mutex_unlock(E1,...);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      get rid of the magic around f_count in aio · d7065da0
      Al Viro committed
      __aio_put_req() plays sick games with file refcount.  What
      it wants is fput() from atomic context; it's almost always
      done with f_count > 1, so they only have to deal with delayed
      work in rare cases when their reference happens to be the
      last one.  Current code decrements f_count and if it hasn't
      hit 0, everything is fine.  Otherwise it keeps a pointer
      to struct file (with zero f_count!) around and has delayed
      work do __fput() on it.
      
      Better way to do it: use atomic_long_add_unless( , -1, 1)
      instead of !atomic_long_dec_and_test().  IOW, decrement it
      only if it's not the last reference, leave refcount alone
      if it was.  And use normal fput() in delayed work.
      
      I've made that atomic_long_add_unless call a new helper -
      fput_atomic().  Drops a reference to file if it's safe to
      do in atomic (i.e. if that's not the last one), tells if
      it had been able to do that.  aio.c converted to it, __fput()
      use is gone.  req->ki_file *always* contributes to refcount
      now.  And __fput() became static.
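
      The "decrement unless it is the last reference" idea can be sketched in
      user space with C11 atomics. This is a model only: struct file_model,
      add_unless() and fput_atomic_model() are hypothetical stand-ins for the
      kernel's struct file, atomic_long_add_unless() and fput_atomic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for struct file; only the refcount is modeled. */
struct file_model {
    atomic_long f_count;
};

/*
 * Add `a` to *v unless *v currently equals `u`.
 * Returns true if the addition was performed.
 */
static bool add_unless(atomic_long *v, long a, long u)
{
    long cur = atomic_load(v);

    while (cur != u) {
        /* On failure `cur` is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return true;
    }
    return false;
}

/*
 * Drop a reference only if it is not the last one.
 * Returns false when the caller still holds the final reference and
 * has to hand the file to deferred work for a normal put.
 */
static bool fput_atomic_model(struct file_model *f)
{
    assert(f != NULL);
    return add_unless(&f->f_count, -1, 1);
}
```

      Because the count never reaches zero on this path, no pointer to a file
      with a zero refcount is ever kept around.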
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      VFS: fix recent breakage of FS_REVAL_DOT · 176306f5
      Neil Brown committed
      Commit 1f36f774 broke FS_REVAL_DOT semantics.
      
      In particular, before this patch, the command
         ls -l
      in an NFS mounted directory would always check if the directory on the server
      had changed and if so would flush and refill the pagecache for the dir.
      After this patch, the same "ls -l" will repeatedly return stale data until
      the cached attributes for the directory time out.
      
      The following patch fixes this by ensuring the d_revalidate is called by
      do_last when "." is being looked-up.
      link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN
      is not set so nfs_lookup_verify_inode chooses not to do any validation.
      
      The following patch restores the original behaviour.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    •
      Revert "anon_inode: set S_IFREG on the anon_inode" · 1eb2cbb6
      Al Viro committed
      This reverts commit a7cf4145.
    •
      fs/: do not fallback to default_llseek() when readdir() uses BKL · ca572727
      Jan Blunck committed
      Do not use the fallback default_llseek() if the readdir operation of the
      filesystem still uses the big kernel lock.
      
      Since llseek() modifies file->f_pos of the directory directly, it may need
      locking to not confuse readdir, which usually uses file->f_pos directly as
      well.
      
      Since the special characteristics of the BKL (unlocked on schedule) are
      not necessary in this case, the inode mutex can be used for locking as
      provided by generic_file_llseek().  This is only possible since all
      filesystems, except reiserfs, either use a directory as a flat file or
      with disk address offsets.  Reiserfs on the other hand uses a 32bit hash
      off the filename as the offset so generic_file_llseek() can get used as
      well since the hash is always smaller than sb->s_maxbytes (= (512 << 32) -
      blocksize).
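
      The bound quoted above is easy to check numerically. The sketch below
      only evaluates the expression from the commit message; the function
      names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* s_maxbytes as quoted in the commit message: (512 << 32) - blocksize. */
static uint64_t max_bytes_model(uint64_t blocksize)
{
    assert(blocksize > 0 && blocksize < (512ULL << 32));
    return (512ULL << 32) - blocksize;
}

/* Nonzero if every possible 32-bit hash offset is below that limit. */
static int hash_offsets_fit(uint64_t blocksize)
{
    return (uint64_t)UINT32_MAX < max_bytes_model(blocksize);
}
```

      For a 4096-byte block size the limit is 2^41 - 4096 bytes, far above
      the largest 32-bit hash value.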
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Anders Larsen <al@alarsen.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      vfs: introduce noop_llseek() · ae6afc3f
      Jan Blunck committed
      This is an implementation of ->llseek useable for the rare special case
      when userspace expects the seek to succeed but the (device) file is
      actually not able to perform the seek.  In this case you use noop_llseek()
      instead of falling back to the default implementation of ->llseek.
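
      A minimal user-space model of that behavior, assuming the no-op seek
      simply reports the current position and changes nothing. struct
      file_model and noop_llseek_model() are hypothetical names used only for
      this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct file; only the position is modeled. */
struct file_model {
    long long f_pos;
};

/* Succeeds for any offset/whence, but leaves the position where it was. */
static long long noop_llseek_model(struct file_model *file, long long offset,
                                   int whence)
{
    assert(file != NULL);
    (void)offset;   /* deliberately ignored */
    (void)whence;   /* deliberately ignored */
    return file->f_pos;
}
```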
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      aio: fix the compat vectored operations · 9d85cba7
      Jeff Moyer committed
      The aio compat code was not converting the struct iovecs from 32bit to
      64bit pointers, causing either EINVAL to be returned from io_getevents, or
      EFAULT as the result of the I/O.  This patch passes a compat flag to
      io_submit to signal that pointer conversion is necessary for a given iocb
      array.
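
      The conversion that was missing can be illustrated with a small
      user-space sketch. The struct and function names are hypothetical and
      the layouts are simplified (two 32-bit fields versus two 64-bit
      fields); the kernel's real helper also validates the user memory, which
      is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* iovec as laid out by a 32-bit process: 8 bytes per entry. */
struct compat_iovec_model {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* iovec as a 64-bit kernel expects it: 16 bytes per entry. */
struct iovec_model {
    uint64_t iov_base;
    uint64_t iov_len;
};

/*
 * Widen n 32-bit entries into 64-bit entries.
 * Returns the total number of bytes described by the vector.
 */
static int64_t widen_iovecs(const struct compat_iovec_model *in,
                            struct iovec_model *out, size_t n)
{
    int64_t total = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].iov_base = in[i].iov_base;  /* zero-extend the user pointer */
        out[i].iov_len = in[i].iov_len;
        total += in[i].iov_len;
    }
    return total;
}
```

      Without this step, a 64-bit reader walking the 32-bit array would treat
      each pair of 8-byte entries as one 16-byte entry, which is consistent
      with the EFAULT and EINVAL results described above.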
      
      A variant of this was tested by Michael Tokarev.  I have also updated the
      libaio test harness to exercise this code path with good success.
      Further, I grabbed a copy of ltp and ran the
      testcases/kernel/syscall/readv and writev tests there (compiled with -m32
      on my 64bit system).  All seems happy, but extra eyes on this would be
      welcome.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: <stable@kernel.org>		[2.6.35.1]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev · b8373363
      Jeff Moyer committed
      It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
      writev AIO operations were not functioning properly.  It turns out that
      the code to convert the 32bit io vectors to 64 bits was never written.
      The results of that can be pretty bad, but in my testing, it mostly ended
      up in generating EFAULT as we walked off the list of I/O vectors provided.
      
      This patch set fixes the problem in my environment.  are greatly
      appreciated.
      
      This patch:
      
      Factor out code that will be used by both compat_do_readv_writev and the
      compat aio submission code paths.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: <stable@kernel.org>		[2.6.35.1]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      fs/affs: use ERR_CAST · cccad8f9
      Julia Lawall committed
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
      clear what is the purpose of the operation, which otherwise looks like a
      no-op.
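
      To see why ERR_PTR(PTR_ERR(x)) looks like a no-op, here is a user-space
      re-creation of the error-pointer helpers. The macros are rebuilt as
      small functions for illustration, and find_inode_model() and
      lookup_model() are hypothetical callers, not affs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
/* Decode the errno value again. */
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Pass an error pointer through to a different pointer type. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct inode_model { int ino; };
struct dentry_model { int id; };

/* Hypothetical lookup that fails for anything but inode 1. */
static struct inode_model *find_inode_model(int ino)
{
    static struct inode_model root = { 1 };

    if (ino != 1)
        return ERR_PTR(-ENOENT);
    return &root;
}

/* Returns a different pointer type, so the error must be re-typed. */
static struct dentry_model *lookup_model(int ino)
{
    static struct dentry_model d;
    struct inode_model *inode = find_inode_model(ino);

    if (IS_ERR(inode))
        return ERR_CAST(inode);   /* rather than ERR_PTR(PTR_ERR(inode)) */
    d.id = inode->ino;
    return &d;
}
```

      Both spellings produce the same pointer value; ERR_CAST only makes the
      intent (changing the pointer type of an error) visible.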
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      
      @@
      expression x;
      @@
      
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      kcore: add _text to KCORE_TEXT · 36e15263
      Wu Fengguang committed
      Extend KCORE_TEXT to cover the pages between _text and _stext, to allow
      examining some important page table pages.
      
      `readelf -a` output on x86_64 before and after patch:
      	  Type           Offset             VirtAddr           PhysAddr
      before    LOAD           0x00007fff8100c000 0xffffffff81009000 0x0000000000000000
      after     LOAD           0x00007fff81003000 0xffffffff81000000 0x0000000000000000
      
      The newly covered pages are:
      
      	0xffffffff81000000 <startup_64> etc.
      	0xffffffff81001000 <init_level4_pgt>
      	0xffffffff81002000 <level3_ident_pgt>
      	0xffffffff81003000 <level3_kernel_pgt>
      	0xffffffff81004000 <level2_fixmap_pgt>
      	0xffffffff81005000 <level1_fixmap_pgt>
      	0xffffffff81006000 <level2_ident_pgt>
      	0xffffffff81007000 <level2_kernel_pgt>
      	0xffffffff81008000 <level2_spare_pgt>
      
      Before patch, /proc/kcore shows outdated contents for the above page
      table pages, for example:
      
      	(gdb) p level3_ident_pgt
      	$1 = {<text variable, no debug info>} 0xffffffff81002000 <level3_ident_pgt>
      	(gdb) p/x *((pud_t *)&level3_ident_pgt)@512
      	$2 = {{pud = 0x1006063}, {pud = 0x0} <repeats 511 times>}
      
      while the real content is:
      
      	root@hp /home/wfg# hexdump -s 0x1002000 -n 4096 /dev/mem
      	1002000 6063 0100 0000 0000 8067 0000 0000 0000
      	1002010 0000 0000 0000 0000 0000 0000 0000 0000
      	*
      	1003000
      
      That is, on a x86_64 box with 2GB memory, we can see first-1GB / full-2GB
      identity mapping before/after patch:
      
      	(gdb) p/x *((pud_t *)&level3_ident_pgt)@512
      before  $1 = {{pud = 0x1006063}, {pud = 0x0} <repeats 511 times>}
      after   $1 = {{pud = 0x1006063}, {pud = 0x8067}, {pud = 0x0} <repeats 510 times>}
      
      Obviously the content before patch is wrong.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      proc: remove obsolete comments · 57f87869
      Amerigo Wang committed
      A quick test shows these comments are obsolete, so just remove them.
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      proc: cleanup: remove unused assignments · 73d36460
      Dan Carpenter committed
      I removed 3 unused assignments.  The first two get reset on the first
      statement of their functions.  For "err" in root.c we don't return an
      error and we don't use the variable again.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      proc: get_nr_threads() doesn't need ->siglock any longer · 7e49827c
      Oleg Nesterov committed
      Now that task->signal can't go away get_nr_threads() doesn't need
      ->siglock to read signal->count.
      
      Also, make it inline, move into sched.h, and convert 2 other proc users of
      signal->count to use this (now trivial) helper.
      
      Henceforth get_nr_threads() is the only valid user of signal->count, we
      are ready to turn it into "int nr_threads" or, perhaps, kill it.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      exit: avoid sig->count in de_thread/__exit_signal synchronization · d344193a
      Oleg Nesterov committed
      de_thread() and __exit_signal() use signal_struct->count/notify_count for
      synchronization.  We can simplify the code and use ->notify_count only.
      Instead of comparing these two counters, we can change de_thread() to set
      ->notify_count = nr_of_sub_threads, then change __exit_signal() to
      dec-and-test this counter and notify group_exit_task.
      
      Note that __exit_signal() checks "notify_count > 0" just for symmetry with
      exit_notify(), we could just check it is != 0.
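
      The single-counter scheme can be modeled in a few lines. This is a
      user-space sketch with hypothetical names; locking and the actual
      wakeup of group_exit_task are reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct signal_model {
    int  notify_count;  /* sub-threads the exec'ing thread still waits for */
    bool woken;         /* stands in for waking group_exit_task */
};

/* de_thread() side: record how many sub-threads must exit first. */
static void de_thread_wait_model(struct signal_model *sig, int nr_sub_threads)
{
    assert(sig != NULL && nr_sub_threads >= 0);
    sig->notify_count = nr_sub_threads;
    sig->woken = false;
}

/* __exit_signal() side: dec-and-test, notify on the final decrement. */
static void exit_signal_model(struct signal_model *sig)
{
    assert(sig != NULL);
    if (sig->notify_count > 0 && --sig->notify_count == 0)
        sig->woken = true;
}
```

      The "> 0" guard keeps exits that happen while nobody is waiting from
      driving the counter negative.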
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      coredump: shift down_write(mmap_sem) into coredump_wait() · 269b005a
      Oleg Nesterov committed
      - move the cprm.mm_flags checks up, before we take mmap_sem
      
      - move down_write(mmap_sem) and ->core_state check from do_coredump()
        to coredump_wait()
      
      This simplifies the code and makes the locking symmetrical.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      coredump: factor out put_cred() calls · 5e43aef5
      Oleg Nesterov committed
      Given that do_coredump() calls put_cred() on exit path, it is a bit ugly
      to do put_cred() + "goto fail" twice, just add the new "fail_creds" label.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      coredump: cleanup "ispipe" code · d5bf4c4f
      Oleg Nesterov committed
      - kill "int dump_count", argv_split(argcp) accepts argcp == NULL.
      
      - move "int dump_count" under " if (ispipe)" branch, fail_dropcount
        can check ispipe.
      
      - move "char **helper_argv" as well, change the code to do argv_free()
        right after call_usermodehelper_fns().
      
      - If call_usermodehelper_fns() fails goto close_fail label instead
        of closing the file by hand.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      coredump: factor out the not-ispipe file checks · c7135411
      Oleg Nesterov committed
      do_coredump() does a lot of file checks after it opens the file or calls
      usermode helper.  But all of these checks are only needed in !ispipe case.
      
      Move this code into the "else" branch and kill the ugly repetitive ispipe
      checks.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit · 898b374a
      Neil Horman committed
      The first patch in this series introduced an init function to the
      call_usermodehelper api so that processes could be customized by caller.
      This patch takes advantage of that fact, by customizing the helper in
      do_coredump to create the pipe and set its core limit to one (for our
      recursion check).  This lets us clean up the previous ugliness in the
      usermodehelper internals and factor call_usermodehelper out entirely.
      While I'm at it, we can also modify the helper setup to look for a core
      limit value of 1 rather than zero for our recursion check.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      ufs: permit mounting of BorderWare filesystems · d27d7a9a
      Thomas Stewart committed
      I recently had to recover some files from an old broken machine that was
      running BorderWare Document Gateway.  It's basically a drop in web server
      for sharing files.  From the look of the init process and using strings on
      a few files it seems to be based on FreeBSD 3.3.
      
      The process turned out to be more difficult than I imagined, but to cut a
      long story short BorderWare in their wisdom use a nonstandard magic number
      in their UFS (ufstype=44bsd) file systems.  Thus Linux refuses to mount
      the file systems in order to recover the data.  After a bit of hunting I
      was able to make a quick fix to fs/ufs/super.c in order to detect the new
      magic number.
      
      I assume that this number is the same for all installations.  It's quite
      easy to find out from ufs_fs.h.  The superblock sits 8k into the block
      device and the magic number is 1372 bytes into the superblock struct.
      
      # dd if=/dev/sda5 skip=$(( 8192 + 1372 )) bs=1 count=4 2> /dev/null | hd
      00000000  97 26 24 0f                                       |.&$.|
      #
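
      The same lookup can be written as a short C sketch that reads the magic
      number out of an in-memory image. The function names are hypothetical,
      and the little-endian decoding is an assumption based on the byte order
      shown in the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET    8192u  /* superblock starts 8k into the device */
#define MAGIC_OFFSET 1372u  /* magic field offset inside the superblock */

/* Decode a 32-bit little-endian value from raw bytes. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the magic number stored in a raw device image. */
static uint32_t ufs_magic_from_image(const unsigned char *img, size_t len)
{
    assert(img != NULL && len >= SB_OFFSET + MAGIC_OFFSET + 4);
    return le32(img + SB_OFFSET + MAGIC_OFFSET);
}
```

      Under that assumption, the bytes 97 26 24 0f decode to 0x0f242697.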
      Signed-off-by: Thomas Stewart <thomas@stewarts.org.uk>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      fs/autofs4: use memdup_user · 7ca5ca60
      Julia Lawall committed
      Use memdup_user when user data is immediately copied into the allocated
      region.  Elimination of the variable ads, which is no longer useful.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression from,to,size,flag;
      position p;
      identifier l1,l2;
      @@
      
      -  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
      +  to = memdup_user(from,size);
         if (
      -      to==NULL
      +      IS_ERR(to)
                       || ...) {
         <+... when != goto l1;
      -  -ENOMEM
      +  PTR_ERR(to)
         ...+>
         }
      -  if (copy_from_user(to, from, size) != 0) {
      -    <+... when != goto l2;
      -    -EFAULT
      -    ...+>
      -  }
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 27 May, 2010 5 commits
  3. 26 May, 2010 8 commits
    •
      fs/fscache/object-list.c: fix warning on 32-bit · cc68e3be
      Andrew Morton committed
      fs/fscache/object-list.c: In function 'fscache_objlist_lookup':
      fs/fscache/object-list.c:105: warning: cast to pointer from integer of different size
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      Btrfs: avoid ENOSPC errors in btrfs_dirty_inode · 94b60442
      Chris Mason committed
      btrfs_dirty_inode tries to sneak in without much waiting or
      space reservation, mostly for performance reasons.  This
      usually works well but can cause problems when there are
      many many writers.
      
      When btrfs_update_inode fails with ENOSPC, we fall back
      to a slower btrfs_start_transaction call that will reserve
      some space.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    •
      Btrfs: move O_DIRECT space reservation to btrfs_direct_IO · 3f7c579c
      Chris Mason committed
      This moves the delalloc space reservation done for O_DIRECT
      into btrfs_direct_IO.  This way we don't leak reserved space
      if the generic O_DIRECT write code errors out before it
      calls into btrfs_direct_IO.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    •
      NFS: Fix another nfs_wb_page() deadlock · 0522f6ad
      Trond Myklebust committed
      J.R. Okajima reports that the call to sync_inode() in nfs_wb_page() can
      deadlock with other writeback flush calls. It boils down to the fact
      that we cannot ever call writeback_single_inode() while holding a page
      lock (even if we do set nr_to_write to zero) since another process may
      already be waiting in the call to do_writepages(), and so will deny us
      the I_SYNC lock.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    •
      NFS: Ensure that we mark the inode as dirty if we exit early from commit · c5efa5fc
      Trond Myklebust committed
      If we exit from nfs_commit_inode() without ensuring that the COMMIT rpc
      call has been completed, we must re-mark the inode as dirty. Otherwise,
      future calls to sync_inode() with the WB_SYNC_ALL flag set will fail to
      ensure that the data is on the disk.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    •
      NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker · 59844a9b
      Trond Myklebust committed
      Commit 9c7e7e23 (NFS: Don't call iput() in
      nfs_access_cache_shrinker) unintentionally removed the spin unlock for the
      inode->i_lock.
      Reported-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    •
      Btrfs: rework O_DIRECT enospc handling · 4845e44f
      Chris Mason committed
      This changes O_DIRECT write code to mark extents as delalloc
      while it is processing them.  Yan Zheng has reworked the
      enospc accounting based on tracking delalloc extents and
      this makes it much easier to track enospc in the O_DIRECT code.
      
      There are a few special cases with the O_DIRECT code though,
      it only sets the EXTENT_DELALLOC bits, instead of doing
      EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
      we don't want to mess with clearing the dirty and uptodate
      bits when things go wrong.  This is important because there
      are no pages in the page cache, so any extent state structs
      that we put in the tree won't get freed by releasepage.  We have
      to clear them ourselves as the DIO ends.
      
      With this commit, we reserve space in btrfs_file_aio_write,
      and then as each btrfs_direct_IO call progresses it sets
      EXTENT_DELALLOC on the range.
      
      btrfs_get_blocks_direct is responsible for clearing the delalloc
      at the same time it drops the extent lock.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    •
      driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Kay Sievers committed
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seem too
      much in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobes from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
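
      A consumer of this file only needs to split each line into module name,
      node path, device type and major:minor. The parser below is a minimal
      sketch of that step, not udev's code; struct devname_entry and
      parse_devname_line() are hypothetical, and the field widths are
      arbitrary limits chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One parsed line of modules.devname. */
struct devname_entry {
    char     module[64];   /* kernel module to load on access */
    char     node[128];    /* device node path relative to /dev */
    char     type;         /* 'c' for character, 'b' for block */
    unsigned major;
    unsigned minor;
};

/* Returns 1 if the line was parsed, 0 for comments, blanks or bad input. */
static int parse_devname_line(const char *line, struct devname_entry *e)
{
    assert(line != NULL && e != NULL);
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;
    /* Format: <module> <node> <type><major>:<minor> */
    return sscanf(line, "%63s %127s %c%u:%u",
                  e->module, e->node, &e->type, &e->major, &e->minor) == 5;
}
```

      For the line "tun net/tun c10:200" this yields module "tun", node
      "net/tun", type 'c', major 10 and minor 200.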
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also be useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single*instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless if they are ever used.
      
      This facility is to hide the mess distros are creating with too modularized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper-over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>