1. 23 Oct, 2008 3 commits
    • [PATCH] i_version: remount support · 08b9fe6b
      Mimi Zohar committed
      Add support for remounting a filesystem with the i_version option.
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      08b9fe6b
    • [PATCH] move executable checking into ->permission() · f696a365
      Miklos Szeredi committed
      For execute permission on a regular file we need to check whether the file
      has any execute bits at all, regardless of capabilities.
      
      This check is normally performed by generic_permission() but was also
      added to the case when the filesystem defines its own ->permission()
      method.  In the latter case the filesystem should be responsible for
      performing this check.
      
      Move the check from inode_permission() inside filesystems which are
      not calling generic_permission().
      
      Create a helper function execute_ok() that returns true if the inode
      is a directory or if any execute bits are present in i_mode.
      
      Also fix up the following code:
      
       - coda control file is never executable
       - sysctl files are never executable
       - hfs_permission seems broken on MAY_EXEC, remove
       - hfsplus_permission is equivalent to generic_permission(), remove
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      f696a365
    • [PATCH 1/2] anondev: init IDR statically · 6de24f0e
      Alexey Dobriyan committed
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      6de24f0e
  2. 09 Oct, 2008 6 commits
  3. 04 Oct, 2008 1 commit
    • generic block based fiemap implementation · 68c9d702
      Josef Bacik committed
      Any block based fs (this patch includes ext3) just has to declare its own
      fiemap() function and then call this generic function with its own
      get_block_t. This works well for block based filesystems that map
      multiple contiguous blocks at one time, but it also works for filesystems
      that only map one block at a time; you will just end up with an "extent" for
      each block. One gotcha is that this will not play nicely where there is
      hole+data after the EOF. This function will assume it has hit the end of the
      data as soon as it hits a hole after the EOF, so if there is any data past
      that it will not pick that up. AFAIK no block based fs does this anyway, but
      it's in the comments of the function just in case.
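
      The difference between the two kinds of filesystem can be illustrated
      with a small userspace model. This is not the kernel's generic function:
      toy_get_block_fn, walk_extents(), map_many() and map_one() are
      hypothetical names, and a hole is modelled as a mapping of length 0.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mapping {
	unsigned long physical; /* first physical block of the mapped run */
	unsigned long len;      /* blocks mapped by this call, 0 = hole */
};

struct toy_extent {
	unsigned long logical, physical, len;
};

/* Hypothetical stand-in for a get_block_t style callback. */
typedef struct toy_mapping (*toy_get_block_fn)(unsigned long lblk,
					       unsigned long max_len);

/* Walk logical blocks [0, nblocks) and emit one extent per mapping call. */
size_t walk_extents(toy_get_block_fn get_block, unsigned long nblocks,
		    struct toy_extent *out, size_t max)
{
	size_t n = 0;
	unsigned long l = 0;

	while (l < nblocks && n < max) {
		struct toy_mapping m = get_block(l, nblocks - l);

		if (m.len == 0) {
			l++; /* hole: skip one block */
			continue;
		}
		out[n].logical = l;
		out[n].physical = m.physical;
		out[n].len = m.len;
		n++;
		l += m.len;
	}
	return n;
}

/* Layout: blocks 0-2 at 100-102, block 3 is a hole, blocks 4-5 at 200-201.
 * This callback maps as many contiguous blocks as it can in one call. */
struct toy_mapping map_many(unsigned long lblk, unsigned long max_len)
{
	struct toy_mapping m = { 0, 0 };

	if (lblk < 3) {
		m.physical = 100 + lblk;
		m.len = 3 - lblk;
	} else if (lblk >= 4 && lblk < 6) {
		m.physical = 200 + (lblk - 4);
		m.len = 6 - lblk;
	}
	if (m.len > max_len)
		m.len = max_len;
	return m;
}

/* Same layout, but only ever maps a single block per call. */
struct toy_mapping map_one(unsigned long lblk, unsigned long max_len)
{
	struct toy_mapping m = map_many(lblk, max_len);

	if (m.len > 1)
		m.len = 1;
	return m;
}
```

      With this layout, walking with map_many() reports two extents, while
      walking with map_one() reports one "extent" per mapped block.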
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      68c9d702
  4. 09 Oct, 2008 1 commit
    • vfs: vfs-level fiemap interface · c4b929b8
      Mark Fasheh committed
      Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
      inode operation.
      
      Userspace can get extent information on a file via fiemap ioctl. As input,
      the fiemap ioctl takes a struct fiemap which includes an array of struct
      fiemap_extent (fm_extents). Size of the extent array is passed as
      fm_extent_count and number of extents returned will be written into
      fm_mapped_extents. Offset and length fields on the fiemap structure
      (fm_start, fm_length) describe a logical range which will be searched for
      extents. All extents returned will at least partially contain this range.
      The actual extent offsets and ranges returned will be unmodified from their
      offset and range on-disk.
      
      The fiemap ioctl returns '0' on success. On error, -1 is returned and errno
      is set. If errno is equal to EBADR, then fm_flags will contain those flags
      which were passed in which the kernel did not understand. On all other
      errors, the contents of fm_extents is undefined.
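
      A minimal userspace sketch of the call described above. It assumes the
      definitions shipped in <linux/fiemap.h> and <linux/fs.h> (FS_IOC_FIEMAP,
      FIEMAP_MAX_OFFSET and the fe_* extent fields, none of which are named in
      this message); dump_extents() and MAX_EXTENTS are hypothetical.

```c
#include <errno.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_EXTENTS 32 /* arbitrary array size chosen for this sketch */

/* Print up to MAX_EXTENTS extents of the file behind fd.
 * Returns the number of extents mapped, or -1 with errno set. */
int dump_extents(int fd)
{
	size_t size = sizeof(struct fiemap) +
		      MAX_EXTENTS * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, size);
	int ret = -1;
	int saved_errno;

	if (fm == NULL)
		return -1;

	fm->fm_start = 0;                  /* logical range to search ... */
	fm->fm_length = FIEMAP_MAX_OFFSET; /* ... here: the whole file */
	fm->fm_extent_count = MAX_EXTENTS; /* size of fm_extents[] */

	if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
		for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
			const struct fiemap_extent *e = &fm->fm_extents[i];

			printf("extent %u: logical=%llu physical=%llu length=%llu\n",
			       i,
			       (unsigned long long)e->fe_logical,
			       (unsigned long long)e->fe_physical,
			       (unsigned long long)e->fe_length);
		}
		ret = (int)fm->fm_mapped_extents;
	}

	saved_errno = errno; /* keep the ioctl's errno across free() */
	free(fm);
	errno = saved_errno;
	return ret;
}
```

      Filesystems without a ->fiemap operation are expected to fail the
      ioctl, so callers should treat -1 as "no extent information available".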
      
      As fiemap evolved, there have been many authors of the vfs patch. As far as
      I can tell, the list includes:
      Kalpak Shah <kalpak.shah@sun.com>
      Andreas Dilger <adilger@sun.com>
      Eric Sandeen <sandeen@redhat.com>
      Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      c4b929b8
  5. 04 Oct, 2008 1 commit
    • nfsd: common grace period control · af558e33
      J. Bruce Fields committed
      Rewrite grace period code to unify management of grace period across
      lockd and nfsd.  The current code has lockd and nfsd cooperate to
      compute a grace period which is satisfactory to them both, and then
      individually enforce it.  This creates a slight race condition, since
      the enforcement is not coordinated.  It's also more complicated than
      necessary.
      
      Here instead we have lockd and nfsd each inform common code when they
      enter the grace period, and when they're ready to leave the grace
      period, and allow normal locking only after both of them are ready to
      leave.
      
      We also expect the locks_start_grace()/locks_end_grace() interface here
      to be simpler to build on for future cluster/high-availability work,
      which may require (for example) putting individual filesystems into
      grace, or enforcing grace periods across multiple cluster nodes.
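
      The coordination rule described above (normal locking resumes only once
      every participant is ready to leave) can be modelled in userspace as
      follows. This is a sketch under simplifying assumptions, not the kernel
      code: the toy_* names are hypothetical and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* One instance per participant, e.g. one for lockd and one for nfsd. */
struct toy_lock_manager {
	bool in_grace;
};

/* Number of participants currently inside their grace period. */
static int toy_grace_count;

/* A participant announces that it has entered its grace period. */
void toy_start_grace(struct toy_lock_manager *lm)
{
	if (!lm->in_grace) {
		lm->in_grace = true;
		toy_grace_count++;
	}
}

/* A participant announces that it is ready to leave the grace period. */
void toy_end_grace(struct toy_lock_manager *lm)
{
	if (lm->in_grace) {
		lm->in_grace = false;
		toy_grace_count--;
	}
}

/* Normal locking is allowed only when this returns false. */
bool toy_in_grace(void)
{
	return toy_grace_count > 0;
}
```

      If lockd ends its grace period first, toy_in_grace() stays true until
      nfsd has ended its own as well.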
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      af558e33
  6. 30 Sep, 2008 1 commit
    • Configure out file locking features · bfcd17a6
      Thomas Petazzoni committed
      This patch adds the CONFIG_FILE_LOCKING option, which allows removing
      support for advisory locks. With this option disabled, the flock()
      system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl()
      and NFS support are disabled. These features are not necessarily needed
      on embedded systems. Disabling them saves ~11 KB of kernel code and data:
      
         text    data     bss      dec     hex filename
      1125436  118764  212992  1457192  163c28 vmlinux.old
      1114299  118564  212992  1445855  160fdf vmlinux
       -11137    -200       0   -11337   -2C49 +/-
      
      This patch was originally written by Matt Mackall
      <mpm@selenic.com>, and is part of the Linux Tiny project.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: matthew@wil.cx
      Cc: linux-fsdevel@vger.kernel.org
      Cc: mpm@selenic.com
      Cc: akpm@linux-foundation.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      bfcd17a6
  7. 29 Jul, 2008 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Hisashi Hifumi committed
      When we read some part of a file through pagecache, if there is a
      pagecache of corresponding index but this page is not uptodate, read IO
      is issued and this page will be uptodate.
      
      I think this is good for pagesize == blocksize environment but there is
      room for improvement on pagesize != blocksize environment.  Because in
      this case a page can have multiple buffers and even if a page is not
      uptodate, some buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to a part of a file
      that we want to read are uptodate, use this pagecache and copy data from
      this pagecache to user buffer even if a page is not uptodate.  This can
      reduce read IO and improve system throughput.
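
      The proposed check can be sketched with a small userspace model of a
      page split into buffers. The sizes match the test setup reported in this
      message (4096-byte pages are an assumption for i386, 1024-byte blocks
      are stated); struct toy_page and range_is_uptodate() are hypothetical
      and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE       4096u
#define TOY_BLOCK_SIZE      1024u
#define TOY_BLOCKS_PER_PAGE (TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

struct toy_page {
	bool page_uptodate;                         /* whole page valid */
	bool buffer_uptodate[TOY_BLOCKS_PER_PAGE];  /* per-buffer state */
};

/* Can bytes [from, from + count) of this page be copied to the user
 * buffer without issuing read IO? */
bool range_is_uptodate(const struct toy_page *page, size_t from, size_t count)
{
	size_t first, last;

	if (count == 0 || from >= TOY_PAGE_SIZE || count > TOY_PAGE_SIZE - from)
		return false;
	if (page->page_uptodate)
		return true;

	/* The page is not uptodate: every buffer overlapping the range
	 * must be uptodate on its own. */
	first = from / TOY_BLOCK_SIZE;
	last = (from + count - 1) / TOY_BLOCK_SIZE;
	for (size_t i = first; i <= last; i++)
		if (!page->buffer_uptodate[i])
			return false;
	return true;
}
```

      A 1024-byte read that falls entirely inside an uptodate buffer is then
      served from the cache even though the page as a whole is not uptodate.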
      
      I wrote a benchmark program and measured the results with it.
      
      The benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      	2.6.26
              330 sec
      
      	2.6.26-patched
              226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffer/block.  So random read/write
      mixed workloads or random read after random write workloads are optimized
      with this patch under pagesize != blocksize environment.  This test result
      showed this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP (1024*512) /* 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	/* O_CREAT requires an explicit mode argument */
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      	/* create the 512MB test file */
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	/* umount and mount again before the random IO phase */
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	/* random writes, offsets aligned to the IO size */
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	/* timed phase: random reads, offsets aligned to the IO size */
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", (long)(t2 - t1));
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	return 0;
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ab22b9a
  8. 27 Jul, 2008 10 commits
  9. 26 Jul, 2008 1 commit
    • locks: add special return value for asynchronous locks · bde74e4b
      Miklos Szeredi committed
      Use a special error value FILE_LOCK_DEFERRED to mean that a locking
      operation returned asynchronously.  This is returned by
      
        posix_lock_file() for sleeping locks to mean that the lock has been
        queued on the block list, and will be woken up when it might become
        available and needs to be retried (either fl_lmops->fl_notify() is
        called or fl_wait is woken up).
      
        f_op->lock() to mean either the above, or that the filesystem will
        call back with fl_lmops->fl_grant() when the result of the locking
        operation is known.  The filesystem can do this for sleeping as well
        as non-sleeping locks.
      
      This is to make sure that return values of -EAGAIN and -EINPROGRESS from
      filesystems are not mistaken for asynchronous locking.
      
      This also makes error handling in fs/locks.c and lockd/svclock.c slightly
      cleaner.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: David Teigland <teigland@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bde74e4b
  10. 25 Jul, 2008 4 commits
    • flag parameters: NONBLOCK in pipe · be61a86d
      Ulrich Drepper committed
      This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
      than the patches for eventfd et al. but still trivial.  The interfaces of the
      create_write_pipe and create_read_pipe helper functions were changed, and the
      one other caller was updated as well.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fds[2];
        if (syscall (__NR_pipe2, fds, 0) == -1)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
          {
            puts ("pipe2(O_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be61a86d
    • flag parameters: pipe · ed8cae8b
      Ulrich Drepper committed
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
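
      On systems whose C library provides a pipe2() wrapper (an assumption;
      the test above calls the raw syscall instead), O_CLOEXEC from this patch
      and O_NONBLOCK from the follow-up patch listed above can be requested
      together. The helper names below are hypothetical.

```c
#define _GNU_SOURCE /* glibc exposes pipe2() under this feature macro */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe whose two ends are both non-blocking and close-on-exec.
 * Returns 0 on success, -1 with errno set on failure. */
int make_nonblocking_cloexec_pipe(int fds[2])
{
	return pipe2(fds, O_NONBLOCK | O_CLOEXEC);
}

/* Returns 1 if fd has both properties, 0 if not, -1 if fcntl() fails. */
int fd_is_nonblocking_cloexec(int fd)
{
	int status_flags = fcntl(fd, F_GETFL);
	int fd_flags = fcntl(fd, F_GETFD);

	if (status_flags == -1 || fd_flags == -1)
		return -1;
	return (status_flags & O_NONBLOCK) && (fd_flags & FD_CLOEXEC);
}
```

      Note that the two flags are read back through different fcntl()
      commands: F_GETFL for O_NONBLOCK and F_GETFD for FD_CLOEXEC.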
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
    • fix soft lock up at NFS mount via per-SB LRU-list of unused dentries · da3bbdd4
      Kentaro Makita committed
      [Summary]
      
       Split LRU-list of unused dentries to one per superblock to avoid soft
       lock up during NFS mounts and remounting of any filesystem.
      
       Previously I posted here:
       http://lkml.org/lkml/2008/3/5/590
      
      [Descriptions]
      
      - background
      
        dentry_unused is a list of dentries which are not referenced.
        dentry_unused grows up when references on directories or files are
        released.  This list can be very long if there is huge free memory.
      
      - the problem
      
        When shrink_dcache_sb() is called, it scans the whole dentry_unused
        list linearly under spin_lock(), and if dentry->d_sb differs from the
        given superblock, it moves on to the next dentry.  This scan is very
        costly if there are many entries, and very inefficient if there are
        many superblocks.
      
        IOW, when we need to shrink the unused dentries of one superblock, we
        scan the unused dentries of all superblocks in the system.  For
        example, we need to scan 500 dentries to unmount a filesystem, but we
        also scan 1,000,000 or more unused dentries on other superblocks.
      
        In our case, when mounting NFS*, shrink_dcache_sb() is called to shrink
        unused dentries on NFS, but it scans 100,000,000 unused dentries on
        other superblocks in the system, such as local ext3 filesystems.  I
        hear that NFS mounting took 1 min on some systems in use.
      
      * : NFS uses virtual filesystem in rpc layer, so NFS is affected by
        this problem.
      
        100,000,000 is possible number on large systems.
      
        A per-superblock LRU of unused dentries can reduce the cost in a
        reasonable manner.
      
      - How to fix
      
        I found that this problem is solved by David Chinner's "Per-superblock
        unused dentry LRU lists V3"(1), so I rebased it and added a fix to
        reclaim with fairness, as suggested in Andrew Morton's comments(2).
      
        1) http://lkml.org/lkml/2006/5/25/318
        2) http://lkml.org/lkml/2006/5/25/320
      
        Split the LRU list of unused dentries into one list per superblock.
        Then, NFS mounting will check only the dentries under one superblock
        instead of all of them.  But this splitting breaks the global LRU order
        of dentry_unused.  So, I've attempted to reclaim unused dentries fairly
        by calculating the number of dentries to scan on each sb as follows:
      
        number of dentries to scan on this sb =
        count * (number of dentries on this sb / number of dentries in the machine)
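
        A worked sketch of the formula above in integer arithmetic. The
        function and parameter names are hypothetical, and this is not
        claimed to be how the patch itself evaluates the expression.

```c
#include <assert.h>
#include <stdint.h>

/* Share of `count` to scan on one superblock:
 *   count * (unused dentries on this sb / unused dentries in the machine)
 * The multiplication is done first, in 64 bits, so that the integer
 * division does not truncate the ratio to zero. Very large inputs could
 * still overflow the 64-bit product; this sketch does not handle that. */
unsigned long sb_scan_share(unsigned long count, unsigned long sb_unused,
			    unsigned long total_unused)
{
	if (total_unused == 0)
		return 0;
	return (unsigned long)(((uint64_t)count * sb_unused) / total_unused);
}
```

        For example, with count = 1000 and a superblock holding 250,000 of
        1,000,000 unused dentries, 250 dentries are scanned on that sb.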
      
      - ToDo
       - I have to measure performance numbers and do stress tests.
      
       - When an unmount occurs during prune_dcache() while it is scanning
        the same superblock, it is unable to reach the next superblock because
        the current one has gone away.  We restart scanning from the first
        superblock, which causes unfair reclaim of unused dentries on the
        first superblock.  But I think this happens very rarely.
      
      - Test Results
      
        Result on 6GB boxes with excessive unused dentries.
      
      Without patch:
      
      $ cat /proc/sys/fs/dentry-state
      10181835        10180203        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m1.830s
      user    0m0.001s
      sys     0m1.653s
      
       With this patch:
      $ cat /proc/sys/fs/dentry-state
      10236610        10234751        45      0       0       0
      # mount -t nfs 10.124.60.70:/work/kernel-src nfs
      real    0m0.106s
      user    0m0.002s
      sys     0m0.032s
      
      [akpm@linux-foundation.org: fix comments]
      Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da3bbdd4
    • move memory_read_from_buffer() from fs.h to string.h · e108526e
      Akinobu Mita committed
      James Bottomley warns that inclusion of linux/fs.h in a low level
      driver was always a danger signal.  This patch moves
      memory_read_from_buffer() from fs.h to string.h and fixes includes in
      existing memory_read_from_buffer() users.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Bob Moore <robert.moore@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e108526e
  11. 15 Jul, 2008 1 commit
  12. 14 Jul, 2008 1 commit
  13. 12 Jul, 2008 1 commit
  14. 03 Jul, 2008 1 commit
    • Remove BKL from remote_llseek v2 · 9465efc9
      Andi Kleen committed
      - Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
      failures in all users)
      - Change all users to either use generic_file_llseek_unlocked directly or
      take the BKL around it. I changed the file systems that don't use the BKL
      for anything (CIFS, GFS) to call it directly. NCPFS, SMBFS and NFS
      take the BKL, but explicitly in their own source now.
      
      I moved them all over in a single patch to avoid unbisectable sections.
      
      Open problem: 32bit kernels can corrupt fpos because its modification
      is not atomic, but they can do that anyway because there are other paths
      that modify it without the BKL.
      
      Do we need a special lock for the pos/f_version = 0 checks?
      
      Trond says the NFS BKL is likely not needed, but keep it for now
      until his full audit.
      
      v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
          and factor duplicated code (suggested by hch)
      
      Cc: Trond.Myklebust@netapp.com
      Cc: swhiteho@redhat.com
      Cc: sfrench@samba.org
      Cc: vandrove@vc.cvut.cz
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      9465efc9
  15. 01 Jul, 2008 1 commit
    • Properly notify block layer of sync writes · 18ce3751
      Jens Axboe committed
      fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
      then immediately wait on them. Conceptually, that makes them sync writes
      and we should treat them as such so that the IO schedulers can handle
      them appropriately.
      
      This patch fixes a write starvation issue that Lin Ming reported, where
      xx is stuck for more than 2 minutes because of a large number of
      synchronous IO in the system:
      
      INFO: task kjournald:20558 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      message.
      kjournald     D ffff810010820978  6712 20558      2
      ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
      ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
      0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
      Call Trace:
      [<ffffffff803ba6f2>] kobject_get+0x12/0x17
      [<ffffffff80247537>] getnstimeofday+0x2f/0x83
      [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
      [<ffffffff8066d195>] io_schedule+0x5d/0x9f
      [<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
      [<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
      [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
      [<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
      [<ffffffff80243909>] wake_bit_function+0x0/0x23
      [<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
      [<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
      [<ffffffff8023a676>] lock_timer_base+0x26/0x4b
      [<ffffffff8030300a>] kjournald+0xc1/0x1fb
      [<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
      [<ffffffff80302f49>] kjournald+0x0/0x1fb
      [<ffffffff802437bb>] kthread+0x47/0x74
      [<ffffffff8022de51>] schedule_tail+0x28/0x5d
      [<ffffffff8020cac8>] child_rip+0xa/0x12
      [<ffffffff80243774>] kthread+0x0/0x74
      [<ffffffff8020cabe>] child_rip+0x0/0x12
      
      Lin Ming confirms that this patch fixes the issue. I've run tests with
      it for the past week and no ill effects have been observed, so I'm
      proposing it for inclusion into 2.6.26.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      18ce3751
  16. 23 Jun, 2008 1 commit
  17. 07 Jun, 2008 1 commit
    • introduce memory_read_from_buffer() · 93b07113
      Akinobu Mita committed
      This patch introduces memory_read_from_buffer().
      
      The only difference between memory_read_from_buffer() and
      simple_read_from_buffer() is which address space the function copies to.
      
      simple_read_from_buffer copies to user space memory.
      memory_read_from_buffer copies to normal memory.
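
      A userspace sketch of the idea: a positioned read from a buffer whose
      destination is ordinary memory, so a plain memcpy() suffices. It assumes
      the usual read-at-offset semantics (clamp to the available data, advance
      the position); toy_read_from_buffer() is a hypothetical name and not
      the kernel function.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copy up to `count` bytes from `from` (which holds `available` bytes),
 * starting at *ppos, into `to`. Advances *ppos by the number of bytes
 * copied and returns that number, 0 at end of data, or -1 if *ppos is
 * negative. */
ssize_t toy_read_from_buffer(void *to, size_t count, off_t *ppos,
			     const void *from, size_t available)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;

	/* Destination is normal memory, so no user-space copy helper is
	 * needed here. */
	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (off_t)count;
	return (ssize_t)count;
}
```

      Reading 8 bytes at position 4 of a 6-byte buffer copies the last 2
      bytes and leaves the position at 6; a further read returns 0.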
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Doug Warzecha <Douglas_Warzecha@dell.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Matt Domsch <Matt_Domsch@dell.com>
      Cc: Abhay Salunke <Abhay_Salunke@dell.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Markus Rechberger <markus.rechberger@amd.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Bob Moore <robert.moore@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
      Cc: Michael Holzheu <holzheu@de.ibm.com>
      Cc: Brian King <brking@us.ibm.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Andrew Vasquez <linux-driver@qlogic.com>
      Cc: Seokmann Ju <seokmann.ju@qlogic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93b07113
  18. 07 May, 2008 2 commits
  19. 29 Apr, 2008 2 commits