1. 14 11月, 2008 2 次提交
  2. 31 10月, 2008 1 次提交
  3. 23 10月, 2008 4 次提交
  4. 21 10月, 2008 9 次提交
  5. 09 10月, 2008 6 次提交
  6. 04 10月, 2008 1 次提交
    • J
      generic block based fiemap implementation · 68c9d702
      Josef Bacik 提交于
      Any block based fs (this patch includes ext3) just has to declare its own
      fiemap() function and then call this generic function with its own
      get_block_t. This works well for block based filesystems that will map
      multiple contiguous blocks at one time, but will work for filesystems that
      only map one block at a time, you will just end up with an "extent" for each
      block. One gotcha is this will not play nicely where there is hole+data
      after the EOF. This function will assume its hit the end of the data as soon
      as it hits a hole after the EOF, so if there is any data past that it will
      not pick that up. AFAIK no block based fs does this anyway, but its in the
      comments of the function anyway just in case.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      68c9d702
  7. 09 10月, 2008 1 次提交
    • M
      vfs: vfs-level fiemap interface · c4b929b8
      Mark Fasheh 提交于
      Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
      inode operation.
      
      Userspace can get extent information on a file via fiemap ioctl. As input,
      the fiemap ioctl takes a struct fiemap which includes an array of struct
      fiemap_extent (fm_extents). Size of the extent array is passed as
      fm_extent_count and number of extents returned will be written into
      fm_mapped_extents. Offset and length fields on the fiemap structure
      (fm_start, fm_length) describe a logical range which will be searched for
      extents. All extents returned will at least partially contain this range.
      The actual extent offsets and ranges returned will be unmodified from their
      offset and range on-disk.
      
      The fiemap ioctl returns '0' on success. On error, -1 is returned and errno
      is set. If errno is equal to EBADR, then fm_flags will contain those flags
      which were passed in which the kernel did not understand. On all other
      errors, the contents of fm_extents is undefined.
      
      As fiemap evolved, there have been many authors of the vfs patch. As far as
      I can tell, the list includes:
      Kalpak Shah <kalpak.shah@sun.com>
      Andreas Dilger <adilger@sun.com>
      Eric Sandeen <sandeen@redhat.com>
      Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      c4b929b8
  8. 04 10月, 2008 1 次提交
    • J
      nfsd: common grace period control · af558e33
      J. Bruce Fields 提交于
      Rewrite grace period code to unify management of grace period across
      lockd and nfsd.  The current code has lockd and nfsd cooperate to
      compute a grace period which is satisfactory to them both, and then
      individually enforce it.  This creates a slight race condition, since
      the enforcement is not coordinated.  It's also more complicated than
      necessary.
      
      Here instead we have lockd and nfsd each inform common code when they
      enter the grace period, and when they're ready to leave the grace
      period, and allow normal locking only after both of them are ready to
      leave.
      
      We also expect the locks_start_grace()/locks_end_grace() interface here
      to be simpler to build on for future cluster/high-availability work,
      which may require (for example) putting individual filesystems into
      grace, or enforcing grace periods across multiple cluster nodes.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      af558e33
  9. 30 9月, 2008 1 次提交
    • T
      Configure out file locking features · bfcd17a6
      Thomas Petazzoni 提交于
      This patch adds the CONFIG_FILE_LOCKING option which allows to remove
      support for advisory locks. With this patch enabled, the flock()
      system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl()
      and NFS support are disabled. These features are not necessarly needed
      on embedded systems. It allows to save ~11 Kb of kernel code and data:
      
         text          data     bss     dec     hex filename
      1125436        118764  212992 1457192  163c28 vmlinux.old
      1114299        118564  212992 1445855  160fdf vmlinux
       -11137    -200       0  -11337   -2C49 +/-
      
      This patch has originally been written by Matt Mackall
      <mpm@selenic.com>, and is part of the Linux Tiny project.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: matthew@wil.cx
      Cc: linux-fsdevel@vger.kernel.org
      Cc: mpm@selenic.com
      Cc: akpm@linux-foundation.org
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      bfcd17a6
  10. 29 7月, 2008 1 次提交
    • H
      vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Hisashi Hifumi 提交于
      When we read some part of a file through pagecache, if there is a
      pagecache of corresponding index but this page is not uptodate, read IO
      is issued and this page will be uptodate.
      
      I think this is good for pagesize == blocksize environment but there is
      room for improvement on pagesize != blocksize environment.  Because in
      this case a page can have multiple buffers and even if a page is not
      uptodate, some buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to a part of a file
      that we want to read are uptodate, use this pagecache and copy data from
      this pagecache to user buffer even if a page is not uptodate.  This can
      reduce read IO and improve system throughput.
      
      I wrote a benchmark program and got result number with this program.
      
      This benchmark do:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      	2.6.26
              330 sec
      
      	2.6.26-patched
              226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffer/block.  So random read/write
      mixed workloads or random read after random write workloads are optimized
      with this patch under pagesize != blocksize environment.  This test result
      showed this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP 1024*512 /* 512MB */
      
      main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", t2-t1);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      }
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ab22b9a
  11. 27 7月, 2008 10 次提交
  12. 26 7月, 2008 1 次提交
    • M
      locks: add special return value for asynchronous locks · bde74e4b
      Miklos Szeredi 提交于
      Use a special error value FILE_LOCK_DEFERRED to mean that a locking
      operation returned asynchronously.  This is returned by
      
        posix_lock_file() for sleeping locks to mean that the lock has been
        queued on the block list, and will be woken up when it might become
        available and needs to be retried (either fl_lmops->fl_notify() is
        called or fl_wait is woken up).
      
        f_op->lock() to mean either the above, or that the filesystem will
        call back with fl_lmops->fl_grant() when the result of the locking
        operation is known.  The filesystem can do this for sleeping as well
        as non-sleeping locks.
      
      This is to make sure, that return values of -EAGAIN and -EINPROGRESS by
      filesystems are not mistaken to mean an asynchronous locking.
      
      This also makes error handling in fs/locks.c and lockd/svclock.c slightly
      cleaner.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: David Teigland <teigland@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bde74e4b
  13. 25 7月, 2008 2 次提交
    • U
      flag parameters: NONBLOCK in pipe · be61a86d
      Ulrich Drepper 提交于
      This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
      than the patches for eventfd et.al but still trivial.  The interfaces of the
      create_write_pipe and create_read_pipe helper functions were changed and the
      one other caller as well.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fds[2];
        if (syscall (__NR_pipe2, fds, 0) == -1)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
          {
            puts ("pipe2(O_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be61a86d
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper 提交于
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b