1. 22 7月, 2007 1 次提交
  2. 18 7月, 2007 1 次提交
    • A
      sys_fallocate() implementation on i386, x86_64 and powerpc · 97ac7350
      Amit Arora 提交于
      fallocate() is a new system call being proposed here which will allow
      applications to preallocate space to any file(s) in a file system.
      Each file system implementation that wants to use this feature will need
      to support an inode operation called ->fallocate().
      Applications can use this feature to avoid fragmentation to certain
      level and thus get faster access speed. With preallocation, applications
      also get a guarantee of space for particular file(s) - even if later the
      the system becomes full.
      
      Currently, glibc provides an interface called posix_fallocate() which
      can be used for similar cause. Though this has the advantage of working
      on all file systems, but it is quite slow (since it writes zeroes to
      each block that has to be preallocated). Without a doubt, file systems
      can do this more efficiently within the kernel, by implementing
      the proposed fallocate() system call. It is expected that
      posix_fallocate() will be modified to call this new system call first
      and incase the kernel/filesystem does not implement it, it should fall
      back to the current implementation of writing zeroes to the new blocks.
      ToDos:
      1. Implementation on other architectures (other than i386, x86_64,
         and ppc). Patches for s390(x) and ia64 are already available from
         previous posts, but it was decided that they should be added later
         once fallocate is in the mainline. Hence not including those patches
         in this take.
      2. Changes to glibc,
         a) to support fallocate() system call
         b) to make posix_fallocate() and posix_fallocate64() call fallocate()
      Signed-off-by: NAmit Arora <aarora@in.ibm.com>
      97ac7350
  3. 17 7月, 2007 1 次提交
    • V
      diskquota: 32bit quota tools on 64bit architectures · b716395e
      Vasily Tarasov 提交于
      OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
      working on 64bit architectures.  In 2.6.10 kernel sys32_quotactl() function
      was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
      32/64bit clean, enable it for 32bit" However this isn't right.  Look at
      if_dqblk structure:
      
      struct if_dqblk {
              __u64 dqb_bhardlimit;
              __u64 dqb_bsoftlimit;
              __u64 dqb_curspace;
              __u64 dqb_ihardlimit;
              __u64 dqb_isoftlimit;
              __u64 dqb_curinodes;
              __u64 dqb_btime;
              __u64 dqb_itime;
              __u32 dqb_valid;
      };
      
      For 32 bit quota tools sizeof(if_dqblk) == 0x44.
      But for 64 bit kernel its size is 0x48, 'cause of alignment!
      Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
      that handles this and related situations.
      
      [michal.k.k.piotrowski@gmail.com: build fix]
      [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
      Signed-off-by: NVasily Tarasov <vtaras@openvz.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NMichal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b716395e
  4. 21 6月, 2007 1 次提交
  5. 13 5月, 2007 1 次提交
  6. 11 5月, 2007 3 次提交
  7. 09 5月, 2007 1 次提交
    • U
      utimensat implementation · 1c710c89
      Ulrich Drepper 提交于
      Implement utimensat(2) which is an extension to futimesat(2) in that it
      
      a) supports nano-second resolution for the timestamps
      b) allows to selectively ignore the atime/mtime value
      c) allows to selectively use the current time for either atime or mtime
      d) supports changing the atime/mtime of a symlink itself along the lines
         of the BSD lutimes(3) functions
      
      For this change the internally used do_utimes() functions was changed to
      accept a timespec time value and an additional flags parameter.
      
      Additionally the sys_utime function was changed to match compat_sys_utime
      which already use do_utimes instead of duplicating the work.
      
      Also, the completely missing futimensat() functionality is added.  We have
      such a function in glibc but we have to resort to using /proc/self/fd/* which
      not everybody likes (chroot etc).
      
      Test application (the syscall number will need per-arch editing):
      
      #include <errno.h>
      #include <fcntl.h>
      #include <time.h>
      #include <sys/time.h>
      #include <stddef.h>
      #include <syscall.h>
      
      #define __NR_utimensat 280
      
      #define UTIME_NOW       ((1l << 30) - 1l)
      #define UTIME_OMIT      ((1l << 30) - 2l)
      
      int
      main(void)
      {
        int status = 0;
      
        int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
        if (fd == -1)
          error (1, errno, "failed to create test file \"ttt\"");
      
        struct stat64 st1;
        if (fstat64 (fd, &st1) != 0)
          error (1, errno, "fstat failed");
      
        struct timespec t[2];
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        struct stat64 st2;
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0] = st1.st_atim;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_OMIT;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("atim not set");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim changed from zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_OMIT;
        t[1] = st1.st_mtim;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("mtim changed from original time");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
            || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
          {
            puts ("mtim not set");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        sleep (2);
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_NOW;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_NOW;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        struct timeval tv;
        gettimeofday(&tv,NULL);
      
        if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
            || st2.st_atim.tv_sec > tv.tv_sec)
          {
            puts ("atim not set to NOW");
            status = 1;
          }
        if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
            || st2.st_mtim.tv_sec > tv.tv_sec)
          {
            puts ("mtim not set to NOW");
            status = 1;
          }
      
        if (symlink ("ttt", "tttsym") != 0)
          error (1, errno, "cannot create symlink");
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
          error (1, errno, "utimensat failed");
      
        if (lstat64 ("tttsym", &st2) != 0)
          error (1, errno, "lstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("symlink atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("symlink mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 1;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 1;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to one");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to one");
            status = 1;
          }
      
        if (status == 0)
           puts ("all OK");
      
       out:
        close (fd);
        unlink ("ttt");
        unlink ("tttsym");
      
        return status;
      }
      
      [akpm@linux-foundation.org: add missing i386 syscall table entry]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c710c89
  8. 03 5月, 2007 1 次提交
  9. 17 3月, 2007 1 次提交
  10. 13 2月, 2007 1 次提交
  11. 12 2月, 2007 1 次提交
    • K
      [PATCH] Common compat_sys_sysinfo · d4d23add
      Kyle McMartin 提交于
      I noticed that almost all architectures implemented exactly the same
      sys32_sysinfo...  except parisc, where a bug was to be found in handling of
      the uptime.  So let's remove a whole whack of code for fun and profit.
      Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
      would be the best tested.
      
      This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
      instead extracting out the common code from sys_sysinfo.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d4d23add
  12. 26 9月, 2006 3 次提交
    • J
      [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder · adf14236
      Jan Beulich 提交于
      Current gcc generates calls not jumps to noreturn functions. When that happens the
      return address can point to the next function, which confuses the unwinder.
      
      This patch works around it by marking asynchronous exception
      frames in contrast normal call frames in the unwind information.  Then teach
      the unwinder to decode this.
      
      For normal call frames the unwinder now subtracts one from the address which avoids
      this problem.  The standard libgcc unwinder uses the same trick.
      
      It doesn't include adjustment of the printed address (i.e. for the original
      example, it'd still be kernel_math_error+0 that gets displayed, but the
      unwinder wouldn't get confused anymore.
      
      This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
      unfortunately because earlier binutils don't support .cfi_signal_frame
      
      [AK: added automatic detection of the new binutils and wrote description]
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      adf14236
    • A
      [PATCH] x86: Add portable getcpu call · 3cfc348b
      Andi Kleen 提交于
      For NUMA optimization and some other algorithms it is useful to have a fast
      to get the current CPU and node numbers in user space.
      
      x86-64 added a fast way to do this in a vsyscall. This adds a generic
      syscall for other architectures to make it a generic portable facility.
      
      I expect some of them will also implement it as a faster vsyscall.
      
      The cache is an optimization for the x86-64 vsyscall optimization. Since
      what the syscall returns is an approximation anyways and user space
      often wants very fast results it can be cached for some time.  The norma
      methods to get this information in user space are relatively slow
      
      The vsyscall is in a better position to manage the cache because it has direct
      access to a fast time stamp (jiffies). For the generic syscall optimization
      it doesn't help much, but enforce a valid argument to keep programs
      portable
      
      I only added an i386 syscall entry for now. Other architectures can follow
      as needed.
      
      AK: Also added some cleanups from Andrew Morton
      Signed-off-by: NAndi Kleen <ak@suse.de>
      3cfc348b
    • A
      [PATCH] Add ppoll/pselect syscalls · 957dc87c
      Andi Kleen 提交于
      Needed TIF_RESTORE_SIGMASK first
      Signed-off-by: NAndi Kleen <ak@suse.de>
      957dc87c
  13. 29 7月, 2006 1 次提交
    • A
      [PATCH] x86_64: Don't clobber r8-r11 in int 0x80 handler · 0e92da4a
      Andi Kleen 提交于
      When int 0x80 is called from long mode r8-r11 would leak out of the
      kernel (or rather they would be filled with some values from
      the kernel stack). I don't think it's a security issue because
      the values come from the fixed stack frame which should be near
      always user registers from a previous interrupt.
      
      Still better fix it.
      
      Longer term the register save macros need to be cleaned up
      to avoid such mistakes in the future.
      
      Original analysis from Richard Brunner, fix by me.
      
      Cc: Richard.Brunner@amd.com
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0e92da4a
  14. 04 7月, 2006 1 次提交
  15. 27 6月, 2006 4 次提交
  16. 23 6月, 2006 1 次提交
  17. 02 5月, 2006 1 次提交
  18. 19 4月, 2006 1 次提交
  19. 10 4月, 2006 2 次提交
  20. 28 3月, 2006 1 次提交
  21. 27 3月, 2006 1 次提交
  22. 12 2月, 2006 1 次提交
    • U
      [PATCH] fstatat64 support · cff2b760
      Ulrich Drepper 提交于
      The *at patches introduced fstatat and, due to inusfficient research, I
      used the newfstat functions generally as the guideline.  The result is that
      on 32-bit platforms we don't have all the information needed to implement
      fstatat64.
      
      This patch modifies the code to pass up 64-bit information if
      __ARCH_WANT_STAT64 is defined.  I renamed the syscall entry point to make
      this clear.  Other archs will continue to use the existing code.  On x86-64
      the compat code is implemented using a new sys32_ function.  this is what
      is done for the other stat syscalls as well.
      
      This patch might break some other archs (those which define
      __ARCH_WANT_STAT64 and which already wired up the syscall).  Yet others
      might need changes to accomodate the compatibility mode.  I really don't
      want to do that work because all this stat handling is a mess (more so in
      glibc, but the kernel is also affected).  It should be done by the arch
      maintainers.  I'll provide some stand-alone test shortly.  Those who are
      eager could compile glibc and run 'make check' (no installation needed).
      
      The patch below has been tested on x86 and x86-64.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cff2b760
  23. 09 2月, 2006 1 次提交
  24. 02 2月, 2006 1 次提交
  25. 19 1月, 2006 1 次提交
  26. 12 1月, 2006 2 次提交
  27. 11 1月, 2006 1 次提交
  28. 09 1月, 2006 1 次提交
    • C
      [PATCH] Swap Migration V5: sys_migrate_pages interface · 39743889
      Christoph Lameter 提交于
      sys_migrate_pages implementation using swap based page migration
      
      This is the original API proposed by Ray Bryant in his posts during the first
      half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.
      
      The intent of sys_migrate is to migrate memory of a process.  A process may
      have migrated to another node.  Memory was allocated optimally for the prior
      context.  sys_migrate_pages allows to shift the memory to the new node.
      
      sys_migrate_pages is also useful if the processes available memory nodes have
      changed through cpuset operations to manually move the processes memory.  Paul
      Jackson is working on an automated mechanism that will allow an automatic
      migration if the cpuset of a process is changed.  However, a user may decide
      to manually control the migration.
      
      This implementation is put into the policy layer since it uses concepts and
      functions that are also needed for mbind and friends.  The patch also provides
      a do_migrate_pages function that may be useful for cpusets to automatically
      move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
      implementation.
      
      The current code here is based on the swap based page migration capability and
      thus is not able to preserve the physical layout relative to it containing
      nodeset (which may be a cpuset).  When direct page migration becomes available
      then the implementation needs to be changed to do a isomorphic move of pages
      between different nodesets.  The current implementation simply evicts all
      pages in source nodeset that are not in the target nodeset.
      
      Patch supports ia64, i386 and x86_64.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39743889
  29. 07 1月, 2006 1 次提交
  30. 13 9月, 2005 1 次提交
  31. 10 9月, 2005 1 次提交