1. 29 Jul, 2008 2 commits
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Hisashi Hifumi committed
      When we read part of a file through the pagecache and a page exists at
      the corresponding index but is not uptodate, read IO is issued to bring
      the page uptodate.
      
      I think this is fine for a pagesize == blocksize environment, but there
      is room for improvement when pagesize != blocksize.  In that case a page
      can have multiple buffers, and even if the page is not uptodate, some of
      its buffers can be uptodate.
      
      So I suggest that when all buffers corresponding to the part of the file
      we want to read are uptodate, we use this page and copy data from it to
      the user buffer even if the page as a whole is not uptodate.  This can
      reduce read IO and improve system throughput.
      
      I wrote a benchmark program and measured the effect of the patch with it.
      
      The benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close the file and umount.
      
        4: mount and open the test file again.
      
        5: pwrite randomly 300000 times on the test file.  The offset is
           aligned to the IO size (1024 bytes).
      
        6: measure the time taken to pread randomly 100000 times on the test
           file.
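
      The alignment in step 5 relies on the IO size being a power of two,
      so that clearing the low bits rounds an offset down to a multiple of
      it. A minimal sketch of that arithmetic (the helper name align_down
      is an assumption for illustration; the benchmark inlines the mask):

```c
#include <assert.h>

/*
 * Round offset down to a multiple of io_size.
 * Only valid when io_size is a non-zero power of two.
 */
static unsigned long align_down(unsigned long offset, unsigned long io_size)
{
	assert(io_size != 0 && (io_size & (io_size - 1)) == 0);
	return offset & ~(io_size - 1);
}
```

      For an IO size of 1024, an offset of 1500 maps to 1024 and an offset
      of 1023 maps to 0, so every pwrite and pread covers exactly one
      1024-byte block.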
      
      The result was:
      
      	2.6.26:         330 sec
      	2.6.26-patched: 226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffers (blocks).  So workloads that
      mix random reads and writes, or that do random reads after random writes,
      benefit from this patch when pagesize != blocksize.  The test result
      above (330 sec down to 226 sec, roughly 31% less time) shows this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP (1024 * 512) /* LEN * LOOP = 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	/* Steps 1-3: mount, create the 512MB test file, umount. */
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	/* O_CREAT requires a mode argument. */
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      
      	/* Step 4: remount so the pagecache starts out cold. */
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      
      	/* Step 5: random block-aligned writes. */
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++) {
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      
      	/* Step 6: time random block-aligned reads. */
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++) {
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", (long)(t2 - t1));
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	return 0;
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: include pagemap.h again to fix build · ca5b172b
      Hugh Dickins committed
      Fix compilation errors on avr32 and without CONFIG_SWAP, introduced by
      ba92a43d ("exec: remove some includes")
      
        In file included from include/asm/tlb.h:24,
                         from fs/exec.c:55:
        include/asm-generic/tlb.h: In function 'tlb_flush_mmu':
        include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages'
        include/asm-generic/tlb.h: In function 'tlb_remove_page':
        include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release'
        make[1]: *** [fs/exec.o] Error 1
      
      This straightforward part-revert is nobody's favourite patch to address
      the underlying tlb.h needs swap.h needs pagemap.h (but sparc won't like
      that) mess; but appropriate to fix the build now before any overhaul.
      Reported-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
      Reported-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Tested-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 Jul, 2008 6 commits
    • sh: Initial ELF FDPIC support. · 3bc24a1a
      Paul Mundt committed
      This adds initial support for ELF FDPIC on MMU-less SH, as per version
      0.2 of the ABI definition at:
      
      	http://www.codesourcery.com/public/docs/sh-fdpic/sh-fdpic-abi.txt
      
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat. · 9b14ec35
      Paul Mundt committed
      While implementing binfmt_elf_fdpic on SH it quickly became apparent
      that SH was the first platform to support both binfmt_elf_fdpic and
      binfmt_elf, as well as the only one of the FDPIC platforms to make use
      of the auxvt.
      
      Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where
      the first argument is the entry displacement after csp has been adjusted,
      being reset after each adjustment. As we have no ability to sort this out
      through the platform's ARCH_DLINFO, this index needs to be managed
      entirely in create_elf_fdpic_tables(). Presently none of the platforms
      that set their own auxvt entries are able to do so through their
      respective ARCH_DLINFOs when using binfmt_elf_fdpic.
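
      One way to picture an entry index that is managed in a single place
      is the userspace model below. It is a rough sketch, not the kernel
      macro: every name in it (sketch_aux_ent, sketch_auxv,
      sketch_new_aux_ent) is hypothetical, and a plain array stands in for
      the user stack area that csp points at.

```c
#include <assert.h>
#include <stddef.h>

/* One auxiliary vector entry: an identifier and its value. */
struct sketch_aux_ent {
	unsigned long id;
	unsigned long val;
};

/* Hypothetical bookkeeping kept by the table-building function. */
struct sketch_auxv {
	struct sketch_aux_ent *csp; /* base of the reserved entry area */
	size_t nitems;              /* number of entries reserved */
	size_t nr;                  /* index of the next free entry */
};

/*
 * Store one entry at the current index and advance the index, so that
 * callers pass only (id, val) and never compute a displacement.
 */
static void sketch_new_aux_ent(struct sketch_auxv *v,
			       unsigned long id, unsigned long val)
{
	assert(v->nr < v->nitems);
	v->csp[v->nr].id = id;
	v->csp[v->nr].val = val;
	v->nr++;
}
```

      In this model, code that only knows about (id, val) pairs can add
      entries without knowing where csp currently points; whoever adjusts
      csp is also responsible for resetting nr.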
      
      In addition to this, binfmt_elf_fdpic has been looking at
      DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the
      auxvt. This is legacy cruft, and is not defined by any platforms in-tree,
      even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is
      always available, and contains the number that is of interest here, so we
      switch to using that unconditionally as well.
      
      As this has direct bearing on how much stack is used, platforms that have
      configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either
      make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case
      and live with some lost stack space if those entries aren't pushed (some
      platforms may also need to purposely sacrifice some space here for
      alignment considerations, as noted in the code -- although not an issue
      for any FDPIC-capable platform today).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: David Howells <dhowells@redhat.com>
    • task IO accounting: move all IO statistics in struct task_io_accounting · 940389b8
      Andrea Righi committed
      Simplify the code of include/linux/task_io_accounting.h.
      
      It is also more reasonable to have all the task i/o-related statistics in a
      single struct (task_io_accounting).
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • NFS: Ensure we call nfs_sb_deactive() after releasing the directory inode · 744d18db
      Trond Myklebust committed
      In order to avoid the "Busy inodes after unmount" error message, we need to
      ensure that nfs_async_unlink_release() releases the super block after the
      call to nfs_free_unlinkdata().
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs_remount oops when rebooting + possible fix · 31c94469
      Marc Zyngier committed
      Jeff, Trond,
      
      The commit
      
      48b605f8 (NFS: implement option checking
      when remounting NFS filesystems (resend))
      
      generates an Oops on my platform when rebooting while its root FS is on
      an NFS share (NFSv3, TCP):
      
      Unmounting local filesystems...done.
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c3d00000
      [00000000] *pgd=a3d72031, *pte=00000000, *ppte=00000000
      Internal error: Oops: 17 [#1]
      Modules linked in: cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative ext3 jbd sd_mod pata_pcmcia libata scsi_mod pcmcia loop firmware_class pxafb cfbcopyarea cfbimgblt cfbfillrect pxa2xx_cs pxa2xx_core pcmcia_core snd_pxa2xx_ac97 snd_ac97_codec ac97_bus snd_pxa2xx_pcm snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd isp116x_hcd soundcore rtc_sa1100 snd_page_alloc pxa25x_udc usbcore rtc_ds1307 rtc_core
      CPU: 0    Not tainted  (2.6.26-03414-g33af79d1-dirty #15)
      PC is at nfs_remount+0x40/0x264
      LR is at do_remount_sb+0x158/0x194
      pc : [<c00bbf54>]    lr : [<c0076c40>]    psr: 60000013
      sp : c2dd1e70  ip : c2dd1e98  fp : c2dd1e94
      r10: 00000040  r9 : c3d17000  r8 : c3c3fc40
      r7 : 00000000  r6 : 00000000  r5 : c3d2b200  r4 : 00000000
      r3 : 00000003  r2 : 00000000  r1 : c2dd1e9c  r0 : c3c3fc00
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0000397f  Table: a3d00000  DAC: 00000015
      Process mount (pid: 1462, stack limit = 0xc2dd0270)
      Stack: (0xc2dd1e70 to 0xc2dd2000)
      1e60:                                     00000000 c3c3fc00 00000000 00000000
      1e80: c3c3fc40 c3d17000 c2dd1ebc c2dd1e98 c0076c40 c00bbf20 c01c61e4 00000001
      1ea0: c2dd1ebc 00000001 c3c3fc00 c2dd1ef0 c2dd1ee4 c2dd1ec0 c008c6d8 c0076af4
      1ec0: 00000021 00000040 c2dd1ef0 c3d77000 c3eaa000 00000000 c2dd1f6c c2dd1ee8
      1ee0: c008d1bc c008c5f8 00000000 c2dd0000 c3c0c320 c3805b38 c002064c 0001f820
      1f00: 0001f810 00000001 00000001 00000000 c2dd0000 00000000 c2dd1f34 c2dd1f28
      1f20: c005ead8 c005e6f8 c2dd1f44 c2dd1f38 c005eaf8 c005ead0 c2dd1f6c c2dd1f48
      1f40: c008ae3c 00000000 c3d77000 0001f810 c0ed0021 c0020ca8 c2dd0000 00000000
      1f60: c2dd1fa4 c2dd1f70 c008d2d4 c008d0bc 00000000 0001f810 c2dd1f9c c3eaa000
      1f80: c3d17000 00000000 00000000 be8b6aa8 be8b6ad0 00000015 00000000 c2dd1fa8
      1fa0: c0020b00 c008d254 00000000 be8b6aa8 0001f810 0001f820 0001f830 c0ed0021
      1fc0: 00000000 be8b6aa8 be8b6ad0 00000015 00000000 be8b6ad0 0001f810 be8b6aa8
      1fe0: 0001f810 be8b6964 0000aab8 40125124 60000010 0001f810 00000000 00000000
      Backtrace:
      [<c00bbf14>] (nfs_remount+0x0/0x264) from [<c0076c40>] (do_remount_sb+0x158/0x194)
        r9:c3d17000 r8:c3c3fc40 r7:00000000 r6:00000000 r5:c3c3fc00
      r4:00000000
      [<c0076ae8>] (do_remount_sb+0x0/0x194) from [<c008c6d8>] (do_remount+0xec/0x118)
        r6:c2dd1ef0 r5:c3c3fc00 r4:00000001
      [<c008c5ec>] (do_remount+0x0/0x118) from [<c008d1bc>] (do_mount+0x10c/0x198)
      [<c008d0b0>] (do_mount+0x0/0x198) from [<c008d2d4>] (sys_mount+0x8c/0xd4)
      [<c008d248>] (sys_mount+0x0/0xd4) from [<c0020b00>] (ret_fast_syscall+0x0/0x2c)
        r7:00000015 r6:be8b6ad0 r5:be8b6aa8 r4:00000000
      Code: 0a000086 ea000006 e3530003 8a000004 (e5923000)
      ---[ end trace 55e1b689cf8c8a6a ]---
      ------------[ cut here ]------------
      WARNING: at kernel/exit.c:966 do_exit+0x3c/0x628()
      Modules linked in: cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative ext3 jbd sd_mod pata_pcmcia libata scsi_mod pcmcia loop firmware_class pxafb cfbcopyarea cfbimgblt cfbfillrect pxa2xx_cs pxa2xx_core pcmcia_core snd_pxa2xx_ac97 snd_ac97_codec ac97_bus snd_pxa2xx_pcm snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd isp116x_hcd soundcore rtc_sa1100 snd_page_alloc pxa25x_udc usbcore rtc_ds1307 rtc_core
      [<c0025168>] (dump_stack+0x0/0x14) from [<c0032154>] (warn_on_slowpath+0x4c/0x68)
      [<c0032108>] (warn_on_slowpath+0x0/0x68) from [<c003531c>] (do_exit+0x3c/0x628)
        r6:0000000b r5:c3c3dc80 r4:c2dd0000
      [<c00352e0>] (do_exit+0x0/0x628) from [<c0025004>] (die+0x2b0/0x30c)
      [<c0024d54>] (die+0x0/0x30c) from [<c00270bc>] (__do_kernel_fault+0x6c/0x80)
      [<c0027050>] (__do_kernel_fault+0x0/0x80) from [<c00272e0>] (do_page_fault+0x210/0x230)
        r7:c3fa7118 r6:c3c3dc80 r5:c3d166a8 r4:00010000
      [<c00270d0>] (do_page_fault+0x0/0x230) from [<c00201ec>] (do_DataAbort+0x3c/0xa0)
      [<c00201b0>] (do_DataAbort+0x0/0xa0) from [<c002064c>] (__dabt_svc+0x4c/0x60)
      Exception stack(0xc2dd1e28 to 0xc2dd1e70)
      1e20:                   c3c3fc00 c2dd1e9c 00000000 00000003 00000000 c3d2b200
      1e40: 00000000 00000000 c3c3fc40 c3d17000 00000040 c2dd1e94 c2dd1e98 c2dd1e70
      1e60: c0076c40 c00bbf54 60000013 ffffffff
        r8:c3c3fc40 r7:00000000 r6:00000000 r5:c2dd1e5c r4:ffffffff
      [<c00bbf14>] (nfs_remount+0x0/0x264) from [<c0076c40>] (do_remount_sb+0x158/0x194)
        r9:c3d17000 r8:c3c3fc40 r7:00000000 r6:00000000 r5:c3c3fc00
      r4:00000000
      [<c0076ae8>] (do_remount_sb+0x0/0x194) from [<c008c6d8>] (do_remount+0xec/0x118)
        r6:c2dd1ef0 r5:c3c3fc00 r4:00000001
      [<c008c5ec>] (do_remount+0x0/0x118) from [<c008d1bc>] (do_mount+0x10c/0x198)
      [<c008d0b0>] (do_mount+0x0/0x198) from [<c008d2d4>] (sys_mount+0x8c/0xd4)
      [<c008d248>] (sys_mount+0x0/0xd4) from [<c0020b00>] (ret_fast_syscall+0x0/0x2c)
        r7:00000015 r6:be8b6ad0 r5:be8b6aa8 r4:00000000
      ---[ end trace 55e1b689cf8c8a6a ]---
      /etc/rc6.d/S60umountroot: line 17:  1462 Segmentation fault      mount $MOUNT_FORCE_OPT -n -o remount,ro -t dummytype dummydev / 2> /dev/null
      
      The new super.c:nfs_remount function doesn't check the validity of the
      options/options4 pointers. Unfortunately, NULL pointers do seem to occur
      here. The obvious patch is to check the pointers, and to do nothing if
      they happen to be NULL.
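
      The shape of such a guard can be sketched in userspace as below. The
      names (sketch_mount_opts, sketch_remount) and the single flags field
      are hypothetical, and returning -1 merely stands in for an error
      code; the real nfs_remount compares many more options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed mount options. */
struct sketch_mount_opts {
	int flags;
};

/*
 * Compare the requested options against the ones in effect.
 * Returns 0 when there is nothing to check or the options match,
 * and -1 when they differ.
 */
static int sketch_remount(const struct sketch_mount_opts *in_effect,
			  const struct sketch_mount_opts *requested)
{
	assert(in_effect != NULL);

	/* No option data supplied: do nothing instead of dereferencing. */
	if (requested == NULL)
		return 0;

	return requested->flags == in_effect->flags ? 0 : -1;
}
```

      Without the NULL test, the comparison would dereference the missing
      pointer, which in this model corresponds to the fault at address
      00000000 in the trace above.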
      
      Tested on an XScale PXA255 system, latest git.
      
      Regards,
      
      	M.
      Signed-off-by: Marc Zyngier <marc.zyngier@altran.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • task IO accounting: improve code readability · 5995477a
      Andrea Righi committed
      Put all i/o statistics in struct proc_io_accounting and use inline functions to
      initialize and increment statistics, removing a lot of single variable
      assignments.
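
      The pattern described here, one struct plus small inline helpers in
      place of per-field assignments, might look like the sketch below.
      The struct name, the helper names, and the choice of four counters
      are assumptions made for illustration, not the definitions from the
      patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical grouping of per-task IO counters. */
struct sketch_io_accounting {
	unsigned long long rchar; /* bytes read */
	unsigned long long wchar; /* bytes written */
	unsigned long long syscr; /* read syscalls */
	unsigned long long syscw; /* write syscalls */
};

/* Zero every counter with one call instead of one assignment per field. */
static inline void sketch_io_init(struct sketch_io_accounting *ioac)
{
	memset(ioac, 0, sizeof(*ioac));
}

/* Fold the counters of src into dst, e.g. a child's totals into a parent's. */
static inline void sketch_io_add(struct sketch_io_accounting *dst,
				 const struct sketch_io_accounting *src)
{
	assert(dst != NULL && src != NULL);
	dst->rchar += src->rchar;
	dst->wchar += src->wchar;
	dst->syscr += src->syscr;
	dst->syscw += src->syscw;
}
```

      Call sites then shrink to a single sketch_io_init() or
      sketch_io_add() call, and adding a counter later means touching only
      the struct and these two helpers.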
      
      This also reduces the kernel size as follows (with CONFIG_TASK_XACCT=y and
      CONFIG_TASK_IO_ACCOUNTING=y).
      
          text    data     bss     dec     hex filename
         11651       0       0   11651    2d83 kernel/exit.o.before
         11619       0       0   11619    2d63 kernel/exit.o.after
         10886     132     136   11154    2b92 kernel/fork.o.before
         10758     132     136   11026    2b12 kernel/fork.o.after
      
       3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
       3081869  807968 4818600 8708437  84e155 vmlinux.o.after
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 27 Jul, 2008 32 commits