1. 31 12月, 2008 1 次提交
  2. 03 12月, 2008 1 次提交
  3. 02 12月, 2008 2 次提交
    • D
      epoll: introduce resource usage limits · 7ef9964e
      Davide Libenzi 提交于
      It has been thought that the per-user file descriptors limit would also
      limit the resources that a normal user can request via the epoll
      interface.  Vegard Nossum reported a very simple program (a modified
      version attached) that can make a normal user to request a pretty large
      amount of kernel memory, well within the its maximum number of fds.  To
      solve such problem, default limits are now imposed, and /proc based
      configuration has been introduced.  A new directory has been created,
      named /proc/sys/fs/epoll/ and inside there, there are two configuration
      points:
      
        max_user_instances = Maximum number of devices - per user
      
        max_user_watches   = Maximum number of "watched" fds - per user
      
      The current default for "max_user_watches" limits the memory used by epoll
      to store "watches", to 1/32 of the amount of the low RAM.  As example, a
      256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
      That should be enough to not break existing heavy epoll users.  The
      default value for "max_user_instances" is set to 128, that should be
      enough too.
      
      This also changes the userspace, because a new error code can now come out
      from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
      listed, so that should be ok.
      
      [akpm@linux-foundation.org: use get_current_user()]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: NVegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef9964e
    • M
      ocfs2: Small documentation update · a2eee69b
      Mark Fasheh 提交于
      Remove some features from the "not-supported" list that are actually
      supported now.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      a2eee69b
  4. 01 12月, 2008 1 次提交
  5. 13 11月, 2008 1 次提交
  6. 07 11月, 2008 2 次提交
  7. 31 10月, 2008 1 次提交
  8. 20 10月, 2008 3 次提交
    • H
      ext3: add an option to control error handling on file data · 0e4fb5e2
      Hidehiro Kawai 提交于
      If the journal doesn't abort when it gets an IO error in file data blocks,
      the file data corruption will spread silently.  Because most of
      applications and commands do buffered writes without fsync(), they don't
      notice the IO error.  It's scary for mission critical systems.  On the
      other hand, if the journal aborts whenever it gets an IO error in file
      data blocks, the system will easily become inoperable.  So this patch
      introduces a filesystem option to determine whether it aborts the journal
      or just call printk() when it gets an IO error in file data.
      
      If you mount a ext3 fs with data_err=abort option, it aborts on file data
      write error.  If you mount it with data_err=ignore, it doesn't abort, just
      call printk().  data_err=ignore is the default.
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e4fb5e2
    • A
      documentation: clarify dirty_ratio and dirty_background_ratio description · 7a6560e0
      Andrea Righi 提交于
      The current documentation of dirty_ratio and dirty_background_ratio is a
      bit misleading.
      
      In the documentation we say that they are "a percentage of total system
      memory", but the current page writeback policy, intead, is to apply the
      percentages to the dirtyable memory, that means free pages + reclaimable
      pages.
      
      Better to be more explicit to clarify this concept.
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a6560e0
    • K
      coredump_filter: add hugepage dumping · e575f111
      KOSAKI Motohiro 提交于
      Presently hugepage's vma has a VM_RESERVED flag in order not to be
      swapped.  But a VM_RESERVED vma isn't core dumped because this flag is
      often used for some kernel vmas (e.g.  vmalloc, sound related).
      
      Thus hugepages are never dumped and it can't be debugged easily.  Many
      developers want hugepages to be included into core-dump.
      
      However, We can't read generic VM_RESERVED area because this area is often
      IO mapping area.  then these area reading may change device state.  it is
      definitly undesiable side-effect.
      
      So adding a hugepage specific bit to the coredump filter is better.  It
      will be able to hugepage core dumping and doesn't cause any side-effect to
      any i/o devices.
      
      In additional, libhugetlb use hugetlb private mapping pages as anonymous
      page.  Then, hugepage private mapping pages should be core dumped by
      default.
      
      Then, /proc/[pid]/core_dump_filter has two new bits.
      
       - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes)
       - bit 6 mean hugetlb shared mapping pages are dumped or not.  (default: no)
      
      I tested by following method.
      
      % ulimit -c unlimited
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      %
      % echo 0x43 > /proc/self/coredump_filter
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <string.h>
      
      #include "hugetlbfs.h"
      
      int main(int argc, char** argv){
      	char* p;
      	int ch;
      	int mmap_flags = MAP_SHARED;
      	int fd;
      	int nr_pages;
      
      	while((ch = getopt(argc, argv, "p")) != -1) {
      		switch (ch) {
      		case 'p':
      			mmap_flags &= ~MAP_SHARED;
      			mmap_flags |= MAP_PRIVATE;
      			break;
      		default:
      			/* nothing*/
      			break;
      		}
      	}
      	argc -= optind;
      	argv += optind;
      
      	if (argc == 0){
      		printf("need # of pages\n");
      		exit(1);
      	}
      
      	nr_pages = atoi(argv[0]);
      	if (nr_pages < 2) {
      		printf("nr_pages must >2\n");
      		exit(1);
      	}
      
      	fd = hugetlbfs_unlinked_fd();
      	p = mmap(NULL, nr_pages * gethugepagesize(),
      		 PROT_READ|PROT_WRITE, mmap_flags, fd, 0);
      
      	sleep(2);
      
      	*(p + gethugepagesize()) = 1; /* COW */
      	sleep(2);
      
      	/* crash! */
      	*(int*)0 = 1;
      
      	return 0;
      }
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: William Irwin <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e575f111
  9. 17 10月, 2008 7 次提交
  10. 14 10月, 2008 2 次提交
  11. 11 10月, 2008 2 次提交
  12. 10 10月, 2008 1 次提交
  13. 09 10月, 2008 1 次提交
    • M
      vfs: vfs-level fiemap interface · c4b929b8
      Mark Fasheh 提交于
      Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
      inode operation.
      
      Userspace can get extent information on a file via fiemap ioctl. As input,
      the fiemap ioctl takes a struct fiemap which includes an array of struct
      fiemap_extent (fm_extents). Size of the extent array is passed as
      fm_extent_count and number of extents returned will be written into
      fm_mapped_extents. Offset and length fields on the fiemap structure
      (fm_start, fm_length) describe a logical range which will be searched for
      extents. All extents returned will at least partially contain this range.
      The actual extent offsets and ranges returned will be unmodified from their
      offset and range on-disk.
      
      The fiemap ioctl returns '0' on success. On error, -1 is returned and errno
      is set. If errno is equal to EBADR, then fm_flags will contain those flags
      which were passed in which the kernel did not understand. On all other
      errors, the contents of fm_extents is undefined.
      
      As fiemap evolved, there have been many authors of the vfs patch. As far as
      I can tell, the list includes:
      Kalpak Shah <kalpak.shah@sun.com>
      Andreas Dilger <adilger@sun.com>
      Eric Sandeen <sandeen@redhat.com>
      Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      c4b929b8
  14. 10 10月, 2008 2 次提交
  15. 30 9月, 2008 2 次提交
    • A
      UBIFS: add no_chk_data_crc mount option · 2953e73f
      Adrian Hunter 提交于
      UBIFS read performance can be improved by skipping the CRC
      check when data nodes are read.  This option can be used if
      the underlying media is considered to be highly reliable.
      Note that CRCs are always checked for metadata.
      
      Read speed on Arm platform with OneNAND goes from 19 MiB/s
      to 27 MiB/s with data CRC checking disabled.
      Signed-off-by: NAdrian Hunter <ext-adrian.hunter@nokia.com>
      2953e73f
    • A
      UBIFS: add bulk-read facility · 4793e7c5
      Adrian Hunter 提交于
      Some flash media are capable of reading sequentially at faster rates.
      UBIFS bulk-read facility is designed to take advantage of that, by
      reading in one go consecutive data nodes that are also located
      consecutively in the same LEB.
      
      Read speed on Arm platform with OneNAND goes from 17 MiB/s to
      19 MiB/s.
      Signed-off-by: NAdrian Hunter <ext-adrian.hunter@nokia.com>
      4793e7c5
  16. 14 9月, 2008 1 次提交
  17. 10 9月, 2008 1 次提交
  18. 03 9月, 2008 2 次提交
  19. 13 8月, 2008 3 次提交
  20. 01 8月, 2008 1 次提交
    • J
      [PATCH] configfs: Convenience macros for attribute definition. · ecb3d28c
      Joel Becker 提交于
      Sysfs has the _ATTR() and _ATTR_RO() macros to make defining extended
      form attributes easier.  configfs should have something similiar.
      
      - _CONFIGFS_ATTR() and _CONFIGFS_ATTR_RO() are the counterparts to the
        sysfs macros.
      - CONFIGFS_ATTR_STRUCT() creates the extended form attribute structure.
      - CONFIGFS_ATTR_OPS() defines the show_attribute()/store_attribute()
        operations that call the show()/store() operations of the extended
        form configfs_attributes.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ecb3d28c
  21. 28 7月, 2008 1 次提交
  22. 27 7月, 2008 2 次提交