1. 08 2月, 2008 3 次提交
    • B
      Memory controller: make charging gfp mask aware · e1a1cd59
      Balbir Singh 提交于
      Nick Piggin pointed out that swap cache and page cache addition routines
      could be called from non GFP_KERNEL contexts.  This patch makes the
      charging routine aware of the gfp context.  Charging might fail if the
      cgroup is over it's limit, in which case a suitable error is returned.
      
      This patch was tested on a Powerpc box.  I am still looking at being able
      to test the path, through which allocations happen in non GFP_KERNEL
      contexts.
      
      [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1a1cd59
    • B
      Memory controller: add switch to control what type of pages to limit · 8697d331
      Balbir Singh 提交于
      Choose if we want cached pages to be accounted or not.  By default both are
      accounted for.  A new set of tunables are added.
      
      echo -n 1 > mem_control_type
      
      switches the accounting to account for only mapped pages
      
      echo -n 3 > mem_control_type
      
      switches the behaviour back
      
      [bunk@kernel.org: mm/memcontrol.c: clenups]
      [akpm@linux-foundation.org: fix sparc32 build]
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8697d331
    • B
      Memory controller: memory accounting · 8a9f3ccd
      Balbir Singh 提交于
      Add the accounting hooks.  The accounting is carried out for RSS and Page
      Cache (unmapped) pages.  There is now a common limit and accounting for both.
      The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
      time.  Page cache is accounted at add_to_page_cache(),
      __delete_from_page_cache().  Swap cache is also accounted for.
      
      Each page's page_cgroup is protected with the last bit of the
      page_cgroup pointer, this makes handling of race conditions involving
      simultaneous mappings of a page easier.  A reference count is kept in the
      page_cgroup to deal with cases where a page might be unmapped from the RSS
      of all tasks, but still lives in the page cache.
      
      Credits go to Vaidyanathan Srinivasan for helping with reference counting work
      of the page cgroup.  Almost all of the page cache accounting code has help
      from Vaidyanathan Srinivasan.
      
      [hugh@veritas.com: fix swapoff breakage]
      [akpm@linux-foundation.org: fix locking]
      Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a9f3ccd
  2. 06 2月, 2008 2 次提交
    • H
      mm: remove fastcall from mm/ · 920c7a5d
      Harvey Harrison 提交于
      fastcall is always defined to be empty, remove it
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      920c7a5d
    • N
      radix-tree: avoid atomic allocations for preloaded insertions · e2848a0e
      Nick Piggin 提交于
      Most pagecache (and some other) radix tree insertions have the great
      opportunity to preallocate a few nodes with relaxed gfp flags.  But the
      preallocation is squandered when it comes time to allocate a node, we
      default to first attempting a GFP_ATOMIC allocation -- that doesn't
      normally fail, but it can eat into atomic memory reserves that we don't
      need to be using.
      
      Another upshot of this is that it removes the sometimes highly contended
      zone->lock from underneath tree_lock.  Pagecache insertions are always
      performed with a radix tree preload, and after this change, such a
      situation will never fall back to kmem_cache_alloc within
      radix_tree_node_alloc.
      
      David Miller reports seeing this allocation fail on a highly threaded
      sparc64 system:
      
      [527319.459981] dd: page allocation failure. order:0, mode:0x20
      [527319.460403] Call Trace:
      [527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
      [527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
      [527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
      [527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
      [527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
      [527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
      [527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
      [527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
      [527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
      [527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
      [527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
      [527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
      [527319.461361]  [00000000004bb920] sys_read+0x34/0x60
      [527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40
      
      The calltrace is significant: __do_page_cache_readahead allocates a number
      of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
      memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
      goes to mpage_readpages, there can be significant intervals (including disk
      IO) before all the pages are inserted into the radix-tree.  So the reserves
      can easily be depleted at that point.  The patch is confirmed to fix the
      problem.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2848a0e
  3. 03 2月, 2008 1 次提交
    • N
      fix writev regression: pan hanging unkillable and un-straceable · 124d3b70
      Nick Piggin 提交于
      Frederik Himpe reported an unkillable and un-straceable pan process.
      
      Zero length iovecs can go into an infinite loop in writev, because the
      iovec iterator does not always advance over them.
      
      The sequence required to trigger this is not trivial. I think it
      requires that a zero-length iovec be followed by a non-zero-length iovec
      which causes a pagefault in the atomic usercopy. This causes the writev
      code to drop back into single-segment copy mode, which then tries to
      copy the 0 bytes of the zero-length iovec; a zero length copy looks like
      a failure though, so it loops.
      
      Put a test into iov_iter_advance to catch zero-length iovecs. We could
      just put the test in the fallback path, but I feel it is more robust to
      skip over zero-length iovecs throughout the code (iovec iterator may be
      used in filesystems too, so it should be robust).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      124d3b70
  4. 20 12月, 2007 1 次提交
  5. 07 12月, 2007 2 次提交
  6. 01 11月, 2007 1 次提交
    • L
      Remove broken ptrace() special-case code from file mapping · 5307cc1a
      Linus Torvalds 提交于
      The kernel has for random historical reasons allowed ptrace() accesses
      to access (and insert) pages into the page cache above the size of the
      file.
      
      However, Nick broke that by mistake when doing the new fault handling in
      commit 54cb8821 ("mm: merge populate and
      nopage into fault (fixes nonlinear)".  The breakage caused a hang with
      gdb when trying to access the invalid page.
      
      The ptrace "feature" really isn't worth resurrecting, since it really is
      wrong both from a portability _and_ from an internal page cache validity
      standpoint.  So this removes those old broken remnants, and fixes the
      ptrace() hang in the process.
      
      Noticed and bisected by Duane Griffin, who also supplied a test-case
      (quoth Nick: "Well that's probably the best bug report I've ever had,
      thanks Duane!").
      
      Cc: Duane Griffin <duaneg@dghda.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5307cc1a
  7. 31 10月, 2007 1 次提交
    • Z
      dio: fix cache invalidation after sync writes · bdb76ef5
      Zach Brown 提交于
      Commit commit 65b8291c ("dio: invalidate
      clean pages before dio write") introduced a bug which stopped dio from
      ever invalidating the page cache after writes.  It still invalidated it
      before writes so most users were fine.
      
      Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
      this bug when he had a buffered reader immediately reading file data
      after an O_DIRECT wirter had written the data.  The kernel issued
      read-ahead beyond the position of the reader which overlapped with the
      O_DIRECT writer.  The failure to invalidate after writes caused the
      reader to see stale data from the read-ahead.
      
      The following patch is originally from Karl.  The following commentary
      is his:
      
      	The below 3rd try takes on your suggestion of just invalidating
      	no matter what the retval from the direct_IO call.  I ran it
      	thru the test-case several times and it has worked every time.
      	The post-invalidate is probably still too early for async-directio,
      	but I don't have a testcase for that;  just sync.  And, this
      	won't be any worse in the async case.
      
      I added a test to the aio-dio-regress repository which mimics Karl's IO
      pattern.  It verifed the bad behaviour and that the patch fixed it.  I
      agree with Karl, this still doesn't help the case where a buffered
      reader follows an AIO O_DIRECT writer.  That will require a bit more
      work.
      
      This gives up on the idea of returning EIO to indicate to userspace that
      stale data remains if the invalidation failed.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      Cc: Karl Schendel <kschendel@datallegro.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdb76ef5
  8. 29 10月, 2007 1 次提交
    • E
      Fix a build error when BLOCK=n · 3a424f2d
      Emil Medve 提交于
      mm/filemap.c: In function '__filemap_fdatawrite_range':
      mm/filemap.c:200: error: implicit declaration of function
      'mapping_cap_writeback_dirty'
      
      This happens when we don't use/have any block devices and a NFS root
      filesystem is used.
      
      mapping_cap_writeback_dirty() is defined in linux/backing-dev.h which
      used to be provided in mm/filemap.c by linux/blkdev.h until commit
      f5ff8422 (Fix warnings with
      !CONFIG_BLOCK).
      Signed-off-by: NEmil Medve <Emilian.Medve@Freescale.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3a424f2d
  9. 20 10月, 2007 1 次提交
    • R
      kernel-api docbook: fix content problems · 8f731f7d
      Randy Dunlap 提交于
      Fix kernel-api docbook contents problems.
      
      docproc: linux-2.6.23-git13/include/asm-x86/unaligned_32.h: No such file or directory
      Warning(linux-2.6.23-git13//include/linux/list.h:482): bad line: 			of list entry
      Warning(linux-2.6.23-git13//mm/filemap.c:864): No description found for parameter 'ra'
      Warning(linux-2.6.23-git13//block/ll_rw_blk.c:3760): No description found for parameter 'req'
      Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'private'
      Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'cdev'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f731f7d
  10. 19 10月, 2007 1 次提交
  11. 17 10月, 2007 22 次提交
  12. 09 10月, 2007 1 次提交
  13. 12 8月, 2007 2 次提交
  14. 01 8月, 2007 1 次提交