1. 24 Mar, 2006 40 commits
    • [PATCH] deprecate the kernel_thread export · ac515898
      Christoph Hellwig committed
      Announce that the kernel_thread export will be removed in half a year,
      after all its users have been converted to the kthread_ API, which I plan
      to do over the next month.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ide: Allow IDE interface to specify it's not capable of 32-bit operations · 208a08f7
      Kumar Gala committed
      In some embedded systems the IDE hardware interface may only support 16-bit
      or smaller accesses.  Allow the interface to specify if this is the case
      and don't allow the drive or user to override the setting.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] show MCP menu only on ARCH_SA1100 · f751d50f
      Adrian Bunk committed
      On architectures like i386, the "Multimedia Capabilities Port drivers" menu is
      visible, but it can't be visited since it contains nothing usable for
      !ARCH_SA1100.
      
      This patch therefore shows this menu only on ARCH_SA1100.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Conditionalize compat_sys_newfstatat · 82d821dd
      Kyle McMartin committed
      If we don't want sys_newfstatat because __ARCH_WANT_STAT64 is defined, then
      we certainly don't want compat_sys_newfstatat either.
      Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
      Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] console_setup() depends (wrongly?) on CONFIG_PRINTK · 2ea1c539
      John Z. Bohach committed
      It appears that console_setup() code only gets compiled into the kernel if
      CONFIG_PRINTK is enabled.  One detrimental side-effect of this is that
      serial8250_console_setup() never gets invoked when CONFIG_PRINTK is not
      set, resulting in baud rate not being read/parsed from command line (i.e.
      console=ttyS0,115200n8 is ignored, at least the baud rate part...)
      
      Attached patch moves console_setup() code from inside
      
      #ifdef CONFIG_PRINTK
      
      to outside (in printk.c), removing dependence on said config. option.
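To make concrete what is lost when the option string is never parsed, here is a minimal userspace sketch that splits a `console=` value such as `ttyS0,115200n8` into device name, baud rate, parity and data bits. The helper `parse_console_arg()` and `struct console_opts` are hypothetical names for illustration only; this is not the kernel's console_setup() or serial8250_console_setup().

```c
#include <stdlib.h>
#include <string.h>

struct console_opts {
    char name[16];   /* device name, e.g. "ttyS0" */
    int  baud;       /* e.g. 115200; 0 if no options were given */
    char parity;     /* 'n', 'o' or 'e'; defaults to 'n' */
    int  bits;       /* data bits; defaults to 8 */
};

/* Hypothetical helper: parse "name[,baud[parity[bits]]]". */
static int parse_console_arg(const char *arg, struct console_opts *out)
{
    const char *comma = strchr(arg, ',');
    size_t name_len = comma ? (size_t)(comma - arg) : strlen(arg);
    char *end;

    if (name_len == 0 || name_len >= sizeof(out->name))
        return -1;
    memcpy(out->name, arg, name_len);
    out->name[name_len] = '\0';

    /* Defaults used when the option part is missing. */
    out->baud = 0;
    out->parity = 'n';
    out->bits = 8;
    if (!comma)
        return 0;

    out->baud = (int)strtol(comma + 1, &end, 10);
    if (*end) {
        out->parity = *end++;
        if (*end >= '5' && *end <= '8')
            out->bits = *end - '0';
    }
    return 0;
}
```

If this parsing step is compiled out, a caller only ever sees the defaults, which matches the symptom described above of the baud rate being ignored.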
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Updated Documentation/nfsroot.txt · 7e9dd124
      Nico Schottelius committed
      Today I booted my embedded device for the first time using Linux 2.6.15.2,
      which was booted by pxelinux and then booted itself from the nfsroot.
      
      This went fine, but when reading through Documentation/nfsroot.txt I saw
      that more modern ways of loading the kernel and passing parameters are
      available.
      Signed-off-by: Nico Schottelius <nico-kernel@schottelius.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mmc: Secure Digital Host Controller Interface driver · d129bceb
      Pierre Ossman committed
      Driver for the Secure Digital Host Controller Interface specification.
      Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Secure Digital Host Controller id and regs · 97f2478d
      Pierre Ossman committed
      Class code and register definitions for the Secure Digital Host Controller
      standard.
      Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(): use do_fsync() · 8f2e9f15
      Andrew Morton committed
      No need to duplicate all that code.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fsync: extract internal code · 18e79b40
      Andrew Morton committed
      Pull the guts out of do_fsync() - we can use it elsewhere.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync: fix return value · 676758bd
      Andrew Morton committed
      msync() does a strange thing.  Essentially:
      
      	vma = find_vma();
      	for ( ; ; ) {
      		if (!vma)
      			return -ENOMEM;
      		...
      		vma = vma->vm_next;
      	}
      
      so an msync() request which starts within or before a valid VMA and which ends
      within or beyond the final VMA will incorrectly return -ENOMEM.
      
      Fix.
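The loop shape quoted above can be reproduced in a small userspace model. The sketch below is an assumption-laden toy: `struct vma`, `find_vma()` and both walk functions are simplified stand-ins written for this illustration, holes between VMAs are ignored, and `msync_walk_fixed()` shows only one possible way to avoid the spurious error, not the change the patch actually makes.

```c
#include <errno.h>
#include <stddef.h>

/* Toy VMA: covers [start, end), linked in ascending address order. */
struct vma {
    unsigned long start, end;
    struct vma *next;
};

/* Return the first VMA whose end lies above addr, or NULL. */
static struct vma *find_vma(struct vma *head, unsigned long addr)
{
    for (; head; head = head->next)
        if (addr < head->end)
            return head;
    return NULL;
}

/* Mirrors the quoted loop: running off the list reports -ENOMEM. */
static int msync_walk_buggy(struct vma *head, unsigned long start,
                            unsigned long end)
{
    struct vma *vma = find_vma(head, start);

    for (;;) {
        if (!vma)
            return -ENOMEM;     /* also reached after the final VMA */
        if (end <= vma->end)
            return 0;
        vma = vma->next;
    }
}

/* One possible repair: fail only if no VMA is found at the start. */
static int msync_walk_fixed(struct vma *head, unsigned long start,
                            unsigned long end)
{
    struct vma *vma = find_vma(head, start);

    if (!vma)
        return -ENOMEM;
    for (; vma; vma = vma->next)
        if (end <= vma->end)
            break;
    return 0;
}
```

With a single VMA covering 0x1000..0x2000, a request for 0x1000..0x3000 makes the first walk return -ENOMEM after the last VMA has been processed, while the second returns 0.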
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(MS_SYNC): don't hold mmap_sem while syncing · 707c21c8
      Andrew Morton committed
      It seems bad to hold mmap_sem while performing synchronous disk I/O.  Alter
      the msync(MS_SYNC) code so that the lock is released while we sync the file.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(): perform dirty page levelling · 9c50823e
      Andrew Morton committed
      It seems sensible to perform dirty page throttling in msync: as the application
      dirties pages we can kick off pdflush early, or even force the msync() caller
      to perform writeout, or even throttle the msync() caller.
      
      The main effect of this is to start disk writeback earlier if we've just
      discovered that a large amount of pagecache has been dirtied.  (Otherwise it
      wouldn't happen for up to five seconds, next time pdflush wakes up).
      
      It also will cause the page-dirtying process to get penalised for dirtying
      those pages rather than whacking someone else with the problem.
      
      We should do this for munmap() and possibly even exit(), too.
      
      We drop the mmap_sem while performing the dirty page balancing.  It doesn't
      seem right to hold mmap_sem for that long.
      
      Note that this patch only affects MS_ASYNC.  MS_SYNC will be syncing all the
      dirty pages anyway.
      
      We note that msync(MS_SYNC) does a full-file-sync inside mmap_sem, and always
      has.  We can fix that up...
      
      The patch also tightens up the mmap_sem coverage in sys_msync(): no point in
      taking it while we perform the incoming arg checking.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] set_page_dirty() return value fixes · 4741c9fd
      Andrew Morton committed
      We need set_page_dirty() to return true if it actually transitioned the page
      from a clean to dirty state.  This wasn't right in a couple of places.  Do a
      kernel-wide audit, fix things up.
      
      This leaves open the possibility of returning a negative errno from
      set_page_dirty() sometime in the future.  But we don't do that at present.
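The return-value contract described above can be illustrated with a toy model. `struct toy_page`, `toy_set_page_dirty()` and `toy_dirty_range()` are hypothetical names for this sketch and have nothing to do with the kernel's real struct page; the point is only that a caller can count newly dirtied pages when the function reports true solely on a clean-to-dirty transition.

```c
#include <stdbool.h>

/* Toy page with a single dirty bit. */
struct toy_page {
    bool dirty;
};

/* Returns 1 only if the page went from clean to dirty, else 0. */
static int toy_set_page_dirty(struct toy_page *page)
{
    if (page->dirty)
        return 0;
    page->dirty = true;
    return 1;
}

/* Dirty n pages and report how many were newly dirtied. */
static int toy_dirty_range(struct toy_page *pages, int n)
{
    int newly_dirtied = 0;
    int i;

    for (i = 0; i < n; i++)
        newly_dirtied += toy_set_page_dirty(&pages[i]);
    return newly_dirtied;
}
```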
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] balance_dirty_pages_ratelimited: take nr_pages arg · fa5a734e
      Andrew Morton committed
      Modify balance_dirty_pages_ratelimited() so that it can take a
      number-of-pages-which-I-just-dirtied argument.  For msync().
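As a rough illustration of why a page-count argument is useful, the sketch below accumulates the number of pages a caller reports and triggers the (here simulated) expensive balancing step only when a threshold is crossed. `struct toy_ratelimit`, its fields and the threshold value are assumptions made for this example; the kernel's real bookkeeping is not shown in the commit message.

```c
/* Toy rate limiter: counts dirtied pages between balancing runs. */
struct toy_ratelimit {
    unsigned long pending;        /* pages dirtied since last balance */
    unsigned long limit;          /* threshold that triggers balancing */
    unsigned long balance_calls;  /* how often balancing was triggered */
};

/* Caller reports how many pages it has just dirtied. */
static void toy_balance_ratelimited_nr(struct toy_ratelimit *rl,
                                       unsigned long nr_pages_dirtied)
{
    rl->pending += nr_pages_dirtied;
    if (rl->pending >= rl->limit) {
        rl->pending = 0;
        rl->balance_calls++;      /* stands in for the real balancing */
    }
}
```

A caller such as msync() that dirties many pages at once can then report them in a single call instead of invoking a one-page variant in a loop.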
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] HOTPLUG_CPU: avoid hitting too many cachelines in recalc_bh_state() · 8a143426
      Eric Dumazet committed
      Instead of using for_each_cpu(i), we can use for_each_online_cpu(i).
      
      When a CPU goes offline (i.e. is removed from the online map), it might have
      a non-null bh_accounting.nr, so this patch adds a transfer of this counter
      to an online CPU counter.
      
      We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
      can do this bh_accounting.
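The idea of summing only over online CPUs while keeping the total correct can be sketched in userspace as follows. The array-based `struct toy_bh_state`, the fixed `TOY_NR_CPUS` and both functions are hypothetical simplifications for illustration; real per-CPU data and the hotplug notifier are not modelled.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy per-CPU counters plus an online map. */
struct toy_bh_state {
    long nr[TOY_NR_CPUS];
    bool online[TOY_NR_CPUS];
};

/* Sum counters of online CPUs only (fewer cachelines touched). */
static long toy_total_online(const struct toy_bh_state *s)
{
    long total = 0;
    int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (s->online[cpu])
            total += s->nr[cpu];
    return total;
}

/* On offline, move the dead CPU's count to an online CPU. */
static void toy_cpu_offline(struct toy_bh_state *s, int dead, int target)
{
    s->nr[target] += s->nr[dead];
    s->nr[dead] = 0;
    s->online[dead] = false;
}
```

Without the transfer step, the count held by the offlined CPU would silently drop out of the online-only sum.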
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sound: remove PC98-specific OPL3_HW_OPL3_PC98 · 2ecb9e63
      Arthur Othieno committed
      The OPL3_HW_OPL3_PC98 #define isn't used anywhere; it was previously used in
      sound/drivers/opl3/opl3_lib.c and sound/isa/cs423x/pc98.c, the latter of which
      went away with the rest of the PC98 subarch.
      Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] block: floppy98 removal, really. · 453ae933
      Arthur Othieno committed
      floppy98 went out together with the rest of PC98 subarch.  Remove stale
      Makefile entry that remained.
      Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] shmdt: check address alignment · df1e2fb5
      Hugh Dickins committed
      SUSv3 says the shmdt() function shall fail with EINVAL if the value of
      shmaddr is not the data segment start address of a shared memory segment:
      our sys_shmdt needs to reject a shmaddr which is not page-aligned.
      
      Does it have the potential to break existing apps?
      
      Hugh says
      
        "sys_shmdt() just does the wrong (unexpected) thing with a misaligned
        address: it'll fail on what you might expect it to succeed on, and only
        succeed on what it should definitely fail on.
      
        "That is, I think it behaves as if shmaddr gets rounded up, when the only
        understandable behaviour would be if it rounded it down.
      
        "Which does mean you'd have to be devious to see anything but EINVAL from
        a misaligned shmaddr there, so it's not terribly important."
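The alignment test itself is a one-line mask check. The sketch below is a userspace illustration under the assumption that the page size is a power of two; `toy_check_shmaddr()` is a hypothetical helper and not the code added to sys_shmdt.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Reject any address that is not a multiple of the page size. */
static int toy_check_shmaddr(const void *shmaddr)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);

    /* Low bits set means the address is not page-aligned. */
    if ((uintptr_t)shmaddr & (page_size - 1))
        return -EINVAL;
    return 0;
}
```

Rejecting the address up front avoids the implicit rounding behaviour described in the quote.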
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sb_set_blocksize cleanup · 38885bd4
      Coywolf Qi Hunt committed
      sb_set_blocksize() cleanup: make sb_set_blocksize() use blksize_bits().
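For readers unfamiliar with the helper, a block-size-to-bits conversion is just a base-2 logarithm of a power-of-two size. The following userspace sketch shows the idea; `toy_blksize_bits()`, `struct toy_sb` and `toy_set_blocksize()` are illustrative re-implementations under that assumption, not the kernel's own functions.

```c
/* Return log2(size) for a power-of-two block size. */
static unsigned int toy_blksize_bits(unsigned int size)
{
    unsigned int bits = 0;

    while ((1u << bits) < size)
        bits++;
    return bits;
}

/* Toy superblock holding a block size and its bit count. */
struct toy_sb {
    unsigned int blocksize;
    unsigned int blocksize_bits;
};

/* Store both values, deriving the bit count via the helper. */
static void toy_set_blocksize(struct toy_sb *sb, unsigned int size)
{
    sb->blocksize = size;
    sb->blocksize_bits = toy_blksize_bits(size);
}
```

Using one shared helper avoids each caller carrying its own open-coded shift loop.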
      Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] early_printk: cleanup trailing whitespace · a94ddf3a
      Randy Dunlap committed
      Remove all trailing tabs and spaces.  No other changes.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fadvise(): write commands · ebcf28e1
      Andrew Morton committed
      Add two new linux-specific fadvise() extensions:
      
      LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file
      offsets `offset' and `offset+len'.  Any pages which are currently under
      writeout are skipped, whether or not they are dirty.
      
      LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file
      offsets `offset' and `offset+len'.
      
      By combining these two operations the application may do several things:
      
      LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty
      pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all
      of the currently dirty pages at the disk, wait until they have been written.
      
      It should be noted that none of these operations write out the file's
      metadata.  So unless the application is strictly performing overwrites of
      already-instantiated disk blocks, there are no guarantees here that the data
      will be available after a crash.
      
      To complete this suite of operations I guess we should have a "sync file
      metadata only" operation.  This gives applications access to all the building
      blocks needed for all sorts of sync operations.  But sync-metadata doesn't fit
      well with the fadvise() interface.  Probably it should be a new syscall:
      sys_fmetadatasync().
      
      The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64().
      It is made to represent the last affected byte in the file (ie: it is
      inclusive).  Generally, all these byterange and pagerange functions are
      inclusive so we can easily represent EOF with -1.
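The inclusive convention can be sketched as a small helper. This is an illustration only: `toy_inclusive_endbyte()` is a hypothetical function, it assumes a non-negative offset, and it treats a zero length as "to end of file", which is an assumption of this sketch rather than something stated in the message above.

```c
#include <stdint.h>

/*
 * Convert (offset, len) into an inclusive last-byte index.
 * Returns -1 to mean "through end of file".
 */
static int64_t toy_inclusive_endbyte(int64_t offset, int64_t len)
{
    /* Zero length or arithmetic overflow: cover the rest of the file. */
    if (len <= 0 || len > INT64_MAX - offset)
        return -1;
    return offset + len - 1;
}
```

With an exclusive end, "end of file" would need a separate flag or a maximum value; with an inclusive end, -1 is free to serve as that sentinel.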
      
      As Ulrich notes, these two functions are somewhat abusive of the fadvise()
      concept, which appears to be "set the future policy for this fd".
      
      But these commands are a perfect fit with the fadvise() implementation, and
      several of the existing fadvise() commands are synchronous and don't affect
      future policy either.   I think we can live with the slight incongruity.
      
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] filemap_fdatawrite_range() api: clarify -end parameter · 469eb4d0
      Andrew Morton committed
      I had trouble working out whether filemap_fdatawrite_range()'s
      `end' parameter describes the last-byte-to-be-written or the last-plus-one.
      Clarify that in comments.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] CONFIG_UNWIND_INFO · 604bf5a2
      Jan Beulich committed
      As a foundation for reliable stack unwinding, this adds a config option
      (available to all architectures except IA64 and those where the module
      loader might have problems with the resulting relocations) to enable the
      generation of frame unwind information.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] abstract type/size specification for assembly · ab7efcc9
      Jan Beulich committed
      Provide abstraction for generating type and size information of assembly
      routines and data, while permitting architectures to override these
      defaults.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: "Russell King" <rmk@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "Andi Kleen" <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fast ext3_statfs · 09fe316a
      Alex Tomas committed
      Under I/O load it may take up to a dozen seconds to read all group
      descriptors.  This is what ext3_statfs() does.  At the same time, we already
      maintain global numbers of free inodes/blocks.  Why don't we use them instead
      of group reading and summing?
      
      Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] remove ipmi pm_power_off redefinition · e933b6d6
      Olaf Hering committed
      Use the global define of pm_power_off
      Signed-off-by: Olaf Hering <olh@suse.de>
      Cc: Corey Minyard <minyard@acm.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] isofs: remove unused debugging macros · 5b3cf3e0
      Pekka Enberg committed
      Remove unused debugging macros from isofs.  The referred debug functions do
      not exist in the kernel.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] s/;;/;/g · 53b3531b
      Alexey Dobriyan committed
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: remove useless local variable initialization · 29afd49b
      Paul Jackson committed
      Remove a useless variable initialization in cpuset __cpuset_zone_allowed().
      The local variable 'allowed' is unconditionally set before use, later on
      in the code, so does not need to be initialized.
      
      Not that it seems to matter to the code generated any, as the compiler
      optimizes out the superfluous assignment anyway.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: memory_spread_slab drop useless PF_SPREAD_PAGE check · b2455396
      Paul Jackson committed
      The hook in the slab cache allocation path to handle cpuset memory
      spreading for tasks in cpusets with 'memory_spread_slab' enabled has a
      modest performance bug.  The hook calls into the memory spreading handler
      alternate_node_alloc() if either of 'memory_spread_slab' or
      'memory_spread_page' is enabled, even though the handler does nothing
      (albeit harmlessly) for the page case.
      
      Fix - drop PF_SPREAD_PAGE from the set of flag bits that are used to
      trigger a call to alternate_node_alloc().
      
      The page case is handled by separate hooks -- see the calls conditioned on
      cpuset_do_page_mem_spread() in mm/filemap.c
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: don't need to mark cpuset_mems_generation atomic · 151a4420
      Paul Jackson committed
      Drop the atomic_t marking on the cpuset static global
      cpuset_mems_generation.  Since all access to it is guarded by the global
      manage_mutex, there is no need for further serialization of this value.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: remove unnecessary NULL check · 8488bc35
      Paul Jackson committed
      Remove a no longer needed test for NULL cpuset pointer, with a little
      comment explaining why the test isn't needed.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread slab cache hooks · b0196009
      Paul Jackson committed
      Change the kmem_cache_create calls for certain slab caches to support cpuset
      memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
      for memory spreading.
      
      The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
      caches, and buffer_head.  This list may change over time.  In particular,
      other file system types that are used extensively on large NUMA systems may
      want to allow for spreading their directory and inode slab cache entries.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread slab cache optimizations · c61afb18
      Paul Jackson committed
      The hooks in the slab cache allocator code path for support of NUMA
      mempolicies and cpuset memory spreading are in an important code path.  Many
      systems will use neither feature.
      
      This patch optimizes those hooks down to a single check of some bits in the
      current task's task_struct flags.  For non-NUMA systems, this hook and related
      code is already ifdef'd out.
      
      The optimization is done by using another task flag, set if the task is using
      a non-default NUMA mempolicy.  Taking this flag bit along with the
      PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
      memory spreading' patch set, one can check for the combination of any of these
      special case memory placement mechanisms with a single test of the current
      task's task_struct flags.
      
      This patch also tightens up the code, to save a few bytes of kernel text
      space, and moves some of it out of line.  Due to the nested inlines called
      from multiple places, we were ending up with three copies of this code, which
      once we get off the main code path (for local node allocation) seems a bit
      wasteful of instruction memory.
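The single-test optimization can be pictured as one mask-and-compare on the task flags word. In the sketch below the flag names and bit values (`TOY_PF_SPREAD_PAGE`, `TOY_PF_SPREAD_SLAB`, `TOY_PF_MEMPOLICY`) are made up for illustration and do not correspond to the kernel's actual definitions; only the shape of the check is the point.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real values live in the kernel headers. */
#define TOY_PF_SPREAD_PAGE 0x1u
#define TOY_PF_SPREAD_SLAB 0x2u
#define TOY_PF_MEMPOLICY   0x4u  /* task uses a non-default mempolicy */

#define TOY_PF_PLACEMENT_MASK \
    (TOY_PF_SPREAD_PAGE | TOY_PF_SPREAD_SLAB | TOY_PF_MEMPOLICY)

/*
 * One test on the hot path: only tasks with any special placement
 * flag set leave the fast local-node allocation path.
 */
static bool toy_needs_alternate_node(unsigned int task_flags)
{
    return (task_flags & TOY_PF_PLACEMENT_MASK) != 0;
}
```

Tasks that use none of these mechanisms pay for a single AND and branch, and the detailed decision logic can live out of line.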
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread slab cache implementation · 101a5001
      Paul Jackson committed
      Provide the slab cache infrastructure to support cpuset memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading.
      
      This patch provides a slab cache SLAB_MEM_SPREAD flag.  If set in the
      kmem_cache_create() call defining a slab cache, then any task marked with the
      process state flag PF_MEMSPREAD will spread memory page allocations for that
      cache over all the allowed nodes, instead of preferring the local (faulting)
      node.
      
      On systems not configured with CONFIG_NUMA, this results in no change to the
      page allocation code path for slab caches.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current task's cpuset, this adds a call to a
      cpuset routine and failed bit test of the process state flag PF_SPREAD_SLAB.
      
      For tasks so marked, a second inline test is done for the slab cache flag
      SLAB_MEM_SPREAD, and if that is set and if the allocation is not
      in_interrupt(), this adds a call to a cpuset routine that computes which of
      the task's mems_allowed nodes should be preferred for this allocation.
      
      ==> This patch adds another hook into the performance critical
          code path to allocating objects from the slab cache, in the
          ____cache_alloc() chunk, below.  The next patch optimizes this
          hook, reducing the impact of the combined mempolicy plus memory
          spreading hooks on this critical code path to a single check
          against the task's task_struct flags word.
      
      This patch provides the generic slab flags and logic needed to apply memory
      spreading to a particular slab.
      
      A subsequent patch will mark a few specific slab caches for this placement
      policy.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread: slab cache format · fffb60f9
      Paul Jackson committed
      Rewrap the overly long source code lines resulting from the previous
      patch's addition of the slab cache flag SLAB_MEM_SPREAD.  This patch
      contains only formatting changes, and no function change.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread: slab cache filesystems · 4b6a9316
      Paul Jackson committed
      Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
      memory spreading.
      
      If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
      in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
      from such a slab cache, the allocations are spread evenly over all the
      memory nodes (task->mems_allowed) allowed to that task, instead of favoring
      allocation on the node local to the current cpu.
      
      The following inode and similar caches are marked SLAB_MEM_SPREAD:
      
          file                               cache
          ====                               =====
          fs/adfs/super.c                    adfs_inode_cache
          fs/affs/super.c                    affs_inode_cache
          fs/befs/linuxvfs.c                 befs_inode_cache
          fs/bfs/inode.c                     bfs_inode_cache
          fs/block_dev.c                     bdev_cache
          fs/cifs/cifsfs.c                   cifs_inode_cache
          fs/coda/inode.c                    coda_inode_cache
          fs/dquot.c                         dquot
          fs/efs/super.c                     efs_inode_cache
          fs/ext2/super.c                    ext2_inode_cache
          fs/ext2/xattr.c (fs/mbcache.c)     ext2_xattr
          fs/ext3/super.c                    ext3_inode_cache
          fs/ext3/xattr.c (fs/mbcache.c)     ext3_xattr
          fs/fat/cache.c                     fat_cache
          fs/fat/inode.c                     fat_inode_cache
          fs/freevxfs/vxfs_super.c           vxfs_inode
          fs/hpfs/super.c                    hpfs_inode_cache
          fs/isofs/inode.c                   isofs_inode_cache
          fs/jffs/inode-v23.c                jffs_fm
          fs/jffs2/super.c                   jffs2_i
          fs/jfs/super.c                     jfs_ip
          fs/minix/inode.c                   minix_inode_cache
          fs/ncpfs/inode.c                   ncp_inode_cache
          fs/nfs/direct.c                    nfs_direct_cache
          fs/nfs/inode.c                     nfs_inode_cache
          fs/ntfs/super.c                    ntfs_big_inode_cache_name
          fs/ntfs/super.c                    ntfs_inode_cache
          fs/ocfs2/dlm/dlmfs.c               dlmfs_inode_cache
          fs/ocfs2/super.c                   ocfs2_inode_cache
          fs/proc/inode.c                    proc_inode_cache
          fs/qnx4/inode.c                    qnx4_inode_cache
          fs/reiserfs/super.c                reiser_inode_cache
          fs/romfs/inode.c                   romfs_inode_cache
          fs/smbfs/inode.c                   smb_inode_cache
          fs/sysv/inode.c                    sysv_inode_cache
          fs/udf/super.c                     udf_inode_cache
          fs/ufs/super.c                     ufs_inode_cache
          net/socket.c                       sock_inode_cache
          net/sunrpc/rpc_pipe.c              rpc_inode_cache
      
      The choice of which slab caches to so mark was quite simple.  I marked
      those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
      inode_cache, and buffer_head, which were marked in a previous patch.  Even
      though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
      potentially large file system i/o related slab caches as we need for memory
      spreading.
      
      Given that the rule now becomes "wherever you would have used a
      SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
      the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
      Future file system writers will just copy one of the existing file system
      slab cache setups and tend to get it right without thinking.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread page cache implementation and hooks · 44110fe3
      Paul Jackson committed
      Change the page cache allocation calls to support cpuset memory spreading.
      
      See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
      spreading.
      
      On systems without cpusets configured in the kernel, this is no change.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current task's cpuset, this adds a call to a
      cpuset routine and failed bit test of the process state flag PF_SPREAD_PAGE.
      
      On tasks in cpusets with "memory_spread" enabled, this adds a call to a cpuset
      routine that computes which of the task's mems_allowed nodes should be
      preferred for this allocation.
      
      If memory spreading applies to a particular allocation, then any other NUMA
      mempolicy does not apply.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread basic implementation · 825a46af
      Paul Jackson committed
      This patch provides the implementation and cpuset interface for an alternative
      memory allocation policy that can be applied to certain kinds of memory
      allocations, such as the page cache (file system buffers) and some slab caches
      (such as inode caches).
      
      The policy is called "memory spreading." If enabled, it spreads out these
      kinds of memory allocations over all the nodes allowed to a task, instead of
      preferring to place them on the node where the task is executing.
      
      All other kinds of allocations, including anonymous pages for a task's stack
      and data regions, are not affected by this policy choice, and continue to be
      allocated preferring the node local to execution, as modified by the NUMA
      mempolicy.
      
      There are two boolean flag files per cpuset that control where the kernel
      allocates pages for the file system buffers and related in kernel data
      structures.  They are called 'memory_spread_page' and 'memory_spread_slab'.
      
      If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
      kernel will spread the file system buffers (page cache) evenly over all the
      nodes that the faulting task is allowed to use, instead of preferring to put
      those pages on the node where the task is running.
      
      If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
      kernel will spread some file system related slab caches, such as for inodes
      and dentries evenly over all the nodes that the faulting task is allowed to
      use, instead of preferring to put those pages on the node where the task is
      running.
      
      The implementation is simple.  Setting the cpuset flags 'memory_spread_page'
      or 'memory_spread_slab' turns on the per-process flags PF_SPREAD_PAGE or
      PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
      subsequently joins that cpuset.  In subsequent patches, the page allocation
      calls for the affected page cache and slab caches are modified to perform an
      inline check for these flags, and if set, a call to a new routine
      cpuset_mem_spread_node() returns the node to prefer for the allocation.
      
      The cpuset_mem_spread_node() routine is also simple.  It uses the value of a
      per-task rotor cpuset_mem_spread_rotor to select the next node in the current
      task's mems_allowed to prefer for the allocation.
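The rotor idea amounts to round-robin selection over the set bits of a node mask. The sketch below models it in userspace with a 64-bit mask; `struct toy_task`, `TOY_MAX_NODES` and `toy_mem_spread_node()` are hypothetical names chosen for this example, and the kernel's nodemask helpers are deliberately not used.

```c
#include <stdint.h>

#define TOY_MAX_NODES 64

/* Toy task: a bitmask of allowed nodes and a per-task rotor. */
struct toy_task {
    uint64_t mems_allowed;  /* bit n set means node n is allowed */
    int rotor;              /* last node handed out, 0..63 */
};

/*
 * Advance the rotor to the next allowed node, wrapping around,
 * and return it. Returns -1 if no node is allowed at all.
 */
static int toy_mem_spread_node(struct toy_task *t)
{
    int i;

    for (i = 1; i <= TOY_MAX_NODES; i++) {
        int node = (t->rotor + i) % TOY_MAX_NODES;

        if (t->mems_allowed & (UINT64_C(1) << node)) {
            t->rotor = node;
            return node;
        }
    }
    return -1;
}
```

With nodes 1, 3 and 6 allowed and the rotor starting at 0, successive calls hand out 1, 3, 6 and then wrap back to 1, which is the even spreading the message describes.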
      
      This policy can provide substantial improvements for jobs that need to place
      thread local data on the corresponding node, but that need to access large
      file system data sets that need to be spread across the several nodes in the
      job's cpuset in order to fit.  Without this patch, especially for jobs that
      might have one thread reading in the data set, the memory allocation across
      the nodes in the job's cpuset can become very uneven.
      
      A couple of Copyright year ranges are updated as well.  And a couple of email
      addresses that can be found in the MAINTAINERS file are removed.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>