1. 23 8月, 2007 5 次提交
    • S
      slab: skip calling cache_free_alien() when the platform is not numa capable · 1807a1aa
      Siddha, Suresh B 提交于
      Skip calling cache_free_alien() when the platform is not numa capable.
      This will avoid cache misses that happen while accessing slabp (which is
      per page memory reference) to get nodeid.  Instead use a global variable to
      skip the call, which is mostly likely to be present in the cache.
      
      This gives a 0.8% performance boost with the database oltp workload on a
      quad-core SMP platform and by any means the number is not small :)
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1807a1aa
    • A
      fix NULL pointer dereference in __vm_enough_memory() · 34b4e4aa
      Alan Cox 提交于
      The new exec code inserts an accounted vma into an mm struct which is not
      current->mm.  The existing memory check code has a hard coded assumption
      that this does not happen as does the security code.
      
      As the correct mm is known we pass the mm to the security method and the
      helper function.  A new security test is added for the case where we need
      to pass the mm and the existing one is modified to pass current->mm to
      avoid the need to change large amounts of code.
      
      (Thanks to Tobias for fixing rejects and testing)
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Cc: James Morris <jmorris@redhat.com>
      Cc: Tobias Diedrich <ranma+kernel@tdiedrich.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34b4e4aa
    • A
      synchronous lumpy reclaim: wait for page writeback when directly reclaiming contiguous areas · c661b078
      Andy Whitcroft 提交于
      Lumpy reclaim works by selecting a lead page from the LRU list and then
      selecting pages for reclaim from the order-aligned area of pages.  In the
      situation were all pages in that region are inactive and not referenced by any
      process over time, it works well.
      
      In the situation where there is even light load on the system, the pages may
      not free quickly.  Out of a area of 1024 pages, maybe only 950 of them are
      freed when the allocation attempt occurs because lumpy reclaim returned early.
       This patch alters the behaviour of direct reclaim for large contiguous
      blocks.
      
      The first attempt to call shrink_page_list() is asynchronous but if it fails,
      the pages are submitted a second time and the calling process waits for the IO
      to complete.  This may stall allocators waiting for contiguous memory but that
      should be expected behaviour for high-order users.  It is preferable behaviour
      to potentially queueing unnecessary areas for IO.  Note that kswapd will not
      stall in this fashion.
      
      [apw@shadowen.org: update to version 2]
      [apw@shadowen.org: update to version 3]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c661b078
    • A
      synchronous lumpy reclaim: ensure we count pages transitioning inactive via clear_active_flags · e9187bdc
      Andy Whitcroft 提交于
      As pointed out by Mel when reclaim is applied at higher orders a significant
      amount of IO may be started.  As this takes finite time to drain reclaim will
      consider more areas than ultimatly needed to satisfy the request.  This leads
      to more reclaim than strictly required and reduced success rates.
      
      I was able to confirm Mel's test results on systems locally.  These show that
      even under light load the success rates drop off far more than expected.
      Testing with a modified version of his patch (which follows) I was able to
      allocate almost all of ZONE_MOVABLE with a near idle system.  I ran 5 test
      passes sequentially following system boot (the system has 29 hugepages in
      ZONE_MOVABLE):
      
        2.6.23-rc1              11  8  6  7  7
        sync_lumpy              28 28 29 29 26
      
      These show that although hugely better than the near 0% success normally
      expected we can only allocate about a 1/4 of the zone.  Using synchronous
      reclaim for these allocations we get close to 100% as expected.
      
      I have also run our standard high order tests and these show no regressions in
      allocation success rates at rest, and some significant improvements under
      load.
      
      This patch:
      
      We are transitioning pages from active to inactive in clear_active_flags,
      those need counting as PGDEACTIVATE vm events.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9187bdc
    • A
      sparsemem: ensure we initialise the node mapping for SPARSEMEM_STATIC · 85770ffe
      Andy Whitcroft 提交于
      Booting SPARSEMEM on NUMA systems trips a BUG in page_alloc.c:
      
      	Initializing HighMem for node 0 (00038000:00100000)
      	Initializing HighMem for node 1 (00100000:001ffe00)
      	------------[ cut here ]------------
      	kernel BUG at /home/apw/git/linux-2.6/mm/page_alloc.c:456!
      	[...]
      
      This occurs because the section to node id mapping is not being
      setup correctly during init under SPARSEMEM_STATIC, leading to an
      attempt to free pages from all nodes into the zones on node 0.
      
      When the zone_table[] was removed in the following commit, a new
      section to node mapping table was introduced:
      
          commit 89689ae7
          [PATCH] Get rid of zone_table[]
      
      That conversion inadvertantly only initialised the node mapping in
      SPARSEMEM_EXTREME.  Ensure we initialise the node mapping in
      SPARSEMEM_STATIC.
      
      [akpm@linux-foundation.org: make the stubs static inline]
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85770ffe
  2. 12 8月, 2007 2 次提交
  3. 10 8月, 2007 2 次提交
    • C
      SLUB: Fix dynamic dma kmalloc cache creation · 1ceef402
      Christoph Lameter 提交于
      The dynamic dma kmalloc creation can run into trouble if a
      GFP_ATOMIC allocation is the first one performed for a certain size
      of dma kmalloc slab.
      
      - Move the adding of the slab to sysfs into a workqueue
        (sysfs does GFP_KERNEL allocations)
      - Do not call kmem_cache_destroy() (uses slub_lock)
      - Only acquire the slub_lock once and--if we cannot wait--do a trylock.
      
        This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
        for a range of sizes failing due to another process holding the slub_lock.
        However, we only need to acquire the spinlock once in order to establish
        each power of two DMA kmalloc cache. The possible conflict is with the
        slub_lock taken during slab management actions (create / remove slab cache).
      
        It is rather typical that a driver will first fill its buffers using
        GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
        Drivers will also create its slab caches first outside of an atomic
        context before starting to use atomic kmalloc from an interrupt context.
      
        If there are any failures then they will occur early after boot or when
        loading of multiple drivers concurrently. Drivers can already accomodate
        failures of GFP_ATOMIC for other reasons. Retries will then create the slab.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      1ceef402
    • C
      SLUB: Remove checks for MAX_PARTIAL from kmem_cache_shrink · fcda3d89
      Christoph Lameter 提交于
      The MAX_PARTIAL checks were supposed to be an optimization. However, slab
      shrinking is a manually triggered process either through running slabinfo
      or by the kernel calling kmem_cache_shrink.
      
      If one really wants to shrink a slab then all operations should be done
      regardless of the size of the partial list. This also fixes an issue that
      could surface if the number of partial slabs was initially above MAX_PARTIAL
      in kmem_cache_shrink and later drops below MAX_PARTIAL through the
      elimination of empty slabs on the partial list (rare). In that case a few
      slabs may be left off the partial list (and only be put back when they
      are empty).
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      fcda3d89
  4. 01 8月, 2007 3 次提交
  5. 31 7月, 2007 2 次提交
  6. 30 7月, 2007 3 次提交
    • A
      Remove fs.h from mm.h · 4e950f6f
      Alexey Dobriyan 提交于
      Remove fs.h from mm.h. For this,
       1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
       2) Add back fs.h or less bloated headers (err.h) to files that need it.
      
      As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files
      rebuilt down to 3444 (-12.3%).
      
      Cross-compile tested without regressions on my two usual configs and (sigh):
      
      alpha              arm-mx1ads        mips-bigsur          powerpc-ebony
      alpha-allnoconfig  arm-neponset      mips-capcella        powerpc-g5
      alpha-defconfig    arm-netwinder     mips-cobalt          powerpc-holly
      alpha-up           arm-netx          mips-db1000          powerpc-iseries
      arm                arm-ns9xxx        mips-db1100          powerpc-linkstation
      arm-assabet        arm-omap_h2_1610  mips-db1200          powerpc-lite5200
      arm-at91rm9200dk   arm-onearm        mips-db1500          powerpc-maple
      arm-at91rm9200ek   arm-picotux200    mips-db1550          powerpc-mpc7448_hpc2
      arm-at91sam9260ek  arm-pleb          mips-ddb5477         powerpc-mpc8272_ads
      arm-at91sam9261ek  arm-pnx4008       mips-decstation      powerpc-mpc8313_rdb
      arm-at91sam9263ek  arm-pxa255-idp    mips-e55             powerpc-mpc832x_mds
      arm-at91sam9rlek   arm-realview      mips-emma2rh         powerpc-mpc832x_rdb
      arm-ateb9200       arm-realview-smp  mips-excite          powerpc-mpc834x_itx
      arm-badge4         arm-rpc           mips-fulong          powerpc-mpc834x_itxgp
      arm-carmeva        arm-s3c2410       mips-ip22            powerpc-mpc834x_mds
      arm-cerfcube       arm-shannon       mips-ip27            powerpc-mpc836x_mds
      arm-clps7500       arm-shark         mips-ip32            powerpc-mpc8540_ads
      arm-collie         arm-simpad        mips-jazz            powerpc-mpc8544_ds
      arm-corgi          arm-spitz         mips-jmr3927         powerpc-mpc8560_ads
      arm-csb337         arm-trizeps4      mips-malta           powerpc-mpc8568mds
      arm-csb637         arm-versatile     mips-mipssim         powerpc-mpc85xx_cds
      arm-ebsa110        i386              mips-mpc30x          powerpc-mpc8641_hpcn
      arm-edb7211        i386-allnoconfig  mips-msp71xx         powerpc-mpc866_ads
      arm-em_x270        i386-defconfig    mips-ocelot          powerpc-mpc885_ads
      arm-ep93xx         i386-up           mips-pb1100          powerpc-pasemi
      arm-footbridge     ia64              mips-pb1500          powerpc-pmac32
      arm-fortunet       ia64-allnoconfig  mips-pb1550          powerpc-ppc64
      arm-h3600          ia64-bigsur       mips-pnx8550-jbs     powerpc-prpmc2800
      arm-h7201          ia64-defconfig    mips-pnx8550-stb810  powerpc-ps3
      arm-h7202          ia64-gensparse    mips-qemu            powerpc-pseries
      arm-hackkit        ia64-sim          mips-rbhma4200       powerpc-up
      arm-integrator     ia64-sn2          mips-rbhma4500       s390
      arm-iop13xx        ia64-tiger        mips-rm200           s390-allnoconfig
      arm-iop32x         ia64-up           mips-sb1250-swarm    s390-defconfig
      arm-iop33x         ia64-zx1          mips-sead            s390-up
      arm-ixp2000        m68k              mips-tb0219          sparc
      arm-ixp23xx        m68k-amiga        mips-tb0226          sparc-allnoconfig
      arm-ixp4xx         m68k-apollo       mips-tb0287          sparc-defconfig
      arm-jornada720     m68k-atari        mips-workpad         sparc-up
      arm-kafa           m68k-bvme6000     mips-wrppmc          sparc64
      arm-kb9202         m68k-hp300        mips-yosemite        sparc64-allnoconfig
      arm-ks8695         m68k-mac          parisc               sparc64-defconfig
      arm-lart           m68k-mvme147      parisc-allnoconfig   sparc64-up
      arm-lpd270         m68k-mvme16x      parisc-defconfig     um-x86_64
      arm-lpd7a400       m68k-q40          parisc-up            x86_64
      arm-lpd7a404       m68k-sun3         powerpc              x86_64-allnoconfig
      arm-lubbock        m68k-sun3x        powerpc-cell         x86_64-defconfig
      arm-lusl7200       mips              powerpc-celleb       x86_64-up
      arm-mainstone      mips-atlas        powerpc-chrp32
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e950f6f
    • R
      Introduce CONFIG_SUSPEND for suspend-to-Ram and standby · 296699de
      Rafael J. Wysocki 提交于
      Introduce CONFIG_SUSPEND representing the ability to enter system sleep
      states, such as the ACPI S3 state, and allow the user to choose SUSPEND
      and HIBERNATION independently of each other.
      
      Make HOTPLUG_CPU be selected automatically if SUSPEND or HIBERNATION has
      been chosen and the kernel is intended for SMP systems.
      
      Also, introduce CONFIG_PM_SLEEP which is automatically selected if
      CONFIG_SUSPEND or CONFIG_HIBERNATION is set and use it to select the
      code needed for both suspend and hibernation.
      
      The top-level power management headers and the ACPI code related to
      suspend and hibernation are modified to use the new definitions (the
      changes in drivers/acpi/sleep/main.c are, mostly, moving code to reduce
      the number of ifdefs).
      
      There are many other files in which CONFIG_PM can be replaced with
      CONFIG_PM_SLEEP or even with CONFIG_SUSPEND, but they can be updated in
      the future.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      296699de
    • R
      Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION · b0cb1a19
      Rafael J. Wysocki 提交于
      Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
      confusion (among other things, with CONFIG_SUSPEND introduced in the
      next patch).
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0cb1a19
  7. 27 7月, 2007 3 次提交
  8. 25 7月, 2007 2 次提交
  9. 24 7月, 2007 1 次提交
  10. 23 7月, 2007 1 次提交
  11. 22 7月, 2007 3 次提交
  12. 20 7月, 2007 13 次提交