提交 · 37682177af68478fa83429b735fa16913c2fbb2b · openeuler / Kernel

25 5月, 2010 40 次提交

asm-generic: don't warn that atomic_t is only 24 bit · 37682177

由 Peter Fritzsche 提交于 5月 24, 2010

32-bit Sparc used to only allow usage of 24-bit of it's atomic_t type.
This was corrected with linux 2.6.3 when Keith M Wesolowski changed the
implementation to use the parisc approach of having an array of spinlocks
to protect the atomic_t.

These warnings were also removed from the sparc implementation when the
new implementation was merged in BKrev:402e4949VThdc6D3iaosSFUgabMfvw, but
the warning still remained in some other places without any 24-bit-only
atomic_t implementation inside the kernel.

We should remove these warnings to allow users to rely on the full 32-bit
range of atomic_t.
Signed-off-by: NPeter Fritzsche <peter.fritzsche@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37682177

kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn · fc62f2f1

由 Joe Perches 提交于 5月 24, 2010

The current logging macros are
pr_<level>, dev_<level>, netdev_<level>, and netif_<level>.
pr_ uses warning, the other use warn.

Standardize these logging macros a bit more by adding pr_warn and
pr_warn_ratelimited.

Right now, there are:

$ for level in emerg alert crit err warn warning notice info ; do \
    for prefix in pr dev netdev netif ; do \
      echo -n "${prefix}_${level}:	`git grep -w "${prefix}_${level}" | wc -l`	" ; \
    done ; \
    echo ; \
  done
pr_emerg: 	45	dev_emerg: 	4	netdev_emerg: 	1	netif_emerg: 	4
pr_alert: 	24	dev_alert: 	36	netdev_alert: 	1	netif_alert: 	6
pr_crit: 	24	dev_crit: 	22	netdev_crit: 	1	netif_crit: 	4
pr_err:  	2013	dev_err: 	8467	netdev_err: 	267	netif_err: 	240
pr_warn: 	0	dev_warn: 	1818	netdev_warn: 	126	netif_warn: 	23
pr_warning: 	773	dev_warning: 	0	netdev_warning:	0	netif_warning: 	0
pr_notice: 	148	dev_notice: 	111	netdev_notice: 	9	netif_notice: 	3
pr_info: 	1717	dev_info: 	3007	netdev_info: 	101	netif_info: 	85
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc62f2f1

ntfs: use add_to_page_cache_lru() · 4c99000a

由 Minchan Kim 提交于 5月 24, 2010

Quote from Nick piggin's about btrfs patch
- http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html.

"add_to_page_cache_lru is exported, so it should be used. Benefits over
using a private pagevec: neater code, 128 bytes fewer stack used, percpu
lru ordering is preserved, and finally don't need to flush pagevec
before returning so batching may be shared with other LRU insertions."

Let's use it instead of private pagevec in ntfs, too.
Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
Acked-by: NAnton Altaparmakov <aia21@cantab.net>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c99000a

ntfs: clean up ntfs_attr_extend_initialized · 2ec93b0b

由 Minchan Kim 提交于 5月 24, 2010

cached_page and lru_pvec have not been used.  Let's remove the arguments.
Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
Acked-by: NAnton Altaparmakov <aia21@cantab.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ec93b0b

sunrpc: use formatting of module name in SUNRPC · ef7ffe8f

由 Alex Riesen 提交于 5月 24, 2010

gcc-4.3.3 produces the warning:
  "format not a string literal and no format arguments"
Signed-off-by: NAlex Riesen <raa.lkml@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: NTom Talpey <tmtalpey@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef7ffe8f

hvsi: fix messed up error checking getting state name · 08a82c68

由 Phil Carmody 提交于 5月 24, 2010

Handle out-of-range indices before reading what they refer to.  And don't
access the one-past-the-end element of the array either.
Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08a82c68

kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN · 4be929be

由 Alexey Dobriyan 提交于 5月 24, 2010

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4be929be

hangcheck-timer: fix x86_32 bugs · 940370fc

由 Yury Polyanskiy 提交于 5月 24, 2010

drivers/char/hangcheck-timer.c is doubly broken.  When the overflown value
of TIMER_FREQ is abnormally low, it spams the syslog with KERN_CRIT
messages "Hangcheck: hangcheck value past margin!" But whether it happens
or not depends on HZ and lpj in a complex way.  People have hit it
occasionally as far as google search can tell.

First, the following line overflows unsigned long:

# define TIMER_FREQ (HZ*loops_per_jiffy)

Second, and more importantly, loops_per_jiffy has little to do with the
con= version from the the time scale of get_cycles() (aka rdtsc) to the
time scale of jiffies.

The attached patch resolves both of the problems.
Acked-by: NJoel Becker <joel.becker@oracle.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

940370fc

endian: #define __BYTE_ORDER · b3b77c8c

由 Joakim Tjernlund 提交于 5月 24, 2010

Linux does not define __BYTE_ORDER in its endian header files which makes
some header files bend backwards to get at the current endian.  Lets
#define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for
header files that are used in user space too.

In userspace the convention is that

  1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined,
  2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN.
Signed-off-by: NJoakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b3b77c8c

err.h: add __must_check to error pointer handlers · e47103b1

由 Jani Nikula 提交于 5月 24, 2010

Add __must_check to error pointer handlers to have the compiler warn about
mistakes like:

	if (err)
		ERR_PTR(err);

It found two bugs:

Mar 12 Nikula Jani [PATCH] enclosure: fix error path - actually return ERR_PTR() on error
Mar 12 Nikula Jani [PATCH] sunrpc: fix error path - actually return ERR_PTR() on error
Signed-off-by: NJani Nikula <ext-jani.1.nikula@nokia.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e47103b1

cpuidle: add a repeating pattern detector to the menu governor · 1f85f87d

由 Arjan van de Ven 提交于 5月 24, 2010

Currently, the menu governor uses the (corrected) next timer as key item
for predicting the idle duration.

It turns out that there are specific cases where this breaks down: There
are cases where we have a very repetitive pattern of idle durations, where
the idle period is pretty much the same, for reasons completely unrelated
to the next timer event.  Examples of such repeating patterns are network
loads with irq mitigation, the mouse moving but in theory also the wifi
beacons.

This patch adds a relatively simple detector for such repeating patterns,
where the standard deviation of the last 8 idle periods is compared to a
threshold.

With this extra predictor in place, measurements show that the DECAY
factor can now be increased (the decaying average will now decay slower)
to get an even more stable result.

[arjan@infradead.org: fix bug identified by Frank]
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f85f87d

mn10300: set ARCH_KMALLOC_MINALIGN · 6cdafaae

由 FUJITA Tomonori 提交于 5月 24, 2010

Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the
buffer doesn't share a cache with the others.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6cdafaae

mn10300: use generic atomic.h · 36683161

由 Mathieu Desnoyers 提交于 5月 24, 2010

asm-generic/atomic.h has been derived from the mn10300 implementation.
Remove the now duplicated mn10300 implementation by including the generic
version instead.

This adds cmpxchg_local() and cmpxchg64_local() for free to the
architecture, as they are implemented in asm-generic/atomic.h.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NPeter Fritzsche <peter.fritzsche@gmx.de>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Keith M Wesolowski <wesolows@foobazco.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

36683161

m68knommu: fix broken use of BUAD_TABLE_SIZE in 68328serial driver · e9a137cb

由 Greg Ungerer 提交于 5月 24, 2010

Commit 8b505ca8 ("serial: 68328serial.c:
remove BAUD_TABLE_SIZE macro") misses one use of BAUD_TABLE_SIZE.  So the
resulting 68328serial.c does not compile:

drivers/serial/68328serial.c: In function `m68328_console_setup':
drivers/serial/68328serial.c:1439: error: `BAUD_TABLE_SIZE' undeclared (first use in this function)
drivers/serial/68328serial.c:1439: error: (Each undeclared identifier is reported only once
drivers/serial/68328serial.c:1439: error: for each function it appears in.)

Fix that last use of it.
Signed-off-by: NGreg Ungerer <gerg@uclinux.org>
Cc: Thiago Farina <tfransosi@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e9a137cb

frv: set ARCH_KMALLOC_MINALIGN · 69dcf3db

由 FUJITA Tomonori 提交于 5月 24, 2010

Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the
buffer doesn't share a cache with the others.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69dcf3db

frv: extend gdbstub to support more features of gdb · 7ca8b9c0

由 David Howells 提交于 5月 24, 2010

Extend gdbstub to support more features of gdb remote protocol to keep
gdb-7 and emacs gud mode happy:

 (*) The D command.  Detach debugger.

 (*) The H command.  Handle setting the target thread by ignoring it.

 (*) The qAttached command.  Indicate we 'attached' to an existing process.

 (*) The qC command.  Indicate that the current thread ID is 0.

 (*) The qOffsets command.  Indicate that no relocation has been done.

 (*) The qSymbol:: command.  Indicate that we're not interested in looking up
     any symbol addresses.

 (*) The qSupported command.  Indicate the maximum packet size and the fact
     that reverse step and continue aren't supported.

 (*) The vCont? command.  Indicate that we don't support any of its variants.

Also make it possible to trace the commands and replies without tracing
the individual character I/O.

[akpm@linux-foundation.org: make gdbstub_handle_query() static]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ca8b9c0

mm: make lowmem_page_address() use PFN_PHYS() for improved portability · c6f6b596

由 Chris Metcalf 提交于 5月 24, 2010

This ensures that platforms with lowmem PAs above 32 bits work correctly
by avoiding truncating the PA during a left shift.
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c6f6b596

mem-hotplug: fix potential race while building zonelist for new populated zone · 4eaf3f64

由 Haicheng Li 提交于 5月 24, 2010

Add global mutex zonelists_mutex to fix the possible race:

     CPU0                                  CPU1                    CPU2
(1) zone->present_pages += online_pages;
(2)                                       build_all_zonelists();
(3)                                                               alloc_page();
(4)                                                               free_page();
(5) build_all_zonelists();
(6)   __build_all_zonelists();
(7)     zone->pageset = alloc_percpu();

In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).

Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step(6).

[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Reviewed-by: NAndi Kleen <andi.kleen@intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4eaf3f64

mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset · 1f522509

由 Haicheng Li 提交于 5月 24, 2010

For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
       at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
       shared in onlined_pages()

Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:

  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:1239!
  invalid opcode: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
   [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
   [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
   [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
   [<ffffffff81132751>] ra_submit+0x21/0x30
   [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
   [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
   [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
   [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
   [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
   [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
   [<ffffffff8117c151>] sys_read+0x51/0x80
   [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
  RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
   RSP <ffff88000d1e78a8>
  ---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix]
Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Reviewed-by: NAndi Kleen <andi.kleen@intel.com>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f522509

mem-hotplug: separate setup_per_cpu_pageset() into separate functions · 319774e2

由 Wu Fengguang 提交于 5月 24, 2010

No behavior change here.

Move some of setup_per_cpu_pageset() code into a new function
setup_zone_pageset() that will be useful for memory hotplug.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
Reviewed-by: NAndi Kleen <andi.kleen@intel.com>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

319774e2

mm: fix NR_SECTION_ROOTS == 0 when using using sparsemem extreme. · 0faa5638

由 Marcelo Roberto Jimenez 提交于 5月 24, 2010

Got this while compiling for ARM/SA1100:

mm/sparse.c: In function '__section_nr':
mm/sparse.c:135: warning: 'root' is used uninitialized in this function

This patch follows Russell King's suggestion for a new calculation for
NR_SECTION_ROOTS.  Thanks also to Sergei Shtylyov for pointing out the
existence of the macro DIV_ROUND_UP.

Atsushi Nemoto observed:
: This fix doesn't just silence the warning - it fixes a real problem.
:
: Without this fix, mem_section[] might have 0 size so mem_section[0]
: will share other variable area.  For example, I got:
:
: c030c700 b __warned.16478
: c030c700 B mem_section
: c030c701 b __warned.16483
:
: This might cause very strange behavior.  Your patch actually fixes it.
Signed-off-by: NMarcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0faa5638

highmem: remove unneeded #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT for debug_kmap_atomic() · ff3d58c2

由 Akinobu Mita 提交于 5月 24, 2010

In f4112de6 ("mm: introduce
debug_kmap_atomic") I said that debug_kmap_atomic() needs
CONFIG_TRACE_IRQFLAGS_SUPPORT.

It was wrong.  (I thought irqs_disabled() is only available when the
architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT)

Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable
kmap_atomic() debugging for the architectures which do not have
CONFIG_TRACE_IRQFLAGS_SUPPORT.
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ff3d58c2

include/linux/gfp.h: fix coding style · fd23855e

由 matt mooney 提交于 5月 24, 2010

Add parenthesis in a define.  This doesn't change functionality.

checkpatch errors:
1) white space fixes
2) add spaces after comas
Signed-off-by: Nmatt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd23855e

include/linux/gfp.h: spelling fixes · 263ff5d8

由 matt mooney 提交于 5月 24, 2010

Fix minor spelling errors in a few comments; no code changes.
Signed-off-by: Nmatt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

263ff5d8

cpu/mem hotplug: enable CPUs online before local memory online · cf23422b

由 minskey guo 提交于 5月 24, 2010

Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online.  For a
numa node without onlined local memory, its zonelists are not initialized
at present.  As a result, any memory allocation operations executed by
CPUs within this node will fail.  In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo<chaohong.guo@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cf23422b

vmscan: remove isolate_pages callback scan control · 8b25c6d2

由 Johannes Weiner 提交于 5月 24, 2010

For now, we have global isolation vs.  memory control group isolation, do
not allow the reclaim entry function to set an arbitrary page isolation
callback, we do not need that flexibility.

And since we already pass around the group descriptor for the memory
control group isolation case, just use it to decide which one of the two
isolator functions to use.

The decisions can be merged into nearby branches, so no extra cost there.
In fact, we save the indirect calls.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b25c6d2

vmscan: remove all_unreclaimable scan control · 0aeb2339

由 Johannes Weiner 提交于 5月 24, 2010

This scan control is abused to communicate a return value from
shrink_zones().  Write this idiomatically and remove the knob.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0aeb2339

mm: document follow_page() · 142762bd

由 Johannes Weiner 提交于 5月 24, 2010

Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

142762bd

fs-writeback: check sync bit earlier in inode_wait_for_writeback · 58a9d3d8

由 Richard Kennedy 提交于 5月 24, 2010

When wb_writeback() hasn't written anything it will re-acquire the inode
lock before calling inode_wait_for_writeback.

This change tests the sync bit first so that is doesn't need to drop &
re-acquire the lock if the inode became available while wb_writeback() was
waiting to get the lock.
Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

58a9d3d8

mm: introduce free_pages_prepare() · ec95f53a

由 KOSAKI Motohiro 提交于 5月 24, 2010

free_hot_cold_page() and __free_pages_ok() have very similar freeing
preparation.  Consolidate them.

[akpm@linux-foundation.org: fix busted coding style]
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ec95f53a

vmscan: page_check_references(): check low order lumpy reclaim properly · 5f53e762

由 KOSAKI Motohiro 提交于 5月 24, 2010

If vmscan is under lumpy reclaim mode, it have to ignore referenced bit
for making contenious free pages.  but current page_check_references()
doesn't.

Fix it.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f53e762

readahead.c: fix comment · bf8abe8b

由 Huang Shijie 提交于 5月 24, 2010

Fix a wrong comment over page_cache_async_readahead().
Signed-off-by: NHuang Shijie <shijie8@gmail.com>
Acked-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bf8abe8b

vmscan: prevent get_scan_ratio() rounding errors · 76a33fc3

由 Shaohua Li 提交于 5月 24, 2010

get_scan_ratio() calculates percentage and if the percentage is < 1%, it
will round percentage down to 0% and cause we completely ignore scanning
anon/file pages to reclaim memory even the total anon/file pages are very
big.

To avoid underflow, we don't use percentage, instead we directly calculate
how many pages should be scaned.  In this way, we should get several
scanned pages for < 1% percent.

This has some benefits:

1. increase our calculation precision

2.  making our scan more smoothly.  Without this, if percent[x] is
   underflow, shrink_zone() doesn't scan any pages and suddenly it scans
   all pages when priority is zero.  With this, even priority isn't zero,
   shrink_zone() gets chance to scan some pages.

Note, this patch doesn't really change logics, but just increase
precision.  For system with a lot of memory, this might slightly changes
behavior.  For example, in a sequential file read workload, without the
patch, we don't swap any anon pages.  With it, if anon memory size is
bigger than 16G, we will see one anon page swapped.  The 16G is calculated
as PAGE_SIZE * priority(4096) * (fp/ap).  fp/ap is assumed to be 1024
which is common in this workload.  So the impact sounds not a big deal.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

76a33fc3

mm: consider the entire user address space during node migration · 6ec3a127

由 Greg Thelen 提交于 5月 24, 2010

Use mm->task_size instead of TASK_SIZE to ensure that the entire user
address space is migrated.  mm->task_size is independent of the calling
task context.  TASK SIZE may be dependant on the address space size of the
calling process.  Usage of TASK_SIZE can lead to partial address space
migration if the calling process was 32 bit and the migrating process was
64 bit.

Here is the test script used on 64 system with a 32 bit echo process:

  mount -t cgroup none /cgroup -o cpuset
  cd /cgroup

  mkdir 0
  echo 1 > 0/cpuset.cpus
  echo 0 > 0/cpuset.mems
  echo 1 > 0/cpuset.memory_migrate

  mkdir 1
  echo 1 > 1/cpuset.cpus
  echo 1 > 1/cpuset.mems
  echo 1 > 1/cpuset.memory_migrate

  echo $$ > 0/tasks
  64_bit_process &
  pid=$!

  echo $pid > 1/tasks   # This does not migrate all process pages without
                        # this patch.  If 64 bit echo is used or this patch is
                        # applied, then the full address space of $pid is
                        # migrated.

To check memory migration, I watched:
  grep MemUsed /sys/devices/system/node/node*/meminfo
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ec3a127

mm: compaction: defer compaction using an exponential backoff when compaction fails · 4f92e258

由 Mel Gorman 提交于 5月 24, 2010

The fragmentation index may indicate that a failure is due to external
fragmentation but after a compaction run completes, it is still possible
for an allocation to fail.  There are two obvious reasons as to why

  o Page migration cannot move all pages so fragmentation remains
  o A suitable page may exist but watermarks are not met

In the event of compaction followed by an allocation failure, this patch
defers further compaction in the zone (1 << compact_defer_shift) times.
If the next compaction attempt also fails, compact_defer_shift is
increased up to a maximum of 6.  If compaction succeeds, the defer
counters are reset again.

The zone that is deferred is the first zone in the zonelist - i.e.  the
preferred zone.  To defer compaction in the other zones, the information
would need to be stored in the zonelist or implemented similar to the
zonelist_cache.  This would impact the fast-paths and is not justified at
this time.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4f92e258

mm: compaction: add a tunable that decides when memory should be compacted and... · 5e771905

由 Mel Gorman 提交于 5月 24, 2010

mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed

The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation.  One of these
is based on the fragmentation.  If the index is below 500, memory will not
be compacted.  This choice is arbitrary and not based on data.  To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold.  The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.

[randy.dunlap@oracle.com: Fix build errors when proc fs is not configured]
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5e771905

mm: compaction: direct compact when a high-order allocation fails · 56de7263

由 Mel Gorman 提交于 5月 24, 2010

Ordinarily when a high-order allocation fails, direct reclaim is entered
to free pages to satisfy the allocation.  With this patch, it is
determined if an allocation failed due to external fragmentation instead
of low memory and if so, the calling process will compact until a suitable
page is freed.  Compaction by moving pages in memory is considerably
cheaper than paging out to disk and works where there are locked pages or
no swap.  If compaction fails to free a page of a suitable size, then
reclaim will still occur.

Direct compaction returns as soon as possible.  As each block is
compacted, it is checked if a suitable page has been freed and if so, it
returns.

[akpm@linux-foundation.org: Fix build errors]
[aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim]
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

56de7263

mm: compaction: add /sys trigger for per-node memory compaction · ed4a6d7f

由 Mel Gorman 提交于 5月 24, 2010

Add a per-node sysfs file called compact.  When the file is written to,
each zone in that node is compacted.  The intention that this would be
used by something like a job scheduler in a batch system before a job
starts so that the job can allocate the maximum number of hugepages
without significant start-up cost.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ed4a6d7f

mm: compaction: add /proc trigger for memory compaction · 76ab0f53

由 Mel Gorman 提交于 5月 24, 2010

Add a proc file /proc/sys/vm/compact_memory.  When an arbitrary value is
written to the file, all zones are compacted.  The expected user of such a
trigger is a job scheduler that prepares the system before the target
application runs.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

76ab0f53

mm: compaction: memory compaction core · 748446bb

由 Mel Gorman 提交于 5月 24, 2010

This patch is the core of a mechanism which compacts memory in a zone by
relocating movable pages towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone.  The migration
scanner starts at the bottom of the zone and searches for all movable
pages within each area, isolating them onto a private list called
migratelist.  The free scanner starts at the top of the zone and searches
for suitable areas and consumes the free pages within making them
available for the migration scanner.  The pages isolated for migration are
then migrated to the newly isolated free pages.

[aarcange@redhat.com: Fix unsafe optimisation]
[mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

748446bb

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功