Commits · 25cef2251415cef5438e20965fec87096fe2efb0 · Linux-御风守护者 / linux

08 Oct, 2008 1 commit

Fix sections for omap-mcbsp platform driver · 25cef225

Committed by Uwe Kleine-König on Oct 08, 2008

Don't use __init but __devinit to define probe function.  A pointer to
omap_mcbsp_probe is passed to the core via platform_driver_register and
so the function must not disappear when the init code is freed.  Using
__init and having HOTPLUG=y the following probably oopses:

	echo -n omap-mcbsp.1 > /sys/bus/platform/drivers/omap-mcbsp/unbind
	echo -n omap-mcbsp.1 > /sys/bus/platform/drivers/omap-mcbsp/bind

While at it move the remove function to the .devexit.text section.
Signed-off-by: Uwe Kleine-König <ukleinek@strlen.de>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>

25cef225

03 Oct, 2008 2 commits

Merge unstable branch 'omap-rmk' · 56f68556
Committed by Russell King on Oct 03, 2008
```
Merge branch 'omap-rmk' into omap-all
```
56f68556

Merge branch 'omap2-clock' of... · fd9470ce

Committed by Russell King on Oct 03, 2008

Merge branch 'omap2-clock' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6.git

Merge branch 'omap2-clock' into omap-all

fd9470ce

06 Sep, 2008 12 commits

[ARM] omap: fix a load of "warning: symbol 'xxx' was not declared. Should it be static?" · 7c8ad982
Committed by Russell King on Sep 05, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
7c8ad982
[ARM] omap: fix lots of 'Using plain integer as NULL pointer' · c0fc18c5
Committed by Russell King on Sep 05, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
c0fc18c5

[ARM] omap: fix inappropriate casting in gpio.c · 7c7095aa

Committed by Russell King on Sep 05, 2008

gpio.c wilfully casts physical addresses to void __iomem * and then
fixes them up at runtime using:

	bank->base = IO_ADDRESS(bank->base);

where accesses prior to this fixup are via omap_read/omap_write, and
after are by __raw_read/__raw_write.  This doesn't lend itself to
static checking, nor to easy understanding of the code.

And so, OMAP_MPUIO_BASE gets to be the right type - integer like since
it's a physical address, not a MMIO pointer.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

7c7095aa

[ARM] omap: DSP registers don't need to be casted · 397fcaf7

Committed by Russell King on Sep 05, 2008

We're now assigning/comparing void __iomem pointers with
void __iomem pointer variables.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

397fcaf7

[ARM] omap: make sure virtual mmio addresses are __iomem pointer-like · 0062f104
Committed by Russell King on Sep 04, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
0062f104

[ARM] omap: Fix IO_ADDRESS() macros · e8a91c95

Committed by Russell King on Sep 01, 2008

OMAP1_IO_ADDRESS(), OMAP2_IO_ADDRESS() and IO_ADDRESS() return cookies
for use with __raw_{read|write}* for accessing registers. Therefore,
these macros should return (void __iomem *) cookies, not integer values.

Doing this improves typechecking, and means we can find those places
where, eg, DMA controllers are incorrectly given virtual addresses to
DMA to, or physical addresses are thrown through a virtual to physical
address translation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

e8a91c95

[ARM] omap: convert mcbsp to use ioremap() · d592dd1a
Committed by Russell King on Sep 04, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
d592dd1a
[ARM] omap: convert OMAP drivers to use ioremap() · 55c381e4
Committed by Russell King on Sep 04, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
55c381e4
[ARM] omap: allow ioremap() to use our fixed IO mappings · 690b5a13
Committed by Russell King on Sep 04, 2008
```
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
```
690b5a13

[ARM] omap: remove an io_v2p() usage · e5480b73

Committed by Russell King on Sep 01, 2008

When omap_udc is also incorporated, this macro will no longer be used.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

e5480b73

[SERIAL] 8250: serial8250_port_size() - omap ports are larger · f2eda27d

Committed by Russell King on Sep 01, 2008

A function to contain common code for the size of the resource we
need to allocate or free.  OMAP ports need 22 bytes rather than
the standard 8 bytes.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

f2eda27d

[ARM] omap: improve is_omap_port() · 5668545a

Committed by Russell King on Sep 01, 2008

Make is_omap_port() take the uart_8250_port structure so it can do
whatever test it desires.  Convert the test to compare the physical
addresses rather than virtual addresses.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

5668545a

05 Sep, 2008 1 commit

[ARM] omap: fix virtual vs physical address space confusions · 65846909

Committed by Russell King on Sep 03, 2008

mcbsp is confused as to what takes a physical or virtual address.
Fix the two instances where it gets it wrong.
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

65846909

04 Sep, 2008 2 commits

[ARM] remove unused #include <version.h> · 8b540fdc

Committed by Huang Weiyi on Aug 23, 2008

The driver(s) below do not use LINUX_VERSION_CODE nor KERNEL_VERSION.
  arch/arm/plat-mxc/clock.c

This patch removes the said #include <version.h>.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

8b540fdc

[ARM] omap: fix build error in ohci-omap.c · c3df1a26

Committed by Russell King on Sep 03, 2008

drivers/usb/host/ohci-omap.c: In function 'ohci_omap_init':
drivers/usb/host/ohci-omap.c:228: error: 'start_hnp' undeclared (first use in this function)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

c3df1a26

03 Sep, 2008 22 commits

[ARM] omap: fix gpio.c build error · 69114a47

Committed by Russell King on Sep 03, 2008

arch/arm/plat-omap/gpio.c: In function '_omap_gpio_init':
arch/arm/plat-omap/gpio.c:1492: error: 'omap_mpuio_device' undeclared (first use in this function)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

69114a47

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · d26acd92

Committed by Linus Torvalds on Sep 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ipsec: Fix deadlock in xfrm_state management.
  ipv: Re-enable IP when MTU > 68
  net/xfrm: Use an IS_ERR test rather than a NULL test
  ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message.
  ath9k: Incorrect key used when group and pairwise ciphers are different.
  rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON
  mac80211: Fix debugfs union misuse and pointer corruption
  wireless/libertas/if_cs.c: fix memory leaks
  orinoco: Multicast to the specified addresses
  iwlwifi: fix 64bit platform firmware loading
  iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE)
  iwlwifi: workaround interrupt handling no some platforms
  iwlwifi: do not use GFP_DMA in iwl_tx_queue_init
  net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS
  net: Unbreak userspace usage of linux/mroute.h
  pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock()
  ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0

d26acd92

[x86] Fix TSC calibration issues · fbb16e24

Committed by Thomas Gleixner on Sep 03, 2008

Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f594
"x86: merge tsc calibration".

The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.

Larry's system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.

On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.

According to Alok also virtualized systems suffer from the PIT
calibration method.

The solution is to use a more wreckage aware approach than the current
either/or decision.

1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency

2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.

3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration

4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.

5) Set a clear preference to PIT based calibration when the result
makes sense.

The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.
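
The selection described in steps 1 to 5 can be sketched in userspace C as
follows.  This is only an illustration under stated assumptions, not the
kernel's code: struct calib_sample, pick_tsc_khz() and the "max > 10 * min"
disturbance threshold are hypothetical names and values chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One calibration pass; all field names are hypothetical. */
struct calib_sample {
	uint64_t pit_khz;	/* frequency derived from the PIT delay loop */
	uint64_t ref_khz;	/* frequency from HPET/PMTIMER, 0 if unavailable */
	uint64_t tsc_delta_min;	/* smallest TSC step seen while polling the PIT */
	uint64_t tsc_delta_max;	/* largest TSC step seen while polling the PIT */
};

/* Assumed heuristic: a huge spread between steps hints at an SMI. */
static int pit_disturbed(const struct calib_sample *s)
{
	return s->tsc_delta_max > 10 * s->tsc_delta_min;
}

/*
 * Run over all passes, keep the lowest undisturbed PIT value and the
 * lowest reference value, then prefer the PIT result when one exists.
 * Returns 0 when no usable value was recorded.
 */
uint64_t pick_tsc_khz(const struct calib_sample *samples, size_t n)
{
	uint64_t pit_min = UINT64_MAX, ref_min = UINT64_MAX;

	assert(samples != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!pit_disturbed(&samples[i]) && samples[i].pit_khz < pit_min)
			pit_min = samples[i].pit_khz;
		if (samples[i].ref_khz != 0 && samples[i].ref_khz < ref_min)
			ref_min = samples[i].ref_khz;
	}

	if (pit_min != UINT64_MAX)
		return pit_min;		/* clear preference for the PIT */
	if (ref_min != UINT64_MAX)
		return ref_min;		/* backup: HPET/PMTIMER reference */
	return 0;
}
```

In this sketch a double speed PMTIMER (reference value at half the real
frequency) does no harm as long as at least one PIT pass was undisturbed.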

Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbb16e24

ipsec: Fix deadlock in xfrm_state management. · 37b08e34

Committed by David S. Miller on Sep 02, 2008

Ever since commit 4c563f76
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held.  If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.

Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.
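
The pattern of deferring the final put until after the lock is released can
be sketched in userspace with a pthread mutex.  Everything here is a
stand-in invented for the example (struct state, state_put(),
flush_states()); it is not the xfrm code, and the reference count is left
non-atomic for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed_count;

struct state {
	int refcnt;
	struct state *next;
};

/* Stand-in for a destroy routine that takes state_lock itself. */
static void state_destroy(struct state *s)
{
	(void)s;
	pthread_mutex_lock(&state_lock);
	destroyed_count++;
	pthread_mutex_unlock(&state_lock);
}

static void state_put(struct state *s)
{
	if (--s->refcnt == 0)
		state_destroy(s);
}

/*
 * Unlink entries while holding the lock, but only drop the references
 * once the lock has been released.  Calling state_put() inside the
 * locked region would self-deadlock on this non-recursive mutex.
 * Returns the number of entries that were put.
 */
int flush_states(struct state *head)
{
	struct state *to_put = NULL, *s, *next;
	int n = 0;

	pthread_mutex_lock(&state_lock);
	for (s = head; s != NULL; s = next) {
		next = s->next;
		s->next = to_put;	/* move onto a private list */
		to_put = s;
	}
	pthread_mutex_unlock(&state_lock);

	for (s = to_put; s != NULL; s = next) {
		next = s->next;		/* read before the entry may go away */
		state_put(s);
		n++;
	}
	return n;
}
```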
Signed-off-by: David S. Miller <davem@davemloft.net>

37b08e34

drivers/char/random.c: fix a race which can lead to a bogus BUG() · 8b76f46a

Committed by Andrew Morton on Sep 02, 2008

Fix a bug reported by and diagnosed by Aaron Straus.

This is a regression introduced into 2.6.26 by

    commit adc782da
    Author: Matt Mackall <mpm@selenic.com>
    Date:   Tue Apr 29 01:03:07 2008 -0700

        random: simplify and rename credit_entropy_store

credit_entropy_bits() does:

	spin_lock_irqsave(&r->lock, flags);
	...
	if (r->entropy_count > r->poolinfo->POOLBITS)
		r->entropy_count = r->poolinfo->POOLBITS;

so there is a time window in which this BUG_ON():

static size_t account(struct entropy_store *r, size_t nbytes, int min,
		      int reserved)
{
	unsigned long flags;

	BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);

	/* Hold lock while accounting */
	spin_lock_irqsave(&r->lock, flags);

can trigger.

We could fix this by moving the assertion inside the lock, but it seems
safer and saner to revert to the old behaviour wherein
entropy_store.entropy_count at no time exceeds
entropy_store.poolinfo->POOLBITS.
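
A userspace sketch of that "never publish an out-of-range value" behaviour
is shown below.  It is an assumption-laden illustration rather than the
actual patch: POOLBITS is a made-up constant, a pthread mutex and C11
atomics stand in for the kernel primitives, and entropy_store_init() and
entropy_count_peek() are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define POOLBITS 4096	/* assumed pool size for this sketch */

struct entropy_store {
	pthread_mutex_t lock;
	atomic_int entropy_count;
};

void entropy_store_init(struct entropy_store *r, int initial_bits)
{
	assert(initial_bits >= 0 && initial_bits <= POOLBITS);
	pthread_mutex_init(&r->lock, NULL);
	atomic_init(&r->entropy_count, initial_bits);
}

/*
 * Compute the new count in a local variable, clamp it, and only then
 * store it.  A reader that checks the count without taking the lock
 * therefore never observes a value above POOLBITS.
 */
void credit_entropy_bits(struct entropy_store *r, int nbits)
{
	int count;

	pthread_mutex_lock(&r->lock);
	count = atomic_load(&r->entropy_count) + nbits;
	if (count < 0)
		count = 0;
	else if (count > POOLBITS)
		count = POOLBITS;
	atomic_store(&r->entropy_count, count);
	pthread_mutex_unlock(&r->lock);
}

/* Lock-free read, comparable to an assertion made outside the lock. */
int entropy_count_peek(struct entropy_store *r)
{
	return atomic_load(&r->entropy_count);
}
```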
Reported-by: Aaron Straus <aaron@merfinllc.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@kernel.org>		[2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8b76f46a

pm_qos_requirement might sleep · 9d359357

Committed by John Kacur on Sep 02, 2008

Make PM_QOS and CPU_IDLE play nicer when run with the RT-Preempt kernel.

The purpose of the patch is to remove the spin_lock around the read in the
function pm_qos_requirement - since spinlocks can sleep in -rt and this
function is called from idle.

CPU_IDLE polls the target_value's of some of the pm_qos parameters from
the idle loop causing sleeping locking warnings.  Changing the
target_value to an atomic avoids this issue.

Remove the spinlock in pm_qos_requirement by making target_value an atomic
type.
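
As a rough userspace analogue, the idea looks like this with C11
<stdatomic.h> standing in for the kernel's atomic type.  The struct and
function names (pm_qos_sketch and friends) are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a pm_qos object. */
struct pm_qos_sketch {
	atomic_int target_value;
};

void pm_qos_sketch_init(struct pm_qos_sketch *o, int value)
{
	assert(o != NULL);
	atomic_init(&o->target_value, value);
}

/* Writer side: publish a new target with a single atomic store. */
void pm_qos_sketch_update(struct pm_qos_sketch *o, int value)
{
	atomic_store(&o->target_value, value);
}

/*
 * Reader side: no lock is taken, so this can be called from a context
 * that must not sleep, such as an idle loop.
 */
int pm_qos_sketch_requirement(struct pm_qos_sketch *o)
{
	return atomic_load(&o->target_value);
}
```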
Signed-off-by: mark gross <mgross@linux.intel.com>
Signed-off-by: John Kacur <jkacur@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9d359357

rtc-cmos: wake again from S5 · 74c4633d

Committed by Rafael J. Wysocki on Sep 02, 2008

Update rtc-cmos shutdown handling to leave RTC alarms active, resolving
http://bugzilla.kernel.org/show_bug.cgi?id=11411 on several boards.  There
are still some systems where the ACPI event handling doesn't cooperate.
(Possibly related to bugid 11312, reporting the spontaneous disabling of
RTC events.)

Bug 11411 reported that changes to work around some ACPI event issues
broke wake-from-S5 handling, as used for DVR applications.  (They like to
power off, then wake later to record programs.)

[yakui.zhao@intel.com: add shutdown for PNP devices]
[dbrownell@users.sourceforge.net: update comments]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

74c4633d

sysfs: document files in /sys/firmware/sgi_uv/ · 8b3a8944

Committed by Russ Anderson on Sep 02, 2008

Document files in /sys/firmware/sgi_uv/.
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8b3a8944

ibft: fix target info parsing in ibft module · bb8fb4e6

Committed by Mike Christie on Sep 02, 2008

I got this patch through Red Hat's bugzilla from the bug submitter and
patch creator.  I have just fixed it up so it applies without fuzz to
upstream kernels.

Original patch and description from Shyam kumar Iyer:

The issue [ibft module not displaying targets with short names] is because
of an offset calculation error in the iscsi_ibft.c code.  Due to this
error directory structure for the target in /sys/firmware/ibft does not
get created and so the initiator is unable to connect to the target.

Note that this bug surfaced only with a name that had a short section at
the end.  eg: "iqn.1984-05.com.dell:dell".  It did not surface when the
iqn's had a longer section at the end.  eg:
"iqn.2001-04.com.example:storage.disk2.sys1.xyz"

So, the eot_offset was calculated such that an extra 48 bytes i.e.  the
size of the ibft_header which has already been accounted was subtracted
twice.

This was not evident with longer iqn names because they would overshoot
the total ibft length more than 48 bytes and thus would escape the bug.
Signed-off-by: Shyam Kumar Iyer <shyam_iyer@dell.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Konrad Rzeszutek <konrad@virtualiron.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bb8fb4e6

rtc_time_to_tm: fix signed/unsigned arithmetic · 73442daf

Committed by Jan Altenberg on Sep 02, 2008

commit 945185a6 ("rtc: rtc_time_to_tm: use
unsigned arithmetic") changed some types in rtc_time_to_tm() to
unsigned:

 void rtc_time_to_tm(unsigned long time, struct rtc_time *tm)
 {
-       register int days, month, year;
+       unsigned int days, month, year;

This doesn't work for all cases, because days is checked for < 0 later
on:

if (days < 0) {
	year -= 1;
	days += 365 + LEAP_YEAR(year);
}

I think the correct fix would be to keep days signed and do an appropriate
cast later on.
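
The year-borrow step only works when days is a signed type, as this small
standalone sketch shows.  The helper names is_leap_year() and
borrow_year_if_negative() are made up for the example and are not taken
from the rtc library.

```c
#include <assert.h>
#include <stddef.h>

/* Gregorian leap year test; evaluates to 1 or 0. */
static int is_leap_year(int year)
{
	return ((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0);
}

/*
 * With a signed 'days', a negative remainder borrows one year and is
 * corrected by that year's length.  If 'days' were unsigned, the
 * comparison below could never be true and the borrow would be lost.
 */
int borrow_year_if_negative(int days, int *year)
{
	assert(year != NULL);

	if (days < 0) {
		*year -= 1;
		days += 365 + is_leap_year(*year);
	}
	return days;
}
```

For instance, starting from year 2009 with days == -1, the sketch steps
back to 2008 (a leap year) and yields day 365 of that year.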
Signed-off-by: Jan Altenberg <jan.altenberg@linutronix.de>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

73442daf

tdfxfb: fix frame buffer name overrun · b4a49b12

Committed by Krzysztof Helt on Sep 02, 2008

If there is more than one graphics card handled by the tdfxfb driver, the
name of the frame buffer overruns its reserved size.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4a49b12

tdfxfb: fix SDRAM memory size detection · bf6910c0

Committed by Krzysztof Helt on Sep 02, 2008

Fix memory detection on Voodoo3 cards with SDRAM memory.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bf6910c0

hp-wmi: add proper hotkey support · a8823aef

Committed by Matthew Garrett on Sep 02, 2008

It turns out that event 0x4 merely indicates that a hotkey has been
pressed, not which one.  A further query is required in order to determine
the actual keypress.  The following patch adds support for that along with
the known keycodes.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a8823aef

hp-wmi: update to match current rfkill semantics · 3f6e2f13

Committed by Matthew Garrett on Sep 02, 2008

hp-wmi currently changes the RFKill state by altering the struct members
rather than using the dedicated interface, meaning that update events
won't be pushed to userspace.  This patch fixes that, along with fixing
the declared type of the WWAN kill switch.  It also ensures that rfkill
interfaces are only registered for hardware that exists.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Ivo van Doorn <ivdoorn@gmail.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3f6e2f13

ipc: document the new auto_msgmni proc file · 61e55d05

Committed by Nadia Derbey on Sep 02, 2008

Update Documentation/filesystems/proc.txt: it describes the file
auto_msgmni introduced to enable/disable msgmni automatic recomputing upon
memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27).  Also
added a description for msgmni (this file is only listed in
Documentation/sysctl/kernel.txt).
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

61e55d05

mm: size of quicklists shouldn't be proportional to the number of CPUs · b9541852

Committed by KOSAKI Motohiro on Sep 02, 2008

Quicklists store pages for each CPU as caches.  (Each CPU can cache
node_free_pages/16 pages)

It is used for page table cache.  exit() will increase the cache size,
while fork() consumes it.

So for example if an apache-style application runs (one parent and many
child model), one CPU process will fork() while another CPU will process
the middleware work and exit().

At that time, the CPU on which the parent runs doesn't have page table
cache at all.  Others (on which children run) have maximum caches.

	QList_max = (#ofCPUs - 1) x Free / 16
	=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)

So, how much quicklist memory is used in the maximum case?

This is proportional to # of CPUs because the limit of per cpu quicklist
cache doesn't see the number of cpus.

The above calculation means

	 Number of CPUs per node            2    4    8   16
	 ==============================  ====================
	 QList_max / (Free + QList_max)   5.8%  16%  30%  48%

Wow! Quicklist can spend about 50% memory at worst case.
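
The percentages in the table come straight from the ratio
(#ofCPUs - 1) / (16 + #ofCPUs - 1).  A small helper, with the hypothetical
name quicklist_max_fraction(), makes it easy to recompute them for other
CPU counts:

```c
#include <assert.h>

/*
 * Worst-case share of (Free + QList_max) held in quicklists when every
 * CPU of a node except one has filled its cache to Free / 16.
 */
double quicklist_max_fraction(unsigned int cpus_per_node)
{
	assert(cpus_per_node >= 1);
	return (double)(cpus_per_node - 1) /
	       (double)(16 + cpus_per_node - 1);
}
```

For 4 CPUs per node this gives 3/19, which is the 16% column, and for 16
CPUs per node it gives 15/31, which is the 48% column.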

My demonstration program is here
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get max number of logical cpus from /proc/cpuinfo */
{
  FILE *fd;
  char *ret, buffer[BUFFSIZE];
  int cpu = 1;

  fd = fopen("/proc/cpuinfo", "r");
  if (fd == NULL) {
    perror("fopen(/proc/cpuinfo)");
    exit(EXIT_FAILURE);
  }
  while (1) {
    ret = fgets(buffer, BUFFSIZE, fd);
    if (ret == NULL)
      break;
    if (!strncmp(buffer, "processor", 9))
      cpu = atoi(strchr(buffer, ':') + 2);
  }
  fclose(fd);
  return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
  cpu_set_t mask;
  int ret;

  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  ret = sched_setaffinity(0, sizeof(mask), &mask);
  if (ret == -1) {
    perror("sched_setaffinity()");
    exit(EXIT_FAILURE);
  }
  sched_yield();	/* not necessary */
}

#define MMAP_SIZE (10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL 1	/* 1 second */

int main(int argc, char *argv[])
{
  int cpu_max, nextcpu;
  long pagesize;
  pid_t pid;

  /* set max number of logical cpu */
  if (argc > 1)
    cpu_max = atoi(argv[1]) - 1;
  else
    cpu_max = max_cpu();

  /* get the page size */
  pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize == -1) {
    perror("sysconf(_SC_PAGESIZE)");
    exit(EXIT_FAILURE);
  }

  /* prepare parent process */
  cpu_bind(0);
  nextcpu = cpu_max;

loop:

  /* select destination cpu for child process by round-robin rule */
  if (++nextcpu > cpu_max)
    nextcpu = 1;

  pid = fork();

  if (pid == 0) { /* child action */

    char *p;
    int i;

    /* consume page tables */
    p = mmap(0, MMAP_SIZE, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    i = MMAP_SIZE / pagesize;
    while (i-- > 0) {
      *p = 1;
      p += pagesize;
    }

    /* move to other cpu */
    cpu_bind(nextcpu);
/*
    printf("a child moved to cpu%d after mmap().\n", nextcpu);
    fflush(stdout);
 */

    /* back page tables to pgtable_quicklist */
    exit(0);

  } else if (pid > 0) { /* parent action */

    sleep(FORK_INTERVAL);
    waitpid(pid, NULL, WNOHANG);

  }

  goto loop;
}
----------------------------------------

When above program which does task migration runs, my 8GB box spends
800MB of memory for quicklist.  This is not memory leak but doesn't seem
good.

% cat /proc/meminfo

MemTotal:        7701568 kB
MemFree:         4724672 kB
(snip)
Quicklists:       844800 kB

because

- My machine spec is
	number of numa node: 2
	number of cpus:      8 (4CPU x2 node)
        total mem:           8GB (4GB x2 node)
        free mem:            about 5GB

- Then, 4.7GB x 16% ~= 880MB.
  So, Quicklist can use 800MB.

So, if following spec machine run that program

   CPUs: 64 (8cpu x 8node)
   Mem:  1TB (128GB x8node)

Then, quicklist can waste 300GB (= 1TB x 30%).  It is too large.

So, I don't like cache policies which are proportional to # of cpus.

My patch changes the number of caches
from:
   per-cpu-cache-amount = memory_on_node / 16
to
   per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.
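
The two policies can be compared with a pair of tiny helpers.  This is just
the arithmetic above written out in C; the function names are hypothetical
and the units (pages or bytes) are left to the caller.

```c
#include <assert.h>

/* Old policy: every CPU may cache up to 1/16 of the node's memory. */
unsigned long per_cpu_cache_old(unsigned long memory_on_node)
{
	return memory_on_node / 16;
}

/* New policy: the 1/16 budget is shared by all CPUs on the node. */
unsigned long per_cpu_cache_new(unsigned long memory_on_node,
				unsigned int number_of_cpus_on_node)
{
	assert(number_of_cpus_on_node > 0);
	return memory_on_node / 16 / number_of_cpus_on_node;
}
```

With the new limit the per-node total no longer grows with the CPU count:
even if every CPU fills its cache, the sum stays at or below
memory_on_node / 16.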
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Tested-by: David Miller <davem@davemloft.net>
Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9541852

mm: show quicklist usage in /proc/meminfo · 4b856152

Committed by KOSAKI Motohiro on Sep 02, 2008

Quicklists can consume several GB of memory.  We should provide a means of
monitoring this.

After this patch is applied, /proc/meminfo will output the following:

% cat /proc/meminfo

MemTotal:      7715392 kB
MemFree:       5401600 kB
Buffers:         80384 kB
Cached:         300800 kB
SwapCached:          0 kB
Active:         235584 kB
Inactive:       262656 kB
SwapTotal:     2031488 kB
SwapFree:      2031488 kB
Dirty:            3520 kB
Writeback:           0 kB
AnonPages:      117696 kB
Mapped:          38528 kB
Slab:          1589952 kB
SReclaimable:    23104 kB
SUnreclaim:    1566848 kB
PageTables:      14656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   5889152 kB
Committed_AS:   393152 kB
VmallocTotal: 17592177655808 kB
VmallocUsed:     29056 kB
VmallocChunk: 17592177626432 kB
Quicklists:     130944 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:    262144 kB
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b856152

devcgroup: fix race against rmdir() · 36fd71d2

Committed by Li Zefan on Sep 02, 2008

During the use of a dev_cgroup, we should guarantee the corresponding
cgroup won't be deleted (i.e.  via rmdir).  This can be done through
css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup
under rcu_read_lock.

And also remove checking NULL dev_cgroup, it won't be NULL since a task
always belongs to a cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

36fd71d2

cirrusfb: check_par fixes · 09a2910e

Committed by Krzysztof Helt on Sep 02, 2008

1. Check if virtual resolution fits into memory.
   Otherwise, Linux hangs during panning.
2. When selected use all available memory to
    maximize yres_virtual to speed up panning
   (previously also xres_virtual was increased).
3. Simplify memory restriction calculations.
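
A simplified sketch of the first two points, assuming a linear frame buffer
and whole-byte line lengths.  The struct, the function name and the
use_all_memory flag are hypothetical and do not mirror the driver's actual
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a video mode description. */
struct var_sketch {
	unsigned long xres_virtual;
	unsigned long yres_virtual;
	unsigned long bits_per_pixel;
};

/*
 * Reject modes whose virtual resolution does not fit into video memory.
 * When use_all_memory is set, grow only yres_virtual to the largest
 * value that still fits; xres_virtual is left untouched.
 * Returns 0 on success and -1 if the mode cannot be accepted.
 */
int check_virtual_size(struct var_sketch *var, unsigned long mem_bytes,
		       int use_all_memory)
{
	unsigned long line_bytes, max_lines;

	assert(var != NULL);

	line_bytes = var->xres_virtual * var->bits_per_pixel / 8;
	if (line_bytes == 0)
		return -1;

	max_lines = mem_bytes / line_bytes;
	if (var->yres_virtual > max_lines)
		return -1;

	if (use_all_memory)
		var->yres_virtual = max_lines;
	return 0;
}
```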
Signed-off-by: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09a2910e

pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits · 950bbabb

Committed by Oleg Nesterov on Sep 02, 2008

We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits.  As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.

Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().

Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed.  Kill
the now unneeded exit_child_reaper().

Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391
Reported-by: Robert Rex <robert.rex@exasol.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

950bbabb

pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing · add0d4df

Committed by Oleg Nesterov on Sep 02, 2008

zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.

Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.

Even if there are no children, it is not good that forget_original_parent()
uses reaper == NULL.

Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.

Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

add0d4df

mmc: at91_mci: don't use coherent dma buffers · e385ea63

Committed by David Brownell on Sep 02, 2008

At91_mci is abusing dma_free_coherent(), which may not be called with IRQs
disabled.  I saw "mkfs.ext3" on an MMC card objecting voluminously as each
write completed:

 WARNING: at arch/arm/mm/consistent.c:368 dma_free_coherent+0x2c/0x224()
 [<c002726c>] (dump_stack+0x0/0x14) from [<c00387d4>] (warn_on_slowpath+0x4c/0x68)
 [<c0038788>] (warn_on_slowpath+0x0/0x68) from [<c0028768>] (dma_free_coherent+0x2c/0x224)
  r6:00008008 r5:ffc06000 r4:00000000
 [<c002873c>] (dma_free_coherent+0x0/0x224) from [<c01918ac>] (at91_mci_irq+0x374/0x420)
 [<c0191538>] (at91_mci_irq+0x0/0x420) from [<c0065d9c>] (handle_IRQ_event+0x2c/0x6c)
 ...

This bug has been around for a LONG time.  The MM warning is from late
2005, but the driver merged a year later ...  so I'm puzzled why nobody
noticed this before now.

The fix involves noting that this buffer shouldn't be DMA-coherent; it's
just used for normal DMA writes.  So replace it with standard kmalloc()
buffering and DMA mapping calls.

This is the quickie fix.  A better one would not rely on allocating large
bounce buffers.  (Note that dma_alloc_coherent could have failed too, but
that case was ignored...  kmalloc is a bit more likely to fail though.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pierre Ossman <drzeus-mmc@drzeus.cx>
Cc: Andrew Victor <linux@maxim.org.za>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e385ea63
