Commits · c7ff0d9c92435e836e13aaa8d0e56d4000424bcc · openanolis / cloud-kernel

11 Aug, 2010 · 40 commits

panic: keep blinking in spite of long spin timer mode · c7ff0d9c

Committed by TAMUKI Shoichi on Aug 10, 2010

To keep panic_timeout accurate when running under a hypervisor, the
current implementation spins only in long (1 second) calls to mdelay().
That works well for accuracy, but the keyboard LEDs don't blink at all
in that situation.

This patch calls panic_blink_enter() between every mdelay() and so
keeps the LEDs blinking even in long spin timer mode.

Each mdelay() call is now 100ms.  Even with this change, panic_timeout
stays accurate enough when running under a hypervisor.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

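The loop described in the panic commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `delay_fn` and `blink_fn` are hypothetical stand-ins for the kernel's mdelay() and blink hook, and the sketch toggles the LED state on every step, which the real code may not do.

```c
#include <stddef.h>

#define PANIC_TIMER_STEP_MS 100   /* step length named in the commit message */

/* Hypothetical stand-ins for mdelay() and the keyboard-LED blink hook. */
typedef void (*delay_fn)(unsigned int ms);
typedef void (*blink_fn)(int state);

/*
 * Wait for timeout_s seconds in 100 ms steps, toggling the LED state
 * before each step. Returns the number of steps taken.
 */
unsigned long wait_with_blink(unsigned int timeout_s,
                              delay_fn delay, blink_fn blink)
{
    unsigned long total_ms = (unsigned long)timeout_s * 1000UL;
    unsigned long steps = 0;
    int state = 0;

    for (unsigned long waited = 0; waited < total_ms;
         waited += PANIC_TIMER_STEP_MS) {
        if (blink) {
            state ^= 1;          /* flip the LED state */
            blink(state);
        }
        if (delay)
            delay(PANIC_TIMER_STEP_MS);
        steps++;
    }
    return steps;
}
```

Because each busy-wait is short, the blink callback gets a chance to run ten times per second instead of once per second.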

afs: destroy work queue on init failure · bebf8cfa

Committed by Dan Carpenter on Aug 10, 2010

We can clean up the work queue on this error path.  This function is
called from afs_init().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: add DMA_xxBIT_MASK to feature-removal-schedule.txt · a35274cd

Committed by FUJITA Tomonori on Aug 10, 2010

DMA_xxBIT_MASK macros were marked as deprecated in June 2009.  One more
year is long enough, I think.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pci: add PCI DMA unmap state API to feature-removal-schedule.txt · 17583363

Committed by FUJITA Tomonori on Aug 10, 2010

It was replaced with the DMA unmap state API (which can be used for
any bus).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: DMA-API-HOWTO.txt: add multiple types of IOMMUs support · c31e74c4

Committed by FUJITA Tomonori on Aug 10, 2010

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: remove dma_is_consistent API · 3b9c6c11

Committed by FUJITA Tomonori on Aug 10, 2010

Architectures implement dma_is_consistent() in different ways (some
misinterpret the definition of API in DMA-API.txt).  So it hasn't been so
useful for drivers.  We have only one user of the API in tree.  Unlikely
out-of-tree drivers use the API.

Even if we fix dma_is_consistent() in some architectures, it doesn't look
useful at all.  It was invented long ago for some old systems that can't
allocate coherent memory at all.  It's better to export only APIs that are
definitely necessary for drivers.

Let's remove this API.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scsi: 53c700: remove dma_is_consistent usage · d80e0d96

Committed by FUJITA Tomonori on Aug 10, 2010

This driver is the only user of dma_is_consistent().  We plan to remove this
API.

The driver uses the API in the following way:

BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());

The above code checks that L1_CACHE_BYTES is not smaller than
dma_get_cache_alignment() on systems that cannot allocate coherent memory
(some old systems can't).

James Bottomley explained that this is necessary because the driver packs the
set of mailboxes into a single coherent area and separates the different
usages by a L1 cache stride.  So it's fatal if the dma

He also pointed out that we can kill this checking because we don't hit this
BUG_ON on all architectures that actually use the driver.

(akpm: stolen from the scsi tree because
dma-mapping-remove-dma_is_consistent-api.patch needs it)

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: parisc: set ARCH_DMA_MINALIGN · 7896bfa4

Committed by FUJITA Tomonori on Aug 10, 2010

Architectures that handle DMA-non-coherent memory need to set
ARCH_DMA_MINALIGN to make sure that a kmalloc'ed buffer is DMA-safe: the
buffer doesn't share a cache line with other buffers.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: unify dma_get_cache_alignment implementations · 4565f017

Committed by FUJITA Tomonori on Aug 10, 2010

dma_get_cache_alignment() returns the minimum DMA alignment.  Architectures
define it as ARCH_DMA_MINALIGN (formerly ARCH_KMALLOC_MINALIGN).  So we
can unify the dma_get_cache_alignment() implementations.

Note that some architectures implement dma_get_cache_alignment() wrongly.
dma_get_cache_alignment() should return the minimum DMA alignment, so
fully-coherent architectures should return 1.  This patch also fixes this
issue.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

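A unified helper of the kind this commit describes could look roughly like the sketch below. The function name `dma_min_alignment_sketch` and the commented-out value 128 are illustrative assumptions, not taken from the patch.

```c
/*
 * Uncomment to model an architecture with a DMA alignment restriction.
 * The value 128 is purely illustrative.
 */
/* #define ARCH_DMA_MINALIGN 128 */

/* Return the minimum DMA alignment; 1 means "no restriction". */
int dma_min_alignment_sketch(void)
{
#ifdef ARCH_DMA_MINALIGN
    return ARCH_DMA_MINALIGN;   /* architecture declared a restriction */
#endif
    return 1;                   /* fully coherent: any alignment is fine */
}
```

The point of keying on the macro is that a single generic definition covers every architecture, and a fully-coherent one gets 1 simply by not defining ARCH_DMA_MINALIGN.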

dma-mapping: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN · a6eb9fe1

Committed by FUJITA Tomonori on Aug 10, 2010

Currently each architecture has its own dma_get_cache_alignment()
implementation.

dma_get_cache_alignment() returns the minimum DMA alignment.  Architectures
define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that a kmalloc'ed
buffer is DMA-safe; the buffer doesn't share a cache line with others).  So
we can unify the dma_get_cache_alignment() implementations.

This patch:

dma_get_cache_alignment() needs to know whether an architecture defines
ARCH_KMALLOC_MINALIGN (that is, whether the architecture has a DMA
alignment restriction).  However, slab.h defines ARCH_KMALLOC_MINALIGN if
the architecture doesn't define it.

Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
(except for crypto).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: mpc85xx: add support for new MPCxxx/Pxxxx EDAC controllers · cd1542c8

Committed by Anton Vorontsov on Aug 10, 2010

Simply add proper IDs into the device table.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: i5400: improve handling of pci_enable_device() return value · b425d5c8

Committed by Kulikov Vasiliy on Aug 10, 2010

-EIO is not the only error code that pci_enable_device() may return, and
the set of possible errors may grow in the future.  We should compare the
return code with zero, not with a specific error value.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

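The difference between the two checks in the i5400 commit can be shown with a small userspace sketch. `enable_fn` is a hypothetical stand-in for pci_enable_device() (0 on success, negative errno on failure); the probe functions and fakes are invented for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for pci_enable_device(). */
typedef int (*enable_fn)(void);

/* Fragile: recognises only one specific error code. */
int probe_checks_eio_only(enable_fn enable)
{
    if (enable() == -EIO)
        return -EIO;    /* failure detected */
    return 0;           /* any other error is mistaken for success */
}

/* Robust: any non-zero return is treated as failure. */
int probe_checks_nonzero(enable_fn enable)
{
    int rc = enable();

    if (rc)
        return rc;      /* propagate whatever error was reported */
    return 0;
}

/* Fake devices used to exercise both variants. */
int fake_enable_enomem(void) { return -ENOMEM; }
int fake_enable_ok(void)     { return 0; }
```

With `fake_enable_enomem`, the first variant reports success while the second propagates -ENOMEM, which is the failure mode the commit message warns about.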

edac: i5000: improve handling of pci_enable_device() return value · 44aa80f0

Committed by Kulikov Vasiliy on Aug 10, 2010

-EIO is not the only error code that pci_enable_device() may return, and
the set of possible errors may grow in the future.  We should compare the
return code with zero, not with a specific error value.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: add missing pieces from MPC85xx -> FSL_SOC_BOOKE · bd1688dc

Committed by Christoph Egger on Aug 10, 2010

In 5753c082 ("powerpc/85xx: Kconfig cleanup") menuconfig MPC85xx was
replaced by FSL_SOC_BOOKE but some references inside the code were not
adjusted accordingly.  This patch addresses these missing pieces.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pids: alloc_pidmap: remove the unnecessary boundary checks · c52b0b91

Committed by Oleg Nesterov on Aug 10, 2010

alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map->page twice.  This is correct, we want to find the
unused bits < offset in this bitmap block.  Add the comment.

But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map->page for the second time.  We have already
checked the bits >= offset during the first attempt, but it is fine to
do this again, no matter if we succeed this time or not.

Remove this hard-to-understand code.  It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pids: fix a race in pid generation that causes pids to be reused immediately · 5fdee8c4

Committed by Salman on Aug 10, 2010

A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program.  This is really bad for bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.

Race Description:

A                                    B

// pid == offset == n                // pid == offset == n + 1
test_and_set_bit(offset, map->page)
                                     test_and_set_bit(offset, map->page);
                                     pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
                                     // pid == n + 1 is freed (wait())

                                     // Next fork()...
                                     last = pid_ns->last_pid; // == n
                                     pid = last + 1;

Code to reproduce it (Running multiple instances is more effective):

#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
  return (second + 32768 - first) % 32768;
}

int main(int argc, char* argv[]) {
  int failed = 0;
  pid_t last_pid = 0;
  int i;
  printf("%zu\n", sizeof(pid_t));
  for (i = 0; i < 10000000; ++i) {
    if (i % 32768 == 0)
      printf("Iter: %d\n", i/32768);
    int child_exit_code = i % 256;
    pid_t pid = fork();
    if (pid == -1) {
      fprintf(stderr, "fork failed, iteration %d, errno=%d\n", i, errno);
      exit(1);
    }
    if (pid == 0) {
      // Child
      exit(child_exit_code);
    } else {
      // Parent
      if (i > 0) {
        int distance = PidDistance(last_pid, pid);
        if (distance == 0 || distance > 30000) {
          fprintf(stderr,
                  "Unexpected pid sequence: previous fork: pid=%d, "
                  "current fork: pid=%d for iteration=%d.\n",
                  last_pid, pid, i);
          failed = 1;
        }
      }
      last_pid = pid;
      int status;
      int reaped = wait(&status);
      if (reaped != pid) {
        fprintf(stderr,
                "Wait return value: expected pid=%d, "
                "got %d, iteration %d\n",
                pid, reaped, i);
        failed = 1;
      } else if (WEXITSTATUS(status) != child_exit_code) {
        fprintf(stderr,
                "Unexpected exit status %x, iteration %d\n",
                WEXITSTATUS(status), i);
        failed = 1;
      }
    }
  }
  exit(failed);
}

Thanks to Ted Tso for the key ideas of this implementation.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

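The pid-reuse message above describes the race but does not spell out the fix. One plausible way to stop last_pid from moving backwards is a compare-and-swap loop that only stores a pid that is ahead of the current value. The sketch below uses C11 atomics; `last_pid` and `publish_last_pid` are hypothetical names, and wraparound at pid_max is deliberately ignored.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for pid_ns->last_pid. */
static _Atomic int last_pid;

/*
 * Publish `pid` as the most recently allocated pid, but never move
 * last_pid backwards. Wraparound at pid_max is ignored for brevity.
 * Returns the value of last_pid after the call.
 */
int publish_last_pid(int pid)
{
    int prev = atomic_load(&last_pid);

    while (prev < pid) {
        /* On failure, prev is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&last_pid, &prev, pid))
            return pid;
    }
    return prev;   /* someone already published a later pid */
}
```

In the race table, B publishes n + 1 first and A then tries to publish n; with this guard A's late store is dropped, so the next fork() starts from n + 1 rather than reusing it.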

partitions: fix sometimes unreadable partition strings · 9c867fbe

Committed by Alexey Dobriyan on Aug 10, 2010

Fix this garbage happening quite often:

==>	 sda:
	scsi 3:0:0:0: CD-ROM            TOSHIBA
==>	 sda1 sda2 sda3 sda4 <sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
			    ^^^
	Uniform CD-ROM driver Revision: 3.20
	sr 3:0:0:0: Attached scsi CD-ROM sr0
==>	 sda5 sda6 sda7 >

Make "sda: sda1 ..." lines actually lines.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cs5535-mfgpt: reuse timers that have never been set up · ecd62691

Committed by Jens Rottmann on Aug 10, 2010

The MFGPT hardware may be set up only once, therefore
cs5535_mfgpt_free_timer() didn't re-set the timer's "avail" bit.  However
if a timer is freed before it has actually been in use then it may be made
available again.

Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

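The rule in the cs5535-mfgpt commit (a freed timer becomes available again only if its write-once setup was never used) can be modelled as below. The `timer_pool` structure, its field names, and NUM_TIMERS are assumptions made for this sketch; the real driver tracks this state in hardware registers and a bitmap.

```c
#include <stdbool.h>

#define NUM_TIMERS 8   /* illustrative count */

struct timer_pool {
    bool avail[NUM_TIMERS];      /* may this timer be handed out? */
    bool configured[NUM_TIMERS]; /* has the one-shot hardware setup been used? */
};

/*
 * Release timer `nr`. It becomes available again only if its
 * write-once setup was never performed.
 */
void pool_free_timer(struct timer_pool *p, int nr)
{
    if (!p->configured[nr])
        p->avail[nr] = true;
}
```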

drivers/char/n_gsm.c: add missing spin_unlock_irqrestore · e73790a5

Committed by Julia Lawall on Aug 10, 2010

Add a spin_unlock_irqrestore missing on the error path.  Converting the
return to break leads to the spin_unlock_irqrestore at the end of the
function.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock_irqsave(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock_irqrestore(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

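The shape of the n_gsm fix, turning an early return into a break so that the single unlock at the end of the function always runs, can be sketched in plain C. The counting `toy_lock` is a hypothetical stand-in for spin_lock_irqsave()/spin_unlock_irqrestore(), and `sum_until_error` is an invented example function.

```c
/* Toy lock that only counts acquisitions; stands in for a spinlock. */
struct toy_lock { int held; };

static void toy_acquire(struct toy_lock *l) { l->held++; }
static void toy_release(struct toy_lock *l) { l->held--; }

/*
 * Sum `len` entries; a negative entry is an error. Using `break`
 * instead of `return` inside the loop lets the error path fall
 * through to the single unlock at the end of the function.
 */
int sum_until_error(struct toy_lock *l, const int *vals, int len)
{
    int total = 0;

    toy_acquire(l);
    for (int i = 0; i < len; i++) {
        if (vals[i] < 0) {
            total = -1;   /* record the error ... */
            break;        /* ... and leave via the common exit */
        }
        total += vals[i];
    }
    toy_release(l);
    return total;
}
```

After either path, `held` is back to zero, which is the invariant the semantic patch above checks for.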

ipmi: print info for spmi and smbios paths like acpi and pci · 7bb671e3

Committed by Yinghai Lu on Aug 10, 2010

Print out the reg spacing and size for spmi and smbios so BIOS developers
can make them consistent.

Also remove extra PFX on the duplicating path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Corey Minyard <minyard@acm.org>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipmi: fix memleaking for add_smi when duplicating happen · 7faefea6

Committed by Yinghai Lu on Aug 10, 2010

Free the temporary info struct when we have duplicated ones.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Corey Minyard <minyard@acm.org>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/char/ipmi/ipmi_si_intf.c: fix warning: variable 'addr_space' set but not used · f46c77c2

Committed by Justin P. Mattock on Aug 10, 2010

Fix a warning message generated by GCC, and also update a web address
pointing to a PDF containing information.

CC [M]  drivers/char/ipmi/ipmi_si_intf.o
drivers/char/ipmi/ipmi_si_intf.c: In function 'try_init_spmi':
drivers/char/ipmi/ipmi_si_intf.c:2016:8: warning: variable 'addr_space' set but not used

Signed-off-by: Sergey V. <sftp.mtuci@gmail.com>
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: simplify conditional processing of fs/proc.o. · cfbef3cb

Committed by Robert P. J. Day on Aug 10, 2010

Since the entire fs/proc directory is conditionally included based on
CONFIG_PROC_FS, it's redundant to check that same variable within that
directory.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

signalfd: fill in ssi_int for posix timers and message queues · a2a20c41

Committed by Nathan Lynch on Aug 10, 2010

If signalfd is used to consume a signal generated by a POSIX interval
timer or POSIX message queue, the ssi_int field does not reflect the data
(sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3).  (The
ssi_ptr field, however, is filled in.)

This behavior differs from signalfd's treatment of sigqueue-generated
signals -- see the default case in signalfd_copyinfo.  It also gives
results that differ from the case when a signal is handled conventionally
via a sigaction-registered handler.

So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER,
__SI_MESGQ) where ssi_ptr is set.

akpm: a non-back-compatible change.  Merge into -stable to minimise the
number of kernels which are in the field and which miss this feature.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: optimize exit_ptrace() for the likely case · c7e49c14

Committed by Oleg Nesterov on Aug 10, 2010

exit_ptrace() takes tasklist_lock unconditionally.  We need this lock to
avoid the race with ptrace_traceme(), it acts as a barrier.

Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock.  Change exit_ptrace() to drop and reacquire this lock if
needed.

This allows us to add the fastpath list_empty(ptraced) check.  In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.

"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> suggested to add this
check, and he reports that this change adds about 11% improvement in some
tests.

Suggested-and-tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

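The fast path described in the exit_ptrace() commit can be illustrated with a simplified model. Everything here is an assumption for illustration: the singly linked `tracee` list replaces the kernel's list_head, the caller is imagined to hold the lock already, and `lock_round_trips` merely counts how often the slow path would drop and retake it.

```c
#include <stdbool.h>
#include <stddef.h>

struct tracee { struct tracee *next; };

struct tracer {
    struct tracee *ptraced;   /* NULL when nothing is being traced */
    int lock_round_trips;     /* counts drop + reacquire pairs */
};

/*
 * Called with the (imaginary) list lock already held by the caller.
 * Returns true if the slow path ran.
 */
bool exit_ptrace_sketch(struct tracer *t)
{
    if (t->ptraced == NULL)   /* fast path: no tracees, nothing to do */
        return false;

    /* Slow path: detach every tracee while the lock is held ... */
    while (t->ptraced)
        t->ptraced = t->ptraced->next;

    /* ... then drop and retake the lock for the follow-up work. */
    t->lock_round_trips++;
    return true;
}
```

In the common no-tracees case the function returns after one pointer comparison, so no extra lock and unlock pair is paid.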

memcg: convert to use zone_to_nid() from bare zone->zone_pgdat->node_id · 13d7e3a2

Committed by KOSAKI Motohiro on Aug 10, 2010

We have zone_to_nid().  This patch converts all existing users of
zone->zone_pgdat->node_id to use it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: remove nid and zid argument from mem_cgroup_soft_limit_reclaim() · 00918b6a

Committed by KOSAKI Motohiro on Aug 10, 2010

mem_cgroup_soft_limit_reclaim() has zone, nid and zid arguments, but nid
and zid can be calculated from zone.  So remove them.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: mem_cgroup_shrink_node_zone() doesn't need sc.nodemask · 14fec796

Committed by KOSAKI Motohiro on Aug 10, 2010

Currently mem_cgroup_shrink_node_zone() calls shrink_zone() directly.  Thus
it doesn't need to initialize sc.nodemask, because shrink_zone() doesn't
use it at all.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: kill unnecessary initialization in mem_cgroup_shrink_node_zone() · da280d63

Committed by KOSAKI Motohiro on Aug 10, 2010

sc.nr_reclaimed and sc.nr_scanned have already been initialized a few lines
above by the "struct scan_control sc = {}" statement.

So this patch removes the unnecessary code.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: sc.nr_to_reclaim should be initialized · b8f5c566

Committed by KOSAKI Motohiro on Aug 10, 2010

Currently, mem_cgroup_shrink_node_zone() initializes sc.nr_to_reclaim to 0.
This means shrink_zone() scans only 32 pages and returns immediately, even if
it doesn't reclaim any pages.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: avoid css_get() · f75ca962

Committed by KAMEZAWA Hiroyuki on Aug 10, 2010

Now, memory cgroup increments css(cgroup subsys state)'s reference count
for each charged page, and the reference count is kept until the page is
uncharged.  But this has several bad effects.

 1. Because css_get/put calls atomic_inc()/dec, heavy call of them
    on large smp will not scale well.
 2. Because css's refcnt cannot be in a state as "ready-to-release",
    cgroup's notify_on_release handler can't work with memcg.
 3. css's refcnt is atomic_t, it means smaller than 32bit. Maybe too small.

This has been a problem since the 1st merge of memcg.

This is a trial to remove css's refcnt per a page. Even if we remove
refcnt, pre_destroy() does enough synchronization as
  - check res->usage == 0.
  - check no pages on LRU.

This patch removes css's refcnt per page.  Even after this patch, at the
1st look, it seems css_get() is still called in try_charge().

But the logic is.

  - If a memcg of mm->owner is cached one, consume_stock() will work.
    At success, return immediately.
  - If consume_stock returns false, css_get() is called and go to
    slow path which may be blocked. At the end of slow path,
    css_put() is called and restart from the start if necessary.

So, in the fast path, we don't call css_get() and can avoid access to
shared counter. This patch can make the most possible case fast.

Here is a result of multi-threaded page fault benchmark.

[Before]
    25.32%  multi-fault-all  [kernel.kallsyms]      [k] clear_page_c
     9.30%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
     8.02%  multi-fault-all  [kernel.kallsyms]      [k] try_get_mem_cgroup_from_mm <=====(*)
     7.83%  multi-fault-all  [kernel.kallsyms]      [k] down_read_trylock
     5.38%  multi-fault-all  [kernel.kallsyms]      [k] __css_put
     5.29%  multi-fault-all  [kernel.kallsyms]      [k] __alloc_pages_nodemask
     4.92%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irq
     4.24%  multi-fault-all  [kernel.kallsyms]      [k] up_read
     3.53%  multi-fault-all  [kernel.kallsyms]      [k] css_put
     2.11%  multi-fault-all  [kernel.kallsyms]      [k] handle_mm_fault
     1.76%  multi-fault-all  [kernel.kallsyms]      [k] __rmqueue
     1.64%  multi-fault-all  [kernel.kallsyms]      [k] __mem_cgroup_commit_charge

[After]
    28.41%  multi-fault-all  [kernel.kallsyms]      [k] clear_page_c
    10.08%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irq
     9.58%  multi-fault-all  [kernel.kallsyms]      [k] down_read_trylock
     9.38%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
     5.86%  multi-fault-all  [kernel.kallsyms]      [k] __alloc_pages_nodemask
     5.65%  multi-fault-all  [kernel.kallsyms]      [k] up_read
     2.82%  multi-fault-all  [kernel.kallsyms]      [k] handle_mm_fault
     2.64%  multi-fault-all  [kernel.kallsyms]      [k] mem_cgroup_add_lru_list
     2.48%  multi-fault-all  [kernel.kallsyms]      [k] __mem_cgroup_commit_charge

Then, 8.02% of try_get_mem_cgroup_from_mm() disappears because this patch
removes css_tryget() in it. (But yes, this is an extreme case.)

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: use find_lock_task_mm() in memory cgroups oom · 158e0a2d

Committed by KAMEZAWA Hiroyuki on Aug 10, 2010

When the OOM killer scans tasks, it checks whether a task is under a memcg
when it's called in memcg's context.

But, as Oleg pointed out, a thread group leader may have a NULL ->mm
and task_in_mem_cgroup() may make the wrong decision.  We have to use
find_lock_task_mm() in memcg as the generic OOM killer does.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: remove mem from arg of charge_common · 73045c47

Committed by Daisuke Nishimura on Aug 10, 2010

mem_cgroup_charge_common() is always called with @mem = NULL, so it's
meaningless.  This patch removes it.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: remove redundant code · bd0d24bf

Committed by Daisuke Nishimura on Aug 10, 2010

- try_get_mem_cgroup_from_mm() calls rcu_read_lock/unlock by itself, so we
  don't have to call them in task_in_mem_cgroup().
- *mz is not used in __mem_cgroup_uncharge_common().
- we don't have to call lookup_page_cgroup() in mem_cgroup_end_migration()
  after we've cleared PCG_MIGRATION of @oldpage.
- remove empty comment.
- remove redundant empty line in mem_cgroup_cache_charge().

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: clean up waiting move acct · 2bd9bb20

Committed by KAMEZAWA Hiroyuki on Aug 10, 2010

Now, for checking a memcg is under task-account-moving, we do css_tryget()
against mc.to and mc.from.  But this is just complicating things.  This
patch makes the check easier.

This patch adds a spinlock to move_charge_struct and guards modification of
mc.to and mc.from.  With this, we don't have to think about complicated
races around this non-critical path.

[balbir@linux.vnet.ibm.com: don't crash on a null memcg being passed]

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: clean up try_charge main loop · 4b534334

Committed by KAMEZAWA Hiroyuki on Aug 10, 2010

mem_cgroup_try_charge() has a big loop in it and seems to be hard to read.
Most of the routines are for the slow path.  This patch moves code out of the
loop and makes it clear what's done.

Summary:
 - refactoring a function to detect whether a memcg is under account move.
 - refactoring a function to wait for the end of moving task acct.
 - refactoring a main loop('s slow path) as a function and make it clear
   why we retry or quit by return code.
 - add fatal_signal_pending() check for bypassing charge loops.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: remove experimental from swap account config · 65e0e811

Committed by KAMEZAWA Hiroyuki on Aug 10, 2010

It's 11 months since we changed swap_map[] to indicate SWAP_HAS_CACHE.
Since then, memcg's swap accounting has been very stable and it seems
it can be maintained.

So, I'd like to remove EXPERIMENTAL from the config.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

blkdev: cgroup whitelist permission fix · b7300b78

Committed by Chris Wright on Aug 10, 2010

The cgroup device whitelist code gets confused when trying to grant
permission to a disk partition that is not currently open.  Part of
blkdev_open() includes __blkdev_get() on the whole disk.

Basically, the only ways to reliably allow a cgroup access to a partition
on a block device when using the whitelist are to 1) also give it access
to the whole block device or 2) make sure the partition is already open in
a different context.

The patch avoids the cgroup check for the whole disk case when opening a
partition.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroups: save space for the terminator · e400c285

Committed by Dan Carpenter on Aug 10, 2010

The original code didn't leave enough space for a NUL terminator.  These
strings are copied with strcpy() into fixed length buffers in
cgroup_root_from_opts().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

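The off-by-one behind the cgroups terminator fix can be illustrated with a length check that reserves one byte for the terminator before calling strcpy(). NAME_BUF_LEN and `copy_name` are hypothetical names chosen for this sketch; the kernel uses its own constants and validates the strings while parsing mount options.

```c
#include <string.h>

#define NAME_BUF_LEN 64   /* illustrative size, not the kernel's constant */

/*
 * Copy `src` into a fixed NAME_BUF_LEN buffer. A string of length
 * NAME_BUF_LEN - 1 is the longest that fits, because one byte must be
 * left for the terminating NUL. Returns 0 on success, -1 if too long.
 */
int copy_name(char dst[NAME_BUF_LEN], const char *src)
{
    if (strlen(src) >= NAME_BUF_LEN)   /* ">" here would be off by one */
        return -1;
    strcpy(dst, src);
    return 0;
}
```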

Documentation/padata.txt: fix typos etc. · 2b24706a

Committed by Randy Dunlap on Aug 10, 2010

Fix typos & grammar.
Use CPU instead of cpu in text.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
