提交 · fca78d6d2c77f87d7dbee89bbe4836a44da881e2 · openanolis / cloud-kernel

13 7月, 2011 3 次提交

NFS: Add SECINFO_NO_NAME procedure · fca78d6d

由 Bryan Schumaker 提交于 6月 02, 2011

If the client is using NFS v4.1, then we can use SECINFO_NO_NAME to find
the secflavor for the initial mount.  If the server doesn't support
SECINFO_NO_NAME then I fall back on the "guess and check" method used
for v4.0 mounts.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

fca78d6d

NFS: move pnfs layouts to nfs_server structure · 6382a441

由 Weston Andros Adamson 提交于 6月 01, 2011

Layouts should be tracked per nfs_server (aka superblock)
instead of per struct nfs_client, which may have multiple FSIDs associated
with it.
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

6382a441

NFS: use scope from exchange_id to skip reclaim · 78fe0f41

由 Weston Andros Adamson 提交于 5月 31, 2011

can be skipped if the "eir_server_scope" from the exchange_id proc differs from
previous calls.

Also, in the future server_scope will be useful for determining whether client
trunking is available
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

78fe0f41

28 6月, 2011 7 次提交

mm: fix assertion mapping->nrpages == 0 in end_writeback() · 08142579

由 Jan Kara 提交于 6月 27, 2011

Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger.  This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:

	CPU0				CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
					evict_inode()
					  truncate_inode_pages()
					    truncate_inode_pages_range()
					      pagevec_lookup() - finds nothing
					  end_writeback()
					    mapping->nrpages != 0 -> BUG
        page->mapping = NULL
        mapping->nrpages--

Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().

Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out
by Miklos Szeredi <mszeredi@suse.de>.

Cc: Jay <jinshan.xiong@whamcloud.com>
Cc: Miklos Szeredi <mszeredi@suse.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08142579

include/linux/compat.h: declare compat_sys_sendmmsg() · 507c5f12

由 Chris Metcalf 提交于 6月 27, 2011

This is required for tilegx to be able to use the compat unistd.h header
where compat_sys_sendmmsg() is now mentioned.
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

507c5f12

tmpfs: add shmem_read_mapping_page_gfp · d9d90e5e

由 Hugh Dickins 提交于 6月 27, 2011

Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's ->readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.

At present tmpfs provides a ->readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its ->readpage.
Provide shmem_read_mapping_page_gfp() now, ready for that transition,

Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
with shmem_read_mapping_page() inline for the common mapping_gfp case.

(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
read_mapping_page functions use the mapping's ->readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d9d90e5e

tmpfs: take control of its truncate_range · 94c1e62d

由 Hugh Dickins 提交于 6月 27, 2011

2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the ->truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.

Let's rename shmem_notify_change() to shmem_setattr(). Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform its
own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there is no need to call it a third time.

Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
that i915_gem_object_truncate() can call it explicitly in future; get
this patch in first, then update drm/i915 once this is available (until
then, i915 will just be doing the truncate_inode_pages() twice).

Though introduced five years ago, no other filesystem is implementing
->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon ->truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

94c1e62d

mm: move shmem prototypes to shmem_fs.h · 072441e2

由 Hugh Dickins 提交于 6月 27, 2011

Before adding any more global entry points into shmem.c, gather such
prototypes into shmem_fs.h.  Remove mm's own declarations from swap.h,
but for now leave the ones in mm.h: because shmem_file_setup() and
shmem_zero_setup() are called from various places, and we should not
force other subsystems to update immediately.
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

072441e2

Fix some kernel-doc warnings · 4d258b25

由 Vitaliy Ivanov 提交于 6月 27, 2011

Fix 'make htmldocs' warnings:

Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'
Signed-off-by: NVitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d258b25

Fix node_start/end_pfn() definition for mm/page_cgroup.c · c6830c22

由 KAMEZAWA Hiroyuki 提交于 6月 16, 2011

commit 21a3c964 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.

Then, we see
  mm/page_cgroup.c: In function 'page_cgroup_init':
  mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
  mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'

So, fixiing page_cgroup.c is an idea...

But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)

This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.

A result of macro expansion is here (mm/page_cgroup.c)

for !NUMA
 start_pfn = ((&contig_page_data)->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

for NUMA (x86-64)
  start_pfn = ((node_data[nid])->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

Changelog:
 - fixed to avoid using "nid" twice in node_end_pfn() macro.
Reported-and-acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c6830c22

22 6月, 2011 2 次提交

PM: Fix async resume following suspend failure · 6d0e0e84

由 Alan Stern 提交于 6月 18, 2011

The PM core doesn't handle suspend failures correctly when it comes to
asynchronously suspended devices.  These devices are moved onto the
dpm_suspended_list as soon as the corresponding async thread is
started up, and they remain on the list even if they fail to suspend
or the sleep transition is cancelled before they get suspended.  As a
result, when the PM core unwinds the transition, it tries to resume
the devices even though they were never suspended.

This patch (as1474) fixes the problem by adding a new "is_suspended"
flag to dev_pm_info.  Devices are resumed only if the flag is set.

[rjw:
 * Moved the dev->power.is_suspended check into device_resume(),
   because we need to complete dev->power.completion and clear
   dev->power.is_prepared too for devices whose
   dev->power.is_suspended flags are unset.
 * Fixed __device_suspend() to avoid setting dev->power.is_suspended
   if async_error is different from zero.]
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org

6d0e0e84

PM: Rename dev_pm_info.in_suspend to is_prepared · f76b168b

由 Alan Stern 提交于 6月 18, 2011

This patch (as1473) renames the "in_suspend" field in struct
dev_pm_info to "is_prepared", in preparation for an upcoming change.
The new name is more descriptive of what the field really means.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org

f76b168b

21 6月, 2011 2 次提交

vfs: i_state needs to be 'unsigned long' for now · 79568f5b

由 Linus Torvalds 提交于 6月 20, 2011

Commit 13e12d14 ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit changed i_state to be unsigned int instead of
unsigned long.  That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).

However, Meelis Roos reports that this results in unaligned exceptions
on sprc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops.  Which want 'unsigned long',
not 'unsigned int'.

We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change.  So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).

Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.

It _would_ be worth fixing the bitlocking code, though.  Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment).  So it's
currently a very easy problem to trigger by mistake and never notice.
Reported-by: NMeelis Roos <mroos@linux.ee>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79568f5b

NFSv4.1: file layout must consider pg_bsize for coalescing · 19345cb2

由 Benny Halevy 提交于 6月 19, 2011

Otherwise we end up overflowing the rpc buffer size on the receive end.
Signed-off-by: NBenny Halevy <benny@tonian.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

19345cb2

20 6月, 2011 2 次提交

devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper · 482e0cd3

由 Al Viro 提交于 6月 19, 2011

inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path
straight...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

482e0cd3

block: add REQ_SECURE to REQ_COMMON_MASK · 155d109b

由 Namhyung Kim 提交于 6月 20, 2011

Add REQ_SECURE flag to REQ_COMMON_MASK so that
init_request_from_bio() can pass it to @req->cmd_flags.
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: stable@kernel.org # 2.6.36 and newer
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

155d109b

19 6月, 2011 1 次提交

mmc: Add PCI fixup quirks for Ricoh 1180:e823 reader · be98ca65

由 Manoj Iyer 提交于 5月 26, 2011

Signed-off-by: NManoj Iyer <manoj.iyer@canonical.com>
Cc: <stable@kernel.org>
Signed-off-by: NChris Ball <cjb@laptop.org>

be98ca65

18 6月, 2011 2 次提交

Input: sh_keysc - 8x8 MODE_6 fix · cca23d0b

由 Magnus Damm 提交于 6月 18, 2011

According to the data sheet for G4, AP4 and AG5 KEYSC MODE_6 is 8x8 keys.
Bump up MAXKEYS to 64 too.
Signed-off-by: NMagnus Damm <damm@opensource.se>
Reviewed-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

cca23d0b

KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring · 87966996

由 David Howells 提交于 6月 17, 2011

____call_usermodehelper() now erases any credentials set by the
subprocess_inf::init() function.  The problem is that commit
17f60a7d ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function.  This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init().  That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

This prevents request_key() from working as it is prevented from passing
the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key.  This causes the in-kernel DNS resolver to fail
with ENOKEY unconditionally.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Tested-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87966996

17 6月, 2011 2 次提交

generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts · d8ad7d11

由 Takao Indoh 提交于 3月 29, 2011

There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.

To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.

The details of the crash are:

 (1) 2nd kernel boots up

 (2) A pending IPI from 1st kernel comes when irqs are first enabled
     in start_kernel().

 (3) Kernel tries to handle the interrupt, but call_single_queue
     is not initialized yet at this point. As a result, in the
     generic_smp_call_function_single_interrupt(), NULL pointer
     dereference occurs when list_replace_init() tries to access
     &q->list.next.

Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().
Signed-off-by: NTakao Indoh <indou.takao@jp.fujitsu.com>
Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

d8ad7d11

clocksource: Make watchdog robust vs. interruption · b5199515

由 Thomas Gleixner 提交于 6月 16, 2011

The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.

The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.
Reported-and-tested-by: NVernon Mauery <vernux@us.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org

b5199515

16 6月, 2011 12 次提交

netfilter: nf_nat: avoid double seq_adjust for loopback · 42c1edd3

由 Julian Anastasov 提交于 6月 16, 2011

	Avoid double seq adjustment for loopback traffic
because it causes silent repetition of TCP data. One
example is passive FTP with DNAT rule and difference in the
length of IP addresses.

	This patch adds check if packet is sent and
received via loopback device. As the same conntrack is
used both for outgoing and incoming direction, we restrict
seq adjustment to happen only in POSTROUTING.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

42c1edd3

gpio: add GPIOF_ values regardless on kconfig settings · c001fb72

由 Randy Dunlap 提交于 6月 14, 2011

Make GPIOF_ defined values available even when GPIOLIB nor GENERIC_GPIO
is enabled by moving them to <linux/gpio.h>.

Fixes these build errors in linux-next:
sound/soc/codecs/ak4641.c:524: error: 'GPIOF_OUT_INIT_LOW' undeclared (first use in this function)
sound/soc/codecs/wm8915.c:2921: error: 'GPIOF_OUT_INIT_LOW' undeclared (first use in this function)
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

c001fb72

drm/radeon: workaround a hw bug on some radeon chipsets with all-0 EDIDs. · 4a9a8b71

由 Dave Airlie 提交于 6月 14, 2011

Some RS690 chipsets seem to end up with floating connectors, either
a DVI connector isn't actually populated, or an add-in HDMI card
is available but not installed. In this case we seem to get a NULL byte
response for each byte of the i2c transaction, so we detect this
case and if we see it we don't do anymore DDC transactions on this
connector.

I've tested this on my RS690 without the HDMI card installed and
it seems to work fine.
Signed-off-by: NDave Airlie <airlied@redhat.com>
Reviewed-by: NAlex Deucher <alexdeucher@gmail.com>

4a9a8b71

include/asm-generic/pgtable.h: fix unbalanced parenthesis · 49b24d6b

由 Nicolas Kaiser 提交于 6月 15, 2011

Signed-off-by: NNicolas Kaiser <nikai@nikai.net>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49b24d6b

uts: make default hostname configurable, rather than always using "(none)" · bd5dc17b

由 Josh Triplett 提交于 6月 15, 2011

The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist.  Distribution init scripts have the same
fallback.  However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs.  Furthermore, "(none)" doesn't typically resolve to anything
useful.

Make the default hostname configurable.  This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration.  Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.
Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NDavid Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bd5dc17b

BUILD_BUG_ON_ZERO: fix sparse breakage · ca39599c

由 Dr. David Alan Gilbert 提交于 6月 15, 2011

BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
CHECKER case otherwise various users of it become syntactically invalid.
Signed-off-by: NDr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ca39599c

mm: increase RECLAIM_DISTANCE to 30 · 32e45ff4

由 KOSAKI Motohiro 提交于 6月 15, 2011

Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
a very traditional single-process model.

  * a master process which reads config files and manages the other
    process
  * multiple imapd processes, one per connection
  * multiple pop3d processes, one per connection
  * multiple lmtpd processes, one per connection
  * periodical "cleanup" processes.

There are thousands of independent processes.  The problem is, recent
Intel motherboard turn on zone_reclaim_mode by default and traditional
prefork model software don't work well on it.  Unfortunatelly, such models
are still typical even in the 21st century.  We can't ignore them.

This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
such relatively cheap 2-4 socket machine are often used for traditional
servers as above.  The intention is that these machines don't use
zone_reclaim_mode.

Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
This patch doesn't change such high-end NUMA machine behavior.

Dave Hansen said:

: I know specifically of pieces of x86 hardware that set the information
: in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
: behavior which that implies.
:
: They've done performance testing and run very large and scary benchmarks
: to make sure that they _want_ this turned on.  What this means for them
: is that they'll probably be de-optimized, at least on newer versions of
: the kernel.
:
: If you want to do this for particular systems, maybe _that_'s what we
: should do.  Have a list of specific configurations that need the
: defaults overridden either because they're buggy, or they have an
: unusual hardware configuration not really reflected in the distance
: table.

And later said:

: The original change in the hardware tables was for the benefit of a
: benchmark.  Said benchmark isn't going to get run on mainline until the
: next batch of enterprise distros drops, at which point the hardware where
: this was done will be irrelevant for the benchmark.  I'm sure any new
: hardware will just set this distance to another yet arbitrary value to
: make the kernel do what it wants.  :)
:
: Also, when the hardware got _set_ to this initially, I complained.  So, I
: guess I'm getting my way now, with this patch.  I'm cool with it.
Reported-by: NRobert Mueller <robm@fastmail.fm>
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

32e45ff4

kmsg_dump.h: fix build when CONFIG_PRINTK is disabled · ac562241

由 Randy Dunlap 提交于 6月 15, 2011

Fix <linux/kmsg_dump.h> when CONFIG_PRINTK is not enabled:

include/linux/kmsg_dump.h:56: error: 'EINVAL' undeclared (first use in this function)
include/linux/kmsg_dump.h:61: error: 'EINVAL' undeclared (first use in this function)

Looks like commit 595dd3d8 ("kmsg_dump: fix build for
CONFIG_PRINTK=n") uses EINVAL without having the needed header file(s),
but I'm sure that I build tested that patch also. oh well.
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ac562241

vmscan: implement swap token priority aging · d7911ef3

由 KOSAKI Motohiro 提交于 6月 15, 2011

While testing for memcg aware swap token, I observed a swap token was
often grabbed an intermittent running process (eg init, auditd) and they
never release a token.

Why?

Some processes (eg init, auditd, audispd) wake up when a process exiting.
And swap token can be get first page-in process when a process exiting
makes no swap token owner.  Thus such above intermittent running process
often get a token.

And currently, swap token priority is only decreased at page fault path.
Then, if the process sleep immediately after to grab swap token, the swap
token priority never be decreased.  That's obviously undesirable.

This patch implement very poor (and lightweight) priority aging.  It only
be affect to the above corner case and doesn't change swap tendency
workload performance (eg multi process qsbench load)
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7911ef3

vmscan: implement swap token trace · 83cd81a3

由 KOSAKI Motohiro 提交于 6月 15, 2011

This is useful for observing swap token activity.

example output:

             zsh-1845  [000]   598.962716: update_swap_token_priority:
mm=ffff88015eaf7700 old_prio=1 new_prio=0
          memtoy-1830  [001]   602.033900: update_swap_token_priority:
mm=ffff880037a45880 old_prio=947 new_prio=949
          memtoy-1830  [000]   602.041509: update_swap_token_priority:
mm=ffff880037a45880 old_prio=949 new_prio=951
          memtoy-1830  [000]   602.051959: update_swap_token_priority:
mm=ffff880037a45880 old_prio=951 new_prio=953
          memtoy-1830  [000]   602.052188: update_swap_token_priority:
mm=ffff880037a45880 old_prio=953 new_prio=955
          memtoy-1830  [001]   602.427184: put_swap_token:
token_mm=ffff880037a45880
             zsh-1789  [000]   602.427281: replace_swap_token:
old_token_mm=          (null) old_prio=0 new_token_mm=ffff88015eaf7018
new_prio=2
             zsh-1789  [001]   602.433456: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=2 new_prio=4
             zsh-1789  [000]   602.437613: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=4 new_prio=6
             zsh-1789  [000]   602.443924: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=6 new_prio=8
             zsh-1789  [000]   602.451873: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=8 new_prio=10
             zsh-1789  [001]   602.462639: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=10 new_prio=12
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel<riel@redhat.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83cd81a3

vmscan,memcg: memcg aware swap token · a433658c

由 KOSAKI Motohiro 提交于 6月 15, 2011

Currently, memcg reclaim can disable swap token even if the swap token mm
doesn't belong in its memory cgroup.  It's slightly risky.  If an admin
creates very small mem-cgroup and silly guy runs contentious heavy memory
pressure workload, every tasks are going to lose swap token and then
system may become unresponsive.  That's bad.

This patch adds 'memcg' parameter into disable_swap_token().  and if the
parameter doesn't match swap token, VM doesn't disable it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a433658c

backlight: new driver for the ADP8870 backlight devices · a59ec1e7

由 Michael Hennerich 提交于 6月 15, 2011

Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a59ec1e7

15 6月, 2011 3 次提交

NFSv4.1: deprecate headerpadsz in CREATE_SESSION · c9c30dd5

由 Benny Halevy 提交于 6月 11, 2011

We don't support header padding yet so better off ditching it
Reported-by: NSid Moore <learnmost@gmail.com>
Signed-off-by: NBenny Halevy <benny@tonian.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

c9c30dd5

NLM: Don't hang forever on NLM unlock requests · 0b760113

由 Trond Myklebust 提交于 5月 31, 2011

If the NLM daemon is killed on the NFS server, we can currently end up
hanging forever on an 'unlock' request, instead of aborting. Basically,
if the rpcbind request fails, or the server keeps returning garbage, we
really want to quit instead of retrying.
Tested-by: NVasily Averin <vvs@sw.ru>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

0b760113

rcu: Use softirq to address performance regression · 09223371

由 Shaohua Li 提交于 6月 14, 2011

Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread. A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler. Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling. (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime. And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work. RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case. This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Tested-by: N"Alex,Shi" <alex.shi@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

09223371

14 6月, 2011 2 次提交

jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941

由 Jan Kara 提交于 6月 13, 2011

jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh->b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head() which adds even larger opportunity for
oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitely via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jdb2_journal_remove_checkpoint() can free journal_head which needs
modification of a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

de1b7941

ASoC: Remove unused and about to be broken SND_SOC_CUSTOM I/O bus · e9c03905

由 Mark Brown 提交于 6月 13, 2011

This will be removed in -next so let's drop it from mainline as soon as
we can in order to minimise surprises.
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>

e9c03905

13 6月, 2011 2 次提交

block: Add __attribute__((format(printf...) and fix fallout · 08e8138a

由 Joe Perches 提交于 6月 13, 2011

Use the compiler to verify format strings and arguments.

Fix fallout.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

08e8138a

Delay struct net freeing while there's a sysfs instance refering to it · a685e089

由 Al Viro 提交于 6月 08, 2011

	* new refcount in struct net, controlling actual freeing of the memory
	* new method in kobj_ns_type_operations (->drop_ns())
	* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped.  Method renamed to ->grab_current_ns().
	* old net_free() callers call net_drop_ns() instead.
	* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL.  That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.

	Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called.  The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a685e089

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功