提交 · a8d86cd75b709a9c9402c46674ea188493c53901 · openanolis / cloud-kernel

14 9月, 2011 4 次提交

SUNRPC: compare scopeid for link-local addresses · 038c0159

由 Mi Jinlong 提交于 8月 30, 2011

For ipv6 link-local addresses, sunrpc do not compare those scope id.
This patch let sunrpc compares scope id only on link-local addresses.
Signed-off-by: NMi Jinlong <mijinlong@cn.fujitsu.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

038c0159

SUNRPC: Replace svc_addr_u by sockaddr_storage · 849a1cf1

由 Mi Jinlong 提交于 8月 30, 2011

For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:

 324       if (addr_type & IPV6_ADDR_LINKLOCAL) {
 325               if (addr_len >= sizeof(struct sockaddr_in6) &&
 326                   addr->sin6_scope_id) {
 327                       /* Override any existing binding, if another one
 328                        * is supplied by user.
 329                        */
 330                       sk->sk_bound_dev_if = addr->sin6_scope_id;
 331               }
 332
 333               /* Binding to link-local address requires an interface */
 334               if (!sk->sk_bound_dev_if) {
 335                       err = -EINVAL;
 336                       goto out_unlock;
 337               }

Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NMi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

849a1cf1

NFSD: Remove the ex_pathname field from struct svc_export · 2f1ddda1

由 Trond Myklebust 提交于 9月 12, 2011

There are no more users...
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2f1ddda1

NFSD: Cleanup for nfsd4_path() · ed748aac

由 Trond Myklebust 提交于 9月 12, 2011

The current code is sort of hackish in that it assumes a referral is always
matched to an export. When we add support for junctions that may not be the
case.
We can replace nfsd4_path() with a function that encodes the components
directly from the dentries. Since nfsd4_path is currently the only user of
the 'ex_pathname' field in struct svc_export, this has the added benefit
of allowing us to get rid of that.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ed748aac

31 8月, 2011 1 次提交
- J
  nfsd: remove include/linux/nfsd/syscall.h · c152292f
  由 J. Bruce Fields 提交于 8月 26, 2011
```
We don't need this any more.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
```
  c152292f
28 8月, 2011 1 次提交
- J
  nfsd4: cleanup and consolidate seqid_mutating_err · a9004abc
  由 J. Bruce Fields 提交于 8月 23, 2011
```
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
```
  a9004abc
27 8月, 2011 2 次提交

Remove include/linux/nfsd/const.h · c10bd39d

由 J. Bruce Fields 提交于 8月 19, 2011

Userspace shouldn't have a use for these constants.  Nothing here is
used outside fs/nfsd.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c10bd39d

nfsd: remove unused defines · 8cfb7913

由 J. Bruce Fields 提交于 8月 08, 2011

At least one of these is actually wrong anyway.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

8cfb7913

20 8月, 2011 3 次提交

sunrpc: use better NUMA affinities · 11fd165c

由 Eric Dumazet 提交于 7月 28, 2011

Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: NGreg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

11fd165c

locks: fix tracking of inprogress lease breaks · 778fc546

由 J. Bruce Fields 提交于 7月 26, 2011

We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have.  This means that once the lease break starts, we forget
what the lease's type *used* to be.  Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.

So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.

To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.

(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

778fc546

locks: move F_INPROGRESS from fl_type to fl_flags field · 710b7216

由 J. Bruce Fields 提交于 7月 26, 2011

F_INPROGRESS isn't exposed to userspace.  To me it makes more sense in
fl_flags....
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

710b7216

08 8月, 2011 1 次提交

fix rcu annotations noise in cred.h · 32955148

由 Al Viro 提交于 8月 07, 2011

task->cred is declared as __rcu, and access to other tasks' ->cred is,
indeed, protected. Access to current->cred does not need rcu_dereference()
at all, since only the task itself can change its ->cred. sparse, of
course, has no way of knowing that...

Add force-cast in current_cred(), make current_fsuid() et.al. use it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

32955148

07 8月, 2011 5 次提交

vfs: optimize inode cache access patterns · 3ddcd056

由 Linus Torvalds 提交于 8月 06, 2011

The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ddcd056

vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e

由 Linus Torvalds 提交于 8月 06, 2011

Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.

And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

830c0f0e

net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea

由 David S. Miller 提交于 8月 03, 2011

Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.
Reported-by: NDan Kaminsky <dan@doxpara.com>
Tested-by: NWilly Tarreau <w@1wt.eu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e5714ea

crypto: Move md5_transform to lib/md5.c · bc0b96b5

由 David S. Miller 提交于 8月 03, 2011

We are going to use this for TCP/IP sequence number and fragment ID
generation.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc0b96b5

lib/sha1: use the git implementation of SHA-1 · 1eb19a12

由 Mandeep Singh Baines 提交于 8月 05, 2011

For ChromiumOS, we use SHA-1 to verify the integrity of the root
filesystem.  The speed of the kernel sha-1 implementation has a major
impact on our boot performance.

To improve boot performance, we investigated using the heavily optimized
sha-1 implementation used in git.  With the git sha-1 implementation, we
see a 11.7% improvement in boot time.

10 reboots, remove slowest/fastest.

Before:

  Mean: 6.58 seconds Stdev: 0.14

After (with git sha-1, this patch):

  Mean: 5.89 seconds Stdev: 0.07

The other cool thing about the git SHA-1 implementation is that it only
needs 64 bytes of stack for the workspace while the original kernel
implementation needed 320 bytes.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Cc: Nicolas Pitre <nico@cam.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1eb19a12

06 8月, 2011 1 次提交

Add KEY_MICMUTE and enable it on Lenovo X220 · 33009557

由 Andy Lutomirski 提交于 5月 24, 2011

I suspect that this works on T410.
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Signed-off-by: NMatthew Garrett <mjg@redhat.com>

33009557

05 8月, 2011 1 次提交

nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4 · 655b1612

由 Boaz Harrosh 提交于 5月 29, 2011

exofs file system wants to use pnfs_osd_xdr.h file instead of
redefining pnfs-objects types in it's private "pnfs.h" headr.

Before we do the switch we must make sure pnfs_osd_xdr.h is
compilable also under NFS versions smaller than 4.1. Since now
it is needed regardless of version, by the exofs code.

nfs4_string is not the only nfs4 type out in the global scope.
Ack-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>

655b1612

04 8月, 2011 14 次提交

Revert "dt: add of_alias_scan and of_alias_get_id" · fe55c184

由 Grant Likely 提交于 8月 04, 2011

This reverts commit 750f463a.

of_alias_* still needs work to be generalized for 'promtree' dt
platforms, and to no implicitly create entries for available ids.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

fe55c184

tmpfs radix_tree: locate_item to speed up swapoff · e504f3fd

由 Hugh Dickins 提交于 8月 03, 2011

We have already acknowledged that swapoff of a tmpfs file is slower than
it was before conversion to the generic radix_tree: a little slower
there will be acceptable, if the hotter paths are faster.

But it was a shock to find swapoff of a 500MB file 20 times slower on my
laptop, taking 10 minutes; and at that rate it significantly slows down
my testing.

Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before; and
more realistic tests on other machines don't fare as badly.

I've tried a number of things to improve it, including tagging the swap
entries, then doing lookup by tag: I'd expected that to halve the time,
but in practice it's erratic, and often counter-productive.

The only change I've so far found to make a consistent improvement, is
to short-circuit the way we go back and forth, gang lookup packing
entries into the array supplied, then shmem scanning that array for the
target entry. Scanning in place doubles the speed, so it's now only
twice as slow as before (or three times slower when the PROVEs are on).

So, add radix_tree_locate_item() as an expedient, once-off,
single-caller hack to do the lookup directly in place. #ifdef it on
CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
applicability as save space in other configurations. And, sadly,
#include sched.h for cond_resched().
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e504f3fd

tmpfs: use kmemdup for short symlinks · 69f07ec9

由 Hugh Dickins 提交于 8月 03, 2011

But we've not yet removed the old swp_entry_t i_direct[16] from
shmem_inode_info. That's because it was still being shared with the
inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode
size), and use kmemdup() for short symlinks, say, those up to 128 bytes.

I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
rather than shmem_evict_inode(), where we usually do such freeing? I
guess it doesn't matter, and I'm not into NUMA mpol testing right now.
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69f07ec9

tmpfs: convert mem_cgroup shmem to radix-swap · aa3b1895

由 Hugh Dickins 提交于 8月 03, 2011

Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
had to move swappage to filecache with GFP_NOWAIT.

Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
moving its call out from shmem_add_to_page_cache() to two of thats three
callers.  But leave it doing mem_cgroup_uncharge_cache_page() on error:
although asymmetrical, it's easier for all 3 callers to handle.

These two changes would also be appropriate if anyone were to start
using shmem_read_mapping_page_gfp() with GFP_NOWAIT.

Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
radix_tree_exceptional_entry() to get what it needs for itself.
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa3b1895

tmpfs: miscellaneous trivial cleanups · 41ffe5d5

由 Hugh Dickins 提交于 8月 03, 2011

While it's at its least, make a number of boring nitpicky cleanups to
shmem.c, mostly for consistency of variable naming.  Things like "swap"
instead of "entry", "pgoff_t index" instead of "unsigned long idx".

And since everything else here is prefixed "shmem_", better change
init_tmpfs() to shmem_init().
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41ffe5d5

tmpfs: demolish old swap vector support · 285b2c4f

由 Hugh Dickins 提交于 8月 03, 2011

The maximum size of a shmem/tmpfs file has been limited by the maximum
size of its triple-indirect swap vector. With 4kB page size, maximum
filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
that on a 64-bit kernel. (With 8kB page size, maximum filesize was just
over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)

It's a shame that tmpfs should be more restrictive than ramfs, and this
limitation has now been noticed. Add another level to the swap vector?
No, it became obscure and hard to maintain, once I complicated it to
make use of highmem pages nine years ago: better choose another way.

Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
tmpfs would never have invented its own peculiar radix tree: we would
have fitted swap entries into the common radix tree instead, in much the
same way as we fit swap entries into page tables.

And why should each file have a separate radix tree for its pages and
for its swap entries? The swap entries are required precisely where and
when the pages are not. We want to put them together in a single radix
tree: which can then avoid much of the locking which was needed to
prevent them from being exchanged underneath us.

This also avoids the waste of memory devoted to swap vectors, first in
the shmem_inode itself, then at least two more pages once a file grew
beyond 16 data pages (pages accounted by df and du, but not by memcg).
Allocated upfront, to avoid allocation when under swapping pressure, but
pure waste when CONFIG_SWAP is not set - I have never spattered around
the ifdefs to prevent that, preferring this move to sharing the common
radix tree instead.

There are three downsides to sharing the radix tree. One, that it binds
tmpfs more tightly to the rest of mm, either requiring knowledge of swap
entries in radix tree there, or duplication of its code here in shmem.c.
I believe that the simplications and memory savings (and probable higher
performance, not yet measured) justify that.

Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
nodes that cannot be freed under memory pressure - whereas before it was
the less precious highmem swap vector pages that could not be freed.
I'm hoping that 64-bit has now been accessible for long enough, that the
highmem argument has grown much less persuasive.

Three, that swapoff is slower than it used to be on tmpfs files, since
it's using a simple generic mechanism not tailored to it: I find this
noticeable, and shall want to improve, but maybe nobody else will
notice.

So... now remove most of the old swap vector code from shmem.c. But,
for the moment, keep the simple i_direct vector of 16 pages, with simple
accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
to help mark where swap needs to be handled in subsequent patches.
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

285b2c4f

mm: let swap use exceptional entries · a2c16d6c

由 Hugh Dickins 提交于 8月 03, 2011

If swap entries are to be stored along with struct page pointers in a
radix tree, they need to be distinguished as exceptional entries.

Most of the handling of swap entries in radix tree will be contained in
shmem.c, but a few functions in filemap.c's common code need to check
for their appearance: find_get_page(), find_lock_page(),
find_get_pages() and find_get_pages_contig().

So as not to slow their fast paths, tuck those checks inside the
existing checks for unlikely radix_tree_deref_slot(); except for
find_lock_page(), where it is an added test. And make it a BUG in
find_get_pages_tag(), which is not applied to tmpfs files.

A part of the reason for eliminating shmem_readpage() earlier, was to
minimize the places where common code would need to allow for swap
entries.

The swp_entry_t known to swapfile.c must be massaged into a slightly
different form when stored in the radix tree, just as it gets massaged
into a pte_t when stored in page tables.

In an i386 kernel this limits its information (type and page offset) to
30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum
swapfile size of 128GB. Which is less than the 512GB we previously
allowed with X86_PAE (where the swap entry can occupy the entire upper
32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and
there's not a new limitation on 64-bit (where swap filesize is already
limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is
probably still enough swap for a 64GB 32-bit machine.

Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
enforce filesize limit in read_swap_header(), just as for ptes.
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2c16d6c

radix_tree: exceptional entries and indices · 6328650b

由 Hugh Dickins 提交于 8月 03, 2011

A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
peculiar swap vector, instead keeping a file's swap entries in the same
radix tree as its struct page pointers: thus saving memory, and
simplifying its code and locking.

This patch:

The radix_tree is used by several subsystems for different purposes.  A
major use is to store the struct page pointers of a file's pagecache for
memory management.  But what if mm wanted to store something other than
page pointers there too?

The low bit of a radix_tree entry is already used to denote an indirect
pointer, for internal use, and the unlikely radix_tree_deref_retry()
case.

Define the next bit as denoting an exceptional entry, and supply inline
functions radix_tree_exception() to return non-0 in either unlikely
case, and radix_tree_exceptional_entry() to return non-0 in the second
case.

If a subsystem already uses radix_tree with that bit set, no problem: it
does not affect internal workings at all, but is defined for the
convenience of those storing well-aligned pointers in the radix_tree.

The radix_tree_gang_lookups have an implicit assumption that the caller
can deduce the offset of each entry returned e.g.  by the page->index of
a struct page.  But that may not be feasible for some kinds of item to
be stored there.

radix_tree_gang_lookup_slot() allow for an optional indices argument,
output array in which to return those offsets.  The same could be added
to other radix_tree_gang_lookups, but for now keep it to the only one
for which we need it.
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6328650b

drivers/video/backlight/aat2870_bl.c: fix setting max_current · 5d6f921b

由 Axel Lin 提交于 8月 03, 2011

 - Current implementation tests wrong value for setting
   aat2870_bl->max_current.

 - In the current implementation, we cannot differentiate between 2 cases:

   a) if pdata->max_current is not set , or

   b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0).

   Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in
   aat2870_brightness() accordingly.
Signed-off-by: NAxel Lin <axel.lin@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Tested-by: NJin Park <jinyoungp@nvidia.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d6f921b

mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE · 3dab1bce

由 Johannes Weiner 提交于 8月 03, 2011

__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes.
It's supposed to be passed through from the page allocator to
zone_statistics(), but it never gets there as gfp_allowed_mask is not
wide enough and masks out the flag early in the allocation path.

The result is an accounting glitch where successful NUMA allocations
by-agent are not properly attributed as local.

Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Acked-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dab1bce

ida: simplified functions for id allocation · 88eca020

由 Rusty Russell 提交于 8月 03, 2011

The current hyper-optimized functions are overkill if you simply want to
allocate an id for a device.  Create versions which use an internal
lock.

In followup patches, numerous drivers are converted to use this
interface.

Thanks to Tejun for feedback.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NJonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

88eca020

fault-injection: add ability to export fault_attr in arbitrary directory · dd48c085

由 Akinobu Mita 提交于 8月 03, 2011

init_fault_attr_dentries() is used to export fault_attr via debugfs.
But it can only export it in debugfs root directory.

Per Forlin is working on mmc_fail_request which adds support to inject
data errors after a completed host transfer in MMC subsystem.

The fault_attr for mmc_fail_request should be defined per mmc host and
export it in debugfs directory per mmc host like
/sys/kernel/debug/mmc0/mmc_fail_request.

init_fault_attr_dentries() doesn't help for mmc_fail_request.  So this
introduces fault_create_debugfs_attr() which is able to create a
directory in the arbitrary directory and replace
init_fault_attr_dentries().

[akpm@linux-foundation.org: extraneous semicolon, per Randy]
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Tested-by: NPer Forlin <per.forlin@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd48c085

cpuidle: stop depending on pm_idle · a0bfa137

由 Len Brown 提交于 4月 01, 2011

cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.

Architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

  my_arch_cpu_idle()
	...
	if(cpuidle_call_idle())
		pm_idle();

cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
cc: x86@kernel.org
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

a0bfa137

cpuidle: replace xen access to x86 pm_idle and default_idle · d91ee586

由 Len Brown 提交于 4月 01, 2011

When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables.  While it parses the idle tables
for the hypervisor's beneift, it uses HLT for its own idle.

Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.

cc: xen-devel@lists.xensource.com
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

d91ee586

03 8月, 2011 7 次提交

HWPoison: add memory_failure_queue() · ea8f5fb8

由 Huang Ying 提交于 7月 13, 2011

memory_failure() is the entry point for HWPoison memory error
recovery.  It must be called in process context.  But commonly
hardware memory errors are notified via MCE or NMI, so some delayed
execution mechanism must be used.  In MCE handler, a work queue + ring
buffer mechanism is used.

In addition to MCE, now APEI (ACPI Platform Error Interface) GHES
(Generic Hardware Error Source) can be used to report memory errors
too.  To add support to APEI GHES memory recovery, a mechanism similar
to that of MCE is implemented.  memory_failure_queue() is the new
entry point that can be called in IRQ context.  The next step is to
make MCE handler uses this interface too.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

ea8f5fb8

lib, Make gen_pool memory allocator lockless · 7f184275

由 Huang Ying 提交于 7月 13, 2011

This version of the gen_pool memory allocator supports lockless
operation.

This makes it safe to use in NMI handlers and other special
unblockable contexts that could otherwise deadlock on locks.  This is
implemented by using atomic operations and retries on any conflicts.
The disadvantage is that there may be livelocks in extreme cases.  For
better scalability, one gen_pool allocator can be used for each CPU.

The lockless operation only works if there is enough memory available.
If new memory is added to the pool a lock has to be still taken.  So
any user relying on locklessness has to ensure that sufficient memory
is preallocated.

The basic atomic operation of this allocator is cmpxchg on long.  On
architectures that don't have NMI-safe cmpxchg implementation, the
allocator can NOT be used in NMI handler.  So code uses the allocator
in NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

7f184275

lib, Add lock-less NULL terminated single list · f49f23ab

由 Huang Ying 提交于 7月 13, 2011

Cmpxchg is used to implement adding new entry to the list, deleting
all entries from the list, deleting first entry of the list and some
other operations.

Because this is a single list, so the tail can not be accessed in O(1).

If there are multiple producers and multiple consumers, llist_add can
be used in producers and llist_del_all can be used in consumers.  They
can work simultaneously without lock.  But llist_del_first can not be
used here.  Because llist_del_first depends on list->first->next does
not changed if list->first is not changed during its operation, but
llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
llist_add) sequence in another consumer may violate that.

If there are multiple producers and one consumer, llist_add can be
used in producers and llist_del_all or llist_del_first can be used in
the consumer.

This can be summarized as follow:

           |   add    | del_first |  del_all
 add       |    -     |     -     |     -
 del_first |          |     L     |     L
 del_all   |          |           |     -

Where "-" stands for no lock is needed, while "L" stands for lock is
needed.

The list entries deleted via llist_del_all can be traversed with
traversing function such as llist_for_each etc.  But the list entries
can not be traversed safely before deleted from the list.  The order
of deleted entries is from the newest to the oldest added one.  If you
want to traverse from the oldest to the newest, you must reverse the
order by yourself before traversing.

The basic atomic operation of this list is cmpxchg on long.  On
architectures that don't have NMI-safe cmpxchg implementation, the
list can NOT be used in NMI handler.  So code uses the list in NMI
handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

f49f23ab

dt: add of_alias_scan and of_alias_get_id · 750f463a

由 Shawn Guo 提交于 8月 03, 2011

The patch adds function of_alias_scan to populate a global lookup
table with the properties of 'aliases' node and function
of_alias_get_id for drivers to find alias id from the lookup table.
Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
[grant.likely: add locking and rework parse loop]
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

750f463a

A
RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached · 3567866b
由 Al Viro 提交于 8月 02, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
3567866b

get rid of boilerplate switches in posix_acl.h · 951c0d66

由 Al Viro 提交于 8月 03, 2011

the only potentially subtle thing here: get_cached_acl()
is never called with the second argument other than
ACL_TYPE_{ACCESS,DEFAULT}.  IOW, that return ERR_PTR(-EINVAL)
in there might as well be BUG().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

951c0d66

ACPI: remove NID_INVAL · f52e00c6

由 David Rientjes 提交于 7月 28, 2011

b552a8c5 ("ACPI: remove NID_INVAL") removed the left over uses of
NID_INVAL, but didn't actually remove the definition.  Remove it.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

f52e00c6

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功