提交 · 771fb4d806a92bf6c988fcfbd286ae40a9374332 · openanolis / cloud-kernel

11 12月, 2012 11 次提交

mm: mempolicy: Check for misplaced page · 771fb4d8

由 Lee Schermerhorn 提交于 10月 25, 2012

This patch provides a new function to test whether a page resides
on a node that is appropriate for the mempolicy for the vma and
address where the page is supposed to be mapped.  This involves
looking up the node where the page belongs.  So, the function
returns that node so that it may be used to allocated the page
without consulting the policy again.

A subsequent patch will call this function from the fault path.
Because of this, I don't want to go ahead and allocate the page, e.g.,
via alloc_page_vma() only to have to free it if it has the correct
policy.  So, I just mimic the alloc_page_vma() node computation
logic--sort of.

Note:  we could use this function to implement a MPOL_MF_STRICT
behavior when migrating pages to match mbind() mempolicy--e.g.,
to ensure that pages in an interleaved range are reinterleaved
rather than left where they are when they reside on any page in
the interleave nodemask.
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ Added MPOL_F_LAZY to trigger migrate-on-fault;
  simplified code now that we don't have to bother
  with special crap for interleaved ]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NMel Gorman <mgorman@suse.de>

771fb4d8

mm: mempolicy: Add MPOL_NOOP · d3a71033

由 Lee Schermerhorn 提交于 10月 25, 2012

This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy
to mbind().  When the NOOP policy is used with the 'MOVE and 'LAZY
flags, mbind() will map the pages PROT_NONE so that they will be
migrated on the next touch.

This allows an application to prepare for a new phase of operation
where different regions of shared storage will be assigned to
worker threads, w/o changing policy.  Note that we could just use
"default" policy in this case.  However, this also allows an
application to request that pages be migrated, only if necessary,
to follow any arbitrary policy that might currently apply to a
range of pages, without knowing the policy, or without specifying
multiple mbind()s for ranges with different policies.

[ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ]
Bug-Reported-by: NReported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NMel Gorman <mgorman@suse.de>

d3a71033

mm: mempolicy: Make MPOL_LOCAL a real policy · 479e2802

由 Peter Zijlstra 提交于 10月 25, 2012

Make MPOL_LOCAL a real and exposed policy such that applications that
relied on the previous default behaviour can explicitly request it.
Requested-by: NChristoph Lameter <cl@linux.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NMel Gorman <mgorman@suse.de>

479e2802

mm: numa: Create basic numa page hinting infrastructure · d10e63f2

由 Mel Gorman 提交于 10月 25, 2012

Note: This patch started as "mm/mpol: Create special PROT_NONE
	infrastructure" and preserves the basic idea but steals *very*
	heavily from "autonuma: numa hinting page faults entry points" for
	the actual fault handlers without the migration parts.	The end
	result is barely recognisable as either patch so all Signed-off
	and Reviewed-bys are dropped. If Peter, Ingo and Andrea are ok with
	this version, I will re-add the signed-offs-by to reflect the history.

In order to facilitate a lazy -- fault driven -- migration of pages, create
a special transient PAGE_NUMA variant, we can then use the 'spurious'
protection faults to drive our migrations from.

The meaning of PAGE_NUMA depends on the architecture but on x86 it is
effectively PROT_NONE. Actual PROT_NONE mappings will not generate these
NUMA faults for the reason that the page fault code checks the permission on
the VMA (and will throw a segmentation fault on actual PROT_NONE mappings),
before it ever calls handle_mm_fault.

[dhillf@gmail.com: Fix typo]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>

d10e63f2

mm: numa: Support NUMA hinting page faults from gup/gup_fast · 0b9d7052

由 Andrea Arcangeli 提交于 10月 05, 2012

Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.

KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.

Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.

[ This patch was picked up from the AutoNUMA tree. ]
Originally-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ ported to this tree. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NRik van Riel <riel@redhat.com>

0b9d7052

mm: numa: pte_numa() and pmd_numa() · be3a7284

由 Andrea Arcangeli 提交于 10月 04, 2012

Implement pte_numa and pmd_numa.

We must atomically set the numa bit and clear the present bit to
define a pte_numa or pmd_numa.

Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
a thread touches a virtual address in the corresponding virtual range,
a NUMA hinting page fault will trigger. The NUMA hinting page fault
will clear the NUMA bit and set the present bit again to resolve the
page fault.

The expectation is that a NUMA hinting page fault is used as part
of a placement policy that decides if a page should remain on the
current node or migrated to a different node.
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>

be3a7284

mm: compaction: Add scanned and isolated counters for compaction · 397487db

由 Mel Gorman 提交于 10月 19, 2012

Compaction already has tracepoints to count scanned and isolated pages
but it requires that ftrace be enabled and if that information has to be
written to disk then it can be disruptive. This patch adds vmstat counters
for compaction called compact_migrate_scanned, compact_free_scanned and
compact_isolated.

With these counters, it is possible to define a basic cost model for
compaction. This approximates of how much work compaction is doing and can
be compared that with an oprofile showing TLB misses and see if the cost of
compaction is being offset by THP for example. Minimally a compaction patch
can be evaluated in terms of whether it increases or decreases cost. The
basic cost model looks like this

Fundamental unit u:	a word	sizeof(void *)

Ca  = cost of struct page access = sizeof(struct page) / u

Cmc = Cost migrate page copy = (Ca + PAGE_SIZE/u) * 2
Cmf = Cost migrate failure   = Ca * 2
Ci  = Cost page isolation    = (Ca + Wi)
	where Wi is a constant that should reflect the approximate
	cost of the locking operation.

Csm = Cost migrate scanning = Ca
Csf = Cost free    scanning = Ca

Overall cost =	(Csm * compact_migrate_scanned) +
	      	(Csf * compact_free_scanned)    +
	      	(Ci  * compact_isolated)	+
		(Cmc * pgmigrate_success)	+
		(Cmf * pgmigrate_failed)

Where the values are read from /proc/vmstat.

This is very basic and ignores certain costs such as the allocation cost
to do a migrate page copy but any improvement to the model would still
use the same vmstat counters.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>

397487db

mm: migrate: Add a tracepoint for migrate_pages · 7b2a2d4a

由 Mel Gorman 提交于 10月 19, 2012

The pgmigrate_success and pgmigrate_fail vmstat counters tells the user
about migration activity but not the type or the reason. This patch adds
a tracepoint to identify the type of page migration and why the page is
being migrated.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>

7b2a2d4a

mm: compaction: Move migration fail/success stats to migrate.c · 5647bc29

由 Mel Gorman 提交于 10月 19, 2012

The compact_pages_moved and compact_pagemigrate_failed events are
convenient for determining if compaction is active and to what
degree migration is succeeding but it's at the wrong level. Other
users of migration may also want to know if migration is working
properly and this will be particularly true for any automated
NUMA migration. This patch moves the counters down to migration
with the new events called pgmigrate_success and pgmigrate_fail.
The compact_blocks_moved counter is removed because while it was
useful for debugging initially, it's worthless now as no meaningful
conclusions can be drawn from its value.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>

5647bc29

mm: Count the number of pages affected in change_protection() · 7da4d641

由 Peter Zijlstra 提交于 11月 19, 2012

This will be used for three kinds of purposes:

 - to optimize mprotect()

 - to speed up working set scanning for working set areas that
   have not been touched

 - to more accurately scan per real working set

No change in functionality from this patch.
Suggested-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

7da4d641

x86/mm: Introduce pte_accessible() · 2c3cf556

由 Rik van Riel 提交于 10月 09, 2012

We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.

However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page.  This allows us to skip remote TLB
flushes for pages that are not actually accessible.

Fill in this method for x86 and provide a safe (but slower) method
on other architectures.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Fixed-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org
[ Added Linus's review fixes. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

2c3cf556

17 11月, 2012 4 次提交

revert "mm: fix-up zone present pages" · 5576646f

由 Andrew Morton 提交于 11月 16, 2012

Revert commit 7f1290f2 ("mm: fix-up zone present pages")

That patch tried to fix a issue when calculating zone->present_pages,
but it caused a regression on 32bit systems with HIGHMEM.  With that
change, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator.  Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages becomes zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Tested-by: NChris Clayton <chris2553@googlemail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5576646f

rapidio: fix kernel-doc warnings · 2ca3cb50

由 Randy Dunlap 提交于 11月 16, 2012

Fix rapidio kernel-doc warnings:

Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local'
Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region'
Warning(include/linux/rio.h:290): No description found for parameter 'switches'
Warning(include/linux/rio.h:290): No description found for parameter 'destid_table'
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Acked-by: NAlexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ca3cb50

memcg: fix hotplugged memory zone oops · bea8c150

由 Hugh Dickins 提交于 11月 16, 2012

When MEMCG is configured on (even when it's disabled by boot option),
when adding or removing a page to/from its lru list, the zone pointer
used for stats updates is nowadays taken from the struct lruvec.  (On
many configurations, calculating zone from page is slower.)

But we have no code to update all the lruvecs (per zone, per memcg) when
a memory node is hotadded.  Here's an extract from the oops which
results when running numactl to bind a program to a newly onlined node:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000f60
  IP:  __mod_zone_page_state+0x9/0x60
  Pid: 1219, comm: numactl Not tainted 3.6.0-rc5+ #180 Bochs Bochs
  Process numactl (pid: 1219, threadinfo ffff880039abc000, task ffff8800383c4ce0)
  Call Trace:
    __pagevec_lru_add_fn+0xdf/0x140
    pagevec_lru_move_fn+0xb1/0x100
    __pagevec_lru_add+0x1c/0x30
    lru_add_drain_cpu+0xa3/0x130
    lru_add_drain+0x2f/0x40
   ...

The natural solution might be to use a memcg callback whenever memory is
hotadded; but that solution has not been scoped out, and it happens that
we do have an easy location at which to update lruvec->zone.  The lruvec
pointer is discovered either by mem_cgroup_zone_lruvec() or by
mem_cgroup_page_lruvec(), and both of those do know the right zone.

So check and set lruvec->zone in those; and remove the inadequate
attempt to set lruvec->zone from lruvec_init(), which is called before
NODE_DATA(node) has been allocated in such cases.

Ah, there was one exceptionr.  For no particularly good reason,
mem_cgroup_force_empty_list() has its own code for deciding lruvec.
Change it to use the standard mem_cgroup_zone_lruvec() and
mem_cgroup_get_lru_size() too.  In fact it was already safe against such
an oops (the lru lists in danger could only be empty), but we're better
proofed against future changes this way.

I've marked this for stable (3.6) since we introduced the problem in 3.5
(now closed to stable); but I have no idea if this is the only fix
needed to get memory hotadd working with memcg in 3.6, and received no
answer when I enquired twice before.
Reported-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bea8c150

mm, oom: reintroduce /proc/pid/oom_adj · fa0cbbf1

由 David Rientjes 提交于 11月 12, 2012

This is mostly a revert of 01dc52eb ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.

It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels.  It simply scales the value linearly when /proc/pid/oom_score_adj
is written.

The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt.  We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.
Reported-by: NArtem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa0cbbf1

16 11月, 2012 1 次提交

clk: remove inline usage from clk-provider.h · 93532c8a

由 Igor Mazanov 提交于 11月 15, 2012

Users of GCC 4.7 have reported compiler errors due to having inline
applied to function declarations in clk-provider.h.  The definitions
exist in drivers/clk/clk.c.  An example error:

In file included from arch/arm/mach-omap2/clockdomain.c:25:0:
arch/arm/mach-omap2/clockdomain.c: In function ‘clkdm_clk_disable’:
include/linux/clk-provider.h:338:12: error: inlining failed in call to always_inline ‘__clk_get_enable_count’: function body not available
arch/arm/mach-omap2/clockdomain.c:1001:28: error: called from here
make[1]: *** [arch/arm/mach-omap2/clockdomain.o] Error 1
make: *** [arch/arm/mach-omap2] Error 2

This patch removes the use of inline from include/linux/clk-provider.h
but keeps the function definitions in drivers/clk/clk.c as inlined since
they are one-liners.
Signed-off-by: NIgor Mazanov <i.mazanov@gmail.com>
Acked-by: NPaul Walmsley <paul@pwsan.com>
Signed-off-by: NMike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: improved subject, added changelog]

93532c8a

10 11月, 2012 1 次提交

of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again · 0bce04be

由 Andreas Larsson 提交于 11月 06, 2012

This bug-fix makes sure that of_address_to_resource is defined extern for sparc
so that the sparc-specific implementation of of_address_to_resource() is once
again used when including include/linux/of_address.h in a sparc context. A
number of drivers in mainline relies on this function working for sparc.

The bug was introduced in a850a755, "of/address:
add empty static inlines for !CONFIG_OF". Contrary to that commit title, the
static inlines are added for !CONFIG_OF_ADDRESS, and CONFIG_OF_ADDRESS is never
defined for sparc. This is good behavior for the other functions in
include/linux/of_address.h, as the extern functions defined in
drivers/of/address.c only gets linked when OF_ADDRESS is configured. However,
for of_address_to_resource there exists a sparc-specific implementation in
arch/sparc/arch/sparc/kernel/of_device_common.c

Solution suggested by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: NAndreas Larsson <andreas@gaisler.com>
Acked-by: NRob Herring <rob.herring@calxeda.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bce04be

09 11月, 2012 1 次提交

revert "epoll: support for disabling items, and a self-test app" · a80a6b85

由 Andrew Morton 提交于 11月 08, 2012

Revert commit 03a7beb5 ("epoll: support for disabling items, and a
self-test app") pending resolution of the issues identified by Michael
Kerrisk, copied below.

We'll revisit this for 3.8.

: I've taken a look at this patch as it currently stands in 3.7-rc1, and
: done a bit of testing. (By the way, the test program
: tools/testing/selftests/epoll/test_epoll.c does not compile...)
:
: There are one or two places where the behavior seems a little strange,
: so I have a question or two at the end of this mail. But other than
: that, I want to check my understanding so that the interface can be
: correctly documented.
:
: Just to go though my understanding, the problem is the following
: scenario in a multithreaded application:
:
: 1. Multiple threads are performing epoll_wait() operations,
:    and maintaining a user-space cache that contains information
:    corresponding to each file descriptor being monitored by
:    epoll_wait().
:
: 2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
:    a file descriptor from the epoll interest list, and
:    delete the corresponding record from the user-space cache.
:
: 3. The problem with (2) is that some other thread may have
:    previously done an epoll_wait() that retrieved information
:    about the fd in question, and may be in the middle of using
:    information in the cache that relates to that fd. Thus,
:    there is a potential race.
:
: 4. The race can't solved purely in user space, because doing
:    so would require applying a mutex across the epoll_wait()
:    call, which would of course blow thread concurrency.
:
: Right?
:
: Your solution is the EPOLL_CTL_DISABLE operation. I want to
: confirm my understanding about how to use this flag, since
: the description that has accompanied the patches so far
: has been a bit sparse
:
: 0. In the scenario you're concerned about, deleting a file
:    descriptor means (safely) doing the following:
:    (a) Deleting the file descriptor from the epoll interest list
:        using EPOLL_CTL_DEL
:    (b) Deleting the corresponding record in the user-space cache
:
: 1. It's only meaningful to use this EPOLL_CTL_DISABLE in
:    conjunction with EPOLLONESHOT.
:
: 2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
:    conjunction is a logical error.
:
: 3. The correct way to code multithreaded applications using
:    EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:
:
:    a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
:       should EPOLLONESHOT.
:
:    b. When a thread wants to delete a file descriptor, it
:       should do the following:
:
:       [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
:       [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
:           was zero, then the file descriptor can be safely
:           deleted by the thread that made this call.
:       [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
:           then the descriptor is in use. In this case, the calling
:           thread should set a flag in the user-space cache to
:           indicate that the thread that is using the descriptor
:           should perform the deletion operation.
:
: Is all of the above correct?
:
: The implementation depends on checking on whether
: (events & ~EP_PRIVATE_BITS) == 0
: This replies on the fact that EPOLL_CTL_AD and EPOLL_CTL_MOD always
: set EPOLLHUP and EPOLLERR in the 'events' mask, and EPOLLONESHOT
: causes those flags (as well as all others in ~EP_PRIVATE_BITS) to be
: cleared.
:
: A corollary to the previous paragraph is that using EPOLL_CTL_DISABLE
: is only useful in conjunction with EPOLLONESHOT. However, as things
: stand, one can use EPOLL_CTL_DISABLE on a file descriptor that does
: not have EPOLLONESHOT set in 'events' This results in the following
: (slightly surprising) behavior:
:
: (a) The first call to epoll_ctl(EPOLL_CTL_DISABLE) returns 0
:     (the indicator that the file descriptor can be safely deleted).
: (b) The next call to epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY.
:
: This doesn't seem particularly useful, and in fact is probably an
: indication that the user made a logic error: they should only be using
: epoll_ctl(EPOLL_CTL_DISABLE) on a file descriptor for which
: EPOLLONESHOT was set in 'events'. If that is correct, then would it
: not make sense to return an error to user space for this case?

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paton J. Lewis" <palewis@adobe.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a80a6b85

08 11月, 2012 4 次提交

mmc: dw_mmc: constify dw_mci_idmac_ops in exynos back-end · 8e2b36ea

由 Arnd Bergmann 提交于 11月 06, 2012

The of_device_id match data is now marked as const and
must not be modified. This changes the dw_mmc to mark
all pointers passing the dw_mci_drv_data or dw_mci_dma_ops
structures as const, and also marks the static definitions
as const.

drivers/mmc/host/dw_mmc-exynos.c: In function 'dw_mci_exynos_probe':
drivers/mmc/host/dw_mmc-exynos.c:234:11: warning: assignment discards 'const' qualifier from pointer target type [enabled by default]
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Thomas Abraham <thomas.abraham@linaro.org>
Cc: Will Newton <will.newton@imgtec.com>
Signed-off-by: NChris Ball <cjb@laptop.org>

8e2b36ea

mmc: sdhci-of-esdhc: disable CMD23 for some Freescale SoCs · 63ef5d8c

由 Jerry Huang 提交于 10月 25, 2012

CMD23 causes lots of errors in kernel on some freescale SoCs
(P1020, P1021, P1022, P1024, P1025 and P4080) when MMC card used,
which is because these controllers does not support CMD23,
even on the SoCs which declares CMD23 is supported.
Therefore, we'll not use CMD23.
Signed-off-by: NJerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: NShaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: NAnton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: NChris Ball <cjb@laptop.org>

63ef5d8c

mmc: dw_mmc: convert the variable type of irq · d676188e

由 Seungwon Jeon 提交于 9月 28, 2012

Even though platform_get_irq returns error, 'host->irq'
always has an unsigned value. Less-than-zero comparison
of an unsigned value is never true. Type of 'unsigned int'
will be changed for 'int'.
Signed-off-by: NSeungwon Jeon <tgih.jun@samsung.com>
Acked-by: NWill Newton <will.newton@imgtec.com>
Signed-off-by: NChris Ball <cjb@laptop.org>

d676188e

drivers: bus: ocp2scp: add pdata support · 0133370f

由 Kishon Vijay Abraham I 提交于 11月 02, 2012

ocp2scp was not having pdata support which makes *musb* fail for non-dt
boot in OMAP platform. The pdata will have information about the devices
that is connected to ocp2scp. ocp2scp driver will now make use of this
information to create the devices that is attached to ocp2scp.

This is needed to fix MUSB regression caused by commit c9e4412a
(arm: omap: phy: remove unused functions from omap-phy-internal.c)
Signed-off-by: NKishon Vijay Abraham I <kishon@ti.com>
Acked-by: NFelipe Balbi <balbi@ti.com>
[tony@atomide.com: updated comments for regression info]
Signed-off-by: NTony Lindgren <tony@atomide.com>

0133370f

07 11月, 2012 1 次提交

xen/hvm: If we fail to fetch an HVM parameter print out which flag it is. · 6d877e6b

由 Konrad Rzeszutek Wilk 提交于 10月 19, 2012

Makes it easier to troubleshoot in the field.
Acked-by: NIan Campbell <ian.campbell@citrix.com>
[v1: Use macro per Ian's suggestion]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

6d877e6b

04 11月, 2012 1 次提交

ptp: update adjfreq callback description · 87f4d7c1

由 Jacob Keller 提交于 11月 01, 2012

This patch updates the adjfreq callback description to include a note that the
delta in ppb is always relative to the base frequency, and not to the current
frequency of the hardware clock.
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
CC: stable@vger.kernel.org [v3.5+]
CC: Richard Cochran <richard.cochran@gmail.com>
CC: John Stultz <john.stultz@linaro.org>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87f4d7c1

03 11月, 2012 1 次提交

hashtable: introduce a small and naive hashtable · d9b482c8

由 Sasha Levin 提交于 10月 30, 2012

This hashtable implementation is using hlist buckets to provide a simple
hashtable to prevent it from getting reimplemented all over the kernel.
Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
[ Merging this now, so that subsystems can start applying Sasha's
  patches that use this   - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d9b482c8

01 11月, 2012 2 次提交

KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66

由 Xiao Guangrong 提交于 10月 24, 2012

After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and write them to the guest memory
or MMIO together

Unfortunately, kvm splits the mmio access into 8 bytes and store them to
vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
will cause vcpu->mmio_fragments overflow

The bug can be exposed by isapc (-M isapc):

[23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ ......]
[23154.858083] Call Trace:
[23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
[23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
[23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

Actually, we can use one mmio_fragment to store a large mmio access then
split it when we pass the mmio-exit-info to userspace. After that, we only
need two entries to store mmio info for the cross-mmio pages access
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

87da7e66

xen/mmu: Use Xen specific TLB flush instead of the generic one. · 95a7d768

由 Konrad Rzeszutek Wilk 提交于 10月 31, 2012

As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.

This patch gives around 50% speed improvement when migrating
idle guest's from one host to another.

Oracle-bug: 14630170

CC: stable@vger.kernel.org
Tested-by: NJingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

95a7d768

30 10月, 2012 1 次提交

ALSA: Add a reference counter to card instance · a0830dbd

由 Takashi Iwai 提交于 10月 16, 2012

For more strict protection for wild disconnections, a refcount is
introduced to the card instance, and let it up/down when an object is
referred via snd_lookup_*() in the open ops.

The free-after-last-close check is also changed to check this refcount
instead of the empty list, too.
Reported-by: NMatthieu CASTET <matthieu.castet@parrot.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

a0830dbd

29 10月, 2012 2 次提交

percpu-rw-semaphores: use rcu_read_lock_sched · 1bf11c53

由 Mikulas Patocka 提交于 10月 22, 2012

Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.

This is an optimization. The RCU-protected region is very small, so
there will be no latency problems if we disable preempt in this region.

So we use rcu_read_lock_sched / rcu_read_unlock_sched that translates
to preempt_disable / preempt_disable. It is smaller (and supposedly
faster) than preemptible rcu_read_lock / rcu_read_unlock.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1bf11c53

percpu-rw-semaphores: use light/heavy barriers · 5c1eabe6

由 Mikulas Patocka 提交于 10月 22, 2012

This patch introduces new barrier pair light_mb() and heavy_mb() for
percpu rw semaphores.

This patch fixes a bug in percpu-rw-semaphores where a barrier was
missing in percpu_up_write.

This patch improves performance on the read path of
percpu-rw-semaphores: on non-x86 cpus, there was a smp_mb() in
percpu_up_read. This patch changes it to a compiler barrier and removes
the "#if defined(X86) ..." condition.

From: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5c1eabe6

27 10月, 2012 1 次提交

mac80211: verify that skb data is present · 9b395bc3

由 Johannes Berg 提交于 10月 26, 2012

A number of places in the mesh code don't check that
the frame data is present and in the skb header when
trying to access. Add those checks and the necessary
pskb_may_pull() calls. This prevents accessing data
that doesn't actually exist.

To do this, export ieee80211_get_mesh_hdrlen() to be
able to use it in mac80211.

Cc: stable@vger.kernel.org
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

9b395bc3

26 10月, 2012 1 次提交

rbtree: include linux/compiler.h for definition of __always_inline · 29fc7c5a

由 Will Deacon 提交于 10月 25, 2012

rb_erase_augmented() is a static function annotated with
__always_inline.  This causes a compile failure when attempting to use
the rbtree implementation as a library (e.g.  kvm tool):

  rbtree_augmented.h:125:24: error: expected `=', `,', `;', `asm' or `__attribute__' before `void'

Include linux/compiler.h in rbtree_augmented.h so that the __always_inline
macro is resolved correctly.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Reviewed-by: NMichel Lespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29fc7c5a

25 10月, 2012 2 次提交

dynamic_debug: Remove unnecessary __used · c0d2af63

由 Joe Perches 提交于 10月 18, 2012

The __used attribute prevents gcc from eliminating
unnecessary, otherwise optimized away, metadata for
debugging logging messages.

Remove the __used attribute.
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c0d2af63

x86, mm: Trim memory in memblock to be page aligned · 6ede1fd3

由 Yinghai Lu 提交于 10月 22, 2012

We will not map partial pages, so need to make sure memblock
allocation will not allocate those bytes out.

Also we will use for_each_mem_pfn_range() to loop to map memory
range to keep them consistent.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.comAcked-by: NJacob Shin <jacob.shin@amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>

6ede1fd3

24 10月, 2012 2 次提交

perf, cpu hotplug: Use cached value of smp_processor_id() · c13d38e4

由 Srivatsa S. Bhat 提交于 10月 16, 2012

The perf_cpu_notifier() macro invokes smp_processor_id()
multiple times. Optimize it by using a local variable.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075817.3572.76733.stgit@srivatsabhat.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c13d38e4

perf, cpu hotplug: Run CPU_STARTING notifiers with irqs disabled · 6760bca9

由 Srivatsa S. Bhat 提交于 10月 16, 2012

The CPU_STARTING notifiers are supposed to be run with irqs
disabled. But the perf_cpu_notifier() macro invokes them without
doing that. Fix it.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075809.3572.47848.stgit@srivatsabhat.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6760bca9

23 10月, 2012 3 次提交

A
drm/radeon: add some new SI PCI ids · b6aa22db
由 Alex Deucher 提交于 10月 16, 2012
```
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
```
b6aa22db

extcon: MAX77693: Add platform data for MUIC device to initialize registers · f8457d57

由 Chanwoo Choi 提交于 10月 08, 2012

This patch add platform data for MUIC device to initialize register
on probe() call because it should unmask interrupt mask register
and initialize some register related to MUIC device.
Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: NMyungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>

f8457d57

tcp: add SYN/data info to TCP_INFO · 6f73601e

由 Yuchung Cheng 提交于 10月 19, 2012

Add a bit TCPI_OPT_SYN_DATA (32) to the socket option TCP_INFO:tcpi_options.
It's set if the data in SYN (sent or received) is acked by SYN-ACK. Server or
client application can use this information to check Fast Open success rate.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f73601e

22 10月, 2012 1 次提交

extcon: standard cable names definition and declaration changed · 0cf6ad8a

由 anish kumar 提交于 8月 30, 2012

With this change now individual drivers can use standard cable
names as below:
static const char *arizona_cable[] = {
    extcon_cable_name[EXTCON_USB],
    extcon_cable_name[EXTCON_USB_HOST],
    "CUSTOM_CABLE"
    NULL,
}
Signed-off-by: Nanish kumar <anish198519851985@gmail.com>
Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com>

0cf6ad8a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功