提交 · bb33db7a076f4719dc68c235e187dd4bfb16b621 · openeuler / raspberrypi-kernel

15 4月, 2013 1 次提交

Add file_ns_capable() helper function for open-time capability checking · 935d8aab

由 Linus Torvalds 提交于 4月 14, 2013

Nothing is using it yet, but this will allow us to delay the open-time
checks to use time, without breaking the normal UNIX permission
semantics where permissions are determined by the opener (and the file
descriptor can then be passed to a different process, or the process can
drop capabilities).
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

935d8aab

13 4月, 2013 3 次提交

x86-32: Fix possible incomplete TLB invalidate with PAE pagetables · 1de14c3c

由 Dave Hansen 提交于 4月 12, 2013

This patch attempts to fix:

	https://bugzilla.kernel.org/show_bug.cgi?id=56461

The symptom is a crash and messages like this:

	chrome: Corrupted page table at address 34a03000
	*pdpt = 0000000000000000 *pde = 0000000000000000
	Bad pagetable: 000f [#1] PREEMPT SMP

Ingo guesses this got introduced by commit 611ae8e3 ("x86/tlb:
enable tlb flush range support for x86") since that code started to free
unused pagetables.

On x86-32 PAE kernels, that new code has the potential to free an entire
PMD page and will clear one of the four page-directory-pointer-table
(aka pgd_t entries).

The hardware aggressively "caches" these top-level entries and invlpg
does not actually affect the CPU's copy.  If we clear one we *HAVE* to
do a full TLB flush, otherwise we might continue using a freed pmd page.
(note, we do this properly on the population side in pud_populate()).

This patch tracks whenever we clear one of these entries in the 'struct
mmu_gather', and ensures that we follow up with a full tlb flush.

BTW, I disassembled and checked that:

	if (tlb->fullmm == 0)
and
	if (!tlb->fullmm && !tlb->need_flush_all)

generate essentially the same code, so there should be zero impact there
to the !PAE case.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Artem S Tashkinov <t.artem@mailcity.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1de14c3c

ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section · 7f49ef69

由 Steven Rostedt (Red Hat) 提交于 4月 12, 2013

As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.

Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7f49ef69

tracing: Fix possible NULL pointer dereferences · 6a76f8c0

由 Namhyung Kim 提交于 4月 11, 2013

Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops.  However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.

It can be easily reproduced with following command:

  $ cd /sys/kernel/debug/tracing
  $ echo 1234 | sudo tee -a set_ftrace_pid

In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().

Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6a76f8c0

12 4月, 2013 1 次提交

kthread: Prevent unpark race which puts threads on the wrong cpu · f2530dc7

由 Thomas Gleixner 提交于 4月 09, 2013

The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK  
  bind_to(CPU2)		BUG_ON(wrong CPU)						    

We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().
Reported-by: NDave Jones <davej@redhat.com>
Reported-and-tested-by: NDave Hansen <dave@sr71.net>
Reported-and-tested-by: NBorislav Petkov <bp@alien8.de>
Acked-by: NPeter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f2530dc7

11 4月, 2013 1 次提交

lsm: add the missing documentation for the security_skb_owned_by() hook · 6b07a24f

由 Paul Moore 提交于 4月 10, 2013

Unfortunately we didn't catch the missing comments earlier when the
patch was merged.
Signed-off-by: NPaul Moore <pmoore@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b07a24f

10 4月, 2013 3 次提交

procfs: add proc_remove_subtree() · 8ce584c7

由 Al Viro 提交于 3月 30, 2013

just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8ce584c7

spinlocks and preemption points need to be at least compiler barriers · 386afc91

由 Linus Torvalds 提交于 4月 09, 2013

In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.

However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.

In particular, get_user() and put_user() is generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result.  If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.

Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this.  But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.

So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.

This patch grew out of the same discussion that caused commits
79e5f05e ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c ("tile: comment assumption about
__insn_mtspr for <asm/irqflags.h>") to come about.

Note for stable: use discretion when/if applying this.  As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.

Cc: stable@vger.kernel.org
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

386afc91

selinux: add a skb_owned_by() hook · ca10b9e9

由 Eric Dumazet 提交于 4月 08, 2013

Commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb())
broke certain SELinux/NetLabel configurations by no longer correctly
assigning the sock to the outgoing SYNACK packet.

Cost of atomic operations on the LISTEN socket is quite big,
and we would like it to happen only if really needed.

This patch introduces a new security_ops->skb_owned_by() method,
that is a void operation unless selinux is active.
Reported-by: NMiroslav Vadkerti <mvadkert@redhat.com>
Diagnosed-by: NPaul Moore <pmoore@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-security-module@vger.kernel.org
Acked-by: NJames Morris <james.l.morris@oracle.com>
Tested-by: NPaul Moore <pmoore@redhat.com>
Acked-by: NPaul Moore <pmoore@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca10b9e9

09 4月, 2013 2 次提交

af_iucv: fix recvmsg by replacing skb_pull() function · f9c41a62

由 Ursula Braun 提交于 4月 07, 2013

When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in
the skb_pull() function triggers a kernel panic.

Replace the skb_pull logic by a per skb offset as advised by
Eric Dumazet.
Signed-off-by: NUrsula Braun <ursula.braun@de.ibm.com>
Signed-off-by: NFrank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f9c41a62

ftrace: Do not call stub functions in control loop · 395b97a3

由 Steven Rostedt (Red Hat) 提交于 3月 27, 2013

The function tracing control loop used by perf spits out a warning
if the called function is not a control function. This is because
the control function references a per cpu allocated data structure
on struct ftrace_ops that is not allocated for other types of
functions.

commit 0a016409 "ftrace: Optimize the function tracer list loop"

Had an optimization done to all function tracing loops to optimize
for a single registered ops. Unfortunately, this allows for a slight
race when tracing starts or ends, where the stub function might be
called after the current registered ops is removed. In this case we
get the following dump:

root# perf stat -e ftrace:function sleep 1
[   74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
[   74.349522] Hardware name: PRIMERGY RX200 S6
[   74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[   74.446233] Pid: 1377, comm: perf Tainted: G        W    3.9.0-rc1 #1
[   74.453458] Call Trace:
[   74.456233]  [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
[   74.462997]  [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
[   74.470272]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[   74.478117]  [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
[   74.484681]  [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
[   74.491760]  [<ffffffff8162f400>] ftrace_call+0x5/0x2f
[   74.497511]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[   74.503486]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[   74.509500]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[   74.516088]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[   74.522268]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[   74.528837]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[   74.536696]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[   74.542878]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[   74.548869]  [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
[   74.556243]  [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
[   74.563709]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[   74.569887]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[   74.575898]  [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
[   74.582505]  [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
[   74.589298]  [<ffffffff811295d0>] free_event+0x70/0x1a0
[   74.595208]  [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
[   74.602460]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[   74.608667]  [<ffffffff8112a640>] put_event+0x90/0xc0
[   74.614373]  [<ffffffff8112a740>] perf_release+0x10/0x20
[   74.620367]  [<ffffffff811a3044>] __fput+0xf4/0x280
[   74.625894]  [<ffffffff811a31de>] ____fput+0xe/0x10
[   74.631387]  [<ffffffff81083697>] task_work_run+0xa7/0xe0
[   74.637452]  [<ffffffff81014981>] do_notify_resume+0x71/0xb0
[   74.643843]  [<ffffffff8162fa92>] int_signal+0x12/0x17

To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
ftrace_ops stub as just that, a stub. This flag is now checked in the
control loop and the function is not called if the flag is set.

Thanks to Jovi for not just reporting the bug, but also pointing out
where the bug was in the code.

Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.comTested-by: NWANG Chao <chaowang@redhat.com>
Reported-by: NWANG Chao <chaowang@redhat.com>
Reported-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

395b97a3

07 4月, 2013 1 次提交

KVM: Allow cross page reads and writes from cached translations. · 8f964525

由 Andrew Honig 提交于 3月 29, 2013

This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page.  If the range falls within
the same memslot, then this will be a fast operation.  If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.

Tested: Test against kvm_clock unit tests.
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8f964525

06 4月, 2013 1 次提交

netfilter: don't reset nf_trace in nf_reset() · 124dff01

由 Patrick McHardy 提交于 4月 05, 2013

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

124dff01

05 4月, 2013 1 次提交

net: count hw_addr syncs so that unsync works properly. · 4543fbef

由 Vlad Yasevich 提交于 4月 02, 2013

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4543fbef

04 4月, 2013 2 次提交

libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive · a32450e1

由 Shan Hai 提交于 3月 18, 2013

The Slimtype DVD A  DS8A8SH drive locks up when max sector is smaller than
65535, and the blow backtrace is observed on locking up:

INFO: task flush-8:32:1130 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:32      D ffffffff8180cf60     0  1130      2 0x00000000
 ffff880273aef618 0000000000000046 0000000000000005 ffff880273aee000
 ffff880273aee000 ffff880273aeffd8 ffff880273aee010 ffff880273aee000
 ffff880273aeffd8 ffff880273aee000 ffff88026e842ea0 ffff880274a10000
Call Trace:
 [<ffffffff8168fc2d>] schedule+0x5d/0x70
 [<ffffffff8168fccc>] io_schedule+0x8c/0xd0
 [<ffffffff81324461>] get_request+0x731/0x7d0
 [<ffffffff8133dc60>] ? cfq_allow_merge+0x50/0x90
 [<ffffffff81083aa0>] ? wake_up_bit+0x40/0x40
 [<ffffffff81320443>] ? bio_attempt_back_merge+0x33/0x110
 [<ffffffff813248ea>] blk_queue_bio+0x23a/0x3f0
 [<ffffffff81322176>] generic_make_request+0xc6/0x120
 [<ffffffff81322308>] submit_bio+0x138/0x160
 [<ffffffff811d7596>] ? bio_alloc_bioset+0x96/0x120
 [<ffffffff811d1f61>] submit_bh+0x1f1/0x220
 [<ffffffff811d48b8>] __block_write_full_page+0x228/0x340
 [<ffffffff811d3650>] ? attach_nobh_buffers+0xc0/0xc0
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d4ab6>] block_write_full_page_endio+0xe6/0x100
 [<ffffffff811d4ae5>] block_write_full_page+0x15/0x20
 [<ffffffff811d9268>] blkdev_writepage+0x18/0x20
 [<ffffffff81142527>] __writepage+0x17/0x40
 [<ffffffff811438ba>] write_cache_pages+0x34a/0x4a0
 [<ffffffff81142510>] ? set_page_dirty+0x70/0x70
 [<ffffffff81143a61>] generic_writepages+0x51/0x80
 [<ffffffff81143ab0>] do_writepages+0x20/0x50
 [<ffffffff811c9ed6>] __writeback_single_inode+0xa6/0x2b0
 [<ffffffff811ca861>] writeback_sb_inodes+0x311/0x4d0
 [<ffffffff811caaa6>] __writeback_inodes_wb+0x86/0xd0
 [<ffffffff811cad43>] wb_writeback+0x1a3/0x330
 [<ffffffff816916cf>] ? _raw_spin_lock_irqsave+0x3f/0x50
 [<ffffffff811b8362>] ? get_nr_inodes+0x52/0x70
 [<ffffffff811cb0ac>] wb_do_writeback+0x1dc/0x260
 [<ffffffff8168dd34>] ? schedule_timeout+0x204/0x240
 [<ffffffff811cb232>] bdi_writeback_thread+0x102/0x2b0
 [<ffffffff811cb130>] ? wb_do_writeback+0x260/0x260
 [<ffffffff81083550>] kthread+0xc0/0xd0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0
 [<ffffffff8169a3ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0

 The above trace was triggered by
   "dd if=/dev/zero of=/dev/sr0 bs=2048 count=32768"

 It was previously working by accident, since another bug introduced
 by 4dce8ba9 (libata: Use 'bool' return value for ata_id_XXX) caused
 all drives to use maxsect=65535.

Cc: stable@vger.kernel.org
Signed-off-by: NShan Hai <shan.hai@windriver.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

a32450e1

libata: Use integer return value for atapi_command_packet_set · d8668fcb

由 Shan Hai 提交于 3月 18, 2013

The function returns type of ATAPI drives so it should return integer value.
The commit 4dce8ba9 (libata: Use 'bool' return value for ata_id_XXX) since
v2.6.39 changed the type of return value from int to bool, the change would
cause all of the ATAPI class drives to be treated as TYPE_TAPE and the
max_sectors of the drives to be set to 65535 because of the commit
f8d8e579(libata: increase 128 KB / cmd limit for ATAPI tape drives), for the
function would return true for all ATAPI class drives and the TYPE_TAPE is
defined as 0x01.

Cc: stable@vger.kernel.org
Signed-off-by: NShan Hai <shan.hai@windriver.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

d8668fcb

02 4月, 2013 1 次提交

PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset · 5faaa035

由 Rajagopal Venkat 提交于 3月 21, 2013

Fix compiler warnings generated when devfreq is not enabled
(CONFIG_PM_DEVFREQ is not set).
Signed-off-by: NRajagopal Venkat <rajagopal.venkat@linaro.org>
Acked-by: NMyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5faaa035

01 4月, 2013 1 次提交

Revert "lockdep: check that no locks held at freeze time" · dbf520a9

由 Paul Walmsley 提交于 3月 31, 2013

This reverts commit 6aa97070.

Commit 6aa97070 ("lockdep: check that no locks held at freeze time")
causes problems with NFS root filesystems.  The failures were noticed on
OMAP2 and 3 boards during kernel init:

  [ BUG: swapper/0/1 still has locks held! ]
  3.9.0-rc3-00344-ga937536b #1 Not tainted
  -------------------------------------
  1 lock held by swapper/0/1:
   #0:  (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574

  stack backtrace:
    rpc_wait_bit_killable
    __wait_on_bit
    out_of_line_wait_on_bit
    __rpc_execute
    rpc_run_task
    rpc_call_sync
    nfs_proc_get_root
    nfs_get_root
    nfs_fs_mount_common
    nfs_try_mount
    nfs_fs_mount
    mount_fs
    vfs_kern_mount
    do_mount
    sys_mount
    do_mount_root
    mount_root
    prepare_namespace
    kernel_init_freeable
    kernel_init

Although the rootfs mounts, the system is unstable.  Here's a transcript
from a PM test:

  http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt

Here's what the test log should look like:

  http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt

Mailing list discussion is here:

  http://lkml.org/lkml/2013/3/4/221

Deal with this for v3.9 by reverting the problem commit, until folks can
figure out the right long-term course of action.
Signed-off-by: NPaul Walmsley <paul@pwsan.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <maciej.rutecki@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dbf520a9

29 3月, 2013 1 次提交

Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs" · 09a9f1d2

由 Michel Lespinasse 提交于 3月 28, 2013

This reverts commit 18693050 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").

VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.

Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.

Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09a9f1d2

28 3月, 2013 1 次提交

line up comment for ndo_bridge_getlink · 24f11a5c

由 Dmitry Kravkov 提交于 3月 27, 2013

Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24f11a5c

27 3月, 2013 5 次提交

userns: Restrict when proc and sysfs can be mounted · 87a8ebd6

由 Eric W. Biederman 提交于 3月 24, 2013

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

87a8ebd6

vfs: Add a mount flag to lock read only bind mounts · 90563b19

由 Eric W. Biederman 提交于 3月 22, 2013

When a read-only bind mount is copied from mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

90563b19

userns: Don't allow creation if the user is chrooted · 3151527e

由 Eric W. Biederman 提交于 3月 15, 2013

Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.

Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead.  With this result that this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

3151527e

PCI: Add PCI ROM helper for platform-provided ROM images · fffe01f7

由 Matthew Garrett 提交于 3月 26, 2013

It turns out that some UEFI systems provide apparently an apparently valid
PCI ROM BAR that turns out to contain garbage, so the attempt in 547b5246
to prefer the ROM from the BAR actually breaks a different set of machines.
As Linus pointed out, the graphics drivers are probably in the best
position to make this judgement, so this basically reverts 547b5246 and
f9a37be0 and adds a new helper function. Followup patches will add support
to nouveau and radeon for probing this ROM source if they can't find a ROM
from some other source.

[bhelgaas: added reporter and bugzilla pointers, s/f4eb5ff05/547b5246]
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=927451
Reference: http://lkml.kernel.org/r/kg69ef$vdb$1@ger.gmane.orgReported-by: NMantas Mikulėnas <grawity@gmail.com>
Reported-by: NChris Murphy <bugzilla@colorremedies.com>
Signed-off-by: NMatthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

fffe01f7

ipv4: Fix ip-header identification for gso packets. · 330305cc

由 Pravin B Shelar 提交于 3月 24, 2013

ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab081
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.
Reported-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

330305cc

26 3月, 2013 3 次提交

libfc, fcoe, bnx2fc: Split fc_disc_init into fc_disc_{init, config} · 0807619d

由 Robert Love 提交于 3月 25, 2013

Split discovery initialization in code that is setup once (fcoe_disc_init)
and code that can be re-configured (fcoe_disc_config).
Signed-off-by: NRobert Love <robert.w.love@intel.com>
Tested-by: NJack Morgan <jack.morgan@intel.com>
Reviewed-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com>

0807619d

libfc, fcoe, bnx2fc: Always use fcoe_disc_init for discovery layer initialization · 8a9a7138

由 Robert Love 提交于 3月 25, 2013

Currently libfcoe is doing some libfc discovery layer initialization outside of
libfc. This patch moves this code into libfc and sets up a split in discovery
(one time) initialization code and (re-configurable) settings that will come in
the next patch.
Signed-off-by: NRobert Love <robert.w.love@intel.com>
Tested-by: NJack Morgan <jack.morgan@intel.com>
Reviewed-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com>

8a9a7138

usb: add find_raw_port_number callback to struct hc_driver() · 3f5eb141

由 Lan Tianyu 提交于 3月 19, 2013

xhci driver divides the root hub into two logical hubs which work
respectively for usb 2.0 and usb 3.0 devices. They are independent
devices in the usb core. But in the ACPI table, it's one device node
and all usb2.0 and usb3.0 ports are under it. Binding usb port with
its acpi node needs the raw port number which is reflected in the xhci
extended capabilities table. This patch is to add find_raw_port_number
callback to struct hc_driver(), fill it with xhci_find_raw_port_number()
which will return raw port number and add a wrap usb_hcd_find_raw_port_number().

Otherwise, refactor xhci_find_real_port_number(). Using
xhci_find_raw_port_number() to get real index in the HW port status
registers instead of scanning through the xHCI roothub port array.
This can help to speed up.

All addresses in xhci->usb2_ports and xhci->usb3_ports array are
kown good ports and don't include following bad ports in the extended
capabilities talbe.
     (1) root port that doesn't have an entry
     (2) root port with unknown speed
     (3) root port that is listed twice and with different speeds.

So xhci_find_raw_port_number() will only return port num of good ones
and never touch bad ports above.
Signed-off-by: NLan Tianyu <tianyu.lan@intel.com>
Signed-off-by: NSarah Sharp <sarah.a.sharp@linux.intel.com>

3f5eb141

25 3月, 2013 1 次提交

netfilter: reset nf_trace in nf_reset · 130549fe

由 Gao feng 提交于 3月 21, 2013

We forgot to clear the nf_trace of sk_buff in nf_reset,
When we use veth device, this nf_trace information will
be leaked from one net namespace to another net namespace.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

130549fe

23 3月, 2013 3 次提交

mm: zone_end_pfn is too small · f9228b20

由 Russ Anderson 提交于 3月 22, 2013

Booting with 32 TBytes memory hits BUG at mm/page_alloc.c:552! (output
below).

The key hint is "page 4294967296 outside zone".
4294967296 = 0x100000000 (bit 32 is set).

The problem is in include/linux/mmzone.h:

  530 static inline unsigned zone_end_pfn(const struct zone *zone)
  531 {
  532         return zone->zone_start_pfn + zone->spanned_pages;
  533 }

zone_end_pfn is "unsigned" (32 bits).  Changing it to "unsigned long"
(64 bits) fixes the problem.

zone_end_pfn() was added recently in commit 108bcc96 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()")

Output from the failure.

  No AGP bridge found
  page 4294967296 outside zone [ 4294967296 - 4327469056 ]
  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:552!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.9.0-rc2.dtp+ #10
  RIP: free_one_page+0x382/0x430
  Process swapper (pid: 0, threadinfo ffffffff81942000, task ffffffff81955420)
  Call Trace:
    __free_pages_ok+0x96/0xb0
    __free_pages+0x25/0x50
    __free_pages_bootmem+0x8a/0x8c
    __free_memory_core+0xea/0x131
    free_low_memory_core_early+0x4a/0x98
    free_all_bootmem+0x45/0x47
    mem_init+0x7b/0x14c
    start_kernel+0x216/0x433
    x86_64_start_reservations+0x2a/0x2c
    x86_64_start_kernel+0x144/0x153
  Code: 89 f1 ba 01 00 00 00 31 f6 d3 e2 4c 89 ef e8 66 a4 01 00 e9 2c fe ff ff 0f 0b eb fe 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 eb f3 <0f> 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 49
Signed-off-by: NRuss Anderson <rja@sgi.com>
Reported-by: NGeorge Beshers <gbeshers@sgi.com>
Acked-by: NHedi Berriche <hedi@sgi.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f9228b20

printk: Provide a wake_up_klogd() off-case · dc72c32e

由 Frederic Weisbecker 提交于 3月 22, 2013

wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() are in use and there are actually no waiter on
log_wait waitqueue.  It should be a stub in this case for users like
bust_spinlocks().

Otherwise this results in this warning when CONFIG_PRINTK=n and
CONFIG_IRQ_WORK=n:

	kernel/built-in.o In function `wake_up_klogd':
	(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'

To fix this, provide an off-case for wake_up_klogd() when
CONFIG_PRINTK=n.

There is much more from console_unlock() and other console related code
in printk.c that should be moved under CONFIG_PRINTK.  But for now,
focus on a minimal fix as we passed the merged window already.

[akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Reported-by: NJames Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dc72c32e

irq_work.h: fix warning when CONFIG_IRQ_WORK=n · fe8d5261

由 James Hogan 提交于 3月 22, 2013

A randconfig caught repeated compiler warnings when CONFIG_IRQ_WORK=n
due to the definition of a non-inline static function in
<linux/irq_work.h>:

  include/linux/irq_work.h +40 : warning: 'irq_work_needs_cpu' defined but not used

Make it inline to supress the warning.  This is caused commit
00b42959 ("irq_work: Don't stop the tick with pending works") merged
in v3.9-rc1.
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe8d5261

22 3月, 2013 4 次提交

xen-pciback: notify hypervisor about devices intended to be assigned to guests · 909b3fdb

由 Jan Beulich 提交于 3月 12, 2013

For MSI-X capable devices the hypervisor wants to write protect the
MSI-X table and PBA, yet it can't assume that resources have been
assigned to their final values at device enumeration time. Thus have
pciback do that notification, as having the device controlled by it is
a prerequisite to assigning the device to guests anyway.

This is the kernel part of hypervisor side commit 4245d33 ("x86/MSI:
add mechanism to fully protect MSI-X table from PV guest accesses") on
the master branch of git://xenbits.xen.org/xen.git.

CC: stable@vger.kernel.org
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

909b3fdb

M
Revert "KVM: allow host header to be included even for !CONFIG_KVM" · 09a6e1f4
由 Marcelo Tosatti 提交于 3月 22, 2013
```
This reverts commit f445f11e as
it breaks PPC with CONFIG_KVM=n.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
```
09a6e1f4

USB: serial: add modem-status-change wait queue · e5b33dc9

由 Johan Hovold 提交于 3月 19, 2013

Add modem-status-change wait queue to struct usb_serial_port that
subdrivers can use to implement TIOCMIWAIT.

Currently subdrivers use a private wait queue which may have been
released when waking up after device disconnected.

Note that we're adding a new wait queue rather than reusing the tty-port
one as we do not want to get woken up at hangup (yet).

Cc: stable <stable@vger.kernel.org>
Signed-off-by: NJohan Hovold <jhovold@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e5b33dc9

net: fix *_DIAG_MAX constants · ae5fc987

由 Andrey Vagin 提交于 3月 21, 2013

Follow the common pattern and define *_DIAG_MAX like:

        [...]
        __XXX_DIAG_MAX,
};

Because everyone is used to do:

        struct nlattr *attrs[XXX_DIAG_MAX+1];

        nla_parse([...], XXX_DIAG_MAX, [...]
Reported-by: NThomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae5fc987

21 3月, 2013 3 次提交

thermal: shorten too long mcast group name · 73214f5d

由 Masatake YAMATO 提交于 3月 19, 2013

The original name is too long.
Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73214f5d

flow_keys: include thoff into flow_keys for later usage · 8ed78166

由 Daniel Borkmann 提交于 3月 19, 2013

In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ed78166

udp: add encap_destroy callback · 44046a59

由 Tom Parkin 提交于 3月 19, 2013

Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.

In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation.  For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.

This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.
Signed-off-by: NTom Parkin <tparkin@katalix.com>
Signed-off-by: NJames Chapman <jchapman@katalix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44046a59

20 3月, 2013 1 次提交

usb: ulpi: Define a *otg_ulpi_create no-op · 7fa4cd1a

由 Fabio Estevam 提交于 3月 20, 2013

Building a kernel for imx_v4_v5_defconfig with CONFIG_USB_ULPI disabled, results
in the following error:

arch/arm/mach-imx/built-in.o: In function 'pca100_init':
platform-mx2-emma.c:(.init.text+0x6788): undefined reference to 'otg_ulpi_create'
platform-mx2-emma.c:(.init.text+0x682c): undefined reference to 'mxc_ulpi_access_ops'

Fix this by providing a no-op definition of *otg_ulpi_create for the case when
CONFIG_USB_ULPI is not defined.
Acked-by: NIgor Grinberg <grinberg@compulab.co.il>
Signed-off-by: NFabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: NFelipe Balbi <balbi@ti.com>

7fa4cd1a