提交 · 3e2abc5a35d25442821e1733687b7abbc83b5072 · openeuler / raspberrypi-kernel

21 3月, 2012 2 次提交

ACPI: EC: Add ec_get_handle() · 3e2abc5a

由 Seth Forshee 提交于 1月 18, 2012

toshiba_acpi needs to execute an AML method within the EC namespace
to make hotkeys work on some platforms. Provide an interface to
allow it to easily get a handle to the EC namespace for this purpose.
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: NMatthew Garrett <mjg@redhat.com>

3e2abc5a

platform, x86: Kill off Moorestown · f39eaa67

由 Alan Cox 提交于 1月 26, 2012

All production devices operate in the Oaktrail configuration with legacy PC
elements present and an ACPI BIOS. Continue stripping out the Moorestown
elements from the tree leaving Medfield.
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NMatthew Garrett <mjg@redhat.com>

f39eaa67

08 3月, 2012 2 次提交

route: Remove redirect_genid · ac3f48de

由 Steffen Klassert 提交于 3月 06, 2012

As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac3f48de

inetpeer: Invalidate the inetpeer tree along with the routing cache · 5faa5df1

由 Steffen Klassert 提交于 3月 06, 2012

We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5faa5df1

06 3月, 2012 6 次提交

memcg: fix GPF when cgroup removal races with last exit · 7512102c

由 Hugh Dickins 提交于 3月 05, 2012

When moving tasks from old memcg (with move_charge_at_immigrate on new
memcg), followed by removal of old memcg, hit General Protection Fault in
mem_cgroup_lru_del_list() (called from release_pages called from
free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
exit_mmap from mmput from exit_mm from do_exit).

Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats, and take page off lru before freeing.

A task, or a charge, or a page on lru: each secures a memcg against
removal. In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.

Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons. So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.

There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed. We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list().

I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.

But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
to add), and reset pc to root_mem_cgroup if page is uncharged. A risky
change? just going back to how it worked before; testing, and an audit of
uses of pc->mem_cgroup, show no problem.

And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
clear pc->mem_cgroup if necessary").

Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with RCU before memcg
structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
without that; but it seems cleaner to rely on one dependency less.
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7512102c

vfork: kill PF_STARTING · 6e27f63e

由 Oleg Nesterov 提交于 3月 05, 2012

Previously it was (ab)used by utrace.  Then it was wrongly used by the
scheduler code.

Currently it is not used, kill it before it finds the new erroneous user.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e27f63e

coredump_wait: don't call complete_vfork_done() · 57b59c4a

由 Oleg Nesterov 提交于 3月 05, 2012

Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done().  zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.

mm_release() becomes the only caller, unexport complete_vfork_done().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57b59c4a

vfork: make it killable · d68b46fe

由 Oleg Nesterov 提交于 3月 05, 2012

Make vfork() killable.

Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
fails we do not return to the user-mode and never touch the memory shared
with our child.

However, in this case we should clear child->vfork_done before return, we
use task_lock() in do_fork()->wait_for_vfork_done() and
complete_vfork_done() to serialize with each other.

Note: now that we use task_lock() we don't really need completion, we
could turn task->vfork_done into "task_struct *wake_up_me" but this needs
some complications.

NOTE: this and the next patches do not affect in-kernel users of
CLONE_VFORK, kernel threads run with all signals ignored including
SIGKILL/SIGSTOP.

However this is obviously the user-visible change.  Not only a fatal
signal can kill the vforking parent, a sub-thread can do execve or
exit_group() and kill the thread sleeping in vfork().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d68b46fe

vfork: introduce complete_vfork_done() · c415c3b4

由 Oleg Nesterov 提交于 3月 05, 2012

No functional changes.

Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c415c3b4

kmsg_dump: don't run on non-error paths by default · c22ab332

由 Matthew Garrett 提交于 3月 05, 2012

Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
run on normal paths including poweroff and reboot.

This is less than ideal given pstore implementations that can only
represent single backtraces, since a reboot may overwrite a stored oops
before it's been picked up by userspace.  In addition, some pstore
backends may have low performance and provide a significant delay in
reboot as a result.

This patch adds a printk.always_kmsg_dump kernel parameter (which can also
be changed from userspace).  Without it, the code will only be run on
failure paths rather than on normal paths.  The option can be enabled in
environments where there's a desire to attempt to audit whether or not a
reboot was cleanly requested or not.
Signed-off-by: NMatthew Garrett <mjg@redhat.com>
Acked-by: NSeiji Aguchi <seiji.aguchi@hds.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c22ab332

05 3月, 2012 2 次提交

vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c · 5483f18e

由 Linus Torvalds 提交于 3月 04, 2012

It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches.  This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.

Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.

We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5483f18e

percpu: fix __this_cpu_{sub,inc,dec}_return() definition · adb79506

由 Konstantin Khlebnikov 提交于 2月 29, 2012

This patch adds missed "__" prefixes, otherwise these functions
works as irq/preemption safe.
Reported-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

adb79506

03 3月, 2012 5 次提交

vfs: clarify and clean up dentry_cmp() · 5707c87f

由 Linus Torvalds 提交于 3月 02, 2012

It did some odd things for unclear reasons. As this is one of the
functions that gets changed when doing word-at-a-time compares, this is
yet another of the "don't change any semantics, but clean things up so
that subsequent patches don't get obscured by the cleanups".
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5707c87f

vfs: uninline full_name_hash() · 0145acc2

由 Linus Torvalds 提交于 3月 02, 2012

.. and also use it in lookup_one_len() rather than open-coding it.

There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.

So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0145acc2

vfs: trivial __d_lookup_rcu() cleanups · 8966be90

由 Linus Torvalds 提交于 3月 02, 2012

These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.

They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8966be90

regset: Return -EFAULT, not -EIO, on host-side memory fault · 5189fa19

由 H. Peter Anvin 提交于 3月 02, 2012

There is only one error code to return for a bad user-space buffer
pointer passed to a system call in the same address space as the
system call is executed, and that is EFAULT.  Furthermore, the
low-level access routines, which catch most of the faults, return
EFAULT already.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5189fa19

regset: Prevent null pointer reference on readonly regsets · c8e25258

由 H. Peter Anvin 提交于 3月 02, 2012

The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.

Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c8e25258

02 3月, 2012 2 次提交

Block: use a freezable workqueue for disk-event polling · 62d3c543

由 Alan Stern 提交于 3月 02, 2012

This patch (as1519) fixes a bug in the block layer's disk-events
polling.  The polling is done by a work routine queued on the
system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
polling continues even in the middle of a system sleep transition.

Obviously, polling a suspended drive for media changes and such isn't
a good thing to do; in the case of USB mass-storage devices it can
lead to real problems requiring device resets and even re-enumeration.

The patch fixes things by creating a new system-wide, non-reentrant,
freezable workqueue and using it for disk-events polling.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
CC: <stable@kernel.org>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

62d3c543

block: Fix NULL pointer dereference in sd_revalidate_disk · fe316bf2

由 Jun'ichi Nomura 提交于 3月 02, 2012

Since 2.6.39 (1196f8b8), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However it ends up calling driver's revalidate_disk without open
and could cause oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               <scsi_device torn down>
      rescan_partitions
        sd_revalidate_disk
          <oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: NHuajun Li <huajun.li.lee@gmail.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fe316bf2

29 2月, 2012 1 次提交

tcp: fix comment for tp->highest_sack · ecb97192

由 Neal Cardwell 提交于 2月 27, 2012

There was an off-by-one error in the comments describing the
highest_sack field in struct tcp_sock. The comments previously claimed
that it was the "start sequence of the highest skb with SACKed
bit". This commit fixes the comments to note that it is the "start
sequence of the skb just *after* the highest skb with SACKed bit".
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecb97192

27 2月, 2012 2 次提交

[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional · 97a29d59

由 James Bottomley 提交于 1月 30, 2012

The problem in

commit fea80311
Author: Randy Dunlap <rdunlap@xenotime.net>
Date:   Sun Jul 24 11:39:14 2011 -0700

    iomap: make IOPORT/PCI mapping functions conditional

is that if your architecture supplies pci_iomap/pci_iounmap, it expects
always to supply them.  Adding empty body defitions in the !CONFIG_PCI
case, which is what this patch does, breaks the parisc compile because
the functions become doubly defined.  It took us a while to spot this,
because we don't actually build !CONFIG_PCI very often (only if someone
is brave enough to test the snake/asp machines).

Since the note in the commit log says this is to fix a
CONFIG_GENERIC_IOMAP issue (which it does because CONFIG_GENERIC_IOMAP
supplies pci_iounmap only if CONFIG_PCI is set), there should actually
have been a condition upon this.  This should make sure no other
architecture's !CONFIG_PCI compile breaks in the same way as parisc.

The fix had to be updated to take account of the GENERIC_PCI_IOMAP
separation.
Reported-by: NRolf Eike Beer <eike@sf-mail.de>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

97a29d59

Fix autofs compile without CONFIG_COMPAT · 3c761ea0

由 Linus Torvalds 提交于 2月 26, 2012

The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.

Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.

We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.
Reported-and-tested-by: NAndreas Schwab <schwab@linux-m68k.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3c761ea0

25 2月, 2012 1 次提交

epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() · d80e731e

由 Oleg Nesterov 提交于 2月 24, 2012

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".
Reported-by: NMaxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d80e731e

24 2月, 2012 3 次提交

netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2) · 7d367e06

由 Jozsef Kadlecsik 提交于 2月 24, 2012

Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7d367e06

ARM: 7339/1: amba/serial.h: Include types.h for resolving dependency of type bool · 7e55d052

由 viresh kumar 提交于 2月 23, 2012

serial.h uses bool, but its definition is missing, as it doesn't include
types.h. Fix this by including types.h
Signed-off-by: NViresh Kumar <viresh.kumar@st.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

7e55d052

ipsec: be careful of non existing mac headers · 03606895

由 Eric Dumazet 提交于 2月 23, 2012

Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)

Before copying mac header, better make sure it is present.

Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809Reported-by: NNiccolò Belli <darkbasic@linuxsystems.it>
Tested-by: NNiccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03606895

22 2月, 2012 6 次提交

sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045

由 Peter Zijlstra 提交于 1月 30, 2012

Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.

It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.

It also breaks the existing schedstat samples due to the side
effects of:

-               se->statistics.sleep_start = 0;
...
-               se->statistics.block_start = 0;

Nor do I see means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to for the sake of redundant
instrumentation.

Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

8c79a045

sys_poll: fix incorrect type for 'timeout' parameter · faf30900

由 Linus Torvalds 提交于 2月 21, 2012

The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.

Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.

We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invocing the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.

If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.
Reported-by: NThomas Meyer <thomas@m3y3r.de>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

faf30900

asm-generic: architecture independent readq/writeq for 32bit environment · 797a796a

由 Hitoshi Mitake 提交于 2月 07, 2012

This provides unified readq()/writeq() helper functions for 32-bit
drivers.

For some cases, readq/writeq without atomicity is harmful, and order of
io access has to be specified explicitly.  So in this patch, new two
header files which contain non-atomic readq/writeq are added.

 - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/
   writeq with the order of lower address -> higher address

 - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/
   writeq with reversed order

This allows us to remove some readq()s that were added drivers when the
default non-atomic ones were removed in commit dbee8a0a ("x86:
remove 32-bit versions of readq()/writeq()")

The drivers which need readq/writeq but can do with the non-atomic ones
must add the line:

  #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */

But this will be nop in 64-bit environments, and no other #ifdefs are
required.  So I believe that this patch can solve the problem of
 1. driver-specific readq/writeq
 2. atomicity and order of io access

This patch is tested with building allyesconfig and allmodconfig as
ARCH=x86 and ARCH=i386 on top of tip/master.

Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NHitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

797a796a

rtnetlink: Fix problem with buffer allocation · 115c9b81

由 Greg Rose 提交于 2月 21, 2012

Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

115c9b81

percpu: use raw_local_irq_* in _this_cpu op · e920d597

由 Ming Lei 提交于 2月 15, 2012

It doesn't make sense to trace irq off or do irq flags
lock proving inside 'this_cpu' operations, so replace local_irq_*
with raw_local_irq_* in 'this_cpu' op.

Also the patch fixes onelockdep warning[1] by the replacement, see
below:

In commit: 933393f5(percpu:
Remove irqsafe_cpu_xxx variants), local_irq_save/restore(flags) are
added inside this_cpu_inc operation, so that trace_hardirqs_off_caller
will be called by trace_hardirqs_on_caller directly because
__debug_atomic_inc is implemented as this_cpu_inc, which may trigger
the lockdep warning[1], for example in the below ARM scenary:

	kernel_thread_helper	/*irq disabled*/
		->trace_hardirqs_on_caller	/*hardirqs_enabled was set*/
			->trace_hardirqs_off_caller	/*hardirqs_enabled cleared*/
				__this_cpu_add(redundant_hardirqs_on)
			->trace_hardirqs_off_caller	/*irq disabled, so call here*/

The 'unannotated irqs-on' warning will be triggered somewhere because
irq is just enabled after the irq trace in kernel_thread_helper.

[1],
[    0.162841] ------------[ cut here ]------------
[    0.167694] WARNING: at kernel/lockdep.c:3493 check_flags+0xc0/0x1d0()
[    0.174468] Modules linked in:
[    0.177703] Backtrace:
[    0.180328] [<c00171f0>] (dump_backtrace+0x0/0x110) from [<c0412320>] (dump_stack+0x18/0x1c)
[    0.189086]  r6:c051f778 r5:00000da5 r4:00000000 r3:60000093
[    0.195007] [<c0412308>] (dump_stack+0x0/0x1c) from [<c00410e8>] (warn_slowpath_common+0x54/0x6c)
[    0.204223] [<c0041094>] (warn_slowpath_common+0x0/0x6c) from [<c0041124>] (warn_slowpath_null+0x24/0x2c)
[    0.214111]  r8:00000000 r7:00000000 r6:ee069598 r5:60000013 r4:ee082000
[    0.220825] r3:00000009
[    0.223693] [<c0041100>] (warn_slowpath_null+0x0/0x2c) from [<c0088f38>] (check_flags+0xc0/0x1d0)
[    0.232910] [<c0088e78>] (check_flags+0x0/0x1d0) from [<c008d348>] (lock_acquire+0x4c/0x11c)
[    0.241668] [<c008d2fc>] (lock_acquire+0x0/0x11c) from [<c0415aa4>] (_raw_spin_lock+0x3c/0x74)
[    0.250610] [<c0415a68>] (_raw_spin_lock+0x0/0x74) from [<c010a844>] (set_task_comm+0x20/0xc0)
[    0.259521]  r6:ee069588 r5:ee0691c0 r4:ee082000
[    0.264404] [<c010a824>] (set_task_comm+0x0/0xc0) from [<c0060780>] (kthreadd+0x28/0x108)
[    0.272857]  r8:00000000 r7:00000013 r6:c0044a08 r5:ee0691c0 r4:ee082000
[    0.279571] r3:ee083fe0
[    0.282470] [<c0060758>] (kthreadd+0x0/0x108) from [<c0044a08>] (do_exit+0x0/0x6dc)
[    0.290405]  r5:c0060758 r4:00000000
[    0.294189] ---[ end trace 1b75b31a2719ed1c ]---
[    0.299041] possible reason: unannotated irqs-on.
[    0.303955] irq event stamp: 5
[    0.307159] hardirqs last  enabled at (4): [<c001331c>] no_work_pending+0x8/0x2c
[    0.314880] hardirqs last disabled at (5): [<c0089b08>] trace_hardirqs_on_caller+0x60/0x26c
[    0.323547] softirqs last  enabled at (0): [<c003f754>] copy_process+0x33c/0xef4
[    0.331207] softirqs last disabled at (0): [<  (null)>]   (null)
[    0.337585] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

e920d597

percpu: fix generic definition of __this_cpu_add_and_return() · 7d96b3e5

由 Konstantin Khlebnikov 提交于 2月 19, 2012

This patch adds missed "__" into function prefix.
Otherwise on all archectures (except x86) it expands to irq/preemtion-safe
variant: _this_cpu_generic_add_return(), which do extra irq-save/irq-restore.
Optimal generic implementation is __this_cpu_generic_add_return().
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

7d96b3e5

21 2月, 2012 1 次提交

netfilter: ebtables: fix alignment problem in ppc · 88ba136d

由 Joerg Willmann 提交于 2月 21, 2012

ebt_among extension of ebtables uses __alignof__(_xt_align) while the
corresponding kernel module uses __alignof__(ebt_replace) to determine
the alignment in EBT_ALIGN().

These are the results of these values on different platforms:

x86 x86_64 ppc
__alignof__(_xt_align) 4 8 8
__alignof__(ebt_replace) 4 8 4

ebtables fails to add rules which use the among extension.

I'm using kernel 2.6.33 and ebtables 2.0.10-4

According to Bart De Schuymer, userspace alignment was changed to
_xt_align to fix an alignment issue on a userspace32-kernel64 system
(he thinks it was for an ARM device). So userspace must be right.
The kernel alignment macro needs to change so it also uses _xt_align
instead of ebt_replace. The userspace changes date back from
June 29, 2009.
Signed-off-by: NJoerg Willmann <joe@clnt.de>
Signed-off by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

88ba136d

20 2月, 2012 1 次提交

digsig: changed type of the timestamp · 59cca653

由 Dmitry Kasatkin 提交于 2月 02, 2012

time_t was used in the signature and key packet headers,
which is typedef of long and is different on 32 and 64 bit architectures.
Signature and key format should be independent of architecture.
Similar to GPG, I have changed the type to uint32_t.
Signed-off-by: NDmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

59cca653

16 2月, 2012 2 次提交

crypto: sha512 - use standard ror64() · f2ea0f5f

由 Alexey Dobriyan 提交于 1月 14, 2012

Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.

The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.

Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f2ea0f5f

dt: add empty of_find_compatible_node function · 2261cc62

由 Shawn Guo 提交于 2月 15, 2012

Add empty of_find_compatible_node function for !CONFIG_OF build.
Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

2261cc62

15 2月, 2012 4 次提交

Bluetooth: Remove usage of __cancel_delayed_work() · 6de32750

由 Ulisses Furquim 提交于 1月 30, 2012

__cancel_delayed_work() is being used in some paths where we cannot
sleep waiting for the delayed work to finish. However, that function
might return while the timer is running and the work will be queued
again. Replace the calls with safer cancel_delayed_work() version
which spins until the timer handler finishes on other CPUs and
cancels the delayed work.
Signed-off-by: NUlisses Furquim <ulisses@profusion.mobi>
Acked-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

6de32750

Bluetooth: Fix potential deadlock · a51cd2be

由 Andre Guedes 提交于 1月 27, 2012

We don't need to use the _sync variant in hci_conn_hold and
hci_conn_put to cancel conn->disc_work delayed work. This way
we avoid potential deadlocks like this one reported by lockdep.

======================================================
[ INFO: possible circular locking dependency detected ]
3.2.0+ #1 Not tainted
-------------------------------------------------------
kworker/u:1/17 is trying to acquire lock:
 (&hdev->lock){+.+.+.}, at: [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]

but task is already holding lock:
 ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 ((&(&conn->disc_work)->work)){+.+...}:
       [<ffffffff81057444>] lock_acquire+0x8a/0xa7
       [<ffffffff81034ed1>] wait_on_work+0x3d/0xaa
       [<ffffffff81035b54>] __cancel_work_timer+0xac/0xef
       [<ffffffff81035ba4>] cancel_delayed_work_sync+0xd/0xf
       [<ffffffffa00554b0>] smp_chan_create+0xde/0xe6 [bluetooth]
       [<ffffffffa0056160>] smp_conn_security+0xa3/0x12d [bluetooth]
       [<ffffffffa0053640>] l2cap_connect_cfm+0x237/0x2e8 [bluetooth]
       [<ffffffffa004239c>] hci_proto_connect_cfm+0x2d/0x6f [bluetooth]
       [<ffffffffa0046ea5>] hci_event_packet+0x29d1/0x2d60 [bluetooth]
       [<ffffffffa003dde3>] hci_rx_work+0xd0/0x2e1 [bluetooth]
       [<ffffffff810357af>] process_one_work+0x178/0x2bf
       [<ffffffff81036178>] worker_thread+0xce/0x152
       [<ffffffff81039a03>] kthread+0x95/0x9d
       [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10

-> #1 (slock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}:
       [<ffffffff81057444>] lock_acquire+0x8a/0xa7
       [<ffffffff812e553a>] _raw_spin_lock_bh+0x36/0x6a
       [<ffffffff81244d56>] lock_sock_nested+0x24/0x7f
       [<ffffffffa004d96f>] lock_sock+0xb/0xd [bluetooth]
       [<ffffffffa0052906>] l2cap_chan_connect+0xa9/0x26f [bluetooth]
       [<ffffffffa00545f8>] l2cap_sock_connect+0xb3/0xff [bluetooth]
       [<ffffffff81243b48>] sys_connect+0x69/0x8a
       [<ffffffff812e6579>] system_call_fastpath+0x16/0x1b

-> #0 (&hdev->lock){+.+.+.}:
       [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
       [<ffffffff81057444>] lock_acquire+0x8a/0xa7
       [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
       [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
       [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff810357af>] process_one_work+0x178/0x2bf
       [<ffffffff81036178>] worker_thread+0xce/0x152
       [<ffffffff81039a03>] kthread+0x95/0x9d
       [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

Chain exists of:
  &hdev->lock --> slock-AF_BLUETOOTH-BTPROTO_L2CAP --> (&(&conn->disc_work)->work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((&(&conn->disc_work)->work));
                               lock(slock-AF_BLUETOOTH-BTPROTO_L2CAP);
                               lock((&(&conn->disc_work)->work));
  lock(&hdev->lock);

 *** DEADLOCK ***

2 locks held by kworker/u:1/17:
 #0:  (hdev->name){.+.+.+}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
 #1:  ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf

stack backtrace:
Pid: 17, comm: kworker/u:1 Not tainted 3.2.0+ #1
Call Trace:
 [<ffffffff812e06c6>] print_circular_bug+0x1f8/0x209
 [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
 [<ffffffff81021ef2>] ? arch_local_irq_restore+0x6/0xd
 [<ffffffff81022bc7>] ? vprintk+0x3f9/0x41e
 [<ffffffff81057444>] lock_acquire+0x8a/0xa7
 [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
 [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
 [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
 [<ffffffff81190fd6>] ? __dynamic_pr_debug+0x6d/0x6f
 [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
 [<ffffffff8105320f>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
 [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
 [<ffffffff810357af>] process_one_work+0x178/0x2bf
 [<ffffffff81035751>] ? process_one_work+0x11a/0x2bf
 [<ffffffff81055af3>] ? lock_acquired+0x1d0/0x1df
 [<ffffffffa00410f3>] ? hci_acl_disconn+0x65/0x65 [bluetooth]
 [<ffffffff81036178>] worker_thread+0xce/0x152
 [<ffffffff810407ed>] ? finish_task_switch+0x45/0xc5
 [<ffffffff810360aa>] ? manage_workers.isra.25+0x16a/0x16a
 [<ffffffff81039a03>] kthread+0x95/0x9d
 [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
 [<ffffffff812e5db4>] ? retint_restore_args+0x13/0x13
 [<ffffffff8103996e>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff812e7750>] ? gs_change+0x13/0x13
Signed-off-by: NAndre Guedes <andre.guedes@openbossa.org>
Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@openbossa.org>
Reviewed-by: NUlisses Furquim <ulisses@profusion.mobi>
Acked-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

a51cd2be

Bluetooth: silence lockdep warning · b5a30dda

由 Octavian Purdila 提交于 1月 22, 2012

Since bluetooth uses multiple protocols types, to avoid lockdep
warnings, we need to use different lockdep classes (one for each
protocol type).

This is already done in bt_sock_create but it misses a couple of cases
when new connections are created. This patch corrects that to fix the
following warning:

<4>[ 1864.732366] =======================================================
<4>[ 1864.733030] [ INFO: possible circular locking dependency detected ]
<4>[ 1864.733544] 3.0.16-mid3-00007-gc9a0f62 #3
<4>[ 1864.733883] -------------------------------------------------------
<4>[ 1864.734408] t.android.btclc/4204 is trying to acquire lock:
<4>[ 1864.734869]  (rfcomm_mutex){+.+.+.}, at: [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.735541]
<4>[ 1864.735549] but task is already holding lock:
<4>[ 1864.736045]  (sk_lock-AF_BLUETOOTH){+.+.+.}, at: [<c1498bf7>] lock_sock+0xa/0xc
<4>[ 1864.736732]
<4>[ 1864.736740] which lock already depends on the new lock.
<4>[ 1864.736750]
<4>[ 1864.737428]
<4>[ 1864.737437] the existing dependency chain (in reverse order) is:
<4>[ 1864.738016]
<4>[ 1864.738023] -> #1 (sk_lock-AF_BLUETOOTH){+.+.+.}:
<4>[ 1864.738549]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.738977]        [<c13d35c1>] lock_sock_nested+0x58/0x68
<4>[ 1864.739411]        [<c1493c33>] l2cap_sock_sendmsg+0x3e/0x76
<4>[ 1864.739858]        [<c13d06c3>] __sock_sendmsg+0x50/0x59
<4>[ 1864.740279]        [<c13d0ea2>] sock_sendmsg+0x94/0xa8
<4>[ 1864.740687]        [<c13d0ede>] kernel_sendmsg+0x28/0x37
<4>[ 1864.741106]        [<c14969ca>] rfcomm_send_frame+0x30/0x38
<4>[ 1864.741542]        [<c1496a2a>] rfcomm_send_ua+0x58/0x5a
<4>[ 1864.741959]        [<c1498447>] rfcomm_run+0x441/0xb52
<4>[ 1864.742365]        [<c104f095>] kthread+0x63/0x68
<4>[ 1864.742742]        [<c14d5182>] kernel_thread_helper+0x6/0xd
<4>[ 1864.743187]
<4>[ 1864.743193] -> #0 (rfcomm_mutex){+.+.+.}:
<4>[ 1864.743667]        [<c1061ada>] __lock_acquire+0x988/0xc00
<4>[ 1864.744100]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.744519]        [<c14d2c70>] __mutex_lock_common+0x3b/0x33f
<4>[ 1864.744975]        [<c14d303e>] mutex_lock_nested+0x2d/0x36
<4>[ 1864.745412]        [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.745842]        [<c14990d9>] __rfcomm_sock_close+0x5f/0x6b
<4>[ 1864.746288]        [<c1499114>] rfcomm_sock_shutdown+0x2f/0x62
<4>[ 1864.746737]        [<c13d275d>] sys_socketcall+0x1db/0x422
<4>[ 1864.747165]        [<c14d42f0>] syscall_call+0x7/0xb
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Acked-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

b5a30dda

Bluetooth: Fix using an absolute timeout on hci_conn_put() · 33166063

由 Vinicius Costa Gomes 提交于 1月 04, 2012

queue_delayed_work() expects a relative time for when that work
should be scheduled.
Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@openbossa.org>
Acked-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

33166063