1. 03 July 2013, 1 commit
    • sync: don't block the flusher thread waiting on IO · 7747bd4b
      Authored by Dave Chinner
      When sync does its WB_SYNC_ALL writeback, it issues data IO and
      then immediately waits for IO completion. This is done in the
      context of the flusher thread, and hence completely ties up the
      flusher thread for the backing device until all the dirty inodes
      have been synced. On filesystems that are dirtying inodes constantly
      and quickly, this means the flusher thread can be tied up for
      minutes per sync call and hence badly affect system level write IO
      performance as the page cache cannot be cleaned quickly.
      
      We already have a wait loop for IO completion for sync(2), so cut
      this out of the flusher thread and delegate it to wait_sb_inodes().
      Hence we can do rapid IO submission, and then wait for it all to
      complete.
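      
      A hedged sketch of the resulting split in sync_inodes_sb() (simplified;
      the real work item carries more fields and the queueing path has
      additional checks):
      
      void sync_inodes_sb(struct super_block *sb)
      {
              DECLARE_COMPLETION_ONSTACK(done);
              struct wb_writeback_work work = {
                      .sb        = sb,
                      .sync_mode = WB_SYNC_ALL,
                      .done      = &done,
              };
      
              /* rapid IO submission: the flusher issues all the writeback */
              bdi_queue_work(sb->s_bdi, &work);
              wait_for_completion(&done);
      
              /* ...but the waiting now happens here, in the sync(2) caller */
              wait_sb_inodes(sb);
      }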
      
      Effect of sync on fsmark before the patch:
      
      FSUse%        Count         Size    Files/sec     App Overhead
      .....
           0       640000         4096      35154.6          1026984
           0       720000         4096      36740.3          1023844
           0       800000         4096      36184.6           916599
           0       880000         4096       1282.7          1054367
           0       960000         4096       3951.3           918773
           0      1040000         4096      40646.2           996448
           0      1120000         4096      43610.1           895647
           0      1200000         4096      40333.1           921048
      
      And a single sync pass took:
      
        real    0m52.407s
        user    0m0.000s
        sys     0m0.090s
      
      After the patch, there is no impact on fsmark results, and each
      individual sync(2) operation run concurrently with the same fsmark
      workload takes roughly 7s:
      
        real    0m6.930s
        user    0m0.000s
        sys     0m0.039s
      
      IOWs, sync is 7-8x faster on a busy filesystem and does not have an
      adverse impact on ongoing async data write operations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 July 2013, 1 commit
    • jbd2: invalidate handle if jbd2_journal_restart() fails · 41a5b913
      Authored by Theodore Ts'o
      If jbd2_journal_restart() fails, the handle will have been disconnected
      from the current transaction.  In this situation, the handle must not
      be used for any jbd2 function other than jbd2_journal_stop().
      Enforce this by treating a handle which has a NULL transaction
      pointer as an aborted handle, and issue a kernel warning if
      jbd2_journal_extend(), jbd2_journal_get_write_access(),
      jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
      
      This commit also fixes a bug where jbd2_journal_stop() would trip over
      a kernel jbd2 assertion check when trying to free an invalid handle.
      
      Also move the responsibility of setting current->journal_info to
      start_this_handle(), simplifying the three users of this function.
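      
      A hedged sketch of the invariant this enforces (the helper name below
      is illustrative; the patch expresses the check inside the jbd2 entry
      points themselves):
      
      /* A handle disconnected by a failed restart has no transaction. */
      static bool jbd2_handle_usable(handle_t *handle)
      {
              if (WARN_ON(handle->h_transaction == NULL))
                      return false;   /* only jbd2_journal_stop() is legal now */
              return !is_handle_aborted(handle);
      }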
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NYounger Liu <younger.liu@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
  3. 29 June 2013, 7 commits
  4. 27 June 2013, 3 commits
  5. 26 June 2013, 5 commits
    • mutex: Add w/w mutex slowpath debugging · 23010027
      Authored by Daniel Vetter
      Injects EDEADLK conditions at pseudo-random intervals, with
      exponential backoff up to UINT_MAX (to ensure that every lock
      operation still completes in a reasonable time).
      
      This way we can test the wound slowpath even for ww mutex users
      where contention is never expected, and the ww deadlock
      avoidance algorithm is only needed for correctness against
      malicious userspace. An example would be protecting kernel
      modesetting properties, which thanks to single-threaded X isn't
      really expected to contend, ever.
      
      I've looked into using the CONFIG_FAULT_INJECTION
      infrastructure, but decided against it for two reasons:
      
      - EDEADLK handling is mandatory for ww mutex users and should
        never affect the outcome of a syscall. This is in contrast to -ENOMEM
        injection. So fine configurability isn't required.
      
      - The fault injection framework only allows setting a simple
        probability for failure. Now the probability that a ww mutex acquire
        stage with N locks will never complete (due to too many injected
        EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
        operations for the completely uncontended case would be O(exp(N)).
        The per-acquire ctx exponential backoff solution chosen here only
        results in O(log N) overhead due to injection and so O(log N * N)
        lock operations. This way we can fail with high probability (and so
        have good test coverage even for fancy backoff and lock acquisition
        paths) without running into pathological cases.
      
      Note that EDEADLK will only ever be injected when we managed to
      acquire the lock. This prevents any behaviour changes for users
      which rely on the EALREADY semantics.
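      
      A hedged sketch of the injection site (field and function names
      approximate the patch; the exact backoff multiplier shown is
      illustrative):
      
      static int ww_mutex_deadlock_injection(struct ww_mutex *lock,
                                             struct ww_acquire_ctx *ctx)
      {
      #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
              /* Only called once the lock was actually acquired, so
               * EALREADY semantics are never disturbed. */
              if (ctx->deadlock_inject_countdown-- == 0) {
                      unsigned interval = ctx->deadlock_inject_interval;
      
                      /* exponential backoff, saturating at UINT_MAX */
                      if (interval > UINT_MAX / 4)
                              interval = UINT_MAX;
                      else
                              interval *= 2;
                      ctx->deadlock_inject_interval  = interval;
                      ctx->deadlock_inject_countdown = interval;
      
                      ww_mutex_unlock(lock);
                      return -EDEADLK;        /* fake a wound */
              }
      #endif
              return 0;
      }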
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mutex: Add support for wound/wait style locks · 040a0a37
      Authored by Maarten Lankhorst
      Wound/wait mutexes are used when multiple lock
      acquisitions of a similar type can be done in an arbitrary
      order. The deadlock handling used here is called wait/wound in
      the RDBMS literature: the older task waits until it can acquire
      the contended lock. The younger task needs to back off and drop
      all the locks it is currently holding, i.e. the younger task is
      wounded.
      
      For full documentation please read Documentation/ww-mutex-design.txt.
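      
      As a hedged usage sketch (the lock class and the two-lock helper below
      are made up for illustration; the canonical patterns live in the design
      document above), acquiring two ww mutexes in arbitrary order and backing
      off on -EDEADLK looks roughly like this:
      
      static DEFINE_WW_CLASS(demo_ww_class);  /* hypothetical lock class */
      
      static void lock_pair(struct ww_mutex *a, struct ww_mutex *b)
      {
              struct ww_acquire_ctx ctx;
              struct ww_mutex *contended, *other;
      
              ww_acquire_init(&ctx, &demo_ww_class);
      
              if (ww_mutex_lock(a, &ctx) == -EDEADLK) {
                      contended = a;
                      goto backoff;
              }
              if (ww_mutex_lock(b, &ctx) == -EDEADLK) {
                      ww_mutex_unlock(a);     /* drop everything we hold */
                      contended = b;
                      goto backoff;
              }
              ww_acquire_done(&ctx);          /* both locks held */
              return;
      
      backoff:
              /* We are the younger, wounded task: sleep until the contended
               * lock is free, then pick up the remaining lock. */
              ww_mutex_lock_slow(contended, &ctx);
              other = (contended == a) ? b : a;
              if (ww_mutex_lock(other, &ctx) == -EDEADLK) {
                      ww_mutex_unlock(contended);
                      contended = other;
                      goto backoff;
              }
              ww_acquire_done(&ctx);
      }
      
      The caller unlocks both mutexes and calls ww_acquire_fini(&ctx) when
      it is finished with the critical section.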
      
      References: https://lwn.net/Articles/548909/
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Rob Clark <robdclark@gmail.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • driver core: device.h: fix doc compilation warnings · bfd63cd2
      Authored by Michael Opdenacker
      This patch fixes the three warnings below from "make htmldocs"
      by adding descriptions for recently added structure members:
      
      DOCPROC Documentation/DocBook/device-drivers.xml
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:116): No description found for parameter 'lock_key'
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:723): No description found for parameter 'cma_area'
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:723): No description found for parameter 'iommu_group'
      
      Don't hesitate to propose better descriptions!
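      
      For context, kernel-doc expects one @member: line per structure member;
      a hedged sketch of the kind of addition involved (the wording of the
      descriptions here is mine, not the patch's):
      
      /**
       * struct device - (excerpt) the basic device structure
       * ...
       * @cma_area:    contiguous memory area reserved for DMA allocations
       * @iommu_group: IOMMU group the device belongs to
       */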
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gre: fix a possible skb leak · bd8a7036
      Authored by Eric Dumazet
      commit 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
      added a possible skb leak, because it frees only the head of the
      segment list when a skb_linearize() call fails.
      
      This patch adds a kfree_skb_list() helper to fix the bug.
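      
      The helper is conceptually a walk-and-free over the ->next chain; a
      sketch consistent with the description:
      
      void kfree_skb_list(struct sk_buff *segs)
      {
              while (segs) {
                      struct sk_buff *next = segs->next;
      
                      kfree_skb(segs);        /* frees this segment only */
                      segs = next;
              }
      }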
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • futex: Take hugepages into account when generating futex_key · 13d60f4b
      Authored by Zhang Yi
      The futex_keys of process shared futexes are generated from the page
      offset, the mapping host and the mapping index of the futex user space
      address. This should result in a unique identifier for each futex.
      
      However, this does not hold when futexes are located in different subpages
      of a hugepage. The reason is that the mapping index for all those
      futexes evaluates to the index of the base page of the hugetlbfs
      mapping. So a futex at offset 0 of the hugepage mapping and another
      one at offset PAGE_SIZE of the same hugepage mapping have identical
      futex_keys. This happens because the futex code blindly uses
      page->index.
      
      Steps to reproduce the bug (a condensed C sketch follows after step 5):
      
      1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
         and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
         mapping.
      
         The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
         PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
         their keys solely depend on the user space address.
      
      2. Lock mutex1 and mutex2
      
      3. Create thread1 and in the thread function lock mutex1, which
         results in thread1 blocking on the locked mutex1.
      
      4. Create thread2 and in the thread function lock mutex2, which
         results in thread2 blocking on the locked mutex2.
      
      5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
         still blocks on mutex2 because the futex_key points to mutex1.
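      
      A hedged userspace sketch of steps 1-2 (the hugetlbfs mount point, the
      file name, the 2 MiB hugepage size, and 4096 standing in for PAGE_SIZE
      are all assumptions; error handling, thread creation and the final
      unlock are elided to the trailing comment):
      
      #include <fcntl.h>
      #include <pthread.h>
      #include <sys/mman.h>
      #include <unistd.h>
      
      int main(void)
      {
              int fd = open("/mnt/huge/futex-test", O_CREAT | O_RDWR, 0600);
              char *map = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);    /* one 2 MiB hugepage */
              pthread_mutex_t *mutex1 = (pthread_mutex_t *)map;
              pthread_mutex_t *mutex2 = (pthread_mutex_t *)(map + 4096);
              pthread_mutexattr_t attr;
      
              /* PTHREAD_PROCESS_SHARED, so the kernel futex_key is used */
              pthread_mutexattr_init(&attr);
              pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
              pthread_mutex_init(mutex1, &attr);
              pthread_mutex_init(mutex2, &attr);
      
              pthread_mutex_lock(mutex1);
              pthread_mutex_lock(mutex2);
              /* ... spawn thread1/thread2 blocking on mutex1/mutex2, then
               * unlock mutex2: before the fix, thread2 never wakes because
               * both futexes hash to the same key. */
              return 0;
      }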
      
      To solve this issue we need to take the normal page index of the page
      which contains the futex into account if the futex is in a hugetlbfs
      mapping. In other words, we calculate the normal page mapping index of
      the subpage in the hugetlbfs mapping.
      
      Mappings which are not based on hugetlbfs are not affected and still
      use page->index.
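      
      A hedged sketch of the resulting change in get_futex_key()
      (basepage_index() is the hugetlbfs helper credited to Mel Gorman below;
      exact integration details may differ):
      
              /* was: key->shared.pgoff = page->index;  -- wrong for a
               * subpage of a hugepage, which reports the head's index */
              key->shared.pgoff = basepage_index(page);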
      
      Thanks to Mel Gorman who provided a patch for adding proper evaluation
      functions to the hugetlbfs code to avoid exposing hugetlbfs specific
      details to the futex code.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
      Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
      Reviewed-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 25 June 2013, 3 commits
  7. 24 June 2013, 2 commits
  8. 23 June 2013, 1 commit
    • perf: Drop sample rate when sampling is too slow · 14c63f17
      Authored by Dave Hansen
      This patch keeps track of how long perf's NMI handler is taking,
      and also calculates how many samples perf can take a second.  If
      the sample length times the expected max number of samples
      exceeds a configurable threshold, it drops the sample rate.
      
      This way, we don't have a runaway sampling process eating up the
      CPU.
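      
      A hedged sketch of the throttling check (the patch's real bookkeeping
      uses a decaying average of NMI time and a sysctl for the allowed CPU
      percentage; the names below are approximations):
      
      static u64 avg_sample_ns;       /* decaying average NMI handler time */
      
      static void perf_check_sample_rate(u64 sample_ns, u64 allowed_ns,
                                         int *max_samples_per_sec)
      {
              /* cheap exponential moving average of handler duration */
              avg_sample_ns = (avg_sample_ns * 3 + sample_ns) / 4;
      
              /* would a full second of samples blow the CPU budget? */
              if (avg_sample_ns * *max_samples_per_sec > allowed_ns) {
                      *max_samples_per_sec /= 2;      /* drop the rate */
                      printk_ratelimited(KERN_WARNING
                              "perf: lowering max sample rate to %d\n",
                              *max_samples_per_sec);
              }
      }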
      
      This patch can tend to drop the sample rate down to a level where
      perf doesn't work very well.  *BUT* the alternative is that my
      system hangs because it spends all of its time handling NMIs.
      
      I'll take a busted performance tool over an entire system that's
      busted and undebuggable any day.
      
      BTW, my suspicion is that there's still an underlying bug here.
      Using the HPET instead of the TSC is definitely a contributing
      factor, but I suspect there are some other things going on.
      But, I can't go dig down on a bug like that with my machine
      hanging all the time.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      [ Prettified it a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 20 June 2013, 3 commits
  10. 19 June 2013, 10 commits
    • FS-Cache: The retrieval remaining-pages counter needs to be atomic_t · 1bb4b7f9
      Authored by David Howells
      struct fscache_retrieval contains a count of the number of pages that still
      need some processing (n_pages).  This is decremented as the pages are
      processed.
      
      However, this needs to be atomic, as fscache_retrieval_complete() may (I
      think) just occasionally be called from cachefiles_read_backing_file()
      and cachefiles_read_copier() simultaneously.
      
      This happens when an fscache_read_or_alloc_pages() request containing a lot of
      pages (say a couple of hundred) is being processed.  The read on each backing
      page is dispatched individually because we need to insert a monitor into the
      waitqueue to catch when the read completes.  However, under low-memory
      conditions, we might be forced to wait in the allocator - and this gives the
      I/O on the backing page a chance to complete first.
      
      When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
      the workqueue without waiting for the operation to finish the initial I/O
      dispatch (we want to release any pages we can as soon as we can), thus both can
      end up running simultaneously and potentially attempting to partially complete
      the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
      the page cache).
      
      This was demonstrated by parallelling the non-atomic counter with an atomic
      counter and printing both of them when the assertion fails.  At this point, the
      atomic counter has reached zero, but the non-atomic counter has not.
      
      To fix this, make the counter an atomic_t.
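      
      A hedged sketch of what the atomic version of the decrement looks like
      (the completion call's exact signature is approximated here):
      
      static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
                                                    int n_pages)
      {
              /* safe even when both paths race to decrement */
              if (atomic_sub_return(n_pages, &op->n_pages) <= 0)
                      fscache_op_complete(&op->op);   /* signature approximated */
      }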
      
      This results in the following bug appearing
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:421!
      
      or
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:414!
      
      With a backtrace like the following:
      
      RIP: 0010:[<ffffffffa0211b1d>] fscache_put_operation+0x1ad/0x240 [fscache]
      Call Trace:
       [<ffffffffa0213185>] fscache_retrieval_work+0x55/0x270 [fscache]
       [<ffffffffa0213130>] ? fscache_retrieval_work+0x0/0x270 [fscache]
       [<ffffffff81090b10>] worker_thread+0x170/0x2a0
       [<ffffffff81096d10>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff810909a0>] ? worker_thread+0x0/0x2a0
       [<ffffffff81096966>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968d0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Simplify cookie retention for fscache_objects, fixing oops · 1362729b
      Authored by David Howells
      Simplify the way fscache cache objects retain their cookie.  The way I
      implemented the cookie storage handling made synchronisation a pain (ie. the
      object state machine can't rely on the cookie actually still being there).
      
      Instead of the object being detached from the cookie and the cookie being
      freed in __fscache_relinquish_cookie(), we defer both operations:
      
       (*) The detachment of the object from the list in the cookie now takes place
           in fscache_drop_object() and is thus governed by the object state machine
           (fscache_detach_from_cookie() has been removed).
      
       (*) The release of the cookie is now in fscache_object_destroy() - which is
           called by the cache backend just before it frees the object.
      
      This means that the fscache_cookie struct is now available to the cache all the
      way through from ->alloc_object() to ->drop_object() and ->put_object() -
      meaning that it's no longer necessary to take object->lock to guarantee access.
      
      However, __fscache_relinquish_cookie() doesn't wait for the object to go all
      the way through to destruction before letting the netfs proceed.  That would
      massively slow down the netfs.  Since __fscache_relinquish_cookie() leaves the
      cookie around, it must therefore break all attachments to the netfs - which
      includes ->def, ->netfs_data and any outstanding page read/writes.
      
      To handle this, struct fscache_cookie now has an n_active counter:
      
       (1) This starts off initialised to 1.
      
       (2) Any time the cache needs to get at the netfs data, it calls
           fscache_use_cookie() to increment it - if it is not zero.  If it was zero,
           then access is not permitted.
      
       (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
           to decrement it.  This does a wake-up on it if it reaches 0.
      
       (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
           reach 0.  The initialisation to 1 in step (1) ensures that we only get
           wake ups when we're trying to get rid of the cookie.
      
      This leaves __fscache_relinquish_cookie() a lot simpler.
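      
      A hedged sketch of the counter protocol just described (helper names
      follow the text; the wake-up mechanism is simplified to a plain
      waitqueue, which is an assumption):
      
      static inline bool fscache_use_cookie(struct fscache_cookie *cookie)
      {
              /* step (2): no access once the count has hit zero */
              return atomic_inc_not_zero(&cookie->n_active) != 0;
      }
      
      static inline void fscache_unuse_cookie(struct fscache_cookie *cookie)
      {
              /* step (3): wake the relinquisher when we reach zero */
              if (atomic_dec_and_test(&cookie->n_active))
                      wake_up(&cookie->n_active_wq);  /* hypothetical queue */
      }
      
      /* step (4), in __fscache_relinquish_cookie():
       *      atomic_dec(&cookie->n_active);
       *      wait_event(cookie->n_active_wq,
       *                 atomic_read(&cookie->n_active) == 0);
       */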
      
      
      ***
      This fixes a problem in the current code whereby if fscache_invalidate() is
      followed sufficiently quickly by fscache_relinquish_cookie() then it is
      possible for __fscache_relinquish_cookie() to have detached the cookie from the
      object and cleared the pointer before a thread is dispatched to process the
      invalidation state in the object state machine.
      
      Since the pending write clearance was deferred to the invalidation state to
      make it asynchronous, we need to either wait in relinquishment for the stores
      tree to be cleared in the invalidation state or we need to handle the clearance
      in relinquishment.
      
      Further, if the relinquishment code does clear the tree, then the invalidation
      state needs to make the clearance contingent on still having the cookie to hand
      (since that's where the tree is rooted) and we have to prevent the cookie from
      disappearing for the duration.
      
      This can lead to an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
      ...
      RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
      ...
      CR2: 000000000000000c ...
      ...
      Process kslowd002 (...)
      ....
      Call Trace:
       [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
       [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
       [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
       [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff8110e233>] slow_work_execute+0x233/0x310
       [<ffffffff8110e515>] slow_work_thread+0x205/0x360
       [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
       [<ffffffff81096936>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      
      The parameter to fscache_invalidate_writes() was object->cookie which is NULL.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Fix object state machine to have separate work and wait states · caaef690
      Authored by David Howells
      Fix object state machine to have separate work and wait states as that makes
      it easier to envision.
      
      There are now three kinds of state (sketched in code after this list):
      
       (1) Work state.  This is an execution state.  No event processing is performed
           by a work state.  The function attached to a work state returns a pointer
           indicating the next state to which the OSM should transition.  Returning
           NO_TRANSIT repeats the current state, but goes back to the scheduler
           first.
      
       (2) Wait state.  This is an event processing state.  No execution is
           performed by a wait state.  Wait states are just tables of "if event X
           occurs, clear it and transition to state Y".  The dispatcher returns to
           the scheduler if none of the events in which the wait state has an
           interest are currently pending.
      
       (3) Out-of-band state.  This is a special work state.  Transitions to normal
           states can be overridden when an unexpected event occurs (eg. I/O error).
           Instead the dispatcher disables and clears the OOB event and transits to
           the specified work state.  This then acts as an ordinary work state,
           though object->state points to the overridden destination.  Returning
           NO_TRANSIT resumes the overridden transition.
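      
      A hedged structural sketch of the three kinds (field names approximate;
      the real patch also carries short names and per-state event masks):
      
      struct fscache_state {
              char name[24];
              /* Work state: execute, then return the next state, or
               * NO_TRANSIT to stay put and revisit the scheduler. */
              const struct fscache_state *(*work)(struct fscache_object *object,
                                                  int event);
              /* Wait state: "if event X occurs, clear it and transition
               * to state Y"; empty for pure work states. */
              struct {
                      int event;
                      const struct fscache_state *to;
              } transit_table[4];
      };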
      
      In addition, the states have names in their definitions, so there's no need for
      tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
      that is automatic for work states.
      
      Since the states are now separate structs rather than values in an enum, it's
      not possible to use comparisons other than (non-)equality between them, so use
      some object->flags to indicate what phase an object is in.
      
      The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
      (EV_KILL).  An object flag now carries the information about retirement.
      
      Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
      into a KILL_OBJECT state and additional states have been added for handling
      waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).
      
      A state has also been added for synchronising with parent object initialisation
      (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Wrap checks on object state · 493f7bc1
      Authored by David Howells
      Wrap checks on object state (mostly outside of fs/fscache/object.c) with
      inline functions so that the mechanism can be replaced.
      
      Some of the state checks within object.c are left as-is as they will be
      replaced.
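      
      A hedged sketch of the kind of wrapper meant (the helper and the state
      names here are illustrative, not the patch's exact set):
      
      static inline bool fscache_object_is_active(struct fscache_object *object)
      {
              /* callers no longer compare object->state directly, so the
               * representation behind the check can be changed later */
              return object->state >= FSCACHE_OBJECT_AVAILABLE &&
                     object->state < FSCACHE_OBJECT_DYING;
      }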
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Uninline fscache_object_init() · 610be24e
      Authored by David Howells
      Uninline fscache_object_init() so as not to expose some of the FS-Cache
      internals to the cache backend.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-By: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • perf/x86/intel: Support Haswell/v4 LBR format · 135c5612
      Authored by Andi Kleen
      Haswell has two additional LBR 'from' flags for TSX, in_tx and
      abort_tx, implemented as a new "v4" version of the LBR format.
      
      Handle those and adjust the sign-extension code so that addresses
      still extend correctly. The flags are exported in the LBR record
      alongside the existing misprediction flag.
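      
      A hedged sketch of the v4 'from' decoding (bit positions follow the
      format description; the helper name and variables are mine):
      
      #define LBR_FROM_FLAG_MISPRED   (1ULL << 63)
      #define LBR_FROM_FLAG_IN_TX     (1ULL << 62)
      #define LBR_FROM_FLAG_ABORT     (1ULL << 61)
      
      /* decode a v4 LBR 'from' entry; returns the sign-extended address */
      static u64 lbr_from_v4(u64 from, bool *mispred, bool *in_tx, bool *abort)
      {
              *mispred = !!(from & LBR_FROM_FLAG_MISPRED);
              *in_tx   = !!(from & LBR_FROM_FLAG_IN_TX);
              *abort   = !!(from & LBR_FROM_FLAG_ABORT);
              /* strip all three flag bits, then sign-extend the address */
              return (u64)(((s64)from << 3) >> 3);
      }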
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Andi Kleen <ak@linux.jf.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Rename sched.c as sched/core.c in comments and Documentation · 0a0fca9d
      Authored by Viresh Kumar
      Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c a
      long time back, and the comments/Documentation never got updated.
      
      I figured it out when I was going through sched-domains.txt and so thought of
      fixing it globally.
      
      I haven't cross-checked whether the stuff these files reference in
      sched/core.c is still present and unchanged, as that wasn't the motive
      behind this patch.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • tracing/context-tracking: Add preempt_schedule_context() for tracing · 29bb9e5a
      Authored by Steven Rostedt
      Dave Jones hit the following bug report:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #1 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       2 locks held by cc1/63645:
        #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
        #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
      
       CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
       Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
        0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
        ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
        0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
       Call Trace:
        [<ffffffff816ae383>] dump_stack+0x19/0x1b
        [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
        [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
        [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
        [<ffffffff8108dffc>] update_curr+0xec/0x240
        [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
        [<ffffffff816b3a71>] __schedule+0x161/0x9b0
        [<ffffffff816b4721>] preempt_schedule+0x51/0x80
        [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
        [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
        [<ffffffff816be280>] ftrace_call+0x5/0x2f
        [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
       ------------[ cut here ]------------
      
      What happened was that the function tracer traced the schedule_user()
      code that tells RCU that the system is coming back from userspace and
      that the CPU should be added back to RCU monitoring.
      
      Because the function tracer wraps its work in
      preempt_disable/enable_notrace() calls, preempt_enable_notrace()
      checks the NEED_RESCHED flag. If it is set, then preempt_schedule()
      is called. But this happens before the user_exit() function can
      inform the kernel that the CPU is no longer in user mode and needs
      to be accounted for by RCU.
      
      The fix is to create a new preempt_schedule_context() that checks if
      the kernel is still in user mode and, if so, switches it to kernel mode
      before calling schedule. It also switches back to user mode when coming
      back from schedule, if need be.
      
      The only user of this currently is the preempt_enable_notrace(), which is
      only used by the tracing subsystem.
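      
      A condensed, hedged sketch of the new helper (the real function also
      disables preemption around the context switch-over to avoid tracing
      recursion):
      
      asmlinkage void __sched notrace preempt_schedule_context(void)
      {
              enum ctx_state prev_ctx;
      
              if (likely(!preemptible()))
                      return;
      
              prev_ctx = exception_enter();   /* leave "user" state, tell RCU */
              preempt_schedule();
              exception_exit(prev_ctx);       /* restore the previous state */
      }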
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Reduce stack usage of x86_schedule_events() · 43b45780
      Authored by Andrew Hunter
      x86_schedule_events() caches event constraints on the stack during
      scheduling.  Given the number of possible events, this is 512 bytes of
      stack; since it can be invoked under schedule() under god-knows-what,
      this is causing stack blowouts.
      
      Trade some space usage for stack safety: add a place to cache the
      constraint pointer to struct perf_event.  For 8 bytes per event (1% of
      its size) we can save the giant stack frame.
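      
      A hedged sketch of the trade (per the description; the exact field
      placement in the real patch may differ):
      
      struct perf_event {
              /* ... existing fields ... */
              struct event_constraint *constraint;    /* cached by the x86
                                                       * scheduler: 8 bytes here
                                                       * instead of 512 bytes of
                                                       * constraint structs on
                                                       * the stack */
      };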
      
      This shouldn't change any aspect of scheduling whatsoever and while in
      theory the locality's a tiny bit worse, I doubt we'll see any
      performance impact either.
      
      Tested: `perf stat whatever` does not blow up and produces
      results that aren't hugely obviously wrong.  I'm not sure how to run
      particularly good tests of perf code, but this should not produce any
      functional change whatsoever.
      Signed-off-by: Andrew Hunter <ahh@google.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Add const qualifier to perf_pmu_register's 'name' arg · 03d8e80b
      Authored by Mischa Jonker
      This allows us to use pdev->name for registering a PMU device.
      IMO the name is not supposed to be changed anyway.
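      
      The implied signature change, with a hedged usage line (the my_pmu
      variable and the -1 dynamic-type argument are illustrative):
      
      int perf_pmu_register(struct pmu *pmu, const char *name, int type);
      
      /* a driver can now hand over its platform device's name directly */
      ret = perf_pmu_register(&my_pmu, pdev->name, -1 /* dynamic type id */);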
      Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1370339148-5566-1-git-send-email-mjonker@synopsys.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 18 June 2013, 4 commits