Commits · 5e1f54201cb481f40a04bc47e1bc8c093a189e23 · openeuler / raspberrypi-kernel

10 Dec, 2012 4 commits

inet_diag: validate port comparison byte code to prevent unsafe reads · 5e1f5420

Committed by Neal Cardwell on Dec 09, 2012

Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.

Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e1f5420
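The length check described in 5e1f5420 can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: `struct bc_op` is a hypothetical stand-in for `inet_diag_bc_op`, and `port_comparison_fits()` is an invented helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct inet_diag_bc_op. */
struct bc_op {
	unsigned char  code;
	unsigned char  yes;
	unsigned short no;
};

/*
 * A port comparison occupies two consecutive ops: the comparison itself
 * and a follow-on op that carries the port. `len` is the number of
 * bytecode bytes remaining, counted from the comparison op.
 */
bool port_comparison_fits(size_t len)
{
	return len >= 2 * sizeof(struct bc_op);
}
```

A validator would call such a helper before ever touching `op[1]`, and reject the bytecode when it returns false.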

inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() · f67caec9

Committed by Neal Cardwell on Dec 08, 2012

Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).

This change is needed for two reasons:

(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.

(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f67caec9
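The family-aware matching described in f67caec9 might look roughly like the sketch below. It assumes addresses are plain big-endian byte arrays (4 bytes for IPv4, 16 for IPv6); the `FAM_*` constants and both function names are hypothetical and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum family { FAM_INET, FAM_INET6 };   /* hypothetical, not the AF_* values */

/* Compare the first `bits` bits of two big-endian byte strings. */
bool prefix_equal(const uint8_t *a, const uint8_t *b, unsigned bits)
{
	unsigned bytes = bits / 8, rem = bits % 8;

	if (memcmp(a, b, bytes) != 0)
		return false;
	if (rem) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem));
		return ((a[bytes] ^ b[bytes]) & mask) == 0;
	}
	return true;
}

/* entry: connection address; cond: user-supplied prefix of `bits` bits. */
bool host_matches(enum family entry_fam, const uint8_t *entry,
		  enum family cond_fam, const uint8_t *cond, unsigned bits)
{
	static const uint8_t v4mapped[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
	unsigned max_bits = (cond_fam == FAM_INET) ? 32 : 128;

	if (bits > max_bits)
		return false;           /* prefix longer than the address */
	if (cond_fam == entry_fam)
		return prefix_equal(entry, cond, bits);

	/* Only cross-family case kept: IPv4 prefix vs. ::ffff:a.b.c.d entry. */
	if (cond_fam == FAM_INET && entry_fam == FAM_INET6 &&
	    memcmp(entry, v4mapped, sizeof(v4mapped)) == 0)
		return prefix_equal(entry + 12, cond, bits);

	return false;                   /* different families never match */
}
```

Refusing the mismatched-family case up front is what keeps a 128-bit prefix from being compared against a 4-byte address.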

inet_diag: validate byte code to prevent oops in inet_diag_bc_run() · 405c0059

Committed by Neal Cardwell on Dec 08, 2012

Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.

Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

405c0059
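The three checks named in 405c0059 (condition header present, address present for the given family, prefix length within the address) can be sketched as below. The struct layout, the `FAM_*` values and `hostcond_valid()` are assumptions made for illustration; they only loosely mirror `inet_diag_hostcond`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { FAM_UNSPEC = 0, FAM_INET = 1, FAM_INET6 = 2 };   /* hypothetical values */

/* Fixed-size header that precedes the variable-length address. */
struct hostcond_hdr {
	unsigned char family;
	unsigned char prefix_len;
	int           port;
};

/* `avail` is the number of bytecode bytes left for this condition. */
bool hostcond_valid(const struct hostcond_hdr *hdr, size_t avail)
{
	size_t addr_len;

	if (avail < sizeof(*hdr))
		return false;              /* header itself is truncated */

	switch (hdr->family) {
	case FAM_UNSPEC: addr_len = 0;  break;
	case FAM_INET:   addr_len = 4;  break;
	case FAM_INET6:  addr_len = 16; break;
	default:         return false;     /* unknown address family */
	}

	if (avail < sizeof(*hdr) + addr_len)
		return false;              /* address is truncated */

	/* A prefix may not be longer than the address it applies to. */
	return hdr->prefix_len <= 8 * addr_len;
}
```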

inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state · 1c95df85

Committed by Neal Cardwell on Dec 08, 2012

Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c95df85

09 Dec, 2012 2 commits

mm: vmscan: fix inappropriate zone congestion clearing · ed23ec4f

Committed by Johannes Weiner on Dec 06, 2012

commit c702418f ("mm: vmscan: do not keep kswapd looping forever due
to individual uncompactable zones") removed zone watermark checks from
the compaction code in kswapd but left in the zone congestion clearing,
which now happens unconditionally on higher order reclaim.

This messes up the reclaim throttling logic for zones with
dirty/writeback pages, where zones should only lose their congestion
status when their watermarks have been restored.

Remove the clearing from the zone compaction section entirely.  The
preliminary zone check and the reclaim loop in kswapd will clear it if
the zone is considered balanced.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ed23ec4f

vfs: fix O_DIRECT read past end of block device · 684c9aae

Committed by Linus Torvalds on Dec 07, 2012

The direct-IO write path already had the i_size checks in mm/filemap.c,
but it turns out the read path did not, and removing the block size
checks in fs/block_dev.c (commit bbec0270: "blkdev_max_block: make
private to fs/buffer.c") removed the magic "shrink IO to past the end of
the device" code there.

Fix it by truncating the IO to the size of the block device, like the
write path already does.

NOTE! I suspect the write path would be *much* better off doing it this
way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
mm/filemap.c code is extremely hard to follow, and has various
conditionals on the target being a block device (ie the flag passed in
to 'generic_write_checks()', along with a conditional update of the
inode timestamp etc).

It is also quite possible that we should treat this whole block device
size as a "s_maxbytes" issue, and try to make the logic even more
generic.  However, in the meantime this is the fairly minimal targeted
fix.

Noted by Milan Broz thanks to a regression test for the cryptsetup
reencrypt tool.
Reported-and-tested-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

684c9aae
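Truncating a read to the size of the block device, as 684c9aae describes, comes down to a small piece of arithmetic. The helper below is a hypothetical user-space sketch (`clamp_read_len()` is an invented name), not the code in fs/block_dev.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many bytes of a read at offset `pos` with length `len`
 * fall inside a device of `size` bytes.
 */
uint64_t clamp_read_len(uint64_t pos, uint64_t len, uint64_t size)
{
	if (pos >= size)
		return 0;               /* entirely past the end: nothing to read */
	if (len > size - pos)
		return size - pos;      /* straddles the end: shorten the request */
	return len;
}
```

Comparing against `size - pos` rather than computing `pos + len` avoids an overflow for very large requests.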

08 Dec, 2012 4 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1b3c393c

Committed by Linus Torvalds on Dec 07, 2012

Pull networking fixes from David Miller:
 "Two stragglers:

   1) The new code that adds new flushing semantics to GRO can cause SKB
      pointer list corruption, manage the lists differently to avoid the
      OOPS.  Fix from Eric Dumazet.

   2) When TCP fast open does a retransmit of data in a SYN-ACK or
      similar, we update retransmit state that we shouldn't, triggering a
      WARN_ON later.  Fix from Yuchung Cheng."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission

1b3c393c

net: gro: fix possible panic in skb_gro_receive() · c3c7c254

Committed by Eric Dumazet on Dec 06, 2012

commit 2e71a6f8 (net: gro: selective flush of packets) added
a bug for skbs using frag_list. This part of the GRO stack is rarely
used, as it needs skb not using a page fragment for their skb->head.

Most drivers do use a page fragment, but some of them use GFP_KERNEL
allocations for the initial fill of their RX ring buffer.

napi_gro_flush() overwrites skb->prev, which was used for these skbs to
point to the last skb in frag_list.

Fix this using a separate field in struct napi_gro_cb to point to the
last fragment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3c7c254

tcp: bug fix Fast Open client retransmission · 93b174ad

Committed by Yuchung Cheng on Dec 06, 2012

If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and gets 3 dupacks.

Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retransmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93b174ad
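The reason 93b174ad needs a loop is that data sized for the cached MSS may not fit into a single segment once the server announces a smaller MSS. A rough sketch of that splitting, with a hypothetical helper name and no claim to match the TCP stack's code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split `unacked` bytes into segments of at most `mss` bytes, recording
 * each segment length in `seg_lens`. Returns the number of segments, or
 * 0 if `mss` is 0 or the output array is too small.
 */
size_t split_for_retransmit(size_t unacked, size_t mss,
			    size_t *seg_lens, size_t max_segs)
{
	size_t n = 0;

	if (mss == 0)
		return 0;
	while (unacked > 0) {
		size_t chunk = unacked < mss ? unacked : mss;

		if (n == max_segs)
			return 0;
		seg_lens[n++] = chunk;
		unacked -= chunk;
	}
	return n;
}
```

With 1400 bytes of SYN-data and a negotiated MSS of 536, a single retransmission would leave data behind; the loop yields three segments instead.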

Merge tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 1afa4717

Committed by Linus Torvalds on Dec 07, 2012

Pull MMC fixes from Chris Ball:
 "Two small regression fixes:

   - sdhci-s3c: Fix runtime PM regression against 3.7-rc1
   - sh-mmcif: Fix oops against 3.6"

* tag 'mmc-fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect

1afa4717

07 Dec, 2012 10 commits

tmpfs: fix shared mempolicy leak · 18a2f371

Committed by Mel Gorman on Dec 05, 2012

This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad0 ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad0,
alloc_pages_vma() has kept refcount in balance, so now no problem.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18a2f371

mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones · c702418f

Committed by Johannes Weiner on Dec 04, 2012

When a zone meets its high watermark and is compactable in case of
higher order allocations, it contributes to the percentage of the node's
memory that is considered balanced.

This requirement, that a node be only partially balanced, came about
when kswapd was desperately trying to balance tiny zones when all bigger
zones in the node had plenty of free memory.  Arguably, the same should
apply to compaction: if a significant part of the node is balanced
enough to run compaction, do not get hung up on that tiny zone that
might never get in shape.

When the compaction logic in kswapd is reached, we know that at least
25% of the node's memory is balanced properly for compaction (see
zone_balanced and pgdat_balanced).  Remove the individual zone checks
that restart the kswapd cycle.

Otherwise, we may observe more endless looping in kswapd where the
compaction code loops back to reclaim because of a single zone and
reclaim does nothing because the node is considered balanced overall.

See for example

  https://bugzilla.redhat.com/show_bug.cgi?id=866988

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-and-tested-by: Thorsten Leemhuis <fedora@leemhuis.info>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: John Ellson <john.ellson@comcast.net>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c702418f

mm: compaction: validate pfn range passed to isolate_freepages_block · 60177d31

Committed by Mel Gorman on Dec 06, 2012

Commit 0bf380bc ("mm: compaction: check pfn_valid when entering a
new MAX_ORDER_NR_PAGES block during isolation for migration") added a
check for pfn_valid() when isolating pages for migration as the scanner
does not necessarily start pageblock-aligned.

Since commit c89511ab ("mm: compaction: Restart compaction from near
where it left off"), the free scanner has the same problem. This patch
makes sure that the pfn range passed to isolate_freepages_block() is
within the same block so that pfn_valid() checks are unnecessary.

In answer to Henrik's wondering why others have not reported this:
reproducing this requires a large enough hole with the right alignment to
have compaction walk into a PFN range with no memmap. Size and
alignment depend on the memory model - 4M for FLATMEM and 128M for
SPARSEMEM on x86. It needs a "lucky" machine.
Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

60177d31
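Keeping the pfn range handed to the free scanner inside a single block, as 60177d31 describes, amounts to rounding up to the next block boundary and clamping to the zone end. The sketch below assumes a power-of-two block size; `PAGEBLOCK_NR_PAGES` is a made-up constant and both helper names are hypothetical.

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES 1024UL   /* hypothetical; must be a power of two */

/* First pfn of the pageblock after the one containing `pfn`. */
unsigned long next_pageblock_start(unsigned long pfn)
{
	return (pfn + PAGEBLOCK_NR_PAGES) & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * End (exclusive) of a scan that starts at `pfn` and may neither leave
 * its pageblock nor run past `zone_end_pfn`.
 */
unsigned long scan_end_pfn(unsigned long pfn, unsigned long zone_end_pfn)
{
	unsigned long end = next_pageblock_start(pfn);

	return end < zone_end_pfn ? end : zone_end_pfn;
}
```

A scan that starts mid-block therefore stops at the block boundary instead of walking into a neighbouring range that may have no memmap.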

mmc: sh-mmcif: avoid oops on spurious interrupts (second try) · 91ab252a

Committed by Guennadi Liakhovetski on Aug 22, 2012

On some systems, e.g., kzm9g, MMCIF interfaces can produce spurious
interrupts without any active request. To prevent the Oops that results
in such cases, don't dereference the mmc request pointer until we make
sure that we are indeed processing such a request.
Reported-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Chris Ball <cjb@laptop.org>

91ab252a
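The guard described in 91ab252a follows a common pattern: check that a request is actually in flight before dereferencing its pointer. The following is a generic sketch with invented `demo_*` types, not the sh-mmcif driver code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_request { int error; };
struct demo_host    { struct demo_request *mrq; };   /* NULL when idle */

enum demo_irq { DEMO_IRQ_SPURIOUS, DEMO_IRQ_DONE };

enum demo_irq demo_irq_handler(struct demo_host *host, int status)
{
	/* No request in flight: nothing to complete, so do not touch mrq. */
	if (host->mrq == NULL)
		return DEMO_IRQ_SPURIOUS;

	host->mrq->error = status;
	host->mrq = NULL;           /* request finished */
	return DEMO_IRQ_DONE;
}
```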

Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts" · 6984f3c3

Committed by Chris Ball on Dec 03, 2012

This reverts commit 8464dd52, which was a misapplied debugging
version of the patch, not the final patch itself.
Signed-off-by: Chris Ball <cjb@laptop.org>
Cc: stable@vger.kernel.org

6984f3c3

mmc: sdhci-s3c: fix missing clock for gpio card-detect · fe007c02

Committed by Heiko Stübner on Nov 18, 2012

2abeb5c5 ("Add clk_(enable/disable) in runtime suspend/resume")
added the capability to stop the clocks when the device is runtime
suspended, but forgot to handle the case of the card-detect using
an external gpio.

Therefore in the case that runtime-pm is enabled, start the io-clock
when a card is inserted and stop it again once it is removed.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Chris Ball <cjb@laptop.org>

fe007c02

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 04c5decd

Committed by Linus Torvalds on Dec 06, 2012

Pull MIPS fixes from Ralf Baechle:
 "These are the fixes for the N32 syscall bugs found by Al, an
  extraneous break that broke detection for R3000 and R3081 processors,
  an endless loop processing signals for kernel task (x86 received the
  same fix a while ago) and a fix for transparent huge page which took
  ages to track down because it was so hard to come up with a workable
  test case."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix endless loop when processing signals for kernel tasks
  MIPS: R3000/R3081: Fix CPU detection.
  MIPS: N32: Fix signalfd4 syscall entry point
  MIPS: N32: Fix preadv(2) and pwritev(2) entry points.
  MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags()

04c5decd

Merge branch 'more-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · d91fa971

Committed by Linus Torvalds on Dec 06, 2012

Pull build fix from Rusty Russell:
 "Tim Gardner <tim.gardner@canonical.com> writes:
  > It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c.
  > The object file cannot be built until $(obj)/oid_registry_data.c has been
  > generated.
  >
  > A periodic and hard to reproduce parallel build failure is due to
  > this incorrect lib/Makefile dependency. The compile error is completely
  > disingenuous.
  >
  >   GEN     lib/oid_registry_data.c
  > Compiling 49 OIDs
  >   CC      lib/oid_registry.o
  > gcc: error: lib/oid_registry.c: No such file or directory
  > gcc: fatal error: no input files
  > compilation terminated.
  > make[3]: *** [lib/oid_registry.o] Error 4

  I can't reproduce it either.  It's completely weird; nothing ever
  removes lib/oid_registry.c, so either gcc is giving the wrong message
  or it's a weird fs with a very odd race.

  But your version is definitely more correct than the previous one,
  so..."

* 'more-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  lib/Makefile: Fix oid_registry build dependency

d91fa971

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 54d1ae49

Committed by Linus Torvalds on Dec 06, 2012

Pull module signing fixes from Rusty Russell:
 "David gave me these a month ago, during my git workflow churn :("

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  ASN.1: Fix an indefinite length skip error
  MODSIGN: Don't use enum-type bitfields in module signature info block

54d1ae49

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cfd1f032

Committed by Linus Torvalds on Dec 06, 2012

Pull watchdog fix from Thomas Gleixner:
 "Trivial CPU hotplug regression fix for the watchdog code"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  watchdog: Fix CPU hotplug regression

cfd1f032

06 Dec, 2012 5 commits

lib/Makefile: Fix oid_registry build dependency · 527897cc

Committed by Tim Gardner on Dec 04, 2012

It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c.
The object file cannot be built until $(obj)/oid_registry_data.c has been
generated.

A periodic and hard to reproduce parallel build failure is due to
this incorrect lib/Makefile dependency. The compile error is completely
disingenuous.

  GEN     lib/oid_registry_data.c
Compiling 49 OIDs
  CC      lib/oid_registry.o
gcc: error: lib/oid_registry.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
make[3]: *** [lib/oid_registry.o] Error 4

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

527897cc

MIPS: Fix endless loop when processing signals for kernel tasks · c90e6fbb

Committed by Dmitry Adamushko on Apr 05, 2012

The problem occurs [1] when a kernel-mode task returns from a system
call with a pending signal.

A real-life scenario is a child of 'khelper' returning from a failed
kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
kernel_execve() fails due to a pending SIGKILL, which is the result of
"kill -9 -1" (at least, busybox's init does it upon reboot).

The loop is as follows:

* syscall_exit_work:
 - work_pending:            // start_of_the_loop
 - work_notifysig:
   - do_notify_resume()
     - do_signal()
       - if (!user_mode(regs)) return;
 - resume_userspace         // TIF_SIGPENDING is still set
 - work_pending             // so we call work_pending => goto
                            // start_of_the_loop

More information can be found in another LKML thread:
http://www.serverphorums.com/read.php?12,457826

[1] The problem was also reproduced on !CONFIG_VM86 x86, and the
following fix was accepted.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=29a2e2836ff9ea65a603c89df217f4198973a74f

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3571/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

c90e6fbb

MIPS: R3000/R3081: Fix CPU detection. · 2d33976f

Committed by Ralf Baechle on Jun 08, 2012

Broken since e05ea74fc56f347f872ef9946d27c53e8bf20864 (lmo), respectively
cea7e2df (kernel.org) [MIPS: Sort out CPU
type to name translation.]  These CPUs are no longer very popular to say
the least ...
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Murphy McCauley <murphy.mccauley@gmail.com>

2d33976f

MIPS: N32: Fix signalfd4 syscall entry point · 97daa768

Committed by Ralf Baechle on Dec 04, 2012

This needs to use the compat entry point or it's going to fail on big
endian systems.

Noticed by Al Viro.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

97daa768

vfs: clear to the end of the buffer on partial buffer reads · 27d7c2a0

Committed by Dan Carpenter on Dec 05, 2012

READ is zero so the "rw & READ" test is always false.  The intended test
was "((rw & RW_MASK) == READ)".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27d7c2a0
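The bug fixed in 27d7c2a0 is easy to reproduce outside the kernel: masking with a constant that is zero can never yield a non-zero result. The flag values below are assumptions chosen to match the commit message (READ is zero, the direction lives in one mask bit); `OTHER_FLAG` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen to mirror the commit message. */
#define READ        0u
#define WRITE       1u
#define RW_MASK     1u
#define OTHER_FLAG  0x10u   /* stands for any unrelated modifier bit */

/* Buggy test: (rw & 0) is 0 for every rw, so this is always false. */
bool is_read_buggy(unsigned rw)
{
	return (rw & READ) != 0;
}

/* Intended test: isolate the direction bit, then compare with READ. */
bool is_read_fixed(unsigned rw)
{
	return (rw & RW_MASK) == READ;
}
```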

05 Dec, 2012 9 commits

ASN.1: Fix an indefinite length skip error · f3537f91

Committed by David Howells on Oct 22, 2012

Fix an error in asn1_find_indefinite_length() whereby small definite length
elements of size 0x7f are incorrectly classified as non-small. Without this
fix, an error will be given as the length of the length will be perceived as
being very much greater than the maximum supported size.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

f3537f91
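The boundary that f3537f91 concerns comes from the BER length encoding: a first length octet of 0x00 to 0x7f is itself the length (short form), 0x80 marks an indefinite length, and larger values introduce the long form. A minimal classifier, with invented names and no relation to the layout of `asn1_find_indefinite_length()`:

```c
#include <assert.h>
#include <stdint.h>

enum len_form { LEN_SHORT, LEN_INDEFINITE, LEN_LONG };

/* Classify the first length octet of a BER-encoded element. */
enum len_form classify_length_octet(uint8_t first)
{
	if (first <= 0x7f)              /* 0x7f is still a short-form length */
		return LEN_SHORT;
	if (first == 0x80)
		return LEN_INDEFINITE;
	return LEN_LONG;                /* low 7 bits give the count of length octets */
}
```

An off-by-one comparison such as `first < 0x7f` would push 0x7f into the long-form branch, where its low bits are read as "127 length octets follow".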

MODSIGN: Don't use enum-type bitfields in module signature info block · 12e130b0

Committed by David Howells on Oct 22, 2012

Don't use enum-type bitfields in the module signature info block as we can't be
certain how the compiler will handle them.  As I understand it, it is arch
dependent, and it is possible for the compiler to rearrange them based on
endianness and to insert a byte of padding to pad the three enums out to four
bytes.

Instead use u8 fields for these, which the compiler should emit in the right
order without padding.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

12e130b0
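The layout argument in 12e130b0 can be checked at compile time. The struct below is a hypothetical info block made only of fixed-width byte fields plus explicit padding; the field names are illustrative and the sketch does not claim to reproduce the kernel's module signature structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte per selector, then explicit padding so that the 32-bit
 * length field is naturally aligned. */
struct sig_info {
	uint8_t  algo;
	uint8_t  hash;
	uint8_t  id_type;
	uint8_t  signer_len;
	uint8_t  key_id_len;
	uint8_t  pad[3];
	uint32_t sig_len;
};

/* Compile-time checks of the layout a parser would rely on (C11). */
_Static_assert(offsetof(struct sig_info, algo) == 0, "algo at byte 0");
_Static_assert(offsetof(struct sig_info, hash) == 1, "hash at byte 1");
_Static_assert(offsetof(struct sig_info, id_type) == 2, "id_type at byte 2");
_Static_assert(offsetof(struct sig_info, sig_len) == 8, "sig_len at byte 8");
_Static_assert(sizeof(struct sig_info) == 12, "no hidden padding");
```

With enum-typed bitfields the equivalent assertions cannot be written portably, since bitfield ordering and padding are implementation-defined.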

watchdog: Fix CPU hotplug regression · 8d451690

Committed by Thomas Gleixner on Dec 04, 2012

Norbert reported:
"3.7-rc6 booted with nmi_watchdog=0 fails to suspend to RAM or
 offline CPUs. It's reproducible with a KVM guest and physical
 system."

The reason is that commit bcd951cf (watchdog: Use hotplug thread
infrastructure) failed to take this into account. So the cpu offline
code gets stuck in the teardown function because it accesses
uninitialized data structures.

Add a check for watchdog_enabled into that path to cure the issue.
Reported-and-tested-by: Norbert Warmuth <nwarmuth@t-online.de>
Tested-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1211231033230.2701@ionos
Link: http://bugs.launchpad.net/bugs/1079534
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

8d451690

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · df2fc246

Committed by Linus Torvalds on Dec 04, 2012

Pull module fixes from Rusty Russell:
 "Module signing build fixes for blackfin and metag"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modsign: add symbol prefix to certificate list
  linux/kernel.h: define SYMBOL_PREFIX

df2fc246

Merge tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi · 70dcc535

Committed by Linus Torvalds on Dec 04, 2012

Pull UBI changes from Artem Bityutskiy:
 "Fixes for 2 brown-paperbag bugs introduced this merge window by the
  fastmap code:

   1.  The UBI background thread got stuck when a bit-flip happened
       because free LEBs were not removed from the "free" tree when we
       started using them.
   2.  I/O debugging checks did not work because we called a sleeping
       function in atomic context."

* tag 'upstream-3.7-rc9' of git://git.infradead.org/linux-ubi:
  UBI: dont call ubi_self_check_all_ff() in __wl_get_peb()
  UBI: remove PEB from free tree in get_peb_for_wl()

70dcc535

Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · ca50496e

Committed by Linus Torvalds on Dec 04, 2012

Pull workqueue fixes from Tejun Heo:
 "So, safe fixes my ass.

  Commit 8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue
  timer on 0 delay") had the side-effect of performing delayed_work
  sanity checks even when @delay is 0, which should be fine for any sane
  use cases.

  Unfortunately, megaraid was being overly ingenious.  It seemingly
  wanted to use cancel_delayed_work_sync() before cancel_work_sync() was
  introduced, but didn't want to waste the space for full delayed_work
  as it was only going to use 0 @delay.  So, it only allocated space for
  struct work_struct and then cast it to struct delayed_work and passed
  it into delayed_work functions - truly awesome engineering tradeoff to
  save some bytes.

  Xiaotian fixed it by making megaraid allocate full delayed_work for
  now.  It should be converted to use work_struct and cancel_work_sync()
  but I think we better do that after 3.7.

  I added another commit to change BUG_ON()s in __queue_delayed_work()
  to WARN_ON_ONCE()s so that the kernel doesn't crash even if there are
  more such abuses."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s
  megaraid: fix BUG_ON() from incorrect use of delayed work

ca50496e

MIPS: N32: Fix preadv(2) and pwritev(2) entry points. · d5563715

Committed by Ralf Baechle on Dec 04, 2012

By using the native syscall entry point the kernel was also expecting
64-bit iovec structures.

This is broken since ddd9e91b [preadv/
pwritev: MIPS: Add preadv(2) and pwritev(2) syscalls.] which originally
added these two syscalls.  I walked through piles of code, including
libc and couldn't find anything that would have worked around the issue
so this changes the API to what it should always have been.

Noticed and patch suggested by Al Viro.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

d5563715
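The mismatch behind d5563715 is one of element size: an N32 caller passes vector elements with 32-bit pointers and lengths, while a native 64-bit entry point reads 64-bit ones. The sketch below uses invented struct names to show the two layouts and the kind of widening a compat-style entry point has to perform; it is not the kernel's compat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct iovec_n32    { uint32_t base; uint32_t len; };   /* 32-bit pointer and size */
struct iovec_native { uint64_t base; uint64_t len; };   /* 64-bit pointer and size */

/* Widen `n` caller-supplied 32-bit elements into the native layout. */
void widen_iovec(const struct iovec_n32 *in, struct iovec_native *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i].base = in[i].base;
		out[i].len  = in[i].len;
	}
}
```

Reading an array of 8-byte elements as if they were 16-byte elements would pair one element's base and length into a single bogus pointer, which is why the native entry point could not work for N32 callers.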

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 609e3ff3

Committed by Linus Torvalds on Dec 04, 2012

Pull sparc fixes from David Miller:
 "Two small fixes for Sparc, nobody uses sparc, so these are low risk :-)

   1) Piggyback is too picky about the symbol types that _start and _end
      have in the final kernel image, and it thus breaks with newer
      binutils.  Future proof by getting rid of the symbol type checks.

   2) exit_group() should kill register windows on sparc64 the same way
      we do for plain exit().  Thanks to Al Viro for spotting this."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix piggyback with newer binutils.
  sparc64: exit_group should kill register windows just like plain exit.

609e3ff3

vfs: avoid "attempt to access beyond end of device" warnings · 57302e0d

Committed by Linus Torvalds on Dec 04, 2012

The block device access simplification that avoided accessing the (racy)
block size information (commit bbec0270: "blkdev_max_block: make
private to fs/buffer.c") no longer checks the maximum block size in the
block mapping path.

That was _almost_ as simple as just removing the code entirely, because
the readers and writers all check the size of the device anyway, so
under normal circumstances it "just worked".

However, the block size may be such that the end of the device may
straddle one single buffer_head.  At which point we may still want to
access the end of the device, but the buffer we use to access it
partially extends past the end.

The 'bd_set_size()' function intentionally sets the block size to avoid
this, but mounting the device - or setting the block size by hand to
some other value - can modify that block size.

So instead, teach 'submit_bh()' about the special case of the buffer
head straddling the end of the device, and turning such an access into a
smaller IO access, avoiding the problem.

This, btw, also means that unlike before, we can now access the whole
device regardless of device block size setting.  So now, even if the
device size is only 512-byte aligned, we can read and write even the
last sector even when having a much bigger block size for accessing the
rest of the device.

So with this, we could now get rid of the 'bd_set_size()' block size
code entirely - resulting in faster IO for the common case - but that
would be a separate patch.
Reported-and-tested-by: Romain Francoise <romain@orebokech.com>
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

57302e0d
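The special case described in 57302e0d, a buffer that starts on the device but extends past its end, can be sketched as a pure function. `bytes_before_eod()` is a hypothetical name, sectors are assumed to be 512 bytes, and returning 0 for a buffer that lies wholly past the end is a choice made for this sketch rather than the behaviour of `submit_bh()`.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/*
 * Given a buffer of `size` bytes starting at `sector` on a device with
 * `nr_sectors` 512-byte sectors, return the number of bytes that lie on
 * the device. A result smaller than `size` means the buffer straddles
 * the end and the IO should be shortened to that length.
 */
uint32_t bytes_before_eod(uint64_t sector, uint32_t size, uint64_t nr_sectors)
{
	uint64_t left;

	if (sector >= nr_sectors)
		return 0;
	left = nr_sectors - sector;
	/* Enough sectors remain to hold the whole buffer. */
	if (left >= (((uint64_t)size + 511) >> SECTOR_SHIFT))
		return size;
	return (uint32_t)(left << SECTOR_SHIFT);
}
```

For a 9-sector device accessed with 4096-byte blocks, the block starting at sector 8 is shortened to 512 bytes, so the last sector remains reachable.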

04 Dec, 2012 6 commits

workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s · fc4b514f

Committed by Tejun Heo on Dec 04, 2012

8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue timer on
0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
megaraid - it allocated work_struct, cast it to delayed_work and
then passed that into queue_delayed_work().

Previously, this was okay because 0 @delay short-circuited to
queue_work() before doing anything with delayed_work.  8852aac2
moved 0 @delay test into __queue_delayed_work() after sanity check on
delayed_work making megaraid trigger BUG_ON().

Although megaraid is already fixed by c1d390d8 ("megaraid: fix
BUG_ON() from incorrect use of delayed work"), this patch converts
BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
abusers, if there are more, trigger warning but don't crash the
machine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Xiaotian Feng <xtfeng@gmail.com>

fc4b514f

MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags() · ac53c4fc

Committed by David Daney on Dec 03, 2012

Problem:

1) Huge page mapping of anonymous memory is initially invalid.  Will be
   faulted in by copy-on-write mechanism.

2) Userspace attempts store at the end of the huge mapping.

3) TLB Refill exception handler fills the TLB with a normal (4K sized)
   invalid page at the end of the huge mapping virtual address range.

4) Userspace restarted, and re-attempts the store at the end of the
   huge mapping.

5) Page from #3 is invalid, we get a fault and go to the hugepage
   fault handler.  This tries to map a huge page and calls
   huge_ptep_set_access_flags() to install the mapping.

6) We just call the generic ptep_set_access_flags() to set up the page
   tables, but the flush there assumes a normal (4K sized) page and
   only tries to flush the first part of the huge page virtual address
   out of the TLB.  Since the existing entry from step #3 doesn't
   conflict, nothing is flushed.

7) We attempt to load the mapping into the TLB, but because it
   conflicts with the entry from step #3, we get a Machine Check
   exception.

The fix: Flush the entire range covered by the huge page in
huge_ptep_set_access_flags(), and remove the optimization in
local_flush_tlb_range() so that the flush actually does the correct
thing.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Hillf Danton <dhillf@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4661/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit dd617f258cc39d36be26afee9912624a2d23112c)

ac53c4fc

megaraid: fix BUG_ON() from incorrect use of delayed work · c1d390d8

Committed by Xiaotian Feng on Dec 04, 2012

megaraid uses INIT_WORK to declare a hotplug_work, but casts the
hotplug_work from work_struct to delayed_work and calls
schedule_delayed_work() on it.  This is very dangerous, as the rest of
the delayed_work might be kernel memory allocated by others.

With commit 8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue
timer on 0 delay"), schedule_delayed_work() will check dwork->timer
before queue_work even when @delay is 0.  This causes megaraid code to
hit the BUG_ON() in workqueue code.  Change megaraid code to use
delayed work.
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org

c1d390d8
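Why the cast described in c1d390d8 is dangerous can be seen from the object sizes alone. The `demo_*` types below are simplified, hypothetical stand-ins for `work_struct`, the timer and `delayed_work`; only the containment relationship is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct demo_work    { void (*func)(struct demo_work *); };
struct demo_timer   { unsigned long expires; };
struct demo_delayed { struct demo_work work; struct demo_timer timer; };

/* Unsafe pattern: only a demo_work is allocated, then cast wider. */
struct unsafe_event { struct demo_work hotplug_work; };

/* Safe pattern: allocate the full delayed-work object. */
struct safe_event   { struct demo_delayed hotplug_work; };

/* Bytes a delayed-work helper may touch beyond a bare demo_work. */
size_t bytes_past_work(void)
{
	return sizeof(struct demo_delayed) - sizeof(struct demo_work);
}
```

Any helper that inspects the timer through a pointer obtained by casting `unsafe_event.hotplug_work` reads those extra bytes from whatever happens to follow the allocation.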

UBI: dont call ubi_self_check_all_ff() in __wl_get_peb() · 894aef21

Committed by Richard Weinberger on Dec 03, 2012

As ubi_self_check_all_ff() might sleep we are not allowed
to call it from atomic context.
For now we call it only from ubi_wl_get_peb().
There are some code paths where it would also make sense,
but these paths are currently atomic and only enabled
when fastmap is used.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

894aef21

UBI: remove PEB from free tree in get_peb_for_wl() · ed4b7021

Committed by Richard Weinberger on Dec 03, 2012

If UBI is built without fastmap, get_peb_for_wl() has to
remove the PEB manually from the free tree.
Otherwise the requested PEB lives in two trees.
Reported-by: Zach Sadecki <zsadecki@itwatchdogs.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

ed4b7021

sparc: Fix piggyback with newer binutils. · 0032c857

Committed by David S. Miller on Dec 03, 2012

Newer versions of binutils mark '_end' as 'B' instead of 'A' for
whatever reason.

To be honest, the piggyback code doesn't actually care what kind
of symbol _start and _end are, it just wants to find them and
record the address.

So remove the type from the match strings.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

0032c857
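Matching a symbol by name while ignoring its type letter, as 0032c857 describes, could look like the sketch below. It assumes input lines in the common "address type name" form; `symbol_address()` is a hypothetical helper and is not taken from the piggyback tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "address type name" line and report the address if the
 * symbol name equals `want`, whatever the type letter is.
 */
bool symbol_address(const char *line, const char *want, unsigned long *addr)
{
	unsigned long value;
	char name[64];

	/* %*c reads and discards the single type character. */
	if (sscanf(line, "%lx %*c %63s", &value, name) != 2)
		return false;
	if (strcmp(name, want) != 0)
		return false;
	*addr = value;
	return true;
}
```

Because the type character is discarded, a toolchain that switches `_end` from 'A' to 'B' no longer affects the lookup.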