提交 · f4f62301c6f42127b7462274abfcbc278f84d59a · openanolis / cloud-kernel

27 8月, 2008 9 次提交

fs_enet: Fix SCC Ethernet on CPM2, and crash in fs_enet_rx_napi() · f4f62301

由 Heiko Schocher 提交于 8月 25, 2008

Signed-off-by: NHeiko Schocher <hs@denx.de>
Signed-off-by: NVitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

f4f62301

igb: fix setting the number of tx queues · 34a20e89

由 Alexander Duyck 提交于 8月 26, 2008

The real_num_tx_queues was not being set when in MSI-X only mode. This patch
corrects that path so all interrupt types are correctly configured.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

34a20e89

igb: ethtool -d reads EICR which is incorrect as it is read on clear · fe59de38

由 Alexander Duyck 提交于 8月 26, 2008

Ethtool -d is reading the EICR and ICR registers which is currently
clearing these registers and masking off interrupts.  To prevent this we
read the EICS and ICS equivilents as they can be read without clearing or
masking.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

fe59de38

igb: force all queues to interrupt once every 2 seconds · 7a6ea550

由 Alexander Duyck 提交于 8月 26, 2008

Set the EICS bit for each of the RX queues at least once every 2 seconds to
prevent the rx queues from stalling.
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

7a6ea550

r8169: balance pci_map / pci_unmap pair · a866bbf6

由 Francois Romieu 提交于 8月 26, 2008

The leak hurts with swiotlb and jumbo frames.

Fix http://bugzilla.kernel.org/show_bug.cgi?id=9468.

Heavily hinted by Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>.
Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com>
Tested-by: NAlistair John Strachan <alistair@devzero.co.uk>
Tested-by: NTimothy J Fontaine <tjfontaine@atxconsulting.com>
Cc: Edward Hsu <edward_hsu@realtek.com.tw>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

a866bbf6

myri10ge: update version string to 1.4.3-1.358 · 0623807a

由 Brice Goglin 提交于 8月 26, 2008

Update myri10ge version string to 1.4.3-1.358.
Signed-off-by: NBrice Goglin <brice@myri.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

0623807a

ixgbe: fix vlan filtering · 3d01625a

由 Alexander Duyck 提交于 8月 26, 2008

VLAN filtering is broken, due to reading the incorrect register for
the VLAN filtering settings.  Fixed by reading/writing the correct
register.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

3d01625a

Blackfin EMAC Driver: the BF526 also supports the MAC, · 736783b8

由 Mike Frysinger 提交于 8月 27, 2008

so update things accordingly
Signed-off-by: NMike Frysinger <vapier.adi@gmail.com>
Signed-off-by: NBryan Wu <cooloney@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

736783b8

J

Merge branch 'for-2.6.27' of git://git.marvell.com/mv643xx_eth into upstream-fixes · 373a5e02
由 Jeff Garzik 提交于 8月 27, 2008

373a5e02

26 8月, 2008 12 次提交

bnx2x: Version update · c2d42545

由 Eilon Greenstein 提交于 8月 25, 2008

Version update
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2d42545

bnx2x: Multi Queue · 231fd58a

由 Yitchak Gertner 提交于 8月 25, 2008

The multi queue support is still disabled by default for the bnx2x
(needs some more testing and validation), but there are 2 obvious bug in
it which are fixed in this patch
Signed-off-by: NYitchak Gertner <gertner@broadcom.com>
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

231fd58a

bnx2x: NAPI and interrupts enable/disable · 65abd74d

由 Yitchak Gertner 提交于 8月 25, 2008

Fixing the order of enabling and disabling NAPI and the interrupts
Signed-off-by: NYitchak Gertner <gertner@broadcom.com>
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65abd74d

bnx2x: NIC load failure cleanup · d1014634

由 Yitchak Gertner 提交于 8月 25, 2008

Load failures were not handled correctly
Signed-off-by: NYitchak Gertner <gertner@broadcom.com>
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1014634

bnx2x: Initialization structure · 3cdf1db7

由 Yitchak Gertner 提交于 8月 25, 2008

The TPA initialization is part of the FW internal memory initialization
and so it is moved to the appropriate function
Signed-off-by: NYitchak Gertner <gertner@broadcom.com>
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3cdf1db7

bnx2x: HW lock timeout · 46230476

由 Eilon Greenstein 提交于 8月 25, 2008

Increasing the lock timeout to 5 seconds instead of 1 second to minimize
the chance of failures due to timeout
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46230476

bnx2x: Minimize lock time · 76b190c5

由 Eilon Greenstein 提交于 8月 25, 2008

After iSCSI boot, the HW lock should only protect the flag so only the
first function will reset the chip and not then entire chip reset
process
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76b190c5

bnx2x: Fan failure mechanism on additional design · 7add905f

由 Eilon Greenstein 提交于 8月 25, 2008

The A1021G board is also using the fan failure mechanism in the same way
the A1022G board does
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7add905f

bnx2x: Rx work check · 2772f903

由 Eilon Greenstein 提交于 8月 25, 2008

The has Rx work check was wrong: when the FW was at the end of the page,
the driver was already at the beginning of the next page. Since the
check only validated that both driver and FW are pointing to the same
place, it concluded that there is still work to be done. This caused
some serious issues including long latency results on ping-pong test and
lockups while unloading the driver in that condition.
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2772f903

ipv6: sysctl fixes · ce3113ec

由 Al Viro 提交于 8月 25, 2008

Braino: net.ipv6 in ipv6 skeleton has no business in rotable
class
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce3113ec

ipv4: sysctl fixes · 2f4520d3

由 Al Viro 提交于 8月 25, 2008

net.ipv4.neigh should be a part of skeleton to avoid ordering problems
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f4520d3

sctp: add verification checks to SCTP_AUTH_KEY option · 30c2235c

由 Vlad Yasevich 提交于 8月 25, 2008

The structure used for SCTP_AUTH_KEY option contains a
length that needs to be verfied to prevent buffer overflow
conditions.  Spoted by Eugene Teo <eteo@redhat.com>.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30c2235c

24 8月, 2008 6 次提交

L
mv643xx_eth: bump version to 1.3 · c4560318
由 Lennert Buytenhek 提交于 8月 24, 2008
```
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
```
c4560318

mv643xx_eth: enforce multiple-of-8-bytes receive buffer size restriction · abe78717

由 Lennert Buytenhek 提交于 8月 24, 2008

The mv643xx_eth hardware ignores the lower three bits of the buffer
size field in receive descriptors, causing the reception of full-sized
packets to fail at some MTUs. Fix this by rounding the size of
allocated receive buffers up to a multiple of eight bytes.

While we are at it, add a bit of extra space to each receive buffer so
that we can handle multiple vlan tags on ingress.
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>

abe78717

mv643xx_eth: fix NULL pointer dereference in rxq_process() · 9e1f3772

由 Lennert Buytenhek 提交于 8月 24, 2008

When we are low on memory, the assumption that every descriptor in the
receive ring will have an skbuff associated with it does not hold.

rxq_process() was assuming that if the receive descriptor it is working
on is not owned by the hardware, it can safely be processed and handed
to the networking stack. But a descriptor in the receive ring not being
owned by the hardware can also happen when we are low on memory and did
not manage to refill the receive ring fully.

This patch changes rxq_process()'s bailout condition from "the first
receive descriptor to be processed is owned by the hardware" to "the
first receive descriptor to be processed is owned by the hardware OR
the number of valid receive descriptors in the ring is zero".
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>

9e1f3772

mv643xx_eth: fix inconsistent lock semantics · 8e0b1bf6

由 Lennert Buytenhek 提交于 8月 24, 2008

Nicolas Pitre noted that mv643xx_eth_poll was incorrectly using
non-IRQ-safe locks while checking whether to wake up the netdevice's
transmit queue. Convert the locking to *_irq() variants, since we
are running from softirq context where interrupts are enabled.
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>

8e0b1bf6

mv643xx_eth: fix double add_timer() on the receive oom timer · 92c70f27

由 Lennert Buytenhek 提交于 8月 24, 2008

Commit 12e4ab79 ("mv643xx_eth: be
more agressive about RX refill") changed the condition for the receive
out-of-memory timer to be scheduled from "the receive ring is empty"
to "the receive ring is not full".

This can lead to a situation where the receive out-of-memory timer is
pending because a previous rxq_refill() didn't manage to refill the
receive ring entirely as a result of being out of memory, and
rxq_refill() is then called again as a side effect of a packet receive
interrupt, and that rxq_refill() call then again does not succeed to
refill the entire receive ring with fresh empty skbuffs because we are
still out of memory, and then tries to call add_timer() on the already
scheduled out-of-memory timer.

This patch fixes this issue by changing the add_timer() call in
rxq_refill() to a mod_timer() call.  If the OOM timer was not already
scheduled, this will behave as before, whereas if it was already
scheduled, this patch will push back its firing time a bit, which is
safe because we've (unsuccessfully) attempted to refill the receive
ring just before we do this.
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>

92c70f27

mv643xx_eth: fix NAPI 'rotting packet' issue · 819ddcaf

由 Lennert Buytenhek 提交于 8月 24, 2008

When a receive interrupt occurs, mv643xx_eth would first process the
receive descriptors and then ACK the receive interrupt, instead of the
other way round.

This would leave a small race window between processing the last
receive descriptor and clearing the receive interrupt status in which
a new packet could come in, which would then 'rot' in the receive
ring until the next receive interrupt would come in.

Fix this by ACKing (clearing) the receive interrupt condition before
processing the receive descriptors.
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>

819ddcaf

23 8月, 2008 2 次提交

ipv6: protocol for address routes · f410a1fb

由 Stephen Hemminger 提交于 8月 23, 2008

This fixes a problem spotted with zebra, but not sure if it is
necessary a kernel problem.  With IPV6 when an address is added to an
interface, Zebra creates a duplicate RIB entry, one as a connected
route, and other as a kernel route.

When an address is added to an interface the RTN_NEWADDR message
causes Zebra to create a connected route. In IPV4 when an address is
added to an interface a RTN_NEWROUTE message is set to user space with
the protocol RTPROT_KERNEL. Zebra ignores these messages, because it
already has the connected route.

The problem is that route created in IPV6 has route protocol ==
RTPROT_BOOT.  Was this a design decision or a bug? This fixes it. Same
patch applies to both net-2.6 and stable.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f410a1fb

icmp: icmp_sk() should not use smp_processor_id() in preemptible code · fdc0bde9

由 Denis V. Lunev 提交于 8月 23, 2008

Pass namespace into icmp_xmit_lock, obtain socket inside and return
it as a result for caller.

Thanks Alexey Dobryan for this report:

Steps to reproduce:

	CONFIG_PREEMPT=y
	CONFIG_DEBUG_PREEMPT=y
	tracepath <something>

BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
caller is icmp_sk+0x15/0x30
Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1

Call Trace:
 [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0
 [<ffffffff80409405>] icmp_sk+0x15/0x30
 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110
 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40
 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900
 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290
 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60
 [<ffffffff803e6650>] ? dst_output+0x0/0x10
 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60
 [<ffffffff803e92e3>] ip_output+0xa3/0xf0
 [<ffffffff803e68d0>] ip_local_out+0x20/0x30
 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400
 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0
 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0
 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80
 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110
 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010
 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450
 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590
 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80
 [<ffffffff803ba50a>] sys_sendto+0xea/0x120
 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80
 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0
 [<ffffffff8024e0c6>] ? up_read+0x26/0x30
 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b

icmp6_sk() is similar.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fdc0bde9

22 8月, 2008 1 次提交

pkt_sched: Fix qdisc list locking · f6e0b239

由 Jarek Poplawski 提交于 8月 22, 2008

Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
without rtnl_lock(), adding and deleting from a qdisc list needs
additional locking. This patch adds global spinlock qdisc_list_lock
and wrapper functions for modifying the list. It is considered as a
temporary solution until hfsc_dequeue(), netem_dequeue() and
tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.

With feedback from Herbert Xu and David S. Miller.
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6e0b239

21 8月, 2008 10 次提交

pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race · 2540e051

由 Jarek Poplawski 提交于 8月 21, 2008

dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog()
or other timer calling netif_schedule() after dev_queue_deactivate().
We prevent this checking aliveness before scheduling the timer. Since
during deactivation the root qdisc is available only as qdisc_sleeping
additional accessor qdisc_root_sleeping() is created.

With feedback from Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2540e051

sctp: fix potential panics in the SCTP-AUTH API. · 5e739d17

由 Vlad Yasevich 提交于 8月 21, 2008

All of the SCTP-AUTH socket options could cause a panic
if the extension is disabled and the API is envoked.

Additionally, there were some additional assumptions that
certain pointers would always be valid which may not
always be the case.

This patch hardens the API and address all of the crash
scenarios.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e739d17

L

Linux v2.6.27-rc4 · 6a55617e
由 Linus Torvalds 提交于 8月 20, 2008

6a55617e

cramfs: fix named-pipe handling · 82d63fc9

由 Al Viro 提交于 8月 20, 2008

After commit a97c9bf3 (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
on cramfs does not work properly.

It seems the commit make all named-pipe on cramfs share their inode
(and named-pipe buffer).

Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.
Reported-by: NAtsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>		[2.6.14 and later]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

82d63fc9

fbdefio: add set_page_dirty handler to deferred IO FB · d847471d

由 Ian Campbell 提交于 8月 20, 2008

Fixes kernel BUG at lib/radix-tree.c:473.

Previously the handler was incidentally provided by tmpfs but this was
removed with:

  commit 14fcc23f
  Author: Hugh Dickins <hugh@veritas.com>
  Date:   Mon Jul 28 15:46:19 2008 -0700

    tmpfs: fix kernel BUG in shmem_delete_inode

relying on this behaviour was incorrect in any case and the BUG also
appeared when the device node was on an ext3 filesystem.

v2: override a_ops at open() time rather than mmap() time to minimise
races per AKPM's concerns.
Signed-off-by: NIan Campbell <ijc@hellion.org.uk>
Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kel Modderman <kel@otaku42.de>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: <stable@kernel.org> [14fcc23f is in 2.6.25.14 and 2.6.26.1]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d847471d

rtc: rtc-ds1374: fix 'no irq' case handling · b42f9317

由 Anton Vorontsov 提交于 8月 20, 2008

On a PowerPC board with ds1374 RTC I'm getting this error while RTC tries
to probe:

rtc-ds1374 0-0068: unable to request IRQ

This happens because I2C probing code (drivers/of/of_i2c.c) is specifying
IRQ0 for 'no irq' case, which is correct.

The driver handles this incorrectly, though. This patch fixes it.
Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: NPeter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b42f9317

mm: xip/ext2 fix block allocation race · 14bac5ac

由 Nick Piggin 提交于 8月 20, 2008

XIP can call into get_xip_mem concurrently with the same file,offset with
create=1.  This usually maps down to get_block, which expects the page
lock to prevent such a situation.  This causes ext2 to explode for one
reason or another.

Serialise those calls for the moment.  For common usages today, I suspect
get_xip_mem rarely is called to create new blocks.  In future as XIP
technologies evolve we might need to look at which operations require
scalability, and rework the locking to suit.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Acked-by: NCarsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

14bac5ac

mm: xip fix fault vs sparse page invalidate race · 538f8ea6

由 Nick Piggin 提交于 8月 20, 2008

XIP has a race between sparse pages being inserted into page tables, and
sparse pages being zapped when its time to put a non-sparse page in.

What can happen is that a process can be left with a dangling sparse page
in a MAP_SHARED mapping, while the rest of the world sees the non-sparse
version.  Ie.  data corruption.

Guard these operations with a seqlock, making fault-in-sparse-pages the
slowpath, and try-to-unmap-sparse-pages the fastpath.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Acked-by: NCarsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

538f8ea6

mm: dirty page tracking race fix · 479db0bf

由 Nick Piggin 提交于 8月 20, 2008

There is a race with dirty page accounting where a page may not properly
be accounted for.

clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.

page_mkclean walks the rmaps for that page, and for each one it cleans and
write protects the pte if it was dirty.  It uses page_check_address to
find the pte.  That function has a shortcut to avoid the ptl if the pte is
not present.  Unfortunately, the pte can be switched to not-present then
back to present by other code while holding the page table lock -- this
should not be a signal for page_mkclean to ignore that pte, because it may
be dirty.

For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value.  There may also be other code in
core mm or in arch which do similar things.

The consequence of the bug is loss of data integrity due to msync, and
loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
also be unreliable (depending on the exact XIP locking scheme), which can
lead to data corruption.

Fix this by having an option to always take ptl to check the pte in
page_check_address.

It's possible to retain this optimization for page_referenced and
try_to_unmap.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

479db0bf

fix setpriority(PRIO_PGRP) thread iterator breakage · 2d70b68d

由 Ken Chen 提交于 8月 20, 2008

When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting.  The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process.  Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.
Signed-off-by: NKen Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d70b68d

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功