- 04 2月, 2015 8 次提交
-
-
由 NeilBrown 提交于
There is currently no locking around calls to the 'congested' bdi function. If called at an awkward time while an array is being converted from one level (or personality) to another, there is a tiny chance of running code in an unreferenced module etc. So add a 'congested' function to the md_personality operations structure, and call it with appropriate locking from a central 'mddev_congested'. When the array personality is changing the array will be 'suspended' so no IO is processed. If mddev_congested detects this, it simply reports that the array is congested, which is a safe guess. As mddev_suspend calls synchronize_rcu(), mddev_congested can avoid races by included the whole call inside an rcu_read_lock() region. This require that the congested functions for all subordinate devices can be run under rcu_lock. Fortunately this is the case. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This lock is used for (slightly) more than helping with writing superblocks, and it will soon be extended further. So the name is inappropriate. Also, the _irq variant hasn't been needed since 2.6.37 as it is never taking from interrupt or bh context. So: -rename write_lock to lock -document what it protects -remove _irq ... except in md_flush_request() as there is no wait_event_lock() (with no _irq). This can be cleaned up after appropriate changes to wait.h. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
That last condition is unclear and over cautious. There are two related issues here. If a partial write is destined for a missing device, then either RMW or RCW can work. We must read all the available block. Only then can the missing blocks be calculated, and then the parity update performed. If RMW is not an option, then there is a complication even without partial writes. If we would need to read a missing device to perform the reconstruction, then we must first read every block so the missing device data can be computed. This is the case for RAID6 (Which currently does not support RMW) and for times when we don't trust the parity (after a crash) and so are in the process of resyncing it. So make these two cases more clear and separate, and perform the relevant tests more thoroughly. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Both the last two cases are only relevant if something has failed and something needs to be written (but not over-written), and if it is OK to pre-read blocks at this point. So factor out those tests and explain them. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Some of the conditions in need_this_block have very straight forward motivation. Separate those out and document them. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
fetch_block() has a very large and hard to read 'if' condition. Separate it into its own function so that it can be made more readable. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jes Sorensen 提交于
67f45548 introduced a call to md_wakeup_thread() when adding to the delayed_list. However the md thread is woken up unconditionally just below. Remove the unnecessary wakeup call. Signed-off-by: NJes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jan Beulich 提交于
Just like for AVX2 (which simply needs an #if -> #ifdef conversion), SSSE3 assembler support should be checked for before using it. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 02 2月, 2015 2 次提交
-
-
由 NeilBrown 提交于
commit 8eb23b9f sched: Debug nested sleeps causes false-positive warnings in RAID5 code. This annotation removes them and adds a comment explaining why there is no real problem. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a non-page-aligned write is destined for a device which is missing/faulty, we can deadlock. As the target device is missing, a read-modify-write cycle is not possible. As the write is not for a full-page, a recontruct-write cycle is not possible. This should be handled by logic in fetch_block() which notices there is a non-R5_OVERWRITE write to a missing device, and so loads all blocks. However since commit 67f45548, that code requires STRIPE_PREREAD_ACTIVE before it will active, and those circumstances never set STRIPE_PREREAD_ACTIVE. So: in handle_stripe_dirtying, if neither rmw or rcw was possible, set STRIPE_DELAYED, which will cause STRIPE_PREREAD_ACTIVE be set after a suitable delay. Fixes: 67f45548 Cc: stable@vger.kernel.org (v3.16+) Reported-by: NMikulas Patocka <mpatocka@redhat.com> Tested-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 28 1月, 2015 5 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net由 Linus Torvalds 提交于
Pull networking fixes from David Miller: 1) Don't OOPS on socket AIO, from Christoph Hellwig. 2) Scheduled scans should be aborted upon RFKILL, from Emmanuel Grumbach. 3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish. 4) Fix RCU locking across copy_to_user() in bpf code, from Alexei Starovoitov. 5) Lots of crash, memory leak, short TX packet et al bug fixes in sh_eth from Ben Hutchings. 6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel Borkmann. 7) Fix return value logic for poll handlers in netxen, enic, and bnx2x. From Eric Dumazet and Govindarajulu Varadarajan. 8) Header length calculation fix in mac80211 from Fred Chou. 9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths. From Ezequiel Garcia. 10) udp_diag has bogus logic in it's hash chain skipping, copy same fix tcp diag used. From Herbert Xu. 11) amd-xgbe programs wrong rx flow control register, from Thomas Lendacky. 12) Fix race leading to use after free in ping receive path, from Subash Abhinov Kasiviswanathan. 13) Cache redirect routes otherwise we can get a heavy backlog of rcu jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) net: don't OOPS on socket aio stmmac: prevent probe drivers to crash kernel bnx2x: fix napi poll return value for repoll ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too sh_eth: Fix DMA-API usage for RX buffers sh_eth: Check for DMA mapping errors on transmit sh_eth: Ensure DMA engines are stopped before freeing buffers sh_eth: Remove RX overflow log messages ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ipv4: try to cache dst_entries which would cause a redirect samples: bpf: relax test_maps check bpf: rcu lock must not be held when calling copy_to_user() net: sctp: fix slab corruption from use after free on INIT collisions net: mv643xx_eth: Fix highmem support in non-TSO egress path sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers ...
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andy Shevchenko 提交于
In the case when alloc_netdev fails we return NULL to a caller. But there is no check for NULL in the probe drivers. This patch changes NULL to an error pointer. The function description is amended to reflect what we may get returned. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux由 Linus Torvalds 提交于
Pull powerpc fixes from Michael Ellerman: "Two powerpc fixes" * tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
-
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux由 Linus Torvalds 提交于
Pull one more module fix from Rusty Russell: "SCSI was using module_refcount() to figure out when the module was unloading: this broke with new atomic refcounting. The code is still suspicious, but this solves the WARN_ON()" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: scsi: always increment reference count
-
- 27 1月, 2015 25 次提交
-
-
With the commit d75b1ade ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. When in busy_poll is we return 0 in napi_poll. We should return budget. Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec由 David S. Miller 提交于
Steffen Klassert says: ==================== ipsec 2015-01-26 Just two small fixes for _decode_session6() where we might decode to wrong header information in some rare situations. Please pull or let me know if there are problems. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: | [root@rhel7-5 lkundrak]# sh -x lal | + ip link add dev0 type dummy | + ip link set dev0 up | + ip link add dev1 type dummy | + ip link set dev1 up | + ip addr add 2001:db8:8086::2/64 dev dev0 | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 | + ip link del dev0 type dummy | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 | | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: NLubomir Rintel <lkundrak@v3.sk> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: NLubomir Rintel <lkundrak@v3.sk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Ben Hutchings says: ==================== Fixes for sh_eth #3 I'm continuing review and testing of Ethernet support on the R-Car H2 chip. This series fixes the last of the more serious issues I've found. These are not tested on any of the other supported chips. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
- Use the return value of dma_map_single(), rather than calling virt_to_page() separately - Check for mapping failue - Call dma_unmap_single() rather than dma_sync_single_for_cpu() Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
dma_map_single() may fail if an IOMMU or swiotlb is in use, so we need to check for this. Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
Currently we try to clear EDRRR and EDTRR and immediately continue to free buffers. This is unsafe because: - In general, register writes are not serialised with DMA, so we still have to wait for DMA to complete somehow - The R8A7790 (R-Car H2) manual states that the TX running flag cannot be cleared by writing to EDTRR - The same manual states that clearing the RX running flag only stops RX DMA at the next packet boundary I applied this patch to the driver to detect DMA writes to freed buffers: > --- a/drivers/net/ethernet/renesas/sh_eth.c > +++ b/drivers/net/ethernet/renesas/sh_eth.c > @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device *ndev) > /* Free Rx skb ringbuffer */ > if (mdp->rx_skbuff) { > for (i = 0; i < mdp->num_rx_ring; i++) > + memcpy(mdp->rx_skbuff[i]->data, > + "Hello, world", 12); > + msleep(100); > + for (i = 0; i < mdp->num_rx_ring; i++) { > + WARN_ON(memcmp(mdp->rx_skbuff[i]->data, > + "Hello, world", 12)); > dev_kfree_skb(mdp->rx_skbuff[i]); > + } > } > kfree(mdp->rx_skbuff); > mdp->rx_skbuff = NULL; then ran the loop: while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done and 'ping -f' toward the sh_eth port from another machine. The warning fired several times a minute. To fix these issues: - Deactivate all TX descriptors rather than writing to EDTRR - As there seems to be no way of telling when RX DMA is stopped, perform a soft reset to ensure that both DMA enginess are stopped - To reduce the possibility of the reset truncating a transmitted frame, disable egress and wait a reasonable time to reach a packet boundary before resetting - Update statistics before resetting (The 'reasonable time' does not allow for CS/CD in half-duplex mode, but half-duplex no longer seems reasonable!) Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
If RX traffic is overflowing the FIFO or DMA ring, logging every time this happens just makes things worse. These errors are visible in the statistics anyway. Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Merge tag 'linux-can-fixes-for-3.19-20150127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-01-27 this is another pull request for net/master which consists of 4 patches. All 4 patches are contributed by Ahmed S. Darwish, he fixes more problems in the kvaser_usb driver. David, please merge net/master to net-next/master, as we have more kvaser_usb patches in the queue, that target net-next. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 subashab@codeaurora.org 提交于
An exception is seen in ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases socket reference with sock_put() and this internally frees up the socket. Later icmp_rcv() will try to free the skb and as part of this, skb destructor is called and which leads to a kernel panic as the socket is freed already in ping_rcv(). -->|exception -007|sk_mem_uncharge -007|sock_rfree -008|skb_release_head_state -009|skb_release_all -009|__kfree_skb -010|kfree_skb -011|icmp_rcv -012|ip_local_deliver_finish Fix this incorrect free by cloning this skb and processing this cloned skb instead. This patch was suggested by Eric Dumazet Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
While working on rhashtable walking I noticed that the UDP diag dumping code is buggy. In particular, the socket skipping within a chain never happens, even though we record the number of sockets that should be skipped. As this code was supposedly copied from TCP, this patch does what TCP does and resets num before we walk a chain. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ahmed S. Darwish 提交于
While being in an ERROR_WARNING state, and receiving further bus error events with error counters still in the ERROR_WARNING range of 97-127 inclusive, the state handling code erroneously reverts back to ERROR_ACTIVE. Per the CAN standard, only revert to ERROR_ACTIVE when the error counters are less than 96. Moreover, in certain Kvaser models, the BUS_ERROR flag is always set along with undefined bits in the M16C status register. Thus use bitwise operators instead of full equality for checking that register against bus errors. Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Ahmed S. Darwish 提交于
On some x86 laptops, plugging a Kvaser device again after an unplug makes the firmware always ignore the very first command. For such a case, provide some room for retries instead of completely exiting the driver init code. Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Ahmed S. Darwish 提交于
Send expected argument to the URB completion hander: a CAN netdevice instead of the network interface private context `kvaser_usb_net_priv'. This was discovered by having some garbage in the kernel log in place of the netdevice names: can0 and can1. Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Ahmed S. Darwish 提交于
Upon receiving a hardware event with the BUS_RESET flag set, the driver kills all of its anchored URBs and resets all of its transmit URB contexts. Unfortunately it does so under the context of URB completion handler `kvaser_usb_read_bulk_callback()', which is often called in an atomic context. While the device is flooded with many received error packets, usb_kill_urb() typically sleeps/reschedules till the transfer request of each killed URB in question completes, leading to the sleep in atomic bug. [3] In v2 submission of the original driver patch [1], it was stated that the URBs kill and tx contexts reset was needed since we don't receive any tx acknowledgments later and thus such resources will be locked down forever. Fortunately this is no longer needed since an earlier bugfix in this patch series is now applied: all tx URB contexts are reset upon CAN channel close. [2] Moreover, a BUS_RESET is now treated _exactly_ like a BUS_OFF event, which is the recommended handling method advised by the device manufacturer. [1] http://article.gmane.org/gmane.linux.network/239442 http://www.webcitation.org/6Vr2yagAQ [2] can: kvaser_usb: Reset all URB tx contexts upon channel close 889b77f7 [3] Stacktrace: <IRQ> [<ffffffff8158de87>] dump_stack+0x45/0x57 [<ffffffff8158b60c>] __schedule_bug+0x41/0x4f [<ffffffff815904b1>] __schedule+0x5f1/0x700 [<ffffffff8159360a>] ? _raw_spin_unlock_irqrestore+0xa/0x10 [<ffffffff81590684>] schedule+0x24/0x70 [<ffffffff8147d0a5>] usb_kill_urb+0x65/0xa0 [<ffffffff81077970>] ? prepare_to_wait_event+0x110/0x110 [<ffffffff8147d7d8>] usb_kill_anchored_urbs+0x48/0x80 [<ffffffffa01f4028>] kvaser_usb_unlink_tx_urbs+0x18/0x50 [kvaser_usb] [<ffffffffa01f45d0>] kvaser_usb_rx_error+0xc0/0x400 [kvaser_usb] [<ffffffff8108b14a>] ? vprintk_default+0x1a/0x20 [<ffffffffa01f5241>] kvaser_usb_read_bulk_callback+0x4c1/0x5f0 [kvaser_usb] [<ffffffff8147a73e>] __usb_hcd_giveback_urb+0x5e/0xc0 [<ffffffff8147a8a1>] usb_hcd_giveback_urb+0x41/0x110 [<ffffffffa0008748>] finish_urb+0x98/0x180 [ohci_hcd] [<ffffffff810cd1a7>] ? acct_account_cputime+0x17/0x20 [<ffffffff81069f65>] ? local_clock+0x15/0x30 [<ffffffffa000a36b>] ohci_work+0x1fb/0x5a0 [ohci_hcd] [<ffffffff814fbb31>] ? process_backlog+0xb1/0x130 [<ffffffffa000cd5b>] ohci_irq+0xeb/0x270 [ohci_hcd] [<ffffffff81479fc1>] usb_hcd_irq+0x21/0x30 [<ffffffff8108bfd3>] handle_irq_event_percpu+0x43/0x120 [<ffffffff8108c0ed>] handle_irq_event+0x3d/0x60 [<ffffffff8108ec84>] handle_fasteoi_irq+0x74/0x110 [<ffffffff81004dfd>] handle_irq+0x1d/0x30 [<ffffffff81004727>] do_IRQ+0x57/0x100 [<ffffffff8159482a>] common_interrupt+0x6a/0x6a Signed-off-by: NAhmed S. Darwish <ahmed.darwish@valeo.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 David S. Miller 提交于
Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Another set of last-minute fixes: * fix station double-removal when suspending while associating * fix the HT (802.11n) header length calculation * fix the CCK radiotap flag used for monitoring, a pretty old regression but a simple one-liner * fix per-station group-key handling Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f8864972 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: NMarcelo Leitner <mleitner@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Alexei Starovoitov says: ==================== bpf: fix two bugs Michael Holzheu caught two issues (in bpf syscall and in the test). Fix them. Details in corresponding patches. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
hash map is unordered, so get_next_key() iterator shouldn't rely on particular order of elements. So relax this test. Fixes: ffb65f27 ("bpf: add a testsuite for eBPF maps") Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
BUG: sleeping function called from invalid context at mm/memory.c:3732 in_atomic(): 0, irqs_disabled(): 0, pid: 671, name: test_maps 1 lock held by test_maps/671: #0: (rcu_read_lock){......}, at: [<0000000000264190>] map_lookup_elem+0xe8/0x260 Call Trace: ([<0000000000115b7e>] show_trace+0x12e/0x150) [<0000000000115c40>] show_stack+0xa0/0x100 [<00000000009b163c>] dump_stack+0x74/0xc8 [<000000000017424a>] ___might_sleep+0x23a/0x248 [<00000000002b58e8>] might_fault+0x70/0xe8 [<0000000000264230>] map_lookup_elem+0x188/0x260 [<0000000000264716>] SyS_bpf+0x20e/0x840 Fix it by allocating temporary buffer to store map element value. Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps") Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
Merge misc fixes from Andrew Morton: "Six fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element printk: add dummy routine for when CONFIG_PRINTK=n mm/vmscan: fix highidx argument type memcg: remove extra newlines from memcg oom kill log x86, build: replace Perl script with Shell script mm: page_alloc: embed OOM killing naturally into allocation slowpath
-
由 Ezequiel Garcia 提交于
Commit 69ad0dd7 Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Date: Mon May 19 13:59:59 2014 -0300 net: mv643xx_eth: Use dma_map_single() to map the skb fragments caused a nasty regression by removing the support for highmem skb fragments. By using page_address() to get the address of a fragment's page, we are assuming a lowmem page. However, such assumption is incorrect, as fragments can be in highmem pages, resulting in very nasty issues. This commit fixes this by using the skb_frag_dma_map() helper, which takes care of mapping the skb fragment properly. Additionally, the type of mapping is now tracked, so it can be unmapped using dma_unmap_page or dma_unmap_single when appropriate. This commit also fixes the error path in txq_init() to release the resources properly. Fixes: 69ad0dd7 ("net: mv643xx_eth: Use dma_map_single() to map the skb fragments") Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NEzequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Ben Hutchings says: ==================== Fixes for sh_eth #2 I'm continuing review and testing of Ethernet support on the R-Car H2 chip. This series fixes more of the issues I've found, but it won't be the last set. These are not tested on any of the other supported chips. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
In order to stop the RX path accessing the RX ring while it's being stopped or resized, we clear the interrupt mask (EESIPR) and then call free_irq() or synchronise_irq(). This is insufficient because the interrupt handler or NAPI poller may set EESIPR again after we clear it. Also, in sh_eth_set_ringparam() we currently don't disable NAPI polling at all. I could easily trigger a crash by running the loop: while ethtool -G eth0 rx 128 && ethtool -G eth0 rx 64; do echo -n .; done and 'ping -f' toward the sh_eth port from another machine. To fix this: - Add a software flag (irq_enabled) to signal whether interrupts should be enabled - In the interrupt handler, if the flag is clear then clear EESIPR and return - In the NAPI poller, if the flag is clear then don't set EESIPR - Set the flag before enabling interrupts in sh_eth_dev_init() and sh_eth_set_ringparam() - Clear the flag and serialise with the interrupt and NAPI handlers before clearing EESIPR in sh_eth_close() and sh_eth_set_ringparam() After this, I could run the loop for 100,000 iterations successfully. Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-