- 05 3月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
David noticed : ------------------ Eric, I was profiling the non-routing-cache case and something that stuck out is the case of calling inet_getpeer() with create==0. If an entry is not found, we have to redo the lookup under a spinlock to make certain that a concurrent writer rebalancing the tree does not "hide" an existing entry from us. This makes the case of a create==0 lookup for a not-present entry really expensive. It is on the order of 600 cpu cycles on my Niagara2. I added a hack to not do the relookup under the lock when create==0 and it now costs less than 300 cycles. This is now a pretty common operation with the way we handle COW'd metrics, so I think it's definitely worth optimizing. ----------------- One solution is to use a seqlock instead of a spinlock to protect struct inet_peer_base. After a failed avl tree lookup, we can easily detect if a writer did some changes during our lookup. Taking the lock and redo the lookup is only necessary in this case. Note: Add one private rcu_deref_locked() macro to place in one spot the access to spinlock included in seqlock. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 3月, 2011 7 次提交
-
-
由 David Howells 提交于
When a DNS resolver key is instantiated with an error indication, attempts to read that key will result in an oops because user_read() is expecting there to be a payload - and there isn't one [CVE-2011-1076]. Give the DNS resolver key its own read handler that returns the error cached in key->type_data.x[0] as an error rather than crashing. Also make the kenter() at the beginning of dns_resolver_instantiate() limit the amount of data it prints, since the data is not necessarily NUL-terminated. The buggy code was added in: commit 4a2d7892 Author: Wang Lei <wang840925@gmail.com> Date: Wed Aug 11 09:37:58 2010 +0100 Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2] This can trivially be reproduced by any user with the following program compiled with -lkeyutils: #include <stdlib.h> #include <keyutils.h> #include <err.h> static char payload[] = "#dnserror=6"; int main() { key_serial_t key; key = add_key("dns_resolver", "a", payload, sizeof(payload), KEY_SPEC_SESSION_KEYRING); if (key == -1) err(1, "add_key"); if (keyctl_read(key, NULL, 0) == -1) err(1, "read_key"); return 0; } What should happen is that keyctl_read() reports error 6 (ENXIO) to the user: dns-break: read_key: No such device or address but instead the kernel oopses. This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands as both of those cut the data down below the NUL termination that must be included in the data. Without this dns_resolver_instantiate() will return -EINVAL and the key will not be instantiated such that it can be read. The oops looks like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f PGD 3bdf8067 PUD 385b9067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq CPU 0 Modules linked in: Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY RIP: 0010:[<ffffffff811b99f7>] [<ffffffff811b99f7>] user_read+0x4f/0x8f RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378 RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000 R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1 FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090) Stack: ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000 ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000 00000000004005a0 00007fffba368060 0000000000000000 0000000000000000 Call Trace: [<ffffffff811b708e>] keyctl_read_key+0xac/0xcf [<ffffffff811b7c07>] sys_keyctl+0x75/0xb6 [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48 RIP [<ffffffff811b99f7>] user_read+0x4f/0x8f RSP <ffff88003bf47f08> CR2: 0000000000000010 Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJeff Layton <jlayton@redhat.com> cc: Wang Lei <wang840925@gmail.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Patrick McHardy 提交于
Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: NPatrick McHardy <kaber@trash.net> Reviewed-by: NJames Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: NChris Wright <chrisw@sous-sol.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Reported-by: NStephen Hemminger <shemminger@vyatta.com> Reported-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Because of various alignements [SLUB / qdisc], we use 512 bytes of memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and 192 bytes on 32bit ones. Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs) Change qdisc_alloc(), first trying a regular allocation before an oversized one. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Netlink message processing in the kernel is synchronous these days, the session information can be collected when needed. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
As reported by Eric: [11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60 ... [11483.697741] Call Trace: [11483.697764] [<c12fc9d2>] udp_sendmsg+0x282/0x6e0 [11483.697790] [<c12a1c01>] ? memcpy_toiovec+0x51/0x70 [11483.697818] [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0 The pointer passed to dst_release() is -EINVAL, that's because we leave an error pointer in the local variable "rt" by accident. NULL it out to fix the bug. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 3月, 2011 7 次提交
-
-
由 David Howells 提交于
The OpenAFS server is now sending ACKALL packets, so we need to handle them. Otherwise we report a protocol error and abort. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shmulik Ravid 提交于
This patch adds the support for retrieving the remote or peer DCBX configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX standard. Signed-off-by: NShmulik Ravid <shmulikr@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shmulik Ravid 提交于
These 2 patches add the support for retrieving the remote or peer DCBX configuration via dcbnl for embedded DCBX stacks. The peer configuration is part of the DCBX MIB and is useful for debugging and diagnostics of the overall DCB configuration. The first patch add this support for IEEE 802.1Qaz standard the second patch add the same support for the older CEE standard. Diff for v2 - the peer-app-info is CEE specific. Signed-off-by: NShmulik Ravid <shmulikr@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
The incorrect ops routine was being tested for in DCB_ATTR_IEEE_PFC attributes. This patch corrects it. Currently, every driver implementing ieee_setets also implements ieee_setpfc so this bug is not actualized yet. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This avoid a stack frame at zero cost. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Instead of on the stack. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Instead of on the stack. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2011 24 次提交
-
-
由 Jan Engelhardt 提交于
Like many other places, we have to check that the array index is within allowed limits, or otherwise, a kernel oops and other nastiness can ensue when we access memory beyond the end of the array. [ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000 [ 5954.120014] IP: __find_logger+0x6f/0xa0 [ 5954.123979] nf_log_bind_pf+0x2b/0x70 [ 5954.123979] nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log] [ 5954.123979] nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink] ... The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind was decoupled from nf_log_register. Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>, via irc.freenode.net/#netfilter Signed-off-by: NJan Engelhardt <jengelh@medozas.de> Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
由 Gerrit Renker 提交于
This fixes a bug in the order of dccp_rcv_state_process() that still permitted reception even after closing the socket. A Reset after close thus causes a NULL pointer dereference by not preventing operations on an already torn-down socket. dccp_v4_do_rcv() | | state other than OPEN v dccp_rcv_state_process() | | DCCP_PKT_RESET v dccp_rcv_reset() | v dccp_time_wait() WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128() Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common) [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n) [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd) [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai) [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces) [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_) [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0) [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380) [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70) The fix is by testing the socket state first. Receiving a packet in Closed state now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1. Reported-and-tested-by: NJohan Hovold <jhovold@gmail.com> Cc: stable@kernel.org Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
The patch to replace inet->cork with cork left out two spots in __ip_append_data that can result in bogus packet construction. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
If CONFIG_NET_KEY_MIGRATE is not defined the arguments of pfkey_migrate stub do not match causing warning. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The route lookup code in icmpv6_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The route lookup code in icmp_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
Fix dst_lock usage in __ip_vs_update_dest. We need _bh locking because destination is updated in user context. Can cause lockups on frequent destination updates. Problem reported by Simon Kirby. Bug was introduced in 2.6.37 from the "ipvs: changes for local real server" change. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NHans Schillstrom <hans@schillstrom.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 David S. Miller 提交于
Return a dst pointer which is potentitally error encoded. Don't pass original dst pointer by reference, pass a struct net instead of a socket, and elide the flow argument since it is unnecessary. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This can be determined from the flow flags instead. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Since it indicates whether we are invoked from a sleepable context or not. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This boolean state is now available in the flow flags. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
And set is in contexts where the route resolution can sleep. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Since that is what the current vague "flags" argument means. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Since that's what the current vague "flags" thing means. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
The UDP transmit path has been running under the socket lock for a long time because of the corking feature. This means that transmitting to the same socket in multiple threads does not scale at all. However, as most users don't actually use corking, the locking can be removed in the common case. This patch creates a lockless fast path where corking is not used. Please note that this does create a slight inaccuracy in the enforcement of socket send buffer limits. In particular, we may exceed the socket limit by up to (number of CPUs) * (packet size) because of the way the limit is computed. As the primary purpose of socket buffers is to indicate congestion, this should not be a great problem for now. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
This patch converts UDP to use the new ip_finish_skb API. This would then allows us to more easily use ip_make_skb which allows UDP to run without a socket lock. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
This patch adds the helper ip_make_skb which is like ip_append_data and ip_push_pending_frames all rolled into one, except that it does not send the skb produced. The sending part is carried out by ip_send_skb, which the transport protocol can call after it has tweaked the skb. It is meant to be called in cases where corking is not used should have a one-to-one correspondence to sendmsg. This patch also adds the helper ip_finish_skb which is meant to be replace ip_push_pending_frames when corking is required. Previously the protocol stack would peek at the socket write queue and add its header to the first packet. With ip_finish_skb, the protocol stack can directly operate on the final skb instead, just like the non-corking case with ip_make_skb. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
In order to allow simultaneous calls to ip_append_data on the same socket, it must not modify any shared state in sk or inet (other than those that are designed to allow that such as atomic counters). This patch abstracts out write references to sk and inet_sk in ip_append_data and its friends so that we may use the underlying code in parallel. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
UFO doesn't really use the sk_sndmsg_* parameters so touching them is pointless. It can't use them anyway since the whole point of UFO is to use the original pages without copying. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johannes Berg 提交于
... Otherwise it is displayed when mac80211 isn't even turned on, which is completely pointless. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Felix Fietkau 提交于
Signed-off-by: NFelix Fietkau <nbd@openwrt.org> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Felix Fietkau 提交于
Also fix a typo in the STATION_INFO_TX_BITRATE description Signed-off-by: NFelix Fietkau <nbd@openwrt.org> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
- 01 3月, 2011 1 次提交
-
-
由 Anders Berggren 提交于
Enabling TX timestamps (SO_TIMESTAMPING) for IPv6 UDP packets, in the same fashion as for IPv4. Necessary in order for NICs such as Intel 82580 to timestamp IPv6 packets. Signed-off-by: NAnders Berggren <anders@halon.se> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-