- 02 June 2010, 3 commits
-
-
Submitted by Eric Dumazet

When many cpus compete for sending frames on a given qdisc, the qdisc spinlock suffers from very high contention.

The cpu owning the __QDISC_STATE_RUNNING bit has the same priority as the others when acquiring the lock, and cannot dequeue packets fast enough, since it must wait for this lock for each dequeued packet.

One solution to this problem is to force all cpus to spin on a second lock before trying to get the main lock, when/if they see __QDISC_STATE_RUNNING already set. The owning cpu then competes with at most one other cpu for the main lock, allowing a higher dequeue rate.

Based on a previous patch from Alexander Duyck. I added the heuristic to avoid the atomic operation in the fast path, and put the new lock far away from the cache line used by the dequeue worker. Also try to release the busylock as late as possible.

Tests with the following script gave a boost from ~50,000 pps to ~600,000 pps on a dual quad core machine (E5450 @ 3.00GHz), tg3 driver. (A single netperf flow can reach ~800,000 pps on this platform.)

for j in `seq 0 3`; do
  for i in `seq 0 7`; do
    netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
  done
done

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
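A minimal C sketch of the busylock idea described above, not the actual __dev_xmit_skb(); qdisc_enqueue_and_run() is a hypothetical placeholder for the enqueue/qdisc_run logic:

    static int dev_xmit_skb_sketch(struct sk_buff *skb, struct Qdisc *q,
                                   spinlock_t *root_lock)
    {
        /* Heuristic: only pay for the secondary lock when the dequeue
         * worker already runs, i.e. the root lock is likely contended. */
        bool contended = qdisc_is_running(q);
        int rc;

        if (unlikely(contended))
            spin_lock(&q->busylock);    /* at most one extra waiter reaches root_lock */

        spin_lock(root_lock);
        rc = qdisc_enqueue_and_run(skb, q);    /* hypothetical placeholder */
        spin_unlock(root_lock);

        /* release the busylock as late as possible, per the description above */
        if (unlikely(contended))
            spin_unlock(&q->busylock);
        return rc;
    }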
-
Submitted by Eric Dumazet

__QDISC_STATE_RUNNING is always changed while the qdisc lock is held. We can avoid two atomic operations in the xmit path if we move this bit into a new __state container.

The location of this __state container is carefully chosen so that the fast path only dirties one qdisc cache line. The THROTTLED bit could later be moved into this __state location too, to avoid dirtying the first qdisc cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
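A sketch of what the RUNNING helpers look like once the bit lives in __state and is only touched under the qdisc lock, so plain read-modify-write replaces the two atomics. The flag name __QDISC___STATE_RUNNING is an assumption; the real definitions may differ slightly:

    static inline bool qdisc_is_running(struct Qdisc *qdisc)
    {
        return (qdisc->__state & __QDISC___STATE_RUNNING) ? true : false;
    }

    static inline bool qdisc_run_begin(struct Qdisc *qdisc)
    {
        if (qdisc_is_running(qdisc))
            return false;    /* another cpu is already dequeueing */
        qdisc->__state |= __QDISC___STATE_RUNNING;    /* non-atomic: qdisc lock held */
        return true;
    }

    static inline void qdisc_run_end(struct Qdisc *qdisc)
    {
        qdisc->__state &= ~__QDISC___STATE_RUNNING;
    }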
-
Submitted by Eric Dumazet

Define three helpers to manipulate the QDISC_STATE_RUNNING flag, which a second patch will move to another location.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 May 2010, 1 commit
-
-
Submitted by Eric Dumazet

This new sock lock primitive was introduced to speed up some user-context socket manipulation. But it is unsafe when two threads must be protected against each other: one using the regular lock_sock()/release_sock(), the other using lock_sock_bh()/unlock_sock_bh().

This patch changes lock_sock_bh() to be careful about the 'owned' state. If owned is found to be set, we must take the slow path. lock_sock_bh() now returns a boolean saying whether the slow path was taken, and this boolean is used at unlock_sock_bh() time to call the appropriate unlock function.

After this change, BHs are either disabled or enabled during the lock_sock_bh()/unlock_sock_bh() protected section. This might be misleading, so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
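A sketch of the intended calling pattern after the rename (the surrounding function is illustrative only):

    static void drain_queue_example(struct sock *sk)
    {
        bool slow = lock_sock_fast(sk);    /* true if the slow (owned-socket) path was taken */

        /* ... short critical section, e.g. purging a receive queue ... */

        unlock_sock_fast(sk, slow);    /* picks release_sock() or BH re-enable to match */
    }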
-
- 26 May 2010, 2 commits
-
-
Submitted by Dan Carpenter

Sparse complains because these one-bit bitfields are signed.

include/net/sctp/structs.h:879:24: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:889:31: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:895:26: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:898:31: error: dubious one-bit signed bitfield
include/net/sctp/structs.h:901:27: error: dubious one-bit signed bitfield

It doesn't cause a problem in the current code, but it would be better to clean it up. This was introduced by c0058a35: "sctp: Save some room in the sctp_transport by using bitfields".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
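Why sparse warns, in a nutshell: a signed one-bit bitfield can only represent 0 and -1, so storing 1 and later testing for 1 is suspect. Illustrative C only, not kernel code:

    struct flags_signed   { int          dead:1; };    /* holds 0 or -1 */
    struct flags_unsigned { unsigned int dead:1; };    /* holds 0 or  1 */

    /* With the signed variant, "f.dead = 1; if (f.dead == 1)" is typically
     * false, because the stored value reads back as -1 on common compilers
     * (the exact behaviour is implementation-defined). */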
-
Submitted by Herbert Xu

When the cls_cgroup module is not loaded, task_cls_classid() will return an uninitialised classid instead of zero.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 May 2010, 5 commits
-
-
Submitted by Alexey Dobriyan

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Randy Dunlap

Fix a sock.h kernel-doc warning:

Warning(include/net/sock.h:1438): No description found for parameter 'wq'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Herbert Xu

There is a typo in cgroup_cls_state when cls_cgroup is built-in.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Randy Dunlap

Fix kernel-doc warnings in mac80211.h:

Warning(include/net/mac80211.h:838): No description found for parameter 'ap_addr'
Warning(include/net/mac80211.h:1726): No description found for parameter 'get_survey'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
Submitted by John W. Linville

This reverts commit 03ceedea. That patch was reported to cause a regression in which connectivity is lost and cannot be reestablished after a suspend/resume cycle.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
- 24 May 2010, 3 commits
-
-
Submitted by Linus Torvalds

This reverts commit 03ceedea, since it breaks resume from suspend-to-ram on Rafael's Acer Ferrari One. NetworkManager thinks everything is ok, but it can't connect to the AP to get an IP address after the resume.

In fact, it even breaks resume for non-ath9k chipsets: reverting it also fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes says: "Indeed, this patch needs to be reverted. That mac80211 change is wrong and completely unnecessary."

Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Yingqiang Ma <yma.cool@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Herbert Xu

Up until now cls_cgroup has relied on fetching the classid out of the currently executing thread. This runs into trouble when packet processing is delayed, in which case it may execute in another thread's context.

Furthermore, even when a packet is not delayed we may fail to classify it if soft IRQs have been disabled, because this scenario is indistinguishable from one where a packet unrelated to the current thread is processed by a real soft IRQ.

In fact, the current semantics are inherently broken, as a single skb may be constructed out of the writes of two different tasks. A different manifestation of this problem is when the TCP stack transmits in response to an incoming ACK: this is currently unclassified.

As we already have a concept of packet ownership for accounting purposes in the skb->sk pointer, this is a natural place to store the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up an existing hole on 64-bit :)

The value is set at socket creation time, so all sockets created via socket(2) automatically gain the ID of the thread creating them. Whenever another process touches the socket by either reading or writing to it, we change the socket classid to that of the process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we inherit the classid of the original listening socket through sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative classid. For now it is only used as a backup when we execute with soft IRQs disabled. Once we're completely happy with its semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_cgroup module creation. If we didn't do this, then there is a window where someone could create a tc rule using cls_cgroup before the cgroup subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
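A minimal sketch of the ownership rule described above. The sk_classid field and the sock_update_classid-style helper name are taken from the description; treat them as assumptions rather than the final code:

    static inline void sock_update_classid_sketch(struct sock *sk)
    {
        u32 classid = task_cls_classid(current);

        /* Only a valid (non-zero) classid overrides the stored one. */
        if (classid && classid != sk->sk_classid)
            sk->sk_classid = classid;
    }

    /* Called at socket creation and whenever a task reads/writes the socket;
     * accept(2)ed sockets instead inherit sk_classid via sk_clone(). */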
-
Submitted by Sjur Braendeland

Discovered a bug when running a high number of parallel connect requests. Replace the buggy home-brewed list with linux/list.h.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 May 2010, 2 commits
-
-
Submitted by Sripathi Kodi

I made a V2 of this patch on top of my patches for the VFS switches. All the changes were due to changes in some offsets.

rename - change name of file or directory

size[4] Trename tag[2] fid[4] newdirfid[4] name[s]
size[4] Rrename tag[2]

The rename message is used to change the name of a file, possibly moving it to a new directory. The 9P wstat message can only rename a file within the same directory.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
-
Submitted by Sripathi Kodi

I made a V2 of this patch on top of my patches for the VFS switches. The change was adding the v9fs_statfs pointer to v9fs_super_ops_dotl instead of v9fs_super_ops.

statfs - get file system statistics

size[4] Tstatfs tag[2] fid[4]
size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8] files[8] ffree[8] fsid[8] namelen[4]

The statfs message is used to request file system information returned by the statfs(2) system call, which is used by df(1) to report file system and disk space usage.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
-
- 20 May 2010, 1 commit
-
-
Submitted by Joerg Marx

This race was triggered by a 'conntrack -F' command running in parallel to the insertion of a hash for a new connection. Losing this race led to a dead conntrack entry effectively blocking traffic for a particular connection until timeout or until the conntrack hashes were flushed again.

Now the check for an already dying connection is done inside the lock.

Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
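A sketch of the fix, with the dying-check moved under nf_conntrack_lock so a concurrent 'conntrack -F' cannot slip in between the check and the hash insertion. This is an illustrative fragment assuming helpers such as nf_ct_is_dying() and __nf_conntrack_hash_insert(), not the exact confirm-path code:

    spin_lock_bh(&nf_conntrack_lock);
    if (unlikely(nf_ct_is_dying(ct))) {
        /* lost the race against a flush: do not insert a dead entry */
        spin_unlock_bh(&nf_conntrack_lock);
        return NF_DROP;
    }
    __nf_conntrack_hash_insert(ct, hash, repl_hash);
    spin_unlock_bh(&nf_conntrack_lock);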
-
- 19 May 2010, 1 commit
-
-
Submitted by Herbert Xu

This patch replaces the boolean dead flag on inet6_ifaddr with a state enum. This allows us to roll back changes when deleting an address, according to whether DAD has completed or not.

This patch only adds the state field and does not change the logic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 May 2010, 6 commits
-
-
Submitted by Eric Dumazet

The skb rxhash should be cleared when an skb is handled by a tunnel before being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factor into a new helper, skb_tunnel_rx().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
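A sketch of what such a factored helper bundles, per the description above; the exact field list in the real skb_tunnel_rx() may differ:

    static inline void skb_tunnel_rx_sketch(struct sk_buff *skb, struct net_device *dev)
    {
        skb->dev = dev;
        skb->rxhash = 0;              /* force rxhash recomputation for the inner flow */
        skb_dst_drop(skb);            /* outer route must not leak to the inner packet */
        nf_reset(skb);
        dev->stats.rx_packets++;      /* account against the tunnel device */
        dev->stats.rx_bytes += skb->len;
    }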
-
Submitted by andrew hendry

Moves the x25 accept approve flag from a char into a bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by andrew hendry

Moves the x25 interrupt flag from a char into a bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by andrew hendry

Moves the X25 Q-bit flag from a char into a bitfield to allow BKL cleanup.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

ip_route_input() is the version returning a refcounted dst, while ip_route_input_noref() returns a non-refcounted one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

Use the low-order bit of skb->_skb_dst to tell whether the dst is not refcounted. Rename _skb_dst to _skb_refdst to make sure all uses are caught.

skb_dst() returns the dst regardless of whether the noref bit is set, but with a lockdep check to make sure a noref dst is not handed out if the current user is not RCU protected.

A new skb_dst_set_noref() helper sets a non-refcounted dst on an skb (with a lockdep check).

skb_dst_drop() drops a reference only if the skb dst was refcounted.

The skb_dst_force() helper is used to force a refcount on the dst when the skb is queued and no longer RCU protected. Use skb_dst_force() in __sk_add_backlog(), in __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or the skb is enqueued on a qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(), and in dev_requeue_skb().

Note: dst_use_noref() still dirties the dst; we might transform it later to do one dirtying per jiffy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
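A simplified sketch of the low-order-bit encoding described above (lockdep checks omitted; the SKB_DST_NOREF/SKB_DST_PTRMASK names and _sketch helpers are assumptions for illustration):

    #define SKB_DST_NOREF   1UL                 /* dst is not refcounted */
    #define SKB_DST_PTRMASK (~SKB_DST_NOREF)

    static inline struct dst_entry *skb_dst_sketch(const struct sk_buff *skb)
    {
        return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
    }

    static inline void skb_dst_set_noref_sketch(struct sk_buff *skb, struct dst_entry *dst)
    {
        /* caller must be RCU protected; no dst refcount is taken */
        skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
    }

    static inline void skb_dst_drop_sketch(struct sk_buff *skb)
    {
        /* only refcounted dsts are released */
        if (skb->_skb_refdst && !(skb->_skb_refdst & SKB_DST_NOREF))
            dst_release(skb_dst_sketch(skb));
        skb->_skb_refdst = 0UL;
    }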
-
- 16 May 2010, 3 commits
-
-
Submitted by Eric Dumazet

TCP-MD5 sessions have intermittent failures when the route cache is invalidated. ip_queue_xmit() has to find a new route and calls sk_setup_caps(sk, &rt->u.dst), destroying the sk->sk_route_caps &= ~NETIF_F_GSO_MASK that MD5 desperately tries to enforce all along its path (from tcp_transmit_skb() for example).

So we send a few bad packets, and everything is fine again when tcp_transmit_skb() is called for this socket.

Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a socket field, sk_route_nocaps, containing the bits to mask off sk_route_caps.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
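A rough sketch of the masking idea: sk_route_nocaps records capability bits the socket must never use, and route setup applies the mask whenever a new route is attached. Illustrative fragments only, not the exact code:

    /* in the TCP-MD5 setup path: make sure GSO never comes back for this socket */
    sk->sk_route_nocaps |= NETIF_F_GSO_MASK;

    /* in sk_setup_caps(), roughly: */
    sk->sk_route_caps = dst->dev->features & ~sk->sk_route_nocaps;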
-
Submitted by Eric Dumazet

TCP MD5 support uses per-cpu data for temporary storage. It currently disables preemption so that the same storage cannot be reclaimed by another thread on the same cpu. But we also have to make sure a softirq handler won't try to use the same context. Various bug reports demonstrated corruptions.

The fix is to disable both preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
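The pattern boils down to bracketing use of the per-cpu scratch area with BH protection, which on mainline kernels also disables preemption. Sketch only; md5sig_pool here stands for a hypothetical DEFINE_PER_CPU variable:

    local_bh_disable();                  /* keeps softirqs and, implicitly, preemption away */
    pool = this_cpu_ptr(&md5sig_pool);   /* per-cpu scratch cannot be stolen while BH is off */
    /* ... compute the MD5 signature using pool ... */
    local_bh_enable();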
-
Submitted by Amerigo Wang

(Dropped the infiniband part, because Tetsuo modified the related code; I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports, which allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
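A sketch of how the automatic port picker is expected to honour the new sysctl, assuming an inet_is_reserved_local_port()-style test; that name and port_is_free() are assumptions for illustration, not confirmed API:

    /* inside the ephemeral-port search loop, roughly: */
    for (port = low; port <= high; port++) {
        if (inet_is_reserved_local_port(port))
            continue;               /* skip ports listed in ip_local_reserved_ports */
        if (port_is_free(port))     /* hypothetical availability check */
            break;
    }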
-
- 13 May 2010, 2 commits
-
-
Submitted by Allan Stephens

Eliminate comments in TIPC's main API files that are obsolete, incorrect, misleading, or unhelpful. This also adds one new comment.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Johannes Berg

This adds support for offloading the channel switch operation to devices that support it, typically via a specific firmware API. The reasons for this could be that the firmware provides better timing, or that regulatory enforcement done by the device requires special handling of CSAs.

In order to allow drivers to specify the timing to the device, the new channel_switch callback will pass through the received frame's mactime, where available.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
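A sketch of how a driver might wire up such a callback. The struct ieee80211_channel_switch argument and its contents are assumptions based on the description, and my_fw_schedule_channel_switch() is a hypothetical driver call:

    static void drv_channel_switch(struct ieee80211_hw *hw,
                                   struct ieee80211_channel_switch *ch_switch)
    {
        /* ch_switch is assumed to carry the target channel, the CSA count and
         * the mactime of the announcing frame, so firmware can time the switch. */
        my_fw_schedule_channel_switch(hw->priv, ch_switch);
    }

    static const struct ieee80211_ops my_ops = {
        /* ... other mandatory callbacks omitted ... */
        .channel_switch = drv_channel_switch,
    };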
-
- 11 May 2010, 4 commits
-
-
Submitted by Patrick McHardy

This patch adds support for multiple independent multicast routing instances, named "tables".

Userspace multicast routing daemons can bind to a specific table instance by issuing a setsockopt call using a new option, MRT6_TABLE. The table number is stored in the raw socket data and affects all following ip6mr setsockopt(), getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT) is created with a default routing rule pointing to it. Newly created pim6reg devices have the table number appended ("pim6regX"), with the exception of devices created in the default table, which are named just "pim6reg" for compatibility reasons.

Packets are directed to a specific table instance using routing rules, similar to how regular routing rules work. Currently iif, oif and mark are supported as keys; source and destination addresses could be supported additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

  uint32_t table = 123;
  setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

  # ip -6 mrule add iif eth0 lookup 123
  # ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy <kaber@trash.net>
-
Submitted by Patrick McHardy

Signed-off-by: Patrick McHardy <kaber@trash.net>
-
Submitted by Patrick McHardy

Signed-off-by: Patrick McHardy <kaber@trash.net>
-
Submitted by Patrick McHardy

The unres_queue is currently shared between all namespaces. Following patches will additionally allow creating multiple multicast routing tables in each namespace. Having a single shared queue for all these users seems excessive, so move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side effect, this fixes a bug in the seq file iteration functions: the first entry returned is always from the current namespace, while entries returned after that may belong to any namespace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
-
- 10 May 2010, 7 commits
-
-
Submitted by Marcel Holtmann

Instead of having a global workqueue for all controllers, it makes more sense to have a workqueue per controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Gustavo F. Padovan

l2cap_ertm_send() can be called both from user context and from bottom-half context. The socket locks for those contexts are different: the user context uses a mutex (which can sleep) and the second one uses a spinlock_bh. That creates a race condition when we have interruptions in both contexts at the same time.

The best way to solve this is to add a new spinlock that protects l2cap_ertm_send() and the variables it accesses. The other solution was to defer l2cap_ertm_send() with a workqueue, but the sending process already has one deferral at the HCI layer; it is not a good idea to add another one.

The patch refactors the code to create l2cap_retransmit_frames(), so we can encapsulate the locking of l2cap_ertm_send() for some calls. It also renames l2cap_retransmit_frame() to l2cap_retransmit_one_frame() to avoid confusion.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Gustavo F. Padovan

Supports Local Busy condition handling through a waitqueue that wakes up every 200ms and tries to push the packets to the upper layer. If it can push the whole queue, it leaves the Local Busy state.

The patch modifies the behaviour of l2cap_ertm_reassembly_sdu() to support retrying the push operation.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Gustavo F. Padovan

hci_send_acl() can't fail, so we can make it void. This patch changes that and all the functions that use hci_send_acl().

That change exposed a bug in sending connectionless data: we were not reporting the length sent back to user space.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Gustavo F. Padovan

With the sockopt extension we can set a per-channel MaxTx value.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Submitted by Gustavo F. Padovan

Now that we can set the txWindow, we need to change the acknowledgement procedure to ack after every (pi->txWindow/6 + 1) frames. The plus 1 avoids a zero value.

It also renames pi->num_to_ack to a better name: pi->num_acked.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
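Worked example of the new interval: with txWindow = 10 the receiver acks every 10/6 + 1 = 2 frames; with txWindow = 63, every 63/6 + 1 = 11 frames; the + 1 keeps the interval non-zero for txWindow < 6. An illustrative fragment (field names follow the commit text; l2cap_send_ack() is a hypothetical helper):

    /* roughly, in the receive path: */
    pi->num_acked = (pi->num_acked + 1) % (pi->txWindow / 6 + 1);
    if (pi->num_acked == 0)
        l2cap_send_ack(pi);    /* counter wrapped: time to acknowledge */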
-
Submitted by Gustavo F. Padovan

Now we can set/get the Transmission Window size via sockopt.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-