- 10 12月, 2010 2 次提交
-
-
由 Eric Dumazet 提交于
Followup of commit b178bb3d (net: reorder struct sock fields) Optimize INET input path a bit further, by : 1) moving sk_refcnt close to sk_lock. This reduces number of dirtied cache lines by one on 64bit arches (and 64 bytes cache line size). 2) moving inet_daddr & inet_rcv_saddr at the beginning of sk (same cache line than hash / family / bound_dev_if / nulls_node) This reduces number of accessed cache lines in lookups by one, and dont increase size of inet and timewait socks. inet and tw sockets now share same place-holder for these fields. Before patch : offsetof(struct sock, sk_refcnt) = 0x10 offsetof(struct sock, sk_lock) = 0x40 offsetof(struct sock, sk_receive_queue) = 0x60 offsetof(struct inet_sock, inet_daddr) = 0x270 offsetof(struct inet_sock, inet_rcv_saddr) = 0x274 After patch : offsetof(struct sock, sk_refcnt) = 0x44 offsetof(struct sock, sk_lock) = 0x48 offsetof(struct sock, sk_receive_queue) = 0x68 offsetof(struct inet_sock, inet_daddr) = 0x0 offsetof(struct inet_sock, inet_rcv_saddr) = 0x4 compute_score() (udp or tcp) now use a single cache line per ignored item, instead of two. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Use helper functions to hide all direct accesses, especially writes, to dst_entry metrics values. This will allow us to: 1) More easily change how the metrics are stored. 2) Implement COW for metrics. In particular this will help us put metrics into the inetpeer cache if that is what we end up doing. We can make the _metrics member a pointer instead of an array, initially have it point at the read-only metrics in the FIB, and then on the first set grab an inetpeer entry and point the _metrics member there. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
-
- 09 12月, 2010 3 次提交
-
-
由 Tom Herbert 提交于
Rather than printing the message to the log, use a mib counter to keep track of the count of occurences of time wait bucket overflow. Reduces spam in logs. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sk_run_filter() doesnt write on skb, change its prototype to reflect this. Fix two af_packet comments. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit : > Hmm.. > > If somebody can explain why RTNL is held in arp_ioctl() (and therefore > in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so > that your patch can be applied. > > Right now it is not good, because RTNL wont be necessarly held when you > are going to call arp_invalidate() ? While doing this analysis, I found a refcount bug in llc, I'll send a patch for net-2.6 Meanwhile, here is the patch for net-next-2.6 Your patch then can be applied after mine. Thanks [PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() dev_getbyhwaddr() was called under RTNL. Rename it to dev_getbyhwaddr_rcu() and change all its caller to now use RCU locking instead of RTNL. Change arp_ioctl() to use RCU instead of RTNL locking. Note: this fix a dev refcount bug in llc Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2010 6 次提交
-
-
由 Tomasz Grobelny 提交于
This patch adds a generic infrastructure for policy-based dequeueing of TX packets and provides two policies: * a simple FIFO policy (which is the default) and * a priority based policy (set via socket options). Both policies honour the tx_qlen sysctl for the maximum size of the write queue (can be overridden via socket options). The priority policy uses skb->priority internally to assign an u32 priority identifier, using the same ranking as SO_PRIORITY. The skb->priority field is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary data using cmsg(3), the patch also provides the requisite parsing routines. Signed-off-by: NTomasz Grobelny <tomasz@grobelny.oswiecenia.net> Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
-
由 Eric Dumazet 提交于
If caller holds RTNL, we dont need a memory barrier (smp_read_barrier_depends) included in rcu_dereference(). Just use rtnl_dereference() to properly document the assertions. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism, to be able to build advanced filters. This can help spreading packets on several sockets with a fast selection, after RPS dispatch to N cpus for example, or to catch a percentage of flows in one queue. tcpdump -s 500 "cpu = 1" : [0] ld CPU [1] jeq #1 jt 2 jf 3 [2] ret #500 [3] ret #0 # take 12.5 % of flows (average) tcpdump -s 1000 "rxhash & 7 = 2" : [0] ld RXHASH [1] and #7 [2] jeq #2 jt 3 jf 4 [3] ret #1000 [4] ret #0 Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Rui <wirelesser@gmail.com> Acked-by: NChangli Gao <xiaosuo@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexey Orishko 提交于
Changes: include/linux/usb/usbnet.h: - a new flag to indicate driver's capability to accumulate IP packets in Tx direction and extract several packets from single skb in Rx direction. drivers/net/usb/usbnet.c: - the procedure of counting packets in usbnet was updated due to the accumulating of IP packets in the driver - no short packets are sent if indicated by the flag in driver_info structure Signed-off-by: NAlexey Orishko <alexey.orishko@stericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matt Carlson 提交于
In commit 52b02d04 entitled "tg3: Add EEE support", Ben Hutchings had commented that the EEE advertisement register will be in a standard location. This patch moves that definition into mdio.h and changes the code to use it. Signed-off-by: NMatt Carlson <mcarlson@broadcom.com> Reviewed-by: NBenjamin Li <benli@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and sk_clone() in commit 47e958ea Problem is we can have several clones sharing a common sk_filter, and these clones might want to sk_filter_attach() their own filters at the same time, and can overwrite old_filter->rcu, corrupting RCU queues. We can not use filter->rcu without being sure no other thread could do the same thing. Switch code to a more conventional ref-counting technique : Do the atomic decrement immediately and queue one rcu call back when last reference is released. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2010 5 次提交
-
-
由 Allan Stephens 提交于
As part of the removal of TIPC's native API support it is no longer necessary for TIPC to export symbols for routines that can be called by kernel-based applications, nor for it to have header files that kernel-based applications can include to access the declarations for those routines. This commit eliminates the exporting of symbols by TIPC and migrates the contents of each obsolete native API include file into its corresponding non-native API equivalent. The code which was migrated in this commit was migrated intact, in that there are no technical changes combined with the relocation. Signed-off-by: NAllan Stephens <Allan.Stephens@windriver.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shan Wei 提交于
These macros have been defined for several years since v2.6.12-rc2(tracing by git), but never be used. So remove them. Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shan Wei 提交于
__ICMP_MIB_MAX is equal to the total number of icmp mib, So no need to add 1. This wastes 4/8 bytes memory. Change it to be same as ICMP6_MIB_MAX, TCP_MIB_MAX, UDP_MIB_MAX. Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Brother of ipv4's inet_csk_route_req(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
To go along side ipv4's rt_get_peer(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 12月, 2010 4 次提交
-
-
由 David S. Miller 提交于
The only thing AF-specific about remembering the timestamp for a time-wait TCP socket is getting the peer. Abstract that behind a new timewait_sock_ops vector. Support for real IPV6 sockets is not filled in yet, but curiously this makes timewait recycling start to work for v4-mapped ipv6 sockets. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Now with ipv6 support it is no longer less than 64 bytes. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
They are verboten these days. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Allocate qdisc memory according to NUMA properties of cpus included in xps map. To be effective, qdisc should be (re)setup after changes of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus I added a numa_node field in struct netdev_queue, containing NUMA node if all cpus included in xps_cpus share same node, else -1. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2010 5 次提交
-
-
由 David S. Miller 提交于
Then we can make a completely generic tcp_remember_stamp() that uses ->get_peer() as a helper, minimizing the AF specific code and minimizing the eventual code duplication when we implement the ipv6 side of TW recycling. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
They are only allowed on cached ipv6 routes. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Now that all of the infrastructure is in place, we can add the ipv6 shorthand for peer creation. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
And make an inet_getpeer_v4() helper, update callers. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Currently only the v4 aspect is used, but this will change. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2010 3 次提交
-
-
由 Eric Dumazet 提交于
Its easy to eat all kernel memory and trigger NMI watchdog, using an exploit program that queues unix sockets on top of others. lkml ref : http://lkml.org/lkml/2010/11/25/8 This mechanism is used in applications, one choice we have is to have a recursion limit. Other limits might be needed as well (if we queue other types of files), since the passfd mechanism is currently limited by socket receive queue sizes only. Add a recursion_level to unix socket, allowing up to 4 levels. Each time we send an unix socket through sendfd mechanism, we copy its recursion level (plus one) to receiver. This recursion level is cleared when socket receive queue is emptied. Reported-by: NМарк Коренберг <socketpair@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Avoid sparse warnings : add __rcu annotations and use rcu_dereference_protected() where necessary. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shan Wei 提交于
1. SCTP_CMD_NUM_VERBS,SCTP_CMD_MAX These two macros have never been used for several years since v2.6.12-rc2. 2.sctp_port_rover,sctp_port_alloc_lock The commit 063930 abandoned global variables of port_rover and port_alloc_lock, but still keep two macros to refer to them. So, remove them now. commit 06393009 Author: Stephen Hemminger <shemminger@linux-foundation.org> Date: Wed Oct 10 17:30:18 2007 -0700 [SCTP]: port randomization Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 11月, 2010 5 次提交
-
-
由 Tom Herbert 提交于
This patch adds XPS_CONFIG option to enable and disable XPS. This is done in the same manner as RPS_CONFIG. This is also fixes build failure in XPS code when SMP is not enabled. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Timo Teräs 提交于
fl->fl_gre_key is network byte order contrary to fl->fl_icmp_*. Make xfrm_flowi_{s|d}port return network byte order values for gre key too. Signed-off-by: NTimo Teräs <timo.teras@iki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 andrew hendry 提交于
Signed-off-by: NAndrew Hendry <andrew.hendry@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When testing struct netdev_queue state against FROZEN bit, we also test XOFF bit. We can test both bits at once and save some cycles. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shan Wei 提交于
These macros have been existed for several years since v2.6.12-rc2. But they never be used. So remove them now. Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 11月, 2010 1 次提交
-
-
由 Thomas Graf 提交于
As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: NThomas Graf <tgraf@infradead.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2010 6 次提交
-
-
由 Johannes Berg 提交于
This adds the ability for drivers to use CQM events to notify about packet loss for specific stations (which could be the AP for the managed mode case). Since the threshold might be determined by the driver (it isn't passed in right now) it will be passed out of the driver to userspace in the event. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Bruno Randolf 提交于
Signed-off-by: NBruno Randolf <br1@einfach.org> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Felix Fietkau 提交于
- store the multicast rate as an index instead of the rate value (reduces cpu overhead in a hotpath) - validate the rate values (must match a bitrate in at least one sband) Signed-off-by: NFelix Fietkau <nbd@openwrt.org> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 John W. Linville 提交于
This reverts commit 86107fd1. This patch inadvertantly changed the userland ABI. Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Tom Herbert 提交于
This patch implements transmit packet steering (XPS) for multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration. This is done by mapping the CPU transmitting the packet to a queue. This is the transmit side analogue to RPS-- where RPS is selecting a CPU based on receive queue, XPS selects a queue based on the CPU (previously there was an XPS patch from Eric Dumazet, but that might more appropriately be called transmit completion steering). Each transmit queue can be associated with a number of CPUs which will use the queue to send packets. This is configured as a CPU mask on a per queue basis in: /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus The mappings are stored per device in an inverted data structure that maps CPUs to queues. In the netdevice structure this is an array of num_possible_cpu structures where each structure holds and array of queue_indexes for queues which that CPU can use. The benefits of XPS are improved locality in the per queue data structures. Also, transmit completions are more likely to be done nearer to the sending thread, so this should promote locality back to the socket on free (e.g. UDP). The benefits of XPS are dependent on cache hierarchy, application load, and other factors. XPS would nominally be configured so that a queue would only be shared by CPUs which are sharing a cache, the degenerative configuration woud be that each CPU has it's own queue. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. bnx2x on 16 core AMD XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU No XPS (16 queues) 996K at 100% CPU Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if device are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possbility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-