- 16 2月, 2011 1 次提交
-
-
由 Ben Hutchings 提交于
If the root qdisc for a net device is mqprio, and the driver's ndo_setup_tc() operation dynamically adds and remvoes TX queues, netif_set_real_num_tx_queues() will be called during device unregistration to remove the extra TX queues when the qdisc is destroyed. Currently this causes the corresponding kobjects to be leaked, and the device's reference count never drops to 0. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
-
- 14 2月, 2011 3 次提交
-
-
由 Jiri Pirko 提交于
This patch allows userspace to enslave/release slave devices via netlink interface using IFLA_MASTER. This introduces generic way to add/remove underling devices. Signed-off-by: NJiri Pirko <jpirko@redhat.com> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
dev->master is now tightly connected to bonding driver. This patch makes this pointer more general and ready to be used by others. - netdev_set_master() - bond specifics moved to new function netdev_set_bond_master() - introduced netif_is_bond_slave() to check if device is a bonding slave Signed-off-by: NJiri Pirko <jpirko@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
No need to check (master) twice and to drive in and out the header file. Signed-off-by: NJiri Pirko <jpirko@redhat.com> Reviewed-by: NNicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 2月, 2011 1 次提交
-
-
由 Xiaotian Feng 提交于
commit a512b92b adds sysfs entry for net device group, but before this commit, tun also uses group sysfs, so after this commit checkin, kernel warns like this: sysfs: cannot create duplicate filename '/devices/virtual/net/vnet0/group' Since tun has used this for years, rename sysfs under tun might break existing userspace, so rename group sysfs entry for net device group is a better choice. Signed-off-by: NXiaotian Feng <dfeng@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 2月, 2011 1 次提交
-
-
由 David S. Miller 提交于
In commit aa942104 ("net: init ingress queue") we moved the allocation and lock initialization of the queues into alloc_netdev_mq() since register_netdevice() is way too late. The problem is that dev->type is not setup until the setup() callback is invoked by alloc_netdev_mq(), and the dev->type is what determines the lockdep class to use for the locks in the queues. Fix this by doing the queue allocation after the setup() callback runs. This is safe because the setup() callback is not allowed to make any state changes that need to be undone on error (memory allocations, etc.). It may, however, make state changes that are undone by free_netdev() (such as netif_napi_add(), which is done by the ipoib driver's setup routine). The previous code also leaked a reference to the &init_net namespace object on RX/TX queue allocation failures. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 2月, 2011 1 次提交
-
-
由 Andy Gospodarek 提交于
Like Herbert's change from a few days ago: 66c46d74 gro: Reset dev pointer on reuse this may not be necessary at this point, but we should still clean up the skb->skb_iif. If not we may end up with an invalid valid for skb->skb_iif when the skb is reused and the check is done in __netif_receive_skb. Signed-off-by: NAndy Gospodarek <andy@greyhouse.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 2月, 2011 1 次提交
-
-
由 Tom Herbert 提交于
In get_rps_cpu, add check that the rps_flow_table for the device is NULL when trying to take fast path when RPS map length is one. Without this, RFS is effectively disabled if map length is one which is not correct. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 1月, 2011 2 次提交
-
-
由 Eric W. Biederman 提交于
Ed Swierk <eswierk@bigswitch.com> writes: > On 2.6.35.7 > ip link add link eth0 netns 9999 type macvlan > where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang: > [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d > [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0 > [10663.821944] Oops: 0000 [#1] SMP > [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq > [10663.821959] CPU 3 > [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class > [10663.822155] > [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO > [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286 > [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000 > [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000 > [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041 > [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818 > [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000 > [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000 > [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0 > [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0) > [10663.822236] Stack: > [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6 > [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818 > [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413 > [10663.822281] Call Trace: > [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0 > [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90 > [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0 > [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570 > [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570 > [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20 > [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290 > [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290 > [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0 > [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40 > [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0 > [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0 > [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120 > [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150 > [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80 > [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110 > [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70 > [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0 > [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0 > [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560 > [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150 > [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350 > [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b > [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55 > [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822627] RSP <ffff88014aebf7b8> > [10663.822631] CR2: 000000000000006d > [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]--- This bug was introduced in: commit 81adee47 Author: Eric W. Biederman <ebiederm@aristanetworks.com> Date: Sun Nov 8 00:53:51 2009 -0800 net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Where apparently I forgot to add error handling to the path where we create a new network device in a new network namespace, and pass in an invalid pid. Cc: stable@kernel.org Reported-by: NEd Swierk <eswierk@bigswitch.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
On older kernels the VLAN code may zero skb->dev before dropping it and causing it to be reused by GRO. Unfortunately we didn't reset skb->dev in that case which causes the next GRO user to get a bogus skb->dev pointer. This particular problem no longer happens with the current upstream kernel due to changes in VLAN processing. However, for correctness we should still reset the skb->dev pointer in the GRO reuse function in case a future user does the same thing. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 1月, 2011 1 次提交
-
-
由 David S. Miller 提交于
If there are no explicit metrics attached to a route, hook fi->fib_info up to dst_default_metrics. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 1月, 2011 3 次提交
-
-
由 Eric Dumazet 提交于
Commit c6d14c84 (net: Introduce for_each_netdev_rcu() iterator) added a race in dev_seq_next(). The rcu_dereference() call should be done _before_ testing the end of list, or we might return a wrong net_device if a concurrent thread changes net_device list under us. Note : discovered thanks to a sparse warning : net/core/dev.c:3919:9: error: incompatible types in comparison expression (different address spaces) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
pskb_expand_head() triggers a kmemcheck warning when copy of skb_shared_info is done in pskb_expand_head() This is because destructor_arg field is not necessarily initialized at this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to instruct kmemcheck this is a normal situation. Resolves bugzilla.kernel.org 27212 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212Reported-by: NChristian Casteyde <casteyde.christian@free.fr> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kurt Van Dijck 提交于
I'm testing an API that uses IFLA_AF_SPEC attribute. In the rtnetlink core , the set_link_af() member of the rtnl_af_ops struct receives the nested attribute (as I expected), but the validate_link_af() member receives the parent attribute. IMO, this patch fixes this. Signed-off-by: NKurt Van Dijck <kurt.van.dijck@eia.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 1月, 2011 1 次提交
-
-
由 David S. Miller 提交于
Routing metrics are now copy-on-write. Initially a route entry points it's metrics at a read-only location. If a routing table entry exists, it will point there. Else it will point at the all zero metric place-holder called 'dst_default_metrics'. The writeability state of the metrics is stored in the low bits of the metrics pointer, we have two bits left to spare if we want to store more states. For the initial implementation, COW is implemented simply via kmalloc. However future enhancements will change this to place the writable metrics somewhere else, in order to increase sharing. Very likely this "somewhere else" will be the inetpeer cache. Note also that this means that metrics updates may transiently fail if we cannot COW the metrics successfully. But even by itself, this patch should decrease memory usage and increase cache locality especially for routing workloads. In those cases the read-only metric copies stay in place and never get written to. TCP workloads where metrics get updated, and those rare cases where PMTU triggers occur, will take a very slight performance hit. But that hit will be alleviated when the long-term writable metrics move to a more sharable location. Since the metrics storage went from a u32 array of RTAX_MAX entries to what is essentially a pointer, some retooling of the dst_entry layout was necessary. Most importantly, we need to preserve the alignment of the reference count so that it doesn't share cache lines with the read-mostly state, as per Eric Dumazet's alignment assertion checks. The only non-trivial bit here is the move of the 'flags' member into the writeable cacheline. This is OK since we are always accessing the flags around the same moment when we made a modification to the reference count. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 1月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
We spend lot of time clearing pages in pktgen. (Or not clearing them on ipv6 and leaking kernel memory) Since we dont modify them, we can use one zeroed page, and get references on it. This page can use NUMA affinity as well. Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6 Results using skbs with one frag : Before patch : Result: OK: 608980458(c608978520+d1938) nsec, 1000000000 (100byte,1frags) 1642088pps 1313Mb/sec (1313670400bps) errors: 0 After patch : Result: OK: 345285014(c345283891+d1123) nsec, 1000000000 (100byte,1frags) 2896158pps 2316Mb/sec (2316926400bps) errors: 0 Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 1月, 2011 8 次提交
-
-
由 Vlad Dogaru 提交于
The group of a network device can be queried or changed from userspace using sysfs. For example, considering sysfs mounted in /sys, one can change the group that interface lo belongs to: echo 1 > /sys/class/net/lo/group Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org> Acked-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eugene Teo 提交于
There is a conflict between commit b00916b1 and a77f5db3. This patch resolves the conflict by clearing the heap allocation in ethtool_get_regs(). Cc: stable@kernel.org Signed-off-by: NEugene Teo <eugeneteo@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will be used by ethtool and might spam dmesg unnecessarily. This converts the function to use netdev_info() instead of plain printk(). As a side effect, bonding and bridge devices will now log dropped features on every slave device change. Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
Allow drivers for multiqueue hardware with flow filter tables to accelerate RFS. The driver must: 1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion IRQs (in queue order). This will provide a mapping from CPUs to the queues for which completions are handled nearest to them. 2. Implement net_device_ops::ndo_rx_flow_steer. This operation adds or replaces a filter steering the given flow to the given RX queue, if possible. 3. Periodically remove filters for which rps_may_expire_flow() returns true. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Schmidt 提交于
Suppose that several linear skbs of the same flow were received by GRO. They were thus merged into one skb with a frag_list. Then a new skb of the same flow arrives, but it is a paged skb with data starting in its frags[]. Before adding the skb to the frag_list skb_gro_receive() will of course adjust the skb to throw away the headers. It correctly modifies the page_offset and size of the frag, but it leaves incorrect information in the skb: ->data_len is not decreased at all. ->len is decreased only by headlen, as if no change were done to the frag. Later in a receiving process this causes skb_copy_datagram_iovec() to return -EFAULT and this is seen in userspace as the result of the recv() syscall. In practice the bug can be reproduced with the sfc driver. By default the driver uses an adaptive scheme when it switches between using napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is reproduced when under rx load with enough successful GRO merging the driver decides to switch from the former to the latter. Manual control is also possible, so reproducing this is easy with netcat: - on machine1 (with sfc): nc -l 12345 > /dev/null - on machine2: nc machine1 12345 < /dev/zero - on machine1: echo 1 > /sys/module/sfc/parameters/rx_alloc_method # use skbs echo 2 > /sys/module/sfc/parameters/rx_alloc_method # use pages - See that nc has quit suddenly. [v2: Modified by Eric Dumazet to avoid advancing skb->data past the end and to use a temporary variable.] Signed-off-by: NMichal Schmidt <mschmidt@redhat.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Commit 941666c2 "net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()" introduced a regression, reported by Jamie Heilman. "arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert in pneigh_lookup() Removing RTNL requirement from arp_ioctl() was a mistake, just revert that part. Reported-by: NJamie Heilman <jamie@audible.transient.net> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 1月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 1月, 2011 4 次提交
-
-
由 Patrick McHardy 提交于
rtnl_group_changelink() is invoked by rtnl_newlink() before the link attributes have been validated. Additionally the group changes are performed even if NLM_F_CREATE is specified and a new link is created, while more reasonable semantics would be to set the group value on the newly created link. Fix both problems by moving the rtnl_group_changelink() invocation down to the handling of non-existant links without NLM_F_CREATE() and add a dev_set_group() call to rtnl_create_link(). Signed-off-by: NPatrick McHardy <kaber@trash.net> Acked-by: NVlad Dogaru <ddvlad@rosedu.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
fix some minor issues and sparse (__rcu) warnings Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This patch converts stab qdisc management to RCU, so that we can perform the qdisc_calculate_pkt_len() call before getting qdisc lock. This shortens the lock's held time in __dev_xmit_skb(). This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of cache misses and so reducing latencies. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Octavian Purdila <opurdila@ixiacom.com> Reviewed-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2011 5 次提交
-
-
由 John Fastabend 提交于
This patch provides a mechanism for lower layer devices to steer traffic using skb->priority to tx queues. This allows for hardware based QOS schemes to use the default qdisc without incurring the penalties related to global state and the qdisc lock. While reliably receiving skbs on the correct tx ring to avoid head of line blocking resulting from shuffling in the LLD. Finally, all the goodness from txq caching and xps/rps can still be leveraged. Many drivers and hardware exist with the ability to implement QOS schemes in the hardware but currently these drivers tend to rely on firmware to reroute specific traffic, a driver specific select_queue or the queue_mapping action in the qdisc. By using select_queue for this drivers need to be updated for each and every traffic type and we lose the goodness of much of the upstream work. Firmware solutions are inherently inflexible. And finally if admins are expected to build a qdisc and filter rules to steer traffic this requires knowledge of how the hardware is currently configured. The number of tx queues and the queue offsets may change depending on resources. Also this approach incurs all the overhead of a qdisc with filters. With the mechanism in this patch users can set skb priority using expected methods ie setsockopt() or the stack can set the priority directly. Then the skb will be steered to the correct tx queues aligned with hardware QOS traffic classes. In the normal case with single traffic class and all queues in this class everything works as is until the LLD enables multiple tcs. To steer the skb we mask out the lower 4 bits of the priority and allow the hardware to configure upto 15 distinct classes of traffic. This is expected to be sufficient for most applications at any rate it is more then the 8021Q spec designates and is equal to the number of prio bands currently implemented in the default qdisc. This in conjunction with a userspace application such as lldpad can be used to implement 8021Q transmission selection algorithms one of these algorithms being the extended transmission selection algorithm currently being used for DCB. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Dogaru 提交于
If a rtnetlink request specifies a negative or zero ifindex and has no interface name attribute, but has a group attribute, then the chenges are made to all the interfaces belonging to the specified group. Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org> Acked-by: NJamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Dogaru 提交于
Net devices can now be grouped, enabling simpler manipulation from userspace. This patch adds a group field to the net_device structure, as well as rtnetlink support to query and modify it. Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org> Acked-by: NJamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit 03634668 (net offloading: Convert checksums to use centrally computed features.) mistakenly swapped can_checksum_protocol() arguments. This broke IPv6 on bnx2 for instance, on NIC without TCPv6 checksum offloads. Reported-by: NHans de Bruin <jmdebruin@xmsnet.nl> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This reverts commit 0ab03c2b. It breaks several things including the avahi daemon. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 1月, 2011 2 次提交
-
-
由 Eric Dumazet 提交于
Packet filter (BPF) doesnt need to disable softirqs, being fully re-entrant and lock-less. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesse Gross 提交于
In netif_skb_features() we return only the features that are valid for vlans if we have a vlan packet. However, we should not mask out NETIF_F_HW_VLAN_TX since it enables transmission of vlan tags and is obviously valid. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NJesse Gross <jesse@nicira.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert ab35cd4b (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.htmlReported-by: NNeil Jones <neiljay@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 1月, 2011 1 次提交
-
-
由 KOVACS Krisztian 提交于
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2Reported-by: NRandy Dunlap <randy.dunlap@oracle.com> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NKOVACS Krisztian <hidden@balabit.hu> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 11 1月, 2011 2 次提交
-
-
由 Eric Dumazet 提交于
HTB takes into account skb is segmented in stats updates. Generalize this to all schedulers. They should use qdisc_bstats_update() helper instead of manipulating bstats.bytes and bstats.packets Add bstats_update() helper too for classes that use gnet_stats_basic_packed fields. Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no stab is setup on qdisc. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Added alloc_netdev_mqs function which allows the number of transmit and receive queues to be specified independenty. alloc_netdev_mq was changed to a macro to call the new function. Also added alloc_etherdev_mqs with same purpose. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-