Commits · 2787b04b6c5e7607510e8248b38b0aeacb5505f6 · openanolis / cloud-kernel

15 Aug, 2012 6 commits

packet: Introduce net/packet/internal.h header · 2787b04b

Committed by Pavel Emelyanov on Aug 13, 2012

The diag module will need to access some private packet_sock data, so
move it to a header in advance. This file will be shared between
af_packet.c and diag.c.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2787b04b

net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf · ad5b3102

Committed by Igor Maravic on Aug 13, 2012

We've already found the leaf, so don't search for it again. The same goes for the fib leaf info.
Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad5b3102

Replace rwlock on xfrm_policy_afinfo with rcu · 418a99ac

Committed by Priyanka Jain on Aug 12, 2012

xfrm_policy_afinfo is a read-mostly data structure.
Writes to xfrm_policy_afinfo happen only at
configuration time.
So rwlocks can be safely replaced with RCU.

Using RCU improves performance.
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

418a99ac

gre: Support GRE over IPv6 · c12b395a

Committed by xeb@mail.ru on Aug 10, 2012

GRE over IPv6 implementation.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

c12b395a

net: remove netdev_bonding_change() · b7bc2a5b

Committed by Amerigo Wang on Aug 09, 2012

I don't see any benefit in using netdev_bonding_change() rather than
calling call_netdevice_notifiers() directly.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7bc2a5b

net: move and rename netif_notify_peers() · ee89bab1

Committed by Amerigo Wang on Aug 09, 2012

I believe net/core/dev.c is a better place for netif_notify_peers(),
because the other net event notify functions also live in this file.

Also rename it to netdev_notify_peers().

Cc: David S. Miller <davem@davemloft.net>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee89bab1

10 Aug, 2012 7 commits

hyperv: Add comments for the extended buffer after RNDIS message · 0f48917b

Committed by Haiyang Zhang on Aug 09, 2012

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f48917b

net: Loopback ifindex is constant now · 1fb9489b

Committed by Pavel Emelyanov on Aug 08, 2012

As pointed out, there are places that access net->loopback_dev->ifindex,
and once ifindex generation is made per-net this value becomes a constant
equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fb9489b

net: Make ifindex generation per-net namespace · aa79e66e

Committed by Pavel Emelyanov on Aug 08, 2012

Strictly speaking this is only _really_ required for checkpoint-restore to
make the loopback device always have the same index.

This change appears to be safe wrt the "ifindex should be unique per-system"
concept, as all ifindex usage is either already made per net namespace
or is explicitly limited to init_net only.

There are two cool side effects of this. The first one -- ifindices of
devices in a container are always small, regardless of how many containers
we've started (and re-started) so far. The second one is -- we can speed
up the loopback ifindex access as shown in the next patch.

v2: Place ifindex right after dev_base_seq : avoid two holes and use the
same cache line, dirtied in list_netdevice()/unlist_netdevice()
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa79e66e

veth: Allow to create peer link with given ifindex · e6f8f1a7

Committed by Pavel Emelyanov on Aug 08, 2012

The ifinfomsg is in there (thanks kaber@ for foreseeing this a long time ago),
so take the given ifindex and register the netdev with it.

Ben noticed that this code path previously ignored ifmp->ifi_index and
userland could be passing in garbage. Thus it may now fail occasionally
because the value clashes with an existing interface.

To address this it's assumed that if the caller specifies the ifindex for
the veth master device, then it's aware of this possibility and should
explicitly specify (or set to 0 for auto-assignment) the peer's ifindex as
well. With this the compatibility with old tools not setting ifindex is
preserved.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6f8f1a7

net: Allow to create links with given ifindex · 9c7dafbf

Committed by Pavel Emelyanov on Aug 08, 2012

Currently RTM_NEWLINK results in -EOPNOTSUPP if ifinfomsg->ifi_index
is not zero. I propose to allow requesting ifindices on link creation. This
is required by checkpoint-restore to correctly restore a net namespace
(i.e. -- a container).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c7dafbf

net: Dont use ifindices in hash fns · b14f243a

Committed by Pavel Emelyanov on Aug 08, 2012

Eric noticed that once there are devices with equal indices, some
hash functions that use them will become less effective than they could be.
Fix this in advance by mixing the net_device address into the hash value
instead of the device index.

This is true for the arp and ndisc hash fns. The netlabel, can and llc ones
are also ifindex-based, but those three are init_net-only, thus will not
be affected.

Many thanks to David and Eric for the hash32_ptr implementation!
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b14f243a
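
The idea of mixing the device address, rather than the ifindex, into a hash can be sketched in user-space C as follows. This is only an illustration and not the kernel code: `sketch_hash32_ptr()`, `struct sketch_dev`, `sketch_neigh_hash()` and the `seed` parameter are hypothetical names, and the fold-then-multiply construction is an assumption about how such a hash could be built.

```c
#include <stdint.h>

/* Hypothetical stand-in for a net_device; only its address matters here. */
struct sketch_dev {
    int ifindex;
};

/* Fold a pointer into 32 bits (assumed approach: XOR the high half
 * of a 64-bit address into the low half). */
static uint32_t sketch_hash32_ptr(const void *ptr)
{
    uintptr_t val = (uintptr_t)ptr;

#if UINTPTR_MAX > 0xffffffffu
    val ^= val >> 32;
#endif
    return (uint32_t)val;
}

/* Hash a 32-bit key together with the device address instead of
 * dev->ifindex, so devices with equal indices still spread out. */
static uint32_t sketch_neigh_hash(uint32_t key, const struct sketch_dev *dev,
                                  uint32_t seed)
{
    return (key ^ sketch_hash32_ptr(dev)) * seed;
}
```

Two devices that share ifindex 1 in different namespaces still live at different addresses, so under these assumptions they contribute different values to the hash.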

time: jiffies_delta_to_clock_t() helper to the rescue · a399a805

Committed by Eric Dumazet on Aug 08, 2012

Various /proc/net files sometimes report crazy timer values, expressed
in clock_t units.

This happens when an expired timer delta (expires - jiffies) is passed
to jiffies_to_clock_t().

This function has an overflow in :

return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

commit cbbc719f (time: Change jiffies_to_clock_t() argument type
to unsigned long) only got around the problem.

As we can't output negative values in /proc/net/tcp without breaking
various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
that caps the negative delta to a 0 value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: hank <pyu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a399a805
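
The clamping wrapper described in the jiffies_delta_to_clock_t() commit can be illustrated with a small user-space sketch. The tick rates `SKETCH_HZ` and `SKETCH_USER_HZ`, the simplified division, and both function names are assumptions made for illustration; the kernel conversion shown in the commit message is more involved.

```c
#include <stdint.h>

/* Hypothetical tick rates, chosen so that HZ is a multiple of USER_HZ. */
#define SKETCH_HZ      1000UL
#define SKETCH_USER_HZ 100UL

/* Simplified unsigned conversion from ticks to clock_t units. */
static uint64_t sketch_ticks_to_clock_t(unsigned long ticks)
{
    return ticks / (SKETCH_HZ / SKETCH_USER_HZ);
}

/* Wrapper for timer deltas: an already expired timer yields a negative
 * delta, which is capped to 0 before the unsigned conversion. */
static uint64_t sketch_ticks_delta_to_clock_t(long delta)
{
    if (delta < 0)
        delta = 0;
    return sketch_ticks_to_clock_t((unsigned long)delta);
}
```

Without the cap, a delta of -5 would be converted as a very large unsigned number, which is the kind of "crazy timer value" the commit message refers to.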

08 Aug, 2012 4 commits

fib: use __fls() on non null argument · 79cda75a

Committed by Eric Dumazet on Aug 07, 2012

__fls(x) is a bit faster than fls(x), granted we know x is non null.

As Ben Hutchings pointed out, fls(x) = __fls(x) + 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79cda75a
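
The relation fls(x) = __fls(x) + 1 for non-zero x can be checked with a user-space sketch. The helpers below are not the kernel implementations: `sketch_fls()` and `sketch_fls_nonzero()` are hypothetical names, and they rely on the GCC/Clang builtins `__builtin_clz()` and `__builtin_clzl()`.

```c
#include <limits.h>

/* 0-based index of the most significant set bit.
 * Like __fls(), the result is undefined for x == 0. */
static unsigned long sketch_fls_nonzero(unsigned long x)
{
    return (sizeof(x) * CHAR_BIT - 1) - (unsigned long)__builtin_clzl(x);
}

/* 1-based position of the most significant set bit.
 * Like fls(), returns 0 for x == 0, which costs an extra test. */
static int sketch_fls(unsigned int x)
{
    return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}
```

The zero test in `sketch_fls()` is the extra work that can be skipped when the caller already knows the argument is non-zero.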

net: output path optimizations · 425f09ab

Committed by Eric Dumazet on Aug 07, 2012

1) Avoid dirtying neighbour's confirmed field.

  TCP workloads hit this cache line for each incoming ACK.
  Let's write n->confirmed only if there is a jiffy change.

2) Optimize neigh_hh_output() for the common Ethernet case, where
   hh_len is less than 16 bytes. Replace the memcpy() call
   by two inlined 64bit load/stores on x86_64.

Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)

24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.

before : 2.247s, 2.235s, 2.247s, 2.318s
after  : 1.884s, 1.905s, 1.891s, 1.895s
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

425f09ab
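
Point 1 of the output path commit, writing the confirmed timestamp only when it changes, follows a general "check before store" pattern that keeps a shared cache line clean. A minimal sketch, assuming a hypothetical `struct sketch_neigh` and helper `sketch_confirm()` (neither is a kernel name):

```c
/* Hypothetical stand-in for the neighbour entry; only the timestamp
 * field is modelled. */
struct sketch_neigh {
    unsigned long confirmed;
};

/* Store the new timestamp only if it differs from the current one.
 * Returns 1 if a write happened, 0 if the store was skipped. */
static int sketch_confirm(struct sketch_neigh *n, unsigned long now)
{
    if (n->confirmed == now)
        return 0;
    n->confirmed = now;
    return 1;
}
```

When many packets arrive within the same tick, only the first call writes, so the remaining calls read the field without dirtying it.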

documentation: dt: bindings: cpsw: fixing the examples for directly using it in dts file · e07b94f1

Committed by Mugunthan V N on Aug 06, 2012

Fix the cpsw device tree example so that it can be copied and pasted into a dts
file and used directly.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e07b94f1

drivers: net: ethernet: davince_mdio: device tree implementation · ec03e6a8

Committed by Mugunthan V N on Aug 06, 2012

Device tree implementation for the davinci mdio driver.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec03e6a8

07 Aug, 2012 4 commits

tcp: ecn: dont delay ACKS after CE · aae06bf5

Committed by Eric Dumazet on Aug 06, 2012

While playing with CoDel and ECN marking, I discovered a
non optimal behavior of receiver of CE (Congestion Encountered)
segments.

In pathological cases, sender has reduced its cwnd to low values,
and receiver delays its ACK (by 40 ms).

While RFC 3168 6.1.3 (The TCP Receiver) doesn't explicitly recommend
sending immediate ACKs, we believe it's better not to delay ACKs, because
a CE segment should give the same signal as a dropped segment, and it's
quite important to reduce RTT to give ECE/CWR signals as fast as
possible.

Note we already call tcp_enter_quickack_mode() from TCP_ECN_check_ce()
if we receive a retransmit, for the same reason.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aae06bf5

net: tcp: GRO should be ECN friendly · a9e050f4

Committed by Eric Dumazet on Aug 05, 2012

While doing TCP ECN tests, I discovered GRO was reordering packets if it
receives one packet with CE set, while previous packets in same NAPI run
have ECT(0) for the same flow :

09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags
[DF], proto TCP (6), length 4396)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 4344

09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF],
proto TCP (6), length 1500)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 1448

09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF],
proto TCP (6), length 64)
    172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f
(incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val
1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0

We have two problems here :

1) GRO reorders packets

  If NIC gave packet1, then packet2, which happen to be from "different
flows"  GRO feeds stack with packet2, then packet1. I have yet to
understand how to solve this problem.

2) GRO is not ECN friendly

Delivering packets out of order makes TCP stack not as fast as it could
be.

In this patch I suggest we make the tos test not part of the 'same_flow'
determination, but part of the 'should flush' logic
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9e050f4
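
The change proposed in the GRO commit, moving the tos comparison out of the 'same_flow' determination and into the 'should flush' logic, can be sketched as follows. This is a simplified user-space model, not the kernel code: `struct sketch_iphdr`, `struct sketch_gro_result` and `sketch_gro_compare()` are hypothetical, and only a few header fields are modelled.

```c
#include <stdint.h>

/* Hypothetical, reduced view of an IPv4 header. */
struct sketch_iphdr {
    uint32_t saddr;
    uint32_t daddr;
    uint8_t  protocol;
    uint8_t  tos;
};

struct sketch_gro_result {
    int same_flow;  /* incoming packet belongs to the held flow */
    int flush;      /* held packet must be delivered before merging */
};

/* Compare a held packet with an incoming one. The tos byte no longer
 * decides flow membership; a tos change on the same flow only asks
 * for a flush, so delivery order within the flow is preserved. */
static struct sketch_gro_result
sketch_gro_compare(const struct sketch_iphdr *held,
                   const struct sketch_iphdr *incoming)
{
    struct sketch_gro_result r;

    r.same_flow = held->saddr == incoming->saddr &&
                  held->daddr == incoming->daddr &&
                  held->protocol == incoming->protocol;
    r.flush = r.same_flow && held->tos != incoming->tos;
    return r;
}
```

With this split, an ECT(0) packet followed by a CE packet of the same connection is treated as one flow whose held data is flushed first, instead of as two unrelated flows.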

net: reorganize IP MIB values · 14a19680

Committed by Eric Dumazet on Aug 04, 2012

Reduce IP latencies by placing hot MIB IP fields in a single cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14a19680

net: avoid reloads in SNMP_UPD_PO_STATS · d25398df

Committed by Eric Dumazet on Aug 04, 2012

Avoid two instructions to reload the dev->nd_net->mib.ip_statistics pointer
by using a temp variable, in the ip_rcv() and ip_output() paths for example.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d25398df

04 Aug, 2012 19 commits

ipv4: Introduce IN_DEV_NET_ROUTE_LOCALNET · 9eb43e76

Committed by Eric Dumazet on Aug 03, 2012

performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET()
call done in ip_route_input_slow(), because of multiple dereferences,
even if cache lines are clean and available in cpu caches.

Since we already have the 'net' pointer, introduce
IN_DEV_NET_ROUTE_LOCALNET() macro avoiding two dereferences
(dev_net(in_dev->dev))

Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr
or/and daddr are loopback addresses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9eb43e76

ipv4: change inet_addr_hash() · 40384999

Committed by Eric Dumazet on Aug 03, 2012

Use net_hash_mix(net) instead of hash_ptr(net, 8), and use
hash_32() instead of a series of XORs.

Define IN4_ADDR_HSIZE_SHIFT for clarity

__ip_dev_find() can perform the net_eq() call only if ifa_local
matches the key, to avoid unneeded dereferences.

remove inline attributes

# size net/ipv4/devinet.o.before net/ipv4/devinet.o
   text	   data	    bss	    dec	    hex	filename
  17471	   2545	   2048	  22064	   5630	net/ipv4/devinet.o.before
  17335	   2545	   2048	  21928	   55a8	net/ipv4/devinet.o
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40384999

net: skb_share_check() should use consume_skb() · 47061bc4

Committed by Eric Dumazet on Aug 03, 2012

In order to avoid false drop_monitor indications, we should
call consume_skb() if skb_clone() was successful.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47061bc4

Merge branch 'master' of git://kernel.ubuntu.com/rtg/net-next · cc72d100

Committed by David S. Miller on Aug 03, 2012

cc72d100

drivers: net: ethernet: cpsw: Add device tree support to CPSW · 2eb32b0a

Committed by Mugunthan V N on Jul 30, 2012

This patch adds device tree support for the cpsw driver.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2eb32b0a

drivers: net: ethernet: cpsw: Add SOC dependency support for cpsw dependent modules · f07454fe

Committed by Mugunthan V N on Jul 30, 2012

cpsw depends on davinci_cpdma and davinci_mdio, so add SOC support for
the dependent modules.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f07454fe

ppp: add 64 bit stats · e51f6ff3

Committed by Kevin Groeneveld on Jul 27, 2012

Add 64 bit stats to ppp driver.  The 64 bit stats include tx_bytes,
rx_bytes, tx_packets and rx_packets.  Other stats are still 32 bit.
The 64 bit stats can be retrieved via the ndo_get_stats operation.  The
SIOCGPPPSTATS ioctl is still 32 bit stats only.
Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e51f6ff3

team: add support for queue override by setting queue_id for port · 8ff5105a

Committed by Jiri Pirko on Jul 27, 2012

Similar to what bonding has. This allows setting queue_id for a port so
that the port is used when an skb with a matching skb->queue_mapping is
about to be transmitted.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ff5105a

team: add per port priority option · a86fc6b7

Committed by Jiri Pirko on Jul 27, 2012

Allow userspace to set port priority.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

a86fc6b7

team: add signed 32-bit team option type · 69821638

Committed by Jiri Pirko on Jul 27, 2012

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

69821638

netlink: add signed types · 4778e0be

Committed by Jiri Pirko on Jul 27, 2012

Signed types might be needed in NL communication from time to time
(I need s32 in team driver), so add them.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

4778e0be

cris: fix eth_v10.c build error · ff6e1225

Committed by Randy Dunlap on Aug 03, 2012

Fix build error on cris (not tested, no toolchain here):

drivers/net/cris/eth_v10.c: error: too many arguments to function 'e100rxtx_interrupt'
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: linux-cris-kernel@axis.com
Signed-off-by: David S. Miller <davem@davemloft.net>

ff6e1225

cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN · f3a1ef9c

Committed by Peter Meiser on Aug 02, 2012

Hello,

looking at http://sourceforge.net/apps/mediawiki/mbm/index.php?title=Main_Page#Supported_devices, there are branded Ericsson devices from Dell and Toshiba.

The to-be-added vendor IDs are 0x413c for Dell and 0x0930 for Toshiba.

Please find attached a patch to add these vendor IDs.
Signed-off-by: Peter Meiser <meiser@gmx-topmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3a1ef9c

isdnloop: fix and simplify isdnloop_init() · 77f00f63

Committed by Wu Fengguang on Aug 02, 2012

Fix a buffer overflow bug by removing the revision and printk.

[   22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
[   22.097508] isdnloop: (loop0) virtual card added
[   22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
[   22.174400]
[   22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb13-dirty #129
[   22.624071] Call Trace:
[   22.720558]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   22.815248]  [<ffffffff8222b623>] panic+0x110/0x329
[   22.914330]  [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
[   23.014800]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   23.090763]  [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
[   23.185748]  [<ffffffff83244972>] isdnloop_init+0xaf/0xb1
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77f00f63

hyperv: Move wait completion msg code into rndis_filter_halt_device() · ae9e63bb

Committed by Haiyang Zhang on Aug 03, 2012

We need to wait for the send_completion msg before calling put_rndis_request()
at the end of rndis_filter_halt_device(). Otherwise, netvsc_send_completion()
may reference freed memory that has been overwritten, and cause a panic.
Reported-by: Long Li <longli@microsoft.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae9e63bb

net/mlx4_core: Remove port type restrictions · 2207b60f

Committed by Yevgeny Petrilin on Aug 03, 2012

The Port1=Eth, Port2=IB restriction is no longer required.
With RoCE, there will always be an rdma port initialized over a ConnectX
physical port, no matter whether the link layer is IB or Ethernet.
So we always have a dual port IB device.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

2207b60f

net/mlx4_en: Fixing TX queue stop/wake flow · c18520bd

Committed by Yevgeny Petrilin on Aug 03, 2012

Remove the ring->blocked flag; it is redundant and leads to a race:

We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, see that it is 0, and not open the TX queue.

Use netif_tx_queue_stopped to check the state of the queue to avoid this race.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

c18520bd

net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC · c8c40b7f

Committed by Amir Vadai on Aug 03, 2012

Should NOT check SMAC=DMAC when:
1. loopback is turned on
2. validate_loopback is true.

Fixed it accordingly.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8c40b7f

net_sched: gact: Fix potential panic in tcf_gact(). · 696ecdc1

Committed by Hiroaki SHIMODA on Aug 03, 2012

The gact_rand array is indexed by gact->tcfg_ptype, whose value
is assumed to be less than MAX_RAND, but no range check is
performed.

So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

696ecdc1
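
The gact fix follows a common pattern: validate an index once at configuration time so that the fast path can use it without a check. A minimal user-space sketch, in which `SKETCH_MAX_RAND`, `sketch_table`, `sketch_init()` and `sketch_lookup()` are all hypothetical stand-ins rather than the kernel's names or values:

```c
#include <errno.h>

/* Hypothetical table size and contents. */
#define SKETCH_MAX_RAND 3

static const int sketch_table[SKETCH_MAX_RAND] = { 10, 20, 30 };

/* Configuration path: reject an out-of-range index up front.
 * On success the validated index is stored for later use. */
static int sketch_init(unsigned int ptype, unsigned int *stored)
{
    if (ptype >= SKETCH_MAX_RAND)
        return -EINVAL;
    *stored = ptype;
    return 0;
}

/* Fast path: the index was validated at init time, so no range
 * check (and no extra branch) is needed here. */
static int sketch_lookup(unsigned int stored)
{
    return sketch_table[stored];
}
```

Rejecting the value at init time means an invalid index supplied by userspace never reaches the array access.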
