Commits · 15f1bb1f1e067be7088ed43ef23d59629bd24348 · openanolis / cloud-kernel

30 Jul, 2015 23 commits

qlcnic: Fix corruption while copying · 15f1bb1f

Authored by Shahed Shaikh on Jul 29, 2015

Use proper typecasting while performing byte-by-byte copy.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15f1bb1f

act_bpf: fix memory leaks when replacing bpf programs · f4eaed28

Authored by Daniel Borkmann on Jul 29, 2015

We currently trigger multiple memory leaks when replacing bpf
actions, besides others:

  comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
  hex dump (first 32 bytes):
    01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
    18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00  ...m............
  backtrace:
    [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0
    [<ffffffff8120a37a>] __vmalloc+0x4a/0x50
    [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0
    [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0
    [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
    [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0
    [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0
    [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0
    [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
    [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf]
    [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910
    [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240
    [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0
    [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40
    [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0

The issue is that the old content from tcf_bpf is allocated and needs
to be released when we replace it. We seem to have leaked it since the
beginning of act_bpf for the filter and insns, and later on for the
name as well.

Example test case, after patch:

  # FOO="1,6 0 0 4294967295,"
  # BAR="1,6 0 0 4294967294,"
  # tc actions add action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$BAR" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
   index 2 ref 1 bind 0
  # tc actions replace action bpf bytecode "$FOO" index 2
  # tc actions show action bpf
   action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
   index 2 ref 1 bind 0
  # tc actions del action bpf index 2
  [...]
  # echo "scan" > /sys/kernel/debug/kmemleak
  # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
  0

Fixes: d23b8ad8 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4eaed28

Merge branch 'thunderx-fixes' · f68b1231

Authored by David S. Miller on Jul 29, 2015

Aleksey Makarov says:

====================
net: thunderx: Misc fixes

Miscellaneous fixes for the ThunderX VNIC driver

All the patches can be applied individually.
It's ok to drop some if the maintainer feels uncomfortable
with applying for 4.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f68b1231

net: thunderx: Fix for crash while BGX teardown · 60f83c89

Authored by Thanneeru Srinivasulu on Jul 29, 2015

The Cortina PHY does not have a kernel driver, and we don't attach
the device to the PHY layer for interfaces like XFI, XLAUI, etc.
Hence check the interface type before calling disconnect.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60f83c89

net: thunderx: Add PCI driver shutdown routine · 4adf4351

Authored by Sunil Goutham on Jul 29, 2015

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4adf4351

net: thunderx: Fix crash when changing rss with multiple traffic flows · b49087dd

Authored by Sunil Goutham on Jul 29, 2015

This fixes a crash when changing rss with multiple traffic flows.

During interface teardown, disable tx queues after all NAPI threads
are done. Otherwise, tx queues might be woken up inside NAPI
if any CQE_TX are processed.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b49087dd

net: thunderx: Set watchdog timeout value · 3d7a8aaa

Authored by Sunil Goutham on Jul 29, 2015

If a txq (SQ) remains in the stopped state after this timeout, it is
considered stuck and the interface is reinitialized.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d7a8aaa

net: thunderx: Wakeup TXQ only if CQE_TX are processed · 74840b83

Authored by Sunil Goutham on Jul 29, 2015

Previously the TXQ was woken up whenever NAPI was executed,
irrespective of whether any CQE_TX were processed.
Added 'txq_stop' and 'txq_wake' counters to aid in debugging
if there are any future issues.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74840b83

net: thunderx: Suppress alloc_pages() failure warnings · f8ce9666

Authored by Sunil Goutham on Jul 29, 2015

Suppress standard alloc_pages() warnings. Some kernel configs limit
alloc size and the network driver may fail. Do not emit a kernel
warning in this case; instead just print a one-liner saying that the
network driver could not be loaded since the buffer could not be allocated.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8ce9666

net: thunderx: Fix TSO packet statistic · 2cb468e0

Authored by Sunil Goutham on Jul 29, 2015

Fix TSO packets not being counted.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cb468e0

net: thunderx: Fix memory leak when changing queue count · c62cd3c4

Authored by Sunil Goutham on Jul 29, 2015

Fix for memory leak when changing queue/channel count via ethtool.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c62cd3c4

net: thunderx: Fix RQ_DROP miscalculation · 32c1b965

Authored by Sunil Goutham on Jul 29, 2015

With the earlier configured value, a sufficient number of CQEs was not
being reserved for transmitted packets. Hence, under heavy incoming
traffic load, receive notifications take up most of the CQ, so
transmit notifications are lost, resulting in tx skbs not
being freed.

Eventually the SQ becomes full and is stopped, and the watchdog timer
kicks in. After this fix, receive notifications will not take
more than half of the CQ, reserving the rest for transmit notifications.

Also changed CQ & SQ sizes from 16k to 4k.
This is also due to the receive notifications taking the first half of
the CQ under heavy load: the time taken by NAPI to clear transmit
notifications increases with larger queue sizes, which again results in
the SQ being stopped.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32c1b965

net: thunderx: Fix memory leak while tearing down interface · 143ceb0b

Authored by Sunil Goutham on Jul 29, 2015

Fixed 'tso_hdrs' memory not being freed properly.
Also fixed SQ skbuff maintenance issues.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

143ceb0b

net: thunderx: Fix data integrity issues with LDWB · 4b561c17

Authored by Sunil Goutham on Jul 29, 2015

Switch back to LDD transactions from LDWB.

While transmitting packets with LDWB transactions,
data integrity issues are seen very frequently,
hence the switch back to LDD.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b561c17

ipv6: flush nd cache on IFF_NOARP change · c8507fb2

Authored by Eric Dumazet on Jul 29, 2015

This patch is the IPv6 equivalent of commit
6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")

Without it, we keep buggy neighbours in the cache, with destination
MAC address equal to our own MAC address.

Tested:
 tcpdump -i eth0 -s 0 ip6 -n -e &
 ip link set dev eth0 arp off
 ping6 remote   // sends buggy frames
 ip link set dev eth0 arp on
 ping6 remote   // should work once kernel is patched
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mario Fanelli <mariofanelli@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8507fb2

Merge branch 'netcp-fixes' · b2428f94

Authored by David S. Miller on Jul 29, 2015

Murali Karicheri says:

====================
net: netcp: bug fixes for dynamic module support

This series fixes a few bugs to allow keystone netcp modules to be
dynamically loaded and removed. Currently it allows the following
sequence to be run multiple times:

 insmod cpsw_ale.ko
 insmod davinci_mdio.ko
 insmod keystone_netcp.ko
 insmod keystone_netcp_ethss.ko
 ifup eth0
 ifup eth1
 ping <hosts on eth0>
 ping <hosts on eth1>
 ifdown eth1
 ifdown eth0
 rmmod keystone_netcp_ethss.ko
 rmmod keystone_netcp.ko
 rmmod davinci_mdio.ko
 rmmod cpsw_ale.ko
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b2428f94

net: netcp: ethss: cleanup gbe_probe() and gbe_remove() functions · 31a184b7

Authored by Karicheri, Muralidharan on Jul 28, 2015

This patch cleans up the error handling code to use goto labels properly.
In some cases, the code unnecessarily uses goto instead of just returning
the error code. The code also makes explicit calls to devm_* APIs on error,
which is not necessary. gbe_remove() makes similar calls, which are
also unnecessary.

Also fix a few checkpatch warnings.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31a184b7

net: netcp: ethss: fix up incorrect use of list api · c20afae7

Authored by Karicheri, Muralidharan on Jul 28, 2015

The code seems to assume that first_sec_slave() returns NULL when the
list is empty in order to break the loop, which is incorrect. Fix the
code by using list_empty().
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
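
As an illustration of the list API point in this commit message (a minimal
userspace sketch, not the driver code): in a circular doubly linked list of
the kind the kernel uses, the "first entry" of an empty list is computed from
the head itself, so it is never NULL. The names below (struct slave,
first_slave(), list_is_empty(), drain()) are hypothetical stand-ins chosen for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h; h->prev = h; }

/* An empty list is one whose head points back at itself. */
bool list_is_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail_node(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

void list_del_node(struct list_head *n)
{
    assert(n->next != NULL && n->prev != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/* Hypothetical element type embedding a list node. */
struct slave { int id; struct list_head node; };

/*
 * "First entry" helper: subtracts the member offset from head->next.
 * On an empty list head->next is the head itself, so the result is a
 * bogus non-NULL pointer, never NULL.
 */
struct slave *first_slave(struct list_head *h)
{
    return (struct slave *)((uintptr_t)h->next - offsetof(struct slave, node));
}

/* Drain the list: terminate on list_is_empty(), not on a NULL check. */
int drain(struct list_head *h)
{
    int removed = 0;

    while (!list_is_empty(h)) {
        struct slave *s = first_slave(h);
        list_del_node(&s->node);
        removed++;
    }
    return removed;
}
```

A loop written as "while ((s = first_slave(h)) != NULL)" would never stop on
its own here, which is the kind of mistake the commit message describes.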

c20afae7

net: netcp: fix cleanup interface list in netcp_remove() · 01a03099

Authored by Karicheri, Muralidharan on Jul 28, 2015

Currently, if the user does rmmod keystone_netcp.ko, the following
warning is seen:

[   59.035891] ------------[ cut here ]------------
[   59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
netcp_core.c:2127 netcp_remove)

This is because the interface list is not cleaned up in netcp_remove.
This patch fixes this. Also fix some checkpatch related warnings.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01a03099

ebpf, x86: fix general protection fault when tail call is invoked · 2482abb9

Authored by Daniel Borkmann on Jul 28, 2015

With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):

  [  927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  [...]
  [  927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
  [  927.102096] RIP: 0010:[<ffffffffa002440d>]  [<ffffffffa002440d>] 0xffffffffa002440d
  [  927.103390] RSP: 0018:ffff880016a67a68  EFLAGS: 00010006
  [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
  [  927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
  [  927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
  [  927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
  [  927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
  [  927.110787] FS:  00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
  [  927.112021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
  [  927.114500] Stack:
  [  927.115737]  ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
  [  927.117005]  0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
  [  927.118276]  00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
  [  927.119543] Call Trace:
  [  927.120797]  [<ffffffff8106c548>] ? lookup_address+0x28/0x30
  [  927.122058]  [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
  [  927.123314]  [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
  [  927.124562]  [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
  [  927.125806]  [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
  [  927.127033]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.128254]  [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
  [  927.129461]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.130654]  [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
  [  927.131837]  [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
  [  927.133015]  [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
  [  927.134195]  [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
  [  927.135367]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.136523]  [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
  [  927.137666]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.138802]  [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
  [  927.139934]  [<ffffffffa022b0d5>] 0xffffffffa022b0d5
  [  927.141066]  [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
  [  927.142199]  [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
  [  927.143323]  [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
  [  927.144450]  [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
  [  927.145572]  [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
  [  927.146666]  [<ffffffff817f9a9f>] tracesys+0xd/0x44
  [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
                       c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
                       0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
                       ff 4c
  [  927.150046] RIP  [<ffffffffa002440d>] 0xffffffffa002440d

The code section with the instructions that traps points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).

Using bpf_jit_disasm -o on the eBPF root program image:

  [...]
  4e:   mov    -0x204(%rbp),%eax
        8b 85 fc fd ff ff
  54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
        83 f8 20
  57:   ja     0x000000000000007a
        77 21
  59:   add    $0x1,%eax                <--- tail_call_cnt++
        83 c0 01
  5c:   mov    %eax,-0x204(%rbp)
        89 85 fc fd ff ff
  62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 44 d6 80
  67:   mov    (%rax),%rax
        48 8b 00
  6a:   cmp    $0x0,%rax                <--- check for NULL
        48 83 f8 00
  6e:   je     0x000000000000007a
        74 0a
  70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
        48 8b 40 20                              [ matches <48> 8b 40 20 ... from above ]
  74:   add    $0x33,%rax               <--- prologue skip of new prog
        48 83 c0 33
  78:   jmpq   *%rax                    <--- jump to new prog insns
        ff e0
  [...]

The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:

lea instruction has a wrong offset, i.e. it should be ...

  lea    0x80(%rsi,%rdx,8),%rax

... but it actually seems to be ...

  lea   -0x80(%rsi,%rdx,8),%rax

... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the interpreter, we btw
similarly do:

  [...]
  c88:  lea     0x80(%rax,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 84 d0 80 00 00 00
  c90:  add     $0x1,%r13d
        41 83 c5 01
  c94:  mov     (%rax),%rax
        48 8b 00
  [...]

Now the other interesting fact is that this panic triggers only when things
like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
members).

Changing the emitter to always use the 4 byte displacement in the lea
instruction fixes the panic on my side. It increases the tail call instruction
emission by 3 more bytes, but it should cover us from various combinations
(and perhaps other future increases on related structures).
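
To make the displacement issue concrete, here is a small userspace sketch (an
illustration under stated assumptions, not the JIT's actual emitter). An 8-bit
displacement is sign-extended by the CPU, so the byte 0x80 decodes as -0x80,
while 0x50 still fits. The opcode bytes are the ones shown in the
disassemblies in this message; the helper names decode_disp8(),
emit_lea_disp8() and emit_lea_disp32() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend an 8-bit displacement the way a disp8 operand is decoded. */
int32_t decode_disp8(uint8_t byte)
{
    return byte >= 0x80 ? (int32_t)byte - 256 : (int32_t)byte;
}

/*
 * Emit "lea off(%rsi,%rdx,8),%rax" with an 8-bit displacement
 * (48 8d 44 d6 XX). Only correct for offsets in -128..127; larger
 * offsets are silently truncated to one byte.
 */
size_t emit_lea_disp8(uint8_t *buf, int32_t off)
{
    assert(buf != NULL);
    buf[0] = 0x48; buf[1] = 0x8d; buf[2] = 0x44; buf[3] = 0xd6;
    buf[4] = (uint8_t)off;
    return 5;
}

/*
 * Emit the same lea with a 32-bit little-endian displacement
 * (48 8d 84 d6 XX XX XX XX). Three bytes longer, but covers any
 * 32-bit offset.
 */
size_t emit_lea_disp32(uint8_t *buf, int32_t off)
{
    uint32_t u = (uint32_t)off;

    assert(buf != NULL);
    buf[0] = 0x48; buf[1] = 0x8d; buf[2] = 0x84; buf[3] = 0xd6;
    buf[4] = (uint8_t)(u & 0xff);
    buf[5] = (uint8_t)((u >> 8) & 0xff);
    buf[6] = (uint8_t)((u >> 16) & 0xff);
    buf[7] = (uint8_t)((u >> 24) & 0xff);
    return 8;
}
```

With an offset of 0x80, the disp8 form stores the byte 0x80, which the CPU
reads back as -0x80; the disp32 form stores 80 00 00 00 and keeps the offset
positive. An alternative design would pick disp8 only when the offset fits in
-128..127, but the message above describes always using the 4 byte form.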

After patch, disassembly:

  [...]
  9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
        48 8d 84 d6 80 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

  [...]
  9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
        48 8d 84 d6 50 00 00 00
  a6:   mov    (%rax),%rax
        48 8b 00
  [...]

Fixes: b52f00e6 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2482abb9

bridge: mdb: fix delmdb state in the notification · 7ae90a4f

Authored by Nikolay Aleksandrov on Jul 28, 2015

Since mdb states were introduced when deleting an entry the state was
left as it was set in the delete request from the user which leads to
the following output when doing a monitor (for example):
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^^
Note the "temp" state in the delete notification which is wrong since
the entry was permanent, the state in a delete is always reported as
"temp" regardless of the real state of the entry.

After this patch:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

There's one important note to make here: the state is actually not
matched when doing a delete, so one can delete a permanent entry by
stating "temp" at the end of the command. I've chosen this fix in order
not to break user-space tools which rely on this (incorrect) behaviour.

So to give an example after this patch and using the wrong state:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

Note the state of the entry that got deleted is correct in the
notification.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>

7ae90a4f

bridge: mcast: give fast leave precedence over multicast router and querier · 544586f7

Authored by Satish Ashok on Jul 28, 2015

When fast leave is configured on a bridge port and an IGMP leave is
received for a group, the group is not deleted immediately if there is
a router detected or if a multicast querier is configured.
Ideally the group should be deleted immediately when fast leave is
configured.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

544586f7

bridge: Fix network header pointer for vlan tagged packets · df356d5e

Authored by Toshiaki Makita on Jul 28, 2015

There are several devices that can receive vlan tagged packets with
CHECKSUM_PARTIAL like tap, possibly veth and xennet.
When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
by bridge to a device with the IP_CSUM feature, they end up with checksum
error because before entering bridge, the network header is set to
ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.

Since the network header is expected to be ETH_HLEN in flow-dissection
and hash-calculation in RPS in rx path, and since the header pointer fix
is needed only in tx path, set the appropriate network header on forwarding
packets.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

df356d5e

29 Jul, 2015 4 commits

packet: tpacket_snd(): fix signed/unsigned comparison · dbd46ab4

Authored by Alexander Drozdov on Jul 28, 2015

tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:

tp_len > dev->mtu + dev->hard_header_len

as dev->mtu and dev->hard_header_len are both unsigned.

That may lead to just returning an incorrect EMSGSIZE errno
to the user.
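
The conversion behind this can be sketched in plain C (an illustration only;
the function names and the explicit casts are hypothetical and this is not
the patch itself). When a signed int is compared with an unsigned int, the
signed operand is converted to unsigned, so a negative errno value turns into
a very large number.

```c
#include <stdbool.h>

/*
 * What the original comparison effectively does: the usual arithmetic
 * conversions turn tp_len into an unsigned value before comparing.
 * The cast below just makes that implicit conversion visible.
 */
bool too_long_buggy(int tp_len, unsigned int mtu, unsigned int hard_header_len)
{
    return (unsigned int)tp_len > mtu + hard_header_len;
}

/*
 * One possible guard: treat negative values as error codes and only
 * apply the size check to real lengths.
 */
bool too_long_checked(int tp_len, unsigned int mtu, unsigned int hard_header_len)
{
    if (tp_len < 0)
        return false; /* an error code, not a length */
    return (unsigned int)tp_len > mtu + hard_header_len;
}
```

For example, with tp_len = -22, mtu = 1500 and hard_header_len = 14, the first
function reports the packet as too long, while the second leaves the error
code to be handled separately.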

Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbd46ab4

arp: filter NOARP neighbours for SIOCGARP · 11c91ef9

Authored by Eric Dumazet on Jul 27, 2015

When arp is off on a device, and ioctl(SIOCGARP) is queried,
a buggy answer is given with MAC address of the device, instead
of the mac address of the destination/gateway.

We filter out NUD_NOARP neighbours for /proc/net/arp,
we must do the same for SIOCGARP ioctl.

Tested:

lpaa23:~# ./arp 10.246.7.190
MAC=00:01:e8:22:cb:1d      // correct answer

lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# cat /proc/net/arp   # check arp table is now 'empty'
IP address       HW type     Flags       HW address    Mask     Device
lpaa23:~# ./arp 10.246.7.190
MAC=00:1a:11:c3:0d:7f   // buggy answer before patch (this is eth0 mac)

After patch :

lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# ./arp 10.246.7.190
ioctl(SIOCGARP) failed: No such device or address
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vytautas Valancius <valas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11c91ef9

net/ipv4: suppress NETDEV_UP notification on address lifetime update · 865b8042

Authored by David Ward on Jul 26, 2015

This notification causes the FIB to be updated, which is not needed
because the address already exists, and more importantly it may undo
intentional changes that were made to the FIB after the address was
originally added. (As a point of comparison, when an address becomes
deprecated because its preferred lifetime expired, a notification on
this chain is not generated.)

The motivation for this commit is fixing an incompatibility between
DHCP clients which set and update the address lifetime according to
the lease, and a commercial VPN client which replaces kernel routes
in a way that outbound traffic is sent only through the tunnel (and
disconnects if any further route changes are detected via netlink).
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

865b8042

bridge: stp: when using userspace stp stop kernel hello and hold timers · 76b91c32

Authored by Nikolay Aleksandrov on Jul 23, 2015

These should be handled only by the respective STP which is in control.
They become problematic for devices with limited resources and many
ports, because the hold_timer is per port and fires every second, and the
hello timer fires every 2 seconds even though it's global. While in
user-space STP mode these timers are completely unnecessary, so it's better
to keep them off.
Also ensure that when the bridge is up these timers are started only when
running with kernel STP.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76b91c32

28 Jul, 2015 4 commits

packet: missing dev_put() in packet_do_bind() · 158cd4af

Authored by Lars Westerhoff on Jul 28, 2015

When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}. However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased. Ultimately, this
caused the deletion of the interface to fail with the following message:

unregister_netdevice: waiting for dummy0 to become free. Usage count = 1

This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.

Fixes: 902fefb8 ('packet: improve socket create/bind latency in some cases')
Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

158cd4af

macvtap: fix network header pointer for VLAN tagged pkts · c5c62f1b

Authored by Ivan Vecera on Jul 23, 2015

Network header is set with offset ETH_HLEN, but that is not correct for VLAN
(multiple-)tagged packets and results in checksum issues in lower devices.

v2: leave skb->protocol untouched (thx Vlad), comment added
v3: moved after skb_probe_transport_header() call (thx Toshiaki)
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5c62f1b

fib_trie: Drop unnecessary calls to leaf_pull_suffix · 1513069e

Authored by Alexander Duyck on Jul 27, 2015

It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node.  As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it.  As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.

The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1513069e

macb: Fix build with macro'ized readl/writel. · 7a6e0706

Authored by David S. Miller on Jul 27, 2015

If an architecture defines readl/writel using CPP macros, we
get the following kinds of build failure:

> > > drivers/net/ethernet/cadence/macb.c:164:1: error: macro "writel"
> > > passed 3 arguments, but takes just 2
>      macb_or_gem_writel(bp, SA1B, bottom);
>     ^

Rename the methods so that this doesn't happen.
Signed-off-by: David S. Miller <davem@davemloft.net>

7a6e0706

27 Jul, 2015 9 commits

net: fec: Ensure clocks are enabled while using mdio bus · 8fff755e

Authored by Andrew Lunn on Jul 25, 2015

When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.

Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Cc: tyler.baker@linaro.org
Cc: fabio.estevam@freescale.com
Cc: shawn.guo@linaro.org
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fff755e

net: netcp: Fixes SGMII reset on network interface shutdown · 7025e88a

Authored by WingMan Kwok on Jul 24, 2015

This patch asserts SGMII RTRESET, i.e. resets the SGMII Tx/Rx
logic, during network interface shutdown to avoid having the
hardware wedge when shutting down with high incoming traffic rates.
This is cleared (brought out of RTRESET) when the interface is
brought back up.
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7025e88a

Merge branch 'macb-fixes' · 54109da3

Authored by David S. Miller on Jul 27, 2015

Andy Shevchenko says:

====================
net/macb: fix for AVR32 and clean up

It seems no one has recently tested the driver on AVR32 platforms such as
ATNGW100. This series brings it back to work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

54109da3

net/macb: convert to kernel doc · e018a0cc

Authored by Andy Shevchenko on Jul 24, 2015

This patch converts the struct description to the kernel-doc format. There is
no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e018a0cc

net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP() · 94b295ed

Authored by Andy Shevchenko on Jul 24, 2015

macb_count_tx_descriptors() duplicates the generic macro DIV_ROUND_UP(). The
patch replaces it with the macro.

There is no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
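
For readers unfamiliar with the macro, the following userspace sketch shows
why a hand-rolled "round up the division" helper is redundant. The macro
definition matches the generic kernel one; count_descriptors() is a
hypothetical stand-in for an open-coded helper and is not copied from the
driver.

```c
#include <assert.h>

/* Ceiling division for non-negative integers, as in the generic kernel macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical open-coded helper: how many descriptors of at most
 * max_len bytes are needed to cover len bytes.
 */
unsigned int count_descriptors(unsigned int len, unsigned int max_len)
{
    assert(max_len > 0);
    return (len + max_len - 1) / max_len;
}
```

Both forms give the same result, e.g. one descriptor for 4096 bytes with a
4096-byte limit and two descriptors for 4097 bytes.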

94b295ed

net/macb: suppress compiler warnings · 8bcbf82f

Authored by Andy Shevchenko on Jul 24, 2015

This patch fixes the following warnings:
drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’:
drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’:
drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’:
drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bcbf82f

net/macb: use dev_*() when netdev is not yet registered · a35919e1

Authored by Andy Shevchenko on Jul 24, 2015

To avoid messages like

macb macb.0 (unnamed net_device) (uninitialized): Cadence caps 0x00000000
macb macb.0 (unnamed net_device) (uninitialized): invalid hw address, using random

let's use dev_*() macros.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a35919e1

net/macb: check if macb_config present · f36dbe6a

Authored by Andy Shevchenko on Jul 24, 2015

The commit 98b5a0f4 introduces jumbo frame support, but it also assumes
that macb_config is present, which is not always true.

A configuration without macb_config fails to boot.

 Unable to handle kernel NULL pointer dereference at virtual address 00000010
 ptbr = 90350000 pgd = 00000000
 Oops: Kernel access of bad area, sig: 11 [#1]
 FRAME_POINTER chip: 0x01f:0x1e82 rev 2
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13
 task: 91c26000 ti: 91c28000 task.ti: 91c28000
 PC is at macb_probe+0x140/0x61c

Fixes: 98b5a0f4 (net: macb: Add support for jumbo frames)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f36dbe6a

net/macb: improve big endian CPU support · f2ce8a9e

Authored by Andy Shevchenko on Jul 24, 2015

The commit a50dad35 (net: macb: Add big endian CPU support) converted I/O
accessors to readl_relaxed() and writel_relaxed() and consequently broke
the MACB driver on AVR32 platforms such as ATNGW100.

This patch improves I/O access by checking endianness first and using the
corresponding methods.

Fixes: a50dad35 (net: macb: Add big endian CPU support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2ce8a9e
