1. 24 8月, 2016 5 次提交
  2. 23 8月, 2016 3 次提交
    • J
      net sched: fix encoding to use real length · 28a10c42
      Jamal Hadi Salim 提交于
      Encoding of the metadata was using the padded length as opposed to
      the real length of the data which is a bug per specification.
      This has not been an issue todate because all metadatum specified
      so far has been 32 bit where aligned and data length are the same width.
      This also includes a bug fix for validating the length of a u16 field.
      But since there is no metadata of size u16 yes we are fine to include it
      here.
      
      While at it get rid of magic numbers.
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28a10c42
    • S
      net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset · c0451fe1
      Shmulik Ladkani 提交于
      In b8247f09,
      
         "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"
      
      gso skbs arriving from an ingress interface that go through UDP
      tunneling, are allowed to be fragmented if the resulting encapulated
      segments exceed the dst mtu of the egress interface.
      
      This aligned the behavior of gso skbs to non-gso skbs going through udp
      encapsulation path.
      
      However the non-gso vs gso anomaly is present also in the following
      cases of a GRE tunnel:
       - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
         (e.g. OvS vport-gre with df_default=false)
       - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
      
      In both of the above cases, the non-gso skbs get fragmented, whereas the
      gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
      as they don't go through the segment+fragment code path.
      
      Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.
      
      Tunnels that do set IP_DF, will not go to fragmentation of segments.
      This preserves behavior of ip_gre in (the default) pmtudisc mode.
      
      Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
      Reported-by: Nwenxu <wenxu@ucloud.cn>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Tested-by: Nwenxu <wenxu@ucloud.cn>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0451fe1
    • M
      net: ipv6: Remove addresses for failures with strict DAD · 85b51b12
      Mike Manning 提交于
      If DAD fails with accept_dad set to 2, global addresses and host routes
      are incorrectly left in place. Even though disable_ipv6 is set,
      contrary to documentation, the addresses are not dynamically deleted
      from the interface. It is only on a subsequent link down/up that these
      are removed. The fix is not only to set the disable_ipv6 flag, but
      also to call addrconf_ifdown(), which is the action to carry out when
      disabling IPv6. This results in the addresses and routes being deleted
      immediately. The DAD failure for the LL addr is determined as before
      via netlink, or by the absence of the LL addr (which also previously
      would have had to be checked for in case of an intervening link down
      and up). As the call to addrconf_ifdown() requires an rtnl lock, the
      logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().
      
      Previous behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2000::10/64 scope global
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
             valid_lft forever preferred_lft forever
      root@vm1:/# ip -6 route show dev eth3
      2000::/64  proto kernel  metric 256
      fe80::/64  proto kernel  metric 256
      root@vm1:/# ip link set down eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      
      New behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      Signed-off-by: NMike Manning <mmanning@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85b51b12
  3. 20 8月, 2016 2 次提交
    • G
      l2tp: Fix the connect status check in pppol2tp_getname · 56cff471
      Gao Feng 提交于
      The sk->sk_state is bits flag, so need use bit operation check
      instead of value check.
      Signed-off-by: NGao Feng <fgao@ikuai8.com>
      Tested-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56cff471
    • M
      sctp: linearize early if it's not GSO · 4c2f2454
      Marcelo Ricardo Leitner 提交于
      Because otherwise when crc computation is still needed it's way more
      expensive than on a linear buffer to the point that it affects
      performance.
      
      It's so expensive that netperf test gives a perf output as below:
      
      Overhead  Command         Shared Object       Symbol
        18,62%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
         2,57%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
         1,94%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
         1,90%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
         1,66%  swapper         [kernel.vmlinux]    [k] intel_idle
         1,63%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
         1,59%  netserver       [sctp]              [k] sctp_packet_transmit
         1,55%  netserver       [kernel.vmlinux]    [k] memcpy_erms
         1,42%  netserver       [sctp]              [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462
      
      After patch:
      Overhead  Command         Shared Object      Symbol
         2,75%  netserver       [kernel.vmlinux]   [k] memcpy_erms
         2,63%  netserver       [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
         2,39%  netserver       [kernel.vmlinux]   [k] fib_table_lookup
         2,04%  netserver       [kernel.vmlinux]   [k] __pskb_pull_tail
         1,91%  netserver       [kernel.vmlinux]   [k] _raw_spin_lock
         1,91%  netserver       [sctp]             [k] sctp_packet_transmit
         1,72%  netserver       [mlx4_en]          [k] mlx4_en_process_rx_cq
         1,68%  netserver       [sctp]             [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849
      
      Fixes: 3acb50c1 ("sctp: delay as much as possible skb_linearize")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2f2454
  4. 19 8月, 2016 1 次提交
  5. 18 8月, 2016 9 次提交
  6. 17 8月, 2016 2 次提交
  7. 16 8月, 2016 3 次提交
    • V
      tipc: fix NULL pointer dereference in shutdown() · d2fbdf76
      Vegard Nossum 提交于
      tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
      call tipc_node_xmit_skb() on it.
      
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
          RIP: 0010:[<ffffffff830bb46b>]  [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
          RSP: 0018:ffff8800595bfce8  EFLAGS: 00010246
          RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
          RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
          RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
          R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
          R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
          FS:  00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
          DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
          Stack:
           0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
           ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
           0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
          Call Trace:
           [<ffffffff830c84a3>] tipc_shutdown+0x553/0x880
           [<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
          RIP  [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
           RSP <ffff8800595bfce8>
          ---[ end trace 57b0484e351e71f1 ]---
      
      I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
      userspace is equipped to handle that. Anyway, this is better than a GPF
      and looks somewhat consistent with other tipc_msg_create() callers.
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2fbdf76
    • S
      gre: set inner_protocol on xmit · 3d7b3320
      Simon Horman 提交于
      Ensure that the inner_protocol is set on transmit so that GSO segmentation,
      which relies on that field, works correctly.
      
      This is achieved by setting the inner_protocol in gre_build_header rather
      than each caller of that function. It ensures that the inner_protocol is
      set when gre_fb_xmit() is used to transmit GRE which was not previously the
      case.
      
      I have observed this is not the case when OvS transmits GRE using
      lwtunnel metadata (which it always does).
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Cc: Pravin Shelar <pshelar@ovn.org>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d7b3320
    • L
      net: ipv6: Fix ping to link-local addresses. · 5e457896
      Lorenzo Colitti 提交于
      ping_v6_sendmsg does not set flowi6_oif in response to
      sin6_scope_id or sk_bound_dev_if, so it is not possible to use
      these APIs to ping an IPv6 address on a different interface.
      Instead, it sets flowi6_iif, which is incorrect but harmless.
      
      Stop setting flowi6_iif, and support various ways of setting oif
      in the same priority order used by udpv6_sendmsg.
      
      Tested: https://android-review.googlesource.com/#/c/254470/Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e457896
  8. 15 8月, 2016 1 次提交
  9. 14 8月, 2016 6 次提交
    • S
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Sabrina Dubroca 提交于
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      952fcfd0
    • M
      net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled · bc561632
      Mike Manning 提交于
      If IPv6 is disabled when the option is set to keep IPv6
      addresses on link down, userspace is unaware of this as
      there is no such indication via netlink. The solution is to
      remove the IPv6 addresses in this case, which results in
      netlink messages indicating removal of addresses in the
      usual manner. This fix also makes the behavior consistent
      with the case of having IPv6 disabled first, which stops
      IPv6 addresses from being added.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: NMike Manning <mmanning@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc561632
    • V
      net/sctp: always initialise sctp_ht_iter::start_fail · 54236ab0
      Vegard Nossum 提交于
      sctp_transport_seq_start() does not currently clear iter->start_fail on
      success, but relies on it being zero when it is allocated (by
      seq_open_net()).
      
      This can be a problem in the following sequence:
      
          open() // allocates iter (and implicitly sets iter->start_fail = 0)
          read()
           - iter->start() // fails and sets iter->start_fail = 1
           - iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
          read() again
           - iter->start() // succeeds, but doesn't change iter->start_fail
           - iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)
      
      We should initialize sctp_ht_iter::start_fail to zero if ->start()
      succeeds, otherwise it's possible that we leave an old value of 1 there,
      which will cause ->stop() to not call sctp_transport_walk_stop(), which
      causes all sorts of problems like not calling rcu_read_unlock() (and
      preempt_enable()), eventually leading to more warnings like this:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
          Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295892>] _raw_spin_lock+0x12/0x40
           [<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
           [<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60
           [<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff81439e50>] traverse+0x170/0x850
           [<ffffffff8143aeec>] seq_read+0x7cc/0x1180
           [<ffffffff814f996c>] proc_reg_read+0xbc/0x180
           [<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210
           [<ffffffff813d2a95>] do_readv_writev+0x565/0x660
           [<ffffffff813d6857>] vfs_readv+0x67/0xa0
           [<ffffffff813d6c16>] do_preadv+0x126/0x170
           [<ffffffff813d710c>] SyS_preadv+0xc/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Notice that this is a subtly different stacktrace from the one in commit
      5fc382d8 ("net/sctp: terminate rhashtable walk correctly").
      
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Acked-By: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54236ab0
    • V
      net/irda: handle iriap_register_lsap() allocation failure · 5ba092ef
      Vegard Nossum 提交于
      If iriap_register_lsap() fails to allocate memory, self->lsap is
      set to NULL. However, none of the callers handle the failure and
      irlmp_connect_request() will happily dereference it:
      
          iriap_register_lsap: Unable to allocated LSAP!
          ================================================================================
          UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
          member access within null pointer of type 'struct lsap_cb'
          CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
          04/01/2014
           0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
           ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
           ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
          Call Trace:
           [<ffffffff82344f40>] dump_stack+0xac/0xfc
           [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
           [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
           [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
           [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
           [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
           [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
           [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
           [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
           [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
           [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
           [<ffffffff833c7e09>] SyS_connect+0x9/0x10
           [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
           [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
          ================================================================================
      
      The bug seems to have been around since forever.
      
      There's more problems with missing error checks in iriap_init() (and
      indeed all of irda_init()), but that's a bigger problem that needs
      very careful review and testing. This patch will fix the most serious
      bug (as it's easily reached from unprivileged userspace).
      
      I have tested my patch with a reproducer.
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ba092ef
    • D
      bpf: fix write helpers with regards to non-linear parts · 0ed661d5
      Daniel Borkmann 提交于
      Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
      it's currently defect with regards to skbs when the write_len spans into
      non-linear parts, no matter if cloned or not.
      
      There are multiple issues at once. First, using skb_store_bits() is not
      correct since even if we have a cloned skb, page frags can still be shared.
      To really make them private, we need to pull them in via __pskb_pull_tail()
      first, which also gets us a private head via pskb_expand_head() implicitly.
      
      This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
      bpf_l4_csum_replace(). Really, the only thing reasonable and working here
      is to call skb_ensure_writable() before any write operation. Meaning, via
      pskb_may_pull() it makes sure that parts we want to access are pulled in and
      if not does so plus unclones the skb implicitly. If our write_len still fits
      the headlen and we're cloned and our header of the clone is not writable,
      then we need to make a private copy via pskb_expand_head(). skb_store_bits()
      is a bit misleading and only safe to store into non-linear data in different
      contexts such as 357b40a1 ("[IPV6]: IPV6_CHECKSUM socket option can
      corrupt kernel memory").
      
      For above BPF helper functions, it means after fixed bpf_try_make_writable(),
      we've pulled in enough, so that we operate always based on skb->data. Thus,
      the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
      In bpf_skb_store_bytes(), the len check is unnecessary too since it can
      only pass in maximum of BPF stack size, so adding offset is guaranteed to
      never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
      offset alignment since they use __sum16 pointer for writing resulting csum.
      
      The remaining helpers that change skb data not discussed here yet are
      bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
      vlan helpers internally call either skb_ensure_writable() (pop case) and
      skb_cow_head() (push case, for head expansion), respectively. Similarly,
      bpf_skb_proto_xlat() takes care to not mangle page frags.
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
      Fixes: 3697649f ("bpf: try harder on clones when writing into skb")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ed661d5
    • C
      calipso: fix resource leak on calipso_genopt failure · b4c0e0c6
      Colin Ian King 提交于
      Currently, if calipso_genopt fails then the error exit path
      does not free the ipv6_opt_hdr new causing a memory leak. Fix
      this by kfree'ing new on the error exit path.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4c0e0c6
  10. 13 8月, 2016 1 次提交
  11. 11 8月, 2016 2 次提交
  12. 10 8月, 2016 5 次提交
    • L
      netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes · 4da449ae
      Laura Garcia Liebana 提交于
      Fix the direct assignment of offset and length attributes included in
      nft_exthdr structure from u32 data to u8.
      Signed-off-by: NLaura Garcia Liebana <nevola@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4da449ae
    • T
      bridge: Fix problems around fdb entries pointing to the bridge device · 7bb90c37
      Toshiaki Makita 提交于
      Adding fdb entries pointing to the bridge device uses fdb_insert(),
      which lacks various checks and does not respect added_by_user flag.
      
      As a result, some inconsistent behavior can happen:
      * Adding temporary entries succeeds but results in permanent entries.
      * Same goes for "dynamic" and "use".
      * Changing mac address of the bridge device causes deletion of
        user-added entries.
      * Replacing existing entries looks successful from userspace but actually
        not, regardless of NLM_F_EXCL flag.
      
      Use the same logic as other entries and fix them.
      
      Fixes: 3741873b ("bridge: allow adding of fdb entries pointing to the bridge device")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bb90c37
    • L
      vti: flush x-netns xfrm cache when vti interface is removed · a5d0dc81
      Lance Richardson 提交于
      When executing the script included below, the netns delete operation
      hangs with the following message (repeated at 10 second intervals):
      
        kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      This occurs because a reference to the lo interface in the "secure" netns
      is still held by a dst entry in the xfrm bundle cache in the init netns.
      
      Address this problem by garbage collecting the tunnel netns flow cache
      when a cross-namespace vti interface receives a NETDEV_DOWN notification.
      
      A more detailed description of the problem scenario (referencing commands
      in the script below):
      
      (1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
      
        The vti_test interface is created in the init namespace. vti_tunnel_init()
        attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
        setting the tunnel net to &init_net.
      
      (2) ip link set vti_test netns secure
      
        The vti_test interface is moved to the "secure" netns. Note that
        the associated struct ip_tunnel still has tunnel->net set to &init_net.
      
      (3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
      
        The first packet sent using the vti device causes xfrm_lookup() to be
        called as follows:
      
            dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);
      
        Note that tunnel->net is the init namespace, while skb_dst(skb) references
        the vti_test interface in the "secure" namespace. The returned dst
        references an interface in the init namespace.
      
        Also note that the first parameter to xfrm_lookup() determines which flow
        cache is used to store the computed xfrm bundle, so after xfrm_lookup()
        returns there will be a cached bundle in the init namespace flow cache
        with a dst referencing a device in the "secure" namespace.
      
      (4) ip netns del secure
      
        Kernel begins to delete the "secure" namespace.  At some point the
        vti_test interface is deleted, at which point dst_ifdown() changes
        the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
        in the "secure" namespace however).
        Since nothing has happened to cause the init namespace's flow cache
        to be garbage collected, this dst remains attached to the flow cache,
        so the kernel loops waiting for the last reference to lo to go away.
      
      <Begin script>
      ip link add br1 type bridge
      ip link set dev br1 up
      ip addr add dev br1 1.1.1.1/8
      
      ip netns add secure
      ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
      ip link set vti_test netns secure
      ip netns exec secure ip link set vti_test up
      ip netns exec secure ip link s lo up
      ip netns exec secure ip addr add dev lo 192.168.100.1/24
      ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
         proto esp mode tunnel mark 1
      ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
         proto esp mode tunnel mark 1
      ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
         mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
      ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
         mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
      
      ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
      
      ip netns del secure
      <End script>
      Reported-by: NHangbin Liu <haliu@redhat.com>
      Reported-by: NJan Tluka <jtluka@redhat.com>
      Signed-off-by: NLance Richardson <lrichard@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5d0dc81
    • D
      rxrpc: Free packets discarded in data_ready · 992c273a
      David Howells 提交于
      Under certain conditions, the data_ready handler will discard a packet.
      These need to be freed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      992c273a
    • D
      rxrpc: Fix a use-after-push in data_ready handler · 50fd85a1
      David Howells 提交于
      Fix a use of a packet after it has been enqueued onto the packet processing
      queue in the data_ready handler.  Once on a call's Rx queue, we mustn't
      touch it any more as it may be dequeued and freed by the call processor
      running on a work queue.
      
      Save the values we need before enqueuing.
      
      Without this, we can get an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000009c
      IP: [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
      PGD 0 
      Oops: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 2 PID: 0 Comm: swapper/2 Tainted: G            E   4.7.0-fsdevel+ #1336
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff88040d6863c0 task.stack: ffff88040d68c000
      RIP: 0010:[<ffffffffa01854e8>]  [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc]
      RSP: 0018:ffff88041fb03a78  EFLAGS: 00010246
      RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001
      RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00
      RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001
      R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0
      Stack:
       ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00
       ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0
       ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000
      Call Trace:
       <IRQ> 
       [<ffffffffa0186072>] rxrpc_data_ready+0x89d/0xbae [af_rxrpc]
       [<ffffffff814c94d7>] __sock_queue_rcv_skb+0x24c/0x2b2
       [<ffffffff8155c59a>] __udp_queue_rcv_skb+0x4b/0x1bd
       [<ffffffff8155e048>] udp_queue_rcv_skb+0x281/0x4db
       [<ffffffff8155ea8f>] __udp4_lib_rcv+0x7ed/0x963
       [<ffffffff8155ef9a>] udp_rcv+0x15/0x17
       [<ffffffff81531d86>] ip_local_deliver_finish+0x1c3/0x318
       [<ffffffff81532544>] ip_local_deliver+0xbb/0xc4
       [<ffffffff81531bc3>] ? inet_del_offload+0x40/0x40
       [<ffffffff815322a9>] ip_rcv_finish+0x3ce/0x42c
       [<ffffffff81532851>] ip_rcv+0x304/0x33d
       [<ffffffff81531edb>] ? ip_local_deliver_finish+0x318/0x318
       [<ffffffff814dff9d>] __netif_receive_skb_core+0x601/0x6e8
       [<ffffffff814e072e>] __netif_receive_skb+0x13/0x54
       [<ffffffff814e082a>] netif_receive_skb_internal+0xbb/0x17c
       [<ffffffff814e1838>] napi_gro_receive+0xf9/0x1bd
       [<ffffffff8144eb9f>] rtl8169_poll+0x32b/0x4a8
       [<ffffffff814e1c7b>] net_rx_action+0xe8/0x357
       [<ffffffff81051074>] __do_softirq+0x1aa/0x414
       [<ffffffff810514ab>] irq_exit+0x3d/0xb0
       [<ffffffff810184a2>] do_IRQ+0xe4/0xfc
       [<ffffffff81612053>] common_interrupt+0x93/0x93
       <EOI> 
       [<ffffffff814af837>] ? cpuidle_enter_state+0x1ad/0x2be
       [<ffffffff814af832>] ? cpuidle_enter_state+0x1a8/0x2be
       [<ffffffff814af96a>] cpuidle_enter+0x12/0x14
       [<ffffffff8108956f>] call_cpuidle+0x39/0x3b
       [<ffffffff81089855>] cpu_startup_entry+0x230/0x35d
       [<ffffffff810312ea>] start_secondary+0xf4/0xf7
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      50fd85a1