1. 28 8月, 2019 6 次提交
    • V
      net: dsa: clear VLAN PVID flag for CPU port · b9499904
      Vivien Didelot 提交于
      When the bridge offloads a VLAN on a slave port, we also need to
      program its dedicated CPU port as a member of the VLAN.
      
      Drivers may handle the CPU port's membership as they want. For example,
      Marvell as a special "Unmodified" mode to pass frames as is through
      such ports.
      
      Even though DSA expects the drivers to handle the CPU port membership,
      it does not make sense to program user VLANs as PVID on the CPU port.
      This patch clears this flag before programming the CPU port.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Suggested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9499904
    • V
      net: dsa: program VLAN on CPU port from slave · 7e1741b4
      Vivien Didelot 提交于
      DSA currently programs a VLAN on the CPU port implicitly after the
      related notifier is received by a switch.
      
      While we still need to do this transparent programmation of the DSA
      links in the fabric, programming the CPU port this way may cause
      problems in some corners such as the tag_8021q driver.
      
      Because the dedicated CPU port is specific to a slave, make their
      programmation explicit a few layers up, in the slave code.
      
      Note that technically, DSA links have a dedicated CPU port as well,
      but since they are only used as conduit between interconnected switches
      of a fabric, programming them transparently this way is what we want.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e1741b4
    • V
      net: dsa: check bridge VLAN in slave operations · c5335d73
      Vivien Didelot 提交于
      The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is
      not bridged or if its bridge is not VLAN aware.
      
      This is a good thing but other corners of DSA, such as the tag_8021q
      driver, may need to program VLANs regardless the bridge state.
      
      And also because bridge_dev is specific to user ports anyway, move
      these checks were it belongs, one layer up in the slave code.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Suggested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5335d73
    • V
      net: dsa: add slave VLAN helpers · bdcff080
      Vivien Didelot 提交于
      Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle
      SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the
      switchdev_obj_port_vlan structure on add since we will modify it in
      future patches.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdcff080
    • V
      net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add · cf360866
      Vivien Didelot 提交于
      Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
      
      This function is used in the tag_8021q.c code to offload the PVID of
      ports, which would simply not work if .port_vlan_add is not supported
      by the underlying switch.
      
      Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
      that is to say in dsa_slave_vlan_rx_add_vid.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf360866
    • V
      net: dsa: remove bitmap operations · e65d45cc
      Vivien Didelot 提交于
      The bitmap operations were introduced to simplify the switch drivers
      in the future, since most of them could implement the common VLAN and
      MDB operations (add, del, dump) with simple functions taking all target
      ports at once, and thus limiting the number of hardware accesses.
      
      Programming an MDB or VLAN this way in a single operation would clearly
      simplify the drivers a lot but would require a new get-set interface
      in DSA. The usage of such bitmap from the stack also raised concerned
      in the past, leading to the dynamic allocation of a new ds->_bitmap
      member in the dsa_switch structure. So let's get rid of them for now.
      
      This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add}
      switch operations into new dsa_switch_{mdb,vlan}_{prepare,add}
      variants not using any bitmap argument anymore.
      
      New dsa_switch_{mdb,vlan}_match helpers have been introduced to make
      clear which local port of a switch must be programmed with the target
      object. While the targeted user port is an obvious candidate, the
      DSA links must also be programmed, as well as the CPU port for VLANs.
      
      While at it, also remove local variables that are only used once.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e65d45cc
  2. 27 8月, 2019 14 次提交
  3. 26 8月, 2019 2 次提交
  4. 25 8月, 2019 7 次提交
    • Z
      net: rds: add service level support in rds-info · e0e6d062
      Zhu Yanjun 提交于
      >From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
      is used to identify different flows within an IBA subnet.
      It is carried in the local route header of the packet.
      
      Before this commit, run "rds-info -I". The outputs are as
      below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      After this commit, the output is as below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      
      The commit fe3475af ("net: rds: add per rds connection cache
      statistics") adds cache_allocs in struct rds_info_rdma_connection
      as below:
      struct rds_info_rdma_connection {
      ...
              __u32           rdma_mr_max;
              __u32           rdma_mr_size;
              __u8            tos;
              __u32           cache_allocs;
       };
      The peer struct in rds-tools of struct rds_info_rdma_connection is as
      below:
      struct rds_info_rdma_connection {
      ...
              uint32_t        rdma_mr_max;
              uint32_t        rdma_mr_size;
              uint8_t         tos;
              uint8_t         sl;
              uint32_t        cache_allocs;
      };
      The difference between userspace and kernel is the member variable sl.
      In the kernel struct, the member variable sl is missing. This will
      introduce risks. So it is necessary to use this commit to avoid this risk.
      
      Fixes: fe3475af ("net: rds: add per rds connection cache statistics")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Suggested-by: NGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0e6d062
    • J
      net: route dump netlink NLM_F_MULTI flag missing · e93fb3e9
      John Fastabend 提交于
      An excerpt from netlink(7) man page,
      
        In multipart messages (multiple nlmsghdr headers with associated payload
        in one byte stream) the first and all following headers have the
        NLM_F_MULTI flag set, except for the last  header  which  has the type
        NLMSG_DONE.
      
      but, after (ee28906f) there is a missing NLM_F_MULTI flag in the middle of a
      FIB dump. The result is user space applications following above man page
      excerpt may get confused and may stop parsing msg believing something went
      wrong.
      
      In the golang netlink lib [0] the library logic stops parsing believing the
      message is not a multipart message. Found this running Cilium[1] against
      net-next while adding a feature to auto-detect routes. I noticed with
      multiple route tables we no longer could detect the default routes on net
      tree kernels because the library logic was not returning them.
      
      Fix this by handling the fib_dump_info_fnhe() case the same way the
      fib_dump_info() handles it by passing the flags argument through the
      call chain and adding a flags argument to rt_fill_info().
      
      Tested with Cilium stack and auto-detection of routes works again. Also
      annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
      NLMSG_DONE flags look correct after this.
      
      Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
      as is done for fib_dump_info() so this looks correct to me.
      
      [0] https://github.com/vishvananda/netlink/
      [1] https://github.com/cilium/
      
      Fixes: ee28906f ("ipv4: Dump route exceptions if requested")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e93fb3e9
    • Z
      sock: fix potential memory leak in proto_register() · b45ce321
      zhanglin 提交于
      If protocols registered exceeded PROTO_INUSE_NR, prot will be
      added to proto_list, but no available bit left for prot in
      proto_inuse_idx.
      
      Changes since v2:
      * Propagate the error code properly
      Signed-off-by: Nzhanglin <zhang.lin16@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b45ce321
    • M
      net/core/skmsg: Delete an unnecessary check before the function call “consume_skb” · dd016aca
      Markus Elfring 提交于
      The consume_skb() function performs also input parameter validation.
      Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd016aca
    • H
      xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode · c3b4c3a4
      Hangbin Liu 提交于
      In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            - icmpv6_route_lookup
              - xfrm_decode_session_reverse
                - decode_session4
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
                - decode_session6
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the skb_dst(skb)->dev is actually not needed as we
      called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
      used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      So make a dst dev check here should be clean and safe.
      
      v4: No changes.
      
      v3: No changes.
      
      v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Tested-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3b4c3a4
    • H
      ipv4/icmp: fix rt dst dev null pointer dereference · e2c69393
      Hangbin Liu 提交于
      In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the reason we need rt->dst.dev is to get the net.
      So we can just try get it from skb->dev when rt->dst.dev is NULL.
      
      v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
      still use dst.dev and do a check to avoid crash.
      
      v3: No changes.
      
      v2: fix the issue in __icmp_send() instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: c8b34e68 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2c69393
    • Y
      openvswitch: Fix log message in ovs conntrack · 12c6bc38
      Yi-Hung Wei 提交于
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: NYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12c6bc38
  5. 24 8月, 2019 6 次提交
    • I
      bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 · 2c238177
      Ilya Leoshkevich 提交于
      test_select_reuseport fails on s390 due to verifier rejecting
      test_select_reuseport_kern.o with the following message:
      
      	; data_check.eth_protocol = reuse_md->eth_protocol;
      	18: (69) r1 = *(u16 *)(r6 +22)
      	invalid bpf_context access off=22 size=2
      
      This is because on big-endian machines casts from __u32 to __u16 are
      generated by referencing the respective variable as __u16 with an offset
      of 2 (as opposed to 0 on little-endian machines).
      
      The verifier already has all the infrastructure in place to allow such
      accesses, it's just that they are not explicitly enabled for
      eth_protocol field. Enable them for eth_protocol field by using
      bpf_ctx_range instead of offsetof.
      
      Ditto for ip_protocol, bind_inany and len, since they already allow
      narrowing, and the same problem can arise when working with them.
      
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2c238177
    • J
      flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH · db38de39
      Jakub Sitnicki 提交于
      Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
      free the program once a grace period has elapsed. The callback can run
      together with new RCU readers that started after the last grace period.
      New RCU readers can potentially see the "old" to-be-freed or already-freed
      pointer to the program object before the RCU update-side NULLs it.
      
      Reorder the operations so that the RCU update-side resets the protected
      pointer before the end of the grace period after which the program will be
      freed.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Reported-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: NPetar Penkov <ppenkov@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      db38de39
    • I
      drop_monitor: Make timestamps y2038 safe · bd1200b7
      Ido Schimmel 提交于
      Timestamps are currently communicated to user space as 'struct
      timespec', which is not considered y2038 safe since it uses a 32-bit
      signed value for seconds.
      
      Fix this while the API is still not part of any official kernel release
      by using 64-bit nanoseconds timestamps instead.
      
      Fixes: ca30707d ("drop_monitor: Add packet alert mode")
      Fixes: 5e58109b ("drop_monitor: Add support for packet alert mode for hardware drops")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd1200b7
    • D
      net/rds: Whitelist rdma_cookie and rx_tstamp for usercopy · bf1867db
      Dag Moxnes 提交于
      Add the RDMA cookie and RX timestamp to the usercopy whitelist.
      
      After the introduction of hardened usercopy whitelisting
      (https://lwn.net/Articles/727322/), a warning is displayed when the
      RDMA cookie or RX timestamp is copied to userspace:
      
      kernel: WARNING: CPU: 3 PID: 5750 at
      mm/usercopy.c:81 usercopy_warn+0x8e/0xa6
      [...]
      kernel: Call Trace:
      kernel: __check_heap_object+0xb8/0x11b
      kernel: __check_object_size+0xe3/0x1bc
      kernel: put_cmsg+0x95/0x115
      kernel: rds_recvmsg+0x43d/0x620 [rds]
      kernel: sock_recvmsg+0x43/0x4a
      kernel: ___sys_recvmsg+0xda/0x1e6
      kernel: ? __handle_mm_fault+0xcae/0xf79
      kernel: __sys_recvmsg+0x51/0x8a
      kernel: SyS_recvmsg+0x12/0x1c
      kernel: do_syscall_64+0x79/0x1ae
      
      When the whitelisting feature was introduced, the memory for the RDMA
      cookie and RX timestamp in RDS was not added to the whitelist, causing
      the warning above.
      Signed-off-by: NDag Moxnes <dag.moxnes@oracle.com>
      Tested-by: NJenny <jenny.x.xu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf1867db
    • S
      ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev · db0b99f5
      Sabrina Dubroca 提交于
      Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
      ignoring the specific error value. This results in addrconf_add_dev
      returning ENOBUFS in all cases, which is unfortunate in cases such as:
      
          # ip link add dummyX type dummy
          # ip link set dummyX mtu 1200 up
          # ip addr add 2000::/64 dev dummyX
          RTNETLINK answers: No buffer space available
      
      Commit a317a2f1 ("ipv6: fail early when creating netdev named all
      or default") introduced error returns in ipv6_add_dev. Before that,
      that function would simply return NULL for all failures.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db0b99f5
    • X
      net: ipv6: fix listify ip6_rcv_finish in case of forwarding · c7a42eb4
      Xin Long 提交于
      We need a similar fix for ipv6 as Commit 0761680d ("net: ipv4: fix
      listify ip_rcv_finish in case of forwarding") does for ipv4.
      
      This issue can be reprocuded by syzbot since Commit 323ebb61 ("net:
      use listified RX for handling GRO_NORMAL skbs") on net-next. The call
      trace was:
      
        kernel BUG at include/linux/skbuff.h:2225!
        RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
        RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
        Call Trace:
          sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
          sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
          sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
          sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
          sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
          ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
          ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
          NF_HOOK include/linux/netfilter.h:305 [inline]
          NF_HOOK include/linux/netfilter.h:299 [inline]
          ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
          dst_input include/net/dst.h:442 [inline]
          ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
          ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
          ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
          ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
          __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
          __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
          __netif_receive_skb_list net/core/dev.c:5149 [inline]
          netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
          gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
          gro_normal_list net/core/dev.c:5755 [inline]
          gro_normal_one net/core/dev.c:5769 [inline]
          napi_frags_finish net/core/dev.c:5782 [inline]
          napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
          tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
          tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
      
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Fixes: 323ebb61 ("net: use listified RX for handling GRO_NORMAL skbs")
      Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
      Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7a42eb4
  6. 23 8月, 2019 3 次提交
  7. 22 8月, 2019 2 次提交