1. 25 10月, 2018 1 次提交
  2. 24 10月, 2018 1 次提交
    • K
      Revert "net: simplify sock_poll_wait" · 89ab066d
      Karsten Graul 提交于
      This reverts commit dd979b4d.
      
      This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
      internal TCP socket for the initial handshake with the remote peer.
      Whenever the SMC connection can not be established this TCP socket is
      used as a fallback. All socket operations on the SMC socket are then
      forwarded to the TCP socket. In case of poll, the file->private_data
      pointer references the SMC socket because the TCP socket has no file
      assigned. This causes tcp_poll to wait on the wrong socket.
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ab066d
  3. 21 10月, 2018 2 次提交
    • R
      Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin" · d2fb4fb8
      Roopa Prabhu 提交于
      This reverts commit 8e326289.
      
      This patch results in unnecessary netlink notification when one
      tries to delete a neigh entry already in NUD_FAILED state. Found
      this with a buggy app that tries to delete a NUD_FAILED entry
      repeatedly. While the notification issue can be fixed with more
      checks, adding more complexity here seems unnecessary. Also,
      recent tests with other changes in the neighbour code have
      shown that the INCOMPLETE and PROBE checks are good enough for
      the original issue.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2fb4fb8
    • J
      bpf: sk_msg program helper bpf_msg_push_data · 6fff607e
      John Fastabend 提交于
      This allows user to push data into a msg using sk_msg program types.
      The format is as follows,
      
      	bpf_msg_push_data(msg, offset, len, flags)
      
      this will insert 'len' bytes at offset 'offset'. For example to
      prepend 10 bytes at the front of the message the user can,
      
      	bpf_msg_push_data(msg, 0, 10, 0);
      
      This will invalidate data bounds so BPF user will have to then recheck
      data bounds after calling this. After this the msg size will have been
      updated and the user is free to write into the added bytes. We allow
      any offset/len as long as it is within the (data, data_end) range.
      However, a copy will be required if the ring is full and its possible
      for the helper to fail with ENOMEM or EINVAL errors which need to be
      handled by the BPF program.
      
      This can be used similar to XDP metadata to pass data between sk_msg
      layer and lower layers.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6fff607e
  4. 20 10月, 2018 6 次提交
    • D
      net: fix pskb_trim_rcsum_slow() with odd trim offset · d55bef50
      Dimitris Michailidis 提交于
      We've been getting checksum errors involving small UDP packets, usually
      59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
      has been complaining that HW is providing bad checksums. Turns out the
      problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98
      ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
      
      The source of the problem is that when the bytes we are trimming start
      at an odd address, as in the case of the 1 padding byte above,
      skb_checksum() returns a byte-swapped value. We cannot just combine this
      with skb->csum using csum_sub(). We need to use csum_block_sub() here
      that takes into account the parity of the start address and handles the
      swapping.
      
      Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Signed-off-by: NDimitris Michailidis <dmichail@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d55bef50
    • D
      netpoll: allow cleanup to be synchronous · c9fbd71f
      Debabrata Banerjee 提交于
      This fixes a problem introduced by:
      commit 2cde6acd ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
      
      When using netconsole on a bond, __netpoll_cleanup can asynchronously
      recurse multiple times, each __netpoll_free_async call can result in
      more __netpoll_free_async's. This means there is now a race between
      cleanup_work queues on multiple netpoll_info's on multiple devices and
      the configuration of a new netpoll. For example if a netconsole is set
      to enable 0, reconfigured, and enable 1 immediately, this netconsole
      will likely not work.
      
      Given the reason for __netpoll_free_async is it can be called when rtnl
      is not locked, if it is locked, we should be able to execute
      synchronously. It appears to be locked everywhere it's called from.
      
      Generalize the design pattern from the teaming driver for current
      callers of __netpoll_free_async.
      
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9fbd71f
    • J
      bpf: skmsg, fix psock create on existing kcm/tls port · 5032d079
      John Fastabend 提交于
      Before using the psock returned by sk_psock_get() when adding it to a
      sockmap we need to ensure it is actually a sockmap based psock.
      Previously we were only checking this after incrementing the reference
      counter which was an error. This resulted in a slab-out-of-bounds
      error when the psock was not actually a sockmap type.
      
      This moves the check up so the reference counter is only used
      if it is a sockmap psock.
      
      Eric reported the following KASAN BUG,
      
      BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
      BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
      Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387
      
      CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       check_memory_region_inline mm/kasan/kasan.c:260 [inline]
       check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
       kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
       atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
       refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
       sk_psock_get include/linux/skmsg.h:379 [inline]
       sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178
       sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669
       sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738
       map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      5032d079
    • S
      bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB · b39b5f41
      Song Liu 提交于
      BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
      skb. This patch enables direct access of skb for these programs.
      
      Two helper functions bpf_compute_and_save_data_end() and
      bpf_restore_data_end() are introduced. There are used in
      __cgroup_bpf_run_filter_skb(), to compute proper data_end for the
      BPF program, and restore original data afterwards.
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b39b5f41
    • M
      bpf: add queue and stack maps · f1a2e44a
      Mauricio Vasquez B 提交于
      Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs.
      These maps support peek, pop and push operations that are exposed to eBPF
      programs through the new bpf_map[peek/pop/push] helpers.  Those operations
      are exposed to userspace applications through the already existing
      syscalls in the following way:
      
      BPF_MAP_LOOKUP_ELEM            -> peek
      BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
      BPF_MAP_UPDATE_ELEM            -> push
      
      Queue/stack maps are implemented using a buffer, tail and head indexes,
      hence BPF_F_NO_PREALLOC is not supported.
      
      As opposite to other maps, queue and stack do not use RCU for protecting
      maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE
      argument that is a pointer to a memory zone where to save the value of a
      map.  Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not
      be passed as an extra argument.
      
      Our main motivation for implementing queue/stack maps was to keep track
      of a pool of elements, like network ports in a SNAT, however we forsee
      other use cases, like for exampling saving last N kernel events in a map
      and then analysing from userspace.
      Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      f1a2e44a
    • D
      Revert "bond: take rcu lock in netpoll_send_skb_on_dev" · 48995423
      David S. Miller 提交于
      This reverts commit 6fe94878.
      
      It is causing more serious regressions than the RCU warning
      it is fixing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48995423
  5. 16 10月, 2018 8 次提交
    • E
      net: extend sk_pacing_rate to unsigned long · 76a9ebe8
      Eric Dumazet 提交于
      sk_pacing_rate has beed introduced as a u32 field in 2013,
      effectively limiting per flow pacing to 34Gbit.
      
      We believe it is time to allow TCP to pace high speed flows
      on 64bit hosts, as we now can reach 100Gbit on one TCP flow.
      
      This patch adds no cost for 32bit kernels.
      
      The tcpi_pacing_rate and tcpi_max_pacing_rate were already
      exported as 64bit, so iproute2/ss command require no changes.
      
      Unfortunately the SO_MAX_PACING_RATE socket option will stay
      32bit and we will need to add a new option to let applications
      control high pacing rates.
      
      State      Recv-Q Send-Q Local Address:Port             Peer Address:Port
      ESTAB      0      1787144  10.246.9.76:49992             10.246.9.77:36741
                       timer:(on,003ms,0) ino:91863 sk:2 <->
       skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0)
       ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448
       rcvmss:536 advmss:1448
       cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177
       segs_in:3916318 data_segs_out:177279175
       bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2)
       send 28045.5Mbps lastrcv:73333
       pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps
       busy:73333ms unacked:135 retrans:0/157 rcv_space:14480
       notsent:2085120 minrtt:0.013
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76a9ebe8
    • M
      FDDI: defza: Support capturing outgoing SMT traffic · 9f9a742d
      Maciej W. Rozycki 提交于
      DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate
      SMT frames with adapter's firmware.  Any SMT frame received from the RMC
      via the Rx queue is queued back by the driver to the SMT Rx queue for
      the firmware to process.  Similarly the firmware uses the SMT Tx queue
      to supply the driver with SMT frames which are queued back to the Tx
      queue for the RMC to send to the ring.
      
      When a network tap is attached to an FDDI interface handled by `defza'
      any incoming SMT frames captured are queued to our usual processing of
      network data received, which in turn delivers them to any listening
      taps.
      
      However the outgoing SMT frames produced by the firmware bypass our
      network protocol stack and are therefore not delivered to taps.  This in
      turn means that taps are missing a part of network traffic sent by the
      adapter, which may make it more difficult to track down network problems
      or do general traffic analysis.
      
      Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that
      a network tap is attached, with a newly-created `dev_nit_active' helper
      wrapping the usual condition used in the transmit path.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f9a742d
    • W
      ethtool: fix a privilege escalation bug · 58f5bbe3
      Wenwen Wang 提交于
      In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
      use-space buffer 'useraddr' and checked to see whether it is
      ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
      the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
      according to 'sub_cmd', a permission check is enforced through the function
      ns_capable(). For example, the permission check is required if 'sub_cmd' is
      ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
      ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
      done by anyone". The following execution invokes different handlers
      according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
      ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
      object 'per_queue_opt' is copied again from the user-space buffer
      'useraddr' and 'per_queue_opt.sub_command' is used to determine which
      operation should be performed. Given that the buffer 'useraddr' is in the
      user space, a malicious user can race to change the sub-command between the
      two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
      ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
      before ethtool_set_per_queue() is called, the attacker changes
      ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
      bypass the permission check and execute ETHTOOL_SCOALESCE.
      
      This patch enforces a check in ethtool_set_per_queue() after the second
      copy from 'useraddr'. If the sub-command is different from the one obtained
      in the first copy in dev_ethtool(), an error code EINVAL will be returned.
      
      Fixes: f38d138a ("net/ethtool: support set coalesce per queue")
      Signed-off-by: NWenwen Wang <wang6495@umn.edu>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58f5bbe3
    • W
      ethtool: fix a missing-check bug · 2bb3207d
      Wenwen Wang 提交于
      In ethtool_get_rxnfc(), the eth command 'cmd' is compared against
      'ETHTOOL_GRXFH' to see whether it is necessary to adjust the variable
      'info_size'. Then the whole structure of 'info' is copied from the
      user-space buffer 'useraddr' with 'info_size' bytes. In the following
      execution, 'info' may be copied again from the buffer 'useraddr' depending
      on the 'cmd' and the 'info.flow_type'. However, after these two copies,
      there is no check between 'cmd' and 'info.cmd'. In fact, 'cmd' is also
      copied from the buffer 'useraddr' in dev_ethtool(), which is the caller
      function of ethtool_get_rxnfc(). Given that 'useraddr' is in the user
      space, a malicious user can race to change the eth command in the buffer
      between these copies. By doing so, the attacker can supply inconsistent
      data and cause undefined behavior because in the following execution 'info'
      will be passed to ops->get_rxnfc().
      
      This patch adds a necessary check on 'info.cmd' and 'cmd' to confirm that
      they are still same after the two copies in ethtool_get_rxnfc(). Otherwise,
      an error code EINVAL will be returned.
      Signed-off-by: NWenwen Wang <wang6495@umn.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2bb3207d
    • J
      bpf: Fix IPv6 dport byte-order in bpf_sk_lookup · 5ef0ae84
      Joe Stringer 提交于
      Commit 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      mistakenly passed the destination port in network byte-order to the IPv6
      TCP/UDP socket lookup functions, which meant that BPF writers would need
      to either manually swap the byte-order of this field or otherwise IPv6
      sockets could not be located via this helper.
      
      Fix the issue by swapping the byte-order appropriately in the helper.
      This also makes the API more consistent with the IPv4 version.
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      5ef0ae84
    • J
      bpf: Allow sk_lookup with IPv6 module · 8a615c6b
      Joe Stringer 提交于
      This is a more complete fix than d71019b5 ("net: core: Fix build
      with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
      module is loaded (not just if it's compiled in).
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      8a615c6b
    • D
      tls: convert to generic sk_msg interface · d829e9c4
      Daniel Borkmann 提交于
      Convert kTLS over to make use of sk_msg interface for plaintext and
      encrypted scattergather data, so it reuses all the sk_msg helpers
      and data structure which later on in a second step enables to glue
      this to BPF.
      
      This also allows to remove quite a bit of open coded helpers which
      are covered by the sk_msg API. Recent changes in kTLs 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
      4e6d4720 ("tls: Add support for inplace records encryption")
      changed the data path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and msg_encrypted sk_msg each.
      
      In the original code, the zerocopy_from_iter() has been used out
      of TX but also RX path. For the strparser skb-based RX path,
      we've left the zerocopy_from_iter() in decrypt_internal() mostly
      untouched, meaning it has been moved into tls_setup_from_iter()
      with charging logic removed (as not used from RX). Given RX path
      is not based on sk_msg objects, we haven't pursued setting up a
      dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
      could be an option to prusue in a later step.
      
      Joint work with John.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d829e9c4
    • D
      bpf, sockmap: convert to generic sk_msg interface · 604326b4
      Daniel Borkmann 提交于
      Add a generic sk_msg layer, and convert current sockmap and later
      kTLS over to make use of it. While sk_buff handles network packet
      representation from netdevice up to socket, sk_msg handles data
      representation from application to socket layer.
      
      This means that sk_msg framework spans across ULP users in the
      kernel, and enables features such as introspection or filtering
      of data with the help of BPF programs that operate on this data
      structure.
      
      Latter becomes in particular useful for kTLS where data encryption
      is deferred into the kernel, and as such enabling the kernel to
      perform L7 introspection and policy based on BPF for TLS connections
      where the record is being encrypted after BPF has run and came to
      a verdict. In order to get there, first step is to transform open
      coding of scatter-gather list handling into a common core framework
      that subsystems can use.
      
      The code itself has been split and refactored into three bigger
      pieces: i) the generic sk_msg API which deals with managing the
      scatter gather ring, providing helpers for walking and mangling,
      transferring application data from user space into it, and preparing
      it for BPF pre/post-processing, ii) the plain sock map itself
      where sockets can be attached to or detached from; these bits
      are independent of i) which can now be used also without sock
      map, and iii) the integration with plain TCP as one protocol
      to be used for processing L7 application data (later this could
      e.g. also be extended to other protocols like UDP). The semantics
      are the same with the old sock map code and therefore no change
      of user facing behavior or APIs. While pursuing this work it
      also helped finding a number of bugs in the old sockmap code
      that we've fixed already in earlier commits. The test_sockmap
      kselftest suite passes through fine as well.
      
      Joint work with John.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      604326b4
  6. 14 10月, 2018 1 次提交
  7. 13 10月, 2018 2 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
    • D
      net: Evict neighbor entries on carrier down · 859bd2ef
      David Ahern 提交于
      When a link's carrier goes down it could be a sign of the port changing
      networks. If the new network has overlapping addresses with the old one,
      then the kernel will continue trying to use neighbor entries established
      based on the old network until the entries finally age out - meaning a
      potentially long delay with communications not working.
      
      This patch evicts neighbor entries on carrier down with the exception of
      those marked permanent. Permanent entries are managed by userspace (either
      an admin or a routing daemon such as FRR).
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      859bd2ef
  8. 11 10月, 2018 7 次提交
    • S
      net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Sabrina Dubroca 提交于
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af7d6cce
    • D
      rtnetlink: Update comment in rtnl_stats_dump regarding strict data checking · e75fa073
      David Ahern 提交于
      The NLM_F_DUMP_PROPER_HDR netlink flag was replaced by a setsockopt.
      Update the comment in rtnl_stats_dump.
      
      Fixes: 841891ec ("rtnetlink: Update rtnl_stats_dump for strict data checking")
      Reported-by: NChristian Brauner <christian@brauner.io>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e75fa073
    • D
      rtnetlink: Move ifm in valid_fdb_dump_legacy to closer to use · 4565d7e5
      David Ahern 提交于
      Move setting of local variable ifm to after the message parsing in
      valid_fdb_dump_legacy. Avoid potential future use of unchecked variable.
      
      Fixes: 8dfbda19 ("rtnetlink: Move input checking for rtnl_fdb_dump to helper")
      Reported-by: NChristian Brauner <christian@brauner.io>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4565d7e5
    • E
      net: make skb_partial_csum_set() more robust against overflows · 52b5d6f5
      Eric Dumazet 提交于
      syzbot managed to crash in skb_checksum_help() [1] :
      
              BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
      
      Root cause is the following check in skb_partial_csum_set()
      
      	if (unlikely(start > skb_headlen(skb)) ||
      	    unlikely((int)start + off > skb_headlen(skb) - 2))
      		return false;
      
      If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff
      and the check fails to detect that ((int)start + off) is off the limit,
      since the compare is unsigned.
      
      When we fix that, then the first condition (start > skb_headlen(skb))
      becomes obsolete.
      
      Then we should also check that (skb_headroom(skb) + start) wont
      overflow 16bit field.
      
      [1]
      kernel BUG at net/core/dev.c:2880!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880
      Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf
      RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293
      RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7
      RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006
      RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000
      R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001
      R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8
      FS:  00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269
       validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312
       __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
       packet_snd net/packet/af_packet.c:2928 [inline]
       packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953
      
      Fixes: 5ff8dda3 ("net: Ensure partial checksum offset is inside the skb head")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52b5d6f5
    • M
      devlink: Add helper function for safely copy string param · bde74ad1
      Moshe Shemesh 提交于
      Devlink string param buffer is allocated at the size of
      DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure
      this size is not exceeded.
      Renamed DEVLINK_PARAM_MAX_STRING_VALUE to
      __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by
      devlink only. The driver should use the helper function instead to
      verify it doesn't exceed the allowed length.
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bde74ad1
    • M
      devlink: Fix param cmode driverinit for string type · 1276534c
      Moshe Shemesh 提交于
      Driverinit configuration mode value is held by devlink to enable the
      driver fetch the value after reload command. In case the param type is
      string devlink should copy the value from driver string buffer to
      devlink string buffer on devlink_param_driverinit_value_set() and
      vice-versa on devlink_param_driverinit_value_get().
      
      Fixes: ec01aeb1 ("devlink: Add support for get/set driverinit value")
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1276534c
    • M
      devlink: Fix param set handling for string type · f355cfcd
      Moshe Shemesh 提交于
      In case devlink param type is string, it needs to copy the string value
      it got from the input to devlink_param_value.
      
      Fixes: e3b7ca18 ("devlink: Add param set command")
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f355cfcd
  9. 10 10月, 2018 1 次提交
    • J
      net: fix generic XDP to handle if eth header was mangled · 29724956
      Jesper Dangaard Brouer 提交于
      XDP can modify (and resize) the Ethernet header in the packet.
      
      There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
      are setup before reaching (netif_receive_)generic_xdp.
      
      This bug was hit when XDP were popping VLAN headers (changing
      eth->h_proto), as skb->protocol still contains VLAN-indication
      (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
      the packet (basically popping the VLAN again).
      
      This patch catch if XDP changed eth header in such a way, that SKB
      fields needs to be updated.
      
      V2: on request from Song Liu, use ETH_HLEN instead of mac_len,
      in __skb_push() as eth_type_trans() use ETH_HLEN in paired skb_pull_inline().
      
      Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      29724956
  10. 09 10月, 2018 11 次提交