1. 27 4月, 2018 2 次提交
  2. 25 4月, 2018 7 次提交
    • C
      ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers · c04d2cb2
      Chris Novakovic 提交于
      Distributed filesystems are most effective when the server and client
      clocks are synchronised. Embedded devices often use NFS for their
      root filesystem but typically do not contain an RTC, so the clocks of
      the NFS server and the embedded device will be out-of-sync when the root
      filesystem is mounted (and may not be synchronised until late in the
      boot process).
      
      Extend ipconfig with the ability to export IP addresses of NTP servers
      it discovers to /proc/net/ipconfig/ntp_servers. They can be supplied as
      follows:
      
       - If ipconfig is configured manually via the "ip=" or "nfsaddrs="
         kernel command line parameters, one NTP server can be specified in
         the new "<ntp0-ip>" parameter.
       - If ipconfig is autoconfigured via DHCP, request DHCP option 42 in
         the DHCPDISCOVER message, and record the IP addresses of up to three
         NTP servers sent by the responding DHCP server in the subsequent
         DHCPOFFER message.
      
      ipconfig will only write the NTP server IP addresses it discovers to
      /proc/net/ipconfig/ntp_servers, one per line (in the order received from
      the DHCP server, if DHCP autoconfiguration is used); making use of these
      NTP servers is the responsibility of a user space process (e.g. an
      initrd/initram script that invokes an NTP client before mounting an NFS
      root filesystem).
      Signed-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c04d2cb2
    • C
      ipconfig: Create /proc/net/ipconfig directory · 4d019b3f
      Chris Novakovic 提交于
      To allow ipconfig to report IP configuration details to user space
      processes without cluttering /proc/net, create a new subdirectory
      /proc/net/ipconfig. All files containing IP configuration details should
      be written to this directory.
      Signed-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d019b3f
    • C
      ipconfig: Correctly initialise ic_nameservers · 300eec7c
      Chris Novakovic 提交于
      ic_nameservers, which stores the list of name servers discovered by
      ipconfig, is initialised (i.e. has all of its elements set to NONE, or
      0xffffffff) by ic_nameservers_predef() in the following scenarios:
      
       - before the "ip=" and "nfsaddrs=" kernel command line parameters are
         parsed (in ip_auto_config_setup());
       - before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in
         order to clear any values that may have been set after parsing "ip="
         or "nfsaddrs=" and are no longer needed.
      
      This means that ic_nameservers_predef() is not called when neither "ip="
      nor "nfsaddrs=" is specified on the kernel command line. In this
      scenario, every element in ic_nameservers remains set to 0x00000000,
      which is indistinguishable from ANY and causes pnp_seq_show() to write
      the following (bogus) information to /proc/net/pnp:
      
        #MANUAL
        nameserver 0.0.0.0
        nameserver 0.0.0.0
        nameserver 0.0.0.0
      
      This is potentially problematic for systems that blindly link
      /etc/resolv.conf to /proc/net/pnp.
      
      Ensure that ic_nameservers is also initialised when neither "ip=" nor
      "nfsaddrs=" are specified by calling ic_nameservers_predef() in
      ip_auto_config(), but only when ip_auto_config_setup() was not called
      earlier. This causes the following to be written to /proc/net/pnp, and
      is consistent with what gets written when ipconfig is configured
      manually but no name servers are specified on the kernel command line:
      
        #MANUAL
      Signed-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      300eec7c
    • C
      ipconfig: BOOTP: Request CONF_NAMESERVERS_MAX name servers · de1fa15b
      Chris Novakovic 提交于
      When ipconfig is autoconfigured via BOOTP, the request packet
      initialised by ic_bootp_init_ext() always allocates 8 bytes for the name
      server option, limiting the BOOTP server to responding with at most 2
      name servers even though ipconfig in fact supports an arbitrary number
      of name servers (as defined by CONF_NAMESERVERS_MAX, which is currently
      3).
      
      Only request name servers in the request packet if CONF_NAMESERVERS_MAX
      is positive (to comply with [1, §3.8]), and allocate enough space in the
      packet for CONF_NAMESERVERS_MAX name servers to indicate the maximum
      number we can accept in response.
      
      [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
          https://tools.ietf.org/rfc/rfc2132.txtSigned-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de1fa15b
    • C
      ipconfig: BOOTP: Don't request IEN-116 name servers · 4e1a8af2
      Chris Novakovic 提交于
      When ipconfig is autoconfigured via BOOTP, the request packet
      initialised by ic_bootp_init_ext() allocates 8 bytes for tag 5 ("Name
      Server" [1, §3.7]), but tag 5 in the response isn't processed by
      ic_do_bootp_ext(). Instead, allocate the 8 bytes to tag 6 ("Domain Name
      Server" [1, §3.8]), which is processed by ic_do_bootp_ext(), and appears
      to have been the intended tag to request.
      
      This won't cause any breakage for existing users, as tag 5 responses
      provided by BOOTP servers weren't being processed anyway.
      
      [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
          https://tools.ietf.org/rfc/rfc2132.txtSigned-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e1a8af2
    • C
      ipconfig: Tidy up reporting of name servers · e18bdc83
      Chris Novakovic 提交于
      Commit 5e953778 ("ipconfig: add
      nameserver IPs to kernel-parameter ip=") adds the IP addresses of
      discovered name servers to the summary printed by ipconfig when
      configuration is complete. It appears the intention in ip_auto_config()
      was to print the name servers on a new line (especially given the
      spacing and lack of comma before "nameserver0="), but they're actually
      printed on the same line as the NFS root filesystem configuration
      summary:
      
        [    0.686186] IP-Config: Complete:
        [    0.686226]      device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
        [    0.686328]      host=test, domain=example.com, nis-domain=(none)
        [    0.686386]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=     nameserver0=10.0.0.1
      
      This makes it harder to read and parse ipconfig's output. Instead, print
      the name servers on a separate line:
      
        [    0.791250] IP-Config: Complete:
        [    0.791289]      device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
        [    0.791407]      host=test, domain=example.com, nis-domain=(none)
        [    0.791475]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
        [    0.791476]      nameserver0=10.0.0.1
      Signed-off-by: NChris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e18bdc83
    • E
      tcp: md5: only call tp->af_specific->md5_lookup() for md5 sockets · 8c2320e8
      Eric Dumazet 提交于
      RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
      given they have no result.
      We can omit the calls for sockets that have no md5 keys.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c2320e8
  3. 24 4月, 2018 1 次提交
  4. 23 4月, 2018 4 次提交
    • Y
      net: init sk_cookie for inet socket · c6849a3a
      Yafang Shao 提交于
      With sk_cookie we can identify a socket, that is very helpful for
      traceing and statistic, i.e. tcp tracepiont and ebpf.
      So we'd better init it by default for inet socket.
      When using it, we just need call atomic64_read(&sk->sk_cookie).
      Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6849a3a
    • R
      net: fib_rules: add extack support · b16fb418
      Roopa Prabhu 提交于
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b16fb418
    • Y
      net: introduce a new tracepoint for tcp_rcv_space_adjust · 6163849d
      Yafang Shao 提交于
      tcp_rcv_space_adjust is called every time data is copied to user space,
      introducing a tcp tracepoint for which could show us when the packet is
      copied to user.
      
      When a tcp packet arrives, tcp_rcv_established() will be called and with
      the existed tracepoint tcp_probe we could get the time when this packet
      arrives.
      Then this packet will be copied to user, and tcp_rcv_space_adjust will
      be called and with this new introduced tracepoint we could get the time
      when this packet is copied to user.
      With these two tracepoints, we could figure out whether the user program
      processes this packet immediately or there's latency.
      
      Hence in the printk message, sk_cookie is printed as a key to relate
      tcp_rcv_space_adjust with tcp_probe.
      
      Maybe we could export sockfd in this new tracepoint as well, then we
      could relate this new tracepoint with epoll/read/recv* tracepoints, and
      finally that could show us the whole lifespan of this packet. But we
      could also implement that with pid as these functions are executed in
      process context.
      Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6163849d
    • J
      tcp: don't read out-of-bounds opsize · 7e5a206a
      Jann Horn 提交于
      The old code reads the "opsize" variable from out-of-bounds memory (first
      byte behind the segment) if a broken TCP segment ends directly after an
      opcode that is neither EOL nor NOP.
      
      The result of the read isn't used for anything, so the worst thing that
      could theoretically happen is a pagefault; and since the physmap is usually
      mostly contiguous, even that seems pretty unlikely.
      
      The following C reproducer triggers the uninitialized read - however, you
      can't actually see anything happen unless you put something like a
      pr_warn() in tcp_parse_md5sig_option() to print the opsize.
      
      ====================================
      #define _GNU_SOURCE
      #include <arpa/inet.h>
      #include <stdlib.h>
      #include <errno.h>
      #include <stdarg.h>
      #include <net/if.h>
      #include <linux/if.h>
      #include <linux/ip.h>
      #include <linux/tcp.h>
      #include <linux/in.h>
      #include <linux/if_tun.h>
      #include <err.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <assert.h>
      
      void systemf(const char *command, ...) {
        char *full_command;
        va_list ap;
        va_start(ap, command);
        if (vasprintf(&full_command, command, ap) == -1)
          err(1, "vasprintf");
        va_end(ap);
        printf("systemf: <<<%s>>>\n", full_command);
        system(full_command);
      }
      
      char *devname;
      
      int tun_alloc(char *name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd == -1)
          err(1, "open tun dev");
        static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
        strcpy(req.ifr_name, name);
        if (ioctl(fd, TUNSETIFF, &req))
          err(1, "TUNSETIFF");
        devname = req.ifr_name;
        printf("device name: %s\n", devname);
        return fd;
      }
      
      #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
      
      void sum_accumulate(unsigned int *sum, void *data, int len) {
        assert((len&2)==0);
        for (int i=0; i<len/2; i++) {
          *sum += ntohs(((unsigned short *)data)[i]);
        }
      }
      
      unsigned short sum_final(unsigned int sum) {
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);
        return htons(~sum);
      }
      
      void fix_ip_sum(struct iphdr *ip) {
        unsigned int sum = 0;
        sum_accumulate(&sum, ip, sizeof(*ip));
        ip->check = sum_final(sum);
      }
      
      void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
        unsigned int sum = 0;
        struct {
          unsigned int saddr;
          unsigned int daddr;
          unsigned char pad;
          unsigned char proto_num;
          unsigned short tcp_len;
        } fakehdr = {
          .saddr = ip->saddr,
          .daddr = ip->daddr,
          .proto_num = ip->protocol,
          .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
        };
        sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
        sum_accumulate(&sum, tcp, tcp->doff*4);
        tcp->check = sum_final(sum);
      }
      
      int main(void) {
        int tun_fd = tun_alloc("inject_dev%d");
        systemf("ip link set %s up", devname);
        systemf("ip addr add 192.168.42.1/24 dev %s", devname);
      
        struct {
          struct iphdr ip;
          struct tcphdr tcp;
          unsigned char tcp_opts[20];
        } __attribute__((packed)) syn_packet = {
          .ip = {
            .ihl = sizeof(struct iphdr)/4,
            .version = 4,
            .tot_len = htons(sizeof(syn_packet)),
            .ttl = 30,
            .protocol = IPPROTO_TCP,
            /* FIXUP check */
            .saddr = IPADDR(192,168,42,2),
            .daddr = IPADDR(192,168,42,1)
          },
          .tcp = {
            .source = htons(1),
            .dest = htons(1337),
            .seq = 0x12345678,
            .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
            .syn = 1,
            .window = htons(64),
            .check = 0 /*FIXUP*/
          },
          .tcp_opts = {
            /* INVALID: trailing MD5SIG opcode after NOPs */
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 19
          }
        };
        fix_ip_sum(&syn_packet.ip);
        fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
        while (1) {
          int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
          if (write_res != sizeof(syn_packet))
            err(1, "packet write failed");
        }
      }
      ====================================
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e5a206a
  5. 20 4月, 2018 4 次提交
  6. 18 4月, 2018 1 次提交
    • D
      net: Move fib_convert_metrics to metrics file · a919525a
      David Ahern 提交于
      Move logic of fib_convert_metrics into ip_metrics_convert. This allows
      the code that converts netlink attributes into metrics struct to be
      re-used in a later patch by IPv6.
      
      This is mostly a code move with the following changes to variable names:
        - fi->fib_net becomes net
        - fc_mx and fc_mx_len are passed as inputs pulled from fib_config
        - metrics array is passed as an input from fi->fib_metrics->metrics
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a919525a
  7. 17 4月, 2018 5 次提交
    • E
      tcp: implement mmap() for zero copy receive · 93ab6cc6
      Eric Dumazet 提交于
      Some networks can make sure TCP payload can exactly fit 4KB pages,
      with well chosen MSS/MTU and architectures.
      
      Implement mmap() system call so that applications can avoid
      copying data without complex splice() games.
      
      Note that a successful mmap( X bytes) on TCP socket is consuming
      bytes, as if recvmsg() has been done. (tp->copied += X)
      
      Only PROT_READ mappings are accepted, as skb page frags
      are fundamentally shared and read only.
      
      If tcp_mmap() finds data that is not a full page, or a patch of
      urgent data, -EINVAL is returned, no bytes are consumed.
      
      Application must fallback to recvmsg() to read the problematic sequence.
      
      mmap() wont block,  regardless of socket being in blocking or
      non-blocking mode. If not enough bytes are in receive queue,
      mmap() would return -EAGAIN, or -EIO if socket is in a state
      where no other bytes can be added into receive queue.
      
      An application might use SO_RCVLOWAT, poll() and/or ioctl( FIONREAD)
      to efficiently use mmap()
      
      On the sender side, MSG_EOR might help to clearly separate unaligned
      headers and 4K-aligned chunks if necessary.
      
      Tested:
      
      mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
      MTU set to 4168  (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)
      
      Without mmap() (tcp_mmap -s)
      
      received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
        cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
      received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
        cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
      received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
        cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
      received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
        cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches
      
      With mmap() on receiver (tcp_mmap -s -z)
      
      received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
        cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
        cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
        cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
      received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
        cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93ab6cc6
    • E
      tcp: avoid extra wakeups for SO_RCVLOWAT users · 03f45c88
      Eric Dumazet 提交于
      SO_RCVLOWAT is properly handled in tcp_poll(), so that POLLIN is only
      generated when enough bytes are available in receive queue, after
      David change (commit c7004482 "tcp: Respect SO_RCVLOWAT in tcp_poll().")
      
      But TCP still calls sk->sk_data_ready() for each chunk added in receive
      queue, meaning thread is awaken, and goes back to sleep shortly after.
      
      Tested:
      
      tcp_mmap test program, receiving 32768 MB of data with SO_RCVLOWAT set to 512KB
      
      -> Should get ~2 wakeups (c-switches) per MB, regardless of how many
      (tiny or big) packets were received.
      
      High speed (mostly full size GRO packets)
      
      received 32768 MB (100 % mmap'ed) in 8.03112 s, 34.2266 Gbit,
        cpu usage user:0.037 sys:1.404, 43.9758 usec per MB, 65497 c-switches
      
      received 32768 MB (99.9954 % mmap'ed) in 7.98453 s, 34.4263 Gbit,
        cpu usage user:0.03 sys:1.422, 44.3115 usec per MB, 65485 c-switches
      
      Low speed (sender is ratelimited and sends 1-MSS at a time, so GRO is not helping)
      
      received 22474.5 MB (100 % mmap'ed) in 6015.35 s, 0.0313414 Gbit,
        cpu usage user:0.05 sys:1.586, 72.7952 usec per MB, 44950 c-switches
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03f45c88
    • E
      tcp: fix delayed acks behavior for SO_RCVLOWAT · 796f82ea
      Eric Dumazet 提交于
      We should not delay acks if there are not enough bytes
      in receive queue to satisfy SO_RCVLOWAT.
      
      Since [E]POLLIN event is not going to be generated, there is little
      hope for a delayed ack to be useful.
      
      In fact, delaying ACK prevents sender from completing
      the transfer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      796f82ea
    • E
      tcp: fix SO_RCVLOWAT and RCVBUF autotuning · d1361840
      Eric Dumazet 提交于
      Applications might use SO_RCVLOWAT on TCP socket hoping to receive
      one [E]POLLIN event only when a given amount of bytes are ready in socket
      receive queue.
      
      Problem is that receive autotuning is not aware of this constraint,
      meaning sk_rcvbuf might be too small to allow all bytes to be stored.
      
      Add a new (struct proto_ops)->set_rcvlowat method so that a protocol
      can override the default setsockopt(SO_RCVLOWAT) behavior.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1361840
    • G
      net: Fix one possible memleak in ip_setup_cork · 9783ccd0
      Gao Feng 提交于
      It would allocate memory in this function when the cork->opt is NULL. But
      the memory isn't freed if failed in the latter rt check, and return error
      directly. It causes the memleak if its caller is ip_make_skb which also
      doesn't free the cork->opt when meet a error.
      
      Now move the rt check ahead to avoid the memleak.
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9783ccd0
  8. 16 4月, 2018 1 次提交
  9. 13 4月, 2018 1 次提交
    • E
      tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets · 72123032
      Eric Dumazet 提交于
      syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
      
      I believe this was caused by a TCP_MD5SIG being set on live
      flow.
      
      This is highly unexpected, since TCP option space is limited.
      
      For instance, presence of TCP MD5 option automatically disables
      TCP TimeStamp option at SYN/SYNACK time, which we can not do
      once flow has been established.
      
      Really, adding/deleting an MD5 key only makes sense on sockets
      in CLOSE or LISTEN state.
      
      [1]
      BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
      CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
       tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline]
       tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184
       tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x448fe9
      RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9
      RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004
      RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010
      R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000
      R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624
       __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline]
       tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline]
       tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72123032
  10. 10 4月, 2018 1 次提交
  11. 09 4月, 2018 1 次提交
    • E
      inetpeer: fix uninit-value in inet_getpeer · b6a37e5e
      Eric Dumazet 提交于
      syzbot/KMSAN reported that p->dtime was read while it was
      not yet initialized in :
      
      	delta = (__u32)jiffies - p->dtime;
      	if (delta < ttl || !refcount_dec_if_one(&p->refcnt))
      		gc_stack[i] = NULL;
      
      This is a false positive, because the inetpeer wont be erased
      from rb-tree if the refcount_dec_if_one(&p->refcnt) does not
      succeed. And this wont happen before first inet_putpeer() call
      for this inetpeer has been done, and ->dtime field is written
      exactly before the refcount_dec_and_test(&p->refcnt).
      
      The KMSAN report was :
      
      BUG: KMSAN: uninit-value in inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
      BUG: KMSAN: uninit-value in inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
      CPU: 0 PID: 9494 Comm: syz-executor5 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
       inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
       inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
       icmpv4_xrlim_allow net/ipv4/icmp.c:330 [inline]
       icmp_send+0x2b44/0x3050 net/ipv4/icmp.c:725
       ip_options_compile+0x237c/0x29f0 net/ipv4/ip_options.c:472
       ip_rcv_options net/ipv4/ip_input.c:284 [inline]
       ip_rcv_finish+0xda8/0x16d0 net/ipv4/ip_input.c:365
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
       __netif_receive_skb net/core/dev.c:4627 [inline]
       netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
       netif_receive_skb+0x230/0x240 net/core/dev.c:4725
       tun_rx_batched drivers/net/tun.c:1555 [inline]
       tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
       tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
       do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
       do_iter_write+0x30d/0xd40 fs/read_write.c:932
       vfs_writev fs/read_write.c:977 [inline]
       do_writev+0x3c9/0x830 fs/read_write.c:1012
       SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
       SyS_writev+0x56/0x80 fs/read_write.c:1082
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x455111
      RSP: 002b:00007fae0365cba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 000000000000002e RCX: 0000000000455111
      RDX: 0000000000000001 RSI: 00007fae0365cbf0 RDI: 00000000000000fc
      RBP: 0000000020000040 R08: 00000000000000fc R09: 0000000000000000
      R10: 000000000000002e R11: 0000000000000293 R12: 00000000ffffffff
      R13: 0000000000000658 R14: 00000000006fc8e0 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
       inet_getpeer+0xed8/0x1e70 net/ipv4/inetpeer.c:210
       inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
       ip4_frag_init+0x4d1/0x740 net/ipv4/ip_fragment.c:153
       inet_frag_alloc net/ipv4/inet_fragment.c:369 [inline]
       inet_frag_create net/ipv4/inet_fragment.c:385 [inline]
       inet_frag_find+0x7da/0x1610 net/ipv4/inet_fragment.c:418
       ip_find net/ipv4/ip_fragment.c:275 [inline]
       ip_defrag+0x448/0x67a0 net/ipv4/ip_fragment.c:676
       ip_check_defrag+0x775/0xda0 net/ipv4/ip_fragment.c:724
       packet_rcv_fanout+0x2a8/0x8d0 net/packet/af_packet.c:1447
       deliver_skb net/core/dev.c:1897 [inline]
       deliver_ptype_list_skb net/core/dev.c:1912 [inline]
       __netif_receive_skb_core+0x314a/0x4a80 net/core/dev.c:4545
       __netif_receive_skb net/core/dev.c:4627 [inline]
       netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
       netif_receive_skb+0x230/0x240 net/core/dev.c:4725
       tun_rx_batched drivers/net/tun.c:1555 [inline]
       tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
       tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
       do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
       do_iter_write+0x30d/0xd40 fs/read_write.c:932
       vfs_writev fs/read_write.c:977 [inline]
       do_writev+0x3c9/0x830 fs/read_write.c:1012
       SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
       SyS_writev+0x56/0x80 fs/read_write.c:1082
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6a37e5e
  12. 08 4月, 2018 2 次提交
    • E
      soreuseport: initialise timewait reuseport field · 3099a529
      Eric Dumazet 提交于
      syzbot reported an uninit-value in inet_csk_bind_conflict() [1]
      
      It turns out we never propagated sk->sk_reuseport into timewait socket.
      
      [1]
      BUG: KMSAN: uninit-value in inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151
      CPU: 1 PID: 3589 Comm: syzkaller008242 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151
       inet_csk_get_port+0x1d28/0x1e40 net/ipv4/inet_connection_sock.c:320
       inet6_bind+0x121c/0x1820 net/ipv6/af_inet6.c:399
       SYSC_bind+0x3f2/0x4b0 net/socket.c:1474
       SyS_bind+0x54/0x80 net/socket.c:1460
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x4416e9
      RSP: 002b:00007ffce6d15c88 EFLAGS: 00000217 ORIG_RAX: 0000000000000031
      RAX: ffffffffffffffda RBX: 0100000000000000 RCX: 00000000004416e9
      RDX: 000000000000001c RSI: 0000000020402000 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 00000000e6d15e08 R09: 00000000e6d15e08
      R10: 0000000000000004 R11: 0000000000000217 R12: 0000000000009478
      R13: 00000000006cd448 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       tcp_time_wait+0xf17/0xf50 net/ipv4/tcp_minisocks.c:283
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       inet_twsk_alloc+0xaef/0xc00 net/ipv4/inet_timewait_sock.c:182
       tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
       inet_twsk_alloc+0x13b/0xc00 net/ipv4/inet_timewait_sock.c:163
       tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: da5e3630 ("soreuseport: TCP/IPv4 implementation")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3099a529
    • E
      ipv4: fix uninit-value in ip_route_output_key_hash_rcu() · d0ea2b12
      Eric Dumazet 提交于
      syzbot complained that res.type could be used while not initialized.
      
      Using RTN_UNSPEC as initial value seems better than using garbage.
      
      BUG: KMSAN: uninit-value in __mkroute_output net/ipv4/route.c:2200 [inline]
      BUG: KMSAN: uninit-value in ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493
      CPU: 1 PID: 12207 Comm: syz-executor0 Not tainted 4.16.0+ #81
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       __mkroute_output net/ipv4/route.c:2200 [inline]
       ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493
       ip_route_output_key_hash net/ipv4/route.c:2322 [inline]
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x1eb/0x3c0 net/ipv4/route.c:2577
       raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x455259
      RSP: 002b:00007fdc0625dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007fdc0625e6d4 RCX: 0000000000455259
      RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013
      RBP: 000000000072bea0 R08: 0000000020000080 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000004f7 R14: 00000000006fa7c8 R15: 0000000000000000
      
      Local variable description: ----res.i.i@ip_route_output_flow
      Variable was created at:
       ip_route_output_flow+0x75/0x3c0 net/ipv4/route.c:2576
       raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0ea2b12
  13. 07 4月, 2018 2 次提交
  14. 06 4月, 2018 3 次提交
    • R
      headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap 提交于
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      514c6032
    • M
      arp: fix arp_filter on l3slave devices · 58b35f27
      Miguel Fadon Perlines 提交于
      arp_filter performs an ip_route_output search for arp source address and
      checks if output device is the same where the arp request was received,
      if it is not, the arp request is not answered.
      
      This route lookup is always done on main route table so l3slave devices
      never find the proper route and arp is not answered.
      
      Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the
      lookup for l3slave devices while maintaining same behavior for non
      l3slave devices as this function returns 0 in that case.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Signed-off-by: NMiguel Fadon Perlines <mfadon@teldat.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58b35f27
    • E
      ip_tunnel: better validate user provided tunnel names · 9cb726a2
      Eric Dumazet 提交于
      Use dev_valid_name() to make sure user does not provide illegal
      device name.
      
      syzbot caught the following bug :
      
      BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
      BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
      Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482
      
      CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x1b9/0x29f lib/dump_stack.c:53
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
       check_memory_region_inline mm/kasan/kasan.c:260 [inline]
       check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
       memcpy+0x37/0x50 mm/kasan/kasan.c:303
       strlcpy include/linux/string.h:300 [inline]
       __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
       ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline]
       ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861
       ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350
       dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
       dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
       sock_ioctl+0x47e/0x680 net/socket.c:1015
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       SYSC_ioctl fs/ioctl.c:708 [inline]
       SyS_ioctl+0x24/0x30 fs/ioctl.c:706
       do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cb726a2
  15. 05 4月, 2018 1 次提交
  16. 04 4月, 2018 1 次提交
    • P
      net: avoid unneeded atomic operation in ip*_append_data() · 9e8445a5
      Paolo Abeni 提交于
      After commit 694aba69 ("ipv4: factorize sk_wmem_alloc updates
      done by __ip_append_data()") and commit 1f4c6eb2 ("ipv6:
      factorize sk_wmem_alloc updates done by __ip6_append_data()"),
      when transmitting sub MTU datagram, an addtional, unneeded atomic
      operation is performed in ip*_append_data() to update wmem_alloc:
      in the above condition the delta is 0.
      
      The above cause small but measurable performance regression in UDP
      xmit tput test with packet size below MTU.
      
      This change avoids such overhead updating wmem_alloc only if
      wmem_alloc_delta is non zero.
      
      The error path is left intentionally unmodified: it's a slow path
      and simplicity is preferred to performances.
      
      Fixes: 694aba69 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
      Fixes: 1f4c6eb2 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e8445a5
  17. 02 4月, 2018 2 次提交
    • X
      route: check sysctl_fib_multipath_use_neigh earlier than hash · 6174a30d
      Xin Long 提交于
      Prior to this patch, when one packet is hashed into path [1]
      (hash <= nh_upper_bound) and it's neigh is dead, it will try
      path [2]. However, if path [2]'s neigh is alive but it's
      hash > nh_upper_bound, it will not return this alive path.
      This packet will never be sent even if path [2] is alive.
      
       3.3.3.1/24:
        nexthop via 1.1.1.254 dev eth1 weight 1 <--[1] (dead neigh)
        nexthop via 2.2.2.254 dev eth2 weight 1 <--[2]
      
      With sysctl_fib_multipath_use_neigh set is supposed to find an
      available path respecting to the l3/l4 hash. But if there is
      no available route with this hash, it should at least return
      an alive route even with other hash.
      
      This patch is to fix it by processing fib_multipath_use_neigh
      earlier than the hash check, so that it will at least return
      an alive route if there is when fib_multipath_use_neigh is
      enabled. It's also compatible with before when there are alive
      routes with the l3/l4 hash.
      
      Fixes: a6db4494 ("net: ipv4: Consider failed nexthops in multipath routes")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6174a30d
    • E
      ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() · 694aba69
      Eric Dumazet 提交于
      While testing my inet defrag changes, I found that the senders
      could spend ~20% of cpu cycles in skb_set_owner_w() updating
      sk->sk_wmem_alloc for every fragment they cook.
      
      The solution to this problem is to use alloc_skb() instead
      of sock_wmalloc() and manually perform a single sk_wmem_alloc change.
      
      Similar change for IPv6 is provided in following patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      694aba69
  18. 01 4月, 2018 1 次提交