1. 27 4月, 2018 1 次提交
    • J
      bpf: fix uninitialized variable in bpf tools · 81542556
      John Fastabend 提交于
      Here the variable cont is used as the saved_pointer for a call to
      strtok_r(). It is safe to use the value uninitialized in this
      context however and the later reference is only ever used if
      the strtok_r is successful. But, 'gcc-5' at least doesn't have all
      this knowledge so initialize cont to NULL. Additionally, do the
      natural NULL check before accessing just for completness.
      
      The warning is the following:
      
      ./bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’:
      ./bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        } else if (matches(subcmd, "pcap") == 0) {
      
      Fixes: fd981e3c "filter: bpf_dbg: add minimal bpf debugger"
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      81542556
  2. 26 4月, 2018 5 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 25eb0ea7
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-04-25
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix to clear the percpu metadata_dst that could otherwise carry
         stale ip_tunnel_info, from William.
      
      2) Fix that reduces the number of passes in x64 JIT with regards to
         dead code sanitation to avoid risk of prog rejection, from Gianluca.
      
      3) Several fixes of sockmap programs, besides others, fixing a double
         page_put() in error path, missing refcount hold for pinned sockmap,
         adding required -target bpf for clang in sample Makefile, from John.
      
      4) Fix to disable preemption in __BPF_PROG_RUN_ARRAY() paths, from Roman.
      
      5) Fix tools/bpf/ Makefile with regards to a lex/yacc build error
         seen on older gcc-5, from John.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25eb0ea7
    • J
      bpf: fix for lex/yacc build error with gcc-5 · 9c299a32
      John Fastabend 提交于
      Fix build error found with Ubuntu shipped gcc-5
      
      ~/git/bpf/tools/bpf$ make all
      
      Auto-detecting system features:
      ...                        libbfd: [ OFF ]
      ...        disassembler-four-args: [ OFF ]
      
        CC       bpf_jit_disasm.o
        LINK     bpf_jit_disasm
        CC       bpf_dbg.o
      /home/john/git/bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’:
      /home/john/git/bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        } else if (matches(subcmd, "pcap") == 0) {
                   ^
        LINK     bpf_dbg
        CC       bpf_asm.o
      make: *** No rule to make target `bpf_exp.yacc.o', needed by `bpf_asm'.  Stop.
      
      Fixes: 5a8997f2 ("tools: bpf: respect output directory during build")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9c299a32
    • D
      rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp · 91a82529
      Dag Moxnes 提交于
      The function rds_ib_setup_qp is calling rds_ib_get_client_data and
      should correspondingly call rds_ib_dev_put. This call was lost in
      the non-error path with the introduction of error handling done in
      commit 3b12f73a ("rds: ib: add error handle")
      Signed-off-by: NDag Moxnes <dag.moxnes@oracle.com>
      Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91a82529
    • U
      net/smc: keep clcsock reference in smc_tcp_listen_work() · 070204a3
      Ursula Braun 提交于
      The internal CLC socket should exist till the SMC-socket is released.
      Function tcp_listen_worker() releases the internal CLC socket of a
      listen socket, if an smc_close_active() is called. This function
      is called for the final release(), but it is called for shutdown
      SHUT_RDWR as well. This opens a door for protection faults, if
      socket calls using the internal CLC socket are called for a
      shutdown listen socket.
      
      With the changes of
      commit 3d502067 ("net/smc: simplify wait when closing listen socket")
      there is no need anymore to release the internal CLC socket in
      function tcp_listen_worker((). It is sufficient to release it in
      smc_release().
      
      Fixes: 127f4970 ("net/smc: release clcsock from tcp_listen_worker")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Reported-by: syzbot+9045fc589fcd196ef522@syzkaller.appspotmail.com
      Reported-by: syzbot+28a2c86cf19c81d871fa@syzkaller.appspotmail.com
      Reported-by: syzbot+9605e6cace1b5efd4a0a@syzkaller.appspotmail.com
      Reported-by: syzbot+cf9012c597c8379d535c@syzkaller.appspotmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      070204a3
    • A
      net: phy: allow scanning busses with missing phys · 02a6efca
      Alexandre Belloni 提交于
      Some MDIO busses will error out when trying to read a phy address with no
      phy present at that address. In that case, probing the bus will fail
      because __mdiobus_register() is scanning the bus for all possible phys
      addresses.
      
      In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at
      this address and set the phy ID to 0xffffffff which is then properly
      handled in get_phy_device().
      Suggested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02a6efca
  3. 25 4月, 2018 10 次提交
    • G
      bpf, x64: fix JIT emission for dead code · 1612a981
      Gianluca Borello 提交于
      Commit 2a5418a1 ("bpf: improve dead code sanitizing") replaced dead
      code with a series of ja-1 instructions, for safety. That made JIT
      compilation much more complex for some BPF programs. One instance of such
      programs is, for example:
      
      bool flag = false
      ...
      /* A bunch of other code */
      ...
      if (flag)
              do_something()
      
      In some cases llvm is not able to remove at compile time the code for
      do_something(), so the generated BPF program ends up with a large amount
      of dead instructions. In one specific real life example, there are two
      series of ~500 and ~1000 dead instructions in the program. When the
      verifier replaces them with a series of ja-1 instructions, it causes an
      interesting behavior at JIT time.
      
      During the first pass, since all the instructions are estimated at 64
      bytes, the ja-1 instructions end up being translated as 5 bytes JMP
      instructions (0xE9), since the jump offsets become increasingly large (>
      127) as each instruction gets discovered to be 5 bytes instead of the
      estimated 64.
      
      Starting from the second pass, the first N instructions of the ja-1
      sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets
      become <= 127 this time. In particular, N is defined as roughly 127 / (5
      - 2) ~= 42. So, each further pass will make the subsequent N JMP
      instructions shrink from 5 to 2 bytes, making the image shrink every time.
      This means that in order to have the entire program converge, there need
      to be, in the real example above, at least ~1000 / 42 ~= 24 passes just
      for translating the dead code. If we add this number to the passes needed
      to translate the other non dead code, it brings such program to 40+
      passes, and JIT doesn't complete. Ultimately the userspace loader fails
      because such BPF program was supposed to be part of a prog array owner
      being JITed.
      
      While it is certainly possible to try to refactor such programs to help
      the compiler remove dead code, the behavior is not really intuitive and it
      puts further burden on the BPF developer who is not expecting such
      behavior. To make things worse, such programs are working just fine in all
      the kernel releases prior to the ja-1 fix.
      
      A possible approach to mitigate this behavior consists into noticing that
      for ja-1 instructions we don't really need to rely on the estimated size
      of the previous and current instructions, we know that a -1 BPF jump
      offset can be safely translated into a 0xEB instruction with a jump offset
      of -2.
      
      Such fix brings the BPF program in the previous example to complete again
      in ~9 passes.
      
      Fixes: 2a5418a1 ("bpf: improve dead code sanitizing")
      Signed-off-by: NGianluca Borello <g.borello@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1612a981
    • W
      bpf: clear the ip_tunnel_info. · 5540fbf4
      William Tu 提交于
      The percpu metadata_dst might carry the stale ip_tunnel_info
      and cause incorrect behavior.  When mixing tests using ipv4/ipv6
      bpf vxlan and geneve tunnel, the ipv6 tunnel info incorrectly uses
      ipv4's src ip addr as its ipv6 src address, because the previous
      tunnel info does not clean up.  The patch zeros the fields in
      ip_tunnel_info.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Reported-by: NYifeng Sun <pkusunyifeng@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      5540fbf4
    • L
      Merge branch 'userns-linus' of... · 3be4aaf4
      Linus Torvalds 提交于
      Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull userns bug fix from Eric Biederman:
       "Just a small fix to properly set the return code on error"
      
      * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        commoncap: Handle memory allocation failure.
      3be4aaf4
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 24cac700
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix rtnl deadlock in ipvs, from Julian Anastasov.
      
       2) s390 qeth fixes from Julian Wiedmann (control IO completion stalls,
          bad MAC address update sequence, request side races on command IO
          timeouts).
      
       3) Handle seq_file overflow properly in l2tp, from Guillaume Nault.
      
       4) Fix VLAN priority mappings in cpsw driver, from Ivan Khoronzhuk.
      
       5) Packet scheduler ife action fixes (malformed TLV lengths, etc.) from
          Alexander Aring.
      
       6) Fix out of bounds access in tcp md5 option parser, from Jann Horn.
      
       7) Missing netlink attribute policies in rtm_ipv6_policy table, from
          Eric Dumazet.
      
       8) Missing socket address length checks in l2tp and pppoe connect, from
          Guillaume Nault.
      
       9) Fix netconsole over team and bonding, from Xin Long.
      
      10) Fix race with AF_PACKET socket state bitfields, from Willem de
          Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
        ice: Fix insufficient memory issue in ice_aq_manage_mac_read
        sfc: ARFS filter IDs
        net: ethtool: Add missing kernel doc for FEC parameters
        packet: fix bitfield update race
        ice: Do not check INTEVENT bit for OICR interrupts
        ice: Fix incorrect comment for action type
        ice: Fix initialization for num_nodes_added
        igb: Fix the transmission mode of queue 0 for Qav mode
        ixgbevf: ensure xdp_ring resources are free'd on error exit
        team: fix netconsole setup over team
        amd-xgbe: Only use the SFP supported transceiver signals
        amd-xgbe: Improve KR auto-negotiation and training
        amd-xgbe: Add pre/post auto-negotiation phy hooks
        pppoe: check sockaddr length in pppoe_connect()
        l2tp: check sockaddr length in pppol2tp_connect()
        net: phy: marvell: clear wol event before setting it
        ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
        bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
        tcp: don't read out-of-bounds opsize
        ibmvnic: Clean actual number of RX or TX pools
        ...
      24cac700
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · d19efb72
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-04-24
      
      This series contains fixes to ixgbevf, igb and ice drivers.
      
      Colin Ian King fixes the return value on error for the new XDP support
      that went into ixgbevf for 4.17.
      
      Vinicius provides a fix for queue 0 for igb, which was not receiving all
      the credits it needed when QAV mode was enabled.
      
      Anirudh provides several fixes for the new ice driver, starting with
      properly initializing num_nodes_added to zero.  Fixed up a code comment
      to better reflect what is really going on in the code.  Fixed how to
      detect if an OICR interrupt has occurred to a more reliable method.
      
      Md Fahad fixes the ice driver to allocate the right amount of memory
      when reading and storing the devices MAC addresses.  The device can have
      up to 2 MAC addresses (LAN and WoL), while WoL is currently not
      supported, we need to ensure it can be properly handled when support is
      added.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d19efb72
    • M
      ice: Fix insufficient memory issue in ice_aq_manage_mac_read · d6fef10c
      Md Fahad Iqbal Polash 提交于
      For the MAC read operation, the device can return up to two (LAN and WoL)
      MAC addresses. Without access to adequate memory, the device will return
      an error. Fixed this by allocating the right amount of memory. Also, logic
      to detect and copy the LAN MAC address into the port_info structure has
      been added. Note that the WoL MAC address is ignored currently as the WoL
      feature isn't supported yet.
      
      Fixes: dc49c772 ("ice: Get MAC/PHY/link info and scheduler topology")
      Signed-off-by: NMd Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Signed-off-by: NAnirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      d6fef10c
    • E
      sfc: ARFS filter IDs · f8d62037
      Edward Cree 提交于
      Associate an arbitrary ID with each ARFS filter, allowing to properly query
       for expiry.  The association is maintained in a hash table, which is
       protected by a spinlock.
      
      v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
      v2: fixed uninitialised variable (thanks davem and lkp-robot).
      
      Fixes: 3af0f342 ("sfc: replace asynchronous filter operations")
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8d62037
    • F
      net: ethtool: Add missing kernel doc for FEC parameters · d805c520
      Florian Fainelli 提交于
      While adding support for ethtool::get_fecparam and set_fecparam, kernel
      doc for these functions was missed, add those.
      
      Fixes: 1a5f3da2 ("net: ethtool: add support for forward error correction modes")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d805c520
    • W
      packet: fix bitfield update race · a6361f0c
      Willem de Bruijn 提交于
      Updates to the bitfields in struct packet_sock are not atomic.
      Serialize these read-modify-write cycles.
      
      Move po->running into a separate variable. Its writes are protected by
      po->bind_lock (except for one startup case at packet_create). Also
      replace a textual precondition warning with lockdep annotation.
      
      All others are set only in packet_setsockopt. Serialize these
      updates by holding the socket lock. Analogous to other field updates,
      also hold the lock when testing whether a ring is active (pg_vec).
      
      Fixes: 8dc41944 ("[PACKET]: Add optional checksum computation for recvmsg")
      Reported-by: NDaeRyong Jeong <threeearcat@gmail.com>
      Reported-by: NByoungyoung Lee <byoungyoung@purdue.edu>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6361f0c
    • B
      ice: Do not check INTEVENT bit for OICR interrupts · 30d84397
      Ben Shelton 提交于
      According to the hardware spec, checking the INTEVENT bit isn't a
      reliable way to detect if an OICR interrupt has occurred. This is
      because this bit can be cleared by the hardware/firmware before the
      interrupt service routine has run. So instead, just check for OICR
      events every time.
      
      Fixes: 940b61af ("ice: Initialize PF and setup miscellaneous interrupt")
      Signed-off-by: NBen Shelton <benjamin.h.shelton@intel.com>
      Signed-off-by: NAnirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      30d84397
  4. 24 4月, 2018 21 次提交
  5. 23 4月, 2018 3 次提交
    • X
      bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave · ddea788c
      Xin Long 提交于
      After Commit 8a8efa22 ("bonding: sync netpoll code with bridge"), it
      would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
      if bond->dev->npinfo was set.
      
      However now slave_dev npinfo is set with bond->dev->npinfo before calling
      slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
      in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
      It causes that the lower dev of this slave dev can't set its npinfo.
      
      One way to reproduce it:
      
        # modprobe bonding
        # brctl addbr br0
        # brctl addif br0 eth1
        # ifconfig bond0 192.168.122.1/24 up
        # ifenslave bond0 eth2
        # systemctl restart netconsole
        # ifenslave bond0 br0
        # ifconfig eth2 down
        # systemctl restart netconsole
      
      The netpoll won't really work.
      
      This patch is to remove that slave_dev npinfo setting in bond_enslave().
      
      Fixes: 8a8efa22 ("bonding: sync netpoll code with bridge")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddea788c
    • J
      tcp: don't read out-of-bounds opsize · 7e5a206a
      Jann Horn 提交于
      The old code reads the "opsize" variable from out-of-bounds memory (first
      byte behind the segment) if a broken TCP segment ends directly after an
      opcode that is neither EOL nor NOP.
      
      The result of the read isn't used for anything, so the worst thing that
      could theoretically happen is a pagefault; and since the physmap is usually
      mostly contiguous, even that seems pretty unlikely.
      
      The following C reproducer triggers the uninitialized read - however, you
      can't actually see anything happen unless you put something like a
      pr_warn() in tcp_parse_md5sig_option() to print the opsize.
      
      ====================================
      #define _GNU_SOURCE
      #include <arpa/inet.h>
      #include <stdlib.h>
      #include <errno.h>
      #include <stdarg.h>
      #include <net/if.h>
      #include <linux/if.h>
      #include <linux/ip.h>
      #include <linux/tcp.h>
      #include <linux/in.h>
      #include <linux/if_tun.h>
      #include <err.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <assert.h>
      
      void systemf(const char *command, ...) {
        char *full_command;
        va_list ap;
        va_start(ap, command);
        if (vasprintf(&full_command, command, ap) == -1)
          err(1, "vasprintf");
        va_end(ap);
        printf("systemf: <<<%s>>>\n", full_command);
        system(full_command);
      }
      
      char *devname;
      
      int tun_alloc(char *name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd == -1)
          err(1, "open tun dev");
        static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
        strcpy(req.ifr_name, name);
        if (ioctl(fd, TUNSETIFF, &req))
          err(1, "TUNSETIFF");
        devname = req.ifr_name;
        printf("device name: %s\n", devname);
        return fd;
      }
      
      #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
      
      void sum_accumulate(unsigned int *sum, void *data, int len) {
        assert((len&2)==0);
        for (int i=0; i<len/2; i++) {
          *sum += ntohs(((unsigned short *)data)[i]);
        }
      }
      
      unsigned short sum_final(unsigned int sum) {
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);
        return htons(~sum);
      }
      
      void fix_ip_sum(struct iphdr *ip) {
        unsigned int sum = 0;
        sum_accumulate(&sum, ip, sizeof(*ip));
        ip->check = sum_final(sum);
      }
      
      void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
        unsigned int sum = 0;
        struct {
          unsigned int saddr;
          unsigned int daddr;
          unsigned char pad;
          unsigned char proto_num;
          unsigned short tcp_len;
        } fakehdr = {
          .saddr = ip->saddr,
          .daddr = ip->daddr,
          .proto_num = ip->protocol,
          .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
        };
        sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
        sum_accumulate(&sum, tcp, tcp->doff*4);
        tcp->check = sum_final(sum);
      }
      
      int main(void) {
        int tun_fd = tun_alloc("inject_dev%d");
        systemf("ip link set %s up", devname);
        systemf("ip addr add 192.168.42.1/24 dev %s", devname);
      
        struct {
          struct iphdr ip;
          struct tcphdr tcp;
          unsigned char tcp_opts[20];
        } __attribute__((packed)) syn_packet = {
          .ip = {
            .ihl = sizeof(struct iphdr)/4,
            .version = 4,
            .tot_len = htons(sizeof(syn_packet)),
            .ttl = 30,
            .protocol = IPPROTO_TCP,
            /* FIXUP check */
            .saddr = IPADDR(192,168,42,2),
            .daddr = IPADDR(192,168,42,1)
          },
          .tcp = {
            .source = htons(1),
            .dest = htons(1337),
            .seq = 0x12345678,
            .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
            .syn = 1,
            .window = htons(64),
            .check = 0 /*FIXUP*/
          },
          .tcp_opts = {
            /* INVALID: trailing MD5SIG opcode after NOPs */
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 19
          }
        };
        fix_ip_sum(&syn_packet.ip);
        fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
        while (1) {
          int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
          if (write_res != sizeof(syn_packet))
            err(1, "packet write failed");
        }
      }
      ====================================
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e5a206a
    • L
      Linux 4.17-rc2 · 6d08b06e
      Linus Torvalds 提交于
      6d08b06e