1. 26 Oct, 2017 5 commits
  2. 25 Oct, 2017 2 commits
  3. 24 Oct, 2017 7 commits
  4. 23 Oct, 2017 3 commits
    • ipsec: Fix aborted xfrm policy dump crash · 1137b5e2
      Committed by Herbert Xu
      An independent security researcher, Mohamed Ghannam, has reported
      this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
      program.
      
      The xfrm_dump_policy_done function expects xfrm_dump_policy to
      have been called at least once or it will crash.  This can be
      triggered if a dump fails because the target socket's receive
      buffer is full.
      
      This patch fixes it by using the cb->start mechanism to ensure that
      the initialisation is always done regardless of the buffer situation.
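      
      The failure mode and the fix can be modelled outside the kernel. The
      following is a minimal Python sketch, not kernel code: the classes
      Walk, LazyInitDumper and StartInitDumper and the function run_dump are
      hypothetical names, and buffer_full simply simulates the case where the
      dump callback is never invoked before the done callback runs.

```python
class Walk:
    """Stand-in for the per-dump policy walk state (hypothetical)."""
    def __init__(self):
        self.on_list = False


class LazyInitDumper:
    """Old scheme: state is set up on the first dump() call."""
    def __init__(self):
        self.walk = None

    def start(self):
        pass  # no start hook in the old scheme

    def dump(self):
        if self.walk is None:
            self.walk = Walk()
            self.walk.on_list = True

    def done(self):
        # Assumes dump() ran at least once; fails if it did not.
        self.walk.on_list = False


class StartInitDumper(LazyInitDumper):
    """Fixed scheme: state is set up in start(), which always runs."""
    def start(self):
        self.walk = Walk()
        self.walk.on_list = True

    def dump(self):
        pass  # walk is already initialised


def run_dump(dumper, buffer_full):
    """Drive one dump; skip dump() when the receive buffer is full."""
    dumper.start()
    if not buffer_full:
        dumper.dump()
    dumper.done()
    return dumper.walk
```

      With buffer_full=True the lazy variant fails in done(), while the
      start-initialised variant completes cleanly.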
      
      Fixes: 12a169e7 ("ipsec: Put dumpers on the dump list")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • tcp/dccp: fix lockdep splat in inet_csk_route_req() · a6ca7abe
      Committed by Eric Dumazet
      This patch fixes the following lockdep splat in inet_csk_route_req()
      
        lockdep_rcu_suspicious
        inet_csk_route_req
        tcp_v4_send_synack
        tcp_rtx_synack
        inet_rtx_syn_ack
        tcp_fastopen_synack_time
        tcp_retransmit_timer
        tcp_write_timer_handler
        tcp_write_timer
        call_timer_fn
      
      The thread running inet_csk_route_req() owns a reference on the request
      socket, so we have the guarantee that ireq->ireq_opt won't be changed or
      freed.
      
      lockdep can enforce this invariant for us.
      
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: do tcp_mstamp_refresh before retransmits on TSQ handler · 3a91d29f
      Committed by Koichiro Den
      When retransmission from the TSQ handler was introduced in commit
      f9616c35 ("tcp: implement TSQ for retransmits"), the retransmitted
      skbs' timestamps were updated on the actual transmission. The later
      commit 385e2070 ("tcp: use tp->tcp_mstamp in output path") stopped
      doing so. A comment in that commit says "We try to refresh
      tp->tcp_mstamp only when necessary", and at present this affects
      tcp_tsq_handler and tcp_v4_mtu_reduced. The latter is fine, since
      it is rare enough.
      
      As for the former, even though possible retransmissions on the tasklet
      come just after the destructor runs in NET_RX softirq handling, the time
      between them can be large enough to affect tcp_rack_advance or RTO
      rearming if other (remaining) RX, BLOCK and (preceding) TASKLET softirq
      handling is unexpectedly heavy.
      
      So, in the same way as tcp_write_timer_handler does, calling
      tcp_mstamp_refresh ensures the accuracy of the algorithms relying on it.
      
      Fixes: 385e2070 ("tcp: use tp->tcp_mstamp in output path")
      Signed-off-by: Koichiro Den <den@klaipeden.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 22 Oct, 2017 23 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b5ac3beb
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
       "A little more than usual this time around. Been travelling, so that is
        part of it.
      
        Anyways, here are the highlights:
      
         1) Deal with memcontrol races wrt. listener dismantle, from Eric
            Dumazet.
      
         2) Handle page allocation failures properly in nfp driver, from Jakub
            Kicinski.
      
         3) Fix memory leaks in macsec, from Sabrina Dubroca.
      
         4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.
      
         5) Several fixes in bnxt_en driver, including preventing potential
            NVRAM parameter corruption from Michael Chan.
      
         6) Fix for KRACK attacks in wireless, from Johannes Berg.
      
         7) rtnetlink event generation fixes from Xin Long.
      
         8) Deadlock in mlxsw driver, from Ido Schimmel.
      
         9) Disallow arithmetic operations on context pointers in bpf, from
            Jakub Kicinski.
      
        10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
            Xin Long.
      
        11) Only TCP is supported for sockmap, make that explicit with a
            check, from John Fastabend.
      
        12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.
      
        13) Fix panic in packet_getsockopt(), also from Eric Dumazet.
      
        14) Add missing locking in hv_sock layer, from Dexuan Cui.
      
        15) Various aquantia bug fixes, including several statistics handling
            cures. From Igor Russkikh et al.
      
        16) Fix arithmetic overflow in devmap code, from John Fastabend.
      
        17) Fix busted socket memory accounting when we get a fault in the tcp
            zero copy paths. From Willem de Bruijn.
      
        18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
        stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
        ipv6: flowlabel: do not leave opt->tot_len with garbage
        of_mdio: Fix broken PHY IRQ in case of probe deferral
        textsearch: fix typos in library helpers
        rxrpc: Don't release call mutex on error pointer
        net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
        net: stmmac: Fix stmmac_get_rx_hwtstamp()
        net: stmmac: Add missing call to dev_kfree_skb()
        mlxsw: spectrum_router: Configure TIGCR on init
        mlxsw: reg: Add Tunneling IPinIP General Configuration Register
        net: ethtool: remove error check for legacy setting transceiver type
        soreuseport: fix initialization race
        net: bridge: fix returning of vlan range op errors
        sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
        bpf: add test cases to bpf selftests to cover all access tests
        bpf: fix pattern matches for direct packet access
        bpf: fix off by one for range markings with L{T, E} patterns
        bpf: devmap fix arithmetic overflow in bitmap_size calculation
        net: aquantia: Bad udp rate on default interrupt coalescing
        net: aquantia: Enable coalescing management via ethtool interface
        ...
    • stmmac: Don't access tx_q->dirty_tx before netif_tx_lock · 8d5f4b07
      Committed by Bernd Edlinger
      This is a possible cause of several hard-to-reproduce problems
      on my ARMv7-SMP test system.
      
      In recent kernels the symptoms are imprecise external aborts;
      in older kernels they are various kinds of network stalls and
      unexpected page allocation failures.
      
      My testing indicates that the trouble started between v4.5 and v4.6
      and persists up to v4.14.
      
      Using dirty_tx before acquiring the spin lock is clearly
      wrong and was first introduced with v4.6.
      
      Fixes: e3ad57c9 ("stmmac: review RX/TX ring management")
      Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f
      Committed by Eric Dumazet
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), part of the
      opt_space storage provided by udp/raw/l2tp is left with a random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      ip6_setup_cork() would then use this random value to perform a kzalloc()
      call, leading to undefined behavior and crashes.
      
      The fix is to properly set tot_len in fl6_merge_options().
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_mdio: Fix broken PHY IRQ in case of probe deferral · 66bdede4
      Committed by Geert Uytterhoeven
      If an Ethernet PHY is initialized before the interrupt controller it is
      connected to, a message like the following is printed:
      
          irq: no irq domain found for /interrupt-controller@e61c0000 !
      
      However, the actual error is ignored, leading to a non-functional (POLL)
      PHY interrupt later:
      
          Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
      
      Depending on whether the PHY driver will fall back to polling, Ethernet
      may or may not work.
      
      To fix this:
        1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
           of_irq_get().
           Unlike the former, the latter returns -EPROBE_DEFER if the
           interrupt controller is not yet available, so this condition can be
           detected.
           Other errors are handled the same as before, i.e. use the passed
           mdio->irq[addr] as interrupt.
        2. Propagate and handle errors from of_mdiobus_register_phy() and
           of_mdiobus_register_device().
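      
      The two steps above can be sketched as a small user-space model. This
      is illustrative Python, not the kernel code: register_phy, register_bus
      and POLL are hypothetical names, irq_lookup_result stands in for the
      return value of of_irq_get(), and 517 is the kernel-internal value of
      EPROBE_DEFER.

```python
EPROBE_DEFER = 517  # kernel-internal errno meaning "retry the probe later"
EINVAL = 22
POLL = -1           # stand-in for the bus default of polling mode


def register_phy(irq_lookup_result, fallback_irq):
    """Return (status, irq) for one PHY.

    irq_lookup_result models an IRQ lookup: a positive IRQ number,
    0 for a mapping failure, or a negative errno.
    """
    if irq_lookup_result > 0:
        return 0, irq_lookup_result
    if irq_lookup_result == -EPROBE_DEFER:
        # Interrupt controller not ready yet: report it so the caller can retry.
        return -EPROBE_DEFER, None
    # Any other failure: keep the old behaviour and use the bus default.
    return 0, fallback_irq


def register_bus(lookups, fallback_irq):
    """Register every PHY; stop and propagate the first error."""
    irqs = []
    for result in lookups:
        status, irq = register_phy(result, fallback_irq)
        if status:
            return status, []
        irqs.append(irq)
    return 0, irqs
```

      Only the deferral case is surfaced to the caller; every other lookup
      failure still falls back to the default interrupt, as before.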
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • textsearch: fix typos in library helpers · 7433a8d6
      Committed by Randy Dunlap
      Fix spellos (typos) in textsearch library helpers.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't release call mutex on error pointer · 6cb3ece9
      Committed by David Howells
      Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
      call pointer actually holds an error value.
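      
      The pattern can be illustrated with a short Python model. It is a
      sketch under assumptions, not the rxrpc code: Call, new_call and
      begin_call are hypothetical names, and a negative integer stands in for
      an error value carried in place of the call pointer.

```python
import threading

ENOMEM = 12


class Call:
    """Hypothetical call object that owns a mutex."""
    def __init__(self):
        self.user_mutex = threading.Lock()


def new_call(fail):
    """Return a Call with its mutex held, or a negative errno on failure."""
    if fail:
        return -ENOMEM
    call = Call()
    call.user_mutex.acquire()
    return call


def begin_call(fail=False):
    """Create a call and drop its mutex before returning it."""
    call = new_call(fail)
    # Only a real call object carries a mutex to release; an error
    # value must be passed back untouched.
    if isinstance(call, Call):
        call.user_mutex.release()
    return call
```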
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-hw-tstamp-fixes' · 748759d5
      Committed by David S. Miller
      Jose Abreu says:
      
      ====================
      net: stmmac: Fix HW timestamping
      
      Three fixes for HW timestamping feature, all of them for RX side.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Prevent infinite loop in get_rx_timestamp_status() · 9454360d
      Committed by Jose Abreu
      Prevent infinite loop by correctly setting the loop condition to
      break when i == 10.
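      
      A bounded polling loop of this kind might look as follows. This is a
      minimal Python sketch, not the driver code: poll_timestamp_status and
      read_status are hypothetical names, and only the bound of 10 iterations
      is taken from the description above.

```python
MAX_TRIES = 10  # the "i == 10" bound named in the commit message


def poll_timestamp_status(read_status):
    """Poll read_status() until it reports ready, giving up after MAX_TRIES.

    Returns (ready, tries_used).
    """
    i = 0
    while i < MAX_TRIES:  # terminates even if the status never becomes ready
        i += 1
        if read_status():
            return True, i
    return False, i
```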
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix stmmac_get_rx_hwtstamp() · 98870943
      Committed by Jose Abreu
      When using GMAC4 the valid timestamp comes from the next (CTX)
      descriptor, but we are passing the previous descriptor to the
      get_rx_timestamp_status() callback.
      
      Fix this and, while at it, rework the function logic a little.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add missing call to dev_kfree_skb() · 9c8080d0
      Committed by Jose Abreu
      When RX HW timestamping is enabled and a frame is discarded, we do
      not free the skb but only set the entry to NULL.
      
      Add a call to dev_kfree_skb_any() so that the skb entry is correctly
      freed.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · e5f468b3
      Committed by Linus Torvalds
      Pull input fixes from Dmitry Torokhov:
      
       - joydev now implements a blacklist to avoid creating joystick nodes
         for accelerometers found in composite devices such as PlayStation
         controllers
      
       - assorted driver fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: ims-psu - check if CDC union descriptor is sane
        Input: joydev - blacklist ds3/ds4/udraw motion sensors
        Input: allow matching device IDs on property bits
        Input: factor out and export input_device_id matching code
        Input: goodix - poll the 'buffer status' bit before reading data
        Input: axp20x-pek - fix module not auto-loading for axp221 pek
        Input: tca8418 - enable interrupt after it has been requested
        Input: stmfts - fix setting ABS_MT_POSITION_* maximum size
        Input: ti_am335x_tsc - fix incorrect step config for 5 wire touchscreen
        Input: synaptics - disable kernel tracking on SMBus devices
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ec0145e9
      Committed by Linus Torvalds
      Pull vfs fixes from Al Viro:
       "MS_I_VERSION fixes - Mimi's fix + missing bits picked from Matthew
        (his patch contained a duplicate of the fs/namespace.c fix as well,
        but by that point the original fix had already been applied)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Convert fs/*/* to SB_I_VERSION
        vfs: fix mounting a filesystem with i_version
    • Merge branch 'mlxsw-fixes' · 0247880a
      Committed by David S. Miller
      Jiri Pirko says:
      
      ====================
      mlxsw: spectrum: Configure TTL of "inherit" for offloaded tunnels
      
      Petr says:
      
      Currently mlxsw only offloads tunnels that are configured with TTL of "inherit"
      (which is the default). However, Spectrum defaults to 255 and the driver
      neglects to change the configuration. Thus the tunnel packets from offloaded
      tunnels always have TTL of 255, even though tunnels with explicit TTL of 255 are
      never actually offloaded.
      
      To fix this, introduce support for TIGCR, the register that keeps the related
      bits of global tunnel configuration, and use it on first offload to properly
      configure inheritance of TTL of tunnel packets from overlay packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Configure TIGCR on init · dcbda282
      Committed by Petr Machata
      Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
      do. Configure TIGCR on router init so that the TTL of tunnel packets is
      copied from the overlay packets.
      
      Fixes: ee954d1a ("mlxsw: spectrum_router: Support GRE tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Tunneling IPinIP General Configuration Register · 14aefd90
      Committed by Petr Machata
      The TIGCR register is used for setting up the IPinIP Tunnel
      configuration.
      
      Fixes: ee954d1a ("mlxsw: spectrum_router: Support GRE tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: remove error check for legacy setting transceiver type · 95491e3c
      Committed by Niklas Söderlund
      Commit 19cab88726929605 ("net: ethtool: Add back transceiver type")
      restores the transceiver type to struct ethtool_link_settings and
      convert_link_ksettings_to_legacy_settings() but forgets to remove the
      error check for the same in convert_legacy_settings_to_link_ksettings().
      This prevents older versions of ethtool from changing link settings.
      
          # ethtool --version
          ethtool version 3.16
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          Cannot set new settings: Invalid argument
            not setting speed
            not setting duplex
            not setting autoneg
      
      Newer versions of ethtool work:
      
          # ethtool --version
          ethtool version 4.10
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
          [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
      
      Fixes: 19cab887 ("net: ethtool: Add back transceiver type")
      Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reported-by: Renjith R V <renjith.rv@quest-global.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fix initialization race · 1b5f962e
      Committed by Craig Gallek
      Syzkaller stumbled upon a way to trigger
      WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
      reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
      
      There are two initialization paths for the sock_reuseport structure in a
      socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
      SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
      assumed that the socket lock protected both of these paths when it actually
      only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
      double allocation by running these paths concurrently.
      
      This patch moves the check for double allocation into the reuseport_alloc
      function which is protected by a global spin lock.
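      
      The idea of checking for an existing allocation inside the locked
      region can be shown with a small Python model. It is a sketch, not the
      kernel implementation: Sock, its reuse and allocations fields, and the
      module-level lock are hypothetical stand-ins for the socket, its
      reuseport state and the global spin lock.

```python
import threading

_reuseport_lock = threading.Lock()  # stand-in for the global spin lock


class Sock:
    """Hypothetical socket model; `reuse` plays the role of the reuseport state."""
    def __init__(self):
        self.reuse = None
        self.allocations = 0  # counts how often the state was allocated


def reuseport_alloc(sk):
    """Allocate the reuseport state for sk at most once."""
    with _reuseport_lock:
        # The "already allocated?" check sits inside the locked region,
        # so two racing callers cannot both pass it.
        if sk.reuse is None:
            sk.allocations += 1
            sk.reuse = {"socks": [sk]}
        return sk.reuse
```

      Because both initialization paths go through the same locked check,
      concurrent callers end up sharing a single allocation.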
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix returning of vlan range op errors · 66c54517
      Committed by Nikolay Aleksandrov
      When vlan tunnels were introduced, vlan range errors got silently
      dropped and instead 0 was returned always. Restore the previous
      behaviour and return errors to user-space.
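      
      Error propagation over a VLAN range can be sketched as follows. This
      is illustrative Python rather than the bridge code: apply_vlan_range
      and op are hypothetical names, and stopping at the first failure is an
      assumption of the sketch.

```python
EINVAL = 22


def apply_vlan_range(vid_begin, vid_end, op):
    """Apply op(vid) to each VLAN ID in [vid_begin, vid_end].

    Returns 0 on success, or the first negative errno reported by op
    instead of discarding it.
    """
    for vid in range(vid_begin, vid_end + 1):
        err = op(vid)
        if err:
            return err
    return 0
```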
      
      Fixes: efa5356b ("bridge: per vlan dst_metadata netlink support")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: correct sk_wmem_queued accounting on efault in tcp zerocopy · 54d43117
      Committed by Willem de Bruijn
      Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
      after triggering an EFAULT in __zerocopy_sg_from_iter.
      
      On this error, skb_zerocopy_stream_iter resets the skb to its state
      before the operation with __pskb_trim. It cannot kfree_skb like
      datagram callers, as the skb may have data from a previous send call.
      
      __pskb_trim calls skb_condense for unowned skbs, which adjusts their
      truesize. These tcp skbuffs are owned and their truesize must add up
      to sk_wmem_queued. But they are treated as unowned because their
      skb->sk is NULL until tcp_transmit_skb.
      
      Temporarily set skb->sk when calling __pskb_trim to signal that the
      skbuffs are owned and avoid the skb_condense path.
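      
      The "temporarily mark as owned" technique can be modelled in a few
      lines of Python. This is a sketch, not the kernel code: Skb, trim and
      trim_keeping_accounting are hypothetical names, and the amount by which
      truesize shrinks is arbitrary.

```python
class Skb:
    """Hypothetical buffer model with an owner and an accounted size."""
    def __init__(self, truesize):
        self.sk = None            # not yet owned, as before transmission
        self.truesize = truesize


def trim(skb):
    """Model of a trim that condenses (shrinks truesize of) unowned buffers."""
    if skb.sk is None:
        skb.truesize -= 64        # arbitrary amount, for illustration only


def trim_keeping_accounting(skb, sk):
    """Mark the buffer as owned for the duration of the trim, then restore."""
    saved = skb.sk
    skb.sk = sk
    try:
        trim(skb)
    finally:
        skb.sk = saved
```

      With the owner set during the trim, truesize is left alone and the
      per-socket total stays consistent.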
      
      Fixes: 52267790 ("sock: add MSG_ZEROCOPY")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-range-marking-fixes' · d2b27624
      Committed by David S. Miller
      Daniel Borkmann says:
      
      ====================
      Two BPF fixes for range marking
      
      The set contains two fixes for direct packet access range
      markings and test cases for all direct packet access patterns
      that the verifier matches on.
      
      They are targeted for net tree, note that once net gets merged
      into net-next, there will be a minor merge conflict due to
      signature change of the function find_good_pkt_pointers() as
      well as data_meta patterns present in net-next tree. You can
      just add bool false to the data_meta patterns and I will
      follow-up with properly converting the patterns for data_meta
      in a similar way.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add test cases to bpf selftests to cover all access tests · b37242c7
      Committed by Daniel Borkmann
      Let's add test cases to cover all possible direct packet
      access tests for good/bad access cases so we keep tracking them.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix pattern matches for direct packet access · 0fd4759c
      Committed by Daniel Borkmann
      Alexander had a test program with direct packet access, where
      the access test was in the form of data + X > data_end. In an
      unrelated change to the program LLVM decided to swap the branches
      and emitted code for the test in form of data + X <= data_end.
      We hadn't seen these being generated previously, thus verifier
      would reject the program. Therefore, fix up the verifier to
      detect all test cases, so we don't run into such issues in the
      future.
      
      Fixes: b4e432f1 ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
      Reported-by: Alexander Alemayhu <alexander@alemayhu.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix off by one for range markings with L{T, E} patterns · fb2a311a
      Committed by Daniel Borkmann
      During review I noticed that the current logic for direct packet
      access marking in check_cond_jmp_op() has an off by one for the
      upper right range border when marking in find_good_pkt_pointers()
      with BPF_JLT and BPF_JLE. It's not really harmful given access
      up to pkt_end is always safe, but we should nevertheless correct
      the range marking before it becomes ABI. If pkt_data' denotes a
      pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end
      in the true branch as well as for pkt_end <= pkt_data' in the false
      branch we mark the range with X although it should really be X - 1
      in these cases. For example, X could be pkt_end - pkt_data, then
      when testing for pkt_data' < pkt_end the verifier simulation cannot
      deduce that a byte load of pkt_data' - 1 would succeed in this
      branch.
      
      Fixes: b4e432f1 ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>