1. 10 11月, 2016 7 次提交
    • T
      net: qcom/emac: configure the external phy to allow pause frames · 3e884493
      Timur Tabi 提交于
      Pause frames are used to enable flow control.  A MAC can send and
      receive pause frames in order to throttle traffic.  However, the PHY
      must be configured to allow those frames to pass through.
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NTimur Tabi <timur@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e884493
    • R
      net: bgmac: fix reversed checks for clock control flag · cdb26d33
      Rafał Miłecki 提交于
      This fixes regression introduced by patch adding feature flags. It was
      already reported and patch followed (it got accepted) but it appears it
      was incorrect. Instead of fixing reversed condition it broke a good one.
      
      This patch was verified to actually fix SoC hanges caused by bgmac on
      BCM47186B0.
      
      Fixes: db791eb2 ("net: ethernet: bgmac: convert to feature flags")
      Fixes: 4af1474e ("net: bgmac: Fix errant feature flag check")
      Cc: Jon Mason <jon.mason@broadcom.com>
      Signed-off-by: NRafał Miłecki <rafal@milecki.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdb26d33
    • B
      bna: Add synchronization for tx ring. · d667f785
      Benjamin Poirier 提交于
      We received two reports of BUG_ON in bnad_txcmpl_process() where
      hw_consumer_index appeared to be ahead of producer_index. Out of order
      write/read of these variables could explain these reports.
      
      bnad_start_xmit(), as a producer of tx descriptors, has a few memory
      barriers sprinkled around writes to producer_index and the device's
      doorbell but they're not paired with anything in bnad_txcmpl_process(), a
      consumer.
      
      Since we are synchronizing with a device, we must use mandatory barriers,
      not smp_*. Also, I didn't see the purpose of the last smp_mb() in
      bnad_start_xmit().
      Signed-off-by: NBenjamin Poirier <bpoirier@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d667f785
    • T
      Revert "net/mlx4_en: Fix panic during reboot" · f91d7181
      Tariq Toukan 提交于
      This reverts commit 9d2afba0.
      
      The original issue would possibly exist if an external module
      tried calling our "ethtool_ops" without checking if it still
      exists.
      
      The right way of solving it is by simply doing the check in
      the caller side.
      Currently, no action is required as there's no such use case.
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f91d7181
    • M
      net-ipv6: on device mtu change do not add mtu to mtu-less routes · fb56be83
      Maciej Żenczykowski 提交于
      Routes can specify an mtu explicitly or inherit the mtu from
      the underlying device - this inheritance is implemented in
      dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
      
      Currently changing the mtu of a device adds mtu explicitly
      to routes using that device.
      
      ie.
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
      
      After this patch the route entry no longer changes unless it already has an mtu.
      There is no need: this inheritance is already done in ip6_mtu()
      
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route add local 2000::2 dev lo mtu 2000
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 1501
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
        # ip -6 route del local 2000::2
      
      This is desirable because changing device mtu and then resetting it
      to the previous value shouldn't change the user visible routing table.
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb56be83
    • S
      sock: fix sendmmsg for partial sendmsg · 3023898b
      Soheil Hassas Yeganeh 提交于
      Do not send the next message in sendmmsg for partial sendmsg
      invocations.
      
      sendmmsg assumes that it can continue sending the next message
      when the return value of the individual sendmsg invocations
      is positive. It results in corrupting the data for TCP,
      SCTP, and UNIX streams.
      
      For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
      of "aefgh" if the first sendmsg invocation sends only the first
      byte while the second sendmsg goes through.
      
      Datagram sockets either send the entire datagram or fail, so
      this patch affects only sockets of type SOCK_STREAM and
      SOCK_SEQPACKET.
      
      Fixes: 228e548e ("net: Add sendmmsg socket system call")
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3023898b
    • G
      driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed. · aa5fd0fb
      Gao Feng 提交于
      When there is no existing macvlan port in lowdev, one new macvlan port
      would be created. But it doesn't be destoried when something failed later.
      It casues some memleak.
      
      Now add one flag to indicate if new macvlan port is created.
      Signed-off-by: NGao Feng <fgao@ikuai8.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa5fd0fb
  2. 08 11月, 2016 12 次提交
  3. 05 11月, 2016 7 次提交
  4. 04 11月, 2016 14 次提交
    • W
      taskstats: fix the length of cgroupstats_cmd_get_policy · 243d5212
      WANG Cong 提交于
      cgroupstats_cmd_get_policy is [CGROUPSTATS_CMD_ATTR_MAX+1],
      taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1],
      but their family.maxattr is TASKSTATS_CMD_ATTR_MAX.
      CGROUPSTATS_CMD_ATTR_MAX is less than TASKSTATS_CMD_ATTR_MAX,
      so we could end up accessing out-of-bound.
      
      Change cgroupstats_cmd_get_policy to TASKSTATS_CMD_ATTR_MAX+1,
      this is safe because the rest are initialized to 0's.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      243d5212
    • W
      genetlink: fix a memory leak on error path · 00ffc1ba
      WANG Cong 提交于
      In __genl_register_family(), when genl_validate_assign_mc_groups()
      fails, we forget to free the memory we possibly allocate for
      family->attrbuf.
      
      Note, some callers call genl_unregister_family() to clean up
      on error path, it doesn't work because the family is inserted
      to the global list in the nearly last step.
      
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00ffc1ba
    • E
      ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped · 990ff4d8
      Eric Dumazet 提交于
      While fuzzing kernel with syzkaller, Andrey reported a nasty crash
      in inet6_bind() caused by DCCP lacking a required method.
      
      Fixes: ab1e0a13 ("[SOCK] proto: Add hashinfo member to struct proto")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      990ff4d8
    • G
      ehea: fix operation state report · 29ab5a3b
      Guilherme G. Piccoli 提交于
      Currently the ehea driver is missing a call to netif_carrier_off()
      before the interface bring-up; this is necessary in order to
      initialize the __LINK_STATE_NOCARRIER bit in the net_device state
      field. Otherwise, we observe state UNKNOWN on "ip address" command
      output.
      
      This patch adds a call to netif_carrier_off() on ehea's net device
      open callback.
      Reported-by: NXiong Zhou <zhou@redhat.com>
      Reference-ID: IBM bz #137702, Red Hat bz #1089134
      Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: NDouglas Miller <dougmill@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29ab5a3b
    • E
      ipv6: dccp: fix out of bound access in dccp_v6_err() · 1aa9d1a0
      Eric Dumazet 提交于
      dccp_v6_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmpv6_notify() are more than enough.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1aa9d1a0
    • E
      netlink: netlink_diag_dump() runs without locks · 93636d1f
      Eric Dumazet 提交于
      A recent commit removed locking from netlink_diag_dump() but forgot
      one error case.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      4.9.0-rc3+ #336 Not tainted
      -------------------------------------
      syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
      ) at:
      [<ffffffff82dc8683>] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
      but there are no more locks to release!
      
      other info that might help us debug this:
      3 locks held by syz-executor/4018:
       #0: [   36.220068]  (
      sock_diag_mutex[   36.220068] ){+.+.+.}
      , at: [   36.220068] [<ffffffff82c3873b>] sock_diag_rcv+0x1b/0x40
       #1: [   36.220068]  (
      sock_diag_table_mutex[   36.220068] ){+.+.+.}
      , at: [   36.220068] [<ffffffff82c38e00>] sock_diag_rcv_msg+0x140/0x3a0
       #2: [   36.220068]  (
      nlk->cb_mutex[   36.220068] ){+.+.+.}
      , at: [   36.220068] [<ffffffff82db6600>] netlink_dump+0x50/0xac0
      
      stack backtrace:
      CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff8800645df688 ffffffff81b46934 ffffffff84eb3e78 ffff88006ad85800
       ffffffff82dc8683 ffffffff84eb3e78 ffff8800645df6b8 ffffffff812043ca
       dffffc0000000000 ffff88006ad85ff8 ffff88006ad85fd0 00000000ffffffff
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81b46934>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
       [<ffffffff812043ca>] print_unlock_imbalance_bug+0x17a/0x1a0
      kernel/locking/lockdep.c:3388
       [<     inline     >] __lock_release kernel/locking/lockdep.c:3512
       [<ffffffff8120cfd8>] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
       [<     inline     >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
       [<ffffffff83fc001a>] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
       [<ffffffff82dc8683>] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
       [<ffffffff82db6947>] netlink_dump+0x397/0xac0 net/netlink/af_netlink.c:2110
      
      Fixes: ad202074 ("netlink: Use rhashtable walk interface in diag dump")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93636d1f
    • E
      dccp: fix out of bound access in dccp_v4_err() · 6706a97f
      Eric Dumazet 提交于
      dccp_v4_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
      
      This patch might allow to process more ICMP messages, as some routers
      are still limiting the size of reflected bytes to 28 (RFC 792), instead
      of extended lengths (RFC 1812 4.3.2.3)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6706a97f
    • E
      dccp: do not send reset to already closed sockets · 346da62c
      Eric Dumazet 提交于
      Andrey reported following warning while fuzzing with syzkaller
      
      WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
       ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
       0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
       [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
       [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
       [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
       [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
       [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
       [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
       [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
       [<     inline     >] exit_task_work include/linux/task_work.h:21
       [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
       [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
       [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
       [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
       [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130
      arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0
      arch/x86/entry/common.c:259
       [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Fix this the same way we did for TCP in commit 565b7b2d
      ("tcp: do not send reset to already closed sockets")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      346da62c
    • E
      dccp: do not release listeners too soon · c3f24cfb
      Eric Dumazet 提交于
      Andrey Konovalov reported following error while fuzzing with syzkaller :
      
      IPv4: Attempt to release alive inet socket ffff880068e98940
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88006b9e0000 task.stack: ffff880068770000
      RIP: 0010:[<ffffffff819ead5f>]  [<ffffffff819ead5f>]
      selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
      RSP: 0018:ffff8800687771c8  EFLAGS: 00010202
      RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
      RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
      RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
      R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
      R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
      FS:  00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
      Stack:
       ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
       ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
       ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
      Call Trace:
       [<ffffffff819c8dd5>] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
       [<ffffffff82c2a9e7>] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
       [<ffffffff82b81e60>] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
       [<ffffffff838bbf12>] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306abd2>] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
       [<     inline     >] dst_input ./include/net/dst.h:507
       [<ffffffff83068500>] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306b82f>] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
       [<ffffffff82bd9fb7>] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
       [<ffffffff82bdb19a>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
       [<ffffffff82bdb493>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
       [<ffffffff82bdb6b8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
       [<ffffffff8241fc75>] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
       [<ffffffff82421b5a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
       [<     inline     >] new_sync_write fs/read_write.c:499
       [<ffffffff8151bd44>] __vfs_write+0x334/0x570 fs/read_write.c:512
       [<ffffffff8151f85b>] vfs_write+0x17b/0x500 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81523184>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
       [<ffffffff83fc02c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It turns out DCCP calls __sk_receive_skb(), and this broke when
      lookups no longer took a reference on listeners.
      
      Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
      so that sock_put() is used only when needed.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3f24cfb
    • E
      tcp: fix return value for partial writes · 79d8665b
      Eric Dumazet 提交于
      After my commit, tcp_sendmsg() might restart its loop after
      processing socket backlog.
      
      If sk_err is set, we blindly return an error, even though we
      copied data to user space before.
      
      We should instead return number of bytes that could be copied,
      otherwise user space might resend data and corrupt the stream.
      
      This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
      to process timestamps.
      
      Issue was diagnosed by Soheil and Willem, big kudos to them !
      
      Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Tested-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79d8665b
    • L
      ipv4: allow local fragmentation in ip_finish_output_gso() · 9ee6c5dc
      Lance Richardson 提交于
      Some configurations (e.g. geneve interface with default
      MTU of 1500 over an ethernet interface with 1500 MTU) result
      in the transmission of packets that exceed the configured MTU.
      While this should be considered to be a "bad" configuration,
      it is still allowed and should not result in the sending
      of packets that exceed the configured MTU.
      
      Fix by dropping the assumption in ip_finish_output_gso() that
      locally originated gso packets will never need fragmentation.
      Basic testing using iperf (observing CPU usage and bandwidth)
      have shown no measurable performance impact for traffic not
      requiring fragmentation.
      
      Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack")
      Reported-by: NJan Tluka <jtluka@redhat.com>
      Signed-off-by: NLance Richardson <lrichard@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ee6c5dc
    • D
      net: tcp: check skb is non-NULL for exact match on lookups · da96786e
      David Ahern 提交于
      Andrey reported the following error report while running the syzkaller
      fuzzer:
      
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800398c4480 task.stack: ffff88003b468000
      RIP: 0010:[<ffffffff83091106>]  [<     inline     >]
      inet_exact_dif_match include/net/tcp.h:808
      RIP: 0010:[<ffffffff83091106>]  [<ffffffff83091106>]
      __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
      RSP: 0018:ffff88003b46f270  EFLAGS: 00010202
      RAX: 0000000000000004 RBX: 0000000000004242 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffc90000e3c000 RDI: 0000000000000054
      RBP: ffff88003b46f2d8 R08: 0000000000004000 R09: ffffffff830910e7
      R10: 0000000000000000 R11: 000000000000000a R12: ffffffff867fa0c0
      R13: 0000000000004242 R14: 0000000000000003 R15: dffffc0000000000
      FS:  00007fb135881700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020cc3000 CR3: 000000006d56a000 CR4: 00000000000006f0
      Stack:
       0000000000000000 000000000601a8c0 0000000000000000 ffffffff00004242
       424200003b9083c2 ffff88003def4041 ffffffff84e7e040 0000000000000246
       ffff88003a0911c0 0000000000000000 ffff88003a091298 ffff88003b9083ae
      Call Trace:
       [<ffffffff831100f4>] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
       [<ffffffff83115b1b>] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
      ...
      
      MD5 has a code path that calls __inet_lookup_listener with a null skb,
      so inet{6}_exact_dif_match needs to check skb against null before pulling
      the flag.
      
      Fixes: a04a480d ("net: Require exact match for TCP socket lookups if
             dif is l3mdev")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da96786e
    • E
      tcp: fix potential memory corruption · ac9e70b1
      Eric Dumazet 提交于
      Imagine initial value of max_skb_frags is 17, and last
      skb in write queue has 15 frags.
      
      Then max_skb_frags is lowered to 14 or smaller value.
      
      tcp_sendmsg() will then be allowed to add additional page frags
      and eventually go past MAX_SKB_FRAGS, overflowing struct
      skb_shared_info.
      
      Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
      Cc: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac9e70b1
    • M
      qede: Correctly map aggregation replacement pages · 9512925a
      Mintz, Yuval 提交于
      Driver allocates replacement buffers before-hand to make
      sure whenever an aggregation begins there would be a replacement
      for the Rx buffers, as we can't release the buffer until
      aggregation is terminated and driver logic assumes the Rx rings
      are always full.
      
      For every other Rx page that's being allocated [I.e., regular]
      the page is being completely mapped while for the replacement
      buffers only the first portion of the page is being mapped.
      This means that:
        a. Once replacement buffer replenishes the regular Rx ring,
      assuming there's more than a single packet on page we'd post unmapped
      memory toward HW [assuming mapping is actually done in granularity
      smaller than page].
        b. Unmaps are being done for the entire page, which is incorrect.
      
      Fixes: 55482edc ("qede: Add slowpath/fastpath support and enable hardware GRO")
      Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9512925a