1. 11 Jan, 2018 4 commits
  2. 10 Jan, 2018 5 commits
    • bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Alexei Starovoitov authored
      The BPF interpreter has been used as part of the Spectre variant 2 attack (CVE-2017-5715).
      
      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only mode.
      So far the eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of a JITed program is randomized and its code page is marked as read-only.
      In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • ipv6: remove null_entry before adding default route · 4512c43e
      Wei Wang authored
      In the current code, when creating a new fib6 table, tb6_root.leaf gets
      initialized to net->ipv6.ip6_null_entry.
      If a default route is being added with rt->rt6i_metric = 0xffffffff,
      fib6_add() will add this route after net->ipv6.ip6_null_entry. As
      null_entry is shared, this could cause problems.
      
      In order to fix it, set fn->leaf to NULL before calling
      fib6_add_rt2node() when trying to add the first default route.
      And reset fn->leaf to null_entry when adding fails or when deleting the
      last default route.
      
      syzkaller reported the following issue which is fixed by this commit:
      
      WARNING: suspicious RCU usage
      4.15.0-rc5+ #171 Not tainted
      -----------------------------
      net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      4 locks held by swapper/0/0:
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007
       #2:  (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
       fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701
       fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892
       fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815
       fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863
       fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933
       __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949
       fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline]
       fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016
       fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033
       call_timer_fn+0x228/0x820 kernel/time/timer.c:1320
       expire_timers kernel/time/timer.c:1357 [inline]
       __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660
       run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
       __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:540 [inline]
       smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904
       </IRQ>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg() · 20b50d79
      Nicolai Stange authored
      Commit 8f659a03 ("net: ipv4: fix for a race condition in
      raw_sendmsg") fixed the issue of possibly inconsistent ->hdrincl handling
      due to concurrent updates by reading this bit-field member into a local
      variable and using the thus stabilized value in subsequent tests.
      
      However, the aforementioned commit also adds the (correct) comment that
      
        /* hdrincl should be READ_ONCE(inet->hdrincl)
         * but READ_ONCE() doesn't work with bit fields
         */
      
      because as it stands, the compiler is free to shortcut or even eliminate
      the local variable at will.
      
      Note that I have not seen anything like this happening in reality and thus,
      the concern is a theoretical one.
      
      However, in order to be on the safe side, emulate a READ_ONCE() on the
      bit-field by doing it on the local 'hdrincl' variable itself:
      
      	int hdrincl = inet->hdrincl;
      	hdrincl = READ_ONCE(hdrincl);
      
      This breaks the chain in the sense that the compiler is not allowed
      to replace subsequent reads from hdrincl with reloads from inet->hdrincl.
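
The two-step read described above can be tried out in userspace. The following is only a sketch under stated assumptions: `SKETCH_READ_ONCE` is a simplified stand-in for the kernel macro (it relies on the GCC/Clang `__typeof__` extension), and `struct sketch_inet` and `snapshot_hdrincl()` are hypothetical names that do not appear in the kernel.

```c
/* Simplified stand-in for the kernel's READ_ONCE(): a single volatile load.
 * It takes the address of its argument, which is why it cannot be applied
 * to a bit-field directly. */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical struct with a one-bit field, loosely mirroring inet->hdrincl. */
struct sketch_inet {
    unsigned int hdrincl : 1;
    unsigned int other   : 7;
};

/* Snapshot the bit-field once; callers then test only the returned value. */
int snapshot_hdrincl(const struct sketch_inet *inet)
{
    int hdrincl = inet->hdrincl;          /* plain read of the bit-field */

    hdrincl = SKETCH_READ_ONCE(hdrincl);  /* volatile access on the local copy */
    return hdrincl;
}
```

Because the volatile access is applied to the local copy, every later test works on the value that was loaded at that point rather than on a fresh read of the shared field.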
      
      Fixes: 8f659a03 ("net: ipv4: fix for a race condition in raw_sendmsg")
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: caif: use strlcpy() instead of strncpy() · 3dc2fa47
      Xiongfeng Wang authored
      gcc-8 reports
      
      net/caif/caif_dev.c: In function 'caif_enroll_dev':
      ./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
      be truncated copying 15 bytes from a string of length 15
      [-Wstringop-truncation]
      
      net/caif/cfctrl.c: In function 'cfctrl_linkup_request':
      ./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
      be truncated copying 15 bytes from a string of length 15
      [-Wstringop-truncation]
      
      net/caif/cfcnfg.c: In function 'caif_connect_client':
      ./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
      be truncated copying 15 bytes from a string of length 15
      [-Wstringop-truncation]
      
      The compiler requires that the 'len' parameter of strncpy() be
      greater than the length of the src string, so that '\0' is copied as
      well. We can just use strlcpy() to avoid this warning.
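
To illustrate the difference the warning is about, here is a minimal userspace sketch of strlcpy()-style semantics. `sketch_strlcpy()` is a hypothetical helper written for this example; it is not the kernel's implementation, only an approximation of the documented behavior (always NUL-terminate, return the source length).

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative strlcpy-style copy: copies at most size - 1 bytes and always
 * NUL-terminates when size > 0. Returns strlen(src), so a return value
 * >= size tells the caller that the copy was truncated.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

By contrast, strncpy(dst, src, n) leaves dst without a terminating '\0' whenever strlen(src) >= n, which is the situation gcc-8 flags above.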
      Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: fix module type in sock_diag_bind · b8fd0823
      Andrii Vladyka authored
      Use AF_INET6 instead of AF_INET in the IPv6-related code path.
      Signed-off-by: Andrii Vladyka <tulup@mail.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 Jan, 2018 2 commits
    • sctp: fix the handling of ICMP Frag Needed for too small MTUs · b6c5734d
      Marcelo Ricardo Leitner authored
      syzbot reported a hang involving SCTP, during which it kept flooding dmesg
      with the message:
      [  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
      low, using default minimum of 512
      
      That happened because whenever SCTP hits an ICMP Frag Needed, it tries
      to adjust to the new MTU and triggers an immediate retransmission. But
      it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
      allowed (512) would not cause the PMTU to change, and issued the
      retransmission anyway (thus leading to another ICMP Frag Needed, and so
      on).
      
      As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
      are higher than that, sctp_transport_update_pmtu() is changed to
      re-fetch the PMTU that got set after our request, and with that, detect
      if there was an actual change or not.
      
      The fix, thus, skips the immediate retransmission if the received ICMP
      resulted in no change, in the hope that SCTP will select another path.
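
The "retransmit only if the PMTU actually changed" rule can be sketched as follows. This is a simplified model, not the kernel code: the struct and function names are hypothetical, and the re-fetch of the PMTU from the route is replaced by a plain clamp to the 512-byte minimum mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SCTP_MIN_MTU 512u  /* minimum named in the commit message */

/* Hypothetical transport state; only the path MTU matters here. */
struct sketch_transport {
    uint32_t pathmtu;
};

/* Apply a reported PMTU, clamped to the minimum.
 * Returns true only if the stored PMTU actually changed. */
bool sketch_update_pmtu(struct sketch_transport *t, uint32_t reported)
{
    uint32_t pmtu = reported < SKETCH_SCTP_MIN_MTU ? SKETCH_SCTP_MIN_MTU
                                                   : reported;
    bool changed = (pmtu != t->pathmtu);

    t->pathmtu = pmtu;
    return changed;
}

/* Retransmit immediately only when the ICMP resulted in a PMTU change. */
bool sketch_should_retransmit(struct sketch_transport *t, uint32_t reported)
{
    return sketch_update_pmtu(t, reported);
}
```

With the transport already at 512, a reported PMTU of 508 produces no change, so no immediate retransmission is issued and the loop is broken.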
      
      Note: The value being used for the minimum MTU (512,
      SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
      SCTP_MIN_PMTU), but such change belongs to another patch.
      
      Changes from v1:
      - do not disable PMTU discovery, in the light of commit
      06ad3919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
      and as suggested by Xin Long.
      - changed the way to break the rtx loop by detecting if the icmp
        resulted in a change or not
      Changes from v2:
      none
      
      See-also: https://lkml.org/lkml/2017/12/22/811
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled · cc35c3d1
      Marcelo Ricardo Leitner authored
      Currently, if PMTU discovery is disabled on a given transport, but the
      configured value is higher than the actual PMTU, it is likely that we
      will get some ICMP Frag Needed messages. The issue is that, if PMTU
      discovery is disabled, we won't update the information and will issue a
      retransmission immediately, which may very well trigger another ICMP,
      and another retransmission, leading to a loop.
      
      The fix is to simply not trigger immediate retransmissions if PMTU
      discovery is disabled on the given transport.
      
      Changes from v2:
      - updated stale comment, noticed by Xin Long
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 06 Jan, 2018 1 commit
  5. 05 Jan, 2018 3 commits
    • ipv6: fix general protection fault in fib6_add() · 7bbfe00e
      Wei Wang authored
      In fib6_add(), pn could be NULL if fib6_add_1() failed to return a fib6
      node. Checking pn != fn before accessing pn->leaf makes sure pn is not
      NULL.
      This fixes the following GPF reported by syzkaller:
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 3201 Comm: syzkaller001778 Not tainted 4.15.0-rc5+ #151
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:fib6_add+0x736/0x15a0 net/ipv6/ip6_fib.c:1244
      RSP: 0018:ffff8801c7626a70 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000020 RCX: ffffffff84794465
      RDX: 0000000000000004 RSI: ffff8801d38935f0 RDI: 0000000000000282
      RBP: ffff8801c7626da0 R08: 1ffff10038ec4c35 R09: 0000000000000000
      R10: ffff8801c7626c68 R11: 0000000000000000 R12: 00000000fffffffe
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000009
      FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:0000000009b70840
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000020be1000 CR3: 00000001d585a006 CR4: 00000000001606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1006
       ip6_route_multipath_add+0xd14/0x16c0 net/ipv6/route.c:3833
       inet6_rtm_newroute+0xdc/0x160 net/ipv6/route.c:3957
       rtnetlink_rcv_msg+0x733/0x1020 net/core/rtnetlink.c:4411
       netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4423
       netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
       sock_sendmsg_nosec net/socket.c:636 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:646
       sock_write_iter+0x31a/0x5d0 net/socket.c:915
       call_write_iter include/linux/fs.h:1772 [inline]
       do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
       do_iter_write+0x154/0x540 fs/read_write.c:932
       compat_writev+0x225/0x420 fs/read_write.c:1246
       do_compat_writev+0x115/0x220 fs/read_write.c:1267
       C_SYSC_writev fs/read_write.c:1278 [inline]
       compat_SyS_writev+0x26/0x30 fs/read_write.c:1274
       do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
       do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
       entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:125
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: null pointer dereference in rds_atomic_free_op · 7d11f77f
      Mohamed Ghannam authored
      Set rm->atomic.op_active to 0 when rds_pin_pages() fails
      or the user-supplied address is invalid.
      This prevents a NULL pointer dereference in rds_atomic_free_op().
      Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: give a user socket to get_target_net() · f428fe4a
      Andrei Vagin authored
      This function is used from two places: rtnl_dump_ifinfo() and
      rtnl_getlink(). In rtnl_getlink(), we pass a request skb to
      get_target_net(), but in rtnl_dump_ifinfo(), we pass a response skb
      to get_target_net().
      The problem here is that NETLINK_CB() isn't initialized for the response
      skb. In both cases we can get a user socket and pass it, instead of the
      skb, to get_target_net().
      
      This bug was found by syzkaller with this call-trace:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868
      RSP: 0018:ffff8801c880f348 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8443f900
      RDX: 000000000000007b RSI: ffffffff86510f40 RDI: 00000000000003d8
      RBP: ffff8801c880f360 R08: 0000000000000000 R09: 1ffff10039101e4f
      R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86510f40
      R13: 000000000000000c R14: 0000000000000004 R15: 0000000000000011
      FS:  0000000001a1a880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020151000 CR3: 00000001c9511005 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        netlink_ns_capable+0x26/0x30 net/netlink/af_netlink.c:886
        get_target_net+0x9d/0x120 net/core/rtnetlink.c:1765
        rtnl_dump_ifinfo+0x2e5/0xee0 net/core/rtnetlink.c:1806
        netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2222
        __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2319
        netlink_dump_start include/linux/netlink.h:214 [inline]
        rtnetlink_rcv_msg+0x7f0/0xb10 net/core/rtnetlink.c:4485
        netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2441
        rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
        netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
        netlink_unicast+0x4be/0x6a0 net/netlink/af_netlink.c:1334
        netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
      
      Cc: Jiri Benc <jbenc@redhat.com>
      Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 04 Jan, 2018 4 commits
  7. 03 Jan, 2018 6 commits
    • tipc: fix problems with multipoint-to-point flow control · f9c935db
      Jon Maloy authored
      In commit 04d7b574 ("tipc: add multipoint-to-point flow control") we
      introduced a protocol for preventing buffer overflow when many group
      members try to simultaneously send messages to the same receiving member.
      
      Stress test of this mechanism has revealed a couple of related bugs:
      
      - When the receiving member receives an advertisement REMIT message from
        one of the senders, it will sometimes prematurely activate a pending
        member and send it the remitted advertisement, although the upper
        limit for active senders has been reached. This leads to accumulation
        of illegal advertisements, and eventually to messages being dropped
        because of receive buffer overflow.
      
      - When the receiving member leaves REMITTED state while a received
        message is being read, we fail to look at the pending queue and
        activate the oldest pending peer. This leads to some pending senders
        being starved out, and never getting the opportunity to profit from
        the remitted advertisement.
      
      We fix the former in the function tipc_group_proto_rcv() by returning
      directly from the function once it becomes clear that the remitting
      peer cannot leave REMITTED state at that point.
      
      We fix the latter in the function tipc_group_update_rcv_win() by looking
      up and activating the longest pending peer when it becomes clear that the
      remitting peer now can leave REMITTED state.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: do not print warning for applications using legacy API · 71891e2d
      Stephen Hemminger authored
      In the kernel log, this message appears on every boot:
       "warning: `NetworkChangeNo' uses legacy ethtool link settings API,
        link modes are only partially reported"
      
      When the ethtool link settings API changed, it started complaining about
      uses of the old API. Ironically, the original patch was from Google but
      the application using the legacy API is Chrome.
      
      Linux ABI is fixed as much as possible. The kernel must not break it
      and should not complain about applications using legacy APIs.
      This patch just removes the warning since using legacy APIs
      in Linux is perfectly acceptable.
      
      Fixes: 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David Decotigny <decot@googlers.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Fix update of lastuse in act modules implementing stats_update · 3bb23421
      Roi Dayan authored
      We need to update lastuse to the more recent of the value that
      is already set and the new value.
      If HW matching fails, e.g. because of an issue, the stats are not updated,
      but it could be that software did match and updated lastuse.
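
A minimal sketch of the "keep the most recent value" rule follows. The names are hypothetical, and the timestamps are assumed to be monotonically increasing 64-bit counters with no wraparound, which is a simplification made for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for an action's timestamp bookkeeping. */
struct sketch_tm {
    uint64_t lastuse;
};

/* Keep whichever timestamp is more recent: the one already stored (possibly
 * set by a software match) or the one reported with the hardware stats. */
void sketch_update_lastuse(struct sketch_tm *tm, uint64_t hw_lastuse)
{
    if (hw_lastuse > tm->lastuse)
        tm->lastuse = hw_lastuse;
}
```

An older hardware timestamp therefore never overwrites a newer one recorded by the software path.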
      
      Fixes: 5712bf9c ("net/sched: act_mirred: Use passed lastuse argument")
      Fixes: 9fea47d9 ("net/sched: act_gact: Update statistics when offloaded to hardware")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: allow ip6gre dev mtu to be set below 1280 · 2fa771be
      Xin Long authored
      Commit 582442d6 ("ipv6: Allow the MTU of ipip6 tunnel to be set
      below 1280") fixed an MTU setting issue. It works for ipip6 tunnels.
      
      But an ip6gre dev also updates its MTU with ip6_tnl_change_mtu. Since
      the inner packet over ip6gre can be IPv4, and its MTU should also
      be allowed to be set below 1280, the same issue exists on ip6gre.
      
      This patch fixes it by checking whether parms.proto is IPPROTO_IPV6
      in ip6_tnl_change_mtu instead, so that ip6gre takes the 'else' branch.
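
The branch selection described above might look roughly like the sketch below. The function name is hypothetical, the lower bound of 68 for the 'else' branch is an assumption (the commit message does not state it), and the upper-bound check is omitted.

```c
#include <errno.h>
#include <netinet/in.h>   /* IPPROTO_IPV6, IPPROTO_GRE */

#define SKETCH_IPV6_MIN_MTU  1280  /* limit named in the commit title */
#define SKETCH_OTHER_MIN_MTU 68    /* assumed lower bound for the 'else' branch */

/* Validate a new tunnel MTU based on the configured tunnel protocol.
 * Only a pure IPv6-in-IPv6 tunnel enforces the 1280-byte IPv6 minimum. */
int sketch_check_tunnel_mtu(int proto, int new_mtu)
{
    if (proto == IPPROTO_IPV6) {
        if (new_mtu < SKETCH_IPV6_MIN_MTU)
            return -EINVAL;
    } else {
        if (new_mtu < SKETCH_OTHER_MIN_MTU)
            return -EINVAL;
    }
    return 0;
}
```

Keying the check on the protocol lets an ip6gre device, whose inner packets may be IPv4, fall through to the more permissive branch.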
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: disable dst caching if tunnel is dual-stack · 23263ec8
      Eli Cooper authored
      When an ip6_tunnel is in mode 'any', where the transport layer
      protocol can be either 4 or 41, dst_cache must be disabled.
      
      This is because xfrm policies might apply to only one of the two
      protocols. Caching dst would cause xfrm policies for one protocol
      to be incorrectly used for the other.
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns" · 55a5ec9b
      David S. Miller authored
      This reverts commit 87c320e5.
      
      Changing the error return code in some situations turns out to
      be harmful in practice.  In particular Michael Ellerman reports
      that DHCP fails on his powerpc machines, and this revert gets
      things working again.
      
      Johannes Berg agrees that this revert is the best course of
      action for now.
      
      Fixes: 029b6d14 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"")
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 29 Dec, 2017 3 commits
  9. 28 Dec, 2017 1 commit
  10. 27 Dec, 2017 9 commits
    • tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path · 642a8439
      Tommi Rantala authored
      Calling tipc_mon_delete() before the monitor has been created will oops.
      This can happen in tipc_enable_bearer() error path if tipc_disc_create()
      fails.
      
      [   48.589074] BUG: unable to handle kernel paging request at 0000000000001008
      [   48.590266] IP: tipc_mon_delete+0xea/0x270 [tipc]
      [   48.591223] PGD 1e60c5067 P4D 1e60c5067 PUD 1eb0cf067 PMD 0
      [   48.592230] Oops: 0000 [#1] SMP KASAN
      [   48.595610] CPU: 5 PID: 1199 Comm: tipc Tainted: G    B            4.15.0-rc4-pc64-dirty #5
      [   48.597176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   48.598489] RIP: 0010:tipc_mon_delete+0xea/0x270 [tipc]
      [   48.599347] RSP: 0018:ffff8801d827f668 EFLAGS: 00010282
      [   48.600705] RAX: ffff8801ee813f00 RBX: 0000000000000204 RCX: 0000000000000000
      [   48.602183] RDX: 1ffffffff1de6a75 RSI: 0000000000000297 RDI: 0000000000000297
      [   48.604373] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff1dd1533
      [   48.605607] R10: ffffffff8eafbb05 R11: fffffbfff1dd1534 R12: 0000000000000050
      [   48.607082] R13: dead000000000200 R14: ffffffff8e73f310 R15: 0000000000001020
      [   48.608228] FS:  00007fc686484800(0000) GS:ffff8801f5540000(0000) knlGS:0000000000000000
      [   48.610189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.611459] CR2: 0000000000001008 CR3: 00000001dda70002 CR4: 00000000003606e0
      [   48.612759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.613831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.615038] Call Trace:
      [   48.615635]  tipc_enable_bearer+0x415/0x5e0 [tipc]
      [   48.620623]  tipc_nl_bearer_enable+0x1ab/0x200 [tipc]
      [   48.625118]  genl_family_rcv_msg+0x36b/0x570
      [   48.631233]  genl_rcv_msg+0x5a/0xa0
      [   48.631867]  netlink_rcv_skb+0x1cc/0x220
      [   48.636373]  genl_rcv+0x24/0x40
      [   48.637306]  netlink_unicast+0x29c/0x350
      [   48.639664]  netlink_sendmsg+0x439/0x590
      [   48.642014]  SYSC_sendto+0x199/0x250
      [   48.649912]  do_syscall_64+0xfd/0x2c0
      [   48.650651]  entry_SYSCALL64_slow_path+0x25/0x25
      [   48.651843] RIP: 0033:0x7fc6859848e3
      [   48.652539] RSP: 002b:00007ffd25dff938 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   48.654003] RAX: ffffffffffffffda RBX: 00007ffd25dff990 RCX: 00007fc6859848e3
      [   48.655303] RDX: 0000000000000054 RSI: 00007ffd25dff990 RDI: 0000000000000003
      [   48.656512] RBP: 00007ffd25dff980 R08: 00007fc685c35fc0 R09: 000000000000000c
      [   48.657697] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000d13010
      [   48.658840] R13: 00007ffd25e009c0 R14: 0000000000000000 R15: 0000000000000000
      [   48.662972] RIP: tipc_mon_delete+0xea/0x270 [tipc] RSP: ffff8801d827f668
      [   48.664073] CR2: 0000000000001008
      [   48.664576] ---[ end trace e811818d54d5ce88 ]---
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: error path leak fixes in tipc_enable_bearer() · 19142551
      Tommi Rantala authored
      Fix memory leak in tipc_enable_bearer() if enable_media() fails, and
      clean up with bearer_disable() if tipc_mon_create() fails.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: Check cmsg_len before dereferencing CMSG_DATA · 14e138a8
      Avinash Repaka authored
      RDS currently doesn't check whether the length of the control message is
      large enough to hold the required data before dereferencing the control
      message data. This results in the following crash:
      
      BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
      [inline]
      BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
      net/rds/send.c:1066
      Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157
      
      CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       rds_rdma_bytes net/rds/send.c:1013 [inline]
       rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
       __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
       SYSC_sendmmsg net/socket.c:2139 [inline]
       SyS_sendmmsg+0x35/0x60 net/socket.c:2134
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x43fe49
      RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
      RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
      R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000
      
      To fix this, we verify that the cmsg_len is large enough to hold the
      data to be read, before proceeding further.
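
The length check can be illustrated in userspace with the standard control-message macros from <sys/socket.h>. This is a sketch of the general pattern, not the RDS code; `sketch_read_cmsg()` is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Copy 'want' bytes of payload out of a control message, but only after
 * verifying that cmsg_len covers the header plus that much data. */
int sketch_read_cmsg(const struct cmsghdr *cmsg, void *out, size_t want)
{
    if (cmsg->cmsg_len < CMSG_LEN(want))
        return -EINVAL;          /* too short: reading would go out of bounds */

    memcpy(out, CMSG_DATA(cmsg), want);
    return 0;
}
```

CMSG_LEN(want) is the header size plus the payload size, so comparing it against cmsg_len rejects messages whose payload is shorter than what the caller is about to read.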
      Reported-by: syzbot <syzkaller-bugs@googlegroups.com>
      Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix memory leak of group member when peer node is lost · 3a33a19b
      Jon Maloy authored
      When a group member receives a member WITHDRAW event, there are two
      possible reasons: either the peer member is leaving the group, or the link
      to the member's node has been lost.
      
      In the latter case we need to issue a DOWN event to the user right away,
      and let function tipc_group_filter_msg() perform the deletion of the member
      item. However, in this case we fail to change the state of the member
      item to MBR_LEAVING, so the member item is not deleted, and we have a
      memory leak.
      
      We now distinguish better between the four sub-cases of a WITHDRAW event
      and make sure that each case is handled correctly.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix possible null pointer deref in tcf_block_put · 4853f128
      Jiri Pirko authored
      We need to check block for being null in both tcf_block_put and
      tcf_block_put_ext.
      
      Fixes: 343723dd ("net: sched: fix clsact init error path")
      Reported-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: base group replicast ack counter on number of actual receivers · 0a3d805c
      Jon Maloy authored
      In commit 2f487712 ("tipc: guarantee that group broadcast doesn't
      bypass group unicast") we introduced a mechanism that requires the first
      (replicated) broadcast sent after a unicast to be acknowledged by all
      receivers before permitting sending of the next (true) broadcast.
      
      The counter for keeping track of the number of acknowledgements to expect
      is based on the tipc_group::member_cnt variable. But this misses that
      some of the known members may not be ready for reception, and will never
      acknowledge the message, either because they haven't fully joined the
      group or because they are leaving the group. Such members are identified
      by not fulfilling the condition tested for in the function
      tipc_group_is_enabled().
      
      We now set the counter for the actual number of acks to receive at the
      moment the message is sent, by just counting the number of recipients
      satisfying the tipc_group_is_enabled() test.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a3d805c
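
The counting approach described above can be illustrated with a small sketch. The member states and the `demo_member_is_enabled()` predicate are assumptions made for the example; the real condition is whatever tipc_group_is_enabled() tests.

```c
#include <stddef.h>

/* Hypothetical member states; the real state machine has more cases. */
enum demo_state { DEMO_JOINING, DEMO_JOINED, DEMO_LEAVING };

struct demo_member {
    enum demo_state state;
};

/* Assumed predicate: only fully joined members can acknowledge. */
static int demo_member_is_enabled(const struct demo_member *m)
{
    return m->state == DEMO_JOINED;
}

/*
 * Count the acknowledgements to expect at send time: one per member
 * that passes the enabled test, rather than one per known member.
 */
size_t demo_expected_acks(const struct demo_member *members, size_t n)
{
    size_t acks = 0;

    for (size_t i = 0; i < n; i++)
        if (demo_member_is_enabled(&members[i]))
            acks++;
    return acks;
}
```

With four known members of which two are still joining or leaving, the sender would wait for two acknowledgements instead of four, so it does not stall on members that will never reply.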
    • C
      net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap() · b2fb01f4
      Cong Wang committed
      The rcu_barrier_bh() in mini_qdisc_pair_swap() waits for any in-flight
      RCU callback installed by a previous mini_qdisc_pair_swap(). However,
      it is missing on the tp_head==NULL path, so the RCU callback can still
      use miniq_old->rcu after it is freed together with the qdisc in
      qdisc_graft(). Add the barrier on that path too.
      
      Fixes: 46209401 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath ")
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b2fb01f4
    • A
      ip6_gre: fix device features for ioctl setup · e5a9336a
      Alexey Kodanev committed
      When ip6gre is created using ioctl, its features, such as
      scatter-gather, GSO and tx-checksumming will be turned off:
      
        # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
        # ethtool -k gre6 (truncated output)
          tx-checksumming: off
          scatter-gather: off
          tcp-segmentation-offload: off
          generic-segmentation-offload: off [requested on]
      
      But when netlink is used, they will be enabled:
        # ip link add gre6 type ip6gre remote fd00::1
        # ethtool -k gre6 (truncated output)
          tx-checksumming: on
          scatter-gather: on
          tcp-segmentation-offload: on
          generic-segmentation-offload: on
      
      This results in a loss of performance when gre6 is created via ioctl.
      The issue was found with LTP/gre tests.
      
      Fix it by moving the setup of device features to a separate function
      and invoking it from the ndo_init callback, because both netlink and
      ioctl will eventually call it via register_netdevice():
      
         register_netdevice()
             - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
                 - ip6gre_tunnel_init_common()
                      - ip6gre_tnl_init_features()
      
      The moved code also contains two minor style fixes:
        * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
        * fixed the issue reported by checkpatch: "Unnecessary parentheses around
          'nt->encap.type == TUNNEL_ENCAP_NONE'"
      
      Fixes: ac4eb009 ("ip6gre: Add support for basic offloads offloads excluding GSO")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5a9336a
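
The idea behind the fix, that feature setup lives in the init callback shared by every creation path, can be sketched as below. All names (`demo_dev`, `demo_register()`, the `DEMO_F_*` flags) are hypothetical and only mirror the call chain shown in the commit message.

```c
#include <stdint.h>

/* Hypothetical feature bits standing in for the real netdev features. */
#define DEMO_F_SG   (1u << 0)   /* scatter-gather */
#define DEMO_F_CSUM (1u << 1)   /* tx-checksumming */
#define DEMO_F_GSO  (1u << 2)   /* generic segmentation offload */

struct demo_dev {
    uint32_t features;
    int (*init)(struct demo_dev *dev);  /* plays the role of ndo_init */
};

/* Single place where features are enabled. */
static void demo_init_features(struct demo_dev *dev)
{
    dev->features |= DEMO_F_SG | DEMO_F_CSUM | DEMO_F_GSO;
}

/* Init callback: runs for every device, whatever created it. */
static int demo_tunnel_init(struct demo_dev *dev)
{
    demo_init_features(dev);
    return 0;
}

/* Registration always invokes the init callback. */
int demo_register(struct demo_dev *dev)
{
    return dev->init ? dev->init(dev) : 0;
}

/* Two creation paths that differ in setup but share registration. */
int demo_create_via_ioctl(struct demo_dev *dev)
{
    dev->features = 0;
    dev->init = demo_tunnel_init;
    return demo_register(dev);
}

int demo_create_via_netlink(struct demo_dev *dev)
{
    dev->features = 0;
    dev->init = demo_tunnel_init;
    return demo_register(dev);
}
```

Because neither creation path sets features itself, the two cannot drift apart again: any device that passes through registration ends up with the same feature set.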
    • H
      netfilter: nf_tables: fix potential NULL-ptr deref in nf_tables_dump_obj_done() · 8bea728d
      Hangbin Liu committed
      If there is no NFTA_OBJ_TABLE and NFTA_OBJ_TYPE, c.data will be NULL in
      nf_tables_getobj(). So before freeing filter->table in
      nf_tables_dump_obj_done(), we need to check whether filter is NULL.
      
      Fixes: e46abbcc ("netfilter: nf_tables: Allow table names of up to 255 chars")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8bea728d
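
A user-space sketch of the check described above, assuming a hypothetical `demo_filter` type and using `free()` in place of the kernel allocator. The point is only that the outer pointer must be tested before any of its fields is touched.

```c
#include <stdlib.h>

/* Hypothetical dump filter holding a heap-allocated table name. */
struct demo_filter {
    char *table;
};

/*
 * Completion handler for a dump. The filter is optional, so it may be
 * NULL when the request carried no filter attributes.
 * Returns 1 if a filter was released, 0 if there was none.
 */
int demo_dump_done(struct demo_filter *filter)
{
    if (!filter)
        return 0;           /* no filter: nothing to free */
    free(filter->table);    /* free(NULL) is a no-op, so table may be NULL */
    free(filter);
    return 1;
}
```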
  11. 22 Dec, 2017 2 commits