1. 18 12月, 2018 1 次提交
  2. 06 10月, 2018 1 次提交
    • W
      ipv6: take rcu lock in rawv6_send_hdrinc() · a688caa3
      Wei Wang 提交于
      In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
      directly assign the dst to skb and set passed in dst to NULL to avoid
      double free.
      However, in error case, we free skb and then do stats update with the
      dst pointer passed in. This causes use-after-free on the dst.
      Fix it by taking rcu read lock right before dst could get released to
      make sure dst does not get freed until the stats update is done.
      Note: we don't have this issue in ipv4 cause dst is not used for stats
      update in v4.
      
      Syzkaller reported following crash:
      BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
      BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
      Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
      
      CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
       rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457099
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
      RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
      
      Allocated by task 32088:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:105
       ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
       ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
       ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
       fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
       ip6_route_output include/net/ip6_route.h:88 [inline]
       ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5356:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:141
       dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
       __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
       rcu_do_batch kernel/rcu/tree.c:2576 [inline]
       invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
       __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
       rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
       __do_softirq+0x30b/0xad8 kernel/softirq.c:292
      
      Fixes: 1789a640 ("raw: avoid two atomics in xmit")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a688caa3
  3. 07 7月, 2018 2 次提交
  4. 04 7月, 2018 1 次提交
  5. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  6. 26 5月, 2018 1 次提交
  7. 16 5月, 2018 2 次提交
  8. 28 3月, 2018 1 次提交
  9. 27 3月, 2018 1 次提交
  10. 20 2月, 2018 1 次提交
  11. 17 1月, 2018 1 次提交
    • A
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan 提交于
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96890d62
  12. 16 1月, 2018 1 次提交
    • D
      ip: Define usercopy region in IP proto slab cache · 8c2bc895
      David Windsor 提交于
      The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from
      userspace. In support of usercopy hardening, this patch defines a region
      in the struct proto slab cache in which userspace copy operations are
      allowed.
      
      example usage trace:
      
          net/ipv4/raw.c:
              raw_seticmpfilter(...):
                  ...
                  copy_from_user(&raw_sk(sk)->filter, ..., optlen)
      
              raw_geticmpfilter(...):
                  ...
                  copy_to_user(..., &raw_sk(sk)->filter, len)
      
          net/ipv6/raw.c:
              rawv6_seticmpfilter(...):
                  ...
                  copy_from_user(&raw6_sk(sk)->filter, ..., optlen)
      
              rawv6_geticmpfilter(...):
                  ...
                  copy_to_user(..., &raw6_sk(sk)->filter, len)
      
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
      
      This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
      whitelisting code in the last public patch of grsecurity/PaX based on my
      understanding of the code. Changes or omissions from the original code are
      mine and don't reflect the original grsecurity/PaX code.
      Signed-off-by: NDavid Windsor <dave@nullcore.net>
      [kees: split from network patch, provide usage trace]
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      8c2bc895
  13. 18 10月, 2017 1 次提交
  14. 08 8月, 2017 1 次提交
  15. 05 6月, 2017 1 次提交
  16. 04 5月, 2017 1 次提交
    • A
      ipv4, ipv6: ensure raw socket message is big enough to hold an IP header · 86f4c90a
      Alexander Potapenko 提交于
      raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied
      from the userspace contains the IPv4/IPv6 header, so if too few bytes are
      copied, parts of the header may remain uninitialized.
      
      This bug has been detected with KMSAN.
      
      For the record, the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0
      inter: 0
      CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
       __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
       nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577
       ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       NF_HOOK ./include/linux/netfilter.h:255
       rawv6_send_hdrinc net/ipv6/raw.c:673
       rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x436e03
      RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000
      origin: 00000000d9400053
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
       slab_alloc_node mm/slub.c:2735
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
       sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
       sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
       rawv6_send_hdrinc net/ipv6/raw.c:638
       rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      , triggered by the following syscalls:
        socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
        sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM
      
      A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket
      instead of a PF_INET6 one.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86f4c90a
  17. 27 4月, 2017 1 次提交
  18. 08 2月, 2017 1 次提交
  19. 24 12月, 2016 1 次提交
    • D
      ipv6: handle -EFAULT from skb_copy_bits · a98f9175
      Dave Jones 提交于
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a98f9175
  20. 05 11月, 2016 1 次提交
    • L
      net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti 提交于
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket, (e.g., ping
        replies) use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespaces by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2d118a1
  21. 24 10月, 2016 1 次提交
    • C
      net: ip, diag -- Add diag interface for raw sockets · 432490f9
      Cyrill Gorcunov 提交于
      In criu we are actively using diag interface to collect sockets
      present in the system when dumping applications. And while for
      unix, tcp, udp[lite], packet, netlink it works as expected,
      the raw sockets do not have. Thus add it.
      
      v2:
       - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
       - implement @destroy for diag requests (by dsa@)
      
      v3:
       - add export of raw_abort for IPv6 (by dsa@)
       - pass net-admin flag into inet_sk_diag_fill due to
         changes in net-next branch (by dsa@)
      
      v4:
       - use @pad in struct inet_diag_req_v2 for raw socket
         protocol specification: raw module carries sockets
         which may have custom protocol passed from socket()
         syscall and sole @sdiag_protocol is not enough to
         match underlied ones
       - start reporting protocol specifed in socket() call
         when sockets are raw ones for the same reason: user
         space tools like ss may parse this attribute and use
         it for socket matching
      
      v5 (by eric.dumazet@):
       - use sock_hold in raw_sock_get instead of atomic_inc,
         we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
         when looking up so counter won't be zero here.
      
      v6:
       - use sdiag_raw_protocol() helper which will access @pad
         structure used for raw sockets protocol specification:
         we can't simply rename this member without breaking uapi
      
      v7:
       - sine sdiag_raw_protocol() helper is not suitable for
         uapi lets rather make an alias structure with proper
         names. __check_inet_diag_req_raw helper will catch
         if any of structure unintentionally changed.
      
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrey Vagin <avagin@openvz.org>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      432490f9
  22. 21 10月, 2016 1 次提交
    • E
      udp: must lock the socket in udp_disconnect() · 286c72de
      Eric Dumazet 提交于
      Baozeng Ding reported KASAN traces showing uses after free in
      udp_lib_get_port() and other related UDP functions.
      
      A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.
      
      I could write a reproducer with two threads doing :
      
      static int sock_fd;
      static void *thr1(void *arg)
      {
      	for (;;) {
      		connect(sock_fd, (const struct sockaddr *)arg,
      			sizeof(struct sockaddr_in));
      	}
      }
      
      static void *thr2(void *arg)
      {
      	struct sockaddr_in unspec;
      
      	for (;;) {
      		memset(&unspec, 0, sizeof(unspec));
      	        connect(sock_fd, (const struct sockaddr *)&unspec,
      			sizeof(unspec));
              }
      }
      
      Problem is that udp_disconnect() could run without holding socket lock,
      and this was causing list corruptions.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      286c72de
  23. 11 9月, 2016 1 次提交
  24. 12 6月, 2016 1 次提交
  25. 04 5月, 2016 1 次提交
    • W
      ipv6: add new struct ipcm6_cookie · 26879da5
      Wei Wang 提交于
      In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
      variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
      functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
      This is not a good practice and makes it hard to add new parameters.
      This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
      ipv4 and include the above mentioned variables. And we only pass the
      pointer to this structure to corresponding functions. This makes it easier
      to add new parameters in the future and makes the function cleaner.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26879da5
  26. 05 4月, 2016 2 次提交
    • S
      sock: enable timestamping using control messages · c14ac945
      Soheil Hassas Yeganeh 提交于
      Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
      This is very costly when users want to sample writes to gather
      tx timestamps.
      
      Add support for enabling SO_TIMESTAMPING via control messages by
      using tsflags added in `struct sockcm_cookie` (added in the previous
      patches in this series) to set the tx_flags of the last skb created in
      a sendmsg. With this patch, the timestamp recording bits in tx_flags
      of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.
      
      Please note that this is only effective for overriding the recording
      timestamps flags. Users should enable timestamp reporting (e.g.,
      SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
      socket options and then should ask for SOF_TIMESTAMPING_TX_*
      using control messages per sendmsg to sample timestamps for each
      write.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c14ac945
    • S
      ipv6: process socket-level control messages in IPv6 · ad1e46a8
      Soheil Hassas Yeganeh 提交于
      Process socket-level control messages by invoking
      __sock_cmsg_send in ip6_datagram_send_ctl for control messages on
      the SOL_SOCKET layer.
      
      This makes sure whenever ip6_datagram_send_ctl is called for
      udp and raw, we also process socket-level control messages.
      
      This is a bit uglier than IPv4, since IPv6 does not have
      something like ipcm_cookie. Perhaps we can later create
      a control message cookie for IPv6?
      
      Note that this commit interprets new control messages that
      were ignored before. As such, this commit does not change
      the behavior of IPv6 control messages.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad1e46a8
  27. 18 12月, 2015 1 次提交
  28. 03 12月, 2015 1 次提交
  29. 08 10月, 2015 1 次提交
  30. 18 9月, 2015 4 次提交
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • E
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman 提交于
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29a26a56
    • E
      adb28c9d
    • E
      net: Merge dst_output and dst_output_sk · 5a70649e
      Eric W. Biederman 提交于
      Add a sock paramter to dst_output making dst_output_sk superfluous.
      Add a skb->sk parameter to all of the callers of dst_output
      Have the callers of dst_output_sk call dst_output.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a70649e
  31. 10 7月, 2015 1 次提交
    • T
      ipv6: Nonlocal bind · 35a256fe
      Tom Herbert 提交于
      Add support to allow non-local binds similar to how this was done for IPv4.
      Non-local binds are very useful in emulating the Internet in a box, etc.
      
      This add the ip_nonlocal_bind sysctl under ipv6.
      
      Testing:
      
      Set up nonlocal binding and receive routing on a host, e.g.:
      
      ip -6 rule add from ::/0 iif eth0 lookup 200
      ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
      sysctl -w net.ipv6.ip_nonlocal_bind=1
      
      Set up routing to 2001:0:0:1::/64 on peer to go to first host
      
      ping6 -I 2001:0:0:1::1 peer-address -- to verify
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35a256fe
  32. 31 5月, 2015 1 次提交
  33. 26 5月, 2015 1 次提交
    • M
      ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags · 48e8aa6e
      Martin KaFai Lau 提交于
      The neighbor look-up used to depend on the rt6i_gateway (if
      there is a gateway) or the rt6i_dst (if it is a RTF_CACHE clone)
      as the nexthop address.  Note that rt6i_dst is set to fl6->daddr
      for the RTF_CACHE clone where fl6->daddr is the one used to do
      the route look-up.
      
      Now, we only create RTF_CACHE clone after encountering exception.
      When doing the neighbor look-up with a route that is neither a gateway
      nor a RTF_CACHE clone, the daddr in skb will be used as the nexthop.
      
      In some cases, the daddr in skb is not the one used to do
      the route look-up.  One example is in ip_vs_dr_xmit_v6() where the
      real nexthop server address is different from the one in the skb.
      
      This patch is going to follow the IPv4 approach and ask the
      ip6_pol_route() callers to set the FLOWI_FLAG_KNOWN_NH properly.
      
      In the next patch, ip6_pol_route() will honor the FLOWI_FLAG_KNOWN_NH
      and create a RTF_CACHE clone.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Tested-by: NJulian Anastasov <ja@ssi.bg>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48e8aa6e
  34. 08 4月, 2015 1 次提交
    • D
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller 提交于
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7026b1dd