1. 05 11月, 2016 2 次提交
    • L
      net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti 提交于
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket, (e.g., ping
        replies) use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespaces by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2d118a1
    • L
      net: core: add UID to flows, rules, and routes · 622ec2c9
      Lorenzo Colitti 提交于
      - Define a new FIB rule attributes, FRA_UID_RANGE, to describe a
        range of UIDs.
      - Define a RTA_UID attribute for per-UID route lookups and dumps.
      - Support passing these attributes to and from userspace via
        rtnetlink. The value INVALID_UID indicates no UID was
        specified.
      - Add a UID field to the flow structures.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      622ec2c9
  2. 01 11月, 2016 1 次提交
    • D
      net: Enable support for VRF with ipv4 multicast · e58e4159
      David Ahern 提交于
      Enable support for IPv4 multicast:
      - similar to unicast the flow struct is updated to L3 master device
        if relevant prior to calling fib_rules_lookup. The table id is saved
        to the lookup arg so the rule action for ipmr can return the table
        associated with the device.
      
      - ip_mr_forward needs to check for master device mismatch as well
        since the skb->dev is set to it
      
      - allow multicast address on VRF device for Rx by checking for the
        daddr in the VRF device as well as the original ingress device
      
      - on Tx need to drop to __mkroute_output when FIB lookup fails for
        multicast destination address.
      
      - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates
        IPMR FIB rules on first device create similar to FIB rules. In
        addition the VRF driver does not divert IPv4 multicast packets:
        it breaks on Tx since the fib lookup fails on the mcast address.
      
      With this patch, ipmr forwarding and local rx/tx work.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e58e4159
  3. 14 10月, 2016 1 次提交
  4. 26 9月, 2016 1 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
  5. 22 9月, 2016 1 次提交
  6. 11 9月, 2016 3 次提交
  7. 31 8月, 2016 1 次提交
    • R
      net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu 提交于
      Today mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is ok but can be
      avoided if we did not do the mpls output redirect too early.
      ie we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we will need,
      1) the lwtunnel state to carry encap headroom
      2) and do the redirect to the encap output handler on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds new xmit redirect handler to redirect to lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14972cbd
  8. 10 5月, 2016 1 次提交
  9. 28 4月, 2016 1 次提交
  10. 14 4月, 2016 1 次提交
    • C
      route: do not cache fib route info on local routes with oif · d6d5e999
      Chris Friesen 提交于
      For local routes that require a particular output interface we do not want
      to cache the result.  Caching the result causes incorrect behaviour when
      there are multiple source addresses on the interface.  The end result
      being that if the intended recipient is waiting on that interface for the
      packet he won't receive it because it will be delivered on the loopback
      interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
      interface as well.
      
      This can be tested by running a program such as "dhcp_release" which
      attempts to inject a packet on a particular interface so that it is
      received by another program on the same board.  The receiving process
      should see an IP_PKTINFO ipi_ifndex value of the source interface
      (e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
      will still appear on the loopback interface in tcpdump but the important
      aspect is that the CMSG info is correct.
      
      Sample dhcp_release command line:
      
         dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
      Signed-off-by: NAllain Legacy <allain.legacy@windriver.com>
      Signed off-by: Chris Friesen <chris.friesen@windriver.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6d5e999
  11. 12 4月, 2016 1 次提交
    • D
      net: vrf: Fix dst reference counting · 9ab179d8
      David Ahern 提交于
      Vivek reported a kernel exception deleting a VRF with an active
      connection through it. The root cause is that the socket has a cached
      reference to a dst that is destroyed. Converting the dst_destroy to
      dst_release and letting proper reference counting kick in does not
      work as the dst has a reference to the device which needs to be released
      as well.
      
      I talked to Hannes about this at netdev and he pointed out the ipv4 and
      ipv6 dst handling has dst_ifdown for just this scenario. Rather than
      continuing with the reinvented dst wheel in VRF just remove it and
      leverage the ipv4 and ipv6 versions.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab179d8
  12. 19 2月, 2016 1 次提交
  13. 08 10月, 2015 4 次提交
  14. 07 10月, 2015 2 次提交
  15. 05 10月, 2015 2 次提交
  16. 30 9月, 2015 6 次提交
  17. 26 9月, 2015 1 次提交
  18. 21 9月, 2015 1 次提交
    • N
      net: Fix behaviour of unreachable, blackhole and prohibit routes · 0315e382
      Nikola Forró 提交于
      Man page of ip-route(8) says following about route types:
      
        unreachable - these destinations are unreachable.  Packets are dis‐
        carded and the ICMP message host unreachable is generated.  The local
        senders get an EHOSTUNREACH error.
      
        blackhole - these destinations are unreachable.  Packets are dis‐
        carded silently.  The local senders get an EINVAL error.
      
        prohibit - these destinations are unreachable.  Packets are discarded
        and the ICMP message communication administratively prohibited is
        generated.  The local senders get an EACCES error.
      
      In the inet6 address family, this was correct, except the local senders
      got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
      In the inet address family, all three route types generated ICMP message
      net unreachable, and the local senders got ENETUNREACH error.
      
      In both address families all three route types now behave consistently
      with documentation.
      Signed-off-by: NNikola Forró <nforro@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0315e382
  19. 18 9月, 2015 1 次提交
    • D
      net: Initialize table in fib result · bde6f9de
      David Ahern 提交于
      Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:
      
      [    0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
      [    0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
      [    0.877597] Oops: 0000 [#1] SMP
      [    0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
      [    0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
      [    0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [    0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
      [    0.877597] RIP: 0010:[<ffffffff8155b5e2>]  [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] RSP: 0018:ffff88003ed03ba0  EFLAGS: 00010202
      [    0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
      [    0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
      [    0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
      [    0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
      [    0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
      [    0.877597] FS:  00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
      [    0.877597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
      [    0.877597] Stack:
      [    0.877597]  0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
      [    0.877597]  ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
      [    0.877597]  0000000000000000 0000000000000046 0000000000000000 0000000400000000
      [    0.877597] Call Trace:
      [    0.877597]  <IRQ>
      [    0.877597]  [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
      [    0.877597]  [<ffffffff8158e13c>] arp_process+0x39c/0x690
      [    0.877597]  [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
      [    0.877597]  [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
      [    0.877597]  [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
      [    0.877597]  [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
      [    0.877597]  [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
      [    0.877597]  [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
      [    0.877597]  [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
      [    0.877597]  [<ffffffff81053228>] __do_softirq+0x98/0x260
      [    0.877597]  [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30
      
      The root cause is use of res.table uninitialized.
      
      Thanks to Nikolay for noticing the uninitialized use amongst the maze of
      gotos.
      
      As Nikolay pointed out the second initialization is not required to fix
      the oops, but rather to fix a related problem where a valid lookup should
      be invalidated before creating the rth entry.
      
      Fixes: b7503e0c ("net: Add FIB table id to rtable")
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Reported-by: NRichard Alpe <richard.alpe@ericsson.com>
      Reported-by: NFabio Estevam <festevam@gmail.com>
      Tested-by: NFabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bde6f9de
  20. 16 9月, 2015 3 次提交
  21. 30 8月, 2015 1 次提交
  22. 29 8月, 2015 1 次提交
  23. 21 8月, 2015 1 次提交
  24. 18 8月, 2015 1 次提交
  25. 14 8月, 2015 1 次提交