1. 26 9月, 2016 1 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
  2. 22 9月, 2016 1 次提交
  3. 11 9月, 2016 3 次提交
  4. 31 8月, 2016 1 次提交
    • R
      net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu 提交于
      Today mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is ok but can be
      avoided if we did not do the mpls output redirect too early.
      ie we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we will need,
      1) the lwtunnel state to carry encap headroom
      2) and do the redirect to the encap output handler on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds new xmit redirect handler to redirect to lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14972cbd
  5. 10 5月, 2016 1 次提交
  6. 28 4月, 2016 1 次提交
  7. 14 4月, 2016 1 次提交
    • C
      route: do not cache fib route info on local routes with oif · d6d5e999
      Chris Friesen 提交于
      For local routes that require a particular output interface we do not want
      to cache the result.  Caching the result causes incorrect behaviour when
      there are multiple source addresses on the interface.  The end result
      being that if the intended recipient is waiting on that interface for the
      packet he won't receive it because it will be delivered on the loopback
      interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
      interface as well.
      
      This can be tested by running a program such as "dhcp_release" which
      attempts to inject a packet on a particular interface so that it is
      received by another program on the same board.  The receiving process
      should see an IP_PKTINFO ipi_ifndex value of the source interface
      (e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
      will still appear on the loopback interface in tcpdump but the important
      aspect is that the CMSG info is correct.
      
      Sample dhcp_release command line:
      
         dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
      Signed-off-by: NAllain Legacy <allain.legacy@windriver.com>
      Signed off-by: Chris Friesen <chris.friesen@windriver.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6d5e999
  8. 12 4月, 2016 1 次提交
    • D
      net: vrf: Fix dst reference counting · 9ab179d8
      David Ahern 提交于
      Vivek reported a kernel exception deleting a VRF with an active
      connection through it. The root cause is that the socket has a cached
      reference to a dst that is destroyed. Converting the dst_destroy to
      dst_release and letting proper reference counting kick in does not
      work as the dst has a reference to the device which needs to be released
      as well.
      
      I talked to Hannes about this at netdev and he pointed out the ipv4 and
      ipv6 dst handling has dst_ifdown for just this scenario. Rather than
      continuing with the reinvented dst wheel in VRF just remove it and
      leverage the ipv4 and ipv6 versions.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab179d8
  9. 19 2月, 2016 1 次提交
  10. 08 10月, 2015 4 次提交
  11. 07 10月, 2015 2 次提交
  12. 05 10月, 2015 2 次提交
  13. 30 9月, 2015 6 次提交
  14. 26 9月, 2015 1 次提交
  15. 21 9月, 2015 1 次提交
    • N
      net: Fix behaviour of unreachable, blackhole and prohibit routes · 0315e382
      Nikola Forró 提交于
      Man page of ip-route(8) says following about route types:
      
        unreachable - these destinations are unreachable.  Packets are dis‐
        carded and the ICMP message host unreachable is generated.  The local
        senders get an EHOSTUNREACH error.
      
        blackhole - these destinations are unreachable.  Packets are dis‐
        carded silently.  The local senders get an EINVAL error.
      
        prohibit - these destinations are unreachable.  Packets are discarded
        and the ICMP message communication administratively prohibited is
        generated.  The local senders get an EACCES error.
      
      In the inet6 address family, this was correct, except the local senders
      got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
      In the inet address family, all three route types generated ICMP message
      net unreachable, and the local senders got ENETUNREACH error.
      
      In both address families all three route types now behave consistently
      with documentation.
      Signed-off-by: NNikola Forró <nforro@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0315e382
  16. 18 9月, 2015 1 次提交
    • D
      net: Initialize table in fib result · bde6f9de
      David Ahern 提交于
      Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:
      
      [    0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
      [    0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
      [    0.877597] Oops: 0000 [#1] SMP
      [    0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
      [    0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
      [    0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [    0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
      [    0.877597] RIP: 0010:[<ffffffff8155b5e2>]  [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] RSP: 0018:ffff88003ed03ba0  EFLAGS: 00010202
      [    0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
      [    0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
      [    0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
      [    0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
      [    0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
      [    0.877597] FS:  00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
      [    0.877597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
      [    0.877597] Stack:
      [    0.877597]  0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
      [    0.877597]  ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
      [    0.877597]  0000000000000000 0000000000000046 0000000000000000 0000000400000000
      [    0.877597] Call Trace:
      [    0.877597]  <IRQ>
      [    0.877597]  [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
      [    0.877597]  [<ffffffff8158e13c>] arp_process+0x39c/0x690
      [    0.877597]  [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
      [    0.877597]  [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
      [    0.877597]  [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
      [    0.877597]  [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
      [    0.877597]  [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
      [    0.877597]  [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
      [    0.877597]  [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
      [    0.877597]  [<ffffffff81053228>] __do_softirq+0x98/0x260
      [    0.877597]  [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30
      
      The root cause is use of res.table uninitialized.
      
      Thanks to Nikolay for noticing the uninitialized use amongst the maze of
      gotos.
      
      As Nikolay pointed out the second initialization is not required to fix
      the oops, but rather to fix a related problem where a valid lookup should
      be invalidated before creating the rth entry.
      
      Fixes: b7503e0c ("net: Add FIB table id to rtable")
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Reported-by: NRichard Alpe <richard.alpe@ericsson.com>
      Reported-by: NFabio Estevam <festevam@gmail.com>
      Tested-by: NFabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bde6f9de
  17. 16 9月, 2015 3 次提交
  18. 30 8月, 2015 1 次提交
  19. 29 8月, 2015 1 次提交
  20. 21 8月, 2015 1 次提交
  21. 18 8月, 2015 1 次提交
  22. 14 8月, 2015 2 次提交
  23. 04 8月, 2015 1 次提交
  24. 27 7月, 2015 1 次提交
  25. 25 7月, 2015 1 次提交
    • J
      ipv4: consider TOS in fib_select_default · 2392debc
      Julian Anastasov 提交于
      fib_select_default considers alternative routes only when
      res->fi is for the first alias in res->fa_head. In the
      common case this can happen only when the initial lookup
      matches the first alias with highest TOS value. This
      prevents the alternative routes to require specific TOS.
      
      This patch solves the problem as follows:
      
      - routes that require specific TOS should be returned by
      fib_select_default only when TOS matches, as already done
      in fib_table_lookup. This rule implies that depending on the
      TOS we can have many different lists of alternative gateways
      and we have to keep the last used gateway (fa_default) in first
      alias for the TOS instead of using single tb_default value.
      
      - as the aliases are ordered by many keys (TOS desc,
      fib_priority asc), we restrict the possible results to
      routes with matching TOS and lowest metric (fib_priority)
      and routes that match any TOS, again with lowest metric.
      
      For example, packet with TOS 8 can not use gw3 (not lowest
      metric), gw4 (different TOS) and gw6 (not lowest metric),
      all other gateways can be used:
      
      tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
      tos 8 via gw2 metric 2
      tos 8 via gw3 metric 3
      tos 4 via gw4
      tos 0 via gw5
      tos 0 via gw6 metric 1
      Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2392debc