1. 18 Sep, 2015 1 commit
    • D
      net: Initialize table in fib result · bde6f9de
      David Ahern committed
      Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:
      
      [    0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
      [    0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
      [    0.877597] Oops: 0000 [#1] SMP
      [    0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
      [    0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
      [    0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [    0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
      [    0.877597] RIP: 0010:[<ffffffff8155b5e2>]  [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
      [    0.877597] RSP: 0018:ffff88003ed03ba0  EFLAGS: 00010202
      [    0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
      [    0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
      [    0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
      [    0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
      [    0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
      [    0.877597] FS:  00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
      [    0.877597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
      [    0.877597] Stack:
      [    0.877597]  0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
      [    0.877597]  ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
      [    0.877597]  0000000000000000 0000000000000046 0000000000000000 0000000400000000
      [    0.877597] Call Trace:
      [    0.877597]  <IRQ>
      [    0.877597]  [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
      [    0.877597]  [<ffffffff8158e13c>] arp_process+0x39c/0x690
      [    0.877597]  [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
      [    0.877597]  [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81515795>] ? __build_skb+0x25/0x100
      [    0.877597]  [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
      [    0.877597]  [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
      [    0.877597]  [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
      [    0.877597]  [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
      [    0.877597]  [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
      [    0.877597]  [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
      [    0.877597]  [<ffffffff81053228>] __do_softirq+0x98/0x260
      [    0.877597]  [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30
      
      The root cause is use of res.table uninitialized.
      
      Thanks to Nikolay for noticing the uninitialized use amongst the maze of
      gotos.
      
      As Nikolay pointed out the second initialization is not required to fix
      the oops, but rather to fix a related problem where a valid lookup should
      be invalidated before creating the rth entry.
      
      Fixes: b7503e0c ("net: Add FIB table id to rtable")
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Reported-by: Richard Alpe <richard.alpe@ericsson.com>
      Reported-by: Fabio Estevam <festevam@gmail.com>
      Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
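The failure mode described in this commit can be illustrated with a small userspace sketch. This is not the kernel code: `demo_table`, `demo_result`, and `demo_route_input` are hypothetical stand-ins, and the early exit is only assumed to resemble the goto paths the message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's FIB table and lookup result. */
struct demo_table {
    unsigned int id;
};

struct demo_result {
    int type;
    struct demo_table *table;
};

/* Table id to record in the route entry, or 0 when no table was set. */
static unsigned int demo_table_id(const struct demo_result *res)
{
    assert(res != NULL);
    return res->table ? res->table->id : 0;
}

/*
 * Mimics an input path with an early exit that skips the lookup.
 * Because res.table is set to NULL up front, the shared exit label
 * never reads an indeterminate pointer.
 */
unsigned int demo_route_input(int skip_lookup, struct demo_table *tb)
{
    struct demo_result res;

    res.type = 0;
    res.table = NULL;       /* the kind of initialization the fix adds */

    if (skip_lookup)
        goto make_entry;    /* lookup skipped: res.table stays NULL */

    res.table = tb;         /* what a successful lookup would fill in */

make_entry:
    return demo_table_id(&res);
}
```

Without the `res.table = NULL` line, the skipped-lookup path would dereference whatever happened to be on the stack, which is consistent with the small bogus address seen in the oops.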
  2. 16 Sep, 2015 3 commits
  3. 30 Aug, 2015 1 commit
  4. 29 Aug, 2015 1 commit
  5. 21 Aug, 2015 1 commit
  6. 18 Aug, 2015 1 commit
  7. 14 Aug, 2015 2 commits
  8. 04 Aug, 2015 1 commit
  9. 27 Jul, 2015 1 commit
  10. 25 Jul, 2015 1 commit
    • J
      ipv4: consider TOS in fib_select_default · 2392debc
      Julian Anastasov committed
      fib_select_default considers alternative routes only when
      res->fi is for the first alias in res->fa_head. In the
      common case this can happen only when the initial lookup
      matches the first alias with highest TOS value. This
      prevents alternative routes from requiring a specific TOS.
      
      This patch solves the problem as follows:
      
      - routes that require specific TOS should be returned by
      fib_select_default only when TOS matches, as already done
      in fib_table_lookup. This rule implies that depending on the
      TOS we can have many different lists of alternative gateways
      and we have to keep the last used gateway (fa_default) in first
      alias for the TOS instead of using single tb_default value.
      
      - as the aliases are ordered by many keys (TOS desc,
      fib_priority asc), we restrict the possible results to
      routes with matching TOS and lowest metric (fib_priority)
      and routes that match any TOS, again with lowest metric.
      
      For example, packet with TOS 8 can not use gw3 (not lowest
      metric), gw4 (different TOS) and gw6 (not lowest metric),
      all other gateways can be used:
      
      tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
      tos 8 via gw2 metric 2
      tos 8 via gw3 metric 3
      tos 4 via gw4
      tos 0 via gw5
      tos 0 via gw6 metric 1
      Reported-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
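The selection rule from this commit message (matching TOS with lowest metric, plus any-TOS routes with lowest metric) can be sketched in plain C. The names `demo_alias` and `demo_alias_usable` are hypothetical, TOS 0 is assumed to mean "matches any TOS", and a missing metric is assumed to be 0. The sketch does not reproduce the kernel's alias list walk or the `fa_default` bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one route alias. */
struct demo_alias {
    const char *gw;
    unsigned char tos;      /* 0 is assumed to match any TOS */
    unsigned int metric;    /* plays the role of fib_priority */
};

/* Lowest metric among aliases with the given TOS; false if there are none. */
static bool demo_lowest_metric(const struct demo_alias *fa, size_t n,
                               unsigned char tos, unsigned int *out)
{
    bool found = false;

    for (size_t i = 0; i < n; i++) {
        if (fa[i].tos != tos)
            continue;
        if (!found || fa[i].metric < *out) {
            *out = fa[i].metric;
            found = true;
        }
    }
    return found;
}

/* May alias idx serve as an alternative gateway for a packet with pkt_tos? */
bool demo_alias_usable(const struct demo_alias *fa, size_t n, size_t idx,
                       unsigned char pkt_tos)
{
    unsigned int best;

    assert(idx < n);
    /* Only the packet's own TOS group and the any-TOS group qualify. */
    if (fa[idx].tos != pkt_tos && fa[idx].tos != 0)
        return false;
    /* Within its group, the alias must carry the lowest metric. */
    if (!demo_lowest_metric(fa, n, fa[idx].tos, &best))
        return false;
    return fa[idx].metric == best;
}
```

Applied to the six-route example in the message with a TOS 8 packet, this accepts gw1, gw2 and gw5 and rejects gw3, gw4 and gw6.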
  11. 22 Jul, 2015 5 commits
  12. 10 Jul, 2015 1 commit
  13. 24 Jun, 2015 1 commit
    • A
      net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek committed
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will report to userspace that a route is
      dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
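The behaviour this commit describes can be sketched as a nexthop filter. Everything here is illustrative: `demo_nexthop`, `demo_select_nexthop`, and the `DEMO_F_*` flag values are hypothetical and do not match the kernel's `RTNH_F_*` definitions. The `ignore_linkdown` parameter stands in for the `ignore_routes_with_linkdown` sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's RTNH_F_* constants differ. */
#define DEMO_F_DEAD     0x1u
#define DEMO_F_LINKDOWN 0x2u

struct demo_nexthop {
    const char *dev;
    unsigned int metric;
    unsigned int flags;
};

/* A dead nexthop is never usable; a link-down one only when not ignored. */
static bool demo_nh_usable(const struct demo_nexthop *nh, bool ignore_linkdown)
{
    if (nh->flags & DEMO_F_DEAD)
        return false;
    if (ignore_linkdown && (nh->flags & DEMO_F_LINKDOWN))
        return false;
    return true;
}

/*
 * nh[] is assumed to be sorted by ascending metric.
 * Returns the index of the first usable nexthop, or -1 if none qualifies.
 */
int demo_select_nexthop(const struct demo_nexthop *nh, size_t n,
                        bool ignore_linkdown)
{
    assert(nh != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (demo_nh_usable(&nh[i], ignore_linkdown))
            return (int)i;
    }
    return -1;
}
```

With the two 90.0.0.0/24 routes from the message (p8p1 at metric 1 with link down, p7p1 at metric 2), the sketch picks p7p1 when the option is on and p8p1 when it is off, mirroring the outputs shown above.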
  14. 23 May, 2015 1 commit
  15. 04 May, 2015 1 commit
    • A
      net: ipv4: route: Fix sending IGMP messages with link address · 6a211654
      Andrew Lunn committed
      In setups with a global scope address on one interface and a lesser
      scope address on the interface sending IGMP reports, the reports can be
      sent using the other interface's global scope address rather than the
      local interface address. RFC 2236 suggests:
      
           Ignore the Report if you cannot identify the source address of
           the packet as belonging to a subnet assigned to the interface on
           which the packet was received.
      
      since such reports could be forged.
      
      Look at the protocol when deciding if a RT_SCOPE_LINK address should
      be used for the packet.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 02 May, 2015 1 commit
  17. 30 Apr, 2015 1 commit
  18. 04 Apr, 2015 2 commits
  19. 01 Apr, 2015 2 commits
  20. 26 Mar, 2015 1 commit
  21. 10 Mar, 2015 1 commit
  22. 30 Jan, 2015 1 commit
  23. 27 Jan, 2015 1 commit
  24. 19 Jan, 2015 1 commit
  25. 18 Jan, 2015 1 commit
    • J
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg committed
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
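The pitfall this commit removes can be reproduced outside the kernel. In the sketch below, `demo_msg_end_old`, `demo_msg_end`, `demo_fill_old`, and `demo_fill_new` are hypothetical stand-ins, and the 16-byte header length is an assumption for illustration only. None of this is the netlink API itself.

```c
#include <assert.h>

/* Assumed header size for the sketch; not taken from the kernel. */
#define DEMO_HDR_LEN 16

/* Old style: finalizing a message returns its total length, always > 0. */
static int demo_msg_end_old(int payload_len)
{
    assert(payload_len >= 0);
    return DEMO_HDR_LEN + payload_len;
}

/* New style: finalizing a message returns nothing. */
static void demo_msg_end(int *msg_len, int payload_len)
{
    assert(payload_len >= 0);
    *msg_len = DEMO_HDR_LEN + payload_len;
}

/* Propagates the positive length, which callers can mistake for an error. */
int demo_fill_old(int payload_len)
{
    return demo_msg_end_old(payload_len);
}

/* Finalizes the message, then reports success explicitly as 0. */
int demo_fill_new(int payload_len)
{
    int msg_len;

    demo_msg_end(&msg_len, payload_len);
    (void)msg_len;  /* a caller needing the length would read it elsewhere */
    return 0;
}
```

A caller written as `if (demo_fill_old(4)) { /* error path */ }` takes the error branch on every successful call, because the return value is 20 rather than 0. The same check against `demo_fill_new(4)` behaves as intended.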
  26. 16 Jan, 2015 1 commit
    • E
      ipv4: per cpu uncached list · 5055c371
      Eric Dumazet committed
      RAW sockets with hdrinc suffer from contention on rt_uncached_lock
      spinlock.
      
      One solution is to use percpu lists, since most routes are destroyed
      by the cpu that created them.
      
      It is unclear why we even have to put these routes in uncached_list,
      as all outgoing packets should be freed when a device is dismantled.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: caacf05e ("ipv4: Properly purge netdev references on uncached routes.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 31 Oct, 2014 1 commit
  28. 29 Sep, 2014 1 commit
  29. 16 Sep, 2014 1 commit
  30. 06 Sep, 2014 2 commits