1. 16 6月, 2017 1 次提交
    • J
      networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4df864c1
  2. 12 6月, 2017 1 次提交
  3. 09 6月, 2017 1 次提交
  4. 08 6月, 2017 1 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
  5. 17 5月, 2017 1 次提交
  6. 18 4月, 2017 2 次提交
  7. 14 4月, 2017 1 次提交
  8. 29 3月, 2017 1 次提交
  9. 28 2月, 2017 1 次提交
  10. 13 1月, 2017 1 次提交
    • N
      ipmr: improve hash scalability · 8fb472c0
      Nikolay Aleksandrov 提交于
      Recently we started using ipmr with thousands of entries and easily hit
      soft lockups on smaller devices. The reason is that the hash function
      uses the high order bits from the src and dst, but those don't change in
      many common cases, also the hash table  is only 64 elements so with
      thousands it doesn't scale at all.
      This patch migrates the hash table to rhashtable, and in particular the
      rhl interface which allows for duplicate elements to be chained because
      of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple
      duplicate entries to be added with different interfaces (IMO wrong, but
      it's been in for a long time).
      
      And here are some results from tests I've run in a VM:
       mr_table size (default, allocated for all namespaces):
        Before                    After
         49304 bytes               2400 bytes
      
       Add 65000 routes (the diff is much larger on smaller devices):
        Before                    After
         1m42s                     58s
      
       Forwarding 256 byte packets with 65000 routes (test done in a VM):
        Before                    After
         3 Mbps / ~1465 pps        122 Mbps / ~59000 pps
      
      As a bonus we no longer see the soft lockups on smaller devices which
      showed up even with 2000 entries before.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8fb472c0
  11. 09 1月, 2017 1 次提交
  12. 03 1月, 2017 1 次提交
  13. 25 12月, 2016 1 次提交
  14. 04 11月, 2016 1 次提交
  15. 01 11月, 2016 2 次提交
    • D
      net: Enable support for VRF with ipv4 multicast · e58e4159
      David Ahern 提交于
      Enable support for IPv4 multicast:
      - similar to unicast the flow struct is updated to L3 master device
        if relevant prior to calling fib_rules_lookup. The table id is saved
        to the lookup arg so the rule action for ipmr can return the table
        associated with the device.
      
      - ip_mr_forward needs to check for master device mismatch as well
        since the skb->dev is set to it
      
      - allow multicast address on VRF device for Rx by checking for the
        daddr in the VRF device as well as the original ingress device
      
      - on Tx need to drop to __mkroute_output when FIB lookup fails for
        multicast destination address.
      
      - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates
        IPMR FIB rules on first device create similar to FIB rules. In
        addition the VRF driver does not divert IPv4 multicast packets:
        it breaks on Tx since the fib lookup fails on the mcast address.
      
      With this patch, ipmr forwarding and local rx/tx work.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e58e4159
    • N
  16. 26 9月, 2016 1 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
  17. 21 9月, 2016 1 次提交
  18. 27 7月, 2016 1 次提交
  19. 20 7月, 2016 1 次提交
  20. 17 7月, 2016 1 次提交
    • N
      net: ipmr/ip6mr: add support for keeping an entry age · 43b9e127
      Nikolay Aleksandrov 提交于
      In preparation for hardware offloading of ipmr/ip6mr we need an
      interface that allows to check (and later update) the age of entries.
      Relying on stats alone can show activity but not actual age of the entry,
      furthermore when there're tens of thousands of entries a lot of the
      hardware implementations only support "hit" bits which are cleared on
      read to denote that the entry was active and shouldn't be aged out,
      these can then be naturally translated into age timestamp and will be
      compatible with the software forwarding age. Using a lastuse entry doesn't
      affect performance because the members in that cache line are written to
      along with the age.
      Since all new users are encouraged to use ipmr via netlink, this is
      exported via the RTA_EXPIRES attribute.
      Also do a minor local variable declaration style adjustment - arrange them
      longest to shortest.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
      CC: Satish Ashok <sashok@cumulusnetworks.com>
      CC: Donald Sharp <sharpd@cumulusnetworks.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43b9e127
  21. 28 6月, 2016 1 次提交
  22. 22 4月, 2016 1 次提交
  23. 01 12月, 2015 5 次提交
  24. 25 11月, 2015 1 次提交
  25. 24 11月, 2015 9 次提交
  26. 23 11月, 2015 1 次提交
    • N
      net: ipmr: fix static mfc/dev leaks on table destruction · 0e615e96
      Nikolay Aleksandrov 提交于
      When destroying an mrt table the static mfc entries and the static
      devices are kept, which leads to devices that can never be destroyed
      (because of refcnt taken) and leaked memory, for example:
      unreferenced object 0xffff880034c144c0 (size 192):
        comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
        hex dump (first 32 bytes):
          98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.....S.4....
          ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  ................
        backtrace:
          [<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
          [<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
          [<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
          [<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
          [<ffffffff81564e13>] raw_setsockopt+0x33/0x90
          [<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
          [<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
          [<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Make sure that everything is cleaned on netns destruction.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e615e96