  1. 25 May 2016, 3 commits
    • net_sched: avoid too many hrtimer_start() calls · a9efad8b
      Committed by Eric Dumazet
      I found a serious performance bug in packet schedulers using hrtimers.
      
      sch_htb and sch_fq are definitely impacted by this problem.
      
      We constantly rearm high resolution timers if some packets are throttled
      in one (or more) classes, while other packets are flowing through the
      qdisc on another (non-throttled) class.
      
      hrtimer_start() does not have the mod_timer() trick of doing nothing if
      the expires value does not change:
      
      	if (timer_pending(timer) &&
      	    timer->expires == expires)
      		return 1;
      
      This issue is particularly visible when multiple CPUs can queue/dequeue
      packets on the same qdisc, as the hrtimer code has to lock a remote base.
      
      I used the following fix:
      
      1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
      it.
      
      2) Cache the watchdog's previous expiration. hrtimer might provide this,
      but I prefer not to rely on hrtimer internals.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
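      A minimal user-space sketch of fix (2): cache the last armed expiration so that repeated requests for the same deadline skip the re-arm. The names watchdog_sketch, watchdog_schedule_ns and fake_hrtimer_start are hypothetical stand-ins, not the kernel's qdisc_watchdog API; the fake timer only counts how often it would have been re-armed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the expensive hrtimer_start(): only counts calls. */
static unsigned int timer_rearm_count;

static void fake_hrtimer_start(uint64_t expires_ns)
{
	(void)expires_ns;
	timer_rearm_count++;
}

/* Hypothetical watchdog that remembers the deadline it last armed. */
struct watchdog_sketch {
	uint64_t last_expires;
};

/* Re-arm the timer only when the requested expiration actually changes.
 * Returns true if the (fake) timer was re-armed. */
static bool watchdog_schedule_ns(struct watchdog_sketch *wd, uint64_t expires)
{
	if (wd->last_expires == expires)
		return false;	/* same deadline: skip the costly re-arm */

	wd->last_expires = expires;
	fake_hrtimer_start(expires);
	return true;
}
```

      This sketch ignores timer cancellation; if the timer can be cancelled, the cached value would also need to be reset so that the next request re-arms it.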
    • ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path. · 252f3f5a
      Committed by Haishuang Yan
      In the gre6 xmit path we are sending a GRE packet, so set the fl6 proto
      to IPPROTO_GRE.
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: Fix MTU setting for ip6gretap · 1b227e53
      Committed by Haishuang Yan
      When creating an ip6gretap interface with an unreachable route,
      the MTU is about 14 bytes larger than needed.
      
      If the remote address is reachable:
      ping6 2001:0:130::1 -c 2
      PING 2001:0:130::1(2001:0:130::1) 56 data bytes
      64 bytes from 2001:0:130::1: icmp_seq=1 ttl=64 time=1.46 ms
      64 bytes from 2001:0:130::1: icmp_seq=2 ttl=64 time=81.1 ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 May 2016, 2 commits
    • ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n · 049bbf58
      Committed by Ezequiel Garcia
      Commit fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      moves the default TTL assignment, and as a side effect IPv4 TTL now
      has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).
      
      The sysctl_ip_default_ttl is fundamental for IP to work properly,
      as it provides the TTL to be used as default. The default TTL may be
      used in ip_select_ttl, through the following flow:
      
        ip_select_ttl
          ip4_dst_hoplimit
            net->ipv4.sysctl_ip_default_ttl
      
      This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
      in net_init_net, called during ipv4's initialization.
      
      Without this commit, a kernel built without sysctl support will send
      all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
      with setsockopt).
      
      Given that a similar issue might appear with the other knobs that were
      namespaceified, this commit also moves them.
      
      Fixes: fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
      Signed-off-by: David S. Miller <davem@davemloft.net>
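      A simplified sketch of the idea, assuming a trimmed per-namespace struct: the default TTL is assigned in an init path that runs whether or not CONFIG_SYSCTL is set, and TTL selection falls back to it when no route hop limit is configured. netns_ipv4_sketch, inet_init_net_sketch and select_ttl_sketch (including its route_hoplimit parameter) are hypothetical; only the value 64 matches the kernel's IPDEFTTL.

```c
#include <stdint.h>

/* Default IPv4 TTL; the kernel's IPDEFTTL is also 64. */
#define IPDEFTTL 64

/* Hypothetical, heavily trimmed per-namespace IPv4 state. */
struct netns_ipv4_sketch {
	int sysctl_ip_default_ttl;
};

/* Runs for every namespace regardless of CONFIG_SYSCTL, so the default
 * is always set; sysctl registration would only expose the knob. */
static void inet_init_net_sketch(struct netns_ipv4_sketch *ipv4)
{
	ipv4->sysctl_ip_default_ttl = IPDEFTTL;
}

/* Rough analogue of ip_select_ttl -> ip4_dst_hoplimit: use the route's
 * hop limit if one is set (hypothetical parameter), else the default. */
static uint8_t select_ttl_sketch(const struct netns_ipv4_sketch *ipv4,
				 int route_hoplimit)
{
	int ttl = route_hoplimit ? route_hoplimit : ipv4->sysctl_ip_default_ttl;

	return (uint8_t)ttl;
}
```

      Before the init runs, the zero-initialized field yields a TTL of 0, which is exactly the symptom described above.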
    • net/atm: sk_err_soft must be positive · c685293a
      Committed by Stefan Hajnoczi
      The sk_err and sk_err_soft fields are positive errno values and
      userspace applications rely on this when using getsockopt(SO_ERROR).
      
      ATM code places an -errno into sk_err_soft in sigd_send() and returns it
      from svc_addparty()/svc_dropparty().
      
      Although I am not familiar with the ATM code, I came to this conclusion
      because:
      
      1. sigd_send() msg->type cases as_okay and as_error both have:
      
         sk->sk_err = -msg->reply;
      
         while the as_addparty and as_dropparty cases have:
      
         sk->sk_err_soft = msg->reply;
      
         This is the source of the inconsistency.
      
      2. svc_addparty() returns an -errno and assumes sk_err_soft is also an
         -errno:
      
             if (flags & O_NONBLOCK) {
                 error = -EINPROGRESS;
                 goto out;
             }
             ...
             error = xchg(&sk->sk_err_soft, 0);
         out:
             release_sock(sk);
             return error;
      
         This shows that sk_err_soft is indeed being treated as an -errno.
      
      This patch ensures that sk_err_soft is always a positive errno.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
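      A sketch of the sign convention the patch restores, assuming the signaling daemon's reply is 0 or a negative errno (as the as_okay/as_error cases imply): the field stores a positive errno, and the syscall path negates it again when returning. sock_sketch, sigd_reply_sketch and addparty_result_sketch are hypothetical; C11 atomic_exchange() stands in for the kernel's xchg().

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical socket reduced to its soft-error field. */
struct sock_sketch {
	atomic_int sk_err_soft;	/* convention: positive errno, as SO_ERROR reports */
};

/* sigd_send()-style handler: reply is 0 or a negative errno. */
static void sigd_reply_sketch(struct sock_sketch *sk, int reply)
{
	atomic_store(&sk->sk_err_soft, -reply);	/* store as positive errno */
}

/* svc_addparty()-style tail: consume the soft error, return -errno. */
static int addparty_result_sketch(struct sock_sketch *sk)
{
	return -atomic_exchange(&sk->sk_err_soft, 0);
}
```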
  3. 21 May 2016, 20 commits
  4. 20 May 2016, 5 commits
  5. 18 May 2016, 10 commits
    • batman-adv: initialize ELP orig address on secondary interfaces · ebe24cea
      Committed by Marek Lindner
      This fix prevents nodes from wrongly creating a 00:00:00:00:00:00
      originator, which could interfere with the rest of the neighbor statistics.
      
      Fixes: d6f94d91 ("batman-adv: ELP - adding basic infrastructure")
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Avoid duplicate neigh_node additions · e123705e
      Committed by Linus Lüssing
      Two parallel calls to batadv_neigh_node_new() might race to create
      and add the same neigh_node. Fix this by performing the check for an
      already existing, identical neigh_node while holding the spinlock.
      
      This fixes splats like the following:
      
      [  739.535069] ------------[ cut here ]------------
      [  739.535079] WARNING: CPU: 0 PID: 0 at /usr/src/batman-adv/git/batman-adv/net/batman-adv/bat_iv_ogm.c:1004 batadv_iv_ogm_process_per_outif+0xe3f/0xe60 [batman_adv]()
      [  739.535092] too many matching neigh_nodes
      [  739.535094] Modules linked in: dm_mod tun ip6table_filter ip6table_mangle ip6table_nat nf_nat_ipv6 ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TCPMSS xt_mark iptable_mangle xt_tcpudp xt_conntrack iptable_filter ip_tables x_tables ip_gre ip_tunnel gre bridge stp llc thermal_sys kvm_intel kvm crct10dif_pclmul crc32_pclmul sha256_ssse3 sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev pcspkr ip6_gre ip6_tunnel tunnel6 batman_adv(O) libcrc32c nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 ext4 crc16 mbcache jbd2 xen_netfront xen_blkfront crc32c_intel
      [  739.535177] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O    4.2.0-0.bpo.1-amd64 #1 Debian 4.2.6-3~bpo8+2
      [  739.535186]  0000000000000000 ffffffffa013b050 ffffffff81554521 ffff88007d003c18
      [  739.535201]  ffffffff8106fa01 0000000000000000 ffff8800047a087a ffff880079c3a000
      [  739.735602]  ffff88007b82bf40 ffff88007bc2d1c0 ffffffff8106fa7a ffffffffa013aa8e
      [  739.735624] Call Trace:
      [  739.735639]  <IRQ>  [<ffffffff81554521>] ? dump_stack+0x40/0x50
      [  739.735677]  [<ffffffff8106fa01>] ? warn_slowpath_common+0x81/0xb0
      [  739.735692]  [<ffffffff8106fa7a>] ? warn_slowpath_fmt+0x4a/0x50
      [  739.735715]  [<ffffffffa012448f>] ? batadv_iv_ogm_process_per_outif+0xe3f/0xe60 [batman_adv]
      [  739.735740]  [<ffffffffa0124813>] ? batadv_iv_ogm_receive+0x363/0x380 [batman_adv]
      [  739.735762]  [<ffffffffa0124813>] ? batadv_iv_ogm_receive+0x363/0x380 [batman_adv]
      [  739.735783]  [<ffffffff810b0841>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
      [  739.735804]  [<ffffffffa012cb39>] ? batadv_batman_skb_recv+0xc9/0x110 [batman_adv]
      [  739.735825]  [<ffffffff81464891>] ? __netif_receive_skb_core+0x841/0x9a0
      [  739.735838]  [<ffffffff810b0841>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
      [  739.735853]  [<ffffffff81465681>] ? process_backlog+0xa1/0x140
      [  739.735864]  [<ffffffff81464f1a>] ? net_rx_action+0x20a/0x320
      [  739.735878]  [<ffffffff81073aa7>] ? __do_softirq+0x107/0x270
      [  739.735891]  [<ffffffff81073d82>] ? irq_exit+0x92/0xa0
      [  739.735905]  [<ffffffff8137e0d1>] ? xen_evtchn_do_upcall+0x31/0x40
      [  739.735924]  [<ffffffff8155b8fe>] ? xen_do_hypervisor_callback+0x1e/0x40
      [  739.735939]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
      [  739.735965]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
      [  739.735979]  [<ffffffff8100a39c>] ? xen_safe_halt+0xc/0x20
      [  739.735991]  [<ffffffff8101da6c>] ? default_idle+0x1c/0xa0
      [  739.736004]  [<ffffffff810abf6b>] ? cpu_startup_entry+0x2eb/0x350
      [  739.736019]  [<ffffffff81b2af5e>] ? start_kernel+0x480/0x48b
      [  739.736032]  [<ffffffff81b2d116>] ? xen_start_kernel+0x507/0x511
      [  739.736048] ---[ end trace c106bb901244bc8c ]---
      
      Fixes: f987ed6e ("batman-adv: protect neighbor list with rcu locks")
      Reported-by: Martin Weinelt <martin@darmstadt.freifunk.net>
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
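      A sketch of the check-then-insert pattern under one lock, assuming a singly linked list keyed by MAC address and a C11 atomic_flag spinlock in place of the kernel's spinlock. All names are hypothetical. The new node is allocated before taking the lock and freed again if the locked lookup finds that another caller already added an identical entry.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical neighbor entry keyed by MAC address. */
struct neigh_sketch {
	unsigned char addr[ETH_ALEN];
	struct neigh_sketch *next;
};

static struct neigh_sketch *neigh_list;
static atomic_flag neigh_list_lock = ATOMIC_FLAG_INIT;

static void neigh_lock(void)
{
	while (atomic_flag_test_and_set(&neigh_list_lock))
		;	/* spin until the holder clears the flag */
}

static void neigh_unlock(void)
{
	atomic_flag_clear(&neigh_list_lock);
}

/* Caller must hold neigh_list_lock. */
static struct neigh_sketch *neigh_find_locked(const unsigned char *addr)
{
	for (struct neigh_sketch *n = neigh_list; n; n = n->next)
		if (memcmp(n->addr, addr, ETH_ALEN) == 0)
			return n;
	return NULL;
}

/* Lookup and insertion share one critical section, so two racing callers
 * cannot both decide the entry is missing and add it twice. */
static struct neigh_sketch *neigh_get_or_create(const unsigned char *addr)
{
	struct neigh_sketch *fresh = calloc(1, sizeof(*fresh));
	struct neigh_sketch *n;

	neigh_lock();
	n = neigh_find_locked(addr);
	if (!n && fresh) {
		memcpy(fresh->addr, addr, ETH_ALEN);
		fresh->next = neigh_list;
		neigh_list = fresh;
		n = fresh;
		fresh = NULL;	/* ownership moved to the list */
	}
	neigh_unlock();

	free(fresh);	/* non-NULL only if an identical entry already existed */
	return n;
}

/* Single-threaded inspection helper. */
static int neigh_count(void)
{
	int count = 0;

	for (struct neigh_sketch *n = neigh_list; n; n = n->next)
		count++;
	return count;
}
```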
    • batman-adv: Fix integer overflow in batadv_iv_ogm_calc_tq · d285f52c
      Committed by Sven Eckelmann
      The undefined behavior sanitizer detected a signed integer overflow in a
      setup with near-perfect link quality:
      
          UBSAN: Undefined behaviour in net/batman-adv/bat_iv_ogm.c:1246:25
          signed integer overflow:
          8713350 * 255 cannot be represented in type 'int'
      
      The problem happens because the calculation mixing unsigned and signed
      integers resulted in a signed int multiplication.
      
            batadv_ogm_packet::tq (u8 255)
          * tq_own (u8 255)
          * tq_asym_penalty (int 134; max 255)
          * tq_iface_penalty (int 255; max 255)
      
      The tq_iface_penalty, tq_asym_penalty and inv_asym_penalty can just be
      changed to unsigned int because they are not expected to become negative.
      
      Fixes: c0398768 ("batman-adv: add WiFi penalty")
      Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
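      A sketch of the arithmetic, assuming the combined TQ is the product of the four factors listed above divided by 255^3 (the division is an assumption, not taken from the message). With the penalties declared unsigned int, the product is computed in unsigned arithmetic: 255 * 255 * 134 * 255 = 2221904250 exceeds INT_MAX (2147483647) but fits in a 32-bit unsigned int, as does the worst case 255^4 = 4228250625. combine_tq and TQ_MAX_VALUE are hypothetical names.

```c
#include <stdint.h>

#define TQ_MAX_VALUE 255u

/* pkt_tq * tq_own promotes to int but is at most 65025; multiplying by an
 * unsigned int then converts the rest of the product to unsigned, which
 * holds up to 255^4 on a 32-bit unsigned int. */
static unsigned int combine_tq(uint8_t pkt_tq, uint8_t tq_own,
			       unsigned int tq_asym_penalty,
			       unsigned int tq_iface_penalty)
{
	unsigned int combined = pkt_tq * tq_own * tq_asym_penalty * tq_iface_penalty;

	/* scale back into the 0..255 range */
	return combined / (TQ_MAX_VALUE * TQ_MAX_VALUE * TQ_MAX_VALUE);
}
```

      With the values from the UBSAN report, combine_tq(255, 255, 134, 255) returns 134 without any signed overflow.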
    • batman-adv: make sure ELP/OGM orig MAC is updated on address change · 1653f61d
      Committed by Antonio Quartulli
      When the MAC address of the primary interface is changed,
      update the originator address in the ELP and OGM skb buffers as
      well in order to reflect the change.
      
      Fixes: d6f94d91 ("batman-adv: ELP - adding basic infrastructure")
      Reported-by: Marek Lindner <marek@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
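      A minimal sketch of the update, assuming each interface keeps prebuilt ELP and OGM buffers whose headers carry an originator address field. The structs and update_orig_addr are hypothetical; the point is only that cached packet buffers must be rewritten when the primary MAC changes, or they keep advertising the old address.

```c
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical packet headers that carry an originator address. */
struct elp_hdr_sketch { unsigned char orig[ETH_ALEN]; };
struct ogm_hdr_sketch { unsigned char orig[ETH_ALEN]; };

/* Hypothetical per-interface state holding prebuilt ELP/OGM buffers. */
struct hardif_sketch {
	struct elp_hdr_sketch elp;
	struct ogm_hdr_sketch ogm;
};

/* On a primary MAC change, rewrite both cached buffers so that future
 * transmissions advertise the new originator address. */
static void update_orig_addr(struct hardif_sketch *hif,
			     const unsigned char new_mac[ETH_ALEN])
{
	memcpy(hif->elp.orig, new_mac, ETH_ALEN);
	memcpy(hif->ogm.orig, new_mac, ETH_ALEN);
}
```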
    • batman-adv: Fix unexpected free of bcast_own on add_if error · f7dcdf5f
      Committed by Sven Eckelmann
      The function batadv_iv_ogm_orig_add_if allocates new buffers for bcast_own
      and bcast_own_sum. It is expected that these buffers are unchanged in case
      either bcast_own or bcast_own_sum couldn't be resized.
      
      But the error handling of this function frees the already resized buffer
      for bcast_own when the allocation of the new bcast_own_sum buffer failed.
      This will lead to an invalid memory access when some code later tries to
      access bcast_own.
      
      Instead the resized new bcast_own buffer has to be kept. This will not lead
      to problems because the size of the buffer was only increased and therefore
      no user of the buffer will try to access bytes outside of the new buffer.
      
      Fixes: d0015fdd ("batman-adv: provide orig_node routing API")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
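      A simplified sketch of the corrected error handling, assuming one-dimensional buffers with one slot per interface (the real bcast_own layout is more involved). Once the grown bcast_own has replaced the old one, a later failure to grow bcast_own_sum returns -ENOMEM without freeing it. orig_sketch, orig_add_if_sketch and the fail_sum_alloc switch (used only to simulate the failure) are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-originator buffers, one slot per interface. */
struct orig_sketch {
	unsigned long *bcast_own;
	unsigned char *bcast_own_sum;
};

/* Hypothetical switch to simulate an allocation failure. */
static int fail_sum_alloc;

static int orig_add_if_sketch(struct orig_sketch *orig, size_t old_ifs)
{
	size_t new_ifs = old_ifs + 1;
	unsigned long *own = calloc(new_ifs, sizeof(*own));
	unsigned char *sum;

	if (!own)
		return -ENOMEM;	/* nothing has changed yet */
	if (old_ifs)
		memcpy(own, orig->bcast_own, old_ifs * sizeof(*own));
	free(orig->bcast_own);
	orig->bcast_own = own;	/* the larger buffer is now live */

	sum = fail_sum_alloc ? NULL : calloc(new_ifs, sizeof(*sum));
	if (!sum)
		return -ENOMEM;	/* keep the grown bcast_own: do NOT free it */
	if (old_ifs)
		memcpy(sum, orig->bcast_own_sum, old_ifs * sizeof(*sum));
	free(orig->bcast_own_sum);
	orig->bcast_own_sum = sum;
	return 0;
}
```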
    • batman-adv: Fix refcnt leak in batadv_v_neigh_* · 71f9d27d
      Committed by Sven Eckelmann
      The function batadv_neigh_ifinfo_get increases the reference counter of
      the batadv_neigh_ifinfo. This reference has to be dropped again when it
      is no longer used, so that the objects are freed correctly.
      
      Fixes: 97869060 ("batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
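      A sketch of balanced get/put, assuming a plain integer refcount instead of the kernel's kref. Every reference obtained through the hypothetical ifinfo_get() is dropped on every return path, including the early exit. ifinfo_sketch, ifinfo_get, ifinfo_put and neigh_cmp_sketch are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for batadv_neigh_ifinfo. */
struct ifinfo_sketch {
	int refcount;
	unsigned int throughput;
};

static struct ifinfo_sketch *ifinfo_get(struct ifinfo_sketch *info)
{
	if (info)
		info->refcount++;	/* each successful get takes a reference */
	return info;
}

static void ifinfo_put(struct ifinfo_sketch *info)
{
	info->refcount--;	/* a real put would free the object at zero */
}

/* Compare two neighbors; both references are released on all paths. */
static int neigh_cmp_sketch(struct ifinfo_sketch *a, struct ifinfo_sketch *b)
{
	struct ifinfo_sketch *ia = ifinfo_get(a);
	struct ifinfo_sketch *ib = ifinfo_get(b);
	int ret = 0;

	if (!ia || !ib)
		goto out;
	ret = (ia->throughput > ib->throughput) - (ia->throughput < ib->throughput);
out:
	if (ia)
		ifinfo_put(ia);
	if (ib)
		ifinfo_put(ib);
	return ret;
}
```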
    • batman-adv: Avoid nullptr dereference in batadv_v_neigh_is_sob · a45e932a
      Committed by Sven Eckelmann
      batadv_neigh_ifinfo_get can return NULL when it can no longer find (even
      if only temporarily) the neigh_ifinfo in the list neigh->ifinfo_list. This
      has to be checked to avoid kernel Oopses when the ifinfo is dereferenced.
      
      This is a situation which isn't expected, but it is already handled by
      functions like batadv_v_neigh_cmp. The same kind of warning is therefore
      emitted before the function returns without dereferencing the pointers.
      
      Fixes: 97869060 ("batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
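      A sketch of the defensive check, assuming a trimmed ifinfo struct with only a throughput field. If either lookup result is NULL, the function prints a warning (standing in for the kernel's WARN_ON) and returns before dereferencing anything. The "similar or better" rule used here, more than the first throughput minus a quarter of it, is a hypothetical threshold for illustration; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for batadv_neigh_ifinfo. */
struct ifinfo_sob_sketch {
	unsigned int throughput;
};

static bool neigh_is_sob_sketch(const struct ifinfo_sob_sketch *i1,
				const struct ifinfo_sob_sketch *i2)
{
	unsigned int threshold;

	if (!i1 || !i2) {
		/* like WARN_ON(): report, then bail out without dereferencing */
		fprintf(stderr, "warning: missing neigh_ifinfo\n");
		return false;
	}

	threshold = i1->throughput - i1->throughput / 4;
	return i2->throughput > threshold;
}
```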
    • batman-adv: fix skb deref after free · 63d443ef
      Committed by Florian Westphal
      batadv_send_skb_to_orig() calls dev_queue_xmit(), which consumes the skb,
      so we can't use skb->len afterwards.
      
      Fixes: 95332477 ("batman-adv: network coding - buffer unicast packets before forward")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
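      A user-space sketch of the rule behind this fix, assuming a hypothetical pkt_sketch buffer and a hypothetical xmit_consume() that, like dev_queue_xmit(), takes ownership of the buffer and frees it: any field needed after the hand-off, such as the length, has to be read before it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct pkt_sketch {
	size_t len;
	unsigned char *data;
};

static size_t bytes_sent;

/* Takes ownership: the caller must not touch pkt after this returns. */
static int xmit_consume(struct pkt_sketch *pkt)
{
	bytes_sent += pkt->len;
	free(pkt->data);
	free(pkt);
	return 0;
}

/* Cache what is still needed BEFORE handing the buffer off. */
static size_t forward_and_account(struct pkt_sketch *pkt)
{
	size_t len = pkt->len;	/* read while we still own pkt */

	xmit_consume(pkt);	/* pkt is freed here */
	return len;		/* safe: uses the cached copy */
}
```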
    • switchdev: pass pointer to fib_info instead of copy · da4ed551
      Committed by Jiri Pirko
      The problem is that fib_info->nh is a zero-length array ([0]), so the
      size of a struct fib_info allocation depends on the number of nexthops.
      If we just copy fib_info, we do not copy the nexthop info, and the driver
      accesses memory that is not ours.
      
      Given that fib4 does not defer operations and therefore does not need a
      copy, just pass the pointer down to drivers as was done before.
      
      Fixes: 850d0cbc ("switchdev: remove pointers from switchdev objects")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
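      A sketch of why a by-value copy loses the nexthops, assuming a hypothetical fib_info_sketch that ends in a C99 flexible array member (the kernel uses the older [0] form). Assigning *fi to a local copies only sizeof(struct fib_info_sketch) bytes, none of the trailing nexthops, so the driver must receive a pointer to the whole allocation. fib_info_alloc and driver_sum_ifindex are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical nexthop entry. */
struct nh_sketch {
	int ifindex;
};

/* Header followed by a variable number of nexthops in one allocation. */
struct fib_info_sketch {
	int nhs;
	struct nh_sketch nh[];	/* flexible array member, like nh[0] */
};

static struct fib_info_sketch *fib_info_alloc(int nhs)
{
	struct fib_info_sketch *fi = malloc(sizeof(*fi) + nhs * sizeof(fi->nh[0]));

	if (fi) {
		fi->nhs = nhs;
		for (int i = 0; i < nhs; i++)
			fi->nh[i].ifindex = 10 + i;
	}
	return fi;
}

/* Driver-side consumer: works only with a pointer to the full allocation.
 * A copy like "struct fib_info_sketch copy = *fi;" would carry no nexthops. */
static int driver_sum_ifindex(const struct fib_info_sketch *fi)
{
	int sum = 0;

	for (int i = 0; i < fi->nhs; i++)
		sum += fi->nh[i].ifindex;
	return sum;
}
```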
    • net_sched: close another race condition in tcf_mirred_release() · dc327f89
      Committed by WANG Cong
      We saw the following extra refcount release on a veth device:
      
        kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1
      
      Since we heavily use mirred action to redirect packets to veth, I think
      this is caused by the following race condition:
      
      CPU0:
      tcf_mirred_release(): (in RCU callback)
      	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);
      
      CPU1:
      mirred_device_event():
              spin_lock_bh(&mirred_list_lock);
              list_for_each_entry(m, &mirred_list, tcfm_list) {
                      if (rcu_access_pointer(m->tcfm_dev) == dev) {
                              dev_put(dev);
                              /* Note : no rcu grace period necessary, as
                               * net_device are already rcu protected.
                               */
                              RCU_INIT_POINTER(m->tcfm_dev, NULL);
                      }
              }
              spin_unlock_bh(&mirred_list_lock);
      
      CPU0:
      tcf_mirred_release():
              spin_lock_bh(&mirred_list_lock);
              list_del(&m->tcfm_list);
              spin_unlock_bh(&mirred_list_lock);
              if (dev)               // <======== Still refers to the old m->tcfm_dev
                      dev_put(dev);  // <======== dev_put() is called on it again
      
      The action init code path is good because it is impossible to modify
      an action that is being removed.
      
      So, fix this by moving everything under the spinlock.
      
      Fixes: 2ee22a90 ("net_sched: act_mirred: remove spinlock in fast path")
      Fixes: 6bd00b85 ("act_mirred: fix a race condition on mirred_list")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
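      A single-threaded sketch of the fixed ordering, assuming a plain integer device refcount and a C11 atomic_flag spinlock standing in for mirred_list_lock. Because the release path reads m->dev under the same lock as the device-event path, it sees NULL once the event has already dropped the reference, so the count ends at 0 instead of -1. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted device and mirred action. */
struct dev_sketch { int refcnt; };
struct mirred_sketch { struct dev_sketch *dev; int on_list; };

static atomic_flag mirred_list_lock = ATOMIC_FLAG_INIT;

static void mirred_lock(void)
{
	while (atomic_flag_test_and_set(&mirred_list_lock))
		;	/* spin */
}

static void mirred_unlock(void)
{
	atomic_flag_clear(&mirred_list_lock);
}

static void dev_put_sketch(struct dev_sketch *dev)
{
	dev->refcnt--;
}

/* Device-event path: drop the action's reference and clear the pointer. */
static void device_event_sketch(struct mirred_sketch *m, struct dev_sketch *dev)
{
	mirred_lock();
	if (m->on_list && m->dev == dev) {
		dev_put_sketch(dev);
		m->dev = NULL;
	}
	mirred_unlock();
}

/* Fixed release path: read, clear and put the pointer under the lock,
 * so it can never be a stale copy taken before the event cleared it. */
static void mirred_release_sketch(struct mirred_sketch *m)
{
	struct dev_sketch *dev;

	mirred_lock();
	m->on_list = 0;	/* list_del() */
	dev = m->dev;
	m->dev = NULL;
	if (dev)
		dev_put_sketch(dev);
	mirred_unlock();
}
```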