1. 16 6月, 2016 5 次提交
  2. 11 6月, 2016 2 次提交
  3. 10 6月, 2016 2 次提交
  4. 09 6月, 2016 8 次提交
    • M
      mac80211: implement codel on fair queuing flows · 5caa328e
      Michal Kazior 提交于
      There is no other limit other than a global
      packet count limit when using software queuing.
      This means a single flow queue can grow insanely
      long. This is particularly bad for TCP congestion
      algorithms which requires a little more
      sophisticated frame dropping scheme than a mere
      headdrop on limit overflow.
      
      Hence apply (a slighly modified, to fit the knobs)
      CoDel5 on flow queues. This improves TCP
      convergence and stability when combined with
      wireless driver which keeps its own tx queue/fifo
      at a minimum fill level for given link conditions.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      5caa328e
    • M
      mac80211: skip netdev queue control with software queuing · 80a83cfc
      Michal Kazior 提交于
      Qdiscs are designed with no regard to 802.11
      aggregation requirements and hand out
      packet-by-packet with no guarantee they are
      destined to the same tid. This does more bad than
      good no matter how fairly a given qdisc may behave
      on an ethernet interface.
      
      Software queuing used per-AC netdev subqueue
      congestion control whenever a global AC limit was
      hit. This meant in practice a single station or
      tid queue could starve others rather easily. This
      could resonate with qdiscs in a bad way or could
      just end up with poor aggregation performance.
      Increasing the AC limit would increase induced
      latency which is also bad.
      
      Disabling qdiscs by default and performing
      taildrop instead of netdev subqueue congestion
      control on the other hand makes it possible for
      tid queues to fill up "in the meantime" while
      preventing stations starving each other.
      
      This increases aggregation opportunities and
      should allow software queuing based drivers
      achieve better performance by utilizing airtime
      more efficiently with big aggregates.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      80a83cfc
    • F
      sched: place state, next_sched and gso_skb in same cacheline again · c8945043
      Florian Westphal 提交于
      Earlier commits removed two members from struct Qdisc which places
      next_sched/gso_skb into a different cacheline than ->state.
      
      This restores the struct layout to what it was before the removal.
      Move the two members, then add an annotation so they all reside in the
      same cacheline.
      
      This adds a 16 byte hole after cpu_qstats.
      
      The hole could be closed but as it doesn't decrease total struct size just
      do it this way.
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8945043
    • F
      sched: remove qdisc->drop · a09ceb0e
      Florian Westphal 提交于
      after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
      more callers of ->drop() outside of other ->drop functions, i.e.
      nothing calls them.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a09ceb0e
    • F
      sched: remove qdisc_rehape_fail · c3a173d7
      Florian Westphal 提交于
      After the removal of TCA_CBQ_POLICE in cbq scheduler qdisc->reshape_fail
      is always NULL, i.e. qdisc_rehape_fail is now the same as qdisc_drop.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3a173d7
    • F
      cbq: remove TCA_CBQ_POLICE support · dd47c1fa
      Florian Westphal 提交于
      iproute2 doesn't implement any cbq option that results in this attribute
      being sent to kernel.
      
      To make use of it, user would have to
      
      - patch iproute2
      - add a class
      - attach a qdisc to the class (default pfifo doesn't work as
        q->handle is 0 and cbq_set_police() is a no-op in this case)
      - re-'add' the same class (tc class change ...) again
      - user must also specifiy a defmap (e.g. 'split 1:0 defmap 3f'), since
        this 'police' feature relies on its presence
      - the added qdisc must be one of bfifo, pfifo or netem
      
      If all of these conditions are met and _some_ leaf qdiscs, namely
      p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into
      cbq, which will attempt to re-queue the skb into a different class
      as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT.
      
      [ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].
      
      This feature, which isn't documented or implemented in iproute2,
      and isn't implemented consistently (most qdiscs like sfq, codel, etc
      drop right away instead of attempting this reclassification) is the
      sole reason for the reshape_fail and __parent member in Qdisc struct.
      
      So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP
      so userspace knows we don't support it, and then remove no-longer needed
      infrastructure in followup commit.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd47c1fa
    • D
      net: Add l3mdev rule · 96c63fa7
      David Ahern 提交于
      Currently, VRFs require 1 oif and 1 iif rule per address family per
      VRF. As the number of VRF devices increases it brings scalability
      issues with the increasing rule list. All of the VRF rules have the
      same format with the exception of the specific table id to direct the
      lookup. Since the table id is available from the oif or iif in the
      loopup, the VRF rules can be consolidated to a single rule that pulls
      the table from the VRF device.
      
      This patch introduces a new rule attribute l3mdev. The l3mdev rule
      means the table id used for the lookup is pulled from the L3 master
      device (e.g., VRF) rather than being statically defined. With the
      l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev
      rule per address family (IPv4 and IPv6).
      
      If an admin wishes to insert higher priority rules for specific VRFs
      those rules will co-exist with the l3mdev rule. This capability means
      current VRF scripts will co-exist with this new simpler implementation.
      
      Currently, the rules list for both ipv4 and ipv6 look like this:
          $ ip  ru ls
          1000:       from all oif vrf1 lookup 1001
          1000:       from all iif vrf1 lookup 1001
          1000:       from all oif vrf2 lookup 1002
          1000:       from all iif vrf2 lookup 1002
          1000:       from all oif vrf3 lookup 1003
          1000:       from all iif vrf3 lookup 1003
          1000:       from all oif vrf4 lookup 1004
          1000:       from all iif vrf4 lookup 1004
          1000:       from all oif vrf5 lookup 1005
          1000:       from all iif vrf5 lookup 1005
          1000:       from all oif vrf6 lookup 1006
          1000:       from all iif vrf6 lookup 1006
          1000:       from all oif vrf7 lookup 1007
          1000:       from all iif vrf7 lookup 1007
          1000:       from all oif vrf8 lookup 1008
          1000:       from all iif vrf8 lookup 1008
          ...
          32765:      from all lookup local
          32766:      from all lookup main
          32767:      from all lookup default
      
      With the l3mdev rule the list is just the following regardless of the
      number of VRFs:
          $ ip ru ls
          1000:       from all lookup [l3mdev table]
          32765:      from all lookup local
          32766:      from all lookup main
          32767:      from all lookup default
      
      (Note: the above pretty print of the rule is based on an iproute2
             prototype. Actual verbage may change)
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96c63fa7
    • F
      net: dsa: Initialize CPU port ethtool ops per tree · 0c73c523
      Florian Fainelli 提交于
      Now that we can properly support multiple distinct trees in the system,
      using a global variable: dsa_cpu_port_ethtool_ops is getting clobbered
      as soon as the second switch tree gets probed, and we don't want that.
      
      We need to move this to be dynamically allocated, and since we can't
      really be comparing addresses anymore to determine first time
      initialization versus any other times, just move this to dsa.c and
      dsa2.c where the remainder of the dst/ds initialization happens.
      
      The operations teardown restores the master netdev's ethtool_ops to its
      original ethtool_ops pointer (typically within the Ethernet driver)
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c73c523
  5. 08 6月, 2016 6 次提交
  6. 07 6月, 2016 1 次提交
  7. 06 6月, 2016 1 次提交
    • M
      ipvs: update real-server binding of outgoing connections in SIP-pe · 3ec10d3a
      Marco Angaroni 提交于
      Previous patch that introduced handling of outgoing packets in SIP
      persistent-engine did not call ip_vs_check_template() in case packet was
      matching a connection template. Assumption was that real-server was
      healthy, since it was sending a packet just in that moment.
      
      There are however real-server fault conditions requiring that association
      between call-id and real-server (represented by connection template)
      gets updated. Here is an example of the sequence of events:
        1) RS1 is a back2back user agent that handled call-id1 and call-id2
        2) RS1 is down and was marked as unavailable
        3) new message from outside comes to IPVS with call-id1
        4) IPVS reschedules the message to RS2, which becomes new call handler
        5) RS2 forwards the message outside, translating call-id1 to call-id2
        6) inside pe->conn_out() IPVS matches call-id2 with existing template
        7) IPVS does not change association call-id2 <-> RS1
        8) new message comes from client with call-id2
        9) IPVS reschedules the message to a real-server potentially different
           from RS2, which is now the correct destination
      
      This patch introduces ip_vs_check_template() call in the handling of
      outgoing packets for SIP-pe. And also introduces a second optional
      argument for ip_vs_check_template() that allows to check if dest
      associated to a connection template is the same dest that was identified
      as the source of the packet. This is to change the real-server bound to a
      particular call-id independently from its availability status: the idea
      is that it's more reliable, for in->out direction (where internal
      network can be considered trusted), to always associate a call-id with
      the last real-server that used it in one of its messages. Think about
      above sequence of events where, just after step 5, RS1 returns instead
      to be available.
      
      Comparison of dests is done by simply comparing pointers to struct
      ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its
      memory address, but represent a different real-server in terms of
      ip-address / port.
      
      Fixes: 39b97223 ("ipvs: handle connections started by real-servers")
      Signed-off-by: NMarco Angaroni <marcoangaroni@gmail.com>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      3ec10d3a
  8. 05 6月, 2016 6 次提交
  9. 04 6月, 2016 1 次提交
    • M
      sctp: Add GSO support · 90017acc
      Marcelo Ricardo Leitner 提交于
      SCTP has this pecualiarity that its packets cannot be just segmented to
      (P)MTU. Its chunks must be contained in IP segments, padding respected.
      So we can't just generate a big skb, set gso_size to the fragmentation
      point and deliver it to IP layer.
      
      This patch takes a different approach. SCTP will now build a skb as it
      would be if it was received using GRO. That is, there will be a cover
      skb with protocol headers and children ones containing the actual
      segments, already segmented to a way that respects SCTP RFCs.
      
      With that, we can tell skb_segment() to just split based on frag_list,
      trusting its sizes are already in accordance.
      
      This way SCTP can benefit from GSO and instead of passing several
      packets through the stack, it can pass a single large packet.
      
      v2:
      - Added support for receiving GSO frames, as requested by Dave Miller.
      - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
      - Added heuristics similar to what we have in TCP for not generating
        single GSO packets that fills cwnd.
      v3:
      - consider sctphdr size in skb_gso_transport_seglen()
      - rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
        unsupported GSO")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90017acc
  10. 02 6月, 2016 1 次提交
  11. 31 5月, 2016 2 次提交
    • K
      cfg80211: Advertise extended capabilities per interface type to userspace · 019ae3a9
      Kanchanapally, Vidyullatha 提交于
      The driver extended capabilities may differ for different
      interface types which the userspace needs to know (for
      example the fine timing measurement initiator and responder
      bits might differ for a station and AP). Add a new nl80211
      attribute to provide extended capabilities per interface type
      to userspace.
      Signed-off-by: NVidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
      Reviewed-by: NJouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      019ae3a9
    • J
      cfg80211: Allow cfg80211_connect_result() errors to be distinguished · bf1ecd21
      Jouni Malinen 提交于
      Previously, the status parameter to cfg80211_connect_result() was
      documented as using WLAN_STATUS_UNSPECIFIED_FAILURE (1) when the real
      status code for the failure is not known. This value can be used by an
      AP (and often is) and as such, user space cannot distinguish between
      explicitly rejected authentication/association and not being able to
      even try to associate or not receiving a response from the AP.
      
      Add a new inline function, cfg80211_connect_timeout(), to be used when
      the driver knows that the connection attempt failed due to a reason
      where connection could not be attempt or no response was received from
      the AP. The internal functions now allow a negative status value (-1) to
      be used as an indication of this special case. This results in the
      NL80211_ATTR_TIMED_OUT to be added to the NL80211_CMD_CONNECT event to
      allow user space to determine this case was hit. For backwards
      compatibility, NL80211_STATUS_CODE with the value
      WLAN_STATUS_UNSPECIFIED_FAILURE is still indicated in the event in such
      a case.
      Signed-off-by: NJouni Malinen <jouni@qca.qualcomm.com>
      [johannes: fix cfg80211_connect_bss() prototype to use int for status,
       add cfg80211_connect_timeout() to docbook, fix docbook]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      bf1ecd21
  12. 30 5月, 2016 1 次提交
    • A
      ipv6: hide ip6_encap_hlen/ip6_tnl_encap definitions · 9791d8e7
      Arnd Bergmann 提交于
      A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
      definitions, but it is now invisible when CONFIG_INET is
      not defined, but still referenced from ip6_tunnel.h:
      
      In file included from net/xfrm/xfrm_input.c:17:0:
      include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
         ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
                       ^~~~~~~~~~~~~~~~~~~
      
      This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
      of CONFIG_INET so we don't run into the the problem.
      
      Alternatively we could move the macro out of the #ifdef again to
      restore the previous behavior
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: 55c2bc14 ("net: Cleanup encap items in ip_tunnels.h")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9791d8e7
  13. 25 5月, 2016 2 次提交
    • E
      netfilter: nf_queue: Make the queue_handler pernet · dc3ee32e
      Eric W. Biederman 提交于
      Florian Weber reported:
      > Under full load (unshare() in loop -> OOM conditions) we can
      > get kernel panic:
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      > IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
      > [..]
      > task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
      > RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
      > RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
      > RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
      > [..]
      > Call Trace:
      >  [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
      >  [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
      >  [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
      >  [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
      >  [<ffffffff8141b652>] setup_net+0xc2/0x120
      >  [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
      >  [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
      >  [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
      >  [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
      >  [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
      > Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
      >
      
      The simple fix for this requires a new pernet variable for struct
      nf_queue that indicates when it is safe to use the dynamically
      allocated nf_queue state.
      
      As we need a variable anyway make nf_register_queue_handler and
      nf_unregister_queue_handler pernet.  This allows the existing logic of
      when it is safe to use the state from the nfnetlink_queue module to be
      reused with no changes except for making it per net.
      
      The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
      function nfnl_queue_net_exit_batch so that the worst case of having a
      syncrhonize_rcu in the pernet exit path is not experienced in batch
      mode.
      Reported-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      dc3ee32e
    • E
      net_sched: avoid too many hrtimer_start() calls · a9efad8b
      Eric Dumazet 提交于
      I found a serious performance bug in packet schedulers using hrtimers.
      
      sch_htb and sch_fq are definitely impacted by this problem.
      
      We constantly rearm high resolution timers if some packets are throttled
      in one (or more) class, and other packets are flying through qdisc on
      another (non throttled) class.
      
      hrtimer_start() does not have the mod_timer() trick of doing nothing if
      expires value does not change :
      
      	if (timer_pending(timer) &&
                  timer->expires == expires)
                      return 1;
      
      This issue is particularly visible when multiple cpus can queue/dequeue
      packets on the same qdisc, as hrtimer code has to lock a remote base.
      
      I used following fix :
      
      1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
      it.
      
      2) Cache watchdog prior expiration. hrtimer might provide this, but I
      prefer to not rely on some hrtimer internal.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9efad8b
  14. 21 5月, 2016 2 次提交