  1. 20 Apr 2013 · 2 commits
  2. 19 Apr 2013 · 1 commit
    • tcp: introduce TCPSpuriousRtxHostQueues SNMP counter · 0e280af0
      Committed by Eric Dumazet
      Host queues (Qdisc + NIC) can hold packets so long that TCP can
      eventually retransmit a packet before the first transmit even left
      the host.
      
      It's not clear right now whether we could avoid this in the first place:
      
      - We could arm the RTO timer not when we enqueue packets, but
        when their TX completes (tcp_wfree())
      
      - Cancel the sending of the new copy of the packet if the prior one
        is still in the queue.
      
      This patch adds instrumentation so that we can at least see how
      often this problem happens.
      
      The TCPSpuriousRtxHostQueues SNMP counter is incremented every time
      tcp_transmit_skb() detects that the fast clone is not yet freed.

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Apr 2013 · 6 commits
  4. 17 Apr 2013 · 2 commits
  5. 16 Apr 2013 · 3 commits
  6. 15 Apr 2013 · 1 commit
  7. 13 Apr 2013 · 1 commit
    • tcp: GSO should be TSQ friendly · d6a4a104
      Committed by Eric Dumazet
      I noticed that TSQ (TCP Small Queues) was less effective when TSO is
      turned off and GSO is on. If BQL is not enabled, TSQ then has no
      effect.
      
      It turns out the GSO engine frees the original gso_skb at the time the
      fragments are generated and queued to the NIC.
      
      We should instead call the tcp_wfree() destructor for the last fragment,
      to keep the flow control as intended in TSQ. This effectively limits
      the number of queued packets on qdisc + NIC layers.

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 12 Apr 2013 · 2 commits
    • usbnet: handle link change · 4b49f58f
      Committed by Ming Lei
      The link change is detected via the interrupt pipe, and bulk
      pipes are responsible for transferring packets, so it is reasonable
      to stop bulk transfers after the link is reported as off.
      
      Two advantages may be obtained by stopping bulk transfers
      after the link goes off:
      
      - USB bus bandwidth is saved (USB is a shared bus, except for
      USB 3.0); for example, lots of 'IN' token packets and 'NYET'
      handshake packets are transferred on a 2.0 bus.
      
      - power might probably be saved for the USB host controller, since
      cancelling bulk transfers may disable the asynchronous schedule of
      the host controller.
      
      With this patch, when the link goes off, a performance boost of about
      10% can be seen on bulk transfers of another USB device
      attached to the same bus as the usbnet device; see the
      test below on next-20130410:
      
      - read from USB mass storage (SanDisk Extreme USB 3.0) on a PandaBoard
      with the command below after unplugging the Ethernet cable:
      
      	dd if=/dev/sda iflag=direct of=/dev/null bs=1M count=800
      
      - without the patch
      1, 838860800 bytes (839 MB) copied, 36.2216 s, 23.2 MB/s
      2, 838860800 bytes (839 MB) copied, 35.8368 s, 23.4 MB/s
      3, 838860800 bytes (839 MB) copied, 35.823 s, 23.4 MB/s
      4, 838860800 bytes (839 MB) copied, 35.937 s, 23.3 MB/s
      5, 838860800 bytes (839 MB) copied, 35.7365 s, 23.5 MB/s
      average: 23.36MB/s
      
      - with the patch
      1, 838860800 bytes (839 MB) copied, 32.3817 s, 25.9 MB/s
      2, 838860800 bytes (839 MB) copied, 31.7389 s, 26.4 MB/s
      3, 838860800 bytes (839 MB) copied, 32.438 s, 25.9 MB/s
      4, 838860800 bytes (839 MB) copied, 32.5492 s, 25.8 MB/s
      5, 838860800 bytes (839 MB) copied, 31.6178 s, 26.5 MB/s
      average: 26.1MB/s

      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usbnet: introduce usbnet_link_change API · ac64995d
      Committed by Ming Lei
      This patch introduces the usbnet_link_change API, so that
      usbnet can handle link changes centrally, which may help to
      implement killing traffic URBs to save USB bus bandwidth
      and host controller power.

      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 10 Apr 2013 · 4 commits
  10. 09 Apr 2013 · 3 commits
    • net: ipv6: add tokenized interface identifier support · f53adae4
      Committed by Daniel Borkmann
      This patch adds support for IPv6 tokenized IIDs, which allow
      administrators to assign well-known host-part addresses
      to nodes while still obtaining the global network prefix from
      Router Advertisements. The proposal is currently in draft status.
      
        The primary target for such support is server platforms
        where addresses are usually manually configured, rather
        than using DHCPv6 or SLAAC. By using tokenised identifiers,
        hosts can still determine their network prefix by use of
        SLAAC, but more readily be automatically renumbered should
        their network prefix change. [...]
      
        The disadvantage with static addresses is that they are
        likely to require manual editing should the network prefix
        in use change.  If instead there were a method to only
        manually configure the static identifier part of the IPv6
        address, then the address could be automatically updated
        when a new prefix was introduced, as described in [RFC4192]
        for example.  In such cases a DNS server might be
        configured with such a tokenised interface identifier of
        ::53, and SLAAC would use the token in constructing the
        interface address, using the advertised prefix. [...]
      
        http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
      
      The implementation is partially based on Mark K.
      Thompson's proof of concept. However, it uses the Netlink
      interface for configuration and data retrieval, so that
      it can easily be extended in the future. Successfully tested
      by myself.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ieee802154/nl-mac.c: make some MLME operations optional · 56aa091d
      Committed by Werner Almesberger
      Check for NULL before calling the following operations from "struct
      ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
      and scan_req.
      
      This fixes a current oops where those functions are called but not
      implemented. It also updates the documentation to clarify that they
      are now optional by design. If a call to an unimplemented function
      is attempted, the kernel returns EOPNOTSUPP via netlink.
      
      The following operations are still required: get_phy, get_pan_id,
      get_short_addr, and get_dsn.
      
      Note that the places where this patch changes the initialization
      of "ret" should not affect the rest of the code since "ret" was
      always set (again) before returning its value.

      Signed-off-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IEEE 802.15.4: remove get_bsn from "struct ieee802154_mlme_ops" · d87c8c6d
      Committed by Werner Almesberger
      It served no purpose: we never call it from anywhere in the stack
      and the only driver that did implement it (fakehard) merely provided
      a dummy value.
      
      There is also considerable doubt whether it would make sense to
      even attempt beacon processing at this level in the Linux kernel.

      Signed-off-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 08 Apr 2013 · 2 commits
  12. 06 Apr 2013 · 4 commits
    • netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Committed by Patrick McHardy
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up to the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices
        that encapsulate in IPv4/v6 matter, since otherwise nf_trace is not
        used anymore. It's not entirely clear whether those packets should
        be traced after that; however, we've always done so.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only case
        where we clearly want to reset nf_trace, and it is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.

      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: remove unneeded variable proc_net_netfilter · 12202fa7
      Committed by Pablo Neira Ayuso
      Now that nflog and nfqueue support net namespaces,
      we can remove the global proc_net_netfilter, which has no
      clients anymore.
      
      Based on a patch from Gao feng.

      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Committed by Gao feng
      This patch adds netns support to nf_log and it prepares netns
      support for existing loggers. It is composed of four major
      changes.
      
      1) nf_log_register has been split into two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register the nf_logger, and nf_log_set is used to enable
         per-netns support for nf_loggers.
      
         Per-netns support is not yet complete after this patch; it comes
         in separate follow-up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per-netns support is not
         yet complete after this patch: it only allows binding the
         nf_logger to the protocol family from init_net and skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass netns as parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.

      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: make /proc/net/netfilter pernet · f3c1a44a
      Committed by Gao feng
      This patch makes this proc dentry pernet. So far only init_net
      had a /proc/net/netfilter directory.

      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 05 Apr 2013 · 2 commits
    • net: count hw_addr syncs so that unsync works properly. · 4543fbef
      Committed by Vlad Yasevich
      A few drivers use dev_uc_sync/unsync to synchronize the
      address lists from master down to slave/lower devices.  In
      some cases (bond/team) a single address list is synched down
      to multiple devices.  At the time of unsync, we have a leak
      in these lower devices, because "synced" is treated as a
      boolean and the address will not be unsynced for anything after
      the first device/call.
      
      Treat "synced" as a count (same as refcount) and allow all
      unsync calls to work.

      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: frag queue per hash bucket locking · 19952cc4
      Committed by Jesper Dangaard Brouer
      This patch implements per hash bucket locking for the frag queue
      hash.  This removes two write locks, and the only remaining write
      lock is for protecting hash rebuild.  This essentially reduces the
      readers-writer lock to a rebuild lock.
      
      This patch is part of "net: frag performance followup"
       http://thread.gmane.org/gmane.linux.network/263644
      of which two patches have already been accepted:
      
      Same test setup as previous:
       (http://thread.gmane.org/gmane.linux.network/257155)
       Two 10G interfaces, on separate NUMA nodes, are under test and use
       Ethernet flow control.  A third interface is used for generating the
       DoS attack (with trafgen).
      
      Note that I have changed the frag DoS generator script to be more
      efficient/deadly.  Before, it would only hit one RX queue; now it's
      sending packets that cause multi-queue RX, due to "better" RX hashing.
      
      Test types summary (netperf UDP_STREAM):
       Test-20G64K     == 2x10G with 65K fragments
       Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
       Test-20G64K+DoS == Same as 20G64K with frag DoS
       Test-20G3F+DoS  == Same as 20G3F  with frag DoS
       Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
       Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS
      
      When I rebased this patch (03) on top of net-next commit a210576c and
      removed the _bh spinlock, I saw a performance regression.  BUT this
      was caused by some unrelated change in between.  See the tests below.
      
      Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
      Test (B) is a verifying retest of commit 1b5ab0de, corresponding to patch-02.
      Test (C) is what I reported before for this patch.
      
      Test (D) is net-next master HEAD (commit a210576c), which reveals some
      (unknown) performance regression (compared against test (B)).
      Test (D) functions as the new base test.
      
      Performance table summary (in Mbit/s):
      
      (#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
          ----------  -------   -------  ----------  ---------  --------  -------
      (A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
      (B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
      (C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6
      
      (D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
      (E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
      (F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3
      
      Tests (E) and (F) are this patch (03), with (V1) and without (V2) the _bh spinlocks.
      
      I cannot explain the slowdown for 20G64K (but it's an artificial
      "lab test", so I'm not worried).  But the other results do show
      improvements, and the test (E) "with _bh" version is slightly better.

      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      
      ----
      V2:
      - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
        need the spinlock _bh versions, as Netfilter currently does a
        local_bh_disable() before entering inet_fragment.
      - Fold-in desc from cover-mail
      V3:
      - Drop the chain_len counter per hash bucket.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 03 Apr 2013 · 2 commits
  15. 02 Apr 2013 · 5 commits
    • 152b0f5d
    • PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset · 5faaa035
      Committed by Rajagopal Venkat
      Fix compiler warnings generated when devfreq is not enabled
      (CONFIG_PM_DEVFREQ is not set).

      Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
      Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • netfilter: xt_NFQUEUE: introduce CPU fanout · 8746ddcf
      Committed by holger@eitzenberger.org
      The current NFQUEUE target uses a hash, computed over source and
      destination addresses (and other parameters), for steering the packet
      to the actual NFQUEUE. This, however, ignores the fact that the
      packet is eventually handled by a particular CPU on user request.
      
      If, e.g.,
      
        1) IRQ affinity is used to handle packets on a particular CPU already
           (both single-queue or multi-queue case)
      
      and/or
      
        2) RPS is used to steer packets to a specific softirq
      
      the target easily chooses an NFQUEUE which is not handled by a process
      pinned to the same CPU.
      
      The idea is therefore to use the CPU index for determining the
      NFQUEUE handling the packet.
      
      E.g., on a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs, it
      looks like this:
      
       +-----+  +-----+  +-----+  +-----+
       |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
       +-----+  +-----+  +-----+  +-----+
          ^        ^        ^        ^
          |        |NFQUEUE |        |
          +        +        +        +
       +-----+  +-----+  +-----+  +-----+
       |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
       +-----+  +-----+  +-----+  +-----+
      
      The NFQUEUEs do not necessarily have to start with number 0, and setups
      with fewer NFQUEUEs than packet-handling CPUs are not a problem either.
      
      This patch extends the NFQUEUE target to accept a new
      NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
      CPU index for determining the NFQUEUE being used. I have to introduce
      rev3 for this. The 'flags' are folded into _v2 'bypass'.
      
      By changing the way the queue is assigned, I'm able to improve
      performance if the processes reading from the NFQUEUEs are pinned
      correctly.

      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: convert services to rcu · ceec4c38
      Committed by Julian Anastasov
      This is the final step in RCU conversion.
      
      Things that are removed:
      
      - svc->usecnt: now svc is accessed under RCU read lock
      - svc->inc: and some unused code
      - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
      - __ip_vs_svc_lock: replaced with RCU
      - IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
      	RCU and work in parallel with configuration
      
      Other changes:
      
      - previously, the RCU read-side critical section covered the
      call to the schedule method; now it is extended to also include
      the service lookup
      - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
      - svc->pe and svc->scheduler remain to the end (of grace period),
      	the schedulers are prepared for such RCU readers
      	even after done_service is called but they need
      	to use synchronize_rcu because last ip_vs_scheduler_put
      	can happen while RCU read-side critical sections
      	use an outdated svc->scheduler pointer
      - as planned, update_service is removed
      - empty services can be freed immediately after grace period.
      	If dests were present, the services are freed from
      	the dest trash code

      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: convert dests to rcu · 413c2d04
      Committed by Julian Anastasov
      In previous commits the schedulers started to access
      svc->destinations with _rcu list traversal primitives
      because the IP_VS_WAIT_WHILE macro still plays the role of
      a grace period. Now it is time to finish the updating part,
      i.e. adding and deleting dests with the _rcu primitives before
      removing IP_VS_WAIT_WHILE in the next commit.
      
      We use the same rule for conns as for the
      schedulers: dests can be searched in RCU read-side critical
      section where ip_vs_dest_hold can be called by ip_vs_bind_dest.
      
      Some things are not perfect, for example, calling
      functions like ip_vs_lookup_dest from updating code under
      RCU, just because we use some function both from reader
      and from updater.

      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>