1. 16 Apr, 2013 (2 commits)
  2. 15 Apr, 2013 (1 commit)
  3. 13 Apr, 2013 (1 commit)
    • tcp: GSO should be TSQ friendly · d6a4a104
      Committed by Eric Dumazet
      I noticed that TSQ (TCP Small Queues) was less effective when TSO is
      turned off and GSO is on. If BQL is not enabled, TSQ then has no
      effect.
      
      It turns out the GSO engine frees the original gso_skb at the time the
      fragments are generated and queued to the NIC.
      
      We should instead call the tcp_wfree() destructor for the last fragment,
      to keep the flow control as intended in TSQ. This effectively limits
      the number of queued packets on qdisc + NIC layers.
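      
      A minimal standalone C sketch of the destructor handoff (the types
      and names are illustrative, not the kernel's): the completion
      callback moves to the last segment, so the flow-control budget is
      released only when the final fragment is consumed.
      
      	#include <stdio.h>
      
      	struct pkt {
      		int len;
      		void (*destructor)(struct pkt *);	/* tcp_wfree() analogue */
      	};
      
      	static void wfree(struct pkt *p)
      	{
      		printf("flow-control budget released (len=%d)\n", p->len);
      	}
      
      	/* Split orig into n segments; hand the destructor to the LAST one. */
      	static void segment(struct pkt *orig, struct pkt *segs, int n)
      	{
      		for (int i = 0; i < n; i++) {
      			segs[i].len = orig->len / n;
      			segs[i].destructor = NULL;
      		}
      		segs[n - 1].destructor = orig->destructor;
      		orig->destructor = NULL;	/* original is about to be freed */
      	}
      
      	int main(void)
      	{
      		struct pkt orig = { 64000, wfree }, segs[4];
      
      		segment(&orig, segs, 4);
      		for (int i = 0; i < 4; i++)	/* NIC consumes the fragments */
      			if (segs[i].destructor)
      				segs[i].destructor(&segs[i]);
      		return 0;
      	}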
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6a4a104
  4. 12 Apr, 2013 (2 commits)
    • usbnet: handle link change · 4b49f58f
      Committed by Ming Lei
      The link change is detected via the interrupt pipe, and the bulk
      pipes are responsible for transferring packets, so it is reasonable
      to stop bulk transfers once the link is reported as down.
      
      Two advantages may be obtained by stopping bulk transfers
      after the link goes down:
      
      - USB bus bandwidth is saved (the USB bus is a shared bus, except
      for USB 3.0); for example, many 'IN' token packets and 'NYET'
      handshake packets are transferred on a 2.0 bus.
      
      - power might be saved for the USB host controller, since
      cancelling bulk transfers may disable the asynchronous schedule of
      the host controller.
      
      With this patch, when the link goes down, a roughly 10% performance
      boost can be seen on bulk transfers of another USB device attached
      to the same bus as the usbnet device; see the test below on
      next-20130410:
      
      - read from USB mass storage (SanDisk Extreme USB 3.0) on a
      PandaBoard with the command below, after unplugging the Ethernet
      cable:
      
      	dd if=/dev/sda iflag=direct of=/dev/null bs=1M count=800
      
      - without the patch
      1, 838860800 bytes (839 MB) copied, 36.2216 s, 23.2 MB/s
      2, 838860800 bytes (839 MB) copied, 35.8368 s, 23.4 MB/s
      3, 838860800 bytes (839 MB) copied, 35.823 s, 23.4 MB/s
      4, 838860800 bytes (839 MB) copied, 35.937 s, 23.3 MB/s
      5, 838860800 bytes (839 MB) copied, 35.7365 s, 23.5 MB/s
      average: 23.4MB/s
      
      - with the patch
      1, 838860800 bytes (839 MB) copied, 32.3817 s, 25.9 MB/s
      2, 838860800 bytes (839 MB) copied, 31.7389 s, 26.4 MB/s
      3, 838860800 bytes (839 MB) copied, 32.438 s, 25.9 MB/s
      4, 838860800 bytes (839 MB) copied, 32.5492 s, 25.8 MB/s
      5, 838860800 bytes (839 MB) copied, 31.6178 s, 26.5 MB/s
      average: 26.1MB/s
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b49f58f
    • usbnet: introduce usbnet_link_change API · ac64995d
      Committed by Ming Lei
      This patch introduces the usbnet_link_change API, so that
      usbnet can handle link changes centrally; this in turn helps
      implement killing traffic URBs to save USB bus bandwidth
      and host-controller power.
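      
      A sketch of how a minidriver's status callback might use the helper
      (kernel-context fragment, not a standalone program; the foo_* names
      are hypothetical, while usbnet_link_change(dev, link, need_reset) is
      the helper introduced here):
      
      	static void foo_status(struct usbnet *dev, struct urb *urb)
      	{
      		struct foo_event *ev = urb->transfer_buffer;
      		bool link = !!(ev->flags & FOO_LINK_UP);
      
      		/* Let the usbnet core update carrier state centrally, so
      		 * traffic URBs can be killed/resumed in one place. */
      		usbnet_link_change(dev, link, false);
      	}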
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac64995d
  5. 10 Apr, 2013 (4 commits)
  6. 09 Apr, 2013 (3 commits)
    • net: ipv6: add tokenized interface identifier support · f53adae4
      Committed by Daniel Borkmann
      This patch adds support for IPv6 tokenized IIDs, which allow
      administrators to assign well-known host-part addresses to
      nodes whilst still obtaining the global network prefix from
      Router Advertisements. The mechanism is currently in draft status.
      
        The primary target for such support is server platforms
        where addresses are usually manually configured, rather
        than using DHCPv6 or SLAAC. By using tokenised identifiers,
        hosts can still determine their network prefix by use of
        SLAAC, but more readily be automatically renumbered should
        their network prefix change. [...]
      
        The disadvantage with static addresses is that they are
        likely to require manual editing should the network prefix
        in use change.  If instead there were a method to only
        manually configure the static identifier part of the IPv6
        address, then the address could be automatically updated
        when a new prefix was introduced, as described in [RFC4192]
        for example.  In such cases a DNS server might be
        configured with such a tokenised interface identifier of
        ::53, and SLAAC would use the token in constructing the
        interface address, using the advertised prefix. [...]
      
        http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
      
      The implementation is partially based on Mark K. Thompson's
      proof of concept. However, it uses the netlink interface for
      configuration and data retrieval, so that it can easily be
      extended in the future. Successfully tested by myself.
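      
      To make the construction concrete, here is a standalone userspace C
      illustration (not kernel code) of how SLAAC combines the advertised
      /64 prefix with a configured token of ::53:
      
      	#include <stdio.h>
      	#include <string.h>
      	#include <arpa/inet.h>
      
      	int main(void)
      	{
      		struct in6_addr prefix, token, addr;
      		char buf[INET6_ADDRSTRLEN];
      
      		inet_pton(AF_INET6, "2001:db8:1::", &prefix);	/* from RA */
      		inet_pton(AF_INET6, "::53", &token);		/* configured */
      
      		memcpy(&addr, &prefix, 8);			/* upper 64 bits */
      		memcpy((char *)&addr + 8, (char *)&token + 8, 8); /* lower 64 */
      
      		printf("%s\n", inet_ntop(AF_INET6, &addr, buf, sizeof(buf)));
      		return 0;					/* 2001:db8:1::53 */
      	}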
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f53adae4
    • ieee802154/nl-mac.c: make some MLME operations optional · 56aa091d
      Committed by Werner Almesberger
      Check for NULL before calling the following operations from "struct
      ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
      and scan_req.
      
      This fixes a current oops where those functions are called but not
      implemented. It also updates the documentation to clarify that they
      are now optional by design. If a call to an unimplemented function
      is attempted, the kernel returns EOPNOTSUPP via netlink.
      
      The following operations are still required: get_phy, get_pan_id,
      get_short_addr, and get_dsn.
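      
      The NULL-check pattern, modelled in standalone C (the struct is a
      simplified stand-in for the kernel's ieee802154_mlme_ops, not its
      exact definition):
      
      	#include <errno.h>
      	#include <stdio.h>
      
      	struct mlme_ops {
      		int (*assoc_req)(void *dev);	/* optional */
      		int (*get_pan_id)(void *dev);	/* still required */
      	};
      
      	static int do_assoc_req(const struct mlme_ops *ops, void *dev)
      	{
      		if (!ops->assoc_req)
      			return -EOPNOTSUPP;	/* reported via netlink */
      		return ops->assoc_req(dev);
      	}
      
      	int main(void)
      	{
      		struct mlme_ops ops = { NULL, NULL };
      
      		printf("assoc_req: %d\n", do_assoc_req(&ops, NULL));
      		return 0;
      	}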
      
      Note that the places where this patch changes the initialization
      of "ret" should not affect the rest of the code since "ret" was
      always set (again) before returning its value.
      Signed-off-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56aa091d
    • IEEE 802.15.4: remove get_bsn from "struct ieee802154_mlme_ops" · d87c8c6d
      Committed by Werner Almesberger
      It served no purpose: we never call it from anywhere in the stack
      and the only driver that did implement it (fakehard) merely provided
      a dummy value.
      
      There is also considerable doubt whether it would make sense to
      even attempt beacon processing at this level in the Linux kernel.
      Signed-off-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d87c8c6d
  7. 08 Apr, 2013 (2 commits)
  8. 06 Apr, 2013 (4 commits)
    • netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Committed by Patrick McHardy
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up to the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices
        that encapsulate in IPv4/v6 matter, since otherwise nf_trace is not
        used anymore. It's not entirely clear whether those packets should
        still be traced after that, but we have always done so.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only case
        where we clearly want to reset nf_trace, and it is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.
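      
      The new helper is small; a sketch modelled on the change (the exact
      config guard may differ in the tree):
      
      	static inline void nf_reset_trace(struct sk_buff *skb)
      	{
      	#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
      		skb->nf_trace = 0;	/* clear only the trace flag */
      	#endif
      	}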
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      124dff01
    • netfilter: remove unneeded variable proc_net_netfilter · 12202fa7
      Committed by Pablo Neira Ayuso
      Now that nflog and nfqueue support net namespaces, we can
      remove the global proc_net_netfilter, which has no
      users anymore.
      
      Based on a patch from Gao feng.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      12202fa7
    • netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Committed by Gao feng
      This patch adds netns support to nf_log and prepares netns
      support for existing loggers. It consists of four major
      changes.
      
      1) nf_log_register has been split into two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register an nf_logger, and nf_log_set is used to enable
         per-net support for nf_loggers (see the sketch after this list).
      
         Per-netns support is not yet complete after this patch; it comes
         in separate follow-up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per-netns support is
         not yet complete after this patch; it only allows binding the
         nf_logger to the protocol family from init_net and skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass the netns as a parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.
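      
      A sketch of what the split looks like from a logger module's
      perspective (kernel-context fragment; my_logger and my_net_ops are
      hypothetical, and the exact prototypes should be checked against the
      patch):
      
      	static int __net_init my_log_net_init(struct net *net)
      	{
      		/* enable this logger for IPv4 in each net namespace */
      		nf_log_set(net, NFPROTO_IPV4, &my_logger);
      		return 0;
      	}
      
      	static int __init my_log_init(void)
      	{
      		int ret = register_pernet_subsys(&my_net_ops);
      
      		if (ret < 0)
      			return ret;
      		/* one global registration of the logger itself */
      		return nf_log_register(NFPROTO_IPV4, &my_logger);
      	}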
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      30e0c6a6
    • netfilter: make /proc/net/netfilter pernet · f3c1a44a
      Committed by Gao feng
      This patch makes this proc dentry pernet. So far, only init_net
      has had a /proc/net/netfilter directory.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f3c1a44a
  9. 05 Apr, 2013 (2 commits)
    • net: count hw_addr syncs so that unsync works properly. · 4543fbef
      Committed by Vlad Yasevich
      A few drivers use dev_uc_sync/unsync to synchronize the
      address lists from a master down to slave/lower devices.  In
      some cases (bond/team) a single address list is synced down
      to multiple devices.  At the time of unsync, we have a leak
      in these lower devices, because "synced" is treated as a
      boolean and the address will not be unsynced for anything after
      the first device/call.
      
      Treat "synced" as a count (same as refcount) and allow all
      unsync calls to work.
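      
      A toy model of the change in standalone C (the kernel tracks this
      per address in netdev_hw_addr; names here are illustrative):
      
      	#include <stdio.h>
      
      	struct ha { int synced; };	/* was effectively a boolean */
      
      	static void sync_addr(struct ha *a)
      	{
      		a->synced++;		/* one count per lower device */
      	}
      
      	static int unsync_addr(struct ha *a)
      	{
      		if (!a->synced)
      			return -1;	/* nothing left to unsync */
      		a->synced--;
      		return 0;
      	}
      
      	int main(void)
      	{
      		struct ha a = { 0 };
      
      		sync_addr(&a);		/* synced to slave 1 */
      		sync_addr(&a);		/* synced to slave 2 */
      		/* with a boolean, the second unsync was a no-op leak */
      		printf("%d %d\n", unsync_addr(&a), unsync_addr(&a));
      		return 0;
      	}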
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4543fbef
    • net: frag queue per hash bucket locking · 19952cc4
      Committed by Jesper Dangaard Brouer
      This patch implements per hash bucket locking for the frag queue
      hash.  This removes two write locks, and the only remaining write
      lock is for protecting hash rebuild.  This essentially reduces the
      readers-writer lock to a rebuild lock.
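      
      The structure of the idea, as a standalone C model (one lock per
      hash chain instead of one readers-writer lock for the whole table;
      INETFRAGS_HASHSZ mirrors the kernel constant):
      
      	#include <pthread.h>
      	#include <stdio.h>
      
      	#define INETFRAGS_HASHSZ 64
      
      	struct frag_bucket {
      		pthread_spinlock_t lock;
      		void *chain;		/* hlist of frag queues in the kernel */
      	};
      
      	static struct frag_bucket frag_hash[INETFRAGS_HASHSZ];
      
      	static void bucket_insert(unsigned int h, void *q)
      	{
      		struct frag_bucket *b = &frag_hash[h % INETFRAGS_HASHSZ];
      
      		pthread_spin_lock(&b->lock);	/* contend per bucket only */
      		b->chain = q;			/* link into the chain ... */
      		pthread_spin_unlock(&b->lock);
      	}
      
      	int main(void)
      	{
      		for (int i = 0; i < INETFRAGS_HASHSZ; i++)
      			pthread_spin_init(&frag_hash[i].lock,
      					  PTHREAD_PROCESS_PRIVATE);
      		bucket_insert(123, NULL);
      		puts("ok");
      		return 0;
      	}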
      
      This patch is part of "net: frag performance followup"
       http://thread.gmane.org/gmane.linux.network/263644
      of which two patches have already been accepted.
      
      Same test setup as previously:
       (http://thread.gmane.org/gmane.linux.network/257155)
       Two 10G interfaces, on separate NUMA nodes, are under test and use
       Ethernet flow control.  A third interface is used for generating the
       DoS attack (with trafgen).
      
      Note that I have changed the frag DoS generator script to be more
      efficient/deadly.  Before, it would only hit one RX queue; now it
      sends packets causing multi-queue RX, due to "better" RX hashing.
      
      Test types summary (netperf UDP_STREAM):
       Test-20G64K     == 2x10G with 65K fragments
       Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
       Test-20G64K+DoS == Same as 20G64K with frag DoS
       Test-20G3F+DoS  == Same as 20G3F  with frag DoS
       Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
       Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS
      
      When I rebased this patch (03) (on top of net-next commit a210576c)
      and removed the _bh spinlock, I saw a performance regression.  But
      this was caused by some unrelated change in between.  See tests below.
      
      Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
      Test (B) is a verifying retest of commit 1b5ab0de, corresponding to patch-02.
      Test (C) is what I reported before for this patch.
      
      Test (D) is net-next master HEAD (commit a210576c), which reveals some
      (unknown) performance regression (compared against test (B)).
      Test (D) functions as a new base test.
      
      Performance table summary (in Mbit/s):
      
      (#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
          ----------  -------   -------  ----------  ---------  --------  -------
      (A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
      (B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
      (C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6
      
      (D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
      (E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
      (F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3
      
      Tests (E) and (F) are this patch (03), with (V1) and without (V2) the _bh spinlocks.
      
      I cannot explain the slowdown for 20G64K (but it is an artificial
      "lab test", so I'm not worried).  The other results do show
      improvements, and the test (E) "with _bh" version is slightly better.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      
      ----
      V2:
      - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
        need the spinlock _bh versions, as Netfilter currently does a
        local_bh_disable() before entering inet_fragment.
      - Fold-in desc from cover-mail
      V3:
      - Drop the chain_len counter per hash bucket.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19952cc4
  10. 03 Apr, 2013 (2 commits)
  11. 02 Apr, 2013 (17 commits)
    • 152b0f5d
    • PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset · 5faaa035
      Committed by Rajagopal Venkat
      Fix compiler warnings generated when devfreq is not enabled
      (CONFIG_PM_DEVFREQ is not set).
      Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
      Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      5faaa035
    • netfilter: xt_NFQUEUE: introduce CPU fanout · 8746ddcf
      Committed by holger@eitzenberger.org
      The current NFQUEUE target uses a hash, computed over the source and
      destination addresses (and other parameters), for steering the packet
      to the actual NFQUEUE. This, however, ignores the fact that the
      packet is eventually handled by a particular CPU at the user's request.
      
      If, for example,
      
        1) IRQ affinity is used to handle packets on a particular CPU already
           (in both the single-queue and multi-queue cases)
      
      and/or
      
        2) RPS is used to steer packets to a specific softirq
      
      the target easily chooses an NFQUEUE which is not handled by a process
      pinned to the same CPU.
      
      The idea is therefore to use the CPU index for determining the
      NFQUEUE handling the packet.
      
      For example, on a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs, it
      looks like this:
      
       +-----+  +-----+  +-----+  +-----+
       |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
       +-----+  +-----+  +-----+  +-----+
          ^        ^        ^        ^
          |        |NFQUEUE |        |
          +        +        +        +
       +-----+  +-----+  +-----+  +-----+
       |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
       +-----+  +-----+  +-----+  +-----+
      
      The NFQUEUEs do not necessarily have to start at number 0, and setups
      with fewer NFQUEUEs than packet-handling CPUs are not a problem
      either: the queue is chosen by CPU index modulo the number of
      configured queues, as sketched below.
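      
      The selection itself is one line; a standalone sketch (illustrative,
      not the exact kernel code):
      
      	#include <stdio.h>
      
      	/* queues are numbered [queuenum, queuenum + queues_total) */
      	static unsigned int pick_queue(unsigned int queuenum,
      				       unsigned int queues_total,
      				       unsigned int cpu)
      	{
      		return queuenum + cpu % queues_total;
      	}
      
      	int main(void)
      	{
      		for (unsigned int cpu = 0; cpu < 4; cpu++)
      			printf("cpu %u -> NFQ#%u\n", cpu, pick_queue(0, 4, cpu));
      		return 0;
      	}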
      
      This patch extends the NFQUEUE target to accept a new
      NFQ_FLAG_CPU_FANOUT flag. If this is specified, the target uses the
      CPU index to determine the NFQUEUE being used. I had to introduce
      rev3 for this; the 'flags' are folded into the _v2 'bypass' field.
      
      By changing the way the queue is assigned, I am able to improve
      performance if the processes reading the NFQUEUEs are pinned
      correctly.
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8746ddcf
    • ipvs: convert services to rcu · ceec4c38
      Committed by Julian Anastasov
      This is the final step in RCU conversion.
      
      Things that are removed:
      
      - svc->usecnt: now svc is accessed under RCU read lock
      - svc->inc: and some unused code
      - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
      - __ip_vs_svc_lock: replaced with RCU
      - IP_VS_WAIT_WHILE: now readers look up svcs and dests under
      	RCU and work in parallel with configuration
      
      Other changes:
      
      - previously, an RCU read-side critical section included the
      call to the schedule method; now it is extended to include the
      service lookup
      - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
      - svc->pe and svc->scheduler remain until the end (of the grace period);
      	the schedulers are prepared for such RCU readers
      	even after done_service is called, but they need
      	to use synchronize_rcu because the last ip_vs_scheduler_put
      	can happen while RCU read-side critical sections
      	still use an outdated svc->scheduler pointer
      - as planned, update_service is removed
      - empty services can be freed immediately after the grace period.
      	If dests were present, the services are freed from
      	the dest trash code
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      ceec4c38
    • ipvs: convert dests to rcu · 413c2d04
      Committed by Julian Anastasov
      In previous commits the schedulers started to access
      svc->destinations with _rcu list traversal primitives
      because the IP_VS_WAIT_WHILE macro still plays the role of
      grace period. Now it is time to finish the updating part,
      i.e. adding and deleting dests with the _rcu suffix primitives, before
      removing IP_VS_WAIT_WHILE in the next commit.
      
      We use the same rule for conns as for the
      schedulers: dests can be searched in an RCU read-side critical
      section, where ip_vs_dest_hold can be called by ip_vs_bind_dest.
      
      Some things are not perfect; for example, we call
      functions like ip_vs_lookup_dest from updating code under
      RCU, just because we use the same function from both the reader
      and the updater.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      413c2d04
    • ipvs: convert sched_lock to spin lock · ba3a3ce1
      Committed by Julian Anastasov
      As all read_locks are gone, a plain spin lock is preferred.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      ba3a3ce1
    • ipvs: do not expect result from done_service · ed3ffc4e
      Committed by Julian Anastasov
      This method releases the scheduler state;
      it cannot fail. This change will help to properly
      replace the scheduler in a following patch.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      ed3ffc4e
    • ipvs: reorganize dest trash · 578bc3ef
      Committed by Julian Anastasov
      All dests will go to the trash, with no exceptions.
      But we have to use the new list node t_list for this, due
      to RCU changes in the following patches. Dests will wait there
      for the initial grace period, and later for all conns and schedulers
      to put their references. Dests no longer get a reference for
      staying in the dest trash, as they did before.
      
      	As a result, we do not load ip_vs_dest_put with
      extra checks for the last refcnt, and the schedulers do not
      need to play games with atomic_inc_not_zero while
      selecting the best destination.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      578bc3ef
    • ipvs: add ip_vs_dest_hold and ip_vs_dest_put · fca9c20a
      Committed by Julian Anastasov
      ip_vs_dest_hold will be used under the RCU lock,
      while ip_vs_dest_put can be called even after the dest
      is removed from the service, as happens for conns and
      some schedulers.
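      
      The pair reduces to a plain refcount; a toy standalone C analogue
      (in the kernel, hold/put wrap atomic operations on dest->refcnt,
      with actual freeing handled by the dest trash):
      
      	#include <stdatomic.h>
      	#include <stdlib.h>
      
      	struct dest {
      		atomic_int refcnt;
      	};
      
      	static void dest_hold(struct dest *d)
      	{
      		/* only legal while d is still reachable (under RCU) */
      		atomic_fetch_add(&d->refcnt, 1);
      	}
      
      	static void dest_put(struct dest *d)
      	{
      		if (atomic_fetch_sub(&d->refcnt, 1) == 1)
      			free(d);	/* last reference gone */
      	}
      
      	int main(void)
      	{
      		struct dest *d = calloc(1, sizeof(*d));
      
      		dest_hold(d);	/* e.g. taken by ip_vs_bind_dest */
      		dest_put(d);	/* dropped when the conn unbinds */
      		return 0;
      	}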
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      fca9c20a
    • ipvs: preparations for using rcu in schedulers · 6b6df466
      Committed by Julian Anastasov
      Allow schedulers to use rcu_dereference when
      returning a destination on lookup. The RCU read-side critical
      section will allow ip_vs_bind_dest to take a dest refcnt, as
      preparation for the step where destinations will be
      deleted without the IP_VS_WAIT_WHILE guard that holds
      packet processing during updates.
      
      	Add new optional scheduler methods add_dest,
      del_dest and upd_dest. For now the methods are called
      together with update_service, but update_service will be
      removed in a following change.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      6b6df466
    • ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new · 9a05475c
      Committed by Julian Anastasov
      We have many fields to set and few to reset,
      so use kmem_cache_alloc instead to save some cycles.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      9a05475c
    • ipvs: reorder keys in connection structure · 1845ed0b
      Committed by Julian Anastasov
      __ip_vs_conn_in_get and ip_vs_conn_out_get are
      hot paths. Optimize them so that ports are matched first.
      By moving net and fwmark lower, on a 32-bit arch we can fit
      caddr into a 32-byte cache line and all addresses into a 64-byte
      cache line.
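      
      The layout idea, illustrated in standalone C (fields simplified, not
      the exact ip_vs_conn definition):
      
      	#include <stdio.h>
      	#include <stddef.h>
      	#include <netinet/in.h>
      
      	struct conn_keys {
      		unsigned short	cport, dport, vport;	/* matched first */
      		unsigned short	af;
      		struct in6_addr	caddr, vaddr, daddr;	/* addresses next */
      		void		*net;			/* colder: moved below */
      		unsigned int	fwmark;
      	};
      
      	int main(void)
      	{
      		/* ports plus caddr end at byte 24, within a 32-byte line */
      		printf("caddr ends at byte %zu\n",
      		       offsetof(struct conn_keys, caddr) +
      		       sizeof(struct in6_addr));
      		return 0;
      	}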
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      1845ed0b
    • ipvs: convert connection locking · 088339a5
      Committed by Julian Anastasov
      Convert __ip_vs_conntbl_lock_array as follows:
      
      - readers that do not modify conn lists will use RCU lock
      - updaters that modify lists will use spinlock_t
      
      Conn lookups will now use an RCU read-side
      critical section. Without using __ip_vs_conn_get, such
      places have access to connection fields and can
      dereference some pointers like pe and pe_data, plus
      the ability to update the timer expiration. If full access
      is required, we contend for a reference.
      
      We add a barrier in __ip_vs_conn_put, so that
      other CPUs see the refcnt operation after the other writes.
      
      With the introduction of ip_vs_conn_unlink(),
      we try to reorganize ip_vs_conn_expire() so that
      unhashing of connections that should stay longer is
      avoided, even if only for a very short time.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      088339a5
    • ipvs: remove rs_lock by using RCU · 276472ea
      Committed by Julian Anastasov
      rs_lock was used to protect rs_table (a hash table)
      from updaters (under the global mutex) and readers (packet handlers).
      We can remove rs_lock by using the RCU lock for readers. Reclaiming
      a dest with just kfree_rcu is enough, because the readers access
      only fields of the ip_vs_dest structure.
      
      Use hlist for rs_table.
      
      As we are now using hlist_del_rcu, introduce an in_rs_table
      flag as a replacement for the list_empty checks, which do not
      work with RCU. It is needed because only NAT dests are in
      the rs_table.
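      
      The reader/updater split can be modelled in userspace with liburcu
      (link with -lurcu; a simplified single-pointer analogue of the
      rs_table lookup, not the kernel code):
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <urcu.h>
      
      	struct dest { int addr; };
      
      	static struct dest *rs_entry;	/* RCU-protected pointer */
      
      	static int lookup(void)		/* packet-path reader */
      	{
      		struct dest *d;
      		int a = -1;
      
      		rcu_read_lock();
      		d = rcu_dereference(rs_entry);
      		if (d)
      			a = d->addr;
      		rcu_read_unlock();
      		return a;
      	}
      
      	static void replace(struct dest *newd)	/* config-path updater */
      	{
      		struct dest *old = rs_entry;
      
      		rcu_assign_pointer(rs_entry, newd);
      		synchronize_rcu();	/* kfree_rcu defers this in the kernel */
      		free(old);		/* no reader can still see it */
      	}
      
      	int main(void)
      	{
      		struct dest *d = malloc(sizeof(*d));
      
      		rcu_register_thread();
      		d->addr = 42;
      		rcu_assign_pointer(rs_entry, d);
      		printf("%d\n", lookup());
      		replace(NULL);
      		rcu_unregister_thread();
      		return 0;
      	}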
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      276472ea
    • ipvs: convert app locks · 363c97d7
      Committed by Julian Anastasov
      We use locks like tcp_app_lock, udp_app_lock and
      sctp_app_lock to protect access to the protocol hash tables
      from readers in packet context, while the application
      instances (inc) are [un]registered under the global mutex.
      
      As the hash tables are mostly read when conns are
      created and bound to an app, use RCU for readers and reclaim
      the app instance after a grace period.
      
      Simplify ip_vs_app_inc_get because we use usecnt
      only for statistics and rely on module refcounting.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      363c97d7
    • ipvs: optimize dst usage for real server · 026ace06
      Committed by Julian Anastasov
      Currently, when forwarding requests to real servers,
      we use dst_lock and atomic operations when cloning the
      dst_cache value. As the dst_cache value does not change
      most of the time, it is better to use RCU and to take
      dst_lock only when we need to replace the obsolete dst.
      For this to work, we keep dst_cache in a new structure protected
      by RCU. For packets to remote real servers we will use the noref
      version of dst_cache; it will be valid while we are in an RCU
      read-side critical section, because dst_release for replaced
      dsts will now be invoked after the grace period. Packets to
      local real servers that are passed to the local stack with
      NF_ACCEPT need a dst clone.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      026ace06
    • ipvs: rename functions related to dst_cache reset · d1deae4d
      Committed by Julian Anastasov
      Move and give better names to two functions:
      
      - ip_vs_dst_reset to __ip_vs_dst_cache_reset
      - __ip_vs_dev_reset to ip_vs_forget_dev
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      d1deae4d