1. 27 Sep, 2014 (1 commit)
    • netfilter: bridge: move br_netfilter out of the core · 34666d46
      Pablo Neira Ayuso committed
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separate module from the beginning; Patrick agreed on that.
      
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also prints a message to provide a clue to people that
      didn't notice that this has been deprecated.
      
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is what bridge netfilter users
      seem to require.
      
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Florian Westphal <fw@strlen.de>
  2. 15 Jul, 2014 (1 commit)
  3. 17 Jan, 2014 (1 commit)
  4. 12 Dec, 2013 (1 commit)
    • ipv6: router reachability probing · 7e980569
      Jiri Benc committed
      RFC 4191 states in 3.5:
      
         When a host avoids using any non-reachable router X and instead sends
         a data packet to another router Y, and the host would have used
         router X if router X were reachable, then the host SHOULD probe each
         such router X's reachability by sending a single Neighbor
         Solicitation to that router's address.  A host MUST NOT probe a
         router's reachability in the absence of useful traffic that the host
         would have sent to the router if it were reachable.  In any case,
         these probes MUST be rate-limited to no more than one per minute per
         router.
      
      Currently, when the neighbour corresponding to a router falls into
      NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
      value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
      should be probed with a single NS. The probe is ratelimited by the existing
      code. To better distinguish meanings of the failure values, rename
      RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
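The probing rule quoted from RFC 4191 can be sketched in userspace C. This is a minimal model, not the kernel's rt6_probe(). The enum names come from the commit, but their numeric values are an assumption; only their ordering matters here. struct router_model, maybe_probe() and the 60-second constant (taken from the RFC's "one per minute per router") are hypothetical. A failed router is classified as RT6_NUD_FAIL_PROBE rather than being dropped for good, and it is probed at most once per interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names follow the commit; the numeric values are an assumption for this
 * sketch, where only "higher is better" matters. */
enum rt6_nud_state {
    RT6_NUD_FAIL_HARD  = -3,
    RT6_NUD_FAIL_PROBE = -2, /* do not use the route, but send one NS */
    RT6_NUD_FAIL_DO_RR = -1, /* soft failure: fall back to round-robin */
    RT6_NUD_SUCCEED    =  1,
};

/* RFC 4191 section 3.5: at most one probe per minute per router. */
#define RTR_PROBE_INTERVAL_SEC 60

/* Hypothetical per-router state; not a kernel structure. */
struct router_model {
    bool neigh_failed;   /* stands in for the neighbour being NUD_FAILED */
    bool probed;         /* has a probe ever been sent? */
    int64_t last_probe;  /* time of the last probe, in seconds */
};

static enum rt6_nud_state check_router(const struct router_model *r)
{
    /* A failed router is no longer ignored forever: it becomes a probe candidate. */
    return r->neigh_failed ? RT6_NUD_FAIL_PROBE : RT6_NUD_SUCCEED;
}

/* Call only when there is traffic that would have used this router
 * (the RFC forbids probing otherwise). Returns true if a single NS
 * should be sent now, and records the probe time. */
static bool maybe_probe(struct router_model *r, int64_t now)
{
    if (check_router(r) != RT6_NUD_FAIL_PROBE)
        return false;
    if (r->probed && now - r->last_probe < RTR_PROBE_INTERVAL_SEC)
        return false; /* rate limit: still inside the one-minute window */
    r->probed = true;
    r->last_probe = now;
    return true;
}
```

A failed router probed at t=0 is not probed again at t=59, but it may be probed at t=60.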
  5. 10 Dec, 2013 (4 commits)
  6. 01 Aug, 2013 (1 commit)
  7. 11 Feb, 2013 (1 commit)
  8. 29 Jan, 2013 (1 commit)
    • net neigh: Optimize neighbor entry size calculation. · 08433eff
      YOSHIFUJI Hideaki / 吉藤英明 committed
      When allocating memory for a neighbour cache entry, if
      tbl->entry_size is not set, we always calculate
      sizeof(struct neighbour) + tbl->key_len, which is the same
      for every entry in the table.
      
      With this change, set tbl->entry_size during the table
      initialization phase, if it was not set, and use it in
      neigh_alloc() and neighbour_priv().
      
      This change also allows us to have both protocol private
      data and device private data at the same time.
      
      Note that the only user of protocol private data is DECnet
      and the only user of device private data is ATM CLIP.
      Since those are exclusive, we have not been facing issues
      here.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
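The entry-size idea can be sketched in userspace C: compute the per-table size once, at init time, and let private data start right after it. This is a minimal model under stated assumptions. The *_model types and functions are hypothetical, and aligning to sizeof(long long) is an assumption for this sketch. A table that needs protocol-private space presets entry_size to cover it, and device-private bytes are appended after entry_size. That layout is why both kinds of private data can now coexist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Alignment for private areas; sizeof(long long) is an assumption here. */
#define PRIV_ALIGN sizeof(long long)
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

struct neigh_table_model {
    size_t key_len;     /* e.g. 4 for IPv4, 16 for IPv6 */
    size_t entry_size;  /* 0 means "derive it at init time" */
};

struct neighbour_model {
    struct neigh_table_model *tbl;
    unsigned char primary_key[]; /* key_len bytes follow the header */
};

/* Done once per table instead of on every allocation. */
static void table_init_model(struct neigh_table_model *tbl)
{
    if (!tbl->entry_size)
        tbl->entry_size = ALIGN_UP(offsetof(struct neighbour_model, primary_key)
                                   + tbl->key_len, PRIV_ALIGN);
    assert(tbl->entry_size % PRIV_ALIGN == 0);
}

/* entry_size already covers any protocol-private area (a table that needs
 * one presets entry_size); device-private data is appended after it. */
static struct neighbour_model *neigh_alloc_model(struct neigh_table_model *tbl,
                                                 size_t dev_priv_len)
{
    struct neighbour_model *n = calloc(1, tbl->entry_size + dev_priv_len);
    if (n)
        n->tbl = tbl;
    return n;
}

/* Private data starts right after the fixed, per-table entry size. */
static void *neighbour_priv_model(const struct neighbour_model *n)
{
    return (char *)n + n->tbl->entry_size;
}
```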
  9. 08 Aug, 2012 (1 commit)
    • net: output path optimizations · 425f09ab
      Eric Dumazet committed
      1) Avoid dirtying neighbour's confirmed field.
      
        TCP workloads hit this cache line for each incoming ACK.
        Let's write n->confirmed only if there is a jiffies change.
      
      2) Optimize neigh_hh_output() for the common Ethernet case, where
         hh_len is less than 16 bytes. Replace the memcpy() call
         with two inlined 64-bit loads/stores on x86_64.
      
      Bench results using udpflood test, with -C option (MSG_CONFIRM flag
      added to sendto(), to reproduce the n->confirmed dirtying on UDP)
      
      24 threads doing 1,000,000 UDP sendto() calls on a dummy device, 4 runs.
      
      before : 2.247s, 2.235s, 2.247s, 2.318s
      after  : 1.884s, 1.905s, 1.891s, 1.895s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
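Both optimizations can be sketched in userspace C. This is a model, not the kernel's code. struct neigh_model and its `writes` counter are hypothetical; the counter exists only so the sketch can show how many stores happened. HH_DATA_MOD is assumed to be the 16-byte slot the commit refers to. The first function stores `confirmed` only when the jiffies value changes, so repeated ACKs within one tick leave the cache line clean. The second copies a constant 16 bytes that end at `data`. A fixed-size memcpy like this is what lets a compiler emit two 8-byte moves on x86_64.

```c
#include <assert.h>
#include <string.h>

#define HH_DATA_MOD 16 /* assumed size of the cached-header slot */

struct neigh_model {
    unsigned long confirmed;            /* last confirmation, in jiffies */
    unsigned int writes;                /* sketch instrumentation only */
    unsigned short hh_len;              /* cached link-layer header length */
    unsigned char hh_data[HH_DATA_MOD]; /* header stored right-aligned */
};

/* 1) Avoid dirtying the cache line: store only when the value changes. */
static void neigh_confirm_model(struct neigh_model *n, unsigned long jiffies)
{
    if (n->confirmed != jiffies) {
        n->confirmed = jiffies;
        n->writes++;
    }
}

/* 2) Common case: the header fits in HH_DATA_MOD bytes, so copy a constant
 * 16 bytes ending at `data`. `data` needs HH_DATA_MOD bytes of headroom. */
static void hh_output_model(const struct neigh_model *n, unsigned char *data)
{
    assert(n->hh_len <= HH_DATA_MOD);
    memcpy(data - HH_DATA_MOD, n->hh_data, HH_DATA_MOD);
}
```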
  10. 05 Jul, 2012 (2 commits)
  11. 16 Apr, 2012 (1 commit)
  12. 14 Apr, 2012 (1 commit)
  13. 29 Dec, 2011 (1 commit)
  14. 20 Dec, 2011 (1 commit)
  15. 14 Dec, 2011 (1 commit)
  16. 01 Dec, 2011 (2 commits)
  17. 14 Nov, 2011 (1 commit)
    • neigh: new unresolved queue limits · 8b5c171b
      Eric Dumazet committed
      On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
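The arithmetic behind the ping example can be checked with a small userspace model. This sketch assumes a 1500-byte MTU and a 20-byte IPv4 header with no options. Under those assumptions, an 8008-byte ICMP payload needs 6 fragments. The model queue also assumes that a full queue discards its oldest frame to admit the new one. With a 3-frame limit, the first 3 fragments of the first datagram are lost while ARP resolves, so that datagram cannot be reassembled. That is consistent with the output above, where only icmp_seq=2 gets a reply. struct unres_queue and both functions are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 64

/* Hypothetical model of a neighbour's queue of frames waiting for address
 * resolution, bounded by a frame count. When full, the oldest frame is
 * discarded to make room for the new one. */
struct unres_queue {
    int frames[MAX_QUEUE];  /* frame identifiers, oldest first */
    size_t len;
    size_t limit;           /* plays the role of unres_qlen */
    unsigned int drops;
};

static void unres_enqueue(struct unres_queue *q, int frame)
{
    assert(q->limit > 0 && q->limit <= MAX_QUEUE);
    if (q->len >= q->limit) {
        for (size_t i = 1; i < q->len; i++)  /* discard frames[0] */
            q->frames[i - 1] = q->frames[i];
        q->len--;
        q->drops++;
    }
    q->frames[q->len++] = frame;
}

/* IPv4 fragments needed for `payload` bytes above the IP header, assuming
 * a 20-byte header with no options; fragment data is a multiple of 8. */
static size_t ipv4_fragments(size_t payload, size_t mtu)
{
    size_t per_frag = (mtu - 20) & ~(size_t)7;
    return (payload + per_frag - 1) / per_frag;
}
```

With a larger limit, for example 8 frames, all 6 fragments survive until the neighbour resolves.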
  18. 27 Jul, 2011 (1 commit)
  19. 18 Jul, 2011 (1 commit)
  20. 17 Jul, 2011 (4 commits)
  21. 14 Jul, 2011 (1 commit)
    • net: Embed hh_cache inside of struct neighbour. · f6b72b62
      David S. Miller committed
      Now that there is a one-to-one correspondence between neighbour
      and hh_cache entries, we no longer need:
      
      1) dynamic allocation
      2) attachment to dst->hh
      3) refcounting
      
      Initialization of the hh_cache entry is indicated by hh_len
      being non-zero, and such initialization is always done with
      the neighbour's lock held as a writer.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 11 Jul, 2011 (1 commit)
    • neigh: Store hash shift instead of mask. · cd089336
      David S. Miller committed
      And mask the hash function result by simply shifting
      down the "->hash_shift" most significant bits.
      
      Currently which bits we use is arbitrary since jhash
      produces entropy evenly across the whole hash function
      result.
      
      But soon we'll be using universal hashing functions,
      and in those cases more entropy exists in the higher
      bits than the lower bits, because they use multiplies.
      Signed-off-by: David S. Miller <davem@davemloft.net>
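The reason for taking the top bits can be shown with a multiplicative hash. This sketch uses Knuth's multiplier 0x9E3779B1 as an illustrative choice, not the hash the kernel adopted. With any odd multiplier, the low k bits of the product depend only on the low k bits of the key. As a result, keys that differ only above bit 8 all land in bucket 0 under a mask. Shifting down the top `shift` bits spreads them out. mult_hash(), bucket_by_mask() and bucket_by_shift() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hash; the constant is illustrative, not the kernel's. */
static uint32_t mult_hash(uint32_t key)
{
    return key * 0x9E3779B1u; /* wraps modulo 2^32 */
}

/* Old scheme: keep the low `shift` bits. */
static uint32_t bucket_by_mask(uint32_t hash, unsigned int shift)
{
    assert(shift >= 1 && shift <= 31);
    return hash & ((1u << shift) - 1);
}

/* New scheme: keep the top `shift` bits, where multiplication
 * concentrates the entropy. */
static uint32_t bucket_by_shift(uint32_t hash, unsigned int shift)
{
    assert(shift >= 1 && shift <= 31);
    return hash >> (32 - shift);
}
```

For the key 1 << 8, the product is 0x3779B100, so the mask gives bucket 0x00 while the shift gives bucket 0x37.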
  23. 19 Nov, 2010 (1 commit)
  24. 12 Nov, 2010 (1 commit)
  25. 12 Oct, 2010 (2 commits)
    • neigh: reorder struct neighbour fields · e37ef961
      Eric Dumazet committed
      On Tuesday 12 October 2010 at 00:02 +0200, Eric Dumazet wrote:
      > Here is the followup patch.
      >
      > Thanks !
      >
      
      Oops, this was an old version; the up-to-date one also took care of the "used"
      field.
      
      I guess it's time for a sleep, sorry again.
      
      [PATCH net-next V2] neigh: reorder struct neighbour fields
      
      (refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
      be placed on separate cache lines.
      
      refcnt is written often, while the other fields are mostly read.
      
      This gave me good results on a stress test:
      
      before:
      
      real    0m45.570s
      user    0m15.525s
      sys     9m56.669s
      
      After:
      
      real    0m41.841s
      user    0m15.261s
      sys     8m45.949s
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
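The goal of keeping the write-hot refcnt off the cache line that holds the read-mostly fields can be checked with offsetof. The patch gets there by reordering struct neighbour. This sketch instead forces the split with C11 alignas, so the separation can be verified directly. struct neighbour_layout is hypothetical and only borrows the field names from the commit. The 64-byte cache line is an assumption (typical for x86_64).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64    /* assumption: 64-byte cache lines */
#define MAX_ADDR_LEN 32

/* Hypothetical layout, not the kernel's struct neighbour. */
struct neighbour_layout {
    int refcnt;                    /* write-hot: touched on every get/put */
    /* Read-mostly group, forced onto the next cache line. */
    alignas(CACHE_LINE) unsigned int ha_lock;  /* stands in for the lock */
    unsigned char ha[MAX_ADDR_LEN];
    unsigned long used;
    void *dev;
    int (*output)(void *skb);
    const void *ops;
    unsigned char primary_key[4];
};

/* Index of the cache line that a byte offset falls into. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```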
    • neigh: Protect neigh->ha[] with a seqlock · 0ed8ddf4
      Eric Dumazet committed
      Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
      dirtying the neighbour in stress situations (many different flows / dsts)
      
      Dirtying takes place because of read_lock(&n->lock) and n->used writes.
      
      Switching to a seqlock, and writing n->used only on jiffies changes
      permits less dirtying.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
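Why a seqlock avoids dirtying the line on the read side can be shown with a C11-atomics sketch of the usual sequence-lock pattern. This is not the kernel's seqlock_t. Readers only load: they retry if the sequence number was odd (a write was in progress) or changed during the copy. Only writers store, and writers are assumed to be serialized by the caller. The kernel's write_seqlock() takes a spinlock for that. struct neigh_ha and the helper names are hypothetical. touch_used() mirrors the "write n->used only on jiffies changes" part.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define HA_LEN 6

struct neigh_ha {
    atomic_uint seq;            /* even: stable, odd: write in progress */
    atomic_uchar ha[HA_LEN];    /* hardware address, read without a lock */
    unsigned long used;
};

/* Writer: callers must serialize writers among themselves. */
static void ha_write(struct neigh_ha *n, const unsigned char *addr)
{
    unsigned s = atomic_load_explicit(&n->seq, memory_order_relaxed);
    atomic_store_explicit(&n->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (int i = 0; i < HA_LEN; i++)
        atomic_store_explicit(&n->ha[i], addr[i], memory_order_relaxed);
    atomic_store_explicit(&n->seq, s + 2, memory_order_release);
}

/* Reader: performs no stores to shared state, so the line stays clean. */
static void ha_read(struct neigh_ha *n, unsigned char *out)
{
    unsigned s1, s2;
    do {
        s1 = atomic_load_explicit(&n->seq, memory_order_acquire);
        for (int i = 0; i < HA_LEN; i++)
            out[i] = atomic_load_explicit(&n->ha[i], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&n->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);   /* retry if a write raced with us */
}

/* Store only when the value actually changes. */
static void touch_used(struct neigh_ha *n, unsigned long jiffies)
{
    if (n->used != jiffies)
        n->used = jiffies;
}
```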
  26. 07 Oct, 2010 (1 commit)
    • neigh: RCU conversion of struct neighbour · 767e97e1
      Eric Dumazet committed
      This is the second step for neighbour RCU conversion.
      
      (first was commit d6bf7817 : RCU conversion of neigh hash table)
      
      neigh_lookup() becomes lockless, but still takes a reference on the found
      neighbour (no more read_lock()/read_unlock() on tbl->lock).
      
      struct neighbour gets an additional rcu_head field and is freed after an
      RCU grace period.
      
      Future work should eventually avoid taking a reference on the neighbour
      for temporary dsts (DST_NOCACHE), but this would need dst->_neighbour to
      use a noref bit like we did for skb->_dst.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
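A lockless lookup that "still takes a reference" has to avoid reviving an entry whose count has already dropped to zero, because such an entry is waiting for its RCU grace period before being freed. The usual pattern for this is an increment-unless-zero. Below is a userspace sketch with C11 atomics. It models only that step. The RCU read-side section and the deferred free are not modelled. inc_not_zero(), struct neigh_entry and lookup() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Increment *refcnt unless it is zero. A zero count means the entry is
 * on its way to being freed and must not be revived. */
static bool inc_not_zero(atomic_int *refcnt)
{
    int old = atomic_load_explicit(refcnt, memory_order_relaxed);
    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(refcnt, &old, old + 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
        /* on failure, `old` now holds the current value; retry */
    }
    return false;
}

struct neigh_entry {
    atomic_int refcnt;
    unsigned key;
    struct neigh_entry *next;
};

/* Walk the chain without a table lock; in the kernel this would run
 * inside an RCU read-side critical section. */
static struct neigh_entry *lookup(struct neigh_entry *head, unsigned key)
{
    for (struct neigh_entry *n = head; n; n = n->next)
        if (n->key == key)
            return inc_not_zero(&n->refcnt) ? n : NULL;
    return NULL;
}
```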
  27. 06 Oct, 2010 (1 commit)
    • net neigh: RCU conversion of neigh hash table · d6bf7817
      Eric Dumazet committed
      David
      
      This is the first step for RCU conversion of neigh code.
      
      Next patches will convert hash_buckets[] and "struct neighbour" to RCU
      protected objects.
      
      Thanks
      
      [PATCH net-next] net neigh: RCU conversion of neigh hash table
      
      Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
      neigh_table", a new structure is defined :
      
      struct neigh_hash_table {
             struct neighbour        **hash_buckets;
             unsigned int            hash_mask;
             __u32                   hash_rnd;
             struct rcu_head         rcu;
      };
      
      And "struct neigh_table" has an RCU protected pointer to such a
      neigh_hash_table.
      
      This means the signature of the (*hash)() function changed: we need to add a
      third parameter with the actual hash_rnd value, since this is no
      longer a neigh_table field.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
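Moving hash_buckets, hash_mask and hash_rnd behind one published pointer means a reader sees a consistent triple from a single load. This is also why hash_rnd becomes an argument of the hash function. The sketch below models the publish/read pair with a C11 atomic pointer standing in for the RCU-protected pointer. Grace periods are not modelled: the old table is handed back to the caller, who may free it only once no reader can still hold it. All names here, and demo_hash(), are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct hash_table_model {
    void **hash_buckets;
    unsigned int hash_mask;
    uint32_t hash_rnd;
};

/* hash_rnd is passed in because it now lives in the swapped table. */
typedef uint32_t (*hash_fn)(uint32_t key, uint32_t hash_rnd);

static uint32_t demo_hash(uint32_t key, uint32_t hash_rnd)
{
    return (key ^ hash_rnd) * 0x9E3779B1u;
}

struct table_model {
    _Atomic(struct hash_table_model *) nht; /* published like an RCU pointer */
    hash_fn hash;
};

static unsigned int bucket_for(struct table_model *tbl, uint32_t key)
{
    /* One acquire load yields a consistent (buckets, mask, rnd) snapshot. */
    struct hash_table_model *nht =
        atomic_load_explicit(&tbl->nht, memory_order_acquire);
    return tbl->hash(key, nht->hash_rnd) & nht->hash_mask;
}

static struct hash_table_model *alloc_hash(unsigned int shift, uint32_t rnd)
{
    struct hash_table_model *h = malloc(sizeof(*h));
    if (!h)
        return NULL;
    h->hash_buckets = calloc((size_t)1 << shift, sizeof(void *));
    if (!h->hash_buckets) {
        free(h);
        return NULL;
    }
    h->hash_mask = (1u << shift) - 1;
    h->hash_rnd = rnd;
    return h;
}

/* Swap in a new table and return the old one, which must not be freed
 * until all readers are done (an RCU grace period in the kernel). */
static struct hash_table_model *publish(struct table_model *tbl,
                                        struct hash_table_model *new_nht)
{
    return atomic_exchange_explicit(&tbl->nht, new_nht, memory_order_acq_rel);
}
```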
  28. 01 Oct, 2010 (1 commit)
  29. 01 Jul, 2010 (1 commit)
  30. 15 Apr, 2010 (1 commit)
    • netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT · e179e632
      Bart De Schuymer committed
      - fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
      neigh_hh_output() or dst->neighbour->output() overwrite the complete
      Ethernet header, although we only need the destination MAC address.
      For encapsulated packets, they ended up overwriting the encapsulating
      header. The new code copies the Ethernet source MAC address and
      protocol number before calling dst->neighbour->output(). The Ethernet
      source MAC and protocol number are copied back in place in
      br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
      more transparent because in the old scheme the source MAC of the
      bridge was copied into the source address in the Ethernet header. We
      also let skb->protocol equal ETH_P_IP or ETH_P_IPV6 during the
      execution of the PF_INET or PF_INET6 hooks, respectively.
      
      - Speed up IP DNAT by calling neigh_hh_bridge() instead of
      neigh_hh_output(): if dst->hh is available, we already know the MAC
      address so we can just copy it.
      Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
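The save-and-restore step in the first fix can be modelled on a raw 14-byte Ethernet header. The neighbour output stand-in rewrites all three fields. Saving the source MAC and protocol number beforehand and copying them back afterwards leaves only the destination MAC changed. That keeps an 802.1Q or PPPoE EtherType intact. This is a userspace sketch; struct saved_eth and the three functions are hypothetical and do not reproduce the nf_bridge code.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6
#define ETH_HLEN 14

/* Hypothetical holder for the bytes the neighbour output would clobber. */
struct saved_eth {
    unsigned char src[ETH_ALEN];
    unsigned char proto[2];
};

/* Stand-in for neighbour output: it rewrites the whole 14-byte header. */
static void neigh_output_model(unsigned char *eth, const unsigned char *dst,
                               const unsigned char *bridge_src,
                               unsigned short proto)
{
    memcpy(eth, dst, ETH_ALEN);
    memcpy(eth + ETH_ALEN, bridge_src, ETH_ALEN);
    eth[12] = (unsigned char)(proto >> 8);
    eth[13] = (unsigned char)(proto & 0xff);
}

/* Before output: remember the original source MAC and EtherType. */
static void save_eth(const unsigned char *eth, struct saved_eth *s)
{
    memcpy(s->src, eth + ETH_ALEN, ETH_ALEN);
    memcpy(s->proto, eth + 2 * ETH_ALEN, 2);
}

/* After output: put them back, so only the destination MAC changed. */
static void restore_eth(unsigned char *eth, const struct saved_eth *s)
{
    memcpy(eth + ETH_ALEN, s->src, ETH_ALEN);
    memcpy(eth + 2 * ETH_ALEN, s->proto, 2);
}
```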
  31. 17 Feb, 2010 (1 commit)
    • percpu: add __percpu sparse annotations to net · 7d720c3e
      Tejun Heo committed
      Add __percpu sparse annotations to net.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      
      The macro and type tricks around snmp stats make things a bit
      interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
      as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
      snmp_mib_*() users which used to cast the argument to (void **) are
      updated to cast it to (void __percpu **).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
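How an annotation can be "for sparse only" is sketched below. The macro is modelled on the kernel's compiler headers, to the best of my knowledge. Under sparse (__CHECKER__), it places the pointer in a separate, non-dereferenceable address space. For a normal compiler, it expands to nothing, so ordinary builds are unaffected. The per-CPU storage is simulated with a plain array. per_cpu_ptr_model(), fold_in_pkts() and struct snmp_mib_model are hypothetical. The accessor uses __force, in the same way the commit's snmp_mib_*() callers cast to (void __percpu **).

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CHECKER__
/* Under sparse: a separate address space that must not be dereferenced. */
# define __percpu __attribute__((noderef, address_space(3)))
# define __force  __attribute__((force))
#else
/* Normal compilers: the annotations expand to nothing. */
# define __percpu
# define __force
#endif

#define NR_CPUS_MODEL 4

struct snmp_mib_model { unsigned long in_pkts; };

/* Simulated per-CPU copies; one slot per CPU. */
static struct snmp_mib_model storage[NR_CPUS_MODEL];

/* Accessor: the sanctioned way to turn a __percpu pointer into a usable one. */
static struct snmp_mib_model *per_cpu_ptr_model(struct snmp_mib_model __percpu *p,
                                                int cpu)
{
    return (struct snmp_mib_model __force *)p + cpu;
}

/* Fold all per-CPU counters into one total, always via the accessor. */
static unsigned long fold_in_pkts(struct snmp_mib_model __percpu *p)
{
    unsigned long sum = 0;
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        sum += per_cpu_ptr_model(p, cpu)->in_pkts;
    return sum;
}
```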