1. 30 4月, 2013 2 次提交
  2. 25 4月, 2013 1 次提交
  3. 20 4月, 2013 2 次提交
  4. 19 4月, 2013 4 次提交
    • F
      netfilter: xt_rpfilter: depend on raw or mangle table · d37d6968
      Florian Westphal 提交于
      rpfilter is only valid in raw/mangle PREROUTING, i.e.
      RPFILTER=y|m is useless without raw or mangle table support.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d37d6968
    • F
      netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too · f83a7ea2
      Florian Westphal 提交于
      Alex Efros reported rpfilter module doesn't match following packets:
      IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
      (netfilter bugzilla #814).
      
      Problem is that network stack arranges for the locally generated broadcasts
      to appear on the interface they were sent out, so the IFF_LOOPBACK check
      doesn't trigger.
      
      As -m rpfilter is restricted to PREROUTING, we can check for existing
      rtable instead, it catches locally-generated broad/multicast case, too.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f83a7ea2
    • E
      tcp: introduce TCPSpuriousRtxHostQueues SNMP counter · 0e280af0
      Eric Dumazet 提交于
      Host queues (Qdisc + NIC) can hold packets so long that TCP can
      eventually retransmit a packet before the first transmit even left
      the host.
      
      Its not clear right now if we could avoid this in the first place :
      
      - We could arm RTO timer not at the time we enqueue packets, but
        at the time we TX complete them (tcp_wfree())
      
      - Cancel the sending of the new copy of the packet if prior one
        is still in queue.
      
      This patch adds instrumentation so that we can at least see how
      often this problem happens.
      
      TCPSpuriousRtxHostQueues SNMP counter is incremented every time
      we detect the fast clone is not yet freed in tcp_transmit_skb()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e280af0
    • P
      netfilter: add my copyright statements · f229f6ce
      Patrick McHardy 提交于
      Add copyright statements to all netfilter files which have had significant
      changes done by myself in the past.
      
      Some notes:
      
      - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
        Core Team when it got split out of nf_conntrack_core.c. The copyrights
        even state a date which lies six years before it was written. It was
        written in 2005 by Harald and myself.
      
      - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright
        statements. I've added the copyright statement from net/netfilter/core.c,
        where this code originated
      
      - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
        it to give the wrong impression
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f229f6ce
  5. 17 4月, 2013 1 次提交
    • E
      net: drop dst before queueing fragments · 97599dc7
      Eric Dumazet 提交于
      Commit 4a94445c (net: Use ip_route_input_noref() in input path)
      added a bug in IP defragmentation handling, as non refcounted
      dst could escape an RCU protected section.
      
      Commit 64f3b9e2 (net: ip_expire() must revalidate route) fixed
      the case of timeouts, but not the general problem.
      
      Tom Parkin noticed crashes in UDP stack and provided a patch,
      but further analysis permitted us to pinpoint the root cause.
      
      Before queueing a packet into a frag list, we must drop its dst,
      as this dst has limited lifetime (RCU protected)
      
      When/if a packet is finally reassembled, we use the dst of the very
      last skb, still protected by RCU and valid, as the dst of the
      reassembled packet.
      
      Use same logic in IPv6, as there is no need to hold dst references.
      Reported-by: NTom Parkin <tparkin@katalix.com>
      Tested-by: NTom Parkin <tparkin@katalix.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97599dc7
  6. 16 4月, 2013 1 次提交
  7. 15 4月, 2013 2 次提交
  8. 14 4月, 2013 1 次提交
  9. 13 4月, 2013 1 次提交
    • E
      tcp: GSO should be TSQ friendly · d6a4a104
      Eric Dumazet 提交于
      I noticed that TSQ (TCP Small queues) was less effective when TSO is
      turned off, and GSO is on. If BQL is not enabled, TSQ has then no
      effect.
      
      It turns out the GSO engine frees the original gso_skb at the time the
      fragments are generated and queued to the NIC.
      
      We should instead call the tcp_wfree() destructor for the last fragment,
      to keep the flow control as intended in TSQ. This effectively limits
      the number of queued packets on qdisc + NIC layers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6a4a104
  10. 12 4月, 2013 2 次提交
  11. 10 4月, 2013 2 次提交
  12. 09 4月, 2013 3 次提交
  13. 08 4月, 2013 2 次提交
  14. 06 4月, 2013 2 次提交
    • G
      netfilter: ipt_ULOG: add net namespace support for ipt_ULOG · 35543067
      Gao feng 提交于
      Add pernet support to ipt_ULOG by means of the new nf_log_set
      function added in (30e0c6a6 netfilter: nf_log: prepare net
      namespace support for loggers).
      
      This patch also make ulog_buffers and netlink socket
      nflognl per netns.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      35543067
    • G
      netfilter: nf_log: prepare net namespace support for loggers · 30e0c6a6
      Gao feng 提交于
      This patch adds netns support to nf_log and it prepares netns
      support for existing loggers. It is composed of four major
      changes.
      
      1) nf_log_register has been split to two functions: nf_log_register
         and nf_log_set. The new nf_log_register is used to globally
         register the nf_logger and nf_log_set is used for enabling
         pernet support from nf_loggers.
      
         Per netns is not yet complete after this patch, it comes in
         separate follow up patches.
      
      2) Add net as a parameter of nf_log_bind_pf. Per netns is not
         yet complete after this patch, it only allows to bind the
         nf_logger to the protocol family from init_net and it skips
         other cases.
      
      3) Adapt all nf_log_packet callers to pass netns as parameter.
         After this patch, this function only works for init_net.
      
      4) Make the sysctl net/netfilter/nf_log pernet.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      30e0c6a6
  15. 05 4月, 2013 2 次提交
    • J
      net: ipv4: notify when address lifetime changes · 34e2ed34
      Jiri Pirko 提交于
      if userspace changes lifetime of address, send netlink notification and
      call notifier.
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34e2ed34
    • J
      net: frag queue per hash bucket locking · 19952cc4
      Jesper Dangaard Brouer 提交于
      This patch implements per hash bucket locking for the frag queue
      hash.  This removes two write locks, and the only remaining write
      lock is for protecting hash rebuild.  This essentially reduce the
      readers-writer lock to a rebuild lock.
      
      This patch is part of "net: frag performance followup"
       http://thread.gmane.org/gmane.linux.network/263644
      of which two patches have already been accepted:
      
      Same test setup as previous:
       (http://thread.gmane.org/gmane.linux.network/257155)
       Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
       Ethernet flow-control.  A third interface is used for generating the
       DoS attack (with trafgen).
      
      Notice, I have changed the frag DoS generator script to be more
      efficient/deadly.  Before it would only hit one RX queue, now its
      sending packets causing multi-queue RX, due to "better" RX hashing.
      
      Test types summary (netperf UDP_STREAM):
       Test-20G64K     == 2x10G with 65K fragments
       Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
       Test-20G64K+DoS == Same as 20G64K with frag DoS
       Test-20G3F+DoS  == Same as 20G3F  with frag DoS
       Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
       Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS
      
      When I rebased this-patch(03) (on top of net-next commit a210576c) and
      removed the _bh spinlock, I saw a performance regression.  BUT this
      was caused by some unrelated change in-between.  See tests below.
      
      Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
      Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
      Test (C) is what I reported before for this-patch
      
      Test (D) is net-next master HEAD (commit a210576c), which reveals some
      (unknown) performance regression (compared against test (B)).
      Test (D) function as a new base-test.
      
      Performance table summary (in Mbit/s):
      
      (#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
          ----------  -------   -------  ----------  ---------  --------  -------
      (A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
      (B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
      (C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6
      
      (D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
      (E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
      (F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3
      
      Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.
      
      I cannot explain the slow down for 20G64K (but its an artificial
      "lab-test" so I'm not worried).  But the other results does show
      improvements.  And test (E) "with _bh" version is slightly better.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      
      ----
      V2:
      - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
        need the spinlock _bh versions, as Netfilter currently does a
        local_bh_disable() before entering inet_fragment.
      - Fold-in desc from cover-mail
      V3:
      - Drop the chain_len counter per hash bucket.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19952cc4
  16. 03 4月, 2013 1 次提交
    • P
      ipconfig: add informative timeout messages while waiting for carrier · 5e404cd6
      Paul Gortmaker 提交于
      Commit 3fb72f1e ("ipconfig wait
      for carrier") added a "wait for carrier on at least one interface"
      policy, with a worst case maximum wait of two minutes.
      
      However, if you encounter this, you won't get any feedback from
      the console as to the nature of what is going on.  You just see
      the booting process hang for two minutes and then continue.
      
      Here we add a message so the user knows what is going on, and
      hence can take action to rectify the situation (e.g. fix network
      cable or whatever.)  After the 1st 10s pause, output now begins
      that looks like this:
      
      	Waiting up to 110 more seconds for network.
      	Waiting up to 100 more seconds for network.
      	Waiting up to 90 more seconds for network.
      	Waiting up to 80 more seconds for network.
      	...
      
      Since most systems will have no problem getting link/carrier in the
      1st 10s, the only people who will see these messages are people with
      genuine issues that need to be resolved.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e404cd6
  17. 02 4月, 2013 1 次提交
  18. 31 3月, 2013 1 次提交
    • A
      ip_gre: don't overwrite iflink during net_dev init · 537aadc3
      Antonio Quartulli 提交于
      iflink is currently set to 0 in __gre_tunnel_init(). This
      function is invoked in gre_tap_init() and
      ipgre_tunnel_init() which are both used to initialise the
      ndo_init field of the respective net_device_ops structs
      (ipgre.. and gre_tap..) used by GRE interfaces.
      
      However, in netdevice_register() iflink is first set to -1,
      then ndo_init is invoked and then iflink is assigned to a
      proper value if and only if it still was -1.
      
      Assigning 0 to iflink in ndo_init is therefore first
      preventing netdev_register() to correctly assign it a proper
      value and then breaking iflink at all since 0 has not
      correct meaning.
      
      Fix this by removing the iflink assignment in
      __gre_tunnel_init().
      
      Introduced by c5441932
      ("GRE: Refactor GRE tunneling code.")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAntonio Quartulli <ordex@autistici.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      537aadc3
  19. 30 3月, 2013 1 次提交
  20. 29 3月, 2013 1 次提交
  21. 28 3月, 2013 2 次提交
  22. 27 3月, 2013 5 次提交
    • P
      ipv4: Fix ip-header identification for gso packets. · 330305cc
      Pravin B Shelar 提交于
      ip-header id needs to be incremented even if IP_DF flag is set.
      This behaviour was changed in commit 490ab081
      (IP_GRE: Fix IP-Identification).
      
      Following patch fixes it so that identification is always
      incremented.
      Reported-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      330305cc
    • Y
      firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. · 6752c8db
      YOSHIFUJI Hideaki / 吉藤英明 提交于
      Inspection of upper layer protocol is considered harmful, especially
      if it is about ARP or other stateful upper layer protocol; driver
      cannot (and should not) have full state of them.
      
      IPv4 over Firewire module used to inspect ARP (both in sending path
      and in receiving path), and record peer's GUID, max packet size, max
      speed and fifo address.  This patch removes such inspection by extending
      our "hardware address" definition to include other information as well:
      max packet size, max speed and fifo.  By doing this, The neighbour
      module in networking subsystem can cache them.
      
      Note: As we have started ignoring sspd and max_rec in ARP/NDP, those
            information will not be used in the driver when sending.
      
      When a packet is being sent, the IP layer fills our pseudo header with
      the extended "hardware address", including GUID and fifo.  The driver
      can look-up node-id (the real but rather volatile low-level address)
      by GUID, and then the module can send the packet to the wire using
      parameters provided in the extendedn hardware address.
      
      This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
      over IEEE1394 (RFC3146) share same "hardware address" format
      in their address resolution protocols.
      
      Here, extended "hardware address" is defined as follows:
      
      union fwnet_hwaddr {
      	u8 u[16];
      	struct {
      		__be64 uniq_id;		/* EUI-64			*/
      		u8 max_rec;		/* max packet size		*/
      		u8 sspd;		/* max speed			*/
      		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
      		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
      	} __packed uc;
      };
      
      Note that Hardware address is declared as union, so that we can map full
      IP address into this, when implementing MCAP (Multicast Cannel Allocation
      Protocol) for IPv6, but IP and ARP subsystem do not need to know this
      format in detail.
      
      One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
      is that 1394 ARP Request/Reply do not contain the target hardware address
      field (aka ar$tha).  This difference is handled in the ARP subsystem.
      
      CC: Stephan Gatzka <stephan.gatzka@gmail.com>
      Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6752c8db
    • P
      Tunneling: use IP Tunnel stats APIs. · f61dd388
      Pravin B Shelar 提交于
      Use common function get calculate rtnl_link_stats64 stats.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f61dd388
    • P
      IPIP: Use ip-tunneling code. · fd58156e
      Pravin B Shelar 提交于
      Reuse common ip-tunneling code which is re-factored from GRE
      module.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd58156e
    • P
      GRE: Refactor GRE tunneling code. · c5441932
      Pravin B Shelar 提交于
      Following patch refactors GRE code into ip tunneling code and GRE
      specific code. Common tunneling code is moved to ip_tunnel module.
      ip_tunnel module is written as generic library which can be used
      by different tunneling implementations.
      
      ip_tunnel module contains following components:
       - packet xmit and rcv generic code. xmit flow looks like
         (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
       - hash table of all devices.
       - lookup for tunnel devices.
       - control plane operations like device create, destroy, ioctl, netlink
         operations code.
       - registration for tunneling modules, like gre, ipip etc.
       - define single pcpu_tstats dev->tstats.
       - struct tnl_ptk_info added to pass parsed tunnel packet parameters.
      
      ipip.h header is renamed to ip_tunnel.h
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5441932