1. 30 Sep, 2015 11 commits
    • inet: constify __inet_inherit_port() sock argument · 1ce31c9e
      Eric Dumazet committed
      The socket is not touched, so make it const.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: constify inet_csk_route_child_sock() socket argument · a2432c4f
      Eric Dumazet committed
      The socket points to the (shared) listener.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: use inet6_csk_route_req() helper · f76b33c3
      Eric Dumazet committed
      Before changing the dccp_v6_request_recv_sock() sock argument
      to const, we need to get rid of security_sk_classify_flow(),
      and it seems doable by reusing the inet6_csk_route_req() helper.

      We need to add a proto parameter to inet6_csk_route_req(),
      rather than assume it is TCP.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove tcp_rcv_state_process() tcp_hdr argument · 72ab4a86
      Eric Dumazet committed
      Factor out the code that gets the TCP header from the skb. It makes
      no sense to duplicate that code in the callers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove unused len argument from tcp_rcv_state_process() · bda07a64
      Eric Dumazet committed
      Once we realize tcp_rcv_synsent_state_process() does not use
      its 'len' argument and we get rid of it, it becomes clear that
      this argument is no longer used in tcp_rcv_state_process() either.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp/dccp: constify send_synack and send_reset socket argument · a00e7444
      Eric Dumazet committed
      None of these functions need to change the socket, so make it
      const.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove martian_source_keep_err goto label · 0d753960
      David Ahern committed
      err is initialized to -EINVAL when it is declared. It is not reset until
      fib_lookup, which is well after the 3 users of the martian_source jump. So
      resetting err to -EINVAL at the martian_source label is not needed.

      Removing that line obviates the need for the martian_source_keep_err label,
      so delete it.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
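The reasoning in this commit can be illustrated with a small user-space sketch. The function and parameter names below are hypothetical stand-ins, not the kernel's route-input code; the point is only that every jump to the label happens before err is first reassigned.

```c
#include <errno.h>

/* Hypothetical stand-in for the validation path; not kernel code. */
int toy_validate_source(int src_is_multicast, int src_is_zeronet,
			int lookup_result)
{
	int err = -EINVAL;		/* set once, at declaration */

	if (src_is_multicast)
		goto martian_source;	/* err is still -EINVAL here */
	if (src_is_zeronet)
		goto martian_source;	/* and here */

	err = lookup_result;		/* first reassignment, after all jumps */
	return err;

martian_source:
	/* No "err = -EINVAL;" is needed: nothing has modified err yet. */
	return err;
}
```

Because the label no longer has to reset err, a second label that skips the reset has no reason to exist.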
    • net: Swap ordering of tests in ip_route_input_mc · 75fea73d
      Alexander Duyck committed
      This patch just swaps the ordering of one pair of conditional tests in
      ip_route_input_mc.  Specifically it swaps the test that checks whether
      the source address is loopback and the test that checks whether we allow
      a loopback source address.

      The reason for swapping these two tests is that it is much faster to
      test if an address is loopback than it is to dereference several pointers
      to get at the net structure to see if the use of loopback is allowed.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
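A minimal sketch of the ordering idea, assuming a toy chain of structures (all names here, including the route_localnet field, are hypothetical and only mimic the pointer chase described above). With the cheap test on the left of `&&`, the pointer chain is only followed when the address really is loopback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy structures standing in for the device -> net configuration chain. */
struct toy_net { bool route_localnet; };
struct toy_in_dev { const struct toy_net *net; };
struct toy_dev { const struct toy_in_dev *in_dev; };

/* Cheap test on a value already in a register (host byte order, 127/8). */
bool toy_is_loopback(uint32_t addr)
{
	return (addr >> 24) == 127;
}

/* Returns true when the source address must be rejected. */
bool toy_reject_source(uint32_t saddr, const struct toy_dev *dev)
{
	/* Short-circuit: the dereferences run only for loopback sources. */
	return toy_is_loopback(saddr) && !dev->in_dev->net->route_localnet;
}
```

For the common case of a non-loopback source, the right-hand operand is never evaluated, so no memory is touched at all.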
    • net/ipv4: Pass proto as u8 instead of u16 in ip_check_mc_rcu · 2094acbb
      Alexander Duyck committed
      This patch updates ip_check_mc_rcu so that protocol is passed as a u8
      instead of a u16.

      The motivation is just to avoid any unneeded type transitions since some
      systems will require an instruction to zero extend a u8 field to a u16.
      It also makes it a bit clearer that protocol is a u8, so no byte
      ordering changes are needed to pass it.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: Drop budget parameter from NAPI polling call hierarchy · 822d54b9
      Alexander Duyck committed
      For some reason we were carrying the budget value around between the
      various calls to napi->poll.  If for example one of the drivers called had
      a bug in which it returned a non-zero value for work this could result in
      the budget value becoming negative.
      
      Rather than carry around a value of budget that is 0 or less we can instead
      just loop through and pass 0 to each napi->poll call.  If any driver
      returns a value for work done that is non-zero then we can report that
      driver and continue rather than allowing a bad actor to make the budget
      value negative and pass that negative value to napi->poll.
      
      Note, the only actual change here is that instead of letting budget become
      negative we are keeping it at 0 regardless of the value returned for work
      since it should not be possible for the polling routine to do any actual
      work with a budget of 0.  So if the polling routine returns a non-0 value
      we are just reporting it and continuing with a budget of 0 rather than
      letting that work value be subtracted from the budget of 0.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
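The behaviour described in this commit can be sketched in user space as follows. The callback type and function names are hypothetical, not the netpoll or NAPI API; the sketch only shows the shape of the loop: every poll callback gets a budget of 0, and any non-zero return is reported without being subtracted from anything.

```c
#include <stdio.h>

/* Hypothetical poll callback: takes a budget, returns work done. */
typedef int (*toy_poll_fn)(int budget);

/* Calls every poller with budget 0; returns how many misbehaved. */
int toy_poll_all(toy_poll_fn *polls, int n)
{
	int bad = 0;

	for (int i = 0; i < n; i++) {
		int work = polls[i](0);	/* always 0, never a running total */

		if (work != 0) {
			/* Report the offender and keep going. */
			fprintf(stderr, "poller %d did %d work with budget 0\n",
				i, work);
			bad++;
		}
	}
	return bad;
}

/* A well-behaved poller does nothing when given no budget. */
int toy_good_poll(int budget)
{
	(void)budget;
	return 0;
}

/* A buggy poller claims work even though the budget was 0. */
int toy_buggy_poll(int budget)
{
	(void)budget;
	return 3;
}
```

Since no budget variable is carried between calls, a buggy poller cannot push a negative value into the next callback.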
    • bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906
      Nikolay Aleksandrov committed
      This patch changes the bridge vlan implementation to use rhashtables
      instead of bitmaps. The main motivation behind this change is that we
      need extensible per-vlan structures (both per-port and global) so more
      advanced features can be introduced and the vlan support can be
      extended. I've tried to break this up, but the moment net_port_vlans is
      changed the whole API goes away, thus this is a larger patch.
      A few short-term goals of this patch are:
      - Extensible per-vlan structs stored in rhashtables and a sorted list
      - Keep user-visible behaviour (compressed vlans etc)
      - Keep fastpath ingress/egress logic the same (optimizations to come
        later)
      
      Here's a brief list of some of the new features we'd like to introduce:
      - per-vlan counters
      - vlan ingress/egress mapping
      - per-vlan igmp configuration
      - vlan priorities
      - avoid fdb entries replication (e.g. local fdb scaling issues)
      
      The structure is kept single for both global and per-port entries so as
      to avoid code duplication where possible and also because we'll soon
      introduce "port0 / aka bridge as port" which should simplify things
      further (thanks to Vlad for the suggestion!).
      
      Now we have a per-vlan global rhashtable (bridge-wide) and a per-vlan
      port rhashtable. If an entry is added to a port, it'll get a pointer to
      its global context so it can be quickly accessed later. There's also a
      sorted vlan list which is used for stable walks, for some user-visible
      behaviour such as the vlan ranges, and for error paths.
      VLANs are stored in a "vlan group" which currently contains the
      rhashtable, sorted vlan list and the number of "real" vlan entries.
      A good side-effect of this change is that it resembles how hw keeps
      per-vlan data.
      One important note after this change is that if a VLAN is being looked up
      in the bridge's rhashtable for filtering purposes (or to check if it's an
      existing usable entry, not just a global context) then the new helper
      br_vlan_should_use() needs to be used if the vlan is found. In case the
      lookup is done only with a port's vlan group, then this check can be
      skipped.
      
      Things tested so far:
      - basic vlan ingress/egress
      - pvids
      - untagged vlans
      - undef CONFIG_BRIDGE_VLAN_FILTERING
      - adding/deleting vlans in different scenarios (with/without global ctx,
        while transmitting traffic, in ranges etc)
      - loading/removing the module while having/adding/deleting vlans
      - extracting bridge vlan information (user ABI), compressed requests
      - adding/deleting fdbs on vlans
      - bridge mac change, promisc mode
      - default pvid change
      - kmemleak ON during the whole time
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
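The note about br_vlan_should_use() can be illustrated with a simplified sketch. Everything below is an assumption made for illustration: the structure layout, the flag names, and the linear array lookup that stands in for the rhashtable are not taken from the patch. The idea shown is that an entry found in the bridge-wide table may exist only as a global context for port entries, so a lookup there needs an extra usability check, while an entry found in a port's own group is always usable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags, named for this sketch only. */
#define TOY_VLAN_MASTER  0x1u	/* entry lives in the bridge-wide table */
#define TOY_VLAN_BRENTRY 0x2u	/* the bridge itself is a member */

struct toy_vlan {
	uint16_t vid;
	unsigned int flags;
	struct toy_vlan *global_ctx;	/* port entries point at the bridge-wide one */
};

/* A bridge-wide entry is usable only if the bridge is really a member. */
bool toy_vlan_should_use(const struct toy_vlan *v)
{
	if (v->flags & TOY_VLAN_MASTER)
		return (v->flags & TOY_VLAN_BRENTRY) != 0;
	return true;	/* port entries are always real members */
}

/* Linear search standing in for the rhashtable lookup. */
struct toy_vlan *toy_vlan_find(struct toy_vlan *tbl, size_t n, uint16_t vid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].vid == vid)
			return &tbl[i];
	return NULL;
}
```

In this model, a caller filtering on the bridge-wide table would combine both helpers, whereas a caller working from a port's group could skip the usability check.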
  2. 29 Sep, 2015 4 commits
    • net: help compiler generate better code in eth_get_headlen · 8a4683a5
      Jesper Dangaard Brouer committed
      Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
      generated suboptimal assembler code in eth_get_headlen().

      This early return coding style is usually not an issue on superscalar CPUs,
      but the compiler chose to put the return statement after this very unlikely
      branch, thus creating a larger jump down to the likely code path.

      Performance-wise, I could measure slightly fewer L1-icache-load-misses
      and fewer branch-misses, and an improvement of 1 nanosec with an IP-forwarding
      use case with 257-byte packets with ixgbe (CPU i7-4790K @ 4.00GHz).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
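The commit text does not spell out the mechanism used, so the following is only a sketch of one common way to steer code layout around an early return: a branch hint built on the GCC/Clang `__builtin_expect` builtin. The function name, the constant, and the simplified body are hypothetical and do not reproduce eth_get_headlen().

```c
#include <stddef.h>

/* Branch hint; relies on the GCC/Clang __builtin_expect builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define TOY_ETH_HLEN 14u	/* Ethernet header length in bytes */

/* Hypothetical, heavily simplified header-length helper. */
size_t toy_headlen(size_t len)
{
	/* Hinted as rare, so the compiler can keep the hot path contiguous. */
	if (unlikely(len < TOY_ETH_HLEN))
		return len;

	/* Likely path: a real parser would go on to inspect L3/L4 headers. */
	return TOY_ETH_HLEN;
}
```

The hint does not change the result of the function; it only tells the compiler which side of the branch to lay out as the fall-through path.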
    • tcp: Fix CWV being too strict on thin streams · d2e1339f
      Bendik Rønning Opstad committed
      Application limited streams such as thin streams, that transmit small
      amounts of payload in relatively few packets per RTT, can be prevented
      from growing the CWND when in congestion avoidance. This leads to
      increased sojourn times for data segments in streams that often transmit
      time-dependent data.
      
      Currently, a connection is considered CWND limited only after having
      successfully transmitted at least one packet with new data, while at the
      same time failing to transmit some unsent data from the output queue
      because the CWND is full. Applications that produce small amounts of
      data may be left in a state where the connection is never considered
      to be CWND limited, because all unsent data is successfully transmitted
      each time an incoming ACK opens up for more data to be transmitted in
      the send window.
      
      Fix by always testing whether the CWND is fully used after successful
      packet transmissions, such that a connection is considered CWND limited
      whenever the CWND has been filled. This is the correct behavior as
      specified in RFC2861 (section 3.1).
      
      Cc: Andreas Petlund <apetlund@simula.no>
      Cc: Carsten Griwodz <griff@simula.no>
      Cc: Jonas Markussen <jonassm@ifi.uio.no>
      Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
      Cc: Mads Johannessen <madsjoh@ifi.uio.no>
      Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
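The difference between the two rules can be modelled with a toy transmit pass. This is a user-space sketch under simplifying assumptions (segments are counted as whole packets, and the names are hypothetical); it is not the kernel's transmit path. The old rule marks the connection as CWND limited only when data was left behind because the window was full; the new rule also marks it whenever the window ends up full after a successful transmission.

```c
#include <stdbool.h>

struct toy_xmit_result {
	unsigned int sent;	/* packets transmitted in this pass */
	bool limited_old;	/* rule before the fix */
	bool limited_new;	/* rule after the fix */
};

/* Sends queued packets until the queue is empty or the CWND is full. */
struct toy_xmit_result toy_xmit_pass(unsigned int queued,
				     unsigned int in_flight,
				     unsigned int cwnd)
{
	struct toy_xmit_result r = { 0, false, false };
	bool blocked = false;	/* unsent data left because CWND was full */

	while (queued > 0) {
		if (in_flight >= cwnd) {
			blocked = true;
			break;
		}
		queued--;
		in_flight++;
		r.sent++;
	}

	if (r.sent > 0) {
		r.limited_old = blocked;
		/* New rule: a filled CWND counts even if nothing is left over. */
		r.limited_new = blocked || in_flight >= cwnd;
	}
	return r;
}
```

A thin stream with one queued packet, 9 packets in flight and a CWND of 10 fills the window exactly: the old rule reports it as not limited, the new rule as limited.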
    • net: Remove redundant oif checks in rt6_device_match · 17fb0b2b
      David Ahern committed
      The oif has already been checked to be non-zero; the 2 additional
      checks on oif within that if (oif) {...} block are redundant.

      CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: avoid reorders for TFO passive connections · 7c85af88
      Eric Dumazet committed
      We found that a TCP Fast Open passive connection was vulnerable
      to reorders, as the exchange might look like
      
      [1] C -> S S <FO ...> <request>
      [2] S -> C S. ack request <options>
      [3] S -> C . <answer>
      
      packets [2] and [3] can be generated at almost the same time.
      
      If C receives the 3rd packet before the 2nd, it will drop it as
      the socket is in SYN_SENT state and expects a SYNACK.
      
      S will have to retransmit the answer.
      
      Current OOO avoidance in linux is defeated because SYNACK
      packets are attached to the LISTEN socket, while DATA packets
      are attached to the children. They might be sent by different cpus,
      and different TX queues might be selected.
      
      It turns out that for TFO, we created a child, which is a
      full blown socket in TCP_SYN_RECV state, and we simply can attach
      the SYNACK packet to this socket.
      
      This means that at the time tcp_sendmsg() pushes DATA packet,
      skb->ooo_okay will be set iff the SYNACK packet had been sent
      and TX completed.
      
      This removes the reorder source at the host level.
      
      We also removed the export of tcp_try_fastopen(), as it is no
      longer called from IPv6.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 28 Sep, 2015 1 commit
  4. 27 Sep, 2015 1 commit
  5. 26 Sep, 2015 21 commits
  6. 25 Sep, 2015 2 commits
    • net: fix net_device refcounting · 9861f720
      Russell King committed
      of_find_net_device_by_node() uses class_find_device() internally to
      look up the corresponding network device.  class_find_device() returns
      a reference to the embedded struct device, with its refcount
      incremented.

      Add a comment to the definition in net/core/net-sysfs.c indicating the
      need to drop this refcount, and fix the DSA code to drop this refcount
      when the OF-generated platform data is cleaned up and freed.  Also
      arrange for the ref to be dropped when handling errors.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: fix of_mdio_find_bus() device refcount leak · e496ae69
      Russell King committed
      Current users of of_mdio_find_bus() leak a struct device refcount, as
      they fail to clean up the reference obtained inside class_find_device().
      
      Fix the DSA code to properly refcount the returned MDIO bus by:
      1. taking a reference on the struct device whenever we assign it to
         pd->chip[x].host_dev.
      2. dropping the reference when we overwrite the existing reference.
      3. dropping the reference when we free the data structure.
      4. dropping the initial reference we obtained after setting up the
         platform data structure, or on failure.
      
      In step 2 above, where we obtain a new MDIO bus, there is no need to
      take a reference on it as we would only have to drop it immediately
      after assignment again, iow:
      
      	put_device(cd->host_dev);	/* drop original assignment ref */
      	cd->host_dev = get_device(&mdio_bus_switch->dev); /* get our ref */
      	put_device(&mdio_bus_switch->dev); /* drop of_mdio_find_bus ref */
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
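The net effect of the three-call sequence quoted in this commit can be checked with a toy reference counter. The types and helpers below are hypothetical user-space stand-ins, not the kernel's struct device, get_device() or put_device(); they only mirror the order of operations: drop the reference held by the old assignment, take our own reference on the new device, then drop the reference that came with the lookup.

```c
#include <stddef.h>

/* Toy device with a plain integer reference count. */
struct toy_device {
	int refcount;
};

/* Takes a reference and returns the device, so it can be assigned. */
struct toy_device *toy_get(struct toy_device *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Drops a reference; a NULL device is ignored. */
void toy_put(struct toy_device *d)
{
	if (d)
		d->refcount--;
}

/*
 * Replaces the device stored in *slot with 'found'.
 * 'found' arrives already holding one reference taken by the lookup.
 */
void toy_switch_host_dev(struct toy_device **slot, struct toy_device *found)
{
	toy_put(*slot);			/* drop original assignment ref */
	*slot = toy_get(found);		/* get our ref */
	toy_put(found);			/* drop the lookup's ref */
}
```

After the call, the old device has lost exactly one reference and the new device's count is unchanged overall: the lookup reference has effectively been converted into the slot's reference.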