1. 03 Dec, 2016 1 commit
    • tcp: randomize tcp timestamp offsets for each connection · 95a22cae
      Florian Westphal committed
      jiffies-based timestamps make it easy to infer the number of devices
      behind NAT translators and also make tracking of hosts simpler.
      
      commit ceaa1fef ("tcp: adding a per-socket timestamp offset")
      added the main infrastructure that is needed for per-connection ts
      randomization, in particular writing/reading the on-wire tcp header
      format takes the offset into account so the rest of the stack can use normal
      tcp_time_stamp (jiffies).
      
      So only two items are left:
       - add a tsoffset for request sockets
       - extend the tcp isn generator to also return another 32bit number
         in addition to the ISN.
      
      Re-use of ISN generator also means timestamps are still monotonically
      increasing for same connection quadruple, i.e. PAWS will still work.
      
      Includes fixes from Eric Dumazet.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
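The idea of drawing the ISN and the timestamp offset from one generator can be sketched as follows. This is an illustrative model only, not the kernel code: the names `isn_and_ts_offset` and `wire_timestamp`, the use of HMAC-SHA256, and the `SECRET` value are assumptions made for the example. It shows why a fixed connection quadruple always gets the same offset, so the timestamps seen on the wire keep increasing with the local clock.

```python
import hashlib
import hmac
import os
import struct

# Hypothetical per-boot secret, used here only for illustration.
SECRET = os.urandom(16)


def isn_and_ts_offset(saddr, daddr, sport, dport, secret=SECRET):
    """Return (isn_base, ts_offset): two 32-bit values from one keyed hash."""
    msg = f"{saddr}|{daddr}|{sport}|{dport}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    # Take the first eight bytes of the digest as two unsigned 32-bit integers.
    isn_base, ts_offset = struct.unpack_from("!II", digest)
    return isn_base, ts_offset


def wire_timestamp(local_clock, ts_offset):
    """Timestamp as sent on the wire: local clock plus offset, modulo 2**32."""
    return (local_clock + ts_offset) & 0xFFFFFFFF
```

Because the offset depends only on the quadruple and the secret, two hosts behind the same NAT expose unrelated timestamp sequences, while a single connection still sees a clock that advances one tick at a time.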
  2. 02 Dec, 2016 5 commits
  3. 01 Dec, 2016 2 commits
  4. 30 Nov, 2016 7 commits
  5. 28 Nov, 2016 2 commits
  6. 26 Nov, 2016 8 commits
    • tipc: resolve connection flow control compatibility problem · 6998cc6e
      Jon Paul Maloy committed
      In commit 10724cc7 ("tipc: redesign connection-level flow control")
      we replaced the previous message based flow control with one based on
      1k blocks. In order to ensure backwards compatibility the mechanism
      falls back to using message as base unit when it senses that the peer
      doesn't support the new algorithm. The default flow control window,
      i.e., how many units can be sent before the sender blocks and waits
      for an acknowledge (aka advertisement), is 512. This was tested against
      the previous version, which uses an acknowledge frequency of one ack per
      256 received messages, and found to work fine.
      
      However, we missed the fact that versions older than Linux 3.15 use an
      acknowledge frequency of 512, which is exactly the limit where a 4.6+
      sender will stop and wait for acknowledge. This would also work fine if
      it weren't for the fact that if the first sent message on a 4.6+ server
      side is an empty SYNACK, this one is also counted as a sent message,
      while it is not counted as a received message on a legacy 3.15-receiver.
      This leads to the sender always being one step ahead of the receiver, a
      scenario causing the sender to block after 512 sent messages, while the
      receiver only has registered 511 read messages. Hence, the legacy
      receiver is not triggered to send an acknowledge, with a permanently
      blocked sender as a result.
      
      We solve this deadlock by simply allowing the sender to send one more
      message before it blocks, i.e., by making a minimal change to the
      condition used for determining connection congestion.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
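The off-by-one described in this commit message can be reproduced with a small counting model. This is a sketch under stated assumptions, not the TIPC implementation: `simulate`, `WINDOW`, and `LEGACY_ACK_EVERY` are names invented for the example, and the change is modelled as relaxing the congestion test from "at least the window" to "more than the window".

```python
WINDOW = 512            # default flow control window from the commit message
LEGACY_ACK_EVERY = 512  # ack frequency of the legacy receiver


def simulate(data_messages, allow_one_extra):
    """Return how many data messages are delivered before the sender stalls."""
    sent_unacked = 1  # the empty SYNACK is counted by the sender ...
    rcv_unacked = 0   # ... but not by the legacy receiver
    delivered = 0
    while delivered < data_messages:
        if allow_one_extra:
            congested = sent_unacked > WINDOW
        else:
            congested = sent_unacked >= WINDOW
        if congested:
            # Acks are only triggered by received messages, and every ack
            # has already been processed, so nothing can unblock the sender.
            break
        sent_unacked += 1
        rcv_unacked += 1
        delivered += 1
        if rcv_unacked == LEGACY_ACK_EVERY:
            # The receiver acknowledges a full batch of messages.
            sent_unacked -= LEGACY_ACK_EVERY
            rcv_unacked = 0
    return delivered
```

With the strict test the sender stalls after 511 data messages, one short of what the receiver needs to send its acknowledge; with one extra message allowed the transfer runs to completion.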
    • net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS · 8006f6bf
      Miroslav Lichvar committed
      The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
      command and likewise it shouldn't require the CAP_NET_ADMIN capability.
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve sanity check for received domain records · d876a4d2
      Jon Paul Maloy committed
      In commit 35c55c98 ("tipc: add neighbor monitoring framework") we
      added a data area to the link monitor STATE messages under the
      assumption that previous versions did not use any such data area.
      
      For versions older than Linux 4.3 this assumption is not correct. In
      those versions, all STATE messages sent out from a node inadvertently
      contain a 16 byte data area containing a string, a leftover from
      previous RESET messages which were using this during the setup phase.
      This string serves no purpose in STATE messages, and should not be there.
      
      Unfortunately, this data area is delivered to the link monitor
      framework, where a sanity check catches that it is not a correct domain
      record, and drops it. It also issues a rate limited warning about the
      event.
      
      Since such events occur much more frequently than anticipated, we now
      choose to remove the warning in order to not fill the kernel log with
      useless contents. We also make the sanity check stricter, to further
      reduce the risk that such data is inadvertently admitted.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix compatibility bug in link monitoring · f7967556
      Jon Paul Maloy committed
      commit 81729810 ("tipc: fix link priority propagation") introduced a
      compatibility problem between TIPC versions newer than Linux 4.6 and
      those older than Linux 4.4. In versions later than 4.4, link STATE
      messages only contain a non-zero link priority value when the sender
      wants the receiver to change its priority. This has the effect that the
      receiver resets itself in order to apply the new priority. This works
      well, and is consistent with the said commit.
      
      However, in versions older than 4.4 a valid link priority is present in
      all sent link STATE messages, leading to cyclic link establishment and
      reset on the 4.6+ node.
      
      We fix this by adding a test that the received value should not only
      be valid, but also differ from the current value in order to cause the
      receiving link endpoint to reset.
      Reported-by: Amar Nv <amar.nv005@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
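The added test can be pictured as a small predicate. This is only a sketch: the function name `should_reset_for_priority` is hypothetical, and the valid range of 1 to 31 (with 0 meaning "no change requested") is an assumption made for the example, not something stated in the commit message.

```python
TIPC_MAX_LINK_PRI = 31  # assumed upper bound of a valid priority


def should_reset_for_priority(received_prio, current_prio):
    """Reset only if the received priority is valid and differs from ours."""
    valid = 1 <= received_prio <= TIPC_MAX_LINK_PRI
    return valid and received_prio != current_prio
```

An older peer that repeats the current priority in every STATE message then no longer causes a reset, while a genuine change request still does.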
    • net: properly flush delay-freed skbs · f52dffe0
      Eric Dumazet committed
      Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
      This puts the skb in a special per-CPU queue, napi_alloc_cache, to get bulk
      frees.
      
      It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
      limit quite often, with skbs that were queued hundreds of usec earlier.
      I measured this can take ~6000 nsec to perform one flush.
      
      __kfree_skb_flush() can be called from two points right now:
      
      1) From net_tx_action(), but only for skbs that were queued to
      sd->completion_queue.
      
       -> Irrelevant for NAPI drivers in normal operation.
      
      2) From net_rx_action(), but only under high stress or if RPS/RFS has a
      pending action.
      
      This patch changes net_rx_action() to perform the flush in all cases and
      after more urgent operations happened (like kicking remote CPUs for
      RPS/RFS).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
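A toy model makes the latency effect of flushing on every pass visible. It is a sketch under simplifying assumptions, not kernel code: one skb completes per poll round, the function name `max_wait_usec` is hypothetical, and the cache limit of 64 is an illustrative value.

```python
NAPI_SKB_CACHE_SIZE = 64  # illustrative cache limit, assumed for this model


def max_wait_usec(completions, poll_interval_usec, flush_every_poll):
    """Return the longest time any skb sat in the cache before being freed."""
    cache = []  # enqueue time of each skb still waiting for a bulk free
    worst = 0
    for round_no in range(completions):
        now = round_no * poll_interval_usec
        cache.append(now)
        # Flush either on every round or only when the cache is full.
        if flush_every_poll or len(cache) >= NAPI_SKB_CACHE_SIZE:
            worst = max(worst, now - cache[0])
            cache.clear()
    return worst
```

In this model, with a 10 usec poll interval, flushing only at the limit leaves the oldest skb queued for 630 usec, while flushing on every round frees it immediately.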
    • net: ipv4, ipv6: run cgroup eBPF egress programs · 33b48679
      Daniel Mack committed
      If the cgroup associated with the receiving socket has eBPF
      programs installed, run them from ip_output(), ip6_output() and
      ip_mc_output(). From the mentioned functions we have two socket contexts
      as per 7026b1dd ("netfilter: Pass socket pointer down through
      okfn()."). We explicitly need to use sk instead of skb->sk here,
      since otherwise the same program would run multiple times on egress
      when encap devices are involved, which is not desired in our case.
      
      eBPF programs used in this context are expected to either return 1 to
      let the packet pass, or != 1 to drop it. The programs have access to
      the skb through bpf_skb_load_bytes(), and the payload starts at the
      network headers (L3).
      
      Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
      for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
      the feature is unused.
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
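The return-value convention and the L3 starting offset can be modelled in a few lines. This is a Python stand-in, not an eBPF program: `skb_load_bytes`, `drop_udp`, and `run_filter` are hypothetical names, and mapping a drop to `-EPERM` is an assumption made for the example.

```python
import errno

IPPROTO_UDP = 17


def skb_load_bytes(packet, offset, length):
    """Stand-in for bpf_skb_load_bytes(): offset 0 is the first L3 header byte."""
    if offset < 0 or offset + length > len(packet):
        raise ValueError("read outside of packet")
    return packet[offset:offset + length]


def drop_udp(packet):
    """Example program: return 1 to pass the packet, anything else to drop it."""
    protocol = skb_load_bytes(packet, 9, 1)[0]  # IPv4 protocol field
    return 0 if protocol == IPPROTO_UDP else 1


def run_filter(program, packet):
    """Hypothetical caller: 0 lets the packet continue, -EPERM drops it."""
    return 0 if program(packet) == 1 else -errno.EPERM
```

For instance, a 20-byte IPv4 header whose protocol byte is 17 yields a negative result from `run_filter(drop_udp, header)`, while a header with protocol 6 (TCP) yields 0.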
    • net: filter: run cgroup eBPF ingress programs · c11cd3a6
      Daniel Mack committed
      If the cgroup associated with the receiving socket has eBPF
      programs installed, run them from sk_filter_trim_cap().
      
      eBPF programs used in this context are expected to either return 1 to
      let the packet pass, or != 1 to drop it. The programs have access to
      the skb through bpf_skb_load_bytes(), and the payload starts at the
      network headers (L3).
      
      Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
      for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
      the feature is unused.
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add new prog type for cgroup socket filtering · 0e33661d
      Daniel Mack committed
      This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that
      it does not allow BPF_LD_[ABS|IND] instructions and hooks up the
      bpf_skb_load_bytes() helper.
      
      Programs of this type will be attached to cgroups for network filtering
      and accounting.
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 25 Nov, 2016 8 commits
  8. 24 Nov, 2016 2 commits
  9. 23 Nov, 2016 3 commits
  10. 22 Nov, 2016 2 commits