1. 30 Dec 2016, 2 commits
  2. 29 Dec 2016, 1 commit
  3. 28 Dec 2016, 1 commit
  4. 26 Dec 2016, 1 commit
    • ktime: Get rid of the union · 2456e855
      Committed by Thomas Gleixner
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized
      timespec variant on 32-bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.
      
      Get rid of the union and just keep ktime_t as a simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
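      For reference, a minimal before/after sketch of the typedef change this
      commit describes. The single-member union matches what the Y2038 cleanup
      left behind; the old typedef is renamed here only so both forms can
      coexist in one snippet. Illustrative, not the full patch:

        #include <linux/types.h>

        /* Before: a union that once also held an endianness-optimized
         * timespec variant for 32-bit machines; after the Y2038 cleanup
         * only the scalar member remained.
         */
        typedef union ktime {
                s64     tv64;
        } ktime_t_old;          /* renamed for this sketch */

        /* After: plain scalar nanoseconds. */
        typedef s64 ktime_t;

        /* Call sites simplify accordingly, e.g.
         *      kt.tv64 < other.tv64    becomes    kt < other
         */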
  5. 25 Dec 2016, 1 commit
  6. 18 Dec 2016, 1 commit
    • inet: Fix get port to handle zero port number with soreuseport set · 0643ee4f
      Committed by Tom Herbert
      A user may call listen without binding an explicit port, with the intent
      that the kernel will assign an available port to the socket. In this
      case inet_csk_get_port does a port scan. For such sockets, the user may
      also set soreuseport with the intent of creating more sockets for the
      port that is selected. The problem is that the initial socket being
      opened could inadvertently choose an existing and unrelated port
      number that was already created with soreuseport.
      
      This patch adds a boolean parameter to inet_bind_conflict that indicates
      whether soreuseport is allowed for the check (in addition to
      sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
      the argument is set to true if an explicit port is being looked up (the
      snum argument is nonzero), and to false if a port scan is being done.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
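      In outline, the change gives the conflict check an extra boolean. A
      hedged sketch of the shape of the two call sites (prototype abridged,
      other parameters and surrounding logic omitted):

        #include <linux/types.h>

        struct sock;
        struct inet_bind_bucket;

        /* Sketch: bind-conflict check with the added boolean. When
         * reuseport_ok is false, matching an existing soreuseport
         * socket still counts as a conflict.
         */
        bool inet_bind_conflict(const struct sock *sk,
                                const struct inet_bind_bucket *tb,
                                bool reuseport_ok);

        /* In inet_csk_get_port() (sketch):
         *
         *   if (snum)   explicit port lookup: reuseport_ok = true,
         *               joining an existing soreuseport group is intended;
         *   else        port scan for an available port: reuseport_ok =
         *               false, so the scan cannot land on an unrelated
         *               soreuseport-bound port.
         */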
  7. 14 Dec 2016, 1 commit
  8. 09 Dec 2016, 4 commits
  9. 07 Dec 2016, 10 commits
  10. 06 Dec 2016, 4 commits
    • net_sched: gen_estimator: complete rewrite of rate estimators · 1c0d32fd
      Committed by Eric Dumazet
      1) Old code was hard to maintain, due to complex lock chains.
         (We probably will be able to remove some kfree_rcu() in callers)
      
      2) Using a single timer to update all estimators does not scale.
      
      3) Code was buggy on 32-bit kernels (WRITE_ONCE() on a 64-bit quantity
         is not supposed to work well)
      
      In this rewrite:
      
      - I removed the RB tree that had to be scanned in
        gen_estimator_active(). qdisc dumps should be much faster.
      
      - Each estimator has its own timer.
      
      - Estimations are maintained in a net_rate_estimator structure,
        instead of dirtying the qdisc. Minor, but part of the simplification.
      
      - Reading the estimator uses RCU and a seqcount to provide proper
        support for 32bit kernels.
      
      - We reduce memory needs when estimators are not used, since
        we store a pointer instead of the bytes/packets counters.
      
      - xt_rateest_mt() no longer has to grab a spinlock.
        (In the future, xt_rateest_tg() could be switched to per cpu counters)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
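      The 32-bit-safe read path mentioned above is the classic seqcount
      pattern. A minimal sketch, with assumed field names (not copied from
      the patch):

        #include <linux/seqlock.h>
        #include <linux/types.h>

        /* Assumed shape of the estimator state (names illustrative). */
        struct est_sketch {
                seqcount_t      seq;    /* writer wraps each update        */
                u64             bps;    /* 64-bit values a 32-bit CPU      */
                u64             pps;    /* cannot load/store atomically    */
        };

        /* Reader side: retry until a snapshot untorn by the writer is
         * observed. This is what lets 32-bit kernels read the 64-bit
         * rates safely, without WRITE_ONCE() tricks.
         */
        static void est_read(const struct est_sketch *e, u64 *bps, u64 *pps)
        {
                unsigned int start;

                do {
                        start = read_seqcount_begin(&e->seq);
                        *bps = e->bps;
                        *pps = e->pps;
                } while (read_seqcount_retry(&e->seq, start));
        }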
    • switch getfrag callbacks to ..._full() primitives · 0b62fca2
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • net: reorganize struct sock for better data locality · 9115e8cd
      Committed by Eric Dumazet
      Group fields used in the TX path, and keep some cache lines mostly read
      to permit sharing among CPUs.
      
      Gained two 4-byte holes on 64-bit arches.
      
      Added a placeholder for tcp tsq_flags, next to sk_wmem_alloc,
      to speed up tcp_wfree() in the following patch.
      
      I have not added ____cacheline_aligned_in_smp, this might be done later.
      I prefer doing this once inet and tcp/udp sockets reorg is also done.
      
      Tested with both TCP and UDP.
      
      UDP receiver performance under flood increased by ~20%:
      accessing sk_filter/sk_wq/sk_napi_id no longer stalls, because sk_drops
      was moved away from this critical cache line, which is now mostly read
      and shared.
      
      	/* --- cacheline 4 boundary (256 bytes) --- */
      	unsigned int               sk_napi_id;           /* 0x100   0x4 */
      	int                        sk_rcvbuf;            /* 0x104   0x4 */
      	struct sk_filter *         sk_filter;            /* 0x108   0x8 */
      	union {
      		struct socket_wq * sk_wq;                /*         0x8 */
      		struct socket_wq * sk_wq_raw;            /*         0x8 */
      	};                                               /* 0x110   0x8 */
      	struct xfrm_policy *       sk_policy[2];         /* 0x118  0x10 */
      	struct dst_entry *         sk_rx_dst;            /* 0x128   0x8 */
      	struct dst_entry *         sk_dst_cache;         /* 0x130   0x8 */
      	atomic_t                   sk_omem_alloc;        /* 0x138   0x4 */
      	int                        sk_sndbuf;            /* 0x13c   0x4 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      	int                        sk_wmem_queued;       /* 0x140   0x4 */
      	atomic_t                   sk_wmem_alloc;        /* 0x144   0x4 */
      	long unsigned int          sk_tsq_flags;         /* 0x148   0x8 */
      	struct sk_buff *           sk_send_head;         /* 0x150   0x8 */
      	struct sk_buff_head        sk_write_queue;       /* 0x158  0x18 */
      	__s32                      sk_peek_off;          /* 0x170   0x4 */
      	int                        sk_write_pending;     /* 0x174   0x4 */
      	long int                   sk_sndtimeo;          /* 0x178   0x8 */
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
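      The underlying technique, shown generically. This is not the patch
      itself; the commit explicitly defers ____cacheline_aligned_in_smp, so
      the annotation below only illustrates the idea the layout is aiming at:

        #include <linux/cache.h>
        #include <linux/types.h>

        /* Generic data-locality grouping: keep read-mostly fields
         * together so their cache line stays clean and shareable
         * across CPUs, and push frequently written fields (like a
         * drop counter) onto a different line.
         */
        struct locality_sketch {
                /* read-mostly after setup: shared clean among CPUs */
                void            *filter;
                int             rcvbuf;

                /* write-hot: dirtied on every packet; starting a new
                 * cache line keeps it from invalidating the line above
                 */
                atomic_t        drops ____cacheline_aligned_in_smp;
        };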
  11. 05 Dec 2016, 11 commits
  12. 04 Dec 2016, 3 commits
    • ipv6 addrconf: Implemented enhanced DAD (RFC7527) · adc176c5
      Committed by Erik Nordmark
      Implemented RFC7527 Enhanced DAD.
      IPv6 duplicate address detection can fail if there is some temporary
      loopback of Ethernet frames. RFC7527 solves this by including a random
      nonce in the NS messages used for DAD; if an NS is received with the
      same nonce, it is assumed to be a looped-back DAD probe and is ignored.
      RFC7527 is enabled by default. It can be disabled by setting both of
      conf/{all,interface}/enhanced_dad to zero.
      Signed-off-by: Erik Nordmark <nordmark@arista.com>
      Signed-off-by: Bob Gilligan <gilligan@arista.com>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
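      The core of the mechanism reduces to a nonce comparison on the receive
      side. A hedged sketch (identifiers hypothetical, not the patch's):

        #include <linux/string.h>
        #include <linux/types.h>

        /* RFC7527 idea in miniature (identifiers hypothetical): our
         * DAD probe carries a random nonce; an incoming NS for the
         * tentative address that echoes that same nonce is our own
         * looped-back probe and must not be treated as a duplicate.
         */
        static bool ns_is_own_looped_probe(const u8 *rx_nonce, size_t rx_len,
                                           const u8 *sent_nonce, size_t sent_len)
        {
                return rx_len == sent_len &&
                       memcmp(rx_nonce, sent_nonce, sent_len) == 0;
        }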
    • ipv4: fib: Replay events when registering FIB notifier · c3852ef7
      Committed by Ido Schimmel
      Commit b90eb754 ("fib: introduce FIB notification infrastructure")
      introduced a new notification chain to notify listeners (e.g., switchdev
      drivers) about addition and deletion of routes.
      
      However, upon registration to the chain the FIB tables can already be
      populated, which means potential listeners will have an incomplete view
      of the tables.
      
      Solve that by dumping the FIB tables and replaying the events to the
      passed notification block. The dump itself is done using RCU in order
      not to starve consumers that need RTNL to make progress.
      
      The integrity of the dump is ensured by reading the FIB change sequence
      counter before and after the dump under RTNL. This allows us to avoid
      the problematic situation in which the dumping process sends an ENTRY_ADD
      notification following an ENTRY_DEL generated by another process holding
      RTNL.
      
      Callers of the registration function may pass a callback that is
      executed in case the dump was inconsistent with current FIB tables.
      
      The number of retries until a consistent dump is achieved is set to a
      fixed number to prevent callers from looping for long periods of time.
      In case the current limit proves to be problematic in the future, it can
      be easily converted to be configurable using a sysctl.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
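      The retry protocol described above, in outline. All names below are
      hypothetical stand-ins following the description, not the patch's
      identifiers:

        #include <linux/errno.h>
        #include <linux/notifier.h>
        #include <linux/rtnetlink.h>

        #define FIB_DUMP_MAX_RETRIES    5       /* assumed fixed limit */

        /* Hypothetical helpers, standing in for the patch's internals: */
        unsigned int fib_seq_read(void);
        void fib_dump_to(struct notifier_block *nb);
        void fib_chain_register(struct notifier_block *nb);

        /* Sketch of registration-time replay: snapshot the FIB change
         * sequence under RTNL, dump under RCU, then confirm the sequence
         * is unchanged; otherwise retry a bounded number of times.
         */
        static int fib_register_and_replay(struct notifier_block *nb)
        {
                unsigned int seq;
                int retries = 0;

                do {
                        rtnl_lock();
                        seq = fib_seq_read();
                        rtnl_unlock();

                        fib_dump_to(nb);        /* RCU walk; RTNL not held */

                        rtnl_lock();
                        if (seq == fib_seq_read()) {    /* consistent dump */
                                fib_chain_register(nb);
                                rtnl_unlock();
                                return 0;
                        }
                        rtnl_unlock();
                } while (++retries < FIB_DUMP_MAX_RETRIES);

                return -EBUSY;  /* caller's inconsistency callback runs */
        }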
    • ipv4: fib: Allow for consistent FIB dumping · cacaad11
      Committed by Ido Schimmel
      The next patch will enable listeners of the FIB notification chain to
      request a dump of the FIB tables. However, since RTNL isn't taken during
      the dump, it's possible for the FIB tables to change mid-dump, which
      will result in inconsistency between the listener's table and the
      kernel's.
      
      Allow listeners to know about changes that occurred mid-dump, by adding
      a change sequence counter to each net namespace. The counter is
      incremented just before a notification is sent in the FIB chain.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
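      The counter's write side is just an increment ahead of each
      notification. A minimal sketch (names hypothetical):

        #include <linux/notifier.h>

        /* Per-namespace change counter (name hypothetical): bumped just
         * before every FIB notification, so a dumper comparing the value
         * before and after its walk can detect mid-dump changes.
         */
        struct fib_seq_sketch {
                unsigned int fib_seq;   /* read and written under RTNL */
        };

        static int call_fib_notifiers_sketch(struct fib_seq_sketch *net_fib,
                                             struct atomic_notifier_head *chain,
                                             unsigned long event, void *info)
        {
                net_fib->fib_seq++;     /* callers hold RTNL: plain ++ is safe */
                return atomic_notifier_call_chain(chain, event, info);
        }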