1. 24 2月, 2016 14 次提交
  2. 14 1月, 2016 6 次提交
  3. 13 1月, 2016 4 次提交
  4. 12 1月, 2016 7 次提交
  5. 11 1月, 2016 9 次提交
    • A
      net/rtnetlink: remove unused sz_idx variable · 617cfc75
      Alexander Kuleshov 提交于
      The sz_idx variable is defined in the rtnetlink_rcv_msg(), but
      not used anywhere. Let's remove it.
      Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      617cfc75
    • W
      unix: properly account for FDs passed over unix sockets · 712f4aad
      willy tarreau 提交于
      It is possible for a process to allocate and accumulate far more FDs than
      the process' limit by sending them over a unix socket then closing them
      to keep the process' fd count low.
      
      This change addresses this problem by keeping track of the number of FDs
      in flight per user and preventing non-privileged processes from having
      more FDs in flight than their configured FD limit.
      
      Reported-by: socketpair@gmail.com
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Mitigates: CVE-2013-4312 (Linux 2.0+)
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      712f4aad
    • J
      openvswitch: update kernel doc for struct vport · c5420eb1
      Jean Sacren 提交于
      commit be4ace6e ("openvswitch: Move dev pointer into vport itself")
      
      The commit above added @dev and moved @rcu to the bottom of struct
      vport, but the change was not reflected in the kernel doc. So let's
      update the kernel doc as well.
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5420eb1
    • J
      openvswitch: fix struct geneve_port member name · 2f7066ad
      Jean Sacren 提交于
      commit 6b001e68 ("openvswitch: Use Geneve device.")
      
      The commit above introduced 'port_no' as the name for the member of
      struct geneve_port. The correct name should be 'dst_port' as described
      in the kernel doc. Let's fix that member name and all the pertinent
      instances so that both doc and code would be consistent.
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f7066ad
    • J
      openvswitch: clean up unused function · 5ea03042
      Jean Sacren 提交于
      commit 6b001e68 ("openvswitch: Use Geneve device.")
      
      The commit above deleted the only call site of ovs_tunnel_route_lookup()
      and now that function is not used any more. So let's delete the function
      definition as well.
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ea03042
    • E
      ipv6: tcp: add rcu locking in tcp_v6_send_synack() · 3e4006f0
      Eric Dumazet 提交于
      When first SYNACK is sent, we already hold rcu_read_lock(), but this
      is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
      does not hold rcu_read_lock()
      
      Fixes: 45f6fad8 ("ipv6: add complete rcu protection around np->opt")
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e4006f0
    • E
      net: add scheduling point in recvmmsg/sendmmsg · a78cb84c
      Eric Dumazet 提交于
      Applications often have to reduce number of datagrams
      they receive or send per system call to avoid starvation problems.
      
      Really the kernel should take care of this by using cond_resched(),
      so that applications can experiment bigger batch sizes.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a78cb84c
    • L
      ipv6: always add flag an address that failed DAD with DADFAILED · 3d171f39
      Lubomir Rintel 提交于
      The userspace needs to know why is the address being removed so that it can
      perhaps obtain a new address.
      
      Without the DADFAILED flag it's impossible to distinguish removal of a
      temporary and tentative address due to DAD failure from other reasons (device
      removed, manual address removal).
      Signed-off-by: NLubomir Rintel <lkundrak@v3.sk>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d171f39
    • D
      net, sched: add clsact qdisc · 1f211a1b
      Daniel Borkmann 提交于
      This work adds a generalization of the ingress qdisc as a qdisc holding
      only classifiers. The clsact qdisc works on ingress, but also on egress.
      In both cases, it's execution happens without taking the qdisc lock, and
      the main difference for the egress part compared to prior version of [1]
      is that this can be applied with _any_ underlying real egress qdisc (also
      classless ones).
      
      Besides solving the use-case of [1], that is, allowing for more programmability
      on assigning skb->priority for the mqprio case that is supported by most
      popular 10G+ NICs, it also opens up a lot more flexibility for other tc
      applications. The main work on classification can already be done at clsact
      egress time if the use-case allows and state stored for later retrieval
      f.e. again in skb->priority with major/minors (which is checked by most
      classful qdiscs before consulting tc_classify()) and/or in other skb fields
      like skb->tc_index for some light-weight post-processing to get to the
      eventual classid in case of a classful qdisc. Another use case is that
      the clsact egress part allows to have a central egress counterpart to
      the ingress classifiers, so that classifiers can easily share state (e.g.
      in cls_bpf via eBPF maps) for ingress and egress.
      
      Currently, default setups like mq + pfifo_fast would require for this to
      use, for example, prio qdisc instead (to get a tc_classify() run) and to
      duplicate the egress classifier for each queue. With clsact, it allows
      for leaving the setup as is, it can additionally assign skb->priority to
      put the skb in one of pfifo_fast's bands and it can share state with maps.
      Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
      w/o the need to perform a skb_dst_force() to hold on to it any longer. In
      lwt case, we can also use this facility to setup dst metadata via cls_bpf
      (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
      that (case of IFF_NO_QUEUE devices, for example).
      
      The realization can be done without any changes to the scheduler core
      framework. All it takes is that we have two a-priori defined minors/child
      classes, where we can mux between ingress and egress classifier list
      (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
      dev->_tx to avoid extra cacheline miss for moderate loads). The egress
      part is a bit similar modelled to handle_ing() and patched to a noop in
      case the functionality is not used. Both handlers are now called
      sch_handle_ingress() and sch_handle_egress(), code sharing among the two
      doesn't seem practical as there are various minor differences in both
      paths, so that making them conditional in a single handler would rather
      slow things down.
      
      Full compatibility to ingress qdisc is provided as well. Since both
      piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
      per netdevice, and thus ingress qdisc specific behaviour can be retained
      for user space. This means, either a user does 'tc qdisc add dev foo ingress'
      and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
      alternative, where both, ingress and egress classifier can be configured
      as in the below example. ingress qdisc supports attaching classifier to any
      minor number whereas clsact has two fixed minors for muxing between the
      lists, therefore to not break user space setups, they are better done as
      two separate qdiscs.
      
      I decided to extend the sch_ingress module with clsact functionality so
      that commonly used code can be reused, the module is being aliased with
      sch_clsact so that it can be auto-loaded properly. Alternative would have been
      to add a flag when initializing ingress to alter its behaviour plus aliasing
      to a different name (as it's more than just ingress). However, the first would
      end up, based on the flag, choosing the new/old behaviour by calling different
      function implementations to handle each anyway, the latter would require to
      register ingress qdisc once again under different alias. So, this really begs
      to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
      by its own that share callbacks used by both.
      
      Example, adding qdisc:
      
         # tc qdisc add dev foo clsact
         # tc qdisc show dev foo
         qdisc mq 0: root
         qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc clsact ffff: parent ffff:fff1
      
      Adding filters (deleting, etc works analogous by specifying ingress/egress):
      
         # tc filter add dev foo ingress bpf da obj bar.o sec ingress
         # tc filter add dev foo egress  bpf da obj bar.o sec egress
         # tc filter show dev foo ingress
         filter protocol all pref 49152 bpf
         filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
         # tc filter show dev foo egress
         filter protocol all pref 49152 bpf
         filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
      
      A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
      show an empty list for clsact. Either using the parent names (ingress/egress)
      or specifying the full major/minor will then show the related filter lists.
      
      Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
      
        [1] http://patchwork.ozlabs.org/patch/512949/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f211a1b