1. 10 11月, 2016 1 次提交
  2. 08 11月, 2016 3 次提交
  3. 05 11月, 2016 3 次提交
    • L
      net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti 提交于
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket, (e.g., ping
        replies) use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespaces by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2d118a1
    • L
      net: core: add UID to flows, rules, and routes · 622ec2c9
      Lorenzo Colitti 提交于
      - Define a new FIB rule attributes, FRA_UID_RANGE, to describe a
        range of UIDs.
      - Define a RTA_UID attribute for per-UID route lookups and dumps.
      - Support passing these attributes to and from userspace via
        rtnetlink. The value INVALID_UID indicates no UID was
        specified.
      - Add a UID field to the flow structures.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      622ec2c9
    • L
      net: core: Add a UID field to struct sock. · 86741ec2
      Lorenzo Colitti 提交于
      Protocol sockets (struct sock) don't have UIDs, but most of the
      time, they map 1:1 to userspace sockets (struct socket) which do.
      
      Various operations such as the iptables xt_owner match need
      access to the "UID of a socket", and do so by following the
      backpointer to the struct socket. This involves taking
      sk_callback_lock and doesn't work when there is no socket
      because userspace has already called close().
      
      Simplify this by adding a sk_uid field to struct sock whose value
      matches the UID of the corresponding struct socket. The semantics
      are as follows:
      
      1. Whenever sk_socket is non-null: sk_uid is the same as the UID
         in sk_socket, i.e., matches the return value of sock_i_uid.
         Specifically, the UID is set when userspace calls socket(),
         fchown(), or accept().
      2. When sk_socket is NULL, sk_uid is defined as follows:
         - For a socket that no longer has a sk_socket because
           userspace has called close(): the previous UID.
         - For a cloned socket (e.g., an incoming connection that is
           established but on which userspace has not yet called
           accept): the UID of the socket it was cloned from.
         - For a socket that has never had an sk_socket: UID 0 inside
           the user namespace corresponding to the network namespace
           the socket belongs to.
      
      Kernel sockets created by sock_create_kern are a special case
      of #1 and sk_uid is the user that created them. For kernel
      sockets created at network namespace creation time, such as the
      per-processor ICMP and TCP sockets, this is the user that created
      the network namespace.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86741ec2
  4. 04 11月, 2016 1 次提交
    • W
      ipv4: add IP_RECVFRAGSIZE cmsg · 70ecc248
      Willem de Bruijn 提交于
      The IP stack records the largest fragment of a reassembled packet
      in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
      that arrived fragmented, expose the value to allow applications to
      estimate receive path MTU.
      
      Tested:
        Sent data over a veth pair of which the source has a small mtu.
        Sent data using netcat, received using a dedicated process.
      
        Verified that the cmsg IP_RECVFRAGSIZE is returned only when
        data arrives fragmented, and in that cases matches the veth mtu.
      
          ip link add veth0 type veth peer name veth1
      
          ip netns add from
          ip netns add to
      
          ip link set dev veth1 netns to
          ip netns exec to ip addr add dev veth1 192.168.10.1/24
          ip netns exec to ip link set dev veth1 up
      
          ip link set dev veth0 netns from
          ip netns exec from ip addr add dev veth0 192.168.10.2/24
          ip netns exec from ip link set dev veth0 up
          ip netns exec from ip link set dev veth0 mtu 1300
          ip netns exec from ethtool -K veth0 ufo off
      
          dd if=/dev/zero bs=1 count=1400 2>/dev/null > payload
      
          ip netns exec to ./recv_cmsg_recvfragsize -4 -u -p 6000 &
          ip netns exec from nc -q 1 -u 192.168.10.1 6000 < payload
      
        using github.com/wdebruij/kerneltools/blob/master/tests/recvfragsize.c
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70ecc248
  5. 02 11月, 2016 3 次提交
    • P
      netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c · 8db4c5be
      Pablo Neira Ayuso 提交于
      We need this split to reuse existing codebase for the upcoming nf_tables
      socket expression.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8db4c5be
    • P
      netfilter: nf_log: add packet logging for netdev family · 1fddf4ba
      Pablo Neira Ayuso 提交于
      Move layer 2 packet logging into nf_log_l2packet() that resides in
      nf_log_common.c, so this can be shared by both bridge and netdev
      families.
      
      This patch adds the boiler plate code to register the netdev logging
      family.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1fddf4ba
    • F
      netfilter: nf_tables: add fib expression · f6d0cbcf
      Florian Westphal 提交于
      Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
      just dispatches to ipv4 or ipv6 one based on nfproto).
      
      Currently supports fetching output interface index/name and the
      rtm_type associated with an address.
      
      This can be used for adding path filtering. rtm_type is useful
      to e.g. enforce a strong-end host model where packets
      are only accepted if daddr is configured on the interface the
      packet arrived on.
      
      The fib expression is a native nftables alternative to the
      xtables addrtype and rp_filter matches.
      
      FIB result order for oif/oifname retrieval is as follows:
       - if packet is local (skb has rtable, RTF_LOCAL set, this
         will also catch looped-back multicast packets), set oif to
         the loopback interface.
       - if fib lookup returns an error, or result points to local,
         store zero result.  This means '--local' option of -m rpfilter
         is not supported. It is possible to use 'fib type local' or add
         explicit saddr/daddr matching rules to create exceptions if this
         is really needed.
       - store result in the destination register.
         In case of multiple routes, search set for desired oif in case
         strict matching is requested.
      
      ipv4 and ipv6 behave fib expressions are supposed to behave the same.
      
      [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
      
      	http://patchwork.ozlabs.org/patch/688615/
      
        to address fallout from this patch after rebasing nf-next, that was
        posted to address compilation warnings. --pablo ]
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f6d0cbcf
  6. 01 11月, 2016 1 次提交
    • E
      net: set SK_MEM_QUANTUM to 4096 · bd68a2a8
      Eric Dumazet 提交于
      Systems with large pages (64KB pages for example) do not always have
      huge quantity of memory.
      
      A big SK_MEM_QUANTUM value leads to fewer interactions with the
      global counters (like tcp_memory_allocated) but might trigger
      memory pressure much faster, giving suboptimal TCP performance
      since windows are lowered to ridiculous values.
      
      Note that sysctl_mem units being in pages and in ABI, we also need
      to change sk_prot_mem_limits() accordingly.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd68a2a8
  7. 30 10月, 2016 3 次提交
  8. 28 10月, 2016 7 次提交
  9. 27 10月, 2016 9 次提交
  10. 26 10月, 2016 1 次提交
  11. 24 10月, 2016 2 次提交
    • C
      net: ip, diag -- Add diag interface for raw sockets · 432490f9
      Cyrill Gorcunov 提交于
      In criu we are actively using diag interface to collect sockets
      present in the system when dumping applications. And while for
      unix, tcp, udp[lite], packet, netlink it works as expected,
      the raw sockets do not have. Thus add it.
      
      v2:
       - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
       - implement @destroy for diag requests (by dsa@)
      
      v3:
       - add export of raw_abort for IPv6 (by dsa@)
       - pass net-admin flag into inet_sk_diag_fill due to
         changes in net-next branch (by dsa@)
      
      v4:
       - use @pad in struct inet_diag_req_v2 for raw socket
         protocol specification: raw module carries sockets
         which may have custom protocol passed from socket()
         syscall and sole @sdiag_protocol is not enough to
         match underlied ones
       - start reporting protocol specifed in socket() call
         when sockets are raw ones for the same reason: user
         space tools like ss may parse this attribute and use
         it for socket matching
      
      v5 (by eric.dumazet@):
       - use sock_hold in raw_sock_get instead of atomic_inc,
         we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
         when looking up so counter won't be zero here.
      
      v6:
       - use sdiag_raw_protocol() helper which will access @pad
         structure used for raw sockets protocol specification:
         we can't simply rename this member without breaking uapi
      
      v7:
       - sine sdiag_raw_protocol() helper is not suitable for
         uapi lets rather make an alias structure with proper
         names. __check_inet_diag_req_raw helper will catch
         if any of structure unintentionally changed.
      
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrey Vagin <avagin@openvz.org>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      432490f9
    • T
      lwt: Remove unused len field · f76a9db3
      Thomas Graf 提交于
      The field is initialized by ILA and MPLS but never used. Remove it.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f76a9db3
  12. 23 10月, 2016 2 次提交
    • P
      udp: implement memory accounting helpers · f970bd9e
      Paolo Abeni 提交于
      Avoid using the generic helpers.
      Use the receive queue spin lock to protect the memory
      accounting operation, both on enqueue and on dequeue.
      
      On dequeue perform partial memory reclaiming, trying to
      leave a quantum of forward allocated memory.
      
      On enqueue use a custom helper, to allow some optimizations:
      - use a plain spin_lock() variant instead of the slightly
        costly spin_lock_irqsave(),
      - avoid dst_force check, since the calling code has already
        dropped the skb dst
      - avoid orphaning the skb, since skb_steal_sock() already did
        the work for us
      
      The above needs custom memory reclaiming on shutdown, provided
      by the udp_destruct_sock().
      
      v5 -> v6:
        - don't orphan the skb on enqueue
      
      v4 -> v5:
        - replace the mem_lock with the receive queue spin lock
        - ensure that the bh is always allowed to enqueue at least
          a skb, even if sk_rcvbuf is exceeded
      
      v3 -> v4:
        - reworked memory accunting, simplifying the schema
        - provide an helper for both memory scheduling and enqueuing
      
      v1 -> v2:
        - use a udp specific destrctor to perform memory reclaiming
        - remove a couple of helpers, unneeded after the above cleanup
        - do not reclaim memory on dequeue if not under memory
          pressure
        - reworked the fwd accounting schema to avoid potential
          integer overflow
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f970bd9e
    • P
      net/socket: factor out helpers for memory and queue manipulation · f8c3bf00
      Paolo Abeni 提交于
      Basic sock operations that udp code can use with its own
      memory accounting schema. No functional change is introduced
      in the existing APIs.
      
      v4 -> v5:
        - avoid whitespace changes
      
      v2 -> v4:
        - avoid exporting __sock_enqueue_skb
      
      v1 -> v2:
        - avoid export sock_rmem_free
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8c3bf00
  13. 21 10月, 2016 2 次提交
  14. 19 10月, 2016 2 次提交