1. 07 Jan, 2014 2 commits
    • net: pkt_sched: PIE AQM scheme · d4b36210
      Vijay Subramanian committed
      Proportional Integral controller Enhanced (PIE) is a scheduler to address the
      bufferbloat problem.
      
      From the IETF draft below:
      " Bufferbloat is a phenomenon where excess buffers in the network cause high
      latency and jitter. As more and more interactive applications (e.g. voice over
      IP, real time video streaming and financial transactions) run in the Internet,
      high latency and jitter degrade application performance. There is a pressing
      need to design intelligent queue management schemes that can control latency and
      jitter; and hence provide desirable quality of service to users.
      
      We present here a lightweight design, PIE(Proportional Integral controller
      Enhanced) that can effectively control the average queueing latency to a target
      value. Simulation results, theoretical analysis and Linux testbed results have
      shown that PIE can ensure low latency and achieve high link utilization under
      various congestion situations. The design does not require per-packet
      timestamp, so it incurs very small overhead and is simple enough to implement
      in both hardware and software.  "
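
The quoted abstract says PIE steers the average queueing latency toward a target without per-packet timestamps. The commit message does not spell out the control law, so the following is only a minimal user-space sketch of the general shape described in the PIE draft: the delay is estimated from queue length and drain rate, and the drop probability is adjusted by a proportional term and a trend term. The names `pie_state`, `pie_estimate_qdelay` and `pie_update`, the floating-point arithmetic, and the gain values in the usage note are assumptions for illustration, not the kernel implementation.

```c
#include <stddef.h>

/* Hypothetical controller state; the kernel code uses scaled integers. */
struct pie_state {
    double drop_prob;   /* current drop probability, kept in [0, 1] */
    double qdelay_old;  /* queueing delay seen at the previous update (s) */
};

/* Estimate queueing delay from queue length and measured drain rate,
 * so that no per-packet timestamp is needed. */
static double pie_estimate_qdelay(size_t qlen_bytes, double drain_rate_bytes_per_s)
{
    if (drain_rate_bytes_per_s <= 0.0)
        return 0.0;
    return (double)qlen_bytes / drain_rate_bytes_per_s;
}

/* Periodic update: alpha weights the deviation from the target delay,
 * beta weights the change in delay since the last update. */
static double pie_update(struct pie_state *st, double qdelay, double target,
                         double alpha, double beta)
{
    double p = st->drop_prob
             + alpha * (qdelay - target)
             + beta * (qdelay - st->qdelay_old);

    if (p < 0.0)
        p = 0.0;
    if (p > 1.0)
        p = 1.0;

    st->drop_prob = p;
    st->qdelay_old = qdelay;
    return p;
}
```

Calling `pie_update(&st, 0.030, 0.020, 0.125, 1.25)` on a zeroed state raises the drop probability because the delay is both above the target and rising; a following call with a delay of 0 drives it back down to 0.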
      
      Many thanks to Dave Taht for extensive feedback, reviews, testing and
      suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
      suggestions.  Naeem Khademi and Dave Taht independently contributed to ECN
      support.
      
      For more information, please see the technical paper about PIE in the IEEE
      Conference on High Performance Switching and Routing 2013. A copy of the paper
      can be found at ftp://ftpeng.cisco.com/pie/.
      
      Please also refer to the IETF draft submission at
      http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
      
      All relevant code, documents and test scripts and results can be found at
      ftp://ftpeng.cisco.com/pie/.
      
      For problems with the iproute2/tc or Linux kernel code, please contact Vijay
      Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili
      Prabhu (mysuryan@cisco.com).
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
      CC: Dave Taht <dave.taht@bufferbloat.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: Fix build failure in nfnetlink_queue_core.c. · 83111e7f
      David S. Miller committed
      net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
      net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
      net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
      make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1
      
      Just a missing include of net/tcp_states.h
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Jan, 2014 1 commit
  3. 05 Jan, 2014 5 commits
  4. 04 Jan, 2014 15 commits
    • llc: make lock static · 5e419e68
      stephen hemminger committed
      The llc_sap_list_lock does not need to be global, only acquired
      in core.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socket: cleanups · 8f09898b
      stephen hemminger committed
      Namespace related cleaning
      
       * make cred_to_ucred static
       * remove unused sock_rmalloc function
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Use percpu Cache route in IP tunnels · 9a4aa9af
      Tom Herbert committed
      A percpu route cache eliminates sharing of the dst refcnt between CPUs.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Cache dst in tunnels · 7d442fab
      Tom Herbert committed
      Avoid doing a route lookup on every packet being tunneled.
      
      In ip_tunnel.c cache the route returned from ip_route_output if
      the tunnel is "connected", i.e. all the routing parameters for a
      packet are taken from the tunnel parms. Specifically, the tunnel is
      not an NBMA tunnel and tos is taken from the tunnel parms (not the
      inner packet).
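
The caching idea above can be sketched in plain C, assuming made-up types. `struct route`, `struct tunnel`, `slow_route_lookup` and `tunnel_get_route` are hypothetical names, and the sketch leaves out everything the kernel needs for validating or invalidating a cached entry; it only shows that a "connected" tunnel pays for the lookup once.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a routing entry and a tunnel. */
struct route {
    int id;
};

struct tunnel {
    int connected;              /* nonzero: routing params come from tunnel parms */
    const struct route *cached; /* route remembered from an earlier lookup */
};

static int lookups; /* counts how often the slow path ran */

/* Placeholder for the per-packet route lookup the commit wants to avoid. */
static const struct route *slow_route_lookup(void)
{
    static const struct route rt = { 1 };

    lookups++;
    return &rt;
}

/* Reuse the cached route for connected tunnels; otherwise look it up
 * for every packet, because the result may depend on the inner packet. */
static const struct route *tunnel_get_route(struct tunnel *t)
{
    const struct route *rt;

    if (t->connected && t->cached != NULL)
        return t->cached;

    rt = slow_route_lookup();
    if (t->connected)
        t->cached = rt;
    return rt;
}
```

Two calls on a connected tunnel trigger a single lookup, while a tunnel that is not connected performs one lookup per call.
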
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Add process name and pid to deprecation warnings · f916ec96
      Neil Horman committed
      Recently I updated the sctp socket option deprecation warnings to be both a bit
      more clear and ratelimited to prevent user processes from spamming the log file.
      Ben Hutchings suggested that I add the process name and pid to these warnings so
      that users can tell who is responsible for using the deprecated apis.  This
      patch accomplishes that.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: dump sets in all existing families · c9c8e485
      Pablo Neira Ayuso committed
      This patch allows you to dump all sets available in all of
      the registered families. This allows you to use NFPROTO_UNSPEC
      to dump all existing sets, similarly to other existing table,
      chain and rule operations.
      
      This patch is based on original patch from Arturo Borrero
      González.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: x_tables: lightweight process control group matching · 82a37132
      Daniel Borkmann committed
      It would be useful e.g. in a server or desktop environment to have
      a facility in the notion of fine-grained "per application" or "per
      application group" firewall policies. Probably, users in the mobile,
      embedded area (e.g. Android based) with different security policy
      requirements for application groups could have great benefit from
      that as well. For example, with a little bit of configuration effort,
      an admin could whitelist well-known applications, and thus block
      otherwise unwanted "hard-to-track" applications like [1] from a
      user's machine. Blocking is just one example; netfilter allows many
      other scenarios/policies as well, e.g. fine-grained settings for
      where applications are allowed to connect/send traffic to, application
      traffic marking/conntracking, application-specific packet mangling,
      and so on.
      
      Implementation of PID-based matching would not be appropriate
      as they frequently change, and child tracking would make that
      even more complex and ugly. Cgroups would be a perfect candidate
      for accomplishing that as they associate a set of tasks with a
      set of parameters for one or more subsystems, in our case the
      netfilter subsystem, which, of course, can be combined with other
      cgroup subsystems into something more complex if needed.
      
      As mentioned, to overcome this constraint, such processes could
      be placed into one or multiple cgroups where different fine-grained
      rules can be defined depending on the application scenario, while
      e.g. everything else that is not part of that could be dropped (or
      vice versa), thus making life harder for unwanted processes to
      communicate to the outside world. So, we make use of cgroups here
      to track jobs and limit their resources in terms of iptables
      policies; in other words, limiting, tracking, etc what they are
      allowed to communicate.
      
      In our case we're working on outgoing traffic based on which local
      socket it originated from. Also, one doesn't even need to have
      a priori knowledge of the application internals regarding their
      particular use of ports or protocols. Matching is *extremely*
      lightweight as we just test for the sk_classid marker of sockets,
      originating from net_cls. net_cls and netfilter do not contradict
      each other; in fact, each construct can live as standalone or they
      can be used in combination with each other, which is perfectly fine,
      plus it serves Tejun's requirement to not introduce a new cgroups
      subsystem. This results in a very minimal and efficient module that
      doesn't add anything except netfilter code.
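
To illustrate why the match is cheap, here is a minimal user-space sketch of a classid comparison with optional inversion. `struct sock_sketch`, `struct cgroup_match_info` and `cgroup_match` are hypothetical stand-ins chosen for this example; only the idea of testing the socket's `sk_classid` comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket attached to an outgoing packet. */
struct sock_sketch {
    uint32_t sk_classid; /* class id assigned through net_cls */
};

/* Hypothetical rule parameters: the class id to match and an invert flag. */
struct cgroup_match_info {
    uint32_t id;
    bool invert;
};

/* One integer comparison per packet, flipped when the rule is inverted.
 * Packets without a local socket never match in this sketch. */
static bool cgroup_match(const struct cgroup_match_info *info,
                         const struct sock_sketch *sk)
{
    if (sk == NULL)
        return false;
    return (info->id == sk->sk_classid) != info->invert;
}
```

With `id = 1` and `invert = true`, a socket whose class id is not 1 matches, which is the situation a rule with an inverted class-id test and a DROP target would act on.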
      
      One possible, minimal usage example (many other iptables options
      can be applied obviously):
      
       1) Configuring cgroups if not already done, e.g.:
      
        mkdir /sys/fs/cgroup/net_cls
        mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
        mkdir /sys/fs/cgroup/net_cls/0
        echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
        (resp. a real flow handle id for tc)
      
       2) Configuring netfilter (iptables-nftables), e.g.:
      
        iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
      
       3) Running applications, e.g.:
      
        ping 208.67.222.222  <pid:1799>
        echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
        64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
        [...]
        ping 208.67.220.220  <pid:1804>
        ping: sendmsg: Operation not permitted
        [...]
        echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
        64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
        [...]
      
      Of course, real-world deployments would make use of cgroups user
      space toolsuite, or own custom policy daemons dynamically moving
      applications from/to various cgroups.
      
        [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: netprio: rename config to be more consistent with cgroup configs · 86f8515f
      Daniel Borkmann committed
      While we're at it and introduced CGROUP_NET_CLASSID, let's also make
      NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
      into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
      CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
      option consistent among networking cgroups, but also among cgroups
      CONFIG conventions in general as the vast majority has a prefix of
      CONFIG_CGROUP_<SUBSYS>.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: net_cls: move cgroupfs classid handling into core · fe1217c4
      Daniel Borkmann committed
      Zefan Li requested [1] to perform the following cleanup/refactoring:
      
      - Split cgroupfs classid handling into net core to better express a
        possible more generic use.
      
      - Disable module support for cgroupfs bits as the majority of other
        cgroupfs subsystems do not have that, and it does not seem to be
        wanted from the cgroup side. Zefan might want to follow up for
        netprio later on.
      
      - By this, code can be further reduced which previously took care of
        functionality built when compiled as module.
      
      cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
      that we are consistent with {netclassid,netprio}_cgroup naming that is
      under net/core/ as suggested by Zefan.
      
      No change in functionality, but only code refactoring that is being
      done here.
      
       [1] http://patchwork.ozlabs.org/patch/304825/
      Suggested-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_CT: fix error value in xt_ct_tg_check() · 14abfa16
      Eric Leblond committed
      If setting event mask fails then we were returning 0 for success.
      This patch updates return code to -EINVAL in case of problem.
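
The class of bug described here can be shown with a small stand-alone sketch. `check_before_fix` and `check_after_fix` are hypothetical functions written for this example and are not the body of xt_ct_tg_check(); they only contrast an error path that leaves the return value at 0 with one that sets -EINVAL first.

```c
#include <errno.h>

/* Error path reached with ret still at its initial value of 0,
 * so the caller sees success even though the step failed. */
static int check_before_fix(int event_mask_ok)
{
    int ret = 0;

    if (!event_mask_ok)
        goto err;
    return 0;
err:
    return ret; /* still 0: failure reported as success */
}

/* Same flow, but the failure branch sets an error code before jumping. */
static int check_after_fix(int event_mask_ok)
{
    int ret = 0;

    if (!event_mask_ok) {
        ret = -EINVAL;
        goto err;
    }
    return 0;
err:
    return ret;
}
```
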
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: remove dead code · dcd93ed4
      stephen hemminger committed
      The following code is not used in current upstream code.
      Some of this seems to be old hooks, others might be used by some
      out-of-tree module (which I don't care about breaking), and
      need_ipv4_conntrack was used by old NAT code but is no longer
      called.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ipset: remove unused code · 02eca9d2
      stephen hemminger committed
      Function never used in current upstream code.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_nat: add full port randomization support · 34ce3240
      Daniel Borkmann committed
      We currently use prandom_u32() for allocation of ports in tcp bind(0)
      and udp code. In case of plain SNAT we try to keep the ports as is
      or increment on collision.
      
      SNAT --random mode uses per-destination incrementing port
      allocation. A recent paper [1] pointed out that this mode of port
      allocation makes it possible for an off-path attacker to find the
      randomly allocated ports through a timing side-channel in a socket
      overloading attack.
      
      So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
      in regard to the attack described in this paper. As we need to keep
      compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
      that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
      selection algorithm with a simple prandom_u32() in order to mitigate
      this attack vector. Note that the lfsr113's internal state is
      periodically reseeded by the kernel through a local secure entropy
      source.
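
The difference between the two modes can be sketched as follows. This is a simplified user-space illustration, not the nf_nat code: `struct port_range`, `offset_incrementing`, `offset_fully_random` and `port_from_offset` are hypothetical names, the destination hash is passed in as a plain number, and the C library's `rand()` merely stands in for the kernel's prandom_u32().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical port range: ports min .. min + size - 1, size must be > 0. */
struct port_range {
    uint16_t min;
    uint32_t size;
};

/* Map an offset into the configured port range. */
static uint16_t port_from_offset(const struct port_range *r, uint32_t off)
{
    return (uint16_t)(r->min + off % r->size);
}

/* Per-destination incrementing allocation: a hash of the destination
 * plus a running counter, so consecutive ports are adjacent. */
static uint32_t offset_incrementing(uint32_t dst_hash, uint32_t *counter)
{
    return dst_hash + (*counter)++;
}

/* Fully random allocation: every offset is drawn independently.
 * rand() is only a placeholder for the kernel's PRNG. */
static uint32_t offset_fully_random(void)
{
    return (uint32_t)rand();
}
```

With the incrementing variant, an observer who learns one port for a destination can guess the next one; with the fully random variant, one observed port says nothing about the next draw.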
      
      More details can be found in [1], the basic idea is to send bursts
      of packets to a socket to overflow its receive queue and measure
      the latency to detect a possible retransmit when the port is found.
      Because ports are allocated incrementally for a given destination and
      port, further allocations can be predicted. This information could
      then be used by an attacker, e.g. for cache poisoning, NS pinning, and
      degradation of service attacks against DNS servers [1]:
      
        The best defense against the poisoning attacks is to properly
        deploy and validate DNSSEC; DNSSEC provides security not only
        against off-path attacker but even against MitM attacker. We hope
        that our results will help motivate administrators to adopt DNSSEC.
        However, full DNSSEC deployment make take significant time, and
        until that happens, we recommend short-term, non-cryptographic
        defenses. We recommend to support full port randomisation,
        according to practices recommended in [2], and to avoid
        per-destination sequential port allocation, which we show may be
        vulnerable to derandomisation attacks.
      
      Joint work between Hannes Frederic Sowa and Daniel Borkmann.
      
       [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
       [2] http://arxiv.org/pdf/1205.5190v1.pdf
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: remove unused variable in nf_tables_dump_set() · 720e0dfa
      Michal Nazarewicz committed
      The nfmsg variable is not used (except in sizeof operator which does
      not care about its value) between the first and second time it is
      assigned the value.  Furthermore, nlmsg_data has no side effects, so
      the assignment can be safely removed.
      Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix type in parsing in nf_tables_set_alloc_name() · 14662917
      Daniel Borkmann committed
      In nf_tables_set_alloc_name(), we are trying to find a new, unused
      name for our new set and iterate through the list of present sets.
      As far as I can see, we're using format string %d to parse already
      present names in order to mark their presence in a bitmap, so that
      we can later on find the first 0 in that map to assign the new set
      name to. We should rather use a temporary variable of type int to
      store the result of sscanf() in, and to make sanity checks on.
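
A minimal user-space sketch of the approach described above: parse each existing name with %d into a temporary int, sanity-check it, mark it in a map, then pick the first free slot. `first_free_set_index`, the `"set%d"` pattern and `MAX_SETS` are assumptions made for this example; the kernel function works on its own set list and bitmap.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SETS 64 /* hypothetical capacity of the in-use map */

/* Return the lowest index not used by any name of the form "set<N>",
 * or -1 if all MAX_SETS indices are taken. */
static int first_free_set_index(const char *const *names, size_t count)
{
    unsigned char inuse[MAX_SETS] = { 0 };
    size_t i;
    int tmp; /* %d stores into an int, so parse into a temporary int */

    for (i = 0; i < count; i++) {
        if (sscanf(names[i], "set%d", &tmp) != 1)
            continue; /* name does not follow the pattern */
        if (tmp < 0 || tmp >= MAX_SETS)
            continue; /* sanity check before using it as an index */
        inuse[tmp] = 1;
    }

    for (tmp = 0; tmp < MAX_SETS; tmp++) {
        if (!inuse[tmp])
            return tmp;
    }
    return -1;
}
```

Checking the parsed value as a signed int before indexing is what keeps a name such as "set-5" from touching memory outside the map.
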
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 03 Jan, 2014 1 commit
    • net: revert "sched classifier: make cgroup table local" · c1ddf295
      Cong Wang committed
      This reverts commit de6fb288.
      Otherwise we got:
      
      net/sched/cls_cgroup.c:106:29: error: static declaration of ‘net_cls_subsys’ follows non-static declaration
       static struct cgroup_subsys net_cls_subsys = {
                                   ^
      In file included from include/linux/cgroup.h:654:0,
                       from net/sched/cls_cgroup.c:18:
      include/linux/cgroup_subsys.h:35:29: note: previous declaration of ‘net_cls_subsys’ was here
       SUBSYS(net_cls)
                                   ^
      make[2]: *** [net/sched/cls_cgroup.o] Error 1
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 02 Jan, 2014 12 commits
  7. 01 Jan, 2014 4 commits