1. 22 Jan, 2011 2 commits
  2. 21 Jan, 2011 9 commits
  3. 20 Jan, 2011 11 commits
    • C
      netfilter: nf_nat: place conntrack in source hash after SNAT is done · 41a7cab6
      Changli Gao committed
      If SNAT isn't done, the other conntracks (cts) may get the wrong info.
      
      As the filter table comes after the DNAT table, packets dropped in the
      filter table would otherwise also burden the bysource hash table.
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • F
      netfilter: do not omit re-route check on NF_QUEUE verdict · 28a51ba5
      Florian Westphal committed
      ret != NF_QUEUE only works in the "--queue-num 0" case; for
      queues > 0 the test should be '(ret & NF_VERDICT_MASK) != NF_QUEUE'.
      
      However, NF_QUEUE no longer DROPs the skb unconditionally if queueing
      fails (due to NF_VERDICT_FLAG_QUEUE_BYPASS verdict flag), so the
      re-route test should also be performed if this flag is set in the
      verdict.
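
To illustrate why the plain comparison only works for queue 0, here is a minimal userspace sketch. The constants are local copies of the verdict encoding in <linux/netfilter.h>, and the helper names (queue_verdict, is_queue_naive, is_queue_masked) are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the verdict encoding from <linux/netfilter.h>. */
#define NF_QUEUE                     3u
#define NF_VERDICT_MASK              0x000000ffu
#define NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000u
#define NF_VERDICT_QBITS             16

/* Hypothetical helper: build an NF_QUEUE verdict for a queue number,
 * with the queue number stored in the upper 16 bits. */
static uint32_t queue_verdict(uint16_t queue_num)
{
	return ((uint32_t)queue_num << NF_VERDICT_QBITS) | NF_QUEUE;
}

/* Naive test: only matches queue 0 with no flags set. */
static bool is_queue_naive(uint32_t ret)
{
	return ret == NF_QUEUE;
}

/* Masked test: matches NF_QUEUE for any queue number or flag. */
static bool is_queue_masked(uint32_t ret)
{
	return (ret & NF_VERDICT_MASK) == NF_QUEUE;
}
```

With these helpers, is_queue_naive(queue_verdict(5)) is false while is_queue_masked(queue_verdict(5)) is true, also when NF_VERDICT_FLAG_QUEUE_BYPASS is or-ed in.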
      
      The full test would then look something like
      
      && ((ret & NF_VERDICT_MASK) == NF_QUEUE && (ret & NF_VERDICT_FLAG_QUEUE_BYPASS))
      
      This is rather ugly, so just remove the NF_QUEUE test altogether.
      
      The only effect is that we might perform an unnecessary route lookup
      in the NF_QUEUE case.
      
      ip6table_mangle did not have such a check.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • E
      net_sched: cleanups · cc7ec456
      Eric Dumazet committed
      Cleanup net/sched code to current CodingStyle and practices.
      
      Reduce inline abuse
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
    • J
      net_sched: implement a root container qdisc sch_mqprio · b8970f0b
      John Fastabend committed
      This implements a mqprio queueing discipline that by default creates
      a pfifo_fast qdisc per tx queue and provides the needed configuration
      interface.
      
      Using the mqprio qdisc the number of tcs currently in use along
      with the range of queues allotted to each class can be configured. By
      default skbs are mapped to traffic classes using the skb priority.
      This mapping is configurable.
      
      Configurable parameters:
      
      struct tc_mqprio_qopt {
      	__u8    num_tc;
      	__u8    prio_tc_map[TC_BITMASK + 1];
      	__u8    hw;
      	__u16   count[TC_MAX_QUEUE];
      	__u16   offset[TC_MAX_QUEUE];
      };
      
      Here the count/offset pairing gives the queue alignment and the
      prio_tc_map gives the mapping from skb->priority to tc.
      
      The hw bit determines if the hardware should configure the count
      and offset values. If the hardware bit is set then the operation
      will fail if the hardware does not implement the ndo_setup_tc
      operation. This is to avoid undetermined states where the hardware
      may or may not control the queue mapping. Also minimal bounds
      checking is done on the count/offset to verify a queue does not
      exceed num_tx_queues and that queue ranges do not overlap. Otherwise
      it is left to user policy or hardware configuration to create
      useful mappings.
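
The bounds checking described above could be sketched roughly as follows. This is a userspace illustration, not the qdisc's actual code: the struct is a local copy using <stdint.h> types, TC_BITMASK and TC_MAX_QUEUE are assumed to be 15 and 16, and mqprio_ranges_valid() and two_class_example_valid() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TC_BITMASK   15   /* assumed value */
#define TC_MAX_QUEUE 16   /* assumed value */

/* Local copy of the configuration layout shown above. */
struct tc_mqprio_qopt_sketch {
	uint8_t  num_tc;
	uint8_t  prio_tc_map[TC_BITMASK + 1];
	uint8_t  hw;
	uint16_t count[TC_MAX_QUEUE];
	uint16_t offset[TC_MAX_QUEUE];
};

/* Check that no queue range exceeds num_tx_queues and no two overlap. */
static bool mqprio_ranges_valid(const struct tc_mqprio_qopt_sketch *q,
				unsigned int num_tx_queues)
{
	unsigned int i, j;

	if (q->num_tc > TC_MAX_QUEUE)
		return false;

	for (i = 0; i < q->num_tc; i++) {
		unsigned int end_i = (unsigned int)q->offset[i] + q->count[i];

		if (end_i > num_tx_queues)
			return false;

		for (j = i + 1; j < q->num_tc; j++) {
			unsigned int end_j = (unsigned int)q->offset[j] + q->count[j];

			/* Half-open ranges [offset, end) overlap test. */
			if (q->offset[i] < end_j && q->offset[j] < end_i)
				return false;
		}
	}
	return true;
}

/* Hypothetical convenience wrapper: validate a two-class layout. */
static bool two_class_example_valid(uint16_t off0, uint16_t cnt0,
				    uint16_t off1, uint16_t cnt1,
				    unsigned int num_tx_queues)
{
	struct tc_mqprio_qopt_sketch q = { .num_tc = 2 };

	q.offset[0] = off0;
	q.count[0] = cnt0;
	q.offset[1] = off1;
	q.count[1] = cnt1;
	return mqprio_ranges_valid(&q, num_tx_queues);
}
```

For an 8-queue device, queues 0-3 plus 4-7 pass the check, while 0-3 plus 2-5 (overlap) or 0-3 plus 4-11 (out of bounds) do not.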
      
      It is expected that hardware QOS schemes can be implemented by
      creating appropriate mappings of queues in ndo_setup_tc().
      
      One expected use case is drivers will use the ndo_setup_tc to map
      queue ranges onto 802.1Q traffic classes. This provides a generic
      mechanism to map network traffic onto these traffic classes and
      removes the need for lower layer drivers to know specifics about
      traffic types.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: implement mechanism for HW based QOS · 4f57c087
      John Fastabend committed
      This patch provides a mechanism for lower layer devices to
      steer traffic using skb->priority to tx queues. This allows
      for hardware based QOS schemes to use the default qdisc without
      incurring the penalties related to global state and the qdisc
      lock, while reliably receiving skbs on the correct tx ring
      to avoid head of line blocking resulting from shuffling in
      the lower layer device (LLD). Finally, all the goodness from txq
      caching and xps/rps can still be leveraged.
      
      Many drivers and hardware exist with the ability to implement
      QOS schemes in the hardware but currently these drivers tend
      to rely on firmware to reroute specific traffic, a driver
      specific select_queue or the queue_mapping action in the
      qdisc.
      
      If select_queue is used for this, drivers need to be updated for
      each and every traffic type and we lose the goodness of much
      of the upstream work. Firmware solutions are inherently
      inflexible. And finally, if admins are expected to build a
      qdisc and filter rules to steer traffic, this requires knowledge
      of how the hardware is currently configured. The number of tx
      queues and the queue offsets may change depending on resources.
      This approach also incurs all the overhead of a qdisc with filters.
      
      With the mechanism in this patch users can set skb priority using
      expected methods, i.e. setsockopt(), or the stack can set the priority
      directly. Then the skb will be steered to the correct tx queues
      aligned with hardware QOS traffic classes. In the normal case, with a
      single traffic class and all queues in this class, everything
      works as is until the LLD enables multiple tcs.
      
      To steer the skb we mask out the lower 4 bits of the priority
      and allow the hardware to configure up to 15 distinct classes
      of traffic. This is expected to be sufficient for most applications;
      at any rate it is more than the 802.1Q spec designates and is
      equal to the number of prio bands currently implemented in
      the default qdisc.
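
A rough sketch of the steering step, under stated assumptions: the struct tc_queue_range, pick_tx_queue() and example_queue() names are hypothetical, and the simple modulo used to spread flows within a class stands in for whatever hash scaling the kernel applies.

```c
#include <assert.h>
#include <stdint.h>

#define TC_BITMASK 15   /* lower 4 bits of skb->priority */

/* Hypothetical per-class queue range: 'count' queues from 'offset'. */
struct tc_queue_range {
	uint16_t count;
	uint16_t offset;
};

/* Map priority -> traffic class -> tx queue inside that class's range. */
static uint16_t pick_tx_queue(uint32_t skb_priority,
			      const uint8_t prio_tc_map[TC_BITMASK + 1],
			      const struct tc_queue_range *tc_to_txq,
			      uint32_t flow_hash)
{
	uint8_t tc = prio_tc_map[skb_priority & TC_BITMASK];
	const struct tc_queue_range *r = &tc_to_txq[tc];

	assert(r->count > 0);   /* a class must own at least one queue */
	return (uint16_t)(r->offset + flow_hash % r->count);
}

/* Example layout (assumed): priorities 0-3 use tc 0 (queues 0-3),
 * priorities 4-15 use tc 1 (queues 4-7). */
static uint16_t example_queue(uint32_t skb_priority, uint32_t flow_hash)
{
	static const uint8_t map[TC_BITMASK + 1] = {
		0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
	};
	static const struct tc_queue_range ranges[2] = {
		{ .count = 4, .offset = 0 },
		{ .count = 4, .offset = 4 },
	};

	return pick_tx_queue(skb_priority, map, ranges, flow_hash);
}
```

In this layout a packet with priority 6 and flow hash 5 lands on queue 5, since class 1 starts at offset 4 and 5 % 4 is 1.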
      
      This, in conjunction with a userspace application such as
      lldpad, can be used to implement 802.1Q transmission selection
      algorithms, one of these being the extended transmission
      selection algorithm currently used for DCB.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      netlink: support setting devgroup parameters · e7ed828f
      Vlad Dogaru committed
      If an rtnetlink request specifies a negative or zero ifindex and has no
      interface name attribute, but has a group attribute, then the changes
      are made to all the interfaces belonging to the specified group.
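
The selection rule above can be pictured as a small decision function. This is only a sketch: enum link_target and pick_link_target() are hypothetical names, and checking the index before the name is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of resolving which device(s) a request targets. */
enum link_target {
	TARGET_BY_INDEX,
	TARGET_BY_NAME,
	TARGET_BY_GROUP,
	TARGET_NONE
};

/* Fall back to the group only when neither a positive ifindex nor an
 * interface name attribute was supplied. */
static enum link_target pick_link_target(int ifindex, const char *ifname,
					 bool has_group)
{
	if (ifindex > 0)
		return TARGET_BY_INDEX;
	if (ifname != NULL)
		return TARGET_BY_NAME;
	if (has_group)
		return TARGET_BY_GROUP;
	return TARGET_NONE;
}
```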
      Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
      Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net_device: add support for network device groups · cbda10fa
      Vlad Dogaru committed
      Net devices can now be grouped, enabling simpler manipulation from
      userspace. This patch adds a group field to the net_device structure, as
      well as rtnetlink support to query and modify it.
      Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
      Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      net: cleanup unused macros in net directory · 441c793a
      Shan Wei committed
      Clean up some unused macros in net/*:
      1. Macros left over after code changes,
         e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
      2. Macros never used since they were introduced to the kernel,
         e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.
      Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
      Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      netfilter: nf_conntrack: fix lifetime display for disabled connections · f5c88f56
      Patrick McHardy committed
      When no tstamp extension exists, ct_delta_time() returns -1, which is
      then assigned to a u64 and tested for negative values to decide
      whether to display the lifetime. This cannot work because an unsigned
      value is never negative, so use an s64 and merge the two minor
      functions into one.
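
The signedness problem can be reproduced in a few lines of userspace C. The function names below are hypothetical stand-ins, with uint64_t and int64_t playing the role of the kernel's u64 and s64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for ct_delta_time(): -1 means "no timestamp extension". */
static int64_t delta_time_sketch(bool has_tstamp, int64_t seconds)
{
	return has_tstamp ? seconds : -1;
}

/* Buggy variant: the result is stored unsigned, so -1 becomes a huge
 * positive number and the "negative" check can never trigger. */
static bool should_display_unsigned(bool has_tstamp, int64_t seconds)
{
	uint64_t delta = (uint64_t)delta_time_sketch(has_tstamp, seconds);

	return !(delta < 0);   /* always true for an unsigned value */
}

/* Fixed variant: a signed variable keeps -1 negative. */
static bool should_display_signed(bool has_tstamp, int64_t seconds)
{
	int64_t delta = delta_time_sketch(has_tstamp, seconds);

	return !(delta < 0);
}
```

The unsigned variant reports that a lifetime should be displayed even when no timestamp exists, while the signed one correctly suppresses it.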
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • J
      netfilter: xtables: connlimit revision 1 · cc4fc022
      Jan Engelhardt committed
      This adds destination address-based selection. The old "inverse"
      member is overloaded (memory-wise) with a new "flags" variable,
      similar to how J.Park did it with xt_string rev 1. Since revision 0
      userspace only sets flag 0x1, no great changes are made to explicitly
      test for different revisions.
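
The memory-wise overloading might look roughly like this. The struct and flag names are hypothetical, and the value chosen for the destination-address flag (0x2) is an assumption; only the 0x1 bit set by revision 0 userspace comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONNLIMIT_INVERT_SKETCH (1u << 0)  /* 0x1, set by rev 0 userspace */
#define CONNLIMIT_DADDR_SKETCH  (1u << 1)  /* assumed bit for daddr mode */

/* The old "inverse" member and the new "flags" share the same storage. */
struct connlimit_info_sketch {
	union {
		unsigned int inverse;  /* revision 0 view */
		uint32_t     flags;    /* revision 1 view */
	};
};

/* Build the info as revision 0 userspace would, via "inverse". */
static struct connlimit_info_sketch from_rev0(unsigned int inverse)
{
	struct connlimit_info_sketch info;

	info.inverse = inverse;
	return info;
}

static bool is_inverted(struct connlimit_info_sketch info)
{
	return (info.flags & CONNLIMIT_INVERT_SKETCH) != 0;
}

static bool matches_daddr(struct connlimit_info_sketch info)
{
	return (info.flags & CONNLIMIT_DADDR_SKETCH) != 0;
}
```

Because revision 0 only ever writes 0 or 1 into the shared word, reading it back as flags yields at most the invert bit, so one code path can serve both revisions.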
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
  4. 19 Jan, 2011 6 commits
    • P
      netfilter: nf_conntrack_tstamp: add flow-based timestamp extension · a992ca2a
      Pablo Neira Ayuso committed
      This patch adds flow-based timestamping for conntracks. Basically,
      we use two 64-bit variables: one to store the creation timestamp
      once the conntrack has been confirmed and the other to store the
      deletion time. This conntrack extension is disabled by default;
      to enable it, you have to:
      
      echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
      
      This patch allows user-space flow-based loggers such as ulogd2
      to save memory. In short, ulogd2 does not need to keep a
      hashtable of the conntracks in user-space to know when they
      were created and destroyed; instead we use the kernel
      timestamp. If we want to have a sane IPFIX implementation
      in user-space, these nanosecond-resolution timestamps are also
      useful. Other custom user-space applications can benefit from
      this via libnetfilter_conntrack.
      
      This patch modifies the /proc output to display the delta time
      in seconds since the flow start. You can also obtain the
      flow-start date by means of the conntrack-tools.
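
As a rough illustration of the two timestamps and the delta shown in /proc, consider the sketch below. The struct layout and the names tstamp_sketch, make_tstamp() and flow_age_seconds() are hypothetical, and clamping to zero when the clock reads earlier than the start is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Two 64-bit nanosecond timestamps per flow, as described above. */
struct tstamp_sketch {
	uint64_t start_ns;  /* set once the conntrack is confirmed */
	uint64_t stop_ns;   /* set when the conntrack is destroyed */
};

/* Hypothetical helper: a flow that started at 'start_ns' and is alive. */
static struct tstamp_sketch make_tstamp(uint64_t start_ns)
{
	struct tstamp_sketch t = { .start_ns = start_ns, .stop_ns = 0 };

	return t;
}

/* Whole seconds elapsed since the flow started. */
static uint64_t flow_age_seconds(struct tstamp_sketch t, uint64_t now_ns)
{
	if (now_ns < t.start_ns)
		return 0;
	return (now_ns - t.start_ns) / NSEC_PER_SEC_SKETCH;
}
```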
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • E
      net: filter: dont block softirqs in sk_run_filter() · 80f8f102
      Eric Dumazet committed
      Packet filter (BPF) doesn't need to disable softirqs, being fully
      re-entrant and lock-less.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      af_unix: implement socket filter · d6ae3bae
      Alban Crequy committed
      Linux Socket Filters can already be successfully attached and detached on unix
      sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
      See: Documentation/networking/filter.txt
      
      But the filter was never used in the unix socket code so it did not work. This
      patch uses sk_filter() to filter buffers before delivery.
      
      This short program demonstrates the problem on SOCK_DGRAM.
      
      #include <poll.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <linux/filter.h>
      
      int main(void) {
        int i, j, ret;
        int sv[2];
        struct pollfd fds[2];
        const char *message = "Hello world!";
        char buffer[64];
        struct sock_filter ins[32] = {{0,},};
        struct sock_fprog filter;
      
        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
          perror("socketpair");
          return 1;
        }
      
        for (i = 0 ; i < 2 ; i++) {
          fds[i].fd = sv[i];
          fds[i].events = POLLIN;
          fds[i].revents = 0;
        }
      
        for (j = 1 ; j < 13 ; j++) {
      
          /* Set a socket filter to truncate the message to j bytes */
          memset(ins, 0, sizeof(ins));
          ins[0].code = BPF_RET|BPF_K;
          ins[0].k = j;
          filter.len = 1;
          filter.filter = ins;
          setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));
      
          /* Send a message */
          send(sv[0], message, strlen(message) + 1, 0);
      
          /* The filter should let the message pass but truncated. */
          poll(fds, 2, 0);
      
          /* Receive the truncated message */
          ret = (int)recv(sv[1], buffer, sizeof(buffer), 0);
          printf("received %d bytes, expected %d\n", ret, j);
        }
      
        for (i = 0 ; i < 2 ; i++)
          close(sv[i]);
      
        return 0;
      }
      Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
      Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan. · 6ee400aa
      Jesse Gross committed
      In netif_skb_features() we return only the features that are valid for vlans
      if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
      since it enables transmission of vlan tags and is obviously valid.
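
The effect of the change can be sketched with plain bit masks. The flag values and function names below are made up for the illustration, and the exact expression used in netif_skb_features() is an assumption; the point is only that the VLAN tag insertion bit is exempted from the vlan_features mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, not the kernel's real NETIF_F_* values. */
#define F_SG_SKETCH         (1u << 0)
#define F_TSO_SKETCH        (1u << 1)
#define F_HW_VLAN_TX_SKETCH (1u << 2)

/* Before: every feature absent from vlan_features is masked out,
 * including hardware VLAN tag insertion itself. */
static uint32_t vlan_pkt_features_before(uint32_t dev_features,
					 uint32_t vlan_features)
{
	return dev_features & vlan_features;
}

/* After: the VLAN tx bit survives the mask if the device has it. */
static uint32_t vlan_pkt_features_after(uint32_t dev_features,
					uint32_t vlan_features)
{
	return dev_features & (vlan_features | F_HW_VLAN_TX_SKETCH);
}
```

A device advertising all three bits with only scatter-gather in its vlan_features keeps scatter-gather plus VLAN tx offload after the change, and only scatter-gather before it.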
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      ipv6: Silence privacy extensions initialization · 2fdc1c80
      Romain Francoise committed
      When a network namespace is created (via CLONE_NEWNET), the loopback
      interface is automatically added to the new namespace, triggering a
      printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.
      
      This is problematic for applications which use CLONE_NEWNET as
      part of a sandbox, like Chromium's suid sandbox or recent versions of
      vsftpd. On a busy machine, it can lead to thousands of useless
      "lo: Disabled Privacy Extensions" messages appearing in dmesg.
      
      It's easy enough to check the status of privacy extensions via the
      use_tempaddr sysctl, so just removing the printk seems like the most
      sensible solution.
      Signed-off-by: Romain Francoise <romain@orebokech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      netfilter: nf_conntrack: nf_conntrack snmp helper · 93557f53
      Jiri Olsa committed
      Add support for SNMP broadcast connection tracking. The SNMP
      broadcast requests are now paired with the SNMP responses,
      which allows SNMP broadcasts to be used with a firewall enabled.
      
      Please refer to the following conversation:
      http://marc.info/?l=netfilter-devel&m=125992205006600&w=2
      
      Patrick McHardy wrote:
      > > The best solution would be to add generic broadcast tracking, the
      > > use of expectations for this is a bit of abuse.
      > > The second best choice I guess would be to move the help() function
      > > to a shared module and generalize it so it can be used for both.
      This patch implements the "second best choice".
      
      Since the netbios-ns conntrack module uses the same helper
      functionality as the snmp, only one helper function is added
      for both snmp and netbios-ns modules into the new object -
      nf_conntrack_broadcast.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  5. 18 Jan, 2011 10 commits
  6. 17 Jan, 2011 2 commits
    • T
      netfilter: create audit records for x_tables replaces · fbabf31e
      Thomas Graf committed
      The setsockopt() syscall to replace tables is already recorded
      in the audit logs. This patch stores additional information
      such as table name and netfilter protocol.
      
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • T
      netfilter: audit target to record accepted/dropped packets · 43f393ca
      Thomas Graf committed
      This patch adds a new netfilter target which creates audit records
      for packets traversing a certain chain.
      
      It can be used to record packets which are rejected administratively
      as follows:
      
        -N AUDIT_DROP
        -A AUDIT_DROP -j AUDIT --type DROP
        -A AUDIT_DROP -j DROP
      
      A rule which would typically drop or reject a packet would then
      invoke the new chain to record packets before dropping them:
      
        -j AUDIT_DROP
      
      The module is protocol independent and works for iptables, ip6tables
      and ebtables.
      
      The following information is logged:
       - netfilter hook
       - packet length
       - incoming/outgoing interface
       - MAC src/dst/proto for ethernet packets
       - src/dst/protocol address for IPv4/IPv6
       - src/dst port for TCP/UDP/UDPLITE
       - icmp type/code
      
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>