1. 20 1月, 2011 6 次提交
    • A
    • J
      net_sched: implement a root container qdisc sch_mqprio · b8970f0b
      John Fastabend 提交于
      This implements a mqprio queueing discipline that by default creates
      a pfifo_fast qdisc per tx queue and provides the needed configuration
      interface.
      
      Using the mqprio qdisc the number of tcs currently in use along
      with the range of queues alloted to each class can be configured. By
      default skbs are mapped to traffic classes using the skb priority.
      This mapping is configurable.
      
      Configurable parameters,
      
      struct tc_mqprio_qopt {
      	__u8    num_tc;
      	__u8    prio_tc_map[TC_BITMASK + 1];
      	__u8    hw;
      	__u16   count[TC_MAX_QUEUE];
      	__u16   offset[TC_MAX_QUEUE];
      };
      
      Here the count/offset pairing give the queue alignment and the
      prio_tc_map gives the mapping from skb->priority to tc.
      
      The hw bit determines if the hardware should configure the count
      and offset values. If the hardware bit is set then the operation
      will fail if the hardware does not implement the ndo_setup_tc
      operation. This is to avoid undetermined states where the hardware
      may or may not control the queue mapping. Also minimal bounds
      checking is done on the count/offset to verify a queue does not
      exceed num_tx_queues and that queue ranges do not overlap. Otherwise
      it is left to user policy or hardware configuration to create
      useful mappings.
      
      It is expected that hardware QOS schemes can be implemented by
      creating appropriate mappings of queues in ndo_tc_setup().
      
      One expected use case is drivers will use the ndo_setup_tc to map
      queue ranges onto 802.1Q traffic classes. This provides a generic
      mechanism to map network traffic onto these traffic classes and
      removes the need for lower layer drivers to know specifics about
      traffic types.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8970f0b
    • J
      net: implement mechanism for HW based QOS · 4f57c087
      John Fastabend 提交于
      This patch provides a mechanism for lower layer devices to
      steer traffic using skb->priority to tx queues. This allows
      for hardware based QOS schemes to use the default qdisc without
      incurring the penalties related to global state and the qdisc
      lock. While reliably receiving skbs on the correct tx ring
      to avoid head of line blocking resulting from shuffling in
      the LLD. Finally, all the goodness from txq caching and xps/rps
      can still be leveraged.
      
      Many drivers and hardware exist with the ability to implement
      QOS schemes in the hardware but currently these drivers tend
      to rely on firmware to reroute specific traffic, a driver
      specific select_queue or the queue_mapping action in the
      qdisc.
      
      By using select_queue for this drivers need to be updated for
      each and every traffic type and we lose the goodness of much
      of the upstream work. Firmware solutions are inherently
      inflexible. And finally if admins are expected to build a
      qdisc and filter rules to steer traffic this requires knowledge
      of how the hardware is currently configured. The number of tx
      queues and the queue offsets may change depending on resources.
      Also this approach incurs all the overhead of a qdisc with filters.
      
      With the mechanism in this patch users can set skb priority using
      expected methods ie setsockopt() or the stack can set the priority
      directly. Then the skb will be steered to the correct tx queues
      aligned with hardware QOS traffic classes. In the normal case with
      single traffic class and all queues in this class everything
      works as is until the LLD enables multiple tcs.
      
      To steer the skb we mask out the lower 4 bits of the priority
      and allow the hardware to configure upto 15 distinct classes
      of traffic. This is expected to be sufficient for most applications
      at any rate it is more then the 8021Q spec designates and is
      equal to the number of prio bands currently implemented in
      the default qdisc.
      
      This in conjunction with a userspace application such as
      lldpad can be used to implement 8021Q transmission selection
      algorithms one of these algorithms being the extended transmission
      selection algorithm currently being used for DCB.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f57c087
    • V
      netlink: support setting devgroup parameters · e7ed828f
      Vlad Dogaru 提交于
      If a rtnetlink request specifies a negative or zero ifindex and has no
      interface name attribute, but has a group attribute, then the chenges
      are made to all the interfaces belonging to the specified group.
      Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org>
      Acked-by: NJamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7ed828f
    • V
      net_device: add support for network device groups · cbda10fa
      Vlad Dogaru 提交于
      Net devices can now be grouped, enabling simpler manipulation from
      userspace. This patch adds a group field to the net_device structure, as
      well as rtnetlink support to query and modify it.
      Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org>
      Acked-by: NJamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbda10fa
    • S
      net: cleanup unused macros in net directory · 441c793a
      Shan Wei 提交于
      Clean up some unused macros in net/*.
      1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
      2. never be used since introduced to kernel.
         e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.
      Signed-off-by: NShan Wei <shanwei@cn.fujitsu.com>
      Acked-by: NSjur Braendeland <sjur.brandeland@stericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      441c793a
  2. 19 1月, 2011 4 次提交
    • E
      net: filter: dont block softirqs in sk_run_filter() · 80f8f102
      Eric Dumazet 提交于
      Packet filter (BPF) doesnt need to disable softirqs, being fully
      re-entrant and lock-less.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80f8f102
    • A
      af_unix: implement socket filter · d6ae3bae
      Alban Crequy 提交于
      Linux Socket Filters can already be successfully attached and detached on unix
      sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
      See: Documentation/networking/filter.txt
      
      But the filter was never used in the unix socket code so it did not work. This
      patch uses sk_filter() to filter buffers before delivery.
      
      This short program demonstrates the problem on SOCK_DGRAM.
      
      int main(void) {
        int i, j, ret;
        int sv[2];
        struct pollfd fds[2];
        char *message = "Hello world!";
        char buffer[64];
        struct sock_filter ins[32] = {{0,},};
        struct sock_fprog filter;
      
        socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
      
        for (i = 0 ; i < 2 ; i++) {
          fds[i].fd = sv[i];
          fds[i].events = POLLIN;
          fds[i].revents = 0;
        }
      
        for(j = 1 ; j < 13 ; j++) {
      
          /* Set a socket filter to truncate the message */
          memset(ins, 0, sizeof(ins));
          ins[0].code = BPF_RET|BPF_K;
          ins[0].k = j;
          filter.len = 1;
          filter.filter = ins;
          setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));
      
          /* send a message */
          send(sv[0], message, strlen(message) + 1, 0);
      
          /* The filter should let the message pass but truncated. */
          poll(fds, 2, 0);
      
          /* Receive the truncated message*/
          ret = recv(sv[1], buffer, 64, 0);
          printf("received %d bytes, expected %d\n", ret, j);
        }
      
          for (i = 0 ; i < 2 ; i++)
            close(sv[i]);
      
        return 0;
      }
      Signed-off-by: NAlban Crequy <alban.crequy@collabora.co.uk>
      Reviewed-by: NIan Molton <ian.molton@collabora.co.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6ae3bae
    • J
      net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan. · 6ee400aa
      Jesse Gross 提交于
      In netif_skb_features() we return only the features that are valid for vlans
      if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
      since it enables transmission of vlan tags and is obviously valid.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ee400aa
    • R
      ipv6: Silence privacy extensions initialization · 2fdc1c80
      Romain Francoise 提交于
      When a network namespace is created (via CLONE_NEWNET), the loopback
      interface is automatically added to the new namespace, triggering a
      printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.
      
      This is problematic for applications which use CLONE_NEWNET as
      part of a sandbox, like Chromium's suid sandbox or recent versions of
      vsftpd. On a busy machine, it can lead to thousands of useless
      "lo: Disabled Privacy Extensions" messages appearing in dmesg.
      
      It's easy enough to check the status of privacy extensions via the
      use_tempaddr sysctl, so just removing the printk seems like the most
      sensible solution.
      Signed-off-by: NRomain Francoise <romain@orebokech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fdc1c80
  3. 16 1月, 2011 3 次提交
  4. 15 1月, 2011 1 次提交
  5. 14 1月, 2011 5 次提交
  6. 13 1月, 2011 7 次提交
  7. 12 1月, 2011 8 次提交
  8. 11 1月, 2011 6 次提交