1. 12 Jan, 2013 1 commit
  2. 11 Jan, 2013 6 commits
    • net: Add support for XPS without sysfs being defined · 024e9679
      Alexander Duyck committed
      This patch makes it so that we can support transmit packet steering without
      sysfs needing to be enabled.  The reason for making this change is to make
      it so that a driver can make use of the XPS even while the sysfs portion of
      the interface is not present.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Rewrite netif_set_xps_queues to address several issues · 01c5f864
      Alexander Duyck committed
      This change is meant to address several issues I found within the
      netif_set_xps_queues function.
      
      If the allocation of one of the maps to be assigned to new_dev_maps failed
      we could end up with the device map in an inconsistent state since we had
      already worked through a number of CPUs and removed or added the queue.  To
      address that I split the process into several steps.  The first of which is
      just the allocation of updated maps for CPUs that will need larger maps to
      store the queue.  By doing this we can fail gracefully without actually
      altering the contents of the current device map.
      
      The second issue I found was the fact that we were always allocating a new
      device map even if we were not adding any queues.  I have updated the code
      so that we only allocate a new device map if we are adding queues,
      otherwise if we are not adding any queues to CPUs we just skip to the
      removal process.
      
      The last change I made was to reuse the code from remove_xps_queue to remove
      the queue from the CPU.  By making this change we can be consistent in how
      we go about adding and removing the queues from the CPUs.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
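
The allocate-first, commit-later approach described in this commit message can be sketched in user space. This is a toy model, not the kernel code: `toy_map`, `toy_alloc`, `toy_add_queue`, `fail_at` and the fixed `NCPUS` are all hypothetical names chosen for illustration, and the real function also deals with RCU and queue removal, which are left out here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NCPUS 4  /* assumption: small fixed CPU count for the sketch */

/* Hypothetical stand-in for a per-CPU map: a growable array of queue ids. */
struct toy_map {
    unsigned int len;        /* queues currently stored */
    unsigned int alloc_len;  /* capacity of queues[] */
    unsigned short *queues;
};

/* Fault-injection hook: when fail_at >= 0, the allocation with that ordinal fails. */
static int fail_at = -1;
static int alloc_count;

static void *toy_alloc(size_t n)
{
    if (fail_at >= 0 && alloc_count++ == fail_at)
        return NULL;
    return malloc(n);
}

static int map_has(const struct toy_map *m, unsigned short q)
{
    for (unsigned int i = 0; i < m->len; i++)
        if (m->queues[i] == q)
            return 1;
    return 0;
}

/* Add queue q to every CPU whose bit is set in cpu_mask; all-or-nothing. */
int toy_add_queue(struct toy_map maps[NCPUS], unsigned int cpu_mask,
                  unsigned short q)
{
    unsigned short *grown[NCPUS] = { 0 };
    unsigned int new_cap[NCPUS] = { 0 };

    /* Step 1: allocate larger arrays where needed; the live maps are not touched. */
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        struct toy_map *m = &maps[cpu];

        if (!(cpu_mask & (1u << cpu)) || map_has(m, q) || m->len < m->alloc_len)
            continue;
        new_cap[cpu] = m->alloc_len ? m->alloc_len * 2 : 2;
        grown[cpu] = toy_alloc(new_cap[cpu] * sizeof(*grown[cpu]));
        if (!grown[cpu]) {
            for (int i = 0; i < cpu; i++)
                free(grown[i]);
            return -ENOMEM;  /* fail gracefully: nothing was modified */
        }
    }

    /* Step 2: no allocation can fail from here on, so commit the change. */
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        struct toy_map *m = &maps[cpu];

        if (!(cpu_mask & (1u << cpu)) || map_has(m, q))
            continue;
        if (grown[cpu]) {
            if (m->len)
                memcpy(grown[cpu], m->queues, m->len * sizeof(*m->queues));
            free(m->queues);
            m->queues = grown[cpu];
            m->alloc_len = new_cap[cpu];
        }
        m->queues[m->len++] = q;
    }
    return 0;
}
```

Because every allocation happens before any live map is modified, a failure in step 1 leaves the caller with exactly the state it started with.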
    • net: Rewrite netif_reset_xps_queue to allow for better code reuse · 10cdc3f3
      Alexander Duyck committed
      This patch does a minor refactor on netif_reset_xps_queue to address a few
      items I noticed.
      
      First is the fact that we are doing removal of queues in both
      netif_reset_xps_queue and netif_set_xps_queue.  Since there is no need to
      have the code in two places I am pushing it out into a separate function
      and will come back in another patch and reuse the code in
      netif_set_xps_queue.
      
      The second item this change addresses is the fact that the Tx queues were
      not getting their numa_node value cleared as a part of the XPS queue reset.
      This patch resolves that by resetting the numa_node value if the dev_maps
      value is set.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add functions netif_reset_xps_queue and netif_set_xps_queue · 537c00de
      Alexander Duyck committed
      This patch adds two functions, netif_reset_xps_queue and
      netif_set_xps_queue.  The main idea behind these two functions is to
      provide a mechanism through which drivers can update their defaults in
      regards to XPS.
      
      Currently no such mechanism exists and as a result we cannot use XPS for
      things such as ATR which would require a basic configuration to start in
      which the Tx queues are mapped to CPUs via a 1:1 mapping.  With this change
      I am making it possible for drivers such as ixgbe to be able to use the XPS
      feature by controlling the default configuration.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Split core bits of netdev_pick_tx into __netdev_pick_tx · 416186fb
      Alexander Duyck committed
      This change splits the core bits of netdev_pick_tx into a separate function.
      The main idea behind this is to make this code accessible to select queue
      functions when they decide to process the standard path instead of their
      own custom path in their select queue routine.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: more precise pkt_len computation · 1def9238
      Eric Dumazet committed
      One long standing problem with TSO/GSO/GRO packets is that skb->len
      doesn't represent a precise amount of bytes on wire.
      
      Headers are accounted for only in the first segment.
      For TCP, that's typically 66 bytes missing per 1448-byte segment,
      an error of 4.5 % for a normal MSS value.
      
      As consequences :
      
      1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits.
      2) Device stats are slightly underestimated as well.
      
      Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len
      computation.
      
      Packet schedulers should use qdisc pkt_len instead of skb->len for their
      bandwidth limitations, and TSO-enabled device drivers could use pkt_len
      if their statistics are not hardware assisted, and if they don't scratch
      the first word of skb->cb[].
      
      Both egress and ingress paths work, thanks to commit fda55eca
      (net: introduce skb_transport_header_was_set()) : If GRO built
      a GSO packet, it also set the transport header for us.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Paolo Valente <paolo.valente@unimore.it>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
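
The header accounting described in this commit message comes down to simple arithmetic. The sketch below is a user-space illustration, not the kernel implementation: `wire_len` and its parameters are hypothetical names, and it assumes every segment carries headers of the same size.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the bytes a GSO packet occupies on the wire.
 *   len      - length of the aggregated packet (payload plus ONE header copy)
 *   hdr_len  - header bytes repeated in every segment (assumed constant)
 *   gso_segs - number of segments the packet will be split into
 */
uint32_t wire_len(uint32_t len, uint32_t hdr_len, uint32_t gso_segs)
{
    if (gso_segs <= 1)
        return len;  /* not segmented: len is already accurate */

    /* The first segment's headers are in len; add the remaining copies. */
    return len + (gso_segs - 1) * hdr_len;
}
```

With the figures from the message (66 bytes of headers, 1448 bytes of payload per segment), a 10-segment packet has `len` = 66 + 10 * 1448 = 14546, and `wire_len(14546, 66, 10)` gives 15140, which is 10 full 1514-byte frames.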
  3. 09 Jan, 2013 4 commits
  4. 08 Jan, 2013 1 commit
  5. 07 Jan, 2013 2 commits
  6. 05 Jan, 2013 5 commits
  7. 04 Jan, 2013 3 commits
  8. 30 Dec, 2012 1 commit
    • net: filter: return -EINVAL if BPF_S_ANC* operation is not supported · aa1113d9
      Daniel Borkmann committed
      Currently, we return -EINVAL for malformed or wrong BPF filters.
      However, this is not done for BPF_S_ANC* operations, which makes it
      more difficult to detect if it's actually supported or not by the
      BPF machine. Therefore, we should also return -EINVAL if K is within
      the SKF_AD_OFF universe and the ancillary operation did not match.
      
      Why exactly is it needed? If tools such as libpcap/tcpdump want to
      make use of new ancillary operations (like filtering VLAN in kernel
      space), there is currently no sane way to test if this feature /
      BPF_S_ANC* op is present or not, since no error is returned. This
      patch will make life easier for that and allow for a proper usage
      for user space applications.
      
      There was concern, if this patch will break userland. Short answer: Yes
      and no. Long answer: It will "break" only for code that calls ...
      
        { BPF_LD | BPF_(W|H|B) | BPF_ABS, 0, 0, <K> },
      
      ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is *not* an
      ancillary. And here comes the BUT: assuming some *old* code will have
      such an instruction where <K> is between [0xfffff000, 0xffffffff] and
      it doesn't know ancillary operations, then this will give a
      non-expected / unwanted behavior as well (since we do not return the
      BPF machine with 0 after a failed load_pointer(), which was the case
      before introducing ancillary operations, but load something into the
      accumulator instead, and continue with the next instruction, for
      instance). Thus, user space code would already have been broken by
      introducing ancillary operations into the BPF machine per se. Code
      that does such a direct load, e.g. "load word at packet offset
      0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken,
      isn't it? The whole assumption of ancillary operations is that no-one
      intentionally calls things like "ld [0xffffffff]" and expect this
      word to be loaded from such a packet offset. Hence, we can also safely
      make use of this feature testing patch and facilitate application
      development. Therefore, at least from this patch onwards, we have
      *for sure* a check whether current or in future implemented BPF_S_ANC*
      ops are supported in the kernel. Patch was tested on x86_64.
      
      (Thanks to Eric for the previous review.)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: Ani Sinha <ani@aristanetworks.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
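
The validation rule described in this commit message can be modelled as follows. This is only a sketch of the decision, not the kernel's filter checker: `toy_check_abs_load`, `TOY_SKF_AD_OFF` and the table of implemented offsets are hypothetical, with the range start taken from the `[0xfffff000, 0xffffffff]` interval quoted in the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the ancillary range, per the interval quoted in the message. */
#define TOY_SKF_AD_OFF 0xfffff000u

/* Assumption: offsets that some hypothetical kernel implements. */
static const uint32_t toy_known_anc[] = { 0, 4, 8 };

/* Validate the K operand of an absolute load (BPF_LD | BPF_ABS). */
int toy_check_abs_load(uint32_t k)
{
    size_t n = sizeof(toy_known_anc) / sizeof(toy_known_anc[0]);

    if (k < TOY_SKF_AD_OFF)
        return 0;        /* ordinary packet offset */

    for (size_t i = 0; i < n; i++)
        if (k == TOY_SKF_AD_OFF + toy_known_anc[i])
            return 0;    /* recognised ancillary operation */

    return -EINVAL;      /* inside the ancillary range, but unknown */
}
```

Under this rule a tool can test for an ancillary operation by submitting a filter that uses it and checking whether the kernel rejects the filter with `EINVAL`.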
  9. 29 Dec, 2012 5 commits
  10. 22 Dec, 2012 2 commits
  11. 15 Dec, 2012 1 commit
    • userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847
      Eric W. Biederman committed
      Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
      the permissions of setns.  With unprivileged user namespaces it
      became possible to create new namespaces without privilege.
      
      However the setns calls were relaxed to only require CAP_SYS_ADMIN in
      the user namespace of the target namespace.
      
      Which made the following nasty sequence possible.
      
      pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
      if (pid == 0) { /* child */
      	system("mount --bind /home/me/passwd /etc/passwd");
      }
      else if (pid != 0) { /* parent */
      	char path[PATH_MAX];
      	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
      	fd = open(path, O_RDONLY);
      	setns(fd, 0);
      	system("su -");
      }
      
      Prevent this possibility by requiring CAP_SYS_ADMIN
      in the current user namespace when joining all but the user namespace.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  12. 12 Dec, 2012 3 commits
  13. 11 Dec, 2012 1 commit
  14. 09 Dec, 2012 2 commits
  15. 08 Dec, 2012 2 commits
  16. 06 Dec, 2012 1 commit