1. 07 1月, 2013 1 次提交
  2. 05 1月, 2013 5 次提交
  3. 04 1月, 2013 3 次提交
  4. 30 12月, 2012 1 次提交
    • D
      net: filter: return -EINVAL if BPF_S_ANC* operation is not supported · aa1113d9
      Daniel Borkmann 提交于
      Currently, we return -EINVAL for malformed or wrong BPF filters.
      However, this is not done for BPF_S_ANC* operations, which makes it
      more difficult to detect if it's actually supported or not by the
      BPF machine. Therefore, we should also return -EINVAL if K is within
      the SKF_AD_OFF universe and the ancillary operation did not match.
      
      Why exactly is it needed? If tools such as libpcap/tcpdump want to
      make use of new ancillary operations (like filtering VLAN in kernel
      space), there is currently no sane way to test if this feature /
      BPF_S_ANC* op is present or not, since no error is returned. This
      patch will make life easier for that and allow for a proper usage
      for user space applications.
      
      There was concern, if this patch will break userland. Short answer: Yes
      and no. Long answer: It will "break" only for code that calls ...
      
        { BPF_LD | BPF_(W|H|B) | BPF_ABS, 0, 0, <K> },
      
      ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is *not* an
      ancillary. And here comes the BUT: assuming some *old* code will have
      such an instruction where <K> is between [0xfffff000, 0xffffffff] and
      it doesn't know ancillary operations, then this will give a
      non-expected / unwanted behavior as well (since we do not return the
      BPF machine with 0 after a failed load_pointer(), which was the case
      before introducing ancillary operations, but load sth. into the
      accumulator instead, and continue with the next instruction, for
      instance). Thus, user space code would already have been broken by
      introducing ancillary operations into the BPF machine per se. Code
      that does such a direct load, e.g. "load word at packet offset
      0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken,
      isn't it? The whole assumption of ancillary operations is that no-one
      intentionally calls things like "ld [0xffffffff]" and expect this
      word to be loaded from such a packet offset. Hence, we can also safely
      make use of this feature testing patch and facilitate application
      development. Therefore, at least from this patch onwards, we have
      *for sure* a check whether current or in future implemented BPF_S_ANC*
      ops are supported in the kernel. Patch was tested on x86_64.
      
      (Thanks to Eric for the previous review.)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: NAni Sinha <ani@aristanetworks.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa1113d9
  5. 29 12月, 2012 5 次提交
  6. 22 12月, 2012 2 次提交
  7. 15 12月, 2012 1 次提交
    • E
      userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847
      Eric W. Biederman 提交于
      Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
      the permissions of setns.  With unprivileged user namespaces it
      became possible to create new namespaces without privilege.
      
      However the setns calls were relaxed to only require CAP_SYS_ADMIN in
      the user nameapce of the targed namespace.
      
      Which made the following nasty sequence possible.
      
      pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
      if (pid == 0) { /* child */
      	system("mount --bind /home/me/passwd /etc/passwd");
      }
      else if (pid != 0) { /* parent */
      	char path[PATH_MAX];
      	snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
      	fd = open(path, O_RDONLY);
      	setns(fd, 0);
      	system("su -");
      }
      
      Prevent this possibility by requiring CAP_SYS_ADMIN
      in the current user namespace when joing all but the user namespace.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      5e4a0847
  8. 12 12月, 2012 3 次提交
  9. 11 12月, 2012 1 次提交
  10. 09 12月, 2012 2 次提交
  11. 08 12月, 2012 2 次提交
  12. 06 12月, 2012 2 次提交
  13. 05 12月, 2012 1 次提交
    • S
      net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD · 4e66ae2e
      Serge Hallyn 提交于
      When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
      to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
      nothing to ns1.
      
      This patch changes that behavior so that when moving a nic from ns1 to ns2, we
      send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
      sent to ns2).
      
      The effects of this can be seen when starting and stopping containers in
      an upstart based host.  Lxc will create a pair of veth nics, the kernel
      sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
      one nic is moved to the container, because no KOBJ_REMOVED event is
      received, the network-instance job for that veth never goes away.  This
      was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
      With this patch the networ-instance jobs properly go away.
      
      The other oddness solved here is that if a nic is passed into a running
      upstart-based container, without this patch no network-instance job is
      started in the container.  But when the container creates a new nic
      itself (ip link add new type veth) then network-interface jobs are
      created.  With this patch, behavior comes in line with a regular host.
      
      v2: also send KOBJ_ADD to new netns.  There will then be a
      _MOVE event from the device_rename() call, but that should
      be innocuous.
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e66ae2e
  14. 01 12月, 2012 1 次提交
    • R
      rtnelink: remove unused parameter from rtnl_create_link(). · c0713563
      Rami Rosen 提交于
      This patch removes an unused parameter (src_net) from rtnl_create_link()
      method and from the method single invocation, in veth.
      This parameter was used in the past when calling
      ops->get_tx_queues(src_net, tb) in rtnl_create_link().
      The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
      get_num_tx_queues() and get_num_rx_queues(), which do not get any
      parameter. This was done in commit d40156aa by
      Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
      Signed-off-by: NRami Rosen <ramirose@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0713563
  15. 30 11月, 2012 1 次提交
  16. 27 11月, 2012 1 次提交
    • B
      sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2
      Brian Haley 提交于
      Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
      will then require another call like if_indextoname() to get the actual interface
      name, have it return the name directly.
      
      This also matches the existing man page description on socket(7) which mentions
      the argument being an interface name.
      
      If the value has not been set, zero is returned and optlen will be set to zero
      to indicate there is no interface name present.
      
      Added a seqlock to protect this code path, and dev_ifname(), from someone
      changing the device name via dev_change_name().
      
      v2: Added seqlock protection while copying device name.
      
      v3: Fixed word wrap in patch.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91f6df2
  17. 22 11月, 2012 6 次提交
  18. 21 11月, 2012 1 次提交
  19. 20 11月, 2012 1 次提交
    • E
      proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman 提交于
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowd by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      98f842e6