1. 29 Apr, 2008 3 commits
  2. 06 Feb, 2008 1 commit
    • capabilities: introduce per-process capability bounding set · 3b7391de
      Committed by Serge E. Hallyn
      The capability bounding set is a set beyond which capabilities cannot grow.
      Currently cap_bset is per-system.  It can be manipulated through sysctl,
      but only init can add capabilities.  Root can remove capabilities.  By
      default it includes all caps except CAP_SETPCAP.
      
      This patch makes the bounding set per-process when file capabilities are
      enabled.  It is inherited at fork from parent.  No one can add elements;
      CAP_SETPCAP is required to remove them.
      
      One example use of this is to start a safer container.  For instance, until
      device namespaces or per-container device whitelists are introduced, it is
      best to take CAP_MKNOD away from a container.
      
      The bounding set will not affect pP and pE immediately.  It will only
      affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
      and exec() does not constrain pI'.  So to really start a shell with no way
      of regaining CAP_MKNOD, you would do
      
      	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
      	cap_t cap = cap_get_proc();
      	cap_value_t caparray[1];
      	caparray[0] = CAP_MKNOD;
      	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
      	cap_set_proc(cap);
      	cap_free(cap);
      
      The following test program will get and set the bounding
      set (but not pI).  For instance
      
      	./bset get
      		(lists capabilities in bset)
      	./bset drop cap_net_raw
      		(starts shell with new bset)
      		(use capset, setuid binary, or binary with
      		file capabilities to try to increase caps)
      
      ************************************************************
      cap_bound.c
      ************************************************************
      #include <sys/prctl.h>
      #include <linux/capability.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #ifndef PR_CAPBSET_READ
      #define PR_CAPBSET_READ 23
      #endif

      #ifndef PR_CAPBSET_DROP
      #define PR_CAPBSET_DROP 24
      #endif
      
      int usage(char *me)
      {
      	printf("Usage: %s get\n", me);
      	printf("       %s drop <capability>\n", me);
      	return 1;
      }
      
      #define numcaps 32
      char *captable[numcaps] = {
      	"cap_chown",
      	"cap_dac_override",
      	"cap_dac_read_search",
      	"cap_fowner",
      	"cap_fsetid",
      	"cap_kill",
      	"cap_setgid",
      	"cap_setuid",
      	"cap_setpcap",
      	"cap_linux_immutable",
      	"cap_net_bind_service",
      	"cap_net_broadcast",
      	"cap_net_admin",
      	"cap_net_raw",
      	"cap_ipc_lock",
      	"cap_ipc_owner",
      	"cap_sys_module",
      	"cap_sys_rawio",
      	"cap_sys_chroot",
      	"cap_sys_ptrace",
      	"cap_sys_pacct",
      	"cap_sys_admin",
      	"cap_sys_boot",
      	"cap_sys_nice",
      	"cap_sys_resource",
      	"cap_sys_time",
      	"cap_sys_tty_config",
      	"cap_mknod",
      	"cap_lease",
      	"cap_audit_write",
      	"cap_audit_control",
      	"cap_setfcap"
      };
      
      int getbcap(void)
      {
      	int comma=0;
      	unsigned long i;
      	int ret;
      
      	printf("i know of %d capabilities\n", numcaps);
      	printf("capability bounding set:");
      	for (i=0; i<numcaps; i++) {
      		ret = prctl(PR_CAPBSET_READ, i);
      		if (ret < 0)
      			perror("prctl");
      		else if (ret==1)
      			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
      	}
      	printf("\n");
      	return 0;
      }
      
      int capdrop(char *str)
      {
      	unsigned long i;
      
      	int found=0;
      	for (i=0; i<numcaps; i++) {
      		if (strcmp(captable[i], str) == 0) {
      			found=1;
      			break;
      		}
      	}
      	if (!found)
      		return 1;
      	if (prctl(PR_CAPBSET_DROP, i)) {
      		perror("prctl");
      		return 1;
      	}
      	return 0;
      }
      
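      int main(int argc, char *argv[])
      {
      	if (argc<2)
      		return usage(argv[0]);
      	if (strcmp(argv[1], "get")==0)
      		return getbcap();
      	if (strcmp(argv[1], "drop")!=0 || argc<3)
      		return usage(argv[0]);
      	/* capdrop() fails for an unknown name or a failed prctl() */
      	if (capdrop(argv[2])) {
      		printf("could not drop capability %s\n", argv[2]);
      		return 1;
      	}
      	/* execl() is variadic, so the terminating NULL needs a cast */
      	execl("/bin/bash", "/bin/bash", (char *)NULL);
      	perror("execl");	/* only reached if exec failed */
      	return 1;
      }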
      ************************************************************
      
      [serue@us.ibm.com: fix typo]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Tested-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Feb, 2008 1 commit
    • [IPV4] route cache: Introduce rt_genid for smooth cache invalidation · 29e75252
      Committed by Eric Dumazet
      Current ip route cache implementation is not suited to large caches.
      
      We can consume a lot of CPU when cache must be invalidated, since we
      currently need to evict all cache entries, and this eviction is
      sometimes asynchronous. min_delay & max_delay can somewhat control this
      asynchronous behavior, but the whole thing is a kludge, regularly triggering
      infamous soft lockup messages. When entries are still in use, this also
      consumes a lot of RAM, filling dst_garbage.list.
      
      A better scheme is to use a generation identifier on each entry,
      so that cache invalidation can be performed by changing the table
      identifier, without having to scan all entries.
      No more delayed flushing, no more stalling when secret_interval expires.
      
      Invalidated entries will then be freed at GC time (controlled by
      ip_rt_gc_timeout or stress), or when an invalidated entry is found
      in a chain when an insert is done.
      Thus we keep a normal equilibrium.
      
      This patch :
      - renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
      - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
      - Checks entry->rt_genid at appropriate places :
  4. 29 Jan, 2008 3 commits
    • sysctl: Infrastructure for per namespace sysctls · e51b6ba0
      Committed by Eric W. Biederman
      This patch implements the basic infrastructure for per namespace sysctls.
      
      A list of lists of sysctl headers is added, allowing each namespace to have
      its own list of sysctl headers.
      
      Each list of sysctl headers has a lookup function to find the first
      sysctl header in the list, allowing the lists to have a per namespace
      instance.
      
      register_sysctl_root is added to tell sysctl.c about additional
      lists of sysctl_headers.  As all of the users are expected to be in
      kernel no unregister function is provided.
      
      sysctl_head_next is updated to walk through the list of lists.
      
      __register_sysctl_paths is added to add a new sysctl table on
      a non-default sysctl list.
      
      The only intrusive part of this patch is propagating the information
      to decide which list of sysctls to use for sysctl_check_table.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sysctl: Remember the ctl_table we passed to register_sysctl_paths · 23eb06de
      Committed by Eric W. Biederman
      By doing this we allow users of register_sysctl_paths that build
      and dynamically allocate their ctl_table to be simpler.  This allows
      them to just remember the ctl_table_header returned from
      register_sysctl_paths from which they can now find the
      ctl_table array they need to free.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sysctl: Add register_sysctl_paths function · 29e796fd
      Committed by Eric W. Biederman
      There are a number of modules that register a sysctl table
      somewhere deeply nested in the sysctl hierarchy, such as
      fs/nfs, fs/xfs, dev/cdrom, etc.
      
      They all specify several dummy ctl_tables for the path name.
      This patch implements register_sysctl_paths that takes
      an additional path name, and makes up dummy sysctl nodes
      for each component.
      
      This patch was originally written by Olaf Kirch and
      brought to my attention and reworked some by Olaf Hering.
      I have changed a few additional things so the bugs are mine.
      
      After converting all of the easy callers, Olaf Hering observed that with
      allyesconfig ARCH=i386 the patch reduces the final binary size by 6111 bytes.
      
      .text +897
      .data -7008
      
          text     data      bss       dec      hex  filename
      26959310  4045899  4718592  35723801  2211a19  ../vmlinux-vanilla
      26960207  4038891  4718592  35717690  221023a  ../O-allyesconfig/vmlinux
      
      So this change is both a space savings and a code simplification.
      
      CC: Olaf Kirch <okir@suse.de>
      CC: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 20 Nov, 2007 2 commits
  6. 19 Oct, 2007 4 commits
  7. 01 Aug, 2007 1 commit
  8. 26 Apr, 2007 4 commits
  9. 25 Apr, 2007 1 commit
  10. 15 Feb, 2007 13 commits
  11. 09 Feb, 2007 1 commit
  12. 13 Dec, 2006 1 commit
  13. 11 Dec, 2006 1 commit
  14. 03 Dec, 2006 4 commits