1. 25 Jan, 2012 11 commits
    • E
      sysctl: Stop requiring explicit management of sysctl directories · 7ec66d06
      Eric W. Biederman committed
      Simplify the code and the sysctl semantics by autogenerating
      sysctl directories when a sysctl table is registered that needs
      the directories and autodeleting the directories when there are
      no more sysctl tables registered that need them.
      
      Autogenerating directories keeps sysctl tables from depending
      on each other, removing all of the arcane register/unregister
      ordering constraints and making it impossible to get the order
      wrong when registering and unregistering sysctl tables.
      
      Autogenerating directories yields one unique entity that dentries
      can point to, retaining the current effective use of the dcache.
      
      Add struct ctl_dir as the type of these new autogenerated
      directories.
      
      The attached_by and attached_to fields in ctl_table_header are
      removed as they are no longer needed.
      
      The child field in ctl_table is no longer needed by the core of
      the sysctl code.  ctl_table.child can be removed once all of the
      existing users have been updated.
      
      Benchmark before:
          make-dummies 0 999 -> 0.7s
          rmmod dummy        -> 0.07s
          make-dummies 0 9999 -> 1m10s
          rmmod dummy         -> 0.4s
      
      Benchmark after:
          make-dummies 0 999 -> 0.44s
          rmmod dummy        -> 0.065s
          make-dummies 0 9999 -> 1m36s
          rmmod dummy         -> 0.4s
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
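The autogenerate/autodelete behaviour described in this commit can be modelled in user space. The following is a minimal sketch, not the kernel implementation: `toy_dir`, `toy_register`, `toy_unregister` and `dir_exists` are hypothetical names, and a fixed array of path strings stands in for whatever structure the kernel's `struct ctl_dir` actually uses. A directory exists only while some registered table needs it.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 32
#define MAX_PATH 64

/* Toy directory (hypothetical): exists only while a registered table needs it. */
struct toy_dir {
	char path[MAX_PATH];
	int users;              /* tables registered at or below this path */
};

static struct toy_dir dirs[MAX_DIRS];

/* Find a live directory whose path equals the first len bytes of path. */
static struct toy_dir *find_dir(const char *path, size_t len)
{
	for (int i = 0; i < MAX_DIRS; i++) {
		if (dirs[i].users && strlen(dirs[i].path) == len &&
		    memcmp(dirs[i].path, path, len) == 0)
			return &dirs[i];
	}
	return NULL;
}

/* Add delta users to one directory, creating it on first use.  A slot
 * whose count falls back to zero is treated as deleted and reusable. */
static int adjust_dir(const char *path, size_t len, int delta)
{
	struct toy_dir *d = find_dir(path, len);

	if (!d && delta > 0) {
		for (int i = 0; i < MAX_DIRS && !d; i++) {
			if (!dirs[i].users)
				d = &dirs[i];
		}
		if (!d || len >= MAX_PATH)
			return -1;
		memcpy(d->path, path, len);
		d->path[len] = '\0';
	}
	if (!d)
		return -1;
	d->users += delta;
	assert(d->users >= 0);
	return 0;
}

/* Visit every prefix of path: "a/b/c" touches a, a/b and a/b/c. */
static int walk_prefixes(const char *path, int delta)
{
	size_t n = strlen(path);

	for (size_t i = 1; i <= n; i++) {
		if ((i == n || path[i] == '/') && adjust_dir(path, i, delta))
			return -1;
	}
	return 0;
}

static int toy_register(const char *path)   { return walk_prefixes(path, +1); }
static int toy_unregister(const char *path) { return walk_prefixes(path, -1); }

static int dir_exists(const char *path)
{
	return find_dir(path, strlen(path)) != NULL;
}
```

Because every table takes its own count on each directory along its path, tables can be registered and unregistered in any order: a directory goes away only when the last table that needs it is gone.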
    • E
      sysctl: Add a root pointer to ctl_table_set · 9eb47c26
      Eric W. Biederman committed
      Add a ctl_table_root pointer to ctl_table_set so it is easy to
      go from a ctl_table_set to a ctl_table_root.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Initial support for auto-unregistering sysctl tables. · 938aaa4f
      Eric W. Biederman committed
      Add nreg to ctl_table_header.  When nreg drops to 0 the ctl_table_header
      will be unregistered.
      
      Factor out drop_sysctl_table from unregister_sysctl_table, and add
      the logic for decrementing nreg.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Remove the now unused ctl_table parent field. · 8d6ecfcc
      Eric W. Biederman committed
      While useful at one time for selinux and the sysctl sanity
      checks, those users no longer use the parent field and we can
      safely remove it.
      Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmil.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: register only tables of sysctl files · f728019b
      Eric W. Biederman committed
      Split the registration of a complex ctl_table array which may have
      arbitrary numbers of directories (->child != NULL) and tables of files
      into a series of simpler registrations that only register tables of files.
      
      Graphically:
      
         register('dir', { + file-a
                           + file-b
                           + subdir1
                             + file-c
                           + subdir2
                             + file-d
                             + file-e })
      
      is transformed into:
         wrapper->subheaders[0] = register('dir', {file-a, file-b})
         wrapper->subheaders[1] = register('dir/subdir1', {file-c})
         wrapper->subheaders[2] = register('dir/subdir2', {file-d, file-e})
         return wrapper
      
      This guarantees that __register_sysctl_table will only see a simple
      ctl_table array with all entries having (->child == NULL).
      
      Care was taken to pass the original simple ctl_table arrays to
      __register_sysctl_table whenever possible.
      
      This change is derived from a similar patch written
      by Lucian Grijincu.
      Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
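The transformation drawn above can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: `toy_table`, `toy_reg`, `regs` and `register_leaves` are hypothetical names, and each "registration" is simply recorded in an array instead of calling __register_sysctl_table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for ctl_table (hypothetical): an entry is a
 * directory when child is non-NULL, otherwise a file.  Arrays end with
 * an entry whose procname is NULL. */
struct toy_table {
	const char *procname;
	const struct toy_table *child;
};

#define MAX_REGS 16

/* One flat registration: a directory path and the number of files in it. */
struct toy_reg {
	char path[128];
	int nr_files;
};

static struct toy_reg regs[MAX_REGS];
static int nr_regs;

/* Record the files of table under path, then recurse into each
 * subdirectory, so every recorded registration contains files only. */
static int register_leaves(const char *path, const struct toy_table *table)
{
	const struct toy_table *e;
	int files = 0;

	for (e = table; e->procname; e++) {
		if (!e->child)
			files++;
	}

	if (files) {
		if (nr_regs == MAX_REGS)
			return -1;
		snprintf(regs[nr_regs].path, sizeof(regs[nr_regs].path), "%s", path);
		regs[nr_regs].nr_files = files;
		nr_regs++;
	}

	for (e = table; e->procname; e++) {
		char sub[128];
		int n;

		if (!e->child)
			continue;
		n = snprintf(sub, sizeof(sub), "%s/%s", path, e->procname);
		if (n < 0 || n >= (int)sizeof(sub))
			return -1;      /* path too long for this sketch */
		if (register_leaves(sub, e->child))
			return -1;
	}
	return 0;
}
```

Fed the 'dir' example from the commit message, this records three flat registrations: 'dir' with two files, 'dir/subdir1' with one, and 'dir/subdir2' with two.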
    • E
      sysctl: Add support for registering sysctl tables with a normal cstring path. · 6e9d5164
      Eric W. Biederman committed
      Make __register_sysctl_table the core sysctl registration operation and
      make it take a char * string as path.
      
      Now that binary paths have been banished into the realm of backwards
      compatibility in kernel/binary_sysctl.c, where they can be safely
      ignored, there is no longer a need to use struct ctl_path to represent
      path names when registering ctl_tables.
      
      Start the transition to using normal char * strings to represent
      pathnames when registering sysctl tables.  Normal strings are easier
      to deal with both in the internal sysctl implementation and for
      programmers registering sysctl tables.
      
      __register_sysctl_paths is turned into a backwards compatibility wrapper
      that converts a ctl_path array into a normal char * string.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
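The conversion performed by the compatibility wrapper, from a ctl_path array to one '/'-separated string, might look roughly like this in user space. It is a sketch only: the `struct ctl_path` below is a local stand-in assumed to be an array terminated by an entry with a NULL procname, and `ctl_path_to_string` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel's struct ctl_path (assumption): an array
 * of component names terminated by an entry whose procname is NULL. */
struct ctl_path {
	const char *procname;
};

/* Join the components with '/' into buf.  Returns 0 on success, or -1 if
 * the result, including the terminating NUL, does not fit in len bytes. */
static int ctl_path_to_string(const struct ctl_path *path, char *buf, size_t len)
{
	size_t used = 0;

	assert(path != NULL && buf != NULL);
	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (; path->procname; path++) {
		/* Prefix every component except the first with a slash. */
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "/" : "", path->procname);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

With the components "net", "ipv4" and "conf" this produces the string "net/ipv4/conf", the form a string-based registration call would take directly.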
    • E
      sysctl: Remove the unnecessary sysctl_set parent concept. · bd295b56
      Eric W. Biederman committed
      In sysctl_net register the two networking roots in the proper order.
      
      In register_sysctl walk the sysctl sets in the reverse order of the
      sysctl roots.
      
      Remove parent from ctl_table_set and setup_sysctl_set as it is no
      longer needed.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Implement retire_sysctl_set · 97324cd8
      Eric W. Biederman committed
      Add a small helper, retire_sysctl_set, to remove the intimate knowledge
      of how a sysctl_set is implemented from net/sysctl_net.c.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Move the implementation into fs/proc/proc_sysctl.c · 1f87f0b5
      Eric W. Biederman committed
      Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c
      into fs/proc/proc_sysctl.c.
      
      Currently sysctl maintenance is hampered by the sysctl implementation
      being split across 3 files with artificial layering between them.
      Consolidate the entire sysctl implementation into 1 file so that
      it is easier to see what is going on and, hopefully, simpler to
      maintain.
      
      For functions that are now only used in fs/proc/proc_sysctl.c, remove
      their declarations from sysctl.h and make them static in
      fs/proc/proc_sysctl.c.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Register the base sysctl table like any other sysctl table. · de4e83bd
      Eric W. Biederman committed
      Simplify the code by treating the base sysctl table like any other
      sysctl table and register it with register_sysctl_table.
      
      To ensure this table is registered early enough to avoid problems,
      call sysctl_init from proc_sys_init.
      
      Rename sysctl_net.c:sysctl_init() to net_sysctl_init() to avoid
      name conflicts now that kernel/sysctl.c:sysctl_init() is no longer
      static.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      sysctl: Consolidate !CONFIG_SYSCTL handling · 0ce8974d
      Eric W. Biederman committed
      - In sysctl.h move functions only available if CONFIG_SYSCTL
        is defined inside of #ifdef CONFIG_SYSCTL
      
      - Move the stub function definitions for !CONFIG_SYSCTL
        into sysctl.h and make them static inlines.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  2. 04 Jan, 2012 1 commit
  3. 03 Nov, 2011 1 commit
  4. 04 Oct, 2011 1 commit
  5. 10 Mar, 2011 2 commits
  6. 08 Mar, 2011 1 commit
    • A
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro committed
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there are no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      That said, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 16 May, 2010 1 commit
  8. 17 Feb, 2010 2 commits
  9. 07 Jan, 2010 1 commit
    • J
      net: RFC3069, private VLAN proxy arp support · 65324144
      Jesper Dangaard Brouer committed
      This is to be used together with switch technologies, like RFC3069,
      where the individual ports are not allowed to communicate with
      each other, but they are allowed to talk to the upstream router.  As
      described in RFC 3069, it is possible to allow these hosts to
      communicate through the upstream router by proxy_arp'ing.
      
      This patch basically allows proxy arp replies back to the same
      interface (from which the ARP request/solicitation was received).
      
      Tunable per device via proc "proxy_arp_pvlan":
        /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
      
      This switch technology is known by different vendor names:
       - In RFC 3069 it is called VLAN Aggregation.
       - Cisco and Allied Telesyn call it Private VLAN.
       - Hewlett-Packard call it Source-Port filtering or port-isolation.
       - Ericsson call it MAC-Forced Forwarding (RFC Draft).
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 26 Dec, 2009 1 commit
    • J
      net: restore ip source validation · 28f6aeea
      Jamal Hadi Salim committed
      When using policy routing and the skb mark,
      there are cases where back path validation requires us
      to use a different routing table for src ip validation than
      the one used for mapping ingress dst ip.
      One such case is transparent proxying, where we pretend to be
      the destination system and therefore the local table
      is used for incoming packets but possibly a main table would
      be used on outbound.
      Make the default behavior allow the above; users who need the
      symmetry can turn it on via the sysctl src_valid_mark.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 04 Dec, 2009 2 commits
  12. 19 Nov, 2009 1 commit
  13. 18 Nov, 2009 1 commit
  14. 12 Nov, 2009 1 commit
  15. 11 Nov, 2009 1 commit
  16. 06 Nov, 2009 1 commit
  17. 24 Sep, 2009 1 commit
  18. 01 Feb, 2009 1 commit
  19. 17 Oct, 2008 1 commit
  20. 27 Jul, 2008 4 commits
    • A
      [PATCH] sanitize proc_sysctl · 9043476f
      Al Viro committed
      * keep references to ctl_table_head and ctl_table in /proc/sys inodes
      * grab the former during operations, use the latter for access to
        entry if that succeeds
      * have ->d_compare() check if table should be seen for one who does lookup;
        that allows us to avoid flipping inodes - if we have the same name resolve
        to different things, we'll just keep several dentries and ->d_compare()
        will reject the wrong ones.
      * have ->lookup() and ->readdir() scan the table of our inode first, then
        walk all ctl_table_header and scan ->attached_by for those that are
        attached to our directory.
      * implement ->getattr().
      * get rid of insane amounts of tree-walking
      * get rid of the need to know dentry in ->permission() and of the contortions
        induced by that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • A
      [PATCH] sysctl: keep track of tree relationships · ae7edecc
      Al Viro committed
      In a sense, that's the heart of the series.  It's based on the following
      property of the trees we are actually asked to add: they can be split into
      stem that is already covered by registered trees and crown that is entirely
      new.  IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
      introduced by it.
      
      That allows us to associate a tree and table entry with each node in the
      union; while directory nodes might be covered by many trees, only one will
      cover the node by its crown.  And that will allow much saner logic for
      /proc/sys in the next patches.  This patch introduces the data structures
      needed to keep track of that.
      
      When adding a sysctl table, we find a "parent" one.  Which is to say,
      find the deepest node on its stem that already is present in one of the
      tables from our table set or its ancestor sets.  That table will be our
      parent and that node in it is the attachment point.  Add our table to the
      list anchored in the parent, have it refer to the parent and the contents
      of the attachment point.  Also remember where its crown lives.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
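The stem/crown split can be illustrated with a small user-space function that finds where the already-covered stem of a new path ends. This is only a sketch of the idea: `is_known` and `stem_length` are hypothetical helpers, and a flat, NULL-terminated list of directory paths stands in for the registered tables of a table set and its ancestors.

```c
#include <assert.h>
#include <string.h>

/* Does the NULL-terminated list known contain exactly the first len
 * bytes of path? */
static int is_known(const char *const *known, const char *path, size_t len)
{
	for (; *known; known++) {
		if (strlen(*known) == len && memcmp(*known, path, len) == 0)
			return 1;
	}
	return 0;
}

/* Length of the stem: the longest prefix of path that ends on a
 * component boundary and is already known.  Everything after it is the
 * crown.  Returns 0 when even the first component is new. */
static size_t stem_length(const char *const *known, const char *path)
{
	size_t n = strlen(path), stem = 0;

	for (size_t i = 1; i <= n; i++) {
		if (i != n && path[i] != '/')
			continue;       /* not at a component boundary */
		if (!is_known(known, path, i))
			break;          /* the crown starts here */
		stem = i;
	}
	return stem;
}
```

The deepest known prefix plays the role of the attachment point: for a known set of "net" and "net/ipv4", the path "net/ipv4/conf/all" has the stem "net/ipv4" and the crown "conf/all".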
    • A
      [PATCH] allow delayed freeing of ctl_table_header · f7e6ced4
      Al Viro committed
      Refcount the sucker; instead of freeing it by the end of unregistration
      just drop the refcount and free only when it hits zero.  Make sure that
      we _always_ make ->unregistering non-NULL in start_unregistering().
      
      That allows anybody to get a reference to such puppy, preventing its
      freeing and reuse.  It does *not* block unregistration.  Anybody who
      holds such a reference can
      	* try to grab a "use" reference (ctl_head_grab()); that will
      succeed if and only if it hadn't entered unregistration yet.  If it
      succeeds, we can use it in all normal ways until we release the "use"
      reference (with ctl_head_finish()).  Note that this relies on having
      ->unregistering become non-NULL in all cases when one starts to unregister
      the sucker.
      	* keep pointers to ctl_table entries; they *can* be freed if
      the entire thing is unregistered.  However, if ctl_head_grab() succeeds,
      we know that unregistration had not happened (and will not happen until
      ctl_head_finish()) and such pointers can be used safely.
      
      IOW, now we can have inodes under /proc/sys keep references to ctl_table
      entries, protecting them with references to ctl_table_header and
      grabbing the latter for the duration of operations that require access
      to ctl_table.  That won't cause deadlocks, since unregistration will not
      be stopped by mere keeping a reference to ctl_table_header.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
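The two kinds of reference described above, a plain reference that only delays freeing and a "use" reference that can be taken only before unregistration starts, can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: `toy_header` and the `toy_*` functions are hypothetical, only loosely analogous to ctl_head_grab() and ctl_head_finish(), and the locking and waiting that real code would need are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy header (hypothetical): count keeps the memory alive, used tracks
 * operations currently in flight. */
struct toy_header {
	int count;           /* plain references; memory is freed at zero */
	int used;            /* active "use" references */
	bool unregistering;  /* set once unregistration has started */
};

static int freed_headers;    /* lets a caller observe when a free happens */

static struct toy_header *toy_header_new(void)
{
	struct toy_header *h = calloc(1, sizeof(*h));

	if (h)
		h->count = 1;    /* the reference owned by the registration */
	return h;
}

/* Take a plain reference: prevents freeing, does not block unregistration. */
static void toy_get(struct toy_header *h)
{
	h->count++;
}

/* Drop a plain reference; the last one frees the header. */
static void toy_put(struct toy_header *h)
{
	assert(h->count > 0);
	if (--h->count == 0) {
		free(h);
		freed_headers++;
	}
}

/* Loose analogue of ctl_head_grab(): succeeds only if unregistration
 * has not started yet. */
static bool toy_grab(struct toy_header *h)
{
	if (h->unregistering)
		return false;
	h->used++;
	return true;
}

/* Loose analogue of ctl_head_finish(): release the "use" reference. */
static void toy_finish(struct toy_header *h)
{
	assert(h->used > 0);
	h->used--;
}

/* Start unregistration and drop the registration's own reference.  Real
 * code would wait for used to reach zero; this sketch requires it. */
static void toy_unregister(struct toy_header *h)
{
	assert(h->used == 0);
	h->unregistering = true;
	toy_put(h);
}
```

A holder of a plain reference can still safely call `toy_grab()` after `toy_unregister()`; the call fails instead of touching freed memory, and the header is freed only when that last plain reference is dropped.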
    • A
      [PATCH] beginning of sysctl cleanup - ctl_table_set · 73455092
      Al Viro committed
      New object: set of sysctls [currently - root and per-net-ns].
      Contains: pointer to parent set, list of tables and "should I see this set?"
      method (->is_seen(set)).
      Current lists of tables are subsumed by that; net-ns contains such a beast.
      ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
      that to ->list of that ctl_table_set.
      
      [folded compile fixes by rdd for configs without sysctl]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  21. 29 Apr, 2008 3 commits
  22. 06 Feb, 2008 1 commit
    • S
      capabilities: introduce per-process capability bounding set · 3b7391de
      Serge E. Hallyn committed
      The capability bounding set is a set beyond which capabilities cannot grow.
      Currently cap_bset is per-system.  It can be manipulated through sysctl,
      but only init can add capabilities.  Root can remove capabilities.  By
      default it includes all caps except CAP_SETPCAP.
      
      This patch makes the bounding set per-process when file capabilities are
      enabled.  It is inherited at fork from parent.  No one can add elements;
      CAP_SETPCAP is required to remove them.
      
      One example use of this is to start a safer container.  For instance, until
      device namespaces or per-container device whitelists are introduced, it is
      best to take CAP_MKNOD away from a container.
      
      The bounding set will not affect pP and pE immediately.  It will only
      affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
      and exec() does not constrain pI'.  So to really start a shell with no way
      of regaining CAP_MKNOD, you would do
      
      	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
      	cap_t cap = cap_get_proc();
      	cap_value_t caparray[1];
      	caparray[0] = CAP_MKNOD;
      	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
      	cap_set_proc(cap);
      	cap_free(cap);
      
      The following test program will get and set the bounding
      set (but not pI).  For instance
      
      	./bset get
      		(lists capabilities in bset)
      	./bset drop cap_net_raw
      		(starts shell with new bset)
      		(use capset, setuid binary, or binary with
      		file capabilities to try to increase caps)
      
      ************************************************************
      cap_bound.c
      ************************************************************
      #include <sys/prctl.h>
      #include <linux/capability.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      #ifndef PR_CAPBSET_READ
      #define PR_CAPBSET_READ 23
      #endif
      
      #ifndef PR_CAPBSET_DROP
      #define PR_CAPBSET_DROP 24
      #endif
      
      int usage(char *me)
      {
      	printf("Usage: %s get\n", me);
      	printf("       %s drop <capability>\n", me);
      	return 1;
      }
      
      #define numcaps 32
      char *captable[numcaps] = {
      	"cap_chown",
      	"cap_dac_override",
      	"cap_dac_read_search",
      	"cap_fowner",
      	"cap_fsetid",
      	"cap_kill",
      	"cap_setgid",
      	"cap_setuid",
      	"cap_setpcap",
      	"cap_linux_immutable",
      	"cap_net_bind_service",
      	"cap_net_broadcast",
      	"cap_net_admin",
      	"cap_net_raw",
      	"cap_ipc_lock",
      	"cap_ipc_owner",
      	"cap_sys_module",
      	"cap_sys_rawio",
      	"cap_sys_chroot",
      	"cap_sys_ptrace",
      	"cap_sys_pacct",
      	"cap_sys_admin",
      	"cap_sys_boot",
      	"cap_sys_nice",
      	"cap_sys_resource",
      	"cap_sys_time",
      	"cap_sys_tty_config",
      	"cap_mknod",
      	"cap_lease",
      	"cap_audit_write",
      	"cap_audit_control",
      	"cap_setfcap"
      };
      
      int getbcap(void)
      {
      	int comma=0;
      	unsigned long i;
      	int ret;
      
      	printf("i know of %d capabilities\n", numcaps);
      	printf("capability bounding set:");
      	for (i=0; i<numcaps; i++) {
      		ret = prctl(PR_CAPBSET_READ, i);
      		if (ret < 0)
      			perror("prctl");
      		else if (ret==1)
      			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
      	}
      	printf("\n");
      	return 0;
      }
      
      int capdrop(char *str)
      {
      	unsigned long i;
      
      	int found=0;
      	for (i=0; i<numcaps; i++) {
      		if (strcmp(captable[i], str) == 0) {
      			found=1;
      			break;
      		}
      	}
      	if (!found)
      		return 1;
      	if (prctl(PR_CAPBSET_DROP, i)) {
      		perror("prctl");
      		return 1;
      	}
      	return 0;
      }
      
      int main(int argc, char *argv[])
      {
      	if (argc<2)
      		return usage(argv[0]);
      	if (strcmp(argv[1], "get")==0)
      		return getbcap();
      	if (strcmp(argv[1], "drop")!=0 || argc<3)
      		return usage(argv[0]);
      	if (capdrop(argv[2])) {
      		printf("unknown capability\n");
      		return 1;
      	}
      	return execl("/bin/bash", "/bin/bash", (char *)NULL);
      }
      ************************************************************
      
      [serue@us.ibm.com: fix typo]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Tested-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>