1. 10 Mar, 2011 · 1 commit
  2. 08 Mar, 2011 · 1 commit
    • unfuck proc_sysctl ->d_compare() · dfef6dcd
      Committed by Al Viro
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there are no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      That said, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
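A minimal user-space sketch of the ordering described in a), assuming toy stand-ins for ctl_table_header and the /proc inode (toy_header, toy_inode, toy_evict and toy_d_compare are hypothetical names, not kernel API). It shows the two rules from the commit: clear the inode's pointer before dropping the reference, and have the compare callback report "no match" (non-zero) when the inode or the pointer is NULL. The delayed freeing that makes the real code safe against concurrent callers is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not kernel types. */
struct toy_header {
    int refcount;      /* references keeping the header allocated */
    int unregistering; /* non-zero once unregistration has started */
};

struct toy_inode {
    struct toy_header *sysctl; /* plays the role of PROC_I(inode)->sysctl */
};

/* Drop one reference. In the kernel the final free is delayed, so a
 * concurrent ->d_compare() that already loaded the pointer stays safe;
 * this toy only counts. */
static void toy_header_put(struct toy_header *h)
{
    h->refcount--;
}

/* Ordering from the commit: clear the pointer first, then drop the ref. */
static void toy_evict(struct toy_inode *inode)
{
    struct toy_header *h = inode->sysctl;

    inode->sysctl = NULL;
    if (h)
        toy_header_put(h);
}

/* ->d_compare()-style result: 0 means "match", non-zero "no match".
 * A missing inode or a cleared pointer must never produce a match. */
static int toy_d_compare(const struct toy_inode *inode,
                         const char *name, const char *wanted)
{
    if (!inode || !inode->sysctl)
        return 1;
    if (inode->sysctl->unregistering)
        return 1;
    return strcmp(name, wanted) != 0;
}
```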
  3. 16 May, 2010 · 1 commit
  4. 17 Feb, 2010 · 2 commits
  5. 07 Jan, 2010 · 1 commit
    • net: RFC3069, private VLAN proxy arp support · 65324144
      Committed by Jesper Dangaard Brouer
      This is to be used together with switch technologies, like RFC 3069,
      where the individual ports are not allowed to communicate with
      each other, but they are allowed to talk to the upstream router.  As
      described in RFC 3069, it is possible to allow these hosts to
      communicate through the upstream router by proxy_arp'ing.
      
      This patch basically allows proxy arp replies back to the same
      interface (from which the ARP request/solicitation was received).
      
      Tunable per device via proc "proxy_arp_pvlan":
        /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
      
      This switch technology is known by different vendor names:
       - In RFC 3069 it is called VLAN Aggregation.
       - Cisco and Allied Telesyn call it Private VLAN.
       - Hewlett-Packard call it Source-Port filtering or port-isolation.
       - Ericsson call it MAC-Forced Forwarding (RFC Draft).
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
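A small sketch for flipping the per-device tunable from C, assuming procfs is mounted at /proc and the caller is allowed to write sysctls; pvlan_path and set_proxy_arp_pvlan are hypothetical helper names. The `*` in the path above stands for a device directory under /proc/sys/net/ipv4/conf/. Whether other settings must also be enabled before replies are actually sent is outside the scope of this sketch.

```c
#include <stdio.h>

/* Build the per-device path of the proxy_arp_pvlan tunable.
 * Returns 0 on success, -1 if the buffer is too small. */
static int pvlan_path(char *buf, size_t len, const char *dev)
{
    int n = snprintf(buf, len, "/proc/sys/net/ipv4/conf/%s/proxy_arp_pvlan", dev);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the tunable. Needs privileges and a kernel that has
 * this knob; returns -1 (after printing the reason) on any failure. */
static int set_proxy_arp_pvlan(const char *dev, int on)
{
    char path[256];
    FILE *f;

    if (pvlan_path(path, sizeof(path), dev) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    if (fputs(on ? "1\n" : "0\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, set_proxy_arp_pvlan("eth0", 1) writes 1 to /proc/sys/net/ipv4/conf/eth0/proxy_arp_pvlan; if that device directory does not exist, fopen() fails and the helper returns -1.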
  6. 26 Dec, 2009 · 1 commit
    • net: restore ip source validation · 28f6aeea
      Committed by Jamal Hadi Salim
      When using policy routing and the skb mark, there are cases where
      back-path validation requires us to use a different routing table for
      source IP validation than the one used for mapping the ingress
      destination IP.
      One such case is transparent proxying, where we pretend to be
      the destination system and therefore the local table is used for
      incoming packets, but the main table would possibly be used on
      outbound.
      Make the default behavior allow the above; users who need symmetry
      can turn it on via the src_valid_mark sysctl.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
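A toy model of the table choice described above, assuming a plain "fwmark M selects table T" rule list (as set up with something like `ip rule add fwmark 1 table 100`). toy_rule, toy_table_for_mark and toy_src_check_table are hypothetical names, and real policy rules also match on addresses, interfaces and priorities. The point is only which mark feeds the reverse-path lookup: with src_valid_mark off the mark is ignored, so validation can use the main table even though the ingress lookup used a mark-selected one.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAIN_TABLE 254  /* id of the "main" routing table */

/* Hypothetical policy rule: packets carrying this mark use this table. */
struct toy_rule {
    uint32_t mark;
    int table;
};

/* Pick a table for a mark; mark 0 or no matching rule means "main". */
static int toy_table_for_mark(const struct toy_rule *rules, size_t n, uint32_t mark)
{
    for (size_t i = 0; i < n; i++)
        if (mark != 0 && rules[i].mark == mark)
            return rules[i].table;
    return TOY_MAIN_TABLE;
}

/* Table used for source (reverse-path) validation. Default (0): ignore
 * the mark, allowing the asymmetric setup. Enabled: reuse the mark, so
 * validation is symmetric with the ingress lookup. */
static int toy_src_check_table(const struct toy_rule *rules, size_t n,
                               uint32_t skb_mark, int src_valid_mark)
{
    return toy_table_for_mark(rules, n, src_valid_mark ? skb_mark : 0);
}
```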
  7. 04 Dec, 2009 · 2 commits
  8. 19 Nov, 2009 · 1 commit
  9. 18 Nov, 2009 · 1 commit
  10. 12 Nov, 2009 · 1 commit
  11. 11 Nov, 2009 · 1 commit
  12. 06 Nov, 2009 · 1 commit
  13. 24 Sep, 2009 · 1 commit
  14. 01 Feb, 2009 · 1 commit
  15. 17 Oct, 2008 · 1 commit
  16. 27 Jul, 2008 · 4 commits
    • [PATCH] sanitize proc_sysctl · 9043476f
      Committed by Al Viro
      * keep references to ctl_table_head and ctl_table in /proc/sys inodes
      * grab the former during operations, use the latter for access to
        entry if that succeeds
      * have ->d_compare() check whether the table should be seen by whoever is
        doing the lookup; that allows us to avoid flipping inodes - if we have
        the same name resolve to different things, we'll just keep several
        dentries and ->d_compare() will reject the wrong ones.
      * have ->lookup() and ->readdir() scan the table of our inode first, then
        walk all ctl_table_headers and scan ->attached_by for those that are
        attached to our directory.
      * implement ->getattr().
      * get rid of insane amounts of tree-walking
      * get rid of the need to know dentry in ->permission() and of the contortions
        induced by that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
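A toy model of the ->d_compare() bullet, assuming "should be seen" simply means "the backing table belongs to the viewer's namespace" (the real check asks the table's set). Several cached entries share one name, and the compare step rejects those backed by a table the viewer cannot see, so no inode ever has to be flipped. toy_dentry, toy_compare and toy_lookup are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cached entry: a name plus the namespace owning its table. */
struct toy_dentry {
    const char *name;
    int table_ns;
};

/* d_compare-style: 0 = match. Entries whose table the viewer cannot see
 * are rejected even when the name is identical. */
static int toy_compare(const struct toy_dentry *d, const char *name, int viewer_ns)
{
    if (d->table_ns != viewer_ns)
        return 1;
    return strcmp(d->name, name) != 0;
}

/* Return the first cached entry that matches for this viewer, or NULL
 * (a real ->lookup() would then build a fresh entry). */
static const struct toy_dentry *toy_lookup(const struct toy_dentry *cache, size_t n,
                                           const char *name, int viewer_ns)
{
    for (size_t i = 0; i < n; i++)
        if (toy_compare(&cache[i], name, viewer_ns) == 0)
            return &cache[i];
    return NULL;
}
```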
    • [PATCH] sysctl: keep track of tree relationships · ae7edecc
      Committed by Al Viro
      In a sense, that's the heart of the series.  It's based on the following
      property of the trees we are actually asked to add: they can be split into
      a stem that is already covered by registered trees and a crown that is entirely
      new.  IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
      introduced by it.
      
      That allows us to associate a tree and a table entry with each node in the union;
      while directory nodes might be covered by many trees, only one will cover
      the node by its crown.  And that will allow much saner logic for /proc/sys
      in the next patches.  This patch introduces the data structures needed to
      keep track of that.
      
      When adding a sysctl table, we find a "parent" one.  Which is to say,
      find the deepest node on its stem that already is present in one of the
      tables from our table set or its ancestor sets.  That table will be our
      parent and that node in it - the attachment point.  Add our table to the list
      anchored in the parent, and have it refer to the parent and to the contents
      of the attachment point.  Also remember where its crown lives.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
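The stem/crown split can be illustrated with plain path strings, assuming a flat list of directories that registered tables already provide (toy_is_known and toy_stem_len are hypothetical; the kernel works on ctl_table trees and table sets instead). The deepest known prefix is where the new table would attach; because the crown is entirely new, the search can stop at the first unknown component.

```c
#include <stddef.h>
#include <string.h>

/* 1 if the first len bytes of path name a directory some registered
 * table already provides; known[] holds entries like "net", "net/ipv4". */
static int toy_is_known(const char *const *known, size_t n,
                        const char *path, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(known[i]) == len && strncmp(known[i], path, len) == 0)
            return 1;
    return 0;
}

/* Length of the stem: the longest already-registered prefix of path that
 * ends on a component boundary. Everything after it is the crown; 0 means
 * the new table attaches directly at the root. */
static size_t toy_stem_len(const char *const *known, size_t n, const char *path)
{
    size_t len = strlen(path), stem = 0;

    for (size_t i = 1; i <= len; i++) {
        if (path[i] != '/' && path[i] != '\0')
            continue;
        if (!toy_is_known(known, n, path, i))
            break;      /* crown starts here: nothing deeper is known */
        stem = i;
    }
    return stem;
}
```

With known = {"net", "net/ipv4"}, the path "net/ipv4/neigh/eth0" gets a stem of "net/ipv4" (length 8), so "neigh/eth0" is the crown.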
    • [PATCH] allow delayed freeing of ctl_table_header · f7e6ced4
      Committed by Al Viro
      Refcount the sucker; instead of freeing it by the end of unregistration
      just drop the refcount and free only when it hits zero.  Make sure that
      we _always_ make ->unregistering non-NULL in start_unregistering().
      
      That allows anybody to get a reference to such a puppy, preventing its
      freeing and reuse.  It does *not* block unregistration.  Anybody who
      holds such a reference can
      	* try to grab a "use" reference (ctl_head_grab()); that will
      succeed if and only if it hasn't entered unregistration yet.  If it
      succeeds, we can use it in all normal ways until we release the "use"
      reference (with ctl_head_finish()).  Note that this relies on having
      ->unregistering become non-NULL in all cases when one starts to unregister
      the sucker.
      	* keep pointers to ctl_table entries; they *can* be freed if
      the entire thing is unregistered.  However, if ctl_head_grab() succeeds,
      we know that unregistration had not happened (and will not happen until
      ctl_head_finish()) and such pointers can be used safely.
      
      IOW, now we can have inodes under /proc/sys keep references to ctl_table
      entries, protecting them with references to ctl_table_header and
      grabbing the latter for the duration of operations that require access
      to ctl_table.  That won't cause deadlocks, since unregistration will not
      be stopped by merely keeping a reference to ctl_table_header.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
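A single-threaded sketch of the two kinds of reference described above, using hypothetical names (toy_head, toy_head_grab, toy_head_finish and so on) loosely mirroring the commit's ctl_head_grab()/ctl_head_finish(). The kernel protects these counters with a lock and makes unregistration wait for outstanding "use" references; both are left out here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not the kernel structure. */
struct toy_head {
    int count;          /* existence refs: keep the struct allocated */
    int used;           /* "use" refs handed out by toy_head_grab() */
    bool unregistering; /* set as soon as unregistration starts */
};

static struct toy_head *toy_head_new(void)
{
    struct toy_head *h = calloc(1, sizeof(*h));

    if (h)
        h->count = 1;   /* reference owned by the registration itself */
    return h;
}

/* Existence reference, e.g. held by an inode that points at the header. */
static void toy_head_get(struct toy_head *h)
{
    h->count++;
}

/* Drop an existence reference; free on the last one. Returns true if freed. */
static bool toy_head_put(struct toy_head *h)
{
    if (--h->count == 0) {
        free(h);
        return true;
    }
    return false;
}

/* "Use" reference: succeeds iff unregistration has not started. */
static bool toy_head_grab(struct toy_head *h)
{
    if (h->unregistering)
        return false;
    h->used++;
    return true;
}

static void toy_head_finish(struct toy_head *h)
{
    h->used--;
}

/* Start unregistering, then drop the registration's own reference.
 * (The kernel would also wait for h->used to reach zero first.) */
static bool toy_unregister(struct toy_head *h)
{
    h->unregistering = true;
    return toy_head_put(h);
}
```

Holding an existence reference never blocks toy_unregister(); it only keeps the memory valid, while toy_head_grab() starts failing as soon as unregistration begins.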
    • [PATCH] beginning of sysctl cleanup - ctl_table_set · 73455092
      Committed by Al Viro
      New object: set of sysctls [currently - root and per-net-ns].
      Contains: pointer to parent set, list of tables and "should I see this set?"
      method (->is_seen(set)).
      Current lists of tables are subsumed by that; net-ns contains such a beast.
      ->lookup() for ctl_table_root returns a pointer to the ctl_table_set instead
      of a pointer to that set's ->list.
      
      [folded compile fixes by rdd for configs without sysctl]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
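A sketch of the new object, assuming a viewer namespace id stands in for "the current task" when ->is_seen() is asked; toy_set, toy_root_seen, toy_ns_seen and toy_visible_tables are illustrative names only. A set contributes its tables only if it is seen, and the walk continues through the parent chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a set of sysctl tables. */
struct toy_set {
    struct toy_set *parent;       /* e.g. a net-ns set points at the root set */
    const char *const *tables;    /* tables registered directly in this set */
    size_t ntables;
    bool (*is_seen)(const struct toy_set *set, int viewer_ns);
    int ns;                       /* owning namespace; -1 for the root set */
};

static bool toy_root_seen(const struct toy_set *set, int viewer_ns)
{
    (void)set;
    (void)viewer_ns;
    return true;                  /* the root set is visible to everyone */
}

static bool toy_ns_seen(const struct toy_set *set, int viewer_ns)
{
    return set->ns == viewer_ns;
}

/* Count the tables visible to viewer_ns, walking from set up to the root. */
static size_t toy_visible_tables(const struct toy_set *set, int viewer_ns)
{
    size_t n = 0;

    for (; set; set = set->parent)
        if (!set->is_seen || set->is_seen(set, viewer_ns))
            n += set->ntables;
    return n;
}
```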
  17. 29 Apr, 2008 · 3 commits
  18. 06 Feb, 2008 · 1 commit
    • capabilities: introduce per-process capability bounding set · 3b7391de
      Committed by Serge E. Hallyn
      The capability bounding set is a set beyond which capabilities cannot grow.
       Currently cap_bset is per-system.  It can be manipulated through sysctl,
      but only init can add capabilities.  Root can remove capabilities.  By
      default it includes all caps except CAP_SETPCAP.
      
      This patch makes the bounding set per-process when file capabilities are
      enabled.  It is inherited at fork from the parent.  No one can add elements;
      CAP_SETPCAP is required to remove them.
      
      One example use of this is to start a safer container.  For instance, until
      device namespaces or per-container device whitelists are introduced, it is
      best to take CAP_MKNOD away from a container.
      
      The bounding set will not affect pP and pE immediately.  It will only
      affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
      and exec() does not constrain pI'.  So to really start a shell with no way
      of regaining CAP_MKNOD, you would do
      
      	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
      	cap_t cap = cap_get_proc();
      	cap_value_t caparray[1];
      	caparray[0] = CAP_MKNOD;
      	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_CLEAR);
      	cap_set_proc(cap);
      	cap_free(cap);
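Why both steps are needed shows up in the exec() transformation for file capabilities as documented in capabilities(7) for kernels of this era (before ambient capabilities existed), with the special-casing of root and set-user-ID binaries left out. Writing bset for the process's bounding set, f_P and f_I for the file's permitted and inheritable sets, and f_E for its single effective bit:

```latex
\begin{aligned}
p_P' &= (f_P \cap \mathit{bset}) \cup (f_I \cap p_I) \\
p_E' &= \begin{cases} p_P' & \text{if } f_E \text{ is set} \\ \varnothing & \text{otherwise} \end{cases} \\
p_I' &= p_I
\end{aligned}
```

Because bset masks only f_P, CAP_MKNOD can still reach p_P' through f_I ∩ p_I unless it is also cleared from p_I, which is what the cap_set_flag(..., CAP_INHERITABLE, ...) call in the snippet does.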
      
      The following test program will get and set the bounding
      set (but not pI).  For instance
      
      	./bset get
      		(lists capabilities in bset)
      	./bset drop cap_net_raw
      		(starts shell with new bset)
      		(use capset, setuid binary, or binary with
      		file capabilities to try to increase caps)
      
      ************************************************************
      cap_bound.c
      ************************************************************
      #include <sys/prctl.h>
      #include <linux/capability.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      #ifndef PR_CAPBSET_READ
      #define PR_CAPBSET_READ 23
      #endif
      
      #ifndef PR_CAPBSET_DROP
      #define PR_CAPBSET_DROP 24
      #endif
      
      int usage(char *me)
      {
      	printf("Usage: %s get\n", me);
      	printf("       %s drop <capability>\n", me);
      	return 1;
      }
      
      #define numcaps 32
      char *captable[numcaps] = {
      	"cap_chown",
      	"cap_dac_override",
      	"cap_dac_read_search",
      	"cap_fowner",
      	"cap_fsetid",
      	"cap_kill",
      	"cap_setgid",
      	"cap_setuid",
      	"cap_setpcap",
      	"cap_linux_immutable",
      	"cap_net_bind_service",
      	"cap_net_broadcast",
      	"cap_net_admin",
      	"cap_net_raw",
      	"cap_ipc_lock",
      	"cap_ipc_owner",
      	"cap_sys_module",
      	"cap_sys_rawio",
      	"cap_sys_chroot",
      	"cap_sys_ptrace",
      	"cap_sys_pacct",
      	"cap_sys_admin",
      	"cap_sys_boot",
      	"cap_sys_nice",
      	"cap_sys_resource",
      	"cap_sys_time",
      	"cap_sys_tty_config",
      	"cap_mknod",
      	"cap_lease",
      	"cap_audit_write",
      	"cap_audit_control",
      	"cap_setfcap"
      };
      
      int getbcap(void)
      {
      	int comma=0;
      	unsigned long i;
      	int ret;
      
      	printf("i know of %d capabilities\n", numcaps);
      	printf("capability bounding set:");
      	for (i=0; i<numcaps; i++) {
      		ret = prctl(PR_CAPBSET_READ, i);
      		if (ret < 0)
      			perror("prctl");
      		else if (ret==1)
      			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
      	}
      	printf("\n");
      	return 0;
      }
      
      int capdrop(char *str)
      {
      	unsigned long i;
      
      	int found=0;
      	for (i=0; i<numcaps; i++) {
      		if (strcmp(captable[i], str) == 0) {
      			found=1;
      			break;
      		}
      	}
      	if (!found)
      		return 1;
      	if (prctl(PR_CAPBSET_DROP, i)) {
      		perror("prctl");
      		return 1;
      	}
      	return 0;
      }
      
      int main(int argc, char *argv[])
      {
      	if (argc<2)
      		return usage(argv[0]);
      	if (strcmp(argv[1], "get")==0)
      		return getbcap();
      	if (strcmp(argv[1], "drop")!=0 || argc<3)
      		return usage(argv[0]);
      	if (capdrop(argv[2])) {
      		printf("unknown capability\n");
      		return 1;
      	}
      	execl("/bin/bash", "/bin/bash", (char *)NULL);
      	perror("execl");	/* only reached if exec fails */
      	return 1;
      }
      ************************************************************
      
      [serue@us.ibm.com: fix typo]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Tested-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 01 Feb, 2008 · 1 commit
    • [IPV4] route cache: Introduce rt_genid for smooth cache invalidation · 29e75252
      Committed by Eric Dumazet
      The current IP route cache implementation is not suited to large caches.
      
      We can consume a lot of CPU when the cache must be invalidated, since we
      currently need to evict all cache entries, and this eviction is
      sometimes asynchronous. min_delay & max_delay can somewhat control this
      asynchronous behavior, but the whole thing is a kludge, regularly
      triggering the infamous soft lockup messages. When entries are still in
      use, this also consumes a lot of RAM, filling dst_garbage.list.
      
      A better scheme is to use a generation identifier on each entry,
      so that cache invalidation can be performed by changing the table
      identifier, without having to scan all entries.
      No more delayed flushing, no more stalling when secret_interval expires.
      
      Invalidated entries will then be freed at GC time (controlled by
      ip_rt_gc_timeout or stress), or when an invalidated entry is found
      in a chain when an insert is done.
      Thus we keep a normal equilibrium.
      
      This patch :
      - renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
      - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
      - Checks entry->rt_genid at appropriate places :
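The generation-ID scheme described above, as a minimal sketch over a flat array rather than the real hash chains (toy_rt, toy_cache_flush, toy_gc and the other names are hypothetical). Flushing just bumps a counter; lookups skip entries from an older generation, and a later GC pass reclaims them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry tagged with the generation it was created in. */
struct toy_rt {
    unsigned int key;
    unsigned int genid;
    bool in_use;
};

static unsigned int toy_genid;  /* global generation (an atomic_t in the patch) */

/* Invalidate every entry at once, without scanning: O(1). */
static void toy_cache_flush(void)
{
    toy_genid++;
}

static bool toy_entry_valid(const struct toy_rt *e)
{
    return e->in_use && e->genid == toy_genid;
}

static void toy_insert(struct toy_rt *e, unsigned int key)
{
    e->key = key;
    e->genid = toy_genid;
    e->in_use = true;
}

/* Lookups ignore stale entries; they stay in place until GC runs. */
static const struct toy_rt *toy_lookup(const struct toy_rt *tab, size_t n,
                                       unsigned int key)
{
    for (size_t i = 0; i < n; i++)
        if (toy_entry_valid(&tab[i]) && tab[i].key == key)
            return &tab[i];
    return NULL;
}

/* Lazy GC pass: release slots from an old generation; returns how many. */
static size_t toy_gc(struct toy_rt *tab, size_t n)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++)
        if (tab[i].in_use && tab[i].genid != toy_genid) {
            tab[i].in_use = false;
            freed++;
        }
    return freed;
}
```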
  20. 29 Jan, 2008 · 3 commits
    • sysctl: Infrastructure for per namespace sysctls · e51b6ba0
      Committed by Eric W. Biederman
      This patch implements the basic infrastructure for per namespace sysctls.
      
      A list of lists of sysctl headers is added, allowing each namespace to have
      its own list of sysctl headers.
      
      Each list of sysctl headers has a lookup function to find the first
      sysctl header in the list, allowing the lists to have a per namespace
      instance.
      
      register_sysctl_root is added to tell sysctl.c about additional
      lists of sysctl_headers.  As all of the users are expected to be in
      kernel no unregister function is provided.
      
      sysctl_head_next is updated to walk through the list of lists.
      
      __register_sysctl_paths is added to add a new sysctl table on
      a non-default sysctl list.
      
      The only intrusive part of this patch is propagating the information
      to decide which list of sysctls to use for sysctl_check_table.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
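A toy version of the "list of lists", assuming a namespace is identified by a small integer (toy_root, toy_global_lookup, toy_per_ns_lookup and toy_walk are hypothetical names). Each root supplies a lookup hook returning its list for a given namespace, and the walk visits every root's list in turn, roughly the role the commit gives to sysctl_head_next.

```c
#include <stddef.h>

/* One list of header names (stand-in for a list of ctl_table_headers). */
struct toy_list {
    const char *const *names;
    size_t n;
};

/* A root knows how to find "its" list for a namespace. */
struct toy_root {
    const struct toy_list *(*lookup)(const struct toy_root *root, int ns);
    const struct toy_list *lists;   /* indexed by namespace id */
    size_t nlists;
};

/* Global root: every namespace sees the same single list. */
static const struct toy_list *toy_global_lookup(const struct toy_root *root, int ns)
{
    (void)ns;
    return root->nlists ? &root->lists[0] : NULL;
}

/* Per-namespace root: each namespace id has its own list. */
static const struct toy_list *toy_per_ns_lookup(const struct toy_root *root, int ns)
{
    if (ns < 0 || (size_t)ns >= root->nlists)
        return NULL;
    return &root->lists[ns];
}

/* Walk every root's list for ns, in root order. Calls visit() on each
 * name when it is non-NULL and returns how many names were seen. */
static size_t toy_walk(const struct toy_root *const *roots, size_t nroots,
                       int ns, void (*visit)(const char *name))
{
    size_t count = 0;

    for (size_t r = 0; r < nroots; r++) {
        const struct toy_list *l = roots[r]->lookup(roots[r], ns);

        for (size_t i = 0; l && i < l->n; i++) {
            if (visit)
                visit(l->names[i]);
            count++;
        }
    }
    return count;
}
```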
    • sysctl: Remember the ctl_table we passed to register_sysctl_paths · 23eb06de
      Committed by Eric W. Biederman
      By doing this we allow users of register_sysctl_paths that build
      and dynamically allocate their ctl_table to be simpler.  This allows
      them to just remember the ctl_table_header returned from
      register_sysctl_paths from which they can now find the
      ctl_table array they need to free.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
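A sketch of the simplification for callers that allocate their table dynamically, with hypothetical toy types; the field name ctl_table_arg is modeled on the kernel's ctl_table_header but should be treated as an assumption here. The caller keeps only the header and recovers the table from it at teardown.

```c
#include <stdlib.h>

/* Hypothetical table entry and header, not the kernel structures. */
struct toy_table {
    const char *procname;
    int value;
};

struct toy_header {
    struct toy_table *ctl_table_arg;  /* the array passed at registration */
};

/* Registration remembers exactly the pointer the caller handed in. */
static struct toy_header *toy_register(struct toy_table *table)
{
    struct toy_header *h = malloc(sizeof(*h));

    if (h)
        h->ctl_table_arg = table;
    return h;
}

/* Caller-side teardown: unregister, then free the table found via the
 * header, so no separate pointer to the table has to be kept. */
static void toy_unregister_and_free(struct toy_header *h)
{
    struct toy_table *table = h->ctl_table_arg;

    free(h);        /* stands in for unregistering the table */
    free(table);
}
```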
    • sysctl: Add register_sysctl_paths function · 29e796fd
      Committed by Eric W. Biederman
      There are a number of modules that register a sysctl table
      somewhere deeply nested in the sysctl hierarchy, such as
      fs/nfs, fs/xfs, dev/cdrom, etc.
      
      They all specify several dummy ctl_tables for the path name.
      This patch implements register_sysctl_paths, which takes
      an additional path name, and makes up dummy sysctl nodes
      for each component.
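A sketch of "dummy nodes for each component", assuming the path is given as a NULL-terminated array of strings (the kernel takes an array of struct ctl_path instead) and using hypothetical toy_node, toy_build_path and toy_free_path names. One directory node is allocated per component and the caller's table hangs off the deepest one.

```c
#include <stdlib.h>

/* Hypothetical node: one child suffices for a single path. */
struct toy_node {
    const char *name;
    struct toy_node *child;
    const char *leaf_table;     /* set only on the deepest directory */
};

static void toy_free_path(struct toy_node *n)
{
    while (n) {
        struct toy_node *next = n->child;

        free(n);
        n = next;
    }
}

/* path: NULL-terminated, e.g. {"fs", "nfs", NULL}. Returns the top node,
 * or NULL on allocation failure or an empty path. */
static struct toy_node *toy_build_path(const char *const *path,
                                       const char *table)
{
    struct toy_node *top = NULL, *last = NULL;
    struct toy_node **link = &top;

    for (; *path; path++) {
        struct toy_node *n = calloc(1, sizeof(*n));

        if (!n) {
            toy_free_path(top);
            return NULL;
        }
        n->name = *path;        /* dummy directory for this component */
        *link = n;
        link = &n->child;
        last = n;
    }
    if (last)
        last->leaf_table = table;
    return top;
}
```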
      
      This patch was originally written by Olaf Kirch and
      brought to my attention and reworked some by Olaf Hering.
      I have changed a few additional things so the bugs are mine.
      
      After converting all of the easy callers, Olaf Hering observed that with
      allyesconfig ARCH=i386 the patch reduces the final binary size by 6111 bytes.
      
      .text +897
      .data -7008
      
             text    data     bss      dec     hex filename
         26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla
         26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux
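The size rows can be checked mechanically; this short sketch only subtracts the vanilla row from the patched one, using the numbers quoted above (size_row and print_deltas are illustrative names).

```c
#include <stdio.h>

struct size_row {
    long text, data, bss, dec;
};

/* Rows copied from the `size` output above. */
static const struct size_row vanilla = {26959310, 4045899, 4718592, 35723801};
static const struct size_row patched = {26960207, 4038891, 4718592, 35717690};

/* Print patched-minus-vanilla per column; return the total (dec) change. */
static long print_deltas(const struct size_row *a, const struct size_row *b)
{
    printf(".text %+ld\n", b->text - a->text);
    printf(".data %+ld\n", b->data - a->data);
    printf(".bss  %+ld\n", b->bss - a->bss);
    printf("total %+ld\n", b->dec - a->dec);
    return b->dec - a->dec;
}
```

print_deltas(&vanilla, &patched) prints .text +897, .data -7008, .bss +0 and total -6111: the .text and .data figures agree with the commit message, and the dec column shrinks by 6111 bytes.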
      
      So this change is both a space savings and a code simplification.
      
      CC: Olaf Kirch <okir@suse.de>
      CC: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 20 Nov, 2007 · 2 commits
  22. 19 Oct, 2007 · 4 commits
  23. 01 Aug, 2007 · 1 commit
  24. 26 Apr, 2007 · 4 commits