1. 14 May, 2011 5 commits
    • net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7
      Committed by Vasiliy Kulikov
      This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
      ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
      without any special privileges.  In other words, the patch makes it
      possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
      order not to increase the kernel's attack surface, the new functionality
      is disabled by default, but is enabled at bootup by supporting Linux
      distributions, optionally with restriction to a group or a group range
      (see below).
      
      Similar functionality is implemented in Mac OS X:
      http://www.manpagez.com/man/4/icmp/
      
      A new ping socket is created with
      
          socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)
      
      Message identifiers (octets 4-5 of ICMP header) are interpreted as local
      ports. Addresses are stored in struct sockaddr_in. No port numbers are
      reserved for privileged processes, port 0 is reserved for API ("let the
      kernel pick a free number"). There is no notion of remote ports, remote
      port numbers provided by the user (e.g. in connect()) are ignored.
      
      Data sent and received include ICMP headers. This is deliberate to:
      1) Avoid the need to transport header values like sequence numbers by
      other means.
      2) Make it easier to port existing programs using raw sockets.
      
      ICMP headers given to send() are checked and sanitized. The type must be
      ICMP_ECHO and the code must be zero (future extensions might relax this,
      see below). The id is set to the number (local port) of the socket, the
      checksum is always recomputed.
      
      ICMP reply packets received from the network are demultiplexed according
      to their IDs, and are returned by recv() without any modifications.
      IP header information and ICMP errors of those packets may be obtained
      via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
      quenches and redirects are reported as fake errors via the error queue
      (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
      network order).
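
A minimal userspace sketch of the send/receive path described above, assuming a Linux system with glibc headers. The helper names `build_echo_request` and `ping_once` are hypothetical and not part of the patch. The id and checksum fields are left at zero because, per the description, the kernel sets the id to the socket's local port and recomputes the checksum.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write an ICMP echo request (header + payload) into buf.
 * Returns the total length, or 0 if buf is too small. */
size_t build_echo_request(unsigned char *buf, size_t buflen,
                          unsigned short seq,
                          const void *payload, size_t payload_len)
{
    struct icmphdr hdr;

    if (buflen < sizeof(hdr) + payload_len)
        return 0;
    memset(&hdr, 0, sizeof(hdr));        /* id and checksum stay zero */
    hdr.type = ICMP_ECHO;                /* the only type accepted by send() */
    hdr.code = 0;                        /* the code must be zero */
    hdr.un.echo.sequence = htons(seq);
    memcpy(buf, &hdr, sizeof(hdr));
    if (payload_len)
        memcpy(buf + sizeof(hdr), payload, payload_len);
    return sizeof(hdr) + payload_len;
}

/* Hypothetical helper: send one echo request and wait up to two seconds for
 * a reply. Returns the reply length (ICMP header included), or -1 on error,
 * for example when ping_group_range does not admit the caller. */
ssize_t ping_once(const char *ipv4, unsigned short seq)
{
    unsigned char pkt[64], reply[1500];
    struct sockaddr_in dst;
    struct timeval tv = { 2, 0 };
    size_t len;
    ssize_t n;
    int fd;

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    if (fd < 0)
        return -1;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;            /* the port field is ignored */
    if (inet_pton(AF_INET, ipv4, &dst.sin_addr) != 1) {
        close(fd);
        return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    len = build_echo_request(pkt, sizeof(pkt), seq, "hello", 5);
    if (sendto(fd, pkt, len, 0, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        close(fd);
        return -1;
    }
    n = recv(fd, reply, sizeof(reply), 0);
    close(fd);
    return n;
}
```

Calling `ping_once("127.0.0.1", 1)` from a process whose group falls inside the configured range should return the length of the echoed packet; otherwise the `socket()` call fails and the helper returns -1.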
      
      socket(2) is restricted to the group range specified in
      "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
      that nobody (not even root) may create ping sockets.  Setting it to "100
      100" would grant permission to a single group (either to make
      /sbin/ping g+s and owned by this group, or to grant permission to a
      "netadmins" group); "0 4294967295" would enable it for the world; "100
      4294967295" would enable it for users, but not daemons.
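
The range semantics above can be sketched as follows. This is an illustration only: `struct gid_range`, `parse_gid_range` and `gid_allowed` are hypothetical names, and the sketch tests a single group ID, whereas a real check would presumably also consider the caller's supplementary groups.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two numbers stored in ping_group_range. */
struct gid_range {
    unsigned int low;
    unsigned int high;
};

/* Parse a "LOW HIGH" string such as "100 4294967295". */
bool parse_gid_range(const char *text, struct gid_range *out)
{
    return sscanf(text, "%u %u", &out->low, &out->high) == 2;
}

/* A group is admitted when it lies inside the inclusive range.
 * With the default "1 0" the range is empty, so nothing matches. */
bool gid_allowed(const struct gid_range *r, unsigned int gid)
{
    return r->low <= gid && gid <= r->high;
}
```

With the default string "1 0", `gid_allowed()` is false for every group, including group 0, which matches the "not even root" behaviour described above.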
      
      The existing code might be (in the unlikely case anyone needs it)
      extended rather easily to handle other similar pairs of ICMP messages
      (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
      etc.).
      
      Userspace ping util & patch for it:
      http://openwall.info/wiki/people/segoon/ping
      
      For Openwall GNU/*/Linux it was the last step on the road to the
      setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
      is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
      http://mirrors.kernel.org/openwall/Owl/current/iso/
      
      Initially this functionality was written by Pavel Kankovsky for
      Linux 2.4.32, but unfortunately it was never made public.
      
      All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) have been tested
      with the patch.
      
      PATCH v3:
          - switched to flowi4.
          - minor changes to be consistent with raw sockets code.
      
      PATCH v2:
          - changed ping_debug() to pr_debug().
          - removed CONFIG_IP_PING.
          - removed ping_seq_fops.owner field (unused for procfs).
          - switched to proc_net_fops_create().
          - switched to %pK in seq_printf().
      
      PATCH v1:
          - fixed checksumming bug.
          - CAP_NET_RAW may not create icmp sockets anymore.
      
      RFC v2:
          - minor cleanups.
          - introduced sysctl'able group range to restrict socket(2).
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • convert old cpumask API into new one · f2019030
      Committed by KOSAKI Motohiro
      Adapt to the new API.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_iucv: get rid of compile warning · 9f6298a6
      Committed by Ursula Braun
      -Wunused-but-set-variable generates compile warnings. The affected
      variables are removed.
      Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • iucv: get rid of compile warning · 5db79c06
      Committed by Ursula Braun
      -Wunused-but-set-variable generates a compile warning. The affected
      variable is removed.
      Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Added support for FW dump · 29dd54b7
      Committed by Anirban Chakraborty
      Added code to take FW dump via ethtool. Dump level can be controlled via setting the
      dump flag. A get function is provided to query the current setting of the dump flag.
      Dump data is obtained from the driver via a separate get function.
      
      Changes from v3:
      Fixed buffer length issue in ethtool_get_dump_data function.
      Updated kernel doc for ethtool_dump struct and get_dump_flag function.
      
      Changes from v2:
      Provided separate commands for get flag and data.
      Use the minimum of the two buffer lengths obtained via ethtool and the driver
      for the dump buffer.
      Pass the driver's error codes up to the caller.
      Added kernel doc comments.
      Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
      Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 May, 2011 15 commits
  3. 11 May, 2011 20 commits
    • xfrm: Don't allow esn with disabled anti replay detection · 6fa5ddcc
      Committed by Steffen Klassert
      Unlike the standard case, disabled anti replay detection needs some
      nontrivial extra treatment on ESN. RFC 4303 states:
      
      Note: If a receiver chooses to not enable anti-replay for an SA, then
      the receiver SHOULD NOT negotiate ESN in an SA management protocol.
      Use of ESN creates a need for the receiver to manage the anti-replay
      window (in order to determine the correct value for the high-order
      bits of the ESN, which are employed in the ICV computation), which is
      generally contrary to the notion of disabling anti-replay for an SA.
      
      So return an error if an ESN state with disabled anti replay detection
      is inserted for now and add the extra treatment later if we need it.
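
The rule can be sketched as a small validation step. This is a userspace illustration, not the kernel code: `struct sketch_sa`, `SKETCH_STATE_ESN` and `sketch_validate_sa` are hypothetical names, a replay window of zero is assumed to mean that anti-replay detection is disabled, and `-EINVAL` is an assumed choice of error code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the "state uses ESN" flag. */
#define SKETCH_STATE_ESN 0x1u

/* Hypothetical, heavily reduced model of a security association. */
struct sketch_sa {
    uint32_t flags;
    uint32_t replay_window;   /* assumed: 0 means anti-replay is disabled */
};

/* Reject an SA that asks for ESN while anti-replay detection is off. */
int sketch_validate_sa(const struct sketch_sa *sa)
{
    if ((sa->flags & SKETCH_STATE_ESN) && sa->replay_window == 0)
        return -EINVAL;
    return 0;
}
```

An SA without ESN is accepted regardless of its window size; only the ESN-plus-disabled-window combination is refused.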
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: Assign the inner mode output function to the dst entry · 43a4dea4
      Committed by Steffen Klassert
      As it is, we assign the outer mode's output function to the dst entry
      when we create the xfrm bundle. This leads to two problems in interfamily
      scenarios. First, we might insert ipv4 packets into ip6_fragment when called
      from xfrm6_output. The system crashes if we try to fragment an ipv4
      packet with ip6_fragment. This issue was introduced with git commit
      ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
      as needed). The second issue is that we might insert ipv4 packets into
      netfilter6 and vice versa in interfamily scenarios.
      
      With this patch we assign the inner mode output function to the dst entry
      when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
      mode is used and the right fragmentation and netfilter functions are called.
      We switch then to outer mode with the output_finish functions.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev_close() should check IFF_UP · e14a5993
      Committed by Eric Dumazet
      Commit 44345724 (factorize sync-rcu call in
      unregister_netdevice_many) mistakenly removed one test from dev_close().
      
      The following actions trigger a BUG:
      
      modprobe bonding
      modprobe dummy
      ifconfig bond0 up
      ifenslave bond0 dummy0
      rmmod dummy
      
      dev_close() must not close a non IFF_UP device.
      
      With help from Frank Blaschka and Einar EL Lueck
      Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: fix GVRP at dismantle time · 55aee10d
      Committed by Eric Dumazet
      ip link add link eth2 eth2.103 type vlan id 103 gvrp on loose_binding on
      ip link set eth2.103 up
      rmmod tg3    # driver providing eth2
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
       PGD 11d251067 PUD 11b9e0067 PMD 0
       Oops: 0000 [#1] SMP
       last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex
       CPU 0
       Modules linked in: tg3(-) 8021q garp nfsd lockd auth_rpcgss sunrpc libphy sg [last unloaded: x_tables]
      
       Pid: 11494, comm: rmmod Tainted: G        W   2.6.39-rc6-00261-gfd71257-dirty #580 HP ProLiant BL460c G6
       RIP: 0010:[<ffffffffa0030c9e>]  [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
       RSP: 0018:ffff88007a19bae8  EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff88011b5e2000 RCX: 0000000000000002
       RDX: 0000000000000000 RSI: 0000000000000175 RDI: ffffffffa0030d5b
       RBP: ffff88007a19bb18 R08: 0000000000000001 R09: ffff88011bd64a00
       R10: ffff88011d34ec00 R11: 0000000000000000 R12: 0000000000000002
       R13: ffff88007a19bc48 R14: ffff88007a19bb88 R15: 0000000000000001
       FS:  0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f77d76c0
       CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
       CR2: 0000000000000000 CR3: 000000011a675000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process rmmod (pid: 11494, threadinfo ffff88007a19a000, task ffff8800798595c0)
       Stack:
        ffff88007a19bb36 ffff88011c84b800 ffff88011b5e2000 ffff88007a19bc48
        ffff88007a19bb88 0000000000000006 ffff88007a19bb38 ffffffffa003a5f6
        ffff88007a19bb38 670088007a19bba8 ffff88007a19bb58 ffffffffa00397e7
       Call Trace:
        [<ffffffffa003a5f6>] vlan_gvrp_request_leave+0x46/0x50 [8021q]
        [<ffffffffa00397e7>] vlan_dev_stop+0xb7/0xc0 [8021q]
        [<ffffffff8137e427>] __dev_close_many+0x87/0xe0
        [<ffffffff8137e507>] dev_close_many+0x87/0x110
        [<ffffffff8137e630>] rollback_registered_many+0xa0/0x240
        [<ffffffff8137e7e9>] unregister_netdevice_many+0x19/0x60
        [<ffffffffa00389eb>] vlan_device_event+0x53b/0x550 [8021q]
        [<ffffffff8143f448>] ? ip6mr_device_event+0xa8/0xd0
        [<ffffffff81479d03>] notifier_call_chain+0x53/0x80
        [<ffffffff81062539>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff81062551>] raw_notifier_call_chain+0x11/0x20
        [<ffffffff8137df82>] call_netdevice_notifiers+0x32/0x60
        [<ffffffff8137e69f>] rollback_registered_many+0x10f/0x240
        [<ffffffff8137e85f>] rollback_registered+0x2f/0x40
        [<ffffffff8137e8c8>] unregister_netdevice_queue+0x58/0x90
        [<ffffffff8137e9eb>] unregister_netdev+0x1b/0x30
        [<ffffffffa005d73f>] tg3_remove_one+0x6f/0x10b [tg3]
      
      We should call vlan_gvrp_request_leave() from unregister_vlan_dev(),
      not from vlan_dev_stop(), because vlan_gvrp_uninit_applicant()
      is called right after unregister_netdevice_queue(). In batch mode,
      unregister_netdevice_queue() doesn’t immediately call vlan_dev_stop().
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix two lockdep splats · 1fc19aff
      Committed by Eric Dumazet
      Commit e67f88dd (net: dont hold rtnl mutex during netlink dump
      callbacks) switched rtnl protection to RCU, but we forgot to adjust two
      rcu_dereference() lockdep annotations:
      
      inet_get_link_af_size() or inet_fill_link_af() might be called with
      rcu_read_lock or rtnl held, so use rcu_dereference_rtnl()
      instead of rtnl_dereference().
      Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: xfrm: Eliminate ->rt_src reference in policy code. · 8f01cb08
      Committed by David S. Miller
      Rearrange xfrm4_dst_lookup() so that it works by calling a helper
      function __xfrm_dst_lookup() that takes an explicit flow key storage
      area as an argument.
      
      Use this new helper in xfrm4_get_saddr() so we can fetch the selected
      source address from the flow instead of from rt->rt_src.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Remove rt->rt_src usage in sctp_v4_get_saddr() · 902ebd3e
      Committed by David S. Miller
      Flow key is available, so fetch it from there.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: udp: Eliminate remaining uses of rt->rt_src · 79ab0531
      Committed by David S. Miller
      We already track and pass around the correct flow key,
      so simply use it in udp_send_skb().
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: icmp: Eliminate remaining uses of rt->rt_src · 9f6abb5f
      Committed by David S. Miller
      On input packets, rt->rt_src always equals ip_hdr(skb)->saddr.
      
      Anything that mangles or otherwise changes the IP header must
      relookup the route found at skb_rtable().  Therefore this
      invariant must always hold true.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Pass explicit daddr arg to ip_send_reply(). · 0a5ebb80
      Committed by David S. Miller
      This eliminates an access to rt->rt_src.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Revise timings used when sending link request messages · 972a77fb
      Committed by Allan Stephens
      Revises the algorithm governing the sending of link request messages
      to take into account the number of nodes each bearer is currently in
      contact with, and to ensure more rapid rediscovery of neighboring nodes
      if a bearer fails and then recovers.
      
      The discovery object now sends requests at least once a second if it
      is not in contact with any other nodes, and at least once a minute if
      it has at least one neighbor; if contact with the only neighbor is
      lost, the object immediately reverts to its initial rapid-fire search
      timing to accelerate the rediscovery process.
      
      In addition, the discovery object now stops issuing link request
      messages if it is in contact with the only neighboring node it is
      configured to communicate with, since further searching is unnecessary.
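
A rough model of the timing policy described above. Everything here is a sketch under stated assumptions: the names are hypothetical, the 125 ms starting interval and the doubling back-off are assumptions not given in the commit message, and `max_nodes` is an invented field representing "the only neighboring node it is configured to communicate with".

```c
#include <stdbool.h>

#define SKETCH_REQ_INIT_MS    125u   /* assumed rapid-fire starting interval */
#define SKETCH_REQ_FAST_MS   1000u   /* no neighbors: at least once a second */
#define SKETCH_REQ_SLOW_MS  60000u   /* some neighbor: at least once a minute */

/* Hypothetical, reduced model of a discovery object. */
struct sketch_disc {
    unsigned int num_nodes;     /* neighbors currently in contact */
    unsigned int max_nodes;     /* 1 if only one neighbor is configured, 0 = no limit */
    unsigned int interval_ms;   /* current time between requests */
};

/* Timer expiry: returns true if a link request should be sent now,
 * and backs the interval off towards the applicable ceiling. */
bool sketch_disc_timeout(struct sketch_disc *d)
{
    unsigned int cap;

    if (d->max_nodes && d->num_nodes >= d->max_nodes)
        return false;           /* the only configured neighbor is found */
    cap = d->num_nodes ? SKETCH_REQ_SLOW_MS : SKETCH_REQ_FAST_MS;
    d->interval_ms *= 2;        /* assumed back-off: double each time */
    if (d->interval_ms > cap)
        d->interval_ms = cap;
    return true;
}

/* Losing the last neighbor restarts the rapid-fire search. */
void sketch_disc_neighbor_lost(struct sketch_disc *d)
{
    if (d->num_nodes)
        d->num_nodes--;
    if (d->num_nodes == 0)
        d->interval_ms = SKETCH_REQ_INIT_MS;
}
```

In this model an isolated bearer settles at one request per second, a bearer with neighbors settles at one per minute, and losing the last neighbor drops the interval back to the starting value.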
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Add monitoring of number of nodes discovered by bearer · 1209966c
      Committed by Allan Stephens
      Augments TIPC's discovery object to track the number of neighboring nodes
      having an active link to the associated bearer.
      
      This means tipc_disc_update_link_req() becomes either one of:
      
             tipc_disc_add_dest()
      or:
             tipc_disc_remove_dest()
      
      depending on the code flow direction of things.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Enhance sending of discovery object link request messages · 691a6207
      Committed by Allan Stephens
      Augments TIPC's discovery object to send its initial neighbor discovery
      request message as soon as the associated bearer is created, rather than
      waiting for its first periodic timeout to occur, thereby speeding up the
      discovery process. Also adds a check to suppress the initial request or
      subsequent requests if the bearer is blocked at the time the request is
      scheduled for transmission.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Enhance handling of discovery object creation failures · 3a777ff8
      Committed by Allan Stephens
      Modifies bearer creation and deletion code to improve handling of
      scenarios when a neighbor discovery object cannot be created. The
      creation routine now aborts the creation of a bearer if its discovery
      object cannot be created, and deletes the newly created bearer, rather
      than failing quietly and leaving an unusable bearer hanging around.
      
      Since the exit via the goto label really isn't a definitive failure
      in all cases, relabel it appropriately.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Introduce routine to enqueue a chain of messages on link tx queue · dc63d91e
      Committed by Allan Stephens
      Create a helper routine to enqueue a chain of sk_buffs to a link's
      transmit queue.  It improves readability and the new function is
      anticipated to be used more than just once in the future as well.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Avoid recomputation of outgoing message length · 26896904
      Committed by Allan Stephens
      Rework TIPC's message sending routines to take advantage of the total
      amount of data value passed to it by the kernel socket infrastructure.
      This change eliminates the need for TIPC to compute the size of outgoing
      messages itself, as well as the check for an oversize message in
      tipc_msg_build().  In addition, this change warrants an explanation:
      
         -     res = send_packet(NULL, sock, &my_msg, 0);
         +     res = send_packet(NULL, sock, &my_msg, bytes_to_send);
      
      Previously, the final argument to send_packet() was ignored (since the
      amount of data being sent was recalculated by a lower-level routine)
      and we could just pass in a dummy value (0). Now that the
      recalculation is being eliminated, the argument value being passed to
      send_packet() is significant and we have to supply the actual amount
      of data we want to send.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Abort excessive send requests as early as possible · c29c3f70
      Committed by Allan Stephens
      Adds checks to TIPC's socket send routines to promptly detect and
      abort attempts to send more than 66,000 bytes in a single TIPC
      message or more than 2**31-1 bytes in a single TIPC byte stream request.
      In addition, this ensures that the number of iovecs in a send request
      does not exceed the limits of a standard integer variable.
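
The early checks can be sketched like this. The function and macro names are hypothetical, `-EMSGSIZE` is an assumed error code (the commit message does not name one), and `INT_MAX` is used for the 2**31-1 limit on the assumption of a 32-bit `int`.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Largest single message, taken from the 66,000 byte figure above. */
#define SKETCH_MAX_MSG_SIZE 66000u

/* Check for a single-message send. */
int sketch_check_msg_len(size_t total_len)
{
    return total_len > SKETCH_MAX_MSG_SIZE ? -EMSGSIZE : 0;
}

/* Check for a byte-stream send: both the byte count and the number of
 * iovecs must fit in a non-negative int. */
int sketch_check_stream_len(size_t total_len, size_t num_iov)
{
    if (total_len > (size_t)INT_MAX || num_iov > (size_t)INT_MAX)
        return -EMSGSIZE;
    return 0;
}
```

Running these checks before any buffer is built is what makes the abort "early": an oversized request is refused without allocating or copying anything.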
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Strengthen checks for neighboring node discovery · 66e019a6
      Committed by Allan Stephens
      Enhances existing checks on the discovery domain associated with a TIPC
      bearer. A bearer can no longer be configured to accept links from itself
      only (which would be pointless), or to nodes outside its own cluster
      (since multi-cluster support has now been removed from TIPC). Also, the
      neighbor discovery routine now validates link setup requests against the
      configured discovery domain for the bearer, rather than simply ensuring
      the requesting node belongs to the node's own cluster.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: make zone/cluster mask constants a define · 1f3de471
      Committed by Paul Gortmaker
      This allows them to be available for easy re-use in other places
      and avoids trivial mistakes caused by  "count the f's and 0's".
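
A sketch of what named mask constants buy. The macro and function names below are hypothetical, and the values assume the usual 32-bit TIPC address layout of an 8-bit zone, a 12-bit cluster and a 12-bit node.

```c
#include <stdint.h>

/* Assumed layout: zzzzzzzz cccccccccccc nnnnnnnnnnnn (8 + 12 + 12 bits). */
#define SKETCH_ZONE_MASK    0xff000000u
#define SKETCH_CLUSTER_MASK 0xfffff000u

/* Address of the zone that contains addr. */
uint32_t sketch_zone_scope(uint32_t addr)
{
    return addr & SKETCH_ZONE_MASK;
}

/* Address of the cluster (zone + cluster bits) that contains addr. */
uint32_t sketch_cluster_scope(uint32_t addr)
{
    return addr & SKETCH_CLUSTER_MASK;
}

/* Non-zero when two node addresses share zone and cluster. */
int sketch_same_cluster(uint32_t a, uint32_t b)
{
    return sketch_cluster_scope(a) == sketch_cluster_scope(b);
}
```

Each helper uses the named constant instead of a repeated hex literal, so a miscounted digit can only occur in one place.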
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • tipc: Fix sk_buff leaks when link congestion is detected · bebc55ae
      Committed by Allan Stephens
      Modifies a TIPC send routine that did not discard the outgoing sk_buff
      if it was not transmitted because of link congestion; this eliminates
      the potential for buffer leakage in the many callers who did not clean up
      the unsent buffer. (The two routines that previously did discard the unsent
      buffer have been updated to eliminate their now-redundant clean up.)
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>