1. 23 11月, 2012 3 次提交
    • P
      tipc: delete TIPC_ADVANCED Kconfig variable · 94fc9c47
      Paul Gortmaker 提交于
      There used to be a time when TIPC had lots of Kconfig knobs the
      end user could alter, but they have all been made automatic or
      obsolete, with the exception of CONFIG_TIPC_PORTS.  This
      previously existing set of options was all hidden under the
      TIPC_ADVANCED setting, which does not exist in any code, but
      only in Kconfig scope.
      
      Having this now, just to hide the one remaining "advanced"
      option no longer makes sense.  Remove it.  Also get rid of the
      ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
      possibly undefined.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      94fc9c47
    • Y
      tipc: eliminate an unnecessary cast of node variable · 4cb7d55a
      Ying Xue 提交于
      As the variable:node is currently defined to u32 type, it is
      unnecessary to cast its type to u32 again when using it.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      4cb7d55a
    • J
      tipc: introduce message to synchronize broadcast link · c64f7a6a
      Jon Maloy 提交于
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowleding of broadcast
      packets should start.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledge
         from B, since B is not yet in its broadcast receiver's list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c64f7a6a
  2. 22 11月, 2012 6 次提交
    • Y
      tipc: rename supported flag to recv_permitted · 389dd9bc
      Ying Xue 提交于
      Rename the "supported" flag in bclink structure to "recv_permitted"
      to better reflect what it is used for. When this flag is set for a
      given node, we are permitted to receive and acknowledge broadcast
      messages from that node.  Convert it to a bool at the same time,
      since it is not used to store any numerical values.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      389dd9bc
    • Y
      tipc: remove supportable flag from bclink structure · 818f4da5
      Ying Xue 提交于
      The "supportable" flag in bclink structure is a compatibility flag
      indicating whether a peer node is capable of receiving TIPC broadcast
      messages. However, all TIPC versions since tipc-1.5, and after the
      inclusion in the upstream Linux kernel in 2006, support this capability.
      It is highly unlikely that anybody is still using such an old
      version of TIPC, let alone that they want to mix it with TIPC-2.0
      nodes. Therefore, we now remove the "supportable" flag.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      818f4da5
    • Y
      tipc: remove the bearer congestion mechanism · 3c294cb3
      Ying Xue 提交于
      Currently at the TIPC bearer layer there is the following congestion
      mechanism:
      
      Once sending packets has failed via that bearer, the bearer will be
      flagged as being in congested state at once. During bearer congestion,
      all packets arriving at link will be queued on the link's outgoing
      buffer.  When we detect that the state of bearer congestion has
      relaxed (e.g. some packets are received from the bearer) we will try
      our best to push all packets in the link's outgoing buffer until the
      buffer is empty, or until the bearer is congested again.
      
      However, in fact the TIPC bearer never receives any feedback from the
      device layer whether a send was successful or not, so it must always
      assume it was successful. Therefore, the bearer congestion mechanism
      as it exists currently is of no value.
      
      But the bearer blocking state is still useful for us. For example,
      when the physical media goes down/up, we need to change the state of
      the links bound to the bearer.  So the code maintaing the state
      information is not removed.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      3c294cb3
    • Y
      tipc: wake up all waiting threads at socket shutdown · 75031151
      Ying Xue 提交于
      When a socket is shut down, we should wake up all thread sleeping on
      it, instead of just one of them. Otherwise, when several threads are
      polling the same socket, and one of them does shutdown(), the
      remaining threads may end up sleeping forever.
      
      Also, to align socket usage with common practice in other stacks, we
      use one of the common socket callback handlers, sk_state_change(),
      to wake up pending users. This is similar to the usage in e.g.
      inet_shutdown(). [net/ipv4/af_inet.c].
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      75031151
    • E
      tipc: return POLLOUT for sockets in an unconnected state · c4fc298a
      Erik Hugne 提交于
      If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
      socket during link congestion, the connect message will be discarded
      and sendmsg will return EAGAIN. This is normal behavior, and the
      application is expected to poll the socket until POLLOUT is set,
      after which the connection attempt can be retried.
      However, the POLLOUT flag is never set for unconnected sockets and
      poll() always returns a zero mask. The application is then left without
      a trigger for when it can make another attempt at sending the message.
      
      The solution is to check if we're polling on an unconnected socket
      and set the POLLOUT flag if the TIPC port owned by this socket
      is not congested. The TIPC ports waiting on a specific link will be
      marked as 'not congested' when the link congestion have abated.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      c4fc298a
    • Y
      tipc: fix race/inefficiencies in poll/wait behaviour · f288bef4
      Ying Xue 提交于
      When an application blocks at poll/select on a TIPC socket
      while requesting a specific event mask, both the filter_rcv() and
      wakeupdispatch() case will wake it up unconditionally whenever
      the state changes (i.e an incoming message arrives, or congestion
      has subsided).  No mask is used.
      
      To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
      with tipc_data_ready and tipc_write_space respectively, which makes
      tipc more in alignment with the rest of the networking code.  These
      pass the exact set of possible events to the waker in fs/select.c
      hence avoiding waking up blocked processes unnecessarily.
      
      In doing so, we uncover another issue -- that there needs to be a
      memory barrier in these poll/receive callbacks, otherwise we are
      subject to the the same race as documented above wq_has_sleeper()
      [in commit a57de0b4 "net: adding memory barrier to the poll and
      receive callbacks"].  So we need to replace poll_wait() with
      sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
      in these two new functions.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      f288bef4
  3. 21 11月, 2012 6 次提交
  4. 20 11月, 2012 17 次提交
  5. 19 11月, 2012 8 次提交
    • E
      net: Make CAP_NET_BIND_SERVICE per user namespace · 3594698a
      Eric W. Biederman 提交于
      Allow privileged users in any user namespace to bind to
      privileged sockets in network namespaces they control.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3594698a
    • E
      net: Enable a userns root rtnl calls that are safe for unprivilged users · b51642f6
      Eric W. Biederman 提交于
      - Only allow moving network devices to network namespaces you have
        CAP_NET_ADMIN privileges over.
      
      - Enable creating/deleting/modifying interfaces
      - Enable adding/deleting addresses
      - Enable adding/setting/deleting neighbour entries
      - Enable adding/removing routes
      - Enable adding/removing fib rules
      - Enable setting the forwarding state
      - Enable adding/removing ipv6 address labels
      - Enable setting bridge parameter
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b51642f6
    • E
      net: Enable some sysctls that are safe for the userns root · c027aab4
      Eric W. Biederman 提交于
      - Enable the per device ipv4 sysctls:
         net/ipv4/conf/<if>/forwarding
         net/ipv4/conf/<if>/mc_forwarding
         net/ipv4/conf/<if>/accept_redirects
         net/ipv4/conf/<if>/secure_redirects
         net/ipv4/conf/<if>/shared_media
         net/ipv4/conf/<if>/rp_filter
         net/ipv4/conf/<if>/send_redirects
         net/ipv4/conf/<if>/accept_source_route
         net/ipv4/conf/<if>/accept_local
         net/ipv4/conf/<if>/src_valid_mark
         net/ipv4/conf/<if>/proxy_arp
         net/ipv4/conf/<if>/medium_id
         net/ipv4/conf/<if>/bootp_relay
         net/ipv4/conf/<if>/log_martians
         net/ipv4/conf/<if>/tag
         net/ipv4/conf/<if>/arp_filter
         net/ipv4/conf/<if>/arp_announce
         net/ipv4/conf/<if>/arp_ignore
         net/ipv4/conf/<if>/arp_accept
         net/ipv4/conf/<if>/arp_notify
         net/ipv4/conf/<if>/proxy_arp_pvlan
         net/ipv4/conf/<if>/disable_xfrm
         net/ipv4/conf/<if>/disable_policy
         net/ipv4/conf/<if>/force_igmp_version
         net/ipv4/conf/<if>/promote_secondaries
         net/ipv4/conf/<if>/route_localnet
      
      - Enable the global ipv4 sysctl:
         net/ipv4/ip_forward
      
      - Enable the per device ipv6 sysctls:
         net/ipv6/conf/<if>/forwarding
         net/ipv6/conf/<if>/hop_limit
         net/ipv6/conf/<if>/mtu
         net/ipv6/conf/<if>/accept_ra
         net/ipv6/conf/<if>/accept_redirects
         net/ipv6/conf/<if>/autoconf
         net/ipv6/conf/<if>/dad_transmits
         net/ipv6/conf/<if>/router_solicitations
         net/ipv6/conf/<if>/router_solicitation_interval
         net/ipv6/conf/<if>/router_solicitation_delay
         net/ipv6/conf/<if>/force_mld_version
         net/ipv6/conf/<if>/use_tempaddr
         net/ipv6/conf/<if>/temp_valid_lft
         net/ipv6/conf/<if>/temp_prefered_lft
         net/ipv6/conf/<if>/regen_max_retry
         net/ipv6/conf/<if>/max_desync_factor
         net/ipv6/conf/<if>/max_addresses
         net/ipv6/conf/<if>/accept_ra_defrtr
         net/ipv6/conf/<if>/accept_ra_pinfo
         net/ipv6/conf/<if>/accept_ra_rtr_pref
         net/ipv6/conf/<if>/router_probe_interval
         net/ipv6/conf/<if>/accept_ra_rt_info_max_plen
         net/ipv6/conf/<if>/proxy_ndp
         net/ipv6/conf/<if>/accept_source_route
         net/ipv6/conf/<if>/optimistic_dad
         net/ipv6/conf/<if>/mc_forwarding
         net/ipv6/conf/<if>/disable_ipv6
         net/ipv6/conf/<if>/accept_dad
         net/ipv6/conf/<if>/force_tllao
      
      - Enable the global ipv6 sysctls:
         net/ipv6/bindv6only
         net/ipv6/icmp/ratelimit
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c027aab4
    • E
      net: Allow the userns root to control vlans. · 276996fd
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Allow the vlan ioctls:
      SET_VLAN_INGRESS_PRIORITY_CMD
      SET_VLAN_EGRESS_PRIORITY_CMD
      SET_VLAN_FLAG_CMD
      SET_VLAN_NAME_TYPE_CMD
      ADD_VLAN_CMD
      DEL_VLAN_CMD
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      276996fd
    • E
      net: Allow userns root to control the network bridge code. · cb990503
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Allow setting bridge paramters via sysfs.
      
      Allow all of the bridge ioctls:
      BRCTL_ADD_IF
      BRCTL_DEL_IF
      BRCTL_SET_BRDIGE_FORWARD_DELAY
      BRCTL_SET_BRIDGE_HELLO_TIME
      BRCTL_SET_BRIDGE_MAX_AGE
      BRCTL_SET_BRIDGE_AGING_TIME
      BRCTL_SET_BRIDGE_STP_STATE
      BRCTL_SET_BRIDGE_PRIORITY
      BRCTL_SET_PORT_PRIORITY
      BRCTL_SET_PATH_COST
      BRCTL_ADD_BRIDGE
      BRCTL_DEL_BRDIGE
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb990503
    • E
      net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm · df008c91
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Allow creation of af_key sockets.
      Allow creation of llc sockets.
      Allow creation of af_packet sockets.
      
      Allow sending xfrm netlink control messages.
      
      Allow binding to netlink multicast groups.
      Allow sending to netlink multicast groups.
      Allow adding and dropping netlink multicast groups.
      Allow sending to all netlink multicast groups and port ids.
      
      Allow reading the netfilter SO_IP_SET socket option.
      Allow sending netfilter netlink messages.
      Allow setting and getting ip_vs netfilter socket options.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df008c91
    • E
      net: Allow userns root to control ipv6 · af31f412
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed while
      resource control is left unchanged.
      
      Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
      Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
      Allow the SIOCADDRT ioctl to add ipv6 routes.
      Allow the SIOCDELRT ioctl to delete ipv6 routes.
      
      Allow creation of ipv6 raw sockets.
      
      Allow setting the IPV6_JOIN_ANYCAST socket option.
      Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
      socket option.
      
      Allow setting the IPV6_TRANSPARENT socket option.
      Allow setting the IPV6_HOPOPTS socket option.
      Allow setting the IPV6_RTHDRDSTOPTS socket option.
      Allow setting the IPV6_DSTOPTS socket option.
      Allow setting the IPV6_IPSEC_POLICY socket option.
      Allow setting the IPV6_XFRM_POLICY socket option.
      
      Allow sending packets with the IPV6_2292HOPOPTS control message.
      Allow sending packets with the IPV6_2292DSTOPTS control message.
      Allow sending packets with the IPV6_RTHDRDSTOPTS control message.
      
      Allow setting the multicast routing socket options on non multicast
      routing sockets.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
      setting up, changing and deleting tunnels over ipv6.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
      setting up, changing and deleting ipv6 over ipv4 tunnels.
      
      Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
      deleting, and changing the potential router list for ISATAP tunnels.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af31f412
    • E
      net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52e804c6