1. 17 2月, 2014 1 次提交
  2. 15 2月, 2014 7 次提交
  3. 14 2月, 2014 22 次提交
    • Y
      sch_netem: replace magic numbers with enumerate in GE model · c045a734
      Yang Yingliang 提交于
      Replace some magic numbers which describe states of GE model
      loss generator with enumerate.
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c045a734
    • Y
      sch_netem: change some func's param from "struct Qdisc *" to "struct netem_sched_data *" · 49545a77
      Yang Yingliang 提交于
      In netem_change(), we have already get "struct netem_sched_data *q".
      Replace params of get_correlation() and other similar functions with
      "struct netem_sched_data *q".
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49545a77
    • Y
      sch_netem: return errcode before setting params · 54a4b05c
      Yang Yingliang 提交于
      get_dist_table() and get_loss_clg() may be failed. These
      two functions should be called after setting the members
      of qdisc_priv(sch), or it will break the old settings while
      either of them is failed.
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54a4b05c
    • D
      ipv4: ip_forward: perform skb->pkt_type check at the beginning · d4f2fa6a
      Denis Kirjanov 提交于
      Packets which have L2 address different from ours should be
      already filtered before entering into ip_forward().
      
      Perform that check at the beginning to avoid processing such packets.
      Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4f2fa6a
    • S
      net: remove unnecessary return's · 2045ceae
      stephen hemminger 提交于
      One of my pet coding style peeves is the practice of
      adding extra return; at the end of function.
      Kill several instances of this in network code.
      
      I suppose some coccinelle wizardy could do this automatically.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2045ceae
    • S
      tcp: remove unused min_cwnd member of tcp_congestion_ops · 45f74359
      Stanislav Fomichev 提交于
      Commit 684bad11 "tcp: use PRR to reduce cwin in CWR state" removed all
      calls to min_cwnd, so we can safely remove it.
      Also, remove tcp_reno_min_cwnd because it was only used for min_cwnd.
      Signed-off-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45f74359
    • Y
      socket: replace some printk with pr_* · 3410f22e
      Yang Yingliang 提交于
      Prefer pr_*(...) to printk(KERN_* ...).
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3410f22e
    • J
      tipc: add node_lock protection to link lookup function · e099e86c
      Jon Paul Maloy 提交于
      In an earlier commit, ("tipc: remove links list from bearer struct")
      we described three issues that need to be pre-emptively resolved before
      we can remove tipc_net_lock. Here we resolve issue a) described in that
      commit:
      
      "a) In access method #2, we access the link before taking the
          protecting node_lock. This will not work once net_lock is gone,
          so we will have to change the access order. We will deal with
          this in a later commit in this series."
      
      Here, we change that access order, by ensuring that the function
      link_find_link() returns only a safe reference for finding
      the link, i.e., a node pointer and an index into its 'links' array,
      not the link pointer itself. We also change all callers of this
      function to first take the node lock before they can check if there
      still is a valid link pointer at the returned index. Since the
      function now returns a node pointer rather than a link pointer,
      we rename it to the more appropriate 'tipc_link_find_owner().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e099e86c
    • Y
      tipc: remove bearer_lock from tipc_bearer struct · a8304529
      Ying Xue 提交于
      After the earlier commits ("tipc: remove 'links' list from
      tipc_bearer struct") and ("tipc: introduce new spinlock to protect
      struct link_req"), there is no longer any need to protect struct
      link_req or or any link list by use of bearer_lock. Furthermore,
      we have eliminated the need for using bearer_lock during downcalls
      (send) from the link to the bearer, since we have ensured that
      bearers always have a longer life cycle that their associated links,
      and always contain valid data.
      
      So, the only need now for a lock protecting bearers is for guaranteeing
      consistency of the bearer list itself. For this, it is sufficient, at
      least for the time being, to continue applying 'net_lock´ in write mode.
      
      By removing bearer_lock we also pre-empt introduction of issue b) descibed
      in the previous commit "tipc: remove 'links' list from tipc_bearer struct":
      
      "b) When the outer protection from net_lock is gone, taking
          bearer_lock and node_lock in opposite order of method 1) and 2)
          will become an obvious deadlock hazard".
      
      Therefore, we now eliminate the bearer_lock spinlock.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8304529
    • J
      tipc: delay delete of link when failover is needed · 7d33939f
      Jon Paul Maloy 提交于
      When a bearer is disabled, all its attached links are deleted.
      Ideally, we should do link failover to redundant links on other bearers,
      if there are any, in such cases. This would be consistent with current
      behavior when a link is reset, but not deleted. However, due to the
      complexity involved, and the (wrongly) perceived low demand for this
      feature, it was never implemented until now.
      
      We mark the doomed link for deletion with a new flag, but wait until the
      failover process is finished before we actually delete it. With the
      improved link tunnelling/failover code introduced earlier in this commit
      series, it is now easy to identify a spot in the code where the failover
      is finished and it is safe to delete the marked link. Moreover, the test
      for the flag and the deletion can be done synchronously, and outside the
      most time critical data path.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d33939f
    • J
      tipc: changes to general packet reception algorithm · a5377831
      Jon Paul Maloy 提交于
      We change the order of checking for destination users when processing
      incoming packets. By placing the checks for users that may potentially
      replace the processed buffer, i.e., CHANGEOVER_PROTOCOL and
      MSG_FRAGMENTER, in a separate step before we check for the true end
      users, we get rid of a label and a 'goto', at the same time making the
      code more comprehensible and easy to follow.
      
      This commit does not change any functionality, it is just a cosmetic
      code reshuffle.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5377831
    • J
      tipc: rename stack variables in function tipc_link_tunnel_rcv · 02842f71
      Jon Paul Maloy 提交于
      After the previous redesign of the tunnel reception algorithm and
      functions, we finalize it by renaming a couple of stack variables
      in tipc_tunnel_rcv(). This makes it more consistent with the naming
      scheme elsewhere in this part of the code.
      
      This change is purely cosmetic, with no functional changes.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02842f71
    • J
      tipc: more cleanup of tunnelling reception function · 1e9d47a9
      Jon Paul Maloy 提交于
      We simplify and slim down the code in function tipc_tunnel_rcv()
      No impact on the users of this function.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e9d47a9
    • J
      tipc: change signature of tunnelling reception function · 3bb53380
      Jon Paul Maloy 提交于
      After the earlier commits in this series related to the function
      tipc_link_tunnel_rcv(), we can now go further and simplify its
      signature.
      
      The function now consumes all DUPLICATE packets, and only returns such
      ORIGINAL packets that are ready for immediate delivery, i.e., no
      more link level protocol processing needs to be done by the caller.
      As a consequence, the the caller, tipc_rcv(), does not access the link
      pointer after call return, and it becomes unnecessary to pass a link
      pointer reference in the call. Instead, we now only pass it the tunnel
      link's owner node, which is sufficient to find the destination link for
      the tunnelled packet.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bb53380
    • J
      tipc: change reception of tunnelled failover packets · f006c9c7
      Jon Paul Maloy 提交于
      When a link is reset, and there is a redundant link available, all
      sender sockets will steer their subsequent traffic through the
      remaining link. In order to guarantee preserved packet order and
      cardinality during the transition, we tunnel the failing link's send
      queue through the remaining link before we allow any sockets to use it.
      
      In this commit, we change the algorithm for receiving failover
      ("ORIGINAL_MSG") packets in tipc_link_tunnel_rcv(), at the same time
      delegating it to a new subfuncton, tipc_link_failover_rcv(). Instead
      of directly returning an extracted inner packet to the packet reception
      loop in tipc_rcv(), we first check if it is a message fragment, in which
      case we append it to the reset link's fragment chain. If the fragment
      chain is complete, we return the whole chain instead of the individual
      buffer, eliminating any need for the tipc_rcv() loop to do reassembly of
      tunneled packets.
      
      This change makes it possible to further simplify tipc_link_tunnel_rcv(),
      as well as the calling tipc_rcv() loop. We will do that in later
      commits. It also makes it possible to identify a single spot in the code
      where we can tell that a failover procedure is finished, something that
      is useful when we are deleting links after a failover. This will also
      be done in a later commit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f006c9c7
    • J
      tipc: change reception of tunnelled duplicate packets · 1dab3d5a
      Jon Paul Maloy 提交于
      When a second link to a destination comes up, some sender sockets will
      steer their subsequent traffic through the new link. In order to
      guarantee preserved packet order and cardinality for those sockets, we
      tunnel a duplicate of the old link's send queue through the new link
      before we open it for regular traffic. The last arriving packet copy,
      on whichever link, will be dropped at the receiving end based on the
      original sequence number, to ensure that only one copy is delivered to
      the end receiver.
      
      In this commit, we change the algorithm for receiving DUPLICATE_MSG
      packets, at the same time delegating it to a new subfunction,
      tipc_link_dup_rcv(). Instead of returning an extracted inner packet to
      the packet reception loop in tipc_rcv(), we just add it to the receiving
      (new) link's deferred packet queue. The packet will then be processed by
      that link when it receives its first non-tunneled packet, i.e., at
      latest when the changeover procedure is finished.
      
      Because tipc_link_tunnel_rcv()/tipc_link_dup_rcv() now is consuming all
      packets of type DUPLICATE_MSG, the calling tipc_rcv() function can omit
      testing for this. This in turn means that the current conditional jump
      to the label 'protocol_check' becomes redundant, and we can remove that
      label.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dab3d5a
    • Y
      tipc: remove 'links' list from tipc_bearer struct · c61dd61d
      Ying Xue 提交于
      In our ongoing effort to simplify the TIPC locking structure,
      we see a need to remove the linked list for tipc_links
      in the bearer. This can be explained as follows.
      
      Currently, we have three different ways to access a link,
      via three different lists/tables:
      
      1: Via a node hash table:
         Used by the time-critical outgoing/incoming data paths.
         (e.g. link_send_sections_fast() and tipc_recv_msg() ):
      
      grab net_lock(read)
         find node from node hash table
         grab node_lock
             select link
             grab bearer_lock
                send_msg()
             release bearer_lock
         release node lock
      release net_lock
      
      2: Via a global linked list for nodes:
         Used by configuration commands (link_cmd_set_value())
      
      grab net_lock(read)
         find node and link from global node list (using link name)
         grab node_lock
             update link
         release node lock
      release net_lock
      
      (Same locking order as above. No problem.)
      
      3: Via the bearer's linked link list:
         Used by notifications from interface (e.g. tipc_disable_bearer() )
      
      grab net_lock(write)
         grab bearer_lock
            get link ptr from bearer's link list
            get node from link
            grab node_lock
               delete link
            release node lock
         release bearer_lock
      release net_lock
      
      (Different order from above, but works because we grab the
      outer net_lock in write mode first, excluding all other access.)
      
      The first major goal in our simplification effort is to get rid
      of the "big" net_lock, replacing it with rcu-locks when accessing
      the node list and node hash array. This will come in a later patch
      series.
      
      But to get there we first need to rewrite access methods ##2 and 3,
      since removal of net_lock would introduce three major problems:
      
      a) In access method #2, we access the link before taking the
         protecting node_lock. This will not work once net_lock is gone,
         so we will have to change the access order. We will deal with
         this in a later commit in this series, "tipc: add node lock
         protection to link found by link_find_link()".
      
      b) When the outer protection from net_lock is gone, taking
         bearer_lock and node_lock in opposite order of method 1) and 2)
         will become an obvious deadlock hazard. This is fixed in the
         commit ("tipc: remove bearer_lock from tipc_bearer struct")
         later in this series.
      
      c) Similar to what is described in problem a), access method #3
         starts with using a link pointer that is unprotected by node_lock,
         in order to via that pointer find the correct node struct and
         lock it. Before we remove net_lock, this access order must be
         altered. This is what we do with this commit.
      
      We can avoid introducing problem problem c) by even here using the
      global node list to find the node, before accessing its links. When
      we loop though the node list we use the own bearer identity as search
      criteria, thus easily finding the links that are associated to the
      resetting/disabling bearer. It should be noted that although this
      method is somewhat slower than the current list traversal, it is in
      no way time critical. This is only about resetting or deleting links,
      something that must be considered relatively infrequent events.
      
      As a bonus, we can get rid of the mutual pointers between links and
      bearers. After this commit, pointer dependency go in one direction
      only: from the link to the bearer.
      
      This commit pre-empts introduction of problem c) as described above.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c61dd61d
    • Y
      tipc: redefine 'started' flag in struct link to bitmap · 135daee6
      Ying Xue 提交于
      Currently, the 'started' field in struct tipc_link represents only a
      binary state, 'started' or 'not started'. We need it to represent
      more link execution states in the coming commits in this series.
      Hence, we rename the field to 'flags', and define the current
      started/non-started state to be represented by the LSB bit of
      that field.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      135daee6
    • Y
      tipc: move code for deleting links from bearer.c to link.c · 8d8439b6
      Ying Xue 提交于
      We break out the code for deleting attached links in the
      function bearer_disable(), and define a new function named
      tipc_link_delete_list() to do this job.
      
      This commit incurs no functional changes, but makes the code of
      function bearer_disable() cleaner. It is also a preparation
      for a more important change to the bearer code, in a subsequent
      commit in this series.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d8439b6
    • Y
      tipc: move code for resetting links from bearer.c to link.c · e0ca2c30
      Ying Xue 提交于
      We break out the code for resetting attached links in the
      function tipc_reset_bearer(), and define a new function named
      tipc_link_reset_list() to do this job.
      
      This commit incurs no functional changes, but makes the code
      of function tipc_reset_bearer() cleaner. It is also a preparation
      for a more important change to the bearer code, in a subsequent
      commit in this series.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0ca2c30
    • J
      tipc: stricter behavior of message reassembly function · 03b92017
      Jon Paul Maloy 提交于
      The function tipc_link_recv_fragment(struct sk_buff **buf) currently
      leaves the value of the input buffer pointer undefined when it returns,
      except when the return code indicates that the reassembly is complete.
      This despite the fact that it always consumes the input buffer.
      
      Here, we enforce a stricter behavior by this function, ensuring that
      the returned buffer pointer is non-NULL if and only if the reassembly
      is complete. This makes it possible to test for the buffer pointer as
      criteria for successful reassembly.
      
      We also rename the function to tipc_link_frag_rcv(), which is both
      shorter and more in line with common naming practice in the network
      subsystem.
      
      Apart from the new name, these changes have no impact on current
      users of the function, but makes it more practical for use in some
      planned future commits.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03b92017
    • A
      tipc: explicitly include core.h in addr.h · b3f0f5c3
      Andreas Bofjäll 提交于
      The inline functions in addr.h uses tipc_own_addr which is exported by
      core.h, but addr.h never actually includes it. It works because it is
      explicitly included where this is used, but it looks a bit strange.
      
      Include core.h in addr.h explicitly to make the dependency clearer.
      Signed-off-by: NAndreas Bofjäll <andreas.bofjall@ericsson.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3f0f5c3
  4. 13 2月, 2014 6 次提交
  5. 11 2月, 2014 4 次提交
    • E
      6lowpan: fix lockdep splats · 20e7c4e8
      Eric Dumazet 提交于
      When a device ndo_start_xmit() calls again dev_queue_xmit(),
      lockdep can complain because dev_queue_xmit() is re-entered and the
      spinlocks protecting tx queues share a common lockdep class.
      
      Same issue was fixed for bonding/l2tp/ppp in commits
      
      0daa2303 ("[PATCH] bonding: lockdep annotation")
      49ee4920 ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat")
      23d3b8bf ("net: qdisc busylock needs lockdep annotations ")
      303c07db ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ")
      Reported-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20e7c4e8
    • R
      9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers · b6f52ae2
      Richard Yao 提交于
      The 9p-virtio transport does zero copy on things larger than 1024 bytes
      in size. It accomplishes this by returning the physical addresses of
      pages to the virtio-pci device. At present, the translation is usually a
      bit shift.
      
      That approach produces an invalid page address when we read/write to
      vmalloc buffers, such as those used for Linux kernel modules. Any
      attempt to load a Linux kernel module from 9p-virtio produces the
      following stack.
      
      [<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510
      [<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
      [<ffffffff814839dd>] p9_client_read+0x15d/0x240
      [<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0
      [<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20
      [<ffffffff811c84e7>] v9fs_file_read+0x37/0x70
      [<ffffffff8114e3fb>] vfs_read+0x9b/0x160
      [<ffffffff81153571>] kernel_read+0x41/0x60
      [<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180
      
      Subsequently, QEMU will die printing:
      
      qemu-system-x86_64: virtio: trying to map MMIO memory
      
      This patch enables 9p-virtio to correctly handle this case. This not
      only enables us to load Linux kernel modules off virtfs, but also
      enables ZFS file-based vdevs on virtfs to be used without killing QEMU.
      
      Special thanks to both Avi Kivity and Alexander Graf for their
      interpretation of QEMU backtraces. Without their guidence, tracking down
      this bug would have taken much longer. Also, special thanks to Linus
      Torvalds for his insightful explanation of why this should use
      is_vmalloc_addr() instead of is_vmalloc_or_module_addr():
      
      https://lkml.org/lkml/2014/2/8/272Signed-off-by: NRichard Yao <ryao@gentoo.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6f52ae2
    • J
      tcp: tsq: fix nonagle handling · bf06200e
      John Ogness 提交于
      Commit 46d3ceab ("tcp: TCP Small Queues") introduced a possible
      regression for applications using TCP_NODELAY.
      
      If TCP session is throttled because of tsq, we should consult
      tp->nonagle when TX completion is done and allow us to send additional
      segment, especially if this segment is not a full MSS.
      Otherwise this segment is sent after an RTO.
      
      [edumazet] : Cooked the changelog, added another fix about testing
      sk_wmem_alloc twice because TX completion can happen right before
      setting TSQ_THROTTLED bit.
      
      This problem is particularly visible with recent auto corking,
      but might also be triggered with low tcp_limit_output_bytes
      values or NIC drivers delaying TX completion by hundred of usec,
      and very low rtt.
      
      Thomas Glanzmann for example reported an iscsi regression, caused
      by tcp auto corking making this bug quite visible.
      
      Fixes: 46d3ceab ("tcp: TCP Small Queues")
      Signed-off-by: NJohn Ogness <john.ogness@linutronix.de>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThomas Glanzmann <thomas@glanzmann.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf06200e
    • T
      bridge: Prevent possible race condition in br_fdb_change_mac_address · ac4c8868
      Toshiaki Makita 提交于
      br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without
      br->hash_lock.
      
      These hash list updates are racy with br_fdb_update()/br_fdb_cleanup().
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac4c8868