  1. 17 Sep 2014: 3 commits
  2. 08 Sep 2014: 13 commits
    • rcu: Per-CPU operation cleanups to rcu_*_qs() functions · 284a8c93
      Paul E. McKenney committed
      The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
      old-style per-CPU variable access and write to ->passed_quiesce even
      if it is already set.  This commit therefore updates to use the new-style
      per-CPU variable access functions and avoids the spurious writes.
      This commit also eliminates the "cpu" argument to these functions because
      they are always invoked on the indicated CPU.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
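As a rough userspace illustration of the two changes in 284a8c93, here is a sketch. The first change checks ->passed_quiesce before writing it. The second drops the "cpu" argument, because the function always acts on the current CPU. The sketch uses a C11 `_Thread_local` variable as a stand-in for per-CPU data. The struct, counter, and function names are hypothetical. The kernel itself uses per-CPU accessors such as `__this_cpu_read()` and `__this_cpu_write()`, not thread-local storage.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU quiescent-state bookkeeping.
 * _Thread_local approximates "one copy per CPU" in userspace. */
struct qs_state_sketch {
    bool passed_quiesce;   /* set once a quiescent state is recorded */
    unsigned long stores;  /* counts actual writes, to show none are spurious */
};

static _Thread_local struct qs_state_sketch this_cpu_qs;

/* No "cpu" argument: the function always operates on the caller's own copy. */
static void sched_qs_sketch(void)
{
    /* Read first; write only if the flag is not already set. */
    if (!this_cpu_qs.passed_quiesce) {
        this_cpu_qs.passed_quiesce = true;
        this_cpu_qs.stores++;
    }
}
```

Calling `sched_qs_sketch()` three times in a row leaves `stores` at 1, because only the first call writes.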
    • rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() · 1d082fd0
      Paul E. McKenney committed
      The rcu_preempt_note_context_switch() function is on a scheduling fast
      path, so it would be good to avoid disabling irqs.  The reason that irqs
      are disabled is to synchronize process-level and irq-handler access to
      the task_struct ->rcu_read_unlock_special bitmask.  This commit therefore
      makes ->rcu_read_unlock_special instead be a union of bools with a short,
      allowing single-access checks in RCU's __rcu_read_unlock().  This results
      in the process-level and irq-handler accesses being simple loads and
      stores, so that irqs need no longer be disabled.  This commit therefore
      removes the irq disabling from rcu_preempt_note_context_switch().
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
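The union layout described in 1d082fd0 can be sketched in plain C as follows. The field names `blocked` and `need_qs` and the 16-bit overlay are assumptions for illustration. Each flag is written with an independent byte-sized store. The overlaid integer lets a reader ask "is any flag set?" with a single load.

```c
#include <stdbool.h>
#include <stdint.h>

/* Individual flags, each written with a plain byte-sized store. */
struct unlock_flags_sketch {
    bool blocked;  /* e.g. task blocked inside a read-side critical section */
    bool need_qs;  /* e.g. RCU still needs a quiescent state from this task */
};

/* Overlay the flags with an integer of exactly the same size. */
union unlock_special_sketch {
    struct unlock_flags_sketch b;
    int16_t s;
};

_Static_assert(sizeof(struct unlock_flags_sketch) == sizeof(int16_t),
               "the flags must exactly cover the overlaid integer");

/* Single-access check: one load tells whether any flag is set. */
static bool unlock_special_pending(const union unlock_special_sketch *u)
{
    return u->s != 0;
}
```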
    • rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() · 01a81330
      Paul E. McKenney committed
      In theory, synchronize_sched() requires a read-side critical section
      to order against.  In practice, preemption can be thought of as
      being disabled across every machine instruction, at least for those
      machine instructions that are not in the idle loop and not on offline
      CPUs.  So this commit removes the redundant preempt_disable() from
      rcu_note_voluntary_context_switch().
      
      Please note that the single instruction in question is the store of
      zero to ->rcu_tasks_holdout.  The "if" is simply a performance optimization
      that avoids unnecessary stores.  To see this, keep in mind that both
      the "if" condition and the store are in a quiescent state.  Therefore,
      even if the task is preempted for a full grace period (presumably due
      to its having done a context switch beforehand), the store will be
      recording a legitimate quiescent state.
      Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      Conflicts:
      	include/linux/rcupdate.h
    • rcu: Make TASKS_RCU handle nohz_full= CPUs · 176f8f7a
      Paul E. McKenney committed
      Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
      usermode execution.  There would be neither a context switch nor a
      scheduling-clock interrupt to tell TASKS_RCU that the task in question
      had passed through a quiescent state.  The grace period would therefore
      extend indefinitely.  This commit therefore makes RCU's dyntick-idle
      subsystem record the task_struct structure of the task that is running
      in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
      then access this information and record a quiescent state on
      behalf of any CPU running in dyntick-idle usermode.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Add torture tests for RCU-tasks · 69c60455
      Paul E. McKenney committed
      This commit adds torture tests for RCU-tasks.  It also fixes a bug that
      would segfault for an RCU flavor lacking a callback-barrier function.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Make TASKS_RCU handle tasks that are almost done exiting · 3f95aa81
      Paul E. McKenney committed
      Once a task has passed exit_notify() in the do_exit() code path, it
      is no longer on the task lists, and is therefore no longer visible
      to rcu_tasks_kthread().  This means that an almost-exited task might
      be preempted while within a trampoline, and this task won't be waited
      on by rcu_tasks_kthread().  This commit fixes this bug by adding an
      srcu_struct.  An exiting task does srcu_read_lock() just before calling
      exit_notify(), and does the corresponding srcu_read_unlock() after
      doing the final preempt_disable().  This means that rcu_tasks_kthread()
      can do synchronize_srcu() to wait for all mostly-exited tasks to reach
      their final preempt_disable() region, and then use synchronize_sched()
      to wait for those tasks to finish exiting.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add synchronous grace-period waiting for RCU-tasks · 53c6d4ed
      Paul E. McKenney committed
      It turns out to be easier to add the synchronous grace-period waiting
      functions to RCU-tasks than to work around their absence in rcutorture,
      so this commit adds them.  The key point is that the existence of
      call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops · bde6c3aa
      Paul E. McKenney committed
      RCU-tasks requires the occasional voluntary context switch
      from CPU-bound in-kernel tasks.  In some cases, this requires
      instrumenting cond_resched().  However, there is some reluctance
      to countenance unconditionally instrumenting cond_resched() (see
      http://lwn.net/Articles/603252/), so this commit creates a separate
      cond_resched_rcu_qs() that may be used in place of cond_resched() in
      locations prone to long-duration in-kernel looping.
      
      This commit currently instruments only RCU-tasks.  Future possibilities
      include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
      IPI usage.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add call_rcu_tasks() · 8315f422
      Paul E. McKenney committed
      This commit adds a new RCU-tasks flavor of RCU, which provides
      call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
      context switch (not preemption!) and userspace execution (not the idle
      loop -- use some sort of schedule_on_each_cpu() if you need to handle the
      idle tasks).  Note that unlike other RCU flavors, these quiescent states
      occur in tasks, not necessarily CPUs.  Includes fixes from Steven Rostedt.
      
      This RCU flavor is assumed to have very infrequent latency-tolerant
      updaters.  This assumption permits significant simplifications, including
      a single global callback list protected by a single global lock, along
      with a single task-private linked list containing all tasks that have not
      yet passed through a quiescent state.  If experience shows this assumption
      to be incorrect, the required additional complexity will be added.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
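The "single global callback list protected by a single global lock" from 8315f422 can be sketched in userspace C as follows. This is not the kernel's implementation. The type and function names are hypothetical, and a pthread mutex stands in for the kernel lock. The grace-period wait (every task passing through a voluntary context switch or userspace execution) is only marked by a comment, not implemented.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback header, in the spirit of struct rcu_head. */
struct cb_sketch {
    struct cb_sketch *next;
    void (*func)(struct cb_sketch *cb);
};

/* One global list and one global lock: tolerable for rare, latency-tolerant updaters. */
static pthread_mutex_t cbs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cb_sketch *cbs_head;
static struct cb_sketch **cbs_tail = &cbs_head;

/* Enqueue a callback at the tail of the global list. */
static void call_rcu_tasks_sketch(struct cb_sketch *cb, void (*func)(struct cb_sketch *))
{
    cb->next = NULL;
    cb->func = func;
    pthread_mutex_lock(&cbs_lock);
    *cbs_tail = cb;
    cbs_tail = &cb->next;
    pthread_mutex_unlock(&cbs_lock);
}

/* Detach the whole list under the lock, then invoke each callback.
 * Returns the number of callbacks invoked. */
static size_t run_callbacks_sketch(void)
{
    pthread_mutex_lock(&cbs_lock);
    struct cb_sketch *list = cbs_head;
    cbs_head = NULL;
    cbs_tail = &cbs_head;
    pthread_mutex_unlock(&cbs_lock);

    /* A real implementation would wait for a full RCU-tasks grace period here. */

    size_t n = 0;
    while (list != NULL) {
        struct cb_sketch *next = list->next;  /* read before func may free it */
        list->func(list);
        list = next;
        n++;
    }
    return n;
}

/* Example callback used in the usage check: counts invocations. */
static int demo_invocations;
static void demo_cb(struct cb_sketch *cb) { (void)cb; demo_invocations++; }
```

Because the list is detached in one step, callbacks queued while earlier ones run simply land on the fresh list and wait for the next batch.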
    • rcu: Use pr_alert/pr_cont for printing logs · eea203fe
      Joe Perches committed
      Use pr_alert/pr_cont for printing the logs from the rcutorture module
      directly instead of writing them to a buffer and then printing them. This
      avoids having to allocate such buffers. Also remove a resulting empty function.
      
      I tested this using the parse-torture.sh script as follows:
      
      $ dmesg | grep torture > log.txt
      $ bash parse-torture.sh log.txt test
      $
      
      There were no warnings which means that parsing went fine.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Break more call_rcu() deadlock involving scheduler and perf · 9fdd3bc9
      Paul E. McKenney committed
      Commit 96d3fd0d (rcu: Break call_rcu() deadlock involving scheduler
      and perf) covered the case where __call_rcu_nocb_enqueue() needs to wake
      the rcuo kthread due to the queue being initially empty, but did not
      do anything for the case where the queue was overflowing.  This commit
      therefore also defers wakeup for the overflow case.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Uninline rcu_read_lock_held() · 85b39d30
      Oleg Nesterov committed
      This commit uninlines rcu_read_lock_held(). According to "size vmlinux"
      this saves 28549 bytes in .text:
      
	  text    data    bss      dec
	- 5541731 3014560 14757888 23314179
	+ 5513182 3026848 14757888 23297918
      
      Note: it looks as if .data grows by 12288 bytes, but it does not actually
      grow. .data starts with ALIGN(THREAD_SIZE), and since .text shrinks, the
      padding grows, so .data appears to grow as seen by /bin/size. diff of
      System.map:
      
      	- ffffffff81510000 D _sdata
      	- ffffffff81510000 D init_thread_union
      	+ ffffffff81509000 D _sdata
      	+ ffffffff8150c000 D init_thread_union
      
      Perhaps we can change vmlinux.lds.S to .data itself, so that /bin/size
      can't "wrongly" report that .data grows if .text shrinks.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
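The figures quoted in 85b39d30 are self-consistent, and a few lines of C make the bookkeeping explicit. The 12288-byte growth of .data equals the new gap between _sdata and init_thread_union, which is alignment padding. The THREAD_SIZE value of 16 KiB is an assumption (typical of x86_64 kernels of this period). The sketch checks it only against the new init_thread_union address.

```c
#include <stdint.h>

/* Numbers copied from the commit message ("size vmlinux" and System.map). */
static const long text_before = 5541731, text_after = 5513182;
static const long data_before = 3014560, data_after = 3026848;
static const long dec_before = 23314179, dec_after = 23297918;
static const uint64_t sdata_after = 0xffffffff81509000ULL;
static const uint64_t init_thread_union_after = 0xffffffff8150c000ULL;

/* Assumption: THREAD_SIZE is 16 KiB for this x86_64 build. */
static const uint64_t assumed_thread_size = 16 * 1024;

static long text_saved(void)           { return text_before - text_after; }
static long data_apparent_growth(void) { return data_after - data_before; }
static long total_saved(void)          { return dec_before - dec_after; }

/* Padding inserted so that init_thread_union starts THREAD_SIZE-aligned. */
static uint64_t alignment_padding(void) { return init_thread_union_after - sdata_after; }
```

Before the change, _sdata and init_thread_union shared the address 0xffffffff81510000, so there was no padding. Afterwards the 0x3000-byte gap accounts exactly for the apparent growth. The net saving is therefore 28549 - 12288 = 16261 bytes.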
    • rcu: Return bool type in rcu_lockdep_current_cpu_online() · 521d24ee
      Pranith Kumar committed
      Return true instead of 1 in rcu_lockdep_current_cpu_online() as this
      has bool as return type.
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 23 Aug 2014: 2 commits
    • nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975
      Weston Andros Adamson committed
      This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
      If the group is already locked and blocking is allowed, drop the inode lock
      and wait for the group lock to be cleared before trying it all again.
      This should fix warnings found in peterz's tree (sched/wait branch), where
      might_sleep() checks are added to wait.[ch].
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
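The retry pattern in 7c3af975 can be sketched with POSIX threads. When blocking is allowed, the code drops the inode lock before sleeping on the group lock, then starts over. Everything here is hypothetical. A pthread mutex stands in for the inode lock. A mutex/condition-variable pair plus a flag stands in for the page-group lock. The real NFS code uses its own kernel locking primitives.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical page-group lock: a flag guarded by a mutex, with a condvar to wait on. */
struct group_sketch {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool locked;
};

static pthread_mutex_t inode_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Take the group lock while holding the inode lock. Returns 0 on success,
 * or -EAGAIN if the group is busy and the caller asked not to block. */
static int lock_and_join_sketch(struct group_sketch *g, bool nonblock)
{
    for (;;) {
        pthread_mutex_lock(&inode_lock_sketch);
        pthread_mutex_lock(&g->mu);
        if (!g->locked) {
            g->locked = true;
            pthread_mutex_unlock(&g->mu);
            /* ...join the requests here, still under the inode lock... */
            pthread_mutex_unlock(&inode_lock_sketch);
            return 0;
        }
        if (nonblock) {
            pthread_mutex_unlock(&g->mu);
            pthread_mutex_unlock(&inode_lock_sketch);
            return -EAGAIN;
        }
        /* Blocking allowed: never sleep with the inode lock held. */
        pthread_mutex_unlock(&inode_lock_sketch);
        while (g->locked)
            pthread_cond_wait(&g->cv, &g->mu);
        pthread_mutex_unlock(&g->mu);
        /* State may have changed while we slept: start over from the top. */
    }
}

static void unlock_group_sketch(struct group_sketch *g)
{
    pthread_mutex_lock(&g->mu);
    g->locked = false;
    pthread_cond_broadcast(&g->cv);
    pthread_mutex_unlock(&g->mu);
}
```

Restarting from the top, rather than continuing after the wait, matches the commit's "trying it all again". Anything checked under the inode lock must be rechecked once that lock has been dropped.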
    • ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) committed
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users use the function tracing ftrace_ops,
      this causes a conflict: when a user enables both function profiling
      and the function tracer, this crashes ftrace and disables it.
      
      The quick solution is to make them separate ftrace_ops again, as they
      were earlier. The problem, though, is keeping the set of traced functions
      synchronized, because both the function and function_graph tracers are
      limited by the selections made in the set_ftrace_filter and
      set_ftrace_notrace files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
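The pointer-plus-embedded-storage layout described in 33b7f99c can be sketched as follows. The struct and function names are stand-ins, not the kernel's definitions. Each ops structure embeds its own hash pair. Its func_hash pointer either defaults to that local copy at registration or is set statically to another ops' hashes, so two tracers share one filter/notrace selection.

```c
#include <stddef.h>

/* Hypothetical stand-in for a function hash (a set of traced functions). */
struct func_set_sketch {
    int nr_entries;
};

/* Mirrors the described ftrace_ops_hash: the two hashes grouped together. */
struct ops_hash_sketch {
    struct func_set_sketch *filter_hash;
    struct func_set_sketch *notrace_hash;
};

/* Each ops embeds its own hashes but is consulted only through func_hash. */
struct ops_sketch {
    struct ops_hash_sketch *func_hash;
    struct ops_hash_sketch local_hash;
};

/* At registration: point at the embedded copy unless already pointed elsewhere. */
static void ops_register_sketch(struct ops_sketch *ops)
{
    if (ops->func_hash == NULL)
        ops->func_hash = &ops->local_hash;
}

/* The "function" ops owns the hashes; the "function_graph" ops shares them statically. */
static struct ops_sketch function_ops;
static struct ops_sketch graph_ops = { .func_hash = &function_ops.local_hash };
```

Most ops never set func_hash themselves, so they get the embedded copy and need no separate allocation. Sharing costs only a static initializer.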
  4. 22 Aug 2014: 3 commits
  5. 21 Aug 2014: 1 commit
  6. 19 Aug 2014: 3 commits
  7. 17 Aug 2014: 1 commit
  8. 15 Aug 2014: 6 commits
    • rhashtable: fix annotations for rht_for_each_entry_rcu() · 93f56081
      Thomas Graf committed
      Call rcu_dereference_raw() directly from within rht_for_each_entry_rcu(),
      as list_for_each_entry_rcu() does.
      
      Fixes the following sparse warnings:
      net/netlink/af_netlink.c:2906:25:    expected struct rhash_head const *__mptr
      net/netlink/af_netlink.c:2906:25:    got struct rhash_head [noderef] <asn:4>*<noident>
      
      Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: unexport and make rht_obj() static · c91eee56
      Thomas Graf committed
      There is no need to export rht_obj(); all inner-to-outer object
      translations occur internally. It was intended to be used with rht_for_each(), which
      now primarily serves as the iterator for rhashtable_remove_pprev() to
      effectively flush and free the full table.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: RCU annotations for next pointers · 5300fdcb
      Thomas Graf committed
      Properly annotate next pointers, since access to them is RCU-protected
      in the lookup path.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af
      Hannes Frederic Sowa committed
      tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
      ordering of incoming connections and teardowns without the need to
      hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
      the last measured RTO. To do so, we keep the last seen timestamp in a
      per-host indexed data structure and verify if the incoming timestamp
      in a connection request is strictly greater than the saved one during
      last connection teardown. Thus we can verify later on that no old data
      packets will be accepted by the new connection.
      
      When moving a socket to the time-wait state, we already verify whether
      timestamps were seen on the connection. Only if that was the case do we
      let the time-wait socket expire after the RTO; otherwise the normal
      TCP_TIMEWAIT_LEN is used. But we don't verify this on incoming SYN
      packets. If a connection teardown was less than TCP_PAWS_MSL seconds in
      the past, we cannot guarantee that data packets from an old connection
      will not be accepted if no timestamps are present, so we should drop
      this SYN packet. This patch closes this loophole.
      
      Please note, this patch does not make tcp_tw_recycle in any way more
      usable but only adds another safety check:
      Sporadic drops of SYN packets because of reordering in the network or
      in the socket backlog queues can happen. Users behind NAT trying to
      connect to a tcp_tw_recycle enabled server can get caught in blackholes,
      and their connection requests may regularly get dropped because hosts
      behind an address translator don't have synchronized tcp timestamp clocks.
      tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
      
      In general, use of tcp_tw_recycle is discouraged.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
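The admission rule described in a26552af can be summarized as a small predicate. This is a simplified sketch, not the kernel's code. The per-host record and function names are hypothetical, and time is in whole seconds. The comparison is the commit's strict "greater than the saved timestamp" test, written in the wraparound-safe style used for 32-bit TCP timestamps. The value 60 matches the kernel's TCP_PAWS_MSL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-host records older than this many seconds are ignored (TCP_PAWS_MSL is 60). */
#define PAWS_MSL_SKETCH 60

/* Hypothetical per-host record saved at the last connection teardown. */
struct peer_ts_sketch {
    uint32_t last_ts;   /* peer's TCP timestamp seen at teardown */
    long last_sec;      /* local time, in seconds, when it was saved */
};

/* Returns true if an incoming SYN may be accepted under tcp_tw_recycle-style rules. */
static bool syn_acceptable_sketch(const struct peer_ts_sketch *peer, long now_sec,
                                  bool syn_has_ts, uint32_t syn_ts)
{
    if (now_sec - peer->last_sec >= PAWS_MSL_SKETCH)
        return true;    /* record too old to constrain anything */
    if (!syn_has_ts)
        return false;   /* the loophole closed here: no timestamp, no ordering proof */
    /* Strictly newer than the saved timestamp, modulo 2^32. */
    return (int32_t)(syn_ts - peer->last_ts) > 0;
}
```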
    • tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 4fab9071
      Neal Cardwell committed
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Diagnosed-by: Willem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
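The fix in 4fab9071 amounts to calling the MTU handler through the socket's per-address-family operations table rather than hard-wiring the IPv6 one. The sketch below shows that dispatch with hypothetical names. It assumes, as on Linux, that an AF_INET6 socket carrying IPv4 (v4-mapped) traffic has its ops pointer switched to an IPv4-flavored table. The same call site then reaches the right handler.

```c
#include <stddef.h>

struct sock_sketch;

/* Hypothetical per-address-family operations table. */
struct af_ops_sketch {
    void (*mtu_reduced)(struct sock_sketch *sk);
};

struct sock_sketch {
    const struct af_ops_sketch *af_ops;  /* switched for v4-mapped traffic */
    int handled_by_family;               /* records which handler ran: 4 or 6 */
};

static void v4_mtu_reduced_sketch(struct sock_sketch *sk) { sk->handled_by_family = 4; }
static void v6_mtu_reduced_sketch(struct sock_sketch *sk) { sk->handled_by_family = 6; }

static const struct af_ops_sketch v4_ops_sketch = { v4_mtu_reduced_sketch };
static const struct af_ops_sketch v6_ops_sketch = { v6_mtu_reduced_sketch };

/* Deferred work at lock release: dispatch through the ops table, not a fixed function. */
static void release_cb_sketch(struct sock_sketch *sk)
{
    sk->af_ops->mtu_reduced(sk);
}
```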
    • tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 9d186cac
      Andrey Vagin committed
      We don't know the right timestamp for repaired skb-s. Wrong RTT estimates
      are harmful, because some congestion modules depend heavily on them.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice on how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: move the setting of skb->when for repaired skb-s into tcp_write_xmit,
          where it is set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
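The mechanism in 9d186cac relies on a bitmask. The new repaired flag is folded into the TCPCB_RETRANS mask. Code that already refuses RTT samples from retransmitted segments therefore now refuses them from repaired segments too; per Karn's rule, timing for retransmitted segments is ambiguous. The bit values and names below are assumptions for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments; only their combination into one mask matters. */
#define SACKED_RETRANS_SKETCH 0x02  /* segment currently retransmitted */
#define REPAIRED_SKETCH       0x10  /* segment restored by socket repair: no valid send time */
#define EVER_RETRANS_SKETCH   0x80  /* segment was retransmitted at some point */
#define RETRANS_MASK_SKETCH \
    (SACKED_RETRANS_SKETCH | EVER_RETRANS_SKETCH | REPAIRED_SKETCH)

/* An ACK for this segment may yield an RTT sample only if no mask bit is set. */
static bool rtt_sample_allowed(uint8_t sacked)
{
    return (sacked & RETRANS_MASK_SKETCH) == 0;
}
```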
  9. 13 Aug 2014: 3 commits
  10. 12 Aug 2014: 1 commit
    • net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich committed
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is built only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver, where the
      sender is a VM with a VLAN configured.  The host the VM is running on
      does not have VLAN support, and the outgoing interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I only have access to TG3 hardware in this configuration, so I have
      only tested this with the TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffer from a similar issue.
      
      The patch attempts to fix this another way.  It moves the vlan header
      stripping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualization host, the virtual machines running on top of such a
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
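What "untagging on input" does to a frame can be sketched on a flat byte buffer in three steps. First, detect the 802.1Q TPID (0x8100) after the two MAC addresses. Second, save the 16-bit TCI as out-of-band metadata. Third, close the 4-byte gap so the inner EtherType follows the MAC addresses again. This illustration uses hypothetical names. It is not the kernel's vlan_untag(), which operates on socket buffers (skbs) and keeps the tag in the skb's vlan_tci field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_PAIR_LEN_SKETCH 12      /* destination + source MAC addresses */
#define VLAN_TAG_LEN_SKETCH 4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q_SKETCH   0x8100

/* Strip one 802.1Q tag in place. Stores the TCI in *tci and returns the new
 * frame length; returns len unchanged (and *tci = 0) for untagged frames. */
static size_t untag_frame_sketch(uint8_t *frame, size_t len, uint16_t *tci)
{
    *tci = 0;
    /* Need both MACs, the 4-byte tag, and the inner EtherType. */
    if (len < MAC_PAIR_LEN_SKETCH + VLAN_TAG_LEN_SKETCH + 2)
        return len;
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q_SKETCH)
        return len;
    *tci = (uint16_t)((frame[14] << 8) | frame[15]);
    /* Slide the inner EtherType and payload back over the tag. */
    memmove(frame + MAC_PAIR_LEN_SKETCH,
            frame + MAC_PAIR_LEN_SKETCH + VLAN_TAG_LEN_SKETCH,
            len - MAC_PAIR_LEN_SKETCH - VLAN_TAG_LEN_SKETCH);
    return len - VLAN_TAG_LEN_SKETCH;
}
```

The frames in the packet trace above carry "vlan 100, p 0", which corresponds to a TCI of 0x0064. The VLAN ID is the low 12 bits of the TCI, and the priority is the top 3 bits.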
  11. 11 Aug 2014: 3 commits
  12. 10 Aug 2014: 1 commit