1. 09 9月, 2014 3 次提交
  2. 05 9月, 2014 1 次提交
    • F
      nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Frederic Weisbecker 提交于
      The local nohz kick is currently used by perf which needs it to be
      NMI-safe. Recent commit though (7d1311b9)
      changed its implementation to fire the local kick using the remote kick
      API. It was convenient to make the code more generic but the remote kick
      isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Lets fix this by restoring the use of local irq work for the nohz local
      kick.
      Reported-by: NCatalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      40bea039
  3. 04 9月, 2014 1 次提交
  4. 03 9月, 2014 2 次提交
    • G
    • J
      Revert "leds: convert blink timer to workqueue" · 9067359f
      Jiri Kosina 提交于
      This reverts commit 8b37e1be.
      
      It's broken as it changes led_blink_set() in a way that it can now sleep
      (while synchronously waiting for workqueue to be cancelled). That's a
      problem, because it's possible that this function gets called from atomic
      context (tpt_trig_timer() takes a readlock and thus disables preemption).
      
      This has been brought up 3 weeks ago already [1] but no proper fix has
      materialized, and I keep seeing the problem since 3.17-rc1.
      
      [1] https://lkml.org/lkml/2014/8/16/128
      
       BUG: sleeping function called from invalid context at kernel/workqueue.c:2650
       in_atomic(): 1, irqs_disabled(): 0, pid: 2335, name: wpa_supplicant
       5 locks held by wpa_supplicant/2335:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814c7c92>] rtnl_lock+0x12/0x20
        #1:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc06e649c>] cfg80211_mgd_wext_siwessid+0x5c/0x180 [cfg80211]
        #2:  (&local->mtx){+.+.+.}, at: [<ffffffffc0817dea>] ieee80211_prep_connection+0x17a/0x9a0 [mac80211]
        #3:  (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffc08081ed>] ieee80211_vif_use_channel+0x5d/0x2a0 [mac80211]
        #4:  (&trig->leddev_list_lock){.+.+..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
       CPU: 0 PID: 2335 Comm: wpa_supplicant Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffff8800360b5a50 ffff8800751f76d8 ffffffff8159e97f ffff8800360b5a30
        ffff8800751f76e8 ffffffff810739a5 ffff8800751f77b0 ffffffff8106862f
        ffffffff810685d0 0aa2209200000000 ffff880000000004 ffff8800361c59d0
       Call Trace:
        [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff810739a5>] __might_sleep+0xe5/0x120
        [<ffffffff8106862f>] flush_work+0x5f/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
        [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
        [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
        [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
        [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
        [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
        [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
        [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
        [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
        [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
        [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
        [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
        [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
        [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
        [<ffffffffc06e36c0>] ? cfg80211_wext_giwessid+0x50/0x50 [cfg80211]
        [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
        [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
        [<ffffffff81584fa0>] ? ioctl_standard_iw_point+0x3e0/0x3e0
        [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
        [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
        [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
        [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
        [<ffffffff815a67fb>] ? sysret_check+0x1b/0x56
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
        [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       wlan0: send auth to 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: authenticated
       wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: RX AssocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=2)
       wlan0: associated
       IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
       cfg80211: Calling CRDA for country: NA
       wlan0: Limiting TX power to 27 (27 - 0) dBm as advertised by 00:0b:6b:3c:8c:e4
      
       =================================
       [ INFO: inconsistent lock state ]
       3.17.0-rc3 #1 Not tainted
       ---------------------------------
       inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        ((&(&led_cdev->blink_work)->work)){+.?...}, at: [<ffffffff810685d0>] flush_work+0x0/0x270
       {SOFTIRQ-ON-W} state was registered at:
         [<ffffffff81094dbe>] __lock_acquire+0x30e/0x1a30
         [<ffffffff81096c81>] lock_acquire+0x91/0x110
         [<ffffffff81068608>] flush_work+0x38/0x270
         [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
         [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
         [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
         [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
         [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
         [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
         [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
         [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
         [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
         [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
         [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
         [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
         [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
         [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
         [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
         [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
         [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
         [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
         [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
         [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
         [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
         [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
         [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
         [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
         [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
         [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
         [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
         [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       irq event stamp: 493416
       hardirqs last  enabled at (493416): [<ffffffff81068a5f>] __cancel_work_timer+0x6f/0x100
       hardirqs last disabled at (493415): [<ffffffff81067e9f>] try_to_grab_pending+0x1f/0x160
       softirqs last  enabled at (493408): [<ffffffff81053ced>] _local_bh_enable+0x1d/0x50
       softirqs last disabled at (493409): [<ffffffff81054c75>] irq_exit+0xa5/0xb0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock((&(&led_cdev->blink_work)->work));
         <Interrupt>
           lock((&(&led_cdev->blink_work)->work));
      
        *** DEADLOCK ***
      
       2 locks held by swapper/0/0:
        #0:  (((&tpt_trig->timer))){+.-...}, at: [<ffffffff810b4c50>] call_timer_fn+0x0/0x180
        #1:  (&trig->leddev_list_lock){.+.?..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffffffff8246eb30 ffff88007c203b00 ffffffff8159e97f ffffffff81a194c0
        ffff88007c203b50 ffffffff81599c29 0000000000000001 ffffffff00000001
        ffff880000000000 0000000000000006 ffffffff81a194c0 ffffffff81093ad0
       Call Trace:
        <IRQ>  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff81599c29>] print_usage_bug+0x1f4/0x205
        [<ffffffff81093ad0>] ? check_usage_backwards+0x140/0x140
        [<ffffffff810944d3>] mark_lock+0x223/0x2b0
        [<ffffffff81094d60>] __lock_acquire+0x2b0/0x1a30
        [<ffffffff81096c81>] lock_acquire+0x91/0x110
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068608>] flush_work+0x38/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff8109469d>] ? trace_hardirqs_on_caller+0xad/0x1c0
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffff810b4cc5>] call_timer_fn+0x75/0x180
        [<ffffffff810b4c50>] ? process_timeout+0x10/0x10
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff810b50ac>] run_timer_softirq+0x1fc/0x2f0
        [<ffffffff81054805>] __do_softirq+0x115/0x2e0
        [<ffffffff81054c75>] irq_exit+0xa5/0xb0
        [<ffffffff810049b3>] do_IRQ+0x53/0xf0
        [<ffffffff815a74af>] common_interrupt+0x6f/0x6f
        <EOI>  [<ffffffff8147b56e>] ? cpuidle_enter_state+0x6e/0x180
        [<ffffffff8147b732>] cpuidle_enter+0x12/0x20
        [<ffffffff8108bba0>] cpu_startup_entry+0x330/0x360
        [<ffffffff8158fb51>] rest_init+0xc1/0xd0
        [<ffffffff8158fa90>] ? csum_partial_copy_generic+0x170/0x170
        [<ffffffff81af3ff2>] start_kernel+0x44f/0x45a
        [<ffffffff81af399c>] ? set_init_arg+0x53/0x53
        [<ffffffff81af35ad>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81af36a0>] x86_64_start_kernel+0xf1/0xf4
      
      Cc: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NBryan Wu <cooloney@gmail.com>
      9067359f
  5. 02 9月, 2014 1 次提交
  6. 30 8月, 2014 1 次提交
  7. 29 8月, 2014 1 次提交
    • D
      jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Darrick J. Wong 提交于
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: NTR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
  8. 28 8月, 2014 2 次提交
  9. 26 8月, 2014 2 次提交
  10. 23 8月, 2014 3 次提交
    • W
      nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975
      Weston Andros Adamson 提交于
      This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
      If the group is already locked and blocking is allowed, drop the inode lock
      and wait for the group lock to be cleared before trying it all again.
      This should fix warnings found in peterz's tree (sched/wait branch), where
      might_sleep() checks are added to wait.[ch].
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      7c3af975
    • C
      f2fs: use macro for code readability · b5b82205
      Chao Yu 提交于
      This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro
      instead of numbers in code for readability.
      
      change log from v1:
       o fix typo pointed out by Jaegeuk Kim.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b5b82205
    • S
      ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) 提交于
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users uses the function tracing ftrace_ops
      this causes a conflict and when a user enables both function profiling
      as well as the function tracer it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      33b7f99c
  11. 22 8月, 2014 4 次提交
  12. 21 8月, 2014 1 次提交
  13. 20 8月, 2014 1 次提交
  14. 19 8月, 2014 3 次提交
  15. 18 8月, 2014 1 次提交
  16. 17 8月, 2014 1 次提交
  17. 16 8月, 2014 1 次提交
  18. 15 8月, 2014 6 次提交
    • T
      rhashtable: fix annotations for rht_for_each_entry_rcu() · 93f56081
      Thomas Graf 提交于
      Call rcu_deference_raw() directly from within rht_for_each_entry_rcu()
      as list_for_each_entry_rcu() does.
      
      Fixes the following sparse warnings:
      net/netlink/af_netlink.c:2906:25:    expected struct rhash_head const *__mptr
      net/netlink/af_netlink.c:2906:25:    got struct rhash_head [noderef] <asn:4>*<noident>
      
      Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93f56081
    • T
      rhashtable: unexport and make rht_obj() static · c91eee56
      Thomas Graf 提交于
      No need to export rht_obj(), all inner to outer object translations
      occur internally. It was intended to be used with rht_for_each() which
      now primarily serves as the iterator for rhashtable_remove_pprev() to
      effectively flush and free the full table.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91eee56
    • T
      rhashtable: RCU annotations for next pointers · 5300fdcb
      Thomas Graf 提交于
      Properly annotate next pointers as access is RCU protected in
      the lookup path.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5300fdcb
    • H
      tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af
      Hannes Frederic Sowa 提交于
      tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
      ordering of incoming connections and teardowns without the need to
      hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
      the last measured RTO. To do so, we keep the last seen timestamp in a
      per-host indexed data structure and verify if the incoming timestamp
      in a connection request is strictly greater than the saved one during
      last connection teardown. Thus we can verify later on that no old data
      packets will be accepted by the new connection.
      
      During moving a socket to time-wait state we already verify if timestamps
      where seen on a connection. Only if that was the case we let the
      time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
      will be used. But we don't verify this on incoming SYN packets. If a
      connection teardown was less than TCP_PAWS_MSL seconds in the past we
      cannot guarantee to not accept data packets from an old connection if
      no timestamps are present. We should drop this SYN packet. This patch
      closes this loophole.
      
      Please note, this patch does not make tcp_tw_recycle in any way more
      usable but only adds another safety check:
      Sporadic drops of SYN packets because of reordering in the network or
      in the socket backlog queues can happen. Users behing NAT trying to
      connect to a tcp_tw_recycle enabled server can get caught in blackholes
      and their connection requests may regullary get dropped because hosts
      behind an address translator don't have synchronized tcp timestamp clocks.
      tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
      
      In general, use of tcp_tw_recycle is disadvised.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a26552af
    • N
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 4fab9071
      Neal Cardwell 提交于
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Diagnosed-by: NWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fab9071
    • A
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 9d186cac
      Andrey Vagin 提交于
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d186cac
  19. 13 8月, 2014 4 次提交
  20. 12 8月, 2014 1 次提交
    • V
      net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich 提交于
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d5501c1