1. 24 Sep, 2014 5 commits
  2. 15 Sep, 2014 1 commit
    • L
      vfs: avoid non-forwarding large load after small store in path lookup · 9226b5b4
      Linus Torvalds committed
      The performance regression that Josef Bacik reported in the pathname
      lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made
      me look at performance stability of the dcache code, just to verify that
      the problem was actually fixed.  That turned up a few other problems in
      this area.
      
      There are a few cases where we exit RCU lookup mode and go to the slow
      serializing case when we shouldn't; Al has fixed those and they'll come
      in with the next VFS pull.
      
      But my performance verification also shows that link_path_walk() turns
      out to have a very unfortunate 32-bit store of the length and hash of
      the name we look up, followed by a 64-bit read of the combined hash_len
      field.  That screws up the processor's store-to-load forwarding, causing
      an unnecessary hiccup in this critical routine.
      
      It's caused by the ugly calling convention for the "hash_name()"
      function, and easily fixed by just making hash_name() fill in the whole
      'struct qstr' rather than passing it a pointer to just the hash value.
      
      With that, the profile for this function looks much smoother.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9226b5b4
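The store-forwarding point in the commit above can be sketched in user space. This is only an illustration under stated assumptions: `struct name_key`, `fill_key()`, the `hashlen_*` helper names, and the toy multiply-by-31 hash are hypothetical and are not the kernel's definitions. The idea it shows is that the length and hash are combined in registers and written with one 64-bit store, so a later 64-bit load of the combined field matches the size of the store that produced it.

```c
#include <stdint.h>

/* Hypothetical stand-in for the name key: hash and length share one 64-bit field. */
struct name_key {
    uint64_t hash_len;      /* low 32 bits: hash, high 32 bits: length */
    const char *name;
};

/* Combine hash and length in registers (helper names are assumptions). */
static inline uint64_t hashlen_make(uint32_t hash, uint32_t len)
{
    return ((uint64_t)len << 32) | hash;
}

static inline uint32_t hashlen_hash(uint64_t hl) { return (uint32_t)hl; }
static inline uint32_t hashlen_len(uint64_t hl)  { return (uint32_t)(hl >> 32); }

/*
 * Fill in the whole key for one path component (up to '/' or NUL).
 * The hash is a toy multiply-by-31 hash, NOT the kernel's name hash.
 */
static inline void fill_key(struct name_key *key, const char *name)
{
    uint32_t hash = 0;
    uint32_t len = 0;

    while (name[len] != '\0' && name[len] != '/') {
        hash = hash * 31u + (unsigned char)name[len];
        len++;
    }
    key->name = name;
    key->hash_len = hashlen_make(hash, len);   /* one 64-bit store */
}
```

A caller that later compares `key->hash_len` against a cached 64-bit value then reads exactly what was stored, instead of reading 64 bits assembled from two separate 32-bit stores.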
  3. 14 Sep, 2014 1 commit
    • L
      Make hash_64() use a 64-bit multiply when appropriate · 23d0db76
      Linus Torvalds committed
      The hash_64() function historically does the multiply by the
      GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
      unlike the 32-bit case, gcc seems unable to turn the constant multiply
      into the more appropriate shift and adds when required.
      
      However, that means that we generate those shifts and adds even when the
      architecture has a fast multiplier, and could just do it better in
      hardware.
      
      Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
      "is it a 64-bit architecture") to decide whether to use an integer
      multiply or the explicit sequence of shift/add instructions.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23d0db76
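The two strategies described in the commit above can be compared in a small user-space sketch. The constant value and the shift/add sequence are written from memory of the kernel's hash header of that era and should be treated as assumptions; `USE_FAST_MULTIPLY` is a hypothetical stand-in for the "fast multiplier and 64-bit architecture" check. The shift/add chain computes val * (1 - 2^18 - 2^51 + 2^54 - 2^57 + 2^61 + 2^63), which equals the constant used by the multiply.

```c
#include <stdint.h>

/* Assumed value: 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* Variant for CPUs with a fast 64-bit multiplier. bits must be in 1..64. */
uint64_t hash_64_multiply(uint64_t val, unsigned int bits)
{
    return (val * GOLDEN_RATIO_PRIME_64) >> (64 - bits);
}

/* Variant that spells the same multiplication out as shifts and adds. */
uint64_t hash_64_shift_add(uint64_t val, unsigned int bits)
{
    uint64_t hash = val;
    uint64_t n = val;

    n <<= 18; hash -= n;    /* - val * 2^18 */
    n <<= 33; hash -= n;    /* - val * 2^51 */
    n <<= 3;  hash += n;    /* + val * 2^54 */
    n <<= 3;  hash -= n;    /* - val * 2^57 */
    n <<= 4;  hash += n;    /* + val * 2^61 */
    n <<= 2;  hash += n;    /* + val * 2^63 */

    /* The high bits are the best mixed, so keep those. */
    return hash >> (64 - bits);
}

/* Compile-time selection; USE_FAST_MULTIPLY is a hypothetical macro. */
uint64_t hash_64(uint64_t val, unsigned int bits)
{
#ifdef USE_FAST_MULTIPLY
    return hash_64_multiply(val, bits);
#else
    return hash_64_shift_add(val, bits);
#endif
}
```

Both variants return the same value for every input, so the choice only affects which instructions the compiler emits.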
  4. 13 Sep, 2014 1 commit
    • A
      jiffies: Fix timeval conversion to jiffies · d78c9300
      Andrew Hunter committed
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly when repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
      
      Tested: the following program:
      
      #include <stdio.h>
      #include <sys/time.h>
      
      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
        return 0;
      }
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: Paul Turner <pjt@google.com>
      Reported-by: Aaron Jacobs <jacobsa@google.com>
      Signed-off-by: Andrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d78c9300
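The rounding error described in the commit above can be reproduced with plain integer arithmetic. The sketch below assumes HZ=1000, a tick of exactly 1,000,000 ns, and a scaling shift of 40 bits; those values and the function names are assumptions chosen for illustration. The "fixed" variant uses exact division after rounding up in nanoseconds, which is a simplification of the approach the commit describes (it converts via the timespec path rather than dividing).

```c
#include <stdint.h>

/* Assumed configuration: HZ=1000, so one jiffy is exactly 1,000,000 ns. */
#define HZ_ASSUMED      1000ULL
#define NSEC_PER_USEC   1000ULL
#define TICK_NSEC       (1000000000ULL / HZ_ASSUMED)
#define USEC_JIFFIE_SC  40      /* assumed scaling shift */

/* x = best approximation of NSEC_PER_USEC/TICK_NSEC over 2^SC, rounded UP. */
#define USEC_CONVERSION \
    (((NSEC_PER_USEC << USEC_JIFFIE_SC) + TICK_NSEC - 1) / TICK_NSEC)
#define USEC_ROUND      ((1ULL << USEC_JIFFIE_SC) - 1)

/* Old scheme: round up in the scaled intermediate form. Valid for usec < 10^6. */
uint64_t usec_to_jiffies_old(uint64_t usec)
{
    return (usec * USEC_CONVERSION + USEC_ROUND) >> USEC_JIFFIE_SC;
}

/* Simplified fix: convert to nanoseconds first, then round up exactly. */
uint64_t usec_to_jiffies_fixed(uint64_t usec)
{
    uint64_t nsec = usec * NSEC_PER_USEC;

    return (nsec + TICK_NSEC - 1) / TICK_NSEC;
}
```

Because the scaled factor is rounded up, an exact multiple of the tick lands slightly above an integer before `USEC_ROUND` is added, and the shift then yields one jiffy too many. With these assumed constants, `usec_to_jiffies_old(10000)` evaluates to 11 while `usec_to_jiffies_fixed(10000)` evaluates to 10.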
  5. 09 Sep, 2014 3 commits
  6. 06 Sep, 2014 1 commit
  7. 05 Sep, 2014 1 commit
    • F
      nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Frederic Weisbecker committed
      The local nohz kick is currently used by perf, which needs it to be
      NMI-safe. A recent commit (7d1311b9), however,
      changed its implementation to fire the local kick using the remote kick
      API. That was convenient for making the code more generic, but the remote
      kick isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Let's fix this by restoring the use of local irq work for the nohz local
      kick.
      Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      40bea039
  8. 03 Sep, 2014 2 commits
    • G
    • J
      Revert "leds: convert blink timer to workqueue" · 9067359f
      Jiri Kosina committed
      This reverts commit 8b37e1be.
      
      It's broken as it changes led_blink_set() in a way that it can now sleep
      (while synchronously waiting for workqueue to be cancelled). That's a
      problem, because it's possible that this function gets called from atomic
      context (tpt_trig_timer() takes a readlock and thus disables preemption).
      
      This has been brought up 3 weeks ago already [1] but no proper fix has
      materialized, and I keep seeing the problem since 3.17-rc1.
      
      [1] https://lkml.org/lkml/2014/8/16/128
      
       BUG: sleeping function called from invalid context at kernel/workqueue.c:2650
       in_atomic(): 1, irqs_disabled(): 0, pid: 2335, name: wpa_supplicant
       5 locks held by wpa_supplicant/2335:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814c7c92>] rtnl_lock+0x12/0x20
        #1:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc06e649c>] cfg80211_mgd_wext_siwessid+0x5c/0x180 [cfg80211]
        #2:  (&local->mtx){+.+.+.}, at: [<ffffffffc0817dea>] ieee80211_prep_connection+0x17a/0x9a0 [mac80211]
        #3:  (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffc08081ed>] ieee80211_vif_use_channel+0x5d/0x2a0 [mac80211]
        #4:  (&trig->leddev_list_lock){.+.+..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
       CPU: 0 PID: 2335 Comm: wpa_supplicant Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffff8800360b5a50 ffff8800751f76d8 ffffffff8159e97f ffff8800360b5a30
        ffff8800751f76e8 ffffffff810739a5 ffff8800751f77b0 ffffffff8106862f
        ffffffff810685d0 0aa2209200000000 ffff880000000004 ffff8800361c59d0
       Call Trace:
        [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff810739a5>] __might_sleep+0xe5/0x120
        [<ffffffff8106862f>] flush_work+0x5f/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
        [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
        [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
        [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
        [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
        [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
        [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
        [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
        [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
        [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
        [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
        [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
        [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
        [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
        [<ffffffffc06e36c0>] ? cfg80211_wext_giwessid+0x50/0x50 [cfg80211]
        [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
        [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
        [<ffffffff81584fa0>] ? ioctl_standard_iw_point+0x3e0/0x3e0
        [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
        [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
        [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
        [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
        [<ffffffff815a67fb>] ? sysret_check+0x1b/0x56
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
        [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       wlan0: send auth to 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: authenticated
       wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: RX AssocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=2)
       wlan0: associated
       IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
       cfg80211: Calling CRDA for country: NA
       wlan0: Limiting TX power to 27 (27 - 0) dBm as advertised by 00:0b:6b:3c:8c:e4
      
       =================================
       [ INFO: inconsistent lock state ]
       3.17.0-rc3 #1 Not tainted
       ---------------------------------
       inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        ((&(&led_cdev->blink_work)->work)){+.?...}, at: [<ffffffff810685d0>] flush_work+0x0/0x270
       {SOFTIRQ-ON-W} state was registered at:
         [<ffffffff81094dbe>] __lock_acquire+0x30e/0x1a30
         [<ffffffff81096c81>] lock_acquire+0x91/0x110
         [<ffffffff81068608>] flush_work+0x38/0x270
         [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
         [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
         [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
         [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
         [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
         [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
         [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
         [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
         [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
         [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
         [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
         [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
         [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
         [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
         [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
         [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
         [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
         [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
         [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
         [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
         [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
         [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
         [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
         [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
         [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
         [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
         [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
         [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       irq event stamp: 493416
       hardirqs last  enabled at (493416): [<ffffffff81068a5f>] __cancel_work_timer+0x6f/0x100
       hardirqs last disabled at (493415): [<ffffffff81067e9f>] try_to_grab_pending+0x1f/0x160
       softirqs last  enabled at (493408): [<ffffffff81053ced>] _local_bh_enable+0x1d/0x50
       softirqs last disabled at (493409): [<ffffffff81054c75>] irq_exit+0xa5/0xb0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock((&(&led_cdev->blink_work)->work));
         <Interrupt>
           lock((&(&led_cdev->blink_work)->work));
      
        *** DEADLOCK ***
      
       2 locks held by swapper/0/0:
        #0:  (((&tpt_trig->timer))){+.-...}, at: [<ffffffff810b4c50>] call_timer_fn+0x0/0x180
        #1:  (&trig->leddev_list_lock){.+.?..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffffffff8246eb30 ffff88007c203b00 ffffffff8159e97f ffffffff81a194c0
        ffff88007c203b50 ffffffff81599c29 0000000000000001 ffffffff00000001
        ffff880000000000 0000000000000006 ffffffff81a194c0 ffffffff81093ad0
       Call Trace:
        <IRQ>  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff81599c29>] print_usage_bug+0x1f4/0x205
        [<ffffffff81093ad0>] ? check_usage_backwards+0x140/0x140
        [<ffffffff810944d3>] mark_lock+0x223/0x2b0
        [<ffffffff81094d60>] __lock_acquire+0x2b0/0x1a30
        [<ffffffff81096c81>] lock_acquire+0x91/0x110
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068608>] flush_work+0x38/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff8109469d>] ? trace_hardirqs_on_caller+0xad/0x1c0
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffff810b4cc5>] call_timer_fn+0x75/0x180
        [<ffffffff810b4c50>] ? process_timeout+0x10/0x10
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff810b50ac>] run_timer_softirq+0x1fc/0x2f0
        [<ffffffff81054805>] __do_softirq+0x115/0x2e0
        [<ffffffff81054c75>] irq_exit+0xa5/0xb0
        [<ffffffff810049b3>] do_IRQ+0x53/0xf0
        [<ffffffff815a74af>] common_interrupt+0x6f/0x6f
        <EOI>  [<ffffffff8147b56e>] ? cpuidle_enter_state+0x6e/0x180
        [<ffffffff8147b732>] cpuidle_enter+0x12/0x20
        [<ffffffff8108bba0>] cpu_startup_entry+0x330/0x360
        [<ffffffff8158fb51>] rest_init+0xc1/0xd0
        [<ffffffff8158fa90>] ? csum_partial_copy_generic+0x170/0x170
        [<ffffffff81af3ff2>] start_kernel+0x44f/0x45a
        [<ffffffff81af399c>] ? set_init_arg+0x53/0x53
        [<ffffffff81af35ad>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81af36a0>] x86_64_start_kernel+0xf1/0xf4
      
      Cc: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Bryan Wu <cooloney@gmail.com>
      9067359f
  9. 02 Sep, 2014 1 commit
  10. 30 Aug, 2014 1 commit
  11. 29 Aug, 2014 1 commit
    • D
      jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Darrick J. Wong committed
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reported-by: TR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
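The two rules the commit above introduces (tag size follows the feature flags, and 64-bit detection checks a flag instead of comparing structure sizes) can be sketched as follows. Everything here is hypothetical: the flag names and values, the struct names, and the legacy tag layouts are assumptions for illustration and are not the on-disk jbd2 definitions. Only the 16-byte v3 tag with a full 32-bit checksum is taken from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real jbd2 flag names and values differ. */
#define FEATURE_64BIT   (1u << 0)
#define FEATURE_CSUM_V3 (1u << 1)

/* Assumed legacy tag for a 32-bit journal: 8 bytes. */
struct tag32 {
    uint32_t blocknr;
    uint16_t checksum;
    uint16_t flags;
};

/* Assumed legacy tag for a 64-bit journal: 12 bytes. */
struct tag64 {
    uint32_t blocknr;
    uint16_t checksum;
    uint16_t flags;
    uint32_t blocknr_high;
};

/* v3 tag: fixed 16 bytes, with room for a full 32-bit checksum. */
struct tag_v3 {
    uint32_t blocknr;
    uint32_t flags;
    uint32_t blocknr_high;
    uint32_t checksum;
};

/* 64-bitness comes from the feature flag, never from a size comparison. */
bool journal_is_64bit(uint32_t features)
{
    return (features & FEATURE_64BIT) != 0;
}

/* Tag size is derived from the feature flags alone. */
size_t tag_bytes(uint32_t features)
{
    if (features & FEATURE_CSUM_V3)
        return sizeof(struct tag_v3);
    return journal_is_64bit(features) ? sizeof(struct tag64)
                                      : sizeof(struct tag32);
}
```

Under these assumed layouts, a 32-bit journal without checksums moves from an 8-byte tag to a 16-byte tag when v3 is enabled, which is the conversion whose overhead the commit quantifies.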
  12. 28 Aug, 2014 2 commits
  13. 26 Aug, 2014 1 commit
    • R
      mtd: nand: omap: Revert to using software ECC by default · 7d5929c1
      Roger Quadros committed
      For v3.12 and prior, 1-bit Hamming code ECC via software was the
      default choice. Commit c66d0391 in v3.13 changed the behaviour
      to use 1-bit Hamming code via Hardware using a different ECC layout
      i.e. (ROM code layout) than what is used by software ECC.
      
      This ECC layout change causes NAND filesystems created in v3.12
      and prior to be unusable in v3.13 and later. So revert to
      using software ECC by default if an ECC scheme is not explicitly
      specified.
      
      This defect can be observed on the following boards during legacy boot
      
      -omap3beagle
      -omap3touchbook
      -overo
      -am3517crane
      -devkit8000
      -ldp
      -3430sdp
      Signed-off-by: Roger Quadros <rogerq@ti.com>
      Tested-by: Grazvydas Ignotas <notasas@gmail.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      7d5929c1
  14. 25 Aug, 2014 1 commit
  15. 23 Aug, 2014 3 commits
    • W
      nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975
      Weston Andros Adamson committed
      This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
      If the group is already locked and blocking is allowed, drop the inode lock
      and wait for the group lock to be cleared before trying it all again.
      This should fix warnings found in peterz's tree (sched/wait branch), where
      might_sleep() checks are added to wait.[ch].
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      7c3af975
    • C
      f2fs: use macro for code readability · b5b82205
      Chao Yu committed
      This patch introduces the DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS
      macros to replace bare numbers in the code, for readability.
      
      change log from v1:
       o fix typo pointed out by Jaegeuk Kim.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      b5b82205
    • S
      ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) committed
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users use the function tracing ftrace_ops,
      this causes a conflict: when a user enables both function profiling
      and the function tracer, it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      33b7f99c
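The layout described in the commit above, an embedded local hash plus a pointer that normally targets it, can be sketched in isolation. The struct and field names (`ftrace_ops_hash`, `filter_hash`, `notrace_hash`, `func_hash`, `local_hash`) come from the commit text; the reduced struct contents and the helpers `ops_init()` and `ops_share_hashes()` are hypothetical and only illustrate the pointer arrangement.

```c
#include <stddef.h>

struct ftrace_hash;                     /* opaque here; contents not needed */

/* Holds the two hashes that select which functions are traced. */
struct ftrace_ops_hash {
    struct ftrace_hash *notrace_hash;
    struct ftrace_hash *filter_hash;
};

/* Reduced ops structure: embedded hashes plus a pointer to the active ones. */
struct ftrace_ops {
    struct ftrace_ops_hash local_hash;  /* private storage */
    struct ftrace_ops_hash *func_hash;  /* what the ops actually uses */
};

/* Hypothetical helper: default case, the ops uses its own embedded hashes. */
void ops_init(struct ftrace_ops *ops)
{
    ops->local_hash.notrace_hash = NULL;
    ops->local_hash.filter_hash = NULL;
    ops->func_hash = &ops->local_hash;
}

/* Hypothetical helper: make one ops use the hashes owned by another. */
void ops_share_hashes(struct ftrace_ops *ops, struct ftrace_ops *owner)
{
    ops->func_hash = owner->func_hash;
}
```

Embedding `local_hash` avoids a separate allocation in the common unshared case, while the pointer indirection lets two distinct ops, such as the function and function_graph tracers, follow the same filter selections.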
  16. 22 Aug, 2014 1 commit
  17. 21 Aug, 2014 1 commit
  18. 20 Aug, 2014 1 commit
  19. 19 Aug, 2014 1 commit
  20. 17 Aug, 2014 1 commit
  21. 16 Aug, 2014 1 commit
  22. 15 Aug, 2014 3 commits
  23. 13 Aug, 2014 2 commits
  24. 12 Aug, 2014 1 commit
    • V
      net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich committed
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is built only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configured.  The host the VM is running on
      does not have VLAN support and the outgoing interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I only have access to the TG3 hardware in this configuration and thus have
      only tested this with the TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffer from a similar issue.
      
      The patch attempts to fix this another way.  It moves the vlan header
      stripping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualization host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d5501c1
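What "untagging" means in the commit above can be shown on a raw Ethernet frame in user space. This is a sketch, not the kernel's skb-based code: `strip_vlan_tag()` is a hypothetical helper, and the constants are defined locally with the standard 802.1Q values. It moves the tag control information out of the frame into a separate variable, which is the form the drivers discussed in the commit expect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6        /* bytes per MAC address */
#define TPID_8021Q     0x8100   /* EtherType that marks an 802.1Q tag */
#define VLAN_TAG_LEN   4        /* TPID (2 bytes) + TCI (2 bytes) */
#define VLAN_VID_BITS  0x0fff   /* low 12 bits of the TCI hold the VLAN ID */

/*
 * Hypothetical helper: if the frame carries an 802.1Q tag, copy its TCI to
 * *tci, remove the 4 tag bytes in place, shrink *len, and return 1.
 * Returns 0 and leaves everything untouched for untagged or short frames.
 */
int strip_vlan_tag(uint8_t *frame, size_t *len, uint16_t *tci)
{
    const size_t type_off = 2 * MAC_LEN;    /* after dst + src MAC */
    uint16_t ethertype;

    if (*len < type_off + VLAN_TAG_LEN + 2)
        return 0;

    ethertype = (uint16_t)((frame[type_off] << 8) | frame[type_off + 1]);
    if (ethertype != TPID_8021Q)
        return 0;

    *tci = (uint16_t)((frame[type_off + 2] << 8) | frame[type_off + 3]);

    /* Slide the inner EtherType and payload over the tag. */
    memmove(frame + type_off, frame + type_off + VLAN_TAG_LEN,
            *len - type_off - VLAN_TAG_LEN);
    *len -= VLAN_TAG_LEN;
    return 1;
}
```

For a frame tagged with VLAN 100, as in the trace above, the call would leave the inner IPv4 EtherType directly after the MAC addresses and report `tci & VLAN_VID_BITS` as 100.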
  25. 11 Aug, 2014 1 commit
  26. 09 Aug, 2014 2 commits
    • V
      kexec: verify the signature of signed PE bzImage · 8e7d8381
      Vivek Goyal committed
      This is the final piece of the puzzle of verifying kernel image signature
      during kexec_file_load() syscall.
      
      This patch calls into PE file routines to verify the signature of bzImage.  If
      the signature is valid, kexec_file_load() succeeds; otherwise it fails.
      
      Two new config options have been introduced.  First one is
      CONFIG_KEXEC_VERIFY_SIG.  This option enforces that kernel has to be
      validly signed otherwise kernel load will fail.  If this option is not
      set, no signature verification will be done.  Only exception will be when
      secureboot is enabled.  In that case signature verification should be
      automatically enforced when secureboot is enabled.  But that will happen
      when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      
      I tested these patches with both "pesign" and "sbsign" signed bzImages.
      
      I used signing_key.priv key and signing_key.x509 cert for signing as
      generated during kernel build process (if module signing is enabled).
      
      Used following method to sign bzImage.
      
      pesign
      ======
      - Convert DER format cert to PEM format cert
      openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM
      
      - Generate a .p12 file from existing cert and private key file
      openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM
      
      - Import .p12 file into pesign db
      pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign
      
      - Sign bzImage
      pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s
      
      sbsign
      ======
      sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+
      
      Patch details:
      
      Well, all the hard work is done in previous patches.  Now the bzImage loader
      just has to call into that code and verify whether the bzImage signature is
      valid or not.
      
      Also create two config options.  First one is CONFIG_KEXEC_VERIFY_SIG.
      This option enforces that kernel has to be validly signed otherwise kernel
      load will fail.  If this option is not set, no signature verification will
      be done.  Only exception will be when secureboot is enabled.  In that case
      signature verification should be automatically enforced when secureboot is
      enabled.  But that will happen when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e7d8381
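The interaction of the two config options described in the commit above reduces to a small decision function. The sketch below models only what the commit message states; `kexec_sig_policy()` and the `load_result` enum are hypothetical names, the options are passed as runtime booleans purely for illustration, and the secure boot exception is left out because the message says that part was not yet merged.

```c
#include <stdbool.h>

enum load_result { LOAD_OK, LOAD_REJECTED };

/*
 * verify_sig         : stands for CONFIG_KEXEC_VERIFY_SIG being set
 * bzimage_verify_sig : stands for CONFIG_KEXEC_BZIMAGE_VERIFY_SIG being set
 * signature_valid    : outcome of the PE signature check, if it can run
 */
enum load_result kexec_sig_policy(bool verify_sig, bool bzimage_verify_sig,
                                  bool signature_valid)
{
    /* Enforcement off: no signature verification is done at all. */
    if (!verify_sig)
        return LOAD_OK;

    /* Enforcement on but no bzImage verifier built in: loading must fail. */
    if (!bzimage_verify_sig)
        return LOAD_REJECTED;

    /* Both on: the load succeeds only for a validly signed image. */
    return signature_valid ? LOAD_OK : LOAD_REJECTED;
}
```

The middle branch is the case the commit calls out explicitly: with enforcement enabled and bzImage verification support disabled, every bzImage load is rejected regardless of how the image was signed.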
    • V
      kexec: support kexec/kdump on EFI systems · 6a2c20e7
      Vivek Goyal committed
      This patch does two things.  It passes EFI run time mappings to the second
      kernel in bootparams efi_info.  The second kernel parses this info and creates
      new mappings in the second kernel.  That means mappings in the first and second
      kernel will be the same.  This paves the way to enable EFI in the kexec kernel.
      
      This patch also prepares and passes EFI setup data through bootparams.
      This contains a bunch of information about various tables and their
      addresses.
      
      This information gathering and passing has been written along the lines
      of what current kexec-tools is doing to make kexec work with UEFI.
      
      [akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt]
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a2c20e7