1. 06 11月, 2013 1 次提交
    • T
      tracing: Update event filters for multibuffer · f306cc82
      Tom Zanussi 提交于
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f306cc82
  2. 19 10月, 2013 1 次提交
    • N
      ftrace: Add set_graph_notrace filter · 29ad23b0
      Namhyung Kim 提交于
      The set_graph_notrace filter is analogous to set_ftrace_notrace and
      can be used for eliminating uninteresting part of function graph trace
      output.  It also works with set_graph_function nicely.
      
        # cd /sys/kernel/debug/tracing/
        # echo do_page_fault > set_graph_function
        # perf ftrace live true
         2)               |  do_page_fault() {
         2)               |    __do_page_fault() {
         2)   0.381 us    |      down_read_trylock();
         2)   0.055 us    |      __might_sleep();
         2)   0.696 us    |      find_vma();
         2)               |      handle_mm_fault() {
         2)               |        handle_pte_fault() {
         2)               |          __do_fault() {
         2)               |            filemap_fault() {
         2)               |              find_get_page() {
         2)   0.033 us    |                __rcu_read_lock();
         2)   0.035 us    |                __rcu_read_unlock();
         2)   1.696 us    |              }
         2)   0.031 us    |              __might_sleep();
         2)   2.831 us    |            }
         2)               |            _raw_spin_lock() {
         2)   0.046 us    |              add_preempt_count();
         2)   0.841 us    |            }
         2)   0.033 us    |            page_add_file_rmap();
         2)               |            _raw_spin_unlock() {
         2)   0.057 us    |              sub_preempt_count();
         2)   0.568 us    |            }
         2)               |            unlock_page() {
         2)   0.084 us    |              page_waitqueue();
         2)   0.126 us    |              __wake_up_bit();
         2)   1.117 us    |            }
         2)   7.729 us    |          }
         2)   8.397 us    |        }
         2)   8.956 us    |      }
         2)   0.085 us    |      up_read();
         2) + 12.745 us   |    }
         2) + 13.401 us   |  }
        ...
      
        # echo handle_mm_fault > set_graph_notrace
        # perf ftrace live true
         1)               |  do_page_fault() {
         1)               |    __do_page_fault() {
         1)   0.205 us    |      down_read_trylock();
         1)   0.041 us    |      __might_sleep();
         1)   0.344 us    |      find_vma();
         1)   0.069 us    |      up_read();
         1)   4.692 us    |    }
         1)   5.311 us    |  }
        ...
      
      Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.orgSigned-off-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      29ad23b0
  3. 03 10月, 2013 1 次提交
  4. 01 10月, 2013 4 次提交
    • N
      skbuff: size of hole is wrong in a comment · 45906723
      Nicolas Dichtel 提交于
      Since commit c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC
      reserves"), hole size is one bit less than what is written in the comment.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45906723
    • R
      mm: avoid reinserting isolated balloon pages into LRU lists · 117aad1e
      Rafael Aquini 提交于
      Isolated balloon pages can wrongly end up in LRU lists when
      migrate_pages() finishes its round without draining all the isolated
      page list.
      
      The same issue can happen when reclaim_clean_pages_from_list() tries to
      reclaim pages from an isolated page list, before migration, in the CMA
      path.  Such balloon page leak opens a race window against LRU lists
      shrinkers that leads us to the following kernel panic:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
        IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
        PGD 3cda2067 PUD 3d713067 PMD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        RIP: shrink_page_list+0x24e/0x897
        RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
        RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
        RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
        R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
        R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
        FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
        Call Trace:
          shrink_inactive_list+0x240/0x3de
          shrink_lruvec+0x3e0/0x566
          __shrink_zone+0x94/0x178
          shrink_zone+0x3a/0x82
          balance_pgdat+0x32a/0x4c2
          kswapd+0x2f0/0x372
          kthread+0xa2/0xaa
          ret_from_fork+0x7c/0xb0
        Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
        RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
         RSP <ffff88003da499b8>
        CR2: 0000000000000028
        ---[ end trace 703d2451af6ffbfd ]---
        Kernel panic - not syncing: Fatal exception
      
      This patch fixes the issue, by assuring the proper tests are made at
      putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
      isolated balloon pages being wrongly reinserted in LRU lists.
      
      [akpm@linux-foundation.org: clarify awkward comment text]
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Reported-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Tested-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      117aad1e
    • A
      include/asm-generic/vtime.h: avoid zero-length file · 2a156a6b
      Andrew Morton 提交于
      patch(1) can't handle zero-length files - it appears to simply not create
      the file, so my powerpc build fails.
      
      Put something in here to make life easier.
      
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a156a6b
    • P
      vxlan: Use RCU apis to access sk_user_data. · 559835ea
      Pravin B Shelar 提交于
      Use of RCU api makes vxlan code easier to understand.  It also
      fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
      In rare case without ACCESS_ONCE() compiler might omit vs on
      sk_user_data dereference.
      Compiler can use vs as alias for sk->sk_user_data, resulting in
      multiple sk_user_data dereference in rcu read context which
      could change.
      
      CC: Jesse Gross <jesse@nicira.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      559835ea
  5. 30 9月, 2013 1 次提交
  6. 29 9月, 2013 4 次提交
  7. 28 9月, 2013 1 次提交
  8. 27 9月, 2013 2 次提交
    • K
      Drivers: hv: util: Correctly support ws2008R2 and earlier · 3a491605
      K. Y. Srinivasan 提交于
      The current code does not correctly negotiate the version numbers for the util
      driver when hosted on earlier hosts. The version numbers presented by this
      driver were not compatible with the version numbers supported by Windows Server
      2008. Fix this problem.
      
      I would like to thank Olaf Hering (ohering@suse.com) for identifying the problem.
      Reported-by: NOlaf Hering <ohering@suse.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a491605
    • A
      bcma: make bcma_core_pci_{up,down}() callable from atomic context · 2bedea8f
      Arend van Spriel 提交于
      This patch removes the bcma_core_pci_power_save() call from
      the bcma_core_pci_{up,down}() functions as it tries to schedule
      thus requiring to call them from non-atomic context. The function
      bcma_core_pci_power_save() is now exported so the calling module
      can explicitly use it in non-atomic context. This fixes the
      'scheduling while atomic' issue reported by Tod Jackson and
      Joe Perches.
      
      [   13.210710] BUG: scheduling while atomic: dhcpcd/1800/0x00000202
      [   13.210718] Modules linked in: brcmsmac nouveau coretemp kvm_intel kvm cordic brcmutil bcma dell_wmi atl1c ttm mxm_wmi wmi
      [   13.210756] CPU: 2 PID: 1800 Comm: dhcpcd Not tainted 3.11.0-wl #1
      [   13.210762] Hardware name: Alienware M11x R2/M11x R2, BIOS A04 11/23/2010
      [   13.210767]  ffff880177c92c40 ffff880170fd1948 ffffffff8169af5b 0000000000000007
      [   13.210777]  ffff880170fd1ab0 ffff880170fd1958 ffffffff81697ee2 ffff880170fd19d8
      [   13.210785]  ffffffff816a19f5 00000000000f4240 000000000000d080 ffff880170fd1fd8
      [   13.210794] Call Trace:
      [   13.210813]  [<ffffffff8169af5b>] dump_stack+0x4f/0x84
      [   13.210826]  [<ffffffff81697ee2>] __schedule_bug+0x43/0x51
      [   13.210837]  [<ffffffff816a19f5>] __schedule+0x6e5/0x810
      [   13.210845]  [<ffffffff816a1c34>] schedule+0x24/0x70
      [   13.210855]  [<ffffffff816a04fc>] schedule_hrtimeout_range_clock+0x10c/0x150
      [   13.210867]  [<ffffffff810684e0>] ? update_rmtp+0x60/0x60
      [   13.210877]  [<ffffffff8106915f>] ? hrtimer_start_range_ns+0xf/0x20
      [   13.210887]  [<ffffffff816a054e>] schedule_hrtimeout_range+0xe/0x10
      [   13.210897]  [<ffffffff8104f6fb>] usleep_range+0x3b/0x40
      [   13.210910]  [<ffffffffa00371af>] bcma_pcie_mdio_set_phy.isra.3+0x4f/0x80 [bcma]
      [   13.210921]  [<ffffffffa003729f>] bcma_pcie_mdio_write.isra.4+0xbf/0xd0 [bcma]
      [   13.210932]  [<ffffffffa0037498>] bcma_pcie_mdio_writeread.isra.6.constprop.13+0x18/0x30 [bcma]
      [   13.210942]  [<ffffffffa00374ee>] bcma_core_pci_power_save+0x3e/0x80 [bcma]
      [   13.210953]  [<ffffffffa003765d>] bcma_core_pci_up+0x2d/0x60 [bcma]
      [   13.210975]  [<ffffffffa03dc17c>] brcms_c_up+0xfc/0x430 [brcmsmac]
      [   13.210989]  [<ffffffffa03d1a7d>] brcms_up+0x1d/0x20 [brcmsmac]
      [   13.211003]  [<ffffffffa03d2498>] brcms_ops_start+0x298/0x340 [brcmsmac]
      [   13.211020]  [<ffffffff81600a12>] ? cfg80211_netdev_notifier_call+0xd2/0x5f0
      [   13.211030]  [<ffffffff815fa53d>] ? packet_notifier+0xad/0x1d0
      [   13.211064]  [<ffffffff81656e75>] ieee80211_do_open+0x325/0xf80
      [   13.211076]  [<ffffffff8106ac09>] ? __raw_notifier_call_chain+0x9/0x10
      [   13.211086]  [<ffffffff81657b41>] ieee80211_open+0x71/0x80
      [   13.211101]  [<ffffffff81526267>] __dev_open+0x87/0xe0
      [   13.211109]  [<ffffffff8152650c>] __dev_change_flags+0x9c/0x180
      [   13.211117]  [<ffffffff815266a3>] dev_change_flags+0x23/0x70
      [   13.211127]  [<ffffffff8158cd68>] devinet_ioctl+0x5b8/0x6a0
      [   13.211136]  [<ffffffff8158d5c5>] inet_ioctl+0x75/0x90
      [   13.211147]  [<ffffffff8150b38b>] sock_do_ioctl+0x2b/0x70
      [   13.211155]  [<ffffffff8150b681>] sock_ioctl+0x71/0x2a0
      [   13.211169]  [<ffffffff8114ed47>] do_vfs_ioctl+0x87/0x520
      [   13.211180]  [<ffffffff8113f159>] ? ____fput+0x9/0x10
      [   13.211198]  [<ffffffff8106228c>] ? task_work_run+0x9c/0xd0
      [   13.211202]  [<ffffffff8114f271>] SyS_ioctl+0x91/0xb0
      [   13.211208]  [<ffffffff816aa252>] system_call_fastpath+0x16/0x1b
      [   13.211217] NOHZ: local_softirq_pending 202
      
      The issue was introduced in v3.11 kernel by following commit:
      
      commit aa51e598
      Author: Hauke Mehrtens <hauke@hauke-m.de>
      Date:   Sat Aug 24 00:32:31 2013 +0200
      
          brcmsmac: use bcma PCIe up and down functions
      
          replace the calls to bcma_core_pci_extend_L1timer() by calls to the
          newly introduced bcma_core_pci_ip() and bcma_core_pci_down()
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
          Cc: Arend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      
      This fix has been discussed with Hauke Mehrtens [1] selection
      option 3) and is intended for v3.12.
      
      Ref:
      [1] http://mid.gmane.org/5239B12D.3040206@hauke-m.de
      
      Cc: <stable@vger.kernel.org> # 3.11.x
      Cc: Tod Jackson <tod.jackson@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Reviewed-by: NHante Meuleman <meuleman@broadcom.com>
      Signed-off-by: NArend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      2bedea8f
  9. 26 9月, 2013 1 次提交
  10. 25 9月, 2013 7 次提交
    • P
      rcu: Consistent rcu_is_watching() naming · 5c173eb8
      Paul E. McKenney 提交于
      The old rcu_is_cpu_idle() function is just __rcu_is_watching() with
      preemption disabled.  This commit therefore renames rcu_is_cpu_idle()
      to rcu_is_watching.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      5c173eb8
    • P
      rcu: Is it safe to enter an RCU read-side critical section? · cc6783f7
      Paul E. McKenney 提交于
      There is currently no way for kernel code to determine whether it
      is safe to enter an RCU read-side critical section, in other words,
      whether or not RCU is paying attention to the currently running CPU.
      Given the large and increasing quantity of code shared by the idle loop
      and non-idle code, the this shortcoming is becoming increasingly painful.
      
      This commit therefore adds __rcu_is_watching(), which returns true if
      it is safe to enter an RCU read-side critical section on the currently
      running CPU.  This function is quite fast, using only a __this_cpu_read().
      However, the caller must disable preemption.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      cc6783f7
    • R
      of: clean-up ifdefs in of_irq.h · b0b8c960
      Rob Herring 提交于
      Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
      amount ifdef'ed code. This fixes some  build warnings when CONFIG_OF
      is not enabled (seen on i386 and x86_64):
      
      include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
      include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]
      
      Compile tested on i386, sparc and arm.
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: NRob Herring <rob.herring@calxeda.com>
      b0b8c960
    • A
      revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code" · 0608f43d
      Andrew Morton 提交于
      Revert commit 3b38722e ("memcg, vmscan: integrate soft reclaim
      tighter with zone shrinking code")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0608f43d
    • A
      revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim" · b1aff7fc
      Andrew Morton 提交于
      Revert commit a5b7c87f ("vmscan, memcg: do softlimit reclaim also
      for targeted reclaim")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1aff7fc
    • A
      revert "memcg: enhance memcg iterator to support predicates" · 694fbc0f
      Andrew Morton 提交于
      Revert commit de57780d ("memcg: enhance memcg iterator to support
      predicates")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      694fbc0f
    • M
      watchdog: update watchdog_thresh properly · 9809b18f
      Michal Hocko 提交于
      watchdog_tresh controls how often nmi perf event counter checks per-cpu
      hrtimer_interrupts counter and blows up if the counter hasn't changed
      since the last check.  The counter is updated by per-cpu
      watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
      period which guarantees that hrtimer is scheduled 2 times per the main
      period.  Both hrtimer and perf event are started together when the
      watchdog is enabled.
      
      So far so good.  But...
      
      But what happens when watchdog_thresh is updated from sysctl handler?
      
      proc_dowatchdog will set a new sampling period and hrtimer callback
      (watchdog_timer_fn) will use the new value in the next round.  The
      problem, however, is that nobody tells the perf event that the sampling
      period has changed so it is ticking with the period configured when it
      has been set up.
      
      This might result in an ear ripping dissonance between perf and hrtimer
      parts if the watchdog_thresh is increased.  And even worse it might lead
      to KABOOM if the watchdog is configured to panic on such a spurious
      lockup.
      
      This patch fixes the issue by updating both nmi perf even counter and
      hrtimers if the threshold value has changed.
      
      The nmi one is disabled and then reinitialized from scratch.  This has
      an unpleasant side effect that the allocation of the new event might
      fail theoretically so the hard lockup detector would be disabled for
      such cpus.  On the other hand such a memory allocation failure is very
      unlikely because the original event is deallocated right before.
      
      It would be much nicer if we just changed perf event period but there
      doesn't seem to be any API to do that right now.  It is also unfortunate
      that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
      cannot use on_each_cpu() and do the same thing from the per-cpu context.
      The update from the current CPU should be safe because
      perf_event_disable removes the event atomically before it clears the
      per-cpu watchdog_ev so it cannot change anything under running handler
      feet.
      
      The hrtimer is simply restarted (thanks to Don Zickus who has pointed
      this out) if it is queued because we cannot rely it will fire&adopt to
      the new sampling period before a new nmi event triggers (when the
      treshold is decreased).
      
      [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Fabio Estevam <festevam@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9809b18f
  11. 24 9月, 2013 2 次提交
  12. 22 9月, 2013 1 次提交
  13. 21 9月, 2013 3 次提交
  14. 20 9月, 2013 5 次提交
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer 提交于
      Workaround the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same which results in
      'max_write_same_sectors' in device's queue_limits to be set to 0).
      
      When a device is stacked ontop of such a SCSI device any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked ontop of the mpath device (e.g. LVM created linear DM devices
      ontop).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
      f84cb8a4
    • P
      perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' · fa731587
      Peter Zijlstra 提交于
      Solve the problems around the broken definition of perf_event_mmap_page::
      cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
      fixed by:
      
        860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")
      
      The problem with the fix (merged in v3.12-rc1 and not yet released
      officially), noticed by Vince Weaver is that the new behavior is
      not detectable by new user-space, and that due to the reuse of the
      field names it's easy to mis-compile a binary if old headers are used
      on a new kernel or new headers are used on an old kernel.
      
      To solve all that make this change explicit, detectable and self-contained,
      by iterating the ABI the following way:
      
       - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
         confuse old user-space binaries. RDPMC will be marked as unavailable
         to old binaries but that's within the ABI, this is a capability bit.
      
       - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
         libraries can reliably detect that bit 0 is deprecated and perma-zero
         without having to check the kernel version.
      
       - Use bits 2, 3, 4 for the newly defined, correct functionality:
      
      	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
      	cap_user_time		: 1, /* The time_* fields are used */
      	cap_user_time_zero	: 1, /* The time_zero field is used */
      
       - Rename all the bitfield names in perf_event.h to be different from the
         old names, to make sure it's not possible to mis-compile it
         accidentally with old assumptions.
      
      The 'size' field can then be used in the future to add new fields and it
      will act as a natural ABI version indicator as well.
      
      Also adjust tools/perf/ userspace for the new definitions, noticed by
      Adrian Hunter.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Also-Fixed-by: NAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa731587
    • P
      perf: Update ABI comment · c5ecceef
      Peter Zijlstra 提交于
      For some mysterious reason the sample_id field of PERF_RECORD_MMAP went AWOL.
      Reported-by: NVince Weaver <vince@deater.net>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c5ecceef
    • D
      Revert "drm: mark context support as a legacy subsystem" · c21eb21c
      Dave Airlie 提交于
      This reverts commit 7c510133.
      
      Well looks like not enough digging was done, libdrm_nouveau before 2.4.33
      used contexts,
      
      292da616fe1f936ca78a3fa8e1b1b19883e343b6 nouveau: pull in major libdrm rewrite
      
      got rid of them,
      Reported-by: NPaul Zimmerman <Paul.Zimmerman@synopsys.com>
      Reported-by: NMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      c21eb21c
    • A
      ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Ansis Atteka 提交于
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have local_df bit set, then all IP fragments that belonged
      to different ESP datagrams would have used the same identificator.
      If one of these IP fragments would get lost or reordered, then
      peer could possibly stitch together wrong IP fragments that did
      not belong to the same datagram. This would lead to a packet loss
      or data corruption.
      Signed-off-by: NAnsis Atteka <aatteka@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      703133de
  15. 19 9月, 2013 3 次提交
    • J
      ipvs: make the service replacement more robust · bcbde4c0
      Julian Anastasov 提交于
      commit 578bc3ef ("ipvs: reorganize dest trash") added
      IP_VS_DEST_STATE_REMOVING flag and RCU callback named
      ip_vs_dest_wait_readers() to keep dests and services after
      removal for at least a RCU grace period. But we have the
      following corner cases:
      
      - we can not reuse the same dest if its service is removed
      while IP_VS_DEST_STATE_REMOVING is still set because another dest
      removal in the first grace period can not extend this period.
      It can happen when ipvsadm -C && ipvsadm -R is used.
      
      - dest->svc can be replaced but ip_vs_in_stats() and
      ip_vs_out_stats() have no explicit read memory barriers
      when accessing dest->svc. It can happen that dest->svc
      was just freed (replaced) while we use it to update
      the stats.
      
      We solve the problems as follows:
      
      - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
      idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
      will remember when for first time after deletion we noticed
      dest->refcnt=0. Later, the connections can grab a reference
      while in RCU grace period but if refcnt becomes 0 we can
      safely free the dest and its svc.
      
      - dest->svc becomes RCU pointer. As result, we add explicit
      RCU locking in ip_vs_in_stats() and ip_vs_out_stats().
      
      - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it
      now can free the service immediately or after a RCU grace
      period. dest->svc is not set to NULL anymore.
      
      	As result, unlinked dests and their services are
      freed always after IP_VS_DEST_TRASH_PERIOD period, unused
      services are freed after a RCU grace period.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      bcbde4c0
    • S
      ipvs: fix overflow on dest weight multiply · c16526a7
      Simon Kirby 提交于
      Schedulers such as lblc and lblcr require the weight to be as high as the
      maximum number of active connections. In commit b552f7e3
      ("ipvs: unify the formula to estimate the overhead of processing
      connections"), the consideration of inactconns and activeconns was cleaned
      up to always count activeconns as 256 times more important than inactconns.
      In cases where 3000 or more connections are expected, a weight of 3000 *
      256 * 3000 connections overflows the 32-bit signed result used to determine
      if rescheduling is required.
      
      On amd64, this merely changes the multiply and comparison instructions to
      64-bit. On x86, a 64-bit result is already present from imull, so only
      a few more comparison instructions are emitted.
      Signed-off-by: NSimon Kirby <sim@hostway.ca>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      c16526a7
    • J
      Bluetooth: Introduce a new HCI_RFKILLED flag · 5e130367
      Johan Hedberg 提交于
      This makes it more convenient to check for rfkill (no need to check for
      dev->rfkill before calling rfkill_blocked()) and also avoids potential
      races if the RFKILL state needs to be checked from within the rfkill
      callback.
      Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
      Cc: stable@vger.kernel.org
      Acked-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      5e130367
  16. 18 9月, 2013 1 次提交
  17. 17 9月, 2013 2 次提交