1. 07 12月, 2013 4 次提交
    • E
      tcp: auto corking · f54b3111
      Eric Dumazet 提交于
      With the introduction of TCP Small Queues, TSO auto sizing, and TCP
      pacing, we can implement Automatic Corking in the kernel, to help
      applications doing small write()/sendmsg() to TCP sockets.
      
      Idea is to change tcp_push() to check if the current skb payload is
      under skb optimal size (a multiple of MSS bytes)
      
      If under 'size_goal', and at least one packet is still in Qdisc or
      NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
      will be delayed up to TX completion time.
      
      This delay might allow the application to coalesce more bytes
      in the skb in following write()/sendmsg()/sendfile() system calls.
      
      The exact duration of the delay is depending on the dynamics
      of the system, and might be zero if no packet for this flow
      is actually held in Qdisc or NIC TX ring.
      
      Using FQ/pacing is a way to increase the probability of
      autocorking being triggered.
      
      Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
      this feature and default it to 1 (enabled)
      
      Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
      This counter is incremented every time we detected skb was under used
      and its flush was deferred.
      
      Tested:
      
      Interesting effects when using line buffered commands under ssh.
      
      Excellent performance results in term of cpu usage and total throughput.
      
      lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
      lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
      9410.39
      
       Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      
            35209.439626 task-clock                #    2.901 CPUs utilized
                   2,294 context-switches          #    0.065 K/sec
                     101 CPU-migrations            #    0.003 K/sec
                   4,079 page-faults               #    0.116 K/sec
          97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
          51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
          25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
         102,225,978,536 instructions              #    1.04  insns per cycle
                                                   #    0.51  stalled cycles per insn [83.38%]
          18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
              91,679,646 branch-misses             #    0.49% of all branches         [83.40%]
      
            12.136204899 seconds time elapsed
      
      lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
      lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
      6624.89
      
       Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
            40045.864494 task-clock                #    3.301 CPUs utilized
                     171 context-switches          #    0.004 K/sec
                      53 CPU-migrations            #    0.001 K/sec
                   4,080 page-faults               #    0.102 K/sec
         111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
          61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
          29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
         108,654,349,355 instructions              #    0.98  insns per cycle
                                                   #    0.57  stalled cycles per insn [83.34%]
          19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
             157,875,417 branch-misses             #    0.81% of all branches         [83.34%]
      
            12.130267788 seconds time elapsed
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f54b3111
    • J
      netfilter: Fix FSF address in file headers · e664eabd
      Jeff Kirsher 提交于
      Several files refer to an old address for the Free Software Foundation
      in the file header comment.  Resolve by replacing the address with
      the URL <http://www.gnu.org/licenses/> so that we do not have to keep
      updating the header comments anytime the address changes.
      
      CC: netfilter@vger.kernel.org
      CC: Pablo Neira Ayuso <pablo@netfilter.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e664eabd
    • J
      include/net/: Fix FSF address in file headers · a6227e26
      Jeff Kirsher 提交于
      Several files refer to an old address for the Free Software Foundation
      in the file header comment.  Resolve by replacing the address with
      the URL <http://www.gnu.org/licenses/> so that we do not have to keep
      updating the header comments anytime the address changes.
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6227e26
    • J
      sctp: Fix FSF address in file headers · 4b2f13a2
      Jeff Kirsher 提交于
      Several files refer to an old address for the Free Software Foundation
      in the file header comment.  Resolve by replacing the address with
      the URL <http://www.gnu.org/licenses/> so that we do not have to keep
      updating the header comments anytime the address changes.
      
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b2f13a2
  2. 03 12月, 2013 2 次提交
  3. 01 12月, 2013 1 次提交
  4. 29 11月, 2013 3 次提交
  5. 26 11月, 2013 2 次提交
    • S
      tracing: Allow events to have NULL strings · 4e58e547
      Steven Rostedt (Red Hat) 提交于
      If an TRACE_EVENT() uses __assign_str() or __get_str on a NULL pointer
      then the following oops will happen:
      
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      IP: [<c127a17b>] strlen+0x10/0x1a
      *pde = 00000000 ^M
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc1-test+ #2
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006^M
      task: f5cde9f0 ti: f5e5e000 task.ti: f5e5e000
      EIP: 0060:[<c127a17b>] EFLAGS: 00210046 CPU: 1
      EIP is at strlen+0x10/0x1a
      EAX: 00000000 EBX: c2472da8 ECX: ffffffff EDX: c2472da8
      ESI: c1c5e5fc EDI: 00000000 EBP: f5e5fe84 ESP: f5e5fe80
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      CR0: 8005003b CR2: 00000000 CR3: 01f32000 CR4: 000007d0
      Stack:
       f5f18b90 f5e5feb8 c10687a8 0759004f 00000005 00000005 00000005 00200046
       00000002 00000000 c1082a93 f56c7e28 c2472da8 c1082a93 f5e5fee4 c106bc61^M
       00000000 c1082a93 00000000 00000000 00000001 00200046 00200082 00000000
      Call Trace:
       [<c10687a8>] ftrace_raw_event_lock+0x39/0xc0
       [<c1082a93>] ? ktime_get+0x29/0x69
       [<c1082a93>] ? ktime_get+0x29/0x69
       [<c106bc61>] lock_release+0x57/0x1a5
       [<c1082a93>] ? ktime_get+0x29/0x69
       [<c10824dd>] read_seqcount_begin.constprop.7+0x4d/0x75
       [<c1082a93>] ? ktime_get+0x29/0x69^M
       [<c1082a93>] ktime_get+0x29/0x69
       [<c108a46a>] __tick_nohz_idle_enter+0x1e/0x426
       [<c10690e8>] ? lock_release_holdtime.part.19+0x48/0x4d
       [<c10bc184>] ? time_hardirqs_off+0xe/0x28
       [<c1068c82>] ? trace_hardirqs_off_caller+0x3f/0xaf
       [<c108a8cb>] tick_nohz_idle_enter+0x59/0x62
       [<c1079242>] cpu_startup_entry+0x64/0x192
       [<c102299c>] start_secondary+0x277/0x27c
      Code: 90 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e 5d c3 55 89 e5 57 66 66 66 66 90 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 8d 41 ff 5f 5d c3 55 89 e5 57 66 66 66 66 90 31 ff
      EIP: [<c127a17b>] strlen+0x10/0x1a SS:ESP 0068:f5e5fe80
      CR2: 0000000000000000
      ---[ end trace 01bc47bf519ec1b2 ]---
      
      New tracepoints have been added that have allowed for NULL pointers
      being assigned to strings. To fix this, change the TRACE_EVENT() code
      to check for NULL and if it is, it will assign "(null)" to it instead
      (similar to what glibc printf does).
      Reported-by: NShuah Khan <shuah.kh@samsung.com>
      Reported-by: NJovi Zhangwei <jovi.zhangwei@gmail.com>
      Link: http://lkml.kernel.org/r/CAGdX0WFeEuy+DtpsJzyzn0343qEEjLX97+o1VREFkUEhndC+5Q@mail.gmail.com
      Link: http://lkml.kernel.org/r/528D6972.9010702@samsung.com
      Fixes: 9cbf1176 ("tracing/events: provide string with undefined size support")
      Cc: stable@vger.kernel.org # 2.6.31+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4e58e547
    • T
      ARM: tegra: Provide dummy powergate implementation · 9886e1fd
      Thierry Reding 提交于
      In order to support increased build test coverage for drivers, implement
      dummies for the powergate implementation. This will allow the drivers to
      be built without requiring support for Tegra to be selected.
      
      This patch solves the following build errors, which can be triggered in
      v3.13-rc1 by selecting DRM_TEGRA without ARCH_TEGRA:
      
      drivers/built-in.o: In function `gr3d_remove':
      drivers/gpu/drm/tegra/gr3d.c:321: undefined reference to `tegra_powergate_power_off'
      drivers/gpu/drm/tegra/gr3d.c:325: undefined reference to `tegra_powergate_power_off'
      drivers/built-in.o: In function `gr3d_probe':
      drivers/gpu/drm/tegra/gr3d.c:266: undefined reference to `tegra_powergate_sequence_power_up'
      drivers/gpu/drm/tegra/gr3d.c:273: undefined reference to `tegra_powergate_sequence_power_up'
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      [swarren, updated commit description]
      Signed-off-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      9886e1fd
  6. 25 11月, 2013 2 次提交
  7. 24 11月, 2013 1 次提交
  8. 22 11月, 2013 5 次提交
    • K
      mm: place page->pmd_huge_pte to right union · 7aa555bf
      Kirill A. Shutemov 提交于
      I don't know what went wrong, mis-merge or something, but ->pmd_huge_pte
      placed in wrong union within struct page.
      
      In original patch[1] it's placed to union with ->lru and ->slab, but in
      commit e009bb30 ("mm: implement split page table lock for PMD
      level") it's in union with ->index and ->freelist.
      
      That union seems also unused for pages with table tables and safe to
      re-use, but it's not what I've tested.
      
      Let's move it to original place.  It fixes indentation at least.  :)
      
      [1] https://lkml.org/lkml/2013/10/7/288Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7aa555bf
    • A
      mm: hugetlbfs: fix hugetlbfs optimization · 27c73ae7
      Andrea Arcangeli 提交于
      Commit 7cb2ef56 ("mm: fix aio performance regression for database
      caused by THP") can cause dereference of a dangling pointer if
      split_huge_page runs during PageHuge() if there are updates to the
      tail_page->private field.
      
      Also it is repeating compound_head twice for hugetlbfs and it is running
      compound_head+compound_trans_head for THP when a single one is needed in
      both cases.
      
      The new code within the PageSlab() check doesn't need to verify that the
      THP page size is never bigger than the smallest hugetlbfs page size, to
      avoid memory corruption.
      
      A longstanding theoretical race condition was found while fixing the
      above (see the change right after the skip_unlock label, that is
      relevant for the compound_lock path too).
      
      By re-establishing the _mapcount tail refcounting for all compound
      pages, this also fixes the below problem:
      
        echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
        BUG: Bad page state in process bash  pfn:59a01
        page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
        page flags: 0x1c00000000008000(tail)
        Modules linked in:
        CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x55/0x76
          bad_page+0xd5/0x130
          free_pages_prepare+0x213/0x280
          __free_pages+0x36/0x80
          update_and_free_page+0xc1/0xd0
          free_pool_huge_page+0xc2/0xe0
          set_max_huge_pages.part.58+0x14c/0x220
          nr_hugepages_store_common.isra.60+0xd0/0xf0
          nr_hugepages_store+0x13/0x20
          kobj_attr_store+0xf/0x20
          sysfs_write_file+0x189/0x1e0
          vfs_write+0xc5/0x1f0
          SyS_write+0x55/0xb0
          system_call_fastpath+0x16/0x1b
      Signed-off-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Tested-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27c73ae7
    • D
      mm: thp: give transparent hugepage code a separate copy_page · 30b0a105
      Dave Hansen 提交于
      Right now, the migration code in migrate_page_copy() uses copy_huge_page()
      for hugetlbfs and thp pages:
      
             if (PageHuge(page) || PageTransHuge(page))
                      copy_huge_page(newpage, page);
      
      So, yay for code reuse.  But:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
      
      and a non-hugetlbfs page has no page_hstate().  This works 99% of the
      time because page_hstate() determines the hstate from the page order
      alone.  Since the page order of a THP page matches the default hugetlbfs
      page order, it works.
      
      But, if you change the default huge page size on the boot command-line
      (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
      so page_hstate() returns null and copy_huge_page() oopses pretty fast
      since copy_huge_page() dereferences the hstate:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
              if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
        ...
      
      Mel noticed that the migration code is really the only user of these
      functions.  This moves all the copy code over to migrate.c and makes
      copy_huge_page() work for THP by checking for it explicitly.
      
      I believe the bug was introduced in commit b32967ff ("mm: numa: Add
      THP migration for the NUMA working set scanning fault case")
      
      [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30b0a105
    • J
      genetlink: fix genl_set_err() group ID · 91398a09
      Johannes Berg 提交于
      Fix another really stupid bug - I introduced genl_set_err()
      precisely to be able to adjust the group and reject invalid
      ones, but then forgot to do so.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91398a09
    • J
      genetlink: fix genlmsg_multicast() bug · 220815a9
      Johannes Berg 提交于
      Unfortunately, I introduced a tremendously stupid bug into
      genlmsg_multicast() when doing all those multicast group
      changes: it adjusts the group number, but then passes it
      to genlmsg_multicast_netns() which does that again.
      
      Somehow, my tests failed to catch this, so add a warning
      into genlmsg_multicast_netns() and remove the offending
      group ID adjustment.
      
      Also add a warning to the similar code in other functions
      so people who misuse them are more loudly warned.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      220815a9
  9. 21 11月, 2013 8 次提交
  10. 20 11月, 2013 12 次提交