1. 25 February 2016, 2 commits
    • kbuild: Add option to turn incompatible pointer check into error · ef50c046
      Daniel Wagner authored
      With the introduction of the simple wait API we have two very
      similar APIs in the kernel. For example, wake_up() and swake_up()
      are only one character apart. Although the compiler happily warns
      about the wrong usage, it keeps going and even links the kernel.
      Thomas and Peter would rather see such misuses reported as errors
      early on.
      
      In a first attempt we tried to wrap all swait and wait calls
      in a macro with a compile-time type assertion. The result
      was pretty ugly and wasn't able to catch all wrong usages.
      woken_wake_function(), autoremove_wake_function() and wake_bit_function()
      are assigned as function pointers, so wrapping them in a macro is
      not possible. Prefixing them with '_' was also not a real option
      because there are users in the kernel which use them directly as well.
      All in all, this attempt looked too intrusive and too ugly.
      
      An alternative is to turn the pointer type check into an error, which
      catches wrong type uses, and obviously not only the swait/wait ones.
      That isn't a bad thing either (a sketch of the effect follows this entry).
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-3-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ef50c046
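
      A minimal sketch of the kind of misuse this catches. It is a standalone
      userspace toy, not kernel code; the struct and function names only mirror
      the wait/swait API, and it assumes the kbuild option boils down to passing
      -Werror=incompatible-pointer-types (or an equivalent flag) to the compiler.

          /* gcc -c toy_swait_typo.c                                     -> warning only
           * gcc -c -Werror=incompatible-pointer-types toy_swait_typo.c  -> hard error */
          struct wait_queue_head  { int lock;     /* ... */ };
          struct swait_queue_head { int raw_lock; /* ... */ };

          static void wake_up(struct wait_queue_head *q)   { (void)q; }
          static void swake_up(struct swait_queue_head *q) { (void)q; }

          int main(void)
          {
                  struct swait_queue_head sq = { 0 };

                  /* One character off: wake_up() instead of swake_up(). Without
                   * the option this is only an "incompatible pointer type"
                   * warning and the object still compiles and links; with the
                   * option, the build stops here. */
                  wake_up(&sq);

                  swake_up(&sq);  /* the intended call */
                  return 0;
          }

      In the kernel build the flag would presumably be wrapped in cc-option (or
      guarded by a config option, per the commit title) so compilers that do not
      know the flag are not broken; that detail is an assumption here, not quoted
      from the patch.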
    • wait.[ch]: Introduce the simple waitqueue (swait) implementation · 13b35686
      Peter Zijlstra (Intel) authored
      The existing wait queue code supports custom wake-up callbacks,
      wake flags, a wake key (passed to the callback) and exclusive
      flags that allow waiters to be tagged as exclusive, to limit
      the number of tasks that get woken up.
      
      In a lot of cases, none of these features are used, and hence we
      can benefit from a slimmed down version that lowers memory overhead
      and reduces runtime overhead.
      
      The concept originated from -rt, where waitqueues are a constant
      source of trouble, as we can't convert the head lock to a raw
      spinlock due to fancy and long-lasting callbacks.
      
      With the removal of custom callbacks, we can use a raw lock for
      queue list manipulations, hence allowing the simple wait support
      to be used in -rt (a usage sketch follows this entry).
      
      [Patch is from PeterZ and is based on Thomas' version. Commit message is
       written by Paul G.
       Daniel: - Fixed some compile issues
               - Added non-lazy implementation of swake_up_locked as suggested
                 by Boqun Feng.]
      Originally-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-2-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      13b35686
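
      A minimal usage sketch of the simple waitqueue described above. It assumes
      the swait API names this commit introduces (DECLARE_SWAIT_QUEUE_HEAD,
      swait_event_interruptible(), swake_up()); treat the exact signatures as an
      assumption of the sketch rather than a reference.

          #include <linux/swait.h>

          static DECLARE_SWAIT_QUEUE_HEAD(demo_wq);
          static bool demo_ready;

          static int demo_waiter(void *unused)
          {
                  /* Sleep until demo_ready becomes true; no custom callback,
                   * key or wake flags, only the bare condition. Internally the
                   * queue needs nothing more than a raw spinlock, which is what
                   * makes it usable on -rt. */
                  swait_event_interruptible(demo_wq, demo_ready);
                  return 0;
          }

          static void demo_producer(void)
          {
                  demo_ready = true;
                  swake_up(&demo_wq);     /* wake one waiter */
          }

      Later kernels renamed parts of this API (swake_up() became swake_up_one()),
      so the names above match the 2016 state of the tree, not necessarily current
      mainline.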
  2. 24 February 2016, 1 commit
  3. 17 February 2016, 1 commit
  4. 09 February 2016, 2 commits
    • sched/numa: Spread memory according to CPU and memory use · 4142c3eb
      Rik van Riel authored
      The pseudo-interleaving in NUMA placement has a fundamental problem:
      using hard usage thresholds to spread memory equally between nodes
      can prevent workloads from converging, or keep memory "trapped" on
      nodes where the workload is barely running any more.
      
      In order for workloads to properly converge, the memory migration
      should not be stopped when nodes reach parity, but instead be
      distributed according to how heavily memory is used from each node.
      This way memory migration and task migration reinforce each other,
      instead of one putting the brakes on the other.
      
      Remove the hard thresholds from the pseudo-interleaving code, and
      instead use a more gradual policy on memory placement. This also
      seems to improve convergence of workloads that do not run flat out,
      but sleep in between bursts of activity (a toy sketch of the policy
      change follows this entry).
      
      We still want to slow down NUMA scanning and migration once a workload
      has settled on a few actively used nodes, so keep the 3/4 hysteresis
      in place. Keep track of whether a workload is actively running on
      multiple nodes, so that task_numa_migrate() does a full scan of the
      system for better task placement.
      
      In the case of running 3 SPECjbb2005 instances on a 4-node system,
      this code seems to result in a fairer distribution of memory between
      nodes, with more memory bandwidth for each instance.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mgorman@suse.de
      Link: http://lkml.kernel.org/r/20160125170739.2fc9a641@annuminas.surriel.com
      [ Minor readability tweaks. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4142c3eb
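
      A toy illustration of the policy change described above; this is not the
      kernel implementation, just a sketch of the idea. The fault counters and
      the 4/3 margin are assumptions made up for the example.

          #include <stdbool.h>

          /* Old behaviour (roughly): once the destination node already holds its
           * "fair share" of the workload's memory, stop migrating pages to it,
           * which can trap memory on nodes the workload has mostly left. */
          static bool migrate_hard_threshold(unsigned long faults_dst,
                                             unsigned long fair_share)
          {
                  return faults_dst < fair_share;
          }

          /* New behaviour (roughly): keep migrating, but only towards nodes that
           * actually access the memory more, so placement follows usage instead
           * of stopping at parity. A fixed margin (4/3 here, purely illustrative)
           * keeps pages from bouncing between nodes that are used about equally. */
          static bool migrate_proportional(unsigned long faults_src,
                                           unsigned long faults_dst)
          {
                  return faults_dst * 3 > faults_src * 4;
          }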
    • sched/debug: Make schedstats a runtime tunable that is disabled by default · cb251765
      Mel Gorman authored
      schedstats is very useful during debugging and performance tuning but it
      incurs overhead to calculate the stats. As such, even though it can be
      disabled at build time, it is often enabled as the information is useful.
      
      This patch adds a kernel command-line and sysctl tunable to enable or
      disable schedstats on demand (when it's built in). It is disabled
      by default, as someone who knows they need it can also learn to enable
      it when necessary (a sketch of the runtime gate follows this entry).
      
      The benefits are dependent on how scheduler-intensive the workload is.
      If it is then the patch reduces the number of cycles spent calculating
      the stats with a small benefit from reducing the cache footprint of the
      scheduler.
      
      These measurements were taken from a 48-core 2-socket
      machine with Xeon(R) E5-2670 v3 CPUs, although they were also tested on a
      single-socket 8-core machine with Intel i7-3770 processors.
      
      netperf-tcp
                                 4.5.0-rc1             4.5.0-rc1
                                   vanilla          nostats-v3r1
      Hmean    64         560.45 (  0.00%)      575.98 (  2.77%)
      Hmean    128        766.66 (  0.00%)      795.79 (  3.80%)
      Hmean    256        950.51 (  0.00%)      981.50 (  3.26%)
      Hmean    1024      1433.25 (  0.00%)     1466.51 (  2.32%)
      Hmean    2048      2810.54 (  0.00%)     2879.75 (  2.46%)
      Hmean    3312      4618.18 (  0.00%)     4682.09 (  1.38%)
      Hmean    4096      5306.42 (  0.00%)     5346.39 (  0.75%)
      Hmean    8192     10581.44 (  0.00%)    10698.15 (  1.10%)
      Hmean    16384    18857.70 (  0.00%)    18937.61 (  0.42%)
      
      Small gains here; UDP_STREAM showed nothing interesting and neither did
      the TCP_RR tests. The gains on the 8-core machine were very similar.
      
      tbench4
                                       4.5.0-rc1             4.5.0-rc1
                                         vanilla          nostats-v3r1
      Hmean    mb/sec-1         500.85 (  0.00%)      522.43 (  4.31%)
      Hmean    mb/sec-2         984.66 (  0.00%)     1018.19 (  3.41%)
      Hmean    mb/sec-4        1827.91 (  0.00%)     1847.78 (  1.09%)
      Hmean    mb/sec-8        3561.36 (  0.00%)     3611.28 (  1.40%)
      Hmean    mb/sec-16       5824.52 (  0.00%)     5929.03 (  1.79%)
      Hmean    mb/sec-32      10943.10 (  0.00%)    10802.83 ( -1.28%)
      Hmean    mb/sec-64      15950.81 (  0.00%)    16211.31 (  1.63%)
      Hmean    mb/sec-128     15302.17 (  0.00%)    15445.11 (  0.93%)
      Hmean    mb/sec-256     14866.18 (  0.00%)    15088.73 (  1.50%)
      Hmean    mb/sec-512     15223.31 (  0.00%)    15373.69 (  0.99%)
      Hmean    mb/sec-1024    14574.25 (  0.00%)    14598.02 (  0.16%)
      Hmean    mb/sec-2048    13569.02 (  0.00%)    13733.86 (  1.21%)
      Hmean    mb/sec-3072    12865.98 (  0.00%)    13209.23 (  2.67%)
      
      Small gains of 2-4% at low thread counts and otherwise flat. The
      gains on the 8-core machine were slightly different:
      
      tbench4 on 8-core i7-3770 single socket machine
      Hmean    mb/sec-1        442.59 (  0.00%)      448.73 (  1.39%)
      Hmean    mb/sec-2        796.68 (  0.00%)      794.39 ( -0.29%)
      Hmean    mb/sec-4       1322.52 (  0.00%)     1343.66 (  1.60%)
      Hmean    mb/sec-8       2611.65 (  0.00%)     2694.86 (  3.19%)
      Hmean    mb/sec-16      2537.07 (  0.00%)     2609.34 (  2.85%)
      Hmean    mb/sec-32      2506.02 (  0.00%)     2578.18 (  2.88%)
      Hmean    mb/sec-64      2511.06 (  0.00%)     2569.16 (  2.31%)
      Hmean    mb/sec-128     2313.38 (  0.00%)     2395.50 (  3.55%)
      Hmean    mb/sec-256     2110.04 (  0.00%)     2177.45 (  3.19%)
      Hmean    mb/sec-512     2072.51 (  0.00%)     2053.97 ( -0.89%)
      
      In contrast, this shows a relatively steady 2-3% gain at higher thread
      counts. Due to the nature of the patch and the type of workload, it's
      not a surprise that the result depends on the CPU used.
      
      hackbench-pipes
                               4.5.0-rc1             4.5.0-rc1
                                 vanilla          nostats-v3r1
      Amean    1        0.0637 (  0.00%)      0.0660 ( -3.59%)
      Amean    4        0.1229 (  0.00%)      0.1181 (  3.84%)
      Amean    7        0.1921 (  0.00%)      0.1911 (  0.52%)
      Amean    12       0.3117 (  0.00%)      0.2923 (  6.23%)
      Amean    21       0.4050 (  0.00%)      0.3899 (  3.74%)
      Amean    30       0.4586 (  0.00%)      0.4433 (  3.33%)
      Amean    48       0.5910 (  0.00%)      0.5694 (  3.65%)
      Amean    79       0.8663 (  0.00%)      0.8626 (  0.43%)
      Amean    110      1.1543 (  0.00%)      1.1517 (  0.22%)
      Amean    141      1.4457 (  0.00%)      1.4290 (  1.16%)
      Amean    172      1.7090 (  0.00%)      1.6924 (  0.97%)
      Amean    192      1.9126 (  0.00%)      1.9089 (  0.19%)
      
      Some small gains and losses; while the variance data is not included,
      it's close to the noise. The UMA machine did not show anything
      particularly different.
      
      pipetest
                                   4.5.0-rc1             4.5.0-rc1
                                     vanilla          nostats-v2r2
      Min         Time        4.13 (  0.00%)        3.99 (  3.39%)
      1st-qrtle   Time        4.38 (  0.00%)        4.27 (  2.51%)
      2nd-qrtle   Time        4.46 (  0.00%)        4.39 (  1.57%)
      3rd-qrtle   Time        4.56 (  0.00%)        4.51 (  1.10%)
      Max-90%     Time        4.67 (  0.00%)        4.60 (  1.50%)
      Max-93%     Time        4.71 (  0.00%)        4.65 (  1.27%)
      Max-95%     Time        4.74 (  0.00%)        4.71 (  0.63%)
      Max-99%     Time        4.88 (  0.00%)        4.79 (  1.84%)
      Max         Time        4.93 (  0.00%)        4.83 (  2.03%)
      Mean        Time        4.48 (  0.00%)        4.39 (  1.91%)
      Best99%Mean Time        4.47 (  0.00%)        4.39 (  1.91%)
      Best95%Mean Time        4.46 (  0.00%)        4.38 (  1.93%)
      Best90%Mean Time        4.45 (  0.00%)        4.36 (  1.98%)
      Best50%Mean Time        4.36 (  0.00%)        4.25 (  2.49%)
      Best10%Mean Time        4.23 (  0.00%)        4.10 (  3.13%)
      Best5%Mean  Time        4.19 (  0.00%)        4.06 (  3.20%)
      Best1%Mean  Time        4.13 (  0.00%)        4.00 (  3.39%)
      
      Small improvement and similar gains were seen on the UMA machine.
      
      The gain is small, but it stands to reason that doing less work in the
      scheduler is a good thing. The downside is that the lack of schedstats and
      tracepoints may be surprising to experts doing performance analysis until
      they discover the schedstats= parameter or the schedstats sysctl.
      Schedstats will be automatically activated for latencytop and sleep
      profiling to alleviate the problem. For tracepoints, there is a simple
      warning, as it's not safe to activate schedstats in the context where it
      becomes known that the tracepoint may be wanted but is unavailable.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cb251765
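
      An illustrative sketch of a runtime-tunable stats gate built on a static
      key, in the spirit of the change described above. This is not the patch's
      code; the key name, the helper names and the stats_wait_sum field are made
      up for the example.

          #include <linux/jump_label.h>
          #include <linux/sched.h>

          static DEFINE_STATIC_KEY_FALSE(demo_schedstats);    /* off by default */

          /* Hot path: with the key disabled the branch is patched out, so the
           * bookkeeping costs close to nothing on scheduler-intensive loads. */
          static inline void demo_account_wait(struct task_struct *p, u64 delta)
          {
                  if (static_branch_unlikely(&demo_schedstats))
                          p->stats_wait_sum += delta;     /* hypothetical field */
          }

          /* Slow path: flipped from the schedstats= boot parameter or the
           * sysctl handler (and, per the message, automatically for latencytop
           * and sleep profiling). */
          static void demo_set_schedstats(bool enabled)
          {
                  if (enabled)
                          static_branch_enable(&demo_schedstats);
                  else
                          static_branch_disable(&demo_schedstats);
          }

      The boot parameter name (schedstats=) is taken from the message above; the
      exact sysctl path is not spelled out there, so it is left out of the sketch.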
  5. 05 February 2016, 1 commit
  6. 04 February 2016, 24 commits
  7. 02 February 2016, 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 34229b27
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "This looks like a lot but it's a mixture of regression fixes as well
        as fixes for longer standing issues.
      
         1) Fix on-channel cancellation in mac80211, from Johannes Berg.
      
         2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
            module, from Eric Dumazet.
      
         3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
            Dumazet.
      
         4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
            bound, from Craig Gallek.
      
         5) GRO key comparisons don't take lightweight tunnels into account,
            from Jesse Gross.
      
         6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
            Dumazet.
      
         7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
            register them, otherwise the NEWLINK netlink message is missing
            the proper attributes.  From Thadeu Lima de Souza Cascardo.
      
         8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
            Schimmel.
      
         9) Handle fragments properly in ipv4 early socket demux, from Eric
            Dumazet.
      
        10) Don't ignore the ifindex key specifier on ipv6 output route
            lookups, from Paolo Abeni"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
        tcp: avoid cwnd undo after receiving ECN
        irda: fix a potential use-after-free in ircomm_param_request
        net: tg3: avoid uninitialized variable warning
        net: nb8800: avoid uninitialized variable warning
        net: vxge: avoid unused function warnings
        net: bgmac: clarify CONFIG_BCMA dependency
        net: hp100: remove unnecessary #ifdefs
        net: davinci_cpdma: use dma_addr_t for DMA address
        ipv6/udp: use sticky pktinfo egress ifindex on connect()
        ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
        netlink: not trim skb for mmaped socket when dump
        vxlan: fix a out of bounds access in __vxlan_find_mac
        net: dsa: mv88e6xxx: fix port VLAN maps
        fib_trie: Fix shift by 32 in fib_table_lookup
        net: moxart: use correct accessors for DMA memory
        ipv4: ipconfig: avoid unused ic_proto_used symbol
        bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
        bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
        bnxt_en: Ring free response from close path should use completion ring
        net_sched: drr: check for NULL pointer in drr_dequeue
        ...
      34229b27
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 2c923414
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes the following issues:
      
        API:
         - algif_hash needs to wait for init operations to complete.
         - The has_key setting for shash was always true.
      
        Algorithms:
         - Add missing selections of CRYPTO_HASH.
         - Fix pkcs7 authentication.
      
        Drivers:
         - Fix stack alignment bug in chacha20-ssse3.
         - Fix performance regression in caam due to incorrect setting.
         - Fix potential compile-only build failure of stm32"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: atmel-aes - remove calls of clk_prepare() from atomic contexts
        crypto: algif_hash - wait for crypto_ahash_init() to complete
        crypto: shash - Fix has_key setting
        hwrng: stm32 - Fix dependencies for !HAS_IOMEM archs
        crypto: ghash,poly1305 - select CRYPTO_HASH where needed
        crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
        PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures
        crypto: caam - make write transactions bufferable on PPC platforms
      2c923414
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 29a8ea4f
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
           area for storing a struct page array.
      
        2/ Fixes for dax operations on a raw block device to prevent pagecache
           collisions with dax mappings.
      
        3/ A fix for pfn_t usage in vm_insert_mixed that led to a null
           pointer dereference.
      
        These have received build success notification from the kbuild robot
        across 153 configs and pass the latest ndctl tests"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        phys_to_pfn_t: use phys_addr_t
        mm: fix pfn_t to page conversion in vm_insert_mixed
        block: use DAX for partition table reads
        block: revert runtime dax control of the raw block device
        fs, block: force direct-I/O for dax-enabled block devices
        devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
        libnvdimm, pfn: fix restoring memmap location
        libnvdimm: fix mode determination for e820 devices
      29a8ea4f
  8. 01 February 2016, 6 commits