1. 26 Mar 2018, 1 commit
  2. 15 Mar 2018, 1 commit
  3. 15 Feb 2018, 1 commit
  4. 01 Feb 2018, 1 commit
    • mm: use sc->priority for slab shrink targets · 9092c71b
      Authored by Josef Bacik
      Previously we were using the ratio of the number of lru pages scanned to
      the number of eligible lru pages to determine the number of slab objects
      to scan.  The problem with this is that these two things have nothing to
      do with each other, so in slab-heavy workloads where there is little to
      no page cache we can end up with a very low number of pages scanned.
      This means we reclaim next to no slab pages and waste a lot of time
      reclaiming small amounts of space.
      
      Consider the following scenario, where we have the following values and
      the rest of the memory usage is in slab
      
        Active:            58840 kB
        Inactive:          46860 kB
      
      Every time we do a get_scan_count() we do this
      
        scan = size >> sc->priority
      
      where sc->priority starts at DEF_PRIORITY, which is 12.  The first loop
      through reclaim would result in a scan target of 2 pages to 11715 total
      inactive pages, and 3 pages to 14710 total active pages.  This is a
      really really small target for a system that is entirely slab pages.
      And this is super optimistic; it assumes we even get to scan these
      pages.  We don't increment sc->nr_scanned unless we 1) isolate the page,
      which assumes it's not in use, and 2) can lock the page.  Under pressure
      these numbers could probably go down, I'm sure there's some random pages
      from daemons that aren't actually in use, so the targets get even
      smaller.
      
      Instead use sc->priority in the same way we use it to determine scan
      amounts for the lru's.  This generally equates to pages.  Consider the
      following
      
        slab_pages = (nr_objects * object_size) / PAGE_SIZE
      
      What we would like to do is
      
        scan = slab_pages >> sc->priority
      
      but we don't know the number of slab pages each shrinker controls, only
      the objects.  However say that theoretically we knew how many pages a
      shrinker controlled, we'd still have to convert this to objects, which
      would look like the following
      
        scan = shrinker_pages >> sc->priority
        scan_objects = (PAGE_SIZE / object_size) * scan
      
      or written another way
      
        scan_objects = (shrinker_pages >> sc->priority) *
      		 (PAGE_SIZE / object_size)
      
      which can thus be written
      
        scan_objects = ((shrinker_pages * PAGE_SIZE) / object_size) >>
      		 sc->priority
      
      which is just
      
        scan_objects = nr_objects >> sc->priority
      
      We don't need to know exactly how many pages each shrinker represents;
      its objects are all the information we need.  Making this change allows
      us to place an amount of pressure on the shrinker pools proportional to
      their relative size.
      
      Link: http://lkml.kernel.org/r/1510780549-6812-1-git-send-email-josef@toxicpanda.com
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Dave Chinner <david@fromorbit.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9092c71b
  5. 24 Jan 2018, 1 commit
  6. 23 Jan 2018, 14 commits
  7. 22 Jan 2018, 1 commit
  8. 16 Jan 2018, 3 commits
  9. 09 Jan 2018, 1 commit
  10. 03 Jan 2018, 4 commits
  11. 28 Dec 2017, 1 commit
  12. 27 Dec 2017, 1 commit
    • tcp: Avoid preprocessor directives in tracepoint macro args · 6a6b0b99
      Authored by Mat Martineau
      Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
      a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
      of errors:
      
      ./include/trace/events/tcp.h:68:1: error: directive in argument list
      ./include/trace/events/tcp.h:75:1: error: directive in argument list
      ./include/trace/events/tcp.h:144:1: error: directive in argument list
      ./include/trace/events/tcp.h:151:1: error: directive in argument list
      ./include/trace/events/tcp.h:216:1: error: directive in argument list
      ./include/trace/events/tcp.h:223:1: error: directive in argument list
      ./include/trace/events/tcp.h:274:1: error: directive in argument list
      ./include/trace/events/tcp.h:281:1: error: directive in argument list
      
      Once sparse finds an error, it stops printing warnings for the file it
      is checking. This masks any sparse warnings that would normally be
      reported for the core TCP code.
      
      Instead, handle the preprocessor conditionals in a couple of auxiliary
      macros. This also has the benefit of reducing duplicate code.
      
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a6b0b99
  13. 21 Dec 2017, 2 commits
    • net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint · 563e0bb0
      Authored by Yafang Shao
      Since sk_state is a common field of struct sock, the state transition
      tracepoint should not be a TCP-specific feature.  It currently traces
      all AF_INET state transitions, so rename it to inet_sock_set_state,
      with some minor changes, and move it into trace/events/sock.h.
      We don't need to create a file named trace/events/inet_sock.h for this
      one tracepoint.
      
      Two helpers are introduced to trace sk_state transitions:
          - void inet_sk_state_store(struct sock *sk, int newstate);
          - void inet_sk_set_state(struct sock *sk, int state);
      Since trace headers should not be included in other header files, the
      helpers are defined in sock.c.
      
      Protocols such as SCTP may be built as modules, hence
      inet_sk_set_state() is exported.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      563e0bb0
    • tcp: Export to userspace the TCP state names for the trace events · d7b850a7
      Authored by Steven Rostedt (VMware)
      The TCP trace events (specifically tcp_set_state) map enums to symbol
      names via __print_symbolic().  But this only works when reading trace
      events from the tracefs trace files.  If perf or trace-cmd record these
      events, the event format file does not convert the enum names into
      numbers, and you get something like:
      
      __print_symbolic(REC->oldstate,
          { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
          { TCP_SYN_SENT, "TCP_SYN_SENT" },
          { TCP_SYN_RECV, "TCP_SYN_RECV" },
          { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
          { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
          { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
          { TCP_CLOSE, "TCP_CLOSE" },
          { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
          { TCP_LAST_ACK, "TCP_LAST_ACK" },
          { TCP_LISTEN, "TCP_LISTEN" },
          { TCP_CLOSING, "TCP_CLOSING" },
          { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
      
      Where trace-cmd and perf do not know the values of those enums.
      
      Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
      the enum strings into their values at system boot. This will allow perf and
      trace-cmd to see actual numbers and not enums:
      
      __print_symbolic(REC->oldstate,
          { 1, "TCP_ESTABLISHED" },
          { 2, "TCP_SYN_SENT" },
          { 3, "TCP_SYN_RECV" },
          { 4, "TCP_FIN_WAIT1" },
          { 5, "TCP_FIN_WAIT2" },
          { 6, "TCP_TIME_WAIT" },
          { 7, "TCP_CLOSE" },
          { 8, "TCP_CLOSE_WAIT" },
          { 9, "TCP_LAST_ACK" },
          { 10, "TCP_LISTEN" },
          { 11, "TCP_CLOSING" },
          { 12, "TCP_NEW_SYN_RECV" })
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7b850a7
  14. 19 Dec 2017, 1 commit
  15. 18 Dec 2017, 1 commit
    • KVM: Fix stack-out-of-bounds read in write_mmio · e39d200f
      Authored by Wanpeng Li
      Reported by syzkaller:
      
        BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
        Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298
      
        CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
        Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
        Call Trace:
         dump_stack+0xab/0xe1
         print_address_description+0x6b/0x290
         kasan_report+0x28a/0x370
         write_mmio+0x11e/0x270 [kvm]
         emulator_read_write_onepage+0x311/0x600 [kvm]
         emulator_read_write+0xef/0x240 [kvm]
         emulator_fix_hypercall+0x105/0x150 [kvm]
         em_hypercall+0x2b/0x80 [kvm]
         x86_emulate_insn+0x2b1/0x1640 [kvm]
         x86_emulate_instruction+0x39a/0xb90 [kvm]
         handle_exception+0x1b4/0x4d0 [kvm_intel]
         vcpu_enter_guest+0x15a0/0x2640 [kvm]
         kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
         kvm_vcpu_ioctl+0x479/0x880 [kvm]
         do_vfs_ioctl+0x142/0x9a0
         SyS_ioctl+0x74/0x80
         entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The patched-vmmcall path patches the 3-byte opcode 0F 01 C1 (vmcall)
      into guest memory; however, the write_mmio tracepoint always prints
      8 bytes through *(u64 *)val, since kvm splits the mmio access into
      8-byte chunks.  This leaks 5 bytes from the kernel stack
      (CVE-2017-17741).  This patch fixes it by accessing only the bytes we
      operate on.
      
      Before patch:
      
      syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f
      
      After patch:
      
      syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e39d200f
  16. 14 Dec 2017, 1 commit
    • net: bridge: use rhashtable for fdbs · eb793583
      Authored by Nikolay Aleksandrov
      Before this patch the bridge used a fixed 256-element hash table, which
      was fine for small use cases (in my tests it starts to degrade above
      1000 entries) but not enough for medium or large scale deployments.
      Modern setups have thousands of participants in a single bridge; merely
      enabling vlans and adding a few thousand vlan entries will cause a few
      thousand fdbs to be automatically inserted per participating port.  So
      we need to scale the fdb table considerably to cope with modern
      workloads, and this patch converts it to use a rhashtable for its
      operations, thus improving the bridge's scalability.
      Tests show the following results (10 runs each), at up to 1000 entries
      rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
      is 2 times faster and at 30000 it is 50 times faster.
      Obviously this happens because of the properties of the two constructs
      and is expected; rhashtable keeps pretty much constant time even with
      10000000 entries (tested), while the fixed hash table struggles
      considerably even above 10000.
      As a side effect this also reduces the net_bridge struct size from 3248
      bytes to 1344 bytes. Also note that the key struct is 8 bytes.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb793583
  17. 12 Dec 2017, 1 commit
  18. 08 Dec 2017, 1 commit
  19. 06 Dec 2017, 1 commit
  20. 04 Dec 2017, 1 commit
  21. 30 Nov 2017, 1 commit
    • trace/xdp: fix compile warning: 'struct bpf_map' declared inside parameter list · 23721a75
      Authored by Xie XiuQi
      We hit this compile warning, caused by a missing include of bpf.h in
      xdp.h.
      
      In file included from ./include/trace/events/xdp.h:10:0,
                       from ./include/linux/bpf_trace.h:6,
                       from drivers/net/ethernet/intel/i40e/i40e_txrx.c:29:
      ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
          const struct bpf_map *map, u32 map_index),
                       ^
      ./include/linux/tracepoint.h:187:34: note: in definition of macro ‘__DECLARE_TRACE’
        static inline void trace_##name(proto)    \
                                        ^~~~~
      ./include/linux/tracepoint.h:352:24: note: in expansion of macro ‘PARAMS’
        __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
                              ^~~~~~
      ./include/linux/tracepoint.h:477:2: note: in expansion of macro ‘DECLARE_TRACE’
        DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
        ^~~~~~~~~~~~~
      ./include/linux/tracepoint.h:477:22: note: in expansion of macro ‘PARAMS’
        DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
                            ^~~~~~
      ./include/trace/events/xdp.h:89:1: note: in expansion of macro ‘DEFINE_EVENT’
       DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
       ^~~~~~~~~~~~
      ./include/trace/events/xdp.h:90:2: note: in expansion of macro ‘TP_PROTO’
        TP_PROTO(const struct net_device *dev,
        ^~~~~~~~
      ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
          const struct bpf_map *map, u32 map_index),
                       ^
      ./include/linux/tracepoint.h:203:38: note: in definition of macro ‘__DECLARE_TRACE’
        register_trace_##name(void (*probe)(data_proto), void *data) \
                                            ^~~~~~~~~~
      ./include/linux/tracepoint.h:354:4: note: in expansion of macro ‘PARAMS’
          PARAMS(void *__data, proto),   \
          ^~~~~~
      Reported-by: Huang Daode <huangdaode@hisilicon.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Fixes: 8d3b778f ("xdp: tracepoint xdp_redirect also need a map argument")
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      23721a75