1. 23 8月, 2017 1 次提交
  2. 22 8月, 2017 11 次提交
  3. 18 8月, 2017 13 次提交
  4. 17 8月, 2017 5 次提交
  5. 16 8月, 2017 10 次提交
    • W
      perf bpf: Fix endianness problem when loading parameters in prologue · db26984a
      Wang Nan 提交于
      Perf's BPF prologue generator unconditionally fetches 8 bytes for
      function parameters, which causes problems on big endian machines. Thomas
      gives a detailed analysis for this problem:
      
       http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d151@linux.vnet.ibm.com
      
       ---- 8< ----
        I investigated perf test BPF for s390x and have a question regarding
        the 38.3 subtest (bpf-prologue test) which fails on s390x.
      
        When I turn on trace_printk in tests/bpf-script-test-prologue.c
        I see this output in /sys/kernel/debug/tracing/trace:
      
        [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
        perf-30229 [000] d..2 170161.535791: : f_mode 2001d00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535809: : f_mode 6001f00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535815: : f_mode 6001f00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535819: : f_mode 2001d00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535822: : f_mode 2001d00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535825: : f_mode 6001f00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535828: : f_mode 6001f00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535832: : f_mode 2001d00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535835: : f_mode 2001d00000000 offset:4 orig:0
        perf-30229 [000] d..2 170161.535841: : f_mode 6001f00000000 offset:4 orig:0
      
        [...]
      
        There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
        accesses: f_mode (member of struct file at offset 140) offset and orig.  They
        are parameters of the lseek() system call triggered in this test case in
        function llseek_loop().
      
        What is really strange is the value of f_mode. It is an 8 byte value, whereas
        in the probe event it is defined as a 4 byte value.  The lower 4 bytes are all
        zero and do not belong to member f_mode.  The correct value should be 2001d for
        read-only and 6001f for read-write open mode.
      
        Here is the output of the 'perf test -vv bpf' trace:
        Try to find probe point from debuginfo.
        Matched function: null_lseek [2d9310d]
         Probe point found: null_lseek+0
        Searching 'file' variable in context.
        Converting variable file into trace event.
        converting f_mode in file
        f_mode type is unsigned int.
        Opening /sys/kernel/debug/tracing//README write=0
        Searching 'offset' variable in context.
        Converting variable offset into trace event.
        offset type is long long int.
        Searching 'orig' variable in context.
        Converting variable orig into trace event.
        orig type is int.
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing//kprobe_events write=1
        Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
       ---- 8< ----
      
      This patch parses the type of each argument and converts data from memory to
      expected type.
      
      Now the test runs successfully on 4.13.0-rc5:
      
        [root@s8360046 perf]# ./perf test  bpf
        38: BPF filter                                 :
        38.1: Basic BPF filtering                      : Ok
        38.2: BPF pinning                              : Ok
        38.3: BPF prologue generation                  : Ok
        38.4: BPF relocation checker                   : Ok
        [root@s8360046 perf]#
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170815092159.31912-1-tmricht@linux.vnet.ibm.comSigned-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      db26984a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 510c8a89
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
          Grumbach.
      
       2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
          Vivien Didelot.
      
       3) Fix two regressions with bonding on wireless, from Andreas Born.
      
       4) Fix build when busypoll is disabled, from Daniel Borkmann.
      
       5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
      
       6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
          Westphal.
      
       7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
          Tianhong.
      
       8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
      
       9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
          Dumazet.
      
      10) Need to purge socket write queue in dccp_destroy_sock(), also from
          Eric Dumazet.
      
      11) Make bpf_trace_printk() work properly on 32-bit architectures, from
          Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
        bpf: fix bpf_trace_printk on 32 bit archs
        PCI: fix oops when try to find Root Port for a PCI device
        sfc: don't try and read ef10 data on non-ef10 NIC
        net_sched: remove warning from qdisc_hash_add
        net_sched/sfq: update hierarchical backlog when drop packet
        net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
        ipv4: fix NULL dereference in free_fib_info_rcu()
        net: Fix a typo in comment about sock flags.
        ipv6: fix NULL dereference in ip6_route_dev_notify()
        tcp: fix possible deadlock in TCP stack vs BPF filter
        dccp: purge write queue in dccp_destroy_sock()
        udp: fix linear skb reception with PEEK_OFF
        ipv6: release rt6->rt6i_idev properly during ifdown
        af_key: do not use GFP_KERNEL in atomic contexts
        tcp: ulp: avoid module refcnt leak in tcp_set_ulp
        net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        PCI: Disable Relaxed Ordering Attributes for AMD A1100
        PCI: Disable Relaxed Ordering for some Intel processors
        PCI: Disable PCIe Relaxed Ordering if unsupported
        ...
      510c8a89
    • D
      bpf: fix bpf_trace_printk on 32 bit archs · 88a5c690
      Daniel Borkmann 提交于
      James reported that on MIPS32 bpf_trace_printk() is currently
      broken while MIPS64 works fine:
      
        bpf_trace_printk() uses conditional operators to attempt to
        pass different types to __trace_printk() depending on the
        format operators. This doesn't work as intended on 32-bit
        architectures where u32 and long are passed differently to
        u64, since the result of C conditional operators follows the
        "usual arithmetic conversions" rules, such that the values
        passed to __trace_printk() will always be u64 [causing issues
        later in the va_list handling for vscnprintf()].
      
        For example the samples/bpf/tracex5 test printed lines like
        below on MIPS32, where the fd and buf have come from the u64
        fd argument, and the size from the buf argument:
      
          [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)
      
        Instead of this:
      
          [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
      
      One way to get it working is to expand various combinations
      of argument types into 8 different combinations for 32 bit
      and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
      as well that it resolves the issue.
      
      Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: NJames Hogan <james.hogan@imgtec.com>
      Tested-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88a5c690
    • D
      PCI: fix oops when try to find Root Port for a PCI device · 0e405232
      dingtianhong 提交于
      Eric report a oops when booting the system after applying
      the commit a99b646a ("PCI: Disable PCIe Relaxed..."):
      
      [    4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [    4.247001] IP: pci_find_pcie_root_port+0x62/0x80
      [    4.253011] PGD 0
      [    4.253011] P4D 0
      [    4.253011]
      [    4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [    4.262015] Modules linked in:
      [    4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
      [    4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
      [    4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
      [    4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
      [    4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
      [    4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
      [    4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
      [    4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
      [    4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
      [    4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
      [    4.331002] FS:  0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
      [    4.339002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
      [    4.351002] Call Trace:
      [    4.354012]  ? pci_configure_device+0x19f/0x570
      [    4.359002]  ? pci_conf1_read+0xb8/0xf0
      [    4.363002]  ? raw_pci_read+0x23/0x40
      [    4.366011]  ? pci_read+0x2c/0x30
      [    4.370014]  ? pci_read_config_word+0x67/0x70
      [    4.374012]  pci_device_add+0x28/0x230
      [    4.378012]  ? pci_vpd_f0_read+0x50/0x80
      [    4.382014]  pci_scan_single_device+0x96/0xc0
      [    4.386012]  pci_scan_slot+0x79/0xf0
      [    4.389001]  pci_scan_child_bus+0x31/0x180
      [    4.394014]  acpi_pci_root_create+0x1c6/0x240
      [    4.398013]  pci_acpi_scan_root+0x15f/0x1b0
      [    4.402012]  acpi_pci_root_add+0x2e6/0x400
      [    4.406012]  ? acpi_evaluate_integer+0x37/0x60
      [    4.411002]  acpi_bus_attach+0xdf/0x200
      [    4.415002]  acpi_bus_attach+0x6a/0x200
      [    4.418014]  acpi_bus_attach+0x6a/0x200
      [    4.422013]  acpi_bus_scan+0x38/0x70
      [    4.426011]  acpi_scan_init+0x10c/0x271
      [    4.429001]  acpi_init+0x2fa/0x348
      [    4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
      [    4.437001]  do_one_initcall+0x43/0x169
      [    4.441001]  kernel_init_freeable+0x1d0/0x258
      [    4.445003]  ? rest_init+0xe0/0xe0
      [    4.449001]  kernel_init+0xe/0x150
      
      ====================== cut here =============================
      
      It looks like the pci_find_pcie_root_port() was trying to
      find the Root Port for the PCI device which is the Root
      Port already, it will return NULL and trigger the problem,
      so check the highest_pcie_bridge to fix thie problem.
      
      Fixes: a99b646a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
      Fixes: c56d4450 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e405232
    • B
      sfc: don't try and read ef10 data on non-ef10 NIC · 61deee96
      Bert Kenward 提交于
      The MAC stats command takes a port ID, which doesn't exist on
      pre-ef10 NICs (5000- and 6000- series). This is extracted from the
      NIC specific data; we misinterpret this as the ef10 data structure,
      causing us to read potentially unallocated data. With a KASAN kernel
      this can cause errors with:
         BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats
      
      Fixes: 0a2ab4d9 ("sfc: set the port-id when calling MC_CMD_MAC_STATS")
      Reported-by: NStefano Brivio <sbrivio@redhat.com>
      Tested-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NBert Kenward <bkenward@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61deee96
    • K
      net_sched: remove warning from qdisc_hash_add · c90e9514
      Konstantin Khlebnikov 提交于
      It was added in commit e57a784d ("pkt_sched: set root qdisc
      before change() in attach_default_qdiscs()") to hide duplicates
      from "tc qdisc show" for incative deivices.
      
      After 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      it triggered when classful qdisc is added to inactive device because
      default qdiscs are added before switching root qdisc.
      
      Anyway after commit ea327469 ("net: sched: avoid duplicates in
      qdisc dump") duplicates are filtered right in dumper.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c90e9514
    • K
      net_sched/sfq: update hierarchical backlog when drop packet · 325d5dc3
      Konstantin Khlebnikov 提交于
      When sfq_enqueue() drops head packet or packet from another queue it
      have to update backlog at upper qdiscs too.
      
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      325d5dc3
    • K
      net_sched: reset pointers to tcf blocks in classful qdiscs' destructors · 89890422
      Konstantin Khlebnikov 提交于
      Traffic filters could keep direct pointers to classes in classful qdisc,
      thus qdisc destruction first removes all filters before freeing classes.
      Class destruction methods also tries to free attached filters but now
      this isn't safe because tcf_block_put() unlike to tcf_destroy_chain()
      cannot be called second time.
      
      This patch set class->block to NULL after first tcf_block_put() and
      turn second call into no-op.
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89890422
    • E
      ipv4: fix NULL dereference in free_fib_info_rcu() · 187e5b3a
      Eric Dumazet 提交于
      If fi->fib_metrics could not be allocated in fib_create_info()
      we attempt to dereference a NULL pointer in free_fib_info_rcu() :
      
          m = fi->fib_metrics;
          if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
                  kfree(m);
      
      Before my recent patch, we used to call kfree(NULL) and nothing wrong
      happened.
      
      Instead of using RCU to defer freeing while we are under memory stress,
      it seems better to take immediate action.
      
      This was reported by syzkaller team.
      
      Fixes: 3fb07daf ("ipv4: add reference counting to metrics")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      187e5b3a
    • T
      b3dc8f77