1. 03 January 2018, 3 commits
    • qed*: Refactoring and rearranging FW API with no functional impact · a2e7699e
      Committed by Tomer Tayar
      This patch refactors and reorders the FW API files in preparation for
      upgrading the code to support new FW.
      
      - Make use of the BIT macro in appropriate places (see the example
        after this list).
      - Whitespace changes to align values and code blocks.
      - Comments are updated (spelling mistakes fixed; unclear comments
        removed).
      - Group together code blocks which are related or deal with similar
        matters.
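
      A small illustration of the BIT() conversion style mentioned above
      (FOO_ENABLE_MASK is a hypothetical name used only for this example):

        /* before:  #define FOO_ENABLE_MASK  0x4          */
        /* after (same value, via the BIT() helper):      */
        #define FOO_ENABLE_MASK   BIT(2)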
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: Add equal-operator for ports · bbb6189d
      Committed by Kristian Evensen
      inet_diag currently provides less/greater than or equal operators for
      comparing ports when filtering sockets. An equal comparison can be
      performed by combining the two existing operators, or a user can for
      example request a port range and then do the final filtering in
      userspace. However, both approaches have drawbacks. Implementing
      equality using LE/GE makes the size and complexity of a filter grow
      quickly as the number of ports increases, while filtering in userspace
      means that on busy machines the kernel returns information about many
      irrelevant sockets.
      
      This patch introduces source and destination port equal operators.
      INET_DIAG_BC_S_EQ is used to match a source port, INET_DIAG_BC_D_EQ a
      destination port, and usage is the same as for the existing port
      operators.  I.e., the port to match is stored in the no member of the
      next inet_diag_bc_op struct in the filter.
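
      A minimal userspace sketch of such a filter, assuming the new opcode is
      present in <linux/inet_diag.h>; the yes/no jump offsets follow the
      existing bytecode convention (running exactly off the end accepts,
      jumping past it rejects):

        #include <linux/inet_diag.h>
        #include <string.h>

        /* Build a two-op filter matching a given source port. As with the
         * GE/LE operators, the port value is carried in the "no" member of
         * a second, padding op. */
        static int build_sport_eq_filter(void *buf, unsigned short port)
        {
                struct inet_diag_bc_op ops[2] = {
                        {
                                .code = INET_DIAG_BC_S_EQ,
                                .yes  = sizeof(ops),     /* match: step past both ops */
                                .no   = sizeof(ops) + 4, /* mismatch: jump out, reject */
                        },
                        { .no = port },                  /* value carrier */
                };

                memcpy(buf, ops, sizeof(ops));
                return sizeof(ops);                      /* filter length in bytes */
        }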
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ptr_ring: otherwise safe empty checks can overrun array bounds · bcecb4bb
      Committed by John Fastabend
      When running consumer and/or producer operations and empty checks in
      parallel, it is possible for the empty check to run past the end of the
      array. The scenario occurs when an empty check runs while
      __ptr_ring_discard_one() is in progress: specifically, after the
      consumer_head is incremented but before the (consumer_head >= ring_size)
      check is made and the consumer head is zeroed.
      
      To resolve this, without having to rework how consumer/producer ops
      work on the array, simply add an extra dummy slot to the end of the
      array. Even if we reworked the code to avoid the extra slot, the
      normal-case checks would likely suffer, so it is best to just allocate
      an extra pointer.
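
      An illustrative sketch of the fix (simplified from the actual
      allocation helper):

        /* Allocate one slot beyond 'size'. A racing empty check that reads
         * queue[consumer_head] just after consumer_head stepped past the
         * last valid index now lands on this always-NULL dummy slot instead
         * of overrunning the array. */
        static inline void **__ptr_ring_init_queue_alloc(unsigned int size,
                                                         gfp_t gfp)
        {
                return kcalloc(size + 1, sizeof(void *), gfp);
        }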
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 December 2017, 2 commits
  3. 28 December 2017, 5 commits
  4. 27 December 2017, 2 commits
    • tcp: Avoid preprocessor directives in tracepoint macro args · 6a6b0b99
      Committed by Mat Martineau
      Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
      a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
      of errors:
      
      ./include/trace/events/tcp.h:68:1: error: directive in argument list
      ./include/trace/events/tcp.h:75:1: error: directive in argument list
      ./include/trace/events/tcp.h:144:1: error: directive in argument list
      ./include/trace/events/tcp.h:151:1: error: directive in argument list
      ./include/trace/events/tcp.h:216:1: error: directive in argument list
      ./include/trace/events/tcp.h:223:1: error: directive in argument list
      ./include/trace/events/tcp.h:274:1: error: directive in argument list
      ./include/trace/events/tcp.h:281:1: error: directive in argument list
      
      Once sparse finds an error, it stops printing warnings for the file it
      is checking. This masks any sparse warnings that would normally be
      reported for the core TCP code.
      
      Instead, handle the preprocessor conditionals in a couple of auxiliary
      macros. This also has the benefit of reducing duplicate code.
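
      The pattern, sketched with a hypothetical helper-macro name (the real
      patch defines its own TP_STORE_* helpers):

        /* Hoist the conditional out of the DECLARE_EVENT_CLASS() argument
         * list: the helper expands to the IPv6 assignment when CONFIG_IPV6
         * is enabled and to a no-op otherwise, so no preprocessor directive
         * ever appears inside a macro argument list. */
        #if IS_ENABLED(CONFIG_IPV6)
        #define TP_STORE_V6ADDR(entry, in6)                                 \
                memcpy((entry)->saddr_v6, &(in6), sizeof(struct in6_addr))
        #else
        #define TP_STORE_V6ADDR(entry, in6)  do { } while (0)
        #endif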
      
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: Replace implementation of ASSERT_RTNL() macro with WARN_ONCE() · 66364bdf
      Committed by Leon Romanovsky
      The ASSERT_RTNL() macro is in effect an open-coded variant of
      WARN_ONCE(), with two exceptions. First, it prints a stack dump on
      every hit, not only once as WARN_ONCE() does. Second, the user can
      disable WARN_ONCE() prints by setting CONFIG_BUG to N.
      
      The repeated stack dumps are not actually needed, because calls
      without the rtnl lock are programming errors and the user can't do
      anything about them except complain to the mailing list after the
      first occurrence of such a failure.
      
      A user who disabled BUG/WARN prints did so explicitly, since the
      option is enabled by default in the upstream kernel and in
      distributions. Such a user does not want to see prints about missing
      locks either.
      
      This patch replaces the open-coded variant with the existing macro
      and changes the error print to occur only once.
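
      A sketch of the resulting macro (simplified):

        #define ASSERT_RTNL()                                        \
                WARN_ONCE(!rtnl_is_locked(),                         \
                          "RTNL: assertion failed at %s (%d)\n",     \
                          __FILE__, __LINE__)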
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 23 December 2017, 2 commits
    • init: Invoke init_espfix_bsp() from mm_init() · 613e396b
      Committed by Thomas Gleixner
      init_espfix_bsp() needs to be invoked before the page table isolation
      initialization. Move it into mm_init() which is the place where pti_init()
      will be added.
      
      While at it, get rid of the #ifdeffery and provide proper stub functions.
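
      The stub pattern, sketched (simplified):

        /* With a static inline no-op for the non-espfix configuration, the
         * caller in mm_init() needs no #ifdef around the call. */
        #ifdef CONFIG_X86_ESPFIX64
        extern void init_espfix_bsp(void);
        #else
        static inline void init_espfix_bsp(void) { }
        #endif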
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • arch, mm: Allow arch_dup_mmap() to fail · c10e83f5
      Committed by Thomas Gleixner
      In order to sanitize the LDT initialization on x86, arch_dup_mmap()
      must be allowed to fail. Fix up all instances.
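
      The shape of the interface change, sketched as the generic stub:

        /* arch_dup_mmap() now returns int; architectures that cannot fail
         * simply return 0, while x86 can propagate -ENOMEM from the LDT
         * copy. */
        static inline int arch_dup_mmap(struct mm_struct *oldmm,
                                        struct mm_struct *mm)
        {
                return 0;
        }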
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 22 December 2017, 12 commits
  7. 21 December 2017, 11 commits
    • xfrm: wrap xfrmdev_ops with offload config · 9cb0d21d
      Committed by Shannon Nelson
      There's no reason to define netdev->xfrmdev_ops if
      the offload facility is not CONFIG'd in.
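
      The shape of the change, sketched: the member in struct net_device is
      now guarded by the offload Kconfig symbol, so non-offload builds never
      define it.

        #ifdef CONFIG_XFRM_OFFLOAD
                const struct xfrmdev_ops *xfrmdev_ops;
        #endif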
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: check for xdo_dev_state_free · 7f05b467
      Committed by Shannon Nelson
      The current XFRM code assumes that we've implemented the
      xdo_dev_state_free() callback, even if it is meaningless to the driver.
      This patch adds a check for it before calling, as done in other APIs,
      to prevent a NULL function pointer kernel crash.
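
      An illustrative sketch of the guard (simplified from the actual
      helper):

        static inline void xfrm_dev_state_free(struct xfrm_state *x)
        {
                struct net_device *dev = x->xso.dev;

                /* Only call the optional callback when the driver supplies
                 * it, instead of assuming every offload driver does. */
                if (dev && dev->xfrmdev_ops &&
                    dev->xfrmdev_ops->xdo_dev_state_free)
                        dev->xfrmdev_ops->xdo_dev_state_free(x);
        }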
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • bpf: allow for correlation of maps and helpers in dump · 7105e828
      Committed by Daniel Borkmann
      Currently, a dump of an xlated prog (post verifier stage) does not
      correlate used helpers or maps. The prog info lists involved map ids,
      but as of today there is no correlation of where in the program they
      are used. Likewise, bpftool does not correlate helper calls with the
      target functions.
      
      The latter can be done w/o any kernel changes through kallsyms,
      and also has the advantage that this works with inlined helpers
      and BPF calls.
      
      Example, via interpreter:
      
        # tc filter show dev foo ingress
        filter protocol all pref 49152 bpf chain 0
        filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                            direct-action not_in_hw id 1 tag c74773051b364165   <-- prog id:1
      
        * Output before patch (calls/maps remain unclear):
      
        # bpftool prog dump xlated id 1             <-- dump prog id:1
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = 0xffff95c47a8d4800
         6: (85) call unknown#73040
         7: (15) if r0 == 0x0 goto pc+18
         8: (bf) r2 = r10
         9: (07) r2 += -4
        10: (bf) r1 = r0
        11: (85) call unknown#73040
        12: (15) if r0 == 0x0 goto pc+23
        [...]
      
        * Output after patch:
      
        # bpftool prog dump xlated id 1
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]                     <-- map id:2
         6: (85) call bpf_map_lookup_elem#73424     <-- helper call
         7: (15) if r0 == 0x0 goto pc+18
         8: (bf) r2 = r10
         9: (07) r2 += -4
        10: (bf) r1 = r0
        11: (85) call bpf_map_lookup_elem#73424
        12: (15) if r0 == 0x0 goto pc+23
        [...]
      
        # bpftool map show id 2                     <-- show/dump/etc map id:2
        2: hash_of_maps  flags 0x0
              key 4B  value 4B  max_entries 3  memlock 4096B
      
      Example, JITed, same prog:
      
        # tc filter show dev foo ingress
        filter protocol all pref 49152 bpf chain 0
        filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                        direct-action not_in_hw id 3 tag c74773051b364165 jited
      
        # bpftool prog show id 3
        3: sched_cls  tag c74773051b364165
              loaded_at Dec 19/13:48  uid 0
              xlated 384B  jited 257B  memlock 4096B  map_ids 2
      
        # bpftool prog dump xlated id 3
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]                      <-- map id:2
         6: (85) call __htab_map_lookup_elem#77408   <-+ inlined rewrite
         7: (15) if r0 == 0x0 goto pc+2                |
         8: (07) r0 += 56                              |
         9: (79) r0 = *(u64 *)(r0 +0)                <-+
        10: (15) if r0 == 0x0 goto pc+24
        11: (bf) r2 = r10
        12: (07) r2 += -4
        [...]
      
      Example, same prog, but kallsyms disabled (in that case we are
      also not allowed to pass any relative offsets, etc, so prog
      becomes pointer sanitized on dump):
      
        # sysctl kernel.kptr_restrict=2
        kernel.kptr_restrict = 2
      
        # bpftool prog dump xlated id 3
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]
         6: (85) call bpf_unspec#0
         7: (15) if r0 == 0x0 goto pc+2
        [...]
      
      Example, BPF calls via interpreter:
      
        # bpftool prog dump xlated id 1
         0: (85) call pc+2#__bpf_prog_run_args32
         1: (b7) r0 = 1
         2: (95) exit
         3: (b7) r0 = 2
         4: (95) exit
      
      Example, BPF calls via JIT:
      
        # sysctl net.core.bpf_jit_enable=1
        net.core.bpf_jit_enable = 1
        # sysctl net.core.bpf_jit_kallsyms=1
        net.core.bpf_jit_kallsyms = 1
      
        # bpftool prog dump xlated id 1
         0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
         1: (b7) r0 = 1
         2: (95) exit
         3: (b7) r0 = 2
         4: (95) exit
      
      And finally, an example for tail calls that is now working
      as well wrt correlation:
      
        # bpftool prog dump xlated id 2
        [...]
        10: (b7) r2 = 8
        11: (85) call bpf_trace_printk#-41312
        12: (bf) r1 = r6
        13: (18) r2 = map[id:1]
        15: (b7) r3 = 0
        16: (85) call bpf_tail_call#12
        17: (b7) r1 = 42
        18: (6b) *(u16 *)(r6 +46) = r1
        19: (b7) r0 = 0
        20: (95) exit
      
        # bpftool map show id 1
        1: prog_array  flags 0x0
              key 4B  value 4B  max_entries 1  memlock 4096B
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix integer overflows · bb7f0f98
      Committed by Alexei Starovoitov
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
      
      Also reduce the scope of "scalar op scalar" tracking.
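
      The general idiom behind these fixes, illustrated (this is the generic
      pattern, not the verifier's exact code):

        /* With 32-bit offsets, off + size can wrap and defeat a bounds
         * check; validate the operands before using the sum as a bound. */
        static bool access_overflows(u32 off, u32 size, u32 limit)
        {
                return size > limit || off > limit - size;
        }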
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • block: unalign call_single_data in struct request · 4ccafe03
      Committed by Jens Axboe
      A previous change blindly added massive alignment to the
      call_single_data structure in struct request. This ballooned it in size
      from 296 to 320 bytes on my setup, for no valid reason at all.
      
      Use the unaligned struct __call_single_data variant instead.
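
      The gist of the member change, sketched:

        struct request {
                /* ... */
                /* was: call_single_data_t csd; that typedef carries
                 * cacheline alignment, which bloats struct request for no
                 * benefit here. */
                struct __call_single_data csd;
                /* ... */
        };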
      
      Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data")
      Cc: stable@vger.kernel.org # v4.14
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store · 986ffdfd
      Committed by Yafang Shao
      sk_state_load is only used by AF_INET/AF_INET6, so rename it to
      inet_sk_state_load and move it into inet_sock.h.
      
      sk_state_store is removed as it is not used any more.
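
      A sketch of the renamed helper (simplified; it matches the description
      above):

        /* Paired with a store using release semantics; the acquire barrier
         * orders the state read against subsequent loads. */
        static inline int inet_sk_state_load(const struct sock *sk)
        {
                return smp_load_acquire(&sk->sk_state);
        }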
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint · 563e0bb0
      Committed by Yafang Shao
      Since sk_state is a common field of struct sock, the state transition
      tracepoint should not be a TCP-specific feature. It already traces all
      AF_INET state transitions, so rename it to inet_sock_set_state, with
      some minor changes, and move it into trace/events/sock.h. There is no
      need to create a file named trace/events/inet_sock.h for this single
      tracepoint.
      
      Two helpers are introduced to trace sk_state transitions:
          - void inet_sk_state_store(struct sock *sk, int newstate);
          - void inet_sk_set_state(struct sock *sk, int state);
      Since trace headers should not be included from other header files,
      they are defined in sock.c.
      
      Protocols such as SCTP may be compiled as modules, hence export
      inet_sk_set_state().
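
      One of the helpers, sketched from the description above:

        /* Defined in net/core/sock.c so the trace header stays out of
         * header files; fires the tracepoint with old and new states, then
         * stores the new state with release semantics. */
        void inet_sk_state_store(struct sock *sk, int newstate)
        {
                trace_inet_sock_set_state(sk, sk->sk_state, newstate);
                smp_store_release(&sk->sk_state, newstate);
        }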
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Export to userspace the TCP state names for the trace events · d7b850a7
      Committed by Steven Rostedt (VMware)
      The TCP trace events (specifically tcp_set_state) map enums to symbol
      names via __print_symbolic(). But this only works when reading trace
      events from the tracefs trace files. If perf or trace-cmd record these
      events, the event format file does not convert the enum names into
      numbers, and you get something like:
      
      __print_symbolic(REC->oldstate,
          { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
          { TCP_SYN_SENT, "TCP_SYN_SENT" },
          { TCP_SYN_RECV, "TCP_SYN_RECV" },
          { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
          { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
          { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
          { TCP_CLOSE, "TCP_CLOSE" },
          { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
          { TCP_LAST_ACK, "TCP_LAST_ACK" },
          { TCP_LISTEN, "TCP_LISTEN" },
          { TCP_CLOSING, "TCP_CLOSING" },
          { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
      
      Neither trace-cmd nor perf knows the values of those enums.
      
      Use the TRACE_DEFINE_ENUM() macro, which has the tracing system
      convert the enum symbols into their values at system boot. This allows
      perf and trace-cmd to see actual numbers rather than enum names:
      
      __print_symbolic(REC->oldstate,
          { 1, "TCP_ESTABLISHED" },
          { 2, "TCP_SYN_SENT" },
          { 3, "TCP_SYN_RECV" },
          { 4, "TCP_FIN_WAIT1" },
          { 5, "TCP_FIN_WAIT2" },
          { 6, "TCP_TIME_WAIT" },
          { 7, "TCP_CLOSE" },
          { 8, "TCP_CLOSE_WAIT" },
          { 9, "TCP_LAST_ACK" },
          { 10, "TCP_LISTEN" },
          { 11, "TCP_CLOSING" },
          { 12, "TCP_NEW_SYN_RECV" })
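
      The fix amounts to one TRACE_DEFINE_ENUM() declaration per state,
      e.g.:

        TRACE_DEFINE_ENUM(TCP_ESTABLISHED);
        TRACE_DEFINE_ENUM(TCP_SYN_SENT);
        TRACE_DEFINE_ENUM(TCP_SYN_RECV);
        /* ... and so on for each TCP state used by the event. */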
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen/balloon: Mark unallocated host memory as UNUSABLE · b3cf8528
      Committed by Boris Ostrovsky
      Commit f5775e0b ("x86/xen: discard RAM regions above the maximum
      reservation") left host memory not assigned to dom0 as available for
      memory hotplug.
      
      Unfortunately this also meant that those regions could be used by
      others. Specifically, commit fa564ad9 ("x86/PCI: Enable a 64bit BAR
      on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
      addresses as MMIO.
      
      To prevent this, mark unallocated host memory as E820_TYPE_UNUSABLE
      (thus effectively reverting f5775e0b) and keep track of that region as
      a hostmem resource that can be used for hotplug.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
    • block-throttle: avoid double charge · 111be883
      Committed by Shaohua Li
      If a bio is throttled and split after throttling, the bio can be
      resubmitted and enter throttling again. This causes part of the bio to
      be charged multiple times. If the cgroup has an IO limit, the double
      charge significantly harms performance. Bio splits have become quite
      common since the arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag if a bio is
      throttled. If the bio is cloned/split, we copy the flag to the new bio
      too to avoid a double charge. However, a cloned bio could be directed
      to a new disk, in which case keeping the flag would be a problem. The
      observation is that we always set a new disk for the bio in this case,
      so we can clear the flag in bio_set_dev().
      
      This issue has existed for a long time; the arbitrary bio size change
      just makes it worse, so this should go into stable at least back to
      v4.2.
      
      V1 -> V2: do not add an extra field to the bio, based on discussion
      with Tejun
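
      The bio_set_dev() piece described above, sketched (simplified):

        /* A clone pointed at a different disk must be throttled again on
         * the new device, so drop the flag whenever the disk changes. */
        #define bio_set_dev(bio, bdev)                           \
        do {                                                     \
                if ((bio)->bi_disk != (bdev)->bd_disk)           \
                        bio_clear_flag(bio, BIO_THROTTLED);      \
                (bio)->bi_disk   = (bdev)->bd_disk;              \
                (bio)->bi_partno = (bdev)->bd_partno;            \
        } while (0)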
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cls_bpf: fix offload assumptions after callback conversion · 102740bd
      Committed by Jakub Kicinski
      cls_bpf used to take care of tracking what offload state a filter
      is in, i.e. it would track if offload request succeeded or not.
      This information would then be used to issue correct requests to
      the driver, e.g. requests for statistics only on offloaded filters,
      removing only filters which were offloaded, using add instead of
      replace if the previous filter was not added, etc.
      
      This tracking of offload state no longer functions with the new
      callback infrastructure.  There could be multiple entities trying
      to offload the same filter.
      
      Throw out all the tracking and corresponding commands and simply
      pass to the drivers both old and new bpf program.  Drivers will
      have to deal with offload state tracking by themselves.
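
      The shape of the new request, sketched (field and struct names here
      are illustrative of the oldprog/prog pair described above, not the
      exact API):

        struct cls_bpf_offload_req {
                struct bpf_prog *oldprog; /* previously installed prog, or NULL */
                struct bpf_prog *prog;    /* new prog, or NULL on removal */
                /* drivers compare the two and track offload state */
        };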
      
      Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 20 December 2017, 3 commits