1. 20 Jan, 2017 1 commit
  2. 10 Jan, 2017 16 commits
  3. 09 Jan, 2017 10 commits
    • afs: Refcount the afs_call struct · 341f741f
      Committed by David Howells
      A static checker warning occurs in the AFS filesystem:
      
      	fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
      	error: dereferencing freed memory 'call'
      
      due to the reply being sent before we access the server it points to.  The
      act of sending the reply causes the call to be freed if an error occurs
      (but not if it doesn't).
      
      On top of this, the lifetime handling of afs_call structs is fragile
      because they get passed around through workqueues without any sort of
      refcounting.
      
      Deal with the issues by:
      
       (1) Fix the maybe/maybe not nature of the reply sending functions with
           regard to whether they release the call struct.
      
       (2) Refcount the afs_call struct and sort out places that need to get/put
           references.
      
       (3) Pass a ref through the work queue and release (or pass on) that ref in
           the work function.  Care has to be taken because a work queue may
           already own a ref to the call.
      
       (4) Do the cleaning up in the put function only.
      
       (5) Simplify module cleanup by always incrementing afs_outstanding_calls
           whenever a call is allocated.
      
       (6) Set the backlog to 0 with kernel_listen() at the beginning of the
           process of closing the socket to prevent new incoming calls from
           occurring and to remove the contribution of preallocated calls from
           afs_outstanding_calls before we wait on it.
      
      A tracepoint is also added to monitor the afs_call refcount and lifetime.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Fixes: 08e0e7c8: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
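The get/put discipline described in points (2) to (5) can be sketched in plain C as follows. This is a minimal, single-threaded illustration only: `struct demo_call` and the `demo_call_*` helpers are hypothetical names, not the kernel's `afs_call` API, and real kernel code would need atomic reference counts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a call record; this is not struct afs_call. */
struct demo_call {
	int usage;         /* reference count (plain int: single-threaded sketch) */
	int *outstanding;  /* caller-owned counter of calls still alive */
};

/* Allocate a call holding one reference and count it as outstanding. */
struct demo_call *demo_call_alloc(int *outstanding)
{
	struct demo_call *call = malloc(sizeof(*call));

	if (!call)
		return NULL;
	call->usage = 1;
	call->outstanding = outstanding;
	(*outstanding)++;
	return call;
}

/* Take an extra reference, e.g. before handing the call to a work item. */
void demo_call_get(struct demo_call *call)
{
	assert(call->usage > 0);
	call->usage++;
}

/* Drop a reference; all cleanup happens here. Returns 1 if the call was freed. */
int demo_call_put(struct demo_call *call)
{
	assert(call->usage > 0);
	if (--call->usage > 0)
		return 0;
	(*call->outstanding)--;
	free(call);
	return 1;
}

/* Work function: it owns the reference it was passed and releases it. */
void demo_call_work(struct demo_call *call)
{
	/* ... process the call here ... */
	demo_call_put(call);
}
```

In this sketch the caller takes a reference with `demo_call_get()` before queueing work, so the record stays valid for the caller even after the work function has dropped its own reference.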
    • afs: Add some tracepoints · 8e8d7f13
      Committed by David Howells
      Add three tracepoints to the AFS filesystem:
      
       (1) The afs_recv_data tracepoint logs data segments that are extracted
           from the data received from the peer through afs_extract_data().
      
       (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
           coming in to an asynchronous call.
      
       (3) The afs_cb_call tracepoint logs incoming calls that have had their
           operation ID extracted and mapped into a supported cache manager
           service call.
      
      To make (3) work, the name strings in the afs_call_type struct objects have
      to be annotated with __tracepoint_string.  This is done with the CM_NAME()
      macro.
      
      Further, the AFS call state enum needs a name so that it can be used to
      declare parameter types.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Committed by Willem de Bruijn
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
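As a rough illustration of the single-bit fields this series ends up with, the sketch below models the redirect and reinject steps. `struct demo_pkt` and both helper functions are hypothetical and are not the kernel's `struct sk_buff`, `act_mirred`, or `ifb_xmit` code. Only the field names are taken from the commit messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet metadata; only the field names come from the commits. */
struct demo_pkt {
	unsigned int tc_skip_classify:1;
	unsigned int tc_at_ingress:1;    /* may be overwritten along the way */
	unsigned int tc_redirected:1;    /* set only on redirection */
	unsigned int tc_from_ingress:1;  /* cached copy of tc_at_ingress */
};

/* Redirect step: mark the packet and cache where it was taken from. */
void demo_redirect(struct demo_pkt *p)
{
	assert(p != NULL);
	p->tc_redirected = 1;
	p->tc_from_ingress = p->tc_at_ingress;
}

/*
 * Transmit step on the intermediate device: returns false if the packet
 * must be dropped because it did not arrive through a redirect.
 */
bool demo_accept(const struct demo_pkt *p, bool *reinject_at_ingress)
{
	assert(p != NULL && reinject_at_ingress != NULL);
	if (!p->tc_redirected)
		return false;
	*reinject_at_ingress = p->tc_from_ingress;
	return true;
}
```

The cached bit is what lets the reinjection point survive even if `tc_at_ingress` changes between the redirect and the transmit step.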
    • net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Committed by Willem de Bruijn
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: convert tc_verd to integer bitfields · a5135bcf
      Committed by Willem de Bruijn
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: extract skip classify bit from tc_verd · e7246e12
      Committed by Willem de Bruijn
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, there is little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: make MAX_RECLASSIFY_LOOP local · d6264071
      Committed by Willem de Bruijn
      This field is no longer kept in tc_verd. Remove it from the global
      definition of that struct.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tc: remove unused tc_verd fields · aec745e2
      Committed by Willem de Bruijn
      Remove the last reference to tc_verd's munge and redirect ttl bits.
      These fields are no longer used.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make ndo_get_stats64 a void function · bc1f4470
      Committed by stephen hemminger
      The network device operation for reading statistics is only called
      in one place, and it ignores the return value. Having a structure
      return value is potentially confusing because some future driver could
      incorrectly assume that the return value was used.
      
      Fix all drivers with ndo_get_stats64 to have a void function.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipmr: Remove nowait arg to ipmr_get_route · 9f09eaea
      Committed by David Ahern
      ipmr_get_route has one caller, which passes 0 for the nowait arg. Remove
      the arg and simplify ipmr_get_route accordingly.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 08 Jan, 2017 4 commits
    • net/mlx5: Introduce blue flame register allocator · a6d51b68
      Committed by Eli Cohen
      Add an allocator for blue flame registers. A blue flame register is
      used for generating send doorbells. It can generate either a regular
      doorbell or a blue flame doorbell, where the data to be sent is written
      to the device's I/O memory, saving the need to read the data from memory.
      For blue flame doorbells to succeed, the blue flame register needs to
      be mapped as write combining. The user can specify which kind of send
      doorbells she wishes to use. If she requests a write combining mapping
      but that fails, the allocator falls back to a non write combining
      mapping and indicates that to the user.
      Subsequent patches in this series will make use of this allocator.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • mlx5: Fix naming convention with respect to UARs · 2f5ff264
      Committed by Eli Cohen
      This establishes a solid naming convention for UARs. A UAR (User Access
      Region) can have a size identical to a system page or can be a fixed 4KB,
      depending on a value queried from firmware. Each UAR always has 4 blue
      flame registers, which are used to post doorbells to send queues. In
      addition, a UAR has a section used for posting doorbells to CQs or EQs.
      In this patch we change names to reflect these conventions.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • mm: workingset: fix use-after-free in shadow node shrinker · ea07b862
      Committed by Johannes Weiner
      Several people report seeing warnings about inconsistent radix tree
      nodes followed by crashes in the workingset code, which all looked like
      use-after-free access from the shadow node shrinker.
      
      Dave Jones managed to reproduce the issue with a debug patch applied,
      which confirmed that the radix tree shrinking indeed frees shadow nodes
      while they are still linked to the shadow LRU:
      
        WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
        CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
        Call Trace:
           delete_node+0x1e4/0x200
           __radix_tree_delete_node+0xd/0x10
           shadow_lru_isolate+0xe6/0x220
           __list_lru_walk_one.isra.4+0x9b/0x190
           list_lru_walk_one+0x23/0x30
           scan_shadow_nodes+0x2e/0x40
           shrink_slab.part.44+0x23d/0x5d0
           shrink_node+0x22c/0x330
           kswapd+0x392/0x8f0
      
      This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
      inlined radix_tree_shrink().
      
      The problem is with 14b46879 ("mm: workingset: move shadow entry
      tracking to radix tree exceptional tracking"), which passes an update
      callback into the radix tree to link and unlink shadow leaf nodes when
      tree entries change, but forgot to pass the callback when reclaiming a
      shadow node.
      
      While the reclaimed shadow node itself is unlinked by the shrinker, its
      deletion from the tree can cause the left-most leaf node in the tree to
      be shrunk.  If that happens to be a shadow node as well, we don't unlink
      it from the LRU as we should.
      
      Consider this tree, where the s are shadow entries:
      
             root->rnode
                  |
             [0       n]
              |       |
           [s    ] [sssss]
      
      Now the shadow node shrinker reclaims the rightmost leaf node through
      the shadow node LRU:
      
             root->rnode
                  |
             [0        ]
              |
          [s     ]
      
      Because the parent of the deleted node is the first level below the
      root and has only one child in the left-most slot, the intermediate
      level is shrunk and the node containing the single shadow is put in
      its place:
      
             root->rnode
                  |
             [s        ]
      
      The shrinker again sees a single left-most slot in a first level node
      and thus decides to store the shadow in root->rnode directly and free
      the node - which is a leaf node on the shadow node LRU.
      
        root->rnode
             |
             s
      
      Without the update callback, the freed node remains on the shadow LRU,
      where it causes later shrinker runs to crash.
      
      Pass the node updater callback into __radix_tree_delete_node() in case
      the deletion causes the left-most branch in the tree to collapse too.
      
      Also add warnings when linked nodes are freed right away, rather than
      wait for the use-after-free when the list is scanned much later.
      
      Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: netcp: extract eflag from desc for rx_hook handling · 69d707d0
      Committed by Karicheri, Muralidharan
      Extract the eflag bits from the received desc and pass them down
      the rx_hook chain so that they are available to netcp modules. The
      psdata and epib data also have to be inspected by the netcp modules,
      so the desc can be freed only after returning from the rx_hook.
      Move knav_pool_desc_put() after the rx_hook processing accordingly.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 Jan, 2017 3 commits
  6. 05 Jan, 2017 6 commits
    • asm-prototypes: Clear any CPP defines before declaring the functions · c7858bf1
      Committed by Michal Marek
      The asm-prototypes.h file is used to provide dummy function declarations
      for genksyms, when processing asm files with EXPORT_SYMBOL. Make sure
      that any architecture defines get out of our way. x86 currently has an
      issue with memcpy on 64bit with CONFIG_KMEMCHECK=y and with
      memset/__memset on 32bit:
      
      	$ cat init/test.c
      	#include <asm/asm-prototypes.h>
      	$ make -s init/test.o
      	In file included from ./arch/x86/include/asm/string.h:4:0,
      			 from ./include/linux/string.h:18,
      			 from ./include/linux/bitmap.h:8,
      			 from ./include/linux/cpumask.h:11,
      			 from ./arch/x86/include/asm/cpumask.h:4,
      			 from ./arch/x86/include/asm/msr.h:10,
      			 from ./arch/x86/include/asm/processor.h:20,
      			 from ./arch/x86/include/asm/cpufeature.h:4,
      			 from ./arch/x86/include/asm/thread_info.h:52,
      			 from ./include/linux/thread_info.h:25,
      			 from ./arch/x86/include/asm/preempt.h:6,
      			 from ./include/linux/preempt.h:59,
      			 from ./include/linux/spinlock.h:50,
      			 from ./include/linux/seqlock.h:35,
      			 from ./include/linux/time.h:5,
      			 from ./include/uapi/linux/timex.h:56,
      			 from ./include/linux/timex.h:56,
      			 from ./include/linux/sched.h:19,
      			 from ./include/linux/uaccess.h:4,
      			 from ./arch/x86/include/asm/asm-prototypes.h:2,
      			 from init/test.c:1:
      	./arch/x86/include/asm/string_64.h:52:47: error: expected declaration specifiers or ‘...’ before ‘(’ token
      	 #define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))
      	 ./include/asm-generic/asm-prototypes.h:6:14: note: in expansion of macro ‘memcpy’
      	  extern void *memcpy(void *, const void *, __kernel_size_t);
      
      						       ^
      	...
      
      During a real build, this manifests itself as genksyms segfaulting.
      
      Fixes: 334bb773 ("x86/kbuild: enable modversions for symbols exported from asm")
      Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
      Cc: Adam Borowski <kilobyte@angband.pl>
      Signed-off-by: Michal Marek <mmarek@suse.com>
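The failure mode and the fix can be reproduced outside the kernel with a function-like macro that shadows a function name. In this minimal sketch, `demo_copy` and `demo_copy_impl` are hypothetical names standing in for `memcpy` and `__inline_memcpy`. It illustrates the technique and is not the kernel header itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the inline implementation an architecture might provide. */
void *demo_copy_impl(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(dst != NULL && src != NULL);
	while (len--)
		*d++ = *s++;
	return dst;
}

/* An architecture header redirects the function name to the inline version. */
#define demo_copy(dst, src, len) demo_copy_impl((dst), (src), (len))

/*
 * Without this #undef, the prototype below would be macro-expanded into
 * "extern void *demo_copy_impl((void *), ...)" and fail to parse.
 */
#undef demo_copy
extern void *demo_copy(void *, const void *, size_t);

/* Out-of-line definition matching the prototype above. */
void *demo_copy(void *dst, const void *src, size_t len)
{
	return demo_copy_impl(dst, src, len);
}
```

Removing the `#undef` line should produce a parse error of the same kind as the one quoted in the commit message.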
    • rxrpc: Add some more tracing · b1d9f7fd
      Committed by David Howells
      Add the following extra tracing information:
      
       (1) Modify the rxrpc_transmit tracepoint to record the Tx window size as
           this is varied by the slow-start algorithm.
      
       (2) Modify the rxrpc_rx_ack tracepoint to record more information from
           received ACK packets.
      
       (3) Add an rxrpc_rx_data tracepoint to record the information in DATA
           packets.
      
       (4) Add an rxrpc_disconnect_call tracepoint to record call disconnection,
           including the reason the call was disconnected.
      
       (5) Add an rxrpc_improper_term tracepoint to record implicit termination
           of a call by a client that starts a new call on a particular
           connection channel without first transmitting the final ACK for the
           previous call.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix handling of enums-to-string translation in tracing · b54a134a
      Committed by David Howells
      Fix the way enum values are translated into strings in AF_RXRPC
      tracepoints.  The problem with just doing a lookup in a normal flat array
      of strings or chars is that external tracing infrastructure can't find it.
      Rather, TRACE_DEFINE_ENUM must be used.
      
      Also sort the enums and string tables to make it easier to keep them in
      order so that a future patch to __print_symbolic() can be optimised to try
      a direct lookup into the table first before iterating over it.
      
      A couple of _proto() macro calls are removed because they referred to tables
      that got moved to the tracing infrastructure.  The relevant data can be
      found by way of tracing.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • dsa: mv88e6xxx: Optimise atu_get · 59527581
      Committed by Andrew Lunn
      Lookup in the ATU can be performed starting from a given MAC
      address. This is faster than starting with the first possible MAC
      address and iterating all entries.
      
      Entries are returned in numeric order. So if the MAC address returned
      is bigger than what we are searching for, we know it is not in the
      ATU.
      
      Using the benchmark provided by Volodymyr Bendiuga
      <volodymyr.bendiuga@gmail.com>,
      
      https://www.spinics.net/lists/netdev/msg411550.html
      
      on a Marvell Armada 370 RD, the test to add a number of static fdb
      entries went from 1.616531 seconds to 0.312052 seconds.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
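The lookup idea can be sketched against a sorted in-memory table. The sketch assumes a "get next" primitive that returns the smallest entry strictly greater than a starting key. That semantics, the 64-bit key encoding, and both function names are assumptions made for illustration, not the mv88e6xxx driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "get next" primitive: finds the smallest entry strictly
 * greater than 'after' in a table sorted in ascending order.
 */
static bool demo_get_next(const uint64_t *tbl, size_t n, uint64_t after,
			  uint64_t *out)
{
	assert(tbl != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i] > after) {
			*out = tbl[i];
			return true;
		}
	}
	return false;
}

/* Look up one address with a single "get next" starting just below it. */
bool demo_contains(const uint64_t *tbl, size_t n, uint64_t mac)
{
	uint64_t next;

	if (mac == 0)  /* nothing is below 0, so check the first entry directly */
		return n > 0 && tbl[0] == 0;
	if (!demo_get_next(tbl, n, mac - 1, &next))
		return false;
	/* Entries come back in numeric order: a bigger one means "not present". */
	return next == mac;
}
```

The linear scan inside `demo_get_next()` only models the hardware operation. The point is that the caller issues one query near the target instead of iterating from the first possible address.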
    • vfio-mdev: fix non-standard ioctl return val causing i386 build fail · c6ef7fd4
      Committed by Paul Gortmaker
      What appears to be a copy and paste error from the line above gets
      the ioctl a ssize_t return value instead of the traditional "int".
      
      The associated sample code used "long" which meant it would compile
      for x86-64 but not i386, with the latter failing as follows:
      
        CC [M]  samples/vfio-mdev/mtty.o
      samples/vfio-mdev/mtty.c:1418:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
        .ioctl          = mtty_ioctl,
                          ^
      samples/vfio-mdev/mtty.c:1418:20: note: (near initialization for ‘mdev_fops.ioctl’)
      cc1: some warnings being treated as errors
      
      In this case, vfio is working with struct file_operations, which declares:
      
          long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
          long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
      
      ...and so here we just standardize on long vs. the normal int that user
      space typically sees and documents as per "man ioctl" and similar.
      
      Fixes: 9d1a546c ("docs: Sample driver to demonstrate how to use Mediated device framework.")
      Cc: Kirti Wankhede <kwankhede@nvidia.com>
      Cc: Neo Jia <cjia@nvidia.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
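A small sketch of why the callback's return type has to match the ops structure exactly. `struct demo_ops` and `demo_ioctl` are hypothetical names and this is not the mdev API. The likely explanation for the x86-64-only build, offered here as an assumption, is that ssize_t and long are the same underlying type there but differ on i386.

```c
#include <assert.h>

/* Hypothetical ops table that standardizes on a long return value. */
struct demo_ops {
	long (*ioctl)(void *dev, unsigned int cmd, unsigned long arg);
};

/* The handler's signature must match the member's type exactly. */
static long demo_ioctl(void *dev, unsigned int cmd, unsigned long arg)
{
	(void)dev;  /* unused in this sketch */
	return cmd == 1 ? (long)arg : -1L;
}

/*
 * If the member were declared with a different return type than the
 * handler, this initializer would be an incompatible pointer assignment
 * on any target where the two types are not identical.
 */
const struct demo_ops demo_dev_ops = {
	.ioctl = demo_ioctl,
};
```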
    • scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr)) · 1ff8cebf
      Committed by yuan linyu
      sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) are already
      aligned. Remove the use of CMSG_ALIGN(sizeof(struct cmsghdr)) and
      CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) to keep the code consistent.
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
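The reasoning is that rounding an already aligned size up to the alignment boundary is a no-op. The sketch below checks this at compile time for a header laid out like a control message header. `DEMO_ALIGN`, `struct demo_hdr`, and `demo_space` are hypothetical names, and aligning to `sizeof(long)` is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of sizeof(long); the boundary is assumed here. */
#define DEMO_ALIGN(len) \
	(((len) + sizeof(long) - 1) & ~(sizeof(long) - 1))

/* Hypothetical header with a length field followed by two ints. */
struct demo_hdr {
	size_t len;
	int level;
	int type;
};

/* The header size is already aligned, so wrapping it in DEMO_ALIGN is redundant. */
_Static_assert(DEMO_ALIGN(sizeof(struct demo_hdr)) == sizeof(struct demo_hdr),
	       "header size is already a multiple of the alignment");

/* Space for a header plus payload: only the payload needs rounding up. */
size_t demo_space(size_t payload_len)
{
	return sizeof(struct demo_hdr) + DEMO_ALIGN(payload_len);
}
```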