1. 02 7月, 2019 4 次提交
  2. 29 6月, 2019 5 次提交
    • A
      net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create · 5233794b
      Arnd Bergmann 提交于
      Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
      wasteful and causes a warning on 32-bit architectures when building
      with clang -fsanitize-coverage:
      
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      Since the structure is never written to, we can statically allocate
      it to avoid the stack usage. To be on the safe side, mark all
      subsequent function arguments that we pass it into as 'const'
      as well.
      
      Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NSaeed Mahameed <saeedm@mellanox.com>
      Acked-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      5233794b
    • V
      taprio: Add support for txtime-assist mode · 4cfd5779
      Vedang Patel 提交于
      Currently, we are seeing non-critical packets being transmitted outside of
      their timeslice. We can confirm that the packets are being dequeued at the
      right time. So, the delay is induced in the hardware side.  The most likely
      reason is the hardware queues are starving the lower priority queues.
      
      In order to improve the performance of taprio, we will be making use of the
      txtime feature provided by the ETF qdisc. For all the packets which do not
      have the SO_TXTIME option set, taprio will set the transmit timestamp (set
      in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit
      time for the packet is set to when the gate is open. If SO_TXTIME is set,
      the TAPrio qdisc will validate whether the timestamp (in skb->tstamp)
      occurs when the gate corresponding to skb's traffic class is open.
      
      Following two parameters added to support this mode:
      - flags: used to enable txtime-assist mode. Will also be used to enable
        other modes (like hardware offloading) later.
      - txtime-delay: This indicates the minimum time it will take for the packet
        to hit the wire. This is useful in determining whether we can transmit
      the packet in the remaining time if the gate corresponding to the packet is
      currently open.
      
      An example configuration for enabling txtime-assist:
      
      tc qdisc replace dev eth0 parent root handle 100 taprio \\
            num_tc 3 \\
            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
            queues 1@0 1@0 1@0 \\
            base-time 1558653424279842568 \\
            sched-entry S 01 300000 \\
            sched-entry S 02 300000 \\
            sched-entry S 04 400000 \\
            flags 0x1 \\
            txtime-delay 40000 \\
            clockid CLOCK_TAI
      
      tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \\
            offload delta 200000 clockid CLOCK_TAI
      
      Note that all the traffic classes are mapped to the same queue.  This is
      only possible in taprio when txtime-assist is enabled. Also, note that the
      ETF Qdisc is enabled with offload mode set.
      
      In this mode, if the packet's traffic class is open and the complete packet
      can be transmitted, taprio will try to transmit the packet immediately.
      This will be done by setting skb->tstamp to current_time + the time delta
      indicated in the txtime-delay parameter. This parameter indicates the time
      taken (in software) for packet to reach the network adapter.
      
      If the packet cannot be transmitted in the current interval or if the
      packet's traffic is not currently transmitting, the skb->tstamp is set to
      the next available timestamp value. This is tracked in the next_launchtime
      parameter in the struct sched_entry.
      
      The behaviour w.r.t admin and oper schedules is not changed from what is
      present in software mode.
      
      The transmit time is already known in advance. So, we do not need the HR
      timers to advance the schedule and wakeup the dequeue side of taprio.  So,
      HR timer won't be run when this mode is enabled.
      Signed-off-by: NVedang Patel <vedang.patel@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cfd5779
    • V
      etf: Add skip_sock_check · d14d2b20
      Vedang Patel 提交于
      Currently, etf expects a socket with SO_TXTIME option set for each packet
      it encounters. So, it will drop all other packets. But, in the future
      commits we are planning to add functionality where tstamp value will be set
      by another qdisc. Also, some packets which are generated from within the
      kernel (e.g. ICMP packets) do not have any socket associated with them.
      
      So, this commit adds support for skip_sock_check. When this option is set,
      etf will skip checking for a socket and other associated options for all
      skbs.
      Signed-off-by: NVedang Patel <vedang.patel@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d14d2b20
    • V
      etf: Don't use BIT() in UAPI headers. · 9903c8dc
      Vedang Patel 提交于
      The BIT() macro isn't exported as part of the UAPI interface. So, the
      compile-test to ensure they are self contained fails. So, use _BITUL()
      instead.
      Signed-off-by: NVedang Patel <vedang.patel@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9903c8dc
    • J
      net: sched: refactor reinsert action · 720f22fe
      John Hurley 提交于
      The TC_ACT_REINSERT return type was added as an in-kernel only option to
      allow a packet ingress or egress redirect. This is used to avoid
      unnecessary skb clones in situations where they are not required. If a TC
      hook returns this code then the packet is 'reinserted' and no skb consume
      is carried out as no clone took place.
      
      This return type is only used in act_mirred. Rather than have the reinsert
      called from the main datapath, call it directly in act_mirred. Instead of
      returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
      which tells the caller that the packet has been stolen by another process
      and that no consume call is required.
      
      Moving all redirect calls to the act_mirred code is in preparation for
      tracking recursion created by act_mirred.
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      720f22fe
  3. 28 6月, 2019 1 次提交
    • L
      batman-adv: mcast: detect, distribute and maintain multicast router presence · 61caf3d1
      Linus Lüssing 提交于
      To be able to apply our group aware multicast optimizations to packets
      with a scope greater than link-local we need to not only keep track of
      multicast listeners but also multicast routers.
      
      With this patch a node detects the presence of multicast routers on
      its segment by checking if
      /proc/sys/net/ipv{4,6}/conf/<bat0|br0(bat)>/mc_forwarding is set for one
      thing. This option is enabled by multicast routing daemons and needed
      for the kernel's multicast routing tables to receive and route packets.
      
      For another thing if a bridge is configured on top of bat0 then the
      presence of an IPv6 multicast router behind this bridge is currently
      detected by checking for an IPv6 multicast "All Routers Address"
      (ff02::2). This should later be replaced by querying the bridge, which
      performs proper, RFC4286 compliant Multicast Router Discovery (our
      simplified approach includes more hosts than necessary, most notably
      not just multicast routers but also unicast ones and is not applicable
      for IPv4).
      
      If no multicast router is detected then this is signalized via the new
      BATADV_MCAST_WANT_NO_RTR4 and BATADV_MCAST_WANT_NO_RTR6
      multicast tvlv flags.
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      61caf3d1
  4. 27 6月, 2019 8 次提交
  5. 26 6月, 2019 8 次提交
  6. 25 6月, 2019 4 次提交
    • M
      net/mlx5: Convert mkey_table to XArray · 792c4e9d
      Matthew Wilcox 提交于
      The lock protecting the data structure does not need to be an rwlock.  The
      only read access to the lock is in an error path, and if that's limiting
      your scalability, you have bigger performance problems.
      
      Eliminate mlx5_mkey_table in favour of using the xarray directly.
      reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
      be called in interrupt context.
      
      This also fixes a minor bug where SRCU locking was being used on the radix
      tree read side, when RCU was needed too.
      Signed-off-by: NMatthew Wilcox <willy@infradead.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      792c4e9d
    • S
      ipv6: Dump route exceptions if requested · 1e47b483
      Stefano Brivio 提交于
      Since commit 2b760fcf ("ipv6: hook up exception table to store dst
      cache"), route exceptions reside in a separate hash table, and won't be
      found by walking the FIB, so they won't be dumped to userspace on a
      RTM_GETROUTE message.
      
      This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
      have no function anymore:
      
       # ip -6 route get fc00:3::1
       fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
       # ip -6 route get fc00:4::1
       fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
       # ip -6 route list cache
       # ip -6 route flush cache
       # ip -6 route get fc00:3::1
       fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
       # ip -6 route get fc00:4::1
       fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium
      
      because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
      by listing all the routes, and deleting them with RTM_DELROUTE one by one.
      
      If cached routes are requested using the RTM_F_CLONED flag together with
      strict checking, or if no strict checking is requested (and hence we can't
      consistently apply filters), look up exceptions in the hash table
      associated with the current fib6_info in rt6_dump_route(), and, if present
      and not expired, add them to the dump.
      
      We might be unable to dump all the entries for a given node in a single
      message, so keep track of how many entries were handled for the current
      node in fib6_walker, and skip that amount in case we start from the same
      partially dumped node.
      
      When a partial dump restarts, as the starting node might change when
      'sernum' changes, we have no guarantee that we need to skip the same
      amount of in-node entries. Therefore, we need two counters, and we need to
      zero the in-node counter if the node from which the dump is resumed
      differs.
      
      Note that, with the current version of iproute2, this only fixes the
      'ip -6 route list cache': on a flush command, iproute2 doesn't pass
      RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is
      still unable to fetch the routes to be flushed. This will be addressed in
      a patch for iproute2.
      
      To flush cached routes, a procfs entry could be introduced instead: that's
      how it works for IPv4. We already have a rt6_flush_exception() function
      ready to be wired to it. However, this would not solve the issue for
      listing.
      
      Versions of iproute2 and kernel tested:
      
                          iproute2
      kernel             4.14.0   4.15.0   4.19.0   5.0.0   5.1.0    5.1.0, patched
       3.18    list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.4     list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.9     list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.14    list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.15    list
               flush
       4.19    list
               flush
       5.0     list
               flush
       5.1     list
               flush
       with    list        +        +        +        +       +            +
       fix     flush       +        +        +                             +
      
      v7:
        - Explain usage of "skip" counters in commit message (suggested by
          David Ahern)
      
      v6:
        - Rebase onto net-next, use recently introduced nexthop walker
        - Make rt6_nh_dump_exceptions() a separate function (suggested by David
          Ahern)
      
      v5:
        - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH,
          update test results (flushing works with iproute2 < 5.0.0 now)
      
      v4:
        - Split NLM_F_MATCH and strict check handling in separate patches
        - Filter routes using RTM_F_CLONED: if it's not set, only return
          non-cached routes, and if it's set, only return cached routes:
          change requested by David Ahern and Martin Lau. This implies that
          iproute2 needs a separate patch to be able to flush IPv6 cached
          routes. This is not ideal because we can't fix the breakage caused
          by 2b760fcf entirely in kernel. However, two years have passed
          since then, and this makes it more tolerable
      
      v3:
        - More descriptive comment about expired exceptions in rt6_dump_route()
        - Swap return values of rt6_dump_route() (suggested by Martin Lau)
        - Don't zero skip_in_node in case we don't dump anything in a given pass
          (also suggested by Martin Lau)
        - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
          it's just a flag to indicate the route was cloned, not to filter on
          routes
      
      v2: Add tracking of number of entries to be skipped in current node after
          a partial dump. As we restart from the same node, if not all the
          exceptions for a given node fit in a single message, the dump will
          not terminate, as suggested by Martin Lau. This is a concrete
          possibility, setting up a big number of exceptions for the same route
          actually causes the issue, suggested by David Ahern.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e47b483
    • S
      ipv4: Dump route exceptions if requested · ee28906f
      Stefano Brivio 提交于
      Since commit 4895c771 ("ipv4: Add FIB nexthop exceptions."), cached
      exception routes are stored as a separate entity, so they are not dumped
      on a FIB dump, even if the RTM_F_CLONED flag is passed.
      
      This implies that the command 'ip route list cache' doesn't return any
      result anymore.
      
      If the RTM_F_CLONED is passed, and strict checking requested, retrieve
      nexthop exception routes and dump them. If no strict checking is
      requested, filtering can't be performed consistently: dump everything in
      that case.
      
      With this, we need to add an argument to the netlink callback in order to
      track how many entries were already dumped for the last leaf included in
      a partial netlink dump.
      
      A single additional argument is sufficient, even if we traverse logically
      nested structures (nexthop objects, hash table buckets, bucket chains): it
      doesn't matter if we stop in the middle of any of those, because they are
      always traversed the same way. As an example, s_i values in [], s_fa
      values in ():
      
        node (fa) #1 [1]
          nexthop #1
          bucket #1 -> #0 in chain (1)
          bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4)
          bucket #3 -> #0 in chain (5) -> #1 in chain (6)
      
          nexthop #2
          bucket #1 -> #0 in chain (7) -> #1 in chain (8)
          bucket #2 -> #0 in chain (9)
        --
        node (fa) #2 [2]
          nexthop #1
          bucket #1 -> #0 in chain (1) -> #1 in chain (2)
          bucket #2 -> #0 in chain (3)
      
      it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2)
      for "node #2": walking flattens all that.
      
      It would even be possible to drop the distinction between the in-tree
      (s_i) and in-node (s_fa) counter, but a further improvement might
      advise against this. This is only as accurate as the existing tracking
      mechanism for leaves: if a partial dump is restarted after exceptions
      are removed or expired, we might skip some non-dumped entries.
      
      To improve this, we could attach a 'sernum' attribute (similar to the
      one used for IPv6) to nexthop entities, and bump this counter whenever
      exceptions change: having a distinction between the two counters would
      make this more convenient.
      
      Listing of exception routes (modified routes pre-3.5) was tested against
      these versions of kernel and iproute2:
      
                          iproute2
      kernel         4.14.0   4.15.0   4.19.0   5.0.0   5.1.0
       3.5-rc4         +        +        +        +       +
       4.4
       4.9
       4.14
       4.15
       4.19
       5.0
       5.1
       fixed           +        +        +        +       +
      
      v7:
         - Move loop over nexthop objects to route.c, and pass struct fib_info
           and table ID to it, not a struct fib_alias (suggested by David Ahern)
         - While at it, note that the NULL check on fa->fa_info is redundant,
           and the check on RTNH_F_DEAD is also not consistent with what's done
           with regular route listing: just keep it for nhc_flags
         - Rename entry point function for dumping exceptions to
           fib_dump_info_fnhe(), and rearrange arguments for consistency with
           fib_dump_info()
         - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle
           one bucket at a time
         - Expand commit message to describe why we can have a single "skip"
           counter for all exceptions stored in bucket chains in nexthop objects
           (suggested by David Ahern)
      
      v6:
         - Rebased onto net-next
         - Loop over nexthop paths too. Move loop over fnhe buckets to route.c,
           avoids need to export rt_fill_info() and to touch exceptions from
           fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that
           (suggested by David Ahern)
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee28906f
    • S
      fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED · 564c91f7
      Stefano Brivio 提交于
      The following patches add back the ability to dump IPv4 and IPv6 exception
      routes, and we need to allow selection of regular routes or exceptions.
      
      Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
      iproute2 passes it in dump requests (except for IPv6 cache flush requests,
      this will be fixed in iproute2) and this used to work as long as
      exceptions were stored directly in the FIB, for both IPv4 and IPv6.
      
      Caveat: if strict checking is not requested (that is, if the dump request
      doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
      tables or route types.
      
      In this case, filtering on RTM_F_CLONED would be inconsistent: we would
      fix 'ip route list cache' by returning exception routes and at the same
      time introduce another bug in case another selector is present, e.g. on
      'ip route list cache table main' we would return all exception routes,
      without filtering on tables.
      
      Keep this consistent by applying no filters at all, and dumping both
      routes and exceptions, if strict checking is not requested. iproute2
      currently filters results anyway, and no unwanted results will be
      presented to the user. The kernel will just dump more data than needed.
      
      v7: No changes
      
      v6: Rebase onto net-next, no changes
      
      v5: New patch: add dump_routes and dump_exceptions flags in filter and
          simply clear the unwanted one if strict checking is enabled, don't
          ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
          Skip filtering altogether if no strict checking is requested:
          selecting routes or exceptions only would be inconsistent with the
          fact we can't filter on tables.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      564c91f7
  7. 24 6月, 2019 6 次提交
    • D
      net/tls: fix page double free on TX cleanup · 9354544c
      Dirk van der Merwe 提交于
      With commit 94850257 ("tls: Fix tls_device handling of partial records")
      a new path was introduced to cleanup partial records during sk_proto_close.
      This path does not handle the SW KTLS tx_list cleanup.
      
      This is unnecessary though since the free_resources calls for both
      SW and offload paths will cleanup a partial record.
      
      The visible effect is the following warning, but this bug also causes
      a page double free.
      
          WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
          RIP: 0010:sk_stream_kill_queues+0x103/0x110
          RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206
          RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007
          RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270
          RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a
          R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007
          R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0
          Call Trace:
           inet_csk_destroy_sock+0x55/0x100
           tcp_close+0x25d/0x400
           ? tcp_check_oom+0x120/0x120
           tls_sk_proto_close+0x127/0x1c0
           inet_release+0x3c/0x60
           __sock_release+0x3d/0xb0
           sock_close+0x11/0x20
           __fput+0xd8/0x210
           task_work_run+0x84/0xa0
           do_exit+0x2dc/0xb90
           ? release_sock+0x43/0x90
           do_group_exit+0x3a/0xa0
           get_signal+0x295/0x720
           do_signal+0x36/0x610
           ? SYSC_recvfrom+0x11d/0x130
           exit_to_usermode_loop+0x69/0xb0
           do_syscall_64+0x173/0x180
           entry_SYSCALL_64_after_hwframe+0x3d/0xa2
          RIP: 0033:0x7fe9b9abc10d
          RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
          RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d
          RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430
          RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080
          R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000
      
      Fixes: 94850257 ("tls: Fix tls_device handling of partial records")
      Signed-off-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9354544c
    • W
      ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF · 7d9e5f42
      Wei Wang 提交于
      For tx path, in most cases, we still have to take refcnt on the dst
      cause the caller is caching the dst somewhere. But it still is
      beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the
      route lookup. It is cause this flag prevents manipulating refcnt on
      net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
      routing table. The null_entry is a shared object and constant updates on
      it cause false sharing.
      
      We converted the current major lookup function ip6_route_output_flags()
      to make use of RT6_LOOKUP_F_DST_NOREF.
      
      Together with the change in the rx path, we see noticable performance
      boost:
      I ran synflood tests between 2 hosts under the same switch. Both hosts
      have 20G mlx NIC, and 8 tx/rx queues.
      Sender sends pure SYN flood with random src IPs and ports using trafgen.
      Receiver has a simple TCP listener on the target port.
      Both hosts have multiple custom rules:
      - For incoming packets, only local table is traversed.
      - For outgoing packets, 3 tables are traversed to find the route.
      The packet processing rate on the receiver is as follows:
      - Before the fix: 3.78Mpps
      - After the fix:  5.50Mpps
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d9e5f42
    • W
      ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic · d64a1f57
      Wei Wang 提交于
      This patch specifically converts the rule lookup logic to honor this
      flag and not release refcnt when traversing each rule and calling
      lookup() on each routing table.
      Similar to previous patch, we also need some special handling of dst
      entries in uncached list because there is always 1 refcnt taken for them
      even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d64a1f57
    • W
      ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route() · 0e09edcc
      Wei Wang 提交于
      This new flag is to instruct the route lookup function to not take
      refcnt on the dst entry. The user which does route lookup with this flag
      must properly use rcu protection.
      ip6_pol_route() is the major route lookup function for both tx and rx
      path.
      In this function:
      Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and
      directly return the route entry. The caller should be holding rcu lock
      when using this flag, and decide whether to take refcnt or not.
      
      One note on the dst cache in the uncached_list:
      As uncached_list does not consume refcnt, one refcnt is always returned
      back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Uncached dst is only possible in the output path. So in such call path,
      caller MUST check if the dst is in the uncached_list before assuming
      that there is no refcnt taken on the returned dst.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e09edcc
    • Q
      inet: fix compilation warnings in fqdir_pre_exit() · 08003d0b
      Qian Cai 提交于
      The linux-next commit "inet: fix various use-after-free in defrags
      units" [1] introduced compilation warnings,
      
      ./include/net/inet_frag.h:117:1: warning: 'inline' is not at beginning
      of declaration [-Wold-style-declaration]
       static void inline fqdir_pre_exit(struct fqdir *fqdir)
       ^~~~~~
      In file included from ./include/net/netns/ipv4.h:10,
                       from ./include/net/net_namespace.h:20,
                       from ./include/linux/netdevice.h:38,
                       from ./include/linux/icmpv6.h:13,
                       from ./include/linux/ipv6.h:86,
                       from ./include/net/ipv6.h:12,
                       from ./include/rdma/ib_verbs.h:51,
                       from ./include/linux/mlx5/device.h:37,
                       from ./include/linux/mlx5/driver.h:51,
                       from
      drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:37:
      
      [1] https://lore.kernel.org/netdev/20190618180900.88939-3-edumazet@google.com/Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08003d0b
    • T
      mtd: spi-nor: use 16-bit WRR command when QE is set on spansion flashes · 191f5c2e
      Tudor Ambarus 提交于
      SPI memory devices from different manufacturers have widely
      different configurations for Status, Control and Configuration
      registers. JEDEC 216C defines a new map for these common register
      bits and their functions, and describes how the individual bits may
      be accessed for a specific device. For the JEDEC 216B compliant
      flashes, we can partially deduce Status and Configuration registers
      functions by inspecting the 16th DWORD of BFPT. Older flashes that
      don't declare the SFDP tables (SPANSION FL512SAIFG1 311QQ063 A ©11
      SPANSION) let the software decide how to interact with these registers.
      
      The commit dcb4b22e ("spi-nor: s25fl512s supports region locking")
      uncovered a probe error for s25fl512s, when the Quad Enable bit CR[1]
      was set to one in the bootloader. When this bit is one, only the Write
      Status (01h) command with two data byts may be used, the 01h command with
      one data byte is not recognized and hence the error when trying to clear
      the block protection bits.
      
      Fix the above by using the Write Status (01h) command with two data bytes
      when the Quad Enable bit is one.
      
      Backward compatibility should be fine. The newly introduced
      spi_nor_spansion_clear_sr_bp() is tightly coupled with the
      spansion_quad_enable() function. Both assume that the Write Register
      with 16 bits, together with the Read Configuration Register (35h)
      instructions are supported.
      
      Fixes: dcb4b22e ("spi-nor: s25fl512s supports region locking")
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NTudor Ambarus <tudor.ambarus@microchip.com>
      Tested-by: NJonas Bonn <jonas@norrbonn.se>
      Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NVignesh Raghavendra <vigneshr@ti.com>
      Tested-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
      191f5c2e
  8. 23 6月, 2019 1 次提交
    • A
      net: fastopen: robustness and endianness fixes for SipHash · 438ac880
      Ard Biesheuvel 提交于
      Some changes to the TCP fastopen code to make it more robust
      against future changes in the choice of key/cookie size, etc.
      
      - Instead of keeping the SipHash key in an untyped u8[] buffer
        and casting it to the right type upon use, use the correct
        type directly. This ensures that the key will appear at the
        correct alignment if we ever change the way these data
        structures are allocated. (Currently, they are only allocated
        via kmalloc so they always appear at the correct alignment)
      
      - Use DIV_ROUND_UP when sizing the u64[] array to hold the
        cookie, so it is always of sufficient size, even if
        TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.
      
      - Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
        function, which is no longer used.
      
      - Add endian swabbing when setting the keys and calculating the hash,
        to ensure that cookie values are the same for a given key and
        source/destination address pair regardless of the endianness of
        the server.
      
      Note that none of these are functional changes wrt the current
      state of the code, with the exception of the swabbing, which only
      affects big endian systems.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      438ac880
  9. 22 6月, 2019 2 次提交
  10. 21 6月, 2019 1 次提交
    • A
      netfilter: fix nf_conntrack_bridge/ipv6 link error · 43a38c3f
      Arnd Bergmann 提交于
      When CONFIG_IPV6 is disabled, the bridge netfilter code
      produces a link error:
      
      ERROR: "br_ip6_fragment" [net/bridge/netfilter/nf_conntrack_bridge.ko] undefined!
      ERROR: "nf_ct_frag6_gather" [net/bridge/netfilter/nf_conntrack_bridge.ko] undefined!
      
      The problem is that it assumes that whenever IPV6 is not a loadable
      module, we can call the functions direction. This is clearly
      not true when IPV6 is disabled.
      
      There are two other functions defined like this in linux/netfilter_ipv6.h,
      so change them all the same way.
      
      Fixes: 764dd163 ("netfilter: nf_conntrack_bridge: add support for IPv6")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      43a38c3f