提交 · 4b307a8edb6b6f59b6f2bfe9f36fcec6e43ec911 · openanolis / cloud-kernel

07 5月, 2016 11 次提交

Merge branch 'bpf-direct-pkt-access' · 4b307a8e

由 David S. Miller 提交于 5月 06, 2016

Alexei Starovoitov says:

====================
bpf: introduce direct packet access

This set of patches introduce 'direct packet access' from
cls_bpf and act_bpf programs (which are root only).

Current bpf programs use LD_ABS, LD_INS instructions which have
to do 'if (off < skb_headlen)' for every packet access.
It's ok for socket filters, but too slow for XDP, since single
LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost
of length check over multiple packet accesses via direct access
to skb->data, data_end pointers.

The existing packet parser typically look like:
  if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
     return 0;
  if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
      load_byte(skb, ETH_HLEN) != 0x45)
     return 0;
  ...
with 'direct packet access' the bpf program becomes:
   void *data = (void *)(long)skb->data;
   void *data_end = (void *)(long)skb->data_end;
   struct eth_hdr *eth = data;
   struct iphdr *iph = data + sizeof(*eth);

   if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
      return 0;
   if (eth->h_proto != htons(ETH_P_IP))
      return 0;
   if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
      return 0;
   ...
which is more natural to write and significantly faster.
See patch 6 for performance tests:
21Mpps(old) vs 24Mpps(new) with just 5 loads.
For more complex parsers the performance gain is higher.

The other approach implemented in [1] was adding two new instructions
to interpreter and JITs and was too hard to use from llvm side.
The approach presented here doesn't need any instruction changes,
but the verifier has to work harder to check safety of the packet access.

Patch 1 prepares the code and Patch 2 adds new checks for direct
packet access and all of them are gated with 'env->allow_ptr_leaks'
which is true for root only.
Patch 3 improves search pruning for large programs.
Patch 4 wires in verifier's changes with net/core/filter side.
Patch 5 updates docs
Patches 6 and 7 add tests.

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b307a8e

samples/bpf: add verifier tests · 883e44e4

由 Alexei Starovoitov 提交于 5月 05, 2016

add few tests for "pointer to packet" logic of the verifier
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

883e44e4

samples/bpf: add 'pointer to packet' tests · 65d472fb

由 Alexei Starovoitov 提交于 5月 05, 2016

parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9

parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.

parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.

simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs  = 21.4Mpps

Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.

These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65d472fb

bpf: add documentation for 'direct packet access' · f9c8d19d

由 Alexei Starovoitov 提交于 5月 05, 2016

explain how verifier checks safety of packet access
and update email addresses.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f9c8d19d

bpf: wire in data and data_end for cls_act_bpf · db58ba45

由 Alexei Starovoitov 提交于 5月 05, 2016

allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
The bpf helpers that change skb->data need to update data_end pointer as well.
The verifier checks that programs always reload data, data_end pointers
after calls to such bpf helpers.
We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db58ba45

bpf: improve verifier state equivalence · 735b4333

由 Alexei Starovoitov 提交于 5月 05, 2016

since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach
verifier its recognition of constants in conditional branches
without affecting safety.
Ex:
if (reg == 123) {
  .. here verifier was marking reg->type as CONST_IMM
     instead keep reg as UNKNOWN_VALUE
}

Two verifier states with UNKNOWN_VALUE are equivalent, whereas
CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range
verification and other cases.
So help search pruning by marking registers as UNKNOWN_VALUE
where possible instead of CONST_IMM.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

735b4333

bpf: direct packet access · 969bf05e

由 Alexei Starovoitov 提交于 5月 05, 2016

Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.

Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}

This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.

For XDP the direct store instructions can be allowed as well.

The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dwSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

969bf05e

bpf: cleanup verifier code · 1a0dc1ac

由 Alexei Starovoitov 提交于 5月 05, 2016

cleanup verifier code and prepare it for addition of "pointer to packet" logic
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a0dc1ac

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 95aef7ce

由 David S. Miller 提交于 5月 06, 2016

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-05-05

This series contains updates to i40e and i40evf.

The theme behind this series is code reduction, yeah!  Jesse provides
most of the changes starting with a refactor of the interpretation of
a tunnel which lets us start using the hardware's parsing.  Removed
the packet split receive routine and ancillary code in preparation
for the Rx-refactor.  The refactor of the receive routine,
aligns the receive routine with the one in ixgbe which was highly
optimized.  The hardware supports a 16 byte descriptor for receive,
but the driver was never using it in production.  There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole lot
of complexity while getting rid of the code.  Fixed a bug where while
changing the number of descriptors using ethtool, the driver did not
test the limits of the system memory before permanently assuming it
would be able to get receive buffer memory.

Mitch fixes a memory leak of one page each time the driver is opened by
allocating the correct number of receive buffers and do not fiddle with
next_to_use in the VF driver.

Arnd Bergmann fixed a indentation issue by adding the appropriate
curly braces in i40e_vc_config_promiscuous_mode_msg().

Julia Lawall fixed an issue found by Coccinelle, where i40e_client_ops
structure can be const since it is never modified.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95aef7ce

net: vrf: Create FIB tables on link create · b3b4663c

由 David Ahern 提交于 5月 04, 2016

Tables have to exist for VRFs to function. Ensure they exist
when VRF device is created.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3b4663c

cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure · f37bd0cc

由 Jon Maxwell 提交于 5月 05, 2016

We recently had a system crash in the cnic module. Vmcore analysis confirmed
that "ip link up" was executed which failed due to an allocation failure
because of memory fragmentation. Futher analysis revealed that the cnic irq
vector was still allocated after the "ip link up" that failed. When
"ip link down" was executed it called free_msi_irqs() which crashed the system
because the cnic irq was still inuse.

PANIC: "kernel BUG at drivers/pci/msi.c:411!"

The code execution was:

cnic_netdev_event()
if (event == NETDEV_UP) {
.
.
       ▹       if (!cnic_start_hw(dev))
cnic_start_hw()
calls cnic_cm_open() which failed with -ENOMEM
cnic_start_hw() then took the err1 path:

err1:↩
       cp->free_resc(dev);↩ <---- frees resources but not irq vector
       pci_dev_put(dev->pcidev);↩
       return err;↩
}↩

This returns control back to cnic_netdev_event() but now the cnic irq vector
is still allocated even although cnic_cm_open() failed. The next
"ip link down" while trigger the crash.

The cnic_start_hw() routine is not handling the allocation failure correctly.
Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating
that the hardware has been started in cnic_start_hw(). If it has then call
cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise
just maintain the previous behaviour and free cnic resources.

I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()s return
code.

# ip link set dev enpX down
# ip link set dev enpX up <--- hit's allocation failure
# ip link set dev enpX down <--- crashes here

With this patch I confirmed there was no crash in the reproducer.
Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f37bd0cc

06 5月, 2016 12 次提交

i40e: constify i40e_client_ops structure · 3949c4ac

由 Julia Lawall 提交于 5月 01, 2016

The i40e_client_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

3949c4ac

i40e: fix misleading indentation · ce927db4

由 Arnd Bergmann 提交于 4月 29, 2016

Newly added code in i40e_vc_config_promiscuous_mode_msg() is indented
in a way that gcc rightly complains about:

drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function 'i40e_vc_config_promiscuous_mode_msg':
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1543:4: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
    if (f->vlan >= 0 && f->vlan <= I40E_MAX_VLANID)
    ^~
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1550:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
     aq_err = pf->hw.aq.asq_last_status;

From the context, it looks like the aq_err assignment was meant to be
inside of the conditional expression, so I'm adding the appropriate
curly braces now.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: 5676a8b9 ("i40e: Add VF promiscuous mode driver support")
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

ce927db4

i40e: Test memory before ethtool alloc succeeds · 147e81ec

由 Jesse Brandeburg 提交于 4月 18, 2016

When testing on systems with very limited amounts of RAM, a bug was
found where, while changing the number of descriptors using ethtool,
the driver didn't test the limits of system memory before permanently
assuming it would be able to get receive buffer memory.

Work around this issue by pre-allocation of the receive buffer
memory, in the "ghost" ring, which is then used during reinit
using the new ring length.

Change-Id: I92d7a5fb59a6c884b2efdd1ec652845f101c3359
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

147e81ec

i40evf: Allocate Rx buffers properly · b163098e

由 Mitch Williams 提交于 4月 18, 2016

Allocate the correct number of RX buffers, and don't fiddle with
next_to_use. The common RX code handles all of this. This fixes a memory
leak of one page each time the driver is opened.

Change-Id: Id06eca353086e084921f047acad28c14745684ee
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

b163098e

i40e/i40evf: Remove unused hardware receive descriptor code · bec60fc4

由 Jesse Brandeburg 提交于 4月 18, 2016

The hardware supports a 16 byte descriptor for receive, but the
driver was never using it in production.  There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole
lot of complexity while getting rid of the code.

Also since the previous patch made us use no-split mode all the
time, drop any support in the driver for any other value in dtype
and assume it is always zero (aka no-split).

Hooray for code removal!

Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

bec60fc4

i40evf: refactor receive routine · ab9ad98e

由 Jesse Brandeburg 提交于 4月 18, 2016

This is part 2 of the Rx refactor series, just including
changes to i40evf.

This refactor aligns the receive routine with the one in
ixgbe which was highly optimized.  This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove VF ethtool control over packet split
- remove priv_flags interface since it is unused
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

ab9ad98e

i40evf: Drop packet split receive routine · 19b85e67

由 Jesse Brandeburg 提交于 4月 18, 2016

As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.

Some of the split related context set up code stays in
i40e_virtchnl_pf.c in case an older VF driver tries to load
and still wants to use packet split.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

19b85e67

i40e: Refactor receive routine · 1a557afc

由 Jesse Brandeburg 提交于 4月 20, 2016

This is part 1 of the Rx refactor series, just including
changes to i40e.

This refactor aligns the receive routine with the one in
ixgbe which was highly optimized.  This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove ethtool control over packet split

Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1a557afc

net/mlx4: Avoid wrong virtual mappings · 73898db0

由 Haggai Abramovsky 提交于 5月 04, 2016

The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory.  On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent.  Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent().  Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).

The mlx4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
device is opened.

Prevent Ethernet driver to run this problematic code by forcing it to
allocate contiguous memory. As for the Infiniband driver, at first we
are trying to allocate contiguous memory, but in case of failure roll
back to work with fragmented memory.
Signed-off-by: NHaggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Reported-by: NDavid Daney <david.daney@cavium.com>
Tested-by: NSinan Kaya <okaya@codeaurora.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73898db0

i40e/i40evf: Remove reference to ring->dtype · 04b3b779

由 Jesse Brandeburg 提交于 4月 18, 2016

As part of the rx-refactor, the dtype variable in the i40e_ring
struct is no longer used, so remove it.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

04b3b779

i40e: Drop packet split receive routine · b32bfa17

由 Jesse Brandeburg 提交于 4月 18, 2016

As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

b32bfa17

i40e/i40evf: Refactor tunnel interpretation · f8a952cb

由 Jesse Brandeburg 提交于 4月 18, 2016

Refactor the interpretation of a tunnel.  This removes
some code and lets us start using the hardware's parsing.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f8a952cb

05 5月, 2016 17 次提交

MAINTAINERS: Cleanup Intel Wired LAN maintainers list · 035cd6ba

由 Jeff Kirsher 提交于 5月 04, 2016

With the recent "retirements" and other changes, make the maintainers
list a lot less confusing and a bit more straight forward.
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: NShannon Nelson <sln@onemain.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

035cd6ba

tcp: two more missing bh disable · 777c6ae5

由 Eric Dumazet 提交于 5月 04, 2016

percpu_counter only have protection against preemption.

TCP stack uses them possibly from BH, so we need BH protection
in contexts that could be run in process context

Fixes: c10d9310 ("tcp: do not assume TCP code is non preemptible")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

777c6ae5

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · aa8a8b05

由 David S. Miller 提交于 5月 04, 2016

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2016-05-04

This series contains updates to ixgbe, ixgbevf and traffic class helpers.

Sridhar adds helper functions to the tc_mirred header to access tcf_mirred
information and then implements them for ixgbe to enable redirection to
a SRIOV VF or an offloaded MACVLAN device queue via tc 'mirred' action.

Amritha adds support to set filters with multiple header fields (L3,L4)
to match on.

KY Srinivasan from Microsoft add Hyper-V support into ixgbevf.

Emil adds 82599 sub-device IDs that were missing from the list of parts
that support WoL.  Then simplified the logic we use to determine WoL
support by reading the EEPROM bits for MACs X540 and newer.

Preethi cleaned up duplicate and unused device IDs.  Fixed our ethtool
stat reporting where we were ignoring higher 32 bits of stats registers,
so fill out 64 bit stat values into two 32 bit words.

Babu Moger from Oracle improves VF performance issues on SPARC.

Alex Duyck cleans up some of the Hyper-V implementation from KY so that
we can just use function pointers instead of having to identify if a
given VF is running on a Linux or Windows PF.

Usha makes sure that DCB and FCoE is disabled for X550EM_x/a MACs and
cleans up the DCB initialization in the process.

Tony cleans up the API for ixgbevf_update_xcast_mode() so we do not
have to pass in the netdev parameter, since it was never used in the
function.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa8a8b05

drivers: fix dev->trans_start removal fallout · 3e66bab3

由 Florian Westphal 提交于 5月 04, 2016

kbuild test robot reported a build failure on s390.
While at it, also fix missing conversion in the tilera driver.

Fixes: 9b36627a ("net: remove dev->trans_start")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e66bab3

bonding: update documentation section after dev->trans_start removal · 5c2a9644

由 Florian Westphal 提交于 5月 04, 2016

Drivers that use LLTX need to update trans_start of the netdev_queue.
(Most drivers don't use LLTX; stack does this update if .ndo_start_xmit
 returned TX_OK).
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c2a9644

usbnet: smsc95xx: silence an uninitialized variable warning · 5a36b68b

由 Dan Carpenter 提交于 5月 04, 2016

If the call to fn() fails then "buf" is uninitialized.  Just return the
error code in that case.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a36b68b

usbnet/smsc75xx: silence uninitialized variable warning · 58ef6a3f

由 Dan Carpenter 提交于 5月 04, 2016

If the fn() calls fail then "buf" is uninitialized.  Just return early
in that situation.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58ef6a3f

tcp: must block bh in __inet_twsk_hashdance() · 614bdd4d

由 Eric Dumazet 提交于 5月 03, 2016

__inet_twsk_hashdance() might be called from process context,
better block BH before acquiring bind hash and established locks

Fixes: c10d9310 ("tcp: do not assume TCP code is non preemptible")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

614bdd4d

tcp: fix lockdep splat in tcp_snd_una_update() · 46cc6e49

由 Eric Dumazet 提交于 5月 03, 2016

tcp_snd_una_update() and tcp_rcv_nxt_update() call
u64_stats_update_begin() either from process context or BH handler.

This triggers a lockdep splat on 32bit & SMP builds.

We could add u64_stats_update_begin_bh() variant but this would
slow down 32bit builds with useless local_disable_bh() and
local_enable_bh() pairs, since we own the socket lock at this point.

I add sock_owned_by_me() helper to have proper lockdep support
even on 64bit builds, and new u64_stats_update_begin_raw()
and u64_stats_update_end_raw methods.

Fixes: c10d9310 ("tcp: do not assume TCP code is non preemptible")
Reported-by: NFabio Estevam <festevam@gmail.com>
Diagnosed-by: NFrancois Romieu <romieu@fr.zoreil.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NFabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46cc6e49

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 5332174a

由 David S. Miller 提交于 5月 04, 2016

Antonio Quartulli says:

====================
pull request: batman-adv 20160504

In this pull request you have:
- two changes to the MAINTAINERS file where one marks our mailing list
  as moderated and the other adds a missing documentation file
- kernel-doc fixes
- code refactoring and various cleanups
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5332174a

mdio_bus: don't return NULL from mdiobus_scan() · e98a3aab

由 Sergei Shtylyov 提交于 5月 03, 2016

I've finally noticed that mdiobus_scan() also returns either NULL or error
value on failure. Return ERR_PTR(-ENODEV) instead of NULL since this is
the error value already filtered out by the callers that want to ignore
the MDIO address scan failure...
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e98a3aab

Merge branch 'kill_trans_start' · 4d659fcb

由 David S. Miller 提交于 5月 04, 2016

Florian Westphal says:

====================
net: remove trans_start from struct net_device

We currently have two instances for trans_start, once in
net_device and once in netdev_queue.

This series removes trans_start from net_device.
Updates to dev->trans_start are replaced with updates to netdev queue 0.

This series is compile-tested only.
Replacement is done in 3 steps:

1. Replace read-accesses:
  x = dev->trans_start

  gets replaced by
  x = dev_trans_start(dev)

2. Replace write accesses:
  dev->trans_start = jiffies;

  gets replaced with new helper:
  netif_trans_update(dev);

3. This helper is then changed to set
   netdev_get_tx_queue(dev, 0)->trans_start
   instead of dev->trans_start.

After this dev->trans_start can be removed.

It should be noted that after this series several instances
of netif_trans_update() are useless (if they occur in
.ndo_start_xmit and driver doesn't set LLTX flag -- stack already
did an update).

Comments welcome.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d659fcb

net: remove dev->trans_start · 9b36627a

由 Florian Westphal 提交于 5月 03, 2016

previous patches removed all direct accesses to dev->trans_start,
so change the netif_trans_update helper to update trans_start of
netdev queue 0 instead and then remove trans_start from struct net_device.

AFAICS a lot of the netif_trans_update() invocations are now useless
because they occur in ndo_start_xmit and driver doesn't set LLTX
(i.e. stack already took care of the update).

As I can't test any of them it seems better to just leave them alone.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b36627a

treewide: replace dev->trans_start update with helper · 860e9538

由 Florian Westphal 提交于 5月 03, 2016

Replace all trans_start updates with netif_trans_update helper.
change was done via spatch:

struct net_device *d;
@@
- d->trans_start = jiffies
+ netif_trans_update(d)

Compile tested only.

Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: linux-xtensa@linux-xtensa.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: MPT-FusionLinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-can@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: NMugunthan V N <mugunthanvnm@ti.com>
Acked-by: NAntonio Quartulli <a@unstable.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

860e9538

netdevice: add helper to update trans_start · ba162f8e

由 Florian Westphal 提交于 5月 03, 2016

trans_start exists twice:
- as member of net_device (legacy)
- as member of netdev_queue

In order to get rid of the legacy case, add a helper for the
dev->trans_update (this patch), then convert spots that do

dev->trans_start = jiffies

to use this helper (next patch).

This would then allow us to change the helper so that it updates the
trans_stamp of netdev queue 0 instead.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba162f8e

drivers: replace dev->trans_start accesses with dev_trans_start · 4d0e9657

由 Florian Westphal 提交于 5月 03, 2016

a trans_start struct member exists twice:
- in struct net_device (legacy)
- in struct netdev_queue

Instead of open-coding dev->trans_start usage to obtain the current
trans_start value, use dev_trans_start() instead.

This is not exactly the same, as dev_trans_start also considers
the trans_start values of the netdev queues owned by the device
and provides the most recent one.

For legacy devices this doesn't matter as dev_trans_start can cope
with netdev trans_start values of 0 (they are ignored).

This is a prerequisite to eventual removal of dev->trans_start.

Cc: linux-rdma@vger.kernel.org
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d0e9657

dmfe: kill DEVICE define · a6e5472d

由 Florian Westphal 提交于 5月 03, 2016

use net_device directly. Compile tested, objdiff shows no changes.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6e5472d

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功