1. 29 Jun, 2013 11 commits
  2. 28 Jun, 2013 6 commits
    •
      bonding: when cloning a MAC use NET_ADDR_STOLEN · ae0d6750
      nikolay@redhat.com committed
      A simple semantic change: when a slave's MAC is cloned by the bond
      master, set addr_assign_type to NET_ADDR_STOLEN instead of
      NET_ADDR_SET. Also use bond_set_dev_addr() in BOND_FOM_ACTIVE mode
      to change the bond's MAC address, because the assign_type has to be
      set properly.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: remove unnecessary dev_addr_from_first member · 97a1e639
      nikolay@redhat.com committed
      In struct bonding there's a member called dev_addr_from_first which is
      used to denote when the bond dev should clone the first slave's MAC
      address but since we have netdev's addr_assign_type variable that is not
      necessary. We clone the first slave's MAC each time we have a random MAC
      set to the bond device. This has the nice side-effect of also fixing an
      inconsistency - when the MAC address of the bond dev is set after its
      creation, but prior to having slaves, it's not kept and the first slave's
      MAC is cloned. The only way to keep the MAC was to create the bond device
      with the MAC address set (e.g. through ip link). In all cases if the
      bond device is left without any slaves - its MAC gets reset to a random
      one as before.
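The address rule described in the two bonding commits above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: `toy_netdev`, `toy_clone_mac()` and `toy_enslave_first()` are hypothetical names, and the enum values are local to the example rather than taken from the kernel headers.

```c
#include <assert.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Local stand-ins for the address-assignment types named in the commit
 * messages; the numeric values here are arbitrary. */
enum toy_addr_assign_type {
	TOY_NET_ADDR_RANDOM,	/* address was generated randomly */
	TOY_NET_ADDR_STOLEN,	/* address was cloned from another device */
	TOY_NET_ADDR_SET	/* address was set explicitly by the user */
};

/* Hypothetical, heavily reduced model of a net device. */
struct toy_netdev {
	unsigned char dev_addr[TOY_ETH_ALEN];
	enum toy_addr_assign_type addr_assign_type;
};

/* Clone the slave's MAC into the bond and record that it was stolen. */
void toy_clone_mac(struct toy_netdev *bond, const struct toy_netdev *slave)
{
	assert(bond != NULL && slave != NULL);
	memcpy(bond->dev_addr, slave->dev_addr, TOY_ETH_ALEN);
	bond->addr_assign_type = TOY_NET_ADDR_STOLEN;
}

/* When the first slave is added, clone its MAC only if the bond still
 * carries a random address; a user-set address is kept.
 * Returns 1 if the MAC was cloned, 0 otherwise. */
int toy_enslave_first(struct toy_netdev *bond, const struct toy_netdev *slave)
{
	assert(bond != NULL && slave != NULL);
	if (bond->addr_assign_type != TOY_NET_ADDR_RANDOM)
		return 0;
	toy_clone_mac(bond, slave);
	return 1;
}
```

The point of the model is that the assignment type alone carries the decision, so no separate "clone from first slave" flag is needed.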
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: remove unnecessary setup_by_slave member · 8d2ada77
      nikolay@redhat.com committed
      We have a member called setup_by_slave in struct bonding to denote if the
      bond dev has different type than ARPHRD_ETHER, but that is already denoted
      in bond's netdev type variable if it was setup by the slave, so use that
      instead of the member.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      netlink: fix splat in skb_clone with large messages · 3a36515f
      Pablo Neira committed
      Since (c05cdb1b netlink: allow large data transfers from user-space),
      netlink splats if it invokes skb_clone on large netlink skbs since:
      
      * skb_shared_info was not correctly initialized.
      * skb->destructor is not set in the cloned skb.
      
      This was spotted by trinity:
      
      [  894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
      [  894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
      [...]
      [  894.991034] Call Trace:
      [  894.991034]  [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
      [  894.991034]  [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
      [  894.991034]  [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
      [  894.991034]  [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0
      
      Fix it by:
      
      1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
         that sets our special skb->destructor in the cloned skb. Moreover, handle
         the release of the large cloned skb head area in the destructor path.
      
      2) not allowing large skbuffs in the netlink broadcast path. I cannot find
         any reasonable use of the large data transfer using netlink in that path,
         moreover this helps to skip extra skb_clone handling.
      
      I found two more netlink clients that are cloning the skbs, but they are
      not in the sendmsg path. Therefore, the sole client cloning that I found
      seems to be the fib frontend.
      
      Thanks to Eric Dumazet for helping to address this issue.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      sit: add support of x-netns · 5e6700b3
      Nicolas Dichtel committed
      This patch allows switching the netns when a packet is encapsulated or
      decapsulated. In other words, the encapsulated packet is received in one
      netns, where the lookup is done to find the tunnel. Once the tunnel is
      found, the packet is decapsulated and injected into the corresponding
      interface, which resides in another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      dev: introduce skb_scrub_packet() · 621e84d6
      Nicolas Dichtel committed
      The goal of this new function is to perform all needed cleanup before sending
      an skb into another netns.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 27 Jun, 2013 4 commits
  4. 26 Jun, 2013 19 commits
    •
      arc_emac: fix compile-time errors & warnings on PPC64 · a4a1139b
      Alexey Brodkin committed
      As reported by "kbuild test robot", there were some errors and warnings
      when attempting to build the kernel with "make ARCH=powerpc allmodconfig".

      This patch addresses both the errors and the warnings.
      Below is a list of introduced changes:
      1. Fix compile-time errors (misspellings in "dma_unmap_single") on PPC.
      2. Use DMA address instead of "skb->data" as a pointer to data buffer.
      This fixed warnings on pointer to int conversion on 64-bit systems.
      3. Re-implemented initial allocation of Rx buffers in "arc_emac_open" in
      the same way they're re-allocated during operation (receiving packets).
      So once again DMA address could be used instead of "skb->data".
      4. Explicitly use EMAC_BUFFER_SIZE for Rx buffers allocation.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      
      Cc: netdev@vger.kernel.org
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Mischa Jonker <mjonker@synopsys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: Florian Fainelli <florian@openwrt.org>
      Cc: David Laight <david.laight@aculab.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: add an option to fail when any of arp_ip_target is inaccessible · 8599b52e
      Veaceslav Falico committed
      Currently, we fail only when all of the ips in arp_ip_target are gone.
      However, in some situations we might need to fail if even one host from
      arp_ip_target becomes unavailable.
      
      All situations, obviously, rely on the idea that we need *completely*
      functional network, with all interfaces/addresses working correctly.
      
      One real world example might be:
      vlans on top of a bond (hybrid port). If the bond and vlans have ips assigned
      and we have their peers monitored via arp_ip_target - in case of switch
      misconfiguration (trunk/access port), slave driver malfunction or
      tagged/untagged traffic dropped on the way - we will be able to switch
      to another slave.
      
      Any other configuration where we need access to all arp_ip_targets
      needs this as well.
      
      This patch adds this possibility by adding a new parameter -
      arp_all_targets (both as a module parameter and as a sysfs knob). It can be
      set to:
      
      	0 or any (the default) - which works exactly as it's working now -
      	the slave is up if any of the arp_ip_targets are up.
      
      	1 or all - the slave is up if all of the arp_ip_targets are up.
      
      This parameter can be changed on the fly (via sysfs), and requires the mode
      to be active-backup and arp_validate to be enabled (it obeys the
      arp_validate config on which slaves to validate).
      
      Internally it's done through:
      
      1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
         an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
         last time we've received arp from bond->params.arp_targets[i] on this
         slave.
      
      2) If we successfully validate an arp from bond->params.arp_targets[i] in
         bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
         current jiffies value.
      
      3) When getting slave's last_rx via slave_last_rx(), we return the oldest
         time when we've received an arp from any address in
         bond->params.arp_targets[].
      
      If the value of arp_all_targets == 0 - we still work the same way as
      before.
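The three internal steps above can be sketched in plain C. This is a toy model, not the driver code: `toy_slave`, `toy_slave_last_rx()` and `TOY_MAX_ARP_TARGETS` are hypothetical names, timestamps are plain counters standing in for jiffies (wraparound is ignored), and falling back to `last_arp_rx` when no target is configured is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARP_TARGETS 16	/* arbitrary size for the example */

/* Hypothetical, reduced model of a bonding slave. */
struct toy_slave {
	/* last time any validated arp was received on this slave */
	unsigned long last_arp_rx;
	/* last time an arp was received from target i on this slave */
	unsigned long target_last_arp_rx[TOY_MAX_ARP_TARGETS];
};

/* Return the timestamp the link monitor should judge this slave by.
 * arp_targets[i] == 0 marks an unused slot.
 * arp_all_targets == 0 ("any"): the most recent arp counts.
 * arp_all_targets == 1 ("all"): the oldest per-target arp counts, so a
 * single silent target makes the slave look stale. */
unsigned long toy_slave_last_rx(const struct toy_slave *slave,
				const unsigned int *arp_targets,
				int arp_all_targets)
{
	unsigned long oldest = 0;
	int seen = 0;
	int i;

	assert(slave != NULL && arp_targets != NULL);

	if (!arp_all_targets)
		return slave->last_arp_rx;

	for (i = 0; i < TOY_MAX_ARP_TARGETS; i++) {
		if (arp_targets[i] == 0)
			continue;
		if (!seen || slave->target_last_arp_rx[i] < oldest)
			oldest = slave->target_last_arp_rx[i];
		seen = 1;
	}
	/* Assumption: with no targets configured, behave like "any". */
	return seen ? oldest : slave->last_arp_rx;
}
```

With two targets last heard at ticks 500 and 120, "any" mode reports 500 while "all" mode reports 120, which is what lets one unreachable target fail the slave.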
      
      Also, update the documentation to reflect the new parameter.
      
      v3->v4:
      Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
      more clear, don't fail setting arp_all_targets if arp_validate is not set -
      it has no effect anyway but can be easier to set up. Also, print a warning
      if the last arp_ip_target is removed while the arp_interval is on, but not
      the arp_validate.
      
      v2->v3:
      Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
      arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
      use the same initialization value for target_last_arp_rx[] as is used
      for the default last_arp_rx, to avoid useless interface flaps.
      
      Also, instead of failing to remove the last arp_ip_target just print a
      warning - otherwise it might break existing scripts.
      
      v1->v2:
      Correctly handle adding/removing hosts in arp_ip_target - we need to
      shift/initialize all slave's target_last_arp_rx. Also, don't fail module
      loading on arp_all_targets misconfiguration, just disable it, and some
      minor style fixes.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: doc: some details on backup slave arp validation · d7d35c68
      Veaceslav Falico committed
      Add some details to bonding documentation on how backup slave arp
      validation works.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: don't trust arp requests unless active slave really works · aeea64ac
      Veaceslav Falico committed
      Currently, if we receive any arp packet on a backup slave in active-backup
      mode and arp_validate enabled, we suppose that it's an arp request, swap
      source/target ip and try to validate it. This optimization gives us
      virtually no downtime in the most common situation (active and backup
      slaves are in the same broadcast domain and the active slave failed).
      
      However, if we can't reach the arp_ip_target(s), we end up in an endless
      loop of reselecting slaves, because we receive our arp requests, sent by
      the active slave, and think that backup slaves are up, thus selecting them
      as active and, again, sending arp requests, which fool our backup slaves.
      
      Fix this by not validating the swapped arp packets if the current active
      slave didn't receive any arp reply after it was selected as active. This
      way we will only accept arp requests if we know that the current active
      slave can actually reach arp_ip_target.
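The fix boils down to one extra condition before a backup slave accepts a swapped arp. A minimal sketch of that condition, assuming plain tick counters instead of jiffies; `toy_bond` and `toy_may_validate_swapped()` are hypothetical names, and treating "no active slave" as "do not validate" is an assumption of this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the bond's current active slave. */
struct toy_bond {
	bool has_active;		/* is there a current active slave? */
	unsigned long became_active;	/* tick at which it was selected */
	unsigned long active_last_rx;	/* tick of its last validated arp reply */
};

/* May a backup slave treat an incoming arp as one of our own requests
 * (swap source/target ip and validate it)?  Only if the active slave has
 * received an arp reply after it was selected as active, i.e. it has
 * shown that it can actually reach an arp_ip_target. */
bool toy_may_validate_swapped(const struct toy_bond *bond)
{
	assert(bond != NULL);
	if (!bond->has_active)
		return false;
	return bond->active_last_rx > bond->became_active;
}
```

When no target is reachable, the active slave never gets a reply after its selection, the condition stays false, and the bond's own requests can no longer make backup slaves look alive.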
      
      v3->v4:
      Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion.
      
      v1->v3:
      No change.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: don't validate arp if we don't have to · 2c146102
      Veaceslav Falico committed
      Currently, we validate all the incoming arps if arp_validate is not 0.
      However, we don't have to validate backup slaves if arp_validate == active
      and vice versa, so return early in bond_arp_rcv() in these cases.
      
      It works correctly now because we verify arp_validate in slave_last_rx(),
      however we're just doing useless work in bond_arp_rcv().
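The early-return test can be pictured as a small predicate. This sketch assumes the validation modes are encoded as a two-bit mask (one bit for active slaves, one for backup slaves); that encoding and the names `toy_arp_validate` and `toy_should_validate()` are choices made for the example, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: bit 0 = validate on active slaves,
 * bit 1 = validate on backup slaves. */
enum toy_arp_validate {
	TOY_VALIDATE_NONE   = 0,
	TOY_VALIDATE_ACTIVE = 1,
	TOY_VALIDATE_BACKUP = 2,
	TOY_VALIDATE_ALL    = 3
};

/* Should an arp received on this slave be validated at all?  A receive
 * handler can return early whenever this is false. */
bool toy_should_validate(enum toy_arp_validate mode, bool slave_is_active)
{
	int wanted = slave_is_active ? TOY_VALIDATE_ACTIVE : TOY_VALIDATE_BACKUP;

	assert(mode >= TOY_VALIDATE_NONE && mode <= TOY_VALIDATE_ALL);
	return ((int)mode & wanted) != 0;
}
```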
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: don't add duplicate targets to arp_ip_target · 0afee4e8
      Veaceslav Falico committed
      Print a warning and skip them.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: add helper function bond_get_targets_ip(targets, ip) · 87a7b84b
      Veaceslav Falico committed
      Add function bond_get_targets_ip(targets, ip) which searches through
      targets array of ips (arp_targets) and returns the position of first
      match. If ip == 0, returns the first free slot. On failure to find the
      ip or free slot, return -1.
      
      Use it to verify if the arp we've received is valid and in sysfs.
      
      v1->v2:
      Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
      per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
      update 'null' arp_ip_targets' last_rx. Also, address style.
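The lookup contract described in this commit message (position of the first match, first free slot when ip == 0, -1 otherwise) can be sketched as below. It is a userspace approximation: `toy_get_targets_ip()` and `TOY_MAX_ARP_TARGETS` are hypothetical names, and the sketch assumes that unused slots hold 0, which is why a single equality search covers both cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ARP_TARGETS 16	/* arbitrary size for the example */

/* Search the targets array for ip and return the index of the first match.
 * Because free slots are assumed to hold 0, calling this with ip == 0
 * returns the first free slot.  Returns -1 if nothing matches. */
int toy_get_targets_ip(const uint32_t *targets, uint32_t ip)
{
	int i;

	assert(targets != NULL);
	for (i = 0; i < TOY_MAX_ARP_TARGETS; i++) {
		if (targets[i] == ip)
			return i;
	}
	return -1;
}
```

A caller that validates a received arp would, as the v1->v2 note points out, also have to reject a source ip of 0.0.0.0 first; otherwise the lookup returns a free slot rather than a real target.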
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: davinci_mdio: gaurd the DT code with IS_ENABLED(CONFIG_OF) · 277e2a84
      Lad, Prabhakar committed
      guard the davinci_mdio_of_mtable table and davinci_mdio_probe_dt()
      with CONFIG_OF.
      Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: davinci_emac: simplify the OF parser code · 151328c8
      Lad, Prabhakar committed
      This patch cleans up the OF parser code, removes unnecessary checks
      on of_property_read_*() and guards davinci_emac_of_match table with
      CONFIG_OF.
      Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: davinci: emac: Convert to devm_* api · 6892b41d
      Lad, Prabhakar committed
      Use devm_ioremap_resource instead of devm_request_mem_region()/devm_ioremap()
      and devm_request_irq() instead of request_irq().
      
      This ensures more consistent error values and simplifies error paths.
      Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      doc: fix some syntax errors in netlink mmap sample code · 76237576
      Cong Wang committed
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      macvtap: Perform GSO on forwarding path. · 3e4f8b78
      Vlad Yasevich committed
      When macvtap forwards skb to its tap, it needs to check
      if GSO needs to be performed.  This is sometimes necessary
      when the HW device performed GRO, but the guest reading
      from the tap does not support it (ex: Windows 7).
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      macvtap: Let TUNSETOFFLOAD actually controll offload features. · 2be5c767
      Vlad Yasevich committed
      When the user issues the TUNSETOFFLOAD ioctl, macvtap does not do
      anything other than verify the arguments.  This patch adds
      functionality to allow users to actually control offload features.
      NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the
      features can be controlled.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      macvtap: Consistently use rcu functions · ac4e4af1
      Vlad Yasevich committed
      Currently macvtap uses rcu_bh functions in its
      user-facing functions macvtap_get_user() and macvtap_put_user().
      However, its packet handlers use normal rcu as the rcu_read_lock()
      is taken in netif_receive_skb().  We can safely discontinue
      the usage of rcu with bh disabled.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      macvtap: Convert to using rtnl lock · 441ac0fc
      Vlad Yasevich committed
      Macvtap uses a private lock to protect the relationship between
      macvtap_queue and macvlan_dev.  The private lock is not needed
      since the relationship is managed by user via open(), release(),
      and dellink() calls.  dellink() already happens under rtnl, so
      we can safely convert open() and release(), and use it in ioctl()
      as well.
      
      Suggested by Eric Dumazet.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: poll/select low latency socket support · 2d48d67f
      Eliezer Tamir committed
      Add select/poll busy-poll support.

      Split the sysctl value into two separate ones, one for read and one for poll.
      Update Documentation/sysctl/net.txt accordingly.
      
      Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
      sk_poll_ll if possible. sock_poll sets this flag in its return value
      to indicate to select/poll when a socket that can busy poll is found.
      
      When poll/select have nothing to report, call the low-level
      sock_poll again until we are out of time or we find something.
      
      Once the system call finds something, it stops setting POLL_LL, so it can
      return the result to the user ASAP.
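The control flow described above can be simulated in userspace. The sketch below is a toy model only: `toy_sock`, `toy_sock_poll()`, `toy_poll_all()` and the `TOY_*` flag values are hypothetical, and a pass budget stands in for the real time limit.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POLLIN  0x1u	/* data is ready */
#define TOY_POLL_LL 0x2u	/* "busy polling is possible/requested" */

/* Hypothetical socket that becomes readable after a number of polls. */
struct toy_sock {
	int ready_after;	/* polls that come back empty first */
	int polls;		/* polls performed so far */
};

/* Per-socket poll.  If nothing is ready and the caller asked for busy
 * polling, echo TOY_POLL_LL back to signal that this socket can busy poll. */
unsigned int toy_sock_poll(struct toy_sock *sk, unsigned int flags)
{
	sk->polls++;
	if (sk->polls > sk->ready_after)
		return TOY_POLLIN;
	return (flags & TOY_POLL_LL) ? TOY_POLL_LL : 0;
}

/* Poll all sockets; while nothing is ready and at least one socket can
 * busy poll, go around again, for at most `budget` extra passes. */
unsigned int toy_poll_all(struct toy_sock *socks, size_t n, int budget)
{
	unsigned int flags = TOY_POLL_LL;
	unsigned int found = 0;

	assert(socks != NULL || n == 0);
	for (;;) {
		int can_busy_loop = 0;
		size_t i;

		for (i = 0; i < n; i++) {
			unsigned int mask = toy_sock_poll(&socks[i], flags);

			if (mask & TOY_POLLIN) {
				found |= TOY_POLLIN;
				/* Found something: stop asking for busy
				 * polling so we can return quickly. */
				flags &= ~TOY_POLL_LL;
			}
			if (mask & TOY_POLL_LL)
				can_busy_loop = 1;
		}
		if (found || !can_busy_loop || budget-- <= 0)
			return found;
	}
}
```

In this model the flag travels in both directions, as in the description: the caller passes it down to request busy polling, and a socket passes it back up to say that another pass is worthwhile.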
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      ethernet/arc/arc_emac - Add new driver · e4f2379d
      Alexey Brodkin committed
      Driver for non-standard on-chip ethernet device ARC EMAC 10/100,
      instantiated in some legacy ARC (Synopsys) FPGA Boards such as
      ARCAngel4/ML50x.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Mischa Jonker <mjonker@synopsys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: Florian Fainelli <florian@openwrt.org>
      Cc: David Laight <david.laight@aculab.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: sctp: simplify sctp_get_port · 62208f12
      Daniel Borkmann committed
      No need to have an extra ret variable when we directly can return
      the value of sctp_get_port_local().
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: sctp: decouple cleaning some socket data from endpoint · 0a2fbac1
      Daniel Borkmann committed
      Rather than having the endpoint clean the garbage from the
      socket, use a sk_destruct handler, sctp_destruct_sock(), that does
      the job when there are no more references on the socket.
      At least do this for our crypto transform through crypto_free_hash()
      that is allocated when in listening state.
      
      Also, perform sctp_put_port() only when sk is valid. At a later
      point in time we can still determine if there's an option of
      placing this into sk_prot->unhash() or sctp_endpoint_free() without
      any races. For now, leave it in sctp_endpoint_destroy() though.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>