1. 10 10月, 2013 1 次提交
    • E
      net: gro: allow to build full sized skb · 8a29111c
      Eric Dumazet 提交于
      skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
      typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
      
      It's relatively easy to extend the skb using frag_list to allow
      more frags to be appended into the last sk_buff.
      
      This still builds very efficient skbs, and allows reaching 45 MSS per
      skb.
      
      (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
      sk_buff)
      
      High speed TCP flows benefit from this extension by lowering TCP stack
      cpu usage (less packets stored in receive queue, less ACK packets
      processed)
      
      Forwarding setups could be hurt, as such skbs will need to be
      linearized, although its not a new problem, as GRO could already
      provide skbs with a frag_list.
      
      We could make the 65536 bytes threshold a tunable to mitigate this.
      
      (First time we need to linearize skb in skb_needs_linearize(), we could
      lower the tunable to ~16*1460 so that following skb_gro_receive() calls
      build smaller skbs)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a29111c
  2. 09 10月, 2013 2 次提交
  3. 08 10月, 2013 3 次提交
    • E
      net: Separate the close_list and the unreg_list v2 · 5cde2829
      Eric W. Biederman 提交于
      Separate the unreg_list and the close_list in dev_close_many preventing
      dev_close_many from permuting the unreg_list.  The permutations of the
      unreg_list have resulted in cases where the loopback device is accessed
      it has been freed in code such as dst_ifdown.  Resulting in subtle memory
      corruption.
      
      This is the second bug from sharing the storage between the close_list
      and the unreg_list.  The issues that crop up with sharing are
      apparently too subtle to show up in normal testing or usage, so let's
      forget about being clever and use two separate lists.
      
      v2: Make all callers pass in a close_list to dev_close_many
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5cde2829
    • A
      net: fix unsafe set_memory_rw from softirq · d45ed4a4
      Alexei Starovoitov 提交于
      on x86 system with net.core.bpf_jit_enable = 1
      
      sudo tcpdump -i eth1 'tcp port 22'
      
      causes the warning:
      [   56.766097]  Possible unsafe locking scenario:
      [   56.766097]
      [   56.780146]        CPU0
      [   56.786807]        ----
      [   56.793188]   lock(&(&vb->lock)->rlock);
      [   56.799593]   <Interrupt>
      [   56.805889]     lock(&(&vb->lock)->rlock);
      [   56.812266]
      [   56.812266]  *** DEADLOCK ***
      [   56.812266]
      [   56.830670] 1 lock held by ksoftirqd/1/13:
      [   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
      [   56.849757]
      [   56.849757] stack backtrace:
      [   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
      [   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
      [   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
      [   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
      [   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
      [   56.923006] Call Trace:
      [   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
      [   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
      [   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
      [   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
      [   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
      [   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
      [   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
      [   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
      [   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
      [   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
      [   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
      [   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
      [   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
      [   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
      [   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
      [   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
      [   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
      [   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
      [   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
      [   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
      [   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
      [   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70
      
      cannot reuse jited filter memory, since it's readonly,
      so use original bpf insns memory to hold work_struct
      
      defer kfree of sk_filter until jit completed freeing
      
      tested on x86_64 and i386
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d45ed4a4
    • M
      netif_set_xps_queue: make cpu mask const · 3573540c
      Michael S. Tsirkin 提交于
      virtio wants to pass in cpumask_of(cpu), make parameter
      const to avoid build warnings.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3573540c
  4. 04 10月, 2013 1 次提交
  5. 01 10月, 2013 3 次提交
  6. 29 9月, 2013 3 次提交
    • E
      net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet 提交于
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the
      rate computed by transport layer. Value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use FQ packet scheduler.
      
      Note that a packet scheduler takes into account the headers for its
      computations. The effective payload rate depends on MSS and retransmits
      if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62748f32
    • E
      net: net_secret should not depend on TCP · 9a3bab6b
      Eric Dumazet 提交于
      A host might need net_secret[] and never open a single socket.
      
      Problem added in commit aebda156
      ("net: defer net_secret[] initialization")
      
      Based on prior patch from Hannes Frederic Sowa.
      Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@strressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a3bab6b
    • E
      net: Delay default_device_exit_batch until no devices are unregistering v2 · 50624c93
      Eric W. Biederman 提交于
      There is currently serialization network namespaces exiting and
      network devices exiting as the final part of netdev_run_todo does not
      happen under the rtnl_lock.  This is compounded by the fact that the
      only list of devices unregistering in netdev_run_todo is local to the
      netdev_run_todo.
      
      This lack of serialization in extreme cases results in network devices
      unregistering in netdev_run_todo after the loopback device of their
      network namespace has been freed (making dst_ifdown unsafe), and after
      the their network namespace has exited (making the NETDEV_UNREGISTER,
      and NETDEV_UNREGISTER_FINAL callbacks unsafe).
      
      Add the missing serialization by a per network namespace count of how
      many network devices are unregistering and having a wait queue that is
      woken up whenever the count is decreased.  The count and wait queue
      allow default_device_exit_batch to wait until all of the unregistration
      activity for a network namespace has finished before proceeding to
      unregister the loopback device and then allowing the network namespace
      to exit.
      
      Only a single global wait queue is used because there is a single global
      lock, and there is a single waiter, per network namespace wait queues
      would be a waste of resources.
      
      The per network namespace count of unregistering devices gives a
      progress guarantee because the number of network devices unregistering
      in an exiting network namespace must ultimately drop to zero (assuming
      network device unregistration completes).
      
      The basic logic remains the same as in v1.  This patch is now half
      comment and half rtnl_lock_unregistering an expanded version of
      wait_event performs no extra work in the common case where no network
      devices are unregistering when we get to default_device_exit_batch.
      Reported-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50624c93
  7. 27 9月, 2013 9 次提交
    • V
      net: create sysfs symlinks for neighbour devices · 5831d66e
      Veaceslav Falico 提交于
      Also, remove the same functionality from bonding - it will be already done
      for any device that links to its lower/upper neighbour.
      
      The links will be created for dev's kobject, and will look like
      lower_eth0 for lower device eth0 and upper_bridge0 for upper device
      bridge0.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5831d66e
    • V
      net: expose the master link to sysfs, and remove it from bond · 842d67a7
      Veaceslav Falico 提交于
      Currently, we can have only one master upper neighbour, so it would be
      useful to create a symlink to it in the sysfs device directory, the way
      that bonding now does it, for every device. Lower devices from
      bridge/team/etc will automagically get it, so we could rely on it.
      
      Also, remove the same functionality from bonding.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      842d67a7
    • V
      net: add a possibility to get private from netdev_adjacent->list · b6ccba4c
      Veaceslav Falico 提交于
      It will be useful to get first/last element.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6ccba4c
    • V
      net: add for_each iterators through neighbour lower link's private · 31088a11
      Veaceslav Falico 提交于
      Add a possibility to iterate through netdev_adjacent's private, currently
      only for lower neighbours.
      
      Add both RCU and RTNL/other locking variants of iterators, and make the
      non-rcu variant to be safe from removal.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31088a11
    • V
      net: add netdev_adjacent->private and allow to use it · 402dae96
      Veaceslav Falico 提交于
      Currently, even though we can access any linked device, we can't attach
      anything to it, which is vital to properly manage them.
      
      To fix this, add a new void *private to netdev_adjacent and functions
      setting/getting it (per link), so that we can save, per example, bonding's
      slave structures there, per slave device.
      
      netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to
      upper dev and populates the neighbour link only with private.
      
      netdev_lower_dev_get_private{,_rcu}() returns the private, if found.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      402dae96
    • V
      net: add RCU variant to search for netdev_adjacent link · 5249dec7
      Veaceslav Falico 提交于
      Currently we have only the RTNL flavour, however we can traverse it while
      holding only RCU, so add the RCU search. Add an RCU variant that uses
      list_head * as an argument, so that it can be universally used afterwards.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5249dec7
    • V
      net: add adj_list to save only neighbours · 2f268f12
      Veaceslav Falico 提交于
      Currently, we distinguish neighbours (first-level linked devices) from
      non-neighbours by the neighbour bool in the netdev_adjacent. This could be
      quite time-consuming in case we would like to traverse *only* through
      neighbours - cause we'd have to traverse through all devices and check for
      this flag, and in a (quite common) scenario where we have lots of vlans on
      top of bridge, which is on top of a bond - the bonding would have to go
      through all those vlans to get its upper neighbour linked devices.
      
      This situation is really unpleasant, cause there are already a lot of cases
      when a device with slaves needs to go through them in hot path.
      
      To fix this, introduce a new upper/lower device lists structure -
      adj_list, which contains only the neighbours. It works always in
      pair with the all_adj_list structure (renamed from upper/lower_dev_list),
      i.e. both of them contain the same links, only that all_adj_list contains
      also non-neighbour device links. It's really a small change visible,
      currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't
      change the main linked logic at all.
      
      Also, add some comments a fix a name collision in
      netdev_for_each_upper_dev_rcu() and rework the naming by the following
      rules:
      
      netdev_(all_)(upper|lower)_*
      
      If "all_" is present, then we work with the whole list of upper/lower
      devices, otherwise - only with direct neighbours. Uninline functions - to
      get better stack traces.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f268f12
    • V
      net: use lists as arguments instead of bool upper · 7863c054
      Veaceslav Falico 提交于
      Currently we make use of bool upper when we want to specify if we want to
      work with upper/lower list. It's, however, harder to read, debug and
      occupies a lot more code.
      
      Fix this by just passing the correct upper/lower_dev_list list_head pointer
      instead of bool upper, and work internally with it.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7863c054
    • H
      net: neighbour: use source address of last enqueued packet for solicitation · 4ed377e3
      Hannes Frederic Sowa 提交于
      Currently we always use the first member of the arp_queue to determine
      the sender ip address of the arp packet (or in case of IPv6 - source
      address of the ndisc packet). This skb is fixed as long as the queue is
      not drained by a complete purge because of a timeout or by a successful
      response.
      
      If the first packet enqueued on the arp_queue is from a local application
      with a manually set source address and the to be discovered system
      does some kind of uRPF checks on the source address in the arp packet
      the resolving process hangs until a timeout and restarts. This hurts
      communication with the participating network node.
      
      This could be mitigated a bit if we use the latest enqueued skb's
      source address for the resolving process, which is not as static as
      the arp_queue's head. This change of the source address could result in
      better recovery of a failed solicitation.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed377e3
  8. 20 9月, 2013 1 次提交
    • N
      netpoll: fix NULL pointer dereference in netpoll_cleanup · d0fe8c88
      Nikolay Aleksandrov 提交于
      I've been hitting a NULL ptr deref while using netconsole because the
      np->dev check and the pointer manipulation in netpoll_cleanup are done
      without rtnl and the following sequence happens when having a netconsole
      over a vlan and we remove the vlan while disabling the netconsole:
      	CPU 1					CPU2
      					removes vlan and calls the notifier
      enters store_enabled(), calls
      netdev_cleanup which checks np->dev
      and then waits for rtnl
      					executes the netconsole netdev
      					release notifier making np->dev
      					== NULL and releases rtnl
      continues to dereference a member of
      np->dev which at this point is == NULL
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0fe8c88
  9. 13 9月, 2013 1 次提交
  10. 12 9月, 2013 1 次提交
    • E
      net: fix multiqueue selection · 50d1784e
      Eric Dumazet 提交于
      commit 416186fb ("net: Split core bits of netdev_pick_tx
      into __netdev_pick_tx") added a bug that disables caching of queue
      index in the socket.
      
      This is the source of packet reorders for TCP flows, and
      again this is happening more often when using FQ pacing.
      
      Old code was doing
      
      if (queue_index != old_index)
      	sk_tx_queue_set(sk, queue_index);
      
      Alexander renamed the variables but forgot to change sk_tx_queue_set()
      2nd parameter.
      
      if (queue_index != new_index)
      	sk_tx_queue_set(sk, queue_index);
      
      This means we store -1 over and over in sk->sk_tx_queue_mapping
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50d1784e
  11. 04 9月, 2013 3 次提交
  12. 31 8月, 2013 3 次提交
  13. 30 8月, 2013 4 次提交
    • V
      net: add netdev_upper_get_next_dev_rcu(dev, iter) · 48311f46
      Veaceslav Falico 提交于
      This function returns the next dev in the dev->upper_dev_list after the
      struct list_head **iter position, and updates *iter accordingly. Returns
      NULL if there are no devices left.
      
      Caller must hold RCU read lock.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48311f46
    • V
      net: remove search_list from netdev_adjacent · 620f3186
      Veaceslav Falico 提交于
      We already don't need it cause we see every upper/lower device in the list
      already.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      620f3186
    • V
      net: add lower_dev_list to net_device and make a full mesh · 5d261913
      Veaceslav Falico 提交于
      This patch adds lower_dev_list list_head to net_device, which is the same
      as upper_dev_list, only for lower devices, and begins to use it in the same
      way as the upper list.
      
      It also changes the way the whole adjacent device lists work - now they
      contain *all* of upper/lower devices, not only the first level. The first
      level devices are distinguished by the bool neighbour field in
      netdev_adjacent, also added by this patch.
      
      There are cases when a device can be added several times to the adjacent
      list, the simplest would be:
      
           /---- eth0.10 ---\
      eth0-		       --- bond0
           \---- eth0.20 ---/
      
      where both bond0 and eth0 'see' each other in the adjacent lists two times.
      To avoid duplication of netdev_adjacent structures ref_nr is being kept as
      the number of times the device was added to the list.
      
      The 'full view' is achieved by adding, on link creation, all of the
      upper_dev's upper_dev_list devices as upper devices to all of the
      lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
      versa. On unlink they are removed using the same logic.
      
      I've tested it with thousands vlans/bonds/bridges, everything works ok and
      no observable lags even on a huge number of interfaces.
      
      Memory footprint for 128 devices interconnected with each other via both
      upper and lower (which is impossible, but for the comparison) lists would be:
      
      128*128*2*sizeof(netdev_adjacent) = 1.5MB
      
      but in the real world we usualy have at most several devices with slaves
      and a lot of vlans, so the footprint will be much lower.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d261913
    • V
      net: rename netdev_upper to netdev_adjacent · aa9d8560
      Veaceslav Falico 提交于
      Rename the structure to reflect the upcoming addition of lower_dev_list.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa9d8560
  14. 29 8月, 2013 1 次提交
    • E
      sysfs: Restrict mounting sysfs · 7dc5dbc8
      Eric W. Biederman 提交于
      Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
      over the net namespace.  The principle here is if you create or have
      capabilities over it you can mount it, otherwise you get to live with
      what other people have mounted.
      
      Instead of testing this with a straight forward ns_capable call,
      perform this check the long and torturous way with kobject helpers,
      this keeps direct knowledge of namespaces out of sysfs, and preserves
      the existing sysfs abstractions.
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7dc5dbc8
  15. 28 8月, 2013 1 次提交
  16. 15 8月, 2013 2 次提交
  17. 14 8月, 2013 1 次提交
    • A
      rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header · 3e805ad2
      Asbjoern Sloth Toennesen 提交于
      Fix the iproute2 command `bridge vlan show`, after switching from
      rtgenmsg to ifinfomsg.
      
      Let's start with a little history:
      
      Feb 20:   Vlad Yasevich got his VLAN-aware bridge patchset included in
                the 3.9 merge window.
                In the kernel commit 6cbdceeb, he added attribute support to
                bridge GETLINK requests sent with rtgenmsg.
      
      Mar 6th:  Vlad got this iproute2 reference implementation of the bridge
                vlan netlink interface accepted (iproute2 9eff0e5c)
      
      Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
                http://patchwork.ozlabs.org/patch/239602/
                http://marc.info/?t=136680900700007
      
      Apr 28th: Linus released 3.9
      
      Apr 30th: Stephen released iproute2 3.9.0
      
      The `bridge vlan show` command haven't been working since the switch to
      ifinfomsg, or in a released version of iproute2. Since the kernel side
      only supports rtgenmsg, which iproute2 switched away from just prior to
      the iproute2 3.9.0 release.
      
      I haven't been able to find any documentation, about neither rtgenmsg
      nor ifinfomsg, and in which situation to use which, but kernel commit
      88c5b5ce seams to suggest that ifinfomsg should be used.
      
      Fixing this in kernel will break compatibility, but I doubt that anybody
      have been using it due to this bug in the user space reference
      implementation, at least not without noticing this bug. That said the
      functionality is still fully functional in 3.9, when reversing iproute2
      commit 63338dca.
      
      This could also be fixed in iproute2, but thats an ugly patch that would
      reintroduce rtgenmsg in iproute2, and from searching in netdev it seams
      like rtgenmsg usage is discouraged. I'm assuming that the only reason
      that Vlad implemented the kernel side to use rtgenmsg, was because
      iproute2 was using it at the time.
      Signed-off-by: NAsbjoern Sloth Toennesen <ast@fiberby.net>
      Reviewed-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e805ad2