1. 30 March 2013, 2 commits
  2. 28 March 2013, 5 commits
  3. 27 March 2013, 4 commits
    • ipv4: Fix ip-header identification for gso packets. · 330305cc
      Pravin B Shelar authored
      ip-header id needs to be incremented even if the IP_DF flag is set.
      This behaviour was changed in commit 490ab081
      (IP_GRE: Fix IP-Identification).
      
      The following patch fixes it so that the identification is always
      incremented.
      Reported-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      330305cc
    • netlink: remove duplicated NLMSG_ALIGN · a88b9ce5
      Hong zhi guo authored
      NLMSG_HDRLEN is already an aligned value. It can be referenced
      directly without extra alignment.
      
      The redundant alignment here may confuse API users.
      Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a88b9ce5
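      For reference, NLMSG_HDRLEN is defined in the UAPI header as an
      already-aligned value, so wrapping it in NLMSG_ALIGN() again is a
      no-op. A minimal user-space sketch that demonstrates this:
      
      #include <stdio.h>
      #include <linux/netlink.h>
      
      /* NLMSG_HDRLEN is ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr))),
       * i.e. already a multiple of NLMSG_ALIGNTO, so aligning it once
       * more changes nothing. */
      int main(void)
      {
      	printf("NLMSG_HDRLEN              = %d\n", NLMSG_HDRLEN);
      	printf("NLMSG_ALIGN(NLMSG_HDRLEN) = %u\n",
      	       (unsigned int)NLMSG_ALIGN(NLMSG_HDRLEN));
      	return 0;
      }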
    • firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. · 6752c8db
      YOSHIFUJI Hideaki / 吉藤英明 authored
      Inspection of an upper-layer protocol is considered harmful, especially
      if it is ARP or another stateful upper-layer protocol; the driver
      cannot (and should not) hold their full state.
      
      The IPv4-over-FireWire module used to inspect ARP (both on the sending
      path and on the receiving path) and record the peer's GUID, max packet
      size, max speed and FIFO address.  This patch removes such inspection
      by extending our "hardware address" definition to include that other
      information as well: max packet size, max speed and FIFO.  By doing
      this, the neighbour module in the networking subsystem can cache them.
      
      Note: As we have started ignoring sspd and max_rec in ARP/NDP, that
            information is no longer used in the driver when sending.
      
      When a packet is being sent, the IP layer fills our pseudo header with
      the extended "hardware address", including the GUID and FIFO.  The
      driver can look up the node-id (the real but rather volatile low-level
      address) by GUID, and then the module can send the packet onto the wire
      using the parameters provided in the extended hardware address.
      
      This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
      over IEEE1394 (RFC3146) share the same "hardware address" format
      in their address resolution protocols.
      
      Here, extended "hardware address" is defined as follows:
      
      union fwnet_hwaddr {
      	u8 u[16];
      	struct {
      		__be64 uniq_id;		/* EUI-64			*/
      		u8 max_rec;		/* max packet size		*/
      		u8 sspd;		/* max speed			*/
      		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
      		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
      	} __packed uc;
      };
      
      Note that the hardware address is declared as a union, so that we can
      map a full IP address into it when implementing MCAP (Multicast Channel
      Allocation Protocol) for IPv6, but the IP and ARP subsystems do not
      need to know this format in detail.
      
      One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
      is that 1394 ARP Request/Reply do not contain the target hardware address
      field (aka ar$tha).  This difference is handled in the ARP subsystem.
      
      CC: Stephan Gatzka <stephan.gatzka@gmail.com>
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6752c8db
    • GRE: Refactor GRE tunneling code. · c5441932
      Pravin B Shelar authored
      The following patch refactors the GRE code into generic IP tunneling
      code and GRE-specific code. The common tunneling code is moved to the
      ip_tunnel module, which is written as a generic library that can be
      used by different tunneling implementations.
      
      The ip_tunnel module contains the following components:
       - packet xmit and rcv generic code. xmit flow looks like
         (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
       - hash table of all devices.
       - lookup for tunnel devices.
       - control plane operations like device create, destroy, ioctl, netlink
         operations code.
       - registration for tunneling modules, like gre, ipip etc.
       - define single pcpu_tstats dev->tstats.
       - struct tnl_ptk_info added to pass parsed tunnel packet parameters.
      
      The ipip.h header is renamed to ip_tunnel.h.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5441932
  4. 26 March 2013, 1 commit
  5. 25 March 2013, 3 commits
  6. 23 March 2013, 3 commits
    • mm: zone_end_pfn is too small · f9228b20
      Russ Anderson authored
      Booting with 32 TBytes of memory hits the BUG at mm/page_alloc.c:552!
      (output below).
      
      The key hint is "page 4294967296 outside zone".
      4294967296 = 0x100000000 (bit 32 is set).
      
      The problem is in include/linux/mmzone.h:
      
        530 static inline unsigned zone_end_pfn(const struct zone *zone)
        531 {
        532         return zone->zone_start_pfn + zone->spanned_pages;
        533 }
      
      zone_end_pfn is "unsigned" (32 bits).  Changing it to "unsigned long"
      (64 bits) fixes the problem.
      
      zone_end_pfn() was added recently in commit 108bcc96 ("mm: add & use
      zone_end_pfn() and zone_spans_pfn()")
      
      Output from the failure.
      
        No AGP bridge found
        page 4294967296 outside zone [ 4294967296 - 4327469056 ]
        ------------[ cut here ]------------
        kernel BUG at mm/page_alloc.c:552!
        invalid opcode: 0000 [#1] SMP
        Modules linked in:
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.9.0-rc2.dtp+ #10
        RIP: free_one_page+0x382/0x430
        Process swapper (pid: 0, threadinfo ffffffff81942000, task ffffffff81955420)
        Call Trace:
          __free_pages_ok+0x96/0xb0
          __free_pages+0x25/0x50
          __free_pages_bootmem+0x8a/0x8c
          __free_memory_core+0xea/0x131
          free_low_memory_core_early+0x4a/0x98
          free_all_bootmem+0x45/0x47
          mem_init+0x7b/0x14c
          start_kernel+0x216/0x433
          x86_64_start_reservations+0x2a/0x2c
          x86_64_start_kernel+0x144/0x153
        Code: 89 f1 ba 01 00 00 00 31 f6 d3 e2 4c 89 ef e8 66 a4 01 00 e9 2c fe ff ff 0f 0b eb fe 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 eb f3 <0f> 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 49
      Signed-off-by: Russ Anderson <rja@sgi.com>
      Reported-by: George Beshers <gbeshers@sgi.com>
      Acked-by: Hedi Berriche <hedi@sgi.com>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9228b20
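      A stand-alone sketch of the truncation described above, assuming a
      64-bit build where unsigned long is 64 bits; the struct here is a
      cut-down stand-in for the kernel's struct zone, and the PFN values
      are taken from the failure output:
      
      #include <stdio.h>
      
      struct zone {
      	unsigned long zone_start_pfn;
      	unsigned long spanned_pages;
      };
      
      /* old: the 32-bit return type silently truncates the end PFN */
      static unsigned zone_end_pfn_broken(const struct zone *zone)
      {
      	return zone->zone_start_pfn + zone->spanned_pages;
      }
      
      /* new: unsigned long matches the field types, no truncation */
      static unsigned long zone_end_pfn_fixed(const struct zone *zone)
      {
      	return zone->zone_start_pfn + zone->spanned_pages;
      }
      
      int main(void)
      {
      	struct zone z = {
      		.zone_start_pfn = 4294967296UL,	/* 0x100000000, bit 32 set */
      		.spanned_pages  = 32501760UL,
      	};
      
      	/* the wrapped value makes zone_start_pfn look "outside" the zone */
      	printf("broken end pfn: %u\n",  zone_end_pfn_broken(&z));
      	printf("fixed end pfn:  %lu\n", zone_end_pfn_fixed(&z));
      	return 0;
      }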
    • printk: Provide a wake_up_klogd() off-case · dc72c32e
      Frederic Weisbecker authored
      wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
      nor printk_sched() is in use and there are actually no waiters on the
      log_wait waitqueue.  It should be a stub in this case for users like
      bust_spinlocks().
      
      Otherwise this results in this warning when CONFIG_PRINTK=n and
      CONFIG_IRQ_WORK=n:
      
      	kernel/built-in.o In function `wake_up_klogd':
      	(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
      
      To fix this, provide an off-case for wake_up_klogd() when
      CONFIG_PRINTK=n.
      
      There is much more from console_unlock() and other console-related code
      in printk.c that should be moved under CONFIG_PRINTK.  But for now,
      focus on a minimal fix as we are already past the merge window.
      
      [akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reported-by: James Hogan <james.hogan@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc72c32e
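      A minimal sketch of such an off-case, assuming the declaration lives
      in include/linux/printk.h: with CONFIG_PRINTK=n the call collapses to
      an empty inline, so callers like bust_spinlocks() no longer pull in
      irq_work_queue().
      
      /* include/linux/printk.h (sketch) */
      #ifdef CONFIG_PRINTK
      extern void wake_up_klogd(void);
      #else
      static inline void wake_up_klogd(void)
      {
      	/* no printk, hence no klogd waiters to wake up */
      }
      #endif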
    • irq_work.h: fix warning when CONFIG_IRQ_WORK=n · fe8d5261
      James Hogan authored
      A randconfig caught repeated compiler warnings when CONFIG_IRQ_WORK=n
      due to the definition of a non-inline static function in
      <linux/irq_work.h>:
      
        include/linux/irq_work.h +40 : warning: 'irq_work_needs_cpu' defined but not used
      
      Make it inline to suppress the warning.  This was caused by commit
      00b42959 ("irq_work: Don't stop the tick with pending works"), merged
      in v3.9-rc1.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe8d5261
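      A sketch of the header pattern being fixed: a plain static function
      defined in a header triggers a "defined but not used" warning in every
      file that includes the header without calling it, while a static
      inline does not.
      
      /* include/linux/irq_work.h (sketch) */
      #ifdef CONFIG_IRQ_WORK
      bool irq_work_needs_cpu(void);
      #else
      /* before the fix this stub lacked "inline" and warned in every
       * translation unit that included the header but never called it */
      static inline bool irq_work_needs_cpu(void)
      {
      	return false;
      }
      #endif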
  7. 22 March 2013, 9 commits
    • rtnetlink: Remove passing of attributes into rtnl_doit functions · 661d2967
      Thomas Graf authored
      With decnet converted, we can finally get rid of rta_buf and the
      computations around it. It also gets rid of the minimal header
      length verification, since all message handlers do that explicitly
      anyway.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      661d2967
    • decnet: Parse netlink attributes on our own · 58d7d8f9
      Thomas Graf authored
      decnet is the only subsystem left that relies on the global netlink
      attribute buffer rta_buf. It is a horrible design and we want to get
      rid of it.
      
      This converts all of decnet to do implicit attribute parsing. It
      also gets rid of the error-prone struct dn_kern_rta.
      
      Yes, the fib_magic() stuff is not pretty.
      
      It is compile-tested, but I need someone with the appropriate hardware
      to test the patch since I don't have access to any.
      
      Cc: linux-decnet-user@lists.sourceforge.net
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58d7d8f9
    • mv643xx_eth: convert to use the Marvell Orion MDIO driver · c3a07134
      Florian Fainelli authored
      This patch converts the Marvell MV643XX Ethernet driver to use the
      Marvell Orion MDIO driver. As a result, PowerPC and ARM platforms
      registering the Marvell MV643XX Ethernet driver are also updated to
      register a Marvell Orion MDIO driver. This driver voluntarily overlaps
      with the Marvell Ethernet shared registers because it uses a subset
      of that shared register range (shared_base + 0x4 to shared_base +
      0x84). The Ethernet driver is also updated to look up a PHY device
      using the Orion MDIO bus driver.
      
      For ARM and PowerPC we register a single instance of the "mvmdio" driver
      in the system like it used to be done with the use of the "shared_smi"
      platform_data cookie on ARM.
      
      Note that it is safe to register the mvmdio driver only for the "ge00"
      instance of the driver because this "ge00" interface is guaranteed to
      always be explicitly registered by consumers of
      arch/arm/plat-orion/common.c, and the other instances (ge01, ge10 and
      ge11) all pointed their shared_smi to ge00. For PowerPC the in-tree
      Device Tree Source files mention only one MV643XX Ethernet MAC
      instance, so the MDIO bus driver is registered only when id == 0.
      Signed-off-by: Florian Fainelli <florian@openwrt.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3a07134
    • virtio: remove obsolete virtqueue_get_queue_index() · 9d0ca6ed
      Rusty Russell authored
      You can access it directly now, since 3.8: v3.7-rc1-13-g06ca287d
      'virtio: move queue_index and num_free fields into core struct
      virtqueue.'
      
      Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d0ca6ed
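      Since the referenced commit moved the queue index into the core
      struct virtqueue, callers read the field directly; a minimal sketch
      of the replacement in a virtio driver (the helper name is
      hypothetical):
      
      /* before: idx = virtqueue_get_queue_index(vq); */
      static unsigned int my_vq_index(struct virtqueue *vq)
      {
      	return vq->index;	/* field provided by the core since 3.8 */
      }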
    • Revert "KVM: allow host header to be included even for !CONFIG_KVM" · 09a6e1f4
      Marcelo Tosatti authored
      This reverts commit f445f11e as
      it breaks PPC with CONFIG_KVM=n.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      09a6e1f4
    • USB: serial: add modem-status-change wait queue · e5b33dc9
      Johan Hovold authored
      Add modem-status-change wait queue to struct usb_serial_port that
      subdrivers can use to implement TIOCMIWAIT.
      
      Currently subdrivers use a private wait queue which may already have
      been released when waking up after the device has been disconnected.
      
      Note that we're adding a new wait queue rather than reusing the tty-port
      one as we do not want to get woken up at hangup (yet).
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5b33dc9
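      A rough sketch of how a subdriver might use such a per-port wait queue
      to implement TIOCMIWAIT; the field name delta_msr_wait, the private
      xxx_port_private structure and its msr_change flag are assumptions for
      illustration, not taken from this log:
      
      static int xxx_tiocmiwait(struct tty_struct *tty, unsigned long arg)
      {
      	struct usb_serial_port *port = tty->driver_data;
      	struct xxx_port_private *priv = usb_get_serial_port_data(port);
      	int ret;
      
      	/* sleep on the shared per-port queue instead of a driver-private
      	 * one that may already be gone after a disconnect */
      	ret = wait_event_interruptible(port->delta_msr_wait,
      				       priv->msr_change ||
      				       port->serial->disconnected);
      	if (ret)
      		return ret;		/* interrupted by a signal */
      	if (port->serial->disconnected)
      		return -EIO;
      	priv->msr_change = 0;
      	return 0;
      }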
    • filter: bpf_jit_comp: refactor and unify BPF JIT image dump output · 79617801
      Daniel Borkmann authored
      If bpf_jit_enable > 1, then we dump the emitted JIT-compiled image
      after creation. Currently, only SPARC and PowerPC have output similar
      to the reference implementation on x86_64. Make a small helper
      function in order to reduce duplicated code and make the dump output
      uniform across the x86_64, SPARC, PPC and ARM architectures (e.g. on
      ARM, flen, pass and proglen are currently not shown, but would be
      interesting to know as well), and also for future BPF JIT
      implementations on other archs.
      
      Cc: Mircea Gherzan <mgherzan@gmail.com>
      Cc: Matt Evans <matt@ozlabs.org>
      Cc: Eric Dumazet <eric.dumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79617801
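      A short sketch of how an architecture JIT could call the shared dump
      helper once code generation is done; the helper name bpf_jit_dump()
      and its argument order are assumed here:
      
      /* at the end of an arch's bpf_jit_compile(), once the image exists */
      static void xxx_jit_maybe_dump(struct sk_filter *fp, unsigned int proglen,
      			       u32 pass, u8 *image)
      {
      	if (bpf_jit_enable > 1)
      		/* fp->len = number of BPF insns, proglen = emitted code
      		 * size, pass = codegen passes, image = the JIT image */
      		bpf_jit_dump(fp->len, proglen, pass, image);
      }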
    • netlink: Diag core and basic socket info dumping (v2) · eaaa3139
      Andrey Vagin authored
      netlink_diag can be built as a module, just as is done for unix
      sockets.
      
      The core dumping message carries the basic info about netlink sockets:
      family, type and protocol, portid, dst_group, dst_portid, state.
      
      Groups can be received as an optional parameter, NETLINK_DIAG_GROUPS.
      
      Netlink sockets can be filtered by protocol.
      
      The socket inode number and cookie are reserved for future per-socket
      info retrieval. The per-protocol filtering is also reserved for the
      future by requiring sdiag_protocol to be zero.
      
      The file /proc/net/netlink doesn't provide enough information for
      dumping netlink sockets. It doesn't provide dst_group, dst_portid, or
      groups above 32.
      
      v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eaaa3139
    • net: fix *_DIAG_MAX constants · ae5fc987
      Andrey Vagin authored
      Follow the common pattern and define *_DIAG_MAX like:
      
              [...]
              __XXX_DIAG_MAX,
      };
      
      Because everyone is used to doing:
      
              struct nlattr *attrs[XXX_DIAG_MAX+1];
      
              nla_parse([...], XXX_DIAG_MAX, [...]
      Reported-by: Thomas Graf <tgraf@suug.ch>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae5fc987
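      Spelled out, the convention looks roughly like this (XXX is the same
      placeholder family name used in the commit message): the enum ends
      with a private __XXX_DIAG_MAX sentinel, the public maximum is derived
      from it, and the attribute array and the nla_parse() bound stay in
      sync as attributes are added.
      
      struct nlattr;
      
      enum {
      	XXX_DIAG_NONE,
      	XXX_DIAG_MEMINFO,	/* example attributes */
      	XXX_DIAG_GROUPS,
      	__XXX_DIAG_MAX,		/* private sentinel, always last */
      };
      #define XXX_DIAG_MAX (__XXX_DIAG_MAX - 1)
      
      /* consumers can then safely write: */
      struct nlattr *attrs[XXX_DIAG_MAX + 1];
      /* nla_parse(attrs, XXX_DIAG_MAX, head, len, policy); */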
  8. 21 March 2013, 8 commits
    • tcp: implement RFC5682 F-RTO · e33099f9
      Yuchung Cheng authored
      This patch implements F-RTO (forward RTO recovery):
      
      When the first retransmission after timeout is acknowledged, F-RTO
      sends new data instead of old data. If the next ACK acknowledges
      some never-retransmitted data, then the timeout was spurious and the
      congestion state is reverted.  Otherwise if the next ACK selectively
      acknowledges the new data, then the timeout was genuine and the
      loss recovery continues. This idea applies to recurring timeouts
      as well. While F-RTO sends different data during timeout recovery,
      it does not (and should not) change the congestion control.
      
      The implementation follows the three steps of the SACK-enhanced
      algorithm (section 3) in RFC5682. Step 1 is in tcp_enter_loss().
      Steps 2 and 3 are in tcp_process_loss().  The basic version is not
      supported because the SACK-enhanced version also works for non-SACK
      connections.
      
      The new implementation is functionally in parity with the old F-RTO
      implementation except the one case where it increases undo events:
      In addition to the RFC algorithm, a spurious timeout may be detected
      without sending data in step 2, as long as the SACK confirms not
      all the original data are dropped. When this happens, the sender
      will undo the cwnd and perhaps enter fast recovery instead. This
      additional check increases the F-RTO undo events by 5x compared
      to the prior implementation on Google Web servers, since the sender
      often does not have new data to send for HTTP.
      
      Note that F-RTO may detect a spurious timeout before Eifel with
      timestamps does so.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e33099f9
    • tcp: refactor F-RTO · 9b44190d
      Yuchung Cheng authored
      This patch series refactors the F-RTO feature (RFC4138/5682).
      
      This is to simplify the loss recovery processing. Existing F-RTO
      was developed during the experimental stage (RFC4138) and has
      many experimental features.  It takes a separate code path from
      the traditional timeout processing by overloading CA_Disorder
      instead of using CA_Loss state. This complicates CA_Disorder state
      handling because it's also used for handling dubious ACKs and undos.
      While the algorithm in the RFC does not change the congestion control,
      the implementation intercepts congestion control in various places
      (e.g., frto_cwnd in tcp_ack()).
      
      The new code implements the newer F-RTO RFC5682 using the CA_Loss
      processing path.  F-RTO becomes a small extension in the timeout
      processing and interfaces with the congestion control and Eifel undo
      modules.  It lets the congestion control (module) determine how much
      to send independently.  F-RTO only chooses what to send in order to
      detect spurious retransmission. If the timeout is found to be
      spurious, it invokes existing Eifel undo algorithms like DSACK or TCP
      timestamp-based detection.
      
      The first patch removes all F-RTO code except sysctl_tcp_frto, which is
      left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
      Westwood now computes ssthresh on the regular timeout CA_EVENT_LOSS
      event.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b44190d
    • thermal: shorten too long mcast group name · 73214f5d
      Masatake YAMATO authored
      The original name is too long.
      Signed-off-by: Masatake YAMATO <yamato@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73214f5d
    • connector: Added coredumping event to the process connector · 2b5faa4c
      Jesper Derehag authored
      The process connector can now also detect coredumping events.
      
      The main aim of the patch is to get notified at the start of
      coredumping, instead of having to wait for it to finish and then being
      notified through the EXIT event.
      
      It could be used, for instance, by process managers that want to be
      notified as soon as possible about process failures, and not
      necessarily after the coredump, which could take on the order of
      minutes depending on the size of the coredump, piping and so on.
      Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b5faa4c
    • filter: add ANC_PAY_OFFSET instruction for loading payload start offset · 3e5289d5
      Daniel Borkmann authored
      It is very useful to do dynamic truncation of packets. In particular,
      we're interested in pushing the necessary header bytes to user space and
      cutting off user payload that should probably not be transferred, for
      various reasons (e.g. privacy, speed, or others). With the ancillary
      extension PAY_OFFSET, we can load it into the accumulator and return it.
      E.g. in bpfc syntax ...
      
              ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
              ret a           ; { 0x16, 0, 0, 0x00000000 },
      
      ... as a filter will accomplish this without requiring big hackery in
      the BPF filter itself. Follow-up JIT implementations are welcome.
      
      Thanks to Eric Dumazet for suggesting and discussing this during the
      Netfilter Workshop in Copenhagen.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e5289d5
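      A user-space sketch of attaching the two-instruction filter quoted
      above to a packet socket; the raw opcode words are copied from the
      commit message, and since the classic BPF return value caps the number
      of captured bytes, each packet is truncated at the start of its
      payload:
      
      #include <stdio.h>
      #include <sys/socket.h>
      #include <arpa/inet.h>
      #include <linux/filter.h>
      #include <linux/if_ether.h>
      
      int main(void)
      {
      	/* ld #poff ; ret a -- as quoted in the commit message */
      	struct sock_filter code[] = {
      		{ 0x20, 0, 0, 0xfffff034 },
      		{ 0x16, 0, 0, 0x00000000 },
      	};
      	struct sock_fprog prog = {
      		.len	= sizeof(code) / sizeof(code[0]),
      		.filter	= code,
      	};
      	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      
      	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
      				 &prog, sizeof(prog)) < 0) {
      		perror("socket/SO_ATTACH_FILTER");
      		return 1;
      	}
      	/* recv() on fd now returns packets cut off at the payload start */
      	return 0;
      }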
    • net: flow_dissector: add __skb_get_poff to get a start offset to payload · f77668dc
      Daniel Borkmann authored
      __skb_get_poff() returns the offset to the payload as far as it could
      be dissected. The main user is currently BPF, so that we can dynamically
      truncate packets without needing to push actual payload to user space,
      and can instead analyze headers only.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f77668dc
    • flow_keys: include thoff into flow_keys for later usage · 8ed78166
      Daniel Borkmann authored
      In skb_flow_dissect(), we perform a dissection of an skbuff. Since we're
      doing the work here anyway, also store thoff for later usage, e.g. in
      the BPF filter.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ed78166
    • udp: add encap_destroy callback · 44046a59
      Tom Parkin authored
      Users of udp encapsulation currently have an encap_rcv callback which
      they can use to hook into the udp receive path.
      
      In situations where an encapsulation user allocates resources
      associated with a udp encap socket, it may be convenient to also be
      able to hook the proto .destroy operation.  For example, if an encap
      user holds a reference to the udp socket, the destroy hook might be
      used to relinquish this reference.
      
      This patch adds a socket destroy hook into udp, which is set and
      enabled in the same way as the existing encap_rcv hook.
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44046a59
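      A rough sketch of how an encapsulation module might install the new
      hook next to encap_rcv on a kernel UDP socket; the xxx_* names are
      illustrative assumptions:
      
      static int xxx_udp_encap_rcv(struct sock *sk, struct sk_buff *skb)
      {
      	return 1;	/* not ours: let normal UDP receive handle it */
      }
      
      static void xxx_udp_encap_destroy(struct sock *sk)
      {
      	/* release per-tunnel state tied to this socket, drop the
      	 * reference the encap user holds on it, and so on */
      }
      
      static void xxx_setup_encap(struct sock *sk)
      {
      	udp_sk(sk)->encap_type    = 1;			   /* as for encap_rcv */
      	udp_sk(sk)->encap_rcv     = xxx_udp_encap_rcv;
      	udp_sk(sk)->encap_destroy = xxx_udp_encap_destroy; /* new hook */
      	udp_encap_enable();
      }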
  9. 20 March 2013, 3 commits
    • usb: ulpi: Define a *otg_ulpi_create no-op · 7fa4cd1a
      Fabio Estevam authored
      Building a kernel for imx_v4_v5_defconfig with CONFIG_USB_ULPI disabled
      results in the following error:
      
      arch/arm/mach-imx/built-in.o: In function 'pca100_init':
      platform-mx2-emma.c:(.init.text+0x6788): undefined reference to 'otg_ulpi_create'
      platform-mx2-emma.c:(.init.text+0x682c): undefined reference to 'mxc_ulpi_access_ops'
      
      Fix this by providing a no-op definition of *otg_ulpi_create for the case when
      CONFIG_USB_ULPI is not defined.
      Acked-by: Igor Grinberg <grinberg@compulab.co.il>
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      7fa4cd1a
    • packet: packet fanout rollover during socket overload · 77f65ebd
      Willem de Bruijn authored
      Changes:
        v3->v2: rebase (no other changes)
                passes selftest
        v2->v1: read f->num_members only once
                fix bug: test rollover mode + flag
      
      Minimize packet drop in a fanout group. If one socket is full,
      roll over packets to another from the group. Maintain flow
      affinity during normal load using an rxhash fanout policy, while
      dispersing unexpected traffic storms that hit a single cpu, such
      as spoofed-source DoS flows. Rollover breaks affinity for flows
      arriving at saturated sockets during those conditions.
      
      The patch adds a fanout policy ROLLOVER that rotates between sockets,
      filling each socket before moving to the next. It also adds a fanout
      flag ROLLOVER. If passed along with any other fanout policy, the
      primary policy is applied until the chosen socket is full. Then,
      rollover selects another socket, to delay packet drop until the
      entire system is saturated.
      
      Probing sockets is not free. Selecting the last used socket, as
      rollover does, is a greedy approach that maximizes chance of
      success, at the cost of extreme load imbalance. In practice, with
      sufficiently long queues to absorb bursts, sockets are drained in
      parallel and load balance looks uniform in `top`.
      
      To avoid contention, the patch scales counters with the number of
      sockets and accesses them lock-free. Values are bounds-checked to
      ensure correctness.
      
      Tested using an application with 9 threads pinned to CPUs, one socket
      per thread and sufficient busywork per packet operation to limit each
      thread to handling 32 Kpps. When sent a 500 Kpps single-UDP-stream
      load, a FANOUT_CPU setup processes 32 Kpps in total without this
      patch and 270 Kpps with the patch. Tested with read() and with a packet
      ring (V1).
      
      Also, passes psock_fanout.c unit test added to selftests.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77f65ebd
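      A user-space sketch of joining a fanout group with the new rollover
      flag layered on top of the hash policy; the group id is an arbitrary
      example value:
      
      #include <stdio.h>
      #include <sys/socket.h>
      #include <arpa/inet.h>
      #include <linux/if_packet.h>
      #include <linux/if_ether.h>
      
      int main(void)
      {
      	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      	/* low 16 bits: group id, high 16 bits: mode plus flags */
      	int group_id = 42;
      	int mode = PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_ROLLOVER;
      	int fanout_arg = group_id | (mode << 16);
      
      	if (fd < 0 || setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
      				 &fanout_arg, sizeof(fanout_arg)) < 0) {
      		perror("socket/PACKET_FANOUT");
      		return 1;
      	}
      	/* flows hash to members of group 42; when the chosen socket is
      	 * full, delivery rolls over to another socket in the group */
      	return 0;
      }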
    • netfilter: nf_conntrack: speed up module removal path if netns in use · dece40e8
      Vladimir Davydov authored
      The patch introduces nf_conntrack_cleanup_net_list(), which cleans up
      nf_conntrack for a list of netns and calls synchronize_net() only once
      for them all. This should reduce netns destruction time.
      
      I've measured cleanup time for 1k dummy net ns. Here are the results:
      
       <without the patch>
       # modprobe nf_conntrack
       # time modprobe -r nf_conntrack
      
       real	0m10.337s
       user	0m0.000s
       sys	0m0.376s
      
       <with the patch>
       # modprobe nf_conntrack
       # time modprobe -r nf_conntrack
      
       real    0m5.661s
       user    0m0.000s
       sys     0m0.216s
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      dece40e8
  10. 19 March 2013, 2 commits
    • inet: limit length of fragment queue hash table bucket lists · 5a3da1fe
      Hannes Frederic Sowa authored
      This patch introduces a constant limit on the fragment queue hash
      table bucket list lengths. Currently the limit of 128 is chosen
      somewhat arbitrarily and just ensures that we can fill up the fragment
      cache with empty packets up to the default ip_frag_high_thresh limit.
      It should just protect against list iteration eating considerable
      amounts of CPU.
      
      If we reach the maximum length in one hash bucket, a warning is printed.
      This is implemented on the caller side of inet_frag_find to distinguish
      between the different users of inet_fragment.c.
      
      I dropped the out of memory warning in the ipv4 fragment lookup path,
      because we already get a warning by the slab allocator.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a3da1fe
    • ipvs: add backup_only flag to avoid loops · 0c12582f
      Julian Anastasov authored
      Dmitry Akindinov is reporting a problem where SYNs are looping
      between the master and backup server when the backup server is used as
      a real server in DR mode and has IPVS rules to function as a director.
      
      Even when the backup function is enabled, we continue to forward
      traffic and schedule new connections when the current master is using
      the backup server as a real server. While this is not a problem for
      NAT, for the DR and TUN methods the backup server can not determine
      whether a request comes from a client or from the director.
      
      To avoid such loops, add a new sysctl flag, backup_only. It can be
      needed for DR/TUN setups that do not need the backup and director
      functions at the same time. When the backup function is enabled we stop
      any forwarding and pass the traffic to the local stack (real server
      mode). The flag disables the director function when the backup function
      is enabled.
      
      For setups that enable the backup function for some virtual services
      and the director function for other virtual services, there should be
      another, more complex solution to support DR/TUN mode, maybe assigning
      a per-virtual-service syncid value so that we can differentiate the
      requests.
      Reported-by: Dmitry Akindinov <dimak@stalker.com>
      Tested-by: German Myzovsky <lawyer@sipnet.ru>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      0c12582f