- 15 1月, 2010 3 次提交
-
-
由 Michael S. Tsirkin 提交于
What it is: vhost net is a character device that can be used to reduce the number of system calls involved in virtio networking. Existing virtio net code is used in the guest without modification. There's similarity with vringfd, with some differences and reduced scope - uses eventfd for signalling - structures can be moved around in memory at any time (good for migration, bug work-arounds in userspace) - write logging is supported (good for migration) - support memory table and not just an offset (needed for kvm) common virtio related code has been put in a separate file vhost.c and can be made into a separate module if/when more backends appear. I used Rusty's lguest.c as the source for developing this part : this supplied me with witty comments I wouldn't be able to write myself. What it is not: vhost net is not a bus, and not a generic new system call. No assumptions are made on how guest performs hypercalls. Userspace hypervisors are supported as well as kvm. How it works: Basically, we connect virtio frontend (configured by userspace) to a backend. The backend could be a network device, or a tap device. Backend is also configured by userspace, including vlan/mac etc. Status: This works for me, and I haven't see any crashes. Compared to userspace, people reported improved latency (as I save up to 4 system calls per packet), as well as better bandwidth and CPU utilization. Features that I plan to look at in the future: - mergeable buffers - zero copy - scalability tuning: figure out the best threading model to use Note on RCU usage (this is also documented in vhost.h, near private_pointer which is the value protected by this variant of RCU): what is happening is that the rcu_dereference() is being used in a workqueue item. The role of rcu_read_lock() is taken on by the start of execution of the workqueue item, of rcu_read_unlock() by the end of execution of the workqueue item, and of synchronize_rcu() by flush_workqueue()/flush_work(). In the future we might need to apply some gcc attribute or sparse annotation to the function passed to INIT_WORK(). Paul's ack below is for this RCU usage. (Includes fixes by Alan Cox <alan@linux.intel.com>, David L Stevens <dlstevens@us.ibm.com>, Chris Wright <chrisw@redhat.com>) Acked-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
Tun device looks similar to a packet socket in that both pass complete frames from/to userspace. This patch fills in enough fields in the socket underlying tun driver to support sendmsg/recvmsg operations, and message flags MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket to modules. Regular read/write behaviour is unchanged. This way, code using raw sockets to inject packets into a physical device, can support injecting packets into host network stack almost without modification. First user of this interface will be vhost virtualization accelerator. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christian Pellegrin 提交于
This patch adds error checking of ctrlmode values for CAN devices. As an example all availabe bits are implemented in the mcp251x driver. Signed-off-by: NChristian Pellegrin <chripell@fsfe.org> Acked-by: NWolfgang Grandegger <wg@grandegger.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2010 2 次提交
-
-
由 Alexey Dobriyan 提交于
Convert code away from ->read_proc/->write_proc interfaces. Switch to proc_create()/proc_create_data() which make addition of proc entries reliable wrt NULL ->proc_fops, NULL ->data and so on. Problem with ->read_proc et al is described here commit 786d7e16 "Fix rmmod/read/write races in /proc entries" [akpm@linux-foundation.org: CONFIG_PROC_FS=n build fix] Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NTilman Schmidt <tilman@imap.cc> Signed-off-by: NKarsten Keil <keil@b1-systems.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Signed-off-by: NDaniel Borkmann <danborkmann@googlemail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 1月, 2010 2 次提交
-
-
由 Oliver Hartkopp 提交于
To prevent the CAN drivers to operate on invalid socketbuffers the skbs are now checked and silently dropped at the xmit-function consistently. Also the netdev stats are consistently using the CAN data length code (dlc) for [rx|tx]_bytes now. Signed-off-by: NOliver Hartkopp <oliver@hartkopp.net> Acked-by: NWolfgang Grandegger <wg@grandegger.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
This patch adds the kernel portions needed to implement RFC 5082 Generalized TTL Security Mechanism (GTSM). It is a lightweight security measure against forged packets causing DoS attacks (for BGP). This is already implemented the same way in BSD kernels. For the necessary Quagga patch http://www.gossamer-threads.com/lists/quagga/dev/17389 Description from Cisco http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html It does add one byte to each socket structure, but I did a little rearrangement to reuse a hole (on 64 bit), but it does grow the structure on 32 bit This should be documented on ip(4) man page and the Glibc in.h file also needs update. IPV6_MINHOPLIMIT should also be added (although BSD doesn't support that). Only TCP is supported, but could also be added to UDP, DCCP, SCTP if desired. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 1月, 2010 1 次提交
-
-
由 Giuseppe CAVALLARO 提交于
Signed-off-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 1月, 2010 4 次提交
-
-
由 Jesper Dangaard Brouer 提交于
This is to be used together with switch technologies, like RFC3069, that where the individual ports are not allowed to communicate with each other, but they are allowed to talk to the upstream router. As described in RFC 3069, it is possible to allow these hosts to communicate through the upstream router by proxy_arp'ing. This patch basically allow proxy arp replies back to the same interface (from which the ARP request/solicitation was received). Tunable per device via proc "proxy_arp_pvlan": /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan This switch technology is known by different vendor names: - In RFC 3069 it is called VLAN Aggregation. - Cisco and Allied Telesyn call it Private VLAN. - Hewlett-Packard call it Source-Port filtering or port-isolation. - Ericsson call it MAC-Forced Forwarding (RFC Draft). Signed-off-by: NJesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rémi Denis-Courmont 提交于
Send aligned pipe payload if requested to do so. Then, the socket buffer needs not be fragmented anymore. Signed-off-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rémi Denis-Courmont 提交于
Newer Nokia cellular modems can use aligned payload for their GPRS pipe. Signed-off-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
When we have L3 tunnels with different inner/outer families (i.e. IPV4/IPV6) which use a multicast address as the outer tunnel destination address, multicast packets will be loopbacked back to the sending socket even if IP*_MULTICAST_LOOP is set to disabled. The mc_loop flag is present in the family specific part of the socket (e.g. the IPv4 or IPv4 specific part). setsockopt sets the inner family mc_loop flag. When the packet is pushed through the L3 tunnel it will eventually be processed by the outer family which if different will check the flag in a different part of the socket then it was set. Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 1月, 2010 1 次提交
-
-
由 Marc Kleine-Budde 提交于
This patch adds the flag CAN_CTRLMODE_ONE_SHOT. It is used as mask or flag in the "struct can_ctrlmode". It allows userspace via netlink to set a CAN controller into the special "one-shot" mode. In this mode, if supported by the CAN controller, it tries only once to deliver a CAN frame and aborts it if an error (e.g.: arbitration lost) happens. Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de> Acked-by: NWolfgang Grandegger <wg@grandegger.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 12月, 2009 1 次提交
-
-
由 Anton Vorontsov 提交于
Since hibernation assumes power loss, we should fully reinitialize PHYs (including platform fixups), as if PHYs were just attached. This patch factors phy_init_hw() out of phy_attach_direct(), then converts mdio_bus to dev_pm_ops and adds an appropriate restore() callback. Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 12月, 2009 7 次提交
-
-
由 Kalle Valo 提交于
To make it easier to notice cases of calling sleeping ops in atomic context, annotate driver-ops.h with appropiate might_sleep() calls. At the same time, also document in mac80211.h the op functions with missing contexts. mac80211 doesn't seem to use get_tx_stats anywhere currently. Just to be on the safe side, I documented it to be atomic, but hopefully the op can be removed in the future. Compile-tested only. Signed-off-by: NKalle Valo <kalle.valo@iki.fi> Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
All its members (vif, mac_addr, type) are now available in the vif struct directly, so we can pass that instead of the conf struct. I generated this patch (except the mac80211 and header file changes) with this semantic patch: @@ identifier conf, fn, hw; type tp; @@ tp fn(struct ieee80211_hw *hw, -struct ieee80211_if_init_conf *conf) +struct ieee80211_vif *vif) { <... ( -conf->type +vif->type | -conf->mac_addr +vif->addr | -conf->vif +vif ) ...> } Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
When, for instance, a new IBSS peer is found, userspace wants to be notified. Add events for all new stations that mac80211 learns about. Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Jouni Malinen 提交于
Add new commands for requesting the driver to remain awake on a specified channel for the specified amount of time (and another command to cancel such an operation). This can be used to implement userspace-controlled off-channel operations, like Public Action frame exchange on another channel than the operation channel. The off-channel operation should behave similarly to scan, i.e. the local station (if associated) moves into power save mode to request the AP to buffer frames for it and then moves to the other channel to allow the off-channel operation to be completed. The duration parameter can be used to request enough time to receive a response from the target station. Signed-off-by: NJouni Malinen <jouni.malinen@atheros.com> Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
Currently, we insert all user-specified IEs before the HT IE for association, and after the HT IE for probe requests. For association, that's correct only if the user-specified IEs are RSN only, incorrect in all other cases including WPA. Change this to split apart the user-specified IEs in two places for association: before the HT IE (e.g. RSN), after the HT IE (generally empty right now I think?) and after WMM (all other vendor-specific IEs). For probes, split the IEs in different places to be correct according to the spec. Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
We've long lacked a good confirmation that frames have really gone out, e.g. before going off-channel for a scan. Add a flush() operation that drivers can implement to provide that confirmation, and use it in a few places: * before scanning sends the nullfunc frames * after scanning sends the nullfunc frames, if any * when going idle, to send any pending frames Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
This removes the remaining users of the rx status 'qual' field and the field itself. Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
- 27 12月, 2009 5 次提交
-
-
由 Octavian Purdila 提交于
Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
For the cases where a lot of interfaces are used in conjunction with a lot of LLC sockets bound to the same SAP, the iteration of the socket list becomes prohibitively expensive. Replacing the list with a a local address based hash significantly improves the bind and listener lookup operations as well as the datagram delivery. Connected sockets delivery is also improved, but this patch does not address the case where we have lots of sockets with the same local address connected to different remote addresses. In order to keep the socket sanity checks alive and fast a socket counter was added to the SAP structure. Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
This patch adds a per SAP device based hash table to solve the multicast delivery scalability issue when we have large number of interfaces and a large number of sockets bound to the same SAP. Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism, which require some extra checks in the lookup code: a) If the current socket was released, reallocated & inserted in another list it will short circuit the iteration for the current list, thus we need to restart the lookup. b) If the current socket was released, reallocated & inserted in the same list we just need to recheck it matches the look-up criteria and if not we can skip to the next element. In this case there is no need to restart the lookup, since sockets are inserted at the start of the list and the worst that will happen is that we will iterate throught some of the list elements more then once. Note that the /proc and multicast delivery was not yet converted to RCU, it still uses spinlocks for protection. Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 12月, 2009 1 次提交
-
-
由 Jamal Hadi Salim 提交于
when using policy routing and the skb mark: there are cases where a back path validation requires us to use a different routing table for src ip validation than the one used for mapping ingress dst ip. One such a case is transparent proxying where we pretend to be the destination system and therefore the local table is used for incoming packets but possibly a main table would be used on outbound. Make the default behavior to allow the above and if users need to turn on the symmetry via sysctl src_valid_mark Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 12月, 2009 8 次提交
-
-
由 laurent chavey 提交于
Add rtnetlink init_rcvwnd to set the TCP initial receive window size advertised by passive and active TCP connections. The current Linux TCP implementation limits the advertised TCP initial receive window to the one prescribed by slow start. For short lived TCP connections used for transaction type of traffic (i.e. http requests), bounding the advertised TCP initial receive window results in increased latency to complete the transaction. Support for setting initial congestion window is already supported using rtnetlink init_cwnd, but the feature is useless without the ability to set a larger TCP initial receive window. The rtnetlink init_rcvwnd allows increasing the TCP initial receive window, allowing TCP connection to advertise larger TCP receive window than the ones bounded by slow start. Signed-off-by: NLaurent Chavey <chavey@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Krishna Kumar 提交于
tcp_push checks tcp_send_head and calls __tcp_push_pending_frames, which again checks tcp_send_head, and this unnecessary check is done for every other caller of __tcp_push_pending_frames. Remove tcp_send_head check in __tcp_push_pending_frames and add the check to tcp_push_pending_frames. Other functions call __tcp_push_pending_frames only when tcp_send_head would evaluate to true. Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Greg Kroah-Hartman 提交于
DST is dead, no one is using it and upstream has abandoned it, so remove it from the tree because it is not going anywhere. Acked-by: NEvgeniy Polyakov <zbr@ioremap.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Phil Carmody 提交于
Many struct driver_attribute descriptors are purely read-only structures, and there's no need to change them. Therefore make the promise not to, which will let those descriptors be put in a ro section. Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Phil Carmody 提交于
Many struct bin_attribute descriptors are purely read-only structures, and there's no need to change them. Therefore make the promise not to, which will let those descriptors be put in a ro section. Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Phil Carmody 提交于
Most device_attributes are const, and are begging to be put in a ro section. However, the create and remove file interfaces were failing to propagate the const promise which the only functions they call offer. Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Randy Dunlap 提交于
Fix kernel-doc errors and warnings in new header file kfifo.h. Don't use kernel-doc "/**" for internal functions whose comments are not in kernel-doc format. kernel-doc section header names (like "Note:") must be unique per function. Looks like I need to document that. Error(include/linux/kfifo.h:76): duplicate section name 'Note' Warning(include/linux/kfifo.h:88): Excess function parameter 'size' description in 'INIT_KFIFO' Error(include/linux/kfifo.h:101): duplicate section name 'Note' Warning(include/linux/kfifo.h:257): No description found for parameter 'fifo' (many of this last type, from internal functions) Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefani Seibold 提交于
The USB serial code was a new user of the kfifo API, and it was missed when porting things to the new kfifo API. Please make the write_fifo in place. Here is my patch to fix the regression and full ported version. Signed-off-by: NStefani Seibold <stefani@seibold.net> Reported-and-tested-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2009 5 次提交
-
-
由 Eric Sandeen 提交于
Use a separate lock to protect s_groups_count and the other block group descriptors which get changed via an on-line resize operation, so we can stop overloading the use of lock_super(). Port of ext4 commit 32ed5058 by Theodore Ts'o <tytso@mit.edu>. CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Eric Sandeen 提交于
Use a separate lock to protect the orphan list, so we can stop overloading the use of lock_super(). Port of ext4 commit 3b9d4ed2 by Theodore Ts'o <tytso@mit.edu>. CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Dmitry Monakhov 提交于
Currently inode_reservation is managed by fs itself and this reservation is transfered on dquot_transfer(). This means what inode_reservation must always be in sync with dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be triggered) This is not easy because of complex locking order issues for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 The patch introduce quota reservation field for each fs-inode (fs specific inode is used in order to prevent bloating generic vfs inode). This reservation is managed by quota code internally similar to i_blocks/i_bytes and may not be always in sync with internal fs reservation. Also perform some code rearrangement: - Unify dquot_reserve_space() and dquot_reserve_space() - Unify dquot_release_reserved_space() and dquot_free_space() - Also this patch add missing warning update to release_rsv() dquot_release_reserved_space() must call flush_warnings() as dquot_free_space() does. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Dmitry Monakhov 提交于
Quota code requires unlocked version of this function. Off course we can just copy-paste the code, but copy-pasting is always an evil. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Dmitry Monakhov 提交于
Currently all quota block reservation macros contains hardcoded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz>
-