Commits · 3a4d5c94e959359ece6d6b55045c3f046677f55c · openeuler / Kernel

15 Jan, 2010 3 commits

vhost_net: a kernel-level virtio server · 3a4d5c94

Committed by Michael S. Tsirkin on Jan 14, 2010

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope:
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

Common virtio-related code has been put in a separate file, vhost.c, and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part: this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device.  Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item.  The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.

(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a4d5c94

tun: export underlying socket · 05c2828c

Committed by Michael S. Tsirkin on Jan 14, 2010

Tun device looks similar to a packet socket
in that both pass complete frames from/to userspace.

This patch fills in enough fields in the socket underlying tun driver
to support sendmsg/recvmsg operations, and message flags
MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
to modules.  Regular read/write behaviour is unchanged.

This way, code using raw sockets to inject packets
into a physical device, can support injecting
packets into host network stack almost without modification.

First user of this interface will be vhost virtualization
accelerator.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05c2828c

can: Proper ctrlmode handling for CAN devices · ad72c347

Committed by Christian Pellegrin on Jan 14, 2010

This patch adds error checking of ctrlmode values for CAN devices. As
an example all available bits are implemented in the mcp251x driver.
Signed-off-by: Christian Pellegrin <chripell@fsfe.org>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad72c347

14 Jan, 2010 2 commits

proc_fops: convert drivers/isdn/ to seq_file · 9a58a80a

Committed by Alexey Dobriyan on Jan 14, 2010

Convert code away from ->read_proc/->write_proc interfaces.  Switch to
proc_create()/proc_create_data() which make addition of proc entries
reliable wrt NULL ->proc_fops, NULL ->data and so on.

The problem with ->read_proc et al. is described in commit
786d7e16 "Fix rmmod/read/write races in
/proc entries"

[akpm@linux-foundation.org: CONFIG_PROC_FS=n build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a58a80a

netpoll: allow execution of multiple rx_hooks per interface · 508e14b4

Committed by Daniel Borkmann on Jan 12, 2010

Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

508e14b4

12 Jan, 2010 2 commits

can: Unify droping of invalid tx skbs and netdev stats · 3ccd4c61

Committed by Oliver Hartkopp on Jan 12, 2010

To prevent the CAN drivers from operating on invalid socket buffers, the skbs
are now checked and silently dropped at the xmit function consistently.

Also the netdev stats are consistently using the CAN data length code (dlc)
for [rx|tx]_bytes now.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ccd4c61

tcp: Generalized TTL Security Mechanism · d218d111

Committed by Stephen Hemminger on Jan 11, 2010

This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP). 

This is already implemented the same way in BSD kernels.
For the necessary Quagga patch, see
  http://www.gossamer-threads.com/lists/quagga/dev/17389

Description from Cisco
  http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html

It adds one byte to each socket structure. I did a little
rearrangement to reuse a hole on 64-bit, but the structure
does grow on 32-bit.

This should be documented in the ip(4) man page, and the glibc in.h
file also needs an update.  IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).

Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d218d111
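
A minimal userspace sketch of how a daemon (for example a BGP speaker) might use the mechanism described in commit d218d111 above. It assumes the option is exposed to applications as `IP_MINTTL` (value 21 in the Linux headers; the fallback define only covers older libc headers). `gtsm_enable` is a hypothetical helper name, not part of any library.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for libc headers that predate the option (assumed value: 21). */
#ifndef IP_MINTTL
#define IP_MINTTL 21
#endif

/*
 * Hypothetical helper: send with the maximum TTL and ask the kernel to
 * discard received packets whose TTL is below min_ttl.
 * For a directly connected peer, min_ttl would be 255.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt()).
 */
int gtsm_enable(int fd, int min_ttl)
{
    int max_ttl = 255;

    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &max_ttl, sizeof(max_ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &min_ttl, sizeof(min_ttl));
}
```

Both peers have to send with TTL 255 for the check to be meaningful, which is why the sketch sets `IP_TTL` as well.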

08 Jan, 2010 1 commit

stmmac: add the new Header file for stmmac platform data · 3c9732c0

Committed by Giuseppe CAVALLARO on Jan 06, 2010

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c9732c0

07 Jan, 2010 4 commits

net: RFC3069, private VLAN proxy arp support · 65324144

Committed by Jesper Dangaard Brouer on Jan 05, 2010

This is to be used together with switch technologies, like RFC3069,
where the individual ports are not allowed to communicate with
each other, but they are allowed to talk to the upstream router.  As
described in RFC 3069, it is possible to allow these hosts to
communicate through the upstream router by proxy_arp'ing.

This patch basically allows proxy arp replies back to the same
interface (from which the ARP request/solicitation was received).

Tunable per device via proc "proxy_arp_pvlan":
  /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan

This switch technology is known by different vendor names:
 - In RFC 3069 it is called VLAN Aggregation.
 - Cisco and Allied Telesyn call it Private VLAN.
 - Hewlett-Packard call it Source-Port filtering or port-isolation.
 - Ericsson call it MAC-Forced Forwarding (RFC Draft).
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

65324144
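
A small sketch of toggling the per-device tunable from commit 65324144 above by writing to its proc file. The helper names `pvlan_sysctl_path` and `pvlan_set` are hypothetical, and the write is assumed to need root privileges and a kernel that contains this patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the proc path for one interface; returns 0, or -1 if buf is too small. */
int pvlan_sysctl_path(char *buf, size_t len, const char *ifname)
{
    int n = snprintf(buf, len,
                     "/proc/sys/net/ipv4/conf/%s/proxy_arp_pvlan", ifname);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the tunable; returns 0 on success, -1 on any failure. */
int pvlan_set(const char *ifname, int enable)
{
    char path[128];
    FILE *f;
    int ok;

    if (pvlan_sysctl_path(path, sizeof(path), ifname) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as `pvlan_set("eth0", 1)` would then enable the behaviour on `eth0` only, leaving other interfaces untouched.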

Phonet: zero-copy GPRS TX · fea93ece

Committed by Rémi Denis-Courmont on Jan 04, 2010

Send aligned pipe payload if requested to do so. Then, the socket buffer
need not be fragmented anymore.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fea93ece

Phonet: zero-copy aligned GPRS RX · fc6a1107

Committed by Rémi Denis-Courmont on Jan 04, 2010

Newer Nokia cellular modems can use aligned payload for their GPRS pipe.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc6a1107

ip: fix mc_loop checks for tunnels with multicast outer addresses · 7ad6848c

Committed by Octavian Purdila on Jan 06, 2010

When we have L3 tunnels with different inner/outer families
(i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
destination address, multicast packets will be looped back to the
sending socket even if IP*_MULTICAST_LOOP is set to disabled.

The mc_loop flag is present in the family specific part of the socket
(e.g. the IPv4 or IPv6 specific part).  setsockopt sets the inner
family mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family which, if different,
will check the flag in a different part of the socket than where it was set.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ad6848c

04 Jan, 2010 1 commit

can/netlink: add CAN_CTRLMODE_ONE_SHOT · c1c5523d

Committed by Marc Kleine-Budde on Dec 23, 2009

This patch adds the flag CAN_CTRLMODE_ONE_SHOT. It is used as mask
or flag in the "struct can_ctrlmode".

It allows userspace via netlink to set a CAN controller into the special
"one-shot" mode. In this mode, if supported by the CAN controller, it
tries only once to deliver a CAN frame and aborts it if an error
(e.g.: arbitration lost) happens.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1c5523d
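
A rough sketch of the mask/flags semantics of "struct can_ctrlmode" described in commit c1c5523d above, combined with the kind of ctrlmode error checking added by commit ad72c347. The flag values are assumed to match linux/can/netlink.h and are redefined locally so the sketch builds without kernel headers. `struct ctrlmode_req` and `ctrlmode_apply` are hypothetical names, and this is not the kernel's actual code.

```c
#include <stdint.h>

/* Assumed to match linux/can/netlink.h; local copies for a standalone build. */
#define CAN_CTRLMODE_LOOPBACK   0x01
#define CAN_CTRLMODE_LISTENONLY 0x02
#define CAN_CTRLMODE_3_SAMPLES  0x04
#define CAN_CTRLMODE_ONE_SHOT   0x08

/* Same shape as "struct can_ctrlmode": mask selects bits, flags gives values. */
struct ctrlmode_req {
    uint32_t mask;
    uint32_t flags;
};

/*
 * Hypothetical helper: reject requests that set bits the controller does
 * not support, otherwise update only the bits selected by the mask.
 * Returns 0 on success, -1 if an unsupported mode was requested.
 */
int ctrlmode_apply(uint32_t *current, uint32_t supported, struct ctrlmode_req req)
{
    if (req.flags & ~supported)
        return -1;
    *current = (*current & ~req.mask) | (req.flags & req.mask);
    return 0;
}
```

With this shape, a request of `{ CAN_CTRLMODE_ONE_SHOT, CAN_CTRLMODE_ONE_SHOT }` switches one-shot mode on and `{ CAN_CTRLMODE_ONE_SHOT, 0 }` switches it off, in both cases leaving the other mode bits as they were.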

31 Dec, 2009 1 commit

phylib: Properly reinitialize PHYs after hibernation · 2f5cb434

Committed by Anton Vorontsov on Dec 30, 2009

Since hibernation assumes power loss, we should fully reinitialize
PHYs (including platform fixups), as if PHYs were just attached.

This patch factors phy_init_hw() out of phy_attach_direct(), then
converts mdio_bus to dev_pm_ops and adds an appropriate restore()
callback.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f5cb434

29 Dec, 2009 7 commits

mac80211: annotate sleeping driver ops · e1781ed3

Committed by Kalle Valo on Dec 23, 2009

To make it easier to notice cases of calling sleeping ops in atomic context,
annotate driver-ops.h with appropriate might_sleep() calls. At the same time,
also document in mac80211.h the op functions with missing contexts.

mac80211 doesn't seem to use get_tx_stats anywhere currently. Just to be on
the safe side, I documented it to be atomic, but hopefully the op can be
removed in the future.

Compile-tested only.
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

e1781ed3

mac80211: remove struct ieee80211_if_init_conf · 1ed32e4f

Committed by Johannes Berg on Dec 23, 2009

All its members (vif, mac_addr, type) are now available
in the vif struct directly, so we can pass that instead
of the conf struct. I generated this patch (except the
mac80211 and header file changes) with this semantic
patch:

@@
identifier conf, fn, hw;
type tp;
@@
tp fn(struct ieee80211_hw *hw,
-struct ieee80211_if_init_conf *conf)
+struct ieee80211_vif *vif)
{
<...
(
-conf->type
+vif->type
|
-conf->mac_addr
+vif->addr
|
-conf->vif
+vif
)
...>
}
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

1ed32e4f

mac80211/cfg80211: add station events · 98b62183

Committed by Johannes Berg on Dec 23, 2009

When, for instance, a new IBSS peer is found, userspace
wants to be notified. Add events for all new stations
that mac80211 learns about.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

98b62183

cfg80211: add remain-on-channel command · 9588bbd5

Committed by Jouni Malinen on Dec 23, 2009

Add new commands for requesting the driver to remain awake
on a specified channel for the specified amount of time
(and another command to cancel such an operation). This
can be used to implement userspace-controlled off-channel
operations, like Public Action frame exchange on another
channel than the operation channel.

The off-channel operation should behave similarly to scan,
i.e. the local station (if associated) moves into power
save mode to request the AP to buffer frames for it and
then moves to the other channel to allow the off-channel
operation to be completed. The duration parameter can be
used to request enough time to receive a response from
the target station.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

9588bbd5

mac80211: split up and insert custom IEs correctly · 8e664fb3

Committed by Johannes Berg on Dec 23, 2009

Currently, we insert all user-specified IEs before the HT
IE for association, and after the HT IE for probe requests.
For association, that's correct only if the user-specified
IEs are RSN only, incorrect in all other cases including
WPA. Change this to split apart the user-specified IEs in
two places for association: before the HT IE (e.g. RSN),
after the HT IE (generally empty right now I think?) and
after WMM (all other vendor-specific IEs). For probes,
split the IEs in different places to be correct according
to the spec.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

8e664fb3

mac80211: introduce flush operation · a80f7c0b

Committed by Johannes Berg on Dec 23, 2009

We've long lacked a good confirmation that frames
have really gone out, e.g. before going off-channel
for a scan. Add a flush() operation that drivers
can implement to provide that confirmation, and use
it in a few places:
 * before scanning sends the nullfunc frames
 * after scanning sends the nullfunc frames, if any
 * when going idle, to send any pending frames
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

a80f7c0b

wireless: remove remaining qual code · 671adc93

Committed by Johannes Berg on Dec 23, 2009

This removes the remaining users of the rx status
'qual' field and the field itself.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

671adc93

27 Dec, 2009 5 commits

llc: convert llc_sap_list to RCU · 8beb9ab6

Committed by Octavian Purdila on Dec 26, 2009

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8beb9ab6

llc: replace the socket list with a local address based hash · 52d58aef

Committed by Octavian Purdila on Dec 26, 2009

For the cases where a lot of interfaces are used in conjunction with a
lot of LLC sockets bound to the same SAP, the iteration of the socket
list becomes prohibitively expensive.

Replacing the list with a local address based hash significantly
improves the bind and listener lookup operations as well as the
datagram delivery.

Connected sockets delivery is also improved, but this patch does not
address the case where we have lots of sockets with the same local
address connected to different remote addresses.

In order to keep the socket sanity checks alive and fast a socket
counter was added to the SAP structure.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52d58aef

llc: use a device based hash table to speed up multicast delivery · 6d2e3ea2

Committed by Octavian Purdila on Dec 26, 2009

This patch adds a per SAP device based hash table to solve the
multicast delivery scalability issue when we have a large number of
interfaces and a large number of sockets bound to the same SAP.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d2e3ea2

llc: convert the socket list to RCU locking · b76f5a84

Committed by Octavian Purdila on Dec 26, 2009

For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
which requires some extra checks in the lookup code:

a) If the current socket was released, reallocated & inserted in
another list it will short circuit the iteration for the current list,
thus we need to restart the lookup.

b) If the current socket was released, reallocated & inserted in the
same list we just need to recheck it matches the look-up criteria and
if not we can skip to the next element.

In this case there is no need to restart the lookup, since sockets are
inserted at the start of the list and the worst that will happen is
that we will iterate through some of the list elements more than
once.

Note that the /proc and multicast delivery code was not yet converted to
RCU; it still uses spinlocks for protection.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b76f5a84

llc: add support for LLC_OPT_PKTINFO · e5cd6fe3

Committed by Octavian Purdila on Dec 26, 2009

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5cd6fe3

26 Dec, 2009 1 commit

net: restore ip source validation · 28f6aeea

Committed by Jamal Hadi Salim on Dec 25, 2009

When using policy routing and the skb mark,
there are cases where back path validation requires us
to use a different routing table for src ip validation than
the one used for mapping the ingress dst ip.
One such case is transparent proxying, where we pretend to be
the destination system and therefore the local table
is used for incoming packets but possibly a main table would
be used on outbound.
Make the default behavior allow the above; users who
need the symmetry can turn it on via the sysctl src_valid_mark.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

28f6aeea

24 Dec, 2009 8 commits

net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926

Committed by laurent chavey on Dec 15, 2009

Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short-lived
TCP connections used for transaction-type traffic (e.g. HTTP
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Setting the initial congestion window is already supported
using rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, allowing TCP connection to advertise larger TCP receive window
than the ones bounded by slow start.
Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31d12926

tcp: Remove check in __tcp_push_pending_frames · 12d50c46

Committed by Krishna Kumar on Dec 08, 2009

tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
which again checks tcp_send_head, and this unnecessary check is
done for every other caller of __tcp_push_pending_frames.

Remove tcp_send_head check in __tcp_push_pending_frames and add
the check to tcp_push_pending_frames. Other functions call
__tcp_push_pending_frames only when tcp_send_head would evaluate
to true.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

12d50c46
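
The refactoring in commit 12d50c46 above can be pictured with a simplified userspace model. This is only an illustration of moving a precondition check from an inner function into its wrapper. `struct sketch_sock` and both function names are hypothetical stand-ins, not the kernel's TCP code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket with a send queue head. */
struct sketch_sock {
    const void *send_head; /* NULL when nothing is queued */
    int pushes;            /* counts how often the inner function ran */
};

/* Inner function: no check here; callers guarantee send_head is non-NULL. */
void sketch_push_pending_frames_inner(struct sketch_sock *sk)
{
    sk->pushes++;
}

/* Wrapper: the only place where the queue-head check is still performed. */
void sketch_push_pending_frames(struct sketch_sock *sk)
{
    if (sk->send_head != NULL)
        sketch_push_pending_frames_inner(sk);
}
```

Callers that have already tested the queue head call the inner function directly and avoid a second test, while everyone else goes through the wrapper.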

Staging: dst: remove from the tree · 29d249ed

Committed by Greg Kroah-Hartman on Dec 18, 2009

DST is dead, no one is using it and upstream
has abandoned it, so remove it from the tree because
it is not going anywhere.
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

29d249ed

Driver core: driver_attribute parameters can often be const* · 099c2f21

Committed by Phil Carmody on Dec 18, 2009

Many struct driver_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore make
the promise not to, which will let those descriptors be put in
a ro section.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

099c2f21

Driver core: bin_attribute parameters can often be const* · 66ecb92b

Committed by Phil Carmody on Dec 18, 2009

Many struct bin_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore
make the promise not to, which will let those descriptors
be put in a ro section.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

66ecb92b

Driver core: device_attribute parameters can often be const* · 26579ab7

Committed by Phil Carmody on Dec 18, 2009

Most device_attributes are const, and are begging to be
put in a ro section. However, the create and remove
file interfaces were failing to propagate the const promise
which the only functions they call offer.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

26579ab7

kfifo: fix Error/broken kernel-doc notation · 9c717de9

Committed by Randy Dunlap on Dec 23, 2009

Fix kernel-doc errors and warnings in new header file kfifo.h.
Don't use kernel-doc "/**" for internal functions whose comments
are not in kernel-doc format.

kernel-doc section header names (like "Note:") must be unique
per function.  Looks like I need to document that.

  Error(include/linux/kfifo.h:76): duplicate section name 'Note'
  Warning(include/linux/kfifo.h:88): Excess function parameter 'size' description in 'INIT_KFIFO'
  Error(include/linux/kfifo.h:101): duplicate section name 'Note'
  Warning(include/linux/kfifo.h:257): No description found for parameter 'fifo'
    (many of this last type, from internal functions)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c717de9

Fix usb_serial_probe() problem introduced by the recent kfifo changes · 119eecc8

Committed by Stefani Seibold on Dec 23, 2009

The USB serial code was a new user of the kfifo API, and it was missed
when porting things to the new kfifo API.

Please make the write_fifo in place.  Here is my patch to fix the
regression and full ported version.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Reported-and-tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

119eecc8

23 Dec, 2009 5 commits

ext3: Replace lock/unlock_super() with an explicit lock for resizing · 96d2a495

Committed by Eric Sandeen on Dec 14, 2009

Use a separate lock to protect s_groups_count and the other block
group descriptors which get changed via an on-line resize operation,
so we can stop overloading the use of lock_super().

Port of ext4 commit 32ed5058 by
Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>

96d2a495

ext3: Replace lock/unlock_super() with an explicit lock for the orphan list · b8a052d0

Committed by Eric Sandeen on Dec 14, 2009

Use a separate lock to protect the orphan list, so we can stop
overloading the use of lock_super().

Port of ext4 commit 3b9d4ed2
by Theodore Ts'o <tytso@mit.edu>.

CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>

b8a052d0

quota: decouple fs reserved space from quota reservation · fd8fbfc1

Committed by Dmitry Monakhov on Dec 14, 2009

Currently inode_reservation is managed by the fs itself, and this
reservation is transferred on dquot_transfer(). This means that
inode_reservation must always be in sync with
dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result
in incorrect quota (the WARN_ON in dquot_claim_reserved_space() will be
triggered).
This is not easy because of complex locking order issues;
for an example, see http://bugzilla.kernel.org/show_bug.cgi?id=14739

The patch introduces a quota reservation field for each fs-inode
(an fs-specific inode is used in order to prevent bloating the generic
vfs inode). This reservation is managed by the quota code internally,
similar to i_blocks/i_bytes, and may not always be in sync with
the internal fs reservation.

Also perform some code rearrangement:
- Unify dquot_alloc_space() and dquot_reserve_space()
- Unify dquot_release_reserved_space() and dquot_free_space()
- Also this patch adds the missing warning update to release_rsv():
  dquot_release_reserved_space() must call flush_warnings() as
  dquot_free_space() does.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

fd8fbfc1

Add unlocked version of inode_add_bytes() function · b462707e

Committed by Dmitry Monakhov on Dec 14, 2009

Quota code requires an unlocked version of this function. Of course
we can just copy-paste the code, but copy-pasting is always an evil.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

b462707e
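
The pattern behind commit b462707e above, a locked function implemented as a thin wrapper around an unlocked variant so the body is not duplicated, can be sketched in userspace as follows. This is an illustration under stated assumptions: `struct byte_counter` is a hypothetical stand-in for the inode's block and byte counters, a pthread mutex replaces the kernel lock, and the function names are made up for the sketch.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields involved. */
struct byte_counter {
    pthread_mutex_t lock;
    uint64_t blocks; /* whole 512-byte blocks */
    uint32_t bytes;  /* remainder, always kept below 512 */
};

/* Unlocked variant: the caller must already hold c->lock. */
void counter_add_bytes_unlocked(struct byte_counter *c, uint64_t n)
{
    c->blocks += n >> 9;
    c->bytes += (uint32_t)(n & 511);
    if (c->bytes >= 512) {
        c->blocks++;
        c->bytes -= 512;
    }
}

/* Locked variant: takes the lock and reuses the unlocked body. */
void counter_add_bytes(struct byte_counter *c, uint64_t n)
{
    pthread_mutex_lock(&c->lock);
    counter_add_bytes_unlocked(c, n);
    pthread_mutex_unlock(&c->lock);
}
```

Code that already holds the lock for a larger update, as the quota code in the commit message would, calls the unlocked variant directly. All other callers keep using the locked one.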

ext3: quota macros cleanup [V2] · c459001f

Committed by Dmitry Monakhov on Dec 09, 2009

Currently all quota block reservation macros contain a hardcoded "2",
aka the MAXQUOTAS value. This is no good because in some places it is not
obvious what this digit represents. Let's introduce
a new macro with a self-descriptive name.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

c459001f
