提交 05cfbd66 编写于 作者: I Ingo Molnar

Merge branch 'core/urgent' into core/rcu

Merge reason: new patches to be queued up depend on:

   ef631b0c: rcu: Make hierarchical RCU less IPI-happy
Signed-off-by: NIngo Molnar <mingo@elte.hu>

要显示的变更太多。

To preserve performance only 1000 of 1000+ files are displayed.
...@@ -495,6 +495,11 @@ S: Kopmansg 2 ...@@ -495,6 +495,11 @@ S: Kopmansg 2
S: 411 13 Goteborg S: 411 13 Goteborg
S: Sweden S: Sweden
N: Paul Bristow
E: paul@paulbristow.net
W: http://paulbristow.net/linux/idefloppy.html
D: Maintainer of IDE/ATAPI floppy driver
N: Dominik Brodowski N: Dominik Brodowski
E: linux@brodo.de E: linux@brodo.de
W: http://www.brodo.de/ W: http://www.brodo.de/
...@@ -1407,8 +1412,8 @@ P: 1024D/77D4FC9B F5C5 1C20 1DFC DEC3 3107 54A4 2332 ADFC 77D4 FC9B ...@@ -1407,8 +1412,8 @@ P: 1024D/77D4FC9B F5C5 1C20 1DFC DEC3 3107 54A4 2332 ADFC 77D4 FC9B
D: National Language Support D: National Language Support
D: Linux Internationalization Project D: Linux Internationalization Project
D: German Localization for Linux and GNU software D: German Localization for Linux and GNU software
S: Kriemhildring 12a S: Auf der Fittel 18
S: 65795 Hattersheim am Main S: 53347 Alfter
S: Germany S: Germany
N: Christoph Hellwig N: Christoph Hellwig
...@@ -2642,6 +2647,10 @@ S: C/ Mieses 20, 9-B ...@@ -2642,6 +2647,10 @@ S: C/ Mieses 20, 9-B
S: Valladolid 47009 S: Valladolid 47009
S: Spain S: Spain
N: Gadi Oxman
E: gadio@netvision.net.il
D: Original author and maintainer of IDE/ATAPI floppy/tape drivers
N: Greg Page N: Greg Page
E: gpage@sovereign.org E: gpage@sovereign.org
D: IPX development and support D: IPX development and support
...@@ -3571,6 +3580,12 @@ N: Dirk Verworner ...@@ -3571,6 +3580,12 @@ N: Dirk Verworner
D: Co-author of German book ``Linux-Kernel-Programmierung'' D: Co-author of German book ``Linux-Kernel-Programmierung''
D: Co-founder of Berlin Linux User Group D: Co-founder of Berlin Linux User Group
N: Riku Voipio
E: riku.voipio@iki.fi
D: Author of PCA9532 LED and Fintek f75375s hwmon driver
D: Some random ARM board patches
S: Finland
N: Patrick Volkerding N: Patrick Volkerding
E: volkerdi@ftp.cdrom.com E: volkerdi@ftp.cdrom.com
D: Produced the Slackware distribution, updated the SVGAlib D: Produced the Slackware distribution, updated the SVGAlib
......
...@@ -86,6 +86,8 @@ cachetlb.txt ...@@ -86,6 +86,8 @@ cachetlb.txt
- describes the cache/TLB flushing interfaces Linux uses. - describes the cache/TLB flushing interfaces Linux uses.
cdrom/ cdrom/
- directory with information on the CD-ROM drivers that Linux has. - directory with information on the CD-ROM drivers that Linux has.
cgroups/
- cgroups features, including cpusets and memory controller.
connector/ connector/
- docs on the netlink based userspace<->kernel space communication mod. - docs on the netlink based userspace<->kernel space communication mod.
console/ console/
...@@ -98,8 +100,6 @@ cpu-load.txt ...@@ -98,8 +100,6 @@ cpu-load.txt
- document describing how CPU load statistics are collected. - document describing how CPU load statistics are collected.
cpuidle/ cpuidle/
- info on CPU_IDLE, CPU idle state management subsystem. - info on CPU_IDLE, CPU idle state management subsystem.
cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
cputopology.txt cputopology.txt
- documentation on how CPU topology info is exported via sysfs. - documentation on how CPU topology info is exported via sysfs.
cris/ cris/
......
What: /sys/kernel/debug/kmemtrace/
Date: July 2008
Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Description:
In kmemtrace-enabled kernels, the following files are created:
/sys/kernel/debug/kmemtrace/
cpu<n> (0400) Per-CPU tracing data, see below. (binary)
total_overruns (0400) Total number of bytes which were dropped from
cpu<n> files because of full buffer condition,
non-binary. (text)
abi_version (0400) Kernel's kmemtrace ABI version. (text)
Each per-CPU file should be read according to the relay interface. That is,
the reader should set affinity to that specific CPU and, as currently done by
the userspace application (though there are other methods), use poll() with
an infinite timeout before every read(). Otherwise, erroneous data may be
read. The binary data has the following _core_ format:
Event ID (1 byte) Unsigned integer, one of:
0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
1 - represents a freeing of previously allocated memory
(KMEMTRACE_EVENT_FREE)
Type ID (1 byte) Unsigned integer, one of:
0 - this is a kmalloc() / kfree()
1 - this is a kmem_cache_alloc() / kmem_cache_free()
2 - this is a __get_free_pages() et al.
Event size (2 bytes) Unsigned integer representing the
size of this event. Used to extend
kmemtrace. Discard the bytes you
don't know about.
Sequence number (4 bytes) Signed integer used to reorder data
logged on SMP machines. Wraparound
must be taken into account, although
it is unlikely.
Caller address (8 bytes) Return address to the caller.
Pointer to mem (8 bytes) Pointer to target memory area. Can be
NULL, but not all such calls might be
recorded.
In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:
Requested bytes (8 bytes) Total number of requested bytes,
unsigned, must not be zero.
Allocated bytes (8 bytes) Total number of actually allocated
bytes, unsigned, must not be lower
than requested bytes.
Requested flags (4 bytes) GFP flags supplied by the caller.
Target CPU (4 bytes) Signed integer, valid for event id 1.
If equal to -1, target CPU is the same
as origin CPU, but the reverse might
not be true.
The data is made available in the same endianness the machine has.
Other event ids and type ids may be defined and added. Other fields may be
added by increasing event size, but see below for details.
Every modification to the ABI, including new id definitions, are followed
by bumping the ABI version by one.
Adding new data to the packet (features) is done at the end of the mandatory
data:
Feature size (2 byte)
Feature ID (1 byte)
Feature data (Feature size - 3 bytes)
Users:
kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
What: /debug/pktcdvd/pktcdvd[0-7] What: /sys/kernel/debug/pktcdvd/pktcdvd[0-7]
Date: Oct. 2006 Date: Oct. 2006
KernelVersion: 2.6.20 KernelVersion: 2.6.20
Contact: Thomas Maier <balagi@justmail.de> Contact: Thomas Maier <balagi@justmail.de>
...@@ -10,10 +10,10 @@ debugfs interface ...@@ -10,10 +10,10 @@ debugfs interface
The pktcdvd module (packet writing driver) creates The pktcdvd module (packet writing driver) creates
these files in debugfs: these files in debugfs:
/debug/pktcdvd/pktcdvd[0-7]/ /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/
info (0444) Lots of driver statistics and infos. info (0444) Lots of driver statistics and infos.
Example: Example:
------- -------
cat /debug/pktcdvd/pktcdvd0/info cat /sys/kernel/debug/pktcdvd/pktcdvd0/info
...@@ -41,6 +41,49 @@ Description: ...@@ -41,6 +41,49 @@ Description:
for the device and attempt to bind to it. For example: for the device and attempt to bind to it. For example:
# echo "8086 10f5" > /sys/bus/pci/drivers/foo/new_id # echo "8086 10f5" > /sys/bus/pci/drivers/foo/new_id
What: /sys/bus/pci/drivers/.../remove_id
Date: February 2009
Contact: Chris Wright <chrisw@sous-sol.org>
Description:
Writing a device ID to this file will remove an ID
that was dynamically added via the new_id sysfs entry.
The format for the device ID is:
VVVV DDDD SVVV SDDD CCCC MMMM. That is Vendor ID, Device
ID, Subsystem Vendor ID, Subsystem Device ID, Class,
and Class Mask. The Vendor ID and Device ID fields are
required, the rest are optional. After successfully
removing an ID, the driver will no longer support the
device. This is useful to ensure auto probing won't
match the driver to the device. For example:
# echo "8086 10f5" > /sys/bus/pci/drivers/foo/remove_id
What: /sys/bus/pci/rescan
Date: January 2009
Contact: Linux PCI developers <linux-pci@vger.kernel.org>
Description:
Writing a non-zero value to this attribute will
force a rescan of all PCI buses in the system, and
re-discover previously removed devices.
Depends on CONFIG_HOTPLUG.
What: /sys/bus/pci/devices/.../remove
Date: January 2009
Contact: Linux PCI developers <linux-pci@vger.kernel.org>
Description:
Writing a non-zero value to this attribute will
hot-remove the PCI device and any of its children.
Depends on CONFIG_HOTPLUG.
What: /sys/bus/pci/devices/.../rescan
Date: January 2009
Contact: Linux PCI developers <linux-pci@vger.kernel.org>
Description:
Writing a non-zero value to this attribute will
force a rescan of the device's parent bus and all
child buses, and re-discover devices removed earlier
from this part of the device tree.
Depends on CONFIG_HOTPLUG.
What: /sys/bus/pci/devices/.../vpd What: /sys/bus/pci/devices/.../vpd
Date: February 2008 Date: February 2008
Contact: Ben Hutchings <bhutchings@solarflare.com> Contact: Ben Hutchings <bhutchings@solarflare.com>
...@@ -52,3 +95,30 @@ Description: ...@@ -52,3 +95,30 @@ Description:
that some devices may have malformatted data. If the that some devices may have malformatted data. If the
underlying VPD has a writable section then the underlying VPD has a writable section then the
corresponding section of this file will be writable. corresponding section of this file will be writable.
What: /sys/bus/pci/devices/.../virtfnN
Date: March 2009
Contact: Yu Zhao <yu.zhao@intel.com>
Description:
This symbolic link appears when hardware supports the SR-IOV
capability and the Physical Function driver has enabled it.
The symbolic link points to the PCI device sysfs entry of the
Virtual Function whose index is N (0...MaxVFs-1).
What: /sys/bus/pci/devices/.../dep_link
Date: March 2009
Contact: Yu Zhao <yu.zhao@intel.com>
Description:
This symbolic link appears when hardware supports the SR-IOV
capability and the Physical Function driver has enabled it,
and this device has vendor specific dependencies with others.
The symbolic link points to the PCI device sysfs entry of
Physical Function this device depends on.
What: /sys/bus/pci/devices/.../physfn
Date: March 2009
Contact: Yu Zhao <yu.zhao@intel.com>
Description:
This symbolic link appears when a device is a Virtual Function.
The symbolic link points to the PCI device sysfs entry of the
Physical Function this device associates with.
...@@ -4,8 +4,8 @@ KernelVersion: 2.6.26 ...@@ -4,8 +4,8 @@ KernelVersion: 2.6.26
Contact: Liam Girdwood <lrg@slimlogic.co.uk> Contact: Liam Girdwood <lrg@slimlogic.co.uk>
Description: Description:
Some regulator directories will contain a field called Some regulator directories will contain a field called
state. This reports the regulator enable status, for state. This reports the regulator enable control, for
regulators which can report that value. regulators which can report that input value.
This will be one of the following strings: This will be one of the following strings:
...@@ -14,16 +14,54 @@ Description: ...@@ -14,16 +14,54 @@ Description:
'unknown' 'unknown'
'enabled' means the regulator output is ON and is supplying 'enabled' means the regulator output is ON and is supplying
power to the system. power to the system (assuming no error prevents it).
'disabled' means the regulator output is OFF and is not 'disabled' means the regulator output is OFF and is not
supplying power to the system.. supplying power to the system (unless some non-Linux
control has enabled it).
'unknown' means software cannot determine the state, or 'unknown' means software cannot determine the state, or
the reported state is invalid. the reported state is invalid.
NOTE: this field can be used in conjunction with microvolts NOTE: this field can be used in conjunction with microvolts
and microamps to determine regulator output levels. or microamps to determine configured regulator output levels.
What: /sys/class/regulator/.../status
Description:
Some regulator directories will contain a field called
"status". This reports the current regulator status, for
regulators which can report that output value.
This will be one of the following strings:
off
on
error
fast
normal
idle
standby
"off" means the regulator is not supplying power to the
system.
"on" means the regulator is supplying power to the system,
and the regulator can't report a detailed operation mode.
"error" indicates an out-of-regulation status such as being
disabled due to thermal shutdown, or voltage being unstable
because of problems with the input power supply.
"fast", "normal", "idle", and "standby" are all detailed
regulator operation modes (described elsewhere). They
imply "on", but provide more detail.
Note that regulator status is a function of many inputs,
not limited to control inputs from Linux. For example,
the actual load presented may trigger "error" status; or
a regulator may be enabled by another user, even though
Linux did not enable it.
What: /sys/class/regulator/.../type What: /sys/class/regulator/.../type
...@@ -58,7 +96,7 @@ Description: ...@@ -58,7 +96,7 @@ Description:
Some regulator directories will contain a field called Some regulator directories will contain a field called
microvolts. This holds the regulator output voltage setting microvolts. This holds the regulator output voltage setting
measured in microvolts (i.e. E-6 Volts), for regulators measured in microvolts (i.e. E-6 Volts), for regulators
which can report that voltage. which can report the control input for voltage.
NOTE: This value should not be used to determine the regulator NOTE: This value should not be used to determine the regulator
output voltage level as this value is the same regardless of output voltage level as this value is the same regardless of
...@@ -73,7 +111,7 @@ Description: ...@@ -73,7 +111,7 @@ Description:
Some regulator directories will contain a field called Some regulator directories will contain a field called
microamps. This holds the regulator output current limit microamps. This holds the regulator output current limit
setting measured in microamps (i.e. E-6 Amps), for regulators setting measured in microamps (i.e. E-6 Amps), for regulators
which can report that current. which can report the control input for a current limit.
NOTE: This value should not be used to determine the regulator NOTE: This value should not be used to determine the regulator
output current level as this value is the same regardless of output current level as this value is the same regardless of
...@@ -87,7 +125,7 @@ Contact: Liam Girdwood <lrg@slimlogic.co.uk> ...@@ -87,7 +125,7 @@ Contact: Liam Girdwood <lrg@slimlogic.co.uk>
Description: Description:
Some regulator directories will contain a field called Some regulator directories will contain a field called
opmode. This holds the current regulator operating mode, opmode. This holds the current regulator operating mode,
for regulators which can report it. for regulators which can report that control input value.
The opmode value can be one of the following strings: The opmode value can be one of the following strings:
...@@ -101,7 +139,8 @@ Description: ...@@ -101,7 +139,8 @@ Description:
NOTE: This value should not be used to determine the regulator NOTE: This value should not be used to determine the regulator
output operating mode as this value is the same regardless of output operating mode as this value is the same regardless of
whether the regulator is enabled or disabled. whether the regulator is enabled or disabled. A "status"
attribute may be available to determine the actual mode.
What: /sys/class/regulator/.../min_microvolts What: /sys/class/regulator/.../min_microvolts
......
What: /sys/fs/ext4/<disk>/mb_stats
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Controls whether the multiblock allocator should
collect statistics, which are shown during the unmount.
1 means to collect statistics, 0 means not to collect
statistics
What: /sys/fs/ext4/<disk>/mb_group_prealloc
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
The multiblock allocator will round up allocation
requests to a multiple of this tuning parameter if the
stripe size is not set in the ext4 superblock
What: /sys/fs/ext4/<disk>/mb_max_to_scan
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
The maximum number of extents the multiblock allocator
will search to find the best extent
What: /sys/fs/ext4/<disk>/mb_min_to_scan
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
The minimum number of extents the multiblock allocator
will search to find the best extent
What: /sys/fs/ext4/<disk>/mb_order2_req
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Tuning parameter which controls the minimum size for
requests (as a power of 2) where the buddy cache is
used
What: /sys/fs/ext4/<disk>/mb_stream_req
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Files which have fewer blocks than this tunable
parameter will have their blocks allocated out of a
block group specific preallocation pool, so that small
files are packed closely together. Each large file
will have its blocks allocated out of its own unique
preallocation pool.
What: /sys/fs/ext4/<disk>/inode_readahead
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Tuning parameter which controls the maximum number of
inode table blocks that ext4's inode table readahead
algorithm will pre-read into the buffer cache
What: /sys/fs/ext4/<disk>/delayed_allocation_blocks
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
This file is read-only and shows the number of blocks
that are dirty in the page cache, but which do not
have their location in the filesystem allocated yet.
What: /sys/fs/ext4/<disk>/lifetime_write_kbytes
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
This file is read-only and shows the number of kilobytes
of data that have been written to this filesystem since it was
created.
What: /sys/fs/ext4/<disk>/session_write_kbytes
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
This file is read-only and shows the number of
kilobytes of data that have been written to this
filesystem since it was mounted.
...@@ -136,7 +136,7 @@ exactly why. ...@@ -136,7 +136,7 @@ exactly why.
The standard 32-bit addressing PCI device would do something like The standard 32-bit addressing PCI device would do something like
this: this:
if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
printk(KERN_WARNING printk(KERN_WARNING
"mydev: No suitable DMA available.\n"); "mydev: No suitable DMA available.\n");
goto ignore_this_device; goto ignore_this_device;
...@@ -155,9 +155,9 @@ all 64-bits when accessing streaming DMA: ...@@ -155,9 +155,9 @@ all 64-bits when accessing streaming DMA:
int using_dac; int using_dac;
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
using_dac = 1; using_dac = 1;
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
using_dac = 0; using_dac = 0;
} else { } else {
printk(KERN_WARNING printk(KERN_WARNING
...@@ -170,14 +170,14 @@ the case would look like this: ...@@ -170,14 +170,14 @@ the case would look like this:
int using_dac, consistent_using_dac; int using_dac, consistent_using_dac;
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) { if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
using_dac = 1; using_dac = 1;
consistent_using_dac = 1; consistent_using_dac = 1;
pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK); pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) { } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
using_dac = 0; using_dac = 0;
consistent_using_dac = 0; consistent_using_dac = 0;
pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK); pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
} else { } else {
printk(KERN_WARNING printk(KERN_WARNING
"mydev: No suitable DMA available.\n"); "mydev: No suitable DMA available.\n");
...@@ -192,7 +192,7 @@ check the return value from pci_set_consistent_dma_mask(). ...@@ -192,7 +192,7 @@ check the return value from pci_set_consistent_dma_mask().
Finally, if your device can only drive the low 24-bits of Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like: address during PCI bus mastering you might do something like:
if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) { if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) {
printk(KERN_WARNING printk(KERN_WARNING
"mydev: 24-bit DMA addressing not available.\n"); "mydev: 24-bit DMA addressing not available.\n");
goto ignore_this_device; goto ignore_this_device;
...@@ -213,7 +213,7 @@ most specific mask. ...@@ -213,7 +213,7 @@ most specific mask.
Here is pseudo-code showing how this might be done: Here is pseudo-code showing how this might be done:
#define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK #define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32)
#define RECORD_ADDRESS_BITS 0x00ffffff #define RECORD_ADDRESS_BITS 0x00ffffff
struct my_sound_card *card; struct my_sound_card *card;
......
...@@ -4,3 +4,7 @@ ...@@ -4,3 +4,7 @@
*.html *.html
*.9.gz *.9.gz
*.9 *.9
*.aux
*.dvi
*.log
*.out
...@@ -31,7 +31,7 @@ PS_METHOD = $(prefer-db2x) ...@@ -31,7 +31,7 @@ PS_METHOD = $(prefer-db2x)
### ###
# The targets that may be used. # The targets that may be used.
PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs cleandocs
BOOKS := $(addprefix $(obj)/,$(DOCBOOKS)) BOOKS := $(addprefix $(obj)/,$(DOCBOOKS))
xmldocs: $(BOOKS) xmldocs: $(BOOKS)
...@@ -213,11 +213,12 @@ silent_gen_xml = : ...@@ -213,11 +213,12 @@ silent_gen_xml = :
dochelp: dochelp:
@echo ' Linux kernel internal documentation in different formats:' @echo ' Linux kernel internal documentation in different formats:'
@echo ' htmldocs - HTML' @echo ' htmldocs - HTML'
@echo ' installmandocs - install man pages generated by mandocs'
@echo ' mandocs - man pages'
@echo ' pdfdocs - PDF' @echo ' pdfdocs - PDF'
@echo ' psdocs - Postscript' @echo ' psdocs - Postscript'
@echo ' xmldocs - XML DocBook' @echo ' xmldocs - XML DocBook'
@echo ' mandocs - man pages'
@echo ' installmandocs - install man pages generated by mandocs'
@echo ' cleandocs - clean all generated DocBook files'
### ###
# Temporary files left by various tools # Temporary files left by various tools
...@@ -235,6 +236,10 @@ clean-files := $(DOCBOOKS) \ ...@@ -235,6 +236,10 @@ clean-files := $(DOCBOOKS) \
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
cleandocs:
$(Q)rm -f $(call objectify, $(clean-files))
$(Q)rm -rf $(call objectify, $(clean-dirs))
# Declare the contents of the .PHONY variable as phony. We keep that # Declare the contents of the .PHONY variable as phony. We keep that
# information in a variable se we can use it in if_changed and friends. # information in a variable se we can use it in if_changed and friends.
......
...@@ -199,6 +199,7 @@ X!Edrivers/pci/hotplug.c ...@@ -199,6 +199,7 @@ X!Edrivers/pci/hotplug.c
--> -->
!Edrivers/pci/probe.c !Edrivers/pci/probe.c
!Edrivers/pci/rom.c !Edrivers/pci/rom.c
!Edrivers/pci/iov.c
</sect1> </sect1>
<sect1><title>PCI Hotplug Support Library</title> <sect1><title>PCI Hotplug Support Library</title>
!Edrivers/pci/hotplug/pci_hotplug_core.c !Edrivers/pci/hotplug/pci_hotplug_core.c
...@@ -258,7 +259,7 @@ X!Earch/x86/kernel/mca_32.c ...@@ -258,7 +259,7 @@ X!Earch/x86/kernel/mca_32.c
!Eblock/blk-tag.c !Eblock/blk-tag.c
!Iblock/blk-tag.c !Iblock/blk-tag.c
!Eblock/blk-integrity.c !Eblock/blk-integrity.c
!Iblock/blktrace.c !Ikernel/trace/blktrace.c
!Iblock/genhd.c !Iblock/genhd.c
!Eblock/genhd.c !Eblock/genhd.c
</chapter> </chapter>
......
...@@ -117,9 +117,6 @@ static int __init init_procfs_example(void) ...@@ -117,9 +117,6 @@ static int __init init_procfs_example(void)
rv = -ENOMEM; rv = -ENOMEM;
goto out; goto out;
} }
example_dir->owner = THIS_MODULE;
/* create jiffies using convenience function */ /* create jiffies using convenience function */
jiffies_file = create_proc_read_entry("jiffies", jiffies_file = create_proc_read_entry("jiffies",
0444, example_dir, 0444, example_dir,
...@@ -130,8 +127,6 @@ static int __init init_procfs_example(void) ...@@ -130,8 +127,6 @@ static int __init init_procfs_example(void)
goto no_jiffies; goto no_jiffies;
} }
jiffies_file->owner = THIS_MODULE;
/* create foo and bar files using same callback /* create foo and bar files using same callback
* functions * functions
*/ */
...@@ -146,7 +141,6 @@ static int __init init_procfs_example(void) ...@@ -146,7 +141,6 @@ static int __init init_procfs_example(void)
foo_file->data = &foo_data; foo_file->data = &foo_data;
foo_file->read_proc = proc_read_foobar; foo_file->read_proc = proc_read_foobar;
foo_file->write_proc = proc_write_foobar; foo_file->write_proc = proc_write_foobar;
foo_file->owner = THIS_MODULE;
bar_file = create_proc_entry("bar", 0644, example_dir); bar_file = create_proc_entry("bar", 0644, example_dir);
if(bar_file == NULL) { if(bar_file == NULL) {
...@@ -159,7 +153,6 @@ static int __init init_procfs_example(void) ...@@ -159,7 +153,6 @@ static int __init init_procfs_example(void)
bar_file->data = &bar_data; bar_file->data = &bar_data;
bar_file->read_proc = proc_read_foobar; bar_file->read_proc = proc_read_foobar;
bar_file->write_proc = proc_write_foobar; bar_file->write_proc = proc_write_foobar;
bar_file->owner = THIS_MODULE;
/* create symlink */ /* create symlink */
symlink = proc_symlink("jiffies_too", example_dir, symlink = proc_symlink("jiffies_too", example_dir,
...@@ -169,8 +162,6 @@ static int __init init_procfs_example(void) ...@@ -169,8 +162,6 @@ static int __init init_procfs_example(void)
goto no_symlink; goto no_symlink;
} }
symlink->owner = THIS_MODULE;
/* everything OK */ /* everything OK */
printk(KERN_INFO "%s %s initialised\n", printk(KERN_INFO "%s %s initialised\n",
MODULE_NAME, MODULE_VERS); MODULE_NAME, MODULE_VERS);
......
...@@ -1137,8 +1137,8 @@ ...@@ -1137,8 +1137,8 @@
if (err < 0) if (err < 0)
return err; return err;
/* check PCI availability (28bit DMA) */ /* check PCI availability (28bit DMA) */
if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || if (pci_set_dma_mask(pci, DMA_BIT_MASK(28)) < 0 ||
pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(28)) < 0) {
printk(KERN_ERR "error to set 28bit mask DMA\n"); printk(KERN_ERR "error to set 28bit mask DMA\n");
pci_disable_device(pci); pci_disable_device(pci);
return -ENXIO; return -ENXIO;
...@@ -1252,8 +1252,8 @@ ...@@ -1252,8 +1252,8 @@
err = pci_enable_device(pci); err = pci_enable_device(pci);
if (err < 0) if (err < 0)
return err; return err;
if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 || if (pci_set_dma_mask(pci, DMA_BIT_MASK(28)) < 0 ||
pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) { pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(28)) < 0) {
printk(KERN_ERR "error to set 28bit mask DMA\n"); printk(KERN_ERR "error to set 28bit mask DMA\n");
pci_disable_device(pci); pci_disable_device(pci);
return -ENXIO; return -ENXIO;
......
此差异已折叠。
PCI Express I/O Virtualization Howto
Copyright (C) 2009 Intel Corporation
Yu Zhao <yu.zhao@intel.com>
1. Overview
1.1 What is SR-IOV
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
capability which makes one physical device appear as multiple virtual
devices. The physical device is referred to as Physical Function (PF)
while the virtual devices are referred to as Virtual Functions (VF).
Allocation of the VF can be dynamically controlled by the PF via
registers encapsulated in the capability. By default, this feature is
not enabled and the PF behaves as traditional PCIe device. Once it's
turned on, each VF's PCI configuration space can be accessed by its own
Bus, Device and Function Number (Routing ID). And each VF also has PCI
Memory Space, which is used to map its register set. VF device driver
operates on the register set so it can be functional and appear as a
real existing PCI device.
2. User Guide
2.1 How can I enable SR-IOV capability
The device driver (PF driver) will control the enabling and disabling
of the capability via API provided by SR-IOV core. If the hardware
has SR-IOV capability, loading its PF driver would enable it and all
VFs associated with the PF.
2.2 How can I use the Virtual Functions
The VF is treated as hot-plugged PCI devices in the kernel, so they
should be able to work in the same way as real PCI devices. The VF
requires device driver that is same as a normal PCI device's.
3. Developer Guide
3.1 SR-IOV API
To enable SR-IOV capability:
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
'nr_virtfn' is number of VFs to be enabled.
To disable SR-IOV capability:
void pci_disable_sriov(struct pci_dev *dev);
To notify SR-IOV core of Virtual Function Migration:
irqreturn_t pci_sriov_migration(struct pci_dev *dev);
3.2 Usage example
Following piece of code illustrates the usage of the SR-IOV API.
static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
pci_enable_sriov(dev, NR_VIRTFN);
...
return 0;
}
static void __devexit dev_remove(struct pci_dev *dev)
{
pci_disable_sriov(dev);
...
}
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
{
...
return 0;
}
static int dev_resume(struct pci_dev *dev)
{
...
return 0;
}
static void dev_shutdown(struct pci_dev *dev)
{
...
}
static struct pci_driver dev_driver = {
.name = "SR-IOV Physical Function driver",
.id_table = dev_id_table,
.probe = dev_probe,
.remove = __devexit_p(dev_remove),
.suspend = dev_suspend,
.resume = dev_resume,
.shutdown = dev_shutdown,
};
...@@ -118,7 +118,7 @@ Following are the RCU equivalents for these two functions: ...@@ -118,7 +118,7 @@ Following are the RCU equivalents for these two functions:
list_for_each_entry(e, list, list) { list_for_each_entry(e, list, list) {
if (!audit_compare_rule(rule, &e->rule)) { if (!audit_compare_rule(rule, &e->rule)) {
list_del_rcu(&e->list); list_del_rcu(&e->list);
call_rcu(&e->rcu, audit_free_rule, e); call_rcu(&e->rcu, audit_free_rule);
return 0; return 0;
} }
} }
...@@ -206,7 +206,7 @@ RCU ("read-copy update") its name. The RCU code is as follows: ...@@ -206,7 +206,7 @@ RCU ("read-copy update") its name. The RCU code is as follows:
ne->rule.action = newaction; ne->rule.action = newaction;
ne->rule.file_count = newfield_count; ne->rule.file_count = newfield_count;
list_replace_rcu(e, ne); list_replace_rcu(e, ne);
call_rcu(&e->rcu, audit_free_rule, e); call_rcu(&e->rcu, audit_free_rule);
return 0; return 0;
} }
} }
...@@ -283,7 +283,7 @@ flag under the spinlock as follows: ...@@ -283,7 +283,7 @@ flag under the spinlock as follows:
list_del_rcu(&e->list); list_del_rcu(&e->list);
e->deleted = 1; e->deleted = 1;
spin_unlock(&e->lock); spin_unlock(&e->lock);
call_rcu(&e->rcu, audit_free_rule, e); call_rcu(&e->rcu, audit_free_rule);
return 0; return 0;
} }
} }
......
...@@ -81,7 +81,7 @@ o I hear that RCU needs work in order to support realtime kernels? ...@@ -81,7 +81,7 @@ o I hear that RCU needs work in order to support realtime kernels?
This work is largely completed. Realtime-friendly RCU can be This work is largely completed. Realtime-friendly RCU can be
enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter. enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter.
However, work is in progress for enabling priority boosting of However, work is in progress for enabling priority boosting of
preempted RCU read-side critical sections.This is needed if you preempted RCU read-side critical sections. This is needed if you
have CPU-bound realtime threads. have CPU-bound realtime threads.
o Where can I find more information on RCU? o Where can I find more information on RCU?
......
...@@ -21,7 +21,7 @@ if (obj) { ...@@ -21,7 +21,7 @@ if (obj) {
/* /*
* Because a writer could delete object, and a writer could * Because a writer could delete object, and a writer could
* reuse these object before the RCU grace period, we * reuse these object before the RCU grace period, we
* must check key after geting the reference on object * must check key after getting the reference on object
*/ */
if (obj->key != key) { // not the object we expected if (obj->key != key) { // not the object we expected
put_ref(obj); put_ref(obj);
...@@ -117,7 +117,7 @@ a race (some writer did a delete and/or a move of an object ...@@ -117,7 +117,7 @@ a race (some writer did a delete and/or a move of an object
to another chain) checking the final 'nulls' value if to another chain) checking the final 'nulls' value if
the lookup met the end of chain. If final 'nulls' value the lookup met the end of chain. If final 'nulls' value
is not the slot number, then we must restart the lookup at is not the slot number, then we must restart the lookup at
the begining. If the object was moved to same chain, the beginning. If the object was moved to the same chain,
then the reader doesnt care : It might eventually then the reader doesnt care : It might eventually
scan the list again without harm. scan the list again without harm.
......
...@@ -8,6 +8,8 @@ cpqarray.txt ...@@ -8,6 +8,8 @@ cpqarray.txt
- info on using Compaq's SMART2 Intelligent Disk Array Controllers. - info on using Compaq's SMART2 Intelligent Disk Array Controllers.
floppy.txt floppy.txt
- notes and driver options for the floppy disk driver. - notes and driver options for the floppy disk driver.
mflash.txt
- info on mGine m(g)flash driver for linux.
nbd.txt nbd.txt
- info on a TCP implementation of a network block device. - info on a TCP implementation of a network block device.
paride.txt paride.txt
......
This document describes m[g]flash support in linux.
Contents
1. Overview
2. Reserved area configuration
3. Example of mflash platform driver registration
1. Overview
Mflash and gflash are embedded flash drive. The only difference is mflash is
MCP(Multi Chip Package) device. These two device operate exactly same way.
So the rest mflash repersents mflash and gflash altogether.
Internally, mflash has nand flash and other hardware logics and supports
2 different operation (ATA, IO) modes. ATA mode doesn't need any new
driver and currently works well under standard IDE subsystem. Actually it's
one chip SSD. IO mode is ATA-like custom mode for the host that doesn't have
IDE interface.
Followings are brief descriptions about IO mode.
A. IO mode based on ATA protocol and uses some custom command. (read confirm,
write confirm)
B. IO mode uses SRAM bus interface.
C. IO mode supports 4kB boot area, so host can boot from mflash.
2. Reserved area configuration
If host boot from mflash, usually needs raw area for boot loader image. All of
the mflash's block device operation will be taken this value as start offset.
Note that boot loader's size of reserved area and kernel configuration value
must be same.
3. Example of mflash platform driver registration
Working mflash is very straight forward. Adding platform device stuff to board
configuration file is all. Here is some pseudo example.
static struct mg_drv_data mflash_drv_data = {
/* If you want to polling driver set to 1 */
.use_polling = 0,
/* device attribution */
.dev_attr = MG_BOOT_DEV
};
static struct resource mg_mflash_rsc[] = {
/* Base address of mflash */
[0] = {
.start = 0x08000000,
.end = 0x08000000 + SZ_64K - 1,
.flags = IORESOURCE_MEM
},
/* mflash interrupt pin */
[1] = {
.start = IRQ_GPIO(84),
.end = IRQ_GPIO(84),
.flags = IORESOURCE_IRQ
},
/* mflash reset pin */
[2] = {
.start = 43,
.end = 43,
.name = MG_RST_PIN,
.flags = IORESOURCE_IO
},
/* mflash reset-out pin
* If you use mflash as storage device (i.e. other than MG_BOOT_DEV),
* should assign this */
[3] = {
.start = 51,
.end = 51,
.name = MG_RSTOUT_PIN,
.flags = IORESOURCE_IO
}
};
static struct platform_device mflash_dev = {
.name = MG_DEV_NAME,
.id = -1,
.dev = {
.platform_data = &mflash_drv_data,
},
.num_resources = ARRAY_SIZE(mg_mflash_rsc),
.resource = mg_mflash_rsc
};
platform_device_register(&mflash_dev);
00-INDEX
- this file
cgroups.txt
- Control Groups definition, implementation details, examples and API.
cpuacct.txt
- CPU Accounting Controller; account CPU usage for groups of tasks.
cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
devices.txt
- Device Whitelist Controller; description, interface and security.
freezer-subsystem.txt
- checkpointing; rationale to not use signals, interface.
memcg_test.txt
- Memory Resource Controller; implementation details.
memory.txt
- Memory Resource Controller; design, accounting, interface, testing.
resource_counter.txt
- Resource Counter API.
...@@ -56,7 +56,7 @@ hierarchy, and a set of subsystems; each subsystem has system-specific ...@@ -56,7 +56,7 @@ hierarchy, and a set of subsystems; each subsystem has system-specific
state attached to each cgroup in the hierarchy. Each hierarchy has state attached to each cgroup in the hierarchy. Each hierarchy has
an instance of the cgroup virtual filesystem associated with it. an instance of the cgroup virtual filesystem associated with it.
At any one time there may be multiple active hierachies of task At any one time there may be multiple active hierarchies of task
cgroups. Each hierarchy is a partition of all tasks in the system. cgroups. Each hierarchy is a partition of all tasks in the system.
User level code may create and destroy cgroups by name in an User level code may create and destroy cgroups by name in an
...@@ -124,10 +124,10 @@ following lines: ...@@ -124,10 +124,10 @@ following lines:
/ \ / \
Prof (15%) students (5%) Prof (15%) students (5%)
Browsers like firefox/lynx go into the WWW network class, while (k)nfsd go Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
into NFS network class. into NFS network class.
At the same time firefox/lynx will share an appropriate CPU/Memory class At the same time Firefox/Lynx will share an appropriate CPU/Memory class
depending on who launched it (prof/student). depending on who launched it (prof/student).
With the ability to classify tasks differently for different resources With the ability to classify tasks differently for different resources
...@@ -325,7 +325,7 @@ and then start a subshell 'sh' in that cgroup: ...@@ -325,7 +325,7 @@ and then start a subshell 'sh' in that cgroup:
Creating, modifying, using the cgroups can be done through the cgroup Creating, modifying, using the cgroups can be done through the cgroup
virtual filesystem. virtual filesystem.
To mount a cgroup hierarchy will all available subsystems, type: To mount a cgroup hierarchy with all available subsystems, type:
# mount -t cgroup xxx /dev/cgroup # mount -t cgroup xxx /dev/cgroup
The "xxx" is not interpreted by the cgroup code, but will appear in The "xxx" is not interpreted by the cgroup code, but will appear in
...@@ -333,12 +333,23 @@ The "xxx" is not interpreted by the cgroup code, but will appear in ...@@ -333,12 +333,23 @@ The "xxx" is not interpreted by the cgroup code, but will appear in
To mount a cgroup hierarchy with just the cpuset and numtasks To mount a cgroup hierarchy with just the cpuset and numtasks
subsystems, type: subsystems, type:
# mount -t cgroup -o cpuset,numtasks hier1 /dev/cgroup # mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
To change the set of subsystems bound to a mounted hierarchy, just To change the set of subsystems bound to a mounted hierarchy, just
remount with different options: remount with different options:
# mount -o remount,cpuset,ns hier1 /dev/cgroup
# mount -o remount,cpuset,ns /dev/cgroup Now memory is removed from the hierarchy and ns is added.
Note this will add ns to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,ns /dev/cgroup
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup
Note that specifying 'release_agent' more than once will return failure.
Note that changing the set of subsystems is currently only supported Note that changing the set of subsystems is currently only supported
when the hierarchy consists of a single (root) cgroup. Supporting when the hierarchy consists of a single (root) cgroup. Supporting
...@@ -349,6 +360,11 @@ Then under /dev/cgroup you can find a tree that corresponds to the ...@@ -349,6 +360,11 @@ Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup tree of the cgroups in the system. For instance, /dev/cgroup
is the cgroup that holds the whole system. is the cgroup that holds the whole system.
If you want to change the value of release_agent:
# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
It can also be changed via remount.
If you want to create a new cgroup under /dev/cgroup: If you want to create a new cgroup under /dev/cgroup:
# cd /dev/cgroup # cd /dev/cgroup
# mkdir my_cgroup # mkdir my_cgroup
...@@ -476,11 +492,13 @@ cgroup->parent is still valid. (Note - can also be called for a ...@@ -476,11 +492,13 @@ cgroup->parent is still valid. (Note - can also be called for a
newly-created cgroup if an error occurs after this subsystem's newly-created cgroup if an error occurs after this subsystem's
create() method has been called for the new cgroup). create() method has been called for the new cgroup).
void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp); int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
Called before checking the reference count on each subsystem. This may Called before checking the reference count on each subsystem. This may
be useful for subsystems which have some extra references even if be useful for subsystems which have some extra references even if
there are not tasks in the cgroup. there are not tasks in the cgroup. If pre_destroy() returns error code,
rmdir() will fail with it. From this behavior, pre_destroy() can be
called multiple times against a cgroup.
int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *task) struct task_struct *task)
...@@ -521,7 +539,7 @@ always handled well. ...@@ -521,7 +539,7 @@ always handled well.
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller) (cgroup_mutex held by caller)
Called at the end of cgroup_clone() to do any paramater Called at the end of cgroup_clone() to do any parameter
initialization which might be required before a task could attach. For initialization which might be required before a task could attach. For
example in cpusets, no task may attach before 'cpus' and 'mems' are set example in cpusets, no task may attach before 'cpus' and 'mems' are set
up. up.
......
...@@ -30,3 +30,21 @@ The above steps create a new group g1 and move the current shell ...@@ -30,3 +30,21 @@ The above steps create a new group g1 and move the current shell
process (bash) into it. CPU time consumed by this bash and its children process (bash) into it. CPU time consumed by this bash and its children
can be obtained from g1/cpuacct.usage and the same is accumulated in can be obtained from g1/cpuacct.usage and the same is accumulated in
/cgroups/cpuacct.usage also. /cgroups/cpuacct.usage also.
cpuacct.stat file lists a few statistics which further divide the
CPU time obtained by the cgroup into user and system times. Currently
the following statistics are supported:
user: Time spent by tasks of the cgroup in user mode.
system: Time spent by tasks of the cgroup in kernel mode.
user and system are in USER_HZ unit.
cpuacct controller uses percpu_counter interface to collect user and
system times. This has two side effects:
- It is theoretically possible to see wrong values for user and system times.
This is because percpu_counter_read() on 32bit systems isn't safe
against concurrent writes.
- It is possible to see slightly outdated values for user and system times
due to the batch processing nature of percpu_counter.
...@@ -131,7 +131,7 @@ Cpusets extends these two mechanisms as follows: ...@@ -131,7 +131,7 @@ Cpusets extends these two mechanisms as follows:
- The hierarchy of cpusets can be mounted at /dev/cpuset, for - The hierarchy of cpusets can be mounted at /dev/cpuset, for
browsing and manipulation from user space. browsing and manipulation from user space.
- A cpuset may be marked exclusive, which ensures that no other - A cpuset may be marked exclusive, which ensures that no other
cpuset (except direct ancestors and descendents) may contain cpuset (except direct ancestors and descendants) may contain
any overlapping CPUs or Memory Nodes. any overlapping CPUs or Memory Nodes.
- You can list all the tasks (by pid) attached to any cpuset. - You can list all the tasks (by pid) attached to any cpuset.
...@@ -226,7 +226,7 @@ nodes with memory--using the cpuset_track_online_nodes() hook. ...@@ -226,7 +226,7 @@ nodes with memory--using the cpuset_track_online_nodes() hook.
-------------------------------- --------------------------------
If a cpuset is cpu or mem exclusive, no other cpuset, other than If a cpuset is cpu or mem exclusive, no other cpuset, other than
a direct ancestor or descendent, may share any of the same CPUs or a direct ancestor or descendant, may share any of the same CPUs or
Memory Nodes. Memory Nodes.
A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled", A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
...@@ -427,7 +427,7 @@ child cpusets have this flag enabled. ...@@ -427,7 +427,7 @@ child cpusets have this flag enabled.
When doing this, you don't usually want to leave any unpinned tasks in When doing this, you don't usually want to leave any unpinned tasks in
the top cpuset that might use non-trivial amounts of CPU, as such tasks the top cpuset that might use non-trivial amounts of CPU, as such tasks
may be artificially constrained to some subset of CPUs, depending on may be artificially constrained to some subset of CPUs, depending on
the particulars of this flag setting in descendent cpusets. Even if the particulars of this flag setting in descendant cpusets. Even if
such a task could use spare CPU cycles in some other CPUs, the kernel such a task could use spare CPU cycles in some other CPUs, the kernel
scheduler might not consider the possibility of load balancing that scheduler might not consider the possibility of load balancing that
task to that underused CPU. task to that underused CPU.
...@@ -531,9 +531,9 @@ be idle. ...@@ -531,9 +531,9 @@ be idle.
Of course it takes some searching cost to find movable tasks and/or Of course it takes some searching cost to find movable tasks and/or
idle CPUs, the scheduler might not search all CPUs in the domain idle CPUs, the scheduler might not search all CPUs in the domain
everytime. In fact, in some architectures, the searching ranges on every time. In fact, in some architectures, the searching ranges on
events are limited in the same socket or node where the CPU locates, events are limited in the same socket or node where the CPU locates,
while the load balance on tick searchs all. while the load balance on tick searches all.
For example, assume CPU Z is relatively far from CPU X. Even if CPU Z For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
is idle while CPU X and the siblings are busy, scheduler can't migrate is idle while CPU X and the siblings are busy, scheduler can't migrate
...@@ -601,7 +601,7 @@ its new cpuset, then the task will continue to use whatever subset ...@@ -601,7 +601,7 @@ its new cpuset, then the task will continue to use whatever subset
of MPOL_BIND nodes are still allowed in the new cpuset. If the task of MPOL_BIND nodes are still allowed in the new cpuset. If the task
was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
in the new cpuset, then the task will be essentially treated as if it in the new cpuset, then the task will be essentially treated as if it
was MPOL_BIND bound to the new cpuset (even though its numa placement, was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
as queried by get_mempolicy(), doesn't change). If a task is moved as queried by get_mempolicy(), doesn't change). If a task is moved
from one cpuset to another, then the kernel will adjust the tasks from one cpuset to another, then the kernel will adjust the tasks
memory placement, as above, the next time that the kernel attempts memory placement, as above, the next time that the kernel attempts
......
...@@ -42,7 +42,7 @@ suffice, but we can decide the best way to adequately restrict ...@@ -42,7 +42,7 @@ suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this. We may just want movement as people get some experience with this. We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD. We may want to just refuse moving to a cgroup which CAP_MKNOD. We may want to just refuse moving to a cgroup which
isn't a descendent of the current one. Or we may want to use isn't a descendant of the current one. Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root. CAP_MAC_ADMIN, since we really are trying to lock down root.
CAP_SYS_ADMIN is needed to modify the whitelist or move another CAP_SYS_ADMIN is needed to modify the whitelist or move another
......
Memory Resource Controller(Memcg) Implementation Memo. Memory Resource Controller(Memcg) Implementation Memo.
Last Updated: 2009/1/19 Last Updated: 2009/1/20
Base Kernel Version: based on 2.6.29-rc2. Base Kernel Version: based on 2.6.29-rc2.
Because VM is getting complex (one of reasons is memcg...), memcg's behavior Because VM is getting complex (one of reasons is memcg...), memcg's behavior
...@@ -356,7 +356,25 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. ...@@ -356,7 +356,25 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
(Shell-B) (Shell-B)
# move all tasks in /cgroup/test to /cgroup # move all tasks in /cgroup/test to /cgroup
# /sbin/swapoff -a # /sbin/swapoff -a
# rmdir /test/cgroup # rmdir /cgroup/test
# kill malloc task. # kill malloc task.
Of course, tmpfs v.s. swapoff test should be tested, too. Of course, tmpfs v.s. swapoff test should be tested, too.
9.8 OOM-Killer
Out-of-memory caused by memcg's limit will kill tasks under
the memcg. When hierarchy is used, a task under hierarchy
will be killed by the kernel.
In this case, panic_on_oom shouldn't be invoked and tasks
in other groups shouldn't be killed.
It's not difficult to cause OOM under memcg as following.
Case A) when you can swapoff
#swapoff -a
#echo 50M > /memory.limit_in_bytes
run 51M of malloc
Case B) when you use mem+swap limitation.
#echo 50M > memory.limit_in_bytes
#echo 50M > memory.memsw.limit_in_bytes
run 51M of malloc
...@@ -6,15 +6,14 @@ used here with the memory controller that is used in hardware. ...@@ -6,15 +6,14 @@ used here with the memory controller that is used in hardware.
Salient features Salient features
a. Enable control of both RSS (mapped) and Page Cache (unmapped) pages a. Enable control of Anonymous, Page Cache (mapped and unmapped) and
Swap Cache memory pages.
b. The infrastructure allows easy addition of other types of memory to control b. The infrastructure allows easy addition of other types of memory to control
c. Provides *zero overhead* for non memory controller users c. Provides *zero overhead* for non memory controller users
d. Provides a double LRU: global memory pressure causes reclaim from the d. Provides a double LRU: global memory pressure causes reclaim from the
global LRU; a cgroup on hitting a limit, reclaims from the per global LRU; a cgroup on hitting a limit, reclaims from the per
cgroup LRU cgroup LRU
NOTE: Swap Cache (unmapped) is not accounted now.
Benefits and Purpose of the memory controller Benefits and Purpose of the memory controller
The memory controller isolates the memory behaviour of a group of tasks The memory controller isolates the memory behaviour of a group of tasks
...@@ -290,34 +289,44 @@ will be charged as a new owner of it. ...@@ -290,34 +289,44 @@ will be charged as a new owner of it.
moved to the parent. If you want to avoid that, force_empty will be useful. moved to the parent. If you want to avoid that, force_empty will be useful.
5.2 stat file 5.2 stat file
memory.stat file includes following statistics (now)
cache - # of pages from page-cache and shmem. memory.stat file includes following statistics
rss - # of pages from anonymous memory.
pgpgin - # of event of charging cache - # of bytes of page cache memory.
pgpgout - # of event of uncharging rss - # of bytes of anonymous and swap cache memory.
active_anon - # of pages on active lru of anon, shmem. pgpgin - # of pages paged in (equivalent to # of charging events).
inactive_anon - # of pages on active lru of anon, shmem pgpgout - # of pages paged out (equivalent to # of uncharging events).
active_file - # of pages on active lru of file-cache active_anon - # of bytes of anonymous and swap cache memory on active
inactive_file - # of pages on inactive lru of file cache lru list.
unevictable - # of pages cannot be reclaimed.(mlocked etc) inactive_anon - # of bytes of anonymous memory and swap cache memory on
inactive lru list.
Below is depend on CONFIG_DEBUG_VM. active_file - # of bytes of file-backed memory on active lru list.
inactive_ratio - VM inernal parameter. (see mm/page_alloc.c) inactive_file - # of bytes of file-backed memory on inactive lru list.
recent_rotated_anon - VM internal parameter. (see mm/vmscan.c) unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
recent_scanned_anon - VM internal parameter. (see mm/vmscan.c) The following additional stats are dependent on CONFIG_DEBUG_VM.
recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
Memo: recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
Memo:
recent_rotated means recent frequency of lru rotation. recent_rotated means recent frequency of lru rotation.
recent_scanned means recent # of scans to lru. recent_scanned means recent # of scans to lru.
showing for better debug please see the code for meanings. showing for better debug please see the code for meanings.
Note:
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup. Per-cgroup rss
accounting is not done yet.
5.3 swappiness 5.3 swappiness
Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only. Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
Following cgroup's swapiness can't be changed. Following cgroups' swapiness can't be changed.
- root cgroup (uses /proc/sys/vm/swappiness). - root cgroup (uses /proc/sys/vm/swappiness).
- a cgroup which uses hierarchy and it has child cgroup. - a cgroup which uses hierarchy and it has child cgroup.
- a cgroup which uses hierarchy and not the root of hierarchy. - a cgroup which uses hierarchy and not the root of hierarchy.
......
...@@ -47,13 +47,18 @@ to work with it. ...@@ -47,13 +47,18 @@ to work with it.
2. Basic accounting routines 2. Basic accounting routines
a. void res_counter_init(struct res_counter *rc) a. void res_counter_init(struct res_counter *rc,
struct res_counter *rc_parent)
Initializes the resource counter. As usual, should be the first Initializes the resource counter. As usual, should be the first
routine called for a new counter. routine called for a new counter.
b. int res_counter_charge[_locked] The struct res_counter *parent can be used to define a hierarchical
(struct res_counter *rc, unsigned long val) child -> parent relationship directly in the res_counter structure,
NULL can be used to define no relationship.
c. int res_counter_charge(struct res_counter *rc, unsigned long val,
struct res_counter **limit_fail_at)
When a resource is about to be allocated it has to be accounted When a resource is about to be allocated it has to be accounted
with the appropriate resource counter (controller should determine with the appropriate resource counter (controller should determine
...@@ -67,15 +72,25 @@ to work with it. ...@@ -67,15 +72,25 @@ to work with it.
* if the charging is performed first, then it should be uncharged * if the charging is performed first, then it should be uncharged
on error path (if the one is called). on error path (if the one is called).
c. void res_counter_uncharge[_locked] If the charging fails and a hierarchical dependency exists, the
limit_fail_at parameter is set to the particular res_counter element
where the charging failed.
d. int res_counter_charge_locked
(struct res_counter *rc, unsigned long val)
The same as res_counter_charge(), but it must not acquire/release the
res_counter->lock internally (it must be called with res_counter->lock
held).
e. void res_counter_uncharge[_locked]
(struct res_counter *rc, unsigned long val) (struct res_counter *rc, unsigned long val)
When a resource is released (freed) it should be de-accounted When a resource is released (freed) it should be de-accounted
from the resource counter it was accounted to. This is called from the resource counter it was accounted to. This is called
"uncharging". "uncharging".
The _locked routines imply that the res_counter->lock is taken. The _locked routines imply that the res_counter->lock is taken.
2.1 Other accounting routines 2.1 Other accounting routines
......
LINUX ALLOCATED DEVICES (2.6+ version) LINUX ALLOCATED DEVICES (2.6+ version)
Maintained by Torben Mathiasen <device@lanana.org> Maintained by Alan Cox <device@lanana.org>
Last revised: 29 November 2006 Last revised: 6th April 2009
This list is the Linux Device List, the official registry of allocated This list is the Linux Device List, the official registry of allocated
device numbers and /dev directory nodes for the Linux operating device numbers and /dev directory nodes for the Linux operating
...@@ -67,6 +67,11 @@ up to date. Due to the number of registrations I have to maintain it ...@@ -67,6 +67,11 @@ up to date. Due to the number of registrations I have to maintain it
in "batch mode", so there is likely additional registrations that in "batch mode", so there is likely additional registrations that
haven't been listed yet. haven't been listed yet.
Fourth, remember that Linux now has extensive support for dynamic allocation
of device numbering and can use sysfs and udev to handle the naming needs.
There are still some exceptions in the serial and boot device area. Before
asking for a device number make sure you actually need one.
Finally, sometimes I have to play "namespace police." Please don't be Finally, sometimes I have to play "namespace police." Please don't be
offended. I often get submissions for /dev names that would be bound offended. I often get submissions for /dev names that would be bound
to cause conflicts down the road. I am trying to avoid getting in a to cause conflicts down the road. I am trying to avoid getting in a
...@@ -101,7 +106,7 @@ Your cooperation is appreciated. ...@@ -101,7 +106,7 @@ Your cooperation is appreciated.
0 = /dev/ram0 First RAM disk 0 = /dev/ram0 First RAM disk
1 = /dev/ram1 Second RAM disk 1 = /dev/ram1 Second RAM disk
... ...
250 = /dev/initrd Initial RAM disk {2.6} 250 = /dev/initrd Initial RAM disk
Older kernels had /dev/ramdisk (1, 1) here. Older kernels had /dev/ramdisk (1, 1) here.
/dev/initrd refers to a RAM disk which was preloaded /dev/initrd refers to a RAM disk which was preloaded
...@@ -340,7 +345,7 @@ Your cooperation is appreciated. ...@@ -340,7 +345,7 @@ Your cooperation is appreciated.
14 = /dev/touchscreen/ucb1x00 UCB 1x00 touchscreen 14 = /dev/touchscreen/ucb1x00 UCB 1x00 touchscreen
15 = /dev/touchscreen/mk712 MK712 touchscreen 15 = /dev/touchscreen/mk712 MK712 touchscreen
128 = /dev/beep Fancy beep device 128 = /dev/beep Fancy beep device
129 = /dev/modreq Kernel module load request {2.6} 129 =
130 = /dev/watchdog Watchdog timer port 130 = /dev/watchdog Watchdog timer port
131 = /dev/temperature Machine internal temperature 131 = /dev/temperature Machine internal temperature
132 = /dev/hwtrap Hardware fault trap 132 = /dev/hwtrap Hardware fault trap
...@@ -350,10 +355,10 @@ Your cooperation is appreciated. ...@@ -350,10 +355,10 @@ Your cooperation is appreciated.
139 = /dev/openprom SPARC OpenBoot PROM 139 = /dev/openprom SPARC OpenBoot PROM
140 = /dev/relay8 Berkshire Products Octal relay card 140 = /dev/relay8 Berkshire Products Octal relay card
141 = /dev/relay16 Berkshire Products ISO-16 relay card 141 = /dev/relay16 Berkshire Products ISO-16 relay card
142 = /dev/msr x86 model-specific registers {2.6} 142 =
143 = /dev/pciconf PCI configuration space 143 = /dev/pciconf PCI configuration space
144 = /dev/nvram Non-volatile configuration RAM 144 = /dev/nvram Non-volatile configuration RAM
145 = /dev/hfmodem Soundcard shortwave modem control {2.6} 145 = /dev/hfmodem Soundcard shortwave modem control
146 = /dev/graphics Linux/SGI graphics device 146 = /dev/graphics Linux/SGI graphics device
147 = /dev/opengl Linux/SGI OpenGL pipe 147 = /dev/opengl Linux/SGI OpenGL pipe
148 = /dev/gfx Linux/SGI graphics effects device 148 = /dev/gfx Linux/SGI graphics effects device
...@@ -435,6 +440,9 @@ Your cooperation is appreciated. ...@@ -435,6 +440,9 @@ Your cooperation is appreciated.
228 = /dev/hpet HPET driver 228 = /dev/hpet HPET driver
229 = /dev/fuse Fuse (virtual filesystem in user-space) 229 = /dev/fuse Fuse (virtual filesystem in user-space)
230 = /dev/midishare MidiShare driver 230 = /dev/midishare MidiShare driver
231 = /dev/snapshot System memory snapshot device
232 = /dev/kvm Kernel-based virtual machine (hardware virtualization extensions)
233 = /dev/kmview View-OS A process with a view
240-254 Reserved for local use 240-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR 255 Reserved for MISC_DYNAMIC_MINOR
...@@ -466,10 +474,7 @@ Your cooperation is appreciated. ...@@ -466,10 +474,7 @@ Your cooperation is appreciated.
The device names specified are proposed -- if there The device names specified are proposed -- if there
are "standard" names for these devices, please let me know. are "standard" names for these devices, please let me know.
12 block MSCDEX CD-ROM callback support {2.6} 12 block
0 = /dev/dos_cd0 First MSCDEX CD-ROM
1 = /dev/dos_cd1 Second MSCDEX CD-ROM
...
13 char Input core 13 char Input core
0 = /dev/input/js0 First joystick 0 = /dev/input/js0 First joystick
...@@ -498,7 +503,7 @@ Your cooperation is appreciated. ...@@ -498,7 +503,7 @@ Your cooperation is appreciated.
2 = /dev/midi00 First MIDI port 2 = /dev/midi00 First MIDI port
3 = /dev/dsp Digital audio 3 = /dev/dsp Digital audio
4 = /dev/audio Sun-compatible digital audio 4 = /dev/audio Sun-compatible digital audio
6 = /dev/sndstat Sound card status information {2.6} 6 =
7 = /dev/audioctl SPARC audio control device 7 = /dev/audioctl SPARC audio control device
8 = /dev/sequencer2 Sequencer -- alternate device 8 = /dev/sequencer2 Sequencer -- alternate device
16 = /dev/mixer1 Second soundcard mixer control 16 = /dev/mixer1 Second soundcard mixer control
...@@ -510,14 +515,7 @@ Your cooperation is appreciated. ...@@ -510,14 +515,7 @@ Your cooperation is appreciated.
34 = /dev/midi02 Third MIDI port 34 = /dev/midi02 Third MIDI port
50 = /dev/midi03 Fourth MIDI port 50 = /dev/midi03 Fourth MIDI port
14 block BIOS harddrive callback support {2.6} 14 block
0 = /dev/dos_hda First BIOS harddrive whole disk
64 = /dev/dos_hdb Second BIOS harddrive whole disk
128 = /dev/dos_hdc Third BIOS harddrive whole disk
192 = /dev/dos_hdd Fourth BIOS harddrive whole disk
Partitions are handled in the same way as IDE disks
(see major number 3).
15 char Joystick 15 char Joystick
0 = /dev/js0 First analog joystick 0 = /dev/js0 First analog joystick
...@@ -535,14 +533,14 @@ Your cooperation is appreciated. ...@@ -535,14 +533,14 @@ Your cooperation is appreciated.
16 block GoldStar CD-ROM 16 block GoldStar CD-ROM
0 = /dev/gscd GoldStar CD-ROM 0 = /dev/gscd GoldStar CD-ROM
17 char Chase serial card 17 char OBSOLETE (was Chase serial card)
0 = /dev/ttyH0 First Chase port 0 = /dev/ttyH0 First Chase port
1 = /dev/ttyH1 Second Chase port 1 = /dev/ttyH1 Second Chase port
... ...
17 block Optics Storage CD-ROM 17 block Optics Storage CD-ROM
0 = /dev/optcd Optics Storage CD-ROM 0 = /dev/optcd Optics Storage CD-ROM
18 char Chase serial card - alternate devices 18 char OBSOLETE (was Chase serial card - alternate devices)
0 = /dev/cuh0 Callout device for ttyH0 0 = /dev/cuh0 Callout device for ttyH0
1 = /dev/cuh1 Callout device for ttyH1 1 = /dev/cuh1 Callout device for ttyH1
... ...
...@@ -644,8 +642,7 @@ Your cooperation is appreciated. ...@@ -644,8 +642,7 @@ Your cooperation is appreciated.
2 = /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2 2 = /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2
3 = /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3 3 = /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3
26 char Quanta WinVision frame grabber {2.6} 26 char
0 = /dev/wvisfgrab Quanta WinVision frame grabber
26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM 26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM
0 = /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0 0 = /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0
...@@ -872,7 +869,7 @@ Your cooperation is appreciated. ...@@ -872,7 +869,7 @@ Your cooperation is appreciated.
and "user level packet I/O." This board is also and "user level packet I/O." This board is also
accessible as a standard networking "eth" device. accessible as a standard networking "eth" device.
38 block Reserved for Linux/AP+ 38 block OBSOLETE (was Linux/AP+)
39 char ML-16P experimental I/O board 39 char ML-16P experimental I/O board
0 = /dev/ml16pa-a0 First card, first analog channel 0 = /dev/ml16pa-a0 First card, first analog channel
...@@ -892,29 +889,16 @@ Your cooperation is appreciated. ...@@ -892,29 +889,16 @@ Your cooperation is appreciated.
50 = /dev/ml16pb-c1 Second card, second counter/timer 50 = /dev/ml16pb-c1 Second card, second counter/timer
51 = /dev/ml16pb-c2 Second card, third counter/timer 51 = /dev/ml16pb-c2 Second card, third counter/timer
... ...
39 block Reserved for Linux/AP+ 39 block
40 char Matrox Meteor frame grabber {2.6} 40 char
0 = /dev/mmetfgrab Matrox Meteor frame grabber
40 block Syquest EZ135 parallel port removable drive 40 block
0 = /dev/eza Parallel EZ135 drive, whole disk
This device is obsolete and will be removed in a
future version of Linux. It has been replaced with
the parallel port IDE disk driver at major number 45.
Partitions are handled in the same way as IDE disks
(see major number 3).
41 char Yet Another Micro Monitor 41 char Yet Another Micro Monitor
0 = /dev/yamm Yet Another Micro Monitor 0 = /dev/yamm Yet Another Micro Monitor
41 block MicroSolutions BackPack parallel port CD-ROM 41 block
0 = /dev/bpcd BackPack CD-ROM
This device is obsolete and will be removed in a
future version of Linux. It has been replaced with
the parallel port ATAPI CD-ROM driver at major number 46.
42 char Demo/sample use 42 char Demo/sample use
...@@ -1681,13 +1665,7 @@ Your cooperation is appreciated. ...@@ -1681,13 +1665,7 @@ Your cooperation is appreciated.
disks (see major number 3) except that the limit on disks (see major number 3) except that the limit on
partitions is 15. partitions is 15.
93 char IBM Smart Capture Card frame grabber {2.6} 93 char
0 = /dev/iscc0 First Smart Capture Card
1 = /dev/iscc1 Second Smart Capture Card
...
128 = /dev/isccctl0 First Smart Capture Card control
129 = /dev/isccctl1 Second Smart Capture Card control
...
93 block NAND Flash Translation Layer filesystem 93 block NAND Flash Translation Layer filesystem
0 = /dev/nftla First NFTL layer 0 = /dev/nftla First NFTL layer
...@@ -1695,10 +1673,7 @@ Your cooperation is appreciated. ...@@ -1695,10 +1673,7 @@ Your cooperation is appreciated.
... ...
240 = /dev/nftlp 16th NTFL layer 240 = /dev/nftlp 16th NTFL layer
94 char miroVIDEO DC10/30 capture/playback device {2.6} 94 char
0 = /dev/dcxx0 First capture card
1 = /dev/dcxx1 Second capture card
...
94 block IBM S/390 DASD block storage 94 block IBM S/390 DASD block storage
0 = /dev/dasda First DASD device, major 0 = /dev/dasda First DASD device, major
...@@ -1791,11 +1766,7 @@ Your cooperation is appreciated. ...@@ -1791,11 +1766,7 @@ Your cooperation is appreciated.
... ...
15 = /dev/amiraid/ar?p15 15th partition 15 = /dev/amiraid/ar?p15 15th partition
102 char Philips SAA5249 Teletext signal decoder {2.6} 102 char
0 = /dev/tlk0 First Teletext decoder
1 = /dev/tlk1 Second Teletext decoder
2 = /dev/tlk2 Third Teletext decoder
3 = /dev/tlk3 Fourth Teletext decoder
102 block Compressed block device 102 block Compressed block device
0 = /dev/cbd/a First compressed block device, whole device 0 = /dev/cbd/a First compressed block device, whole device
...@@ -1916,10 +1887,7 @@ Your cooperation is appreciated. ...@@ -1916,10 +1887,7 @@ Your cooperation is appreciated.
DAC960 (see major number 48) except that the limit on DAC960 (see major number 48) except that the limit on
partitions is 15. partitions is 15.
111 char Philips SAA7146-based audio/video card {2.6} 111 char
0 = /dev/av0 First A/V card
1 = /dev/av1 Second A/V card
...
111 block Compaq Next Generation Drive Array, eighth controller 111 block Compaq Next Generation Drive Array, eighth controller
0 = /dev/cciss/c7d0 First logical drive, whole disk 0 = /dev/cciss/c7d0 First logical drive, whole disk
...@@ -2079,8 +2047,8 @@ Your cooperation is appreciated. ...@@ -2079,8 +2047,8 @@ Your cooperation is appreciated.
... ...
119 char VMware virtual network control 119 char VMware virtual network control
0 = /dev/vmnet0 1st virtual network 0 = /dev/vnet0 1st virtual network
1 = /dev/vmnet1 2nd virtual network 1 = /dev/vnet1 2nd virtual network
... ...
120-127 char LOCAL/EXPERIMENTAL USE 120-127 char LOCAL/EXPERIMENTAL USE
...@@ -2450,7 +2418,7 @@ Your cooperation is appreciated. ...@@ -2450,7 +2418,7 @@ Your cooperation is appreciated.
2 = /dev/raw/raw2 Second raw I/O device 2 = /dev/raw/raw2 Second raw I/O device
... ...
163 char UNASSIGNED (was Radio Tech BIM-XXX-RS232 radio modem - see 51) 163 char
164 char Chase Research AT/PCI-Fast serial card 164 char Chase Research AT/PCI-Fast serial card
0 = /dev/ttyCH0 AT/PCI-Fast board 0, port 0 0 = /dev/ttyCH0 AT/PCI-Fast board 0, port 0
...@@ -2542,6 +2510,12 @@ Your cooperation is appreciated. ...@@ -2542,6 +2510,12 @@ Your cooperation is appreciated.
1 = /dev/clanvi1 Second cLAN adapter 1 = /dev/clanvi1 Second cLAN adapter
... ...
179 block MMC block devices
0 = /dev/mmcblk0 First SD/MMC card
1 = /dev/mmcblk0p1 First partition on first MMC card
8 = /dev/mmcblk1 Second SD/MMC card
...
179 char CCube DVXChip-based PCI products 179 char CCube DVXChip-based PCI products
0 = /dev/dvxirq0 First DVX device 0 = /dev/dvxirq0 First DVX device
1 = /dev/dvxirq1 Second DVX device 1 = /dev/dvxirq1 Second DVX device
...@@ -2560,6 +2534,9 @@ Your cooperation is appreciated. ...@@ -2560,6 +2534,9 @@ Your cooperation is appreciated.
96 = /dev/usb/hiddev0 1st USB HID device 96 = /dev/usb/hiddev0 1st USB HID device
... ...
111 = /dev/usb/hiddev15 16th USB HID device 111 = /dev/usb/hiddev15 16th USB HID device
112 = /dev/usb/auer0 1st auerswald ISDN device
...
127 = /dev/usb/auer15 16th auerswald ISDN device
128 = /dev/usb/brlvgr0 First Braille Voyager device 128 = /dev/usb/brlvgr0 First Braille Voyager device
... ...
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device 131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
...@@ -2810,6 +2787,20 @@ Your cooperation is appreciated. ...@@ -2810,6 +2787,20 @@ Your cooperation is appreciated.
... ...
190 = /dev/ttyUL3 Xilinx uartlite - port 3 190 = /dev/ttyUL3 Xilinx uartlite - port 3
191 = /dev/xvc0 Xen virtual console - port 0 191 = /dev/xvc0 Xen virtual console - port 0
192 = /dev/ttyPZ0 pmac_zilog - port 0
...
195 = /dev/ttyPZ3 pmac_zilog - port 3
196 = /dev/ttyTX0 TX39/49 serial port 0
...
204 = /dev/ttyTX7 TX39/49 serial port 7
205 = /dev/ttySC0 SC26xx serial port 0
206 = /dev/ttySC1 SC26xx serial port 1
207 = /dev/ttySC2 SC26xx serial port 2
208 = /dev/ttySC3 SC26xx serial port 3
209 = /dev/ttyMAX0 MAX3100 serial port 0
210 = /dev/ttyMAX1 MAX3100 serial port 1
211 = /dev/ttyMAX2 MAX3100 serial port 2
212 = /dev/ttyMAX3 MAX3100 serial port 3
205 char Low-density serial ports (alternate device) 205 char Low-density serial ports (alternate device)
0 = /dev/culu0 Callout device for ttyLU0 0 = /dev/culu0 Callout device for ttyLU0
...@@ -3145,6 +3136,14 @@ Your cooperation is appreciated. ...@@ -3145,6 +3136,14 @@ Your cooperation is appreciated.
1 = /dev/blockrom1 Second ROM card's translation layer interface 1 = /dev/blockrom1 Second ROM card's translation layer interface
... ...
259 block Block Extended Major
Used dynamically to hold additional partition minor
numbers and allow large numbers of partitions per device
259 char FPGA configuration interfaces
0 = /dev/icap0 First Xilinx internal configuration
1 = /dev/icap1 Second Xilinx internal configuration
260 char OSD (Object-based-device) SCSI Device 260 char OSD (Object-based-device) SCSI Device
0 = /dev/osd0 First OSD Device 0 = /dev/osd0 First OSD Device
1 = /dev/osd1 Second OSD Device 1 = /dev/osd1 Second OSD Device
......
...@@ -11,8 +11,6 @@ aty128fb.txt ...@@ -11,8 +11,6 @@ aty128fb.txt
- info on the ATI Rage128 frame buffer driver. - info on the ATI Rage128 frame buffer driver.
cirrusfb.txt cirrusfb.txt
- info on the driver for Cirrus Logic chipsets. - info on the driver for Cirrus Logic chipsets.
cyblafb/
- directory with documentation files related to the cyblafb driver.
deferred_io.txt deferred_io.txt
- an introduction to deferred IO. - an introduction to deferred IO.
fbcon.txt fbcon.txt
......
Bugs
====
I currently don't know of any bug. Please do send reports to:
- linux-fbdev-devel@lists.sourceforge.net
- Knut_Petersen@t-online.de.
Untested features
=================
All LCD stuff is untested. If it worked in tridentfb, it should work in
cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
Thanks to
=========
* Alan Hourihane, for writing the X trident driver
* Jani Monoses, for writing the tridentfb driver
* Antonino A. Daplas, for review of the first published
version of cyblafb and some code
* Jochen Hein, for testing and a helpfull bug report
Available Documentation
=======================
Apollo PLE 133 Chipset VT8601A North Bridge Datasheet, Rev. 1.82, October 22,
2001, available from VIA:
http://www.viavpsd.com/product/6/15/DS8601A182.pdf
The datasheet is incomplete, some registers that need to be programmed are not
explained at all and important bits are listed as "reserved". But you really
need the datasheet to understand the code. "p. xxx" comments refer to page
numbers of this document.
XFree/XOrg drivers are available and of good quality, looking at the code
there is a good idea if the datasheet does not provide enough information
or if the datasheet seems to be wrong.
#
# Sample fb.modes file
#
# Provides an incomplete list of working modes for
# the cyberblade/i1 graphics core.
#
# The value 4294967256 is used instead of -40. Of course, -40 is not
# a really reasonable value, but chip design does not always follow
# logic. Believe me, it's ok, and it's the way the BIOS does it.
#
# fbset requires 4294967256 in fb.modes and -40 as an argument to
# the -t parameter. That's also not too reasonable, and it might change
# in the future or might even be differt for your current version.
#
mode "640x480-50"
geometry 640 480 2048 4096 8
timings 47619 4294967256 24 17 0 216 3
endmode
mode "640x480-60"
geometry 640 480 2048 4096 8
timings 39682 4294967256 24 17 0 216 3
endmode
mode "640x480-70"
geometry 640 480 2048 4096 8
timings 34013 4294967256 24 17 0 216 3
endmode
mode "640x480-72"
geometry 640 480 2048 4096 8
timings 33068 4294967256 24 17 0 216 3
endmode
mode "640x480-75"
geometry 640 480 2048 4096 8
timings 31746 4294967256 24 17 0 216 3
endmode
mode "640x480-80"
geometry 640 480 2048 4096 8
timings 29761 4294967256 24 17 0 216 3
endmode
mode "640x480-85"
geometry 640 480 2048 4096 8
timings 28011 4294967256 24 17 0 216 3
endmode
mode "800x600-50"
geometry 800 600 2048 4096 8
timings 30303 96 24 14 0 136 11
endmode
mode "800x600-60"
geometry 800 600 2048 4096 8
timings 25252 96 24 14 0 136 11
endmode
mode "800x600-70"
geometry 800 600 2048 4096 8
timings 21645 96 24 14 0 136 11
endmode
mode "800x600-72"
geometry 800 600 2048 4096 8
timings 21043 96 24 14 0 136 11
endmode
mode "800x600-75"
geometry 800 600 2048 4096 8
timings 20202 96 24 14 0 136 11
endmode
mode "800x600-80"
geometry 800 600 2048 4096 8
timings 18939 96 24 14 0 136 11
endmode
mode "800x600-85"
geometry 800 600 2048 4096 8
timings 17825 96 24 14 0 136 11
endmode
mode "1024x768-50"
geometry 1024 768 2048 4096 8
timings 19054 144 24 29 0 120 3
endmode
mode "1024x768-60"
geometry 1024 768 2048 4096 8
timings 15880 144 24 29 0 120 3
endmode
mode "1024x768-70"
geometry 1024 768 2048 4096 8
timings 13610 144 24 29 0 120 3
endmode
mode "1024x768-72"
geometry 1024 768 2048 4096 8
timings 13232 144 24 29 0 120 3
endmode
mode "1024x768-75"
geometry 1024 768 2048 4096 8
timings 12703 144 24 29 0 120 3
endmode
mode "1024x768-80"
geometry 1024 768 2048 4096 8
timings 11910 144 24 29 0 120 3
endmode
mode "1024x768-85"
geometry 1024 768 2048 4096 8
timings 11209 144 24 29 0 120 3
endmode
mode "1280x1024-50"
geometry 1280 1024 2048 4096 8
timings 11114 232 16 39 0 160 3
endmode
mode "1280x1024-60"
geometry 1280 1024 2048 4096 8
timings 9262 232 16 39 0 160 3
endmode
mode "1280x1024-70"
geometry 1280 1024 2048 4096 8
timings 7939 232 16 39 0 160 3
endmode
mode "1280x1024-72"
geometry 1280 1024 2048 4096 8
timings 7719 232 16 39 0 160 3
endmode
mode "1280x1024-75"
geometry 1280 1024 2048 4096 8
timings 7410 232 16 39 0 160 3
endmode
mode "1280x1024-80"
geometry 1280 1024 2048 4096 8
timings 6946 232 16 39 0 160 3
endmode
mode "1280x1024-85"
geometry 1280 1024 2048 4096 8
timings 6538 232 16 39 0 160 3
endmode
Speed
=====
CyBlaFB is much faster than tridentfb and vesafb. Compare the performance data
for mode 1280x1024-[8,16,32]@61 Hz.
Test 1: Cat a file with 2000 lines of 0 characters.
Test 2: Cat a file with 2000 lines of 80 characters.
Test 3: Cat a file with 2000 lines of 160 characters.
All values show system time use in seconds, kernel 2.6.12 was used for
the measurements. 2.6.13 is a bit slower, 2.6.14 hopefully will include a
patch that speeds up kernel bitblitting a lot ( > 20%).
+-----------+-----------------------------------------------------+
| | not accelerated |
| TRIDENTFB +-----------------+-----------------+-----------------+
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
| | noypan | ypan | noypan | ypan | noypan | ypan |
+-----------+--------+--------+--------+--------+--------+--------+
| Test 1 | 4.31 | 4.33 | 6.05 | 12.81 | ---- | ---- |
| Test 2 | 67.94 | 5.44 | 123.16 | 14.79 | ---- | ---- |
| Test 3 | 131.36 | 6.55 | 240.12 | 16.76 | ---- | ---- |
+-----------+--------+--------+--------+--------+--------+--------+
| Comments | | | completely bro- |
| | | | ken, monitor |
| | | | switches off |
+-----------+-----------------+-----------------+-----------------+
+-----------+-----------------------------------------------------+
| | accelerated |
| TRIDENTFB +-----------------+-----------------+-----------------+
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
| | noypan | ypan | noypan | ypan | noypan | ypan |
+-----------+--------+--------+--------+--------+--------+--------+
| Test 1 | ---- | ---- | 20.62 | 1.22 | ---- | ---- |
| Test 2 | ---- | ---- | 22.61 | 3.19 | ---- | ---- |
| Test 3 | ---- | ---- | 24.59 | 5.16 | ---- | ---- |
+-----------+--------+--------+--------+--------+--------+--------+
| Comments | broken, writing | broken, ok only | completely bro- |
| | to wrong places | if bgcolor is | ken, monitor |
| | on screen + bug | black, bug in | switches off |
| | in fillrect() | fillrect() | |
+-----------+-----------------+-----------------+-----------------+
+-----------+-----------------------------------------------------+
| | not accelerated |
| VESAFB +-----------------+-----------------+-----------------+
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
| | noypan | ypan | noypan | ypan | noypan | ypan |
+-----------+--------+--------+--------+--------+--------+--------+
| Test 1 | 4.26 | 3.76 | 5.99 | 7.23 | ---- | ---- |
| Test 2 | 65.65 | 4.89 | 120.88 | 9.08 | ---- | ---- |
| Test 3 | 126.91 | 5.94 | 235.77 | 11.03 | ---- | ---- |
+-----------+--------+--------+--------+--------+--------+--------+
| Comments | vga=0x307 | vga=0x31a | vga=0x31b not |
| | fh=80kHz | fh=80kHz | supported by |
| | fv=75kHz | fv=75kHz | video BIOS and |
| | | | hardware |
+-----------+-----------------+-----------------+-----------------+
+-----------+-----------------------------------------------------+
| | accelerated |
| CYBLAFB +-----------------+-----------------+-----------------+
| | 8 bpp | 16 bpp | 32 bpp |
| | noypan | ypan | noypan | ypan | noypan | ypan |
+-----------+--------+--------+--------+--------+--------+--------+
| Test 1 | 8.02 | 0.23 | 19.04 | 0.61 | 57.12 | 2.74 |
| Test 2 | 8.38 | 0.55 | 19.39 | 0.92 | 57.54 | 3.13 |
| Test 3 | 8.73 | 0.86 | 19.74 | 1.24 | 57.95 | 3.51 |
+-----------+--------+--------+--------+--------+--------+--------+
| Comments | | | |
| | | | |
| | | | |
| | | | |
+-----------+-----------------+-----------------+-----------------+
TODO / Missing features
=======================
Verify LCD stuff "stretch" and "center" options are
completely untested ... this code needs to be
verified. As I don't have access to such
hardware, please contact me if you are
willing run some tests.
Interlaced video modes The reason that interleaved
modes are disabled is that I do not know
the meaning of the vertical interlace
parameter. Also the datasheet mentions a
bit d8 of a horizontal interlace parameter,
but nowhere the lower 8 bits. Please help
if you can.
low-res double scan modes Who needs it?
accelerated color blitting Who needs it? The console driver does use color
blitting for nothing but drawing the penguine,
everything else is done using color expanding
blitting of 1bpp character bitmaps.
ioctls Who needs it?
TV-out Will be done later. Use "vga= " at boot time
to set a suitable video mode.
??? Feel free to contact me if you have any
feature requests
CyBlaFB is a framebuffer driver for the Cyberblade/i1 graphics core integrated
into the VIA Apollo PLE133 (aka vt8601) south bridge. It is developed and
tested using a VIA EPIA 5000 board.
Cyblafb - compiled into the kernel or as a module?
==================================================
You might compile cyblafb either as a module or compile it permanently into the
kernel.
Unless you have a real reason to do so you should not compile both vesafb and
cyblafb permanently into the kernel. It's possible and it helps during the
developement cycle, but it's useless and will at least block some otherwise
usefull memory for ordinary users.
Selecting Modes
===============
Startup Mode
============
First of all, you might use the "vga=???" boot parameter as it is
documented in vesafb.txt and svga.txt. Cyblafb will detect the video
mode selected and will use the geometry and timings found by
inspecting the hardware registers.
video=cyblafb vga=0x317
Alternatively you might use a combination of the mode, ref and bpp
parameters. If you compiled the driver into the kernel, add something
like this to the kernel command line:
video=cyblafb:1280x1024,bpp=16,ref=50 ...
If you compiled the driver as a module, the same mode would be
selected by the following command:
modprobe cyblafb mode=1280x1024 bpp=16 ref=50 ...
None of the modes possible to select as startup modes are affected by
the problems described at the end of the next subsection.
For all startup modes cyblafb chooses a virtual x resolution of 2048,
the only exception is mode 1280x1024 in combination with 32 bpp. This
allows ywrap scrolling for all those modes if rotation is 0 or 2, and
also fast scrolling if rotation is 1 or 3. The default virtual y reso-
lution is 4096 for bpp == 8, 2048 for bpp==16 and 1024 for bpp == 32,
again with the only exception of 1280x1024 at 32 bpp.
Please do set your video memory size to 8 Mb in the Bios setup. Other
values will work, but performace is decreased for a lot of modes.
Mode changes using fbset
========================
You might use fbset to change the video mode, see "man fbset". Cyblafb
generally does assume that you know what you are doing. But it does
some checks, especially those that are needed to prevent you from
damaging your hardware.
- only 8, 16, 24 and 32 bpp video modes are accepted
- interlaced video modes are not accepted
- double scan video modes are not accepted
- if a flat panel is found, cyblafb does not allow you
to program a resolution higher than the physical
resolution of the flat panel monitor
- cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
and (currently) 24 bit modes use a doubled vclk internally,
the dotclock limit as seen by fbset is 115 MHz for those
modes and 230 MHz for 8 and 16 bpp modes.
- cyblafb will allow you to select very high resolutions as
long as the hardware can be programmed to these modes. The
documented limit 1600x1200 is not enforced, but don't expect
perfect signal quality.
Any request that violates the rules given above will be either changed
to something the hardware supports or an error value will be returned.
If you program a virtual y resolution higher than the hardware limit,
cyblafb will silently decrease that value to the highest possible
value. The same is true for a virtual x resolution that is not
supported by the hardware. Cyblafb tries to adapt vyres first because
vxres decides if ywrap scrolling is possible or not.
Attempts to disable acceleration are ignored, I believe that this is
safe.
Some video modes that should work do not work as expected. If you use
the standard fb.modes, fbset 640x480-60 will program that mode, but
you will see a vertical area, about two characters wide, with only
much darker characters than the other characters on the screen.
Cyblafb does allow that mode to be set, as it does not violate the
official specifications. It would need a lot of code to reliably sort
out all invalid modes, playing around with the margin values will
give a valid mode quickly. And if cyblafb would detect such an invalid
mode, should it silently alter the requested values or should it
report an error? Both options have some pros and cons. As stated
above, none of the startup modes are affected, and if you set
verbosity to 1 or higher, cyblafb will print the fbset command that
would be needed to program that mode using fbset.
Other Parameters
================
crt don't autodetect, assume monitor connected to
standard VGA connector
fp don't autodetect, assume flat panel display
connected to flat panel monitor interface
nativex inform driver about native x resolution of
flat panel monitor connected to special
interface (should be autodetected)
stretch stretch image to adapt low resolution modes to
higer resolutions of flat panel monitors
connected to special interface
center center image to adapt low resolution modes to
higer resolutions of flat panel monitors
connected to special interface
memsize use if autodetected memsize is wrong ...
should never be necessary
nopcirr disable PCI read retry
nopciwr disable PCI write retry
nopcirb disable PCI read bursts
nopciwb disable PCI write bursts
bpp bpp for specified modes
valid values: 8 || 16 || 24 || 32
ref refresh rate for specified mode
valid values: 50 <= ref <= 85
mode 640x480 or 800x600 or 1024x768 or 1280x1024
if not specified, the startup mode will be detected
and used, so you might also use the vga=??? parameter
described in vesafb.txt. If you do not specify a mode,
bpp and ref parameters are ignored.
verbosity 0 is the default, increase to at least 2 for every
bug report!
Development hints
=================
It's much faster do compile a module and to load the new version after
unloading the old module than to compile a new kernel and to reboot. So if you
try to work on cyblafb, it might be a good idea to use cyblafb as a module.
In real life, fast often means dangerous, and that's also the case here. If
you introduce a serious bug when cyblafb is compiled into the kernel, the
kernel will lock or oops with a high probability before the file system is
mounted, and the danger for your data is low. If you load a broken own version
of cyblafb on a running system, the danger for the integrity of the file
system is much higher as you might need a hard reset afterwards. Decide
yourself.
Module unloading, the vfb method
================================
If you want to unload/reload cyblafb using the virtual framebuffer, you need
to enable vfb support in the kernel first. After that, load the modules as
shown below:
modprobe vfb vfb_enable=1
modprobe fbcon
modprobe cyblafb
fbset -fb /dev/fb1 1280x1024-60 -vyres 2662
con2fb /dev/fb1 /dev/tty1
...
If you now made some changes to cyblafb and want to reload it, you might do it
as show below:
con2fb /dev/fb0 /dev/tty1
...
rmmod cyblafb
modprobe cyblafb
con2fb /dev/fb1 /dev/tty1
...
Of course, you might choose another mode, and most certainly you also want to
map some other /dev/tty* to the real framebuffer device. You might also choose
to compile fbcon as a kernel module or place it permanently in the kernel.
I do not know of any way to unload fbcon, and fbcon will prevent the
framebuffer device loaded first from unloading. [If there is a way, then
please add a description here!]
Module unloading, the vesafb method
===================================
Configure the kernel:
<*> Support for frame buffer devices
[*] VESA VGA graphics support
<M> Cyberblade/i1 support
Add e.g. "video=vesafb:ypan vga=0x307" to the kernel parameters. The ypan
parameter is important, choose any vga parameter you like as long as it is
a graphics mode.
After booting, load cyblafb without any mode and bpp parameter and assign
cyblafb to individual ttys using con2fb, e.g.:
modprobe cyblafb
con2fb /dev/fb1 /dev/tty1
Unloading cyblafb works without problems after you assign vesafb to all
ttys again, e.g.:
con2fb /dev/fb0 /dev/tty1
rmmod cyblafb
0.62
====
- the vesafb parameter has been removed as I decided to allow the
feature without any special parameter.
- Cyblafb does not use the vga style of panning any longer, now the
"right view" register in the graphics engine IO space is used. Without
that change it was impossible to use all available memory, and without
access to all available memory it is impossible to ywrap.
- The imageblit function now uses hardware acceleration for all font
widths. Hardware blitting across pixel column 2048 is broken in the
cyberblade/i1 graphics core, but we work around that hardware bug.
- modes with vxres != xres are supported now.
- ywrap scrolling is supported now and the default. This is a big
performance gain.
- default video modes use vyres > yres and vxres > xres to allow
almost optimal scrolling speed for normal and rotated screens
- some features mainly usefull for debugging the upper layers of the
framebuffer system have been added, have a look at the code
- fixed: Oops after unloading cyblafb when reading /proc/io*
- we work around some bugs of the higher framebuffer layers.
I tried the following framebuffer drivers:
- TRIDENTFB is full of bugs. Acceleration is broken for Blade3D
graphics cores like the cyberblade/i1. It claims to support a great
number of devices, but documentation for most of these devices is
unfortunately not available. There is _no_ reason to use tridentfb
for cyberblade/i1 + CRT users. VESAFB is faster, and the one
advantage, mode switching, is broken in tridentfb.
- VESAFB is used by many distributions as a standard. Vesafb does
not support mode switching. VESAFB is a bit faster than the working
configurations of TRIDENTFB, but it is still too slow, even if you
use ypan.
- EPIAFB (you'll find it on sourceforge) supports the Cyberblade/i1
graphics core, but it still has serious bugs and developement seems
to have stopped. This is the one driver with TV-out support. If you
do need this feature, try epiafb.
None of these drivers was a real option for me.
I believe that is unreasonable to change code that announces to support 20
devices if I only have more or less sufficient documentation for exactly one
of these. The risk of breaking device foo while fixing device bar is too high.
So I decided to start CyBlaFB as a stripped down tridentfb.
All code specific to other Trident chips has been removed. After that there
were a lot of cosmetic changes to increase the readability of the code. All
register names were changed to those mnemonics used in the datasheet. Function
and macro names were changed if they hindered easy understanding of the code.
After that I debugged the code and implemented some new features. I'll try to
give a little summary of the main changes:
- calculation of vertical and horizontal timings was fixed
- video signal quality has been improved dramatically
- acceleration:
- fillrect and copyarea were fixed and reenabled
- color expanding imageblit was newly implemented, color
imageblit (only used to draw the penguine) still uses the
generic code.
- init of the acceleration engine was improved and moved to a
place where it really works ...
- sync function has a timeout now and tries to reset and
reinit the accel engine if necessary
- fewer slow copyarea calls when doing ypan scrolling by using
undocumented bit d21 of screen start address stored in
CR2B[5]. BIOS does use it also, so this should be safe.
- cyblafb rejects any attempt to set modes that would cause vclk
values above reasonable 230 MHz. 32bit modes use a clock
multiplicator of 2, so fbset does show the correct values for
pixclock but not for vclk in this case. The fbset limit is 115 MHz
for 32 bpp modes.
- cyblafb rejects modes known to be broken or unimplemented (all
interlaced modes, all doublescan modes for now)
- cyblafb now works independant of the video mode in effect at startup
time (tridentfb does not init all needed registers to reasonable
values)
- switching between video modes does work reliably now
- the first video mode now is the one selected on startup using the
vga=???? mechanism or any of
- 640x480, 800x600, 1024x768, 1280x1024
- 8, 16, 24 or 32 bpp
- refresh between 50 Hz and 85 Hz, 1 Hz steps (1280x1024-32
is limited to 63Hz)
- pci retry and pci burst mode are settable (try to disable if you
experience latency problems)
- built as a module cyblafb might be unloaded and reloaded using
the vfb module and con2vt or might be used together with vesafb
...@@ -59,7 +59,8 @@ Accepted options: ...@@ -59,7 +59,8 @@ Accepted options:
ypan Enable display panning using the VESA protected mode ypan Enable display panning using the VESA protected mode
interface. The visible screen is just a window of the interface. The visible screen is just a window of the
video memory, console scrolling is done by changing the video memory, console scrolling is done by changing the
start of the window. Available on x86 only. start of the window. This option is available on x86
only and is the default option on that architecture.
ywrap Same as ypan, but assumes your gfx board can wrap-around ywrap Same as ypan, but assumes your gfx board can wrap-around
the video memory (i.e. starts reading from top if it the video memory (i.e. starts reading from top if it
...@@ -67,7 +68,7 @@ ywrap Same as ypan, but assumes your gfx board can wrap-around ...@@ -67,7 +68,7 @@ ywrap Same as ypan, but assumes your gfx board can wrap-around
Available on x86 only. Available on x86 only.
redraw Scroll by redrawing the affected part of the screen, this redraw Scroll by redrawing the affected part of the screen, this
is the safe (and slow) default. is the default on non-x86.
(If you're using uvesafb as a module, the above three options are (If you're using uvesafb as a module, the above three options are
used a parameter of the scroll option, e.g. scroll=ypan.) used a parameter of the scroll option, e.g. scroll=ypan.)
...@@ -182,7 +183,7 @@ from the Video BIOS if you set pixclock to 0 in fb_var_screeninfo. ...@@ -182,7 +183,7 @@ from the Video BIOS if you set pixclock to 0 in fb_var_screeninfo.
-- --
Michal Januszewski <spock@gentoo.org> Michal Januszewski <spock@gentoo.org>
Last updated: 2007-06-16 Last updated: 2009-03-30
Documentation of the uvesafb options is loosely based on vesafb.txt. Documentation of the uvesafb options is loosely based on vesafb.txt.
...@@ -255,6 +255,16 @@ Who: Jan Engelhardt <jengelh@computergmbh.de> ...@@ -255,6 +255,16 @@ Who: Jan Engelhardt <jengelh@computergmbh.de>
--------------------------- ---------------------------
What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib
When: February 2010
Why: All callers should use explicit gpio_request()/gpio_free().
The autorequest mechanism in gpiolib was provided mostly as a
migration aid for legacy GPIO interfaces (for SOC based GPIOs).
Those users have now largely migrated. Platforms implementing
the GPIO interfaces without using gpiolib will see no changes.
Who: David Brownell <dbrownell@users.sourceforge.net>
---------------------------
What: b43 support for firmware revision < 410 What: b43 support for firmware revision < 410
When: The schedule was July 2008, but it was decided that we are going to keep the When: The schedule was July 2008, but it was decided that we are going to keep the
code as long as there are no major maintanance headaches. code as long as there are no major maintanance headaches.
...@@ -273,13 +283,6 @@ Who: Glauber Costa <gcosta@redhat.com> ...@@ -273,13 +283,6 @@ Who: Glauber Costa <gcosta@redhat.com>
--------------------------- ---------------------------
What: remove HID compat support
When: 2.6.29
Why: needed only as a temporary solution until distros fix themselves up
Who: Jiri Slaby <jirislaby@gmail.com>
---------------------------
What: print_fn_descriptor_symbol() What: print_fn_descriptor_symbol()
When: October 2009 When: October 2009
Why: The %pF vsprintf format provides the same functionality in a Why: The %pF vsprintf format provides the same functionality in a
...@@ -311,6 +314,18 @@ Who: Vlad Yasevich <vladislav.yasevich@hp.com> ...@@ -311,6 +314,18 @@ Who: Vlad Yasevich <vladislav.yasevich@hp.com>
--------------------------- ---------------------------
What: Ability for non root users to shm_get hugetlb pages based on mlock
resource limits
When: 2.6.31
Why: Non root users need to be part of /proc/sys/vm/hugetlb_shm_group or
have CAP_IPC_LOCK to be able to allocate shm segments backed by
huge pages. The mlock based rlimit check to allow shm hugetlb is
inconsistent with mmap based allocations. Hence it is being
deprecated.
Who: Ravikiran Thirumalai <kiran@scalex86.org>
---------------------------
What: CONFIG_THERMAL_HWMON What: CONFIG_THERMAL_HWMON
When: January 2009 When: January 2009
Why: This option was introduced just to allow older lm-sensors userspace Why: This option was introduced just to allow older lm-sensors userspace
...@@ -339,7 +354,8 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl> ...@@ -339,7 +354,8 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl>
--------------------------- ---------------------------
What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client() What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
i2c_adapter->client_register(), i2c_adapter->client_unregister
When: 2.6.30 When: 2.6.30
Check: i2c_attach_client i2c_detach_client Check: i2c_attach_client i2c_detach_client
Why: Deprecated by the new (standard) device driver binding model. Use Why: Deprecated by the new (standard) device driver binding model. Use
...@@ -380,3 +396,44 @@ Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) ...@@ -380,3 +396,44 @@ Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t)
have been kept around for migration reasons. After more than two years have been kept around for migration reasons. After more than two years
it's time to remove them finally it's time to remove them finally
Who: Thomas Gleixner <tglx@linutronix.de> Who: Thomas Gleixner <tglx@linutronix.de>
---------------------------
What: fakephp and associated sysfs files in /sys/bus/pci/slots/
When: 2011
Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to
represent a machine's physical PCI slots. The change in semantics
had userspace implications, as the hotplug core no longer allowed
drivers to create multiple sysfs files per physical slot (required
for multi-function devices, e.g.). fakephp was seen as a developer's
tool only, and its interface changed. Too late, we learned that
there were some users of the fakephp interface.
In 2.6.30, the original fakephp interface was restored. At the same
time, the PCI core gained the ability that fakephp provided, namely
function-level hot-remove and hot-add.
Since the PCI core now provides the same functionality, exposed in:
/sys/bus/pci/rescan
/sys/bus/pci/devices/.../remove
/sys/bus/pci/devices/.../rescan
there is no functional reason to maintain fakephp as well.
We will keep the existing module so that 'modprobe fakephp' will
present the old /sys/bus/pci/slots/... interface for compatibility,
but users are urged to migrate their applications to the API above.
After a reasonable transition period, we will remove the legacy
fakephp interface.
Who: Alex Chiang <achiang@hp.com>
---------------------------
What: i2c-voodoo3 driver
When: October 2009
Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
driver but this caused driver conflicts.
Who: Jean Delvare <khali@linux-fr.org>
Krzysztof Helt <krzysztof.h1@wp.pl>
...@@ -68,6 +68,8 @@ ncpfs.txt ...@@ -68,6 +68,8 @@ ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol. - info on Novell Netware(tm) filesystem using NCP protocol.
nfsroot.txt nfsroot.txt
- short guide on setting up a diskless box with NFS root filesystem. - short guide on setting up a diskless box with NFS root filesystem.
nilfs2.txt
- info and mount options for the NILFS2 filesystem.
ntfs.txt ntfs.txt
- info and mount options for the NTFS filesystem (Windows NT). - info and mount options for the NTFS filesystem (Windows NT).
ocfs2.txt ocfs2.txt
......
...@@ -505,7 +505,7 @@ prototypes: ...@@ -505,7 +505,7 @@ prototypes:
void (*open)(struct vm_area_struct*); void (*open)(struct vm_area_struct*);
void (*close)(struct vm_area_struct*); void (*close)(struct vm_area_struct*);
int (*fault)(struct vm_area_struct*, struct vm_fault *); int (*fault)(struct vm_area_struct*, struct vm_fault *);
int (*page_mkwrite)(struct vm_area_struct *, struct page *); int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
locking rules: locking rules:
......
此差异已折叠。
===============================================
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
===============================================
Contents:
(*) Overview.
(*) Requirements.
(*) Configuration.
(*) Starting the cache.
(*) Things to avoid.
(*) Cache culling.
(*) Cache structure.
(*) Security model and SELinux.
(*) A note on security.
(*) Statistical information.
(*) Debugging.
========
OVERVIEW
========
CacheFiles is a caching backend that's meant to use as a cache a directory on
an already mounted filesystem of a local type (such as Ext3).
CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in
/sbin.
The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.
CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communication with the daemon. Only one thing may have this open at once,
and whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.
CacheFiles is currently limited to a single cache.
CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.
============
REQUIREMENTS
============
The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:
- dnotify.
- extended attributes (xattrs).
- openat() and friends.
- bmap() support on files in the filesystem (FIBMAP ioctl).
- The use of bmap() to detect a partial page at the end of the file.
It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.
=============
CONFIGURATION
=============
The cache is configured by a script in /etc/cachefilesd.conf. These commands
set up cache ready for use. The following script commands are available:
(*) brun <N>%
(*) bcull <N>%
(*) bstop <N>%
(*) frun <N>%
(*) fcull <N>%
(*) fstop <N>%
Configure the culling limits. Optional. See the section on culling
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
The commands beginning with a 'b' are file space (block) limits, those
beginning with an 'f' are file count limits.
(*) dir <path>
Specify the directory containing the root of the cache. Mandatory.
(*) tag <name>
Specify a tag to FS-Cache to use in distinguishing multiple caches.
Optional. The default is "CacheFiles".
(*) debug <mask>
Specify a numeric bitmask to control debugging in the kernel module.
Optional. The default is zero (all off). The following values can be
OR'd into the mask to collect various information:
1 Turn on trace of function entry (_enter() macros)
2 Turn on trace of function exit (_leave() macros)
4 Turn on trace of internal debug points (_debug())
This mask can also be set through sysfs, eg:
echo 5 >/sys/modules/cachefiles/parameters/debug
==================
STARTING THE CACHE
==================
The cache is started by running the daemon. The daemon opens the cache device,
configures the cache and tells it to begin caching. At that point the cache
binds to fscache and the cache becomes live.
The daemon is run as follows:
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
The flags are:
(*) -d
Increase the debugging level. This can be specified multiple times and
is cumulative with itself.
(*) -s
Send messages to stderr instead of syslog.
(*) -n
Don't daemonise and go into background.
(*) -f <configfile>
Use an alternative configuration file rather than the default one.
===============
THINGS TO AVOID
===============
Do not mount other things within the cache as this will cause problems. The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.
Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.
Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).
Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.
Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.
Do not chmod files in the cache. The module creates things with minimal
permissions to prevent random users being able to access them directly.
=============
CACHE CULLING
=============
The cache may need culling occasionally to make space. This involves
discarding objects from the cache that have been used less recently than
anything else. Culling is based on the access time of data objects. Empty
directories are culled if not in use.
Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem. There are six
"limits":
(*) brun
(*) frun
If the amount of free space and the number of available files in the cache
rises above both these limits, then culling is turned off.
(*) bcull
(*) fcull
If the amount of available space or the number of available files in the
cache falls below either of these limits, then culling is started.
(*) bstop
(*) fstop
If the amount of available space or the number of available files in the
cache falls below either of these limits, then no further allocation of
disk space or files is permitted until culling has raised things above
these limits again.
These must be configured thusly:
0 <= bstop < bcull < brun < 100
0 <= fstop < fcull < frun < 100
Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.
The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order. A new scan of the cache is
started as soon as space is made in the table. Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.
===============
CACHE STRUCTURE
===============
The CacheFiles module will create two directories in the directory it was
given:
(*) cache/
(*) graveyard/
The active cache objects all reside in the first directory. The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.
The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.
The module represents index objects as directories with the filename "I..." or
"J...". Note that the "cache/" directory is itself a special index.
Data objects are represented as files if they have no children, or directories
if they do. Their filenames all begin "D..." or "E...". If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.
Special objects are similar to data objects, except their filenames begin
"S..." or "T...".
If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended. Into
this directory, if possible, will be placed the representations of the child
objects:
INDEX INDEX INDEX DATA FILES
========= ========== ================================= ================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory. The names of the intermediate directories will have
'+' prepended:
J1223/@23/+xy...z/+kl...m/Epqr
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.
To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable. The two versions of
object filenames indicate the encoding:
OBJECT TYPE PRINTABLE ENCODED
=============== =============== ===============
Index "I..." "J..."
Data "D..." "E..."
Special "S..." "T..."
Intermediate directories are always "@" or "+" as appropriate.
Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs. The latter is used to detect stale objects in the cache and update
or retire them.
Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).
==========================
SECURITY MODEL AND SELINUX
==========================
CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.
One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.
The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it the target of an operation performed by
some other process (so signalling and suchlike still work correctly).
When the CacheFiles module is asked to bind to its cache, it:
(1) Finds the security label attached to the root cache directory and uses
that as the security label with which it will create files. By default,
this is:
cachefiles_var_t
(2) Finds the security label of the process which issued the bind request
(presumed to be the cachefilesd daemon), which by default will be:
cachefilesd_t
and asks LSM to supply a security ID as which it should act given the
daemon's label. By default, this will be:
cachefiles_kernel_t
SELinux transitions the daemon's security ID to the module's security ID
based on a rule of this form in the policy.
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
For instance:
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.
The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories. It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.
There are policy source files available in:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
and later versions. In that tarball, see the files:
cachefilesd.te
cachefilesd.fc
cachefilesd.if
They are built and installed directly by the RPM.
If a non-RPM based system is being used, then copy the above files to their own
directory and run:
make -f /usr/share/selinux/devel/Makefile
semodule -i cachefilesd.pp
You will need checkpolicy and selinux-policy-devel installed prior to the
build.
By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, than either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.
For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:
/usr/share/doc/cachefilesd-*/move-cache.txt
When the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.
==================
A NOTE ON SECURITY
==================
CacheFiles makes use of the split security in the task_struct. It allocates
its own task_security structure, and redirects current->act_as to point to it
when it acts on behalf of another process, in that process's context.
The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly. Therefore the VFS and LSM
may deny the CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.
Furthermore, should CacheFiles create a file or directory, the security
parameters with that object is created (UID, GID, security label) would be
derived from that process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).
What is required is to temporarily override the security of the process that
issued the system call. We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.
So CacheFiles makes use of a logical split in the security between the
objective security (task->sec) and the subjective security (task->act_as). The
objective security holds the intrinsic security properties of a process and is
never overridden. This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).
The subjective security holds the active security properties of a process, and
may be overridden. This is not seen externally, and is used whan a process
acts upon another object, for example SIGKILLing another process or opening a
file.
LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.
=======================
STATISTICAL INFORMATION
=======================
If FS-Cache is compiled with the following option enabled:
CONFIG_CACHEFILES_HISTOGRAM=y
then it will gather certain statistics and display them through a proc file.
(*) /proc/fs/cachefiles/histogram
cat /proc/fs/cachefiles/histogram
JIFS SECS LOOKUPS MKDIRS CREATES
===== ===== ========= ========= =========
This shows the breakdown of the number of times each amount of time
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
columns are as follows:
COLUMN TIME MEASUREMENT
======= =======================================================
LOOKUPS Length of time to perform a lookup on the backing fs
MKDIRS Length of time to perform a mkdir on the backing fs
CREATES Length of time to perform a create on the backing fs
Each row shows the number of events that took a particular range of times.
Each step is 1 jiffy in size. The JIFS column indicates the particular
jiffy range covered, and the SECS field the equivalent number of seconds.
=========
DEBUGGING
=========
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
debugging enabled by adjusting the value in:
/sys/module/cachefiles/parameters/debug
This is a bitmask of debugging streams to enable:
BIT VALUE STREAM POINT
======= ======= =============================== =======================
0 1 General Function entry trace
1 2 Function exit trace
2 4 General
The appropriate set of values should be OR'd together and the result written to
the control file. For example:
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
will turn on all function entry debugging.
==========================
General Filesystem Caching
==========================
========
OVERVIEW
========
This facility is a general purpose cache for network filesystems, though it
could be used for caching other things such as ISO9660 filesystems too.
FS-Cache mediates between cache backends (such as CacheFS) and network
filesystems:
+---------+
| | +--------------+
| NFS |--+ | |
| | | +-->| CacheFS |
+---------+ | +----------+ | | /dev/hda5 |
| | | | +--------------+
+---------+ +-->| | |
| | | |--+
| AFS |----->| FS-Cache |
| | | |--+
+---------+ +-->| | |
| | | | +--------------+
+---------+ | +----------+ | | |
| | | +-->| CacheFiles |
| ISOFS |--+ | /var/cache |
| | +--------------+
+---------+
Or to look at it another way, FS-Cache is a module that provides a caching
facility to a network filesystem such that the cache is transparent to the
user:
+---------+
| |
| Server |
| |
+---------+
| NETWORK
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
| +----------+
V | |
+---------+ | |
| | | |
| NFS |----->| FS-Cache |
| | | |--+
+---------+ | | | +--------------+ +--------------+
| | | | | | | |
V +----------+ +-->| CacheFiles |-->| Ext3 |
+---------+ | /var/cache | | /dev/sda6 |
| | +--------------+ +--------------+
| VFS | ^ ^
| | | |
+---------+ +--------------+ |
| KERNEL SPACE | |
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
| USER SPACE | |
V | |
+---------+ +--------------+
| | | |
| Process | | cachefilesd |
| | | |
+---------+ +--------------+
FS-Cache does not follow the idea of completely loading every netfs file
opened in its entirety into a cache before permitting it to be accessed and
then serving the pages out of that cache rather than the netfs inode because:
(1) It must be practical to operate without a cache.
(2) The size of any accessible file must not be limited to the size of the
cache.
(3) The combined size of all opened files (this includes mapped libraries)
must not be limited to the size of the cache.
(4) The user should not be forced to download an entire file just to do a
one-off access of a small portion of it (such as might be done with the
"file" program).
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
the netfs('s) using it.
FS-Cache provides the following facilities:
(1) More than one cache can be used at once. Caches can be selected
explicitly by use of tags.
(2) Caches can be added / removed at any time.
(3) The netfs is provided with an interface that allows either party to
withdraw caching facilities from a file (required for (2)).
(4) The interface to the netfs returns as few errors as possible, preferring
rather to let the netfs remain oblivious.
(5) Cookies are used to represent indices, files and other objects to the
netfs. The simplest cookie is just a NULL pointer - indicating nothing
cached there.
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
desires, though it must be aware that the index search function is
recursive, stack space is limited, and indices can only be children of
indices.
(7) Data I/O is done direct to and from the netfs's pages. The netfs
indicates that page A is at index B of the data-file represented by cookie
C, and that it should be read or written. The cache backend may or may
not start I/O on that page, but if it does, a netfs callback will be
invoked to indicate completion. The I/O may be either synchronous or
asynchronous.
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
them as obsolete and the index hierarchy rooted at that point will get
recycled.
(9) The netfs provides a "match" function for index searches. In addition to
saying whether a match was made or not, this can also specify that an
entry should be updated or deleted.
(10) As much as possible is done asynchronously.
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept. Bits of this tree may actually reside in one or more
caches.
FSDEF
|
+------------------------------------+
| |
NFS AFS
| |
+--------------------------+ +-----------+
| | | |
homedir mirror afs.org redhat.com
| | |
+------------+ +---------------+ +----------+
| | | | | |
00001 00002 00007 00125 vol00001 vol00002
| | | | |
+---+---+ +-----+ +---+ +------+------+ +-----+----+
| | | | | | | | | | | | |
PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
| |
PG0 +-------+
| |
00001 00003
|
+---+---+
| | |
PG0 PG1 PG2
In the example above, you can see two netfs's being backed: NFS and AFS. These
have different index hierarchies:
(*) The NFS primary index contains per-server indices. Each server index is
indexed by NFS file handles to get data file objects. Each data file
objects can have an array of pages, but may also have further child
objects, such as extended attributes and directory entries. Extended
attribute objects themselves have page-array contents.
(*) The AFS primary index contains per-cell indices. Each cell index contains
per-logical-volume indices. Each of volume index contains up to three
indices for the read-write, read-only and backup mirrors of those volumes.
Each of these contains vnode data file objects, each of which contains an
array of pages.
The very top index is the FS-Cache master index in which individual netfs's
have entries.
Any index object may reside in more than one cache, provided it only has index
children. Any index with non-index object children will be assumed to only
reside in one cache.
The netfs API to FS-Cache can be found in:
Documentation/filesystems/caching/netfs-api.txt
The cache backend API to FS-Cache can be found in:
Documentation/filesystems/caching/backend-api.txt
A description of the internal representations and object state machine can be
found in:
Documentation/filesystems/caching/object.txt
=======================
STATISTICAL INFORMATION
=======================
If FS-Cache is compiled with the following options enabled:
CONFIG_FSCACHE_STATS=y
CONFIG_FSCACHE_HISTOGRAM=y
then it will gather certain statistics and display them through a number of
proc files.
(*) /proc/fs/fscache/stats
This shows counts of a number of events that can happen in FS-Cache:
CLASS EVENT MEANING
======= ======= =======================================================
Cookies idx=N Number of index cookies allocated
dat=N Number of data storage cookies allocated
spc=N Number of special cookies allocated
Objects alc=N Number of objects allocated
nal=N Number of object allocation failures
avl=N Number of objects that reached the available state
ded=N Number of objects that reached the dead state
ChkAux non=N Number of objects that didn't have a coherency check
ok=N Number of objects that passed a coherency check
upd=N Number of objects that needed a coherency data update
obs=N Number of objects that were declared obsolete
Pages mrk=N Number of pages marked as being cached
unc=N Number of uncache page requests seen
Acquire n=N Number of acquire cookie requests seen
nul=N Number of acq reqs given a NULL parent
noc=N Number of acq reqs rejected due to no cache available
ok=N Number of acq reqs succeeded
nbf=N Number of acq reqs rejected due to error
oom=N Number of acq reqs failed on ENOMEM
Lookups n=N Number of lookup calls made on cache backends
neg=N Number of negative lookups made
pos=N Number of positive lookups made
crt=N Number of objects created by lookup
Updates n=N Number of update cookie requests seen
nul=N Number of upd reqs given a NULL parent
run=N Number of upd reqs granted CPU time
Relinqs n=N Number of relinquish cookie requests seen
nul=N Number of rlq reqs given a NULL parent
wcr=N Number of rlq reqs waited on completion of creation
AttrChg n=N Number of attribute changed requests seen
ok=N Number of attr changed requests queued
nbf=N Number of attr changed rejected -ENOBUFS
oom=N Number of attr changed failed -ENOMEM
run=N Number of attr changed ops given CPU time
Allocs n=N Number of allocation requests seen
ok=N Number of successful alloc reqs
wt=N Number of alloc reqs that waited on lookup completion
nbf=N Number of alloc reqs rejected -ENOBUFS
ops=N Number of alloc reqs submitted
owt=N Number of alloc reqs waited for CPU time
Retrvls n=N Number of retrieval (read) requests seen
ok=N Number of successful retr reqs
wt=N Number of retr reqs that waited on lookup completion
nod=N Number of retr reqs returned -ENODATA
nbf=N Number of retr reqs rejected -ENOBUFS
int=N Number of retr reqs aborted -ERESTARTSYS
oom=N Number of retr reqs failed -ENOMEM
ops=N Number of retr reqs submitted
owt=N Number of retr reqs waited for CPU time
Stores n=N Number of storage (write) requests seen
ok=N Number of successful store reqs
agn=N Number of store reqs on a page already pending storage
nbf=N Number of store reqs rejected -ENOBUFS
oom=N Number of store reqs failed -ENOMEM
ops=N Number of store reqs submitted
run=N Number of store reqs granted CPU time
Ops pend=N Number of times async ops added to pending queues
run=N Number of times async ops given CPU time
enq=N Number of times async ops queued for processing
dfr=N Number of async ops queued for deferred release
rel=N Number of async ops released
gc=N Number of deferred-release async ops garbage collected
(*) /proc/fs/fscache/histogram
cat /proc/fs/fscache/histogram
JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
===== ===== ========= ========= ========= ========= =========
This shows the breakdown of the number of times each amount of time
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
columns are as follows:
COLUMN TIME MEASUREMENT
======= =======================================================
OBJ INST Length of time to instantiate an object
OP RUNS Length of time a call to process an operation took
OBJ RUNS Length of time a call to process an object event took
RETRV DLY Time between an requesting a read and lookup completing
RETRIEVLS Time between beginning and end of a retrieval
Each row shows the number of events that took a particular range of times.
Each step is 1 jiffy in size. The JIFS column indicates the particular
jiffy range covered, and the SECS field the equivalent number of seconds.
=========
DEBUGGING
=========
If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
debugging enabled by adjusting the value in:
/sys/module/fscache/parameters/debug
This is a bitmask of debugging streams to enable:
BIT VALUE STREAM POINT
======= ======= =============================== =======================
0 1 Cache management Function entry trace
1 2 Function exit trace
2 4 General
3 8 Cookie management Function entry trace
4 16 Function exit trace
5 32 General
6 64 Page handling Function entry trace
7 128 Function exit trace
8 256 General
9 512 Operation management Function entry trace
10 1024 Function exit trace
11 2048 General
The appropriate set of values should be OR'd together and the result written to
the control file. For example:
echo $((1|8|64)) >/sys/module/fscache/parameters/debug
will turn on all function entry debugging.
此差异已折叠。
====================================================
IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
====================================================
By: David Howells <dhowells@redhat.com>
Contents:
(*) Representation
(*) Object management state machine.
- Provision of cpu time.
- Locking simplification.
(*) The set of states.
(*) The set of events.
==============
REPRESENTATION
==============
FS-Cache maintains an in-kernel representation of each object that a netfs is
currently interested in. Such objects are represented by the fscache_cookie
struct and are referred to as cookies.
FS-Cache also maintains a separate in-kernel representation of the objects that
a cache backend is currently actively caching. Such objects are represented by
the fscache_object struct. The cache backends allocate these upon request, and
are expected to embed them in their own representations. These are referred to
as objects.
There is a 1:N relationship between cookies and objects. A cookie may be
represented by multiple objects - an index may exist in more than one cache -
or even by no objects (it may not be cached).
Furthermore, both cookies and objects are hierarchical. The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees
of multiple caches:
NETFS INDEX TREE : CACHE 1 : CACHE 2
: :
: +-----------+ :
+----------->| IObject | :
+-----------+ | : +-----------+ :
| ICookie |-------+ : | :
+-----------+ | : | : +-----------+
| +------------------------------>| IObject |
| : | : +-----------+
| : V : |
| : +-----------+ : |
V +----------->| IObject | : |
+-----------+ | : +-----------+ : |
| ICookie |-------+ : | : V
+-----------+ | : | : +-----------+
| +------------------------------>| IObject |
+-----+-----+ : | : +-----------+
| | : | : |
V | : V : |
+-----------+ | : +-----------+ : |
| ICookie |------------------------->| IObject | : |
+-----------+ | : +-----------+ : |
| V : | : V
| +-----------+ : | : +-----------+
| | ICookie |-------------------------------->| IObject |
| +-----------+ : | : +-----------+
V | : V : |
+-----------+ | : +-----------+ : |
| DCookie |------------------------->| DObject | : |
+-----------+ | : +-----------+ : |
| : : |
+-------+-------+ : : |
| | : : |
V V : : V
+-----------+ +-----------+ : : +-----------+
| DCookie | | DCookie |------------------------>| DObject |
+-----------+ +-----------+ : : +-----------+
: :
In the above illustration, ICookie and IObject represent indices and DCookie
and DObject represent data storage objects. Indices may have representation in
multiple caches, but currently, non-index objects may not. Objects of any type
may also be entirely unrepresented.
As far as the netfs API goes, the netfs is only actually permitted to see
pointers to the cookies. The cookies themselves and any objects attached to
those cookies are hidden from it.
===============================
OBJECT MANAGEMENT STATE MACHINE
===============================
Within FS-Cache, each active object is managed by its own individual state
machine. The state for an object is kept in the fscache_object struct, in
object->state. A cookie may point to a set of objects that are in different
states.
Each state has an action associated with it that is invoked when the machine
wakes up in that state. There are four logical sets of states:
(1) Preparation: states that wait for the parent objects to become ready. The
representations are hierarchical, and it is expected that an object must
be created or accessed with respect to its parent object.
(2) Initialisation: states that perform lookups in the cache and validate
what's found and that create on disk any missing metadata.
(3) Normal running: states that allow netfs operations on objects to proceed
and that update the state of objects.
(4) Termination: states that detach objects from their netfs cookies, that
delete objects from disk, that handle disk and system errors and that free
up in-memory resources.
In most cases, transitioning between states is in response to signalled events.
When a state has finished processing, it will usually set the mask of events in
which it is interested (object->event_mask) and relinquish the worker thread.
Then when an event is raised (by calling fscache_raise_event()), if the event
is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()).
PROVISION OF CPU TIME
---------------------
The work to be done by the various states is given CPU time by the threads of
the slow work facility (see Documentation/slow-work.txt). This is used in
preference to the workqueue facility because:
(1) Threads may be completely occupied for very long periods of time by a
particular work item. These state actions may be doing sequences of
synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
getxattr, truncate, unlink, rmdir, rename).
(2) Threads may do little actual work, but may rather spend a lot of time
sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded
workqueues don't necessarily have the right numbers of threads.
LOCKING SIMPLIFICATION
----------------------
Because only one worker thread may be operating on any particular object's
state machine at once, this simplifies the locking, particularly with respect
to disconnecting the netfs's representation of a cache object (fscache_cookie)
from the cache backend's representation (fscache_object) - which may be
requested from either end.
=================
THE SET OF STATES
=================
The object state machine has a set of states that it can be in. There are
preparation states in which the object sets itself up and waits for its parent
object to transit to a state that allows access to its children:
(1) State FSCACHE_OBJECT_INIT.
Initialise the object and wait for the parent object to become active. In
the cache, it is expected that it will not be possible to look an object
up from the parent object, until that parent object itself has been looked
up.
There are initialisation states in which the object sets itself up and accesses
disk for the object metadata:
(2) State FSCACHE_OBJECT_LOOKING_UP.
Look up the object on disk, using the parent as a starting point.
FS-Cache expects the cache backend to probe the cache to see whether this
object is represented there, and if it is, to see if it's valid (coherency
management).
The cache should call fscache_object_lookup_negative() to indicate lookup
failure for whatever reason, and should call fscache_obtained_object() to
indicate success.
At the completion of lookup, FS-Cache will let the netfs go ahead with
read operations, no matter whether the file is yet cached. If not yet
cached, read operations will be immediately rejected with ENODATA until
the first known page is uncached - as to that point there can be no data
to be read out of the cache for that file that isn't currently also held
in the pagecache.
(3) State FSCACHE_OBJECT_CREATING.
Create an object on disk, using the parent as a starting point. This
happens if the lookup failed to find the object, or if the object's
coherency data indicated what's on disk is out of date. In this state,
FS-Cache expects the cache to create
The cache should call fscache_obtained_object() if creation completes
successfully, fscache_object_lookup_negative() otherwise.
At the completion of creation, FS-Cache will start processing write
operations the netfs has queued for an object. If creation failed, the
write ops will be transparently discarded, and nothing recorded in the
cache.
There are some normal running states in which the object spends its time
servicing netfs requests:
(4) State FSCACHE_OBJECT_AVAILABLE.
A transient state in which pending operations are started, child objects
are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
lookup data is freed.
(5) State FSCACHE_OBJECT_ACTIVE.
The normal running state. In this state, requests the netfs makes will be
passed on to the cache.
(6) State FSCACHE_OBJECT_UPDATING.
The state machine comes here to update the object in the cache from the
netfs's records. This involves updating the auxiliary data that is used
to maintain coherency.
And there are terminal states in which an object cleans itself up, deallocates
memory and potentially deletes stuff from disk:
(7) State FSCACHE_OBJECT_LC_DYING.
The object comes here if it is dying because of a lookup or creation
error. This would be due to a disk error or system error of some sort.
Temporary data is cleaned up, and the parent is released.
(8) State FSCACHE_OBJECT_DYING.
The object comes here if it is dying due to an error, because its parent
cookie has been relinquished by the netfs or because the cache is being
withdrawn.
Any child objects waiting on this one are given CPU time so that they too
can destroy themselves. This object waits for all its children to go away
before advancing to the next state.
(9) State FSCACHE_OBJECT_ABORT_INIT.
The object comes to this state if it was waiting on its parent in
FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself
so that the parent may proceed from the FSCACHE_OBJECT_DYING state.
(10) State FSCACHE_OBJECT_RELEASING.
(11) State FSCACHE_OBJECT_RECYCLING.
The object comes to one of these two states when dying once it is rid of
all its children, if it is dying because the netfs relinquished its
cookie. In the first state, the cached data is expected to persist, and
in the second it will be deleted.
(12) State FSCACHE_OBJECT_WITHDRAWING.
The object transits to this state if the cache decides it wants to
withdraw the object from service, perhaps to make space, but also due to
error or just because the whole cache is being withdrawn.
(13) State FSCACHE_OBJECT_DEAD.
The object transits to this state when the in-memory object record is
ready to be deleted. The object processor shouldn't ever see an object in
this state.
THE SET OF EVENTS
-----------------
There are a number of events that can be raised to an object state machine:
(*) FSCACHE_OBJECT_EV_UPDATE
The netfs requested that an object be updated. The state machine will ask
the cache backend to update the object, and the cache backend will ask the
netfs for details of the change through its cookie definition ops.
(*) FSCACHE_OBJECT_EV_CLEARED
This is signalled in two circumstances:
(a) when an object's last child object is dropped and
(b) when the last operation outstanding on an object is completed.
This is used to proceed from the dying state.
(*) FSCACHE_OBJECT_EV_ERROR
This is signalled when an I/O error occurs during the processing of some
object.
(*) FSCACHE_OBJECT_EV_RELEASE
(*) FSCACHE_OBJECT_EV_RETIRE
These are signalled when the netfs relinquishes a cookie it was using.
The event selected depends on whether the netfs asks for the backing
object to be retired (deleted) or retained.
(*) FSCACHE_OBJECT_EV_WITHDRAW
This is signalled when the cache backend wants to withdraw an object.
This means that the object will have to be detached from the netfs's
cookie.
Because the withdrawing releasing/retiring events are all handled by the object
state machine, it doesn't matter if there's a collision with both ends trying
to sever the connection at the same time. The state machine can just pick
which one it wants to honour, and that effects the other.
此差异已折叠。
此差异已折叠。
...@@ -14,6 +14,11 @@ Options ...@@ -14,6 +14,11 @@ Options
When mounting an ext3 filesystem, the following option are accepted: When mounting an ext3 filesystem, the following option are accepted:
(*) == default (*) == default
ro Mount filesystem read only. Note that ext3 will replay
the journal (and thus write to the partition) even when
mounted "read only". Mount options "ro,noload" can be
used to prevent writes to the filesystem.
journal=update Update the ext3 file system's journal to the current journal=update Update the ext3 file system's journal to the current
format. format.
...@@ -27,7 +32,9 @@ journal_dev=devnum When the external journal device's major/minor numbers ...@@ -27,7 +32,9 @@ journal_dev=devnum When the external journal device's major/minor numbers
identified through its new major/minor numbers encoded identified through its new major/minor numbers encoded
in devnum. in devnum.
noload Don't load the journal on mounting. noload Don't load the journal on mounting. Note that this forces
mount of inconsistent filesystem, which can lead to
various problems.
data=journal All data are committed into the journal prior to being data=journal All data are committed into the journal prior to being
written into the main file system. written into the main file system.
...@@ -92,9 +99,12 @@ nocheck ...@@ -92,9 +99,12 @@ nocheck
debug Extra debugging information is sent to syslog. debug Extra debugging information is sent to syslog.
errors=remount-ro(*) Remount the filesystem read-only on an error. errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error. errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs. errors=panic Panic and halt the machine if an error occurs.
(These mount options override the errors behavior
specified in the superblock, which can be
configured using tune2fs.)
data_err=ignore(*) Just print an error message if an error occurs data_err=ignore(*) Just print an error message if an error occurs
in a file data buffer in ordered mode. in a file data buffer in ordered mode.
......
...@@ -85,7 +85,7 @@ Note: More extensive information for getting started with ext4 can be ...@@ -85,7 +85,7 @@ Note: More extensive information for getting started with ext4 can be
* extent format more robust in face of on-disk corruption due to magics, * extent format more robust in face of on-disk corruption due to magics,
* internal redundancy in tree * internal redundancy in tree
* improved file allocation (multi-block alloc) * improved file allocation (multi-block alloc)
* fix 32000 subdirectory limit * lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time * nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre) * inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature * reduced e2fsck time via uninit_bg feature
...@@ -100,6 +100,9 @@ Note: More extensive information for getting started with ext4 can be ...@@ -100,6 +100,9 @@ Note: More extensive information for getting started with ext4 can be
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force * efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
the ordering) the ordering)
[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.
2.2 Candidate features for future inclusion 2.2 Candidate features for future inclusion
* Online defrag (patches available but not well tested) * Online defrag (patches available but not well tested)
...@@ -180,8 +183,8 @@ commit=nrsec (*) Ext4 can be told to sync all its data and metadata ...@@ -180,8 +183,8 @@ commit=nrsec (*) Ext4 can be told to sync all its data and metadata
performance. performance.
barrier=<0|1(*)> This enables/disables the use of write barriers in barrier=<0|1(*)> This enables/disables the use of write barriers in
the jbd code. barrier=0 disables, barrier=1 enables. barrier(*) the jbd code. barrier=0 disables, barrier=1 enables.
This also requires an IO stack which can support nobarrier This also requires an IO stack which can support
barriers, and if jbd gets an error on a barrier barriers, and if jbd gets an error on a barrier
write, it will disable again with a warning. write, it will disable again with a warning.
Write barriers enforce proper on-disk ordering Write barriers enforce proper on-disk ordering
...@@ -189,6 +192,9 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in ...@@ -189,6 +192,9 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in
safe to use, at some performance penalty. If safe to use, at some performance penalty. If
your disks are battery-backed in one way or another, your disks are battery-backed in one way or another,
disabling barriers may safely improve performance. disabling barriers may safely improve performance.
The mount options "barrier" and "nobarrier" can
also be used to enable or disable barriers, for
consistency with other ext4 mount options.
inode_readahead=n This tuning parameter controls the maximum inode_readahead=n This tuning parameter controls the maximum
number of inode table blocks that ext4's inode number of inode table blocks that ext4's inode
...@@ -310,6 +316,24 @@ journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the ...@@ -310,6 +316,24 @@ journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the
a slightly higher priority than the default I/O a slightly higher priority than the default I/O
priority. priority.
auto_da_alloc(*) Many broken applications don't use fsync() when
noauto_da_alloc replacing existing files via patterns such as
fd = open("foo.new")/write(fd,..)/close(fd)/
rename("foo.new", "foo"), or worse yet,
fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
If auto_da_alloc is enabled, ext4 will detect
the replace-via-rename and replace-via-truncate
patterns and force that any delayed allocation
blocks are allocated such that at the next
journal commit, in the default data=ordered
mode, the data blocks of the new file are forced
to disk before the rename() operation is
commited. This provides roughly the same level
of guarantees as ext3, and avoids the
"zero-length" problem that can happen when a
system crashes before the delayed allocation
blocks are forced to disk.
Data Mode Data Mode
========= =========
There are 3 different data modes: There are 3 different data modes:
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
...@@ -12,6 +12,7 @@ that support it. For example, a given bus might look like this: ...@@ -12,6 +12,7 @@ that support it. For example, a given bus might look like this:
| |-- enable | |-- enable
| |-- irq | |-- irq
| |-- local_cpus | |-- local_cpus
| |-- remove
| |-- resource | |-- resource
| |-- resource0 | |-- resource0
| |-- resource1 | |-- resource1
...@@ -36,6 +37,7 @@ files, each with their own function. ...@@ -36,6 +37,7 @@ files, each with their own function.
enable Whether the device is enabled (ascii, rw) enable Whether the device is enabled (ascii, rw)
irq IRQ number (ascii, ro) irq IRQ number (ascii, ro)
local_cpus nearby CPU mask (cpumask, ro) local_cpus nearby CPU mask (cpumask, ro)
remove remove device from kernel's list (ascii, wo)
resource PCI resource host addresses (ascii, ro) resource PCI resource host addresses (ascii, ro)
resource0..N PCI resource N, if present (binary, mmap) resource0..N PCI resource N, if present (binary, mmap)
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap) resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
...@@ -46,6 +48,7 @@ files, each with their own function. ...@@ -46,6 +48,7 @@ files, each with their own function.
ro - read only file ro - read only file
rw - file is readable and writable rw - file is readable and writable
wo - write only file
mmap - file is mmapable mmap - file is mmapable
ascii - file contains ascii text ascii - file contains ascii text
binary - file contains binary data binary - file contains binary data
...@@ -73,6 +76,13 @@ that the device must be enabled for a rom read to return data succesfully. ...@@ -73,6 +76,13 @@ that the device must be enabled for a rom read to return data succesfully.
In the event a driver is not bound to the device, it can be enabled using the In the event a driver is not bound to the device, it can be enabled using the
'enable' file, documented above. 'enable' file, documented above.
The 'remove' file is used to remove the PCI device, by writing a non-zero
integer to the file. This does not involve any kind of hot-plug functionality,
e.g. powering off the device. The device is removed from the kernel's list of
PCI devices, the sysfs directory for it is removed, and the device will be
removed from any drivers attached to it. Removal of PCI root buses is
disallowed.
Accessing legacy resources through sysfs Accessing legacy resources through sysfs
---------------------------------------- ----------------------------------------
......
...@@ -24,6 +24,8 @@ The following mount options are supported: ...@@ -24,6 +24,8 @@ The following mount options are supported:
gid= Set the default group. gid= Set the default group.
umask= Set the default umask. umask= Set the default umask.
mode= Set the default file permissions.
dmode= Set the default directory permissions.
uid= Set the default user. uid= Set the default user.
bs= Set the block size. bs= Set the block size.
unhide Show otherwise hidden files. unhide Show otherwise hidden files.
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
...@@ -42,7 +42,7 @@ Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qe ...@@ -42,7 +42,7 @@ Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qe
hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg
you can get the firmware's binary in the directory of efi-vfirmware.hg/binaries. you can get the firmware's binary in the directory of efi-vfirmware.hg/binaries.
(3) Rename the firware you owned to Flash.fd, and copy it to /usr/local/share/qemu (3) Rename the firmware you owned to Flash.fd, and copy it to /usr/local/share/qemu
4. Boot up Linux or Windows guests: 4. Boot up Linux or Windows guests:
4.1 Create or install a image for guest boot. If you have xen experience, it should be easy. 4.1 Create or install a image for guest boot. If you have xen experience, it should be easy.
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册