Commit 214e2ca2 authored by Mauro Carvalho Chehab

Merge tag 'v3.7-rc1' into staging/for_v3.8

Linux 3.7-rc1

* tag 'v3.7-rc1': (9579 commits)
  Linux 3.7-rc1
  x86, boot: Explicitly include autoconf.h for hostprogs
  perf: Fix UAPI fallout
  ARM: config: make sure that platforms are ordered by option string
  ARM: config: sort select statements alphanumerically
  UAPI: (Scripted) Disintegrate include/linux/byteorder
  UAPI: (Scripted) Disintegrate include/linux
  UAPI: Unexport linux/blk_types.h
  UAPI: Unexport part of linux/ppp-comp.h
  perf: Handle new rbtree implementation
  procfs: don't need a PATH_MAX allocation to hold a string representation of an int
  vfs: embed struct filename inside of names_cache allocation if possible
  audit: make audit_inode take struct filename
  vfs: make path_openat take a struct filename pointer
  vfs: turn do_path_lookup into wrapper around struct filename variant
  audit: allow audit code to satisfy getname requests from its names_list
  vfs: define struct filename and have getname() return it
  btrfs: Fix compilation with user namespace support enabled
  userns: Fix posix_acl_file_xattr_userns gid conversion
  userns: Properly print bluetooth socket uids
  ...

Too many changes to display.

To preserve performance only 1000 of 1000+ files are displayed.
@@ -14,6 +14,10 @@
 *.o.*
 *.a
 *.s
+*.ko.unsigned
+*.ko.stripped
+*.ko.stripped.dig
+*.ko.stripped.sig
 *.ko
 *.so
 *.so.dbg
@@ -84,3 +88,13 @@ GTAGS
 *.orig
 *~
 \#*#
+#
+# Leavings from module signing
+#
+extra_certificates
+signing_key.priv
+signing_key.x509
+signing_key.x509.keyid
+signing_key.x509.signer
+x509.genkey
@@ -270,8 +270,6 @@ preempt-locking.txt
	- info on locking under a preemptive kernel.
 printk-formats.txt
	- how to get printk format specifiers right
-prio_tree.txt
-	- info on radix-priority-search-tree use for indexing vmas.
 ramoops.txt
	- documentation of the ramoops oops/panic logging module.
 rbtree.txt
......
What: /proc/<pid>/oom_adj
When: August 2012
Why: /proc/<pid>/oom_adj allows userspace to influence the oom killer's
badness heuristic used to determine which task to kill when the kernel
is out of memory.
The badness heuristic has been rewritten since the introduction of
this tunable, so its meaning is now deprecated. The value was
implemented as a bitshift on a score generated by the badness()
function, which did not have any precise units of measure. With the
rewrite, the score is given as a proportion of available memory to the
task allocating pages; a bitshift which grows the score
exponentially is therefore impossible to tune with fine granularity.
A much more powerful interface, /proc/<pid>/oom_score_adj, was
introduced with the oom killer rewrite that allows users to increase or
decrease the badness score linearly. This interface will replace
/proc/<pid>/oom_adj.
A warning will be emitted to the kernel log if an application uses this
deprecated interface. After it is printed once, future warnings will be
suppressed until the kernel is rebooted.
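As a purely illustrative sketch of the replacement interface described above (the target process and the value below are made up, not taken from this commit), the badness score can be biased linearly like this:

    # Bias the current shell's OOM badness score; valid values run from
    # -1000 (never kill) to +1000 (kill first).
    echo 500 > /proc/$$/oom_score_adj
    cat /proc/$$/oom_score_adj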
@@ -12,11 +12,14 @@ Description:
 then closing the file. The new policy takes effect after
 the file ima/policy is closed.
+IMA appraisal, if configured, uses these file measurements
+for local measurement appraisal.
 rule format: action [condition ...]
-action: measure | dont_measure
+action: measure | dont_measure | appraise | dont_appraise | audit
 condition:= base | lsm
-base: [[func=] [mask=] [fsmagic=] [uid=]]
+base: [[func=] [mask=] [fsmagic=] [uid=] [fowner]]
 lsm: [[subj_user=] [subj_role=] [subj_type=]
 [obj_user=] [obj_role=] [obj_type=]]
@@ -24,36 +27,50 @@ Description:
 mask:= [MAY_READ] [MAY_WRITE] [MAY_APPEND] [MAY_EXEC]
 fsmagic:= hex value
 uid:= decimal value
+fowner:=decimal value
 lsm: are LSM specific
 default policy:
 # PROC_SUPER_MAGIC
 dont_measure fsmagic=0x9fa0
+dont_appraise fsmagic=0x9fa0
 # SYSFS_MAGIC
 dont_measure fsmagic=0x62656572
+dont_appraise fsmagic=0x62656572
 # DEBUGFS_MAGIC
 dont_measure fsmagic=0x64626720
+dont_appraise fsmagic=0x64626720
 # TMPFS_MAGIC
 dont_measure fsmagic=0x01021994
+dont_appraise fsmagic=0x01021994
+# RAMFS_MAGIC
+dont_measure fsmagic=0x858458f6
+dont_appraise fsmagic=0x858458f6
 # SECURITYFS_MAGIC
 dont_measure fsmagic=0x73636673
+dont_appraise fsmagic=0x73636673
 measure func=BPRM_CHECK
 measure func=FILE_MMAP mask=MAY_EXEC
 measure func=FILE_CHECK mask=MAY_READ uid=0
+appraise fowner=0
 The default policy measures all executables in bprm_check,
 all files mmapped executable in file_mmap, and all files
-open for read by root in do_filp_open.
+open for read by root in do_filp_open. The default appraisal
+policy appraises all files owned by root.
 Examples of LSM specific definitions:
 SELinux:
 # SELINUX_MAGIC
-dont_measure fsmagic=0xF97CFF8C
+dont_measure fsmagic=0xf97cff8c
+dont_appraise fsmagic=0xf97cff8c
 dont_measure obj_type=var_log_t
+dont_appraise obj_type=var_log_t
 dont_measure obj_type=auditd_log_t
+dont_appraise obj_type=auditd_log_t
 measure subj_user=system_u func=FILE_CHECK mask=MAY_READ
 measure subj_role=system_r func=FILE_CHECK mask=MAY_READ
......
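Illustration only, not part of the commit: assuming securityfs is mounted at /sys/kernel/security, a policy in the format above is loaded by writing the rules through a single open/close of the ima/policy file:

    # Replace the default policy with two rules (one write, one close).
    printf 'dont_appraise fsmagic=0x01021994\nappraise fowner=0\n' \
        > /sys/kernel/security/ima/policy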
@@ -206,3 +206,17 @@ Description:
 when a discarded area is read the discard_zeroes_data
 parameter will be set to one. Otherwise it will be 0 and
 the result of reading a discarded area is undefined.
+What: /sys/block/<disk>/queue/write_same_max_bytes
+Date: January 2012
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Some devices support a write same operation in which a
+single data block can be written to a range of several
+contiguous blocks on storage. This can be used to wipe
+areas on disk or to initialize drives in a RAID
+configuration. write_same_max_bytes indicates how many
+bytes can be written in a single write same command. If
+write_same_max_bytes is 0, write same is not supported
+by the device.
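A quick, hypothetical userspace check of the new attribute (the disk name is an example):

    # 0 means the device does not support the write same command.
    cat /sys/block/sdb/queue/write_same_max_bytes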
@@ -9,19 +9,19 @@ Attributes:
 this value will change the dev_loss_tmo for all
 FCFs discovered by this controller.
-lesb_link_fail: Link Error Status Block (LESB) link failure count.
+lesb/link_fail: Link Error Status Block (LESB) link failure count.
-lesb_vlink_fail: Link Error Status Block (LESB) virtual link
+lesb/vlink_fail: Link Error Status Block (LESB) virtual link
 failure count.
-lesb_miss_fka: Link Error Status Block (LESB) missed FCoE
+lesb/miss_fka: Link Error Status Block (LESB) missed FCoE
 Initialization Protocol (FIP) Keep-Alives (FKA).
-lesb_symb_err: Link Error Status Block (LESB) symbolic error count.
+lesb/symb_err: Link Error Status Block (LESB) symbolic error count.
-lesb_err_block: Link Error Status Block (LESB) block error count.
+lesb/err_block: Link Error Status Block (LESB) block error count.
-lesb_fcs_error: Link Error Status Block (LESB) Fibre Channel
+lesb/fcs_error: Link Error Status Block (LESB) Fibre Channel
 Serivces error count.
 Notes: ctlr_X (global increment starting at 0)
......
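A sketch of how the renamed counters might be read (the controller name and the exact sysfs path are assumptions, not taken from the commit):

    # The LESB counters now live in a lesb/ subdirectory of the FCoE controller.
    cat /sys/bus/fcoe/devices/ctlr_0/lesb/link_fail
    cat /sys/bus/fcoe/devices/ctlr_0/lesb/miss_fka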
@@ -25,6 +25,10 @@ client_id
 The ceph unique client id that was assigned for this specific session.
+features
+A hexadecimal encoding of the feature bits for this image.
 major
 The block device major number.
@@ -33,6 +37,11 @@ name
 The name of the rbd image.
+image_id
+The unique id for the rbd image. (For rbd image format 1
+this is empty.)
 pool
 The name of the storage pool where this rbd image resides.
@@ -57,12 +66,6 @@ current_snap
 The current snapshot for which the device is mapped.
-create_snap
-Create a snapshot:
-$ echo <snap-name> > /sys/bus/rbd/devices/<dev-id>/snap_create
 snap_*
 A directory per each snapshot
@@ -79,4 +82,7 @@ snap_size
 The size of the image when this snapshot was taken.
+snap_features
+A hexadecimal encoding of the feature bits for this snapshot.
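For illustration, assuming an rbd device with id 0 and a snapshot named "snap1", the new attributes read like the existing ones:

    cat /sys/bus/rbd/devices/0/features                 # feature bits of the image
    cat /sys/bus/rbd/devices/0/image_id                 # empty for format 1 images
    cat /sys/bus/rbd/devices/0/snap_snap1/snap_features # feature bits of one snapshot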
@@ -220,3 +220,10 @@ Description:
 If the device doesn't support LTM, the file will read "no".
 The file will be present for all speeds of USB devices, and will
 always read "no" for USB 1.1 and USB 2.0 devices.
+What: /sys/bus/usb/devices/.../(hub interface)/portX
+Date: August 2012
+Contact: Lan Tianyu <tianyu.lan@intel.com>
+Description:
+The /sys/bus/usb/devices/.../(hub interface)/portX
+is usb port device's sysfs directory.
@@ -13,7 +13,7 @@ Description:
 accessory cables have such capability. For example,
 the 30-pin port of Nuri board (/arch/arm/mach-exynos)
 may have both HDMI and Charger attached, or analog audio,
-video, and USB cables attached simulteneously.
+video, and USB cables attached simultaneously.
 If there are cables mutually exclusive with each other,
 such binary relations may be expressed with extcon_dev's
@@ -35,7 +35,7 @@ Description:
 The /sys/class/extcon/.../state shows and stores the cable
 attach/detach information of the corresponding extcon object.
 If the extcon object has an optional callback "show_state"
-defined, the showing function is overriden with the optional
+defined, the showing function is overridden with the optional
 callback.
 If the default callback for showing function is used, the
@@ -46,19 +46,19 @@ Description:
 TA=1
 EAR_JACK=0
 #
-In this example, the extcon device have USB_OTG and TA
+In this example, the extcon device has USB_OTG and TA
 cables attached and HDMI and EAR_JACK cables detached.
 In order to update the state of an extcon device, enter a hex
-state number starting with 0x.
-echo 0xHEX > state
-This updates the whole state of the extcon dev.
+state number starting with 0x:
+# echo 0xHEX > state
+This updates the whole state of the extcon device.
 Inputs of all the methods are required to meet the
-mutually_exclusive contidions if they exist.
+mutually_exclusive conditions if they exist.
 It is recommended to use this "global" state interface if
-you need to enter the value atomically. The later state
+you need to set the value atomically. The later state
 interface associated with each cable cannot update
 multiple cable states of an extcon device simultaneously.
@@ -73,7 +73,7 @@ What: /sys/class/extcon/.../cable.x/state
 Date: February 2012
 Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
 Description:
-The /sys/class/extcon/.../cable.x/name shows and stores the
+The /sys/class/extcon/.../cable.x/state shows and stores the
 state of cable "x" (integer between 0 and 31) of an extcon
 device. The state value is either 0 (detached) or 1
 (attached).
@@ -83,8 +83,8 @@ Date: December 2011
 Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
 Description:
 Shows the relations of mutually exclusiveness. For example,
-if the mutually_exclusive array of extcon_dev is
-{0x3, 0x5, 0xC, 0x0}, the, the output is:
+if the mutually_exclusive array of extcon device is
+{0x3, 0x5, 0xC, 0x0}, then the output is:
 # ls mutually_exclusive/
 0x3
 0x5
......
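A made-up session against the interfaces above (the extcon device name and the cable-to-bit mapping are assumptions):

    cat /sys/class/extcon/extcon0/state          # show the whole state word
    echo 0x5 > /sys/class/extcon/extcon0/state   # attach cables 0 and 2 atomically
    cat /sys/class/extcon/extcon0/cable.0/state  # 0 = detached, 1 = attached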
@@ -349,3 +349,24 @@ Description:
 This will be one of the same strings reported by
 the "state" attribute.
+What: /sys/class/regulator/.../bypass
+Date: September 2012
+KernelVersion: 3.7
+Contact: Mark Brown <broonie@opensource.wolfsonmicro.com>
+Description:
+Some regulator directories will contain a field called
+bypass. This indicates if the device is in bypass mode.
+This will be one of the following strings:
+'enabled'
+'disabled'
+'unknown'
+'enabled' means the regulator is in bypass mode.
+'disabled' means that the regulator is regulating.
+'unknown' means software cannot determine the state, or
+the reported state is invalid.
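A hypothetical read of the new attribute (regulator.0 is an example name):

    # Prints 'enabled' (bypassing), 'disabled' (regulating) or 'unknown'.
    cat /sys/class/regulator/regulator.0/bypass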
What: /sys/devices/.../firmware_node/
Date: September 2012
Contact: <>
Description:
The /sys/devices/.../firmware_node directory contains attributes
allowing the user space to check and modify some firmware
related properties of given device.
What: /sys/devices/.../firmware_node/description
Date: September 2012
Contact: Lance Ortiz <lance.ortiz@hp.com>
Description:
The /sys/devices/.../firmware_node/description attribute contains a string
that describes the device as provided by the _STR method in the ACPI
namespace. This attribute is read-only. If the device does not have
an _STR method associated with it in the ACPI namespace, this
attribute is not present.
@@ -176,3 +176,14 @@ Description: Disable L3 cache indices
 All AMD processors with L3 caches provide this functionality.
 For details, see BKDGs at
 http://developer.amd.com/documentation/guides/Pages/default.aspx
+What: /sys/devices/system/cpu/cpufreq/boost
+Date: August 2012
+Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description: Processor frequency boosting control
+This switch controls the boost setting for the whole system.
+Boosting allows the CPU and the firmware to run at a frequency
+beyond its nominal limit.
+More details can be found in Documentation/cpu-freq/boost.txt
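For example, the switch described above can be inspected and turned off system-wide (a sketch, assuming the platform driver exposes the attribute):

    cat /sys/devices/system/cpu/cpufreq/boost        # 1 = boosting allowed
    echo 0 > /sys/devices/system/cpu/cpufreq/boost   # disable frequency boosting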
What: /sys/devices/pnp0/<bus-num>/ppi/
Date: August 2012
Kernel Version: 3.6
Contact: xiaoyan.zhang@intel.com
Description:
This folder includes the attributes related to PPI (Physical
Presence Interface). It is only meaningful if the BIOS supports a
TPM. The folder path can be found with the command
'find /sys/ -name 'pcrs''. For detailed information on PPI,
please refer to the PPI specification from
http://www.trustedcomputinggroup.org/
What: /sys/devices/pnp0/<bus-num>/ppi/version
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows the version of the PPI supported by the
platform.
This file is readonly.
What: /sys/devices/pnp0/<bus-num>/ppi/request
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows the request for an operation to be
executed in the pre-OS environment. It is the only input from
the OS to the pre-OS environment. The request should be an
integer value ranging from 1 to 160, and 0 means no request.
This file can be read and written.
What: /sys/devices/pnp0/00:<bus-num>/ppi/response
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows the response to the most recent operation
request it acted upon. The format is "<request> <response num>
: <response description>".
This file is readonly.
What: /sys/devices/pnp0/<bus-num>/ppi/transition_action
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows the platform-specific action that should
take place in order to transition to the BIOS for execution of
a requested operation. The format is "<action num>: <action
description>".
This file is readonly.
What: /sys/devices/pnp0/<bus-num>/ppi/tcg_operations
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows whether it is allowed to request an
operation to be executed in the pre-OS environment by the BIOS
for the requests defined by TCG, i.e. requests from 1 to 22.
The format is "<request> <status num>: <status description>".
This attribute is only supported by PPI version 1.2+.
This file is readonly.
What: /sys/devices/pnp0/<bus-num>/ppi/vs_operations
Date: August 2012
Contact: xiaoyan.zhang@intel.com
Description:
This attribute shows whether it is allowed to request an
operation to be executed in the pre-OS environment by the BIOS
for the vendor-specific requests, i.e. requests from 128 to
255. The format is the same as tcg_operations. This attribute
is also only supported by PPI version 1.2+.
This file is readonly.
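A rough walk through the request/response flow above (the bus address and the operation number are placeholders, not taken from the commit):

    ppi=/sys/devices/pnp0/00:04/ppi  # adjust to the path found via 'find /sys/ -name pcrs'
    cat $ppi/version                 # PPI version supported by the platform
    echo 5 > $ppi/request            # queue a TCG-defined operation for the pre-OS phase
    cat $ppi/transition_action       # how to hand control over to the BIOS
    cat $ppi/response                # result of the last acted-upon request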
What: /sys/class/hidraw/hidraw*/device/oled*_img
Date: June 2012
Contact: linux-bluetooth@vger.kernel.org
Description:
The /sys/class/hidraw/hidraw*/device/oled*_img files control
OLED micro displays on the Intuos4 Wireless tablet. The accepted image
has to contain 256 bytes (64x32 px, 1 bit colour). The format
is the same as a PBM image 64x32 px without header (64 bits per
horizontal line, 32 lines). An example of setting OLED No. 0:
dd bs=256 count=1 if=img_file of=[path to oled0_img]/oled0_img
The attribute is write only and no local copy of the image is
stored.
What: /sys/class/hidraw/hidraw*/device/speed
Date: April 2010
Kernel Version: 2.6.35
......
@@ -96,3 +96,16 @@ Contact: "Theodore Ts'o" <tytso@mit.edu>
 Description:
 The maximum number of megabytes the writeback code will
 try to write out before move on to another inode.
+What: /sys/fs/ext4/<disk>/extent_max_zeroout_kb
+Date: August 2012
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+The maximum number of kilobytes which will be zeroed
+out in preference to creating a new uninitialized
+extent when manipulating an inode's extent tree. Note
+that using a larger value will increase the
+variability of time necessary to complete a random
+write operation (since a 4k random write might turn
+into a much larger write due to the zeroout
+operation).
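As an illustrative tuning session (the device name is an example; writing 0 simply disables the zeroout behaviour in favour of splitting extents):

    cat /sys/fs/ext4/sda1/extent_max_zeroout_kb
    echo 0 > /sys/fs/ext4/sda1/extent_max_zeroout_kb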
@@ -19,7 +19,11 @@ Date: September 2010
 Contact: Richard Cochran <richardcochran@gmail.com>
 Description:
 This file contains the name of the PTP hardware clock
-as a human readable string.
+as a human readable string. The purpose of this
+attribute is to provide the user with a "friendly
+name" and to help distinguish PHY based devices from
+MAC based ones. The string does not necessarily have
+to be any kind of unique id.
 What: /sys/class/ptp/ptpN/max_adjustment
 Date: September 2010
......
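For instance (the clock index is assumed), the friendly name is read directly:

    cat /sys/class/ptp/ptp0/clock_name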
@@ -17,3 +17,12 @@ Description:
 device, like 'tty1'.
 The file supports poll() to detect virtual
 console switches.
+What: /sys/class/tty/ttyS0/uartclk
+Date: Sep 2012
+Contact: Tomas Hlavacek <tmshlvck@gmail.com>
+Description:
+Shows the current uartclk value associated with the
+UART port in serial_core that is bound to the TTY, e.g. ttyS0.
+uartclk = 16 * baud_base
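Worked example of the relation above (1843200 is the common 1.8432 MHz PC UART clock, used here only as an example value):

    uartclk=$(cat /sys/class/tty/ttyS0/uartclk)
    echo $((uartclk / 16))    # e.g. 1843200 / 16 = 115200 (baud_base)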
@@ -454,6 +454,16 @@ The preferred style for long (multi-line) comments is:
  * with beginning and ending almost-blank lines.
  */
+For files in net/ and drivers/net/ the preferred style for long (multi-line)
+comments is a little different.
+/* The preferred comment style for files in net/ and drivers/net
+ * looks like this.
+ *
+ * It is nearly the same as the generally preferred comment style,
+ * but there is no initial almost-blank line.
+ */
 It's also important to comment data, whether they are basic types or derived
 types. To this end, use just one data declaration per line (no commas for
 multiple data declarations). This leaves you room for a small comment on each
......
This diff is collapsed.
@@ -1216,8 +1216,6 @@ in this page</entry>
 #define NAND_BBT_LASTBLOCK 0x00000010
 /* The bbt is at the given page, else we must scan for the bbt */
 #define NAND_BBT_ABSPAGE 0x00000020
-/* The bbt is at the given page, else we must scan for the bbt */
-#define NAND_BBT_SEARCH 0x00000040
 /* bbt is stored per chip on multichip devices */
 #define NAND_BBT_PERCHIP 0x00000080
 /* bbt has a version counter at offset veroffs */
......
@@ -310,6 +310,12 @@ over a rather long period of time, but improvements are always welcome!
 code under the influence of preempt_disable(), you instead
 need to use synchronize_irq() or synchronize_sched().
+This same limitation also applies to synchronize_rcu_bh()
+and synchronize_srcu(), as well as to the asynchronous and
+expedited forms of the three primitives, namely call_rcu(),
+call_rcu_bh(), call_srcu(), synchronize_rcu_expedited(),
+synchronize_rcu_bh_expedited(), and synchronize_srcu_expedited().
 12. Any lock acquired by an RCU callback must be acquired elsewhere
 with softirq disabled, e.g., via spin_lock_irqsave(),
 spin_lock_bh(), etc. Failing to disable irq on a given
......
@@ -99,7 +99,7 @@ In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
 printed:
 INFO: rcu_preempt detected stall on CPU
-0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1
+0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
 (t=65000 jiffies)
 The "(64628 ticks this GP)" indicates that this CPU has taken more
@@ -116,13 +116,13 @@ number between the two "/"s is the value of the nesting, which will
 be a small positive number if in the idle loop and a very large positive
 number (as shown above) otherwise.
-For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the
-CPU is not in the process of trying to force itself into dyntick-idle
-state, the "." indicates that the CPU has not given up forcing RCU
-into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1"
-indicates that the CPU has not recented forced RCU into dyntick-idle
-mode (it would otherwise indicate the number of microseconds remaining
-in this forced state).
+For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
+not in the process of trying to force itself into dyntick-idle state, the
+"." indicates that the CPU has not given up forcing RCU into dyntick-idle
+mode (it would be "H" otherwise), and the "timer not pending" indicates
+that the CPU has not recently forced RCU into dyntick-idle mode (it
+would otherwise indicate the number of microseconds remaining in this
+forced state).
 Multiple Warnings From One Stall
......
@@ -333,23 +333,23 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
 The output of "cat rcu/rcu_pending" looks as follows:
 rcu_sched:
-0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
-1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
-2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
-3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
-4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
-5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
-6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
-7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
+0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nn=146741
+1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nn=155792
+2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nn=136629
+3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nn=137723
+4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nn=123110
+5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nn=137456
+6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nn=120834
+7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nn=144888
 rcu_bh:
-0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
-1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
-2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
-3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
-4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
-5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
-6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
-7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
+0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nn=145314
+1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nn=143180
+2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nn=117936
+3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nn=134863
+4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nn=110671
+5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nn=133235
+6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nn=110921
+7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nn=118542
 As always, this is once again split into "rcu_sched" and "rcu_bh"
 portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
@@ -377,17 +377,6 @@ o "gpc" is the number of times that an old grace period had
 o "gps" is the number of times that a new grace period had started,
 but this CPU was not yet aware of it.
-o "nf" is the number of times that this CPU suspected that the
-current grace period had run for too long, and thus needed to
-be forced.
-Please note that "forcing" consists of sending resched IPIs
-to holdout CPUs. If that CPU really still is in an old RCU
-read-side critical section, then we really do have to wait for it.
-The assumption behing "forcing" is that the CPU is not still in
-an old RCU read-side critical section, but has not yet responded
-for some other reason.
 o "nn" is the number of times that this CPU needed nothing. Alert
 readers will note that the rcu "nn" number for a given CPU very
 closely matches the rcu_bh "np" number for that same CPU. This
......
@@ -873,7 +873,7 @@ d. Do you need to treat NMI handlers, hardirq handlers,
 and code segments with preemption disabled (whether
 via preempt_disable(), local_irq_save(), local_bh_disable(),
 or some other mechanism) as if they were explicit RCU readers?
-If so, you need RCU-sched.
+If so, RCU-sched is the only choice that will work for you.
 e. Do you need RCU grace periods to complete even in the face
 of softirq monopolization of one or more of the CPUs? For
@@ -884,7 +884,12 @@ f. Is your workload too update-intensive for normal use of
 RCU, but inappropriate for other synchronization mechanisms?
 If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
-g. Otherwise, use RCU.
+g. Do you need read-side critical sections that are respected
+even though they are in the middle of the idle loop, during
+user-mode execution, or on an offlined CPU? If so, SRCU is the
+only choice that will work for you.
+h. Otherwise, use RCU.
 Of course, this all assumes that you have determined that RCU is in fact
 the right tool for your job.
......
@@ -98,10 +98,9 @@ static int create_nl_socket(int protocol)
	if (rcvbufsz)
		if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
			       &rcvbufsz, sizeof(rcvbufsz)) < 0) {
-			fprintf(stderr, "Unable to set socket rcv buf size "
-				"to %d\n",
+			fprintf(stderr, "Unable to set socket rcv buf size to %d\n",
				rcvbufsz);
-			return -1;
+			goto error;
		}
	memset(&local, 0, sizeof(local));
......
-The EtherDrive (R) HOWTO for users of 2.6 kernels is found at ...
-http://www.coraid.com/SUPPORT/EtherDrive-HBA
-It has many tips and hints!
+ATA over Ethernet is a network protocol that provides simple access to
+block storage on the LAN.
+http://support.coraid.com/documents/AoEr11.txt
+The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...
+http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
+It has many tips and hints! Please see, especially, recommended
+tunings for virtual memory:
+http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
 The aoetools are userland programs that are designed to work with this
 driver. The aoetools are on sourceforge.
@@ -23,20 +31,12 @@ CREATING DEVICE NODES
 There is a udev-install.sh script that shows how to install these
 rules on your system.
-If you are not using udev, two scripts are provided in
-Documentation/aoe as examples of static device node creation for
-using the aoe driver.
-rm -rf /dev/etherd
-sh Documentation/aoe/mkdevs.sh /dev/etherd
-... or to make just one shelf's worth of block device nodes ...
-sh Documentation/aoe/mkshelf.sh /dev/etherd 0
 There is also an autoload script that shows how to edit
 /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
-necessary.
+necessary. Preloading the aoe module is preferable to autoloading,
+however, because AoE discovery takes a few seconds. It can be
+confusing when an AoE device is not present the first time a
+command is run but appears a second later.
 USING DEVICE NODES
@@ -51,9 +51,9 @@ USING DEVICE NODES
 "echo > /dev/etherd/discover" tells the driver to find out what AoE
 devices are available.
-These character devices may disappear and be replaced by sysfs
-counterparts. Using the commands in aoetools insulates users from
-these implementation details.
+In the future these character devices may disappear and be replaced
+by sysfs counterparts. Using the commands in aoetools insulates
+users from these implementation details.
 The block devices are named like this:
@@ -76,8 +76,8 @@ USING SYSFS
 The netif attribute is the network interface on the localhost
 through which we are communicating with the remote AoE device.
-There is a script in this directory that formats this information
-in a convenient way. Users with aoetools can use the aoe-stat
+There is a script in this directory that formats this information in
+a convenient way. Users with aoetools should use the aoe-stat
 command.
 root@makki root# sh Documentation/aoe/status.sh
@@ -121,3 +121,21 @@ DRIVER OPTIONS
 usage example for the module parameter.
 modprobe aoe_iflist="eth1 eth3"
+The aoe_deadsecs module parameter determines the maximum number of
+seconds that the driver will wait for an AoE device to provide a
+response to an AoE command. After aoe_deadsecs seconds have
+elapsed, the AoE device will be marked as "down".
+The aoe_maxout module parameter has a default of 128. This is the
+maximum number of unresponded packets that will be sent to an AoE
+target at one time.
+The aoe_dyndevs module parameter defaults to 1, meaning that the
+driver will assign a block device minor number to a discovered AoE
+target based on the order of its discovery. With dynamic minor
+device numbers in use, a greater range of AoE shelf and slot
+addresses can be supported. Users with udev will never have to
+think about minor numbers. Using aoe_dyndevs=0 allows device nodes
+to be pre-created using a static minor-number scheme with the
+aoe-mkshelf script in the aoetools.
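Putting the options above together in one hypothetical invocation (the values are examples, not recommendations):

    modprobe aoe aoe_iflist="eth1 eth3" aoe_deadsecs=20 aoe_maxout=64 aoe_dyndevs=0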
#!/bin/sh
n_shelves=${n_shelves:-10}
n_partitions=${n_partitions:-16}
if test "$#" != "1"; then
echo "Usage: sh `basename $0` {dir}" 1>&2
echo " n_partitions=16 sh `basename $0` {dir}" 1>&2
exit 1
fi
dir=$1
MAJOR=152
echo "Creating AoE devnode files in $dir ..."
set -e
mkdir -p $dir
# (Status info is in sysfs. See status.sh.)
# rm -f $dir/stat
# mknod -m 0400 $dir/stat c $MAJOR 1
rm -f $dir/err
mknod -m 0400 $dir/err c $MAJOR 2
rm -f $dir/discover
mknod -m 0200 $dir/discover c $MAJOR 3
rm -f $dir/interfaces
mknod -m 0200 $dir/interfaces c $MAJOR 4
rm -f $dir/revalidate
mknod -m 0200 $dir/revalidate c $MAJOR 5
rm -f $dir/flush
mknod -m 0200 $dir/flush c $MAJOR 6
export n_partitions
mkshelf=`echo $0 | sed 's!mkdevs!mkshelf!'`
i=0
while test $i -lt $n_shelves; do
sh -xc "sh $mkshelf $dir $i"
i=`expr $i + 1`
done
#! /bin/sh
if test "$#" != "2"; then
echo "Usage: sh `basename $0` {dir} {shelfaddress}" 1>&2
echo " n_partitions=16 sh `basename $0` {dir} {shelfaddress}" 1>&2
exit 1
fi
n_partitions=${n_partitions:-16}
dir=$1
shelf=$2
nslots=16
maxslot=`echo $nslots 1 - p | dc`
MAJOR=152
set -e
minor=`echo $nslots \* $shelf \* $n_partitions | bc`
endp=`echo $n_partitions - 1 | bc`
for slot in `seq 0 $maxslot`; do
for part in `seq 0 $endp`; do
name=e$shelf.$slot
test "$part" != "0" && name=${name}p$part
rm -f $dir/$name
mknod -m 0660 $dir/$name b $MAJOR $minor
minor=`expr $minor + 1`
done
done
 #! /bin/sh
 # collate and present sysfs information about AoE storage
+#
+# A more complete version of this script is aoe-stat, in the
+# aoetools.
 set -e
 format="%8s\t%8s\t%8s\n"
......
@@ -154,13 +154,33 @@ In either case, the following conditions must be met:
 - CPU mode
 All forms of interrupts must be disabled (IRQs and FIQs)
-The CPU must be in SVC mode. (A special exception exists for Angel)
+For CPUs which do not include the ARM virtualization extensions, the
+CPU must be in SVC mode. (A special exception exists for Angel)
+CPUs which include support for the virtualization extensions can be
+entered in HYP mode in order to enable the kernel to make full use of
+these extensions. This is the recommended boot method for such CPUs,
+unless the virtualisation extensions are already in use by a
+pre-installed hypervisor.
+If the kernel is not entered in HYP mode for any reason, it must be
+entered in SVC mode.
 - Caches, MMUs
 The MMU must be off.
 Instruction cache may be on or off.
 Data cache must be off.
+If the kernel is entered in HYP mode, the above requirements apply to
+the HYP mode configuration in addition to the ordinary PL1 (privileged
+kernel modes) configuration. In addition, all traps into the
+hypervisor must be disabled, and PL1 access must be granted for all
+peripherals and CPU resources for which this is architecturally
+possible. Except for entering in HYP mode, the system configuration
+should be such that a kernel which does not include support for the
+virtualization extensions can boot correctly without extra help.
 - The boot loader is expected to call the kernel image by jumping
 directly to the first instruction of the kernel image.
......
ARM Marvell SoCs
================
This document lists all the ARM Marvell SoCs that are currently
supported in mainline by the Linux kernel. As the Marvell families of
SoCs are large and complex, it is hard to understand where the support
for a particular SoC is available in the Linux kernel. This document
tries to help in understanding where those SoCs are supported, and to
match them with their corresponding public datasheet, when available.
Orion family
------------
Flavors:
88F5082
88F5181
88F5181L
88F5182
Datasheet : http://www.embeddedarm.com/documentation/third-party/MV88F5182-datasheet.pdf
Programmer's User Guide : http://www.embeddedarm.com/documentation/third-party/MV88F5182-opensource-manual.pdf
User Manual : http://www.embeddedarm.com/documentation/third-party/MV88F5182-usermanual.pdf
88F5281
Datasheet : http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
88F6183
Core: Feroceon ARMv5 compatible
Linux kernel mach directory: arch/arm/mach-orion5x
Linux kernel plat directory: arch/arm/plat-orion
Kirkwood family
---------------
Flavors:
88F6282 a.k.a Armada 300
Product Brief : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
88F6283 a.k.a Armada 310
Product Brief : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
88F6190
Product Brief : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6190-003_WEB.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
88F6192
Product Brief : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6192-003_ver1.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
88F6182
88F6180
Product Brief : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6180-003_ver1.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6180_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
88F6281
Product Brief : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6281-004_ver1.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6281_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
Homepage: http://www.marvell.com/embedded-processors/kirkwood/
Core: Feroceon ARMv5 compatible
Linux kernel mach directory: arch/arm/mach-kirkwood
Linux kernel plat directory: arch/arm/plat-orion
Discovery family
----------------
Flavors:
MV78100
Product Brief : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78100-003_WEB.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78100_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
MV78200
Product Brief : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78200-002_WEB.pdf
Hardware Spec : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78200_OpenSource.pdf
Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
MV76100
Not supported by the Linux kernel.
Core: Feroceon ARMv5 compatible
Linux kernel mach directory: arch/arm/mach-mv78xx0
Linux kernel plat directory: arch/arm/plat-orion
EBU Armada family
-----------------
Armada 370 Flavors:
88F6710
88F6707
88F6W11
Armada XP Flavors:
MV78230
MV78260
MV78460
Product Brief: http://www.marvell.com/embedded-processors/armada-xp/assets/Marvell-ArmadaXP-SoC-product%20brief.pdf
No public datasheet available.
Core: Sheeva ARMv7 compatible
Linux kernel mach directory: arch/arm/mach-mvebu
Linux kernel plat directory: none
Avanta family
-------------
Flavors:
88F6510
88F6530P
88F6550
88F6560
Homepage : http://www.marvell.com/broadband/
Product Brief: http://www.marvell.com/broadband/assets/Marvell_Avanta_88F6510_305_060-001_product_brief.pdf
No public datasheet available.
Core: ARMv5 compatible
Linux kernel mach directory: no code in mainline yet, planned for the future
Linux kernel plat directory: no code in mainline yet, planned for the future
Dove family (application processor)
-----------------------------------
Flavors:
88AP510 a.k.a Armada 510
Product Brief : http://www.marvell.com/application-processors/armada-500/assets/Marvell_Armada510_SoC.pdf
Hardware Spec : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Hardware-Spec.pdf
Functional Spec : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Functional-Spec.pdf
Homepage: http://www.marvell.com/application-processors/armada-500/
Core: ARMv7 compatible
Directory: arch/arm/mach-dove
PXA 2xx/3xx/93x/95x family
--------------------------
Flavors:
PXA21x, PXA25x, PXA26x
Application processor only
Core: ARMv5 XScale core
PXA270, PXA271, PXA272
Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_pb.pdf
Design guide : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_design_guide.pdf
Developers manual : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_dev_man.pdf
Specification : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_emts.pdf
Specification update : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_spec_update.pdf
Application processor only
Core: ARMv5 XScale core
PXA300, PXA310, PXA320
PXA 300 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA300_PB_R4.pdf
PXA 310 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA310_PB_R4.pdf
PXA 320 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA320_PB_R4.pdf
Design guide : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Design_Guide.pdf
Developers manual : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Developers_Manual.zip
Specifications : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_EMTS.pdf
Specification Update : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Spec_Update.zip
Reference Manual : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_TavorP_BootROM_Ref_Manual.pdf
Application processor only
Core: ARMv5 XScale core
PXA930, PXA935
Application processor with Communication processor
Core: ARMv5 XScale core
PXA955
Application processor with Communication processor
Core: ARMv7 compatible Sheeva PJ4 core
Comments:
* This line of SoCs originates from the XScale family developed by
Intel and acquired by Marvell in ~2006. The PXA21x, PXA25x,
PXA26x, PXA27x, PXA3xx and PXA93x were developed by Intel, while
the later PXA95x were developed by Marvell.
* Due to their XScale origin, these SoCs have virtually nothing in
common with the other (Kirkwood, Dove, etc.) families of Marvell
SoCs, except with the MMP/MMP2 family of SoCs.
Linux kernel mach directory: arch/arm/mach-pxa
Linux kernel plat directory: arch/arm/plat-pxa
MMP/MMP2 family (communication processor)
-----------------------------------------
Flavors:
PXA168, a.k.a Armada 168
Homepage : http://www.marvell.com/application-processors/armada-100/armada-168.jsp
Product brief : http://www.marvell.com/application-processors/armada-100/assets/pxa_168_pb.pdf
Hardware manual : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_datasheet.pdf
Software manual : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_software_manual.pdf
Specification update : http://www.marvell.com/application-processors/armada-100/assets/ARMADA16x_Spec_update.pdf
Boot ROM manual : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_ref_manual.pdf
App note package : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_app_note_package.pdf
Application processor only
Core: ARMv5 compatible Marvell PJ1 (Mohawk)
PXA910
Homepage : http://www.marvell.com/communication-processors/pxa910/
Product Brief : http://www.marvell.com/communication-processors/pxa910/assets/Marvell_PXA910_Platform-001_PB_final.pdf
Application processor with Communication processor
Core: ARMv5 compatible Marvell PJ1 (Mohawk)
MMP2, a.k.a Armada 610
Product Brief : http://www.marvell.com/application-processors/armada-600/assets/armada610_pb.pdf
Application processor only
Core: ARMv7 compatible Sheeva PJ4 core
Comments:
* This line of SoCs originates from the XScale family developed by
Intel and acquired by Marvell in ~2006. All the processors of
this MMP/MMP2 family were developed by Marvell.
* Due to their XScale origin, these SoCs have virtually nothing in
common with the other (Kirkwood, Dove, etc.) families of Marvell
SoCs, except with the PXA family of SoCs listed above.
Linux kernel mach directory: arch/arm/mach-mmp
Linux kernel plat directory: arch/arm/plat-pxa
Long-term plans
---------------
* Unify the mach-dove/, mach-mv78xx0/, mach-orion5x/ and
mach-kirkwood/ into the mach-mvebu/ to support all SoCs from the
Marvell EBU (Engineering Business Unit) in a single mach-<foo>
directory. The plat-orion/ would therefore disappear.
* Unify the mach-mmp/ and mach-pxa/ into the same mach-pxa
directory. The plat-pxa/ would therefore disappear.
Credits
-------
Maen Suleiman <maen@marvell.com>
Lior Amsalem <alior@marvell.com>
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Andrew Lunn <andrew@lunn.ch>
Nicolas Pitre <nico@fluxnic.net>
Eric Miao <eric.y.miao@gmail.com>
S3C2410 GPIO Control S3C24XX GPIO Control
==================== ====================
Introduction Introduction
...@@ -12,7 +12,7 @@ Introduction ...@@ -12,7 +12,7 @@ Introduction
of the s3c2410 GPIO system, please read the Samsung provided of the s3c2410 GPIO system, please read the Samsung provided
data-sheet/users manual to find out the complete list. data-sheet/users manual to find out the complete list.
See Documentation/arm/Samsung/GPIO.txt for the core implemetation. See Documentation/arm/Samsung/GPIO.txt for the core implementation.
GPIOLIB GPIOLIB
...@@ -41,8 +41,8 @@ GPIOLIB ...@@ -41,8 +41,8 @@ GPIOLIB
GPIOLIB conversion GPIOLIB conversion
------------------ ------------------
If you need to convert your board or driver to use gpiolib from the exiting If you need to convert your board or driver to use gpiolib from the phased
s3c2410 api, then here are some notes on the process. out s3c2410 API, then here are some notes on the process.
1) If your board is exclusively using an GPIO, say to control peripheral 1) If your board is exclusively using an GPIO, say to control peripheral
power, then it will require to claim the gpio with gpio_request() before power, then it will require to claim the gpio with gpio_request() before
...@@ -55,7 +55,7 @@ s3c2410 api, then here are some notes on the process. ...@@ -55,7 +55,7 @@ s3c2410 api, then here are some notes on the process.
as they have the same arguments, and can either take the pin specific as they have the same arguments, and can either take the pin specific
values, or the more generic special-function-number arguments. values, or the more generic special-function-number arguments.
3) s3c2410_gpio_pullup() changs have the problem that whilst the 3) s3c2410_gpio_pullup() changes have the problem that whilst the
s3c2410_gpio_pullup(x, 1) can be easily translated to the s3c2410_gpio_pullup(x, 1) can be easily translated to the
s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0) s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0)
are not so easy. are not so easy.
...@@ -74,7 +74,7 @@ s3c2410 api, then here are some notes on the process. ...@@ -74,7 +74,7 @@ s3c2410 api, then here are some notes on the process.
when using gpio_get_value() on an output pin (s3c2410_gpio_getpin when using gpio_get_value() on an output pin (s3c2410_gpio_getpin
would return the value the pin is supposed to be outputting). would return the value the pin is supposed to be outputting).
6) s3c2410_gpio_getirq() should be directly replacable with the 6) s3c2410_gpio_getirq() should be directly replaceable with the
gpio_to_irq() call. gpio_to_irq() call.
The s3c2410_gpio and gpio_ calls have always operated on the same gpio The s3c2410_gpio and gpio_ calls have always operated on the same gpio
...@@ -105,7 +105,7 @@ PIN Numbers ...@@ -105,7 +105,7 @@ PIN Numbers
----------- -----------
Each pin has an unique number associated with it in regs-gpio.h, Each pin has an unique number associated with it in regs-gpio.h,
eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell e.g. S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
the GPIO functions which pin is to be used. the GPIO functions which pin is to be used.
With the conversion to gpiolib, there is no longer a direct conversion With the conversion to gpiolib, there is no longer a direct conversion
...@@ -120,31 +120,27 @@ Configuring a pin ...@@ -120,31 +120,27 @@ Configuring a pin
The following function allows the configuration of a given pin to The following function allows the configuration of a given pin to
be changed. be changed.
void s3c2410_gpio_cfgpin(unsigned int pin, unsigned int function); void s3c_gpio_cfgpin(unsigned int pin, unsigned int function);
Eg: e.g.:
s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0); s3c_gpio_cfgpin(S3C2410_GPA(0), S3C_GPIO_SFN(1));
s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1); s3c_gpio_cfgpin(S3C2410_GPE(8), S3C_GPIO_SFN(2));
which would turn GPA(0) into the lowest Address line A0, and set which would turn GPA(0) into the lowest Address line A0, and set
GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line. GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
The s3c_gpio_cfgpin() call is a functional replacement for this call.
Reading the current configuration Reading the current configuration
--------------------------------- ---------------------------------
The current configuration of a pin can be read by using: The current configuration of a pin can be read by using standard
gpiolib function:
s3c2410_gpio_getcfg(unsigned int pin); s3c_gpio_getcfg(unsigned int pin);
The return value will be from the same set of values which can be The return value will be from the same set of values which can be
passed to s3c2410_gpio_cfgpin(). passed to s3c_gpio_cfgpin().
The s3c_gpio_getcfg() call should be a functional replacement for
this call.
Configuring a pull-up resistor Configuring a pull-up resistor
...@@ -154,61 +150,33 @@ Configuring a pull-up resistor ...@@ -154,61 +150,33 @@ Configuring a pull-up resistor
pull-up resistors enabled. This can be configured by the following pull-up resistors enabled. This can be configured by the following
function: function:
void s3c2410_gpio_pullup(unsigned int pin, unsigned int to); void s3c_gpio_setpull(unsigned int pin, unsigned int to);
Where the to value is zero to set the pull-up off, and 1 to enable
the specified pull-up. Any other values are currently undefined.
The s3c_gpio_setpull() offers similar functionality, but with the
ability to encode whether the pull is up or down. Currently there
is no 'just on' state, so up or down must be selected.
Getting the state of a PIN
--------------------------
The state of a pin can be read by using the function:
unsigned int s3c2410_gpio_getpin(unsigned int pin);
This will return either zero or non-zero. Do not count on this Where the to value is S3C_GPIO_PULL_NONE to set the pull-up off,
function returning 1 if the pin is set. and S3C_GPIO_PULL_UP to enable the specified pull-up. Any other
values are currently undefined.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib.
Setting the state of a PIN
--------------------------
The value an pin is outputing can be modified by using the following:
void s3c2410_gpio_setpin(unsigned int pin, unsigned int to); Getting and setting the state of a PIN
--------------------------------------
Which sets the given pin to the value. Use 0 to write 0, and 1 to These calls are now implemented by the relevant gpiolib calls, convert
set the output to 1.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib. your board or driver to use gpiolib.
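As a rough illustration of the gpiolib conversion described above, the
following is a hypothetical driver-side sketch (the pin, label and function
name are made up for the example; this is not code from the kernel tree):

  #include <linux/gpio.h>
  #include <linux/printk.h>

  /* Sketch of replacing s3c2410_gpio_setpin()/s3c2410_gpio_getpin()
   * with the generic gpiolib calls. */
  static int example_claim_pin(void)
  {
          int err;

          err = gpio_request(S3C2410_GPB(5), "example-pin");
          if (err)
                  return err;

          /* was: s3c2410_gpio_setpin(S3C2410_GPB(5), 1); */
          gpio_direction_output(S3C2410_GPB(5), 1);

          /* was: s3c2410_gpio_getpin(S3C2410_GPB(5)); */
          pr_info("pin value: %d\n", gpio_get_value(S3C2410_GPB(5)));

          gpio_free(S3C2410_GPB(5));
          return 0;
  }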
Getting the IRQ number associated with a PIN Getting the IRQ number associated with a PIN
-------------------------------------------- --------------------------------------------
The following function can map the given pin number to an IRQ A standard gpiolib function can map the given pin number to an IRQ
number to pass to the IRQ system. number to pass to the IRQ system.
int s3c2410_gpio_getirq(unsigned int pin); int gpio_to_irq(unsigned int pin);
Note, not all pins have an IRQ. Note, not all pins have an IRQ.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib.
Authour Author
------- -------
Ben Dooks, 03 October 2004 Ben Dooks, 03 October 2004
Copyright 2004 Ben Dooks, Simtec Electronics Copyright 2004 Ben Dooks, Simtec Electronics
...@@ -5,14 +5,14 @@ Introduction ...@@ -5,14 +5,14 @@ Introduction
------------ ------------
This outlines the Samsung GPIO implementation and the architecture This outlines the Samsung GPIO implementation and the architecture
specific calls provided alongisde the drivers/gpio core. specific calls provided alongside the drivers/gpio core.
S3C24XX (Legacy) S3C24XX (Legacy)
---------------- ----------------
See Documentation/arm/Samsung-S3C24XX/GPIO.txt for more information See Documentation/arm/Samsung-S3C24XX/GPIO.txt for more information
about these devices. Their implementation is being brought into line about these devices. Their implementation has been brought into line
with the core samsung implementation described in this document. with the core samsung implementation described in this document.
...@@ -29,7 +29,7 @@ GPIO numbering is synchronised between the Samsung and gpiolib system. ...@@ -29,7 +29,7 @@ GPIO numbering is synchronised between the Samsung and gpiolib system.
PIN configuration PIN configuration
----------------- -----------------
Pin configuration is specific to the Samsung architecutre, with each SoC Pin configuration is specific to the Samsung architecture, with each SoC
registering the necessary information for the core gpio configuration registering the necessary information for the core gpio configuration
implementation to configure pins as necessary. implementation to configure pins as necessary.
...@@ -38,5 +38,3 @@ driver or machine to change gpio configuration. ...@@ -38,5 +38,3 @@ driver or machine to change gpio configuration.
See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information
on these functions. on these functions.
...@@ -51,6 +51,9 @@ ffc00000 ffefffff DMA memory mapping region. Memory returned ...@@ -51,6 +51,9 @@ ffc00000 ffefffff DMA memory mapping region. Memory returned
ff000000 ffbfffff Reserved for future expansion of DMA ff000000 ffbfffff Reserved for future expansion of DMA
mapping region. mapping region.
fee00000 feffffff Mapping of PCI I/O space. This is a static
mapping within the vmalloc space.
VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space.
Memory returned by vmalloc/ioremap will Memory returned by vmalloc/ioremap will
be dynamically placed in this region. be dynamically placed in this region.
......
Booting AArch64 Linux
=====================
Author: Will Deacon <will.deacon@arm.com>
Date : 07 September 2012
This document is based on the ARM booting document by Russell King and
is relevant to all public releases of the AArch64 Linux kernel.
The AArch64 exception model is made up of a number of exception levels
(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
counterpart. EL2 is the hypervisor level and exists only in non-secure
mode. EL3 is the highest priority level and exists only in secure mode.
For the purposes of this document, we will use the term `boot loader'
simply to define all software that executes on the CPU(s) before control
is passed to the Linux kernel. This may include secure monitor and
hypervisor code, or it may just be a handful of instructions for
preparing a minimal boot environment.
Essentially, the boot loader should provide (as a minimum) the
following:
1. Setup and initialise the RAM
2. Setup the device tree
3. Decompress the kernel image
4. Call the kernel image
1. Setup and initialise RAM
---------------------------
Requirement: MANDATORY
The boot loader is expected to find and initialise all RAM that the
kernel will use for volatile data storage in the system. It performs
this in a machine dependent manner. (It may use internal algorithms
to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)
2. Setup the device tree
-------------------------
Requirement: MANDATORY
The device tree blob (dtb) must be no bigger than 2 megabytes in size
and placed at a 2-megabyte boundary within the first 512 megabytes from
the start of the kernel image. This is to allow the kernel to map the
blob using a single section mapping in the initial page tables.
3. Decompress the kernel image
------------------------------
Requirement: OPTIONAL
The AArch64 kernel does not currently provide a decompressor and
therefore requires decompression (gzip etc.) to be performed by the boot
loader if a compressed Image target (e.g. Image.gz) is used. For
bootloaders that do not implement this requirement, the uncompressed
Image target is available instead.
4. Call the kernel image
------------------------
Requirement: MANDATORY
The decompressed kernel image contains a 32-byte header as follows:
u32 magic = 0x14000008; /* branch to stext, little-endian */
u32 res0 = 0; /* reserved */
u64 text_offset; /* Image load offset */
u64 res1 = 0; /* reserved */
u64 res2 = 0; /* reserved */
The image must be placed at the specified offset (currently 0x80000)
from the start of the system RAM and called there. The start of the
system RAM must be aligned to 2MB.
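For reference, the 32-byte header listed above can be expressed as a C
structure. This is only an illustrative sketch mirroring the field list in
this document; the kernel emits these values from its assembly head code,
not from such a struct.

  #include <stdint.h>

  /* Illustrative layout of the 32-byte AArch64 image header. */
  struct aarch64_image_header {
          uint32_t magic;         /* 0x14000008: "b stext", little-endian */
          uint32_t res0;          /* reserved, must be 0 */
          uint64_t text_offset;   /* image load offset from start of RAM */
          uint64_t res1;          /* reserved, must be 0 */
          uint64_t res2;          /* reserved, must be 0 */
  };

A boot loader could, for instance, sanity-check that magic reads 0x14000008
and then copy the image to start_of_ram + text_offset before branching to it.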
Before jumping into the kernel, the following conditions must be met:
- Quiesce all DMA capable devices so that memory does not get
corrupted by bogus network packets or disk data. This will save
you many hours of debug.
- Primary CPU general-purpose register settings
x0 = physical address of device tree blob (dtb) in system RAM.
x1 = 0 (reserved for future use)
x2 = 0 (reserved for future use)
x3 = 0 (reserved for future use)
- CPU mode
All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
IRQ and FIQ).
The CPU must be in either EL2 (RECOMMENDED in order to have access to
the virtualisation extensions) or non-secure EL1.
- Caches, MMUs
The MMU must be off.
Instruction cache may be on or off.
Data cache must be off and invalidated.
External caches (if present) must be configured and disabled.
- Architected timers
CNTFRQ must be programmed with the timer frequency.
If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0)
set where available.
- Coherency
All CPUs to be booted by the kernel must be part of the same coherency
domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
initialisation to enable the receiving of maintenance operations on
each CPU.
- System registers
All writable architected system registers at the exception level where
the kernel image will be entered must be initialised by software at a
higher exception level to prevent execution in an UNKNOWN state.
The boot loader is expected to enter the kernel on each CPU in the
following manner:
- The primary CPU must jump directly to the first instruction of the
kernel image. The device tree blob passed by this CPU must contain
for each CPU node:
1. An 'enable-method' property. Currently, the only supported value
for this field is the string "spin-table".
2. A 'cpu-release-addr' property identifying a 64-bit,
zero-initialised memory location.
It is expected that the bootloader will generate these device tree
properties and insert them into the blob prior to kernel entry.
- Any secondary CPUs must spin outside of the kernel in a reserved area
of memory (communicated to the kernel by a /memreserve/ region in the
device tree) polling their cpu-release-addr location, which must be
contained in the reserved region. A wfe instruction may be inserted
to reduce the overhead of the busy-loop and a sev will be issued by
the primary CPU. When a read of the location pointed to by the
cpu-release-addr returns a non-zero value, the CPU must jump directly
to this value. (A short, hypothetical sketch of this holding-pen loop is
shown after the register list below.)
- Secondary CPU general-purpose register settings
x0 = 0 (reserved for future use)
x1 = 0 (reserved for future use)
x2 = 0 (reserved for future use)
x3 = 0 (reserved for future use)
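Referring back to the spin-table protocol above, the holding pen of a
secondary CPU can be pictured as the loop below. This is only a sketch (real
holding pens live in firmware or boot-loader code, usually in assembly) and
the function and parameter names are invented for the example.

  #include <stdint.h>

  /* Hypothetical holding-pen loop: poll the cpu-release-addr location,
   * which must lie inside the /memreserve/ region passed to the kernel. */
  static void secondary_holding_pen(volatile uint64_t *cpu_release_addr)
  {
          uint64_t entry;

          /* Sleep in wfe until the primary CPU stores the entry point and
           * issues a sev; re-read the location after every wakeup. */
          while ((entry = *cpu_release_addr) == 0)
                  __asm__ volatile("wfe" ::: "memory");

          /* Jump directly to the value read, as required above. */
          ((void (*)(void))(uintptr_t)entry)();
  }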
Memory Layout on AArch64 Linux
==============================
Author: Catalin Marinas <catalin.marinas@arm.com>
Date : 20 February 2012
This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.
AArch64 Linux uses 3 levels of translation tables with the 4KB page
configuration, allowing 39-bit (512GB) virtual addresses for both user
and kernel. With 64KB pages, only 2 levels of translation tables are
used but the memory layout is the same.
User addresses have bits 63:39 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
mappings while the user pgd contains only user (non-global) mappings.
The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.
AArch64 Linux memory layout:
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000007fffffffff 512GB user
ffffff8000000000 ffffffbbfffcffff ~240GB vmalloc
ffffffbbfffd0000 ffffffbbfffdffff 64KB [guard page]
ffffffbbfffe0000 ffffffbbfffeffff 64KB PCI I/O space
ffffffbbffff0000 ffffffbbffffffff 64KB [guard page]
ffffffbc00000000 ffffffbdffffffff 8GB vmemmap
ffffffbe00000000 ffffffbffbffffff ~8GB [guard, future vmemmap]
ffffffbffc000000 ffffffbfffffffff 64MB modules
ffffffc000000000 ffffffffffffffff 256GB memory
Translation table lookup with 4KB pages:
+--------+--------+--------+--------+--------+--------+--------+--------+
|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0|
+--------+--------+--------+--------+--------+--------+--------+--------+
| | | | | |
| | | | | v
| | | | | [11:0] in-page offset
| | | | +-> [20:12] L3 index
| | | +-----------> [29:21] L2 index
| | +---------------------> [38:30] L1 index
| +-------------------------------> [47:39] L0 index (not used)
+-------------------------------------------------> [63] TTBR0/1
Translation table lookup with 64KB pages:
+--------+--------+--------+--------+--------+--------+--------+--------+
|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0|
+--------+--------+--------+--------+--------+--------+--------+--------+
| | | | |
| | | | v
| | | | [15:0] in-page offset
| | | +----------> [28:16] L3 index
| | +--------------------------> [41:29] L2 index (only 38:29 used)
| +-------------------------------> [47:42] L1 index (not used)
+-------------------------------------------------> [63] TTBR0/1
...@@ -465,7 +465,6 @@ struct bio { ...@@ -465,7 +465,6 @@ struct bio {
bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ bio_end_io_t *bi_end_io; /* bi_end_io (bio) */
atomic_t bi_cnt; /* pin count: free when it hits zero */ atomic_t bi_cnt; /* pin count: free when it hits zero */
void *bi_private; void *bi_private;
bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
}; };
With this multipage bio design: With this multipage bio design:
...@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs, ...@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
given size from these slabs. given size from these slabs.
The bi_destructor() routine takes into account the possibility of the bio
having originated from a different source (see later discussions on
n/w to block transfers and kvec_cb)
The bio_get() routine may be used to hold an extra reference on a bio prior The bio_get() routine may be used to hold an extra reference on a bio prior
to i/o submission, if the bio fields are likely to be accessed after the to i/o submission, if the bio fields are likely to be accessed after the
i/o is issued (since the bio may otherwise get freed in case i/o completion i/o is issued (since the bio may otherwise get freed in case i/o completion
......
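As a hedged kernel-side sketch of the bio_get() usage mentioned above (the
function names example_end_io/example_submit are invented, the interfaces
are the 3.x-era ones, and error handling is omitted):

  #include <linux/bio.h>
  #include <linux/fs.h>
  #include <linux/mm.h>

  static void example_end_io(struct bio *bio, int error)
  {
          bio_put(bio);           /* release the submitter's reference */
  }

  /* Hold an extra reference so the bio may still be examined after
   * submission, even if completion runs immediately. */
  static void example_submit(struct block_device *bdev, struct page *page)
  {
          struct bio *bio = bio_alloc(GFP_NOIO, 1);

          bio->bi_bdev = bdev;
          bio->bi_sector = 0;
          bio->bi_end_io = example_end_io;
          bio_add_page(bio, page, PAGE_SIZE, 0);

          bio_get(bio);                   /* extra reference across submission */
          submit_bio(READ, bio);

          /* bio fields may safely be inspected here because of bio_get() */
          bio_put(bio);                   /* drop the extra reference */
  }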
...@@ -29,7 +29,8 @@ CONTENTS: ...@@ -29,7 +29,8 @@ CONTENTS:
3.1 Overview 3.1 Overview
3.2 Synchronization 3.2 Synchronization
3.3 Subsystem API 3.3 Subsystem API
4. Questions 4. Extended attributes usage
5. Questions
1. Control Groups 1. Control Groups
================= =================
...@@ -62,9 +63,9 @@ an instance of the cgroup virtual filesystem associated with it. ...@@ -62,9 +63,9 @@ an instance of the cgroup virtual filesystem associated with it.
At any one time there may be multiple active hierarchies of task At any one time there may be multiple active hierarchies of task
cgroups. Each hierarchy is a partition of all tasks in the system. cgroups. Each hierarchy is a partition of all tasks in the system.
User level code may create and destroy cgroups by name in an User-level code may create and destroy cgroups by name in an
instance of the cgroup virtual file system, specify and query to instance of the cgroup virtual file system, specify and query to
which cgroup a task is assigned, and list the task pids assigned to which cgroup a task is assigned, and list the task PIDs assigned to
a cgroup. Those creations and assignments only affect the hierarchy a cgroup. Those creations and assignments only affect the hierarchy
associated with that instance of the cgroup file system. associated with that instance of the cgroup file system.
...@@ -72,7 +73,7 @@ On their own, the only use for cgroups is for simple job ...@@ -72,7 +73,7 @@ On their own, the only use for cgroups is for simple job
tracking. The intention is that other subsystems hook into the generic tracking. The intention is that other subsystems hook into the generic
cgroup support to provide new attributes for cgroups, such as cgroup support to provide new attributes for cgroups, such as
accounting/limiting the resources which processes in a cgroup can accounting/limiting the resources which processes in a cgroup can
access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allows access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allow
you to associate a set of CPUs and a set of memory nodes with the you to associate a set of CPUs and a set of memory nodes with the
tasks in each cgroup. tasks in each cgroup.
...@@ -80,11 +81,11 @@ tasks in each cgroup. ...@@ -80,11 +81,11 @@ tasks in each cgroup.
---------------------------- ----------------------------
There are multiple efforts to provide process aggregations in the There are multiple efforts to provide process aggregations in the
Linux kernel, mainly for resource tracking purposes. Such efforts Linux kernel, mainly for resource-tracking purposes. Such efforts
include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
namespaces. These all require the basic notion of a namespaces. These all require the basic notion of a
grouping/partitioning of processes, with newly forked processes ending grouping/partitioning of processes, with newly forked processes ending
in the same group (cgroup) as their parent process. up in the same group (cgroup) as their parent process.
The kernel cgroup patch provides the minimum essential kernel The kernel cgroup patch provides the minimum essential kernel
mechanisms required to efficiently implement such groups. It has mechanisms required to efficiently implement such groups. It has
...@@ -127,14 +128,14 @@ following lines: ...@@ -127,14 +128,14 @@ following lines:
/ \ / \
Professors (15%) students (5%) Professors (15%) students (5%)
Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
into NFS network class. into the NFS network class.
At the same time Firefox/Lynx will share an appropriate CPU/Memory class At the same time Firefox/Lynx will share an appropriate CPU/Memory class
depending on who launched it (prof/student). depending on who launched it (prof/student).
With the ability to classify tasks differently for different resources With the ability to classify tasks differently for different resources
(by putting those resource subsystems in different hierarchies) then (by putting those resource subsystems in different hierarchies),
the admin can easily set up a script which receives exec notifications the admin can easily set up a script which receives exec notifications
and depending on who is launching the browser he can and depending on who is launching the browser he can
...@@ -145,19 +146,19 @@ a separate cgroup for every browser launched and associate it with ...@@ -145,19 +146,19 @@ a separate cgroup for every browser launched and associate it with
appropriate network and other resource class. This may lead to appropriate network and other resource class. This may lead to
proliferation of such cgroups. proliferation of such cgroups.
Also lets say that the administrator would like to give enhanced network Also let's say that the administrator would like to give enhanced network
access temporarily to a student's browser (since it is night and the user access temporarily to a student's browser (since it is night and the user
wants to do online gaming :)) OR give one of the students simulation wants to do online gaming :)) OR give one of the student's simulation
apps enhanced CPU power, apps enhanced CPU power.
With ability to write pids directly to resource classes, it's just a With ability to write PIDs directly to resource classes, it's just a
matter of : matter of:
# echo pid > /sys/fs/cgroup/network/<new_class>/tasks # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
(after some time) (after some time)
# echo pid > /sys/fs/cgroup/network/<orig_class>/tasks # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
Without this ability, he would have to split the cgroup into Without this ability, the administrator would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the multiple separate ones and then associate the new cgroups with the
new resource classes. new resource classes.
...@@ -184,20 +185,20 @@ Control Groups extends the kernel as follows: ...@@ -184,20 +185,20 @@ Control Groups extends the kernel as follows:
field of each task_struct using the css_set, anchored at field of each task_struct using the css_set, anchored at
css_set->tasks. css_set->tasks.
- A cgroup hierarchy filesystem can be mounted for browsing and - A cgroup hierarchy filesystem can be mounted for browsing and
manipulation from user space. manipulation from user space.
- You can list all the tasks (by pid) attached to any cgroup. - You can list all the tasks (by PID) attached to any cgroup.
The implementation of cgroups requires a few, simple hooks The implementation of cgroups requires a few, simple hooks
into the rest of the kernel, none in performance critical paths: into the rest of the kernel, none in performance-critical paths:
- in init/main.c, to initialize the root cgroups and initial - in init/main.c, to initialize the root cgroups and initial
css_set at system boot. css_set at system boot.
- in fork and exit, to attach and detach a task from its css_set. - in fork and exit, to attach and detach a task from its css_set.
In addition a new file system, of type "cgroup" may be mounted, to In addition, a new file system of type "cgroup" may be mounted, to
enable browsing and modifying the cgroups presently known to the enable browsing and modifying the cgroups presently known to the
kernel. When mounting a cgroup hierarchy, you may specify a kernel. When mounting a cgroup hierarchy, you may specify a
comma-separated list of subsystems to mount as the filesystem mount comma-separated list of subsystems to mount as the filesystem mount
...@@ -230,13 +231,13 @@ as the path relative to the root of the cgroup file system. ...@@ -230,13 +231,13 @@ as the path relative to the root of the cgroup file system.
Each cgroup is represented by a directory in the cgroup file system Each cgroup is represented by a directory in the cgroup file system
containing the following files describing that cgroup: containing the following files describing that cgroup:
- tasks: list of tasks (by pid) attached to that cgroup. This list - tasks: list of tasks (by PID) attached to that cgroup. This list
is not guaranteed to be sorted. Writing a thread id into this file is not guaranteed to be sorted. Writing a thread ID into this file
moves the thread into this cgroup. moves the thread into this cgroup.
- cgroup.procs: list of tgids in the cgroup. This list is not - cgroup.procs: list of thread group IDs in the cgroup. This list is
guaranteed to be sorted or free of duplicate tgids, and userspace not guaranteed to be sorted or free of duplicate TGIDs, and userspace
should sort/uniquify the list if this property is required. should sort/uniquify the list if this property is required.
Writing a thread group id into this file moves all threads in that Writing a thread group ID into this file moves all threads in that
group into this cgroup. group into this cgroup.
- notify_on_release flag: run the release agent on exit? - notify_on_release flag: run the release agent on exit?
- release_agent: the path to use for release notifications (this file - release_agent: the path to use for release notifications (this file
...@@ -261,7 +262,7 @@ cgroup file system directories. ...@@ -261,7 +262,7 @@ cgroup file system directories.
When a task is moved from one cgroup to another, it gets a new When a task is moved from one cgroup to another, it gets a new
css_set pointer - if there's an already existing css_set with the css_set pointer - if there's an already existing css_set with the
desired collection of cgroups then that group is reused, else a new desired collection of cgroups then that group is reused, otherwise a new
css_set is allocated. The appropriate existing css_set is located by css_set is allocated. The appropriate existing css_set is located by
looking into a hash table. looking into a hash table.
...@@ -292,7 +293,7 @@ file system) of the abandoned cgroup. This enables automatic ...@@ -292,7 +293,7 @@ file system) of the abandoned cgroup. This enables automatic
removal of abandoned cgroups. The default value of removal of abandoned cgroups. The default value of
notify_on_release in the root cgroup at system boot is disabled notify_on_release in the root cgroup at system boot is disabled
(0). The default value of other cgroups at creation is the current (0). The default value of other cgroups at creation is the current
value of their parents notify_on_release setting. The default value of value of their parents' notify_on_release settings. The default value of
a cgroup hierarchy's release_agent path is empty. a cgroup hierarchy's release_agent path is empty.
1.5 What does clone_children do ? 1.5 What does clone_children do ?
...@@ -316,7 +317,7 @@ the "cpuset" cgroup subsystem, the steps are something like: ...@@ -316,7 +317,7 @@ the "cpuset" cgroup subsystem, the steps are something like:
4) Create the new cgroup by doing mkdir's and write's (or echo's) in 4) Create the new cgroup by doing mkdir's and write's (or echo's) in
the /sys/fs/cgroup virtual file system. the /sys/fs/cgroup virtual file system.
5) Start a task that will be the "founding father" of the new job. 5) Start a task that will be the "founding father" of the new job.
6) Attach that task to the new cgroup by writing its pid to the 6) Attach that task to the new cgroup by writing its PID to the
/sys/fs/cgroup/cpuset/tasks file for that cgroup. /sys/fs/cgroup/cpuset/tasks file for that cgroup.
7) fork, exec or clone the job tasks from this founding father task. 7) fork, exec or clone the job tasks from this founding father task.
...@@ -344,7 +345,7 @@ and then start a subshell 'sh' in that cgroup: ...@@ -344,7 +345,7 @@ and then start a subshell 'sh' in that cgroup:
2.1 Basic Usage 2.1 Basic Usage
--------------- ---------------
Creating, modifying, using the cgroups can be done through the cgroup Creating, modifying, using cgroups can be done through the cgroup
virtual filesystem. virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type: To mount a cgroup hierarchy with all available subsystems, type:
...@@ -441,7 +442,7 @@ You can attach the current shell task by echoing 0: ...@@ -441,7 +442,7 @@ You can attach the current shell task by echoing 0:
# echo 0 > tasks # echo 0 > tasks
You can use the cgroup.procs file instead of the tasks file to move all You can use the cgroup.procs file instead of the tasks file to move all
threads in a threadgroup at once. Echoing the pid of any task in a threads in a threadgroup at once. Echoing the PID of any task in a
threadgroup to cgroup.procs causes all tasks in that threadgroup to be threadgroup to cgroup.procs causes all tasks in that threadgroup to be
be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
in the writing task's threadgroup. in the writing task's threadgroup.
...@@ -479,7 +480,7 @@ in /proc/mounts and /proc/<pid>/cgroups. ...@@ -479,7 +480,7 @@ in /proc/mounts and /proc/<pid>/cgroups.
There is mechanism which allows to get notifications about changing There is mechanism which allows to get notifications about changing
status of a cgroup. status of a cgroup.
To register new notification handler you need: To register a new notification handler you need to:
- create a file descriptor for event notification using eventfd(2); - create a file descriptor for event notification using eventfd(2);
- open a control file to be monitored (e.g. memory.usage_in_bytes); - open a control file to be monitored (e.g. memory.usage_in_bytes);
- write "<event_fd> <control_fd> <args>" to cgroup.event_control. - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
...@@ -488,7 +489,7 @@ To register new notification handler you need: ...@@ -488,7 +489,7 @@ To register new notification handler you need:
eventfd will be woken up by control file implementation or when the eventfd will be woken up by control file implementation or when the
cgroup is removed. cgroup is removed.
To unregister notification handler just close eventfd. To unregister a notification handler just close eventfd.
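As a minimal userspace sketch of this registration sequence (the cgroup path
below is an assumption, and memory.oom_control is used as the monitored file
simply because it needs no extra <args>):

  #include <sys/eventfd.h>
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          const char *cg = "/sys/fs/cgroup/memory/mygroup";  /* hypothetical */
          char path[256], buf[64];
          uint64_t count;
          int efd, cfd, ecfd;

          efd = eventfd(0, 0);                    /* event notification fd */

          snprintf(path, sizeof(path), "%s/memory.oom_control", cg);
          cfd = open(path, O_RDONLY);             /* control file to monitor */

          snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
          ecfd = open(path, O_WRONLY);

          if (efd < 0 || cfd < 0 || ecfd < 0)
                  return 1;

          /* "<event_fd> <control_fd> <args>" -- no args for oom_control */
          snprintf(buf, sizeof(buf), "%d %d", efd, cfd);
          if (write(ecfd, buf, strlen(buf)) < 0)
                  return 1;

          read(efd, &count, sizeof(count));       /* blocks until an event */
          printf("got %llu event(s)\n", (unsigned long long)count);

          close(efd);     /* closing the eventfd unregisters the handler */
          return 0;
  }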
NOTE: Support of notifications should be implemented for the control NOTE: Support of notifications should be implemented for the control
file. See documentation for the subsystem. file. See documentation for the subsystem.
...@@ -502,7 +503,7 @@ file. See documentation for the subsystem. ...@@ -502,7 +503,7 @@ file. See documentation for the subsystem.
Each kernel subsystem that wants to hook into the generic cgroup Each kernel subsystem that wants to hook into the generic cgroup
system needs to create a cgroup_subsys object. This contains system needs to create a cgroup_subsys object. This contains
various methods, which are callbacks from the cgroup system, along various methods, which are callbacks from the cgroup system, along
with a subsystem id which will be assigned by the cgroup system. with a subsystem ID which will be assigned by the cgroup system.
Other fields in the cgroup_subsys object include: Other fields in the cgroup_subsys object include:
...@@ -516,7 +517,7 @@ Other fields in the cgroup_subsys object include: ...@@ -516,7 +517,7 @@ Other fields in the cgroup_subsys object include:
at system boot. at system boot.
Each cgroup object created by the system has an array of pointers, Each cgroup object created by the system has an array of pointers,
indexed by subsystem id; this pointer is entirely managed by the indexed by subsystem ID; this pointer is entirely managed by the
subsystem; the generic cgroup code will never touch this pointer. subsystem; the generic cgroup code will never touch this pointer.
3.2 Synchronization 3.2 Synchronization
...@@ -639,7 +640,7 @@ void post_clone(struct cgroup *cgrp) ...@@ -639,7 +640,7 @@ void post_clone(struct cgroup *cgrp)
Called during cgroup_create() to do any parameter Called during cgroup_create() to do any parameter
initialization which might be required before a task could attach. For initialization which might be required before a task could attach. For
example in cpusets, no task may attach before 'cpus' and 'mems' are set example, in cpusets, no task may attach before 'cpus' and 'mems' are set
up. up.
void bind(struct cgroup *root) void bind(struct cgroup *root)
...@@ -650,7 +651,26 @@ and root cgroup. Currently this will only involve movement between ...@@ -650,7 +651,26 @@ and root cgroup. Currently this will only involve movement between
the default hierarchy (which never has sub-cgroups) and a hierarchy the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups). that is being created/destroyed (and hence has no sub-cgroups).
4. Questions 4. Extended attribute usage
===========================
The cgroup filesystem supports certain types of extended attributes in its
directories and files. The currently supported types are:
- Trusted (XATTR_TRUSTED)
- Security (XATTR_SECURITY)
Both require CAP_SYS_ADMIN capability to set.
As in tmpfs, the extended attributes in the cgroup filesystem are stored
in kernel memory, so it is advisable to keep their usage to a minimum. This
is the reason why user-defined extended attributes are not supported: any
user could set them, and there is no limit on the value size.
The currently known users of this feature are SELinux, to limit cgroup usage
in containers, and systemd, for assorted metadata such as the main PID in a
cgroup (systemd creates a cgroup per service).
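A minimal userspace sketch of setting one of these attributes (the cgroup
path and the attribute name "trusted.example" are invented for the example;
CAP_SYS_ADMIN is required):

  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>
  #include <sys/xattr.h>

  int main(void)
  {
          const char *path = "/sys/fs/cgroup/memory/mygroup";  /* hypothetical */
          const char *value = "small-piece-of-metadata";
          char buf[128];
          ssize_t len;

          /* Needs CAP_SYS_ADMIN; a "user." prefix would be rejected here. */
          if (setxattr(path, "trusted.example", value, strlen(value), 0) != 0) {
                  perror("setxattr");
                  return 1;
          }

          len = getxattr(path, "trusted.example", buf, sizeof(buf) - 1);
          if (len < 0) {
                  perror("getxattr");
                  return 1;
          }
          buf[len] = '\0';
          printf("trusted.example = %s\n", buf);
          return 0;
  }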
5. Questions
============ ============
Q: what's up with this '/bin/echo' ? Q: what's up with this '/bin/echo' ?
...@@ -660,5 +680,5 @@ A: bash's builtin 'echo' command does not check calls to write() against ...@@ -660,5 +680,5 @@ A: bash's builtin 'echo' command does not check calls to write() against
Q: When I attach processes, only the first of the line gets really attached ! Q: When I attach processes, only the first of the line gets really attached !
A: We can only return one error code per call to write(). So you should also A: We can only return one error code per call to write(). So you should also
put only ONE pid. put only ONE PID.
...@@ -18,16 +18,16 @@ from the rest of the system. The article on LWN [12] mentions some probable ...@@ -18,16 +18,16 @@ from the rest of the system. The article on LWN [12] mentions some probable
uses of the memory controller. The memory controller can be used to uses of the memory controller. The memory controller can be used to
a. Isolate an application or a group of applications a. Isolate an application or a group of applications
Memory hungry applications can be isolated and limited to a smaller Memory-hungry applications can be isolated and limited to a smaller
amount of memory. amount of memory.
b. Create a cgroup with limited amount of memory, this can be used b. Create a cgroup with a limited amount of memory; this can be used
as a good alternative to booting with mem=XXXX. as a good alternative to booting with mem=XXXX.
c. Virtualization solutions can control the amount of memory they want c. Virtualization solutions can control the amount of memory they want
to assign to a virtual machine instance. to assign to a virtual machine instance.
d. A CD/DVD burner could control the amount of memory used by the d. A CD/DVD burner could control the amount of memory used by the
rest of the system to ensure that burning does not fail due to lack rest of the system to ensure that burning does not fail due to lack
of available memory. of available memory.
e. There are several other use cases, find one or use the controller just e. There are several other use cases; find one or use the controller just
for fun (to learn and hack on the VM subsystem). for fun (to learn and hack on the VM subsystem).
Current Status: linux-2.6.34-mmotm(development version of 2010/April) Current Status: linux-2.6.34-mmotm(development version of 2010/April)
...@@ -38,12 +38,12 @@ Features: ...@@ -38,12 +38,12 @@ Features:
- optionally, memory+swap usage can be accounted and limited. - optionally, memory+swap usage can be accounted and limited.
- hierarchical accounting - hierarchical accounting
- soft limit - soft limit
- moving(recharging) account at moving a task is selectable. - moving (recharging) account at moving a task is selectable.
- usage threshold notifier - usage threshold notifier
- oom-killer disable knob and oom-notifier - oom-killer disable knob and oom-notifier
- Root cgroup has no limit controls. - Root cgroup has no limit controls.
Kernel memory support is work in progress, and the current version provides Kernel memory support is a work in progress, and the current version provides
basic functionality. (See Section 2.7) basic functionality. (See Section 2.7)
Brief summary of control files. Brief summary of control files.
...@@ -144,9 +144,9 @@ Figure 1 shows the important aspects of the controller ...@@ -144,9 +144,9 @@ Figure 1 shows the important aspects of the controller
3. Each page has a pointer to the page_cgroup, which in turn knows the 3. Each page has a pointer to the page_cgroup, which in turn knows the
cgroup it belongs to cgroup it belongs to
The accounting is done as follows: mem_cgroup_charge() is invoked to setup The accounting is done as follows: mem_cgroup_charge() is invoked to set up
the necessary data structures and check if the cgroup that is being charged the necessary data structures and check if the cgroup that is being charged
is over its limit. If it is then reclaim is invoked on the cgroup. is over its limit. If it is, then reclaim is invoked on the cgroup.
More details can be found in the reclaim section of this document. More details can be found in the reclaim section of this document.
If everything goes well, a page meta-data-structure called page_cgroup is If everything goes well, a page meta-data-structure called page_cgroup is
updated. page_cgroup has its own LRU on cgroup. updated. page_cgroup has its own LRU on cgroup.
...@@ -163,13 +163,13 @@ for earlier. A file page will be accounted for as Page Cache when it's ...@@ -163,13 +163,13 @@ for earlier. A file page will be accounted for as Page Cache when it's
inserted into inode (radix-tree). While it's mapped into the page tables of inserted into inode (radix-tree). While it's mapped into the page tables of
processes, duplicate accounting is carefully avoided. processes, duplicate accounting is carefully avoided.
A RSS page is unaccounted when it's fully unmapped. A PageCache page is An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches also also accounted. are really freed. Such SwapCaches are also accounted.
A swapped-in page is not accounted until it's mapped. A swapped-in page is not accounted until it's mapped.
Note: The kernel does swapin-readahead and read multiple swaps at once. Note: The kernel does swapin-readahead and reads multiple swaps at once.
This means swapped-in pages may contain pages for other tasks than a task This means swapped-in pages may contain pages for other tasks than a task
causing page fault. So, we avoid accounting at swap-in I/O. causing page fault. So, we avoid accounting at swap-in I/O.
...@@ -209,7 +209,7 @@ memsw.limit_in_bytes. ...@@ -209,7 +209,7 @@ memsw.limit_in_bytes.
Example: Assume a system with 4G of swap. A task which allocates 6G of memory Example: Assume a system with 4G of swap. A task which allocates 6G of memory
(by mistake) under 2G memory limitation will use all swap. (by mistake) under 2G memory limitation will use all swap.
In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap. In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
By using memsw limit, you can avoid system OOM which can be caused by swap By using the memsw limit, you can avoid system OOM which can be caused by swap
shortage. shortage.
* why 'memory+swap' rather than swap. * why 'memory+swap' rather than swap.
...@@ -217,7 +217,7 @@ The global LRU(kswapd) can swap out arbitrary pages. Swap-out means ...@@ -217,7 +217,7 @@ The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of to move account from memory to swap...there is no change in usage of
memory+swap. In other words, when we want to limit the usage of swap without memory+swap. In other words, when we want to limit the usage of swap without
affecting global LRU, memory+swap limit is better than just limiting swap from affecting global LRU, memory+swap limit is better than just limiting swap from
OS point of view. an OS point of view.
* What happens when a cgroup hits memory.memsw.limit_in_bytes * What happens when a cgroup hits memory.memsw.limit_in_bytes
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
...@@ -236,7 +236,7 @@ an OOM routine is invoked to select and kill the bulkiest task in the ...@@ -236,7 +236,7 @@ an OOM routine is invoked to select and kill the bulkiest task in the
cgroup. (See 10. OOM Control below.) cgroup. (See 10. OOM Control below.)
The reclaim algorithm has not been modified for cgroups, except that The reclaim algorithm has not been modified for cgroups, except that
pages that are selected for reclaiming come from the per cgroup LRU pages that are selected for reclaiming come from the per-cgroup LRU
list. list.
NOTE: Reclaim does not work for the root cgroup, since we cannot set any NOTE: Reclaim does not work for the root cgroup, since we cannot set any
...@@ -316,7 +316,7 @@ We can check the usage: ...@@ -316,7 +316,7 @@ We can check the usage:
# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512 1216512
A successful write to this file does not guarantee a successful set of A successful write to this file does not guarantee a successful setting of
this limit to the value written into the file. This can be due to a this limit to the value written into the file. This can be due to a
number of factors, such as rounding up to page boundaries or the total number of factors, such as rounding up to page boundaries or the total
availability of memory on the system. The user is required to re-read availability of memory on the system. The user is required to re-read
...@@ -350,7 +350,7 @@ Trying usual test under memory controller is always helpful. ...@@ -350,7 +350,7 @@ Trying usual test under memory controller is always helpful.
4.1 Troubleshooting 4.1 Troubleshooting
Sometimes a user might find that the application under a cgroup is Sometimes a user might find that the application under a cgroup is
terminated by OOM killer. There are several causes for this: terminated by the OOM killer. There are several causes for this:
1. The cgroup limit is too low (just too low to do anything useful) 1. The cgroup limit is too low (just too low to do anything useful)
2. The user is using anonymous memory and swap is turned off or too low 2. The user is using anonymous memory and swap is turned off or too low
...@@ -358,7 +358,7 @@ terminated by OOM killer. There are several causes for this: ...@@ -358,7 +358,7 @@ terminated by OOM killer. There are several causes for this:
A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
some of the pages cached in the cgroup (page cache pages). some of the pages cached in the cgroup (page cache pages).
To know what happens, disable OOM_Kill by 10. OOM Control(see below) and To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
seeing what happens will be helpful. seeing what happens will be helpful.
4.2 Task migration 4.2 Task migration
...@@ -399,10 +399,10 @@ About use_hierarchy, see Section 6. ...@@ -399,10 +399,10 @@ About use_hierarchy, see Section 6.
Almost all pages tracked by this memory cgroup will be unmapped and freed. Almost all pages tracked by this memory cgroup will be unmapped and freed.
Some pages cannot be freed because they are locked or in-use. Such pages are Some pages cannot be freed because they are locked or in-use. Such pages are
moved to parent(if use_hierarchy==1) or root (if use_hierarchy==0) and this moved to parent (if use_hierarchy==1) or root (if use_hierarchy==0) and this
cgroup will be empty. cgroup will be empty.
Typical use case of this interface is that calling this before rmdir(). The typical use case for this interface is before calling rmdir().
Because rmdir() moves all pages to parent, some out-of-use page caches can be Because rmdir() moves all pages to parent, some out-of-use page caches can be
moved to the parent. If you want to avoid that, force_empty will be useful. moved to the parent. If you want to avoid that, force_empty will be useful.
...@@ -486,7 +486,7 @@ You can reset failcnt by writing 0 to failcnt file. ...@@ -486,7 +486,7 @@ You can reset failcnt by writing 0 to failcnt file.
For efficiency, as other kernel components, memory cgroup uses some optimization For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory(and swap) usage, it's an fuzz method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.) value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2). value in memory.stat(see 5.2).
...@@ -496,8 +496,8 @@ value in memory.stat(see 5.2). ...@@ -496,8 +496,8 @@ value in memory.stat(see 5.2).
This is similar to numa_maps but operates on a per-memcg basis. This is This is similar to numa_maps but operates on a per-memcg basis. This is
useful for providing visibility into the numa locality information within useful for providing visibility into the numa locality information within
an memcg since the pages are allowed to be allocated from any physical an memcg since the pages are allowed to be allocated from any physical
node. One of the usecases is evaluating application performance by node. One of the use cases is evaluating application performance by
combining this information with the application's cpu allocation. combining this information with the application's CPU allocation.
We export "total", "file", "anon" and "unevictable" pages per-node for We export "total", "file", "anon" and "unevictable" pages per-node for
each memcg. The output format of memory.numa_stat is: each memcg. The output format of memory.numa_stat is:
...@@ -561,10 +561,10 @@ are pushed back to their soft limits. If the soft limit of each control ...@@ -561,10 +561,10 @@ are pushed back to their soft limits. If the soft limit of each control
group is very high, they are pushed back as much as possible to make group is very high, they are pushed back as much as possible to make
sure that one control group does not starve the others of memory. sure that one control group does not starve the others of memory.
Please note that soft limits is a best effort feature, it comes with Please note that soft limits is a best-effort feature; it comes with
no guarantees, but it does its best to make sure that when memory is no guarantees, but it does its best to make sure that when memory is
heavily contended for, memory is allocated based on the soft limit heavily contended for, memory is allocated based on the soft limit
hints/setup. Currently soft limit based reclaim is setup such that hints/setup. Currently soft limit based reclaim is set up such that
it gets invoked from balance_pgdat (kswapd). it gets invoked from balance_pgdat (kswapd).
7.1 Interface 7.1 Interface
...@@ -592,7 +592,7 @@ page tables. ...@@ -592,7 +592,7 @@ page tables.
8.1 Interface 8.1 Interface
This feature is disabled by default. It can be enabled(and disabled again) by This feature is disabled by default. It can be enabled (and disabled again) by
writing to memory.move_charge_at_immigrate of the destination cgroup. writing to memory.move_charge_at_immigrate of the destination cgroup.
If you want to enable it: If you want to enable it:
...@@ -601,8 +601,8 @@ If you want to enable it: ...@@ -601,8 +601,8 @@ If you want to enable it:
Note: Each bits of move_charge_at_immigrate has its own meaning about what type Note: Each bits of move_charge_at_immigrate has its own meaning about what type
of charges should be moved. See 8.2 for details. of charges should be moved. See 8.2 for details.
Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread Note: Charges are moved only when you move mm->owner, in other words,
group. a leader of a thread group.
Note: If we cannot find enough space for the task in the destination cgroup, we Note: If we cannot find enough space for the task in the destination cgroup, we
try to make space by reclaiming memory. Task migration may fail if we try to make space by reclaiming memory. Task migration may fail if we
cannot make enough space. cannot make enough space.
...@@ -612,25 +612,25 @@ And if you want disable it again: ...@@ -612,25 +612,25 @@ And if you want disable it again:
# echo 0 > memory.move_charge_at_immigrate # echo 0 > memory.move_charge_at_immigrate
8.2 Type of charges which can be move 8.2 Type of charges which can be moved
Each bits of move_charge_at_immigrate has its own meaning about what type of Each bit in move_charge_at_immigrate has its own meaning about what type of
charges should be moved. But in any cases, it must be noted that an account of charges should be moved. But in any case, it must be noted that an account of
a page or a swap can be moved only when it is charged to the task's current(old) a page or a swap can be moved only when it is charged to the task's current
memory cgroup. (old) memory cgroup.
bit | what type of charges would be moved ? bit | what type of charges would be moved ?
-----+------------------------------------------------------------------------ -----+------------------------------------------------------------------------
0 | A charge of an anonymous page(or swap of it) used by the target task. 0 | A charge of an anonymous page (or swap of it) used by the target task.
| You must enable Swap Extension(see 2.4) to enable move of swap charges. | You must enable Swap Extension (see 2.4) to enable move of swap charges.
-----+------------------------------------------------------------------------ -----+------------------------------------------------------------------------
1 | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory) 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory)
| and swaps of tmpfs file) mmapped by the target task. Unlike the case of | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
| anonymous pages, file pages(and swaps) in the range mmapped by the task | anonymous pages, file pages (and swaps) in the range mmapped by the task
| will be moved even if the task hasn't done page fault, i.e. they might | will be moved even if the task hasn't done page fault, i.e. they might
| not be the task's "RSS", but other task's "RSS" that maps the same file. | not be the task's "RSS", but other task's "RSS" that maps the same file.
| And mapcount of the page is ignored(the page can be moved even if | And mapcount of the page is ignored (the page can be moved even if
| page_mapcount(page) > 1). You must enable Swap Extension(see 2.4) to | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to
| enable move of swap charges. | enable move of swap charges.
8.3 TODO 8.3 TODO
...@@ -640,11 +640,11 @@ memory cgroup. ...@@ -640,11 +640,11 @@ memory cgroup.
9. Memory thresholds 9. Memory thresholds
Memory cgroup implements memory thresholds using cgroups notification Memory cgroup implements memory thresholds using the cgroups notification
API (see cgroups.txt). It allows to register multiple memory and memsw API (see cgroups.txt). It allows to register multiple memory and memsw
thresholds and gets notifications when it crosses. thresholds and gets notifications when it crosses.
To register a threshold application need: To register a threshold, an application must:
- create an eventfd using eventfd(2); - create an eventfd using eventfd(2);
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes; - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to - write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
...@@ -659,24 +659,24 @@ It's applicable for root and non-root cgroup. ...@@ -659,24 +659,24 @@ It's applicable for root and non-root cgroup.
memory.oom_control file is for OOM notification and other controls. memory.oom_control file is for OOM notification and other controls.
Memory cgroup implements OOM notifier using cgroup notification Memory cgroup implements OOM notifier using the cgroup notification
API (See cgroups.txt). It allows to register multiple OOM notification API (See cgroups.txt). It allows to register multiple OOM notification
delivery and gets notification when OOM happens. delivery and gets notification when OOM happens.
To register a notifier, application need: To register a notifier, an application must:
- create an eventfd using eventfd(2) - create an eventfd using eventfd(2)
- open memory.oom_control file - open memory.oom_control file
- write string like "<event_fd> <fd of memory.oom_control>" to - write string like "<event_fd> <fd of memory.oom_control>" to
cgroup.event_control cgroup.event_control
Application will be notified through eventfd when OOM happens. The application will be notified through eventfd when OOM happens.
OOM notification doesn't work for root cgroup. OOM notification doesn't work for the root cgroup.
You can disable OOM-killer by writing "1" to memory.oom_control file, as: You can disable the OOM-killer by writing "1" to memory.oom_control file, as:
#echo 1 > memory.oom_control #echo 1 > memory.oom_control
This operation is only allowed to the top cgroup of sub-hierarchy. This operation is only allowed to the top cgroup of a sub-hierarchy.
If OOM-killer is disabled, tasks under cgroup will hang/sleep If OOM-killer is disabled, tasks under cgroup will hang/sleep
in memory cgroup's OOM-waitqueue when they request accountable memory. in memory cgroup's OOM-waitqueue when they request accountable memory.
......
Processor boosting control
- information for users -
Quick guide for the impatient:
--------------------
/sys/devices/system/cpu/cpufreq/boost
controls the boost setting for the whole system. You can read and write
that file with either "0" (boosting disabled) or "1" (boosting allowed).
Reading or writing 1 does not mean that the system is boosting at this
very moment, but only that the CPU _may_ raise the frequency at its
discretion.
--------------------
Introduction
-------------
Some CPUs support a functionality to raise the operating frequency of
some cores in a multi-core package if certain conditions apply, mostly
if the whole chip is not fully utilized and below its intended thermal
budget. This is done without operating system control by a combination
of hardware and firmware.
On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
in technical documentation "Core performance boost". In Linux we use
the term "boost" for convenience.
Rationale for disable switch
----------------------------
Though the idea is to just give better performance without any user
intervention, sometimes the need arises to disable this functionality.
Most systems offer a switch in the (BIOS) firmware to disable the
functionality entirely, but a more fine-grained and dynamic control would
be desirable:
1. While running benchmarks, reproducible results are important. Since
the boosting functionality depends on the load of the whole package,
single thread performance can vary. By explicitly disabling the boost
functionality at least for the benchmark's run-time the system will run
at a fixed frequency and results are reproducible again.
2. To examine the impact of the boosting functionality it is helpful
to do tests with and without boosting.
3. Boosting means overclocking the processor, though under controlled
conditions. By raising the frequency and the voltage the processor
will consume more power than without the boosting, which may be
undesirable for instance for mobile users. Disabling boosting may
save power here, though this depends on the workload.
User controlled switch
----------------------
To allow the user to toggle the boosting functionality, the acpi-cpufreq
driver exports a sysfs knob to disable it. There is a file:
/sys/devices/system/cpu/cpufreq/boost
which can either read "0" (boosting disabled) or "1" (boosting enabled).
Reading the file is always supported, even if the processor does not
support boosting. In this case the file will be read-only and always
reads as "0". Explicitly changing the permissions and writing to that
file anyway will return EINVAL.
On supported CPUs one can write either a "0" or a "1" into this file.
This will either disable the boost functionality on all cores in the
whole system (0) or will allow the hardware to boost at will (1).
Writing a "1" does not explicitly boost the system, but just allows the
CPU (and the firmware) to boost at their discretion. Some implementations
take external factors like the chip's temperature into account, so
boosting once does not necessarily mean that it will occur every time
even using the exact same software setup.
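A small userspace sketch of reading and toggling this knob (purely
illustrative; it assumes acpi-cpufreq is loaded and that the file above
exists and is writable):

  #include <stdio.h>

  int main(void)
  {
          const char *knob = "/sys/devices/system/cpu/cpufreq/boost";
          FILE *f = fopen(knob, "r+");
          int allowed;

          if (!f) {
                  perror("fopen");  /* no boost support or driver not loaded */
                  return 1;
          }

          if (fscanf(f, "%d", &allowed) == 1)
                  printf("boosting currently %s\n",
                         allowed ? "allowed" : "disabled");

          /* Disable boosting, e.g. for a reproducible benchmark run; writing
           * "1" later re-allows it (it does not force an immediate boost). */
          rewind(f);
          fprintf(f, "0\n");
          fclose(f);
          return 0;
  }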
AMD legacy cpb switch
---------------------
The AMD powernow-k8 driver used to support a very similar switch to
disable or enable the "Core Performance Boost" feature of some AMD CPUs.
This switch was instantiated in each CPU's cpufreq directory
(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
Though the per-CPU existence hints at more fine-grained control, the
actual implementation only supported system-global switch semantics,
which was simply reflected into each CPU's file. Writing a 0 or 1 into it
would pull the other CPUs to the same state.
For compatibility reasons this file and its behavior are still supported
on AMD CPUs, though it is now protected by a config switch
(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
even with the config option set.
This functionality is considered legacy and will be removed in some future
kernel version.
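
As a purely hypothetical illustration of the system-global semantics
(again a sketch, not taken from the driver; it assumes an AMD system
built with X86_ACPI_CPUFREQ_CPB and at least two CPUs), writing cpu0's
legacy "cpb" file and then reading cpu1's should show the same value:

    /*
     * Hypothetical demonstration of the global "cpb" semantics on a
     * legacy AMD setup; the paths and the two-CPU assumption are
     * assumptions for illustration, not taken from the driver docs.
     */
    #include <stdio.h>

    static int read_cpb(const char *path, char *val)
    {
            FILE *f = fopen(path, "r");

            if (!f)
                    return -1;
            if (fscanf(f, " %c", val) != 1)
                    *val = '?';
            fclose(f);
            return 0;
    }

    int main(void)
    {
            const char *cpu0 = "/sys/devices/system/cpu/cpu0/cpufreq/cpb";
            const char *cpu1 = "/sys/devices/system/cpu/cpu1/cpufreq/cpb";
            char v0, v1;
            FILE *f;

            /* Disable boosting through cpu0's per-CPU knob (needs root). */
            f = fopen(cpu0, "w");
            if (f) {
                    fputs("0", f);
                    fclose(f);
            }

            /* Both files should now read "0": the switch is global. */
            if (!read_cpb(cpu0, &v0) && !read_cpb(cpu1, &v1))
                    printf("cpu0 cpb=%c, cpu1 cpb=%c\n", v0, v1);

            return 0;
    }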

More fine grained boosting control
----------------------------------
Technically it is possible to switch the boosting functionality at
least on a per-package basis, for some CPUs even per core. Currently
the driver does not support this, but it may be implemented in the
future.
...@@ -56,3 +56,4 @@ stm,m41t00 Serial Access TIMEKEEPER ...@@ -56,3 +56,4 @@ stm,m41t00 Serial Access TIMEKEEPER
stm,m41t62 Serial real-time clock (RTC) with alarm stm,m41t62 Serial real-time clock (RTC) with alarm
stm,m41t80 M41T80 - SERIAL ACCESS RTC WITH ALARMS stm,m41t80 M41T80 - SERIAL ACCESS RTC WITH ALARMS
ti,tsc2003 I2C Touch-Screen Controller ti,tsc2003 I2C Touch-Screen Controller
ti,tmp102 Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface