- 01 7月, 2006 6 次提交
-
-
由 Christoph Lameter 提交于
Per zone counter infrastructure The counters that we currently have for the VM are split per processor. The processor however has not much to do with the zone these pages belong to. We cannot tell f.e. how many ZONE_DMA pages are dirty. So we are blind to potentially inbalances in the usage of memory in various zones. F.e. in a NUMA system we cannot tell how many pages are dirty on a particular node. If we knew then we could put measures into the VM to balance the use of memory between different zones and different nodes in a NUMA system. For example it would be possible to limit the dirty pages per node so that fast local memory is kept available even if a process is dirtying huge amounts of pages. Another example is zone reclaim. We do not know how many unmapped pages exist per zone. So we just have to try to reclaim. If it is not working then we pause and try again later. It would be better if we knew when it makes sense to reclaim unmapped pages from a zone. This patchset allows the determination of the number of unmapped pages per zone. We can remove the zone reclaim interval with the counters introduced here. Futhermore the ability to have various usage statistics available will allow the development of new NUMA balancing algorithms that may be able to improve the decision making in the scheduler of when to move a process to another node and hopefully will also enable automatic page migration through a user space program that can analyse the memory load distribution and then rebalance memory use in order to increase performance. The counter framework here implements differential counters for each processor in struct zone. The differential counters are consolidated when a threshold is exceeded (like done in the current implementation for nr_pageache), when slab reaping occurs or when a consolidation function is called. Consolidation uses atomic operations and accumulates counters per zone in the zone structure and also globally in the vm_stat array. VM functions can access the counts by simply indexing a global or zone specific array. The arrangement of counters in an array also simplifies processing when output has to be generated for /proc/*. Counters can be updated by calling inc/dec_zone_page_state or _inc/dec_zone_page_state analogous to *_page_state. The second group of functions can be called if it is known that interrupts are disabled. Special optimized increment and decrement functions are provided. These can avoid certain checks and use increment or decrement instructions that an architecture may provide. We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can do DMA to all memory and therefore ZONE_NORMAL will not be populated. This is only currently set for IA64 SGI SN2 and currently only affects node_page_state(). In the best case node_page_state can be reduced to retrieving a single counter for the one zone on the node. [akpm@osdl.org: cleanups] [akpm@osdl.org: export vm_stat[] for filesystems] Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
NOTE: ZVC are *not* the lightweight event counters. ZVCs are reliable whereas event counters do not need to be. Zone based VM statistics are necessary to be able to determine what the state of memory in one zone is. In a NUMA system this can be helpful for local reclaim and other memory optimizations that may be able to shift VM load in order to get more balanced memory use. It is also useful to know how the computing load affects the memory allocations on various zones. This patchset allows the retrieval of that data from userspace. The patchset introduces a framework for counters that is a cross between the existing page_stats --which are simply global counters split per cpu-- and the approach of deferred incremental updates implemented for nr_pagecache. Small per cpu 8 bit counters are added to struct zone. If the counter exceeds certain thresholds then the counters are accumulated in an array of atomic_long in the zone and in a global array that sums up all zone values. The small 8 bit counters are next to the per cpu page pointers and so they will be in high in the cpu cache when pages are allocated and freed. Access to VM counter information for a zone and for the whole machine is then possible by simply indexing an array (Thanks to Nick Piggin for pointing out that approach). The access to the total number of pages of various types does no longer require the summing up of all per cpu counters. Benefits of this patchset right now: - Ability for UP and SMP configuration to determine how memory is balanced between the DMA, NORMAL and HIGHMEM zones. - loops over all processors are avoided in writeback and reclaim paths. We can avoid caching the writeback information because the needed information is directly accessible. - Special handling for nr_pagecache removed. - zone_reclaim_interval vanishes since VM stats can now determine when it is worth to do local reclaim. - Fast inline per node page state determination. - Accurate counters in /sys/devices/system/node/node*/meminfo. Current counters are counting simply which processor allocated a page somewhere and guestimate based on that. So the counters were not useful to show the actual distribution of page use on a specific zone. - The swap_prefetch patch requires per node statistics in order to figure out when processors of a node can prefetch. This patch provides some of the needed numbers. - Detailed VM counters available in more /proc and /sys status files. References to earlier discussions: V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2 V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2 Performance tests with AIM7 did not show any regressions. Seems to be a tad faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes fixes for s390/arm/uml arch code. This patch: Move counter code from page_alloc.c/page-flags.h to vmstat.c/h. Create vmstat.c/vmstat.h by separating the counter code and the proc functions. Move the vm_stat_text array before zoneinfo_show. [akpm@osdl.org: s390 build fix] [akpm@osdl.org: HOTPLUG_CPU build fix] Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Adrian Bunk 提交于
drivers/char/istallion.c: In function âstli_initbrdsâ: drivers/char/istallion.c:4150: error: implicit declaration of function âstli_parsebrdâ drivers/char/istallion.c:4150: error: âstli_brdspâ undeclared (first use in this function) drivers/char/istallion.c:4150: error: (Each undeclared identifier is reported only once drivers/char/istallion.c:4150: error: for each function it appears in.) drivers/char/istallion.c:4164: error: implicit declaration of function âstli_argbrdsâ While I was at it, I also removed the #ifdef MODULE around the initialation code to allow it to perhaps work when built into the kernel and made a needlessly global function static. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
register_cpu_notifier() cannot do anything in a module, in a !CONFIG_HOTPLUG_CPU kernel. Cc: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
This fixes drivers/char/pc8736x_gpio.c and drivers/char/scx200_gpio.c to use the platform_device_del/put ops correctly. Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Jim Cromie <jim.cromie@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
Fix build error on x86_64. There's nothing even remotely close to imacmp_seg in the kernel, so I removed the whole line. Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Edgar Hucek <hostmaster@ed-soft.at> Cc: Antonino Daplas <adaplas@pol.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 6月, 2006 34 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2由 Linus Torvalds 提交于
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks() ocfs2: clean up some osb fields ocfs2: fix init of uuid_net_key ocfs2: silence a debug print ocfs2: silence ENOENT during lookup of broken links ocfs2: Cleanup message prints ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static ocfs2: warn the user on a dead timeout mismatch ocfs2: OCFS2_FS must depend on SYSFS ocfs2: Compile-time disabling of ocfs2 debugging output. configfs: Clear up a few extra spaces where there should be TABs. configfs: Release memory in configfs_example.
-
由 Linus Torvalds 提交于
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) [TIPC]: Initial activation message now includes TIPC version number [TIPC]: Improve response to requests for node/link information [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf [IrDA]: Fix the AU1000 FIR dependencies [IrDA]: Fix RCU lock pairing on error path [XFRM]: unexport xfrm_state_mtu [NET]: make skb_release_data() static [NETFILTE] ipv4: Fix typo (Bugzilla #6753) [IrDA]: MCS7780 usb_driver struct should be static [BNX2]: Turn off link during shutdown [BNX2]: Use dev_kfree_skb() instead of the _irq version [ATM]: basic sysfs support for ATM devices [ATM]: [suni] change suni_init to __devinit [ATM]: [iphase] should be __devinit not __init [ATM]: [idt77105] should be __devinit not __init [BNX2]: Add NETIF_F_TSO_ECN [NET]: Add ECN support for TSO [AF_UNIX]: Datagram getpeersec [NET]: Fix logical error in skb_gso_ok [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000 ...
-
由 Allan Stephens 提交于
Signed-off-by: NAllan Stephens <allan.stephens@windriver.com> Signed-off-by: NPer Liden <per.liden@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Allan Stephens 提交于
Now allocates reply space for "get links" request based on number of actual links, not number of potential links. Also, limits reply to "get links" and "get nodes" requests to 32KB to match capabilities of tipc-config utility that issued request. Signed-off-by: NAllan Stephens <allan.stephens@windriver.com> Signed-off-by: NPer Liden <per.liden@ericsson.com>
-
由 Allan Stephens 提交于
Now determines tailroom of bundle buffer by directly inspection of buffer. Previously, buffer was assumed to have a max capacity equal to the link MTU, but the addition of link MTU negotiation means that the link MTU can increase after the bundle buffer is allocated. Signed-off-by: NAllan Stephens <allan.stephens@windriver.com> Signed-off-by: NPer Liden <per.liden@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Adrian Bunk 提交于
AU1000 FIR is broken, it should depend on SOC_AU1000. Spotted by Jean-Luc Leger. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NSamuel Ortiz <samuel@sortiz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Josh Triplett 提交于
irlan_client_discovery_indication calls rcu_read_lock and rcu_read_unlock, but returns without unlocking in an error case. Fix that by replacing the return with a goto so that the rcu_read_unlock always gets executed. Signed-off-by: NJosh Triplett <josh@freedesktop.org> Acked-by: NPaul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Samuel Ortiz samuel@sortiz.org <> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Adrian Bunk 提交于
This patch removes the unused EXPORT_SYMBOL(xfrm_state_mtu). Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Adrian Bunk 提交于
skb_release_data() no longer has any users in other files. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matt LaPlante 提交于
This patch fixes bugzilla #6753, a typo in the netfilter Kconfig Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Adrian Bunk 提交于
This patch makes a needlessly global struct static. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NSamuel Ortiz <samuel@sortiz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Minor change in shutdown logic to effect a link down. Update version to 1.4.43. Signed-off-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Change all dev_kfree_skb_irq() and dev_kfree_skb_any() to dev_kfree_skb(). These calls are never used in irq context. Signed-off-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Kagan 提交于
Signed-off-by: NChas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chas Williams 提交于
Signed-off-by: NChas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chas Williams 提交于
Signed-off-by: NChas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chas Williams 提交于
Signed-off-by: NChas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Add NETIF_F_TSO_ECN feature for all bnx2 hardware. Signed-off-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
In the current TSO implementation, NETIF_F_TSO and ECN cannot be turned on together in a TCP connection. The problem is that most hardware that supports TSO does not handle CWR correctly if it is set in the TSO packet. Correct handling requires CWR to be set in the first packet only if it is set in the TSO header. This patch adds the ability to turn on NETIF_F_TSO and ECN using GSO if necessary to handle TSO packets with CWR set. Hardware that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev-> features flag. All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If the output device does not have the NETIF_F_TSO_ECN feature set, GSO will split the packet up correctly with CWR only set in the first segment. With help from Herbert Xu <herbert@gondor.apana.org.au>. Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock flag is completely removed. Signed-off-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Catherine Zhang 提交于
This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_SOCKET && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow a server socket to receive security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com> Acked-by: NAcked-by: James Morris <jmorris@namei.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
The test in skb_gso_ok is backwards. Noticed by Michael Chan <mchan@broadcom.com>. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NMichael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shuya MAEDA 提交于
Signed-off-by: NShuya MAEDA <maeda-sxb@necst.nec.co.jp> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
Rather than having illegal_highdma as a macro when HIGHMEM is off, we can turn it into an inline function that returns zero. This will catch callers that give it bad arguments. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sridhar Samudrala 提交于
While debugging a TCP server hang issue, we noticed that currently there is no way for a user to get the acceptq backlog value for a TCP listen socket. All the standard networking utilities that display socket info like netstat, ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These fields do not mean much for listening sockets. This patch uses one of these unused fields(rx_queue) to export the accept queue len for listening sockets. Signed-off-by: NSridhar Samudrala <sri@us.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Darrel Goeddel 提交于
This patch encapsulates the usage of eff_cap (in netlink_skb_params) within the security framework by extending security_netlink_recv to include a required capability parameter and converting all direct usage of eff_caps outside of the lsm modules to use the interface. It also updates the SELinux implementation of the security_netlink_send and security_netlink_recv hooks to take advantage of the sid in the netlink_skb_params struct. This also enables SELinux to perform auditing of netlink capability checks. Please apply, for 2.6.18 if possible. Signed-off-by: NDarrel Goeddel <dgoeddel@trustedcs.com> Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
When GSO packets come from an untrusted source (e.g., a Xen guest domain), we need to verify the header integrity before passing it to the hardware. Since the first step in GSO is to verify the header, we can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this bit set can only be fed directly to devices with the corresponding bit NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb is fed to the GSO engine which will allow the packet to be sent to the hardware if it passes the header check. This patch changes the sg flag to a full features flag. The same method can be used to implement TSO ECN support. We simply have to mark packets with CWR set with SKB_GSO_ECN so that only hardware with a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment the packet, or segment the first MTU and pass the rest to the hardware for further segmentation. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
When a device that is acting as a bridge port is unregistered, the ip_queue/nfnetlink_queue notifier doesn't check if its one of physindev/physoutdev and doesn't release the references if it is. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jorge Matias 提交于
xt_sctp uses an incorrect header offset when --chunk-types is used. Signed-off-by: NJorge Matias <jorge.matias@motorola.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuri Gushin 提交于
"xt_unregister_match(AF_INET, &tcp_matchstruct)" is called twice, leaving "udp_matchstruct" registered, in case of a failure in the registration of the udp6 structure. Signed-off-by: NYuri Gushin <yuri@ecl-labs.org> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yasuyuki Kozakai 提交于
CC net/netfilter/nf_conntrack_proto_sctp.o net/netfilter/nf_conntrack_proto_sctp.c: In function `sctp_print_conntrack': net/netfilter/nf_conntrack_proto_sctp.c:206: warning: implicit declaration of function `local_bh_disable' net/netfilter/nf_conntrack_proto_sctp.c:208: warning: implicit declaration of function `local_bh_enable' CC net/netfilter/nf_conntrack_netlink.o net/netfilter/nf_conntrack_netlink.c: In function `ctnetlink_dump_table': net/netfilter/nf_conntrack_netlink.c:429: warning: implicit declaration of function `local_bh_disable' net/netfilter/nf_conntrack_netlink.c:452: warning: implicit declaration of function `local_bh_enable' Spotted by Toralf Förster Signed-off-by: NYasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
When xt_register_table fails the error is not properly propagated back. Based on patch by Lepton Wu <ytht.net@gmail.com>. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This makes things easier to track down, especially in modules. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-