1. 11 January 2012 (9 commits)
    • mm: try to distribute dirty pages fairly across zones · a756cf59
      Authored by Johannes Weiner
      The maximum number of dirty pages that exist in the system at any time is
      determined by a number of pages considered dirtyable and a user-configured
      percentage of those, or an absolute number in bytes.
      
      This number of dirtyable pages is the sum of memory provided by all the
      zones in the system minus their lowmem reserves and high watermarks, so
      that the system can retain a healthy number of free pages without having
      to reclaim dirty pages.
      
      But there is a flaw in that we have a zoned page allocator which does not
      care about the global state but rather the state of individual memory
      zones.  And right now there is nothing that prevents one zone from filling
      up with dirty pages while other zones are spared, which frequently leads
      to situations where kswapd, in order to restore the watermark of free
      pages, does indeed have to write pages from that zone's LRU list.  This
      can interfere so badly with IO from the flusher threads that major
      filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim
      already, taking away the VM's only possibility to keep such a zone
      balanced, aside from hoping the flushers will soon clean pages from that
      zone.
      
      Enter per-zone dirty limits.  They are to a zone's dirtyable memory what
      the global limit is to the global amount of dirtyable memory, and try to
      make sure that no single zone receives more than its fair share of the
      globally allowed dirty pages in the first place.  As the number of pages
      considered dirtyable excludes the zones' lowmem reserves and high
      watermarks, the maximum number of dirty pages in a zone is such that the
      zone can always be balanced without requiring page cleaning.
      
      As this is a placement decision in the page allocator and pages are
      dirtied only after the allocation, this patch allows allocators to pass
      __GFP_WRITE when they know in advance that the page will be written to and
      become dirty soon.  The page allocator will then attempt to allocate from
      the first zone of the zonelist - which on NUMA is determined by the task's
      NUMA memory policy - that has not exceeded its dirty limit.
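
      A hedged illustration of the hint (the flag name __GFP_WRITE is the one this
      patch introduces; the helper function here is made up for the example): a
      filesystem path that allocates a page cache page it is about to dirty can OR
      the flag into its allocation mask so that zones already at their dirty limit
      are skipped.

      /* Sketch only: hypothetical helper, not code from this patch. */
      static struct page *alloc_page_for_buffered_write(struct address_space *mapping)
      {
              /* the page will be dirtied right after allocation */
              gfp_t gfp = mapping_gfp_mask(mapping) | __GFP_WRITE;

              /* zones over their dirty limit are skipped for this allocation */
              return __page_cache_alloc(gfp);
      }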
      
      At first glance, it would appear that the diversion to lower zones can
      increase pressure on them, but this is not the case.  With a full high
      zone, allocations will be diverted to lower zones eventually, so it is
      more of a shift in timing of the lower zone allocations.  Workloads that
      previously could fit their dirty pages completely in the higher zone may
      be forced to allocate from lower zones, but the amount of pages that
      "spill over" are limited themselves by the lower zones' dirty constraints,
      and thus unlikely to become a problem.
      
      For now, the problem of unfair dirty page distribution remains for NUMA
      configurations where the zones allowed for allocation are in sum not big
      enough to trigger the global dirty limits, wake up the flusher threads and
      remedy the situation.  Because of this, an allocation that could not
      succeed on any of the considered zones is allowed to ignore the dirty
      limits before going into direct reclaim or even failing the allocation,
      until a future patch changes the global dirty throttling and flusher
      thread activation so that they take individual zone states into account.
      
      			Test results
      
      15M DMA + 3246M DMA32 + 504M Normal = 3765M memory
      40% dirty ratio
      16G USB thumb drive
      10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15))
      
      		seconds			nr_vmscan_write
      		        (stddev)	       min|     median|        max
      xfs
      vanilla:	 549.747( 3.492)	     0.000|      0.000|      0.000
      patched:	 550.996( 3.802)	     0.000|      0.000|      0.000
      
      fuse-ntfs
      vanilla:	1183.094(53.178)	 54349.000|  59341.000|  65163.000
      patched:	 558.049(17.914)	     0.000|      0.000|     43.000
      
      btrfs
      vanilla:	 573.679(14.015)	156657.000| 460178.000| 606926.000
      patched:	 563.365(11.368)	     0.000|      0.000|   1362.000
      
      ext4
      vanilla:	 561.197(15.782)	     0.000|2725438.000|4143837.000
      patched:	 568.806(17.496)	     0.000|      0.000|      0.000
      Signed-off-by: Johannes Weiner <jweiner@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: exclude reserved pages from dirtyable memory · ab8fabd4
      Authored by Johannes Weiner
      Per-zone dirty limits try to distribute page cache pages allocated for
      writing across zones in proportion to the individual zone sizes, to reduce
      the likelihood of reclaim having to write back individual pages from the
      LRU lists in order to make progress.
      
      This patch:
      
      The amount of dirtyable pages should not include the full number of free
      pages: there is a number of reserved pages that the page allocator and
      kswapd always try to keep free.
      
      The closer (reclaimable pages - dirty pages) is to the number of reserved
      pages, the more likely it becomes for reclaim to run into dirty pages:
      
             +----------+ ---
             |   anon   |  |
             +----------+  |
             |          |  |
             |          |  -- dirty limit new    -- flusher new
             |   file   |  |                     |
             |          |  |                     |
             |          |  -- dirty limit old    -- flusher old
             |          |                        |
             +----------+                       --- reclaim
             | reserved |
             +----------+
             |  kernel  |
             +----------+
      
      This patch introduces a per-zone dirty reserve that takes both the lowmem
      reserve as well as the high watermark of the zone into account, and a
      global sum of those per-zone values that is subtracted from the global
      amount of dirtyable pages.  The lowmem reserve is unavailable to page
      cache allocations and kswapd tries to keep the high watermark free.  We
      don't want to end up in a situation where reclaim has to clean pages in
      order to balance zones.
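
      A minimal sketch of the accounting described above (the function name and the
      exact formula are an illustration of the idea, not necessarily the code the
      patch adds): each zone sets aside its high watermark plus its largest lowmem
      reserve, and the sum of these per-zone reserves is subtracted from the global
      dirtyable total.

      /* Conceptual sketch; details differ in the real mm code. */
      static unsigned long zone_dirty_reserve(struct zone *zone)
      {
              unsigned long max_lowmem_reserve = 0;
              int i;

              /* largest amount lower allocations must leave free in this zone */
              for (i = 0; i < MAX_NR_ZONES; i++)
                      max_lowmem_reserve = max(max_lowmem_reserve,
                                               zone->lowmem_reserve[i]);

              /* unavailable to page cache, plus what kswapd keeps free */
              return max_lowmem_reserve + high_wmark_pages(zone);
      }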
      
      Not treating reserved pages as dirtyable on a global level is only a
      conceptual fix.  In reality, dirty pages are not distributed equally
      across zones and reclaim runs into dirty pages on a regular basis.
      
      But it is important to get this right before tackling the problem on a
      per-zone level, where the distance between reclaim and the dirty pages is
      mostly much smaller in absolute numbers.
      
      [akpm@linux-foundation.org: fix highmem build]
      Signed-off-by: Johannes Weiner <jweiner@redhat.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, debug: test for online nid when allocating on single node · f6d7e0cb
      Authored by David Rientjes
      Calling alloc_pages_exact_node() means the allocation only passes the
      zonelist of a single node into the page allocator.  If that node isn't
      online, its zonelist may never have been initialized, causing a strange
      oops that may not immediately be clear.
      
      I recently debugged an issue where node 0 wasn't online and an allocator
      was passing 0 to alloc_pages_exact_node(), and it resulted in a NULL
      pointer dereference on zonelist->_zonerefs.  If CONFIG_DEBUG_VM is enabled, though, it
      would be nice to catch this a bit earlier.
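
      Under CONFIG_DEBUG_VM, a check along the following lines (a sketch of the
      idea, not claimed to be the exact hunk) turns the bad nid into an immediate,
      readable assertion instead of a NULL dereference deep inside the allocator:

      static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
                                                        unsigned int order)
      {
              /* catch callers passing a node that was never brought online */
              VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

              return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
      }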
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: more intensive memory corruption debugging · c0a32fc5
      Authored by Stanislaw Gruszka
      With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception
      on any access (read or write) to an unallocated page, which permits us to
      catch code that corrupts memory.  However, the kernel tries to maximise
      memory usage, so there are usually few free pages in the system and buggy
      code usually ends up corrupting some crucial data.
      
      This patch changes the buddy allocator to keep more free/protected pages
      and to interlace free/protected and allocated pages to increase the
      probability of catching corruption.
      
      When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC,
      debug_guardpage_minorder defines the minimum order used by the page
      allocator to grant a request.  The requested size will be returned with
      the remaining pages used as guard pages.
      
      The default value of debug_guardpage_minorder is zero: no change from
      current behaviour.
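
      A rough sketch of the order calculation described above (illustrative only;
      the helper name is invented): with debug_guardpage_minorder=N on the kernel
      command line, the allocator never splits below order N, the caller gets the
      pages it asked for, and the surplus pages in the block become guard pages
      that trap on any access.

      /* Illustrative helper, not code from the patch. */
      static unsigned int effective_alloc_order(unsigned int requested_order,
                                                unsigned int guardpage_minorder)
      {
              /*
               * 2^order pages leave the buddy lists; 2^requested_order go to
               * the caller, the remainder are marked as guard pages.
               */
              return max(requested_order, guardpage_minorder);
      }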
      
      [akpm@linux-foundation.org: tweak documentation, s/flg/flag/]
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel.h: add BUILD_BUG() macro · 1399ff86
      Authored by David Daney
      We can place this in definitions that we expect the compiler to remove by
      dead code elimination.  If this assertion fails, we get a nice error
      message at build time.
      
      The GCC function attribute error("message") was added in version 4.3, so
      we define a new macro __linktime_error(message) to expand to this for
      GCC-4.3 and later.  This will give us an error diagnostic from the
      compiler on the line that fails.  For other compilers
      __linktime_error(message) expands to nothing, and we have to be content
      with a link time error, but at least we will still get a build error.
      
      BUILD_BUG() expands to a call to the undefined function __build_bug_failed()
      and will fail at link time if the compiler ever emits code for it.  On
      GCC 4.3 and later, the error() attribute is used so that the failure is
      reported at compile time instead.
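
      A sketch of the mechanism as described (close to, but not guaranteed to match,
      the exact macro added to kernel.h), followed by a typical use in a branch the
      compiler is expected to eliminate:

      #define BUILD_BUG()                                                     \
              do {                                                            \
                      extern void __build_bug_failed(void)                    \
                              __linktime_error("BUILD_BUG failed");           \
                      __build_bug_failed();                                   \
              } while (0)

      /* Only valid on 64-bit builds; elsewhere the build fails loudly. */
      static inline unsigned long load_word(const void *p)
      {
              if (sizeof(unsigned long) == 8)
                      return *(const unsigned long *)p;
              BUILD_BUG();    /* dead code on 64-bit, so it compiles away */
              return 0;
      }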
      Signed-off-by: David Daney <david.daney@cavium.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: DM <dm.n9107@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: avoid livelock on !__GFP_FS allocations · f90ac398
      Authored by Mel Gorman
      Colin Cross reported:
      
        Under the following conditions, __alloc_pages_slowpath can loop forever:
        gfp_mask & __GFP_WAIT is true
        gfp_mask & __GFP_FS is false
        reclaim and compaction make no progress
        order <= PAGE_ALLOC_COSTLY_ORDER
      
        These conditions happen very often during suspend and resume,
        when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
        allocations into __GFP_WAIT.
      
        The oom killer is not run because gfp_mask & __GFP_FS is false,
        but should_alloc_retry will always return true when order is less
        than PAGE_ALLOC_COSTLY_ORDER.
      
      In his fix, he avoided retrying the allocation if reclaim made no progress
      and __GFP_FS was not set.  The problem is that this would cause GFP_NOIO
      allocations to fail where they previously succeeded, which would be very
      unfortunate.
      
      The big difference between GFP_NOIO and suspend converting GFP_KERNEL to
      behave like GFP_NOIO is that normally flushers will be cleaning pages and
      kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay.
      The same does not necessarily apply during suspend as the storage device
      may be suspended.
      
      This patch special cases the suspend case to fail the page allocation if
      reclaim cannot make progress and adds some documentation on how
      gfp_allowed_mask is currently used.  Failing allocations like this may
      cause suspend to abort but that is better than a livelock.
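
      A hedged sketch of the shape of the fix (the helper and call-site names are
      assumptions based on the description above, not a copy of the patch): when
      reclaim made no progress and storage may already be suspended, the retry loop
      gives up instead of spinning forever.

      /* Sketch only; the real retry logic in mm/page_alloc.c has more cases. */
      static bool should_alloc_retry(gfp_t gfp_mask, unsigned int order,
                                     unsigned long did_some_progress)
      {
              /*
               * During suspend the flushers and the storage device may be
               * frozen, so no forward progress can be expected: fail the
               * allocation rather than livelock.
               */
              if (!did_some_progress && pm_suspended_storage())
                      return false;

              /* otherwise keep retrying cheap (low-order) allocations */
              return order <= PAGE_ALLOC_COSTLY_ORDER;
      }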
      
      [mgorman@suse.de: Rework fix to be suspend specific]
      [rientjes@google.com: Move suspended device check to should_alloc_retry]
      Reported-by: Colin Cross <ccross@android.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove unused pagevec_free · da066ad3
      Authored by Konstantin Khlebnikov
      It is not exported and nobody uses it any more.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add free_hot_cold_page_list() helper · cc59850e
      Authored by Konstantin Khlebnikov
      This patch adds a helper, free_hot_cold_page_list(), to free a list of
      0-order pages.  It frees the pages directly from the list, without a
      temporary pagevec.  It also calls trace_mm_pagevec_free() to simulate
      pagevec_free() behaviour.
      
      bloat-o-meter:
      
      add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
      function                                     old     new   delta
      free_hot_cold_page_list                        -     264    +264
      get_page_from_freelist                      2129    2132      +3
      __pagevec_free                               243     239      -4
      split_free_page                              380     373      -7
      release_pages                                606     510     -96
      free_page_list                               188       -    -188
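
      The helper itself is small; a sketch consistent with the description above
      (close to the real mm/page_alloc.c code, though not copied verbatim):

      void free_hot_cold_page_list(struct list_head *list, int cold)
      {
              struct page *page, *next;

              list_for_each_entry_safe(page, next, list, lru) {
                      trace_mm_pagevec_free(page, cold);
                      free_hot_cold_page(page, cold);
              }
      }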
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback.c: make determine_dirtyable_memory static again · 1edf2234
      Authored by Johannes Weiner
      The tracing ring-buffer used this function briefly, but not anymore.
      Make it local to the writeback code again.
      
      Also, move the function so that no forward declaration needs to be
      reintroduced.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 10 January 2012 (4 commits)
  3. 09 January 2012 (1 commit)
    • jbd: Remove j_barrier mutex · 00482785
      Authored by Jan Kara
      The j_barrier mutex is used for serializing journal lock operations.  The
      problem with it is that e.g. the FIFREEZE ioctl results in a process leaving
      the kernel with the j_barrier mutex held, which makes lockdep complain.  The
      hibernation code also wants to freeze the filesystem, but then cannot
      hibernate the system because the mutex is still held.
      
      So we remove the j_barrier mutex and wait directly on j_barrier_count
      instead.  Since locking the journal is a rare operation, we don't have to
      care about fairness.
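
      A generic sketch of the pattern (names invented here, this is not the jbd
      code): later lockers sleep on a wait queue until the count drops back to
      zero, and nothing returns to userspace holding a mutex.

      static unsigned int frozen;             /* stand-in for j_barrier_count */
      static DEFINE_SPINLOCK(freeze_lock);
      static DECLARE_WAIT_QUEUE_HEAD(freeze_wq);

      static void freeze_begin(void)
      {
              spin_lock(&freeze_lock);
              while (frozen) {                /* someone else froze first: wait */
                      spin_unlock(&freeze_lock);
                      wait_event(freeze_wq, frozen == 0);
                      spin_lock(&freeze_lock);
              }
              frozen = 1;
              spin_unlock(&freeze_lock);      /* no lock held across ioctl return */
      }

      static void freeze_end(void)
      {
              spin_lock(&freeze_lock);
              frozen = 0;
              spin_unlock(&freeze_lock);
              wake_up_all(&freeze_wq);
      }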
      
      CC: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
  4. 07 January 2012 (10 commits)
  5. 06 January 2012 (4 commits)
    • dma-buf: Introduce dma buffer sharing mechanism · d15bd7ee
      Authored by Sumit Semwal
      This is the first step in defining a dma buffer sharing mechanism.
      
      A new buffer object dma_buf is added, with operations and API to allow easy
      sharing of this buffer object across devices.
      
      The framework allows:
      - creation of a buffer object, its association with a file pointer, and
         associated allocator-defined operations on that buffer. This operation is
         called the 'export' operation.
      - different devices to 'attach' themselves to this exported buffer object, to
        facilitate backing storage negotiation, using dma_buf_attach() API.
      - the exported buffer object to be shared with the other entity by asking for
         its 'file-descriptor (fd)', and sharing the fd across.
      - a received fd to get the buffer object back, where it can be accessed using
         the associated exporter-defined operations.
      - the exporter and user to share the scatterlist associated with this buffer
         object using map_dma_buf and unmap_dma_buf operations.
      
      At least one 'attach()' call is required to be made prior to calling the
      map_dma_buf() operation.
      
      A couple of building blocks in map_dma_buf() are added to ease the
      introduction of syncing across exporter and users, and of late allocation
      by the exporter.
      
      For this first version, this framework will work with certain conditions:
      - *ONLY* exporter will be allowed to mmap to userspace (outside of this
         framework - mmap is not a buffer object operation),
      - currently, *ONLY* users that do not need CPU access to the buffer are
         allowed.
      
      More details are there in the documentation patch.
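
      A hedged sketch of the importer-side flow described above (error paths
      trimmed; consult the documentation patch for the authoritative signatures):

      /* Turn a received fd back into something a device can DMA to/from. */
      static struct sg_table *import_shared_buffer(int fd, struct device *dev,
                                                   struct dma_buf_attachment **att)
      {
              struct dma_buf *dmabuf;

              dmabuf = dma_buf_get(fd);               /* fd -> buffer object */
              if (IS_ERR(dmabuf))
                      return ERR_CAST(dmabuf);

              *att = dma_buf_attach(dmabuf, dev);     /* backing negotiation */
              if (IS_ERR(*att)) {
                      dma_buf_put(dmabuf);
                      return ERR_CAST(*att);
              }

              /* at least one attach() must precede the map operation */
              return dma_buf_map_attachment(*att, DMA_BIDIRECTIONAL);
      }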
      
      This is based on design suggestions from many people at the mini-summits[1],
      most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
      Daniel Vetter <daniel@ffwll.ch>.
      
      The implementation is inspired from proof-of-concept patch-set from
      Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing
      between two v4l2 devices. [2]
      
      [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
      [2]: http://lwn.net/Articles/454389
      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
      Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Dave Airlie <airlied@redhat.com>
      Reviewed-and-Tested-by: Rob Clark <rob.clark@linaro.org>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • net: pack skb_shared_info more efficiently · 9f42f126
      Authored by Ian Campbell
      nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be
      packed with tx_flags.
      
      Also, by moving ip6_frag_id and dataref (both 4 bytes) next to each other,
      we can avoid a hole between ip6_frag_id and frag_list on 64-bit systems.
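
      The saving is purely an alignment effect. A self-contained illustration of
      the reordering (field types simplified; this is not the real skb_shared_info
      layout):

      struct before {                         /* 24 bytes on a 64-bit ABI      */
              unsigned int  ip6_frag_id;      /* 4 bytes + 4 bytes of padding  */
              void         *frag_list;        /* 8 bytes, needs 8-byte align   */
              unsigned int  dataref;          /* 4 bytes + 4 bytes of padding  */
      };

      struct after {                          /* 16 bytes                      */
              unsigned int  ip6_frag_id;      /* the two 4-byte fields now     */
              unsigned int  dataref;          /* share a single 8-byte word    */
              void         *frag_list;
      };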
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sfq: extend limits · 18cb8098
      Authored by Eric Dumazet
      SFQ as implemented in Linux is very limited, with at most 127 flows
      and a limit of 127 packets.  [ So if 127 flows are active, we have one
      packet per flow ]
      
      This patch brings the following features to SFQ to cope with modern needs.
      
      - Ability to specify a smaller per-flow limit of in-flight packets
        (default value being 127 packets)
      
      - Ability to have up to 65408 active flows (instead of 127)
      
      - Ability to have head drops instead of tail drops
        (to drop old packets from a flow)
      
      Example of use: no more than 20 packets per flow, max 8000 flows, max
      20000 packets in the SFQ qdisc, hash table of 65536 slots.
      
      tc qdisc add ... sfq \
              flows 8000 \
              depth 20 \
              headdrop \
              limit 20000 \
              divisor 65536
      
      RAM usage:
      
      2 bytes per hash table entry (instead of the previous 1 byte/entry)
      32 bytes per flow on 64-bit arches, instead of 384 for QFQ, giving a much
      better cache hit ratio.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call · 68bad94e
      Authored by Neerav Parikh
      This adds a new ndo_get_fcoe_hbainfo() call in
      net_device_ops for the FCoE protocol stack.
      
      If supported by the underlying device, the FCoE protocol
      stack will call this to get device-specific information
      from the underlying device.
      This information will then be used by the FCoE protocol
      stack to register Fibre Channel HBA attributes with the
      Fibre Channel Management Service via the Fabric Device
      Management Interface (FDMI), as per the T11 FC-GS
      specification.
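
      A hedged sketch of how the FCoE stack might use such a hook (the struct name
      and argument types here are assumptions; only the callback's existence is
      taken from this commit):

      static void fcoe_register_hba_attributes(struct net_device *netdev)
      {
              struct netdev_fcoe_hbainfo info;        /* name assumed */
              const struct net_device_ops *ops = netdev->netdev_ops;

              if (!ops->ndo_get_fcoe_hbainfo)         /* optional op */
                      return;

              if (ops->ndo_get_fcoe_hbainfo(netdev, &info))
                      return;

              /* pass the returned HBA attributes on to FDMI registration */
      }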
      
      Changes in v2:
      - As per comments from David Miller, aligned the parameters
      of ndo_get_fcoe_hbainfo()
      Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 05 January 2012 (6 commits)
  7. 04 January 2012 (6 commits)