- 14 1月, 2011 40 次提交
-
-
由 Lasse Collin 提交于
The return value of wr->flush() is not checked in write_byte(). This means that the decompressor won't stop even if the caller doesn't want more data. This can happen e.g. with corrupt LZMA-compressed initramfs. Returning the error quickly allows the user to see the error message quicker. There is a similar missing check for wr.flush() near the end of unlzma(). Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
Return value of rc->fill() is checked in rc_read() and error() is called when needed, but then the code continues as if nothing had happened. rc_read() is a void function and it's on the top of performance critical call stacks, so propagating the error code via return values doesn't sound like the best fix. It seems better to check rc->buffer_size (which holds the return value of rc->fill()) in the main loop. It does nothing bad that the code runs a little with unknown data after a failed rc->fill(). This fixes an infinite loop in initramfs decompression if the LZMA-compressed initramfs image is corrupt. Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
Validation of header.pos calls error() but doesn't make the function return to indicate an error to the caller. Instead the decoding is attempted with invalid header.pos. This fixes it. Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
Currently users of mm.h need to include <linux/slab.h> to use the macros malloc() and free() provided by mm.h. This fixes it. Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
set_error_fn() has become a useless complication after c1e7c3ae ("bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure") fixed the use of error() in malloc(). Only decompress_unlzma.c had some use for it and that was easy to change too. This also gets rid of the static function pointer "error", which should have been marked as __initdata. Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lasse Collin 提交于
Signed-off-by: NLasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Frysinger 提交于
This header uses things like __be32, so pull in linux/types.h. Further, it uses BLOCK_SIZE, so pull in linux/fs.h. Signed-off-by: NMike Frysinger <vapier@gentoo.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Cc: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefani Seibold 提交于
Generate a unique inode numbers for any entries in the cram file system. For files which did not contain data's (device nodes, fifos and sockets) the offset of the directory entry inside the cramfs plus 1 will be used as inode number. The + 1 for the inode will it make possible to distinguish between a file which contains no data and files which has data, the later one has a inode value where the lower two bits are always 0. It also reimplements the behavior to set the size and the number of block to 0 for special file, which is the right value for empty files, devices, fifos and sockets As a little benefit it will be also more compatible which older mkcramfs, because it will never use the cramfs_inode->offset for creating a inode number for special files. [akpm@linux-foundation.org: trivial comment fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NStefani Seibold <stefani@seibold.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Shishkin 提交于
Currently, 3 kernel function prototypes are present in a header file exported to userland. This patch fixes it. Signed-off-by: NAlexander Shishkin <virtuoso@slind.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Moyer 提交于
aio_run_iocbs() is not used at all, so get rid of it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Namhyung Kim 提交于
'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecesary. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Torokhov 提交于
When hypervisor decides to decrease target balloon size while the balloon driver tries to lock pages hypervisor may respond with VMW_BALLOON_PPN_NOTNEEDED. Use this data and immediately stop reserving pages and wait for the next update cycle to fetch new target instead of continuing trying to lock pages until size of refused list grows above VMW_BALLOON_MAX_REFUSED (16) pages. As a result the driver stops bothering the hypervisor with its attempts to lock more pages that are not needed anymore. Most likely next order from hypervisor will be to reduce ballon size anyway. It is a small optimization. Signed-off-by: NDmitry Torokhov <dtor@vmware.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mika Laitio 提交于
This is a 1-wire/w1 DS2423 slave driver for reading the values from all 4 counters available DS2423 devices by using standard w1_slave file. In ds2423 the counters are tied to ram pages 12-15 in and each of those ram-pages. Each of these counter values (and asoociated ram page values) are represented as a own line in w1_slave file. Driver has been tested on mips and x86. usage example: cat /sys/bus/w1/devices/1d-00000009b964/w1_slave 00 02 00 00 00 00 00 00 00 6d 38 00 ff ff 00 00 fe ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff crc=YES c=2 00 02 00 00 00 00 00 00 00 e0 1f 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff crc=YES c=2 00 5a 0e 5f 18 00 00 00 00 0b 28 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff 00 00 ff ff crc=YES c=408882778 00 05 00 00 00 00 00 00 00 8d 39 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff crc=YES c=5 Patch includes also the documentation. [randy.dunlap@oracle.com: fix ds2423 build, needs to select CRC16] Signed-off-by: NMika Laitio <lamikr@pilppa.org> Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Dubov 提交于
Apart from currently used standard memstick data transfer method, Sony introduced several newer ones, to uncover full bandwidth/capacity of its Pro, HG and XC media formats. This patch lays a foundation to enable those methods as made possible by host/media capabilities. As a side effect of this patch, mspro_block_read_attributes became more streamlined and readable. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: NAlex Dubov <oakad@yahoo.com> Reported-by: NMaxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Dubov 提交于
mspro_block_mutex is identical in scope to mspro_block_disk_lock and therefore unnecessary. Signed-off-by: NAlex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Dubov 提交于
Implement the usual pattern around idr_pre_get() and idr_get_new() to handlethe situation where another thread concurrently steals this thread's idr_pre_get() preallocation. Signed-off-by: NAlex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Takashi Iwai 提交于
Signed-off-by: NAries Lee <arieslee@jmicron.com> Signed-off-by: NTakashi Iwai <tiwai@suse.de> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Takashi Iwai 提交于
Add a function jmb38x_ms_pmos() to enable / disable PMOS setups for JMicron 38x controllers. Signed-off-by: NAries Lee <arieslee@jmicron.com> Signed-off-by: NTakashi Iwai <tiwai@suse.de> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Takashi Iwai 提交于
This patch corrects the definition of clock values for JMicron 38x controllers and sets the value properly per interface type. Also, it adds a check for TPC errors in the interrupt handler. Signed-off-by: NAries Lee <arieslee@jmicron.com> Signed-off-by: NTakashi Iwai <tiwai@suse.de> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vasiliy Kulikov 提交于
If device_register() fails then call put_device(). See comment to device_register. Signed-off-by: NVasiliy Kulikov <segooon@gmail.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Add PPS signal generator which utilizes STROBE pin of a parallel port to send PPS signals. It uses parport abstraction layer and hrtimers to precisely control the signal. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Rodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Add PPS signal generator which utilizes STROBE pin of a parallel port to send PPS signals. It uses parport abstraction layer and hrtimers to precisely control the signal. [akpm@linux-foundation.org: fix build] Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Add parallel port PPS client. It uses a standard method for capturing timestamps for assert edge transitions: getting a timestamp soon after an interrupt has happened. This is not a very precise source of time information due to interrupt handling delays. However, timestamps for clear edge transitions are much more precise because the interrupt handler continuously polls hardware port until the transition is done. Hardware port operations require only about 1us so the maximum error should not exceed this value. This was my primary goal when developing this client. Clear edge capture could be disabled using clear_wait parameter. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Add an optional feature of PPSAPI, kernel consumer support, which uses the added hardpps() function. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
MONOTONIC_RAW clock timestamps are ideally suited for frequency calculation and also fit well into the original NTP hardpps design. Now phase and frequency can be adjusted separately: the former based on REALTIME clock and the latter based on MONOTONIC_RAW clock. A new function getnstime_raw_and_real is added to timekeeping subsystem to capture both timestamps at the same time and atomically. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
This commit adds hardpps() implementation based upon the original one from the NTPv4 reference kernel code from David Mills. However, it is highly optimized towards very fast syncronization and maximum stickness to PPS signal. The typical error is less then a microsecond. To make it sync faster I had to throw away exponential phase filter so that the full phase offset is corrected immediately. Then I also had to throw away median phase filter because it gives a bigger error itself if used without exponential filter. Maybe we will find an appropriate filtering scheme in the future but it's not necessary if the signal quality is ok. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Remove the code that gatheres timestamp in pps_tty_dcd_change() in case passed ts parameter is NULL because it never happens in the current code. Fix comments as well. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Bitwise conjunction is distributive so we can simplify some conditions. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
This way less overhead is involved when running production kernel. If you want to debug a pps client module please define DEBUG to enable the checks. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Now pps_idr_lock is never used in interrupt context so we can replace spin_lock_irq/spin_unlock_irq with plain spin_lock/spin_unlock. But there is also a potential race condition when someone can steal an id which was allocated by idr_pre_get before it is used. So convert spin lock to mutex and protect the whole id generation process. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Rodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Since now idr is only used to manage char device id's and not used in kernel API anymore it should be moved to pps.c. This also makes it possible to release id only at actual device freeing so nobody can register a pps device with the same id while our device is not freed yet. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Since we now have direct pointers to struct pps_device everywhere it's easy to use dev_* functions to print messages instead of plain printks. Where dev_* cannot be used printks are converted to pr_*. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Using device index as a pointer needs some unnecessary work to be done every time the pointer is needed (in irq handler for example). Using a direct pointer is much more easy (and safe as well). Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Add a helper function to gather timestamps. This way clients don't have to duplicate it. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
There was a race in PPS_FETCH ioctl handler when several processes want to obtain PPS data simultaneously using sleeping PPS_FETCH. They all sleep most of the time in the system call. With the old approach when the first process waiting on the pps queue is waken up it makes new system call right away and zeroes pps->go. So other processes continue to sleep. This is a clear race condition because of the global 'go' variable. With the new approach pps->last_ev holds some value increasing at each PPS event. PPS_FETCH ioctl handler saves current value to the local variable at the very beginning so it can safely check that there is a new event by just comparing both variables. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Move variable declarations where they are used in pps_cdev_ioctl. Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Gordeev 提交于
Here are some very trivial fixes combined: - add macro definitions to protect header file from including several times - remove declaration for an unexistent array - fix typos Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su> Acked-by: NRodolfo Giometti <giometti@linux.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Mahoney 提交于
Commit 4be2c95d ("taskstats: pad taskstats netlink response for aligment issues on ia64") added a null field to align the taskstats structure but the discussion centered around ia64. The issue exists on other platforms with inefficient unaligned access and adding them piecemeal would be an unmaintainable mess. This patch uses Dave Miller's suggestion of using a combination of CONFIG_64BIT && !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine whether alignment is needed. Note that this will cause breakage on those platforms with applications like iotop which had hard-coded offsets into the packet to access the taskstats structure. The message seen on systems without the alignment fixes looks like: kernel unaligned access to 0xe000023879dca9bc, ip=0xa000000100133d10 The addresses may vary but resolve to locations inside __delayacct_add_tsk. iotop makes what I'd call unreasonable assumptions about the contents of a netlink genetlink packet containing generic attributes. They're typed and have headers that specify value lengths, so the client can (should) identify and skip the ones the client doesn't understand. The kernel, as of version 2.6.36, presented a packet like so: +--------------------------------+ | genlmsghdr - 4 bytes | +--------------------------------+ | NLA header - 4 bytes | /* Aggregate header */ +-+------------------------------+ | | NLA header - 4 bytes | /* PID header */ | +------------------------------+ | | pid/tgid - 4 bytes | | +------------------------------+ | | NLA header - 4 bytes | /* stats header */ | + -----------------------------+ <- oops. aligned on 4 byte boundary | | struct taskstats - 328 bytes | +-+------------------------------+ The iotop code expects that the kernel will behave as it did then, assuming that the packet format is set in stone. The format is set in stone, but the packet offsets are not. There's nothing in the packet format that guarantees that the packet will be sent in exactly the same way. The attribute contents are set (or versioned) and the aggregate contents are set but they can be anywhere in the packet. The issue here isn't that an unaligned structure gets passed to userspace, it's that the NLA infrastructure has something of a weakness: The 4 byte attribute header may force the payload to be unaligned. The taskstats structure is created at an unaligned location and then 64-bit values are operated on inside the kernel, so the unaligned access warnings gets spewed everywhere. It's possible to use the unaligned access API to operate on the structure in the kernel but it seems like a wasted effort to work around userspace code that isn't following the packet format. Any new additions would also need the be worked around. It's a maintenance nightmare. The conclusion of the earlier discussion seemed to be "ok fine, if we have to break it, don't break it on arches that don't have the problem." Dave pointed out that the unaligned access problem doesn't only exist on ia64, but also on other 64-bit arches that don't have efficient unaligned access and it should be fixed there as well. The committed version of the patch and this addition keep with the conclusion of that discussion not to break it unnecessarily, which the pid padding and the packet padding fixes did do. x86_64 and powerpc don't suffer this problem so they shouldn't suffer the solution. Other 64-bit architectures do and will, though. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reported-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NDavid S. Miller <davem@davemloft.net> Cc: Dan Carpenter <error27@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Florian Mickler <florian@mickler.org> Cc: Guillaume Chazarain <guichaz@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-