- 24 Jan, 2014 2 commits
-
-
Committed by Mike Frysinger
This header uses enum NPmode but doesn't include ppp_defs.h. If you try to use this header without including the defs header first, it leads to a build failure. So add the explicit include to fix it. Don't know of any packages directly impacted, but noticed while building some ppp code by hand.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Geert Uytterhoeven
Now that all 64-bit architectures have been converted to int-ll64.h, we can remove int-l64.h in kernelspace. For backwards compatibility, alpha, ia64, mips64, and powerpc64 still use int-l64.h in userspace.

This is the (reworked for UAPI) non-documentation part of the more than two year old "asm/types.h: All architectures use int-ll64.h in kernelspace" (https://lkml.org/lkml/2011/8/13/104).

Since <asm/types.h> (from include/uapi/asm-generic/types.h) is used for both kernel and user space, include/asm-generic/int-ll64.h cannot just become include/asm-generic/types.h, as Arnd suggested.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Jan, 2014 2 commits
-
-
Committed by Li Zhong
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
Committed by Dongmao Zhang
In a cluster environment, cluster writes perform poorly because userspace_flush() has to contact a userspace program (cmirrord) for clear/mark/flush requests. Both mark and flush requests require cmirrord to communicate the message to all the cluster nodes for each flush call, which is really slow.

To address this we now merge mark and flush requests together to reduce the kernel-userspace-kernel time. We allow a new directive, "integrated_flush", that can be used to instruct the kernel log code to combine flush and mark requests when directed by userspace. If not directed by userspace (due to an older version of the userspace code perhaps), the kernel will function as it did previously, preserving backwards compatibility.

Additionally, flush requests are performed lazily when only clear requests exist.

Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 17 Jan, 2014 1 commit
-
-
Committed by Vadim Rozenfeld
Signed-off: Peter Lieven <pl@kamp.de>
Signed-off: Gleb Natapov
Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>

After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready.

v1 -> v2
1. mark TSC page dirty as suggested by Eric Northup <digitaleric@google.com> and Gleb
2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <pl@amp.de>
3. move check for TSC page enable from second patch to this one.

v3 -> v4
Get rid of ref counter offset.

v4 -> v5
replace __copy_to_user with kvm_write_guest when updating iTSC page.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 16 Jan, 2014 1 commit
-
-
Committed by Peter Zijlstra
I noticed the new sched_{set,get}attr() calls didn't properly deal with the SCHED_RESET_ON_FORK hack.

Instead of propagating the flags in high bits nonsense, use the brand spanking new attr::sched_flags field.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Link: http://lkml.kernel.org/r/20140115162242.GJ31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 13 Jan, 2014 1 commit
-
-
Committed by Dario Faggioli
Introduces the data structures, constants and symbols needed for the SCHED_DEADLINE implementation.

The core data structures of SCHED_DEADLINE are defined, along with their initializers. Hooks for checking if a task belongs to the new policy are also added where they are needed.

Adds a scheduling class, in sched/dl.c, and a new policy called SCHED_DEADLINE. It is an implementation of the Earliest Deadline First (EDF) scheduling algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS) that makes it possible to isolate the behaviour of tasks between each other.

The typical -deadline task will be made up of a computation phase (instance) which is activated in a periodic or sporadic fashion. The expected (maximum) duration of such computation is called the task's runtime; the time interval by which each instance needs to be completed is called the task's relative deadline. The task's absolute deadline is dynamically calculated as the time instant a task (better, an instance) activates plus the relative deadline.

The EDF algorithm selects the task with the smallest absolute deadline as the one to be executed first, while the CBS ensures that each task runs for at most its runtime every (relative) deadline length time interval, avoiding any interference between different tasks (bandwidth isolation). Thanks to this feature, tasks that do not strictly comply with the computational model sketched above can also effectively use the new policy.

To summarize, this patch:
- introduces the data structures, constants and symbols needed;
- implements the core logic of the scheduling algorithm in the new scheduling class file;
- provides all the glue code between the new scheduling class and the core scheduler and refines the interactions between sched/dl and the other existing scheduling classes.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
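The EDF selection rule described in the SCHED_DEADLINE commit above can be illustrated with a small userspace sketch. This is not the kernel code: `struct dl_task`, `absolute_deadline()` and `edf_pick_next()` are hypothetical names chosen for illustration, time is an abstract integer, and the CBS runtime enforcement is not modeled at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task descriptor (not the kernel's data structure). */
struct dl_task {
    uint64_t activation;        /* instant at which the current instance activated */
    uint64_t relative_deadline; /* interval within which the instance must complete */
    uint64_t runtime;           /* expected (maximum) computation time; unused here */
};

/* Absolute deadline = activation instant + relative deadline. */
static uint64_t absolute_deadline(const struct dl_task *t)
{
    return t->activation + t->relative_deadline;
}

/*
 * EDF: return the index of the task with the smallest absolute deadline,
 * or -1 when there are no tasks. Ties go to the lowest index.
 */
static int edf_pick_next(const struct dl_task *tasks, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            absolute_deadline(&tasks[i]) < absolute_deadline(&tasks[best]))
            best = (int)i;
    }
    return best;
}
```

For example, with three tasks activated at 0, 20 and 10 and relative deadlines of 100, 30 and 60, the absolute deadlines are 100, 50 and 70, so the second task is picked first.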
-
- 12 Jan, 2014 1 commit
-
-
Committed by Yann Droneaud
Unlike recent modern userspace APIs such as:
- epoll_create1 (EPOLL_CLOEXEC),
- eventfd (EFD_CLOEXEC),
- fanotify_init (FAN_CLOEXEC),
- inotify_init1 (IN_CLOEXEC),
- signalfd (SFD_CLOEXEC),
- timerfd_create (TFD_CLOEXEC),
- or the venerable general purpose open (O_CLOEXEC),

the perf_event_open() syscall lacks a flag to atomically set the FD_CLOEXEC (i.e. close-on-exec) flag on the file descriptor it returns to userspace.

The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow the perf_event_open() syscall to atomically set close-on-exec.

Having this flag will enable userspace to remove the file descriptor from the list of file descriptors being inherited across exec, without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the associated race condition between the current thread and another thread calling fork(2) then execve(2).

Links:
- Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html
- Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html
- Notes in DMA buffer sharing: leak and security hole http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
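The difference between the atomic and the two-step way of setting close-on-exec can be sketched with plain open(2), which the commit above lists as already supporting O_CLOEXEC. The sketch deliberately does not call perf_event_open() itself, since that needs extra setup and privileges; the helper names `open_cloexec()`, `open_then_set_cloexec()` and `fd_is_cloexec()` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes O_CLOEXEC visible under strict -std= modes */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Atomic variant: the descriptor is born with FD_CLOEXEC already set. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/*
 * Two-step variant: between open() and fcntl() another thread may call
 * fork(2) then execve(2), and the child inherits the descriptor.
 */
static int open_then_set_cloexec(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* race window is here */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Both helpers end up with the flag set; only the first one closes the window in which a concurrent fork/exec could leak the descriptor, which is the property PERF_FLAG_FD_CLOEXEC is meant to bring to perf_event_open().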
-
- 08 Jan, 2014 2 commits
-
-
Committed by Steven Whitehouse
This patch adds four new fields to directory leaf blocks. The intent is not to use them in the kernel itself, although perhaps we may be able to use them as hints at some later date, but instead to provide more information for debug/fsck use.

One new field adds a pointer to the inode to which the leaf belongs. This can be useful if the pointer to the leaf block has become corrupt, as it will allow us to know which inode this block should be associated with. This field is set when the leaf is created and never changed over its lifetime.

The second field is a "distance from the hash table" field. The meaning is as follows:
0 = An old leaf in which this value has not been set
1 = This leaf is pointed to directly from the hash table
2+ = This leaf is part of a chain, pointed to by another leaf block, the value gives the position in the chain.

The third and fourth fields combine to give a time stamp of the most recent directory insertion or deletion from this leaf block. The time stamp is not updated when a new leaf block is chained from the current one. The code is currently written such that the timestamp on the dir inode will match that of the leaf block for the most recent insertion/deletion.

For backwards compatibility, any of these new fields which is zero should be considered to be "unknown".

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
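A debug or fsck tool could interpret the "distance from the hash table" value from the directory leaf commit above roughly as follows. This is only a sketch of the decoding rule stated in that message; the enum and function names are hypothetical, and the value is assumed to have already been converted to host byte order.

```c
#include <stdint.h>

/* Hypothetical classification of a leaf's position, per the commit message. */
enum leaf_position {
    LEAF_POS_UNKNOWN, /* 0: old leaf, value never set */
    LEAF_POS_DIRECT,  /* 1: pointed to directly from the hash table */
    LEAF_POS_CHAINED  /* 2+: part of a chain; value is the chain position */
};

/* Map the raw distance field (host byte order) to its meaning. */
static enum leaf_position classify_leaf_distance(uint32_t dist)
{
    if (dist == 0)
        return LEAF_POS_UNKNOWN;
    if (dist == 1)
        return LEAF_POS_DIRECT;
    return LEAF_POS_CHAINED;
}
```

Treating zero as "unknown" rather than as an error follows the backwards-compatibility rule given in the message.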
Committed by Vinod Koul
This gives the ability to convey the valid values of supported rates in the sample_rates array.

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
- 05 Jan, 2014 2 commits
-
-
Committed by Vinod Koul
Now that we don't use SNDRV_PCM_RATE_xxx bit fields for the sample rate, we need to change the description to an array for describing the sample rates supported by the sink/source.

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Committed by Vinod Koul
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
- 23 Dec, 2013 1 commit
-
-
Committed by Marek Olšák
This will allow userspace to correctly program the PA_SC_RASTER_CONFIG register, so it can be considered a fix.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
-
- 22 Dec, 2013 1 commit
-
-
Committed by Christoffer Dall
Support creating the ARM VGIC device through the KVM_CREATE_DEVICE ioctl, which can then later be leveraged to use the KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in a more generic API than the ARM-specific one and for save/restore of VGIC state.

Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.

Note that we change the check for creating a VGIC from bailing out if any VCPUs were created, to bailing out if any VCPUs were ever run. This is an important distinction that shouldn't break anything, but allows creating the VGIC after the VCPUs have been created.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
-
- 21 Dec, 2013 1 commit
-
-
Committed by H.J. Lu
The x32 statfs system call is the same as the x86-64 statfs system call, which uses a 64-bit integer for __statfs_word. This patch defines __statfs_word as __kernel_long_t instead of long.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/CAMe9rOrcppHvC5g8U9n7D%2BpxVGdu1G598pge3Erfw7Pr-iEpAQ@mail.gmail.com
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 19 Dec, 2013 1 commit
-
-
Committed by Frank Haverkamp
Module initialization and PCIe setup. Card health monitoring and recovery functionality. Character device creation and deletion are controlled from here.

Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Co-authors: Joerg-Stephan Vogt <jsvogt@de.ibm.com>, Michael Jung <MIJUNG@de.ibm.com>, Michael Ruettger <michael@ibmra.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 Dec, 2013 2 commits
-
-
Committed by Alex Williamson
These are a set of two capability registers. It's pretty much given that they're registers, so reflect their purpose in the name.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Committed by Alex Williamson
While we don't really have any infrastructure for making use of VC support, the system BIOS can configure the topology to non-default VC values prior to boot. This may be due to silicon bugs, desire to reserve traffic classes, or perhaps just BIOS bugs. When we reset devices, the VC configuration may return to default values, which can be incompatible with devices upstream.

For instance, Nvidia GRID cards provide a PCIe switch and some number of GPUs, all supporting VC. The power-on default for VC is to support TC0-7 across VC0, however some platforms will only enable TC0/VC0 mapping across the topology. When we do a secondary bus reset on the downstream switch port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end of the link only enables TC0/VC0. If the GPU attempts to use TC1-7, it fails.

This patch attempts to provide complete support for VC save/restore, even beyond the minimally required use case above. This includes save/restore and reload of the arbitration table, save/restore and reload of the port arbitration tables, and re-enabling of the channels for VC, VC9, and MFVC capabilities.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 17 Dec, 2013 1 commit
-
-
Committed by Vince Weaver
Commit fdfbbd07 ("perf: Add generic transaction flags") added support for PERF_SAMPLE_TRANSACTION but forgot to add documentation for the sample type to include/uapi/linux/perf_event.h.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1312131548450.10372@pianoman.cluster.toy
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 16 Dec, 2013 3 commits
-
-
Committed by Vinod Koul
The usage of SNDRV_RATES is not effective, as we can have rates like 12000 or some other ones used by decoders. This changes the usage to send the raw Hz values to the kernel.

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Committed by Rafał Miłecki
Some devices with support for mobile networks may have buttons for enabling/disabling such a connection. An example is the Linksys router 54G3G.

We already have KEY_BLUETOOTH, KEY_WLAN and KEY_UWB, so it makes sense to add KEY_WWAN as well. As we already have KEY_WIMAX, use its value for KEY_WWAN and make it an alias.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Committed by Bjorn Helgaas
Add symbolic constants for the PCIe Slot Control indicator and power control fields defined by the spec, and use them instead of open-coded hex constants.

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 10 Dec, 2013 1 commit
-
-
Committed by Takashi Iwai
snd_pcm_uframes_t is defined as unsigned long, so it takes different sizes depending on whether the architecture is 32-bit or 64-bit. As we don't want this ABI incompatibility, and there is no real 64-bit user yet, let's make it a fixed size with __u32.

Also bump the protocol version number to 0.1.2.

Acked-by: Vinod Koul <vinod.koul@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
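The ABI problem described in the snd_pcm_uframes_t commit above can be seen by comparing a structure that uses unsigned long with one that uses fixed 32-bit fields. The two structures below are hypothetical stand-ins, not the real ALSA definitions, and uint32_t plays the role of the kernel's __u32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a long-sized counter: size depends on the ABI. */
struct demo_tstamp_native {
    uint32_t byte_offset;
    unsigned long copied_total; /* 4 bytes on ILP32, 8 bytes on LP64 */
    unsigned long pcm_frames;
};

/* Hypothetical fixed-size layout: identical for 32-bit and 64-bit userspace. */
struct demo_tstamp_fixed {
    uint32_t byte_offset;
    uint32_t copied_total;
    uint32_t pcm_frames;
};
```

On a typical LP64 system the first structure is larger than on a 32-bit one, so a 32-bit program talking to a 64-bit kernel would disagree about the layout, while the second is 12 bytes everywhere.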
-
- 09 Dec, 2013 1 commit
-
-
Committed by Jakob Bornecrantz
Userspace uses this to work around overcommit issues by flushing the command stream early.

Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
-
- 08 Dec, 2013 1 commit
-
-
Committed by Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
-
- 06 Dec, 2013 1 commit
-
-
Committed by Ping Cheng
Some devices, such as new Intuos series tablets, have a hardware switch to turn touch data on/off. To report the state, SW_MUTE_DEVICE is added in include/uapi/linux/input.h.

Reviewed-by: Chris Bagwell <chris@cnpbagwell.com>
Acked-by: Peter Hutterer <peter.hutterer@who-t.net>
Tested-by: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 03 Dec, 2013 1 commit
-
-
Committed by Amit Pundir
Drop EPOLLWAKEUP from the epoll events mask if CONFIG_PM_SLEEP is disabled.

Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 01 Dec, 2013 1 commit
-
-
Committed by Arvid Brodin
This implements the rtnl_link_ops fill_info routine for HSR.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 Nov, 2013 2 commits
-
-
Committed by Johannes Berg
The pmcraid driver is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.)

Make it use the correct API, but since this may already be used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions.

My previous patch broke event delivery in the driver as I missed that it wasn't using the right API and forgot to update it later in my series.

While changing this, I noticed that the genetlink code could use the static group ID instead of a strcmp(), so also do that for the VFS_DQUOT family.

Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Nicolas Dichtel
The first netlink attribute (value 0) must always be defined as none/unspec. This is correctly done in inet_diag.h, but other diag interfaces are wrong.

Because we cannot change an existing API, I add a comment to point out the mistake and avoid propagating it into a new diag API in the future.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Nov, 2013 2 commits
-
-
Committed by Ashutosh Dixit
Endianness issues are now consistent as per the documentation in host/mic_virtio.h. Sparse warnings related to endianness are also fixed. Note that the MIC driver implementation assumes that the host can be either BE or LE whereas the card is always LE.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Ashutosh Dixit
Avoid declaring ALIGN() and __aligned() in include/uapi/linux/mic_common.h since they pollute the user space namespace. Also, mic_aligned_size() can simply be replaced by sizeof() since all structures where mic_aligned_size() is used are declared using __attribute__ ((aligned(8)));

--
From mail from H Peter Anvin about this:

On Fri, Nov 08, 2013 H Peter Anvin <h.peter.anvin@intel.com> wrote:
Subject: Namespace pollution in mic_common.h

This puts two macros, ALIGN() and __aligned(), into arbitrary user space namespace. This really isn't safe or acceptable, especially since those symbols are highly generic.
...
When these structures are forced-aligned, they will in fact have padding automatically added by the compiler to an 8-byte boundary anyway, so mic_aligned_size() does nothing.
...

Reported-by: H Peter Anvin <h.peter.anvin@intel.com>
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
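The point made in the quoted mail above, that rounding the size of a force-aligned structure up to 8 bytes does nothing, can be checked with a small sketch. It assumes a GCC or Clang compatible compiler for the attribute syntax; `struct demo_desc` and `round_up_8()` are hypothetical stand-ins, with `round_up_8()` only approximating what a helper like mic_aligned_size() would compute.

```c
#include <stddef.h>
#include <stdint.h>

/* Members need 9 bytes (12 with natural padding); aligned(8) pads to 16. */
struct demo_desc {
    uint32_t len;
    uint32_t flags;
    uint8_t type;
} __attribute__((aligned(8)));

/* Hypothetical helper: round n up to the next multiple of 8. */
static size_t round_up_8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}
```

Because the compiler already pads the structure to a multiple of its alignment, `round_up_8(sizeof(struct demo_desc))` equals `sizeof(struct demo_desc)`, so plain sizeof() is enough.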
-
- 27 Nov, 2013 1 commit
-
-
Committed by Pali Rohár
Many notebooks have a special button for enabling/disabling the ambient light sensor.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 26 Nov, 2013 2 commits
-
-
Committed by Geert Uytterhoeven
Fix member definitions for non-native userspace handling:
- All multi-byte values are big-endian, hence use __be*,
- All pointers are 32-bit pointers under AmigaOS, but unused (except for cd_BoardAddr) under Linux, hence use __be32.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
-
Committed by Geert Uytterhoeven
The Zorro definitions and device IDs are used by bootstraps, hence they should be exported through UAPI. Unfortunately zorro.h was never marked for export when headers_install was introduced, so it was forgotten during the big UAPI disintegration.

In addition, the removal of zorro_ids.h had been sneaked into commit 7e7a43c3 ("PCI: don't export device IDs to userspace") before, so it was also forgotten. Split off and export the Zorro definitions used by bootstraps.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
-
- 20 Nov, 2013 1 commit
-
-
Committed by Johannes Berg
The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.)

Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Nov, 2013 1 commit
-
-
Committed by Aurelien Jarno
linux/raid/md_p.h uses conditionals depending on endianness and fails with an error if none of __BIG_ENDIAN, __LITTLE_ENDIAN or __BYTE_ORDER is defined, but it doesn't include any header which can define these constants. This makes this header unusable alone.

This patch adds a #include <asm/byteorder.h> at the beginning of this header to make it usable alone. This is needed to compile klibc on MIPS.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 18 Nov, 2013 3 commits
-
-
Committed by Michel Dänzer
This is required to properly calculate the tiling parameters in userspace.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Matan Barak
This commit reverts commit 7afbddfa ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Yann Droneaud
Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands, while the later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through uverbs") exported the ib_create_flow()/ib_destroy_flow() functions using this new infrastructure.

According to commit 400dbc96, the purpose of this infrastructure is to support passing around provider (i.e. hardware) specific buffers when userspace issues commands to the kernel, so that it would be possible to extend uverbs (i.e. core) buffers independently from the provider buffers.

But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1].

So the following patch is an attempt at a revised extensible command infrastructure.

This improved extensible command infrastructure distinguishes between the core (i.e. legacy) command/response buffers and the provider (i.e. hardware) command/response buffers: each extended command implementing function is given a struct ib_udata to hold the core (i.e. uverbs) input and output buffers, and another struct ib_udata to hold the hw (i.e. provider) input and output buffers.

Having those buffers identified separately makes it easier to increase one buffer to support an extension without having to add some code to guess the exact size of each command/response part. This should make the extended functions more reliable.

Additionally, instead of relying on the command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on unused bits in the command field: of the 32 bits provided by the command field, only 6 bits are really needed to encode the identifier of the commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands).
So this patch makes use of some high order bits in the command field to store flags, leaving enough room for more command identifiers than one will ever need (e.g. 256).

The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make the usage of flags itself extensible.

Using high order bits of the command field ensures that a newer libibverbs on an older kernel will properly fail when trying to call extended commands. On the other hand, an older libibverbs on a newer kernel will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so that the output buffer length and output buffer pointer are located together in the command, allowing proper parameter checking. This should make implementing functions easier and safer.

Additionally the extended header ensures 64-bit alignment, while making all sizes multiples of 8 bytes, extending the maximum buffer size:

                             legacy      extended

    Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
    Maximum response buffer: 256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the header sizes are no longer taken into account in "in_words".

One of the oddities of the current extensible infrastructure, reading the "legacy" command header twice, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command, memory is read once and information is not duplicated. This makes it clear that this is an extended command scheme and not a different command scheme.
The proposed scheme will format input (command) and output (response) buffers this way:

- command: legacy header + extended header + command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is itself extensible while being more reliable with more input and bound checking.

Note: The unused field in the extended header would be a perfect candidate to hold the command "comp_mask" (i.e. a bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But the "comp_mask" field is likely to be present in the uverbs input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header.

[1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
[2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
[3]: http://marc.info/?i=525C1149.6000701@mellanox.com

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
[ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
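The command word layout described in the uverbs commit above (identifier in the low bits, flags in the high order byte, sizes counted in 8-byte words) can be sketched as follows. The macro names, the 8-bit identifier mask and the 0x80 "extended" flag value are assumptions made for this illustration; they are not taken from the commit message and may differ from the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: | flags (8 bits) | 00 00 | command id (8 bits) | */
#define DEMO_CMD_ID_MASK       0x000000ffu
#define DEMO_CMD_FLAGS_MASK    0xff000000u
#define DEMO_CMD_FLAGS_SHIFT   24
#define DEMO_CMD_FLAG_EXTENDED 0x80u /* hypothetical "extended command" flag */

/* Pack an identifier and flags into the 32-bit command field. */
static uint32_t cmd_encode(uint32_t id, uint32_t flags)
{
    return ((flags << DEMO_CMD_FLAGS_SHIFT) & DEMO_CMD_FLAGS_MASK) |
           (id & DEMO_CMD_ID_MASK);
}

/* Extract the command identifier from the low bits. */
static uint32_t cmd_id(uint32_t command)
{
    return command & DEMO_CMD_ID_MASK;
}

/* Extract the flags from the high order byte. */
static uint32_t cmd_flags(uint32_t command)
{
    return (command & DEMO_CMD_FLAGS_MASK) >> DEMO_CMD_FLAGS_SHIFT;
}

/* Non-zero when the command must be processed as an extended one. */
static int cmd_is_extended(uint32_t command)
{
    return (cmd_flags(command) & DEMO_CMD_FLAG_EXTENDED) != 0;
}

/* Extended sizes are counted in 8-byte words, as in the diagram. */
static size_t ext_words_to_bytes(uint16_t words)
{
    return (size_t)words * 8;
}
```

With 16-bit word counts, the largest extended buffer is 65535 * 8 = 524280 bytes, which matches the roughly 512 KBytes per part quoted in the message. An older kernel that compares the whole command field against its known identifiers would reject a value with the high flag bits set, which is the "properly fail" behaviour the message relies on.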
-