- 02 4月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
BPF programs, attached to kprobes, provide a safe way to execute user-defined BPF byte-code programs without being able to crash or hang the kernel in any way. The BPF engine makes sure that such programs have a finite execution time and that they cannot break out of their sandbox. The user interface is to attach to a kprobe via the perf syscall: struct perf_event_attr attr = { .type = PERF_TYPE_TRACEPOINT, .config = event_id, ... }; event_fd = perf_event_open(&attr,...); ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); 'prog_fd' is a file descriptor associated with BPF program previously loaded. 'event_id' is an ID of the kprobe created. Closing 'event_fd': close(event_fd); ... automatically detaches BPF program from it. BPF programs can call in-kernel helper functions to: - lookup/update/delete elements in maps - probe_read - wraper of probe_kernel_read() used to access any kernel data structures BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is architecture dependent) and return 0 to ignore the event and 1 to store kprobe event into the ring buffer. Note, kprobes are a fundamentally _not_ a stable kernel ABI, so BPF programs attached to kprobes must be recompiled for every kernel version and user must supply correct LINUX_VERSION_CODE in attr.kern_version during bpf_prog_load() call. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 3月, 2015 1 次提交
-
-
由 Peter Zijlstra 提交于
While thinking on the whole clock discussion it occurred to me we have two distinct uses of time: 1) the tracking of event/ctx/cgroup enabled/running/stopped times which includes the self-monitoring support in struct perf_event_mmap_page. 2) the actual timestamps visible in the data records. And we've been conflating them. The first is all about tracking time deltas, nobody should really care in what time base that happens, its all relative information, as long as its internally consistent it works. The second however is what people are worried about when having to merge their data with external sources. And here we have the discussion on MONOTONIC vs MONOTONIC_RAW etc.. Where MONOTONIC is good for correlating between machines (static offset), MONOTNIC_RAW is required for correlating against a fixed rate hardware clock. This means configurability; now 1) makes that hard because it needs to be internally consistent across groups of unrelated events; which is why we had to have a global perf_clock(). However, for 2) it doesn't really matter, perf itself doesn't care what it writes into the buffer. The below patch makes the distinction between these two cases by adding perf_event_clock() which is used for the second case. It further makes this configurable on a per-event basis, but adds a few sanity checks such that we cannot combine events with different clocks in confusing ways. And since we then have per-event configurability we might as well retain the 'legacy' behaviour as a default. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 3月, 2015 1 次提交
-
-
由 Peter Zijlstra 提交于
The only reason CQM had to use a hard-coded pmu type was so it could use cqm_target in hw_perf_event. Do away with the {tp,bp,cqm}_target pointers and provide a non type specific one. This allows us to do away with that silly pmu type as well. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vince@deater.net> Cc: acme@kernel.org Cc: acme@redhat.com Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: kanaka.d.juvva@intel.com Cc: matt.fleming@intel.com Cc: tglx@linutronix.de Cc: torvalds@linux-foundation.org Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 3月, 2015 1 次提交
-
-
由 Michael S. Tsirkin 提交于
QEMU wants to use virtio scsi structures with a different VIRTIO_SCSI_CDB_SIZE/VIRTIO_SCSI_SENSE_SIZE, let's add ifdefs to allow overriding them. Keep the old defines under new names: VIRTIO_SCSI_CDB_DEFAULT_SIZE/VIRTIO_SCSI_SENSE_DEFAULT_SIZE, since that's what these values really are: defaults for cdb/sense size fields. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 10 3月, 2015 2 次提交
-
-
由 Michael S. Tsirkin 提交于
Fix up comment to match virtio 1.0 logic: virtio_blk_outhdr isn't the first elements anymore, the only requirement is that it comes first in the s/g list. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Michael S. Tsirkin 提交于
Now that QEmu reuses linux virtio headers, we noticed a typo in the exported virtio block header. Fix it up. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 07 3月, 2015 1 次提交
-
-
由 Peter Hurley 提交于
ioctl(TIOCGSERIAL|TIOCSSERIAL) report and can change the port->iotype. UART drivers use the UPIO_* definitions, but the uapi header defines parallel values and userspace uses these parallel values for ioctls; thus the userspace values are definitive. Define UPIO_* iotypes in terms of the uapi defines, SERIAL_IO_*; extend the uapi defines to include all values in use by the serial core. Signed-off-by: NPeter Hurley <peter@hurleysoftware.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 2月, 2015 1 次提交
-
-
由 Matt Fleming 提交于
Add support for task events as well as system-wide events. This change has a big impact on the way that we gather LLC occupancy values in intel_cqm_event_read(). Currently, for system-wide (per-cpu) events we defer processing to userspace which knows how to discard all but one cpu result per package. Things aren't so simple for task events because we need to do the value aggregation ourselves. To do this, we defer updating the LLC occupancy value in event->count from intel_cqm_event_read() and do an SMP cross-call to read values for all packages in intel_cqm_event_count(). We need to ensure that we only do this for one task event per cache group, otherwise we'll report duplicate values. If we're a system-wide event we want to fallback to the default perf_event_count() implementation. Refactor this into a common function so that we don't duplicate the code. Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an event's task (if the event isn't per-cpu) inside of the Intel CQM PMU driver. This task information is only availble in the upper layers of the perf infrastructure. Other perf backends stash the target task in event->hw.*target so we need to do something similar. The task is used to determine whether events should share a cache group and an RMID. Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 2月, 2015 1 次提交
-
-
由 Jamal Hadi Salim 提交于
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 2月, 2015 2 次提交
-
-
由 Keith Busch 提交于
The original translation created collisions on Inquiry VPD 83 for many existing devices. Newer specifications provide other ways to translate based on the device's version can be used to create unique identifiers. Version 1.1 provides an EUI64 field that uniquely identifies each namespace, and 1.2 added the longer NGUID field for the same reason. Both follow the IEEE EUI format and readily translate to the SCSI device identification EUI designator type 2h. For devices implementing either, the translation will use this type, defaulting to the EUI64 8-byte type if implemented then NGUID's 16 byte version if not. If neither are provided, the 1.0 translation is used, and is updated to use the SCSI String format to guarantee a unique identifier. Knowing when to use the new fields depends on the nvme controller's revision. The NVME_VS macro was not decoding this correctly, so that is fixed in this patch and moved to a more appropriate place. Since the Identify Namespace structure required an update for the NGUID field, this patch adds the remaining new 1.2 fields to the structure. Signed-off-by: NKeith Busch <keith.busch@intel.com>
-
由 Keith Busch 提交于
Adds support for NVMe metadata formats and exposes block devices for all namespaces regardless of their format. Namespace formats that are unusable will have disk capacity set to 0, but a handle to the block device is created to simplify device management. A namespace is not usable when the format requires host interleave block and metadata in single buffer, has no provisioned storage, or has better data but failed to register with blk integrity. The namespace has to be scanned in two phases to support separate metadata formats. The first establishes the sector size and capacity prior to invoking add_disk. If metadata is required, the capacity will be temporarilly set to 0 until it can be revalidated and registered with the integrity extenstions after add_disk completes. The driver relies on the integrity extensions to provide the metadata buffer. NVMe requires this be a single physically contiguous region, so only one integrity segment is allowed per command. If the metadata is used for T10 PI, the driver provides mappings to save and restore the reftag physical block translation. The driver provides no-op functions for generate and verify if metadata is not used for protection information. This way the setup is always provided by the block layer. If a request does not supply a required metadata buffer, the command is failed with bad address. This could only happen if a user manually disables verify/generate on such a disk. The only exception to where this is okay is if the controller is capable of stripping/generating the metadata, which is possible on some types of formats. The metadata scatter gather list now occupies the spot in the nvme_iod that used to be used to link retryable IOD's, but we don't do that anymore, so the field was unused. Signed-off-by: NKeith Busch <keith.busch@intel.com>
-
- 19 2月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
With LBR call stack feature enable, there are three callchain options. Enable the 3rd callchain option (LBR callstack) to user space tooling. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20141105093759.GQ10501@worktop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
The index of lbr_sel_map is bit value of perf branch_sample_type. PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map uses 4096 bytes. By using bit shift as index, we can reduce lbr_sel_map size to 40 bytes. This patch defines 'bit shift' for branch types, and use 'bit shift' to define lbr_sel_maps. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@redhat.com Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1415156173-10035-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 2月, 2015 1 次提交
-
-
由 Geoff Levand 提交于
Remove the unneded declaration for a kexec_load() routine. Fixes errors like these when running 'make headers_check': include/uapi/linux/kexec.h: userspace cannot reference function or variable defined in the kernel Paul said: : The kexec_load declaration isn't very useful for userspace, see the patch : I submitted in http://lkml.kernel.org/r/1389791824.17407.9.camel@x220 . : And After my attempt the export of that declaration has also been : discussed in : http://lkml.kernel.org/r/115373b6ac68ee7a305975896e1c4971e8e51d4c.1408731991.git.geoff@infradead.org : : In that last discussion no one has been able to point to an actual user of : it. So, as far as I can tell, no one actually uses it. Which makes : sense, because including this header by itself doesn't give one access to : a useful definition of kexec_load. So why bother with the declaration? Signed-off-by: NGeoff Levand <geoff@infradead.org> Acked-by: NPaul Bolle <pebolle@tiscali.nl> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Maximilian Attems <max@stro.at> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 2月, 2015 2 次提交
-
-
由 Helge Deller 提交于
The parisc arch has been the only user of HP-UX SOM binaries. Support for HP-UX executables was never finished and since we now drop support for the HP-UX compat layer anyway, it does not makes sense to keep the BINFMT_SOM support. Cc: linux-fsdevel@vger.kernel.org Cc: linux-parisc@vger.kernel.org Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Rusty Russell 提交于
This was introduced in commit ed9ecb04, but only defined if !VIRTIO_NET_NO_LEGACY. We should always define it: easier for users to have conditional legacy code. Suggested-by: N"Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 13 2月, 2015 2 次提交
-
-
由 Rusty Russell 提交于
In particular, the virtio header always has the u16 num_buffers field. We define a new 'struct virtio_net_hdr_v1' for this (rather than simply calling it 'struct virtio_net_hdr', to avoid nasty type errors if some parts of a project define VIRTIO_NET_NO_LEGACY and some don't. Transitional devices (which can't define VIRTIO_NET_NO_LEGACY) will have to keep using struct virtio_net_hdr_mrg_rxbuf, which has the same byte layout as struct virtio_net_hdr_v1. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Mel Gorman 提交于
Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Signed-off-by: NMel Gorman <mgorman@suse.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NAneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: NSasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 2月, 2015 4 次提交
-
-
由 Paul Burton 提交于
Userland code may be built using an ABI which permits linking to objects that have more restrictive floating point requirements. For example, userland code may be built to target the O32 FPXX ABI. Such code may be linked with other FPXX code, or code built for either one of the more restrictive FP32 or FP64. When linking with more restrictive code, the overall requirement of the process becomes that of the more restrictive code. The kernel has no way to know in advance which mode the process will need to be executed in, and indeed it may need to change during execution. The dynamic loader is the only code which will know the overall required mode, and so it needs to have a means to instruct the kernel to switch the FP mode of the process. This patch introduces 2 new options to the prctl syscall which provide such a capability. The FP mode of the process is represented as a simple bitmask combining a number of mode bits mirroring those present in the hardware. Userland can either retrieve the current FP mode of the process: mode = prctl(PR_GET_FP_MODE); or modify the current FP mode of the process: err = prctl(PR_SET_FP_MODE, new_mode); Signed-off-by: NPaul Burton <paul.burton@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8899/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Wang, Yalin 提交于
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can detect zero_page in /proc/kpageflags, and then do memory analysis more accurately. Signed-off-by: NYalin Wang <yalin.wang@sonymobile.com> Acked-by: NKirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tom Herbert 提交于
Change remote checksum handling to set checksum partial as default behavior. Added an iflink parameter to configure not using checksum partial (calling csum_partial to update checksum). Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Change remote checksum handling to set checksum partial as default behavior. Added an iflink parameter to configure not using checksum partial (calling csum_partial to update checksum). Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 2月, 2015 4 次提交
-
-
由 Rusty Russell 提交于
The VIRTIO_F_ANY_LAYOUT and VIRTIO_F_NOTIFY_ON_EMPTY features are pre-1.0 only. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Rusty Russell 提交于
This allows modern implementations to ensure they don't use legacy feature bits or SCSI commands (which are not used in v1.0 non-legacy). Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Rusty Russell 提交于
This provides backdoor access to the device MMIOs, and every device should have one. From the virtio 1.0 spec (CS03): 4.1.4.7.1 Device Requirements: PCI configuration access capability The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Alex Williamson 提交于
Userspace can opt to receive a device request notification, indicating that the device should be released. This is setup the same way as the error IRQ and also supports eventfd signaling. Future support may forcefully remove the device from the user if the request is ignored. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 10 2月, 2015 5 次提交
-
-
由 Hariprasad Shenai 提交于
Renamed the reserved1 member of struct ethtool_drvinfo to erom_version to get expansion/option ROM version of the adapter if present. Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Reviewed-by: NErik Hugne <erik.hugne@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Reviewed-by: NErik Hugne <erik.hugne@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Reviewed-by: NErik Hugne <erik.hugne@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mike Snitzer 提交于
For blk-mq request-based DM the responsibility of allocating a cloned request is transfered from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces: 1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core. 2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks. dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 08 2月, 2015 2 次提交
-
-
由 Neal Cardwell 提交于
Helpers for mitigating ACK loops by rate-limiting dupacks sent in response to incoming out-of-window packets. This patch includes: - rate-limiting logic - sysctl to control how often we allow dupacks to out-of-window packets - SNMP counter for cases where we rate-limited our dupack sending The rate-limiting logic in this patch decides to not send dupacks in response to out-of-window segments if (a) they are SYNs or pure ACKs and (b) the remote endpoint is sending them faster than the configured rate limit. We rate-limit our responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to thus elicit a dupack in response. We allow dupacks in response to TCP segments with data, because these may be spurious retransmissions for which the remote endpoint wants to receive DSACKs. This is safe because segments with data can't realistically be part of ACK loops, which by their nature consist of each side sending pure/data-less ACKs to each other. The dupack interval is controlled by a new sysctl knob, tcp_invalid_ratelimit, given in milliseconds, in case an administrator needs to dial this upward in the face of a high-rate DoS attack. The name and units are chosen to be analogous to the existing analogous knob for ICMP, icmp_ratelimit. The default value for tcp_invalid_ratelimit is 500ms, which allows at most one such dupack per 500ms. This is chosen to be 2x faster than the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule 2.4). We allow the extra 2x factor because network delay variations can cause packets sent at 1 second intervals to be compressed and arrive much closer. Reported-by: NAvery Fay <avery@mixpanel.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarno Rajahalme 提交于
OVS userspace already probes the openvswitch kernel module for OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module implementation of masked set actions. The existing set action sets many fields at once. When only a subset of the IP header fields, for example, should be modified, all the IP fields need to be exact matched so that the other field values can be copied to the set action. A masked set action allows modification of an arbitrary subset of the supported header bits without requiring the rest to be matched. Masked set action is now supported for all writeable key types, except for the tunnel key. The set tunnel action is an exception as any input tunnel info is cleared before action processing starts, so there is no tunnel info to mask. The kernel module converts all (non-tunnel) set actions to masked set actions. This makes action processing more uniform, and results in less branching and duplicating the action processing code. When returning actions to userspace, the fully masked set actions are converted back to normal set actions. We use a kernel internal action code to be able to tell the userspace provided and converted masked set actions apart. Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2015 1 次提交
-
-
由 Niklas Cassel 提交于
This is the last missing piece to get a kernel booting to a prompt in qemu-cris. Signed-off-by: NNiklas Cassel <nks@flawful.org> Acked-by: NJesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 2月, 2015 2 次提交
-
-
由 Theodore Ts'o 提交于
Add a new mount option which enables a new "lazytime" mode. This mode causes atime, mtime, and ctime updates to only be made to the in-memory version of the inode. The on-disk times will only get updated when (a) if the inode needs to be updated for some non-time related change, (b) if userspace calls fsync(), syncfs() or sync(), or (c) just before an undeleted inode is evicted from memory. This is OK according to POSIX because there are no guarantees after a crash unless userspace explicitly requests via a fsync(2) call. For workloads which feature a large number of random write to a preallocated file, the lazytime mount option significantly reduces writes to the inode table. The repeated 4k writes to a single block will result in undesirable stress on flash devices and SMR disk drives. Even on conventional HDD's, the repeated writes to the inode table block will trigger Adjacent Track Interference (ATI) remediation latencies, which very negatively impact long tail latencies --- which is a very big deal for web serving tiers (for example). Google-Bug-Id: 18297052 Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Dumazet 提交于
FQ has a fast path for skb attached to a socket, as it does not have to compute a flow hash. But for other packets, FQ being non stochastic means that hosts exposed to random Internet traffic can allocate million of flows structure (104 bytes each) pretty easily. Not only host can OOM, but lookup in RB trees can take too much cpu and memory resources. This patch adds a new attribute, orphan_mask, that is adding possibility of having a stochastic hash for orphaned skb. Its default value is 1024 slots, to mimic SFQ behavior. Note: This does not apply to locally generated TCP traffic, and no locally generated traffic will share a flow structure with another perfect or stochastic flow. This patch also handles the specific case of SYNACK messages: They are attached to the listener socket, and therefore all map to a single hash bucket. If listener have set SO_MAX_PACING_RATE, hoping to have new accepted socket inherit this rate, SYNACK might be paced and even dropped. This is very similar to an internal patch Google have used more than one year. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 2月, 2015 4 次提交
-
-
由 Satoru Takeuchi 提交于
"notused" is not necessary. Set 1 to the first entry is enough. Signed-off-by: Takeuchi Satoru <takeuchi_satoru@jp.fujitsu.com Cc: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Willem de Bruijn 提交于
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit timestamps, this loops timestamps on top of empty packets. Doing so reduces the pressure on SO_RCVBUF. Payload inspection and cmsg reception (aside from timestamps) are no longer possible. This works together with a follow on patch that allows administrators to only allow tx timestamping if it does not loop payload or metadata. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Changes (rfc -> v1) - add documentation - remove unnecessary skb->len test (thanks to Richard Cochran) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christophe Ricard 提交于
NFC_EVT_TRANSACTION is sent through netlink in order for a specific application running on a secure element to notify userspace of an event. Typically the secure element application counterpart on the host could interpret that event and act upon it. Forwarded information contains: - SE host generating the event - Application IDentifier doing the operation - Applications parameters Signed-off-by: NChristophe Ricard <christophe-h.ricard@st.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Chunyan Zhang 提交于
Add a full sc9836-uart driver for SC9836 SoC which is based on the spreadtrum sharkl64 platform. This driver also support earlycon. Originally-by: NLanqing Liu <lanqing.liu@spreadtrum.com> Signed-off-by: NOrson Zhai <orson.zhai@spreadtrum.com> Signed-off-by: NChunyan Zhang <chunyan.zhang@spreadtrum.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NPeter Hurley <peter@hurleysoftware.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-