提交 · fc4099f17240767554ff3a73977acb78ef615404 · OpenHarmony / kernel_linux

23 10月, 2015 1 次提交

openvswitch: Fix egress tunnel info. · fc4099f1

由 Pravin B Shelar 提交于 10月 22, 2015

While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices.  Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.

Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc4099f1

22 10月, 2015 1 次提交

openvswitch: Clarify conntrack COMMIT behaviour · 1d008a1d

由 Joe Stringer 提交于 10月 19, 2015

The presence of this attribute does not modify the ct_state for the
current packet, only future packets. Make this more clear in the header
definition.
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d008a1d

17 10月, 2015 1 次提交

net: add pfmemalloc check in sk_add_backlog() · c7c49b8f

由 Eric Dumazet 提交于 9月 29, 2015

Greg reported crashes hitting the following check in __sk_backlog_rcv()

	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));

The pfmemalloc bit is currently checked in sk_filter().

This works correctly for TCP, because sk_filter() is ran in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.

For UDP or other protocols, this does not work, because the sk_filter()
is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if socket is owned by user by the time packet is processed by
softirq handler.

Fixes: b4b9e355 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NGreg Thelen <gthelen@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7c49b8f

15 10月, 2015 1 次提交

drm/dp/mst: make mst i2c transfer code more robust. · ae491542

由 Dave Airlie 提交于 10月 14, 2015

This zeroes the msg so no random stack data ends up getting
sent, it also limits the function to not accepting > 4
i2c msgs.

Cc: stable@vger.kernel.org
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDave Airlie <airlied@redhat.com>

ae491542

13 10月, 2015 2 次提交

rtnetlink: fix gcc -Wconversion warning · e8444637

由 Arad, Ronen 提交于 10月 09, 2015

RTA_ALIGNTO is currently define as 4. It has to be 4U to prevent warning
for RTA_ALIGN and RTA_DATA expansions when -Wconversion gcc option is
enabled.
This follows NLMSG_ALIGNTO definition in <include/uapi/linux/netlink.h>.
Signed-off-by: NRonen Arad <ronen.arad@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8444637

arm64: Fix MINSIGSTKSZ and SIGSTKSZ · c9692657

由 Manjeet Pawar 提交于 10月 09, 2015

MINSIGSTKSZ and SIGSTKSZ for ARM64 are not correctly set in latest kernel.
This patch fixes this issue.

This issue is reported in LTP (testcase: sigaltstack02.c).
Testcase failed when sigaltstack() called with stack size "MINSIGSTKSZ - 1"
Since in Glibc-2.22, MINSIGSTKSZ is set to 5120 but in kernel
it is set to 2048 so testcase gets failed.

Testcase Output:
sigaltstack02 1  TPASS  :  stgaltstack() fails, Invalid Flag value,errno:22
sigaltstack02 2  TFAIL  :  sigaltstack() returned 0, expected -1,errno:12

Reported Issue in Glibc Bugzilla:
Bugfix in Glibc-2.22: [Bug 16850]
https://sourceware.org/bugzilla/show_bug.cgi?id=16850Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NAkhilesh Kumar <akhilesh.k@samsung.com>
Signed-off-by: NManjeet Pawar <manjeet.p@samsung.com>
Signed-off-by: NRohit Thapliyal <r.thapliyal@samsung.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

c9692657

09 10月, 2015 1 次提交

irqdomain: Add an accessor for the of_node field · 10abc7df

由 Marc Zyngier 提交于 10月 09, 2015

As we're about to remove the of_node field from the irqdomain
structure, introduce an accessor for it. Subsequent patches
will take care of the actual repainting.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1444402211-1141-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

10abc7df

08 10月, 2015 1 次提交

af_unix: constify the sock parameter in unix_sk() · c72eda06

由 Paul Moore 提交于 10月 06, 2015

Make unix_sk() just like inet[6]_sk() by constify'ing the sock
parameter.
Signed-off-by: NPaul Moore <pmoore@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c72eda06

07 10月, 2015 3 次提交

openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT · ab38a7b5

由 Joe Stringer 提交于 10月 06, 2015

Previously, the CT_ATTR_FLAGS attribute, when nested under the
OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the
semantics of the ct action. It's more extensible to just represent each
flag as a nested attribute, and this requires no additional error
checking to reject flags that aren't currently supported.
Suggested-by: NBen Pfaff <blp@nicira.com>
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab38a7b5

openvswitch: Extend ct_state match field to 32 bits · fbccce59

由 Joe Stringer 提交于 10月 06, 2015

The ct_state field was initially added as an 8-bit field, however six of
the bits are already being used and use cases are already starting to
appear that may push the limits of this field. This patch extends the
field to 32 bits while retaining the internal representation of 8 bits.
This should cover forward compatibility of the ABI for the foreseeable
future.

This patch also reorders the OVS_CS_F_* bits to be sequential.
Suggested-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbccce59

openvswitch: Fix typos in CT headers · 0a7cc172

由 Joe Stringer 提交于 10月 06, 2015

These comments hadn't caught up to their implementations, fix them.

Fixes: 7f8a436e "openvswitch: Add conntrack action"
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a7cc172

05 10月, 2015 2 次提交

openvswitch: Rename LABEL->LABELS · 33db4125

由 Joe Stringer 提交于 10月 01, 2015

Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
for these to be consistent with conntrack.

Fixes: c2ac6673 "openvswitch: Allow matching on conntrack label"
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33db4125

tcp/dccp: fix old style declarations · 8695a144

由 Raanan Avargil 提交于 10月 01, 2015

I’m using the compilation flag -Werror=old-style-declaration, which
requires that the “inline” word would come at the beginning of the code
line.

$ make drivers/net/ethernet/intel/e1000e/e1000e.ko
...
include/net/inet_timewait_sock.h:116:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_schedule(struct inet_timewait_sock *tw, int
timeo)

include/net/inet_timewait_sock.h:121:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_reschedule(struct inet_timewait_sock *tw,
int timeo)

Fixes: ed2e9239 ("tcp/dccp: fix timewait races in timer handling")
Signed-off-by: NRaanan Avargil <raanan.avargil@intel.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8695a144

02 10月, 2015 5 次提交

drm/dp/mst: add some defines for logical/physical ports · ccf03d69

由 Dave Airlie 提交于 10月 01, 2015

This just removes the magic number.
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDave Airlie <airlied@redhat.com>

ccf03d69

drm/dp/mst: split connector registration into two parts (v2) · d9515c5e

由 Dave Airlie 提交于 9月 16, 2015

In order to cache the EDID properly for tiled displays, we
need to retrieve it before we register the connector with
userspace, otherwise userspace can call get resources
and try and get the edid before we've even cached it.

This fixes some problems when hotplugging mst monitors,
with X/mutter running. As mutter seems to get 0 modes
for one of the monitors in the tile.

v2: fix warning in radeon
handle tile setting in cached path rather than
get edid path.
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: NDave Airlie <airlied@redhat.com>

d9515c5e

memcg: remove pcp_counter_lock · ef510194

由 Greg Thelen 提交于 10月 01, 2015

Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate
possible cpus instead of online") removed the last use of the per memcg
pcp_counter_lock but forgot to remove the variable.

Kill the vestigial variable.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef510194

memcg: fix dirty page migration · 0610c25d

由 Greg Thelen 提交于 10月 01, 2015

The problem starts with a file backed dirty page which is charged to a
memcg.  Then page migration is used to move oldpage to newpage.

Migration:
 - copies the oldpage's data to newpage
 - clears oldpage.PG_dirty
 - sets newpage.PG_dirty
 - uncharges oldpage from memcg
 - charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.

However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count.  After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned().  At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration.  This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.

This issue:
 - can harm performance of non root memcg buffered writes.
 - can report too small (even negative) values in
   memory.stat[(total_)dirty] counters of all memcg, including the root.

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.

Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

                       (speed, dirty residue)
             unpatched                       patched
    1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
    2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
    3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s.  In the patched kernel, post
    migration performance matches baseline.

Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: NGreg Thelen <gthelen@google.com>
Reported-by: NDave Hansen <dave.hansen@intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0610c25d

userfaultfd: remove kernel header include from uapi header · 9ff42d10

由 Andre Przywara 提交于 10月 01, 2015

As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.

So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h.  As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9ff42d10

01 10月, 2015 3 次提交

blk-mq: factor out a helper to iterate all tags for a request_queue · 0bf6cd5b

由 Christoph Hellwig 提交于 9月 27, 2015

And replace the blk_mq_tag_busy_iter with it - the driver use has been
replaced with a new helper a while ago, and internal to the block we
only need the new version.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0bf6cd5b

blk-mq: fix racy updates of rq->errors · f4829a9b

由 Christoph Hellwig 提交于 9月 27, 2015

blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f4829a9b

usb: renesas_usbhs: fix build warning if 64-bit architecture · 9ae7ce00

由 Yoshihiro Shimoda 提交于 9月 29, 2015

This patch fixes the following warning if 64-bit architecture environment:

./drivers/usb/renesas_usbhs/common.c:496:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  dparam->type = of_id ? (u32)of_id->data : 0;
Acked-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: NFelipe Balbi <balbi@ti.com>

9ae7ce00

30 9月, 2015 4 次提交

drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2 · 4ad640e9

由 Egbert Eich 提交于 9月 23, 2015

drm_kms_helper_poll_enable() was converted to lock the mode_config
mutex in commit 8c4ccc4a
("drm/probe-helper: Grab mode_config.mutex in poll_init/enable").

This disregarded the cases where this function is called from a context
where this mutex is already locked.

Add a non-locking version as well.

Changes since v1:
- use function name suffix '_locked' for the function that
  is to be called from a locked context.
Signed-off-by: NEgbert Eich <eich@suse.de>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

4ad640e9

skbuff: Fix skb checksum partial check. · 31b33dfb

由 Pravin B Shelar 提交于 9月 28, 2015

Earlier patch 6ae459bd tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.

Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.

Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: NAndrew Vagin <avagin@odin.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31b33dfb

af_unix: Convert the unix_sk macro to an inline function for type safety · 4613012d

由 Aaron Conole 提交于 9月 26, 2015

As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.
Signed-off-by: NAaron Conole <aconole@bytheb.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4613012d

blk-mq: fix sysfs registration/unregistration race · 4593fdbe

由 Akinobu Mita 提交于 9月 27, 2015

There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.

null_add_dev
    --> blk_mq_init_queue
        --> blk_mq_init_allocated_queue
            --> add to 'all_q_list' (*)
    --> add_disk
        --> blk_register_queue
            --> blk_mq_register_disk (++)

null_del_dev
    --> del_gendisk
        --> blk_unregister_queue
            --> blk_mq_unregister_disk (--)
    --> blk_cleanup_queue
        --> blk_mq_free_queue
            --> del from 'all_q_list' (*)

blk_mq_queue_reinit
    --> blk_mq_sysfs_unregister (-)
    --> blk_mq_sysfs_register (+)

While the request queue is added to 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue anytime by CPU
hotplug callback.  But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--)
is finished.  Because '/sys/block/*/mq/' is not exists.

There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
be used to track these sysfs stuff, but it is only fixing this issue
partially.

In order to fix it completely, we just need per-queue flag instead of
per-hctx flag with appropriate locking.  So this introduces
q->mq_sysfs_init_done which is properly protected with all_q_mutex.

Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex is held.  Since hctx->nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), so we should avoid
blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value
in CPU hotplug handling or adding/deleting gendisk .
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: NMing Lei <tom.leiming@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4593fdbe

28 9月, 2015 1 次提交

x86/xen: Support kexec/kdump in HVM guests by doing a soft reset · 0b34a166

由 Vitaly Kuznetsov 提交于 9月 25, 2015

Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:

  - Bound event channels.
  - Registered vcpu_info.
  - PIRQ/emuirq mappings.
  - shared_info frame after XENMAPSPACE_shared_info operation.
  - Active grant mappings.

Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.

Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.'  which gives a clue to what the problem is and
eliminates false expectations.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

0b34a166

26 9月, 2015 1 次提交

ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ · 5ebc7603

由 Jiang Liu 提交于 9月 17, 2015

Avoid IRQs occupied by ISA IRQs when allocating IRQs for PCI link devices,
otherwise it may cause interrupt storm due to incompatible pin attributes.

This issue was triggered on a KVM virtual machine, which
 1) uses IRQ9 for SCI in high level mode.
 2) defines an PCI interrupt link device (LNKS) with IRQ9 as the only
    possible irq.
 3) has an PCI device referring to link device LNKS.
So it causes interrupt storm when enabling the PCI device because PCI IRQ
works in low level mode.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5ebc7603

25 9月, 2015 8 次提交

IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY · c6790aa9

由 Sagi Grimberg 提交于 9月 24, 2015

Commit 96249d70 ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.

ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.

The local_dma_lkey support will be restored in CX4 depending on FW
capability query.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c6790aa9

target: Propigate backend read-only to core_tpg_add_lun · eeeb9522

由 Nicholas Bellinger 提交于 9月 15, 2015

This patch adds a DF_READ_ONLY flag that is used by IBLOCK to
signal when a backend has been set to read-only mode, in order
to propigate read-only status up to core_tpg_add_lun() for all
future LUN fabric exports.

With this is place, existing emulation for reporting read-only
in spc_emulate_modesense() and normal transport_lookup_cmd_lun()
TCM_WRITE_PROTECTED status checking just works as expected.
Reported-by: NJoeue Deng <joeue404@gmail.com>
Reported-by: NAndy Grover <agrover@redhat.com>
Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>

eeeb9522

phy: add phy_device_remove() · 38737e49

由 Russell King 提交于 9月 24, 2015

Add a phy_device_remove() function to complement phy_device_register(),
which undoes the effects of phy_device_register() by removing the phy
device from visibility, but not freeing it.

This allows these details to be moved out of the mdio bus code into
the phy code where this action belongs.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38737e49

phy: fix mdiobus module safety · 3e3aaf64

由 Russell King 提交于 9月 24, 2015

Re-implement the mdiobus module refcounting to ensure that we actually
ensure that the mdiobus module code does not go away while we might call
into it.

The old scheme using bus->dev.driver was buggy, because bus->dev is a
class device which never has a struct device_driver associated with it,
and hence the associated code trying to obtain a refcount did nothing
useful.

Instead, take the approach that other subsystems do: pass the module
when calling mdiobus_register(), and record that in the mii_bus struct.
When we need to increment the module use count in the phy code, use
this stored pointer.  When the phy is deteched, drop the module
refcount, remembering that the phy device might go away at that point.

This doesn't stop the mii_bus going away while there are in-use phys -
it merely stops the underlying code vanishing.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e3aaf64

lwtunnel: remove source and destination UDP port config option · b194f30c

由 Jiri Benc 提交于 9月 22, 2015

The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.

As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.

Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).

If in the future we rework ARP handling to allow port specification, the
attributes can be added back.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b194f30c

ipv4: send arp replies to the correct tunnel · 63d008a4

由 Jiri Benc 提交于 9月 22, 2015

When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.

We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.

The only thing to do is to "reverse" the ip_tunnel_info.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63d008a4

skbuff: Fix skb checksum flag on skb pull · 6ae459bd

由 Pravin B Shelar 提交于 9月 22, 2015

VXLAN device can receive skb with checksum partial. But the checksum
offset could be in outer header which is pulled on receive. This results
in negative checksum offset for the skb. Such skb can cause the assert
failure in skb_checksum_help(). Following patch fixes the bug by setting
checksum-none while pulling outer header.

Following is the kernel panic msg from old kernel hitting the bug.

------------[ cut here ]------------
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
Call Trace:
<IRQ>
[<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
[<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
[<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
[<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
[<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
[<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
[<ffffffff81550128>] ip_local_deliver+0x88/0x90
[<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
[<ffffffff81550365>] ip_rcv+0x235/0x300
[<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
[<ffffffff8151c360>] netif_receive_skb+0x80/0x90
[<ffffffff81459935>] virtnet_poll+0x555/0x6f0
[<ffffffff8151cd04>] net_rx_action+0x134/0x290
[<ffffffff810683d8>] __do_softirq+0xa8/0x210
[<ffffffff8162fe6c>] call_softirq+0x1c/0x30
[<ffffffff810161a5>] do_softirq+0x65/0xa0
[<ffffffff810687be>] irq_exit+0x8e/0xb0
[<ffffffff81630733>] do_IRQ+0x63/0xe0
[<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
Reported-by: NAnupam Chanda <achanda@vmware.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ae459bd

cgroup, writeback: don't enable cgroup writeback on traditional hierarchies · 9badce00

由 Tejun Heo 提交于 9月 23, 2015

inode_cgwb_enabled() gates cgroup writeback support. If it returns
true, each inode is attached to the corresponding memory domain which
gets mapped to io domain. It currently only tests whether the
filesystem and bdi support cgroup writeback; however, cgroup writeback
support doesn't work on traditional hierarchies and thus it should
also test whether memcg and iocg are on the default hierarchy.

This caused traditional hierarchy setups to hit the cgroup writeback
path inadvertently and ended up creating separate writeback domains
for each memcg and mapping them all to the root iocg uncovering a
couple issues in the cgroup writeback path.

cgroup writeback was never meant to be enabled on traditional
hierarchies. Make inode_cgwb_enabled() test whether both memcg and
iocg are on the default hierarchy.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NArtem Bityutskiy <dedekind1@gmail.com>
Reported-by: NDexuan Cui <decui@microsoft.com>
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net

9badce00

24 9月, 2015 1 次提交

netpoll: Close race condition between poll_one_napi and napi_disable · 2d8bff12

由 Neil Horman 提交于 9月 23, 2015

Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

V3)
	Clean up spacing nits
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d8bff12

23 9月, 2015 2 次提交

userfaultfd: register uapi generic syscall (aarch64) · 09f72981

由 Dr. David Alan Gilbert 提交于 9月 22, 2015

Add the userfaultfd syscalls to uapi asm-generic, it was tested with
postcopy live migration on aarch64 with both 4k and 64k pagesize
kernels.
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09f72981

userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4

由 Andrea Arcangeli 提交于 9月 22, 2015

This reverts commit 51360155 and adapts
fs/userfaultfd.c to use the old version of that function.

It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we've full control of what we
insert in the two waitqueue heads of the blocked userfaults.  No
exclusive waitqueue risks to be inserted into those two waitqueue heads
so we can as well stick to "nr == 1" of the old code and we can rely
purely on the fact no waitqueue inserted in one of the two waitqueue
heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ac5be6b4

22 9月, 2015 1 次提交

tcp/dccp: fix timewait races in timer handling · ed2e9239

由 Eric Dumazet 提交于 9月 19, 2015

When creating a timewait socket, we need to arm the timer before
allowing other cpus to find it. The signal allowing cpus to find
the socket is setting tw_refcnt to non zero value.

As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
call inet_twsk_schedule() first.

This also means we need to remove tw_refcnt changes from
inet_twsk_schedule() and let the caller handle it.

Note that because we use mod_timer_pinned(), we have the guarantee
the timer wont expire before we set tw_refcnt as we run in BH context.

To make things more readable I introduced inet_twsk_reschedule() helper.

When rearming the timer, we can use mod_timer_pending() to make sure
we do not rearm a canceled timer.

Note: This bug can possibly trigger if packets of a flow can hit
multiple cpus. This does not normally happen, unless flow steering
is broken somehow. This explains this bug was spotted ~5 months after
its introduction.

A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
but will be provided in a separate patch for proper tracking.

Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NYing Cai <ycai@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed2e9239

21 9月, 2015 1 次提交

ip6tunnel: make rx/tx bytes counters consistent · 83cf9a25

由 Nicolas Dichtel 提交于 9月 18, 2015

Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.

Before the patch, the external ipv6 header + gre header were included on
tx.

After the patch:
$ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms

--- 192.168.6.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
7: ip6gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
    RX: bytes  packets  errors  dropped overrun mcast
    84         1        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    84         1        0       0       0       0
$ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms

--- 192.168.1.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
8: ip6tnl1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
    RX: bytes  packets  errors  dropped overrun mcast
    84         1        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    84         1        0       0       0       0
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83cf9a25

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多