提交 · 06a16293f71927f756dcf37558a79c0b05a91641 · openeuler / raspberrypi-kernel

27 10月, 2015 1 次提交

Input: evdev - add event-mask API · 06a16293

由 David Herrmann 提交于 10月 24, 2015

Hardware manufacturers group keys in the weirdest way possible. This may
cause a power-key to be grouped together with normal keyboard keys and
thus be reported on the same kernel interface.

However, user-space is often only interested in specific sets of events.
For instance, daemons dealing with system-reboot (like systemd-logind)
listen for KEY_POWER, but are not interested in any main keyboard keys.
Usually, power keys are reported via separate interfaces, however,
some i8042 boards report it in the AT matrix. To avoid waking up those
system daemons on each key-press, we had two ideas:
 - split off KEY_POWER into a separate interface unconditionally
 - allow filtering a specific set of events on evdev FDs

Splitting of KEY_POWER is a rather weird way to deal with this and may
break backwards-compatibility. It is also specific to KEY_POWER and might
be required for other stuff, too. Moreover, we might end up with a huge
set of input-devices just to have them properly split.

Hence, this patchset implements the second idea: An event-mask to specify
which events you're interested in. Two ioctls allow setting this mask for
each event-type. If not set, all events are reported. The type==0 entry is
used same as in EVIOCGBIT to set the actual EV_* mask of filtered events.
This way, you have a two-level filter.

We are heavily forward-compatible to new event-types and event-codes. So
new user-space will be able to run on an old kernel which doesn't know the
given event-codes or event-types.
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

06a16293

17 10月, 2015 5 次提交

Input: rotary-encoder - add support for quarter-period mode · 3a341a4c

由 Ezequiel Garcia 提交于 10月 13, 2015

Some encoders have both outputs low in stable states, others also have
a stable state with both outputs high (half-period mode) and some have
a stable state in all steps (quarter-period mode). The driver used to
support the former states and with this change it can also support the
later.

This commit also deprecates the 'half-period' property and introduces
a new property 'steps-per-period'. This property specifies the
number of steps (stable states) produced by the rotary encoder
for each GPIO period.
Signed-off-by: NGuido Martínez <guido@vanguardiasur.com.ar>
Signed-off-by: NEzequiel Garcia <ezequiel@vanguardiasur.com.ar>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

3a341a4c

Input: document and check on implicitly defined FF_MAX_EFFECTS · 33b96d93

由 Elias Vanderstuyft 提交于 10月 14, 2015

There is an undocumented upper bound for the total number of ff effects:
    FF_GAIN (= 96).
This can be found as follows:
- user: write(EV_FF, effect_id, iterations)
    calls kernel: ff->playback(effect_id, ...): starts effect "effect_id"
- user: write(EV_FF, FF_GAIN, gain)
    calls kernel: ff->set_gain(gain, ...): sets gain

A collision occurs when effect_id equals FF_GAIN.
According to input_ff_event(),
FF_GAIN is the smallest value where a collision occurs.
Therefore the greatest safe value for effect_id is FF_GAIN - 1,
and thus the total number of effects should never exceed FF_GAIN.

Define FF_MAX_EFFECTS as FF_GAIN and check on this limit in ff-core.
Signed-off-by: NElias Vanderstuyft <elias.vds@gmail.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

33b96d93

Input: fix EVIOCSFF macro inconsistency by using _IOW() · 52a92667

由 Elias Vanderstuyft 提交于 9月 21, 2015

Just like the EVIOCSABS(abs) macro, use the more compact
_IOW(..., type) instead of _IOC(_IOC_WRITE, ..., sizeof(type))
for the EVIOCSFF macro.
Signed-off-by: NElias Vanderstuyft <elias.vds@gmail.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

52a92667

devicetree: bindings: Use linux-event-codes.h for evdev codes · 4f38b9f2

由 Hans de Goede 提交于 10月 14, 2015

Add a symlink to uapi/linux/linux-event-codes.h, and include that
instead of (re)defining all the evdev type and code values in
dt-bindings/input/input.h. This way we do not need to keep all the event
codes synced manually.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

4f38b9f2

Input: add input-event-codes header file · f902dd89

由 Hans de Goede 提交于 10月 14, 2015

Add input-event-codes header file and move all type and axis defines
there.

The purpose of this new header file is to have a single canonical source
for event-codes which can be used outside of C-code too. One example of
such usage is the use of event-codes in devicetree source files.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

f902dd89

14 10月, 2015 2 次提交

Input: rotary_encoder - add wake up support · 47ec6e5a

由 Sylvain Rochet 提交于 10月 13, 2015

This patch adds wake up support to GPIO rotary encoders.
Signed-off-by: NSylvain Rochet <sylvain.rochet@finsecur.com>
Reviewed-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

47ec6e5a

Input: improve autorepeat initialization · 027c71bb

由 Petri Gynther 提交于 10月 13, 2015

Add new function input_enable_softrepeat() that allows drivers to
initialize their own values for input_dev->rep[REP_DELAY] and
input_dev->rep[REP_PERIOD], but also use the software autorepeat
functionality from input.c.

For example, a HID driver could do:

static void xyz_input_configured(struct hid_device *hid,
                                 struct hid_input *hidinput)
{
        input_enable_softrepeat(hidinput->input, 400, 100);
}

static struct hid_driver xyz_driver = {
        .input_configured = xyz_input_configured,
}
Signed-off-by: NPetri Gynther <pgynther@google.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

027c71bb

02 10月, 2015 5 次提交

drm/dp/mst: add some defines for logical/physical ports · ccf03d69

由 Dave Airlie 提交于 10月 01, 2015

This just removes the magic number.
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDave Airlie <airlied@redhat.com>

ccf03d69

drm/dp/mst: split connector registration into two parts (v2) · d9515c5e

由 Dave Airlie 提交于 9月 16, 2015

In order to cache the EDID properly for tiled displays, we
need to retrieve it before we register the connector with
userspace, otherwise userspace can call get resources
and try and get the edid before we've even cached it.

This fixes some problems when hotplugging mst monitors,
with X/mutter running. As mutter seems to get 0 modes
for one of the monitors in the tile.

v2: fix warning in radeon
handle tile setting in cached path rather than
get edid path.
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: NDave Airlie <airlied@redhat.com>

d9515c5e

memcg: remove pcp_counter_lock · ef510194

由 Greg Thelen 提交于 10月 01, 2015

Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate
possible cpus instead of online") removed the last use of the per memcg
pcp_counter_lock but forgot to remove the variable.

Kill the vestigial variable.
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef510194

memcg: fix dirty page migration · 0610c25d

由 Greg Thelen 提交于 10月 01, 2015

The problem starts with a file backed dirty page which is charged to a
memcg.  Then page migration is used to move oldpage to newpage.

Migration:
 - copies the oldpage's data to newpage
 - clears oldpage.PG_dirty
 - sets newpage.PG_dirty
 - uncharges oldpage from memcg
 - charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.

However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count.  After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned().  At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration.  This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.

This issue:
 - can harm performance of non root memcg buffered writes.
 - can report too small (even negative) values in
   memory.stat[(total_)dirty] counters of all memcg, including the root.

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.

Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

                       (speed, dirty residue)
             unpatched                       patched
    1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
    2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
    3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s.  In the patched kernel, post
    migration performance matches baseline.

Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: NGreg Thelen <gthelen@google.com>
Reported-by: NDave Hansen <dave.hansen@intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0610c25d

userfaultfd: remove kernel header include from uapi header · 9ff42d10

由 Andre Przywara 提交于 10月 01, 2015

As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.

So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h.  As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9ff42d10

01 10月, 2015 2 次提交

blk-mq: factor out a helper to iterate all tags for a request_queue · 0bf6cd5b

由 Christoph Hellwig 提交于 9月 27, 2015

And replace the blk_mq_tag_busy_iter with it - the driver use has been
replaced with a new helper a while ago, and internal to the block we
only need the new version.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0bf6cd5b

blk-mq: fix racy updates of rq->errors · f4829a9b

由 Christoph Hellwig 提交于 9月 27, 2015

blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f4829a9b

30 9月, 2015 4 次提交

drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2 · 4ad640e9

由 Egbert Eich 提交于 9月 23, 2015

drm_kms_helper_poll_enable() was converted to lock the mode_config
mutex in commit 8c4ccc4a
("drm/probe-helper: Grab mode_config.mutex in poll_init/enable").

This disregarded the cases where this function is called from a context
where this mutex is already locked.

Add a non-locking version as well.

Changes since v1:
- use function name suffix '_locked' for the function that
  is to be called from a locked context.
Signed-off-by: NEgbert Eich <eich@suse.de>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

4ad640e9

skbuff: Fix skb checksum partial check. · 31b33dfb

由 Pravin B Shelar 提交于 9月 28, 2015

Earlier patch 6ae459bd tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.

Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.

Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: NAndrew Vagin <avagin@odin.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31b33dfb

af_unix: Convert the unix_sk macro to an inline function for type safety · 4613012d

由 Aaron Conole 提交于 9月 26, 2015

As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.
Signed-off-by: NAaron Conole <aconole@bytheb.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4613012d

blk-mq: fix sysfs registration/unregistration race · 4593fdbe

由 Akinobu Mita 提交于 9月 27, 2015

There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.

null_add_dev
    --> blk_mq_init_queue
        --> blk_mq_init_allocated_queue
            --> add to 'all_q_list' (*)
    --> add_disk
        --> blk_register_queue
            --> blk_mq_register_disk (++)

null_del_dev
    --> del_gendisk
        --> blk_unregister_queue
            --> blk_mq_unregister_disk (--)
    --> blk_cleanup_queue
        --> blk_mq_free_queue
            --> del from 'all_q_list' (*)

blk_mq_queue_reinit
    --> blk_mq_sysfs_unregister (-)
    --> blk_mq_sysfs_register (+)

While the request queue is added to 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue anytime by CPU
hotplug callback.  But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--)
is finished.  Because '/sys/block/*/mq/' is not exists.

There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
be used to track these sysfs stuff, but it is only fixing this issue
partially.

In order to fix it completely, we just need per-queue flag instead of
per-hctx flag with appropriate locking.  So this introduces
q->mq_sysfs_init_done which is properly protected with all_q_mutex.

Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex is held.  Since hctx->nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), so we should avoid
blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value
in CPU hotplug handling or adding/deleting gendisk .
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: NMing Lei <tom.leiming@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4593fdbe

28 9月, 2015 2 次提交

Input: edt-ft5x06 - remove support for platform data · 22ddbacc

由 Dmitry Torokhov 提交于 9月 12, 2015

We do not have any users of platform data in the tree and all newer
platforms are either DT or ACPI, so let's drop handling of platform data.
Tested-by: NFranklin S Cooper Jr <fcooper@ti.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

22ddbacc

Input: edt-ft5x06 - switch to newer gpio framework · 13c23cd1

由 Franklin S Cooper Jr 提交于 9月 11, 2015

The current/old gpio framework used doesn't properly listen to
ACTIVE_LOW and ACTIVE_HIGH flags. The newer gpio framework takes into
account these flags when setting gpio values.

Since the values being output were based on voltage and not logic they
change to reflect this difference. Also use gpiod_set_value_cansleep since
wake and reset pins can be provided by bus based io expanders.

Switch from msleep(5) to udelay_range(5000,6000) to avoid check patch
warning.
Signed-off-by: NFranklin S Cooper Jr <fcooper@ti.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

13c23cd1

26 9月, 2015 1 次提交

ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ · 5ebc7603

由 Jiang Liu 提交于 9月 17, 2015

Avoid IRQs occupied by ISA IRQs when allocating IRQs for PCI link devices,
otherwise it may cause interrupt storm due to incompatible pin attributes.

This issue was triggered on a KVM virtual machine, which
 1) uses IRQ9 for SCI in high level mode.
 2) defines an PCI interrupt link device (LNKS) with IRQ9 as the only
    possible irq.
 3) has an PCI device referring to link device LNKS.
So it causes interrupt storm when enabling the PCI device because PCI IRQ
works in low level mode.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5ebc7603

25 9月, 2015 8 次提交

IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY · c6790aa9

由 Sagi Grimberg 提交于 9月 24, 2015

Commit 96249d70 ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.

ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.

The local_dma_lkey support will be restored in CX4 depending on FW
capability query.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c6790aa9

target: Propigate backend read-only to core_tpg_add_lun · eeeb9522

由 Nicholas Bellinger 提交于 9月 15, 2015

This patch adds a DF_READ_ONLY flag that is used by IBLOCK to
signal when a backend has been set to read-only mode, in order
to propigate read-only status up to core_tpg_add_lun() for all
future LUN fabric exports.

With this is place, existing emulation for reporting read-only
in spc_emulate_modesense() and normal transport_lookup_cmd_lun()
TCM_WRITE_PROTECTED status checking just works as expected.
Reported-by: NJoeue Deng <joeue404@gmail.com>
Reported-by: NAndy Grover <agrover@redhat.com>
Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>

eeeb9522

phy: add phy_device_remove() · 38737e49

由 Russell King 提交于 9月 24, 2015

Add a phy_device_remove() function to complement phy_device_register(),
which undoes the effects of phy_device_register() by removing the phy
device from visibility, but not freeing it.

This allows these details to be moved out of the mdio bus code into
the phy code where this action belongs.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38737e49

phy: fix mdiobus module safety · 3e3aaf64

由 Russell King 提交于 9月 24, 2015

Re-implement the mdiobus module refcounting to ensure that we actually
ensure that the mdiobus module code does not go away while we might call
into it.

The old scheme using bus->dev.driver was buggy, because bus->dev is a
class device which never has a struct device_driver associated with it,
and hence the associated code trying to obtain a refcount did nothing
useful.

Instead, take the approach that other subsystems do: pass the module
when calling mdiobus_register(), and record that in the mii_bus struct.
When we need to increment the module use count in the phy code, use
this stored pointer.  When the phy is deteched, drop the module
refcount, remembering that the phy device might go away at that point.

This doesn't stop the mii_bus going away while there are in-use phys -
it merely stops the underlying code vanishing.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e3aaf64

lwtunnel: remove source and destination UDP port config option · b194f30c

由 Jiri Benc 提交于 9月 22, 2015

The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.

As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.

Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).

If in the future we rework ARP handling to allow port specification, the
attributes can be added back.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b194f30c

ipv4: send arp replies to the correct tunnel · 63d008a4

由 Jiri Benc 提交于 9月 22, 2015

When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.

We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.

The only thing to do is to "reverse" the ip_tunnel_info.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63d008a4

skbuff: Fix skb checksum flag on skb pull · 6ae459bd

由 Pravin B Shelar 提交于 9月 22, 2015

VXLAN device can receive skb with checksum partial. But the checksum
offset could be in outer header which is pulled on receive. This results
in negative checksum offset for the skb. Such skb can cause the assert
failure in skb_checksum_help(). Following patch fixes the bug by setting
checksum-none while pulling outer header.

Following is the kernel panic msg from old kernel hitting the bug.

------------[ cut here ]------------
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
Call Trace:
<IRQ>
[<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
[<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
[<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
[<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
[<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
[<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
[<ffffffff81550128>] ip_local_deliver+0x88/0x90
[<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
[<ffffffff81550365>] ip_rcv+0x235/0x300
[<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
[<ffffffff8151c360>] netif_receive_skb+0x80/0x90
[<ffffffff81459935>] virtnet_poll+0x555/0x6f0
[<ffffffff8151cd04>] net_rx_action+0x134/0x290
[<ffffffff810683d8>] __do_softirq+0xa8/0x210
[<ffffffff8162fe6c>] call_softirq+0x1c/0x30
[<ffffffff810161a5>] do_softirq+0x65/0xa0
[<ffffffff810687be>] irq_exit+0x8e/0xb0
[<ffffffff81630733>] do_IRQ+0x63/0xe0
[<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
Reported-by: NAnupam Chanda <achanda@vmware.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ae459bd

cgroup, writeback: don't enable cgroup writeback on traditional hierarchies · 9badce00

由 Tejun Heo 提交于 9月 23, 2015

inode_cgwb_enabled() gates cgroup writeback support. If it returns
true, each inode is attached to the corresponding memory domain which
gets mapped to io domain. It currently only tests whether the
filesystem and bdi support cgroup writeback; however, cgroup writeback
support doesn't work on traditional hierarchies and thus it should
also test whether memcg and iocg are on the default hierarchy.

This caused traditional hierarchy setups to hit the cgroup writeback
path inadvertently and ended up creating separate writeback domains
for each memcg and mapping them all to the root iocg uncovering a
couple issues in the cgroup writeback path.

cgroup writeback was never meant to be enabled on traditional
hierarchies. Make inode_cgwb_enabled() test whether both memcg and
iocg are on the default hierarchy.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NArtem Bityutskiy <dedekind1@gmail.com>
Reported-by: NDexuan Cui <decui@microsoft.com>
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net

9badce00

24 9月, 2015 1 次提交

netpoll: Close race condition between poll_one_napi and napi_disable · 2d8bff12

由 Neil Horman 提交于 9月 23, 2015

Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

V3)
	Clean up spacing nits
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d8bff12

23 9月, 2015 2 次提交

userfaultfd: register uapi generic syscall (aarch64) · 09f72981

由 Dr. David Alan Gilbert 提交于 9月 22, 2015

Add the userfaultfd syscalls to uapi asm-generic, it was tested with
postcopy live migration on aarch64 with both 4k and 64k pagesize
kernels.
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09f72981

userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4

由 Andrea Arcangeli 提交于 9月 22, 2015

This reverts commit 51360155 and adapts
fs/userfaultfd.c to use the old version of that function.

It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we've full control of what we
insert in the two waitqueue heads of the blocked userfaults.  No
exclusive waitqueue risks to be inserted into those two waitqueue heads
so we can as well stick to "nr == 1" of the old code and we can rely
purely on the fact no waitqueue inserted in one of the two waitqueue
heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ac5be6b4

22 9月, 2015 1 次提交

tcp/dccp: fix timewait races in timer handling · ed2e9239

由 Eric Dumazet 提交于 9月 19, 2015

When creating a timewait socket, we need to arm the timer before
allowing other cpus to find it. The signal allowing cpus to find
the socket is setting tw_refcnt to non zero value.

As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
call inet_twsk_schedule() first.

This also means we need to remove tw_refcnt changes from
inet_twsk_schedule() and let the caller handle it.

Note that because we use mod_timer_pinned(), we have the guarantee
the timer wont expire before we set tw_refcnt as we run in BH context.

To make things more readable I introduced inet_twsk_reschedule() helper.

When rearming the timer, we can use mod_timer_pending() to make sure
we do not rearm a canceled timer.

Note: This bug can possibly trigger if packets of a flow can hit
multiple cpus. This does not normally happen, unless flow steering
is broken somehow. This explains this bug was spotted ~5 months after
its introduction.

A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
but will be provided in a separate patch for proper tracking.

Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NYing Cai <ycai@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed2e9239

21 9月, 2015 4 次提交

ip6tunnel: make rx/tx bytes counters consistent · 83cf9a25

由 Nicolas Dichtel 提交于 9月 18, 2015

Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.

Before the patch, the external ipv6 header + gre header were included on
tx.

After the patch:
$ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms

--- 192.168.6.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
7: ip6gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
    RX: bytes  packets  errors  dropped overrun mcast
    84         1        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    84         1        0       0       0       0
$ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms

--- 192.168.1.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
8: ip6tnl1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
    RX: bytes  packets  errors  dropped overrun mcast
    84         1        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    84         1        0       0       0       0
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83cf9a25

net: Fix behaviour of unreachable, blackhole and prohibit routes · 0315e382

由 Nikola Forró 提交于 9月 17, 2015

Man page of ip-route(8) says following about route types:

  unreachable - these destinations are unreachable.  Packets are dis‐
  carded and the ICMP message host unreachable is generated.  The local
  senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable.  Packets are dis‐
  carded silently.  The local senders get an EINVAL error.

  prohibit - these destinations are unreachable.  Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated.  The local senders get an EACCES error.

In the inet6 address family, this was correct, except the local senders
got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
In the inet address family, all three route types generated ICMP message
net unreachable, and the local senders got ENETUNREACH error.

In both address families all three route types now behave consistently
with documentation.
Signed-off-by: NNikola Forró <nforro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0315e382

rcu: Change _wait_rcu_gp() to work around GCC bug 67055 · 66e8c57d

由 Oleg Nesterov 提交于 8月 25, 2015

Code like this in inline functions confuses some recent versions of gcc:

	const int n = const-expr;
	whatever_t array[n];

For more details, see:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67055#c13

This compiler bug results in the following failure after 114b7fd4b (rcu:
Create rcu_sync infrastructure):

	In file included from include/linux/rcupdate.h:429:0,
			  from include/linux/rcu_sync.h:5,
			  from kernel/rcu/sync.c:1:
	include/linux/rcutiny.h: In function 'rcu_barrier_sched':
	include/linux/rcutiny.h:55:20: internal compiler error: Segmentation fault
	  static inline void rcu_barrier_sched(void)

This commit therefore eliminates the constant local variable in favor of
direct use of the expression.
Reported-and-tested-by: NMark Salter <msalter@redhat.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

66e8c57d

security: fix typo in security_task_prctl · b7f76ea2

由 Jann Horn 提交于 9月 18, 2015

Signed-off-by: NJann Horn <jann@thejh.net>
Reviewed-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7f76ea2

19 9月, 2015 1 次提交

mm: fix type cast in __pfn_to_phys() · ae4f9769

由 Tyler Baker 提交于 9月 19, 2015

The various definitions of __pfn_to_phys() have been consolidated to
use a generic macro in include/asm-generic/memory_model.h. This hit
mainline in the form of 012dcef3 "mm: move __phys_to_pfn and
__pfn_to_phys to asm/generic/memory_model.h". When the generic macro
was implemented the type cast to phys_addr_t was dropped which caused
boot regressions on ARM platforms with more than 4GB of memory and
LPAE enabled.

It was suggested to use PFN_PHYS() defined in include/linux/pfn.h
as provides the correct logic and avoids further duplication.
Reported-by: Nkernelci.org bot <bot@kernelci.org>
Suggested-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NTyler Baker <tyler.baker@linaro.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

ae4f9769

18 9月, 2015 1 次提交

IB/hfi1: fix pstateinfo from returning improperly byteswapped value · aadfc3b2

由 Ira Weiny 提交于 9月 09, 2015

Byteswap link_width_downgrade_*_active values before sending on the wire.  In
addition properly define the Port State Info structure.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NChristian Gomez <christian.gomez@intel.com>
Signed-off-by: NRimmer, Todd <todd.rimmer@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

aadfc3b2