1. 18 7月, 2014 15 次提交
    • H
      scsi: Remove CONFIG_SCSI_MULTI_LUN · c309b351
      Hannes Reinecke 提交于
      Obsolete; either use 'max_lun' if the host supports only a
      limited number of LUNs or BLIST_NOLUN if the target has
      problems addressing more than one LUN.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NEwan Milne <emilne@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      c309b351
    • D
      sg: O_EXCL and other lock handling · cc833acb
      Douglas Gilbert 提交于
      This addresses a problem reported by Vaughan Cao concerning
      the correctness of the O_EXCL logic in the sg driver. POSIX
      doesn't defined O_EXCL semantics on devices but "allow only
      one open file descriptor at a time per sg device" is a rough
      definition. The sg driver's semantics have been to wait
      on an open() when O_NONBLOCK is not given and there are
      O_EXCL headwinds. Nasty things can happen during that wait
      such as the device being detached (removed). So multiple
      locks are reworked in this patch making it large and hard
      to break down into digestible bits.
      
      This patch is against Linus's current git repository which
      doesn't include any sg patches sent in the last few weeks.
      Hence this patch touches as little as possible that it
      doesn't need to and strips out most SCSI_LOG_TIMEOUT()
      changes in v3 because Hannes said he was going to rework all
      that stuff.
      
      The sg3_utils package has several test programs written to
      test this patch. See examples/sg_tst_excl*.cpp .
      
      Not all the locks and flags in sg have been re-worked in
      this patch, notably sg_request::done . That can wait for
      a follow-up patch if this one meets with approval.
      Signed-off-by: NDouglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      cc833acb
    • D
      sg: add SG_FLAG_Q_AT_TAIL flag · 16070cc1
      Douglas Gilbert 提交于
      When the SG_IO ioctl was copied into the block layer and
      later into the bsg driver, subtle differences emerged.
      
      One difference is the way injected commands are queued through
      the block layer (i.e. this is not SCSI device queueing nor SATA
      NCQ). Summarizing:
         - SG_IO in the block layer: blk_exec*(at_head=false)
         - sg SG_IO: at_head=true
         - bsg SG_IO: at_head=true
      
      Some time ago Boaz Harrosh introduced a sg v4 flag called
      BSG_FLAG_Q_AT_TAIL to override the bsg driver default.
      This patch does the equivalent for the sg driver.
      
      ChangeLog:
           Introduce SG_FLAG_Q_AT_TAIL flag to cause commands
           to be injected into the block layer with
           at_head=false.
      Signed-off-by: NDouglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: NMike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: NEwan D. Milne <emilne@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      16070cc1
    • D
      sg: relax 16 byte cdb restriction · 65c26a0f
      Douglas Gilbert 提交于
       - remove the 16 byte CDB (SCSI command) length limit from the sg driver
         by handling longer CDBs the same way as the bsg driver. Remove comment
         from sg.h public interface about the cmd_len field being limited to 16
         bytes.
       - remove some dead code caused by this change
       - cleanup comment block at the top of sg.h, fix urls
      Signed-off-by: NDouglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: NMike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      65c26a0f
    • M
      sd: Limit transfer length · bcdb247c
      Martin K. Petersen 提交于
      Until now the per-command transfer length has exclusively been gated by
      the max_sectors parameter in the scsi_host template. Given that the size
      of this parameter has been bumped to an unsigned int we have to be
      careful not to exceed the target device's capabilities.
      
      If the if the device specifies a Maximum Transfer Length in the Block
      Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
      devices that have use_16_for_rw set and 0xffff for the rest. We then
      combine the chosen disk limit with max_sectors in the host template. The
      smaller of the two will be used to set the max_hw_sectors queue limit.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NEwan D. Milne <emilne@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      bcdb247c
    • C
      sd: bad return code of init_sd · 8d964478
      Clément Calmels 提交于
      In init_sd function, if kmem_cache_create or mempool_create_slab_pools
      calls fail, the error will not be correclty reported because
      class_register previously set the value of err to 0.
      Signed-off-by: NClément Calmels <clement.calmels@free.fr>
      Reviewed-by: NEwan D. Milne <emilne@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      8d964478
    • V
      sd: notify block layer when using temporary change to cache_type · cb2fb68d
      Vaughan Cao 提交于
      This is a fix for commit 39c60a09
      
        "sd: fix array cache flushing bug causing performance problems"
      
      We must notify the block layer via q->flush_flags after a temporary change
      of the cache_type to write through.  Without this, a SYNCHRONIZE CACHE
      command will still be generated.  This patch factors out a helper that
      can be called from sd_revalidate_disk and cache_type_store.
      Signed-off-by: NVaughan Cao <vaughan.cao@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      cb2fb68d
    • A
      scsi_debug: allow huge transfer length for read/write commands · 6bb5e6e7
      Akinobu Mita 提交于
      This change enables to test read/write commands with huge transfer
      length such as 1GB.  For example:
      
      	# modprobe scsi_debug dev_size_mb=1024 clustering=1 opts=1
      	# cat /sys/block/$DEV/queue/max_hw_sectors_kb > \
      		/sys/block/$DEV/queue/max_sectors_kb
      	# fio --name=test --rw=write --bs=1g --size=1g --filename=/dev/$DEV \
      		--mem=mmaphuge  --direct=1
      
      The data type of max_sectors in scsi_host_template has been extended
      to unsigned int by the previous change.  So we can increase it from
      0xffff to 0xffffffff to allow such huge transfer length.
      
      Also, this increases sg_tablesize and max_segment_size, otherwise the
      maximum transfer length is limited to 64MB.
      (sg_tablesize * max_segment_size = 256 * 256KB)
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Acked by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      6bb5e6e7
    • A
      scsi: increase upper limit for max_sectors · 8ed5a4d2
      Akinobu Mita 提交于
      max_sectors in struct Scsi_Host specifies maximum number of sectors
      allowed in a single SCSI command.  The data type of max_sectors is
      unsigned short, so the maximum transfer length per SCSI command is
      limited to less than 256MB in 4096-bytes sector size. (0xffff * 4096)
      
      This commit increases the SCSI mid level's limitation for max_sectors
      upto the block layer's limitation for max_hw_sectors by extending the
      data type of max_sectors in struct Scsi_Host and scsi_host_template,
      so that SCSI lower level drivers can specify more than 0xffff.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      8ed5a4d2
    • A
      sd: use READ_16 or WRITE_16 when transfer length is greater than 0xffff · e430cbc8
      Akinobu Mita 提交于
      This change makes the scsi disk driver handle the requests whose
      transfer length is greater than 0xffff with READ_16 or WRITE_16.
      
      However, this is a preparation for extending the data type of
      max_sectors in struct Scsi_Host and scsi_host_template.  So, it is
      impossible to happen this condition for now, because SCSI low-level
      drivers can not specify max_sectors greater than 0xffff due to the
      data type limitation.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      e430cbc8
    • A
      sg: prevent integer overflow when converting from sectors to bytes · 46f69e6a
      Akinobu Mita 提交于
      This prevents integer overflow when converting the request queue's
      max_sectors from sectors to bytes.  However, this is a preparation for
      extending the data type of max_sectors in struct Scsi_Host and
      scsi_host_template.  So, it is impossible to happen this integer
      overflow for now, because SCSI low-level drivers can not specify
      max_sectors greater than 0xffff due to the data type limitation.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Acked by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      46f69e6a
    • B
      scsi: remove two cancel_delayed_work() calls from the mid-layer · fcc95a76
      Bart Van Assche 提交于
      scsi_put_command() is either invoked before blk_start_request() or
      after block layer processing has completed.  scsi_cmnd.abort_work
      is scheduled from inside the SCSI timeout handler.  The block layer
      guarantees that either the regular completion handler
      (softirq_done_fn()) or the timeout handler (rq_timed_out_fn()) is
      invoked but not both. This means that scsi_put_command() is never
      invoked while abort_work is scheduled.  Hence remove the
      cancel_delayed_work() call from scsi_put_command().
      
      Similarly, scsi_abort_command() is only invoked from the SCSI
      timeout handler. If scsi_abort_command() is invoked for a SCSI
      command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
      scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
      hence that scsi_cmnd.abort_work is no longer pending. Hence also
      remove the cancel_delayed_work() call from scsi_abort_command().
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      fcc95a76
    • J
      scsi: handle flush errors properly · 89fb4cd1
      James Bottomley 提交于
      Flush commands don't transfer data and thus need to be special cased
      in the I/O completion handler so that we can propagate errors to
      the block layer and filesystem.
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      Reported-by: NSteven Haber <steven@qumulo.com>
      Tested-by: NSteven Haber <steven@qumulo.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      89fb4cd1
    • L
      Merge tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 59ca9ee4
      Linus Torvalds 提交于
      Pull Xen fixes from Konrad Rzeszutek Wilk:
       "Two fixes found during migration of PV guests.  David would be the one
        doing this pull but he is on vacation.
      
        Fixes:
         - fix console deadlock when resuming PV guests
         - fix regression hit when ballooning and resuming PV guests"
      
      * tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/balloon: set ballooned out pages as invalid in p2m
        xen/manage: fix potential deadlock when resuming the console
      59ca9ee4
    • L
      Merge tag 'trace-fixes-v3.16-rc5-v2' of... · 22d36854
      Linus Torvalds 提交于
      Merge tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "A few more fixes for ftrace infrastructure.
      
        I was cleaning out my INBOX and found two fixes from zhangwei from a
        year ago that were lost in my mail.  These fix an inconsistency
        between trace_puts() and the way trace_printk() works.  The reason
        this is important to fix is because when trace_printk() doesn't have
        any arguments, it turns into a trace_puts().  Not being able to enable
        a stack trace against trace_printk() because it does not have any
        arguments is quite confusing.  Also, the fix is rather trivial and low
        risk.
      
        While porting some changes to PowerPC I discovered that it still has
        the function graph tracer filter bug that if you also enable stack
        tracing the function graph tracer filter is ignored.  I fixed that up.
      
        Finally, Martin Lau, fixed a bug that would cause readers of the
        ftrace ring buffer to block forever even though it was suppose to be
        NONBLOCK"
      
      This also includes the fix from an earlier pull request:
      
       "Oleg Nesterov fixed a memory leak that happens if a user creates a
        tracing instance, sets up a filter in an event, and then removes that
        instance.  The filter allocates memory that is never freed when the
        instance is destroyed"
      
      * tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ring-buffer: Fix polling on trace_pipe
        tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs
        tracing: Fix graph tracer with stack tracer on other archs
        tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
        tracing: instance_rmdir() leaks ftrace_event_file->filter
      22d36854
  2. 17 7月, 2014 4 次提交
  3. 16 7月, 2014 4 次提交
  4. 15 7月, 2014 17 次提交
    • Z
      tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs · f0160a5a
      zhangwei(Jovi) 提交于
      The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
      so add it, to be consistent with __trace_printk/__trace_bprintk.
      Those functions are all called by the same function: trace_printk().
      
      Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com
      
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f0160a5a
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 0b632204
      Linus Torvalds 提交于
      Pull fuse fixes from Miklos Szeredi:
       "This contains miscellaneous fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: replace count*size kzalloc by kcalloc
        fuse: release temporary page if fuse_writepage_locked() failed
        fuse: restructure ->rename2()
        fuse: avoid scheduling while atomic
        fuse: handle large user and group ID
        fuse: inode: drop cast
        fuse: ignore entry-timeout on LOOKUP_REVAL
        fuse: timeout comparison fix
      0b632204
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5615f9f8
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Bluetooth pairing fixes from Johan Hedberg.
      
       2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
          from Max Stepanov.
      
       3) New iwlwifi chip IDs, from Oren Givon.
      
       4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
          Wang.
      
       5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.
      
       6) Fix double SKB free in openvswitch, from Andy Zhou.
      
       7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
          crashes, from Eric Dumazet.
      
       8) Interpret the NAPI budget correctly in the new systemport driver,
          from Florian Fainelli.
      
       9) VLAN code frees percpu stats in the wrong place, leading to crashes
          in the get stats handler.  From Eric Dumazet.
      
      10) TCP sockets doing a repair can crash with a divide by zero, because
          we invoke tcp_push() with an MSS value of zero.  Just skip that part
          of the sendmsg paths in repair mode.  From Christoph Paasch.
      
      11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.
      
      12) Don't ignore path MTU icmp messages with a zero mtu, machines out
          there still spit them out, and all of our per-protocol handlers for
          PMTU can cope with it just fine.  From Edward Allcutt.
      
      13) Some NETDEV_CHANGE notifier invocations were not passing in the
          correct kind of cookie as the argument, from Loic Prylli.
      
      14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
          Maloy.
      
      15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
          from Dmitry Popov.
      
      16) Fix skb->sk assigned without taking a reference to 'sk' in
          appletalk, from Andrey Utkin.
      
      17) Fix some info leaks in ULP event signalling to userspace in SCTP,
          from Daniel Borkmann.
      
      18) Fix deadlocks in HSO driver, from Olivier Sobrie.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
        hso: fix deadlock when receiving bursts of data
        hso: remove unused workqueue
        net: ppp: don't call sk_chk_filter twice
        mlx4: mark napi id for gro_skb
        bonding: fix ad_select module param check
        net: pppoe: use correct channel MTU when using Multilink PPP
        neigh: sysctl - simplify address calculation of gc_* variables
        net: sctp: fix information leaks in ulpevent layer
        MAINTAINERS: update r8169 maintainer
        net: bcmgenet: fix RGMII_MODE_EN bit
        tipc: clear 'next'-pointer of message fragments before reassembly
        r8152: fix r8152_csum_workaround function
        be2net: set EQ DB clear-intr bit in be_open()
        GRE: enable offloads for GRE
        farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
        igb: do a reset on SR-IOV re-init if device is down
        igb: Workaround for i210 Errata 25: Slow System Clock
        usbnet: smsc95xx: add reset_resume function with reset operation
        dp83640: Always decode received status frames
        r8169: disable L23
        ...
      5615f9f8
    • S
      tracing: Fix graph tracer with stack tracer on other archs · 5f8bf2d2
      Steven Rostedt (Red Hat) 提交于
      Running my ftrace tests on PowerPC, it failed the test that checks
      if function_graph tracer is affected by the stack tracer. It was.
      Looking into this, I found that the update_function_graph_func()
      must be called even if the trampoline function is not changed.
      This is because archs like PowerPC do not support ftrace_ops being
      passed by assembly and instead uses a helper function (what the
      trampoline function points to). Since this function is not changed
      even when multiple ftrace_ops are added to the code, the test that
      falls out before calling update_function_graph_func() will miss that
      the update must still be done.
      
      Call update_function_graph_function() for all calls to
      update_ftrace_function()
      
      Cc: stable@vger.kernel.org # 3.3+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5f8bf2d2
    • Z
      tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs · 8abfb872
      zhangwei(Jovi) 提交于
      Currently trace option stacktrace is not applicable for
      trace_printk with constant string argument, the reason is
      in __trace_puts/__trace_bputs ftrace_trace_stack is missing.
      
      In contrast, when using trace_printk with non constant string
      argument(will call into __trace_printk/__trace_bprintk), then
      trace option stacktrace is workable, this inconstant result
      will confuses users a lot.
      
      Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8abfb872
    • T
      ALSA: hda - Fix broken PM due to incomplete i915 initialization · 4da63c6f
      Takashi Iwai 提交于
      When the initialization of Intel HDMI controller fails due to missing
      i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
      the driver discontinues the probe.  However, since the probe was done
      asynchronously, the driver object still remains, thus the relevant PM
      ops are still called at suspend/resume. This results in the bad access
      to the incomplete audio card object, eventually leads to Oops or stall
      at PM.
      
      This patch adds the missing checks of chip->init_failed flag at each
      PM callback in order to fix the problem above.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      4da63c6f
    • O
      hso: fix deadlock when receiving bursts of data · 8f9818af
      Olivier Sobrie 提交于
      When the module sends bursts of data, sometimes a deadlock happens in
      the hso driver when the tty buffer doesn't get the chance to be flushed
      quickly enough.
      
      Remove the endless while loop in function put_rxbuf_data() which is
      called by the urb completion handler.
      If there isn't enough room in the tty buffer, discards all the data
      received in the URB.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dan Williams <dcbw@redhat.com>
      Cc: Jan Dumon <j.dumon@option.com>
      Signed-off-by: NOlivier Sobrie <olivier@sobrie.be>
      Acked-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f9818af
    • O
      hso: remove unused workqueue · 5c763edf
      Olivier Sobrie 提交于
      The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere
      in the code. So, remove it.
      Signed-off-by: NOlivier Sobrie <olivier@sobrie.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c763edf
    • L
      Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 1b81e881
      Linus Torvalds 提交于
      Pull firewire fix from Stefan Richter:
       "The 1394 drivers cannot and are not supposed to be built on platforms
        which don't provide the DMA mapping API (regression since v3.16-rc1
        with CONFIG_COMPILE_TEST=y on some architectures)"
      
      * tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA
      1b81e881
    • L
      Merge git://git.kvack.org/~bcrl/aio-fixes · 8ec8ba8e
      Linus Torvalds 提交于
      Pull another aio fix from Ben LaHaise:
       "put_reqs_available() can now be called from within irq context, which
        means that it (and its sibling function get_reqs_available()) now need
        to be irq-safe, not just preempt-safe"
      
      * git://git.kvack.org/~bcrl/aio-fixes:
        aio: protect reqs_available updates from changes in interrupt handlers
      8ec8ba8e
    • S
      net/l2tp: don't fall back on UDP [get|set]sockopt · 3cf521f7
      Sasha Levin 提交于
      The l2tp [get|set]sockopt() code has fallen back to the UDP functions
      for socket option levels != SOL_PPPOL2TP since day one, but that has
      never actually worked, since the l2tp socket isn't an inet socket.
      
      As David Miller points out:
      
        "If we wanted this to work, it'd have to look up the tunnel and then
         use tunnel->sk, but I wonder how useful that would be"
      
      Since this can never have worked so nobody could possibly have depended
      on that functionality, just remove the broken code and return -EINVAL.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NJames Chapman <jchapman@katalix.com>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Cc: Phil Turnbull <phil.turnbull@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3cf521f7
    • C
      net: ppp: don't call sk_chk_filter twice · 3916a319
      Christoph Schulz 提交于
      Commit 568f194e ("net: ppp: use
      sk_unattached_filter api") causes sk_chk_filter() to be called twice when
      setting a PPP pass or active filter. This applies to both the generic PPP
      subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP
      subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from
      within get_filter(). The second one is through the call chain
      
        ppp_ioctl() or isdn_ppp_ioctl()
        --> sk_unattached_filter_create()
            --> __sk_prepare_filter()
                --> sk_chk_filter()
      
      The first call from within get_filter() should be deleted as get_filter() is
      called just before calling sk_unattached_filter_create() later on, which
      eventually calls sk_chk_filter() anyway.
      
      For 3.15.x, this proposed change is a bugfix rather than a pure optimization as
      in that branch, sk_chk_filter() may replace filter codes by other codes which
      are not recognized when executing sk_chk_filter() a second time. So with
      3.15.x, if sk_chk_filter() is called twice, the second invocation may yield
      EINVAL (this depends on the filter codes found in the filter to be set, but
      because the replacement is done for frequently used codes, this is almost
      always the case). The net effect is that setting pass and/or active PPP filters
      does not work anymore, since sk_unattached_filter_create() always returns
      EINVAL due to the second call to sk_chk_filter(), regardless whether the filter
      was originally sane or not.
      Signed-off-by: NChristoph Schulz <develop@kristov.de>
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3916a319
    • J
      mlx4: mark napi id for gro_skb · 32b333fe
      Jason Wang 提交于
      Napi id was not marked for gro_skb, this will lead rx busy loop won't
      work correctly since they stack never try to call low latency receive
      method because of a zero socket napi id. Fix this by marking napi id
      for gro_skb.
      
      The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
      (from 20531.68 to 30610.88).
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32b333fe
    • N
      bonding: fix ad_select module param check · 548d28bd
      Nikolay Aleksandrov 提交于
      Obvious copy/paste error when I converted the ad_select to the new
      option API. "lacp_rate" there should be "ad_select" so we can get the
      proper value.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: David S. Miller <davem@davemloft.net>
      
      Fixes: 9e5f5eeb ("bonding: convert ad_select to use the new option
      API")
      Reported-by: NKarim Scheik <karim.scheik@prisma-solutions.at>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      548d28bd
    • C
      net: pppoe: use correct channel MTU when using Multilink PPP · a8a3e41c
      Christoph Schulz 提交于
      The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
      ppp_generic module) tries to determine how big a fragment might be. According
      to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
      corresponding comment and code in ppp_mp_explode():
      
      		/*
      		 * hdrlen includes the 2-byte PPP protocol field, but the
      		 * MTU counts only the payload excluding the protocol field.
      		 * (RFC1661 Section 2)
      		 */
      		mtu = pch->chan->mtu - (hdrlen - 2);
      
      However, the pppoe module *does* include the PPP protocol field in the channel
      MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
      certain circumstances (one byte if PPP protocol compression is used, two
      otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
      module has to subtract two bytes from the channel MTU. This error only
      manifests itself when using Multilink PPP, as otherwise the channel MTU is not
      used anywhere.
      
      In the following, I will describe how to reproduce this bug. We configure two
      pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
      a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
      computed by adding the two link MTUs and subtracting the MP header twice, which
      is 4 bytes long.) The necessary pppd statements on both sides are "multilink
      mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
      rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
      side, we additionally need to start two pppoe-server instances to be able to
      establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
      of the PPP network interface to the MRRU (2976) on both sides of the connection
      in order to make use of the higher bandwidth. (If we didn't do that, IP
      fragmentation would kick in, which we want to avoid.)
      
      Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
      server over the PPP link. This results in the following network packet:
      
         2948 (echo payload)
       +    8 (ICMPv4 header)
       +   20 (IPv4 header)
      ---------------------
         2976 (PPP payload)
      
      These 2976 bytes do not exceed the MTU of the PPP network interface, so the
      IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
      prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
      than the negotiated MRRU. So this packet would have to be divided in three
      fragments. But this does not happen as each link MTU is assumed to be two bytes
      larger. So this packet is diveded into two fragments only, one of size 1489 and
      one of size 1488. Now we have for that bigger fragment:
      
         1489 (PPP payload)
       +    4 (MP header)
       +    2 (PPP protocol field for the MP payload (0x3d))
       +    6 (PPPoE header)
      --------------------------
         1501 (Ethernet payload)
      
      This packet exceeds the link MTU and is discarded.
      
      If one configures the link MTU on the client side to 1501, one can see the
      discarded Ethernet frames with tcpdump running on the client. A
      
      ping -s 2948 -c 1 192.168.15.254
      
      leads to the smaller fragment that is correctly received on the server side:
      
      (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [end], length 1492
      
      and to the bigger fragment that is not received on the server side:
      
      (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
      52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
        length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
        Flags [begin], length 1493
      
      With the patch below, we correctly obtain three fragments:
      
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [begin], length 1492
      52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [none], length 1492
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
        Flags [end], length 5
      
      And the ICMPv4 echo request is successfully received at the server side:
      
      IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
        length 2976)
          192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
            length 2956
      
      The bug was introduced in commit c9aa6895
      ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
      to 3.10 upwards but the fix can be applied (with minor modifications) to
      kernels as old as 2.6.32.
      Signed-off-by: NChristoph Schulz <develop@kristov.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8a3e41c
    • M
      neigh: sysctl - simplify address calculation of gc_* variables · 9ecf07a1
      Mathias Krause 提交于
      The code in neigh_sysctl_register() relies on a specific layout of
      struct neigh_table, namely that the 'gc_*' variables are directly
      following the 'parms' member in a specific order. The code, though,
      expresses this in the most ugly way.
      
      Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
      the table. This way we can refer to the 'gc_*' variables directly.
      
      Similarly seen in the grsecurity patch, written by Brad Spengler.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ecf07a1
    • D
      net: sctp: fix information leaks in ulpevent layer · 8f2e5ae4
      Daniel Borkmann 提交于
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer. In particular, struct sctp_sndrcvinfo
      has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. But also struct sctp_remote_error
      contains a 2 bytes hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
      copy that buffer over from user space first and thus don't need
      to care about it in that cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f2e5ae4