Commit c5d1d123

Authored May 13, 2016 by Flavio Pompermaier; committed by twalthr on Jul 25, 2016.
[FLINK-3901] [table] Create a RowCsvInputFormat to use as default CSV IF in Table API
Parent: 32130160

Showing 4 changed files with 1,241 additions and 18 deletions (+1241 −18):
- flink-libraries/flink-table/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java (+143 −0)
- flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/sources/CsvTableSource.scala (+11 −17)
- flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/batch/TableSourceITCase.java (+1 −1)
- flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/io/RowCsvInputFormatTest.java (+1086 −0)
flink-libraries/flink-table/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java (new file, mode 100644)
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.flink.api.java.io;

import org.apache.flink.annotation.Internal;
import org.apache.flink.api.common.io.ParseException;
import org.apache.flink.api.table.Row;
import org.apache.flink.api.table.typeutils.RowTypeInfo;
import org.apache.flink.core.fs.Path;
import org.apache.flink.types.parser.FieldParser;
import org.apache.flink.types.parser.FieldParser.ParseErrorState;

@Internal
public class RowCsvInputFormat extends CsvInputFormat<Row> {

    private static final long serialVersionUID = 1L;

    private int arity;

    public RowCsvInputFormat(Path filePath, RowTypeInfo rowTypeInfo) {
        this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, rowTypeInfo);
    }

    public RowCsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, RowTypeInfo rowTypeInfo) {
        this(filePath, lineDelimiter, fieldDelimiter, rowTypeInfo, createDefaultMask(rowTypeInfo.getArity()));
    }

    public RowCsvInputFormat(Path filePath, RowTypeInfo rowTypeInfo, int[] includedFieldsMask) {
        this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, rowTypeInfo, includedFieldsMask);
    }

    public RowCsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, RowTypeInfo rowTypeInfo, int[] includedFieldsMask) {
        this(filePath, lineDelimiter, fieldDelimiter, rowTypeInfo,
            (includedFieldsMask == null) ? createDefaultMask(rowTypeInfo.getArity()) : toBooleanMask(includedFieldsMask));
    }

    public RowCsvInputFormat(Path filePath, RowTypeInfo rowTypeInfo, boolean[] includedFieldsMask) {
        this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, rowTypeInfo, includedFieldsMask);
    }

    public RowCsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, RowTypeInfo rowTypeInfo, boolean[] includedFieldsMask) {
        super(filePath);

        if (rowTypeInfo.getArity() == 0) {
            throw new IllegalArgumentException("Row arity must be greater than 0.");
        }

        if (includedFieldsMask == null) {
            includedFieldsMask = createDefaultMask(rowTypeInfo.getArity());
        }

        this.arity = rowTypeInfo.getArity();

        setDelimiter(lineDelimiter);
        setFieldDelimiter(fieldDelimiter);

        Class<?>[] classes = new Class<?>[rowTypeInfo.getArity()];
        for (int i = 0; i < rowTypeInfo.getArity(); i++) {
            classes[i] = rowTypeInfo.getTypeAt(i).getTypeClass();
        }
        setFieldsGeneric(includedFieldsMask, classes);
    }

    @Override
    public Row fillRecord(Row reuse, Object[] parsedValues) {
        if (reuse == null) {
            reuse = new Row(arity);
        }
        for (int i = 0; i < parsedValues.length; i++) {
            reuse.setField(i, parsedValues[i]);
        }
        return reuse;
    }

    @Override
    protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int numBytes) throws ParseException {
        boolean[] fieldIncluded = this.fieldIncluded;

        int startPos = offset;
        final int limit = offset + numBytes;

        for (int field = 0, output = 0; field < fieldIncluded.length; field++) {

            // check valid start position
            if (startPos >= limit) {
                if (isLenient()) {
                    return false;
                } else {
                    throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
                }
            }

            if (fieldIncluded[field]) {
                // parse field
                @SuppressWarnings("unchecked")
                FieldParser<Object> parser = (FieldParser<Object>) this.getFieldParsers()[output];
                int latestValidPos = startPos;
                startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.getFieldDelimiter(), holders[output]);
                if (!isLenient() && parser.getErrorState() != ParseErrorState.NONE) {
                    // Row is able to handle null values
                    if (parser.getErrorState() != ParseErrorState.EMPTY_STRING) {
                        throw new ParseException(String.format(
                            "Parsing error for column %s of row '%s' originated by %s: %s.",
                            field, new String(bytes, offset, numBytes), parser.getClass().getSimpleName(), parser.getErrorState()));
                    }
                }
                holders[output] = parser.getLastResult();

                // check parse result
                if (startPos < 0) {
                    holders[output] = null;
                    startPos = skipFields(bytes, latestValidPos, limit, this.getFieldDelimiter());
                }
                output++;
            } else {
                // skip field
                startPos = skipFields(bytes, startPos, limit, this.getFieldDelimiter());
            }
        }
        return true;
    }
}
```
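For orientation, here is a minimal usage sketch (not part of the commit) of how the new format can feed a batch job. The file path and column types are invented for illustration; the produced `TypeInformation` is passed explicitly to `createInput` because `Row` carries no compile-time field types.

```scala
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.api.java.io.RowCsvInputFormat
import org.apache.flink.api.table.typeutils.RowTypeInfo
import org.apache.flink.core.fs.Path

object RowCsvSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Column types of a hypothetical input file: name, age, score.
    val rowTypeInfo = new RowTypeInfo(Seq[TypeInformation[_]](
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.INT_TYPE_INFO,
      BasicTypeInfo.DOUBLE_TYPE_INFO))

    // Uses the default "\n" line delimiter and "," field delimiter.
    val format = new RowCsvInputFormat(new Path("/tmp/people.csv"), rowTypeInfo)

    // RowTypeInfo is the TypeInformation[Row] of the produced records.
    val rows = env.createInput(format, rowTypeInfo)
    rows.print()
  }
}
```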
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/sources/CsvTableSource.scala
```diff
@@ -23,11 +23,13 @@ import org.apache.flink.api.java.io.TupleCsvInputFormat
 import org.apache.flink.api.java.tuple.Tuple
 import org.apache.flink.api.java.typeutils.{TupleTypeInfo, TupleTypeInfoBase}
 import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
-import org.apache.flink.api.table.{Row, TableException}
+import org.apache.flink.api.table.Row
 import org.apache.flink.core.fs.Path
+import org.apache.flink.api.table.typeutils.RowTypeInfo
+import org.apache.flink.api.java.io.RowCsvInputFormat
 
 /**
-  * A [[TableSource]] for simple CSV files with up to 25 fields.
+  * A [[TableSource]] for simple CSV files with a (logically) unlimited number of fields.
   *
   * @param path The path to the CSV file.
   * @param fieldNames The names of the table fields.
@@ -49,21 +51,13 @@ class CsvTableSource(
     ignoreFirstLine: Boolean = false,
     ignoreComments: String = null,
     lenient: Boolean = false)
-  extends BatchTableSource[Tuple] {
-
-  if (fieldNames.length != fieldTypes.length) {
-    throw new TableException("Number of field names and field types must be equal.")
-  }
-
-  if (fieldNames.length > 25) {
-    throw new TableException("Only up to 25 fields supported with this CsvTableSource.")
-  }
+  extends BatchTableSource[Row] {
 
   /** Returns the data of the table as a [[DataSet]] of [[Row]]. */
-  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Tuple] = {
-    val typeInfo = getReturnType.asInstanceOf[TupleTypeInfoBase[Tuple]]
-    val inputFormat = new TupleCsvInputFormat(new Path(path), rowDelim, fieldDelim, typeInfo)
+  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] = {
+    val typeInfo = getReturnType.asInstanceOf[RowTypeInfo]
+    val inputFormat = new RowCsvInputFormat(new Path(path), rowDelim, fieldDelim, typeInfo)
     inputFormat.setSkipFirstLineAsHeader(ignoreFirstLine)
     inputFormat.setLenient(lenient)
@@ -86,8 +80,8 @@ class CsvTableSource(
   /** Returns the number of fields of the table. */
   override def getNumberOfFields: Int = fieldNames.length
 
-  /** Returns the [[TypeInformation]] for the return type of the [[CsvTableSource]]. */
-  override def getReturnType: TypeInformation[Tuple] = {
-    new TupleTypeInfo(fieldTypes.toArray: _*)
+  /** Returns the [[RowTypeInfo]] for the return type of the [[CsvTableSource]]. */
+  override def getReturnType: RowTypeInfo = {
+    new RowTypeInfo(fieldTypes)
   }
 }
```
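As a rough illustration of the reworked source, here is a hedged sketch of constructing and reading it, using only constructor parameters visible in this diff (path, fieldNames, fieldTypes, ignoreFirstLine, lenient); the file name and values are invented.

```scala
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.api.table.sources.CsvTableSource

object CsvTableSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // More than 25 fields would now be fine; three suffice for the sketch.
    val source = new CsvTableSource(
      path = "/tmp/persons.csv",
      fieldNames = Array("name", "age", "score"),
      fieldTypes = Array[TypeInformation[_]](
        BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.INT_TYPE_INFO,
        BasicTypeInfo.DOUBLE_TYPE_INFO),
      ignoreFirstLine = true, // skip a header line
      lenient = true)         // skip malformed rows instead of failing

    // getReturnType is now a RowTypeInfo and getDataSet yields a DataSet of Row.
    val rows = source.getDataSet(env)
    rows.print()
  }
}
```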
flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/batch/TableSourceITCase.java
```diff
@@ -84,7 +84,7 @@ public class TableSourceITCase extends TableProgramsTestBase {
 	public static class TestBatchTableSource implements BatchTableSource<Row> {
 
-		private TypeInformation[] fieldTypes = new TypeInformation<?>[] {
+		private TypeInformation<?>[] fieldTypes = new TypeInformation<?>[] {
 			BasicTypeInfo.STRING_TYPE_INFO,
 			BasicTypeInfo.LONG_TYPE_INFO,
 			BasicTypeInfo.INT_TYPE_INFO
```
flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/io/RowCsvInputFormatTest.java (new file, mode 100644)

This diff is collapsed in the original view (1,086 added lines).
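Since the test file itself is collapsed, here is a hypothetical, self-contained check of the null handling that parseRecord implements above, as an indication of the behavior under test. It is not the actual RowCsvInputFormatTest; the file contents and delimiters are invented, and it assumes the standard Flink InputFormat lifecycle (configure, createInputSplits, open, nextRecord).

```scala
import java.io.{File, FileWriter}
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.io.RowCsvInputFormat
import org.apache.flink.api.table.Row
import org.apache.flink.api.table.typeutils.RowTypeInfo
import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.Path

object RowCsvFormatCheck {
  def main(args: Array[String]): Unit = {
    // Two well-formed rows and one with an empty second field.
    val file = File.createTempFile("row-csv", ".tmp")
    file.deleteOnExit()
    val writer = new FileWriter(file)
    writer.write("hello|42\nbye|\nworld|7\n")
    writer.close()

    val typeInfo = new RowTypeInfo(Seq[TypeInformation[_]](
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.INT_TYPE_INFO))

    val format = new RowCsvInputFormat(new Path(file.toURI.toString), "\n", "|", typeInfo)
    format.configure(new Configuration)
    val split = format.createInputSplits(1)(0)
    format.open(split)

    // The empty INT field is reported as EMPTY_STRING by the field parser,
    // which parseRecord maps to a null field instead of a failure.
    var row: Row = new Row(2)
    row = format.nextRecord(row)
    while (row != null) {
      println(row) // expect: hello,42 / bye,null / world,7
      row = format.nextRecord(row)
    }
    format.close()
  }
}
```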