Commit 70884cfb authored by Mark Needham

more apoc.import

Parent 7519f76b
This procedure imports CSV files that comply with the link:https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin-import/#import-tool-header-format/[Neo4j import tool's header format].
=== Nodes
The following file contains two people:
.persons.csv
[source,text]
----
id:ID,name:STRING
1,John
2,Jane
----
We'll place this file into the `import` directory of our Neo4j instance.
We can create two `Person` nodes, with their `id` and `name` properties set, by running the following query:
[source,cypher]
----
CALL apoc.import.csv([{fileName: 'file:/persons.csv', labels: ['Person']}], [], {});
----
.Results
[opts="header"]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "progress.csv" | "file" | "csv" | 2 | 0 | 4 | 7 | 0 | -1 | 0 | TRUE | NULL
|===
We can check what's been imported by running the following query:
[source,cypher]
----
MATCH (p:Person)
RETURN p;
----
.Results
[opts="header"]
|===
| p
| (:Person {name: "John", id: "1"})
| (:Person {name: "Jane", id: "2"})
|===
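Rather than typing `persons.csv` by hand, you can also generate it. The following is a minimal Python sketch that uses the standard `csv` module. `write_persons_csv` is a hypothetical helper, not part of APOC. Where you write the output is up to you, as long as the file ends up in the Neo4j `import` directory before the procedure is called.

[source,python]
----
import csv
import io

# Header fields follow the import tool's `name:TYPE` convention:
# `id:ID` marks the node identifier column, `name:STRING` a string property.
HEADER = ["id:ID", "name:STRING"]
ROWS = [["1", "John"], ["2", "Jane"]]


def write_persons_csv(out):
    """Write the header row followed by one row per person to a text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(ROWS)


if __name__ == "__main__":
    # In practice `out` would be a file opened inside the Neo4j `import` directory.
    buf = io.StringIO()
    write_persons_csv(buf)
    print(buf.getvalue(), end="")
----

No field contains a comma or quote, so `csv.writer` emits the values unquoted, exactly as in the listing above.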
=== Nodes and relationships
The following files contain nodes and relationships in CSV format:
.people-nodes.csv
[source,text]
----
:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de
----
.knows-rels.csv
[source,text]
----
:START_ID|:END_ID|since:INT
1|2|2016
----
We will import two `Person` nodes and a `KNOWS` relationship between them, with the relationship's `since` property set.
These files use `|` as the field delimiter and `,` as the array delimiter instead of the defaults (`,` and `;`), and their IDs are numeric, so the config map sets `delimiter`, `arrayDelimiter`, and `stringIds: false`:
[source,cypher]
----
CALL apoc.import.csv(
[{fileName: 'file:/people-nodes.csv', labels: ['Person']}],
[{fileName: 'file:/knows-rels.csv', type: 'KNOWS'}],
{delimiter: '|', arrayDelimiter: ',', stringIds: false}
);
----
.Results
[opts="header"]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "progress.csv" | "file" | "csv" | 2 | 1 | 7 | 7 | 0 | -1 | 0 | TRUE | NULL
|===
We can check what's been imported by running the following query:
[source,cypher]
----
MATCH path = (p1:Person)-[:KNOWS]->(p2:Person)
RETURN path;
----
.Results
[opts="header"]
|===
| path
| (:Person {name: "John", speaks: ["en", "fr"], __csv_id: 1})-[:KNOWS {since: 2016}]->(:Person {name: "Jane", speaks: ["en", "de"], __csv_id: 2})
|===
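The following simplified Python sketch shows how the two files above could be interpreted, to make the header conventions concrete. Each header field is `name:TYPE`, and a trailing `[]` marks an array whose value is split on the array delimiter. With `stringIds: false`, the ID columns are parsed as integers. This is an illustration under those assumptions, not APOC's implementation. `parse_csv` and `count_properties` are hypothetical helpers, and only the `STRING` and `INT` types are handled.

[source,python]
----
import csv
import io

# Column types that hold node identifiers rather than ordinary values.
ID_TYPES = {"ID", "START_ID", "END_ID"}
# Only the value types used in this example; the real header format supports more.
CONVERTERS = {"STRING": str, "INT": int}

NODES_CSV = """\
:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de
"""
RELS_CSV = """\
:START_ID|:END_ID|since:INT
1|2|2016
"""


def parse_field(field):
    """Split 'speaks:STRING[]' into ('speaks', 'STRING', True)."""
    name, _, typ = field.partition(":")
    is_array = typ.endswith("[]")
    return name, (typ[:-2] if is_array else typ), is_array


def convert(value, typ, string_ids):
    """Convert one raw value according to its column type."""
    if typ in ID_TYPES:
        return value if string_ids else int(value)
    return CONVERTERS[typ](value)


def parse_csv(text, delimiter="|", array_delimiter=",", string_ids=False):
    """Return one dict per data row; unnamed ID columns are keyed by their type."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [parse_field(f) for f in next(reader)]
    records = []
    for row in reader:
        record = {}
        for (name, typ, is_array), value in zip(header, row):
            if is_array:
                parsed = [convert(v, typ, string_ids) for v in value.split(array_delimiter)]
            else:
                parsed = convert(value, typ, string_ids)
            record[name or typ] = parsed
        records.append(record)
    return records


def count_properties(nodes, rels):
    """Every node column is stored; START_ID/END_ID only connect the relationship."""
    node_props = sum(len(n) for n in nodes)
    rel_props = sum(1 for r in rels for key in r if key not in ID_TYPES)
    return node_props + rel_props


if __name__ == "__main__":
    nodes, rels = parse_csv(NODES_CSV), parse_csv(RELS_CSV)
    print(nodes)
    print(rels)
    print(count_properties(nodes, rels))
----

With these files, the sketch yields three values per node and one relationship property (`since`). The unnamed `:ID` column is counted because the import keeps it as `__csv_id`. The total is the 7 properties reported in the results table for this import.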
[[import-graphml-simple]]
=== Import simple GraphML file
The `simple.graphml` file contains a graph representation from the http://graphml.graphdrawing.org/primer/graphml-primer.html[GraphML primer^].
image::apoc.import.graphml.simple-diagram.png[]
.simple.graphml
[source,xml]
----
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="undirected">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
<node id="n4"/>
<node id="n5"/>
<node id="n6"/>
<node id="n7"/>
<node id="n8"/>
<node id="n9"/>
<node id="n10"/>
<edge source="n0" target="n2"/>
<edge source="n1" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n3" target="n5"/>
<edge source="n3" target="n4"/>
<edge source="n4" target="n6"/>
<edge source="n6" target="n5"/>
<edge source="n5" target="n7"/>
<edge source="n6" target="n8"/>
<edge source="n8" target="n7"/>
<edge source="n8" target="n9"/>
<edge source="n8" target="n10"/>
</graph>
</graphml>
----
.The following imports a graph based on `simple.graphml`
[source,cypher]
----
CALL apoc.import.graphml("http://graphml.graphdrawing.org/primer/simple.graphml", {})
----
If we run this query, we'll see the following output:
.Results
[opts="header"]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "http://graphml.graphdrawing.org/primer/simple.graphml" | "file" | "graphml" | 11 | 12 | 0 | 618 | 0 | -1 | 0 | TRUE | NULL
|===
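You can cross-check the `nodes`, `relationships`, and `properties` counts in the table above against the file itself. Below is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree`. `count_elements` is a hypothetical helper, and the inline copy of `simple.graphml` omits the XML prolog and schema attributes.

[source,python]
----
import xml.etree.ElementTree as ET

# GraphML elements live in this namespace, so queries need a prefix mapping.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# simple.graphml without the XML prolog and schema attributes.
SIMPLE_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/><node id="n1"/><node id="n2"/><node id="n3"/>
    <node id="n4"/><node id="n5"/><node id="n6"/><node id="n7"/>
    <node id="n8"/><node id="n9"/><node id="n10"/>
    <edge source="n0" target="n2"/><edge source="n1" target="n2"/>
    <edge source="n2" target="n3"/><edge source="n3" target="n5"/>
    <edge source="n3" target="n4"/><edge source="n4" target="n6"/>
    <edge source="n6" target="n5"/><edge source="n5" target="n7"/>
    <edge source="n6" target="n8"/><edge source="n8" target="n7"/>
    <edge source="n8" target="n9"/><edge source="n8" target="n10"/>
  </graph>
</graphml>"""


def count_elements(graphml_text):
    """Return (node count, edge count, data element count) for a GraphML document."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall("./g:graph/g:node", NS)
    edges = root.findall("./g:graph/g:edge", NS)
    data = root.findall(".//g:data", NS)
    return len(nodes), len(edges), len(data)


if __name__ == "__main__":
    print(count_elements(SIMPLE_GRAPHML))
----

`simple.graphml` contains no `data` elements, so no properties are created. This is consistent with the `0` in the `properties` column.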
We could also copy `simple.graphml` into Neo4j's `import` directory, and import the file from there.
We can then run the import procedure in the following way:
.The following imports a graph from a copy of `simple.graphml` in the `import` directory
[source,cypher]
----
CALL apoc.import.graphml("file://simple.graphml", {})
----
The Neo4j Browser visualization below shows the imported graph:
image::apoc.import.graphml.simple.png[title="Simple Graph Visualization"]
[[import-graphml-apoc]]
=== Import GraphML file created by Export GraphML procedures
`movies.graphml` contains a subset of Neo4j's movies graph, and was generated by the xref::export/graphml.adoc#export-graphml-whole-database[Export GraphML procedure].
.movies.graphml
[source,xml]
----
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="tagline" for="node" attr.name="tagline"/>
<key id="label" for="node" attr.name="label"/>
<key id="title" for="node" attr.name="title"/>
<key id="released" for="node" attr.name="released"/>
<key id="roles" for="edge" attr.name="roles"/>
<key id="label" for="edge" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<node id="n189" labels=":Person"><data key="labels">:Person</data><data key="born">1964</data><data key="name">Keanu Reeves</data></node>
<node id="n190" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Carrie-Anne Moss</data></node>
<node id="n191" labels=":Person"><data key="labels">:Person</data><data key="born">1961</data><data key="name">Laurence Fishburne</data></node>
<node id="n192" labels=":Person"><data key="labels">:Person</data><data key="born">1960</data><data key="name">Hugo Weaving</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
<node id="n195" labels=":Person"><data key="labels">:Person</data><data key="born">1952</data><data key="name">Joel Silver</data></node>
<edge id="e267" source="n189" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Neo"]</data></edge>
<edge id="e268" source="n190" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Trinity"]</data></edge>
<edge id="e269" source="n191" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Morpheus"]</data></edge>
<edge id="e270" source="n192" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Agent Smith"]</data></edge>
<edge id="e271" source="n193" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e272" source="n194" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e273" source="n195" target="n188" label="PRODUCED"><data key="label">PRODUCED</data></edge>
</graph>
</graphml>
----
.The following imports a graph based on `movies.graphml`
[source,cypher]
----
CALL apoc.import.graphml("movies.graphml", {})
----
If we run this query, we'll see the following output:
.Results
[opts="header"]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "movies.graphml" | "file" | "graphml" | 8 | 7 | 36 | 23 | 0 | -1 | 0 | TRUE | NULL
|===
We can run the following query to see the imported graph:
[source,cypher]
----
MATCH p=()-->()
RETURN p
----
.Results
[opts="header"]
|===
| p
| ({name: "Laurence Fishburne", born: "1961", labels: ":Person"})-[:ACTED_IN {roles: "[\"Morpheus\"]", label: "ACTED_IN"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Carrie-Anne Moss", born: "1967", labels: ":Person"})-[:ACTED_IN {roles: "[\"Trinity\"]", label: "ACTED_IN"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Lana Wachowski", born: "1965", labels: ":Person"})-[:DIRECTED {label: "DIRECTED"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Joel Silver", born: "1952", labels: ":Person"})-[:PRODUCED {label: "PRODUCED"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Lilly Wachowski", born: "1967", labels: ":Person"})-[:DIRECTED {label: "DIRECTED"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Keanu Reeves", born: "1964", labels: ":Person"})-[:ACTED_IN {roles: "[\"Neo\"]", label: "ACTED_IN"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
| ({name: "Hugo Weaving", born: "1960", labels: ":Person"})-[:ACTED_IN {roles: "[\"Agent Smith\"]", label: "ACTED_IN"}]->({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"})
|===
The labels defined in the GraphML file have been stored in a `labels` property on each node rather than added as node labels, and each relationship also carries a `label` property holding its type.
We can set the config parameter `readLabels: true` to import native node labels instead:
.The following imports a graph based on `movies.graphml` and stores node labels
[source,cypher]
----
CALL apoc.import.graphml("movies.graphml", {readLabels: true})
----
.Results
[opts="header"]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "movies.graphml" | "file" | "graphml" | 8 | 7 | 21 | 23 | 0 | -1 | 0 | TRUE | NULL
|===
And now let's re-run the query to see the imported graph:
[source,cypher]
----
MATCH p=()-->()
RETURN p;
----
.Results
[opts="header"]
|===
| p
| (:Person {name: "Lilly Wachowski", born: "1967"})-[:DIRECTED]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Carrie-Anne Moss", born: "1967"})-[:ACTED_IN {roles: "[\"Trinity\"]"}]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Hugo Weaving", born: "1960"})-[:ACTED_IN {roles: "[\"Agent Smith\"]"}]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Laurence Fishburne", born: "1961"})-[:ACTED_IN {roles: "[\"Morpheus\"]"}]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Keanu Reeves", born: "1964"})-[:ACTED_IN {roles: "[\"Neo\"]"}]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Joel Silver", born: "1952"})-[:PRODUCED]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
| (:Person {name: "Lana Wachowski", born: "1965"})-[:DIRECTED]->(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"})
|===
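The two runs reported 36 and 21 properties respectively. The query results above suggest the following rules. Without `readLabels`, every `data` element becomes a property. With `readLabels: true`, the node `labels` value becomes native labels and the edge `label` value is used only as the relationship type. The Python sketch below applies those inferred rules to the `<graph>` element of `movies.graphml`, with the `<key>` declarations omitted, and reproduces both counts. `count_properties` is a hypothetical helper, not part of APOC.

[source,python]
----
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# The <graph> element of movies.graphml; the <key> declarations are omitted.
MOVIES_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<graph id="G" edgedefault="directed">
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<node id="n189" labels=":Person"><data key="labels">:Person</data><data key="born">1964</data><data key="name">Keanu Reeves</data></node>
<node id="n190" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Carrie-Anne Moss</data></node>
<node id="n191" labels=":Person"><data key="labels">:Person</data><data key="born">1961</data><data key="name">Laurence Fishburne</data></node>
<node id="n192" labels=":Person"><data key="labels">:Person</data><data key="born">1960</data><data key="name">Hugo Weaving</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
<node id="n195" labels=":Person"><data key="labels">:Person</data><data key="born">1952</data><data key="name">Joel Silver</data></node>
<edge id="e267" source="n189" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Neo"]</data></edge>
<edge id="e268" source="n190" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Trinity"]</data></edge>
<edge id="e269" source="n191" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Morpheus"]</data></edge>
<edge id="e270" source="n192" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Agent Smith"]</data></edge>
<edge id="e271" source="n193" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e272" source="n194" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e273" source="n195" target="n188" label="PRODUCED"><data key="label">PRODUCED</data></edge>
</graph>
</graphml>"""


def count_properties(graphml_text, read_labels=False):
    """Count the properties an import would create, under the inferred rules."""
    root = ET.fromstring(graphml_text)
    total = 0
    # (element tag, data key that stops being a property when read_labels is on)
    for tag, label_key in (("node", "labels"), ("edge", "label")):
        for element in root.iterfind(f".//g:{tag}", NS):
            for data in element.iterfind("g:data", NS):
                if read_labels and data.get("key") == label_key:
                    continue  # becomes a native label or the relationship type
                total += 1
    return total


if __name__ == "__main__":
    print(count_properties(MOVIES_GRAPHML))                    # without readLabels
    print(count_properties(MOVIES_GRAPHML, read_labels=True))  # with readLabels: true
----

The difference of 15 is the 8 node `labels` values plus the 7 edge `label` values.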
The `apoc.import.json` procedure can be used to import JSON files created by the xref::overview/apoc.export/index.adoc[`apoc.export.json.*`] procedures.
`all.json` contains a subset of Neo4j's movies graph, and was generated by xref::overview/apoc.export/apoc.export.json.all.adoc[].
.all.json
[source,json]
----
include::example$data/exportJSON/all.json[leveloffset]
----
We can import this file using `apoc.import.json`.
[source,cypher]
----
CALL apoc.import.json("file:///all.json")
----
.Results
[opts=header]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "file:///all.json" | "file" | "json" | 3 | 1 | 15 | 105 | 4 | -1 | 0 | TRUE | NULL
|===
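The `rows` value of 4 equals the 3 nodes plus the 1 relationship. That fits an export written as JSON Lines, with one object per line whose `type` is `node` or `relationship`. This layout is an assumption here, so check it against your own export. The Python sketch below tallies lines of that shape. `tally` is a hypothetical helper, and the sample records are made up for illustration; they are not the contents of `all.json`.

[source,python]
----
import json

# Hypothetical sample in the assumed JSON Lines layout (NOT the contents of all.json).
SAMPLE = """\
{"type": "node", "id": "0", "labels": ["Person"], "properties": {"name": "A"}}
{"type": "node", "id": "1", "labels": ["Person"], "properties": {"name": "B"}}
{"type": "node", "id": "2", "labels": ["Person"], "properties": {"name": "C"}}
{"type": "relationship", "id": "0", "label": "KNOWS", "properties": {}, "start": {"id": "0"}, "end": {"id": "1"}}
"""


def tally(text):
    """Count non-empty lines by their `type` field; every such line is one row."""
    counts = {"node": 0, "relationship": 0, "rows": 0}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        counts[record["type"]] += 1
        counts["rows"] += 1
    return counts


if __name__ == "__main__":
    print(tally(SAMPLE))
----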
The procedure supports the following config parameters:
.Config parameters
[opts=header]
|===
| name | type | default | description
| readLabels | Boolean | false | Creates node labels based on the value in the `labels` property of `node` elements
| defaultRelationshipType | String | RELATED | The default relationship type to use if none is specified in the GraphML file
| storeNodeIds | Boolean | false | Stores the `id` attribute of `node` elements as a property
| batchSize | Integer | 20000 | The number of elements to process per transaction
|===
The procedure supports the following config parameters:
.Config parameters
[opts=header]
|===
| name | type | default | description
| nodes | Map<String, List<String>> | {} | Properties to include for each node label, e.g. `{Movie: ['title']}`
| rels | Map<String, List<String>> | {} | Properties to include for each relationship type, e.g. `{ACTED_IN: ['roles']}`
|===
The procedure supports the following config parameters:
.Config parameters
[opts=header]
|===
| name | type | default | description | https://neo4j.com/docs/operations-manual/current/tools/import/options/[import tool counterpart]
| delimiter | String | , |delimiter character between columns | `--delimiter=,`
| arrayDelimiter | String | ; | delimiter character in arrays | `--array-delimiter=;`
| ignoreDuplicateNodes | Boolean | false | for duplicate nodes, only load the first one and skip the rest (true) or fail the import (false) | `--ignore-duplicate-nodes=false`
| quotationCharacter | String | " | quotation character | `--quote='"'`
| stringIds | Boolean | true | treat ids as strings | `--id-type=STRING`
| skipLines | Integer | 1 | lines to skip (incl. header) | N/A
|===
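The right-hand column of the table suggests how a config map translates to the import tool. The Python sketch below is a hypothetical illustration, not an APOC or Neo4j API. It turns a config map into `neo4j-admin import` flags using only the mappings listed in the table. `skipLines` has no counterpart and is ignored. The `INTEGER` value used for `stringIds: false` is an assumption, so verify it against your Neo4j version.

[source,python]
----
# Defaults transcribed from the config table above.
DEFAULTS = {
    "delimiter": ",",
    "arrayDelimiter": ";",
    "ignoreDuplicateNodes": False,
    "quotationCharacter": '"',
    "stringIds": True,
}


def to_import_tool_flags(config):
    """Hypothetical helper: map an apoc.import.csv config map to import tool flags.

    Returns argv-style strings, so no shell quoting is applied
    (the table's `--quote='"'` is the shell-quoted form of `--quote="`).
    """
    merged = {**DEFAULTS, **config}
    return [
        f"--delimiter={merged['delimiter']}",
        f"--array-delimiter={merged['arrayDelimiter']}",
        # Python booleans print as True/False; the table shows lowercase values.
        f"--ignore-duplicate-nodes={str(merged['ignoreDuplicateNodes']).lower()}",
        f"--quote={merged['quotationCharacter']}",
        # INTEGER for numeric ids is an assumption; check your Neo4j version.
        f"--id-type={'STRING' if merged['stringIds'] else 'INTEGER'}",
    ]


if __name__ == "__main__":
    # Config map from the "Nodes and relationships" example.
    print(to_import_tool_flags({"delimiter": "|", "arrayDelimiter": ",", "stringIds": False}))
----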