Dear * community,
I have a 34 GB JSON file that contains a lot of data. I tried to import it into my MongoDB with mongoimport --file file.json, but of course it failed: the file is too big and it threw an out-of-memory system error. Is it possible to use PHP code to iterate through the file with a cursor? I have zero experience with this; someone told me it would be possible. I want to know how the file is structured, but I do not know how to view an example array from it. From the source I could get an example array:
{
"_id": ObjectId("53b29644aafd413977b23b7e"),
"summonerId": NumberLong(24570940),
"region": "euw",
"updatedAt": NumberLong(1404212804),
"season": NumberLong(4),
"stats": {
"110": {
"totalSessionsPlayed": NumberLong(3),
"totalSessionsLost": NumberLong(2),
"totalSessionsWon": NumberLong(1),
"totalChampionKills": NumberLong(34),
"totalDamageDealt": NumberLong(415051),
"totalDamageTaken": NumberLong(63237),
"mostChampionKillsPerSession": NumberLong(12),
"totalMinionKills": NumberLong(538),
"totalDoubleKills": NumberLong(5),
"totalTripleKills": NumberLong(1),
"totalDeathsPerSession": NumberLong(18),
"totalGoldEarned": NumberLong(40977),
"totalTurretsKilled": NumberLong(6),
"totalPhysicalDamageDealt": NumberLong(381668),
"totalMagicDamageDealt": NumberLong(31340),
"totalAssists": NumberLong(25),
"maxChampionsKilled": NumberLong(12),
"maxNumDeaths": NumberLong(10)
}
}
}
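A file this size will not open in an ordinary editor. One way to see how it is built is to read only its first few kilobytes. The sketch below does that in plain PHP: fread() pulls in at most the requested number of bytes, so the rest of the file is never loaded. The helper name peekFile and the 2000-byte default are illustrative choices, not part of the original post.

```php
<?php
// Return the first $bytes bytes of a file without reading the rest of it,
// so even a 34 GB file can be inspected. peekFile() is a hypothetical helper.
function peekFile(string $path, int $bytes = 2000): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $chunk = fread($handle, $bytes); // reads at most $bytes bytes
    } finally {
        fclose($handle);
    }
    return $chunk === false ? '' : $chunk;
}

// Usage (the path is a placeholder): echo peekFile('file.json');
```

The printed chunk shows whether the file starts with one big array ([ ...) or holds one document per line. That distinction matters when choosing how to read the file.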
The stats field contains more entries like this; 110 is just one example. How can I iterate through this very large file, or how can I import it into my MongoDB? For example, I want to echo summonerId, championId (110 in this case) and totalSessionsPlayed. It has to keep looping until there is no championId left for that particular summonerId.
Again: a summonerId has a list of champions that the player has played over their career. In this example, 110 refers to a champion. Every summonerId can contain multiple champions, and I want all of them, together with how many times each champion has been played (totalSessionsPlayed) by that summonerId.
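The loop described here is nested: one pass per document, and inside it one pass per key of stats. Below is a sketch for a single document that has already been decoded. It uses plain JSON because json_decode() does not accept MongoDB shell notation such as NumberLong(...). The function name sessionsPerChampion is hypothetical.

```php
<?php
// Turn ONE decoded document into rows of [summonerId, championId, totalSessionsPlayed],
// one row per key under "stats". sessionsPerChampion() is a hypothetical helper.
function sessionsPerChampion(array $doc): array
{
    $rows = [];
    foreach ($doc['stats'] ?? [] as $championId => $stats) {
        $rows[] = [$doc['summonerId'], $championId, $stats['totalSessionsPlayed'] ?? null];
    }
    return $rows;
}

$json = '{"summonerId": 24570940, "stats": {"110": {"totalSessionsPlayed": 3}, "112": {"totalSessionsPlayed": 45}}}';
foreach (sessionsPerChampion(json_decode($json, true)) as [$summonerId, $championId, $played]) {
    echo "$summonerId,$championId,$played\n"; // prints 24570940,110,3 then 24570940,112,45
}
```

This covers only the inner loop. Calling json_decode() on the whole 34 GB file is exactly what runs out of memory, so the real problem is feeding the loop one document at a time.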
1 answer
You'll want to use a streaming parser. These only pull small portions of your file into memory at a time.
They come in a couple of different flavors: SAX-like push parsers and pull parsers. The article "XML reader models: SAX versus XML pull parser" gives an overview of the difference.
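Here is a library-free illustration of "only a small portion in memory at a time". It assumes the file holds one complete JSON document per line; a file that is one big top-level array, like the sample in this answer, still needs a real streaming parser. Under that assumption, a PHP generator can hand out one decoded document at a time. The caller pulls each document with foreach, much as it would with a pull parser. readJsonLines is a hypothetical name.

```php
<?php
// Lazily yield one decoded document per line of a newline-delimited JSON file.
// ASSUMPTION: each line is one complete JSON object, not part of one big array.
// Only the current line is held in memory. readJsonLines() is hypothetical.
function readJsonLines(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // skip blank lines
            }
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle); // runs when the loop ends or the generator is discarded
    }
}

// Usage (the path is a placeholder):
// foreach (readJsonLines('file.json') as $doc) { echo $doc['summonerId'], "\n"; }
```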
Push Parser
This is a quick example using salsify/json-streaming-parser.
As it rolls through the file, we'll keep track of the summonerId, championId, and state. It's all event-based: you don't get random access with a sequential parser, so you have to keep track of things yourself. Every time a totalSessionsPlayed comes up, it'll echo out the summonerId, championId, and totalSessionsPlayed.
data.json
This is a pared-down JSON file for demonstration purposes.
[
{
"_id": "53b29644aafd413977b23b7e",
"summonerId": 24570940,
"region": "euw",
"stats": {
"110": {
"totalSessionsPlayed": 3,
"totalSessionsLost": 2,
"totalSessionsWon": 1
},
"112": {
"totalSessionsPlayed": 45,
"totalSessionsLost": 2,
"totalSessionsWon": 1
}
}
},
{
"_id": "asdfasdfasdf",
"summonerId": 555555,
"region": "euw",
"stats": {
"42": {
"totalSessionsPlayed": 65,
"totalSessionsLost": 2,
"totalSessionsWon": 1
},
"88": {
"totalSessionsPlayed": 99,
"totalSessionsLost": 2,
"totalSessionsWon": 1
}
}
}
]
Example:
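<?php
// Assumes salsify/json-streaming-parser was installed with Composer.
// Class and method names below match the library version this was written
// against; newer releases may use camelCase listener methods (startObject(), ...)
// and \JsonStreamingParser\Parser, so check the README of the version you install.
require __DIR__ . '/vendor/autoload.php';

class ListMatchUps extends JsonStreamingParser\Listener\IdleListener
{
    private $key;         // most recent object key seen
    private $summonerId;  // summonerId of the document being read
    private $championId;  // champion key inside "stats", e.g. "110"
    private $inStats;     // true while inside a "stats" object

    public function start_document()
    {
        $this->key = null;
        $this->summonerId = null;
        $this->championId = null;
        $this->inStats = false;
    }

    public function start_object()
    {
        if ($this->key === 'stats') {
            // Entering the "stats" object of the current document.
            $this->inStats = true;
        } else if ($this->inStats) {
            // Each object directly inside "stats" is keyed by championId.
            $this->championId = $this->key;
        }
    }

    public function end_object()
    {
        // Unwind state in reverse order: champion, then stats, then document.
        if ($this->championId !== null) {
            $this->championId = null;
        } else if ($this->inStats) {
            $this->inStats = false;
        } else {
            $this->summonerId = null;
        }
    }

    public function key($key)
    {
        $this->key = $key;
    }

    public function value($value)
    {
        switch ($this->key) {
            case 'summonerId':
                $this->summonerId = $value;
                break;
            case 'totalSessionsPlayed':
                echo "{$this->summonerId},{$this->championId},$value\n";
                break;
        }
    }
}

$stream = fopen('data.json', 'r');
$listener = new ListMatchUps();
try {
    $parser = new JsonStreamingParser_Parser($stream, $listener);
    $parser->parse();
} finally {
    // Close the stream whether parsing succeeds or throws.
    fclose($stream);
}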
Output:
24570940,110,3
24570940,112,45
555555,42,65
555555,88,99
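To check the listener's bookkeeping without installing the library, you can drive the same state machine with a hand-written list of events. This makes the push model visible: something else calls handle() once per event, and the tracker alone remembers where it is in the document. SessionTracker and the event names are hypothetical and only mirror the listener callbacks above.

```php
<?php
// Library-free rehearsal of the push model. Events are pushed at handle() one at
// a time; SessionTracker and the event names are hypothetical.
final class SessionTracker
{
    private ?string $key = null;         // most recent object key
    private $summonerId = null;          // current document's summonerId
    private ?string $championId = null;  // current key inside "stats"
    private bool $inStats = false;       // true while inside "stats"
    public array $rows = [];             // collected "summonerId,championId,played" lines

    public function handle(string $event, $payload = null): void
    {
        switch ($event) {
            case 'key':
                $this->key = $payload;
                break;
            case 'startObject':
                if ($this->key === 'stats') {
                    $this->inStats = true;
                } elseif ($this->inStats) {
                    $this->championId = $this->key;
                }
                break;
            case 'endObject':
                if ($this->championId !== null) {
                    $this->championId = null;
                } elseif ($this->inStats) {
                    $this->inStats = false;
                } else {
                    $this->summonerId = null;
                }
                break;
            case 'value':
                if ($this->key === 'summonerId') {
                    $this->summonerId = $payload;
                } elseif ($this->key === 'totalSessionsPlayed') {
                    $this->rows[] = "{$this->summonerId},{$this->championId},$payload";
                }
                break;
        }
    }
}

// Hand-written events for {"summonerId": 24570940, "stats": {"110": {"totalSessionsPlayed": 3}}}
$events = [
    ['startObject'],
    ['key', 'summonerId'], ['value', 24570940],
    ['key', 'stats'], ['startObject'],
    ['key', '110'], ['startObject'],
    ['key', 'totalSessionsPlayed'], ['value', 3],
    ['endObject'], ['endObject'], ['endObject'],
];

$tracker = new SessionTracker();
foreach ($events as $event) {
    $tracker->handle(...$event);
}
echo implode("\n", $tracker->rows), "\n"; // prints 24570940,110,3
```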
Pull Parser
This uses pcrov/jsonreader, a parser I wrote recently (it requires PHP 7).
Same data.json as above.
Example:
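<?php
use pcrov\JsonReader\JsonReader;

// Assumes pcrov/jsonreader was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

$reader = new JsonReader();
$reader->open("data.json");

// Advance to each "summonerId" in turn, wherever it appears in the document.
while ($reader->read("summonerId")) {
    $summonerId = $reader->value();
    // Jump to the sibling "stats" object of the same document.
    $reader->next("stats");
    // value() on an object returns that (small) object as a PHP array.
    foreach ($reader->value() as $championId => $stats) {
        echo "$summonerId, $championId, {$stats['totalSessionsPlayed']}\n";
    }
}

$reader->close();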
Output:
24570940, 110, 3
24570940, 112, 45
555555, 42, 65
555555, 88, 99