I have a text file several thousand lines long to parse. It's a product catalog that follows a certain pattern.
Each product has two serial numbers, and I was using one of them to split the whole text into an array, with one product per element.
The problem is that the serial I use as the delimiter in preg_split gets removed from each product, and I need it.
Here's a raw product:
1532.000028-01532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM
COM
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO;
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s),
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74
The leading digits are the two serial numbers; they are stuck together because of flaws in the PDF parser.
Here's the regex I'm using to split the text into products:
$texto = preg_split("/([0-9]{4}[.][0-9]{6}[-][0-9]{1})+/",$texto);
Output:
1532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM
COM
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO;
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s),
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74
As you can see, the first serial is removed from the output, and I need it. How can I split the text into products while keeping both serials in each one?
1 Answer
Change your capture group into a lookahead, like this:
$texto = preg_split("/(?=[0-9]{4}[.][0-9]{6}[-][0-9]{1})/",$texto);
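To see why this keeps the serial, here is a minimal sketch run against two shortened records. The second record and the variable names $pattern and $produtos are made up for illustration, and the PREG_SPLIT_NO_EMPTY flag is an addition that assumes the text may begin with a serial, in which case the zero-width match at offset 0 would otherwise yield an empty first element.

```php
<?php
// Two sample records in the layout from the question (shortened; the
// second record is hypothetical).
$texto = "1532.000028-01532.213.00010875-8\n"
       . "TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS\n"
       . "Valor Grama: 66,74\n"
       . "1532.000029-91532.213.00010876-6\n"
       . "UM ANEL\n"
       . "Valor Grama: 70,00\n";

// The lookahead matches the empty position just before each serial, so
// nothing is consumed and the serial stays at the start of its product.
$pattern = '/(?=[0-9]{4}[.][0-9]{6}[-][0-9]{1})/';

// PREG_SPLIT_NO_EMPTY drops the empty piece produced when the text
// itself starts with a serial.
$produtos = preg_split($pattern, $texto, -1, PREG_SPLIT_NO_EMPTY);

foreach ($produtos as $i => $produto) {
    echo "--- product $i ---\n", $produto;
}
```

Each element of $produtos now begins with both fused serials, because a lookahead only asserts that the pattern follows and does not make it part of the delimiter.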