How to split delimited words into an array where the key is the part left of the delimiter and the value is the part on the right

Time: 2022-09-13 11:37:25

I have a string like this, where each word is tagged with a code (FW, PRP, etc.) joined to it by an underscore:

Hi_FW !_.
My_PRP$ name_NN 's_POS Jim_NNP ._.
I_PRP 'm_VBP from_IN New_NNP Zealand_NNP ._.
This_DT is_VBZ my_PRP$ friend_NN ._.
His_PRP$ name_NN 's_POS Adam_NNP ._.
He_PRP 's_VBZ from_IN Australia_NNP ._.
This_DT is_VBZ my_PRP$ friend_NN too_RB ._.
Her_PRP$ name_NN 's_POS Paola_NNP ._.
She_PRP 's_VBZ from_IN Italy_NNP ._.

I need to break it into an array where each key is a word and its value is the corresponding tag:

[
    "Hi" => "FW",
    "My" => "PRP$",
    "name" => "NN"
    ...
]

I assume I can somehow split this string on the delimiter _, but I can't find a good way to then assemble the pieces into the array I need.

How can that be achieved?

3 answers

#1


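// Split the text into lines, then each line into "word_tag" tokens
$lines = explode("\n", $string);
$tokens = array();
foreach ($lines as $line)
{
    foreach (explode(' ', trim($line)) as $token)
        $tokens[] = $token;
}
// Split each token on '_' into key (word) and value (tag)
$result = array();
foreach ($tokens as $token)
{
    $parts = explode('_', $token, 2);
    if (count($parts) === 2)   // skip empty or malformed tokens
        $result[$parts[0]] = $parts[1];
}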
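
The same split-then-pair idea can also be sketched in a single pass with preg_match_all() and array_combine(). This is only a sketch under assumptions: the sample $string below is made up from the question's first two lines, and every token is assumed to have the form word_tag with no whitespace inside it.

```php
<?php
// Hypothetical sample input; in practice $string holds the full tagged text.
$string = "Hi_FW !_.\nMy_PRP\$ name_NN 's_POS Jim_NNP ._.";

// Capture the word and the tag of every "word_tag" token. The first group is
// greedy, so a token with several underscores is split at its last one.
preg_match_all('/(\S+)_(\S+)/', $string, $matches);

// $matches[1] holds the words and $matches[2] the tags, in matching order.
$result = array_combine($matches[1], $matches[2]);

print_r($result);
```

Tokens without an underscore are simply not matched, so no extra check is needed for them.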

#2


Let's assume we are reading from a file (data.txt). The code below reads the file with fopen(); that part can be omitted if your input is already a string.

The following is a partial, naive implementation intended to give you a head start. It assumes a very simple delimiter and calls preg_split() twice, once for whitespace and once for the delimiter:

<?php

$results = array();
$delimiter = '_';

$file_handle = fopen("data.txt", "r");
if ($file_handle === false) {
   exit("Could not open data.txt\n");
}

// fgets() returns false at end of file, so no separate feof() check is needed
while (($line = fgets($file_handle)) !== false) {

   // e.g. My_PRP$ name_NN 's_POS Jim_NNP ._.
   // split the line on whitespace into tokens such as "My_PRP$"
   $tokens = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);

   foreach ($tokens as $token) {
      // split each token on the delimiter '_'
      // [0] = My
      // [1] = PRP$
      $pair = preg_split('/' . preg_quote($delimiter, '/') . '/', $token, 2);

      // skip tokens that do not contain the delimiter
      if (count($pair) === 2) {
         $results[$pair[0]] = $pair[1];
      }
   }
}
fclose($file_handle);

print_r($results);

?>

data.txt

Hi_FW !_.
My_PRP$ name_NN 's_POS Jim_NNP ._.
I_PRP 'm_VBP from_IN New_NNP Zealand_NNP ._.
This_DT is_VBZ my_PRP$ friend_NN ._.
His_PRP$ name_NN 's_POS Adam_NNP ._.
He_PRP 's_VBZ from_IN Australia_NNP ._.
This_DT is_VBZ my_PRP$ friend_NN too_RB ._.
Her_PRP$ name_NN 's_POS Paola_NNP ._.
She_PRP 's_VBZ from_IN Italy_NNP ._.

Hope this helps.

#3


I would explode on whitespace and then on _

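<?php
// $input holds the tagged text; split on any whitespace, including newlines
$inputArray = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

$sentences = array();

foreach ($inputArray as $word){
    // a limit of 2 keeps any further underscores inside the tag
    $wordArray = explode("_", $word, 2);
    if (count($wordArray) === 2) {
        $sentences[$wordArray[0]] = $wordArray[1];
    }
}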
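
One thing to keep in mind with a word => tag array: a PHP array key can hold only one value, so a word that appears more than once (for example 's, tagged POS in one sentence and VBZ in another) keeps only its last tag. If that matters, a possible variation is to collect [word, tag] pairs instead. The sketch below is not from the answers above; the sample $input and the $pairs name are assumptions for illustration.

```php
<?php
// Hypothetical sample input in which the token 's appears with two tags.
$input = implode("\n", [
    "His_PRP\$ name_NN 's_POS Adam_NNP ._.",
    "He_PRP 's_VBZ from_IN Australia_NNP ._.",
]);

$pairs = array();
foreach (preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY) as $token) {
    // Split at the first underscore into word and tag.
    $parts = explode('_', $token, 2);
    if (count($parts) === 2) {
        // Store [word, tag] so repeated words are not overwritten.
        $pairs[] = $parts;
    }
}

print_r($pairs);
```

Each entry of $pairs is a two-element array, so the original word order and every occurrence are preserved.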
