I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (with varying depth), I want to read all the files with PHP and extract/parse the menu (nested unordered lists) into an associative array.
root.html
<ul id="menu">
<li class="active">Start</li>
<ul>
<li><a href="file1.html">Sub1</a></li>
<li><a href="file2.html">Sub2</a></li>
</ul>
</ul>
file1.html
<ul id="menu">
<li><a href="root.html">Start</a></li>
<ul>
<li class="active">Sub1</li>
<ul>
<li><a href="file3.html">SubSub1</a></li>
<li><a href="file4.html">SubSub2</a></li>
<li><a href="file5.html">SubSub3</a></li>
<li><a href="file6.html">SubSub4</a></li>
</ul>
</ul>
</ul>
file3.html
<ul id="menu">
<li><a href="root.html">Start</a></li>
<ul>
<li><a href="file1.html">Sub1</a></li>
<ul>
<li class="active">SubSub1</li>
<ul>
<li><a href="file7.html">SubSubSub1</a></li>
<li><a href="file8.html">SubSubSub2</a></li>
<li><a href="file9.html">SubSubSub3</a></li>
</ul>
</ul>
</ul>
</ul>
file4.html
<ul id="menu">
<li><a href="root.html">Start</a></li>
<ul>
<li><a href="file1.html">Sub1</a></li>
<ul>
<li><a href="file3.html">SubSub1</a></li>
<li class="active">SubSub2</li>
<li><a href="file5.html">SubSub3</a></li>
<li><a href="file6.html">SubSub4</a></li>
</ul>
</ul>
</ul>
I would like to loop through all files, extract the 'id="menu"' list and create an array like this (or similar) while keeping the hierarchy and file information:
Array
    [file] => root.html
    [child] => Array
        [Sub1] => Array
            [file] => file1.html
            [child] => Array
                [SubSub1] => Array
                    [file] => file3.html
                    [child] => Array
                        [SubSubSub1] => Array
                            [file] => file7.html
                        [SubSubSub2] => Array
                            [file] => file8.html
                        [SubSubSub3] => Array
                            [file] => file9.html
                [SubSub2] => Array
                    [file] => file4.html
                [SubSub3] => Array
                    [file] => file5.html
                [SubSub4] => Array
                    [file] => file6.html
        [Sub2] => Array
            [file] => file2.html
With the help of the PHP Simple HTML DOM Parser library, I successfully read the file and extracted the menu:
$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
..
}
To parse only the active section of the menu (leaving out the links that go one or more levels up), I used
$ul->find("ul",-1)
which finds the last ul inside the outer ul. This works great for a single file.
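For comparison, the same "deepest sublist" selection can be written with PHP's built-in DOM extension. This is a sketch that swaps in DOMDocument/DOMXPath for Simple HTML DOM; the function name lastSubMenu is hypothetical, and it assumes, as in the sample files, that the last nested <ul> in document order is the active section.

```php
<?php
// Sketch: select the last nested <ul> inside <ul id="menu">, roughly what
// Simple HTML DOM's $ul->find("ul", -1) does, using the built-in DOM extension.
// lastSubMenu() is a hypothetical helper name.
function lastSubMenu(string $html): ?DOMElement
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    // all <ul> descendants of the menu in document order; keep only the last one
    $nodes = $xpath->query('(//ul[@id="menu"]//ul)[last()]');
    return $nodes->length > 0 ? $nodes->item(0) : null;
}

// shortened version of file4.html
$html = '<ul id="menu"><li><a href="root.html">Start</a></li><ul>'
      . '<li><a href="file1.html">Sub1</a></li><ul>'
      . '<li><a href="file3.html">SubSub1</a></li>'
      . '<li class="active">SubSub2</li>'
      . '</ul></ul></ul>';
$active = lastSubMenu($html);
echo trim($active->textContent), "\n";
```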
But I'm having trouble looping through all the files/menus while keeping the parent/child information, because each menu has a different depth.
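One way to cope with the varying depth is recursion: each <ul> becomes one 'child' level, and because every page repeats the levels above its active item, the per-page results can be merged by item name. The sketch below assumes PHP's built-in DOM extension (DOMDocument, DOMXPath) instead of Simple HTML DOM; the function names parseMenuList, parseMenuHtml and mergeMenus are hypothetical, and item names are assumed to be unique within a level.

```php
<?php
// Sketch (assumption: built-in DOM extension instead of Simple HTML DOM).
// Turns one nested <ul id="menu"> into name => ['file' => ..., 'child' => [...]].
function parseMenuList(DOMElement $ul): array
{
    $result = array();
    $last = null; // name of the most recent <li> on this level
    foreach ($ul->childNodes as $node) {
        if (!($node instanceof DOMElement)) {
            continue; // skip whitespace text nodes
        }
        if ($node->tagName === 'li') {
            $last = trim($node->textContent);
            if (!isset($result[$last])) {
                $result[$last] = array();
            }
            // the active item has no <a>, so it gets no 'file' on this page
            $links = $node->getElementsByTagName('a');
            if ($links->length > 0) {
                $result[$last]['file'] = $links->item(0)->getAttribute('href');
            }
        } elseif ($node->tagName === 'ul' && $last !== null) {
            // a <ul> right after an <li> holds that item's children: recurse
            $result[$last]['child'] = parseMenuList($node);
        }
    }
    return $result;
}

function parseMenuHtml(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $menu = $xpath->query('//ul[@id="menu"]')->item(0);
    return $menu instanceof DOMElement ? parseMenuList($menu) : array();
}

// Each page repeats the upper levels of the menu, so merging the per-page
// trees by item name fills in the deeper levels, whatever their depth.
function mergeMenus(array $pages): array
{
    $tree = array();
    foreach ($pages as $html) {
        $tree = array_replace_recursive($tree, parseMenuHtml($html));
    }
    return $tree;
}

$pages = array(
    // root.html
    '<ul id="menu"><li class="active">Start</li><ul>'
    . '<li><a href="file1.html">Sub1</a></li><li><a href="file2.html">Sub2</a></li>'
    . '</ul></ul>',
    // file1.html (shortened)
    '<ul id="menu"><li><a href="root.html">Start</a></li><ul>'
    . '<li class="active">Sub1</li><ul>'
    . '<li><a href="file3.html">SubSub1</a></li><li><a href="file4.html">SubSub2</a></li>'
    . '</ul></ul></ul>',
);
$tree = mergeMenus($pages);
print_r($tree);
```

With real files, the input could be collected with something like mergeMenus(array_map('file_get_contents', glob('*.html'))). Because array_replace_recursive only adds or overwrites keys, a 'file' entry found on one page is kept even when another page shows that item as the active, unlinked entry.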
Thanks for all suggestions, tips and help!
2 solutions
#1
Edit: OK, this was not so easy after all :)
By the way, this library is really an excellent tool. Kudos to the guys who wrote it.
Here is one possible solution:
class menu_parse {
    static $missing = array();       // list of missing files
    static private $files = array(); // list of source files to process

    // initiate menu parsing
    static function start ($file)
    {
        // start with root file
        self::$files[$file] = 1;
        // parse all source files (files found in links are appended while looping)
        for ($res=array(); current(self::$files); next(self::$files))
        {
            // get next file name
            $file = key(self::$files);
            // parse the file
            if (!file_exists ($file))
            {
                self::$missing[$file] = 1;
                continue;
            }
            $html = file_get_html ($file);
            if (!$html) continue; // file could not be loaded/parsed
            // get menu root (if any)
            $root = $html->find("ul[id=menu]",0);
            if ($root) self::menu ($root, $res);
            // free the DOM tree to avoid memory leaks when processing many files
            $html->clear();
        }
        // turn the missing file names into a plain list
        self::$missing = array_keys (self::$missing);
        // that's all folks
        return $res;
    }

    // parse a menu at a given level
    static private function menu ($menu, &$res)
    {
        $name = null; // name of the last <li> seen at this level
        foreach ($menu->children as $elem)
        {
            switch ($elem->tag)
            {
            case "li" : // name and possibly source file of a menu
                // grab menu name
                $name = trim ($elem->plaintext);
                // create the menu object on first sight (PHP 8 no longer does this implicitly)
                if (!isset ($res[$name])) $res[$name] = new stdClass;
                // see if we can find a link to the menu file
                $link = $elem->children(0);
                if ($link && $link->tag == 'a')
                {
                    // found the link
                    $file = $link->href;
                    $res[$name]->file = $file;
                    // add the source file to the processing list
                    self::$files[$file] = 1;
                }
                break;
            case "ul" : // go down one level to grab items of the current menu
                if ($name === null) break; // sublist without a parent item: skip it
                if (!isset ($res[$name]->childs)) $res[$name]->childs = array();
                self::menu ($elem, $res[$name]->childs);
                break;
            }
        }
    }
}
Usage:
require_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser
// The result will be an array of menus indexed by item names.
//
// Each menu will be an object with up to 2 members
// - file   -> source file of the menu
// - childs -> array of menu subitems
//
$res = menu_parse::start ("root.html");
// menu_parse::$missing will contain all the missing file names
echo "Result : <pre>";
print_r ($res);
echo "</pre><br>missing files:<pre>";
print_r (menu_parse::$missing);
echo "</pre>";
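The result above uses stdClass objects and a 'childs' member, while the question asked for plain nested arrays with 'file' and 'child' keys. A small recursive conversion could bridge the two; this is a sketch, and the function name menuToArray is hypothetical.

```php
<?php
// Sketch: convert the stdClass result of menu_parse::start() into the
// nested-array shape from the question ('file' / 'child' keys).
// menuToArray() is a hypothetical helper name.
function menuToArray(array $menus): array
{
    $out = array();
    foreach ($menus as $name => $obj) {
        $entry = array();
        if (isset($obj->file)) {
            $entry['file'] = $obj->file;
        }
        if (!empty($obj->childs)) {
            $entry['child'] = menuToArray($obj->childs); // recurse one level down
        }
        $out[$name] = $entry;
    }
    return $out;
}

// small hand-built input mimicking the shape of menu_parse::start()'s result
$sub1 = new stdClass;
$sub1->file = 'file1.html';
$start = new stdClass;
$start->file = 'root.html';
$start->childs = array('Sub1' => $sub1);
$arr = menuToArray(array('Start' => $start));
print_r($arr);
```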
Output of your test case:
Array
(
    [Start] => stdClass Object
        (
            [childs] => Array
                (
                    [Sub1] => stdClass Object
                        (
                            [file] => file1.html
                            [childs] => Array
                                (
                                    [SubSub1] => stdClass Object
                                        (
                                            [file] => file3.html
                                            [childs] => Array
                                                (
                                                    [SubSubSub1] => stdClass Object
                                                        (
                                                            [file] => file7.html
                                                        )
                                                    [SubSubSub2] => stdClass Object
                                                        (
                                                            [file] => file8.html
                                                        )
                                                    [SubSubSub3] => stdClass Object
                                                        (
                                                            [file] => file9.html
                                                        )
                                                )
                                        )
                                    [SubSub2] => stdClass Object
                                        (
                                            [file] => file3.html
                                        )
                                    [SubSub3] => stdClass Object
                                        (
                                            [file] => file5.html
                                        )
                                    [SubSub4] => stdClass Object
                                        (
                                            [file] => file6.html
                                        )
                                )
                        )
                    [Sub2] => stdClass Object
                        (
                            [file] => file2.html
                        )
                )
            [file] => root.html
        )
)
missing files:
Array
(
    [0] => file2.html
    [1] => file5.html
    [2] => file6.html
    [3] => file7.html
    [4] => file8.html
    [5] => file9.html
)
Notes:
- The code assumes all item names are unique inside a given menu.
You could modify the code to have the (sub)menus as an array with numeric indexes and names as properties (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.
Should such name duplication occur, the best solution would be to rename one of the items, IMHO.
- The code also assumes there is only one root menu.
It could be modified to handle more than one, but IMHO that does not make much sense: it would mean a duplicated root menu ID, which would likely break the JavaScript that processes the menu in the first place.
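The first note mentions an alternative structure for duplicate item names: a numerically indexed list per level, with the name stored as a field. A minimal sketch of that shape, assuming a hypothetical helper addMenuItem and hypothetical item names, could look like this:

```php
<?php
// Sketch of the alternative structure from the note: each level is a numerically
// indexed list and the name is a field, so equal names no longer overwrite each other.
// addMenuItem() and the 'News' items are hypothetical examples.
function addMenuItem(array &$level, string $name, ?string $file = null): int
{
    $level[] = array('name' => $name, 'file' => $file, 'childs' => array());
    return count($level) - 1; // index of the new item, used to attach children
}

$menu = array();
$i = addMenuItem($menu, 'Start', 'root.html');
addMenuItem($menu[$i]['childs'], 'News', 'news1.html');
addMenuItem($menu[$i]['childs'], 'News', 'news2.html'); // same name, kept separately
print_r($menu);
```

The trade-off is the one the note describes: lookups by name now need a loop over the level instead of a direct key access.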
#2
This is more like a directory tree with upward links: file1 on level 1 points to file3 on level 2, and file3 points back to file1 on level 1, which is what causes the "different depth". Consider setting up a dedicated menu object that points both upwards and downwards, and keeping lists of those objects instead of arrays of arrays of strings. A starting point for such a hierarchy in PHP could be a class like this:
class menuItem {
    protected $leftSibling = null;
    protected $rightSibling = null;
    protected $parents = array();
    protected $childs = array();
    protected $properties = array();
    // set property like menu name or file name
    function setProp($name, $val) {
        $this->properties[$name] = $val;
    }
    // get a property if set, false otherwise
    function getProp($name) {
        if ( isset($this->properties[$name]) )
            return $this->properties[$name];
        return false;
    }
    // direct left neighbour (null if none)
    function getLeftSibling() {
        return $this->leftSibling;
    }
    function hasLeft() {
        return $this->leftSibling !== null;
    }
    // all left siblings, nearest first
    function getLeftSiblingsAsArray() {
        $sibling = $this->getLeftSibling();
        $siblings = array();
        while ( $sibling !== null ) {
            $siblings[] = $sibling;
            $sibling = $sibling->getLeftSibling();
        }
        return $siblings;
    }
    function addChild($item) {
        $this->childs[] = $item;
    }
    // append $item at the far left end of the sibling chain
    function addLeftSibling($item) {
        $sibling = $this;
        while ( $sibling->hasLeft() )
            $sibling = $sibling->getLeftSibling();
        $sibling->addFinalLeft($item);
    }
    // set the direct left neighbour
    function addFinalLeft($item) {
        $this->leftSibling = $item;
    }
    // ...
}
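The core of the "pointing upwards and downwards" idea is a parent link plus a list of children. A minimal sketch of just that part follows; the class MenuNode and its methods are hypothetical and are not part of the menuItem class above. Returning the existing child when a name is added twice is a design choice meant for merging menus from several files, since every file repeats the upper levels.

```php
<?php
// Sketch: minimal tree node with an upward (parent) and downward (children) link.
// MenuNode, addChild() and path() are hypothetical names.
class MenuNode
{
    public $name;
    public $file;
    public $parent = null;      // upward link
    public $children = array(); // downward links, keyed by item name

    public function __construct(string $name, ?string $file = null)
    {
        $this->name = $name;
        $this->file = $file;
    }

    // attach $child below this node, or return the existing child with that name
    public function addChild(MenuNode $child): MenuNode
    {
        if (!isset($this->children[$child->name])) {
            $child->parent = $this;
            $this->children[$child->name] = $child;
        }
        return $this->children[$child->name];
    }

    // follow the upward links to build the breadcrumb from the root
    public function path(): array
    {
        $names = array();
        for ($node = $this; $node !== null; $node = $node->parent) {
            array_unshift($names, $node->name);
        }
        return $names;
    }
}

$start = new MenuNode('Start', 'root.html');
$sub1 = $start->addChild(new MenuNode('Sub1', 'file1.html'));
$subSub1 = $sub1->addChild(new MenuNode('SubSub1', 'file3.html'));
echo implode(' > ', $subSub1->path()), "\n"; // Start > Sub1 > SubSub1
```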