I am trying to do the following with PHP:
- Read a directory
- Find all .md and .markdown files
- Read the first 2 lines of each of these Markdown files
- If a "Title: Title for the file" line is found on line 1, add it to the array
- If a "Description: Short description" line is found on line 2, add it to the array
- If a sub-directory is found, repeat steps 1-5 on it
- I should now have a nice list/array
- Print this list/array to the screen so that it shows up like this:
Directory 1 Name
<a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
Description from Markdown FILE 1 line 2
<a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
Description from Markdown FILE 2 line 2
<a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
Description from Markdown FILE 3 line 2
Directory 2 Name
<a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
Description from Markdown FILE 1 line 2
<a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
Description from Markdown FILE 2 line 2
<a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
Description from Markdown FILE 3 line 2
etc.
Code so far
function getFilesFromDir($dir)
{
    $files = array();
    // Scan the directory passed into the function
    if ($handle = opendir($dir)) {
        while (false !== ($file = readdir($handle))) {
            // Only handle .md and .markdown files
            if (preg_match('/\.(md|markdown)$/', $file)) {
                // Grab first 2 lines of Markdown file
                $content = file($dir . '/' . $file);
                $title = $content[0];
                $description = $content[1];
                // If the first 2 lines of the Markdown file are
                // "Title: file title" and "Description: file description" lines, we
                // add these key/value pairs to the array as meta data
                // Match Title line
                $pattern = '/^(Title|Description):(.+)/';
                if (preg_match($pattern, $title, $matched)) {
                    $title = trim($matched[2]);
                }
                // Match Description line
                if (preg_match($pattern, $description, $matched)) {
                    $description = trim($matched[2]);
                }
                // Add .md and .markdown files and folder path to array
                // Add captured Title and Description to array as well
                $files[$dir][] = array("filepath" => $dir . '/' . $file,
                    "title" => $title,
                    "description" => $description
                );
            }
        }
        closedir($handle);
    }
    return $files;
}
Usage
$dir = 'mdfiles';
$fileArray = getFilesFromDir($dir);
Help needed
So far the code works on a single directory. It still needs to handle sub-directories, and the way it grabs the first 2 lines and then runs the regex twice can probably be done differently.
I would think there is a better way, so that the regex that matches the Title and Description only has to run once?
Can someone help me modify this code so that it detects and runs on sub-directories, and improve the way it reads the first 2 lines of a Markdown file to get the title and description if they exist?
I also need help printing the array to the screen. I know how to simply dump the data, but the output has to group the files so that the folder name is shown at the top of each set, as in my demo output above.
I appreciate any help.
2 Answers
#1
2
To recursively iterate over files, the RecursiveDirectoryIterator is quite handy (related: PHP recursive directory path). It yields SplFileInfo objects, which give easy access to an SplFileObject via openFile(). That looks useful in your case, as you want to read the files' content.
Additionally, it's possible to run a single regular expression to parse the first two lines of the file. Performance should be fine either way, since PHP caches compiled patterns that are executed repeatedly. A single pattern has the benefit that the code is more structured, but the downside that the pattern is more complex. The configuration could look like this:
#
# configuration
#
$path = 'md';
$fileFilter = '~\.(md|markdown)$~';
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';
Just in case the Markdown files are actually UTF-8 encoded, I added the u modifier (PCRE_UTF8).
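To get a feel for how the single pattern behaves, here is a minimal sketch that runs it against a few hand-written inputs. Only the pattern is taken from the configuration; the helper name parseHeader and the sample strings are made up for illustration.

```php
<?php
// Pattern from the configuration; both capture groups are optional.
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';

// Hypothetical helper (not part of the answer): returns [title, description],
// using '' for any part that is missing.
function parseHeader(string $pattern, string $lines): array
{
    $r = preg_match($pattern, $lines, $m);
    if ($r === false) {
        throw new RuntimeException('Regular expression failed.');
    }
    // Unmatched trailing groups are absent from $m, so pad with defaults.
    $m += ['', '', ''];
    return [$m[1], $m[2]];
}

// Both lines present: ['Hello', 'World']
var_dump(parseHeader($pattern, "Title: Hello\nDescription: World"));
// Title only: ['Only a title', '']
var_dump(parseHeader($pattern, "Title: Only a title"));
// No meta data at all: ['', '']
var_dump(parseHeader($pattern, "# Just a heading"));
```

Because every part of the pattern is optional, it matches the empty string when no meta data is found, so a missing title or description simply comes back as an empty string rather than as a failed match.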
The processing part of the code then uses a recursive directory iterator over $path, skips files not matching $fileFilter, parses the first two lines of each file (provided the file is readable and has at least one line) and stores the result in a directory-keyed array, $result:
#
# main
#
# init result array (the nice one)
$result = array();
# recursive iterator for files
$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path, FilesystemIterator::KEY_AS_PATHNAME | FilesystemIterator::CURRENT_AS_FILEINFO),
    RecursiveIteratorIterator::SELF_FIRST);
foreach($iterator as $path => $info)
{
    # filter out files that don't match
    if (!preg_match($fileFilter, $path)) continue;
    # get first two lines
    try
    {
        for
        (
            $maxLines = 2,
            $lines = '',
            $file = $info->openFile()
        ;
            !$file->eof() && $maxLines--
        ;
            $lines .= $file->fgets()
        );
        # normalize CRLF so that no stray "\r" ends up in the title or description
        $lines = rtrim(str_replace("\r\n", "\n", $lines), "\n");
        if (!strlen($lines)) # skip empty files
            continue;
    }
    catch (RuntimeException $e)
    {
        continue; # files which are not readable are skipped.
    }
    # parse md file
    $r = preg_match($pattern, $lines, $matches);
    if (FALSE === $r)
    {
        throw new Exception('Regular expression failed.');
    }
    # pad missing groups with empty strings
    list(, $title, $description) = $matches + array('', '', '');
    # grow result array
    $result[dirname($path)][] = array($path, $title, $description);
}
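The compact for loop that collects the first two lines can be hard to read. The same idea can be sketched as a small standalone helper; the function name readFirstLines, its parameters and the temporary demo file are assumptions made for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: read at most $max lines from a file and return
// them as an array without their line endings.
function readFirstLines(string $filename, int $max = 2): array
{
    // SplFileObject throws a RuntimeException if the file cannot be opened.
    $file = new SplFileObject($filename);
    $lines = [];
    while (count($lines) < $max && !$file->eof()) {
        $line = $file->fgets();
        if ($line === false || ($line === '' && $file->eof())) {
            break; // nothing left to read; avoid adding a phantom empty line
        }
        $lines[] = rtrim($line, "\r\n");
    }
    return $lines;
}

// Demo on a temporary file (the contents are made up for illustration).
$tmp = tempnam(sys_get_temp_dir(), 'md');
file_put_contents($tmp, "Title: Demo\nDescription: A short demo\n\nBody text\n");
$first = readFirstLines($tmp);
unlink($tmp);

print_r($first);
```

Unlike file(), which loads the whole document into memory, this stops reading after the requested number of lines, which is the main reason to go through SplFileObject here.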
What's left is the output. As the array is already keyed by directory, it's fairly straightforward: first iterate over the directories, then over the files within each one:
#
# output
#
$dirCounter = 0;
foreach ($result as $name => $dirs)
{
    printf("Directory %d %s\n", ++$dirCounter, htmlspecialchars(basename($name)));
    foreach ($dirs as $entry)
    {
        list($path, $title, $description) = $entry;
        # double-quoted attribute: htmlspecialchars() always escapes double quotes by default
        printf("<a href=\"%s\">%s from line 1 of Markdown %s</a> <br>\n%s\n\n",
            htmlspecialchars($path),
            htmlspecialchars($title),
            htmlspecialchars(basename($path)),
            htmlspecialchars($description)
        );
    }
}
#2
1
This should work:
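if (preg_match('/\.(md|markdown)$/', $file)) {
    // ...
} elseif ($file !== '.' && $file !== '..' && is_dir($dir . '/' . $file)) {
    // Skip "." and ".." to avoid endless recursion, and test the full path
    $files = array_merge($files, getFilesFromDir($dir . '/' . $file));
}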
Running the regex twice isn't so bad, and may be better than trying to cobble together a single pattern that spans both lines. However, you could achieve the same result with preg_replace:
$title = trim(preg_replace('/^Title:(.+)/', '$1', $content[0]));
$description = trim(preg_replace('/^Description:(.+)/', '$1', $content[1]));
To output your array as per the example, try this:
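foreach ($fileArray as $directory => $files) {
    echo htmlspecialchars($directory) . "\n\n";
    foreach ($files as $fileData) {
        // Escape values before placing them into HTML
        echo '<a href="' . htmlspecialchars($fileData['filepath']) . '">' . htmlspecialchars($fileData['title']) . "</a><br />\n";
        echo htmlspecialchars($fileData['description']) . "\n\n";
    }
}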
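Putting the pieces of this answer together, a recursive getFilesFromDir() might look roughly like the sketch below. It swaps opendir()/readdir() for scandir() and uses the null coalescing operator for files with fewer than two lines; both are my own choices rather than part of the original code, and the demo directory tree is made up for illustration.

```php
<?php
// Sketch of a recursive getFilesFromDir(); results stay keyed by directory
// so they can be printed folder by folder.
function getFilesFromDir(string $dir): array
{
    $files = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $files; // unreadable directory: nothing to report
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip self and parent, or the recursion never ends
        }
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            // Merge the sub-directory's results into ours
            $files = array_merge($files, getFilesFromDir($path));
        } elseif (preg_match('/\.(md|markdown)$/', $entry)) {
            // FILE_IGNORE_NEW_LINES strips the trailing line breaks
            $content = file($path, FILE_IGNORE_NEW_LINES) ?: [];
            $files[$dir][] = [
                'filepath'    => $path,
                'title'       => trim(preg_replace('/^Title:/', '', $content[0] ?? '')),
                'description' => trim(preg_replace('/^Description:/', '', $content[1] ?? '')),
            ];
        }
    }
    return $files;
}

// Demo on a throwaway tree (all names are made up for illustration).
$root = sys_get_temp_dir() . '/mdfiles_' . uniqid();
mkdir($root . '/sub', 0777, true);
file_put_contents($root . '/a.md', "Title: First\nDescription: Top-level file\n");
file_put_contents($root . '/sub/b.markdown', "Title: Second\nDescription: Nested file\n");

$fileArray = getFilesFromDir($root);

// Clean up the demo tree again
unlink($root . '/a.md');
unlink($root . '/sub/b.markdown');
rmdir($root . '/sub');
rmdir($root);

print_r($fileArray);
```

As in the preg_replace variant, a first line without a "Title:" prefix is kept as-is, so files without meta data still get some kind of label.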