How to handle multiple XPath expressions at once (based on feed structure), or create my own feeds with the same structure

The code below is tested and working; it prints the contents of a feed that has this structure:

<rss>
    <channel>
        <item>
            <pubDate/>
            <title/>
            <description/>
            <link/>
            <author/>
        </item>
    </channel>
</rss>

What I haven't managed to do is print feeds that follow the structure below (the difference is <feed><entry><published>), even though I changed the XPath to /feed//entry. You can see the structure in the page source.

<feed>
    <entry>
        <published/>
        <title/>
        <description/>
        <link/>
        <author/>
    </entry>
</feed>

Note that the code sorts all item elements by pubDate. For the second feed structure, I guess it should sort all entry elements by published.

I have probably made a mistake in the XPath that I can't find. However, if I do manage to print that feed correctly, how can I modify the code to handle different structures all at once?

Is there any service that allows me to create and host my own feeds based on those feeds, so that they all have the same structure? I hope I made myself clear. Thank you.

<?php

$feeds = array();

// Get all feed entries
$entries = array();
foreach ($feeds as $feed) {
    $xml = simplexml_load_file($feed);
    $entries = array_merge($entries, $xml->xpath(''));
}

?>

3 Solutions

#1

The main contribution of this answer is a solution (at the end) that can be used with any number of formats: just specify all alternative names for the "entry" element in the external (global) parameter $postElements and all alternative names for the "published date" element in the external (global) parameter $pub-dateElements.

Besides this, here is how to specify an XPath expression that selects all /rss//item and all /feed//entry elements.

In the simple case of just two possible document formats, this XPath expression (as proposed by @Josh Davis) works correctly:

/rss//item  |   /feed//entry

A more general XPath expression selects the wanted elements from an unlimited number of document formats:

/*[contains($topElements, concat('|',name(),'|'))]
    //*[contains($postElements, concat('|',name(),'|'))]

where the variable $topElements should be substituted by a pipe-delimited string of all possible names for a top element, and $postElements should be substituted by a pipe-delimited string of all possible names for an "entry" element. Because of the // step, the "entry" elements may also sit at different depths in the different document formats.

In particular, for this concrete case the XPath expression is:

/*[contains('|feed|rss|', concat('|',name(),'|'))]
    //*[contains('|item|entry|', concat('|',name(),'|'))]
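
As a minimal sketch of how the substitution described above could be done from PHP, the snippet below builds the expression from two lists of names and runs it with SimpleXML. The helper name buildFeedXPath and the sample documents are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: builds the generalized expression from two name lists.
function buildFeedXPath(array $topNames, array $postNames): string
{
    // Wrap each list in pipes so that whole names are matched, not substrings.
    $top  = '|' . implode('|', $topNames) . '|';
    $post = '|' . implode('|', $postNames) . '|';

    return "/*[contains('$top', concat('|', name(), '|'))]"
         . "//*[contains('$post', concat('|', name(), '|'))]";
}

$xpath = buildFeedXPath(['feed', 'rss'], ['item', 'entry']);

// Two tiny sample documents, one per format.
$rss  = simplexml_load_string('<rss><channel><item><title>A</title></item></channel></rss>');
$atom = simplexml_load_string('<feed><entry><title>B</title></entry></feed>');

$titles = [];
foreach ([$rss, $atom] as $xml) {
    foreach ($xml->xpath($xpath) as $post) {
        $titles[] = (string) $post->title;
    }
}

print_r($titles);
```

Supporting a further format would then only mean adding its element names to the two lists, as long as the names contain no quote characters.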

The rest of this post shows how the complete wanted processing can be done entirely in XSLT -- easily and with elegance.

I. A gentle introduction

Such processing is easy and simple with XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <myFeed>
   <xsl:apply-templates/>
  </myFeed>
 </xsl:template>

 <xsl:template match="channel|feed">
  <xsl:apply-templates select="*">
   <xsl:sort select="pubDate|published" order="descending"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="item|entry">
  <post>
    <xsl:apply-templates mode="identity"/>
  </post>
 </xsl:template>

 <xsl:template match="pubDate|published" mode="identity">
  <publicationDate>
   <xsl:apply-templates/>
  </publicationDate>
 </xsl:template>

  <xsl:template match="node()|@*" mode="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="identity"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to this XML document (in format 1):

<rss>
    <channel>
        <item>
            <pubDate>2011-06-05</pubDate>
            <title>Title1</title>
            <description>Description1</description>
            <link>Link1</link>
            <author>Author1</author>
        </item>
        <item>
            <pubDate>2011-06-06</pubDate>
            <title>Title2</title>
            <description>Description2</description>
            <link>Link2</link>
            <author>Author2</author>
        </item>
        <item>
            <pubDate>2011-06-07</pubDate>
            <title>Title3</title>
            <description>Description3</description>
            <link>Link3</link>
            <author>Author3</author>
        </item>
    </channel>
</rss>

and when it is applied to this equivalent document (in format 2):

<feed>
        <entry>
            <published>2011-06-05</published>
            <title>Title1</title>
            <description>Description1</description>
            <link>Link1</link>
            <author>Author1</author>
        </entry>
        <entry>
            <published>2011-06-06</published>
            <title>Title2</title>
            <description>Description2</description>
            <link>Link2</link>
            <author>Author2</author>
        </entry>
        <entry>
            <published>2011-06-07</published>
            <title>Title3</title>
            <description>Description3</description>
            <link>Link3</link>
            <author>Author3</author>
        </entry>
</feed>

in both cases the same correct, wanted result is produced:

<myFeed>
   <post>
      <publicationDate>2011-06-07</publicationDate>
      <title>Title3</title>
      <description>Description3</description>
      <link>Link3</link>
      <author>Author3</author>
   </post>
   <post>
      <publicationDate>2011-06-06</publicationDate>
      <title>Title2</title>
      <description>Description2</description>
      <link>Link2</link>
      <author>Author2</author>
   </post>
   <post>
      <publicationDate>2011-06-05</publicationDate>
      <title>Title1</title>
      <description>Description1</description>
      <link>Link1</link>
      <author>Author1</author>
   </post>
</myFeed>

II. The full solution

This can be generalized to a parameterized solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="postElements" select=
 "'|entry|item|'"/>
 <xsl:param name="pub-dateElements" select=
  "'|published|pubDate|'"/>

  <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">
  <myFeed>
   <xsl:apply-templates select=
   "//*[contains($postElements, concat('|',name(),'|'))]">
    <xsl:sort order="descending" select=
     "*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
   </xsl:apply-templates>
  </myFeed>
 </xsl:template>

 <xsl:template match="*">
  <xsl:choose>
   <xsl:when test=
    "contains($postElements, concat('|',name(),'|'))">
    <post>
      <xsl:apply-templates/>
    </post>
   </xsl:when>
   <xsl:when test=
   "contains($pub-dateElements, concat('|',name(),'|'))">
    <publicationDate>
     <xsl:apply-templates/>
    </publicationDate>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="identity"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

</xsl:stylesheet>

This transformation can be used with any number of formats: just specify all alternative names for the "entry" element in the external (global) parameter $postElements and all alternative names for the "published date" element in the external (global) parameter $pub-dateElements.

Anyone can run this transformation to verify that, when applied to the two XML documents above, it again produces the same correct, wanted result.
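
One possible way to run such a parameterized transformation from PHP is sketched below. It assumes the xsl extension is installed, and it uses a trimmed-down stylesheet written only for this sketch (it outputs just the titles), not the full stylesheet above. The news/story/date format is made up to show how the two parameters could be overridden for a third format.

```php
<?php
// Trimmed-down stylesheet for this sketch: selects posts by name, sorts them
// by their date child, and outputs only the titles.
$xslt = <<<'XSL'
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>
 <xsl:param name="postElements" select="'|entry|item|'"/>
 <xsl:param name="pub-dateElements" select="'|published|pubDate|'"/>

 <xsl:template match="/">
  <myFeed>
   <xsl:for-each select="//*[contains($postElements, concat('|',name(),'|'))]">
    <xsl:sort order="descending"
     select="*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
    <post><xsl:value-of select="title"/></post>
   </xsl:for-each>
  </myFeed>
 </xsl:template>
</xsl:stylesheet>
XSL;

// A made-up third format that the default parameter values do not cover.
$source = '<news><story><date>2011-06-05</date><title>T1</title></story>'
        . '<story><date>2011-06-07</date><title>T2</title></story></news>';

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslt);
$document = new DOMDocument();
$document->loadXML($source);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
// Override the global parameters so that the extra names are recognized.
$processor->setParameter('', 'postElements', '|entry|item|story|');
$processor->setParameter('', 'pub-dateElements', '|published|pubDate|date|');

$result = $processor->transformToXml($document);
echo $result;
```

The stylesheet itself stays unchanged; only the parameter values passed in from PHP differ per set of formats.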

#2

This question is really two questions, "How to handle multiple xpath at once" and "[How to] create my own feeds with the same structure".

The second one has been brilliantly answered by Dimitre Novatchev. If you want to "merge" or transform one or several XML documents, that's definitely what I'd recommend.

Meanwhile, I'll take the easy path and address the first question, "How to handle multiple xpath at once". It's easy, there's an operator for that: |. If you want to query all nodes that match /feed//entry or /rss//item then you can use /feed//entry | /rss//item.
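
A minimal sketch of the union operator with SimpleXML follows; the two inline documents are placeholders standing in for real feeds.

```php
<?php
// Placeholder documents, one per feed structure.
$documents = [
    '<rss><channel><item><title>From RSS</title></item></channel></rss>',
    '<feed><entry><title>From Atom</title></entry></feed>',
];

$entries = [];
foreach ($documents as $document) {
    $xml = simplexml_load_string($document);
    // The union matches whichever of the two structures this document has.
    $entries = array_merge($entries, $xml->xpath('/feed//entry | /rss//item'));
}

foreach ($entries as $entry) {
    echo $entry->title, "\n";
}
```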

#3

Here's a solution.

The problem is that many RSS and Atom feeds define namespaces, which don't play nicely with SimpleXML. In the example below, I'm using str_replace to replace xmlns= with ns=. I'm then using the name of the root element to determine the type of feed (whether it's RSS or Atom).
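
As an alternative sketch that leaves the markup untouched, the default namespace can be bound to a prefix with SimpleXML's registerXPathNamespace before querying. The prefix "a" and the sample document are arbitrary choices for this illustration; the namespace URI is the standard Atom one.

```php
<?php
// Sample Atom document that declares the standard Atom default namespace.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Namespaced entry</title></entry>
</feed>
XML;

$xml = simplexml_load_string($atom);

// Bind the Atom namespace to a prefix that is only used inside XPath queries.
$xml->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');

$entries = $xml->xpath('/a:feed//a:entry');

echo count($entries), "\n";
echo $entries[0]->title, "\n";
```

This avoids rewriting the source text, at the cost of having to know each feed's namespace URI in advance.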

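The array_merge call takes care of adding all of the entries to the $entries array, which you can then use later.

$entries = array();

foreach ( $feeds as $feed )
{
  // Assumes $feeds holds URLs or file paths, as in the question,
  // so the raw XML has to be fetched first.
  $source = file_get_contents($feed);
  if ( $source === false )
  {
    continue;
  }

  // Neutralize the default namespace so that plain element names match.
  $xml = simplexml_load_string(str_replace('xmlns=', 'ns=', $source));
  if ( $xml === false )
  {
    continue;
  }

  switch ( strtolower($xml->getName()) )
  {
    // Atom
    case 'feed':
      // array_merge keeps $entries flat; array_push would nest each result array.
      $entries = array_merge($entries, $xml->xpath('/feed//entry'));

      break;

    // RSS
    case 'rss':
      $entries = array_merge($entries, $xml->xpath('/rss//item'));

      break;
  }
}

var_dump($entries);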
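
Once the entries from both formats are in one array, they can be sorted by date, as the question asks. The sketch below assumes every item has a pubDate child or every entry a published child in a format that strtotime understands; the helper name entryTimestamp and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: returns a Unix timestamp for an RSS item or an Atom entry.
function entryTimestamp(SimpleXMLElement $entry): int
{
    $date = isset($entry->pubDate) ? (string) $entry->pubDate : (string) $entry->published;
    $time = strtotime($date);

    // Entries with a missing or unparsable date sort last.
    return $time === false ? 0 : $time;
}

$rss = simplexml_load_string(
    '<rss><channel>'
    . '<item><pubDate>Sun, 05 Jun 2011 10:00:00 +0000</pubDate><title>Older</title></item>'
    . '</channel></rss>'
);
$atom = simplexml_load_string(
    '<feed><entry><published>2011-06-07T08:30:00Z</published><title>Newer</title></entry></feed>'
);

$entries = array_merge($rss->xpath('/rss//item'), $atom->xpath('/feed//entry'));

// Newest first, regardless of which format an entry came from.
usort($entries, function ($a, $b) {
    return entryTimestamp($b) <=> entryTimestamp($a);
});

foreach ($entries as $entry) {
    echo $entry->title, "\n";
}
```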

Another solution could be to use Google Reader to aggregate all of your feeds and use that feed instead of all of your separate ones.
