The website is almost entirely DHTML/XHTML/HTML and is hosted on a Linux/Apache server.
While I'm not opposed to using a database, I've been told that I can implement a solution that parses the HTML documents and returns my search results without mucking about too much with ASP/PHP/CGI (in which I am most certainly a novice).
Is this possible? Is there a better way? Should I look to a specific third party application?
THANKS!!!
8 Solutions
#1
Instead of paying for search appliances, you can also pay Google to crawl your site and present customized search results. It's inexpensive, and Google does a good job of indexing everything (including PDFs). If I remember correctly, the ad-supported version is free (i.e. you pay to remove the ads).
#2
There are "spiders" that will crawl your site and generate some form of search index. How reliable these are and how well they perform I really can't say. We recently purchased two Google search appliances here at work and use one for our intranet and one for our external web. They do a very nice job of indexing exactly the content you want as well as setting up specialized "search zones" and even keyword mapping.
I highly recommend them: http://www.google.com/enterprise/mini/
- Nicholas
#3
The Google search is the easiest route. The only thing I would suggest is that you add a Google sitemap to your site. That way you can notify Google of updates or new pages to make sure the search listing is as up-to-date as possible.
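For anyone who has not built a sitemap before, here is a minimal sketch of generating one with a small script. It assumes the standard sitemaps.org XML format; the function names, the page list, and the example.com URLs are made up for illustration, so substitute your own pages and dates.

```javascript
// Escape characters that are not allowed verbatim inside XML text.
function xmlEscape(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a sitemap document from a list of { loc, lastmod } entries.
// `loc` is the absolute page URL, `lastmod` a YYYY-MM-DD date string.
function buildSitemap(pages) {
  const entries = pages
    .map(
      (page) =>
        "  <url>\n" +
        "    <loc>" + xmlEscape(page.loc) + "</loc>\n" +
        "    <lastmod>" + xmlEscape(page.lastmod) + "</lastmod>\n" +
        "  </url>"
    )
    .join("\n");

  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries +
    "\n</urlset>\n"
  );
}

// Hypothetical page list; replace with the real pages of the site.
console.log(
  buildSitemap([
    { loc: "http://www.example.com/", lastmod: "2024-01-15" },
    { loc: "http://www.example.com/about.html", lastmod: "2024-01-10" },
  ])
);
```

The output would typically be saved as sitemap.xml in the site root and then submitted through Google's webmaster tools.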
#4
If you can write some code in your favorite programming language, you can also have a look at Apache Solr (url). The concept is simple: you get a separate search server, already implemented and running as its own program. You add documents by sending them to the search server with an HTTP POST. You search by issuing a GET request and getting back an XML file with the search results.
What you have to write is the code that sends the files to the search server (only a few lines of code) and the parsing of the XML search results (which can be done easily with XSLT).
I don't know how many documents you are talking about, but this solution scales very well. I currently use it with 2.5 million pages in the index and get results in under 50 ms.
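To make the POST/GET round trip described above a bit more concrete, here is a rough sketch. The base URL, the core name `site`, and the field names `id`, `title` and `content` are assumptions for illustration and depend on how your Solr instance and schema are set up. It relies on Solr's XML update format (`<add><doc>...</doc></add>`) and the `/select` handler with `wt=xml`, and on a JavaScript runtime that provides `fetch`.

```javascript
// Assumed location of the Solr core; adjust host, port and core name.
const SOLR_BASE = "http://localhost:8983/solr/site";

// Escape characters that would break the XML update message.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a plain object into a Solr XML "add" message.
function buildAddXml(doc) {
  const fields = Object.entries(doc)
    .map(
      ([name, value]) =>
        '<field name="' + escapeXml(name) + '">' + escapeXml(value) + "</field>"
    )
    .join("");
  return "<add><doc>" + fields + "</doc></add>";
}

// Build the GET URL for a search that returns XML results.
function buildSelectUrl(terms) {
  const params = new URLSearchParams({ q: terms, wt: "xml" });
  return SOLR_BASE + "/select?" + params.toString();
}

// POST one page to the index and commit it immediately.
async function indexPage(doc) {
  const response = await fetch(SOLR_BASE + "/update?commit=true", {
    method: "POST",
    headers: { "Content-Type": "text/xml" },
    body: buildAddXml(doc),
  });
  return response.ok;
}

// Run a search and return the raw XML result document as a string.
async function search(terms) {
  const response = await fetch(buildSelectUrl(terms));
  return response.text();
}
```

A caller would run something like `indexPage({ id: "/about.html", title: "About", content: "..." })` for every page and then pass the XML returned by `search("contact form")` to an XSLT stylesheet or any XML parser.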
#5
Add a link to Google that only returns results for your domain (using the site: operator). I don't know exactly how to do this, but it shouldn't be hard.
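As a rough sketch of what such a link could look like: the function below prefixes the search terms with `site:` plus the domain and puts the result in the `q` parameter of a Google search URL. The function name and example.com are placeholders, not part of any official API.

```javascript
// Build a Google search URL whose results are limited to one domain.
function siteSearchUrl(domain, terms) {
  // The site: operator restricts results to the given domain.
  const query = "site:" + domain + " " + terms;
  return "https://www.google.com/search?q=" + encodeURIComponent(query);
}

// Hypothetical usage; replace example.com with your own domain.
console.log(siteSearchUrl("example.com", "contact form"));
```

The resulting URL can be used directly as the `href` of a link, or built on the fly from whatever the visitor types into a text box.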
#6
Thanks all! I'm currently looking into a Google custom search engine. The search bars with logos are cumbersome, but if all Google wants for the legwork on this is a watermarked search bar and a couple of ads served, then that's the solution for me!
#7
Here's how I did the search on my blog (using Google)... don't remember where I got this template from originally but from the comments I guess it originally came from javascriptkit.com. :)
<script type="text/javascript">
// Google Internal Site Search script- By JavaScriptKit.com(http://www.javascriptkit.com)
// For this and over 400+ free scripts, visit JavaScript Kit-http://www.javascriptkit.com/
// This notice must stay intact for use
//Enter domain of site to search.
var domainroot="ericasberry.com"
function Gsitesearch(curobj)
{
curobj.q.value="site:"+domainroot+" "+curobj.qfront.value
}
</script>
<form action="http://www.google.com/search" method="get"
onSubmit="Gsitesearch(this)">
<p>Search ericasberry.com:<br />
<input name="q" type="hidden" />
<input name="qfront" type="text" style="width: 180px" />
<input type="submit" value="Search" /></p>
</form>
#8
Google Ajax Search API