MySQL - Searching the database with a text string that contains special characters

Date: 2022-09-13 07:44:11

I have a news system, and I need the details of a news item to be reachable through a link that contains the news title rather than its database ID.

Note: Some published news items have titles with special characters, for example:

News title: $title = Nuevo demo de la canción de ñandú;

Processed with PHP to replace the spaces with hyphens, this title would be as follows.

Explode title: $title_explode = explode(' ','-', $title);

Looks like this: $title_filter = nuevo-demo-de-la-canción-de-ñandú;

I created a field title in the database, where $title is saved. I have another field in the database, title_filter, which would store $title_filter, the processed title. But since the title contains an ñ and accented letters, how can I query the database so that it reliably returns this record?

Query: $sql = SELECT * FROM news WHERE title = $title_filter;

URL of the news item: www.dominio.com/nuevo-demo-de-la-canción-de-ñandú

My questions:

  1. Is this search effective?
  2. Can it be optimized by removing the accents from $title so that it becomes: $title_filter = nuevo-demo-de-la-cancion-de-nandu;
  3. Is it worth creating the additional database field title_filter to store the cleaned title and run the search against that field?

Note: The idea of not using the ID to search the database is to have cleaner SEO links and improve the website.

I hope I have made myself clear and that you can help me solve this dilemma and make the database query more effective.

1 solution

#1

I'll answer based on the information you have given:

Question 1

That search wouldn't return any results because the title field is not filtered. In other words, you are comparing:

$title_filter = 'Nuevo-demo-de-la-canción-de-ñandú';
SELECT * FROM news WHERE title = $title_filter

But title contains => 'Nuevo demo de la canción de ñandú'

So you should change your SQL query to this one:

SELECT * FROM news WHERE title_filter = $title_filter

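As a minimal sketch of how that corrected query could be run from PHP, the lookup below binds the slug as a parameter instead of pasting it into the SQL string. The function name findNewsBySlug is hypothetical, and the in-memory SQLite database is only an assumption that makes the example runnable on its own; with MySQL only the PDO connection string would differ.

```php
<?php
// Hypothetical helper: fetches one news row by its URL slug.
// The slug is passed as a bound parameter, never concatenated into the SQL.
function findNewsBySlug(PDO $pdo, string $slug): ?array
{
    $stmt = $pdo->prepare('SELECT * FROM news WHERE title_filter = :slug LIMIT 1');
    $stmt->execute([':slug' => $slug]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;   // null when no row matches
}

// Demo only: an in-memory SQLite database stands in for the MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, title_filter TEXT)');
$insert = $pdo->prepare('INSERT INTO news (title, title_filter) VALUES (?, ?)');
$insert->execute(['Nuevo demo de la canción de ñandú', 'nuevo-demo-de-la-canción-de-ñandú']);

$news = findNewsBySlug($pdo, 'nuevo-demo-de-la-canción-de-ñandú');
echo $news['title'], PHP_EOL;
```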

Question 2

Yes, this is possible in three different ways, all of them explained in another post (I give you the links):

  • Using a regex and string replacement.
  • Using a list of characters to replace.
  • Using the PHP transliteration function iconv.

You can read these three solutions in that post. They are explained very well there.

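As a rough illustration of the second approach (a list of characters to replace), the sketch below builds the accent-free title from the question. The function name slugify and the replacement map are assumptions made for this example; the map only covers common Spanish letters, and any character outside it ends up as a hyphen.

```php
<?php
// Hypothetical helper: turns a title into a lowercase ASCII slug.
function slugify(string $title): string
{
    // Assumed map: common Spanish accented letters only.
    $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
    ];
    $ascii = strtr($title, $map);                       // replace the mapped letters
    $lower = strtolower($ascii);                        // fold the ASCII letters to lowercase
    $slug  = preg_replace('/[^a-z0-9]+/', '-', $lower); // every other run becomes one hyphen
    return trim($slug, '-');                            // no leading or trailing hyphen
}

echo slugify('Nuevo demo de la canción de ñandú'), PHP_EOL;
// prints: nuevo-demo-de-la-cancion-de-nandu
```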

Question 3

Definitely yes. If you don't create that field, the performance of your application will be affected:

  1. Every time a user (or a crawler) visits any URL of your site, you have to filter the title.
  2. Since you only have the filtered title from the URL, the filter would have to be applied on the database side, which puts unnecessary load on the database server. And as you know, that is a well-known bottleneck.
  3. It is a serious security gap, because using user data directly without filtering exposes you to SQL injection.

Maybe your doubts are about the database space consumed by a new field. But as you can see above, the problems caused by not having it are more significant, and the number of new titles won't cause any storage problem.

Extra - About the filtering

One possible problem is the way you replace spaces with -. If you have something like this:

$title = "This     is a test    "

Your method will filter into this:

$title_filtered = "This-----is-a-test----"

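To see where the repeated hyphens come from, here is a small sketch. It assumes the spaces are replaced one by one with str_replace; the question does not show a working replacement call, so that choice is an assumption. Every single space becomes its own hyphen, so runs of spaces turn into runs of hyphens.

```php
<?php
// Assumption: the title is converted with a plain one-to-one replacement.
// Five spaces after "This" and four trailing spaces, built explicitly for clarity.
$title = 'This' . str_repeat(' ', 5) . 'is a test' . str_repeat(' ', 4);
$title_filtered = str_replace(' ', '-', $title);

echo $title_filtered, PHP_EOL;
// prints: This-----is-a-test----
```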

So, to avoid this kind of problem, it is better to use a regex like this:

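$string = '    hello     world     ';
// Strip leading whitespace, strip trailing whitespace, then collapse each inner run into one hyphen.
// \s already matches tabs, so a separate \t is not needed.
$filters = array('@^\s+@', '@\s+$@', '@\s+@');
$filtered = array('', '', '-');

$title_filtered = preg_replace($filters, $filtered, $string); // 'hello-world'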

Extra - About the Database performance

Another suggestion that will help you is to use a MySQL index.

If you don't have one, every time a user visits a URL you will do a full scan of your news table looking for the correct row. That does not scale and will put extra load on your server as your traffic and the number of news items grow.

A MySQL index is the best way to solve that; take a look at the official documentation. Put simply, instead of doing a full scan of the news table, you only search a tree data structure.

Using this one should be enough:

ALTER TABLE `news` ADD INDEX `index_title_filter` (`title_filter`)

If you want to know more, read this post.
