What I want to accomplish: I want to join two tables on URLs, but one table's URL formats differ slightly from the other's. Since the URLs are the only thing the two tables have in common (there are no unique IDs to join on), I have to use wildcards to match them.
For example, one table has formats like 'www.url.com', 'url.com.', and 'abc.def.url.com', while the other table has the format 'url.com'. So in order to match, I need to do something like:
LEFT JOIN t1
ON t1.url = '%.'||t2.url OR t1.url= t2.url||'.' OR etc...
Is this the right way to approach this?
I'm using PostgreSQL.
2 Solutions
#1
1
Joining on URLs imposes a strict constraint: url.com should match url.com, and www.url.com should match url.com, but www.dummyurl.com shouldn't match www.myurl.com. Your approach is the right way to tackle the problem, though a wildcard pattern such as '%.'||t2.url only works with LIKE, not with =.
To avoid overmatching, I would recommend using this:
select * from t1 inner join t2
on t1.url like '%.'||t2.url or t1.url = t2.url
I used an inner join so that rows without a match are left out instead of being padded with NULLs.
This assumes that t2.url appears at the end of t1.url, preceded by a period, or that the two match exactly. Try running it and see whether you get the results you expect.
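To try the join condition from this answer on concrete data, here is a small self-contained sketch. The table and column names follow the question; the sample rows, the temporary tables, and the rtrim() step for the trailing-dot form ('url.com.') are assumptions added for illustration and are not part of the answer's query.

```sql
-- Hypothetical sample data; temp tables avoid clashing with real t1/t2.
CREATE TEMP TABLE t1 (url text);
CREATE TEMP TABLE t2 (url text);

INSERT INTO t1 VALUES
  ('www.url.com'), ('url.com.'), ('abc.def.url.com'), ('www.dummyurl.com');
INSERT INTO t2 VALUES ('url.com');

-- rtrim(..., '.') strips a trailing dot so 'url.com.' can match too (assumption).
SELECT t1.url AS t1_url, t2.url AS t2_url
FROM t1
INNER JOIN t2
  ON rtrim(t1.url, '.') LIKE '%.' || t2.url   -- t1.url is a subdomain of t2.url
  OR rtrim(t1.url, '.') = t2.url;             -- exact match
```

With these rows, the three url.com variants should join and 'www.dummyurl.com' should not, because the character before 'url.com' there is not a period. One caveat: LIKE treats any '_' or '%' inside t2.url as a wildcard, so values containing those characters would need escaping.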
#2
0
Just a guess, can you use CONTAINS()?
LEFT JOIN t1
ON CONTAINS(t1.url, t2.url)
Edit: Well, it seems that PostgreSQL doesn't support CONTAINS.
Try using the position() function instead. It returns 0 if the substring is not found:
ON position(t2.url in t1.url) <> 0
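A quick way to see how position() behaves before using it in the join is to call it on literals. The hostnames below are made up for illustration. Note that a substring search matches anywhere in the string, so an unrelated domain that merely contains the text also returns a nonzero position.

```sql
-- position(substring in string) returns the 1-based start index, or 0 if absent.
SELECT position('url.com' in 'www.url.com')      AS subdomain,  -- 5
       position('url.com' in 'www.dummyurl.com') AS unrelated,  -- 10 (overmatch)
       position('url.com' in 'www.example.org')  AS absent;     -- 0
```

If matches like the second column are unwanted, a condition anchored on the preceding period, as in the LIKE '%.' || t2.url form, is the stricter choice.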