All,
I am stuck again trying to get my data into the format I need. I have a text field that looks like this:
"deangelo 001 deangelo
local origin of name: italain
from the american name deangelo
meaning: of the angels
emotional spectrum • he is a fountain of joy for all.
personal integrity • his good name is his most precious asset. personality • it’s hard to soar with eagles when you’re surrounded by turkeys! relationships • starts slowly, but a relationship with deangelo builds over time. travel & leisure • a trip of a lifetime is in his future.
career & money • a gifted child, deangelo will need to be challenged constantly.
life’s opportunities • joy and happiness await this blessed person.
deangelo’s lucky numbers: 12 • 38 • 18 • 34 • 29 • 16
"
What is the best way in PostgreSQL to remove the carriage returns and newlines? I've tried several things, and none of them behave.
select regexp_replace(field, E'\r\c', ' ', 'g') from mytable
WHERE id = 5520805582
SELECT regexp_replace(field, E'[^\(\)\&\/,;\*\:.\>\<[:space:]a-zA-Z0-9-]', ' ')
FROM mytable
WHERE field~ E'[^\(\)\&\/,;\*\:.\<\>[:space:]a-zA-Z0-9-]'
AND id = 5520805582;
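A quick check of why the first attempt replaces nothing: in an E'' string, \r is a carriage return, but \c is not a recognized string escape, so PostgreSQL drops the backslash and the pattern becomes a carriage return followed by a literal c. This minimal sketch uses plain literals instead of mytable and assumes standard PostgreSQL escape-string rules:

```sql
-- E'\r\c' is two characters: a carriage return and a literal 'c'.
-- It is NOT "carriage return or newline", so it never matches ordinary line breaks.
SELECT E'\r\c' = E'\r' || 'c'                                    AS is_cr_then_c,    -- true
       length(E'\r\c')                                            AS pattern_length,  -- 2
       regexp_replace(E'a\r\nb', E'\r\c', ' ', 'g') = E'a\r\nb'   AS text_unchanged;  -- true
```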
Thanks in advance, Adam
4 answers
#1
106
select regexp_replace(field, E'[\\n\\r]+', ' ', 'g') from mytable;
Read the manual: http://www.postgresql.org/docs/current/static/functions-matching.html
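To see what the pattern does, here is a minimal sketch run against a literal instead of the OP's table (the alias cleaned is just for illustration). Because of the +, each run of consecutive CR/LF characters, including a Windows CRLF pair or a blank line, collapses into a single space:

```sql
SELECT regexp_replace(
         E'line one\r\nline two\n\nline three',  -- CRLF, then a blank line
         E'[\\n\\r]+',                             -- one or more CR or LF characters
         ' ',                                      -- replace each run with one space
         'g') AS cleaned;                          -- 'g': replace every match, not just the first
-- cleaned: 'line one line two line three'
```

To change the stored value rather than just the query output, the same expression can be used in UPDATE mytable SET field = regexp_replace(field, E'[\\n\\r]+', ' ', 'g').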
#2
29
select regexp_replace(field, E'[\\n\\r\\u2028]+', ' ', 'g') from mytable;
I had the same problem in my Postgres database, but the newline in question wasn't a traditional ASCII CR or LF; it was the Unicode line separator, character U+2028. The snippet above catches that Unicode variant as well.
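A minimal sketch of the difference, assuming a UTF8-encoded database (where chr(8232) produces U+2028 LINE SEPARATOR): the plain [\n\r] class leaves that character in place, while the extended class replaces it. The one-row VALUES list t(s) is only a stand-in for a real column:

```sql
-- s holds "line one<U+2028>line two".
SELECT regexp_replace(s, E'[\\n\\r]+', ' ', 'g') = s     AS basic_leaves_it,  -- true
       regexp_replace(s, E'[\\n\\r\\u2028]+', ' ', 'g')  AS with_u2028        -- 'line one line two'
FROM (VALUES ('line one' || chr(8232) || 'line two')) AS t(s);
```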
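Update: although I've only ever encountered the characters above "in the wild", if you want to follow lmichelbacher's advice and also replace other Unicode newline-like characters (form feed, vertical tab U+000B, next line U+0085, line separator U+2028 and paragraph separator U+2029), use this: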
select regexp_replace(field, E'[\\n\\r\\f\\u000B\\u0085\\u2028\\u2029]+', ' ', 'g') from mytable;
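A minimal sketch exercising the extended class on a literal, again assuming a UTF8-encoded database so that chr() takes a Unicode code point: 11 is vertical tab (U+000B), 12 is form feed (\f), 133 is NEXT LINE (U+0085) and 8233 is PARAGRAPH SEPARATOR (U+2029):

```sql
-- Each separator sits between two letters; each one should become a single space.
SELECT regexp_replace(
         'a' || chr(11) || 'b' || chr(12) || 'c' || chr(133) || 'd' || chr(8233) || 'e',
         E'[\\n\\r\\f\\u000B\\u0085\\u2028\\u2029]+',
         ' ', 'g') AS cleaned;
-- cleaned: 'a b c d e'
```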
#3
13
The OP asked specifically about regexes, since there appears to be concern about a number of other characters as well as newlines, but if you just want to strip out newlines, you don't need a regex at all. You can simply do:
select replace(field, E'\n', '') from mytable;
I believe this works in all but perhaps the very earliest versions of Postgres, although replace() and the E'' string syntax are PostgreSQL features rather than strictly SQL-standard ones. It tested fine for me in 9.4 and 9.2.
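One caveat worth checking: replace(field, E'\n', '') removes only line feeds, so text with Windows-style CRLF endings keeps its carriage returns. A minimal sketch using a literal in place of field: PostgreSQL's translate() deletes every character listed in its second argument that has no counterpart in the third, so an empty third argument strips both CR and LF in one call:

```sql
SELECT replace(s, E'\n', '') = E'line one\rline two'  AS cr_left_behind,     -- true
       translate(s, E'\r\n', '')                      AS cr_and_lf_removed   -- 'line oneline two'
FROM (VALUES (E'line one\r\nline two')) AS t(s);
```

Like the replace() call above, both of these delete the line breaks rather than turning them into spaces, so adjacent words can run together.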
#4
7
If you only need to remove line breaks from the beginning or end of the string, you can use this:
UPDATE mytable
SET field = regexp_replace(field, E'(^[\\n\\r]+)|([\\n\\r]+$)', '', 'g');
Keep in mind that the caret ^ anchors the match to the beginning of the string and the dollar sign $ anchors it to the end.
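If only the ends of the string matter, the built-in btrim(string, characters) function is a regex-free alternative: it strips any of the listed characters from both ends and leaves interior line breaks alone. A minimal sketch comparing it with the regex above on a literal:

```sql
SELECT btrim(s, E'\r\n')                                      AS via_btrim,
       regexp_replace(s, E'(^[\\n\\r]+)|([\\n\\r]+$)', '', 'g') AS via_regex
FROM (VALUES (E'\r\nfirst line\nsecond line\r\n')) AS t(s);
-- both columns: E'first line\nsecond line'
```

ltrim() and rtrim() accept the same second argument when only one end should be trimmed.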
Hope it helps someone.