Node.js Cheerio parser breaks UTF-8 encoding

时间:2022-10-17 15:02:16

From: https://*.com/questions/31574127/node-js-cheerio-parser-breaks-utf-8-encoding

[问题]

I parse my request with Cheerio like this:

var url = http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO;
request.get(url, function (err, response, body) {
console.log(body);
$ = cheerio.load(body);
console.log($(".description").html());
});

And as output I see content but in unreadable strange encoding:

//Plain body console.log(body) (p.s. russian chars):
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1><p style // cheerio's console.log $(".description").html()
<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY

Target url link coding is in UTF-8 format. So why Cheerio breaks my encoding?

Trying to use iconv to encode my body responce:

var body1 = iconv.decode(body, "utf-8");

but console.log($(".description").html()); still returns weird text.

[回答]

Cheerio hasn't broken anything. The HTML it outputs will be rendered by any browser exactly the same as the HTML input. Take a look at this snippet:

<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>

<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>
 Run code snippet

It's merely the case that У is the HTML "entity" for the UTF-8 character У, in the same way the entity &gt; represents >.

However, if you want to get the unencoded text, you can set the decodeEntities option to false:

const $ = cheerio.load(
`<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>`,
{ decodeEntities: false }
); console.log($('span').html())
// => Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше
.as-console-wrapper{min-height:100%}
<script src="https://wzrd.in/standalone/cheerio@latest"></script>