JsonParseException:未识别的令牌“http”:预期(“true”、“false”或“null”)

时间:2020-12-25 17:08:41

We have the following string which is a valid JSON written to a file on HDFS.

我们有下面的字符串,它是一个用于HDFS文件的有效JSON。

{  
  "id":"tag:search.twitter.com,2005:564407444843950080",
  "objectType":"activity",
  "actor":{  
    "objectType":"person",
    "id":"id:twitter.com:2302910022",
    "link":"http%3A%2F%2Fwww.twitter.com%2Fme7me4610012",
    "displayName":"",
    "postedTime":"2014-01-21T11:06:06.000Z",
    "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F563125491159162881%2FfypkHK3M_normal.jpeg",
    "summary":"‏‏‏‏‏‏‏‏ضًـأّيِّعٌهّ أّروٌأّحًنِأّ تٌـشُـتٌـهّـيِّ مًنِ يِّفُـهّـمًهّـأّ فُـقُط  حسابي بالإنستقرام lloooo_20",
    "links":[  
      {  
        "href":null,
        "rel":"me"
      }
    ],
    "friendsCount":10503,
    "followersCount":10325,
    "listedCount":12,
    "statusesCount":84957,
    "twitterTimeZone":null,
    "verified":false,
    "utcOffset":null,
    "preferredUsername":"me7me4610012",
    "languages":[  
      "ar"
    ],
    "favoritesCount":17
  },
  "verb":"share",
  "postedTime":"2015-02-08T12:56:35.000Z",
  "generator":{  
    "displayName":"Twitter for Android",
    "link":"http%3A%2F%2Ftwitter.com%2Fdownload%2Fandroid"
  },
  "provider":{  
    "objectType":"service",
    "displayName":"Twitter",
    "link":"http%3A%2F%2Fwww.twitter.com"
  },
  "link":"http%3A%2F%2Ftwitter.com%2Fme7me4610012%2Fstatuses%2F564407444843950080",
  "body":"RT @sckud1: فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIln…",
  "object":{  
    "id":"tag:search.twitter.com,2005:564407126526013440",
    "objectType":"activity",
    "actor":{  
      "objectType":"person",
      "id":"id:twitter.com:462268717",
      "link":"http%3A%2F%2Fwww.twitter.com/sckud1",
      "displayName":"صفق الهوى",
      "postedTime":"2012-01-12T19:24:17.000Z",
      "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F508424482885615616%2FmPBGZBPx_normal.jpeg",
      "summary":"اعلانك في سوق الخليج يحقق لك الوصول الى اكثر من مليون متابع خليجي  http%3A%2F%2Fmarketgulf.com",
      "links":[  
        {  
          "href":"http%3A%2F%2Fmarketgulf.com",
          "rel":"me"
        }
      ],
      "friendsCount":435237,
      "followersCount":464951,
      "listedCount":708,
      "statusesCount":1071685,
      "twitterTimeZone":"Riyadh",
      "verified":false,
      "utcOffset":"10800",
      "preferredUsername":"sckud1",
      "languages":[  
        "ar"
      ],
      "location":{  
        "objectType":"place",
        "displayName":"Made in K S A"
      },
      "favoritesCount":77
    },
    "verb":"post",
    "postedTime":"2015-02-08T12:55:19.000Z",
    "generator":{  
      "displayName":"Tweet Old Post",
      "link":"http%3A%2F%2Fwww.ajaymatharu.com%2F"
    },
    "provider":{  
      "objectType":"service",
      "displayName":"Twitter",
      "link":"http%3A%2F%2Fwww.twitter.com"
    },
    "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
    "body":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
    "object":{  
      "objectType":"note",
      "id":"object:search.twitter.com,2005:564407126526013440",
      "summary":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
      "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
      "postedTime":"2015-02-08T12:55:19.000Z"
    },
    "favoritesCount":0,
    "twitter_entities":{  
      "hashtags":[  

      ],
      "trends":[  

      ],
      "urls":[  
        {  
          "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
          "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
          "display_url":"hasterya.com/archives/34688…",
          "indices":[  
            85,
            107
          ]
        }
      ],
      "user_mentions":[  

      ],
      "symbols":[  

      ],
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_extended_entities":{  
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_filter_level":"low",
    "twitter_lang":"ar"
  },
  "favoritesCount":0,
  "twitter_entities":{  
    "hashtags":[  

    ],
    "trends":[  

    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "display_url":"hasterya.com/archives/34688…",
        "indices":[  
          97,
          119
        ]
      }
    ],
    "user_mentions":[  
      {  
        "screen_name":"sckud1",
        "name":"صفق الهوى",
        "id":462268717,
        "id_str":"462268717",
        "indices":[  
          3,
          10
        ]
      }
    ],
    "symbols":[  

    ],
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_extended_entities":{  
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_filter_level":"low",
  "twitter_lang":"ar",
  "retweetCount":1,
  "gnip":{  
    "matching_rules":[  
      {  
        "tag":"ISIS66"
      }
    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "expanded_status":200
      },
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "expanded_status":200
      }
    ],
    "klout_score":50,
    "language":{  
      "value":"ar"
    }
  }
}

EDIT

编辑

We configure a flume agent that reads the data from that file and pass it to Solr sink but unfortunately this exception in the title is throw.

我们配置一个flume代理,它从该文件读取数据并将其传递给Solr sink,但不幸的是,这个异常在标题中是抛出。

and here is the stack trace

这是堆栈跟踪。

org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:98)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:120)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:128)
    at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:141)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3095)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2340)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:818)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:698)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:159)
    at org.kitesdk.morphline.json.ReadJsonBuilder$ReadJson.doProcess(ReadJsonBuilder.java:109)
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
    ... 10 more

2 个解决方案

#1


4  

We have the following string which is a valid JSON ...

我们有下面的字符串,它是一个有效的JSON…

Clearly the JSON parser disagrees!

显然,JSON解析器不同意!

However, the exception says that the error is at "line 1: column 9", and there is no "http" token near the beginning of the JSON. So I suspect that the parser is trying to parse something different than this string when the error occurs.

但是,异常表示错误位于“第1行:第9列”,并且在JSON开头没有“http”标记。因此,我怀疑解析器在错误发生时尝试解析与此字符串不同的东西。

You need to find what JSON is actually being parsed. Run the application within a debugger, set a breakpoint on the relevant constructor for JsonParseException ... then find out what is in the ByteArrayInputStream that it is attempting to parse.

您需要找到实际正在解析的JSON。在调试器中运行应用程序,在JsonParseException的相关构造函数上设置断点…然后找出它正在试图解析的ByteArrayInputStream中的内容。

#2


1  

I faced this exception for a long time and was not able to pinpoint the problem. The exception says line 1 column 9. The mistake I did is to get the first line of the file which flume is processing.

我在很长一段时间里都遇到了这个例外,并没有找到问题的症结所在。例外情况是第1行第9行。我所犯的错误是获取flume正在处理的文件的第一行。

Apache flume process the content of the file in patches. So, when flume throws this exception and says line 1, it means the first line in the current patch.

Apache flume处理补丁中文件的内容。所以,当flume抛出这个异常并表示第1行,它表示当前补丁中的第一行。

If your flume agent is configured to use batch size = 100, and (for example) the file contains 400 lines, this means the exception is thrown in one of the following lines 1, 101, 201,301.

如果您的flume代理被配置为使用batch size = 100,并且(例如)该文件包含400行,这意味着异常将被抛出以下第1行,101,201301。

How to discover the line which causes the problem?

如何发现导致问题的线路?

You have three ways to do that.

有三种方法。

1- pull the source code and run the agent in debug mode. If you are an average developer like me and do not know how to make this, check the other two options.

1-拉出源代码并在调试模式下运行代理。如果您是像我一样的普通开发人员,并且不知道该如何做,请检查其他两个选项。

2- Try to split the file based on the batch size and run the flume agent again. If you split the file into 4 files, and the invalid json exists between lines 301 and 400, the flume agent will process the first 3 files and stop at the fourth file. Take the fourth file and again split it into more smaller files. continue the process until you reach a file with only one line and flume fails while processing it.

2-尝试根据批量大小拆分文件并再次运行flume代理。如果将文件分成4个文件,并且在第301行和第400行之间存在无效的json, flume代理将处理前3个文件,并在第4个文件中停止。拿下第四个文件,再把它分割成更小的文件。继续这个过程,直到您到达一个只有一行的文件,并且在处理它时,flume失败。

3- Reduce the batch size of the flume agent to only one and compare the number of processed events in the output of the sink you are using. For example, in my case I am using Solr sink. The file contains 400 lines. The flume agent is configured with batch size=100. When I run the flume agent, it fails at some point and throw that exception. At this point check how many documents are ingested in Solr. If the invalid json exists at line 346, the number of documents indexed into Solr will be 345, so the next line is the line which causes the problem.

3-将flume代理的批处理大小减少到一个,并将处理事件的数量与您正在使用的接收器的输出进行比较。例如,在我的例子中,我使用的是Solr sink。该文件包含400行。flume代理配置为batch size=100。当我运行flume代理时,它会在某个点失败并抛出异常。在这一点上,检查在Solr中吸收了多少文档。如果无效的json存在于第346行,那么索引到Solr的文档数量将是345,所以下一行是导致问题的行。

In my case I followed the third option and fortunately I pinpoint the line which causes the problem.

在我的案例中,我遵循了第三个选项,幸运的是我找到了导致问题的原因。

This is a long answer but it actually does not solve the exception. How I overcome this exception?

这是一个很长的答案,但它实际上并不能解决例外情况。我如何克服这个例外?

I have no idea why Jackson library complain while parsing a json string contains escaped characters \n \r \t. I think (but I am not sure) the Jackson parser is by default escaping these characters which cases the json string to be split into two lines (in case of \n) and then it deals each line as a separate json string.

我不知道为什么在解析json字符串时,Jackson library会抱怨,其中包含了转义字符\n \r \t。我认为(但我不确定)Jackson解析器是默认地转义这些字符,这些字符将json字符串拆分为两行(如果是\n),然后将每一行作为单独的json字符串处理。

In my case we used a customized interceptor to remove these characters before being processed by the flume agent. This is the way we solved this problem.

在我的例子中,我们使用自定义的拦截器来删除这些字符,然后由flume代理处理。这就是我们解决这个问题的方法。

#1


4  

We have the following string which is a valid JSON ...

我们有下面的字符串,它是一个有效的JSON…

Clearly the JSON parser disagrees!

显然,JSON解析器不同意!

However, the exception says that the error is at "line 1: column 9", and there is no "http" token near the beginning of the JSON. So I suspect that the parser is trying to parse something different than this string when the error occurs.

但是,异常表示错误位于“第1行:第9列”,并且在JSON开头没有“http”标记。因此,我怀疑解析器在错误发生时尝试解析与此字符串不同的东西。

You need to find what JSON is actually being parsed. Run the application within a debugger, set a breakpoint on the relevant constructor for JsonParseException ... then find out what is in the ByteArrayInputStream that it is attempting to parse.

您需要找到实际正在解析的JSON。在调试器中运行应用程序,在JsonParseException的相关构造函数上设置断点…然后找出它正在试图解析的ByteArrayInputStream中的内容。

#2


1  

I faced this exception for a long time and was not able to pinpoint the problem. The exception says line 1 column 9. The mistake I did is to get the first line of the file which flume is processing.

我在很长一段时间里都遇到了这个例外,并没有找到问题的症结所在。例外情况是第1行第9行。我所犯的错误是获取flume正在处理的文件的第一行。

Apache flume process the content of the file in patches. So, when flume throws this exception and says line 1, it means the first line in the current patch.

Apache flume处理补丁中文件的内容。所以,当flume抛出这个异常并表示第1行,它表示当前补丁中的第一行。

If your flume agent is configured to use batch size = 100, and (for example) the file contains 400 lines, this means the exception is thrown in one of the following lines 1, 101, 201,301.

如果您的flume代理被配置为使用batch size = 100,并且(例如)该文件包含400行,这意味着异常将被抛出以下第1行,101,201301。

How to discover the line which causes the problem?

如何发现导致问题的线路?

You have three ways to do that.

有三种方法。

1- pull the source code and run the agent in debug mode. If you are an average developer like me and do not know how to make this, check the other two options.

1-拉出源代码并在调试模式下运行代理。如果您是像我一样的普通开发人员,并且不知道该如何做,请检查其他两个选项。

2- Try to split the file based on the batch size and run the flume agent again. If you split the file into 4 files, and the invalid json exists between lines 301 and 400, the flume agent will process the first 3 files and stop at the fourth file. Take the fourth file and again split it into more smaller files. continue the process until you reach a file with only one line and flume fails while processing it.

2-尝试根据批量大小拆分文件并再次运行flume代理。如果将文件分成4个文件,并且在第301行和第400行之间存在无效的json, flume代理将处理前3个文件,并在第4个文件中停止。拿下第四个文件,再把它分割成更小的文件。继续这个过程,直到您到达一个只有一行的文件,并且在处理它时,flume失败。

3- Reduce the batch size of the flume agent to only one and compare the number of processed events in the output of the sink you are using. For example, in my case I am using Solr sink. The file contains 400 lines. The flume agent is configured with batch size=100. When I run the flume agent, it fails at some point and throw that exception. At this point check how many documents are ingested in Solr. If the invalid json exists at line 346, the number of documents indexed into Solr will be 345, so the next line is the line which causes the problem.

3-将flume代理的批处理大小减少到一个,并将处理事件的数量与您正在使用的接收器的输出进行比较。例如,在我的例子中,我使用的是Solr sink。该文件包含400行。flume代理配置为batch size=100。当我运行flume代理时,它会在某个点失败并抛出异常。在这一点上,检查在Solr中吸收了多少文档。如果无效的json存在于第346行,那么索引到Solr的文档数量将是345,所以下一行是导致问题的行。

In my case I followed the third option and fortunately I pinpoint the line which causes the problem.

在我的案例中,我遵循了第三个选项,幸运的是我找到了导致问题的原因。

This is a long answer but it actually does not solve the exception. How I overcome this exception?

这是一个很长的答案,但它实际上并不能解决例外情况。我如何克服这个例外?

I have no idea why Jackson library complain while parsing a json string contains escaped characters \n \r \t. I think (but I am not sure) the Jackson parser is by default escaping these characters which cases the json string to be split into two lines (in case of \n) and then it deals each line as a separate json string.

我不知道为什么在解析json字符串时,Jackson library会抱怨,其中包含了转义字符\n \r \t。我认为(但我不确定)Jackson解析器是默认地转义这些字符,这些字符将json字符串拆分为两行(如果是\n),然后将每一行作为单独的json字符串处理。

In my case we used a customized interceptor to remove these characters before being processed by the flume agent. This is the way we solved this problem.

在我的例子中,我们使用自定义的拦截器来删除这些字符,然后由flume代理处理。这就是我们解决这个问题的方法。