How best to validate JSON on the server side

When handling POST, PUT, and PATCH requests on the server side, we often need to process some JSON to perform the request.

It is obvious that we need to validate this JSON (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:

1. Upon receiving the JSON, validate it upfront as it is, before doing anything with it to complete the request.
2. Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on the go while performing business logic, possibly using some exception handling to deal with rogue data.

The 1st approach seems more robust than the 2nd, but probably more expensive (in time) because every request will be validated (and hopefully most of them are valid, so the validation is somewhat redundant).

The 2nd approach may save the compulsory validation on valid requests, but mixing the checks into business logic might be buggy or even risky.

Which of the two is better? Or is there an even better way?

6 answers

#1

What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that map JSON to objects, which is very powerful and performs that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.

If efficiency is your goal and you want to validate every single request, it would be ideal if you could validate on the front end; if you are using JavaScript, you can use json2.js.

As for comparing your two methods, here is a list of pros and cons.

Method #1: Upon Request

Pros

The business logic integrity is maintained. As you mentioned, validating while processing business logic could produce checks that reject valid input or accept invalid input, and the validation could inadvertently affect the business logic.
As Norbert mentioned, catching the errors beforehand will improve efficiency. The logical question is: why spend time processing if there are errors in the first place?
The code will be cleaner and easier to read. Keeping validation and business logic separate results in code that is easier to read and maintain.

Cons

It could result in redundant processing, meaning longer computing time.

Method #2: Validation on the Go

Pros

It is efficient in theory, saving processing and compute time by doing validation and business logic at the same time.

Cons

In reality, the processing time saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error is found.
The data integrity can be compromised. The JSON could end up corrupted when it is processed this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
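To make the readability point concrete, here is a minimal Python sketch of Method #1, where validation runs upfront and the business logic contains no checks of its own. The payload shape (`name`, `quantity`) and the functions `validate_order`, `place_order`, and `handle_request` are hypothetical, invented only for illustration.

```python
import json


def validate_order(payload):
    """Return a list of problems; an empty list means the payload is acceptable."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    if not isinstance(payload.get("name"), str):
        errors.append("name must be a string")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        errors.append("quantity must be an integer")
    return errors


def place_order(payload):
    """Business logic only: it assumes the payload has already been validated."""
    return {"status": "accepted", "summary": f"{payload['quantity']} x {payload['name']}"}


def handle_request(body):
    """Hypothetical handler: parse, validate upfront, then run the business logic."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"status": "rejected", "errors": ["body is not valid JSON"]}
    errors = validate_order(payload)
    if errors:
        return {"status": "rejected", "errors": errors}
    return place_order(payload)
```

With Method #2, the `isinstance` checks would instead be scattered through `place_order`, which is the mixing the cons above warn about.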

What it really boils down to is accuracy vs. speed. They generally have an inverse relationship. As you validate your JSON more thoroughly, you may have to compromise somewhat on speed. This is really only noticeable with large data sets, as computers are very fast these days. It is up to you to decide which is more important, given how accurate you think your data will be when you receive it and whether that extra second or so is crucial. In some cases both matter a great deal (e.g. in stock market and healthcare applications, milliseconds matter). In those cases, as you increase one, for example accuracy, you may have to recover speed by getting a higher-performance machine.

Hope this helps.

#2

The first approach is more robust, but does not have to be noticeably more expensive. It even becomes less expensive once you can abort processing early because of errors: your business logic usually takes more than 90% of the resources in a request, so with an error rate of 10% you are already resource neutral. If you optimize the validation so that checks that would otherwise run inside the business process are performed upfront, the error rate needed to stay resource neutral may be much lower (around 1 in 20 to 1 in 100).
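The break-even claim can be checked with a little arithmetic. The sketch below assumes a simplified cost model that is not spelled out in the answer: upfront validation adds a fixed cost to every request, and every invalid request rejected upfront avoids the full business-logic cost. The function names are hypothetical.

```python
def break_even_error_rate(validation_cost, business_cost):
    """Error rate at which upfront validation pays for itself.

    Validation costs `validation_cost` on every request and saves
    `business_cost` on the fraction of requests that are invalid, so the
    break-even point is error_rate * business_cost == validation_cost.
    """
    return validation_cost / business_cost


def expected_saving(error_rate, validation_cost, business_cost):
    """Average cost saved per request; positive means upfront validation wins."""
    return error_rate * business_cost - validation_cost
```

Under this model, a 10/90 split between validation and business logic breaks even at an error rate of about 11%, and a 1/99 split breaks even at about 1%, which is in line with the ranges quoted above.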

For an example of an implementation that assumes upfront data validation, look at GSON (https://code.google.com/p/google-gson/).

GSON works as follows: every part of the JSON can be mapped onto an object. This object is typed or contains typed data. Sample object (Java used as example language):

public class someInnerDataFromJSON {
    String name;
    String address;
    int housenumber;
    String buildingType;
    // Getters and setters
    public String getName() { return name; }
    public void setName(String name) { this.name=name; }
    //etc.
}

The data parsed by GSON against the provided model is already type checked. This is the first point where your code can abort.

After this exit point, assuming the data conforms to the model, you can validate whether the data is within certain limits. You can also write that into the model.

Assume for this example that buildingType must be one of a list of values:

Single family house
Multi family house
Apartment

You can check the data during parsing by creating a setter that checks it, or you can check it after parsing, in a first pass of your business rules. The benefit of checking the data first is that your later code needs less exception handling, so there is less code and it is easier to understand.
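The same idea, a typed model that rejects bad data as soon as it is constructed, can be sketched in Python with the standard library alone. This is an illustration rather than how GSON itself works: the class `InnerData`, the table `EXPECTED_TYPES`, and the function `parse_inner_data` are hypothetical names, and the permitted values are the three building types listed above.

```python
import json
from dataclasses import dataclass

ALLOWED_BUILDING_TYPES = {"Single family house", "Multi family house", "Apartment"}
EXPECTED_TYPES = {"name": str, "address": str, "housenumber": int, "buildingType": str}


@dataclass
class InnerData:
    name: str
    address: str
    housenumber: int
    buildingType: str

    def __post_init__(self):
        # First exit point: reject values whose type does not match the model.
        for field_name, expected in EXPECTED_TYPES.items():
            value = getattr(self, field_name)
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, expected) or isinstance(value, bool):
                raise TypeError(f"{field_name} must be of type {expected.__name__}")
        # Second check, written into the model: the value must be a permitted one.
        if self.buildingType not in ALLOWED_BUILDING_TYPES:
            raise ValueError(f"unknown buildingType: {self.buildingType!r}")


def parse_inner_data(body):
    """Parse JSON text and map it onto the model; raises on bad input."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    return InnerData(**data)  # missing or unexpected keys raise TypeError
```

Code that receives an `InnerData` instance can then rely on the types and the permitted values without repeating the checks.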

#3

In general, the first option would be the way to go. The only reason you might need to think of the second option is if you were dealing with JSON data that was tens of MB in size or more.

In other words, only if you are trying to stream JSON and process it on the fly do you need to think about the second option.

Assuming that you are dealing with a few hundred KB at most per JSON document, you can just go for option one.

Here are some steps you could follow:

1. Go for a JSON parser like GSON that converts your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you can take the JSON to be well-formed and compatible with the model's types.)
2. Of course, the objects constructed using GSON in step 1 may not be in a functionally valid state. Functional checks such as mandatory fields and limit checks still have to be done.
3. For this, you could define a validateState method which recursively validates the state of the object itself and its child objects.

Here is an example of a validateState method:

public void validateState() {
    // Assume this validateState is part of the Customer class.

    if (age < 12 || age > 150)
        throw new IllegalArgumentException("Age should be in the range 12 to 150");
    if (age < 18 && (guardianId == null || guardianId.trim().equals("")))
        throw new IllegalArgumentException("Guardian id is mandatory for minors");

    for (Account a : getAccounts()) {
        a.validateState(); // Throws an appropriate exception on any inconsistency in state
    }
}
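For readers who want to see the three steps end to end, here is a rough Python equivalent using only the standard library. The field names, the `Account` balance rule, and `load_customer` are hypothetical. The age limits follow the check in the Java method.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: float = 0.0

    def validate_state(self):
        # Hypothetical rule, only here to give the child object a check.
        if self.balance < 0:
            raise ValueError("Balance must not be negative")


@dataclass
class Customer:
    age: int
    guardian_id: str = ""
    accounts: list = field(default_factory=list)

    def validate_state(self):
        if not 12 <= self.age <= 150:
            raise ValueError("Age should be in the range 12 to 150")
        if self.age < 18 and not self.guardian_id.strip():
            raise ValueError("Guardian id is mandatory for minors")
        for account in self.accounts:
            account.validate_state()  # recurse into child objects


def load_customer(body):
    """Step 1: parse and map onto the model. Steps 2 and 3: check its state."""
    raw = json.loads(body)
    customer = Customer(
        age=raw["age"],
        guardian_id=raw.get("guardianId") or "",
        accounts=[Account(**a) for a in raw.get("accounts", [])],
    )
    customer.validate_state()
    return customer
```

Business logic that receives the result of `load_customer` can assume a functionally valid customer and needs no further checks of this kind.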

#4

I would definitely go for validation before processing.

Let's say you receive some JSON data with 10 variables, of which you expect:

the first 5 variables to be of type string
variables 6 and 7 to be integers
variables 8, 9 and 10 to be arrays

You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.

foreach($data as $varName => $varValue){
    $varType = gettype($varValue);
    if(!$this->isTypeValid($varName, $varType)){
        // return validation error
    }
}

// continue processing

Think of the scenario where you process the data directly and the 10th value turns out to be of an invalid type. Processing the previous 9 variables was a waste of resources, since you end up returning a validation error response anyway. On top of that, you have to roll back any changes already persisted to your storage.

I only use the variable type in my example, but I would suggest full validation (length, max/min values, etc.) of all variables before processing any of them.
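As an illustration of the type table such a check relies on, here is a small Python sketch. The field names `var1` to `var10` are hypothetical stand-ins for the ten variables, and `type_errors` is an invented helper that reports every problem at once instead of stopping at the first.

```python
# Hypothetical field names var1..var10, mirroring the ten variables above.
EXPECTED_TYPES = {f"var{i}": str for i in range(1, 6)}
EXPECTED_TYPES.update({"var6": int, "var7": int})
EXPECTED_TYPES.update({f"var{i}": list for i in range(8, 11)})


def type_errors(data):
    """Return one message per missing or wrongly typed variable."""
    if not isinstance(data, dict):
        return ["payload must be a JSON object"]
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in data:
            errors.append(f"{name} is missing")
            continue
        value = data[name]
        # bool is a subclass of int, so reject it where an integer is expected.
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"{name} must be of type {expected.__name__}")
    return errors
```

A handler would call `type_errors` on the decoded body and return a validation error response if the list is not empty, before any variable is processed or persisted.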

#5

The answer depends entirely on your use case.

If you expect all calls to originate from trusted clients, then the upfront schema validation should be implemented so that it is activated only when you set a debug flag.

However, if your server delivers public API services, then you should validate the calls upfront. This isn't just a performance issue: your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.

If your server delivers private API services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from third-party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.

#6

It really depends on your requirements. But in general I'd always go for #1.

A few considerations:

For consistency I'd use method #1, for performance #2. However, when using #2 you have to take into account that rolling back on invalid input may become complicated in the future as the logic changes.

JSON validation should not take that long. In Python you can use ujson for parsing JSON strings, which is an ultrafast C implementation of the json Python module.

For validation, I use the jsonschema Python module, which makes JSON validation easy.

Another approach:

If you use jsonschema, you can validate the JSON request in steps. I'd perform an initial validation of the most common/important parts of the JSON structure, and validate the remaining parts along the business logic path. This allows you to write simpler, and therefore more lightweight, JSON schemas.
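A rough sketch of this staged validation follows. To stay within the standard library it uses plain checks in place of schemas; with jsonschema, each stage would be its own small schema passed to `jsonschema.validate`. The request shape (`action`, `payload`) and all function names are hypothetical.

```python
def validate_envelope(request):
    """Stage 1, run upfront: only the parts every request must have."""
    if not isinstance(request, dict):
        raise ValueError("request must be a JSON object")
    if request.get("action") not in ("create", "delete"):
        raise ValueError("action must be 'create' or 'delete'")
    if not isinstance(request.get("payload"), dict):
        raise ValueError("payload must be an object")


def validate_create_payload(payload):
    """Stage 2, run only on the 'create' path of the business logic."""
    title = payload.get("title")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")


def validate_delete_payload(payload):
    """Stage 2, run only on the 'delete' path of the business logic."""
    item_id = payload.get("id")
    if not isinstance(item_id, int) or isinstance(item_id, bool):
        raise ValueError("id must be an integer")


def handle(request):
    validate_envelope(request)
    payload = request["payload"]
    if request["action"] == "create":
        validate_create_payload(payload)
        return f"created {payload['title']}"
    validate_delete_payload(payload)
    return f"deleted {payload['id']}"
```

Each stage stays small because it only describes the part of the document that the current code path is about to use.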

The final decision:

If (and only if) this decision is critical, I'd implement both solutions, time-profile them under correct and wrong input conditions, and weight the results by the frequency of wrong input. Therefore:

t1c = average time spent with method 1 on correct input
t1w = average time spent with method 1 on wrong input
t2c = average time spent with method 2 on correct input
t2w = average time spent with method 2 on wrong input
cr = correct input rate (or frequency)
wr = wrong input rate (or frequency)

def choose_method(t1c, t1w, t2c, t2w, cr, wr):
    # Compare the expected time per request of each method, weighted by input frequency.
    if (t1c * cr) + (t1w * wr) <= (t2c * cr) + (t2w * wr):
        return 1  # choose method 1
    else:
        return 2  # choose method 2
