Detecting SQL injection in source code

Consider the following code snippet:

import MySQLdb

def get_data(id):
    db = MySQLdb.connect(db='TEST')
    cursor = db.cursor()
    cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)

    return cursor.fetchall()

print(get_data(1))

There is a major problem in the code: it is vulnerable to SQL injection attacks, because the query is not parameterized through the DB API but is constructed via string formatting. If you call the function this way:

get_data("'; DROP TABLE TEST -- ")

the following query would be executed:

SELECT * FROM TEST WHERE ID = ''; DROP TABLE TEST -- '

Now, my goal is to analyze the code in the project and detect all places potentially vulnerable to SQL injection. In other words, I want to find every place where the query is constructed via string formatting rather than by passing query parameters in a separate argument.

Is it something that can be solved statically, with the help of pylint, pyflakes or any other static code analysis packages?

I'm aware of sqlmap, a popular penetration testing tool, but as far as I understand, it works against a web resource, testing it as a black box through HTTP requests.

4 Solutions

#1

There is a tool that tries to solve exactly what the question is about, py-find-injection:

py_find_injection uses various heuristics to look for SQL injection vulnerabilities in python source code.

It uses the ast module, looks for session.execute() and cursor.execute() calls, and checks whether the query inside is formed via string interpolation, concatenation or format().

Here is what it outputs while checking the snippet in the question:

$ py-find-injection test.py
test.py:6   string interpolation of SQL query
1 total errors

The project is not actively maintained, but it could be used as a starting point. A good idea would be to make a pylint or pyflakes plugin out of it.
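The ast-based approach described above can be sketched roughly as follows. This is not py-find-injection's actual code: the class name ExecuteCallChecker, the function find_injections and the messages are hypothetical, and the f-string check is an addition of mine. It only inspects the first argument of any method call named execute.

```python
import ast


class ExecuteCallChecker(ast.NodeVisitor):
    """Collect (line, reason) pairs for suspicious .execute() calls."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        func = node.func
        # Match any attribute call named "execute", e.g. cursor.execute(...)
        if isinstance(func, ast.Attribute) and func.attr == "execute" and node.args:
            reason = self._classify(node.args[0])
            if reason:
                self.findings.append((node.lineno, reason))
        self.generic_visit(node)

    @staticmethod
    def _classify(arg):
        # "..." % value
        if isinstance(arg, ast.BinOp) and isinstance(arg.op, ast.Mod):
            return "string interpolation of SQL query"
        # "..." + value
        if isinstance(arg, ast.BinOp) and isinstance(arg.op, ast.Add):
            return "string concatenation of SQL query"
        # "...".format(value)
        if (isinstance(arg, ast.Call) and isinstance(arg.func, ast.Attribute)
                and arg.func.attr == "format"):
            return "str.format() of SQL query"
        # f"...{value}" (assumption: not mentioned in the tool's description)
        if isinstance(arg, ast.JoinedStr):
            return "f-string SQL query"
        return None


def find_injections(source):
    checker = ExecuteCallChecker()
    checker.visit(ast.parse(source))
    return checker.findings


SNIPPET = '''import MySQLdb

def get_data(id):
    db = MySQLdb.connect(db='TEST')
    cursor = db.cursor()
    cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)

    return cursor.fetchall()
'''

for lineno, reason in find_injections(SNIPPET):
    print(f"line {lineno}: {reason}")
```

Because the source is only parsed and never executed, MySQLdb does not need to be installed to run the check.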

#2

Not sure how this will compare with the other packages, but to a certain extent you need to parse the arguments being passed to cursor.execute. This bit of pyparsing code looks for:

arguments using string interpolation
arguments using string concatenation with variable names
arguments that are just variable names

But sometimes arguments use string concatenation just to break up a long string. If all the strings in the expression are literals being added together, there is no risk of SQL injection.

This pyparsing snippet will look for calls to cursor.execute, and then look for the at-risk argument forms:

from pyparsing import *
import re

identifier = Word(alphas, alphanums+'_')
integer = Word(nums)
LPAR,RPAR,PLUS,PERCENT = map(Literal, '()+%')

stringInterpRE = re.compile(r"%-?\d*\*?\.?\d*\*?s")
def containsStringInterpolation(s,l,tokens):
    if not stringInterpRE.search(tokens[0]):
        raise ParseException(s,l,"No string interpolation")
tupleContents = identifier | integer
tupleExpr = LPAR + delimitedList(tupleContents) + RPAR
stringInterpArg = identifier | tupleExpr        
interpolatedString = originalTextFor(quotedString.copy().setParseAction(containsStringInterpolation) + 
                                    PERCENT + stringInterpArg)

stringTerm = interpolatedString | OneOrMore(quotedString.copy()) | identifier
stringTerm.setName("stringTerm")

unsafeStringExpr = (stringTerm + OneOrMore(PLUS + stringTerm)) | identifier | interpolatedString
def unsafeExpr(s,l,tokens):
    if not any(term == interpolatedString or term == identifier
                for term in tokens):
        raise ParseException(s,l,"No unsafe string terms")
unsafeStringExpr.setParseAction(unsafeExpr)
unsafeStringExpr.setName("unsafeExpr")

func = Literal("cursor.execute")
statement = func + LPAR + unsafeStringExpr + RPAR
statement.setName("execute stmt")
#statement.ignore(pythonComment)

# `sample` is the source text shown below; define it before running this loop
for tokens in statement.searchString(sample):
    print(' '.join(tokens.asList()))

This will scan through the following sample:

sample = """
import MySQLdb

def get_data(id):
    db = MySQLdb.connect(db='TEST')
    cursor = db.cursor()
    cursor.execute("SELECT * FROM TEST WHERE ID = '%s' -- UNSAFE" % id)
    cursor.execute("SELECT * FROM TEST WHERE ID = '" + id + "' -- UNSAFE")
    cursor.execute(sqlVar + " -- UNSAFE")
    cursor.execute("SELECT * FROM TEST WHERE ID = 'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST WHERE ID = " + 
                        "'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST "
                        "WHERE ID = "
                        "'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST "
                        "WHERE ID = " +
                        "'%s' -- UNSAFE" % name)
    return cursor.fetchall()

print(get_data(1))"""

and report these unsafe statements:

cursor.execute ( "SELECT * FROM TEST WHERE ID = '%s' -- UNSAFE" % id )
cursor.execute ( "SELECT * FROM TEST WHERE ID = '" + id + "' -- UNSAFE" )
cursor.execute ( sqlVar + " -- UNSAFE" )
cursor.execute ( "SELECT * FROM TEST " "WHERE ID = " + "'%s' -- UNSAFE" % name )

You can also have pyparsing report the location of the found lines, using scanString instead of searchString.
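For comparison, the safe/unsafe distinction drawn above (literal-only concatenation is fine, anything involving a variable is at risk) can also be sketched with the standard ast module instead of pyparsing. The helper names is_literal_only and unsafe_execute_lines are hypothetical, and the sample is a shortened variant of the one above. Treat it as a rough illustration rather than an equivalent of the grammar.

```python
import ast


def is_literal_only(node):
    """True if the expression is built purely from string literals."""
    if isinstance(node, ast.Constant):
        return isinstance(node.value, str)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return is_literal_only(node.left) and is_literal_only(node.right)
    return False


def unsafe_execute_lines(source):
    """Line numbers of .execute() calls whose first argument is not literal-only."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args
                and not is_literal_only(node.args[0])):
            lines.append(node.lineno)
    return sorted(lines)


CODE = '''
cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)
cursor.execute("SELECT * FROM TEST WHERE ID = '" + id + "'")
cursor.execute(sqlVar + " -- comment")
cursor.execute("SELECT * FROM TEST WHERE ID = 'FRED'")
cursor.execute("SELECT * FROM TEST WHERE ID = " +
               "'FRED'")
cursor.execute("SELECT * FROM TEST "
               "WHERE ID = "
               "'FRED'")
cursor.execute(query)
'''

# Lines 2-4 mix in a variable; line 11 passes a bare variable name.
print(unsafe_execute_lines(CODE))
```

Adjacent string literals are merged by the parser into a single constant, so the multi-line literal forms come out as safe without any special handling. A bare variable is reported even when it happens to hold a constant, which matches the conservative choice made in the grammar above.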

#3

About the best I think you could do is grep through your codebase, looking for cursor.execute() calls that are passed a string built with Python string interpolation, as in your example:

cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)

which of course should have been written as a parameterized query to avoid the vulnerability:

cursor.execute("SELECT * FROM TEST WHERE ID = %s", (id,))

That's not going to be perfect -- for instance, you might have a hard time catching code like this:

query = "SELECT * FROM TEST WHERE ID = '%s'" % id
# some stuff
cursor.execute(query)

But it might be about the best you can easily do.
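To make the limitation concrete, here is a small grep-style sketch in Python. The regular expression and the name grep_unsafe are my own assumptions, not an established tool. It flags a line only when a string literal passed directly to .execute( is followed by %, + or .format(.

```python
import re

# Heuristic: .execute( followed by a quoted string and then %, + or .format(
PATTERN = re.compile(r"""\.execute\(\s*(["']).*?\1\s*(%|\+|\.format\()""")


def grep_unsafe(source):
    """Return 1-based line numbers that match the heuristic."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if PATTERN.search(line)
    ]


SOURCE = '''\
cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)
cursor.execute("SELECT * FROM TEST WHERE ID = %s", (id,))
query = "SELECT * FROM TEST WHERE ID = '%s'" % id
cursor.execute(query)
'''

# Only line 1 is reported; the query built on line 3 and run on line 4 slips through.
print(grep_unsafe(SOURCE))
```

A line-based match like this cannot follow a query through a variable, which is why the indirect case goes undetected.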

#4

It's a good thing that you're already aware of the problem and trying to resolve it.

As you may already know, the best practice for executing SQL in any database is to use prepared statements or stored procedures, if they are available.

In this particular case, you can approximate a prepared statement by writing the query with a placeholder and passing the value as a separate argument when executing it.

e.g.:

cursor = db.cursor()
query = "SELECT * FROM TEST WHERE ID = %s"
# Pass the value as a separate tuple so the driver escapes it
cursor.execute(query, ("2",))
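A runnable way to see the effect is sketched below. It assumes the standard-library sqlite3 module with an in-memory database as a stand-in for the MySQL database TEST, so the placeholder is ? rather than the %s that MySQLdb uses; the table layout is made up for the demonstration.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for runnability).
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE TEST (ID TEXT)")
cursor.executemany("INSERT INTO TEST (ID) VALUES (?)", [("1",), ("2",)])


def get_data(id):
    # sqlite3 uses ? placeholders; MySQLdb would use %s in the same position.
    cursor.execute("SELECT * FROM TEST WHERE ID = ?", (id,))
    return cursor.fetchall()


print(get_data("2"))
# The payload from the question is compared as a plain string and matches nothing.
print(get_data("'; DROP TABLE TEST -- "))
# The table is still there with both rows.
print(cursor.execute("SELECT COUNT(*) FROM TEST").fetchone()[0])
```

The value never becomes part of the SQL text, so the quote and the DROP TABLE in the payload have no special meaning.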

#1