Software Quality and Testing, Week 2 Assignment: wordcount

GitHub: https://github.com/Phi-Li/python3-wc

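PSP 2.1 Table

PSP2.1	PSP stage	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	60	45
· Estimate	· Estimate how long the task will take	60	45
Development	Development	820	865
· Analysis	· Requirements analysis (including learning new technologies)	120	120
· Design Spec	· Produce the design document	30	45
· Design Review	· Design review (review the design document with colleagues)	30	30
· Coding Standard	· Coding standard (set suitable conventions for the current development)	10	10
· Design	· Detailed design	60	60
· Coding	· Coding	420	450
· Code Review	· Code review	30	30
· Test	· Testing (self-testing, fixing code, committing changes)	120	120
Reporting	Reporting	75	60
· Test Report	· Test report	15	30
· Size Measurement	· Measure the amount of work	30	15
· Postmortem & Process Improvement Plan	· Postmortem and process improvement plan	30	15
	Total	955	970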

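Approach

Because development time was tight and there was not enough time to learn how to use the various Java packages, this assignment was done in Python, which I know better.

To count characters, lines, and words, the file is read into a single string, and the counts are derived from that string:

The length of the string is the number of characters. When the text contains only single-byte characters (plain ASCII, as is common on many GNU/Linux systems), one character occupies one byte, so the file size can be used directly instead.
Splitting the string on the newline character gives a list of strings, and the length of that list is the number of lines.
The number of words is obtained the same way, using a regular expression that matches the separators between words.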
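
The difference between characters and bytes mentioned in the first point can be checked directly. The following is only an illustrative sketch: the helper name `char_and_byte_counts` and the choice of UTF-8 are assumptions and do not come from the original program.

```python
def char_and_byte_counts(text, encoding="utf-8"):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


# ASCII-only text: one byte per character, so both counts agree.
print(char_and_byte_counts("hello, wc"))   # (9, 9)

# Non-ASCII text: in UTF-8 the byte count exceeds the character count.
print(char_and_byte_counts("行数"))         # (2, 6)
```

For files that may contain non-ASCII text, the file size therefore gives the byte count rather than the character count.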

Counting the code lines, blank lines, and comment lines of a source file is more involved; see "Code Explanation" for details.

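Program Design and Implementation

The program first parses its arguments to determine which operations to perform. It then opens the specified file to create a file object and passes that object to the corresponding counting function, which stores its results in a dictionary. Finally, the output function formats the requested parts of the results and writes them to a file.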
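
The flow described above can be sketched roughly as follows. This is a simplified illustration, not the program's actual code: the function names `count_file` and `format_result`, the English output labels, and the use of `io.StringIO` as a stand-in for an opened file are all assumptions.

```python
import io


def count_file(fileobj):
    """Count lines and characters of an already opened text file."""
    text = fileobj.read()
    return {"total_lines": len(text.split("\n")), "total_chars": len(text)}


def format_result(name, counts, print_opt):
    """Build the output text for the parts the user asked for."""
    out = ""
    if print_opt["print_lines"]:
        out += f"{name}, lines: {counts['total_lines']}\n"
    if print_opt["print_chars"]:
        out += f"{name}, chars: {counts['total_chars']}\n"
    return out


# Stand-in for an opened file, so the sketch runs without touching the disk.
fileobj = io.StringIO("a b\nc")
print_opt = {"print_lines": True, "print_chars": False}
print(format_result("demo.txt", count_file(fileobj), print_opt), end="")
# demo.txt, lines: 2
```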

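Code Explanation

This section explains the program's key data structures and the key lines of code that implement its functionality.

The dictionaries that store the counting results: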

counts = {
    'total_lines' : 0,
    'total_words' : 0,
    'total_chars' : 0,
    'total_bytes' : 0
}

ccounts = {
    'slot': 0,
    'blank': 0,
    'comment': 0
}

The options used by the output function, all False initially:

print_opt = {
    'print_lines' : False,
    'print_words' : False,
    'print_chars' : False,
    'print_bytes' : False
}

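Argument parsing is implemented with the getopt module from Python's standard library; see "References" [1]. The code below is adapted from the example given there.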

    try:
        opts, args = getopt.gnu_getopt(argv[1:], "clLmwao:e:", longopts)
    except getopt.GetoptError as err:
        print(err)
        usage(dep.EXIT_FAILURE)

    print(opts)
    
    for opt, arg in opts:
        if opt in ("-c", "--bytes"):
            print_opt['print_bytes'] = True
        elif opt in ("-m", "--chars"):
            print_opt['print_chars'] = True
        elif opt in ("-l", "--lines"):
            print_opt['print_lines'] = True
        elif opt in ("-w", "--words"):
            print_opt['print_words'] = True
        elif opt in ("-L", "--max-line-length"):
            print_opt['print_linelength'] = True

...
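
As a self-contained illustration of the same parsing approach, here is a minimal sketch. The `LONGOPTS` list and the `parse_args` helper are hypothetical (the original `longopts` value is not shown), and only four of the options are handled.

```python
import getopt

# Hypothetical long option names; the original longopts list is not shown.
LONGOPTS = ["bytes", "chars", "lines", "words"]


def parse_args(argv):
    """Return (print options, remaining file arguments) for an argv-style list."""
    print_opt = {"print_lines": False, "print_words": False,
                 "print_chars": False, "print_bytes": False}
    opts, args = getopt.gnu_getopt(argv[1:], "clmw", LONGOPTS)
    flags = {"-c": "print_bytes", "--bytes": "print_bytes",
             "-m": "print_chars", "--chars": "print_chars",
             "-l": "print_lines", "--lines": "print_lines",
             "-w": "print_words", "--words": "print_words"}
    for opt, _ in opts:
        print_opt[flags[opt]] = True
    return print_opt, args


# gnu_getopt normally allows options to appear after the file name as well.
print(parse_args(["wc.py", "char.txt", "-l", "--words"]))
```

A table from option string to dictionary key keeps the dispatch in one place, at the cost of being less explicit than the if/elif chain used in the original code.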

As described in "Approach", these are the key lines of wc() that count characters, lines, and words:

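from re import split  # needed for the bare split() call below

counts['total_bytes'] = len(string)              # length of the string read from the file
counts['total_lines'] = len(string.split('\n'))  # number of newline-separated pieces

# Words are separated by runs of whitespace and/or commas.
token_list = split(r"[\s,]+", string)
counts['total_words'] = len(token_list)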
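
For experimenting with these counting rules outside the program, the lines above can be wrapped in a small function. This is a sketch under assumptions: the name `count_text` is made up, and unlike the excerpt above it drops the empty strings that `re.split()` returns when the text begins or ends with a separator (for example a trailing newline).

```python
import re


def count_text(text):
    """Count characters, lines, and words of a string."""
    # Drop empty pieces, which re.split() produces when the text starts or
    # ends with a separator.
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    return {
        "total_chars": len(text),
        "total_lines": len(text.split("\n")),
        "total_words": len(tokens),
    }


sample = "File write,name = new File(outputPath);\n    out.close();"
result = count_text(sample)
print(result["total_lines"])   # 2
print(result["total_words"])   # 7
```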

If the user specifies a stoplist, its contents are first split on whitespace into a list:

...
        elif opt == "-e":
            # Read the stop list and split it on whitespace.
            with open(arg) as stoplist:
                stop_token_list = stoplist.read().split()

wc() receives this list as a parameter and calls filter() to remove every token that appears in it; filter() takes a lambda expression as its predicate:

    if li:
        token_list = list(filter(lambda t: t not in li, token_list))
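
The effect of the stop list can be tried in isolation with a sketch like the one below. The function name `count_words` is hypothetical, and converting the stop list to a set is a choice made here for faster membership tests; the original code filters against the list directly.

```python
import re


def count_words(text, stop_words=()):
    """Count words in text, ignoring any token found in stop_words."""
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    stop = set(stop_words)          # set membership tests are O(1) on average
    return len([t for t in tokens if t not in stop])


source = "if(out == 3){\n    file = 100;\n}"
stop_words = "out = file else void".split()

print(count_words(source))              # 7
print(count_words(source, stop_words))  # 5
```

Only whole tokens are compared: `if(out` and `==` are kept because they are not equal to the stop words `out` and `=`.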

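Counting code lines, blank lines, and comment lines in a source file requires parsing each line to decide which of the three categories it belongs to.
The list returned by readlines() has no entry for an empty last line, so when the file ends with a newline the line count is the list length plus 1: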
```
total = len(lines) + 1 if lines and lines[-1].endswith('\n') else len(lines)
```
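Leading and trailing spaces and tabs are stripped from each line; if only the newline character remains, the line is blank, and if the line begins with '#', it is a comment: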
```
    for line in lines:
        line = line.strip(' \t')  # drop leading and trailing spaces and tabs
        if line == '\n':
            ccounts['blank'] += 1
            continue
        elif line.startswith('#'):
            ccounts['comment'] += 1
```
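
The excerpt above stops at the comment branch. A complete version of the same idea might look like the following sketch; the function name `classify_lines`, the `'code'` key, and the final `else` branch are assumptions, and `strip()` is used without arguments so that a whitespace-only last line without a newline also counts as blank.

```python
def classify_lines(lines):
    """Classify lines as code, blank, or comment (lines starting with '#')."""
    result = {"code": 0, "blank": 0, "comment": 0}
    for line in lines:
        stripped = line.strip()          # removes all surrounding whitespace
        if not stripped:
            result["blank"] += 1
        elif stripped.startswith("#"):
            result["comment"] += 1
        else:
            result["code"] += 1
    return result


sample = "# setup\nx = 1\n\n    # indented comment\nprint(x)\n"
print(classify_lines(sample.splitlines(keepends=True)))
# {'code': 2, 'blank': 1, 'comment': 2}
```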

The output function reads the output options from print_opt, builds the string to be output, and writes it to the file:

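    # Output labels: 行数 = lines, 单词数 = words, 字符数 = characters,
    # 代码行/空行/注释行 = code lines/blank lines/comment lines
    if print_opt['print_lines']: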
        s = s + file + ", " + "行数: " + str(counts['total_lines']) + "\n"
    if print_opt['print_words']:
        s = s + file + ", " + "单词数: " + str(counts['total_words']) + "\n"
    if print_opt['print_chars']:
        s = s + file + ", " + "字符数: " + str(counts['total_chars']) + "\n"
    if cc:
        s = s + file + ", " + "代码行/空行/注释行: " + str(ccounts['slot']) + "/" + str(ccounts['blank']) + "/" + str(ccounts['comment']) + "\n"
...
    output.write(s)

Test Design

The following are the test cases given in the assignment; more test cases can be found in the README of the GitHub repository.

Input

python3 wc.py -c char.txt

or

wc.exe -c char.txt

char.txt

!@#$%^&*()_+{}|"?><[]\;',./~`a

Output (字符数 means "character count")

char.txt, 字符数: 30

Input

python3 wc.py -c charwithspace.txt

or

wc.exe -c charwithspace.txt

charwithspace.txt

!@#$%^ &*()_+	{}|"?><[]\;',.
/~`a

Output

charwithspace.txt, 字符数: 33

Input

python3 wc.py -c wordtest.java

or

wc.exe -c wordtest.java

wordtest.java

File write,name = new File(outputPath);
            writename.createNe|wFile();
            Buffe,redWriter out = new BufferedWrit,er(new FileWriter(writename));
            out.close();

Output (单词数 means "word count", 行数 means "line count")

wordtest.java, 单词数: 16
wordtest.java, 行数: 4

Input

python3 wc.py -w stoptest.java -e stoplist.txt

or

wc.exe -w stoptest.java -e stoplist.txt

stoptest.java:

public static void test(){
   if(out == 3){
            file = 100;
            System.out.println(file);
   }
   else if(out == 1){
            file = files[0];
   }
}

stoplist.txt:

out = file else void

Output

stoptest.java, 单词数: 15
