Need help understanding a C function

Here's the deal: I'm trying, as a learning experience, to convert a C program to C++. This program takes a text file and applies modifications to it according to user-inputted rules. Specifically, it applies sound changes to a set of words, using rules formatted like "s1/s2/env". s1 represents the characters to be changed, s2 represents what to change them into, and env is the context in which the change should be applied.

I'm sorry that I don't describe this in more depth, but the question would be too long, and the author's site already explains it.

The function I'm having trouble with is TryRule. I understand that it's supposed to see if a given rule applies to a given string, but I'm having trouble understanding exactly how it does it. The poor explanation of the parameters confuses me: for example, I don't understand why the strings "s1" and "s2" have to be passed back, or what "i" represents.

This is the code:

/*
**  TryRule
**
**  See if a rule s1->s2/env applies at position i in the given word.
**
**  If it does, we pass back the index where s1 was found in the
**  word, as well as s1 and s2, and return TRUE.
**
**  Otherwise, we return FALSE, and pass garbage in the output variables.
*/
int TryRule( char *word, int i, char *Rule, int *n, char **s1, char **s2, char *varRep )
    {
        int j, m, cont = 0;
        int catLoc;
        char *env;
        int  optional = FALSE;
        *varRep = '\0';

        if (!Divide( Rule, s1, s2, &env ) || !strchr( env, '_' ))
            return(FALSE);

        for (j = 0, cont = TRUE; cont && j < strlen(env); j++)
        {
            switch( env[j] )
            {
                case '(':
                    optional = TRUE;
                    break;

                case ')':
                    optional = FALSE;
                    break;

                case '#':
                    cont = j ? (i == strlen(word)) : (i == 0); 
                    break;

                case '_':
                    cont = !strncmp( &word[i], *s1, strlen(*s1) );
                    if (cont)
                    {
                        *n = i;
                        i += strlen(*s1);
                    }
                    else
                    {
                        cont = TryCat( *s1, &word[i], &m, &catLoc );
                        if (cont && m)
                        {
                            int c;
                            *n = i;
                            i += m;

                            for (c = 0; c < nCat; c++)
                                if ((*s2)[0] == Cat[c][0] && catLoc < strlen(Cat[c]))
                                    *varRep = Cat[c][catLoc];
                        }
                        else if (cont)
                            cont = FALSE;
                    }
                    break;

                default:
                    cont = TryCat( &env[j], &word[i], &m, &catLoc );
                    if (cont && !m)
                    {
                        /* no category applied */
                        cont = i < strlen(word) && word[i] == env[j];
                        m = 1;
                    }
                    if (cont)
                        i += m;
                    if (!cont && optional)
                        cont = TRUE;
            }
        }
        if (cont && printRules)
            printf( "   %s->%s /%s applies to %s at %i\n", 
            *s1, *s2, env, word, *n );

    return(cont);
}

3 Solutions

#1

This code is... tough to read. I looked at the original file, and it could really use some better variable names. I especially love this part from one of the function comments:

/*
** (Stuff I removed)
**
** Warning: For now, we don't have a way to handle digraphs. 
**
** We also return TRUE if (<- It really just stops here!)
*/

I can see the challenge. I agree with wmeyer about the variables. I think I understand what's going on, so I'm going to attempt to translate the function into pseudocode.

word: The string we are looking at
i: The index in the string where we start trying to match the rule's environment
Rule: The text of the rule (e.g. "v/b/_")
n: Returns the index in the string where the match for the _ (that is, s1) was found, I think
s1: Returns the first part of the rule, decoded out of Rule
s2: Returns the second part of the rule, decoded out of Rule
varRep: Returns the replacement character taken from s2's category, if s1 matched as a category, I think
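
On why s1 and s2 are "char **": TryRule hands the rule to Divide, which is not shown in the question, so what follows is only a guess at the idea. The sketch uses a hypothetical split_rule (my name, not the original's) that cuts "s1/s2/env" apart in place and points the caller's variables at the pieces. Whether the real Divide modifies the rule string or copies it is an assumption here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Divide(): splits "s1/s2/env" in place.
   Returns 1 on success, 0 if the rule does not contain two slashes.
   Assumption: the real Divide() may work differently. */
int split_rule(char *rule, char **s1, char **s2, char **env)
{
    char *first, *second;

    assert(rule && s1 && s2 && env);

    first = strchr(rule, '/');
    if (first == NULL)
        return 0;
    second = strchr(first + 1, '/');
    if (second == NULL)
        return 0;

    *first = '\0';      /* terminate s1 */
    *second = '\0';     /* terminate s2 */

    *s1  = rule;        /* the caller's pointers now point into rule */
    *s2  = first + 1;
    *env = second + 1;
    return 1;
}
```

Because the function has to change where the caller's pointers point, it needs the address of each pointer. That would explain the double indirection in TryRule's signature.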

int TryRule( char *word, int i, char *Rule,
                int *n, char **s1, char **s2, char *varRep ) {
        Prepare a bunch of variables we'll use later
        Mark that we're not working on an optional term
        Set varRep's first char to null, so it's an empty string

        if (We can't parse the rule into its parts
              OR there is no _ in the environment (which is required))
            return FALSE // Error, we can't run, the rule is screwy

        for (each character, j, in env (the third part of the rule)) {
            if (cont is TRUE) {
                switch (the character we're looking at, env[j]) {
                    if the character is opening paren:
                        set optional to TRUE, marking it's an optional character
                    if the character is closing paren:
                        set optional to FALSE, since we're done with optional stuff
                    if the character is a hash mark (#):
                        // This is rather complicated looking, but it's not bad
                        // This uses a ? b : c, which means IF a THEN b ELSE c
                        // Remember i is the position in the word we are looking at
                        // Hash marks match the start or end of a word
                        // j is the index of the current character in env

                        if (j > 0) {
                            // We're not working on the first character of env
                            // so the # mark we found is to find the end of a word

                            if (i == the length of the word we're looking at) {
                                // We've found the end of the word, so the rule matches

                                continue = true;   // Keep going
                            } else {
                                // We're not at the end of a word, but we found a hash
                                // Rule doesn't match, so break out of the main loop by setting
                                //     continue to false

                                continue = false;
                            }
                        } else {
                            // OK, the hash mark is the first part of env,
                            // so it signifies the start of a word

                            continue = (i == 0);   // Continue holds if we
                                                   // are matching the first
                                                   // character in *word or not
                        }
                    if the character is an _ (the match character):
                        // This gets complicated

                        continue = TRUE if the word starting at character i IS s1, the search string;

                        if (continue == TRUE) {
                            // s1 matched literally at position i
                            n = the index in the word where s1 was found (this is what gets passed back)
                            i = i (start of the match) + length of s1 (the text we just matched)
                            // This means i now holds the index just past the matched text
                        } else {
                            // TryCat sees if the character we're trying to match is a category

                            continue = s1 is a category in the program
                                          && the category contains the character at word[i]

                            // If continue holds false, s1 was a category and we found no match
                            // If continue holds true, s1 either wasn't a category (so m = 0)
                            //     or s1 WAS a category, m contains 1, and catLoc holds which
                            //     character in the category definition was matched

                            if (we found a match of some sort
                                   && s1 was a category (indicated by m == 1)) {
                                n = index of the character in the word we found a match
                                i = the index of the next character (m is always 1, so this is ugly)

                                for (each category defined) {
                                    if (first character of s2
                                           == the category's name
                                        && where in the category definition we matched
                                              is less than the length of the category we're on) {
                                           varRep = the character at that same position in s2's category
                                        }
                                }

                                // The above looks a lot like the TryCat function, but it indexes
                                // the category named by s2, not s1. So varRep seems to be the
                                // replacement character used when s2 is itself a category.
                            } else {
                                continue = FALSE; // Because we didn't match a letter or category
                            }
                        }
                    Any other character:
                        continue = the character we're looking at is a category in the program
                                      && the category contains the character at word[i]

                        if (there was a match AND it wasn't a category (m == 0, just a letter)) {
                            m = 1;
                            continue if and only if there are characters left in the word
                                 (i < strlen()) && the current character is at word[i]
                                 (we matched a literal character, instead of a category)
                        }

                        if (continue)
                            i = i + m // Remember, M is always 1 or 0
                                      // So this is basically IF continue THEN i++ END IF
                        if ((continue == FALSE) && (optional == TRUE))
                            // We didn't find a match, but we're working on an optional part
                            // So continue anyway
                            continue = TRUE;
                } // end switch
            } // end if cont is TRUE
        } // end for

    if (continue && printRules)
        print out a little debug statement showing what we matched

    return continue;   // At this point, if continue is false we can't keep matching
}
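
To make the roles of i and n more concrete, here is a stripped-down sketch of the same loop with friendlier names. It is not the original function: match_env, pos and match_index are names I made up, and it only handles '#', '_' and literal characters, with categories and optional "( )" parts left out on purpose. It also assumes env contains a '_', as the original requires.

```c
#include <assert.h>
#include <string.h>

/* Simplified sketch of TryRule's loop (hypothetical names).
   Handles only '#', '_' and literal characters in env.
   Returns 1 if env matches word starting at pos, else 0.
   On success *match_index (the original's *n) is where target was found. */
int match_env(const char *word, size_t pos, const char *target,
              const char *env, size_t *match_index)
{
    size_t word_len = strlen(word);
    size_t target_len = strlen(target);
    size_t e;

    assert(pos <= word_len);

    for (e = 0; env[e] != '\0'; e++) {
        switch (env[e]) {
        case '#':
            /* leading '#': start of word; any later '#': end of word */
            if (e == 0 ? pos != 0 : pos != word_len)
                return 0;
            break;
        case '_':
            if (strncmp(word + pos, target, target_len) != 0)
                return 0;
            *match_index = pos;   /* plays the role of *n */
            pos += target_len;    /* skip past the matched text */
            break;
        default:
            /* literal context character must match the word here */
            if (pos >= word_len || word[pos] != env[e])
                return 0;
            pos++;
            break;
        }
    }
    return 1;
}
```

With the word "lava", target "v", env "a_" and pos 1, the 'a' in env consumes word[1], so the match is reported at index 2 rather than 1. That seems to be why n is passed back separately from i: i is where the environment starts, n is where s1 sits inside it.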

I hope this helps. You may need to read it a few times. It took me over 45 minutes to write this, almost entirely because of trying to decipher exactly what's going on in some of the cases around TryCat. Add in about 5 minutes for constantly trying to hit the Tab key and getting my cursor sent to the next field (stupid HTML text box).

Sorry this is so big, you'll probably have to do a bunch of horizontal scrolling.

#2

Given that you are converting from C to C++ you should be refactoring the code to become more readable as well.

One major problem with this code is that the variables have terrible names and I'd wager even the original writer of the routine would need to spend some time analysing it.

Just renaming the variables to be more precise would go a long way toward helping you understand what the code does.

Take a look at some questions tagged under refactoring for some help. There is also the book Refactoring by Martin Fowler.

#3

I think you need the whole code to understand this fragment.

It looks like "word", "i" and "Rule" are input variables; the rest are pure output variables.

"i" is the current index within "word", i.e. TryRule only looks at "word" starting at "word[i]".

In "s1" the function returns the left side of the rule that was applied, and in "s2" the right side of that rule.

In "n" the function returns the position within "word" where the rule applies.
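
As a rough illustration of why a caller would want "n", "s1" and "s2" back, here is a minimal sketch of what the calling code might do with them. The calling code is not shown in the question, so replace_at and its parameters are hypothetical; the real program may well build the new word differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical caller-side helper (not from the original program):
   writes word with s1 at index n replaced by s2 into out.
   Returns 1 on success, 0 if out is too small. */
int replace_at(const char *word, size_t n, const char *s1,
               const char *s2, char *out, size_t out_size)
{
    size_t s1_len = strlen(s1);
    int written;

    assert(n + s1_len <= strlen(word));

    /* text before the match + replacement + text after the match */
    written = snprintf(out, out_size, "%.*s%s%s",
                       (int)n, word, s2, word + n + s1_len);
    return written >= 0 && (size_t)written < out_size;
}
```

The caller needs all three outputs for this: "n" says where to cut, the length of "s1" says how much to drop, and "s2" is what to put in its place.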

No idea what "varRep" is.
