Lexer produces correct output but the pointer "filename" contains the wrong name

Time: 2021-09-01 09:42:39

The purpose of this code is to read the following text files (d.txt, e.txt, f.txt) and do whatever is required to put the alphabet, in the correct order, into output.txt. The code seems to work, since I get the correct results in output.txt, but there is a problem with a test I did using printf (it's at the end of the newfile function). To run it, I give d.txt and output.txt as input. It should print

top->prev points to file :d
top->prev points to file :e

but instead it prints the following, and I can't find the reason:

top->prev points to file :d
top->prev points to file :f

d.txt:

abc
#include e.txt
mno

e.txt:

def
#include f.txt
jkl

f.txt:

ghi

code:

%{
#include <stdio.h>
#include <stdlib.h>

struct yyfilebuffer{
    YY_BUFFER_STATE bs;
    struct yyfilebuffer *prev;
    FILE *f;
    char *filename;
}*top;

int i;
char temporal[7];
void newfile(char *filename);
void popfile();
void create();
%}

%s INC
%option noyywrap
%%
"#include " {BEGIN INC;}
<INC>.*$ {for(i=1;i<strlen(yytext)-2;i++)
          {
            temporal[i-1]=yytext[i];
          }
          newfile(temporal);
          BEGIN INITIAL;
         }

<<EOF>> {popfile();
        BEGIN INITIAL;
        }
%%

void main(int argc,int **argv)
{
    if ( argc < 3 )
    {
        printf("\nUsage yybuferstate <filenamein> <filenameout>");
        exit(1);
    }
    else
    {
        create();
        newfile(argv[1]);
        yyout = fopen(argv[2], "w");
        yylex();
    }
    system("pause");
}

void create()
{
    top = NULL;
}

void newfile(char *filename)
{
    struct yyfilebuffer *newptr;
    if(top == NULL)
    {
        newptr = malloc(1*sizeof(struct yyfilebuffer));
        newptr->prev = NULL;
        newptr->filename = filename;
        newptr->f = fopen(filename,"r");
        newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
        top = newptr;
        yy_switch_to_buffer(top->bs);
    }
    else
    {
        newptr = malloc(1*sizeof(struct yyfilebuffer));
        newptr->prev = top;
        newptr->filename = filename;
        newptr->f = fopen(filename,"r");
        newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
        top = newptr;
        yy_switch_to_buffer(top->bs);   //edw
    }
    if(top->prev != NULL)
    {
        printf("top->prev points to file : %s\n",top->prev->filename);
    }
}

void popfile()
{
    struct yyfilebuffer *temp;      
    temp = NULL;
    if(top->prev == NULL)
    {
        printf("\n Error : Trying to pop from empty stack");
        exit(1);
    }
    else
    {
        temp = top;
        top = temp->prev;
        yy_switch_to_buffer(top->bs);
        system("pause");
    }
}

1 Solution

#1


You need to think about how you manage memory, remembering that C does not really have a string type in the way you might be used to from other languages.

You define a global variable:

char temporal[7];

(which has an odd name, since globals are anything but temporary), and then fill in its value in your lexer:

for(i=1;i<strlen(yytext)-2;i++) {
        temporal[i-1]=yytext[i];
}

There are at least three problems with the above code:

  1. temporal only has room for a six-character filename, but nowhere do you check to make sure that yyleng is not greater than 6. If it is, you will overwrite random memory. (The flex-generated scanner sets yyleng to the length of the token whose starting address is yytext. So you might as well use that value instead of computing strlen(yytext), which involves a scan over the text.)

  2. You never null-terminate temporal. That's OK the first time, because it has static lifetime and will therefore be filled with zeros at program initialization. But the second and subsequent times you are counting on the new filename to not be shorter than the previous one; otherwise, you'll end up with part of the previous name at the end of the new name.

  3. You could have made much better use of the standard C library. Although this does not solve the problem you observe (for reasons I will note below), it would have been better to use the following instead of the loop, after checking that yyleng is not too big:

    memcpy(temporal, yytext + 1, yyleng - 2); /* Copy the filename */
    temporal[yyleng - 2] = '\0';              /* NUL-terminate the copy */
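
The three points above can be combined into one small helper. This is only a sketch: copy_delimited is a hypothetical name that does not appear in the original code, and it assumes the token consists of the filename surrounded by exactly one delimiter character on each side. It refuses to copy when the destination is too small, and it always NUL-terminates the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the original scanner): copies the text
 * between the first and last character of a token into dst.
 * Returns 0 on success, -1 if the token is too short or dst is too small. */
int copy_delimited(char *dst, size_t dst_size, const char *token, size_t token_len)
{
    assert(dst != NULL && token != NULL);
    if (token_len < 2)
        return -1;                       /* no room for two delimiters */
    size_t name_len = token_len - 2;     /* drop opening and closing delimiter */
    if (name_len + 1 > dst_size)
        return -1;                       /* copy plus NUL would overflow dst */
    memcpy(dst, token + 1, name_len);    /* copy the filename */
    dst[name_len] = '\0';                /* NUL-terminate the copy */
    return 0;
}
```

In a scanner action this would presumably be called as copy_delimited(temporal, sizeof temporal, yytext, yyleng), with the include rejected when the call returns -1.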
    

Once you make the copy in temporal, you give that to newfile:

newfile(temporal);

And in newfile, what we see is:

newptr->filename = filename;

That does not copy filename. The call to newfile passed the address of temporal as an argument, so within newfile, the value of the parameter filename is the address of temporal. You then store that address in newptr->filename, so newptr->filename is also the address of temporal.

But, as noted above, temporal is not temporary. It is a global variable whose lifetime is the entire lifetime of the program. So the next time your lexical scanner encounters an include directive, it will put it into temporal, overwriting the previous contents. So what then happens to the filename member in the yyfilebuffer structure? Answer: nothing. It still points to the same place, temporal, but the contents of that place have changed. So when you later print out the contents of the string pointed to by that filename field, you'll get a different string from the one which happened to be in temporal when you first created that yyfilebuffer structure.
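
The effect can be reproduced without flex. In the sketch below, struct entry, shared, and set_by_pointer are hypothetical stand-ins for yyfilebuffer, temporal, and the assignment in newfile. Two entries are filled in one after the other, and both end up showing the second name because they hold the same address. (In the original program the first name, d.txt, comes from argv[1] rather than from temporal, which is presumably why only the e.txt entry appears to change.)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct yyfilebuffer. */
struct entry {
    const char *name;
};

/* Plays the role of the global buffer temporal. */
static char shared[16];

/* Stores the pointer, not the characters: every entry filled in this way
 * refers to the same storage, so a later call changes what earlier
 * entries appear to contain. */
void set_by_pointer(struct entry *e, const char *text)
{
    assert(strlen(text) < sizeof shared);
    strcpy(shared, text);
    e->name = shared;
}
```

After set_by_pointer(&a, "e.txt") followed by set_by_pointer(&b, "f.txt"), a.name and b.name compare equal as pointers, and both read back as f.txt.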

On the whole, you'll find it easier to manage memory if newfile and popfile "own" the memory in the filebuffer stack. That means that newfile should make a copy of its argument into freshly-allocated storage, and popfile should free that storage, since it is no longer needed. If newfile makes a copy, then it is not necessary for the lexical-scanner action which calls newfile to make a copy; it is only necessary for it to make sure that the string is correctly NUL-terminated when it calls newfile.

In short, the code might look like this:

/* Changed parameter to const, since we are not modifying its contents */
void newfile(const char *filename) { 
    /* Eliminated this check as obviously unnecessary: if(top == NULL) */
    struct yyfilebuffer *newptr = malloc(sizeof(struct yyfilebuffer));
    newptr->prev = top;
    // Here we copy filename (strlen and strcpy need #include <string.h>).
    // Since I suspect that you are on Windows, I'll write it out in full.
    // Normally, I'd use strdup.
    newptr->filename = malloc(strlen(filename) + 1);
    strcpy(newptr->filename, filename);
    newptr->f = fopen(filename,"r");
    newptr->bs = yy_create_buffer(newptr->f, YY_BUF_SIZE);
    top = newptr;
    yy_switch_to_buffer(top->bs);   //edw

    if(top->prev != NULL) {
        printf("top->prev points to file : %s\n",top->prev->filename);
    }
}

void popfile() {
    if(top->prev == NULL) {
        fprintf(stderr, "Error : Trying to pop from empty stack\n");
        exit(1);
    }
    struct yyfilebuffer *temp = top;
    top = temp->prev;
    /* Reclaim memory */
    free(temp->filename);
    free(temp);

    yy_switch_to_buffer(top->bs);
    system("pause");
}
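
To check the ownership logic in isolation, the same push/pop pattern can be written without any flex calls. This is a minimal sketch, not the code above: struct node, stack_top, push_name, and pop_name are hypothetical names, and the file handle and buffer state are left out. Unlike the version above, it also checks the result of malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    struct node *prev;
    char *name;          /* owned by the node: allocated in push, freed in pop */
};

static struct node *stack_top = NULL;

/* Push a private copy of name. Returns 0 on success, -1 if allocation fails. */
int push_name(const char *name)
{
    assert(name != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->name, name);   /* the caller's buffer may now be reused freely */
    n->prev = stack_top;
    stack_top = n;
    return 0;
}

/* Pop the top node and release everything it owns. */
void pop_name(void)
{
    assert(stack_top != NULL);
    struct node *old = stack_top;
    stack_top = old->prev;
    free(old->name);
    free(old);
}
```

Because each node holds its own copy, reusing the caller's buffer between two pushes no longer changes the name stored in the earlier node.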

Now that newfile makes its own copy of the string passed to it, the scanner action no longer needs to make one. Since the action clearly indicates that you expect the argument to the #include to be something like a C #include directive (surrounded either by "..." or <...>), it is better to make that explicit:

<INC>\".+\"$|"<".+">"$ {
           /* NUL-terminate the filename by overwriting the closing " or > */
           yytext[yyleng - 1] = '\0';
           newfile(yytext + 1);
           BEGIN INITIAL;
         }
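
The in-place trick used in that action can be tried outside the scanner. The following is a sketch under the assumption that the token is a writable buffer holding the filename between one opening and one closing delimiter. strip_delimiters is a hypothetical name, and it calls strlen only because yyleng is not available outside a flex action.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrites the closing delimiter with NUL and returns
 * a pointer just past the opening delimiter. Nothing is copied, so the result
 * is only valid while the token buffer itself is unchanged.
 * Returns NULL if the token is too short to hold two delimiters. */
char *strip_delimiters(char *token)
{
    assert(token != NULL);
    size_t len = strlen(token);
    if (len < 2)
        return NULL;
    token[len - 1] = '\0';   /* drop the closing " or > */
    return token + 1;        /* skip the opening " or < */
}
```

The returned pointer refers into the token buffer, so this is only safe when the callee copies the string before the buffer is reused, as the revised newfile does.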
