How do I delete all /* */ comments from a C source file?

Date: 2022-11-24 23:10:06

I have a C file which I copied from somewhere else, but it has a lot of comments like below:

int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)

How can I delete all the comments enclosed by /* and */? Sometimes a comment spans 4-5 lines, and I need to delete all of those lines.

Basically, I need to delete all text between /* and */, even when that text contains newlines (\n). Please help me do this using sed, awk, or perl.

10 solutions

#1


31  

Why not just use the C preprocessor to do this? Why are you confining yourself to a home-grown regex?

[Edit] This approach also handles Bart's printf(".../*...") scenario cleanly.

Example:

[File: t.c]
/* This is a comment */
int main () {
    /* 
     * This
     * is 
     * a
     * multiline
     * comment
     */
    int f = 42;
    /*
     * More comments
     */
    return 0;
}

$ cpp -P t.c
int main () {







    int f = 42;



    return 0;
}

Or you can remove the blank lines and condense everything:

$ cpp -P t.c | grep -v '^[[:space:]]*$'
int main () {
    int f = 42;
    return 0;
}

No use re-inventing the wheel, is there?

[Edit] If you don't want this approach to expand included files and macros, cpp provides flags for that. Consider:

[File: t.c]

#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}

$ cpp -P -fpreprocessed t.c | grep -v '^[[:space:]]*$'
#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}

There is a slight caveat: macro expansion is avoided, but the original macro definitions are stripped from the source.

#2


12  

See perlfaq6. It's quite a complex scenario.

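$/ = undef;   # slurp mode: read the whole input as one string
$_ = <>;
# Match either a comment (replaced by nothing) or a string literal, character
# literal, or run of ordinary code (captured in $2 and written back unchanged).
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;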

A word of warning: once you've done this, do you have a test scenario to prove to yourself that you removed only the comments and nothing of value? If you're running such a powerful regexp, I'd make sure there is some sort of test (even if you simply record the behaviour before and afterwards).
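
The comment half of the perlfaq6 pattern is /\*[^*]*\*+([^/*][^*]*\*+)*/. It is an "unrolled loop" that can never run past the first */. A hand-written C matcher that follows the pattern piece by piece makes this easier to see. This is only an illustrative sketch, and the name match_c_comment is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

// Length of the C comment that starts at p, or 0 if p does not start a
// complete comment. Each step mirrors one piece of the perlfaq6 pattern.
size_t match_c_comment(const char *p)
{
    const char *q = p;

    assert(p != NULL);
    if (q[0] != '/' || q[1] != '*')
        return 0;                         // pattern starts with /\*
    q += 2;
    for (;;) {
        while (*q != '\0' && *q != '*')   // [^*]*   body text without stars
            q++;
        if (*q == '\0')
            return 0;                     // no closing */ so not a comment
        while (*q == '*')                 // \*+     one or more stars
            q++;
        if (*q == '/')                    // /       the closing slash
            return (size_t)(q - p) + 1;
        if (*q == '\0')
            return 0;
        q++;                              // [^/*]   then repeat the group
    }
}
```

For example, match_c_comment("/* a * b **/ y") returns 12, the length of the comment. Each run of stars is consumed greedily, and the next character must then be something other than / or *. That makes it impossible to step over a closing */, which is why the regex is safe without a non-greedy quantifier.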

#3


6  

Take a look at the strip_comments routine in Inline::Filters:

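sub strip_comments {
    my ($txt, $opn, $cls, @quotes) = @_;
    my $i = -1;
    while (++$i < length $txt) {
        my $closer;
        # If a quote string starts here, jump past the quoted text.
        # (skip_quoted() is not shown in this excerpt.)
        if (grep { my $r = substr($txt, $i, length($_)) eq $_; $closer = $_ if $r; $r }
            @quotes) {
            $i = skip_quoted($txt, $i, $closer);
            next;
        }
        # If a comment opener starts here, overwrite the comment with spaces,
        # keeping newlines so that line numbers are preserved.
        if (substr($txt, $i, length($opn)) eq $opn) {
            my $e = index($txt, $cls, $i) + length($cls);
            substr($txt, $i, $e - $i) =~ s/[^\n]/ /g;
            $i--;
            next;
        }
    }
    return $txt;
}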
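
This routine blanks comments out rather than deleting them: every character except newline becomes a space, so the line and column of the remaining code stay the same. Below is a rough C sketch of the same idea. skip_quoted() is not part of the excerpt, so the helper skip_literal() is an assumption: it treats ' and " as quote characters and honours backslash escapes. Both C function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

// Hypothetical stand-in for skip_quoted(): txt[i] is a ' or " character.
// Returns the index just past the matching closing quote (or the end of the
// text if the literal is unterminated), honouring backslash escapes.
static size_t skip_literal(const char *txt, size_t i)
{
    char quote = txt[i++];

    while (txt[i] != '\0' && txt[i] != quote) {
        if (txt[i] == '\\' && txt[i + 1] != '\0')
            i++;                                   // skip the escaped character
        i++;
    }
    return txt[i] == quote ? i + 1 : i;
}

// Overwrite every /* ... */ comment in txt with spaces, keeping newlines so
// that the line and column of all remaining code stay the same.
void blank_c_comments(char *txt)
{
    size_t i = 0;

    assert(txt != NULL);
    while (txt[i] != '\0') {
        if (txt[i] == '"' || txt[i] == '\'') {
            i = skip_literal(txt, i);
        } else if (txt[i] == '/' && txt[i + 1] == '*') {
            // Look for the closer only after the opener, so "/*/" is not
            // taken as a complete comment; unterminated comments run to the end.
            const char *end = strstr(txt + i + 2, "*/");
            size_t stop = end != NULL ? (size_t)(end - txt) + 2 : strlen(txt);

            for (; i < stop; i++) {
                if (txt[i] != '\n')
                    txt[i] = ' ';
            }
        } else {
            i++;
        }
    }
}
```

There is one deliberate difference from the Perl excerpt. The Perl code calls index() starting at the opener itself, so it would accept /*/ as a complete comment. The C version searches for */ only after the opening /*, and it blanks an unterminated comment to the end of the text.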

#4


5  

Please do not use cpp for this unless you understand the ramifications:

$ cat t.c
#include <stdio.h>

#define MSG "Hello World"

int main(void) {
    /* ANNOY: print MSG using the puts function */
    puts(MSG);
    return 0;
}

Now, let's run it through cpp:

$ cpp -P t.c -fpreprocessed


#include <stdio.h>



int main(void) {


    puts(MSG);
    return 0;
}

Clearly, this file is no longer going to compile.

#5


4  

Consider:

printf("... /* ...");
int matrix[20];
printf("... */ ...");

In other words: I wouldn't use regex for this task, unless you're doing a replace-once and are positive that the above does not occur.
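
A character-level scanner handles this example where a regex fails, because it treats /* as a comment opener only when it is not inside a string or character literal. Below is a minimal C sketch of such a scanner. It assumes plain C input without // comments, trigraphs, or backslash-newline splicing, and strip_c_comments is a hypothetical name. Like a C compiler, it replaces each comment with a single space so that the tokens on either side don't merge.

```c
#include <assert.h>
#include <string.h>

// States of a minimal scanner for C source text.
enum state { CODE, IN_COMMENT, IN_STRING, IN_CHAR };

// Copy src to dst with every /* ... */ comment replaced by one space, as a C
// compiler does. Comment markers inside "..." or '...' literals are copied
// untouched. dst needs room for strlen(src) + 1 bytes.
void strip_c_comments(const char *src, char *dst)
{
    enum state st = CODE;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        switch (st) {
        case CODE:
            if (src[0] == '/' && src[1] == '*') {
                st = IN_COMMENT;            // drop the opener
                src += 2;
                continue;
            }
            if (*src == '"')
                st = IN_STRING;
            else if (*src == '\'')
                st = IN_CHAR;
            *dst++ = *src++;
            break;
        case IN_COMMENT:
            if (src[0] == '*' && src[1] == '/') {
                *dst++ = ' ';               // the whole comment becomes one space
                st = CODE;
                src += 2;
            } else {
                src++;                      // drop comment text, newlines included
            }
            break;
        case IN_STRING:
        case IN_CHAR:
            if (*src == '\\' && src[1] != '\0')
                *dst++ = *src++;            // copy the backslash of an escape
            else if (*src == (st == IN_STRING ? '"' : '\''))
                st = CODE;                  // this is the closing quote
            *dst++ = *src++;                // copy the current character
            break;
        }
    }
    *dst = '\0';
}
```

Run on the three lines above, it returns them unchanged. Run on "int a; /* two", a newline, and "lines */int b;", it produces "int a;  int b;".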

#6


3  

You MUST use a C preprocessor for this, combined with other tools that temporarily disable specific preprocessor functionality such as expanding #defines or #includes; all other approaches will fail in edge cases. This will work for all cases:

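#!/bin/sh
# Usage: <script> [gcc-flag] file.c
[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"    # the last argument is the file name
# Hide every 'a', '__' and '#' so gcc sees no directives and no __FILE__-style
# macros, let gcc -E strip the comments, then undo the substitutions.
sed 's/a/aA/g;s/__/aB/g;s/#/aC/g' "$file" |
          gcc -P -E $arg - |
          sed 's/aC/#/g;s/aB/__/g;s/aA/a/g'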

Put it in a shell script and call it with the name of the file you want parsed, optionally prefixed by a flag like "-ansi" to specify the C standard to apply.
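
The two sed calls act as a reversible encoding. After the first pass, 'a' serves as an escape character: every 'a' in the text starts a two-character code (aA, aB or aC), and no '#' or '__' is left for gcc to act on. The second pass can therefore undo the encoding unambiguously. The C sketch below reproduces only the two sed substitutions so that the round trip can be checked. It does not model the comment removal gcc performs in between, and the function names are invented for this example.

```c
#include <assert.h>
#include <string.h>

// Same substitutions as: sed 's/a/aA/g;s/__/aB/g;s/#/aC/g'
// Afterwards the text contains no '#' (no directives) and no "__" (no
// __FILE__-style macros). dst needs room for 2 * strlen(src) + 1 bytes.
void encode_for_cpp(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src == 'a') {
            *dst++ = 'a'; *dst++ = 'A'; src++;
        } else if (src[0] == '_' && src[1] == '_') {
            *dst++ = 'a'; *dst++ = 'B'; src += 2;
        } else if (*src == '#') {
            *dst++ = 'a'; *dst++ = 'C'; src++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

// Same substitutions as: sed 's/aC/#/g;s/aB/__/g;s/aA/a/g'
// Every 'a' in encoded text starts a two-character code, so a single
// left-to-right pass decodes it unambiguously.
void decode_after_cpp(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == 'a' && src[1] == 'C') {
            *dst++ = '#';
            src += 2;
        } else if (src[0] == 'a' && src[1] == 'B') {
            *dst++ = '_'; *dst++ = '_';
            src += 2;
        } else if (src[0] == 'a' && src[1] == 'A') {
            *dst++ = 'a';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

For example, "#define __X a_b ___ aA" encodes to "aCdefine aBX aA_b aB_ aAA" and decodes back to the original. Escaping 'a' first is what makes the scheme work. If '#' became "aC" while literal a's were still unescaped, an "aC" already present in the source could not be told apart from an encoded '#'.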

#7


2  

Try this on the command line (replacing 'file-names' with the list of files that need to be processed):

perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' file-names

This program changes the files in-place (overwriting the original file with the corrected output). If you just want the output without changing the original files, omit the '-i' switch.

Explanation:

perl -- call the perl interpreter
-i      edit the files in place ('change-in-place' mode)
-w      print warnings to STDERR (if there are any)
 p      read the files and print $_ for each record; like while(<>){ ...; print $_;}
 e      process the following argument as a program (once for each input record)

BEGIN{undef $/} --- read whole files as single records instead of individual lines.
s!      search and replace ...
  /\*     the starting /* marker
  .*?     followed by any text (non-greedy: stop at the first */)
  \*/     followed by the */ marker
!!      replace with the empty string (i.e. remove the comments)
  s     let . match newline characters too (removes multi-line comments)
   g    repeat as necessary to process all comments.

file-names   list of files to be processed.
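
For comparison with the other answers, here is roughly what that regex does, written as a C sketch (delete_comments_naive is a made-up name). It deletes everything from each /* to the nearest following */. Exactly like the one-liner, it pays no attention to string literals, so the printf example from answer #5 gets mangled.

```c
#include <assert.h>
#include <string.h>

// Rough C equivalent of perl's s!/\*.*?\*/!!sg: delete from each "/*" to
// the nearest following "*/", newlines included. Like the regex, it knows
// nothing about string literals. dst needs room for strlen(src) + 1 bytes.
void delete_comments_naive(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            const char *end = strstr(src + 2, "*/");
            if (end != NULL) {      // non-greedy: stop at the first "*/"
                src = end + 2;
                continue;
            }
        }
        *dst++ = *src++;            // no complete comment here: copy one char
    }
    *dst = '\0';
}
```

Given printf("/*"); x; printf("*/"); it returns printf("");. The statement x; in between disappears along with the string contents.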

#8


1  

When I want something short and simple for CSS, I use this:

awk -vRS='*/' '{gsub(/\/\*.*/,"")}1' FILE

This won't handle the case where comment delimiters appear inside strings but it's much simpler than a solution that does. Obviously it's not bulletproof or suitable for everything but you know better than the pedants on SO whether or not you can live with that.

I believe this one is bulletproof however.
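
The awk one-liner relies on a multi-character RS, which gawk and mawk support but POSIX leaves unspecified. Each record ends just before a */, so deleting from the first /* to the end of the record removes the comment text. awk then prints each record followed by ORS, which is a newline by default. The C sketch below mimics those three steps to make the behaviour explicit. awk_style_strip is a hypothetical name, and the sketch ignores awk features the one-liner doesn't use.

```c
#include <assert.h>
#include <string.h>

// Sketch of what  awk -vRS='*/' '{gsub(/\/\*.*/,"")}1'  does:
//   1. split the input into records at every "*/" (the separator is dropped),
//   2. in each record, delete everything from the first "/*" to its end,
//   3. print each record followed by "\n" (the default ORS).
// dst needs room for strlen(src) + 2 bytes.
void awk_style_strip(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *sep = strstr(src, "*/");
        const char *rec_end = sep != NULL ? sep : src + strlen(src);
        const char *open = NULL;

        // find the first "/*" that lies entirely inside this record
        for (const char *p = src; p + 1 < rec_end; p++) {
            if (p[0] == '/' && p[1] == '*') {
                open = p;
                break;
            }
        }
        const char *keep_end = open != NULL ? open : rec_end;

        memcpy(dst, src, (size_t)(keep_end - src));
        dst += keep_end - src;
        *dst++ = '\n';                       // ORS

        src = sep != NULL ? sep + 2 : rec_end;
    }
    *dst = '\0';
}
```

Because the separator is replaced by a newline, a comment in the middle of a line splits that line. The sketch turns "#define X /* c */ 5" into "#define X " and " 5" on separate lines. That is harmless in ordinary C code but breaks a preprocessor directive.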

#9


1  

Try the following recursive approach to find and remove JavaScript-style comments, XML-style comments, and single-line comments:

/* This is a multi-line JS comment.

Please remove me*/

for f in $(find pages/ -type f -name "*.*"); do perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' "$f"; done

<!-- This is a multi-line XML comment.

Please remove me -->

for f in $(find pages/ -type f -name "*.*"); do perl -i -wpe 'BEGIN{undef $/} s#<!--.*?-->##sg' "$f"; done

// This is a single-line comment. Please remove me.

for f in $(find pages/ -type f -name "*.*"); do sed -i 's|//.*||' "$f"; done

Note: pages is the root directory; the commands above find and process every file under it, including files in subdirectories.
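
The sed command for // comments also fires inside string literals, so a line like url = "http://example.com"; gets cut in half. A string-aware alternative only has to remember whether it is inside a quoted literal. The C sketch below works on one line at a time and assumes block comments have already been removed. strip_line_comment is a hypothetical name, and the sketch recognises only C-style "..." and '...' literals, not JavaScript template or regex literals.

```c
#include <assert.h>
#include <string.h>

// Truncate line at the first // that is outside a "..." or '...' literal.
// Unlike sed 's|//.*||', a URL inside a string survives. Assumes block
// comments have already been removed from the text.
void strip_line_comment(char *line)
{
    char quote = 0;   // 0 when not inside a literal, else the opening quote

    assert(line != NULL);
    for (char *p = line; *p != '\0'; p++) {
        if (quote) {
            if (*p == '\\' && p[1] != '\0')
                p++;                       // skip the escaped character
            else if (*p == quote)
                quote = 0;                 // literal closed
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (p[0] == '/' && p[1] == '/') {
            *p = '\0';                     // comment starts here
            return;
        }
    }
}
```

Like the sed command, it leaves any whitespace before the // in place.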

#10


0  

Very simplistic example using gawk. Please test it thoroughly before relying on it. Of course, it doesn't handle the other comment style, // (used in C++ and, since C99, in C).

$ more file
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
/*
function(){
 blah blah
}
*/
float a;
float b;

$ awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' file
int matrix[20];


for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;


for (index = 0; index < 5 ;index++)


float a;
float b;
