How do I create a C compiler for a custom CPU?

Date: 2022-09-02 15:21:06

What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?

Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?

Preferably the compiler itself would be written in C, and build as a native executable for either Linux or Windows.

Please note: I am not asking how to write the compiler itself. I did take that course in college, I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language, I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to myself and anyone else who might make similar assumptions.

6 answers

#1


31  

Quick overview/tutorial on writing an LLVM backend.

This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.

[ . . . ]

To create a static compiler (one that emits text assembly), you need to implement the following:

  • Describe the register set.
  • Describe the instruction set.
  • Describe the target machine.
  • Implement the assembly printer for the architecture.
  • Implement an instruction selector for the architecture.
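
To make those five pieces concrete, here is a minimal sketch in plain C. It is not LLVM's API (real LLVM backends are written with TableGen descriptions and C++); the 4-register CPU, its mnemonics, the toy IR and every name below (`toy_target`, `select_instr`, `print_instr`, ...) are assumptions invented purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1. Register set of a hypothetical 4-register CPU. */
enum reg { R0, R1, R2, R3, NUM_REGS };
static const char *const reg_names[NUM_REGS] = { "r0", "r1", "r2", "r3" };

/* 2. Instruction set: opcodes and their mnemonics (all made up). */
enum opcode { OP_ADD, OP_SUB, OP_MOV, NUM_OPCODES };
static const char *const mnemonics[NUM_OPCODES] = { "add", "sub", "mov" };

/* 3. Target machine: global facts the rest of the compiler needs. */
struct target_machine { int word_bits; int num_regs; };
static const struct target_machine toy_target = { 16, NUM_REGS };

/* A selected machine instruction, meaning: dst = dst <op> src. */
struct minstr { enum opcode op; enum reg dst, src; };

/* 4. Assembly printer: append one instruction as text to `out`.
 *    `out` must already hold a NUL-terminated string. */
static void print_instr(char *out, size_t size, struct minstr mi)
{
    size_t len = strlen(out);
    assert(len < size);
    assert((int)mi.dst < toy_target.num_regs);
    assert((int)mi.src < toy_target.num_regs);
    snprintf(out + len, size - len, "%s %s, %s\n",
             mnemonics[mi.op], reg_names[mi.dst], reg_names[mi.src]);
}

/* Toy target-independent operations, standing in for a real IR. */
enum ir_op { IR_ADD, IR_SUB, IR_COPY };

/* 5. Instruction selector: map one IR operation to one machine instruction. */
static struct minstr select_instr(enum ir_op op, enum reg dst, enum reg src)
{
    struct minstr mi = { OP_MOV, dst, src };
    switch (op) {
    case IR_ADD:  mi.op = OP_ADD; break;
    case IR_SUB:  mi.op = OP_SUB; break;
    case IR_COPY: mi.op = OP_MOV; break;
    }
    return mi;
}
```

Starting from an empty buffer, printing `select_instr(IR_COPY, R0, R3)` followed by `select_instr(IR_ADD, R0, R1)` appends the lines `mov r0, r3` and `add r0, r1`. A real backend also has to handle register allocation, calling conventions and stack frames, which this sketch leaves out entirely.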

#2


8  

There's the concept of a cross-compiler, i.e., one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.

Edit: I just spotted a question from a few years ago on a GCC mailing list about how to add a new target, and someone pointed to this

#3


3  

1) Short answer:

"No. There's no such thing as a "compiler framework" where you can just add water (plug in your own assembly set), stir, and it's done."

2) Longer answer: it's certainly possible. But challenging. And likely expensive.

If you wanted to do it yourself, I'd start by looking at Gnu CC. It's already available for a large variety of CPUs and platforms.

3) Take a look at this link for more ideas (including the idea of "just build a library of functions and macros"), that would be my first suggestion:

http://www.instructables.com/answers/Custom-C-Compiler-for-homemade-instruction-set/

#4


3  

The short answer is that it doesn't work that way.

The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):

  1. Syntactic analysis (lexer and parser, plus preprocessing for C), leading to an abstract syntax tree.
  2. Type checking, leading to an annotated abstract syntax tree.
  3. Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
  4. Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.

In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
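
As a rough illustration of "plugging in your own step 4", the sketch below hides the machine-dependent part behind a small interface of function pointers. It is only a toy under stated assumptions: the two-operation intermediate code, the `struct backend` interface, and the `ldi`/`add` mnemonics of the target CPU are all hypothetical, and real compilers such as GCC use far richer interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Machine-independent intermediate code, as produced by steps 1-3.
 * IR_CONST: temp[dst] = a (a is a literal); IR_ADD: temp[dst] = temp[a] + temp[b]. */
enum ir_op { IR_CONST, IR_ADD };
struct ir_instr { enum ir_op op; int dst, a, b; };

/* Step 4 behind an interface: one callback per IR operation.
 * Each callback appends assembly text to the NUL-terminated buffer `out`. */
struct backend {
    void (*load_const)(char *out, size_t size, int dst, int value);
    void (*add)(char *out, size_t size, int dst, int a, int b);
};

/* Shared driver: knows nothing about any particular CPU. */
static void generate(const struct ir_instr *code, size_t n,
                     const struct backend *be, char *out, size_t size)
{
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == IR_CONST)
            be->load_const(out, size, code[i].dst, code[i].a);
        else
            be->add(out, size, code[i].dst, code[i].a, code[i].b);
    }
}

/* Backend for a hypothetical three-operand CPU.
 * Simplifying assumption: temporary N lives in register rN. */
static void toy_load_const(char *out, size_t size, int dst, int value)
{
    size_t len = strlen(out);
    assert(len < size);
    snprintf(out + len, size - len, "ldi r%d, %d\n", dst, value);
}

static void toy_add(char *out, size_t size, int dst, int a, int b)
{
    size_t len = strlen(out);
    assert(len < size);
    snprintf(out + len, size - len, "add r%d, r%d, r%d\n", dst, a, b);
}

static const struct backend toy_backend = { toy_load_const, toy_add };
```

Feeding `generate` the three instructions `{IR_CONST, 0, 2, 0}`, `{IR_CONST, 1, 3, 0}`, `{IR_ADD, 2, 0, 1}` with `&toy_backend` and an empty buffer yields `ldi r0, 2`, `ldi r1, 3`, `add r2, r0, r1`. Retargeting then means supplying a different `struct backend` while `generate` and everything before it stay unchanged.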

Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.

#5


1  

You can modify existing open source compilers such as GCC or Clang. Other answers have provided you with links about where to learn more. But these compilers are not designed to be easily retargeted; they are merely "easier" to retarget than compilers hard-wired for specific targets.

But if you want a compiler that is relatively easy to retarget, you want one in which you can specify the machine architecture in explicit terms, and some tool generates the rest of the compiler (GCC does a bit of this; I don't think Clang/LLVM does much but I could be wrong here).

There's a lot of this in the literature; google "compiler-compiler".

But for a concrete solution for C, you should check out ACE, a compiler vendor that generates compilers on demand for customers. Not free, but I hear they produce very good compilers very quickly. I think it produces standard style binaries (ELF?) so it skips the assembler stage. (I have no experience or relationship with ACE.)

If you don't care about code quality, you can likely write a syntax-directed translation of C to assembler using a C AST. You can get C ASTs from GCC, Clang, maybe ANTLR, and from our DMS Software Reengineering Toolkit.
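
A minimal sketch of such a syntax-directed translation, for arithmetic expressions only: each AST node is visited once and emits a fixed assembly pattern, with no optimization at all. The node layout, the `gen` and `emit` helpers, and the `push`/`pop`/`add`/`mul` mnemonics of the target CPU are assumptions made up for this example; the ASTs you would get from GCC, Clang or other tools look quite different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A tiny expression AST (hypothetical layout). */
enum kind { NUM, ADD, MUL };
struct node { enum kind kind; int value; const struct node *lhs, *rhs; };

/* Append one line of assembly to the NUL-terminated buffer `out`. */
static void emit(char *out, size_t size, const char *line)
{
    /* Need room for the line, the newline and the terminating NUL. */
    assert(strlen(out) + strlen(line) + 1 < size);
    strcat(out, line);
    strcat(out, "\n");
}

/* Post-order walk: every node leaves its result on the machine stack. */
static void gen(const struct node *n, char *out, size_t size)
{
    char line[32];

    if (n->kind == NUM) {
        snprintf(line, sizeof line, "push %d", n->value);
        emit(out, size, line);
        return;
    }
    gen(n->lhs, out, size);          /* left operand ends up below ... */
    gen(n->rhs, out, size);          /* ... the right operand */
    emit(out, size, "pop r1");       /* right operand */
    emit(out, size, "pop r0");       /* left operand */
    emit(out, size, n->kind == ADD ? "add r0, r1" : "mul r0, r1");
    emit(out, size, "push r0");      /* result back on the stack */
}
```

For the tree of `1 + 2 * 3` this emits `push 1`, `push 2`, `push 3`, then the pop/`mul`/push pattern, then the pop/`add`/push pattern. The redundant stack traffic is exactly the poor code quality this approach trades for simplicity.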

#6


1  

vbcc (at www.compilers.de) is a good and simple retargetable C-compiler written in C. It's much simpler than GCC/LLVM. It's so simple I was able to retarget the compiler to my own CPU with a few weeks of work without having any prior knowledge of compilers.
