Non-deterministic finite state machines in software development?

Recently I've been thinking about finite state machines (FSMs), and how I would implement them in software (programming language doesn't matter).

My understanding is that deterministic state machines are in widespread use (parsers/lexers, compilers and so on), but what about non-deterministic state machines?

I know that it is possible to convert any non-deterministic state machine into a deterministic one (even programmatically). That's not my point. I also imagine that non-deterministic state machines are much more complicated to implement.

Anyway, does it make any sense to implement a non-deterministic state machine? Are there any special applications I don't know about? What could be the reasons to do that? Maybe optimized and specialized non-deterministic state machines are faster?

8 answers

#1

Most regular expression engines use non-deterministic automata since they offer much greater flexibility. DFAs are much more restricted: features such as backreferences and capture groups are hard or impossible to support with a pure DFA matcher. Have a look at some implementations and you'll see this. Microsoft even underlines this fact in their documentation of the .NET Regex class:

The .NET Framework regular expression engine is a backtracking regular expression matcher that incorporates a traditional Nondeterministic Finite Automaton (NFA) engine such as that used by Perl, Python, Emacs, and Tcl.

Matching behavior (first paragraph) – this article also offers a rationale for the employment of an NFA rather than the more efficient DFA.

#2

As you know, NFAs and DFAs are computationally equivalent. It's one of the first theorems in automata theory. There are algorithms to convert one into the other (unlike pushdown automata, where the non-deterministic variant is strictly more powerful).

So why pick one over the other? Because representing a given problem with an NFA is often far easier than with the equivalent DFA.

Edit: in terms of actually running the machine, DFAs are going to be faster because they don't have to backtrack or track several states at once. But they can take more memory to represent (a memory vs. CPU tradeoff).
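
To make the conversion concrete, here is a minimal sketch of the classic subset construction in Python. The function names (`nfa_to_dfa`, `dfa_accepts`), the dictionary layout, and the example machine are assumptions made for illustration, and the sketch ignores epsilon moves.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Each resulting DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    queue = deque([dfa_start])
    seen = {dfa_start}
    while queue:
        current = queue.popleft()
        # A DFA state accepts if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def dfa_accepts(dfa, dfa_start, dfa_accepting, text):
    """Run the DFA: exactly one table lookup per input symbol."""
    state = dfa_start
    for symbol in text:
        state = dfa[(state, symbol)]
    return state in dfa_accepting


# Hypothetical example: strings over {a, b} that end in "ab".
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, 0, {2}, "ab")
```

The NFA needs three small transition entries, while the generated table has one entry per (subset, symbol) pair. That is the memory side of the tradeoff; the matching loop itself does a single lookup per symbol.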

#3

My advice: take a look at the manual for Adrian Thurston's Ragel.

There are simple ways to generate a DFA directly, but I believe they only support a limited range of operators, basically the usual EBNF suspects. Ragel uses non-deterministic methods to compose complex automata from simpler ones, then uses epsilon elimination and minimisation to create efficient deterministic automata. No matter how many weird operators you need, the conversion to a minimal deterministic automaton is always the same, and each operator implementation is kept simple by using non-deterministic methods.
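
As a rough illustration of why composition is simple on the non-deterministic side, here is a sketch in Python that glues fragments together with epsilon edges. This is not Ragel's implementation: the `Fragment` class, the operator functions, and `matches` are all hypothetical names, and instead of eliminating epsilon edges and minimising, the sketch just follows them at match time.

```python
import itertools

_ids = itertools.count()  # source of fresh state numbers
EPS = None                # label used for epsilon (empty) transitions


class Fragment:
    """An NFA fragment with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def literal(symbol):
    s, a = next(_ids), next(_ids)
    return Fragment(s, a, [(s, symbol, a)])


def concat(x, y):
    # Link x's accepting state to y's start with a single epsilon edge.
    return Fragment(x.start, y.accept, x.edges + y.edges + [(x.accept, EPS, y.start)])


def union(x, y):
    s, a = next(_ids), next(_ids)
    glue = [(s, EPS, x.start), (s, EPS, y.start), (x.accept, EPS, a), (y.accept, EPS, a)]
    return Fragment(s, a, x.edges + y.edges + glue)


def star(x):
    s, a = next(_ids), next(_ids)
    glue = [(s, EPS, x.start), (s, EPS, a), (x.accept, EPS, x.start), (x.accept, EPS, a)]
    return Fragment(s, a, x.edges + glue)


def closure(states, edges):
    """All states reachable from `states` via epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def matches(fragment, text):
    current = closure({fragment.start}, fragment.edges)
    for symbol in text:
        moved = {dst for src, label, dst in fragment.edges
                 if src in current and label == symbol}
        current = closure(moved, fragment.edges)
    return fragment.accept in current


# (a|b)*c built purely by composition; each fragment is used only once.
machine = concat(star(union(literal("a"), literal("b"))), literal("c"))
```

Each operator only adds a couple of states and epsilon edges and never needs to inspect the structure of its operands, which is the property the answer is pointing at.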

#4

The Viterbi algorithm operates on Hidden Markov Models by treating them much like an NFA. Not entirely identical, but certainly analogous.

Hidden Markov Models are useful in applications like speech and text recognition.
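
The analogy can be sketched in a few lines of Python: like an NFA simulation, the algorithm advances every state at once, but it keeps a best score and path per state instead of a yes/no flag. The function signature and the toy probabilities below are assumptions made for illustration, not data from any real recogniser.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best predecessor for s, much like following every
            # possible NFA transition but keeping only the strongest one.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Hypothetical toy model: infer a patient's state from reported symptoms.
STATES = ["Healthy", "Fever"]
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
EMIT_P = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

probability, path = viterbi(["normal", "cold", "dizzy"], STATES, START_P, TRANS_P, EMIT_P)
print(path, probability)
```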

#5

Very often it is much easier to create an NFA and then work with it directly (the only difference is that you hold a set of states instead of a single state). If you want it to be fast, you can build a DFA, but don't forget that the time to do so can be exponential, because the resulting automaton can be exponentially bigger!
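
Holding a set of states really is about all it takes. A minimal Python sketch, assuming an epsilon-free NFA stored as a dictionary (the name `nfa_accepts` and the example machine are made up for illustration):

```python
def nfa_accepts(transitions, start, accepting, text):
    """Simulate an epsilon-free NFA by tracking every state it could be in."""
    current = {start}
    for symbol in text:
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), ())
        }
        if not current:  # no live states left, so the input cannot match
            return False
    return bool(current & accepting)


# Hypothetical NFA: binary strings whose second-to-last symbol is "1".
TRANSITIONS = {
    (0, "0"): {0},
    (0, "1"): {0, 1},
    (1, "0"): {2},
    (1, "1"): {2},
}
ACCEPTING = {2}

print(nfa_accepts(TRANSITIONS, 0, ACCEPTING, "0110"))
```

No backtracking is involved here: each input symbol is read once, and the cost per symbol is bounded by the number of NFA states and transitions rather than by the size of an equivalent DFA.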

On the other hand, if you want the complement of a language, you have no choice: you need a deterministic variant.

That is the reason why regular-expression engines typically offer no general negation operator, only negated character classes ([^...]), where you can be sure that the automaton is deterministic.
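
A small Python sketch of why determinism matters for complement: on a complete DFA, swapping accepting and non-accepting states yields the complement language, while the same trick on an NFA does not. All names and both example machines are hypothetical.

```python
def run_dfa(delta, start, accepting, text):
    state = start
    for symbol in text:
        state = delta[(state, symbol)]
    return state in accepting


def run_nfa(delta, start, accepting, text):
    current = {start}
    for symbol in text:
        current = {n for s in current for n in delta.get((s, symbol), ())}
    return bool(current & accepting)


def complement(states, accepting):
    """Swap accepting and non-accepting states."""
    return states - accepting


# Complete DFA over {a, b}: accepts strings that contain an "a".
DFA_STATES = {"no_a", "seen_a"}
DFA_DELTA = {
    ("no_a", "a"): "seen_a", ("no_a", "b"): "no_a",
    ("seen_a", "a"): "seen_a", ("seen_a", "b"): "seen_a",
}
DFA_ACCEPTING = {"seen_a"}
DFA_FLIPPED = complement(DFA_STATES, DFA_ACCEPTING)

# NFA over {a, b}: accepts strings that end in "a".
NFA_STATES = {0, 1}
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}}
NFA_ACCEPTING = {1}
NFA_FLIPPED = complement(NFA_STATES, NFA_ACCEPTING)

# The DFA and its flipped version always disagree, as a complement should.
print(run_dfa(DFA_DELTA, "no_a", DFA_ACCEPTING, "bab"),
      run_dfa(DFA_DELTA, "no_a", DFA_FLIPPED, "bab"))
# The NFA and its flipped version both accept "ba", so flipping failed.
print(run_nfa(NFA_DELTA, 0, NFA_ACCEPTING, "ba"),
      run_nfa(NFA_DELTA, 0, NFA_FLIPPED, "ba"))
```

The NFA case fails because a string is accepted if any one of its possible runs ends in an accepting state, and after flipping, a different run of the same string can still end in one.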

#6

Correct me if I'm wrong, but from my compilers class I remember that sometimes you simply can't use a DFA, as it would lead to an "explosion" of states.
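
A textbook case of that explosion is the language of binary strings whose n-th symbol from the end is "1": the NFA needs n + 1 states, while the subset construction reaches 2^n DFA states. The Python sketch below counts them; `count_dfa_states` is a hypothetical helper written only for this example.

```python
def count_dfa_states(n):
    """Count reachable subset-construction states for the NFA accepting
    binary strings whose n-th symbol from the end is "1"."""

    def step(subset, symbol):
        nxt = set()
        for state in subset:
            if state == 0:
                nxt.add(0)          # state 0 loops on every symbol
                if symbol == "1":
                    nxt.add(1)      # guess: this "1" is n-th from the end
            elif state < n:
                nxt.add(state + 1)  # count off the remaining symbols
        return frozenset(nxt)

    start = frozenset([0])
    seen, todo = {start}, [start]
    while todo:
        subset = todo.pop()
        for symbol in "01":
            target = step(subset, symbol)
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return len(seen)


for n in range(1, 9):
    print(n + 1, "NFA states ->", count_dfa_states(n), "DFA states")
```

The DFA has to remember the last n symbols exactly, which is why the count doubles with every extra position.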

#7

I think the main reason for choosing a non-deterministic finite automaton would be to actually get the chosen match back. It's likely a lot harder to do it with a deterministic version.

If all you want to know is whether the input matches or not, with no other details, I would think compiling down to a deterministic finite automaton would be better.

#8

Cayuga utilizes non-deterministic finite state machines under the hood for complex event processing. Well, it looks like they call it "Stateful Publish/Subscribe for Event Monitoring", but I believe it is CEP.

I believe some of their papers even discuss why they are using an automata model. You might want to poke around their site.

...Cayuga automata, extended from standard non-deterministic finite automata.
