在静态类型语言中清除和类型安全的状态机实现?

I implemented a simple state machine in Python:

我用Python实现了一个简单的状态机:

import time

def a():
    print "a()"
    return b

def b():
    print "b()"
    return c

def c():
    print "c()"
    return a


if __name__ == "__main__":
    state = a
    while True:
        state = state()
        time.sleep(1)

I wanted to port it to C, because it wasn't fast enough. But C doesn't let me make a function that returns a function of the same type. I tried making the function of this type: typedef *fn(fn)(), but it doesn't work, so I had to use a structure instead. Now the code is very ugly!

我想把它移植到C，因为它不够快。但是C不允许我创建一个返回相同类型函数的函数。我尝试了这个类型的函数:typedef *fn(fn)()，但它不起作用，所以我不得不使用一个结构。现在的代码很难看!

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct fn {
    struct fn (*f)(void);
} fn_t;

fn_t a(void);
fn_t b(void);
fn_t c(void);

fn_t a(void)
{
    fn_t f = {b};

    (void)printf("a()\n");

    return f;
}

fn_t b(void)
{
    fn_t f = {c};

    (void)printf("b()\n");

    return f;
}

fn_t c(void)
{
    fn_t f = {a};

    (void)printf("c()\n");

    return f;
}

int main(void)
{
    fn_t state = {a};

    for(;; (void)sleep(1)) state = state.f();

    return EXIT_SUCCESS;
}

So I figured it's a problem with C's broken type system. So I used a language with a real type system (Haskell), but the same problem happens. I can't just do something like:

所以我认为这是C的破坏类型系统的问题。所以我用了一种带有真正类型系统(Haskell)的语言，但同样的问题也发生了。我不能就这么做:

type Fn = IO Fn
a :: Fn
a = print "a()" >> return b
b :: Fn
b = print "b()" >> return c
c :: Fn
c = print "c()" >> return a

I get the error, Cycle in type synonym declarations.

我得到了错误，在类型同义声明中循环。

So I have to make some wrapper the same way I did for the C code like this:

所以我必须做一些包装就像我对C代码做的那样:

import Control.Monad
import System.Posix

data Fn = Fn (IO Fn)

a :: IO Fn
a = print "a()" >> return (Fn b)

b :: IO Fn
b = print "b()" >> return (Fn c)

c :: IO Fn
c = print "c()" >> return (Fn a)

run = foldM (\(Fn f) () -> sleep 1 >> f) (Fn a) (repeat ())

Why is it so hard to make a state machine in a statically typed language? I have to make unnecessary overhead in statically typed languages as well. Dynamically typed languages don't have this problem. Is there an easier way to do it in a statically typed language?

为什么在静态类型的语言中创建状态机如此困难?我还必须在静态类型语言中增加不必要的开销。动态类型语言没有这个问题。在静态类型的语言中是否有更简单的方法?

11 个解决方案

#1

If you use newtype instead of data, you don't incur any overhead. Also, you can wrap each state's function at the point of definition, so the expressions that use them don't have to:

如果使用newtype而不是数据，就不会产生任何开销。同样，您可以在定义的时候包装每个状态的函数，因此使用它们的表达式不必:

import Control.Monad

newtype State = State { runState :: IO State }

a :: State
a = State $ print "a()" >> return b

b :: State
b = State $ print "b()" >> return c

c :: State
c = State $ print "c()" >> return a

runMachine :: State -> IO ()
runMachine s = runMachine =<< runState s

main = runMachine a

Edit: it struck me that runMachine has a more general form; a monadic version of iterate:

编辑:让我惊讶的是runMachine有一个更通用的形式;迭代的一元版本:

iterateM :: Monad m => (a -> m a) -> a -> m [a]
iterateM f a = do { b <- f a
                  ; as <- iterateM f b
                  ; return (a:as)
                  }

main = iterateM runState a

Edit: Hmm, iterateM causes a space-leak. Maybe iterateM_ would be better.

编辑:嗯，iterateM导致空间泄漏。也许迭代会更好。

iterateM_ :: Monad m => (a -> m a) -> a -> m ()
iterateM_ f a = f a >>= iterateM_ f

main = iterateM_ runState a

Edit: If you want to thread some state through the state machine, you can use the same definition for State, but change the state functions to:

编辑:如果您想在状态机中线程一些状态，您可以使用相同的状态定义，但是将状态函数更改为:

a :: Int -> State
a i = State $ do{ print $ "a(" ++ show i ++ ")"
                ; return $ b (i+1)
                }

b :: Int -> State
b i = State $ do{ print $ "b(" ++ show i ++ ")"
                ; return $ c (i+1)
                }

c :: Int -> State
c i = State $ do{ print $ "c(" ++ show i ++ ")"
                ; return $ a (i+1)
                }

main = iterateM_ runState $ a 1

#2

In Haskell, the idiom for this is just to go ahead and execute the next state:

在Haskell中，这个习语的意思是继续执行下一个状态:

type StateMachine = IO ()
a, b, c :: StateMachine
a = print "a()" >> b
b = print "b()" >> c
c = print "c()" >> a

You need not worry that this will overflow a stack or anything like that. If you insist on having states, then you should make the data type more explicit:

您不必担心这会溢出堆栈或类似的东西。如果您坚持有状态，那么您应该使数据类型更明确:

data PossibleStates = A | B | C
type StateMachine = PossibleStates -> IO PossibleStates
machine A = print "a()" >> return B
machine B = print "b()" >> return C
machine C = print "c()" >> return A

You can then get compiler warnings about any StateMachine that forgot some states.

然后，您可以得到编译器对任何丢失某些状态的StateMachine的警告。

#3

The problem with your Haskell code is, that type only introduces a synonym, which is quite similar to what typedef in C does. One important restriction is, that the expansion of the type must be finite, you can't give a finite expansion of your state machine. A solution is using a newtype: A newtype is a wrapper that does only exist for the type checker; there is absolutely zero overhead (excluded stuff that occurs because of generalization that can't be removed). Here is your signature; it typechecks:

Haskell代码的问题是，该类型只引入同义词，这与C中的typedef所做的非常相似。一个重要的限制是，类型的扩展必须是有限的，你不能给出状态机的有限扩展。解决方案是使用newtype: newtype是只存在于类型检查器的包装器;绝对没有开销(由于不能删除的泛化而产生的排除的东西)。这是你的签名;它typechecks:

newtype FN = FN { unFM :: (IO FN) }

Please note, that whenever you want to use an FN, you have to unpack it first using unFN. Whenever you return a new function, use FN to pack it.

请注意，当您想使用FN时，您必须首先使用unFN解压它。每当返回一个新函数时，使用FN将其打包。

#4

In the C-like type systems functions are not first order citizens. There are certain restrictions on handling them. That was a decision for simplicity and speed of implementation/execution that stuck. To have functions behave like objects, one generally requires support for closures. Those however are not naturally supported by mosts processors' instruction sets. As C was designed to be close to the metal, there was no support for them.

在类c类型系统中，函数不是一级公民。处理它们有一定的限制。这是为了实现和执行的简单性和速度而做出的决定。要使函数像对象一样运行，通常需要对闭包的支持。然而，mosts处理器的指令集并不能自然地支持这些特性。因为C被设计成靠近金属，所以没有支撑。

When declaring recursive structures in C, the type must be fully expandable. A consequence of this is, that you can only have pointers as self-references in struct declarations:

在C中声明递归结构时，类型必须是完全可扩展的。这样做的结果是，您只能在struct声明中使用指针作为自引用:

struct rec;
struct rec {
    struct rec *next;
};

Also every identifier we use has to be declared. One of the restrictions of function-types is, that one can not forward declare them.

我们使用的每个标识符都必须声明。函数类型的一个限制是，不能向前声明它们。

A state machine in C usually works by making a mapping from integers to functions, either in a switch statement or in a jump table:

C语言中的状态机通常是通过将整数映射到函数，或者在switch语句中或在跳转表中进行映射:

typedef int (*func_t)();

void run() {
    func_t table[] = {a, b, c};

    int state = 0;

    while(True) {
        state = table[state]();
    }
}

Alternatively you could profile your Python code and try to find out why your code is slow. You can port the critical parts to C/C++ and keep using Python for the state machine.

或者，您可以分析您的Python代码，并尝试找出为什么您的代码比较慢。您可以将关键部件移植到C/ c++，并在状态机中继续使用Python。

#5

As usual, despite the great answers already present, I couldn't resist trying it out for myself. One thing that bothered me about what is presented is that it ignores input. State machines--the ones that I am familiar with--choose between various possible transitions based on input.

像往常一样，尽管已经有了很好的答案，我还是忍不住自己去尝试。有一件事困扰着我，它忽略了输入。状态机(我熟悉的状态机)根据输入在各种可能的转换之间进行选择。

data State vocab = State { stateId :: String
                         , possibleInputs :: [vocab]
                         , _runTrans :: (vocab -> State vocab)
                         }
                      | GoalState { stateId :: String }

instance Show (State a) where
  show = stateId

runTransition :: Eq vocab => State vocab -> vocab -> Maybe (State vocab)
runTransition (GoalState id) _                   = Nothing
runTransition s x | x `notElem` possibleInputs s = Nothing
                  | otherwise                    = Just (_runTrans s x)

Here I define a type State, which is parameterized by a vocabulary type vocab. Now let's define a way that we can trace the execution of a state machine by feeding it inputs.

这里我定义了一个类型状态，它由词汇表类型vocab参数化。现在让我们定义一种方法，通过输入状态机的输入来跟踪状态机的执行。

traceMachine :: (Show vocab, Eq vocab) => State vocab -> [vocab] -> IO ()
traceMachine _ [] = putStrLn "End of input"
traceMachine s (x:xs) = do
  putStrLn "Current state: "
  print s
  putStrLn "Current input: "
  print x
  putStrLn "-----------------------"
  case runTransition s x of
    Nothing -> putStrLn "Invalid transition"
    Just s' -> case s' of
      goal@(GoalState _) -> do
        putStrLn "Goal state reached:"
        print s'
        putStrLn "Input remaining:"
        print xs
      _ -> traceMachine s' xs

Now let's try it out on a simple machine that ignores its inputs. Be warned: the format I have chosen is rather verbose. However, each function that follows can be viewed as a node in a state machine diagram, and I think you'll find the verbosity to be completely relevant to describing such a node. I've used stateId to encode in string format some visual information about how that state behaves.

现在让我们在一个忽略输入的简单机器上尝试一下。请注意:我选择的格式相当冗长。但是，接下来的每个函数都可以看作状态机图中的一个节点，我认为您会发现，冗长与描述这样的节点完全相关。我使用stateId来编码字符串格式的一些关于状态如何运行的视觉信息。

data SimpleVocab = A | B | C deriving (Eq, Ord, Show, Enum)

simpleMachine :: State SimpleVocab
simpleMachine = stateA

stateA :: State SimpleVocab
stateA = State { stateId = "A state. * -> B"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateB
               }

stateB :: State SimpleVocab
stateB = State { stateId = "B state. * -> C"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateC
               }

stateC :: State SimpleVocab
stateC = State { stateId = "C state. * -> A"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateA
               }

Since the inputs don't matter for this state machine, you can feed it anything.

由于输入对这个状态机无关紧要，您可以向它提供任何内容。

ghci> traceMachine simpleMachine [A,A,A,A]

I won't include the output, which is also very verbose, but you can see it clearly moves from stateA to stateB to stateC and back to stateA again. Now let's make a slightly more complicated machine:

我不包含输出，它也非常冗长，但是您可以清楚地看到它从stateA移动到stateB，再回到stateC。现在让我们做一个稍微复杂一点的机器:

lessSimpleMachine :: State SimpleVocab
lessSimpleMachine = startNode

startNode :: State SimpleVocab
startNode = State { stateId = "Start node. A -> 1, C -> 2"
                  , possibleInputs = [A,C]
                  , _runTrans = startNodeTrans
                  }
  where startNodeTrans C = node2
        startNodeTrans A = node1

node1 :: State SimpleVocab
node1 = State { stateId = "node1. B -> start, A -> goal"
              , possibleInputs = [B, A]
              , _runTrans = node1trans
              }
  where node1trans B = startNode
        node1trans A = goalNode

node2 :: State SimpleVocab
node2 = State { stateId = "node2. C -> goal, A -> 1, B -> 2"
              , possibleInputs = [A,B,C]
              , _runTrans = node2trans
              }
  where node2trans A = node1
        node2trans B = node2
        node2trans C = goalNode

goalNode :: State SimpleVocab
goalNode = GoalState "Goal. :)"

The possible inputs and transitions for each node should require no further explanation, as they are verbosely described in the code. I'll let you play with traceMachine lessSipmleMachine inputs for yourself. See what happens when inputs is invalid (does not adhere to the "possible inputs" restrictions), or when you hit a goal node before the end of input.

每个节点可能的输入和转换不需要进一步的解释，因为它们在代码中被详细描述。我会让你自己玩traceMachine lessSipmleMachine。看看当输入无效(不遵守“可能的输入”限制)或在输入结束前命中目标节点时会发生什么。

I suppose the verbosity of my solution sort of fails what you were basically asking, which was to cut down on the cruft. But I think it also illustrates how descriptive Haskell code can be. Even though it is very verbose, it is also very straightforward in how it represents nodes of a state machine diagram.

我认为我的解决方案的冗长有点不符合你的基本要求，那就是削减开支。但我认为它也说明了描述性Haskell代码是怎样的。尽管它非常冗长，但在如何表示状态机图的节点方面也非常简单。

#6

Iit's not hard to make state machines in Haskell, once you realize that they are not monads! A state machine like the one you want is an arrow, an automaton arrow to be exact:

在Haskell中制造状态机并不难，一旦你意识到它们不是monads!你想要的状态机是箭头，准确地说，是自动箭头:

newtype State a b = State (a -> (b, State a b))

This is a function, which takes an input value and produces an output value along with a new version of itself. This is not a monad, because you cannot write join or (>>=) for it. Equivalently once you have turned this into an arrow you will realize that it's impossible to write an ArrowApply instance for it.

这是一个函数，它接受一个输入值并产生一个输出值以及一个新版本的本身。这不是monad，因为不能为它编写join或(>>=)。当你把它变成一个箭头时，你就会意识到写一个ArrowApply实例是不可能的。

Here are the instances:

实例如下:

import Control.Arrow
import Control.Category
import Prelude hiding ((.), id)

instance Category State where
    id = State $ \x -> (x, id)

    State f . State g =
        State $ \x ->
            let (y, s2) = g x
                (z, s1) = f y
            in (z, s1 . s2)

instance Arrow State where
    arr f = let s = State $ \x -> (f x, s) in s
    first (State f) =
        State $ \(x1, x2) ->
            let (y1, s) = f x1
            in ((y1, x2), first s)

Have fun.

玩得开心。

#7

What you want is a recursive type. Different languages have different ways of doing this.

您需要的是递归类型。不同的语言有不同的处理方法。

For example, in OCaml (a statically-typed language), there is an optional compiler/interpreter flag -rectypes that enables support for recursive types, allowing you to define stuff like this:

例如，在OCaml(一种静态类型语言)中，有一个可选的编译器/解释器标志-重新类型，它支持递归类型，允许您定义如下内容:

let rec a () = print_endline "a()"; b
and b () = print_endline "b()"; c
and c () = print_endline "c()"; a
;;

Although this is not "ugly" as you complained about in your C example, what happens underneath is still the same. The compiler simply worries about that for you instead of forcing you to write it out.

虽然这并不像您在C示例中抱怨的那样“丑陋”，但是下面发生的事情仍然是一样的。编译器只是为你担心，而不是强迫你写出来。

As others have pointed out, in Haskell you can use newtype and there won't be any "overhead". But you complain about having to explicitly wrap and unwrap the recursive type, which is "ugly". (Similarly with your C example; there is no "overhead" since at the machine level a 1-member struct is identical to its member, but it is "ugly".)

正如其他人指出的，在Haskell中，您可以使用newtype，并且不会有任何“开销”。但是您会抱怨必须显式地包装和展开递归类型，这是“丑陋的”。(类似于您的C示例;从机器级别上没有“开销”，因为1个成员结构与它的成员相同，但它是“丑陋的”。

Another example I want to mention is Go (another statically-typed language). In Go, the type construct defines a new type. It is not a simple alias (like typedef in C or type in Haskell), but creates a full-fledged new type (like newtype in Haskell) because such a type has an independent "method set" of methods that you can define on it. Because of this, the type definition can be recursive:

另一个例子是Go(另一种静态类型语言)。在Go中，类型构造定义了一个新类型。它不是一个简单的别名(比如C中的typedef或Haskell中的类型)，而是创建了一个完全成熟的新类型(如Haskell中的newtype)，因为这样的类型有一个独立的“方法集”，您可以在它上定义这些方法。因此，类型定义可以递归:

type Fn func () Fn

#8

You can get the same effect in C as in Python code,- just declare that functions return (void*):

您可以在C和Python代码中得到相同的效果，只需声明函数return (void*):

#include "stdio.h"

typedef void* (*myFunc)(void);

void* a(void);
void* b(void);
void* c(void);

void* a(void) {
    printf("a()\n");
    return b;
}

void* b(void) {
    printf("b()\n");
    return c;
}

void* c(void) {
    printf("c()\n");
    return a;
}

void main() {
    void* state = a;
    while (1) {
        state = ((myFunc)state)();
        sleep(1);
    }
}

#9

Your problem has been had before: Recursive declaration of function pointer in C

您的问题之前有:C中的函数指针的递归声明。

C++ operator overloading can be used to hide the mechanics of what is essentially the same as your your C and Haskell solutions, as Herb Sutter describes in GotW #57: Recursive Declarations.

c++运算符重载可以用来隐藏与C和Haskell解决方案本质上相同的机制，如Herb Sutter在GotW #57: recursivedeclaration中描述的那样。

struct FuncPtr_;
typedef FuncPtr_ (*FuncPtr)();

struct FuncPtr_
{
  FuncPtr_( FuncPtr pp ) : p( pp ) { }
  operator FuncPtr() { return p; }
  FuncPtr p;
};

FuncPtr_ f() { return f; } // natural return syntax

int main()
{
  FuncPtr p = f();  // natural usage syntax
  p();
}

But this business with functions will, in all likelihood, perform worse than the equivalent with numeric states. You should use a switch statement or a state table, because what you really want in this situation is a structured semantic equivalent to goto.

但是，这种带有函数的业务很可能比具有数值状态的业务表现得更差。您应该使用switch语句或state表，因为在这种情况下，您真正想要的是与goto等价的结构化语义。

#10

An example in F#:

f#的一个例子:

type Cont = Cont of (unit -> Cont)

let rec a() =
    printfn "a()"
    Cont (fun () -> b 42)

and b n =
    printfn "b(%d)" n
    Cont c

and c() =
    printfn "c()"
    Cont a

let rec run (Cont f) =
    let f = f()
    run f

run (Cont a)

Regarding the question "why is it so hard to implement state machines using functions in statically typed languages?": That's because the type of of a and friends is a little bit weird: a function that when returns a function that returns a function that returns a function...

关于“为什么用静态类型化语言中的函数实现状态机如此困难?”:这是因为a和friends的类型有点奇怪:当一个函数返回一个函数时，它返回一个函数，返回一个函数。

If I remove Cont from my example the F# compiler complains and says:

如果我从我的例子中移除Cont, f#编译器会抱怨说:

Expecting 'a but given unit -> 'a. The resulting type would be infinite when unifying 'a and unit -> 'a.

Another answer shows a solution in OCaml whose type inference is strong enough to remove the need for declaring Cont, which shows static typing is not to blame, rather the lack of powerful type inference in many statically typed languages.

另一个答案显示了OCaml中的一个解决方案，该解决方案的类型推断能力强到足以消除声明Cont的需要，这表明不应归咎于静态类型，而应归咎于在许多静态类型语言中缺乏强大的类型推断。

I don't know why F# doesn't do it, I would guess maybe this would make the type inference algorithm more complicated, slower, or "too powerful" (it could manage to infer the type of incorrectly typed expressions, failing at a later point giving error messages that are hard to understand).

我不知道为什么f#不这么做，我猜这可能会使类型推断算法更加复杂，更慢，或者“太强大”(它可以推断出类型错误的表达式，在稍后的点失败，给出难以理解的错误消息)。

Note that the Python example you gave isn't really safe. In my example, b represents a family of states parameterized by an integer. In an untyped language, it's easy to make a mistake and return b or b 42 instead of the correct lambda and miss that mistake until the code is executed.

注意，您给出的Python示例并不是真正安全的。在我的示例中，b表示由整数参数化的状态族。在非类型化语言中，很容易犯错误并返回b或b42而不是正确的lambda并错过该错误，直到执行代码为止。

#11

-6

The Python code you posted will be transformed into a recursive function, but it will not be tail call optimized because Python has no tail call optimization, so it will stack overflow at some point. So the Python code is actually broken, and would take more work to get it as good as the Haskell or C versions.

您所发布的Python代码将被转换为递归函数，但不会进行尾调用优化，因为Python没有尾调用优化，因此在某些时候会发生堆栈溢出。因此，Python代码实际上已经被破坏了，要使其与Haskell或C版本一样好，还需要做更多的工作。

Here is an example of what I mean:

我的意思是:

so.py:

import threading
stack_size_bytes = 10**5
threading.stack_size(10**5)
machine_word_size = 4

def t1():
    print "start t1"
    n = stack_size_bytes/machine_word_size
    while n:
        n -= 1
    print "done t1"

def t2():
    print "start t2"
    n = stack_size_bytes/machine_word_size+1
    while n:
        n -= 1
    print "done t2"

if __name__ == "__main__":
    t = threading.Thread(target=t1)
    t.start()
    t.join()
    t = threading.Thread(target=t2)
    t.start()
    t.join()

shell:

外壳:

$ python so.py
start t1
done t1
start t2
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 483, in run
    self.__target(*self.__args, **self.__kwargs)
  File "so.py", line 18, in t2
    print "done t2"
RuntimeError: maximum recursion depth exceeded

#1