Why are type-safe relational operations so difficult?

Time: 2022-03-02 16:00:16

I was trying to code a relational problem in Haskell when I found out that doing this in a type-safe manner is far from obvious. For example, a humble

select 1, a, b from T

already raises a number of questions:

  • What is the type of this function?

  • What is the type of the projection 1, a, b? What is the type of a projection in general?

  • What is the result type, and how do I express the relationship between the result type and the projection?

  • What is the type of a function that accepts any valid projection?

  • How can I detect invalid projections at compile time?

  • How would I add a column to a table or to a projection?

I believe even Oracle's PL/SQL language does not get this quite right. While invalid projections are mostly detected at compile time, there is a large number of type errors that only show up at runtime. Most other bindings to RDBMSs (e.g. Java's JDBC and Perl's DBI) use SQL contained in strings and thus give up type safety entirely.

Further research showed that there are some Haskell libraries (HList, vinyl and TRex) that provide type-safe extensible records and more. But these libraries all require Haskell extensions such as DataKinds, FlexibleContexts and many more. Furthermore, these libraries are not easy to use and have a smell of trickery, at least to uninitiated observers like me.

This suggests that type-safe relational operations do not fit in well with the functional paradigm, at least not as it is implemented in Haskell.

My questions are the following:

  • What are the fundamental causes of this difficulty in modelling relational operations in a type-safe way? Where does Hindley-Milner fall short? Or does the problem already originate in typed lambda calculus?

  • Is there a paradigm in which relational operations are first-class citizens? And if so, is there a real-world implementation?

1 answer

#1


Let's define a table indexed on some columns as a type with two type parameters:

data IndexedTable k v = ???

-- Index an unindexed table by a key computed from each row
groupBy :: (v -> k) -> Table v -> IndexedTable k v

-- A table without an index just has an empty key
type Table = IndexedTable ()

k will be a (possibly nested) tuple of all columns that the table is indexed on. v will be a (possibly nested) tuple of all columns that the table is not indexed on.

So, for example, if we had the following table

| Id | First Name | Last Name |
|----|------------|-----------|
|  0 | Gabriel    | Gonzalez  |
|  1 | Oscar      | Boykin    |
|  2 | Edgar      | Codd      |

... and it were indexed on the first column, then the type would be:

type Id = Int
type FirstName = String
type LastName = String

IndexedTable Int (FirstName, LastName)

However, if it were indexed on the first and second column, then the type would be:

IndexedTable (Int, FirstName) LastName

Table would implement the Functor, Applicative, and Alternative type classes. In other words:

instance Functor (IndexedTable k)

instance Applicative (IndexedTable k)

instance Alternative (IndexedTable k)

So joins would be implemented as:

join :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join t1 t2 = liftA2 (,) t1 t2

leftJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin t1 t2 = liftA2 (,) t1 (optional t2)

rightJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (Maybe v1, v2)
rightJoin t1 t2 = liftA2 (,) (optional t1) t2
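
To make the instances and joins above concrete, here is a minimal sketch of one way `IndexedTable` could be represented: a set of keys plus a lookup function. The representation, the `Keys` type, the `Ord k` constraints, and the helpers `fromList` and `toRows` are all assumptions made for this illustration and are not taken from the answer. `firstNames` and `lastNames` are made-up sample tables.

```haskell
import Control.Applicative (Alternative (..), liftA2, optional)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | The keys a table is defined on: every possible key, or a finite set.
data Keys k = All | Some (Set.Set k)

-- | Hypothetical representation: a key set plus a lookup function.
data IndexedTable k v = IndexedTable (Keys k) (k -> Maybe v)

instance Functor (IndexedTable k) where
  fmap f (IndexedTable ks g) = IndexedTable ks (fmap f . g)

-- | '<*>' keeps only the keys present on both sides (an inner join).
instance Ord k => Applicative (IndexedTable k) where
  pure v = IndexedTable All (const (Just v))
  IndexedTable ks1 f <*> IndexedTable ks2 x =
    IndexedTable (intersectKeys ks1 ks2) (\k -> f k <*> x k)

-- | '<|>' prefers the left table and falls back to the right one.
instance Ord k => Alternative (IndexedTable k) where
  empty = IndexedTable (Some Set.empty) (const Nothing)
  IndexedTable ks1 f <|> IndexedTable ks2 g =
    IndexedTable (unionKeys ks1 ks2) (\k -> f k <|> g k)

intersectKeys :: Ord k => Keys k -> Keys k -> Keys k
intersectKeys All ks = ks
intersectKeys ks All = ks
intersectKeys (Some a) (Some b) = Some (Set.intersection a b)

unionKeys :: Ord k => Keys k -> Keys k -> Keys k
unionKeys All _ = All
unionKeys _ All = All
unionKeys (Some a) (Some b) = Some (Set.union a b)

join :: Ord k => IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join = liftA2 (,)

leftJoin :: Ord k => IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin t1 t2 = liftA2 (,) t1 (optional t2)

-- | Build a finite table from key/value pairs (hypothetical helper).
fromList :: Ord k => [(k, v)] -> IndexedTable k v
fromList kvs = IndexedTable (Some (Map.keysSet m)) (\k -> Map.lookup k m)
  where
    m = Map.fromList kvs

-- | List the rows of a finite table; 'Nothing' if it is defined on all keys.
toRows :: IndexedTable k v -> Maybe [(k, v)]
toRows (IndexedTable All _) = Nothing
toRows (IndexedTable (Some ks) g) =
  Just [(k, v) | k <- Set.toList ks, Just v <- [g k]]

firstNames :: IndexedTable Int String
firstNames = fromList [(0, "Gabriel"), (1, "Oscar"), (2, "Edgar")]

lastNames :: IndexedTable Int String
lastNames = fromList [(0, "Gonzalez"), (1, "Boykin")]
```

In this sketch `optional t2` is defined on every key, returning `Nothing` where `t2` has no row, which is why `leftJoin` keeps all the keys of the left table.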

Then you would have a separate type that we will call a Select. This type will also have two type parameters:

data Select v r = ???

A Select would consume a bunch of rows of type v from the table and produce a result of type r. In other words, we should have a function of type:

selectIndexed :: IndexedTable k v -> Select v r -> r

Some example Selects that we might define would be:

count   :: Select v Integer
sum     :: Num a => Select a a
product :: Num a => Select a a
max     :: Ord a => Select a a

This Select type would implement the Applicative interface, so we could combine multiple Selects into a single Select. For example:

liftA2 (,) count sum :: Select Integer (Integer, Integer)

That would be analogous to this SQL:

SELECT COUNT(*), SUM(*)
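
As a rough illustration of how the Applicative instance lets two Selects share one input, here is a deliberately naive sketch in which a Select is just a function over the list of rows. This representation and the `runSelect` helper are assumptions for the example only. It keeps all rows in memory and traverses them once per aggregate, so it shows the interface rather than an efficient implementation.

```haskell
import Control.Applicative (liftA2)
import Data.List (foldl', genericLength)
import Prelude hiding (sum)

-- | Naive stand-in: a Select is a function from all rows to a result.
newtype Select v r = Select ([v] -> r)

instance Functor (Select v) where
  fmap f (Select g) = Select (f . g)

-- | Both Selects see the same rows; their results are combined.
instance Applicative (Select v) where
  pure r = Select (const r)
  Select f <*> Select x = Select (\rows -> f rows (x rows))

count :: Select v Integer
count = Select genericLength

sum :: Num a => Select a a
sum = Select (foldl' (+) 0)

-- | Run a Select over a list of rows (hypothetical helper).
runSelect :: Select v r -> [v] -> r
runSelect (Select g) = g

countAndSum :: Select Integer (Integer, Integer)
countAndSum = liftA2 (,) count sum
```

With these definitions, `runSelect countAndSum [1, 2, 3, 4]` evaluates to `(4, 10)`.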

However, often our table will have multiple columns, so we need a way to focus a Select onto a single column. Let's call this function focus:

focus :: Lens' a b -> Select b r -> Select a r

So that we can write things like:

liftA3 (,,) (focus _1 sum) (focus _2 product) (focus _3 max)
  :: (Num a, Num b, Ord c)
  => Select (a, b, c) (a, b, c)

So if we wanted to write something like:

SELECT COUNT(*), MAX(firstName) FROM t

That would be equivalent to this Haskell code:

firstName :: Lens' Row String

table :: Table Row

select table (liftA2 (,) count (focus firstName max)) :: (Integer, String)

So you might wonder how one might implement Select and Table.

I describe how to implement Table in this post:

http://www.haskellforall.com/2014/12/a-very-general-api-for-relational-joins.html

... and you can implement Select as just:

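import qualified Control.Foldl

type Select = Control.Foldl.Fold

-- `focus` is a function, so it is an ordinary binding rather than a `type` synonym.
-- Newer releases of the foldl package name this combinator `handles`.
focus = Control.Foldl.pretraverse

-- Assuming you define a `Foldable` instance for `IndexedTable`
select t s = Control.Foldl.fold s t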

Also, keep in mind that these are not the only ways to implement Table and Select. They are just a simple implementation to get you started and you can generalize them as necessary.

What about selecting columns from a table? Well, you can define:

column :: Select a [a]
column = Control.Foldl.list

So if you wanted to do:

SELECT col FROM t

... you would write:

field :: Lens' Row Field

table :: Table Row

select table (focus field column) :: [Field]
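
For readers who want to try the queries above without installing any packages, the following is a self-contained sketch that needs no language extensions. It is not how `Control.Foldl` is implemented. It assumes a different encoding in which a Select holds its current result plus a function that consumes one more row, so every combined aggregate is computed in a single pass. Three things are simplified here and are assumptions of the sketch: `focus` takes a plain getter function instead of a `Lens'`, `max` returns a `Maybe` so that it is defined on an empty table, and the table is an ordinary list of a made-up `Row` record.

```haskell
import Control.Applicative (liftA2)
import Data.Foldable (foldl')
import Prelude hiding (max)

-- | The current result plus a continuation that consumes one more row.
data Select v r = Select r (v -> Select v r)

instance Functor (Select v) where
  fmap f (Select r k) = Select (f r) (fmap f . k)

-- | Both Selects are stepped with the same row, so the input is read once.
instance Applicative (Select v) where
  pure r = Select r (const (pure r))
  Select f kf <*> Select x kx = Select (f x) (\v -> kf v <*> kx v)

-- | Feed every row to the Select, then read off the final result.
select :: Foldable t => t v -> Select v r -> r
select t s = result (foldl' feed s t)
  where
    feed (Select _ k) v = k v
    result (Select r _) = r

-- | Getter-based stand-in for the lens-based 'focus'.
focus :: (a -> b) -> Select b r -> Select a r
focus get (Select r k) = Select r (focus get . k . get)

count :: Select v Integer
count = go 0
  where
    go n = n `seq` Select n (\_ -> go (n + 1))

-- | 'Nothing' on an empty table (a deviation from the signature above).
max :: Ord a => Select a (Maybe a)
max = go Nothing
  where
    go best = Select best (\v -> go (Just (pick best v)))
    pick Nothing v = v
    pick (Just m) v = if v > m then v else m

-- | Collect every row in input order.
column :: Select a [a]
column = go id
  where
    go acc = Select (acc []) (\v -> go (acc . (v :)))

-- Hypothetical row type and sample data for the example.
data Row = Row { firstName :: String, lastName :: String }

table :: [Row]
table = [Row "Gabriel" "Gonzalez", Row "Oscar" "Boykin", Row "Edgar" "Codd"]
```

Here `select table (liftA2 (,) count (focus firstName max))` evaluates to `(3, Just "Oscar")`, and `select table (focus firstName column)` evaluates to `["Gabriel", "Oscar", "Edgar"]`.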

The important takeaway is that you can implement a relational API in Haskell just fine without any fancy type system extensions.
