How to add tuples to a list after reading from a text file in Haskell

Time: 2021-08-19 00:34:09

I am trying to create a program in Haskell that reads text from a text file and adds them to a list.

My idea is:

type X = [(String, Integer)]

where the String is each word from the text and Integer is how many times that word occurs in the text. So I want to create a tuple of those values and add it to a list. I then want to print the contents of the list.

I know how to read a text file in Haskell, but am unsure as to what to do next. I am new to programming in Haskell and have predominantly been programming in Java which is very different.

EDIT:

This is what I have so far, based on the suggestions. I am able to read the text from the file, convert it to lower case, and write it to an output text file. The issue I am having is with using the other functions, because the compiler says:

Test.hs:14:59: Not in scope: ‘group’

Here is the code:

import System.IO  
import Data.Char(toLower)

main = do  
       contents <- readFile "testFile.txt"
       let lowContents = map toLower contents
       let outStr = countWords (lowContents)
       let finalStr = sortOccurrences (outStr)
       print outStr

-- Counts all the words
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)

-- Split words
toWords :: String -> [String]
toWords s = words s

-- Counts, how often each string in the given list appears
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sortOccurrences xs

-- Sort list in order of occurrences.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences sort = sortBy sort (comparing snd)

Please can anyone help me with this.

2 solutions

#1


Haskell features a fairly expressive type system (much more so than Java), so it's a good idea to consider this issue purely in terms of types, in a top-down fashion. You mentioned that you already know how to read a text file in Haskell, so I'll assume you know how to get a String which holds the file contents.

The function you'd like to define is something like this. For now, we'll set the definition to undefined such that the code typechecks (but yields an exception at runtime):

countWords :: String -> [(String, Int)]
countWords fileContents = undefined

Your function maps a String (the file contents) to a list of tuples, each of which associates a word with the number of times that word appeared in the input. This suggests that one part of the solution will be a function which splits a string into a list of words, so that you can then process that list to count the words. I.e. you'll want something like this:

-- Splits a string into a list of words
toWords :: String -> [String]
toWords s = undefined

-- Counts, how often each string in the given list appears
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = undefined

With these at hand, you can actually define the original function:

countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)

You have now nicely decomposed the problem into two sub-problems.

Another nice aspect of this type-driven programming is that Hoogle can be told to go look for functions of a given type. For instance, consider the type of the toWords function we sketched earlier:

toWords :: String -> [String]
toWords s = undefined

Feeding this to Hoogle reveals a nice function, words, which seems to do just what we want! So we can define:

toWords :: String -> [String]
toWords s = words s
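
As a quick check of what words actually does, here is a small sketch. Note that words splits on whitespace only, so punctuation stays attached to its word. The normalize helper below is a hypothetical addition (it is not part of the definitions above): it lowercases the text and replaces every non-letter with a space before splitting. That is a crude rule, since a word like "don't" would be split in two, so treat it as a starting point only.

```haskell
import Data.Char (isAlpha, toLower)

-- Hypothetical helper: lowercase everything and turn any
-- non-letter character into a space, so "Cat," becomes "cat ".
normalize :: String -> String
normalize = map (\c -> if isAlpha c then toLower c else ' ')

-- Split a string into words on whitespace.
toWords :: String -> [String]
toWords s = words s

-- toWords "The cat, the dog."             == ["The","cat,","the","dog."]
-- toWords (normalize "The cat, the dog.") == ["the","cat","the","dog"]
```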

The only thing missing is an appropriate definition for countOccurrences. Alas, searching for this type on Hoogle doesn't show any ready-made solutions. However, three functions will be useful for coming up with our own definition: sort and group (both exported by Data.List, so they need an import) and map (available from the Prelude):

  1. The sort function does what its name suggests: it sorts a list of things:

    λ: sort [1,1,1,2,2,1,1,3,3]
    [1,1,1,1,1,2,2,3,3]
    
  2. The group function groups consecutive(!) equal elements, yielding a list of lists. E.g.

    λ: group [1,1,1,1,1,2,2,3,3]
    [[1,1,1,1,1],[2,2],[3,3]]
    
  3. The map function can be used to turn the list of lists produced by group into a list of tuples, giving the length of each group:

    λ: map (\xs -> (head xs, length xs)) [[1,1,1,1,1],[2,2],[3,3]]
    [(1,5),(2,2),(3,2)]
    

Composing these three functions allows you to define

countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
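
For this definition to compile, sort and group have to be imported from Data.List, since the Prelude does not export them. A minimal, self-contained sketch of the pieces so far follows. The pattern match on (w : _) in place of head is a stylistic substitution, not something the definition above requires; group never produces an empty group, so both versions give the same result.

```haskell
import Data.List (group, sort)

-- Split the file contents into words (on whitespace).
toWords :: String -> [String]
toWords = words

-- Sort so that equal words become adjacent, group them, then pair
-- each word with the size of its group.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences ws = [ (w, length g) | g@(w : _) <- group (sort ws) ]

-- The original function, composed from the two helpers.
countWords :: String -> [(String, Int)]
countWords = countOccurrences . toWords

-- countWords "b a b c a b" == [("a",2),("b",3),("c",1)]
```

Because the words are sorted before grouping, the resulting list comes out ordered alphabetically by word.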

Now you have all the pieces in place. Your countWords is defined in terms of toWords and countOccurrences, each of which has a proper definition.

The nice thing about this type-driven approach is that writing down the function signatures helps both your thinking and the compiler (which catches you when you violate assumptions). You also automatically decompose the problem into smaller problems, each of which you can test independently in ghci.
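
Regarding the code in the question's edit, a few things appear to stand in the way. The "Not in scope: 'group'" error comes from a missing import: group, sort and sortBy live in Data.List, and comparing lives in Data.Ord. Beyond that, countOccurrences applies sortOccurrences (which expects a list of pairs) to a list of words and chains functions with (.) without applying the result to xs, and sortOccurrences passes the arguments to sortBy in the wrong order. One possible corrected version is sketched below; the name wordFrequencies is introduced here for illustration only.

```haskell
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
import Data.Ord (comparing)

-- Count how often each word appears: sort the words (not the
-- counts) so that group sees equal words next to each other.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences = map (\ws -> (head ws, length ws)) . group . sort

-- Order the (word, count) pairs by ascending count.
-- sortBy takes the comparison function first, then the list.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences = sortBy (comparing snd)

-- The whole pure pipeline: lowercase, split, count, sort by count.
wordFrequencies :: String -> [(String, Int)]
wordFrequencies = sortOccurrences . countOccurrences . words . map toLower

-- Possible use inside main (file name taken from the question):
--   contents <- readFile "testFile.txt"
--   print (wordFrequencies contents)
```

Note also that the main in the question prints outStr, the unsorted counts, rather than finalStr; printing the sorted list is presumably what was intended.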

#2


Data.Map is the easiest way to do this.

import qualified Data.Map as M

-- assuming you already have your list of words, you can generate
-- your list of tuples with this: every word is paired with 1 and
-- fromListWith adds up the 1s for equal words
listOfTuples :: [String] -> [(String, Integer)]
listOfTuples listOfWords = M.toList . M.fromListWith (+) $ zip listOfWords (repeat 1)
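
Since the question also asks to print the contents of the list, here is a hedged sketch that builds on the same Data.Map idea. It assumes the containers package (which provides Data.Map) is available, as it normally is with a GHC installation. The names wordCounts, byFrequency and render are illustrative and not part of the answer above.

```haskell
import qualified Data.Map as M
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Pair every word with 1, then let fromListWith add up the 1s for
-- equal keys. toList returns the pairs in ascending key order.
wordCounts :: [String] -> [(String, Integer)]
wordCounts ws = M.toList (M.fromListWith (+) (zip ws (repeat 1)))

-- Most frequent words first. Ties stay in alphabetical order
-- because sortBy is stable and the input is already sorted by word.
byFrequency :: [(String, Integer)] -> [(String, Integer)]
byFrequency = sortBy (comparing (Down . snd))

-- Render one "word: count" line per pair, ready for putStr.
render :: [(String, Integer)] -> String
render = unlines . map (\(w, n) -> w ++ ": " ++ show n)

-- byFrequency (wordCounts (words "b a b c a b"))
--   == [("b",3),("a",2),("c",1)]
```

In ghci, putStr (render (byFrequency (wordCounts (words contents)))) would then print one word per line, where contents is the String read from the file.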
