How do I get my Hadoop Python mapper to work?

Date: 2021-10-25 20:32:53

I want to try writing Python mapper functions for Hadoop MapReduce (as a complete beginner). I tried the code below; it runs, but the job returns "terminated- steps completed with errors". I used the default aggregate reducer function.

import sys

keywords = ["bear", "bears"]
for line in sys.stin:
    words = line.split()
    for key in keywords:
        if key in words[1:]:
            ans = words[words.index(key)-1] 
            print("%s\t%d" % (ans, 1))

(Thanks in advance)

1 answer

#1


for line in sys.stin:

should be

for line in sys.stdin:

In general, you should test your script locally before running it on Hadoop MapReduce:

cat test_file.txt | python your_mapper.py | sort | python your_reducer.py

Then you would have seen the AttributeError raised by sys.stin right away.
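
For reference, here is a minimal sketch of a corrected mapper, plus a helper that imitates the mapper | sort | reducer pipeline in memory so the logic can be checked without a cluster. The names map_line, run_mapper and run_local are made up for this example, and the summing step is only a stand-in for whatever reducer you actually use. One deliberate change from the question's code, which is my assumption about the intent: it walks the words by position instead of calling words.index(key). index() returns the first occurrence, so a line that starts with a keyword and repeats it later would pick words[-1], the last word of the line. As a side effect, this version emits one pair per keyword occurrence rather than one per keyword per line.

```python
import sys
from itertools import groupby

KEYWORDS = {"bear", "bears"}


def map_line(line, keywords=KEYWORDS):
    """Yield (previous_word, 1) for each keyword that has a word before it."""
    words = line.split()
    # Start at index 1 so there is always a preceding word to emit.
    for i in range(1, len(words)):
        if words[i] in keywords:
            yield words[i - 1], 1


def run_mapper(stream):
    """Streaming entry point: emit tab-separated key/count pairs."""
    for line in stream:
        for key, count in map_line(line):
            print("%s\t%d" % (key, count))


def run_local(lines):
    """Imitate mapper | sort | summing reducer in memory (stand-in reducer)."""
    pairs = sorted(pair for line in lines for pair in map_line(line))
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=lambda pair: pair[0])}


# Small in-memory check with made-up input lines.
sample = ["the brown bear sleeps", "two brown bears and a polar bear"]
print(run_local(sample))
```

To use this as the real mapper, replace the two sample lines at the bottom with a call to run_mapper(sys.stdin). Note the spelling: sys.stdin, not sys.stin.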
