I am trying to convert an EBCDIC file to ASCII using the CobolIoProvider class from JRecord in Apache Beam.
Code that I'm using:
CobolIoProvider ioProvider = CobolIoProvider.getInstance();
AbstractLineReader reader = ioProvider.getLineReader(
        Constants.IO_FIXED_LENGTH, Convert.FMT_MAINFRAME,
        CopybookLoader.SPLIT_NONE, copybookname, cobolfilename);
The code reads and converts the file as required. However, I am able to read cobolfilename and copybookname (the paths of the EBCDIC file and the copybook, respectively) only from the local system. When I try to read the files from GCS, it fails with a FileNotFoundException: "The filename, directory name, or volume label syntax is incorrect".
Is there a way to read a COBOL file (EBCDIC) from GCS using the CobolIoProvider class?
If not, is there any other class available that converts a COBOL file (EBCDIC) to ASCII and allows the files to be read from GCS?
Using ICobolIOBuilder:
Code that I’m using:
ICobolIOBuilder iob = JRecordInterface1.COBOL.newIOBuilder("copybook.cbl")
.setFileOrganization(Constants.IO_FIXED_LENGTH)
.setSplitCopybook(CopybookLoader.SPLIT_NONE);
AbstractLineReader reader = iob.newReader(bs); //bs is an InputStream object of my Cobol file
However, here are a few concerns:
1) I have to keep my copybook.cbl locally. Is there any way to read the copybook file from GCS? I tried the code below, reading my copybook from GCS into a stream and passing the stream to loadCopyBook(). But the code didn't work.
Sample code below:
InputStream bs2 = new ByteArrayInputStream(copybookfile.toString().getBytes());
LayoutDetail schema = new CobolCopybookLoader()
        .loadCopyBook(bs2, "copybook.cbl",
                CopybookLoader.SPLIT_NONE, 0, "",
                Constants.USE_STANDARD_COLUMNS,
                Convert.FMT_INTEL, 0, new TextLog())
        .asLayoutDetail();
AbstractLineReader reader = LineIOProvider.getInstance().getLineReader(schema);
reader.open(inputStream, schema);
2) Reading the EBCDIC file from a stream using newReader didn't convert my file to ASCII.
Thanks.
2 Answers
#1
I do not have a full answer. If you are using a recent version of JRecord, I suggest changing the code to use the JRecordInterface1. The IO-Builder is a lot more flexible than the older CobolIoProvider interface.
String encoding = "cp037"; // cp037/IBM037 US ebcdic; cp273 - German ebcdic
ICobolIOBuilder iob = JRecordInterface1.COBOL
.newIOBuilder("CopybookFile.cbl")
.setFileOrganization(Constants.IO_FIXED_LENGTH)
.setFont(encoding); // should set encoding if you can
AbstractLineReader reader = iob.newReader(datastream);
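As a side note (an addition for illustration, not part of the original answer): the cp037 charset mentioned above ships with the standard JDK, so plain text (PIC X) fields can be sanity-checked with java.nio.charset alone; packed-decimal and COMP fields still require JRecord and the copybook to decode.

```java
import java.nio.charset.Charset;

public class EbcdicDemo {
    public static void main(String[] args) {
        // "HELLO" encoded in US EBCDIC (cp037): C8 C5 D3 D3 D6
        byte[] ebcdic = {(byte) 0xC8, (byte) 0xC5, (byte) 0xD3,
                         (byte) 0xD3, (byte) 0xD6};
        // Decode the EBCDIC bytes into a Java String using the JDK charset
        String text = new String(ebcdic, Charset.forName("cp037"));
        System.out.println(text); // prints HELLO
        // Only character data converts this way; binary/COMP-3 fields
        // must be decoded by JRecord using the copybook layout.
    }
}
```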
With the IO-Builder interface you can use streams. The question Stream file from Google Cloud Storage covers creating a stream from GCS and may be useful. Hopefully someone with more knowledge of GCS can help.
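One possible shape for this (a sketch under assumptions, not a tested GCS integration: the Storage client calls in the comment use the google-cloud-storage library, and the bucket/object names are hypothetical) is to read the GCS object into memory and hand JRecord a ByteArrayInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class GcsStreamSketch {
    // Hypothetical helper: wrap raw bytes as the InputStream the
    // IO-Builder's newReader(...) expects. In a real project the bytes
    // would come from GCS, e.g. with the google-cloud-storage client:
    //   Storage storage = StorageOptions.getDefaultInstance().getService();
    //   byte[] bytes = storage.readAllBytes(BlobId.of("my-bucket", "data.ebcdic"));
    static InputStream openAsStream(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in bytes so the sketch runs without GCS access
        InputStream in = openAsStream(new byte[] {1, 2, 3});
        System.out.println(in.read()); // first byte: 1
    }
}
```

Reading the whole object into memory is only reasonable for modest file sizes; for large files a channel-based approach (as in the Beam answer below) avoids buffering everything at once.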
Alternatively, you could read from GCS directly and create data lines (data records) using the newLine method of a JRecord IO-Builder:
AbstractLine l = iob.newLine(byteArray);
I will look at creating a basic Read/Write interface for JRecord so JRecord users can write their own interfaces to GCS, IBM's Mainframe Access (ZFile), etc. But this will take time.
#2
The easiest way to use Beam/Dataflow with new kinds of file-based sources is to first use FileIO to get a PCollection&lt;ReadableFile&gt;, and then use a DoFn to read that file. This will require implementing the code to read from a given channel. Something like the following:
Pipeline p = ...;
p.apply(FileIO.match().filepattern("..."))
 .apply(FileIO.readMatches())
 .apply(ParDo.of(new DoFn<FileIO.ReadableFile, String>() {
     @ProcessElement
     public void processElement(ProcessContext c) throws IOException {
         try (ReadableByteChannel channel = c.element().open()) {
             // Use CobolIO/JRecord to read from the byte channel
         }
     }
 }));
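Since JRecord's reader consumes an InputStream while Beam's ReadableFile.open() yields a ReadableByteChannel, the standard java.nio.channels.Channels adapter bridges the two. A minimal local demonstration (using an in-memory channel in place of the one Beam would supply):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ChannelAdapterDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the channel that ReadableFile.open() would return
        ReadableByteChannel channel =
                Channels.newChannel(new ByteArrayInputStream(new byte[] {65, 66}));
        // Adapt the channel to the InputStream JRecord's newReader(...) expects
        InputStream in = Channels.newInputStream(channel);
        System.out.println(in.read()); // first byte: 65
    }
}
```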