Linux terminal input: reading user input from terminal truncating lines at 4095 character limit

Date: 2020-12-01 00:25:24

In a bash script, I try to read lines from standard input using the built-in read command after setting IFS=$'\n'. The lines are truncated at 4095 characters if I paste input into the read. This limitation seems to come from reading from a terminal, because reading the same amount of data from a pipe worked perfectly fine:

fill=
for i in $(seq 1 94); do fill="${fill}x"; done
for i in $(seq 1 100); do printf "%04d00$fill" $i; done | (read -r line; echo "$line")

I see the same behavior with a Python script (it did not accept more than 4095 characters from the terminal, but accepted longer input from a pipe):

#!/usr/bin/python

from sys import stdin

line = stdin.readline()
print('%s' % line)

Even a C program behaves the same way, using read(2):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[32768];
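    /* read(2) returns ssize_t: -1 on error, 0 at end of file */
    ssize_t sz = read(0, buf, sizeof(buf) - 1);
    if (sz < 0) {
        perror("read");
        return 1;
    }
    buf[sz] = '\0';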
    printf("READ LINE: [%s]\n", buf);
    return 0;
}

In all cases, I cannot enter more than about 4095 characters. The input prompt stops accepting characters.

Question-1: Is there a way to interactively read more than 4095 characters from a terminal on Linux systems (at least Ubuntu 10.04 and 13.04)?

Question-2: Where does this limitation come from?

Systems affected: I noticed this limitation on Ubuntu 10.04/x86 and 13.04/x86, but Cygwin (a recent version at least) still does not truncate at over 10000 characters (I did not test further, since I need to get this script working on Ubuntu). Terminals used: virtual console and KDE Konsole (Ubuntu 13.04), and gnome-terminal (Ubuntu 10.04).

3 Answers

#1


8  

Please refer to the termios(3) manual page, section "Canonical and noncanonical mode".

By default the terminal (standard input) is in canonical mode; in this mode the kernel buffers a whole input line before returning it to the application. The hard-coded limit on Linux (maybe N_TTY_BUF_SIZE, defined in ${linux_source_path}/include/linux/tty.h) is 4096, which allows 4095 characters of input, not counting the terminating newline. In noncanonical mode the kernel by default does no line buffering, and the read(2) system call returns as soon as a single character of input is available (a key is pressed). You can manipulate the terminal settings to read a specified number of characters or to set a time-out in noncanonical mode, but even then the hard-coded limit is 4095, per the termios(3) manual page.
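To see which of the two modes a terminal is currently in, a small Python sketch along these lines may help. The function name tty_input_mode is made up for this illustration; it only uses the standard termios module and reports the MIN/TIME settings when canonical mode is off.

```python
import os
import termios


def tty_input_mode(fd=0):
    """Describe how the terminal behind fd delivers input, or None if fd is not a tty."""
    if not os.isatty(fd):
        return None                      # pipes and files have no terminal settings
    attrs = termios.tcgetattr(fd)
    lflag, cc = attrs[3], attrs[6]       # index 3 is the local flags, index 6 the control chars
    if lflag & termios.ICANON:
        return {"canonical": True}       # input is delivered line by line
    # MIN and TIME only apply in noncanonical mode
    return {"canonical": False, "vmin": cc[termios.VMIN], "vtime": cc[termios.VTIME]}


# Example: print(tty_input_mode()) before and after running `stty -icanon`.
```

When standard input is a pipe, the function returns None, which matches the observation in the question that piped input is not subject to the limit.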

Bash's read builtin still works in noncanonical mode, as the following demonstrates:

IFS=$'\n'      # Allow spaces and other white spaces.
stty -icanon   # Disable canonical mode.
read line      # Now we can read without inhibitions set by terminal.
stty icanon    # Re-enable canonical mode (assuming it was enabled to begin with).

After adding stty -icanon, you can paste a string longer than 4096 characters and read it successfully with bash's built-in read command (I successfully tried more than 10000 characters).

If you put this in a file, i.e. make it a script, you can use strace to see the system calls it makes; you will see read(2) called many times, each call returning a single character.
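The same idea can probably be carried over to the Python script from the question. The following is a minimal sketch, not a tested drop-in replacement: read_long_line is a name invented here, and it assumes that clearing ICANON with MIN=1 and TIME=0 is acceptable for the duration of the read. Like bash, it reads one byte per read(2) call so that nothing after the newline is consumed.

```python
import os
import termios


def read_long_line(fd=0):
    """Read one line from fd byte by byte; on a tty, switch off ICANON first."""
    saved = None
    if os.isatty(fd):
        saved = termios.tcgetattr(fd)
        attrs = termios.tcgetattr(fd)
        attrs[3] &= ~termios.ICANON        # index 3 is lflag: leave canonical mode
        attrs[6][termios.VMIN] = 1         # read() returns once 1 byte is available
        attrs[6][termios.VTIME] = 0        # no read time-out
        termios.tcsetattr(fd, termios.TCSADRAIN, attrs)
    try:
        data = bytearray()
        while True:
            byte = os.read(fd, 1)          # one byte per call, as bash's read does
            if not byte or byte == b"\n":  # stop at end of file or end of line
                break
            data += byte
        return data.decode(errors="replace")
    finally:
        if saved is not None:
            # Put the terminal back the way it was, even if the read failed
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)


# Example: line = read_long_line(); print(len(line))
```

One trade-off to be aware of: with canonical mode off, the kernel no longer handles line editing, so keys such as backspace arrive as ordinary bytes instead of erasing input.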

#2


-1  

I do not have a workaround for you, but I can answer question 2. In Linux, PIPE_BUF is set to 4096 (in limits.h). If you write more than 4096 bytes to a pipe, it will be truncated.

From /usr/include/linux/limits.h:

#ifndef _LINUX_LIMITS_H
#define _LINUX_LIMITS_H

#define NR_OPEN         1024

#define NGROUPS_MAX    65536    /* supplemental group IDs are available */
#define ARG_MAX       131072    /* # bytes of args + environ for exec() */
#define LINK_MAX         127    /* # links a file may have */
#define MAX_CANON        255    /* size of the canonical input queue */
#define MAX_INPUT        255    /* size of the type-ahead buffer */
#define NAME_MAX         255    /* # chars in a file name */
#define PATH_MAX        4096    /* # chars in a path name including nul */
#define PIPE_BUF        4096    /* # bytes in atomic write to a pipe */
#define XATTR_NAME_MAX   255    /* # chars in an extended attribute name */
#define XATTR_SIZE_MAX 65536    /* size of an extended attribute value (64k) */
#define XATTR_LIST_MAX 65536    /* size of extended attribute namelist (64k) */

#define RTSIG_MAX     32

#endif
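The comment in the header describes PIPE_BUF as the number of bytes in an atomic write to a pipe. To check how a pipe treats a larger write, a short Python sketch such as the one below could be used; pipe_round_trip is a hypothetical helper written for this illustration, and it simply counts how many bytes come out of the read end.

```python
import os
import threading


def pipe_round_trip(size):
    """Write size bytes into a pipe and count how many can be read back."""
    r, w = os.pipe()

    def writer():
        view = memoryview(b"x" * size)
        while view:                      # os.write may accept only part of the data
            view = view[os.write(w, view):]
        os.close(w)                      # closing the write end signals end of file

    t = threading.Thread(target=writer)
    t.start()
    received = 0
    while True:
        chunk = os.read(r, 65536)
        if not chunk:                    # empty read: the writer has closed its end
            break
        received += len(chunk)
    t.join()
    os.close(r)
    return received


# Example: print(pipe_round_trip(10000))
```

The writer runs in a separate thread so that a write larger than the pipe's capacity cannot block the reader. This agrees with the test in the question, where about 10000 characters were read from a pipe in one line.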

#3


-1  

The problem is definitely not read(), as it can read up to any valid integer value. The problem comes from heap memory or the pipe size, as they are the only possible limiting factors on the size.
