How do I create a Pandas DataFrame from a string?

Time: 2021-01-03 23:47:59

In order to test some functionality I would like to create a DataFrame from a string. Let's say my test data looks like:

TESTDATA="""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

What is the simplest way to read that data into a Pandas DataFrame?

2 solutions

#1


241  

A simple way to do this is to wrap the string in a StringIO object and pass that to the pandas.read_csv function. For example:

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

# Keep the rows flush left: indentation inside a triple-quoted string
# becomes part of the data.
TESTDATA = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")

df = pd.read_csv(TESTDATA, sep=";")
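
As a quick check of the approach above, here is a minimal Python 3 sketch that reads the exact string from the question and inspects the result. It assumes Python 3 only, so it imports io directly and skips the version check.

```python
import io

import pandas as pd

TESTDATA = """col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

# Wrap the plain string in a file-like object, then parse it as
# semicolon-separated CSV.
df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

print(df.shape)   # number of (rows, columns)
print(df.dtypes)  # column types inferred by read_csv
```

Because a StringIO object is consumed as it is read, create a fresh one each time the same string needs to be parsed again.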

#2


2  

A traditional variable-width CSV is hard to read when the data is stored as a string variable. Consider fixed-width, pipe-separated data instead. Various IDEs and editors have plugins that format pipe-separated text into a neat table.

The following works for me. To use it, save it in a file named pandas_util.py and call read_pipe_separated_str(str_input). An example is included in the function's docstring.

import io
import re

import pandas as pd


def _prepare_pipe_separated_str(str_input):
    substitutions = [
        ('^ *', ''),  # Remove leading spaces
        (' *$', ''),  # Remove trailing spaces
        (r' *\| *', '|'),  # Remove spaces between columns
    ]
    if all(line.lstrip().startswith('|') and line.rstrip().endswith('|') for line in str_input.strip().split('\n')):
        substitutions.extend([
            (r'^\|', ''),  # Remove redundant leading delimiter
            (r'\|$', ''),  # Remove redundant trailing delimiter
        ])
    for pattern, replacement in substitutions:
        str_input = re.sub(pattern, replacement, str_input, flags=re.MULTILINE)
    return str_input


def read_pipe_separated_str(str_input):
    """Read a Pandas object from a pipe-separated table contained within a string.

    Example:
        | int_score | ext_score | automation_eligible |
        |           |           | True                |
        | 221.3     | 0         | False               |
        |           | 576       | True                |
        | 300       | 600       | True                |

    The leading and trailing pipes are optional, but if one is present, so must be the other.

    In PyCharm, the "Pipe Table Formatter" plugin has a "Format" feature that can be used to neatly format a table.
    """
    str_input = _prepare_pipe_separated_str(str_input)
    # io.StringIO replaces pd.compat.StringIO, which newer pandas releases no longer provide.
    return pd.read_csv(io.StringIO(str_input), sep='|')
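
A usage sketch for the helper above, applied to the table from its docstring. So that the snippet runs on its own, it includes a condensed re-implementation of read_pipe_separated_str. That re-implementation is an assumption for illustration, not the author's exact code, and it wraps the cleaned string with io.StringIO from the standard library.

```python
import io
import re

import pandas as pd


def read_pipe_separated_str(str_input):
    """Condensed, illustrative variant of the helper shown above."""
    lines = [line.strip() for line in str_input.strip().split('\n')]
    # Drop the outer pipes only when every line has both of them.
    if all(line.startswith('|') and line.endswith('|') for line in lines):
        lines = [line[1:-1] for line in lines]
    # Remove the padding spaces around each remaining delimiter.
    cleaned = '\n'.join(re.sub(r' *\| *', '|', line.strip()) for line in lines)
    return pd.read_csv(io.StringIO(cleaned), sep='|')


TABLE = """
    | int_score | ext_score | automation_eligible |
    |           |           | True                |
    | 221.3     | 0         | False               |
    |           | 576       | True                |
    | 300       | 600       | True                |
"""

df = read_pipe_separated_str(TABLE)
print(df)
```

Empty cells are read as missing values (NaN), so int_score and ext_score end up as floating-point columns.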
