Is there a way to split the columns in a data file that contains date and time columns?

Time: 2021-01-19 14:52:23

So I'm importing a data file that contains 5 columns, such as

1992-01-25T00:00:30.000Z|0.718|-0.758|-0.429|1.129

I know that scanf() lets you specify the data type it is scanning, which in this case would be %s and %f. My problem is the first column: I would like to scan it as a number, or split it into two columns like so: 1992-01-25|00:00:30.000. Is using fgets() an alternative?

Is there a way to do this efficiently? I'm storing each column in an array and have a search function for each array, and searching an array of strings would be a pain.

2 solutions

#1


1  

You can use fgets, strtok, and sscanf to parse the file.


  • fgets reads a line from the file
  • strtok breaks the line into substrings, using | as the separator
  • sscanf parses each substring and converts it into a number

In the sample code below, the date fields are combined into a single integer. For example, "1992-01-25" becomes the decimal number 19920125. The time fields are combined so that the final result represents the number of milliseconds since midnight.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

bool parseFile(FILE *fpin)
{
    char line[256];
    while (fgets(line, sizeof(line), fpin) != NULL)
    {
        // get the date/time portion of the line
        char *dateToken = strtok(line, "|");
        if (dateToken == NULL)
            return false;

        // extract the floating point values from the line
        float value[4];
        for (int i = 0; i < 4; i++)
        {
            char *token = strtok(NULL, "|");
            if (token == NULL)
                return false;
            if (sscanf(token, "%f", &value[i]) != 1)
                return false;
        }

        // extract the components of the date and time;
        // t and z receive the literal 'T' and 'Z' characters
        int year, month, day, hour, minute, second, millisec;
        char t, z;
        if (sscanf(dateToken, "%d-%d-%d%c%d:%d:%d.%d%c",
                   &year, &month, &day, &t,
                   &hour, &minute, &second, &millisec, &z) != 9)
            return false;

        // combine the components into a single number for the date and time
        int date = year * 10000 + month * 100 + day;
        int time = hour * 3600000 + minute * 60000 + second * 1000 + millisec;

        // display the parsed information
        printf("%d %d", date, time);
        for (int i = 0; i < 4; i++)
            printf(" %6.3f", value[i]);
        printf("\n");
    }

    return true;    // the file was successfully parsed
}
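
Once the date is a plain integer such as 19920125, the search the question asks about becomes an ordinary numeric lookup. The following is only a sketch, assuming the dates have been collected into an int array sorted in ascending order (as they would be in a time-ordered file); findDate and compareInts are hypothetical names, not part of the answer above.

```c
#include <stdlib.h>

/* Comparison callback for bsearch: orders two ints ascending. */
static int compareInts(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns an index at which `key` occurs in the sorted array `dates`,
 * or -1 if it is absent. Assumption: `dates` is sorted ascending. */
int findDate(const int *dates, size_t count, int key)
{
    const int *hit = bsearch(&key, dates, count, sizeof dates[0], compareInts);
    return hit ? (int)(hit - dates) : -1;
}
```

Calling findDate(dates, n, 19920125) gives the index of a row with that date. If several rows share a date, bsearch may return any one of them, so a caller wanting the whole range would scan outward from that index.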

#2


1  

If I were you, I'd make a structure that holds the table of data after parsing the file.


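/* Placeholder limits; the values are assumptions, adjust them to your file. */
#define MAX_NUM_ROWS 1000
#define MAX_NUM_COLS 6      /* date, time, and four values */
#define MAX_COL_LEN  32
#define MAX_ROW_LEN  256

typedef struct {
   int num_rows;
   char table[MAX_NUM_ROWS][MAX_NUM_COLS][MAX_COL_LEN];
} YOUR_DATA;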

You should use fgets to read the file line by line. First tokenize the line on the 'T', and from then on tokenize it on '|', something like this:

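/* Fragment: place inside a function. Needs <stdio.h> and <string.h>.
   strsep is a BSD/glibc extension, not part of ISO C. */
FILE *your_fp;
YOUR_DATA yourTable;
char line[MAX_ROW_LEN] = {0};
char *ptr = NULL, *field = NULL;
int row = 0, col = 0;
if ((your_fp = fopen("datafile.txt", "r")) == NULL) {
   perror("datafile.txt");
   return 1;   // assumes the enclosing function returns int
}
while (row < MAX_NUM_ROWS && fgets(line, sizeof(line), your_fp) != NULL) {
   line[strcspn(line, "\n")] = '\0';   // drop the newline kept by fgets
   ptr = line;
   col = 0;
   // the date is everything before the 'T'
   if ((field = strsep(&ptr, "T")) != NULL) {
      snprintf(yourTable.table[row][col], MAX_COL_LEN, "%s", field);
      col++;
   }
   // the time and the four values are separated by '|'
   while (col < MAX_NUM_COLS && (field = strsep(&ptr, "|")) != NULL) {
      snprintf(yourTable.table[row][col], MAX_COL_LEN, "%s", field);
      col++;
   }
   row++;
}
yourTable.num_rows = row;
fclose(your_fp);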

You will probably want to keep track of the number of rows in the table, etc. Then you can worry about converting the fields into the correct data types.
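
For the conversion step, one option is strtod, which reports where parsing stopped so that malformed fields can be rejected. This is a minimal sketch rather than part of the answer; fieldToDouble is a hypothetical helper name, and it assumes each field is a NUL-terminated string such as one entry of the table above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Converts one stored field to a double.
 * Returns false if the text is empty, not a number, out of range,
 * or followed by anything other than whitespace. */
bool fieldToDouble(const char *field, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(field, &end);
    if (end == field || errno == ERANGE)
        return false;
    /* tolerate trailing whitespace, e.g. the newline fgets leaves behind */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return false;
    *out = v;
    return true;
}
```

A caller would loop over the value columns of each row and store the results in a double array, leaving the date and time columns to be handled separately.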
