Converting a char string to its underlying data type

I have a string (char*), and I need to find its underlying data type, such as int, float, double, short, or long, or just a character array containing letters with or without digits (like varchar in SQL). For example:

    char* str1 = "12312";
    char* str2 = "231.342";
    char* str3 = "234234243234";
    char* str4 = "4323434.2432342";
    char* str5 = "i contain only alphabets";

Given these strings, I need to determine that the first string is of type int and convert it to an int, and so on. For example:

    int no1 = atoi(str1);
    float no2 = (float) atof(str2);
    long no3 = atol(str3);
    double no4 = strtod(str4, NULL);
    char* varchar1 = strdup(str5);

Clarifying a bit more...

I have a string whose contents could be letters and/or digits and/or special characters. Right now, I am able to parse the string and distinguish these cases:

It contains only digits.
Here I convert the string into a short, int, or long, based on best fit. (How do I know whether the string can be converted to a short, int, or long?)

It contains only letters: leave it as a string.

It contains digits with a single decimal point.
Here I need to convert the string into a float or double. (Same question here.)

Anything else: leave it as a string.

4 solutions

#1

In C (not in C++), I would use a combination of strtod/strtol and the max values from <limits.h> and <float.h>:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <limits.h>
#include <float.h>

/*    Now, we know the following values:
      INT_MAX, INT_MIN, SHRT_MAX, SHRT_MIN, CHAR_MAX, CHAR_MIN, etc.    */

typedef union tagMyUnion
{
   char TChar_ ; short TShort_ ; long TLong_ ; double TDouble_ ;
} MyUnion ;

typedef enum tagMyEnum
{
   TChar, TShort, TLong, TDouble, TNaN
} MyEnum ;

void whatIsTheValue(const char * string_, MyEnum * enum_, MyUnion * union_)
{
   char * endptr ;
   long lValue ;
   double dValue ;

   *enum_ = TNaN ;

   /* integer value */
   errno = 0 ;
   lValue = strtol(string_, &endptr, 10) ;

   /* It is an integer value if something was parsed, the whole string
      was consumed, and the result did not overflow a long */
   if((endptr != string_) && (*endptr == 0) && (errno != ERANGE))
   {
      if((lValue >= CHAR_MIN) && (lValue <= CHAR_MAX)) /* is it a char ? */
      {
         *enum_ = TChar ;
         union_->TChar_ = (char) lValue ;
      }
      else if((lValue >= SHRT_MIN) && (lValue <= SHRT_MAX)) /* is it a short ? */
      {
         *enum_ = TShort ;
         union_->TShort_ = (short) lValue ;
      }
      else /* anything else strtol accepted is a long */
      {
         *enum_ = TLong ;
         union_->TLong_ = (long) lValue ;
      }

      return ;
   }

   /* real value (also reached by integers too large for a long) */
   dValue = strtod(string_, &endptr) ;

   if((endptr != string_) && (*endptr == 0)) /* It is a real value ! */
   {
      if((dValue >= -DBL_MAX) && (dValue <= DBL_MAX)) /* is it a finite double ? */
      {
         *enum_ = TDouble ;
         union_->TDouble_ = (double) dValue ;
      }

      return ;
   }

   return ;
}

void studyValue(const char * string_)
{
   MyEnum enum_ ;
   MyUnion union_ ;

   whatIsTheValue(string_, &enum_, &union_) ;

   switch(enum_)
   {
      case TChar    : printf("[%s] is a char : %li\n", string_, (long) union_.TChar_) ; break ;
      case TShort   : printf("[%s] is a short : %li\n", string_, (long) union_.TShort_) ; break ;
      case TLong    : printf("[%s] is a long : %li\n", string_, (long) union_.TLong_) ; break ;
      case TDouble  : printf("[%s] is a double : %f\n", string_, (double) union_.TDouble_) ; break ;
      case TNaN     : printf("[%s] is a not a number\n", string_) ; break ;
      default       : printf("I really don't know : %s\n", string_) ; break ;
   }
}

int main(int argc, char **argv)
{
   studyValue("25") ;
   studyValue("-25") ;
   studyValue("30000") ;
   studyValue("-30000") ;
   studyValue("300000") ;
   studyValue("-300000") ;
   studyValue("25.5") ;
   studyValue("-25.5") ;
   studyValue("25555555.55555555") ;
   studyValue("-25555555.55555555") ;
   studyValue("Hello World") ;
   studyValue("555-55-55") ;

   return 0;
}

Which results in the following:

[25] is a char : 25
[-25] is a char : -25
[30000] is a short : 30000
[-30000] is a short : -30000
[300000] is a long : 300000
[-300000] is a long : -300000
[25.5] is a double : 25.500000
[-25.5] is a double : -25.500000
[25555555.55555555] is a double : 25555555.555556
[-25555555.55555555] is a double : -25555555.555556
[Hello World] is a not a number
[555-55-55] is a not a number

Sorry for my rusty C.

:-)

So, in substance, after the call to whatIsTheValue, you retrieve the type through the MyEnum enum and then, according to that value, retrieve the correctly typed value from the MyUnion union.

Note that finding out whether the number is a double or a float is a bit more complicated, because the difference lies in the precision, i.e. whether your number is representable in a double or in a float. As most "decimal real" numbers are not exactly representable in a double anyway, I would not bother.
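
If the float/double decision is needed after all, one possible approach is to check whether the parsed value survives a round trip through float. This is only a sketch: the helper name fitsInFloat is hypothetical and not part of the code above.

```c
#include <float.h>

/* Hypothetical helper: returns 1 when d can be stored in a float
   without changing its value, 0 otherwise. */
int fitsInFloat(double d)
{
   volatile float narrowed ; /* volatile forces a real float store */

   /* Out of float range; written so that NaN is rejected as well. */
   if(!((d >= -FLT_MAX) && (d <= FLT_MAX)))
      return 0 ;

   /* Narrow to float and widen back; a difference means precision was lost. */
   narrowed = (float) d ;
   return (double) narrowed == d ;
}
```

With this check, 25.5 fits in a float while 0.1 and 25555555.55555555 do not, which illustrates why most decimal inputs end up classified as double.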

Note, too, that there is a catch, as 25.0 could be both a real and an integer number. By comparing "dValue == (double)(long)dValue", I guess you should be able to tell whether it is an integer, again not taking into account the usual precision problems that come with the binary real numbers used by computers.
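
The comparison mentioned above could be wrapped as follows. This is a sketch with a hypothetical helper name (isWholeNumber); the range check is an addition, because converting an out-of-range double to long is undefined behavior.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when d holds a whole number
   that can safely be converted to a long, 0 otherwise. */
int isWholeNumber(double d)
{
   /* The upper bound is exclusive because (double) LONG_MAX may round up
      to a value that no longer fits in a long. NaN fails this test too. */
   if(!((d >= (double) LONG_MIN) && (d < (double) LONG_MAX)))
      return 0 ;

   return d == (double)(long) d ;
}
```

Here isWholeNumber(25.0) returns 1 and isWholeNumber(25.5) returns 0.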

#2

First, check whether the problem hasn't already been solved for you. It could be that your library functions for converting strings to numbers already do the checks that you need.

Failing that, you're going to need to do some pattern matching on strings, and that's what regular expressions are for!

E.g. if the string matches the regexp:

[+-]?\d+

then you know that it's an int or a long. Convert it to a long, and then check its size. If your long can fit into an int, convert it to an int.
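
A rough sketch of that approach in C, assuming a POSIX system where <regex.h> is available (it is not part of standard C). The names classifyInteger, IS_INT and so on are made up for illustration. POSIX extended syntax has no \d, so the pattern spells out [0-9].

```c
#include <errno.h>
#include <limits.h>
#include <regex.h>   /* POSIX, not standard C */
#include <stdlib.h>

typedef enum { IS_NOT_INTEGER, IS_INT, IS_LONG } IntKind ; /* hypothetical */

/* Returns IS_INT or IS_LONG and stores the value in *out when s is an
   optionally signed run of decimal digits that fits in a long. */
IntKind classifyInteger(const char * s, long * out)
{
   regex_t re ;
   int matched ;
   long value ;

   if(regcomp(&re, "^[+-]?[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
      return IS_NOT_INTEGER ;

   matched = (regexec(&re, s, 0, NULL, 0) == 0) ;
   regfree(&re) ;

   if(!matched)
      return IS_NOT_INTEGER ;

   errno = 0 ;
   value = strtol(s, NULL, 10) ;
   if(errno == ERANGE) /* too large even for a long */
      return IS_NOT_INTEGER ;

   *out = value ;
   return ((value >= INT_MIN) && (value <= INT_MAX)) ? IS_INT : IS_LONG ;
}
```

In real code the pattern would probably be compiled once and reused rather than recompiled on every call.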

You can do the same for floats and doubles, although the regular expression is a bit more complicated.

Watch out for awkward cases like the empty string, a lone decimal point, numbers too large for a long, and so on. You also need to decide whether you will allow exponent notation.
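
To make those awkward cases concrete, here is a hand-written matcher that could stand in for the regular expression. It is a sketch under my own assumptions: the name looksLikeDecimal and the allowExponent flag are hypothetical, and the choice to accept plain integers and forms like "5." and ".5" is one arbitrary policy among several. It only checks the shape of the string; range checking still has to happen at conversion time.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 when s has the form
      [+-] digits [ . digits ] [ (e|E) [+-] digits ]
   with at least one digit in the mantissa. The exponent part is
   accepted only when allowExponent is nonzero. */
int looksLikeDecimal(const char * s, int allowExponent)
{
   int digits = 0 ;

   if((*s == '+') || (*s == '-'))
      ++s ;

   while(isdigit((unsigned char) *s)) { ++s ; ++digits ; }

   if(*s == '.')
   {
      ++s ;
      while(isdigit((unsigned char) *s)) { ++s ; ++digits ; }
   }

   if(digits == 0) /* rejects "", "+", "." and "-." */
      return 0 ;

   if(allowExponent && ((*s == 'e') || (*s == 'E')))
   {
      int expDigits = 0 ;

      ++s ;
      if((*s == '+') || (*s == '-'))
         ++s ;
      while(isdigit((unsigned char) *s)) { ++s ; ++expDigits ; }

      if(expDigits == 0) /* rejects "1e" and "1e-" */
         return 0 ;
   }

   return *s == '\0' ; /* nothing may follow the number */
}
```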

#3

Try getting it into a long with sscanf. If that fails, try getting it into a double with sscanf. If that fails, it's a string. You can use the %n conversion to tell whether all of the input was consumed successfully. The constants in <limits.h> and <float.h> may help you decide if the numeric results can fit into narrower types on your platform. If this isn't homework, your destination types are probably externally defined - e.g. by a database schema - and the latter comment is irrelevant.
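
A minimal sketch of that sequence, with hypothetical names (classifyWithSscanf, KIND_LONG and so on). One caveat: the C standard leaves sscanf's behavior undefined when the converted value does not fit in the target type, so strtol and strtod are the safer choice for untrusted input.

```c
#include <stdio.h>

typedef enum { KIND_LONG, KIND_DOUBLE, KIND_STRING } Kind ; /* hypothetical */

/* Tries long first, then double; anything else is reported as a string.
   %n records how many characters were consumed, so s[n] == '\0' means
   the whole input was used by the conversion. */
Kind classifyWithSscanf(const char * s, long * l, double * d)
{
   int n = 0 ;

   if((sscanf(s, "%ld%n", l, &n) == 1) && (s[n] == '\0'))
      return KIND_LONG ;

   n = 0 ;
   if((sscanf(s, "%lf%n", d, &n) == 1) && (s[n] == '\0'))
      return KIND_DOUBLE ;

   return KIND_STRING ;
}
```

For "12312" this reports a long, for "231.342" a double, and for "12abc" or an empty string it falls through to the string case.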

#4

First of all, you should decide which representations you want to recognize. For example, is 0xBAC0 an unsigned short expressed in hex? The same goes for 010 (in octal) and 1E-2 (for 0.01).
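
If the standard library is doing the conversion, the base argument of strtol is one place where this decision shows up. The sketch below uses a hypothetical wrapper, parseWholeLong, and ignores range errors for brevity. Base 10 reads 010 as ten, while base 0 infers the base from a 0x or 0 prefix.

```c
#include <stdlib.h>

/* Hypothetical helper: parses s as an integer in the given base and
   returns 1 only when the whole string was consumed. Base 0 asks strtol
   to infer the base from a 0x (hex) or 0 (octal) prefix.
   Range errors are not checked in this sketch. */
int parseWholeLong(const char * s, int base, long * out)
{
   char * end ;

   *out = strtol(s, &end, base) ;
   return (end != s) && (*end == '\0') ;
}
```

With base 0, "010" yields 8 and "0xBAC0" yields 47808; with base 10, "010" yields 10 and "0xBAC0" is rejected. The exponent form 1E-2 is rejected by strtol in either base, but strtod accepts it.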

Once you have decided on the representation, you can use regular expressions to determine the general forms. For example:

-?\d*\.\d*([eE]?[+-]?\d*\.\d*)? is a floating-point number (almost: it accepts weird things like .e-. so you should define the regex that is most appropriate for you)

-?\d+ is an integer

0x[0-9A-Fa-f]+ is a hex constant

and so on. If you are not using a regex library, you will have to write a small parser for those representations from scratch.

Now you can convert it to the largest possible type (e.g. long long for integers, double for floating-point numbers) and then use the values in limits.h to see whether the value would fit in a smaller type.

For example, if the integer lies between SHRT_MIN and SHRT_MAX, you can assume it's a short.

You might also have to make arbitrary decisions: for example, with a 16-bit short, 54321 can only be an unsigned short, but 12345 could be a signed short or an unsigned short.
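
The narrowing step might look like the sketch below. The names (smallestFit, FIT_SHORT and so on) are hypothetical, strtoll requires C99, and the rule "prefer the signed type when both fit" is exactly the kind of arbitrary decision mentioned above. Negative values and values beyond long long would need extra handling in a fuller version.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

typedef enum
{
   FIT_SHORT, FIT_USHORT, FIT_INT, FIT_UINT, FIT_LONG, FIT_LLONG,
   FIT_NOT_INTEGER
} SmallestFit ; /* hypothetical */

/* Parses s into the widest signed type, then walks up from the
   narrowest type until the value fits. */
SmallestFit smallestFit(const char * s)
{
   char * end ;
   long long v ;

   errno = 0 ;
   v = strtoll(s, &end, 10) ;
   if((end == s) || (*end != '\0') || (errno == ERANGE))
      return FIT_NOT_INTEGER ;

   /* Arbitrary policy: prefer the signed type whenever both would fit. */
   if((v >= SHRT_MIN) && (v <= SHRT_MAX)) return FIT_SHORT ;
   if((v >= 0) && (v <= USHRT_MAX))       return FIT_USHORT ;
   if((v >= INT_MIN) && (v <= INT_MAX))   return FIT_INT ;
   if((v >= 0) && (v <= UINT_MAX))        return FIT_UINT ;
   if((v >= LONG_MIN) && (v <= LONG_MAX)) return FIT_LONG ;
   return FIT_LLONG ;
}
```

On a platform with a 16-bit short and a 32-bit int, "12345" comes back as FIT_SHORT and "54321" as FIT_USHORT.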
