Python blaze (pandas): cannot safely convert passed user dtype

Time: 2022-10-10 23:03:23

I want to read the uk.txt file from a UK NGA GeoNames download using Python blaze, and then use odo to insert it into a PostgreSQL database.

The code is:

import blaze as bz
from odo import odo

dataPath = 'uk.txt'
myData = bz.Data(dataPath, sep='\t')
out = odo(myData, 'postgresql://postgres:postgres@localhost:5432/blaze_test::uk_geonames')

I get this error:

ValueError: cannot safely convert passed user dtype of <i8 for object dtyped data in column 0

I take this to mean that the data in a column cannot be converted to the type needed for insertion into the database.

Should I force the dtype to a particular value? How would I fix this?

A sample of the input file:

RC  UFI UNI LAT LONG    DMS_LAT DMS_LONG    MGRS    JOG FC  DSG PC  CC1 ADM1    POP ELEV    CC2 NT  LC  SHORT_FORM  GENERIC SORT_NAME_RO    FULL_NAME_RO    FULL_NAME_ND_RO SORT_NAME_RG    FULL_NAME_RG    FULL_NAME_ND_RG NOTE    MODIFY_DATE DISPLAY NAME_RANK   NAME_LINK   TRANSL_CD   NM_MODIFY_DATE

1   380952  475802  54.086111   -6.655556   540510  -63920  29UPV5334795644 NN29-06 H   STM     EI,UK               EI,UK   N       Clarebane       CLAREBANERIVER  Clarebane River Clarebane River CLAREBANERIVER  Clarebane River Clarebane River     2014-06-27  1,2,3   2

1 answer

#1


For some reason, the header isn't being inferred correctly, so the header row is treated as data and column 0 ends up holding strings instead of integers. You can state explicitly that the file has a header by passing the has_header keyword argument, like so:

In [12]: from blaze import Data

In [13]: from odo import CSV, odo

In [14]: d = Data(CSV('uk.txt', sep='\t', has_header=True))

In [15]: d.head(5)
Out[15]:
   RC     UFI     UNI        LAT      LONG  DMS_LAT  DMS_LONG  \
0   1  380952  475802  54.086111 -6.655556   540510    -63920
1   1  380952  475801  54.086111 -6.655556   540510    -63920
2   1  380954  475805  54.104722 -6.648889   540617    -63856
3   1  380955  475806  54.098056 -6.644167   540553    -63839
4   1  380958  475810  54.040556 -6.614444   540226    -63652

              MGRS      JOG FC      ...          SORT_NAME_RG  \
0  29UPV5334795644  NN29-06  H      ...        CLAREBANERIVER
1  29UPV5334795644  NN29-06  H      ...             CLAREBANE
2  29UPV5371497729  NN29-06  H      ...           ALINA LOUGH
3  29UPV5404796997  NN29-06  H      ...          CORLISSLOUGH
4  29UPV5620690667  NN29-06  H      ...          DRUMBOYLOUGH

      FULL_NAME_RG  FULL_NAME_ND_RG NOTE MODIFY_DATE DISPLAY NAME_RANK  \
0  Clarebane River  Clarebane River  NaN  2014-06-27   1,2,3         2
1        Clarebane        Clarebane  NaN  2014-06-27   1,2,3         1
2     Alina, Lough     Alina, Lough  NaN  2014-06-27   1,2,3         1
3    Corliss Lough    Corliss Lough  NaN  2014-06-27   1,2,3         1
4    Drumboy Lough    Drumboy Lough  NaN  2014-06-27   1,2,3         1

  NAME_LINK TRANSL_CD NM_MODIFY_DATE
0       NaN       NaN     2014-06-27
1       NaN       NaN     2014-06-27
2       NaN       NaN     2014-06-27
3       NaN       NaN     2014-06-27
4       NaN       NaN     2014-06-27

[5 rows x 34 columns]
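To illustrate why the header matters, here is a minimal pandas-only sketch. It assumes a hypothetical three-column, two-row stand-in for uk.txt and is not the code path blaze or odo actually runs. It only shows that a header row read as data prevents the numeric columns from parsing as integers, which is consistent with the error above.

```python
import io

import pandas as pd

# Hypothetical stand-in for uk.txt, trimmed to three columns and two rows.
sample = "RC\tUFI\tUNI\n1\t380952\t475802\n1\t380952\t475801\n"

# Header row read as data: each column mixes a name with numbers,
# so pandas falls back to a string/object dtype.
no_header = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Header row consumed as column names: the values parse as integers.
with_header = pd.read_csv(io.StringIO(sample), sep="\t", header=0)

print(no_header.dtypes.tolist())
print(with_header.dtypes.tolist())
```

With the header declared, the integer dtype expected for column 0 matches the parsed data, so no conversion error is needed.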

After that, simply use odo to load it into the desired table:

In [16]: t = odo(d, 'postgresql://localhost::uk')

In [17]: uk = Data(t)

In [19]: uk.head(5)
Out[19]:
   RC     UFI     UNI        LAT      LONG  DMS_LAT  DMS_LONG  \
0   1  380952  475802  54.086111 -6.655556   540510    -63920
1   1  380952  475801  54.086111 -6.655556   540510    -63920
2   1  380954  475805  54.104722 -6.648889   540617    -63856
3   1  380955  475806  54.098056 -6.644167   540553    -63839
4   1  380958  475810  54.040556 -6.614444   540226    -63652

              MGRS      JOG FC      ...          SORT_NAME_RG  \
0  29UPV5334795644  NN29-06  H      ...        CLAREBANERIVER
1  29UPV5334795644  NN29-06  H      ...             CLAREBANE
2  29UPV5371497729  NN29-06  H      ...           ALINA LOUGH
3  29UPV5404796997  NN29-06  H      ...          CORLISSLOUGH
4  29UPV5620690667  NN29-06  H      ...          DRUMBOYLOUGH

      FULL_NAME_RG  FULL_NAME_ND_RG NOTE MODIFY_DATE DISPLAY NAME_RANK  \
0  Clarebane River  Clarebane River  NaN  2014-06-27   1,2,3         2
1        Clarebane        Clarebane  NaN  2014-06-27   1,2,3         1
2     Alina, Lough     Alina, Lough  NaN  2014-06-27   1,2,3         1
3    Corliss Lough    Corliss Lough  NaN  2014-06-27   1,2,3         1
4    Drumboy Lough    Drumboy Lough  NaN  2014-06-27   1,2,3         1

  NAME_LINK TRANSL_CD NM_MODIFY_DATE
0       NaN       NaN     2014-06-27
1       NaN       NaN     2014-06-27
2       NaN       NaN     2014-06-27
3       NaN       NaN     2014-06-27
4       NaN       NaN     2014-06-27

[5 rows x 34 columns]
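If blaze and odo are not available, the same read-then-load step can be sketched with pandas alone. This is an alternative technique, not what the answer above uses. The sketch assumes a hypothetical trimmed sample in place of uk.txt and uses an in-memory SQLite database as a stand-in for PostgreSQL, so that it runs without a server.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for uk.txt, trimmed to three columns and two rows.
sample = "RC\tUFI\tUNI\n1\t380952\t475802\n1\t380952\t475801\n"

# Read the tab-separated data, taking column names from the first row.
frame = pd.read_csv(io.StringIO(sample), sep="\t", header=0)

# In-memory SQLite is only a stand-in here; it is not PostgreSQL.
conn = sqlite3.connect(":memory:")
frame.to_sql("uk", conn, index=False, if_exists="replace")

# Read the table back to confirm the rows arrived.
loaded = pd.read_sql_query("SELECT * FROM uk", conn)
conn.close()

print(loaded)
```

For a real PostgreSQL target, to_sql would need a SQLAlchemy engine for that database in place of the sqlite3 connection.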
