Loop through netCDF files and run calculations - Python or R

Time: 2022-02-20 05:33:29

This is my first time using netCDF and I'm trying to wrap my head around working with it.

I have multiple netCDF version 3 files (NOAA NARR air.2m daily averages, each covering an entire year). The files span the years 1979 to 2012, one year per file. They are 349 x 277 grids with approximately 32 km resolution. Data was downloaded from here.

The time dimension is in hours since 1/1/1800, and my variable of interest is air. I need to calculate the accumulated days with a temperature below 0. For example:

    Day 1 = +4 degrees, accumulated days = 0
    Day 2 = -1 degrees, accumulated days = 1
    Day 3 = -2 degrees, accumulated days = 2
    Day 4 = -4 degrees, accumulated days = 3
    Day 5 = +2 degrees, accumulated days = 0
    Day 6 = -3 degrees, accumulated days = 1
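A minimal sketch of the counting rule in this example for a single grid cell, in plain Python. It assumes the values are already in degrees Celsius (the NARR air variable is stored in Kelvin) and treats a day at exactly 0 as not cold; the function name is hypothetical.

```python
def accumulated_cold_days(temps, threshold=0.0):
    """Return the running count of consecutive days below `threshold`.

    The counter grows by one for each day colder than the threshold and
    resets to zero on any day at or above it.
    """
    counts = []
    run = 0
    for t in temps:
        run = run + 1 if t < threshold else 0
        counts.append(run)
    return counts


# The example from the question, in degrees Celsius
print(accumulated_cold_days([4, -1, -2, -4, 2, -3]))  # [0, 1, 2, 3, 0, 1]
```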

I need to store this data in a new netCDF file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous day's value and, based on that, write a value to a new netCDF file with exactly the same dimensions and variable? Or should I just add another variable holding this output to the original netCDF file?
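One way to loop through each day while carrying the previous day's count is to keep a running (y, x) counter and update it for the whole grid at once with NumPy. This is a sketch under stated assumptions: the hypothetical function accumulated_cold_days_grid takes an array shaped (time, y, x) in Kelvin, such as f.variables['air'][:] read from one yearly file, and returns an integer array of the same shape, which could then be written to a new variable that presumably uses the same ('time', 'y', 'x') dimensions. The optional initial argument carries the counter across a file boundary, so the yearly files could be processed separately instead of being concatenated. Masked values are not treated specially.

```python
import numpy as np


def accumulated_cold_days_grid(air_kelvin, initial=None, freezing=273.15):
    """Running count of consecutive sub-freezing days for every grid cell.

    air_kelvin: array shaped (time, y, x) with temperatures in Kelvin.
    initial: optional (y, x) counter carried over from the last day of a
    previous file, so yearly files can be processed one at a time.
    Returns an int32 array with the same shape as air_kelvin.
    """
    air_kelvin = np.asarray(air_kelvin)
    ntimes, ny, nx = air_kelvin.shape
    out = np.zeros((ntimes, ny, nx), dtype=np.int32)
    if initial is None:
        run = np.zeros((ny, nx), dtype=np.int32)
    else:
        run = np.array(initial, dtype=np.int32)
    for i in range(ntimes):
        cold = air_kelvin[i] < freezing
        # Add one where today is below freezing, reset to 0 everywhere else
        run = np.where(cold, run + 1, 0)
        out[i] = run
    return out


# The question's example as a single grid cell, converted to Kelvin
air = (np.array([4, -1, -2, -4, 2, -3], dtype=float) + 273.15).reshape(6, 1, 1)
print(accumulated_cold_days_grid(air)[:, 0, 0])  # [0 1 2 3 0 1]
```

Looping over time while vectorizing over space keeps only one (y, x) counter in memory, which matters for a 349 x 277 grid over many years.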

Is it best to leave all the files separate or to combine them? I combined them with ncrcat and it worked fine, but the combined file is 2.3 GB.

Thanks for the input.

My current progress in Python:

import numpy
import netCDF4

# (change the working directory to where the data are first)
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
    print(a)

# Output:
#   lat
#   lon
#   x
#   y
#   Lambert_Conformal
#   time
#   time_bnds
#   air

print(f.variables['air'][1, 1, 1])
# Output:
#   298.37473

To help me understand this better: what type of data structure am I working with? Is ['air'] the key in the example above, and are [1,1,1] also keys used to get the value 298.37473? How can I then loop through [1,1,1]?
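In netCDF4, f.variables is a dictionary whose keys are variable names ('air' is one of them) and whose values are Variable objects. [1, 1, 1] is not a set of keys but an index into the three dimensions of that variable (presumably time, y and x here; f.variables['air'].dimensions should list them), and slicing a Variable returns a NumPy (masked) array. The sketch below uses a plain dict of NumPy arrays as a hypothetical stand-in for the file, since both are indexed the same way.

```python
import numpy as np

# Hypothetical stand-in for f.variables: a dict mapping variable names to
# arrays. netCDF4 Variable objects are indexed the same way as these arrays.
variables = {"air": np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)}

air = variables["air"]   # "air" is a dictionary key
print(air.shape)         # (2, 3, 4): one size per dimension (time, y, x)
print(air[1, 1, 1])      # 17.0: [1, 1, 1] is a position, not a set of keys

# Visit every (time, y, x) position one value at a time (slow for big grids)
total = 0.0
for t, j, i in np.ndindex(air.shape):
    total += air[t, j, i]
print(float(total))      # 276.0

# Usually better: take one whole day, a 2-D (y, x) slice, per iteration
daily_means = [float(air[t, :, :].mean()) for t in range(air.shape[0])]
print(daily_means)       # [5.5, 17.5]
```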

3 solutions

#1


You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So your code would look like this:

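import numpy as np
import matplotlib.pyplot as plt
import netCDF4

# treat all matching files as one dataset aggregated along time
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
print(f.variables.keys())

atemp = f.variables['air']
print(atemp)

ntimes, ny, nx = atemp.shape
cold_days = np.zeros((ny, nx), dtype=int)

# read one day at a time; air is in Kelvin, so subtract 273.15 for degC
# (np.ma.getdata returns the raw values whether or not the slice is masked)
for i in range(ntimes):
    cold_days += np.ma.getdata(atemp[i, :, :]) - 273.15 < 0

plt.pcolormesh(cold_days)
plt.colorbar()
plt.show()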
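The loop works because each comparison yields a boolean (y, x) array that NumPy adds to the integer counter as 0 or 1. A small sketch on made-up data (hypothetical random values, not NARR) checks that it matches a one-shot count along the time axis; the one-shot version needs every day in memory at once, which is why reading one day at a time is attractive for a multi-gigabyte aggregate.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily air temperatures in Kelvin, shaped (time, y, x)
atemp = 273.15 + rng.normal(0.0, 5.0, size=(365, 4, 5))

ntimes, ny, nx = atemp.shape
cold_days = np.zeros((ny, nx), dtype=int)
for i in range(ntimes):
    # (atemp[i] - 273.15) < 0 is a boolean array; += adds it as 0/1
    cold_days += atemp[i, :, :] - 273.15 < 0

# Same result in one step, at the cost of holding every day in memory
cold_days_vectorized = np.count_nonzero(atemp < 273.15, axis=0)
print(np.array_equal(cold_days, cold_days_vectorized))  # True
```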

And here's one way to write the file (there might be easier ways):

# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc', 'w', clobber=True)
nco.createDimension('x', nx)
nco.createDimension('y', ny)

cold_days_v = nco.createVariable('cold_days', 'i4', ('y', 'x'))
cold_days_v.units = 'days'
cold_days_v.long_name = 'total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon', 'f4', ('y', 'x'))
lato = nco.createVariable('lat', 'f4', ('y', 'x'))
xo = nco.createVariable('x', 'f4', ('x',))
yo = nco.createVariable('y', 'f4', ('y',))
# scalar variable that only carries the projection attributes
lco = nco.createVariable('Lambert_Conformal', 'i4')

# copy all the variable attributes from the original file
# (_FillValue can only be set when a variable is created, so skip it)
for var in ['lon', 'lat', 'x', 'y', 'Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        if att != '_FillValue':
            setattr(nco.variables[var], att, getattr(f.variables[var], att))

# copy variable data for lon, lat, x and y
lono[:] = f.variables['lon'][:]
lato[:] = f.variables['lat'][:]
xo[:] = f.variables['x'][:]
yo[:] = f.variables['y'][:]

# write the cold_days data
cold_days_v[:, :] = cold_days

# copy global attributes from the original file
for att in f.ncattrs():
    setattr(nco, att, getattr(f, att))

nco.Conventions = 'CF-1.6'
nco.close()

Looking at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay. Also note that here I only downloaded two of the datasets for testing, so I used

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')

as an example. For all the data, you could use

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')

or

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')
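MFDataset accepts either a list of files or a string with wildcards, which it expands with glob. The sketch below uses fnmatch (the same shell-style matching that glob relies on) on a hypothetical list of file names for 1979 to 2012 to show what each pattern picks up; note that 19?? only covers 1979 to 1999.

```python
from fnmatch import fnmatch

# Hypothetical file names following the air.2m.YYYY.nc pattern, 1979-2012
names = [f"air.2m.{year}.nc" for year in range(1979, 2013)]

for pattern in ["air.2m.19??.nc", "air.2m.????.nc", "air.2m.*.nc"]:
    matched = [n for n in names if fnmatch(n, pattern)]
    print(pattern, len(matched), matched[0], matched[-1])

# Prints:
# air.2m.19??.nc 21 air.2m.1979.nc air.2m.1999.nc
# air.2m.????.nc 34 air.2m.1979.nc air.2m.2012.nc
# air.2m.*.nc 34 air.2m.1979.nc air.2m.2012.nc
```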

#2


Here is an R solution.

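library(raster)

# list the yearly files; the regex only matches names ending in .nc
infiles <- list.files("data", pattern = "\\.nc$", full.names = TRUE)

outfile <- "data/air.colddays.nc"

# stack the daily layers of the 'air' variable from all files
r <- raster::stack(infiles, varname = "air")
# Kelvin -> degC, flag days below 0, then count them per grid cell
r <- sum((r - 273.15) < 0)

plot(r)

# save the result as netCDF (needs the ncdf4 package)
writeRaster(r, outfile, format = "CDF", overwrite = TRUE)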

#3


I know this is rather late for this thread from 2013, but I want to point out that the accepted solution doesn't answer the exact question posed. The question seems to ask for the length of each continuous period of temperatures below zero (note that in the question the counter resets when the temperature rises above zero), which can be important for climate applications (e.g. farming), whereas the accepted solution only gives the total number of days in a year with temperatures below zero. If the total really is what mkmitchell wants (it has been accepted as the answer), then it can be done from the command line with CDO, without having to worry about netCDF input/output:

 cdo timsum -lec,273.15 in.nc out.nc
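In this command, -lec,273.15 sets each value to 1 where it is less than or equal to 273.15 and to 0 otherwise, and timsum adds those 0/1 fields over all time steps. A rough NumPy analogue follows, as a sketch with a hypothetical function name and without missing-value handling. It also shows that lec counts a value of exactly 273.15 while the strict "< 0" comparison in the accepted answer does not; with real floating-point data such ties should be rare.

```python
import numpy as np


def count_days_le(air_kelvin, const=273.15):
    """Rough NumPy analogue of `cdo timsum -lec,const` for (time, y, x) data.

    lec gives 1 where a value is <= const and 0 elsewhere; timsum then adds
    those 0/1 fields over the time axis. Missing values are not handled.
    """
    return (np.asarray(air_kelvin) <= const).sum(axis=0)


# Hypothetical single grid cell: +4, exactly 0 and -1 degC, in Kelvin
air = np.array([277.15, 273.15, 272.15]).reshape(3, 1, 1)
print(count_days_le(air)[0, 0])              # 2: <= also counts exactly 0 degC
print((air < 273.15).sum(axis=0)[0, 0])      # 1: strict <, as in answer #1
```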

So a looped script would be:

# loop over all the netcdf files in the directory
for file in *.nc ; do
    # I use 273.15 because, from the question, T seems to be in Kelvin
    # ${file%.nc} strips the .nc extension from the file name
    cdo timsum -lec,273.15 "$file" "${file%.nc}_numdays.nc"
done

If you then want the total over the whole period, you can concatenate the much smaller _numdays files instead and sum them:

cdo cat *_numdays.nc total.nc 
cdo timsum total.nc total_below_zero.nc 

But again, the question seems to want the accumulated days per cold event, which is different and is not provided by the accepted answer.
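To make the difference concrete, here is a plain-Python sketch for a single grid cell (hypothetical function name; values assumed to be in degrees Celsius). It returns the length of each cold event in the question's example alongside the total count that the accepted answer produces. Applying it to a whole grid would mean running it along the time axis of every cell.

```python
from itertools import groupby


def cold_spell_lengths(temps_c, threshold=0.0):
    """Lengths of each uninterrupted run of days below `threshold`."""
    return [
        sum(1 for _ in run)
        for is_cold, run in groupby(temps_c, key=lambda t: t < threshold)
        if is_cold
    ]


temps = [4, -1, -2, -4, 2, -3]           # the example from the question
print(sum(t < 0 for t in temps))          # 4: what the accepted answer counts
print(cold_spell_lengths(temps))          # [3, 1]: length of each cold event
```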
