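I have this netCDF weather data (one of thousands of files that need to be ingested into PostgreSQL). I can currently insert each band into a PostGIS-enabled table at about 20-23 seconds per band. (That is for monthly data; there is also some daily data that I have not tested yet.)
I have heard of different ways to speed this up, such as using COPY FROM, removing the gid, and using SSDs. But I am new to Python and do not know how to store the netCDF data in something that COPY FROM can read, or what the best route would be.
If anyone has any other ideas on how to speed this up, please share!
Here is the ingestion script: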
import netCDF4, psycopg2, time

# Establish connection
db1 = psycopg2.connect("host=localhost dbname=postgis_test user=********** password=********")
cur = db1.cursor()

# Create table in PostGIS
print(str(time.ctime()) + " CREATING TABLE")
try:
    cur.execute("DROP TABLE IF EXISTS table_name;")
    db1.commit()
    cur.execute(
        "CREATE TABLE table_name (gid serial PRIMARY KEY not null, thedate DATE, thepoint geometry, lon decimal, lat decimal, thevalue decimal);")
    db1.commit()
    print("TABLE CREATED")
except psycopg2.DatabaseError as e:
    # Print the actual error rather than the exception class
    print(e)
    print("TABLE CREATION FAILED")

# Read coordinates, dates and values from the netCDF file
rawvalue_nc_file = 'netcdf_file.nc'
nc = netCDF4.Dataset(rawvalue_nc_file, mode='r')
nc.variables.keys()
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:], time_var.units)
newtime = [fdate.strftime('%Y-%m-%d') for fdate in dtime]
rawvalue = nc.variables['tx_max'][:]

# Map each grid index to its latitude / longitude as plain Python floats
lathash = {}
lonhash = {}
entry1 = 0
entry2 = 0
lattemp = nc.variables['lat'][:].tolist()
for entry1 in range(lat.size):
    lathash[entry1] = lattemp[entry1]
lontemp = nc.variables['lon'][:].tolist()
for entry2 in range(lon.size):
    lonhash[entry2] = lontemp[entry2]

# Insert one row per grid cell, one timestep (band) at a time
for timestep in range(dtime.size):
    print(str(time.ctime()) + " " + str(timestep + 1) + "/180")
    for _lon in range(lon.size):
        for _lat in range(lat.size):
            latitude = round(lathash[_lat], 6)
            longitude = round(lonhash[_lon], 6)
            thedate = newtime[timestep]
            # Convert Kelvin to degrees Celsius
            thevalue = round(float(rawvalue.data[timestep, _lat, _lon] - 273.15), 3)
            # Skip implausible values (presumably missing-data fill values)
            if thevalue > -100:
                cur.execute("INSERT INTO table_name (thedate, thepoint, thevalue) VALUES (%s, ST_MakePoint(%s,%s,0), %s)", (thedate, longitude, latitude, thevalue))
    db1.commit()
cur.close()
db1.close()
print(" Done!")

Posted on 2019-01-22 03:48:22
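If you are sure that most of the time is spent in PostgreSQL, and not in any other code of your own, you may want to look at psycopg2's fast execution helpers, namely psycopg2.extras.execute_values() in your case.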
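To check that premise before changing the insert strategy, a rough timing harness can help. The sketch below is only an illustration: `timed` and `timings` are hypothetical helper names (not part of psycopg2 or netCDF4), and the two workloads are placeholders for the row-building loop and the database call in the real script.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (hypothetical helper, not a library API).
timings = {}


@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Placeholder for the Python-side work (rounding, unit conversion, tuple building).
with timed("build_rows"):
    rows = [(i, round(i * 0.5, 3)) for i in range(10_000)]

# Placeholder for the database call (cur.execute / execute_values in the real script).
with timed("database"):
    time.sleep(0.01)

for label, seconds in sorted(timings.items()):
    print(f"{label}: {seconds:.3f}s")
```

If the "database" share turns out to be small, batching the inserts will not help much and the Python loops are the part to optimise.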
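Also, you may want to make sure you are in a transaction, so the database does not fall back to autocommit mode. ("If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it.") Note that psycopg2 opens a transaction implicitly on the first statement unless autocommit is enabled on the connection, so an explicit BEGIN mainly makes the intent visible.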
Something like this should do the trick, although it has not been tested.
import psycopg2.extras

for timestep in range(dtime.size):
    print(str(time.ctime()) + " " + str(timestep + 1) + "/180")
    values = []
    cur.execute("BEGIN")
    for _lon in range(lon.size):
        for _lat in range(lat.size):
            latitude = round(lathash[_lat], 6)
            longitude = round(lonhash[_lon], 6)
            thedate = newtime[timestep]
            thevalue = round(
                float(rawvalue.data[timestep, _lat, _lon] - 273.15), 3
            )
            if thevalue > -100:
                values.append((thedate, longitude, latitude, thevalue))
    # Send all rows of this timestep as multi-row INSERT statements
    psycopg2.extras.execute_values(
        cur,
        "INSERT INTO table_name (thedate, thepoint, thevalue) VALUES %s",
        values,
        template="(%s, ST_MakePoint(%s,%s,0), %s)"
    )
    db1.commit()

https://stackoverflow.com/questions/54296801
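For the COPY FROM route mentioned in the question, one possible approach is to write the rows into an in-memory CSV buffer and stream it with psycopg2's `cursor.copy_expert()`. This is a minimal, untested-against-a-real-database sketch: the helper names `rows_to_csv_buffer`, `copy_rows` and `fill_points` are hypothetical, and it assumes the rows are loaded into the existing `lon` and `lat` columns first, with `thepoint` filled in by a follow-up UPDATE.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize (thedate, lon, lat, thevalue) tuples into an in-memory CSV file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so COPY reads from the beginning
    return buf


def copy_rows(cur, rows, table="table_name"):
    """Bulk-load rows with COPY ... FROM STDIN through cursor.copy_expert()."""
    sql = f"COPY {table} (thedate, lon, lat, thevalue) FROM STDIN WITH (FORMAT csv)"
    cur.copy_expert(sql, rows_to_csv_buffer(rows))


def fill_points(cur, table="table_name"):
    """Assumed follow-up step: build the geometry from the loaded lon/lat columns."""
    cur.execute(
        f"UPDATE {table} SET thepoint = ST_MakePoint(lon, lat, 0) "
        "WHERE thepoint IS NULL"
    )


# Example rows in the same shape the loops above would produce (made-up values).
sample_rows = [
    ("2019-01-01", -105.25, 40.0, 12.345),
    ("2019-01-01", -105.0, 40.0, -3.2),
]
print(rows_to_csv_buffer(sample_rows).getvalue())
```

In the ingestion script, `copy_rows(cur, values)` would take the place of the `execute_values` call for each timestep, followed by `fill_points(cur)` and `db1.commit()`. Whether this is faster than `execute_values` for this data would need to be measured.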