Q: Saving data from Python to PostgreSQL: array value error

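Stack Overflow user

Asked on 2021-07-04 18:37:03

Answers: 2 | Views: 53 | Followers: 0 | Votes: 1

I'm trying to learn how to save a DataFrame created in pandas to a PostgreSQL database (hosted on Azure). I planned to start with simple dummy data: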

import pandas as pd

data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]
        }

df = pd.DataFrame(data, columns=['a', 'b', 'c'])

I found a function that pushes the df data into a PostgreSQL table. It starts by defining the connection:

import sys

import psycopg2

def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn

conn = connect(param_dic)

*param_dic contains all the connection details (user/pass/host/db). Once the connection is established, I define the execute function:

def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query  = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()

I run this function against a PostgreSQL table that I created in the database:

execute_many(conn,df,"raw_data.test")

The table raw_data.test consists of the columns a (char[]), b (char[]), and c (numeric). When I run the code, I get the following in the console:

Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
                                                ^
DETAIL:  Array value must start with "{" or dimension information.

I don't know how to interpret this, because none of the columns in the df is an array:

df.dtypes
Out[185]: 
a    object
b    object
c     int64
dtype: object

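Any ideas on what is going wrong, or suggestions for a simpler way to save a df to PostgreSQL? I found many solutions that use SQLAlchemy and build a connection string like this: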

conn_string = 'postgres://user:password@host/database'

But I'm not sure whether that works with a cloud database: when I tried editing such a connection string with my Azure host details, it did not work.

azure

python

pandas

postgresql

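2 Answers

Stack Overflow user

Answered on 2021-07-04 19:13:26

The usual data types for strings in PostgreSQL are TEXT, VARCHAR(n), or CHAR(n), written with round parentheses; not CHAR[], with square brackets, which declares an array.

My guess is that you want each column to hold a single string and that CHAR[] is a typo; in that case you need to recreate (or migrate) the table columns with the correct type, most likely TEXT.

(If your data is truly fixed-length, you can use CHAR(n); VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)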
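One way to recreate the table with TEXT columns, as a minimal sketch. It assumes raw_data.test is a disposable test table (dropping it discards any rows), that c stays NUMERIC as in the question, and that conn is an open psycopg2 connection. The names RECREATE_TEST_TABLE and recreate_test_table are hypothetical and used only for illustration.

```python
# Assumed replacement schema: TEXT for the two string columns, NUMERIC for c.
RECREATE_TEST_TABLE = (
    "DROP TABLE IF EXISTS raw_data.test",
    "CREATE TABLE raw_data.test (a TEXT, b TEXT, c NUMERIC)",
)


def recreate_test_table(conn):
    """Run the DDL above on an open DB-API connection and commit it."""
    cursor = conn.cursor()
    try:
        for statement in RECREATE_TEST_TABLE:
            cursor.execute(statement)
        conn.commit()
    finally:
        cursor.close()

# Usage, with the connection from the question:
# recreate_test_table(conn)
# execute_many(conn, df, "raw_data.test")
```

With TEXT columns, the plain strings 'x' and 'z' from the DataFrame no longer have to be array literals, so the original INSERT should be accepted as written.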

Alternately, if you really do want the columns to be arrays, you need to pass a list in that position from Python.
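A small sketch of what passing a list in that position could look like. psycopg2 adapts a Python list to a PostgreSQL array, so wrapping the string values in one-element lists gives the array columns the kind of value they expect. The helper name wrap_for_array_columns and the hard-coded positions (0 and 1, for columns a and b) are assumptions made for this example.

```python
# Same values as the rows of the question's DataFrame.
rows = [("x", "z", 3), ("y", "p", 5)]


def wrap_for_array_columns(row, array_positions=(0, 1)):
    """Wrap the values at array_positions in one-element lists."""
    return tuple(
        [value] if index in array_positions else value
        for index, value in enumerate(row)
    )


tuples = [wrap_for_array_columns(row) for row in rows]
print(tuples)  # [(['x'], ['z'], 3), (['y'], ['p'], 5)]
```

These tuples can then be passed to cursor.executemany() in place of the plain ones built in execute_many().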

Votes: 0

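Stack Overflow user

Answered on 2021-07-04 20:27:48

Consider adjusting your parameterization approach, since psycopg2 supports a better way to format identifiers, such as table or column names, in SQL statements.

In fact, the docs indicate that your current approach is not optimal and poses a security risk: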

# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)

Use the psycopg2.sql module instead:

from psycopg2 import sql 
...

query = (
    sql.SQL("insert into {} values (%s, %s, %s)") 
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)

Also, as a SQL best practice, always include the column names in append (INSERT) queries and do not rely on the column order of the stored table:

query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)") 
    .format(
        sql.Identifier('table'), 
        sql.Identifier('col1'),
        sql.Identifier('col2'), 
        sql.Identifier('col3')
    )
)
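Applied to the question's table, the same idea might look like the sketch below. It builds the column list and one placeholder per column from whatever columns are passed in, instead of hard-coding three. The function name build_insert_query is hypothetical, and passing schema and table as two arguments to sql.Identifier assumes psycopg2 2.8 or later. A single sql.Identifier('raw_data.test') would be quoted as one name containing a dot, which is not the table in the question.

```python
def build_insert_query(schema, table, columns):
    """Compose INSERT INTO schema.table (cols...) VALUES (%s, ...)."""
    columns = list(columns)
    if not columns:
        raise ValueError("at least one column name is required")

    # Imported here so the psycopg2 driver is only needed when a query is built.
    from psycopg2 import sql

    return sql.SQL("insert into {} ({}) values ({})").format(
        # Schema-qualified, quoted table name.
        sql.Identifier(schema, table),
        # Quoted column names, comma-separated.
        sql.SQL(", ").join(map(sql.Identifier, columns)),
        # One %s placeholder per column.
        sql.SQL(", ").join(sql.Placeholder() * len(columns)),
    )

# Usage inside execute_many(), assuming df and tuples from the question:
# query = build_insert_query("raw_data", "test", df.columns)
# cursor.executemany(query, tuples)
```

Only the identifiers are composed into the statement; the row values still travel separately as parameters of executemany().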

Finally, stop using % for string formatting in all of your Python code (not just with psycopg2). As of Python 3, this method has been de-emphasized but not deprecated! Instead, use str.format (Python 2.6+) or f-strings (Python 3.6+).
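For comparison, here are the three styles side by side, using the error message printed in the question's execute_many() as the example. This applies to ordinary strings such as log messages; SQL statements are still better composed with psycopg2.sql as described above.

```python
error = 'malformed array literal: "x"'

old_style = "Error: %s" % error          # %-formatting
new_style = "Error: {}".format(error)    # str.format, Python 2.6+
f_string = f"Error: {error}"             # f-string, Python 3.6+

# All three produce the same text.
assert old_style == new_style == f_string
print(f_string)
```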

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/68243843
