首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >如何扩展/重用Python C扩展/API实现?

如何扩展/重用Python C扩展/API实现?
EN

Stack Overflow用户
提问于 2019-05-24 13:25:05
回答 1查看 135关注 0票数 0

问题是,现在我必须使用Posix C getline函数从文件中获取行,然后使用PyUnicode_DecodeUTF8将其转换为Python Unicode对象,并使用我的caching policy算法对其进行缓存。与Python内置的for line in file C实现相比,这个过程的性能要好得多。

如果我从代码中删除了PyUnicode_DecodeUTF8调用,那么我使用Posix getline的实现就会变得比使用Python内置的for line in file C实现更快。因此,如果我可以让Python直接给我一个Python Unicode字符串对象,而不是必须首先调用Posix C getline函数(然后才将其结果转换为Python Unicode对象),我的代码性能将几乎提高20% (从最大的23%),也就是说,它不会成为100%等同于for line in file的性能,因为我通过缓存东西做了一些工作,但是这个开销是最小的。

例如,我想在我的代码中使用_textiowrapper_readline()函数,如下所示:

代码语言:javascript
复制
#include <Python.h>
#include <textio.c.h> // C Python file defininig:
                      // _textiowrapper_readline(),
                      // CHECK_ATTACHED(),
                      // PyUnicode_READY(), etc

typedef struct
{
    PyObject_HEAD
}
PyMymoduleExtendingPython;

static PyObject* 
PyMymoduleExtendingPython_iternext(PyMymoduleExtendingPython* self, PyObject* args)
{
    PyObject *line;
    CHECK_ATTACHED(self);
    line = _textiowrapper_readline(self, -1); // <- function from `textio.c`

    if (line == NULL || PyUnicode_READY(line) == -1)
        return NULL;

    if (PyUnicode_GET_LENGTH(line) == 0) {
        /* Reached EOF or would have blocked */
        Py_DECREF(line);
        Py_CLEAR(self->snapshot);
        self->telling = self->seekable;
        return NULL;
    }
代码语言:javascript
复制
    return line;
}

// create my module
PyMODINIT_FUNC PyInit_mymodule_extending_python_api(void)
{
    PyObject* mymodule;
    PyMymoduleExtendingPython.tp_iternext = 
           (iternextfunc) PyMymoduleExtendingPython_iternext;

    Py_INCREF( &PyMymoduleExtendingPython );
    PyModule_AddObject( mymodule, "FastFile", (PyObject*) &PyMymoduleExtendingPython );
    return mymodule;
}

我如何包含来自C Python的textio实现,并在我自己的Python C扩展/API上重用它的代码?

正如我在上一个问题How to improve Python C Extensions file line reading?中所提到的,用来读取行的Python内置方法比我自己用C或C++标准方法从文件中获取行要快得多。

this answer上,有人建议我通过读取8KB的数据块来重新实现Python算法,然后调用PyUnicode_DecodeUTF8来解码它们,而不是在我读取的每一行都调用PyUnicode_DecodeUTF8

但是,我不需要重写所有已经编写/完成/准备读取行的C Python代码,我只需调用它的"getline“函数_textiowrapper_readline()直接将行作为Python Unicode对象获取,然后缓存/使用,就像我已经对从Posix C getline函数获得的行所做的那样(并传递给PyUnicode_DecodeUTF8()将它们解码为Python Unicode对象)。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-05-27 05:26:30

我没有设法直接导入C API (扩展)函数,但我使用Python导入了io模块,该模块具有作为io.open()的全局内置函数open的链接/引用。

代码语言:javascript
复制
bool hasfinished;
const char* filepath;
long long int linecount;
std::deque<PyObject*> linecache;

PyObject* iomodule;
PyObject* openfile;
PyObject* fileiterator;

FastFile(const char* filepath) : hasfinished(false), filepath(filepath), linecount(0) {
    iomodule = PyImport_ImportModule( "io" );

    if( iomodule == NULL ) {
        std::cerr << "ERROR: FastFile failed to import the io module '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    PyObject* openfunction = PyObject_GetAttrString( iomodule, "open" );
代码语言:javascript
复制
    if( openfunction == NULL ) {
        std::cerr << "ERROR: FastFile failed get the io module open function '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    openfile = PyObject_CallFunction( openfunction, "s", filepath, 
            "s", "r", "i", -1, "s", "UTF8", "s", "replace" );

    PyObject* iterfunction = PyObject_GetAttrString( openfile, "__iter__" );
    Py_DECREF( openfunction );

    if( iterfunction == NULL ) {
        std::cerr << "ERROR: FastFile failed get the io module iterator function '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    PyObject* openfileresult = PyObject_CallObject( iterfunction, NULL );
    Py_DECREF( iterfunction );
代码语言:javascript
复制
    if( openfileresult == NULL ) {
        std::cerr << "ERROR: FastFile failed get the io module iterator object '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    fileiterator = PyObject_GetAttrString( openfile, "__next__" );
    Py_DECREF( openfileresult );

    if( fileiterator == NULL ) {
        std::cerr << "ERROR: FastFile failed get the io module iterator object '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
}

~FastFile() {
    this->close();
    Py_XDECREF( iomodule );
    Py_XDECREF( openfile );
    Py_XDECREF( fileiterator );

    for( PyObject* pyobject : linecache ) {
        Py_DECREF( pyobject );
    }
}

void close() {
    PyObject* closefunction = PyObject_GetAttrString( openfile, "close" );
代码语言:javascript
复制
    if( closefunction == NULL ) {
        std::cerr << "ERROR: FastFile failed get the close file function for '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    PyObject* closefileresult = PyObject_CallObject( closefunction, NULL );
    Py_DECREF( closefunction );

    if( closefileresult == NULL ) {
        std::cerr << "ERROR: FastFile failed close open file '"
                << filepath << "')!" << std::endl;
        PyErr_Print();
        return;
    }
    Py_DECREF( closefileresult );
}

bool _getline() {
    // Fix StopIteration being raised multiple times because 
    // _getlines is called multiple times
    if( hasfinished ) { return false; }
    PyObject* readline = PyObject_CallObject( fileiterator, NULL );

    if( readline != NULL ) {
        linecount += 1;
        linecache.push_back( readline );
        return true;
    }

    // PyErr_Print();
    PyErr_Clear();
    hasfinished = true;
    return false;
}

在使用Visual Studio Compiler进行编译时,使用this code时,它具有以下性能

代码语言:javascript
复制
print( 'fastfile_time %.2f%%, python_time %.2f%%' % ( 
        fastfile_time/python_time, python_time/fastfile_time ), flush=True )
代码语言:javascript
复制
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.985254
FastFile timedifference 0:00:01.084283
fastfile_time 1.10%, python_time 0.91% = 0.09%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.979861
FastFile timedifference 0:00:01.073879
fastfile_time 1.10%, python_time 0.91% = 0.09%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.990369
FastFile timedifference 0:00:01.086416
fastfile_time 1.10%, python_time 0.91% = 0.09%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.975223
FastFile timedifference 0:00:01.077857
fastfile_time 1.11%, python_time 0.90% = 0.10%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.988327
FastFile timedifference 0:00:01.085866
fastfile_time 1.10%, python_time 0.91% = 0.09%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.971848
FastFile timedifference 0:00:01.087894
fastfile_time 1.12%, python_time 0.89% = 0.11%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.968116
FastFile timedifference 0:00:01.079976
fastfile_time 1.12%, python_time 0.90% = 0.10%
$ python3 fastfileperformance.py
Python   timedifference 0:00:00.980856
FastFile timedifference 0:00:01.068325
fastfile_time 1.09%, python_time 0.92% = 0.08%

但是当用g++编译它时,它的性能是这样的:

代码语言:javascript
复制
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.703964
FastFile timedifference 0:00:00.813478
fastfile_time 1.16%, python_time 0.87% = 0.13%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.703432
FastFile timedifference 0:00:00.809531
fastfile_time 1.15%, python_time 0.87% = 0.13%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.705319
FastFile timedifference 0:00:00.814130
fastfile_time 1.15%, python_time 0.87% = 0.13%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.711852
FastFile timedifference 0:00:00.837132
fastfile_time 1.18%, python_time 0.85% = 0.15%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.695033
FastFile timedifference 0:00:00.800901
fastfile_time 1.15%, python_time 0.87% = 0.13%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.694661
FastFile timedifference 0:00:00.796754
fastfile_time 1.15%, python_time 0.87% = 0.13%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.699377
FastFile timedifference 0:00:00.816715
fastfile_time 1.17%, python_time 0.86% = 0.14%
$ /bin/python3.6 fastfileperformance.py
Python   timedifference 0:00:00.699229
FastFile timedifference 0:00:00.818774
fastfile_time 1.17%, python_time 0.85% = 0.15%
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/56286313

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档