

Q: rte_eth_tx_burst() descriptor/mbuf management guarantees vs. free thresholds

Stack Overflow user

Asked 2021-09-13 21:20:49

3 answers · 384 views · 0 followers · score 1

The documentation of the rte_eth_tx_burst() function says:

 * It is the responsibility of the rte_eth_tx_burst() function to
 * transparently free the memory buffers of packets previously sent.
 * This feature is driven by the *tx_free_thresh* value supplied to the
 * rte_eth_dev_configure() function at device configuration time.
 * When the number of free TX descriptors drops below this threshold, the
 * rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf*  buffers
 * of those packets whose transmission was effectively completed.

I have a small test program where this doesn't seem to hold (using the ixgbe driver on an X553 1GbE NIC bound to vfio).

My program sets up a single transmit queue like this:

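struct rte_eth_conf port_conf = {0};
struct rte_eth_dev_info dev_info;
uint16_t rx_ring_size = 0;   // no RX queue is configured
uint16_t tx_ring_size = 1024-32;
// error checks on r omitted
r = rte_eth_dev_info_get(port_id, &dev_info);
r = rte_eth_dev_configure(port_id, 0, 1, &port_conf);
r = rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &rx_ring_size, &tx_ring_size);
struct rte_eth_txconf txconf = dev_info.default_txconf;
r = rte_eth_tx_queue_setup(port_id, 0, tx_ring_size,
        rte_eth_dev_socket_id(port_id), &txconf);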

The mbuf packet pool used for transmission is created like this:

struct rte_mempool *pkt_pool = rte_pktmbuf_pool_create("pkt_pool", 1023, 341, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

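This way, when sending packets, I should run out of TX descriptors before I run out of packet buffers. (The program only generates single-segment packets.)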

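My expectation was that calling rte_eth_tx_burst() in a loop (sending one packet after another) never fails, because it transparently frees the mbufs of packets that were already sent.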

However, that's not what happens.

My transmit loop basically looks like this:

for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.

    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}

After 1086 transmitted packets (of ~300 bytes each), rte_eth_tx_burst() returns 0.

I'm using the default thresholds. The queried values (from dev_info.default_txconf) are:

tx thresh   : 32
tx rs thresh: 32
wthresh     : 0

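So the main question is: how hard does rte_eth_tx_burst() try to free mbuf buffers (and thus descriptors)?

It could busy-loop until the transmission of previously supplied mbufs has completed.

Or it could just quickly check whether some descriptors are free again, and give up if none are.

Related question: are the default threshold values appropriate for this use case?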
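The second policy ("check quickly, give up") puts the retry decision on the caller. The following is a minimal sketch of that idea that does not use DPDK. tx_burst_fn, toy_ring, toy_tx and tx_with_bounded_retry are hypothetical names. The callback only mimics the shape of rte_eth_tx_burst(): it returns how many packets were accepted. The toy driver accepts at most one 32-packet chunk per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shaped like rte_eth_tx_burst():
 * returns how many of the nb_pkts packets were accepted. */
typedef uint16_t (*tx_burst_fn)(void *ctx, void **pkts, uint16_t nb_pkts);

/* Caller-side retry around a "check quickly, give up" driver, bounded by
 * max_tries. Returns the number of packets accepted; the caller still owns
 * (and must free or re-queue) the remaining ones. */
uint16_t tx_with_bounded_retry(tx_burst_fn tx, void *ctx, void **pkts,
                               uint16_t nb_pkts, unsigned max_tries)
{
    assert(tx != NULL && pkts != NULL);
    uint16_t sent = 0;
    for (unsigned t = 0; t < max_tries && sent < nb_pkts; ++t)
        sent += tx(ctx, pkts + sent, (uint16_t)(nb_pkts - sent));
    return sent;
}

/* Toy driver: accepts at most per_call packets per call, as if only one
 * chunk of descriptors were reclaimed on each invocation. */
struct toy_ring { uint16_t per_call; };

uint16_t toy_tx(void *ctx, void **pkts, uint16_t nb_pkts)
{
    const struct toy_ring *r = ctx;
    (void)pkts;
    return nb_pkts < r->per_call ? nb_pkts : r->per_call;
}
```

With per_call = 32, two tries move 64 of 100 packets and ten tries move all of them. A bounded loop like this avoids spinning forever if the NIC stops completing packets.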

So I worked around it like this:

for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "cannot send packet\n");
        int r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
             rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
}

With this I get output like the following:

USER1: cannot send packet
USER1: 1086. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1118. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1150. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 1182. cleaned up 0 descriptors ...
[..]

USER1: cannot send packet
USER1: 1950. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1982. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 2046. cleaned up 32 descriptors ...

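This means it frees at most 32 descriptors at a time. The cleanup doesn't always succeed, but the next rte_eth_tx_burst() does succeed, i.e. some space has been freed by then.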
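The observed "at most 32 at a time" behaviour can be reproduced with a toy queue model. Reclaim is attempted only when the free count is below tx_free_thresh, and it frees one tx_rs_thresh-sized chunk only if the hardware has already completed that chunk. The model loosely follows the shape of the ixgbe snippet quoted in the first answer (use only the descriptors that are free, otherwise return 0). It is not the driver's code: struct toy_txq, toy_reclaim and toy_xmit are hypothetical, and completed stands in for the hardware's done bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical TX queue state (not a DPDK structure). */
struct toy_txq {
    uint16_t nb_free;     /* descriptors the driver may still fill          */
    uint16_t free_thresh; /* tx_free_thresh: reclaim only below this value  */
    uint16_t rs_thresh;   /* tx_rs_thresh: descriptors reclaimed per chunk  */
    uint16_t completed;   /* descriptors the HW has finished, not reclaimed */
};

/* One reclaim attempt: frees one chunk, and only if the HW completed it. */
static uint16_t toy_reclaim(struct toy_txq *q)
{
    if (q->nb_free >= q->free_thresh || q->completed < q->rs_thresh)
        return 0;                    /* above threshold or chunk not done: give up */
    q->completed -= q->rs_thresh;
    q->nb_free += q->rs_thresh;
    return q->rs_thresh;
}

/* Never waits: one reclaim attempt, then use whatever descriptors are free. */
uint16_t toy_xmit(struct toy_txq *q, uint16_t nb_pkts)
{
    assert(q != NULL);
    toy_reclaim(q);
    if (nb_pkts > q->nb_free)
        nb_pkts = q->nb_free;
    q->nb_free -= nb_pkts;
    return nb_pkts;                  /* 0 when the ring is (still) full */
}
```

In this model, a call on a full ring before the hardware finishes a chunk returns 0, and the next call after completion reclaims exactly 32 descriptors. That matches the pattern in the log above.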

Side question: is there a better, more DPDK-idiomatic way to handle mbuf recycling?

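When I change the setup so that I run out of mbufs before running out of transmit descriptors (e.g. a TX ring created with 1024 descriptors while the mbuf pool still has 1023 elements), I have to change the allocation part like this: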

struct rte_mbuf *pkt;
do {
    pkt = rte_pktmbuf_alloc(args.pkt_pool);
    if (!pkt) {
        r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
             rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
} while (!pkt);

The output is similar, e.g.:

USER1: 1023. cleaned up 95 descriptors ...
USER1: 1118. cleaned up 32 descriptors ...
USER1: 1150. cleaned up 32 descriptors ...
USER1: 1182. cleaned up 32 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 32 descriptors ...
[..]

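This means that freeing descriptors/mbufs is so "slow" that the loop has to spin up to three times.

Again, is this a valid approach, or is there a better DPDK way to solve this?

Since rte_eth_tx_done_cleanup() may return -ENOTSUP, my use of it is probably not the best solution.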
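If cleanup is used at all, one option is to treat -ENOTSUP as "this PMD/TX path has no cleanup". In that case the caller stops calling it and falls back to simply retrying the transmit. The sketch below is written under that assumption. tx_cleanup_fn, cleanup_or_fallback and the two toy_cleanup_* callbacks are hypothetical stand-ins. Only the return convention (freed count, or a negative errno) is taken from rte_eth_tx_done_cleanup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shaped like rte_eth_tx_done_cleanup(): returns the
 * number of freed packets, or a negative errno value. */
typedef int (*tx_cleanup_fn)(void *ctx, uint32_t free_cnt);

/* Calls the cleanup function unless an earlier call reported -ENOTSUP.
 * After -ENOTSUP, *supported stays 0 and the caller should just retry the
 * transmit instead. Returns >= 0 on success (0 also when unsupported) and
 * < 0 on other errors. */
int cleanup_or_fallback(tx_cleanup_fn fn, void *ctx, uint32_t free_cnt,
                        int *supported)
{
    assert(fn != NULL && supported != NULL);
    if (!*supported)
        return 0;
    int r = fn(ctx, free_cnt);
    if (r == -ENOTSUP) {
        *supported = 0;   /* e.g. a vector TX path without cleanup support */
        return 0;
    }
    return r;
}

/* Two toy PMD behaviours for experimenting (not real driver code). */
int toy_cleanup_unsupported(void *ctx, uint32_t free_cnt)
{
    (void)ctx;
    (void)free_cnt;
    return -ENOTSUP;
}

int toy_cleanup_chunk(void *ctx, uint32_t free_cnt)
{
    (void)ctx;
    return free_cnt < 32 ? (int)free_cnt : 32;   /* one 32-descriptor chunk */
}
```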

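BTW, even with the ixgbe driver, it fails when I disable checksum offloads!

Apparently, ixgbe_dev_tx_done_cleanup() then calls ixgbe_tx_done_cleanup_vec() instead of ixgbe_tx_done_cleanup_full(), and the vector variant unconditionally returns -ENOTSUP: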

static int
ixgbe_tx_done_cleanup_vec(struct ixgbe_tx_queue *txq __rte_unused,
                        uint32_t free_cnt __rte_unused)
{
        return -ENOTSUP;
}

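Does this make sense?

So perhaps a better strategy is to make sure there are fewer descriptors than pool elements (e.g. 1024-32 < 1023) and just call rte_eth_tx_burst() again until it accepts the packet?

Meaning something like this: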

for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "%u. cannot send packet - retry\n", i);
    }
}

This works, and the output again shows descriptors being freed 32 at a time, e.g.:

USER1: 1951. cannot send packet - retry
USER1: 1951. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2047. cannot send packet - retry
USER1: 2047. cannot send packet - retry

I know I could also use rte_eth_tx_burst() to submit larger bursts. But I want to understand the simple/edge cases and the DPDK semantics first.

I'm on Fedora 33 with DPDK 20.11.2.

dpdk

3 answers

Stack Overflow user

Answered 2021-09-20 05:02:29

Suggestion/solution: first analyze the cause of the problem with rte_mempool_list_dump or dpdk-procinfo, then use rte_eth_tx_buffer_flush or change the TX threshold settings.

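Explanation:

How mbufs are freed differs between PMDs, and even within the same NIC it differs between the PF and the VF. Here are some points that help to understand this: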

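- An rte_mempool can be created with or without cache elements.
- When it is created with cache elements, the configured mbufs are spread over per-core caches, depending on the available lcores (EAL options) and on the per-core cache size parameter.
- When the HW offload DEV_TX_OFFLOAD_MBUF_FAST_FREE is available and enabled, the mbuf's ref_cnt is 1 by contract.
- So every time tx_burst is called (whether it succeeds or fails), a threshold check decides whether free mbufs/mbuf segments can be pushed back to the pool.
- Drivers with DEV_TX_OFFLOAD_MBUF_FAST_FREE enabled blindly put the elements into the lcore cache.
- Without DEV_TX_OFFLOAD_MBUF_FAST_FREE, the generic path validates each mbuf (checking nb_segs and ref_cnt) before pushing it back to the mempool.

In all cases, either a fixed number of mbufs (32, which I believe is the default for all PMDs) or all available free mbufs are pushed to the cache or the pool.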
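The difference between the two free paths in the list above can be illustrated with a toy model. The generic path only returns an mbuf to the pool when the last reference is dropped. The fast-free path skips the check because the application promised ref_cnt == 1. This is a simplification, not DPDK's rte_pktmbuf_prefree_seg(): struct toy_mbuf and toy_free_completed are hypothetical, segment chains (nb_segs) are ignored, and every mbuf is treated as a single segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-segment mbuf: only the reference count is modelled. */
struct toy_mbuf { uint16_t refcnt; };

/* Returns how many of the n completed mbufs go back to the pool. */
unsigned toy_free_completed(struct toy_mbuf *m, unsigned n, int fast_free)
{
    unsigned returned = 0;
    for (unsigned i = 0; i < n; ++i) {
        if (fast_free) {            /* contract: refcnt == 1, same pool */
            returned++;
            continue;
        }
        assert(m[i].refcnt > 0);
        if (m[i].refcnt == 1)       /* last reference: back to the pool */
            returned++;
        else
            m[i].refcnt--;          /* still referenced elsewhere */
    }
    return returned;
}
```

With references {1, 2, 1}, the generic path returns two mbufs and leaves the shared one with refcnt 1. The fast path would return all three, which is only safe if the contract actually holds.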

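Facts:

For the IXGBE VF driver, the DEV_TX_OFFLOAD_MBUF_FAST_FREE option is not available. This means that every time the threshold is reached, each individual mbuf is checked and pushed back to the mempool.

Assumption: based on the code snippet, rte_eth_dev_configure is set up for TX only, and rte_pktmbuf_pool_create creates the pool with a cache of 341 elements, with only one lcore-based cache (the lcore that runs the allocation and TX loop).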

Code snippet 1:

for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.

    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}

After 1086 transmitted packets (of ~ 300 bytes each), rte_eth_tx_burst() returns 0.

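Observation: if mbufs were really running out, rte_pktmbuf_alloc should fail before rte_eth_tx_burst does. Failing at 1086 is interesting, because only 1023 mbufs were created in total, and the failure happens after two rounds of 32 mbufs being released back to the mempool. Analyzing the ixgbe driver code shows that the only place in tx_xmit_pkts that returns 0 is: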

        /* Only use descriptors that are available */
        nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
        if (unlikely(nb_pkts == 0))
                return 0;

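Even though tx_ring_size is set to 992 in the configuration, internally rte_eth_dev_adjust_nb_desc sets it to the maximum of *nb_desc and desc_lim->nb_min. Based on the code, the failure is not caused by a lack of free mbufs but by TX descriptors running low or being unavailable.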
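The adjustment of the descriptor count can be checked with a toy re-implementation. It follows one reading of the internal helper behind rte_eth_dev_adjust_nb_rx_tx_desc() in DPDK 20.11: clamp to nb_max (if non-zero), raise to nb_min, then round up to a multiple of nb_align (if non-zero). Verify this against rte_ethdev.c of your version. struct toy_desc_lim and toy_adjust_nb_desc are hypothetical names. The limits used below are illustrative and were not queried from an X553.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the nb_max / nb_min / nb_align descriptor limits. */
struct toy_desc_lim { uint16_t nb_max, nb_min, nb_align; };

uint16_t toy_adjust_nb_desc(uint16_t nb_desc, const struct toy_desc_lim *lim)
{
    assert(lim != 0);
    if (lim->nb_max != 0 && nb_desc > lim->nb_max)
        nb_desc = lim->nb_max;                 /* clamp to the maximum */
    if (nb_desc < lim->nb_min)
        nb_desc = lim->nb_min;                 /* raise to the minimum */
    if (lim->nb_align != 0)                    /* round up to the alignment */
        nb_desc = (uint16_t)((nb_desc + lim->nb_align - 1)
                             / lim->nb_align * lim->nb_align);
    return nb_desc;
}
```

With limits {4096, 32, 8}, the question's 992 is already valid and stays 992. This is consistent with the answer's point that descriptors, not mbufs, are the bottleneck in that configuration.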

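In all other cases, rte_eth_tx_done_cleanup or rte_eth_tx_buffer_flush actually pushes any pending descriptors from the SW PMD directly to DMA. This internally frees more descriptors, which makes tx_burst smoother.

To find the root cause, dump the mempool whenever the DPDK API tx_burst returns fewer packets than requested, either:

- via dpdk-procinfo, or
- by calling rte_mempool_list_dump.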

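Note: most PMDs amortize the cost of descriptor writes (PCIe payload) by batching and bunching at least 4 of them (in the SSE case). So even if DPDK tx_burst returns 1, that single packet is not necessarily pushed out of the NIC. To make sure it is, use rte_eth_tx_buffer_flush.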
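rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() batch packets at the application level. The buffer is handed to the driver only once it is full, or when flushed explicitly, so a trailing partial batch sits there until someone flushes it. The toy model below shows only that counting behaviour. TOY_TX_BUFFER_SIZE, struct toy_tx_buffer, toy_buffer_pkt and toy_flush are hypothetical, and the "driver" is assumed to accept every packet. It says nothing about the PMD-internal descriptor batching described in the note.

```c
#include <assert.h>

#define TOY_TX_BUFFER_SIZE 4   /* hypothetical buffer capacity */

struct toy_tx_buffer {
    unsigned length;       /* packets currently buffered           */
    unsigned sent_total;   /* packets handed to the (toy) driver   */
};

/* Hand everything buffered to the driver (assumed to accept all of it). */
unsigned toy_flush(struct toy_tx_buffer *b)
{
    unsigned n = b->length;
    b->sent_total += n;
    b->length = 0;
    return n;
}

/* Buffer one packet; only a full buffer triggers an implicit flush. */
unsigned toy_buffer_pkt(struct toy_tx_buffer *b)
{
    assert(b->length < TOY_TX_BUFFER_SIZE);
    b->length++;
    return b->length < TOY_TX_BUFFER_SIZE ? 0 : toy_flush(b);
}
```

After five packets, four have been sent by the implicit flush and one waits in the buffer until toy_flush() is called. This is the situation an explicit flush at the end of a loop iteration is meant to cover.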

Votes: 2

Stack Overflow user

Answered 2021-09-20 06:35:39

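Say you call rte_eth_tx_burst() to send one small packet (single mbuf, no offloads). Suppose the driver does push the packet to the HW. Doing so consumes one descriptor in the ring: the driver "remembers" that this packet mbuf is associated with that descriptor. But the packet is not sent instantly. The HW typically has some means of notifying the driver of completions. Now imagine the driver checked for completions on every rte_eth_tx_burst() call (thus ignoring any thresholds). Calling rte_eth_tx_burst() again in a tight loop for another packet would then most likely consume one more descriptor rather than recycle the first one. Given this, I would not use a tight loop when studying tx_free_thresh semantics. Whether you call rte_eth_tx_burst() once per packet or once per batch of packets does not matter.

Now, say you have a Tx ring of size N, tx_free_thresh is M, and you have a mempool of size Z. Allocate a burst of N - M - 1 small packets and call rte_eth_tx_burst() to send it (no offloads; each packet is assumed to take one Tx descriptor). Then deliberately wait long enough (for completions) and check the number of free objects in the mempool. It should read Z - (N - M - 1). Then allocate and send one extra packet, and wait once again. This time, the number of free objects in the mempool should read Z - (N - M). Finally, allocate and send one more packet (again!), thus crossing the threshold (the number of spare Tx descriptors becomes less than M). During this rte_eth_tx_burst() call, the driver should detect the threshold crossing and start checking for completions. This should make the driver free the (N - M) descriptors consumed by the two previous rte_eth_tx_burst() calls, thus clearing up the whole ring. The driver then proceeds to push the new packet to the HW, consuming one descriptor. Checking the mempool now should report Z - 1 free objects.

So, in short: no loops, just three rte_eth_tx_burst() calls with enough waiting time in between, checking the mempool's free object count after each send. In theory, this lets you understand the corner-case semantics. That's the gist of it. Keep in mind, however, that the actual behavior may vary between vendors/PMDs.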
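The arithmetic of this three-step experiment can be checked with an idealized toy model. Waiting "long enough" is modelled by treating every in-flight packet as completed. The driver reclaims only when a send would leave fewer than M free descriptors, and then it reclaims everything at once. That idealization is an assumption: real PMDs such as ixgbe reclaim in tx_rs_thresh chunks, as the question's logs show. struct toy_port and toy_send are hypothetical names.

```c
#include <assert.h>

/* Hypothetical port state: ring size N, threshold M, mempool free count. */
struct toy_port {
    unsigned ring;       /* N: Tx ring size                       */
    unsigned thresh;     /* M: tx_free_thresh                     */
    unsigned free_desc;  /* descriptors currently free            */
    unsigned in_flight;  /* descriptors holding sent packets      */
    unsigned pool_free;  /* free objects left in the mempool (Z)  */
};

/* Allocate and send k single-segment packets after waiting long enough. */
unsigned toy_send(struct toy_port *p, unsigned k)
{
    assert(k <= p->pool_free);
    if (p->free_desc < p->thresh + k) {
        /* Threshold crossed: all in-flight packets are done, reclaim them. */
        p->pool_free += p->in_flight;
        p->free_desc += p->in_flight;
        p->in_flight = 0;
    }
    assert(p->free_desc <= p->ring);
    if (k > p->free_desc)
        k = p->free_desc;        /* ring full; rejected packets not modelled */
    p->pool_free -= k;
    p->free_desc -= k;
    p->in_flight += k;
    return k;
}
```

With N = 1024, M = 32 and Z = 2047, the three sends leave Z - (N - M - 1), then Z - (N - M), then Z - 1 free objects, matching the sequence described above.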

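Votes: 1

Stack Overflow user

Answered 2021-09-16 21:44:34

Relying on rte_eth_tx_done_cleanup() is indeed not a good choice, since many PMDs don't implement it. Mostly Intel PMDs provide it, but e.g. SFC, MLX* and af_packet don't.

However, it's still unclear why the ixgbe PMD doesn't support cleanup when offloads are disabled (i.e. when the vector TX path is used).

The requirements on rte_eth_tx_burst() with regard to freeing are really light. From the API docs: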

 * It is the responsibility of the rte_eth_tx_burst() function to
 * transparently free the memory buffers of packets previously sent.
 * This feature is driven by the *tx_free_thresh* value supplied to the
 * rte_eth_dev_configure() function at device configuration time.
 * When the number of free TX descriptors drops below this threshold, the
 * rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf*  buffers
 * of those packets whose transmission was effectively completed.
[..]
 * @return
 *   The number of output packets actually stored in transmit descriptors of
 *   the transmit ring. The return value can be less than the value of the
 *   *tx_pkts* parameter when the transmit ring is full or has been filled up.

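So just attempting to free (without waiting for the result of that attempt) and returning 0 (since 0 is less than the number of packets requested) is covered by this "contract".

FWIW, none of the examples distributed with DPDK loops around rte_eth_tx_burst() to resubmit packets that were not sent yet. There are, however, several examples that call rte_eth_tx_burst() and drop the unsent packets.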
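The "send once, drop what wasn't accepted" pattern can be written against hypothetical callbacks so it compiles without DPDK. In a real application, tx would be rte_eth_tx_burst() and drop would be rte_pktmbuf_free(); DPDK's skeleton (basicfwd) example frees unsent mbufs in the same way. tx_burst_fn, pkt_free_fn, send_or_drop, toy_tx_limit and toy_dropped are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*tx_burst_fn)(void *ctx, void **pkts, uint16_t nb_pkts);
typedef void (*pkt_free_fn)(void *pkt);

/* Send once; whatever the driver did not accept is freed, not retried. */
uint16_t send_or_drop(tx_burst_fn tx, pkt_free_fn drop, void *ctx,
                      void **pkts, uint16_t nb_pkts)
{
    assert(tx != NULL && drop != NULL);
    uint16_t sent = tx(ctx, pkts, nb_pkts);
    for (uint16_t i = sent; i < nb_pkts; ++i)
        drop(pkts[i]);       /* rte_pktmbuf_free() in a real application */
    return sent;
}

/* Toy callbacks for experimenting. */
unsigned toy_dropped;

void toy_drop(void *pkt)
{
    (void)pkt;
    ++toy_dropped;
}

uint16_t toy_tx_limit(void *ctx, void **pkts, uint16_t nb_pkts)
{
    uint16_t limit = *(const uint16_t *)ctx;   /* accepts at most limit */
    (void)pkts;
    return nb_pkts < limit ? nb_pkts : limit;
}
```

Because every unsent mbuf is returned to the pool immediately, this pattern never leaves the application holding packets it cannot transmit.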

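AFAICS, apart from rte_eth_tx_done_cleanup() and rte_eth_tx_burst(), there is no function for requesting that mbufs previously submitted for transmission be freed.

Thus, it's advisable to make the mbuf packet pool larger than the configured ring size. Otherwise, if all mbufs are in flight, they cannot be reclaimed, because there is no mbuf left with which to call rte_eth_tx_burst() again.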
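The sizing advice can be turned into a rule of thumb: the pool must be able to fill the whole TX ring and still provide mbufs for the next burst. The optional term for mbufs parked in other lcores' mempool caches is an assumption added for multi-lcore setups and is not part of the answer above. min_pool_size is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: smallest pool that can fill the TX ring completely,
 * still allocate one more burst, and tolerate mbufs held in the per-lcore
 * caches of other lcores (other_lcores * cache_size; assumption). */
unsigned min_pool_size(unsigned tx_ring_size, unsigned max_burst,
                       unsigned other_lcores, unsigned cache_size)
{
    assert(tx_ring_size > 0 && max_burst > 0);
    return tx_ring_size + max_burst + other_lcores * cache_size;
}
```

For the question's single-lcore, one-packet-per-burst setup, a 992-descriptor ring needs at least 993 mbufs, so the 1023-element pool is enough. A 1024-descriptor ring would need 1025, which explains why the second configuration ran out of mbufs first.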

Votes: 0

Page content provided by Stack Overflow; translation by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/69169224
