如何在C中实现循环缓冲区?

I have a need for a fixed-size (selectable at run-time when creating it, not compile-time) circular buffer which can hold objects of any type and it needs to be very high performance. I don't think there will be resource contention issues since, although it's in a multi-tasking embedded environment, it's a co-operative one so the tasks themselves can manage that.

我需要一个固定大小(在运行时可以选择，而不是编译时)循环缓冲区，它可以容纳任何类型的对象，并且它需要非常高性能。我不认为会出现资源争用问题，尽管它是在一个多任务的嵌入式环境中，它是一个合作的，所以任务本身可以管理它。

My initial thought were to store a simple struct in the buffer which would contain the type (simple enum/define) and a void pointer to the payload but I want this to be as fast as possible so I'm open to suggestions that involve bypassing the heap.

我最初的想法是在缓冲区中存储一个简单的struct，它将包含类型(简单的enum/define)和有效负载的空指针，但是我希望这是尽可能快的，这样我就可以接受绕过堆的建议。

Actually I'm happy to bypass any of the standard library for raw speed - from what I've seen of the code, it's not heavily optimized for the CPU : it looks like they just compiled C code for things like strcpy() and such, there's no hand-coded assembly.

实际上，我很乐意绕过任何标准库以获取原始速度——从我所看到的代码来看，它并没有对CPU进行大量优化:看起来它们只是为strcpy()之类的东西编译了C代码，而且没有手工编码的程序集。

Any code or ideas would be greatly appreciated. The operations required are:

任何代码或想法都会非常感谢。所需的操作是:

create a buffer with specific size.
创建一个具有特定大小的缓冲区。
put at the tail.
把尾巴。
get from the head.
从头部。
return the count.
返回计数。
delete a buffer.
删除一个缓冲区。

7 个解决方案

#1

Can you enumerate the types needed at the time you code up the buffer, or do you need to be able to add types at run time via dynamic calls? If the former, then I would create the buffer as a heap-allocated array of n structs, where each struct consists of two elements: an enum tag identifying the data type, and a union of all the data types. What you lose in terms of extra storage for small elements, you make up in terms of not having to deal with allocation/deallocation and the resulting memory fragmentation. Then you just need to keep track of the start and end indices that define the head and tail elements of the buffer, and make sure to compute mod n when incrementing/decrementing the indices.

您能列举出在您编码缓冲区时所需的类型，或者您需要能够通过动态调用在运行时添加类型吗?如果是前者，那么我将创建一个缓冲区，作为一个由heap分配的n结构数组，其中每个struct由两个元素组成:一个用于标识数据类型的枚举标记，以及一个所有数据类型的联合。对于小元素的额外存储而言，您所损失的是不需要处理分配/处理位置和产生的内存碎片。然后您只需要跟踪定义缓冲区的头和尾元素的开始和结束索引，并确保在对索引进行递增/递减时计算mod n。

#2

The simplest solution would be to keep track of the item size and the number of items, and then create a buffer of the appropriate number of bytes:

最简单的解决方案是跟踪项目的大小和项目的数量，然后创建适当数量的字节的缓冲区:

typedef struct circular_buffer
{
    void *buffer;     // data buffer
    void *buffer_end; // end of data buffer
    size_t capacity;  // maximum number of items in the buffer
    size_t count;     // number of items in the buffer
    size_t sz;        // size of each item in the buffer
    void *head;       // pointer to head
    void *tail;       // pointer to tail
} circular_buffer;

void cb_init(circular_buffer *cb, size_t capacity, size_t sz)
{
    cb->buffer = malloc(capacity * sz);
    if(cb->buffer == NULL)
        // handle error
    cb->buffer_end = (char *)cb->buffer + capacity * sz;
    cb->capacity = capacity;
    cb->count = 0;
    cb->sz = sz;
    cb->head = cb->buffer;
    cb->tail = cb->buffer;
}

void cb_free(circular_buffer *cb)
{
    free(cb->buffer);
    // clear out other fields too, just to be safe
}

void cb_push_back(circular_buffer *cb, const void *item)
{
    if(cb->count == cb->capacity){
        // handle error
    }
    memcpy(cb->head, item, cb->sz);
    cb->head = (char*)cb->head + cb->sz;
    if(cb->head == cb->buffer_end)
        cb->head = cb->buffer;
    cb->count++;
}

void cb_pop_front(circular_buffer *cb, void *item)
{
    if(cb->count == 0){
        // handle error
    }
    memcpy(item, cb->tail, cb->sz);
    cb->tail = (char*)cb->tail + cb->sz;
    if(cb->tail == cb->buffer_end)
        cb->tail = cb->buffer;
    cb->count--;
}

#3

// Note power of two buffer size
#define kNumPointsInMyBuffer 1024 

typedef struct _ringBuffer {
    UInt32 currentIndex;
    UInt32 sizeOfBuffer;
    double data[kNumPointsInMyBuffer];
} ringBuffer;

// Initialize the ring buffer
ringBuffer *myRingBuffer = (ringBuffer *)calloc(1, sizeof(ringBuffer));
myRingBuffer->sizeOfBuffer = kNumPointsInMyBuffer;
myRingBuffer->currentIndex = 0;

// A little function to write into the buffer
// N.B. First argument of writeIntoBuffer() just happens to have the
// same as the one calloc'ed above. It will only point to the same
// space in memory if the calloc'ed pointer is passed to
// writeIntoBuffer() as an arg when the function is called. Consider
// using another name for clarity
void writeIntoBuffer(ringBuffer *myRingBuffer, double *myData, int numsamples) {
    // -1 for our binary modulo in a moment
    int buffLen = myRingBuffer->sizeOfBuffer - 1;
    int lastWrittenSample = myRingBuffer->currentIndex;

    int idx;
    for (int i=0; i < numsamples; ++i) {
        // modulo will automagically wrap around our index
        idx = (i + lastWrittenSample) & buffLen; 
        myRingBuffer->data[idx] = myData[i];
    }

    // Update the current index of our ring buffer.
    myRingBuffer->currentIndex += numsamples;
    myRingBuffer->currentIndex &= myRingBuffer->sizeOfBuffer - 1;
}

As long as your ring buffer's length is a power of two, the incredibly fast binary "&" operation will wrap around your index for you. For my application, I'm displaying a segment of audio to the user from a ring buffer of audio acquired from a microphone.

只要你的环形缓冲区的长度是2的幂，那么快速的二进制“&”操作就会环绕你的索引。对于我的应用程序，我正在向用户显示一个音频片段，从一个从麦克风获得的音频的环形缓冲区。

I always make sure that the maximum amount of audio that can be displayed on screen is much less than the size of the ring buffer. Otherwise you might be reading and writing from the same chunk. This would likely give you weird display artifacts.

我总是确保在屏幕上显示的最大音频量远小于环形缓冲区的大小。否则，你可能会从同一块地方阅读和写作。这可能会给您一些奇怪的显示工件。

#4

First, the headline. You don't need modulo arithmetic to wrap the buffer if you use bit ints to hold the head & tail "pointers", and size them so they are perfectly in synch. IE: 4096 stuffed into a 12-bit unsigned int is 0 all by itself, unmolested in any way. Eliminating modulo arithmetic, even for powers of 2, doubles the speed - almost exactly.

首先是标题。如果您使用位元来保持头和尾“指针”，并使它们的大小完全同步，那么您不需要用模运算来包装缓冲区。IE:被塞进一个12比特的无符号整数，它本身就是0，没有任何干扰。消除模运算，即使是2倍的幂，也会使速度加倍——几乎完全正确。

10 million iterations of filling and draining a 4096 buffer of any type of data elements takes 52 seconds on my 3rd Gen i7 Dell XPS 8500 using Visual Studio 2010's C++ compiler with default inlining, and 1/8192nd of that to service a datum.

在我的第3代i7戴尔XPS 8500上使用Visual Studio 2010的c++编译器和默认的内联，以及1/8192的数据，在我的第3代i7戴尔XPS 8500上使用了10000000次的填充和消耗4096缓冲区的数据元素。

I'd RX rewriting the test loops in main() so they no longer control the flow - which is, and should be, controlled by the return values indicating the buffer is full or empty, and the attendant break; statements. IE: the filler and drainer should be able to bang against each other without corruption or instability. At some point I hope to multi-thread this code, whereupon that behavior will be crucial.

我将在main()中重写测试循环，这样它们就不再控制流——这是，并且应该是由返回值控制的，指示缓冲区是满的或空的，以及服务中断;语句。IE:填充物和排水器应该能够在没有损坏或不稳定的情况下互相撞击。在某种程度上，我希望多线程这段代码，因此行为将是至关重要的。

The QUEUE_DESC (queue descriptor) and initialization function forces all buffers in this code to be a power of 2. The above scheme will NOT work otherwise. While on the subject, note that QUEUE_DESC is not hard-coded, it uses a manifest constant (#define BITS_ELE_KNT) for its construction. (I'm assuming a power of 2 is sufficient flexibility here)

QUEUE_DESC(队列描述符)和初始化函数强制这个代码中的所有缓冲区为2的幂。以上方案不适用。在这个主题上，注意QUEUE_DESC不是硬编码的，它使用一个manifest常量(#define BITS_ELE_KNT)来构建它。(我假设2的幂是足够的灵活性)

To make the buffer size run-time selectable, I tried different approaches (not shown here), and settled on using USHRTs for Head, Tail, EleKnt capable of managing a FIFO buffer[USHRT]. To avoid modulo arithmetic I created a mask to && with Head, Tail, but that mask turns out to be (EleKnt -1), so just use that. Using USHRTS instead of bit ints increased performance ~ 15% on a quiet machine. Intel CPU cores have always been faster than their buses, so on a busy, shared machine, packing your data structures gets you loaded and executing ahead of other, competing threads. Trade-offs.

为了使缓冲区大小运行时可选择，我尝试了不同的方法(这里没有显示)，并决定使用USHRTs作为头、尾、EleKnt来管理FIFO缓冲区[USHRT]。为了避免模运算，我创建了一个面具，并与头，尾巴，但那个面具结果是(EleKnt -1)，所以就使用它。在一个安静的机器上使用USHRTS而不是bit ints提高了15%的性能。Intel的CPU核心总是比它们的总线快，所以在繁忙的共享机器上，打包数据结构会让您在其他竞争线程之前加载和执行。权衡。

Note the actual storage for the buffer is allocated on the heap with calloc(), and the pointer is at the base of the struct, so the struct and the pointer have EXACTLY the same address. IE; no offset required to be added to the struct address to tie up registers.

注意，缓冲区的实际存储是用calloc()在堆上分配的，而指针位于struct的底部，因此结构和指针的地址完全相同。即;不需要添加到结构地址的偏移量来绑定寄存器。

In that same vein, all of the variables attendant with servicing the buffer are physically adjacent to the buffer, bound into the same struct, so the compiler can make beautiful assembly language. You'll have to kill the inline optimization to see any assembly, because otherwise it gets crushed into oblivion.

在同样的情况下，服务缓冲区的所有变量都在物理上与缓冲区相邻，绑定到相同的结构中，这样编译器就可以生成漂亮的汇编语言。您必须杀死内联优化以查看任何程序集，否则它会被湮没。

To support the polymorphism of any data type, I've used memcpy() instead of assignments. If you only need the flexibility to support one random variable type per compile, then this code works perfectly.

为了支持任何数据类型的多态性，我使用memcpy()而不是赋值。如果您只需要在每次编译时支持一个随机变量类型的灵活性，那么这段代码就可以很好地工作。

For polymorphism, you just need to know the type and it's storage requirement. The DATA_DESC array of descriptors provides a way to keep track of each datum that gets put in QUEUE_DESC.pBuffer so it can be retrieved properly. I'd just allocate enough pBuffer memory to hold all of the elements of the largest data type, but keep track of how much of that storage a given datum is actually using in DATA_DESC.dBytes. The alternative is to reinvent a heap manager.

对于多态性，您只需要知道类型和它的存储需求。描述符的DATA_DESC数组提供了一种方法来跟踪在QUEUE_DESC中放入的每个数据。pBuffer可以正确检索。我只需要分配足够的pBuffer内存来保存最大数据类型的所有元素，但是要跟踪一个给定数据在DATA_DESC.dBytes中实际使用了多少存储。另一种方法是重新创建堆管理器。

This means QUEUE_DESC's UCHAR *pBuffer would have a parallel companion array to keep track of data type, and size, while a datum's storage location in pBuffer would remain just as it is now. The new member would be something like DATA_DESC *pDataDesc, or, perhaps, DATA_DESC DataDesc[2^BITS_ELE_KNT] if you can find a way to beat your compiler into submission with such a forward reference. Calloc() is always more flexible in these situations.

这意味着QUEUE_DESC的UCHAR *pBuffer将有一个并行的伙伴数组来跟踪数据类型和大小，而datum在pBuffer中的存储位置将保持原样。新成员将类似DATA_DESC * pDataDesc,或者,也许,DATA_DESC DataDesc[2 ^ BITS_ELE_KNT]如果你能找到一个方法来击败你的编译器屈服向前引用。在这些情况下，Calloc()总是更灵活。

You'd still memcpy() in Q_Put(),Q_Get, but the number of bytes actually copied would be determined by DATA_DESC.dBytes, not QUEUE_DESC.EleBytes. The elements are potentially all of different types/sizes for any given put or get.

您仍然会在Q_Put()中memcpy()，Q_Get，但是实际复制的字节数将由DATA_DESC决定。dBytes,不是QUEUE_DESC.EleBytes。对于给定的put或get，元素可能具有所有不同的类型/大小。

I believe this code satisfies the speed and buffer size requirements, and can be made to satisfy the requirement for 6 different data types. I've left the many test fixtures in, in the form of printf() statements, so you can satisfy yourself (or not) that the code works properly. The random number generator demonstrates that the code works for any random head/tail combo.

我相信这段代码满足了速度和缓冲区大小的要求，并且可以满足6种不同数据类型的需求。我已经将许多测试装置放在了printf()语句的形式中，这样您就可以满足自己(或不满足)代码正常工作的情况。随机数生成器表明，该代码适用于任意随机头/尾组合。

enter code here
// Queue_Small.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <stdio.h>
#include <time.h>
#include <limits.h>
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <math.h>

#define UCHAR unsigned char
#define ULONG unsigned long
#define USHRT unsigned short
#define dbl   double
/* Queue structure */
#define QUEUE_FULL_FLAG 1
#define QUEUE_EMPTY_FLAG -1
#define QUEUE_OK 0
//  
#define BITS_ELE_KNT    12  //12 bits will create 4.096 elements numbered 0-4095
//
//typedef struct    {
//  USHRT dBytes:8;     //amount of QUEUE_DESC.EleBytes storage used by datatype
//  USHRT dType :3; //supports 8 possible data types (0-7)
//  USHRT dFoo  :5; //unused bits of the unsigned short host's storage
// }    DATA_DESC;
//  This descriptor gives a home to all the housekeeping variables
typedef struct  {
    UCHAR   *pBuffer;   //  pointer to storage, 16 to 4096 elements
    ULONG Tail  :BITS_ELE_KNT;  //  # elements, with range of 0-4095
    ULONG Head  :BITS_ELE_KNT;  //  # elements, with range of 0-4095
    ULONG EleBytes  :8;     //  sizeof(elements) with range of 0-256 bytes
    // some unused bits will be left over if BITS_ELE_KNT < 12
    USHRT EleKnt    :BITS_ELE_KNT +1;// 1 extra bit for # elements (1-4096)
    //USHRT Flags   :(8*sizeof(USHRT) - BITS_ELE_KNT +1);   //  flags you can use
    USHRT   IsFull  :1;     // queue is full
    USHRT   IsEmpty :1;     // queue is empty
    USHRT   Unused  :1;     // 16th bit of USHRT
}   QUEUE_DESC;

//  ---------------------------------------------------------------------------
//  Function prototypes
QUEUE_DESC *Q_Init(QUEUE_DESC *Q, int BitsForEleKnt, int DataTypeSz);
int Q_Put(QUEUE_DESC *Q, UCHAR *pNew);
int Q_Get(UCHAR *pOld, QUEUE_DESC *Q);
//  ---------------------------------------------------------------------------
QUEUE_DESC *Q_Init(QUEUE_DESC *Q, int BitsForEleKnt, int DataTypeSz)    {
    memset((void *)Q, 0, sizeof(QUEUE_DESC));//init flags and bit integers to zero
    //select buffer size from powers of 2 to receive modulo 
    //                arithmetic benefit of bit uints overflowing
    Q->EleKnt   =   (USHRT)pow(2.0, BitsForEleKnt);
    Q->EleBytes =   DataTypeSz; // how much storage for each element?
    //  Randomly generated head, tail a test fixture only. 
    //      Demonstrates that the queue can be entered at a random point 
    //      and still perform properly. Normally zero
    srand(unsigned(time(NULL)));    // seed random number generator with current time
    Q->Head = Q->Tail = rand(); // supposed to be set to zero here, or by memset
    Q->Head = Q->Tail = 0;
    //  allocate queue's storage
    if(NULL == (Q->pBuffer = (UCHAR *)calloc(Q->EleKnt, Q->EleBytes)))  {
        return NULL;
    }   else    {
        return Q;
    }
}
//  ---------------------------------------------------------------------------
int Q_Put(QUEUE_DESC *Q, UCHAR *pNew)   
{
    memcpy(Q->pBuffer + (Q->Tail * Q->EleBytes), pNew, Q->EleBytes);
    if(Q->Tail == (Q->Head + Q->EleKnt)) {
        //  Q->IsFull = 1;
        Q->Tail += 1;   
        return QUEUE_FULL_FLAG; //  queue is full
    }
    Q->Tail += 1;   //  the unsigned bit int MUST wrap around, just like modulo
    return QUEUE_OK; // No errors
}
//  ---------------------------------------------------------------------------
int Q_Get(UCHAR *pOld, QUEUE_DESC *Q)   
{
    memcpy(pOld, Q->pBuffer + (Q->Head * Q->EleBytes), Q->EleBytes);
    Q->Head += 1;   //  the bit int MUST wrap around, just like modulo

    if(Q->Head == Q->Tail)      {
        //  Q->IsEmpty = 1;
        return QUEUE_EMPTY_FLAG; // queue Empty - nothing to get
    }
    return QUEUE_OK; // No errors
}
//
//  ---------------------------------------------------------------------------
int _tmain(int argc, _TCHAR* argv[])    {
//  constrain buffer size to some power of 2 to force faux modulo arithmetic
    int LoopKnt = 1000000;  //  for benchmarking purposes only
    int k, i=0, Qview=0;
    time_t start;
    QUEUE_DESC Queue, *Q;
    if(NULL == (Q = Q_Init(&Queue, BITS_ELE_KNT, sizeof(int)))) {
        printf("\nProgram failed to initialize. Aborting.\n\n");
        return 0;
    }

    start = clock();
    for(k=0; k<LoopKnt; k++)    {
        //printf("\n\n Fill'er up please...\n");
        //Q->Head = Q->Tail = rand();
        for(i=1; i<= Q->EleKnt; i++)    {
            Qview = i*i;
            if(QUEUE_FULL_FLAG == Q_Put(Q, (UCHAR *)&Qview))    {
                //printf("\nQueue is full at %i \n", i);
                //printf("\nQueue value of %i should be %i squared", Qview, i);
                break;
            }
            //printf("\nQueue value of %i should be %i squared", Qview, i);
        }
        //  Get data from queue until completely drained (empty)
        //
        //printf("\n\n Step into the lab, and see what's on the slab... \n");
        Qview = 0;
        for(i=1; i; i++)    {
            if(QUEUE_EMPTY_FLAG == Q_Get((UCHAR *)&Qview, Q))   {
                //printf("\nQueue value of %i should be %i squared", Qview, i);
                //printf("\nQueue is empty at %i", i);
                break;
            }
            //printf("\nQueue value of %i should be %i squared", Qview, i);
        }
        //printf("\nQueue head value is %i, tail is %i\n", Q->Head, Q->Tail);
    }
    printf("\nQueue time was %5.3f to fill & drain %i element queue  %i times \n", 
                     (dbl)(clock()-start)/(dbl)CLOCKS_PER_SEC,Q->EleKnt, LoopKnt);
    printf("\nQueue head value is %i, tail is %i\n", Q->Head, Q->Tail);
    getchar();
    return 0;
}

#5

Here is a simple solution in C. Assume interrupts are turned off for each function. No polymorphism & stuff, just common sense.

这里有一个简单的解决方案。假设中断是为每个函数关闭的。没有多态性，只是常识。

#define BUFSIZE 128
char buf[BUFSIZE];
char *pIn, *pOut, *pEnd;
char full;

// init
void buf_init()
{
    pIn = pOut = buf;       // init to any slot in buffer
    pEnd = &buf[BUFSIZE];   // past last valid slot in buffer
    full = 0;               // buffer is empty
}

// add char 'c' to buffer
int buf_put(char c)
{
    if (pIn == pOut  &&  full)
        return 0;           // buffer overrun

    *pIn++ = c;             // insert c into buffer
    if (pIn >= pEnd)        // end of circular buffer?
        pIn = buf;          // wrap around

    if (pIn == pOut)        // did we run into the output ptr?
        full = 1;           // can't add any more data into buffer
    return 1;               // all OK
}

// get a char from circular buffer
int buf_get(char *pc)
{
    if (pIn == pOut  &&  !full)
        return 0;           // buffer empty  FAIL

    *pc = *pOut++;              // pick up next char to be returned
    if (pOut >= pEnd)       // end of circular buffer?
        pOut = buf;         // wrap around

    full = 0;               // there is at least 1 slot
    return 1;               // *pc has the data to be returned
}

#6

A simple implementation could consist of:

一个简单的实现可以包括:

A buffer, implemented as an array of size n, of whatever type you need
一个缓冲区，实现为一个大小为n的数组，无论您需要什么类型。
A read pointer or index (whichever is more efficient for your processor)
读指针或索引(以更有效的方式处理处理器)
A write pointer or index
写指针或索引。
A counter indicating how much data is in the buffer (derivable from the read and write pointers, but faster to track it separately)
指示缓冲区中有多少数据的计数器(可以从读写指针中派生出来，但是可以更快地单独跟踪它)

Every time you write data, you advance the write pointer and increment the counter. When you read data, you increase the read pointer and decrement the counter. If either pointer reaches n, set it to zero.

每次写入数据时，都要提前编写指针并增加计数器。当您读取数据时，您将增加读指针并递减计数器。如果任一指针到达n，将其设置为0。

You can't write if counter = n. You can't read if counter = 0.

如果counter = n，就不能写，如果counter = 0，则不能读取。

#7

C style, simple ring buffer for integers. First use init than use put and get. If buffer does not contain any data it returns "0" zero.

C型，整数的简单环缓冲区。首先使用init，而不是使用put和get。如果缓冲区不包含任何数据，它将返回“0”。

//=====================================
// ring buffer address based
//=====================================
#define cRingBufCount   512
int     sRingBuf[cRingBufCount];    // Ring Buffer
int     sRingBufPut;                // Input index address
int     sRingBufGet;                // Output index address
Bool    sRingOverWrite;

void    GetRingBufCount(void)
{
int     r;
`       r= sRingBufPut - sRingBufGet;
        if ( r < cRingBufCount ) r+= cRingBufCount;
        return r; 
}

void    InitRingBuffer(void)
{
        sRingBufPut= 0;
        sRingBufGet= 0;
}       

void    PutRingBuffer(int d)
{
        sRingBuffer[sRingBufPut]= d;
        if (sRingBufPut==sRingBufGet)// both address are like ziro
        {
            sRingBufPut= IncRingBufferPointer(sRingBufPut);
            sRingBufGet= IncRingBufferPointer(sRingBufGet);
        }
        else //Put over write a data
        {
            sRingBufPut= IncRingBufferPointer(sRingBufPut);
            if (sRingBufPut==sRingBufGet)
            {
                sRingOverWrite= Ture;
                sRingBufGet= IncRingBufferPointer(sRingBufGet);
            }
        }
}

int     GetRingBuffer(void)
{
int     r;
        if (sRingBufGet==sRingBufPut) return 0;
        r= sRingBuf[sRingBufGet];
        sRingBufGet= IncRingBufferPointer(sRingBufGet);
        sRingOverWrite=False;
        return r;
}

int     IncRingBufferPointer(int a)
{
        a+= 1;
        if (a>= cRingBufCount) a= 0;
        return a;
}

#1

#2