Optimization principles behind CPPFormatLibary's efficiency

Date: 2022-03-13 14:37:06

CPPFormatLibary, hereafter abbreviated FL. For an introduction, see: About CPPFormatLibary.

Compared with stringstream, and even with the C library's sprintf family, FL has a speed advantage, while at the same time supporting .NET-style format strings. Achieving this requires combining several optimizations, some of them coding techniques and some of them design strategies. The following sections briefly explain each of them.

1. Pattern caching

Consider a call to the C library's sprintf family, for example:

char szBuf[64];
sprintf_s(szBuf, "%d--#--%8.2f--#--%s", 100, -40.2f, " String ");

The format string "%d--#--%8.2f--#--%s" has to be analysed on every call, locating each format specifier in turn. In practice the format string is usually identical from call to call, so this analysis is pure repetition. When designing FL I therefore called the analysed form of a format string its PatternList and stored it in a hash container. Before each formatting operation the format string is first looked up in that container; if its Pattern is already there, the previously analysed result is used directly.
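A minimal sketch of the idea (purely an illustration of the approach, not FL's actual classes, which are shown further below):

#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of the pattern cache: parsed results are kept in
// a hash map keyed by the format string, so a format string that repeats is
// analysed only once. SimplePattern and ParseFormat stand in for FL's real types.
struct SimplePattern { /* Start, Len, Flag, Index, Width, Precision ... */ };
typedef std::vector<SimplePattern> SimplePatternList;

const SimplePatternList& FindOrParse(const std::string& Format)
{
    static std::unordered_map<std::string, SimplePatternList> Cache;

    std::unordered_map<std::string, SimplePatternList>::iterator It = Cache.find(Format);

    if (It != Cache.end())
    {
        return It->second;                       // cache hit: reuse the analysed result
    }

    SimplePatternList& Slot = Cache[Format];     // cache miss: create an empty slot
    // Slot = ParseFormat(Format);               // ...and fill it by parsing once
    return Slot;
}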

The code below is the definition of a Pattern; the PatternList is simply an array of these:

/**
 * @brief This is the description of a Format unit
 * @example {0} {0:d}
 */
template < typename TCharType >
class TFormatPattern
{
public:
    typedef TCharType      CharType;
    typedef unsigned char  ByteType;
    typedef std::size_t    SizeType;

    enum EFormatFlag
    {
        FF_Raw,
        FF_None,
        FF_Decimal,
        FF_Exponent,
        FF_FixedPoint,
        FF_General,
        FF_CSV,
        FF_Percentage,
        FF_Hex
    };

    enum EAlignFlag
    {
        AF_Right,
        AF_Left
    };

    TFormatPattern() :
        Start((SizeType)-1),
        Len(0),
        Flag(FF_Raw),
        Align(AF_Right),
        Index((ByteType)-1),
        Precision((ByteType)-1),
        Width((ByteType)-1)
    {
    }

    SizeType GetLength() const
    {
        return Len;
    }

    bool IsValid() const
    {
        return Start != (SizeType)-1 && Len != (SizeType)-1 && Index >= 0;
    }

    bool HasWidth() const
    {
        return Width != (ByteType)-1;
    }

    bool HasPrecision() const
    {
        return Precision != (ByteType)-1;
    }

public:
    SizeType    Start;      // offset of this unit inside the format string
    SizeType    Len;        // length of this unit
    EFormatFlag Flag;       // format specifier, or FF_Raw for plain text
    EAlignFlag  Align;      // requested alignment

    ByteType    Index;      // argument index, e.g. the 0 in {0}
    ByteType    Precision;  // (ByteType)-1 when absent
    ByteType    Width;      // (ByteType)-1 when absent
};

Each Pattern represents one unit of the analysed format string.

StandardLibrary::FormatTo(str, "{0}--#--{1,8}--#--{2}", 100, -40.2f, " String ");

For this call the PatternList contains five Patterns:

    {0}     argument 0

    --#--   raw text

    {1,8}   argument 1, width 8

    --#--   raw text (a plain string)

    {2}     argument 2

This design eliminates the repeated parsing of identical format strings.
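To make the role of these Patterns concrete, here is a hypothetical sketch (not FL's actual implementation, which uses its own containers) of how a formatter walks a PatternList: raw units are copied verbatim out of the format string, and argument units dispatch to the per-type conversion using Index, Width and Precision.

#include <string>
#include <vector>

// Hypothetical illustration only: producing output from a parsed PatternList.
template < typename TCharType >
void FormatWithPatterns(std::basic_string<TCharType>& Output,
                        const TCharType* Format,
                        const std::vector< TFormatPattern<TCharType> >& Patterns)
{
    typedef TFormatPattern<TCharType> Pattern;

    for (size_t i = 0; i < Patterns.size(); ++i)
    {
        const Pattern& P = Patterns[i];

        if (P.Flag == Pattern::FF_Raw)
        {
            // Raw unit: copy the original characters of the format string.
            Output.append(Format + P.Start, P.GetLength());
        }
        else
        {
            // Argument unit: the argument at P.Index would be converted here,
            // honouring P.Width, P.Precision and P.Align.
            // AppendArgument(Output, P, ...);   // placeholder, not FL's API
        }
    }
}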

2. Optimized algorithms for converting each type to a string

This code lives entirely in Algorithm.hpp. It contains fast algorithms for converting int, double and other types to strings, measured to be faster than sprintf, atoi and the like. Optimizing these basic algorithms alone already yields a considerable performance gain.

template < typename TCharType >
inline void StringReverse(TCharType* Start, TCharType* End)
{
    TCharType Aux;

    while (Start < End)
    {
        Aux      = *End;
        *End--   = *Start;
        *Start++ = Aux;
    }
}

namespace Detail
{
    const char DigitMap[] =
    {
        '0', '1', '2', '3', '4', '5', '6',
        '7', '8', '9', 'A', 'B', 'C', 'D',
        'E', 'F'
    };
}

template < typename TCharType >
inline SIZE_T Int64ToString(INT64 Value, TCharType* Buf, INT Base)
{
    assert(Base > 0 && static_cast<SIZE_T>(Base) <= _countof(Detail::DigitMap));

    TCharType* Str = Buf;

    UINT64 UValue = (Value < 0) ? -Value : Value;

    // Conversion. Number is reversed.
    do
    {
        *Str++ = Detail::DigitMap[UValue % Base];
    } while (UValue /= Base);

    if (Value < 0)
    {
        *Str++ = '-';
    }

    *Str = '\0';

    // Reverse string
    StringReverse<TCharType>(Buf, Str - 1);

    return Str - Buf;
}

The code above shows the fast integer-to-string conversion. SSE-based conversions are said to be faster still, but I did not adopt them out of concern that compatibility issues would hurt portability.
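A quick usage sketch (the values and buffer size are mine, purely for illustration; SIZE_T and INT64 are the library's own typedefs):

#include <cstdio>

// Illustrative use of Int64ToString from Algorithm.hpp; the buffer size here
// is just a safe guess for a 64-bit value plus sign and terminator.
void Int64ToStringExample()
{
    char Buf[32];

    SIZE_T Length = Int64ToString<char>(-40, Buf, 10);   // decimal
    printf("%s (%u chars)\n", Buf, (unsigned)Length);     // prints "-40 (3 chars)"

    Int64ToString<char>(255, Buf, 16);                    // hexadecimal
    printf("%s\n", Buf);                                  // prints "FF"
}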

3. Stack containers and stack strings

This code lives in Utility.hpp. The optimization principle is simple: when the amount of dynamic memory needed is small, use stack memory directly; once the requirement grows past a threshold, automatically allocate heap memory and move the stack data onto the heap. In most cases the memory we need is small, so in the vast majority of cases this gives a very noticeable improvement.

// Array that keeps its elements in a fixed stack buffer until DEFAULT_LENGTH
// is exceeded, then transparently moves them to the heap.
template <
    typename T,
    INT DefaultLength = 0xFF,
    INT ExtraLength = 0
>
class TAutoArray
{
public:
    typedef TAutoArray<T, DefaultLength, ExtraLength> SelfType;

    enum
    {
        DEFAULT_LENGTH = DefaultLength
    };

    class ConstIterator : Noncopyable
    {
    public:
        ConstIterator(const SelfType& InRef) :
            Ref(InRef),
            Index(InRef.GetLength() > 0 ? 0 : -1)
        {
        }

        bool IsValid() const
        {
            return Index < Ref.GetLength();
        }

        bool Next()
        {
            ++Index;

            return IsValid();
        }

        const T& operator *() const
        {
            const T* Ptr = Ref.GetDataPtr();

            return Ptr[Index];
        }

    private:
        ConstIterator& operator = (const ConstIterator&);
        ConstIterator(ConstIterator&);

    protected:
        const SelfType& Ref;
        SIZE_T          Index;
    };

    TAutoArray() :
        Count(0),
        AllocatedCount(0),
        HeapValPtr(NULL)
    {
    }

    ~TAutoArray()
    {
        ReleaseHeapData();

        Count = 0;
    }

    TAutoArray(const SelfType& Other) :
        Count(Other.Count),
        AllocatedCount(Other.AllocatedCount),
        HeapValPtr(NULL)
    {
        if (Count > 0)
        {
            if (Other.IsDataOnStack())
            {
                Algorithm::CopyArray(Other.StackVal, Other.StackVal + Count, StackVal);
            }
            else
            {
                HeapValPtr = Allocate(AllocatedCount);
                Algorithm::CopyArray(Other.HeapValPtr, Other.HeapValPtr + Count, HeapValPtr);
            }
        }
    }

    SelfType& operator = (const SelfType& Other)
    {
        if (this == &Other)
        {
            return *this;
        }

        ReleaseHeapData();

        Count = Other.Count;
        AllocatedCount = Other.AllocatedCount;
        HeapValPtr = NULL;

        if (Count > 0)
        {
            if (Other.IsDataOnStack())
            {
                Algorithm::CopyArray(Other.StackVal, Other.StackVal + Count, StackVal);
            }
            else
            {
                HeapValPtr = Allocate(AllocatedCount);
                Algorithm::CopyArray(Other.HeapValPtr, Other.HeapValPtr + Count, HeapValPtr);
            }
        }

        return *this;
    }

    SelfType& TakeFrom(SelfType& Other)
    {
        if (this == &Other)
        {
            return *this;
        }

        Count = Other.Count;
        AllocatedCount = Other.AllocatedCount;
        HeapValPtr = Other.HeapValPtr;

        if (Count > 0 && Other.IsDataOnStack())
        {
            Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
        }

        Other.Count = 0;
        Other.AllocatedCount = 0;
        Other.HeapValPtr = NULL;

        return *this;
    }

    void TakeTo(SelfType& Other)
    {
        Other.TakeFrom(*this);
    }

#if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
    TAutoArray(SelfType&& Other) :
        Count(Other.Count),
        AllocatedCount(Other.AllocatedCount),
        HeapValPtr(Other.HeapValPtr)
    {
        if (Count > 0 && Other.IsDataOnStack())
        {
            Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
        }

        Other.Count = 0;
        Other.AllocatedCount = 0;
        Other.HeapValPtr = NULL;
    }

    SelfType& operator = (SelfType&& Other)
    {
        return TakeFrom(Other);
    }
#endif

    // Data is on the stack as long as no heap block has been allocated.
    bool IsDataOnStack() const
    {
        return HeapValPtr == NULL;
    }

    void AddItem(const T& InValue)
    {
        if (IsDataOnStack())
        {
            if (Count < DEFAULT_LENGTH)
            {
                StackVal[Count] = InValue;
                ++Count;
            }
            else if (Count == DEFAULT_LENGTH)
            {
                // Stack buffer is full: migrate to the heap, then append.
                InitialMoveDataToHeap();

                assert(Count < AllocatedCount);

                HeapValPtr[Count] = InValue;
                ++Count;
            }
            else
            {
                assert(false && "internal error");
            }
        }
        else
        {
            if (Count < AllocatedCount)
            {
                HeapValPtr[Count] = InValue;
                ++Count;
            }
            else
            {
                ExpandHeapSpace();

                assert(Count < AllocatedCount);
                HeapValPtr[Count] = InValue;
                ++Count;
            }
        }
    }

    SIZE_T GetLength() const
    {
        return Count;
    }

    SIZE_T GetAllocatedCount() const
    {
        return AllocatedCount;
    }

    T* GetDataPtr()
    {
        return IsDataOnStack() ? StackVal : HeapValPtr;
    }

    const T* GetDataPtr() const
    {
        return IsDataOnStack() ? StackVal : HeapValPtr;
    }

    T* GetUnusedPtr()
    {
        return IsDataOnStack() ? StackVal + Count : HeapValPtr + Count;
    }

    const T* GetUnusedPtr() const
    {
        return IsDataOnStack() ? StackVal + Count : HeapValPtr + Count;
    }

    SIZE_T GetCapacity() const
    {
        return IsDataOnStack() ?
            DEFAULT_LENGTH - Count :
            AllocatedCount - Count;
    }

    T& operator []( SIZE_T Index )
    {
        assert( Index < GetLength() );

        return GetDataPtr()[Index];
    }

    const T& operator []( SIZE_T Index ) const
    {
        assert( Index < GetLength() );

        return GetDataPtr()[Index];
    }

protected:
    // First migration from the stack buffer to a freshly allocated heap block.
    void InitialMoveDataToHeap()
    {
        assert(HeapValPtr == NULL);

        AllocatedCount = DEFAULT_LENGTH * 2;

        HeapValPtr = Allocate(AllocatedCount);

#if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
        Algorithm::MoveArray(StackVal, StackVal + Count, HeapValPtr);
#else
        Algorithm::CopyArray(StackVal, StackVal + Count, HeapValPtr);
#endif
    }

    // Grow the heap block once it is full.
    void ExpandHeapSpace()
    {
        SIZE_T NewCount = AllocatedCount * 2;
        assert(NewCount > AllocatedCount);

        T* DataPtr = Allocate(NewCount);

        assert(DataPtr);

#if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
        Algorithm::MoveArray(HeapValPtr, HeapValPtr + Count, DataPtr);
#else
        Algorithm::CopyArray(HeapValPtr, HeapValPtr + Count, DataPtr);
#endif

        ReleaseHeapData();

        HeapValPtr = DataPtr;
        AllocatedCount = NewCount;
    }

    void ReleaseHeapData()
    {
        if (HeapValPtr)
        {
            delete[] HeapValPtr;
            HeapValPtr = NULL;
        }

        AllocatedCount = 0;
    }

    static T* Allocate(const SIZE_T InAllocatedCount)
    {
        // +ExtraLength this is a hack method for saving string on it.
        return new T[InAllocatedCount + ExtraLength];
    }

protected:
    SIZE_T Count;
    SIZE_T AllocatedCount;

    // +ExtraLength this is a hack method for saving string on it.
    T      StackVal[DEFAULT_LENGTH + ExtraLength];

    T*     HeapValPtr;
};

The code above implements this idea. It is a template class, and the stack string is built on top of it. The default PatternList is also stored with this template, so after a large number of memory allocations have been saved this way, performance improves further.

// String Wrapper
template < typename TCharType >
class TAutoString :
    public TAutoArray< TCharType, 0xFF, 2 >
{
public:
    typedef TAutoArray< TCharType, 0xFF, 2 >  Super;
    typedef Mpl::TCharTraits<TCharType>       CharTraits;
    typedef TCharType                         CharType;

#if !FL_COMPILER_MSVC
    using Super::Count;
    using Super::AllocatedCount;
    using Super::HeapValPtr;
    using Super::StackVal;
    using Super::Allocate;
    using Super::IsDataOnStack;
    using Super::DEFAULT_LENGTH;
    using Super::GetDataPtr;
    using Super::ReleaseHeapData;
#endif

    TAutoString()
    {
    }

    TAutoString(const CharType* pszStr)
    {
        if (pszStr)
        {
            const SIZE_T Length = CharTraits::length(pszStr);

            Count = Length;

            if (Length <= DEFAULT_LENGTH)
            {
                CharTraits::copy(StackVal, pszStr, Length);
                StackVal[Count] = 0;
            }
            else
            {
                HeapValPtr = Allocate(Length);
                CharTraits::copy(HeapValPtr, pszStr, Length);
                HeapValPtr[Count] = 0;
            }
        }
    }

    void AddChar(CharType InValue)
    {
        AddItem(InValue);

        // Keep the buffer null-terminated after every append.
        if (IsDataOnStack())
        {
            StackVal[Count] = 0;
        }
        else
        {
            HeapValPtr[Count] = 0;
        }
    }

    void AddStr(const CharType* pszStart, const CharType* pszEnd = NULL)
    {
        const SIZE_T Length = pszEnd ? pszEnd - pszStart : CharTraits::length(pszStart);

        if (IsDataOnStack())
        {
            if (Count + Length <= DEFAULT_LENGTH)
            {
                CharTraits::copy(StackVal + Count, pszStart, Length);
                Count += Length;

                StackVal[Count] = 0;
            }
            else
            {
                // Stack buffer is too small: move everything to the heap.
                assert(!HeapValPtr);

                AllocatedCount = static_cast<SIZE_T>((Count + Length) * 1.5f);
                HeapValPtr = Allocate(AllocatedCount);
                assert(HeapValPtr);

                if (Count > 0)
                {
                    CharTraits::copy(HeapValPtr, StackVal, Count);
                }

                CharTraits::copy(HeapValPtr + Count, pszStart, Length);

                Count += Length;

                HeapValPtr[Count] = 0;
            }
        }
        else
        {
            if (Count + Length <= AllocatedCount)
            {
                CharTraits::copy(HeapValPtr + Count, pszStart, Length);
                Count += Length;

                HeapValPtr[Count] = 0;
            }
            else
            {
                // Heap block is too small: grow it and append.
                SIZE_T NewCount = static_cast<SIZE_T>((Count + Length) * 1.5f);

                CharType* DataPtr = Allocate(NewCount);

                if (Count > 0)
                {
                    CharTraits::copy(DataPtr, HeapValPtr, Count);
                }

                ReleaseHeapData();

                CharTraits::copy(DataPtr + Count, pszStart, Length);

                Count += Length;

                AllocatedCount = NewCount;
                HeapValPtr = DataPtr;

                HeapValPtr[Count] = 0;
            }
        }
    }

    const TCharType* CStr() const
    {
        return GetDataPtr();
    }

    // this is an internal function
    void InjectAdd(SIZE_T InCount)
    {
        Count += InCount;

        assert(IsDataOnStack() ? (Count <= DEFAULT_LENGTH) : (Count < AllocatedCount));
    }

protected:
    void AddItem(const TCharType& InValue)
    {
        Super::AddItem(InValue);
    }
};

Shown above is the stack string built on the stack-memory container. In most cases the result of a formatting operation is held in such a stack string, which improves performance significantly.

Stack containers and stack strings are also particularly well suited as temporary containers and temporary strings, because in most cases they optimize away the dynamic memory allocation that would otherwise be needed, so their usefulness is not limited to this one spot.
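A small usage sketch (the concrete strings are only an illustration): a temporary TAutoString is built up piece by piece, and as long as the result stays short it never touches the heap.

#include <cstdio>

// Illustrative use of TAutoString as a temporary string buffer.
// For short results everything stays in the 0xFF-character stack buffer,
// so no heap allocation happens at all.
void StackStringExample()
{
    TAutoString<char> Temp;

    Temp.AddStr("Value = ");
    Temp.AddChar('[');
    Temp.AddStr("-40.20");
    Temp.AddChar(']');

    printf("%s\n", Temp.CStr());   // prints "Value = [-40.20]"
}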

4. Optimizations based on C++11

Besides adopting the C++11 container unordered_map, the library also uses rvalue references and other new features, which can bring a further performance gain in some situations.

#if FL_PLATFORM_HAS_RIGHT_VALUE_REFERENCE
    TAutoArray(SelfType&& Other) :
        Count(Other.Count),
        AllocatedCount(Other.AllocatedCount),
        HeapValPtr(Other.HeapValPtr)
    {
        if (Count > 0 && Other.IsDataOnStack())
        {
            Algorithm::MoveArray(Other.StackVal, Other.StackVal + Count, StackVal);
        }

        Other.Count = 0;
        Other.AllocatedCount = 0;
        Other.HeapValPtr = NULL;
    }

    SelfType& operator = (SelfType&& Other)
    {
        return TakeFrom(Other);
    }
#endif

The code above shows the optimization based on rvalue references.
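For example, when a container is handed off, the move constructor above only steals the heap pointer instead of copying every element. A minimal sketch, assuming the rvalue-reference path is compiled in:

#include <utility>

// Illustrates what the move constructor buys us: for heap-backed data only
// the pointer is transferred, so no element-by-element copy is needed.
void MoveExample()
{
    TAutoArray<int> Source;

    for (int i = 0; i < 1000; ++i)   // large enough to spill onto the heap
    {
        Source.AddItem(i);
    }

    // Invokes TAutoArray(SelfType&&): Source's heap block is adopted
    // by Target and Source is left empty.
    TAutoArray<int> Target(std::move(Source));
}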

In addition, the library can use thread-local storage (TLS), which depends on compiler support. As mentioned earlier, a hash container stores the Pattern cache. In a single-threaded program nothing more needs to be considered, but once multiple threads are supported, every access to a single global hash container would have to be locked, and locking costs performance. Fortunately C++11 brings built-in TLS support: each thread keeps its own independent copy of the Pattern cache, so access needs no lock, which is clearly more efficient. The drawback is some extra memory. All of this can be switched through predefined macros, so users can decide for themselves whether to use TLS, a lock, or no multithreading support at all.
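The FL_THREAD_LOCAL macro used below is not reproduced in this post; presumably it maps onto the compiler's thread-local keyword, roughly along these lines (my assumption, not the library's verbatim definition):

// Hypothetical sketch of how an FL_THREAD_LOCAL macro is typically defined.
#if FL_WITH_THREAD_LOCAL
    #if defined(_MSC_VER) && _MSC_VER < 1900
        #define FL_THREAD_LOCAL __declspec(thread)   // older MSVC
    #else
        #define FL_THREAD_LOCAL thread_local         // standard C++11 keyword
    #endif
#else
    #define FL_THREAD_LOCAL                          // TLS disabled: plain global
#endif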

template < typename TPolicy >
class TGlobalPatternStorage :
    public TPatternStorage<TPolicy>
{
public:
    static TGlobalPatternStorage* GetStorage()
    {
#if FL_WITH_THREAD_LOCAL
        struct ManagedStorage
        {
            typedef Utility::TScopedLocker<System::CriticalSection> LockerType;

            System::CriticalSection                     ManagedCS;
            Utility::TAutoArray<TGlobalPatternStorage*> Storages;

            ~ManagedStorage()
            {
                LockerType Locker(ManagedCS);

                for (SIZE_T i = 0; i < Storages.GetLength(); ++i)
                {
                    delete Storages[i];
                }
            }

            void AddStorage(TGlobalPatternStorage* Storage)
            {
                assert(Storage);

                LockerType Locker(ManagedCS);

                Storages.AddItem(Storage);
            }
        };

        static ManagedStorage StaticManager;

        static FL_THREAD_LOCAL TGlobalPatternStorage* StaticStorage = NULL;

        if (!StaticStorage)
        {
            StaticStorage = new TGlobalPatternStorage();

            StaticManager.AddStorage(StaticStorage);
        }

        return StaticStorage;
#else
        static TGlobalPatternStorage StaticStorage;
        return &StaticStorage;
#endif
    }
};

The code above is how the project uses TLS.

Summary

Combining this whole series of optimizations keeps FL's overall efficiency at a high level, no lower than the C library functions, while also providing features that other formatting libraries lack and helping with code safety and related concerns. Below are the results of Test.cpp, where FL is the time taken using the FL library and CL is the time taken by the C library; the test also simulates a multithreaded environment.
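Test.cpp itself is not reproduced here; conceptually, each thread runs timing loops of the following shape (a simplified sketch with made-up iteration counts, not the actual test code):

#include <cstdio>
#include <ctime>
#include <string>

// Simplified sketch of the FL-vs-C-library timing comparison.
// The iteration count and argument values are placeholders.
void BenchmarkSketch()
{
    std::string str;
    char szBuf[128];

    const int Iterations = 100000;

    clock_t Start = clock();
    for (int i = 0; i < Iterations; ++i)
    {
        StandardLibrary::FormatTo(str, "{0}--#--{1,8}--#--{2}", 100, -40.2f, " String ");
    }
    printf("FLElapse:%g\n", double(clock() - Start) / CLOCKS_PER_SEC);

    Start = clock();
    for (int i = 0; i < Iterations; ++i)
    {
        sprintf(szBuf, "%d--#--%8.2f--#--%s", 100, -40.2f, " String ");
    }
    printf("CLElapse:%g\n", double(clock() - Start) / CLOCKS_PER_SEC);
}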

Output under Windows, Visual Studio 2013, Release:

0x64
Test20, -10.0050, X , X
0x64
Test20, -10.0050, X , X
1920 FLElapse:0.0762746
1920 CLElapse:0.269722
1636 FLElapse:0.0756153
7732 FLElapse:0.0766446
7956 FLElapse:0.0762051
7956 CLElapse:0.285714
1636 CLElapse:0.288648
7732 CLElapse:0.289193

Output under Mac, Xcode, Release:

99
Test20, -10.0050, X , X
18446744073709551615 FLElapse:0.0901681
18446744073709551615 CLElapse:0.19329
18446744073709551615 FLElapse:0.147378
18446744073709551615 FLElapse:0.150375
18446744073709551615 FLElapse:0.153342
18446744073709551615 CLElapse:0.303508
18446744073709551615 CLElapse:0.308418
18446744073709551615 CLElapse:0.307407

This is by no means an exhaustive test; further validation will have to come from real-world use.