Apache Ignite: How can I improve insert performance?

Time: 2022-10-11 03:23:12

What else can I do, beyond using IDataStreamer and IBinaryObject, to decrease insertion time into Apache Ignite.NET? Is a significant performance increase still possible, or is this as good as it gets?

I'm using:

  • .NET
  • 41 Query Fields: 1 string field and 40 float fields per row

  • IBinaryObject / WithKeepBinary

  • IDataStreamer
  • Default JVM settings

  • Partitioned Cache
  • No Persistence

I used this example as a starting point: https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/examples/Apache.Ignite.Examples/Datagrid/DataStreamerExample.cs

Here's my usage of the IDataStreamer:

using (var ds = m_ignite.GetDataStreamer<string, IBinaryObject>(CacheName)) {
    foreach (var binaryRow in rows.Select(r => BuildRow(r))) {
        var key = binaryRow.GetField<string>(PrimaryKeyName);
        ds.AddData(key, binaryRow);
    }
}

Performance results: (5 nodes all with the same specifications)

BenchmarkDotNet=v0.10.8, OS=Windows 8.1 (6.3.9600)
Processor=Intel Xeon CPU E5-2698 v4 2.20GHz Intel Xeon CPU E5-2698 v4 2.20GHz, ProcessorCount=4
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0
  Job-UZDKMF : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0

RunStrategy=Monitoring  TargetCount=1

NumRows      Mean (ms)      Per Row (ms/row) 
10           359.50*        35.95* 
100          465.50*        4.66* 
1,000        797.80*        0.80* 
10,000       4,479.80       0.45 
100,000      37,611.60      0.38 
500,000      184,640.00     0.37 
1,000,000    366,801.40     0.37 
2,000,000    732,562.40     0.37 
4,000,000    1,458,913.60   0.36

*Measurement is larger because it also measures some lightweight work before inserting the rows
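
For reference, the "Per Row" column is simply the mean time divided by the number of rows, and the same two numbers give a throughput figure. The sketch below reproduces that arithmetic for the 4,000,000-row run, which works out to roughly 2,700 rows per second; the class and method names are made up for illustration and are not part of the benchmark code.

```csharp
using System;

public static class InsertTimings
{
    // Mean insertion time divided by row count, as in the "Per Row" column.
    public static double MsPerRow(double meanMs, long numRows)
    {
        return meanMs / numRows;
    }

    // Same measurement expressed as rows inserted per second.
    public static double RowsPerSecond(double meanMs, long numRows)
    {
        return numRows / (meanMs / 1000.0);
    }

    public static void Main()
    {
        // Values taken from the 4,000,000-row line of the table above.
        double perRow = MsPerRow(1458913.60, 4000000);
        double rate = RowsPerSecond(1458913.60, 4000000);
        Console.WriteLine("{0:F2} ms/row, {1:F0} rows/s", perRow, rate);
    }
}
```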

Any hints, tips, or documentation would be appreciated. Thank you!

1 Answer

#1



  1. Do not call GetField to retrieve the key; return it directly from BuildRow (i.e. have BuildRow return KeyValuePair<string, IBinaryObject>)

  2. Parallelise the insertion (and BuildRow calls):

    Parallel.ForEach(rows, r => 
    {
        KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
        ds.AddData(pair);
    });
    
  3. Run more Ignite nodes on more machines

  4. If the rows come from an external data source, you can make every Ignite node load only its own part. To do that, run the DataStreamer on each node via ICompute.Broadcast and, while iterating over the rows, check whether the key belongs to the current node:

    ICacheAffinity aff = m_ignite.GetAffinity(cacheName);
    IClusterNode localNode = m_ignite.GetCluster().GetLocalNode();
    Parallel.ForEach(rows, r => 
    {
        string key = GetKey(r);
        if (aff.IsPrimary(localNode, key))
        {
            KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
            ds.AddData(pair);
        }
    });
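
Putting points 1, 2 and 4 together, a broadcast action might look roughly like the sketch below. It is only an illustration under several assumptions: SourceRow, LoadRows, the cache name "rows", the binary type name "Row" and the field names "Id" and "F0".."F39" are all hypothetical, and the class has to be available on every node for the broadcast to work. The streamer is created the same way as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Binary;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cluster;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Datastream;
using Apache.Ignite.Core.Resource;

// Hypothetical source row: one string key plus 40 float values.
public class SourceRow
{
    public string Key;
    public float[] Values;
}

[Serializable]
public class LoadLocalRowsAction : IComputeAction
{
    private const string CacheName = "rows";   // assumed cache name
    private const string RowTypeName = "Row";  // assumed binary type name

    // Ignite injects the local IIgnite instance on the node running the action.
    [InstanceResource] private IIgnite _ignite;

    public void Invoke()
    {
        ICacheAffinity aff = _ignite.GetAffinity(CacheName);
        IClusterNode localNode = _ignite.GetCluster().GetLocalNode();
        IBinary binary = _ignite.GetBinary();

        using (IDataStreamer<string, IBinaryObject> ds =
            _ignite.GetDataStreamer<string, IBinaryObject>(CacheName))
        {
            // Point 2: build and add rows in parallel.
            Parallel.ForEach(LoadRows(), r =>
            {
                // Point 4: skip keys whose primary copy lives on another node.
                if (aff.IsPrimary(localNode, r.Key))
                {
                    ds.AddData(BuildRow(binary, r));
                }
            });
        } // Leaving the using block closes the streamer.
    }

    // Point 1: return the key together with the binary object,
    // so the caller never has to read it back with GetField.
    private static KeyValuePair<string, IBinaryObject> BuildRow(IBinary binary, SourceRow r)
    {
        IBinaryObjectBuilder builder = binary.GetBuilder(RowTypeName).SetField("Id", r.Key);
        for (int i = 0; i < r.Values.Length; i++)
        {
            builder.SetField("F" + i, r.Values[i]);
        }
        return new KeyValuePair<string, IBinaryObject>(r.Key, builder.Build());
    }

    // Placeholder: read the rows from the external data source here.
    private static IEnumerable<SourceRow> LoadRows()
    {
        yield break;
    }
}
```

The action would then be started once from any node with something like m_ignite.GetCompute().Broadcast(new LoadLocalRowsAction()). Note that every node still reads the full source in this sketch; only the building and streaming of rows is split across nodes.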
    
