
Time: 2021-05-26 02:16:09

The scenario is this: I have 4 specific URLs in hand. Each URL's page contains many links to other web pages, and I need to extract some information from those pages. I'm planning to use nested tasks for this job, that is, multiple tasks inside one task. Something like below.


        // Each parent task starts one attached child task per action.
        var t1Actions = new List<Action>();
        var t1 = Task.Factory.StartNew(() =>
        {
            foreach (var action in t1Actions)
                Task.Factory.StartNew(action, TaskCreationOptions.AttachedToParent);
        });

        var t2Actions = new List<Action>();
        var t2 = Task.Factory.StartNew(() =>
        {
            foreach (var action in t2Actions)
                Task.Factory.StartNew(action, TaskCreationOptions.AttachedToParent);
        });

        var t3Actions = new List<Action>();
        var t3 = Task.Factory.StartNew(() =>
        {
            foreach (var action in t3Actions)
                Task.Factory.StartNew(action, TaskCreationOptions.AttachedToParent);
        });

        var t4Actions = new List<Action>();
        var t4 = Task.Factory.StartNew(() =>
        {
            foreach (var action in t4Actions)
                Task.Factory.StartNew(action, TaskCreationOptions.AttachedToParent);
        });

        // WhenAll only returns a task; it has to be awaited to actually wait.
        await Task.WhenAll(t1, t2, t3, t4);

Here are my questions:


  1. Is this a good way to do a job like the one described above?
  2. Which is more efficient: replacing the child tasks with Parallel.Invoke(action), or leaving it as it is?
  3. How should I notify (for example, the UI) when a nested task completes? Do I have control over nested tasks?

Any advice will be helpful.


1 Answer



The actual problem isn't how to handle child tasks. It's how to get a list of URLs from some directory pages, retrieve those pages and process them.


This can be done easily using .NET's Dataflow library. Each step can be implemented as a block that reads one URL and produces an output.


  1. The first block can be a TransformManyBlock that accepts one page URL and returns a list of page URLs
  2. The second block can be a TransformBlock that accepts a single page URL and returns its contents
  3. The third block can be an ActionBlock that accepts the page and does whatever is needed with it.

For example:


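// Directory page URL in, the page URLs found on it out.
var listBlock = new TransformManyBlock<Uri,Uri>(async uri=>
{
    var content=await httpClient.GetStringAsync(uri);
    var uris=ProcessThePage(content);
    return uris;
});

// Page URL in, (URL, page contents) out.
var downloadBlock = new TransformBlock<Uri,(Uri,string)>(async uri=>
{
    var content=await httpClient.GetStringAsync(uri);
    return (uri,content);
});

var processingBlock = new ActionBlock<(Uri uri,string content)>(async msg=>
{
    //Do something
    var path=pathFromUri(msg.uri); // pathFromUri is a placeholder helper, not defined here
});

// Link the blocks into a pipeline and let completion flow downstream.
var linkOptions=new DataflowLinkOptions{PropagateCompletion=true};
listBlock.LinkTo(downloadBlock,linkOptions);
downloadBlock.LinkTo(processingBlock,linkOptions);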


Each block runs using its own Task. You can specify that a block may use more than one task, e.g. to download multiple pages concurrently.


Each block has an input and output buffer. You can specify a limit to the input buffer to avoid flooding a block with too many messages to process. If a block reaches the limit, upstream blocks will pause. This way, you could prevent, e.g., the downloadBlock from flooding a slow processingBlock with thousands of pages.
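The input-buffer limit described above can be seen in isolation with a minimal sketch. The names `BackpressureSketch`, `slowBlock` and `gate` are made up for illustration, and the gate only simulates a slow consumer. The sketch assumes a runtime where System.Threading.Tasks.Dataflow is available (on .NET Framework the NuGet package of that name may need to be added). With a bounded block, `Post` declines a message when the block is full, while `SendAsync` waits until there is room.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BackpressureSketch
{
    // Returns (first Post accepted?, second Post accepted?, items processed).
    public static async Task<(bool first, bool second, int processed)> RunAsync()
    {
        // The gate keeps the simulated slow consumer busy until it is released.
        var gate = new TaskCompletionSource<bool>();
        int processed = 0;

        var slowBlock = new ActionBlock<int>(async n =>
        {
            await gate.Task;
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        // The block has room for one message, so the first Post is accepted.
        bool first = slowBlock.Post(1);
        // Message 1 has not finished yet, so the block is full and declines.
        bool second = slowBlock.Post(2);
        // SendAsync does not give up: it completes once the block has room.
        Task<bool> pending = slowBlock.SendAsync(3);

        gate.SetResult(true);   // let the consumer make progress
        await pending;

        slowBlock.Complete();
        await slowBlock.Completion;
        return (first, second, processed);
    }
}
```

Linked blocks behave like the `SendAsync` case: the upstream block holds its output until the bounded downstream block accepts it, which is what makes the upstream block pause.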


Once you have a pipeline, you can post messages to the first block. When you're done, you can tell the block to Complete(). Each block in the pipeline will finish processing messages in its input buffer and propagate the completion call to the next linked block.


You can wait for all messages to finish by awaiting the last block's Completion task.


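var directoryPages=new Uri[]{..};

foreach(var uri in directoryPages)
{
    listBlock.Post(uri);
}
// No more input: completion propagates through the linked blocks.
listBlock.Complete();

await processingBlock.Completion;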

The ExecutionDataflowBlockOptions can be used to specify the use of multiple tasks and the input buffer limits, e.g.:


var options=new ExecutionDataflowBlockOptions { BoundedCapacity=10, MaxDegreeOfParallelism=4 };

var downloadBlock = new TransformBlock<Uri,(Uri,string)>(...,options);

This means that downloadBlock will accept up to 10 URIs before signalling the listBlock to pause. It will process up to 4 URIs concurrently.
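Putting the pieces together, the whole pipeline can be sketched end to end without touching the network. This is only an illustration under stated assumptions: `CrawlPipelineSketch`, `FakeDirectories` and `FakeDownloadAsync` are hypothetical stand-ins for the real directory pages and for `httpClient.GetStringAsync`, and the `example.test` URLs are made up. The block types, the link options and the limits of 10 buffered and 4 concurrent URIs follow the description above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CrawlPipelineSketch
{
    // Hypothetical stand-in for the directory pages: directory URL -> links on it.
    static readonly Dictionary<string, string[]> FakeDirectories = new Dictionary<string, string[]>
    {
        ["http://example.test/dir1"] = new[] { "http://example.test/a", "http://example.test/b" },
        ["http://example.test/dir2"] = new[] { "http://example.test/c" },
    };

    // Hypothetical stand-in for httpClient.GetStringAsync(uri).
    static async Task<string> FakeDownloadAsync(Uri uri)
    {
        await Task.Delay(10);
        return "content of " + uri.AbsolutePath;
    }

    public static async Task<List<string>> RunAsync()
    {
        var results = new ConcurrentBag<string>();
        var options = new ExecutionDataflowBlockOptions { BoundedCapacity = 10, MaxDegreeOfParallelism = 4 };
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        // Step 1: one directory URL in, many page URLs out.
        var listBlock = new TransformManyBlock<Uri, Uri>(
            dir => FakeDirectories[dir.AbsoluteUri].Select(link => new Uri(link)));

        // Step 2: one page URL in, (URL, contents) out, up to 4 at a time.
        var downloadBlock = new TransformBlock<Uri, (Uri uri, string content)>(
            async uri => (uri, await FakeDownloadAsync(uri)), options);

        // Step 3: do something with each page; here it is only recorded.
        var processingBlock = new ActionBlock<(Uri uri, string content)>(
            msg => results.Add(msg.uri.AbsolutePath + " => " + msg.content));

        listBlock.LinkTo(downloadBlock, linkOptions);
        downloadBlock.LinkTo(processingBlock, linkOptions);

        foreach (var dir in FakeDirectories.Keys)
            listBlock.Post(new Uri(dir));
        listBlock.Complete();

        // Completes once every block upstream has drained its buffers.
        await processingBlock.Completion;
        return results.OrderBy(s => s, StringComparer.Ordinal).ToList();
    }
}
```

Because several downloads run concurrently, results arrive in no particular order, so the sketch collects them in a thread-safe bag and sorts them before returning. Swapping the fake helpers for a real `HttpClient` and a real link extractor would not change the shape of the pipeline.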



