LINQ Succinctly®
by Jason Roberts

CHAPTER 6

Parallel LINQ


Overview

The area of parallel programming is vast and potentially complicated. This chapter introduces parallel LINQ (PLINQ) as a method of reducing processing time when executing LINQ queries.

PLINQ is a higher-level abstraction that sits above various lower-level .NET multithreading components. It aims to hide the details of multithreading while still offering the familiar LINQ query semantics.

Note: Although PLINQ is at a higher level of abstraction, the programmer still needs to have a basic working knowledge of multithreaded programming.

When a LINQ query is converted to a PLINQ query, PLINQ takes care of the following lower-level aspects of parallelization:

·     Splitting the query work (input sequence) into a number of smaller sub-segments.

·     Executing the query code against each sub-segment on different threads.

·     Once all sub-segments have been processed, reassembling the results from the sub-segments into a single output sequence.

In this way, PLINQ may help reduce overall query processing time by utilizing multiple cores of the machine on which it is executing. It should also be noted that PLINQ works on local queries, as opposed to remote interpreted queries.
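The three steps listed above can be sketched by hand to make them concrete. The following is only an illustration of the idea, not how PLINQ is implemented internally: the fixed segmentCount, the even-number filter, and the use of Task.Run are all assumptions made for this example. It is written as top-level statements (C# 9 or later); in older versions the statements would go inside a Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

int[] input = Enumerable.Range(1, 20).ToArray();
int segmentCount = 4; // hypothetical fixed number of sub-segments

// 1. Split the input sequence into sub-segments.
int segmentSize = (input.Length + segmentCount - 1) / segmentCount;
List<int[]> segments = Enumerable.Range(0, segmentCount)
    .Select(i => input.Skip(i * segmentSize).Take(segmentSize).ToArray())
    .ToList();

// 2. Run the query code against each sub-segment on thread-pool threads.
Task<int[]>[] tasks = segments
    .Select(seg => Task.Run(() => seg.Where(x => x % 2 == 0).ToArray()))
    .ToArray();
Task.WaitAll(tasks);

// 3. Reassemble the per-segment results into a single output sequence.
int[] manualResult = tasks.SelectMany(t => t.Result).ToArray();

// The PLINQ query that expresses the same filter in one line.
int[] plinqResult = input.AsParallel().Where(x => x % 2 == 0).ToArray();

Console.WriteLine("Manual: " + string.Join(" ", manualResult));
Console.WriteLine("PLINQ element count: " + plinqResult.Length);
```

Both versions produce the same set of elements; the PLINQ version leaves the splitting, threading, and reassembly to the library.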

Not all queries will benefit from PLINQ. Depending on the number of input elements and the amount of processing required, a PLINQ query may actually take longer to run because of the overhead of splitting, threading, and reassembly. As with all performance-related work, measure and profile first and then make methodical tuning adjustments, rather than indiscriminately turning every LINQ query in the code base into a PLINQ query.
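One way to act on this advice is to time the same query with and without AsParallel at more than one input size. The sketch below is a rough starting point under stated assumptions: TimeMs is a hypothetical helper written for this example, the two input sizes are arbitrary, and a single Stopwatch run is a crude measurement compared with proper profiling. No expected timings are shown because they depend on the machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: runs the action once and returns the elapsed milliseconds.
long TimeMs(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.ElapsedMilliseconds;
}

// Arbitrary sizes chosen for illustration: one small input, one larger input.
foreach (int size in new[] { 1_000, 1_000_000 })
{
    int[] data = Enumerable.Range(1, size).ToArray();

    long linqMs = TimeMs(() => data.Where(x => x.ToString().Contains("3")).ToArray());
    long plinqMs = TimeMs(() => data.AsParallel().Where(x => x.ToString().Contains("3")).ToArray());

    Console.WriteLine("{0} elements: LINQ {1} ms, PLINQ {2} ms", size, linqMs, plinqMs);
}
```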

It should also be noted that simply using PLINQ is no guarantee that the query will actually execute in parallel, because not all query operators can be parallelized. Even with query operators that are parallelizable, PLINQ may still decide at execution time to run them sequentially if it determines that doing so may perform better.
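If measurements show that PLINQ's sequential fallback is not wanted for a particular query, the WithExecutionMode method of ParallelEnumerable can request parallel execution. The following is a minimal sketch; the input range and the multiple-of-three filter are arbitrary choices for illustration, and forcing parallelism is not necessarily faster.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Ask PLINQ to parallelize the query even if it would otherwise
// choose to run it sequentially.
int[] forced = numbers
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Where(x => x % 3 == 0)
    .ToArray();

Console.WriteLine(forced.Length); // 333 multiples of 3 between 1 and 1000
```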

Applying PLINQ

To turn a normal LINQ query into a PLINQ query, the AsParallel method is added to the query. This extension method is defined in the System.Linq.ParallelEnumerable class. When the AsParallel method is added to a query, it essentially “switches on” PLINQ for that query.

The following code shows the method signature for the AsParallel method.

public static ParallelQuery<TSource> AsParallel<TSource>(
                                            this IEnumerable<TSource> source)

Notice that the AsParallel extension method extends IEnumerable<TSource>. Because local LINQ queries operate on IEnumerable<T> sequences, the method can be added to regular local LINQ queries. The other important thing to note in the preceding code is that the return type is ParallelQuery<TSource>. It is this different return type that “switches on” PLINQ: because the query “stream” has been converted from an IEnumerable<TSource> to a ParallelQuery<TSource>, the PLINQ versions of the query operators can now be applied to the query.

To illustrate this, the following code shows the differences in the signature of the Where operator between regular local LINQ and PLINQ.

// Where query operator from System.Linq.Enumerable
public static IEnumerable<TSource> Where<TSource>(
             this IEnumerable<TSource> source, Func<TSource, bool> predicate)

// Where query operator from System.Linq.ParallelEnumerable
public static ParallelQuery<TSource> Where<TSource>(
           this ParallelQuery<TSource> source, Func<TSource, bool> predicate)

Notice in the preceding code that the second (PLINQ) version of the Where query operator extends ParallelQuery<TSource>, not IEnumerable<TSource>.

So once we have switched a regular LINQ query to PLINQ using AsParallel, subsequent query operators bind to the PLINQ versions and execute in parallel where possible, unless PLINQ decides that sequential (non-parallel) execution would be better.
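The effect of the return type on operator binding can be seen by declaring the query variables with explicit types. This is a small sketch with an arbitrary six-element array; the variable names are chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = { 1, 2, 3, 4, 5, 6 };

// Binds to Enumerable.Where because the source is an IEnumerable<int>.
IEnumerable<int> sequential = source.Where(x => x > 3);

// Binds to ParallelEnumerable.Where because AsParallel returns ParallelQuery<int>.
ParallelQuery<int> parallel = source.AsParallel().Where(x => x > 3);

Console.WriteLine(sequential is ParallelQuery<int>); // False
Console.WriteLine(parallel.Count());                 // 3
```

Swapping the declared types would not compile for the first query, because Enumerable.Where does not return a ParallelQuery<int>.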

The following code shows two queries that both filter the input sequence of numbers, keeping those whose string representation contains the string “3”. The filter is chosen simply to create enough CPU load to produce meaningful timings. The second version of the query uses PLINQ to parallelize the query. In both queries the output sequence is ignored, and ToArray is used to force the query to run. The execution time for each query is captured using a Stopwatch.

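// Requires: using System; using System.Diagnostics; using System.Linq;
var someNumbers = Enumerable.Range(1, 10000000).ToArray();

// Time the regular (sequential) LINQ query; ToArray forces execution.
var sw = new Stopwatch();
sw.Start();
someNumbers.Where(x => x.ToString().Contains("3")).ToArray();
sw.Stop();
Console.WriteLine("Non PLINQ query took {0} ms", sw.ElapsedMilliseconds);

// Reset the stopwatch and time the same query with PLINQ switched on.
sw.Restart();
someNumbers.AsParallel().Where(x => x.ToString().Contains("3")).ToArray();
sw.Stop();
Console.WriteLine("PLINQ query took {0} ms", sw.ElapsedMilliseconds);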

This produces the following output:

Non PLINQ query took 3218 ms
PLINQ query took 1494 ms

We can see from these results that, in this run, the PLINQ version of the query executed about twice as fast as the non-PLINQ version. Actual timings will vary with the machine and the number of available cores.

Output Element Ordering

When a PLINQ query executes in parallel, the input sequence is broken up into sub-segments, so once each sub-segment has been processed, it needs to be added into the overall resulting output sequence.

While regular LINQ queries preserve the order of the input elements in the output sequence, PLINQ queries may not return elements in the same order as the input.

The following code demonstrates this by performing a simple PLINQ Select query that returns each of ten input numbers unchanged. The numbers in the input and output sequences are both written to the console for comparison.

var inputNumbers = Enumerable.Range(1, 10).ToArray();

Console.WriteLine("Input numbers");
foreach (var num in inputNumbers)
{
    Console.Write(num + " ");
}

var outputNumbers = inputNumbers.AsParallel().Select(x => x);

Console.WriteLine();
Console.WriteLine("Output numbers");
foreach (var num in outputNumbers)
{
    Console.Write(num + " ");
}


This produces the following output:

Input numbers
1 2 3 4 5 6 7 8 9 10

Output numbers
1 4 7 9 2 5 8 10 3 6

Notice the output sequence element ordering does not match the input ordering.

To force the output sequence to be in the same order as the input sequence, the AsOrdered PLINQ extension method can be used, as the following code demonstrates.

var inputNumbers = Enumerable.Range(1, 10).ToArray();

var outputNumbers = inputNumbers.AsParallel().AsOrdered().Select(x => x);

Console.WriteLine("Output numbers");
foreach (var num in outputNumbers)
{
    Console.Write(num + " ");
}

This produces the following output:

Output numbers
1 2 3 4 5 6 7 8 9 10

Notice that the output elements are now in the same order as the input elements. PLINQ has to do extra work to track the position of input elements when AsOrdered is used, so there may be a performance cost, depending on the size of the input sequence. If only part of the query requires order preservation, AsOrdered can be applied to just that part; the default unordered behavior can then be restored by calling the AsUnordered extension method. Query operators that follow AsUnordered do not track the input ordering of elements.
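As a minimal sketch of combining the two methods, the following query preserves order only for the Take operator and then drops order tracking for the Select that follows. The input range and the doubling projection are arbitrary choices for illustration.

```csharp
using System;
using System.Linq;

int[] input = Enumerable.Range(1, 100).ToArray();

int[] result = input
    .AsParallel()
    .AsOrdered()         // order matters here: Take must keep the first ten inputs
    .Take(10)
    .AsUnordered()       // order no longer matters for the projection
    .Select(x => x * 2)
    .ToArray();

// The result contains the doubled values of 1 to 10, in no guaranteed order.
Console.WriteLine(string.Join(" ", result.OrderBy(x => x)));
```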

Potential PLINQ Problems

There are a number of things to be aware of when using PLINQ:

·     PLINQ may not always be faster than LINQ

·     Avoid writing to shared memory (such as static variables)

·     Do not call non-thread-safe methods from PLINQ

·     Calling thread-safe methods may incur locking/synchronization overheads

For further information and a more comprehensive list, see the MSDN documentation at https://msdn.microsoft.com/en-us/library/dd997403%28v=vs.110%29.aspx.
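The shared-memory and synchronization points in the list above can be illustrated with three ways of summing a sequence. This is a sketch for illustration only: the variable names and the input range are assumptions, and the unsynchronized total may or may not come out wrong on any given run, which is exactly what makes this kind of bug hard to spot.

```csharp
using System;
using System.Linq;
using System.Threading;

int[] numbers = Enumerable.Range(1, 100_000).ToArray();

// Unsafe: several threads update the same variable without synchronization,
// so some updates can be lost and the total is not reliable.
long unsafeTotal = 0;
numbers.AsParallel().ForAll(x => unsafeTotal += x);

// Thread-safe, but every element pays a synchronization cost.
long interlockedTotal = 0;
numbers.AsParallel().ForAll(x => Interlocked.Add(ref interlockedTotal, x));

// Usually preferable: no shared state, PLINQ combines per-thread partial sums.
long safeTotal = numbers.AsParallel().Sum(x => (long)x);

Console.WriteLine("Unsynchronized: " + unsafeTotal);
Console.WriteLine("Interlocked:    " + interlockedTotal);
Console.WriteLine("PLINQ Sum:      " + safeTotal);
```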

Mixing LINQ and PLINQ

A single query can execute some parts in standard sequential mode and other parts in parallel. This gives the query author flexibility: non-thread-safe parts of the query (or parts where parallelization would run more slowly) can run sequentially, while other parts run in parallel.

While the AsParallel method switches the query into PLINQ mode, the companion AsSequential extension method switches back to regular LINQ. 

The following code shows the method signature of the AsSequential method. Notice that the extension method works on a ParallelQuery<TSource> input sequence and returns a standard IEnumerable<TSource>. Because the return type is a normal IEnumerable<TSource>, subsequent query operators will bind to the standard local LINQ operators, not the PLINQ ones.

public static IEnumerable<TSource> AsSequential<TSource>(
                                          this ParallelQuery<TSource> source)

The following code demonstrates how to switch between LINQ and PLINQ in a single query.

IEnumerable<int> inputNumbers = Enumerable.Range(1, 10).ToArray();

IEnumerable<int> outputNumbers = inputNumbers
                                   .AsParallel()
                                   .Select(x => x) // PLINQ version of Select
                                   .AsSequential()
                                   .Select(x => x); // LINQ version of Select
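To show why switching back can be useful, the following sketch squares the numbers in parallel and then records each result in a List<string>, which is not safe for concurrent writes. The log list and the squaring projection are hypothetical additions for this example, and a side effect inside Select is used only to make the sequential stage visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>(); // List<T> is not safe for concurrent writes

int[] squares = Enumerable.Range(1, 10)
    .AsParallel()
    .Select(x => x * x)      // PLINQ version of Select, may run on several threads
    .AsSequential()
    .Select(x =>             // LINQ version of Select, runs on the consuming thread
    {
        log.Add("saw " + x);
        return x;
    })
    .ToArray();

Console.WriteLine("Results: " + squares.Length + ", log entries: " + log.Count);
```

Because the second Select binds to the standard LINQ operator, the calls to log.Add are not made concurrently.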
