CHAPTER 2
There are two styles of writing LINQ queries: the fluent style (or fluent syntax) and the query expression style (or query syntax).
The fluent style uses query operator extension methods to create queries, while the query expression style uses a different syntax that the compiler will translate into fluent syntax.
The code samples up until this point have all used the fluent syntax.
The fluent syntax makes use of the query operator extension methods defined in the static System.Linq.Enumerable class (or System.Linq.Queryable for interpreted queries, or System.Linq.ParallelEnumerable for PLINQ queries). These extension methods add query operators to instances of IEnumerable<TSource>, so any instance of a class that implements this interface can use the fluent LINQ extension methods.
Query operators can be used singly, or chained together to create more complex queries.
If the following class has been defined:
class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}
The following code will use three chained query operators: Where, OrderBy, and Select.
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=150},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Butter", Calories=200}
};

IEnumerable<string> highCalorieIngredientNamesQuery =
    ingredients.Where(x => x.Calories >= 150)
               .OrderBy(x => x.Name)
               .Select(x => x.Name);

// Enumerating the query executes it and prints each projected name
foreach (var ingredientName in highCalorieIngredientNamesQuery)
{
    Console.WriteLine(ingredientName);
}
Executing this code produces the following output:
Butter
Milk
Sugar
Figure 2 shows a graphical representation of this chain of query operators. Each operator works on the sequence provided by the preceding query operator. Notice that the initial input sequence (represented by the variable ingredients) is an IEnumerable<Ingredient>, whereas the output sequence (represented by the variable highCalorieIngredientNamesQuery) is a different type; it is an IEnumerable<string>.
In this example, the chain of operators is working with a sequence of Ingredient elements until the Select operator transforms each element in the sequence; each Ingredient object is transformed to a simple string. This transformation is called projection. Input elements are projected into transformed output elements.

Figure 2: Multiple query operators acting in a chain
The lambda expression provided to the Select query operator decides what “shape” the elements in the output sequence will take. The lambda expression x => x.Name is telling the Select operator “for each Ingredient element, output a string element with the value of the Name property from the input Ingredient.”
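As a minimal sketch of how the lambda passed to Select decides the output shape, the following program projects the same Ingredient sequence into two different shapes: plain strings and an anonymous type. The ProjectionDemo class, the Names helper, and the IsHighCalorie property are illustrative names that do not appear in the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class ProjectionDemo
{
    // Hypothetical helper: projects each Ingredient to its Name (a string).
    public static List<string> Names(IEnumerable<Ingredient> source)
    {
        return source.Select(x => x.Name).ToList();
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Egg", Calories = 100 }
        };

        // Shape 1: a sequence of strings.
        foreach (string name in Names(ingredients))
        {
            Console.WriteLine(name);
        }

        // Shape 2: a sequence of anonymous-type elements with two properties.
        var summaries = ingredients.Select(
            x => new { x.Name, IsHighCalorie = x.Calories >= 150 });

        foreach (var summary in summaries)
        {
            Console.WriteLine("{0}: {1}", summary.Name, summary.IsHighCalorie);
        }
    }
}
```

In both cases the input sequence is unchanged; only the lambda differs, and with it the element type of the output sequence.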
Note: Individual query operators will be discussed later in Chapter 3, LINQ Query Operators.
Query expressions offer a syntactical nicety on top of the fluent syntax.
The following code shows the query syntax equivalent of the preceding fluent style query.
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=150},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Butter", Calories=200}
};

IEnumerable<string> highCalorieIngredientNamesQuery =
    from i in ingredients
    where i.Calories >= 150
    orderby i.Name
    select i.Name;

foreach (var ingredientName in highCalorieIngredientNamesQuery)
{
    Console.WriteLine(ingredientName);
}
Executing this code produces the same output as the fluent syntax version:
Butter
Milk
Sugar
The steps that are performed are the same as in the fluent syntax version, with each query clause (from, where, orderby, select) passing on a modified sequence to the next query clause.
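To make the correspondence between the two styles concrete, here is a small sketch that writes the same query both ways and checks that they yield the same sequence. The TranslationDemo class and its QueryStyle and FluentStyle methods are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class TranslationDemo
{
    // Query expression style: where, orderby, and select clauses.
    public static IEnumerable<string> QueryStyle(IEnumerable<Ingredient> ingredients)
    {
        return from i in ingredients
               where i.Calories >= 150
               orderby i.Name
               select i.Name;
    }

    // Fluent style: each clause becomes a query operator call, and the
    // range variable i becomes the parameter of each lambda.
    public static IEnumerable<string> FluentStyle(IEnumerable<Ingredient> ingredients)
    {
        return ingredients.Where(i => i.Calories >= 150)
                          .OrderBy(i => i.Name)
                          .Select(i => i.Name);
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Egg", Calories = 100 },
            new Ingredient { Name = "Milk", Calories = 150 },
            new Ingredient { Name = "Flour", Calories = 50 },
            new Ingredient { Name = "Butter", Calories = 200 }
        };

        // Prints True: both styles produce Butter, Milk, Sugar.
        Console.WriteLine(QueryStyle(ingredients).SequenceEqual(FluentStyle(ingredients)));
    }
}
```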
The query expression in the preceding code begins with the from clause. The from clause has two purposes: the first is to describe what the input sequence is (in this case ingredients); the second is to introduce a range variable.
The final clause is the select clause, which describes what the output sequence will be from the entire query. Just as with the fluent syntax version, the select clause in the preceding code is projecting a sequence of Ingredient objects into a sequence of string objects.
A range variable is an identifier that represents each element in the sequence in turn. It’s similar to the variable used in a foreach statement; as the sequence is processed, this range variable represents the current element being processed. In the preceding code, the range variable i is being declared. Even though the same range variable identifier i is used in each clause, the sequence that the range variable “traverses” is different. Each clause works with the input sequence produced from the preceding clause (or the initial input IEnumerable<T>). This means that each clause is processing a different sequence; it is simply the name of the range variable identifier that is being reused.
In addition to the range variable introduced in the from clause, additional range variables can be added using other clauses or keywords. The following can introduce new range variables into a query expression:
· Additional from clauses
· The let clause
· The into keyword
· The join clause
A let clause in a LINQ query expression allows the introduction of an additional range variable. This additional range variable can then be used in other clauses that follow it.
In the following code, the let clause is used to introduce a new range variable called isDairy, which will be of type Boolean.
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=150},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Butter", Calories=200}
};

IEnumerable<Ingredient> highCalDairyQuery =
    from i in ingredients
    let isDairy = i.Name == "Milk" || i.Name == "Butter"
    where i.Calories >= 150 && isDairy
    select i;

foreach (var ingredient in highCalDairyQuery)
{
    Console.WriteLine(ingredient.Name);
}
This code produces the following output:
Milk
Butter
In the preceding code, the isDairy range variable is introduced and then used in the where clause. Notice that the original range variable i remains available to the select clause.
In this example, the new range variable is a simple scalar value, but let can also be used to introduce a subsequence. In the following code sample, the range variable ingredients is not a scalar value, but an array of strings.
string[] csvRecipes =
{
    "milk,sugar,eggs",
    "flour,BUTTER,eggs",
    "vanilla,ChEEsE,oats"
};

var dairyQuery =
    from csvRecipe in csvRecipes
    let ingredients = csvRecipe.Split(',')
    from ingredient in ingredients
    let uppercaseIngredient = ingredient.ToUpper()
    where uppercaseIngredient == "MILK" ||
          uppercaseIngredient == "BUTTER" ||
          uppercaseIngredient == "CHEESE"
    select uppercaseIngredient;

foreach (var dairyIngredient in dairyQuery)
{
    Console.WriteLine("{0} is dairy", dairyIngredient);
}
Notice in the preceding code that we are using multiple let clauses as well as additional from clauses. Using additional from clauses is another way to introduce new range variables into a query expression.
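In fluent syntax, an additional from clause corresponds to the SelectMany query operator, which flattens a sequence of subsequences into one sequence. The sketch below shows a query with two from clauses next to a roughly equivalent hand-written fluent version; the compiler's actual translation uses a different SelectMany overload, and the SelectManyDemo class and its method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectManyDemo
{
    // Two from clauses: the second ranges over each recipe's split ingredients.
    public static List<string> QueryStyle(string[] csvRecipes)
    {
        var query = from csvRecipe in csvRecipes
                    from ingredient in csvRecipe.Split(',')
                    select ingredient.ToUpper();
        return query.ToList();
    }

    // Hand-written fluent counterpart using SelectMany to flatten.
    public static List<string> FluentStyle(string[] csvRecipes)
    {
        return csvRecipes.SelectMany(csvRecipe => csvRecipe.Split(','))
                         .Select(ingredient => ingredient.ToUpper())
                         .ToList();
    }

    static void Main()
    {
        string[] csvRecipes = { "milk,sugar", "flour,BUTTER" };

        // Both calls flatten the two recipes into four uppercase ingredients.
        Console.WriteLine(string.Join(" ", QueryStyle(csvRecipes)));
        Console.WriteLine(string.Join(" ", FluentStyle(csvRecipes)));
    }
}
```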
The into keyword also allows a new identifier to be declared that can store the result of a select clause (as well as group and join clauses).
The following code demonstrates using into to capture the anonymous type created by a select clause and then using it in the remainder of the query expression.
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=150},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Butter", Calories=200}
};

IEnumerable<Ingredient> highCalDairyQuery =
    from i in ingredients
    select new // anonymous type
    {
        OriginalIngredient = i,
        IsDairy = i.Name == "Milk" || i.Name == "Butter",
        IsHighCalorie = i.Calories >= 150
    }
    into temp
    where temp.IsDairy && temp.IsHighCalorie
    // cannot write "select i;" as into hides the previous range variable i
    select temp.OriginalIngredient;

foreach (var ingredient in highCalDairyQuery)
{
    Console.WriteLine(ingredient.Name);
}
This code produces the following output:
Milk
Butter
Note that using into hides the previous range variable i. This means that i cannot be used in the final select.
The let clause, however, does not hide the previous range variable(s), meaning they can still be used later in query expressions.
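One way to picture why let keeps earlier range variables in scope is that both values are carried forward together. The sketch below pairs a let query with a hand-written fluent version that bundles the original element and the computed value into an anonymous type. It approximates what the compiler generates rather than reproducing it, and the LetDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class LetDemo
{
    // Query style: isDairy is introduced by let, and i stays in scope.
    public static List<string> WithLet(IEnumerable<Ingredient> ingredients)
    {
        var query = from i in ingredients
                    let isDairy = i.Name == "Milk" || i.Name == "Butter"
                    where i.Calories >= 150 && isDairy
                    select i.Name;
        return query.ToList();
    }

    // Fluent approximation: carry i and isDairy forward in an anonymous type.
    public static List<string> Fluent(IEnumerable<Ingredient> ingredients)
    {
        return ingredients
            .Select(i => new { i, isDairy = i.Name == "Milk" || i.Name == "Butter" })
            .Where(t => t.i.Calories >= 150 && t.isDairy)
            .Select(t => t.i.Name)
            .ToList();
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Milk", Calories = 150 },
            new Ingredient { Name = "Butter", Calories = 200 }
        };

        Console.WriteLine(string.Join(", ", WithLet(ingredients)));
        Console.WriteLine(string.Join(", ", Fluent(ingredients)));
    }
}
```

The extra anonymous type in the fluent version is the kind of bookkeeping that the let clause hides.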
The join clause takes two input sequences in which elements in either sequence do not necessarily have any direct relationship in the class domain model.
To perform a join, some value of the elements in the first sequence is compared for equality with some value of the elements in the second sequence. It is important to note here that the join clause performs equi-joins; the values from both sequences are compared for equality. This means that the join clause does not support non-equi-joins such as inequality, or comparisons such as greater-than or less-than. Because of this, rather than specifying joined elements using an operator such as == in C#, the equals keyword is used. The design thinking behind introducing this keyword is to make it very clear that joins are equi-joins.
Common types of joins include:
· Inner joins.
· Group joins.
· Left outer joins.
The following join code examples assume that the following classes have been defined:
class Recipe
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Review
{
    public int RecipeId { get; set; }
    public string ReviewText { get; set; }
}
These two classes model the fact that recipes can have zero, one, or many reviews. The Review class has a RecipeId property that holds the Id of the recipe that the Review pertains to; notice, though, that there is no direct relationship in the form of a property of type Recipe.
Inner join
An inner join returns an element in the output sequence for each item in the first sequence that has matching items in the second sequence. If an element in the first sequence does not have any matching elements in the second sequence, it will not appear in the output sequence.
Take the following code example:
Recipe[] recipes =
{
    new Recipe {Id = 1, Name = "Mashed Potato"},
    new Recipe {Id = 2, Name = "Crispy Duck"},
    new Recipe {Id = 3, Name = "Sachertorte"}
};

Review[] reviews =
{
    new Review {RecipeId = 1, ReviewText = "Tasty!"},
    new Review {RecipeId = 1, ReviewText = "Not nice :("},
    new Review {RecipeId = 1, ReviewText = "Pretty good"},
    new Review {RecipeId = 2, ReviewText = "Too hard"},
    new Review {RecipeId = 2, ReviewText = "Loved it"}
};

var query =
    from recipe in recipes
    join review in reviews on recipe.Id equals review.RecipeId
    select new // anonymous type
    {
        RecipeName = recipe.Name,
        RecipeReview = review.ReviewText
    };

foreach (var item in query)
{
    Console.WriteLine("{0} - '{1}'", item.RecipeName, item.RecipeReview);
}
In the preceding code, two input sequences are created: recipes, holding a number of Recipe objects, and reviews, holding a number of Review objects.
The query expression starts with the usual from clause pointing to the first sequence (recipes) and declaring the range variable recipe.
Next comes the use of the join clause. Here another range variable (review) is introduced that represents elements being processed in the reviews input sequence. The on keyword allows the specification of what value of the recipe range variable object is related to which value of the review range variable object. Again the equals keyword is used here to represent an equi-join. This join clause is essentially stating that reviews belong to recipes as identified by the common values of Id in Recipe and RecipeId in Review.
The result of executing this code is as follows:
Mashed Potato - 'Tasty!'
Mashed Potato - 'Not nice :('
Mashed Potato - 'Pretty good'
Crispy Duck - 'Too hard'
Crispy Duck - 'Loved it'
Notice here that the recipe “Sachertorte” does not exist in the output. This is because there are no reviews for it, i.e. there are no matching elements in the second input sequence (reviews).
Also notice that the results are “flat,” meaning there is no concept of groups of reviews belonging to a “parent” recipe.
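For comparison, the same kind of inner join can be written with the fluent Join operator, which takes the second sequence, a key selector for each side, and a result selector. This is a sketch with a reduced data set; the InnerJoinDemo class and its Describe method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Recipe
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Review
{
    public int RecipeId { get; set; }
    public string ReviewText { get; set; }
}

static class InnerJoinDemo
{
    // Fluent inner join: keys are compared for equality, as with "equals".
    public static List<string> Describe(Recipe[] recipes, Review[] reviews)
    {
        return recipes.Join(
                reviews,
                recipe => recipe.Id,        // key from the first sequence
                review => review.RecipeId,  // key from the second sequence
                (recipe, review) => recipe.Name + " - '" + review.ReviewText + "'")
            .ToList();
    }

    static void Main()
    {
        Recipe[] recipes =
        {
            new Recipe { Id = 1, Name = "Mashed Potato" },
            new Recipe { Id = 3, Name = "Sachertorte" }
        };
        Review[] reviews =
        {
            new Review { RecipeId = 1, ReviewText = "Tasty!" }
        };

        // Sachertorte has no matching review, so it is absent from the result.
        foreach (string line in Describe(recipes, reviews))
        {
            Console.WriteLine(line);
        }
    }
}
```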
Group join
A group join can produce a grouped hierarchical result where items in the second sequence are matched to items in the first sequence.
Unlike the previous inner join, the output of a group join can be organized hierarchically with reviews grouped into their related recipe.
Recipe[] recipes =
{
    new Recipe {Id = 1, Name = "Mashed Potato"},
    new Recipe {Id = 2, Name = "Crispy Duck"},
    new Recipe {Id = 3, Name = "Sachertorte"}
};

Review[] reviews =
{
    new Review {RecipeId = 1, ReviewText = "Tasty!"},
    new Review {RecipeId = 1, ReviewText = "Not nice :("},
    new Review {RecipeId = 1, ReviewText = "Pretty good"},
    new Review {RecipeId = 2, ReviewText = "Too hard"},
    new Review {RecipeId = 2, ReviewText = "Loved it"}
};

var query =
    from recipe in recipes
    join review in reviews on recipe.Id equals review.RecipeId
    into reviewGroup
    select new // anonymous type
    {
        RecipeName = recipe.Name,
        Reviews = reviewGroup // collection of related reviews
    };

foreach (var item in query)
{
    Console.WriteLine("Reviews for {0}", item.RecipeName);

    foreach (var review in item.Reviews)
    {
        Console.WriteLine(" - {0}", review.ReviewText);
    }
}
In this version, notice the addition of the into keyword. This allows the creation of hierarchical results. The reviewGroup range variable represents a sequence of reviews that match the join expression, in this case where the recipe.Id equals review.RecipeId.
To create the output sequence of groups (where each group contains the related reviews) the result of the query is projected into an anonymous type. Each instance of this anonymous type in the output sequence represents each group. The anonymous type has two properties: RecipeName coming from the element in the first sequence, and Reviews that come from the results of the join expression, i.e. the reviews that belong to the recipe.
This code produces the following output:
Reviews for Mashed Potato
- Tasty!
- Not nice :(
- Pretty good
Reviews for Crispy Duck
- Too hard
- Loved it
Reviews for Sachertorte
Notice the hierarchical output here, and also that “Sachertorte” has been included in the output this time, even though it has no associated reviews.
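A group join can also be expressed with the fluent GroupJoin operator, whose result selector receives each element of the first sequence together with its (possibly empty) group of matches. The following sketch counts reviews per recipe; the GroupJoinDemo class and its ReviewCounts method are illustrative names, not part of the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Recipe
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Review
{
    public int RecipeId { get; set; }
    public string ReviewText { get; set; }
}

static class GroupJoinDemo
{
    // Every recipe yields exactly one result, even when its group is empty.
    public static List<string> ReviewCounts(Recipe[] recipes, Review[] reviews)
    {
        return recipes.GroupJoin(
                reviews,
                recipe => recipe.Id,
                review => review.RecipeId,
                (recipe, reviewGroup) =>
                    string.Format("{0}: {1} review(s)", recipe.Name, reviewGroup.Count()))
            .ToList();
    }

    static void Main()
    {
        Recipe[] recipes =
        {
            new Recipe { Id = 1, Name = "Mashed Potato" },
            new Recipe { Id = 3, Name = "Sachertorte" }
        };
        Review[] reviews =
        {
            new Review { RecipeId = 1, ReviewText = "Tasty!" },
            new Review { RecipeId = 1, ReviewText = "Pretty good" }
        };

        foreach (string line in ReviewCounts(recipes, reviews))
        {
            Console.WriteLine(line);
        }
    }
}
```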
Left outer join
To get flat, non-hierarchical output that also includes elements in the first sequence that have no matching elements in the second sequence (in our example “Sachertorte”), the DefaultIfEmpty() query operator can be used in conjunction with an additional from clause. The additional from clause introduces a new range variable, rg, that will be null when there are no matching elements in the second sequence. The select code that creates the anonymous type projection is also modified to account for the fact that we may now have a null review.
Notice in the following code that the commented-out line RecipeReview = rg.ReviewText would throw a System.NullReferenceException when rg is null; hence the need for the null checking code.
Recipe[] recipes =
{
    new Recipe {Id = 1, Name = "Mashed Potato"},
    new Recipe {Id = 2, Name = "Crispy Duck"},
    new Recipe {Id = 3, Name = "Sachertorte"}
};

Review[] reviews =
{
    new Review {RecipeId = 1, ReviewText = "Tasty!"},
    new Review {RecipeId = 1, ReviewText = "Not nice :("},
    new Review {RecipeId = 1, ReviewText = "Pretty good"},
    new Review {RecipeId = 2, ReviewText = "Too hard"},
    new Review {RecipeId = 2, ReviewText = "Loved it"}
};

var query =
    from recipe in recipes
    join review in reviews on recipe.Id equals review.RecipeId
    into reviewGroup
    from rg in reviewGroup.DefaultIfEmpty()
    select new // anonymous type
    {
        RecipeName = recipe.Name,
        // RecipeReview = rg.ReviewText would throw System.NullReferenceException
        RecipeReview = ( rg == null ? "n/a" : rg.ReviewText )
    };

foreach (var item in query)
{
    Console.WriteLine("{0} - '{1}'", item.RecipeName, item.RecipeReview);
}
This produces the following output:
Mashed Potato - 'Tasty!'
Mashed Potato - 'Not nice :('
Mashed Potato - 'Pretty good'
Crispy Duck - 'Too hard'
Crispy Duck - 'Loved it'
Sachertorte - 'n/a'
Notice here that the results are flat, and because of the null checking code in the select, “Sachertorte” is included in the results with a review of “n/a.”
As an alternative to performing the null check in the select, the DefaultIfEmpty method can be instructed to create a new instance rather than creating a null. The following code shows this alternative query expression:
var query =
    from recipe in recipes
    join review in reviews on recipe.Id equals review.RecipeId
    into reviewGroup
    from rg in reviewGroup.DefaultIfEmpty(new Review{ReviewText = "n/a"})
    select new // anonymous type
    {
        RecipeName = recipe.Name,
        RecipeReview = rg.ReviewText
    };
Note that this code produces the same output as the previous version:
Mashed Potato - 'Tasty!'
Mashed Potato - 'Not nice :('
Mashed Potato - 'Pretty good'
Crispy Duck - 'Too hard'
Crispy Duck - 'Loved it'
Sachertorte - 'n/a'
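For readers who prefer the fluent style, a left outer join can be sketched by combining GroupJoin, SelectMany, and DefaultIfEmpty. This is one possible hand-written formulation under the same Recipe and Review classes, not necessarily what the compiler emits; the LeftOuterJoinDemo class and its Describe method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Recipe
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Review
{
    public int RecipeId { get; set; }
    public string ReviewText { get; set; }
}

static class LeftOuterJoinDemo
{
    public static List<string> Describe(Recipe[] recipes, Review[] reviews)
    {
        return recipes
            // Step 1: pair each recipe with its (possibly empty) review group.
            .GroupJoin(reviews,
                       recipe => recipe.Id,
                       review => review.RecipeId,
                       (recipe, reviewGroup) => new { recipe, reviewGroup })
            // Step 2: flatten; an empty group yields a single null review.
            .SelectMany(t => t.reviewGroup.DefaultIfEmpty(),
                        (t, rg) => t.recipe.Name + " - '" +
                                   (rg == null ? "n/a" : rg.ReviewText) + "'")
            .ToList();
    }

    static void Main()
    {
        Recipe[] recipes =
        {
            new Recipe { Id = 1, Name = "Mashed Potato" },
            new Recipe { Id = 3, Name = "Sachertorte" }
        };
        Review[] reviews =
        {
            new Review { RecipeId = 1, ReviewText = "Tasty!" }
        };

        foreach (string line in Describe(recipes, reviews))
        {
            Console.WriteLine(line);
        }
    }
}
```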
There are a number of other syntactical elements when using query expressions:
· The group clause
· The orderby clause
· The ascending and descending keywords
· The by keyword
The group clause takes a flat input sequence and produces an output sequence of groups. Another way to think of this is that the output produced is like a list of lists; because of this, nested foreach loops can be used to iterate over all the groups and group items.
The following code shows the use of the group clause and the by keyword to group together all ingredients that have the same number of calories.
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Lard", Calories=500},
    new Ingredient{Name = "Butter", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=100},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Oats", Calories=50}
};

IEnumerable<IGrouping<int, Ingredient>> query =
    from i in ingredients
    group i by i.Calories;

foreach (IGrouping<int, Ingredient> group in query)
{
    Console.WriteLine("Ingredients with {0} calories", group.Key);

    foreach (Ingredient ingredient in group)
    {
        Console.WriteLine(" - {0}", ingredient.Name);
    }
}
This code produces the following output:
Ingredients with 500 calories
- Sugar
- Lard
- Butter
Ingredients with 100 calories
- Egg
- Milk
Ingredients with 50 calories
- Flour
- Oats
In the preceding code, the first generic type parameter (int) in the IGrouping<int, Ingredient> represents the key of the group, in this example the number of calories. The second generic type parameter (Ingredient) represents the type of the list of items that have the same key (calorie value).
Also note that explicit types have been used in the code to better illustrate what types are being operated upon. The code could be simplified by using the var keyword; for example: var query = … rather than IEnumerable<IGrouping<int, Ingredient>> query = …
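The same grouping can be sketched in fluent style with the GroupBy operator and the var keyword. In this illustration the groups are additionally copied into a dictionary keyed by calorie value; the GroupByDemo class and its NamesByCalories method are assumed names that do not come from the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class GroupByDemo
{
    // Groups ingredients by Calories, then maps each key to its ingredient names.
    public static Dictionary<int, List<string>> NamesByCalories(IEnumerable<Ingredient> ingredients)
    {
        var query = ingredients.GroupBy(i => i.Calories); // IEnumerable<IGrouping<int, Ingredient>>

        return query.ToDictionary(
            g => g.Key,
            g => g.Select(i => i.Name).ToList());
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Lard", Calories = 500 },
            new Ingredient { Name = "Egg", Calories = 100 }
        };

        foreach (var pair in NamesByCalories(ingredients))
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```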
The orderby clause is used to produce a sequence sorted in either ascending (the default) or descending order.
The following code sorts a list of ingredients by name:
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Lard", Calories=500},
    new Ingredient{Name = "Butter", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=100},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Oats", Calories=50}
};

IOrderedEnumerable<Ingredient> sortedByNameQuery =
    from i in ingredients
    orderby i.Name
    select i;

foreach (var ingredient in sortedByNameQuery)
{
    Console.WriteLine(ingredient.Name);
}
This produces the following sorted output:
Butter
Egg
Flour
Lard
Milk
Oats
Sugar
To sort in descending order (the non-default order), the query can be modified, and the descending keyword added, as shown in the following modified query:
IOrderedEnumerable<Ingredient> sortedByNameQuery =
    from i in ingredients
    orderby i.Name descending
    select i;
This produces the following output:
Sugar
Oats
Milk
Lard
Flour
Egg
Butter
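An orderby clause can also list several sort keys separated by commas, each with its own optional ascending or descending keyword; in fluent syntax the secondary keys map to ThenBy and ThenByDescending. The following sketch sorts by calories (highest first) and then by name; the MultiKeySortDemo class and its method names are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class MultiKeySortDemo
{
    // Query style: primary key descending, secondary key ascending (the default).
    public static List<string> QueryStyle(IEnumerable<Ingredient> ingredients)
    {
        var query = from i in ingredients
                    orderby i.Calories descending, i.Name
                    select i.Name;
        return query.ToList();
    }

    // Fluent style: OrderByDescending for the first key, ThenBy for the second.
    public static List<string> FluentStyle(IEnumerable<Ingredient> ingredients)
    {
        return ingredients.OrderByDescending(i => i.Calories)
                          .ThenBy(i => i.Name)
                          .Select(i => i.Name)
                          .ToList();
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Lard", Calories = 500 },
            new Ingredient { Name = "Milk", Calories = 100 },
            new Ingredient { Name = "Egg", Calories = 100 }
        };

        Console.WriteLine(string.Join(", ", QueryStyle(ingredients)));
        Console.WriteLine(string.Join(", ", FluentStyle(ingredients)));
    }
}
```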
The orderby clause can also be used when grouping is being applied. In the following code, the calorie-grouped ingredients are sorted by the number of calories (which is the Key of each group).
Ingredient[] ingredients =
{
    new Ingredient{Name = "Sugar", Calories=500},
    new Ingredient{Name = "Lard", Calories=500},
    new Ingredient{Name = "Butter", Calories=500},
    new Ingredient{Name = "Egg", Calories=100},
    new Ingredient{Name = "Milk", Calories=100},
    new Ingredient{Name = "Flour", Calories=50},
    new Ingredient{Name = "Oats", Calories=50}
};

IEnumerable<IGrouping<int, Ingredient>> query =
    from i in ingredients
    group i by i.Calories
    into calorieGroup
    orderby calorieGroup.Key
    select calorieGroup;

foreach (IGrouping<int, Ingredient> group in query)
{
    Console.WriteLine("Ingredients with {0} calories", group.Key);

    foreach (Ingredient ingredient in group)
    {
        Console.WriteLine(" - {0}", ingredient.Name);
    }
}
Notice that even though the input sequence ingredients in the preceding code is in descending calorie order, the output below is sorted by the calorie value (the Key) of the group in ascending order:
Ingredients with 50 calories
- Flour
- Oats
Ingredients with 100 calories
- Egg
- Milk
Ingredients with 500 calories
- Sugar
- Lard
- Butter
The two LINQ styles can be used together; for example, fluent query operators can be mixed in with the query expression style.
Each syntax style has its own benefits.
Query expression style keyword availability
Not every query operator that is available using the fluent syntax has an equivalent query expression syntax keyword. The following query operators using the fluent style have associated keywords when using the query expression syntax:
· GroupBy
· GroupJoin
· Join
· OrderBy
· OrderByDescending
· Select
· SelectMany
· ThenBy
· ThenByDescending
· Where
Number of operators
If the query only requires a single query operator to get the required results, then a single call to a fluent style operator usually takes less code to write and is quicker for readers to comprehend.
The following code shows a query returning all ingredients over 100 calories using both the fluent style (q1) and query expression style (q2). Notice the fluent style is much terser when needing to use only a single operator.
var q1 = ingredients.Where(x => x.Calories > 100);

var q2 = from i in ingredients
         where i.Calories > 100
         select i;
Simple queries using basic operators
If the query is relatively simple and uses a few basic query operators such as Where and OrderBy, either the fluent or the query expression styles can be used successfully. The choice between the two styles in these cases is usually down to the personal preference of the programmer or the coding standards imposed by the organization/client.
Complex queries with additional range variables
Once queries become more complex and require the use of additional range variables (e.g. when using the let clause or performing joins), then query expressions are usually simpler than the equivalent fluent style version.
If a query is being written using the query expression style, it is still possible to make use of query operators that have no equivalent query syntax keyword. This is accomplished by mixing both fluent syntax and query syntax in the same query.
We have already seen an example of mixing the two styles with the use of the DefaultIfEmpty fluent style operator when we discussed left outer joins earlier in this chapter.
The following example shows mixing the Count query operator (using fluent syntax) with a query expression (inside the parentheses).
int mixedQuery = (from i in ingredients
                  where i.Calories > 100
                  select i).Count();
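The same parenthesize-then-call pattern works with other fluent operators that have no query keyword. As a brief sketch, the following program applies Take and First to a query expression; the MixedStyleDemo class and its FirstTwoNames method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Ingredient
{
    public string Name { get; set; }
    public int Calories { get; set; }
}

static class MixedStyleDemo
{
    // Query expression in parentheses, followed by fluent Take and ToList.
    public static List<string> FirstTwoNames(IEnumerable<Ingredient> ingredients)
    {
        return (from i in ingredients
                where i.Calories > 100
                orderby i.Name
                select i.Name).Take(2).ToList();
    }

    static void Main()
    {
        Ingredient[] ingredients =
        {
            new Ingredient { Name = "Sugar", Calories = 500 },
            new Ingredient { Name = "Egg", Calories = 100 },
            new Ingredient { Name = "Milk", Calories = 150 },
            new Ingredient { Name = "Butter", Calories = 200 }
        };

        Console.WriteLine(string.Join(", ", FirstTwoNames(ingredients)));

        // First returns a single element rather than a sequence.
        string first = (from i in ingredients
                        where i.Calories > 100
                        orderby i.Name
                        select i.Name).First();
        Console.WriteLine(first);
    }
}
```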