Flexible Reporting with LINQ and C# 4.0 dynamic keyword
July 10, 2011
It’s commonly very difficult to question business people about reporting requirements. It’s not really their fault either – they just can’t know exactly what they want until they’re trying to answer a question and can’t easily do it with the reports you’ve given them.
This is why it’s good to make reports as flexible and updateable as possible, but with as little developer required to update the reports as possible.
If you’re operating in an environment where all database access must be via stored procedures, this is a really big problem. It’s really unlikely that the changes requested by business can be implemented with the same stored procedure you naïvely created for your first attempt. I’ve seen scenarios where a database has stored procedures with the suffixes GetReport, GetReport2, GetReport3, GetReport4, etc. Yuck.
Even if you’re using Entity Framework, LINQ to SQL, or some other data layer framework that enables more free-form access to the database, you can’t always ensure that all report queries will result in good execution costs and actually be performant.
This is why it can sometimes be a good idea to perform a very basic database query (via stored procedure if necessary) to get a base set of data, and then perform more conditional operations on it in memory with LINQ. It’s a pain to do a “Name Contains” filter in a stored procedure (especially if there are a dozen other options) but with LINQ it’s no big deal.
1 2 3 4
IEnumerable data = GetBaseData(); if (!String.IsNullOrEmpty(nameFilter)) data = data.Where(d => d.Name.IndexOf(nameFilter, StringComparison.OrdinalIgnoreCase) >= 0);
This is really great for simple filters, but gets difficult when we want to do more complex grouping and aggregating functions, such as grouping by Hourly/Daily/Weekly/Monthly and/or by other data points.
The remainder of this article will show how this can be done with static code, and then how we can drastically increase the maintainability of this same code by employing the dynamic keyword introduced in C# 4.0.
The .NET 3.5 Way
Let’s consider the following data type. It represents a simplified sort of reporting data you might see from an ad platform that is utilized on the web and also on mobile devices (iOS and Android).
1 2 3 4 5 6 7 8 9
public class ReportData { public DateTime Date { get; set; } public string Platform { get; set; } public string Size { get; set; } public int Views { get; set; } public int Clicks { get; set; } }
We want the ability to do any of these things independently or together:
Group Date by Hour, Day, Week, Hour, or an overall Summary
Group all platforms together
Group all sizes together
Of course, no matter what we do, we will always sum the Views and Clicks.
Let’s start with some code to generate quite a bit of random data and then show it to us:
class Program { static void Main(string[] args) { List data = CreateRandomData(); Report("Raw Data, Top 10", data.Take(10)); Console.ReadLine(); } private static List CreateRandomData() { List data = new List(); Random rand = new Random(); DateTime start = new DateTime(2011,1,1); DateTime end = new DateTime(2011, 7, 1); for (DateTime dt = start; dt < end; dt = dt.AddHours(1)) { foreach (string platform in new string[] { "Web", "iOS", "Android" }) { foreach (string size in new string[] { "Standard", "Enhanced" }) { int views = rand.Next(0, 50); int clicks = rand.Next(0, views / 3); data.Add(new ReportData { Date = dt, Platform = platform, Size = size, Views = views, Clicks = clicks }); } } } Console.WriteLine("Generated {0} Sample Rows", data.Count); return data; } private static void Report(string label, IEnumerable data) { Console.WriteLine(); Console.WriteLine(label); string format = "{0:MM/dd/yyyy HH:mm} {1,-8} {2,-8} {3,6} {4,6}"; Console.WriteLine(format, "Date ", "Platform", "Size", "Views", "Clicks"); foreach (ReportData d in data) Console.WriteLine(format,d.Date, d.Platform, d.Size, d.Views, d.Clicks); } }
This results in the following output. (The Views and Clicks will be different each time because they are generated randomly.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Generated 26064 Sample Rows Raw Data, Top 10 Date Platform Size Views Clicks 01/01/2011 00:00 Web Standard 15 2 01/01/2011 00:00 Web Enhanced 34 1 01/01/2011 00:00 iOS Standard 36 2 01/01/2011 00:00 iOS Enhanced 20 5 01/01/2011 00:00 Android Standard 44 8 01/01/2011 00:00 Android Enhanced 29 4 01/01/2011 01:00 Web Standard 45 9 01/01/2011 01:00 Web Enhanced 30 7 01/01/2011 01:00 iOS Standard 25 5 01/01/2011 01:00 iOS Enhanced 47 9
So how would we do this filtering statically? For each item we want to group by, we have to do a combination GroupBy-Select that groups on everything ELSE.
Here is the example for Platform and Size, which are a lot simpler than for Date.
private static IEnumerable ProcessVersion1(IEnumerable data, DateMode dateMode, bool platform, bool size) { if(platform) { data = data.GroupBy(d => new { Date = d.Date, Size = d.Size, }) .Select(g => new ReportData { Date = g.Key.Date, Platform = "All", Size = g.Key.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); } if (size) { data = data.GroupBy(d => new { Date = d.Date, Platform = d.Platform, }) .Select(g => new ReportData { Date = g.Key.Date, Platform = g.Key.Platform, Size = "All", Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); } // Date to come later }
So to group all the Platforms together, we must group by Date and Size, then emit new data items that have “All” for the Platform, emit the Date and Size from the grouping key, and sum the Views and Clicks. To group all Sizes together, we must group by Date and Platform, then emit new data items that have “All” for the Size, emit the Date and Platform from the grouping key, and sum the Views and Clicks.
In short, to group an axis together, you must (for the most part) focus on all of the OTHER axes.
public enum DateMode { Hourly, Daily, Weekly, Monthly, Summary, } private static IEnumerable ProcessVersion1(IEnumerable data, DateMode dateMode, bool platform, bool size) { // Platform and Size, handled previously switch (dateMode) { case DateMode.Hourly: // No need to modify the data, we're assuming it comes from // the database already grouped by Hour break; case DateMode.Daily: data = data.GroupBy(d => new { Date = d.Date.Date, Platform = d.Platform, Size = d.Size }) .Select(g => new ReportData { Date = g.Key.Date, Platform = g.Key.Platform, Size = g.Key.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); break; case DateMode.Weekly: data = data.GroupBy(d => new { Date = d.Date.Date.AddDays(-((int)d.Date.Date.DayOfWeek)), Platform = d.Platform, Size = d.Size }) .Select(g => new ReportData { Date = g.Key.Date, Platform = g.Key.Platform, Size = g.Key.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); break; case DateMode.Monthly: data = data.GroupBy(d => new { Date = new DateTime(d.Date.Year, d.Date.Month, 1), Platform = d.Platform, Size = d.Size }) .Select(g => new ReportData { Date = g.Key.Date, Platform = g.Key.Platform, Size = g.Key.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); break; case DateMode.Summary: data = data.GroupBy(d => new { Platform = d.Platform, Size = d.Size }) .Select(g => new ReportData { Date = DateTime.MinValue, Platform = g.Key.Platform, Size = g.Key.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); break; } return data; }
Can you imagine how much of a mess it would be if we had to add a new property to this mix? If we were to add another property that was similar to Platform or Size, we would need to reference it twice for each of the 4 date cases (in the GroupBy and in the Select) and then twice each for the existing Platform and Size groupings, not to mention its own grouping logic. That’s 12 edit points needed to add one property!
That simply is not maintainable, and there must be a better way.
One answer to the Stack Overflow question Dynamic LINQ GroupBy Multiple Columns suggests a way that would make this still possible without using dynamic code. It involves making an EntryGrouper class that implements IEquatable and serves as the grouping key for the GroupBy operations. While this method should work, it is by no means short, and would have to be re-implemented for every new use case. The use of the dynamic keyword I present can be reused over and over.
Introducing the dynamic keyword
C# 4.0 introduces the dynamic keyword (and indeed a whole new dynamic runtime) that enables us to declare any properties we want on an object, and it is not evaluated until runtime.
Now, this isn’t completely magical – you could cast any object to a dynamic and it would compile, but at runtime there must be some mechanism to provide the values and methods you try to use.
Most of the time, we will use System.Dynamic.ExpandoObject to accomplish this for us. ExpandoObject by itself is another statically-typed object that we cannot declare random properties on. Only when we assign it to a dynamic variable does it get this power, but under the hood, the ExpandoObject essentially stores values in a Dictionary, and we can exploit that for our purposes.
By itself, ExpandoObject does not override GetHashCode() or Equals() from the base object, and LINQ’s GroupBy method needs this in order to do its work. So first, we must create a wrapper that enables us to use the power of the dynamic keyword, compute equivalence and a hash code, and enable us to get back at the original data for the Select afterwards.
private static IEnumerable ProcessVersion2(IEnumerable data, DateMode dateMode, bool platform, bool size) { return data .GroupBy(d => { dynamic key = new ExpandoObject(); switch (dateMode) { case DateMode.Hourly: key.Date = d.Date; break; case DateMode.Daily: key.Date = d.Date.Date; break; case DateMode.Weekly: key.Date = d.Date.Date.AddDays(-((int)d.Date.DayOfWeek)); break; case DateMode.Monthly: key.Date = new DateTime(d.Date.Year, d.Date.Month, 1); break; case DateMode.Summary: key.Date = DateTime.MinValue; break; } if (!platform) key.Platform = d.Platform; if (!size) key.Size = d.Size; return new DynamicHashWrapper(key); }) .Select(g => new ReportData { Date = g.Key.Value.Date, Platform = platform ? "All" : g.Key.Value.Platform, Size = size ? "All" : g.Key.Value.Size, Views = g.Sum(d => d.Views), Clicks = g.Sum(d => d.Clicks) }); }
Now we have a single GroupBy and a single Select. By the way, grouping and selecting only once means we should get a performance improvement because we aren’t generating a bunch of intermediate objects each time we do a grouping – we create one dynamic grouping key and spin through the collection only once. Of course, the dynamic object probably has its own overhead from not being statically compiled, but the improvement in maintainability is WELL worth that small setback.
As a comparison, this new implementation takes 33 lines of code, whereas the original was 111 lines. Who wants to try to maintain a 111-line method!?
Now we can try a bunch of different groupings and see them in action.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
static void Main(string[] args) { List data = CreateRandomData(); Report("Raw Data, Top 10", data.Take(10)); Report("Group Platform", Process(data, DateMode.Hourly, true, false).Take(10)); Report("Group Size", Process(data, DateMode.Hourly, false, true).Take(10)); Report("Group Platform & Size", Process(data, DateMode.Hourly, true, true).Take(10)); Report("Group All by Day", Process(data, DateMode.Daily, true, true).Take(10)); Report("Group All by Week", Process(data, DateMode.Weekly, true, true).Take(10)); Report("Group All by Month", Process(data, DateMode.Monthly, true, true).Take(10)); Report("Group All Summary", Process(data, DateMode.Summary, true, true).Take(10)); Report("Summary By Types", Process(data, DateMode.Summary, false, false).Take(10)); Console.ReadLine(); }
Generated 26064 Sample Rows Raw Data, Top 10 Date Platform Size Views Clicks 01/01/2011 00:00 Web Standard 15 3 01/01/2011 00:00 Web Enhanced 45 5 01/01/2011 00:00 iOS Standard 9 1 01/01/2011 00:00 iOS Enhanced 6 1 01/01/2011 00:00 Android Standard 0 0 01/01/2011 00:00 Android Enhanced 7 0 01/01/2011 01:00 Web Standard 33 7 01/01/2011 01:00 Web Enhanced 1 0 01/01/2011 01:00 iOS Standard 28 7 01/01/2011 01:00 iOS Enhanced 25 5 Group Platform Date Platform Size Views Clicks 01/01/2011 00:00 All Standard 24 4 01/01/2011 00:00 All Enhanced 58 6 01/01/2011 01:00 All Standard 105 14 01/01/2011 01:00 All Enhanced 49 8 01/01/2011 02:00 All Standard 133 13 01/01/2011 02:00 All Enhanced 36 6 01/01/2011 03:00 All Standard 95 16 01/01/2011 03:00 All Enhanced 94 25 01/01/2011 04:00 All Standard 58 14 01/01/2011 04:00 All Enhanced 78 3 Group Size Date Platform Size Views Clicks 01/01/2011 00:00 Web All 60 8 01/01/2011 00:00 iOS All 15 2 01/01/2011 00:00 Android All 7 0 01/01/2011 01:00 Web All 34 7 01/01/2011 01:00 iOS All 53 12 01/01/2011 01:00 Android All 67 3 01/01/2011 02:00 Web All 55 0 01/01/2011 02:00 iOS All 42 10 01/01/2011 02:00 Android All 72 9 01/01/2011 03:00 Web All 53 11 Group Platform & Size Date Platform Size Views Clicks 01/01/2011 00:00 All All 82 10 01/01/2011 01:00 All All 154 22 01/01/2011 02:00 All All 169 19 01/01/2011 03:00 All All 189 41 01/01/2011 04:00 All All 136 17 01/01/2011 05:00 All All 142 17 01/01/2011 06:00 All All 150 15 01/01/2011 07:00 All All 181 33 01/01/2011 08:00 All All 143 22 01/01/2011 09:00 All All 208 34 Group All by Day Date Platform Size Views Clicks 01/01/2011 00:00 All All 3756 552 01/02/2011 00:00 All All 3370 479 01/03/2011 00:00 All All 3537 503 01/04/2011 00:00 All All 3298 469 01/05/2011 00:00 All All 3605 573 01/06/2011 00:00 All All 3617 518 01/07/2011 00:00 All All 3193 450 01/08/2011 00:00 All All 3485 513 01/09/2011 00:00 All All 3629 504 01/10/2011 00:00 All All 3628 533 Group All by Week Date Platform Size Views Clicks 12/26/2010 00:00 All All 3756 552 01/02/2011 00:00 All All 24105 3505 01/09/2011 00:00 All All 24585 3495 01/16/2011 00:00 All All 24618 3553 01/23/2011 00:00 All All 24875 3471 01/30/2011 00:00 All All 24845 3634 02/06/2011 00:00 All All 24528 3391 02/13/2011 00:00 All All 24561 3467 02/20/2011 00:00 All All 24280 3621 02/27/2011 00:00 All All 24614 3507 Group All by Month Date Platform Size Views Clicks 01/01/2011 00:00 All All 109101 15593 02/01/2011 00:00 All All 98057 14078 03/01/2011 00:00 All All 108231 15308 04/01/2011 00:00 All All 106527 15286 05/01/2011 00:00 All All 110096 15742 06/01/2011 00:00 All All 106354 14982 Group All Summary Date Platform Size Views Clicks 01/01/0001 00:00 All All 638366 90989 Summary By Types Date Platform Size Views Clicks 01/01/0001 00:00 Web Standard 105615 14983 01/01/0001 00:00 Web Enhanced 107740 15448 01/01/0001 00:00 iOS Standard 107203 15417 01/01/0001 00:00 iOS Enhanced 106457 15117 01/01/0001 00:00 Android Standard 104772 14840 01/01/0001 00:00 Android Enhanced 106579 15184
Summary and Code Download
With the static alternatives, we get a lot of spaghetti code that is impossible to maintain, and even if it was, the many GroupBy/Select iterations required for multiple axes result in unnecessary object creation, loops through the data collection, and eventual garbage collection.
Using the dynamic keyword and a small, reusable trick to enable the dynamic object to generate meaningful hash codes, we can loop through the data collection once. Our results are just as good, and the next time someone visits this code, they’re much less likely to screw it up.
The code in this article is fairly fragmented. DynamicReporting.cs - Click here to download the full source code.