Complex EF joins and aggregation

Home / Complex EF joins and aggregation

There are a few scenarios, especially when using partitioning/windowing T-SQL functions, that EF falls down a bit. It also is not entirely straight forward to perform somewhat complex joins and aggregation.


Earlier today, I was looking at aggregating data for InvoiceDetails (line items) for a given set of Invoices. The line items have a many to many cross reference table that maps the line items to criteria groups (or categories) by the Id of the product on the line item. This is, effectively, a method to measure performance metrics based on products sold grouped into specific categories. The database looks something like this:

In T-SQL, it’s pretty easy to aggregate quantities/amounts based on the grouping the groups. The query would look like this;

SELECT COALESCE(SUM(id.Quantity), 0) Quantity, COALESCE(SUM(i.TotalAmount), 0) Vaps, g.GroupDescription Description
FROM [Commission].[InvoiceDetail] id
INNER JOIN Invoice i ON i.InvoiceId = id.InvoiceId
INNER JOIN GroupXref x ON x.ProductId = id.ProductId
INNER JOIN Group ON g.GroupId = x.GroupId
WHERE i.InvoiceDate >= DATEADD(d,0,DATEDIFF(d,0,GETDATE())) -- midnight today
AND i.InvoiceDate <  DATEADD(d,1,DATEDIFF(d,0,GETDATE())) -- midnight tomorrow
GROUP BY g.GroupCode, g.GroupDescription

However, I don’t want to run a stored procedure, view, or whatever. I want to achieve the same thing in LINQ through EF. Without navigation properties defined, this looks like it will be messy. LINQ and EF do support arbitrary joins without defined foreign keys. It is a little messy, but not horrible.

Essentially, different queries (IQueryable) can be defined and, using the fluent mapping conventions, these queries can be joined arbitrarily.

In my case, I have (3) repositories defined to go grab the data from the (4) tables. I grab these from my IoC container.

var idRepo = ((IRepository<InvoiceDetail>)GlobalConfiguration.Configuration.DependencyResolver.GetService(typeof(IRepository<InvoiceDetail>)));
var xrefRepo = ((IRepository<GroupXref>)GlobalConfiguration.Configuration.DependencyResolver.GetService(typeof(IRepository<GroupXref>)));
var gRepo = ((IRepository<Group>)GlobalConfiguration.Configuration.DependencyResolver.GetService(typeof(IRepository<Group>)));

For the xref and groups, I want them all. I’m also going to define a date range to limit which Invoices we’re aggregating.

var xrefQuery = xrefRepo.Get(x => true);
var groupQuery = gRepo.Get(x => true);

var startDate = DateTime.Today.AddDays(-1000);
var endDate = DateTime.Today.AddDays(1);

Then, to produce the same, effective, SQL as the query above, we join all the stuff:

var query = idRepo.Get(x => x.Invoice.InvoiceDate >= startDate && x.Invoice.InvoiceDate < endDate)
    .Join(xrefQuery,
        id => id.ProductId,
        x => x.ProductId,
        (id, x) => new
        {
            InvoiceDetail = td,
            PerformanceGroupXref = x
        })
    .Join(bcQuery,
        ix => ix.GroupXref.GroupId,
        g => g.GroupId,
        (ix, g) => new
        {
            InvoiceDetailWithPerfXref = ix,
            Group = g
        })
    // we can filter our results right here if we want ..
    //.Where(x => x.Group.GroupCode == "WHATEVER")
    .GroupBy(x => new { GroupCode = x.Group.GroupCode, GroupDescription = x.Group.GroupDescription })
    .Select(g => new
    {
        GroupCode = g.Key.GroupCode,
        GroupDescription = g.Key.GroupDescription,
        Quantity = g.Sum(x => x.InvoiceDetailWithGroupXref.InvoiceDetail.Quantity),
        Amount = g.Sum(x => x.InvoiceDetailWithGroupXref.InvoiceDetail.TotalAmount),
    });

// Get the list of groups
var list = query.ToList();

// Get the sum of from an arbitrary group
var quantity = query.First(x => x.GroupCode == "SOMEVALUE").Quantity.Value;

That’s it. The resulting list is a list of IGrouping which we could aggregate further if we wanted.

The generated SQL isn’t too terrible either.

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [GroupCode], 
    [GroupBy1].[K2] AS [GroupDescription], 
    [GroupBy1].[A1] AS [C2], 
    [GroupBy1].[A2] AS [C3]
    FROM ( SELECT 
        [Filter1].[K1] AS [K1], 
        [Filter1].[K2] AS [K2], 
        SUM([Filter1].[A1]) AS [A1], 
        SUM([Filter1].[A2]) AS [A2]
        FROM ( SELECT 
            [Extent4].[GroupCode] AS [K1], 
            [Extent4].[GroupDescription] AS [K2], 
            [Extent1].[Quantity] AS [A1], 
            [Extent1].[TotalAmount] / cast(10 as decimal(18)) AS [A2]
            FROM    [dbo].[InvoiceDetail] AS [Extent1]
            INNER JOIN [dbo].[Invoice] AS [Extent2] ON [Extent1].[InvoiceID] = [Extent2].[InvoiceID]
            INNER JOIN [dbo].[GroupXref] AS [Extent3] ON [Extent1].[ProductId] = [Extent3].[ProductId]
            INNER JOIN [dbo].[Group] AS [Extent4] ON [Extent3].[GroupID] = [Extent4].[GroupId]
            WHERE ([Extent2].[InvoiceDate] >= '12/4/2013 12:00:00 AM') AND ([Extent2].[InvoiceDate] < '8/31/2016 12:00:00 AM') AND (([Extent4].[GroupCode] = 'WHATEVER1') OR (([Extent4].[GroupCode] IS NULL) AND ('WHATEVER1' IS NULL)) OR ([Extent4].[GroupCode] = 'WHATEVER2') OR (([Extent4].[GroupCode] IS NULL) AND ('WHATEVER2' IS NULL)))
        )  AS [Filter1]
        GROUP BY [K1], [K2]
    )  AS [GroupBy1]

Grouping with aggregation is about the best you can get with EF. There is no way to perform PARTITION BY type queries directly with EF.

Leave a Reply