Optimizing Entity Framework

Home / Optimizing Entity Framework

In my previous post discussing profiling Entity Framework, I alluded to DbInterceptors as being able to provide lots of useful functionality. In this post I’ll expand on the DbInterceptor and show a few optimizations / manipulations that I like to perform on EF’s generated queries.


One problem that my organization faces involves database locking. As a result, we generally force the NOLOCK hints into all of our SELECT queries. An easy way to handle this is with manipulating the queries in a DbIntercepter. Using a Regular Expression, we can look for EF’s pattern for table aliases, and stick “WITH (NOLOCK)” at the appropriate points.

I like to create a CommandOptimizations.cs file with static methods for performing this optimization. The methods can then easily be called, or utilized, as needed, on a per-context basis. Here’s how the code looks:


private static Regex _tableAliasRegex = new Regex(@"(?<tableAlias>\[dbo\]\.\[([\w\d]*)\] AS \[Extent\d+\](?! WITH \(NOLOCK\)))", RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

public static string AddNoLock(string commandText)
{
	commandText = _tableAliasRegex.Replace(commandText, "${tableAlias} WITH (NOLOCK)");
	return commandText;
}

Obviously, if your schema is something other than “dbo,” the RegEx would need to be revised.

Another interesting performance deficiency I noticed at some point with EF queries occurred when querying against a large conditional set. An example LINQ expression might look like:

var arr = new string[] { "1", "2", "3", "4", ....... "999" };
var resultSet = _context.MyEntities.Where(x => arr.Contains(x.Name)).ToList();

The resulting query for this is pretty straight forward and uses an “IN” expression to represent the large condition set. This isn’t a bad approach, but it gets slow as the set size increases. Again, we can intercept the queries that EF has generated with “IN” expressions and replace them with expressions that perform a JOIN against the conditional set. We can see a pretty sizable performance increase when using JOIN instead of IN. Here’s a bit of code that will look for the IN expression and build up an appropriate JOIN against array expression.

private static Regex _clauseRegex = new Regex(@"(WHERE \[.*\].\[.*] IN \((.*)?\))", RegexOptions.Compiled | RegexOptions.IgnoreCase);
private static Regex _paramRegex = new Regex(@"\[.*?\].?\[.*?\]", RegexOptions.Compiled | RegexOptions.IgnoreCase);

public static string ReplaceInClause(string commandText)
{
    if (commandText.Contains(" IN "))
    {
        List<string[]> replaceTexts = new List<string[]>();

        var count = 0;

        var matches1 = _clauseRegex.Matches(commandText);

        if (matches1.Count > 0)
        {
            foreach (Match match in matches1)
            {
                count++;
                var originalStr = match.Value;
                var index1 = originalStr.IndexOf("(") + 1;
                var index2 = originalStr.IndexOf(")");
                var values = originalStr.Substring(index1, index2 - index1).Split(',');
                var formattedValues = values.Select(x => string.Format("({0})", x.Trim()));
                var concatValues = string.Join(",", formattedValues);
                var paramName = _paramRegex.Match(originalStr);

                var replaceText = new string[] { match.Value, string.Format(" JOIN (values {0}) V{1}(ref{1}) on V{1}.ref{1} = {2} ", concatValues, count, paramName) };
                replaceTexts.Add(replaceText);
            }

            foreach (var replaceText in replaceTexts)
            {
                commandText = commandText.Replace(replaceText[0], replaceText[1]);
            }
        }
    }

    return commandText;
}

A third, rather frustrating, limitation of EF is that it doesn’t support querying across multiple databases out of the box. The work-around I devised for this was through a combination of command interception and specifying the schema with the ModelBuilder.

For example, if we define our table mappings like so:

modelBuilder.Entity<Entity1>().ToTable("ReplaceMeEntity1").HasKey(t => t.Id);
modelBuilder.Entity<Entity2>().ToTable("ReplaceMeEntity2").HasKey(t => t.Id);
modelBuilder.Entity<Entity3>().ToTable("ReplaceMeEntity3").HasKey(t => t.Id);

We can replace those strings before query execution.

private void FixDbCommand(DbCommand command, IEnumerable<DbContext> contexts)
{
    var context = contexts.FirstOrDefault() as MyEntityContext;
    if (context != null)
    {
        command.CommandText = command.CommandText
            .Replace("[dbo].[ReplaceMeEntity1]", "[EntitiesDb1].[dbo].[Entities1]")
            .Replace("[dbo].[ReplaceMeEntity2]", "[EntitiesDb2].[dbo].[Entities2]")
            .Replace("[dbo].[ReplaceMeEntity3]", "[EntitiesDb3].[dbo].[Entities3]");
        command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText);
        command.CommandText = CommandOptimizations.AddNoLock(command.CommandText);
    }
}

Tying it all together, the SELECT query based optimizations should be executed in the DbInterceptor when a Scalar or Reader query is executed. This applies to adding NOLOCK and the IN-clause replacement. The table/database name replacement should be performed for all query operations. Here’s a roll-up of the previous code showing how to integrate the parts together.

using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Data.Entity;
using System.Data.Entity.Infrastructure.Interception;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Web.Http;

using log4net;

namespace MyApp
{
    public class MyContext : DbContext
    {
        public MyContext()
            : base("ConnStr")
        {
            Database.SetInitializer<MyContext>(null);
        }

        // Attach the interceptor
        static MyContext()
        {
            DbInterception.Add(new CommandInterceptor());
        }

        public DbSet<Entity1> Entities1 { get; set; }
        public DbSet<Entity2> Entities2 { get; set; }
        public DbSet<Entity3> Entities3 { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Entity1>().ToTable("ReplaceMeEntity1").HasKey(t => t.Id);
            modelBuilder.Entity<Entity2>().ToTable("ReplaceMeEntity2").HasKey(t => t.Id);
            modelBuilder.Entity<Entity3>().ToTable("ReplaceMeEntity3").HasKey(t => t.Id);
        }
    }

    /// <summary>
    /// Provides an interceptor to monitor EF commands and log via log4net
    /// </summary>
    public class CommandInterceptor : DbCommandInterceptor
    {
        private ILog _logger;
        private readonly Stopwatch _stopwatch = new Stopwatch();

        public CommandInterceptor()
        {
            _logger = (ILog)GlobalConfiguration.Configuration.DependencyResolver.GetService(typeof(ILog));
        }

        [ThreadStatic]
        public static bool SuppressNoLock;

        [ThreadStatic]
        public static bool SuppressInFix;

        [ThreadStatic]
        public static bool LogEntityStatsToDatabase = true;

        public override void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
        {
            FixDbCommand(command, interceptionContext.DbContexts);
            if (!SuppressNoLock && interceptionContext.DbContexts.FirstOrDefault() is MyContext)
            {
                command.CommandText = CommandOptimizations.AddNoLock(command.CommandText);
            }

            if (!SuppressInFix && interceptionContext.DbContexts.FirstOrDefault() is MyContext)
            {
                command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText);
            }
            base.ScalarExecuting(command, interceptionContext);
            _stopwatch.Restart();
        }

        public override void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
        {
            _stopwatch.Stop();
            if (interceptionContext.Exception != null)
            {
                Trace.TraceInformation("Error executing command: {0}", command.CommandText);
                if (LogEntityStatsToDatabase)
                {
                    _logger.Error(string.Format("Error executing command: {0}", command.CommandText), interceptionContext.Exception);
                }
            }
            else
            {
                Trace.TraceInformation("ScalarExecuted: {0}, Elapsed: {1}", command.CommandText, _stopwatch.Elapsed);
                if (LogEntityStatsToDatabase)
                {
                    _logger.DebugFormat("ScalarExecuted.  Elapsed: {0}, Command: {1}", _stopwatch.Elapsed, command.CommandText);
                }
            }
            base.ScalarExecuted(command, interceptionContext);
        }

        public override void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
        {
            FixDbCommand(command, interceptionContext.DbContexts);
            base.NonQueryExecuting(command, interceptionContext);
            _stopwatch.Restart();
        }

        public override void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
        {
            _stopwatch.Stop();
            if (interceptionContext.Exception != null)
            {
                Trace.TraceInformation("Error executing command: {0}", command.CommandText);
                if (LogEntityStatsToDatabase)
                {
                    _logger.Error(string.Format("Error executing command: {0}", command.CommandText), interceptionContext.Exception);
                }
            }
            else
            {
                Trace.TraceInformation("NonQueryExecuted: {0}, Elapsed: {1}", command.CommandText, _stopwatch.Elapsed);
                if (LogEntityStatsToDatabase)
                {
                    _logger.DebugFormat("NonQueryExecuted.  Elapsed: {0}, Command: {1}", _stopwatch.Elapsed, command.CommandText);
                }
            }
            base.NonQueryExecuted(command, interceptionContext);
        }

        public override void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
        {
            FixDbCommand(command, interceptionContext.DbContexts);
            if (!SuppressNoLock && interceptionContext.DbContexts.FirstOrDefault() is MyContext)
            {
                command.CommandText = CommandOptimizations.AddNoLock(command.CommandText);
            }

            if (!SuppressInFix && interceptionContext.DbContexts.FirstOrDefault() is MyContext)
            {
                command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText);
            }
            base.ReaderExecuting(command, interceptionContext);
            _stopwatch.Restart();
        }

        public override void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
        {
            _stopwatch.Stop();
            if (interceptionContext.Exception != null)
            {
                Trace.TraceInformation("Error executing command: {0}", command.CommandText);
                if (LogEntityStatsToDatabase)
                {
                    _logger.Error(string.Format("Error executing command: {0}", command.CommandText), interceptionContext.Exception);
                }
            }
            else
            {
                Trace.TraceInformation("ReaderExecuted: {0}, Elapsed: {1}", command.CommandText, _stopwatch.Elapsed);
                if (LogEntityStatsToDatabase)
                {
                    _logger.DebugFormat("ReaderExecuted.  Elapsed: {0}, Command: {1}", _stopwatch.Elapsed, command.CommandText);
                }
            }
            base.ReaderExecuted(command, interceptionContext);
        }

        private void FixDbCommand(DbCommand command, IEnumerable<DbContext> contexts)
        {
            var context = contexts.FirstOrDefault() as MyContext;
            if (context != null)
            {
                command.CommandText = command.CommandText
                    .Replace("[dbo].[ReplaceMeEntity1]", "[EntitiesDb1].[dbo].[Entities1]")
                    .Replace("[dbo].[ReplaceMeEntity2]", "[EntitiesDb2].[dbo].[Entities2]")
                    .Replace("[dbo].[ReplaceMeEntity3]", "[EntitiesDb3].[dbo].[Entities3]");
                command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText);
                command.CommandText = CommandOptimizations.AddNoLock(command.CommandText);
            }
        }
    }

    public static class CommandOptimizations
    {
        private static Regex _clauseRegex = new Regex(@"(WHERE \[.*\].\[.*] IN \((.*)?\))", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        private static Regex _paramRegex = new Regex(@"\[.*?\].?\[.*?\]", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        private static Regex _tableAliasRegex = new Regex(@"(?<tableAlias>\[dbo\]\.\[([\w\d]*)\] AS \[Extent\d+\](?! WITH \(NOLOCK\)))", RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

        /// <summary>
        /// Replaces slower IN commands with JOINs.  The performance increase is significant.
        /// </summary>
        /// <param name="command"></param>
        /// <returns></returns>
        public static string ReplaceInClause(string commandText)
        {
            if (commandText.Contains(" IN "))
            {
                List<string[]> replaceTexts = new List<string[]>();

                var count = 0;

                var matches1 = _clauseRegex.Matches(commandText);

                if (matches1.Count > 0)
                {
                    foreach (Match match in matches1)
                    {
                        count++;
                        var originalStr = match.Value;
                        var index1 = originalStr.IndexOf("(") + 1;
                        var index2 = originalStr.IndexOf(")");
                        var values = originalStr.Substring(index1, index2 - index1).Split(',');
                        var formattedValues = values.Select(x => string.Format("({0})", x.Trim()));
                        var concatValues = string.Join(",", formattedValues);
                        var paramName = _paramRegex.Match(originalStr);

                        var replaceText = new string[] { match.Value, string.Format(" JOIN (values {0}) V{1}(ref{1}) on V{1}.ref{1} = {2} ", concatValues, count, paramName) };
                        replaceTexts.Add(replaceText);
                    }

                    foreach (var replaceText in replaceTexts)
                    {
                        commandText = commandText.Replace(replaceText[0], replaceText[1]);
                    }
                }
            }

            return commandText;
        }

        public static string AddNoLock(string commandText)
        {
            commandText = _tableAliasRegex.Replace(commandText, "${tableAlias} WITH (NOLOCK)");
            return commandText;
        }
    }
}

2 thoughts on “Optimizing Entity Framework”

    1. I did – but using a general IEnumerable<<>>.Contains(…) within a LINQ expression is what causes EF to generate an IN clause. There are probably a few different ways to approach the scenario, but manipulating the query directly was generic enough and reusable to consistently make the replacement at the interception level.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.