In my previous post discussing profiling Entity Framework, I alluded to DbInterceptors as being able to provide lots of useful functionality. In this post I’ll expand on the DbInterceptor and show a few optimizations / manipulations that I like to perform on EF’s generated queries.
One problem that my organization faces involves database locking. As a result, we generally force the NOLOCK hints into all of our SELECT queries. An easy way to handle this is with manipulating the queries in a DbIntercepter. Using a Regular Expression, we can look for EF’s pattern for table aliases, and stick “WITH (NOLOCK)” at the appropriate points.
I like to create a CommandOptimizations.cs file with static methods for performing this optimization. The methods can then easily be called, or utilized, as needed, on a per-context basis. Here’s how the code looks:
1 2 3 4 5 6 7 | private static Regex _tableAliasRegex = new Regex( @"(?<tableAlias>\[dbo\]\.\[([\w\d]*)\] AS \[Extent\d+\](?! WITH \(NOLOCK\)))" , RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase); public static string AddNoLock( string commandText) { commandText = _tableAliasRegex.Replace(commandText, "${tableAlias} WITH (NOLOCK)" ); return commandText; } |
Obviously, if your schema is something other than “dbo,” the RegEx would need to be revised.
Another interesting performance deficiency I noticed at some point with EF queries occurred when querying against a large conditional set. An example LINQ expression might look like:
1 2 | var arr = new string [] { "1" , "2" , "3" , "4" , ....... "999" }; var resultSet = _context.MyEntities.Where(x => arr.Contains(x.Name)).ToList(); |
The resulting query for this is pretty straight forward and uses an “IN” expression to represent the large condition set. This isn’t a bad approach, but it gets slow as the set size increases. Again, we can intercept the queries that EF has generated with “IN” expressions and replace them with expressions that perform a JOIN against the conditional set. We can see a pretty sizable performance increase when using JOIN instead of IN. Here’s a bit of code that will look for the IN expression and build up an appropriate JOIN against array expression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | private static Regex _clauseRegex = new Regex( @"(WHERE \[.*\].\[.*] IN \((.*)?\))" , RegexOptions.Compiled | RegexOptions.IgnoreCase); private static Regex _paramRegex = new Regex( @"\[.*?\].?\[.*?\]" , RegexOptions.Compiled | RegexOptions.IgnoreCase); public static string ReplaceInClause( string commandText) { if (commandText.Contains( " IN " )) { List< string []> replaceTexts = new List< string []>(); var count = 0; var matches1 = _clauseRegex.Matches(commandText); if (matches1.Count > 0) { foreach (Match match in matches1) { count++; var originalStr = match.Value; var index1 = originalStr.IndexOf( "(" ) + 1; var index2 = originalStr.IndexOf( ")" ); var values = originalStr.Substring(index1, index2 - index1).Split( ',' ); var formattedValues = values.Select(x => string .Format( "({0})" , x.Trim())); var concatValues = string .Join( "," , formattedValues); var paramName = _paramRegex.Match(originalStr); var replaceText = new string [] { match.Value, string .Format( " JOIN (values {0}) V{1}(ref{1}) on V{1}.ref{1} = {2} " , concatValues, count, paramName) }; replaceTexts.Add(replaceText); } foreach ( var replaceText in replaceTexts) { commandText = commandText.Replace(replaceText[0], replaceText[1]); } } } return commandText; } |
A third, rather frustrating, limitation of EF is that it doesn’t support querying across multiple databases out of the box. The work-around I devised for this was through a combination of command interception and specifying the schema with the ModelBuilder.
For example, if we define our table mappings like so:
1 2 3 | modelBuilder.Entity<Entity1>().ToTable( "ReplaceMeEntity1" ).HasKey(t => t.Id); modelBuilder.Entity<Entity2>().ToTable( "ReplaceMeEntity2" ).HasKey(t => t.Id); modelBuilder.Entity<Entity3>().ToTable( "ReplaceMeEntity3" ).HasKey(t => t.Id); |
We can replace those strings before query execution.
1 2 3 4 5 6 7 8 9 10 11 12 13 | private void FixDbCommand(DbCommand command, IEnumerable<DbContext> contexts) { var context = contexts.FirstOrDefault() as MyEntityContext; if (context != null ) { command.CommandText = command.CommandText .Replace( "[dbo].[ReplaceMeEntity1]" , "[EntitiesDb1].[dbo].[Entities1]" ) .Replace( "[dbo].[ReplaceMeEntity2]" , "[EntitiesDb2].[dbo].[Entities2]" ) .Replace( "[dbo].[ReplaceMeEntity3]" , "[EntitiesDb3].[dbo].[Entities3]" ); command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText); command.CommandText = CommandOptimizations.AddNoLock(command.CommandText); } } |
Tying it all together, the SELECT query based optimizations should be executed in the DbInterceptor when a Scalar or Reader query is executed. This applies to adding NOLOCK and the IN-clause replacement. The table/database name replacement should be performed for all query operations. Here’s a roll-up of the previous code showing how to integrate the parts together.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 | using System; using System.Collections.Generic; using System.Data.Common; using System.Data.Entity; using System.Data.Entity.Infrastructure.Interception; using System.Diagnostics; using System.Linq; using System.Text; using System.Text.RegularExpressions; using System.Threading.Tasks; using System.Web.Http; using log4net; namespace MyApp { public class MyContext : DbContext { public MyContext() : base ( "ConnStr" ) { Database.SetInitializer<MyContext>( null ); } // Attach the interceptor static MyContext() { DbInterception.Add( new CommandInterceptor()); } public DbSet<Entity1> Entities1 { get ; set ; } public DbSet<Entity2> Entities2 { get ; set ; } public DbSet<Entity3> Entities3 { get ; set ; } protected override void OnModelCreating(DbModelBuilder modelBuilder) { modelBuilder.Entity<Entity1>().ToTable( "ReplaceMeEntity1" ).HasKey(t => t.Id); modelBuilder.Entity<Entity2>().ToTable( "ReplaceMeEntity2" ).HasKey(t => t.Id); modelBuilder.Entity<Entity3>().ToTable( "ReplaceMeEntity3" ).HasKey(t => t.Id); } } /// <summary> /// Provides an interceptor to monitor EF commands and log via log4net /// </summary> public class CommandInterceptor : DbCommandInterceptor { private ILog _logger; private readonly Stopwatch _stopwatch = new Stopwatch(); public CommandInterceptor() { _logger = (ILog)GlobalConfiguration.Configuration.DependencyResolver.GetService( typeof (ILog)); } [ThreadStatic] public static bool SuppressNoLock; [ThreadStatic] public static bool SuppressInFix; [ThreadStatic] public static bool LogEntityStatsToDatabase = true ; public override void ScalarExecuting(DbCommand command, DbCommandInterceptionContext< object > interceptionContext) { FixDbCommand(command, interceptionContext.DbContexts); if (!SuppressNoLock && interceptionContext.DbContexts.FirstOrDefault() is MyContext) { command.CommandText = CommandOptimizations.AddNoLock(command.CommandText); } if (!SuppressInFix && interceptionContext.DbContexts.FirstOrDefault() is MyContext) { command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText); } base .ScalarExecuting(command, interceptionContext); _stopwatch.Restart(); } public override void ScalarExecuted(DbCommand command, DbCommandInterceptionContext< object > interceptionContext) { _stopwatch.Stop(); if (interceptionContext.Exception != null ) { Trace.TraceInformation( "Error executing command: {0}" , command.CommandText); if (LogEntityStatsToDatabase) { _logger.Error( string .Format( "Error executing command: {0}" , command.CommandText), interceptionContext.Exception); } } else { Trace.TraceInformation( "ScalarExecuted: {0}, Elapsed: {1}" , command.CommandText, _stopwatch.Elapsed); if (LogEntityStatsToDatabase) { _logger.DebugFormat( "ScalarExecuted. Elapsed: {0}, Command: {1}" , _stopwatch.Elapsed, command.CommandText); } } base .ScalarExecuted(command, interceptionContext); } public override void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext< int > interceptionContext) { FixDbCommand(command, interceptionContext.DbContexts); base .NonQueryExecuting(command, interceptionContext); _stopwatch.Restart(); } public override void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext< int > interceptionContext) { _stopwatch.Stop(); if (interceptionContext.Exception != null ) { Trace.TraceInformation( "Error executing command: {0}" , command.CommandText); if (LogEntityStatsToDatabase) { _logger.Error( string .Format( "Error executing command: {0}" , command.CommandText), interceptionContext.Exception); } } else { Trace.TraceInformation( "NonQueryExecuted: {0}, Elapsed: {1}" , command.CommandText, _stopwatch.Elapsed); if (LogEntityStatsToDatabase) { _logger.DebugFormat( "NonQueryExecuted. Elapsed: {0}, Command: {1}" , _stopwatch.Elapsed, command.CommandText); } } base .NonQueryExecuted(command, interceptionContext); } public override void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { FixDbCommand(command, interceptionContext.DbContexts); if (!SuppressNoLock && interceptionContext.DbContexts.FirstOrDefault() is MyContext) { command.CommandText = CommandOptimizations.AddNoLock(command.CommandText); } if (!SuppressInFix && interceptionContext.DbContexts.FirstOrDefault() is MyContext) { command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText); } base .ReaderExecuting(command, interceptionContext); _stopwatch.Restart(); } public override void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { _stopwatch.Stop(); if (interceptionContext.Exception != null ) { Trace.TraceInformation( "Error executing command: {0}" , command.CommandText); if (LogEntityStatsToDatabase) { _logger.Error( string .Format( "Error executing command: {0}" , command.CommandText), interceptionContext.Exception); } } else { Trace.TraceInformation( "ReaderExecuted: {0}, Elapsed: {1}" , command.CommandText, _stopwatch.Elapsed); if (LogEntityStatsToDatabase) { _logger.DebugFormat( "ReaderExecuted. Elapsed: {0}, Command: {1}" , _stopwatch.Elapsed, command.CommandText); } } base .ReaderExecuted(command, interceptionContext); } private void FixDbCommand(DbCommand command, IEnumerable<DbContext> contexts) { var context = contexts.FirstOrDefault() as MyContext; if (context != null ) { command.CommandText = command.CommandText .Replace( "[dbo].[ReplaceMeEntity1]" , "[EntitiesDb1].[dbo].[Entities1]" ) .Replace( "[dbo].[ReplaceMeEntity2]" , "[EntitiesDb2].[dbo].[Entities2]" ) .Replace( "[dbo].[ReplaceMeEntity3]" , "[EntitiesDb3].[dbo].[Entities3]" ); command.CommandText = CommandOptimizations.ReplaceInClause(command.CommandText); command.CommandText = CommandOptimizations.AddNoLock(command.CommandText); } } } public static class CommandOptimizations { private static Regex _clauseRegex = new Regex( @"(WHERE \[.*\].\[.*] IN \((.*)?\))" , RegexOptions.Compiled | RegexOptions.IgnoreCase); private static Regex _paramRegex = new Regex( @"\[.*?\].?\[.*?\]" , RegexOptions.Compiled | RegexOptions.IgnoreCase); private static Regex _tableAliasRegex = new Regex( @"(?<tableAlias>\[dbo\]\.\[([\w\d]*)\] AS \[Extent\d+\](?! WITH \(NOLOCK\)))" , RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase); /// <summary> /// Replaces slower IN commands with JOINs. The performance increase is significant. /// </summary> /// <param name="command"></param> /// <returns></returns> public static string ReplaceInClause( string commandText) { if (commandText.Contains( " IN " )) { List< string []> replaceTexts = new List< string []>(); var count = 0; var matches1 = _clauseRegex.Matches(commandText); if (matches1.Count > 0) { foreach (Match match in matches1) { count++; var originalStr = match.Value; var index1 = originalStr.IndexOf( "(" ) + 1; var index2 = originalStr.IndexOf( ")" ); var values = originalStr.Substring(index1, index2 - index1).Split( ',' ); var formattedValues = values.Select(x => string .Format( "({0})" , x.Trim())); var concatValues = string .Join( "," , formattedValues); var paramName = _paramRegex.Match(originalStr); var replaceText = new string [] { match.Value, string .Format( " JOIN (values {0}) V{1}(ref{1}) on V{1}.ref{1} = {2} " , concatValues, count, paramName) }; replaceTexts.Add(replaceText); } foreach ( var replaceText in replaceTexts) { commandText = commandText.Replace(replaceText[0], replaceText[1]); } } } return commandText; } public static string AddNoLock( string commandText) { commandText = _tableAliasRegex.Replace(commandText, "${tableAlias} WITH (NOLOCK)" ); return commandText; } } } |
Did you investigate implementing the JOIN in LINQ compared to intercepting/rewriting the query itself?
I did – but using a general IEnumerable<<>>.Contains(…) within a LINQ expression is what causes EF to generate an IN clause. There are probably a few different ways to approach the scenario, but manipulating the query directly was generic enough and reusable to consistently make the replacement at the interception level.