Performance Enhancements with LINQ and PLINQO

Introduction

I recently completed a performance review for a client’s ASP.NET MVC web app that was running slowly.  Thankfully, their application was using the LINQ-to-SQL version of PLINQO, so it was easy to identify and resolve the data access bottlenecks.  Below, I have explained some of the technologies and techniques that I used to solve their server-side performance issues.

Why I Still Use LINQ-to-SQL

You may be wondering why anyone would continue to use LINQ-to-SQL (L2S) when Microsoft is pushing Entity Framework (EF) on .NET developers.  There are several reasons:

  1. Lightweight – Compared to EF, L2S is much lighter.  There is less code and less overhead.
  2. Simplicity – I don’t mind complexity as long as it serves a purpose, but the bloat of EF is usually not necessary.
  3. Performance – Performance tests have consistently shown faster application performance with L2S.
  4. Cleaner SQL – The actual T-SQL generated by L2S tends to be simpler and cleaner than that of EF.
  5. EnhancementsPLINQO adds a ton of features to L2S including caching, future queries, batched queries, bulk update/delete, auditing, a business rule engine and more.  Many of these features are not available with Entity Framework out-of-the-box.

Didn’t Microsoft say LINQ-to-SQL was dead?

Contrary to popular belief, Microsoft has never said that LINQ to SQL is dead, and improvements are being made according to Damien Guard’s LINQ to SQL 4.0 feature list.

Microsoft also released the following statement about their plan to continue support for LINQ to SQL:

Question #3: Where does Microsoft stand on LINQ to SQL?

Answer: We would like to be very transparent with our customers about our intentions for future innovation with respect to LINQ to SQL and the Entity Framework.

In .NET 4.0, we continue to invest in both technologies. Within LINQ to SQL, we made a number of performance and usability enhancements, as well as updates to the class designer and code generation. Within the Entity Framework, we listened to a great deal to customer feedback and responded with significant investments including better foreign key support, T4 code generation, and POCO support.

Moving forward, Microsoft is committing to supporting both technologies as important parts of the .NET Framework, adding new features that meet customer requirements. We do, however, expect that the bulk of our overall investment will be in the Entity Framework, as this framework is built around the Entity Data Model (EDM). EDM represents a key strategic direction for Microsoft that spans many of our products, including SQL Server, .NET, and Visual Studio. EDM-based tools, languages and frameworks are important technologies that enable our customers and partners to increase productivity across the development lifecycle and enable better integration across applications and data sources.

Reasons to Use Entity Framework

The main reasons to use EF are:

  • Support for Other Databases – It can be used with databases other than Microsoft SQL Server, including MySQL and Oracle.  Of course, if you’re programming an application in .NET, you’re probably using MSSQL anyway.
  • POCO Support – This stands for “Plain Old Code Objects” or “Plain Old C# Objects.”  The gist is that you can take a code-first approach to designing your data entities.  This is opposed to the database-first approach used by many ORMs.
  • Abstraction – EF lets you add greater abstraction to your data objects so they aren’t as tightly coupled with your database schema.  With L2S, all of your entities are locked into the Active Record Pattern where objects and properties are mapped directly to database tables and columns.

In my opinion, those arguments aren’t good enough to outweigh the benefits of the L2S version of PLINQO in most business scenarios.  If you really need to use EF, you should check out the Entity Framework version of PLINQO.  It doesn’t have all of the features of the L2S version, but it provides some of the same benefits.

What About NHibernate?

There is also an NHibernate version of PLINQO.  NHibernate is just as bloated as EF, but it is the most mature Object Relational Mapper (ORM) available.  As far as I know, it has more features than any other ORM, and there are many extensions available, along with a ton of documentation and support forums.  This is the only flavor of PLINQO that supports multiple database technologies including MSSQL, MySQL, Oracle, DB2 and others.

How to Identify Query Performance Issues

There are two common tools available to intercept the underlying SQL activity that is being executed by LINQ:

  • SQL Server Profiler – This SQL client tool is included with Developer, Standard and Enterprise editions of SQL Server.  It works best when using a local copy of the database.  Tip – It helps to add a filter to the trace settings to only display activity from the SQL login used by the web application.
  • LINQ to SQL Profiler – This is a code based logging tool.  It is less convenient than SQL Profiler but it works well if you are connecting to a remote SQL Server.  http://www.codesmithtools.com/product/frameworks/plinqo/tour/profiler

Eager Loading versus Lazy Loading

By default, LINQ-to-SQL uses “lazy loading” in database queries.  This means that only the data model(s) specifically requested in the query will be returned.  Related entities (objects or collections related by foreign key) are not immediately loaded.  However, the related entities will be automatically loaded later if they are referenced.  Essentially, only the primary object in the object graph will be hydrated during the initial query.  Related objects are hydrated on-demand only if they are called later.

In general, lazy loading performs well because it limits the amount of data that is fetched from the database.  But in some scenarios it introduces a large volume of redundant queries back to SQL Server.  This often occurs when the application encounters a loop which references related entities.  Following is an example of that scenario.

Lazy loading sample controller code (C#):

public ActionResult Index() { 
    MyDataContext db = new MyDataContext();
    List<User> users = db.Users.ToList();
    return View(users);  
}

Lazy loading sample view code (Razor):

@model IEnumerable<User>
@foreach (var user in Model) {
    <b>Username:</b> @user.Username <br />
    <b>Roles:</b>
    <ul> 
    @foreach (var role in user.RoleList) {
        <li>@role.RoleName - @role.RoleType.RoleTypeName</li> 
    }
    </ul> 
}

With the above code, the List<User> object will initially be filled with data from the Users table.  When the MVC Razor view is executed, the code will loop through each User to output the Username and the list of Roles.  Lazy loading will automatically pull in the RoleList for each User (@user.RoleList.RoleName), and a separate SQL query will be executed for each User record.  Also, another query for RoleType will be executed for every Role (@role.RoleType.RoleTypeName).

In our hypothetical example, let’s assume there are 100 Users in the database.  Each User has 3 Roles attached (N:N relationship).  Each Role has 1 RoleType attached (N:1 relationship).  In this case, 401 total queries will be executed:  1 query returning 100 User records, then 100 queries fetching Role records (3 Roles per User), then 300 queries fetching RoleType records.  L2S is not even smart enough to cache the RoleType records even though the same records will be requested many times.  Granted, the lookup queries are simple and efficient (they fetch related data by primary key), but the large volume of round-trips to SQL are unnecessary.

Instead of relying on lazy loading to pull related entities, “eager loading” can be used to hydrate the object graph ahead of time.  Eager loading can proactively fetch entities of any relationship type (N:1, 1:N, 1:1, or N:N).  With L2S, eager loading is fully configurable so that developers can limit which relationships are loaded.

There are two methods of eager loading with L2S, both of which are part of Microsoft’s default implementation (i.e. these are not PLINQO-specific features).  The first is to use a projection query to return results into a flattened data object called a “view model.”  The second is to use DataLoadOptions to specify the relationship(s) to load.

View Model Approach

This approach to eager loading works well for many-to-one (N:1) and one-to-one (1:1) relationships but it does not handle many-to-many (N:N) relationships.  However, N:1 is the most common type of lookup, so it works for the majority of scenarios.

The idea is to cast the result set into a “view model” data object which is a simple container for all of the required data output.  This can be an anonymous type, but it is generally recommended that developers use a defined data type (see the PLINQO Future Queries section below for details).

Because our first example used a N:N relationship (Users to Roles), it cannot be improved using this methodology.  However, it could be useful for loading other User relationships.  Following is an example of eager loading data from a related User Profile entity (1:1) and a User Type entity (N:1).

View model sample controller code (C#):

public Class UserViewModel {
    public string Username { get; set; }
    public string UserTypeName { get; set; }
    public string Email { get; set; }
    public string TwitterHandle { get; set; }
}

public ActionResult Index() { 
    MyDataContext db = new MyDataContext();
    List<UserViewModel> users = db.Users
        .Select(u => new UserResult() { 
            Username = u.Username,
            UserType = u.UserType.UserTypeName,
            Email = u.UserProfile.Email,
            TwitterHandle = u.UserProfile.TwitterHandle
        }).ToList();
    return View(users);  
}

View model sample view code (Razor):

@model IEnumerable<UserViewModel>
@foreach (var user in Model) {
    <b>Username:</b> @user.Username <br />
    <b>User Type:</b> @user.UserTypeName <br />
    <b>Email:</b> @user.Email <br />
    <b>Twitter: </b> @user.TwitterHandle
}

With this controller code, the controller’s LINQ query loads all of the UserResult properties in one SQL query (the query performs the appropriate joins with the UserType and UserProfile tables).  The actual SQL output performs inner or outer joins (depending on whether the foreign key column is nullable) to collect all of the data in one round-trip.

Another benefit to the view model approach is a reduction of the volume of data sent over the network.  Ordinarily, LINQ pulls all columns from each database table, regardless of how many data columns are actually displayed later.  When a simplified return type is used, you are explicitly specifying the columns that should be returned from SQL Server.  This is beneficial if your database table contains a large number of columns, or it has columns with large data sizes like varchar(MAX) or other BLOBs.

DataLoadOptions Approach

An easier way to eager load data is to specify DataLoadOptions for the LINQ database context.  Each relationship is added to the DataLoadOptions via the LoadWith method.  Note:  This can only be done before any queries are executed.  The DataLoadOptions property may not be set or modified once any objects are attached to the context.  See http://msdn.microsoft.com/en-us/library/system.data.linq.dataloadoptions%28v=vs.110%29.aspx for details.

All relationship types are allowed in DataLoadOptions, therefore it is the only way to eager load N:N relationships.  Also, there is no limit to the number of association types that can be eager loaded.  LINQ is also intelligent enough to ignore any associations that do not apply to the query.  Therefore, a single “master” DataLoadOptions can be applied to multiple queries throughout the application.

Revisiting the first example (Users and Roles), here is a slightly modified version.

DataLoadOptions sample controller code (C#):

public ActionResult Index() { 
    MyDataContext db = new MyDataContext();

    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<User>(u => u.RoleList); //Load all Roles for each User
    options.LoadWith<Role>(r => r.RoleType); //Load all RoleTypes for each Role
    db.LoadOptions = options;

    List<User> users = db.Users.ToList();
    return View(users);  
}

DataLoadOptions sample view code (Razor):

@model IEnumerable<User>
@foreach (var user in Model) {
    <b>Username:</b> @user.Username <br />
    <b>Roles:</b>
    <ul> 
    @foreach (var role in user.RoleList) {
        <li>@role.RoleName - @role.RoleType.RoleTypeName</li> 
    }
    </ul> 
}

Note that the Razor view code has not changed at all.  The only difference is the controller code which adds DataLoadOptions to the LINQ data context.  Although the view code is identical, this time only a single SQL query is executed, compared to 401 queries for the original controller code sample.

It is also worth noting that view model and DataLoadOptions approaches can be used together (i.e. they are not mutually exclusive).  Any LoadWith relationships will be processed even when used inside a projection query.

View model with DataLoadOptions sample controller code (C#):

public Class UserViewModel {
    public string Username { get; set; }
    public string UserTypeName { get; set; }
    public string Email { get; set; }
    public string TwitterHandle { get; set; }
    public IEnumerable<Role> RoleList { get; set; }
}

public ActionResult Index() { 
    MyDataContext db = new MyDataContext();

    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<User>(u => u.RoleList); //Load all Roles for each User
    options.LoadWith<Role>(r => r.RoleType); //Load all RoleTypes for each Role
    db.LoadOptions = options;

    List<UserViewModel> users = db.Users
        .Select(u => new UserResult() { 
            Username = u.Username,
            UserType = u.UserType.UserTypeName,
            Email = u.UserProfile.Email,
            TwitterHandle = u.UserProfile.TwitterHandle,
            RoleList = u.RoleList
        }).ToList();
    return View(users);  
}

View model with DataLoadOptions sample view code (Razor):

@model IEnumerable<UserViewModel>
@foreach (var user in Model) {
    <b>Username:</b> @user.Username <br />
    <b>User Type:</b> @user.UserTypeName <br />
    <b>Email:</b> @user.Email <br />
    <b>Twitter: </b> @user.TwitterHandle    <b>Roles:</b>
    <ul> 
    @foreach (var role in user.RoleList) {
        <li>@role.RoleName - @role.RoleType.RoleTypeName</li> 
    }
    </ul> 
}

This example is “the best of both worlds” because it pulls all required data in a single query, but it does not pull unnecessary columns from the User, UserType or UserProfile tables.  It would still pull all columns from the Role and RoleList tables.

PLINQO Caching

PLINQO adds intelligent caching features to the LINQ data context.  Caching is implemented as a LINQ query extension.  The .FromCache() extension can be used for single return types or collections, and FromCacheFirstOrDefault() can be used for single return objects.

PLINQO caching example (C#):

//Returns one user, or null if not found
User someUser = db.Users.Where(u => u.Username = "administrator").FromCacheFirstOrDefault(); 

//Returns a collection of users, or an empty collection if not found
IEnumerable<User> adminUsers = db.Users.Where(u => u.Username.Contains("admin")).FromCache();

Note:  These query extension methods change the return type for collections.  Normally LINQ returns an IQueryable<T>.  The cache extension methods return IEnumerable<T> instead.  However, it is still possible to convert the output .ToList() or .ToArray().

Converting cached collections sample (C#):

//Convert IEnumerable<User> to list
List<User> userList = db.Users.FromCache().ToList();

//Convert IEnumerable<User> to array
User[] userArray = db.Users.FromCache().ToArray();

The caching in PLINQO also allows developers to specify a cache duration.  This can be passed as an integer (the # of seconds to retain cache), or a string can be passed which refers to a caching profile from the web.config or app.config.

Cache duration sample (C#):

//Cache for 5 minutes (300 seconds)
IEnumerable<User> users = db.Users.FromCache(300);

//Cache according to the "Short" profile in web.config
IEnumerable<Role> roles = db.Roles.FromCache("Short");

The “Short” value above refers to a cache profile name in the web.config or app.config settings.

Cache settings in web.config (XML):

I recommend creating 3 caching profiles in the web.config.  I created a profile called “Short” which is a 5 minute duration.  Because this is a default setting, any .FromCache() commands will use the “Short” profile unless another is specified.

<configSections>
    <section name="cacheManager" type="CodeSmith.Data.Caching.CacheManagerSection, CodeSmith.Data" />
</configSections>
<cacheManager defaultProvider="HttpCacheProvider" defaultProfile="Short">
    <profiles>
        <add name="Brief" description="Brief cache" duration="0:0:10" />
        <add name="Short" description="Short cache" duration="0:5:0" />
        <add name="Long" description="Long cache" duration="1:0:0" />
    </profiles>
    <providers>
        <add name="HttpCacheProvider" description="HttpCacheProvider" type="CodeSmith.Data.Caching.HttpCacheProvider, CodeSmith.Data" />
    </providers>
</cacheManager>

The “Short” profile will cache items for 5 minutes.  This is a good choice for most scenarios.  There is also a profile called “Long” which will cache for 1 hour, best suited for reference data which rarely changes.  There is also a special-case profile called “Brief” which is useful in scenarios where a single data set is requested repeatedly in a single page load.

CacheProfile class (C#):

You may want to create a CacheProfile.cs class to standardize your references.  Here is an example that can be customized to your needs.

/// <summary>
/// Cache duration names correspond to caching profiles in the web.config or app.config
/// </summary>
public static class CacheProfile
{
    /// <summary>
    /// 10 second cache duration.
    /// </summary>
    public const string Brief = "Brief";

    /// <summary>
    /// 5 minute cache duration.
    /// </summary>
    public const string Short = "Short";

    /// <summary>
    /// 1 hour cache duration.
    /// </summary>
    public const string Long = "Long";
}

CacheProfile class usage (C#):

IEnumerable<User> users = db.Users.FromCache(CacheProfile.Brief); //Uses "Brief" profile
IEnumerable<User> users = db.Users.FromCache(); //Assumes "Short" profile
IEnumerable<User> users = db.Users.FromCache(CacheProfile.Long); //Uses "Long" profile

PLINQO’s caching system has additional features like cache groups, explicit cache invalidation, and a cache manager for non-LINQ objects.  See http://www.codesmithtools.com/product/frameworks/plinqo/tour/caching for full documentation.

PLINQO Future Queries

The “futures” capability of PLINQO allows for intelligent batching of queries to reduce round-trips to SQL Server.  This is particularly helpful for MVC because multiple objects are often sent to the same view.  Because the view is executed after the controller action, often all of the objects passed can be batched in a single SQL query.

Futures usage in controller action (C#):

public ActionResult EditUser(int userId) { 
    MyDataContext db = new MyDataContext();
    User user = db.Users.Where(u => u.UserID == userId).Future(); //Deferred
    ViewBag.RoleOptions = new SelectList(db.Roles.Future(), "RoleID", "RoleName", user.UserID); //Deferred
    ViewBag.StateOptions = new SelectList(db.States.Future(), "StateID", "StateName", user.StateID); //Deferred
    //All Future queries will be executed when the view calls any of the objects
    return View(user);
}

Instead of executing these 3 queries independently, they will be batched into a single SQL query.  The Future() extension makes use of LINQ-to-SQL’s deferred execution feature.  In other words, LINQ does not actually execute any of these queries until one of the collections is enumerated.

Enumeration occurs when:

  1. Any code attempts to access the contents of the collection, usually by looping through the collection (for each item in collection), or…
  2. The objects in the collection are counted.  For example: RoleOptions.Count(), or…
  3. The collection is converted to another type through .ToList() or .ToArray().

Caveats of future queries:

  • If you want to use .Future() then you should not use .ToList() or .ToArray().  This causes immediate enumeration of the query.  Frankly, there is rarely a need to immediately convert to a list or an array.  The .Future() return type of IEnumerable<T> works perfectly well for most scenarios.
  • Razor views tend to throw errors if futures were used with an anonymous type.

If you want to use futures, make sure that your results are returned into a known data type.  In other words, the following will throw a runtime error when the Razor view is executed.

Anonymous type which causes runtime error (C#):

//Fails at runtime when the view enumerates the collection
ViewBag.UserOptions = new SelectList(db.Users.Select(new { ID = u.UserID, FirstName = u.FirstName }).Future(), "UserID", "FirstName");

However, the same Future query will work fine if a type is specified.

Future query with known type (C#):

//Works fine at runtime because UserViewModel is a known type
ViewBag.UserOptions = new SelectList(db.Users.Select(new UserViewModel() { ID = u.UserID, FirstName = u.FirstName }).Future(), "UserID", "FirstName");

Another nice feature is that PLINQO allows caching and futures together.  There are extension methods which combine both features:

Futures with caching sample (C#):

//Returns one user, or null if not found
User someUser = db.Users.Where(u => u.Username = "administrator").FutureCacheFirstOrDefault(); 

//Returns a collection of users 
IEnumerable<User> adminUsers = db.Users.Where(u => u.Username.Contains("admin")).FutureCache();

The term “FromCache” is simply changed to “FutureCache.”  These extension methods support the same cache duration settings as FromCache().

For more details on PLINQO Future Queries, see http://www.codesmithtools.com/product/frameworks/plinqo/tour/futurequeries

Best Practices for Query Reuse

Often, there are a lot of common queries that are repeated throughout an application.  Ideally, your application should have a data service layer or business logic layer to house common queries for easy reuse.  However, if that is not currently in place, the simplest solution is to add methods to each controller as appropriate.  This would still improve code reuse with minimal programming changes.

Sample redundant controller code (C#):

public ActionResult FirstPage() { 
    ViewBag.CustomerList = new SelectList(db.Customers.OrderBy(c => c.CustomerID), "CustomerID", "CustomerName");
    return View();
}
public ActionResult SecondPage() { 
    ViewBag.CustomerList = new SelectList(db.Customers.OrderBy(c => c.CustomerID), "CustomerID", "CustomerName");
    return View();
}

Suggested replacement code (C#):

public IEnumerable<SelectListItem> GetCustomerList()
{
	return (from c in db.Customers orderby c.CustomerName 
            select new SelectListItem() { Value = c.CustomerID.ToString(), Text = c.CustomerName })			
            .FutureCache(CacheProfile.Long);
}

public ActionResult FirstPage() { 
    ViewBag.CustomerList = new SelectList(GetCustomerList(), "Value", "Text");
    return View();
}
public ActionResult SecondPage() { 
    ViewBag.CustomerList = new SelectList(GetCustomerList(), "Value", "Text");
    return View();
}

This reduces the amount of redundant copy-paste query code while also implementing Futures and Caching.  Because the return type is IEnumerable<SelectListItem> it avoids the anonymous type issues described above.  It can also handle lists where a selected value must be specified.

Sample selected value in a SelectList (C#):

public IEnumerable<SelectListItem> GetCountryList()
{
	return (from c in db.Countries orderby c.CountryName
            select new SelectListItem() { Value = c.CountryID.ToString(), Text = c.CountryName })			
            .FutureCache(CacheProfile.Long);
}
public ActionResult EditUser(int userId) 
{
     User user = db.Users.GetByKey(userId);
     ViewBag.CountryList = new SelectList(GetCountryList(), "Value", "Text", user.CountryID);
     return View(user); 
}

On a similar note, controllers often reuse the same relationships is multiple actions.  Because of this, it is usually beneficial to create a private function to specify the common DataLoadOptions.  This is preferably done in a business layer, but the controller is acceptable in a pinch.

MVC Controller DataLoadOptions sample (C#):

private DataLoadOptions EagerLoadUserRelationships()
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<User>(u => u.Country);
    options.LoadWith<User>(u => u.RoleList);
    options.LoadWith<Role>(r => r.RoleType);
    return options;
}

public ActionResult Index() {
    MyDataContext db = new MyDataContext();
    db.LoadOptions = EagerLoadUserRelationships();
    Users userList = db.Users.Where(...).Future();
}

Other Performance Improvements

PLINQO has a number of other performance improvements including:

  • Bulk updates (update many records with a single command).
  • Bulk deletion (delete many records with a single command).
  • Stored procedures with multiple result sets.
  • Batch queries.

These features are documented here:  http://www.codesmithtools.com/product/frameworks/plinqo/tour/performance

I hope you found this useful.  Happy coding!

Advertisements

Why I Avoid Stored Procedures (And You Should Too)

Subtitle:  Object Relational Mappers are the New Standard Practice for Application Development

Okay – DBA’s are already upset after reading the title.  I understand.  Stored procedures have been the standard practice of most professional software developers for more than a decade.  But like my recommendations on Flash, there are newer and better options than some old tried-and-true technologies.

I’ll have you know that I’m not an extremist on this subject.  I have personally worked on many applications that use stored procedures, and I have written plenty of them myself to follow a client’s “established best practices.”  Stored procedures still have a place in some scenarios.  However, given a choice, I avoid stored procedures as often as possible.

Why Do Developers Use Stored Procedures?

Traditionally, there have been a lot of reasons given for the use of stored procedures.  The most popular arguments include:

  • Performance – The query plan for stored procedures is compiled in SQL Server so that subsequent requests can run faster.  Also, a single stored procedure can perform multiple SQL commands, reducing traffic between an application and the database server.
  • Security – Stored procedures are well defined database objects that can be locked down with security measures.  Use of typed parameters can help prevent SQL injection attacks.
  • Code Re-Use – Database queries can be written once and re-used multiple times without writing the same SQL commands over and over again.
  • Business Logic Encapsulation – The database can house a lot of business logic so that the brain of your application is kept “neatly in one place.”

Years ago, I was completely on-board with this train of thought.  But around 2005, I started working with Object Relational Mappers (ORMs) and it changed my whole thinking about stored procedures.

What is an Object Relational Mapper?

An ORM is a code-based tool for application developers to work with databases.  The purpose of an ORM is to create a code representation of the data model.  Once you have this code in place you can access your database without writing a line of SQL code.  No stored procedures (or ad-hoc SQL) needed.  A lengthy study is beyond the scope of this article, but you can read about ORMs on Wikipedia.

Oh CRUD

ORMs are very efficient at Create [INSERT], Read [SELECT], Update and Delete (CRUD) operations.  Usually at least 90% of SQL commands used in an application are simple CRUD operations.  ORMs automatically write parameterized queries for you so that you never have to spend time writing CRUD from scratch.

Writing stored procedures for simple operations is a waste of time and it muddies up the database.  I cringe when I see a list of 500 stored procedures in a single database.  I cringe further when I discover that most of the stored procedures contain mind-numbingly simple queries like “select * from [tablename] where [col1]=@param” and “update [tablename] set [col 1]=@param where [id]=@userId”.

It drives me batty when I know that someone took the time to write all of those stored procedures, then an application developer manually wrote code to call the stored procedure, convert and pass in parameters, check the result and pass back a DataSet.  I commonly see ~20 lines of C# or VB application code to call a simple one-line stored procedure.

With LINQ, I can accomplish more functionality with less code.  By writing “var result = context.TableName.Where(tn => tn.UserId == userId);” the data context writes a parameterized SQL query, the query is executed, the result is returned and translated into a strongly-typed set of objects (much better than a DataSet).  I get more benefit out of one line of LINQ code than a stored procedure coupled with 20 lines of code to call it.

Good N-Tier Application Design

N-Tier Application DiagramIf you are familiar with n-tier (or multi-tier) application design, you know that the business layer belongs in the middle of your application stack.  In other words, business logic should be compiled application code that is testable via automated unit tests.

Your data access layer should be comprised of an ORM and basic repository methodology.  Some people would throw stored procedures into the diagram as a second data access layer, but this is unnecessary and (IMHO) incorrect.

A database should be limited to the role of a “persistence layer” – a technology agnostic storage mechanism.  When business logic is scattered through stored procedures, database triggers and application code, the n-tier model is broken.

Since T-SQL is technology-specific, your stored procedures would need to be re-written in order to migrate to MySQL, Oracle or another database.  If you avoid stored procedures and use an ORM like NHibernate, moving to a new database architecture is basically as simple as changing a connection string (okay, maybe not quite that simple in the real world).

Benefits of Object Relational Mappers

Every ORM is different, some with more benefits than others.  My personal favorite is still PLINQO for LINQ-to-SQL, and there are versions for Entity Framework and NHibernate now.

These are the most common benefits to look for in a good ORM:

  • Less Hand-Written Code – There is substantial time and cost savings when using a good ORM versus hand-writing stored procedures and stored procedure calls.
  • Rapid Application Development – Developers are all under pressure to build more features with less time.  Tools like PLINQO offer code generation with intelligent re-generation.  This is extremely important because PLINQO can update your data layer to match the database with two clicks of a mouse.  If you’ve been using stored procedures and your data model changes, you’re in for a lot of fun tracking down and fixing all of the references.
  • Maintainability & Refactoring – The foundational principle of agile development is the expectation that applications will need to be changed in the future.  The data model and business rules will change.  If you use a good ORM, these changes can be handled quickly and easily with a refactoring tool (like ReSharper).  Compiling your application is often enough to verify that your changes were done properly.  But if you used stored procedures as the basis of your data access and/or business logic layer, you will be frightened to make any changes for fear of breaking something.  This “stored procedure paralysis” leads to hacks and workarounds instead of properly addressing new requirements.  I just saw this happen a few weeks ago on a client’s project.
  • Performance – A LINQ data context uses deferred execution for selects, and it batches record updates and deletes in a single database round-trip on Submit().  Good ORMs offer additional performance enhancements like caching, futures and batch operations (select, update and delete).  These can have a dramatic impact on application speed.
  • Proper N-Tier Design – Your database should merely be a “persistence layer.”  It should not contain the business logic of your application.  Using an ORM helps you architect a solution where your business logic resides in strongly typed managed code (instead of being scattered through stored procedures and triggers).  This keeps the separation of concerns more clear, avoiding a broken or inverted n-tier structure.
  • Business Logic Encapsulation – The data entities that are generated by your ORM are effectively business objects.  Instead of working with “dumb DataSets,” these entities can enforce business rules that can interact with your presentation layer and data layer.

Common Objections to Object Relational Mappers

  • “Stored procedures are faster.” – In speed tests that I’ve seen, parameterized queries perform just as well as stored procedures (sometimes better).  From what I’ve read, SQL Server does cache the execution plan of parameterized queries.
  • “LINQ syntax is not as flexible as T-SQL.” – This is true but misleading.  95% of T-SQL queries I’ve seen can be accomplished easily in LINQ syntax (even aggregate functions).  If you look at some sample LINQ commands you may be surprised what is possible.  Also, you can execute custom SQL or call a stored procedure from LINQ and return the results in a strongly-typed object.
  • “ORMs add too much overhead.” – Compared to what?  If you’re comparing with straight ADO.NET and a DataReader, there is a slight performance hit.  If you’re comparing with explicit hydration of custom business objects, there is likely a performance gain.  Plus, good use of caching, futures and batch operations more than make up for any overhead.  Remember that data access speed is not the only measure of good application design.
  • “ORMs pull unnecessary data.” – By default most ORMs will return every column in the table(s) that you requested.  However, most ORMs support projection queries where you can limit the columns returned.  The results of your projection are returned into a strongly typed collection.
  • “Stored procedures still provide better security.” – If you need to give data access to someone other than your development team, you should build a secure web service instead of leaving your database open to direct access by multiple clients.  As for SQL injection attacks, parameterized queries are just as safe as stored procedures, if not more so.
  • “ORMs should only be used to call stored procedures.” – Most ORMs can call stored procedures, so these two technologies are not mutually exclusive.  However, calling stored procedures from an ORM negates many of the benefits of ORM technology and design patterns.  It adds work and complexity while reducing benefits.

When Should I Use Stored Procedures?

I see several cases where stored procedures are still acceptable:

  • You’re doing data warehousing / ETL / data aggregation.  An ORM is usually not the right tool for this kind of bulk data management.  You should consider a tool like SQL Server Integration Services (SSIS), BizTalk, or another dedicated ETL tool.
  • You’re working on a legacy application where everything is already built using stored procedures.  Sometimes you have to go with the flow.  Sometimes adding a second tool is worse than supporting the wrong tool.
  • You need to feed data to SQL Server Reporting Services (SSRS).  Strangely, the CLR integration for SQL Server does not have LINQ support, so T-SQL is your only option with SSRS.  But frankly, SSRS isn’t always the best choice either.  There are a lot of good third party tools for report generation to HTML, PDF and Excel that can interface with your ORM.  Since SSRS is not available in the Express Edition of SQL Server (and definitely not available in MySQL or Oracle) you are limiting your deployment options again.
  • Your query absolutely cannot be performed with LINQ syntax.  This is rare, but does happen occasionally.  Thankfully, stored procedures or custom SQL can be called from a LINQ data context.
  • You need to perform a lot of complex SQL statements from a small set of input.  Although there are lots of ways to handle batch operations with PLINQO, sometimes “you gotta do what you gotta do” to address a performance issue.
  • You need to push thousands of records to SQL in a single statement.  This is most efficient through table-valued stored procedure parameters in SQL Server.

I hope you find this information useful.  Happy coding!

Kick it on DotNetKicks.com [Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

Version 4.0.1 of PLINQO Released

Okay, I’m a little behind on blogging. Since I previously wrote multiple posts regarding PLINQO (Professional LINQ-to-SQL), I wanted to drop a quick update that version 4.0.1 was recently released. I also skipped writing about the release of version 4.0.0 because I couldn’t find a definitive enhancement list. I stumbled across it today, so I’m posting both.

Version 4.0.0 Highlights:

  • Futures Support – Allows creation of a queue of objects to be loaded all at once. This differs from the old Multiple Result Sets feature in that it defers execution until the data is really needed. It’s also easier to support more than 2 result sets in a single call.
  • Caching Improvements – Added support for various caching providers including Memcached.
  • Detach/Attach Entities – Added more methods for serialization/deserialization so detached entities can be stored as binary or XML.
  • More DetailsClick here for the full set of enhancements

Version 4.0.1 Highlights:

  • DataContextName – You can finally control the name of the DataContext that is generated. This is long overdue and greatly appreciated.
  • Pagination Improvements – Added methods for NextPage, PreviousPage and GoToPage for PagedList.
  • Null Handling – Added NotNull rule and attribute and improved SQL queries that use null comparisons.
  • More DetailsClick here for the full set of enhancements

I know that some people are a little hesitant of continuing to use LINQ-to-SQL (L2S) given Microsoft’s shift in direction to LINQ-to-Entities (L2E). However, Microsoft has not dropped support for L2S in .NET 4.0. They actually added some features to LINQ-to-SQL in the recent release of .NET 4.0 and Visual Studio 2010. L2S is widely adopted and (from what I can tell) MS intends to continue supporting it in future versions of .NET, even though they aren’t going to develop it further.

At this time, the PLINQO team intends to provide LINQ-to-Entities support in a future release. This means that PLINQO users should require little-to-no-work in making the switch to L2E. In the meantime, I’m happy using PLINQO as my primary OR/M on new projects.

Kick it on DotNetKicks.com [Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

PLINQO 3.0 – Even Better

I wrote a blog article a few months ago giving kudos to the guys at CodeSmith after my discovery of PLINQO 2.0. Since then, I haven’t done much with LINQ-to-SQL because the legacy projects at my day job use Castle ActiveRecord with NHibernate. But I recently started a new project (giving me the freedom to investigate other technologies) and was pleasantly surprised to find PLINQO 3.0. In addition to a new major code revision, I found that CodeSmith released a new website with more information about the benefits of PLINQO and sample usage.

In case you’re not familiar with PLINQO, this set of code-generation templates is designed to enhance the LINQ-to-SQL development experience. They’re not only a time-saver like most code generation templates, but they allow you to overcome many of the limitations of “raw” LINQ-to-SQL. See Does LINQ StinQ? Not with PLINQO!

My first article covered some of the main benefits of PLINQO 2.0, including:

  • Generates one file per entity instead of one massive DBML file.
  • Generates partial classes where custom code can be written and won’t be overwritten.
  • Generated entity files are added to the project as code behind files to their corresponding custom entity files.
  • Adds customizable business rules engine to enforce entity validation, business and security rules.
  • Generation of entity manager classes… Provides access to common queries based on primary keys, foreign keys, and indexes.
  • Ability to automatically remove object prefix and suffixes (ie. tbl and usp) [based on RegEx].

In addition to those features, PLINQO 3.0 has the following benefits:

  • Entity Detach – Detach entities from one DataContext and attach to another (very useful for caching scenarios).
  • Entity Clone – Create copies of entities in-memory, set only the properties that need to be changed and persist as a new object with a new primary key.
  • Many-to-Many Relationships – Yes, M:M can be done in LINQ-to-SQL without writing goofy code to manage the link tables.
  • Auditing – The app can track all property changes complete with a copy of the old value and new value. Tracked changes can be read iteratively or dumped to an XML string.
  • Batch Updates and Deletes – You can perform updates and deletes on records based on criteria on the SQL Server without pulling each record into your app first. I’d already been using another implementation of this concept, but it’s nice to have it built into PLINQO.
  • Multiple Result Sets – PLINQO can pull multiple recordsets back in a single request. This can be done either by using a stored procedure or using the ExecuteQuery method passing a list of queries as parameters.

I think some of those benefits may have existed in the 2.0 release, but weren’t documented. I’m glad to see they’re starting to provide more documentation and samples. It would still be nice to see more, however (as it occurs to me) your custom PLINQO code really sits on top of LINQ-to-SQL, so all of the standard LINQ documentation applies.

I do have some suggestions for CodeSmith to implement in future versions of PLINQO:

  • I’m fond of the IRepository pattern because of unit testing with frameworks such as Rhino Mocks. I’ve seen a couple of implementations of IRepository with LINQ (example 1, example 2). This should be a code generation option.
  • I’d like to see a DataContext session factory with per-web-request lifestyle. This is available in other ORM systems like ActiveRecord. After some digging, I found an example of this that also demonstrates integration with Microsoft’s MVC and Castle Windsor (IoC). Sweet.
  • There are some helpful LINQ libraries out there, such as LINQKit and the LINQ Dynamic Query Library. It would be nice to include these and/or other free libraries with PLINQO.
  • I’ve gotten the impression that Microsoft is going to favor the Entity Framework (LINQ-to-Entities) over LINQ-to-SQL. I’d love to see PLINQO adapted to support the Entity Framework. That would certainly placate the domain-driven design fans along with those who use db’s other than MS SQL.

Finally, a bit of a rant: I’m kind-of annoyed that PLINQO only has one way to select the tables you want to include in code generation: you have to write a RegEx to identify tables to exclude. I’ve worked on several projects where I want to generate entities for less than 50% of the tables in my database. For instance, when writing modules for DotNetNuke, I only want to generate entities for my 5 tables, not the 100+ tables that come with a DNN installation.

NetTiers had a dialog to select tables for code generation. It sure would be nice to bring that back in PLINQO. If a dialog box is too much trouble, at least there could be a switch to specify whether my RegEx is an include list or an exclude list. I submitted a ticket to CodeSmith on this one. Please vote and add comments on their website if you support this idea. How about it, CodeSmith? 🙂

See the new PLINQO website at http://www.plinqo.com/ for downloads, documentation and an offer to get a free copy of CodeSmith. I also suggest that you watch both introductory videos: Video 1. Video 2.

Kick it on DotNetKicks.com [Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

Does LINQ StinQ? Not with PLINQO!

There has been much debate over Microsoft’s first major Object Relational Mapper (ORM) called LINQ (Language-Integrated Query). Microsoft has released a several flavors of LINQ suitable for different purposes, most notably LINQ-to-SQL and LINQ-to-Entities. Most developers use LINQ-to-SQL given that LINQ-to-Entities is brand new and most info available online is about LINQ-to-SQL, so for the purposes of this article (and in most people’s minds), “LINQ” == “LINQ-to-SQL”. [Yes, I know LINQ is really an expression language and not an ORM, but let’s not get technical on a technology blog. :-)]

LINQ has been met with mixed reactions from the development community. Some are enthralled that Microsoft has finally built their own “supported” ORM. Some love the ease-of-use in being able to drag tables onto a designer surface in Visual Studio 2008 (reminiscent of TableAdapters). Others like the (usually) clean and efficient SQL queries generated by the ORM.

To me, the best feature of LINQ is the integration of a strongly-typed data query language built into .NET. Gone are the days of query building from strings. Even an impressive ORM like NHibernate with its own query language (HQL) suffers from the lack of a strongly-typed query language (leading to hybrid attempts like LINQ-to-NHibernate). Altogether, it is a powerful and efficient system of data access when you understand it and use it properly.

Of course, LINQ is not without its problems and critics. I’ve heard a lot of complaints about “lack of support” for implementing dynamic queries, that is, creating queries with joins and other non-comparison criteria at runtime. There are some solutions for this like the LINQ Dynamic Query Library or Predicate Builder.

One of the biggest concerns from enterprise-level developers is that LINQ is a “black box” and the Visual Studio designer writes a ton of “hidden code” to map the entities to the database. Bear in mind that all ORMs are a black box to some degree. Even though NHibernate is open-source, most developers use a DLL from a release and never tinker with its inner workings. NHibernate extensions like Castle ActiveRecord even hide the XML column mappings from the developer. But in all honesty, LINQ-to-SQL is very bad at allowing the developer open access to the mysterious column mapping code and has a number of issues when you make changes to the underlying database schema.

A couple of months ago, I decided to try out LINQ-to-SQL on one of my pet projects. It took some getting used to, but after pulling some hair out, learning some important lessons and finding some handy tools, it worked pretty well. I even found that I can usually avoid many dynamic query issues by chaining sub-queries together to execute a single query thanks to Deferred Execution.

My biggest gripes centered around the creation and maintenance of the entity mappings in the DBML file. Microsoft’s O/R Designer documentation openly admits:

…the designer is a one-way code generator. This means that only changes that you make to the designer surface are reflected in the code file. Manual changes to the code file are not reflected in the O/R Designer. Any changes that you make manually in the code file are overwritten when the designer is saved and code is regenerated.

More than that, when you make changes to your database schema, you either need to manually update the entity through the O/R Designer or you need to delete and re-add the entity, losing any customizations you’ve made (including entity relationships). Suddenly the code-generation time savings doesn’t make up for customization frustration.

Enter “Professional LINQ to Objects,” a.k.a. PLINQO, a code generation tool for LINQ that does what you want. I’ve used other ORM frameworks from CodeSmith before, particularly NetTiers. I believe in the value of code-generation when it’s implemented properly. One of the most important attributes of a good code-gen tool is that it lets you re-generate code without overwriting your customizations. PLINQO delivers intelligent regeneration to LINQ-to-SQL.

PLINQO works by generating your DBML file for you, but surprisingly this doesn’t break Visual Studio’s O/R Designer. In fact, you can still open and modify the .dbml file with the Designer and your changes will not be overwritten next time you generate. You can also modify many aspects of the entities via code (including validation attributes) and your code customizations will be untouched by re-generation. Sweet!

Some other benefits of PLINQO (as noted on their site):

  • Generates one file per entity instead of one massive [DBML] file.
  • Generates partial classes where custom code can be written and won’t be overwritten.
  • Generated entity files are added to the project as code behind files to their corresponding custom entity files.
  • Adds customizable business rules engine to enforce entity validation, business and security rules.
  • Generation of entity manager classes… Provides access to common queries based on primary keys, foreign keys, and indexes.
  • Ability to automatically remove object prefix and suffixes (ie. tbl and usp) [based on RegEx].

Another nice bonus is that you configure and execute the code generation right inside VS 2008. CodeSmith Pro has Visual Studio integration that lets you add a .csp (CodeSmith Project) file inside your VS project and manage all the settings from Solution Explorer. Just right-click your .csp and select “Generate Output” and your .dbml, entities, managers and query classes appear in your solution.

If you’ve been hesitant to try LINQ because of “black box” or code re-generation concerns, now is the time to give it a shot. Download a trial of CodeSmith and try the PLINQO 2.0 release or latest development trunk. Happy coding!

Kick it on DotNetKicks.com [Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]