Thursday, 20 April 2017

Overcoming Azure Storage Queue Message Size Limits

Background

I'd been working on a project for work that required the ability to output PDF files.
Normally this is a simple enough task, and I'd just utilize the terrific WkHtmlToPdf tool to convert some markup to a PDF.  There are wrappers for doing this from standard C#, but that isn't the point of this post.
Now inconveniently, Azure Web Apps restrict access to the drawing API that WkHtmlToPdf uses, rendering it largely useless.  I tinkered for a bit and found that this restriction did not apply to the WebJobs that run inside of the Web App! Success! All I now needed to do was write a WebJob that took the HTML and returned the rendered PDF.

Storage Queues to the Rescue!

Having worked with Azure web jobs before, I knew how powerful and easy to use the Storage Queue API is.  A great resource on getting started can be found here:

https://docs.microsoft.com/en-us/azure/app-service-web/websites-dotnet-webjobs-sdk-storage-queues-how-to

After a bit of design, I decided the best course of action was to send the HTML, along with a callback URL to post the resulting PDF back to the server, to the webjob through a queue message. I'll spare the implementation details, just trust me that it worked.
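A minimal sketch of that design, assuming the WebJobs SDK's `QueueTrigger` binding; the message shape, queue name, and `RenderPdf` helper here are illustrative, not the actual project code:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

// Hypothetical message shape - the real project's fields may differ
public class RenderRequest
{
    public string Html { get; set; }
    public string CallbackUrl { get; set; }
}

public class Functions
{
    private static readonly HttpClient Client = new HttpClient();

    // Runs whenever a message lands on the "pdf-requests" queue;
    // the SDK deserializes the JSON message body into RenderRequest
    public static async Task ProcessRequestAsync(
        [QueueTrigger("pdf-requests")] RenderRequest request)
    {
        // Stand-in for the WkHtmlToPdf wrapper call
        byte[] pdf = RenderPdf(request.Html);

        // Post the finished PDF back to the server via the supplied callback URL
        using (var content = new ByteArrayContent(pdf))
        {
            await Client.PostAsync(request.CallbackUrl, content);
        }
    }

    private static byte[] RenderPdf(string html) =>
        throw new NotImplementedException("call into a WkHtmlToPdf wrapper here");
}
```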

Or so I thought...

On the Subject of Message Limits

During testing, I found it kept producing the PDFs perfectly! Except for one.  This particular report was a few pages long, and contained a lot of text.  Over 64 KB of text, apparently.  Now this was an issue.  According to the Azure documentation, 64 KB is the limit for an individual queue message.  This was simply upsetting.  I thought to myself, okay, let's just minify the HTML and perhaps even gzip it!  These were the thoughts of a broken man.  I knew this would just put the problem off until it broke again.  I put my Stack Overflow hat on and looked around for the solutions people had used before me.  In my heart I knew the answer was separate storage, with the queue message only carrying a link to the file for the webjob to download.  The people of Stack Overflow confirmed this.

Now working with blob storage is a cinch.  Handling caching and deletion of old data is somewhat more of a problem.  It's not a very hard one, just one I didn't want to write a custom solution for.  Surely someone had the ambition to write this for me and release it to the world!
Not for free, unfortunately.
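The pattern itself is simple enough to sketch, assuming the Azure Storage SDK of the era (the `WindowsAzure.Storage` package); the container and queue names are made up for illustration:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class LargePayloadQueue
{
    // Upload the large payload to a blob, then enqueue only a pointer to it.
    // The 64 KB queue limit now only applies to the (tiny) pointer message.
    public static void Enqueue(CloudStorageAccount account, string html)
    {
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("pdf-payloads");
        container.CreateIfNotExists();

        var blobName = Guid.NewGuid().ToString("N");
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        blob.UploadText(html);

        var queue = account.CreateCloudQueueClient()
                           .GetQueueReference("pdf-requests");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(blobName)); // well under 64 KB
    }
}
```

The consumer then reads the message, downloads the blob by name, and (the part I didn't want to hand-roll) eventually deletes it.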

The Birth of Storks

This section isn't as messy as it sounds.
I set off to create a multi-purpose, platform-independent, buzzword-filled plugin to solve the problem for me and all those after me.  I laid out my specifications.  It needed to:
  • Be abstract enough to not care what the backing storage that handled retrieval and storage operations was
  • Handle most use cases by default
  • Be extensible enough to handle custom storage items and custom stores
  • Work out of the box with minimal configuration
  • Ideally handle old data deletion automatically
After a few hours of work, I had the core system completed.
It utilized a controller for all storage and retrieval operations.  This controller handles using the correct encoder to encode and decode data, and then using the supplied DataStore to pull and push data. The core implements a DataStore for:
  • In-memory storage, useful for tests and as a caching mechanism.  Not that great for actual usage
  • Local file storage, useful for on-premises solutions or systems which don't differentiate between site storage and file storage
  • Delegated store.  This store allows you to supply functions as the Get/Set operations.  I haven't found a great need for it yet, but I imagine someone could use it when dealing heavily in compiled expression lambdas
A plugin extends this support to Azure Blob Storage, which also handles deleting old data to conserve space.  I may incorporate this into the local file storage as well.
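To give a feel for the shape of such an abstraction, here is a rough sketch of how a controller, encoder, and store could fit together.  These interfaces are illustrative only, not the actual Storks API:

```csharp
using System.Threading.Tasks;

// Illustrative shapes only - not the actual Storks API
public interface IDataStore
{
    Task<byte[]> GetDataAsync(string key);
    Task SetDataAsync(string key, byte[] data);
}

public interface IStoreEncoder<T>
{
    byte[] Encode(T value);
    T Decode(byte[] data);
}

// The controller picks the encoder for T and delegates the raw bytes to
// whichever DataStore was supplied (in-memory, local file, blob, ...)
public class StoreController
{
    private readonly IDataStore _store;

    public StoreController(IDataStore store) => _store = store;

    public Task StoreAsync<T>(string key, T value, IStoreEncoder<T> encoder)
        => _store.SetDataAsync(key, encoder.Encode(value));

    public async Task<T> RetrieveAsync<T>(string key, IStoreEncoder<T> encoder)
        => encoder.Decode(await _store.GetDataAsync(key));
}
```

Swapping the backing storage then means swapping the `IDataStore` implementation and nothing else.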

Find the project and instructions on getting started on GitHub here:
https://github.com/LiamMorrow/Storks

Or install via NuGet with:

Core

Contains base interfaces and default implementations using file and in memory data stores
Install via the Package Manager Console
Install-Package Storks

Azure Storage Implementation

Contains an implementation using Azure Blob storage as a data store
Install via the Package Manager Console
Install-Package Storks.AzureStorage


That's all for now
Until next time,
Liam Morrow

Wednesday, 12 April 2017

Caching Strategy For Compiled Expressions

So you may or may not be aware that Activator.CreateInstance is very slow, and if you are calling it quite a bit it is worth using cached compiled expressions instead.

Now if you don't know what an Expression is, there are many people more qualified than myself who can explain the concept.  A good explanation by Jon Skeet can be found here:

Using compiled expressions to speed these operations up is well documented on Stack Overflow and commonly used.
A quick implementation may look something like this:
static ConcurrentDictionary<Type,Delegate> cache = new ConcurrentDictionary<Type,Delegate>();

static T Create<T>(){
    var constructor = (Func<T>) cache.GetOrAdd(typeof(T), CreateConstructorDelegate);
    return constructor.Invoke(); // Notice you can use invoke because it is a generic
} 

static object Create(Type t){
    var constructor = cache.GetOrAdd(t, CreateConstructorDelegate);

    // Note we must use DynamicInvoke because the type isn't known at compile time
    return constructor.DynamicInvoke();
}

static Delegate CreateConstructorDelegate(Type t){
    var body = Expression.New(t);
    return Expression.Lambda(body).Compile();
}
Now the problem with this method is in the non-generic DynamicInvoke path. After timing it, I found it was no better than using Activator.CreateInstance.

For my purposes this was unacceptable, so I dug around looking for a way to call Invoke without knowing the type at compile time. Unfortunately this just isn't possible, but it got me thinking... All* instances in C# derive from the standard .NET System.Object!
Now Func<T> is covariant in T, but covariance only applies to reference types, so a Func<int> can never be cast to Func<object>.  And since the delegate is stored as a plain Delegate, the cast couldn't be written without knowing T at compile time anyway, so I had to figure out something else.
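A quick illustration of why variance doesn't help here:

```csharp
using System;

class VarianceDemo
{
    static void Main()
    {
        // Func<out TResult> is covariant, so this works for reference types:
        Func<string> makeString = () => "hello";
        Func<object> asObject = makeString; // fine: string is a reference type

        Func<int> makeInt = () => 42;
        // Func<object> broken = makeInt;   // compile error: variance never
        //                                  // applies to value types

        Console.WriteLine(asObject()); // prints "hello"
    }
}
```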

I found that by modifying the expression to return an object, I could then call Invoke!
I achieved this by wrapping the constructor expression in a Convert expression that casts the result to an object, instead of returning the proper type directly.
This changes the return type of the generated function to System.Object.

Paired with an effective caching and abstraction layer, all that would be required is to cast the return to the type that you are actually working with.  A generic wrapper method is perfect for this.

Now there is some extra memory overhead when value types (int, double, etc.) are cast to an object due to boxing.  To mitigate this, I have used a separate cache for when the method is called with a generic type parameter rather than a System.Type, which does no boxing at all.

All it took was changing the expression creator to this:
static ConcurrentDictionary<Type,Func<object>> cache = new ConcurrentDictionary<Type,Func<object>>();

static ConcurrentDictionary<Type,Delegate> genericCache = new ConcurrentDictionary<Type,Delegate>();

static T Create<T>(){
    var constructor = genericCache.GetOrAdd(typeof(T), x=>CreateConstructorDelegate<T>());
    return ((Func<T>)constructor).Invoke(); // Notice you can use invoke because it is a generic
} 

static object Create(Type t){
    var constructor = cache.GetOrAdd(t, CreateConstructorDelegate);

    // Note we can now use Invoke because it is a strongly typed Func<object>!
    return constructor.Invoke();
}

static Func<T> CreateConstructorDelegate<T>(){
    var body = Expression.New(typeof(T));
    return Expression.Lambda<Func<T>>(body).Compile();
}

static Func<object> CreateConstructorDelegate(Type t){
    var ctor = Expression.New(t);

    // Note we now cast the new T to an object inside the function
    // allowing us to ensure that it returns a Func<object> on compile
    var body = Expression.Convert(ctor, typeof(object)); 
    return Expression.Lambda<Func<object>>(body).Compile();
}
After running the performance tests I found the overhead of casting to object was negligible, and still two orders of magnitude faster than using DynamicInvoke. Here are the timings for creating 1,000,000 objects, running in release mode without the debugger, ordered slowest to fastest:
Method                                          Seconds
Non-Generic Compiled Lambda (DynamicInvoke)     1.1585493
Activator.CreateInstance                        1.0153902
Compiled Generic Lambda Without Cast (Invoke)   0.0759568
Object-Cast Compiled Expression (Invoke)        0.0726109
Natively Calling Constructor                    0.0050353
As you can see, the object-cast expression performed on par with the generic, non-cast Func.
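For anyone wanting to reproduce numbers in this shape, a simple Stopwatch harness along these lines is enough; the sample type and the two factories shown are arbitrary stand-ins for the five strategies in the table:

```csharp
using System;
using System.Diagnostics;

class Benchmark
{
    class Foo { }

    static void Main()
    {
        Time("Activator.CreateInstance", () => Activator.CreateInstance(typeof(Foo)));
        Time("Natively Calling Constructor", () => new Foo());
    }

    static void Time(string name, Func<object> create)
    {
        create(); // warm up so JIT compilation isn't counted

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            create();
        }
        sw.Stop();
        Console.WriteLine(name + ": " + sw.Elapsed);
    }
}
```

Note the delegate indirection adds a small constant cost to every strategy equally, so relative ordering is still meaningful.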

That's all for now
Until next time,
Liam Morrow

Welcome

Welcome to my coding blog!
My name is Liam Morrow, and I've always had a passion for software development. Oftentimes I find myself wanting to write something down or share ideas with the world, so I thought today would be the day that I start this blog!

A bit of background: I've been coding for a number of years as a hobby, and recently (2 years ago) I started doing it for a living.  This blog will contain posts about anything I learn or do and feel like sharing.
I mainly work in the .NET Framework, primarily C#, and that will most likely be what most of my posts are about. I've always loved how well thought out it is while still being powerful.  You can imagine my happiness when .NET Core was released and allowed me to utilize a fresh framework that, most importantly, worked cross-platform.

That's enough for now, 
Until next time
Liam Morrow
