Thursday 20 April 2017

Overcoming Azure Storage Queue Message Size Limits

Background

I'd been working on a project at work that needed to output PDF files.
Normally this is a simple enough task, and I'd just utilize the terrific WkHtmlToPdf tool to convert some markup to a PDF.  There are wrappers for doing this from standard C#, but that isn't the point of this post.
Inconveniently, Azure Web Apps restrict access to the drawing API that WkHtmlToPdf uses, which renders the tool largely useless there.  I tinkered for a bit and found that this restriction did not apply to the WebJobs that run inside the Web App! Success! All I needed to do now was write a WebJob that took the HTML and returned the rendered PDF.

Storage Queues to the Rescue!

Having worked with Azure WebJobs before, I knew how powerful and easy to use the Storage Queue API is.  A great resource on getting started can be found here:

https://docs.microsoft.com/en-us/azure/app-service-web/websites-dotnet-webjobs-sdk-storage-queues-how-to

After a bit of design, I decided the best course of action was to send the WebJob a queue message containing the HTML, along with a callback URL it could use to post the resulting PDF back to the server. I'll spare you the implementation details; just trust me that it worked.

Or so I thought...

On the Subject of Message Limits

During testing, I found it kept producing the PDFs perfectly! Except for one.  This particular report was a few pages long and contained a lot of text.  Over 64 KB of text, apparently.  Now this was an issue: according to the Azure documentation, 64 KB is the limit for an individual queue message.  This was simply upsetting.

I thought to myself, okay, let's just minify the HTML and perhaps even gzip it!  These were the thoughts of a broken man.  I knew this would only put the problem off until a bigger report broke it again.  I put my Stack Overflow hat on and looked around for the solutions people had used before me.  In my heart I knew the answer was to keep the HTML in separate storage and send only a link to the file in the message, for the WebJob to download.  The people of Stack Overflow confirmed this.
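
To make the idea concrete, here is a minimal sketch of that "store it elsewhere, send a link" approach. It is written in Python purely for brevity and is not the code from my project: the dict standing in for blob storage, the list standing in for the queue, and every name in it (enqueue_render_job, htmlBlob, callbackUrl and so on) are made up for illustration.

```python
import json
import uuid

MAX_MESSAGE_BYTES = 64 * 1024  # the per-message limit mentioned above

# Hypothetical stand-ins: a dict plays the role of blob storage and a
# list plays the role of the queue. Neither is a real Azure client.
blob_store = {}
queue = []


def enqueue_render_job(html, callback_url):
    """Store the HTML separately and enqueue only a small reference to it."""
    blob_name = f"render-jobs/{uuid.uuid4()}.html"
    blob_store[blob_name] = html.encode("utf-8")

    message = json.dumps({"htmlBlob": blob_name, "callbackUrl": callback_url})
    if len(message.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("queue message exceeds the size limit")
    queue.append(message)
    return blob_name


def dequeue_render_job():
    """The WebJob side: read the reference, then fetch the HTML it points to."""
    message = json.loads(queue.pop(0))
    html = blob_store[message["htmlBlob"]].decode("utf-8")
    return html, message["callbackUrl"]


big_html = "<p>" + "lots of report text " * 10_000 + "</p>"
enqueue_render_job(big_html, "https://example.com/pdf-callback")

# The HTML alone is well over the limit, but the queued message is tiny.
print(len(big_html.encode("utf-8")) > MAX_MESSAGE_BYTES)
print(len(queue[0].encode("utf-8")))
```

The message size no longer depends on the size of the report, which is the whole point. What the sketch leaves out is exactly the annoying part: nothing ever cleans up the stored HTML once the job is done.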

Now, working with blob storage is a cinch.  Handling caching and deletion of old data is somewhat more of a problem.  It's not a very hard one, just one I didn't want to write a custom solution for.  Surely someone had the ambition to write this for me and release it to the world!
Not for free, unfortunately.

The Birth of Storks

This section isn't as messy as it sounds.
I set off to create a multi-purpose, platform-independent, buzzword-filled plugin to solve the problem for me and all those after me.  I laid out my specifications.  It needed to:
  • Be abstract enough not to care which backing store handles the storage and retrieval operations
  • Handle most use cases by default
  • Be extensible enough to handle custom storage items and custom stores
  • Work out of the box with minimal configuration
  • Ideally handle old data deletion automatically
After a few hours of work, I had the core system completed.
It uses a controller through which all storage and retrieval operations are called.  This controller picks the correct encoder to encode and decode the data, and then uses the supplied DataStore to push and pull it. The core implements a DataStore for:
  • In-memory storage, useful for tests and as a caching mechanism, but not that great for actual usage
  • Local file storage, useful for on-premises solutions or systems which don't differentiate between site storage and file storage
  • Delegated store.  This store allows you to supply functions as the Get/Set operations.  I haven't found a great need for it yet, but I imagine someone could use it when dealing heavily in compiled expression lambdas
A plugin extends this support to Azure Blob Storage, and also handles deleting old data so it doesn't use as much space.  I may incorporate this into the local file storage as well.
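
For a rough picture of how those pieces fit together, here is a small sketch of the controller / encoder / DataStore split. To be clear, this is not the Storks API: it is Python rather than .NET, the class and method names (DataStore, InMemoryDataStore, DelegatedDataStore, StoreController) are invented for the example, and using JSON as the encoder is an assumption. See the GitHub README for the real thing.

```python
import json
import uuid
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Anything that can keep bytes under a string key."""

    @abstractmethod
    def set(self, key, data): ...

    @abstractmethod
    def get(self, key): ...


class InMemoryDataStore(DataStore):
    """Keeps everything in a dict; handy for tests, lost on restart."""

    def __init__(self):
        self._items = {}

    def set(self, key, data):
        self._items[key] = data

    def get(self, key):
        return self._items[key]


class DelegatedDataStore(DataStore):
    """Forwards get/set to functions supplied by the caller."""

    def __init__(self, getter, setter):
        self._getter = getter
        self._setter = setter

    def set(self, key, data):
        self._setter(key, data)

    def get(self, key):
        return self._getter(key)


class StoreController:
    """Encodes a value, hands it to the store, and returns a small key."""

    def __init__(self, store):
        self._store = store

    def store(self, value):
        key = str(uuid.uuid4())
        self._store.set(key, json.dumps(value).encode("utf-8"))
        return key

    def retrieve(self, key):
        return json.loads(self._store.get(key).decode("utf-8"))


# The calling code only ever sees the controller and the key.
controller = StoreController(InMemoryDataStore())
key = controller.store({"html": "<h1>Report</h1>"})
print(controller.retrieve(key))

# A delegated store backed by a plain dict's own get/set functions.
backing = {}
delegated = StoreController(
    DelegatedDataStore(backing.__getitem__, backing.__setitem__)
)
```

Because the controller only depends on the two-method store interface, swapping the in-memory store for a file-backed or blob-backed one doesn't touch the calling code, and the key it hands back is small enough to travel in a queue message.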

Find the project, along with instructions on getting started, on GitHub here:
https://github.com/LiamMorrow/Storks

Or install via NuGet with:

Core

Contains base interfaces and default implementations using file and in memory data stores
Install via the Package Manager Console
Install-Package Storks

Azure Storage Implementation

Contains an implementation using Azure Blob storage as a data store
Install via the Package Manager Console
Install-Package Storks.AzureStorage


That's all for now
Until next time,
Liam Morrow
