Building a DSL for Scheduling Tasks
This article is taken from Building Domain Specific Languages in Boo from Manning Publications. It shows how to build a DSL using task scheduling as the concrete example. For the table of contents, the Author Forum, and other resources, go to http://manning.com/rahien/.
It is being reproduced here by permission from Manning Publications. Manning early access books and ebooks are sold exclusively through Manning. Visit the book's page for more information.
MEAP Began: February 2008
Softbound print: November 2009 (est.) | 400 pages
ISBN: 1933988606
Let us take scheduling tasks as our example, and see how we can demonstrate the difference between the code and DSL. From the usage perspective, what do we want when we talk about scheduling tasks? See table 1.
Table 1 - High level overview of scheduling requirements
Define named tasks | Define actions to occur when the task executes | Define when the task should be executed |
Crawl site | Check URL response | Once a day |
Backup database | Send email | Once a week |
Check service is online | Generate Report | Every hour |
When we are writing the engine for that, we are concerned about how to schedule the task, how to handle errors, how to verify that we have not skipped executing the task because the machine restarted, etc.
But from the point of view of the domain we are working in, that is not really meaningful. When I want to define a new task, I want to deal just with the scheduling semantics, not with implementation mechanics.
We can get a lot of that by building façades, hiding all the implementation details under the façade's abstraction, and we can get fairly good syntax using a fluent interface. So, what is the point in building a DSL?
The difference between fluent interfaces and DSLs
If you have a fluent interface, you already have a DSL. It is a limited one, admittedly, but it is still a DSL for all intents and purposes. There are significant differences in readability between a fluent interface and a DSL, because you have a lot of freedom when you define the language for the DSL, but you have to work within the limits of a typically rigid language (1) to get the fluent interface to work.
Because of those limitations, I tend to use DSLs and fluent interfaces for different tasks. I would use a fluent interface where I need just a touch of language-oriented programming, and would go with a DSL whenever I need something with a bit more flexibility in it.
We will take a look at the code, and that should demonstrate these differences. Please do not concern yourself with the implementation details for now; right now, just look at the syntax.
First, Listing 1 has a fluent interface example.
Listing 1 – Using fluent interface to define a task to execute
new FluentTask("warn if website is down")
.Every( TimeSpan.FromMinutes(3) )
.StartingFrom( DateTime.Now )
.When(() => new WebSite("http://example.org").IsNotResponding )
.Execute(() => Notify("admin@example.org", "server down!"))
.Schedule();
And Listing 2 has the DSL equivalent:
Listing 2 – Using a DSL to define a task to execute
task "warn if website is down":
    every 3.Minutes()
    starting now
    when WebSite("http://example.org").IsNotResponding
    then:
        notify "admin@example.org", "server down!"
The DSL code doesn’t have to do extra work just to make the compiler happy. We don’t have the ugly lambda declaration in the middle of the code, or syntactic baggage in the form of parentheses and operators of various kinds.
It would be hard to take anything away from the DSL sample without losing some meaning. We have very little noise there. And noise reduction is important when we are talking about Language Oriented Programming. The less noise we have, the clearer the code is, after all.
Now, remember that a Fluent Interface is also a DSL. This means that we can make the fluent interface sample even clearer. Tim Wilde was kind enough to do just that for us (2), reaching this syntax (listing 3):
Listing 3 – A better fluent interface for scheduling tasks
Schedule.Task( "warn if website is down" ).
Repeat.Every( 3 ).Minutes.
Starting( DateTime.Now ).
If( Web.Site( "http://example.org" ).IsNotResponding() ).
Notify( "admin@example.org", "Site down!" );
But the catch, and there is always a catch, is that the complexity of the Fluent Interface implementation grows significantly as we try to express richer and richer concepts. In this case, the “back end” of the implementation got to 8 classes and 6 interfaces, all for five lines of code, while the DSL implementation is much simpler.
So, Fluent Interfaces tend to be harder to scale than DSLs. Does this mean that we can stop using Fluent Interfaces altogether? As usual, the answer is that it depends, but broadly, no.
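To see where that class count comes from, here is a minimal sketch of the kind of back end that the first three lines of Listing 3 need. It is not Tim Wilde's implementation: every type name other than Schedule (TaskDefinition, TaskBuilder, RepeatBuilder, IntervalBuilder, StartBuilder) is hypothetical, and the If and Notify steps are left out. Each word in the chain gets a type of its own so that the compiler only accepts the words in the intended order.

```csharp
using System;

// Hypothetical data object that the chain fills in.
public sealed class TaskDefinition
{
    public string Name { get; set; }
    public TimeSpan Interval { get; set; }
    public DateTime Start { get; set; }
}

// Entry point: Schedule.Task("...")
public static class Schedule
{
    public static TaskBuilder Task(string name)
    {
        return new TaskBuilder(new TaskDefinition { Name = name });
    }
}

// After Task(...), the only legal next word is Repeat.
public sealed class TaskBuilder
{
    private readonly TaskDefinition definition;
    public TaskBuilder(TaskDefinition definition) { this.definition = definition; }

    public RepeatBuilder Repeat
    {
        get { return new RepeatBuilder(definition); }
    }
}

// After Repeat, the only legal next word is Every(n).
public sealed class RepeatBuilder
{
    private readonly TaskDefinition definition;
    public RepeatBuilder(TaskDefinition definition) { this.definition = definition; }

    public IntervalBuilder Every(int amount)
    {
        return new IntervalBuilder(definition, amount);
    }
}

// After Every(n), a unit (Minutes or Hours) must follow.
public sealed class IntervalBuilder
{
    private readonly TaskDefinition definition;
    private readonly int amount;

    public IntervalBuilder(TaskDefinition definition, int amount)
    {
        this.definition = definition;
        this.amount = amount;
    }

    public StartBuilder Minutes
    {
        get
        {
            definition.Interval = TimeSpan.FromMinutes(amount);
            return new StartBuilder(definition);
        }
    }

    public StartBuilder Hours
    {
        get
        {
            definition.Interval = TimeSpan.FromHours(amount);
            return new StartBuilder(definition);
        }
    }
}

// After the unit, Starting(...) ends this shortened chain.
public sealed class StartBuilder
{
    private readonly TaskDefinition definition;
    public StartBuilder(TaskDefinition definition) { this.definition = definition; }

    public TaskDefinition Starting(DateTime start)
    {
        definition.Start = start;
        return definition;
    }
}
```

With these in place, Schedule.Task("warn if website is down").Repeat.Every(3).Minutes.Starting(DateTime.Now) compiles, while a chain that skips Every or the unit does not. That is six types for three lines of the sentence, before conditions and actions are even considered.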
Choosing between a fluent interface and a DSL
I’ll admit that I tend to favor building DSLs over Fluent Interfaces, precisely because I leant the other way in the past, and got badly burned by the effort involved in maintaining something that complex in a rigid language.
There is a time and place for everything, but I tend to use the following questions as guidelines for deciding what to use:
- Who will use the DSLs?
- When will they use the DSLs?
- When is it to be changed?
If I need to be expressive at a different time than the development of the application, then it probably means that I should use a DSL instead of a fluent interface. A fluent interface is probably a good idea if you intend to use the fluent interface at the same time that you write the code. You don’t have a mental disconnect when you switch between languages and you get to keep the flow going.
This is directly related to the question of who is going to use the DSL or fluent interface. If we are going to use it with our domain experts, we probably want a full-blown DSL in place, since that makes it easier to talk in the concepts of the domain. If the target audience is programmers, and the expected usage is during normal development, then a fluent interface would be appropriate.
The issues of complexity vs. expressiveness also come into play, of course. Getting a Fluent Interface expressive can be prohibitive in terms of the time and complexity involved, depending on what you need.
How to deal with language hopping
One thing that some people seem to have a theoretical issue with is dealing with more than one language at a time. I say theoretical because it is usually a theoretical issue more than a practical one.
In the web world, people have little problem hopping between HTML, JavaScript and server-side programming, so this disconnect isn’t a big problem.
In many enterprise applications, important parts of the application are written in SQL (stored procedures, triggers, complex queries, etc.). In this case as well, we need to move frequently between our code (C#, Java, VB.NET) and SQL queries.
From my experience, it has rarely caused any issues from the language disconnect side (it has other issues, but this is not the place to talk about them).
Provided that you remain consistent in your domain and keep your language short and to the point, I do not believe that you need to concern yourself with that.
If you need to perform actions outside the direct development cycle, a DSL is the place to look. Typical cases are the need to add functionality to an application "on the fly", or to modify business logic without a costly deployment cycle. If you feel the urge to use XML as anything except a strict data storage mechanism, that is a good time to consider using a DSL.
It tends to be much easier to modify a DSL once you are in production, but I have a strong aversion to modifying things in production that haven’t gone through the full testing cycle. In particular, ad-hoc changes to production should never occur.
Extensibility is also a concern; let us go back to the scheduling sample. We have the scheduling engine, and we have the tasks themselves. Consider that to write a task using a DSL I usually have to write a text file, but to write a task using a Fluent Interface requires creating a project, compiling, etc.
Even if I am looking at the file alone, a fluent interface needs to have some structure around it. This usually means a class, a well-known method name, etc.
This means that it is often easier to use a DSL than a Fluent Interface, especially if we would like to hand this DSL to non-developers.
The benefit of using a fluent interface is that you usually don't have to write any tools to handle it. Your IDE is already there and you can use it as is. A custom language would require us to build the tools, although we can base ourselves on existing infrastructure.
Coming back to DSLs again, we will explore the reasons and motives that drive us toward building each type of DSL, and how that affects the DSLs that we are building.
Implementing the Scheduling DSL
Listing 4 will refresh our memory about what the Scheduling DSL looks like:
Listing 4 – Sample code from the scheduling DSL
task "warn if website is down":
    every 3.Minutes()
    starting now
    when WebSite("http://example.org").IsNotResponding
    then:
        notify "admin@example.org", "server down!"
It doesn’t look very much like code, right? But take a look at this class diagram in figure 1.
Figure 1 – Class diagram of BaseScheduler, the implicit base class for the scheduling DSL
This is the Implicit Base Class for the scheduling DSL. An Implicit Base Class is one of the more common ways in which we can define and work with a DSL.
For now, please assume that the DSL code that you see is being magically placed in the Prepare() method of a derived class. This means that you now have full access to all the methods that the BaseScheduler exposes, since those are exposed by your base class.
What this means, in turn, is that we can now look at both the DSL and the class diagram and suddenly understand that most of what goes on here involves plain old method calls. Nothing fancy or hard to understand; we are merely using a slightly different syntax to call them than we usually would.
We are doing a minor extension of the language here, however. Two methods here are not part of the API, but rather part of the language extension:
- Minutes() – This is a simple extension method that allows us to specify 3.Minutes(), which reads nicer than TimeSpan.FromMinutes(3), which is what it basically does.
- when(Expression) – This is a meta method, a method that can modify the language. It specifies that the expression that is passed to it will be wrapped in a delegate and stored in an instance variable.
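The first of these is easy to picture in C#. The actual implementation is not shown here, so the following is only a minimal sketch; the TimeSpanExtensions class name is an assumption, and only the Minutes() name comes from the listings.

```csharp
using System;

// Hypothetical container class; only the Minutes() name comes from the article.
public static class TimeSpanExtensions
{
    // Lets callers write 3.Minutes() instead of TimeSpan.FromMinutes(3).
    public static TimeSpan Minutes(this int count)
    {
        return TimeSpan.FromMinutes(count);
    }
}
```

A call such as TimeSpan interval = 3.Minutes(); then reads the same way in C# as it does in the DSL. The when() meta method has no such direct C# counterpart, because it rewrites code at compilation time rather than being called at run time.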
That doesn’t really make sense right now, I know. So let us start taking this DSL apart. We will use the exact opposite approach that we use when we are building the DSL. We will add the programming concepts to the existing DSL until we have fully understood how this works.
Let us start by just adding parentheses and removing some of the compiler's syntactic sugar. Listing 5 shows the results of that.
Listing 5 – The Scheduling DSL after removing most of the syntactic sugar niceties
task("warn if website is down", do() :
    self.every( self.Minutes(3) )
    self.starting ( self.now )
    self.when( WebSite("http://example.org").IsNotResponding)
    self.then( do():
        notify( "admin@example.org", "server down!")
    )
)
A couple of notes about this before we continue:
- self in Boo is the equivalent of this in C#/Java or Me in VB.NET
- do(): is the syntax for an anonymous delegate in Boo
That looks a lot more like code now (and a lot less like a language). But this isn’t everything yet. We still need to resolve the when() meta method. When we run that, we will get the result shown in listing 6.
Listing 6 – The Scheduling DSL after we resolved the when() meta-method
task("warn if website is down", do() :
    self.every( self.Minutes(3) )
    self.starting ( self.now )
    self.condition = do():
        return WebSite("http://example.org").IsNotResponding
    self.then( do():
        notify( "admin@example.org", "server down!")
    )
)
As you can see, we utterly removed the when method, replacing it with an assignment of an anonymous delegate to the instance variable.
Note that this is also the only piece of compiler magic that we have performed. Everything else is already in the language.
Meta methods and anonymous blocks
Take a look at the "when" and "then" methods. Both of them end up with a very similar syntax. But they are implemented in drastically different ways.
The "when" method is a meta method: it changes the code that will be compiled, at compilation time. The "then" method, however, uses an anonymous block as a way to pass the delegate to execute.
The reason that we have two different approaches that end up with nearly the same thing has to do with the syntax that we want to achieve.
With the "when" method, we want to achieve keyword-like behavior, so the "when" method accepts an expression and transforms it into a delegate.
The "then" keyword, however, has a different syntax, which accepts a block of code, so we use Boo's anonymous blocks to help us out here.
Now, to make sure that we are clear, we can take the code above and make a direct translation to C#, in which case it would look like listing 7:
Listing 7 – The Scheduling DSL code, translated to C#
task("warn if website is down", delegate
{
    this.every( this.Minutes(3) );
    this.starting ( this.now );
    this.condition = delegate
    {
        return new WebSite("http://example.org").IsNotResponding;
    };
    this.then( delegate
    {
        this.notify( "admin@example.org", "server down!");
    });
});
Now, please take a look back at the original DSL text, and compare it to this. Functionally, they are just the same. The syntactic differences between them are huge, however, and we want to get a good syntax for our DSL.
We have missed one important part; we haven’t talked yet about the implications of the Implicit Base Class. The actual code looks like listing 8.
Listing 8 – The full class that was generated using the implicit base class
public class MyDemoTask ( BaseScheduler ):
    override def Prepare():
        task("warn if website is down", def():
            # the rest of the code
        )
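Putting Listings 7 and 8 together, a hand-written C# equivalent might look like the sketch below. The class diagram for BaseScheduler is not reproduced here, so its members are inferred from the calls in the listings. The field types, the method bodies, the WebSite stub, and the one-shot Run() are assumptions made for illustration, not the actual Rhino DSL test code.

```csharp
using System;
using System.Collections.Generic;

// Stub standing in for the WebSite helper used in the listings (assumed shape).
public class WebSite
{
    public string Url { get; private set; }
    public WebSite(string url) { Url = url; }

    // A real implementation would issue an HTTP request; this stub always reports "down".
    public bool IsNotResponding { get { return true; } }
}

// Assumed shape of the implicit base class; member names follow the listings.
public abstract class BaseScheduler
{
    public string Name;
    public TimeSpan Interval;
    public DateTime StartAt;
    public readonly List<string> Notifications = new List<string>();

    protected Func<bool> condition;   // assigned by the expanded when()
    private Action action;            // assigned by then()

    // The body of the DSL script ends up inside this method.
    public abstract void Prepare();

    protected void task(string name, Action body) { Name = name; body(); }
    protected void every(TimeSpan interval) { Interval = interval; }
    protected void starting(DateTime start) { StartAt = start; }
    protected DateTime now { get { return DateTime.Now; } }
    protected TimeSpan Minutes(int count) { return TimeSpan.FromMinutes(count); }
    protected void then(Action body) { action = body; }

    protected void notify(string address, string message)
    {
        // Recorded instead of emailed so the sketch has no side effects.
        Notifications.Add(address + ": " + message);
    }

    // Stand-in for registering with a real scheduling engine:
    // evaluate the condition once and run the action if it holds.
    public void Run()
    {
        if (action != null && (condition == null || condition()))
            action();
    }
}

// Roughly what the compiler produces from the DSL file, written by hand.
public class MyDemoTask : BaseScheduler
{
    public override void Prepare()
    {
        task("warn if website is down", delegate
        {
            every(Minutes(3));
            starting(now);
            condition = delegate
            {
                return new WebSite("http://example.org").IsNotResponding;
            };
            then(delegate
            {
                notify("admin@example.org", "server down!");
            });
        });
    }
}
```

Calling Prepare() only records the declarations (name, interval, start time, condition, action); nothing is checked or sent until Run() is called. Keeping those two phases apart is what lets a real engine decide when and how often to evaluate the condition.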
Now that we have a firm grasp on what code we are getting out of the DSL, we need to get to grips with how we actually run this code…
Running the scheduling DSL
So far we have spoken entirely in terms of what kind of transformations we are putting the code through, but we haven’t talked yet about how we can actually compile and execute a DSL.
Remember, we are not dealing with scripts in the strict sense of the word; we have no interpreter to run. We are going to compile our DSL to IL, and then execute this IL.
So, in order to get something that we can actually execute, we will need to compile the code and then run it. The code that it takes to do this is not hard, just annoying to write time after time, so I wrapped it in a common project called Rhino DSL (3).
The Rhino DSL Project
The Rhino DSL project is a set of components that turned out to be useful across many DSL implementations. It contains classes that help with building a DSL engine, implicit base classes, multi-file DSLs, etc.
We are going to use Rhino DSL; it is an open source project, licensed under the BSD license, which means that you can use it freely in any type of application or scenario.
Compilation is expensive, and in the CLR, once we load an assembly, we have no way of freeing the occupied memory short of unloading the entire AppDomain.
These two problems lead us toward the need to do at least some caching upfront. Again, having to do this on a DSL-by-DSL basis is annoying, and we would like to get the cost of creating a DSL down as much as possible.
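As a rough illustration of the kind of caching involved, the sketch below keys compiled results by script path and recompiles only when the file's last-write time changes. This is not Rhino DSL's actual implementation: CompiledScriptCache and its compile callback are hypothetical names, and the callback stands in for whatever expensive compilation step the caller supplies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical cache: compiles a script at most once per file version.
public sealed class CompiledScriptCache<T>
{
    private sealed class Entry
    {
        public DateTime LastWrite;
        public T Compiled;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
    private readonly Func<string, T> compile;   // the expensive step, supplied by the caller
    private readonly object sync = new object();

    public CompiledScriptCache(Func<string, T> compile)
    {
        this.compile = compile;
    }

    public T Get(string path)
    {
        string key = Path.GetFullPath(path);
        DateTime lastWrite = File.GetLastWriteTimeUtc(key);

        lock (sync)
        {
            Entry entry;
            if (entries.TryGetValue(key, out entry) && entry.LastWrite == lastWrite)
                return entry.Compiled;          // unchanged file: reuse the earlier result

            // New or modified file: compile again and remember the result.
            entry = new Entry { LastWrite = lastWrite, Compiled = compile(key) };
            entries[key] = entry;
            return entry.Compiled;
        }
    }
}
```

A cache like this addresses the compilation cost but not the memory problem: each recompilation of a changed script still loads a new assembly, and the old one stays in memory until the AppDomain is unloaded.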
For all of those reasons, Rhino.DSL provides the DslFactory class, which takes care of all of that for us. It works closely with the DslEngine, which is the class we derive from in order to specify how we want the compilation of the DSL to behave.
Again, none of this is strictly needed. You can do it yourself very easily, if you so choose, but Rhino DSL makes it easier and allows us to focus on the DSL implementation instead of the compiler mechanics.
We have already looked at the BaseScheduler class; now we are going to take a peek at the SchedulingDslEngine. Listing 9 has the full source code of the class.
Listing 9 – The implementation of SchedulingDslEngine
public class SchedulingDslEngine : DslEngine
{
    protected override void CustomizeCompiler(
        BooCompiler compiler,
        CompilerPipeline pipeline,
        string[] urls)
    {
        pipeline.Insert(1,
            new ImplicitBaseClassCompilerStep(
                typeof (BaseScheduler),
                "Prepare",
                // default namespace imports
                "Rhino.DSL.Tests.SchedulingDSL"));
    }
}
As you can see, while it is not doing much, what it does do is interesting. The method is called CustomizeCompiler. What is important to keep in mind right now is that Boo allows you to move code around during compilation, and the ImplicitBaseClassCompilerStep does just that.
What it means is that it will create an implicit class for us, which will derive from BaseScheduler. All the code in the file will be placed in the Prepare method of that derived class. We can also specify default namespace imports. In listing 9, you can see that we are adding the "Rhino.DSL.Tests.SchedulingDSL" namespace. This namespace will be imported into all the DSL scripts, so we will not have to explicitly import it. VB.NET users will be familiar with this feature from project-level imports.
This is nearly it. Given that and the BaseScheduler class that we have already seen, we are nearly at the point where we can execute our DSL.
The one thing that is still missing is the DslFactory intervention. Listing 10 shows how we can work with that.
Listing 10 – Executing a Scheduler DSL script
//initialization
DslFactory factory = new DslFactory();
factory.Register<BaseScheduler>(new SchedulingDslEngine());
//get the DSL instance
BaseScheduler scheduler = factory.Create<BaseScheduler>(
@"path/to/ValidateWebSiteUp.boo");
//This is where we run the code from the DSL file
scheduler.Prepare();
//Run the prepared scheduler
scheduler.Run();
First we initialize the DslFactory and then we create and register a DslEngine for the specific base type that we want. Note that you should only do this once, probably in the startup of the application. This usually means the Main method in console and Windows applications, and Application_Start in web applications.
We then get the DSL instance from the factory; we pass both the base type that we want (which selects the DslEngine that we registered and is also the return type of this method) and the path to the DSL script. Usually, this will be a path on the file system, but embedded resources, URLs and even source control links are all things that I have seen used in the past.
Once we get the DSL instance, we can do whatever we want with it. Usually, this depends on the type of DSL that I have. With an imperative DSL, I would tend to call the Run() or Execute() method. With a declarative DSL, I would usually call a Prepare() or Build() method, which would execute the code that we wrote using the DSL, and then I would call the Run() / Execute() method, which would take the result of the previous method call and act upon it. In more complex scenarios, we would ask a separate class to process the results, instead of having the base class carry both responsibilities.
In the Scheduling DSL case, we use a declarative approach, so we call the Prepare() method to collect whatever declarations were made in the DSL, and then we actually run the code. The Run() method in such a DSL would usually perform some sort of registration with a scheduling engine.
(1) This, of course, assumes that we are talking about common static languages. Fluent interfaces in dynamic languages are a different matter, and much closer to a DSL.
(2) Fluent Interfaces – Tim Wilde http://www.midnightcoder.net/Blog/viewpost.rails?postId=38
(3) The Rhino [Project Name] is a naming convention that I use for most of my projects. You may be familiar with Rhino Mocks, for example, which is part of the same group of projects. Rhino DSL is another part. There is no association with Mozilla's Rhino project, which is a JavaScript implementation in Java.