Language’s type system and their impact in breaking dependencies
One of the things that makes TDD difficult to apply on existing code (or just “legacy code”) is the dependencies that the object subject to test may have. The more “dependencies” the object has, the more difficult it will be to test it, almost for sure.
What makes things even harder is that usually most dependencies are incorrect, there is an unnecessary coupling between objects result of bad designs (yes, there are good designs and bad designs as there are good paintings and bad paintings, good music and bad music... being the definition of good and bad relative to a set of principles or rules. At least there is a design principle in software development that we all agree on that looks to maximize cohesion and minimize coupling. So the more unnecessary dependencies, the bigger the coupling, the worst the design is).
We can see examples of unnecessary coupling everywhere, like objects that represent a Customer that are tight to a database connection, or objects that represent an e-mail that know how to convert themselves to XML and so on, coupling that breaks one of the basic design rules: do not mix orthogonal responsibilities, that is responsibilities from different problem domains. In this case, a Customer belongs to a domain and how to persist it in a database to another. In other words, an object should be as cohesive as possible (this principle is also know by the name of “Single Responsibility Principle”).
There are many problems that incorrect dependencies or strong coupling carry with them, I’m going to concentrate now: Lack of Reuse (you will see how it relates to testing and finally to the language type system)
It is not difficult to see that the stronger the coupling is, the harder the reuse is. In our example, let’s say you want to use a Customer in a different context of the one it was develop in, you will have to take the database connection to that context because the Customer is coupled with it, but maybe you don’t need it or you don’t want it in that context, or even worst, you may want to “persist” Customers in a completely different way...
We can see this problem clearly when testing because basically to test an object you have to reuse it in a context that it might not be thought for, like the testing environment (that’s why now a lot of people is talking about “Design for testability”, that is, design you software to be able to be tested, at the end make your software less coupled and more cohesive). Therefore, if you want to test Customer, you will need a database to do it! Crazy don’t you think? Has that ever happened to you? I bet it has... it has happened to all of us when maintaining a system.
So, due to this coupling we end up not testing Customer because it is too difficult to recreate the database in the testing context, and if we can do it, tests are very slow or fragile. So to be able to test the Customer we meed to break that dependency with the database, we need to lower the coupling. Clearly, there is no reason why a Customer needs a database to exist.
How to break this kind of dependencies is one of the main subjects of our TDD Advance class (a little bit of advertisement :-) but now I would like to point out the difference of doing it with a statically and dynamically typed languages given the case that it is not easily remove the database connection from Customer.
So, we want to test Customer but we can not remove the database connection collaborator from it, it is really entangled in the code and we just want to test a new and simple feature, that for sure does not need the database connection :-). One way to solve this problem is to fake the database connection, that is to provide a polymorphic object to the database connection that will make the customer believe that it is connected to the database when it is not. The main property we need for this “fake object” is to be polymorphic with the database connection, we need an object that will answer all the messages the customer will send to it. Let’s see how we get that polymorphic object using a statically and dynamically typed language.
There are basically four ways to have polymorphic objects in statically typed languages:
- Both objects are instances of the same class
- Both objects share a common superclass and therefore they are polymorphic to the type defined by that common superclass
- The classes both objects are instance of share a common implemented “interface” (construction not available in all statically typed OO languages but in many of them like Java and C#).
- We use meta-programming to accomplish this (I’m not going to comment on this particular option)
Which option is the best one for our problem? We will discard the fourth one on this post. Of the other tree, clearly the first one is not an option, we want to get rid of the database connection! and other object instance of the same class will not do it.
The second one could do it if we are willing to implement a dirty solution and sleep tight :-). That solution is to subclass DatabaseConnection and redefine all the messages it implements to behave as the test need. The advantage of this option is that we do not need to change the type of the database connection variable in Customer, that is, we do not need to change Customer. But the disadvantage is that it is pretty ugly to do something like this and of course it will not work if database connection is final or any of its public messages are final (a valid reason to never use the final option in java o similar in other languages! The final option restricts changes in your design!. If you are using C++, then all public methods should be virtual).
The third option, that is to have two classes that implement the same interface, looks nicer at least from the design point of view, but sadly it is not so easy to apply. It would be a piece of cake if the type of that variable that reference the database connection is really a type (an interface) and not a class, but sadly 99% of the time that is not the case. To explain why it would be easy, let me diverge a little bit about the difference of “typing” and “classing” a variable.
Typing a variable means that the variable’s type is “only” a type. In OO statically typed languages that is achieved with the “interface” construction or with pure abstract classes (although multiple inheritance is needed to really be able to use that abstract class as a “type”). On the other hand, “classing” a variable means that the variable’s type is a class. What is the difference? The coupling... when typing a variable, it is only coupled with the set of messages the type defines, when classing the variable not only is coupled with the type but also with the implementation the class provides, with a position in the class hierarchy and therefore all objects that that variable may reference have to be instance of that class or its subclasses. As you can see “classing” a variable makes our design more coupled, therefore makes our design more difficult to change as we can see in the Customer example.
So, if we want to use option three, first we need to create an interface for DatabaseConnection (please, do not call it IDatabaseConnection!!), make DatabaseConnection implement that interface and change the type of the variable that points to it in Customer. Then, to test Customer you have to change it first... hmm, not good don’t you think? We should try to test Customer before we change it!. Fortunately most current IDEs provide a refactoring to make this change, and due to the fact that it is a refactoring, it is safe to apply because it does not change the bahavior of the system. The “Extract Inteface” refactoring in Eclipse will create the new type for us and change all variables that were defined with the class to be defined with the new interface.
Now that we created the new type and changed the database connection variable’s type we have to create a new implementation for that interface. That is, we have to implement in a class all the messages the type defines in the way the test needs. This can get really tough if database connection is highly coupled because you will need to decouple it and... do you feel like you are in an endless loop? well certainly not endless but long for sure, and difficult.
But there is another issue that would make option three impossible to use. If we do not own DatabaseConnection, if DatabaseConnection was developed by a third party for sure we will not be able to change it and we need to change it to make it implement the new interface we defined!. So we want to fake DatabaseConnection doing a nice design, but in statically typed languages it is not easy and sometimes impossible. That is why framework developers should always “type” and never “class” variables, but as you know, interfaces are discovered almost at the end of the development cycle, when all your code is almost done! Would you change all your code to use interfaces instead of classes? for sure you will not or your manager will not give you the time to do it... feel again like in an endless loop? :-). So, faking the database connection in a statically typed language means a lot of work.
What about faking the database connection when using a dynamically typed language? Remember that we needed a polymorphic object with the database connection, but also a polymorphic object to the messages a Customer sends to it, and even more, only to the messages a Customer sends to it during the evaluation of the test! So, we do not need to implement all the messages DatabaseConnection defines but only the ones a Customer send to it... and there is even something more interesting, there is no variable type change we have to do in Customer because there are no explicit types in dynamically typed languages! Looks like faking the database connection in a dynamically typed language is easier that in a statically typed language, don’t you think?. No wonder why there are not so many “mock” frameworks on dynamically typed languages, because they are not really needed. Even more, in languages like Smalltalk, the same test implements the fake responsibility, maybe not nice but handy.
The reason behind all this difference is related to coupling again. Designs in statically typed languages are more coupled that in dynamically typed languages because variables in the former are coupled with the set of messages the type defines, while in the later objects are only coupled with the set of messages they send each other. Going to the extreme, we could say that in dynamically typed languages each message is a type. And of course, things get worse if in statically typed languages variables are not “typed” but “classed”. So, do you remember that coupling was against change and reuse? Then, it is a not difficult to see that statically typed languages makes change and sometimes reuse more difficult that dynamically typed languages.
Does that means that dynamically typed languages are “better” than static one? Of course not! As we said earlier, better or worse are relative terms and not only on fixed traits but also traits that may diverge in time and space. But if you need to do a system quickly and easy to change, you may consider to use a dynamically typed language, and if you don’t care about those requirements maybe you can stay on a statically typed language.
I hope that with this post I contributed to rationalize a little bit more the sometimes dogmatic discussion about which type system is better, if dynamically or statically typed ones and also to consider the importance to low the coupling and how to break it. I hope that with this post I contributed to rationalize a little bit more the sometimes dogmatic discussion about which type system is better, if dynamically or statically typed ones and also to consider the importance to low the coupling and how to break it.