It almost seems as if you're telling the majority of the users of EMF at Eclipse and beyond that what they do isn't good practice. All the models we have in EMF, including Ecore, Change, GenModel, all the ones in the rest of the modeling project, such as UML, XSD, OCL, and so on, and also most of those in other Eclipse projects all rely on generating repeatedly and combining the benefits of that code generation capability with regular coding practices without an artificial split between the two. Generated artifacts are clearly marked as such, but they aren't split into separate files.
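For readers who haven't seen this style of marking, here is a rough sketch of how generated and hand-written members can share one file. The `@generated` Javadoc tag, and `@generated NOT` to protect a hand-edited member from being overwritten on regeneration, are the EMF convention; the `BookImpl` class and its members are invented purely for illustration.

```java
/** Hypothetical model class; only the Javadoc tag convention comes from EMF. */
class BookImpl {
    /**
     * @generated
     */
    protected String title;

    /**
     * Tag untouched, so the generator is free to rewrite this member.
     * @generated
     */
    public String getTitle() {
        return title;
    }

    /**
     * @generated
     */
    public void setTitle(String newTitle) {
        title = newTitle;
    }

    /**
     * Hand-modified: "NOT" asks the merge step to keep this body as written.
     * @generated NOT
     */
    public String getDisplayName() {
        return title == null ? "(untitled)" : title.trim();
    }
}
```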
To me it's very problematic if your CVS image doesn't contain a complete working version of your code but rather you need to invoke build tools before you even have a working image of the source. In the end, I simply can't agree with the premise that avoiding the loss of code due to regeneration overwriting it is the paramount issue to be solved and that all practices must revolve around that central tenet. It's very simple to ensure your code is committed to CVS before you regenerate and to review carefully the changes that regeneration makes to your code. Eclipse also has a history mechanism to recover anything you might delete for any reason.
I suspect that if you can argue that generated code can be thrown away you can argue it doesn't need to be generated at all. There are certainly a lot of cases where generating code is questionable. After all, if you have enough data to generate the right code, then you have enough data to emulate the behavior dynamically and thereby avoid the byte code bloat of generated code.
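To make "emulate the behavior dynamically" a bit more concrete, here is a minimal sketch in plain Java. It is not EMF's dynamic API; `DynamicType` and `DynamicObject` are hypothetical names, and the only point is that one generic class driven by a runtime description can stand in for many generated classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical runtime description of a type: a name plus its feature names. */
final class DynamicType {
    final String name;
    final Set<String> features;

    DynamicType(String name, Set<String> features) {
        this.name = name;
        this.features = features;
    }
}

/** A single generic class that behaves like whatever type it is given. */
final class DynamicObject {
    private final DynamicType type;
    private final Map<String, Object> values = new HashMap<>();

    DynamicObject(DynamicType type) {
        this.type = type;
    }

    Object get(String feature) {
        check(feature);
        return values.get(feature);
    }

    void set(String feature, Object value) {
        check(feature);
        values.put(feature, value);
    }

    /** Rejects features the type description does not declare. */
    private void check(String feature) {
        if (!type.features.contains(feature)) {
            throw new IllegalArgumentException(type.name + " has no feature " + feature);
        }
    }
}
```

The trade-off is the usual one: no per-type byte code, but also no compile-time checking of feature names.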
It's an interesting article though! I just don't agree with one aspect of it, and even then, simply don't agree with it as a sweeping statement to apply to all cases as opposed to something that is probably a best practice in a great many cases or perhaps even most...
Interesting discussion. I think we all agree that putting compiled classes into a VCS is a bad idea.
Why is it a bad idea? Because compilation is a fast and stable process. It's reproducible. Even if you update to a new version of your compiler, you would not expect it to produce different code. But what would you do if you updated your compiler (= code generator) regularly and the source language changed too? And a whole team of developers used your compiler on an hourly basis? I think your advice that DSLs have to be refactored shows the relevance of this scenario. The consequence is that you should put generated code in your VCS; otherwise it may decrease your development speed and make it harder to repair a broken code generator. And DSLs should increase productivity, right?
To me, separating generated artifacts via class inheritance has just a technical reason: it's the only possible solution with Java. Other languages like C# allow you to partially define your classes.
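As a rough illustration of the inheritance-based split mentioned above, the sketch below puts the generated part in an abstract base class and the hand-written part in a subclass. `CustomerBase`, `Customer`, and `greeting()` are hypothetical names, not taken from the article or from any particular generator.

```java
/** Would live in the generated source folder and be overwritten on every run. */
abstract class CustomerBase {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    /** Hook that the generator leaves open for hand-written code. */
    public abstract String greeting();
}

/** Hand-written subclass in the regular source folder; the generator never touches it. */
class Customer extends CustomerBase {
    @Override
    public String greeting() {
        return "Hello, " + getName();
    }
}
```

With C#'s `partial class`, both halves could instead be declared as the same type in two files, so the extra level of inheritance would not be needed.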
Well, as a coauthor I would like to add some points:
First of all, just because you don't check something into your VCS doesn't mean it is obsolete. It just means that it is redundant and can be recreated with little effort from the checked-in artifacts.
Second, in most cases we are developing the generators (e.g. the templates) in parallel with other code. As a consequence, it is very important that the generator always matches the version of the generated and the manually written code. Therefore, we do check in our generator (at least its configuration, let's say the templates). But overall consistency is most easily assured if the generated code is regenerated as soon as you update the generator. Otherwise you run the risk that some developer forgets to check in the generated code, etc. So which approach is more agile?
The third important thing is that most software manufacturers use a script-based build process for many good reasons. In Eclipse plug-in development this is very hard to set up unless you have Nick Boldt on your team :-) AFAIK, it is also still impossible to run the EMF code generator without Eclipse (there is an Ant task but it has to run in an Eclipse environment). As we have written in the article, this is a typical situation in which it is advisable to check in generated code. But if you have a generator that runs as naturally outside your IDE as inside it - and that is the analogy to a compiler - you should try to avoid the redundancy of checking in generated stuff.
Fourth, I have worked with some generative frameworks for a while now, stable and experimental ones, and I found it very useful to be able to throw away all generated stuff. It is analogous to a "clean build", which should never be necessary in a perfect world, but there are definitely situations in which it saves your day. Deleting the generated source folder is the easiest solution if a generator run messes up everything or leaves stale artifacts.
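A minimal sketch of such a "throw it all away" step, assuming the generated sources live in a folder of their own (for example a hypothetical `src-gen` directory). The class and method names are made up; only the `java.nio.file` calls are standard.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GeneratedSources {
    /** Deletes the folder and everything below it; does nothing if it is absent. */
    static void clean(Path genDir) throws IOException {
        if (!Files.exists(genDir)) {
            return;
        }
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(genDir)) {
            // Reverse path order lists children before their parent directories.
            deepestFirst = paths.sorted(Comparator.reverseOrder())
                                .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }
}
```

This is only safe when nothing hand-written is stored in that folder, which is exactly what the separation of generated and manual code is meant to guarantee.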
Jan, a key aspect of your statement is to highlight the fact that there are shades of gray in life's best practices. You also have to be careful with the agile arguments because that too is a two-edged sword. I.e., how do you ensure that your community, when they check out the code from CVS, immediately see a workable image?
Obviously if the generator is in CVS as well, the generator itself would need to be extracted, built, and then invoked. Of course with Eclipse 3.4's improved support for bootstrapping this is now far easier, but consider the fact that EMF's core and generator models are themselves generated and hence there is a chicken-and-egg problem.
If the code is going to be automatically generated, by a builder say, the fact that the build is producing changes in the generated code will be highlighted by the fact that there is a CVS delta, so it would be very easy to see what's changed. In fact, this approach also helps highlight whether the generator is producing unintentional changes and whether hand-written changes are being replaced by the generator (one of the problems mentioned as in need of a solution).
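In practice the CVS delta itself does this job, but the same idea can be sketched as a small build-time check: compare what is checked in against what the generator just produced and report the differences. The `GeneratedDrift` class is hypothetical, and file contents are modelled as plain strings keyed by path to keep the example self-contained.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class GeneratedDrift {
    /** Returns the paths whose content differs or that exist on only one side. */
    static Set<String> changed(Map<String, String> checkedIn,
                               Map<String, String> regenerated) {
        Set<String> allPaths = new TreeSet<>(checkedIn.keySet());
        allPaths.addAll(regenerated.keySet());

        Set<String> result = new TreeSet<>();
        for (String path : allPaths) {
            // A missing file shows up as null and therefore counts as a change.
            if (!Objects.equals(checkedIn.get(path), regenerated.get(path))) {
                result.add(path);
            }
        }
        return result;
    }
}
```

An empty result means regeneration was a no-op; anything else is a delta worth reviewing before committing.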
It is possible to run the EMF generator standalone, and there are command line utilities for invoking the generator, importer, and other utilities. But I don't see that as much of an issue one way or the other. All utilities are going to require libraries to run, so why is it significant which particular libraries are needed?
I think there continue to be good arguments for why the full image of the source code, generated or otherwise, ought to be maintained in CVS. That being said, there are also very good arguments against that, and you've made a few good ones in that regard. I would generally take the approach of pointing out the strengths and weaknesses of the different approaches so folks can choose for themselves the trade-offs that work well for them.