
Thursday, February 18, 2010

[Tech] GIT:Mercurial = Assembler:Java

I have been using Mercurial pretty regularly for about half a year now, and recently I have also been (forced) to use Git. And I must say that I am not pleased with the Git experience at all. An initial statement first, though: I am not arguing about features here; there is no doubt that Git is an extremely powerful and reliable source-code management system. But the user experience is, in my opinion, questionable to say the least.

The first time I got into contact with distributed SCM systems was through Mercurial. There are some new concepts that have to be understood (coming from Subversion), but generally speaking it is pretty easy to start with. Simple things are simple, complex things are mostly reasonable to understand. The basic set of commands and switches is kept small and is hence easy to learn and understand. One is not flooded with commands and options in the beginning; specific functions (e.g. patch queues, rebasing, ...) can be switched on later by enabling the corresponding extension. About 35 extensions are part of the Mercurial distribution and can be enabled by adding one line to the config file; other extensions can be installed when needed.
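For illustration, enabling two of the bundled extensions takes nothing more than a section like this in the configuration file (a minimal sketch of a ~/.hgrc; graphlog and mq are extensions shipped with Mercurial):

[extensions]
# enables "hg glog", a graph view of the history on the command line
graphlog =
# enables Mercurial Queues (patch management)
mq =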

In my opinion, this is a very clever way to hide unnecessary complexity in the beginning and to provide the new user with a clean and simple set of commands. Later on, enabling specific extensions allows a "fine-tuned" feature set. This is accompanied by concise but pretty good help documentation. One example: "hg help log" explains the log command in about one screen page.

The first encounter with Git, on the other hand, is in my opinion rather terrible. Try "git help log" and you get around 20 (sic!) screens of Unix-style documentation (and this is not meant as a compliment) for the log command. There is a lot of documentation on the Git homepage, though. But there are a lot of other "goodies" in the man pages as well. One more example?

Well: "hg glog" shows me a representation of the history of my repository including branches. "git log" does only show the current branch. OK, so there should be an option to control that. Well, first of all: good luck with 20 man pages... Then I searched for "/branch" in the man-page and found the "--branches" switch (which apparently does the job). Yet what do I find as explanation of this switch?
--branches: Pretend as if all the refs in $GIT_DIR/refs/heads are listed on the command line as
I am a very peaceful character, but honestly, when I read such a statement in a piece of documentation, I feel an urgent need rising to beat the developer who wrote that line.

The Git community book, which is (according to the website) "meant to help you to learn how to use Git as quickly and easily as possible", starts in the first chapter with a detailed explanation of the internal Git data model instead of explaining the fundamental principles of DSCM and Git. WTF? To be fair, there is a set of other documentation artefacts on the website that appear to be significantly better suited for beginners.

The main issue, from my point of view, all things considered, is the lack of encapsulation/layering of functionality. What is done very well in Mercurial (e.g. with extensions) is done very badly in Git. My feeling with Git is that there is at least one level of abstraction too few in the design. The user experience reminds me of the weaker student projects I have seen over the years: in many of these projects there is no need to take a look at, e.g., the ER diagram or database schema; a glance at the GUI is sufficient. Each database table is represented in the GUI, probably separated by "tabs". "Great" design: all internals laid out to the user, who is usually not interested in technical details, but in solving a specific (higher-level) problem.

I get a similar feeling with Git: no doubt it is technically on a very (!) different level than the mentioned student projects, but the usability feels pretty much the same. With nearly every interaction I do not have the impression of solving my SCM problem, but of wrestling with technical details I am not actually interested in. It feels like cars in the 1920s (or Vespas in the 90s): you spend more time under the hood fixing some crap than driving to your destination.

I really hope that the Git team is going to improve the user interface in the future: allowing low-level ("assembler") commands for specific purposes and experts, and a significantly better abstracted set of commands for day-to-day operations. Consequently, at the moment, I would definitely recommend Mercurial over Git, simply because of the much better user experience and the layering/abstraction of functionality.

p.s.: Five minutes after I wrote this article, I noticed that Martin Fowler published an article about version control systems just yesterday.

Wednesday, December 30, 2009

[Tech] Distributed SCM: Playing with Repos

As some may have noticed, I migrated nearly all my projects from Subversion to Mercurial (and Git) over the last year. Step by step, as I am rather conservative about changing to new technologies, particularly when they are at the heart of the project. And changing the SCM is sort of open-heart surgery.

However, after nearly a year of experience I must say: SCM was (for me) never easier and more enjoyable than with distributed SCMs, particularly with Mercurial. Excellent documentation, easy and straightforward to use. Yet these days I was asking myself: if I had to name one outstanding feature that would convince me to change from a centralised system like Subversion to a DSCM, what would it be?

The answer might be surprising, but for me it is clearly this: no more headache and fear when working with the repository. What do I mean by that? Well: I was never a Subversion guru, and every time I needed to do an operation I did not perform very often (branching, merging) I was sweating. Should I press the button, am I making a mistake? What exactly do these options mean in the SVN client? Did Eclipse now mess up the local copy? Should I commit? After all, you are always working with the repository. If you mess up, you have a problem, and so do all your team members. Not a nice procedure.

But with a DSCM there is no master repository; hence, in case of doubt, I make a clone and play around with the clone. If I mess it up, I delete the clone and nothing has happened. If everything is fine, I push the results. This is, for me personally, the most essential feature of systems like Mercurial: I can play around even with esoteric plugins and features without fear of destroying anything. This also makes learning much easier for new users.
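As a sketch, such a throw-away session could look like this (repository and directory names are made up):

hg clone project project-experiment    # cheap local clone
cd project-experiment
# ... play around: commit, merge, enable exotic extensions ...
hg push                                # only if the experiment worked out
cd .. && rm -rf project-experiment     # otherwise simply throw the clone away

Since the clone carries the full history, nothing done inside it can harm the original repository until an explicit push.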

What is your opinion?

Sunday, December 27, 2009

[Tech] Simple Java Template Engine

Template engines are widely used in Web Frameworks, such as Struts, JSF, and many other technologies. Apart from classical Web Frameworks, template engines can also be very useful in integration projects. In a current integration project that deals with a lot of XML data exchange, I discovered the Java template engine library FreeMarker. This Open Source library is a generic template engine that generates any output, such as HTML, XML, or any other user-defined format, based on a given template.
"[...]FreeMarker is designed to be practical for the generation of HTML Web pages, particularly by servlet-based applications following the MVC (Model View Controller) pattern. The idea behind using the MVC pattern for dynamic Web pages is that you separate the designers (HTML authors) from the programmers. Everybody works on what they are good at. Designers can change the appearance of a page without programmers having to change or recompile code, because the application logic (Java programs) and page design (FreeMarker templates) are separated. Templates do not become polluted with complex program fragments. This separation is useful even for projects where the programmer and the HTML page author is the same person, since it helps to keep the application clear and easily maintainable[...]"
HTML, however, is only one application area of FreeMarker. Consider 3rd-party systems providing APIs that consume XML data or their own data structures. Constructing their data format directly in the code is a grubby approach, and the code becomes unmaintainable. Using such a library, you can manage your data-exchange templates outside your code and produce the final data with the template engine. I see such template engines as classical transformers, as in an Enterprise Service Bus.

Templates contain placeholders, which are replaced by the real data when the transformation takes place. FreeMarker additionally provides advanced constructs such as if statements, loops, and other features that can be used in your template files.
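To make this concrete, here is a minimal sketch (the template name, placeholder names, and data-model keys are made up for this example). A template file order.ftl might contain:

<order>
  <customer>${customer}</customer>
  <#list items as item>
  <item>${item}</item>
  </#list>
</order>

and the Java side fills a data model and processes the template:

import freemarker.template.Configuration;
import freemarker.template.Template;
import java.io.File;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OrderXmlGenerator {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration();
        cfg.setDirectoryForTemplateLoading(new File("templates"));

        Map<String, Object> model = new HashMap<String, Object>();
        model.put("customer", "Joe Walnes");
        model.put("items", Arrays.asList("Book", "Pen"));

        Template template = cfg.getTemplate("order.ftl");
        StringWriter out = new StringWriter();
        template.process(model, out);   // replaces the placeholders with real data
        System.out.println(out.toString());
    }
}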

Template engines are best known from Web Frameworks, but they are also very useful whenever you must produce specific output for other systems.

Sunday, November 08, 2009

[Misc] Subversion turns into an Apache Project: so what?

It has been official for a few days now: the Subversion project has applied to become an Apache project, and it seems that the incubation phase will start soon. Now my question: Subversion is conceptually dead, so what difference does that make? OK, let's discuss this in a little more detail:

The thing is: most developers (myself included) understand the concepts of DSCM systems by now, and all available projects are stable, fast, have good communities, and are reasonably documented. Even tool support (IDEs, ...) is decent. Having understood DSCM, I wonder why I would want to go back to a centralised system like SVN. There is no benefit in it for me. If I want to work server-based, so be it: I can do this with Mercurial on Bitbucket, with Git on GitHub, or by simply installing the DSCM on an arbitrary server with SSH access.

I had a discussion with an Apache committer recently about this and about the future of Subversion. He believed that Subversion could (or rather should) go through a complete redesign to embrace the features provided by distributed source code management (DSCM) systems like Bazaar, Mercurial, or Git. I personally question this future of Subversion. We already have three pretty good systems, and a very competitive game has been played here for the last two to three years. Subversion would start with a delay of probably three years. By the time a stable version of a (partly) distributed SVN is out, all other systems will be settled and far ahead.

There is, however, one major feature DSCM systems cannot provide by definition: collaboration via locking. This is an important feature for collaboration on (large) binary files like Photoshop documents, multimedia files, vector graphics, and the like. Merging such documents is practically impossible, yet many (software) projects partly rely on a significant number of documents of that sort, and keeping those in a DSCM system is not the best idea. My future scenario for distributed (software) development hence consists of two repository types: a distributed one for sharing text-based files (like source code), and a centralised one that provides versioning and very good locking (check-in/check-out) support for (large) binary documents.

Now, as I understand it, Subversion is neither really good at locking nor at managing binary data either. Tool support for non-programmers (who often work with such binary documents) is not so great either. So what is the future of Subversion? I believe that already today it is pretty much legacy, like CVS. Of course, there are so many projects using Subversion that we will probably have to deal with it for the next decade (not that I would like to). However, the migration wave has started, and most new projects will use one of the mentioned DSCM systems. How could a rewrite of Subversion help? Well, the principles are so different that a Subversion with distributed characteristics would either be a new project (and I doubt that we need a fourth DSCM system, as mentioned above) or would keep many of the disadvantages of the old system.

But maybe Subversion could focus on the locking-based approach: this is a much-needed feature for many projects, and I do not see much competition (in the Open Source environment) here either. A good repository for binary data could be a reason to stick with Subversion for parts of the development effort.

Your ideas?

Tuesday, September 08, 2009

[Conf] Zurich Open Source Jam

On August 13th, more than 50 people interested in open source software attended the 8th Google Open Source Jam in Zurich, an informal (bar-camp-like) meet-up at Google's Zurich office (also held in other parts of the world) and a perfect opportunity to meet other open source developers as well as Google engineers in a relaxed atmosphere. As it is open to everyone, people held several lightning talks on a great variety of topics:
  • "G-WAN", Pierre Gauthier
  • "Dynamics of Open Source code", Markus Geipel
  • "Involving students in Open Source", Lukas Lang
  • "Open Source in Africa", Michel Pauli
  • "BTstack", Matthias Ringwald
  • "Free Software & basic income", Thomas Koch
  • "NxOS, an OS platform for Lego", David Anderson
  • "Open Source in the Humanities", Tara Andrews
My talk was related to the open source student projects accomplished within the scope of the course "Advanced Software Engineering", held at QSE. Four projects were completed successfully in the last two years and were integrated into the respective codebases.
Similar to the Summer of Code, these students were mentored by experienced open source committers from the Apache Software Foundation and Codehaus. Developers and students who participate in open source projects themselves commented a lot on this topic: "I wish I had had something similar when I was studying," said a Google engineer.

Afterwards, we continued with interesting discussions, and after some time I found myself in an exciting conversation about software engineering at Google. First off, I'd like to mention that employees never make clear statements concerning their work, as they are bound to confidentiality. Even though no specific software development process was confirmed, one could identify tendencies:

Don't repeat yourself (DRY). Code and software reuse as a basic principle. The Google Code repository was created as a collaborative platform to manage, document and review free/libre open source software (FLOSS) projects. Indeed, employees spend up to 20% of their time contributing to open source projects.

Don't reinvent the wheel. "At Google we don't reinvent the wheel, we vaporize our own rubber", one of the engineers told me (they use heaps of metaphors like this), meaning that the vast majority of the software in production use is built partly or completely on top of open source libraries. Aside from releasing software like the Web Toolkit, Android, Chromium, etc. back into open source, Google contributes to a diversity of FLOSS projects (e.g. the Linux kernel, Apache projects, MySQL, Mozilla Firefox) [1]. However, they keep the implementations of key technologies secret, claiming for instance that their webserver, apparently a Tomcat rewrite, was "too specific to benefit from", or they simply do not publish them for competitive reasons [1]. The same goes for the Google File System (GFS), BigTable, and MapReduce. In a nutshell, the scientific publishing [2] of these core technologies at least led to great open source implementations (e.g. Apache Hadoop) which are open to everyone.

[1] A look inside Google's open source kitchen, http://www.builderau.com.au/strategy/architecture/soa/A-look-inside-Google-s-open-source-kitchen/0,339028264,339272690,00.htm
[2] Google Publications, http://research.google.com/pubs/papers.html

Thursday, September 03, 2009

[Process] Distributed Source Code Management and Branching

I have been using Mercurial a lot recently (and love it); I really do wonder why I struggled so long with Subversion. When I first heard the Git presentation by Linus Torvalds (which is, hm, very entertaining), the whole distributed SCM thing sounded very esoteric to me. However, I decided to give it a try, also motivated by the great Chaosradio Express 130 podcast (German). Yet I decided to go with Mercurial and not Git, although this created some flame-wars within our group, because one of my colleagues is a big Git fan. So be it ;-)

For me, Mercurial is a great, easy-to-install, and pretty easy to understand system. The command line is really straightforward and the help texts are well written (and, interestingly, internationalised). Maybe I will follow up with a more detailed blog post on Mercurial another time.

As easy branching and merging are among the main advantages of the new distributed SCMs, I want to recommend, for now, the very nice blog post by Steve Losh, "A Guide to Branching in Mercurial". This article provides a good and comprehensive introduction to the different methods for creating branches with Mercurial and also explains the differences to Git and some of its shortcomings (*g*).

p.s.: For my taste, just one thing is missing: some details on merging.
p.p.s.: Please no comments on my "Git shortcomings" statement; they will be censored out anyway ;-)

Thursday, August 20, 2009

[Arch] UML Tools for Mac OS X

Following up on a question I received via Twitter, and given that a significant part of the developer community is using Macs, I thought this might be a good opportunity to discuss some "UML options" for the Mac. Now, this article is not meant as a definitive answer; I rather hope for some follow-ups by readers in the comments.

OK, let's start: first there is the heavyweight stuff, most notably Visual Paradigm. A warning: this is a fat tool. However, among the fat tools it is the one I liked most. I am not using it any more, but it is generally rather easy to use and very feature-rich. However, it is a pretty expensive commercial tool. Yes, they have a "community edition", because it is cool to have a community edition these days. But this one was (when I used it last year) rather a joke. See it as a test preview.

There are also other commercial tools, e.g. Omondo. I do not know much about this one, though. Anyone?

On the other end of the spectrum are tools like UMLet (or Violet), which are also Java-based and work more or less well on the Mac. These tools are very basic and one should not expect much. They are definitely not suited for "real" projects or commercial applications, but can be a nice option, e.g., for educational purposes. Sometimes one just needs to create some simple UML diagrams for a presentation, paper, or book. For such purposes these tools might be useful. Plus, both are Open Source tools.

Probably the best free (but not Open Source) UML tool, and the one I would recommend, is BOUML, which is surely worth a try. The main issue I have with nearly all free/OS UML tools is that they are often driven by a single person or just very few developers. Hence the future of the particular tool is always a little unclear. To make things worse, there is no accepted open file format for UML diagrams that would allow switching tools easily. Hence selecting a UML tool is always sort of a lock-in situation.

ArgoUML could also be a consideration; it is an Open Source tool as well, and maybe the oldest one around. It has some issues, as all OS tools do, but apparently has a functioning community.

Finally, there are some more or less general-purpose drawing programs that can be used (with some limitations) for technical diagrams like EER or UML models, such as OmniGraffle or ConceptDraw; OpenOffice Draw can also be used for general vector-oriented diagrams.

Would be happy about comments, experiences and further suggestions!

Tuesday, April 21, 2009

[Tech] What about Maven 3

At the last Maven Meetup, Jason van Zyl talked about the future of Maven and important milestones of Maven 3, including:
  • Support for incremental builds
  • Changes to the Plugin API
  • Better multi-language support
  • and more
The video and slides of the presentation are available here.

[Misc] Sun and Oracle

Now it has finally happened: Oracle bought Sun for 7.4 billion dollars. It sure is a little bit surprising, as the deal with IBM seemed to be settled already. From a developer's point of view, the Oracle deal might be better for the community, although it also carries certain risks.

For IBM, Java is strategically very important, so Java would have been "safe" with IBM. Additionally, IBM has developed (similar to Sun) a solid Open Source strategy over the last decade, which would also have fitted Sun. However, a significant number of their product lines would have overlapped: both have middleware products, like WebSphere and the Sun Glassfish project portfolio. Both have a series of database products: MySQL on Sun's side and, of course, the DB2 line on IBM's side; and it is a similar story on the OS front: the probably superior Solaris versus IBM's AIX. Finally, Sun has the NetBeans IDE as its central development tool, whereas IBM has Eclipse. I doubt that IBM would have had much interest in duplicating all these product lines. Not to mention the Sun hardware.

Now, on paper, Oracle looks much more "compatible" with Sun. True, there are some overlaps in the middleware section. Most "afraid" might be the MySQL folks, as Oracle has shown some hostility towards MySQL in the past. Then again, once they own the product, they can probably sell it in their database portfolio for the "low-end" market. Java is also important for Oracle, and probably even more important are the Solaris operating system and the Sun hardware, with their tight integration with, e.g., the Oracle database. With these assets Oracle can offer "end-to-end" solutions spanning hardware, operating system, storage, database, middleware, web frameworks, and integrated development environments.

What worries me a little bit about Oracle is its lack of experience with the Open Source community. Oracle is, in my opinion, a rather closed shop compared to IBM and Sun. Maybe Oracle can learn a little bit from Sun's experience here. However, my conclusion is that there is significant potential in the combination of Sun and Oracle (probably more than with Sun/IBM), but also some significant risks in terms of openness and for certain parts of the Sun product line. I am particularly curious about the consequences for the Open Source middleware portfolio, Java, and MySQL.

Update: Larry Dignan from the ZDNet blog writes about MySQL:
"Oracle gets to kill MySQL. There’s no way Ellison will let that open source database mess with the margins of his database. MySQL at best will wither from neglect. In any case, MySQL is MyToast."
Well, I would not bet on that (but I probably would not start a new project with MySQL either...), but it is for sure an option.

Tuesday, April 14, 2009

[Arch] Maven Best Practices

On this Sonatype blog there is a useful list of best practices with Maven, including:

  • Why putting repositories in your pom is a bad idea
  • Best Practices for releasing with 3rd party snapshot dependencies
  • Maven Continuous Integration Best Practices
  • How to detect if you have a snapshot version
  • Optimal Maven Plugin configuration
  • Adding additional source folders to your maven build
  • Misused maven terms defined
  • How to override a plugins dependency
  • How to share resources across projects
  • How plugin versions are determined
Before you search the mailing lists, have a look at this list; it will help you.

Saturday, April 11, 2009

[Misc] Open Protocol vs. Twitter: 1:0 ?

In a current ZDNet blog posting, Sam Diaz analyses the technical issues Twitter is having (again). Twitter has been growing dramatically over the last months, and apparently the Twitter backbone is increasingly in trouble. The same thing happened about a year ago.

Sam Diaz's analysis is of course correct, but in my opinion he still completely misses the point by discussing the technical reasons why Twitter might or might not catch up with the growing demand for the service. The actual point is that the communication concept of Twitter appeals to many people, which is good; but in the history of the Internet it was never a good idea to rely on a proprietary protocol for any important communication channel.

So the real question is a much more generic one and actually should be: how can we get rid of Twitter as fast as possible and replace it with an open protocol and a scalable, distributed architecture, comparable to email, XMPP chat, and the like? There are good reasons why proprietary protocols largely failed in global communication systems like the Internet; those that are still around are a continuous pain in the ...

I confess I am using Twitter as well, but it is of course a lock-in situation. If you want to follow the interesting stuff, you currently have to use Twitter. However, right now we still have time to replace Twitter with something like Laconica (identi.ca) or anything similar down the road. Even better, Twitter might open up its system and try scaling it that way. In any case, now is the time to act: Twitter is still a toy, but it is on the way to becoming a serious communication system we might depend upon in a few years. And I believe no one wants to depend on a communication system that is proprietary and unreliable at the same time.

Thursday, April 09, 2009

[Tech] Mavenizing AppEngine!

As I nagged yesterday about the fact that AppEngine has no proper Maven build system, the guys at Sonatype reacted already today ;-)

They describe preliminary attempts at "Mavenizing" AppEngine projects; I hope they will be able to fix the last issues as well!

Wednesday, April 08, 2009

[Tech] Google AppEngine (and Java)

AppEngine is a rather recent service from Google. It is probably Google's answer to Amazon's cloud-computing platform, yet it targets a very different market. Where Amazon offers a broad range of services and high flexibility (with the disadvantage of higher administration effort), Google targets web developers who want to publish web applications. AppEngine started with a Python environment; since a few days ago, the long-anticipated Java version (Java 5 and 6) has been online. Now, what are the benefits of using AppEngine?

Java

First of all, it is possible to deploy applications without having to install, administer, and maintain one's own server (instance). Google provides a runtime environment (sandbox) into which Python or Java applications can be deployed. Access to these applications is (for clients) only possible via HTTP(S), so this is a feasible approach for web applications or RESTful services.

An additional advantage is that Google deals with scaling issues, i.e., it scales the applications dynamically with demand. This is a significant advantage for startups that have no clear idea about the number of customers they are going to have or how fast this number will grow. For the scaling to work, though, some restrictions have to be considered. Most notably this concerns the persistence strategy: applications (and libraries!) are allowed to read files from the filesystem, but are not allowed to write; for all persistence issues, the Google datastore has to be used. However, what is nice about the new Java sandbox is the fact that Google apparently tries to follow established standards: for persistence, Java developers can use JDO or JPA, or a low-level interface to the distributed datastore.
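For illustration, a JDO entity for the datastore might look roughly like this (a sketch along the lines of the AppEngine examples; the class and field names are made up):

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import com.google.appengine.api.datastore.Key;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Greeting {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;          // key generated by the datastore

    @Persistent
    private String content;   // an ordinary persistent property
}

Persisting an instance then works through a standard javax.jdo.PersistenceManager, e.g. pm.makePersistent(greeting).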

I wonder, however, how logging can be handled in that environment. Logging is usually done to a file or to a JDBC datasource. I have not seen a JDO logging target before; any ideas?

Generally speaking, arbitrary Java libraries can be deployed and used in the AppEngine as long as they do not violate the AppEngine sandbox. Due to the scaling approach, not all libraries/frameworks will run unchanged. It does not yet seem quite clear, for example, which Java web frameworks will run seamlessly in the AppEngine. Google's Web Toolkit (GWT) should work; other framework communities are currently testing their frameworks for compatibility, e.g., discussions are running on the mailing lists of Apache Tapestry and the JSF framework Apache MyFaces.

Build Automation and Development Process

The development process is, from my point of view, a mixed blessing, as with other Google environments like GWT. Everything is Eclipse-centered, which is not really a good thing: Google provides an Eclipse plugin for the AppEngine, including a runtime environment for testing applications offline. This is great for daily development activity, but not for a stable build and testing environment. Unfortunately, Maven support (like archetypes) is completely missing at the moment. Google is apparently pretty hostile towards Maven and focuses mostly on IDE integration, which is definitely not a sound basis for modern build automation. IDE "wizard-based" SE approaches usually turn out to be unstable and problematic, particularly in team projects. This might be nice for a fast hack, but it is no basis for a larger project. It seems that there is some support for Apache Ant, though.

Hopefully other developers will provide a Maven integration for the Java AppEngine. With the current approach, not even an IDE-less deployment is possible.

Conclusion

So, despite the build issues, I believe that the AppEngine is a great option for deploying web applications in Java or Python. For small applications (small in the sense of "low web traffic"), the AppEngine is free; after exceeding certain thresholds (CPU, storage, bandwidth, ...), one pays according to the resources needed. Google provides a web interface to set daily financial limits for individual resources, e.g., one might want to spend a maximum of $5 a day on CPU time, and so on.

I am looking forward to the first experience reports, particularly with web frameworks like Wicket, Tapestry, or Cocoon.

Wednesday, March 25, 2009

[Tech] HSQLDB Version 1.9 alpha is out

Finally, HSQLDB 1.9 (alpha, though) has been released. This release had been announced for, I believe, nearly a year, and it already seemed to me that HSQLDB was a rather dead project. I am glad they made the next round, because in a way I still like that system a lot. Sure, Apache Derby is most likely the superior system, and H2 looks very promising too (but is still, as I understand it, a "one-man show" without a community); however, HSQLDB has some tiny details that make it very nice. First, it always had a really tiny footprint and was extremely easy to understand and use.

And I particularly liked the ability to fine-tune the memory management, i.e., whether the data should be stored on disk or purely in memory, and this on a per-table basis. Plus, with one simple command it is possible to write the whole database as SQL statements into a file, from which it is also loaded again. This feature is missing in Derby, for example, and it often turned out handy during the development phase.
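For illustration, both features are plain SQL and can be used over JDBC, roughly like this (a sketch; the table and file names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        Connection con = DriverManager.getConnection("jdbc:hsqldb:file:testdb", "sa", "");
        Statement st = con.createStatement();

        // CACHED tables are backed by disk, MEMORY tables live purely in memory
        st.execute("CREATE CACHED TABLE orders (id INT PRIMARY KEY, item VARCHAR(100))");
        st.execute("CREATE MEMORY TABLE countries (code CHAR(2) PRIMARY KEY)");

        // dump the whole database as SQL statements into a file
        st.execute("SCRIPT 'backup.sql'");

        con.close();
    }
}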

For version 1.9 they seem to have rewritten significant parts of the software and added an impressive list of new features. What I still have to figure out is whether they have finally implemented proper transaction isolation. In my opinion this was (besides the single-threaded kernel) the biggest issue in the previous versions, where dirty reads could not be avoided. I am a little bit confused by the announcement(s), though: they wrote that they have rewritten the core; however, in a forum posting the developers announced that transaction isolation is not handled in the new release 1.9 but is planned for 2.0. The news announcements on SourceForge are a little confusing to me. Does anyone have a better idea about this issue?

Anyway, good luck with the stabilisation phase of the new release!

Friday, January 23, 2009

[Tech] An easy to use XML Serializer

XML processing is an important part of present-day software systems, especially when communicating with other software components in the IT infrastructure. Pretty often you must provide your object data as XML. The Open Source market provides a wide range of XML tools, above all XML mapping tools like Castor, JAXB, and others. A very interesting and compact tool, around since 2004, is XStream, hosted at Codehaus. XStream is a proven XML serializing library and provides the following key features:
  • Easy to use API (see example)
  • No explicit mapping files are needed, unlike with other serializing tools
  • Good performance
  • Full object graph support
  • You can modify the XML output
Let us consider a simple business object, Person, implemented as a POJO (taken from the XStream homepage):

public class Person {
    private String firstname;
    private String lastname;
    private PhoneNumber phone;
    private PhoneNumber fax;
    // ... constructors and methods
}

public class PhoneNumber {
    private int code;
    private String number;
    // ... constructors and methods
}
In order to get an XML representation of the Person object, we simply use the XStream API. We also set alias names, which are used in the output XML:
XStream xstream = new XStream();   // com.thoughtworks.xstream.XStream
xstream.alias("person", Person.class);
xstream.alias("phonenumber", PhoneNumber.class);
String resultXml = xstream.toXML(myPerson);
When we create a new instance of the Person object and serialize it via XStream (toXML), we get the following XML result. As we can see, our alias names are used:

<person>
  <firstname>Joe</firstname>
  <lastname>Walnes</lastname>
  <phone>
    <code>123</code>
    <number>1234-456</number>
  </phone>
  <fax>
    <code>123</code>
    <number>9999-999</number>
  </fax>
</person>

The example illustrates that the framework is very compact and easy to use. Have a look at the two-minute tutorial on the XStream homepage for more examples. You can also implement custom converters and transformation strategies to adapt XStream to your requirements.
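As a sketch of such a custom converter (the Converter interface is XStream's; the single-string phone format and the PhoneNumber accessors used here are made up), one could render phone numbers as one string instead of nested elements:

import com.thoughtworks.xstream.converters.Converter;
import com.thoughtworks.xstream.converters.MarshallingContext;
import com.thoughtworks.xstream.converters.UnmarshallingContext;
import com.thoughtworks.xstream.io.HierarchicalStreamReader;
import com.thoughtworks.xstream.io.HierarchicalStreamWriter;

public class PhoneNumberConverter implements Converter {
    public boolean canConvert(Class type) {
        return PhoneNumber.class.equals(type);
    }

    // writes e.g. <phone>123/1234-456</phone> instead of nested elements
    public void marshal(Object value, HierarchicalStreamWriter writer,
                        MarshallingContext context) {
        PhoneNumber phone = (PhoneNumber) value;
        writer.setValue(phone.getCode() + "/" + phone.getNumber());
    }

    public Object unmarshal(HierarchicalStreamReader reader,
                            UnmarshallingContext context) {
        String[] parts = reader.getValue().split("/");
        return new PhoneNumber(Integer.parseInt(parts[0]), parts[1]);
    }
}

The converter is then registered with xstream.registerConverter(new PhoneNumberConverter());.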

Have fun with XStream.

Tuesday, January 20, 2009

[Pub] Data transformation in an SOA

I published a German article on JAXCenter about data transformation in Service-Oriented Architectures. When different applications talk to each other, you must find a suitable data format that all applications can interpret. In most cases XML is the first choice, because there is a wide range of tool support and additional standards, like schema editors, XPath, and the like.

In this article I give an overview of the Open Source framework Smooks, which can be used for data transformation in SOAs. Smooks provides some interesting features:

  • Data transformation (XML, CSV, EDI, Java, JSON, ...) and custom transformers
  • Java binding from any data source (CSV, EDI, XML, ...)
  • Processing of huge messages by providing concepts to split, transform, or route message fragments; fragments can also be routed to different destinations, like JMS, files, or databases
  • Message enrichment
The features mentioned above are also classical areas where an Enterprise Service Bus can help, and existing Open Source ESBs like Mule or JBoss ESB can profit from a technology like Smooks. In the last part of the article I describe the Smooks extension for Mule, which provides:
  • A Smooks transformer for Mule; the transformation logic is done in Smooks
  • A Smooks router for Mule; the routing logic can be configured in Smooks
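For readers who want to try it standalone, a minimal Smooks transformation might look roughly like this (a sketch assuming the org.milyn API of Smooks 1.x; the configuration and file names are made up):

import org.milyn.Smooks;
import org.milyn.container.ExecutionContext;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import java.io.FileReader;
import java.io.StringWriter;

public class SmooksDemo {
    public static void main(String[] args) throws Exception {
        Smooks smooks = new Smooks("smooks-config.xml");   // the transformation rules
        ExecutionContext executionContext = smooks.createExecutionContext();

        StringWriter result = new StringWriter();
        // apply the configured transformation to the input message
        smooks.filterSource(executionContext,
                new StreamSource(new FileReader("input.xml")),
                new StreamResult(result));
        System.out.println(result.toString());
    }
}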

Saturday, December 20, 2008

[Pub] Mule IDE

I published an article about the new Mule IDE in the current issue of the Eclipse Magazin. In the article I give an overview of Mule and show how the IDE supports developers in modelling their Mule applications. The IDE provides the following features:
  • Mule project wizard
  • Mule runtime configuration (you can define different Mule runtimes)
  • Graphical Mule Configuration Editor
  • Start your Mule Server from your IDE
More information about the Mule IDE can be found on the Mule IDE homepage.

Tuesday, December 09, 2008

[Misc] Glassfish

I recently informed myself about the (Sun) Glassfish J2EE server. I never took it for a serious competitor in the field, as I had the impression it was just a reference implementation that happened to come from Sun... However, I had to change my opinion. In recent years the Glassfish community seems to have worked hard on their baby, and currently it looks like a solid competitor in the field.

The Glassfish universe contains "not only" a J2EE server, but actually a whole set of enterprise tools, such as a message broker, a clustering framework, an enterprise service bus (JBI-compatible), a library for implementing SIP applications, and the like. Additionally, it is well supported by the NetBeans IDE. The recent (preview) version contains a J2EE runtime that additionally supports scripting languages like Ruby and Groovy and is based on the OSGi framework.

What I additionally like is the fact that Glassfish comes with a decent installation tool, provides a solid web-based administration interface, and seems to be reasonably well documented. And, of course, the whole thing is Open Source.

I must say, I am quite impressed so far. Any comments on that one?

Friday, December 05, 2008

[Misc] Mule Developer Blog

There is now a new blog focusing exclusively on Mule, providing technical tips, comments, and breaking news around the Mule product line (ESB, Mule Galaxy, ...). The bloggers are all developers from MuleSource and members of the Mule Services team, which means you get the information first-hand. There are already posts on, e.g., how to write custom transformers in Mule and an introduction to expression transformers. Another interesting post focuses on performance tuning in Mule.

The idea behind this blog is to give the Mule community as much information as possible. This is the right way to go, because there are some issues (e.g. performance issues) where you would otherwise end up searching the mailing list for hours to find the right answer. Some posts emerge from discussion threads in the user mailing list.

Wednesday, November 12, 2008

[Arch] RESTful applications with NetKernel

The architectural style REST has gained some popularity and is often brought up against SOAP for interoperable web services. REST stands for Representational State Transfer and has some characteristics that distinguish it from other architectural styles:
  • Resources such as a person, an order, or a collection of the ten most recent stock quotes are identified by a (not necessarily unique) URL.
  • Requests for a resource return a representation of the resource (e.g., an HTML page describing the person) rather than an object that IS the resource. A resource representation represents the current state of the resource and as such is immutable.
  • Representations typically contain links to other resources, so that the application can be discovered interactively.
  • There is typically a fixed and rather limited set of actions that can be called upon resources to retrieve or manipulate them. HTTP is the best-known example of a RESTful system and defines, e.g., the GET, PUT, POST, and DELETE actions.
Applications based on REST are typically very extensible, provide good caching support, and can easily be mashed up into bigger applications.
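As a tiny illustration of this uniform interface, retrieving a representation in Java is nothing more than an HTTP call (a sketch with a made-up resource URL):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGetDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/persons/42");   // the resource
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");   // retrieve a representation, do not manipulate
        BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);  // the representation, e.g. an HTML page
        }
        in.close();
    }
}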

NetKernel

Using the RESTful application pattern in non-web-based applications is currently not very well supported by programming languages and frameworks. NetKernel is an open source framework designed to provide a simple-to-use environment for programming RESTful applications.

Its architecture is rather simple: programmers write modules and register them with the kernel. Each module registers its address space, which states which logical addresses (URIs) the module will handle and which Java class, script (Python, JavaScript, Groovy, ...), or static resource will act upon the request and return a resource representation. A module can also register rewrite rules that translate from one address to another.

Resources within NetKernel are accessed from the outside via Transports. Each module can have Transports that monitor for external system events (e.g., JMS events, HTTP requests, CRON events), translate these events into NetKernel requests, and place these requests into NetKernel's infrastructure, which routes each request to the appropriate resource.

NetKernel supports a wide range of scripting languages and uses resource-representation caching to speed things up transparently for the developer. The internal request-response dispatching is done asynchronously, so callers can easily state that they do not care for an answer after 10 seconds, are not interested in the response at all, or want to place several requests first and then wait for the responses to come back. REST is most often associated with HTTP; with NetKernel, one can apply the REST architectural style also to applications that do not use HTTP, as it is completely decoupled from the HTTP stack.

Compared to other REST frameworks such as Restlet, NetKernel is extremely well documented, and several large sample applications can be downloaded from the homepage to get started quickly.

Related Links
  • http://www.1060.org – the homepage of NetKernel.
  • A recent article on TheServerSide.com about resource-oriented computing with NetKernel that provides a more thorough introduction.
Benedikt Eckhard (edited by Alexander Schatten)