Tuesday, August 19, 2008

[Arch] Trends in Data-Management (aka Databases)

I find it interesting to observe that relational databases have been challenged several times over the last decades, e.g. by object-oriented databases (gone) and XML databases (gone). Recently a new trend in data management seems to be emerging: databases, or rather data storage and management mechanisms, that follow a much looser paradigm than relational databases and often use a lightweight (frequently REST- or JSON-based) access strategy. This demand for new data-management strategies seems to have several reasons; some that come to my mind:
  • Performance: in some cases complex queries are not required (or can be replaced by simple ones), which favours stores that are very fast at pure primary-key retrieval
  • Complex data structures are not needed
  • ACID is not needed, i.e. mostly simple writes are performed but fast reads are necessary
  • Agile development seems to favor rather ad-hoc data structures over carefully planned ones (whether this is a good trend is a different story)
  • Distribution is important, and distributed relational databases are hard to get right
  • Access to rather document-oriented data structures is required
and probably many more. Even older tools like Apache Lucene (actually designed as a full-text search engine) are used in several projects as a kind of database replacement. This works particularly well when reading data is more important than writing it and no particular ACID requirements are in place. On top of that, Lucene provides a nice and rich query language.

Recently Amazon's EC2 platform made a lot of waves as a distributed deployment platform for applications that have to scale significantly (there is, btw., an Open Source project named Eucalyptus that implements part of the interfaces). Part of the Amazon toolset are two storage mechanisms: S3 and SimpleDB. For both, APIs are available to be used from applications. S3 is a mechanism for storing rather large chunks of data (like files or documents) and is organised in "buckets". SimpleDB, currently in beta, is a storage mechanism for more fine-grained data. With SimpleDB, chunks of data are stored as items that are identified by a primary key (item id) and consist of a set of attribute/value pairs. To access SimpleDB, a WSDL interface description is available as well as a sort of REST-style interface.
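To make the "item id plus attribute/value pairs" idea a bit more tangible, here is a toy in-memory sketch. It is not the SimpleDB API; the class `ItemStore` and its methods `put` and `get` are made up purely for illustration.

```python
# Toy in-memory model of an item store keyed by item id.
# NOT the SimpleDB API: ItemStore, put and get are hypothetical names.
from typing import Dict, Optional


class ItemStore:
    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}

    def put(self, item_id: str, **attributes: str) -> None:
        # Merge attribute/value pairs into the item, creating it if needed.
        self._items.setdefault(item_id, {}).update(attributes)

    def get(self, item_id: str) -> Optional[Dict[str, str]]:
        # Pure primary-key lookup: no joins, no fixed schema.
        item = self._items.get(item_id)
        return dict(item) if item is not None else None


store = ItemStore()
store.put("order-1", customer="alice", status="open")
store.put("order-1", status="shipped")   # overwrite a single attribute
store.put("order-2", note="gift wrap")   # items need not share attributes
```

The point of the sketch is only the shape of the data: every item is looked up by its id, and two items do not have to carry the same attributes.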

The newest kid on the block (as it appears to me) is Apache's CouchDB, which is currently in the Apache Incubator. CouchDB seems to follow a strategy similar to Amazon's SimpleDB but focuses on REST/JSON-style access (here is a nice comparison between SimpleDB and CouchDB). CouchDB is (unfortunately, in my opinion) written in Erlang, which makes installation and usage rather difficult (at least in the Java environment that most Apache projects share). Conceptually, however, it seems to be quite interesting, and I suppose we will see more projects of that sort soon.
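What REST/JSON-style access means in practice can be sketched in a few lines: a document is just a JSON object that is PUT to a URL made of database name and document id. The host and port (`localhost:5984`), the database name `blog`, the document id and the helper `build_put_document` are assumptions for illustration; the request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_put_document(base_url: str, database: str, doc_id: str, document: dict) -> Request:
    """Build (but do not send) an HTTP PUT that stores a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{base_url}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed server address and names; adapt to your own installation.
req = build_put_document(
    "http://localhost:5984",
    "blog",
    "post-2008-08-19",
    {"title": "Trends in Data-Management", "tags": ["Arch", "databases"]},
)
```

Passing `req` to `urllib.request.urlopen` would actually send it, provided a server is listening at that address.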

Ah, and speaking of marketing: projects like CouchDB explicitly state that they are not alternatives to relational databases :-) However, the first projects are appearing that provide RESTful interfaces for relational databases...

Btw.: does anyone know of other projects in this domain that I have not seen yet?

Monday, August 18, 2008

[Arch] Mock Objects

I stumbled over this article yesterday: a neat and short description of mock objects and a motivation for how mock objects in general, and mocking frameworks in particular, can support (unit) testing, especially for classes that have dependencies. I like this very short introduction because the concept of mock objects is actually not so difficult to understand, whereas the need for mock frameworks is not so easy to grasp.

Once the basic idea is understood, the documentation of frameworks like JMock can kick in and do the rest ;-)
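For readers who want to see the basic idea in code: the following minimal sketch uses Python's standard `unittest.mock` (not JMock) to test a class whose dependency is replaced by a mock. `OrderService` and its `mailer` collaborator are invented for this example.

```python
from unittest.mock import Mock


class OrderService:
    """Class under test; it depends on a mailer that it does not create itself."""

    def __init__(self, mailer):
        self.mailer = mailer

    def place_order(self, customer: str, item: str) -> bool:
        self.mailer.send(customer, f"Your order for {item} was received")
        return True


# The mock stands in for the real mailer, so no mail server is needed.
mailer = Mock()
service = OrderService(mailer)
result = service.place_order("alice@example.org", "book")

# Verify the interaction: exactly one call with exactly these arguments.
mailer.send.assert_called_once_with(
    "alice@example.org", "Your order for book was received"
)
```

The test checks how `OrderService` talks to its dependency rather than what the dependency does, which is exactly the part that is tedious to write by hand without a framework.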

Addition: Thanks to reader Touku, who recommended in a comment the article by Martin Fowler: Mocks Aren't Stubs.

Thursday, August 07, 2008

[Misc] Puppet and Puppetmaster

I am back from Indonesia, and what could be a more worthy topic for the first blog post after the trip? Exactly: Puppet. In Indonesia I listened to the IT Conversations talk with Luke Kanies about his project. Puppet is an open source system-administration framework for Unix-based operating systems. I believe that Puppet shows quite a few innovations that are not easily found in other tools and has the potential to be the next step in system administration.

First: the target audience of Puppet are system administrators and/or developers who have to roll out and administrate a potentially large number of server and client (!) systems. Everyone who has to administrate more than two machines knows that doing so manually is certainly not an entertaining business. What I believe is Puppet's strongest idea is to define an abstraction layer over system administration:

Puppet allows you to define the behaviour of machines in an abstract way by using a language to describe classes of configurations; as in object-oriented languages, inheritance is possible. The usual tasks of a sysadmin can be written in the Puppet language. More importantly, Puppet tries to abstract from OS details, so for ordinary activities like configuring an Apache webserver it does not matter whether the target OS is Linux, Solaris or BSD. To abstract from platform specifics, Puppet uses so-called resources; a good example are users. As we know, they can be defined and managed in different ways on different platforms and in different contexts. Puppet's resources hence deal with concepts like user, file, cron and so on in the same way on different operating systems.
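The resource idea can be illustrated with a small sketch in Python (this is neither Puppet's language nor its implementation): one abstract description of a user, and per-platform code that translates it into the native command. `UserResource`, `PROVIDERS` and `plan_create` are hypothetical names; the commands are only assembled as lists, never executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UserResource:
    """Platform-independent description of a user account."""
    name: str
    shell: str = "/bin/sh"


def _linux_create(user: UserResource) -> List[str]:
    return ["useradd", "-s", user.shell, user.name]


def _freebsd_create(user: UserResource) -> List[str]:
    return ["pw", "useradd", user.name, "-s", user.shell]


# Hypothetical mapping from platform name to the code that knows its tools.
PROVIDERS: Dict[str, Callable[[UserResource], List[str]]] = {
    "linux": _linux_create,
    "freebsd": _freebsd_create,
}


def plan_create(user: UserResource, platform: str) -> List[str]:
    """Return the native command that would create `user` on `platform`."""
    if platform not in PROVIDERS:
        raise ValueError(f"no provider for platform {platform!r}")
    return PROVIDERS[platform](user)


deploy = UserResource("deploy")
linux_cmd = plan_create(deploy, "linux")
freebsd_cmd = plan_create(deploy, "freebsd")
```

The description of the user stays the same; only the lookup in `PROVIDERS` decides which tool is used on the target system.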

Essentially, Puppet can be seen as the missing next step after virtualisation solutions: a virtualisation solution describes the hardware requirements of a machine, Puppet describes the operating system and service requirements. So ideally you define the specification of your machine (needs Apache webserver, MySQL... version...) and then execute it on that very machine using Puppet. If you need a second machine with the same configuration, you just reuse the configuration from the first (Puppet calls that repeatable configurations).

Puppet is also a distributed tool in the sense that a so-called "puppet-master" communicates with Puppet clients. These clients are under the control of the puppet-master.

Configurations are idempotent. This means that you do not need to assume a specific context or status on the machine to run a configuration "script". You can simply start a configuration on a specific machine, and Puppet brings the machine into the desired state described by the configuration definition. Actually, Puppet executes these configurations at regular intervals to keep the machine in the desired state.
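Idempotence is easiest to see with a tiny example. The sketch below is plain Python, not Puppet: a hypothetical helper `ensure_line` describes a desired state ("this line is in the file") instead of an action ("append this line"), so running it twice changes nothing the second time.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True if a change was made."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False            # already in the desired state, do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"   # throwaway file for the demonstration
    first = ensure_line(config, "max_clients = 50")    # creates file, adds line
    second = ensure_line(config, "max_clients = 50")   # no-op on the second run
    final_content = config.read_text()
```

Because the second run is a no-op, such a step can safely be repeated at regular intervals without knowing what state the machine was in before.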

As far as I understand Puppet so far, it is the next level of system administration (as mentioned above, particularly in combination with virtualisation) and also allows managing complex infrastructures. There are apparently already a number of companies and institutions using Puppet on a larger scale. Luke Kanies mentions in his talk that Google, too, uses Puppet to administrate several thousand machines (apparently partly Mac OS), as do many other international companies.

Puppet is written in Ruby and provided as an Open Source framework. However, one thing that worries me a little at the moment is that there is currently no big community behind Puppet. Puppet is the "baby" of Reductive Labs, and there essentially of Luke Kanies and, I believe, a few further developers. What I have heard about this project so far is really impressive, and I hope that it attracts more developers soon and that Reductive Labs is open-minded enough to open the development to outsiders.

Tuesday, August 05, 2008

[Pub] JBPM meets ESB

The combination of a process engine and an Enterprise Service Bus (ESB) is one interesting aspect of modern service-oriented architectures (SOA). ESBs and process engines provide similar concepts, and software architects often find it hard to choose the right solution. Therefore Bernd Rücker and I wrote an article about it in the German Java Magazin. As a practical showcase, the integration is demonstrated with a small example using JBoss jBPM and two Open Source ESBs: JBoss ESB and Mule.

The simple showcase implements the following example: some event is generated and saved as a file (this may be an order, some incident, an alert, whatever). This file is picked up by the ESB and a new jBPM process is started. The process contains a human task in which somebody has to review the data of the event and decide whether the event can be ignored (e.g. a false alert) or has to be handled. In the latter case, the event is sent to an existing case management system via a Web Service (could be Lotus Notes or something like that). The case management system sends a JMS message as soon as the case is closed. This message is again picked up by the ESB, and the right process instance is triggered (called "signaled" in jBPM).
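The control flow of this example can be sketched as a small state machine. This is plain Python and uses neither jBPM nor an ESB; `EventProcess`, `start_process` and `on_case_closed` are hypothetical names that stand in for the process definition, the file pickup and the incoming JMS message.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    REVIEW = "review"          # waiting for the human task
    IGNORED = "ignored"        # reviewer decided: false alert
    CASE_OPEN = "case_open"    # handed over to case management
    DONE = "done"              # case closed, process finished


class EventProcess:
    def __init__(self, event_id: str) -> None:
        self.event_id = event_id
        self.state = State.REVIEW

    def review(self, handle: bool) -> None:
        if self.state is not State.REVIEW:
            raise RuntimeError("event was already reviewed")
        self.state = State.CASE_OPEN if handle else State.IGNORED

    def signal_case_closed(self) -> None:
        if self.state is not State.CASE_OPEN:
            raise RuntimeError("no open case to close")
        self.state = State.DONE


# Running instances by event id, so that an incoming "case closed"
# message can be correlated with the right process instance.
instances: Dict[str, EventProcess] = {}


def start_process(event_id: str) -> EventProcess:
    """Stands in for the ESB picking up the event file."""
    instances[event_id] = EventProcess(event_id)
    return instances[event_id]


def on_case_closed(event_id: str) -> None:
    """Stands in for the ESB receiving the message from case management."""
    instances[event_id].signal_case_closed()


alert = start_process("evt-1")
alert.review(handle=True)      # reviewer decides the event must be handled
on_case_closed("evt-1")        # later: case management reports closure

noise = start_process("evt-2")
noise.review(handle=False)     # false alert, process ends here
```

The correlation by event id in `on_case_closed` is the part that corresponds to triggering "the right process instance" in the description above.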

The article covers the following topics:
  • The basic combination of a process engine and an ESB
  • When it makes sense to combine a process engine with an ESB
  • How JBoss ESB integrates jBPM and which event handlers the process designer can use to call ESB services
  • How Mule integrates jBPM and which event handlers Mule provides for the process designer
  • Lessons Learned :)
To compare the ESB implementations, the showcase was implemented with JBoss ESB, available here, and with Mule, available here. Following the links, you will find a detailed description of the two implementation scenarios.