Tuesday, September 08, 2009

[Conf] Zurich Open Source Jam

On August 13th, more than 50 other people interested in open source software and I attended the 8th Google Open Source Jam in Zurich. It is an informal (BarCamp-like) meet-up at Google's Zurich office (similar events are held in other parts of the world) and a perfect opportunity to meet other open source developers as well as Google engineers in a relaxed atmosphere. As it is open to everyone, people held several lightning talks on a great variety of topics:
  • "G-WAN", Pierre Gauthier
  • "Dynamics of Open Source code", Markus Geipel
  • "Involving students in Open Source", Lukas Lang
  • "Open Source in Africa", Michel Pauli
  • "BTstack", Matthias Ringwald
  • "Free Software & basic income", Thomas Koch
  • "NxOS, an OS platform for Lego", David Anderson
  • "Open Source in the Humanities", Tara Andrews
My talk was about open source student projects carried out within the scope of the course "Advanced Software Engineering", held at QSE. Four projects were completed successfully in the last two years and were integrated into the codebase.
Similar to Summer of Code, these students were mentored by experienced open source committers from the Apache Software Foundation and the Codehaus. Developers and students who participate in open source projects themselves commented a lot on this topic: "I wish I had something similar when I was studying", said a Google engineer.

Afterwards, we continued to have interesting discussions. After some time I found myself in an exciting discussion on software engineering at Google. First off, I'd like to mention that employees never make clear statements concerning their work, as they are bound by confidentiality. Even though no specific software development process was confirmed, one could identify some tendencies:

Don't repeat yourself (DRY). Code and software reuse is a basic principle. The Google Code repository was created as a collaborative platform to manage, document and review free/libre open source software (FLOSS) projects. Indeed, employees spend up to 20% of their time contributing to open source projects.

Don't reinvent the wheel. "At Google we don't reinvent the wheel, we vaporize our own rubber", one of the engineers told me (they use heaps of metaphors like this), meaning that the vast majority of the software in production use is built on top of open source libraries, either in part or as a whole. Aside from releasing software like the Web Toolkit, Android, Chromium, etc. as open source, Google contributes to a variety of FLOSS projects (e.g. Linux kernel, Apache projects, MySQL, Mozilla Firefox) [1]. However, they keep implementations of key technologies secret, claiming for instance that their web server, apparently a Tomcat rewrite, was "too specific to benefit from", or they simply don't publish them for competitive reasons [1]. The same goes for Google File System (GFS), BigTable and MapReduce. Still, the scientific publications [2] on these core technologies at least led to great open source implementations (e.g. Apache Hadoop) which are open to everyone.

[1] A look inside Google's open source kitchen, http://www.builderau.com.au/strategy/architecture/soa/A-look-inside-Google-s-open-source-kitchen/0,339028264,339272690,00.htm
[2] Google Publications, http://research.google.com/pubs/papers.html

Thursday, September 03, 2009

[Process] Distributed Source Code Management and Branching

I have been using Mercurial a lot recently (and love it); I really do wonder why I struggled so long with Subversion. When I first heard the Git presentation by Linus Torvalds (which is, hm, very entertaining), the whole distributed SCM thing sounded very esoteric to me. However, I decided to give it a try, also motivated by the great Chaosradio Express 130 podcast (German). Yet I decided to go with Mercurial and not Git, although this created some flame wars within our group, because one of my colleagues is a big Git fan. So be it ;-)

For me Mercurial is a great, easy to install, and pretty easy to understand system. The command line is really straightforward and the help texts are well written (and, interestingly, internationalised). Maybe I will follow up with another blog post with more details on Mercurial.

As easy branching and merging are among the main advantages of the new distributed SCMs, I want to recommend for now the very nice blog post by Steve Losh, "A Guide to Branching in Mercurial". This article provides a good and conclusive introduction to different methods for creating branches with Mercurial and also explains differences from Git and some of its shortcomings (*g*).

p.s.: For my taste, just one thing is missing: some details on merging.
p.p.s.: Please no comments on my "Git shortcomings" statement, they will be censored out anyway ;-)

Tuesday, September 01, 2009

[Misc] Clojure & Clojure Book Review

It looks like we are living in a fantastic time concerning programming languages. Creating a new language has never been easier than it is now. With the two great platforms Java and .NET it's no longer extremely difficult to generate intermediate code from the language you are dreaming of. And even the Pragmatic Bookshelf has a book in the works on "language patterns" to cover this topic.

Thus I assume that nearly every developer is looking to either A) bet on the best horse or B) pick a language from which you can learn the most. Unfortunately the whole world seems to bet on Scala from the sea of hot languages. And indeed Scala is quite cool and might win the race with good reason. Nevertheless I never really warmed to Scala, and it's hard for me to describe why. It's something in the syntax I really cannot explain. My dream language must have code that looks close to perfect. Years ago I had that feeling coding Ruby (and I love Ruby) because the code simply looked great, although I don't like other things in Ruby, e.g. the clumsy class definitions. (There was a project on RubyForge to fix that. Does someone know the name?)

So I am still constantly on the lookout for new languages, and of course the Pragmatic Programmers have brought this book to my attention:

Stuart Halloway, "Programming Clojure", 2009

So let me share my thoughts on this book and on the language.

For me, Clojure is still more attractive than Scala because:

1. It's the toughest challenge for your brain.
Perhaps you belong to the same generation as me: I never really used Lisp, Scheme or similar dialects. But as I am a Java, D and Ruby coder, Lisp and thus Clojure ("Lisp Reloaded") is the hardest challenge to learn. And according to the last two book reviews you should always go the hard way to learn the most. This might even be true for you. For me it turns out that the number of round brackets is not the problem.

2. Clojure pushes you away from variables and toward immutable data structures (as the successful Erlang does!). The code structure itself mostly becomes the container for the values you would normally store in variables. This has two really strong advantages:

  • No variables means fewer errors and less to debug
  • No variables and immutable data mean no side effects
Hence it's not so easy to write Clojure code with many side effects.

3. Clojure has strong concurrency concepts for the cases where you really need mutable data: STM (Software Transactional Memory) based on MVCC, agents and atoms. STM gives you ACID transactions without the "D" (durability, which is only relevant for databases).

4. The Java integration is really smart. Have a look:

(new java.util.Random)


This simply creates an object you can work with right away.

Java and Clojure can easily call each other. This might be a really good argument for you to protect your investments. Clojure doesn't try to rebuild all the Java libs from scratch; it reuses them in a clever way.

5. Clojure is indeed fast because it compiles to pure Java bytecode.
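
Points 2 to 4 can be sketched in a few lines. This is only a minimal illustration and not taken from the book; all names (base, checking, savings, transfer!, hits and so on) are made up for the example.

```clojure
;; Point 2: immutable data. conj returns a new vector; base is untouched.
(def base [1 2 3])
(def extended (conj base 4))

;; Point 3: refs change only inside a transaction (dosync), so the two
;; hypothetical balances below are always updated together or not at all.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount from one ref to another in a single transaction."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)

;; An atom covers a single value that needs no coordination with others.
(def hits (atom 0))
(swap! hits inc)

;; Point 4: calling into Java.
(def shout   (.toUpperCase "clojure"))          ; instance method on a String
(def biggest (Math/max 3 7))                    ; static method
(def roll    (.nextInt (java.util.Random.) 6))  ; random int from 0 to 5

(println base extended)             ; prints [1 2 3] [1 2 3 4]
(println @checking @savings @hits)  ; prints 70 30 1
(println shout biggest)             ; prints CLOJURE 7
```

Note that nothing here reassigns a name: the refs and the atom are the only places where state changes, and they make that change explicit.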

And of course all the other stuff that a hot language must have:
  • Closures (I still cannot believe that they are still discussing this hot feature for Java... it's a shame)
  • List Comprehensions
  • a workaround for tail recursion, plus currying
  • other weird stuff such as very lazy sequences, trampolining, etc.
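
To make the list above a bit more concrete, here is a small sketch of each feature. The function names (adder, my-even?, my-odd?, add-ten) are invented for illustration, and partial is used as the closest built-in to currying.

```clojure
;; A closure: the returned function remembers n.
(defn adder [n]
  (fn [x] (+ x n)))

;; A list comprehension with for.
(def even-squares
  (for [x (range 10) :when (even? x)] (* x x)))

;; A lazy, infinite sequence; take realizes only the first five items.
(def first-powers (take 5 (iterate #(* 2 %) 1)))

;; partial fixes the leading arguments, which gives a curry-like effect.
(def add-ten (partial + 10))

;; trampoline runs mutually recursive functions without growing the stack:
;; each step returns a function of no arguments instead of calling directly.
(declare my-odd?)
(defn my-even? [n] (if (zero? n) true  #(my-odd? (dec n))))
(defn my-odd?  [n] (if (zero? n) false #(my-even? (dec n))))

(println ((adder 5) 1))                 ; prints 6
(println even-squares)                  ; prints (0 4 16 36 64)
(println first-powers)                  ; prints (1 2 4 8 16)
(println (add-ten 5))                   ; prints 15
(println (trampoline my-even? 100000))  ; prints true
```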
What also challenged my mind is that Clojure has only three constructs for flow control (as we learned from Oz, a simple and nevertheless powerful language doesn't really need much):
  • An "if", for example: (if (> num 100) "yes" "no")
  • A "do", which evaluates a sequence of statements in order. It introduces side effects and is thus discouraged in Clojure
  • And a "loop/recur" that can be used to build everything you need in flow control (Stuart Halloway calls it the Swiss army knife).
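
The three constructs can be sketched as follows. The function names (size-check, log-and-double, sum-to) are hypothetical and only serve the example.

```clojure
;; if: picks one of two expressions.
(defn size-check [num]
  (if (> num 100) "yes" "no"))

;; do: evaluates its forms in order and returns the last value.
;; The println is the side effect that makes do necessary at all.
(defn log-and-double [x]
  (do
    (println "doubling" x)
    (* 2 x)))

;; loop/recur: recur jumps back to loop with new bindings, so the
;; iteration uses constant stack space.
(defn sum-to [n]
  (loop [i 1 acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ acc i)))))

(println (size-check 101))    ; prints yes
(println (log-and-double 4))  ; prints doubling 4, then 8
(println (sum-to 100))        ; prints 5050
```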
So before I discuss the downsides of the language and the book, let me bring up the two points that I loved most in Clojure (even if it takes a lifetime to understand them 100%):

1. It has metadata incorporated in the language right from the start! So you can tie pre- and post-conditions, tests, docs and arbitrary macros to any kind of data or function. Wow! Do you remember how long it took Java to introduce annotations? And to me they are still not 100% part of the language but an add-on.

2. Clojure has powerful macros and multimethods. This means DSL support is built in. So if you loved the way Ruby can build DSLs (and the Pragmatic Programmers will bring out a nice book on this Ruby DSL topic!) you will get a step further in Clojure.
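
Both points can be sketched briefly. The names below (square, tagged, unless, area) are made up for illustration and are not taken from the book; the pre- and post-condition map assumes a Clojure version that supports :pre and :post.

```clojure
;; Docstring plus pre- and post-conditions attached to a function.
(defn square
  "Returns x multiplied by itself."
  [x]
  {:pre  [(number? x)]
   :post [(>= % 0)]}
  (* x x))

;; Arbitrary metadata on plain data; it does not affect equality.
(def tagged (with-meta [1 2 3] {:source "sensor-a"}))

;; A small macro: the body is only evaluated when condition is false.
(defmacro unless [condition & body]
  `(if ~condition nil (do ~@body)))

;; A multimethod that dispatches on the :shape key of its argument.
(defmulti area :shape)
(defmethod area :rect   [{:keys [w h]}] (* w h))
(defmethod area :square [{:keys [side]}] (* side side))

(println (:doc (meta #'square)))           ; prints Returns x multiplied by itself.
(println (meta tagged))                    ; prints {:source sensor-a}
(println (= tagged [1 2 3]))               ; prints true
(println (unless false "ran"))             ; prints ran
(println (area {:shape :rect :w 2 :h 3}))  ; prints 6
```

Calling (square "a") fails the :pre check with an AssertionError, which is the kind of contract the metadata point is about.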

The book itself uses all the hot Clojure features to create a build system called Lancet (every language since Ruby's Rake seems to do this a little bit cooler than Ant does).

What I disliked in the book is that it's still difficult for beginners, although it is very well written with beginners in mind. One example: I still haven't found the page where I can read a string from the console. And this is a key feature for beginners to test something. The interactive REPL is not enough here. What I mean is that the book has a thousand brilliant examples like Fibonacci. But fib is a double-edged example. It just calculates. It creates no emotions in the reader. A better example is the snake game the book creates a few pages later (here we get input from a KeyListener...).
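
For what it's worth, reading a line from the console takes one core function, read-line. The sketch below is not from the book; greet is a hypothetical helper, and with-in-str is used only so the example runs without waiting for real keyboard input.

```clojure
(defn greet
  "Reads one line from standard input and returns a greeting."
  []
  (let [input (read-line)]
    (str "Hello, " input "!")))

;; with-in-str feeds the given string to the code as standard input.
(println (with-in-str "world\n" (greet)))  ; prints Hello, world!
```

At a real console, calling (greet) simply blocks until a line is typed.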

The language itself has two downsides for me:

1. If you are used to the huge Java collections library, you are astonished that Ruby boils all data structures down to just a super powerful Array and Hash (and the others are rarely used). That's cool. So you get the impression that Clojure goes one step further in stating that everything is a sequence (like Lisp stated that everything is a list). But when working with Clojure you are suddenly confronted with not only sequences but vectors, lists, maps, sets and trees. Now the book tells you that you always use the sequence abstraction to work with these. To me this doesn't really help if e.g. the vector notation differs significantly from a set or list definition. I still haven't gotten used to this, but it must surely be my personal inability.

2. The most important drawback is at the same time the key feature: Clojure has an extremely steep learning curve! To become a true expert in Clojure, i.e. to think and dream in Clojure, you need to stress every neuron in your brain. So it all boils down to the question: Is it really worth investing half a year in a hot language like Clojure to be able to produce code that is 3 times more accurate than imperative code?

What do you think?

Regards
Stefan Edlich

Links:

And finally have a look at this nice language comparison: Java.next
Make a list. For every topic give points from 0 to 3. What is the most elegant for you?