Tuesday, January 29, 2008

Automated documentation code testing

During the first year or so of development work on Crunchy, I probably earned the nickname "Dr. NO!" from early Crunchy adopters, as I often resisted suggestions for adding new capabilities. At the time, Crunchy required that html pages have additional markup added (vlam = very little additional markup) so that Crunchy could process them properly. I wanted Crunchy-enabled tutorials to be very easily created, without much additional work from tutorial writers, so that Crunchy would be adopted by many people. Most of the suggestions that were made, including some by Johannes, both while he was sponsored by Google as a Summer of Code student and afterwards when he became a co-developer, were rejected by me for that reason. Since then, the situation has changed, mainly for two reasons:
  1. Johannes created a new basic infrastructure for Crunchy where we can introduce new capabilities via plugins, without needing to change a single line of the core in most instances.
  2. Based on the new architecture, I came up with a new way to process pages so that no additional markup was needed for Crunchy to do its magic. This is what makes it possible, for example, to interact with the official Python tutorial on the python.org site.
Now that it is so easy to implement new capabilities, I am revisiting some ideas I had rejected or ignored before. The struggle I have is to decide when enough is enough before finally having a version 1.0 officially released.

In any event, after reading some comments on this post by Georg Brandl, I started thinking about adding a new option to test code embedded in documentation. To quote from the comments on that post:

One thing that is seriously needed is the ability to run and test code snippets in some fashion. It's just too easy for documentation to get out of date relative to the code, and if you can effectively "unit test" your docs, you're in much better shape.

And I don't mean like doctests, because not everything lends it self well to that style of testing. If it's possible to mark up some code as being for test fixture and some code as being what belongs in the doc, that would be good.
Alternatively, from another reader:

For me a key is being able to test code in the docs, and think the key is being able to "annotate" a code snipit with information about the context in which it should run, and the output it should give.

I think that Crunchy is a very good platform to implement this. There are currently three complementary options I am considering, one of which I have started to implement.


The first option is to have something like the following [note that while I use html notation, Crunchy is now capable of handling reStructuredText, including having the possibility of dealing with additional directives]:

Some normally hidden code, used for setup:
<pre title="setup_code name=first">
a=42
</pre>

Followed by the code sample to be tested:
<pre title="check_code name=first">
print a
</pre>

And the expected output:
<pre title="code_output name=first">
42
</pre>

Upon importing a document containing such examples, Crunchy would insert a button for each code sample allowing the user to test the code by clicking on the button, invoking the appropriate setup, and comparing with the expected output. Alternatively, all such code samples in a document could be run by a single click on a button inserted at the top of a page. A javascript alert could be used to inform the user that all tests passed - otherwise error messages could be inserted in the page indicating which tests failed or passed.
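
As a rough illustration of what would happen behind such a button, here is a minimal sketch, assuming the three pieces (setup, code sample, expected output) have already been extracted from the page as strings. The function name run_example is hypothetical and is not part of Crunchy, and the sample is written with Python 3 syntax.

```python
import contextlib
import io


def run_example(setup_code, check_code, expected_output):
    """Run the setup, then the sample, and compare what it prints
    with the expected output. Returns (passed, actual_output)."""
    namespace = {}
    exec(setup_code, namespace)              # setup runs first, in a shared namespace
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(check_code, namespace)          # only the sample's output is captured
    actual = buffer.getvalue()
    return actual.strip() == expected_output.strip(), actual


# The three <pre> blocks named "first" above, as plain strings.
passed, actual = run_example("a = 42", "print(a)", "42")
print("test passed" if passed else "test failed, got %r" % actual)
```

Comparing stripped output is a deliberately forgiving choice; a real implementation might want doctest-style options for whitespace instead.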

This type of approach could, in theory, be used for other languages than Python; code could be executed by passing information to a separate process launched in a terminal window, with the result fed back into Crunchy as described above.

A second approach is to use the same method used by doctest to combine code sample and expected output; the setup code could still be used as described above.
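
A sketch of how this second approach might look, using the standard doctest module to check a sample that contains both code and expected output, with the hidden setup code run beforehand. The helper name check_doctest_sample is an assumption made for this illustration.

```python
import doctest


def check_doctest_sample(sample, setup_code=""):
    """Run a doctest-style sample after executing some setup code.
    Returns (number_failed, number_attempted)."""
    globs = {}
    exec(setup_code, globs)                  # the setup defines names the sample uses
    test = doctest.DocTestParser().get_doctest(sample, globs, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)   # discard the failure report
    return results.failed, results.attempted


sample = ">>> print(a)\n42\n"
failed, attempted = check_doctest_sample(sample, "a = 42")
print("%d of %d examples failed" % (failed, attempted))
```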

A third approach, this one completely different, could be used in more general situations than documentation code testing alone.

Currently, the Python code needs to be embedded inside an html (or rst) document. However, one could create links to code that lives inside separate Python files. For example, one could have the following:

<pre title="python_file">
<span title="python_file_name"> file_path </span>
<span title="python_file_linenumbers"> some_range </span>
</pre>

When viewing the above using a normal browser, one would see something like (using a fictitious example)

../crunchy_dir/crunchy.py
[1-3, 5, 7, 10-15]
However, when viewing the same page with Crunchy, the appropriate lines would be extracted from the file and displayed in the browser. Alternatively, instead of specifying the line numbers, one could have a directive to extract a specific function/method/class as in

<span title="python_file_function"> function_name </span>

which would instruct Crunchy to extract all the code for the function definition and insert it in the document. By using such links, the code in the documentation would always (by definition) be kept in sync with the real code. I realize that this is not exactly a novel idea, but it is one whose potential could be extended by using Crunchy in ways never seen before. However, this last approach will have to wait until after Crunchy version 1.0 has been released.
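
To give an idea of how little code the extraction itself would need, here is a minimal sketch working on source text held in a string. All three function names are hypothetical, and ast.get_source_segment requires Python 3.8 or later (it also leaves out any decorators).

```python
import ast


def parse_line_ranges(spec):
    """Turn a string like '[1-3, 5, 7, 10-15]' into a sorted list of line numbers."""
    numbers = set()
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            numbers.update(range(int(start), int(end) + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


def extract_lines(source, spec):
    """Return the requested (1-based) lines of source, skipping any out of range."""
    lines = source.splitlines()
    return "\n".join(lines[n - 1] for n in parse_line_ranges(spec) if n <= len(lines))


def extract_function(source, name):
    """Return the text of the first function or class found with that name, or None."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
                and node.name == name):
            return ast.get_source_segment(source, node)
    return None


demo_source = "import os\n\ndef f(x):\n    return x + 1\n\ny = 2\n"
print(extract_lines(demo_source, "[1, 3-4]"))
print(extract_function(demo_source, "f"))
```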

What do you think of these ideas?

Monday, January 28, 2008

Kudos to ActiveState

ActiveState, the company behind the Python cookbook and many other useful and free resources, is a great supporter of free software. Its free Komodo Edit is a nice piece of software, well worth trying.

I started using Komodo Edit a few months after I switched from using Windows to using a Mac. On Windows, my Python editor of choice was SPE. However, I found that SPE, not being a native Mac application, had some small quirks that I found annoying. After trying Textmate, praised by many Mac users, and Wing among others, I settled on the free Komodo Edit. While I missed the class browser I had gotten used to with SPE, I found that Komodo was enough for my basic needs.

After reading this post by Jesse Noller, I started using pylint within Komodo and while I did not find any bugs (so far!) in my code, it did encourage me to improve the existing code. The possibility of easily adding new tools to Komodo Edit led me to try its more powerful sibling, Komodo IDE. Komodo IDE has an integrated debugger (something I had *never* used before for code development but that I will likely use more and more in the future) and a code browser side bar which is even better than the one included with SPE. After using it for about a week, I decided to treat myself and purchase a license for it before the trial license ended. However, since I found the price a bit steep for something to use just for fun, I inquired about available discounts. I was told that, even though I did not use it for my work, I was eligible for an educational discount given that I work at a University.

However, there was more to come...

When I indicated that I intended to buy it online, I got an email telling me that I was actually eligible for a deeper discount since I had a license for an earlier version of Komodo Personal edition. This was a total surprise for me. Here's what happened: more than two and a half years ago, ActiveState had a special promotion for open source developers to get a free license for Komodo Personal edition. I had taken advantage of this offer at the time and installed Komodo on my Windows computer. However, I found it was comparable in functionality to SPE, for which I had a slight preference. As a result, I gave up on Komodo after trying it for about a week.

Now, more than 2.5 years later, the friendly people at ActiveState reminded me that I had a valid (but free!) license and told me I could simply pay for an upgrade to what is, in my opinion, a far superior programming environment to the version I had a license for.

Talk about friendly customer service! Thank you ActiveState!

Wednesday, January 23, 2008

GHOP Python Related Success Story

Early on during Google's HOP contest, I joined the Python mentors group following a call for task suggestions, and submitted many Crunchy-related tasks. I was amazed by the quality of some contributions. As more potential mentors involved with other Python related projects joined in, I decided to quietly refrain from submitting more Crunchy related tasks. Still, Crunchy got more than its share of contributions from students from all over the world.

Today, one student posted a blog entry about a Crunchy presentation he made to his class. He describes it as a success - and I would agree. However, it is clear to me that the success of his presentation is due far more to Python's strength than to Crunchy itself. I thought it was a very good example to use when advocating for the use of Python - and therefore, worth linking to.

Saturday, January 19, 2008

More power and removing an option

One of the neat features of Crunchy, suggested to me by Andrew Dalke at Pycon 2007, is its ability to dynamically display images that are generated by some Python code. The example given in the Crunchy distribution uses matplotlib. However, the way this is done is slightly cumbersome. First, some pre-existing vlam (very little added markup) must already have been added to an html page, specifying the name of the image (e.g. foo.png) to be displayed. This special vlam will result in an editor appearing on the page together with two buttons (one to execute the code, the other to load the image). Second, the Python code must be such that an image will be generated with that name. By default, the image is saved in and retrieved from a special Crunchy temp directory.

While the approach used works, it does also mean that images can't be generated and displayed using a simple Python interpreter, nor can they be displayed from an arbitrary location. At least that was the case until now.

Prompted by a suggestion from Johannes, I wrote a very simple module whose core consists of only 7 Python statements, and which does away entirely with the cumbersome vlam image file option. Images can now be loaded and displayed from anywhere using two lines of code:

import image_display
image_display.show(path, [width=400, height=400])

And by anywhere, I mean from any interactive Python element (interpreter, editor),
and using any image source (local images and images on the web), like

image_display.show('http://imgs.xkcd.com/comics/python.png')

Furthermore, one can use this approach to create a "slide show" by alternating image_display.show() and time.sleep().

Since there is no more need to use the old image_file option, it will be removed in the next Crunchy release. I may have to give some more thought to the API for this new option (e.g. is there a better name than image_display? Should I add a function for a slide show, giving a list of images and a time delay? etc.); suggestions are welcome.
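
One possible shape for such a slide show function is sketched below. This is only an illustration and not something included in Crunchy: the name slide_show and its arguments are assumptions, and because image_display only exists inside Crunchy, the display function is passed in as an argument so that the sketch can run anywhere.

```python
import time


def slide_show(images, delay, show, sleep=time.sleep):
    """Display each image in turn, pausing `delay` seconds between images."""
    for index, image in enumerate(images):
        if index:                 # no pause before the first image
            sleep(delay)
        show(image)


# Inside Crunchy, one would presumably write something like:
#     slide_show(list_of_paths, 2, image_display.show)
# Here a plain list stands in for the display so the example is self-contained.
shown = []
slide_show(["first.png", "second.png"], 0, shown.append)
print(shown)
```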

Thursday, January 17, 2008

100 posts, olpc, rst and Crunchy

This is the 100th post on this blog which I started to write a little over 3 years ago - shortly after I started my programming hobby. And, as it so happens, I received two gifts in the past 24 hours:
  • I received my give-one-get-one olpc in the mail today.
  • A student wrote a simple program to add Crunchy-specific "directives" for docutils so that rst can be used to create Crunchy-ready tutorials. More on this below.
I had seen the olpc at Pycon 2007 but, not having tried it back then, I did not realise how small the keyboard really is. I have fairly small hands but they are much too big to use the keyboard comfortably. I thought of using it to write this blog entry but quickly gave up on that idea.

It will take a while to fully explore the olpc. I am extremely impressed by the quality of the screen. However, it is a slow computer which apparently can only handle a "small" number of applications running concurrently - I managed to freeze it with about 8 applications running. I seemed to remember from the Pycon presentation that the track pad had 2 (or 3, depending on how you count them) active regions, but only the central one seems to be active. Members of my family were surprised at the "plastic toy" appearance of the olpc but they all seemed to be fairly impressed once they saw it running.

Now, all I have to do is to figure out how to make Crunchy run on it. :-)

Speaking of Crunchy, as part of Google's HOP contest, I had set up two tasks related to reStructuredText. One of them was to write a plugin so that an rst file could be loaded, transformed into an html file (by docutils) and displayed by Crunchy. This was done a while ago as I reported here. However, when this is done, Crunchy treats all the code elements on a page (inside "pre" tags) the same way, which is specified via the variable crunchy.no_markup. This does not allow for the same fine-tuning that is possible with the addition of vlam (very little added markup) to a normal html page.

The second rst task was to write "directives" so that vlam could be inserted inside rst pages. I had read quickly about rst directives and figured this would be fairly complicated. From my superficial reading, I *thought* it would require a modification to docutils that would have to be incorporated into the core (or making a special version of docutils that incorporated those directives). Of course this would make little sense. In any event, a student wrote a 200-line program that defined most of the required "rst directives" for Crunchy, allowing it to take an rst file as input and output an html file. All I had to do was to cut-and-paste the student's program into the existing rst plugin, change 2 lines of code, and I now have a fully working vlam-compatible rst loader for Crunchy.

This will be part of the next Crunchy release.

Thursday, January 10, 2008

Small tip for porting unit tests to Python 3.0

While working on increasing test coverage of Crunchy, I encountered a puzzling difference when running doctest-based unit tests under Python 3.0a and Python 2.5. One test (of a function that was definitely too long) was failing under Python 3.0a1/2 with a single line containing the number 20 appearing in the output - in addition to all the other expected ones. To make a long story short, the problem was linked to PEP 3116. Using Python 3.0a1/2, when a file is written using ".write()", there is a return value corresponding to the number of characters (or bytes, for a binary file) written; the "old" behavior was to return None. In order to have doctest-based tests running successfully under both 2.x and 3.x, we need to replace things like

f.write()

by

dummy = f.write()
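
The effect can be reproduced with a small experiment, sketched here under Python 3 with an in-memory file (io.StringIO) standing in for a real one. The helper count_failures is a name made up for this illustration.

```python
import doctest
import io


def count_failures(sample):
    """Run a doctest-style sample with `f` bound to an in-memory text file
    and return the number of failing examples."""
    globs = {"f": io.StringIO()}
    test = doctest.DocTestParser().get_doctest(sample, globs, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed


bare = '>>> f.write("x" * 20)\n'              # return value is echoed as output
assigned = '>>> dummy = f.write("x" * 20)\n'  # return value is swallowed

print(count_failures(bare))      # 1 under Python 3: the unexpected "20" breaks the test
print(count_failures(assigned))  # 0
```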

I understand that the plan is to backport many of the Python 3.0 features to Python 2.6 so as to ease the transition, and use an automated tool for the conversion. I don't think that features from PEP 3116 will be implemented in Python 2.6, since they are very different from 2.5. And I am doubtful that the conversion tool will take care of including a dummy assignment for functions that will now return a value different than None. I am hoping that this post might end up saving a little bit of time for any reader that tries to migrate their code from 2.x to 3.0.

On a different topic altogether, I have fixed the bug which resulted in removing the styles from remote web pages when using Python 3.0 in conjunction with Crunchy. This will make Crunchy a more attractive tool to use in going through the official Python tutorial for Python 3.0. For the moment, the Crunchy embedded interpreter still only works with Python 3.0a1 and not 3.0a2. Hopefully I'll have this resolved in a new public release by Pycon 2008 so that Johannes can demonstrate it in his talk.

Monday, January 07, 2008

More encoding pains...

It seems like every early January brings some new encoding pains...

Eons ago (less than a year) I used to program on a Windows based computer. My Windows user name was simply André, which is not surprising since it's my first name. However, the observant reader may have noticed that my name requires one non-ASCII character. Such a small detail... that can cause so much annoyance.

Two years ago, on January 6th, I wrote about using site customization so that my favourite Python editor at the time (SPE) could deal with my user name.

Last year, on January 3rd, I wrote about how Crunchy dealt with encoding issues in a way that was independent of any site customizations. At the time, an astute reader made the following comment (which I had forgotten until today, when I decided to write this blog entry):

Without having looked at the rest of your code, so I might be completely off here, this somehow looks wrong:

result = result.decode(sys.getdefaultencoding()).encode('utf-8')

The reason I say this is that you're decoding and encoding in the same place. Since Python unicode support is so good, it's generally a good idea to decode to unicode any use input you get as early as possible, and to encode only as late as possible when outputting strings. Since you're doing complicated web ui stuff here, so it may be that you're not doing anything with 'result' between input and returning it to the browser, but if you are, the string should have already been decoded by the time it gets to this line. Otherwise this will bite you anytime you try to do anything with the string like simple concatenation. [emphasis added]
Of course, since Crunchy was working properly, I quickly dismissed that comment. With the way everything was implemented, Crunchy was working just fine... In fact, to this day, if you download the latest public release (0.9.8.6), everything works just fine - even if your user name includes non-ASCII characters.

However ... in adapting Crunchy to work with Python 3.0, I do things with the various strings like simple concatenation ... and, of course, this caused some problems as I found out when I "borrowed" the old Windows computer from my daughter to try the latest changes I had made.

It. Did. Not. Work.
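
The kind of failure that reader had warned about can be reproduced in a few lines under Python 3. This is only a sketch of the general "decode early, encode late" advice, not the actual Crunchy code; the latin-1 encoding and the path used are assumptions chosen for the example.

```python
# Bytes as they might arrive from the operating system (assumed latin-1 here).
raw = "Andr\u00e9".encode("latin-1")

try:
    raw + "/crunchy"                  # mixing bytes and text fails in Python 3
except TypeError as error:
    print("concatenation failed:", error)

text = raw.decode("latin-1")          # decode once, as early as possible
path = text + "/crunchy"              # work with text everywhere inside the program
payload = path.encode("utf-8")        # encode once, as late as possible
print(payload)
```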

Ok, so I have two possible solutions:
  1. Trade with my daughter for a while, letting her use my MacBook (which she loves!) while I use the "old" Windows desktop.
  2. Create an account with an accented name on my MacBook.
Well, as much as I love my daughter, I could not face the pain of going back to using the Windows desktop as my main computer. So, solution 2 was an easy choice.

Except that it wasn't....

My account name on the Mac is my full name (André Roberge). Hmmm... this already has a non-ASCII character. But my home directory is "andre" - no accent. Under Mac OS, a user has a full name that appears in the login window, and a short name used for the home directory (/Users/andre in my case). When I tried to create a new account with a non-ASCII character in the short name, it just beeped and refused to enter it.

In search of an answer, I posted on three Mac-related forums and got either no answer or an unhelpful and wrong answer on two of them ... I was considering posting on the Python list but, fortunately, I eventually got a useful answer.

So, if you are thinking of writing i18n applications that either can run from any directory or that save information in the user's default directory, or both, and you want to make sure that it will work on a Windows computer, here's how you can do it on a Mac (under OS X 10.5):
  1. Create a test account using the account manager under any name of your choosing; however, the "short name" will have to be ASCII only (at this stage).
  2. From the account manager, ctrl-click on the account after it has been created; this will bring an advanced dialog.
  3. Edit the advanced dialog to change the short name, and the home directory, to the desired value. I chose the name "accentué" (self-referencing name, if you know French). Note that doing so does not change the actual name of the directory.
  4. Go to a terminal window and do "sudo mv old_name new_name" to change the name of the home directory that was created at step 1.
After I did all this, the development version of Crunchy did not work from the new account. This pleased me very much: it most likely means that I do not have to trade computers with my daughter to continue working on Crunchy. ;-)