Saturday, December 31, 2005

Journey to 117

This blog entry is dedicated to Michael Spencer whose 120 character-long entry in the PyContest was submitted less than 90 minutes after the official beginning of submissions. Michael's entry gave a target to aim for which few people managed to reach. I strongly suspect that, in large part because of this, various hints have been posted on the web, enabling more people to slowly crawl towards ever-shorter solutions. I must admit that without the various hints I have read, I would never have been able to challenge Michael's solution.


After being asked a few questions here and elsewhere, I've decided that I should try and document the reasoning that eventually led me to my final solution to the PyContest. Of course, it is going to be more linear here (and with fewer dead-end explorations) than it was in real life. I definitely welcome comments from others' journeys.


First solution


After writing on paper the desired output for an input that included all digits, I quickly wrote down the following program to produce the desired output:



def seven_seg(i):
~~n=[[" _ "," "," _ "," _ "," "," _ "," _ "," _ "," _ "," _ "],
~~~~~["| |"," |"," _|"," _|","|_|","|_ ","|_ "," |","|_|","|_|"],
~~~~~["|_|"," |","|_ "," _|"," |"," _|","|_|"," |","|_|"," _|" ]]
~~m=''
~~for j in 0,1,2:
~~~~for d in i:
~~~~~~~~m+=n[j][int(d)]
~~~~m+='\n'
~~return m
print seven_seg('0123456789')

In an attempt to preserve the visual appearance, I have replaced all the leading spaces by "~". In the future, whenever you see a "~", take it to be a space. After removing the print statement, the number of characters is 310, which is quite a ways from 117. One can reduce the number of characters by removing many unnecessary spaces but I will not do that until we are much closer to the "final" solution. Furthermore, I will keep the explicit for loops rather than using generators [with the ''.join() method] until the very end as I find the code easier to read that way. I will, however, move the [m=''] assignment within the function argument, thus removing a line (and at least two extra characters!) from the solution.


My next step was to look at the various "triplets" (three-character combinations) that appeared in "n". Quite a few are repeated. In fact, there are 7 unique triplets; they are:



"~~~", "~_~", "~_|", "|_~", "|_|", "~~|", "|~|"

With this ordering, and giving each triplet a number from 0 to 6, we have each input digit can be given by the following three number combination:
0=164, 1=055, 2=123, 3=133, 4=065, 5=132, 6=134, 7=155, 8=144, 9=142.


Note that the first digit of each triplet is either a 0 or a 1; this will be important later. Using these triplets leads to a second, shorter, solution:



def seven_seg(i, m=''):
~~n=["~~~","~_~","~_|","|_~","|_|","~~|","|~|"]
~~for j in '1011011111', '6622433544', '4632524542':
~~~~for d in i:
~~~~~~m+= n[int(j[int(d)])]
~~~~m+= '\n'
~~return m

Note that the string '1011011111' represents the first digit of each triplet, and that the order in which they appear is '0123456789'.


By moving the inner for loop statement (5th line) at the end of the 4th line, using a single space for the first indentation and a tab character (expanded by Python to 8 spaces) for the second level of indentation and removing other non-required spaces, it is possible to reduce the length of the code to 171 characters -- still a long way from our final goal.


The next step is to eliminate "n" altogether; we can do this in two steps.
The first step is simply use it as is in the line



def seven_seg(i, m=''):
~~for j in '1011011111', '6622433544', '4632524542':
~~~~for d in i:
~~~~~~m+= ["~~~","~_~","~_|","|_~","|_|","~~|","|~|"][int(j[int(d)])]
~~~~m+= '\n'
~~return m

The second step is to try to replace the list of strings by a single string and extract, when assigning, a 3-character substring. The shortest such string that we can construct is 14 character long. There are many such strings; here is one of many examples:



'~_~_|_~~~|~|_|'
~01234567890123

Using this string, we see that the last 3-character substring, '|_|', starts at index 11. To encode the position of each beginning index, we would need at least 2 digits for some. A better way appears to lengthen the string a bit so that each required 3-character substring starts on an even index -- the last character of one substring being the first character of the next. Here is one such solution -- which, other than for some unnecessary spaces, is the exact first solution that I submitted at the exact opening of the contest at 14:00 UTC; for one minute it was in first place. :-)



def seven_seg(i, m=''):
~for s in '5455455555', '2600133611', '1630601610':
~~for k in i:
~~~~j=int(s[int(k)])*2;
~~~~m+='~_|_|~|_~~~_~~|'[j:j+3]
~~m+='\n'
~return m

Not having done any programming for a while, I didn't realise at the time that 'some_string'[j:j+3] could be written as 'some_string'[j][:3], thereby saving one assignment (to j) and some precious characters all around.


The next step is to try to shorten the three long numbers: '5455455555', '2600133611' and '160601610'. This can be done by encoding them in a base other than 10 and extracting the result when needed. A better way however is to re-arrange the number as a series of triplets and then encode the result. For example, the three long numbers just mentioned can be re-ordered as follows:



0=521, 1=466, 2=503, 3=500, 4=416, 5=530, 6=531, 7=566, 8=511, 9=510

This gives rather large integers (ex: 521) to try to encode in a base different from 10. A better choice would be to use the ordering



0=164, 1=055, 2=123, 3=133, 4=065, 5=132, 6=134, 7=155, 8=144, 9=142

that we had mentioned earlier, as it gives smaller integers. Now, Python's built-in function int() allows conversion to base 10 from other bases up to 36. Still, that would require a 2 character encoding from each triplet. A better choice is to take each triplet to be the decimal representation of an ascii character. As each triplet is a number less than 255, this is certainly possible (barring some unfortunate coincidence when a given triplet would correspond to a newline character.)


Actually, since none of the individual digits in the triplets is greater than 6, we can take them to be numbers in a base less than 10; from what I gather, most people chose 8 (as they try to do stuff with bit shifts and the like) and I chose 7 (for one of my submissions). Thus, a number like 164 is taken to mean

164 (in base 7)= 1*49 + 6*7 + 4 = 95 (in base 10). This will help further reduce the number of characters. Let's assume that everything is correct with these choices (i.e. still no newline character) and that the encoding of our triplets can be represented as



0=z, 1=o, 2=t, 3=T, 4=f, 5=F, 6=s, 7=S, 8=e, 9=n

where I have use the first letter of each number as a representation of its encoding.


As an aside, some useful comments about this (and other aspects of this problem) can be found on the following two blog entries (including comments):

Guyon Morée's blog




Roberto Alsina's blog


The corresponding program, still written with the explicit for-loops, can be written schematically as follows:



def seven_seg(i, m=''):
~for a in 49, 7, 0:
~~for k in i:
~~~~m+='~_|_|~|_~~~_~~|'[ord("z0tTfFsSen"[int(k)])/a%7*2:][:3]
~~m+='\n'
~return m

Let's examine what the innermost loop does. We convert each character in the input string into a digit, extract the corresponding ascii-encoded triplet, decode it into an actual number, divide by the appropriate power of our base and take the modulo in that base, thus extracting the relevant digit in the triplet; the result is multiplied by 2 to find the relevant index in our string of "~|_". So you can see the advantage of using a base less than 10 as it takes fewer characters to write.


To make the solution as short as possible, we rewrite it as a lambda and use join() and generators instead of explicit for loops. This also eliminates the use of an extra variable ("m"). The result can be written as:



j=''.join
seven_seg=lambda i:j(
j('~_|_|~|_~~~_~~|'[ord("z0tTfFsSen"[int(k)])/a%7*2:][:3]for k in i)
+'\n' for a in(49,7,0))

When all the extra spaces are removed, this gives a 121-character long solution. I'll refer you to the already mentioned blogs for a 120 character long solution!


To get to 117 characters, we need a slightly different approach.


We need to go back to some assumptions we made earlier. Do we really need to have even-numbered indexes? If not, we could use a shorter "~|_" type string which included all relevant 3-character substrings, thereby possibly saving one character. Let us look at such a possible string and the various indices. I will take the string of my posted 117 character long solution '~_~~~|_|_~_|~|', and extract the location of the indices of the relevant substrings:
"~~~":2, "~_~":0, "~_|":9, "|_~":7, "|_|":5, "~~|":3, "|~|":11.
With these 3-character substrings, our triplets are:


0=0-11-5, 1=233, 2=097, 3=099, 4=253, 5=079, 6=075, 7=033, 8=055, 9=059


For each triplet, instead of encoding it in a given base, we will look at a number which, when taking the modulo with three different bases, will give us back the relevant digit. By this, I mean something like the following:



165%3 = 0
165%14 = 11
165%10 = 5

i.e., the triplet for "0" could be represented by the integer 165 when taking the modulo with 3, 14 and 10 respectively - as I have done in the 117 solution. Using brute force, which is what one does when one is too tired, one can loop over the integers less than 255 and find out if all the relevant triplets can be found. Or, you notice that 11*5*3 = 165 [11%14=11, 5%10=5, 3%3=0], and you proceed from there. With this information (and the comments on the previous blog entry), you should now have all the relevant information to understand how this solution was obtained.


On that final note, I wish you a very Happy New Year!

Friday, December 30, 2005

pyContest challenge: 117 character-long solution

It has been quite a while since I did any programming; the absence of posts on this blog (nothing since the end of July) is a reflection of this. Still, with a welcome break during the Xmas holidays, I got spurred back into action with the http://www.pycontest.net/ pyContest challenge. It has been fun.

On to serious business. A 117 byte solution to the challenge is given by:

j=''.join;seven_seg=lambda z:j(j(' _ |_|_ _| |'
[ord("'\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f'"[int(a)])%u:][:3]for a in z)
+"\n"for u in(3,14,10))

where I have written the encoded string in a way that blogger would not complain about and added some line breaks as blogger breaks it up at inappropriate places. Last night, I couldn't figure out how to generate the string with the \x03 character included. (Today, after the challenge was over, I simply did print chr(3) in a Python interpreter and cut and paste the appropriate character in the string.) My attempts at generating the encoded string and printing to a file with the \x03 character included always resulted in an empty file.

So, what I did was to subtract one from each character before writing to the file ... and then change "ord" to "-~ord" to shift back the values to the right ones ... but thereby adding two extra characters. As this was already a solution shorter than those posted, I decided it was time to give up and try to get some much needed sleep.

Note that the string ' _ |_|_ _| |' is shorter than the "usual one" where the 3-character strings start on an even index.

Here's how to generate the encoded string (with 33 replacing 3; replace with cut-and-paste after):

dec_list=(165,143,177,219,173,189,105,33,75,159)
encoded = ''.join(chr(i) for i in dec_list)

These decimal numbers encode the following
>>> 165%3
0
>>> 165%14
11
>>> 165%10
5

Thus the number '0' is represented by (0, 11, 5) which are the beginning indices of 3-character strings in ' _ |_|_ _| |'. Thus,
'0' = (' _ ', '| |', '|_|').

I was going to wait until the official announcement to post this but thinking of how curious I was earlier on to eventually find out what I thought would be the winning solution (120 character), I decided it was time to do it.

Note For a more detailed explanation, see the next entry at Journey to 117

Friday, July 29, 2005

Poor man's i18n

Internationalization (i18n) of a [Python] program appears to be daunting task. After looking on the web, I haven't been able to find a simple tutorial explaining the steps to be taken; the closest that I have found is the wxPython one, on the wxPython wiki. The procedure described appears to be rather involved, even if one only deals with translations of text (and not localization of dates, currencies, etc.), which is what I will focus on.

The first step is the replacement of all strings of the form:
"This needs to be translated"
by the following call (interpreted to be a C macro in Gnu's gettext)
_("This needs to be translated")
which is very simple.

The standard way then requires the use of gettextand results in the creation of ".pot" files (Portable Object Templates) to be copied and translated in ".po" (Portable Object) files by a human translator; these are then converted into ".mo" (Machine Object) files by a compiler. Yet, a few more steps, mostly dealing with directory structures and setting up locales, are needed before one can conclude the process.

I present here a simpler way to proceed which, at a later time, can easily be converted to the standard gettext method as it basically uses the same "_()" notation required by gettext. This was inspired by a comp.lang.python post by Martin v. Löwis to a question I asked almost a year ago.

Consider the following simple (English) program [Note: "." are used to indicate indentation as Blogger appears to eat up leading spaces]
def name():
....print "My name is Andre"

if __name__ == '__main__':
....name()
The French translation of this program would be

# -*- coding: latin-1 -*-
def name():
....print u"Je m'appelle André"

if __name__ == '__main__':
....name()
Without further ado, here's the internationalized version using a poor man's i18n method and demonstrating how one can easily switch between languages:

from translate import _, select

def name():
....print _("My name is Andre")

if __name__ == '__main__':
....name()
....select('fr')
....name()
....select('en')
....name()
The key here is the creation of the simple translate.py module:
__language = 'en'  # leading double underscore to avoid namespace collisions

def select(lang):
....global __language
....__language = lang
....if lang == 'fr':
........global fr
........import fr
def _(message):
....if __language == 'en':
........return message
....elif __language == 'fr':
........return fr.translation[message]
together with the language-specific fr.py module containing a single dictionary whose keys are the original English strings.:
# -*- coding: latin-1 -*-

translation = {
"My name is Andre" : u"Je m'appelle André"
}
That's it! Try it out!

In conclusion, if you want to make your programs translation friendly, all you have to do is:
  1. replace all "strings" by _("strings")
  2. include the statement "from translate import _, select" at the beginning of your program.
  3. create a file named "translate.py" containing the following:
def select(lang):
....pass

def _(message):
....return message
and leave the rest to the international users/programmers; you will have done them a huge favour!

Saturday, May 21, 2005

N = N + 1

When first learning about programming, most people get confused by expressions like
N = N + 1
The origin of the confusion is easy to understand. Expressions like
N = N + 1 often occurs in a computer program dealing with mathematical concepts. And, in mathematics, the expression
N = N + 1
is just plain wrong. Ergo the confusion. To get around the confusion, some programming language introduce a different symbol for "variable assignment". Something like ":=" , or "<--" can be used for that purpose. Python uses the equality symbol for variable assignments. As a result, beginners are often presented with the baffling N = N + 1 before they have understood enough about the language to grasp quickly what is meant by that expression. I suggest that examples based on a "human language context" as opposed to a traditional "mathematical context" could be more useful to teach the meaning of variable assignment.

1. A little story


John, Pierre and Antonio are three good friends that share a house. John is originally from England, Pierre from France and Antonio from Spain.

One day, John was speaking in English to Pierre. During that conversation, John said the word "once", referring to an event that occurred a single time. Pierre translated this concept in his mind as "une_fois".

>>> once = 1
>>> une_fois = once

>>> print 'once = ', once, ' une_fois = ', une_fois
once = 1 une_fois = 1

Later that day, John was speaking in Spanish to Antonio. He mentioned his favourite number: "once" (11), made up of the same letters as the English word "once".

>>> once = 11

However, the conversation that John is having with Antonio has not changed the meaning of "une_fois" for Pierre.

>>> print 'once = ', once, ' une_fois = ', une_fois
once = 11 une_fois = 1

The various words in human languages are associated with concepts or objects that are independent of the words used to represent them. So, "une_fois" is associated with the number 1, through its association with the word "once". However, "once" is just a word, not an object. So, when "once" takes on a different value later, the concept represented by "une_fois" is unchanged.

2. The meaning of = in Python

In Python the symbol "=" associates a synonym on its left hand side to a concept on its right hand side.

synonym = concept/object

W
ords that are chosen as synonyms, are chosen in a context dependent way. In our later story above, names are chosen based on the "human language context". The same combination of letters can mean two different things in different languages, or even in the same language. I remember a little test a friend of mine did in grad school, showing parts of a newspaper headline to fellow classmates (all science geeks) and asking them to quickly read it aloud. The word "unionized" appeared in the headline. Now, think "labour" and "chemistry" and you will think of two different way of pronouncing this word.

Now, in the typical programming situation, "N+1" represents a mathematical concept. We can give it a (temporary) synonym of our own chosing, depending on the context...

3. Another short story

John makes a groceries shopping list, with two items: "bananas" and "pears". He then tapes it on the fridge door.

>>> groceries = ["bananas", "pears"]
>>> epicerie = groceries
>>> print groceries
['bananas', 'pears']
>>> print epicerie
['bananas', 'pears']
Later that day, Pierre walks by and notices the list too (une liste d'épicerie) which he associates with the word epicerie. Still later that day, John adds "apples" to the list on the fridge door. He notices that Antonio had already bought pears and remembers that they also need oranges. He thus scratches the item "pears" and replace it by "oranges".

>>> groceries.append("apples")
>>> print groceries
['bananas', 'pears', 'apples']

>>> groceries[1]="oranges"
>>> print groceries
['bananas', 'oranges', 'apples']

>>> print epicerie
['bananas', 'oranges', 'apples']
The list has changed throughout the day, but it remains taped to the fridge. When they think about the grocery list, both John and Pierre are referring to the same object in their own language. When that object changes, it changes for both of them in the same way.

4. Yet another short story

John makes a groceries shopping list, with two items: "bananas" and "pears". He then tapes it on the fridge door. Later that day, Pierre walks by and notices the list and decides to make his own copy.
>>> groceries = ["bananas", "pears"]
>>> epicerie = list(groceries)

Still later that day, John adds "apples" to the list on the fridge door. He notices that Antonio had already bought pears and remembers that they also need oranges. He thus scratches the item "pears" and replace it by "oranges".

>>> groceries.append("apples")
>>> groceries[1] = "oranges"
>>> print groceries
['bananas', 'oranges', 'apples']
Pierre's list, however, is unchanged.
>>> print epicerie
['bananas', 'pears']
Let's just hope that Pierre won't go out and buy pears!

5. Conclusion?

Your comments are going to provide a much better conclusion than I could write at this point!

Wednesday, May 18, 2005

News about rur-ple

Things now seem to happen all at once on the rur-ple front.

A. Judkis at the Academy of Allied Health and Science, in Neptune NJ, had decided to inflict RUR-PLE on his 10th graders. The response has been so far very positive from the students from what I hear.

Someone contacted me offering to translate RUR-PLE in Spanish. Hopefully this can be done before version 1.0 is released.

A number of bugs were introduced accidently in version 0.8.5 - mostly due to the language selection ability (English/French) that was added. These bugs never manifested themselves on my computer, but they seem to affect everyone else! I suspect it's because I have a French version of Windows. These bugs should now be fixed in version 0.8.6a.

I have a neat (I think) Tower of Hanoi solution worked out and included (undocumented) in version 0.8.6a. This version also includes a solution to the 8 queen puzzle (also undocumented).

Unfortunately, I will be on the road for 3 weeks in June, and will not have time to do significant work on RUR-PLE for a while.

Tuesday, May 10, 2005

rur-ple version 0.8.5 released

Version 0.8.5 of rur-ple has been released.
The web site has completely changed; it includes over 35 new pages.
http://rur-ple.sourceforge.net/

--------------------------------------
Learning to program computer should be fun, for adults and children alike. RUR-PLE is an environment designed to help you learn computer programming using the language Python. To use RUR-PLE, you need wxPython. You can learn more about RUR-PLE or you can go to the download page.

=======
Apprendre à programmer devrait être amusant,
que l'on soit un adulte ou un enfant.

RUR-PLE est un environnement conçu pour vous aider à apprendre la programmation informatique avec le langage Python. Pour utiliser RUR-PLE, vous aurez besoin de wxPython. Vous pouvez en apprendre davantage au sujet de RUR-PLE ou vous pouvez aller à la page de téléchargement.
==================

(This morning, I got an invitation to talk about rur-ple at EuroPython next June. Did I dream this? ... Reality has not sunk in yet: I'm still floating above the ground. :-)

Saturday, May 07, 2005

Getting closer to version 1.0 of rur-ple

I just release version 0.8 of my project, rur-ple, on sourceforge. It looks even better than what I dreamed it would when I first started working on it:
Teaching computer programming

I do find it difficult to deal with the version number issue. On the one hand, I feel the software part has all the features I wanted (and more) working when I started this project. In that sense, it is past a version 1.0.

On the other hand, the 'ple' part in rur-ple means Python Learning Environment. I mean for this to be a 'complete' tutorial in learning about programming with Python. The lessons I have written so far only cover the use of about half the Python keywords. So, I can't bring myself to calling it version 1.0. So, I've sort of settled on adding a '0.01' to the version number for each additional keyword/programming concept I manage to introduce. However, somehow, I feel that giving a sub- 1.0 number to the version is sending the message that the software part is somehow buggy or incomplete, when it is (almost) bug-free and certainly feature complete.

On a somewhat unrelated note, I have found a solution on my own as to how to
change the language used in a wxPython-based GUI APP without having to restart the application under Windows. As a friend pointed out, by not using the common I18N starndard (gettext, .po and .mo files), it might deter people from adapting it to languages other than English and French which are currently working. However, I feel uncomfortable with what seems to be an underlying assumption about the "standard" approach.

As I understand, the standard approach is based on looking for some locale information on the user's machine. If the local language is available, then the app uses that language, otherwise it reverts to the default - usually English. But what if the user is a native Spanish speaker who has some knowledge of Italian and French, but essentially none in English. And what if translation are available in Italian, French but not in Spanish. As I understand it, the average user will be presented with an English interface, and will normally have no clue that French and Italian interfaces are available. Or, if she does, she might not know how to switch locale on her computer. (I don't ... as I never needed to myself. Can it be done easily under Windows XP - just a rhetorical question; answer not really needed :-).

The approach I use is to have the available languages in a pull down list (with flags as icons on the side) AND the language names appear in that language, i.e. English, Français, etc. (and not English, French ... or anglais, français).

This strikes me as being more sensible.

Adapting/translating a French expression: It is my own opinion, and I share it completely with myself! C'est mon opinion, et je la partage entièrement avec moi-même.

Friday, April 08, 2005

Computing derivatives using Python

In a recent thread on Python edu-sig, Kirby Urner started a discussion about the use of python's decorators in mathematics. Some of the examples included numerically computing derivatives (and integrals) of functions. This generated a fair bit of discussion and the general conclusion was that this was not a good use of decorators. Some information contributed by yours truly required the reader to do a graph to properly visualise exactly what was going on. I thought it appropriate to present a summary that included the required graphical information here as a permanent record.

The mathematical definition of the derivative of a function f(x) at point x is to take a limit as "h" goes to zero of the following expression:

( f(x+h) - f(x) ) / h

An example is given below: the red curve is the function f(x) and the green curve is the tangent line at x (with same slope as the derivative). The blue curve is a line going through x whose slope equals that of the above formula with a non-zero value of h.



As you can see, the slopes of the two straight lines are quite different. A better way of numerically computing the derivative for the same value of "h" is by taking a symmetrical interval around x as follows:

( f(x + h/2) - f(x - h/2) ) / h

This is illustrated in the next picture. As you can see, the two straight lines have much more similar slopes - hence, the value computed for the derivative is going to be more accurate.




The corresponding python code is as follows:

def derivative(f):
...."""
....Computes the numerical derivative of a function.
...."""
....def df(x, h=0.1e-5):
........return ( f(x+h/2) - f(x-h/2) )/h
....return df

And we use it as follows:

# sample function
def g(x): return x*x*x

# first derivative
dg = derivative(g)

# second derivative
d2g = derivative(dg) # == derivative(derivative(g))

# printing the value computed at a given point:

print dg(3)
print dg(3, 0.001)
print dg(3, 1e-10) # smaller h is not always more precise

Sunday, March 27, 2005

Changing blog's name

Originally, it was my intention to have a blog to post alternate viewpoints (hence the former name Un autre point de vue) as well as some Python musing. Since this blog has been listed in both
Planet Python and
Daily Python-URL, I have felt that non-Python related stuff didn't really belong here. I may revive the old title in a separate blog later...

James Tauber : Simple Algorithm for Recurring Tasks

James Tauber : Simple Algorithm for Recurring Tasks talks about a product (sciral) for keeping track of recurring tasks with priorities roughly based on how late they are. Sciral is now also available for Windows, which Tauber didn't mention. This looks like a simple thing to do in python; I'll have to try to implement it using wxPython.

Thursday, March 24, 2005

First looks: Text Processing in Python

I just received an autographed copy of
Text Processing in Python (a book) from David Mertz (Thanks David; great service!). While the content of this book is available (in ascii format) on the author's website, I find that there is nothing like having a copy in hand, especially when it is a good quality paperback (as expected from Addison Wesley).

There is a lot of material in that book; while I am an avid reader, I suspect that it will take me quite a while to get through it. If you can't wait to learn more about this book, you might want to have a look at
this review.

Wednesday, January 26, 2005

Useful wxPython info

In an old blog, that I just stumbled upon, Golden spud gives some interesting information about wxPython. Perhaps it is possible to use his technique to change the language used in a GUI without having to restart the application under Windows.

Tuesday, January 11, 2005

Teaching an old Python keyword new tricks

In my last post, I provided some additional thoughts about maintaining Python's pseudocode appearance while adding static typing information. The notation I suggested also allowed for easy inclusion of pre-conditions and post-conditions. However, this came as a cost of adding where as a new keyword to Python. After giving some more thoughts to this idea I realise that a new keyword was not needed at all. This can be seen starting from the following example using the keyword where as introduced before:
01    def gcd(a, b):

02 where:
03 assert isinstance(a, int)
04 assert isinstance(b, int)
05 '''Returns the Greatest Common Divisor,
06 implementing Euclid's algorithm.
07 Input arguments must be integers.'''
08 while a:
09 a, b = b%a, a
10 return b
If we remove the line with where: and change the indentation level, we have essentially valid (today!) Python syntax, but with many (two in this case) lines starting with the keyword assert. If we allow this keyword to introduce the beginning of a bloc of statements, we could then write:
01    def gcd(a, b):

02 assert:
03 isinstance(a, int)
04 isinstance(b, int)
05 '''Returns the Greatest Common Divisor,
06 implementing Euclid's algorithm.
07 Input arguments must be integers.'''
08 while a:
09 a, b = b%a, a
10 return b
The idea of a keyword that allows both a bloc structure with a colon, as well as a single line expression is not new: it has been introduced in Python with the addition of list comprehensions. For example, we can have
01    for i in range(10):

02 ...
03 ...
as well as
a = [i for i in range(10)]

With this suggested syntax, pre-conditions can easily be added.
01    def gcd(a, b):

02 assert:
03 isinstance(a, int)
04 isinstance(b, int)
05 a != 0 and b != 0 # fake pre-condition
06 '''Returns the Greatest Common Divisor,
07 implementing Euclid's algorithm.
08 Input arguments must be integers.'''
09 while a:
10 a, b = b%a, a
11 return b
Keeping this block structure in mind, we can add type information on return values as follows:
01    def gcd(a, b):

02 assert:
03 isinstance(a, int)
04 isinstance(b, int)
05 return c assert:
06 isinstance(c, int)
07 '''Returns the Greatest Common Divisor,
08 implementing Euclid's algorithm.
09 Input arguments must be integers;
10 return value is an integer.'''
11 while a:
12 a, b = b%a, a
13 return b
as well as adding pre- and post-conditions:
01    def gcd(a, b):

02 assert:
03 isinstance(a, int)
04 isinstance(b, int)
05 a != 0 and b != 0 # fake pre-condition
06 return c assert:
07 isinstance(c, int)
08 c != 0 # fake post-condition
09 '''Returns the Greatest Common Divisor,
10 implementing Euclid's algorithm.
11 Input arguments must be integers;
12 return value is an integer.'''
13 while a:
14 a, b = b%a, a
15 return b
To move further towards something similar to what I suggested in my previous post, we need to identify, as Guido van Rossum had himself suggested, the statement
isinstance(object, class-or-type-or-tuple)

with
object: class-or-type-or-tuple

Doing so allows us to rewrite the last example as
01    def gcd(a, b):

02 assert:
03 a: int
04 b: int
05 a != 0 and b != 0 # fake pre-condition
06 return c assert:
07 c: int
08 c != 0 # fake post-condition
09 '''Returns the Greatest Common Divisor,
10 implementing Euclid's algorithm.
11 Input arguments must be integers;
12 return value is an integer.'''
13 while a:
14 a, b = b%a, a
15 return b
which is essentially what was written in the previous post, but with the replacement of a new keyword (where) by an old one (assert). The general form would be:
01    def foo(...):

02 assert:
. type information and/or
. pre-conditions
. return [...] assert:
. type information and/or
. post-conditions
. '''docstring'''
. ...body...
As this post is already long enough, I will not discuss the interface issue here; the ideas introduced in the previous two posts are easily translated by replacing where by assert everywhere.

Sunday, January 09, 2005

Python as pseudo-code and Optional Static Typing

In my last post, I gave a series of examples intended to show how Python's pseudocode syntax could be preserved while adding optional static typing. While I felt that the syntax was self-explanatory (as pseudocode should be), I thought it might be useful to examine in more details the why's and the how's of my suggestion.

Before proceeding further, it might be argued that this is a case of premature optimization: Python does not currently have (optional) static typing so the concept should be discussed first, before implementation details and syntax can be looked at objectively. Yet, one of the strength of Python is its syntax, and many objections (NIMPY as GvR described it) to the proposed addition(s) to the language came from an unease with the suggested syntax.

1. Functions and optional typing information

I will use GvR's version of a Greatest Common Divisor finding function as an example and build from it; the complexity of the examples will increase as we go in an attempt to fully illustrate the suggested syntax. This function is defined as follows:

01 def gcd(a, b):
02 '''Returns the Greatest Common Divisor,
03 implementing Euclid's algorithm.'''
04 while a:
05 a, b = b%a, a
06 return b
I have included an optional docstring which, following Python's normal syntax, is indented at the same level as the function's body. The docstring is intended to be easily extractable for documentation purpose and is intended for human readers only. Suppose that it is required to provide static typing information. Using the notation I used in my previous post, here's how one might write additional syntax information.
01    def gcd(a, b):

02 where:
03 a: int, b: int
04 return c where:
05 c: int
06 '''Returns the Greatest Common Divisor,
07 implementing Euclid's algorithm.
08 Input arguments must be integers;
09 return value is an integer.'''
10 while a:
11 a, b = b%a, a
12 return b
The suggested where keyword, followed by a mandatory colon, follows the usual Python syntax of introducing a bloc of code identified by its indentation level. So we now have three structural items at the same indentation level: the static typing information, the docstring, and
the function body.

The docstring is between the typing information and the actual code, allowing to better separate visually the two of them. The docstring has been updated as well to inform a potential user
reading the documention as to the particular of this function. I have given some thoughts about including the typing information within the docstring (à la doctest) in order to avoid writing about the type information twice, as in the following:
01    def gcd(a, b):

02 '''Returns the Greatest Common Divisor,
03 implementing Euclid's algorithm.
04 where:
05 a: int, b: int
06 return c where:
07 c: int'''
08 while a:
09 a, b = b%a, a
10 return b
However, I find 1) that it is not easier to read; 2) that it would complicate extraction of the docstring for inclusion in a standard documentation (i.e. spaces are significant for only part
of the docstring); 3) it doesn't "scale" well with added features like pre-conditions and post-conditions.

2. Pre- and post-conditions

Building on the previous examples, here is how pre- and post-conditions might might be added within the suggested new syntax with, in this case, somewhat artificial conditions (not required by this algorithm).
01    def gcd(a, b):

02 where:
03 a: int, b: int
04 assert a != 0 and b != 0
05 return c where:
06 c: int
07 assert c <= abs(a) and c <= abs(b)
08 '''Returns the Greatest Common Divisor,
09 implementing Euclid's algorithm.
10 Input arguments must be integers,
11 and can not be both zero;
12 return value is an integer.'''
13 while a:
14 a, b = b%a, a
15 return b

The post-condition assertion

07 assert c <= abs(a) and c <= abs(b)

is an internal verification that the code is correct. It is irrelevant to the documentation which is one more reason to separate it from the docstring, even though such information is sometimes including in docstring. This notation, using the standard assert statement is different from what I suggested in my previous post and respects Python's current syntax.

One can easily imagine that such typing information and conditions might be shared by many functions. Rather than replicating code, it might be possible to define abstract functions, which are functions that have no body, only type information and/or conditions. For example, we could have
01    def two_integer_input(i, j):  # abstract function: no body

02 where:
03 i: int, j: int
04 '''Accepts only two integers as input.'''
05
06
07 def _gcd_return_info(a, b): # abstract function: no body
08 where:
09 return c where:
10 c: int
11 assert c <= abs(a) and c <= abs(b)
12
13
14 def gcd(a, b):
15 where:
16 two_integer_input(a, b)
17 assert a != 0 and b != 0
18 _gcd_return_info(a, b)
19 '''Returns the Greatest Common Divisor,
20 implementing Euclid's algorithm.
21 Input arguments must be integers,
22 and can not be both zero;
23 return value is an integer.'''
24 while a:
25 a, b = b%a, a
26 return b

I realise that this might be starting to look like a return to past suggestions during the decorator debate. However, given that the optional static typing proposal is new, it should be looked at objectively, without rejecting off-hand proposals that were rejected in a different context.

3. Interfaces

Instead of dealing with functions, in oop we have classes and methods. Classes can implement interfaces, which share some similarities with what we called abstract functions above. Here is how, using the notation we have suggested, we might introduce interfaces in the language. Once again I am using examples suggested by Python's BDFL in his blog.
01    interface I:

02 def foo(x):
03 where:
04 x: int
05 x > 0
06 return r where:
07 r: str
08
09 class C(object; I): # derived from object; implements I
10 def foo(self, x):
11 return str(x)

I suggest using a semi-colon to separate based classes from interfaces for clarity. Thus we could have:
01    interface I1(I2, I3):   # derived from I2 and I3, both interfaces

02 def foo(a: t1, b: t2):
03 where:
04 a: t1, b: t2
05 return c where:
06 c: t3
07
08 class C(C1, C2; I1): # derived from C1, C2; implements I1
09 def foo(a, b):
10 return a+b
For a class that does not implement interfaces, the following statements would be equivalent
class C(object):  # preferred

# and
class C(object;): # allowed

Classes that do not derived from a base class (aka 'old-style') might still be accommodated via
class C(; I1, I2)

although presumably they will have disappeared from the language by the time static typing information will have been implemented.


Friday, January 07, 2005

"where" keyword and Python as pseudo-code

There has been much discussion lately about what is now a series of blogs by Guido van Rossum (see Optional Static Typing -- Stop the Flames! for the latest discussion) about optional static typing in Python and related extensions to the language. A recent post
comp.lang.python
by Andrey Tatarinov about using where as a new keyword in the absence of anonymous functions gave me some ideas about how such a new keyword could be useful in rendering pythonic the addition of optional static typing. However, I do not use where in the same context as Tatarinov.

Python is often described as executable pseudo-code. Therefore, rather than explaining in words my ideas, I will let the code (examples taken from Python's BDFL blogs) speak for itself. Note that the amount of code versus type declarations is not really representative of real life in the following examples. I also deliberately did not add blank lines, which could have made this more readable.

1. Optional typing.

def gcd(a, b):
where:
a: int, b: int
return c where:
c: int
while a:
a, b = b%a, a
return b

2. Optional typing with pre- and post- conditions.

def gcd(a, b):
where:
a: int, b: int
a > 0, b > 0
return c where:
c: int
c > 0
c <= a and c <= b
while a:
a, b = b%a, a
return b

3. Interfaces

interface I:
def foo(x):
where:
x: int
x > 0
return r where:
r: str

class C(object; I): # derived from object; implements I
def foo(self, x):
return str(x)

4. More interfaces

interface I1(I2, I3):
def foo(a: t1, b: t2):
where:
a: t1, b: t2
return c where:
c: t3

class C(C1, C2; I1): # derived from C1, C2; implements I1
def foo(a, b):
return a+b

5. Formerly anonymous functions.

list_of_powers = [ powers(x) for x in my_list]:
def powers(x):
return (x, x*x, x*x*x)

6. Formerly anonymous functions with typing information

list_of_powers = [ powers(x) for x in my_list]:
def powers(x):
where:
x: int
return (a, b, c) where:
a: int, b: int, c: int
return (x, x*x, x*x*x)

7. Other optional typing style, meant to be equivalent to above

def type_safe_gcd(a, b):
where:
a: int, b: int
return c where:
c: int

def gcd(a, b):
where type_safe_gcd(a, b);
while a:
a, b = b%a, a
return b

If you "grok" the above, I have made my point. If not, feel free to comment!


Sunday, January 02, 2005

Avoiding spaghetti code

Note to myself:

functions/methods should either
1) change some internal attributes;
2) return a value/object
3) change some attribute of an object created by their parent class.

They should not directly change the state of some objects that calls them. I had the following situation:
X created Y.
X created Z.
A method of X called a method of Z which was then changing the state of an attribute of Y through a pointer reference created for that purpose. Ugly.... and difficult to maintain.

wxPython problem solved

Well, I did manage (with some help) to solve my wxPython problems. I'll update this post to indicate the solution soon.