Archive for the ‘software’ Category

PyGuile - Part 2 - Design Issues

Saturday, September 20th, 2008

While working on PyGuile, I identified the following design issues.

  1. The data type trees of Scheme and Python do not have a 1:1 correspondence.
    • Do we want to convert a Scheme list into a Python Tuple or a Python List?
    • How about an alist (association list) - should it be a Python List of 2-tuples or a Python Dict?
    • And in the other direction - do we want to convert a Python string into a Scheme string, symbol or keyword?
  2. API for adding plugins which convert between Guile and Python representations of useful data types (such as file handles, images or Berkeley sockets).
  3. How do we want to pass large data structures - convert them immediately, or employ lazy conversion (convert an element only when it is requested)? If we employ lazy conversion, how do we implement the associated bookkeeping? See more about this below.
  4. How do we deal with the different garbage collection regimes of Guile and Python? In particular, how do we make SCM objects owned by Python objects known to the Guile garbage collector?
  5. How will we support Unicode? Bear in mind that we want to minimize manipulations of long text strings.
  6. How do we allow each scripting language to seamlessly invoke functions in the other scripting language?

The problem of lack of 1:1 correspondence will be dealt with as follows.

A standard conversion convention, which will work for the overwhelming majority of cases, will be employed. Functions with special needs will have their argument conversions specified by means of a suitable tree-structured template.
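
As a rough illustration of what a tree-structured template could look like, here is a minimal Python sketch. It is not PyGuile's actual template format: the node names ("list", "tuple", "dict") and the function convert() are hypothetical, and the Scheme side is only modelled, with a Scheme list shown as a Python list and an alist as a list of 2-tuples.

```python
def convert(value, template):
    """Convert value according to a tree-structured template.

    A template is either a callable (a leaf converter) or a tuple whose
    first element names a container kind. All names here are hypothetical.
    """
    if callable(template):
        return template(value)
    kind = template[0]
    if kind == "list":
        return [convert(item, template[1]) for item in value]
    if kind == "tuple":
        return tuple(convert(item, template[1]) for item in value)
    if kind == "dict":
        # An alist is modelled as a sequence of (key, value) pairs.
        return {convert(k, template[1]): convert(v, template[2])
                for k, v in value}
    raise ValueError("unknown template node: %r" % (kind,))


# An alist whose values are lists, converted to a dict of tuples.
alist = [("a", [1, 2]), ("b", [3])]
result = convert(alist, ("dict", str, ("tuple", int)))
```

The same alist could be converted with a different template, for example ("list", ("tuple", str)) applied to each pair, which is the kind of per-function choice the template is meant to express.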

When passing a data structure (or object) created in language A to language B, the following cases can happen:

  1. Opaque pointer - B only passes it around. A performs all processing and B just holds the pointer for future reference.
  2. B accesses a single element (or small number of elements) in the data structure.
  3. B loops over all elements of the data structure.
  4. B needs arbitrary access to several elements of the data structure (example: image processing).

Those cases can be dealt with as follows:

  • Case 1 can be handled by wrapping a language A pointer by a language B object, which carries opaque data around.
  • Cases 2 and 3 can be dealt with by means of custom data access procedures (such as Python’s __getitem__()). An element will be converted only when it is actually requested. Elements in nested data structures can be dealt with as in case 1.
  • Case 4 can be handled by implementing a mechanism for plugging in and registering custom conversion functions for specific data types.
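
The lazy conversion of cases 2 and 3 can be sketched in Python as follows. This is only an illustration under assumptions: the class LazySequence and its convert argument are hypothetical names, and the foreign data structure is modelled by an ordinary Python sequence rather than by a real SCM object.

```python
class LazySequence:
    """Wraps a foreign sequence and converts elements only on request."""

    def __init__(self, foreign, convert):
        self._foreign = foreign    # stand-in for a language A data structure
        self._convert = convert    # per-element conversion function
        self._cache = {}           # bookkeeping: index -> converted element

    def __len__(self):
        return len(self._foreign)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported in this sketch")
        if index < 0:
            index += len(self._foreign)
        if not 0 <= index < len(self._foreign):
            raise IndexError(index)
        if index not in self._cache:
            self._cache[index] = self._convert(self._foreign[index])
        return self._cache[index]


converted = []                     # records which elements were converted


def double(x):
    converted.append(x)
    return x * 2


seq = LazySequence([1, 2, 3], double)
second = seq[1]                    # converts only the requested element
```

Looping over seq (case 3) also works, because Python falls back to calling __getitem__() with 0, 1, 2, ... until IndexError is raised. The cache is the simplest possible bookkeeping; it assumes that the foreign data structure does not change after it was wrapped.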

In practice, the toughest design issue that I have identified so far is the management of the SCM objects owned by Python objects.

When an SCM object is assigned to an attribute of a Python object, some registration mechanism needs to be invoked so that the SCM object can be reclaimed by the Guile garbage collector if the Python object goes out of scope. The registration mechanism also needs to take care of marking the SCM objects while they are owned by a living Python object.
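
One possible shape for such a registration mechanism is sketched below in pure Python. It is a model, not PyGuile code: ProtectionTable stands in for whatever the Guile side uses to keep objects alive, and Owner and own() are hypothetical names. The sketch relies on weakref.finalize from the Python standard library to run the unregistration when the owning Python object is collected.

```python
import weakref


class ProtectionTable:
    """Stand-in for the objects that the Guile collector must treat as live."""

    def __init__(self):
        self._entries = {}  # id(scm) -> [scm, number of owners]

    def protect(self, scm):
        entry = self._entries.setdefault(id(scm), [scm, 0])
        entry[1] += 1

    def unprotect(self, scm_id):
        entry = self._entries[scm_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[scm_id]

    def is_protected(self, scm):
        return id(scm) in self._entries


class Owner:
    """Hypothetical Python object that owns SCM objects through attributes."""


def own(table, owner, name, scm):
    """Assign scm to owner.<name> and keep it protected while owner lives."""
    table.protect(scm)
    setattr(owner, name, scm)
    # The callback must not reference owner, or owner would never be collected.
    weakref.finalize(owner, table.unprotect, id(scm))
```

After own(table, owner, "value", scm), the SCM stand-in stays in the table for as long as owner is alive, and is dropped from it once owner is collected. The sketch does not handle reassigning or deleting the attribute, which a real mechanism would have to cover as well.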

PyGuile - Part 1 - Using Python libraries in Guile (a Scheme implementation) scripts

Friday, September 19th, 2008

For a long time I have dreamt of invoking Python libraries from scripts written in Scheme. The reason for this is to be able to enjoy the fantastically rich control structures possible in Scheme, yet use familiar libraries to accomplish useful actions, some of which are unavailable in SLIB and other Scheme libraries.

Now at last I am working on realizing this dream. The Scheme implementation being used is version 1.6 of Guile and the Guile extension being developed embeds a Python 2.4 interpreter. In the future, more recent versions of Guile and Python will be used.

The goals of the project are:

  1. Make it easy to invoke Python libraries from Guile.
  2. The integration between Python and Guile is to be seamless.
  3. The architecture of the implementation shall enable optimizations for efficient runtime behavior.

To accomplish those goals, it is necessary to:

  1. Convert primitive Scheme data types (integers, reals, Booleans, strings, lists) into the corresponding Python data types, and vice versa.
  2. Be able to invoke functions defined in one language from the other language. This has to be bidirectional in order to support callbacks.
  3. Be able to pass around pointers to objects (as opaque values) and invoke methods over them.
  4. Have efficient transfer of control and data between both languages.
  5. Deal with different garbage collection conventions in both environments.
  6. Be able to optimize code for a particular pair of language runtime systems.
  7. Nice to have: support for recursion, especially tail recursion.
  8. Nice to have: thread-safety.

It is envisioned that the software developed in this project will be part of a larger system, which will allow more scripting languages to interoperate with Guile and with each other.

There is another project - Schemepy - which embeds a Scheme interpreter in Python scripts. This project has a different focus: it essentially allows Scheme to be used for those parts of a project in which its strengths are especially important.

A Vista Conspiracy Theory

Saturday, July 19th, 2008

One possible reason for the stupidity of Microsoft in handling MS-Vista, especially in its attempts to ram MS-Vista down its customers’ throats instead of MS-Windows XP, is as follows.

Shortly after SCO sued IBM and other companies, alleging that Linux violated its copyrights, IBM and possibly other big companies decided upon a two-pronged counterattack. First, they would fight SCO in court to the bitter end.

The conspiracy theory espoused here has to do with the second prong. The goal here is to cause Microsoft to bleed as much money as possible, as quickly as possible, so that it will not have the financial means to continue to support SCO until the defendants are worn out.

For this purpose, moles may have been installed in Microsoft (or maybe Microsoft employees were bribed) to deliberately make the wrong managerial decisions, to sap the morale of the working software developers, to entangle the projects in cobwebs, to bog the projects down in intricate dependencies and frivolous compatibilities with the past, and to surrender too easily to Hollywood moguls when they ask for DRM measures to be built into MS-Vista.

Since Microsoft had the fatal combination of a de facto monopoly position and huge cash reserves, both had to be attacked. The monopoly position was attacked by making MS-Vista incompatible with MS-Windows XP, so that people would find it just as easy to switch to Linux or to Mac OS as to switch to MS-Vista. The cash position was attacked by turning MS-Vista into a huge cash drain.

A new software developers’ mutual help Web site (no longer) rudely excludes deaf software developers

Thursday, April 17th, 2008

The newly announced http://www.stackoverflow.com/ Web site confines all communications to the audio format. No provision for textual transcription of the audio podcasts exists. Users’ submissions are accepted only if they are in audio format. This is probably the founders’ newest idea for filtering out spam and flames.
However, it is a case of rude inaccessibility. Please do not contribute and do not browse the Web site - and let the founders know your opinion about this case of throwing out the baby with the bathwater.
The announcements in the founders’ blogs are as follows:

What next, a Web site, which excludes gay software developers?

23 APR 2008 UPDATE:

The podcasts are now transcribed into text, making them accessible to the deaf as well as helpful to people who want to discover them using search engines and to people who have no time to listen through an entire podcast.
The transcription mechanism is Wiki-based, allowing people to transcribe text piece by piece. So even if you have only 15 minutes to spare, you can still make a contribution.
It is still necessary to persuade them to accept questions as text in addition to sound clips…

Choosing a Python module for accessing Microsoft SQL Server Unicode data

Tuesday, December 25th, 2007

One day I found myself in need of Python code that retrieves Unicode data from Microsoft SQL Server tables. The code needs to run on a PC with MS-Windows XP.

The dbi and odbc modules, which I used in the past, failed miserably at this task, because they force the Unicode data to be converted into string data using the ASCII codec.

So, I had to look for other Python modules. My findings from evaluating the relevant Python modules are summarized below.

dbi,odbc from pywin32
  • Package: pywin32-210.win32-py2.5.exe, available from Python for Windows Extensions.
  • Textual data is passed as strings, rather than as Unicode.
  • Parameters in SQL queries are marked by ‘?’.
  • Dates/times are retrieved as instances of the dbi.dbiDate class (essentially, a wrapped long int).
win32com
I was not successful in using the win32com based code, which worked for Arik Baratz. According to him, this code uses the Microsoft ActiveX Data Objects 2.8 Library. It requires the modified version 209.1 of pywin32, which comes with version 2.5.1.1 of the ActiveState Python distribution. This modified version adds to the win32com class an extra member - client.
You need to add the following line sometime after the import win32com:

win32com.client.gencache.EnsureModule('{DA9159C1-C9D5-4223-81AC-2B1080701D12}',0,1,0)

To actually start working, use win32com.client.Dispatch() to establish a connection to the SQL Server.

pymssql
pyodbc
  • Package: pyodbc-2.0.39.win32-py2.5.exe, available from pyodbc - A Python DB API module for ODBC
  • Textual data is passed as Unicode.
  • Parameters in SQL queries are marked by ‘?’.
  • Dates/times are retrieved as instances of the datetime.datetime class.

The Python module chosen is pyodbc.
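
The ‘?’ parameter markers listed above are the “qmark” parameter style of the Python DB API, which the standard sqlite3 module also uses. The following sketch therefore uses sqlite3 with an in-memory database, so that it runs without an SQL Server; it is not pyodbc code. The table people and the function fetch_names() are made up for the illustration. With pyodbc, one would presumably obtain the connection from pyodbc.connect() with a suitable connection string and leave the query code unchanged.

```python
import sqlite3


def fetch_names(connection, min_id):
    """Return the names of all rows whose id is at least min_id."""
    cursor = connection.cursor()
    # The '?' marker is filled in from the parameter tuple by the driver.
    cursor.execute(
        "SELECT name FROM people WHERE id >= ? ORDER BY id", (min_id,))
    return [row[0] for row in cursor.fetchall()]


# Hypothetical table with non-ASCII text, held in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Ångström"), (2, "שלום")])
names = fetch_names(connection, 1)
```

In Python 3 all text is Unicode (str), so the string versus Unicode distinction discussed in this post concerns the Python 2 versions of these modules.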

Are you the Webmaster of an IE-only Web site?

Saturday, November 3rd, 2007

Then the following is a must read for you:

How Not To Do Market Research

Web sites that support only IE are not visited by people who use other browsers. So OF COURSE, they would not “report enough traffic to justify” support for W3C standards and/or other browsers.

By the way, my Web site’s browser statistics since the start of November 2007 indicate the following browser percentages (disclaimer for Israeli Webmasters: my Web site’s audience is international):

MS Internet Explorer 42.6%
Mozilla 31.7%
Unknown 12.5%
Firefox 9.7%
All the rest 3.5%

Technical Debt

Saturday, November 3rd, 2007

When developing software, there are several times at which you can choose between a quick&ugly hack, which will incur higher maintenance headaches in the future, and a slow&clean design. Usually, the slow&clean design is the way to go. But life is complicated and has a way of contriving an exception for each rule. Therefore, one should know how to manage the consequences of quick&ugly hacks.

This is the subject of technical debt, written about by people like Steve McConnell.

Roughly, you incur technical debt whenever you make a design or implementation decision which will require future rework or a higher maintenance workload. The decision could also be something trivial, such as not bothering to invest in documenting your present design, causing the future maintainer to waste time learning your design before modifying it.

Technical debt is written off when the software package in question is taken out of use due to replacement by a newer software package, the application area becoming irrelevant, or the company going bankrupt.

When is it a bad idea to have modularity in software?

Saturday, October 6th, 2007

Modularity is the most successful software engineering practice ever. Unlike other practices, it is practically never abused.
One day I was asked when it is a bad idea to modularize software. Of course, for every good thing there are always pathological or contrived circumstances in which it turns out to be a bad idea. Software modularity is no exception to this.

Algorithms

To be able to present and understand complicated algorithms, they need to be modularized. Then, when wishing to optimize such an algorithm, one typically confines himself to local optimizations, rather than to global optimizations.
When global optimizations are needed, the algorithm developer has to forsake modularity, and the resulting algorithm becomes very big and difficult to comprehend.

Source code level

Due to the way the human brain works, modularity is always good at the source code level. However, it needs language support, such as support for macros and in-line functions, to allow compilation into efficient machine code.
On the other hand, one can expect the software development environment to have source code pre-processing tools, which work around any language support deficiencies. Nowadays, it is not a big deal, unless one works for a badly run software development operation.

Machine code level

At the machine code level, software modularity means usage of DLLs, inter-module interfaces, plug-ins, etc.
This kind of modularity can be bad, if a module interface overhead directly affects a system bottleneck. A system bottleneck could be CPU time, memory consumption, I/O, database accesses, network latency/throughput, etc.
A good system design implements machine language level modularity when the overhead is not critical to performance; and optimizes the interfaces away where system bottlenecks occur.

16th Linux Day

Monday, September 17th, 2007

Sixteen years ago today, Linus Torvalds released version 0.01 of the Linux kernel.

This is an occasion to reminisce how I began to use Linux, and how I subsequently switched to 100% Linux usage at home.

I started using Linux about 13 years ago. For me, the killer application was Brian Marick’s GCT - a C Coverage Tool. At the time I worked as a freelancer in the area of medical software testing, and needed a way to assess the code coverage of my tests.

After the failure of an attempt to port GCT over to the world of 16-bit computing in MS-DOS, I found out about Linux. I soon found Harvey Stein, who had Linux (the Linux-IL mailing list, whose Patron Saint was Harvey Stein, started operating at about the same time - and this is no coincidence!). Mr. Stein let me come to his office and copy from him about 40 5.25″ diskettes of the Slackware distribution.

I copied the diskettes and installed Linux in an empty partition in my 5MB AT386 PC. Soon afterwards, I got GCT working!

The first kernel version that I installed was 1.0.8. Soon after installation, I upgraded to kernel version 1.1.13.

The old AT386 PC is still operational, and is bootable into either MS-DOS or Linux (Kernel version 1.2.13).

One day I acquired a new PC, but used MS-Windows 95 on it. I used the old AT386 for E-mail and surfing, and the new PC for software development. At the time I developed software, rather than testing it. A few upgrades later, I installed RedHat 5.1 on the new PC, and it became dual-boot.

Subsequent years saw me switch to RedHat 7.2, 8.0, and then to Debian. I also had MS-Windows 2000 (on another hard disk).

One day, the PC’s motherboard died and I was forced to upgrade to a new one, with clock frequency beyond 1GHz. The MS-Windows 95 ceased to operate, and MS-Windows 2000 was problematic. Linux booted on the new motherboard without having to make any modifications or installations whatsoever. This was when I abandoned MS-Windows altogether and switched to Linux fulltime.

Over the years, I did not need to rebuild my PC’s Linux hard disk due to malware. I did rebuild it due to switching to new versions of RedHat and then Debian. As proof, I present the fact that my ICQ number is still 8 digits long.

A Proven Free Software Business Model

Monday, September 10th, 2007

Companies like MySQL, RedHat and Zend (trademarks belong to their owners) make a lot of money from Free Software. This indicates that they have a Free Software business model that really works. This is interesting, because discussions of Free Software business models usually involve a lot of handwaving and unsubstantiated assertions. The business model that these companies found goes as follows.

  1. If you are a hobbyist and make no money from our software, then our software is free for you.
  2. If you make little money from our software, then our software is free for you, too.
  3. If you make a lot of money from our software, then you pay for using our software.

The above model works, when it works, due to the following reasoning.

  1. If you do not make money from our software, then you do not have the money to pay us anyway. If we demand money from you, you just stop using our software and switch to another hobby, in which our software is not needed. We prefer that you use our software, even if we get no money from you, due to the same reason Microsoft tolerated software piracy as a means for conquering a market for their software. We want you to find more uses for our software. We want you to debug it. We want you to contribute improvements to our software. On the other hand, you cannot make more money by having our software optimized for your environment, so you do not need support from us.
  2. If you make little money from using our software, then we would like to have a cut from your profits, as well. However, we cannot justify the costs involved in collecting from you. For this, we need to sign contracts, install licensing infrastructure, enforce licenses, incur ill will, and even support you if our licensing mechanism causes you problems. Therefore we would not collect money from you. However, we win from your using our software due to the same reasons as we would win from people who make no profit from using our software. There is also the chance that one day you will become a big business; or even come to make our software a critical part of your business infrastructure. Then the following applies.
  3. If you make a lot of money from using our software, then you have an interest in having the software work all the time. You want any bugs to be fixed promptly. You want support in optimizing the software. You can afford to pay us a lot of money, because the software makes and saves you much more money when it works and when it works with you in a smooth way. Therefore you would sign a support contract with us. Since big money is at stake, we can afford the transaction costs involved in collecting the money. If your optimizations and customizations are your trade secret, we license our software to you using a proprietary license.

A consequence of the above thinking is that not every software package can take advantage of this business model. To do so, the software package must have the following attributes:

  • Be useful to private individuals, small businesses and big businesses alike.
  • Be “tinkerable” i.e. facilitate development of enhancements and add-ons by individuals with bright ideas.
  • Be such that optimizations and adaptations to meet special needs would yield significant profits or savings in the right place.
  • Be critical to the functioning of some of the big businesses, which use it.

Typically, such software is dual-licensed, usually under the GPL and a proprietary license.