Joe Amenta's Blog

August 28, 2009

3to2 0.1 alpha 1 released

Filed under: 3to2 — AirBreather @ 3:23 am

After finishing the project for the Google Summer of Code 2009, I have made the first alpha release.  You can get a gzipped tarball at http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz.  Please file bug reports at http://bitbucket.org/amentajo/lib3to2/issues/.  A Mercurial branch has been made for a snapshot of this release at http://bitbucket.org/amentajo/3to2-01-alpha-1/.

The code is in basically the same state that it was in on the last day of the Summer of Code, except that I have made fix_imports and fix_imports2 explicit-only.  These two fixers were rather rushed, have inadequate test cases, are particularly fragile, and are known not to work properly with some imports inside suites.  They will, however, perform their basic function for the simple case of imports at the top level of a module, with no indentation.  Another shortfall of fix_imports2 in particular is that if a name is used from a module that is new to the py3k branch, the module name will become “None” in the refactored code.

With the exception of those two fixers, everything should work relatively smoothly.  The print fixer still just does “from __future__ import print_function”.  The best thing that could happen for the project at this point would be for people to start using it on their actual 3.x code and manually verifying the results.  I would love to hear via comments / e-mails the results of people actually using the program, whether they be good / bad / ugly (but especially ugly).  I will gladly accept samples of code that do not translate as expected via e-mail, if you do not wish to go through the process of filing a bug report.  These samples will help speed along the next release.

Note that 3to2 must be run with python2.7, as the version of 2to3 that comes with python2.6 is not recent enough.

August 8, 2009

Things 3to2 does not accept

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 9:50 pm

It’s time now for “what not to do if you want 3to2 to work”.  The following is a list of things to avoid if you want 3to2 to work on your code (I’ll try to keep this up-to-date):

(Pay particular attention to imports if you import names referenced in PEP 3108)

  • Incorrect syntax — If a Py3k interpreter fails on part of your code, don’t expect 3to2 to know what to do.  Much of 3to2 is already picky enough without running it on things that simply don’t work in the first place.
  • Third-party module references — If it doesn’t ship with Python, 3to2 won’t touch it.  The intention is for 3to2 to do an adequate job of automatically refactoring those modules, so that their distributors can ship a 2.x version more easily.
  • some_name = print; some_name(some_stuff) — Though technically this works for now, when the print function syntax is actually converted, it will not work.
  • import [metamod] (For example, import urllib.request) — A work in progress, name-imports of modules made up of multiple 2.x modules merged into one  in the 2->3 gap will not work.  Use (from metamod import [names]) for now. The support for this one is getting better, but the bugs are still being worked out.  Currently, you are golden if you only import at the top-level namespace of a module and always use at least one name from each module you import in this way.  Also note that imports of this type that are never referenced with an attribute later are automatically deleted.  Another “bug” with this is that if a module is imported, then an attribute of that module is referenced for which there never was a 2.x version, the name of that module after 3to2 becomes None without warning.  This is visible at least in urllib.parse, which has some bytes references.
  • import [metamod] as [name] — This will probably not work for a while, as the code is independently scanned for references to “metamods”, so please avoid using this form of imports.
  • from [meta] import [mod] as [some_name] (For example, from dbm import gnu as zombie) — The underlying 2to3 code for (from meta import mod) disallows matching (as some_name), possible fix in the future.  For now, refactor your code not to have that “as some_name”
  • Any feature from the Py3k branch that has no syntactical equivalent in Python 2.x — 3to2 is here to refactor your syntax, not add features to Python 2.x.  Warnings are in place for syntax that is only valid in 2.6 or greater.
  • Docstrings at the beginning of modules — If a future_stmt needs to be added, it is added before the docstring.  This will be fixed, but be wary of it for now.
  • Encodings that do not work for 2to3 — I’m not sure what these encodings are, but if you are having problems with it, please use chars valid in ascii only for now, until I can figure out if this is something that can be fixed.  If anyone has any information that might be helpful, please comment or e-mail.  More information on this: Even though py3k is more Unicode-friendly, 3to2 runs in python 2.x, so expect errors if you use an encoding that didn’t work in python 2.x.  Even a character in a comment is not safe from crashing the parser.  (See py3k/Lib/getopt.py)
  • import [module] as [something_else] — This will probably never be fixed; 2to3 imposes this rule, so I will do the same for 3to2. (This is incorrect, please ignore.)
  • from [metamod] import * (For example, from urllib.request import *) — Soon to be working, 3to2 will not succeed on imports of this type from a module that is made up of multiple 2.x modules merged into one.  Use (from [metamod] import [names]) for now.  Update: This syntax will work now, but be careful about name conflicts related to having all the extra names in the namespace from importing * from all modules that the “metamod” depends on.
  • from [module] import (names, more names, more names, some more names) — Status unknown, this failed for me.  Use the line-continuation character \ at the end of lines of this type.  Update: This bug has been fixed.
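As a rough illustration of the import items above, here is a small py3k module that sticks to the forms the list describes as safe: explicit from-imports at the top level, with every imported name actually used.  The function names host_of and join_path are made up for this example, and the forms to avoid are kept as comments.

```python
# A py3k module that sticks to the import forms the list above calls safe.
# host_of and join_path are hypothetical names used only for illustration.
from urllib.parse import urlparse, urljoin  # explicit names, top level


def host_of(url):
    """Return the network location (host[:port]) of a URL."""
    return urlparse(url).netloc


def join_path(base, path):
    """Resolve path against base, as a browser would."""
    return urljoin(base, path)


# Forms the list above says to avoid for now:
#   import urllib.parse as up            (import [metamod] as [name])
#   from dbm import gnu as zombie        (from [meta] import [mod] as [some_name])
#   shout = print; shout("hi")           (aliasing the print function)
```

Whether a given module actually survives 3to2 still has to be checked by hand; this only shows the shape of code the list recommends.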

Also, note that the status of fix_imports is “incomplete”.
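To see why the docstring item in the list above matters, the sketch below uses the standard ast module (not lib3to2 itself) to compare two layouts of the same module.  When the future statement comes first, the string that follows is just an expression statement and no longer counts as the module docstring.  The two source strings and the helper name module_docstring are made up for the example.

```python
import ast

# Layout a 2.x module normally uses: docstring first, then the future statement.
DOCSTRING_FIRST = (
    '"""Utilities for frobnicating."""\n'
    'from __future__ import print_function\n'
    'print("hello")\n'
)

# Layout described in the list above: future statement placed before the docstring.
FUTURE_FIRST = (
    'from __future__ import print_function\n'
    '"""Utilities for frobnicating."""\n'
    'print("hello")\n'
)


def module_docstring(source):
    """Return the module docstring of source, or None if it has none."""
    return ast.get_docstring(ast.parse(source))
```

Calling module_docstring on the first layout finds the docstring, while the second layout yields None, so help() and documentation tools would see an undocumented module.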

August 7, 2009

Fixer for division has always existed.

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 2:39 am

It occurred to me this evening that fix_division.py has been in the lib3to2/fixes folder for a good, long time.  I am so glad I didn’t completely overlook one of the “major” backwards-incompatible changes, only to remember it while musing about an unrelated topic.

Also, we have always been at war with Eastasia.
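For readers who have not run into it, the backwards-incompatible change in question is that / on two integers is true division in py3k, but floor division in 2.x unless the module has “from __future__ import division”.  What fix_division actually emits is not shown here; the snippet only demonstrates the py3k semantics, and the variable names are made up.

```python
# Runs under py3k; the comments note what plain Python 2.x would give.
true_quotient = 7 / 2     # 3.5 in py3k; 3 in 2.x without the future import
floor_quotient = 7 // 2   # 3 in both
negative_floor = -7 // 2  # -4 in both: floor division rounds toward negative infinity
```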

July 30, 2009

Crackdown time

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 12:51 am

I’ve been pulled away from GSoC so much over the past few weeks that I have actually fallen a little behind.  So, over the next few days, I plan to finish everything that is left.  Starting when I finish one last physics problem (at the latest, midnight Friday, approximately 24 hours from this post), I plan to dedicate the week-end to nothing but coding and personal necessities.  My goal is to have the following completed by the end of the week-end:

  • done.. fix_imports also fixes the usage of the names imported (at time of writing, it only fixed the import)
  • done.. most of fix_imports2, a fixer that deals with multiple modules that were merged into a single module (at the minimum, the “from x import y” component)
  • done.. fix_features, a fixer that will warn the user which code cannot be converted syntactically because of new py3k features
  • not done.. (time-permitting) fix_print works for real (currently only adds “from __future__ import print_function”, which caps the python2.x version at 2.6 if there is a print)

I really think I can pull this off this week-end.  It will take just the right combination of Mountain Dew and my ADD meds.  And a DND message.

Also, barring catastrophic failure of reaching those goals, that will bring me very close to the end of the project.  The ample amounts of ahead-of-schedule time I had at the beginning have been brought in check by ever-increasingly difficult physics homework and the more-than-occasional free time to spend with Kristina.  By August 10 (suggested pencils-down date), barring acts of $DEITY, I should be prepared to do some more documentation, clean up the code, make sure my test suite is thorough, etc. to be prepared for the August 17 deadline.

Edit:

fix_imports2 should now be done, as far as (from x import y) is concerned.  It requires a heck of a lot more testing than my few test cases, though.  In any case, my weekend goal is now complete! (with the exception of fix_print refactoring, as I did not end up having the extra time to do it)  Final challenges are:

  • fix_imports2 needs to work for (import x) and (from x import *) imports.
  • The code needs to be more readable (ongoing process).
  • Need to remind myself what I needed to do with itertools. Fixed in changeset e343ee635f0b
  • Add in 3.1 features to fix_features.
  • Test cases, test cases, test cases!!!  If anyone can help with this, I would really appreciate it!  Even just a link to a really comprehensive piece of py3k software to look at would help out tremendously.

July 26, 2009

Milestone 1.5: Shoot, it doesn’t work right.

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 9:00 pm

Well, this sucks.  I found out, while trying to refactor stdin, that my testing framework does not test fixer interactions.  It started out simply: refactoring “print(‘omg’)”  just turned into “print(u’omg’)” (without adding from __future__ import print_function).  That was partially fixed.  However, this led me to test others, and there are other cases where I get the wrong output.  Fixes are on the way.

Edit: It works fine now; I just have to stay away from self.tree.was_changed.

July 14, 2009

Milestone #2: Core finished

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 2:48 pm

And now, another milestone.  With set_literal being finished, I have now completed the 3to2 version of all 2to3 fixers that don’t deal with imports.  Now, all that’s left (which is really the hardest part) is the standard library (stdlib).  This will be split into two parts (one is easy, one is hard).

Part 1 (easy) is renaming every library module that was simply renamed in py3k.  This is easy, because all I have to do is change one line, e.g.: “_pickle” will be changed into “cPickle”, everywhere, consistently.
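A minimal sketch of what “change one line” could look like, assuming a plain text-level substitution instead of the parse-tree fixers that lib2to3 provides.  The table holds only a handful of the PEP 3108 renames, and rename_modules is a made-up helper, not part of lib3to2.

```python
import re

# py3k name -> 2.x name, for modules that PEP 3108 simply renamed.
# Only a handful of entries are listed; the real table is much longer.
RENAMED = {
    "_pickle": "cPickle",
    "configparser": "ConfigParser",
    "queue": "Queue",
    "socketserver": "SocketServer",
    "copyreg": "copy_reg",
    "reprlib": "repr",
}

# Match any renamed module name as a standalone identifier.
_NAME = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, RENAMED)))


def rename_modules(source):
    """Naively replace every renamed module name in source text."""
    return _NAME.sub(lambda m: RENAMED[m.group(1)], source)
```

Because this works on raw text, it would also rewrite matching words inside strings, comments and unrelated attributes, which is one reason a real fixer works on the parse tree.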

Part 2 (hard) involves modules in py3k that are amalgamations of multiple modules (or parts from multiple modules) in python2.x.  This is hard, because it will involve writing a hell of a lot of filters to determine whether (for example) the correct fix for “http.server” would be “BaseHTTPServer”, “CGIHTTPServer”, or “SimpleHTTPServer”, or some monstrous blend of all of them (would it ever?).
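For the (from x import y) case, one way to picture the filtering is a lookup from each imported name to the 2.x module it used to live in.  This is only a sketch under that assumption: the table covers four http.server names, and split_from_import is a hypothetical helper, not the fixer itself.

```python
# Which 2.x module each http.server name lived in (per PEP 3108).
HTTP_SERVER_NAMES = {
    "HTTPServer": "BaseHTTPServer",
    "BaseHTTPRequestHandler": "BaseHTTPServer",
    "SimpleHTTPRequestHandler": "SimpleHTTPServer",
    "CGIHTTPRequestHandler": "CGIHTTPServer",
}


def split_from_import(names):
    """Turn the names of 'from http.server import ...' into 2.x import lines.

    Raises KeyError for a name with no known 2.x home instead of guessing.
    """
    by_module = {}
    for name in names:
        by_module.setdefault(HTTP_SERVER_NAMES[name], []).append(name)
    return [
        "from %s import %s" % (module, ", ".join(found))
        for module, found in sorted(by_module.items())
    ]
```

A single py3k import can therefore fan out into several 2.x imports, which is the “monstrous blend” case; plain (import http.server) is harder because every later attribute reference has to be tracked as well.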

A (complete?) list of what the rest of my project will have to undo can be found in PEP 3108 — Standard Library Reorganization.  (There are many more things listed in this PEP than what I need to deal with: much of it involves things that were completely removed from the language for py3k, so I don’t have to deal with those)

As posted before, I will not yet be tackling new things added in py3k with no backwards-compatible syntactical equivalent.  That will be implemented in stages: first, warnings that point to what cannot be fixed (GSoC completion will require at least this, by my current standard); then, the option to raise the minimum version of python2.x supported; then (in selected cases only) use of features backported by third parties, which will either require that module to be present on the host machine or fall back to a copy distributed with lib3to2 (this will require pythonx.y setup.py install).

June 16, 2009

Metaclass fix coming… I swear it!

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 3:49 pm

I don’t know why, but I’ve been hitting roadblocks all day about fix_metaclass.  I’m not going to commit the trash I have, because I will probably start over completely on it.  I think I’m trying to rely too much on the pattern compiler; I should probably be mimicking 2to3’s fix_metaclass and use the PATTERN to match all classdefs, then call some of my own functions to narrow it down… but I keep trying to mess around with the PATTERN to match a class with a metaclass; I know it’s possible, I just can’t get it to work.  So tonight I’ll start over on it, and hopefully it’ll be done tomorrow.
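The “match every classdef, then narrow it down in my own code” approach can be sketched with the standard ast module standing in for lib2to3’s pattern matching.  That substitution is an assumption made for illustration; the real fixer works on lib2to3 nodes, and classes_with_metaclass is a made-up name.

```python
import ast


def classes_with_metaclass(source):
    """Return the names of classes in source that pass a metaclass= keyword."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # Step 1: match every class definition, with no further conditions.
        if not isinstance(node, ast.ClassDef):
            continue
        # Step 2: narrow down in ordinary Python code.
        if any(kw.arg == "metaclass" for kw in node.keywords):
            names.append(node.name)
    return sorted(names)


# Sample py3k source; Base and Meta need not exist because it is only parsed.
SAMPLE = (
    "class Plain(object):\n"
    "    pass\n"
    "class Fancy(Base, metaclass=Meta):\n"
    "    pass\n"
)
```

Keeping the pattern broad and doing the filtering in Python trades a little speed for logic that is much easier to debug than one clever pattern.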

June 9, 2009

“New features in 3.0 and 3to2’s role”, or, “When it is appropriate/necessary to accept 2.6+ instead of 2.5”

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 3:58 pm

Python 3.0 added a nice bucketload of new features that simply have no syntactical equivalent in Python 2.5 (see the Appendix for a list; table coming soon!).  Up to this point, I have not touched anything that requires more than a syntactic change.  However, this is an important topic, because developers using 3.0 are encouraged to use all of its new features and syntax, and the goal for any code refactoring tool is (or should be!) to work universally for all cases.  (See the reasoning for why fix_memoryview must be enabled explicitly, even though not using the fixer would result in a NameError in 2.5/2.6 if a memoryview is encountered.)  Thus, I will address the feature issue here.

This blog post applies to all features present in any py3k branch of Python that have no backwards-compatible syntactic equivalent (“BCSE”), nor an equivalent representation using syntax that already existed in Python 2.5 without an unreasonable amount of helper code (e.g., code that recreates data types that are built in to py3k).  Such features are herein named “New Features”.  The end-game goal for lib3to2 is to make all code using any built-in functionality of all py3k branches work on python2.5 with minimal (or no) user intervention when 3to2-ified.  Obviously, this is a very difficult goal to achieve, given that new features are always being added to py3k, and there is already enough to occupy a huge chunk of my time, projecting out towards Christmas 2009.  For the purposes of having a pretty good tool by the end of GSoC 2009, my goal is to make 3to2 refactor syntax where there is a BCSE, and (case-by-case) raise warnings that tell the user their options.  An example warning would look like this (using “memoryview” as the sample case):

Warning: memoryview does not have an exact syntactic equivalent in Python 2.5, and the fixer for memoryview -> buffer is disabled by default.  You may either enable the “memoryview” fixer and accept the consequences of using buffer (which only allows a subset of the operations available to memoryview), manually refactor your code to avoid using memoryview, or ignore this message.  Ignoring this message limits your code to work on the following Python version (at minimum): Python 2.7

That’s obviously rather verbose.  A short version might look like:

Warning: memoryview encountered, fixer disabled.  Enable fixer, manually refactor, or ignore (2.7 minimum required).

Comments on this blog post are welcome: I welcome and request alternatives to this structure.

Appendix

Features, comma-delimited, in the format feature(minimum python version supported): bin (2.6), str.format (2.6), multiprocessing (“2.5”)*, io (2.6), abc (2.6), numbers (2.6), memoryview (2.7).  3.1 New Features not included.

* pypi has backports for the multiprocessing package back to 2.4, but it is not built in, so the syntax would require some additional fluff to make it work, which is a future goal, not for now.
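As a sketch of how the appendix could drive the warnings described above, the table below copies the appendix entries, and two made-up helpers (short_warning and minimum_required) format a short message and work out the lowest 2.x version a piece of code could target.  The message wording is simplified from the memoryview example, since the other features have no fixer to enable.

```python
# Minimum 2.x version for each New Feature, copied from the appendix above.
# multiprocessing is left out because its "2.5" entry depends on a PyPI backport.
MINIMUM_VERSION = {
    "bin": (2, 6),
    "str.format": (2, 6),
    "io": (2, 6),
    "abc": (2, 6),
    "numbers": (2, 6),
    "memoryview": (2, 7),
}


def short_warning(feature):
    """Format a short warning for one feature that has no BCSE."""
    major, minor = MINIMUM_VERSION[feature]
    return "Warning: %s encountered; manually refactor or ignore (%d.%d minimum required)." % (
        feature, major, minor)


def minimum_required(features):
    """Return the lowest 2.x version able to run code using all of features."""
    # (2, 5) is the baseline target when no New Feature is used at all.
    return max([MINIMUM_VERSION[f] for f in features] + [(2, 5)])
```

Reporting one combined minimum version at the end of a run, in addition to the per-feature warnings, might be a less noisy way to tell the user what ignoring the messages costs.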

June 5, 2009

Milestone: print function

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 2:47 am

I’m pretty pumped.  I just “finished” the print function fixer for 3to2.  I put “finished” in quotes (and only “working” instead of “complete” under its status text on the previous post) because it’s a sort of a placeholder right now: all it does is add “from __future__ import print_function” at the top of a refactored module that uses print(), so this will only successfully give you code certified for 2.6 and up, not the 2.5 branch as originally promised.  But still, this is a huge leap: now, most (if not all) of my own little py3k modules are working unmodified after running them through 3to2!  This is a good time for me to stop and explain how to get 3to2 working on your computer for those of you that do not know:

Find an SVN client (TortoiseSVN for Windows, subversion for others), and check out (into separate folders) both http://svn.python.org/projects/sandbox/trunk/2to3 and http://svn.python.org/projects/python/trunk/.  Then, copy the folder (the entire folder!) lib2to3 (from the root of the 2to3 folder) into the Lib/ folder (from the root of the other folder).  Then, look up a great tutorial for how to build Python on your system (sorry, I’m not here to tell you how to do that… just how to get my stuff to work).

Once you’ve built your own custom Python build (with the most up-to-date version of lib2to3, that this project relies on [and I contributed a line or two of code here and there to make it work for my purposes]), then it’s time to get 3to2.  Find a Mercurial client (TortoiseHG for Windows, mercurial for others), and clone (into yet another separate folder) http://bitbucket.org/amentajo/lib3to2/.  Run “python2.7 setup.py build” (python2.7 being the path to your python2.7 executable you built earlier on), and then “python2.7 setup.py install” (you need administrator / root privileges for the latter command).  This will put “3to2” on your sys path, and you will be able to simply run 3to2 like this:

3to2 /path/to/my_file.py # obviously, /path/to/my_file.py stands for a (relative or absolute) path to an existing py3k file that you need to refactor. (This will also work on an entire directory that contains .py files)

This will show you (in unified diff format) the changes that 3to2 will make to your file.  You can either redirect the stdout to a .patch file then apply that patch the normal way, or run 3to2 like this:

3to2 -w /path/to/my_file.py

with the -w option, and it will back up your original files, then fix them with the fixer suite.

One note: there is one fixer so far that must be run explicitly.  That is memoryview, which changes all memoryview() calls into buffer() calls.  To enable this, you must run 3to2 with the “-f memoryview” option.  There are two reasons for this.  The first is that buffer objects do not support all of the same methods that memoryview objects do, so if this fixer were enabled by default, it would fix code perfectly for some cases, and not for others.  The second reason is that memoryview was backported into python 2.7, so if the user wishes to run their fixed code on a python2.7 platform, they can benefit from keeping their memoryview() calls alone.

(Of course, if you’re using memoryview() or buffer(), you probably already know or can easily figure out that you can use -xmemoryview to turn that fixer off ;-) )

Having reached this milestone, I look both back and ahead: it feels like I’m almost all the way to a finished product.  What’s left as a high priority at this point is exception stuff, various import stuff, dict methods, the stdlib, magical metaclass mayhem, sys.intern, numliterals (0o0777, 0b011101, et al.), reduce, and set literals… plus with_statements, annotations, floor division, and some other stuff I’m too tired to think of, but was added in 3.0 that had no 2to3 fix.  Then, fix the print fixer to work for 2.5, i.e., actually perform logic and move stuff out of the function into a true print statement.  After that, I want to rework the stuff that was changed from lists into iterators in py3k to remove a direct list() call, so that list(range(…)) will (more) correctly be fixed into range(…) rather than list(xrange(…)), e.g.  After that’s done, I feel like more robust test suites are in order (for both 3to2 and 2to3).

OK, writing it all down like that in list form makes it look longer than it really is.  This doesn’t seem like it will take until the end of the Google Summer of Code program, at least not to me.  The stdlib stuff will take a good week or so, most of which will involve me learning what the modules do so that I can have a grasp on how they will be used and how they changed from 2.x to 3.x.  The print function -> statement is probably two or three days.  Exception stuff will probably take me another week, for the same reason that the stdlib stuff will: I have never used any more extended form of exception handling than “raise Exception()”, and “except Exception:”… so I don’t have much experience to go on.

So it looks like I just may get to the “Things to do if I finish early” part of my proposal.  That is, assuming I keep up the pace. :-)

June 1, 2009

Organization time.

Filed under: 3to2, GSoC 2009, Google Summer of Code, Work — AirBreather @ 4:57 pm

Here is a table consisting of the 2to3 fixes, whether they need a corresponding 3to2 fix, and their status (3to2 fix complete / doesn’t need a 3to2 fix, working (could do better, but the syntax is correct), incomplete (does not fix the syntax yet, but that fix is in progress), or not started (I haven’t touched it yet)):

2to3 fix          3to2 needed  status
apply             no           unnecessary
basestring        no           unnecessary
buffer            yes          complete
callable          no           unnecessary
dict              no           unnecessary
except            yes          complete
execfile          no           unnecessary
exec              no           unnecessary
filter            yes          complete
funcattrs         yes          complete
future            no           unnecessary
getcwdu           yes          complete
has_key           no           unnecessary
idioms            no           unnecessary
imports           yes          incomplete
input             no           unnecessary
intern            yes          complete
isinstance        no           unnecessary
itertools_import  yes          complete
itertools         yes          complete
long              yes          complete
map               yes          complete
metaclasses       yes          working
methodattrs       yes          complete
ne                no           unnecessary
next              yes          complete
nonzero           yes          complete
numliterals       yes          complete
paren             no           unnecessary
print             yes          working
raise             yes          complete
raw_input         yes          complete
reduce            yes          complete
renames           no           unnecessary
repr              no           unnecessary
set_literal       yes          complete
standarderror     no           unnecessary
sys_exc           no           unnecessary
throw             yes          complete
tuple_params      no           unnecessary
types             no           unnecessary
unicode           yes          complete
urllib            yes          incomplete
ws_comma          no           unnecessary
xrange            yes          complete
xreadlines        no           unnecessary
zip               yes          complete

Out of fear of creating a monster post, the new features in 3.0 will get their own post above.

This particular post will be the post to watch for the summary of the status of lib3to2’s progress.
