I am proud to announce the first beta version of lib3to2. Links:
3to2 for 2.7 (pypi)
3to2 for 3.1 (pypi)
Features of this release:
- Development on lib3to2 is now done in Python 3, and 2.x releases are provided by backporting with 3to2 itself.
- fix_imports2 is now mostly complete. While more testing needs to be done, it is highly representative of what it will look like when I make a full release.
- The 2.7 release will actually run on versions as far back as Python 2.5, if a sufficiently recent version of lib2to3 is provided (e.g., from the Python 2.7 source)
The issue tracker can be found on bitbucket. Please post any bugs that you find with this release.
This week hasn’t been a great week as far as coding hours go; I’ve just been obnoxiously distracted. From the compulsion to get Brawl Minus working on Dolphin to playing with my new Nexus One, I haven’t worked the number of hours that I should have. I’ll try to do better next week.
In spite of that, I did manage to reorganize the test cases in fix_imports2 and even get some of them to pass. I also optimized the pattern to match Python code that needs fixing; this results in a huge speed increase when running fix_imports2, as evidenced when running the tests.
Specifically, fix_imports2 now fixes cases of “from spam.ham import eggs” (no “as bacon” or multiple comma-separated modules yet) and “from spam.ham import *”. The framework is already there to extend this to the rest of the “from spam.ham import …” cases, which will possibly end up being done today, or maybe not. At this point, it looks very reasonable to expect “from spam.ham import …” imports to be done by the end of next week.
“import spam.ham” variants along with usage will be the next subtask after that. I think that, from a pragmatic standpoint, a better approach than the one I’ve been considering would be just to change “import urllib.request” to “import urllib2, urllib”, then refactor uses of “urllib.request” on a case-by-case basis.
My previous approach was, upon encountering “import urllib.request”, to find all references to “urllib.request” and refactor them, then import “urllib2” and/or “urllib” as used. If only names from “urllib2” were used, then only “urllib2” would be imported.
This way is overly complex and adds serious mental strain. I originally thought to try it this way because that’s closer to the way that Python logically ends up running such code: it sees “import urllib.request” (or “import urllib.request as billy”) and binds that name in the namespace. A change to that import statement (and the name it binds) will affect everything that references that specific name, and all references must be changed in tandem.
In theory, this is an approach suitable for handling “import urllib.request as subtle”, and I can’t think of a better way to handle that case. However, I’ve already determined that binding a builtin module to a new name adds too much complexity and/or ambiguity (if “import urllib.request as subtle” needs both “urllib” and “urllib2”, which one gets the name “subtle”? should lib3to2 just pick an arbitrary similar [should it even be similar? how similar?] name for the module that does not? why rebind the second one to begin with? follow-up: why rebind the first one if the second one doesn’t have a good reason to be? should it be acceptable to replace this with “import urllib2 as subtle” when only names from “urllib2” are used, but “import urllib, urllib2” in other cases? what about third-party usage of the name “subtle”?). So, that reasoning no longer applies. The same argument can be applied to “from urllib import request”, as it is essentially the same as “import urllib.request as request”. In the end, I was so committed to that idea that I did not take seriously enough the idea of refactoring the import statement and usage separately.
I think I will approach the next part of fix_imports2 (after “from spam.ham import …” cases are done and slightly more thoroughly tested) assuming that “urllib.request.urlopen” actually means the name “urlopen” from the builtin “urllib.request” module that has already been imported, and that “import urllib.request” means to import everything that provides “urllib.request” functionality, even if it is never referenced later.
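To make the “refactor the import and the usage separately” idea concrete, here is a minimal sketch. It is not the actual fix_imports2 code: it works on plain text with regular expressions instead of the lib2to3 parse tree, and the tables and function names (PY3_TO_PY2, PY2_PROVIDERS, rewrite_import, rewrite_usage, backport) are hypothetical names invented for the illustration. The tables cover only a handful of names.

```python
import re

# Hypothetical, deliberately tiny table: Python 3 dotted name -> Python 2 name.
PY3_TO_PY2 = {
    "urllib.request.urlopen": "urllib2.urlopen",
    "urllib.request.Request": "urllib2.Request",
    "urllib.request.urlretrieve": "urllib.urlretrieve",
}

# Every Python 2 module that provides some "urllib.request" functionality.
PY2_PROVIDERS = {"urllib.request": ["urllib2", "urllib"]}


def rewrite_import(line):
    """Replace 'import urllib.request' with imports of all its py2k providers."""
    match = re.match(r"^(\s*)import\s+([\w.]+)\s*$", line)
    if match and match.group(2) in PY2_PROVIDERS:
        indent, module = match.groups()
        return indent + "import " + ", ".join(PY2_PROVIDERS[module])
    return line


def rewrite_usage(line):
    """Rewrite each known dotted usage, independently of any import statement."""
    for py3_name, py2_name in PY3_TO_PY2.items():
        # The lookbehind avoids matching inside a longer dotted name.
        pattern = r"(?<![\w.])" + re.escape(py3_name) + r"\b"
        line = re.sub(pattern, py2_name, line)
    return line


def backport(source):
    """Apply both rewrites line by line; neither step needs the other's result."""
    lines = source.split("\n")
    return "\n".join(rewrite_usage(rewrite_import(line)) for line in lines)
```

Because the import is always replaced with every provider, the usage rewrite never has to report back which modules were actually needed. The cost is a possibly unused import, and a text-based version like this one would also rewrite matches inside string literals, which a tree-based fixer would avoid.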
With that finished, fix_imports2 should have the level of completion that the other fixers have.
So I’ve started the rewrite of fix_imports2. Here’s what I’ve done so far:
- Condensed the list of py2k module members to only the documented ones. If I removed any important ones, let me know in the comments here.
- Added a few helper functions that will be integral to development.
- Added more tests, though I will be adding more before I end up implementing enough to pass any of the current ones.
The goal of fix_imports2 is to replace the import and usage of a single module with the import and usage of several. This has a bunch of implications. During the process of adding more tests and thinking about all the ways that you can import stuff in Python, I thought of two issues that I would like to share with the rest of the class. I consult The Zen of Python to try to determine which is the “right way” and which is the “wrong way” to proceed, even though the whole module itself breaks one of the tenets (“in the face of ambiguity, refuse the temptation to guess”).
- “from http import server” and “import http.server as something_else” bind to a single name what could turn into multiple modules. fix_imports2 will attempt to disambiguate this. In the best case, this will end up in one of two ways:
1) Intentionally or accidentally, the code only makes use of members of the py3k module provided by a single py2k module, and it is possible to keep that name, or
2) The code makes use of names from the py3k module that are provided by multiple different py2k modules, and it is not possible that the one name imported will be able to be used for all of them.
The right way to fix 2) (“explicit is better than implicit”) is to bind neither module to the original name and replace each usage of the original name with the relevant one.
This turns 1) into an even more special case (“special cases aren’t special enough to break the rules”). I believe that the “right way” to resolve this is to remove the special bindings altogether (“there should be one– and preferably only one –obvious way to do it”).
- importing standard library modules in a class namespace is really, really hard to deal with. Exposing the name “http.server”, or a simple renaming thereof, to methods and to derived classes means the same thing as the last bullet: you’re ascribing to one name what could end up being the amalgamation of multiple names. I’m talking about this:
    class A(object):
        import http.server
For all modules that fix_imports2 deals with, don’t do this. It will cause errors if and only if (“errors should never pass silently”) a derived class references “http.server” in A’s namespace, and this could be in another module. Plus, if I did implement a fix for this in fix_imports2, it would involve going to every class derived from class A and checking every member function, along with every piece of code that uses instances of A or classes derived from A, which sounds prohibitively complicated and error-prone. And again, this would not fix code in other modules that reference A.http.server.
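A small Python 3 illustration of why this is hard, with a hypothetical derived class B and method name handler_class added for the example: the import inside the class body binds the name “http” as a class attribute, so a subclass can reach the module through inheritance without any import statement of its own.

```python
class A(object):
    # Binds the name "http" in A's class namespace, not at module level.
    import http.server


class B(A):
    def handler_class(self):
        # No import here: "http" is found as an attribute inherited from A.
        return self.http.server.BaseHTTPRequestHandler


print(B().handler_class().__name__)
```

If a fixer rewrote only the import inside A, the reference in B (which might live in a different file) would be left pointing at a name that no longer exists.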
I’ve done a lot of thinking about fix_imports2 (on-and-off for almost a year, and for the past couple of weeks). I keep coming back to one single thought: using “from urllib.request import spam, ham, eggs, …” in the outermost indent level is the best way to ensure that everything will work properly after fix_imports2 is done with it (“simple is better than complex”). That case can be fixed independently from the code that makes use of the imported names by a simple pattern and transformation, the way all other fixers are done.
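As a rough sketch of why this case is the simple one: the imported names alone decide which Python 2 modules are needed, so the statement can be split without looking at any other code. The PROVIDERS table and split_from_import below are hypothetical, cover only a few names, and operate on strings, not on the parse tree a real fixer would use.

```python
from collections import OrderedDict

# Hypothetical excerpt: which Python 2 module provides each Python 3 name.
PROVIDERS = {
    "urllib.request": {
        "urlopen": "urllib2",
        "Request": "urllib2",
        "urlretrieve": "urllib",
        "pathname2url": "urllib",
    },
}


def split_from_import(module, names):
    """Turn one py3k from-import into one from-import per py2k provider."""
    grouped = OrderedDict()
    for name in names:
        py2_module = PROVIDERS[module][name]  # KeyError for unknown names
        grouped.setdefault(py2_module, []).append(name)
    return [
        "from %s import %s" % (py2_module, ", ".join(provided))
        for py2_module, provided in grouped.items()
    ]


for statement in split_from_import(
    "urllib.request", ["urlopen", "urlretrieve", "Request"]
):
    print(statement)
```

The names bound in the namespace stay exactly the same after the split, which is why the code that uses them does not need to change.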
All other cases require extra thinking (both by me and by the fixer) in varying degrees. “import urllib.request” requires looking for code that uses “urllib.request.something” and giving feedback to guide transformation of “import urllib.request” into something else.* Conditional imports require following a dedent out of the suite but probably will not end up breaking anything, and “from urllib import request” and “import urllib.request as spam” are doable in a similar way to “import urllib.request”, if you don’t mind those different names going away. Importing modules into a class namespace is going to cause problems, and fix_imports2 will probably never handle this solely because of the complexity involved in getting to a solution that will still cause errors.
* Or, it could just replace “import urllib.parse” with “import urllib, urllib2, urlparse” separately from code that uses urllib.parse, and just replace each usage piecemeal, potentially leaving those modules in the namespace unused. Actually, that sounds like a really good idea. Maybe I’ll do that next.
So the main thing to take away from this is that “from urllib.request import a, b, c, d” in outermost indent level is the best way to write code that you want fix_imports2 to handle. And if I am to assume that reading this is a strong indicator that you write code that you want 3to2 to handle, then I imagine that this may actually be a relevant takeaway.
Sorry it took so long for me to post this.
This year, I am going to exclusively develop lib3to2 with Python 3. It’s mature enough now to provide its own 2.x versions. I will also try to make it play nicer with older versions of Python, even if they might not ship a compatible lib2to3. This will be my focus starting Monday, when coding starts.
After that, I will be working on rewriting fix_imports2, a non-trivial (in my opinion) omission that keeps me from saying “use 3to2, set it and forget it!” I expect this to be complete by the midterm evaluation time (“Checkpoint 1”).
Beyond that comes the most exciting part of this project: functional testing. At the completion of Google Summer of Code 2010, I intend for lib3to2 to be able to deliver under widespread use. To this end, unit testing does not cut it. My plan for doing this is:
- Find real-world Python 3 code. I am also prepared to craft some by auditing parts of the fixer suite that I feel are weak or prone to poor interaction with other fixers.
- Translate that code using lib3to2 and test the resulting code’s functionality in Python 2.7 against the original code’s functionality in Python 3.x
- File appropriate tracker entries as soon as bugs are discovered, even if a fix seems trivial, and write automated tests for them at the same time.
After Checkpoint 1, no less than one week (until “Checkpoint 2”) will be dedicated to the above process. Unless lib3to2 exceeds all reasonable expectations at this point, this should fill up the tracker with issues to guide the rest of the process; predictions from this point on are very difficult to make.
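The comparison step in that plan could be automated along these lines. This is only a sketch under assumptions: the helper names run_script and same_behaviour are made up, the interpreter commands and script paths are supplied by the caller, and “same functionality” is reduced here to “same exit code and same standard output”, which will not be enough for every program.

```python
import subprocess


def run_script(interpreter, script_path):
    """Run a script under the given interpreter; return (exit code, stdout)."""
    result = subprocess.run(
        [interpreter, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def same_behaviour(py3_interpreter, original, py2_interpreter, backported):
    """Compare the original under Python 3 with the backport under Python 2."""
    return run_script(py3_interpreter, original) == run_script(
        py2_interpreter, backported
    )
```

A mismatch here would be the trigger for filing a tracker entry, and the failing script itself is a ready-made candidate for an automated test.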
If lib3to2 does turn out to perform exceptionally well under my stress testing, within reasonable boundaries, I will move on to finding a single pure Python 3 project and backporting it with lib3to2. I will document that experience on this blog and improve lib3to2 along the way as the need becomes evident. This is an unlikely scenario.
Applications for GSoC 2010 are open. I’m submitting a proposal for pushing lib3to2 further along.
Specifically, my proposal will focus on testing code I’ve already written, (re)writing some subpar fixers, and exploring lib3to2’s behavior on wild type Python 3 code. Purely 2to3’d code doesn’t count here, but code that was ported to Python 3 using 2to3 and then continued development in Python 3, in what Martin v. Löwis aptly called a “burn-your-bridges fashion”, is fair game.
Edit: Woo-hoo, it was accepted! More involved post once I finish up final exams.
So I wrote a fishing e-mail just now to python-dev asking for feedback on the idea of including lib3to2 in the Python 3.2 standard library. I’m going to write the PEP regardless (unless the right amount of people say that there is no possible way that lib3to2 will ever make it into the standard library); I just want to hear initial thoughts about the idea in general.
Stabilization of lib3to2 for inclusion in the standard library will involve:
- (Starting and) finishing the rewrite of fix_imports2.py
- Actively hunting down as much native (i.e., not simply 2to3’d) Python 3 code as possible, then analyzing how 3to2 translates it
- Stress testing with crafted Python 3 code that should expose hidden bugs that may actually be relevant to real-world use cases (it’s rather easy to mess with 3to2 by screwing around with builtins and name binding… this is not what I’m referring to here.)
And, of course, moving to Py3k. I have entertained the idea of proposing inclusion in the Python 2 standard library, and I disagree with the idea; here’s why.
It’s possible that Python 2 does not get a new release when a new version of Python 3 is released (or ever, but that’s another topic). In that case, backwards-incompatible features added to Python 3 would not get accompanying warnings in the Python 2 version of lib3to2, leading to code that refactors without major warnings in Python 2’s lib3to2 but fails at runtime.
There’s also a really good reason to include lib3to2 in the Python 3 standard library exclusively: if lib3to2 is provided with Python 3.x and warns about a complete list of backwards-incompatible Python 3 features that version 3.x includes, then lib3to2 is future-proof. New features in Python 3.y will not affect the correctness of lib3to2 provided with Python 3.x, and 3.y has its own lib3to2 to refactor code, again assuming that lib3to2 warns about a complete list of backwards-incompatible features.
This rationale provides the nail-in-the-coffin for not putting lib3to2 in the Python 2 standard library: when 3.y is released, the Python 2.x version of lib3to2 will become obsolete. Most likely, a user of Python 2.x will find a barrier to upgrading to Python 2.y even if the releases are synchronized indefinitely. That user will, however, need lib3to2 to warn about the 3.y features incompatible with their version of Python. The solution: use the latest lib3to2 for Python 2 from PyPI. This is a lot of trouble to deal with if there is a version of lib3to2 provided in the standard library, so for simplicity’s sake, don’t provide it with Python 2.
Sorry for the long-winded explanation.
OK, after taking a while to think about this and the feedback I got from my e-mail, I think it’s a bad idea at this point in time to push for standard library inclusion. Providing it with Python and freezing the code until the next Python release (apart from minor bugfixes) doesn’t sound like the right move. Even if I were to finish everything that I can think of that needs finishing today and release a beta version, I’m not comfortable handing it off so soon to python-dev, basically saying “everything that this is intended to do works to the extent that it is intended to work.” My main issue is this: when writing lib3to2, I found a couple of issues with lib2to3 that required modifying parts of lib2to3 to get lib3to2 to work. Because lib2to3 is in the standard library, people who use 3to2 have only two options: either run it on a version of Python released after 2.6 / 3.0 or manually replace lib2to3 with the one in the sandbox.
3.2 is just too early a target. I think I’m going to see if I can get a GSoC 2010 project for lib3to2 to finish up the outstanding rewrites, thoroughly test, and finally release a beta version. I can’t predict how it will go after I put a beta out there, but I know I’m going to be more confident in recommending it than in its current state. Right now, the blemish of fix_imports2 is annoying me to no end, and I’m not even sure what proportion of code is impacted by its status as “not working, do not use”.
I’m here in the Hyatt Regency in Downtown Atlanta for PyCon 2010. I’m going to be doing a poster session on Sunday… just finished the first draft of my poster slides.
Edit: Finished it.
Edit2: removed broken links, WordPress’s upload feature isn’t the right thing for my setup.
PyCon is just a couple of weeks away, and I haven’t gotten around to designing a poster for my session yet. I’m open to suggestions for what people would like to see (though I won’t hold out for them, of course).
I also haven’t written any 3to2 code since my last update — I haven’t heard anything from anyone using it yet. I’m still going to stick with the first beta release being when fix_imports2 works, which may be a (legitimate) reason that people aren’t using it enough to have feedback to give about it. I’m not coding as actively right now because of other priorities — but if I get feedback from users, I may be more inclined to prioritize 3to2 higher than it currently is.
I have decided to keep with the idea of having a fix_imports2 separate from fix_imports, so after finishing the usage parts of fix_imports_test, I have declared it finished and ready for general usage. It has replaced the old fix_imports.
fix_imports2 will be another rewrite, so I will go ahead and start on that eventually.
fix_imports_test is almost done – the actual imports are mostly successful, with just a few annoying bugs in how it handles things with commas, like:
import http.client, winreg
and
from http import client, spam, eggs
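For the first of those two forms, the comma handling amounts to renaming each module in the list on its own. The sketch below uses the standard ast module to split the statement, which is an assumption made for brevity (the real fixer works on the lib2to3 tree), and RENAMES and backport_import are hypothetical names holding a tiny excerpt of the rename table.

```python
import ast

# Hypothetical excerpt of a one-to-one rename table (Python 3 -> Python 2).
RENAMES = {"http.client": "httplib", "winreg": "_winreg"}


def backport_import(line):
    """Rename each comma-separated module in a plain 'import a, b' statement."""
    node = ast.parse(line).body[0]
    if not isinstance(node, ast.Import):
        return line  # from-imports and other statements are left untouched
    parts = []
    for alias in node.names:
        name = RENAMES.get(alias.name, alias.name)
        if alias.asname is not None:
            name = "%s as %s" % (name, alias.asname)
        parts.append(name)
    return "import " + ", ".join(parts)


print(backport_import("import http.client, winreg"))
```

This handles the import statement only; code that goes on to use “http.client.something” would still refer to the old dotted name.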
The next step is getting the actual usage of those imported modules to work.
The next step after that, of course, is the fix_imports2 stuff. I think that this will still be split out into a separate module – otherwise, special cases would pollute the heck out of the code for the other one, and it would make debugging a nightmare.