So I started with the very first pure Python 3 package I found on PyPI (stagger). 3to2 works mostly fine on it, except for the class decorators used in id3.py: it doesn’t add a newline to the end of the assignment statement that implements the decorators in Python 2 syntax. “OK,” I thought, “I’ll file a bug report and add a few more test cases to the test suite for fix_classdecorator.” Except there was no test suite for fix_classdecorator.
The initial bug is simple: I forgot (or simply didn’t know) to wrap the assignment statement in a simple_stmt with a newline afterwards. But after adding a few test cases (several, actually), it turned out that the fixer wasn’t doing the language feature much justice: it indiscriminately threw out anything in parentheses after the name in the @decorator, along with the parentheses themselves.
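Here is a minimal sketch in plain Python 3 (not lib3to2 output) of the equivalence the fixer has to preserve. Class-decorator syntax only arrived in Python 2.6 (PEP 3129), so the fallback is to define the class and then rebind its name in a separate statement. When the decorator takes arguments, the call `register("v2.4")` is what returns the real decorator, so dropping the parenthesized part changes the meaning. The names `register` and `registry` are hypothetical.

```python
registry = {}

def register(tag):
    """Hypothetical decorator factory: record the class under `tag`."""
    def decorate(cls):
        registry[tag] = cls
        return cls
    return decorate

# Python 3 (and 2.6+) class-decorator syntax.
@register("v2.3")
class Frame23(object):
    pass

# Equivalent without class-decorator syntax: define the class, then rebind
# the name as its own statement (its own line). The argument list must be
# kept, because register("v2.4") returns the decorator that is applied.
class Frame24(object):
    pass
Frame24 = register("v2.4")(Frame24)
```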
*sigh*
I’m going to resolve my phone problem today and then start hunting for code for 3to2 tomorrow. Sorry about the long delay; I’ve just been trying to keep my mind off my issues.
So, I lost my Nexus One about a week ago, and I’ve been stressing out about it really bad. After tearing apart my house only to find out that it’s not here, I just have to assume that it’s gone. I haven’t been able to sleep right or think straight since.
I only had it for less than a month…
I think I’ve been a little too cautious about version naming. The “beta” tag is rather unnecessary at this point. I have rolled in a couple of changes and released the first non-alpha, non-beta version.
I wanted to do this before I started with bug hunting to emphasize the fact that lib3to2 is usable, though incomplete.
Today, I start hunting down native Python 3 code to test out lib3to2. It’s quite a bit earlier than the midterm evaluation, which was my target date to start this, but I don’t think that anything else I could do in the meantime would have a better impact than this.
For the next week, I will be searching for Python 3 code, running 3to2 on it and testing both versions to try and find as many lib3to2 bugs as possible. Unless the fix is obvious and quick, each bug will go into the tracker and a test case will be written. This accomplishes a few things:
- Most obviously, it seems like the best way to find lib3to2 bugs, including failure to warn when a fix is not made.
- Filling up the tracker seems to me like the best way to document what works and what does not.
- Less obviously, this will help figure out the popularity of certain Python idioms that are prohibitively difficult to refactor with lib3to2, such as “from urllib import request”. If it seems like every single large package uses such an idiom, it might be worth it to re-evaluate the decision not to refactor that.
As an aside, I do not consider an unfixed construct to be a lib3to2 bug, as long as it emits a proper, descriptive warning. If a fix would be possible within the framework of the lib2to3 API, then it could be considered for an “enhancement” entry on the tracker. I’ve probably written that before, but it is relevant here again.
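To gauge how common those hard-to-refactor idioms are, the counting could be automated with the standard `ast` module. The sketch below assumes a small, hypothetical, and incomplete table of py3k packages whose submodules split into several py2k modules. It counts “from pkg import submodule” and aliased “import pkg.submodule as name” statements. It is a survey helper only and is not part of lib3to2.

```python
import ast
from collections import Counter

# Hypothetical, partial table: py3k packages whose submodules map onto
# several py2k modules.
SPLIT_SUBMODULES = {
    "urllib": {"request", "parse", "error", "response", "robotparser"},
    "http": {"client", "server", "cookies", "cookiejar"},
}

def count_submodule_idioms(source):
    """Count 'from pkg import sub' and 'import pkg.sub as name' statements."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            # Absolute imports only; relative imports can't name the stdlib.
            subs = SPLIT_SUBMODULES.get(node.module, set())
            for alias in node.names:
                if alias.name in subs:
                    counts["from %s import %s" % (node.module, alias.name)] += 1
        elif isinstance(node, ast.Import):
            for alias in node.names:
                pkg, _, sub = alias.name.partition(".")
                # Only aliased forms bind the submodule to a new name.
                if sub in SPLIT_SUBMODULES.get(pkg, set()) and alias.asname:
                    counts["import %s as ..." % alias.name] += 1
    return counts
```

Running this over each package gathered during the hunt would give a rough tally, per idiom, to weigh against the decision not to refactor them.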
I’ll start with the Python 3 standard library, then work my way to PyPI, then I’ll do some Google Code Searching.
I am proud to announce the first beta version of lib3to2. Links:
3to2 for 2.7 (pypi)
3to2 for 3.1 (pypi)
Features of this release:
- Development on lib3to2 is now done in Python 3, and 2.x releases are provided by backporting with 3to2 itself.
- fix_imports2 is now mostly complete. While more testing needs to be done, it is highly representative of what it will look like when I make a full release.
- The 2.7 release will actually run on versions as far back as Python 2.5, if a sufficiently recent version of lib2to3 is provided (e.g., from the Python 2.7 source).
The issue tracker can be found on Bitbucket. Please post any bugs that you find with this release.
This week hasn’t been a great week as far as coding hours go; I’ve just been obnoxiously distracted. From the compulsion to get Brawl Minus working on Dolphin to playing with my new Nexus One, I haven’t worked as many hours as I should have. I’ll try to do better next week.
In spite of that, I did manage to reorganize the test cases in fix_imports2 and even get some of them to pass. I also optimized the pattern to match Python code that needs fixing; this results in a huge speed increase when running fix_imports2, as evidenced when running the tests.
Specifically, fix_imports2 now fixes cases of “from spam.ham import eggs” (no “as bacon” or multiple comma-separated modules yet) and “from spam.ham import *”. The framework is already there to extend this to the rest of the “from spam.ham import …” cases, which may end up being done today, or maybe not. At this point, it seems very reasonable to expect “from spam.ham import …” imports to be done by the end of next week.
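Here is a rough, string-level sketch of what a “from spam.ham import …” fix amounts to. Each imported name is looked up in a table of py2k homes, and one import is emitted per py2k module. The real fixer works on lib2to3 parse trees rather than strings. The table `URLLIB_REQUEST_NAMES` is a hypothetical, partial stand-in for fix_imports2’s mapping, and `split_from_import` is a made-up helper.

```python
# Hypothetical, partial mapping of urllib.request names to py2k modules.
URLLIB_REQUEST_NAMES = {
    "urlopen": "urllib2",
    "Request": "urllib2",
    "build_opener": "urllib2",
    "urlretrieve": "urllib",
    "pathname2url": "urllib",
}

def split_from_import(module, names, mapping):
    """Rewrite 'from <module> import <names>' as one import per py2k module."""
    by_target = {}
    for name in names:
        if name not in mapping:
            # A real fixer would warn here instead of guessing.
            raise ValueError("no py2k home known for %s.%s" % (module, name))
        by_target.setdefault(mapping[name], []).append(name)
    # Sort by target module so the output order is deterministic.
    return ["from %s import %s" % (target, ", ".join(found))
            for target, found in sorted(by_target.items())]
```

For example, `split_from_import("urllib.request", ["urlopen", "urlretrieve", "Request"], URLLIB_REQUEST_NAMES)` yields `["from urllib import urlretrieve", "from urllib2 import urlopen, Request"]`.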
“import spam.ham” variants along with usage will be the next subtask after that. I think that, from a pragmatic standpoint, a better approach than the one I’ve been considering would be just to change “import urllib.request” to “import urllib2, urllib”, then refactor uses of “urllib.request” on a case-by-case basis.
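The appeal of this approach is that the import statement can be rewritten without looking at any other code. Below is a minimal string-level sketch, assuming a hypothetical, partial table `PY2K_PROVIDERS` from each py3k module to every py2k module that supplies its names. It handles plain “import a.b, c” statements only (no “as”), and `fix_plain_import` is a made-up name.

```python
# Hypothetical, partial table: py3k module -> py2k modules providing its names.
PY2K_PROVIDERS = {
    "urllib.request": ("urllib2", "urllib"),
    "http.server": ("BaseHTTPServer", "SimpleHTTPServer", "CGIHTTPServer"),
}

def fix_plain_import(statement):
    """Turn 'import a.b, c' into imports of every py2k module behind a.b."""
    prefix = "import "
    if not statement.startswith(prefix):
        raise ValueError("not a plain import statement: %r" % statement)
    modules = []
    for name in statement[len(prefix):].split(","):
        name = name.strip()
        # Unknown modules are passed through unchanged.
        for target in PY2K_PROVIDERS.get(name, (name,)):
            if target not in modules:
                modules.append(target)
    return prefix + ", ".join(modules)
```

Some of the imported modules may end up unused. That is the trade-off for keeping the import fix independent of the code that uses it.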
My previous approach was, upon encountering “import urllib.request”, to find all references to “urllib.request” and refactor them, then import “urllib2” and/or “urllib” as used. If only names from “urllib2” were used, then only “urllib2” would be imported.
This way is overly complex and adds serious mental strain. I originally thought to try it this way because that’s closer to the way that Python logically ends up running such code: it sees “import urllib.request” (or “import urllib.request as billy”) and binds that name in the namespace. A change to that import statement (and the name it binds) will affect everything that references that specific name, and all references must be changed in tandem.
In theory, this approach is suitable for handling “import urllib.request as subtle”, and I can’t think of a better way to handle that case. However, I’ve already determined that binding a standard library module to a new name adds too much complexity and/or ambiguity. If “import urllib.request as subtle” needs both “urllib” and “urllib2”, which one gets the name “subtle”? Should lib3to2 just pick an arbitrary similar name (should it even be similar? how similar?) for the module that does not? Why rebind the second one to begin with? And as a follow-up: why rebind the first one if the second one has no good reason to be rebound? Should it be acceptable to replace this with “import urllib2 as subtle” when only names from “urllib2” are used, but with “import urllib, urllib2” in other cases? What about third-party usage of the name “subtle”? So that reasoning no longer applies. The same argument applies to “from urllib import request”, as it is essentially the same as “import urllib.request as request”. In the end, I was so committed to that idea that I did not take seriously enough the idea of refactoring the import statement and usage separately.
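The ambiguity is easy to demonstrate. The sketch below uses the standard `ast` module to collect every “subtle.<name>” reference and report which py2k modules those names come from. As soon as the result has more than one element, no single module can inherit the name “subtle”. The table `URLLIB_REQUEST_NAMES` is a hypothetical, partial mapping, and `py2k_modules_behind` is a made-up helper, not lib3to2 code.

```python
import ast

# Hypothetical, partial mapping of urllib.request names to py2k modules.
URLLIB_REQUEST_NAMES = {
    "urlopen": "urllib2",
    "Request": "urllib2",
    "urlretrieve": "urllib",
    "pathname2url": "urllib",
}

def py2k_modules_behind(source, alias):
    """Return the py2k modules needed by the 'alias.<name>' references."""
    needed = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == alias):
            target = URLLIB_REQUEST_NAMES.get(node.attr)
            if target is not None:
                needed.add(target)
    return needed
```

With “subtle.urlopen(url)” and “subtle.urlretrieve(url, path)” in the same file, the function returns `{"urllib", "urllib2"}`, and the name has to go.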
I think I will approach the next part of fix_imports2 (after the “from spam.ham import …” cases are done and slightly more thoroughly tested) assuming that “urllib.request.urlopen” actually means the name “urlopen” from the standard library “urllib.request” module that has already been imported, and that “import urllib.request” means to import everything that provides “urllib.request” functionality, even if it is never referenced later.
With that finished, fix_imports2 should have the level of completion that the other fixers have.
So I’ve started the rewrite of fix_imports2. Here’s what I’ve done so far:
- Condensed the list of py2k module members to only the documented ones. If I removed any important ones, let me know in the comments here.
- Added a few helper functions that will be integral to development.
- Added more tests, though I will be adding more before I end up implementing enough to pass any of the current ones.
The goal of fix_imports2 is to replace the import and usage of a single module with the import and usage of several. This has a bunch of implications. While adding more tests and thinking about all the ways that you can import stuff in Python, I thought of two issues that I would like to share with the rest of the class. I consult The Zen of Python to try to determine which is the “right way” and which is the “wrong way” to proceed, even though the whole module itself breaks one of the tenets (“in the face of ambiguity, refuse the temptation to guess”).
- “from http import server” and “import http.server as something_else” bind to a single name what could turn into multiple modules. fix_imports2 will attempt to disambiguate this. In the best case, this will end up in one of two ways:
1) Intentionally or accidentally, the code only makes use of members of the py3k module provided by a single py2k module, and it is possible to keep that name, or
2) The code makes use of names from the py3k module that are provided by multiple different py2k modules, so no single imported name can stand in for all of them.
The right way to fix 2) (“explicit is better than implicit”) is to bind neither module to the original name and replace each usage of the original name with the relevant one.
This turns 1) into an even more special case (“special cases aren’t special enough to break the rules”). I believe that the “right way” to resolve this is to remove the special bindings altogether (“there should be one– and preferably only one –obvious way to do it”).
- Importing standard library modules in a class namespace is really, really hard to deal with. Exposing the name “http.server”, or a simple renaming thereof, to methods and to derived classes means the same thing as the last bullet: you’re ascribing to one name what could end up being the amalgamation of multiple names. I’m talking about this:
class A(object):
    import http.server
For all modules that fix_imports2 deals with, don’t do this. It will cause errors if and only if (“errors should never pass silently”) a derived class references “http.server” in A’s namespace, and that reference could be in another module. Plus, if I did implement a fix for this in fix_imports2, it would involve going to every class derived from class A and checking every member function, along with every piece of code that uses instances of A or classes derived from A, which sounds prohibitively complicated and error-prone. And again, this would not fix code in other modules that references A.http.server.
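The following short Python 3 demonstration shows why this case is so hard. The class-level import binds “http” as a class attribute of A, so methods of derived classes, instances, and other modules (through “A.http.server”) can all reach it through attribute lookup. A fixer would have to follow inheritance and every attribute access to find them all. The class names and the method `handler_class` are hypothetical.

```python
class A(object):
    import http.server  # binds the name "http" in A's class namespace

class B(A):
    def handler_class(self):
        # Found through the instance and the class hierarchy (B -> A), not
        # through module globals: a bare "http.server" here would raise
        # NameError unless the module also imports http.
        return self.http.server.BaseHTTPRequestHandler
```

Code in a different module can equally write “A.http.server.HTTPServer”, and nothing in A’s own file would reveal that reference.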
I’ve done a lot of thinking about fix_imports2 (on and off for almost a year, and for the past couple of weeks). I keep coming back to one single thought: using “from urllib.request import spam, ham, eggs, …” at the outermost indentation level is the best way to ensure that everything will work properly after fix_imports2 is done with it (“simple is better than complex”). That case can be fixed independently of the code that makes use of spam, ham, and eggs, by a simple pattern and transformation, the way all other fixers are done.
All other cases require extra thinking (both by me and by the fixer) in varying degrees. “import urllib.request” requires looking for code that uses “urllib.request.something” and giving feedback to guide the transformation of “import urllib.request” into something else.* Conditional imports require following a dedent out of the suite but probably will not end up breaking anything, and “from urllib import request” and “import urllib.request as spam” are doable in a similar way to “import urllib.request”, if you don’t mind those different names going away. Importing modules into a class namespace is going to cause problems, and fix_imports2 will probably never handle it, solely because of the complexity involved in getting to a solution that would still cause errors.
* Or, it could just replace “import urllib.parse” with “import urllib, urllib2, urlparse” separately from code that uses urllib.parse, and just replace each usage piecemeal, potentially leaving those modules in the namespace unused. Actually, that sounds like a really good idea. Maybe I’ll do that next.
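Piecemeal usage replacement could look roughly like the sketch below. It finds each “<prefix>.<name>”, swaps in the py2k module that actually provides that name, and reports which modules ended up being needed. The prefix can be a dotted name like “urllib.request” or an alias like “request”. A regular expression stands in for lib2to3 tree matching, so unlike a real fixer it would also rewrite matches inside strings and comments. The mapping and the helper `rewrite_usages` are hypothetical.

```python
import re

# Hypothetical, partial mapping of urllib.request names to py2k modules.
URLLIB_REQUEST_NAMES = {
    "urlopen": "urllib2",
    "Request": "urllib2",
    "urlretrieve": "urllib",
}

def rewrite_usages(source, prefix, mapping):
    """Replace each 'prefix.name' with 'py2k_module.name'.

    Returns the rewritten source and the py2k modules used, in order of
    first use. Names missing from the mapping are left untouched.
    """
    used = []
    # The lookbehind stops "request" from matching inside "urllib.request".
    pattern = re.compile(r"(?<![\w.])%s\.(\w+)" % re.escape(prefix))

    def replace(match):
        name = match.group(1)
        target = mapping.get(name)
        if target is None:
            return match.group(0)  # unknown name: leave it for a warning
        if target not in used:
            used.append(target)
        return "%s.%s" % (target, name)

    return pattern.sub(replace, source), used
```

For example, `rewrite_usages("data = urllib.request.urlopen(url).read()\n", "urllib.request", URLLIB_REQUEST_NAMES)` gives `("data = urllib2.urlopen(url).read()\n", ["urllib2"])`. The import line itself can then be handled separately.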
So the main thing to take away from this is that “from urllib.request import a, b, c, d” at the outermost indentation level is the best way to write code that you want fix_imports2 to handle. And if I am to assume that reading this is a strong indicator that you write code that you want 3to2 to handle, then I imagine that this may actually be a relevant takeaway.
Sorry it took so long for me to post this.
This year, I am going to exclusively develop lib3to2 with Python 3. It’s mature enough now to provide its own 2.x versions. I will also try to make it play nicer with older versions of Python, even if they might not ship a compatible lib2to3. This will be my focus starting Monday, when coding starts.
After that, I will be working on rewriting fix_imports2, a non-trivial (in my opinion) omission that keeps me from saying “use 3to2, set it and forget it!” I expect this to be complete by the midterm evaluation time (“Checkpoint 1”).
Beyond that comes the most exciting part of this project: functional testing. At the completion of Google Summer of Code 2010, I intend for lib3to2 to be able to deliver under widespread use. To this end, unit testing alone does not cut it. My plan for doing this is:
- Find real-world Python 3 code. I am also prepared to craft some by auditing parts of the fixer suite that I feel are weak or prone to poor interaction with other fixers.
- Translate that code using lib3to2 and test the resulting code’s functionality in Python 2.7 against the original code’s functionality in Python 3.x.
- File appropriate tracker entries as soon as bugs are discovered, even if a fix seems trivial, and write automated tests for them at the same time.
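The comparison step could start as small as the harness below. It runs the original script under a Python 3 interpreter and the translated one under Python 2.7, then compares exit status and standard output. The interpreter names, such as “python3” and “python2.7”, are assumptions about the local setup. The 3to2 translation is assumed to have already produced the second file. The harness itself needs Python 3.7+ for `capture_output`.

```python
import subprocess

def run_script(interpreter, path):
    """Run one script and capture (exit status, stdout, stderr)."""
    proc = subprocess.run([interpreter, path], capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

def same_behavior(py3_interpreter, original, py2_interpreter, translated):
    """Compare exit status and stdout of the original and translated scripts.

    stderr is returned for inspection but not compared, since tracebacks
    and warnings legitimately differ between interpreter versions.
    """
    orig = run_script(py3_interpreter, original)
    trans = run_script(py2_interpreter, translated)
    return orig[:2] == trans[:2], orig, trans
```

A typical call would be `same_behavior("python3", "spam.py", "python2.7", "spam_py2.py")`. Any mismatch is a candidate tracker entry, together with the pair of scripts as the test case.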
After Checkpoint 1, no less than one week (until “Checkpoint 2”) will be dedicated to the above process. Unless lib3to2 exceeds all reasonable expectations at this point, this should fill up the tracker with issues to guide the rest of the process; predictions from this point on are very difficult to make.
If lib3to2 does turn out to perform exceptionally well under my stress testing, within reasonable boundaries, I will move on to finding a single pure Python 3 project and backporting it with lib3to2. I will document that experience on this blog and improve lib3to2 along the way as the need becomes evident. This is an unlikely scenario.
Applications for GSoC 2010 are open. I’m submitting a proposal for pushing lib3to2 further along.
Specifically, my proposal will focus on testing code I’ve already written, (re)writing some subpar fixers, and exploring lib3to2’s behavior on wild-type Python 3 code. Purely 2to3’d code doesn’t count here, but code that was ported to Python 3 with 2to3 and then developed further in Python 3, in what Martin v. Löwis aptly called a “burn-your-bridges fashion”, is fair game.
Edit: Woo-hoo, it was accepted! More involved post once I finish up final exams.