by Peter Galli on June 29, 2009 07:29pm
As Microsoft continues to support and participate in open source communities, the company is again a proud sponsor of the annual O'Reilly Open Source Convention (OSCON), which is being held in San Jose from July 20 to July 24.
In addition to Microsoft having a booth on the show floor, Tony Hey, the Corporate Vice President for Microsoft External Research, will deliver a keynote address on Thursday, July 23, titled "Open Tools and Services on Microsoft Platforms," which will examine the far-reaching changes open research tools and services will have to support every stage of the research process.
Erik Meijer, one of Microsoft's Principal Architects, will also give a keynote talk on Friday, July 24, titled "Fundamentalist Functional Programming."
His talk will argue that fundamentalist functional programming - that is, radically eliminating all side effects from programming languages, including strict evaluation - is what it takes to conquer the concurrency and parallelism dragon.
Following his keynote, Erik is also presenting on using the LiveLabs Reactive Framework to democratize the cloud.
Vijay Rajagopalan, a Principal Architect in Microsoft's Interoperability group, is also giving a talk on Wednesday July 22 in the Product and Services Track, titled "Interoperability - Build Mission Critical Applications in PHP, Ruby, Java and Eclipse Using Microsoft Software & Services."
During his presentation, Vijay will talk about how Microsoft has delivered multiple technologies that focus on interoperability with non-Microsoft and Open Source technologies. He will also show how developers can, today, use Eclipse tools to build Silverlight applications that run on PCs and Macs, as well as how they can develop using combinations of PHP, Java and Ruby in addition to the standard Microsoft languages.
In addition to all the talking, we also expect to do a lot of "showing," and a number of product groups will be represented in the Microsoft booth, including folk from the Education, External Research, Open Source Technology Center, Interoperability and CodePlex parts of the company, all of whom will be giving technical demos and chatting to attendees.
An analyst/partner roundtable discussion is also on the cards, as is a broader interoperability discussion. You won't want to miss any of it.
by Garrett Serack on June 23, 2009 07:12pm
In the previous post, I discussed what it took to use PGO on the Windows PHP build. That led to me building automated build scripts...
"Anything that can be done for you, automatically, can be done to you, automatically." - David C. Wyland
First, I had to get the entire dependency stack into the mix. While some of the dependent libraries had VCProject files, some didn't. Worse, even if they had them, you couldn't tell with a degree of certainty that they were compiled with the same settings which would enable them to take advantage of PGO optimization.
I began taking each project, updating (or creating, using the Trace and mkProject tools) the Visual C++ project files that would use the same settings as the rest, and eventually came up with a solution file that had 74 projects in it - some of the projects generated more than one binary.
Next, I had to actually automate the process of creating the vcproject files. Once you've got the right dependencies, the PHP build process cranks out over 30 binaries when you include the PHP extensions that get built as part of the core.
After what seemed like a million compile-verify-tweak iterations, I had the tools that could generate VCProject files for the core PHP and all the extensions, provided it was all in the right place.
Next I wrote a .cmd batch script that went step-by-step, logging what it was doing along the way: checking out the source, compiling the dependent libraries, building the PHP makefile, and compiling PHP like the community did. It then switched to instrumentation, rebuilt the dependencies, built the stack, PGO-trained it with test data and some applications (WordPress, MediaWiki and phpBB), and finally relinked it with optimization.
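The step-by-step flow described above can be sketched as a list of named stages run in order, with logging and a stop on the first failure. This is only an illustration in Python: the actual scripts were written in .cmd and later JScript, and the stage names and the `runner` callback here are hypothetical.

```python
from typing import Callable, List

# Hypothetical stage names that mirror the steps described in the post.
STAGES: List[str] = [
    "checkout_source",
    "build_dependencies",
    "generate_php_makefile",
    "build_php_baseline",
    "switch_to_instrumentation",
    "rebuild_dependencies_instrumented",
    "build_php_instrumented",
    "train_with_workloads",
    "relink_optimized",
]

def run_pipeline(stages: List[str],
                 runner: Callable[[str], bool],
                 log: List[str]) -> bool:
    """Run each stage in order, logging progress; stop at the first failure."""
    for name in stages:
        log.append(f"start {name}")
        if not runner(name):
            log.append(f"failed {name}")
            return False
        log.append(f"done {name}")
    return True

# Usage: a stand-in runner that pretends every stage succeeds.
build_log: List[str] = []
succeeded = run_pipeline(STAGES, lambda stage: True, build_log)
```

Stopping at the first failed stage keeps the log short enough to show exactly where a fragile step broke, which matters when a full run rebuilds the whole dependency stack twice.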
I got the .cmd script almost working, but it was fairly fragile. At that point I decided to switch batch scripting strategies and, in about a week, rewrote the batch script in JScript, which was far more flexible, and a lot more reliable.
"The future always arrives too fast... and in the wrong order." - Alvin Toffler
During this process, I tweaked the generated build process quite a bit, adding a few more applications to the PGO training, which cranks the performance up more and more.
Now, I can add in more scripts to assist with the training pretty trivially, but it still takes some effort to package up an entire application like MediaWiki or WordPress and include it in the build process.
Even once I've added in an application, I end up doing a whole slew of comparative testing to see what impact it has on the final executables.
As time goes by, I'm sure there will be more tweaking to be done but, in all likelihood, any significant performance gains are going to be the result of some modification of the PHP codebase itself.
by Garrett Serack on June 17, 2009 03:47pm
Previously, I talked about using PGO in the PHP build process. In order to use it I had to observe...
"A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it." - The First Law of Mentat, quoted by Paul Atreides to Reverend Mother Gaius Helen Mohiam
Really, what I needed was a tool in two parts. The first would watch what happens during the build process, and the second would take that data and spit out some .vcproj files.
When I want to see what's happening on my own system I use ProcMon - a Sysinternals tool that monitors processes, what files they touch, what commands get executed, etc. I grabbed that and tried to watch what happens when you run NMake on the makefile when building PHP. It turns out that there are a few problems with that - ProcMon isn't very scriptable (making it tricky to automate) and, even if it were, it chops off command lines past a certain length in its log files.
I found nothing else that did quite what I needed, so I started thinking about how to write a tool that does the same thing. In the past I have used Detours (an API detouring library built by Microsoft Research) to build a couple of quick-and-dirty snoop/debugging tools. Starting with a sample that came from the Detours library, I cobbled together a tool that would watch a process and its children, recording every file written or read, every command issued, and dump it into an XML file which I could process later.
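To make the shape of that recorded data concrete, here is a minimal sketch of a process tree with file reads, writes and command lines, serialized to XML. It does not show Detours or the real tool; the `ProcessTrace` type and the element and attribute names are assumptions made up for illustration, since the post does not describe the actual XML layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProcessTrace:
    """One observed process: its command line, file activity and children."""
    command_line: str
    files_read: List[str] = field(default_factory=list)
    files_written: List[str] = field(default_factory=list)
    children: List["ProcessTrace"] = field(default_factory=list)

def to_xml(trace: ProcessTrace) -> ET.Element:
    """Serialize a process and, recursively, its child processes."""
    node = ET.Element("process", {"commandLine": trace.command_line})
    for path in trace.files_read:
        ET.SubElement(node, "read", {"path": path})
    for path in trace.files_written:
        ET.SubElement(node, "write", {"path": path})
    for child in trace.children:
        node.append(to_xml(child))
    return node

# Usage: nmake reads the makefile and spawns one compiler invocation.
compile_step = ProcessTrace("cl.exe /c main.c", ["main.c"], ["main.obj"])
build = ProcessTrace("nmake", ["Makefile"], [], [compile_step])
trace_root = to_xml(build)
trace_text = ET.tostring(trace_root, encoding="unicode")
```

Keeping the full, untruncated command line on every process node is the point: it is exactly the information that got chopped off in the ProcMon logs.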
At the same time, I began working on a tool that would generate .vcproj files from the data gathered during the make process. I first tried just putting together a tool which assembled the .vcproj XML file from what I knew about the layout of the project file but, as the build got trickier, it became harder to make sure the XML came out the way that Visual Studio expected. I turned to the Visual Studio SDK to see if there were any COM objects I could use to manipulate project files - there were, but they aren't documented in great detail, and they were really designed to be used inside Visual Studio for automation. Having scoured the planet, I found some examples of using the VCProjectEngine to generate project files.
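For a sense of what the first approach (assembling the project XML by hand) involves, here is a rough Python sketch. The element and attribute names approximate the Visual Studio 2008 .vcproj layout from memory and have not been validated against Visual Studio; the function name and the sample inputs are hypothetical. It is not the VCProjectEngine route that the tool eventually took.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

def make_vcproj(name: str,
                sources: Iterable[str],
                defines: Iterable[str]) -> ET.Element:
    """Assemble a minimal .vcproj-like tree (approximate layout, unvalidated)."""
    project = ET.Element("VisualStudioProject", {
        "ProjectType": "Visual C++",
        "Version": "9.00",
        "Name": name,
    })
    platforms = ET.SubElement(project, "Platforms")
    ET.SubElement(platforms, "Platform", {"Name": "Win32"})

    configs = ET.SubElement(project, "Configurations")
    config = ET.SubElement(configs, "Configuration", {"Name": "Release|Win32"})
    # Every project gets the same compiler settings, as the post requires.
    ET.SubElement(config, "Tool", {
        "Name": "VCCLCompilerTool",
        "PreprocessorDefinitions": ";".join(defines),
    })

    files = ET.SubElement(project, "Files")
    for src in sorted(sources):
        ET.SubElement(files, "File", {"RelativePath": src})
    return project

# Usage with illustrative inputs gathered from a build trace.
project = make_vcproj("php", ["main\\main.c", "Zend\\zend.c"], ["ZEND_WIN32", "PHP_WIN32"])
project_text = ET.tostring(project, encoding="unicode")
```

Even this small version hints at the fragility: every tool setting is a string attribute, so nothing checks that the result is something Visual Studio will load.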
For a couple of weeks solid, I worked on the tool to generate project files, compiling, testing, tweaking, etc. I finally reached a point where I could generate a complete project file that would compile php.exe and php5.dll. Having finally arrived at this point, I built PHP using PGO instrumentation, ran the bench.php script from the PHP source directory, and then re-linked the project. This first time, I saw about an 18% improvement in speed over the previous version!
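The instrument, train, re-link cycle described above can be sketched as an ordered list of commands. The linker switches `/LTCG:PGINSTRUMENT` and `/LTCG:PGOPTIMIZE` are the ones Visual C++ documents for PGO, and they expect object files compiled with `/GL`; the post does not say which exact flags or file names were used, so the helper and its arguments below are assumptions for illustration only.

```python
from typing import List, Tuple

def pgo_commands(objects: List[str],
                 output: str,
                 training_cmds: List[str]) -> List[Tuple[str, str]]:
    """Return (phase, command) pairs for one PGO cycle.

    Assumes the object files were already compiled with /GL.
    """
    objs = " ".join(objects)
    return [
        # Phase 1: link an instrumented binary that records profile data.
        ("instrument", f"link /LTCG:PGINSTRUMENT /OUT:{output} {objs}"),
        # Phase 2: exercise the instrumented binary with representative work.
        *[("train", cmd) for cmd in training_cmds],
        # Phase 3: re-link, letting the linker optimize using the profiles.
        ("optimize", f"link /LTCG:PGOPTIMIZE /OUT:{output} {objs}"),
    ]

# Usage: train with the bench.php script mentioned in the post.
plan = pgo_commands(["main.obj"], "php.exe", ["php.exe bench.php"])
for phase, command in plan:
    print(f"{phase:10s} {command}")
```

Because the optimized link only knows about code paths the training runs exercised, the choice of training workload largely determines how much is gained.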
"It ain't over 'til it's over, and maybe not then, either. " - Slovotsky's Law #29
Well, as anyone who's done software development will tell you, there's the moment when you finally get your program to do what you want under very controlled conditions, and then - quite some time later - there's the moment that you can give the fruits of that labor to someone else so they can do the same thing.
Now that I had passed the point where I'd finally proven that it was worth the effort to build a PGO-optimized version of PHP, I had to get it scripted so that it could be done in an automated fashion, not just on my computer or a computer in our Lab.
In the final part, I wrap up with the automation of the build and look to where we might go next in PHP.
by Garrett Serack on June 11, 2009 03:10pm
I talked about getting started in building the PHP stack in my last post, now I'm taking it...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth
A chance conversation I had last summer at OSCON with Trent Nelson - who was building Python on Windows - planted the seeds of how to get PHP on Windows optimized further. Trent was using the PGO features of Visual Studio to generate Python binaries that run faster.
Rather than spend a lot of time optimizing all the little bits of PHP itself, I thought that this would be an ideal way to improve the overall speed of PHP, provided I could find the right scenarios to train PHP with. Little did I know that finding the right scenarios wasn't the hardest part.
"I have not failed, I've just found 10,000 ways that won't work." - Thomas Edison
I had downloaded the source to the dependent libraries off the PHP wiki, checked out the PHP source code, and begun the process of adding in PGO support to the existing build process. This proved to be extremely difficult.
Even limiting the scope to just the core of PHP itself - without the dependent libraries - I ran into trouble trying to compile using PGO instrumentation and then re-linking after running some tests. The makefile that gets generated by the configure.js script (a JScript version of the automake configure script for the Windows platform) was just not built with what I had in mind.
I spent the better part of two weeks trying different approaches to tweaking the makefile so that I could use PGO to improve the PHP executable, but I kept running into roadblocks. Worse, the closer I got to a makefile that did what I wanted, the farther away from the current build process I was getting, and I wasn't sure that what I would end up with would even be close to what was being built today.
"Only the meek get pinched. The bold survive." - Ferris Bueller
I came to the conclusion that I'd have to build new Visual Studio project files from scratch. What worried me was that this would end up being a completely different build process, and I'd never get the community to abandon what was already working, so I'd better be able to rebuild these new project files easily.
I started looking (inside Microsoft and out) for any tools which generated Visual C++ project files. I found someone internally who had used some JScript to create project files from text files, but after some experimentation, I found this was nowhere near what I needed. What I really needed was a way to convert the generated Makefile into a .vcproj file - and not just 'wrap' it.
Once I found there was no such tool*, I began trying to figure out how to create one. I had this idea a few times in the last decade or so: watch how a program is compiled, and create a project file that does the same thing. Having tossed around the idea in my head before, I knew it wasn't going to be trivial but, without it, I couldn't do what needed to be done.
In Part III, I'll talk about the trouble with observing the build process.
by Garrett Serack on June 09, 2009 01:22pm
The last several months, I've been working very deeply with PHP - specifically, compiling the PHP core itself, and looking for avenues for optimization. This is the first of four posts about the journey I've been on with PHP.
"It is a bad plan that admits of no modification" - Publilius Syrus
I started working with building PHP itself about a year ago. Initially, I was trying to put together an environment to compile up the PHP stack so that I could do some debugging, and track down a few faults that we were encountering in some of the PHP applications that we were trying to modify to use the SQL Server PHP driver that the SQL Server team here at Microsoft was creating.
Once I began to work with the source code, I found out very quickly that on top of having a hard time recreating the exact same binaries that the community build process generated, there were a large number of dependent libraries that were available in binary-only form and which were kept in a zip file that was passed around from developer to developer. That seemed a little odd for an open-source project, but I can certainly understand that over time, unless someone is working hard to keep it all together, these things happen.
Around the same time, the community had started to invest time and effort to 'clean up' the dependencies for building PHP on Windows, and move towards supporting VC9 (Visual Studio 2008) as an officially supported compiler.
In order to help in this process, I built out some testing environments in our Lab, which would let me compile up PHP on Windows and Linux, in order to get decent and reliable test results which we could use to identify any shortcomings that we could then address. This included benchmarking not just the core PHP executable, but replicable and comparable testing of PHP applications such as WordPress, MediaWiki, Gallery and phpBB.
"I'm looking for a lot of men who have an infinite capacity to not know what can't be done." - Henry Ford
For PHP 5.3, Pierre (and others) had gone out and found up-to-date versions of all the dependencies, brought them together, and managed to get them compiling with VC6 and VC9. They had posted these in binary and source form to the PHP Windows Internals site, which allows anyone to rebuild the PHP stack on Windows and, theoretically, get the same results as the 'official' build.
Jumping in at that point was much easier than it had been, as all you had to do was download the binaries of the libraries, check out the source code, run a few commands at the command line and, presto, you had your PHP executables.
At this point Pierre and I played around with the build flags on VC9 and found some settings that gave some pretty significant improvements to the speed of PHP vs. the speed of the VC6 version - and a lot of speed improvements vs. the old 5.2x line of PHP.
In Part II, I'll talk about going one step further with optimization.
by Peter Galli on June 02, 2009 10:05am
There has been some renewed media interest in the NASA Space Act Agreement with Microsoft, which was signed earlier this year.
Microsoft signed that agreement so as to provide an umbrella framework of contractual terms that allows for a variety of cooperative projects. At the same time, we also entered into a project agreement that had an initial project to write code that would allow for the conversion of a lot of data about the planets (the planetary data system) and Martian survey information (LROC) into the World Wide Telescope (WWT) format.
WWT is an online program that pulls together images from across space and which lets users use their computer screens to traverse the universe.
Conversion of the information into the WWT format, Tessellated Octahedral Adaptive Subdivision Transform (TOAST), allows the data to be viewed in both the WWT client and a newly developed Silverlight version of the client that supports multiple platforms, while other clients can also implement TOAST. The WWT is a freely downloadable Windows client application.
TOAST displays flat images, like those from telescopes, on representations of spherical objects like planets and moons, on a computer screen. One of the primary reasons for adopting the TOAST format for WWT was that it can accurately render the celestial poles with little distortion.
The traditional Mercator projection that is used by Google Earth and Sky in Google Earth cannot accurately render terrain or the sky within 15 degrees of the poles. TOAST is able to accurately render the sky and polar regions of the sky, Earth and planets with little distortion, which was important to both WWT and NASA.
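To see why a Mercator-style projection struggles near the poles, here is a small sketch using the standard spherical Mercator formulas. These are textbook formulas and are not taken from WWT, TOAST or Google Earth; the function names are made up for illustration.

```python
import math

def mercator_y(lat_deg: float) -> float:
    """Vertical map coordinate for a latitude (unit sphere, spherical Mercator)."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))

def mercator_scale(lat_deg: float) -> float:
    """Linear stretch factor at a latitude, equal to 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# The stretch grows without bound as the latitude approaches 90 degrees.
for lat in (15, 45, 60, 75, 85, 89):
    print(f"{lat:2d} deg  y={mercator_y(lat):6.3f}  stretch={mercator_scale(lat):7.2f}x")
```

Both quantities diverge at the pole itself, so the polar regions cannot be drawn on a finite Mercator map at all, and the area just short of them is heavily stretched.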
The intent all along was also to make the conversion utilities that were developed for this initial project available under an open source licensed distribution. This is part of Microsoft's Open Edge strategy, which allows for the extension of the platforms that we provide to the community, by the community, in cooperation with the leading domain authorities - in this case NASA.
The WWT platform is the best astronomical data visualization technology available, and it makes sense that the most knowledgeable members of the community should be able to extend the platform with a variety of components under a mixture of licensing models.
As Alyssa Goodman, a Professor of Astronomy at Harvard University, says: "If only we could travel faster than the speed of light, we could leave our solar system, go past the nearby stars of our galaxy, leave the Milky Way and visit the many galaxies beyond. Until then more and more incredible telescopes, including this WorldWide Telescope, will continue to let us marvel at the wonders of the Universe."
by Mark Stone on June 01, 2009 11:15am
The 1.0 release of WinBioinfTools might seem like a modest event; as of this writing, the project has 44 downloads. High Performance Computing (HPC) is a small community, granted, and the number of HPC applications for bioinformatics is a small subset of that. Let's not confuse popularity with importance, however.
We use the phrase "mission critical" very frequently and somewhat casually within software development. In talking to a friend about the swine flu outbreak, I was reminded that the phrase has its origin in military history: an aspect of a mission so critical that failure in that aspect would result in the loss of life. In the developing world where medical infrastructure can be a fragile thing, information about the origins or genetic makeup of a virus can be vital. It can be mission critical.
Historically, the developing world has been dependent on developed western countries to do their research for them. Open source is beginning to level that playing field, though. Using a cluster environment and software projects like CoCoNUT for gene sequencing and comparison, even university research centers with modest x86 server environments can play in the HPC space. This is important because the research priorities for a university in a developing country may be very different from the research priorities of a major western research university. At its best, this is exactly the kind of lowering of barriers to entry that open source should facilitate.
For all its value, CoCoNUT has two significant limitations. Its license is an academic license, not a fully open source license. And it runs only on Linux/Unix systems. The latter is particularly important. Research scientists are not IT professionals, and they should not have to care about the underlying platform on which their software runs. The spirit of open source is to make software as widely available as possible, and there is no way to meet that spirit without including Windows Server among the target platforms. Mission critical demands no less.
So WinBioinfTools makes important steps forward on both fronts. The team at Nile University has released a GPL-licensed project that "contains a number of programs for Bioinformatics running over Windows Cluster running Windows HPC server 2008. The current version includes the CoCoNUT system for pairwise genome comparison, parallel global sequence alignment, and parallel BLAST."
This is a great example of a local software community using open source to make their needs a priority, and delivering a project that will benefit local software communities in other developing countries with similar needs. WinBioinfTools puts us one step closer to making scientific computing software platform neutral, and closer to making Windows Server a first class citizen in the open source world of HPC.