Understand the process
Tue, 11/08/2009

Recently, Eric asked me to speak at one of our company meetings about the future of the OpenSpan platform. He didn’t want me to speak about releases or features, but rather about the platform as a whole. What is our vision for the OpenSpan platform?
This is actually a topic I think a lot about. For every feature we design and every release we plan, I ask the question: “How will this push the platform forward?” One of the things I think we’ve done very well at OpenSpan is to stay true to the principles that led us to create OpenSpan in the first place. When Stephen and I first started planning the platform, we each brought a distinct set of experiences to the table.
Stephen had spent the past several years creating visual programming environments. He believed strongly that the barrier that prevented more business analysts from programming was not the concepts of programming but rather the syntax of programming. Ultimately, many of the concepts programmers use every day, conditions and loops in particular, are familiar to every advanced Excel or Access user. Other concepts, such as types, casts, threads and objects, are either unnecessary for many tasks or presented in such a way as to be impenetrable to the lay person. Stephen was committed to providing an environment where business users could develop or participate in the development of automations.
I had spent the past few years creating custom desktop automation solutions for enterprise customers with diverse application requirements. I had become intimately familiar with the available APIs: COM libraries for Lotus Notes, Internet Explorer, Office, Siebel and others; MSAA and Windows functions for Win32 applications; HLLAPI, EHLLAPI, etc. for mainframe emulators. I had also become painfully familiar with the limitations of these APIs: narrow or incomplete APIs; little or no support for events; frustratingly inconsistent implementations of supposedly standard APIs. My experiences had convinced me that there was a better way to automate these applications.
In our very first face-to-face conversation, Stephen and I decided two things: OpenSpan would be accessible to both developers and non-developers, and OpenSpan would be completely reliable. If you clicked a button with OpenSpan, it would click every time.
I would be lying if I claimed that we knew that these principles would lead us to our current automation engine and injection library. Some ideas, like controls, targets and match rules, evolved quickly. Other ideas, like asynchronous links and keys, evolved in response to customer requirements. Whatever happened, though, we stayed true to our principles. When we reached the limits of message hooks and accessibility interfaces, we scrapped them and wrote an injection and hooking library from the ground up. When we encountered a control we didn’t support, we reverse-engineered the platform to get at the real object. When a public interface didn’t supply the events we needed, we made our own with some well-placed hooks.
Over time, some new principles evolved:
So what is our vision for the OpenSpan platform?
OpenSpan was built from the ground up to be an accessible and reliable desktop automation platform. Those two simple words, accessible and reliable, have directly guided the evolution of our most valuable IP: automation and injection. Our vision is to extend those two technologies into new areas. We are actively working on new ways to expose them to developers, including our upcoming Visual Studio plug-in, Lotus Expeditor container and translator SDK. We are also working to apply these technologies in new environments. In the future, we will move beyond the desktop to provide automation and injection on the server. When we do, you can be sure it will be just as accessible and reliable as our current platform.
In 4.1 we introduced a new set of methods: PerformSynchronousClick, PerformSynchronousDoubleClick and PerformSynchronousRightClick. These methods are currently available only in our Windows adapter, but will be extended to other technology stacks in the future. These new methods complement our existing PerformClick, PerformDoubleClick and PerformRightClick. So why do we need synchronous versions of these methods? And why are the existing methods not synchronous? What does synchronous mean in this context anyway?
To understand these functions, let’s take a step back and talk about the Windows message loop. All Windows programs (at least those with a user interface) feature some variation of the following:
    MSG msg;
    /* GetMessage returns 0 for WM_QUIT and -1 on error, so loop only while it is positive. */
    while (GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
This simple code snippet is the prototypical Windows message loop. Essentially, GetMessage asks Windows for a message and waits. When Windows has a message (such as a keyboard or mouse message), GetMessage returns. The caller then passes the message to TranslateMessage (which turns keyboard messages into char messages), and then DispatchMessage (which passes the message to the actual destination window).
However, not all messages are returned by GetMessage. Only posted messages (input messages and messages posted using PostMessage) are returned by GetMessage. So what happens to sent messages (messages sent using SendMessage or one of its variants)? These messages are dispatched synchronously to the destination window by GetMessage before it returns. The sequence looks like this:
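To make the posted versus sent distinction concrete, here is a small, portable C model of a message queue. It is only a sketch under stated assumptions, not the Win32 API: SimQueue, sim_post, sim_send, sim_get_message and sim_run_loop are hypothetical names, and unlike a real cross-thread SendMessage, sim_send does not block the sender until the message has been handled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not part of Win32. */
enum { MAX_MSGS = 16 };

typedef struct {
    int posted[MAX_MSGS];       /* posted messages, returned to the loop */
    size_t posted_head, posted_tail;
    int sent[MAX_MSGS];         /* sent messages, dispatched inside "GetMessage" */
    size_t sent_head, sent_tail;
    int handled[2 * MAX_MSGS];  /* order in which the window procedure saw messages */
    size_t handled_count;
} SimQueue;

/* Stands in for the destination window's window procedure. */
void sim_window_proc(SimQueue *q, int msg)
{
    assert(q->handled_count < 2 * MAX_MSGS);
    q->handled[q->handled_count++] = msg;
}

/* Models PostMessage: queue the message and return immediately. */
void sim_post(SimQueue *q, int msg)
{
    assert(q->posted_tail < MAX_MSGS);
    q->posted[q->posted_tail++] = msg;
}

/* Models a pending sent message (simplified: the sender does not block). */
void sim_send(SimQueue *q, int msg)
{
    assert(q->sent_tail < MAX_MSGS);
    q->sent[q->sent_tail++] = msg;
}

/* Models GetMessage: deliver pending sent messages straight to the window
 * procedure first, then hand the next posted message back to the caller.
 * Returns 0 when nothing is left (the real call would block instead). */
int sim_get_message(SimQueue *q, int *out)
{
    while (q->sent_head < q->sent_tail)
        sim_window_proc(q, q->sent[q->sent_head++]);
    if (q->posted_head == q->posted_tail)
        return 0;
    *out = q->posted[q->posted_head++];
    return 1;
}

/* Models the message loop; the call in the body stands in for DispatchMessage. */
void sim_run_loop(SimQueue *q)
{
    int msg;
    while (sim_get_message(q, &msg))
        sim_window_proc(q, msg);
}
```

In this model, if message 1 is posted and message 2 is then sent before the loop runs, the window procedure sees 2 first and 1 second, because the sent message never passes through the loop body at all.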
In truth, it gets a little more complicated than this, particularly when you’re dealing with another application’s message loop. Applications can and do add all sorts of additional processing to the message loop to do special input filtering, show modal dialogs, etc. Moreover, simulating even the simplest click can involve over twenty messages, some sent, some posted, depending on the styles of the destination window.
Unfortunately, directly sending messages to the destination window will not yield the correct behavior, because if the application does any special message processing in the message loop we will have bypassed it entirely. Moreover, we never know when one of the twenty messages we send will result in a long-running application process that we don’t want to wait for. For these reasons, prior to 4.1, we did not provide any synchronous click functions at all. We filled the message queue and let ’er rip.
For most scenarios, this works fine. OpenSpan always waits for new controls to be created using our implicit WaitForCreate methods, so there are very few instances where we need to wait for the actual result of a click. However, one of those situations involves our prototypical demo application: Calculator. Using the standard click methods, the following automation will return 0 or 4 rather than the desired 8. With the information above, can you understand why?
That’s right, each step of the automation doesn’t wait for the previous step to complete. Thus, the automation gets the text before Calculator has updated the text field with the final result. The new synchronous click methods do wait for all of the click messages to be processed. Thus, the automation below will return 8 every time.
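The race can be reproduced in a toy, single-threaded C model of Calculator computing 4 + 4. All of the names here (SimCalc, click_async, click_sync, calc_process_one, read_display) are hypothetical and are not OpenSpan methods. The real application drains its queue on its own thread rather than when we call calc_process_one, so this sketch only illustrates the ordering between queuing clicks and reading the display.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical. */
typedef struct {
    char queue[16];     /* button clicks queued but not yet processed */
    size_t head, tail;
    int display;        /* value shown in the text field */
    int acc;            /* left operand remembered when '+' is pressed */
} SimCalc;

/* The application's message loop handles one queued click. */
void calc_process_one(SimCalc *c)
{
    char key;
    if (c->head == c->tail)
        return;
    key = c->queue[c->head++];
    if (key >= '0' && key <= '9')
        c->display = key - '0';            /* single-digit entry only */
    else if (key == '+')
        c->acc = c->display;
    else if (key == '=')
        c->display = c->acc + c->display;
}

/* Asynchronous click: queue the click and return immediately. */
void click_async(SimCalc *c, char key)
{
    assert(c->tail < sizeof c->queue);
    c->queue[c->tail++] = key;
}

/* Synchronous click: queue the click, then wait until it has been processed. */
void click_sync(SimCalc *c, char key)
{
    click_async(c, key);
    while (c->head != c->tail)
        calc_process_one(c);
}

/* Reads the text field as it is right now. */
int read_display(const SimCalc *c)
{
    return c->display;
}
```

Queuing '4', '+', '4', '=' with click_async and reading the display straight away yields 0 in this model, and 4 if the application has processed only the first click by then. The same four keys through click_sync yield 8.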
But what if clicking a button creates a modal dialog? Will OpenSpan wait for the modal dialog to be dismissed by the user before continuing? What if there is no user? What if I want to dismiss the dialog in my automation? Oh noesss!
Simmer down. The short answer is that we do not wait for the modal dialog to be dismissed. Modal dialogs are nothing more than a nested GetMessage loop inside the outer GetMessage loop. Usually the sequence looks something like this:
OpenSpan simply waits until step 4, when GetMessage is called again after the mouse up message and returns. Perceptive readers with knowledge of hooking may have guessed how we do this, but for everyone else, it’s sufficient to say that we ensure that each time an application calls GetMessage (or any of its variants such as PeekMessage) the next simulated message is always returned. Some other time, maybe I’ll talk about how we ensure that OpenSpan messages receive priority over other messages and are always processed in order. Until then, I’ll let everyone guess in the comments (everyone except OpenSpan employees, that would be cheating) and give the correct answer a free “Do It on the Desktop” t-shirt.
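For readers who want to see why a nested loop does not stall the click, here is a toy C model of the idea. It is not OpenSpan's implementation and gives nothing away about the t-shirt question: SimApp, hooked_get_message, run_modal_dialog and the MSG_ constants are hypothetical names, the model is single-threaded, and an empty queue returns 0 where the real GetMessage would block.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical. */
enum { MSG_MOUSE_DOWN = 1, MSG_MOUSE_UP = 2, MSG_DISMISS = 3 };

typedef struct {
    int queue[8];
    size_t head, tail;
    int last_simulated_returned;   /* the mouse up has been handed to the app */
    int click_complete;            /* the app asked for another message after that */
    int dialog_open;
    int dialog_open_at_completion; /* dialog state at the moment the click completed */
} SimApp;

void app_post(SimApp *app, int msg)
{
    assert(app->tail < sizeof app->queue / sizeof app->queue[0]);
    app->queue[app->tail++] = msg;
}

/* Stand-in for a hooked GetMessage: the click counts as complete the first
 * time the app asks for a message after the mouse up was returned. */
int hooked_get_message(SimApp *app, int *out)
{
    if (app->last_simulated_returned && !app->click_complete) {
        app->click_complete = 1;
        app->dialog_open_at_completion = app->dialog_open;
    }
    if (app->head == app->tail)
        return 0;                  /* the real call would block here */
    *out = app->queue[app->head++];
    if (*out == MSG_MOUSE_UP)
        app->last_simulated_returned = 1;
    return 1;
}

/* A modal dialog is a nested loop that runs until the dialog is dismissed. */
void run_modal_dialog(SimApp *app)
{
    int msg;
    app->dialog_open = 1;
    while (app->dialog_open && hooked_get_message(app, &msg)) {
        if (msg == MSG_DISMISS)
            app->dialog_open = 0;
    }
}

/* The outer loop: the button's mouse up handler shows a modal dialog. */
void run_main_loop(SimApp *app)
{
    int msg;
    while (hooked_get_message(app, &msg)) {
        if (msg == MSG_MOUSE_UP)
            run_modal_dialog(app);
    }
}
```

Posting a mouse down and a mouse up and then running the main loop marks the click complete on the dialog's first call for a message, while the dialog is still open. The automation is then free to dismiss the dialog itself.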
One of the questions we frequently get asked is “When should I use client-side integration instead of server-side integration?” Typically, we provide a list of scenarios where client-side integration is the only viable option:
Today, however, I’d like to make an argument I don’t think we make enough. For many scenarios, client-side integration is the best option, even when viable server-side options exist.
When?
When you’re trying to make your users more productive.
Why?
Because you’re spending your money making what they do easier.
You’re not spending your money exposing integration points for user oriented tasks. You’re not spending your money writing a new user interface that wraps or embeds older functionality. You are automating tasks. You are eliminating errors. You are making your user experience better.
In our industry we are accustomed to thinking that there is only one way to solve user productivity problems: a new user interface. Typically, this means jamming as much functionality as possible into a single window using mashups, portlets, frames, tabs, etc. Now your users don’t have to switch between different windows to do their job.
But was that really the problem? Instead of multiple windows, they now have everything in one place, either smashed together or separated into tabs or frames. They now have a new user interface to learn rather than the one they were accustomed to. Moreover, your application developers are now spending their time creating portlets, web services, etc. rather than adding new functionality.
But what if the user interface you already have is the best one for that particular business function? It was designed for that specific function. It has been through multiple improvement cycles. It does what you need it to do. Were you really unhappy with the user interface or were you unhappy that you have to switch back and forth when you copy and paste? Or that your workflow requires you to look up the user in three different applications? Or that it takes too long to place an order because the wizard was designed for the public website?
Why are there still so many host systems in the enterprise when Windows alternatives have existed for thirty years?
Because they’re faster than Windows systems, so your most productive, expert users don’t want to let go of them.
Why are there so many Windows systems in the enterprise when web alternatives have existed for 10 years?
Because their user interfaces are richer than those of web systems, so your most productive, expert users don’t want to let go of them.
When you’re trying to make your users more productive, server-side integration is a sledgehammer when you just need a hammer. Your server-side architects and developers should focus on the big integration problems: synchronizing data, backend transactions, exposing web services for your partners. User productivity is a different kind of problem that requires a different kind of solution.
OpenSpan allows you to improve user productivity without changing all of your applications. It allows your users to be more productive without having to be trained on an entirely new system. It allows your users to do what they always did, but faster and with fewer errors. It allows your application developers to focus on new features instead of integration. It’s not just the only solution, it’s the best solution.