[Rpcemu] Improving CPU usage when idle
jake at waskett.org
Sun Apr 5 14:23:43 EDT 2009
The following is continued from a discussion at comp.sys.acorn.misc.
The subject of (part of) this discussion was improving the CPU usage
of RpcEmu when RISC OS is idle. I'm moving the discussion here, as it
seems a more suitable place. For those who are interested, the
original thread may be found at:
[Re the 'NANOSLEEP' SWI, I wrote]
>> Yes, I found a primitive program (called "Sleep") that makes use of it,
>> but oddly it seems to make the emulator unresponsive to mouse clicks.
>> It still responds to mouse movements (and alt-break), though, which
>> suggests that the emulator core is still working fine. I have a
>> feeling that it's related to (and magnifying the effect of) another
>> bug, in which mouse clicks are occasionally lost in ordinary usage. I
>> may be mistaken, but polling the button state seems to be the
>> underlying problem, and would be susceptible to thread timing issues --
>> queueing mouse events would be more reliable. Unfortunately I'm
>> unfamiliar with Allegro, so I don't know whether the event queue can be
>> accessed. It's fairly easy under SDL...
[And Theo Markettos replied]
> Congratulations on taking my bait ;-)
> Emulator timing is quite a tricky area, because it's running on an
> emulated 2MHz timer. In a real machine this increments uniformly, and
> things happen based on it (eg mouse double click timing). On an
> emulator 2MHz is a bit too fast if the system is at all loaded, so it
> increments in fits and starts. This may be what's confusing apps: in
> between your double click, Firefox is suddenly paged in, chews some
> centiseconds of CPU, and then paged out again. By which time the queued
> second mouse event is far too late to count as a double click.
I see what you're saying, but I wonder if there could be two separate
problems. I'm occasionally seeing lost single-clicks. Playing around
dragging boxes on the pinboard, I'd guess this happens about 1 time in
10-20. I also find that click-and-drags are often interpreted as
double-clicks, but this is more likely to be related to the timing
problem that you describe.
I think that the underlying problem is that the button state is polled
(see pollmouse in keyboard.c), and changes are detected through
comparison of the current and previous states. This means that it's
possible for clicks to be lost, if polling is too infrequent or
irregular. It also means that it's difficult to preserve the correct
delay between clicks.
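
To illustrate the failure mode, here is a small sketch in C. It is not
the actual pollmouse() code; count_presses_by_polling and the sample
array are made up for illustration. It detects presses the same way, by
comparing each sample with the previous one, and lets you vary how often
the state is sampled.

```c
#include <stddef.h>

/* Hypothetical stand-in for polled edge detection (not RpcEmu code).
   state[i] is the button state (0 = up, 1 = down) at time step i;
   stride is how many time steps pass between two polls. */
int count_presses_by_polling(const int *state, size_t n, size_t stride)
{
    int prev = 0;      /* state seen at the previous poll */
    int presses = 0;   /* number of up -> down transitions detected */

    for (size_t i = 0; i < n; i += stride) {
        int cur = state[i];
        if (cur && !prev)
            presses++;
        prev = cur;
    }
    return presses;
}
```

With the trace {0,1,1,0,0,0,0,0,0,1,1,0} (two short clicks), a stride of
1 reports two presses, while a stride of 4 samples only steps 0, 4 and 8,
sees the button up every time, and reports none.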
What we *ought* to do is to record (and queue) the event whenever
we're notified that the button state changes. We can then pass these
state changes to RISC OS through simulated interrupts. Fortunately
the exact timing of the events doesn't matter too much, as long as the
relative timing is preserved. This shouldn't be too hard to do if we
tag each queue entry with a timestamp. Most GUIs give us these events
(as well as keyboard events, which should be dealt with in a similar
way) through their event queue -- X11 and Win32 both work this way --
but Allegro is hiding this behind a sub-optimal abstraction which can
lose information. It's possible that we can make use of Allegro's
mouse_callback (and keyboard_callback, I guess), however.
> I think the sleep() call tramples all over this... because when sleeping
> you don't receive /any/ activity.
Yes, that's true, in a single-threaded app. (I previously made the
mistake of assuming that RpcEmu was polling the keyboard in a separate
thread, but it seems to be done in the main thread.)
> I did try changing it to usleep()
> some small amount and it was better, but still not good enough. Maybe
> it wants an equivalent of select() on the appropriate thread status
> (timer updates, mouse clicks)
Hmm, if it were a pure Win32 app or an X11 app, it would be fairly
easy to implement a select()-style approach. The difficulty is how to
do this portably, within the confines of Allegro. The best solution
that springs to mind at present is something like this:
First, create a fairly simple thread-safe queue implementation. Each
entry needs to contain: an event type (initially mousebuttonup or
mousebuttondown), some associated data (eventually keycode, mouse
position, etc), and a timestamp.
Second, install a callback for mouse (and eventually keyboard) input.
Each callback simply creates an event and appends it to our queue.
Third, modify pollmouse() to process the queue instead of polling the
mouse directly. (It'll also need to take account of the timestamps.)
Finally, the semantics of nanosleep() should be changed to something
like this:
* if our event queue is not empty, do nothing.
* else, sleep until an event is posted to the queue, or a timeout
occurs (simplification: treat a timeout as a special kind of queued
event). (This requires a small amount of platform-specific thread
sync code, unfortunately.)
* as long as callbacks are invoked from a separate thread (as I think
they must be), then this should work.
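
The steps above could be sketched roughly as follows. This is only a
sketch under assumptions: it uses POSIX threads for the platform-specific
sync (a Win32 build would need an equivalent), and every name in it
(struct evq, evq_post, evq_wait, EV_TIMEOUT) is invented here rather
than taken from RpcEmu or Allegro. The input callback would call
evq_post(); the sleep path would call evq_wait().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

enum ev_type { EV_MOUSE_DOWN, EV_MOUSE_UP, EV_TIMEOUT };

struct event {
    enum ev_type type;
    int data;            /* button number for now; keycode etc. later */
    uint64_t stamp_us;   /* host time at which the event was recorded */
};

#define EVQ_SIZE 64

/* Initialise with:
   struct evq q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                    .nonempty = PTHREAD_COND_INITIALIZER }; */
struct evq {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct event buf[EVQ_SIZE];   /* ring buffer */
    unsigned head, count;
};

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Called from the input callback. Returns 0 if the queue was full. */
int evq_post(struct evq *q, enum ev_type type, int data)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < EVQ_SIZE) {
        struct event *e = &q->buf[(q->head + q->count) % EVQ_SIZE];
        e->type = type;
        e->data = data;
        e->stamp_us = now_us();
        q->count++;
        ok = 1;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Returns at once if an event is pending; otherwise blocks until one is
   posted or timeout_ms elapses, in which case an EV_TIMEOUT is returned. */
struct event evq_wait(struct evq *q, long timeout_ms)
{
    struct event e = { EV_TIMEOUT, 0, 0 };
    struct timespec deadline;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q->lock);
    while (q->count == 0) {
        if (pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline)
                == ETIMEDOUT)
            break;
    }
    if (q->count > 0) {
        e = q->buf[q->head];
        q->head = (q->head + 1) % EVQ_SIZE;
        q->count--;
    } else {
        e.stamp_us = now_us();
    }
    pthread_mutex_unlock(&q->lock);
    return e;
}
```

A fixed-size ring buffer keeps the callback side free of allocation,
at the cost of dropping events if the consumer falls 64 events behind.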
> Allegro timer doc is here:
> rest() or rest_callback() appropriate - say if the callback func checked
> the other thread status. But it's not clear if that's actually yielding
> the CPU at any point. Otherwise perhaps something like:
I think rest() and rest_callback() are just thin wrappers around
sleep(). Calling sleep() isn't a bad solution, as long as we have another
thread available to receive input events. It's not quite as elegant
as simply select()ing on the X11 file descriptor (or doing the
Win32/MacOS equivalent), but it's probably the most portable approach.
> for (i=0;i<1000;i++)
> if (event_queue_has_something)
> but I'm not sure how efficient that is.
It's not all that great: sleeping for a microsecond doesn't give the
host's kernel much of a chance to do anything. It can switch to
another process, but it's too short an interval for it to be worth
entering a power-saving state.
>> If I can solve the input events problem, then writing an improved Sleep
>> program should be fairly straightforward. Actually, it would be rather
>> fun to write something for RISC OS again. Ideally, it would be good to
>> be told when the Wimp is idle and all tasks have idle events masked
> That's what the Sleep BASIC program does. Except it only knows about
> Wimp events, and not other mouse, keyboard events. It just grabs Idle
> polls and turns them into sleeps.
It's not *exactly* what the Sleep program does. The Sleep program polls
the Wimp for messages, and invokes the NANOSLEEP SWI whenever the
reason code is 0 (idle). But I think this is a bit too aggressive,
and I would expect it to interfere with the performance of other tasks
(eg taskwindows) that use idle to perform background processing. What
we really want to know is if a) the Wimp is idle, *and* b) no other
tasks are performing idle processing. Most tasks will mask out reason
code 0 if they don't intend to use it, so the Wimp should be able to
detect this state. Whether this helps us is of course another matter!
What we need is a way of guessing the system load. As long as this is
fairly accurate, it doesn't need to be perfect. If we guess that the
system is mostly idle, then we can nanosleep() to free up, say, 90% of
the CPU time. On the other hand, if the system seems to be busy, then
we shouldn't try to free up CPU time.
The simplest method for guessing the system load is to measure elapsed
time between Wimp_Polls that return reason code 0. If the period is
very small, then it's likely that not much else is happening in the
system. So, for example, if Wimp_Poll returns reason 0 ten times in a
row, and each time is less than a millisecond after the previous
event, we can probably assume that it's safe to nanosleep(). These
figures are obviously guesses, and better values could probably be
obtained through testing.
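
As a sketch of that heuristic (written in C for clarity, although the
real thing would live in a RISC OS task), something like the following
might do. The names and the two thresholds are placeholders taken
straight from the guesses above, not measured values.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_RUN_NEEDED 10     /* consecutive idle polls required (a guess) */
#define IDLE_GAP_MAX_US 1000   /* "less than a millisecond" (a guess) */

struct idle_estimator {
    uint64_t last_idle_us;   /* time of the previous idle poll */
    int run;                 /* length of the current run of quick idle polls */
    bool have_last;          /* false until an idle poll has been seen */
};

/* Call once per Wimp_Poll return. reason is the Wimp reason code
   (0 = idle) and now_us is the current time in microseconds.
   Returns true when the system looks idle enough to sleep. */
bool idle_estimator_update(struct idle_estimator *s, int reason,
                           uint64_t now_us)
{
    if (reason != 0) {
        /* A real event arrived, so something is happening: start over. */
        s->run = 0;
        s->have_last = false;
        return false;
    }
    if (s->have_last && now_us - s->last_idle_us < IDLE_GAP_MAX_US)
        s->run++;
    else
        s->run = 1;   /* first idle poll, or the gap was too long */

    s->last_idle_us = now_us;
    s->have_last = true;
    return s->run >= IDLE_RUN_NEEDED;
}
```

Start from a zeroed struct. A long gap between idle polls resets the
run, on the theory that some other task was using the time in between.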
>> Then the emulator could just sleep until an event occurred. I don't
>> think there's any way to get such a callback, though, so I guess I
>> could estimate the system load by measuring elapsed time between
>> Wimp_Polls with an idle event code. I would have thought that there
>> must be some kind of power management interface that could be leveraged
>> ... I wonder what the A4 did...
> Good thinking... I can't remember what the A4 did (it was something
> similar - 24MHz to 8MHz switching when the CPU was quiet). Maybe the
> source of the Portable module will have enough relics to work it out?
> <has a look>
> I think you're onto something there. The Wimp
> (castle/RiscOS/Desktop/Wimp/s/Wimp02 and Wimp07 in the ROOL sources)
> calls SWI XPortable_Speed to select fast or slow speed in the A4 or
> Stork (details in the RO3.1 or 6 PRM). In RO3.6 and later the SWI just
> gives an error, which the Wimp ignores. The kernel does the same for
> keypresses, in castle/RiscOS/Kernel/s/PMF/key. So if the emulator
> intercepts Portable_Speed and uses it to control some kind of idle
> control, that should work I think.
Now here's an interesting thing. According to the documentation for
the "Portable" module, there is a "Portable_Idle" SWI (0x42fc6), which
is defined to basically stop the CPU until something interesting
occurs. In other words, exactly the same semantics as a select() on
an Xlib socket (or MsgWaitForMultipleObjects in Win32).
I've modified the source code to "implement" a few Portable SWIs
directly in the emulator, but this seems to have no effect except,
oddly, causing a crash on shutdown (I think RISC OS is confused by the
presence of some Portable APIs but not all of them). It seems that
Idle is never called, in spite of being marked as available by
ReadFeatures. It could be that Idle was only added after RO4.02...
Ah, looking at the ROOL Wimp sources, it seems that Portable_Idle is
only called in one place, in Wimp03, in a conditional assembly block
for Stork. That's a shame.
Portable_Speed, however, is certainly being called in 4.02, and would
be perfect for gauging the system load.
A possible approach, then:
* Implement (as a minimum) mouse event callbacks & queueing, as
described above, to avoid lost clicks & to ensure accurate timing
for double-clicks, etc. This ought to make the BASIC "Sleep"
program usable by itself, but to get really good performance...
* Implement an RpcEmu-specific version of the "Portable" module (or
alternatively implement the SWIs in the emulator), the main purpose
of which is to capture the speed changes, from which we can
accurately infer system load. Question: is there a way to ensure
that this is loaded before the Wimp initialises?
* Implement a trivial app (or module task, perhaps) that uses
Wimp_Poll info, as well as the current 'speed', to selectively invoke
NANOSLEEP when CPU load is low enough.
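
For the second point, the load estimate could be as simple as the share
of recent time the guest spent asking for slow speed. The sketch below
assumes the emulator's SWI handler has already worked out whether a
Portable_Speed call selected slow or fast; it deliberately says nothing
about the SWI's register convention, which should be checked against
the PRM. All names here are hypothetical.

```c
#include <stdint.h>

struct speed_tracker {
    int slow;                   /* 1 while the guest has selected slow speed */
    uint64_t window_start_us;   /* start of the measurement window */
    uint64_t last_change_us;    /* time of the last speed change or reset */
    uint64_t slow_us;           /* completed slow time within this window */
};

/* Begin a new measurement window; the current speed carries over. */
void speed_tracker_reset(struct speed_tracker *t, uint64_t now_us)
{
    t->window_start_us = now_us;
    t->last_change_us = now_us;
    t->slow_us = 0;
}

/* Record that the guest selected slow (1) or fast (0) speed. */
void speed_tracker_set(struct speed_tracker *t, int slow, uint64_t now_us)
{
    if (t->slow)
        t->slow_us += now_us - t->last_change_us;
    t->slow = slow;
    t->last_change_us = now_us;
}

/* Percentage (0..100) of the window so far that was spent slow. */
unsigned speed_tracker_idle_percent(const struct speed_tracker *t,
                                    uint64_t now_us)
{
    uint64_t total = now_us - t->window_start_us;
    uint64_t slow = t->slow_us;

    if (t->slow)
        slow += now_us - t->last_change_us;
    return total ? (unsigned)(slow * 100 / total) : 0;
}
```

The third point would then compare this percentage against some
threshold before deciding how much of each idle period to sleep away.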
> Unless ROL have 'tidied' this from their tree, the same code is present
> right the way from RISC OS 3.1 to ROOL's CVS HEAD. So if the emulator
> supports it it should cover all OSs. That's a very neat idea!