Saturday, March 31, 2012

Anzhelka Rev A boards have ARRIVED!

After a full 30 days of waiting, the Rev A quadrotor boards have arrived! As a reminder, these boards were ordered through IteadStudio's PCB prototyping service. Taking my first look at these boards, I get a sense of success in having a physical entity. It is really hard to see if all the traces are connected because of the white solder mask, but when I do the electrical testing on Monday I should be able to determine that easily. The solder mask looks decent, but the smaller fonts and text didn't come out at all. As some of you may know, at least one major thing was forgotten on this revision of the board: a ground plane. This has been fixed in the diagrams for Rev B. I also forgot to add a resistor on the DOUT pin of the MCP3208. Neither of these issues should cause any serious problems with this board. Take a look at the photos below.



Final thoughts
 If you are in a time crunch to get a set of boards, I would strongly recommend against ordering them through IteadStudio. I placed an order for some PCBs from SeeedStudio a week later and received them on the same day, at twice the quantity and for only about $10 more.

Stay tuned in the days to come. There is going to be lots of soldering work on the way. :)

Sunday, March 25, 2012

Motor Testing: PWM vs RPM

Today I was able to test the response of the motors to PWM inputs. I tested the Turnigy 2217 860Kv brushless outrunner (link). Note that the motor is black on the product page, but ours is red.

To test, I hooked the motor up to its ESC and a Parallax Propeller development board. Motor power is provided at 12 V from a computer power supply*, and the code slowly ramps the pulse width up in 10 uS increments. The code waits one second after each step so the motor can settle into its new speed, and then reads the RPM.

The up loop code:
repeat
 repeat i from 0 to 1000 step 10
  pwmoutput := i + 1000
  debug.str(string("PWM= "))
  debug.dec(pwmoutput)
  pwm.servo(ESC_PIN, pwmoutput)
  waitcnt(clkfreq + cnt)
  debug.str(string(9,9,"RPM= "))
  debug.dec(rpm.getrpm(0))
  DebugNewline

A couple of comments on the data:

  • Minimum power up was with a pulse of 1120 uS (microseconds), and resulted in 1800 rpm.
  • Maximum rpm was approximately 12000rpm.
  • The graph is definitely not linear, so coming up with a function that maps PWM to RPM would be difficult.

* I found that, with these motors in particular, if the throttle is suddenly reduced to off (1000 uS), the motor creates a surge on its power lines, which causes the computer power supply to shut off automatically. It works fine if the speed is reduced gradually.

Eagle Tree Brushless RPM Sensors

In order to determine the amount of thrust that a motor is producing, we must be able to measure the RPM of the motor. For a normal two wire DC motor, the only solution would be to use some sort of optical sensor to watch the motor rotate and count the number of rotations. Usually this is done with an IR sensor that either senses a black stripe on the can of the motor, or watches the propeller and detects its passing over the sensor.

With a brushless motor, we have a new option: monitoring the control pulses. A brushless motor has three control lines that go into the motor, and to make the motor spin the lines are pulsed in a specific order. The timing of the pulses determines the speed of the motor, and the pulses must match the position of the motor rotor. Modern brushless motor controllers (ESCs: Electronic Speed Controllers) have circuitry built into them that can automatically sense the position of the motor rotor and which could, in theory, be used by a host microcontroller to determine motor rotation speed. Unfortunately, most ESCs don't provide this sort of information to a host, so we'll have to measure the control pulses directly and infer the speed from them instead.

By tapping into a brushless motor control wire, we connect to the circuit shown below. In that figure, each arrow is a pulse sent by the ESC to the motor, and each pulse is sent once per rotation. Now, imagine our RPM sensor has a tap right at the B marker. Pulses 1, 4, 2, and 5 all pass through B, so that's 4 pulses per rotation. For the remaining two pulses, 3 and 6, the ESC sets B to be high impedance (i.e. unconnected). But for pulses 3 and 6 there is still a detectable pulse on B, simply because a high impedance input can "skip" the B coil and measure the voltage at com, which has a pulse. With that, a single rotor revolution produces 6 measurable pulses.

In the above image, there are two magnetic poles. I suspect that if there were four magnetic poles the sensor would produce twice as many pulses, if there were six poles it would produce three times as many, and so on. I don't have a motor to test this on, but this is an interesting thread on the RCgroups forum that has a custom circuit to measure brushless RPM.

Anyway, this is my current theory on why the number of pulses has to be divided by 6. I've also heard that outrunner motors have 6 poles, so that may be why. I'll have to take apart a motor and take a look at it to see what's up.

The Eagle Tree Brushless RPM sensor is the only ready made solution on the market. It's a very small device, priced at about $12 apiece, and it can sense the rotation rate of a single motor. It converts the brushless motor signals to a series of pulses where each pulse is equivalent to a rotation. We don't measure the signal directly from a brushless motor control line because there are very high voltages and back EMF from the motor, all creating a very nasty environment for our 3.3v microcontroller. So we use the Eagle Tree sensors instead.
The only Propeller source code that I could find came from a single forum post here. The code is underdocumented and very minimal, but it works. For the rest of this post I will analyze the object and show how it is used. I have attached a condensed version of the source code to the end of this post.

First, the circuit. In the object, it says
  Connect black to 3.3V, red to Gnd, white to prop pin  
And yes, it does work the way it is written. Despite going against standard color coding conventions, the red line is connected to ground and the black is to 3.3v (the Propeller VDD).

Next, connect both single red wires from the sensor to any two of the brushless motor lines. A single line would work, but I found that connecting both reduced erroneous readings when the motor is under load. In a simple test, with only one lead connected there was a variation of 15 RPS, while with two there was a variation of 10 RPS (and a slightly higher bias).
The object will work for up to eight sensors, each with its own Propeller I/O pin. These pins must be in a contiguous block of eight pins, regardless of how many sensors are actually used. The pins to use are declared by calling the setpins(_pinmask) function. Once the pins are set, call start to dedicate a new cog to the process.
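
To make the pin mask handling concrete, here is a small Python sketch of what the setpins loop in the listing further down does: it walks the mask from the least significant bit until it finds the first active pin, and that index becomes the shift. The name pin_shift is made up for this illustration and is not part of the object.

```python
def pin_shift(pin_mask: int) -> int:
    """Index of the lowest set bit in a 32-bit pin mask (32 if the mask is empty).

    Hypothetical helper that mirrors the repeat loop in setpins.
    """
    shift = 0
    for _ in range(32):
        if pin_mask & 1:      # lowest active pin found
            break
        pin_mask >>= 1
        shift += 1
    return shift


print(pin_shift(0b10_1001))      # mask for pins 0, 3 and 5
print(pin_shift(0b1111 << 8))    # four sensors on pins 8..11
```

The assembly side then shifts the masked input right by this amount, so the lowest active pin always lands in bit 0.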

The object times the number of clock cycles between rising edges of a pulse, and stores this value in the Pins[n] variable. Most users probably don't care about the absolute time, so the getrps function will return the rotations per second of the specified channel. This function accounts for the six pulses per revolution and the current system clock rate. It could be sped up a bit by doing the division and multiplication in assembly.
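
As a rough illustration of that conversion, the following Python sketch repeats the arithmetic of getrps: the clock frequency divided by six times the measured tick count. The function names and the 80 MHz default are assumptions for the example, and rpm_from_delta is my own variant for illustration, since the getrpm body is elided in the listing below.

```python
PULSES_PER_REV = 6  # pulses per rotor revolution, as discussed above


def rps_from_delta(delta_ticks: int, clkfreq: int = 80_000_000) -> int:
    """Rotations per second from the clock ticks between two rising edges.

    Returns -1 when there is no valid data (delta of zero), like getrps.
    """
    if delta_ticks == 0:
        return -1
    return clkfreq // (delta_ticks * PULSES_PER_REV)


def rpm_from_delta(delta_ticks: int, clkfreq: int = 80_000_000) -> int:
    """Rotations per minute; multiplies before dividing to keep precision."""
    if delta_ticks == 0:
        return -1
    return (60 * clkfreq) // (delta_ticks * PULSES_PER_REV)


# 133_333 ticks between pulses at 80 MHz is roughly 100 rotations per second.
print(rps_from_delta(133_333), rpm_from_delta(133_333))
```

Note that on the Propeller itself the values are 32 bit, so multiplying the clock frequency by 60 before dividing would overflow; the Python version does not have that limit.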

As currently written, the object supports up to 8 sensors, and regardless of how many you use the update time (minimum delta clock) will be the same since the code runs through the entire sequence on every iteration. You can remove the unused code, or if you have more than 8 motors (wow!) you can add more copies without too much loss of precision.

The (nearly complete) object. For the original, check out post four of this thread. For the latest and greatest, check out code.anzhelka.com.
Con
  Mhz    = (80+10)                                      ' System clock frequency in Mhz. + init instructions
   
VAR
  long  Cog
  long  Pins[8]
  long  PinShift                                          
  long  PinMask

PUB setpins(_pinmask)
'' Set pinmask for active input pins [0..31]
'' Example: setpins(%10_1001) to read from pin 0, 3 and 5
  PinMask := _pinmask
  PinShift := 0
  repeat 32
    if _pinmask & 1
      quit
    _pinmask >>= 1
    PinShift++ 
PUB start : sstatus
--- etc.
PUB stop
--- etc.
PUB getpinptr
  return @Pins
PUB getrpm(i) | delta
--- etc.
PUB getrps(i) | delta
'' Get the RPS of motor i (by index, not pin number)
'' Valid index range is 0-7
'' Returns -1 when no valid data
 if i > 7 OR i < 0 'Check Range
  return -1
 delta := Pins[i]
 if delta == 0
  return -1
 return (clkfreq / (delta*6))
DAT
        org   0
INIT    mov   p1, par                           ' Get data pointer
        add   p1, #4*8                          ' Point to PinShift
        rdlong shift, p1                        ' Read PinShift
        add   p1, #4
        rdlong pin_mask, p1                     ' Read PinMask
        andn  dira, pin_mask                    ' Set input pins

'=================================================================================

:loop   mov   d2, d1                            ' Store previous pin status
        waitpne d1, pin_mask                    ' Wait for change on pins
        mov   d1, ina                           ' Get new pin status 
        mov   c1, cnt                           ' Store change cnt                           
        and   d1, pin_mask                      ' Remove unrelevant pin changes
        shr   d1, shift                         ' Get relevant pins in 8 LSB
{
d2      1100
d1      1010
-------------
!d2     0011
&d1     1010
=       0010 POS edge
}
        ' Mask for POS edge changes
        mov   d3, d1
        andn  d3, d2

'=================================================================================

:POS    'tjz   d3, #:loop                       ' Skip if no POS edge changes
        mov   p1, par       ' Hub variable address
'Pin 0
        test  d3, #%00_0001  wz                 ' Change on pin?
        mov   d4, c1                            ' Copy :loop count value to d4
        sub   d4, pe0                           ' Subtract old count value from new count value (delta = c1 - pe0)
                                                ' If pos change:
if_nz   cmp   d4, mintim wc                     '  -> set C if d4 (delta count value) is less than minimum time
if_nz_and_nc wrlong d4, p1                      '  -> write the delta count value to the hub if not less than minimum time
if_nz_and_nc mov   pe0, c1                      '  -> Store POS edge change cnt (system clk time, not delta)
                                                ' If no pos change:
if_z    cmp   d4, maxtim wc                     '  -> set C if d4 (delta count value) is less than maximum time
if_z_and_nc wrlong zero, p1                     '  -> write zero to the hub if not less than maximum time

'Pin 1
        add   p1, #4
        test  d3, #%00_0010  wz                 ' ...
        mov   d4, c1
        sub   d4, pe1
if_nz   cmp   d4, mintim wc
if_nz_and_nc wrlong d4, p1
if_nz_and_nc mov   pe1, c1
if_z    cmp   d4, maxtim wc
if_z_and_nc wrlong zero, p1
'Pin 2
        add   p1, #4
        test  d3, #%00_0100  wz
        mov   d4, c1
        sub   d4, pe2
if_nz   cmp   d4, mintim wc
if_nz_and_nc wrlong d4, p1
if_nz_and_nc mov   pe2, c1
if_z    cmp   d4, maxtim wc
if_z_and_nc wrlong zero, p1
'Pin 3
--- etc.
        jmp   #:loop

fit Mhz                                         ' Check for at least 1µs resolution with current clock speed
'=================================================================================
mintim  long  3000
maxtim  long  10_000_000
pin_mask long %00_0000
shift   long  0
c1      long  0      
d1      long  0
d2      long  0
d3      long  0
d4      long  0
p1      long  0
pe0     long  0
pe1     long  0
pe2     long  0
pe3     long  0
pe4     long  0
pe5     long  0
pe6     long  0
pe7     long  0
zero    long  0
        FIT   496

Motor Testing: Volts vs RPM

Today I was able to test the motors under no load to get a general idea of how they work. I tested some Turnigy 860kV motors with a bench top power supply and the Eagle Tree RPM sensors. I used an RC airplane remote set at full throttle for all tests.



Volts   RPM     Amps
9       8640    0.6
10      9600    0.64
11      10560   0.685
12      11520   0.73
13      12480   0.78









From this graph, a few things are clear:
  • RPM and current vary linearly with voltage.
  • No load current is very low (~675 mA)
  • The kV rating is off by a constant factor, but it is a very good predictor of no-load speed.
Concerning the kV rating, I found that

expected rpm / measured rpm

came out to .895833 for every result. The measured kV can be calculated by

measured rpm / measured voltage

This resulted in a measured kV of 960, a full 100 rpm per volt more than the manufacturer-specified 860 kV.
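
The arithmetic can be checked with a few lines of Python. This is only a sketch that replays the measurements listed above; the variable names are my own.

```python
# (volts, rpm, amps) rows from the no-load test above
measurements = [
    (9, 8640, 0.6),
    (10, 9600, 0.64),
    (11, 10560, 0.685),
    (12, 11520, 0.73),
    (13, 12480, 0.78),
]
RATED_KV = 860  # manufacturer-specified rpm per volt

for volts, rpm, _amps in measurements:
    measured_kv = rpm / volts              # measured rpm / measured voltage
    ratio = (RATED_KV * volts) / rpm       # expected rpm / measured rpm
    print(f"{volts:>2} V: measured kV = {measured_kv:.0f}, "
          f"expected/measured = {ratio:.6f}")
```

Every row gives the same measured kV and the same ratio, which is what makes the rating a usable predictor once the constant factor is known.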

This test is important because it indicates that we can test the other characteristics of the motors (thrust, torque, etc.) and reliably extrapolate the results to different voltages.


Wednesday, March 14, 2012

So What has Luke been doing?

So these last couple of weeks I have been finishing up the circuit boards, sampling parts, and ordering. I sent the design off to the PCB manufacturer on March 1st. They said that the boards had a 4 day turnaround. I got an email from them on the 7th saying that the order had been shipped out to me; however, from the tracking number all I can see is that the package is "POSTING". If someone could help me understand what this means, it would be much appreciated.

Last week I ordered the $300 in parts needed so that we could populate our 5 circuit boards that are on the way. These components have already arrived and are ready to be used and tested.

For anyone who has never sampled parts, it is a godsend! Sampling parts allows you to receive nearly anything for free. If you need a box, you can most likely sample that. If you need some ICs, you can most likely get them. Here is an Instructable with a list of known places that sample and what they sample. With that information aside, I placed orders with Texas Instruments and Microchip to sample: TXB0108PWR, INA169NA, I2C EEPROM, MCP3208.

In Cody's post below you will notice that we are building a stand to test the thrust and torque of our motors so that we can learn their characteristics. One thing that he didn't mention is how we are going to be powering our test stand. We were able to come across a Corsair HX520W. This power supply has three 12V lines, each capable of delivering 18 amps max; however, if you tie all the lines together you can achieve a max of 40 amps. Just to let everyone know, because you can't find this info in the manual: one of the 12V rails is on the Molex connectors for hard drives and similar components, another is on the ATX12V connector for the motherboard, and the last one is on the PCI-Express power connectors. On the side, I am working on a board that these connectors plug into and that breaks out the 3V3, 5V, 12V, -12V, and 5VUSB rails with high current capability. (BTW, Molex also lets you sample components from them.)

Who are we using for our PCB manufacturing?
http://www.iteadstudio.com/

Would you recommend them?
Currently unsure. Stay tuned to find out.

If you guys have any questions be sure to leave us a comment or send us an email to either ilukester@anzhelka.com or srlm@anzhelka.com. We love questions.

Monday, March 12, 2012

Introducing: The Thrust/Torque Test Stand!

A good autonomous quadrotor needs to be able to measure, in real units such as kilograms and seconds, important aspects about itself such as orientation, motor thrust, acceleration, and so on. It's fairly easy to make a remote controlled quadrotor platform since the human in the loop can intuitively correct for many small errors, and our eyes are very good at collecting the necessary raw information. An autonomous quadrotor does not have this luxury, and must explicitly define each kinematic and dynamic equation. Most of the fundamental equations that a quadrotor must use will have some sort of general constant in them. This constant is meant to account for small errors in the system, differences between almost identical parts, and so on. Some of the most important constants, and the most difficult to measure, are the motor torque and thrust constants. This post will cover details about our motor thrust/torque test stand.

I posted the equations last week, but here they are again:




[If your browser is like mine and is messing up the equations, then the first is K_T = T / (rho*n^2*D^4) and the second is K_Q = Q / (rho*n^2*D^5) ]

Don't be intimidated by equations. Most of the quadrotor mathematics looks very difficult and impossible to understand, but it's my belief that understanding it is not too difficult. I'm writing a paper to address the subject in great detail, and I'll be covering each equation from first principles to implementation.

Above, we have the two equations that define how our motors affect our quadrotor system. The form that they are in now makes it convenient for us to measure the constants K_T and K_Q: if we can somehow measure the terms on the right hand side then we can figure out what the constants are.

But first, we need to cover some notation details. When a term is written with a subscript it will be denoted in this text with an underscore (e.g. K_T is K with subscript T), and exponents are denoted with a caret ( ^ ). Greek letters such as rho (the sideways 'p') will be spelled out with the standard English letters. Other notation follows standard mathematical practice. The terms used are:

K_T - Thrust constant
K_Q - Torque constant
T - Thrust
Q - Torque
rho - air density
n - motor shaft rotation rate
D - propeller diameter

From these equations, it is clear that we need to measure thrust, torque, air density, rotation speed, and the propeller diameter. Ideally, we would build a machine that, for any given motor and propeller combination, will automatically test and produce the constants, along with information about how the motor operates and its efficiency. Measuring rotor diameter, motor speed, and air density are all fairly easy to do, so we won't cover them here. The real challenge comes from measuring motor thrust and torque.
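
Once those five quantities are measured, the constants follow directly from K_T = T / (rho*n^2*D^4) and K_Q = Q / (rho*n^2*D^5). The Python sketch below shows the calculation; the function names are my own, and it assumes SI units with n in revolutions per second, which the equations themselves do not specify. The numbers in the example call are made up for illustration and are not measurements.

```python
def thrust_constant(thrust, rho, n, diameter):
    """K_T = T / (rho * n^2 * D^4)."""
    return thrust / (rho * n**2 * diameter**4)


def torque_constant(torque, rho, n, diameter):
    """K_Q = Q / (rho * n^2 * D^5)."""
    return torque / (rho * n**2 * diameter**5)


# Illustrative values only (assumed units: N, N*m, kg/m^3, rev/s, m).
k_t = thrust_constant(thrust=5.0, rho=1.225, n=100.0, diameter=0.254)
k_q = torque_constant(torque=0.08, rho=1.225, n=100.0, diameter=0.254)
print(k_t, k_q)
```

Whatever unit convention is chosen for n, the same convention has to be used later when the constants are plugged back into the flight code.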

Most of the people who measure motor thrust seem to be RC airplane people. They measure the motor thrust in order to properly size a motor for their airplanes. Most motor thrust test stands are simple: a motor is mounted at the end of a lever arm with the opposite end fixed in place, and an L bend that presses on a scale (similar to a sensitive bathroom scale). From there the hobbyist powers the motor, reads the force directly off of the scale, and ends up with a thrust measurement. Easy.

Calculating torque is a bit more complicated: the motor body needs to be mounted on a rotating axis that is directly in line with the motor shaft, and the torque along this axis needs to be measured. In my research I found only a few examples of a motor torque test stand. This is likely because on an airplane the torque that a motor produces can be considered negligible since the motor is so proportionally small. Unfortunately, this is not the case on a quadrotor. In fact, we rely on the torque to yaw the quadrotor vehicle.

To measure torque, most test stands had the motor mounted to a rod, which then had a lever arm attached that pressed on a scale. In the same way as with thrust, the force pressing on the scale can be read, and with the length of the lever arm the torque can be calculated.
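
The lever arm calculation is just force times distance. As a minimal sketch, assuming the scale reports mass in kilograms and the arm presses on it at a right angle, the conversion looks like this in Python (the function name and example numbers are hypothetical):

```python
G = 9.80665  # standard gravity, m/s^2


def torque_from_scale(scale_kg: float, arm_m: float) -> float:
    """Torque in N*m from a scale reading (kg) at the end of a lever arm (m)."""
    force_n = scale_kg * G      # convert the scale reading to a force
    return force_n * arm_m      # torque = force * perpendicular distance


# Example: a 50 g reading at the end of a 20 cm arm.
print(torque_from_scale(0.050, 0.20))
```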

We want our test stand to be different: we want to measure both thrust and torque simultaneously, and we want to do all the testing automatically. To do this, we will need a method to measure force (or pressure) without relying on a scale, since a scale would be difficult to connect to a microcontroller. We have decided to try using the Flexiforce pressure sensors. These sensors vary their resistance based on the amount of pressure, and resistance is very easy to measure with a microcontroller.

For motor speed we will be using Eagle Tree brushless RPM sensors, and we will use a contactless IR thermometer from Parallax. Our main control board will simply be the quadpower board that we have developed for our quadrotor. This has the advantage of being identical to what we will be flying, it will have the motor current and voltage sensing built in, and we'll be able to test the functionality of the board.

At this point, the motor test stand is almost completely built, and we are almost done with the hardware. The hardware is particularly complex because each joint needs ball bearings to make friction negligible, and there are some odd mechanical linkages that we need to account for.

I'll post more later this week when we (hopefully!) have a test stand up and running.

Monday, March 5, 2012

Current Status from Cody

I'd like to post a project status update. It's been a busy few weeks, and we are finally reaching the point where we are able to write software for the hardware. Luke has been working hard on the PCB that we will use for flight control, and I've been trying to get a handle on the math behind the quadrotor.

Quadrotor dynamics are very complex. It's difficult because you have four motors, all pointed in the same direction, and that's it. Yet the vehicle can move on three axes in translational motion, and can rotate about another three axes. A quadrotor must be able to convert the four motor inputs into these six different (and often conflicting) motions.

The simplest case is hovering in a single location. Effectively, this means that the translational axes are ignored. To hover, a quadrotor must produce a total amount of thrust equal to the weight of the vehicle. Using the classic equation

F = ma
we can rewrite it for our vehicle

F1 + F2 + F3 + F4 = mg

This says that the total thrust from the motors (also known as the force of the motors) must be equal to the mass of the vehicle multiplied by the gravity.
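
As a quick worked example of the hover condition, the Python sketch below splits the weight evenly over the motors. The even split is an assumption that only holds for a level, symmetric vehicle, and the function name and the 1.2 kg mass are made up for illustration.

```python
G = 9.80665  # standard gravity, m/s^2


def hover_thrust_per_motor(mass_kg: float, motors: int = 4) -> float:
    """Thrust in newtons each motor must produce so that F1+...+Fn = m*g."""
    return mass_kg * G / motors


# Example: a hypothetical 1.2 kg quadrotor.
print(hover_thrust_per_motor(1.2))
```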

The system is complicated by the fact that a quadrotor will need to be able to balance itself in flight. If a gust of wind blows from the side and rolls the quadrotor, it will no longer be a stable hover. The vehicle will need to increase the speed of some motors and decrease the speed of others so that the vehicle is rotated back into a vertical hovering state.

There is still a problem: each motor is producing torque on the body as well as thrust. This is manifested by the motor "attempting" to rotate the body due to the spinning mass of the motor and propeller, and the air resistance the propeller encounters. On a quadrotor frame we have two motors spinning clockwise and two spinning counterclockwise, and the torques cancel each other out. But if a motor's speed is increased to account for a tilt, then its torque on the frame will increase as well, causing the whole body to rotate.

And so now, when a gust of wind blows on the frame, we have to somehow correct the tilt without allowing the torque to yaw the frame. At other times we may want to yaw without tilting, and sometimes we want to do both simultaneously. But there is so much more beyond just the basic thrust, tilt, and yaw: we have to account for many factors such as free stream velocity (or air, in layman's terms) through the propellers, blade flapping, sensor failure, and the many small equations that support the main equations.

One of the most important things that is required to have a stable platform is to accurately characterize the chosen motor and propeller unit that will be flown. There are two equations that rely on this:



[If your browser is like mine and is messing up the equations, then the first is K_T = T / (rho*n^2*D^4) and the second is K_Q = Q / (rho*n^2*D^5) ]

These two equations relate the motor speed (n) to the thrust (T) or torque (Q). Thrust and torque are proportional to a few constants as well: the rotor diameter D, the density of air rho, and a motor specific constant K. It's this constant K that we need to measure.
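
Rearranged, the same equations predict thrust and torque from motor speed once K is known: T = K_T*rho*n^2*D^4 and Q = K_Q*rho*n^2*D^5. The Python sketch below shows both directions. The function names are my own, the units are assumed to be SI with n in revolutions per second, and the constants in the example are placeholders rather than measured values.

```python
import math


def thrust_from_speed(k_t, rho, n, diameter):
    """T = K_T * rho * n^2 * D^4."""
    return k_t * rho * n**2 * diameter**4


def torque_from_speed(k_q, rho, n, diameter):
    """Q = K_Q * rho * n^2 * D^5."""
    return k_q * rho * n**2 * diameter**5


def speed_for_thrust(thrust, k_t, rho, diameter):
    """Solve T = K_T * rho * n^2 * D^4 for the rotation rate n."""
    return math.sqrt(thrust / (k_t * rho * diameter**4))


# Placeholder constants, for illustration only.
K_T, K_Q, RHO, D = 0.1, 0.005, 1.225, 0.254
n = speed_for_thrust(3.0, K_T, RHO, D)
print(n, thrust_from_speed(K_T, RHO, n, D), torque_from_speed(K_Q, RHO, n, D))
```

Because speed enters as n^2, doubling the motor speed quadruples both the thrust and the torque.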

The test stand that we are building will be able to measure not only the output torque and thrust of the motor, but also the current consumption, voltage supplied, rotational speed, air density, and temperature of the motor body. These extra statistics will allow us to confirm that the relationship between thrust/torque and rotational speed is indeed quadratic (proportional to n^2, as the equations above state), and will allow the system to perform self health checks. For example, if a motor is consuming 30A of current while only moving at 5000RPM, and on the test bench those conditions produced a temperature of 90C, then the system can assume that the motor is likely overheating and take steps to account for it.

Over the past few weeks I've been working every day on this project, with as much as 10 hours a day devoted to it. There are so many different facets to this project that it is exhausting. I have worked on everything from the math, to reviewing the control board, to deciding the best method for writing the code, to designing the thrust/torque test stand, to working on all the project organization components. Luke and I have both been going full throttle on this, and it is exhausting.

The project is going fairly well. It's a bit slow since there is so much involved, but we are definitely getting a very good understanding of the complete system, from electronic components to the high level software. There aren't any major blocks in our path except time. We have so many different facets to focus on that we are constantly busy and working hard.

Sunday, March 4, 2012

Compiling HLL to PASM


Introduction

I've been looking for a way to write math intensive code for the Propeller that can run in a single cog. I need to write code to do floating point quaternion multiplication, and I don't want to have to write it in assembly. So, I posted a question on the Parallax forums about how to compile from a high level language (HLL) to readable Propeller assembly (PASM). I got two responses: PropBasic and PropGCC. I took a look at each of them, and did a side by side comparison of their features and the code they produce. GCC seemed to be better, mostly because it can do certain optimizations.

For the most part, I hope to use this information to write a quaternion math object. Quaternion math is computationally intensive: one quaternion multiplication has 12 floating point additions and 16 floating point multiplications. This is particularly relevant for the Propeller since it has no floating point hardware or multiplier hardware. Thus, I can't get away with ~28 source code lines of just the math; the code also has to include routines for the floating point operations.
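
To show where those operation counts come from, here is the standard Hamilton product written out in Python. This is a reference sketch on a desktop language, not Propeller code; the (w, x, y, z) component ordering and the function name are my own choices. Each of the four output components takes four multiplications and three additions or subtractions, giving 16 and 12 in total.

```python
def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,  # w
        aw * bx + ax * bw + ay * bz - az * by,  # x
        aw * by - ax * bz + ay * bw + az * bx,  # y
        aw * bz + ax * by - ay * bx + az * bw,  # z
    )


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(quat_mul(i, j))  # i * j = k
print(quat_mul(j, i))  # j * i = -k, since the product is not commutative
```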

The root of the problem is that most HLL compilers for the Propeller use an interpreter, rather than generating PASM directly. This is because the code space for each processor (cog) is so small: 496 32 bit words (instructions plus data). So it's very difficult to get a useful program in that amount of memory. Usually a compiler will store instructions in the hub RAM, EEPROM, external RAM, or an SD card. The compiler will place an interpreter in the cog RAM which fetches instructions and executes them. This allows for larger programs, but it is much slower since each instruction needs to be fetched.

With this testing I needed something that did not use a memory model, but just treated the available cog memory as unlimited and compiled directly to PASM. With this method it is up to the designer to make sure that the result fits in cog memory, and is interfaced to properly.

I evaluated the two languages that were suitable: PropBasic and PropGCC.

PropBasic

I used PropBasic version 00.01.14 (2011-07-26) to test with. The source code that I used is a simple program to multiply some numbers together, add them, and divide with them. This goes well with the math intensive but no I/O application that I need. It's probably not a good benchmark to use if you're going to be doing complicated serial communication or anything with delays, I/O, etc.

A side note about PropBasic: the syntax is a bit quirky. It requires that your code have only one operator/statement per line. So "num = a+b+c" is out. It's odd, but easy enough to work with.

DEVICE P8X32A, XTAL1, PLL16X
FREQ 80_000_000
num1 VAR LONG
num2 VAR LONG
num3 VAR LONG
result0 VAR LONG
PROGRAM Start
Start:
 DO
   num3 = num3 * num3
   num2 = num2 * num2
   num1 = num1 * num1
   result0 = num1 + num2
   result0 = result0 + num3
   result0 = result0 / num3
 LOOP
END

I used the following command to test with:
./PropBasic-bst.linux test.pbas

There don't seem to be any command line options to use. Anyway, that generated the following Spin file:

'{$BST PATH {REMOVED FROM POSTING}}
'' *** COMPILED WITH PropBasic VERSION 00.01.14 July 26, 2011 ***
'' This program tests the compiler for PropBasic.
'' Is result a command???
CON                                           'DEVICE P8X32A, XTAL1, PLL16X
  _ClkMode = XTAL1 + PLL16X
  _XInFreq = 5000000                          'FREQ 80_000_000
' num1 VAR LONG                               'num1 VAR LONG
' num2 VAR LONG                               'num2 VAR LONG
' num3 VAR LONG                               'num3 VAR LONG
' result0 VAR LONG                            'result0 VAR LONG
PUB __Program                                 'PROGRAM Start
  CogInit(0, @__Init, @__DATASTART)
DAT
          org       0
__Init
__RAM
          mov       dira,__InitDirA
          mov       outa,__InitOutA
          jmp       #Start
Start                                         'Start:
__DO_1                                        ' DO
          mov       __temp1,num3              '  num3 = num3 * num3
          mov       __temp2,num3
          abs       __temp1,__temp1 WC
          muxc      __temp3,#1
          abs       __temp2,__temp2 WC, WZ
  IF_C    xor       __temp3,#1
          mov       __temp4,#0
          mov       __temp5,#32
          shr       __temp1,#1 WC
__L0001
  IF_C    add       __temp4,__temp2 WC
          rcr       __temp4,#1 WC
          rcr       __temp1,#1 WC
          djnz      __temp5,#__L0001
          test      __temp3,#1 WZ
  IF_NZ   neg       __temp4,__temp4
  IF_NZ   neg       __temp1,__temp1 WZ
  IF_NZ   sub       __temp4,#1
          mov       num3,__temp1
          mov       __temp1,num2              '  num2 = num2 * num2
          mov       __temp2,num2
          abs       __temp1,__temp1 WC
          muxc      __temp3,#1
          abs       __temp2,__temp2 WC, WZ
  IF_C    xor       __temp3,#1
          mov       __temp4,#0
          mov       __temp5,#32
          shr       __temp1,#1 WC
__L0002
  IF_C    add       __temp4,__temp2 WC
          rcr       __temp4,#1 WC
          rcr       __temp1,#1 WC
          djnz      __temp5,#__L0002
          test      __temp3,#1 WZ
  IF_NZ   neg       __temp4,__temp4
  IF_NZ   neg       __temp1,__temp1 WZ
  IF_NZ   sub       __temp4,#1
          mov       num2,__temp1
          mov       __temp1,num1              '  num1 = num1 * num1
          mov       __temp2,num1
          abs       __temp1,__temp1 WC
          muxc      __temp3,#1
          abs       __temp2,__temp2 WC, WZ
  IF_C    xor       __temp3,#1
          mov       __temp4,#0
          mov       __temp5,#32
          shr       __temp1,#1 WC
__L0003
  IF_C    add       __temp4,__temp2 WC
          rcr       __temp4,#1 WC
          rcr       __temp1,#1 WC
          djnz      __temp5,#__L0003
          test      __temp3,#1 WZ
  IF_NZ   neg       __temp4,__temp4
  IF_NZ   neg       __temp1,__temp1 WZ
  IF_NZ   sub       __temp4,#1
          mov       num1,__temp1
          mov       result0,num1              '  result0 = num1 + num2
          adds      result0,num2
                                              '  result0 = result0 + num3
          adds      result0,num3
          mov       __temp1,result0           '  result0 = result0 / num3
          mov       __temp2,num3
          mov       __temp4,#0
          abs       __temp1,__temp1 WC
          muxc      __temp5,#1
          abs       __temp2,__temp2 WC, WZ
  IF_Z    mov       __temp1,#0
  IF_Z    jmp       #__L0004
  IF_C    xor       __temp5,#1
          mov       __temp3,#0
          min       __temp2,#1
__L0005
          add       __temp3,#1
          shl       __temp2,#1 WC
  IF_NC   jmp       #__L0005
          rcr       __temp2,#1
__L0006
          cmpsub    __temp1,__temp2 WC
          rcl       __temp4,#1
          shr       __temp2,#1
104:           djnz     __temp3,#__L0006        
105:           test     __temp5,#1 WZ          
106:    IF_NZ     neg      __temp4,__temp4         
107:    IF_NZ     neg      __temp1,__temp1         
108:  __L0004                             
109:           mov      result0,__temp4         
110:           jmp      #__DO_1           ' LOOP  
111:  __LOOP_1                            
112:           mov      __temp1,#0          'END  
113:           waitpne    __temp1,__temp1         
114:  '**********************************************************************  
115:  __InitDirA    LONG 000000_00000000_00000000_00000000  
116:  __InitOutA    LONG 000000_00000000_00000000_00000000  
117:  _FREQ      LONG 80000000  
118:  __remainder  
119:  __temp1     RES 1  
120:  __temp2     RES 1  
121:  __temp3     RES 1  
122:  __temp4     RES 1  
123:  __temp5     RES 1  
124:  __param1     RES 1  
125:  __param2     RES 1  
126:  __param3     RES 1  
127:  __param4     RES 1  
128:  __paramcnt    RES 1  
129:  num1       RES 1  
130:  num2       RES 1  
131:  num3       RES 1  
132:  result0     RES 1  
133:  FIT 492  
134:  CON  
135:   LSBFIRST             = 0  
136:   MSBFIRST             = 1  
137:   MSBPRE              = 0  
138:   LSBPRE              = 1  
139:   MSBPOST             = 2  
140:   LSBPOST             = 3  
141:  DAT  
142:  __DATASTART  

I think the compiler did a good job of being faithful to the original code, but I noticed some things:
1. Every source code line is in the .spin file as a comment, which is very helpful.
2. The multiplication and division are done inline, so each additional multiplication consumes 18 longs. They do share temporary variables, however.
3. All variables are stored in cog RAM, and user defined variables use the user defined name.
4. The compiler added the remnants of some serial communication code: three longs at "__RAM" and a constants block.
5. The code is nicely formatted straight from the compiler (although it uses spaces instead of tabs).

Propeller GCC

I used the most recent (and only) version posted on the GCC downloads page (v0_2_3 from 2012-02-08). The source program I used was the same as for the PropBasic test, except modified a bit for C.

#if defined(__propeller__)
#include <propeller.h>
#define int32_t int
#define int16_t short int
#else
#endif
int main()
{
    for(;;){
        volatile int num1, num2, num3, result0;
        num3 = num3 * num3;
        num2 = num2 * num2;
        num1 = num1 * num1;
        result0 = num1 + num2;
        result0 = result0 + num3;
        result0 = result0 / num3;
    }
}

I based it off the fft_bench.c demo, which is why it has the various preprocessor statements at the beginning. Note the use of the keyword "volatile" for the int declaration: without it the compiler simply optimized away everything into a simple jump loop.

Anyway, I used the following command to generate the code:

propeller-elf-gcc -Os -S -mcog -mspin test.c

The options do the following:
-Os: optimize code for minimum size
-S: stop after compiling and output the assembly source as a file
-mcog: use the cog memory model (put everything in a single cog)
-mspin: generate the resulting spin file

There is also the -mfcache option, but in this case it did not generate code any differently.

When run, it generated the following Spin code:

1:  '' spin code automatically generated by gcc  
2:  CON  
3:   _clkmode = xtal1+pll16x  
4:   _clkfreq = 80_000_000  
5:   __clkfreq = 0 '' pointer to clock frequency  
6:   '' adjust STACKSIZE to how much stack your program needs  
7:   STACKSIZE = 256  
8:  VAR  
9:   long cog '' cog that was started up by start method  
10:   long stack[STACKSIZE]  
11:   '' add parameters here  
12:   long param  
13:  '' add any appropriate methods below  
14:  PUB start  
15:   stop  
16:   cog := cognew(@entry, @param) + 1  
17:  PUB stop  
18:   if cog  
19:    cogstop(cog~ - 1)  
20:  DAT  
21:       org  
22:  entry  
23:  r0     mov     sp,PAR  
24:  r1     mov     r0,sp  
25:  r2     jmp     #_main  
26:  r3     long 0  
27:  r4     long 0  
28:  r5     long 0  
29:  r6     long 0  
30:  r7     long 0  
31:  r8     long 0  
32:  r9     long 0  
33:  r10     long 0  
34:  r11     long 0  
35:  r12     long 0  
36:  r13     long 0  
37:  r14     long 0  
38:  lr     long 0  
39:  sp     long 0  
40:       '.text  
41:       long  
42:       'global variable     _main  
43:  _main  
44:       sub     sp, #16  
45:  L_L2  
46:       mov     r5, #4  
47:       add     r5, sp  
48:       mov     r6, #8  
49:       add     r6, sp  
50:       mov     r7, #12  
51:       add     r7, sp  
52:       rdlong     r0, r5  
53:       rdlong     r1, r5  
54:       call     #__MULSI  
55:       wrlong     r0, r5  
56:       rdlong     r0, r6  
57:       rdlong     r1, r6  
58:       call     #__MULSI  
59:       wrlong     r0, r6  
60:       mov     r6, #8  
61:       add     r6, sp  
62:       rdlong     r0, r7  
63:       rdlong     r1, r7  
64:       call     #__MULSI  
65:       wrlong     r0, r7  
66:       mov     r7, #12  
67:       add     r7, sp  
68:       rdlong     r7, r7  
69:       rdlong     r6, r6  
70:       add     r7, r6  
71:       wrlong     r7, sp  
72:       rdlong     r7, sp  
73:       rdlong     r6, r5  
74:       add     r7, r6  
75:       wrlong     r7, sp  
76:       rdlong     r0, sp  
77:       rdlong     r1, r5  
78:       call     #__DIVSI  
79:       wrlong     r0, sp  
80:       jmp     #L_L2  
81:  __MASK_0000FFFF     long     $0000FFFF  
82:  __TMP0     long     0  
83:  __MULSI  
84:       mov     __TMP0,r0  
85:       min     __TMP0,r1  
86:       max     r1,r0  
87:       mov     r0,#0  
88:  __MULSI_loop  
89:       shr     r1,#1 wz,wc  
90:   IF_C     add     r0,__TMP0  
91:       add     __TMP0,__TMP0  
92:   IF_NZ     jmp     #__MULSI_loop  
93:  __MULSI_ret     ret  
94:  __MASK_00FF00FF     long     $00FF00FF  
95:  __MASK_0F0F0F0F     long     $0F0F0F0F  
96:  __MASK_33333333     long     $33333333  
97:  __MASK_55555555     long     $55555555  
98:  __CLZSI     rev     r0,#0  
99:  __CTZSI     neg     __TMP0,r0  
100:       and     __TMP0,r0 wz  
101:       mov     r0,#0  
102:       IF_Z     mov     r0,#1  
103:       test     __TMP0, __MASK_0000FFFF wz  
104:       IF_Z     add     r0,#16  
105:       test     __TMP0, __MASK_00FF00FF wz  
106:       IF_Z     add     r0,#8  
107:       test     __TMP0, __MASK_0F0F0F0F wz  
108:       IF_Z     add     r0,#4  
109:       test     __TMP0, __MASK_33333333 wz  
110:       IF_Z     add     r0,#2  
111:       test     __TMP0, __MASK_55555555 wz  
112:       IF_Z     add     r0,#1  
113:  __CLZSI_ret ret  
114:  __DIVR     long     0  
115:  __DIVCNT     long     0  
116:  __UDIVSI  
117:       mov     __DIVR,r0  
118:       call     #__CLZSI  
119:       neg     __DIVCNT,r0  
120:       mov     r0,r1  
121:       call     #__CLZSI  
122:       add     __DIVCNT,r0  
123:       mov     r0,#0  
124:       cmps     __DIVCNT,#0 wz,wc  
125:   IF_C     jmp     #__UDIVSI_done  
126:       shl     r1,__DIVCNT  
127:       add     __DIVCNT,#1  
128:  __UDIVSI_loop  
129:       cmpsub     __DIVR,r1 wz,wc  
130:       addx     r0,r0  
131:       shr     r1,#1  
132:       djnz     __DIVCNT,#__UDIVSI_loop  
133:  __UDIVSI_done  
134:       mov     r1,__DIVR  
135:  __UDIVSI_ret     ret  
136:  __DIVSGN     long     0  
137:  __DIVSI     mov     __DIVSGN,r0  
138:       xor     __DIVSGN,r1  
139:       abs     r0,r0 wc  
140:       muxc     __DIVSGN,#1 wc  
141:       abs     r1,r1  
142:       call     #__UDIVSI  
143:       cmps     __DIVSGN,#0 wz,wc  
144:       IF_B     neg     r0,r0  
145:       test     __DIVSGN,#1 wz  
146:       IF_NZ     neg     r1,r1  
147:  __DIVSI_ret     ret  

Some things that I have noticed about this code:
1. The output lacks suitable comments, and the resultant code is rather difficult to understand. It doesn't use original variable names.
2. It creates a multiplication subroutine. This is slightly less efficient in execution time than putting it inline, but it is vastly more efficient on space.
3. The code stores variables in the hub, not the cog as expected.
4. The -Os option appears to be needed: with no optimization the output code is 192 lines. Interestingly, -O2 gives the same output as -Os.
5. The multiply routine ("__MULSI") is very compact (9 longs). Its run time looks bounded: the loop is only 4 instructions and runs at most once per bit, so at most it executes 32*4 = 128 instructions (each normally 4 clock cycles on the Propeller). I'm not sure how it works yet though (especially with a sign).
6. The divide routine is a bit more expensive: 51 longs. In its favor, though, the loop ("__UDIVSI") is as efficient as the multiply loop.
7. GCC isn't very efficient with memory by default: it creates a 256-long hub stack and a 16-long cog stack frame. This could probably be cleaned up manually.
8. It's missing a "FIT" statement at the end.
9. The generated code isn't very well formatted.

Next, I tried a slightly modified source:

#if defined(__propeller__)
#include <propeller.h>
#define int32_t int
#define int16_t short int
#else
#endif
int main()
{
    for(;;){
        int num1, num2, num3;
        volatile int result0;
        num3 = num3 * num3;
        num2 = num2 * num2;
        num1 = num1 * num1;
        result0 = num1 + num2;
        result0 = result0 + num3;
        result0 = result0 / num3;
    }
}


Note that the only variable marked volatile here is result0. I compiled with the same command as before:
propeller-elf-gcc -Os -S -mcog -mspin test.c

And got the following output:

1:  '' spin code automatically generated by gcc  
2:  CON  
3:   _clkmode = xtal1+pll16x  
4:   _clkfreq = 80_000_000  
5:   __clkfreq = 0 '' pointer to clock frequency  
6:   '' adjust STACKSIZE to how much stack your program needs  
7:   STACKSIZE = 256  
8:  VAR  
9:   long cog '' cog that was started up by start method  
10:   long stack[STACKSIZE]  
11:   '' add parameters here  
12:   long param  
13:  '' add any appropriate methods below  
14:  PUB start  
15:   stop  
16:   cog := cognew(@entry, @param) + 1  
17:  PUB stop  
18:   if cog  
19:    cogstop(cog~ - 1)  
20:  DAT  
21:       org  
22:  entry  
23:  r0     mov     sp,PAR  
24:  r1     mov     r0,sp  
25:  r2     jmp     #_main  
26:  r3     long 0  
27:  r4     long 0  
28:  r5     long 0  
29:  r6     long 0  
30:  r7     long 0  
31:  r8     long 0  
32:  r9     long 0  
33:  r10     long 0  
34:  r11     long 0  
35:  r12     long 0  
36:  r13     long 0  
37:  r14     long 0  
38:  lr     long 0  
39:  sp     long 0  
40:       '.text  
41:       long  
42:       'global variable     _main  
43:  _main  
44:       sub     sp, #4  
45:  L_L2  
46:       mov     r1, r7  
47:       mov     r0, r7  
48:       call     #__MULSI  
49:       mov     r7, r0  
50:       mov     r1, r4  
51:       mov     r0, r4  
52:       call     #__MULSI  
53:       mov     r1, r5  
54:       mov     r4, r0  
55:       mov     r0, r5  
56:       call     #__MULSI  
57:       mov     r6, r0  
58:       add     r6, r4  
59:       mov     r5, r0  
60:       mov     r1, r7  
61:       wrlong     r6, sp  
62:       rdlong     r6, sp  
63:       add     r6, r7  
64:       wrlong     r6, sp  
65:       rdlong     r0, sp  
66:       call     #__DIVSI  
67:       wrlong     r0, sp  
68:       jmp     #L_L2  
69:  __MASK_0000FFFF     long     $0000FFFF  
70:  __TMP0     long     0  
71:  __MULSI  
72:       mov     __TMP0,r0  
73:       min     __TMP0,r1  
74:       max     r1,r0  
75:       mov     r0,#0  
76:  __MULSI_loop  
77:       shr     r1,#1 wz,wc  
78:   IF_C     add     r0,__TMP0  
79:       add     __TMP0,__TMP0  
80:   IF_NZ     jmp     #__MULSI_loop  
81:  __MULSI_ret     ret  
82:  __MASK_00FF00FF     long     $00FF00FF  
83:  __MASK_0F0F0F0F     long     $0F0F0F0F  
84:  __MASK_33333333     long     $33333333  
85:  __MASK_55555555     long     $55555555  
86:  __CLZSI     rev     r0,#0  
87:  __CTZSI     neg     __TMP0,r0  
88:       and     __TMP0,r0 wz  
89:       mov     r0,#0  
90:       IF_Z     mov     r0,#1  
91:       test     __TMP0, __MASK_0000FFFF wz  
92:       IF_Z     add     r0,#16  
93:       test     __TMP0, __MASK_00FF00FF wz  
94:       IF_Z     add     r0,#8  
95:       test     __TMP0, __MASK_0F0F0F0F wz  
96:       IF_Z     add     r0,#4  
97:       test     __TMP0, __MASK_33333333 wz  
98:       IF_Z     add     r0,#2  
99:       test     __TMP0, __MASK_55555555 wz  
100:       IF_Z     add     r0,#1  
101:  __CLZSI_ret ret  
102:  __DIVR     long     0  
103:  __DIVCNT     long     0  
104:  __UDIVSI  
105:       mov     __DIVR,r0  
106:       call     #__CLZSI  
107:       neg     __DIVCNT,r0  
108:       mov     r0,r1  
109:       call     #__CLZSI  
110:       add     __DIVCNT,r0  
111:       mov     r0,#0  
112:       cmps     __DIVCNT,#0 wz,wc  
113:   IF_C     jmp     #__UDIVSI_done  
114:       shl     r1,__DIVCNT  
115:       add     __DIVCNT,#1  
116:  __UDIVSI_loop  
117:       cmpsub     __DIVR,r1 wz,wc  
118:       addx     r0,r0  
119:       shr     r1,#1  
120:       djnz     __DIVCNT,#__UDIVSI_loop  
121:  __UDIVSI_done  
122:       mov     r1,__DIVR  
123:  __UDIVSI_ret     ret  
124:  __DIVSGN     long     0  
125:  __DIVSI     mov     __DIVSGN,r0  
126:       xor     __DIVSGN,r1  
127:       abs     r0,r0 wc  
128:       muxc     __DIVSGN,#1 wc  
129:       abs     r1,r1  
130:       call     #__UDIVSI  
131:       cmps     __DIVSGN,#0 wz,wc  
132:       IF_B     neg     r0,r0  
133:       test     __DIVSGN,#1 wz  
134:       IF_NZ     neg     r1,r1  
135:  __DIVSI_ret     ret  

This is much better: the output no longer has a bunch of RDLONGs and WRLONGs (only result0 still goes through the hub), and is hence much more efficient. Previously, the main loop was 34 lines (many of which were hub accesses), and now it is 22 lines. The other comments still apply though. Also as before, -mfcache did not change the output code.

Conclusion

I think I will look into Propeller GCC more. It seems to do a good job of compiling down to efficient Propeller assembly, and the output isn't too hard to read. I hope that it will be improved over time as well. The PropBasic compiler has more understandable output, but its inefficient use of cog RAM and the lack of updates (no changes in 8 months) have me worried. Propeller GCC seems to fit my requirements.