Pulse compression
The most thorough explanation of pulse compression in radar applications I've found so far is Radar Systems Analysis and Design Using MATLAB by Bassem R. Mahafza (2000 edition). Repeating the relevant parts of Chapter 6 (I changed some notation for my own clarity), start with a rectangular pulse of duration 𝞽p:
s(t) = Rect(t/𝞽p) = 1 for 0 <= t <= 𝞽p, else 0
A pulse-compression waveform is a sinusoid of linearly increasing frequency (which means the phase increases quadratically), modulated by the Rect function above. An EM (electromagnetic) wave has displacements orthogonal to the direction of wave propagation (Z by convention) and can therefore be polarized, so the uncompressed signal in the time domain, su(t), can be expressed conveniently as a complex exponential:
su(t) = Rect(t/𝞽p) exp{ j 2𝞹(f0 t + (𝞵/2) t^2) }
Its real or imaginary part will be a sinusoid with increasing frequency, as shown below, for 𝞽p = 512 / 44.1 kHz, f0 = 200, 𝞵 = 0.5 * B/𝞽p, where the bandwidth B = 10 kHz.
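As a quick check of the formula above, here is a minimal sketch that samples the complex chirp and reads the instantaneous frequency off the phase step between neighboring samples. It uses only the Python standard library so it runs anywhere (NumPy would be the usual choice). The 44.1 kHz sample rate and the unit of f0 (Hz) are my assumptions, and the names chirp_sample and inst_freq are just illustrative.

```python
import cmath
import math

FS = 44_100.0          # sample rate in Hz (assumed from the 512 / 44.1 kHz pulse length)
N = 512                # number of samples in the pulse
TAU_P = N / FS         # pulse duration in seconds
F0 = 200.0             # start frequency, assumed to be in Hz
B = 10_000.0           # bandwidth in Hz
MU = 0.5 * B / TAU_P   # chirp rate as quoted in the text, in Hz/s


def chirp_sample(t):
    """Complex chirp exp{j 2 pi (f0 t + mu/2 t^2)} inside the Rect window, else 0."""
    if not 0.0 <= t <= TAU_P:
        return 0j
    return cmath.exp(2j * math.pi * (F0 * t + 0.5 * MU * t * t))


su = [chirp_sample(k / FS) for k in range(N)]


def inst_freq(k):
    """Frequency in Hz estimated from the phase step between samples k and k+1."""
    step = cmath.phase(su[k + 1] * su[k].conjugate())
    return step * FS / (2.0 * math.pi)


print(round(inst_freq(0)), round(inst_freq(N - 2)))
```

Under this phase convention the instantaneous frequency is f0 + 𝞵 t, so with 𝞵 as quoted the sweep runs from about 0.2 kHz to about 5.2 kHz, that is, 𝞵 𝞽p = B/2. Choosing 𝞵 = B/𝞽p instead would sweep the full B.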
Note that the frequency spectrum of the signal is NOT symmetric about DC, because the signal is complex. Suppose this fictitious wave (one that has polarization and travels at the speed of sound) hits 3 targets at distances of 3.9 m, 4 m, and 10 m from the speaker/receiver combo, with sonar cross-sections (a relative scale of how well the target back-reflects the sound wave) of 1, 1.5, and 2 respectively. The wave arriving at the receiver is then a superposition of the reflections from the 3 targets. Because the round-trip distances of the 1st and 2nd targets are close together, their reflected waves interfere and produce a non-constant magnitude at the receiver, as shown below.
Radar return power drops off as 1/R^4, so the returned wave from the 3rd target is tiny. Also note that each returned wave lasts as long as the transmitted rectangular pulse, so resolving the returned signal into a precise radial distance is challenging, and practically impossible when the returns from different targets overlap in time. In pulse compression processing, however, we cross-correlate the transmitted signal with the received signal (note the conjugation of FFTsr):
FFTsu = FFT(su[k]); FFTsr = FFT(sr[k])
corr[k] = IFFT(FFTsu · conj(FFTsr)) / length(sr[k])
where su[k] and sr[k] are the sampled sequences of the transmitted and received signals. The result is magical. The diffuse energy of the pulse is concentrated into the correlation peak, improving the signal-to-noise ratio by roughly the compression ratio, which is the time-bandwidth product B𝞽p. The correlation peak has a width of roughly c/B in path length, where c is the wave propagation speed. In this fictitious example, c/B works out to 3.3 cm, which is much smaller than the 10 cm radial separation of targets 1 and 2, so the 2 targets appear distinct. No wonder pulse compression is widely used in radar and sonar. My goal, though, is not to reinvent sonar but to estimate the distance of my phone from another device using the mic and speakers available on all modern smartphones.
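The 3-target example can be reproduced roughly with the sketch below, which builds the echoes, applies the FFT cross-correlation with the conjugate on the received spectrum, and reads off the strongest peak in a few range windows. Several things here are my assumptions rather than anything from the book: c = 343 m/s, the 44.1 kHz sample rate, f0 in Hz, and an echo amplitude of sqrt(cross-section)/R^2 to mimic the 1/R^4 power law. The small radix-2 fft helper is only there to keep the example free of dependencies.

```python
import cmath
import math

FS = 44_100.0            # sample rate in Hz (assumed)
N_PULSE = 512
TAU_P = N_PULSE / FS     # pulse duration in seconds
F0, B = 200.0, 10_000.0  # f0 assumed to be in Hz
MU = 0.5 * B / TAU_P     # chirp rate as quoted in the text
C = 343.0                # assumed speed of sound in m/s
NFFT = 4096              # power of two, long enough for the 10 m round trip


def chirp(t):
    """Complex chirp inside the Rect window, zero outside it."""
    if not 0.0 <= t <= TAU_P:
        return 0j
    return cmath.exp(2j * math.pi * (F0 * t + 0.5 * MU * t * t))


def fft(x, inverse=False):
    """Recursive radix-2 FFT; the inverse is left unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


# (range in m, cross-section) for the three targets in the text
targets = [(3.9, 1.0), (4.0, 1.5), (10.0, 2.0)]


def echo(t):
    """Sum of delayed chirps; amplitude sqrt(sigma) / R^2 is an assumed model."""
    return sum(math.sqrt(s) / r ** 2 * chirp(t - 2.0 * r / C) for r, s in targets)


su = [chirp(k / FS) for k in range(NFFT)]
sr = [echo(k / FS) for k in range(NFFT)]

product = [a * b.conjugate() for a, b in zip(fft(su), fft(sr))]
corr = [v / NFFT / len(sr) for v in fft(product, inverse=True)]

# With the conjugate on FFTsr, an echo delayed by d samples peaks at index -d (mod NFFT).
mag = [abs(corr[(NFFT - d) % NFFT]) for d in range(NFFT)]


def strongest_range(r_lo, r_hi):
    """Range in metres of the largest correlation peak between r_lo and r_hi."""
    lo, hi = int(2.0 * r_lo / C * FS), int(2.0 * r_hi / C * FS)
    d = max(range(lo, hi), key=lambda i: mag[i])
    return d / FS * C / 2.0


for window in [(3.80, 3.95), (3.95, 4.20), (9.0, 11.0)]:
    print(f"{strongest_range(*window):.2f} m")
```

Each window should report a peak within a couple of samples of the true range (one sample is about 4 mm of range at these settings), including separate peaks for the two targets that overlapped in the raw return.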
1 way sonar pulse compression
If I constrain myself to using the existing HW on a smartphone, the sound wave bandwidth should be limited to roughly B <= 20 kHz. Since the sound wave is a pressure oscillation in the direction of the wave propagation, the sound wave equation is real, which I write as a cosine:
su(t) = A_u Rect(t/𝞽p) cos{ 2𝞹(f0 t + (𝞵/2) t^2) }
where A_u is the sound amplitude. Because the pulse cannot be too long for the fast dynamics of a mobile game, I limit the pulse duration to < 50 ms. This time and bandwidth limit that I put on myself is a problem, as I explain below. The transmitted signal in time and frequency domain is given below. Note that the magnitude of the frequency spectrum is now symmetric about DC because the signal is real.
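The symmetry claim is easy to check numerically. The sketch below compares how much spectral energy lands in the negative-frequency bins for the real cosine chirp and for its complex counterpart. The pulse parameters are hypothetical values I picked to fit the limits above (2048 samples at 44.1 kHz is about 46 ms, sweeping 2 kHz to 18 kHz with 𝞵 = B/𝞽p); they are not the settings used for the plots.

```python
import cmath
import math

FS = 44_100.0               # sample rate in Hz (assumed)
N = 2048                    # hypothetical pulse length: 2048 / 44.1 kHz = 46.4 ms
TAU_P = N / FS
F0, B = 2_000.0, 16_000.0   # hypothetical sweep from 2 kHz to 18 kHz
MU = B / TAU_P              # assumed chirp rate so the sweep covers B
A_U = 1.0                   # sound amplitude


def fft(x):
    """Recursive radix-2 FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def phase(t):
    """Quadratic chirp phase 2 pi (f0 t + mu/2 t^2)."""
    return 2.0 * math.pi * (F0 * t + 0.5 * MU * t * t)


real_pulse = [A_U * math.cos(phase(k / FS)) for k in range(N)]
complex_pulse = [cmath.exp(1j * phase(k / FS)) for k in range(N)]


def negative_fraction(x):
    """Share of spectral energy in the negative-frequency bins N/2+1 .. N-1."""
    power = [abs(v) ** 2 for v in fft(x)]
    return sum(power[N // 2 + 1:]) / sum(power)


print(f"real: {negative_fraction(real_pulse):.3f}")
print(f"complex: {negative_fraction(complex_pulse):.3f}")
```

The real pulse splits its energy evenly between positive and negative frequencies, while the complex pulse keeps almost all of it on the positive side.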
In an indoor gaming environment, this sound has to compete with ambient sound such as people talking, TV noise, and the game music itself. I found that I had to keep the transmitted pulse fairly strong relative to the ambient level to detect pulses coming from roughly 10 m away. Even at 10x the ambient sound level, it was difficult to detect a putative source 10 m away (closer targets are no problem), as you can see in the cross correlation magnitude below.
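To get a feel for how the correlation peak behaves when the pulse is buried in noise, here is an idealized one-way sketch. It is not a model of the measurement above: the ambient sound is replaced by white Gaussian noise, there is no multipath or speaker/mic frequency response, the receiver is assumed to know exactly when the pulse was sent, and the pulse parameters (40 ms, 2 kHz to 18 kHz, c = 343 m/s) are hypothetical. A real room will look considerably worse than this.

```python
import cmath
import math
import random

FS = 44_100.0               # sample rate in Hz (assumed)
N_PULSE = 1764              # hypothetical 40 ms pulse
TAU_P = N_PULSE / FS
F0, B = 2_000.0, 16_000.0   # hypothetical sweep from 2 kHz to 18 kHz
MU = B / TAU_P              # assumed chirp rate so the sweep covers B
C = 343.0                   # assumed speed of sound in m/s
NFFT = 4096


def fft(x, inverse=False):
    """Recursive radix-2 FFT; the inverse is left unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def pulse(k):
    """Real cosine chirp at sample index k; zero outside the pulse."""
    if not 0 <= k < N_PULSE:
        return 0.0
    t = k / FS
    return math.cos(2.0 * math.pi * (F0 * t + 0.5 * MU * t * t))


def estimate_distance(true_distance, amplitude, noise_rms, seed=0):
    """One-way distance in metres from the strongest cross-correlation peak."""
    rng = random.Random(seed)
    delay = round(true_distance / C * FS)  # one-way delay in whole samples
    su = [pulse(k) for k in range(NFFT)]
    sr = [amplitude * pulse(k - delay) + rng.gauss(0.0, noise_rms)
          for k in range(NFFT)]
    product = [a * b.conjugate() for a, b in zip(fft(su), fft(sr))]
    corr = fft(product, inverse=True)
    # With the conjugate on FFTsr, a delay of d samples peaks at index -d (mod NFFT).
    mag = [abs(corr[(NFFT - d) % NFFT]) for d in range(NFFT // 2)]
    best = max(range(len(mag)), key=lambda d: mag[d])
    return best / FS * C


# Pulse amplitude 0.5 against unit-RMS noise: the pulse is well below the noise floor.
print(f"{estimate_distance(5.0, amplitude=0.5, noise_rms=1.0):.2f} m")
```

Even with the pulse weaker than the noise in the time domain, the correlation peak should still land at the right delay in this idealized setting, which is the time-bandwidth gain at work.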
Conclusion: borderline practical
For distances < 5 m, the xcorr SNR seems strong enough for a fairly robust distance measurement even with the signal volume turned all the way down to the ambient sound level (therefore borderline unnoticeable), as you can see below.
5 m sounds too small a work volume to be useful for a dynamic peer-to-peer shooting game. But it is plenty large for a more stationary scenario like playing with Lego blocks. I am going to try other external means of estimating attitude and position first.