FeenPhone Non-Windows Developer Protocol Guide


FeenPhone source code is on GitHub, HERE.
.
Licensed BipCot NoGov License, version 1. More info HERE.
.
OVERVIEW / DEFINITION section by Michael W. Dean. PACKETS and AUDIO PARAMETERS sections by Derrick Slopey:
.

OVERVIEW / DEFINITION OF FeenPhone:

FeenPhone is free NoGov (BipCot NoGov License, version 1) decentralized P2P true-duplex mono VoIP software designed with the highest-quality audio in mind. FeenPhone uses the Opus codec over TCP. The default is Opus at 24 kHz (32 kbps).
.
FeenPhone is made for people who wish to produce broadcast-quality conversational spoken audio with people who are not in the same location.
.
FeenPhone uses the Opus codec and NAudio, plus a bunch of original code. The only components that are essential when porting the functionality to other OSes are the Opus codec and the packet protocol.
.
Unlike all other VoIPs, FeenPhone contains no echo cancellation, no noise reduction and no audio compression. This helps make the audio quality higher than with other VoIPs. But this also means that FeenPhone must be used in a quiet room with a decent mic, a windscreen, the mic close to the talker’s mouth, and most importantly, with headphones. The headphones should be closed-ear, not open-ear, to prevent echo.
.
FeenPhone could be used without headphones, but only in a push-to-talk system where only one person speaks at a time. This would be a possible application for FeenPhone on mobile devices. The microphones on iPhones, Android phones and Windows phones are fairly good; the limitations of the phone network and / or of the various VoIPs commonly used on these devices are the reason they don’t sound great. FeenPhone could sound great on any of these devices if set up to be push-to-talk in one direction at a time. (Read the last paragraph of the “Uses for FeenPhone” section on the front page of this website for more on push-to-talk uses for FeenPhone.)
.
If you add echo cancellation, noise reduction or audio compression to FeenPhone, it will cease to be FeenPhone. If that’s your plan, you should probably use different VoIP code that already has those features anyway.
.
FeenPhone will work with any microphone, but is designed for, and sounds best with, a good cardioid dynamic USB mic, with a windscreen, on a stand, near the mouth of the user, in a quiet sound-conditioned room. The recommended specific mic is the $50 Audio-Technica AT2005USB. The makers of FeenPhone tested many microphones, and made FeenPhone with this mic in mind.
.
FeenPhone will work with gamer headsets, but those don’t sound great. FeenPhone will work with USB condenser mics, but they’re too sensitive, and will pick up background noise. FeenPhone will work with electret condenser mics on iPhones, iPads, Android devices, Windows Phones and Windows pads. This could sound fairly good, if the talker on each end is a few inches away from the device, and in a quiet room. As stated above, users on both ends would have to use headphones (not ear buds), or only use FeenPhone in one direction, in a push-to-talk capacity.
.
Low buffer sizes for input and output are important for low latency, so audio drivers which work closest to the hardware are preferred. FeenPhone is client-server, and currently the server is responsible for redistributing the audio from each client to the others. The server will have the lowest audio latency, but requires the most bandwidth and processing power.
.
FeenPhone could be built for many different platforms, but on some of them it may make more sense to just rebuild from scratch, using the same design goal (no noise cancellation, made for high-quality spoken audio in a quiet room using headphones and a good mic), the same codec and codec settings, and the FeenPhone-specific packet protocol. A lot of the work on FeenPhone itself was concerned with interfacing with the audio system and handling system-level audio device stuff; this likely won’t translate well to other systems.
.
The networking engine may work well on Mono, but the audio system will not.
.
It will be a lot easier to simply start by making Clients for Linux, Android and Apple than to make full Server / Clients like our Windows version.
.
As of version beta v0.1.5492.36528 (released 1/18/15), there is no encryption in FeenPhone. Encryption is a planned feature to add later. Encryption was not included in the working proof-of-concept beta version because FeenPhone was originally created for producing media with the intention of sharing that media to the public, and we had a lot of features to add. Adding encryption to FeenPhone will make FeenPhone the best-sounding secure VoIP in existence. This is because encryption, done right, will not degrade the audio quality of FeenPhone. And FeenPhone is already the best-sounding VoIP in existence.
.

PACKETS:

Currently the packet writers are defined in FeenPhone\Packets\Packet.cs

Readers can probably be inferred from them. The Packet readers/writers could use some cleanup and that is intended to be done before the next release.

Basic packet structure:

[byte packetId][ushort payloadLength][byte[] payload]

Encoding:

All numbers are little-endian unless otherwise specified.

A ushort is a little-endian two-byte number; for example, 258 (0x0102) is encoded as [0x02][0x01]

A bool is a single byte 0 or 1

Text is ASCII encoded and is not prefixed with a length unless otherwise specified. Most text was put at the end of the message so that its length can be calculated from the payload length.
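
The framing and encoding rules above can be sketched in Python as follows. This is only an illustration, not code from FeenPhone: the names `encode_packet` and `split_packets` are made up here, and the sketch assumes every packet uses the three-byte header from "Basic packet structure". Because TCP is a byte stream, a single read may contain several packets or end in the middle of one, so the reader keeps leftover bytes for the next call.

```python
import struct

# Header: one byte packetId, then a little-endian ushort payloadLength.
HEADER = struct.Struct("<BH")


def encode_packet(packet_id: int, payload: bytes = b"") -> bytes:
    """Build [byte packetId][ushort payloadLength][byte[] payload]."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit in a ushort length field")
    return HEADER.pack(packet_id, len(payload)) + payload


def split_packets(buffer: bytearray):
    """Remove every complete packet from the front of buffer and return them.

    Incomplete trailing bytes stay in buffer until more data arrives.
    """
    packets = []
    while len(buffer) >= HEADER.size:
        packet_id, length = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # the rest of this packet has not arrived yet
        packets.append((packet_id, bytes(buffer[HEADER.size:end])))
        del buffer[:end]
    return packets
```

A receive loop would append each chunk read from the socket to one `bytearray` and call `split_packets` on it after every read.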

Packet Ids:

  • Chat = 2,
  • LoginStatus = 5,
  • LoginRequest = 6,
  • UserList = 12,
  • UserLogout = 14,
  • UserLogin = 15,
  • Audio = 16,
  • PingReq = 22,
  • PingResp = 23,
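
For a port, the ID list above maps naturally onto an enumeration. The sketch below is a hypothetical Python rendering (the class and function names are not from the FeenPhone source); it simply copies the decimal values from the list and gives unknown IDs a readable placeholder so a reader can skip them.

```python
from enum import IntEnum


class PacketId(IntEnum):
    """Packet IDs copied from the list above."""
    CHAT = 2
    LOGIN_STATUS = 5
    LOGIN_REQUEST = 6
    USER_LIST = 12
    USER_LOGOUT = 14
    USER_LOGIN = 15
    AUDIO = 16
    PING_REQ = 22
    PING_RESP = 23


def describe(packet_id: int) -> str:
    """Return the packet name, or a placeholder for IDs this guide does not list."""
    try:
        return PacketId(packet_id).name
    except ValueError:
        return f"UNKNOWN({packet_id})"
```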

Structures:

UserData
[ushort length][AsciiEncodedText]
The UserData structure contains an unsigned 16-bit little-endian length of the ASCII-encoded text data, followed by the text data, which is a tab-delimited list of the following string values:
UserID: a GUID string in the format shown in the example.
Admin: “1” or “0” indicating whether the user is an administrator
Username
Nickname
Text data example: “0989547f-9986-4319-a0e2-eeeab91a9137\t0\tderrick\tderrick”
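
A possible Python reading of the UserData structure is sketched below. The type and function names are illustrative only, and the sketch assumes exactly four tab-delimited fields, as in the example text.

```python
import struct
from typing import NamedTuple, Tuple


class UserData(NamedTuple):
    user_id: str    # GUID string, e.g. "0989547f-9986-4319-a0e2-eeeab91a9137"
    admin: bool     # sent on the wire as "1" or "0"
    username: str
    nickname: str


def encode_user_data(user: UserData) -> bytes:
    """Return [ushort length][tab-delimited ASCII text]."""
    text = "\t".join(
        [user.user_id, "1" if user.admin else "0", user.username, user.nickname]
    ).encode("ascii")
    return struct.pack("<H", len(text)) + text


def decode_user_data(data: bytes, offset: int = 0) -> Tuple[UserData, int]:
    """Parse one UserData starting at offset; return it and the next offset."""
    (length,) = struct.unpack_from("<H", data, offset)
    start, end = offset + 2, offset + 2 + length
    if end > len(data):
        raise ValueError("truncated UserData")
    fields = data[start:end].decode("ascii").split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-delimited fields, got %d" % len(fields))
    user_id, admin, username, nickname = fields
    return UserData(user_id, admin == "1", username, nickname), end
```

Returning the next offset makes it easy to read several UserData structures back to back, or to find where any trailing text begins.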

Packets:

Chat (Client, Server):
[0x02][ushort payloadLength][UserData][AsciiEncodedText]
Note: The ASCII-encoded text is not prefixed with a length; its length is payloadLength minus the size of the UserData structure (the two length bytes plus the UserData text). The length of the UserData text is specified in the first two bytes of the UserData structure.
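
As a rough sketch of the Chat layout, the Python below builds a Chat packet and splits a received Chat payload into the UserData text and the message. The function names are hypothetical, and the UserData text is treated as an opaque tab-delimited string here.

```python
import struct

CHAT = 0x02


def build_chat(user_text: str, message: str) -> bytes:
    """Build [0x02][ushort payloadLength][UserData][AsciiEncodedText]."""
    user_raw = user_text.encode("ascii")
    payload = struct.pack("<H", len(user_raw)) + user_raw + message.encode("ascii")
    return struct.pack("<BH", CHAT, len(payload)) + payload


def parse_chat_payload(payload: bytes):
    """Split a Chat payload into (UserData text, chat message)."""
    (user_len,) = struct.unpack_from("<H", payload, 0)
    if 2 + user_len > len(payload):
        raise ValueError("truncated Chat payload")
    user_text = payload[2:2 + user_len].decode("ascii")
    # The message has no length prefix: it is whatever follows the UserData.
    message = payload[2 + user_len:].decode("ascii")
    return user_text, message
```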
LoginStatus (Server):
[0x05][ushort payloadLength][bool loginStatus][AsciiEncodedText]

AsciiEncodedText:
An optional message from the server. If omitted, payloadLength will be 1

Client should prompt user for login credentials if loginStatus==0

LoginRequest (Client):
[0x06][ushort payloadLength][AsciiEncodedText]

AsciiEncodedText
Tab-delimited username then password. Like: “derrick\tpass”

Server will reply with LoginStatus packet.
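
The login exchange might look like this in Python. It is a minimal sketch with made-up function names; rejecting a tab inside the username is an extra safety check added here, not something the protocol text requires.

```python
import struct

LOGIN_STATUS = 0x05
LOGIN_REQUEST = 0x06


def build_login_request(username: str, password: str) -> bytes:
    """Build [0x06][ushort payloadLength]["username\\tpassword"]."""
    if "\t" in username:
        raise ValueError("a tab in the username would break the delimiter")
    text = f"{username}\t{password}".encode("ascii")
    return struct.pack("<BH", LOGIN_REQUEST, len(text)) + text


def parse_login_status(payload: bytes):
    """Return (logged_in, message) from a LoginStatus payload."""
    if not payload:
        raise ValueError("LoginStatus payload must hold at least the bool")
    logged_in = payload[0] != 0
    message = payload[1:].decode("ascii")  # empty when payloadLength is 1
    return logged_in, message
```

If `logged_in` comes back false, the client would prompt for credentials and send a new LoginRequest.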

UserList (Server):
[0x0C][ushort payloadLength][byte count][UserData….]

count
The number of UserData structures contained in the packet.
UserData
One UserData structure for each connected user.
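
A UserList payload could be walked as in the sketch below. The function name is illustrative, and each entry is returned as its raw list of tab-delimited fields instead of a richer type.

```python
import struct


def parse_user_list(payload: bytes):
    """Return one list of tab-delimited fields per UserData in the payload."""
    if not payload:
        raise ValueError("UserList payload must hold at least the count byte")
    count = payload[0]
    offset = 1
    users = []
    for _ in range(count):
        # Each UserData starts with its own little-endian ushort length.
        (length,) = struct.unpack_from("<H", payload, offset)
        offset += 2
        text = payload[offset:offset + length]
        if len(text) != length:
            raise ValueError("truncated UserData in UserList")
        users.append(text.decode("ascii").split("\t"))
        offset += length
    return users
```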
UserLogout (Server):
[0x0E][ushort payloadLength][UserData]

UserData:
The UserData structure for the user who logged out or disconnected
UserLogin (Server):
[0x0F][ushort payloadLength][UserData]

UserData
The User structure for the user which logged in
Audio(Client, Server):
[0x10][ushort payloadLength][byte codecID][bool containsGuid][optional byte(16) userID][AudioData]

codecID
This is the defined codec under which the audio data is encoded. See the Audio Codecs section for more information.
containsGuid
True if the UserID is specified; used by the server when broadcasting other users’ audio. It is always false when the client is sending its own audio to the server, or when the server is sending its own audio to clients.
UserID:
If containsGuid is true, this is a 16-byte representation of the guid UserId which the audio was received from.
AudioData:
The encoded audio stream

The Server is responsible for echoing any audio received from clients to all the other clients. There is no mixing done on the server; each client receives a discrete audio stream from each user. It is preferred that the server add the received audio to its own play buffer before sending the packets to the clients, to minimize local latency (at the expense of remote latency).
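
The Audio packet can be sketched in Python as below. The function names are hypothetical. One assumption to check against the FeenPhone source: the sketch uses `bytes_le`, the byte order produced by .NET's `Guid.ToByteArray()`, for the 16-byte userID; this guide does not state which byte order is used.

```python
import struct
import uuid
from typing import Optional, Tuple

AUDIO = 0x10


def build_audio(codec_id: int, audio: bytes,
                user_id: Optional[uuid.UUID] = None) -> bytes:
    """Build [0x10][ushort payloadLength][codecID][containsGuid][userID?][AudioData]."""
    payload = bytes([codec_id, 1 if user_id is not None else 0])
    if user_id is not None:
        payload += user_id.bytes_le  # assumed .NET Guid byte order
    payload += audio
    return struct.pack("<BH", AUDIO, len(payload)) + payload


def parse_audio_payload(payload: bytes) -> Tuple[int, Optional[uuid.UUID], bytes]:
    """Return (codec_id, user_id or None, encoded audio bytes)."""
    if len(payload) < 2:
        raise ValueError("Audio payload too short")
    codec_id = payload[0]
    offset = 2
    user_id = None
    if payload[1] != 0:  # containsGuid
        if len(payload) < 18:
            raise ValueError("Audio payload too short for a 16-byte userID")
        user_id = uuid.UUID(bytes_le=payload[2:18])
        offset = 18
    return codec_id, user_id, payload[offset:]
```

A client sending its own microphone audio would call `build_audio` without a `user_id`; a server relaying another client's audio would pass that client's GUID so receivers can keep one decoder and play buffer per user.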

PingReq (Client, Server):
[0x16][ushort payloadLength = 2][ushort timecode]

Timecode
A 16-bit timestamp generated by the requestor. The source of the timestamp does not matter because the remote device simply echoes this code back to the requestor. In the .Net implementation the total milliseconds since program start is used for this value.

Upon receipt, the PingResp message should be sent back containing the same timecode which was sent.

PingResp (Client, Server):
[0x17][ushort payloadLength = 2][ushort echo]

echo
The 16-bit value that was sent in the request.
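
A ping round trip might be handled as in the Python sketch below. It assumes the length field is the usual two-byte ushort from the basic packet structure, so each ping packet is five bytes, and it assumes a millisecond timecode like the .Net implementation uses. The function names are illustrative. Because the timecode is only 16 bits, it wraps about every 65.5 seconds, so the elapsed time is computed modulo 65536.

```python
import struct
import time

PING_REQ = 0x16
PING_RESP = 0x17


def now_timecode() -> int:
    """Milliseconds from a monotonic clock, truncated to 16 bits."""
    return int(time.monotonic() * 1000) & 0xFFFF


def build_ping_req(timecode: int) -> bytes:
    """Build a PingReq carrying the given 16-bit timecode."""
    return struct.pack("<BHH", PING_REQ, 2, timecode & 0xFFFF)


def build_ping_resp(request_payload: bytes) -> bytes:
    """Echo the timecode from a PingReq payload back in a PingResp."""
    (timecode,) = struct.unpack("<H", request_payload)
    return struct.pack("<BHH", PING_RESP, 2, timecode)


def round_trip_ms(sent_timecode: int, now: int) -> int:
    """Elapsed milliseconds between two timecodes, tolerating 16-bit wraparound."""
    return (now - sent_timecode) & 0xFFFF
```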

AUDIO PARAMETERS:

These are the most basic parameters for encoding the audio in supported formats. Please refer to the documentation for each codec for additional implementation information. The number to the left of each codec name is the CodecID as specified by the packet protocol for the Audio packet.

Currently Supported CodecIDs:

43: OpusCodec24kHzVoip32768
  • Channels: 1
  • Opus segment frames: 960
  • bitRate: 32768
  • outputSampleRate: 24000
  • codingMode: Voip = 2048
77: OpusCodecAudio48kHz65536
  • Channels: 1
  • Opus segment frames: 960
  • bitRate: 65536
  • outputSampleRate: 48000
  • codingMode: Audio = 2049
140: G722ChatCodec
  • Channels: 1
  • bitRate: 64000
  • outputSampleRate: 16000
  • Flags: None
106: Uncompressed8KHzMonoPcmChatCodec
  • Channels: 1
  • bitRate: 128000
  • outputSampleRate: 8000
  • Sample format: 16bit

The non-Opus codecs / audio formats (G.722 and PCM) and non-TCP protocols (the options for UDP and for Telnet) in FeenPhone are optional and are not needed in a derived version. A client that does not support a codecID should gracefully decline to play the audio, and perhaps inform the user that the codec is unsupported.
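
A derived client that implements only the two Opus codecs could keep the parameters in a small table and decline anything else, as in the sketch below. The names are hypothetical. The sketch also assumes that "Opus segment frames: 960" means 960 samples per Opus frame; under that assumption a frame lasts 40 ms at 24 kHz and 20 ms at 48 kHz.

```python
from typing import NamedTuple, Optional


class CodecInfo(NamedTuple):
    name: str
    channels: int
    sample_rate: int    # outputSampleRate, in Hz
    bit_rate: int       # bits per second
    frame_samples: int  # assumed meaning of "Opus segment frames"
    coding_mode: int    # Opus application constant (Voip = 2048, Audio = 2049)


# Only the Opus entries; G.722 (140) and PCM (106) are optional per the guide.
CODECS = {
    43: CodecInfo("OpusCodec24kHzVoip32768", 1, 24000, 32768, 960, 2048),
    77: CodecInfo("OpusCodecAudio48kHz65536", 1, 48000, 65536, 960, 2049),
}


def can_play(codec_id: int) -> bool:
    """True if this client has a decoder for the given codecID."""
    return codec_id in CODECS


def frame_duration_ms(codec_id: int) -> Optional[float]:
    """Duration of one frame in milliseconds, or None for unsupported codecs."""
    info = CODECS.get(codec_id)
    if info is None:
        return None
    return 1000.0 * info.frame_samples / info.sample_rate
```

When `can_play` returns false, the client would drop the AudioData and could show the user a notice that the sender's codec is unsupported.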
.
UPDATE 2/25/15: While G.722 and PCM are still in the code of the program, we’ve removed the ability to access them from the Interface. They don’t sound nearly as good as Opus. And G.722 and PCM will not allow FeenPhone to interface with hardware devices that use those formats, because FeenPhone’s unique and better packet protocol will not work with that hardware.
.
Many people will say that UDP should be used instead of TCP. We found TCP works better with Opus under our packet protocol.


Please contact us HERE or post a comment below if you’re working on a non-Windows version so people can coordinate. And we’re available for testing your version over the Internet, and will offer feedback free. If you work with us on it and it rocks, we may help you promote it when you’re done. Thank you!

 


Author: MichaelWDean

12 thoughts on “FeenPhone Non-Windows Developer Protocol Guide”

  1. Hey Derrick, is there any reason to have gone with this type of signalling instead of utilizing SIP? If you were able to wrap a SIP interpreter into the windows server, many high quality devices could connect directly to the server/user agent. Even free sip phones like xlite, or ones readily available in the android market could talk with the supported g722. High quality speakerphones like the polycom IP7000 could also join in the mix. With SIP, it would outclass Blink.

    1. Nick,

      G.722 sounds bad compared to Opus and uses more bandwidth.

      And initiating the call vs. an external server is silly. It may open this up to more people, but this software is not for most people. It’s for smart people wanting to do high-quality two-way spoken audio. I don’t want to dumb it down and compromise it with anything just to make it work for lazy people who aren’t even going to bother to use headphones and use a good mic with a windscreen up close. That’s what FeenPhone is made for.

      That’s my take. Derrick may have more.

      FWIW, we do have this in the long list of possible features: “WebRTC support (for callers calling in to a radio show or podcast).”
      http://feenphone.com/?page_id=78

      It’s pretty far down the list though, not a huge priority, and dependent on donations.

      I think WebRTC is going to replace SIP as the go-to method. So you’re basically asking “why don’t you make this work like old things?” Normal question, but we’re building new things, not replicating old things. SIP can work with WebRTC, but it’s not limited as the signaling method. But like I say, this is VERY far down our list. There are a lot more important things to add first. We’ve raised less than four figures this round. Post FeenPhone around to help get people to chip in, if you would.

      I’m not trying to be rude, at all. I’m just explaining that what we’re doing here is a new paradigm. Asking why FeenPhone won’t work with SIP phone software is like asking why Bitcoin won’t work with PayPal.

      Also, as it stands now, FeenPhone could easily connect with existing devices simply by using the same Codec or audio format and connection protocol. Derrick has come up with new, better ways of doing packets and byte orders
      http://feenphone.com/?p=2424

      that are superior to some of those other ones. We don’t want to dumb this down to them, those programs should smart up to us. Your test call with Drew and I tonight using a mixer to connect you on Skype with Drew and I on FeenPhone is a good example of why I don’t want to dumb down to other systems. We sounded stellar, Skype sounded horrible. Why would anyone WANT FeenPhone to make calls with crappy sounding programs / systems? It would be maddening for the person on FeenPhone.

      Also, I don’t want to do anything with Blink. I’ve had some email exchanges with the guy who makes Blink. A year ago we were looking at building FeenPhone from the Source of Blink. The guy is using a proprietary closed-source license, and I did not enjoy dealing with him. I do not want to try to work with him on anything. The upside is Derrick started from scratch and built FeenPhone from the ground up, and found better ways to network, route packets and deal with buffering, on his own. Connecting point-to-point via IP was my idea. So was using no noise reduction or echo cancelation, which is one of the big things that makes FeenPhone FeenPhone, and makes it have far better sound than Blink, Skype, Mumble, et al.

      As for other devices, if you make software able to connect with physical phones over telecom systems (even internet phones), you suddenly become a telecom and are heavily regulated by the FCC.

      I’m already on commercial radio every day. It’s controlled by the FCC. One of the goals of FeenPhone is to replace radio (it could be used as an ultra-high quality one-to-many streaming server, as well as connecting co-hosts). Getting into a telecom realm is getting deep into bed with the FCC. Ewwwww!

      Telecoms have to deal with the FCC on the federal level, and all 50 members of the National Association of Regulatory Utility Commissioners – one for each state regardless of what state your company is in, plus similar bodies in different countries. Ugh.

      Also if you’re interfacing with speaker phones that are hands free, FeenPhone isn’t going to work, unless FeenPhone is push-to-talk one direction only. If you add echo cancellation to FeenPhone it’s not going to be FeenPhone anymore, and you should just use existing solutions instead.

      Worms.
      MWD

    2. Nick,

      Also, you’re kind of late to this and don’t realize that we’ve gotten so many mission creep “MAKE FeenPhone WORK FOR MY ________ DEVICE!” comments that people have made about 20 memes about it. Here’s one:

    3. And of course the real answer is this: FeenPhone is open source. Make whatever you want with the source, but don’t call it FeenPhone.

  2. I understand your concerns and am not asking anyone to make it work on any device. I was just giving examples of how versatile an rfc compliant system like this could work, that was the point of the speakerphone comment.

    SIP is amazingly versatile and provides signal negotiations features I’ve heard you say are coming (to some degree) such as encryption. It can even provide desktop sharing so that the server operator could tweak setting on the client, or even be able to remotely control client “knobs”. The groundwork is there and doesn’t have to be rebuilt.

    I work with audio codecs and voip/sip every day so I know the potentials of the protocols. That was why I asked the reason for the current signaling method. Perhaps sip was not a good fit for a very good reason. I only brought up Blink as they are t3h suck and squashing them with openness would be rad.

    Free and open sip phones exist for all OSs and are not tied to a certain audio codec. SDP negotiations over SIP can setup opus wb streams. A SIP interpreter on the feenphone server would make cross platform clients as trivial as installing a codec and setting some options in an already built piece of software.

    And I could finally get my damn speak n spell on THE RADIO, a life long dream 😉

    1. Thanks man. We really appreciate the BTC.

      As for particular methods, Derrick is the man to ask. But this page has some info: http://feenphone.com/?p=2424

      I guess the main thing is we’re extraordinarily underfunded, Derrick has performed above and beyond and built this thing on very little money. And every day five or more people send me comments on here, emails, facebook comments, phone calls, etc. suggesting “hey! cool! Add more features! Here’s my list of suggestions / demands!” lol.

      It’s overwhelming. I guess it feels like we’re trying to build the engine on the flying car, and everyone is telling us what bumper stickers and paint job it needs.

      MWD

      p.s. dig the speak n spell comment.

  3. Hey Nick,

    I think it’d be nice to have both SIP and WebRTC support. From my perspective, the simple answer on why it’s not in there is that it wasn’t a principal design goal.

    However that being said, the networking interface was designed to make it very easy for a C# programmer to support additional protocols and transports. I don’t know a lot about either sip or WebRTC yet, outside of what’s printed in the brochures.

    I’d be happy to help anyone who’s willing to work on adding these features.

    Best regards,
    Derrick

  4. Right on guys, this is a great tool. I’m not a programmer, but I understand SIP, RTP, RTCP, sRTP, etc. I’d be happy to help in understanding some of this stuff (to the best of my abilities). So if you ever get to that point, let me know. I plan on spinning up some workstations soon to see what the traffic looks like on a sniffer. I’m curious about things like jitter, latency, packet loss… all things that can introduce echo in a voip stream. I have a lot of experience fighting echo… echo return loss, ACOMM etc. With an RTCP report, it is a lot easier to figure out where it is coming from, and I don’t even know if you have RTCP in the mix. Typically I would turn up an echo cancellation parameter once identifying the source, but that is not wanted in feenphone (which is totally fine), but there are other things that can be done.

  5. I’m just leaving this here for later as it may prove useful at some point.
    http://webrtc2sip.org/
    https://code.google.com/p/webrtc2sip/

    It’s a SIP to WebRTC “converter” or proxy. It can do Opus passthrough (and conversion, but not very useful here) and can handle ICE and DTLS/SRTP to RTP w/ RTCP stream conversions. It uses GNU GPL v3. I’m not into reading licenses, so not sure how compatible this is with the BSD 3CL, or your plans for FP.
