Transfer Mic and Audio Source

I have a situation where I need to transfer both the input from the Mic along with the output from an Audio Source. I've looked at the FAQ and found the section about using a custom Audio Source but this seems to rely on an Audio Clip rather than an Audio Source so I'm not sure how to handle it.

Is this possible?

Answers

  • JohnTube

    Hi @Minneth,

    Thank you for choosing Photon!

    You can easily use an AudioSource (or an AudioListener) as input for the Recorder by making use of Unity's OnAudioFilterRead callback. We already have a helper component, AudioOutCapture, that can be attached to the same GameObject as the AudioSource (or AudioListener). Subscribe to its OnAudioFrame event and use that event handler inside a custom script that sets a factory in the Recorder. We already have something similar (experimental): MicWrapperPusher, enabled via Recorder.UseOnAudioFilterRead, which uses a muted AudioSource (as a buffer) to play the AudioClip recorded by Unity's Microphone. We use OnAudioFilterRead to let Unity's audio pipeline feed (push) audio into Voice. So: Microphone -> AudioClip -> AudioSource -> OnAudioFilterRead -> Photon Voice (Opus encoder).

    So, with a few modifications to remove the microphone part, here is a first sample of what you are looking for. This is not tested but should work: simply attach AudioSourceInputFactorySetter to the same GameObject as the Recorder, and preferably keep RecordOnlyWhenJoined true or set AutoStart false.
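
The push pattern described above can be sketched without Unity or Photon. This is a minimal illustration, not the AudioSourceInputFactorySetter script itself. FramePusher and its one-argument SetCallback and OnAudioFrame are hypothetical names chosen for this example; they only loosely mirror the Photon Voice types named in this thread and should not be read as their real signatures.

```csharp
using System;

// Hypothetical, Unity-free stand-in for a push-style audio source.
public sealed class FramePusher
{
    private Action<float[]> callback;

    // The consumer (for example an encoder) registers where frames should go.
    public void SetCallback(Action<float[]> onFrame)
    {
        callback = onFrame;
    }

    // Call this from the audio capture handler, once per captured chunk.
    // The channel count is accepted but unused in this sketch.
    public void OnAudioFrame(float[] frame, int channels)
    {
        if (callback == null) return; // nobody is listening yet

        // Hand over a copy so the consumer owns its data and the caller
        // can keep using (or overwriting) the original frame.
        var copy = new float[frame.Length];
        Array.Copy(frame, copy, frame.Length);
        callback(copy);
    }
}
```

In a real project the capture handler would be driven by Unity's audio thread, so the registered callback should be cheap and thread-safe.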

  • Hi John, Many thanks for your suggestion. I am having difficulties with it though, maybe I've set things up slightly wrong.

    I've added the script to a GameObject and saved it in my Resources. Then, when I want to start streaming the AudioSource, I instantiate the GameObject using Photon and enable the above script on the host side (the player pushing the audio).

    I can see the game object being instantiated on both the host side and the client side, but I can't hear any sound. I am noticing that an AudioOutCapture script is added on the host side, but it doesn't appear that any audio is being passed to it. The audio level display isn't changing.

    I'm also noticing that my AudioSource, which originally was a wav file for testing, is being changed to UnityAudioOut - I'm unsure what that is.

    Lastly, I can also see that the recorder doesn't have Transmit enabled, but if I toggle this manually I'm not seeing the Level move at all, so it's as if it's not receiving any audio from the factory.

    Does the above ring any bells at all?

    Also, I thought I should ask since I'm unsure - I'm guessing the game object also needs to have a photon voice view in order for Photon Voice to actually stream the audio? Just double checking.

    Thanks again.

    Martin

  • JohnTube

    Hi @Minneth,

    "is being changed to UnityAudioOut"

    I think you are using the same AudioSource that is used by the Speaker component, which is not supported.

    Please do not have the Speaker and Recorder on the same GameObject.

    "Also, I thought I should ask since I'm unsure - I'm guessing the game object also needs to have a photon voice view in order for Photon Voice to actually stream the audio?"

    It depends on whether you want to use the PUN integration or not.

    Always make use of the logs and, when in doubt, increase the log levels of the Voice components.

  • Thanks John, apologies to have to ask follow up questions, unfortunately I'm still having issues.

    As you advised, removing the speaker has ensured my audiosource remains playing the correct clip from the host side, however I can see that both the recorder and AudioOutCapture are still not receiving anything - The level gauges remain at zero.

    A couple of things I have noticed:

    If I disable the IsRecording on the Recorder on the host (so the person pushing the audiosource) I can then hear the AudioSource I'm trying to transmit as well as see the AudioOutCapture receiving the audio. It's obviously still not being received by any clients though.

    Whilst debugging on the host I can also see that the pushCallback is being called and the following line in the VoiceClient is being executed, which I'm assuming is correct:

    ((IAudioPusher<float>)source).SetCallback(buf => localVoice.PushDataAsync(buf), localVoice.BufferFactory);

    I've also checked the logs and ensured I'm outputting all logs, but I can't see any clues other than one log that states "InputSource is null or not resettable." on the game object trying to send the audio.

    Could this be anything to do with the fact I'm transferring Mic audio already in a separate Recorder and Photon Voice View, so maybe adding another Photon Voice View is screwing things up? I need to transfer both the audio and the voice because the app is a karaoke style app.

  • JohnTube

    Hi @Minneth,

    "Could this be anything to do with the fact I'm transferring Mic audio already in a separate Recorder and Photon Voice View, so maybe adding another Photon Voice View is screwing things up?"

    No, it's unrelated.

    Photon Voice can support multiple simultaneous (Recorder) streams per player.

    "If I disable the IsRecording on the Recorder on the host (so the person pushing the audiosource) I can then hear the AudioSource I'm trying to transmit as well as see the AudioOutCapture receiving the audio. It's obviously still not being received by any clients though."

    Hmm, I see.

    This could be an issue with timing.

    I'm not sure how you fill the AudioClip data dynamically at runtime.

    It might be that the AudioSource is being used in the opposite way: the Recorder is "trying to push silence" to it, which overrides what you are setting in the AudioClip. Read the last sentence of this paragraph from the OnAudioFilterRead docs:

    OnAudioFilterRead is called every time a chunk of audio is sent to the filter (this happens frequently, every ~20ms depending on the sample rate and platform). The audio data is an array of floats ranging from [-1.0f;1.0f] and contains audio from the previous filter in the chain or the AudioClip on the AudioSource. If this is the first filter in the chain and a clip isn't attached to the audio source, this filter will be played as the audio source. In this way you can use the filter as the audio clip, procedurally generating audio.

    Could you somehow try to set AudioClip data BEFORE OnAudioFilterRead is called?

    Or maybe we need to re-evaluate: we could send audio from the AudioListener instead of the AudioSource, but in that case we could also end up transmitting the received audio.

  • Hi @JohnTube , Many thanks once again, however your suggestion regarding the audio clip isn't really applicable here since it's preassigned to the audiosource in the editor and it's playing on a loop. I'm doing this just as a test while I get to the bottom of this issue, and was hoping to swap it out once I've proven it can be done.

    Let me share some screenshots to show you what's going on, hopefully that will give you more context.

    Here's a screenshot of my prefab that I'm instantiating on the host end using PhotonNetwork when we're ready to start streaming audio:

    https://www.udrop.com/6n5t/Screenshot_2022-02-17_at_07.54.33.png

    And here's the KaraokeScript shown in the above. Once the host calls PhotonNetwork.Instantiate, we then also call the BeginRecording() method in the KaraokeScript, which starts the audio source playing and the recorder recording, and also ensures the AudioSourceInputFactorySetter is activated:

    https://www.udrop.com/6n5y/Screenshot_2022-02-17_at_08.50.36.png

    The only difference I've made to your original AudioSourceInputFactorySetter script is abstracting out the starting of recording from the Awake() method, this is so once the prefab is instantiated on the clients (via Photon) they don't begin recording - Only the host needs to record. I guess I need to do this since I'm using Photon Voice View to transfer the audio.

    And finally once the game starts and the prefab is instantiated by the server, this is what I'm seeing:

    https://www.udrop.com/6n5v/Screenshot_2022-02-17_at_08.25.36.png

    As you can see, the recorder is enabled, recording and transmitting, however the level is remaining at zero. I'm also seeing the AudioOutCapture, but it's flickering around 0 to 0.02 constantly and obviously no sound is being transferred.

    Is there anything else you think might be useful for me to share to allow us to get to the bottom of the problem? If not, maybe it's worth me trying an AudioListener as you suggest. Are you able to provide details of how I'd do this?

    Thanks again.

  • JohnTube

    Hi @Minneth,

    Having to initialize Recorder looks weird.

    If you are using PUN integration I would first make sure I got this working properly using microphone or AudioClip.

    You need a PhotonVoiceView attached next to PhotonView. You will need two PhotonView and two PhotonVoiceView, one per Recorder, one for microphone and one for custom factory.

    Once you make sure recording and transmission work as expected with a built-in source you can experiment with the custom factory.

    Maybe I should have asked earlier: if you do karaoke, the AudioSource will be playing an AudioClip, and this AudioClip is on every client, why not simply synchronize playing that AudioClip without the need for Photon Voice (maybe use PUN)? If the AudioClip is not on every client, maybe simply stream the AudioClip.

    I still don't see a valid reason why you need to transmit output of AudioSource and maybe if I understand this I can help you better.

  • Minneth
    edited February 18

    Hi John

    Absolutely, hopefully the below helps bring us closer.

    Firstly, I've taken your advice and swapped out the factory for an audio clip and it works with absolutely no issues, switching back to the factory causes the previously described issues once again.

    To clarify my reasoning for needing to use an AudioSource: we're using AVPRO to stream an HLS/DASH video stream to the host so they can sing along (we use YouTube-dl to get the underlying HLS stream for a YouTube video). That way our users can search for a karaoke version of the song they like and sing along, while the YouTube audio (which outputs to an audio source) is streamed alongside the mic, which is already working fine. Unfortunately, it doesn't output the audio to an audio clip since it's streaming over the internet.

    My initial thought was to have the 2 audio sources combined somehow, then streamed but I had a hard time finding an answer to that, then I came across your posts regarding factories.

    I've just gone ahead and updated PUN and Photon Voice to the latest versions as well but unfortunately the issue persists. Is there anything else I could check? To me it looks like the OnAudioFilterRead is not receiving the correct audio, given it is constantly flickering between 0 and 0.02. As soon as I turn the audio source off, this flickering stops so it's definitely receiving something from the audiosource, just not what it should receive.

    Alternatively, is there any advice you could give around streaming an Audio Listener? That might be my next option, since the Audio Listener should receive all audio I need to stream.

    Thanks once again for your advice so far.

  • JohnTube

    Hi @Minneth,

    Here is what I would try:

    • try manually adding an AudioClip to the AudioSource and setting Loop to true, to see whether the custom factory works when the AudioSource plays a regular clip
    • try delaying adding the AudioOutCapture + factory until the AudioSource is already playing something, so that the OnAudioFilterRead is not the first one in the Unity audio DSP chain
    • if the AVPRO script that feeds the stream into the AudioSource is attached to the same GameObject as the Recorder, make sure the order is as follows, from top to bottom: AudioSource, AVPRO script, THEN custom factory setter, so that the OnAudioFilterRead is not the first one in the Unity audio DSP chain. As you can read in the OnAudioFilterRead docs: "The filter is inserted in the same order as the MonoBehaviour script is shown in the Inspector."
    • try delaying the execution order/time of the factory, so that the OnAudioFilterRead is not the first one in the Unity audio DSP chain
    • if you have the source code of the AVPRO plugin, take a look at the script that uses the AudioSource and feeds the stream to it, and use something similar in a custom factory to feed the same stream to Photon Voice directly instead of going through the intermediary AudioSource

    Here is what I would do to switch to using AudioListener as input:

    • in the script replace AudioSource w/ AudioListener and that's it.
  • Hi @JohnTube ,

    We have success! I'm not sure exactly what this is being used for, but by removing this line it appears to fix the issue for me:

        //Array.Clear(frame, 0, frame.Length);

    Since we're clearing the array every time we receive data from the AudioSource, that would probably explain why the AudioOutCapture is constantly flickering.
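
One plausible mechanism for this, stated as an assumption rather than something confirmed in this thread: if the frame array is handed to a consumer by reference and read later, clearing it afterwards zeroes the very data the consumer is about to read. The sketch below uses a plain queue as a stand-in for such a deferred consumer; ClearAfterHandoffDemo and PeakSeenByConsumer are hypothetical names and no Photon or Unity API is involved.

```csharp
using System;
using System.Collections.Generic;

public static class ClearAfterHandoffDemo
{
    // Returns the peak absolute sample the consumer sees when it reads the
    // queued frame later, after the producer has finished with it.
    public static float PeakSeenByConsumer(bool clearAfterHandoff, bool copyBeforeHandoff)
    {
        var queue = new Queue<float[]>(); // stands in for a deferred consumer
        var frame = new float[] { 0.5f, -0.25f, 0.75f };

        // Producer side: hand the frame over, optionally as a private copy.
        queue.Enqueue(copyBeforeHandoff ? (float[])frame.Clone() : frame);
        if (clearAfterHandoff)
        {
            Array.Clear(frame, 0, frame.Length); // zeroes the shared array in place
        }

        // Consumer side: runs later and reads whatever the array holds now.
        float peak = 0f;
        foreach (float sample in queue.Dequeue())
        {
            peak = Math.Max(peak, Math.Abs(sample));
        }
        return peak;
    }
}
```

If local muting is still wanted, copying the frame before handing it off keeps the consumer's data intact while the original array is cleared.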

    The only issue I'm seeing now is that the audio seems to warp on occasion. I'm putting that down to my terrible test devices, but if you have any thoughts around why that might be happening then it'd be much appreciated. Maybe some kind of bitrate mismatch, or am I just making this up at this point?

    Once again thanks for all your help, had lots of late nights battling with this but I'm glad to see the light at the end of the tunnel.

    Cheers.

  • JohnTube

    Hi @Minneth,

    I'm glad it works now!

    Sorry that line of code was causing issues. I think it was useful in the use case we had before, from MicWrapperAudioPusher script.

    I will review the code.

    I don't have any input on the other issue you mentioned now. It could be a sampling rate mismatch between what the AudioSource is playing and the Unity audio settings output.
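
To illustrate the sampling rate point: samples produced at one rate and consumed at another play back at the wrong speed and pitch unless they are resampled. The following is a minimal sketch of linear-interpolation resampling for a mono buffer. Resampler and Resample are hypothetical names, and the sketch assumes mono data and applies no low-pass filter, so it is not suitable as-is for quality downsampling.

```csharp
using System;

public static class Resampler
{
    // Linear-interpolation resampling of a mono buffer from fromRate to toRate.
    public static float[] Resample(float[] input, int fromRate, int toRate)
    {
        if (input.Length == 0 || fromRate == toRate)
        {
            return (float[])input.Clone();
        }

        int outLength = (int)((long)input.Length * toRate / fromRate);
        var output = new float[outLength];
        double step = (double)fromRate / toRate; // input samples per output sample

        for (int i = 0; i < outLength; i++)
        {
            double pos = i * step;
            int i0 = Math.Min((int)pos, input.Length - 1);
            int i1 = Math.Min(i0 + 1, input.Length - 1); // repeat the last sample at the end
            float t = (float)(pos - i0);
            output[i] = input[i0] * (1f - t) + input[i1] * t;
        }
        return output;
    }
}
```

For example, Resample(frame, 44100, 48000) would stretch a 44.1 kHz frame to the length expected by a 48 kHz pipeline.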

  • Hi there, apologies to reopen this issue but unfortunately I'm experiencing issues with the suggested approach so was hoping for some advice.

    Since I'm creating a karaoke app, I'm transferring both the audio and also the voice separately and this results in a slight delay and it unfortunately makes the above solution unviable.

    I wanted to ask whether there are other options (or something you'd suggest I learn more about) that would allow the input from the mic and the audio source to be combined in real time before they're transferred via the above method?

    Thanks in advance,

    Martin

  • Hi Martin,

    How large is the delay? Is it constant on different devices and under different network conditions?

    Voice API does not provide a real stream synchronization mechanism but can apply a constant delay to remote (incoming) streams to make them more or less in-sync. Unfortunately, this delay can currently be applied only per codec. We can update the API to make it possible to apply different delays to streams with the same codec.

    It's also possible to write an audio processor that adds delay to the local (outgoing) audio stream.
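
As a rough sketch of what such a delaying processor could look like, the class below delays a stream by a fixed number of samples using a circular buffer. DelayLine is a hypothetical name and the class is plain C#; wiring it into Photon Voice (for example by wrapping it in the processor interface the library expects) is left as an assumption.

```csharp
using System;

// Hypothetical fixed-delay line for float audio samples.
public sealed class DelayLine
{
    private readonly float[] history; // circular buffer holding the delayed samples
    private int position;

    public DelayLine(int delaySamples)
    {
        if (delaySamples < 0) throw new ArgumentOutOfRangeException(nameof(delaySamples));
        history = new float[delaySamples];
    }

    // Delays buf in place by delaySamples; the first outputs are silence.
    public float[] Process(float[] buf)
    {
        if (history.Length == 0) return buf; // zero delay: pass through

        for (int i = 0; i < buf.Length; i++)
        {
            float delayed = history[position];
            history[position] = buf[i];
            buf[i] = delayed;
            position = (position + 1) % history.Length;
        }
        return buf;
    }
}
```

For interleaved audio, the delay in samples would be seconds * sampling rate * channel count, so one second of 48 kHz mono is new DelayLine(48000).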

  • Hi @vadim , thanks for the response. The delay appears to be around a second from our testing, but I'm unsure whether it varies between devices, since we only have a limited number of test VR headsets.

    I can try adding a delay if you're able to provide some guidance on how this can be achieved - Is it possible you could provide details on both approaches you mentioned?

    I was hoping there would be a way to combine audio sources before they're sent out by the host, but unfortunately I don't know enough about processing audio data to know how I'd go about this.

  • Hi Martin,

    A 1 second delay is quite high. Do you mean that the 2 audio streams play in sync on the local machine, but when they are streamed via Photon Voice they are desynchronized that much on remote machines? I don't see where such a huge delay may come from. Do you use AudioClipWrapper (input source type AudioClip) to stream music? Are you playing back the same clip locally in parallel? Maybe AudioClipWrapper and clip playback are started at different times.

    Of course, if you do not need 2 separate streams on the receiver side, the best approach would be mixing the 2 audio sources into a single stream locally. You can use Voice.IProcessor<T> for this. Implement this simple interface and add an instance to the microphone audio processing pipeline with photonVoiceCreatedParams.Voice.AddPreProcessor() in the "PhotonVoiceCreated" message handler. See SaveOutgoingStreamToFile.cs for an example. In the Process() method implementation, you just need to add the music audio source sample values to the values of the buffer provided as a parameter.

  • Minneth
    edited March 17

    Hi @vadim , your second suggestion sounds like what I'm looking for though I'm unsure of the implementation.

    I presume the values passed in to the Process() buffer are the audio coming from the Mic, is that right?

    If I also want to combine that with float data coming from my music, I'm not sure how I'd do that. I've currently implemented an IAudioPusher which is receiving my music data and I'm storing it in an array called dataToBeSent:

    private void AudioOutCaptureOnOnAudioFrame(float[] frame, int channelsNumber)
    {
        // (Re)allocate the cache when the frame size changes.
        if (this.dataToBeSent.Length != frame.Length)
        {
            this.dataToBeSent = new float[frame.Length];
        }
        // Assumed from the description above: store the latest music frame.
        Array.Copy(frame, this.dataToBeSent, frame.Length);
    }

    I then also have the IProcessor you described above, which has access to that audioPusher's cached float array. In its Process() method I am getting the stored float data from my cached array and attempting to combine that with the buffer that's passed in. I'm then also clearing my array afterwards, which I assume is correct.

    Firstly, I'm seeing a syntax issue when I'm trying to add the 2 buffers together - "Operation '+' cannot be applied to operands of type float[] and float[]":

        public float[] Process(float[] buf)
        {
          var musicData = audioSourcePusher.dataToBeSent;
          return musicData + buf;
        }

    Is this what you had in mind when you said "add music audio source sample values to the values of the buffer provided as a parameter"?

  • Data passed to Process() is audio from microphone if you attached the processor to the microphone local voice.

    It's still not clear how you handle music audio. If you intercept Unity output with OnAudioFilterRead, then you may need a ring buffer allowing simultaneous write by output capture and read by microphone processor.

    Alternatively you can read music from an audio clip (as in AudioClipWrapper) directly in processor. Unity can play the clip at the same time since the clip belongs to the scene but Unity play and processor read positions may diverge over time in some cases.

    I meant addition per sample (an element in float[]) in a loop. This is the only way to mix 2 audio sources.
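
A ring buffer of the kind mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: SampleRingBuffer is a hypothetical name, a simple lock is used for thread safety (a lock-free design may be preferable on the audio thread), and when the buffer is full the oldest samples are overwritten.

```csharp
using System;

// Hypothetical thread-safe ring buffer for float samples.
public sealed class SampleRingBuffer
{
    private readonly float[] data;
    private readonly object gate = new object();
    private int readPos;
    private int count;

    public SampleRingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        data = new float[capacity];
    }

    // Called by the capture side. When full, the oldest samples are overwritten.
    public void Write(float[] src)
    {
        lock (gate)
        {
            foreach (float sample in src)
            {
                int writePos = (readPos + count) % data.Length;
                data[writePos] = sample;
                if (count < data.Length) count++;
                else readPos = (readPos + 1) % data.Length; // drop the oldest sample
            }
        }
    }

    // Called by the processor side. Fills dst completely, padding with silence
    // if fewer samples are buffered. Returns the number of real samples read.
    public int Read(float[] dst)
    {
        lock (gate)
        {
            int n = Math.Min(count, dst.Length);
            for (int i = 0; i < n; i++)
            {
                dst[i] = data[readPos];
                readPos = (readPos + 1) % data.Length;
            }
            count -= n;
            Array.Clear(dst, n, dst.Length - n);
            return n;
        }
    }
}
```

The output capture handler would call Write with each captured frame, and the microphone processor would call Read with a scratch buffer of the same length as its input before mixing.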

  • Hi @vadim , thanks again for assisting me through this. I feel I'm quite close but lack some of the fundamentals.

    The bit I'm confused about, though, is the last line of your most recent comment, and I think that's the real crux of my issue:

    "I meant addition per sample (an element in float[]) in a loop. This is the only way to mix 2 audio sources."

    Can you go into a bit more detail here? Apologies if this is verging outside of the realms of Photon but this is definitely the part I'm unsure of. If there's any documentation you can provide or online examples, I'm happy to study those.

    Thanks once again.

  • Hi Martin,

    This is really the easiest part. To mix 2 audio buffers, you need to sum their elements:

    public float[] Process(float[] buf) // buf comes from microphone
    {
        float[] bufMusic = readMusicStream(buf.Length);
        for (int i = 0; i < buf.Length; i++)
        {
            buf[i] += bufMusic[i];
            // or: buf[i] = buf[i] * VOICE_VOLUME + bufMusic[i] * MUSIC_VOLUME;
        }
        return buf;
    }

    It's a bit more complicated if one buffer has 2 channels and the other only 1.
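
For the mixed channel count case, here is a minimal sketch assuming interleaved stereo (L, R, L, R, ...) and samples in the [-1, 1] range. ChannelMixer and its methods are hypothetical names, and the clamping step is an addition of this example to avoid overflow when two loud signals are summed.

```csharp
using System;

public static class ChannelMixer
{
    // Adds a mono buffer into an interleaved stereo buffer, in place.
    public static float[] MixMonoIntoStereo(float[] stereo, float[] mono, float monoVolume)
    {
        int frames = Math.Min(stereo.Length / 2, mono.Length);
        for (int f = 0; f < frames; f++)
        {
            float m = mono[f] * monoVolume;
            stereo[2 * f] = Clamp(stereo[2 * f] + m);         // left
            stereo[2 * f + 1] = Clamp(stereo[2 * f + 1] + m); // right
        }
        return stereo;
    }

    // Adds an interleaved stereo buffer into a mono buffer, in place,
    // by averaging the left and right samples of each frame.
    public static float[] MixStereoIntoMono(float[] mono, float[] stereo, float stereoVolume)
    {
        int frames = Math.Min(mono.Length, stereo.Length / 2);
        for (int f = 0; f < frames; f++)
        {
            float m = 0.5f * (stereo[2 * f] + stereo[2 * f + 1]) * stereoVolume;
            mono[f] = Clamp(mono[f] + m);
        }
        return mono;
    }

    // Keeps a sample inside the valid [-1, 1] range.
    private static float Clamp(float v)
    {
        return Math.Max(-1f, Math.Min(1f, v));
    }
}
```

With a mono microphone and stereo music, MixStereoIntoMono(buf, bufMusic, MUSIC_VOLUME) would take the place of the per-sample loop, provided bufMusic holds twice as many samples as buf.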