Capture microphone input from Quest while others are in a room speaking

Setup:
There are users in an open world who can speak with each other using Photon Voice. Now one user wants to say something that is then transcribed by an online service (this happens by pressing a button).

System:
Quest with Unity 2019.2.3f1

What I tried:
1) Using Unity's internal mic, but this interferes with the one that Photon uses, which results in not hearing anybody after transcribing.

2) Looking at a WebRTC tutorial where onAudioFrameFloat() and FloatProcessor() were used.

There is a strange case: my class
[RequireComponent(typeof(Recorder))]
public class VoiceController : VoiceComponent

awakes with
protected override void Awake()
{
    LogManager.Instance.Log("000 AWAKE");
    base.Awake();
    _isRecording = false;

    // Attach an AudioOutCapture to the scene's AudioListener
    AudioListener audioListener = FindObjectOfType<AudioListener>();
    if (audioListener != null)
    {
        AudioOutCapture aoc = audioListener.GetComponent<AudioOutCapture>();
        if (aoc == null)
        {
            aoc = audioListener.gameObject.AddComponent<AudioOutCapture>();
            LogManager.Instance.Log("000 AudioOutCapture was init");
        }

        // maybe this works
        aoc.OnAudioFrame += this.OnAudioOutFrameFloat;

        _aoc = aoc;
    }
    else
    {
        LogManager.Instance.Log("AUDIOLISTENER IS NULL");
    }
}

the method assigned via aoc.OnAudioFrame += is never called, although it is implemented in the same class. The class also contains start- and stop-recording methods, where I earlier tried using _aoc to assign OnAudioOutFrameFloat, but after Awake, _aoc is somehow null.

Earlier I tried using FloatProcessor(), but I think it is not supposed to capture audio. (It also did not work.)


Please help me by pointing out other possible approaches. If you need more code, I can post it too.

Thanks in advance.

Answers

  • JohnTube (edited January 2020)
    Hi @bluk,

    Thank you for choosing Photon!

    So you want to use the voice recorded by the Photon Voice Recorder with a service that does speech-to-text, is that it?

    Are you sure you want to do both at the same time, transmit voice AND transcribe it? Maybe you want to do only one of them at a time?

    Anyway, if you want to do both at the same time, you should implement a custom (pre|post)processor for Photon Voice.
    For the outgoing audio stream, you can create a custom processor by extending Voice.LocalVoiceAudio<T>.IProcessor. You can get the locally recorded audio frame in IProcessor.Process. A component attached to the same GameObject as the Recorder is needed to intercept the PhotonVoiceCreated Unity message. Inside that method, insert the custom processor into the local voice processing pipeline using LocalVoice.AddPreProcessor (before transmission) or LocalVoice.AddPostProcessor (after transmission). See "WebRtcAudioDsp.cs" for an example; a minimal sketch also follows at the end of this post.
    bluk wrote: »
    2) Looking at a WebRTC tutorial where onAudioFrameFloat() and FloatProcessor() were used.
    Could you share a link to that tutorial?

    Also note that Photon Voice currently does not work with WebRTC, web platforms, or browsers.
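
    A minimal sketch of such a processor (assuming the Photon Voice 2 API, where the nested interface LocalVoiceAudio<T>.IProcessor exposes Process and Dispose; ITranscriptionSink is a hypothetical stand-in for whatever speech-to-text client you use):

    using Photon.Voice;

    // Hypothetical consumer of recorded audio frames.
    public interface ITranscriptionSink
    {
        void ReceiveAudio(float[] frame);
    }

    // Pass-through pre-processor that forwards each locally recorded
    // frame to a transcription consumer.
    public class TranscriptionProcessor : LocalVoiceAudio<float>.IProcessor
    {
        private readonly ITranscriptionSink sink;

        public TranscriptionProcessor(ITranscriptionSink sink)
        {
            this.sink = sink;
        }

        // Called by Photon Voice for every locally recorded audio frame.
        public float[] Process(float[] buf)
        {
            sink.ReceiveAudio(buf); // forward (copy first if the sink keeps the data)
            return buf;             // return the frame unchanged so it is still transmitted
        }

        public void Dispose() { }
    }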
  • bluk
    So I updated my class entirely.
    using Photon.Voice;
    using Photon.Voice.PUN;
    using Photon.Voice.Unity;
    using UnityEngine;

    [RequireComponent(typeof(Recorder))]
    public class VoiceController : VoiceComponent
    {
        private LocalVoice localVoice;
        private Recorder recorder;

        public VoiceToText vtt;

        // Unity message sent by Photon Voice once the local voice is created
        private void PhotonVoiceCreated(PhotonVoiceCreatedParams p)
        {
            LogManager.Instance.Log("000 PARAMS");

            if (this.recorder != null && this.recorder.SourceType != Recorder.InputSourceType.Microphone)
            {
                LogManager.Instance.Log("VoiceController should be used with Recorder.SourceType == Recorder.InputSourceType.Microphone only.");
                this.enabled = false;
                return;
            }
            this.localVoice = p.Voice;
            if (this.localVoice.Info.Channels != 1)
            {
                LogManager.Instance.Log("Only mono audio signals supported.");
                this.enabled = false;
                return;
            }
            if (!(this.localVoice is LocalVoiceAudioFloat))
            {
                LogManager.Instance.Log("Only float audio voice supported.");
                this.enabled = false;
                return;
            }

            LogManager.Instance.Log("000 Adding PP");

            // Insert the speech-to-text processor into the pipeline before transmission
            LocalVoiceAudioFloat v = this.localVoice as LocalVoiceAudioFloat;
            v.AddPreProcessor(vtt);
        }

        protected override void Awake()
        {
            LogManager.Instance.Log("000 AWAKE");
            base.Awake();
            this.recorder = GetComponent<Recorder>(); // otherwise the SourceType check above never runs
            vtt = GameObject.Find("Setup").GetComponent<VoiceToText>();
        }
    }


    The LogManager is a custom one.

    https://doc.photonengine.com/en-us/voice/v2/troubleshooting/faq#main
    For outgoing audio stream, you can create a custom processor by extending Voice.LocalVoiceAudio<T>.IProcessor. You can get the locally recorded audio frame in IProcessor.Process. A component attached to the same GameObject as the Recorder is needed to intercept PhotonVoiceCreated Unity message. Inside that method, insert the custom processor in the local voice processing pipeline using LocalVoice.AddPreProcessor (before transmission) or LocalVoice.AddPostProcessor (after transmission). See "WebRtcAudioDsp.cs" for an example.

    Then I looked up WebRtcAudioDsp.cs in the Photon files.

    So now things are somewhat working, but
    public float[] Process(float[] buf)
    {
        if (_isRecording)
        {
            LogManager.Instance.Log("Getting some audio and i am recording");
            voiceToTextProvider.ReceiveAudio(buf);
        }
        else
        {
            LogManager.Instance.Log("Getting some audio");
        }

        return buf;
    }
    

    is called, according to my log, only once, even while speaking. _isRecording was set to false. That cannot be right, can it?
  • bluk (edited January 2020)
    In addition, when is the Process method triggered by Photon? The documentation does not lead to anything, because it has no comments.
  • bluk (edited January 2020)
    bluk wrote: »
    public float[] Process(float[] buf)
    {
        if (_isRecording)
        {
            LogManager.Instance.Log("Getting some audio and i am recording");
            voiceToTextProvider.ReceiveAudio(buf);
        }
        else
        {
            LogManager.Instance.Log("Getting some audio");
        }

        return buf;
    }
    

    is called, according to my log, only once, even while speaking. _isRecording was set to false. That cannot be right, can it?

    Must the Process method be in a non-Unity class (i.e. one not inheriting from MonoBehaviour)?

  • JohnTube
    Hi @bluk,

    The Process method is called automatically by Photon Voice when there is a recorded frame to be transmitted.
    I think it does not matter whether the class that implements the processor interface is a MonoBehaviour or not.

    If it's called once, it means it works.
    I'm not sure how you know that, but it could be an issue in your code; check how you set the value of _isRecording (see the sketch below).
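
    For instance, a sketch (not from the thread) of how the flag might be toggled from the push-to-transcribe button. Photon Voice may invoke Process off the main thread, so the volatile marker is a precaution based on that assumption:

    // Toggled from UI code on the main thread, read inside Process(),
    // which may run on Photon Voice's audio thread.
    private volatile bool _isRecording;

    public void StartTranscribing()  // wire to the transcribe button press
    {
        _isRecording = true;
    }

    public void StopTranscribing()   // wire to the button release
    {
        _isRecording = false;
    }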
  • bluk
    A small follow-up in case someone has the same problem: you need to copy the incoming buffer if you want to use it later (apparently the buffer is reused). Obvious in hindsight, but I only thought of it at a late point.
    public float[] Process(float[] buf)
    {
        if (stt != null && stt.canReceiveFrames())
        {
            // Copy the frame before handing it off; the original buffer is reused
            float[] bufCopy = new float[buf.Length];
            System.Array.Copy(buf, bufCopy, buf.Length);
            stt.receiveFrame(bufCopy);
            return new float[buf.Length]; // transmit silence while transcribing
        }
        else
        {
            return buf;
        }
    }
    

    @JohnTube, is it possible to return null instead of a new float array? The goal here is that other users don't hear you while you are speaking.
  • bluk
    JohnTube wrote: »
    Hi @bluk,

    So you want to use Photon Voice Recorder to record voice and not transmit it?
    Maybe you should use a custom script instead?

    You are already returning an empty float array...
    Not sure what happens if you return null.
    @vadim what do you think?

    It is only needed when the user wants to use speech-to-text (stt); we do not want to disturb other users while stt is in use. (You have to speak a bit more clearly in order for stt to work out.) After that, the arrays are passed on normally. Hm, but I guess disabling the transmit would achieve the same result (see the first sketch below) ...

    One mini-question:
    If I want to achieve that I do not hear others for some seconds, can I put the user using stt in an (audio) group and then put him back in the old one? Or do you have a better approach in mind (see the second sketch below)?

    Thanks for helping out.
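
    A sketch of the disable-transmission idea mentioned above, assuming Photon Voice 2's Recorder.TransmitEnabled property and a cached recorder reference on the same component:

    // Stop sending this user's voice while speech-to-text runs,
    // then resume normal voice chat afterwards.
    public void BeginStt()
    {
        recorder.TransmitEnabled = false; // others no longer receive our audio
    }

    public void EndStt()
    {
        recorder.TransmitEnabled = true;  // resume normal transmission
    }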
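
    And for the mini-question: interest groups could work, but a simpler local alternative is a sketch using plain Unity APIs, with the caveat that AudioListener.volume mutes all game audio, not just incoming voice:

    using UnityEngine;

    // Locally silence everything the user hears while stt is active.
    public class LocalMute : MonoBehaviour
    {
        public void MuteIncoming()
        {
            AudioListener.volume = 0f; // hear nobody during speech-to-text
        }

        public void UnmuteIncoming()
        {
            AudioListener.volume = 1f; // restore normal playback
        }
    }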