6.5. Recording Sound

Recording sound operates in much the same way as playing it; the difference is that the queue is created and set up as a recording queue, delivering output to your application instead of accepting input bound for the speakers. You can record sound into many different formats, including Apple Lossless, PCM, and others. The example in this section closely parallels our previous audio queue example, but with some changes. We'll document these throughout the example.

When recording sound, the audio queue's conveyor belt spins in reverse. The iPhone's microphone does all of the work of filling boxes with sound and sending them to your application. You're still responsible for telling the framework what format and sample rate you'd like, but instead of filling boxes, you'll now be responsible for emptying them and writing them out to disk or some other storage mechanism. In the example to follow, you'll use Audio Toolbox's AudioFile functions to write directly out to a file instead of copying the sound to memory.

A recording queue is still strictly first-in-first-out; that is, the conveyor belt moves the samples in the order they are recorded.

Audio Toolbox's audio queue works as follows:

  1. An audio queue is created and assigned properties that identify the type of sound that will be recorded (format, sample rate, etc.).

  2. Sound buffers are attached to the queue, which will contain the actual sound frames as they are recorded. Think of a sound frame here as a single box full of sound that was recorded, whereas a sample is a single piece of digital sound within the box.

  3. The developer supplies an "input" callback function, which the audio queue calls every time a sound buffer has been filled with recorded audio. This callback function is responsible for writing the recorded frames to disk (or some other destination) and sending the box around for another fill.

6.5.1. Audio Queue Structure

As you learned earlier, the Audio Toolbox framework uses low-level C function calls, and so it has no concept of a class. You must first create a callback structure to contain all of the variables that will be moving around in your recording application. Think of this as a context. The AQCallbackStruct structure below is similar to the playback version of this structure, but with a few added pieces:

typedef struct AQCallbackStruct {
    AudioStreamBasicDescription mDataFormat;
    AudioQueueRef queue;
    AudioQueueBufferRef mBuffers[AUDIO_BUFFERS];
    AudioFileID outputFile;
    unsigned long frameSize;
    long long recPtr;
    int run;
} AQCallbackStruct;

The following components are grouped into this structure to service the audio framework:



AudioStreamBasicDescription mDataFormat

Information about the format of audio that will be recorded.



AudioQueueRef queue

A pointer to the audio queue object your program will create.



AudioQueueBufferRef mBuffers

An array holding references to the sound buffers used by the queue; AUDIO_BUFFERS defines the total number of buffers.



AudioFileID outputFile

A reference to the output file, where the sound will be written as it is recorded.



unsigned long frameSize

The number of sample frames to be copied per audio sync. This is largely up to the implementer.



long long recPtr

A numeric pointer to the current position of the recording "needle" in terms of what raw sound data the application has already processed. This is incremented as more data is recorded.



int run

A flag used to determine whether the audio queue should requeue the sound buffers; that is, whether or not to send the boxes back around for more sound. When it's time to stop recording, your application should set this to zero.

Before you can create the audio queue, you'll need to initialize a description of the audio input you'd like your application to receive:

AQCallbackStruct aqc;

aqc.mDataFormat.mFormatID = kAudioFormatLinearPCM;
aqc.mDataFormat.mSampleRate = 44100.0;
aqc.mDataFormat.mChannelsPerFrame = 2;
aqc.mDataFormat.mBitsPerChannel = 16;
aqc.mDataFormat.mBytesPerPacket =
aqc.mDataFormat.mBytesPerFrame =
    aqc.mDataFormat.mChannelsPerFrame * sizeof (short int);
aqc.mDataFormat.mFramesPerPacket = 1;
aqc.mDataFormat.mFormatFlags =
        kLinearPCMFormatFlagIsBigEndian
      | kLinearPCMFormatFlagIsSignedInteger
      | kLinearPCMFormatFlagIsPacked;
aqc.frameSize = 735;

In the preceding example, a structure is prepared to record 16-bit (two bytes per sample) stereo sound (two channels) at a sample rate of 44.1 kHz (44,100 samples per second). Each frame holds one 2-byte short integer per channel, hence four total bytes per frame: two bytes each for the left and right channels.

The sample rate and frame size dictate how often your application will receive more sound. At 44,100 samples per second, the application can be made to sync the sound every 60th of a second by defining a frame size of 735 samples (44100 / 60 = 735). This is very aggressive, intended to accommodate near-real-time sound processing applications; if you don't need to sync that often, you can choose a larger frame size, such as 22050, which will sync every half second.
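
To make the arithmetic concrete, this small sketch derives the frame size from the sample rate and a target sync frequency; syncsPerSecond is just an illustrative local variable, not part of the API:

unsigned long syncsPerSecond = 60;  /* hypothetical target: sync 60 times per second */
aqc.frameSize = (unsigned long)
    (aqc.mDataFormat.mSampleRate / syncsPerSecond);  /* 44100 / 60 = 735 */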

The format used in the above example calls for PCM (raw data), but the audio queue supports many of the audio formats supported by the iPhone. These include the following:

kAudioFormatLinearPCM
kAudioFormatAppleIMA4
kAudioFormatMPEG4AAC
kAudioFormatULaw
kAudioFormatALaw
kAudioFormatMPEGLayer3
kAudioFormatAppleLossless
kAudioFormatAMR
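
If you wanted to record in a compressed format instead, you would adjust the AudioStreamBasicDescription to match that codec's packet layout. The following is a rough sketch for Apple's IMA4 codec, which packs 64 sample frames into a 34-byte packet per channel; it is offered as an illustration rather than as part of this chapter's example:

aqc.mDataFormat.mFormatID = kAudioFormatAppleIMA4;
aqc.mDataFormat.mSampleRate = 44100.0;
aqc.mDataFormat.mChannelsPerFrame = 2;
aqc.mDataFormat.mFramesPerPacket = 64;  /* IMA4 encodes 64 frames per packet */
aqc.mDataFormat.mBytesPerPacket =
    34 * aqc.mDataFormat.mChannelsPerFrame;  /* 34 bytes per channel */
aqc.mDataFormat.mBitsPerChannel = 0;  /* zero for compressed formats */
aqc.mDataFormat.mBytesPerFrame = 0;   /* zero for compressed formats */
aqc.mDataFormat.mFormatFlags = 0;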

6.5.2. Provisioning Audio Input

Once you have defined the audio queue's properties, you can provision a new audio queue object. The AudioQueueNewInput function is responsible for creating an input (recording) channel and attaching it to the queue. The function's prototype follows:

OSStatus AudioQueueNewInput(
    const AudioStreamBasicDescription *inFormat,
    AudioQueueInputCallback           inCallbackProc,
    void *                            inUserData,
    CFRunLoopRef                      inCallbackRunLoop,
    CFStringRef                       inCallbackRunLoopMode,
    UInt32                            inFlags,
    AudioQueueRef *                   outAQ);



inFormat

Pointer to a structure describing the audio format to be recorded. You defined this structure earlier as a member of data type AudioStreamBasicDescription within the AQCallbackStruct structure.



inCallbackProc

The name of a callback function to be called when the audio queue has a full buffer of recorded audio. The callback function is responsible for doing something with the sound buffer, such as writing it to disk, and then sending the buffer back around for more data.



inUserData

A pointer to data that the developer can optionally pass to the callback function. Our example will contain a pointer to the instance of the user-defined AQCallbackStruct structure, which will contain information about the audio queue as well as any information relevant to the application about the samples being recorded.



inCallbackRunLoop

The run loop on which the audio queue will invoke your callback function. Passing NULL, as our example does, tells the audio queue to invoke the callback on one of its own internal threads whenever a sound buffer has been filled.



inCallbackRunLoopMode

The run loop mode in which the callback can be invoked. Passing NULL is equivalent to kCFRunLoopCommonModes; additional modes are available to run the callback under other conditions.



inFlags

Not used; reserved.



outAQ

When the AudioQueueNewInput function returns, this pointer will be set to the newly created audio queue. The presence of this argument allows an error code to be used as the return value of the function.

An actual call to this function, using the audio queue structure created earlier, follows. In this example, the name of our callback function is specified as AQInputCallback. It is this function that will be responsible for taking recorded sound delivered to your application and writing it to disk:

AudioQueueNewInput (
        &aqc.mDataFormat,
        AQInputCallback,
        &aqc,
        NULL,
        kCFRunLoopCommonModes,
        0,
        &aqc.queue
    );
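
Like most Audio Toolbox calls, AudioQueueNewInput returns an OSStatus. The example above ignores it for brevity, but a minimal sketch of checking it might look like this:

if (AudioQueueNewInput (&aqc.mDataFormat, AQInputCallback, &aqc, NULL,
    kCFRunLoopCommonModes, 0, &aqc.queue) != noErr)
{
    fprintf(stderr, "Unable to create audio queue\n");
}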

6.5.3. Sound Buffers

A sound buffer contains sound data from the microphone while it is in transit to your application. Going back to our box-on-a-conveyor-belt concept, the buffer is the box that carries your sound between the microphone and your callback function. If your application can't empty the boxes as fast as the microphone fills them, you may end up with gaps or skipping in your recording. The more boxes you have, the more recorded sound can be held in the pipeline before anything is lost. The downside is that more boxes also means it takes longer for sound captured at the microphone to reach the application. This could be problematic if you are writing a voice synthesizer or other type of application that requires close to real-time sound.
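
To put rough numbers on this trade-off, the sketch below estimates the latency the queue itself adds under the settings used in this chapter (three buffers of 735 frames at 44,100 samples per second). The calculation is an approximation for illustration, not a value reported by the API:

double queueLatency = (double) AUDIO_BUFFERS * aqc.frameSize
    / aqc.mDataFormat.mSampleRate;   /* 3 * 735 / 44100 = 0.05 seconds */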

When recording is ready to start, sound buffers are created and placed on the audio queue. The minimum number of buffers needed to start a recording queue on an Apple desktop is only one, but on the iPhone it is three. In applications that might cause high CPU usage, it may be appropriate to use even more buffers to keep the recorder from dropping sound:

#define AUDIO_BUFFERS 3

int i;

for (i = 0; i < AUDIO_BUFFERS; i++) {
    /* The buffer size is given in bytes, so convert frames to bytes */
    AudioQueueAllocateBuffer (aqc.queue,
        aqc.frameSize * aqc.mDataFormat.mBytesPerFrame, &aqc.mBuffers[i]);
    AudioQueueEnqueueBuffer (aqc.queue, aqc.mBuffers[i], 0, NULL);
}


In the playback example, the audio buffers were sent to the callback function to be primed with data. Since this example is recording sound instead of playing it, the buffer needs to be queued (sent around the conveyor belt) first so that the audio framework can fill it with recorded data. Once filled, the framework will automatically invoke your callback function.

The queue is now ready to be started, which turns on the conveyor belt sending the sound buffers your way from the microphone. As this occurs, the callback function will empty the buffers of their contents (no, it doesn't need to zero the data) and send the boxes back around the conveyor belt for a refill:

AudioQueueStart(aqc.queue, NULL);

Later on, when you're ready to turn off recording, deactivate the sound queue using the AudioQueueStop and AudioQueueDispose functions. The AudioQueueStop function only stops the queue, leaving it in a state where it can later be restarted. When the audio queue is disposed of, however, it is deallocated from memory, and cannot be restarted:

AudioQueueStop(aqc.queue, true);
AudioQueueDispose(aqc.queue, true);
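
In our example, the second argument to AudioQueueStop is true, which halts recording immediately. If instead you want buffers that have already been filled to be processed before the queue stops, a brief sketch of an asynchronous shutdown follows:

aqc.run = 0;                        /* tell the callback to stop requeueing buffers */
AudioQueueStop(aqc.queue, false);   /* false = stop after pending buffers are processed */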

6.5.4. Callback Function

Once the audio queue is running, your application will be periodically presented with a sound buffer containing data. What we haven't explained yet is how this happens. After a buffer is filled with recorded data, the audio queue calls the callback function you specified as the second argument to AudioQueueNewInput. This callback function is where the application does its work; it empties the box that carries the microphone's output, and places it back on the queue. When invoked, your callback function will empty the audio queue buffer by copying the latest sound frames to their destination; in the case of this example, into a file:

static void AQInputCallback (
    void                                 *aqr,
    AudioQueueRef                        inQ,
    AudioQueueBufferRef                  inQB,
    const AudioTimeStamp                 *timestamp,
    UInt32                               frameSize,
    const AudioStreamPacketDescription   *mDataFormat)
{

The callback structure you created at the very beginning, aqc, is passed into your callback function as a user-defined argument, followed by pointers to the audio queue itself and the audio queue buffer to be emptied:

AQCallbackStruct *aqc = (AQCallbackStruct *)aqr;

Because the AQCallbackStruct structure is considered user data, the audio queue presents it to the callback function as a void pointer. It will need to be cast back to an AQCallbackStruct structure pointer (here, named aqc) before it can be accessed.

6.5.5. Accessing Raw Data

In most cases, you'll be writing the audio directly to a file, but if you are going to access the raw audio data inside the buffer, you can tap into the raw input buffer:

short *CoreAudioBuffer = (short *) inQB->mAudioData;

The CoreAudioBuffer variable represents the space inside the sound buffer where the microphone's raw samples will be copied at each sync. Your application needs to maintain a type of "record needle" to keep track of what sound has already been sent to the audio queue. An example of copying data into allocated memory follows:

int recNeedle = 0;    /* current position in myBuffer, in samples */
short *myBuffer;

/* nSamples is assumed here to be the total number of sample frames
   the application intends to record */
myBuffer = malloc(nSamples * aqc.mDataFormat.mBytesPerFrame);
...
static void AQInputCallback (
    void                                 *aqr,
    AudioQueueRef                        inQ,
    AudioQueueBufferRef                  inQB,
    const AudioTimeStamp                 *timestamp,
    UInt32                               frameSize,
    const AudioStreamPacketDescription   *mDataFormat)
{
    AQCallbackStruct *aqc = (AQCallbackStruct *) aqr;

    short *CoreAudioBuffer = (short *) inQB->mAudioData;
    memcpy(myBuffer + recNeedle, CoreAudioBuffer,
        aqc->mDataFormat.mBytesPerFrame * aqc->frameSize);

    /* Advance the needle by the samples just copied:
       frames times channels, because myBuffer is an array of shorts */
    recNeedle += aqc->frameSize * aqc->mDataFormat.mChannelsPerFrame;
    if (!aqc->run)
      return;

    AudioQueueEnqueueBuffer (aqc->queue, inQB, 0, NULL);
}

6.5.6. Writing to a File

To write to a file, you'll use Audio Toolbox's AudioFile family of functions. To prepare an audio file, you'll first need to define the file format. The code below configures the property needed for an AIFF audio file. (AIFF files store big-endian PCM, which is why the format flags set earlier included kLinearPCMFormatFlagIsBigEndian.)

AudioFileTypeID fileFormat = kAudioFileAIFFType;

Use a CFURL structure to contain the actual file path to the audio file:

CFURLRef filename =
        CFURLCreateFromFileSystemRepresentation (
            NULL,
            (const unsigned char *) path_to_file,
            strlen (path_to_file),
            false
        );

Make sure the path you choose for the file exists within your application's sandbox, using the NSHomeDirectory function, or similar functions. You will not be allowed to write a sound file anywhere outside of your sandbox.
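
If you're working in plain C, as this chapter does, a minimal sketch for building such a path follows. It assumes the standard sandbox layout, in which getenv("HOME") returns the same sandbox root that NSHomeDirectory reports from Objective-C; the Documents subdirectory and filename are only examples:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

char path_to_file[PATH_MAX];

/* Build a path inside the application's sandbox */
snprintf(path_to_file, sizeof(path_to_file),
    "%s/Documents/recording.aiff", getenv("HOME"));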


Finally, you'll create the audio file itself with a call to AudioFileCreateWithURL containing the filename and format properties you just created. A pointer to the file is written into the AQCallbackStruct structure so that you'll know how to access the file whenever there is sound to write:

AudioFileCreateWithURL (
        filename,
        fileFormat,
        &aqc.mDataFormat,
        kAudioFileFlags_EraseFile,
        &aqc.outputFile
    );

As new audio samples are recorded, you'll write to this file using the AudioFileWritePackets function, which is another function built into Audio Toolbox specifically for writing audio packets into a file. You'll see how this works in the following example.
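
For reference, the prototype for AudioFileWritePackets follows; the ioNumPackets argument tells the function how many packets to write and reports back how many were actually written:

OSStatus AudioFileWritePackets (
    AudioFileID                          inAudioFile,
    Boolean                              inUseCache,
    UInt32                               inNumBytes,
    const AudioStreamPacketDescription  *inPacketDescriptions,
    SInt64                               inStartingPacket,
    UInt32                              *ioNumPackets,
    const void                          *inBuffer);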

6.5.7. Example: Sound Recorder

Continuing in the spirit of good old-fashioned C hacking, this example runs from the command line on either the iPhone or the desktop, taking a filename and a duration as arguments. It records from the microphone for the duration given and saves the recording to the file specified.

Because Mac OS X Leopard also includes the Audio Toolbox framework, you can compile Example 6-10 for both the desktop and the iPhone:

$ gcc -o recorder recorder.c -framework AudioToolbox -framework CoreFoundation
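
Once compiled, you might run it like this; the filename and the five-second duration shown here are only examples:

$ ./recorder sample.aiff 5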


Example 6-10. Sound recorder example (recorder.c)
#include <AudioToolbox/AudioQueue.h>
#include <AudioToolbox/AudioFile.h>
#include <AudioToolbox/AudioConverter.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* for strlen */
#include <errno.h>
#include <sys/stat.h>
#include <sys/select.h>

#define AUDIO_BUFFERS 3

typedef struct AQCallbackStruct {
    AudioStreamBasicDescription mDataFormat;
    AudioQueueRef queue;
    AudioQueueBufferRef mBuffers[AUDIO_BUFFERS];
    AudioFileID outputFile;
    unsigned long frameSize;
    long long recPtr;
    int run;
} AQCallbackStruct;

static void AQInputCallback (
    void                                 *aqr,
    AudioQueueRef                        inQ,
    AudioQueueBufferRef                  inQB,
    const AudioTimeStamp                 *timestamp,
    UInt32                               frameSize,
    const AudioStreamPacketDescription   *mDataFormat)
{
    AQCallbackStruct *aqc = (AQCallbackStruct *) aqr;

    /* For a constant bit rate format such as linear PCM, the packet count
       delivered to the callback can be zero, so derive it from the number
       of bytes in the buffer instead */
    UInt32 numPackets = frameSize;
    if (numPackets == 0 && aqc->mDataFormat.mBytesPerPacket != 0)
        numPackets = inQB->mAudioDataByteSize / aqc->mDataFormat.mBytesPerPacket;

    /* Write data to file */
    if (AudioFileWritePackets (aqc->outputFile, false, inQB->mAudioDataByteSize,
        mDataFormat, aqc->recPtr, &numPackets, inQB->mAudioData) == noErr)
    {
        aqc->recPtr += numPackets;
    }

    /* Don't re-queue the sound buffers if we're supposed to stop recording */
    if (!aqc->run)
      return;

    AudioQueueEnqueueBuffer (aqc->queue, inQB, 0, NULL);
}

int main(int argc, char *argv[]) {
    AQCallbackStruct aqc;
    AudioFileTypeID fileFormat;
    CFURLRef filename;
    struct timeval tv;
    int i;

    if (argc < 3) {
        fprintf(stderr, "Syntax: %s [filename] [duration]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    aqc.mDataFormat.mFormatID = kAudioFormatLinearPCM;
    aqc.mDataFormat.mSampleRate = 44100.0;
    aqc.mDataFormat.mChannelsPerFrame = 2;
    aqc.mDataFormat.mBitsPerChannel = 16;
    aqc.mDataFormat.mBytesPerPacket =
    aqc.mDataFormat.mBytesPerFrame =
        aqc.mDataFormat.mChannelsPerFrame * sizeof (short int);
    aqc.mDataFormat.mFramesPerPacket = 1;
    aqc.mDataFormat.mFormatFlags =
            kLinearPCMFormatFlagIsBigEndian
          | kLinearPCMFormatFlagIsSignedInteger
          | kLinearPCMFormatFlagIsPacked;
    aqc.frameSize = 735;

    AudioQueueNewInput (&aqc.mDataFormat, AQInputCallback, &aqc, NULL,
        kCFRunLoopCommonModes, 0, &aqc.queue);

    /* Create output file */

    fileFormat = kAudioFileAIFFType;
    filename = CFURLCreateFromFileSystemRepresentation (NULL,
        (const unsigned char *) argv[1], strlen (argv[1]), false);

    AudioFileCreateWithURL (
        filename,
        fileFormat,
        &aqc.mDataFormat,
        kAudioFileFlags_EraseFile,
        &aqc.outputFile
    );
    CFRelease (filename);   /* the URL object is no longer needed */

    /* Initialize the recording buffers */

    for (i = 0; i < AUDIO_BUFFERS; i++) {
        /* The buffer size is given in bytes, so convert frames to bytes */
        AudioQueueAllocateBuffer (aqc.queue,
            aqc.frameSize * aqc.mDataFormat.mBytesPerFrame, &aqc.mBuffers[i]);
        AudioQueueEnqueueBuffer (aqc.queue, aqc.mBuffers[i], 0, NULL);
    }

    aqc.recPtr = 0;
    aqc.run = 1;

    AudioQueueStart (aqc.queue, NULL);

    /* Wait around while the recording takes place */

    tv.tv_sec = atoi(argv[2]);
    tv.tv_usec = 0;
    select(0, NULL, NULL, NULL, &tv);

    /* Shut down recording */

    aqc.run = 0;
    AudioQueueStop (aqc.queue, true);

    AudioQueueDispose (aqc.queue, true);
    AudioFileClose (aqc.outputFile);

    exit(EXIT_SUCCESS);
}


6.5.8. What's Going On

Here's how the record program works:

  1. When the program starts, the application's main function extracts the filename and recording duration from the argument list (as supplied on the command line).

  2. The main function builds our user-defined AQCallbackStruct structure, whose definition appears at the beginning of the program. This structure holds pointers to the recording queue, sound buffers, and the output file that was created. It also contains the sample's length and an integer called recPtr, which acts as a record needle, identifying the last sample that was written to disk.

  3. A new recording queue is created, and each sound buffer is initialized and placed on it. The queue is then started, and the program sleeps until the recording duration has elapsed.

  4. As audio is recorded, the audio queue fills the sound buffers one by one. Whenever a buffer is full and ready to be emptied, the AQInputCallback function is called.

  5. The AQInputCallback function writes the recorded frames to disk and advances recPtr.

6.5.9. Further Study

  • Modify this example to sync at one-second intervals.

  • Check out AudioFile.h in Mac OS X Leopard on the desktop. This can be found in /System/Library/Frameworks/AudioToolbox.framework/Headers/.