Friday 15 November 2013

WAV file format and brief intro into its structure



detail of the table below


*One of the options that came to mind when developing this drum synth was to follow the sampler route! It can be quicker and easier, but it wasn't really what I was aiming at. Still, I think it can be a nice add-on, so I'm sharing some of the concepts needed to understand a canonical wave player (with an SD card for storage, of course). Here is a brief introduction...


The WAV file type is a standard for storing audio data in chunks and sub-chunks using the RIFF (Resource Interchange File Format) container, which makes it a subset of RIFF. The most common type of WAV file in audio is PCM (Pulse Code Modulation).
Although we will be focusing on PCM and LPCM (Linear Pulse Code Modulation) here, there are several compression codecs available for use with the .WAV file format, including µ-Law, ADPCM, Microsoft GSM 06.10, CELP, SBC, Truespeech and MPEG Layer-3.
Most common WAV files contain uncompressed audio in the linear pulse code modulation (LPCM) format. The standard audio format for CDs, for example, is LPCM-encoded, containing two channels of 44,100 samples per second, 16 bits per sample. Since LPCM is an uncompressed storage method that keeps all the samples of an audio track, professional users or audio experts may use the WAV format for maximum audio quality.
Since the sampling rate of a WAV file can vary from 1 Hz to 4.3 GHz, with up to 65,535 channels, .wav files have also been used for non-audio data.
The WAV format is limited to files smaller than 4 GB, because it uses a 32-bit unsigned integer to record the file size in the header (some programs limit the size to 2 GB). The W64 format was created with a 64-bit header, which allows much longer recording times.
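
To put the 4 GB limit in perspective, here is a small back-of-the-envelope sketch. It assumes CD-quality LPCM (44,100 Hz, 2 channels, 16 bits) and ignores the few header bytes; the function name is just something made up for this example.

```python
def byte_rate(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Bytes of audio data per second for uncompressed PCM."""
    return sample_rate * channels * bits_per_sample // 8

# Largest value a 32-bit unsigned size field can hold
MAX_SIZE_FIELD = 2**32 - 1

cd_rate = byte_rate(44100, 2, 16)        # CD-quality stereo
max_seconds = MAX_SIZE_FIELD / cd_rate   # header overhead ignored

print(cd_rate, "bytes per second")
print(round(max_seconds / 3600, 2), "hours, roughly")
```

So at CD quality a single WAV file tops out somewhere under 7 hours of audio, which is where a 64-bit header starts to matter.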

A RIFF file is a generic container format for storing data in tagged chunks. It starts out with a file header followed by a sequence of data chunks.
RIFF files that are used to store audio and video information are called AVI files. The RIFF AVI file format normally contains only a single AVI chunk; however, other types of chunks may also appear. An AVI reader should ignore all chunks it does not need or recognize that are stored within a RIFF AVI file.

A WAVE file is often just a RIFF file with a single "WAVE" chunk, which consists of two sub-chunks: a "fmt " chunk specifying the data format and a "data" chunk containing the actual sample data.
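
As a rough illustration of that nesting, the sketch below builds a tiny WAVE file by hand and lets Python's standard wave module read it back. The helper make_chunk is a name invented for this example, and the four sample values are arbitrary; it assumes 8-bit mono PCM at 44,100 Hz.

```python
import io
import struct
import wave

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """4-byte tag + little-endian 32-bit size + payload, padded to an even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

# "fmt " payload: format 1 (PCM), 1 channel, 44100 Hz, byte rate, block align, 8 bits
fmt_payload = struct.pack("<HHIIHH", 1, 1, 44100, 44100, 1, 8)
samples = bytes([0x80, 0xC3, 0xDB, 0xD4])

# The outer RIFF chunk holds the "WAVE" tag followed by the two sub-chunks
wav_bytes = make_chunk(
    b"RIFF",
    b"WAVE" + make_chunk(b"fmt ", fmt_payload) + make_chunk(b"data", samples),
)

# Sanity check: the standard library should accept what we built
with wave.open(io.BytesIO(wav_bytes), "rb") as w:
    params = (w.getnchannels(), w.getframerate(), w.getsampwidth(), w.getnframes())
    frames = w.readframes(w.getnframes())

print(len(wav_bytes), params)
```

The whole file is just three length-prefixed blocks, one wrapped around the other two.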

There are even a few quirks, like “Audio CDs do not use WAV as their sound format, using instead Red Book audio (almost a standard requirement on masters to be pressed and released as well). The commonality is that both audio CDs and WAV files have the audio data encoded in PCM. WAV is a data file format for a computer to use that cannot be understood by CD players directly. To record WAV files to an Audio CD the file headers must be stripped and the remaining PCM data written directly to the disc as individual tracks with zero-padding added to match the CD's sector size. In order for a WAV file to be able to be burned to a CD, it should be in the 44100 Hz, 16-bit stereo format.”
http://en.wikipedia.org/wiki/WAV
http://en.wikipedia.org/wiki/Resource_Interchange_File_Format





HEX  | DESCRIPTION
0000 | "RIFF" (in plain ASCII text)
0004 | Chunk size: length of the entire file minus 8 bytes, as a 32-bit unsigned integer
0008 | "WAVE" (in plain ASCII text)
000C | "fmt " (ASCII text with a trailing space; marks the format subchunk)
0010 | Subchunk1Size as a 32-bit unsigned integer (16 for PCM)
0014 | Audio format (1 = PCM)
0016 | Number of channels (1 = mono, 2 = stereo)
0018 | Sample rate
001C | Byte rate (sample rate x channels x bits per sample / 8)
0020 | Block align (channels x bits per sample / 8)
0022 | Bits per sample
0024 | "data" (in plain ASCII text; start of the data subchunk)
0028 | Subchunk2Size
002C | First sample (left channel if 0016 = 2)
002E | Second sample (right channel if 0016 = 2; this offset assumes 16-bit samples)
REST OF DATA
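
The table maps quite directly onto a single struct format string. The sketch below is only an illustration in Python: it takes the first 36 bytes of the example file's hex dump from this post (RIFF header plus the "fmt " subchunk) and unpacks them as little-endian fields. The variable names are my own.

```python
import struct

# First 36 bytes of the example file: RIFF header plus the "fmt " subchunk
header = bytes.fromhex(
    "52 49 46 46 90 56 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 "
    "01 00 01 00 44 ac 00 00 44 ac 00 00 01 00 08 00"
)

# "<" = little-endian, 4s = 4 ASCII bytes, I = 32-bit unsigned, H = 16-bit unsigned
(riff_id, riff_size, wave_id, fmt_id, fmt_size,
 audio_format, channels, sample_rate, byte_rate,
 block_align, bits_per_sample) = struct.unpack_from("<4sI4s4sIHHIIHH", header, 0)

print(riff_id, riff_size, wave_id)
print(fmt_id, fmt_size, audio_format, channels)
print(sample_rate, byte_rate, block_align, bits_per_sample)

# The derived fields should agree with the basic ones
assert byte_rate == sample_rate * channels * bits_per_sample // 8
assert block_align == channels * bits_per_sample // 8
```

For this file that means PCM, mono, 44,100 Hz, 8 bits per sample.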



52 49 46 46 90 56 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 44 ac 00 00 44 ac 00 00 01 00 08 00 4c 49 53 54 42 00 00 00 49 4e 46 4f 49 4e 41 4d 16 00 00 00 44 72 75 6d 20 53 79 6e 74 68 20 66 6f 75 6e 64 61 74 69 6f 6e 00 49 41 52 54 0a 00 00 00 44 75 62 77 6f 72 6b 73 00 00 49 43 52 44 06 00 00 00 32 30 31 33 00 00 64 61 74 61 22 56 00 00 80 80 81 82 84 87 8a 8e 92 97 9c a1 a7 ae b4 bb c2 c8 cf d7 de e4 eb f1 f8 fd ff ff ff ff ff fe fb f8 f5 f2 ee e9 e4 df da d4 ce c7 c1 ba b3 ab a4 9c 95 8d 86 7e 76 6f 67 60 58 51 4a 44 3d 37 31 2b 26 21 1c 18 14 11 0e 0b 09 08 06 06 05 06 06 07 09 0b 0d 10 13 17 1b 20 24 29 2f 35 3b 41 48 4f 55 5d 64 6b 73 7a 82 89 91 98 a0 a7 ae b5 bc c3 c9 cf d5 db e0 e5 ea ee f2 f5 f8 fb fd ff ff ff ff ff ff ff ff fd fb f8 f5 f1 ed

So let's see what we have...
1- RIFF in ASCII characters is 52 49 46 46 in hexadecimal.
2- 0x90 0x56 0x00 0x00 is b10010000 b01010110 b00000000 b00000000 in binary. The field is little-endian (least significant byte first, check the table above), so the value is 0x00005690, which is 22,160 in decimal: the size of the file minus the 8 bytes already read.
3- 0x57 0x41 0x56 0x45, which translates to WAVE in ASCII characters.
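
Byte order is easy to get wrong by eye, so here is a tiny sketch that lets Python do the conversion for the four size bytes above. It also shows what happens if the first two bytes are read in the order they appear in the dump.

```python
size_field = bytes([0x90, 0x56, 0x00, 0x00])  # bytes 4..7 of the example file

# Little-endian: least significant byte comes first
riff_size = int.from_bytes(size_field, "little")

# Reading the first two bytes in the order they appear (big-endian) is a common slip
wrong_order = int.from_bytes(size_field[:2], "big")

print(hex(riff_size), riff_size)
print(riff_size + 8)   # total file size: add back "RIFF" and the size field itself
print(wrong_order)
```

The little-endian reading gives 22,160, while the byte-swapped reading gives 36,950, so it is worth double-checking which one a tool is showing you.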

The WAVE format supports a number of different compression algorithms. The format tag entry in the fmt chunk indicates the type of encoding used. A value of 1 indicates Pulse Code Modulation (PCM, the one we often use in embedded systems), which is a "straight," or uncompressed, encoding of the samples. Values other than 1 usually indicate some form of compression.
The standard format codes for waveform data are given below. The references usually give many more format codes for compressed data, a good fraction of which are now obsolete.



...
...
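4- The data section is easy to spot, as 0x64 0x61 0x74 0x61 is "data" in ASCII.
To make a player using an SD card, you can usually seek straight to the start of the samples in the file (a file.seek(44); kind of thing), provided the file has the canonical 44-byte header with no extra chunks. Note that the example above carries a LIST chunk before "data", so its samples start at offset 118 rather than 44. A lot of projects with smaller microcontrollers also stick to 8-bit unsigned PCM, as that makes everything easier and is lighter to implement on 8-bit architectures.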
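
A fixed seek only works when nothing sits between the "fmt " and "data" chunks. As a hedged sketch of the safer approach, the loop below walks the chunks until it finds "data". It is written in Python purely for illustration (on a microcontroller the same loop would use the SD library's read and seek calls); find_data is a made-up name, and the test files are built on the spot with the standard wave module.

```python
import io
import struct
import wave

def find_data(buf: bytes):
    """Return (offset, size) of the sample data inside a RIFF/WAVE buffer."""
    if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first sub-chunk starts right after "RIFF", size and "WAVE"
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from("<4sI", buf, pos)
        if chunk_id == b"data":
            return pos + 8, size
        pos += 8 + size + (size & 1)  # skip payload; chunks are padded to even length
    raise ValueError("no data chunk found")

# A canonical file, as written by Python's wave module (8-bit mono, 4 samples)
out = io.BytesIO()
with wave.open(out, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(44100)
    w.writeframes(bytes([0x80, 0xC3, 0xDB, 0xD4]))
canonical = out.getvalue()

# The same file with a small LIST chunk squeezed in before "data"
extra = b"LIST" + struct.pack("<I", 4) + b"INFO"
with_list = canonical[:36] + extra + canonical[36:]
with_list = with_list[:4] + struct.pack("<I", len(with_list) - 8) + with_list[8:]

print(find_data(canonical))
print(find_data(with_list))
```

The canonical file reports offset 44, while the one with the extra chunk does not, which is exactly the case where a hard-coded seek would start playing metadata as audio.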

Let's take a smaller example to establish some concepts and make them easier to illustrate.

Hex Dump

52 49 46 46 6e 00 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 01 00 44 ac 00 00 44 ac 00
00 01 00 08 00 4c 49 53 54 3a 00 00 00 49 4e 46 4f 49 4e 41 4d 0e 00 00 00 54 65 73 74 20 44
75 62 77 6f 72 6b 73 00 49 41 52 54 0a 00 00 00 44 75 62 77 6f 72 6b 73 00 00 49 43 52 44 06
00 00 00 32 30 31 33 00 00 64 61 74 61 08 00 00 00 80 c3 db d4 be a6 94 89

ASCII equivalent

RIFFn...WAVEfmt ........D¬..D¬.
.....LIST:...INFOINAM....Test D
ubworks.IART....Dubworks..ICRD.
...2013..data.....ÃÛÔ¾¦..    


We can easily recognize the start of the data, as hex 64 61 74 61 is "data" in ASCII, as mentioned before.
It is followed by the 4-byte data size (08 00 00 00, i.e. 8 bytes) and then the 8 samples made specifically for this test (see the listen and download links for the test wave file below).
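
To make those 8 bytes a bit more tangible, here is a small sketch that converts them to numbers. It assumes 8-bit unsigned PCM, where 128 (0x80) is the silence level, and the 44,100 Hz sample rate from the header; the variable names are mine.

```python
raw = bytes.fromhex("80 c3 db d4 be a6 94 89")  # the 8 sample bytes after the data size

unsigned = list(raw)                     # 8-bit unsigned PCM values, 0..255
centered = [s - 128 for s in unsigned]   # shift so that silence sits at 0
duration_ms = len(raw) / 44100 * 1000    # 8 samples at 44,100 Hz

print(unsigned)
print(centered)
print(round(duration_ms, 3), "ms")
```

The centered values are what you would scale and send to a DAC or PWM output, and at well under a millisecond this test file is far too short to hear as anything but a click.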

The samples, 128 195 219 212 190 166 148 137 in decimal (hex 80 c3 db d4 be a6 94 89, 8-bit unsigned PCM), give us this when graphed (using Gnumeric):




And as seen in an audio editor!

























* TEST FILES HERE and HERE (listen) and (download) HERE and HERE
One is a basic synthesized tom sound and the other a handmade collection of 8 samples, for example purposes, also seen in the previous graphs.

http://www.asciitohex.com/
http://www.dolcevie.com/js/converter.html
http://mathmatrix.narod.ru/Wavefmt.html
http://www.topherlee.com/software/pcm-tut-wavformat.html


Feel free to contact me with any suggestions, doubts or requests.

Bless