PICList Thread
'[PIC]:Voice synthesis'
2001\04\24@125328 by Sudol, Pete J (L-M)

source= http://www.piclist.com/postbot.asp?id=piclist\2001\04\22\032021a


Hi,

I have been reading this newsgroup for several months now, trying to learn
some new things and get ideas for hobby projects that I never have a chance
to build. I have been thinking about using a PIC for a Text-to-Speech system
for some time, and the following are just some thoughts and some info I have
collected. Remember these are just thoughts, so when I err, let me know
gently.

To do decent speech synthesis you need some way of taking in ASCII text and
converting it into some sort of PCM/WAV format. You would then send the
waveform to a D/A and amplifier. To get decent speech you need at least 8K
samples per second to the D/A. The rest is just a simple matter of software.

Now about the software. There is a set of English rules in the public
domain for converting text into phonemes. They were developed by the NRL
some years ago, but since the language has not changed they are still good.
You could take the output phonemes and either send them to a synthesis
algorithm that creates the PCM, or keep prerecorded phonemes on a mass
storage device and just output them to the D/A. Now the catch-22: there are
about 50 basic phonemes in the English language; mix and match them and you
get words. They range from about 50 to about 250 milliseconds long each. If
you used prerecorded phonemes, you would need a large amount of fast-access
external storage to hold them. If you opt for the synthesis method, you
need to calculate the waveform for each phoneme on the fly. There are
several synthesis algorithms available, but all were written for high-end
machines. My research seems to point to the Klatt synthesis method as the
best of the open source, public domain methods. But there is a large amount
of math involved, and it is ugly 32-bit floating-point stuff.
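
[A rough check of the storage figures above, as a minimal Python sketch. It
assumes 8-bit samples and exactly one recording per phoneme, neither of which
is stated in the post; the constant and function names are made up for
illustration.]

```python
# Rough storage estimate for a prerecorded phoneme set.
SAMPLE_RATE_HZ = 8000      # "8K samples per second" from the post
PHONEME_COUNT = 50         # "about 50 basic phonemes"
MIN_MS, MAX_MS = 50, 250   # phoneme length range from the post
BYTES_PER_SAMPLE = 1       # assumption: 8-bit PCM


def phoneme_set_bytes(duration_ms: int) -> int:
    """Bytes needed if every phoneme lasted duration_ms."""
    samples_per_phoneme = SAMPLE_RATE_HZ * duration_ms // 1000
    return PHONEME_COUNT * samples_per_phoneme * BYTES_PER_SAMPLE


low = phoneme_set_bytes(MIN_MS)    # every phoneme at the short end
high = phoneme_set_bytes(MAX_MS)   # every phoneme at the long end
print(f"between {low} and {high} bytes")
```

The real total would fall somewhere between the two bounds, depending on how
the phoneme lengths are actually distributed.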

Can a 16F877 or 18CXXX PIC handle the task? I do not know. I figure a
parallel D/A with two 24XX256 EEPROMs could handle the prerecorded version.
But can it handle the translation and PCM output at the same time? As for
the synthesis version, I think that a PIC just does not have the memory or
speed to handle all the math fast enough. Maybe when and if the dsPICs come
out?

If you want to take a look at the code required for a PC or UNIX box, do a
search on rsynth. rsynth is a public domain program that uses the NRL rules
and a Klatt synthesizer. The code is all in ANSI C, so it could be
cross-compiled to almost any platform, but I have found no reference to
anyone trying to put it into anything as small as a PIC. It has certainly not
been optimized for size.
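
[To give a feel for what a rule-driven text-to-phoneme step looks like, here
is a toy Python sketch. It is only a longest-match letter-pattern lookup. The
rule table and phoneme labels below are invented for illustration and are not
the NRL rules, which also test the letters to the left and right of each
pattern; nothing here is taken from rsynth.]

```python
# Hypothetical toy rules: letter pattern -> phoneme label.
# These are NOT the NRL rules; they only illustrate the table-driven idea.
TOY_RULES = {
    "ch": "CH",
    "ee": "IY",
    "sh": "SH",
    "th": "TH",
    "p": "P",
    "s": "S",
    "t": "T",
}
MAX_PATTERN_LEN = max(len(pattern) for pattern in TOY_RULES)


def to_phonemes(word):
    """Scan left to right, always taking the longest matching pattern."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for length in range(MAX_PATTERN_LEN, 0, -1):
            chunk = word[i:i + length]
            if chunk in TOY_RULES:
                phonemes.append(TOY_RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule for this letter: skip it in this toy version
    return phonemes


print(to_phonemes("speech"))
```

A table like this is small enough for program memory; the open question from
the post is how much the full rule set and its context matching would cost.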


One more thing. There is a set of prerecorded phonemes, in .AU format, on
the web someplace but I have lost the page and all I remember is that the
site was in England.


---
Peter J Sudol
http://www.piclist.com/member/pJS--AA6a
PIC/PICList FAQ: http://www.piclist.com

--
http://www.piclist.com hint: The PICList is archived three different
ways.  See http://www.piclist.com/#archives for details.


2001\04\24@180723 by Olin Lathrop

> I have been thinking about using a PIC for a Text-to-Speech system
> for some time and the following is just some thoughts and some info I have
> collected.
> ...
> To do a decent speech synthesis you need at least 8K samples per second.
> ...
> there are about 50 basic phonemes in the English language, mix and match
> them and you get words. They range from about 50 to about 250
> milliseconds long each.
> ...
> Can a 16F877 or 18CXXX PIC handle the task?

My first knee-jerk reaction is to have one PIC handle just the task of
emitting phonemes.  This doesn't sound too bad using prerecorded data.  If
you have a total of 8 seconds of sound at an 8 kHz sampling rate, that is
64 Kbytes.  Two IIC EEPROM chips can handle that.  8000 bytes per second
requires about an 80 Kbit/second bit rate.  I would run the IIC at 250 to 500
Kbits/second, which leaves lots of headroom.
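
[The numbers above can be checked with a short Python sketch. It assumes the
two EEPROMs are the 256 Kbit 24XX256 parts named earlier in the thread, 8-bit
samples, and nine bus bits per byte (eight data bits plus the ACK bit); start
conditions and addressing overhead are ignored, which is why the result comes
out a little under "about 80 Kbit/second".]

```python
SAMPLE_RATE_HZ = 8000
EEPROM_BITS = 256 * 1024    # assumption: one 24XX256 holds 256 Kbit
EEPROM_COUNT = 2
I2C_BITS_PER_BYTE = 9       # 8 data bits + 1 ACK bit on the bus

capacity_bytes = EEPROM_COUNT * EEPROM_BITS // 8
seconds_of_audio = capacity_bytes / SAMPLE_RATE_HZ
bus_bits_per_second = SAMPLE_RATE_HZ * I2C_BITS_PER_BYTE


def bus_utilisation(clock_hz):
    """Fraction of the bus clock used by a steady sample stream."""
    return bus_bits_per_second / clock_hz


print(capacity_bytes, "bytes,", seconds_of_audio, "seconds of audio")
print(bus_bits_per_second, "bus bits per second before overhead")
for clock_hz in (250_000, 500_000):
    print(clock_hz, "Hz clock:", f"{bus_utilisation(clock_hz):.0%} busy")
```

Under these assumptions the bus sits idle most of the time at either clock
rate, which matches the "lots of headroom" estimate.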

I would look at the 16F876 or others instead of the 877 because I don't see
why you need all those pins.  You don't need a separate D/A because you can
use the PWM output for that.  You need to send phonemes to the chip with
some sort of flow control.  The UART might be good for that, with either an
extra busy line or always sending an ACK character that tells the sender you
can accept the next phoneme byte.  Of course you would buffer up a few
phoneme bytes so that you're always ready to start the next one when the
previous one finishes.

I would probably have the foreground loop read the UART and put phonemes
into a FIFO.  Another foreground task would be driven from that FIFO to
fetch sample data for the current phoneme into another FIFO.  The interrupt
routine would drain that second FIFO and write the value to the PWM output
at a regular 8 kHz rate.  That means you get 625 instruction cycles per
sample using a 20 MHz oscillator.  Sounds like plenty.
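
[A small Python simulation of the two-FIFO arrangement described above,
offered as a sketch rather than firmware. The class name, the phoneme table
and its sample values, and the FIFO depth are all hypothetical. The cycle
budget assumes four oscillator clocks per instruction cycle, as on the
16F87x parts.]

```python
from collections import deque

OSC_HZ = 20_000_000
CLOCKS_PER_INSTRUCTION_CYCLE = 4   # assumption: PIC16-style core
SAMPLE_RATE_HZ = 8000
cycles_per_sample = OSC_HZ // CLOCKS_PER_INSTRUCTION_CYCLE // SAMPLE_RATE_HZ


class Player:
    def __init__(self, table, sample_fifo_depth=8):
        self.table = table            # phoneme code -> list of 8-bit samples
        self.phoneme_fifo = deque()   # filled from the UART
        self.sample_fifo = deque()    # drained by the timer interrupt
        self.depth = sample_fifo_depth
        self.current = []             # samples of the phoneme being fetched
        self.pos = 0
        self.pwm_log = []             # stands in for the PWM duty register

    def uart_receive(self, code):
        """Foreground: queue one phoneme byte from the sender."""
        self.phoneme_fifo.append(code)

    def fetch_task(self):
        """Foreground: top up the sample FIFO from the current phoneme."""
        while len(self.sample_fifo) < self.depth:
            if self.pos >= len(self.current):
                if not self.phoneme_fifo:
                    return            # nothing more to play yet
                self.current = self.table[self.phoneme_fifo.popleft()]
                self.pos = 0
                continue
            self.sample_fifo.append(self.current[self.pos])
            self.pos += 1

    def timer_interrupt(self):
        """Runs once per sample period: write one sample to the PWM."""
        if self.sample_fifo:
            self.pwm_log.append(self.sample_fifo.popleft())


# Hypothetical two-phoneme table, just to exercise the pipeline.
player = Player({1: [128, 160, 96], 2: [128, 200]}, sample_fifo_depth=2)
for code in (1, 2):
    player.uart_receive(code)
for _ in range(6):
    player.fetch_task()
    player.timer_interrupt()
print(cycles_per_sample, player.pwm_log)
```

The point of the second FIFO is that the interrupt never waits on the EEPROM:
as long as the foreground task keeps it topped up, the output rate stays
regular.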

I'm not going to get into how to decide what phonemes to emit when.

Like I said, I haven't thought this all the way thru.  These are just my
initial impressions.


********************************************************************
Olin Lathrop, embedded systems consultant in Littleton Massachusetts
(978) 742-9014, spam_OUTolinTakeThisOuTspamembedinc.com, http://www.embedinc.com



2001\04\25@011633 by Roman Black

Sudol, Pete J (L-M) wrote:

> I have been reading this newsgroup for several months now, trying to learn
> some new things and get ideas for hobby projects that I never have a chance
> to build. I have been thinking about using a PIC for a Text-to-Speech system
> for some time, and the following are just some thoughts and some info I have
> collected. Remember these are just thoughts, so when I err, let me know
> gently.
>
> To do a decent speech synthesis you need at least 8K samples per second.


Actually, if you have an 8-bit DAC then 3k/sec is fine.
I recorded music for a game at 8-bit 10k, and it sounded great.
Voice is good at 3k, starts to get scratchy below 2k,
and 1k sounds like those kids' toys...
The secret is never in the playback, it's in the recording.
You need a good voice person (preferably a skilled announcer),
a good microphone, and good audio equipment. Getting a good
result is 90% recording, 10% playback...
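
[For comparison, a short Python sketch of what the sample rates mentioned
above mean for storage and bandwidth. It assumes 8-bit samples and 64 Kbytes
of storage (two 256 Kbit EEPROMs, as discussed earlier in the thread). The
bandwidth column is just half the sample rate, the usual Nyquist limit; how
good each rate actually sounds is a listening judgment the code cannot
settle.]

```python
CAPACITY_BYTES = 2 * 32 * 1024   # assumption: two 256 Kbit EEPROMs, 8-bit samples


def seconds_stored(rate_hz):
    """Seconds of 8-bit audio that fit in CAPACITY_BYTES at rate_hz."""
    return CAPACITY_BYTES / rate_hz


def nyquist_hz(rate_hz):
    """Highest frequency a given sample rate can represent."""
    return rate_hz / 2


for rate_hz in (10000, 8000, 3000, 2000, 1000):
    print(f"{rate_hz:>6} samples/s: "
          f"{seconds_stored(rate_hz):6.1f} s stored, "
          f"up to {nyquist_hz(rate_hz):.0f} Hz")
```

Dropping from 8k to 3k samples per second stretches the same storage to
nearly three times as much speech, at the cost of everything above 1.5 kHz.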

There are a lot of skilled announcers who will work for
$100 a day, and many have their own equipment or have
connections to borrow/hire some. I have a $300 microphone
which is barely good enough. Compared to the $8000 mic
I've used in a recording studio (with an equally expensive
pre-amp), my mic is hopeless. Don't even think about
using a cheap mic.

I'm not sure about copyright (etc) but have you considered
using an existing set of phoneme wavs?? Soundblaster had
an app called "Text Ole" that had full phoneme sets in
different voices. It read text out. You could get the system
up and running using their wavs and do the recording later.
:o)
-Roman

--
http://www.piclist.com#nomail Going offline? Don't AutoReply us!
email .....listservKILLspamspam@spam@mitvma.mit.edu with SET PICList DIGEST in the body


2001\04\25@040959 by Russell McMahon

Search for MBROLA.
University of ?Brussels?

Speech synthesis using diphones - joined formant pairs.
Has a very low data rate and low processor load for very good quality.
Needs a largish stored speech fragment database (some MBs typically).
Not on bottom-end micros yet AFAIK, but free for non-commercial use.
Unix/Linux / Winxx versions available, and many languages.



     Russell McMahon
_____________________________

What can one man* do?
Donate food daily free !!! -  http://www.thehungersite.com/
Donate Vitamin A!  http://www.thechildsurvivalsite.com/
http://www.rawa.com  - one perspective on Afghanistan
http://www.changingourworld.com    http://www.easttimor.com   http://www.sudan.com

(* - or woman, child or internet enabled intelligent entity :-))

