Star Trek style

j paul fellows
I have posted this here because I use LibreOffice. People here have the skill with databases and spell checkers to possibly help make it work, or to save wasted effort if those bits are completely impossible. If it works, it would be nice to have speech to text incorporated as a normal part of the office suite.

From http://murga-linux.com/puppy/viewtopic.php?t=96705
the misc section of the Puppy Linux forum.

Star Trek style audio computer inputs

I need a lot of help with this idea.
First, dissemination. Please share this idea as widely as possible, and add it to any forum whose members may be able to help.
If you feel that the idea as a whole is crap, but that some tiny part of it is interesting, use that part as you see fit.

There are a number of different skill areas needed to make this work:
terminals (computing without graphics)
chat bots
a text to speech engine
word processor with spell checker
database comparisons (matching this to that)
sox filters (or another set of highly tailorable digital audio filters)

and most of all a Barry (for anyone who does not know, a Barry is the type of person who creates wonderful quirky things such as my Puppy (Woof) and gives them away so that others can build upon them).

I will start with the chat bots and work backward.
A chat bot is a program that is designed to give human-like responses to typed statements or questions. To do this it must recognise the key components in a line of text, then produce a response that looks like it is about those key components.
I am suggesting limiting the chat bot's output to terminal instructions, and using the chat bot to translate the output of the speech to text system that I am proposing into something that the machine can use.

The word processor accepts its input from the database at 18 syllables per second and sends a spell-checked version to the chat bot.
The spell checker will have to deal with only 4 types of mistakes, but will have to deal with a lot of them.
1. Normal phonetic mistakes, the substitution of ph for gh or ff or f. This is a normal problem for a spell checker.
2. Multiple spaces, even within words.
3. Duplication of syllables and short words. This could be reduced by reducing the sample rate from 18 syllables per second, but care will be needed not to go too far.
4. Unusual phonetic mistakes, i.e. ation for asian. This should only be a problem if care is not taken when setting up, or if it is being used by someone else.
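Mistake types 2 and 3 could perhaps be tidied up before the spell checker proper ever sees the text. The following is only a rough sketch of that idea: the function name `tidy_stream`, the `gap` threshold of three spaces, and the sample input are all made up for illustration and are not part of the proposal above.

```python
def tidy_stream(samples, gap=3):
    """Collapse repeated syllables and stray spaces in the raw sample stream.

    `samples` is a list of strings, one per reading of the database; a single
    space means "nothing recognised". A run of `gap` or more spaces is treated
    as a pause between words (an assumed rule); shorter runs are dropped.
    """
    words = []
    current = []      # syllables of the word being built
    previous = None   # last syllable kept, used to drop repeats
    spaces = 0        # length of the current run of spaces
    for sample in samples:
        if sample == " ":
            spaces += 1
            if spaces == gap and current:
                words.append("".join(current))
                current, previous = [], None
            continue
        spaces = 0
        if sample != previous:
            current.append(sample)
            previous = sample
    if current:
        words.append("".join(current))
    return " ".join(words)


raw = ["tran", "tran", " ", "tran", "s", "s", "fer", "fer",
       " ", " ", " ", "the", "the"]
tidied = tidy_stream(raw)
print(tidied)
```

A word that genuinely repeats a syllable would be shortened by this, so the real spell checker would still have work to do afterwards.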

The database part 1. I imagine the database as a 256 by 256 by 256 cube, over 16 million slots, most returning the symbol for the space bar. Most languages have fewer than 50 sounds, so the symbols that correspond to each sound can be stored tens of thousands of times.

The database part 2. Leaving the Hollywood version of the database aside, the function of this database is a many-to-few conversion. As the word is spoken it addresses many slots in the database in succession. The database is interrogated 18 times per second, and the contents of the addressed slot are copied to the word processor. 18 samples per second is more than enough to ensure that each syllable is passed on at least twice before the next one is passed on.
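As a rough sketch of the cube and the 18-per-second reading, assuming a sparse dictionary stands in for the 16,777,216 slots (256 x 256 x 256) and that any slot never written returns a space. The names `SoundCube` and `interrogate` are invented for this example.

```python
SPACE = " "
SAMPLES_PER_SECOND = 18
CUBE_SIDE = 256


class SoundCube:
    """Sparse stand-in for the 256 x 256 x 256 cube of slots."""

    def __init__(self):
        self.slots = {}  # only slots that hold a syllable are stored

    def write(self, a, b, c, syllable):
        self.slots[(a, b, c)] = syllable

    def read(self, a, b, c):
        # Every slot that was never loaded reads as a space.
        return self.slots.get((a, b, c), SPACE)


def interrogate(cube, addresses):
    """Read one slot per sample and return what goes to the word processor."""
    return [cube.read(a, b, c) for (a, b, c) in addresses]


cube = SoundCube()
cube.write(1, 2, 3, "tran")   # many different addresses may hold
cube.write(1, 2, 4, "tran")   # the same syllable (many to few)
stream = interrogate(cube, [(0, 0, 0), (1, 2, 3), (1, 2, 4), (7, 7, 7)])
print(CUBE_SIDE ** 3, stream)
```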

The database part 3. The correct combinations of letters for each sound are loaded into the database by first adjusting a text to speech engine to match the sound of the user's voice (after the input filters) in terms of rhythm and tempo. Then have it read the contents of the spell checker's dictionary through the main filter to get the addresses for the database, whilst at the same time the syllables from the text to speech engine are stored in the addressed slots. This is done twice, first with added white noise, then without. Where a syllable is being loaded over a different syllable, it should store a space in that slot.
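The loading rule in that last sentence might look something like the sketch below. It assumes each training pass has already been reduced to (address, syllable) pairs; the names `store` and `load_database` and the sample passes are hypothetical.

```python
SPACE = " "


def store(slots, address, syllable):
    """Write one syllable; a clash with a different syllable leaves a space."""
    existing = slots.get(address)
    if existing is None or existing == syllable:
        slots[address] = syllable
    else:
        # Two different sounds landed on the same slot, so the slot is
        # ambiguous and is blanked (it stays blank from then on).
        slots[address] = SPACE


def load_database(passes):
    """Run every training pass (for example noisy, then clean) into one table."""
    slots = {}
    for training_pass in passes:
        for address, syllable in training_pass:
            store(slots, address, syllable)
    return slots


# Made-up training data: the first pass stands for the run with white noise.
noisy_pass = [((1, 2, 3), "tran"), ((1, 2, 4), "tran"), ((9, 9, 9), "fer")]
clean_pass = [((1, 2, 3), "tran"), ((9, 9, 9), "s")]
slots = load_database([noisy_pass, clean_pass])
print(slots)
```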

The filters will be the hardest bit to describe, but they have a very simple function: to convert a sound syllable into a string of database addresses that correspond to that sound and that sound only.
SoX is a program that allows for the construction of a wide range of audio effects and filters.
The first filter cuts off all of the higher frequencies. This both limits the maximum sampling rate needed and makes the voice sound more robotic, making it easier to make a simple text to speech engine match it.
Next the volume would be normalized. Then the signal is copied to make 25 identical mono tracks, labelled: O 1a 1b 1c 2a 2b 2c 3a 3b 3c ... 8a 8b 8c. 1a is delayed by 3 milliseconds, 1b by 7 milliseconds, 1c by 11 milliseconds, 2a = 13, 2b = 17, then 19 ... 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 prime intervals.
The current value of O is then subtracted from 1a to 8c to leave 24 different tracks that reflect the way that the sound has changed over the 24 time intervals. SoX allows for setting the output bit rate; by setting it to one bit the output will be either 1 or 0, depending upon relative volumes. Grouping the 24 one-bit tracks into a 1 to 8, b 1 to 8, and c 1 to 8 gives the three addresses for the database.
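To make the addressing concrete, here is a rough sketch in plain Python rather than SoX. It makes several assumptions that are not in the description above: the input is a loudness envelope with one value per millisecond, a bit is 1 when the delayed track is louder than O and 0 otherwise (ties give 0), and the 24 delays follow the per-track values given (3, 7, 11, 13, 17, 19 and so on), which leaves the 5 in the list unused.

```python
# One delay per track, in the order 1a 1b 1c 2a 2b 2c ... 8a 8b 8c (assumed).
DELAYS_MS = [3, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
             47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def address_at(envelope, t):
    """Return the (a, b, c) database address for millisecond `t`.

    `envelope` is a list of non-negative loudness values, one per millisecond.
    """
    now = envelope[t]
    bits = []
    for delay in DELAYS_MS:
        past = envelope[t - delay] if t >= delay else 0.0
        # 1 when it was louder `delay` ms ago than it is now, else 0.
        bits.append(1 if past > now else 0)

    address = []
    for group in range(3):            # 0 = a tracks, 1 = b tracks, 2 = c tracks
        value = 0
        for bit in bits[group::3]:    # tracks 1x, 2x ... 8x of this group
            value = (value << 1) | bit
        address.append(value)         # eight bits, so 0 to 255
    return tuple(address)


rising = [float(i) for i in range(200)]          # sound getting louder
falling = [float(200 - i) for i in range(200)]   # sound dying away
print(address_at(rising, 150), address_at(falling, 150))
```

Reading `address_at` roughly every 56 milliseconds would give the 18 addresses per second used to interrogate the database.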

So with the mic on but no one speaking, the database will give a string of outputs, mainly spaces, but with some random letter groups. Then when a word is spoken the bits each, in their own time, turn to 0, because the sound is louder than it was a fraction of a second earlier. As the word progresses the bits will alternate in a manner that depends only upon the word being spoken and is independent of the state of any of the other bits. As the word ends the bits will all start to become 1.

Coming back to the Hollywood view of the database, a word can be viewed as a thread that starts down near 0,0,0 and winds its way through the relevant syllables towards 256,256,256. Suppose the word is transfer. Then the output from the database might look something like “ing,—-tr-tran,ance,—f,fair-er – de”. The 2 random syllables either side of the word, from the pauses between words, were chosen by me to show the type of problem that might beset the word processor some of the time.
What will be the output of the chat bot if its input is “transferd the out put of x to the in put of z”?
“x|z”. As well as setting up the chat bot to produce terminal speak, it can be set up to work with the errors from the word processor.
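A very small sketch of that last step, assuming a fixed vocabulary and a single hand-written pattern. The names `VOCABULARY`, `repair` and `to_terminal` are invented for this example; `difflib.get_close_matches` from the Python standard library does the fuzzy matching.

```python
import difflib
import re

# Words the chat bot knows; anything else (such as x and z) is left alone.
VOCABULARY = ["transfer", "the", "output", "of", "to", "input"]


def repair(text):
    """Join split words and snap near misses onto the known vocabulary."""
    tokens = text.lower().split()
    repaired = []
    i = 0
    while i < len(tokens):
        # "out put" -> "output": try gluing a token to the one after it.
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in VOCABULARY:
            repaired.append(tokens[i] + tokens[i + 1])
            i += 2
            continue
        close = difflib.get_close_matches(tokens[i], VOCABULARY, n=1, cutoff=0.8)
        repaired.append(close[0] if close else tokens[i])
        i += 1
    return " ".join(repaired)


def to_terminal(text):
    """Translate one known phrase into terminal speak, or return None."""
    match = re.fullmatch(
        r"transfer the output of (\S+) to the input of (\S+)", repair(text))
    if match is None:
        return None
    return f"{match.group(1)}|{match.group(2)}"


command = to_terminal("transferd the out put of x to the in put of z")
print(command)
```

A real chat bot would need many such patterns, and some way of asking for confirmation before anything is actually run in the terminal.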

Does anyone know of a good free text to speech package that works well at free dictation?

oweng
Title: ==Moderator==
patented technology
Quote:
Does anyone know of a good free text to speech package that works well at free dictation?
I would imagine both text->speech and speech->text to be rather patent encumbered.
j paul fellows
Up to the chat bot, this is my attempt at an STT system for free dictation. A) Sound to digital code. B) Use the code to address an oversized database. C) The database returns groups of ASCII codes. D) A spell checker assembles the words.
