Obtain maxima of minima, and minima of maxima from a continuous number series.

Given a stream of numbers such as this:

70,70,70,75,65,55,60,70,70,74,3,4,5,2,0,11,12,4,3,9,60,60,60,65,55,45,50,60,60,64 [continues]

(which in this example falls broadly into 3 sub-series), I'd like to obtain the minimum value of the first sub-series; the maximum value of the 2nd series, and the minimum value from the third sub-series, which will replace that obtained from the first sub-series. But as the co-terminals between the sub-series will, in real life, not be as evident as they are here, given that the stream will *not* be broken into convenient 10-value chunks, how to go about analysing a continuous stream to extract the results I'd described above? Thank you for your help.
krakatoa
Asked:
 
d-glitchCommented:
It is easy to keep track of the max and min for the current sub series.

But you need to know how to tell where one sub series ends and the next begins.
The first ranges from 55 to 75 -- delta of 20
The second ranges from 0 to 12 -- delta of 12
The third ranges from 45 to 65 -- delta of 20

You need to analyze more data to understand what typical behavior looks like.

If the data is not well behaved, there may not be an unambiguous solution.

If the second sub-series were taken away, you would not notice a transition until the 45.
That would mean you would miss the real max=65 for the third sub-series entirely.
 
Kyle AbrahamsSenior .Net DeveloperCommented:
Homework?

What is a sub-series? And if they're not in 10-value chunks, then how are they determined?
 
d-glitchCommented:
You might try keeping the running average for the current series.

When the next input is larger or smaller than the average by some amount
(this value is the critical decision/definition):

   Terminate the current series
   Output the max and min
   Reset the average, max, and min to the new value
   Start processing the next series.
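A minimal sketch of this running-average scheme in Java (the language krakatoa is working in), applied to the sample stream from the question. The break threshold DELTA is an assumption, the "critical decision" mentioned above, and would need tuning against real data:

```java
import java.util.ArrayList;
import java.util.List;

class RunningSegmenter {
    static final double DELTA = 20.0;  // assumed break threshold; the critical tuning decision

    /** Returns one {min, max} pair per detected sub-series (stream must be non-empty). */
    static List<int[]> segment(int[] stream) {
        List<int[]> results = new ArrayList<>();
        double avg = stream[0];
        int min = stream[0], max = stream[0], count = 1;
        for (int i = 1; i < stream.length; i++) {
            int v = stream[i];
            if (Math.abs(v - avg) > DELTA) {
                results.add(new int[]{min, max});      // terminate series, output its min and max
                avg = v; min = v; max = v; count = 1;  // reset average, min, and max to the new value
            } else {
                count++;
                avg += (v - avg) / count;              // incremental running average
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        results.add(new int[]{min, max});
        return results;
    }
}
```

On the sample stream from the question this recovers the three ranges 55-75, 0-12 and 45-65; but as noted, a missing sub-series or ill-behaved data can shift where the breaks are detected.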
 
krakatoaAuthor Commented:
Homework - lol. Well, I work from home, but I left academia almost 35 years ago. ;)

What I am getting at d-glitch is that I *think* I need to somehow obtain the rolling maximum of a (roughly contiguous) 'sub-series' of "minimum" values (I say sub-series, because there will be a shape to the stream of values, a shape which, broadly means that a string of low-value values will be quite likely followed by a string of appreciably higher-value values), and at the same time obtain the rolling minimum of a sub-series of "maximum" values.

(Actually the words maximum and minimum here are not right of course; what I should rather say is something akin to a mountain chain, where there are local maxima (peaks) and minima (valleys) as you travel along the chain. I want to calculate a single number that changeably reflects the moving point between that maximum of minimums, and minimum of maximums.)

(So maybe better forget I said sub-series at this point, and instead think of a constant stream of numbers where *most* of the adjacent values will be similar).
 
krakatoaAuthor Commented:
Just thinking about my own problem here  - - -  maybe all I need to do is take the average of all the values? For the ones above, that would yield a useful 43.7 - - - but is that pure chance that it is not above the lowest of all the 'high' values (viz subseries 1 and 3)? If this average would always represent a point just below and never above the higher values lowest value, then I don't care if it is further away from the upper bound of the lower values (2nd set).

Does that make any sense?
 
d-glitchCommented:
It always helps to understand the data.
Are we talking about prices of stocks, weights of pumpkins, or ages of people?

Do you know why the data behaves this way, with distinctive sub series?

Why do you need to know what the max and min for each sub series, and how important is it to get it right?

I don't see how averaging all the numbers helps.

Without more information, there is no way to tell if particular characteristics of the data are pure chance or an intentional trap.
 
krakatoaAuthor Commented:
d-glitch. Thanks.

Although this problem should, afaiac, lie in a mathematical space and thus be theoretically tractable, if you think that knowing what the data represent will really help, then I will of course tell you. But I would first ask whether that is entirely necessary. Not because the data are secret in any way, but because, as a non-mathematician, I'd be very happy to know what the 'pathology' underlying such a situation really is, so that any answer doesn't just become a received piece of information for me, and I understand a little more about how one arrives at (any possible) conclusions. Kindly advise. k.
 
d-glitchCommented:
I certainly don't need to know, but you will be making assumptions based on your observations of the data and your understanding of the underlying process.

It's good if you have observed that sub-series of high values alternate with sub-series of low values. It's much better if you understand why this is the case. It could be time of day, or time of year, or the age or experience of some machine operator. But if you don't know why, you may not know anything.

You are trying to do something. I don't need to know what it is. But if there is value in doing it right, then there is a cost for doing it wrong.

What is that cost? One dollar or one thousand? Your job, your life, ..., the end of the world as we know it???
 
d-glitchCommented:
Looking back at your sample data....

Assume that there really are three distinct sub series that are each 20 units wide.
And assume that #2 is missing, and you go from #1 to #3.

In this case, you will miss the transition (which really occurs between 74 and 60),
you will misclassify five data values as belonging to the first sequence,
you will get the wrong max value (64 not 65),
you will get the correct min value (45).

Is this sort of error acceptable?
 
krakatoaAuthor Commented:
Well, my example was just an example. As I've said, and in hindsight, it's best to forget about series. The data are a stream in which values are likely to be grouped into indeterminate-length runs of neighbourhood values. The discontinuity between any part of the run and another adjacent section is only that it can be characterised by its values being larger or smaller than a neighbour. Since a series of any length >= 1 is a part of the series, it's obviously no good comparing shorter episodes with longer ones. That is why I am now asking the question in the form I gave in my comment ID: 38406653.
 
mccarlIT Business Systems Analyst / Software DeveloperCommented:
Krakatoa,

You do really need to consider the nature of the data. I sort of understand the intention of wanting to consider it just in terms of the numbers, but the result that you will get will just be sub-optimal. Think of it in terms of information; the more (relevant) input information that you can provide to any data processing task, the better the result will be. Now, the numbers above are one obvious input, but the nature of the data is just another input into the data processing task, and will help achieve a better result. For example, you asked this question...
the average .... would yield a useful 43.7 - - - but is that pure chance that it is not above the lowest of all the 'high' values
Isn't it obvious that that question could only be answered by knowing the nature of the data?

On to the question at hand: since I have an electrical background I will make an assumption that this might be sampled data from an electrical input and that you are trying to 'detect' if the signal is high or low. If this were the case and I were trying to implement this, then I would know that the variation in the numbers is likely due to some form of noise (either electrical noise or noise in the physical quantity that the signal represents), and hence I would probably start by filtering the numbers. Possibly a median filter would be a good choice here, as it preserves (to a large degree) the slope of the signal at the transition point. One parameter of the filter is the window size of the numbers to consider, and the selection of this is again a decision based on knowledge of the input. Is there a certain frequency of data variation that can't possibly be due to the actual measured thing and so must be noise? I.e., in terms of numbers similar to your original question, is something like {70,65,60,68,70, 2 ,73,67,61} just not physically possible? And can that 2 therefore be filtered out, since it's just noise?

Once the data is filtered, you can use a number of different algorithms for detecting whether the signal is a true high or a true low. As you said, you could do a simple higher/lower-than-a-threshold test. Or you could test whether the delta between consecutive (filtered) samples is greater than a certain value/threshold, and have that signify a change in state. That has the advantage of adding some hysteresis, which may be beneficial (again depending on the nature of the data).
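A rough Java sketch of the median-filter-plus-delta idea described above. The window size (3) and delta threshold (20) are assumptions; both would need to be chosen from knowledge of the real data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MedianDetector {
    /** Median filter with an odd-sized window; edge samples are passed through unchanged. */
    static int[] medianFilter(int[] x, int window) {
        int half = window / 2;
        int[] y = x.clone();
        for (int i = half; i < x.length - half; i++) {
            int[] w = Arrays.copyOfRange(x, i - half, i + half + 1);
            Arrays.sort(w);
            y[i] = w[half];  // middle element of the sorted window
        }
        return y;
    }

    /** Indices where the filtered signal jumps by more than delta (candidate state changes). */
    static List<Integer> transitions(int[] x, int window, int delta) {
        int[] f = medianFilter(x, window);
        List<Integer> t = new ArrayList<>();
        for (int i = 1; i < f.length; i++)
            if (Math.abs(f[i] - f[i - 1]) > delta) t.add(i);
        return t;
    }
}
```

On {70,65,60,68,70,2,73,67,61}, a window of 3 replaces the lone 2 with a neighbouring value, so no spurious transition is reported.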

I realise that I haven't provided exactly what you were hoping to get, but hopefully it has provoked some thinking about what you are really trying to do. And if you want to provide some more insight into what these numbers represent, I'm sure that we can help further toward getting you an acceptable answer.
 
krakatoaAuthor Commented:
mccarl

That's a very useful contribution you make. Thank you very much indeed, and very nice to know you'd like to assist.

Your electronic background is certainly useful for this. So, let me begin by saying that the data are byte[]s, from UDP voice datagrams. I'm constructing a VoIP application. Ultra-simply (or simplistically, afaiac, since I am far removed from much of an understanding of electronics and hardware), what I tell myself I am looking for is an arbiter function / routine to sort out silence (or relative silence) from speech packets.

My app already works, shall we say, more than satisfactorily well with two peers in a conversation. The QoS is very acceptable over WAN and LAN alike. The underlying Java infrastructure is also already there, in what appears to be robustly-functioning form. By infrastructure I mean the distribution of the lists of who is talking to whom (or who wants to talk to whom, plus a slew of other similar housekeeping matters, before the voice conversations ever get underway).

So it comes down, in deployment terms, to the digital/analog audio ballpark, and the situation where, as soon as a "third" (read "nth") peer enters the conversation, the signals fall apart. Although all the connections remain up, the QoS takes a nose dive.

My present answer to this is to try to obtain an indication of what constitutes active voice data in a packet; for this I need to analyse what is effectively a byte stream which has some shape - the shape being that there are a lot of neighbourhood values resembling each other, but the resemblance can change at the drop of a hat if someone stops talking, for example.

But when you say :
Isn't it obvious that that question could only be answered by knowing the nature of the data?
then I'd have to say (as you can probably see by now) that, no, it isn't obvious to me at all, and that is why I need to understand why it should be. ;) (Having said that, I can now *possibly* see that two sets of values with quite widely-differing local means, when put together and averaged, may make a different kind of average than one for a (more strictly) linear set. But that's me surmising, rather than knowing, if that makes any sense).

I've got two classes which assess the packets. One assesses *the* packet, absolutely, for level; the other does a soft assessment at fixed intervals on packets, setting a threshold pass value that the actual sending threads pick up. These classes make some difference to the overall performance, as you'd probably expect, but they don't improve it dramatically. I'd like to know now whether this is a function of overloaded hardware, or of the naivety of my variables and sampling techniques. Because if it is the latter, then an insight into the way the numbers ought to be crunched could do the trick relatively painlessly, whereas if it is the former, brought on by cack-handed programming paradigms or whatever, then it's another matter again.
 
d-glitchCommented:
The first algorithm that comes to mind when you talk about processing speech signals is the FFT.  Human speech has a characteristic frequency spectrum.  And the com channel (in the absence of speech) will have a characteristic noise spectrum.  

If detecting the silences is enough, you might only need to take the RMS power of 50 ms segments.  Speech will have much higher power content than silence.

Speech signals (and sound in general) are typically bipolar.  Positive and negative values, with zero in the middle.  So a straight average of speech and silence would be zero.  You need to square (or take the absolute value of) the signals before you average them.

You have to make sure you understand the encoding for the digital data.  Do you have analog floating point values, signed integers, or raw digital data from an A/D convertor?

50 milliseconds is the typical duration for a phoneme, the basic element of speech.

http://ocw.usu.edu/electrical_and_computer_engineering/Science_of_Sound/Module_4_-_The_Human_Voice_2.htm
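A sketch of the RMS test on signed byte samples as it might apply here. The window length and silence threshold are illustrative assumptions (at 8 kHz, 8-bit mono PCM, 50 ms is about 400 bytes); note the squaring, since a straight average of a bipolar signal would sit near zero:

```java
class RmsGate {
    /** Root-mean-square of a window of signed 8-bit samples. */
    static double rms(byte[] samples, int from, int to) {
        double sumSq = 0;
        for (int i = from; i < to; i++) sumSq += (double) samples[i] * samples[i];  // square before averaging
        return Math.sqrt(sumSq / (to - from));
    }

    /** True if any full window in the packet exceeds the threshold, i.e. likely speech. */
    static boolean isSpeech(byte[] packet, int windowLen, double threshold) {
        for (int off = 0; off + windowLen <= packet.length; off += windowLen)
            if (rms(packet, off, off + windowLen) > threshold) return true;
        return false;
    }
}
```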
 
krakatoaAuthor Commented:
Thanks dglitch.

Do you have analog floating point values, signed integers, or raw digital data from an A/D convertor?

I've got bytes; just bytes, as I'm working in Java. A byte of course holds a signed integer range, -128 to 127.

And I'm putting 5120 bytes in each packet, btw.
 
Kyle AbrahamsSenior .Net DeveloperCommented:
Now that I have a better understanding of what you're after, I'd like to offer the following points:

1)  A typical VOIP (or chat program) has a threshold setting where the user can configure in terms of noise level to determine what makes speech and what is not.  (Sensitivity so to speak).

2)  Instead of transmitting all packets, a packet only needs to be transmitted once there are voice packets to be sent. Filtering should be done transmitter-side before sending out to the other users.

Point 2 should help tremendously with your QoS. You can then assume all incoming data is voice, and you only need to filter, send-side, between voice and silence, and not a variety of streams.

Edit:

I would also "0" out anything below the threshold to help with the processing.  This way on the play back side you're dealing with 0 or not 0.
 
krakatoaAuthor Commented:
2)  Instead of transmitting all packets, a packet only needs to be transmitted once there are voice packets to be sent. Filtering should be done transmitter-side before sending out to the other users.

I think if you'd got the gist of what I have been saying, you'd have noticed that this is the overarching strategy, and the entire justification for this question. Why would I want to send packets with nothing in them?

As to your first point: chat is not a term one applies to voice. Chat, ironically, means text data. And in the VoIP applications that I have used, I've never seen a place where the user can set thresholds of the kind you refer to. Setting the thresholds is the province of the application afaics, and it would be interesting if you could back up what you say with some evidence about where this sort of thing takes place, for example in Skype . . .? If not, then can I ask that you channel your thoughts into the maths at hand, rather than general metas.
 
Kyle AbrahamsSenior .Net DeveloperCommented:
http://forum.ventrilo.com/showthread.php?t=48860

Ventrilo is a free voice chat server used by many gamers for its low-latency performance.

When I said chat, what I meant was chat applications where voice to voice was also an option.  



I think if you'd got the gist of what I have been saying, you'd have noticed that this is the overarching strategy, and the entire justification for this question. Why would I want to send packets with nothing in them?

You don't.

So while reading the input stream, let's say you're reading silence.  As soon as the threshold is met, you begin recording those packets into the buffer.

You send the packet once silence is read again or when the buffer is full.


As to the math:  
The threshold is some arbitrary value, set by the user. Ventrilo uses a negative threshold, but given what was said earlier here it can be a positive one.

Square your signal and if the result is above the square of the threshold record the packet into the buffer using the algorithm above, sending as needed.

I see no point in obtaining the maximum of the sub-series, as the threshold becomes the toggle for value vs non-value.
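A sketch of this send-side gate as a small state machine. The threshold, the packet size, and the in-memory `sent` list standing in for the actual UDP send are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

class SendGate {
    private final int threshold;   // assumed amplitude threshold
    private final int packetSize;  // assumed maximum packet payload
    private final List<Byte> buffer = new ArrayList<>();
    private final List<byte[]> sent = new ArrayList<>();  // stand-in for DatagramSocket.send

    SendGate(int threshold, int packetSize) {
        this.threshold = threshold;
        this.packetSize = packetSize;
    }

    void accept(byte sample) {
        boolean voiced = (int) sample * sample > threshold * threshold;  // squared comparison
        if (voiced) {
            buffer.add(sample);
            if (buffer.size() >= packetSize) flush();  // buffer full: send
        } else if (!buffer.isEmpty()) {
            flush();  // silence after speech: send what we have
        }             // silence with an empty buffer: transmit nothing
    }

    private void flush() {
        byte[] packet = new byte[buffer.size()];
        for (int i = 0; i < packet.length; i++) packet[i] = buffer.get(i);
        sent.add(packet);
        buffer.clear();
    }

    List<byte[]> sentPackets() { return sent; }
}
```

Silence never reaches the wire at all, which is the point of the scheme: the receiver can then assume everything it gets is voice.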
 
krakatoaAuthor Commented:
ged325

Let's just leave your thoughts where you left them, please. I want to move on from here, not dwell on topics I already considered. I'm not really interested in Ventrillo, thanks; and once again, I'm *already* not sending silent packets. So thanks for your help - it was nice to meet you.
 
krakatoaAuthor Commented:
I've requested that this question be deleted for the following reason:

The question has gone into the shrubbery. I do not want to discuss other applications, but rather the pathology of byte streams, and the math that can be done on them, given the limitations of the data. Apart from a mention of FFTs, there hasn't been discussion of any algorithms, so the point of the question has been lost.
 
d-glitchCommented:
I do object.

The question was well into the shrubbery when it was first posted.
It appeared to be an attempt to understand properties of pseudo-random data.

Apparently krakatoa thinks we are bright enough to figure out how he needs to analyze data without knowing its nature, and to help with his VOIP app performance without knowing he is working on one.

A half day later, we discover that the data is speech, so max and min become irrelevant. We should have been talking about the FFT, RMS power, average amplitude, and dynamic range from the beginning.

A full day goes by before we discover that the actual problem he is dealing with is combining multiple UDP data streams.  

That would have been a great title for the question. We should have been talking about time-stamping and sequencing the data packets, and the choice of codec and data compression. It's never too late to start over.
 
krakatoaAuthor Commented:
A full day goes by before we discover that the actual problem he is dealing with is combining multiple UDP data streams.  

I don't ever remember, in all my time on EE, there being a finite time limit for commenting. There's never any point rushing into commenting before having had a chance to think about what has already been said, and perhaps trialling it if appropriate. Re-coding, recompiling, and particularly running my app is not a trivial matter: the live test always has to be set up on multiple PCs; you can't use loopback convenience on this one at all.

More importantly, I am not combining multiple UDP data streams. OK, genealogically, it could be said they are from different sources, as they are generated by separate users. But as far as the receiving computer is concerned they make up only one stream, because only meaningful packets are sent to any individual. If meaningful has to include background noise or people talking over one another, that doesn't concern me.

As far as the original question is concerned, you don't seem to want to give me any credit at all for being familiar enough with the shape of the data to allow myself to quote representative values that exemplify the profiles I'm working with. I would have thought that in the maths/algorithms area, this TA would have spotted that the above sandwich of value families was made of two slices of higher value bread, and a filling of tasteless spam. If I had not meant you to take that as a literally interpretable meta-pattern for the stream ad infinitum, I would have said so to allow you to avoid the needless pain you are causing yourself(ves).

Fine, keep the question open - it's not as if I suddenly don't want an answer to it, or a best judgement. And so what help would you like me to provide to you now so that you can answer the question?
 
d-glitchCommented:
What does the 5120 bytes per packet represent?

A.    0.8 seconds of uncompressed POTS quality audio.
B.   2-5 seconds of compressed POTS quality audio.
C.   0.1 seconds of higher quality audio.
D.   Something else.

As a test: how many packets per second are sent while reciting 10 s of the Gettysburg Address?

A two person conversation is trivial.  Each listener has to deal with a single input source.
With a reasonably fast internet connection and a low packet rate, the probability of an out of sequence packet is small.

But this all goes to hell when you add a third party.  Every listener now has to deal with two potential inputs sources.

>>  More importantly, I am not combining multiple UDP data streams.

But that is exactly what you must do. In fact, you have to decode, decompress, time-align, and combine the packets for each listener. You might also want to do some buffering (in time) and filtering.
 
krakatoaAuthor Commented:
If fact you have to decode, decompress, time align, and combine the packets for each listener.

I'm not sure I'm quite up to speed with you on this one. The packets can only be manhandled in Java - the lowest level you can get to with Java is to operate on bytes, shove them into packets, send them out and receive them. There are no low-level tools in Java to do more than that. So to answer your bullets above, the audio in is PCM, from a regular PC mic, played back through what could be anything from a tin can to a surround sound system, I guess.

There is the Mixer of course, and this is a place where one can fiddle with levels etc. But before I do that, I would like to exhaust, if that's the right word, my investigation into the maths of the bytes. If you think code would help, I can post what I'm doing on this aspect of it, no problem.

Oh, the 5120 bytes are just a divisor of the codec conditions - any multiple of that number would "work", but this seems to be one where performance already seems better than when a higher packet size is chosen.
 
d-glitchCommented:
Here are the options for my microphone input.
What are yours?  It does matter.

If you aren't near the top of the list, near telephone quality, you will need several 5kB packets to make up a single phoneme.  

If you just play packets from several sources in arrival order, you will have an unintelligible mess.  A human listener can distinguish two or three simultaneous conversations, but not mashed up this way.

Think about the conservation of time.  Assume one listener and two speakers.
The speakers talk simultaneously, constantly, and rapidly for 10 seconds, generating N packets each with no packets to be discarded.  The listener gets 2N packets that will require 20 seconds of playing time.
You could buffer and sort them by speaker, then play one after the other.  But this would rapidly become unsustainable as you add speakers.  You have to combine the packets.
[Attached image: Audio-Options.png]
 
krakatoaAuthor Commented:
Sorry about the delay getting back again. Something else cropped up.

OK, so I think my next question is then ,d-glitch, are you referring to using a Mixer on the receiving end?
 
d-glitchCommented:
I am not sure of the level of abstraction you are dealing with.

If you are dealing with scrambled packets from multiple audio streams, you have to unscramble the packets and restore the individual streams, and maybe insert delays for the discarded silent packets.  Then you can recombine them with a mixer.  Filtering and equalization would be optional.

If you are dealing with bytes, you have to decode them into audio samples, then into audio streams.  In its simplest form, MIXING N streams together is just averaging the N samples for each instant of time.
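The simplest form of that mixing, sketched in Java. It assumes the streams are already decoded, time-aligned and of equal length, and ignores clipping and 16-bit samples for brevity:

```java
class Mixer {
    /** Mix N aligned streams by averaging the N samples at each instant of time. */
    static byte[] mix(byte[][] streams) {
        int len = streams[0].length;
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            int sum = 0;
            for (byte[] s : streams) sum += s[i];
            out[i] = (byte) (sum / streams.length);  // integer average; real code would round and clamp
        }
        return out;
    }
}
```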
 
krakatoaAuthor Commented:
Where do you think I could get help on dealing with decoding audio samples from bytes? It doesn't sound to me as if it lies inside the Java realm . . .?
 
d-glitchCommented:
There is lots of stuff on line.  

For example    http://www.totalrecorder.com/primerpc.htm

I would dig into the specs of whatever software you are using to capture the audio.
 
d-glitchCommented:
Presumably, the packet has some header information that includes the number of samples.

If the samples are 16 bits, they are almost certainly stored in consecutive bytes.
Java should be perfectly capable of this sort of byte operation.
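For example, reassembling 16-bit signed samples from consecutive bytes is straightforward in Java. The little-endian byte order here is an assumption; check it against the audio format actually in use:

```java
class SampleDecoder {
    /** Decode little-endian 16-bit signed samples from consecutive byte pairs. */
    static short[] toSamples(byte[] data) {
        short[] samples = new short[data.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = data[2 * i] & 0xFF;  // low byte, treated as unsigned
            int hi = data[2 * i + 1];     // high byte keeps its sign
            samples[i] = (short) ((hi << 8) | lo);
        }
        return samples;
    }
}
```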
 
krakatoaAuthor Commented:
I would dig into the specs of whatever software you are using to capture the audio.

Meaning the OS's sound, or Realtek - which I believe runs on my PC?

re: your most recent comment - so if I understand your shorthand there, you're asking me (given 2 bytes to work with at a time) to work on the assumption that an indeterminate series of 2-byte pairs belongs to the same voice? Or am I completely wide of the mark with that thinking?
 
d-glitchCommented:
Are you implementing a distributed, peer-to-peer system or a centralized, server-based system?

In either case, doesn't each data packet come from a single, identifiable source?
If so, all the data in a packet certainly belongs to the same voice.
 
d-glitchCommented:
I objected to closing this question earlier, but I must have hit Submit rather than Object.

I am just reiterating my objection here.
 
krakatoaAuthor Commented:
Yes, the data in a packet comes from the same voice. However, I'm probably misguided in thinking that there can be an algorithm which would measure a packet, decide whether it contained enough signal to identify it as a voice packet or not, and drop it or send it according to that metric.

I was *hoping* that the saving grace of this admittedly tenuous speculation was that, as the voices are coming from people and not computers, any clashes could be sorted out - as they would be in any conversation - by the other parties asking for just one person to speak at a time, or by a natural give and take in the conversation, which would obviate actually sorting out packets electronically.

I understand that packets continue to be sent even if a person isn't speaking, but those packets would lend themselves to identification as uninteresting packets, and allow the algo to drop or pass them as the case may be. This, apparently, is something that you seem to be telling me cannot happen. Is that right?
 
krakatoaAuthor Commented:
I don't know if it would be helpful to this discussion, but the app could be run to actually see (hear) what the situation is. Either that, or, if anyone is into Java here, the (relevant bits of) source code can be posted somehow.
 
krakatoaAuthor Commented:
Would an algorithm definition necessarily include the substantive packet disassembly?
 
krakatoaAuthor Commented:
I would say that what I am trying to achieve is this :

"Everyone" - (N peers) - can all *hear* the conversation; but only 1 peer can speak at a time.

If this were somehow feasible, what would you gentlemen make of that situation?
 
d-glitchCommented:
That is certainly possible.
It is easy with a central server.  
Everyone talks to the server, and the server keeps track of who has the floor and sends out the proper audio.
The server could even buffer speech from each listener, holding it until his/her turn comes around.

It is more difficult, but still possible, with a distributed system.
Software in every user's computer has to accept (N-1) audio streams and keep track of the speaker order.
It gets easier if you can circulate a token among the speakers: everyone would know when they have the floor, and maybe how long they get to keep it. Lots of networks use this sort of architecture.

     http://www.ianswer4u.com/2011/05/ring-topology-advantages-and.html#axzz27UHss422
 
krakatoaAuthor Commented:
I've got a client/server setup for the instant messaging as well as the initial placement of the voice call by a client. A client requests the entire peer list from the server, and picks and chooses locally from it whom he wants to speak to. The implementation from that point onwards for voice, is peer-to-peer; the invitations go out, P2P, and of course the voice is conducted P2P.

In a human conversation (I know there are not many other types of conversation ;) ), interruptions can be short and frequent if the direction of the conversation requires exchanges of a comprehensive nature ("right"; "oh yes"; "I agree"; "what time is the meeting"; etc etc), and so the passing of that token is going to have to be extremely rapid. And what criterion would validate the release and affordability of the token's 'lock'? Another token??
 
krakatoaAuthor Commented:
Having said what I said 2 comments ago, I see two problems. Firstly, I can see the initial packets still being lost when a new speaker acquires the token, as it would require impossibly swift token turnaround to capture everything from the word go each time. And secondly, even with or without the first problem, the other missing factor would be atmospherics in the whole thing: the sort of quietness of attendance that accompanies a conversation wherein you can hear that the other party is still alive. ;)
 
d-glitchCommented:
There are all sorts of strategies to consider:

The first client's computer could serve as the default master and broadcast a network status packet once every second or two.  The status packet could indicate who is on line, who has the floor, how much longer he gets to speak, and who's up next.  All this info could be displayed on screen so everybody knows what's going on.

The peers would all know when to transmit and who to listen to. Conversations are typically mono, but most PCs have stereo audio. You might be able to put audio from the peanut gallery on the second channel at reduced volume. It could help with the atmospherics. You could fit a lot of people into the second channel if all they are making are the sort of short, supportive responses you mentioned earlier.
 
krakatoaAuthor Commented:
Cripes! Will I have to start all over again with the coding? (I am going to post the code on my website shortly, in case anyone wishes to see what on Earth I'm talking about at any particular juncture). BTW, what is a peanut channel? Sounds like I *should* know . . . ?
 
d-glitchCommented:
Sorry.  Peanut Gallery is old (and new) US slang
     http://en.wikipedia.org/wiki/Peanut_gallery

It might be time to start a new question with a more relevant title in another area.
I do a lot of signal processing with Matlab, but no Java coding.
Java, UDP, Signal Processing, P2P Communication are possibly relevant keywords.


There is still one open question:
>>  What does the 5120 bytes per packet represent?

If this is large enough to include several phonemes, then it is large enough to include one phoneme and large chunk of silence.  At some point this will limit your performance, and you may need or want to disassemble the packets.

The 5120 bytes may be optimum for generic data transmission but much less so for speech processing.
 
krakatoaAuthor Commented:
I'd never heard of the Peanut Gallery, and I always thought I knew at least my fair share of American culture, having already been old enough to laugh at Desi Arnaz and Lucille Ball. I had no idea that I'd one day end up mentioning such things on a Java programming website . . . mainly, I presume, because computers hadn't been invented yet. (Give or take). ;)

Being a great believer in sweeping problems under the carpet whenever an abstraction can be used in place of real work, I was harbouring the hope that a high packet amplitude reading would act as a kind of implicit token - IOW, if someone wanted to "make their voice heard" badly enough in a conversation, the algo which tracks the amplitude would allow the packets to be forwarded, and the irritation caused to other speakers by the resulting unintelligible noise would be enough to make the present speaker fall silent, and the parvenu to waffle on. So, in that scenario, the high amplitude values carried a soft tokenisation, as well as an eventually worthwhile 'I have the floor' signal. As I said before, I can't get away from that conviction.

5120 is the buffer size (the number of bytes) in a packet. It could be twice or 4 times as large, but those values produced unintelligible noise - it was only from 10240 that the curtain was lifted.