Page 1 of 1

Estimating Max with Only Definite Integrals

Posted: Fri Sep 29, 2017 8:28 pm UTC
by Cradarc
Hi guys! It's been a while.
I've hit a problem in my work that I could use a little help in. Unfortunately I cannot divulge the full details, so I'm formulating it as a purely mathematical problem.

The Problem:
I have a function f(x) that is defined over some real domain x = 0 to n, where n is known. f(x) is assumed to be continuous, smooth, and non-negative everywhere in the domain. My objective is to estimate a xmax where f(xmax) >= f(x) for all x in the domain. The "accuracy" of my estimation is measured by 1/|xmax-xest|.

The Catch:
f(x) is unknown and inaccessible.
However, I do have access to an "oracle", O, which allows me to compute definite integrals of f(x) over any region in the domain.
O: (a,b) --> (|b-a|/n)(Integral{f(x)dx} from a to b)

Each query to the oracle is associated with a cost Q.
I can choose to have the oracle do multiple queries in parallel (ie. respond to multiple queries simultaneously for a single cost of Q). However this option requires I pay a one-time upfront cost of S per each additional query stream. S is on the order of 10Q.

Naive Solution:
1. Choose a desired accuracy, k.
2. Divide [0,n] into 1/k spaced intervals.
3. Compute the definite integrals across each 1/k interval.
4. Set xest = midpoint of interval with maximum definite integral.

This would solve the problem as long as f(x) doesn't have "spikes" with width narrower than 1/k. For a sufficiently large k, I can make this work.
The cost would be Q(n/p)k + S(p-1) if I choose to use p parallel streams. This is not ideal since it grows linearly with k and I have to make k large to ensure high probability of success.

Suggestions or tips for a more efficient approach?

Re: Estimating Max with Only Definite Integrals

Posted: Mon Oct 02, 2017 3:42 pm UTC
by SuicideJunkie
One thing is that if you're going to start up a new thread, you should use it more than once.
At a cost of 10 quatloos, you'll need to run 20 queries before you break even on the second thread, and 60 to break even on the 3rd.
On the upside, for algorithms where you know the total number of queries to make ahead of time, then it is simple to calculate how many threads to run.

If there are quick transitions, you also need to worry about spikes that are cancelled out by valleys.
Do you know what the maximum frequency or frequency distribution of the signal is?
What kind of balance do you need to strike between normal and worst case performance?

One thing to consider is multiple passes, in which you start with a low resolution sweep, but then subdivide the highest results a few times to zoom in on likely maxima until you run out of quatloos and have to give an answer.

Re: Estimating Max with Only Definite Integrals

Posted: Mon Oct 02, 2017 5:12 pm UTC
by Xanthir
The "use the second thread multiple times" thing was factored into their cost equation already.

Depending on how the data is shaped, multiple sweeps at different resolution may or may not be useful. If things are usually broad and smooth, then yeah, but if the maxes happen in relatively narrow spikes, they'll contribute very little to the integral and easily be missed.

Ultimately, yeah, I don't think you can do a lot better than a linear number of scans at near the "width" of the value spikes.

Re: Estimating Max with Only Definite Integrals

Posted: Mon Oct 02, 2017 7:22 pm UTC
by Qaanol
Is this for work?

If so, how much is a solution worth to your employer?

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 03, 2017 5:25 pm UTC
by Cradarc
This is part of a larger system that I want to optimize. The system is my assignment. The naive algorithm I gave works, but I'm dissatisfied with the performance.

If it helps, f(x) is actually a sampled signal, but at very, very high sampling rate.
My oracle is just summing over some interval very efficiently (ie. faster than me doing it manually), but it needs to be set up beforehand.

I can definitely compute the FFT, but at that point I feel like scanning though individual indices would be faster.

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 03, 2017 5:48 pm UTC
by Zamfir
From your description, you can't do a multi-stage approach, right? You cannot do a coarse sweep first, then zoom in on promising regions, because those samples will be lost?

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 03, 2017 7:20 pm UTC
by Cradarc
No, the samples are stored in RAM. So I can access them multiple times. In fact, that is mainly why I think my current solution is suboptimal.

The problem is not the accessibility but rather the time required to go through each sample looking for the max. Through the oracle (which pre-processes the signal at the analog level), I can save a lot of time. Technically the oracle cost is in terms of hardware, so they can't really be comparable to the search time, but I lumped them together for simplicity.
Ideally I would have access to the oracle and simply modify it so it finds the max for me, but I don't. As far as I'm concerned it's just a black box with said features.

I have considered the coarse then fine approach (halving the interval size each time and picking the best one). However, it quickly becomes a bad approximation since the energy density becomes less and less correlated with the max as your interval becomes larger. Maybe there's some trick we can play? I don't know.

I would also be fine if someone can prove to me there's nothing significantly better so I can get closure and not get nerd sniped. :P

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 03, 2017 7:57 pm UTC
by Zamfir
Based on previous runs, you might be able to define limits on the spikiness of the function. Not just width, also height. As a rough sketch: you go through all local maximums in old datasets, measure their peak height from the neighbouring minima, and measure the area (integral) of the peak.

This might give you a hard limit function - a peak of height X requires at least an area of Y. Or perhaps a more probabilistic relation.

Either way, this would allow you to do ' worst case' analysis of the results from a course grid run over the data. You know the average of a section. Now assume that this section is a low, flat plateau with a single misleading peak. Based on the prior experience, how high of a peak is possible/likely?

Perhaps this will allow you to rule out many sections, because their worst case is still below the highest observed average. Then you can do a higher-resolution run on the remaining sections, and repeat. A kind of branch-and-bound algorithm

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 03, 2017 9:56 pm UTC
by arbiteroftruth
The fact that the data is sampled puts a hard limit on the spikiness of the signal. In the worst case you have a spike of only one sample. And the fact that the data is non-negative puts a limit on the maximum value of any sample in an integrated region. If you integrate a region and get a value y, the highest possible value of a single sample is y, which occurs exactly when every other sample in the region is 0. We also know that the minimum possible value of the maximum sample is equal to y/n, where n is the number of samples in the region.

So if you integrate a region with width n and the oracle gives you a value y, the value of the maximum sample in that region is bounded between y/n and y. This gives the potential to rule out certain regions with coarser intervals, then refine the search on everything that remains.

One possible approach would be essentially a binary search. Start by integrating the two halves of the data, and see if by some miracle you're able to rule one out. If you can't, then integrate the first and third quarters (first half of each half). You don't have to integrate the second or fourth quarters, since the integral of the second quarter is just (first half) - (first quarter) and the fourth quarter is just (second half) - (third quarter). You still split the data into 4 segments with a total of 4 calls to the oracle, so the only additional cost compared to the naive solution is the subtraction you perform to get the second and fourth quarters. You can repeat this process to get arbitrarily fine divisions with no additional calls to the oracle compared to the naive solution, but you'll eventually be able to use the [y/n,y] bounds on the maximum value to start eliminating entire regions, reducing the total number of calls to the oracle in the end.

Depending on the length of the data set, you might have to worry about floating point rounding errors when calculating the missing fine intervals from an earlier coarse interval, but mathematically the idea seems sound.

Re: Estimating Max with Only Definite Integrals

Posted: Wed Oct 04, 2017 4:21 pm UTC
by Qaanol
Assuming the sampling rate is significantly higher than the frequencies of the signal, there is indeed an approach that uses a relatively small number of calls to the oracle.

Re: Estimating Max with Only Definite Integrals

Posted: Wed Oct 04, 2017 10:35 pm UTC
by Xanthir
@arbiteroftruth: Ah, didn't think of the fact that you didn't need to repeat the finer-grained calls, that's clever and would indeed work.

Re: Estimating Max with Only Definite Integrals

Posted: Thu Oct 05, 2017 7:10 pm UTC
by Cradarc
arbiteroftruth, I tried your strategy (with one oracle). It definitely isn't worse than the naive solution. However, I don't see much improvement either.

I think the problem is the likelihood of eliminating a section is the probability that one section has n times the energy of the other, where n is the number of samples in the section. So when n is large, this is statistically unlikely. Only when you get to somewhere on the order of n = 2 does it sometimes happen. But by that time, you already committed a lot of resources into the search...

Clever approach though.

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 10, 2017 11:46 am UTC
by DavidSh
One thing that might help performance of arbiteroftruth's method is to run the binary tree search in a best-first fashion. At each branch, explore the half of the tree with the greater average value first. Depending on the function, this can give you better bounds for improving the pruning. This still requires that max/min be large, and the function be fairly smooth, but it improves the chance that you are using something like the max value for pruning.

Re: Estimating Max with Only Definite Integrals

Posted: Tue Oct 10, 2017 11:30 pm UTC
by eta oin shrdlu
Cradarc wrote:The cost would be Q(n/p)k + S(p-1) if I choose to use p parallel streams. This is not ideal since it grows linearly with k and I have to make k large to ensure high probability of success.
I don't see it pointed out explicitly, but this is not quite right; the growth of the minimal cost with k (choosing p appropriately, for fixed n, Q, and S) is sublinear.

Re: Estimating Max with Only Definite Integrals

Posted: Fri Oct 20, 2017 10:17 pm UTC
by quadmaster