Missed Detections: Underlying model and math

Underlying model and math

The underlying model

We provide a web application, a command-line application, and a Java API for a process in which a researcher (moderator or coordinator) carries out a survey, inviting a number of experts to provide their statistical estimates about the behaviour of a group ("population") of illegal border crossers. It is assumed that a number of persons (the "population") have been once apprehended as illegal border crossers and removed from the country. Some percentage of them will attempt to cross the border again, and some percentage of those will be apprehended.

Observable values.

During the K consecutive periods (T₁, T₂, ..., T_K) following the removal, some of those removed persons will be apprehended again when trying to cross the border. The numbers of these apprehensions, b₁, b₂, ..., b_K, measured as fractions of the original "population" of the removed persons, are the only observable values we have. Some other members of the "population" may attempt to cross the border even later (after the end of the K observation periods, i.e. during the future time for which we don't have any observable data), and some will never attempt another crossing (we say that they are "deterred").

In this model, we simplify reality by assuming that no one makes more than one attempt at a repeat crossing, i.e. every member of the population attempts to cross either during one of the K periods, or "even later", or never.

Aggregation

The researcher is interested in the number of people who attempt crossing during each time period. This number is not directly observable: we only know the numbers of apprehensions for each of the K periods (i.e. the number of attempted crossings multiplied by the apprehension rate), but not the number of people who have made the crossing without being apprehended.

(We also assume that the apprehension rate is the same during all K periods, which means that the relative values of the attempt rates during different periods are the same as the relative values of the observable apprehension numbers. For example, if we know that twice as many people from our "population" were apprehended in December as in August, then we believe that twice as many people attempted crossing in December as in August. This, of course, is a simplifying assumption.)

To form an educated guess of the values that are not directly observable (the crossing attempt rates in various periods, and the apprehension rate), the researcher (coordinator) canvasses a number of experts, asking them to provide their estimates of these unobservable numbers. The experts are not shown the known observables (the "ground truth" data).

For a given number (K) of consecutive observation periods, the expert will be asked to provide the estimates of:

1. the border crossing attempt rates (a₁, a₂, ..., a_K) for each of the K periods (i.e., what percentage of the "population" will attempt to cross the border in each period);
2. the border crossing attempt rate a_{K+1} for the "even later" period (i.e. what percentage of the "population" will attempt to cross the border after the end of the K observation periods);
3. the deterrence rate a_{K+2}, i.e. the percentage of the population who will never attempt to cross the border again;
4. the apprehension rate r, i.e. what percentage of the people attempting to cross the border during any of the observation periods is going to be apprehended by the authorities. (It is assumed that the apprehension rate will be the same during all K periods.)

Since it is assumed that no one makes more than one crossing attempt after the initial removal, the K+2 rates in items 1-3 above must sum to 1, i.e. a₁ + ... + a_K + a_{K+1} + a_{K+2} = 1.0.

For the convenience of the participating experts, in this application the rates are multiplied by 100 (i.e. one enters 5, rather than 0.05, to mean "5% of the population"). Thus the K+2 values entered by the expert in items 1-3 above must sum to 100.
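The constraint above can be checked mechanically before an estimate is accepted. The following is a minimal Python sketch, not the application's own Java API; the function name `validate_estimate` and its parameters are hypothetical, and it assumes that the apprehension rate r is also entered as a percentage.

```python
import math

def validate_estimate(attempt_percentages, apprehension_percentage, k):
    """Check one expert's entries (in percent) and return them as fractions.

    attempt_percentages holds K+2 values: a_1..a_K, then a_{K+1}
    ("even later"), then a_{K+2} (deterred). Hypothetical helper.
    """
    if len(attempt_percentages) != k + 2:
        raise ValueError(
            f"expected {k + 2} attempt values, got {len(attempt_percentages)}")
    if any(v < 0 for v in attempt_percentages):
        raise ValueError("attempt percentages must be non-negative")
    # Assumption: r is entered on the same 0..100 scale as the other rates.
    if not 0 <= apprehension_percentage <= 100:
        raise ValueError("apprehension rate must be between 0 and 100")
    total = sum(attempt_percentages)
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        raise ValueError(f"attempt percentages sum to {total}, not 100")
    # Convert from percent to fractions of the population.
    a = [v / 100.0 for v in attempt_percentages]
    r = apprehension_percentage / 100.0
    return a, r
```

For K = 2, an expert entering 10, 20, 30, 40 for items 1-3 and 50 for the apprehension rate would pass this check, while an entry with only three attempt values, or with values that do not sum to 100, would be rejected.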

Processing the experts' estimates

While the experts are not told what the actually observed values are, the coordinator has the observables at his or her disposal, and uses them to evaluate the quality of experts' estimates. The following procedure is used:

Weighting of experts.

For each expert, we compute the vector of expected observables that would be observed if the expert's estimates were correct. It consists of K+1 numbers, i.e. the K expected apprehension numbers for the K periods and the number of people who are not apprehended during these K periods (because they either never attempt to cross, or attempt to cross after the end of T_K, or cross during one of the K periods without being apprehended). The numbers are expressed as fractions of the total "population", and therefore sum to 1.0.
For each expert, the vector of observables computed for this expert in the previous step is compared with the "ground truth" observables (i.e. the actual apprehension numbers during the K periods, and the number of people who are not apprehended; these are also expressed as fractions of the total "population"). The two (K+1)-dimensional vectors can be viewed as probability distributions. As the measure of dissimilarity d(p,q) between two such distributions p and q we use the Normalized Jensen-Shannon divergence (NJSD), which is defined as the Jensen-Shannon divergence (JSD) between p and q divided by the maximum possible value of the JSD, which is log(2). Thus,
d(p,q) = JSD(p,q)/log(2).
Among all possible distributions p and q, the value of d(p,q) ranges between 0 and 1. The value of 0 is attained only when the two distributions are identical; the value of 1 is reached only when the two distributions are so different that p has non-zero values only in the components where q has zero values, and vice versa.
Based on the NJSD values computed for the experts' predictions, a weight is assigned to each expert's prediction. Two modes are supported for computing weights:
- Unscaled weights: The weight is a linear function of the NJSD,
  w(p) = 1 - d(p,t),
  where t is the "ground truth" vector of observables. Thus an expert whose predictions result in a vector of observables identical to the actually observed one will get the weight of 1. An expert whose predictions are maximally different from the actually observed ones (e.g. the expert predicted that the entire population will try a crossing and will be apprehended during a period when there were 0 actual apprehensions) will get the weight of 0.
- Scaled weights: This is similar to grading students on a curve. The experts are ranked by the similarity between the vector of observables based on their predictions and the vector of "actual" ("ground truth") observables. The weights assigned to the experts are calculated by a linear function of the experts' NJSD, chosen so that the expert whose observables are most similar to the ground truth (i.e. have the lowest value of NJSD) always receives the weight of 1, and the expert with the least similarity to the ground truth (the highest NJSD) receives the weight of 1/R. Here R is a constant greater than or equal to 1, supplied by the moderator. At one extreme, R=1, all experts will be assigned the same weight 1; at the other extreme, a very large positive R, the worst expert will have a weight close to 0. This means that the weight of an expert p is computed as
  w(p) = (d_max - d(p,t) + (1/R)*(d(p,t) - d_min)) / (d_max-d_min),
  where d_max and d_min are the highest and lowest values of d(x,t) among all experts x.
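The three weighting steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's actual Java code: all function names are hypothetical, the natural logarithm is used for both the JSD and log(2), and the handling of the case d_max = d_min (where the scaled-weight formula is undefined) is an assumption of this sketch.

```python
import math

def expected_observables(a, r, k):
    """K+1 fractions implied by an estimate: K expected apprehension
    fractions r*a_i, plus everyone not apprehended during the K periods."""
    apprehended = [r * a[i] for i in range(k)]
    return apprehended + [1.0 - sum(apprehended)]

def _kl(p, m):
    # Kullback-Leibler divergence; terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def njsd(p, q):
    """Jensen-Shannon divergence of p and q, divided by its maximum log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (0.5 * _kl(p, m) + 0.5 * _kl(q, m)) / math.log(2)

def unscaled_weight(d):
    """w = 1 - d, where d is the expert's NJSD against the ground truth."""
    return 1.0 - d

def scaled_weights(ds, big_r):
    """Linear map sending the lowest NJSD to 1 and the highest to 1/R."""
    d_min, d_max = min(ds), max(ds)
    if d_max == d_min:
        # Assumption: all experts equally good, so give everyone weight 1.
        return [1.0 for _ in ds]
    return [(d_max - d + (d - d_min) / big_r) / (d_max - d_min) for d in ds]

# Hypothetical data for K = 2: ground truth and two experts' (a, r) estimates.
truth = [0.05, 0.10, 0.85]
experts = [([0.10, 0.20, 0.30, 0.40], 0.5),
           ([0.40, 0.40, 0.10, 0.10], 0.9)]
ds = [njsd(expected_observables(a, r, 2), truth) for a, r in experts]
print("NJSD:", ds)
print("unscaled:", [unscaled_weight(d) for d in ds])
print("scaled (R=4):", scaled_weights(ds, 4))
```

In this made-up data set the first expert's expected observables coincide with the ground truth, so that expert's NJSD is 0 and both weighting modes give a weight of 1.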

Averaging.

A weighted average of all predictions is computed, with the apprehension rate

r^* = (Σ_x w(x) r(x)) /(Σ_x w(x)),

and the attempted crossing rates

a^*_i = (Σ_x w(x) a_i(x)) /(Σ_x w(x)),

where r(x) and a_i(x) (for i=1 to K+2) are the estimates, respectively, of the apprehension rate r and the attempted crossing rate (or deterrence rate, for i=K+2) a_i, provided by expert x.
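As an illustration of the averaging step, here is a short Python sketch. The names `weighted_average` and `aggregate` are hypothetical, and the weights are assumed to have been computed beforehand by one of the two weighting modes.

```python
def weighted_average(values, weights):
    """Sum of w(x) * v(x) over experts x, divided by the sum of w(x)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one expert must have a positive weight")
    return sum(w * v for w, v in zip(weights, values)) / total

def aggregate(estimates, weights):
    """estimates: one (a, r) pair per expert, where a is the list of
    K+2 attempt/deterrence fractions and r is the apprehension rate.
    Returns the averaged vector a* and the averaged rate r*."""
    r_star = weighted_average([r for _, r in estimates], weights)
    n = len(estimates[0][0])
    a_star = [weighted_average([a[i] for a, _ in estimates], weights)
              for i in range(n)]
    return a_star, r_star

# Hypothetical example: two experts, K = 0 for brevity (only the
# "even later" and "deterred" fractions), with weights 1 and 3.
a_star, r_star = aggregate([([0.2, 0.8], 0.5), ([0.4, 0.6], 0.25)], [1.0, 3.0])
print(a_star, r_star)
```

Because every expert's a_i values sum to 1, the weighted averages a^*_i also sum to 1, so the averaged vector is itself a valid estimate.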

The "missed to captured ratio" (MCR),

μ = (1-r)/r = 1/r -1,

is computed for each expert estimate. Its meaning is the expert's estimation of the ratio of the number of crossers who have evaded capture to the number of apprehended crossers.

In addition to the averaged estimates of the crossing rates and apprehension rates, the system also computes the weighted average of the "missed to captured ratios", i.e.

μ^* = (Σ_x w(x) μ(x)) /(Σ_x w(x)).

Note that μ^* is normally not equal to 1/r^*-1, because the average of the inverse values is not equal to the inverse of the averages. In fact, unless all experts predicted the same apprehension rate, the averaged MCR will be greater than the MCR that one can compute based on the averaged apprehension rate.
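The inequality follows from Jensen's inequality, since 1/r is a convex function of r for r > 0. A small numeric check in Python makes the gap visible; the function names and the two apprehension rates are hypothetical and chosen only for illustration.

```python
def mcr(r):
    """Missed to captured ratio (1 - r) / r for an apprehension rate r."""
    if not 0 < r <= 1:
        raise ValueError("apprehension rate must be in (0, 1]")
    return 1.0 / r - 1.0

def averaged_mcr(rates, weights):
    """Weighted average of the per-expert MCR values (mu*)."""
    return sum(w * mcr(r) for r, w in zip(rates, weights)) / sum(weights)

def mcr_of_averaged_rate(rates, weights):
    """MCR computed from the weighted average apprehension rate r*."""
    r_star = sum(w * r for r, w in zip(rates, weights)) / sum(weights)
    return mcr(r_star)

# Two hypothetical experts with equal weights.
rates = [0.2, 0.5]
weights = [1.0, 1.0]
print(averaged_mcr(rates, weights))          # average of 4.0 and 1.0
print(mcr_of_averaged_rate(rates, weights))  # MCR at r* = 0.35
```

Here the individual MCR values are 4 and 1, so μ^* = 2.5, while the MCR computed from r^* = 0.35 is 1/0.35 - 1, which is below 2.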

Along with the weighted averages, standard deviations are computed for each value.
