Because dpl implements its algorithms differently from a sequential implementation of the corresponding formula,
it is necessary to verify:
How different are the results generated by the parallel code from the results generated by the corresponding sequential code?
Also, because floating point calculation is inexact, the result generated by the sequential code cannot be fully trusted either.
It is therefore also necessary to verify:
How different are the results generated by the parallel code from the precise results ( calculated with unlimited precision ) of the sequential algorithm?
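One common reason a parallel implementation can return a different value than sequential code for the same formula is that floating point addition is not associative, so regrouping the operations changes the rounding. The sketch below illustrates this in plain C. It does not use dpl, the helper names are hypothetical, and whether regrouping is the mechanism at work inside dpl is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not part of dpl: the same three values summed
   with two different groupings. */

/* Sequential order: ( a + b ) + c */
double sum_left_first( double a, double b, double c )
{
    volatile double t = a + b;   /* volatile forces rounding to double */
    return t + c;
}

/* Regrouped order, as a parallel reduction might use: a + ( b + c ) */
double sum_right_first( double a, double b, double c )
{
    volatile double t = b + c;   /* c is absorbed if it is tiny next to b */
    return a + t;
}

/* Prints both groupings for inputs where they disagree. */
void show_grouping_effect( void )
{
    double a = 1.e20, b = -1.e20, c = 1.;
    double left  = sum_left_first( a, b, c );
    double right = sum_right_first( a, b, c );
    assert( left != right );     /* the two groupings differ for this input */
    printf( "( a + b ) + c = %g\n", left );
    printf( "a + ( b + c ) = %g\n", right );
}
```

Calling show_grouping_effect() prints 1 for the first grouping and 0 for the second: the value 1 is lost when it is added to -1.e20 first, because it is far smaller than the spacing between adjacent doubles of that magnitude.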
The verification process is almost identical to the one described in the chapter "testing dco parallelization of a stencil", found
here.
The results of this verification show that, in most cases,
the difference between parallel and sequential results is acceptably small.
However, there are exceptions: the functions that use division,
dpl_stencil_div_1p, dpl_stencil_o_div_1p and
dpl_stencil_div_acc.
Let us study one such exceptional case, in which the difference between the result generated by the dpl-derived parallel code and
the result generated by the corresponding sequential code is large.
In most cases the results to be verified are vectors of double precision values.
We define norms that help with the evaluation. For vectors x and x1 of size ( dimension ) n we define:
Maximal Relative Difference ( MaxRelDiff )
max( fabs( 2.*( x[i] - x1[i] ) )/( fabs( x[i] ) + fabs( x1[i] ) ) ), for i = 0,...,n-1
Vector Relative Difference ( VectorRelDiff )
‖x - x1‖/‖x1‖
where
‖vector‖ = √( ∑( vector[i] )^2 )
Vector Normal Difference ( VectorNormDiff )
max( | x[i] - x1[i] | ), for i = 0,...,n-1
A comprehensive evaluation of the function dpl_stencil_o_div_1p using
the drt-based evaluation procedure ( see the reference mentioned above ) reported the following data for the worst case detected:
stencil_div_o_1p cmp_parallel_serial_loop
MaxRelDiff: 3.265649e-04
VectorRelDiff: 2.630082e-04
VectorNormDiff: 1.710950e-01
Happen in the test case 3804350 at the index 875 of 1000 with the stride 1
myRslt= 523.83780351572954714
exRslt= 524.00889851128829378
Total 7119197, Attempted 7119197, Done 7119197 cases for 1000 seconds error count = 10152.
Random number generator seed used 1596510236.
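Two of the figures in this report can be rechecked directly from the two printed values. The sketch below applies the MaxRelDiff formula to a single pair of values; the function names are hypothetical and are not part of dpl or drt.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Symmetric relative difference of two scalars, following the
   MaxRelDiff formula for a single element. */
double rel_diff( double a, double b )
{
    double denom = fabs( a ) + fabs( b );
    assert( denom > 0. );
    return fabs( 2.*( a - b ) )/denom;
}

/* Recomputes two of the reported figures from the printed values. */
void recheck_worst_case( void )
{
    const double myRslt = 523.83780351572954714;
    const double exRslt = 524.00889851128829378;
    printf( "relative difference: %e\n", rel_diff( myRslt, exRslt ) );
    printf( "absolute difference: %e\n", fabs( myRslt - exRslt ) );
}
```

Calling recheck_worst_case() prints 3.265649e-04 and 1.710950e-01, which agree with the reported MaxRelDiff and VectorNormDiff. A relative difference of this size corresponds to roughly 3.5 matching significant decimal digits.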
The data seems disappointing: the result derived by using the dpl parallel library ( myRslt )
agrees with the result generated by the sequential code ( exRslt ) to fewer than 4 significant decimal digits.
Also, all the norms show poor agreement between the two results.
Let us investigate this case further.
First, we create a serial implementation of the function dpl_stencil_o_div_1p using
the routines from the Dalsoft High Precision ( dhp ) library
( see this for details ). The result produced by this implementation
we will call preciseRslt; we assume that preciseRslt, generated with a 1024-bit mantissa, is fully accurate when expressed as a 64-bit double precision value.
Second, utilizing drt-supported functionality, we recreate the case to be studied
( by setting the seed that was used and skipping 3804350 cases to reach the case where the problem was observed ) and perform
the following tests on it:
- compare the result derived by using the dpl parallel library ( myRslt )
with preciseRslt.
- compare the result generated by the sequential code ( exRslt )
with preciseRslt.
Comparing the result derived by using the dpl parallel library ( myRslt )
with preciseRslt generated the following data:
stencil_div_o_1p cmp_parallel_serialhp
Dim.errs.chsums 1000 0 0 6777.462714921427505
6774.995656373027487
MaxRelDiff 3.925580826263917e-04
VectorRelDiff 3.161913979263145e-04
VectorNormDiff 2.055964094892033e-01
Comparing the result generated by the sequential code ( exRslt )
with preciseRslt generated the following data:
stencil_div_o_1p cmp_serial_serialhp
Dim.errs.chsums 1000 0 0 6779.515773068097587
6774.995656373027487
MaxRelDiff 7.191229955145954e-04
VectorRelDiff 5.793222860496758e-04
VectorNormDiff 3.766914050479500e-01
The following table summarizes the above results.
Each row corresponds to one of the norms.
The first column contains the values of these norms
when comparing the result derived by using the dpl parallel library ( myRslt )
with preciseRslt.
The second column contains the values of these norms
when comparing the result generated by the sequential code ( exRslt ) with preciseRslt.

                 myRslt vs preciseRslt    exRslt vs preciseRslt
MaxRelDiff       3.925580826263917e-04    7.191229955145954e-04
VectorRelDiff    3.161913979263145e-04    5.793222860496758e-04
VectorNormDiff   2.055964094892033e-01    3.766914050479500e-01
Analyzing the data in this table, it becomes clear that preciseRslt is closer to myRslt than to exRslt:
all the norms for myRslt vs preciseRslt are smaller, by a factor of roughly 1.8, than the corresponding norms for
exRslt vs preciseRslt.
The problem we observed ( low precision ) seems to be due to the fact that the function dpl_stencil_o_div_1p, which performs division, is inherently unstable and
thus should not be used on arbitrary ( randomly generated ) input.
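To illustrate how a computation involving division can be this sensitive, consider a quotient whose denominator is the difference of two nearly equal values. The sketch below is a generic illustration under assumed inputs, not the formula used by dpl_stencil_o_div_1p ( which is not given here ); the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical example, not the dpl formula: num/( a - b ). */
double quotient( double num, double a, double b )
{
    assert( a != b );
    return num/( a - b );
}

/* Relative change of the quotient when a is perturbed by the relative
   amount eps. For a close to b this is about eps*a/( a - b ). */
double amplification_demo( double a, double b, double eps )
{
    double q0 = quotient( 1., a, b );
    double q1 = quotient( 1., a*( 1. + eps ), b );
    return fabs( q1 - q0 )/fabs( q0 );
}
```

For example, amplification_demo( 1. + 1.e-8, 1., 1.e-12 ) returns a value on the order of 1.e-4: a relative perturbation of 1.e-12 in one input changes the quotient by about eight orders of magnitude more. With randomly generated input such near-cancelling denominators can occur by chance, and rounding differences between two evaluation orders are then amplified in the same way.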