[VERIFIED] MATLAB 6.5 Portable: How to Get Started with MATLAB in Minutes
- comppropningnivo
- Aug 14, 2023
- 7 min read
Embedded Coder offers built-in support for AUTOSAR, MISRA C, and ASAP2 software standards. It also provides traceability reports, code documentation, and automated software verification to support DO-178, IEC 61508, and ISO 26262 software development. Code generated by Embedded Coder is portable and can be compiled and executed on any processor. In addition, Embedded Coder offers support packages with advanced optimizations and device drivers for specific hardware.
Some compiler writers view restrictions that prohibit converting (x + y) + z to x + (y + z) as irrelevant, of interest only to programmers who use unportable tricks. Perhaps they have in mind that floating-point numbers model real numbers and should obey the same laws that real numbers do. The problem with real number semantics is that they are extremely expensive to implement. Every time two n-bit numbers are multiplied, the product will have 2n bits. Every time two n-bit numbers with widely spaced exponents are added, the number of bits in the sum is n plus the space between the exponents. The sum could have up to (e_max - e_min) + n bits, or roughly 2·e_max + n bits. An algorithm that involves thousands of operations (such as solving a linear system) will soon be operating on numbers with many significant bits, and be hopelessly slow. The implementation of library functions such as sin and cos is even more difficult, because the values of these transcendental functions aren't rational numbers. Exact integer arithmetic is often provided by Lisp systems and is handy for some problems. However, exact floating-point arithmetic is rarely useful.
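As a small illustration of why the reassociation restriction matters, here is a minimal C sketch; it is not from the original text, and the values 1e30 and 1.0 are chosen only to make the effect visible in double precision:

```c
#include <stdio.h>

int main(void) {
    /* With real-number semantics the two groupings below would be equal.
       In double precision they differ: adding 1.0 to -1e30 first simply
       absorbs the 1.0, while cancelling 1e30 and -1e30 first preserves it. */
    double x = 1e30, y = -1e30, z = 1.0;

    double left  = (x + y) + z;   /* cancellation first: prints 1 */
    double right = x + (y + z);   /* absorption first:   prints 0 */

    printf("(x + y) + z = %g\n", left);
    printf("x + (y + z) = %g\n", right);
    return 0;
}
```

A compiler that freely rewrote (x + y) + z as x + (y + z) would silently change this program's result, which is exactly why such transformations matter to careful numerical programmers.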
A number of claims have been made in this paper concerning properties of floating-point arithmetic. We now proceed to show that floating-point is not black magic, but rather is a straightforward subject whose claims can be verified mathematically. This section is divided into three parts. The first part presents an introduction to error analysis, and provides the details for the section Rounding Error. The second part explores binary to decimal conversion, filling in some gaps from the section The IEEE Standard. The third part discusses the Kahan summation formula, which was used as an example in the section Systems Aspects.
The increasing acceptance of the IEEE floating-point standard means that codes that utilize features of the standard are becoming ever more portable. The section The IEEE Standard gave numerous examples illustrating how the features of the IEEE standard can be used in writing practical floating-point codes.
The preceding paper has shown that floating-point arithmetic must be implemented carefully, since programmers may depend on its properties for the correctness and accuracy of their programs. In particular, the IEEE standard requires a careful implementation, and it is possible to write useful programs that work correctly and deliver accurate results only on systems that conform to the standard. The reader might be tempted to conclude that such programs should be portable to all IEEE systems. Indeed, portable software would be easier to write if the remark "When a program is moved between two machines and both support IEEE arithmetic, then if any intermediate result differs, it must be because of software bugs, not from differences in arithmetic," were true.
Several of the examples in the preceding paper depend on some knowledge of the way floating-point arithmetic is rounded. In order to rely on examples such as these, a programmer must be able to predict how a program will be interpreted, and in particular, on an IEEE system, what the precision of the destination of each arithmetic operation may be. Alas, the loophole in the IEEE standard's definition of destination undermines the programmer's ability to know how a program will be interpreted. Consequently, several of the examples given above, when implemented as apparently portable programs in a high-level language, may not work correctly on IEEE systems that normally deliver results to destinations with a different precision than the programmer expects. Other examples may work, but proving that they work may lie beyond the average programmer's ability.
In this section, we classify existing implementations of IEEE 754 arithmetic based on the precisions of the destination formats they normally use. We then review some examples from the paper to show that delivering results in a wider precision than a program expects can cause it to compute wrong results even though it is provably correct when the expected precision is used. We also revisit one of the proofs in the paper to illustrate the intellectual effort required to cope with unexpected precision even when it doesn't invalidate our programs. These examples show that despite all that the IEEE standard prescribes, the differences it allows among different implementations can prevent us from writing portable, efficient numerical software whose behavior we can accurately predict. To develop such software, then, we must first create programming languages and environments that limit the variability the IEEE standard permits and allow programmers to express the floating-point semantics upon which their programs depend.
Conventional wisdom maintains that extended-based systems must produce results that are at least as accurate as, if not more accurate than, those delivered on single/double systems, since the former always provide at least as much precision and often more than the latter. Trivial examples such as the C program above, as well as more subtle programs based on the examples discussed below, show that this wisdom is naive at best: some apparently portable programs, which are indeed portable across single/double systems, deliver incorrect results on extended-based systems precisely because the compiler and hardware conspire to occasionally provide more precision than the program expects.
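The C program referred to above is not reproduced in this excerpt; the following sketch is an assumption about the kind of trivial program meant, but it is enough to show the effect. On a strict single/double system both sides of the comparison are the same correctly rounded double value of 3/7, so it prints "Equal"; on an extended-based system the right-hand side may be held in an extended precision register while q has already been rounded to double, so it can print "Not Equal".

```c
#include <stdio.h>

int main(void) {
    double q;

    q = 3.0 / 7.0;        /* quotient rounded to double when stored in q   */
    if (q == 3.0 / 7.0)   /* right-hand side may be evaluated and compared
                             in extended precision on an x87-style system  */
        printf("Equal\n");
    else
        printf("Not Equal\n");
    return 0;
}
```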
For the algorithm of Theorem 4 to work correctly, the expression 1.0 + x must be evaluated the same way each time it appears; the algorithm can fail on extended-based systems only when 1.0 + x is evaluated to extended double precision in one instance and to single or double precision in another. Of course, since log is a generic intrinsic function in Fortran, a compiler could evaluate the expression 1.0 + x in extended precision throughout, computing its logarithm in the same precision, but evidently we cannot assume that the compiler will do so. (One can also imagine a similar example involving a user-defined function. In that case, a compiler could still keep the argument in extended precision even though the function returns a single precision result, but few if any existing Fortran compilers do this, either.) We might therefore attempt to ensure that 1.0 + x is evaluated consistently by assigning it to a variable. Unfortunately, if we declare that variable real, we may still be foiled by a compiler that substitutes a value kept in a register in extended precision for one appearance of the variable and a value stored in memory in single precision for another. Instead, we would need to declare the variable with a type that corresponds to the extended precision format. Standard FORTRAN 77 does not provide a way to do this, and while Fortran 95 offers the SELECTED_REAL_KIND mechanism for describing various formats, it does not explicitly require implementations that evaluate expressions in extended precision to allow variables to be declared with that precision. In short, there is no portable way to write this program in standard Fortran that is guaranteed to prevent the expression 1.0 + x from being evaluated in a way that invalidates our proof.
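Standard Fortran offers no portable fix, but for comparison, here is a C rendition of the Theorem 4 algorithm; this is an adaptation, not code from the paper. The sum 1.0 + x is computed once and stored through a volatile double so that every later use sees the same double precision value; whether this fully defeats an aggressive extended-based compiler is itself implementation-dependent.

```c
#include <math.h>
#include <stdio.h>

/* log(1 + x) for small x, following the algorithm of Theorem 4:
   if 1 + x rounds to 1, return x; otherwise return
   x * log(1 + x) / ((1 + x) - 1), where every occurrence of 1 + x
   must denote the same rounded value. */
double log1p_theorem4(double x)
{
    volatile double w = 1.0 + x;   /* stored to memory as a double, so the
                                      uses below all see the identical value */
    if (w == 1.0)
        return x;                  /* 1 + x rounded to 1: x itself is best   */
    return x * log(w) / (w - 1.0);
}

int main(void)
{
    /* Accurate even though 1 + 1e-10 loses most of x's digits. */
    printf("%.17g\n", log1p_theorem4(1e-10));
    return 0;
}
```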
Some algorithms that depend on correct rounding can fail with double-rounding. In fact, even some algorithms that don't require correct rounding and work correctly on a variety of machines that don't conform to IEEE 754 can fail with double-rounding. The most useful of these are the portable algorithms for performing simulated multiple precision arithmetic mentioned in the section Exactly Rounded Operations. For example, the procedure described in Theorem 6 for splitting a floating-point number into high and low parts doesn't work correctly in double-rounding arithmetic: try to split the double precision number 2^52 + 3 × 2^26 - 1 into two parts each with at most 26 bits. When each operation is rounded correctly to double precision, the high order part is 2^52 + 2^27 and the low order part is 2^26 - 1, but when each operation is rounded first to extended double precision and then to double precision, the procedure produces a high order part of 2^52 + 2^28 and a low order part of -2^26 - 1. The latter number occupies 27 bits, so its square can't be computed exactly in double precision. Of course, it would still be possible to compute the square of this number in extended double precision, but the resulting algorithm would no longer be portable to single/double systems. Also, later steps in the multiple precision multiplication algorithm assume that all partial products have been computed in double precision. Handling a mixture of double and extended double variables correctly would make the implementation significantly more expensive.
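For concreteness, the splitting procedure of Theorem 6 can be written in C as below; this rendering, with the scale factor 2^27 + 1 for double precision, is our own sketch, since the paper states the algorithm abstractly. On a system that rounds each operation correctly to double precision it produces the 26-bit parts quoted above, whereas a double-rounding system can produce the 27-bit low part that breaks the multiple precision multiplication.

```c
#include <stdio.h>

/* Split x into hi + lo, each with at most 26 significant bits
   (Theorem 6 with precision p = 53 and k = 27, scale factor 2^27 + 1). */
void split(double x, double *hi, double *lo)
{
    double m = 134217729.0;        /* 2^27 + 1 */
    double p = m * x;
    *hi = p - (p - x);
    *lo = x - *hi;                 /* exact: no rounding error here */
}

int main(void)
{
    double x = 4503599828697087.0; /* 2^52 + 3*2^26 - 1, the example above */
    double hi, lo;

    split(x, &hi, &lo);
    printf("hi = %.17g\nlo = %.17g\n", hi, lo);
    return 0;
}
```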
Likewise, portable algorithms for adding multiple precision numbers represented as arrays of double precision numbers can fail in double-rounding arithmetic. These algorithms typically rely on a technique similar to Kahan's summation formula. As the informal explanation of the summation formula given in the section Errors In Summation suggests, if s and y are floating-point variables with |s| ≥ |y| and we compute: t = s + y; e = (s - t) + y;
then in most arithmetics, e recovers exactly the roundoff error that occurred in computing t. This technique doesn't work in double-rounded arithmetic, however: if s = 2^52 + 1 and y = 1/2 - 2^-54, then s + y rounds first to 2^52 + 3/2 in extended double precision, and this value rounds to 2^52 + 2 in double precision by the round-ties-to-even rule; thus the net rounding error in computing t is 1/2 + 2^-54, which is not representable exactly in double precision and so can't be computed exactly by the expression shown above. Here again, it would be possible to recover the roundoff error by computing the sum in extended double precision, but then a program would have to do extra work to reduce the final outputs back to double precision, and double-rounding could afflict this process, too. For this reason, although portable programs for simulating multiple precision arithmetic by these methods work correctly and efficiently on a wide variety of machines, they do not work as advertised on extended-based systems.
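For reference, here is a minimal C sketch of how the recovery step is used in compensated (Kahan) summation; the loop shape and the test data are our own choices, not code from the paper, and as noted above the recovery of e is only guaranteed when each operation is rounded once, to double precision.

```c
#include <stdio.h>

/* Compensated summation: after t = s + y, the expression (s - t) + y
   recovers the roundoff error of the addition (exact when |s| >= |y|
   and there is no double-rounding); the error is folded into the next
   term instead of being lost. */
double kahan_sum(const double *a, int n)
{
    double s = 0.0;               /* running sum                      */
    double c = 0.0;               /* recovered roundoff, carried over */
    for (int i = 0; i < n; i++) {
        double y = a[i] + c;      /* apply the pending correction     */
        double t = s + y;
        c = (s - t) + y;          /* the error-recovery step above    */
        s = t;
    }
    return s;
}

int main(void)
{
    /* Naive left-to-right summation of this data returns 0;
       the compensated sum returns the exact answer, 4. */
    double a[] = { 1e16, 1.0, 1.0, 1.0, 1.0, -1e16 };
    printf("compensated sum = %.17g\n", kahan_sum(a, 6));
    return 0;
}
```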