
Class: LimitedPrecisionReal

Object
   |
   +--Magnitude
      |
      +--ArithmeticValue
         |
         +--Number
            |
            +--LimitedPrecisionReal
               |
               +--Float
               |
               +--LongFloat
               |
               +--QDouble
               |
               +--ShortFloat
 Package: stx:libbasic
 Category: Magnitude-Numbers
 Version: rev: 1.115 date: 2018/05/09 23:22:36
          user: stefan
          file: LimitedPrecisionReal.st directory: libbasic
          module: stx stc-classLibrary: libbasic
 Author: Claus Gittinger
Abstract superclass for any-precision floating point numbers (i.e. IEEE floats and doubles).
Short summary for beginners (find details in wikipedia):
========================================================
Floating point numbers are represented with a mantissa and an exponent, and the number's value is:
mantissa * (2 raisedTo: exponent)
with (1 > mantissa >= 0.5) for nonzero values, and the exponent adjusted as required for the mantissa to be in that range
(so called "normalized")
therefore,
13 asFloat mantissa -> 0.8125
13 asFloat exponent -> 4
0.8125 * (2 raisedTo:4) -> 13
and:
104 asFloat mantissa -> 0.8125
104 asFloat exponent -> 7
0.8125 * (2 raisedTo:7) -> 104
and:
0.1 mantissa -> 0.8
0.1 exponent -> -3
0.8 * (2 raisedTo:-3) -> 0.1
however:
(1 / 3.0) mantissa -> 0.666666666666667
(1 / 3.0) exponent -> -1
0.666666666666667 * (2 raisedTo:-1) -> 0.333333333333333
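The decomposition above can be cross-checked outside Smalltalk. The following is a minimal sketch in Python (not ST/X code); it assumes that Python's math.frexp uses the same convention as the text, i.e. a mantissa in the range 0.5 .. 1.0 and value = mantissa * 2^exponent.

```python
import math

# frexp splits a float into (mantissa, exponent) with 0.5 <= |mantissa| < 1
for value in (13.0, 104.0, 0.1, 1 / 3.0):
    mantissa, exponent = math.frexp(value)
    # ldexp is the inverse operation: mantissa * 2**exponent
    assert math.ldexp(mantissa, exponent) == value
    print(value, "=", mantissa, "* 2 **", exponent)
```

Scaling by a power of two is exact, so the round trip through frexp and ldexp reproduces the stored value bit for bit, even for 0.1 and 1/3, which are themselves only approximations.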
Danger in using Floats:
=======================
Beginners seem to forget (or never learn?) that floating point numbers are always APPROXIMATIONs of some value.
Never use them when exact results are needed (e.g. when computing money!)
Take a look at the FixedPoint class for that.
See also 'Float comparison' below.
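To make the money problem concrete, here is a small sketch in Python (not ST/X code). It assumes that Python's decimal.Decimal can stand in for the role the FixedPoint class plays here; that mapping is an assumption made for illustration only.

```python
from decimal import Decimal

# Binary floats: ten cents plus twenty cents is not exactly thirty cents.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)      # False
print(float_sum)             # 0.30000000000000004

# Decimal arithmetic keeps the cents exact.
exact_sum = Decimal("0.10") + Decimal("0.20")
print(exact_sum == Decimal("0.30"))   # True
```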
The Float/Double confusion in ST/X:
===================================
Due to historic reasons, ST/X's Floats are what Doubles are in VisualWorks.
The reason is that in some Smalltalks, double floats are called Float, and no single float exists (VSE, V'Age),
whereas in others, there are both Float and Double classes (VisualWorks).
In order to allow code from both families to be loaded into ST/X without a missing class error, and without
losing precision, we decided to use IEEE doubles as the internal representation of Float
and make Double an alias to it.
This should work for either family (except for the unexpected additional precision in some cases).
If you really only want single precision floating point numbers, use ShortFloat instances.
But be aware that there is usually no advantage (neither in memory usage, due to memory alignment restrictions,
nor in speed), as these days, the CPUs are just as fast doing double precision operations.
(There might be a noticeable difference when doing bulk operations, and you should consider using FloatArray for those).
Hardware supported precisions
=============================
The only really portable sizes are IEEE-single and IEEE-double floats (i.e. ShortFloat and Float instances).
These are supported on all architectures.
Some do provide an extended precision floating point number,
however, the downside is that CPU-architects did not agree on a common format and precision:
some use 80 bits, others 96 and others even 128.
See the comments in the LongFloat class for more details.
We recommend using Float (i.e. IEEE doubles) unless extended precision is absolutely required,
and taking care of machine dependencies in the code otherwise.
For higher precision needs, you may also try the new QDouble class, which gives you >200 bits (60 digits)
of precision on all machines (at a noticeable performance price, though).
Range and Precision of Storage Formats:
=======================================
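+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| Format             | Class      | Array Class    | Bits / Significant | Smallest Pos Number | Largest Pos Number  | Significant Digits |
|                    |            |                | (Binary)           |                     |                     | (Decimal)          |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| half               |            | HalfFloatArray | 16 / 11            | 6.10... x 10^-5     | 6.55... x 10^+4     | 3.3                |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| single             | ShortFloat | FloatArray     | 32 / 24            | 1.175... x 10^-38   | 3.402... x 10^+38   | 6-9                |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| double             | Float      | DoubleArray    | 64 / 53            | 2.225... x 10^-308  | 1.797... x 10^+308  | 15-17              |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| double ext (SPARC) | LongFloat  |                | 128 / 113          | 3.362... x 10^-4932 | 1.189... x 10^+4932 | 33-36              |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
| double ext (x86)   | LongFloat  |                | 96 / 64            | 3.362... x 10^-4932 | 1.189... x 10^+4932 | 18-21              |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
|                    | QDouble    |                | 256 / 212          | 2.225... x 10^-308  | 1.797... x 10^+308  | >=60               |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+
|                    | LargeFloat |                | arbitrary          | arbitrarily small   | arbitrarily large   | arbitrary          |
+--------------------+------------+----------------+--------------------+---------------------+---------------------+--------------------+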
HalfFloats are only supported in fixed array containers.
This was added for OpenGL and other graphic libraries which allow for texture,
and vertex data to be passed quickly in that format (see http://www.opengl.org/wiki/Small_Float_Formats).
LongFloat and LargeFloat are not supported as array containers.
These formats are seldom used for bulk data.
QDoubles are special soft floats; slower in performance, but providing 4 times the precision of regular doubles.
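The decimal digit counts in the table follow from the number of significant bits: each binary digit carries log10(2) (about 0.301) decimal digits. A minimal sketch in Python (not ST/X code); the dictionary and the function name decimal_digits are hypothetical and only restate the bit counts from the table.

```python
import math

# Significant bits per format, as listed in the table above.
significand_bits = {
    "half": 11,
    "single": 24,
    "double": 53,
    "double ext (x86)": 64,
    "double ext (SPARC)": 113,
    "QDouble": 212,
}

def decimal_digits(bits: int) -> float:
    """Approximate number of decimal digits carried by `bits` binary digits."""
    return bits * math.log10(2)

for name, bits in significand_bits.items():
    print(f"{name:20s} {bits:4d} bits ~ {decimal_digits(bits):5.1f} decimal digits")
```

The ranges given in the table (for example 15-17 for doubles) bracket this estimate: the lower bound is the number of decimal digits that always survive a round trip, the upper bound the number needed to print any value unambiguously.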
To see the differences in precision (the '*' marks where the digits start to deviate from the correct value):
'%60.58f' printf:{ 1 asShortFloat exp } -> '2.718281828459045*090795598298427648842334747314453125' (32 bits)
'%60.58f' printf:{ 1 asFloat exp } -> '2.718281828459045*090795598298427648842334747314453125' (64 bits)
'%60.58f' printf:{ 1 asLongFloat exp } -> '2.718281828459045235*4281681079939403389289509505033493041992' (only 80 valid bits on x86)
'%60.58f' printf:{ 1 asQDouble exp } -> '2.71828182845904523536028747135266249775724709369995957496698' (>200 bits)
correct value is: 2.71828182845904523536028747135266249775724709369995957496696762772407663035354759457138217852516642742746
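The loss of digits between single and double precision can also be reproduced in Python (not ST/X code). This sketch assumes that packing with struct's "f" format rounds a double to the nearest IEEE single; the helper name to_single is hypothetical.

```python
import math
import struct

def to_single(x: float) -> float:
    """Round a double to the nearest IEEE single and widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

e_double = math.exp(1.0)
e_single = to_single(e_double)

print(f"{e_single:.20f}")
print(f"{e_double:.20f}")
print(abs(e_double - e_single))   # rounding error caused by the 24-bit mantissa
```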
Bulk Containers:
================
If you have a vector or matrix (and especially: large ones) of floating point numbers, the well known
Array is a very inefficient choice. The reason is that it keeps pointers to each of its elements, and each element
(if it is a float) is itself stored somewhere in the object memory.
Thus, there is both a space overhead (every float object has an object header, for class and other information), and
also a performance overhead (extra indirection, cache misses and alignment inefficiencies).
For this, the bulk numeric containers are provided, which keep the elements unboxed and properly aligned.
Use them for matrices and large numeric vectors. They also provide some optimized bulk operation methods,
such as adding, multiplying etc.
Take a look at FloatArray, DoubleArray, HalfFloatArray etc.
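The boxed versus unboxed distinction is not specific to ST/X. As a rough analogy, the sketch below uses Python's list (one pointer per element) and array.array (contiguous raw values); treating these as counterparts of Array, DoubleArray and FloatArray is an assumption made for illustration, and the reported sizes depend on the platform.

```python
import sys
from array import array

values = [i * 0.5 for i in range(1000)]

boxed = list(values)             # one pointer per element, each float a separate object
unboxed64 = array("d", values)   # contiguous 8-byte doubles
unboxed32 = array("f", values)   # contiguous 4-byte singles

print(unboxed64.itemsize, unboxed32.itemsize)
# Container sizes only; the list additionally references 1000 separate float objects.
print(sys.getsizeof(boxed), sys.getsizeof(unboxed64), sys.getsizeof(unboxed32))
```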
Comparing Floats:
=================
Due to rounding errors (usually on the last bit(s)), you should not compare two floating point numbers
using the #= operator. For example, the value 0.1 cannot be represented as a sum of powers-of-two fractions,
and will therefore always be an approximation with up to a half bit error in the last bit of the mantissa.
Usually, the print functions take this into consideration and return a (faked) '0.1'.
However, this half bit error may accumulate, for example, when multiplying that by 0.1 then by 100,
the error may get large enough to be no longer pushed under the rug by the print function,
and you will get something like '0.9999999999999' or '1.0000000000000002' from it.
Also, comparing against a proper 1.0 (which is representable as an exact power of 2),
you will get a false result.
i.e. (0.1 * 0.1 * 100 ~= 1.0) and (0.1 * 0.1 * 100 - 1.0) ~= 0.0
This often confuses non-computer scientists (and occasionally even some of those).
For this, you should always provide an epsilon value, when comparing two non-integer numbers.
The epsilon value is the distance you accept two numbers to be apart to be still considered equal.
Effectively the epsilon says "are those nearer than this epsilon?".
Now we could say "is the delta between two numbers smaller than 0.00001",
and get a reasonable answer for big numbers. But what if we compare two tiny numbers?
Then a reasonable epsilon must also be much smaller!
Actually, the epsilon should always be computed dynamically depending on the two values compared.
That is what the #isAlmostEqualTo:nEpsilon: method does for you. It does not take an absolute epsilon,
but instead the number of distinct floating point numbers that the two compared floats may be apart.
That is: the number of actually representable numbers between those two.
Effectively, that is the difference between the two mantissas,
when the numbers are scaled to the same exponent, taking the number of mantissa bits into account.
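The idea of counting representable numbers between two floats can be sketched as follows. This is Python, not the ST/X implementation of #isAlmostEqualTo:nEpsilon:; the function names ordered_bits and almost_equal_ulps are hypothetical, and the sketch assumes IEEE doubles, whose bit patterns are ordered like the values they encode.

```python
import math
import struct

def ordered_bits(x: float) -> int:
    """Map a double to an integer so that adjacent doubles map to adjacent integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats are stored as sign + magnitude; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def almost_equal_ulps(a: float, b: float, max_ulps: int) -> bool:
    """True if at most max_ulps representable steps separate a and b."""
    if math.isnan(a) or math.isnan(b):
        return False
    return abs(ordered_bits(a) - ordered_bits(b)) <= max_ulps

product = 0.1 * 0.1 * 100
print(product == 1.0)                          # False
print(almost_equal_ulps(product, 1.0, 2))      # True
print(abs(1e-300 - 2e-300) < 0.00001)          # True, although one is twice the other
print(almost_equal_ulps(1e-300, 2e-300, 2))    # False
```

The last two lines show why a fixed absolute epsilon fails for tiny numbers, while the step count scales automatically with the magnitude of the operands.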
Related information:
Fraction
FixedPoint
class initialization

initialize

initialize ANSI compliant float globals
constants & defaults

NaN

return the constant NaN (not a Number) in my representation.
usage example(s):
ShortFloat NaN
Float NaN
LongFloat NaN
LargeFloat NaN


computeEpsilon

return the maximum relative spacing
usage example(s):
Float radix
Float precision
Float computeEpsilon
ShortFloat computeEpsilon
LongFloat computeEpsilon
QDouble epsilon


emax

return the largest exponent
usage example(s):
Float emax
ShortFloat emax


emin

return the smallest exponent
** This method raises an error - it must be redefined in concrete classes **

epsilon

return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
usage example(s):
Float epsilon -> 2.22044604925031E-16
ShortFloat epsilon -> 1.192093e-07
LongFloat epsilon -> 1.084202172485504434E-19
QDouble epsilon -> 1.21543267145725E-63
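The value for Float can be reproduced by halving a candidate until adding it to 1.0 no longer changes the result. A minimal sketch in Python (not the ST/X implementation of computeEpsilon); the function name compute_epsilon is hypothetical and the result assumes IEEE doubles with round-to-nearest-even.

```python
import sys

def compute_epsilon() -> float:
    """Smallest power of two eps for which 1.0 + eps is still distinguishable from 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

print(compute_epsilon())
print(sys.float_info.epsilon)
```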


fmax

The largest value allowed by instances of this class.
usage example(s):
Float fmax
ShortFloat fmax
LongFloat fmax
QDouble fmax


fmin

The smallest value allowed by instances of this class.
** This method raises an error - it must be redefined in concrete classes **

infinity

return an instance of myself which represents positive infinity
usage example(s):
ShortFloat infinity
Float infinity
LongFloat infinity
LargeFloat infinity


maxSmallInteger

answer the largest possible SmallInteger value as instance of myself
usage example(s):
Float maxSmallInteger.
LongFloat maxSmallInteger.
ShortFloat maxSmallInteger.
QDouble maxSmallInteger.


negativeInfinity

return an instance of myself which represents negative infinity
usage example(s):
ShortFloat negativeInfinity
Float negativeInfinity
LongFloat negativeInfinity
LargeFloat negativeInfinity

instance creation

fromInteger: anInteger

return a float with anInteger's value.
Since floats have a limited precision, you usually lose bits when doing this
with a large integer
(i.e. when numDigits is above the floating point number's precision)
(see Float decimalPrecision, LongFloat decimalPrecision).
usage example(s):
ShortFloat fromInteger:2
12345678901234567890 asShortFloat
1234567890 asFloat
1234567890 asFloat asInteger
1234567890 asFloat asInteger
12345678901234567890 asFloat storeString
12345678901234567890 asFloat asInteger
12345678901234567890 asFloat asInteger
12345678901234567890 asLongFloat
12345678901234567890 asLongFloat asInteger
12345678901234567890 asLongFloat asInteger
123456789012345678901234567890 asLongFloat
123456789012345678901234567890 asLongFloat asInteger
123456789012345678901234567890 asLongFloat asInteger
1234567890123456789012345678901234567890 asLongFloat
1234567890123456789012345678901234567890 asLongFloat asInteger
1234567890123456789012345678901234567890 asLongFloat asInteger
'this test is on 65 bits'.
self assert: 16r1FFFFFFFFFFFF0801 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
'this test is on 64 bits'.
self assert: 16r1FFFFFFFFFFFF0802 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
'nearest even is upper'.
self assert: 16r1FFFFFFFFFFF1F800 asDouble = 16r1FFFFFFFFFFF20000 asDouble.
'nearest even is lower'.
self assert: 16r1FFFFFFFFFFFF0800 asDouble = 16r1FFFFFFFFFFFF0000 asDouble.
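The same bit loss and tie-breaking can be observed with IEEE doubles in Python (not ST/X code). The sketch assumes that Python's int-to-float conversion rounds to nearest, ties to even, and reuses the hexadecimal constants from the assertions above.

```python
big = 12345678901234567890          # needs 64 bits, a double keeps only 53

round_trip = int(float(big))
print(round_trip)                   # differs from `big` in the low digits
print(big - round_trip)

# A tie (remaining bits are exactly one half) goes to the even mantissa.
print(float(0x1FFFFFFFFFFFF0800) == float(0x1FFFFFFFFFFFF0000))   # tie, rounds down
print(float(0x1FFFFFFFFFFF1F800) == float(0x1FFFFFFFFFFF20000))   # tie, rounds up
print(float(0x1FFFFFFFFFFFF0801) == float(0x1FFFFFFFFFFFF0800))   # above the tie, rounds up
```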


fromLimitedPrecisionReal: anLPReal

return a float with anLPReal's value.
You might lose bits when doing this.
Slow fallback.

fromNumerator: numerator denominator: denominator

Create a limited precision real from a Rational.
This version will answer the nearest floating point value,
according to the IEEE 754 round-to-nearest-even default mode
usage example(s):
Time millisecondsToRun:[
1000000 timesRepeat:[
Float fromNumerator:12345678901234567890 denominator:987654321
].
]
|fraction|
fraction := 12345678901234567890/987654321.
Time millisecondsToRun:[
1000000 timesRepeat:[
fraction asFloat
].
]
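What "nearest floating point value" means for a rational can be checked with exact arithmetic. A minimal sketch in Python (not ST/X code), assuming that Python's true division of two integers is correctly rounded; the variable names are hypothetical.

```python
from fractions import Fraction

numerator, denominator = 12345678901234567890, 987654321
exact = Fraction(numerator, denominator)

# True division of two ints yields the double nearest to the exact quotient.
nearest = numerator / denominator
print(nearest)

# The remaining error is the distance between the exact rational and that double.
error = abs(Fraction(nearest) - exact)
print(float(error))
```

For a correctly rounded result the error is at most half a unit in the last place, i.e. no more than 2^-53 relative to the exact value.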


new: aNumber

catch this message - not allowed for floats/doubles

readFrom: aStringOrStream onError: exceptionBlock

read a float from a string
usage example(s):
Float readFrom:'.1'
Float readFrom:'0.1'
Float readFrom:'0'
ShortFloat readFrom:'.1'
ShortFloat readFrom:'0.1'
ShortFloat readFrom:'0'
LongFloat readFrom:'.1'
LongFloat readFrom:'0.1'
LongFloat readFrom:'0'
LimitedPrecisionReal readFrom:'bla' onError:nil
Float readFrom:'bla' onError:nil
ShortFloat readFrom:'bla' onError:nil

queries

decimalPrecision

return the number of valid decimal digits
usage example(s):
ShortFloat decimalPrecision
Float decimalPrecision
LongFloat decimalPrecision


defaultPrintPrecision

return the number of decimal digits printed by default
usage example(s):
ShortFloat defaultPrintPrecision
Float defaultPrintPrecision
LongFloat defaultPrintPrecision


denormalized

Return whether the instances of this class can
represent values in denormalized format.

hasSharedInstances

return true if this class has shared instances, that is, instances
with the same value are identical.
Although not really shared, floats should be treated
so, to be independent of the implementation of the arithmetic methods.

isAbstract

Return if this class is an abstract class.
True is returned for LimitedPrecisionReal here; false for subclasses.

isIEEEFormat

return true, if this machine represents floats in IEEE format.
Currently, no support is provided for non-IEEE machines
to convert their floats into this (which is only relevant,
if such a machine wants to send floats as binary to some other
machine).
Machines with non-IEEE format are VAXes and IBM370-type systems
(among others). Today, most systems use IEEE format floats.

numBitsInExponent

return the number of bits in the exponent
** This method raises an error - it must be redefined in concrete classes **

numBitsInIntegerPart

answer the number of bits in the integer part of the mantissa.
Most floating point formats are normalized to get rid of the extra
bit.

numBitsInMantissa

return the number of bits in the mantissa
(typically 1 less than the precision due to the hidden bit)
** This method raises an error - it must be redefined in concrete classes **

precision

return the number of valid mantissa bits
usage example(s):
HalfFloat precision
ShortFloat precision
Float precision
LongFloat precision
QDouble precision


radix

return the radix (base)
** This method raises an error - it must be redefined in concrete classes **
accessing

at: index

redefined to prevent access to individual bytes in a real.

at: index put: aValue

redefined to prevent access to individual bytes in a real
arithmetic

* aNumber

return the product of the receiver and the argument.

+ aNumber

return the sum of the receiver and the argument, aNumber

- aNumber

return the difference of the receiver and the argument, aNumber

/ aNumber

return the quotient of the receiver and the argument, aNumber

// aNumber

return the integer quotient of dividing the receiver by aNumber with
truncation towards negative infinity.

ceiling


floor


timesTwoPower: anInteger

multiply self by a power of two.
Implementation takes care of preserving class and avoiding overflow/underflow
Thanks to Nicolas Cellier for this code
usage example(s):
(1 asShortFloat timesTwoPower: 3) class = ShortFloat.
(1 asLongFloat timesTwoPower: 1024).
(1 asFloat timesTwoPower: 1024) timesTwoPower: 1024.
(2.0 asShortFloat timesTwoPower: 150) timesTwoPower: 150
(2.0 asLongFloat timesTwoPower: 150) timesTwoPower: 150
(2.0 asFloat timesTwoPower: 150) timesTwoPower: 150
(2.0 asShortFloat timesTwoPower: 149) timesTwoPower: 149
(2.0 asLongFloat timesTwoPower: 149) timesTwoPower: 149
(2.0 asFloat timesTwoPower: 149) timesTwoPower: 149
Time millisecondsToRun:[
1000000 timesRepeat:[
(2.0 timesTwoPower: 150)
]
]
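Scaling by a power of two only adjusts the exponent, so it is exact as long as the result stays in range. The sketch below shows this in Python (not ST/X code) using math.ldexp; the wrapper name times_two_power is hypothetical, and unlike the ST/X method, Python's ldexp raises OverflowError when the result exceeds the largest double.

```python
import math

def times_two_power(x: float, n: int) -> float:
    """Scale x by 2**n via the exponent; exact unless the result leaves the float range."""
    return math.ldexp(x, n)

print(times_two_power(1.0, 3))                           # 8.0
print(times_two_power(times_two_power(2.0, 150), 150))   # 2.0 * 2**300
print(times_two_power(1.0, -1074))                       # smallest positive double (denormalized)
```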

bytes access

digitBytes

answer the float's digit bytes in IEEE format.
Use the native machine byte ordering.
usage example(s):
Float pi digitBytes
ShortFloat pi digitBytes


digitBytesMSB: msb

answer the float's digit bytes in IEEE format.
If msb == true, use MSB byte order, otherwise LSB byte order.
usage example(s):
Float pi digitBytesMSB:false
Float pi digitBytesMSB:true
ShortFloat pi digitBytesMSB:false
ShortFloat pi digitBytesMSB:true
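For comparison, the IEEE byte patterns of pi in both byte orders can be produced in Python (not ST/X code) with the struct module; ">" selects MSB-first and "<" LSB-first order, "d" a double and "f" a single. The variable names are hypothetical.

```python
import math
import struct

msb_first = struct.pack(">d", math.pi)   # most significant byte first
lsb_first = struct.pack("<d", math.pi)   # least significant byte first

print(msb_first.hex())                   # 400921fb54442d18
print(lsb_first.hex())                   # 182d4454fb210940

single_msb = struct.pack(">f", math.pi)  # pi rounded to single precision
print(single_msb.hex())                  # 40490fdb
```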

coercing & converting

asFloat


asFraction

Answer a rational number (Integer or Fraction) representing the receiver.
This conversion uses the continued fraction method to approximate
a floating point number.
In contrast to #asTrueFraction, which returns exactly the value of the float,
this rounds in the last significant bit of the floating point number.
usage example(s):
1.1 asFraction
1.2 asFraction
0.3 asFraction
0.5 asFraction
(1/5) asFloat asFraction
(1/8) asFloat asFraction
(1/13) asFloat asFraction
(1/10) asFloat asFraction
(1/10) asFloat asTrueFraction asFixedPoint scale:20
3.14159 asFixedPoint scale:20
3.14159 storeString
3.14159 asFraction asFloat storeString
1.3 asFraction
1.0 asFraction
1E6 asFraction
1E6 asFraction
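The difference between the exact value of a float and a "nice" nearby fraction can be illustrated in Python (not ST/X code). Here Fraction(x) plays the role of #asTrueFraction and limit_denominator, which is also based on continued fractions, stands in for #asFraction; this mapping is an assumption, and the two implementations need not agree on every input.

```python
from fractions import Fraction

exact = Fraction(0.1)                  # the value actually stored in the double
approx = exact.limit_denominator()     # closest fraction with denominator <= 1000000

print(exact)                           # 3602879701896397/36028797018963968
print(approx)                          # 1/10
print(Fraction(0.5))                   # 1/2, exactly representable
print(Fraction(3.14159).limit_denominator())
```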


asInteger

return an integer with same value - might truncate
usage example(s):
12345.0 asInteger
1e15 asInteger
1e33 asInteger asFloat
1e303 asInteger asFloat


asLargeFloat

return a large float with (approximately) my value

asLimitedPrecisionReal

return a float of any precision with same value

asLongFloat


asRational

Answer a Rational number (Integer or Fraction) representing the receiver.
Same as asFraction for ST80 compatibility.
usage example(s):
1.1 asRational
1.2 asRational
0.3 asRational
0.5 asRational
(1/5) asFloat asRational
(1/8) asFloat asRational
(1/13) asFloat asRational
3.14159 asRational
3.14159 asRational asFloat
1.3 asRational
1.0 asRational


asShortFloat


asTrueFraction

Answer a fraction or integer that EXACTLY represents self,
an any-precision IEEE floating point number, consisting of:
numMantissaBits bits of normalized mantissa (i.e. with hidden leading 1-bit)
optional numExtraBits between mantissa and exponent (normalized flag for ext-real)
numExponentBits bits of 2's complement exponent
1 sign bit.
Taken from Float's asTrueFraction
usage example(s):
(result asFloat = self) ifFalse: [self error: 'asTrueFraction validation failed'].

usage example(s):
0.3 asFloat asTrueFraction
0.3 asShortFloat asTrueFraction
0.3 asLongFloat asTrueFraction
1.25 asTrueFraction
1.25 asShortFloat asTrueFraction
0.25 asTrueFraction
0.25 asTrueFraction
3e37 asTrueFraction
LongFloat NaN asTrueFraction
LongFloat infinity asTrueFraction
LongFloat negativeInfinity asTrueFraction
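The exact conversion can be sketched by decoding the bit fields directly. The following Python code (not the ST/X implementation) handles IEEE doubles only and assumes the standard binary64 layout: 1 sign bit, 11 exponent bits stored with a bias of 1023, and 52 stored mantissa bits plus a hidden leading 1-bit. The function name true_fraction is hypothetical.

```python
import struct
from fractions import Fraction

def true_fraction(x: float) -> Fraction:
    """Return the exact rational value of an IEEE double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = -1 if bits >> 63 else 1
    biased_exponent = (bits >> 52) & 0x7FF
    stored_mantissa = bits & ((1 << 52) - 1)
    if biased_exponent == 0x7FF:
        raise ValueError("NaN and infinity have no rational value")
    if biased_exponent == 0:
        # zero or denormalized: no hidden bit, fixed exponent
        mantissa, exponent = stored_mantissa, -1074
    else:
        # normalized: restore the hidden leading 1-bit
        mantissa, exponent = stored_mantissa | (1 << 52), biased_exponent - 1075
    return sign * Fraction(mantissa) * Fraction(2) ** exponent

print(true_fraction(1.25))    # 5/4
print(true_fraction(-0.25))   # -1/4
print(true_fraction(0.3))
```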

comparing

< aNumber

return true, if the argument is greater
copying

deepCopy

return a deep copy of myself
- because storing into floats is not recommended/allowed, it's ok to return the receiver

deepCopyUsing: aDictionary postCopySelector: postCopySelector

return a deep copy of myself
- because storing into floats is not recommended/allowed, it's ok to return the receiver

shallowCopy

return a shallow copy of the receiver

simpleDeepCopy

return a deep copy of the receiver
- because storing into floats is not recommended/allowed, it's ok to return the receiver
double dispatching

differenceFromFraction: aFraction

sent when a fraction does not know how to subtract the receiver, a float

productFromFraction: aFraction

sent when a fraction does not know how to multiply the receiver, a float

quotientFromFraction: aFraction

Return the quotient of the argument, aFraction and the receiver.
Sent when aFraction does not know how to divide by the receiver.

sumFromFraction: aFraction

sent when a fraction does not know how to add the receiver, a float

sumFromTimestamp: aTimestamp

I am to be interpreted as seconds, return the timestamp this number of seconds
after aTimestamp
usage example(s):
Timestamp now sumFromTimestamp:aTimestamp
100.0 sumFromTimestamp:Timestamp now
|t1 t2|
t1 := Timestamp now.
t2 := 1.5 sumFromTimestamp:t1.
t1 inspect. t2 inspect.
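The interpretation of a float as a number of seconds can be mirrored in Python (not ST/X code) with datetime and timedelta; the function name sum_from_timestamp and the sample date are hypothetical.

```python
from datetime import datetime, timedelta

def sum_from_timestamp(seconds: float, timestamp: datetime) -> datetime:
    """Return the point in time `seconds` seconds after `timestamp`."""
    return timestamp + timedelta(seconds=seconds)

t1 = datetime(2018, 5, 9, 23, 22, 36)
t2 = sum_from_timestamp(1.5, t1)
print(t1, t2)
```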

inspecting

inspectorExtraAttributes
( an extension from the stx:libtool package )

extra (pseudo instvar) entries to be shown in an inspector.
queries

decimalPrecision

Answer how many digits of accuracy this class supports
usage example(s):
1.0 asFloat decimalPrecision
1.0 asLongFloat decimalPrecision
1.0 asShortFloat decimalPrecision
1.0 asQDouble decimalPrecision
1.0 asLargeFloat decimalPrecision


defaultNumberOfDigits

Answer how many digits of accuracy this class supports
usage example(s):
Float new defaultNumberOfDigits
LongFloat new defaultNumberOfDigits
ShortFloat new defaultNumberOfDigits
QDouble new defaultNumberOfDigits


exponent

extract a normalized float's exponent.
This is a fallback for systems which do not provide frexp in their math lib,
as well as for error reporting (NaN or Inf).
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to
0.5 .. 1.0 and the float's value is mantissa * 2^exp
usage example(s):
Extract the sign and the biased exponent

usage example(s):
0.3 asFloat exponent
0.3 asShortFloat exponent
0.3 asLongFloat exponent
0.0 exponent2 -> 0
1.0 exponent2 -> 1
2.0 exponent2 -> 2
3.0 exponent2 -> 2
4.0 exponent2 -> 3
0.5 exponent2 -> 0
0.4 exponent2 -> -1
0.25 exponent2 -> -1
0.00000011111 exponent2 -> -23


fractionalPart

This has been renamed to #fractionPart for ST80 compatibility.
extract the after-decimal fraction part.
the float's value is
float truncated + float fractionalPart
** This is an obsolete interface - do not use it (it may vanish in future versions) **

mantissa

extract a normalized float's mantissa.
This is a fallback for systems which do not provide frexp in their math lib,
as well as for error reporting (NaN or Inf).
usage example(s):
0.3 asFloat mantissa
0.3 asShortFloat mantissa
0.3 asLongFloat mantissa
0.3 asQDouble mantissa


precision

return the number of valid mantissa bits.
Should be redefined in classes which allow per-instance precision specification

size

redefined since reals are kludgy (ByteArray)
testing

isFinite

(comment from inherited method)
return true, if the receiver is finite
i.e. it can be represented as a rational number.
** This method raises an error - it must be redefined in concrete classes **

isFloat

return true, if the receiver is some kind of floating point number;
true is returned here.
Same as #isLimitedPrecisionReal, but a better name ;)

isInfinite

return true, if the receiver is an infinite float (Inf).
These are not created by ST/X float operations (they raise an exception);
however, inline C-code could produce them ...
usage example(s):
1.0 isInfinite
(0.0 uncheckedDivide: 0.0) isInfinite
(1.0 uncheckedDivide: 0.0) isInfinite


isLimitedPrecisionReal

return true, if the receiver is some kind of limited precision real (i.e. floating point) number;
true is returned here - the method is redefined from Object.

isNaN

(comment from inherited method)
return true, if the receiver is an invalid float (NaN - not a number).
** This method raises an error - it must be redefined in concrete classes **

isNegativeZero

many systems have two floating point zeros (positive and negative)
usage example(s):
0.0 asLongFloat isNegativeZero
-0.0 asLongFloat isNegativeZero
1.0 asLongFloat isNegativeZero
-1.0 asLongFloat isNegativeZero
0.0 asLargeFloat isNegativeZero
-0.0 asLargeFloat isNegativeZero
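The two zeros compare equal, so a negative zero can only be detected through its sign bit. A minimal sketch in Python (not the ST/X implementation), assuming IEEE doubles; the function name is_negative_zero is hypothetical.

```python
import math

def is_negative_zero(x: float) -> bool:
    """True only for the zero whose sign bit is set."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

print(-0.0 == 0.0)              # True: both zeros compare equal
print(is_negative_zero(0.0))    # False
print(is_negative_zero(-0.0))   # True
print(is_negative_zero(-1.0))   # False
```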


numberOfBits

return the size (in bits) of the real;
typically, this is 64 for Floats and 32 for ShortFloats,
but who knows ...
** This method raises an error - it must be redefined in concrete classes **

positive

return true if the receiver is greater or equal to zero (not negative)

sign

return the sign of the receiver (-1, 0 or 1)
usage example(s):
-1.0 sign
-0.0 sign
1.0 sign
0.0 sign

truncation & rounding

ceilingAsFloat

0.4 asLongFloat ceilingAsFloat

floorAsFloat

0.4 asLongFloat floorAsFloat

roundedAsFloat


truncatedAsFloat

0.4 asLongFloat truncatedAsFloat

truncatedToPrecision

truncates to the precision of the float.
This is slightly different from truncated.
Taking for example 1e32 as a ShortFloat,
the printed representation will be 1e32,
but the actual value, when truncating to an integer
would be 100000003318135351409612647563264.
This is due to the inaccuracy in the least significant bits,
and the way the print-converter compensates for this.
This method tries to generate an integer value which corresponds
to what is seen in the float's printString.
Here, a slow fallback (generating and rescanning the printString)
is provided, which should work on any float number.
Specialized versions in subclasses may be added for more performance
(however, this is probably only used rarely)
usage example(s):
1e32 asShortFloat truncated
1e32 asShortFloat truncatedToPrecision
1.234e10 asShortFloat truncatedToPrecision
1234e1 asShortFloat truncatedToPrecision
1e32 truncated
1e32 truncatedToPrecision
1.234e10 truncatedToPrecision
1234e1 truncatedToPrecision
1e32 asLongFloat truncated
1e32 asLongFloat truncatedToPrecision
1.234e10 asLongFloat truncatedToPrecision
1234e1 asLongFloat truncatedToPrecision
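The slow fallback described above (generate the printed representation, then rescan it) can be sketched in Python (not ST/X code) for IEEE doubles. The function name truncated_to_precision is hypothetical; the sketch assumes that repr prints the shortest decimal string identifying the double and that it is only called with finite values.

```python
from decimal import Decimal

def truncated_to_precision(x: float) -> int:
    """Truncate the printed representation of x instead of its exact binary value."""
    return int(Decimal(repr(x)))

print(int(1e32))                        # exact value of the double nearest to 1e32
print(truncated_to_precision(1e32))     # 1 followed by 32 zeros
print(truncated_to_precision(1.234e10))
```

For a double, the exact integer value of 1e32 also differs from 10^32, although in other digits than the ShortFloat value quoted in the text, because a double carries 53 instead of 24 mantissa bits.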

visiting

acceptVisitor: aVisitor with: aParameter

dispatch for visitor pattern; send #visitFloat:with: to aVisitor.
