Class: OctaFloat
Object
|
+--Magnitude
   |
   +--ArithmeticValue
      |
      +--Number
         |
         +--LimitedPrecisionReal
            |
            +--AbstractIEEEFloat
               |
               +--OctaFloat
- Package: stx:libbasic2
- Category: Magnitude-Numbers
- Version: rev: 1.70 date: 2024/01/15 08:48:29
- user: cg
- file: OctaFloat.st directory: libbasic2
- module: stx stc-classLibrary: libbasic2
Notice:
Unfinished, ongoing work.
Basic arithmetic should work, but rounding is not working correctly;
this affects some of the series approximations (e.g. the trigonometric functions).
Therefore, for the time being:
Please only use them if you need to represent 256bit floats to be exchanged
with the external world (i.e. to get the bit representation).
If you need more precision than IEEE double precision, either use QDoubles,
which are faster and provide almost the same precision as OctaFloats,
or use LargeFloats, which provide any precision.
OctaFloats represent rational numbers with limited precision
and are mapped to IEEE octuple precision format (256bit),
also called binary256.
Notice that the arithmetic is emulated in software, which is much slower than hardware floats.
Thus only use them if you really need the additional precision;
if not, use Floats (which are doubles) or LongFloats, which usually have IEEE extended precision (80bit).
OctaFloats give you definite 256 bit octuple floats;
thus, code using octaFloats is guaranteed to be portable from one architecture to another.
Representation:
256bit octuple IEEE floats (32bytes);
236 bit mantissa (237 bits of precision including the hidden bit),
19 bit exponent,
71 decimal digits (approx.)
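The figures above can be cross-checked with plain integer arithmetic. The following is only a sketch; it assumes the binary256 layout of 1 sign bit, 19 exponent bits and 236 stored mantissa bits (the hidden bit is not stored), and the variable names are made up for illustration.

```smalltalk
| signBits exponentBits storedMantissaBits precision |
signBits := 1.
exponentBits := 19.
storedMantissaBits := 236.              "assumption: the hidden bit is not stored"
precision := storedMantissaBits + 1.    "237 significant bits"

"total width of the format: 256"
Transcript showCR: (signBits + exponentBits + storedMantissaBits) printString.

"decimal digits carried by 237 bits: floor(237 * log10(2)) = 71"
Transcript showCR: (precision * (2 log: 10)) floor printString.
```

Nothing here touches OctaFloat itself, so the snippet should behave the same regardless of the state of the emulation.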
Mixed mode arithmetic:
octaFloat op anyFloat -> octaFloat
anyFloat op octaFloat -> octaFloat
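A small sketch of the mixed mode rule, assuming that the arithmetic selectors and #isOctaFloat (both listed further down) behave as documented. The variable names are illustrative only.

```smalltalk
| octa sum1 sum2 |
octa := 1.5 asOctaFloat.
sum1 := octa + 2.0.                 "octaFloat op anyFloat"
sum2 := 2.0 asShortFloat + octa.    "anyFloat op octaFloat"

"both results are expected to be OctaFloats according to the rule above"
Transcript showCR: sum1 isOctaFloat printString.
Transcript showCR: sum2 isOctaFloat printString.
```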
Range and precision of storage formats: see LimitedPrecisionReal >> documentation
[aliases:]
Float256
copyright
COPYRIGHT (c) 2018 by eXept Software AG
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
class initialization
-
initialize
-
an alias
coercing & converting
-
coerce: aNumber
-
convert the argument aNumber into an instance of the receiver (class) and return it.
-
generality
-
return the generality value - see ArithmeticValue>>retry:coercing:
constants
-
NaN
-
return an octaFloat which represents not-a-Number (i.e. an invalid number)
-
e
-
return the constant e as octaFloat
Usage example(s):
eDigits has enough digits for 256bit IEEE octuple floats
|
Usage example(s):
do not use as a literal constant here - we cannot depend on the underlying C-compiler here...
|
Usage example(s):
Modified (comment): / 27-10-2021 / 11:54:11 / cg
|
-
halfPi
-
return the constant pi/2 as octaFloat
Usage example(s):
halfPiDigits has enough digits for 256bit IEEE octuple floats
|
-
infinity
-
return an octaFloat which represents +INF
Usage example(s):
PositiveInfinity := nil.
self infinity
|
-
ln10
-
return the natural logarithm of 10, ln(10), as an octaFloat.
Usage example(s):
ln10Digits has enough digits for 256bit IEEE octuple floats
|
Usage example(s):
Ln10 := nil.
OctaFloat ln10
|
-
ln2
-
return the constant ln(2) as octaFloat
Usage example(s):
ln2Digits has enough digits for 256bit IEEE octuple floats
|
-
negativeInfinity
-
return an octaFloat which represents -INF
Usage example(s):
NegativeInfinity := nil.
self negativeInfinity
|
-
phi
-
return the constant phi as octaFloat
Usage example(s):
phiDigits has enough digits for 256bit IEEE octuple floats
|
-
pi
-
return the constant pi as octaFloat
Usage example(s):
piDigits has enough digits for 256bit IEEE octuple floats
|
Usage example(s):
do not use as a literal constant here - we cannot depend on the underlying C-compiler here...
|
-
sqrt2
-
return the constant sqrt(2) as OctaFloat
Usage example(s):
sqrt2Digits has enough digits for 256bit IEEE octuple floats
|
Usage example(s):
LongFloat sqrt2 -> 1.414213562373095049
QuadFloat sqrt2 -> 1.4142135623730936799802295816596154
OctaFloat sqrt2 -> 1.41421356237309504880168872420969807856967187537694807317667973799073249
|
-
sqrt3
-
return the constant sqrt(3) as OctaFloat
Usage example(s):
sqrt3Digits has enough digits for 256bit IEEE octuple floats
|
Usage example(s):
LongFloat sqrt3 -> 1.732050807568877294
QDouble sqrt3 -> 1.73205080756888
QuadFloat sqrt3 -> 1.7320508075688772935274463415058723
OctaFloat sqrt3 -> 1.73205080756887729352744634150587236694280525381038062805580697945193301
|
-
unity
-
return the neutral element for multiplication (1.0) as OctaFloat
Usage example(s):
OctaFloatOne := nil.
self unity
|
-
zero
-
return the neutral element for addition (0.0) as OctaFloat
Usage example(s):
OctaFloatZero := nil.
self zero
|
error reporting
-
errorUnsupported
-
you may proceed from this error, to get a long float number result
(of course, with less than expected precision)
instance creation
-
basicNew
-
return a new octaFloat - here we return 0.0
- OctaFloats are usually NOT created this way ...
It's implemented here to allow things like binary store & load of octaFloats.
(but it is not a good idea to store the bits of a float - the reader might have a
totally different representation - so floats should be
binary stored in a device independent format).
-
basicNew: size
-
(comment from inherited method)
return an instance of myself with anInteger indexed variables.
If the receiver-class has no indexed instvars, this is only allowed
if the argument, anInteger is zero.
** Do not redefine this method in any class **
-
fromFloat: aFloat
-
return a new octaFloat, given a float value
Usage example(s):
OctaFloat fromFloat:123.0
123.0 asOctaFloat
123 asOctaFloat
|
-
fromInteger: anInteger
-
return a new octaFloat, given an integer value
Usage example(s):
self fromInteger:1
self fromInteger:-1
self fromInteger:2
self fromInteger:1024 * 1024 * 1024 * 1024 * 1024 * 1024
self fromInteger:1e20 asInteger
self fromInteger:1e100 asInteger
self fromInteger:2r1010101010101010101010101010101
self fromInteger:2r1010101010101010101010101010101010101010101010101010101010101010
self fromInteger:(2 raisedTo:10000)
1 asIEEEFloat
|
Usage example(s):
OctaFloat fromInteger:123
123 asOctaFloat
|
-
fromLongFloat: aLongFloat
-
return a new octaFloat, given a long float value
Usage example(s):
OctaFloat fromLongFloat:123.0 asLongFloat
|
-
fromShortFloat: aShortFloat
-
return a new octaFloat, given a float32 value
Usage example(s):
OctaFloat fromShortFloat:123.0 asShortFloat
|
-
new: size
-
(comment from inherited method)
catch this message - not allowed for floats/doubles
queries
-
defaultExponentSizeForByteSize: nBytes
-
(comment from inherited method)
self defaultExponentSizeForByteSize:2 -> 5
self defaultExponentSizeForByteSize:4 -> 8
self defaultExponentSizeForByteSize:5 -> 8
self defaultExponentSizeForByteSize:8 -> 11
self defaultExponentSizeForByteSize:10 -> 15
self defaultExponentSizeForByteSize:16 -> 15
self defaultExponentSizeForByteSize:32 -> 19
self defaultExponentSizeForByteSize:64 -> 32
-
defaultPrintPrecision
-
the default number of digits when printing
-
defaultPrintfPrecision
-
the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.
-
epsilon
-
return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
according to ISO C standard;
Ada, C, C++ and Python language constants;
Mathematica, MATLAB and Octave; and various textbooks
see https://en.wikipedia.org/wiki/Machine_epsilon
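Under the ISO C definition referenced above, epsilon is the gap between 1.0 and the next larger representable value, i.e. radix^(1 - p) for a precision of p bits. The following sketch computes that value exactly as a Fraction, assuming p = 237 (236 stored mantissa bits plus the hidden bit); whether OctaFloat epsilon currently answers exactly this value is not verified here.

```smalltalk
| p isoEpsilon |
p := 237.                                 "assumption: precision including the hidden bit"
isoEpsilon := 1 / (2 raisedTo: p - 1).    "exact Fraction 2^-236"

"as a double this is roughly 9.06e-72"
Transcript showCR: isoEpsilon asFloat printString.

"the value answered by the class, for comparison"
Transcript showCR: OctaFloat epsilon printString.
```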
-
exponentCharacter
-
return the character used to print between mantissa and exponent.
Also used by the scanner when reading numbers.
-
isSupported
-
-
numBitsInExponent
-
answer the number of bits in the exponent.
This is a 256bit octuple float, where 19 bits are available in the exponent:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
Usage example(s):
1.0 class numBitsInExponent -> 11
1.0 asShortFloat class numBitsInExponent -> 8
1.0 asLongFloat class numBitsInExponent -> 15
1.0 asQuadFloat class numBitsInExponent -> 15
1.0 asOctaFloat class numBitsInExponent -> 19
|
-
numBitsInMantissa
-
answer the number of bits in the mantissa (the significand).
The hidden bit is not counted here.
This is a 256bit octafloat,
where 236 bits are available in the mantissa:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
Usage example(s):
1.0 class numBitsInMantissa
1.0 asShortFloat class numBitsInMantissa
1.0 asLongFloat class numBitsInMantissa
1.0 asQuadFloat class numBitsInMantissa
1.0 asOctaFloat class numBitsInMantissa
|
-
radix
-
answer the radix of an OctaFloat's exponent.
This is an IEEE float, which is represented in binary (radix 2)
arithmetic
-
* aNumber
-
return the product of the receiver and the argument.
-
+ aNumber
-
return the sum of the receiver and the argument, aNumber
-
- aNumber
-
return the difference of the receiver and the argument, aNumber
-
/ aNumber
-
return the quotient of the receiver and the argument, aNumber
-
abs
-
return the absolute value of the receiver
reimplemented here for speed
Usage example(s):
1.0 asOctaFloat -> 1.0
1.0 asOctaFloat abs -> 1.0
-1.0 asOctaFloat abs -> 1.0
|
-
negated
-
return the receiver negated
Usage example(s):
1.0 asOctaFloat
1.0 asOctaFloat negated
-1.0 asOctaFloat negated
|
-
rem: aNumber
-
return the floating point remainder of the receiver and the argument, aNumber
coercing & converting
-
asFloat
-
return a Float (i.e. an IEEE double) with same value as the receiver.
Does NOT raise an error if the receiver exceeds the float range or is non-finite.
Returns infinity if the receiver exceeds the float range.
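A short sketch of the narrowing behaviour described above. It assumes that OctaFloat fmax (used in the #isFinite examples further down) answers the largest finite OctaFloat; the variable names are illustrative.

```smalltalk
| exact huge |
exact := 1.5 asOctaFloat asFloat.    "1.5 fits a double exactly, so nothing is lost"
huge := OctaFloat fmax asFloat.      "far beyond the double range"

Transcript showCR: (exact = 1.5) printString.
"per the comment above, this is expected to be an infinity rather than an error"
Transcript showCR: huge isInfinite printString.
```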
-
asOctaFloat
-
1.0 asOctaFloat asOctaFloat
-
generality
-
return the generality value - see ArithmeticValue>>retry:coercing:
comparing
-
< aNumber
-
return true, if the argument is greater than the receiver
-
= aNumber
-
return true, if the argument represents the same numeric value
as the receiver, false otherwise
-
hash
-
return a number for hashing; redefined, since floats compare
by numeric value (i.e. 3.0 = 3), therefore 3.0 hash must be the same
as 3 hash.
Usage example(s):
1.2345 hash
1.2345 asShortFloat hash
1.2345 asLongFloat hash
1.2345 asOctaFloat hash
1.0 hash
1.0 asShortFloat hash
1.0 asLongFloat hash
1.0 asOctaFloat hash
0.5 hash
0.5 asShortFloat hash
0.5 asLongFloat hash
0.5 asOctaFloat hash
0.25 hash
0.25 asShortFloat hash
0.25 asLongFloat hash
0.25 asOctaFloat hash
|
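The requirement stated in the #hash comment can be checked directly: values that compare equal must also hash equal, otherwise they cannot be found in Sets or used as Dictionary keys interchangeably. A minimal sketch, assuming equality and hashing against an Integer work as described:

```smalltalk
| three |
three := 3.0 asOctaFloat.

"equal by numeric value ..."
Transcript showCR: (three = 3) printString.
"... and therefore expected to hash the same"
Transcript showCR: (three hash = 3 hash) printString.
```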
double dispatching
-
differenceFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to subtract the receiver, self
-
equalFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to compare against the receiver, self
-
lessFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to compare against the receiver, self
-
productFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to multiply the receiver, self
-
quotientFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to divide by the receiver, self
-
sumFromOctaFloat: anOctaFloat
-
sent when anOctaFloat does not know how to add the receiver, self
mathematical functions
-
exp
-
return e raised to the power of the receiver
-
ln
-
return the natural logarithm of the receiver.
-
log
-
return log base 10 of the receiver.
Alias for log:10.
-
log2
-
return the base-2 logarithm (logarithmus dualis) of the receiver.
printing
-
printOn: aStream
-
self commonPrintOn:aStream
private accessing
-
basicAt: index
-
return an internal byte of the float.
The value returned here depends on byte order, float representation etc.
Therefore, this method should be used strictly private.
Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and returns
values as if this filler wasn't present.
-
basicAt: index put: value
-
set an internal byte of the float.
The value to be stored here depends on byte order, float representation etc.
Therefore, this method should be used strictly private.
Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and stores
values as if this filler wasn't present.
-
basicSize
-
return the size in bytes of the float.
Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and returns
values as if this filler wasn't present.
-
exponentSize: numBitsInExponent
-
I have a hard-coded exponentSize;
verify that instances are created correctly here
queries
-
eBias
-
Answer the exponent's bias;
that is the offset of the zero exponent when stored
Usage example(s):
1.0 asOctaFloat eBias -> 262143
1.0 asQuadFloat eBias -> 16383
1.0 eBias -> 1023
|
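For IEEE formats the bias follows from the exponent width k as 2^(k-1) - 1. The sketch below recomputes it from the class-side query numBitsInExponent (documented above as answering 19) and compares it with eBias; the variable names are illustrative.

```smalltalk
| k bias |
k := OctaFloat numBitsInExponent.    "19 according to the listing above"
bias := (2 raisedTo: k - 1) - 1.     "2^18 - 1 = 262143 when k = 19"

Transcript showCR: bias printString.
Transcript showCR: (bias = 1.0 asOctaFloat eBias) printString.
```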
-
emax
-
The largest exponent value allowed by instances like me.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
Usage example(s):
Float emax -> 1023
ShortFloat emax -> 127
LongFloat emax -> 16383
QuadFloat emax -> 16383
OctaFloat emax -> 262143
QDouble emax -> 1023
|
-
emin
-
The smallest exponent value allowed by (normalized) instances of this class.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
Usage example(s):
Float emin
OctaFloat emin
OctaFloat emax
|
-
exponent
-
extract a normalized float's (unbiased) exponent.
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to
0.5 .. 1.0 and the float's value is: mantissa * 2^exp
Usage example(s):
self eBias
QuadFloat eBias => 16383
OctaFloat eBias => 262143
1.0 exponent => 1
1.0 asOctaFloat exponent => 1
2.0 exponent => 2
2.0 asOctaFloat exponent => 2
3.0 exponent => 2
3.0 asOctaFloat exponent => 2
3.0 mantissa => 0.75
3.0 asOctaFloat mantissa => 0.75
3.0 mantissa * (2 raisedTo:3.0 exponent) => 3.0
3.0 asOctaFloat mantissa * (2 raisedTo:3.0 asOctaFloat exponent) => 3.0
4.0 exponent => 3
4.0 asOctaFloat exponent => 3
0.5 exponent => 0
0.5 asOctaFloat exponent => 0
0.4 exponent => -1
0.4 asOctaFloat exponent => -1
0.25 exponent => -1
0.25 asOctaFloat exponent => -1
0.2 exponent => -2
0.2 asOctaFloat exponent => -2
0.00000011111 exponent => -23
0.00000011111 asOctaFloat exponent => -23
0.0 exponent => 0
0.0 asOctaFloat exponent => 0
0.0 nextFloat => 4.94065645841247e-324
0.0 asOctaFloat nextFloat wrong!
0.0 nextFloat exponent => -1073
0.0 asOctaFloat nextFloat exponent => -262377
1e1000 exponent -> error (INF)
1Q1000 exponent -> 3322
OctaFloat fmax exponent -> 262144
OctaFloat fmin exponent -> -262141
OctaFloat NaN exponent -> error
OctaFloat infinity exponent -> error
|
-
exponentBits
-
return the bits of my exponent.
These might be biased.
Usage example(s):
1.0 exponentBits -> 1023
-1.0 exponentBits -> 1023
10.0 exponentBits -> 1026
0.125 exponentBits -> 1020
0.1 exponentBits -> 1019
1.0 asQuadFloat exponentBits -> 16383
-1.0 asQuadFloat exponentBits -> 16383
10.0 asQuadFloat exponentBits -> 16386
0.125 asQuadFloat exponentBits -> 16380
0.1 asQuadFloat exponentBits -> 16379
1.0 asOctaFloat exponentBits -> 262143
-1.0 asOctaFloat exponentBits -> 262143
10.0 asOctaFloat exponentBits -> 262146
0.125 asOctaFloat exponentBits -> 262140
0.1 asOctaFloat exponentBits -> 262139
|
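The values listed for #exponent, #eBias and #exponentBits suggest a simple relation for normalized numbers: because #exponent assumes a mantissa in 0.5 .. 1.0 while the stored format uses 1.0 .. 2.0, the stored bits equal eBias + exponent - 1. The block below is a sketch of that check; the relation is inferred from the examples, not taken from the implementation, and the exponent of 0.125 (-2) is extrapolated from the listed values.

```smalltalk
| check |
check := [:x | x exponentBits = (x eBias + x exponent - 1)].

"1.0:   262143 = 262143 + 1 - 1"
Transcript showCR: (check value: 1.0 asOctaFloat) printString.
"0.125: 262140 = 262143 + (-2) - 1"
Transcript showCR: (check value: 0.125 asOctaFloat) printString.
```

The relation is not meant to hold for zero, subnormals, NaN or infinities.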
-
isFinite
-
return true, if the receiver is a finite float (not NaN and not +/-INF)
Usage example(s):
1.0 asOctaFloat isFinite true
OctaFloat fmin isFinite true
OctaFloat fmax isFinite true
self NaN isFinite false
self infinity isFinite false
self negativeInfinity isFinite false
(0.0 uncheckedDivide: 0.0) isFinite false
(1.0 uncheckedDivide: 0.0) isFinite false
|
-
isInfinite
-
return true, if the receiver is an infinite float (+Inf or -Inf).
Usage example(s):
1.0 asOctaFloat isInfinite false
self NaN isInfinite false
self infinity isInfinite true
self negativeInfinity isInfinite true
(0.0 uncheckedDivide: 0.0) isInfinite false
(1.0 uncheckedDivide: 0.0) isInfinite true
|
-
isNaN
-
return true, if the receiver is an invalid float (NaN - not a number).
These are usually not created by ST/X float operations (they raise an exception);
however, inline C-code or proceeded exceptions or reading from a stream
could produce them.
Usage example(s):
OctaFloat NaN isNaN true
self NaN isNaN true
self infinity isNaN false
self negativeInfinity isNaN false
(0.0 uncheckedDivide: 0.0) isNaN true
(1.0 uncheckedDivide: 0.0) isNaN false
|
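The three predicates #isNaN, #isInfinite and #isFinite partition all OctaFloats. A hypothetical helper block (not part of the class) that classifies a value could look like this:

```smalltalk
| classify |
classify := [:x |
    x isNaN
        ifTrue: ['NaN']
        ifFalse: [x isInfinite ifTrue: ['infinite'] ifFalse: ['finite']]].

Transcript showCR: (classify value: OctaFloat NaN).
Transcript showCR: (classify value: OctaFloat infinity).
Transcript showCR: (classify value: 1.0 asOctaFloat).
```

Testing #isNaN first matters, because a NaN is neither finite nor infinite according to the tables above.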
-
isZero
-
return true, if the receiver is zero
Usage example(s):
0 asOctaFloat isZero
0 asOctaFloat negated isZero
1 asOctaFloat isZero
|
-
mantissa
-
extract a normalized float's mantissa (as OctaFloat).
That is a float of the same type as the receiver,
such that:
(f mantissa) * (2 ^ f exponent) = f
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to 0.5 .. 1.0
Usage example(s):
1.0 exponent -> 1
1.0 asOctaFloat exponent -> 1
1.0 mantissa -> 0.5
1.0 asOctaFloat mantissa
0.25 exponent
0.25 asOctaFloat exponent
0.25 mantissa
0.25 asOctaFloat mantissa
0.00000011111 exponent
0.00000011111 mantissa
1e1000 mantissa
|
testing
-
isFloat256
-
Answer whether the receiver is a 256bit octuple precision float.
Always true here.
-
isOctaFloat
-
return true, if the receiver is some kind of octuple floating point number (IEEE octuple precision)
trigonometric
-
cos
-
return the cosine of the receiver (interpreted as radians)
-
sin
-
return the sine of the receiver (interpreted as radians)
-
tan
-
return the tangent of the receiver (interpreted as radians)
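Since the class notice warns that rounding problems affect the series approximations, it may be worth gauging the error before relying on these functions. One possible sketch uses the identity sin^2 + cos^2 = 1 and prints the residual next to the class epsilon; no particular result is claimed here.

```smalltalk
| x s c residual |
x := 0.5 asOctaFloat.
s := x sin.
c := x cos.

"distance of sin^2 + cos^2 from 1; ideally a small multiple of epsilon"
residual := ((s * s) + (c * c) - OctaFloat unity) abs.

Transcript showCR: residual printString.
Transcript showCR: OctaFloat epsilon printString.
```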
trigonometric - hyperbolic
-
cosh
-
return the hyperbolic cosine of the receiver (interpreted as radians)
-
sinh
-
return the hyperbolic sine of the receiver (interpreted as radians)
-
tanh
-
return the hyperbolic tangent of the receiver (interpreted as radians)
truncation & rounding
-
ceiling
-
return the smallest integer which is greater than or equal to the receiver.
Usage example(s):
0.5 asOctaFloat ceiling
0.5 asOctaFloat ceilingAsFloat
-0.5 asOctaFloat ceiling
-0.5 asOctaFloat ceilingAsFloat
|
-
ceilingAsFloat
-
return the smallest integer-valued float greater than or equal to the receiver.
This is much like #ceiling, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another float-operation.
-
floor
-
return the integer nearest the receiver towards negative infinity.
Usage example(s):
0.5 asOctaFloat floor
0.5 asOctaFloat floorAsFloat
-0.5 asOctaFloat floor
-0.5 asOctaFloat floorAsFloat
|
-
floorAsFloat
-
return the integer nearest the receiver towards negative infinity as a float.
This is much like #floor, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another float-operation.
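A sketch contrasting the four selectors on a negative value, where floor and ceiling differ from plain truncation. The comments give the mathematical definitions; the AsFloat variants are expected to answer the same values without converting to an Integer.

```smalltalk
| x |
x := -0.5 asOctaFloat.

"largest integer not above -0.5, i.e. -1"
Transcript showCR: x floor printString.
"smallest integer not below -0.5, i.e. 0"
Transcript showCR: x ceiling printString.

"same values, but kept as floats for further float arithmetic"
Transcript showCR: x floorAsFloat printString.
Transcript showCR: x ceilingAsFloat printString.
```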
|