Documentation of class 'OctaFloat':

Class: OctaFloat


Inheritance:

   Object
   |
   +--Magnitude
      |
      +--ArithmeticValue
         |
         +--Number
            |
            +--LimitedPrecisionReal
               |
               +--AbstractIEEEFloat
                  |
                  +--OctaFloat

Package:
stx:libbasic2
Category:
Magnitude-Numbers
Version:
rev: 1.70 date: 2024/01/15 08:48:29
user: cg
file: OctaFloat.st directory: libbasic2
module: stx stc-classLibrary: libbasic2

Description:


Notice:
    Unfinished, ongoing work.
    Basic arithmetic should work, but rounding is not yet correct;
    this affects some of the series approximations (e.g. the trigonometric functions).
    Therefore, for the time being:
        Please only use them if you need to represent 256bit floats to be exchanged
        with the external world (i.e. to get the bit representation).

        If you need more precision than IEEE double precision, either use QDoubles,
        which are faster and provide almost the same precision as OctaFloats,
        or use LargeFloats, which provide arbitrary precision.

OctaFloats represent rational numbers with limited precision
and are mapped to IEEE octuple precision format (256bit),
also called binary256.

Notice that OctaFloat arithmetic is emulated in software, which is much slower than hardware floating point.
Thus, only use them if you really need the additional precision;
if not, use Floats (which are doubles) or LongFloats, which usually have IEEE extended precision (80bit).

OctaFloats give you definite 256 bit octuple floats;
thus, code using OctaFloats is guaranteed to be portable from one architecture to another.

Representation:
        256bit octuple IEEE floats (32bytes);
        237 bit mantissa (236 stored bits plus the hidden bit),
        19 bit exponent,
        71 decimal digits (approx.)
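
The figures in the representation list follow from the field widths alone. As a minimal sketch outside Smalltalk (Python, with constant names chosen here for illustration and the 1/19/236 bit split taken from the numBitsInExponent and numBitsInMantissa comments):

```python
import math

SIGN_BITS = 1
EXPONENT_BITS = 19
STORED_MANTISSA_BITS = 236                      # explicit bits only
PRECISION_BITS = STORED_MANTISSA_BITS + 1       # the hidden bit adds one more

total_bits = SIGN_BITS + EXPONENT_BITS + STORED_MANTISSA_BITS
exponent_bias = 2 ** (EXPONENT_BITS - 1) - 1
decimal_digits = math.floor(PRECISION_BITS * math.log10(2))

print(total_bits, exponent_bias, decimal_digits)   # prints: 256 262143 71
```

The bias computed this way agrees with the eBias value listed for OctaFloat.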

Mixed mode arithmetic:
    octaFloat op anyFloat    -> octaFloat
    anyFloat op octaFloat    -> octaFloat

Range and precision of storage formats: see LimitedPrecisionReal >> documentation

[aliases:]
    Float256

copyright

COPYRIGHT (c) 2018 by eXept Software AG All Rights Reserved This software is furnished under a license and may be used only in accordance with the terms of that license and with the inclusion of the above copyright notice. This software may not be provided or otherwise made available to, or used by, any other person. No title to or ownership of the software is hereby transferred.

Class protocol:

class initialization
o  initialize
an alias

coercing & converting
o  coerce: aNumber
convert the argument aNumber into an instance of the receiver (class) and return it.

o  generality
return the generality value - see ArithmeticValue>>retry:coercing:

constants
o  NaN
return an octaFloat which represents not-a-Number (i.e. an invalid number)

Usage example(s):

     NaN := nil.
     self NaN

o  e
return the constant e as octaFloat

eDigits has enough digits for 256bit IEEE octuple floats.
Do not use as a literal constant here - we cannot depend on the underlying C-compiler here...

Usage example(s):

     E := nil.
     OctaFloat e

o  halfPi
return the constant pi/2 as octaFloat

halfPiDigits has enough digits for 256bit IEEE octuple floats.

o  infinity
return an octaFloat which represents +INF

Usage example(s):

     PositiveInfinity := nil.
     self infinity

o  ln10
return the constant ln(10) (the natural logarithm of 10) as an octaFloat.

ln10Digits has enough digits for 256bit IEEE octuple floats.

Usage example(s):

     Ln10 := nil.
     OctaFloat ln10

o  ln2
return the constant ln(2) as octaFloat

ln2Digits has enough digits for 256bit IEEE octuple floats.

Usage example(s):

     Ln2 := nil.
     self ln2

o  negativeInfinity
return an octaFloat which represents -INF

Usage example(s):

     NegativeInfinity := nil.
     self negativeInfinity

o  phi
return the constant phi as octaFloat

phiDigits has enough digits for 256bit IEEE octuple floats.

Usage example(s):

     Phi := nil.
     self phi

o  pi
return the constant pi as octaFloat

piDigits has enough digits for 256bit IEEE octuple floats.
Do not use as a literal constant here - we cannot depend on the underlying C-compiler here...

Usage example(s):

     Pi := nil.
     self pi

o  sqrt2
return the constant sqrt(2) as OctaFloat

sqrt2Digits has enough digits for 256bit IEEE octuple floats.

Usage example(s):

     LongFloat sqrt2 -> 1.414213562373095049
     QuadFloat sqrt2 -> 1.4142135623730936799802295816596154
     OctaFloat sqrt2 -> 1.41421356237309504880168872420969807856967187537694807317667973799073249

o  sqrt3
return the constant sqrt(3) as OctaFloat

sqrt3Digits has enough digits for 256bit IEEE octuple floats.

Usage example(s):

     LongFloat sqrt3 -> 1.732050807568877294
     QDouble sqrt3   -> 1.73205080756888
     QuadFloat sqrt3 -> 1.7320508075688772935274463415058723
     OctaFloat sqrt3 -> 1.73205080756887729352744634150587236694280525381038062805580697945193301

o  unity
return the neutral element for multiplication (1.0) as OctaFloat

Usage example(s):

     OctaFloatOne := nil.
     self unity

o  zero
return the neutral element for addition (0.0) as OctaFloat

Usage example(s):

     OctaFloatZero := nil.
     self zero

error reporting
o  errorUnsupported
you may proceed from this error, to get a long float number result
(of course, with less than expected precision)

instance creation
o  basicNew
return a new octaFloat - here we return 0.0
- OctaFloats are usually NOT created this way ...
It's implemented here to allow things like binary store & load of octaFloats.
(but it is not a good idea to store the bits of a float - the reader might have a
totally different representation - so floats should be
binary stored in a device independent format).

o  basicNew: size
(comment from inherited method)
return an instance of myself with anInteger indexed variables.
If the receiver-class has no indexed instvars, this is only allowed
if the argument, anInteger is zero.
** Do not redefine this method in any class **

o  fromFloat: aFloat
return a new octaFloat, given a float value

Usage example(s):

     OctaFloat fromFloat:123.0
     123.0 asOctaFloat
     123 asOctaFloat

o  fromInteger: anInteger
return a new octaFloat, given an integer value

Usage example(s):

     self fromInteger:1
     self fromInteger:-1
     self fromInteger:2
     self fromInteger:1024 * 1024 * 1024 * 1024 * 1024 * 1024
     self fromInteger:1e20 asInteger
     self fromInteger:1e100 asInteger
     self fromInteger:2r1010101010101010101010101010101
     self fromInteger:2r1010101010101010101010101010101010101010101010101010101010101010
     self fromInteger:(2 raisedTo:10000)
     1 asIEEEFloat

Usage example(s):

     OctaFloat fromInteger:123
     123 asOctaFloat
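
For readers who need the bit representation outside Smalltalk, the following Python sketch shows how an integer maps onto the binary256 fields. It is an illustration only, not the class's implementation: `encode_integer` is a hypothetical helper, it assumes the 1/19/236 field split, and it refuses integers that would need rounding.

```python
EXPONENT_BITS = 19
MANTISSA_BITS = 236
BIAS = 2 ** (EXPONENT_BITS - 1) - 1

def encode_integer(n):
    """Return the 256-bit pattern (as an int) of an exactly representable integer."""
    if n == 0:
        return 0
    sign = 1 if n < 0 else 0
    magnitude = abs(n)
    exponent = magnitude.bit_length() - 1        # position of the leading 1 bit
    if exponent > BIAS:
        raise OverflowError("exceeds the binary256 exponent range")
    shift = MANTISSA_BITS - exponent
    if shift >= 0:
        significand = magnitude << shift
    else:
        if magnitude & ((1 << -shift) - 1):
            raise ValueError("not exactly representable; rounding is out of scope here")
        significand = magnitude >> -shift
    mantissa = significand & ((1 << MANTISSA_BITS) - 1)   # drop the hidden leading bit
    return (sign << 255) | ((exponent + BIAS) << MANTISSA_BITS) | mantissa

# Big-endian byte image of 1, as it might be exchanged with the external world.
print(encode_integer(1).to_bytes(32, "big").hex()[:5])    # prints: 3ffff
```

Large powers of two such as 2 raisedTo:10000 are handled because only the leading bit is significant.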

o  fromLongFloat: aLongFloat
return a new octaFloat, given a long float value

Usage example(s):

     OctaFloat fromLongFloat:123.0 asLongFloat

o  fromShortFloat: aShortFloat
return a new octaFloat, given a float32 value

Usage example(s):

     OctaFloat fromShortFloat:123.0 asShortFloat

o  new: size
(comment from inherited method)
catch this message - not allowed for floats/doubles

queries
o  defaultExponentSizeForByteSize: nBytes
(comment from inherited method)
     self defaultExponentSizeForByteSize:2   -> 5
     self defaultExponentSizeForByteSize:4   -> 8
     self defaultExponentSizeForByteSize:5   -> 8
     self defaultExponentSizeForByteSize:8   -> 11
     self defaultExponentSizeForByteSize:10  -> 15
     self defaultExponentSizeForByteSize:16  -> 15
     self defaultExponentSizeForByteSize:32  -> 19
     self defaultExponentSizeForByteSize:64  -> 32

o  defaultPrintPrecision
the default number of digits when printing

o  defaultPrintfPrecision
the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.

o  epsilon
return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
according to ISO C standard;
Ada, C, C++ and Python language constants;
Mathematica, MATLAB and Octave; and various textbooks
see https://en.wikipedia.org/wiki/Machine_epsilon

Usage example(s):

     self epsilon
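
Under the definition referenced above (spacing between 1.0 and the next larger representable value), epsilon for a binary format with p significand bits is 2^-(p-1). A small Python sketch, assuming p = 237 for the octuple format; whether `OctaFloat epsilon` answers exactly this value has not been verified here:

```python
import sys
from fractions import Fraction

def machine_epsilon(precision_bits):
    """Gap between 1.0 and the next larger value for a binary format whose
    significand has `precision_bits` bits (hidden bit included)."""
    return Fraction(1, 2 ** (precision_bits - 1))

double_eps = machine_epsilon(53)     # IEEE double
octuple_eps = machine_epsilon(237)   # IEEE octuple (assumed precision)

print(float(double_eps) == sys.float_info.epsilon)   # prints: True
print(octuple_eps == Fraction(1, 2 ** 236))          # prints: True
```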

o  exponentCharacter
return the character used to print between mantissa and exponent.
Also used by the scanner when reading numbers.

o  isSupported

o  numBitsInExponent
answer the number of bits in the exponent.
This is a 256bit octuple float, where 19 bits are available in the exponent:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...

Usage example(s):

     1.0 class numBitsInExponent -> 11
     1.0 asShortFloat class numBitsInExponent -> 8
     1.0 asLongFloat class numBitsInExponent -> 15
     1.0 asQuadFloat class numBitsInExponent -> 15
     1.0 asOctaFloat class numBitsInExponent -> 19

o  numBitsInMantissa
answer the number of bits in the mantissa (the significand).
The hidden bit is not counted here.
This is a 256bit octafloat,
where 236 bits are available in the mantissa:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...

Usage example(s):

     1.0 class numBitsInMantissa
     1.0 asShortFloat class numBitsInMantissa
     1.0 asLongFloat class numBitsInMantissa
     1.0 asQuadFloat class numBitsInMantissa
     1.0 asOctaFloat class numBitsInMantissa
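
The bit layout shown above (1 sign bit, 19 exponent bits, 236 mantissa bits) can be taken apart as follows. This is a Python sketch for normal numbers only; the helper names are hypothetical and exact rational arithmetic stands in for the float value:

```python
from fractions import Fraction

EXPONENT_BITS = 19
MANTISSA_BITS = 236
BIAS = 2 ** (EXPONENT_BITS - 1) - 1

def split_fields(bits):
    """Split a 256-bit pattern into (sign, biased exponent, stored mantissa)."""
    sign = bits >> (EXPONENT_BITS + MANTISSA_BITS)
    exponent = (bits >> MANTISSA_BITS) & ((1 << EXPONENT_BITS) - 1)
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    return sign, exponent, mantissa

def decode_normal(bits):
    """Exact value of a normal binary256 pattern as a Fraction."""
    sign, exponent, mantissa = split_fields(bits)
    if exponent == 0 or exponent == (1 << EXPONENT_BITS) - 1:
        raise ValueError("zero, subnormal, infinity or NaN: not handled in this sketch")
    significand = 1 + Fraction(mantissa, 1 << MANTISSA_BITS)   # restore the hidden bit
    value = significand * Fraction(2) ** (exponent - BIAS)
    return -value if sign else value

one = BIAS << MANTISSA_BITS
ten = ((BIAS + 3) << MANTISSA_BITS) | (1 << (MANTISSA_BITS - 2))   # 1.25 * 2**3

print(decode_normal(one), decode_normal(ten))   # prints: 1 10
```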

o  radix
answer the radix of an OctaFloat's exponent.
This is an IEEE float, which is represented in binary (radix 2).


Instance protocol:

arithmetic
o  * aNumber
return the product of the receiver and the argument.

o  + aNumber
return the sum of the receiver and the argument, aNumber

o  - aNumber
return the difference of the receiver and the argument, aNumber

o  / aNumber
return the quotient of the receiver and the argument, aNumber

o  abs
return the absolute value of the receiver
reimplemented here for speed

Usage example(s):

     1.0 asOctaFloat       -> 1.0
     1.0 asOctaFloat abs   -> 1.0
     -1.0 asOctaFloat abs  -> 1.0

o  negated
return the receiver negated

Usage example(s):

     1.0 asOctaFloat
     1.0 asOctaFloat negated
     -1.0 asOctaFloat negated

o  rem: aNumber
return the floating point remainder of the receiver and the argument, aNumber
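
By the usual Smalltalk convention, rem: is the remainder of a division truncated toward zero, so its sign follows the receiver, whereas \\ uses floored division. That this OctaFloat implementation follows the convention exactly is an assumption here. The difference, sketched in Python with ordinary doubles:

```python
import math

truncated = math.fmod(-7.5, 2.0)   # truncating remainder: sign follows the dividend
floored = -7.5 % 2.0               # floored modulo: sign follows the divisor

print(truncated, floored)          # prints: -1.5 0.5
```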

coercing & converting
o  asFloat
return a Float (i.e. an IEEE double) with same value as the receiver.
Does NOT raise an error if the receiver exceeds the float range or is non-finite.
Returns infinity if the receiver exceeds the float range.

Usage example(s):

     1.0 asOctaFloat asFloat

o  asOctaFloat
1.0 asOctaFloat asOctaFloat

o  generality
return the generality value - see ArithmeticValue>>retry:coercing:

comparing
o  < aNumber
return true, if the argument is greater

o  = aNumber
return true, if the argument represents the same numeric value
as the receiver, false otherwise

o  hash
return a number for hashing; redefined, since floats compare
by numeric value (i.e. 3.0 = 3), therefore 3.0 hash must be the same
as 3 hash.

Usage example(s):

     1.2345 hash
     1.2345 asShortFloat hash
     1.2345 asLongFloat hash
     1.2345 asOctaFloat hash

     1.0 hash
     1.0 asShortFloat hash
     1.0 asLongFloat hash
     1.0 asOctaFloat hash

     0.5 asShortFloat hash
     0.5 asShortFloat hash
     0.5 asLongFloat hash
     0.5 asOctaFloat hash

     0.25 asShortFloat hash
     0.25 asShortFloat hash
     0.25 asLongFloat hash
     0.25 asOctaFloat hash
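
The requirement that numerically equal values hash alike is not specific to Smalltalk. Python's numeric types follow the same rule, which makes for a compact illustration (this is an analogy, not the OctaFloat hash algorithm):

```python
from fractions import Fraction

values = [3, 3.0, Fraction(3, 1)]          # three representations of the same number
hashes = {hash(v) for v in values}

lookup = {3: "three"}
found = lookup[3.0]                        # works only because 3.0 == 3 and the hashes agree

print(len(hashes), found)                  # prints: 1 three
```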

double dispatching
o  differenceFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to subtract the receiver, self

o  equalFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to compare against the receiver, self

o  lessFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to compare against the receiver, self

o  productFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to multiply the receiver, self

o  quotientFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to divide by the receiver, self

o  sumFromOctaFloat: anOctaFloat
sent when anOctaFloat does not know how to add the receiver, self
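
The double-dispatching selectors above work together with generality: when the operand classes differ, the argument is asked to finish the operation, and the less general operand is coerced to the more general class. A simplified Python sketch of that pattern follows; the class names, method names and generality numbers are hypothetical and are not taken from the ST/X sources:

```python
class Num:
    generality = 0

    def __init__(self, value):
        self.value = value

    @classmethod
    def coerce(cls, other):
        """Convert another number into an instance of this class."""
        return cls(other.value)

    def __add__(self, other):
        if type(other) is type(self):
            return type(self)(self.value + other.value)   # same class: add directly
        return other.sum_from(self)                       # second dispatch

    def sum_from(self, first):
        # Reached as the argument of `first + self`: coerce the less general
        # operand and retry. Distinct classes need distinct generality values.
        if first.generality >= self.generality:
            return first + type(first).coerce(self)
        return type(self).coerce(first) + self

class NarrowFloat(Num):
    generality = 80      # illustrative value

class WideFloat(Num):
    generality = 95      # illustrative value

result = NarrowFloat(1.5) + WideFloat(2.0)
print(type(result).__name__, result.value)   # prints: WideFloat 3.5
```

Either operand order yields the more general class, which mirrors the mixed mode rule "anyFloat op octaFloat -> octaFloat".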

mathematical functions
o  exp
return e raised to the power of the receiver

o  ln
return the natural logarithm of the receiver.

o  log
return log base 10 of the receiver.
Alias for log:10.

o  log2
return the base-2 logarithm (logarithmus dualis) of the receiver.
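
Both log and log2 can be expressed through the natural logarithm and the ln10 and ln2 constants. Whether the class computes them this way is not stated; the identity itself is sketched here in Python with doubles:

```python
import math

def log10_via_ln(x):
    """log base 10 from the natural logarithm: ln(x) / ln(10)."""
    return math.log(x) / math.log(10.0)

def log2_via_ln(x):
    """log base 2 from the natural logarithm: ln(x) / ln(2)."""
    return math.log(x) / math.log(2.0)

print(math.isclose(log10_via_ln(1000.0), 3.0))   # prints: True
print(math.isclose(log2_via_ln(1024.0), 10.0))   # prints: True
```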

printing
o  printOn: aStream
self commonPrintOn:aStream

private accessing
o  basicAt: index
return an internal byte of the float.
The value returned here depends on byte order, float representation etc.
Therefore, this method should be used strictly private.

Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and returns
values as if this filler wasn't present.

o  basicAt: index put: value
set an internal byte of the float.
The value to be stored here depends on byte order, float representation etc.
Therefore, this method should be used strictly private.

Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and returns
values as if this filler wasn't present.

o  basicSize
return the size in bytes of the float.

Notice:
the need to redefine this method here is due to the
inability of many machines to store floats in non-double aligned memory.
Therefore, on some machines, the first 4 bytes of a float are left unused,
and the actual float is stored at index 5 .. 12.
To hide this at one place, this method knows about that, and returns
values as if this filler wasn't present.

o  exponentSize: numBitsInExponent
I have a hard-coded exponentSize;
verify that instances are created correctly here

queries
o  eBias
Answer the exponent's bias;
that is the offset of the zero exponent when stored

Usage example(s):

     1.0 asOctaFloat eBias  -> 262143
     1.0 asQuadFloat eBias  -> 16383
     1.0 eBias              -> 1023

o  emax
The largest exponent value allowed by instances like me.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

Usage example(s):

     Float emax       -> 1023
     ShortFloat emax  -> 127
     LongFloat emax   -> 16383
     QuadFloat emax   -> 16383
     OctaFloat emax   -> 262143
     QDouble emax     -> 1023

o  emin
The smallest exponent value allowed by (normalized) instances of this class.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

Usage example(s):

     Float emin
     OctaFloat emin
     OctaFloat emax

o  exponent
extract a normalized float's (unbiased) exponent.
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to
0.5 .. 1.0 and the float's value is: mantissa * 2^exp

Usage example(s):

self eBias
    QuadFloat eBias => 16383
    OctaFloat eBias => 262143
     1.0 exponent                    => 1
     1.0 asOctaFloat exponent        => 1
     2.0 exponent                    => 2
     2.0 asOctaFloat exponent        => 2

     3.0 exponent                    => 2
     3.0 asOctaFloat exponent        => 2
     3.0 mantissa                    => 0.75
     3.0 asOctaFloat mantissa        => 0.75
     3.0 mantissa * (2 raisedTo:3.0 exponent) => 3.0
     3.0 asOctaFloat mantissa * (2 raisedTo:3.0 asOctaFloat exponent) => 3.0
     
     4.0 exponent                    => 3
     4.0 asOctaFloat exponent        => 3
     0.5 exponent                    => 0
     0.5 asOctaFloat exponent        => 0
     0.4 exponent                    => -1
     0.4 asOctaFloat exponent        => -1
     0.25 exponent                   => -1
     0.25 asOctaFloat exponent       => -1
     0.2 exponent                    => -2
     0.2 asOctaFloat exponent        => -2
     0.00000011111 exponent          => -23
     0.00000011111 asOctaFloat exponent => -23
     0.0 exponent                    => 0
     0.0 asOctaFloat exponent        => 0

     0.0 nextFloat               => 4.94065645841247e-324
     0.0 asOctaFloat nextFloat   wrong!
     0.0 nextFloat exponent              => -1073
     0.0 asOctaFloat nextFloat exponent  => -262377

     1e1000 exponent                -> error (INF)
     1Q1000 exponent                -> 3322
     OctaFloat fmax exponent        -> 262144
     OctaFloat fmin exponent        -> -262141
     OctaFloat NaN exponent         -> error
     OctaFloat infinity exponent    -> error
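
The normalization used by exponent and mantissa (mantissa in 0.5 .. 1.0, value = mantissa * 2^exponent) is the same one used by C's frexp. For doubles, the listed values can be cross-checked in Python, as a sketch:

```python
import math

for x in (1.0, 2.0, 3.0, 4.0, 0.5, 0.25, 0.0):
    mantissa, exponent = math.frexp(x)     # x == mantissa * 2**exponent
    print(x, mantissa, exponent)

# Reassembling the value from its parts:
print(math.ldexp(0.75, 2))                 # prints: 3.0
```

The smallest subnormal double, 5e-324, yields an exponent of -1073 under this convention, matching the `0.0 nextFloat exponent` entry.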

o  exponentBits
return the bits of my exponent.
These might be biased.

Usage example(s):

     1.0 exponentBits  -> 1023
     -1.0 exponentBits -> 1023
     10.0 exponentBits  -> 1026
     0.125 exponentBits -> 1020
     0.1 exponentBits   -> 1019

     1.0 asQuadFloat exponentBits  -> 16383
     -1.0 asQuadFloat exponentBits -> 16383
     10.0 asQuadFloat exponentBits  -> 16386
     0.125 asQuadFloat exponentBits -> 16380
     0.1 asQuadFloat exponentBits   -> 16379

     1.0 asOctaFloat exponentBits  -> 262143
     -1.0 asOctaFloat exponentBits -> 262143
     10.0 asOctaFloat exponentBits  -> 262146
     0.125 asOctaFloat exponentBits -> 262140
     0.1 asOctaFloat exponentBits   -> 262139
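
The values listed above are the unbiased exponent plus the bias of the respective format. A Python sketch that reproduces them for normal numbers; `exponent_bits` is a hypothetical helper that takes the width of the exponent field as a parameter:

```python
import math

def exponent_bits(x, exponent_field_width):
    """Biased exponent a normal value x would carry in an IEEE binary format
    with the given exponent field width."""
    bias = 2 ** (exponent_field_width - 1) - 1
    _, e = math.frexp(x)       # x = m * 2**e with 0.5 <= |m| < 1
    return (e - 1) + bias      # IEEE normalizes to 1.m, one exponent step lower

for width in (11, 15, 19):     # double, quad, octuple
    print([exponent_bits(x, width) for x in (1.0, -1.0, 10.0, 0.125, 0.1)])
```

For width 19 this prints [262143, 262143, 262146, 262140, 262139], the same values as the OctaFloat entries.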

o  isFinite
return true, if the receiver is a finite float (not NaN and not +/-INF)

Usage example(s):

     1.0 asOctaFloat isFinite            true
     OctaFloat fmin isFinite             true
     OctaFloat fmax isFinite             true
     self NaN isFinite                   false
     self infinity isFinite              false
     self negativeInfinity isFinite      false
     (0.0 uncheckedDivide: 0.0) isFinite false
     (1.0 uncheckedDivide: 0.0) isFinite false

o  isInfinite
return true, if the receiver is an infinite float (+Inf or -Inf).

Usage example(s):

     1.0 asOctaFloat isInfinite            false
     self NaN isInfinite                   false
     self infinity isInfinite              true
     self negativeInfinity isInfinite      true
     (0.0 uncheckedDivide: 0.0) isInfinite false
     (1.0 uncheckedDivide: 0.0) isInfinite true

o  isNaN
return true, if the receiver is an invalid float (NaN - not a number).
These are usually not created by ST/X float operations (they raise an exception);
however, inline C-code or proceeded exceptions or reading from a stream
could produce them.

Usage example(s):

     OctaFloat NaN isNaN              true
     self NaN isNaN                   true
     self infinity isNaN              false
     self negativeInfinity isNaN      false
     (0.0 uncheckedDivide: 0.0) isNaN true
     (1.0 uncheckedDivide: 0.0) isNaN false
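
In IEEE formats, isFinite, isInfinite and isNaN can all be decided from the exponent and mantissa fields: an all-ones exponent marks infinity (mantissa zero) or NaN (mantissa non-zero). A Python sketch over a 256-bit pattern, assuming the 1/19/236 layout; `classify` is a hypothetical helper, not part of the class:

```python
EXPONENT_BITS = 19
MANTISSA_BITS = 236
ALL_ONES = (1 << EXPONENT_BITS) - 1

def classify(bits):
    """Classify a binary256 bit pattern given as an int."""
    exponent = (bits >> MANTISSA_BITS) & ALL_ONES
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    if exponent == ALL_ONES:
        return "infinite" if mantissa == 0 else "nan"
    if exponent == 0:
        return "zero" if mantissa == 0 else "subnormal"
    return "normal"

positive_infinity = ALL_ONES << MANTISSA_BITS
negative_infinity = positive_infinity | (1 << 255)
some_nan = positive_infinity | 1

print(classify(positive_infinity), classify(negative_infinity), classify(some_nan))
# prints: infinite infinite nan
```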

o  isZero
return true, if the receiver is zero

Usage example(s):

     0 asOctaFloat isZero
     0 asOctaFloat negated isZero
     1 asOctaFloat isZero

o  mantissa
extract a normalized float's mantissa (as OctaFloat).
That is a float of the same type as the receiver,
such that:
(f mantissa) * (2 ^ f exponent) = f
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to 0.5 .. 1.0

Usage example(s):

     1.0 exponent              -> 1
     1.0 asOctaFloat exponent  -> 1
     1.0 mantissa              -> 0.5
     1.0 asOctaFloat mantissa

     0.25 exponent
     0.25 asOctaFloat exponent
     0.25 mantissa
     0.25 asOctaFloat mantissa

     0.00000011111 exponent
     0.00000011111 mantissa

     1e1000 mantissa

testing
o  isFloat256
Answer whether the receiver is a 256bit octuple precision float.
Always true here.

o  isOctaFloat
return true, if the receiver is some kind of octuple floating point number (IEEE octuple precision)

trigonometric
o  cos
return the cosine of the receiver (interpreted as radians)

o  sin
return the sine of the receiver (interpreted as radians)

o  tan
return the tangent of the receiver (interpreted as radians)

trigonometric - hyperbolic
o  cosh
return the hyperbolic cosine of the receiver (interpreted as radians)

o  sinh
return the hyperbolic sine of the receiver (interpreted as radians)

o  tanh
return the hyperbolic tangent of the receiver (interpreted as radians)

truncation & rounding
o  ceiling
return the smallest integer which is greater or equal to the receiver.

Usage example(s):

     0.5 asOctaFloat ceiling
     0.5 asOctaFloat ceilingAsFloat
     -0.5 asOctaFloat ceiling
     -0.5 asOctaFloat ceilingAsFloat

o  ceilingAsFloat
return the smallest integer-valued float greater or equal to the receiver.
This is much like #ceiling, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another float-operation.

o  floor
return the integer nearest the receiver towards negative infinity.

Usage example(s):

     0.5 asOctaFloat floor
     0.5 asOctaFloat floorAsFloat
     -0.5 asOctaFloat floor
     -0.5 asOctaFloat floorAsFloat

o  floorAsFloat
return the integer nearest the receiver towards negative infinity as a float.
This is much like #floor, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another float-operation.
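
The difference between the integer-returning and float-returning variants can be illustrated with doubles in Python. The helper names below are hypothetical stand-ins for floorAsFloat and ceilingAsFloat; they stay in floating point throughout:

```python
import math

def floor_as_float(x):
    """Largest integer-valued float not greater than x."""
    return x // 1.0

def ceiling_as_float(x):
    """Smallest integer-valued float not less than x."""
    return -((-x) // 1.0)

print(math.floor(-0.5), math.ceil(0.5))              # prints: -1 1
print(floor_as_float(-0.5), ceiling_as_float(0.5))   # prints: -1.0 1.0
```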



ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Sun, 22 Dec 2024 02:47:43 GMT