Smalltalk/X Webserver

Documentation of class 'LimitedPrecisionReal':

Class: LimitedPrecisionReal

Inheritance
Description
Class protocol
Instance protocol

Inheritance:

   Object
   |
   +--Magnitude
      |
      +--ArithmeticValue
         |
         +--Number
            |
            +--LimitedPrecisionReal
               |
               +--AbstractIEEEFloat
               |
               +--Float
               |
               +--HalfFloat
               |
               +--LargeFloat
               |
               +--LongFloat
               |
               +--QDouble
               |
               +--RaisedNumber
               |
               +--ShortFloat

Package: stx:libbasic

Category: Magnitude-Numbers

Version: rev: 1.238 date: 2024/01/15 08:48:59; user: cg; file: LimitedPrecisionReal.st directory: libbasic; module: stx stc-classLibrary: libbasic

Description:

Abstract superclass for limited precision floating point numbers of any precision (e.g. IEEE floats and doubles).

Short summary for beginners (find details in wikipedia):
========================================================

Floating point numbers are represented with a sign,
a mantissa and an exponent, and the number's magnitude is: 
    mantissa * (2 raisedTo: exponent) 
with (1 > mantissa >= 0.5) for non-zero numbers, and the exponent adjusted as required for the mantissa to be in that range
(so called 'normalized')

therefore,
    13 asFloat mantissa -> 0.8125
    13 asFloat exponent ->  4  
    0.8125 * (2 raisedTo:4) -> 13

and:    
    104 asFloat mantissa -> 0.8125
    104 asFloat exponent -> 7  
    0.8125 * (2 raisedTo:7) -> 104

and:    
    0.1 mantissa -> 0.8
    0.1 exponent -> -3  
    0.8 * (2 raisedTo:-3) -> 0.1

however:    
    (1 / 3.0) mantissa -> 0.666666666666667
    (1 / 3.0) exponent -> -1  
    0.666666666666667 * (2 raisedTo:-1) -> 0.333333333333333


Danger in using Floats:
=======================

Beginners seem to forget (or never learn?) that floating point numbers 
are always APPROXIMATIONs of some value.
You should never use them when exact results are needed (e.g. when computing money!)
Take a look at the ScaledDecimal and FixedDecimal classes for that.
See also 'Float comparison' below.
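
A minimal sketch of the problem, using only plain arithmetic (the Fraction
expressions are exact, the Float expressions are not):

    (0.1 + 0.2) = 0.3                   -> false
    ((1/10) + (2/10)) = (3/10)          -> true

The first comparison fails because none of 0.1, 0.2 and 0.3 is exactly
representable as a binary floating point number.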


The Float/Double confusion in ST/X:
===================================

Due to historic reasons, ST/X's Floats are what Doubles are in VisualWorks.

The reason is that in some Smalltalks, double floats are called Float, and no single float exists (VSE, V'Age),
whereas in others, there are both Float and Double classes (VisualWorks).
In order to allow code from both families to be loaded into ST/X without a missing class error, and without
losing precision, we decided to use IEEE doubles as the internal representation of Float 
and make Double an alias to it.
This should work for either family (except for the unexpected additional precision in some cases).

If you really only want single precision floating point numbers, use ShortFloat instances.
But be aware that there is usually no advantage (neither in memory usage, due to memory alignment restrictions,
nor in speed), as these days, the CPUs are just as fast doing double precision operations.
(There might be a noticeable difference when doing bulk operations, and you should consider using FloatArray for those).
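
A small sketch of what this means in practice (Double is assumed to be installed
as an alias, as described above; no result is shown for that line):

    1.0 class                    -> Float (an IEEE double)
    1.0 asShortFloat class       -> ShortFloat (an IEEE single)
    Double == Float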


Hardware supported precisions
=============================

The only really portable sizes are IEEE-single and IEEE-double floats (i.e. ShortFloat and Float instances).
These are supported on all architectures.
Some CPUs provide an extended precision floating point number,
however, the downside is that CPU-architects did not agree on a common format and precision: 
some use 80 bits, others 96 and others even 128.
See the comments in the LongFloat class for more details.
We recommend using Float (i.e. IEEE doubles) unless another format is absolutely required;
if it is, take care of machine dependencies in your code.
For higher precision needs, you may also try the new QDouble class, which gives you >200 bits (60 digits) 
of precision on all machines, or the software emulated QuadFloat or OctaFloat classes
(all come at a noticeable performance price, though).
For very high precision (actually: arbitrary), take a look at the LargeFloat class.


Range and Precision of Storage Formats:
=======================================

  Format |   Class    |   Array Class   | Bits / Significand  | Smallest Pos Number | Largest Pos Number | Significant Digits
         |            |                 |      (Binary)       |                     |                    |     (Decimal)
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  half   |     --     | HalfFloatArray  |    16 / 11          |  6.10...  x 10-5    | 6.55...  x 10+4    |      3.3
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  single | ShortFloat | FloatArray      |    32 / 24          |  1.175... x 10-38   | 3.402... x 10+38   |      6-9
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  double | Float      | DoubleArray     |    64 / 53          |  2.225... x 10-308  | 1.797... x 10+308  |     15-17
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  double | LongFloat  |     --          |   128 / 113         |  3.362... x 10-4932 | 1.189... x 10+4932 |     33-36
  extend.|            |                 |                     |                     |                    |
  (SPARC)|            |                 |                     |                     |                    |
  -------+            |                 |---------------------+---------------------+--------------------+--------------------
  double |            |                 |    96 / 64          |  3.362... x 10-4932 | 1.189... x 10+4932 |     18-21
  extend.|            |                 |                     |                     |                    |
  (x86)  |            |                 |                     |                     |                    |
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | QDouble    |     --          |   256 / 212         |  2.225... x 10-308  | 1.797... x 10+308  |     >=60
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | QuadFloat  |     --          |   128 / 113         |  3.362... x 10-4932 | 1.189... x 10+4932 |     33-36
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | OctaFloat  |     --          |   256 / 237         |  3.271... x 10-78913| 1.611... x 10+78913|     >=60
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | LargeFloat |     --          |     arbitrary       |  arbitrarily small  |  arbitrarily large |     arbitrary
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------

HalfFloats are only supported in fixed array containers. 
This was added for OpenGL and other graphic libraries which allow texture
and vertex data to be passed quickly in that format (see http://www.opengl.org/wiki/Small_Float_Formats).

Long- and LargeFloat are not supported as array containers.
These formats are seldom used for bulk data.

QDoubles are special soft floats; slower in performance, but providing 4 times the precision of regular doubles.

To see the differences in precision:
    
    '%60.58f' printf:{ 1 asShortFloat exp } -> '2.718281828459045*090795598298427648842334747314453125'          (32 bits)
    '%60.58f' printf:{ 1 asFloat exp }      -> '2.718281828459045*090795598298427648842334747314453125'          (64 bits)
    '%60.58f' printf:{ 1 asLongFloat exp }  -> '2.718281828459045235*4281681079939403389289509505033493041992'   (only 80 valid bits on x86)
    
    '%60.58f' printf:{ 1 asQDouble exp }    -> '2.71828182845904523536028747135266249775724709369995957496698'   (>200 bits)

    correct value is:                           2.71828182845904523536028747135266249775724709369995957496696762772407663035354759457138217852516642742746

Bulk Containers:
================
If you have a vector or matrix (and especially: large ones) of floating point numbers, the well known
Array is a very inefficient choice. The reason is that it keeps pointers to each of its elements, and each element
(if it is a float) is itself stored somewhere in the object memory.
Thus, there is both a space overhead (every float object has an object header, for class and other information), and
also a performance overhead (extra indirection, cache misses and alignment inefficiencies).
For this, the bulk numeric containers are provided, which keep the elements unboxed and properly aligned.
Use them for matrices and large numeric vectors. They also provide some optimized bulk operation methods,
such as adding, multiplying etc.
Take a look at FloatArray, DoubleArray, HalfFloatArray etc.
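
A minimal sketch of using such a container (only the generic collection protocol
is used here; the optimized bulk operations mentioned above are not shown, as
their selectors are not listed in this document):

    |v|
    v := DoubleArray new:3.
    v at:1 put:1.5.
    v at:2 put:2.5.
    v at:3 put:3.5.
    v inject:0 into:[:sum :each | sum + each]       -> 7.5

The elements are stored as raw IEEE doubles inside v; a float object is only
created when an element is fetched.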


Comparing Floats:
=================
Due to rounding errors (usually in the last bit(s)), you should not compare two floating point numbers
using the #= operator. For example, the value 0.1 cannot be represented as a finite sum of powers-of-two fractions,
and will therefore always be an approximation with an error of up to half a bit in the last bit of the mantissa.
Usually, the print functions take this into consideration and return a (faked) '0.1'.
However, this half bit error may accumulate; for example, when multiplying that by 0.1 and then by 100, 
the error may get large enough to be no longer swept under the rug by the print function, 
and you will get a result such as '0.9999999999999' or '1.0000000000000002' from it.

Also, comparing against a proper 1.0 (which is representable as an exact power of 2), 
you will get a false result.
i.e. (0.1 * 0.1 * 100 ~= 1.0) and (0.1 * 0.1 * 100 - 1.0) ~= 0.0
This often confuses non-computer scientists (and occasionally even some of those).

For this, you should always provide an epsilon value when comparing two non-integer numbers. 
The epsilon value is the distance you accept two numbers to be apart to be still considered equal. 
Effectively, the epsilon asks: 'are those two nearer than this epsilon?'.

Now we could ask 'is the delta between two numbers smaller than 0.00001?',
and get a reasonable answer for big numbers. But what if we compare two tiny numbers?
Then a reasonable epsilon must also be much smaller!

Actually, the epsilon should always be computed dynamically depending on the two values compared.
That is what the #isAlmostEqualTo:nEpsilon: method does for you. It does not take an absolute epsilon,
but instead the number of distinct floating point numbers that the two compared floats may be apart.
That is: the number of actually representable numbers between those two. 
Effectively, that is the difference between the two mantissas, 
when the numbers are scaled to the same exponent, taking the number of mantissa bits into account.
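
The following sketch contrasts the three kinds of comparison, using the values from
the example above. The absolute epsilon 1e-9 and the nEpsilon value 2 are arbitrary
choices made for this illustration:

    |a b|
    a := 0.1 * 0.1 * 100.
    b := 1.0.
    a = b                                  -> false
    (a - b) abs < 1e-9                     -> true (but 1e-9 is only sensible for numbers near 1)
    a isAlmostEqualTo:b nEpsilon:2         "tolerates a distance of up to 2 representable numbers"

Here, a and b are adjacent doubles: they differ only in the last bit of the mantissa.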

Class protocol:

copyright

COPYRIGHT (c) 1994 by Claus Gittinger
             All Rights Reserved

This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice.   This software may not
be provided or otherwise made available to, or used by, any
other person.  No title to or ownership of the software is
hereby transferred.

NaN

return the constant NaN (not a number) in my representation.
Here, based on the assumption that division of zero by zero generates a NaN
(which is defined as such in the IEEE standard).
If a subclass does not, it has to redefine this method and generate a NaN differently

Usage example(s):

      ShortFloat NaN  
      Float NaN       
      LongFloat NaN   
      LargeFloat NaN   
      IEEEFloat NaN

negativeInfinity

return an instance of myself which represents negative infinity (for my instances).
Warning: do not compare for equality against infinities;
instead, check using isFinite or isInfinite

Usage example(s):

      ShortFloat negativeInfinity   
      Float negativeInfinity       
      LongFloat negativeInfinity   
      LargeFloat negativeInfinity   
      QDouble negativeInfinity   
      IEEEFloat negativeInfinity

constants & defaults

computeEpsilon

compute the maximum relative spacing of instances of myself
(i.e. the difference between 1.0 and the next larger
representable number).
See https://en.wikipedia.org/wiki/Machine_epsilon

Usage example(s):

      Float radix 
      Float precision           
      
      ShortFloat computeEpsilon -> 1.192093e-07 
      Float computeEpsilon      -> 2.22044604925031E-16
      LongFloat computeEpsilon  -> 1.084202172485504434E-19
      QDouble computeEpsilon    -> 7.77876909732643E-62    
      QuadFloat computeEpsilon  -> 1.92593e-34    
      OctaFloat computeEpsilon  -> 9.05568e-72    

      QuadFloat radix
      (QuadFloat coerce:QuadFloat radix) => 2.00000
      2 asQuadFloat                      => 2.00000

eBias

Answer the exponent's bias;
that is the offset of the zero exponent when stored.
The computation below assumes standard IEEE format

Usage example(s):

     Float eBias       -> 1023
     ShortFloat eBias  -> 127
     HalfFloat eBias   -> 15
     LongFloat eBias   -> 16383
     QuadFloat eBias   -> 16383
     OctaFloat eBias   -> 262143
     QDouble eBias     -> 1023
     LargeFloat eBias  -> 0

Usage example(s):

     1.0 numBitsInExponent -> 11
     1.0 eBias             -> 1023
     1.0 emin              -> -1022
     1.0 emax              -> 1023
     1.0 fmin              -> 2.2250738585072E-308
     1.0 fmax              -> 1.79769313486232E+308

emax

The largest exponent value allowed by instances of this class.
The computation below assumes standard IEEE format

Usage example(s):

     Float emax       -> 1023
     ShortFloat emax  -> 127
     LongFloat emax   -> 16383
     QuadFloat emax   -> 16383
     OctaFloat emax   -> 262143
     QDouble emax     -> 1023

emin

The smallest exponent value allowed by (normalized) instances of this class.
The computation below assumes standard IEEE format

Usage example(s):

     Float emin       -> -1022
     ShortFloat emin  -> -126
     LongFloat emin   -> -16382
     QuadFloat emin   -> -16382
     OctaFloat emin   -> -262142
     QDouble emin     -> -1022
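
For the IEEE formats, eBias, emax and emin are related as sketched below (the results
follow from the values listed in the examples above):

     Float emax = Float eBias                                        -> true
     Float emin = (1 - Float eBias)                                  -> true
     Float eBias = ((2 raisedTo:(1.0 numBitsInExponent - 1)) - 1)    -> true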

epsilon

return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
according to ISO C standard;
Ada, C, C++ and Python language constants;
Mathematica, MATLAB and Octave; and various textbooks
see https://en.wikipedia.org/wiki/Machine_epsilon

Usage example(s):

     Float epsilon       -> 2.22044604925031E-16
     ShortFloat epsilon  -> 1.192093e-07
     LongFloat epsilon   -> 1.084202172485504434E-19
     QDouble epsilon     -> 7.778769097326426826491248689356e-62
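
A short sketch of what this value means for Float (an IEEE double, rounding to
nearest even):

     (1.0 + Float epsilon) > 1.0             -> true
     (1.0 + (Float epsilon / 2)) = 1.0       -> true

That is, adding epsilon to 1.0 gives the next representable number, whereas adding
half of it is rounded back to 1.0.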

fmax

The largest value allowed by instances of this class.
Not required to return an instance of the class,
but may return a double (aka Float) with that value (eg. for HalfFloats)

Usage example(s):

     Float fmax       -> 1.79769313486232E+308
     ShortFloat fmax  -> 3.402823e+38
     LongFloat fmax   -> 1.189731495357231765E+4932
     HalfFloat fmax   -> 65504.0
     QuadFloat fmax   -> 1.189731495e4932
     OctaFloat fmax   -> 1.61132571748e78913
     QDouble fmax     -> error
     (IEEEFloat size:16 exponentSize:5) fmax asFloat -> 65504.0

fmaxDenormalized

the largest denormalized value which can be represented
by instances of this class.
Should actually be sent to the instance,
because of IEEEFloat, which has instance-specific representation

fmin

the smallest normalized non-zero value which can be represented
by instances of this class;
should actually be sent to the instance,
because some of my subclasses have an instance-specific representation.
Not required to return an instance of the class,
but may return a double (aka Float) with that value (eg. for HalfFloats)

Usage example(s):

     (1.0 asIEEEFloat:8) fmin -> 0.015625
     HalfFloat fmin           -> 6.103515625e-05
     ShortFloat fmin          -> 1.175494e-38
     Float fmin               -> 2.2250738585072e-308
     LongFloat fmin           -> 3.362103143112093506e-4932
     QuadFloat fmin           -> 3.3621031431119363650068581666578087e-4932
     OctaFloat fmin   
     QDouble fmin             -> 2.2250738585072e-308

     (IEEEFloat size:16 exponentSize:5) fmin asFloat -> 6.103515625e-05

     Float fmin      = (2.0 raisedTo:Float emin)                 -> true
     ShortFloat fmin = (2.0 raisedTo:ShortFloat emin)            -> true 
     QuadFloat fmin  = (2.0 asQuadFloat raisedTo:QuadFloat emin) -> true 
     OctaFloat fmin  = (2.0 asOctaFloat raisedTo:OctaFloat emin) -> true

fminDenormalized

the smallest non-zero value which can be represented
by instances of this class;
should actually be sent to the instance,
because of IEEEFloat, which has instance-specific representation

** This method must be redefined in concrete classes (subclassResponsibility) **

infinity

return an instance of myself which represents positive infinity (for my instances).
Warning: do not compare for equality against infinities;
instead, check using isFinite or isInfinite

Usage example(s):

      ShortFloat infinity  
      Float infinity       
      LongFloat infinity   
      LargeFloat infinity   
      IEEEFloat infinity   
      QuadFloat infinity   
      OctaFloat infinity   
      QDouble infinity
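
A minimal sketch of the recommended check (x is a hypothetical variable holding
some computed value):

      |x|
      x := Float infinity.
      x isInfinite        -> true
      x isFinite          -> false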

maxSmallInteger

answer the largest possible SmallInteger value as instance of myself.
Notice: if my precision is smaller than the number of bits in a SmallInteger
you'll lose some precision.

Usage example(s):

     Float maxSmallInteger       -> 4.61168601842739e+18
     LongFloat maxSmallInteger   -> 4611686018427387903.0
     ShortFloat maxSmallInteger  -> 4.611686e+18
     QDouble maxSmallInteger     -> 4.61169e+18
     QuadFloat maxSmallInteger   -> 4.61169e+18

minSmallInteger

answer the smallest possible SmallInteger value as instance of myself

Usage example(s):

     Float maxSmallInteger.         
     LongFloat maxSmallInteger.     
     ShortFloat maxSmallInteger.    
     QDouble maxSmallInteger.       
     LargeFloat maxSmallInteger.     

     Float minSmallInteger.     
     LongFloat minSmallInteger. 
     ShortFloat minSmallInteger.
     QDouble minSmallInteger.   
     LargeFloat minSmallInteger.

instance creation

fromBytes: bytes

Float fromBytes:#[0 0 0 0 0 0 8 0]

fromInteger: anInteger

return a float with anInteger's value.
Since floats have a limited precision, you usually lose bits when doing this
with a large integer,
i.e. when the number of digits exceeds the floating point number's precision
(see Float decimalPrecision, LongFloat decimalPrecision).
Also, a domainError could be raised, if the integer cannot be
represented as an instance of the receiver class.
(can be caught with trapInfinity:)

Usage example(s):

     ShortFloat fromInteger:2
     12345678901234567890 asShortFloat            

     1234567890 asFloat                     
     1234567890 asFloat asInteger                    
     -1234567890 asFloat asInteger                    

     12345678901234567890 asFloat storeString            
     12345678901234567890 asFloat asInteger   
     -12345678901234567890 asFloat asInteger   

     12345678901234567890 asLongFloat           
     12345678901234567890 asLongFloat asInteger 
     -12345678901234567890 asLongFloat asInteger 

     123456789012345678901234567890 asLongFloat           
     123456789012345678901234567890 asLongFloat asInteger  
     -123456789012345678901234567890 asLongFloat asInteger  

     1234567890123456789012345678901234567890 asLongFloat           
     1234567890123456789012345678901234567890 asLongFloat asInteger  
     -1234567890123456789012345678901234567890 asLongFloat asInteger

     'this test is on 65 bits'.
     self assert: 16r1FFFFFFFFFFFF0801 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
     'this test is on 64 bits'.
     self assert: 16r1FFFFFFFFFFFF0802 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
     'nearest even is upper'.
     self assert: 16r1FFFFFFFFFFF1F800 asDouble = 16r1FFFFFFFFFFF20000 asDouble.
     'nearest even is lower'.
     self assert: 16r1FFFFFFFFFFFF0800 asDouble = 16r1FFFFFFFFFFFF0000 asDouble.

     -- losing bits!
     (Float fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString     '1FFFFFFFFFFFF1000'    
     (Float fromInteger:16r1FFFFFFFFFFFF0880) asInteger hexPrintString     '1FFFFFFFFFFFF1000'    
     (Float fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString   '2000000000000000000'
     (Float fromInteger:16r1FFFFFFFFFFFFFFFFFFFF0801) asInteger hexPrintString '2000000000000000000000000'

     (LongFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString     '1FFFFFFFFFFFF0800'    
     (LongFloat fromInteger:16r1FFFFFFFFFFFF0880) asInteger hexPrintString     '1FFFFFFFFFFFF0880'    
     (LongFloat fromInteger:16r1FFFFFFFFFFFFFF0880) asInteger hexPrintString   '1FFFFFFFFFFFFFF0800'
     (LongFloat fromInteger:16r1FFFFFFFFFFFFFFFFFFFF0801) asInteger hexPrintString  '2000000000000000000000000'

     (QuadFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString   '1FFFFFFFFFFFF0801'
     (QuadFloat fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'

     (QDouble fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString   '1FFFFFFFFFFFF0801'
     (QDouble fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'

     (OctaFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString   '1FFFFFFFFFFFF0801'
     (OctaFloat fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'

fromLimitedPrecisionReal: anLPReal

return a float with anLPReal's value.
You might lose bits when doing this.
Slow fallback.

fromNumerator: numerator denominator: denominator

Create a limited precision real from a Rational.
This version will answer the nearest floating point value,
according to the IEEE 754 round to nearest even default mode

Usage example(s):

        Time millisecondsToRun:[
            1000000  timesRepeat:[
                Float fromNumerator:12345678901234567890 denominator:987654321
            ].
        ]

        |fraction|
        fraction := 12345678901234567890 / 987654321.
        Time millisecondsToRun:[
            1000000  timesRepeat:[
                fraction asFloat
            ].
        ]

new: aNumber

catch this message - not allowed for floats/doubles

random
( an extension from the stx:libbasic2 package )

Float random
Float32 random

readFrom: aStringOrStream onError: exceptionBlock

read a float from a string

Usage example(s):

     Float readFrom:'.1'
     Float readFrom:'0.1'
     Float readFrom:'0'

     ShortFloat readFrom:'.1'
     ShortFloat readFrom:'0.1'
     ShortFloat readFrom:'0'

     LongFloat readFrom:'.1'
     LongFloat readFrom:'0.1'
     LongFloat readFrom:'0'

     LimitedPrecisionReal readFrom:'bla' onError:nil
     Float readFrom:'bla' onError:nil
     ShortFloat readFrom:'bla' onError:nil

queries

decimalEmax

Answer how many digits of accuracy this class supports

Usage example(s):

     ShortFloat emax   
     ShortFloat decimalEmax  

     Float emax  
     Float emin
     Float decimalEmax  

     LongFloat emax        
     LongFloat emin        
     LongFloat decimalEmax

decimalPrecision

return the number of valid decimal digits

Usage example(s):

     HalfFloat decimalPrecision  -> 3
     ShortFloat decimalPrecision -> 7
     Float decimalPrecision      -> 16 
     LongFloat decimalPrecision  -> 19
     QuadFloat decimalPrecision  -> 34
     OctaFloat decimalPrecision  -> 71
     QDouble decimalPrecision    -> 61

defaultPrintPrecision

the default number of digits when printing

Usage example(s):

     ShortFloat defaultPrintPrecision -> 5 
     Float defaultPrintPrecision      -> 6 
     LongFloat defaultPrintPrecision  -> 8 
     QDouble defaultPrintPrecision    -> 10
     QuadFloat defaultPrintPrecision  -> 9
     OctaFloat defaultPrintPrecision  -> 11
     LargeFloat defaultPrintPrecision -> 12

defaultPrintfPrecision

the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.

denormalized

Return whether the instances of this class can
represent values in denormalized format.

exactDecimalPrecision

return the exact number of decimal digits

Usage example(s):

     HalfFloat exactDecimalPrecision  -> 3.612359947967774002
     ShortFloat exactDecimalPrecision -> 7.224719895935548004
     Float exactDecimalPrecision      -> 15.95458977019100184      
     LongFloat exactDecimalPrecision  -> 19.26591972249479468
     QuadFloat exactDecimalPrecision  -> 34.01638951002987185
     OctaFloat exactDecimalPrecision  -> 71.34410897236353654
     QDouble exactDecimalPrecision    -> 61.41011911545215804
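
For the binary formats, this value is the precision in bits multiplied by log10(2),
which is approximately 0.30103. A worked example, using the bit counts from the
table in the class description:

     24 * 0.30103        (ShortFloat; approximately 7.22)
     53 * 0.30103        (Float; approximately 15.95)
     64 * 0.30103        (LongFloat on x86; approximately 19.27)
     113 * 0.30103       (QuadFloat; approximately 34.02)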

hasSharedInstances

return true if this class can share instances when stored binary,
that is, instances with the same value can be stored by reference.
Although not really shared, floats should be treated
so, to be independent of the implementation of the arithmetic methods.

isAbstract

Return if this class is an abstract class.
True is returned for LimitedPrecisionReal here; false for subclasses.

Usage example(s):

     1.0 class isAbstract

isIEEEFormat

return true, if this machine represents floats in IEEE format.
Currently, no support is provided for non-IEEE machines
to convert their floats into this format (which is only relevant
if such a machine wants to send floats as binary data to some other
machine).
Machines with non-IEEE formats are VAXen and IBM370-type systems
(among others). Today, practically every system uses IEEE format floats.

numBitsInExponent

return the number of bits in the exponent

** This method must be redefined in concrete classes (subclassResponsibility) **

numBitsInIntegerPart

answer the number of bits in the integer part of the mantissa.
I.e. 0 is returned if there is a hidden bit, 1 if not.
Most floating point formats are normalized to get rid of the extra bit.

numBitsInMantissa

return the number of bits in the mantissa (the significand).
Typically, the precision is 1 more than this number due to the hidden bit;
the hidden bit is not counted here.

** This method must be redefined in concrete classes (subclassResponsibility) **

numHiddenBits

answer the number of hidden bits in the mantissa.
This will return 0 or 1; 0 if there is no hidden bit, 1 if there is.
Most floating point formats are normalized to get one extra bit of precision
and thus will return 1 here.

precision

answer the precision (the number of bits in the mantissa) of my elements, in bits.
If my elements are IEEE floats, where only the fraction of the normalized mantissa is stored,
there is a hidden bit, and the mantissa is actually represented by 1 more binary digit
(i.e. the number returned is 1 plus the actual number of bits stored).
Any hidden bits are included here.

Usage example(s):

      HalfFloatArray precision  
      ShortFloat precision  
      Float precision       
      LongFloat precision   
      QDouble precision
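
The relation between the bit-count queries can be sketched as follows (no result is
shown; the expression merely restates the descriptions of precision,
numBitsInMantissa and numHiddenBits given here, and is an assumption for classes
other than the IEEE formats):

      Float precision = (Float numBitsInMantissa + Float numHiddenBits)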

radix

answer the radix of my instance's exponent

** This method must be redefined in concrete classes (subclassResponsibility) **

Instance protocol:

Compatibility-Squeak

defaultNumberOfDigits
( an extension from the stx:libcompat package ): marked as obsolete by exept MBP at 13-11-2021

** This is an obsolete interface - do not use it (it may vanish in future versions) **

accessing

at: index

redefined to prevent access to individual bytes in a real.

at: index put: aValue

redefined to prevent access to individual bytes in a real.

arithmetic

* aNumber

return the product of the receiver and the argument.

+ aNumber

return the sum of the receiver and the argument, aNumber

- aNumber

return the difference of the receiver and the argument, aNumber

/ aNumber

return the quotient of the receiver and the argument, aNumber

// aNumber

return the integer quotient of dividing the receiver by aNumber with
truncation towards negative infinity.

ceiling

(comment from inherited method)
return the integer nearest the receiver towards positive infinity.

floor

(comment from inherited method)
return the receiver truncated towards negative infinity

timesTwoPower: anInteger

multiply self by a power of two.
I.e. self * (2 ** anInteger).
Implementation takes care of preserving class and avoiding overflow/underflow
if possible; otherwise returns infinity or zero.
Thanks to Nicolas Cellier for this code

Usage example(s):

     (3 asShortFloat timesTwoPower:10) -> 3072.0.
     (3 asFloat timesTwoPower:10)      -> 3072.0.

     (3 asShortFloat timesTwoPower:100) -> 3.802952e+30.
     (3 asFloat timesTwoPower:100)      -> 3.80295180068469e+30.

     (3 asShortFloat timesTwoPower:200) -> inf.
     (3 asFloat timesTwoPower:200)      -> 4.82081413277697e+60.

     (1 asShortFloat timesTwoPower: 3) class = ShortFloat.
     (1 asLongFloat timesTwoPower: 1024).
     (1 asFloat timesTwoPower: -1024) timesTwoPower: 1024.
     (1 asLongFloat timesTwoPower: -1024) timesTwoPower: 1024.

     (2.0 asShortFloat timesTwoPower: -150) timesTwoPower: 150    
     (2.0 asLongFloat timesTwoPower: -150) timesTwoPower: 150   
     (2.0 asFloat timesTwoPower: -150) timesTwoPower: 150       

     (2.0 asShortFloat timesTwoPower: -149) timesTwoPower: 149  
     (2.0 asLongFloat timesTwoPower: -149) timesTwoPower: 149    
     (2.0 asFloat timesTwoPower: -149) timesTwoPower: 149        

     (ShortFloat infinity timesTwoPower:10) -> inf  
     (LongFloat infinity timesTwoPower:10)  -> inf  
     (Float infinity timesTwoPower:10)      -> inf        

     Time millisecondsToRun:[
        1000000 timesRepeat:[
            (2.0 timesTwoPower: 150)
        ]
     ]

bytes access

digitBytes

answer the float's digit bytes in IEEE format.
Use the native machine byte ordering.

Usage example(s):

     1.0 digitBytes   
     Float pi digitBytes   
     ShortFloat pi digitBytes

digitBytesMSB: msb

answer the float's digit bytes in IEEE format.
If msb == true, use MSB byte order, otherwise LSB byte order.

Usage example(s):

      Float pi digitBytesMSB:false
      Float pi digitBytesMSB:true
      ShortFloat pi digitBytesMSB:false
      ShortFloat pi digitBytesMSB:true

coercing & converting

asFloat

(comment from inherited method)
return a float with same value

asFraction

Answer a rational number (Integer or Fraction) representing the receiver.
This conversion uses the continued fraction method to approximate
a floating point number.
In contrast to #asTrueFraction, which returns exactly the value of the float,
this rounds in the last significant bit of the floating point number.

Usage example(s):

     1.1 asFraction      
     1.2 asFraction      
     0.3 asFraction   
     0.5 asFraction  
     (1/5) asFloat asFraction  
     (1/8) asFloat asFraction  
     (1/13) asFloat asFraction 
     (1/10) asFloat asFraction 
     (1/10) asFloat asTrueFraction asFixedPoint scale:20 
     3.14159 asFixedPoint scale:20        
     3.14159 storeString       
     3.14159 asFraction asFloat storeString       
     1.3 asFraction            
     1.0 asFraction            
     1E6 asFraction            
     1E-6 asFraction
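
The difference between the two conversions can be sketched as follows. The results
shown for #asTrueFraction follow from its definition (the exact value of the IEEE
double); no result is shown for #asFraction:

     0.5 asTrueFraction = (1/2)          -> true
     0.1 asTrueFraction = (1/10)         -> false
     0.1 asTrueFraction                  -> (3602879701896397/36028797018963968)
     0.1 asFraction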

asIEEEFloat
( an extension from the stx:libbasic2 package )

return an IEEE soft float with same value as receiver

Usage example(s):

     123 asFloat asIEEEFloat
     0 asShortFloat asIEEEFloat
     0.0 asIEEEFloat                       
     Float NaN asIEEEFloat                
     Float positiveInfinity asIEEEFloat   
     Float negativeInfinity asIEEEFloat   
     ShortFloat NaN asIEEEFloat                
     ShortFloat positiveInfinity asIEEEFloat   
     ShortFloat negativeInfinity asIEEEFloat   
     QuadFloat NaN asIEEEFloat                
     QuadFloat positiveInfinity asIEEEFloat   
     QuadFloat negativeInfinity asIEEEFloat

asIEEEFloat: numBits

return an IEEE soft float with the same value as the receiver and numBits overall.
numBits should be a multiple of 8
(i.e. 32 for IEEE single, 64 for double, 128 for quadFloat, etc.)

Usage example(s):

     123 asFloat asIEEEFloat
     123 asFloat asIEEEFloat:32  
     123 asFloat asIEEEFloat:16
     12 asFloat asIEEEFloat:8
     12 asIEEEFloat:8
     0 asShortFloat asIEEEFloat
     0.0 asIEEEFloat

asInteger

return an integer with same value - might truncate.
Does not raise an error for non-finite numbers (NaN or INF)

Usage example(s):

     12345.0 asInteger     
     1e15 asInteger        
     1e33 asInteger asFloat
     1e303 asInteger asFloat

asLargeFloat
( an extension from the stx:libbasic2 package )

return a large float with (approximately) my value.
If the LargeFloat class is not present, a regular float is returned

asLargeFloatPrecision: n
( an extension from the stx:libbasic2 package )

return a large float with (approximately) my value.
If the LargeFloat class is not present, a regular float is returned

Usage example(s):

     1.0 asLargeFloatPrecision:10

asLimitedPrecisionReal

return a float of any precision with same value

asLongFloat

(comment from inherited method)
return a longFloat with same value

asOctaFloat
( an extension from the stx:libbasic2 package )

(comment from inherited method)
return an octaFloat with same value

asQuadFloat
( an extension from the stx:libbasic2 package )

return a QuadFloat with same value as the receiver

asRational

Answer a Rational number--Integer or Fraction--representing the receiver.
Same as asFraction; provided for ST-80 compatibility.

Usage example(s):

     1.1 asRational      
     1.2 asRational      
     0.3 asRational   
     0.5 asRational 
     (1/5) asFloat asRational
     (1/8) asFloat asRational  
     (1/13) asFloat asRational 
     3.14159 asRational        
     3.14159 asRational asFloat       
     1.3 asRational  
     1.0 asRational

asShortFloat

(comment from inherited method)
return a shortFloat with same value.
Does NOT raise an error if the receiver exceeds the float range.

asTrueFraction

Answer a fraction or integer that EXACTLY represents self,
an any-precision IEEE floating point number, consisting of:
numMantissaBits bits of normalized mantissa (i.e. with hidden leading 1-bit)
optional numExtraBits between mantissa and exponent (normalized flag for ext-real)
numExponentBits bits of 2s complement exponent
1 sign bit.
Taken from Float's asTrueFraction

Usage example(s):

(result asFloat = self) ifFalse: [self error: 'asTrueFraction validation failed'].

Usage example(s):

     1.0 asLongFloat asTrueFraction 

     0.3 asFloat asTrueFraction      (5404319552844595/18014398509481984)
     0.3 asShortFloat asTrueFraction (5033165/16777216) 
     0.3 asLongFloat asTrueFraction  (5404319552844595/18014398509481984) 
     0.3 asQuadFloat asTrueFraction  (5404319552844595/18014398509481984)  
     0.3 asOctaFloat asTrueFraction  (5404319552844595/18014398509481984)  

     1.25 asTrueFraction               (5/4)
     1.25 asShortFloat asTrueFraction  (5/4)   
     1.25 asLongFloat asTrueFraction   (5/4)  

     0.25 asTrueFraction                (1/4)
     0.25 asShortFloat asTrueFraction   (1/4)  
     0.25 asLongFloat asTrueFraction    (1/4) 

     -0.25 asTrueFraction               (-1/4)
     -0.25 asShortFloat asTrueFraction  (-1/4)
     -0.25 asLongFloat asTrueFraction   (-1/4)

     3e37 asTrueFraction                30000000000000002158062836758597337088
     3e37 asShortFloat asTrueFraction   30000001069098037760363920625477091328
     3e37 asLongFloat asTrueFraction    30000000000000002158062836758597337088
     3e37 asQuadFloat asTrueFraction    30000000000000002158062836758597337088
     3e37 asOctaFloat asTrueFraction    30000000000000002158062836758597337088
     3e37 asQDouble asTrueFraction      30000000000000002158062836758597337088

     0 asLongFloat negated asTrueFraction              
     LongFloat NaN asTrueFraction              
     LongFloat infinity asTrueFraction          
     LongFloat negativeInfinity asTrueFraction  

     Float fmin asTrueFraction              
     Float fminDenormalized asTrueFraction              
     Float fmaxDenormalized asTrueFraction              
     LongFloat fmin asTrueFraction              
     LongFloat fminDenormalized asTrueFraction              
     LongFloat fmaxDenormalized asTrueFraction
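
How an exact fraction follows from sign, exponent and mantissa bits can be sketched for IEEE single precision. This is a Python illustration (not the ST/X implementation); the function name true_fraction_single is hypothetical, and only finite values are handled.

```python
import struct
from fractions import Fraction

def true_fraction_single(value):
    """Exact value of value after rounding to IEEE single (finite values only)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = -1 if bits >> 31 else 1
    exp_bits = (bits >> 23) & 0xFF         # 8 biased exponent bits
    frac_bits = bits & 0x7FFFFF            # 23 stored mantissa bits
    if exp_bits == 0:                      # zero or denormalized: no hidden bit
        mantissa, exponent = frac_bits, -126 - 23
    else:                                  # normalized: add the hidden leading 1
        mantissa, exponent = frac_bits | (1 << 23), exp_bits - 127 - 23
    return sign * Fraction(mantissa) * Fraction(2) ** exponent

print(true_fraction_single(0.3))
print(true_fraction_single(1.25))
print(true_fraction_single(-0.25))
```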

comparing

< aNumber: return true, if the argument is greater

double dispatching

differenceFromFraction: aFraction

sent when a fraction does not know how to subtract the receiver

equalFromFraction: aFraction

sent when a fraction does not know how to compare with the receiver

lessFromFraction: aFraction

aFraction does not know how to compare to the receiver -
Return true if aFraction < self.

productFromFraction: aFraction

sent when a fraction does not know how to multiply the receiver

quotientFromFloat: aFloat

return the quotient of aFloat and the receiver.
Return aFloat / self

quotientFromFraction: aFraction

Return the quotient of the argument, aFraction and the receiver.
Sent when aFraction does not know how to divide by the receiver.

sumFromFraction: aFraction

sent when a fraction does not know how to add the receiver

sumFromTimestamp: aTimestamp

I am to be interpreted as seconds, return the timestamp this number of seconds
after aTimestamp

Usage example(s):

     Timestamp now sumFromTimestamp:aTimestamp   
     100.0 sumFromTimestamp:Timestamp now 

     |t1 t2|
     t1 := Timestamp now. 
     t2 := 1.5 sumFromTimestamp:t1.
     t1 inspect. t2 inspect.

error reporting

errorUnsupported

inspecting

inspectorExtraAttributes
( an extension from the stx:libtool package ): extra (pseudo instvar) entries to be shown in an inspector.

printing & storing

commonPrintOn: aStream

a zero mantissa is impossible - except for zero and a few others

printOn: aStream

0.0 printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0. Transcript cr.

0.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.

0.0 asOctaFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0 asOctaFloat. Transcript cr.

-0.0 printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:-0.0. Transcript cr.

-0.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%-g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%+g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%+g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.

1234.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:1234.0 asIEEEFloat. Transcript cr.

1e39 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:1e39 asIEEEFloat. Transcript cr.

PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat NaN. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat infinity. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat negativeInfinity. Transcript cr.

printStringScientific

return a 'user friendly' scientific printString.
Notice: this returns a Text object with superscript digits,
which requires a font capable of displaying it correctly.
Also: the returned string is not meant to be read back - purely for GUIs

Usage example(s):

     1.23456 printString            -> '1.23456'
     1.23456 printStringScientific  1.23456×10^0 (with superscript zero at end)
     1.23e14 printStringScientific  1.23×10^14   (with superscript 14 at end)
     PrintfScanf printf:'%e' argument:1.23456 -> '1.23456e0' 
     PrintfScanf printf:'%g' argument:1.23456 -> '1.23456' 
     PrintfScanf printf:'%f' argument:1.23456 -> '1.23456' 
     PrintfScanf printf:'%e' argument:1.234   -> '1.234e0' 
     PrintfScanf printf:'%g' argument:1.234   -> '1.234' 
     PrintfScanf printf:'%f' argument:1.234   -> '1.234'
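
The idea of a display-only scientific string with superscript digits can be sketched in Python (not ST/X code). The helper scientific_text and its digits parameter are hypothetical; the result is a plain string with Unicode superscripts rather than a Text object, and it is not meant to be parsed back.

```python
SUPERSCRIPTS = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")

def scientific_text(value, digits=6):
    """Format value as mantissa×10^exponent using superscript exponent digits."""
    mantissa, exponent = f"{value:.{digits - 1}e}".split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")   # drop trailing zeros
    return f"{mantissa}×10{str(int(exponent)).translate(SUPERSCRIPTS)}"

print(scientific_text(1.23456))
print(scientific_text(1.23e14))
```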

printStringWithFormat: format

return a printed representation of the receiver;
format must be of the form: .nn, where nn is the number of digits.
To print 6 valid digits, use printStringWithFormat:'.6'
For Floats, the default used in printString is 15 (because it is a double);
for ShortFloats, it is 6 (because it is a float)

Usage example(s):

     Float pi printStringWithFormat:'.20'              => '3.141592653589793116'
     Float pi asQuadFloat printStringWithFormat:'.20'  => '3.14159265358978956320'
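
Asking a double for more digits than it carries only exposes the binary approximation. A Python sketch (not ST/X code) using the %g-style format specifier, which is assumed here to be comparable to the '.nn' format above:

```python
import math

fifteen = format(math.pi, ".15g")   # the default double precision mentioned above
twenty = format(math.pi, ".20g")    # digits past about 17 reflect the stored binary value
print(fifteen)
print(twenty)
```

The 20-digit output differs from the true value of pi (3.14159265358979323846...) after the 16th significant digit.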

private accessing

digitBytes: bytesLSB

queries

decimalEmax

Answer how many digits of exponent-accuracy this class supports

Usage example(s):

     1.0 asShortFloat emax   
     1.0 asShortFloat decimalEmax  

     1.0 asFloat emax  
     1.0 asFloat emin
     1.0 asFloat decimalEmax  

     1.0 asLongFloat emax        
     1.0 asLongFloat emin        
     1.0 asLongFloat decimalEmax

decimalPrecision

Answer how many significant decimal digits (accuracy) this instance supports

Usage example(s):

     1.0 asShortFloat decimalPrecision -> 7
     1.0 asFloat decimalPrecision      -> 15
     1.0 asLongFloat decimalPrecision  -> 19   
     1.0 asQDouble decimalPrecision    -> 61
     1.0 asLargeFloat decimalPrecision -> 15
     (1.0 asLargeFloatPrecision:200) decimalPrecision -> 60
     (1.0 asLargeFloatPrecision:400) decimalPrecision -> 120
     1.0 asQuadFloat decimalPrecision  -> 34
     1.0 asOctaFloat decimalPrecision  -> 71

     1.0 asIEEEFloat decimalPrecision               -> 15
     (1.0 asIEEEFloat:128) decimalPrecision         -> 34
     (1.0 asIEEEFloat:256) decimalPrecision         -> 71
     (1.0 asIEEEFloat:512) decimalPrecision         -> 148
     (1.0 asIEEEFloat:1024) decimalPrecision        -> 302
     1.0 asLongFloat asIEEEFloat decimalPrecision   -> 15
     1.0 asShortFloat asIEEEFloat decimalPrecision  -> 15
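
For the fixed-size IEEE formats, the values listed above coincide with floor(p * log10(2)), where p is the number of significant bits including the leading bit. This is an observation about the listed numbers, not a statement of how ST/X computes them. A Python sketch with a hypothetical helper decimal_digits:

```python
import math

def decimal_digits(precision_bits):
    """Decimal digits covered by precision_bits significant binary digits."""
    return math.floor(precision_bits * math.log10(2))

# significant bits, counting the hidden or explicit leading bit
formats = [("single", 24), ("double", 53), ("x86 extended", 64),
           ("quad", 113), ("octuple", 237)]
for name, bits in formats:
    print(name, decimal_digits(bits))
```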

defaultPrintPrecision

the default number of digits when printing

Usage example(s):

     1.0 asFloat defaultPrintPrecision        15
     1.0 asLongFloat defaultPrintPrecision    19
     1.0 asShortFloat defaultPrintPrecision   6
     1.0 asQDouble defaultPrintPrecision      60
     1.0 asQuadFloat defaultPrintPrecision    30
     1.0 asOctaFloat defaultPrintPrecision    70
     (1.0 asLargeFloatPrecision:100) defaultPrintPrecision   29
     (1.0 asLargeFloatPrecision:200) defaultPrintPrecision   59
     (1.0 asLargeFloatPrecision:300) defaultPrintPrecision   79

defaultPrintfPrecision

the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.

eBias

Answer the exponent's bias;
that is the offset of the zero exponent when stored
(i.e. the real exponent is exponentBits - eBias).
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

Usage example(s):

     1.0 numBitsInExponent 11
     1.0 eBias             1023
     1.0 emin              -1022
     1.0 emax              1023
     1.0 fmin              2.2250738585072E-308
     1.0 fmax              1.79769313486232E+308

Usage example(s):

     1.0 asLongFloat numBitsInExponent 15
     1.0 asLongFloat eBias             16383
     1.0 asLongFloat emin              -16382
     1.0 asLongFloat emax              16383
     1.0 asLongFloat fmin              3.362103143112093506E-4932
     1.0 asLongFloat fmax              1.189731495357231765E+4932

Usage example(s):

     1.0 asShortFloat numBitsInExponent 8
     1.0 asShortFloat eBias             127
     1.0 asShortFloat emin              -126
     1.0 asShortFloat emax              127
     1.0 asShortFloat fmin              1.175494e-38
     1.0 asShortFloat fmax              3.402823e+38

Usage example(s):

     1.0 asQuadFloat numBitsInExponent 15
     1.0 asQuadFloat eBias             16383
     1.0 asQuadFloat emin              -16382
     1.0 asQuadFloat emax              16383
     1.0 asQuadFloat fmin              
     1.0 asQuadFloat fmax

Usage example(s):

     1.0 asIEEEFloat numBitsInExponent 15
     1.0 asIEEEFloat eBias             16383
     1.0 asIEEEFloat emin              -16382
     1.0 asIEEEFloat emax              16383
     1.0 asIEEEFloat fmin              
     1.0 asIEEEFloat fmax
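
The relation between the exponent width and eBias, emin and emax in the standard IEEE layout can be sketched in Python (not ST/X code); the helper name ieee_exponent_range is hypothetical.

```python
import sys

def ieee_exponent_range(exponent_bits):
    """Return (eBias, emin, emax) for an IEEE format with exponent_bits exponent bits."""
    bias = 2 ** (exponent_bits - 1) - 1
    return bias, 1 - bias, bias

for bits in (8, 11, 15):
    print(bits, ieee_exponent_range(bits))

# For doubles, fmin and fmax follow from emin and emax:
double_fmin = 2.0 ** -1022
double_fmax = (2 - 2 ** -52) * 2.0 ** 1023
print(double_fmin == sys.float_info.min, double_fmax == sys.float_info.max)
```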

emax

The largest exponent value allowed by instances like this.
The computation below assumes standard IEEE format.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

Usage example(s):

     Float emax       -> 1023
     ShortFloat emax  -> 127
     LongFloat emax   -> 16383
     QuadFloat emax   -> 16383
     OctaFloat emax   -> 262143
     QDouble emax     -> 1023

emin

The smallest exponent value allowed by (normalized) instances of this class.
The computation below assumes standard IEEE format.
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

epsilon

exponent

generic; assumes IEEE float

Usage example(s):

     1.0 exponent               1
     1.0 xexponent              1
     0.0 exponent               0
     0.0 xexponent              0
     Float fmin exponent        -1021
     Float fmin xexponent       -1021
     (Float fmin / 2) exponent  -1022
     (Float fmin / 2) xexponent -1022
     (Float fmin / 4) exponent  -1023
     (Float fmin / 4) xexponent -1023
     (Float fmin / 32) exponent  -1026
     (Float fmin / 32) xexponent -1026
     (Float fminDenormalized) exponent  -1073
     (Float fminDenormalized) xexponent -1073

     Float NaN exponent
     Float infinity exponent
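
The same normalization (mantissa in 0.5 .. 1.0) is used by the C library function frexp, so the exponents above can be cross-checked for doubles with a Python sketch (not ST/X code):

```python
import math
import sys

samples = [1.0, 13.0, 0.0, sys.float_info.min, 5e-324]
parts = {x: math.frexp(x) for x in samples}   # x == mantissa * 2**exponent
for x, (mantissa, exponent) in parts.items():
    print(x, mantissa, exponent)
```

Here sys.float_info.min corresponds to Float fmin and 5e-324 to Float fminDenormalized.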

exponentBits

extract the biased exponentBits.
Assumes that subclasses are IEEE based (or at least can provide
an IEEE compatible byteArray for themselves).

Usage example(s):

     0.0 mantissaBits  0  
     0.0 exponentBits  0

     1.0 mantissaBits hexPrintString            -> '0' 
     1.0 mantissaWithHiddenBits hexPrintString  -> '10000000000000' 
     1.0 exponentBits                           -> 1023 16r3FF 

     2.0 mantissaBits hexPrintString            -> '0' 
     2.0 mantissaWithHiddenBits hexPrintString  -> '10000000000000' 
     2.0 exponentBits                           -> 1024 16r400

     3.0 mantissaBits hexPrintString            -> '8000000000000'
     3.0 mantissaWithHiddenBits hexPrintString  -> '18000000000000'
     3.0 exponentBits                           -> 1024 16r400

     4.0 mantissaBits hexPrintString            -> '0'
     4.0 exponentBits                           -> 1025 16r401

     5.0 mantissaBits hexPrintString            -> '4000000000000'
     5.0 exponentBits                           -> 1025 16r401

    -5.0 mantissaBits hexPrintString            -> '4000000000000'
    -5.0 exponentBits                           -> 1025 16r401

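     0.1 mantissaBits hexPrintString            -> '999999999999A'
     0.1 mantissaWithHiddenBits hexPrintString  -> '1999999999999A'
     0.1 exponentBits                           -> 1019 16r3FB

     0.3 mantissaBits hexPrintString            -> '3333333333333'
     0.3 mantissaWithHiddenBits hexPrintString  -> '13333333333333'
     0.3 exponentBits                           -> 1021 16r3FD

     0.3 asShortFloat mantissaBits    1677722 16r19999A
     0.3 asShortFloat exponentBits    125 16r7D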

     0.3 asLongFloat mantissaBits     11068046444225730560 16r9999999999999800
     0.3 asLongFloat exponentBits     16381 16r3FFD

     0.3 asQDouble mantissaBits  

     Float fmin exponentBits   1
     Float fminDenormalized exponentBits
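
Extracting the biased exponent and the stored mantissa bits from a double can be sketched in Python (not ST/X code) using the struct module; the helper name double_fields is hypothetical.

```python
import struct

def double_fields(value):
    """Return (sign bit, biased exponent bits, stored mantissa bits) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    exponent_bits = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    mantissa_bits = bits & ((1 << 52) - 1)     # 52 bits, hidden bit excluded
    return sign, exponent_bits, mantissa_bits

for x in (1.0, 3.0, -5.0, 0.1):
    sign, e, m = double_fields(x)
    print(x, sign, e, hex(m))
```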

fmax

the largest finite value which can be represented
by normalized instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

fmaxDenormalized

the largest denormalized value which can be represented
by instances of this class.
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

fmin

the smallest non-zero value which can be represented
by normalized instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

fminDenormalized

the smallest non-zero value which can be represented by instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.

fractionalPart

This has been renamed to #fractionPart for ST80 compatibility.

extract the after-decimal fraction part.
the floats value is
float truncated + float fractionalPart

** This is an obsolete interface - do not use it (it may vanish in future versions) **

hasIEEEFormat

HalfFloat isIEEEFormat true
ShortFloat isIEEEFormat true
Float isIEEEFormat true
LongFloat isIEEEFormat true
QuadFloat isIEEEFormat true
OctaFloat isIEEEFormat true
QDouble isIEEEFormat false
LargeFloat isIEEEFormat false

mantissa

extract a float's mantissa (as Float).
That is a float of the same type as the receiver,
such that:
(f mantissa) * (2 ^ f exponent) = f
This assumes that the mantissa is normalized to 0.5 .. 1.0

** This method must be redefined in concrete classes (subclassResponsibility) **

mantissaBits

extract a float's mantissaBits (excl. any hidden bit).
I.e. this returns the normalized mantissaBits as an integer.
Assumes that subclasses are IEEE based (or at least can provide
an IEEE compatible byteArray for themselves).

Usage example(s):

     0.0 mantissaBits     
     1.0 mantissaBits  hexPrintString -> '0' 
     2.0 mantissaBits  hexPrintString -> '0' 
     3.0 mantissaBits  hexPrintString -> '8000000000000'
     4.0 mantissaBits  hexPrintString -> '0'
     5.0 mantissaBits  hexPrintString  -> '4000000000000'
     10.0 mantissaBits  hexPrintString -> '4000000000000'
     0.1 mantissaBits  hexPrintString -> '999999999999A'
     0.3 mantissaBits  hexPrintString -> '3333333333333'

     10.0 asShortFloat mantissaBits hexPrintString  -> '200000'
     10.0 asLongFloat mantissaBits hexPrintString   -> 'A000000000000000'

     10.0 mantissaWithHiddenBits hexPrintString               -> '14000000000000'
     10.0 asShortFloat mantissaWithHiddenBits hexPrintString  -> 'A00000'
     10.0 asLongFloat mantissaWithHiddenBits hexPrintString   -> 'A000000000000000'
     0.3 asShortFloat mantissaBits    -> 1677722               16r19999A
     0.3 asLongFloat mantissaBits     -> 29514790517935282176  16r19999999999999800

mantissaWithHiddenBits

extract a float's mantissaBits (incl. any hidden bit).
I.e. this returns the denormalized mantissaBits

Usage example(s):

     0.0 mantissaBits                    0
     0.0 mantissaWithHiddenBits          0

     1.0 mantissaBits  hexPrintString -> '0' 
     1.0 mantissaWithHiddenBits hexPrintString -> '10000000000000' 

     2.0 mantissaBits  hexPrintString -> '0'
     2.0 mantissaWithHiddenBits  hexPrintString -> '10000000000000'

     0.1 mantissaBits  hexPrintString -> '999999999999A'
     0.1 mantissaWithHiddenBits  hexPrintString -> '1999999999999A'
     0.3 mantissaBits  hexPrintString -> '3333333333333'
     0.3 mantissaWithHiddenBits  hexPrintString -> '13333333333333'

     10.0 mantissaWithHiddenBits hexPrintString               -> '14000000000000'   / 2r10100000000000000000000000000000000000000000000000000
     10.0 asShortFloat mantissaWithHiddenBits hexPrintString  -> 'A00000'           / 2r101000000000000000000000
     10.0 asLongFloat mantissaWithHiddenBits hexPrintString   -> 'A000000000000000' / 2r1010000000000000000000000000000000000000000000000000000000000000
     10.0 asQuadFloat mantissaWithHiddenBits hexPrintString   -> 'A000000000000000' / 2r1010000000000000000000000000000000000000000000000000000000000000
     0.3 asShortFloat mantissaBits    -> 1677722               16r19999A
     0.3 asLongFloat mantissaBits     -> 29514790517935282176  16r19999999999999800

     Float fminDenormalized mantissaWithHiddenBits

nextFloat

answer the next representable float after myself

Usage example(s):

     (1.0 nextFloat) storeString     
     (1.0 asShortFloat nextFloat) storeString     
     (67329.234 nextFloat) storeString
     (67329.234 asShortFloat nextFloat) storeString
     (10000000000.0 nextFloat) storeString
     (10000000000.0 asShortFloat nextFloat) storeString
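
For doubles, the neighbouring representable values can be cross-checked with math.nextafter (Python 3.9 or later). This is a Python sketch, not ST/X code:

```python
import math

up = math.nextafter(1.0, math.inf)      # next representable double above 1.0
down = math.nextafter(1.0, -math.inf)   # next representable double below 1.0
print(repr(up), repr(down))
```

The step below 1.0 is half the step above it, because 1.0 sits on a power-of-two boundary.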

nextFloat: nUlps

answer the next representable float nUlps after myself

** This method must be redefined in concrete classes (subclassResponsibility) **

numBitsInExponent

answer the number of bits in the exponent
11 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
8 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
15 for quad floats:
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
19 for octuple floats:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
other for LargeFloats

numBitsInMantissa

answer the number of bits in the mantissa (the significand) of my instances;
any hidden bits are not counted.
11 for half precision:
seeeemmm mmmmmmmm
23 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
52 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
64 for longfloat precision (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
112 for longfloat precision (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
112 for quadfloat precision:
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...

Usage example(s):

     1.0 numBitsInMantissa
     1.0 asShortFloat numBitsInMantissa
     1.0 asLongFloat numBitsInMantissa

numHiddenBits

answer the number of bits in the integer part of the mantissa.
Most floating point formats are normalized to get rid of the extra bit
(i.e. except for LongFloats and LargeFloats,
instances are normalized to exclude any integer bit).

precision

previousFloat

answer the previous representable float before myself

Usage example(s):

     (1.0 previousFloat) storeString     
     (1.0 asShortFloat previousFloat) storeString     
     (67329.234 previousFloat) storeString
     (67329.234 asShortFloat previousFloat) storeString
     (10000000000.0 previousFloat) storeString
     (10000000000.0 asShortFloat previousFloat) storeString

radix

answer the radix of the exponent
Typically, but not required to be, this will be 2
(as floats are usually represented as IEEE binary floats)

size

redefined since reals are kludgy (ByteArray)

ulp

answer the distance between me and the next representable number;
One exception here: for fmax, the distance to the previous float is returned

Usage example(s):

     (1.0 nextFloat:1) storeString     
     (1.0 ulp) storeString     
     (10.0 nextFloat:1) storeString     
     (10.0 ulp) storeString     
     (-10.0 nextFloat:1) storeString     
     (-10.0 ulp) storeString     
     (-10.0 nextFloat:-1) storeString     

     (67329.234 nextFloat:1) storeString
     (67329.234 ulp) storeString
     (67329.234 asShortFloat nextFloat:1) storeString
     (67329.234 asShortFloat ulp) storeString
     Float NaN nextFloat:100000
     Float infinity nextFloat:100000

     1.0 ulp                         -> 2.22044604925031E-16
     10000000000000000000000.0 ulp   -> 2097152.0
     34.543 ulp storeString          -> '7.1054273576010019E-15'
     -34.543 ulp storeString         -> '7.1054273576010019E-15'
     Float NaN ulp                   -> nan
     0.0 ulp                         -> 4.94065645841247E-324
     0.0 asShortFloat ulp            -> 1.401298e-45
     Float infinity ulp              -> nan
     Double fmax previousFloat ulp   -> 1.99584030953472E+292
     Double fmax ulp                 -> 1.99584030953472E+292
     Double fmin ulp                 -> 4.94065645841247e-324
     Double NaN ulp                  -> nan
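
Several of the double values above can be cross-checked with math.ulp (Python 3.9 or later). This is a Python sketch, not ST/X code; non-finite arguments are handled differently by different environments and are left out except for NaN.

```python
import math
import sys

ulps = {
    "1.0": math.ulp(1.0),
    "1e22": math.ulp(1e22),
    "34.543": math.ulp(34.543),
    "-34.543": math.ulp(-34.543),        # the sign is ignored
    "0.0": math.ulp(0.0),                # smallest denormalized double
    "fmax": math.ulp(sys.float_info.max) # distance to the previous float
}
for name, value in ulps.items():
    print(name, value)
```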

special access

partValues: aBlock

invoke aBlock with sign, exponent and abs(mantissa)

Usage example(s):

     1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     -1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     -2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].

     1.0 asShortFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     1.0 asLongFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     1.0 asLargeFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].

testing

isFinite

return true, if the receiver is a finite float (not NaN and not +/-INF)

** This method must be redefined in concrete classes (subclassResponsibility) **

isFloat

return true, if the receiver is some kind of floating point number;
true is returned here.
Same as #isLimitedPrecisionReal, but a better name ;-)

isInfinite

return true, if the receiver is an infinite float (+Inf or -Inf).
These are not created by ST/X float operations (they raise an exception);
however, inline C-code could produce them.

Usage example(s):

        1.0 isInfinite
        (0.0 uncheckedDivide: 0.0) isInfinite
        (1.0 uncheckedDivide: 0.0) isInfinite

isLimitedPrecisionReal

return true, if the receiver is some kind of limited precision real (i.e. floating point) number;
true is returned here - the method is redefined from Object.

isNaN

return true, if the receiver is an invalid float (NaN - not a number).
These are usually not created by ST/X float operations (they raise an exception);
however, inline C-code or proceeded exceptions or reading from a stream
could produce them.

** This method must be redefined in concrete classes (subclassResponsibility) **

isNegativeZero

return true, if the receiver is negative zero;
many systems have two floating point zeros (0.0 and -0.0)

Usage example(s):

     0.0 asLongFloat isNegativeZero     
     -0.0 asLongFloat isNegativeZero       
     -1.0 asLongFloat isNegativeZero       
     1.0 asLongFloat isNegativeZero       

     0.0 asLargeFloat isNegativeZero     
     -0.0 asLargeFloat isNegativeZero
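
Because 0.0 and -0.0 compare equal, the sign has to be inspected separately. A Python sketch (not ST/X code) with a hypothetical helper is_negative_zero:

```python
import math

def is_negative_zero(x):
    """True only for -0.0; copysign exposes the sign bit that == ignores."""
    return x == 0.0 and math.copysign(1.0, x) < 0

for x in (0.0, -0.0, -1.0, 1.0):
    print(x, is_negative_zero(x))
```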

numberOfBits

return the size (in bits) of the real;
typically, this is 64 for Floats and 32 for ShortFloats,
but who knows ...

** This method must be redefined in concrete classes (subclassResponsibility) **

positive

return true if the receiver is greater or equal to zero (not negative)

sign

return the sign of the receiver (-1, 0 or 1)

Usage example(s):

     -1.0 sign
     -0.0 sign
     1.0 sign
     0.0 sign
     Infinity infinity sign
     Infinity infinity negated sign

truncation & rounding

ceilingAsFloat

for protocol compatibility with floats;
returns the smallest integer which is greater or equal to the receiver as a float

Usage example(s):

     0.4 asLongFloat ceilingAsFloat

floorAsFloat

for protocol compatibility with floats;
returns the receiver truncated towards negative infinity as a float

Usage example(s):

     0.4 asLongFloat floorAsFloat

integerAndFractionParts

return the integer and the fraction part of the receiver as a pair
of floats (i.e. the result of the modf function).
Adding the parts gives the original value
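
The modf behaviour referred to above can be tried in Python (not ST/X code). Note that Python's math.modf returns the fractional part first, which may differ from the order used by this method.

```python
import math

frac_pos, int_pos = math.modf(12.5)
frac_neg, int_neg = math.modf(-12.5)   # both parts carry the sign of the argument
print(int_pos, frac_pos)
print(int_neg, frac_neg)
```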

integerPart

return a float with value from digits before the decimal point
(i.e. the truncated value)

Usage example(s):

     1234.56789 integerPart  
     1.2345e6 integerPart
     12.5 integerPart
     -12.5 integerPart
     (5/3) integerPart
     (-5/3) integerPart
     (5/3) truncated
     (-5/3) truncated

roundedAsFloat

for protocol compatibility with floats;
returns the receiver rounded to the nearest integer as a float

truncatedAsFloat

return the receiver truncated towards zero as a long float.
This is much like #truncated, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another
float-operation.

Usage example(s):

     0.4 asLongFloat truncatedAsFloat

truncatedToPrecision

truncates to the precision of the float.
This is slightly different from truncated.
Taking for example 1e32,
the printed representation will be 1e32,
but the actual value, when truncating to an integer
would be 100000003318135351409612647563264.

This is due to the inaccuracy in the least significant bits,
and the way the print-converter compensates for this.
This method tries to generate an integer value which corresponds
to what is seen in the float's printString.

Here, a slow fallback (generating and rescanning the printString)
is provided, which should work on any float number.
Specialized versions in subclasses may be added for more performance
(however, this is probably only used rarely)

Usage example(s):

     1e32 asShortFloat truncated
     1e32 asShortFloat truncatedToPrecision
     1.234e10 asShortFloat truncatedToPrecision
     1234e-1 asShortFloat truncatedToPrecision

     1e32 truncated
     1e32 truncatedToPrecision
     1.234e10 truncatedToPrecision
     1234e-1 truncatedToPrecision

     1e32 asLongFloat truncated
     1e32 asLongFloat truncatedToPrecision
     1.234e10 asLongFloat truncatedToPrecision
     1234e-1 asLongFloat truncatedToPrecision
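
The slow fallback described above (generate the printed representation, then rescan it) can be sketched for doubles in Python (not ST/X code). The function name truncated_to_precision only mirrors the selector, and Python's repr is assumed to play the role of the printString.

```python
from decimal import Decimal

def truncated_to_precision(value):
    """Truncate the decimal value that is printed, not the exact binary value."""
    return int(Decimal(repr(value)))

plain = int(1e32)                       # exact binary value as an integer
printed = truncated_to_precision(1e32)  # what the printed form suggests
print(plain)
print(printed)
```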

visiting

acceptVisitor: aVisitor with: aParameter: dispatch for visitor pattern; send #visitFloat:with: to aVisitor.

ST/X 7.7.0.0; WebServer 1.702 at 20f6060372b9.unknown:8081; Sun, 10 Aug 2025 12:55:01 GMT
