Class: LimitedPrecisionReal


Inheritance:

   Object
   |
   +--Magnitude
      |
      +--ArithmeticValue
         |
         +--Number
            |
            +--LimitedPrecisionReal
               |
               +--Float
               |
               +--LongFloat
               |
               +--QDouble
               |
               +--QuadFloat
               |
               +--ShortFloat

Package:
stx:libbasic
Category:
Magnitude-Numbers
Version:
rev: 1.132 date: 2019/07/19 15:26:42
user: cg
file: LimitedPrecisionReal.st directory: libbasic
module: stx stc-classLibrary: libbasic
Author:
Claus Gittinger

Description:


Abstract superclass for limited-precision floating point numbers of any size (e.g. IEEE floats and doubles).

Short summary for beginners (find details in wikipedia):
========================================================

Floating point numbers are represented with a mantissa and an exponent, and the number's value is: 
    mantissa * (2 raisedTo: exponent) 
with (1 > mantissa >= 0.5) for nonzero values, and the exponent adjusted as required for the mantissa to be in that range
(so called ''normalized'')

therefore,
    13 asFloat mantissa -> 0.8125
    13 asFloat exponent ->  4  
    0.8125 * (2 raisedTo:4) -> 13

and:    
    104 asFloat mantissa -> 0.8125
    104 asFloat exponent -> 7  
    0.8125 * (2 raisedTo:7) -> 104

and:    
    0.1 mantissa -> 0.8
    0.1 exponent -> -3  
    0.8 * (2 raisedTo:-3) -> 0.1

however:    
    (1 / 3.0) mantissa -> 0.666666666666667
    (1 / 3.0) exponent -> -1  
    0.666666666666667 * (2 raisedTo:-1) -> 0.333333333333333
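
The relation above can be checked in a workspace. The following is a minimal sketch that uses only the #mantissa and #exponent accessors shown above and #timesTwoPower: from the instance protocol; the variable names are illustrative.

```smalltalk
| x m e rebuilt |

x := 104 asFloat.
m := x mantissa.          "normalized mantissa"
e := x exponent.          "power-of-two exponent"

"scaling by a power of two is exact, so this should restore x"
rebuilt := m timesTwoPower: e.

Transcript showCR: m printString , ' * 2^' , e printString , ' = ' , rebuilt printString.
Transcript showCR: (rebuilt = x) printString.
```

Because the mantissa is only scaled by a power of two, no rounding is involved in the reconstruction; rounding happens earlier, when a value such as (1 / 3.0) is stored.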


Danger in using Floats:
=======================

Beginners seem to forget (or never learn?) that floating point numbers are always APPROXIMATIONs of some value.
Never use them when exact results are needed (e.g. when computing money!)
Take a look at the FixedPoint class for that.
See also 'Float comparison' below.
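
A small sketch of why this matters, assuming Floats are IEEE doubles as described below: sums of decimal fractions that look exact are not, whereas Fraction arithmetic (see 'Related information') stays exact. The values are made up for illustration.

```smalltalk
| floatSum exactSum |

floatSum := 0.1 + 0.2.                "each operand is already an approximation"
exactSum := (1/10) + (2/10).          "Fractions are exact rationals"

Transcript showCR: (floatSum = 0.3) printString.        "false with IEEE doubles"
Transcript showCR: (exactSum = (3/10)) printString.     "true"
```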


The Float/Double confusion in ST/X:
===================================

Due to historic reasons, ST/X's Floats are what Doubles are in VisualWorks.

The reason is that in some Smalltalks, double floats are called Float, and no single float exists (VSE, V'Age),
whereas in others, there are both Float and Double classes (VisualWorks).
In order to allow code from both families to be loaded into ST/X without a missing class error, and without
losing precision, we decided to use IEEE doubles as the internal representation of Float 
and make Double an alias to it.
This should work for either family (except for the unexpected additional precision in some cases).

If you really only want single precision floating point numbers, use ShortFloat instances.
But be aware that there is usually no advantage (neither in memory usage, due to memory alignment restrictions,
nor in speed), as today's CPUs are just as fast doing double precision operations.
(There might be a noticeable difference when doing bulk operations, and you should consider using FloatArray for those).


Hardware supported precisions
=============================

The only really portable sizes are IEEE-single and IEEE-double floats (i.e. ShortFloat and Float instances).
These are supported on all architectures.
Some do provide an extended precision floating point number;
however, the downside is that CPU architects did not agree on a common format and precision: 
some use 80 bits, others 96 and others even 128.
See the comments in the LongFloat class for more details.
We recommend using Float (i.e. IEEE doubles) unless extended precision is absolutely required,
and taking care of machine dependencies in the code otherwise.
For higher precision needs, you may also try the new QDouble class, which gives you >200 bits (60 digits) 
of precision on all machines (at a noticeable performance price, though).
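
To see what the current machine actually provides, the class-side queries documented under 'Class protocol' can be printed side by side. This is only a sketch; the output depends on the platform, especially for LongFloat.

```smalltalk
"print mantissa bits and decimal digits for the hardware supported classes"
{ ShortFloat . Float . LongFloat } do:[:cls |
    Transcript showCR: cls name , ': '
                     , cls precision printString , ' bits, about '
                     , cls decimalPrecision printString , ' decimal digits'
].
```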


Range and Precision of Storage Formats:
=======================================

  Format |   Class    |   Array Class   | Bits / Significant  | Smallest Pos Number | Largest Pos Number | Significant Digits
         |            |                 |      (Binary)       |                     |                    |     (Decimal)
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  half   |     --     | HalfFloatArray  |    16 / 11          |  6.10...  x 10-5    | 6.55...  x 10+4    |      3.3
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  single | ShortFloat | FloatArray      |    32 / 24          |  1.175... x 10-38   | 3.402... x 10+38   |      6-9
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  double | Float      | DoubleArray     |    64 / 53          |  2.225... x 10-308  | 1.797... x 10+308  |     15-17
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
  double | LongFloat  |     --          |   128 / 113         |  3.362... x 10-4932 | 1.189... x 10+4932 |     33-36
  ext    |            |                 |                     |                     |                    |
  (SPARC)|            |                 |                     |                     |                    |
  -------+            |                 |---------------------+---------------------+--------------------+--------------------
  double |            |                 |    96 / 64          |  3.362... x 10-4932 | 1.189... x 10+4932 |     18-21
  ext    |            |                 |                     |                     |                    |
  (x86)  |            |                 |                     |                     |                    |
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | QDouble    |     --          |   256 / 212         |  2.225... x 10-308  | 1.797... x 10+308  |     >=60
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------
    --   | LargeFloat |     --          |     arbitrary       |  arbitrarily small  |  arbitrarily large |     arbitrary
  -------+------------+-----------------+---------------------+---------------------+--------------------+--------------------

HalfFloats are only supported in fixed array containers. 
This was added for OpenGL and other graphic libraries, which allow texture 
and vertex data to be passed quickly in that format (see http://www.opengl.org/wiki/Small_Float_Formats).

Long- and LargeFloat are not supported as array containers.
These formats are seldom used for bulk data.

QDoubles are special soft floats; slower in performance, but providing 4 times the precision of regular doubles.

To see the differences in precision (the '*' marks the position where the printed digits start to deviate from the correct value):
    
    '%60.58f' printf:{ 1 asShortFloat exp } -> '2.718281828459045*090795598298427648842334747314453125'          (32 bits)
    '%60.58f' printf:{ 1 asFloat exp }      -> '2.718281828459045*090795598298427648842334747314453125'          (64 bits)
    '%60.58f' printf:{ 1 asLongFloat exp }  -> '2.718281828459045235*4281681079939403389289509505033493041992'   (only 80 valid bits on x86)
    
    '%60.58f' printf:{ 1 asQDouble exp }    -> '2.71828182845904523536028747135266249775724709369995957496698'   (>200 bits)

    correct value is:                           2.71828182845904523536028747135266249775724709369995957496696762772407663035354759457138217852516642742746

Bulk Containers:
================
If you have a vector or matrix (and especially: large ones) of floating point numbers, the well known
Array is a very inefficient choice. The reason is that it keeps pointers to each of its elements, and each element
(if it is a float) is itself stored somewhere in the object memory.
Thus, there is both a space overhead (every float object has an object header, for class and other information), and
also a performance overhead (extra indirection, cache misses and alignment inefficiencies).
For this, the bulk numeric containers are provided, which keep the elements unboxed and properly aligned.
Use them for matrices and large numeric vectors. They also provide some optimized bulk operation methods,
such as adding, multiplying etc.
Take a look at FloatArray, DoubleArray, HalfFloatArray etc.
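
The difference can be sketched as follows. The example assumes that DoubleArray answers to the usual indexed collection protocol (#new:, #at:put:, #do:); the size and the values are arbitrary.

```smalltalk
| n boxed unboxed sum |

n := 100000.

"general Array: every slot is a pointer to a Float object stored elsewhere"
boxed := Array new: n.
1 to: n do: [:i | boxed at: i put: i asFloat].

"DoubleArray: the IEEE doubles are stored inline, without per-element headers"
unboxed := DoubleArray new: n.
1 to: n do: [:i | unboxed at: i put: i asFloat].

"both are used the same way by client code"
sum := 0.0.
unboxed do: [:each | sum := sum + each].
Transcript showCR: sum printString.
```

Since both containers are accessed in the same way, switching an existing Array of floats to a bulk container is usually a local change at the place where the collection is created.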


Comparing Floats:
=================
Due to rounding errors (usually in the last bit(s)), you should not compare two floating point numbers
using the #= operator. For example, the value 0.1 cannot be represented exactly as a finite sum of powers-of-two fractions,
and will therefore always be an approximation with an error of up to half a bit in the last bit of the mantissa.
Usually, the print functions take this into consideration and return a (faked) '0.1'.
However, this half bit error may accumulate; for example, when multiplying that by 0.1 and then by 100, 
the error may get large enough to be no longer swept under the rug by the print function, 
and you may get something like '0.9999999999999' from it.

Also, when comparing the result against a proper 1.0 (which is exactly representable, being a power of 2), 
the comparison will fail,
i.e. (0.1 * 0.1 * 100 ~= 1.0) and (0.1 * 0.1 * 100 - 1.0) ~= 0.0
This often confuses non-computer scientists (and occasionally even some of those).

For this, you should always provide an epsilon value when comparing two non-integer numbers. 
The epsilon value is the distance by which you accept two numbers to be apart and still be considered equal. 
Effectively, the epsilon asks: ''are those nearer than this epsilon?''.

Now we could ask ''is the delta between two numbers smaller than 0.00001?'',
and get a reasonable answer for big numbers. But what if we compare two tiny numbers?
Then a reasonable epsilon must also be much smaller!

Actually, the epsilon should always be computed dynamically depending on the two values compared.
That is what the #isAlmostEqualTo:nEpsilon: method does for you. It does not take an absolute epsilon,
but instead the number of distinct floating point numbers that the two compared floats may be apart.
That is: the number of actually representable numbers between those two. 
Effectively, that is the difference between the two mantissas, 
when the numbers are scaled to the same exponent, taking the number of mantissa bits into account.
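
The three kinds of comparison discussed above can be put side by side. This is a sketch only: the tolerance factor 4 is an arbitrary choice for illustration, and the hand-written relative epsilon is not claimed to be how #isAlmostEqualTo:nEpsilon: is implemented.

```smalltalk
| a b relativeEps |

a := 0.1 * 0.1 * 100.
b := 1.0.

"exact comparison: false with IEEE doubles, as explained above"
Transcript showCR: (a = b) printString.

"relative epsilon, scaled by the magnitude of the operands"
relativeEps := 4 * Float epsilon * (a abs max: b abs).
Transcript showCR: ((a - b) abs <= relativeEps) printString.

"the same idea, counted in representable numbers instead of a distance"
Transcript showCR: (a isAlmostEqualTo: b nEpsilon: 4) printString.
```

Scaling the epsilon by the operands' magnitude is what keeps the test meaningful for both very large and very small values.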


Related information:

    Fraction
    FixedPoint

Class protocol:

class initialization
o  initialize
initialize ANSI compliant float globals

usage example(s):

     self initialize

constants & defaults
o  NaN
return the constant NaN (not a Number) in my representation.
Here, based on the assumption that division of zero by zero generates a NaN
(which is defined as such in the IEEE standard).
If a subclass does not, it needs to generate a NaN differently.

usage example(s):

      ShortFloat NaN  
      Float NaN       
      LongFloat NaN   
      LargeFloat NaN   

o  computeEpsilon
compute the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)

usage example(s):

      Float radix
      Float precision
      
      Float computeEpsilon    
      ShortFloat computeEpsilon  
      LongFloat computeEpsilon   
      QDouble epsilon   

o  emax
return the largest exponent

usage example(s):

     Float emax
     ShortFloat emax

o  emin
return the smallest exponent

** This method raises an error - it must be redefined in concrete classes **

o  epsilon
return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)

usage example(s):

     Float epsilon       -> 2.22044604925031E-16
     ShortFloat epsilon  -> 1.192093e-07
     LongFloat epsilon   -> 1.084202172485504434E-19
     QDouble epsilon     -> 1.21543267145725E-63
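
What the epsilon means in practice can be sketched as follows, assuming IEEE doubles with the default round-to-nearest-even mode.

```smalltalk
| eps |

eps := Float epsilon.

"adding a full epsilon to 1.0 reaches the next representable Float"
Transcript showCR: ((1.0 + eps) > 1.0) printString.

"adding only half an epsilon is rounded away again"
Transcript showCR: ((1.0 + (eps / 2)) = 1.0) printString.
```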

o  fmax
The largest value allowed by instances of this class.

usage example(s):

     Float fmax      
     ShortFloat fmax 
     LongFloat fmax  
     QDouble fmax 

o  fmin
The smallest value allowed by instances of this class.

** This method raises an error - it must be redefined in concrete classes **

o  infinity
return an instance of myself which represents positive infinity (for my instances).
Warning: do not compare equal against infinities;
instead, check using isFinite or isInfinite

usage example(s):

      ShortFloat infinity  
      Float infinity       
      LongFloat infinity   
      LargeFloat infinity   

o  maxSmallInteger
answer the largest possible SmallInteger value as instance of myself

usage example(s):

     Float maxSmallInteger.
     LongFloat maxSmallInteger.
     ShortFloat maxSmallInteger.
     QDouble maxSmallInteger.

o  negativeInfinity
return an instance of myself which represents negative infinity (for my instances).
Warning: do not compare equal against infinities;
instead, check using isFinite or isInfinite

usage example(s):

      ShortFloat negativeInfinity   
      Float negativeInfinity       
      LongFloat negativeInfinity   
      LargeFloat negativeInfinity   
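
Following the warning given for both infinities, a test for an infinite value is better written with the testing protocol than with #=. A minimal sketch:

```smalltalk
| posInf negInf |

posInf := Float infinity.
negInf := Float negativeInfinity.

"use the testing messages instead of comparing for equality"
Transcript showCR: posInf isInfinite printString.
Transcript showCR: negInf isInfinite printString.
Transcript showCR: posInf isFinite printString.
Transcript showCR: 1.0 isFinite printString.
```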

instance creation
o  fromInteger: anInteger
return a float with anInteger's value.
Since floats have a limited precision, you usually lose bits when doing this
with a large integer
(i.e. when the number of digits exceeds the floating point number's precision;
see Float decimalPrecision, LongFloat decimalPrecision).

usage example(s):

     ShortFloat fromInteger:2
     12345678901234567890 asShortFloat            

     1234567890 asFloat                     
     1234567890 asFloat asInteger                    
     -1234567890 asFloat asInteger                    

     12345678901234567890 asFloat storeString            
     12345678901234567890 asFloat asInteger   
     -12345678901234567890 asFloat asInteger   

     12345678901234567890 asLongFloat           
     12345678901234567890 asLongFloat asInteger 
     -12345678901234567890 asLongFloat asInteger 

     123456789012345678901234567890 asLongFloat           
     123456789012345678901234567890 asLongFloat asInteger  
     -123456789012345678901234567890 asLongFloat asInteger  

     1234567890123456789012345678901234567890 asLongFloat           
     1234567890123456789012345678901234567890 asLongFloat asInteger  
     -1234567890123456789012345678901234567890 asLongFloat asInteger

     'this test is on 65 bits'.
     self assert: 16r1FFFFFFFFFFFF0801 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
     'this test is on 64 bits'.
     self assert: 16r1FFFFFFFFFFFF0802 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
     'nearest even is upper'.
     self assert: 16r1FFFFFFFFFFF1F800 asDouble = 16r1FFFFFFFFFFF20000 asDouble.
     'nearest even is lower'.
     self assert: 16r1FFFFFFFFFFFF0800 asDouble = 16r1FFFFFFFFFFFF0000 asDouble.
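
The smallest case where bits are lost in a Float (53 significant bits) can be sketched like this; the variable names are illustrative.

```smalltalk
| big roundTrip |

big := (2 raisedTo: 53) + 1.          "needs 54 significant bits"
roundTrip := big asFloat asInteger.

Transcript showCR: big printString.
Transcript showCR: roundTrip printString.
Transcript showCR: (roundTrip = big) printString.     "false: the lowest bit was rounded away"
```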

o  fromLimitedPrecisionReal: anLPReal
return a float with anLPReal's value.
You might lose bits when doing this.
Slow fallback.

o  fromNumerator: numerator denominator: denominator
Create a limited precision real from a Rational.
This version will answer the nearest floating point value,
according to the IEEE 754 round-to-nearest-even default mode

usage example(s):

        Time millisecondsToRun:[
            1000000  timesRepeat:[
                Float fromNumerator:12345678901234567890 denominator:987654321
            ].
        ]

        |fraction|
        fraction := 12345678901234567890/987654321.
        Time millisecondsToRun:[
            1000000  timesRepeat:[
                fraction asFloat
            ].
        ]

o  new: aNumber
catch this message - not allowed for floats/doubles

o  readFrom: aStringOrStream onError: exceptionBlock
read a float from a string

usage example(s):

     Float readFrom:'.1'
     Float readFrom:'0.1'
     Float readFrom:'0'

     ShortFloat readFrom:'.1'
     ShortFloat readFrom:'0.1'
     ShortFloat readFrom:'0'

     LongFloat readFrom:'.1'
     LongFloat readFrom:'0.1'
     LongFloat readFrom:'0'

     LimitedPrecisionReal readFrom:'bla' onError:nil
     Float readFrom:'bla' onError:nil
     ShortFloat readFrom:'bla' onError:nil
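
Instead of nil, the exception argument can be a block that supplies a fallback value. A minimal sketch, in which the default of 0.0 is an arbitrary choice for illustration:

```smalltalk
| value |

"unparsable input: the block's value is returned"
value := Float readFrom: 'bla' onError: [ 0.0 ].
Transcript showCR: value printString.

"valid input: the block is not evaluated"
value := Float readFrom: '0.1' onError: [ 0.0 ].
Transcript showCR: value printString.
```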

queries
o  decimalPrecision
return the number of valid decimal digits

usage example(s):

     ShortFloat decimalPrecision -> 7
     Float decimalPrecision      -> 16
     LongFloat decimalPrecision  -> 19

o  defaultPrintPrecision
return the number of decimal digits printed by default

usage example(s):

     ShortFloat defaultPrintPrecision 
     Float defaultPrintPrecision      
     LongFloat defaultPrintPrecision  

o  denormalized
Return whether the instances of this class can
represent values in denormalized format.

o  exactDecimalPrecision
return the exact number of decimal digits

usage example(s):

     ShortFloat exactDecimalPrecision -> 7.224719895935548004
     Float exactDecimalPrecision  -> 15.95458977019100184      
     LongFloat exactDecimalPrecision -> 19.26591972249479468

o  hasSharedInstances
return true if this class has shared instances, that is, instances
with the same value are identical.
Although not really shared, floats should be treated
so, to be independent of the implementation of the arithmetic methods.

o  isAbstract
Return if this class is an abstract class.
True is returned for LimitedPrecisionReal here; false for subclasses.

usage example(s):

     1.0 class isAbstract

o  isIEEEFormat
return true, if this machine represents floats in IEEE format.
Currently, no support is provided for non-ieee machines
to convert their floats into this (which is only relevant,
if such a machine wants to send floats as binary to some other
machine).
Machines with non-IEEE format are VAXes and IBM370-type systems
(among others). Today, most systems use IEEE format floats.

o  numBitsInExponent
return the number of bits in the exponent

** This method raises an error - it must be redefined in concrete classes **

o  numBitsInIntegerPart
answer the number of bits in the integer part of the mantissa.
Most floating point formats are normalized to get rid of the extra bit.

o  numBitsInMantissa
return the number of bits in the mantissa
(typically 1 less than the precision due to the hidden bit)

** This method raises an error - it must be redefined in concrete classes **

o  precision
return the number of valid mantissa bits

usage example(s):

      HalfFloatArray precision  
      ShortFloat precision  
      Float precision       
      LongFloat precision   
      QDouble precision  

o  radix
return the radix (base)

** This method raises an error - it must be redefined in concrete classes **


Instance protocol:

accessing
o  at: index
redefined to prevent access to individual bytes in a real.

o  at: index put: aValue
redefined to prevent access to individual bytes in a real

arithmetic
o  * aNumber
return the product of the receiver and the argument.

o  + aNumber
return the sum of the receiver and the argument, aNumber

o  - aNumber
return the difference of the receiver and the argument, aNumber

o  / aNumber
return the quotient of the receiver and the argument, aNumber

o  // aNumber
return the integer quotient of dividing the receiver by aNumber with
truncation towards negative infinity.

o  ceiling

o  floor

o  timesTwoPower: anInteger
multiply self by a power of two.
The implementation takes care of preserving the class and avoiding overflow/underflow.
Thanks to Nicolas Cellier for this code.

usage example(s):

     (1 asShortFloat timesTwoPower: 3) class = ShortFloat.
     (1 asLongFloat timesTwoPower: 1024).
     (1 asFloat timesTwoPower: -1024) timesTwoPower: 1024.
     (1 asLongFloat timesTwoPower: -1024) timesTwoPower: 1024.

     (2.0 asShortFloat timesTwoPower: -150) timesTwoPower: 150    
     (2.0 asLongFloat timesTwoPower: -150) timesTwoPower: 150   
     (2.0 asFloat timesTwoPower: -150) timesTwoPower: 150       

     (2.0 asShortFloat timesTwoPower: -149) timesTwoPower: 149  
     (2.0 asLongFloat timesTwoPower: -149) timesTwoPower: 149    
     (2.0 asFloat timesTwoPower: -149) timesTwoPower: 149        

     Time millisecondsToRun:[
        1000000 timesRepeat:[
            (2.0 timesTwoPower: 150)
        ]
     ]  
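
For moderate exponents, #timesTwoPower: is expected to agree with an ordinary multiplication by a power of two, while keeping the receiver's class. A small sketch with arbitrary values:

```smalltalk
"same value as an explicit multiplication"
Transcript showCR: ((3.0 timesTwoPower: 4) = (3.0 * (2 raisedTo: 4))) printString.

"the class of the receiver is preserved"
Transcript showCR: (1 asShortFloat timesTwoPower: 3) class printString.
Transcript showCR: (1 asFloat timesTwoPower: 3) class printString.
```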

bytes access
o  digitBytes
answer the float's digit bytes in IEEE format.
Use the native machine byte ordering.

usage example(s):

        Float pi digitBytes
        ShortFloat pi digitBytes

o  digitBytesMSB: msb
answer the float's digit bytes in IEEE format.
If msb == true, use MSB byte order, otherwise LSB byte order.

usage example(s):

        Float pi digitBytesMSB:false
        Float pi digitBytesMSB:true
        ShortFloat pi digitBytesMSB:false
        ShortFloat pi digitBytesMSB:true

coercing & converting
o  asFloat

o  asFraction
Answer a rational number (Integer or Fraction) representing the receiver.
This conversion uses the continued fraction method to approximate
a floating point number.
In contrast to #asTrueFraction, which returns exactly the value of the float,
this rounds in the last significant bit of the floating point number.

usage example(s):

     1.1 asFraction      
     1.2 asFraction      
     0.3 asFraction   
     0.5 asFraction  
     (1/5) asFloat asFraction  
     (1/8) asFloat asFraction  
     (1/13) asFloat asFraction 
     (1/10) asFloat asFraction 
     (1/10) asFloat asTrueFraction asFixedPoint scale:20 
     3.14159 asFixedPoint scale:20        
     3.14159 storeString       
     3.14159 asFraction asFloat storeString       
     1.3 asFraction            
     1.0 asFraction            
     1E6 asFraction            
     1E-6 asFraction            

o  asInteger
return an integer with same value - might truncate

usage example(s):

     12345.0 asInteger     
     1e15 asInteger        
     1e33 asInteger asFloat
     1e303 asInteger asFloat

o  asLargeFloat
return a large float with (approximately) my value.
If the largeFloat class is not present, a regular float is returned

o  asLargeFloatPrecision: n
return a large float with (approximately) my value.
If the largeFloat class is not present, a regular float is returned

usage example(s):

     1.0 asLargeFloatPrecision:10

o  asLimitedPrecisionReal
return a float of any precision with same value

o  asLongFloat

o  asQuadFloat
(comment from inherited method)
return a quadFloat with same value

o  asRational
Answer a Rational number--Integer or Fraction--representing the receiver.
Same as asFraction; provided for ST-80 compatibility.

usage example(s):

     1.1 asRational      
     1.2 asRational      
     0.3 asRational   
     0.5 asRational 
     (1/5) asFloat asRational
     (1/8) asFloat asRational  
     (1/13) asFloat asRational 
     3.14159 asRational        
     3.14159 asRational asFloat       
     1.3 asRational  
     1.0 asRational  

o  asShortFloat

o  asTrueFraction
Answer a fraction or integer that EXACTLY represents self,
an any-precision IEEE floating point number, consisting of:
numMantissaBits bits of normalized mantissa (i.e. with hidden leading 1-bit)
optional numExtraBits between mantissa and exponent (normalized flag for ext-real)
numExponentBits bits of 2s complement exponent
1 sign bit.
Taken from Float's asTrueFraction

usage example(s):

(result asFloat = self) ifFalse: [self error: 'asTrueFraction validation failed'].

usage example(s):

     0.3 asFloat asTrueFraction   
     0.3 asShortFloat asTrueFraction  
     0.3 asLongFloat asTrueFraction   

     1.25 asTrueFraction     
     1.25 asShortFloat asTrueFraction     
     0.25 asTrueFraction     
     -0.25 asTrueFraction    
     3e37 asTrueFraction     

     LongFloat NaN asTrueFraction              
     LongFloat infinity asTrueFraction          
     LongFloat negativeInfinity asTrueFraction 
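
The difference between #asFraction and #asTrueFraction is easiest to see with a value that has no exact binary representation. A sketch, assuming Floats are IEEE doubles:

```smalltalk
| f approx exact |

f := 0.1.
approx := f asFraction.         "rounded in the last significant bit"
exact := f asTrueFraction.      "the value actually stored in the Float"

Transcript showCR: approx printString.
Transcript showCR: exact printString.

"converting the exact fraction back must give the same Float"
Transcript showCR: (exact asFloat = f) printString.

"false: 0.1 is not exactly representable in binary"
Transcript showCR: (exact = (1/10)) printString.
```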

comparing
o  < aNumber
return true, if the argument is greater

double dispatching
o  differenceFromFraction: aFraction
sent when a fraction does not know how to subtract the receiver

o  productFromFraction: aFraction
sent when a fraction does not know how to multiply the receiver

o  quotientFromFraction: aFraction
Return the quotient of the argument, aFraction and the receiver.
Sent when aFraction does not know how to divide by the receiver.

o  sumFromFraction: aFraction
sent when a fraction does not know how to add the receiver

o  sumFromTimestamp: aTimestamp
I am to be interpreted as seconds, return the timestamp this number of seconds
after aTimestamp

usage example(s):

     Timestamp now sumFromTimestamp:aTimestamp   
     100.0 sumFromTimestamp:Timestamp now 

     |t1 t2|
     t1 := Timestamp now. 
     t2 := 1.5 sumFromTimestamp:t1.
     t1 inspect. t2 inspect.

inspecting
o  inspectorExtraAttributes
( an extension from the stx:libtool package )
extra (pseudo instvar) entries to be shown in an inspector.

printing
o  printStringScientific
return a 'user friendly' scientific printString.
Notice: this returns a Text object with emphasis.
Also: the returned string is not meant to be read back - purely for GUIs

usage example(s):

     1.23e14 printStringScientific

queries
o  decimalPrecision
Answer how many digits of accuracy this class supports

usage example(s):

     1.0 asFloat decimalPrecision
     1.0 asLongFloat decimalPrecision
     1.0 asShortFloat decimalPrecision
     1.0 asQDouble decimalPrecision
     1.0 asLargeFloat decimalPrecision

o  defaultNumberOfDigits
Answer how many digits of accuracy this class supports

usage example(s):

        Float new defaultNumberOfDigits
        LongFloat new defaultNumberOfDigits
        ShortFloat new defaultNumberOfDigits
        QDouble new defaultNumberOfDigits

o  exponent
extract a normalized float's exponent.
This is a fallback for systems which do not provide frexp in their math lib,
as well as for error reporting (NaN or Inf).
The returned value depends on the float-representation of
the underlying machine and is therefore highly unportable.
This is not for general use.
This assumes that the mantissa is normalized to
0.5 .. 1.0 and the float's value is mantissa * 2^exp

usage example(s):

Extract the sign and the biased exponent 

usage example(s):

     0.3 asFloat exponent  
     0.3 asShortFloat exponent  
     0.3 asLongFloat exponent  

     0.0 exponent2      0
     1.0 exponent2      1
     2.0 exponent2      2
     3.0 exponent2      2
     4.0 exponent2      3
     0.5 exponent2      0
     0.4 exponent2      -1
     0.25 exponent2     -1
     0.00000011111 exponent2  -23 

o  fractionalPart
This has been renamed to #fractionPart for ST80 compatibility.

extract the after-decimal fraction part.
the float's value is
float truncated + float fractionalPart

** This is an obsolete interface - do not use it (it may vanish in future versions) **

o  mantissa
extract a normalized float's mantissa.
This is a fallback for systems which do not provide frexp in their math lib,
as well as for error reporting (NaN or Inf).

usage example(s):

     0.3 asFloat mantissa  
     0.3 asShortFloat mantissa  
     0.3 asLongFloat mantissa  
     0.3 asQDouble mantissa  

o  numBitsInExponent
answer the number of bits in the exponent
11 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
8 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
other for LargeFloats

o  numBitsInIntegerPart
answer the number of bits in the integer part of the mantissa.
Most floating point formats are normalized to get rid of the extra bit.
(i.e. except for LongFloats and LargeFloats,
instances are normalized to exclude any integer bit)

o  numBitsInMantissa
answer the number of bits in the mantissa (any hidden bits are not counted).
52 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
23 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
64 for longfloat precision (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
112 for longfloat precision (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...

usage example(s):

     1.0 numBitsInMantissa
     1.0 asShortFloat numBitsInMantissa
     1.0 asLongFloat numBitsInMantissa

o  precision
return the number of valid mantissa bits.
Should be redefined in classes which allow per-instance precision specification

o  size
redefined since reals are kludgy (ByteArray)

special access
o  partValues: aBlock
invoke aBlock with sign, exponent and abs(mantissa)

usage example(s):

     1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     -1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     -2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].

     1.0 asShortFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     1.0 asLongFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
     1.0 asLargeFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].

testing
o  isFinite
(comment from inherited method)
return true, if the receiver is finite
i.e. it can be represented as a rational number.

** This method raises an error - it must be redefined in concrete classes **

o  isFloat
return true, if the receiver is some kind of floating point number;
true is returned here.
Same as #isLimitedPrecisionReal, but a better name ;-)

o  isInfinite
return true, if the receiver is an infinite float (Inf).
These are not created by ST/X float operations (they raise an exception);
however, inline C-code could produce them ...

usage example(s):

        1.0 isInfinite
        (0.0 uncheckedDivide: 0.0) isInfinite
        (1.0 uncheckedDivide: 0.0) isInfinite

o  isLimitedPrecisionReal
return true, if the receiver is some kind of limited precision real (i.e. floating point) number;
true is returned here - the method is redefined from Object.

o  isNaN
(comment from inherited method)
return true, if the receiver is an invalid float (NaN - not a number).

** This method raises an error - it must be redefined in concrete classes **

o  isNegativeZero
many systems have two floating point zeros (+0.0 and -0.0);
return true, if the receiver is the negative one

usage example(s):

     0.0 asLongFloat isNegativeZero     
     -0.0 asLongFloat isNegativeZero       
     -1.0 asLongFloat isNegativeZero       
     1.0 asLongFloat isNegativeZero       

     0.0 asLargeFloat isNegativeZero     
     -0.0 asLargeFloat isNegativeZero       
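
Under IEEE rules the two zeros compare equal, so #isNegativeZero is the way to tell them apart. A minimal sketch, assuming that the literal -0.0 is read as a negative zero, as in the examples above:

```smalltalk
| negZero |

negZero := -0.0.

"both zeros compare equal ..."
Transcript showCR: (negZero = 0.0) printString.

"... but can be distinguished by the testing message"
Transcript showCR: negZero isNegativeZero printString.
Transcript showCR: 0.0 isNegativeZero printString.
```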

o  isZero
return true, if the receiver is zero

o  numberOfBits
return the size (in bits) of the real;
typically, this is 64 for Floats and 32 for ShortFloats,
but who knows ...

** This method raises an error - it must be redefined in concrete classes **

o  positive
return true if the receiver is greater or equal to zero (not negative)

o  sign
return the sign of the receiver (-1, 0 or 1)

usage example(s):

        -1.0 sign
        -0.0 sign
        1.0 sign
        0.0 sign

truncation & rounding
o  ceilingAsFloat
0.4 asLongFloat ceilingAsFloat

o  floorAsFloat
0.4 asLongFloat floorAsFloat

o  roundedAsFloat

o  truncatedAsFloat
0.4 asLongFloat truncatedAsFloat

o  truncatedToPrecision
truncates to the precision of the float.
This is slightly different from truncated.
Taking for example 1e32 asShortFloat,
the printed representation will be 1e32,
but the actual value, when truncating to an integer,
would be 100000003318135351409612647563264.

This is due to the inaccuracy in the least significant bits,
and the way the print-converter compensates for this.
This method tries to generate an integer value which corresponds
to what is seen in the float's printString.

Here, a slow fallback (generating and rescanning the printString)
is provided, which should work on any float number.
Specialized versions in subclasses may be added for more performance
(however, this is probably only used rarely)

usage example(s):

     1e32 asShortFloat truncated
     1e32 asShortFloat truncatedToPrecision
     1.234e10 asShortFloat truncatedToPrecision
     1234e-1 asShortFloat truncatedToPrecision

     1e32 truncated
     1e32 truncatedToPrecision
     1.234e10 truncatedToPrecision
     1234e-1 truncatedToPrecision

     1e32 asLongFloat truncated
     1e32 asLongFloat truncatedToPrecision
     1.234e10 asLongFloat truncatedToPrecision
     1234e-1 asLongFloat truncatedToPrecision

visiting
o  acceptVisitor: aVisitor with: aParameter
dispatch for visitor pattern; send #visitFloat:with: to aVisitor.



ST/X 7.2.0.0; WebServer 1.670 at bd0aa1f87cdd.unknown:8081; Thu, 18 Apr 2024 00:40:28 GMT