Class: LimitedPrecisionReal
Object
|
+--Magnitude
   |
   +--ArithmeticValue
      |
      +--Number
         |
         +--LimitedPrecisionReal
            |
            +--AbstractIEEEFloat
            |
            +--Float
            |
            +--HalfFloat
            |
            +--LargeFloat
            |
            +--LongFloat
            |
            +--QDouble
            |
            +--RaisedNumber
            |
            +--ShortFloat
Package: stx:libbasic
Category: Magnitude-Numbers
Version: rev: 1.238 date: 2024/01/15 08:48:59
user: cg
file: LimitedPrecisionReal.st directory: libbasic
module: stx stc-classLibrary: libbasic
Abstract superclass for any-precision floating point numbers (e.g. IEEE floats and doubles).
Short summary for beginners (find details in wikipedia):
========================================================
Floating point numbers are represented with a sign,
a mantissa and an exponent, and the number's magnitude is:
mantissa * (2 raisedTo: exponent)
with (1 > mantissa >= 0.5) for any non-zero value, and the exponent adjusted as required for the mantissa to be in that range
(so-called 'normalized')
therefore,
13 asFloat mantissa -> 0.8125
13 asFloat exponent -> 4
0.8125 * (2 raisedTo:4) -> 13
and:
104 asFloat mantissa -> 0.8125
104 asFloat exponent -> 7
0.8125 * (2 raisedTo:7) -> 104
and:
0.1 mantissa -> 0.8
0.1 exponent -> -3
0.8 * (2 raisedTo:-3) -> 0.1
however:
(1 / 3.0) mantissa -> 0.666666666666667
(1 / 3.0) exponent -> -1
0.666666666666667 * (2 raisedTo:-1) -> 0.333333333333333
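The decomposition above can be checked by multiplying the parts back together. The following is a minimal workspace sketch that uses only the mantissa, exponent and raisedTo: messages shown above; the temporary variable names are illustrative.

```smalltalk
| x m e rebuilt |
x := 104 asFloat.
m := x mantissa.                 "0.8125, as listed above"
e := x exponent.                 "7, as listed above"
rebuilt := m * (2 raisedTo:e).   "0.8125 * 128"
rebuilt printOn:Transcript. Transcript cr.
(rebuilt = x) printOn:Transcript. Transcript cr.
```

Both 0.8125 and 128 are exactly representable, so the product is exact in this case. For a value such as (1 / 3.0), the printed mantissa is itself a rounded decimal.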
Danger in using Floats:
=======================
Beginners seem to forget (or never learn?) that floating point numbers
are always APPROXIMATIONs of some value.
Never use them when exact results are needed (e.g. when computing money!)
Take a look at the ScaledDecimal and FixedDecimal classes for that.
See also 'Float comparison' below.
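A small sketch of the difference between approximate and exact arithmetic. Plain Fractions are used here only as a stand-in for an exact number class; this is an assumption made for illustration, since the creation protocol of ScaledDecimal and FixedDecimal is not described in this document.

```smalltalk
"floats: 0.1, 0.2 and 0.3 are all approximations, and the sum picks up a rounding error"
(0.1 + 0.2 = 0.3) printOn:Transcript. Transcript cr.

"exact rationals: no rounding takes place"
((1/10) + (2/10) = (3/10)) printOn:Transcript. Transcript cr.
```

With IEEE doubles the first comparison answers false, whereas the second answers true.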
The Float/Double confusion in ST/X:
===================================
Due to historic reasons, ST/X's Floats are what Doubles are in VisualWorks.
The reason is that in some Smalltalks, double floats are called Float, and no single float exists (VSE, V'Age),
whereas in others, there are both Float and Double classes (VisualWorks).
In order to allow code from both families to be loaded into ST/X without a missing class error, and without
losing precision, we decided to use IEEE doubles as the internal representation of Float
and make Double an alias to it.
This should work for either family (except for the unexpected additional precision in some cases).
If you really only want single precision floating point numbers, use ShortFloat instances.
But be aware that there is usually no advantage (neither in memory usage, due to memory alignment restrictions,
nor in speed), as these days, the CPUs are just as fast doing double precision operations.
(There might be a noticeable difference when doing bulk operations, and you should consider using FloatArray for those).
Hardware supported precisions
=============================
The only really portable sizes are IEEE-single and IEEE-double floats (i.e. ShortFloat and Float instances).
These are supported on all architectures.
Some CPUs provide an extended precision floating point number,
however, the downside is that CPU-architects did not agree on a common format and precision:
some use 80 bits, others 96 and others even 128.
See the comments in the LongFloat class for more details.
We recommend using Float (i.e. IEEE doubles) unless another format is absolutely required,
and taking care of machine dependencies in the code otherwise.
For higher precision needs, you may also try the new QDouble class, which gives you >200 bits (60 digits)
of precision on all machines, or the software emulated QuadFloat or OctaFloat classes
(all come at a noticeable performance price, though).
For very high precision (actually: arbitrary), take a look at the LargeFloat class.
Range and Precision of Storage Formats:
=======================================
Format  | Class      | Array Class    | Bits / Significand | Smallest Pos Number | Largest Pos Number  | Significant Digits
        |            |                | (Binary)           |                     |                     | (Decimal)
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
half    | --         | HalfFloatArray | 16 / 11            | 6.10... x 10-5      | 6.55... x 10+4      | 3.3
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
single  | ShortFloat | FloatArray     | 32 / 24            | 1.175... x 10-38    | 3.402... x 10+38    | 6-9
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
double  | Float      | DoubleArray    | 64 / 53            | 2.225... x 10-308   | 1.797... x 10+308   | 15-17
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
double  | LongFloat  | --             | 128 / 113          | 3.362... x 10-4932  | 1.189... x 10+4932  | 33-36
extend. |            |                |                    |                     |                     |
(SPARC) |            |                |                    |                     |                     |
--------+            |                +--------------------+---------------------+---------------------+-------------------
double  |            |                | 96 / 64            | 3.362... x 10-4932  | 1.189... x 10+4932  | 18-21
extend. |            |                |                    |                     |                     |
(x86)   |            |                |                    |                     |                     |
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
--      | QDouble    | --             | 256 / 212          | 2.225... x 10-308   | 1.797... x 10+308   | >=60
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
--      | QuadFloat  | --             | 128 / 113          | 3.362... x 10-4932  | 1.189... x 10+4932  | 33-36
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
--      | OctaFloat  | --             | 256 / 237          | 3.271... x 10-78913 | 1.611... x 10+78913 | >=60
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
--      | LargeFloat | --             | arbitrary          | arbitrarily small   | arbitrarily large   | arbitrary
--------+------------+----------------+--------------------+---------------------+---------------------+-------------------
HalfFloats are only supported in fixed array containers.
This was added for OpenGL and other graphic libraries which allow for texture,
and vertex data to be passed quickly in that format (see http://www.opengl.org/wiki/Small_Float_Formats).
Long- and LargeFloat are not supported as array containers.
These formats are seldom used for bulk data.
QDoubles are special soft floats; slower in performance, but providing 4 times the precision of regular doubles.
To see the differences in precision:
'%60.58f' printf:{ 1 asShortFloat exp } -> '2.718281828459045*090795598298427648842334747314453125' (32 bits)
'%60.58f' printf:{ 1 asFloat exp } -> '2.718281828459045*090795598298427648842334747314453125' (64 bits)
'%60.58f' printf:{ 1 asLongFloat exp } -> '2.718281828459045235*4281681079939403389289509505033493041992' (only 80 valid bits on x86)
'%60.58f' printf:{ 1 asQDouble exp } -> '2.71828182845904523536028747135266249775724709369995957496698' (>200 bits)
correct value is: 2.71828182845904523536028747135266249775724709369995957496696762772407663035354759457138217852516642742746
Bulk Containers:
================
If you have a vector or matrix (and especially: large ones) of floating point numbers, the well known
Array is a very inefficient choice. The reason is that it keeps pointers to each of its elements, and each element
(if it is a float) is itself stored somewhere in the object memory.
Thus, there is both a space overhead (every float object has an object header, for class and other information), and
also a performance overhead (extra indirection, cache misses and alignment inefficiencies).
For this, the bulk numeric containers are provided, which keep the elements unboxed and properly aligned.
Use them for matrices and large numeric vectors. They also provide some optimized bulk operation methods,
such as adding, multiplying etc.
Take a look at FloatArray, DoubleArray, HalfFloatArray etc.
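The difference between a general Array and a bulk container can be sketched as follows. This assumes that DoubleArray answers the usual new:, at:put: and inject:into: collection protocol; the optimized bulk operation methods mentioned above are not named in this document and are therefore not used here.

```smalltalk
| boxed unboxed sum |
"general-purpose Array: every slot points to a separate Float object"
boxed := Array new:4.
"DoubleArray: the elements are kept as raw doubles inside the container"
unboxed := DoubleArray new:4.
1 to:4 do:[:i |
    boxed at:i put:i asFloat.
    unboxed at:i put:i asFloat.
].
"generic summation; works for either container"
sum := unboxed inject:0.0 into:[:acc :each | acc + each].
sum printOn:Transcript. Transcript cr.
```

Client code that only uses the common collection protocol does not need to change when an Array of floats is replaced by one of the bulk containers.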
Comparing Floats:
=================
Due to rounding errors (usually on the last bit(s)), you should not compare two floating point numbers
using the #= operator. For example, the value 0.1 cannot be represented as a finite sum of powers-of-two fractions,
and will therefore always be an approximation with an error of up to half a bit in the last bit of the mantissa.
Usually, the print functions take this into consideration and return a (faked) '0.1'.
However, this half bit error may accumulate, for example, when multiplying that by 0.1 then by 100,
the error may get large enough to be no longer pushed under the rug by the print function,
and you will get something like '1.0000000000000002' or '0.9999999999999' from it.
Also, comparing against a proper 1.0 (which is representable as an exact power of 2),
you will get a false result.
i.e. (0.1 * 0.1 * 100 ~= 1.0) and (0.1 * 0.1 * 100 - 1.0) ~= 0.0
This often confuses non-computer scientists (and occasionally even some of those).
For this, you should always provide an epsilon value when comparing two non-integer numbers.
The epsilon value is the distance you accept two numbers to be apart to be still considered equal.
Effectively, the epsilon asks: 'are those nearer than this epsilon?'.
Now we could ask 'is the delta between two numbers smaller than 0.00001?',
and get a reasonable answer for big numbers. But what if we compare two tiny numbers?
Then a reasonable epsilon must also be much smaller!
Actually, the epsilon should always be computed dynamically depending on the two values compared.
That is what the #isAlmostEqualTo:nEpsilon: method does for you. It does not take an absolute epsilon,
but instead the number of distinct floating point numbers that the two compared floats may be apart.
That is: the number of actually representable numbers between those two.
Effectively, that is the difference between the two mantissas,
when the numbers are scaled to the same exponent, taking the number of mantissa bits into account.
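A minimal sketch of an exact versus a tolerant comparison, using the example values from above. The tolerance of 2 representable numbers is an arbitrary choice made for illustration; a suitable value depends on how many rounding steps the computation involves.

```smalltalk
| a b |
a := 0.1 * 0.1 * 100.
b := 1.0.
"exact comparison: fails because of the accumulated rounding error"
(a = b) printOn:Transcript. Transcript cr.
"tolerant comparison: accept values that are at most 2 representable numbers apart"
(a isAlmostEqualTo:b nEpsilon:2) printOn:Transcript. Transcript cr.
```

The tolerance is expressed in representable numbers rather than as an absolute distance, so the same call remains meaningful for very small and very large operands.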
copyright
COPYRIGHT (c) 1994 by Claus Gittinger
All Rights Reserved
This software is furnished under a license and may be used
only in accordance with the terms of that license and with the
inclusion of the above copyright notice. This software may not
be provided or otherwise made available to, or used by, any
other person. No title to or ownership of the software is
hereby transferred.
class initialization
-
initialize
-
initialize ANSI compliant float globals
constants
-
NaN
-
return the constant NaN (not a Number) in my representation.
Here, based on the assumption that division of zero by zero generates a NaN
(which is defined as such in the IEEE standard).
If a subclass does not, it has to redefine this method and generate a NaN differently
Usage example(s):
ShortFloat NaN
Float NaN
LongFloat NaN
LargeFloat NaN
IEEEFloat NaN
|
-
negativeInfinity
-
return an instance of myself which represents negative infinity (for my instances).
Warning: do not compare equal against infinities;
instead, check using isFinite or isInfinite
Usage example(s):
ShortFloat negativeInfinity
Float negativeInfinity
LongFloat negativeInfinity
LargeFloat negativeInfinity
QDouble negativeInfinity
IEEEFloat negativeInfinity
|
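The warning given for NaN and the infinities can be followed with a check such as the one below. This is a sketch that only uses the isFinite test named above; the sample values and the output strings are illustrative.

```smalltalk
| values |
values := Array with:1.5 with:Float infinity with:Float negativeInfinity.
values do:[:each |
    "test with isFinite instead of comparing against an infinity constant"
    each isFinite
        ifTrue:[ Transcript show:'finite: ' ]
        ifFalse:[ Transcript show:'not finite: ' ].
    each printOn:Transcript. Transcript cr.
].
```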
constants & defaults
-
computeEpsilon
-
compute the maximum relative spacing of instances of mySelf
(i.e. the difference between 1.0 and the next larger representable number,
which is the value of the least significant mantissa bit at 1.0).
See https://en.wikipedia.org/wiki/Machine_epsilon
Usage example(s):
Float radix
Float precision
ShortFloat computeEpsilon -> 1.192093e-07
Float computeEpsilon -> 2.22044604925031E-16
LongFloat computeEpsilon -> 1.084202172485504434E-19
QDouble computeEpsilon -> 7.77876909732643E-62
QuadFloat computeEpsilon -> 1.92593e-34
OctaFloat computeEpsilon -> 9.05568e-72
QuadFloat radix
(QuadFloat coerce:QuadFloat radix) => 2.00000
2 asQuadFloat => 2.00000
|
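The definition can be illustrated with the classic halving loop. This is a sketch for IEEE doubles (Float) only and is not necessarily how computeEpsilon is implemented; it assumes round-to-nearest-even arithmetic without extended intermediate precision.

```smalltalk
| eps |
eps := 1.0.
"halve eps for as long as adding half of it to 1.0 still changes the sum"
[ (1.0 + (eps / 2.0)) > 1.0 ] whileTrue:[ eps := eps / 2.0 ].
eps printOn:Transcript. Transcript cr.
(eps = Float epsilon) printOn:Transcript. Transcript cr.
```

Under these assumptions the loop stops at 2 raisedTo:-52, which is the value listed for Float above.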
-
eBias
-
Answer the exponent's bias;
that is the offset of the zero exponent when stored.
The computation below assumes standard IEEE format
Usage example(s):
Float eBias -> 1023
ShortFloat eBias -> 127
HalfFloat eBias -> 15
LongFloat eBias -> 16383
QuadFloat eBias -> 16383
OctaFloat eBias -> 262143
QDouble eBias -> 1023
LargeFloat eBias -> 0
|
Usage example(s):
1.0 numBitsInExponent -> 11
1.0 eBias -> 1023
1.0 emin -> -1022
1.0 emax -> 1023
1.0 fmin -> 2.2250738585072E-308
1.0 fmax -> 1.79769313486232E+308
|
-
emax
-
The largest exponent value allowed by instances of this class.
The computation below assumes standard IEEE format
Usage example(s):
Float emax -> 1023
ShortFloat emax -> 127
LongFloat emax -> 16383
QuadFloat emax -> 16383
OctaFloat emax -> 262143
QDouble emax -> 1023
|
-
emin
-
The smallest exponent value allowed by (normalized) instances of this class.
The computation below assumes standard IEEE format
Usage example(s):
Float emin -> -1022
ShortFloat emin -> -126
LongFloat emin -> -16382
QuadFloat emin -> -16382
OctaFloat emin -> -262142
QDouble emin -> -1022
|
-
epsilon
-
return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
according to ISO C standard;
Ada, C, C++ and Python language constants;
Mathematica, MATLAB and Octave; and various textbooks
see https://en.wikipedia.org/wiki/Machine_epsilon
Usage example(s):
Float epsilon -> 2.22044604925031E-16
ShortFloat epsilon -> 1.192093e-07
LongFloat epsilon -> 1.084202172485504434E-19
QDouble epsilon -> 7.778769097326426826491248689356e-62
|
-
fmax
-
The largest value allowed by instances of this class.
Not required to return an instance of the class,
but may return a double (aka Float) with that value (eg. for HalfFloats)
Usage example(s):
Float fmax -> 1.79769313486232E+308
ShortFloat fmax -> 3.402823e+38
LongFloat fmax -> 1.189731495357231765E+4932
HalfFloat fmax -> 65504.0
QuadFloat fmax -> 1.189731495e4932
OctaFloat fmax -> 1.61132571748e78913
QDouble fmax -> error
(IEEEFloat size:16 exponentSize:5) fmax asFloat 65504.0
|
-
fmaxDenormalized
-
the largest denormalized value which can be represented
by instances of this class.
Should actually be sent to the instance,
because of IEEEFloat, which has instance-specific representation
-
fmin
-
the smallest normalized non-zero value which can be represented
by instances of this class;
should actually be sent to the instance,
because some of my subclasses have an instance-specific representation.
Not required to return an instance of the class,
but may return a double (aka Float) with that value (eg. for HalfFloats)
Usage example(s):
(1.0 asIEEEFloat:8) fmin -> 0.015625
HalfFloat fmin -> 6.103515625e-05
ShortFloat fmin -> 1.175494e-38
Float fmin -> 2.2250738585072e-308
LongFloat fmin -> 3.362103143112093506e-4932
QuadFloat fmin -> 3.3621031431119363650068581666578087e-4932
OctaFloat fmin
QDouble fmin -> 2.2250738585072e-308
(IEEEFloat size:16 exponentSize:5) fmin asFloat 6.103515625e-05
Float fmin = (2.0 raisedTo:Float emin) -> true
ShortFloat fmin = (2.0 raisedTo:ShortFloat emin) -> true
QuadFloat fmin = (2.0 asQuadFloat raisedTo:QuadFloat emin) -> true
OctaFloat fmin = (2.0 asOctaFloat raisedTo:OctaFloat emin) -> true
|
-
fminDenormalized
-
the smallest non-zero value which can be represented
by instances of this class;
should actually be sent to the instance,
because of IEEEFloat, which has instance-specific representation
** This method must be redefined in concrete classes (subclassResponsibility) **
-
infinity
-
return an instance of myself which represents positive infinity (for my instances).
Warning: do not compare equal against infinities;
instead, check using isFinite or isInfinite
Usage example(s):
ShortFloat infinity
Float infinity
LongFloat infinity
LargeFloat infinity
IEEEFloat infinity
QuadFloat infinity
OctaFloat infinity
QDouble infinity
|
-
maxSmallInteger
-
answer the largest possible SmallInteger value as instance of myself.
Notice: if my precision is smaller than the number of bits in a SmallInteger,
you'll lose some precision.
Usage example(s):
Float maxSmallInteger. 4.61168601842739e+18
LongFloat maxSmallInteger. 4611686018427387903.0
ShortFloat maxSmallInteger. 4.611686e+18
QDouble maxSmallInteger. 4.61169e+18
QuadFloat maxSmallInteger. 4.61169e+18
|
-
minSmallInteger
-
answer the smallest possible SmallInteger value as instance of myself
Usage example(s):
Float maxSmallInteger.
LongFloat maxSmallInteger.
ShortFloat maxSmallInteger.
QDouble maxSmallInteger.
LargeFloat maxSmallInteger.
Float minSmallInteger.
LongFloat minSmallInteger.
ShortFloat minSmallInteger.
QDouble minSmallInteger.
LargeFloat minSmallInteger.
|
instance creation
-
fromBytes: bytes
-
Float fromBytes:#[0 0 0 0 0 0 8 0]
-
fromInteger: anInteger
-
return a float with anInteger's value.
Since floats have a limited precision, you usually lose bits when doing this
with a large integer
i.e. when numDigits is above the floating point number's precision.
(see Float decimalPrecision, LongFloat decimalPrecision).
Also, a domainError could be raised, if the integer cannot be
represented as an instance of the receiver class.
(can be caught with trapInfinity:)
Usage example(s):
ShortFloat fromInteger:2
12345678901234567890 asShortFloat
1234567890 asFloat
1234567890 asFloat asInteger
-1234567890 asFloat asInteger
12345678901234567890 asFloat storeString
12345678901234567890 asFloat asInteger
-12345678901234567890 asFloat asInteger
12345678901234567890 asLongFloat
12345678901234567890 asLongFloat asInteger
-12345678901234567890 asLongFloat asInteger
123456789012345678901234567890 asLongFloat
123456789012345678901234567890 asLongFloat asInteger
-123456789012345678901234567890 asLongFloat asInteger
1234567890123456789012345678901234567890 asLongFloat
1234567890123456789012345678901234567890 asLongFloat asInteger
-1234567890123456789012345678901234567890 asLongFloat asInteger
'this test is on 65 bits'.
self assert: 16r1FFFFFFFFFFFF0801 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
'this test is on 64 bits'.
self assert: 16r1FFFFFFFFFFFF0802 asDouble ~= 16r1FFFFFFFFFFFF0800 asDouble.
'nearest even is upper'.
self assert: 16r1FFFFFFFFFFF1F800 asDouble = 16r1FFFFFFFFFFF20000 asDouble.
'nearest even is lower'.
self assert: 16r1FFFFFFFFFFFF0800 asDouble = 16r1FFFFFFFFFFFF0000 asDouble.
-- losing bits!
(Float fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFF1000'
(Float fromInteger:16r1FFFFFFFFFFFF0880) asInteger hexPrintString '1FFFFFFFFFFFF1000'
(Float fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '2000000000000000000'
(Float fromInteger:16r1FFFFFFFFFFFFFFFFFFFF0801) asInteger hexPrintString '2000000000000000000000000'
(LongFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFF0800'
(LongFloat fromInteger:16r1FFFFFFFFFFFF0880) asInteger hexPrintString '1FFFFFFFFFFFF0880'
(LongFloat fromInteger:16r1FFFFFFFFFFFFFF0880) asInteger hexPrintString '1FFFFFFFFFFFFFF0800'
(LongFloat fromInteger:16r1FFFFFFFFFFFFFFFFFFFF0801) asInteger hexPrintString '2000000000000000000000000'
(QuadFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFF0801'
(QuadFloat fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'
(QDouble fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFF0801'
(QDouble fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'
(OctaFloat fromInteger:16r1FFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFF0801'
(OctaFloat fromInteger:16r1FFFFFFFFFFFFFF0801) asInteger hexPrintString '1FFFFFFFFFFFFFF0801'
|
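The loss of bits can be made visible with the smallest integer that no longer fits into the 53 bit mantissa of a Float. This is a sketch; the variable names are illustrative, and the difference of 1 assumes the round-to-nearest-even behaviour that the assertions above test for.

```smalltalk
| exact roundTripped |
exact := (2 raisedTo:53) + 1.        "needs 54 significant bits"
roundTripped := (Float fromInteger:exact) asInteger.
(roundTripped = exact) printOn:Transcript. Transcript cr.
(exact - roundTripped) printOn:Transcript. Transcript cr.
```

Integers up to 2 raisedTo:53 survive the round trip unchanged. Above that, only every second (then every fourth, and so on) integer is representable.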
-
fromLimitedPrecisionReal: anLPReal
-
return a float with anLPReal's value.
You might lose bits when doing this.
Slow fallback.
-
fromNumerator: numerator denominator: denominator
-
Create a limited precision real from a Rational.
This version will answer the nearest floating point value,
according to the IEEE 754 round-to-nearest-even default mode
Usage example(s):
"time the direct conversion from numerator and denominator"
Time millisecondsToRun:[
    1000000 timesRepeat:[
        Float fromNumerator:12345678901234567890 denominator:987654321
    ].
]
"time the conversion of an existing Fraction object"
|fraction|
fraction := 12345678901234567890/987654321.
Time millisecondsToRun:[
    1000000 timesRepeat:[
        fraction asFloat
    ].
]
|
-
new: aNumber
-
catch this message - not allowed for floats/doubles
-
random
( an extension from the stx:libbasic2 package )
-
Float random
Float32 random
-
readFrom: aStringOrStream onError: exceptionBlock
-
read a float from a string
Usage example(s):
Float readFrom:'.1'
Float readFrom:'0.1'
Float readFrom:'0'
ShortFloat readFrom:'.1'
ShortFloat readFrom:'0.1'
ShortFloat readFrom:'0'
LongFloat readFrom:'.1'
LongFloat readFrom:'0.1'
LongFloat readFrom:'0'
LimitedPrecisionReal readFrom:'bla' onError:nil
Float readFrom:'bla' onError:nil
ShortFloat readFrom:'bla' onError:nil
|
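The examples above pass nil as the error handler. A block can be passed instead to supply a fallback value. This sketch assumes that exceptionBlock is evaluated when the string cannot be parsed and that its value becomes the result; the fallback of 0.0 is an arbitrary choice.

```smalltalk
| good bad |
good := Float readFrom:'0.1' onError:[ 0.0 ].
"an unparsable string: the fallback block supplies the result"
bad := Float readFrom:'bla' onError:[ 0.0 ].
good printOn:Transcript. Transcript cr.
bad printOn:Transcript. Transcript cr.
```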
queries
-
decimalEmax
-
Answer how many digits of accuracy this class supports
Usage example(s):
ShortFloat emax
ShortFloat decimalEmax
Float emax
Float emin
Float decimalEmax
LongFloat emax
LongFloat emin
LongFloat decimalEmax
|
-
decimalPrecision
-
return the number of valid decimal digits
Usage example(s):
HalfFloat decimalPrecision -> 3
ShortFloat decimalPrecision -> 7
Float decimalPrecision -> 16
LongFloat decimalPrecision -> 19
QuadFloat decimalPrecision -> 34
OctaFloat decimalPrecision -> 71
QDouble decimalPrecision -> 61
|
-
defaultPrintPrecision
-
the default number of digits when printing
Usage example(s):
ShortFloat defaultPrintPrecision -> 5
Float defaultPrintPrecision -> 6
LongFloat defaultPrintPrecision -> 8
QDouble defaultPrintPrecision -> 10
QuadFloat defaultPrintPrecision -> 9
OctaFloat defaultPrintPrecision -> 11
LargeFloat defaultPrintPrecision -> 12
|
-
defaultPrintfPrecision
-
the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.
-
denormalized
-
Return whether the instances of this class can
represent values in denormalized format.
-
exactDecimalPrecision
-
return the exact number of decimal digits
Usage example(s):
HalfFloat exactDecimalPrecision -> 3.612359947967774002
ShortFloat exactDecimalPrecision -> 7.224719895935548004
Float exactDecimalPrecision -> 15.95458977019100184
LongFloat exactDecimalPrecision -> 19.26591972249479468
QuadFloat exactDecimalPrecision -> 34.01638951002987185
OctaFloat exactDecimalPrecision -> 71.34410897236353654
QDouble exactDecimalPrecision -> 61.41011911545215804
|
-
hasSharedInstances
-
return true if this class can share instances when stored binary,
that is, instances with the same value can be stored by reference.
Although not really shared, floats should be treated
so, to be independent of the implementation of the arithmetic methods.
-
isAbstract
-
Return if this class is an abstract class.
True is returned for LimitedPrecisionReal here; false for subclasses.
-
isIEEEFormat
-
return true, if this machine represents floats in IEEE format.
Currently, no support is provided for non-ieee machines
to convert their floats into this (which is only relevant,
if such a machine wants to send floats as binary to some other
machine).
Machines with non-IEEE format are VAXen and IBM370-type systems
(among others). Today, practically every system uses IEEE format floats.
-
numBitsInExponent
-
return the number of bits in the exponent
** This method must be redefined in concrete classes (subclassResponsibility) **
-
numBitsInIntegerPart
-
answer the number of bits in the integer part of the mantissa.
I.e. 0 is returned if there is a hidden bit, 1 if not.
Most floating point formats are normalized to get rid of the extra bit.
-
numBitsInMantissa
-
return the number of bits in the mantissa (the significand).
Typically the precision is 1 more than the significand size due to the hidden bit;
the hidden bit is not counted here.
** This method must be redefined in concrete classes (subclassResponsibility) **
-
numHiddenBits
-
answer the number of hidden bits in the mantissa.
This will return 0 or 1; 0 if there is no hidden bit, 1 if there is.
Most floating point formats are normalized to get one extra bit of precision
and thus will return 1 here.
-
precision
-
answer the precision (the number of bits in the mantissa) of my elements (in bits).
If my elements are IEEE floats, where only the fraction from the normalized mantissa is stored,
there will be a hidden bit and the mantissa will actually be represented by 1 more binary digit
(i.e. the number returned is 1 plus the actual number of bits stored).
Any hidden bits are included here.
Usage example(s):
HalfFloatArray precision
ShortFloat precision
Float precision
LongFloat precision
QDouble precision
|
-
radix
-
answer the radix of my instance's exponent
** This method must be redefined in concrete classes (subclassResponsibility) **
Compatibility-Squeak
-
defaultNumberOfDigits
( an extension from the stx:libcompat package )
-
marked as obsolete by exept MBP at 13-11-2021
** This is an obsolete interface - do not use it (it may vanish in future versions) **
accessing
-
at: index
-
redefined to prevent access to individual bytes in a real.
-
at: index put: aValue
-
redefined to prevent access to individual bytes in a real
arithmetic
-
* aNumber
-
return the product of the receiver and the argument.
-
+ aNumber
-
return the sum of the receiver and the argument, aNumber
-
- aNumber
-
return the difference of the receiver and the argument, aNumber
-
/ aNumber
-
return the quotient of the receiver and the argument, aNumber
-
// aNumber
-
return the integer quotient of dividing the receiver by aNumber with
truncation towards negative infinity.
-
ceiling
-
(comment from inherited method)
return the integer nearest the receiver towards positive infinity.
-
floor
-
(comment from inherited method)
return the receiver truncated towards negative infinity
-
timesTwoPower: anInteger
-
multiply self by a power of two.
I.e. self * (2 raisedTo:anInteger)
Implementation takes care of preserving class and avoiding overflow/underflow
if possible; otherwise returns infinity or zero.
Thanks to Nicolas Cellier for this code
Usage example(s):
(3 asShortFloat timesTwoPower:10) -> 3072.0.
(3 asFloat timesTwoPower:10) -> 3072.0.
(3 asShortFloat timesTwoPower:100) -> 3.802952e+30.
(3 asFloat timesTwoPower:100) -> 3.80295180068469e+30.
(3 asShortFloat timesTwoPower:200) -> inf.
(3 asFloat timesTwoPower:200) -> 4.82081413277697e+60.
(1 asShortFloat timesTwoPower: 3) class = ShortFloat.
(1 asLongFloat timesTwoPower: 1024).
(1 asFloat timesTwoPower: -1024) timesTwoPower: 1024.
(1 asLongFloat timesTwoPower: -1024) timesTwoPower: 1024.
(2.0 asShortFloat timesTwoPower: -150) timesTwoPower: 150
(2.0 asLongFloat timesTwoPower: -150) timesTwoPower: 150
(2.0 asFloat timesTwoPower: -150) timesTwoPower: 150
(2.0 asShortFloat timesTwoPower: -149) timesTwoPower: 149
(2.0 asLongFloat timesTwoPower: -149) timesTwoPower: 149
(2.0 asFloat timesTwoPower: -149) timesTwoPower: 149
(ShortFloat infinity timesTwoPower:10) -> inf
(LongFloat infinity timesTwoPower:10) -> inf
(Float infinity timesTwoPower:10) -> inf
Time millisecondsToRun:[
1000000 timesRepeat:[
(2.0 timesTwoPower: 150)
]
]
|
bytes access
-
digitBytes
-
answer the float's digit bytes in IEEE format.
Use the native machine byte ordering.
Usage example(s):
1.0 digitBytes
Float pi digitBytes
ShortFloat pi digitBytes
|
-
digitBytesMSB: msb
-
answer the float's digit bytes in IEEE format.
If msb == true, use MSB byte order, otherwise LSB byte order.
Usage example(s):
Float pi digitBytesMSB:false
Float pi digitBytesMSB:true
ShortFloat pi digitBytesMSB:false
ShortFloat pi digitBytesMSB:true
|
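The byte layout can be inspected by extracting the exponent field from the MSB-ordered bytes. This sketch assumes an IEEE double (Float), whose first two bytes hold 1 sign bit, 11 exponent bits and the top 4 fraction bits, and assumes that the answered byte collection understands at:.

```smalltalk
| bytes exponentField |
bytes := 1.0 digitBytesMSB:true.
"drop the sign bit of byte 1, then append the upper 4 bits of byte 2"
exponentField := (((bytes at:1) bitAnd:16r7F) bitShift:4)
                 + ((bytes at:2) bitShift:-4).
exponentField printOn:Transcript. Transcript cr.
(exponentField = Float eBias) printOn:Transcript. Transcript cr.
```

For 1.0 the stored exponent field equals the bias, because the unbiased exponent of 1.0 is zero in the IEEE encoding.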
coercing & converting
-
asFloat
-
(comment from inherited method)
return a float with same value
-
asFraction
-
Answer a rational number (Integer or Fraction) representing the receiver.
This conversion uses the continued fraction method to approximate
a floating point number.
In contrast to #asTrueFraction, which returns exactly the value of the float,
this rounds in the last significant bit of the floating point number.
Usage example(s):
1.1 asFraction
1.2 asFraction
0.3 asFraction
0.5 asFraction
(1/5) asFloat asFraction
(1/8) asFloat asFraction
(1/13) asFloat asFraction
(1/10) asFloat asFraction
(1/10) asFloat asTrueFraction asFixedPoint scale:20
3.14159 asFixedPoint scale:20
3.14159 storeString
3.14159 asFraction asFloat storeString
1.3 asFraction
1.0 asFraction
1E6 asFraction
1E-6 asFraction
|
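The difference between the two conversions can be sketched with 0.1, which has no exact binary representation. The variable names are illustrative. The exact fraction is large, because it encodes every bit of the double.

```smalltalk
| approx exact |
approx := 0.1 asFraction.        "rounded in the last significant bit"
exact := 0.1 asTrueFraction.     "exactly the value stored in the float"
approx printOn:Transcript. Transcript cr.
exact printOn:Transcript. Transcript cr.
"the exact fraction converts back to the same float ..."
(exact asFloat = 0.1) printOn:Transcript. Transcript cr.
"... but it is not the mathematical value 1/10"
(exact = (1/10)) printOn:Transcript. Transcript cr.
```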
-
asIEEEFloat
( an extension from the stx:libbasic2 package )
-
return an IEEE soft float with same value as receiver
Usage example(s):
123 asFloat asIEEEFloat
0 asShortFloat asIEEEFloat
0.0 asIEEEFloat
Float NaN asIEEEFloat
Float positiveInfinity asIEEEFloat
Float negativeInfinity asIEEEFloat
ShortFloat NaN asIEEEFloat
ShortFloat positiveInfinity asIEEEFloat
ShortFloat negativeInfinity asIEEEFloat
QuadFloat NaN asIEEEFloat
QuadFloat positiveInfinity asIEEEFloat
QuadFloat negativeInfinity asIEEEFloat
|
-
asIEEEFloat: numBits
-
return an IEEE soft float with same value as receiver and numBits overAll
numBits should be a multiple of 8,
i.e. 32 for IEEE single, 64 for double, 128 for quadFloat, etc.)
Usage example(s):
123 asFloat asIEEEFloat
123 asFloat asIEEEFloat:32
123 asFloat asIEEEFloat:16
12 asFloat asIEEEFloat:8
12 asIEEEFloat:8
0 asShortFloat asIEEEFloat
0.0 asIEEEFloat
|
-
asInteger
-
return an integer with same value - might truncate.
Does not raise an error for non-finite numbers (NaN or INF)
Usage example(s):
12345.0 asInteger
1e15 asInteger
1e33 asInteger asFloat
1e303 asInteger asFloat
|
-
asLargeFloat
( an extension from the stx:libbasic2 package )
-
return a large float with (approximately) my value.
If the LargeFloat class is not present, a regular float is returned
-
asLargeFloatPrecision: n
( an extension from the stx:libbasic2 package )
-
return a large float with (approximately) my value.
If the largeFloat class is not present, a regular float is returned
Usage example(s):
1.0 asLargeFloatPrecision:10
|
-
asLimitedPrecisionReal
-
return a float of any precision with same value
-
asLongFloat
-
(comment from inherited method)
return a longFloat with same value
-
asOctaFloat
( an extension from the stx:libbasic2 package )
-
(comment from inherited method)
return an octaFloat with same value
-
asQuadFloat
( an extension from the stx:libbasic2 package )
-
return a QuadFloat with same value as the receiver
-
asRational
-
Answer a Rational number--Integer or Fraction--representing the receiver.
Same as asFraction, for ST-80 compatibility.
Usage example(s):
1.1 asRational
1.2 asRational
0.3 asRational
0.5 asRational
(1/5) asFloat asRational
(1/8) asFloat asRational
(1/13) asFloat asRational
3.14159 asRational
3.14159 asRational asFloat
1.3 asRational
1.0 asRational
|
-
asShortFloat
-
(comment from inherited method)
return a shortFloat with same value.
Does NOT raise an error if the receiver exceeds the float range.
-
asTrueFraction
-
Answer a fraction or integer that EXACTLY represents self,
an any-precision IEEE floating point number, consisting of:
numMantissaBits bits of normalized mantissa (i.e. with hidden leading 1-bit)
optional numExtraBits between mantissa and exponent (normalized flag for ext-real)
numExponentBits bits of 2s complement exponent
1 sign bit.
Taken from Float's asTrueFraction
Usage example(s):
(result asFloat = self) ifFalse: [self error: 'asTrueFraction validation failed'].
|
Usage example(s):
1.0 asLongFloat asTrueFraction
0.3 asFloat asTrueFraction (5404319552844595/18014398509481984)
0.3 asShortFloat asTrueFraction (5033165/16777216)
0.3 asLongFloat asTrueFraction (5404319552844595/18014398509481984)
0.3 asQuadFloat asTrueFraction (5404319552844595/18014398509481984)
0.3 asOctaFloat asTrueFraction (5404319552844595/18014398509481984)
1.25 asTrueFraction (5/4)
1.25 asShortFloat asTrueFraction (5/4)
1.25 asLongFloat asTrueFraction (5/4)
0.25 asTrueFraction (1/4)
0.25 asShortFloat asTrueFraction (1/4)
0.25 asLongFloat asTrueFraction (1/4)
-0.25 asTrueFraction (-1/4)
-0.25 asShortFloat asTrueFraction (-1/4)
-0.25 asLongFloat asTrueFraction (-1/4)
3e37 asTrueFraction 30000000000000002158062836758597337088
3e37 asShortFloat asTrueFraction 30000001069098037760363920625477091328
3e37 asLongFloat asTrueFraction 30000000000000002158062836758597337088
3e37 asQuadFloat asTrueFraction 30000000000000002158062836758597337088
3e37 asOctaFloat asTrueFraction 30000000000000002158062836758597337088
3e37 asQDouble asTrueFraction 30000000000000002158062836758597337088
0 asLongFloat negated asTrueFraction
LongFloat NaN asTrueFraction
LongFloat infinity asTrueFraction
LongFloat negativeInfinity asTrueFraction
Float fmin asTrueFraction
Float fminDenormalized asTrueFraction
Float fmaxDenormalized asTrueFraction
LongFloat fmin asTrueFraction
LongFloat fminDenormalized asTrueFraction
LongFloat fmaxDenormalized asTrueFraction
|
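A small workspace sketch of what "exactly represents" means here. It assumes output via Transcript showCR: and uses only the 0.3 case already listed above; the comparison against (3/10) is added purely for illustration.

```smalltalk
| f exact |
f := 0.3.
exact := f asTrueFraction.      "the binary value actually stored in the Float"
Transcript showCR: exact printString.
"the stored value is close to, but not the same as, three tenths"
Transcript showCR: (exact = (3/10)) printString.
"converting the exact fraction back should reproduce the receiver"
Transcript showCR: (exact asFloat = f) printString.
```

The first line should show the fraction listed above for 0.3 asFloat. The last line is the same check as the validation expression quoted in the method comment.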
comparing
-
< aNumber
-
return true, if the argument is greater
double dispatching
-
differenceFromFraction: aFraction
-
sent when a fraction does not know how to subtract the receiver
-
equalFromFraction: aFraction
-
sent when a fraction does not know how to compare with the receiver
-
lessFromFraction: aFraction
-
aFraction does not know how to compare to the receiver -
Return true if aFraction < self.
-
productFromFraction: aFraction
-
sent when a fraction does not know how to multiply the receiver
-
quotientFromFloat: aFloat
-
return the quotient of aFloat and the receiver.
Return aFloat / self
-
quotientFromFraction: aFraction
-
Return the quotient of the argument, aFraction and the receiver.
Sent when aFraction does not know how to divide by the receiver.
-
sumFromFraction: aFraction
-
sent when a fraction does not know how to add the receiver
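The following sketch illustrates how these double dispatching messages relate to ordinary arithmetic. It assumes that a Fraction receiving a Float argument forwards the operation to the Float through the selectors listed in this category; the direct sends are shown only for illustration and are normally not written in application code.

```smalltalk
| half quarter |
half := 1/2.
quarter := 0.25.
"mixed arithmetic, as normally written"
Transcript showCR: (half + quarter) printString.
"the same operations, sent directly as dispatch messages to the Float"
Transcript showCR: (quarter sumFromFraction: half) printString.         "half + quarter"
Transcript showCR: (quarter differenceFromFraction: half) printString.  "half - quarter"
Transcript showCR: (quarter productFromFraction: half) printString.     "half * quarter"
Transcript showCR: (quarter lessFromFraction: half) printString.        "half < quarter"
```

In each direct send, the receiver is the right-hand operand and the argument is the left-hand operand of the original expression.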
-
sumFromTimestamp: aTimestamp
-
I am to be interpreted as seconds, return the timestamp this number of seconds
after aTimestamp
Usage example(s):
Timestamp now sumFromTimestamp:aTimestamp
100.0 sumFromTimestamp:Timestamp now
|t1 t2|
t1 := Timestamp now.
t2 := 1.5 sumFromTimestamp:t1.
t1 inspect. t2 inspect.
|
error reporting
-
errorUnsupported
-
inspecting
-
inspectorExtraAttributes
( an extension from the stx:libtool package )
-
extra (pseudo instvar) entries to be shown in an inspector.
printing & storing
-
commonPrintOn: aStream
-
a zero mantissa is impossible - except for zero and a few others
-
printOn: aStream
-
0.0 printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0. Transcript cr.
0.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.
0.0 asOctaFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:0.0 asOctaFloat. Transcript cr.
-0.0 printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:-0.0. Transcript cr.
-0.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%-g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%+g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'%+g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:-0.0 asIEEEFloat. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:0.0 asIEEEFloat. Transcript cr.
1234.0 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:1234.0 asIEEEFloat. Transcript cr.
1e39 asIEEEFloat printOn:Transcript. Transcript cr.
PrintfScanf printf:'%g' on:Transcript argument:1e39 asIEEEFloat. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat NaN. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat infinity. Transcript cr.
PrintfScanf printf:'% g' on:Transcript argument:IEEEFloat negativeInfinity. Transcript cr.
-
printStringScientific
-
return a 'user friendly' scientific printString.
Notice: this returns a Text object with superscript digits,
which requires a font capable of displaying it correctly.
Also: the returned string is not meant to be read back - purely for GUIs
Usage example(s):
1.23456 printString -> '1.23456'
1.23456 printStringScientific 1.23456×10^0 (with superscript zero at end)
1.23e14 printStringScientific 1.23×10^14 (with superscript 14 at end)
PrintfScanf printf:'%e' argument:1.23456 -> '1.23456e0'
PrintfScanf printf:'%g' argument:1.23456 -> '1.23456'
PrintfScanf printf:'%f' argument:1.23456 -> '1.23456'
PrintfScanf printf:'%e' argument:1.234 -> '1.234e0'
PrintfScanf printf:'%g' argument:1.234 -> '1.234'
PrintfScanf printf:'%f' argument:1.234 -> '1.234'
|
-
printStringWithFormat: format
-
return a printed representation of the receiver;
format must be of the form: .nn, where nn is the number of digits.
To print 6 valid digits, use printStringWithFormat:'.6'
For Floats, the default used in printString is 15 (because it is a double);
for ShortFloats, it is 6 (because it is a float)
Usage example(s):
Float pi printStringWithFormat:'.20' => '3.141592653589793116'
Float pi asQuadFloat printStringWithFormat:'.20' => '3.14159265358978956320'
|
private accessing
-
digitBytes: bytesLSB
-
queries
-
decimalEmax
-
Answer how many digits of exponent-accuracy this class supports
Usage example(s):
1.0 asShortFloat emax
1.0 asShortFloat decimalEmax
1.0 asFloat emax
1.0 asFloat emin
1.0 asFloat decimalEmax
1.0 asLongFloat emax
1.0 asLongFloat emin
1.0 asLongFloat decimalEmax
|
-
decimalPrecision
-
Answer how many significant decimal digits (accuracy) this instance supports
Usage example(s):
1.0 asShortFloat decimalPrecision -> 7
1.0 asFloat decimalPrecision -> 15
1.0 asLongFloat decimalPrecision -> 19
1.0 asQDouble decimalPrecision -> 61
1.0 asLargeFloat decimalPrecision -> 15
(1.0 asLargeFloatPrecision:200) decimalPrecision -> 60
(1.0 asLargeFloatPrecision:400) decimalPrecision -> 120
1.0 asQuadFloat decimalPrecision -> 34
1.0 asOctaFloat decimalPrecision -> 71
1.0 asIEEEFloat decimalPrecision -> 15
(1.0 asIEEEFloat:128) decimalPrecision -> 34
(1.0 asIEEEFloat:256) decimalPrecision -> 71
(1.0 asIEEEFloat:512) decimalPrecision -> 148
(1.0 asIEEEFloat:1024) decimalPrecision -> 302
1.0 asLongFloat asIEEEFloat decimalPrecision -> 15
1.0 asShortFloat asIEEEFloat decimalPrecision -> 15
|
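The listed values are consistent with the usual rule of thumb that p binary digits carry floor(p * log10(2)) decimal digits. The sketch below checks this for two classes; it assumes that precision (described further down) answers the mantissa size including the hidden bit, and it is not meant to show how decimalPrecision is actually implemented.

```smalltalk
| estimate |
"decimal digits carried by p binary digits: floor(p * log10(2))"
estimate := [:f | (f precision * (2 ln / 10 ln)) floor].
Transcript showCR: (estimate value: 1.0) printString.
Transcript showCR: 1.0 decimalPrecision printString.
Transcript showCR: (estimate value: 1.0 asShortFloat) printString.
Transcript showCR: 1.0 asShortFloat decimalPrecision printString.
```

With 53 and 24 bits of precision, the estimates are 15 and 7, which agree with the values listed above for Float and ShortFloat.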
-
defaultPrintPrecision
-
the default number of digits when printing
Usage example(s):
1.0 asFloat defaultPrintPrecision 15
1.0 asLongFloat defaultPrintPrecision 19
1.0 asShortFloat defaultPrintPrecision 6
1.0 asQDouble defaultPrintPrecision 60
1.0 asQuadFloat defaultPrintPrecision 30
1.0 asOctaFloat defaultPrintPrecision 70
(1.0 asLargeFloatPrecision:100) defaultPrintPrecision 29
(1.0 asLargeFloatPrecision:200) defaultPrintPrecision 59
(1.0 asLargeFloatPrecision:300) defaultPrintPrecision 79
|
-
defaultPrintfPrecision
-
the default number of digits when printing with printf's %f format.
Notice, that the C-language standard states that this should be 6;
however, we can adjust it on a per-class basis.
-
eBias
-
Answer the exponent's bias;
that is the offset of the zero exponent when stored
(i.e. the real exponent is exponentBits - eBias).
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
Usage example(s):
1.0 numBitsInExponent 11
1.0 eBias 1023
1.0 emin -1022
1.0 emax 1023
1.0 fmin 2.2250738585072E-308
1.0 fmax 1.79769313486232E+308
|
Usage example(s):
1.0 asLongFloat numBitsInExponent 15
1.0 asLongFloat eBias 16383
1.0 asLongFloat emin -16382
1.0 asLongFloat emax 16383
1.0 asLongFloat fmin 3.362103143112093506E-4932
1.0 asLongFloat fmax 1.189731495357231765E+4932
|
Usage example(s):
1.0 asShortFloat numBitsInExponent 8
1.0 asShortFloat eBias 127
1.0 asShortFloat emin -126
1.0 asShortFloat emax 127
1.0 asShortFloat fmin 1.175494e-38
1.0 asShortFloat fmax 3.402823e+38
|
Usage example(s):
1.0 asQuadFloat numBitsInExponent 15
1.0 asQuadFloat eBias 16383
1.0 asQuadFloat emin -16382
1.0 asQuadFloat emax 16383
1.0 asQuadFloat fmin
1.0 asQuadFloat fmax
|
Usage example(s):
1.0 asIEEEFloat numBitsInExponent 15
1.0 asIEEEFloat eBias 16383
1.0 asIEEEFloat emin -16382
1.0 asIEEEFloat emax 16383
1.0 asIEEEFloat fmin
1.0 asIEEEFloat fmax
|
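The values in the Float, LongFloat and ShortFloat examples above follow the standard IEEE relations eBias = 2^(numBitsInExponent - 1) - 1, emax = eBias and emin = 1 - eBias. A minimal sketch that checks these relations for a Float; the local names bits and bias are illustrative only.

```smalltalk
| f bits bias |
f := 1.0.
bits := f numBitsInExponent.
bias := (2 raisedTo: bits - 1) - 1.
Transcript showCR: (bias = f eBias) printString.
Transcript showCR: (f emax = bias) printString.
Transcript showCR: (f emin = (1 - bias)) printString.
```

With the values listed above (11, 1023, -1022 and 1023), each line prints true.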
-
emax
-
The largest exponent value allowed by instances like this.
The computation below assumes standard IEEE format.
This is also implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
Usage example(s):
Float emax -> 1023
ShortFloat emax -> 127
LongFloat emax -> 16383
QuadFloat emax -> 16383
OctaFloat emax -> 262143
QDouble emax -> 1023
|
-
emin
-
The smallest exponent value allowed by (normalized) instances of this class.
The computation below assumes standard IEEE format.
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
-
epsilon
-
return the maximum relative spacing of instances of mySelf
(i.e. the value-delta of the least significant bit)
according to ISO C standard;
Ada, C, C++ and Python language constants;
Mathematica, MATLAB and Octave; and various textbooks
see https://en.wikipedia.org/wiki/Machine_epsilon
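A short sketch of what this spacing means in practice. It prints epsilon next to the ulp of 1.0 for comparison, and then shows that an increment well below the spacing is lost; the ulp value used is the one documented under ulp below.

```smalltalk
| one |
one := 1.0.
Transcript showCR: one epsilon storeString.
Transcript showCR: one ulp storeString.
"adding a quarter of the spacing at 1.0 is lost when the sum is rounded"
Transcript showCR: ((one + (one ulp / 4)) = one) printString.
```

Under IEEE round-to-nearest, the last line prints true.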
-
exponent
-
generic; assumes IEEE float
Usage example(s):
1.0 exponent 1
1.0 xexponent 1
0.0 exponent 0
0.0 xexponent 0
Float fmin exponent -1021
Float fmin xexponent -1021
(Float fmin / 2) exponent -1022
(Float fmin / 2) xexponent -1022
(Float fmin / 4) exponent -1023
(Float fmin / 4) xexponent -1023
(Float fmin / 32) exponent -1026
(Float fmin / 32) xexponent -1026
(Float fminDenormalized) exponent -1073
(Float fminDenormalized) xexponent -1073
Float NaN exponent
Float infinity exponent
|
-
exponentBits
-
extract the biased exponentBits.
Assumes that subclasses are IEEE based (or at least can provide
an IEEE compatible byteArray for themselves)
Usage example(s):
0.0 mantissaBits 0
0.0 exponentBits 0
1.0 mantissaBits hexPrintString -> '0'
1.0 mantissaWithHiddenBits hexPrintString -> '10000000000000'
1.0 exponentBits -> 1023 16r3FF
2.0 mantissaBits hexPrintString -> '0'
2.0 mantissaWithHiddenBits hexPrintString -> '10000000000000'
2.0 exponentBits -> 1024 16r400
3.0 mantissaBits hexPrintString -> '8000000000000'
3.0 mantissaWithHiddenBits hexPrintString -> '18000000000000'
3.0 exponentBits -> 1024 16r400
4.0 mantissaBits hexPrintString -> '0'
4.0 exponentBits -> 1025 16r401
5.0 mantissaBits hexPrintString -> '4000000000000'
5.0 exponentBits -> 1025 16r401
-5.0 mantissaBits hexPrintString -> '4000000000000'
-5.0 exponentBits -> 1025 16r401
0.1 mantissaBits hexPrintString '999999999999A'
0.1 exponentBits 1019 16r3FB
0.3 mantissaBits hexPrintString '3333333333333'
0.3 exponentBits 1021 16r3FD
0.3 asShortFloat mantissaBits 1677722 16r19999A
0.3 asShortFloat exponentBits 125 16r7D
0.3 asLongFloat mantissaBits 11068046444225730560 16r9999999999999800
0.3 asLongFloat exponentBits 16381 16r3FFD
0.3 asQDouble mantissaBits
Float fmin exponentBits 1
Float fminDenormalized exponentBits
|
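The raw fields can be combined back into the value. The sketch below does this for a positive, normalized Float, using the standard IEEE relation value = mantissaWithHiddenBits * 2^(exponentBits - eBias - numBitsInMantissa); the names scale and rebuilt are illustrative, and the sign bit is ignored.

```smalltalk
| f scale rebuilt |
f := 3.0.
"unbias the exponent, then shift by the number of stored mantissa bits"
scale := f exponentBits - f eBias - f numBitsInMantissa.
rebuilt := f mantissaWithHiddenBits * (2 raisedTo: scale).
Transcript showCR: rebuilt printString.
Transcript showCR: (rebuilt = f) printString.
Transcript showCR: (rebuilt = f asTrueFraction) printString.
```

For 3.0, the listed values give scale = 1024 - 1023 - 52 = -51, and 16r18000000000000 * 2^-51 is exactly 3.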
-
fmax
-
the largest finite value which can be represented
by normalized instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
-
fmaxDenormalized
-
the largest denormalized value which can be represented
by instances of this class.
This is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
-
fmin
-
the smallest non-zero value which can be represented
by normalized instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
-
fminDenormalized
-
the smallest non-zero value which can be represented by instances of this class;
this is implemented on the instance side,
because of IEEEFloat, which has instance-specific representation.
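The four range queries are ordered as fminDenormalized < fmaxDenormalized < fmin < fmax. A minimal sketch for Float, sent to an instance as described above; the comparison of fmin with 2^emin assumes the standard IEEE layout.

```smalltalk
| f |
f := 1.0.
Transcript showCR: (f fminDenormalized < f fmaxDenormalized) printString.
Transcript showCR: (f fmaxDenormalized < f fmin) printString.
Transcript showCR: (f fmin < f fmax) printString.
"the smallest normalized value is 2 raised to emin"
Transcript showCR: (f fmin = (2 raisedTo: f emin)) printString.
```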
-
fractionalPart
-
This has been renamed to #fractionPart for ST80 compatibility.
extract the after-decimal fraction part.
the float's value is
float truncated + float fractionalPart
** This is an obsolete interface - do not use it (it may vanish in future versions) **
-
hasIEEEFormat
-
HalfFloat isIEEEFormat true
ShortFloat isIEEEFormat true
Float isIEEEFormat true
LongFloat isIEEEFormat true
QuadFloat isIEEEFormat true
OctaFloat isIEEEFormat true
QDouble isIEEEFormat false
LargeFloat isIEEEFormat false
-
mantissa
-
extract a float's mantissa (as Float).
That is a float of the same type as the receiver,
such that:
(f mantissa) * (2 ^ f exponent) = f
This assumes that the mantissa is normalized to 0.5 .. 1.0
** This method must be redefined in concrete classes (subclassResponsibility) **
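The relation above can be checked in a workspace. This sketch uses the values 13 and 104 from the class summary, whose mantissa (0.8125) and exponents (4 and 7) are exactly representable, so the reconstruction is exact.

```smalltalk
| check |
check := [:f |
    Transcript showCR: f mantissa printString.
    Transcript showCR: f exponent printString.
    "mantissa * 2^exponent gives back the receiver"
    Transcript showCR: ((f mantissa * (2 raisedTo: f exponent)) = f) printString].
check value: 13 asFloat.
check value: 104 asFloat.
```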
-
mantissaBits
-
extract a float's mantissaBits (excl. any hidden bit).
I.e. this returns the normalized mantissaBits as an integer.
Assumes that subclasses are IEEE based (or at least can provide
an IEEE compatible byteArray for themselves)
Usage example(s):
0.0 mantissaBits
1.0 mantissaBits hexPrintString -> '0'
2.0 mantissaBits hexPrintString -> '0'
3.0 mantissaBits hexPrintString -> '8000000000000'
4.0 mantissaBits hexPrintString -> '0'
5.0 mantissaBits hexPrintString -> '4000000000000'
10.0 mantissaBits hexPrintString -> '4000000000000'
0.1 mantissaBits hexPrintString -> '999999999999A'
0.3 mantissaBits hexPrintString -> '3333333333333'
10.0 asShortFloat mantissaBits hexPrintString -> '200000'
10.0 asLongFloat mantissaBits hexPrintString -> 'A000000000000000'
10.0 mantissaWithHiddenBits hexPrintString -> '14000000000000'
10.0 asShortFloat mantissaWithHiddenBits hexPrintString -> 'A00000'
10.0 asLongFloat mantissaWithHiddenBits hexPrintString -> 'A000000000000000'
0.3 asShortFloat mantissaBits -> 1677722 16r19999A
0.3 asLongFloat mantissaBits -> 29514790517935282176 16r19999999999999800
|
-
mantissaWithHiddenBits
-
extract a float's mantissaBits (incl. any hidden bit).
I.e. this returns the denormalized mantissaBits
Usage example(s):
0.0 mantissaBits 0
0.0 mantissaWithHiddenBits 0
1.0 mantissaBits hexPrintString -> '0'
1.0 mantissaWithHiddenBits hexPrintString -> '10000000000000'
2.0 mantissaBits hexPrintString -> '0'
2.0 mantissaWithHiddenBits hexPrintString -> '10000000000000'
0.1 mantissaBits hexPrintString -> '999999999999A'
0.1 mantissaWithHiddenBits hexPrintString -> '1999999999999A'
0.3 mantissaBits hexPrintString -> '3333333333333'
0.3 mantissaWithHiddenBits hexPrintString -> '13333333333333'
10.0 mantissaWithHiddenBits hexPrintString -> '14000000000000' / 2r10100000000000000000000000000000000000000000000000000
10.0 asShortFloat mantissaWithHiddenBits hexPrintString -> 'A00000' / 2r101000000000000000000000
10.0 asLongFloat mantissaWithHiddenBits hexPrintString -> 'A000000000000000' / 2r1010000000000000000000000000000000000000000000000000000000000000
10.0 asQuadFloat mantissaWithHiddenBits hexPrintString -> 'A000000000000000' / 2r1010000000000000000000000000000000000000000000000000000000000000
0.3 asShortFloat mantissaBits -> 1677722 16r19999A
0.3 asLongFloat mantissaBits -> 29514790517935282176 16r19999999999999800
Float fminDenormalized mantissaWithHiddenBits
|
-
nextFloat
-
answer the next representable float after myself
Usage example(s):
(1.0 nextFloat) storeString
(1.0 asShortFloat nextFloat) storeString
(67329.234 nextFloat) storeString
(67329.234 asShortFloat nextFloat) storeString
(10000000000.0 nextFloat) storeString
(10000000000.0 asShortFloat nextFloat) storeString
|
-
nextFloat: nUlps
-
answer the next representable float nUlps after myself
** This method must be redefined in concrete classes (subclassResponsibility) **
-
numBitsInExponent
-
answer the number of bits in the exponent
11 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
8 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
15 for long floats (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
15 for quad floats:
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
19 for octuple floats:
seeeeeee eeeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
other for LargeFloats
-
numBitsInMantissa
-
answer the number of bits in the mantissa (the significand) of my instances;
any hidden bits are not counted.
11 for half precision:
seeeemmm mmmmmmmm
23 for single precision:
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
52 for double precision:
seeeeeee eeeemmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
64 for longfloat precision (x86):
00000000 00000000 seeeeeee eeeeeeee immmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm
112 for longfloat precision (sparc):
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
112 for quadfloat precision:
seeeeeee eeeeeeee mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm...
Usage example(s):
1.0 numBitsInMantissa
1.0 asShortFloat numBitsInMantissa
1.0 asLongFloat numBitsInMantissa
|
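For the single and double precision layouts shown above, one sign bit plus the exponent bits plus the stored mantissa bits fill the whole number. A small sketch comparing that sum with numberOfBits (described under testing); it is not expected to hold for the x86 long float layout, which has an explicit integer bit and padding.

```smalltalk
| check |
check := [:f |
    "1 sign bit + exponent bits + stored mantissa bits"
    Transcript showCR: (1 + f numBitsInExponent + f numBitsInMantissa) printString.
    Transcript showCR: f numberOfBits printString].
check value: 1.0.
check value: 1.0 asShortFloat.
```

With the sizes given above, the sums are 1 + 11 + 52 = 64 and 1 + 8 + 23 = 32.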
-
numHiddenBits
-
answer the number of bits in the integer part of the mantissa.
Most floating point formats are normalized to get rid of the extra bit
(i.e. except for LongFloats and LargeFloats,
instances are normalized to exclude any integer bit).
-
precision
-
answer the precision of my elements (the number of bits in the mantissa).
If my elements are IEEE floats, where only the fraction of the normalized mantissa is stored,
there is a hidden bit, and the mantissa is actually represented by 1 more binary digit
(i.e. the number returned is 1 plus the actual number of bits stored).
Should be redefined in classes which allow per-instance precision specification.
The hidden bit is included here.
-
previousFloat
-
answer the previous representable float before myself
Usage example(s):
(1.0 previousFloat) storeString
(1.0 asShortFloat previousFloat) storeString
(67329.234 previousFloat) storeString
(67329.234 asShortFloat previousFloat) storeString
(10000000000.0 previousFloat) storeString
(10000000000.0 asShortFloat previousFloat) storeString
|
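nextFloat and previousFloat are meant to be inverses of each other for ordinary finite values. The sketch below checks this, and also relates them to nextFloat: under the assumption that an argument of 1 means a single step.

```smalltalk
| x up |
x := 67329.234.
up := x nextFloat.
Transcript showCR: (up > x) printString.
"stepping up and then down returns to the start"
Transcript showCR: (up previousFloat = x) printString.
"assumed: nextFloat: 1 is one step, nextFloat: 2 is two steps"
Transcript showCR: ((x nextFloat: 1) = up) printString.
Transcript showCR: ((x nextFloat: 2) = up nextFloat) printString.
```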
-
radix
-
answer the radix of the exponent
Typically, but not required to be, this will be 2
(as floats are usually represented as IEEE binary floats)
-
size
-
redefined since reals are kludgy (ByteArray)
-
ulp
-
answer the distance between me and the next representable number;
One exception here: for fmax, the distance to the previous float is returned
Usage example(s):
(1.0 nextFloat:1) storeString
(1.0 ulp) storeString
(10.0 nextFloat:1) storeString
(10.0 ulp) storeString
(-10.0 nextFloat:1) storeString
(-10.0 ulp) storeString
(-10.0 nextFloat:-1) storeString
(67329.234 nextFloat:1) storeString
(67329.234 ulp) storeString
(67329.234 asShortFloat nextFloat:1) storeString
(67329.234 asShortFloat ulp) storeString
Float NaN nextFloat:100000
Float infinity nextFloat:100000
1.0 ulp -> 2.22044604925031E-16
10000000000000000000000.0 ulp -> 2097152.0
34.543 ulp storeString -> '7.1054273576010019E-15'
-34.543 ulp storeString -> '7.1054273576010019E-15'
Float NaN ulp -> nan
0.0 ulp -> 4.94065645841247E-324
0.0 asShortFloat ulp -> 1.401298e-45
Float infinity ulp -> nan
Double fmax previousFloat ulp -> 1.99584030953472E+292
Double fmax ulp -> 1.99584030953472E+292
Double fmin ulp -> 4.94065645841247e-324
Double NaN ulp -> nan
|
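A short sketch relating ulp to nextFloat for a finite value below fmax. It also shows that the spacing grows with the magnitude of the receiver and does not depend on its sign, as the 34.543 examples above suggest.

```smalltalk
| x |
x := 10.0.
"the ulp is the gap to the next representable float"
Transcript showCR: ((x nextFloat - x) = x ulp) printString.
"larger magnitudes have larger gaps"
Transcript showCR: (1.0 ulp < x ulp) printString.
"the gap is the same for the negated value"
Transcript showCR: (x ulp = x negated ulp) printString.
```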
special access
-
partValues: aBlock
-
invoke aBlock with sign, exponent and abs(mantissa)
Usage example(s):
1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
-1.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
-2.0 partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
1.0 asShortFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
1.0 asLongFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
1.0 asLargeFloat partValues:[:sign :exp :mantissa | Transcript showCR:'%1/%2/%3' with:sign with:exp with:mantissa].
|
testing
-
isFinite
-
return true, if the receiver is a finite float (not NaN and not +/-INF)
** This method must be redefined in concrete classes (subclassResponsibility) **
-
isFloat
-
return true, if the receiver is some kind of floating point number;
true is returned here.
Same as #isLimitedPrecisionReal, but a better name ;-)
-
isInfinite
-
return true, if the receiver is an infinite float (+Inf or -Inf).
These are not created by ST/X float operations (they raise an exception);
however, inline C-code could produce them.
Usage example(s):
1.0 isInfinite
(0.0 uncheckedDivide: 0.0) isInfinite
(1.0 uncheckedDivide: 0.0) isInfinite
|
-
isLimitedPrecisionReal
-
return true, if the receiver is some kind of limited precision real (i.e. floating point) number;
true is returned here - the method is redefined from Object.
-
isNaN
-
return true, if the receiver is an invalid float (NaN - not a number).
These are usually not created by ST/X float operations (they raise an exception);
however, inline C-code or proceeded exceptions or reading from a stream
could produce them.
** This method must be redefined in concrete classes (subclassResponsibility) **
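A minimal sketch of how the three tests isFinite, isInfinite and isNaN partition the float values. It uses the Float NaN and Float infinity constants that appear in the usage examples elsewhere in this class.

```smalltalk
| nan inf |
nan := Float NaN.
inf := Float infinity.
Transcript showCR: nan isNaN printString.
Transcript showCR: nan isFinite printString.
Transcript showCR: inf isInfinite printString.
Transcript showCR: inf isFinite printString.
Transcript showCR: inf isNaN printString.
Transcript showCR: 1.0 isFinite printString.
```

According to the descriptions above, a NaN is neither finite nor infinite, and an infinity is neither finite nor a NaN.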
-
isNegativeZero
-
many systems have two floating-point zeros (+0.0 and -0.0)
Usage example(s):
0.0 asLongFloat isNegativeZero
-0.0 asLongFloat isNegativeZero
-1.0 asLongFloat isNegativeZero
1.0 asLongFloat isNegativeZero
0.0 asLargeFloat isNegativeZero
-0.0 asLargeFloat isNegativeZero
|
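The two zeros compare equal under IEEE rules, so isNegativeZero is the way to tell them apart. A small sketch, with the negative zero produced by negating 0.0; the variable names are illustrative.

```smalltalk
| posZero negZero |
posZero := 0.0.
negZero := 0.0 negated.
"IEEE comparison treats both zeros as equal"
Transcript showCR: (posZero = negZero) printString.
Transcript showCR: posZero isNegativeZero printString.
Transcript showCR: negZero isNegativeZero printString.
```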
-
numberOfBits
-
return the size (in bits) of the real;
typically, this is 64 for Floats and 32 for ShortFloats,
but who knows ...
** This method must be redefined in concrete classes (subclassResponsibility) **
-
positive
-
return true if the receiver is greater or equal to zero (not negative)
-
sign
-
return the sign of the receiver (-1, 0 or 1)
Usage example(s):
-1.0 sign
-0.0 sign
1.0 sign
0.0 sign
Infinity infinity sign
Infinity infinity negated sign
|
truncation & rounding
-
ceilingAsFloat
-
for protocol compatibility with floats;
returns the smallest integer which is greater or equal to the receiver as a float
Usage example(s):
0.4 asLongFloat ceilingAsFloat
|
-
floorAsFloat
-
for protocol compatibility with floats;
returns the receiver truncated towards negative infinity as a float
Usage example(s):
0.4 asLongFloat floorAsFloat
|
-
integerAndFractionParts
-
return the integer and the fraction part of the receiver as a pair
of floats (i.e. the result of the modf function).
Adding the parts gives the original value
-
integerPart
-
return a float with value from digits before the decimal point
(i.e. the truncated value)
Usage example(s):
1234.56789 integerPart
1.2345e6 integerPart
12.5 integerPart
-12.5 integerPart
(5/3) integerPart
(-5/3) integerPart
(5/3) truncated
(-5/3) truncated
|
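A short sketch of how integerPart relates to fractionPart and truncated, using one of the values from the examples above. The integer part keeps the sign of the receiver and remains a float, whereas truncated answers an integer.

```smalltalk
| f |
f := -12.5.
Transcript showCR: f integerPart printString.
Transcript showCR: f fractionPart printString.
"both parts add up to the original value"
Transcript showCR: ((f integerPart + f fractionPart) = f) printString.
"same numeric value as truncated, but still a float"
Transcript showCR: (f integerPart = f truncated) printString.
```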
-
roundedAsFloat
-
for protocol compatibility with floats;
returns the receiver rounded to the nearest integer as a float
-
truncatedAsFloat
-
return the receiver truncated towards zero as a long float.
This is much like #truncated, but avoids a (possibly expensive) conversion
of the result to an integer.
It may be useful, if the result is to be further used in another
float-operation.
Usage example(s):
0.4 asLongFloat truncatedAsFloat
|
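The four ...AsFloat variants differ only in rounding direction. The sketch below prints them side by side for a few values chosen to avoid exact ties; the particular values are illustrative.

```smalltalk
#(0.4 -0.4 2.7 -2.7) do: [:f |
    Transcript showCR:
        f printString
        , ' floor: ', f floorAsFloat printString
        , ' ceiling: ', f ceilingAsFloat printString
        , ' rounded: ', f roundedAsFloat printString
        , ' truncated: ', f truncatedAsFloat printString].
```

For positive receivers truncatedAsFloat agrees with floorAsFloat, and for negative receivers it agrees with ceilingAsFloat.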
-
truncatedToPrecision
-
truncates to the precision of the float.
This is slightly different from truncated.
Taking for example 1e32,
the printed representation will be 1e32,
but the actual value, when truncating to an integer
would be 100000003318135351409612647563264.
This is due to the inaccuracy in the least significant bits,
and the way the print-converter compensates for this.
This method tries to generate an integer value which corresponds
to what is seen in the float's printString.
Here, a slow fallback (generating and rescanning the printString)
is provided, which should work on any float number.
Specialized versions in subclasses may be added for more performance
(however, this is probably only used rarely)
Usage example(s):
1e32 asShortFloat truncated
1e32 asShortFloat truncatedToPrecision
1.234e10 asShortFloat truncatedToPrecision
1234e-1 asShortFloat truncatedToPrecision
1e32 truncated
1e32 truncatedToPrecision
1.234e10 truncatedToPrecision
1234e-1 truncatedToPrecision
1e32 asLongFloat truncated
1e32 asLongFloat truncatedToPrecision
1.234e10 asLongFloat truncatedToPrecision
1234e-1 asLongFloat truncatedToPrecision
|
visiting
-
acceptVisitor: aVisitor with: aParameter
-
dispatch for visitor pattern; send #visitFloat:with: to aVisitor.
|