Predictor of DVM-program performance. Detailed design. |
- last edited 22.05.01 -
Appendix 1. Link from field name in output HTML-file to field name in structure _IntervalResult
Field Name | Anchor | inter variable | ||
Efficiency | Effic | Efficiency | ||
Execution time | Exec | Execution_time | ||
Total time | Total | Total_time | ||
Productive time | Ptime | Productive_time | ||
CPU | Ptimec | Productive_CPU_time | ||
SYS | Ptimes | Productive_SYS_time | ||
I/O | Ptimei | IO_time | ||
Lost time | Lost | Lost_time | ||
Insufficient parallelism | Insuf | Insuff_parallelism | ||
USR | iuser | Insuff_parallelism_sys | ||
SYS | isyst | Insuff_parallelism_usr | ||
Communications | comm | Communication | ||
SYN | csyn | Communication_SYNCH | ||
Idle time | idle | Idle | ||
Load imbalance | imbal | Load_imbalance | ||
Synchronization | synch | Synchronization | ||
Time variation | vary | Time_variation | ||
Overlap | over | Overlap | ||
IO | # op | nopi | Num_op_io | |
Communications | comi | IO_comm | ||
Real synch | synchi | IO_synch | ||
Overlap | overi | IO_overlap | ||
Reduction | # op | nopr | Num_op_reduct | |
Communications | comr | Wait_reduction | ||
Real synch | synchr | Reduction_synch | ||
Overlap | overr | Reduction_overlap | ||
Shadow | # op | nops | Num_op_shadow | |
Communications | coms | Wait_shadow | ||
Real synch | synchs | Shadow_synch | ||
Overlap | overs | Shadow_overlap | ||
Remote access | # op | nopa | Num_op_remote | |
Communications | coma | Remote_access | ||
Real synch | syncha | Remote_synch | ||
Overlap | overa | Remote_overlap | ||
Redistribution | # op | nopd | Num_op_redist | |
Communications | comd | Redistribution | ||
Real synch | synchd | Redistribution_synch | ||
Overlap | overd | Redistribution_overlap |
Appendix 2. Definition of auxiliary functions and classes
This appendix describes the functions and classes used to implement the algorithms described in the previous chapter.
// Base class for most of the classes
class Space {
protected:
    long Rank;               // Number of space dimensions
    vector<long> SizeArray;  // Size of each dimension
    vector<long> MultArray;  // Multiplier for each dimension
public:
    Space();
    Space(long ARank, vector<long> ASizeArray, vector<long> MultArray);
    Space(long ARank, long *ASizeArray);
    Space(const Space &);
    ~Space();
    Space & operator= (const Space &x);
    long GetRank();
    long GetSize(long AAxis);
    void GetSI(long LI, vector<long> & SI);
    long GetLI(const vector<long> & SI);
    long GetCenterLI();
    long GetSpecLI(long LI, long dim, int shift);
    long GetLSize();
    long GetNumInDim(long LI, long dimNum);
    long GetDistance(long LI1, long LI2);
};
GetRank | | returns space rank. |
GetSize | | returns the size of the space in the dimension with number AAxis. |
GetSI | | calculates the coordinates SI from the linear index LI. |
GetLI | | calculates the linear index by coordinates in the given space. |
GetCenterLI | | returns the linear index of the element that is the geometric center of the space. |
GetSpecLI | | returns the linear index of the element moved by shift in the dimension dim from the element with linear index LI. |
GetLSize | | returns linear size (number of elements) of the space. |
GetNumInDim | | returns coordinate of the element with linear index LI in the given dimension dimNum. |
GetDistance | | distance between two elements of the space with linear indexes LI1 and LI2. |
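The relation between a linear index and coordinates can be illustrated with a minimal sketch. It is not the Predictor source: the name SpaceSketch, its member names, and the row-major layout (the last dimension varies fastest) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Space; row-major layout is an assumption.
struct SpaceSketch {
    std::vector<long> size;  // size of each dimension (role of SizeArray)
    std::vector<long> mult;  // multiplier of each dimension (role of MultArray)

    explicit SpaceSketch(const std::vector<long>& sizes)
        : size(sizes), mult(sizes.size(), 1) {
        // mult[i] is the product of the sizes of all later dimensions.
        for (std::size_t i = sizes.size(); i-- > 1;) {
            mult[i - 1] = mult[i] * size[i];
        }
    }

    // Counterpart of GetLSize: total number of elements.
    long linearSize() const {
        long n = 1;
        for (long s : size) n *= s;
        return n;
    }

    // Counterpart of GetLI: coordinates -> linear index.
    long toLinear(const std::vector<long>& si) const {
        assert(si.size() == size.size());
        long li = 0;
        for (std::size_t i = 0; i < si.size(); ++i) li += si[i] * mult[i];
        return li;
    }

    // Counterpart of GetSI: linear index -> coordinates.
    std::vector<long> toCoords(long li) const {
        assert(li >= 0 && li < linearSize());
        std::vector<long> si(size.size());
        for (std::size_t i = 0; i < size.size(); ++i) {
            si[i] = li / mult[i];
            li %= mult[i];
        }
        return si;
    }
};
```

Under these assumptions, a 2 x 3 x 4 space has multipliers 12, 4 and 1, so the coordinates (1, 2, 3) correspond to the linear index 23.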
Virtual machine (Processor system) class.
class VM : public Space {
    int MType;      // Distributed processor system type:
                    // 0 - net with bus organization, 1 - transputer system
    double TStart;  // Start time of exchange operation
    double TByte;   // Time to send one byte
public:
    // constructor
    VM(vector<long>& ASizeArray, int AMType, double ATStart, double ATByte,
       double AProcPower);
    ~VM();
    double getTByte();
    double getTStart();
    int getMType();
};
Abstract machine representation class.
class AMView : public Space {
public:
    VM *VM_Dis;                  // Processor system on which the template is mapped
    list<DArray *> AlignArrays;  // List of arrays aligned by the given template
    vector<DistAxis> DistRule;   // Rule by which the template is mapped
                                 // on the processor system
    vector<long> FillArr;        // Array containing the information about how the processor
                                 // system is filled with the template elements
    AMView(long ARank, long *ASizeArray);
    AMView(const AMView &);
    ~AMView();
    void DelDA(DArray *RAln_da);
    void AddDA(DArray *Aln_da);
    void DisAM(VM *AVM_Dis, long AParamCount, long *AAxisArray,
               long *ADistrParamArray);
    double RDisAM(long AParamCount, long *AAxisArray, long *ADistrParamArray,
                  long ANewSign);
    bool IsDistribute();
};
DelDA | - | removes DArray from the list of aligned arrays. |
AddDA | - | adds DArray to the list of the aligned arrays. |
DisAM | - | function that maps the template onto the processor system. The pointer to the processor system, the mapping rule and the array with information about filling the processor system with template elements are initialized according to the function parameters. |
RDisAM | - | function that determines the time spent in exchanges when the template mapping is changed (template redistribution). The algorithm implemented in it is described in 3.2. |
IsDistribute | - | checks if the template is already distributed on the processor system. |
Distributed array class.
class DArray : public Space {
private:
    void PrepareAlign(long& TempRank, long *AAxisArray, long *ACoeffArray,
                      long *AConstArray, vector<AlignAxis>& IniRule);
    long CheckIndex(long *InitIndexArray, long *LastIndexArray, long *StepArray);
public:
    long TypeSize;                // Size of one array element in bytes.
    AMView *AM_Dis;               // Template the array is aligned by.
    vector<AlignAxis> AlignRule;  // Align rule.
    int Repl;                     // Criterion of fully replicated array.
    DArray();
    DArray(long ARank, long *ASizeArray, long ATypeSize);
    DArray(const DArray &);
    ~DArray();
    DArray & operator= (DArray &x);
    void AlnDA(AMView *APattern, long *AAxisArray, long *ACoeffArray,
               long *AConstArray);
    void AlnDA(DArray *APattern, long *AAxisArray, long *ACoeffArray,
               long *AConstArray);
    double RAlnDA(AMView *APattern, long *AAxisArray, long *ACoeffArray,
                  long *AConstArray, long ANewSign);
    double RAlnDA(DArray *APattern, long *AAxisArray, long *ACoeffArray,
                  long *AConstArray, long ANewSign);
    friend double ArrayCopy(DArray *AFromArray, long *AFromInitIndexArray,
                            long *AFromLastIndexArray, long *AFromStepArray,
                            DArray *AToArray, long *AToInitIndexArray,
                            long *AToLastIndexArray, long *AToStepArray,
                            long ACopyRegim);
    long GetMapDim(long arrDim, int &dir);
    bool IsAlign();
};
PrepareAlign | | initializes the rule by which the array is aligned by the template. |
CheckIndex | | returns number of elements in the array section given in the function parameters (0 if it is empty or array indexes are out of bounds). |
AlnDA | | functions that set the position (alignment) of the distributed array and initialize the template pointer. In the first function the template is given directly; in the second it is given indirectly, through a distributed array that acts as the mapping pattern. The first function determines the fully replicated array criterion; the second inherits it from the pattern array. Both functions initialize the align rule using the PrepareAlign function; the second then alters the rule according to how the pattern array is itself aligned (alignment superposition is used to obtain the resulting alignment). |
RAlnDA | | function that determines the time needed for exchanges during the array realignment. Algorithm is described in 3.2. |
ArrayCopy | | function that determines the time needed for exchanges while loading the buffers with remote array elements. Algorithm is described in 3.5. |
GetMapDim | | returns the number of the processor system dimension onto which the array dimension arrDim is ultimately mapped. If the array dimension is replicated along all dimensions of the processor matrix, 0 is returned. Either 1 or -1 is put into dir, according to the direction in which the array dimension is broken down. |
IsAlign | | checks if the array is aligned by the template. |
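The behaviour described for CheckIndex (number of elements in a section, 0 if the section is empty or out of bounds) can be sketched as follows. This is an illustration only: the function name sectionSize, the use of vectors instead of raw arrays, and zero-based indexes are assumptions, not the Predictor implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counterpart of CheckIndex. `size` holds the array size in each
// dimension; indexes are assumed to be zero-based.
long sectionSize(const std::vector<long>& size,
                 const std::vector<long>& init,
                 const std::vector<long>& last,
                 const std::vector<long>& step) {
    if (init.size() != size.size() || last.size() != size.size() ||
        step.size() != size.size()) {
        return 0;  // malformed section description
    }
    long count = 1;
    for (std::size_t d = 0; d < size.size(); ++d) {
        if (step[d] <= 0) return 0;                       // invalid step
        if (init[d] < 0 || last[d] >= size[d]) return 0;  // out of bounds
        if (init[d] > last[d]) return 0;                  // empty section
        // Elements init, init + step, ... that do not exceed last.
        count *= (last[d] - init[d]) / step[d] + 1;
    }
    return count;
}
```

For an 8-element dimension, the section from 0 to 7 with step 2 contains the indexes 0, 2, 4 and 6, so the sketch returns 4.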
Bound group class.
class BoundGroup {
    AMView *amPtr;       // Template by which the arrays with bounds in the group
                         // are aligned
    CommCost boundCost;  // Processor exchange information.
public:
    BoundGroup();
    virtual ~BoundGroup();
    void AddBound(DArray *ADArray, long *ALeftBSizeArray,
                  long *ARightBSizeArray, long ACornerSign);
    double StartB();
};
AddBound | | inclusion of the distributed array bound in the bound group. Algorithm is described in 3.3. |
StartB | | function that determines the time spent in distributed array bound exchanges, with the bounds that are in the group. Algorithm is described in 3.3. |
Reduction variable class.
class RedVar {
public:
    long RedElmSize;    // Size of the reduction variable-array in bytes
    long RedArrLength;  // Number of elements in the reduction variable-array
    long LocElmSize;    // Size of one element of the array with auxiliary information
    RedVar(long ARedElmSize, long ARedArrLength, long ALocElmSize);
    RedVar();
    virtual ~RedVar();
    long GetSize();
};
GetSize | | returns the size of the reduction variable and of the array with auxiliary information in bytes. |
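One plausible reading of GetSize is sketched below. The formula is an assumption (each of the RedArrLength elements carries RedElmSize bytes of value and LocElmSize bytes of auxiliary information); the Predictor source may compute the size differently, and the name RedVarSketch is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for RedVar; the size formula is an assumption.
struct RedVarSketch {
    long redElmSize;    // bytes per element of the reduction variable-array
    long redArrLength;  // number of elements in the reduction variable-array
    long locElmSize;    // bytes of auxiliary information per element (0 if none)

    // Total bytes of the reduction variable plus its auxiliary information.
    long sizeInBytes() const {
        assert(redElmSize >= 0 && redArrLength >= 0 && locElmSize >= 0);
        return redArrLength * (redElmSize + locElmSize);
    }
};
```

With 10 elements of 8 bytes each and 4 bytes of auxiliary information per element, the sketch yields 120 bytes.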
Reduction group class.
class RedGroup {
public:
    VM *vmPtr;                 // Pointer to the processor system
    vector<RedVar *> redVars;  // Array of reduction variables
    long TotalSize;            // Total size of reduction variables in the group with their
                               // auxiliary information, in bytes
    long CentralProc;          // Linear index of the geometrical center of the processor
                               // system
    RedGroup(VM *AvmPtr);
    virtual ~RedGroup();
    void AddRV(RedVar *ARedVar);
    double StartR(DArray *APattern, long ALoopRank, long *AAxisArray);
};
AddRV | | inclusion of the reduction variable in the reduction group. Algorithm is described in 3.4. |
StartR | | function that returns the time spent in exchanges during the reduction operation. Algorithm is described in 3.4. |
Distribution of the array dimension class.
class DistAxis {
public:
    long Attr;   // Distribution type
    long Axis;   // Number of the template dimension
    long PAxis;  // Number of the processor system dimension
    DistAxis(long AAttr, long AAxis, long APAxis);
    DistAxis();
    virtual ~DistAxis();
    DistAxis& operator= (const DistAxis&);
};
Alignment of the distributed array by the template class.
class AlignAxis {
public:
    long Attr;   // Distribution type
    long Axis;   // Number of the array dimension
    long TAxis;  // Number of the template dimension
    long A;      // Coefficient for the index variable of the array in the linear align rule of
                 // the TAxis template dimension
    long B;      // Constant of the linear align rule for the TAxis template dimension
    long Bound;  // Dimension size of the array that acts as a template
                 // during the partial replication of the array being aligned
    AlignAxis(long AAttr, long AAxis, long ATAxis, long AA = 0, long AB = 0,
              long ABound = 0);
    AlignAxis();
    virtual ~AlignAxis();
    AlignAxis& operator= (const AlignAxis&);
};
Shadow edge by one distributed array dimension class.
class DimBound {
public:
    long arrDim;      // Array dimension number
    long vmDim;       // Processor system dimension number
    int dir;          // 1 or -1 according to the break-down direction of the array dimension
    long LeftBSize;   // Width of the left bound for the arrDim array dimension
    long RightBSize;  // Width of the right bound for the arrDim array dimension
    DimBound(long AarrDim, long AvmDim, int Adir, long ALeftBSize,
             long ARightBSize);
    DimBound();
    virtual ~DimBound();
};
Array section class.
class Block {
    vector<LS> LSDim;  // Vector containing the corresponding linear segments for every array
                       // dimension, that describe the section
public:
    Block(vector<LS> &v);
    Block(DArray *da, long ProcLI);
    Block();
    virtual ~Block();
    Block & operator =(const Block & x);
    long GetRank();
    long GetBlockSize();
    long GetBlockSizeMult(long dim);
    long GetBlockSizeMult2(long dim1, long dim2);
    bool IsLeft(long arrDim, long elem);
    bool IsRight(long arrDim, long elem);
    bool IsBoundIn(long *ALeftBSizeArray, long *ARightBSizeArray);
    bool empty();
    friend Block operator^ (Block &x, Block &y);
};
Block | | creates the section of the array da that is located on the processor with linear index ProcLI. |
GetRank | | returns rank of the section. |
GetBlockSize | | number of elements in the section. |
GetBlockSizeMult, GetBlockSizeMult2 | | these functions return the product of the section sizes in all dimensions except the dimensions given in the function call. |
IsLeft, IsRight | | check if the element elem is positioned to the left (right) of the section in the arrDim dimension. |
IsBoundIn | | checks if the distributed array bound is in the given section. |
empty | | checks if the section has no elements. |
Block operator^ | | returns the intersection of the sections given in the function call. |
Linear segment class.
class LS {
public:
    long Lower;  // Lower index value
    long Upper;  // Upper index value
    LS(long ALower, long AUpper);
    LS();
    virtual ~LS();
    long GetLSSize();
    void transform(long A, long B, long daDimSize);
    bool IsLeft(long elem);
    bool IsRight(long elem);
    bool IsBoundIn(long ALeftBSize, long ARightBSize);
    bool empty();
    LS operator^ (LS &x);
};
GetLSSize | | returns the size of the linear segment. |
transform | | transforms the linear segment of the template into the linear segment of the distributed array aligned by the given template. |
IsLeft, IsRight | | check if the element elem is to the left (right) of the segment. |
IsBoundIn | | checks if the given bound is within the bounds of the segment. |
empty | | checks if there are no elements in the segment. |
LS operator^ | | segment intersection operator. |
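The segment operations listed above can be illustrated with a small sketch. It assumes a closed index range and treats a segment whose lower bound exceeds its upper bound as empty; the names Segment and intersect are hypothetical and do not come from the Predictor source.

```cpp
#include <algorithm>

// Hypothetical stand-in for LS: a closed index range [lower, upper].
struct Segment {
    long lower;
    long upper;

    // A segment with lower > upper holds no elements.
    bool empty() const { return lower > upper; }

    // Counterpart of GetLSSize: number of indexes in the segment.
    long size() const { return empty() ? 0 : upper - lower + 1; }

    // Counterparts of IsLeft / IsRight: position of elem relative to the segment.
    bool isLeft(long elem) const { return elem < lower; }
    bool isRight(long elem) const { return elem > upper; }
};

// Counterpart of operator^: the overlap of two segments (possibly empty).
inline Segment intersect(const Segment& a, const Segment& b) {
    return Segment{std::max(a.lower, b.lower), std::min(a.upper, b.upper)};
}
```

Intersecting [0, 7] with [5, 9] gives [5, 7], which holds three elements, while [0, 2] and [5, 9] have an empty intersection. A section intersection, as in Block operator^, could then be formed dimension by dimension from such segment intersections.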
Evaluation of interprocessor exchanges class.
class CommCost {
public:
    Dim2Array transfer;  // Array that contains the information about the number of bytes
                         // transferred between two processors
    VM *vm;              // Pointer to the processor system
    CommCost(VM *Avm);
    CommCost();
    virtual ~CommCost();
    CommCost & operator =(const CommCost &);
    double GetCost();
    void Update(DArray *oldDA, DArray *newDA);
    void BoundUpdate(DArray *daPtr, vector<DimBound> & dimInfo, bool IsConer);
    void CopyUpdate(DArray *FromArray, Block & readBlock);
};
GetCost | | returns the time spent in interprocessor exchanges inside the system. Algorithm is described in 3.2. |
Update | | function that alters the transfer array according to the exchanges between the processors that occur during redistribution of the array. Algorithm implemented is described in 3.2. |
BoundUpdate | | function that changes the transfer array according to transfers that occur during the given distributed array bound exchange. Algorithm is described in 3.3. |
CopyUpdate | | function that changes the transfer array according to exchanges that occur during the replication of the readBlock section of the FromArray by all the processors. |
Appendix 3. Main functions of time extrapolation
Constructor of the Virtual machine object
VM::VM( vector<long> ASizeArray, int AMType, double ATStart, double ATByte, double AProcPower );
ASizeArray | | vector, element in i-th position is the size of the given processor system in dimension i + 1 (0 ≤ i ≤ ARank - 1); |
AMType | | type of the distributed processor system (0 - net with bus organization, 1 - transputer system); |
ATStart | | start time of the exchange operation; |
ATByte | | time to send one byte; |
AProcPower | | relative processor power. |
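How ATStart and ATByte might combine into the cost of a single message can be sketched as below. The linear model (a fixed start-up time plus a per-byte time) is an assumption made for illustration; the exchange-time algorithms actually used by the Predictor are those of chapter 3, and the function name messageTime is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: linear cost model for one message, assuming
// time = start-up time + bytes * time per byte.
double messageTime(double tStart, double tByte, long bytes) {
    assert(bytes >= 0);
    return tStart + tByte * static_cast<double>(bytes);
}
```

For example, with a start time of 10.0 and a per-byte time of 0.5 (in the same time unit), a 100-byte message costs 60.0 under this model.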
Constructor of the Abstract machine representation object
AMView::AMView( vector< long> ASizeArray );
ASizeArray | | vector, element in i-th position is the size of the template in dimension i+1 (0 ≤ i ≤ ARank - 1). |
Template mapping
void AMView::DisAM( VM *AVM_Dis, vector<long> AAxisArray,
                    vector<long> *ADistrParamArray );
AVM_Dis | | pointer to the processor system on which the template is mapped; |
AAxisArray | | vector, element in j-th position is the number of the template dimension which is used in the mapping rule for (j+1)-th processor system dimension; |
ADistrParamArray | | ignored. Only two mapping rules are provided (see the Lib-DVM documentation), and in the first rule the block size is calculated rather than taken from ADistrParamArray. |
Task of redistributing the template on the processor system and evaluating the time of the redistribution.
double AMView::RDisAM( vector<long> AAxisArray,
                       vector<long> ADistrParamArray, long ANewSign );
AAxisArray | | vector, element in j-th position is the number of the template dimension which is used in the mapping rule for (j+1)-th processor system dimension; |
ADistrParamArray | | ignored. Only two mapping rules are provided (see the Lib-DVM documentation), and in the first rule the block size is calculated rather than taken from ADistrParamArray; |
ANewSign | | flag of updating contents of the redistributed arrays, active if value is 1. |
Constructor of the Distributed array object
DArray::DArray( vector<long> ASizeArray,
                vector<long> AlowShdWidthArray,
                vector<long> AhiShdWidthArray, long ATypeSize );
ASizeArray | | vector, element in i-th position contains the size of the array being created, in dimension i+1 (0 ≤ i ≤ ARank - 1). |
AlowShdWidthArray | | vector, element in i-th position contains the width of the left boundary, in dimension i+1. |
AhiShdWidthArray | | vector, element in i-th position contains the width of the right boundary, in dimension i+1. |
ATypeSize | | size of one array element in bytes. |
Distributed array alignment
void DArray::AlnDA( AMView *APattern, vector<long> AAxisArray,
                    vector<long> ACoeffArray, vector<long> AConstArray );
void DArray::AlnDA( DArray *APattern, vector<long> AAxisArray,
                    vector<long> ACoeffArray, vector<long> AConstArray );
APattern | | pointer to the align pattern. |
AAxisArray | | vector, element in j-th position contains number of the index variable (number of the dimension) of the distributed array for the linear align rule of the (j+1)-th pattern dimension. |
ACoeffArray | | vector, element in j-th position contains the coefficient for the index variable of the distributed array in the linear align rule of the (j+1)-th pattern dimension. |
AConstArray | | vector, element in j-th position contains constant for the linear align rule of the (j+1)-th pattern dimension. |
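The linear align rule described by these three vectors can be sketched as follows: for the (j+1)-th pattern dimension, the pattern index is ACoeffArray[j] * I + AConstArray[j], where I is the index of the array dimension named by AAxisArray[j]. The helper name alignIndex and two conventions are assumptions of the example: array dimensions are numbered from 1, and 0 in the axis vector marks a pattern dimension that depends on no array index, so only the constant is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper that applies a linear align rule to one array element.
// Assumed conventions: axis[j] is a 1-based array dimension number, and
// axis[j] == 0 means the pattern index is just the constant.
std::vector<long> alignIndex(const std::vector<long>& arrayIndex,
                             const std::vector<long>& axis,
                             const std::vector<long>& coeff,
                             const std::vector<long>& constant) {
    assert(axis.size() == coeff.size() && axis.size() == constant.size());
    std::vector<long> pattern(axis.size());
    for (std::size_t j = 0; j < axis.size(); ++j) {
        const long k = axis[j];
        assert(k >= 0 && static_cast<std::size_t>(k) <= arrayIndex.size());
        pattern[j] = (k == 0)
                         ? constant[j]
                         : coeff[j] * arrayIndex[static_cast<std::size_t>(k) - 1] + constant[j];
    }
    return pattern;
}
```

With the axis vector {1, 0}, coefficients {1, 0} and constants {-1, 1}, the array element with index 5 is aligned with the pattern element (4, 1).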
Realignment of the distributed array. Evaluation of the time needed to perform this operation.
double DArray::RAlnDA( AMView *APattern, vector<long> AAxisArray,
                       vector<long> ACoeffArray, vector<long> AConstArray,
                       long ANewSign );
double DArray::RAlnDA( DArray *APattern, vector<long> AAxisArray,
                       vector<long> ACoeffArray, vector<long> AConstArray,
                       long ANewSign );
APattern | | pointer to the align pattern (array or template). |
AAxisArray | | vector, element in j-th position contains number of the index variable (number of the dimension) of the distributed array for the linear align rule of the (j+1)-th pattern dimension. |
ACoeffArray | | vector, element in j-th position contains the coefficient for the index variable of the distributed array in the linear align rule of the (j+1)-th pattern dimension. |
AConstArray | | vector, element in j-th position contains constant for the linear align rule of the (j+1)-th pattern dimension. |
ANewSign | | flag of updating contents of the redistributed array, active if value is 1. |
The function returns the time of the array realignment.
Constructor of the Parallel loop object.
ParLoop::ParLoop( long ARank );
ARank | | rank of the parallel loop. |
Creation of the parallel loop.
void ParLoop::MapPL( AMView *APattern, vector<long> AAxisArray,
                     vector<long> ACoeffArray, vector<long> AConstArray,
                     vector<long> AInitIndexArray,
                     vector<long> ALastIndexArray, vector<long> AStepArray );
void ParLoop::MapPL( DArray *APattern, vector<long> AAxisArray,
                     vector<long> ACoeffArray, vector<long> AConstArray,
                     vector<long> AInitIndexArray,
                     vector<long> ALastIndexArray, vector<long> AStepArray );
APattern | | pointer to the parallel loop pattern. |
AAxisArray | | vector, element in j-th position contains number of the index variable (number of the dimension) of the parallel loop for the linear align rule of the (j+1)-th pattern dimension. |
ACoeffArray | | vector, element in j-th position contains the coefficient for the index variable of the parallel loop in the linear align rule of the (j+1)-th pattern dimension. |
AConstArray | | vector, element in j-th position contains constant for the linear align rule of the (j+1)-th pattern dimension. |
AInitIndexArray | | vector, element in i-th position contains start value for the index variable of the (i+1)-th parallel loop dimension. |
ALastIndexArray | | vector, element in i-th position contains end value for the index variable of the (i+1)-th parallel loop dimension. |
AStepArray | | vector, element in i-th position contains step value for the index variable of the (i+1)-th parallel loop dimension. |
Parallel loop mapping.
void ParLoop::ExFirst( ParLoop *AParLoop, BoundGroup *ABoundGroup );
AParLoop | | pointer to the parallel loop. |
ABoundGroup | | pointer to the group of bounds that must be exchanged after calculating the exported elements of the local parts of the distributed arrays. |
Set flag of changed order of loop iteration execution.
void ParLoop::ImLast( ParLoop *AParLoop, BoundGroup *ABoundGroup );
AParLoop | | pointer to the parallel loop. |
ABoundGroup | | pointer to the group of bounds that must be exchanged after calculating the exported elements of the local parts of the distributed arrays. The function sets flag of changed order of loop iteration execution. |
Calculation of the time spent for exchanges while loading buffers with remote array elements
double ArrayCopy( DArray *AFromArray, vector<long> AFromInitIndexArray,
                  vector<long> AFromLastIndexArray,
                  vector<long> AFromStepArray, DArray *AToArray,
                  vector<long> AToInitIndexArray,
                  vector<long> AToLastIndexArray,
                  vector<long> AToStepArray, long ACopyRegim );
AFromArray | | pointer to the distributed array being read. |
AFromInitIndexArray | | vector, element in i-th position contains start value for the index variable of the (i+1)-th dimension of the array being read. |
AFromLastIndexArray | | vector, element in i-th position contains end value for the index variable of the (i+1)-th dimension of the array being read. |
AFromStepArray | | vector, element in i-th position contains step value for the index variable of the (i+1)-th dimension of the array being read. |
AToArray | | pointer to the distributed array being written. |
AToInitIndexArray | | vector, element in j-th position contains start value for the index variable of the (j+1)-th dimension of the array being written. |
AToLastIndexArray | | vector, element in j-th position contains end value for the index variable of the (j+1)-th dimension of the array being written. |
AToStepArray | | vector, element in j-th position contains step value for the index variable of the (j+1)-th dimension of the array being written. |
ACopyRegim | | copy mode. |
The function returns the required time.
Constructor of the Edge group object
BoundGroup::BoundGroup( );
Creation of the edge group. An empty edge group is created (does not contain any edge).
Add array edges to the group.
void BoundGroup::AddBound( DArray *ADArray,
                           vector<long> ALeftBSizeArray,
                           vector<long> ARightBSizeArray, long ACornerSign );
ADArray | | pointer to the distributed array. |
ALeftBSizeArray | | vector, element in i-th position contains width of the low edge of (i+1)-th dimension of the array. |
ARightBSizeArray | | vector, element in i-th position contains width of the high edge of (i+1)-th dimension of the array. |
ACornerSign | | flag of including corner elements in the edge. |
Calculation of the time spent for exchanges of distributed array edges included in the group.
double BoundGroup::StartB( );
The function returns the required time.
Constructor of the Reduction variable object
RedVar::RedVar( long ARedElmSize, long ARedArrLength, long ALocElmSize);
ARedElmSize | | size of one element of the reduction variable-array in bytes. |
ARedArrLength | | number of elements in the reduction variable-array. |
ALocElmSize | | size of one element of the array with auxiliary information in bytes. |
Constructor of the Reduction group object
RedGroup::RedGroup( VM *AvmPtr );
AvmPtr | | pointer to the processor system. |
Creation of the reduction group. An empty reduction group is created (does not contain any reduction variable).
Add a reduction variable to the reduction group.
void RedGroup::AddRV( RedVar *ARedVar );
ARedVar | | pointer to the reduction variable. |
Calculation of the time spent for exchanges during reduction operation execution.
double RedGroup::StartR( ParLoop *AParLoop );
AParLoop | | pointer to the parallel loop in which values of reduction variables of the given group are calculated. |
Appendix 4. Trace fragments and parameters of Lib-DVM functions simulated by Predictor
CREATE AN ABSTRACT MACHINE REPRESENTATION
getamr_ 3.3 Getting a pointer to an element of the abstract machine representation
AMRef getamr_ (AMViewRef *AMViewRefPtr, long IndexArray[]);
*AMViewRefPtr | | pointer to the abstract machine representation. |
IndexArray | | array, i-th element contains the index value of the requested element (abstract machine) on (i+1)-th dimension. |
call_getamr_ TIME=0.000000 LINE=6 FILE=tasks.fdv
AMViewRefPtr=4dff90; AMViewRef=9024c0; IndexArray[0]=0;
ret_getamr_ TIME=0.000000 LINE=6 FILE=tasks.fdv
AMRef=903350;
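Every trace fragment in this appendix opens with a header of the form call_<name> or ret_<name> followed by TIME=, LINE= and FILE= fields. A minimal sketch of reading such a header is given below. It assumes the fields are separated by whitespace and appear in exactly this order, as in the fragments shown here; the names TraceHeader and parseTraceHeader are hypothetical and do not belong to the Predictor or to Lib-DVM.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical record for the header of one trace entry.
struct TraceHeader {
    bool isCall = false;  // true for call_<name>, false for ret_<name>
    std::string func;     // function name, e.g. "getamr_"
    double time = 0.0;    // value of TIME=
    long line = 0;        // value of LINE=
    std::string file;     // value of FILE=
};

// Parses the first four whitespace-separated tokens of `text`.
// Returns false if they do not form a call_/ret_ header.
bool parseTraceHeader(const std::string& text, TraceHeader& out) {
    std::istringstream in(text);
    std::string tag, timeTok, lineTok, fileTok;
    if (!(in >> tag >> timeTok >> lineTok >> fileTok)) return false;

    if (tag.compare(0, 5, "call_") == 0) {
        out.isCall = true;
        out.func = tag.substr(5);
    } else if (tag.compare(0, 4, "ret_") == 0) {
        out.isCall = false;
        out.func = tag.substr(4);
    } else {
        return false;
    }

    if (timeTok.compare(0, 5, "TIME=") != 0 ||
        lineTok.compare(0, 5, "LINE=") != 0 ||
        fileTok.compare(0, 5, "FILE=") != 0) {
        return false;
    }
    try {
        out.time = std::stod(timeTok.substr(5));
        out.line = std::stol(lineTok.substr(5));
    } catch (const std::exception&) {
        return false;  // non-numeric TIME or LINE value
    }
    out.file = fileTok.substr(5);
    return true;
}
```

The parameters that follow the header (for example AMViewRefPtr=4dff90;) differ from function to function and are left unparsed in this sketch.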
MULTIPROCESSOR SYSTEMS
genblk_ Weights of multiprocessor system elements
long genblk_( PSRef *PSRefPtr, AMViewRef *AMViewRefPtr,
              AddrType AxisWeightAddr[], long *AxisCountPtr,
              long *DoubleSignPtr );
*PSRefPtr | | pointer to multiprocessor system, weights are set for elements of this system. |
*AMViewRefPtr | | pointer to the representation of multiprocessor system, weights of coordinates will be used while mapping the multiprocessor system on the given processor system. |
AxisWeightAddr[] | | weights of processor coordinates are defined for each dimension of processor system. |
*AxisCountPtr | | (nonnegative number) defines the number of elements in AxisWeightAddr array. |
*DoubleSignPtr | | non-zero flag of representation of processor weight coordinates as real positive numbers (double). |
call_genblk_ TIME=0.000000 LINE=7 FILE=gausgb.fdv
PSRefPtr=4d4c48; PSRef=8417d0; AMViewRefPtr=4d4c60; AMViewRef=842860; AxisCount=1; DoubleSign=0
AxisWeightAddr[0][0] = 3
ret_genblk_ TIME=0.000000 LINE=7 FILE=gausgb.fdv
crtps_ 4.2 Create subsystem of the given multiprocessor system
PSRef crtps_ ( PSRef *PSRefPtr, long InitIndexArray[], long LastIndexArray[],
               long *StaticSignPtr );
*PSRefPtr | | pointer to the processor system (source), its subsystem is to be created. |
InitIndexArray | | array, i-th element contains the start value of the source processor system on (i+1)-th dimension. |
LastIndexArray | | array, i-th element contains the end value of the source processor system on (i+1)-th dimension. |
*StaticSignPtr | | flag of static subsystem creation. |
call_crtps_ TIME=0.000000 LINE=15 FILE=tasks.fdv
PSRefPtr=4ded68; PSRef=902450; StaticSign=0;
InitIndexArray[0]=0; LastIndexArray[0]=0;
SizeArray[0]=1; CoordWeight[0]= 1.00(1.00)
ret_crtps_ TIME=0.000000 LINE=15 FILE=tasks.fdv
PSRef=903950;
psview_ 4.3 Reconfiguration of multiprocessor system
PSRef psview_ ( PSRef *PSRefPtr, long *RankPtr, long SizeArray[],
                long *StaticSignPtr );
*PSRefPtr | | pointer to the source processor system to be reconfigured. |
*RankPtr | | rank of resulting processor system. |
SizeArray | | array, i-th element contains the size of the resulting processor system on (i+1)-th dimension. |
*StaticSignPtr | | flag of static resulting processor system. |
call_psview_ TIME=0.000000 LINE=6 FILE=tasks.fdv
PSRefPtr=4dff84; PSRef=901330; Rank=1; StaticSign=0;
SizeArray[0]=1;
SizeArray[0]=1; CoordWeight[0]= 1.00(1.00)
ret_psview_ TIME=0.000000 LINE=6 FILE=tasks.fdv
PSRef=902450;
MAPPING DISTRIBUTED ARRAY
getamv_ 7.8 Getting a pointer to the abstract machine representation on which the given distributed array is mapped
AMViewRef getamv_ (long * ArrayHeader);
ArrayHeader | | header of the distributed array. |
call_getamv_ TIME=0.000000 LINE=16 FILE=tasks.fdv
ArrayHeader=4dfee8; ArrayHandlePtr=903530;
ret_getamv_ TIME=0.000000 LINE=16 FILE=tasks.fdv
AMViewRef=0;
PROGRAM AS AN AGGREGATE OF SUBTASKS EXECUTED IN PARALLEL
mapam_ 10.1 Mapping an abstract machine (create subtask)
long mapam_ (AMRef *AMRefPtr, PSRef *PSRefPtr );
*AMRefPtr | | pointer to the abstract machine to be mapped. |
*PSRefPtr | | pointer to processor subsystem determining processors allocated for the abstract machine (domain of the created subtask execution). |
call_mapam_ TIME=0.000000 LINE=51 FILE=tsk_ra.cdv
AMRefPtr=4b3cc0; AMRef=823210; PSRefPtr=4b3ec4; PSRef=8231a0;
ret_mapam_ TIME=0.000000 LINE=51 FILE=tsk_ra.cdv
runam_ 10.2 Start of the subtask execution (activation, start)
long runam_ (AMRef *AMRefPtr);
*AMRefPtr | | pointer to the abstract machine of the started subtask. |
call_runam_ TIME=0.000000 LINE=102 FILE=tsk_ra.cdv
AMRefPtr=4b3cc0; AMRef=823210;
ret_runam_ TIME=0.000000 LINE=102 FILE=tsk_ra.cdv
stopam_ 10.3 End of execution of the current subtask (stop)
long stopam_ (void);
call_stopam_ TIME=0.000000 LINE=104 FILE=tsk_ra.cdv
ret_stopam_ TIME=0.000000 LINE=104 FILE=tsk_ra.cdv
REDUCTION
strtrd_ 11.5 Start of reduction group
long strtrd_ (RedGroupRef *RedGroupRefPtr);
*RedGroupRefPtr | | pointer to reduction group. |
call_strtrd_ TIME=0.000000 LINE=129 FILE=tsk_ra.cdv
RedGroupRefPtr=6ffcdc; RedGroupRef=8291f0;
rf_MAX; rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000
ret_strtrd_ TIME=0.000000 LINE=129 FILE=tsk_ra.cdv
waitrd_ 11.6 Waiting for the reduction completion
long waitrd_ (RedGroupRef *RedGroupRefPtr);
*RedGroupRefPtr | | pointer to the reduction group. |
call_waitrd_ TIME=0.000000 LINE=129 FILE=tsk_ra.cdv
RedGroupRefPtr=6ffcdc; RedGroupRef=8291f0;
rf_MAX; rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000
rf_MAX; rt_DOUBLE; RVAddr = 6ffd24; RVVal = 7.000000
ret_waitrd_ TIME=0.000000 LINE=129 FILE=tsk_ra.cdv
DISTRIBUTED ARRAY EDGE EXCHANGE
recvsh_ 12.4 Initialization of receiving imported elements of the given edge group
long recvsh_(ShadowGroupRef *ShadowGroupRefPtr);
*ShadowGroupRefPtr | | pointer to edge group. |
call_recvsh_ TIME=0.000000 LINE=20 FILE=sor.fdv
ShadowGroupRefPtr=4cf6b8; ShadowGroupRef=8433c0;
ret_recvsh_ TIME=0.000000 LINE=20 FILE=sor.fdv
sendsh_ 12.5 Initialization of sending exported elements of the given edge group
long sendsh_(ShadowGroupRef *ShadowGroupRefPtr);
*ShadowGroupRefPtr | | pointer to edge group. |
call_sendsh_ TIME=0.000000 LINE=29 FILE=sor.fdv
ShadowGroupRefPtr=4cf6b8; ShadowGroupRef=8433c0;
ret_sendsh_ TIME=0.000000 LINE=29 FILE=sor.fdv
REGULAR ACCESS TO REMOTE DATA
crtrbl_ 14.1 Create buffer of distributed array remote elements
long crtrbl_( long RemArrayHeader[], long BufferHeader[], void *BasePtr,
              long *StaticSignPtr, LoopRef *LoopRefPtr, long AxisArray[],
              long CoeffArray[], long ConstArray[] );
RemArrayHeader | | header of remote distributed array. |
BufferHeader | | header of the buffer for remote elements. |
BasePtr | | base pointer for access to the buffer of remote elements. |
*StaticSignPtr | | flag of static buffer creation. |
*LoopRefPtr | | pointer to the parallel loop, where remote array elements from the buffer are required. |
AxisArray | | array, i-th element contains dimension number of the parallel loop (k(i+1)), corresponding to (i+1)-th dimension of the remote array. |
CoeffArray | | array, i-th element contains coefficient of the index variable of linear retrieval rule for (i+1)-th dimension of the remote array A(i+1). |
ConstArray | | array, i-th element contains constant of linear retrieval rule for (i+1)-th dimension of the remote array B(i+1). |
call_crtrbl_ TIME=0.000000 LINE=45 FILE=tasks.fdv
RemArrayHeader=4dfd2c; RemArrayHandlePtr=9057c0; BufferHeader=4dfd48; BasePtr=4e1200; StaticSign=1; LoopRefPtr=4dffd0; LoopRef=906b70;
AxisArray[0]=1; AxisArray[1]=0;
CoeffArray[0]=1; CoeffArray[1]=0;
ConstArray[0]=-1; ConstArray[1]=1;
SizeArray[0]=8;
LowShdWidthArray[0]=0;
HiShdWidthArray[0]=0;
Local[0]: Lower=0 Upper=7 Size=8 Step=1
ret_crtrbl_ TIME=0.000000 LINE=45 FILE=tasks.fdv
BufferHandlePtr=906e70; IsLocal=1
loadrb_ 14.2 Start of loading the buffer of distributed array remote elements
long loadrb_ (long BufferHeader[], long *RenewSignPtr);
BufferHeader | | header of remote element buffer. |
*RenewSignPtr | | flag of reloading a buffer that has already been loaded. |
call_loadrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv BufferHeader=4dfd48; BufferHandlePtr=906e70; RenewSign=0; FromInitIndexArray[0]=0; FromInitIndexArray[1]=1; FromLastIndexArray[0]=7; FromLastIndexArray[1]=1; FromStepArray[0]=1; FromStepArray[1]=1; ToInitIndexArray[0]=0; ToLastIndexArray[0]=7; ToStepArray[0]=1; ResInitIndexArray[0]=0; ResInitIndexArray[1]=1; ResLastIndexArray[0]=7; ResLastIndexArray[1]=1; ResStepArray[0]=1; ResStepArray[1]=1; ResInitIndexArray[0]=0; ResLastIndexArray[0]=7; ResStepArray[0]=1; ret_loadrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv
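The trace fragments in this section list the call parameters as NAME=VALUE tokens separated by blanks, most of them terminated by a semicolon. As an illustration of how such a record could be split into its parameters, here is a minimal sketch; the function parseTraceParams is hypothetical and does not reproduce the trace reader of the Predictor.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: collects the NAME=VALUE tokens of one trace record.
// A trailing ';' is dropped from the value. Tokens without '=' (for example
// "call_loadrb_") are skipped. If a name occurs more than once, the first
// occurrence is kept.
std::map<std::string, std::string> parseTraceParams(const std::string& record)
{
    std::map<std::string, std::string> params;
    std::istringstream in(record);
    std::string token;
    while (in >> token) {
        std::string::size_type eq = token.find('=');
        if (eq == std::string::npos)
            continue;
        std::string value = token.substr(eq + 1);
        if (!value.empty() && value.back() == ';')
            value.pop_back();
        params.emplace(token.substr(0, eq), value);
    }
    return params;
}
```

Applied to the beginning of the loadrb_ record above, the sketch yields, among others, LINE=45, BufferHeader=4dfd48 and RenewSign=0. Because a record in this section contains both the call_ and the ret_ part, names such as TIME occur twice, and the sketch keeps the value from the call_ part.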
waitrb_ 14.3 Waiting for completion of loading buffer of distributed array remote elements
long waitrb_ (long BufferHeader[]);
BufferHeader | | header of remote element buffer. |
call_waitrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv BufferHeader=4dfd48; BufferHandlePtr=906e70; ret_waitrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv
crtbg_ 14.6 Create group of remote element buffers
RegularAccessGroupRef crtbg_(long *StaticSignPtr, long *DelBufSignPtr);
*StaticSignPtr | | flag of static buffer group creation. |
*DelBufSignPtr | | flag of deleting all buffers of the group when the group is deleted. |
call_crtbg_ TIME=0.000000 LINE=43 FILE=tasks.fdv StaticSign=0; DelBufSign=1; ret_crtbg_ TIME=0.000000 LINE=43 FILE=tasks.fdv RegularAccessGroupRef=906310;
insrb_ Insert remote element buffer in the group
long insrb_(RegularAccessGroupRef *RegularAccessGroupRefPtr, long BufferHeader[]);
*RegularAccessGroupRefPtr | | pointer to the buffer group. |
BufferHeader | | header of the buffer to be inserted. |
call_insrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310; BufferHeader=4dfd48; BufferHeader[0]=906e70 ret_insrb_ TIME=0.000000 LINE=45 FILE=tasks.fdv
loadbg_ Start of loading remote element buffers of the given group
long loadbg_(RegularAccessGroupRef *RegularAccessGroupRefPtr, long *RenewSignPtr);
*RegularAccessGroupRefPtr | | pointer to the buffer group. |
*RenewSignPtr | | flag of reloading a buffer group that has already been loaded. |
call_loadbg_ TIME=0.000000 LINE=43 FILE=tasks.fdv RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310; RenewSign=1 FromInitIndexArray[0]=0; FromInitIndexArray[1]=1; FromLastIndexArray[0]=7; FromLastIndexArray[1]=1; FromStepArray[0]=1; FromStepArray[1]=1; ToInitIndexArray[0]=0; ToLastIndexArray[0]=7; ToStepArray[0]=1; ResInitIndexArray[0]=0; ResInitIndexArray[1]=1; ResLastIndexArray[0]=7; ResLastIndexArray[1]=1; ResStepArray[0]=1; ResStepArray[1]=1; ResInitIndexArray[0]=0; ResLastIndexArray[0]=7; ResStepArray[0]=1; FromInitIndexArray[0]=0; FromInitIndexArray[1]=3; FromLastIndexArray[0]=7; FromLastIndexArray[1]=3; FromStepArray[0]=1; FromStepArray[1]=1; ToInitIndexArray[0]=0; ToLastIndexArray[0]=7; ToStepArray[0]=1; ResInitIndexArray[0]=0; ResInitIndexArray[1]=3; ResLastIndexArray[0]=7; ResLastIndexArray[1]=3; ResStepArray[0]=1; ResStepArray[1]=1; ResInitIndexArray[0]=0; ResLastIndexArray[0]=7; ResStepArray[0]=1; ret_loadbg_ TIME=0.010000 LINE=43 FILE=tasks.fdv
waitbg_ Waiting for the loading completion of the remote element buffers of the given group
long waitbg_ (RegularAccessGroupRef *RegularAccessGroupRefPtr);
*RegularAccessGroupRefPtr | | pointer to the buffer group. |
call_waitbg_ TIME=0.000000 LINE=45 FILE=tasks.fdv RegularAccessGroupRefPtr=4e1210; RegularAccessGroupRef=906310; ret_waitbg_ TIME=0.000000 LINE=45 FILE=tasks.fdv
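The group functions above are used in a fixed order: a group is created (crtbg_), buffers are inserted into it (insrb_), loading is started (loadbg_) and its completion is awaited (waitbg_). The following minimal sketch models that order as a small state machine. The class BufferGroupModel is hypothetical and is not Lib-DVM code, and the rule that a wait is valid only after a load has been started is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model, not Lib-DVM code: tracks the state of one buffer group.
class BufferGroupModel {
public:
    enum State { Created, Loading, Loaded };

    // Mirrors insrb_: registers a buffer, identified here by its header address.
    void insert(unsigned long bufferHeader) { buffers.push_back(bufferHeader); }

    // Mirrors loadbg_: starts loading all buffers of the group.
    void startLoad() {
        if (state == Loading)
            throw std::logic_error("load already in progress");
        state = Loading;
    }

    // Mirrors waitbg_: assumed to be valid only after a load has been started.
    void wait() {
        if (state != Loading)
            throw std::logic_error("wait without a started load");
        state = Loaded;
    }

    State getState() const { return state; }
    std::size_t size() const { return buffers.size(); }

private:
    State state = Created;
    std::vector<unsigned long> buffers;
};
```

A model of this kind lets a trace reader detect records that arrive out of order, for example a waitbg_ record with no preceding loadbg_ record for the same group.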
1. V.E.Denisov, V.N.Iliakov, N.V.Kovaleva, V.A.Krukov. Debugging DVM-program efficiency. Keldysh Institute of Applied Mathematics, Russian Academy of Science. Preprint N74, 1998.