# Tutorial 2: Understanding the multi-block structure

The code structure of Palabos programs is driven by the duality between atomic-blocks, which represent regular data-arrays, and complex multi-blocks. Thanks to a practically identical interface, they appear to the user on a seemingly equal footing. In reality, however, there exists a hierarchical relationship bewteen them. The multi-block is actually a composite construct of several adjacent blocks and uses itself atomic-blocks for its internal implementation. Atomic- and multi-blocks come in three flavors: the (multi-) block-lattice which holds lattice Boltzmann variables, the (multi-) scalar-field for scalar data-arrays, and the (multi-) tensor-field for vector- or tensor-valued data-arrays.

Tutorial 2.1 exploits the similarity of interface between atomic-blocks and multi-blocks, and rewrites a program from Tutorial 1, replacing the multi block-lattice by a plain block-lattice. Tutorial 2.2 provides further insight in the multi-block and gives the user the possibility to construct such an object manually by explicitly specifying the position of the consisting atomic-blocks. In tutorial 2.3, a multi-block (or for all means, an atomic-block) is manipulated through the explicit use of so-called data processors. They provide explicit access to atomic-block cells inside a multi-block, even if the internal structure of the multi-block is unknown to the user. Tutorial 2.4 finally explains how a program is parallelized (again manually for didactic purposes) by attributing the components of a multi-block to different threads.

Warning

The purpose of Tutorial 2 is to provide deeper insight into the mechanisms of Palabos, and not to establish good coding practice. As already mentioned, you should in practice always prefer multi-blocks over atomic-blocks at the end-user level, in spite of the counter-example shown in Tutorial 2.1.

## Tutorial 2.1: Formulating a program with an atomic-block

In Tutorial 1.5: Boundary conditions, a simulation for a 2D Poiseuille flow was presented, which used the data structure MultiBlock2D to hold the data. This is the right way of doing, because it is recommended to use multi-block structures for practically all purposes in end-user programs. Strictly speaking, a non-parallel version of this program could however also be coded using a simpler data structure, because of the simple, rectangular shape of the domain. Such a conversion is straightforward in Palabos, because most of the code is generic and works identically for atomic-blocks and multi-blocks. As it can be seen on the following code, also found in the file tutorial_2_1.cpp, it is sufficient to replace the keyword MultiBlockLattice2D by BlockLattice2D at two places:

1. /* Code 2.1 in the Palabos tutorial

2. */

3.

4. // ... Definition of the Poiseuille profile, and instantiation of the geometry are

5. // exactly identical with a multi-block or an atomic-block

6.

7.

8. // Multi-Block version of tutorial 1.5:

9. // void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)

10. // { ...

11.

12. // Atomic-Block version of tutorial 2.1:

13. void writeGifs(BlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)

14. {

15. // ...

16.

17. int main(int argc, char* argv[])

18. {

19. // ... Definition of the parameters are identical in both versions

20.

21. // Multi-Block version of tutorial 1.5:

22. // MultiBlockLattice2D<T, DESCRIPTOR> lattice (

23. // parameters.getNx(), parameters.getNy(),

24. // new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );

25.

26. // Atomic-Block version of tutorial 2.1:

27. BlockLattice2D<T, DESCRIPTOR> lattice (

28. parameters.getNx(), parameters.getNy(),

29. new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );

30.

31. // ... Creating the initial condition and running the simulation is

32. // identical in both versions

33. }

Although very little has changed on the surface, the algorithm instantiated in the above code is much simpler than in the multi-block case. You can view the BlockLattice2D as a simple nx-by-ny-by-9 value array, just as you would use it in a straightforward example lattice Boltzmann code.

As you progress with your work with Palabos, you will learn to appreciate the abstraction mechanism through which the complex multi-block behaves like a regular construct. In practical work, it offers a relatively simple way to cope with complicated constructs.

## Tutorial 2.2: Creating a multi-block structure manually

One of the uses of the multi-block consists in the memory-efficient implementation of sparse domains. To understand this, consider the following 2D fluid problem:

The fluid is confined in a channel shaped like a half-circle. Pressure boundary conditions are used with an appropriate difference between inlet and outlet, responsible for driving the fluid through the channel. It is obviously memory wasting to represent this problem by a regular matrix structure, because a large area in the interior of the half-circle is unused. This is therefore a typical candidate for a sparse domain implementation. While the sparse structure of a multi-block is usually automatically created by an appropriate tool, it is in the following created manually for didactical reasons. We choose an approach in which the channel is covered by three block, while the fourth block, in the center of the half-circle, is left void, as indicated by the black area on the following image:

The construction of this sparse domain is done in the code tutorial_2_2.cpp, it is integrally printed here (without the usual header lines):

1. /// Describe the geometry of the half-circular channel, used in tutorial 2.

2. template<typename T>

3. class BounceBackNodes : public DomainFunctional2D {

4. public:

6. : cx(N/2),

7. cy(N/2),

9. outerR(N/2)

10. { }

11. /// Return true for all cells outside the channel, on which bounce-back

12. /// dynamics must be instantiated.

13. virtual bool operator() (plint iX, plint iY) const {

14. T rSqr = util::sqr(iX-cx) + util::sqr(iY-cy);

15. return rSqr <= innerR*innerR || rSqr >= outerR*outerR;

16. }

17. virtual BounceBackNodes<T>* clone() const {

18. return new BounceBackNodes<T>(*this);

19. }

20. private:

21. plint cx; //< X-position of the center of the half-circle.

22. plint cy; //< Y-position of the center of the half-circle.

23. plint innerR; //< Outer radius of the half-circle.

24. plint outerR; //< Inner radius of the half-circle.

25. };

26.

27.

28. void halfCircleSetup (

29. MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius,

30. OnLatticeBoundaryCondition2D<T,DESCRIPTOR>& boundaryCondition )

31. {

32. // The channel is pressure-driven, with a difference deltaRho

33. // between inlet and outlet.

34. T deltaRho = 1.e-2;

35. T rhoIn = 1. + deltaRho/2.;

36. T rhoOut = 1. - deltaRho/2.;

37.

38. Box2D inlet (0, N/2, N/2, N/2);

39. Box2D outlet(N/2+1, N, N/2, N/2);

40.

43.

44. // Specify the inlet and outlet density.

45. setBoundaryDensity (lattice, inlet, rhoIn);

46. setBoundaryDensity (lattice, outlet, rhoOut);

47.

48. // Create the initial condition.

49. Array<T,2> zeroVelocity(0.,0.);

50. T constantDensity = (T)1;

51. initializeAtEquilibrium (

52. lattice, lattice.getBoundingBox(), constantDensity, zeroVelocity );

53.

54. defineDynamics(lattice, lattice.getBoundingBox(),

56. new BounceBack<T,DESCRIPTOR>);

57.

58. lattice.initialize();

59. }

60.

61. void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)

62. {

63. const plint imSize = 600;

64. ImageWriter<T> imageWriter("leeloo");

65. imageWriter.writeScaledGif(createFileName("u", iter, 6),

66. *computeVelocityNorm(lattice),

67. imSize, imSize );

68. }

69.

70. int main(int argc, char* argv[]) {

71. plbInit(&argc, &argv);

72.

73. global::directories().setOutputDir("./tmp/");

74.

75. // Parameters of the simulation

76. plint N = 400; // Use a 400x200 domain.

77. plint maxT = 20001;

78. plint imageIter = 1000;

79. T omega = 1.;

81.

82. // Parameters for the creation of the multi-block.

83.

84. // d is the width of the block which is exempted from the full domain.

85. plint d = (plint) (2.*std::sqrt(util::sqr(radius)-util::sqr(N/4.)));

86. plint x0 = (N-d)/2 + 1; // Begin of the exempted block.

87. plint x1 = (N+d)/2 - 1; // End of the exempted block.

88.

89. // Create a block distribution with the three added blocks.

90. plint envelopeWidth = 1;

91. SparseBlockStructure2D sparseBlock(N+1, N/2+1);

92. sparseBlock.addBlock(Box2D(0, x0, 0, N/2), sparseBlock.nextIncrementalId());

93. sparseBlock.addBlock(Box2D(x0+1, x1-1, 0, N/4+1), sparseBlock.nextIncrementalId());

94. sparseBlock.addBlock(Box2D(x1, N, 0, N/2), sparseBlock.nextIncrementalId());

95.

96. // Instantiate the multi-block, based on the created block distribution and

97. // on default parameters.

98. MultiBlockLattice2D<T, DESCRIPTOR> lattice (

99. MultiBlockManagement2D(

100. blockDistribution,

102. defaultMultiBlockPolicy2D().getBlockCommunicator(),

103. defaultMultiBlockPolicy2D().getCombinedStatistics(),

104. defaultMultiBlockPolicy2D().getMultiCellAccess<T,DESCRIPTOR>(),

105. new BGKdynamics<T,DESCRIPTOR>(omega)

106. );

107.

108. pcout << getMultiBlockInfo(lattice) << std::endl;

109.

110. OnLatticeBoundaryCondition2D<T,DESCRIPTOR>*

111. boundaryCondition = createLocalBoundaryCondition2D<T,DESCRIPTOR>();

112.

114.

115. // Main loop over time iterations.

116. for (plint iT=0; iT<maxT; ++iT) {

117. if (iT%imageIter==0) {

118. pcout << "Saving Gif at time step " << iT << endl;

119. writeGifs(lattice, iT);

120. }

121. lattice.collideAndStream();

122. }

123.

124. delete boundaryCondition;

125. }

Among other information, the program prints the following lines to the screen, as a result of the instruction on line 108:

Size of the multi-block: 401-by-201

Number of atomic-blocks: 3

Smallest atomic-block: 172-by-102

Largest atomic-block: 115-by-20

Number of allocated cells: 0.063573 millio

Fraction of allocated cells: 78.8737 percent

This behavior, as well as other parts of the code, are commented in the following:

Line 3The half-circle shape of the simulation is described by the function class BounceBackNodes. Later in the program, this object is use to define all cells outside the cell as bounce-back nodes, and inside the channel as BGK dynamics nodes. The mechanisms involved here are further investigated in Tutorial 2.3 (technically, we will refer to class BounceBackNodes as a data-processor), while the present tutorial concentrates on sparse domain implementations.Line 85-87In this part, the coordinates of the three blocks which cover the channel are computed.Line 91A MultiBlockDistribution2D describes the arrangement of atomic-blocks inside a multi-block. Three blocks are added. Each of them gets an integer ID, which in this case is incremental (number 0, 1, an 2). As explained in Tutorial 2.4: Generate the sparse-block structure automatically, this ID is used in parallel programs to associate a block to a MPI process.Line 98Instead of the usual default constructor of the MultiBlockLattice2D, a more detailed version is used which offers the possibility to control various aspects of the multi-block instantiation, and in particular to specify the arrangement of the internal atomic-blocks. The other arguments of this constructors are not discussed here. As it is shown in the code, one can always refer to the object returned by defaultMultiBlockPolicy2D() to use a default choice for these parameters. The argument envelopeWidth=1 hints at the fact that the dynamics executed on this lattice uses a nearest-neighbor communication pattern, and an overlap of one-cell width between atomic-blocks is needed to express the streaming step inside an atomic-block consistently.Line 108The getMultiBlockInfo function provides some insight in the internal structure of a multi-block. You learn for example that the block added in the middle has dimensions 172-by-102, while the two later blocks have size 115-by-201. Switching from a regular to a sparse block structure obviously provided a gain of 21% in memory usage.

## Tutorial 2.3: Understanding data processors

Both atomic-blocks and multi-blocks are most often manipulated with help of constructs known under the name of “data processor” in Palabos. A data processor specifies an operation to be executed on each cell of a chosen domain on the block. In case a multi-block is used, the desired domain is intersected with the region occupied by the internal atomic-blocks, and then executed on the area of intersection of each atomic-block. Data processors are often used indirectly, through higher-level user functions. In the code of Tutorial 2.2 for example, bounce-back nodes are defined by calling the function defineDynamics. This function in its turn instantiates a data processor, responsible for re-defining the collision step on all the chosen nodes. Other types of data processors are used for example to attribute initial values to the simulation, define boundary conditions, or post-process data. The user functions offered in Palabos for all these operations are listed in the appendix of the user guide. It should also be mentioned that data processors can act simultaneously on several blocks, and in this way create a coupling between two block-lattices, or between a block-lattice and a scalar-field or tensor-field. This behavior is for example used to compute the velocity in a block-lattice and store the result in a three-component tensor-field, or to create the coupling between two lattices for the implementation of multi-component fluids.

In order to obtain a better understanding of how such a data processor works, the call to the function defineDynamics in the previous tutorial is now replaced successively by two alternative, explicit ways of attributing a BounceBack dynamics to the chosen wall cells. All discussed code constructs can also be found in the file tutorial_2_3.cpp.

The first approach is easiest to understand, because it is entirely manual. A loop over all space directions is written manually and, with help of the function class BounceBackNodes from the previous tutorial, it is decided to re-define the dynamics of chosen nodes through the interface of the multi-block:

1. /// Manual instantiation of the bounce-back nodes for the boundary.

2. /** Attention: This is NOT the right way of doing, because it is slow and

3. * non-parallelizable. Instead, use a data-processor.

4. */

5. void createHalfCircleManually (

6. MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )

7. {

9. for (plint iX=0; iX<=N; ++iX) {

10. for (plint iY=0; iY<=N/2; ++iY) {

11. if (domain(iX,iY)) {

12. defineDynamics(lattice, iX, iY, new BounceBack<T,DESCRIPTOR>);

13. }

14. }

15. }

16. }

This approach is really self-explaining. It is however crucial to understand that such an approach should never be chosen in practice, for efficiency considerations. This approach is slow, because for each access to a cell through the function defineDynamics, the atomic-block on which the cell is located must first be determined. Even worse yet, this way of accessing a multi-block is not properly parallelized (the result of the operation is correct in parallel, but the speed of the operation is equal or even inferior to the serial execution speed). As it is shown in the next tutorial, parallelism is implemented in Palabos by assigning each atomic-block inside a multi-block to a given processor (CPU). During operations that are executed on an extended domain, such as, the definition of the collision rule to a given part of the simulation, it is expected that each processor works only with the atomic-block assigned to it. Writing out explicitly a loop over space directions however means to assign work for the whole domain to all processors. This can impact the parallel performance of a program significantly, even if it is only part of the setup of a simulation. In programs which are massively parallelized on hundreds or thousands of processors/cores, a badly parallelized initial stage can represent a bottleneck for the whole simulation. One of the most important rules in Palabos is therefore to never write loops over space dimensions in end-user applications. Instead, space loops are only written inside data processors, as shown below.

Here’s how you can manually write a data processor which creates the half-circle domain, without resorting to the helper function defineDynamics (the function defineDynamics is of course nothing else than a wrapper which instantiates a data processor for you):

1. /// This functional defines a data processor for the instantiation

2. /// of bounce-back nodes following the half-circle geometry.

3. template<typename T, template<typename U> class Descriptor>

4. class HalfCircleInstantiateFunctional

5. : public BoxProcessingFunctional2D_L<T,Descriptor>

6. {

7. public:

10. { }

11. virtual void process(Box2D domain, BlockLattice2D<T,Descriptor>& lattice) {

13. Dot2D relativeOffset = lattice.getLocation();

14. for (plint iX=domain.x0; iX<=domain.x1; ++iX) {

15. for (plint iY=domain.y0; iY<=domain.y1; ++iY) {

16. if (bbDomain(iX+relativeOffset.x,iY+relativeOffset.y)) {

17. lattice.attributeDynamics (

18. iX,iY, new BounceBack<T,DESCRIPTOR> );

19. }

20. }

21. }

22. }

23. virtual void getTypeOfModification(std::vector<modif::ModifT>& modified) const {

24. modified[0] = modif::dynamicVariables;

25. }

26. virtual HalfCircleInstantiateFunctional<T,Descriptor>* clone() const

27. {

28. return new HalfCircleInstantiateFunctional<T,Descriptor>(*this);

29. }

30. private:

31. plint N;

33. };

34.

35. /// Automatic instantiation of the bounce-back nodes for the boundary,

36. /// using a data processor.

37. void createHalfCircleFromDataProcessor (

38. MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )

39. {

40. applyProcessingFunctional (

42. lattice.getBoundingBox(), lattice );

43. }

## Tutorial 2.4: Generate the sparse-block structure automatically

As previously pointed out, you will practically never want to compute the internal structure of a sparse multi-block manually, and instead use one of Palabos’ automatic domain generators. For example, the Palabos application in the directory examples/showCases/aneurysm shows how you can automatically read a geometry description from an STL file, “voxelize” the domain (decide which cells are part of the fluid), and then generate the sparse structue. By the way, Palabos’ automatic domain generation algorithms go even a step further than what is done in the present tutorial, as they also instantiate sophisticated algorithms for curved walls, in which the wall location is represented with second-order accuracy (while in this tutorial the wall is represented through a stair-case shape).

Anyway, let’s get back to tutorial 2.2, and modify the code in such a way that the sparse block structure is generated automatically by Palabos:

1. ...

2.

3. template<typename T>

4. class FluidNodes {

5. public:

7. { }

8. bool operator() (plint iX, plint iY) const {

10. }

11. private:

13. };

14.

15. ...

16.

17. MultiScalarField2D<int> flagMatrix(N+1, N/2+1);

19. plint blockSize = 15;

20. plint envelopeWidth = 1;

21. MultiBlockManagement2D sparseBlockManagement =

22. computeSparseManagement (

23. *plb::reparallelize(flagMatrix, blockSize,blockSize),

24. envelopeWidth );

25.

26. // Instantiate the multi-block, based on the created block distribution and

27. // on default parameters.

28. MultiBlockLattice2D<T, DESCRIPTOR> lattice (

29. sparseBlockManagement,

30. defaultMultiBlockPolicy2D().getBlockCommunicator(),

31. defaultMultiBlockPolicy2D().getCombinedStatistics(),

32. defaultMultiBlockPolicy2D().getMultiCellAccess<T,DESCRIPTOR>(),

33. new BGKdynamics<T,DESCRIPTOR>(omega)

34. );

Line 3-13In order to create the sparse block structure, Palabos needs to know which lattice cells are part of the fluid. A specific functional is implemented here which returns this information, which is, the negation of BounceBackNodes.Line 17 and 18An integer-valued flag matrix is constructed which is zero valued on the bounce-back domain, and non-zero on fluid nodes.Line 19Let’s cover the domain with blocks of size (approximately) 15x15. The smaller these blocks, and the more memory efficient a simulation you get. If the blocks are chosen too small, you loose however efficiency, as the overhead of communicating between blocks is too large. The optimal atomic block size depends on your hardware and must be determined experimentally.Line 23To start with, the multi-block is restructured to be covered by blocks of size 15x15.Line 22As a next step, all blocks that have no fluid cell are removed. The result is saved in a data structure of type MultiBlockManagementLine 28The created multi-block management structure is now used to create a block-lattice with the sparse block-structure.As it can be seen from the program’s output, this code is more efficient than the one in tutorial 2.2, as it removes 45% of the cells in the original multi-block (against 11% in tutorial 2.2), due to a more efficient coverage by small blocks. The following image shows, in red, the domain for which memory is allocated in this program:

## Tutorial 2.5: Parallelism in a manually created multi-block

If you use multi-blocks as your basic data type, if you construct them in a standard way, as in all examples of Tutorial 1, and if you compile the program with the MPI_PARALLEL flag, then the program is automatically parallelized. This means that the multi-block is automatically subdivided into components which are associated to the different processors.

We leave it as an exercice to modify the code of tutorial 2.2 in such a way that it can be executed on exactly two MPI processes, by assigning two blocks to the first process, and the third block to the second process.

0 条评论

• ### 外部数据插值到fluent变量中

3.19.2. Format of the Interpolation File

• ### Computational Geometry in Python

This post is a simplified version of the accompanying notebook to chapter 6 of m...

• ### Palabos Tutorial 1/3: First steps with Palabos

The following tutorial provides an overview of the C++ programmer interface of t...

• ### From High Ceph Latency to Kernel Patch with eBPF/BCC

There are a lot of tools for debugging kernel and userspace programs in Linux. M...

• ### SAP S/4HANA里如何创建Customer主数据以及执行后续处理

1, Launch tcode: BP and select the Organization 2, Maintain the information for...

• ### 关于db_files和maxdatafiles的问题(r4笔记第31天)

昨天在做生产监控的时候发现有个库的表空间不够了，就发邮件给客户的dba去处理，但是得到的反馈是尝试添加的时候发现已经超过了数据文件的最大数限制。这个错误毫无疑问...

• ### 论综合 | 是什么让一个数字前端实现硅农开始学习Floorplan 的？

如题，是什么让一个数字前端实现硅农开始学习Floorplan 的？是制造工艺的进步，是实现方法学的被迫更新，是养家糊口生的本能，正可谓：头发落完终不悔，为伊消得...

• ### VS.Net 2005 Design-Time Integration

Introduction This article provides an overview of the VS.NET 2005 Design-Time I...

• ### 对称排序

In your job at Albatross Circus Management (yes, it's run by a bunch of clowns),...